netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
@ 2014-11-09 10:51 Jiri Pirko
  2014-11-09 10:51 ` [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name Jiri Pirko
                   ` (13 more replies)
  0 siblings, 14 replies; 100+ messages in thread
From: Jiri Pirko @ 2014-11-09 10:51 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

Hi all.

This patchset is just the first phase of switch and switch-ish device
support api in kernel. Note that the api will extend (our complete work
can be pulled from https://github.com/jpirko/net-next-rocker).

So what this patchset includes:
- introduce switchdev api for implementing switch drivers (so far
  only linux bridge fdb offload is covered)
- introduce rocker switch driver which implements switchdev api

As to the discussion if there is need to have specific class of device
representing the switch itself, so far we found no need to introduce that.
But we are generally ok with the idea and when the time comes and it will
be needed, it can be easily introduced without any disturbance.

This patchset introduces switch id export through rtnetlink and sysfs,
which is similar to what we have for port id in SR-IOV. I will send iproute2
patchset for showing the switch id for port netdevs once this is applied.

For detailed description, please see individual patches.

v1->v2:
- addressed all DaveM's comments

Jiri Pirko (5):
  net: rename netdev_phys_port_id to more generic name
  net: introduce generic switch devices support
  rtnl: expose physical switch id for particular device
  net-sysfs: expose physical switch id for particular device
  rocker: introduce rocker switch driver

Scott Feldman (5):
  bridge: introduce fdb offloading via switchdev
  bridge: call netdev_sw_port_stp_update when bridge port STP status
    changes
  bridge: add API to notify bridge driver of learned FBD on offloaded
    device
  rocker: implement rocker ofdpa flow table manipulation
  rocker: implement L2 bridge offloading

 Documentation/networking/switchdev.txt           |   59 +
 MAINTAINERS                                      |   14 +
 drivers/net/ethernet/Kconfig                     |    1 +
 drivers/net/ethernet/Makefile                    |    1 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |    2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c      |    2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |    2 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |    2 +-
 drivers/net/ethernet/rocker/Kconfig              |   27 +
 drivers/net/ethernet/rocker/Makefile             |    5 +
 drivers/net/ethernet/rocker/rocker.c             | 4182 ++++++++++++++++++++++
 drivers/net/ethernet/rocker/rocker.h             |  427 +++
 include/linux/if_bridge.h                        |   18 +
 include/linux/netdevice.h                        |   48 +-
 include/net/switchdev.h                          |   53 +
 include/uapi/linux/if_link.h                     |    1 +
 net/Kconfig                                      |    1 +
 net/Makefile                                     |    3 +
 net/bridge/br_fdb.c                              |   94 +-
 net/bridge/br_netlink.c                          |    2 +
 net/bridge/br_stp.c                              |    4 +
 net/bridge/br_stp_if.c                           |    3 +
 net/bridge/br_stp_timer.c                        |    2 +
 net/core/dev.c                                   |    2 +-
 net/core/net-sysfs.c                             |   26 +-
 net/core/rtnetlink.c                             |   30 +-
 net/switchdev/Kconfig                            |   13 +
 net/switchdev/Makefile                           |    5 +
 net/switchdev/switchdev.c                        |   93 +
 29 files changed, 5104 insertions(+), 18 deletions(-)
 create mode 100644 Documentation/networking/switchdev.txt
 create mode 100644 drivers/net/ethernet/rocker/Kconfig
 create mode 100644 drivers/net/ethernet/rocker/Makefile
 create mode 100644 drivers/net/ethernet/rocker/rocker.c
 create mode 100644 drivers/net/ethernet/rocker/rocker.h
 create mode 100644 include/net/switchdev.h
 create mode 100644 net/switchdev/Kconfig
 create mode 100644 net/switchdev/Makefile
 create mode 100644 net/switchdev/switchdev.c

-- 
1.9.3

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name
  2014-11-09 10:51 [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload Jiri Pirko
@ 2014-11-09 10:51 ` Jiri Pirko
  2014-11-10  3:35   ` Jamal Hadi Salim
  2014-11-10 21:57   ` John Fastabend
  2014-11-09 10:51 ` [patch net-next v2 02/10] net: introduce generic switch devices support Jiri Pirko
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 100+ messages in thread
From: Jiri Pirko @ 2014-11-09 10:51 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

So this can be reused for identification of other "items" as well.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |  2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c      |  2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |  2 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |  2 +-
 include/linux/netdevice.h                        | 16 ++++++++--------
 net/core/dev.c                                   |  2 +-
 net/core/net-sysfs.c                             |  2 +-
 net/core/rtnetlink.c                             |  6 +++---
 8 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index c4bd025..336ef3c 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -12537,7 +12537,7 @@ static int bnx2x_validate_addr(struct net_device *dev)
 }
 
 static int bnx2x_get_phys_port_id(struct net_device *netdev,
-				  struct netdev_phys_port_id *ppid)
+				  struct netdev_phys_item_id *ppid)
 {
 	struct bnx2x *bp = netdev_priv(netdev);
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 1a98e23..d749165 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7373,7 +7373,7 @@ static void i40e_del_vxlan_port(struct net_device *netdev,
 
 #endif
 static int i40e_get_phys_port_id(struct net_device *netdev,
-				 struct netdev_phys_port_id *ppid)
+				 struct netdev_phys_item_id *ppid)
 {
 	struct i40e_netdev_priv *np = netdev_priv(netdev);
 	struct i40e_pf *pf = np->vsi->back;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 0efbae9..bd007c3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2258,7 +2258,7 @@ static int mlx4_en_set_vf_link_state(struct net_device *dev, int vf, int link_st
 
 #define PORT_ID_BYTE_LEN 8
 static int mlx4_en_get_phys_port_id(struct net_device *dev,
-				    struct netdev_phys_port_id *ppid)
+				    struct netdev_phys_item_id *ppid)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	struct mlx4_dev *mdev = priv->mdev->dev;
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index f5e29f7..6e514d2 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -460,7 +460,7 @@ static void qlcnic_82xx_cancel_idc_work(struct qlcnic_adapter *adapter)
 }
 
 static int qlcnic_get_phys_port_id(struct net_device *netdev,
-				   struct netdev_phys_port_id *ppid)
+				   struct netdev_phys_item_id *ppid)
 {
 	struct qlcnic_adapter *adapter = netdev_priv(netdev);
 	struct qlcnic_hardware_context *ahw = adapter->ahw;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 90ac959..71922e0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -753,13 +753,13 @@ struct netdev_fcoe_hbainfo {
 };
 #endif
 
-#define MAX_PHYS_PORT_ID_LEN 32
+#define MAX_PHYS_ITEM_ID_LEN 32
 
-/* This structure holds a unique identifier to identify the
- * physical port used by a netdevice.
+/* This structure holds a unique identifier to identify some
+ * physical item (port for example) used by a netdevice.
  */
-struct netdev_phys_port_id {
-	unsigned char id[MAX_PHYS_PORT_ID_LEN];
+struct netdev_phys_item_id {
+	unsigned char id[MAX_PHYS_ITEM_ID_LEN];
 	unsigned char id_len;
 };
 
@@ -975,7 +975,7 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  *	USB_CDC_NOTIFY_NETWORK_CONNECTION) should NOT implement this function.
  *
  * int (*ndo_get_phys_port_id)(struct net_device *dev,
- *			       struct netdev_phys_port_id *ppid);
+ *			       struct netdev_phys_item_id *ppid);
  *	Called to get ID of physical port of this device. If driver does
  *	not implement this, it is assumed that the hw is not able to have
  *	multiple net devices on single physical port.
@@ -1149,7 +1149,7 @@ struct net_device_ops {
 	int			(*ndo_change_carrier)(struct net_device *dev,
 						      bool new_carrier);
 	int			(*ndo_get_phys_port_id)(struct net_device *dev,
-							struct netdev_phys_port_id *ppid);
+							struct netdev_phys_item_id *ppid);
 	void			(*ndo_add_vxlan_port)(struct  net_device *dev,
 						      sa_family_t sa_family,
 						      __be16 port);
@@ -2878,7 +2878,7 @@ void dev_set_group(struct net_device *, int);
 int dev_set_mac_address(struct net_device *, struct sockaddr *);
 int dev_change_carrier(struct net_device *, bool new_carrier);
 int dev_get_phys_port_id(struct net_device *dev,
-			 struct netdev_phys_port_id *ppid);
+			 struct netdev_phys_item_id *ppid);
 struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *dev);
 struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 				    struct netdev_queue *txq, int *ret);
diff --git a/net/core/dev.c b/net/core/dev.c
index 70bb609..9c67f36 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5818,7 +5818,7 @@ EXPORT_SYMBOL(dev_change_carrier);
  *	Get device physical port ID
  */
 int dev_get_phys_port_id(struct net_device *dev,
-			 struct netdev_phys_port_id *ppid)
+			 struct netdev_phys_item_id *ppid)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 9dd0669..55dc4da 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -387,7 +387,7 @@ static ssize_t phys_port_id_show(struct device *dev,
 		return restart_syscall();
 
 	if (dev_isalive(netdev)) {
-		struct netdev_phys_port_id ppid;
+		struct netdev_phys_item_id ppid;
 
 		ret = dev_get_phys_port_id(netdev, &ppid);
 		if (!ret)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a688268..1087c6d 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -868,7 +868,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + rtnl_port_size(dev, ext_filter_mask) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
 	       + rtnl_link_get_size(dev) /* IFLA_LINKINFO */
 	       + rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */
-	       + nla_total_size(MAX_PHYS_PORT_ID_LEN); /* IFLA_PHYS_PORT_ID */
+	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN); /* IFLA_PHYS_PORT_ID */
 }
 
 static int rtnl_vf_ports_fill(struct sk_buff *skb, struct net_device *dev)
@@ -952,7 +952,7 @@ static int rtnl_port_fill(struct sk_buff *skb, struct net_device *dev,
 static int rtnl_phys_port_id_fill(struct sk_buff *skb, struct net_device *dev)
 {
 	int err;
-	struct netdev_phys_port_id ppid;
+	struct netdev_phys_item_id ppid;
 
 	err = dev_get_phys_port_id(dev, &ppid);
 	if (err) {
@@ -1196,7 +1196,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_PROMISCUITY]	= { .type = NLA_U32 },
 	[IFLA_NUM_TX_QUEUES]	= { .type = NLA_U32 },
 	[IFLA_NUM_RX_QUEUES]	= { .type = NLA_U32 },
-	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_PORT_ID_LEN },
+	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
 	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
 };
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [patch net-next v2 02/10] net: introduce generic switch devices support
  2014-11-09 10:51 [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload Jiri Pirko
  2014-11-09 10:51 ` [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name Jiri Pirko
@ 2014-11-09 10:51 ` Jiri Pirko
  2014-11-10 21:59   ` John Fastabend
                     ` (2 more replies)
  2014-11-09 10:51 ` [patch net-next v2 03/10] rtnl: expose physical switch id for particular device Jiri Pirko
                   ` (11 subsequent siblings)
  13 siblings, 3 replies; 100+ messages in thread
From: Jiri Pirko @ 2014-11-09 10:51 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

The goal of this is to provide a possibility to support various switch
chips. Drivers should implement relevant ndos to do so. Now there is
only one ndo defined:
- for getting physical switch id is in place.

Note that user can use random port netdevice to access the switch.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 Documentation/networking/switchdev.txt | 59 ++++++++++++++++++++++++++++++++++
 MAINTAINERS                            |  7 ++++
 include/linux/netdevice.h              | 10 ++++++
 include/net/switchdev.h                | 30 +++++++++++++++++
 net/Kconfig                            |  1 +
 net/Makefile                           |  3 ++
 net/switchdev/Kconfig                  | 13 ++++++++
 net/switchdev/Makefile                 |  5 +++
 net/switchdev/switchdev.c              | 33 +++++++++++++++++++
 9 files changed, 161 insertions(+)
 create mode 100644 Documentation/networking/switchdev.txt
 create mode 100644 include/net/switchdev.h
 create mode 100644 net/switchdev/Kconfig
 create mode 100644 net/switchdev/Makefile
 create mode 100644 net/switchdev/switchdev.c

diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
new file mode 100644
index 0000000..98be76c
--- /dev/null
+++ b/Documentation/networking/switchdev.txt
@@ -0,0 +1,59 @@
+Switch (and switch-ish) device drivers HOWTO
+===========================
+
+Please note that the word "switch" is here used in very generic meaning.
+This include devices supporting L2/L3 but also various flow offloading chips,
+including switches embedded into SR-IOV NICs.
+
+Lets describe a topology a bit. Imagine the following example:
+
+       +----------------------------+    +---------------+
+       |     SOME switch chip       |    |      CPU      |
+       +----------------------------+    +---------------+
+       port1 port2 port3 port4 MNGMNT    |     PCI-E     |
+         |     |     |     |     |       +---------------+
+        PHY   PHY    |     |     |         |  NIC0 NIC1
+                     |     |     |         |   |    |
+                     |     |     +- PCI-E -+   |    |
+                     |     +------- MII -------+    |
+                     +------------- MII ------------+
+
+In this example, there are two independent lines between the switch silicon
+and CPU. NIC0 and NIC1 drivers are not aware of a switch presence. They are
+separate from the switch driver. SOME switch chip is by managed by a driver
+via PCI-E device MNGMNT. Note that MNGMNT device, NIC0 and NIC1 may be
+connected to some other type of bus.
+
+Now, for the previous example show the representation in kernel:
+
+       +----------------------------+    +---------------+
+       |     SOME switch chip       |    |      CPU      |
+       +----------------------------+    +---------------+
+       sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT    |     PCI-E     |
+         |     |     |     |     |       +---------------+
+        PHY   PHY    |     |     |         |  eth0 eth1
+                     |     |     |         |   |    |
+                     |     |     +- PCI-E -+   |    |
+                     |     +------- MII -------+    |
+                     +------------- MII ------------+
+
+Lets call the example switch driver for SOME switch chip "SOMEswitch". This
+driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX
+created for each port of a switch. These netdevices are instances
+of "SOMEswitch" driver. sw0pX netdevices serve as a "representation"
+of the switch chip. eth0 and eth1 are instances of some other existing driver.
+
+The only difference of the switch-port netdevice from the ordinary netdevice
+is that is implements couple more NDOs:
+
+	ndo_sw_parent_get_id - This returns the same ID for two port netdevices
+			       of the same physical switch chip. This is
+			       mandatory to be implemented by all switch drivers
+			       and serves the caller for recognition of a port
+			       netdevice.
+	ndo_sw_parent_* - Functions that serve for a manipulation of the switch
+			  chip itself (it can be though of as a "parent" of the
+			  port, therefore the name). They are not port-specific.
+			  Caller might use arbitrary port netdevice of the same
+			  switch and it will make no difference.
+	ndo_sw_port_* - Functions that serve for a port-specific manipulation.
diff --git a/MAINTAINERS b/MAINTAINERS
index 3a41fb0..776e078 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9003,6 +9003,13 @@ F:	lib/swiotlb.c
 F:	arch/*/kernel/pci-swiotlb.c
 F:	include/linux/swiotlb.h
 
+SWITCHDEV
+M:	Jiri Pirko <jiri@resnulli.us>
+L:	netdev@vger.kernel.org
+S:	Supported
+F:	net/switchdev/
+F:	include/net/switchdev.h
+
 SYNOPSYS ARC ARCHITECTURE
 M:	Vineet Gupta <vgupta@synopsys.com>
 S:	Supported
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 71922e0..97eade9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1017,6 +1017,12 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  *	performing GSO on a packet. The device returns true if it is
  *	able to GSO the packet, false otherwise. If the return value is
  *	false the stack will do software GSO.
+ *
+ * int (*ndo_sw_parent_id_get)(struct net_device *dev,
+ *			       struct netdev_phys_item_id *psid);
+ *	Called to get an ID of the switch chip this port is part of.
+ *	If driver implements this, it indicates that it represents a port
+ *	of a switch chip.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1168,6 +1174,10 @@ struct net_device_ops {
 	int			(*ndo_get_lock_subclass)(struct net_device *dev);
 	bool			(*ndo_gso_check) (struct sk_buff *skb,
 						  struct net_device *dev);
+#ifdef CONFIG_NET_SWITCHDEV
+	int			(*ndo_sw_parent_id_get)(struct net_device *dev,
+							struct netdev_phys_item_id *psid);
+#endif
 };
 
 /**
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
new file mode 100644
index 0000000..79bf9bd
--- /dev/null
+++ b/include/net/switchdev.h
@@ -0,0 +1,30 @@
+/*
+ * include/net/switchdev.h - Switch device API
+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+#ifndef _LINUX_SWITCHDEV_H_
+#define _LINUX_SWITCHDEV_H_
+
+#include <linux/netdevice.h>
+
+#ifdef CONFIG_NET_SWITCHDEV
+
+int netdev_sw_parent_id_get(struct net_device *dev,
+			    struct netdev_phys_item_id *psid);
+
+#else
+
+static inline int netdev_sw_parent_id_get(struct net_device *dev,
+					  struct netdev_phys_item_id *psid)
+{
+	return -EOPNOTSUPP;
+}
+
+#endif
+
+#endif /* _LINUX_SWITCHDEV_H_ */
diff --git a/net/Kconfig b/net/Kconfig
index 99815b5..ff9ffc1 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -228,6 +228,7 @@ source "net/vmw_vsock/Kconfig"
 source "net/netlink/Kconfig"
 source "net/mpls/Kconfig"
 source "net/hsr/Kconfig"
+source "net/switchdev/Kconfig"
 
 config RPS
 	boolean
diff --git a/net/Makefile b/net/Makefile
index 7ed1970..95fc694 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -73,3 +73,6 @@ obj-$(CONFIG_OPENVSWITCH)	+= openvswitch/
 obj-$(CONFIG_VSOCKETS)	+= vmw_vsock/
 obj-$(CONFIG_NET_MPLS_GSO)	+= mpls/
 obj-$(CONFIG_HSR)		+= hsr/
+ifneq ($(CONFIG_NET_SWITCHDEV),)
+obj-y				+= switchdev/
+endif
diff --git a/net/switchdev/Kconfig b/net/switchdev/Kconfig
new file mode 100644
index 0000000..1557545
--- /dev/null
+++ b/net/switchdev/Kconfig
@@ -0,0 +1,13 @@
+#
+# Configuration for Switch device support
+#
+
+config NET_SWITCHDEV
+	boolean "Switch (and switch-ish) device support (EXPERIMENTAL)"
+	depends on INET
+	---help---
+	  This module provides glue between core networking code and device
+	  drivers in order to support hardware switch chips in very generic
+	  meaning of the word "switch". This include devices supporting L2/L3 but
+	  also various flow offloading chips, including switches embedded into
+	  SR-IOV NICs.
diff --git a/net/switchdev/Makefile b/net/switchdev/Makefile
new file mode 100644
index 0000000..5ed63ed
--- /dev/null
+++ b/net/switchdev/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the Switch device API
+#
+
+obj-$(CONFIG_NET_SWITCHDEV) += switchdev.o
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
new file mode 100644
index 0000000..5010f646
--- /dev/null
+++ b/net/switchdev/switchdev.c
@@ -0,0 +1,33 @@
+/*
+ * net/switchdev/switchdev.c - Switch device API
+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/init.h>
+#include <linux/netdevice.h>
+#include <net/switchdev.h>
+
+/**
+ *	netdev_sw_parent_id_get - Get ID of a switch
+ *	@dev: port device
+ *	@psid: switch ID
+ *
+ *	Get ID of a switch this port is part of.
+ */
+int netdev_sw_parent_id_get(struct net_device *dev,
+			    struct netdev_phys_item_id *psid)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	if (!ops->ndo_sw_parent_id_get)
+		return -EOPNOTSUPP;
+	return ops->ndo_sw_parent_id_get(dev, psid);
+}
+EXPORT_SYMBOL(netdev_sw_parent_id_get);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [patch net-next v2 03/10] rtnl: expose physical switch id for particular device
  2014-11-09 10:51 [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload Jiri Pirko
  2014-11-09 10:51 ` [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name Jiri Pirko
  2014-11-09 10:51 ` [patch net-next v2 02/10] net: introduce generic switch devices support Jiri Pirko
@ 2014-11-09 10:51 ` Jiri Pirko
  2014-11-10  3:43   ` Jamal Hadi Salim
                     ` (2 more replies)
  2014-11-09 10:51 ` [patch net-next v2 04/10] net-sysfs: " Jiri Pirko
                   ` (10 subsequent siblings)
  13 siblings, 3 replies; 100+ messages in thread
From: Jiri Pirko @ 2014-11-09 10:51 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

The netdevice represents a port in a switch, it will expose
IFLA_PHYS_SWITCH_ID value via rtnl. Two netdevices with the same value
belong to one physical switch.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/uapi/linux/if_link.h |  1 +
 net/core/rtnetlink.c         | 26 +++++++++++++++++++++++++-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 7072d83..4163753 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -145,6 +145,7 @@ enum {
 	IFLA_CARRIER,
 	IFLA_PHYS_PORT_ID,
 	IFLA_CARRIER_CHANGES,
+	IFLA_PHYS_SWITCH_ID,
 	__IFLA_MAX
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 1087c6d..f839354 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -43,6 +43,7 @@
 
 #include <linux/inet.h>
 #include <linux/netdevice.h>
+#include <net/switchdev.h>
 #include <net/ip.h>
 #include <net/protocol.h>
 #include <net/arp.h>
@@ -868,7 +869,8 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + rtnl_port_size(dev, ext_filter_mask) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
 	       + rtnl_link_get_size(dev) /* IFLA_LINKINFO */
 	       + rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */
-	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN); /* IFLA_PHYS_PORT_ID */
+	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_PORT_ID */
+	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN); /* IFLA_PHYS_SWITCH_ID */
 }
 
 static int rtnl_vf_ports_fill(struct sk_buff *skb, struct net_device *dev)
@@ -967,6 +969,24 @@ static int rtnl_phys_port_id_fill(struct sk_buff *skb, struct net_device *dev)
 	return 0;
 }
 
+static int rtnl_phys_switch_id_fill(struct sk_buff *skb, struct net_device *dev)
+{
+	int err;
+	struct netdev_phys_item_id psid;
+
+	err = netdev_sw_parent_id_get(dev, &psid);
+	if (err) {
+		if (err == -EOPNOTSUPP)
+			return 0;
+		return err;
+	}
+
+	if (nla_put(skb, IFLA_PHYS_SWITCH_ID, psid.id_len, psid.id))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
 static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			    int type, u32 pid, u32 seq, u32 change,
 			    unsigned int flags, u32 ext_filter_mask)
@@ -1039,6 +1059,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 	if (rtnl_phys_port_id_fill(skb, dev))
 		goto nla_put_failure;
 
+	if (rtnl_phys_switch_id_fill(skb, dev))
+		goto nla_put_failure;
+
 	attr = nla_reserve(skb, IFLA_STATS,
 			sizeof(struct rtnl_link_stats));
 	if (attr == NULL)
@@ -1198,6 +1221,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_NUM_RX_QUEUES]	= { .type = NLA_U32 },
 	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
 	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
+	[IFLA_PHYS_SWITCH_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [patch net-next v2 04/10] net-sysfs: expose physical switch id for particular device
  2014-11-09 10:51 [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload Jiri Pirko
                   ` (2 preceding siblings ...)
  2014-11-09 10:51 ` [patch net-next v2 03/10] rtnl: expose physical switch id for particular device Jiri Pirko
@ 2014-11-09 10:51 ` Jiri Pirko
  2014-11-10 22:01   ` John Fastabend
  2014-11-09 10:51 ` [patch net-next v2 05/10] rocker: introduce rocker switch driver Jiri Pirko
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2014-11-09 10:51 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 net/core/net-sysfs.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 55dc4da..8e6603c 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -12,6 +12,7 @@
 #include <linux/capability.h>
 #include <linux/kernel.h>
 #include <linux/netdevice.h>
+#include <net/switchdev.h>
 #include <linux/if_arp.h>
 #include <linux/slab.h>
 #include <linux/nsproxy.h>
@@ -399,6 +400,28 @@ static ssize_t phys_port_id_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(phys_port_id);
 
+static ssize_t phys_switch_id_show(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct net_device *netdev = to_net_dev(dev);
+	ssize_t ret = -EINVAL;
+
+	if (!rtnl_trylock())
+		return restart_syscall();
+
+	if (dev_isalive(netdev)) {
+		struct netdev_phys_item_id ppid;
+
+		ret = netdev_sw_parent_id_get(netdev, &ppid);
+		if (!ret)
+			ret = sprintf(buf, "%*phN\n", ppid.id_len, ppid.id);
+	}
+	rtnl_unlock();
+
+	return ret;
+}
+static DEVICE_ATTR_RO(phys_switch_id);
+
 static struct attribute *net_class_attrs[] = {
 	&dev_attr_netdev_group.attr,
 	&dev_attr_type.attr,
@@ -423,6 +446,7 @@ static struct attribute *net_class_attrs[] = {
 	&dev_attr_flags.attr,
 	&dev_attr_tx_queue_len.attr,
 	&dev_attr_phys_port_id.attr,
+	&dev_attr_phys_switch_id.attr,
 	NULL,
 };
 ATTRIBUTE_GROUPS(net_class);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [patch net-next v2 05/10] rocker: introduce rocker switch driver
  2014-11-09 10:51 [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload Jiri Pirko
                   ` (3 preceding siblings ...)
  2014-11-09 10:51 ` [patch net-next v2 04/10] net-sysfs: " Jiri Pirko
@ 2014-11-09 10:51 ` Jiri Pirko
  2014-11-10 22:04   ` John Fastabend
  2014-11-09 10:51 ` [patch net-next v2 06/10] bridge: introduce fdb offloading via switchdev Jiri Pirko
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2014-11-09 10:51 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

This patch introduces the first driver to benefit from the switchdev
infrastructure and to implement newly introduced switch ndos. This is a
driver for emulated switch chip implemented in qemu:
https://github.com/sfeldma/qemu-rocker/

This patch is a result of joint work with Scott Feldman.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 MAINTAINERS                          |    7 +
 drivers/net/ethernet/Kconfig         |    1 +
 drivers/net/ethernet/Makefile        |    1 +
 drivers/net/ethernet/rocker/Kconfig  |   27 +
 drivers/net/ethernet/rocker/Makefile |    5 +
 drivers/net/ethernet/rocker/rocker.c | 2060 ++++++++++++++++++++++++++++++++++
 drivers/net/ethernet/rocker/rocker.h |  427 +++++++
 7 files changed, 2528 insertions(+)
 create mode 100644 drivers/net/ethernet/rocker/Kconfig
 create mode 100644 drivers/net/ethernet/rocker/Makefile
 create mode 100644 drivers/net/ethernet/rocker/rocker.c
 create mode 100644 drivers/net/ethernet/rocker/rocker.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 776e078..7e15b50 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7807,6 +7807,13 @@ F:	drivers/hid/hid-roccat*
 F:	include/linux/hid-roccat*
 F:	Documentation/ABI/*/sysfs-driver-hid-roccat*
 
+ROCKER DRIVER
+M:	Jiri Pirko <jiri@resnulli.us>
+M:	Scott Feldman <sfeldma@gmail.com>
+L:	netdev@vger.kernel.org
+S:	Supported
+F:	drivers/net/ethernet/rocker/
+
 ROCKETPORT DRIVER
 P:	Comtrol Corp.
 W:	http://www.comtrol.com
diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index 1ed1fbb..df76050 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -155,6 +155,7 @@ source "drivers/net/ethernet/qualcomm/Kconfig"
 source "drivers/net/ethernet/realtek/Kconfig"
 source "drivers/net/ethernet/renesas/Kconfig"
 source "drivers/net/ethernet/rdc/Kconfig"
+source "drivers/net/ethernet/rocker/Kconfig"
 
 config S6GMAC
 	tristate "S6105 GMAC ethernet support"
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index 6e0b629..bf56f8b 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_NET_VENDOR_QUALCOMM) += qualcomm/
 obj-$(CONFIG_NET_VENDOR_REALTEK) += realtek/
 obj-$(CONFIG_SH_ETH) += renesas/
 obj-$(CONFIG_NET_VENDOR_RDC) += rdc/
+obj-$(CONFIG_NET_VENDOR_ROCKER) += rocker/
 obj-$(CONFIG_S6GMAC) += s6gmac.o
 obj-$(CONFIG_NET_VENDOR_SAMSUNG) += samsung/
 obj-$(CONFIG_NET_VENDOR_SEEQ) += seeq/
diff --git a/drivers/net/ethernet/rocker/Kconfig b/drivers/net/ethernet/rocker/Kconfig
new file mode 100644
index 0000000..11a850e
--- /dev/null
+++ b/drivers/net/ethernet/rocker/Kconfig
@@ -0,0 +1,27 @@
+#
+# Rocker device configuration
+#
+
+config NET_VENDOR_ROCKER
+	bool "Rocker devices"
+	default y
+	---help---
+	  If you have a network device belonging to this class, say Y.
+
+	  Note that the answer to this question doesn't directly affect the
+	  kernel: saying N will just cause the configurator to skip all
+	  the questions about Rocker devices. If you say Y, you will be asked for
+	  your specific card in the following questions.
+
+if NET_VENDOR_ROCKER
+
+config ROCKER
+	tristate "Rocker switch driver (EXPERIMENTAL)"
+	depends on PCI && NET_SWITCHDEV
+	---help---
+	  This driver supports Rocker switch device.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called rocker.
+
+endif # NET_VENDOR_ROCKER
diff --git a/drivers/net/ethernet/rocker/Makefile b/drivers/net/ethernet/rocker/Makefile
new file mode 100644
index 0000000..f85fb12
--- /dev/null
+++ b/drivers/net/ethernet/rocker/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the Rocker network device drivers.
+#
+
+obj-$(CONFIG_ROCKER) += rocker.o
diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
new file mode 100644
index 0000000..ebad09c
--- /dev/null
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -0,0 +1,2060 @@
+/*
+ * drivers/net/ethernet/rocker/rocker.c - Rocker switch device driver
+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ * Copyright (c) 2014 Scott Feldman <sfeldma@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/interrupt.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+#include <linux/spinlock.h>
+#include <linux/crc32.h>
+#include <linux/sort.h>
+#include <linux/random.h>
+#include <linux/netdevice.h>
+#include <linux/inetdevice.h>
+#include <linux/skbuff.h>
+#include <linux/socket.h>
+#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
+#include <linux/if_ether.h>
+#include <linux/if_vlan.h>
+#include <net/switchdev.h>
+#include <net/rtnetlink.h>
+#include <asm-generic/io-64-nonatomic-lo-hi.h>
+#include <generated/utsrelease.h>
+
+#include "rocker.h"
+
+static const char rocker_driver_name[] = "rocker";
+
+static const struct pci_device_id rocker_pci_id_table[] = {
+	{PCI_VDEVICE(REDHAT, PCI_DEVICE_ID_REDHAT_ROCKER), 0},
+	{0, }
+};
+
+struct rocker_desc_info {
+	char *data; /* mapped */
+	size_t data_size;
+	size_t tlv_size;
+	struct rocker_desc *desc;
+	DEFINE_DMA_UNMAP_ADDR(mapaddr);
+};
+
+struct rocker_dma_ring_info {
+	size_t size;
+	u32 head;
+	u32 tail;
+	struct rocker_desc *desc; /* mapped */
+	dma_addr_t mapaddr;
+	struct rocker_desc_info *desc_info;
+	unsigned int type;
+};
+
+struct rocker;
+
+struct rocker_port {
+	struct net_device *dev;
+	struct rocker *rocker;
+	unsigned int port_number;
+	u32 lport;
+	struct napi_struct napi_tx;
+	struct napi_struct napi_rx;
+	struct rocker_dma_ring_info tx_ring;
+	struct rocker_dma_ring_info rx_ring;
+};
+
+struct rocker {
+	struct pci_dev *pdev;
+	u8 __iomem *hw_addr;
+	struct msix_entry *msix_entries;
+	unsigned int port_count;
+	struct rocker_port **ports;
+	struct {
+		u64 id;
+	} hw;
+	spinlock_t cmd_ring_lock;
+	struct rocker_dma_ring_info cmd_ring;
+	struct rocker_dma_ring_info event_ring;
+};
+
+struct rocker_wait {
+	wait_queue_head_t wait;
+	bool done;
+	bool nowait;
+};
+
+static void rocker_wait_reset(struct rocker_wait *wait)
+{
+	wait->done = false;
+	wait->nowait = false;
+}
+
+static void rocker_wait_init(struct rocker_wait *wait)
+{
+	init_waitqueue_head(&wait->wait);
+	rocker_wait_reset(wait);
+}
+
+static struct rocker_wait *rocker_wait_create(gfp_t gfp)
+{
+	struct rocker_wait *wait;
+
+	wait = kmalloc(sizeof(*wait), gfp);
+	if (!wait)
+		return NULL;
+	rocker_wait_init(wait);
+	return wait;
+}
+
+static void rocker_wait_destroy(struct rocker_wait *work)
+{
+	kfree(work);
+}
+
+static bool rocker_wait_event_timeout(struct rocker_wait *wait,
+				      unsigned long timeout)
+{
+	wait_event_timeout(wait->wait, wait->done, HZ / 10);
+	if (!wait->done)
+		return false;
+	return true;
+}
+
+static void rocker_wait_wake_up(struct rocker_wait *wait)
+{
+	wait->done = true;
+	wake_up(&wait->wait);
+}
+
+static u32 rocker_msix_vector(struct rocker *rocker, unsigned int vector)
+{
+	return rocker->msix_entries[vector].vector;
+}
+
+static u32 rocker_msix_tx_vector(struct rocker_port *rocker_port)
+{
+	return rocker_msix_vector(rocker_port->rocker,
+				  ROCKER_MSIX_VEC_TX(rocker_port->port_number));
+}
+
+static u32 rocker_msix_rx_vector(struct rocker_port *rocker_port)
+{
+	return rocker_msix_vector(rocker_port->rocker,
+				  ROCKER_MSIX_VEC_RX(rocker_port->port_number));
+}
+
+#define rocker_write32(rocker, reg, val)	\
+	writel((val), (rocker)->hw_addr + (ROCKER_ ## reg))
+#define rocker_read32(rocker, reg)	\
+	readl((rocker)->hw_addr + (ROCKER_ ## reg))
+#define rocker_write64(rocker, reg, val)	\
+	writeq((val), (rocker)->hw_addr + (ROCKER_ ## reg))
+#define rocker_read64(rocker, reg)	\
+	readq((rocker)->hw_addr + (ROCKER_ ## reg))
+
+/*****************************
+ * HW basic testing functions
+ *****************************/
+
+static int rocker_reg_test(struct rocker *rocker)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	u64 test_reg;
+	u64 rnd;
+
+	rnd = prandom_u32();
+	rnd >>= 1;
+	rocker_write32(rocker, TEST_REG, rnd);
+	test_reg = rocker_read32(rocker, TEST_REG);
+	if (test_reg != rnd * 2) {
+		dev_err(&pdev->dev, "unexpected 32bit register value %08llx, expected %08llx\n",
+			test_reg, rnd * 2);
+		return -EIO;
+	}
+
+	rnd = prandom_u32();
+	rnd <<= 31;
+	rnd |= prandom_u32();
+	rocker_write64(rocker, TEST_REG64, rnd);
+	test_reg = rocker_read64(rocker, TEST_REG64);
+	if (test_reg != rnd * 2) {
+		dev_err(&pdev->dev, "unexpected 64bit register value %16llx, expected %16llx\n",
+			test_reg, rnd * 2);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static int rocker_dma_test_one(struct rocker *rocker, struct rocker_wait *wait,
+			       u32 test_type, dma_addr_t dma_handle,
+			       unsigned char *buf, unsigned char *expect,
+			       size_t size)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	int i;
+
+	rocker_wait_reset(wait);
+	rocker_write32(rocker, TEST_DMA_CTRL, test_type);
+
+	if (!rocker_wait_event_timeout(wait, HZ / 10)) {
+		dev_err(&pdev->dev, "no interrupt received within a timeout\n");
+		return -EIO;
+	}
+
+	for (i = 0; i < size; i++) {
+		if (buf[i] != expect[i]) {
+			dev_err(&pdev->dev, "unexpected memory content %02x at byte %x\n, %02x expected",
+				buf[i], i, expect[i]);
+			return -EIO;
+		}
+	}
+	return 0;
+}
+
+#define ROCKER_TEST_DMA_BUF_SIZE (PAGE_SIZE * 4)
+#define ROCKER_TEST_DMA_FILL_PATTERN 0x96
+
+static int rocker_dma_test_offset(struct rocker *rocker,
+				  struct rocker_wait *wait, int offset)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	unsigned char *alloc;
+	unsigned char *buf;
+	unsigned char *expect;
+	dma_addr_t dma_handle;
+	int i;
+	int err;
+
+	alloc = kzalloc(ROCKER_TEST_DMA_BUF_SIZE * 2 + offset,
+			GFP_KERNEL | GFP_DMA);
+	if (!alloc)
+		return -ENOMEM;
+	buf = alloc + offset;
+	expect = buf + ROCKER_TEST_DMA_BUF_SIZE;
+
+	dma_handle = pci_map_single(pdev, buf, ROCKER_TEST_DMA_BUF_SIZE,
+				    PCI_DMA_BIDIRECTIONAL);
+	if (pci_dma_mapping_error(pdev, dma_handle)) {
+		err = -EIO;
+		goto free_alloc;
+	}
+
+	rocker_write64(rocker, TEST_DMA_ADDR, dma_handle);
+	rocker_write32(rocker, TEST_DMA_SIZE, ROCKER_TEST_DMA_BUF_SIZE);
+
+	memset(expect, ROCKER_TEST_DMA_FILL_PATTERN, ROCKER_TEST_DMA_BUF_SIZE);
+	err = rocker_dma_test_one(rocker, wait, ROCKER_TEST_DMA_CTRL_FILL,
+				  dma_handle, buf, expect,
+				  ROCKER_TEST_DMA_BUF_SIZE);
+	if (err)
+		goto unmap;
+
+	memset(expect, 0, ROCKER_TEST_DMA_BUF_SIZE);
+	err = rocker_dma_test_one(rocker, wait, ROCKER_TEST_DMA_CTRL_CLEAR,
+				  dma_handle, buf, expect,
+				  ROCKER_TEST_DMA_BUF_SIZE);
+	if (err)
+		goto unmap;
+
+	prandom_bytes(buf, ROCKER_TEST_DMA_BUF_SIZE);
+	for (i = 0; i < ROCKER_TEST_DMA_BUF_SIZE; i++)
+		expect[i] = ~buf[i];
+	err = rocker_dma_test_one(rocker, wait, ROCKER_TEST_DMA_CTRL_INVERT,
+				  dma_handle, buf, expect,
+				  ROCKER_TEST_DMA_BUF_SIZE);
+	if (err)
+		goto unmap;
+
+unmap:
+	pci_unmap_single(pdev, dma_handle, ROCKER_TEST_DMA_BUF_SIZE,
+			 PCI_DMA_BIDIRECTIONAL);
+free_alloc:
+	kfree(alloc);
+
+	return err;
+}
+
+static int rocker_dma_test(struct rocker *rocker, struct rocker_wait *wait)
+{
+	int i;
+	int err;
+
+	for (i = 0; i < 8; i++) {
+		err = rocker_dma_test_offset(rocker, wait, i);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+static irqreturn_t rocker_test_irq_handler(int irq, void *dev_id)
+{
+	struct rocker_wait *wait = dev_id;
+
+	rocker_wait_wake_up(wait);
+
+	return IRQ_HANDLED;
+}
+
+static int rocker_basic_hw_test(struct rocker *rocker)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	struct rocker_wait wait;
+	int err;
+
+	err = rocker_reg_test(rocker);
+	if (err) {
+		dev_err(&pdev->dev, "reg test failed\n");
+		return err;
+	}
+
+	err = request_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_TEST),
+			  rocker_test_irq_handler, 0,
+			  rocker_driver_name, &wait);
+	if (err) {
+		dev_err(&pdev->dev, "cannot assign test irq\n");
+		return err;
+	}
+
+	rocker_wait_init(&wait);
+	rocker_write32(rocker, TEST_IRQ, ROCKER_MSIX_VEC_TEST);
+
+	if (!rocker_wait_event_timeout(&wait, HZ / 10)) {
+		dev_err(&pdev->dev, "no interrupt received within a timeout\n");
+		err = -EIO;
+		goto free_irq;
+	}
+
+	err = rocker_dma_test(rocker, &wait);
+	if (err)
+		dev_err(&pdev->dev, "dma test failed\n");
+
+free_irq:
+	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_TEST), &wait);
+	return err;
+}
+
+/******
+ * TLV
+ ******/
+
+#define ROCKER_TLV_ALIGNTO 8U
+#define ROCKER_TLV_ALIGN(len) \
+	(((len) + ROCKER_TLV_ALIGNTO - 1) & ~(ROCKER_TLV_ALIGNTO - 1))
+#define ROCKER_TLV_HDRLEN ROCKER_TLV_ALIGN(sizeof(struct rocker_tlv))
+
+/*  <------- ROCKER_TLV_HDRLEN -------> <--- ROCKER_TLV_ALIGN(payload) --->
+ * +-----------------------------+- - -+- - - - - - - - - - - - - - -+- - -+
+ * |             Header          | Pad |           Payload           | Pad |
+ * |      (struct rocker_tlv)    | ing |                             | ing |
+ * +-----------------------------+- - -+- - - - - - - - - - - - - - -+- - -+
+ *  <--------------------------- tlv->len -------------------------->
+ */
+
+static struct rocker_tlv *rocker_tlv_next(const struct rocker_tlv *tlv,
+					  int *remaining)
+{
+	int totlen = ROCKER_TLV_ALIGN(tlv->len);
+
+	*remaining -= totlen;
+	return (struct rocker_tlv *) ((char *) tlv + totlen);
+}
+
+static int rocker_tlv_ok(const struct rocker_tlv *tlv, int remaining)
+{
+	return remaining >= (int) ROCKER_TLV_HDRLEN &&
+	       tlv->len >= ROCKER_TLV_HDRLEN &&
+	       tlv->len <= remaining;
+}
+
+#define rocker_tlv_for_each(pos, head, len, rem)	\
+	for (pos = head, rem = len;			\
+	     rocker_tlv_ok(pos, rem);			\
+	     pos = rocker_tlv_next(pos, &(rem)))
+
+#define rocker_tlv_for_each_nested(pos, tlv, rem)	\
+	rocker_tlv_for_each(pos, rocker_tlv_data(tlv),	\
+			    rocker_tlv_len(tlv), rem)
+
+static int rocker_tlv_attr_size(int payload)
+{
+	return ROCKER_TLV_HDRLEN + payload;
+}
+
+static int rocker_tlv_total_size(int payload)
+{
+	return ROCKER_TLV_ALIGN(rocker_tlv_attr_size(payload));
+}
+
+static int rocker_tlv_padlen(int payload)
+{
+	return rocker_tlv_total_size(payload) - rocker_tlv_attr_size(payload);
+}
+
+static int rocker_tlv_type(const struct rocker_tlv *tlv)
+{
+	return tlv->type;
+}
+
+static void *rocker_tlv_data(const struct rocker_tlv *tlv)
+{
+	return (char *) tlv + ROCKER_TLV_HDRLEN;
+}
+
+static int rocker_tlv_len(const struct rocker_tlv *tlv)
+{
+	return tlv->len - ROCKER_TLV_HDRLEN;
+}
+
+static u8 rocker_tlv_get_u8(const struct rocker_tlv *tlv)
+{
+	return *(u8 *) rocker_tlv_data(tlv);
+}
+
+static u16 rocker_tlv_get_u16(const struct rocker_tlv *tlv)
+{
+	return *(u16 *) rocker_tlv_data(tlv);
+}
+
+static u32 rocker_tlv_get_u32(const struct rocker_tlv *tlv)
+{
+	return *(u32 *) rocker_tlv_data(tlv);
+}
+
+static u64 rocker_tlv_get_u64(const struct rocker_tlv *tlv)
+{
+	return *(u64 *) rocker_tlv_data(tlv);
+}
+
+static void rocker_tlv_parse(struct rocker_tlv **tb, int maxtype,
+			     const char *buf, int buf_len)
+{
+	const struct rocker_tlv *tlv;
+	const struct rocker_tlv *head = (const struct rocker_tlv *) buf;
+	int rem;
+
+	memset(tb, 0, sizeof(struct rocker_tlv *) * (maxtype + 1));
+
+	rocker_tlv_for_each(tlv, head, buf_len, rem) {
+		u32 type = rocker_tlv_type(tlv);
+
+		if (type > 0 && type <= maxtype)
+			tb[type] = (struct rocker_tlv *) tlv;
+	}
+}
+
+static void rocker_tlv_parse_nested(struct rocker_tlv **tb, int maxtype,
+				    const struct rocker_tlv *tlv)
+{
+	rocker_tlv_parse(tb, maxtype, rocker_tlv_data(tlv),
+			 rocker_tlv_len(tlv));
+}
+
+static void rocker_tlv_parse_desc(struct rocker_tlv **tb, int maxtype,
+				  struct rocker_desc_info *desc_info)
+{
+	rocker_tlv_parse(tb, maxtype, desc_info->data,
+			 desc_info->desc->tlv_size);
+}
+
+static struct rocker_tlv *rocker_tlv_start(struct rocker_desc_info *desc_info)
+{
+	return (struct rocker_tlv *) ((char *) desc_info->data +
+					       desc_info->tlv_size);
+}
+
+static int rocker_tlv_put(struct rocker_desc_info *desc_info,
+			  int attrtype, int attrlen, const void *data)
+{
+	int tail_room = desc_info->data_size - desc_info->tlv_size;
+	int total_size = rocker_tlv_total_size(attrlen);
+	struct rocker_tlv *tlv;
+
+	if (unlikely(tail_room < total_size))
+		return -EMSGSIZE;
+
+	tlv = rocker_tlv_start(desc_info);
+	desc_info->tlv_size += total_size;
+	tlv->type = attrtype;
+	tlv->len = rocker_tlv_attr_size(attrlen);
+	memcpy(rocker_tlv_data(tlv), data, attrlen);
+	memset((char *) tlv + tlv->len, 0, rocker_tlv_padlen(attrlen));
+	return 0;
+}
+
+static int rocker_tlv_put_u8(struct rocker_desc_info *desc_info,
+			     int attrtype, u8 value)
+{
+	return rocker_tlv_put(desc_info, attrtype, sizeof(u8), &value);
+}
+
+static int rocker_tlv_put_u16(struct rocker_desc_info *desc_info,
+			      int attrtype, u16 value)
+{
+	return rocker_tlv_put(desc_info, attrtype, sizeof(u16), &value);
+}
+
+static int rocker_tlv_put_u32(struct rocker_desc_info *desc_info,
+			      int attrtype, u32 value)
+{
+	return rocker_tlv_put(desc_info, attrtype, sizeof(u32), &value);
+}
+
+static int rocker_tlv_put_u64(struct rocker_desc_info *desc_info,
+			      int attrtype, u64 value)
+{
+	return rocker_tlv_put(desc_info, attrtype, sizeof(u64), &value);
+}
+
+static struct rocker_tlv *
+rocker_tlv_nest_start(struct rocker_desc_info *desc_info, int attrtype)
+{
+	struct rocker_tlv *start = rocker_tlv_start(desc_info);
+
+	if (rocker_tlv_put(desc_info, attrtype, 0, NULL) < 0)
+		return NULL;
+
+	return start;
+}
+
+static void rocker_tlv_nest_end(struct rocker_desc_info *desc_info,
+				struct rocker_tlv *start)
+{
+	start->len = (char *) rocker_tlv_start(desc_info) - (char *) start;
+}
+
+static void rocker_tlv_nest_cancel(struct rocker_desc_info *desc_info,
+				   struct rocker_tlv *start)
+{
+	desc_info->tlv_size = (char *) start - desc_info->data;
+}
+
+/******************************************
+ * DMA rings and descriptors manipulations
+ ******************************************/
+
+static u32 __pos_inc(u32 pos, size_t limit)
+{
+	return ++pos == limit ? 0 : pos;
+}
+
+static int rocker_desc_err(struct rocker_desc_info *desc_info)
+{
+	return -(desc_info->desc->comp_err & ~ROCKER_DMA_DESC_COMP_ERR_GEN);
+}
+
+static void rocker_desc_gen_clear(struct rocker_desc_info *desc_info)
+{
+	desc_info->desc->comp_err &= ~ROCKER_DMA_DESC_COMP_ERR_GEN;
+}
+
+static bool rocker_desc_gen(struct rocker_desc_info *desc_info)
+{
+	u32 comp_err = desc_info->desc->comp_err;
+
+	return comp_err & ROCKER_DMA_DESC_COMP_ERR_GEN ? true : false;
+}
+
+static void *rocker_desc_cookie_ptr_get(struct rocker_desc_info *desc_info)
+{
+	return (void *) desc_info->desc->cookie;
+}
+
+static void rocker_desc_cookie_ptr_set(struct rocker_desc_info *desc_info,
+				       void *ptr)
+{
+	desc_info->desc->cookie = (long) ptr;
+}
+
+static struct rocker_desc_info *
+rocker_desc_head_get(struct rocker_dma_ring_info *info)
+{
+	static struct rocker_desc_info *desc_info;
+	u32 head = __pos_inc(info->head, info->size);
+
+	desc_info = &info->desc_info[info->head];
+	if (head == info->tail)
+		return NULL; /* ring full */
+	desc_info->tlv_size = 0;
+	return desc_info;
+}
+
+static void rocker_desc_commit(struct rocker_desc_info *desc_info)
+{
+	desc_info->desc->buf_size = desc_info->data_size;
+	desc_info->desc->tlv_size = desc_info->tlv_size;
+}
+
+static void rocker_desc_head_set(struct rocker *rocker,
+				 struct rocker_dma_ring_info *info,
+				 struct rocker_desc_info *desc_info)
+{
+	u32 head = __pos_inc(info->head, info->size);
+
+	BUG_ON(head == info->tail);
+	rocker_desc_commit(desc_info);
+	info->head = head;
+	rocker_write32(rocker, DMA_DESC_HEAD(info->type), head);
+}
+
+static struct rocker_desc_info *
+rocker_desc_tail_get(struct rocker_dma_ring_info *info)
+{
+	static struct rocker_desc_info *desc_info;
+
+	if (info->tail == info->head)
+		return NULL; /* nothing to be done between head and tail */
+	desc_info = &info->desc_info[info->tail];
+	if (!rocker_desc_gen(desc_info))
+		return NULL; /* gen bit not set, desc is not ready yet */
+	info->tail = __pos_inc(info->tail, info->size);
+	desc_info->tlv_size = desc_info->desc->tlv_size;
+	return desc_info;
+}
+
+static void rocker_dma_ring_credits_set(struct rocker *rocker,
+					struct rocker_dma_ring_info *info,
+					u32 credits)
+{
+	if (credits)
+		rocker_write32(rocker, DMA_DESC_CREDITS(info->type), credits);
+}
+
+static unsigned long rocker_dma_ring_size_fix(size_t size)
+{
+	return max(ROCKER_DMA_SIZE_MIN,
+		   min(roundup_pow_of_two(size), ROCKER_DMA_SIZE_MAX));
+}
+
+static int rocker_dma_ring_create(struct rocker *rocker,
+				  unsigned int type,
+				  size_t size,
+				  struct rocker_dma_ring_info *info)
+{
+	int i;
+
+	BUG_ON(size != rocker_dma_ring_size_fix(size));
+	info->size = size;
+	info->type = type;
+	info->head = 0;
+	info->tail = 0;
+	info->desc_info = kcalloc(info->size, sizeof(*info->desc_info),
+				  GFP_KERNEL);
+	if (!info->desc_info)
+		return -ENOMEM;
+
+	info->desc = pci_alloc_consistent(rocker->pdev,
+					  info->size * sizeof(*info->desc),
+					  &info->mapaddr);
+	if (!info->desc) {
+		kfree(info->desc_info);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < info->size; i++)
+		info->desc_info[i].desc = &info->desc[i];
+
+	rocker_write32(rocker, DMA_DESC_CTRL(info->type),
+		       ROCKER_DMA_DESC_CTRL_RESET);
+	rocker_write64(rocker, DMA_DESC_ADDR(info->type), info->mapaddr);
+	rocker_write32(rocker, DMA_DESC_SIZE(info->type), info->size);
+
+	return 0;
+}
+
+static void rocker_dma_ring_destroy(struct rocker *rocker,
+				    struct rocker_dma_ring_info *info)
+{
+	rocker_write64(rocker, DMA_DESC_ADDR(info->type), 0);
+
+	pci_free_consistent(rocker->pdev,
+			    info->size * sizeof(struct rocker_desc),
+			    info->desc, info->mapaddr);
+	kfree(info->desc_info);
+}
+
+static void rocker_dma_ring_pass_to_producer(struct rocker *rocker,
+					     struct rocker_dma_ring_info *info)
+{
+	int i;
+
+	BUG_ON(info->head || info->tail);
+
+	/* When ring is consumer, we need to advance head for each desc.
+	 * That tells hw that the desc is ready to be used by it.
+	 */
+	for (i = 0; i < info->size - 1; i++)
+		rocker_desc_head_set(rocker, info, &info->desc_info[i]);
+	rocker_desc_commit(&info->desc_info[i]);
+}
+
+static int rocker_dma_ring_bufs_alloc(struct rocker *rocker,
+				      struct rocker_dma_ring_info *info,
+				      int direction, size_t buf_size)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	int i;
+	int err;
+
+	for (i = 0; i < info->size; i++) {
+		struct rocker_desc_info *desc_info = &info->desc_info[i];
+		struct rocker_desc *desc = &info->desc[i];
+		dma_addr_t dma_handle;
+		char *buf;
+
+		buf = kzalloc(buf_size, GFP_KERNEL | GFP_DMA);
+		if (!buf) {
+			err = -ENOMEM;
+			goto rollback;
+		}
+
+		dma_handle = pci_map_single(pdev, buf, buf_size, direction);
+		if (pci_dma_mapping_error(pdev, dma_handle)) {
+			kfree(buf);
+			err = -EIO;
+			goto rollback;
+		}
+
+		desc_info->data = buf;
+		desc_info->data_size = buf_size;
+		dma_unmap_addr_set(desc_info, mapaddr, dma_handle);
+
+		desc->buf_addr = dma_handle;
+		desc->buf_size = buf_size;
+	}
+	return 0;
+
+rollback:
+	for (i--; i >= 0; i--) {
+		struct rocker_desc_info *desc_info = &info->desc_info[i];
+
+		pci_unmap_single(pdev, dma_unmap_addr(desc_info, mapaddr),
+				 desc_info->data_size, direction);
+		kfree(desc_info->data);
+	}
+	return err;
+}
+
+static void rocker_dma_ring_bufs_free(struct rocker *rocker,
+				      struct rocker_dma_ring_info *info,
+				      int direction)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	int i;
+
+	for (i = 0; i < info->size; i++) {
+		struct rocker_desc_info *desc_info = &info->desc_info[i];
+		struct rocker_desc *desc = &info->desc[i];
+
+		desc->buf_addr = 0;
+		desc->buf_size = 0;
+		pci_unmap_single(pdev, dma_unmap_addr(desc_info, mapaddr),
+				 desc_info->data_size, direction);
+		kfree(desc_info->data);
+	}
+}
+
+static int rocker_dma_rings_init(struct rocker *rocker)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	int err;
+
+	err = rocker_dma_ring_create(rocker, ROCKER_DMA_CMD,
+				     ROCKER_DMA_CMD_DEFAULT_SIZE,
+				     &rocker->cmd_ring);
+	if (err) {
+		dev_err(&pdev->dev, "failed to create command dma ring\n");
+		return err;
+	}
+
+	spin_lock_init(&rocker->cmd_ring_lock);
+
+	err = rocker_dma_ring_bufs_alloc(rocker, &rocker->cmd_ring,
+					 PCI_DMA_BIDIRECTIONAL, PAGE_SIZE);
+	if (err) {
+		dev_err(&pdev->dev, "failed to alloc command dma ring buffers\n");
+		goto err_dma_cmd_ring_bufs_alloc;
+	}
+
+	err = rocker_dma_ring_create(rocker, ROCKER_DMA_EVENT,
+				     ROCKER_DMA_EVENT_DEFAULT_SIZE,
+				     &rocker->event_ring);
+	if (err) {
+		dev_err(&pdev->dev, "failed to create event dma ring\n");
+		goto err_dma_event_ring_create;
+	}
+
+	err = rocker_dma_ring_bufs_alloc(rocker, &rocker->event_ring,
+					 PCI_DMA_FROMDEVICE, PAGE_SIZE);
+	if (err) {
+		dev_err(&pdev->dev, "failed to alloc event dma ring buffers\n");
+		goto err_dma_event_ring_bufs_alloc;
+	}
+	rocker_dma_ring_pass_to_producer(rocker, &rocker->event_ring);
+	return 0;
+
+err_dma_event_ring_bufs_alloc:
+	rocker_dma_ring_destroy(rocker, &rocker->event_ring);
+err_dma_event_ring_create:
+	rocker_dma_ring_bufs_free(rocker, &rocker->cmd_ring,
+				  PCI_DMA_BIDIRECTIONAL);
+err_dma_cmd_ring_bufs_alloc:
+	rocker_dma_ring_destroy(rocker, &rocker->cmd_ring);
+	return err;
+}
+
+static void rocker_dma_rings_fini(struct rocker *rocker)
+{
+	rocker_dma_ring_bufs_free(rocker, &rocker->event_ring,
+				  PCI_DMA_BIDIRECTIONAL);
+	rocker_dma_ring_destroy(rocker, &rocker->event_ring);
+	rocker_dma_ring_bufs_free(rocker, &rocker->cmd_ring,
+				  PCI_DMA_BIDIRECTIONAL);
+	rocker_dma_ring_destroy(rocker, &rocker->cmd_ring);
+}
+
+static int rocker_dma_rx_ring_skb_map(struct rocker *rocker,
+				      struct rocker_port *rocker_port,
+				      struct rocker_desc_info *desc_info,
+				      struct sk_buff *skb, size_t buf_len)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	dma_addr_t dma_handle;
+
+	dma_handle = pci_map_single(pdev, skb->data, buf_len,
+				    PCI_DMA_FROMDEVICE);
+	if (pci_dma_mapping_error(pdev, dma_handle))
+		return -EIO;
+	if (rocker_tlv_put_u64(desc_info, ROCKER_TLV_RX_FRAG_ADDR, dma_handle))
+		goto tlv_put_failure;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_RX_FRAG_MAX_LEN, buf_len))
+		goto tlv_put_failure;
+	return 0;
+
+tlv_put_failure:
+	pci_unmap_single(pdev, dma_handle, buf_len, PCI_DMA_FROMDEVICE);
+	desc_info->tlv_size = 0;
+	return -EMSGSIZE;
+}
+
+static size_t rocker_port_rx_buf_len(struct rocker_port *rocker_port)
+{
+	return rocker_port->dev->mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
+}
+
+static int rocker_dma_rx_ring_skb_alloc(struct rocker *rocker,
+					struct rocker_port *rocker_port,
+					struct rocker_desc_info *desc_info)
+{
+	struct net_device *dev = rocker_port->dev;
+	struct sk_buff *skb;
+	size_t buf_len = rocker_port_rx_buf_len(rocker_port);
+	int err;
+
+	/* Ensure that hw will see tlv_size zero in case of an error.
+	 * That tells hw to use another descriptor.
+	 */
+	rocker_desc_cookie_ptr_set(desc_info, NULL);
+	desc_info->tlv_size = 0;
+
+	skb = netdev_alloc_skb_ip_align(dev, buf_len);
+	if (!skb)
+		return -ENOMEM;
+	err = rocker_dma_rx_ring_skb_map(rocker, rocker_port, desc_info,
+					 skb, buf_len);
+	if (err) {
+		dev_kfree_skb_any(skb);
+		return err;
+	}
+	rocker_desc_cookie_ptr_set(desc_info, skb);
+	return 0;
+}
+
+static void rocker_dma_rx_ring_skb_unmap(struct rocker *rocker,
+					 struct rocker_tlv **attrs)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	dma_addr_t dma_handle;
+	size_t len;
+
+	if (!attrs[ROCKER_TLV_RX_FRAG_ADDR] ||
+	    !attrs[ROCKER_TLV_RX_FRAG_MAX_LEN])
+		return;
+	dma_handle = rocker_tlv_get_u64(attrs[ROCKER_TLV_RX_FRAG_ADDR]);
+	len = rocker_tlv_get_u16(attrs[ROCKER_TLV_RX_FRAG_MAX_LEN]);
+	pci_unmap_single(pdev, dma_handle, len, PCI_DMA_FROMDEVICE);
+}
+
+static void rocker_dma_rx_ring_skb_free(struct rocker *rocker,
+					struct rocker_desc_info *desc_info)
+{
+	struct rocker_tlv *attrs[ROCKER_TLV_RX_MAX + 1];
+	struct sk_buff *skb = rocker_desc_cookie_ptr_get(desc_info);
+
+	if (!skb)
+		return;
+	rocker_tlv_parse_desc(attrs, ROCKER_TLV_RX_MAX, desc_info);
+	rocker_dma_rx_ring_skb_unmap(rocker, attrs);
+	dev_kfree_skb_any(skb);
+}
+
+static int rocker_dma_rx_ring_skbs_alloc(struct rocker *rocker,
+					 struct rocker_port *rocker_port)
+{
+	struct rocker_dma_ring_info *rx_ring = &rocker_port->rx_ring;
+	int i;
+	int err;
+
+	for (i = 0; i < rx_ring->size; i++) {
+		err = rocker_dma_rx_ring_skb_alloc(rocker, rocker_port,
+						   &rx_ring->desc_info[i]);
+		if (err)
+			goto rollback;
+	}
+	return 0;
+
+rollback:
+	for (i--; i >= 0; i--)
+		rocker_dma_rx_ring_skb_free(rocker, &rx_ring->desc_info[i]);
+	return err;
+}
+
+static void rocker_dma_rx_ring_skbs_free(struct rocker *rocker,
+					 struct rocker_port *rocker_port)
+{
+	struct rocker_dma_ring_info *rx_ring = &rocker_port->rx_ring;
+	int i;
+
+	for (i = 0; i < rx_ring->size; i++)
+		rocker_dma_rx_ring_skb_free(rocker, &rx_ring->desc_info[i]);
+}
+
+static int rocker_port_dma_rings_init(struct rocker_port *rocker_port)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	int err;
+
+	err = rocker_dma_ring_create(rocker,
+				     ROCKER_DMA_TX(rocker_port->port_number),
+				     ROCKER_DMA_TX_DEFAULT_SIZE,
+				     &rocker_port->tx_ring);
+	if (err) {
+		netdev_err(rocker_port->dev, "failed to create tx dma ring\n");
+		return err;
+	}
+
+	err = rocker_dma_ring_bufs_alloc(rocker, &rocker_port->tx_ring,
+					 PCI_DMA_TODEVICE,
+					 ROCKER_DMA_TX_DESC_SIZE);
+	if (err) {
+		netdev_err(rocker_port->dev, "failed to alloc tx dma ring buffers\n");
+		goto err_dma_tx_ring_bufs_alloc;
+	}
+
+	err = rocker_dma_ring_create(rocker,
+				     ROCKER_DMA_RX(rocker_port->port_number),
+				     ROCKER_DMA_RX_DEFAULT_SIZE,
+				     &rocker_port->rx_ring);
+	if (err) {
+		netdev_err(rocker_port->dev, "failed to create rx dma ring\n");
+		goto err_dma_rx_ring_create;
+	}
+
+	err = rocker_dma_ring_bufs_alloc(rocker, &rocker_port->rx_ring,
+					 PCI_DMA_BIDIRECTIONAL,
+					 ROCKER_DMA_RX_DESC_SIZE);
+	if (err) {
+		netdev_err(rocker_port->dev, "failed to alloc rx dma ring buffers\n");
+		goto err_dma_rx_ring_bufs_alloc;
+	}
+
+	err = rocker_dma_rx_ring_skbs_alloc(rocker, rocker_port);
+	if (err) {
+		netdev_err(rocker_port->dev, "failed to alloc rx dma ring skbs\n");
+		goto err_dma_rx_ring_skbs_alloc;
+	}
+	rocker_dma_ring_pass_to_producer(rocker, &rocker_port->rx_ring);
+
+	return 0;
+
+err_dma_rx_ring_skbs_alloc:
+	rocker_dma_ring_bufs_free(rocker, &rocker_port->rx_ring,
+				  PCI_DMA_BIDIRECTIONAL);
+err_dma_rx_ring_bufs_alloc:
+	rocker_dma_ring_destroy(rocker, &rocker_port->rx_ring);
+err_dma_rx_ring_create:
+	rocker_dma_ring_bufs_free(rocker, &rocker_port->tx_ring,
+				  PCI_DMA_TODEVICE);
+err_dma_tx_ring_bufs_alloc:
+	rocker_dma_ring_destroy(rocker, &rocker_port->tx_ring);
+	return err;
+}
+
+static void rocker_port_dma_rings_fini(struct rocker_port *rocker_port)
+{
+	struct rocker *rocker = rocker_port->rocker;
+
+	rocker_dma_rx_ring_skbs_free(rocker, rocker_port);
+	rocker_dma_ring_bufs_free(rocker, &rocker_port->rx_ring,
+				  PCI_DMA_BIDIRECTIONAL);
+	rocker_dma_ring_destroy(rocker, &rocker_port->rx_ring);
+	rocker_dma_ring_bufs_free(rocker, &rocker_port->tx_ring,
+				  PCI_DMA_TODEVICE);
+	rocker_dma_ring_destroy(rocker, &rocker_port->tx_ring);
+}
+
+static void rocker_port_set_enable(struct rocker_port *rocker_port, bool enable)
+{
+	u64 val = rocker_read64(rocker_port->rocker, PORT_PHYS_ENABLE);
+
+	if (enable)
+		val |= 1 << rocker_port->lport;
+	else
+		val &= ~(1 << rocker_port->lport);
+	rocker_write64(rocker_port->rocker, PORT_PHYS_ENABLE, val);
+}
+
+/********************************
+ * Interrupt handler and helpers
+ ********************************/
+
+static irqreturn_t rocker_cmd_irq_handler(int irq, void *dev_id)
+{
+	struct rocker *rocker = dev_id;
+	struct rocker_desc_info *desc_info;
+	struct rocker_wait *wait;
+	u32 credits = 0;
+
+	spin_lock(&rocker->cmd_ring_lock);
+	while ((desc_info = rocker_desc_tail_get(&rocker->cmd_ring))) {
+		wait = rocker_desc_cookie_ptr_get(desc_info);
+		if (wait->nowait) {
+			rocker_desc_gen_clear(desc_info);
+			rocker_wait_destroy(wait);
+		} else {
+			rocker_wait_wake_up(wait);
+		}
+		credits++;
+	}
+	spin_unlock(&rocker->cmd_ring_lock);
+	rocker_dma_ring_credits_set(rocker, &rocker->cmd_ring, credits);
+
+	return IRQ_HANDLED;
+}
+
+static void rocker_port_link_up(struct rocker_port *rocker_port)
+{
+	netif_carrier_on(rocker_port->dev);
+	netdev_info(rocker_port->dev, "Link is up\n");
+}
+
+static void rocker_port_link_down(struct rocker_port *rocker_port)
+{
+	netif_carrier_off(rocker_port->dev);
+	netdev_info(rocker_port->dev, "Link is down\n");
+}
+
+static int rocker_event_link_change(struct rocker *rocker,
+				    const struct rocker_tlv *info)
+{
+	struct rocker_tlv *attrs[ROCKER_TLV_EVENT_LINK_CHANGED_MAX + 1];
+	unsigned int port_number;
+	bool link_up;
+	struct rocker_port *rocker_port;
+
+	rocker_tlv_parse_nested(attrs, ROCKER_TLV_EVENT_LINK_CHANGED_MAX, info);
+	if (!attrs[ROCKER_TLV_EVENT_LINK_CHANGED_LPORT] ||
+	    !attrs[ROCKER_TLV_EVENT_LINK_CHANGED_LINKUP])
+		return -EIO;
+	port_number =
+		rocker_tlv_get_u32(attrs[ROCKER_TLV_EVENT_LINK_CHANGED_LPORT]) - 1;
+	link_up = rocker_tlv_get_u8(attrs[ROCKER_TLV_EVENT_LINK_CHANGED_LINKUP]);
+
+	if (port_number >= rocker->port_count)
+		return -EINVAL;
+
+	rocker_port = rocker->ports[port_number];
+	if (netif_carrier_ok(rocker_port->dev) != link_up) {
+		if (link_up)
+			rocker_port_link_up(rocker_port);
+		else
+			rocker_port_link_down(rocker_port);
+	}
+
+	return 0;
+}
+
+static int rocker_event_process(struct rocker *rocker,
+				struct rocker_desc_info *desc_info)
+{
+	struct rocker_tlv *attrs[ROCKER_TLV_EVENT_MAX + 1];
+	struct rocker_tlv *info;
+	u16 type;
+
+	rocker_tlv_parse_desc(attrs, ROCKER_TLV_EVENT_MAX, desc_info);
+	if (!attrs[ROCKER_TLV_EVENT_TYPE] ||
+	    !attrs[ROCKER_TLV_EVENT_INFO])
+		return -EIO;
+
+	type = rocker_tlv_get_u16(attrs[ROCKER_TLV_EVENT_TYPE]);
+	info = attrs[ROCKER_TLV_EVENT_INFO];
+
+	switch (type) {
+	case ROCKER_TLV_EVENT_TYPE_LINK_CHANGED:
+		return rocker_event_link_change(rocker, info);
+	}
+
+	return -EOPNOTSUPP;
+}
+
+static irqreturn_t rocker_event_irq_handler(int irq, void *dev_id)
+{
+	struct rocker *rocker = dev_id;
+	struct pci_dev *pdev = rocker->pdev;
+	struct rocker_desc_info *desc_info;
+	u32 credits = 0;
+	int err;
+
+	while ((desc_info = rocker_desc_tail_get(&rocker->event_ring))) {
+		err = rocker_desc_err(desc_info);
+		if (err) {
+			dev_err(&pdev->dev, "event desc received with err %d\n",
+				err);
+		} else {
+			err = rocker_event_process(rocker, desc_info);
+			if (err)
+				dev_err(&pdev->dev, "event processing failed with err %d\n",
+					err);
+		}
+		rocker_desc_gen_clear(desc_info);
+		rocker_desc_head_set(rocker, &rocker->event_ring, desc_info);
+		credits++;
+	}
+	rocker_dma_ring_credits_set(rocker, &rocker->event_ring, credits);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t rocker_tx_irq_handler(int irq, void *dev_id)
+{
+	struct rocker_port *rocker_port = dev_id;
+
+	napi_schedule(&rocker_port->napi_tx);
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t rocker_rx_irq_handler(int irq, void *dev_id)
+{
+	struct rocker_port *rocker_port = dev_id;
+
+	napi_schedule(&rocker_port->napi_rx);
+	return IRQ_HANDLED;
+}
+
+/********************
+ * Command interface
+ ********************/
+
+typedef int (*rocker_cmd_cb_t)(struct rocker *rocker,
+			       struct rocker_port *rocker_port,
+			       struct rocker_desc_info *desc_info,
+			       void *priv);
+
+static int rocker_cmd_exec(struct rocker *rocker,
+			   struct rocker_port *rocker_port,
+			   rocker_cmd_cb_t prepare, void *prepare_priv,
+			   rocker_cmd_cb_t process, void *process_priv,
+			   bool nowait)
+{
+	struct rocker_desc_info *desc_info;
+	struct rocker_wait *wait;
+	unsigned long flags;
+	int err;
+
+	wait = rocker_wait_create(nowait ? GFP_ATOMIC : GFP_KERNEL);
+	if (!wait)
+		return -ENOMEM;
+	wait->nowait = nowait;
+
+	spin_lock_irqsave(&rocker->cmd_ring_lock, flags);
+	desc_info = rocker_desc_head_get(&rocker->cmd_ring);
+	if (!desc_info) {
+		spin_unlock_irqrestore(&rocker->cmd_ring_lock, flags);
+		err = -EAGAIN;
+		goto out;
+	}
+	err = prepare(rocker, rocker_port, desc_info, prepare_priv);
+	if (err) {
+		spin_unlock_irqrestore(&rocker->cmd_ring_lock, flags);
+		goto out;
+	}
+	rocker_desc_cookie_ptr_set(desc_info, wait);
+	rocker_desc_head_set(rocker, &rocker->cmd_ring, desc_info);
+	spin_unlock_irqrestore(&rocker->cmd_ring_lock, flags);
+
+	if (nowait)
+		return 0;
+
+	if (!rocker_wait_event_timeout(wait, HZ / 10))
+		return -EIO;
+
+	err = rocker_desc_err(desc_info);
+	if (err)
+		return err;
+
+	if (process)
+		err = process(rocker, rocker_port, desc_info, process_priv);
+
+	rocker_desc_gen_clear(desc_info);
+out:
+	rocker_wait_destroy(wait);
+	return err;
+}
+
+static int
+rocker_cmd_get_port_settings_prep(struct rocker *rocker,
+				  struct rocker_port *rocker_port,
+				  struct rocker_desc_info *desc_info,
+				  void *priv)
+{
+	struct rocker_tlv *cmd_info;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE,
+			       ROCKER_TLV_CMD_TYPE_GET_PORT_SETTINGS))
+		return -EMSGSIZE;
+	cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO);
+	if (!cmd_info)
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_LPORT,
+			       rocker_port->lport))
+		return -EMSGSIZE;
+	rocker_tlv_nest_end(desc_info, cmd_info);
+	return 0;
+}
+
+static int
+rocker_cmd_get_port_settings_ethtool_proc(struct rocker *rocker,
+					  struct rocker_port *rocker_port,
+					  struct rocker_desc_info *desc_info,
+					  void *priv)
+{
+	struct ethtool_cmd *ecmd = priv;
+	struct rocker_tlv *attrs[ROCKER_TLV_CMD_MAX + 1];
+	struct rocker_tlv *info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_MAX + 1];
+	u32 speed;
+	u8 duplex;
+	u8 autoneg;
+
+	rocker_tlv_parse_desc(attrs, ROCKER_TLV_CMD_MAX, desc_info);
+	if (!attrs[ROCKER_TLV_CMD_INFO])
+		return -EIO;
+
+	rocker_tlv_parse_nested(info_attrs, ROCKER_TLV_CMD_PORT_SETTINGS_MAX,
+				attrs[ROCKER_TLV_CMD_INFO]);
+	if (!info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_SPEED] ||
+	    !info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_DUPLEX] ||
+	    !info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_AUTONEG])
+		return -EIO;
+
+	speed = rocker_tlv_get_u32(info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_SPEED]);
+	duplex = rocker_tlv_get_u8(info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_DUPLEX]);
+	autoneg = rocker_tlv_get_u8(info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_AUTONEG]);
+
+	ecmd->transceiver = XCVR_INTERNAL;
+	ecmd->supported = SUPPORTED_TP;
+	ecmd->phy_address = 0xff;
+	ecmd->port = PORT_TP;
+	ethtool_cmd_speed_set(ecmd, speed);
+	ecmd->duplex = duplex ? DUPLEX_FULL : DUPLEX_HALF;
+	ecmd->autoneg = autoneg ? AUTONEG_ENABLE : AUTONEG_DISABLE;
+
+	return 0;
+}
+
+static int
+rocker_cmd_get_port_settings_macaddr_proc(struct rocker *rocker,
+					  struct rocker_port *rocker_port,
+					  struct rocker_desc_info *desc_info,
+					  void *priv)
+{
+	unsigned char *macaddr = priv;
+	struct rocker_tlv *attrs[ROCKER_TLV_CMD_MAX + 1];
+	struct rocker_tlv *info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_MAX + 1];
+	struct rocker_tlv *attr;
+
+	rocker_tlv_parse_desc(attrs, ROCKER_TLV_CMD_MAX, desc_info);
+	if (!attrs[ROCKER_TLV_CMD_INFO])
+		return -EIO;
+
+	rocker_tlv_parse_nested(info_attrs, ROCKER_TLV_CMD_PORT_SETTINGS_MAX,
+				attrs[ROCKER_TLV_CMD_INFO]);
+	attr = info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_MACADDR];
+	if (!attr)
+		return -EIO;
+
+	if (rocker_tlv_len(attr) != ETH_ALEN)
+		return -EINVAL;
+
+	ether_addr_copy(macaddr, rocker_tlv_data(attr));
+	return 0;
+}
+
+static int
+rocker_cmd_set_port_settings_ethtool_prep(struct rocker *rocker,
+					  struct rocker_port *rocker_port,
+					  struct rocker_desc_info *desc_info,
+					  void *priv)
+{
+	struct ethtool_cmd *ecmd = priv;
+	struct rocker_tlv *cmd_info;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE,
+			       ROCKER_TLV_CMD_TYPE_SET_PORT_SETTINGS))
+		return -EMSGSIZE;
+	cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO);
+	if (!cmd_info)
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_LPORT,
+			       rocker_port->lport))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_SPEED,
+			       ethtool_cmd_speed(ecmd)))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_DUPLEX,
+			      ecmd->duplex))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_AUTONEG,
+			      ecmd->autoneg))
+		return -EMSGSIZE;
+	rocker_tlv_nest_end(desc_info, cmd_info);
+	return 0;
+}
+
+static int
+rocker_cmd_set_port_settings_macaddr_prep(struct rocker *rocker,
+					  struct rocker_port *rocker_port,
+					  struct rocker_desc_info *desc_info,
+					  void *priv)
+{
+	unsigned char *macaddr = priv;
+	struct rocker_tlv *cmd_info;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE,
+			       ROCKER_TLV_CMD_TYPE_SET_PORT_SETTINGS))
+		return -EMSGSIZE;
+	cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO);
+	if (!cmd_info)
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_LPORT,
+			       rocker_port->lport))
+		return -EMSGSIZE;
+	if (rocker_tlv_put(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_MACADDR,
+			   ETH_ALEN, macaddr))
+		return -EMSGSIZE;
+	rocker_tlv_nest_end(desc_info, cmd_info);
+	return 0;
+}
+
+static int rocker_cmd_get_port_settings_ethtool(struct rocker_port *rocker_port,
+						struct ethtool_cmd *ecmd)
+{
+	return rocker_cmd_exec(rocker_port->rocker, rocker_port,
+			       rocker_cmd_get_port_settings_prep, NULL,
+			       rocker_cmd_get_port_settings_ethtool_proc,
+			       ecmd, false);
+}
+
+static int rocker_cmd_get_port_settings_macaddr(struct rocker_port *rocker_port,
+						unsigned char *macaddr)
+{
+	return rocker_cmd_exec(rocker_port->rocker, rocker_port,
+			       rocker_cmd_get_port_settings_prep, NULL,
+			       rocker_cmd_get_port_settings_macaddr_proc,
+			       macaddr, false);
+}
+
+static int rocker_cmd_set_port_settings_ethtool(struct rocker_port *rocker_port,
+						struct ethtool_cmd *ecmd)
+{
+	return rocker_cmd_exec(rocker_port->rocker, rocker_port,
+			       rocker_cmd_set_port_settings_ethtool_prep,
+			       ecmd, NULL, NULL, false);
+}
+
+static int rocker_cmd_set_port_settings_macaddr(struct rocker_port *rocker_port,
+						unsigned char *macaddr)
+{
+	return rocker_cmd_exec(rocker_port->rocker, rocker_port,
+			       rocker_cmd_set_port_settings_macaddr_prep,
+			       macaddr, NULL, NULL, false);
+}
+
+/*****************
+ * Net device ops
+ *****************/
+
+static int rocker_port_open(struct net_device *dev)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	int err;
+
+	err = rocker_port_dma_rings_init(rocker_port);
+	if (err)
+		return err;
+
+	err = request_irq(rocker_msix_tx_vector(rocker_port),
+			  rocker_tx_irq_handler, 0,
+			  rocker_driver_name, rocker_port);
+	if (err) {
+		netdev_err(rocker_port->dev, "cannot assign tx irq\n");
+		goto err_request_tx_irq;
+	}
+
+	err = request_irq(rocker_msix_rx_vector(rocker_port),
+			  rocker_rx_irq_handler, 0,
+			  rocker_driver_name, rocker_port);
+	if (err) {
+		netdev_err(rocker_port->dev, "cannot assign rx irq\n");
+		goto err_request_rx_irq;
+	}
+
+	napi_enable(&rocker_port->napi_tx);
+	napi_enable(&rocker_port->napi_rx);
+	rocker_port_set_enable(rocker_port, true);
+	netif_start_queue(dev);
+	return 0;
+
+err_request_rx_irq:
+	free_irq(rocker_msix_tx_vector(rocker_port), rocker_port);
+err_request_tx_irq:
+	rocker_port_dma_rings_fini(rocker_port);
+	return err;
+}
+
+static int rocker_port_stop(struct net_device *dev)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+
+	netif_stop_queue(dev);
+	rocker_port_set_enable(rocker_port, false);
+	napi_disable(&rocker_port->napi_rx);
+	napi_disable(&rocker_port->napi_tx);
+	free_irq(rocker_msix_rx_vector(rocker_port), rocker_port);
+	free_irq(rocker_msix_tx_vector(rocker_port), rocker_port);
+	rocker_port_dma_rings_fini(rocker_port);
+
+	return 0;
+}
+
+static void rocker_tx_desc_frags_unmap(struct rocker_port *rocker_port,
+				       struct rocker_desc_info *desc_info)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct pci_dev *pdev = rocker->pdev;
+	struct rocker_tlv *attrs[ROCKER_TLV_TX_MAX + 1];
+	struct rocker_tlv *attr;
+	int rem;
+
+	rocker_tlv_parse_desc(attrs, ROCKER_TLV_TX_MAX, desc_info);
+	if (!attrs[ROCKER_TLV_TX_FRAGS])
+		return;
+	rocker_tlv_for_each_nested(attr, attrs[ROCKER_TLV_TX_FRAGS], rem) {
+		struct rocker_tlv *frag_attrs[ROCKER_TLV_TX_FRAG_ATTR_MAX + 1];
+		dma_addr_t dma_handle;
+		size_t len;
+
+		if (rocker_tlv_type(attr) != ROCKER_TLV_TX_FRAG)
+			continue;
+		rocker_tlv_parse_nested(frag_attrs, ROCKER_TLV_TX_FRAG_ATTR_MAX,
+					attr);
+		if (!frag_attrs[ROCKER_TLV_TX_FRAG_ATTR_ADDR] ||
+		    !frag_attrs[ROCKER_TLV_TX_FRAG_ATTR_LEN])
+			continue;
+		dma_handle = rocker_tlv_get_u64(frag_attrs[ROCKER_TLV_TX_FRAG_ATTR_ADDR]);
+		len = rocker_tlv_get_u16(frag_attrs[ROCKER_TLV_TX_FRAG_ATTR_LEN]);
+		pci_unmap_single(pdev, dma_handle, len, DMA_TO_DEVICE);
+	}
+}
+
+static int rocker_tx_desc_frag_map_put(struct rocker_port *rocker_port,
+				       struct rocker_desc_info *desc_info,
+				       char *buf, size_t buf_len)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct pci_dev *pdev = rocker->pdev;
+	dma_addr_t dma_handle;
+	struct rocker_tlv *frag;
+
+	dma_handle = pci_map_single(pdev, buf, buf_len, DMA_TO_DEVICE);
+	if (unlikely(pci_dma_mapping_error(pdev, dma_handle))) {
+		if (net_ratelimit())
+			netdev_err(rocker_port->dev, "failed to dma map tx frag\n");
+		return -EIO;
+	}
+	frag = rocker_tlv_nest_start(desc_info, ROCKER_TLV_TX_FRAG);
+	if (!frag)
+		goto unmap_frag;
+	if (rocker_tlv_put_u64(desc_info, ROCKER_TLV_TX_FRAG_ATTR_ADDR,
+			       dma_handle))
+		goto nest_cancel;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_TX_FRAG_ATTR_LEN,
+			       buf_len))
+		goto nest_cancel;
+	rocker_tlv_nest_end(desc_info, frag);
+	return 0;
+
+nest_cancel:
+	rocker_tlv_nest_cancel(desc_info, frag);
+unmap_frag:
+	pci_unmap_single(pdev, dma_handle, buf_len, DMA_TO_DEVICE);
+	return -EMSGSIZE;
+}
+
+static netdev_tx_t rocker_port_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_desc_info *desc_info;
+	struct rocker_tlv *frags;
+	int i;
+	int err;
+
+	desc_info = rocker_desc_head_get(&rocker_port->tx_ring);
+	if (unlikely(!desc_info)) {
+		if (net_ratelimit())
+			netdev_err(dev, "tx ring full when queue awake\n");
+		return NETDEV_TX_BUSY;
+	}
+
+	rocker_desc_cookie_ptr_set(desc_info, skb);
+
+	frags = rocker_tlv_nest_start(desc_info, ROCKER_TLV_TX_FRAGS);
+	if (!frags)
+		goto out;
+	err = rocker_tx_desc_frag_map_put(rocker_port, desc_info,
+					  skb->data, skb_headlen(skb));
+	if (err)
+		goto nest_cancel;
+	if (skb_shinfo(skb)->nr_frags > ROCKER_TX_FRAGS_MAX)
+		goto nest_cancel;
+
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+		err = rocker_tx_desc_frag_map_put(rocker_port, desc_info,
+						  skb_frag_address(frag),
+						  skb_frag_size(frag));
+		if (err)
+			goto unmap_frags;
+	}
+	rocker_tlv_nest_end(desc_info, frags);
+
+	rocker_desc_gen_clear(desc_info);
+	rocker_desc_head_set(rocker, &rocker_port->tx_ring, desc_info);
+
+	desc_info = rocker_desc_head_get(&rocker_port->tx_ring);
+	if (!desc_info)
+		netif_stop_queue(dev);
+
+	return NETDEV_TX_OK;
+
+unmap_frags:
+	rocker_tx_desc_frags_unmap(rocker_port, desc_info);
+nest_cancel:
+	rocker_tlv_nest_cancel(desc_info, frags);
+out:
+	dev_kfree_skb(skb);
+	return NETDEV_TX_OK;
+}
+
+static int rocker_port_set_mac_address(struct net_device *dev, void *p)
+{
+	struct sockaddr *addr = p;
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	int err;
+
+	if (!is_valid_ether_addr(addr->sa_data))
+		return -EADDRNOTAVAIL;
+
+	err = rocker_cmd_set_port_settings_macaddr(rocker_port, addr->sa_data);
+	if (err)
+		return err;
+	memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
+	return 0;
+}
+
+static int rocker_port_sw_parent_id_get(struct net_device *dev,
+					struct netdev_phys_item_id *psid)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	struct rocker *rocker = rocker_port->rocker;
+
+	psid->id_len = sizeof(rocker->hw.id);
+	memcpy(&psid->id, &rocker->hw.id, psid->id_len);
+	return 0;
+}
+
+static const struct net_device_ops rocker_port_netdev_ops = {
+	.ndo_open			= rocker_port_open,
+	.ndo_stop			= rocker_port_stop,
+	.ndo_start_xmit			= rocker_port_xmit,
+	.ndo_set_mac_address		= rocker_port_set_mac_address,
+	.ndo_sw_parent_id_get		= rocker_port_sw_parent_id_get,
+};
+
+/********************
+ * ethtool interface
+ ********************/
+
+static int rocker_port_get_settings(struct net_device *dev,
+				    struct ethtool_cmd *ecmd)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+
+	return rocker_cmd_get_port_settings_ethtool(rocker_port, ecmd);
+}
+
+static int rocker_port_set_settings(struct net_device *dev,
+				    struct ethtool_cmd *ecmd)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+
+	return rocker_cmd_set_port_settings_ethtool(rocker_port, ecmd);
+}
+
+static void rocker_port_get_drvinfo(struct net_device *dev,
+				    struct ethtool_drvinfo *drvinfo)
+{
+	strlcpy(drvinfo->driver, rocker_driver_name, sizeof(drvinfo->driver));
+	strlcpy(drvinfo->version, UTS_RELEASE, sizeof(drvinfo->version));
+}
+
+static const struct ethtool_ops rocker_port_ethtool_ops = {
+	.get_settings		= rocker_port_get_settings,
+	.set_settings		= rocker_port_set_settings,
+	.get_drvinfo		= rocker_port_get_drvinfo,
+	.get_link		= ethtool_op_get_link,
+};
+
+/*****************
+ * NAPI interface
+ *****************/
+
+static struct rocker_port *rocker_port_napi_tx_get(struct napi_struct *napi)
+{
+	return container_of(napi, struct rocker_port, napi_tx);
+}
+
+static int rocker_port_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct rocker_port *rocker_port = rocker_port_napi_tx_get(napi);
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_desc_info *desc_info;
+	u32 credits = 0;
+	int err;
+
+	/* Cleanup tx descriptors */
+	while ((desc_info = rocker_desc_tail_get(&rocker_port->tx_ring))) {
+		err = rocker_desc_err(desc_info);
+		if (err && net_ratelimit())
+			netdev_err(rocker_port->dev, "tx desc received with err %d\n",
+				   err);
+		rocker_tx_desc_frags_unmap(rocker_port, desc_info);
+		dev_kfree_skb_any(rocker_desc_cookie_ptr_get(desc_info));
+		credits++;
+	}
+
+	if (credits && netif_queue_stopped(rocker_port->dev))
+		netif_wake_queue(rocker_port->dev);
+
+	napi_complete(napi);
+	rocker_dma_ring_credits_set(rocker, &rocker_port->tx_ring, credits);
+
+	return 0;
+}
+
+static int rocker_port_rx_proc(struct rocker *rocker,
+			       struct rocker_port *rocker_port,
+			       struct rocker_desc_info *desc_info)
+{
+	struct rocker_tlv *attrs[ROCKER_TLV_RX_MAX + 1];
+	struct sk_buff *skb = rocker_desc_cookie_ptr_get(desc_info);
+	size_t rx_len;
+
+	if (!skb)
+		return -ENOENT;
+
+	rocker_tlv_parse_desc(attrs, ROCKER_TLV_RX_MAX, desc_info);
+	if (!attrs[ROCKER_TLV_RX_FRAG_LEN])
+		return -EINVAL;
+
+	rocker_dma_rx_ring_skb_unmap(rocker, attrs);
+
+	rx_len = rocker_tlv_get_u16(attrs[ROCKER_TLV_RX_FRAG_LEN]);
+	skb_put(skb, rx_len);
+	skb->protocol = eth_type_trans(skb, rocker_port->dev);
+	netif_receive_skb(skb);
+
+	return rocker_dma_rx_ring_skb_alloc(rocker, rocker_port, desc_info);
+}
+
+static struct rocker_port *rocker_port_napi_rx_get(struct napi_struct *napi)
+{
+	return container_of(napi, struct rocker_port, napi_rx);
+}
+
+static int rocker_port_poll_rx(struct napi_struct *napi, int budget)
+{
+	struct rocker_port *rocker_port = rocker_port_napi_rx_get(napi);
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_desc_info *desc_info;
+	u32 credits = 0;
+	int err;
+
+	/* Process rx descriptors */
+	while (credits < budget &&
+	       (desc_info = rocker_desc_tail_get(&rocker_port->rx_ring))) {
+		err = rocker_desc_err(desc_info);
+		if (err) {
+			if (net_ratelimit())
+				netdev_err(rocker_port->dev, "rx desc received with err %d\n",
+					   err);
+		} else {
+			err = rocker_port_rx_proc(rocker, rocker_port,
+						  desc_info);
+			if (err && net_ratelimit())
+				netdev_err(rocker_port->dev, "rx processing failed with err %d\n",
+					   err);
+		}
+		rocker_desc_gen_clear(desc_info);
+		rocker_desc_head_set(rocker, &rocker_port->rx_ring, desc_info);
+		credits++;
+	}
+
+	if (credits < budget)
+		napi_complete(napi);
+
+	rocker_dma_ring_credits_set(rocker, &rocker_port->rx_ring, credits);
+
+	return credits;
+}
+
+/*****************
+ * PCI driver ops
+ *****************/
+
+static void rocker_carrier_init(struct rocker_port *rocker_port)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	u64 link_status = rocker_read64(rocker, PORT_PHYS_LINK_STATUS);
+	bool link_up;
+
+	link_up = link_status & (1 << rocker_port->lport);
+	if (link_up)
+		netif_carrier_on(rocker_port->dev);
+	else
+		netif_carrier_off(rocker_port->dev);
+}
+
+static void rocker_remove_ports(struct rocker *rocker)
+{
+	int i;
+
+	for (i = 0; i < rocker->port_count; i++)
+		unregister_netdev(rocker->ports[i]->dev);
+	kfree(rocker->ports);
+}
+
+static void rocker_port_dev_addr_init(struct rocker *rocker,
+				      struct rocker_port *rocker_port)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	int err;
+
+	err = rocker_cmd_get_port_settings_macaddr(rocker_port,
+						   rocker_port->dev->dev_addr);
+	if (err) {
+		dev_warn(&pdev->dev, "failed to get mac address, using random\n");
+		eth_hw_addr_random(rocker_port->dev);
+	}
+}
+
+static int rocker_probe_port(struct rocker *rocker, unsigned int port_number)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	struct rocker_port *rocker_port;
+	struct net_device *dev;
+	int err;
+
+	dev = alloc_etherdev(sizeof(struct rocker_port));
+	if (!dev)
+		return -ENOMEM;
+	rocker_port = netdev_priv(dev);
+	rocker_port->dev = dev;
+	rocker_port->rocker = rocker;
+	rocker_port->port_number = port_number;
+	rocker_port->lport = port_number + 1;
+
+	rocker_port_dev_addr_init(rocker, rocker_port);
+	dev->netdev_ops = &rocker_port_netdev_ops;
+	dev->ethtool_ops = &rocker_port_ethtool_ops;
+	netif_napi_add(dev, &rocker_port->napi_tx, rocker_port_poll_tx,
+		       NAPI_POLL_WEIGHT);
+	netif_napi_add(dev, &rocker_port->napi_rx, rocker_port_poll_rx,
+		       NAPI_POLL_WEIGHT);
+	rocker_carrier_init(rocker_port);
+
+	dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
+
+	err = register_netdev(dev);
+	if (err) {
+		dev_err(&pdev->dev, "register_netdev failed\n");
+		goto err_register_netdev;
+	}
+	rocker->ports[port_number] = rocker_port;
+
+	return 0;
+
+err_register_netdev:
+	free_netdev(dev);
+	return err;
+}
+
+static int rocker_probe_ports(struct rocker *rocker)
+{
+	int i;
+	size_t alloc_size;
+	int err;
+
+	alloc_size = sizeof(struct rocker_port *) * rocker->port_count;
+	rocker->ports = kmalloc(alloc_size, GFP_KERNEL);
+	for (i = 0; i < rocker->port_count; i++) {
+		err = rocker_probe_port(rocker, i);
+		if (err)
+			goto remove_ports;
+	}
+	return 0;
+
+remove_ports:
+	rocker_remove_ports(rocker);
+	return err;
+}
+
+static int rocker_msix_init(struct rocker *rocker)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	int msix_entries;
+	int i;
+	int err;
+
+	msix_entries = pci_msix_vec_count(pdev);
+	if (msix_entries < 0)
+		return msix_entries;
+
+	if (msix_entries != ROCKER_MSIX_VEC_COUNT(rocker->port_count))
+		return -EINVAL;
+
+	rocker->msix_entries = kmalloc_array(msix_entries,
+					     sizeof(struct msix_entry),
+					     GFP_KERNEL);
+	if (!rocker->msix_entries)
+		return -ENOMEM;
+
+	for (i = 0; i < msix_entries; i++)
+		rocker->msix_entries[i].entry = i;
+
+	err = pci_enable_msix_exact(pdev, rocker->msix_entries, msix_entries);
+	if (err < 0)
+		goto err_enable_msix;
+
+	return 0;
+
+err_enable_msix:
+	kfree(rocker->msix_entries);
+	return err;
+}
+
+static void rocker_msix_fini(struct rocker *rocker)
+{
+	pci_disable_msix(rocker->pdev);
+	kfree(rocker->msix_entries);
+}
+
+static int rocker_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+	struct rocker *rocker;
+	int err;
+
+	rocker = kzalloc(sizeof(*rocker), GFP_KERNEL);
+	if (!rocker)
+		return -ENOMEM;
+
+	err = pci_enable_device(pdev);
+	if (err) {
+		dev_err(&pdev->dev, "pci_enable_device failed\n");
+		goto err_pci_enable_device;
+	}
+
+	err = pci_request_regions(pdev, rocker_driver_name);
+	if (err) {
+		dev_err(&pdev->dev, "pci_request_regions failed\n");
+		goto err_pci_request_regions;
+	}
+
+	err = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+	if (!err) {
+		err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+		if (err) {
+			dev_err(&pdev->dev, "pci_set_consistent_dma_mask failed\n");
+			goto err_pci_set_dma_mask;
+		}
+	} else {
+		err = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
+		if (err) {
+			dev_err(&pdev->dev, "pci_set_dma_mask failed\n");
+			goto err_pci_set_dma_mask;
+		}
+	}
+
+	if (pci_resource_len(pdev, 0) < ROCKER_PCI_BAR0_SIZE) {
+		dev_err(&pdev->dev, "invalid PCI region size\n");
+		goto err_pci_resource_len_check;
+	}
+
+	rocker->hw_addr = ioremap(pci_resource_start(pdev, 0),
+				  pci_resource_len(pdev, 0));
+	if (!rocker->hw_addr) {
+		dev_err(&pdev->dev, "ioremap failed\n");
+		err = -EIO;
+		goto err_ioremap;
+	}
+	pci_set_master(pdev);
+
+	rocker->pdev = pdev;
+	pci_set_drvdata(pdev, rocker);
+
+	rocker->port_count = rocker_read32(rocker, PORT_PHYS_COUNT);
+
+	err = rocker_msix_init(rocker);
+	if (err) {
+		dev_err(&pdev->dev, "MSI-X init failed\n");
+		goto err_msix_init;
+	}
+
+	err = rocker_basic_hw_test(rocker);
+	if (err) {
+		dev_err(&pdev->dev, "basic hw test failed\n");
+		goto err_basic_hw_test;
+	}
+
+	rocker_write32(rocker, CONTROL, ROCKER_CONTROL_RESET);
+
+	err = rocker_dma_rings_init(rocker);
+	if (err)
+		goto err_dma_rings_init;
+
+	err = request_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_CMD),
+			  rocker_cmd_irq_handler, 0,
+			  rocker_driver_name, rocker);
+	if (err) {
+		dev_err(&pdev->dev, "cannot assign cmd irq\n");
+		goto err_request_cmd_irq;
+	}
+
+	err = request_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_EVENT),
+			  rocker_event_irq_handler, 0,
+			  rocker_driver_name, rocker);
+	if (err) {
+		dev_err(&pdev->dev, "cannot assign event irq\n");
+		goto err_request_event_irq;
+	}
+
+	rocker->hw.id = rocker_read64(rocker, SWITCH_ID);
+
+	err = rocker_probe_ports(rocker);
+	if (err) {
+		dev_err(&pdev->dev, "failed to probe ports\n");
+		goto err_probe_ports;
+	}
+
+	dev_info(&pdev->dev, "Rocker switch with id %016llx\n", rocker->hw.id);
+
+	return 0;
+
+err_probe_ports:
+	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_EVENT), rocker);
+err_request_event_irq:
+	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_CMD), rocker);
+err_request_cmd_irq:
+	rocker_dma_rings_fini(rocker);
+err_dma_rings_init:
+err_basic_hw_test:
+	rocker_msix_fini(rocker);
+err_msix_init:
+	iounmap(rocker->hw_addr);
+err_ioremap:
+err_pci_resource_len_check:
+err_pci_set_dma_mask:
+	pci_release_regions(pdev);
+err_pci_request_regions:
+	pci_disable_device(pdev);
+err_pci_enable_device:
+	kfree(rocker);
+	return err;
+}
+
+static void rocker_remove(struct pci_dev *pdev)
+{
+	struct rocker *rocker = pci_get_drvdata(pdev);
+
+	rocker_write32(rocker, CONTROL, ROCKER_CONTROL_RESET);
+	rocker_remove_ports(rocker);
+	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_EVENT), rocker);
+	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_CMD), rocker);
+	rocker_dma_rings_fini(rocker);
+	rocker_msix_fini(rocker);
+	iounmap(rocker->hw_addr);
+	pci_release_regions(rocker->pdev);
+	pci_disable_device(rocker->pdev);
+	kfree(rocker);
+}
+
+static struct pci_driver rocker_pci_driver = {
+	.name		= rocker_driver_name,
+	.id_table	= rocker_pci_id_table,
+	.probe		= rocker_probe,
+	.remove		= rocker_remove,
+};
+
+/***********************
+ * Module init and exit
+ ***********************/
+
+static int __init rocker_module_init(void)
+{
+	return pci_register_driver(&rocker_pci_driver);
+}
+
+static void __exit rocker_module_exit(void)
+{
+	pci_unregister_driver(&rocker_pci_driver);
+}
+
+module_init(rocker_module_init);
+module_exit(rocker_module_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Jiri Pirko <jiri@resnulli.us>");
+MODULE_AUTHOR("Scott Feldman <sfeldma@gmail.com>");
+MODULE_DESCRIPTION("Rocker switch device driver");
+MODULE_DEVICE_TABLE(pci, rocker_pci_id_table);
diff --git a/drivers/net/ethernet/rocker/rocker.h b/drivers/net/ethernet/rocker/rocker.h
new file mode 100644
index 0000000..5251cf8
--- /dev/null
+++ b/drivers/net/ethernet/rocker/rocker.h
@@ -0,0 +1,427 @@
+/*
+ * drivers/net/ethernet/rocker/rocker.h - Rocker switch device driver
+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ * Copyright (c) 2014 Scott Feldman <sfeldma@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _ROCKER_H
+#define _ROCKER_H
+
+#include <linux/types.h>
+
+#define PCI_VENDOR_ID_REDHAT		0x1b36
+#define PCI_DEVICE_ID_REDHAT_ROCKER	0x0006
+
+#define ROCKER_PCI_BAR0_SIZE		0x2000
+
+/* MSI-X vectors */
+enum {
+	ROCKER_MSIX_VEC_CMD,
+	ROCKER_MSIX_VEC_EVENT,
+	ROCKER_MSIX_VEC_TEST,
+	ROCKER_MSIX_VEC_RESERVED0,
+	__ROCKER_MSIX_VEC_TX,
+	__ROCKER_MSIX_VEC_RX,
+#define ROCKER_MSIX_VEC_TX(port) \
+	(__ROCKER_MSIX_VEC_TX + ((port) * 2))
+#define ROCKER_MSIX_VEC_RX(port) \
+	(__ROCKER_MSIX_VEC_RX + ((port) * 2))
+#define ROCKER_MSIX_VEC_COUNT(portcnt) \
+	(ROCKER_MSIX_VEC_RX((portcnt - 1)) + 1)
+};
+
+/* Rocker bogus registers */
+#define ROCKER_BOGUS_REG0		0x0000
+#define ROCKER_BOGUS_REG1		0x0004
+#define ROCKER_BOGUS_REG2		0x0008
+#define ROCKER_BOGUS_REG3		0x000c
+
+/* Rocker test registers */
+#define ROCKER_TEST_REG			0x0010
+#define ROCKER_TEST_REG64		0x0018  /* 8-byte */
+#define ROCKER_TEST_IRQ			0x0020
+#define ROCKER_TEST_DMA_ADDR		0x0028  /* 8-byte */
+#define ROCKER_TEST_DMA_SIZE		0x0030
+#define ROCKER_TEST_DMA_CTRL		0x0034
+
+/* Rocker test register ctrl */
+#define ROCKER_TEST_DMA_CTRL_CLEAR	(1 << 0)
+#define ROCKER_TEST_DMA_CTRL_FILL	(1 << 1)
+#define ROCKER_TEST_DMA_CTRL_INVERT	(1 << 2)
+
+/* Rocker DMA ring register offsets */
+#define ROCKER_DMA_DESC_ADDR(x)		(0x1000 + (x) * 32)  /* 8-byte */
+#define ROCKER_DMA_DESC_SIZE(x)		(0x1008 + (x) * 32)
+#define ROCKER_DMA_DESC_HEAD(x)		(0x100c + (x) * 32)
+#define ROCKER_DMA_DESC_TAIL(x)		(0x1010 + (x) * 32)
+#define ROCKER_DMA_DESC_CTRL(x)		(0x1014 + (x) * 32)
+#define ROCKER_DMA_DESC_CREDITS(x)	(0x1018 + (x) * 32)
+#define ROCKER_DMA_DESC_RES1(x)		(0x101c + (x) * 32)
+
+/* Rocker dma ctrl register bits */
+#define ROCKER_DMA_DESC_CTRL_RESET	(1 << 0)
+
+/* Rocker DMA ring types */
+enum rocker_dma_type {
+	ROCKER_DMA_CMD,
+	ROCKER_DMA_EVENT,
+	__ROCKER_DMA_TX,
+	__ROCKER_DMA_RX,
+#define ROCKER_DMA_TX(port) (__ROCKER_DMA_TX + (port) * 2)
+#define ROCKER_DMA_RX(port) (__ROCKER_DMA_RX + (port) * 2)
+};
+
+/* Rocker DMA ring size limits and default sizes */
+#define ROCKER_DMA_SIZE_MIN		2ul
+#define ROCKER_DMA_SIZE_MAX		65536ul
+#define ROCKER_DMA_CMD_DEFAULT_SIZE	32ul
+#define ROCKER_DMA_EVENT_DEFAULT_SIZE	32ul
+#define ROCKER_DMA_TX_DEFAULT_SIZE	64ul
+#define ROCKER_DMA_TX_DESC_SIZE		256
+#define ROCKER_DMA_RX_DEFAULT_SIZE	64ul
+#define ROCKER_DMA_RX_DESC_SIZE		256
+
+/* Rocker DMA descriptor struct */
+struct rocker_desc {
+	u64 buf_addr;
+	u64 cookie;
+	u16 buf_size;
+	u16 tlv_size;
+	u16 resv[5];
+	u16 comp_err;
+};
+
+#define ROCKER_DMA_DESC_COMP_ERR_GEN	(1 << 15)
+
+/* Rocker DMA TLV struct */
+struct rocker_tlv {
+	u32 type;
+	u16 len;
+};
+
+/* TLVs */
+enum {
+	ROCKER_TLV_CMD_UNSPEC,
+	ROCKER_TLV_CMD_TYPE,	/* u16 */
+	ROCKER_TLV_CMD_INFO,	/* nest */
+
+	__ROCKER_TLV_CMD_MAX,
+	ROCKER_TLV_CMD_MAX = __ROCKER_TLV_CMD_MAX - 1,
+};
+
+enum {
+	ROCKER_TLV_CMD_TYPE_UNSPEC,
+	ROCKER_TLV_CMD_TYPE_GET_PORT_SETTINGS,
+	ROCKER_TLV_CMD_TYPE_SET_PORT_SETTINGS,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_ADD,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_MOD,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_DEL,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_GET_STATS,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_GROUP_ADD,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_GROUP_MOD,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_GROUP_DEL,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_GROUP_GET_STATS,
+
+	__ROCKER_TLV_CMD_TYPE_MAX,
+	ROCKER_TLV_CMD_TYPE_MAX = __ROCKER_TLV_CMD_TYPE_MAX - 1,
+};
+
+enum {
+	ROCKER_TLV_CMD_PORT_SETTINGS_UNSPEC,
+	ROCKER_TLV_CMD_PORT_SETTINGS_LPORT,		/* u32 */
+	ROCKER_TLV_CMD_PORT_SETTINGS_SPEED,		/* u32 */
+	ROCKER_TLV_CMD_PORT_SETTINGS_DUPLEX,		/* u8 */
+	ROCKER_TLV_CMD_PORT_SETTINGS_AUTONEG,		/* u8 */
+	ROCKER_TLV_CMD_PORT_SETTINGS_MACADDR,		/* binary */
+	ROCKER_TLV_CMD_PORT_SETTINGS_MODE,		/* u8 */
+
+	__ROCKER_TLV_CMD_PORT_SETTINGS_MAX,
+	ROCKER_TLV_CMD_PORT_SETTINGS_MAX =
+			__ROCKER_TLV_CMD_PORT_SETTINGS_MAX - 1,
+};
+
+enum rocker_port_mode {
+	ROCKER_PORT_MODE_OF_DPA,
+};
+
+enum {
+	ROCKER_TLV_EVENT_UNSPEC,
+	ROCKER_TLV_EVENT_TYPE,	/* u16 */
+	ROCKER_TLV_EVENT_INFO,	/* nest */
+
+	__ROCKER_TLV_EVENT_MAX,
+	ROCKER_TLV_EVENT_MAX = __ROCKER_TLV_EVENT_MAX - 1,
+};
+
+enum {
+	ROCKER_TLV_EVENT_TYPE_UNSPEC,
+	ROCKER_TLV_EVENT_TYPE_LINK_CHANGED,
+	ROCKER_TLV_EVENT_TYPE_MAC_VLAN_SEEN,
+
+	__ROCKER_TLV_EVENT_TYPE_MAX,
+	ROCKER_TLV_EVENT_TYPE_MAX = __ROCKER_TLV_EVENT_TYPE_MAX - 1,
+};
+
+enum {
+	ROCKER_TLV_EVENT_LINK_CHANGED_UNSPEC,
+	ROCKER_TLV_EVENT_LINK_CHANGED_LPORT,	/* u32 */
+	ROCKER_TLV_EVENT_LINK_CHANGED_LINKUP,	/* u8 */
+
+	__ROCKER_TLV_EVENT_LINK_CHANGED_MAX,
+	ROCKER_TLV_EVENT_LINK_CHANGED_MAX =
+			__ROCKER_TLV_EVENT_LINK_CHANGED_MAX - 1,
+};
+
+enum {
+	ROCKER_TLV_EVENT_MAC_VLAN_UNSPEC,
+	ROCKER_TLV_EVENT_MAC_VLAN_LPORT,	/* u32 */
+	ROCKER_TLV_EVENT_MAC_VLAN_MAC,		/* binary */
+	ROCKER_TLV_EVENT_MAC_VLAN_VLAN_ID,	/* __be16 */
+
+	__ROCKER_TLV_EVENT_MAC_VLAN_MAX,
+	ROCKER_TLV_EVENT_MAC_VLAN_MAX = __ROCKER_TLV_EVENT_MAC_VLAN_MAX - 1,
+};
+
+enum {
+	ROCKER_TLV_RX_UNSPEC,
+	ROCKER_TLV_RX_FLAGS,		/* u16, see ROCKER_RX_FLAGS_ */
+	ROCKER_TLV_RX_CSUM,		/* u16 */
+	ROCKER_TLV_RX_FRAG_ADDR,	/* u64 */
+	ROCKER_TLV_RX_FRAG_MAX_LEN,	/* u16 */
+	ROCKER_TLV_RX_FRAG_LEN,		/* u16 */
+
+	__ROCKER_TLV_RX_MAX,
+	ROCKER_TLV_RX_MAX = __ROCKER_TLV_RX_MAX - 1,
+};
+
+#define ROCKER_RX_FLAGS_IPV4			(1 << 0)
+#define ROCKER_RX_FLAGS_IPV6			(1 << 1)
+#define ROCKER_RX_FLAGS_CSUM_CALC		(1 << 2)
+#define ROCKER_RX_FLAGS_IPV4_CSUM_GOOD		(1 << 3)
+#define ROCKER_RX_FLAGS_IP_FRAG			(1 << 4)
+#define ROCKER_RX_FLAGS_TCP			(1 << 5)
+#define ROCKER_RX_FLAGS_UDP			(1 << 6)
+#define ROCKER_RX_FLAGS_TCP_UDP_CSUM_GOOD	(1 << 7)
+
+enum {
+	ROCKER_TLV_TX_UNSPEC,
+	ROCKER_TLV_TX_OFFLOAD,		/* u8, see ROCKER_TX_OFFLOAD_ */
+	ROCKER_TLV_TX_L3_CSUM_OFF,	/* u16 */
+	ROCKER_TLV_TX_TSO_MSS,		/* u16 */
+	ROCKER_TLV_TX_TSO_HDR_LEN,	/* u16 */
+	ROCKER_TLV_TX_FRAGS,		/* array */
+
+	__ROCKER_TLV_TX_MAX,
+	ROCKER_TLV_TX_MAX = __ROCKER_TLV_TX_MAX - 1,
+};
+
+#define ROCKER_TX_OFFLOAD_NONE		0
+#define ROCKER_TX_OFFLOAD_IP_CSUM	1
+#define ROCKER_TX_OFFLOAD_TCP_UDP_CSUM	2
+#define ROCKER_TX_OFFLOAD_L3_CSUM	3
+#define ROCKER_TX_OFFLOAD_TSO		4
+
+#define ROCKER_TX_FRAGS_MAX		16
+
+enum {
+	ROCKER_TLV_TX_FRAG_UNSPEC,
+	ROCKER_TLV_TX_FRAG,		/* nest */
+
+	__ROCKER_TLV_TX_FRAG_MAX,
+	ROCKER_TLV_TX_FRAG_MAX = __ROCKER_TLV_TX_FRAG_MAX - 1,
+};
+
+enum {
+	ROCKER_TLV_TX_FRAG_ATTR_UNSPEC,
+	ROCKER_TLV_TX_FRAG_ATTR_ADDR,	/* u64 */
+	ROCKER_TLV_TX_FRAG_ATTR_LEN,	/* u16 */
+
+	__ROCKER_TLV_TX_FRAG_ATTR_MAX,
+	ROCKER_TLV_TX_FRAG_ATTR_MAX = __ROCKER_TLV_TX_FRAG_ATTR_MAX - 1,
+};
+
+/* cmd info nested for OF-DPA msgs */
+enum {
+	ROCKER_TLV_OF_DPA_UNSPEC,
+	ROCKER_TLV_OF_DPA_TABLE_ID,		/* u16 */
+	ROCKER_TLV_OF_DPA_PRIORITY,		/* u32 */
+	ROCKER_TLV_OF_DPA_HARDTIME,		/* u32 */
+	ROCKER_TLV_OF_DPA_IDLETIME,		/* u32 */
+	ROCKER_TLV_OF_DPA_COOKIE,		/* u64 */
+	ROCKER_TLV_OF_DPA_IN_LPORT,		/* u32 */
+	ROCKER_TLV_OF_DPA_IN_LPORT_MASK,	/* u32 */
+	ROCKER_TLV_OF_DPA_OUT_LPORT,		/* u32 */
+	ROCKER_TLV_OF_DPA_GOTO_TABLE_ID,	/* u16 */
+	ROCKER_TLV_OF_DPA_GROUP_ID,		/* u32 */
+	ROCKER_TLV_OF_DPA_GROUP_ID_LOWER,	/* u32 */
+	ROCKER_TLV_OF_DPA_GROUP_COUNT,		/* u16 */
+	ROCKER_TLV_OF_DPA_GROUP_IDS,		/* u32 array */
+	ROCKER_TLV_OF_DPA_VLAN_ID,		/* __be16 */
+	ROCKER_TLV_OF_DPA_VLAN_ID_MASK,		/* __be16 */
+	ROCKER_TLV_OF_DPA_VLAN_PCP,		/* __be16 */
+	ROCKER_TLV_OF_DPA_VLAN_PCP_MASK,	/* __be16 */
+	ROCKER_TLV_OF_DPA_VLAN_PCP_ACTION,	/* u8 */
+	ROCKER_TLV_OF_DPA_NEW_VLAN_ID,		/* __be16 */
+	ROCKER_TLV_OF_DPA_NEW_VLAN_PCP,		/* u8 */
+	ROCKER_TLV_OF_DPA_TUNNEL_ID,		/* u32 */
+	ROCKER_TLV_OF_DPA_TUN_LOG_LPORT,	/* u32 */
+	ROCKER_TLV_OF_DPA_ETHERTYPE,		/* __be16 */
+	ROCKER_TLV_OF_DPA_DST_MAC,		/* binary */
+	ROCKER_TLV_OF_DPA_DST_MAC_MASK,		/* binary */
+	ROCKER_TLV_OF_DPA_SRC_MAC,		/* binary */
+	ROCKER_TLV_OF_DPA_SRC_MAC_MASK,		/* binary */
+	ROCKER_TLV_OF_DPA_IP_PROTO,		/* u8 */
+	ROCKER_TLV_OF_DPA_IP_PROTO_MASK,	/* u8 */
+	ROCKER_TLV_OF_DPA_IP_DSCP,		/* u8 */
+	ROCKER_TLV_OF_DPA_IP_DSCP_MASK,		/* u8 */
+	ROCKER_TLV_OF_DPA_IP_DSCP_ACTION,	/* u8 */
+	ROCKER_TLV_OF_DPA_NEW_IP_DSCP,		/* u8 */
+	ROCKER_TLV_OF_DPA_IP_ECN,		/* u8 */
+	ROCKER_TLV_OF_DPA_IP_ECN_MASK,		/* u8 */
+	ROCKER_TLV_OF_DPA_DST_IP,		/* __be32 */
+	ROCKER_TLV_OF_DPA_DST_IP_MASK,		/* __be32 */
+	ROCKER_TLV_OF_DPA_SRC_IP,		/* __be32 */
+	ROCKER_TLV_OF_DPA_SRC_IP_MASK,		/* __be32 */
+	ROCKER_TLV_OF_DPA_DST_IPV6,		/* binary */
+	ROCKER_TLV_OF_DPA_DST_IPV6_MASK,	/* binary */
+	ROCKER_TLV_OF_DPA_SRC_IPV6,		/* binary */
+	ROCKER_TLV_OF_DPA_SRC_IPV6_MASK,	/* binary */
+	ROCKER_TLV_OF_DPA_SRC_ARP_IP,		/* __be32 */
+	ROCKER_TLV_OF_DPA_SRC_ARP_IP_MASK,	/* __be32 */
+	ROCKER_TLV_OF_DPA_L4_DST_PORT,		/* __be16 */
+	ROCKER_TLV_OF_DPA_L4_DST_PORT_MASK,	/* __be16 */
+	ROCKER_TLV_OF_DPA_L4_SRC_PORT,		/* __be16 */
+	ROCKER_TLV_OF_DPA_L4_SRC_PORT_MASK,	/* __be16 */
+	ROCKER_TLV_OF_DPA_ICMP_TYPE,		/* u8 */
+	ROCKER_TLV_OF_DPA_ICMP_TYPE_MASK,	/* u8 */
+	ROCKER_TLV_OF_DPA_ICMP_CODE,		/* u8 */
+	ROCKER_TLV_OF_DPA_ICMP_CODE_MASK,	/* u8 */
+	ROCKER_TLV_OF_DPA_IPV6_LABEL,		/* __be32 */
+	ROCKER_TLV_OF_DPA_IPV6_LABEL_MASK,	/* __be32 */
+	ROCKER_TLV_OF_DPA_QUEUE_ID_ACTION,	/* u8 */
+	ROCKER_TLV_OF_DPA_NEW_QUEUE_ID,		/* u8 */
+	ROCKER_TLV_OF_DPA_CLEAR_ACTIONS,	/* u32 */
+	ROCKER_TLV_OF_DPA_POP_VLAN,		/* u8 */
+	ROCKER_TLV_OF_DPA_TTL_CHECK,		/* u8 */
+	ROCKER_TLV_OF_DPA_COPY_CPU_ACTION,	/* u8 */
+
+	__ROCKER_TLV_OF_DPA_MAX,
+	ROCKER_TLV_OF_DPA_MAX = __ROCKER_TLV_OF_DPA_MAX - 1,
+};
+
+/* OF-DPA table IDs */
+
+enum rocker_of_dpa_table_id {
+	ROCKER_OF_DPA_TABLE_ID_INGRESS_PORT = 0,
+	ROCKER_OF_DPA_TABLE_ID_VLAN = 10,
+	ROCKER_OF_DPA_TABLE_ID_TERMINATION_MAC = 20,
+	ROCKER_OF_DPA_TABLE_ID_UNICAST_ROUTING = 30,
+	ROCKER_OF_DPA_TABLE_ID_MULTICAST_ROUTING = 40,
+	ROCKER_OF_DPA_TABLE_ID_BRIDGING = 50,
+	ROCKER_OF_DPA_TABLE_ID_ACL_POLICY = 60,
+};
+
+/* OF-DPA flow stats */
+enum {
+	ROCKER_TLV_OF_DPA_FLOW_STAT_UNSPEC,
+	ROCKER_TLV_OF_DPA_FLOW_STAT_DURATION,	/* u32 */
+	ROCKER_TLV_OF_DPA_FLOW_STAT_RX_PKTS,	/* u64 */
+	ROCKER_TLV_OF_DPA_FLOW_STAT_TX_PKTS,	/* u64 */
+
+	__ROCKER_TLV_OF_DPA_FLOW_STAT_MAX,
+	ROCKER_TLV_OF_DPA_FLOW_STAT_MAX = __ROCKER_TLV_OF_DPA_FLOW_STAT_MAX - 1,
+};
+
+/* OF-DPA group types */
+enum rocker_of_dpa_group_type {
+	ROCKER_OF_DPA_GROUP_TYPE_L2_INTERFACE = 0,
+	ROCKER_OF_DPA_GROUP_TYPE_L2_REWRITE,
+	ROCKER_OF_DPA_GROUP_TYPE_L3_UCAST,
+	ROCKER_OF_DPA_GROUP_TYPE_L2_MCAST,
+	ROCKER_OF_DPA_GROUP_TYPE_L2_FLOOD,
+	ROCKER_OF_DPA_GROUP_TYPE_L3_INTERFACE,
+	ROCKER_OF_DPA_GROUP_TYPE_L3_MCAST,
+	ROCKER_OF_DPA_GROUP_TYPE_L3_ECMP,
+	ROCKER_OF_DPA_GROUP_TYPE_L2_OVERLAY,
+};
+
+/* OF-DPA group L2 overlay types */
+enum rocker_of_dpa_overlay_type {
+	ROCKER_OF_DPA_OVERLAY_TYPE_FLOOD_UCAST = 0,
+	ROCKER_OF_DPA_OVERLAY_TYPE_FLOOD_MCAST,
+	ROCKER_OF_DPA_OVERLAY_TYPE_MCAST_UCAST,
+	ROCKER_OF_DPA_OVERLAY_TYPE_MCAST_MCAST,
+};
+
+/* OF-DPA group ID encoding */
+#define ROCKER_GROUP_TYPE_SHIFT 28
+#define ROCKER_GROUP_TYPE_MASK 0xf0000000
+#define ROCKER_GROUP_VLAN_SHIFT 16
+#define ROCKER_GROUP_VLAN_MASK 0x0fff0000
+#define ROCKER_GROUP_PORT_SHIFT 0
+#define ROCKER_GROUP_PORT_MASK 0x0000ffff
+#define ROCKER_GROUP_TUNNEL_ID_SHIFT 12
+#define ROCKER_GROUP_TUNNEL_ID_MASK 0x0ffff000
+#define ROCKER_GROUP_SUBTYPE_SHIFT 10
+#define ROCKER_GROUP_SUBTYPE_MASK 0x00000c00
+#define ROCKER_GROUP_INDEX_SHIFT 0
+#define ROCKER_GROUP_INDEX_MASK 0x0000ffff
+#define ROCKER_GROUP_INDEX_LONG_SHIFT 0
+#define ROCKER_GROUP_INDEX_LONG_MASK 0x0fffffff
+
+#define ROCKER_GROUP_TYPE_GET(group_id) \
+	(((group_id) & ROCKER_GROUP_TYPE_MASK) >> ROCKER_GROUP_TYPE_SHIFT)
+#define ROCKER_GROUP_TYPE_SET(type) \
+	(((type) << ROCKER_GROUP_TYPE_SHIFT) & ROCKER_GROUP_TYPE_MASK)
+#define ROCKER_GROUP_VLAN_GET(group_id) \
+	(((group_id) & ROCKER_GROUP_VLAN_ID_MASK) >> ROCKER_GROUP_VLAN_ID_SHIFT)
+#define ROCKER_GROUP_VLAN_SET(vlan_id) \
+	(((vlan_id) << ROCKER_GROUP_VLAN_SHIFT) & ROCKER_GROUP_VLAN_MASK)
+#define ROCKER_GROUP_PORT_GET(group_id) \
+	(((group_id) & ROCKER_GROUP_PORT_MASK) >> ROCKER_GROUP_PORT_SHIFT)
+#define ROCKER_GROUP_PORT_SET(port) \
+	(((port) << ROCKER_GROUP_PORT_SHIFT) & ROCKER_GROUP_PORT_MASK)
+#define ROCKER_GROUP_INDEX_GET(group_id) \
+	(((group_id) & ROCKER_GROUP_INDEX_MASK) >> ROCKER_GROUP_INDEX_SHIFT)
+#define ROCKER_GROUP_INDEX_SET(index) \
+	(((index) << ROCKER_GROUP_INDEX_SHIFT) & ROCKER_GROUP_INDEX_MASK)
+#define ROCKER_GROUP_INDEX_LONG_GET(group_id) \
+	(((group_id) & ROCKER_GROUP_INDEX_LONG_MASK) >> \
+	 ROCKER_GROUP_INDEX_LONG_SHIFT)
+#define ROCKER_GROUP_INDEX_LONG_SET(index) \
+	(((index) << ROCKER_GROUP_INDEX_LONG_SHIFT) & \
+	 ROCKER_GROUP_INDEX_LONG_MASK)
+
+#define ROCKER_GROUP_NONE 0
+#define ROCKER_GROUP_L2_INTERFACE(vlan_id, port) \
+	(ROCKER_GROUP_TYPE_SET(ROCKER_OF_DPA_GROUP_TYPE_L2_INTERFACE) |\
+	 ROCKER_GROUP_VLAN_SET(ntohs(vlan_id)) | ROCKER_GROUP_PORT_SET(port))
+#define ROCKER_GROUP_L2_REWRITE(index) \
+	(ROCKER_GROUP_TYPE_SET(ROCKER_OF_DPA_GROUP_TYPE_L2_REWRITE) |\
+	 ROCKER_GROUP_INDEX_LONG_SET(index))
+#define ROCKER_GROUP_L2_MCAST(vlan_id, index) \
+	(ROCKER_GROUP_TYPE_SET(ROCKER_OF_DPA_GROUP_TYPE_L2_MCAST) |\
+	 ROCKER_GROUP_VLAN_SET(ntohs(vlan_id)) | ROCKER_GROUP_INDEX_SET(index))
+#define ROCKER_GROUP_L2_FLOOD(vlan_id, index) \
+	(ROCKER_GROUP_TYPE_SET(ROCKER_OF_DPA_GROUP_TYPE_L2_FLOOD) |\
+	ROCKER_GROUP_VLAN_SET(ntohs(vlan_id)) | ROCKER_GROUP_INDEX_SET(index))
+#define ROCKER_GROUP_L3_UNICAST(index) \
+	(ROCKER_GROUP_TYPE_SET(ROCKER_OF_DPA_GROUP_TYPE_L3_UCAST) |\
+	 ROCKER_GROUP_INDEX_LONG_SET(index))
+
+/* Rocker general purpose registers */
+#define ROCKER_CONTROL			0x0300
+#define ROCKER_PORT_PHYS_COUNT		0x0304
+#define ROCKER_PORT_PHYS_LINK_STATUS	0x0310 /* 8-byte */
+#define ROCKER_PORT_PHYS_ENABLE		0x0318 /* 8-byte */
+#define ROCKER_SWITCH_ID		0x0320 /* 8-byte */
+
+/* Rocker control bits */
+#define ROCKER_CONTROL_RESET		(1 << 0)
+
+#endif
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [patch net-next v2 06/10] bridge: introduce fdb offloading via switchdev
  2014-11-09 10:51 [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload Jiri Pirko
                   ` (4 preceding siblings ...)
  2014-11-09 10:51 ` [patch net-next v2 05/10] rocker: introduce rocker switch driver Jiri Pirko
@ 2014-11-09 10:51 ` Jiri Pirko
  2014-11-10  3:47   ` Jamal Hadi Salim
  2014-11-09 10:51 ` [patch net-next v2 07/10] bridge: call netdev_sw_port_stp_update when bridge port STP status changes Jiri Pirko
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2014-11-09 10:51 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

From: Scott Feldman <sfeldma@gmail.com>

Add two new ndos: ndo_sw_port_fdb_add/del to offload static bridge
fdb entries.  Static bridge FDB entries are installed, for example,
using iproute2 bridge cmd:

       bridge fdb add ADDR dev DEV master vlan VID

This would install ADDR into the bridge's FDB for port DEV on vlan VID.  The
switch driver implements two ndo_swdev ops to add/delete FDB entries in the
switch device:

       int ndo_sw_port_fdb_add(struct net_device *dev,
                               const unsigned char *addr,
                               u16 vid);

       int ndo_sw_port_fdb_del(struct net_device *dev,
                               const unsigned char *addr,
                               u16 vid);

The driver returns 0 on success, negative error code on failure.

Note: the switch driver would not implement ndo_fdb_add/del/dump on a port
netdev as these are intended for devices maintaining their own FDB.  In our
case, we want the Linux bridge to own the FBD.

Note: by default, the bridge does not filter on VLAN and only bridges untagged
traffic.  To enable VLAN support, turn on VLAN filtering:

      echo 1 >/sys/class/net/<bridge>/bridge/vlan_filtering

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/linux/netdevice.h | 16 ++++++++++++++++
 include/net/switchdev.h   | 17 +++++++++++++++++
 net/bridge/br_fdb.c       | 10 +++++++++-
 net/switchdev/switchdev.c | 41 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 97eade9..116a19d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1023,6 +1023,16 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  *	Called to get an ID of the switch chip this port is part of.
  *	If driver implements this, it indicates that it represents a port
  *	of a switch chip.
+ *
+ * int (*ndo_sw_port_fdb_add)(struct net_device *dev,
+ *			      const unsigned char *addr,
+ *			      u16 vid);
+ *	Called to add a fdb to switch device port.
+ *
+ * int (*ndo_sw_port_fdb_del)(struct net_device *dev,
+ *			      const unsigned char *addr,
+ *			      u16 vid);
+ *	Called to delete a fdb from switch device port.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1177,6 +1187,12 @@ struct net_device_ops {
 #ifdef CONFIG_NET_SWITCHDEV
 	int			(*ndo_sw_parent_id_get)(struct net_device *dev,
 							struct netdev_phys_item_id *psid);
+	int			(*ndo_sw_port_fdb_add)(struct net_device *dev,
+						       const unsigned char *addr,
+						       u16 vid);
+	int			(*ndo_sw_port_fdb_del)(struct net_device *dev,
+						       const unsigned char *addr,
+						       u16 vid);
 #endif
 };
 
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 79bf9bd..130cef7 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -1,6 +1,7 @@
 /*
  * include/net/switchdev.h - Switch device API
  * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ * Copyright (c) 2014 Scott Feldman <sfeldma@gmail.com>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -16,6 +17,10 @@
 
 int netdev_sw_parent_id_get(struct net_device *dev,
 			    struct netdev_phys_item_id *psid);
+int netdev_sw_port_fdb_add(struct net_device *dev,
+			   const unsigned char *addr, u16 vid);
+int netdev_sw_port_fdb_del(struct net_device *dev,
+			   const unsigned char *addr, u16 vid);
 
 #else
 
@@ -25,6 +30,18 @@ static inline int netdev_sw_parent_id_get(struct net_device *dev,
 	return -EOPNOTSUPP;
 }
 
+static inline int netdev_sw_port_fdb_add(struct net_device *dev,
+					 const unsigned char *addr, u16 vid)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int netdev_sw_port_fdb_del(struct net_device *dev,
+					 const unsigned char *addr, u16 vid)
+{
+	return -EOPNOTSUPP;
+}
+
 #endif
 
 #endif /* _LINUX_SWITCHDEV_H_ */
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 6f6c95c..f6f8bb5 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -24,6 +24,7 @@
 #include <linux/atomic.h>
 #include <asm/unaligned.h>
 #include <linux/if_vlan.h>
+#include <net/switchdev.h>
 #include "br_private.h"
 
 static struct kmem_cache *br_fdb_cache __read_mostly;
@@ -132,8 +133,12 @@ static void fdb_del_hw(struct net_bridge *br, const unsigned char *addr)
 
 static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f)
 {
-	if (f->is_static)
+	if (f->is_static) {
 		fdb_del_hw(br, f->addr.addr);
+		if (f->dst)
+			netdev_sw_port_fdb_del(f->dst->dev,
+					       f->addr.addr, f->vlan_id);
+	}
 
 	hlist_del_rcu(&f->hlist);
 	fdb_notify(br, f, RTM_DELNEIGH);
@@ -755,18 +760,21 @@ static int fdb_add_entry(struct net_bridge_port *source, const __u8 *addr,
 			if (!fdb->is_static) {
 				fdb->is_static = 1;
 				fdb_add_hw(br, addr);
+				netdev_sw_port_fdb_add(source->dev, addr, vid);
 			}
 		} else if (state & NUD_NOARP) {
 			fdb->is_local = 0;
 			if (!fdb->is_static) {
 				fdb->is_static = 1;
 				fdb_add_hw(br, addr);
+				netdev_sw_port_fdb_add(source->dev, addr, vid);
 			}
 		} else {
 			fdb->is_local = 0;
 			if (fdb->is_static) {
 				fdb->is_static = 0;
 				fdb_del_hw(br, addr);
+				netdev_sw_port_fdb_del(source->dev, addr, vid);
 			}
 		}
 
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 5010f646..93d47b7 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -1,6 +1,7 @@
 /*
  * net/switchdev/switchdev.c - Switch device API
  * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ * Copyright (c) 2014 Scott Feldman <sfeldma@gmail.com>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -31,3 +32,43 @@ int netdev_sw_parent_id_get(struct net_device *dev,
 	return ops->ndo_sw_parent_id_get(dev, psid);
 }
 EXPORT_SYMBOL(netdev_sw_parent_id_get);
+
+/**
+ *	netdev_sw_port_fdb_add - Add a fdb into switch port
+ *	@dev: port device
+ *	@addr: mac address
+ *	@vid: vlan id
+ *
+ *	Add a fdb into switch port.
+ */
+int netdev_sw_port_fdb_add(struct net_device *dev,
+			   const unsigned char *addr, u16 vid)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	if (!ops->ndo_sw_port_fdb_add)
+		return -EOPNOTSUPP;
+	WARN_ON(!ops->ndo_sw_parent_id_get);
+	return ops->ndo_sw_port_fdb_add(dev, addr, vid);
+}
+EXPORT_SYMBOL(netdev_sw_port_fdb_add);
+
+/**
+ *	netdev_sw_port_fdb_del - Delete a fdb from switch port
+ *	@dev: port device
+ *	@addr: mac address
+ *	@vid: vlan id
+ *
+ *	Delete a fdb from switch port.
+ */
+int netdev_sw_port_fdb_del(struct net_device *dev,
+			   const unsigned char *addr, u16 vid)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	if (!ops->ndo_sw_port_fdb_del)
+		return -EOPNOTSUPP;
+	WARN_ON(!ops->ndo_sw_parent_id_get);
+	return ops->ndo_sw_port_fdb_del(dev, addr, vid);
+}
+EXPORT_SYMBOL(netdev_sw_port_fdb_del);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [patch net-next v2 07/10] bridge: call netdev_sw_port_stp_update when bridge port STP status changes
  2014-11-09 10:51 [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload Jiri Pirko
                   ` (5 preceding siblings ...)
  2014-11-09 10:51 ` [patch net-next v2 06/10] bridge: introduce fdb offloading via switchdev Jiri Pirko
@ 2014-11-09 10:51 ` Jiri Pirko
  2014-11-10 13:11   ` Jamal Hadi Salim
  2014-11-09 10:51 ` [patch net-next v2 08/10] bridge: add API to notify bridge driver of learned FBD on offloaded device Jiri Pirko
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2014-11-09 10:51 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

From: Scott Feldman <sfeldma@gmail.com>

To notify switch driver of change in STP state of bridge port, add new
.ndo op and provide swdev wrapper func to call ndo op. Use it in bridge
code then.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/linux/netdevice.h |  6 ++++++
 include/net/switchdev.h   |  6 ++++++
 net/bridge/br_netlink.c   |  2 ++
 net/bridge/br_stp.c       |  4 ++++
 net/bridge/br_stp_if.c    |  3 +++
 net/bridge/br_stp_timer.c |  2 ++
 net/switchdev/switchdev.c | 19 +++++++++++++++++++
 7 files changed, 42 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 116a19d..35f21a95 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1033,6 +1033,10 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  *			      const unsigned char *addr,
  *			      u16 vid);
  *	Called to delete a fdb from switch device port.
+ *
+ * int (*ndo_sw_port_stp_update)(struct net_device *dev, u8 state);
+ *	Called to notify switch device port of bridge port STP
+ *	state change.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1193,6 +1197,8 @@ struct net_device_ops {
 	int			(*ndo_sw_port_fdb_del)(struct net_device *dev,
 						       const unsigned char *addr,
 						       u16 vid);
+	int			(*ndo_sw_port_stp_update)(struct net_device *dev,
+							  u8 state);
 #endif
 };
 
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 130cef7..bbf7369 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -21,6 +21,7 @@ int netdev_sw_port_fdb_add(struct net_device *dev,
 			   const unsigned char *addr, u16 vid);
 int netdev_sw_port_fdb_del(struct net_device *dev,
 			   const unsigned char *addr, u16 vid);
+int netdev_sw_port_stp_update(struct net_device *dev, u8 state);
 
 #else
 
@@ -42,6 +43,11 @@ static inline int netdev_sw_port_fdb_del(struct net_device *dev,
 	return -EOPNOTSUPP;
 }
 
+static inline int netdev_sw_port_stp_update(struct net_device *dev, u8 state)
+{
+	return -EOPNOTSUPP;
+}
+
 #endif
 
 #endif /* _LINUX_SWITCHDEV_H_ */
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 86c239b..13fecf1 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -17,6 +17,7 @@
 #include <net/net_namespace.h>
 #include <net/sock.h>
 #include <uapi/linux/if_bridge.h>
+#include <net/switchdev.h>
 
 #include "br_private.h"
 #include "br_private_stp.h"
@@ -304,6 +305,7 @@ static int br_set_port_state(struct net_bridge_port *p, u8 state)
 
 	br_set_state(p, state);
 	br_log_state(p);
+	netdev_sw_port_stp_update(p->dev, p->state);
 	br_port_state_selection(p->br);
 	return 0;
 }
diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
index 2b047bc..c00139b 100644
--- a/net/bridge/br_stp.c
+++ b/net/bridge/br_stp.c
@@ -12,6 +12,7 @@
  */
 #include <linux/kernel.h>
 #include <linux/rculist.h>
+#include <net/switchdev.h>
 
 #include "br_private.h"
 #include "br_private_stp.h"
@@ -114,6 +115,7 @@ static void br_root_port_block(const struct net_bridge *br,
 
 	br_set_state(p, BR_STATE_LISTENING);
 	br_log_state(p);
+	netdev_sw_port_stp_update(p->dev, p->state);
 	br_ifinfo_notify(RTM_NEWLINK, p);
 
 	if (br->forward_delay > 0)
@@ -394,6 +396,7 @@ static void br_make_blocking(struct net_bridge_port *p)
 
 		br_set_state(p, BR_STATE_BLOCKING);
 		br_log_state(p);
+		netdev_sw_port_stp_update(p->dev, p->state);
 		br_ifinfo_notify(RTM_NEWLINK, p);
 
 		del_timer(&p->forward_delay_timer);
@@ -419,6 +422,7 @@ static void br_make_forwarding(struct net_bridge_port *p)
 
 	br_multicast_enable_port(p);
 	br_log_state(p);
+	netdev_sw_port_stp_update(p->dev, p->state);
 	br_ifinfo_notify(RTM_NEWLINK, p);
 
 	if (br->forward_delay != 0)
diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
index 4114687..91279f8 100644
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -15,6 +15,7 @@
 #include <linux/kmod.h>
 #include <linux/etherdevice.h>
 #include <linux/rtnetlink.h>
+#include <net/switchdev.h>
 
 #include "br_private.h"
 #include "br_private_stp.h"
@@ -89,6 +90,7 @@ void br_stp_enable_port(struct net_bridge_port *p)
 	br_init_port(p);
 	br_port_state_selection(p->br);
 	br_log_state(p);
+	netdev_sw_port_stp_update(p->dev, p->state);
 	br_ifinfo_notify(RTM_NEWLINK, p);
 }
 
@@ -105,6 +107,7 @@ void br_stp_disable_port(struct net_bridge_port *p)
 	p->config_pending = 0;
 
 	br_log_state(p);
+	netdev_sw_port_stp_update(p->dev, p->state);
 	br_ifinfo_notify(RTM_NEWLINK, p);
 
 	del_timer(&p->message_age_timer);
diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c
index 4fcaa67..5bb8997 100644
--- a/net/bridge/br_stp_timer.c
+++ b/net/bridge/br_stp_timer.c
@@ -13,6 +13,7 @@
 
 #include <linux/kernel.h>
 #include <linux/times.h>
+#include <net/switchdev.h>
 
 #include "br_private.h"
 #include "br_private_stp.h"
@@ -97,6 +98,7 @@ static void br_forward_delay_timer_expired(unsigned long arg)
 		netif_carrier_on(br->dev);
 	}
 	br_log_state(p);
+	netdev_sw_port_stp_update(p->dev, p->state);
 	br_ifinfo_notify(RTM_NEWLINK, p);
 	spin_unlock(&br->lock);
 }
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 93d47b7..75997a5 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -72,3 +72,22 @@ int netdev_sw_port_fdb_del(struct net_device *dev,
 	return ops->ndo_sw_port_fdb_del(dev, addr, vid);
 }
 EXPORT_SYMBOL(netdev_sw_port_fdb_del);
+
+/**
+ *	netdev_sw_port_stp_update - Notify switch device port of STP
+ *				    state change
+ *	@dev: port device
+ *	@state: port STP state
+ *
+ *	Notify switch device port of bridge port STP state change.
+ */
+int netdev_sw_port_stp_update(struct net_device *dev, u8 state)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	if (!ops->ndo_sw_port_stp_update)
+		return -EOPNOTSUPP;
+	WARN_ON(!ops->ndo_sw_parent_id_get);
+	return ops->ndo_sw_port_stp_update(dev, state);
+}
+EXPORT_SYMBOL(netdev_sw_port_stp_update);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [patch net-next v2 08/10] bridge: add API to notify bridge driver of learned FBD on offloaded device
  2014-11-09 10:51 [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload Jiri Pirko
                   ` (6 preceding siblings ...)
  2014-11-09 10:51 ` [patch net-next v2 07/10] bridge: call netdev_sw_port_stp_update when bridge port STP status changes Jiri Pirko
@ 2014-11-09 10:51 ` Jiri Pirko
  2014-11-11 14:21   ` Roopa Prabhu
  2014-11-09 10:51 ` [patch net-next v2 09/10] rocker: implement rocker ofdpa flow table manipulation Jiri Pirko
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2014-11-09 10:51 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

From: Scott Feldman <sfeldma@gmail.com>

When the swdev device learns a new mac/vlan on a port, it sends some async
notification to the driver and the driver installs an FDB in the device.
To give a holistic system view, the learned mac/vlan should be reflected
in the bridge's FBD table, so the user, using normal iproute2 cmds, can view
what is currently learned by the device.  This API on the bridge driver gives
a way for the swdev driver to install an FBD entry in the bridge FBD table.
(And remove one).

This is equivalent to the device running these cmds:

  bridge fdb [add|del] <mac> dev <dev> vid <vlan id> master

This patch needs some extra eyeballs for review, in paricular around the
locking and contexts.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/linux/if_bridge.h | 18 ++++++++++
 net/bridge/br_fdb.c       | 84 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 102 insertions(+)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 808dcb8..27ab217 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -37,6 +37,24 @@ extern void brioctl_set(int (*ioctl_hook)(struct net *, unsigned int, void __use
 typedef int br_should_route_hook_t(struct sk_buff *skb);
 extern br_should_route_hook_t __rcu *br_should_route_hook;
 
+#if IS_ENABLED(CONFIG_BRIDGE)
+int br_fdb_learn_add(struct net_device *dev,
+		     const unsigned char *addr, u16 vid);
+int br_fdb_learn_del(struct net_device *dev,
+		     const unsigned char *addr, u16 vid);
+#else
+static inline int br_fdb_learn_add(struct net_device *dev,
+				   const unsigned char *addr, u16 vid)
+{
+	return 0;
+}
+static inline int br_fdb_learn_del(struct net_device *dev,
+				   const unsigned char *addr, u16 vid)
+{
+	return 0;
+}
+#endif
+
 #if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_IGMP_SNOOPING)
 int br_multicast_list_adjacent(struct net_device *dev,
 			       struct list_head *br_ip_list);
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index f6f8bb5..e02d21b 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -1022,3 +1022,87 @@ void br_fdb_unsync_static(struct net_bridge *br, struct net_bridge_port *p)
 		}
 	}
 }
+
+int br_fdb_learn_add(struct net_device *dev, const unsigned char *addr,
+		     u16 vid)
+{
+	struct net_bridge_port *p;
+	struct net_bridge *br;
+	struct hlist_head *head;
+	struct net_bridge_fdb_entry *fdb;
+	int err = 0;
+
+	rtnl_lock();
+
+	p = br_port_get_rtnl(dev);
+	if (p == NULL) {
+		pr_info("bridge: %s not a bridge port\n", dev->name);
+		err = -EINVAL;
+		goto err_rtnl_unlock;
+	}
+
+	br = p->br;
+
+	spin_lock(&br->hash_lock);
+
+	head = &br->hash[br_mac_hash(addr, vid)];
+	fdb = fdb_find(head, addr, vid);
+	if (fdb == NULL) {
+		fdb = fdb_create(head, p, addr, vid);
+		if (!fdb) {
+			err = -ENOMEM;
+			goto err_unlock;
+		}
+		fdb->is_local = 1;
+		fdb->used = jiffies;
+		fdb->updated = jiffies;
+		fdb_notify(br, fdb, RTM_NEWNEIGH);
+	} else {
+		err = -EEXIST;
+	}
+
+err_unlock:
+	spin_unlock(&br->hash_lock);
+err_rtnl_unlock:
+	rtnl_unlock();
+
+	return err;
+}
+EXPORT_SYMBOL(br_fdb_learn_add);
+
+int br_fdb_learn_del(struct net_device *dev, const unsigned char *addr,
+		     u16 vid)
+{
+	struct net_bridge_port *p;
+	struct net_bridge *br;
+	struct hlist_head *head;
+	struct net_bridge_fdb_entry *fdb;
+	int err = 0;
+
+	rtnl_lock();
+
+	p = br_port_get_rtnl(dev);
+	if (p == NULL) {
+		pr_info("bridge: %s not a bridge port\n", dev->name);
+		err = -EINVAL;
+		goto err_rtnl_unlock;
+	}
+
+	br = p->br;
+
+	spin_lock(&br->hash_lock);
+
+	head = &br->hash[br_mac_hash(addr, vid)];
+	fdb = fdb_find(head, addr, vid);
+	if (fdb)
+		fdb_delete(br, fdb);
+	else
+		err = -ENOENT;
+
+	spin_unlock(&br->hash_lock);
+err_rtnl_unlock:
+	rtnl_unlock();
+
+	return err;
+}
+EXPORT_SYMBOL(br_fdb_learn_del);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [patch net-next v2 09/10] rocker: implement rocker ofdpa flow table manipulation
  2014-11-09 10:51 [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload Jiri Pirko
                   ` (7 preceding siblings ...)
  2014-11-09 10:51 ` [patch net-next v2 08/10] bridge: add API to notify bridge driver of learned FBD on offloaded device Jiri Pirko
@ 2014-11-09 10:51 ` Jiri Pirko
  2014-11-09 10:51 ` [patch net-next v2 10/10] rocker: implement L2 bridge offloading Jiri Pirko
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 100+ messages in thread
From: Jiri Pirko @ 2014-11-09 10:51 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

From: Scott Feldman <sfeldma@gmail.com>

The rocker driver maintains 4 hash tables: flows, groups, FDB, and VLANs.

Flow and group tables track the entries installed to OF-DPA tables,
per the OF-DPA spec.  See OF-DPA spec for full description of fields
in each flow and group table.  New table entries are pushed to the
device with ADD cmd.  Updated entries are pushed to the device with
MOD cmd.  For flow table entries, a crc32 key is made from fields of
the particular field.  For group table entries, the group_id is used
as the key.

The FDB table tracks fdb entries learned by the device or manually
pushed to the bridge by the user.  A crc32 key is made from the
port/mac/vlan tuple for the fdb entry.

The VLAN table tracks the ifindex-to-internal-vlan mapping for
untagged pkts.  On ingress, an untagged pkt is inserted with an
internal VLAN ID based on the input port's current internal VLAN ID.
The input port's internal VLAN will either be referenced by the port's
ifindex, if not bridged, or the containing bridge's ifindex, if
bridged.  Since the ifindex space isn't within a fixed range, uses a
hash table (with ifindex as key) to track internal VLAN ID for a given
ifindex.  The internal VLAN ID range is fixed and currently uses the
upper 255 VLAN IDs, starting at 0xf00.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 drivers/net/ethernet/rocker/rocker.c | 1468 +++++++++++++++++++++++++++++++++-
 1 file changed, 1466 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index ebad09c..ea58a4f 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -16,6 +16,7 @@
 #include <linux/sched.h>
 #include <linux/wait.h>
 #include <linux/spinlock.h>
+#include <linux/hashtable.h>
 #include <linux/crc32.h>
 #include <linux/sort.h>
 #include <linux/random.h>
@@ -27,6 +28,7 @@
 #include <linux/ethtool.h>
 #include <linux/if_ether.h>
 #include <linux/if_vlan.h>
+#include <linux/bitops.h>
 #include <net/switchdev.h>
 #include <net/rtnetlink.h>
 #include <asm-generic/io-64-nonatomic-lo-hi.h>
@@ -41,6 +43,123 @@ static const struct pci_device_id rocker_pci_id_table[] = {
 	{0, }
 };
 
+struct rocker_flow_tbl_key {
+	u32 priority;
+	enum rocker_of_dpa_table_id tbl_id;
+	union {
+		struct {
+			u32 in_lport;
+			u32 in_lport_mask;
+			enum rocker_of_dpa_table_id goto_tbl;
+		} ig_port;
+		struct {
+			u32 in_lport;
+			__be16 vlan_id;
+			__be16 vlan_id_mask;
+			enum rocker_of_dpa_table_id goto_tbl;
+			bool untagged;
+			__be16 new_vlan_id;
+		} vlan;
+		struct {
+			u32 in_lport;
+			u32 in_lport_mask;
+			__be16 eth_type;
+			u8 eth_dst[ETH_ALEN];
+			u8 eth_dst_mask[ETH_ALEN];
+			__be16 vlan_id;
+			__be16 vlan_id_mask;
+			enum rocker_of_dpa_table_id goto_tbl;
+			bool copy_to_cpu;
+		} term_mac;
+		struct {
+			__be16 eth_type;
+			__be32 dst4;
+			__be32 dst4_mask;
+			enum rocker_of_dpa_table_id goto_tbl;
+			u32 group_id;
+		} ucast_routing;
+		struct {
+			u8 eth_dst[ETH_ALEN];
+			u8 eth_dst_mask[ETH_ALEN];
+			int has_eth_dst;
+			int has_eth_dst_mask;
+			__be16 vlan_id;
+			u32 tunnel_id;
+			enum rocker_of_dpa_table_id goto_tbl;
+			u32 group_id;
+			bool copy_to_cpu;
+		} bridge;
+		struct {
+			u32 in_lport;
+			u32 in_lport_mask;
+			u8 eth_src[ETH_ALEN];
+			u8 eth_src_mask[ETH_ALEN];
+			u8 eth_dst[ETH_ALEN];
+			u8 eth_dst_mask[ETH_ALEN];
+			__be16 eth_type;
+			__be16 vlan_id;
+			__be16 vlan_id_mask;
+			u8 ip_proto;
+			u8 ip_proto_mask;
+			u8 ip_tos;
+			u8 ip_tos_mask;
+			u32 group_id;
+		} acl;
+	};
+};
+
+struct rocker_flow_tbl_entry {
+	struct hlist_node entry;
+	u32 ref_count;
+	u64 cookie;
+	struct rocker_flow_tbl_key key;
+	u32 key_crc32; /* key */
+};
+
+struct rocker_group_tbl_entry {
+	struct hlist_node entry;
+	u32 cmd;
+	u32 group_id; /* key */
+	u16 group_count;
+	u32 *group_ids;
+	union {
+		struct {
+			u8 pop_vlan;
+		} l2_interface;
+		struct {
+			u8 eth_src[ETH_ALEN];
+			u8 eth_dst[ETH_ALEN];
+			__be16 vlan_id;
+			u32 group_id;
+		} l2_rewrite;
+		struct {
+			u8 eth_src[ETH_ALEN];
+			u8 eth_dst[ETH_ALEN];
+			__be16 vlan_id;
+			bool ttl_check;
+			u32 group_id;
+		} l3_unicast;
+	};
+};
+
+struct rocker_fdb_tbl_entry {
+	struct hlist_node entry;
+	u32 key_crc32; /* key */
+	bool learned;
+	struct rocker_fdb_tbl_key {
+		u32 lport;
+		u8 addr[ETH_ALEN];
+		__be16 vlan_id;
+	} key;
+};
+
+struct rocker_internal_vlan_tbl_entry {
+	struct hlist_node entry;
+	int ifindex; /* key */
+	u32 ref_count;
+	__be16 vlan_id;
+};
+
 struct rocker_desc_info {
 	char *data; /* mapped */
 	size_t data_size;
@@ -61,11 +180,28 @@ struct rocker_dma_ring_info {
 
 struct rocker;
 
+enum {
+	ROCKER_CTRL_LINK_LOCAL_MCAST,
+	ROCKER_CTRL_LOCAL_ARP,
+	ROCKER_CTRL_IPV4_MCAST,
+	ROCKER_CTRL_IPV6_MCAST,
+	ROCKER_CTRL_DFLT_BRIDGING,
+	ROCKER_CTRL_MAX,
+};
+
+#define ROCKER_INTERNAL_VLAN_ID_BASE	0x0f00
+#define ROCKER_N_INTERNAL_VLANS		255
+#define ROCKER_VLAN_BITMAP_LEN		BITS_TO_LONGS(VLAN_N_VID)
+#define ROCKER_INTERNAL_VLAN_BITMAP_LEN	BITS_TO_LONGS(ROCKER_N_INTERNAL_VLANS)
+
 struct rocker_port {
 	struct net_device *dev;
 	struct rocker *rocker;
 	unsigned int port_number;
 	u32 lport;
+	__be16 internal_vlan_id;
+	bool ctrls[ROCKER_CTRL_MAX];
+	unsigned long vlan_bitmap[ROCKER_VLAN_BITMAP_LEN];
 	struct napi_struct napi_tx;
 	struct napi_struct napi_rx;
 	struct rocker_dma_ring_info tx_ring;
@@ -84,8 +220,76 @@ struct rocker {
 	spinlock_t cmd_ring_lock;
 	struct rocker_dma_ring_info cmd_ring;
 	struct rocker_dma_ring_info event_ring;
+	DECLARE_HASHTABLE(flow_tbl, 16);
+	spinlock_t flow_tbl_lock;
+	u64 flow_tbl_next_cookie;
+	DECLARE_HASHTABLE(group_tbl, 16);
+	spinlock_t group_tbl_lock;
+	DECLARE_HASHTABLE(fdb_tbl, 16);
+	spinlock_t fdb_tbl_lock;
+	unsigned long internal_vlan_bitmap[ROCKER_INTERNAL_VLAN_BITMAP_LEN];
+	DECLARE_HASHTABLE(internal_vlan_tbl, 8);
+	spinlock_t internal_vlan_tbl_lock;
+};
+
+static const u8 zero_mac[ETH_ALEN]   = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 };
+static const u8 ff_mac[ETH_ALEN]     = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
+static const u8 ll_mac[ETH_ALEN]     = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x00 };
+static const u8 ll_mask[ETH_ALEN]    = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xf0 };
+static const u8 mcast_mac[ETH_ALEN]  = { 0x01, 0x00, 0x00, 0x00, 0x00, 0x00 };
+static const u8 ipv4_mcast[ETH_ALEN] = { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x00 };
+static const u8 ipv4_mask[ETH_ALEN]  = { 0xff, 0xff, 0xff, 0x80, 0x00, 0x00 };
+static const u8 ipv6_mcast[ETH_ALEN] = { 0x33, 0x33, 0x00, 0x00, 0x00, 0x00 };
+static const u8 ipv6_mask[ETH_ALEN]  = { 0xff, 0xff, 0x00, 0x00, 0x00, 0x00 };
+
+/* Rocker priority levels for flow table entries.  Higher
+ * priority match takes precedence over lower priority match.
+ */
+
+enum {
+	ROCKER_PRIORITY_UNKNOWN = 0,
+	ROCKER_PRIORITY_IG_PORT = 1,
+	ROCKER_PRIORITY_VLAN = 1,
+	ROCKER_PRIORITY_TERM_MAC_UCAST = 0,
+	ROCKER_PRIORITY_TERM_MAC_MCAST = 1,
+	ROCKER_PRIORITY_UNICAST_ROUTING = 1,
+	ROCKER_PRIORITY_BRIDGING_VLAN_DFLT_EXACT = 1,
+	ROCKER_PRIORITY_BRIDGING_VLAN_DFLT_WILD = 2,
+	ROCKER_PRIORITY_BRIDGING_VLAN = 3,
+	ROCKER_PRIORITY_BRIDGING_TENANT_DFLT_EXACT = 1,
+	ROCKER_PRIORITY_BRIDGING_TENANT_DFLT_WILD = 2,
+	ROCKER_PRIORITY_BRIDGING_TENANT = 3,
+	ROCKER_PRIORITY_ACL_CTRL = 3,
+	ROCKER_PRIORITY_ACL_NORMAL = 2,
+	ROCKER_PRIORITY_ACL_DFLT = 1,
 };
 
+static bool rocker_vlan_id_is_internal(__be16 vlan_id)
+{
+	u16 start = ROCKER_INTERNAL_VLAN_ID_BASE;
+	u16 end = 0xffe;
+	u16 _vlan_id = ntohs(vlan_id);
+
+	return (_vlan_id >= start && _vlan_id <= end);
+}
+
+static __be16 rocker_port_vid_to_vlan(struct rocker_port *rocker_port,
+				      u16 vid, bool *pop_vlan)
+{
+	__be16 vlan_id;
+
+	if (pop_vlan)
+		*pop_vlan = false;
+	vlan_id = htons(vid);
+	if (!vlan_id) {
+		vlan_id = rocker_port->internal_vlan_id;
+		if (pop_vlan)
+			*pop_vlan = true;
+	}
+
+	return vlan_id;
+}
+
 struct rocker_wait {
 	wait_queue_head_t wait;
 	bool done;
@@ -1094,6 +1298,10 @@ static int rocker_event_link_change(struct rocker *rocker,
 	return 0;
 }
 
+#define ROCKER_OP_FLAG_REMOVE		BIT(0)
+#define ROCKER_OP_FLAG_NOWAIT		BIT(1)
+#define ROCKER_OP_FLAG_LEARNED		BIT(2)
+
 static int rocker_event_process(struct rocker *rocker,
 				struct rocker_desc_info *desc_info)
 {
@@ -1399,6 +1607,1239 @@ static int rocker_cmd_set_port_settings_macaddr(struct rocker_port *rocker_port,
 			       macaddr, NULL, NULL, false);
 }
 
+static int rocker_cmd_flow_tbl_add_ig_port(struct rocker_desc_info *desc_info,
+					   struct rocker_flow_tbl_entry *entry)
+{
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT,
+			       entry->key.ig_port.in_lport))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT_MASK,
+			       entry->key.ig_port.in_lport_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_GOTO_TABLE_ID,
+			       entry->key.ig_port.goto_tbl))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int rocker_cmd_flow_tbl_add_vlan(struct rocker_desc_info *desc_info,
+					struct rocker_flow_tbl_entry *entry)
+{
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT,
+			       entry->key.vlan.in_lport))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+			       entry->key.vlan.vlan_id))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID_MASK,
+			       entry->key.vlan.vlan_id_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_GOTO_TABLE_ID,
+			       entry->key.vlan.goto_tbl))
+		return -EMSGSIZE;
+	if (entry->key.vlan.untagged &&
+	    rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_NEW_VLAN_ID,
+			       entry->key.vlan.new_vlan_id))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int rocker_cmd_flow_tbl_add_term_mac(struct rocker_desc_info *desc_info,
+					    struct rocker_flow_tbl_entry *entry)
+{
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT,
+			       entry->key.term_mac.in_lport))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT_MASK,
+			       entry->key.term_mac.in_lport_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_ETHERTYPE,
+			       entry->key.term_mac.eth_type))
+		return -EMSGSIZE;
+	if (rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC,
+			   ETH_ALEN, entry->key.term_mac.eth_dst))
+		return -EMSGSIZE;
+	if (rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC_MASK,
+			   ETH_ALEN, entry->key.term_mac.eth_dst_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+			       entry->key.term_mac.vlan_id))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID_MASK,
+			       entry->key.term_mac.vlan_id_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_GOTO_TABLE_ID,
+			       entry->key.term_mac.goto_tbl))
+		return -EMSGSIZE;
+	if (entry->key.term_mac.copy_to_cpu &&
+	    rocker_tlv_put_u8(desc_info, ROCKER_TLV_OF_DPA_COPY_CPU_ACTION,
+			      entry->key.term_mac.copy_to_cpu))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int
+rocker_cmd_flow_tbl_add_ucast_routing(struct rocker_desc_info *desc_info,
+				      struct rocker_flow_tbl_entry *entry)
+{
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_ETHERTYPE,
+			       entry->key.ucast_routing.eth_type))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_DST_IP,
+			       entry->key.ucast_routing.dst4))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_DST_IP_MASK,
+			       entry->key.ucast_routing.dst4_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_GOTO_TABLE_ID,
+			       entry->key.ucast_routing.goto_tbl))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_GROUP_ID,
+			       entry->key.ucast_routing.group_id))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int rocker_cmd_flow_tbl_add_bridge(struct rocker_desc_info *desc_info,
+					  struct rocker_flow_tbl_entry *entry)
+{
+	if (entry->key.bridge.has_eth_dst &&
+	    rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC,
+			   ETH_ALEN, entry->key.bridge.eth_dst))
+		return -EMSGSIZE;
+	if (entry->key.bridge.has_eth_dst_mask &&
+	    rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC_MASK,
+			   ETH_ALEN, entry->key.bridge.eth_dst_mask))
+		return -EMSGSIZE;
+	if (entry->key.bridge.vlan_id &&
+	    rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+			       entry->key.bridge.vlan_id))
+		return -EMSGSIZE;
+	if (entry->key.bridge.tunnel_id &&
+	    rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_TUNNEL_ID,
+			       entry->key.bridge.tunnel_id))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_GOTO_TABLE_ID,
+			       entry->key.bridge.goto_tbl))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_GROUP_ID,
+			       entry->key.bridge.group_id))
+		return -EMSGSIZE;
+	if (entry->key.bridge.copy_to_cpu &&
+	    rocker_tlv_put_u8(desc_info, ROCKER_TLV_OF_DPA_COPY_CPU_ACTION,
+			      entry->key.bridge.copy_to_cpu))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int rocker_cmd_flow_tbl_add_acl(struct rocker_desc_info *desc_info,
+				       struct rocker_flow_tbl_entry *entry)
+{
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT,
+			       entry->key.acl.in_lport))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT_MASK,
+			       entry->key.acl.in_lport_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_SRC_MAC,
+			   ETH_ALEN, entry->key.acl.eth_src))
+		return -EMSGSIZE;
+	if (rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_SRC_MAC_MASK,
+			   ETH_ALEN, entry->key.acl.eth_src_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC,
+			   ETH_ALEN, entry->key.acl.eth_dst))
+		return -EMSGSIZE;
+	if (rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC_MASK,
+			   ETH_ALEN, entry->key.acl.eth_dst_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_ETHERTYPE,
+			       entry->key.acl.eth_type))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+			       entry->key.acl.vlan_id))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID_MASK,
+			       entry->key.acl.vlan_id_mask))
+		return -EMSGSIZE;
+
+	switch (ntohs(entry->key.acl.eth_type)) {
+	case ETH_P_IP:
+	case ETH_P_IPV6:
+		if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_OF_DPA_IP_PROTO,
+				      entry->key.acl.ip_proto))
+			return -EMSGSIZE;
+		if (rocker_tlv_put_u8(desc_info,
+				      ROCKER_TLV_OF_DPA_IP_PROTO_MASK,
+				      entry->key.acl.ip_proto_mask))
+			return -EMSGSIZE;
+		if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_OF_DPA_IP_DSCP,
+				      entry->key.acl.ip_tos & 0x3f))
+			return -EMSGSIZE;
+		if (rocker_tlv_put_u8(desc_info,
+				      ROCKER_TLV_OF_DPA_IP_DSCP_MASK,
+				      entry->key.acl.ip_tos_mask & 0x3f))
+			return -EMSGSIZE;
+		if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_OF_DPA_IP_ECN,
+				      (entry->key.acl.ip_tos & 0xc0) >> 6))
+			return -EMSGSIZE;
+		if (rocker_tlv_put_u8(desc_info,
+				      ROCKER_TLV_OF_DPA_IP_ECN_MASK,
+				      (entry->key.acl.ip_tos_mask & 0xc0) >> 6))
+			return -EMSGSIZE;
+		break;
+	}
+
+	if (entry->key.acl.group_id != ROCKER_GROUP_NONE &&
+	    rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_GROUP_ID,
+			       entry->key.acl.group_id))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int rocker_cmd_flow_tbl_add(struct rocker *rocker,
+				   struct rocker_port *rocker_port,
+				   struct rocker_desc_info *desc_info,
+				   void *priv)
+{
+	struct rocker_flow_tbl_entry *entry = priv;
+	struct rocker_tlv *cmd_info;
+	int err = 0;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE,
+			       ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_ADD))
+		return -EMSGSIZE;
+	cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO);
+	if (!cmd_info)
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_TABLE_ID,
+			       entry->key.tbl_id))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_PRIORITY,
+			       entry->key.priority))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_HARDTIME, 0))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u64(desc_info, ROCKER_TLV_OF_DPA_COOKIE,
+			       entry->cookie))
+		return -EMSGSIZE;
+
+	switch (entry->key.tbl_id) {
+	case ROCKER_OF_DPA_TABLE_ID_INGRESS_PORT:
+		err = rocker_cmd_flow_tbl_add_ig_port(desc_info, entry);
+		break;
+	case ROCKER_OF_DPA_TABLE_ID_VLAN:
+		err = rocker_cmd_flow_tbl_add_vlan(desc_info, entry);
+		break;
+	case ROCKER_OF_DPA_TABLE_ID_TERMINATION_MAC:
+		err = rocker_cmd_flow_tbl_add_term_mac(desc_info, entry);
+		break;
+	case ROCKER_OF_DPA_TABLE_ID_UNICAST_ROUTING:
+		err = rocker_cmd_flow_tbl_add_ucast_routing(desc_info, entry);
+		break;
+	case ROCKER_OF_DPA_TABLE_ID_BRIDGING:
+		err = rocker_cmd_flow_tbl_add_bridge(desc_info, entry);
+		break;
+	case ROCKER_OF_DPA_TABLE_ID_ACL_POLICY:
+		err = rocker_cmd_flow_tbl_add_acl(desc_info, entry);
+		break;
+	default:
+		err = -ENOTSUPP;
+		break;
+	}
+
+	if (err)
+		return err;
+
+	rocker_tlv_nest_end(desc_info, cmd_info);
+
+	return 0;
+}
+
+static int rocker_cmd_flow_tbl_del(struct rocker *rocker,
+				   struct rocker_port *rocker_port,
+				   struct rocker_desc_info *desc_info,
+				   void *priv)
+{
+	const struct rocker_flow_tbl_entry *entry = priv;
+	struct rocker_tlv *cmd_info;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE,
+			       ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_DEL))
+		return -EMSGSIZE;
+	cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO);
+	if (!cmd_info)
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u64(desc_info, ROCKER_TLV_OF_DPA_COOKIE,
+			       entry->cookie))
+		return -EMSGSIZE;
+	rocker_tlv_nest_end(desc_info, cmd_info);
+
+	return 0;
+}
+
+static int
+rocker_cmd_group_tbl_add_l2_interface(struct rocker_desc_info *desc_info,
+				      struct rocker_group_tbl_entry *entry)
+{
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_OUT_LPORT,
+			       ROCKER_GROUP_PORT_GET(entry->group_id)))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_OF_DPA_POP_VLAN,
+			      entry->l2_interface.pop_vlan))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int
+rocker_cmd_group_tbl_add_l2_rewrite(struct rocker_desc_info *desc_info,
+				    struct rocker_group_tbl_entry *entry)
+{
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_GROUP_ID_LOWER,
+			       entry->l2_rewrite.group_id))
+		return -EMSGSIZE;
+	if (!is_zero_ether_addr(entry->l2_rewrite.eth_src) &&
+	    rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_SRC_MAC,
+			   ETH_ALEN, entry->l2_rewrite.eth_src))
+		return -EMSGSIZE;
+	if (!is_zero_ether_addr(entry->l2_rewrite.eth_dst) &&
+	    rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC,
+			   ETH_ALEN, entry->l2_rewrite.eth_dst))
+		return -EMSGSIZE;
+	if (entry->l2_rewrite.vlan_id &&
+	    rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+			       entry->l2_rewrite.vlan_id))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int
+rocker_cmd_group_tbl_add_group_ids(struct rocker_desc_info *desc_info,
+				   struct rocker_group_tbl_entry *entry)
+{
+	int i;
+	struct rocker_tlv *group_ids;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_GROUP_COUNT,
+			       entry->group_count))
+		return -EMSGSIZE;
+
+	group_ids = rocker_tlv_nest_start(desc_info,
+					  ROCKER_TLV_OF_DPA_GROUP_IDS);
+	if (!group_ids)
+		return -EMSGSIZE;
+
+	for (i = 0; i < entry->group_count; i++)
+		/* Note TLV array is 1-based */
+		if (rocker_tlv_put_u32(desc_info, i + 1, entry->group_ids[i]))
+			return -EMSGSIZE;
+
+	rocker_tlv_nest_end(desc_info, group_ids);
+
+	return 0;
+}
+
+static int
+rocker_cmd_group_tbl_add_l3_unicast(struct rocker_desc_info *desc_info,
+				    struct rocker_group_tbl_entry *entry)
+{
+	if (!is_zero_ether_addr(entry->l3_unicast.eth_src) &&
+	    rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_SRC_MAC,
+			   ETH_ALEN, entry->l3_unicast.eth_src))
+		return -EMSGSIZE;
+	if (!is_zero_ether_addr(entry->l3_unicast.eth_dst) &&
+	    rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC,
+			   ETH_ALEN, entry->l3_unicast.eth_dst))
+		return -EMSGSIZE;
+	if (entry->l3_unicast.vlan_id &&
+	    rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+			       entry->l3_unicast.vlan_id))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_OF_DPA_TTL_CHECK,
+			      entry->l3_unicast.ttl_check))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_GROUP_ID_LOWER,
+			       entry->l3_unicast.group_id))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int rocker_cmd_group_tbl_add(struct rocker *rocker,
+				    struct rocker_port *rocker_port,
+				    struct rocker_desc_info *desc_info,
+				    void *priv)
+{
+	struct rocker_group_tbl_entry *entry = priv;
+	struct rocker_tlv *cmd_info;
+	int err = 0;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE, entry->cmd))
+		return -EMSGSIZE;
+	cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO);
+	if (!cmd_info)
+		return -EMSGSIZE;
+
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_GROUP_ID,
+			       entry->group_id))
+		return -EMSGSIZE;
+
+	switch (ROCKER_GROUP_TYPE_GET(entry->group_id)) {
+	case ROCKER_OF_DPA_GROUP_TYPE_L2_INTERFACE:
+		err = rocker_cmd_group_tbl_add_l2_interface(desc_info, entry);
+		break;
+	case ROCKER_OF_DPA_GROUP_TYPE_L2_REWRITE:
+		err = rocker_cmd_group_tbl_add_l2_rewrite(desc_info, entry);
+		break;
+	case ROCKER_OF_DPA_GROUP_TYPE_L2_FLOOD:
+	case ROCKER_OF_DPA_GROUP_TYPE_L2_MCAST:
+		err = rocker_cmd_group_tbl_add_group_ids(desc_info, entry);
+		break;
+	case ROCKER_OF_DPA_GROUP_TYPE_L3_UCAST:
+		err = rocker_cmd_group_tbl_add_l3_unicast(desc_info, entry);
+		break;
+	default:
+		err = -ENOTSUPP;
+		break;
+	}
+
+	if (err)
+		return err;
+
+	rocker_tlv_nest_end(desc_info, cmd_info);
+
+	return 0;
+}
+
+static int rocker_cmd_group_tbl_del(struct rocker *rocker,
+				    struct rocker_port *rocker_port,
+				    struct rocker_desc_info *desc_info,
+				    void *priv)
+{
+	const struct rocker_group_tbl_entry *entry = priv;
+	struct rocker_tlv *cmd_info;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE, entry->cmd))
+		return -EMSGSIZE;
+	cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO);
+	if (!cmd_info)
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_GROUP_ID,
+			       entry->group_id))
+		return -EMSGSIZE;
+	rocker_tlv_nest_end(desc_info, cmd_info);
+
+	return 0;
+}
+
+/*****************************************
+ * Flow, group, FDB, internal VLAN tables
+ *****************************************/
+
+static int rocker_init_tbls(struct rocker *rocker)
+{
+	hash_init(rocker->flow_tbl);
+	spin_lock_init(&rocker->flow_tbl_lock);
+
+	hash_init(rocker->group_tbl);
+	spin_lock_init(&rocker->group_tbl_lock);
+
+	hash_init(rocker->fdb_tbl);
+	spin_lock_init(&rocker->fdb_tbl_lock);
+
+	hash_init(rocker->internal_vlan_tbl);
+	spin_lock_init(&rocker->internal_vlan_tbl_lock);
+
+	return 0;
+}
+
+static void rocker_free_tbls(struct rocker *rocker)
+{
+	unsigned long flags;
+	struct rocker_flow_tbl_entry *flow_entry;
+	struct rocker_group_tbl_entry *group_entry;
+	struct rocker_fdb_tbl_entry *fdb_entry;
+	struct rocker_internal_vlan_tbl_entry *internal_vlan_entry;
+	struct hlist_node *tmp;
+	int bkt;
+
+	spin_lock_irqsave(&rocker->flow_tbl_lock, flags);
+	hash_for_each_safe(rocker->flow_tbl, bkt, tmp, flow_entry, entry)
+		hash_del(&flow_entry->entry);
+	spin_unlock_irqrestore(&rocker->flow_tbl_lock, flags);
+
+	spin_lock_irqsave(&rocker->group_tbl_lock, flags);
+	hash_for_each_safe(rocker->group_tbl, bkt, tmp, group_entry, entry)
+		hash_del(&group_entry->entry);
+	spin_unlock_irqrestore(&rocker->group_tbl_lock, flags);
+
+	spin_lock_irqsave(&rocker->fdb_tbl_lock, flags);
+	hash_for_each_safe(rocker->fdb_tbl, bkt, tmp, fdb_entry, entry)
+		hash_del(&fdb_entry->entry);
+	spin_unlock_irqrestore(&rocker->fdb_tbl_lock, flags);
+
+	spin_lock_irqsave(&rocker->internal_vlan_tbl_lock, flags);
+	hash_for_each_safe(rocker->internal_vlan_tbl, bkt,
+			   tmp, internal_vlan_entry, entry)
+		hash_del(&internal_vlan_entry->entry);
+	spin_unlock_irqrestore(&rocker->internal_vlan_tbl_lock, flags);
+}
+
+static struct rocker_flow_tbl_entry *
+rocker_flow_tbl_find(struct rocker *rocker, struct rocker_flow_tbl_entry *match)
+{
+	struct rocker_flow_tbl_entry *found;
+
+	hash_for_each_possible(rocker->flow_tbl, found, entry, match->key_crc32) {
+		if (memcmp(&found->key, &match->key, sizeof(found->key)) == 0)
+			return found;
+	}
+
+	return NULL;
+}
+
+static int rocker_flow_tbl_add(struct rocker_port *rocker_port,
+			       struct rocker_flow_tbl_entry *match,
+			       bool nowait)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_flow_tbl_entry *found;
+	unsigned long flags;
+	bool add_to_hw = false;
+	int err = 0;
+
+	match->key_crc32 = crc32(~0, &match->key, sizeof(match->key));
+
+	spin_lock_irqsave(&rocker->flow_tbl_lock, flags);
+
+	found = rocker_flow_tbl_find(rocker, match);
+
+	if (found) {
+		kfree(match);
+	} else {
+		found = match;
+		found->cookie = rocker->flow_tbl_next_cookie++;
+		hash_add(rocker->flow_tbl, &found->entry, found->key_crc32);
+		add_to_hw = true;
+	}
+
+	found->ref_count++;
+
+	spin_unlock_irqrestore(&rocker->flow_tbl_lock, flags);
+
+	if (add_to_hw) {
+		err = rocker_cmd_exec(rocker, rocker_port,
+				      rocker_cmd_flow_tbl_add,
+				      found, NULL, NULL, nowait);
+		if (err) {
+			spin_lock_irqsave(&rocker->flow_tbl_lock, flags);
+			hash_del(&found->entry);
+			spin_unlock_irqrestore(&rocker->flow_tbl_lock, flags);
+			kfree(found);
+		}
+	}
+
+	return err;
+}
+
+static int rocker_flow_tbl_del(struct rocker_port *rocker_port,
+			       struct rocker_flow_tbl_entry *match,
+			       bool nowait)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_flow_tbl_entry *found;
+	unsigned long flags;
+	bool del_from_hw = false;
+	int err = 0;
+
+	match->key_crc32 = crc32(~0, &match->key, sizeof(match->key));
+
+	spin_lock_irqsave(&rocker->flow_tbl_lock, flags);
+
+	found = rocker_flow_tbl_find(rocker, match);
+
+	if (found) {
+		found->ref_count--;
+		if (found->ref_count == 0) {
+			hash_del(&found->entry);
+			del_from_hw = true;
+		}
+	}
+
+	spin_unlock_irqrestore(&rocker->flow_tbl_lock, flags);
+
+	kfree(match);
+
+	if (del_from_hw) {
+		err = rocker_cmd_exec(rocker, rocker_port,
+				      rocker_cmd_flow_tbl_del,
+				      found, NULL, NULL, nowait);
+		kfree(found);
+	}
+
+	return err;
+}
+
+static gfp_t rocker_op_flags_gfp(int flags)
+{
+	return flags & ROCKER_OP_FLAG_NOWAIT ? GFP_ATOMIC : GFP_KERNEL;
+}
+
+static int rocker_flow_tbl_do(struct rocker_port *rocker_port,
+			      int flags, struct rocker_flow_tbl_entry *entry)
+{
+	bool nowait = flags & ROCKER_OP_FLAG_NOWAIT;
+
+	if (flags & ROCKER_OP_FLAG_REMOVE)
+		return rocker_flow_tbl_del(rocker_port, entry, nowait);
+	else
+		return rocker_flow_tbl_add(rocker_port, entry, nowait);
+}
+
+static int rocker_flow_tbl_ig_port(struct rocker_port *rocker_port,
+				   int flags, u32 in_lport, u32 in_lport_mask,
+				   enum rocker_of_dpa_table_id goto_tbl)
+{
+	struct rocker_flow_tbl_entry *entry;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	entry->key.priority = ROCKER_PRIORITY_IG_PORT;
+	entry->key.tbl_id = ROCKER_OF_DPA_TABLE_ID_INGRESS_PORT;
+	entry->key.ig_port.in_lport = in_lport;
+	entry->key.ig_port.in_lport_mask = in_lport_mask;
+	entry->key.ig_port.goto_tbl = goto_tbl;
+
+	return rocker_flow_tbl_do(rocker_port, flags, entry);
+}
+
+static int rocker_flow_tbl_vlan(struct rocker_port *rocker_port,
+				int flags, u32 in_lport,
+				__be16 vlan_id, __be16 vlan_id_mask,
+				enum rocker_of_dpa_table_id goto_tbl,
+				bool untagged, __be16 new_vlan_id)
+{
+	struct rocker_flow_tbl_entry *entry;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	entry->key.priority = ROCKER_PRIORITY_VLAN;
+	entry->key.tbl_id = ROCKER_OF_DPA_TABLE_ID_VLAN;
+	entry->key.vlan.in_lport = in_lport;
+	entry->key.vlan.vlan_id = vlan_id;
+	entry->key.vlan.vlan_id_mask = vlan_id_mask;
+	entry->key.vlan.goto_tbl = goto_tbl;
+
+	entry->key.vlan.untagged = untagged;
+	entry->key.vlan.new_vlan_id = new_vlan_id;
+
+	return rocker_flow_tbl_do(rocker_port, flags, entry);
+}
+
+static int rocker_flow_tbl_term_mac(struct rocker_port *rocker_port,
+				    u32 in_lport, u32 in_lport_mask,
+				    __be16 eth_type, const u8 *eth_dst,
+				    const u8 *eth_dst_mask, __be16 vlan_id,
+				    __be16 vlan_id_mask, bool copy_to_cpu,
+				    int flags)
+{
+	struct rocker_flow_tbl_entry *entry;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	if (is_multicast_ether_addr(eth_dst)) {
+		entry->key.priority = ROCKER_PRIORITY_TERM_MAC_MCAST;
+		entry->key.term_mac.goto_tbl =
+			 ROCKER_OF_DPA_TABLE_ID_MULTICAST_ROUTING;
+	} else {
+		entry->key.priority = ROCKER_PRIORITY_TERM_MAC_UCAST;
+		entry->key.term_mac.goto_tbl =
+			 ROCKER_OF_DPA_TABLE_ID_UNICAST_ROUTING;
+	}
+
+	entry->key.tbl_id = ROCKER_OF_DPA_TABLE_ID_TERMINATION_MAC;
+	entry->key.term_mac.in_lport = in_lport;
+	entry->key.term_mac.in_lport_mask = in_lport_mask;
+	entry->key.term_mac.eth_type = eth_type;
+	ether_addr_copy(entry->key.term_mac.eth_dst, eth_dst);
+	ether_addr_copy(entry->key.term_mac.eth_dst_mask, eth_dst_mask);
+	entry->key.term_mac.vlan_id = vlan_id;
+	entry->key.term_mac.vlan_id_mask = vlan_id_mask;
+	entry->key.term_mac.copy_to_cpu = copy_to_cpu;
+
+	return rocker_flow_tbl_do(rocker_port, flags, entry);
+}
+
+static int rocker_flow_tbl_bridge(struct rocker_port *rocker_port,
+				  int flags,
+				  const u8 *eth_dst, const u8 *eth_dst_mask,
+				  __be16 vlan_id, u32 tunnel_id,
+				  enum rocker_of_dpa_table_id goto_tbl,
+				  u32 group_id, bool copy_to_cpu)
+{
+	struct rocker_flow_tbl_entry *entry;
+	u32 priority;
+	bool vlan_bridging = !!vlan_id;
+	bool dflt = !eth_dst || (eth_dst && eth_dst_mask);
+	bool wild = false;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	entry->key.tbl_id = ROCKER_OF_DPA_TABLE_ID_BRIDGING;
+
+	if (eth_dst) {
+		entry->key.bridge.has_eth_dst = 1;
+		ether_addr_copy(entry->key.bridge.eth_dst, eth_dst);
+	}
+	if (eth_dst_mask) {
+		entry->key.bridge.has_eth_dst_mask = 1;
+		ether_addr_copy(entry->key.bridge.eth_dst_mask, eth_dst_mask);
+		if (memcmp(eth_dst_mask, ff_mac, ETH_ALEN))
+			wild = true;
+	}
+
+	priority = ROCKER_PRIORITY_UNKNOWN;
+	if (vlan_bridging & dflt & wild)
+		priority = ROCKER_PRIORITY_BRIDGING_VLAN_DFLT_WILD;
+	else if (vlan_bridging & dflt & !wild)
+		priority = ROCKER_PRIORITY_BRIDGING_VLAN_DFLT_EXACT;
+	else if (vlan_bridging & !dflt)
+		priority = ROCKER_PRIORITY_BRIDGING_VLAN;
+	else if (!vlan_bridging & dflt & wild)
+		priority = ROCKER_PRIORITY_BRIDGING_TENANT_DFLT_WILD;
+	else if (!vlan_bridging & dflt & !wild)
+		priority = ROCKER_PRIORITY_BRIDGING_TENANT_DFLT_EXACT;
+	else if (!vlan_bridging & !dflt)
+		priority = ROCKER_PRIORITY_BRIDGING_TENANT;
+
+	entry->key.priority = priority;
+	entry->key.bridge.vlan_id = vlan_id;
+	entry->key.bridge.tunnel_id = tunnel_id;
+	entry->key.bridge.goto_tbl = goto_tbl;
+	entry->key.bridge.group_id = group_id;
+	entry->key.bridge.copy_to_cpu = copy_to_cpu;
+
+	return rocker_flow_tbl_do(rocker_port, flags, entry);
+}
+
+static int rocker_flow_tbl_acl(struct rocker_port *rocker_port,
+			       int flags, u32 in_lport,
+			       u32 in_lport_mask,
+			       const u8 *eth_src, const u8 *eth_src_mask,
+			       const u8 *eth_dst, const u8 *eth_dst_mask,
+			       __be16 eth_type,
+			       __be16 vlan_id, __be16 vlan_id_mask,
+			       u8 ip_proto, u8 ip_proto_mask,
+			       u8 ip_tos, u8 ip_tos_mask,
+			       u32 group_id)
+{
+	u32 priority;
+	struct rocker_flow_tbl_entry *entry;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	priority = ROCKER_PRIORITY_ACL_NORMAL;
+	if (eth_dst && eth_dst_mask) {
+		if (memcmp(eth_dst_mask, mcast_mac, ETH_ALEN) == 0)
+			priority = ROCKER_PRIORITY_ACL_DFLT;
+		else if (is_link_local_ether_addr(eth_dst))
+			priority = ROCKER_PRIORITY_ACL_CTRL;
+	}
+
+	entry->key.priority = priority;
+	entry->key.tbl_id = ROCKER_OF_DPA_TABLE_ID_ACL_POLICY;
+	entry->key.acl.in_lport = in_lport;
+	entry->key.acl.in_lport_mask = in_lport_mask;
+
+	if (eth_src)
+		ether_addr_copy(entry->key.acl.eth_src, eth_src);
+	if (eth_src_mask)
+		ether_addr_copy(entry->key.acl.eth_src_mask, eth_src_mask);
+	if (eth_dst)
+		ether_addr_copy(entry->key.acl.eth_dst, eth_dst);
+	if (eth_dst_mask)
+		ether_addr_copy(entry->key.acl.eth_dst_mask, eth_dst_mask);
+
+	entry->key.acl.eth_type = eth_type;
+	entry->key.acl.vlan_id = vlan_id;
+	entry->key.acl.vlan_id_mask = vlan_id_mask;
+	entry->key.acl.ip_proto = ip_proto;
+	entry->key.acl.ip_proto_mask = ip_proto_mask;
+	entry->key.acl.ip_tos = ip_tos;
+	entry->key.acl.ip_tos_mask = ip_tos_mask;
+	entry->key.acl.group_id = group_id;
+
+	return rocker_flow_tbl_do(rocker_port, flags, entry);
+}
+
+static struct rocker_group_tbl_entry *
+rocker_group_tbl_find(struct rocker *rocker,
+		      struct rocker_group_tbl_entry *match)
+{
+	struct rocker_group_tbl_entry *found;
+
+	hash_for_each_possible(rocker->group_tbl, found,
+			       entry, match->group_id) {
+		if (found->group_id == match->group_id)
+			return found;
+	}
+
+	return NULL;
+}
+
+static void rocker_group_tbl_entry_free(struct rocker_group_tbl_entry *entry)
+{
+	switch (ROCKER_GROUP_TYPE_GET(entry->group_id)) {
+	case ROCKER_OF_DPA_GROUP_TYPE_L2_FLOOD:
+	case ROCKER_OF_DPA_GROUP_TYPE_L2_MCAST:
+		kfree(entry->group_ids);
+		break;
+	default:
+		break;
+	}
+	kfree(entry);
+}
+
+static int rocker_group_tbl_add(struct rocker_port *rocker_port,
+				struct rocker_group_tbl_entry *match,
+				bool nowait)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_group_tbl_entry *found;
+	unsigned long flags;
+	int err = 0;
+
+	spin_lock_irqsave(&rocker->group_tbl_lock, flags);
+
+	found = rocker_group_tbl_find(rocker, match);
+
+	if (found) {
+		hash_del(&found->entry);
+		rocker_group_tbl_entry_free(found);
+		found = match;
+		found->cmd = ROCKER_TLV_CMD_TYPE_OF_DPA_GROUP_MOD;
+	} else {
+		found = match;
+		found->cmd = ROCKER_TLV_CMD_TYPE_OF_DPA_GROUP_ADD;
+	}
+
+	hash_add(rocker->group_tbl, &found->entry, found->group_id);
+
+	spin_unlock_irqrestore(&rocker->group_tbl_lock, flags);
+
+	if (found->cmd)
+		err = rocker_cmd_exec(rocker, rocker_port,
+				      rocker_cmd_group_tbl_add,
+				      found, NULL, NULL, nowait);
+
+	return err;
+}
+
+static int rocker_group_tbl_del(struct rocker_port *rocker_port,
+				struct rocker_group_tbl_entry *match,
+				bool nowait)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_group_tbl_entry *found;
+	unsigned long flags;
+	int err = 0;
+
+	spin_lock_irqsave(&rocker->group_tbl_lock, flags);
+
+	found = rocker_group_tbl_find(rocker, match);
+
+	if (found) {
+		hash_del(&found->entry);
+		found->cmd = ROCKER_TLV_CMD_TYPE_OF_DPA_GROUP_DEL;
+	}
+
+	spin_unlock_irqrestore(&rocker->group_tbl_lock, flags);
+
+	rocker_group_tbl_entry_free(match);
+
+	if (found) {
+		err = rocker_cmd_exec(rocker, rocker_port,
+				      rocker_cmd_group_tbl_del,
+				      found, NULL, NULL, nowait);
+		rocker_group_tbl_entry_free(found);
+	}
+
+	return err;
+}
+
+static int rocker_group_tbl_do(struct rocker_port *rocker_port,
+			       int flags, struct rocker_group_tbl_entry *entry)
+{
+	bool nowait = flags & ROCKER_OP_FLAG_NOWAIT;
+
+	if (flags & ROCKER_OP_FLAG_REMOVE)
+		return rocker_group_tbl_del(rocker_port, entry, nowait);
+	else
+		return rocker_group_tbl_add(rocker_port, entry, nowait);
+}
+
+static int rocker_group_l2_interface(struct rocker_port *rocker_port,
+				     int flags, __be16 vlan_id,
+				     u32 out_lport, int pop_vlan)
+{
+	struct rocker_group_tbl_entry *entry;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	entry->group_id = ROCKER_GROUP_L2_INTERFACE(vlan_id, out_lport);
+	entry->l2_interface.pop_vlan = pop_vlan;
+
+	return rocker_group_tbl_do(rocker_port, flags, entry);
+}
+
+static int rocker_group_l2_fan_out(struct rocker_port *rocker_port,
+				   int flags, u8 group_count,
+				   u32 *group_ids, u32 group_id)
+{
+	struct rocker_group_tbl_entry *entry;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	entry->group_id = group_id;
+	entry->group_count = group_count;
+
+	entry->group_ids = kcalloc(group_count, sizeof(u32),
+				   rocker_op_flags_gfp(flags));
+	if (!entry->group_ids) {
+		kfree(entry);
+		return -ENOMEM;
+	}
+	memcpy(entry->group_ids, group_ids, group_count * sizeof(u32));
+
+	return rocker_group_tbl_do(rocker_port, flags, entry);
+}
+
+static int rocker_group_l2_flood(struct rocker_port *rocker_port,
+				 int flags, __be16 vlan_id,
+				 u8 group_count, u32 *group_ids,
+				 u32 group_id)
+{
+	return rocker_group_l2_fan_out(rocker_port, flags,
+				       group_count, group_ids,
+				       group_id);
+}
+
+static struct rocker_ctrl {
+	const u8 *eth_dst;
+	const u8 *eth_dst_mask;
+	u16 eth_type;
+	bool acl;
+	bool bridge;
+	bool term;
+	bool copy_to_cpu;
+} rocker_ctrls[] = {
+	[ROCKER_CTRL_LINK_LOCAL_MCAST] = {
+		/* pass link local multicast pkts up to CPU for filtering */
+		.eth_dst = ll_mac,
+		.eth_dst_mask = ll_mask,
+		.acl = true,
+	},
+	[ROCKER_CTRL_LOCAL_ARP] = {
+		/* pass local ARP pkts up to CPU */
+		.eth_dst = zero_mac,
+		.eth_dst_mask = zero_mac,
+		.eth_type = htons(ETH_P_ARP),
+		.acl = true,
+	},
+	[ROCKER_CTRL_IPV4_MCAST] = {
+		/* pass IPv4 mcast pkts up to CPU, RFC 1112 */
+		.eth_dst = ipv4_mcast,
+		.eth_dst_mask = ipv4_mask,
+		.eth_type = htons(ETH_P_IP),
+		.term  = true,
+		.copy_to_cpu = true,
+	},
+	[ROCKER_CTRL_IPV6_MCAST] = {
+		/* pass IPv6 mcast pkts up to CPU, RFC 2464 */
+		.eth_dst = ipv6_mcast,
+		.eth_dst_mask = ipv6_mask,
+		.eth_type = htons(ETH_P_IPV6),
+		.term  = true,
+		.copy_to_cpu = true,
+	},
+	[ROCKER_CTRL_DFLT_BRIDGING] = {
+		/* flood any pkts on vlan */
+		.bridge = true,
+		.copy_to_cpu = true,
+	},
+};
+
+static int rocker_port_ctrl_vlan_acl(struct rocker_port *rocker_port,
+				     int flags, struct rocker_ctrl *ctrl,
+				     __be16 vlan_id)
+{
+	u32 in_lport = rocker_port->lport;
+	u32 in_lport_mask = 0xffffffff;
+	u32 out_lport = 0;
+	u8 *eth_src = NULL;
+	u8 *eth_src_mask = NULL;
+	__be16 vlan_id_mask = htons(0xffff);
+	u8 ip_proto = 0;
+	u8 ip_proto_mask = 0;
+	u8 ip_tos = 0;
+	u8 ip_tos_mask = 0;
+	u32 group_id = ROCKER_GROUP_L2_INTERFACE(vlan_id, out_lport);
+	int err;
+
+	err = rocker_flow_tbl_acl(rocker_port, flags,
+				  in_lport, in_lport_mask,
+				  eth_src, eth_src_mask,
+				  ctrl->eth_dst, ctrl->eth_dst_mask,
+				  ctrl->eth_type,
+				  vlan_id, vlan_id_mask,
+				  ip_proto, ip_proto_mask,
+				  ip_tos, ip_tos_mask,
+				  group_id);
+
+	if (err)
+		netdev_err(rocker_port->dev, "Error (%d) ctrl ACL\n", err);
+
+	return err;
+}
+
+static int rocker_port_ctrl_vlan_term(struct rocker_port *rocker_port,
+				      int flags, struct rocker_ctrl *ctrl,
+				      __be16 vlan_id)
+{
+	u32 in_lport_mask = 0xffffffff;
+	__be16 vlan_id_mask = htons(0xffff);
+	int err;
+
+	if (ntohs(vlan_id) == 0)
+		vlan_id = rocker_port->internal_vlan_id;
+
+	err = rocker_flow_tbl_term_mac(rocker_port,
+				       rocker_port->lport, in_lport_mask,
+				       ctrl->eth_type, ctrl->eth_dst,
+				       ctrl->eth_dst_mask, vlan_id,
+				       vlan_id_mask, ctrl->copy_to_cpu,
+				       flags);
+
+	if (err)
+		netdev_err(rocker_port->dev, "Error (%d) ctrl term\n", err);
+
+	return err;
+}
+
+static int rocker_port_ctrl_vlan(struct rocker_port *rocker_port, int flags,
+				 struct rocker_ctrl *ctrl, __be16 vlan_id)
+{
+	if (ctrl->acl)
+		return rocker_port_ctrl_vlan_acl(rocker_port, flags,
+						 ctrl, vlan_id);
+
+	if (ctrl->term)
+		return rocker_port_ctrl_vlan_term(rocker_port, flags,
+						  ctrl, vlan_id);
+
+	return -EOPNOTSUPP;
+}
+
+static int rocker_port_ctrl_vlan_add(struct rocker_port *rocker_port,
+				     int flags, __be16 vlan_id)
+{
+	int err = 0;
+	int i;
+
+	for (i = 0; i < ROCKER_CTRL_MAX; i++) {
+		if (rocker_port->ctrls[i]) {
+			err = rocker_port_ctrl_vlan(rocker_port, flags,
+						    &rocker_ctrls[i], vlan_id);
+			if (err)
+				return err;
+		}
+	}
+
+	return err;
+}
+
+static int rocker_port_ctrl(struct rocker_port *rocker_port, int flags,
+			    struct rocker_ctrl *ctrl)
+{
+	u16 vid;
+	int err = 0;
+
+	for (vid = 1; vid < VLAN_N_VID; vid++) {
+		if (!test_bit(vid, rocker_port->vlan_bitmap))
+			continue;
+		err = rocker_port_ctrl_vlan(rocker_port, flags,
+					    ctrl, htons(vid));
+		if (err)
+			break;
+	}
+
+	return err;
+}
+
+static int rocker_port_ig_tbl(struct rocker_port *rocker_port, int flags)
+{
+	enum rocker_of_dpa_table_id goto_tbl;
+	u32 in_lport;
+	u32 in_lport_mask;
+	int err;
+
+	/* Normal Ethernet Frames.  Matches pkts from any local physical
+	 * ports.  Goto VLAN tbl.
+	 */
+
+	in_lport = 0;
+	in_lport_mask = 0xffff0000;
+	goto_tbl = ROCKER_OF_DPA_TABLE_ID_VLAN;
+
+	err = rocker_flow_tbl_ig_port(rocker_port, flags,
+				      in_lport, in_lport_mask,
+				      goto_tbl);
+	if (err)
+		netdev_err(rocker_port->dev,
+			   "Error (%d) ingress port table entry\n", err);
+
+	return err;
+}
+
+static int rocker_port_router_mac(struct rocker_port *rocker_port,
+				  int flags, __be16 vlan_id)
+{
+	u32 in_lport_mask = 0xffffffff;
+	__be16 eth_type;
+	const u8 *dst_mac_mask = ff_mac;
+	__be16 vlan_id_mask = htons(0xffff);
+	bool copy_to_cpu = false;
+	int err;
+
+	if (ntohs(vlan_id) == 0)
+		vlan_id = rocker_port->internal_vlan_id;
+
+	eth_type = htons(ETH_P_IP);
+	err = rocker_flow_tbl_term_mac(rocker_port,
+				       rocker_port->lport, in_lport_mask,
+				       eth_type, rocker_port->dev->dev_addr,
+				       dst_mac_mask, vlan_id, vlan_id_mask,
+				       copy_to_cpu, flags);
+	if (err)
+		return err;
+
+	eth_type = htons(ETH_P_IPV6);
+	err = rocker_flow_tbl_term_mac(rocker_port,
+				       rocker_port->lport, in_lport_mask,
+				       eth_type, rocker_port->dev->dev_addr,
+				       dst_mac_mask, vlan_id, vlan_id_mask,
+				       copy_to_cpu, flags);
+
+	return err;
+}
+
+static struct rocker_internal_vlan_tbl_entry *
+rocker_internal_vlan_tbl_find(struct rocker *rocker, int ifindex)
+{
+	struct rocker_internal_vlan_tbl_entry *found;
+
+	hash_for_each_possible(rocker->internal_vlan_tbl,
+			       found, entry, ifindex) {
+		if (found->ifindex == ifindex)
+			return found;
+	}
+
+	return NULL;
+}
+
+static __be16 rocker_port_internal_vlan_id_get(struct rocker_port *rocker_port,
+					       int ifindex)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_internal_vlan_tbl_entry *entry;
+	struct rocker_internal_vlan_tbl_entry *found;
+	unsigned long lock_flags;
+	int i;
+
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		return 0;
+
+	entry->ifindex = ifindex;
+
+	spin_lock_irqsave(&rocker->internal_vlan_tbl_lock, lock_flags);
+
+	found = rocker_internal_vlan_tbl_find(rocker, ifindex);
+	if (found) {
+		kfree(entry);
+		goto found;
+	}
+
+	found = entry;
+	hash_add(rocker->internal_vlan_tbl, &found->entry, found->ifindex);
+
+	for (i = 0; i < ROCKER_N_INTERNAL_VLANS; i++) {
+		if (test_and_set_bit(i, rocker->internal_vlan_bitmap))
+			continue;
+		found->vlan_id = htons(ROCKER_INTERNAL_VLAN_ID_BASE + i);
+		goto found;
+	}
+
+	netdev_err(rocker_port->dev, "Out of internal VLAN IDs\n");
+
+found:
+	found->ref_count++;
+	spin_unlock_irqrestore(&rocker->internal_vlan_tbl_lock, lock_flags);
+
+	return found->vlan_id;
+}
+
+static void rocker_port_internal_vlan_id_put(struct rocker_port *rocker_port,
+					     int ifindex)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_internal_vlan_tbl_entry *found;
+	unsigned long lock_flags;
+	unsigned long bit;
+
+	spin_lock_irqsave(&rocker->internal_vlan_tbl_lock, lock_flags);
+
+	found = rocker_internal_vlan_tbl_find(rocker, ifindex);
+	if (!found) {
+		netdev_err(rocker_port->dev,
+			   "ifindex (%d) not found in internal VLAN tbl\n",
+			   ifindex);
+		goto not_found;
+	}
+
+	if (--found->ref_count <= 0) {
+		bit = ntohs(found->vlan_id) - ROCKER_INTERNAL_VLAN_ID_BASE;
+		clear_bit(bit, rocker->internal_vlan_bitmap);
+		hash_del(&found->entry);
+		kfree(found);
+	}
+
+not_found:
+	spin_unlock_irqrestore(&rocker->internal_vlan_tbl_lock, lock_flags);
+}
+
 /*****************
  * Net device ops
  *****************/
@@ -1768,10 +3209,14 @@ static void rocker_carrier_init(struct rocker_port *rocker_port)
 
 static void rocker_remove_ports(struct rocker *rocker)
 {
+	struct rocker_port *rocker_port;
 	int i;
 
-	for (i = 0; i < rocker->port_count; i++)
-		unregister_netdev(rocker->ports[i]->dev);
+	for (i = 0; i < rocker->port_count; i++) {
+		rocker_port = rocker->ports[i];
+		rocker_port_ig_tbl(rocker_port, ROCKER_OP_FLAG_REMOVE);
+		unregister_netdev(rocker_port->dev);
+	}
 	kfree(rocker->ports);
 }
 
@@ -1823,8 +3268,18 @@ static int rocker_probe_port(struct rocker *rocker, unsigned int port_number)
 	}
 	rocker->ports[port_number] = rocker_port;
 
+	rocker_port->internal_vlan_id =
+		rocker_port_internal_vlan_id_get(rocker_port, dev->ifindex);
+	err = rocker_port_ig_tbl(rocker_port, 0);
+	if (err) {
+		dev_err(&pdev->dev, "install ig port table failed\n");
+		goto err_port_ig_tbl;
+	}
+
 	return 0;
 
+err_port_ig_tbl:
+	unregister_netdev(dev);
 err_register_netdev:
 	free_netdev(dev);
 	return err;
@@ -1981,6 +3436,12 @@ static int rocker_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	rocker->hw.id = rocker_read64(rocker, SWITCH_ID);
 
+	err = rocker_init_tbls(rocker);
+	if (err) {
+		dev_err(&pdev->dev, "cannot init rocker tables\n");
+		goto err_init_tbls;
+	}
+
 	err = rocker_probe_ports(rocker);
 	if (err) {
 		dev_err(&pdev->dev, "failed to probe ports\n");
@@ -1992,6 +3453,8 @@ static int rocker_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	return 0;
 
 err_probe_ports:
+	rocker_free_tbls(rocker);
+err_init_tbls:
 	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_EVENT), rocker);
 err_request_event_irq:
 	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_CMD), rocker);
@@ -2017,6 +3480,7 @@ static void rocker_remove(struct pci_dev *pdev)
 {
 	struct rocker *rocker = pci_get_drvdata(pdev);
 
+	rocker_free_tbls(rocker);
 	rocker_write32(rocker, CONTROL, ROCKER_CONTROL_RESET);
 	rocker_remove_ports(rocker);
 	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_EVENT), rocker);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [patch net-next v2 10/10] rocker: implement L2 bridge offloading
  2014-11-09 10:51 [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload Jiri Pirko
                   ` (8 preceding siblings ...)
  2014-11-09 10:51 ` [patch net-next v2 09/10] rocker: implement rocker ofdpa flow table manipulation Jiri Pirko
@ 2014-11-09 10:51 ` Jiri Pirko
  2014-11-10  3:53   ` Jamal Hadi Salim
  2014-11-09 16:40 ` [patch net-next] bridge: rename fdb_*_hw to fdb_*_hw_addr to avoid confusion Jiri Pirko
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2014-11-09 10:51 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

From: Scott Feldman <sfeldma@gmail.com>

Add L2 bridge offloading support to rocker driver.  Here, the Linux bridge
driver is used to collect swdev ports into a tagged (or untagged) VLAN
bridge.  The swdev will offload from the bridge driver the following L2
bridging functions:

 - Learning of neighbor MAC addresses on VLAN X  Learned mac/vlan is
installed in bridge FDB.  (And removed when device unlearns mac/vlan).
Learning must be turned off on each bridge port to disable the feature in
the bridge driver.

- Flooding of multicast/broadcast and unknown unicast pkts to (STP)
active ports in bridge.  The bridge driver is unaware of the flooding happening
at the device level.  Flooding must be turned off on each bridge port to
disable the feature on the bridge driver.

- STP port state is pushed down to driver/device.  The bridge still processes
STP BDPUs and maintains port STP state (for all VLANs in bridge), but
the driver/device must be notified of port STP state change to program
the device.

Multiple (VLAN) bridges are supported.  The device (implemented per
the OF-DPA spec) must use a portion of the VLAN namespace for
internal VLANs.  Right now, the upper 255 VLANs (0xf00 to 0xffe) are
used as internal VLAN IDs for untagged traffic and are not available
as port VLANs.

The driver uses the following interfaces:

1. To track VLAN add/del on ports in bridge:

.ndo_vlan_rx_add_vid
.ndo_vlan_rx_kill_vid

2. To track port add/del membership in bridge:

NETDEV_CHANGEUPPER netdevice notifier

3. To catch static FDB entries installed on bridge/vlan by user using netlink:

.ndo_sw_port_fdb_add
.ndo_sw_port_fdb_del

4. To be notified on port STP state change:

.ndo_sw_port_stp_update

5. To notify bridge driver on learned/forgotten mac/vlans on bridge port:

br_fdb_learn_add
br_fdb_learn_del

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 drivers/net/ethernet/rocker/rocker.c | 660 ++++++++++++++++++++++++++++++++++-
 1 file changed, 659 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index ea58a4f..17ffa30 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -28,6 +28,7 @@
 #include <linux/ethtool.h>
 #include <linux/if_ether.h>
 #include <linux/if_vlan.h>
+#include <linux/if_bridge.h>
 #include <linux/bitops.h>
 #include <net/switchdev.h>
 #include <net/rtnetlink.h>
@@ -196,10 +197,12 @@ enum {
 
 struct rocker_port {
 	struct net_device *dev;
+	struct net_device *bridge_dev;
 	struct rocker *rocker;
 	unsigned int port_number;
 	u32 lport;
 	__be16 internal_vlan_id;
+	int stp_state;
 	bool ctrls[ROCKER_CTRL_MAX];
 	unsigned long vlan_bitmap[ROCKER_VLAN_BITMAP_LEN];
 	struct napi_struct napi_tx;
@@ -290,6 +293,20 @@ static __be16 rocker_port_vid_to_vlan(struct rocker_port *rocker_port,
 	return vlan_id;
 }
 
+static u16 rocker_port_vlan_to_vid(struct rocker_port *rocker_port,
+				   __be16 vlan_id)
+{
+	if (rocker_vlan_id_is_internal(vlan_id))
+		return 0;
+
+	return ntohs(vlan_id);
+}
+
+static bool rocker_port_is_bridged(struct rocker_port *rocker_port)
+{
+	return !!rocker_port->bridge_dev;
+}
+
 struct rocker_wait {
 	wait_queue_head_t wait;
 	bool done;
@@ -1302,6 +1319,42 @@ static int rocker_event_link_change(struct rocker *rocker,
 #define ROCKER_OP_FLAG_NOWAIT		BIT(1)
 #define ROCKER_OP_FLAG_LEARNED		BIT(2)
 
+static int rocker_port_fdb(struct rocker_port *rocker_port,
+			   const unsigned char *addr,
+			   __be16 vlan_id, int flags);
+
+static int rocker_event_mac_vlan_seen(struct rocker *rocker,
+				      const struct rocker_tlv *info)
+{
+	struct rocker_tlv *attrs[ROCKER_TLV_EVENT_MAC_VLAN_MAX + 1];
+	unsigned int port_number;
+	struct rocker_port *rocker_port;
+	unsigned char *addr;
+	int flags = ROCKER_OP_FLAG_NOWAIT | ROCKER_OP_FLAG_LEARNED;
+	__be16 vlan_id;
+
+	rocker_tlv_parse_nested(attrs, ROCKER_TLV_EVENT_MAC_VLAN_MAX, info);
+	if (!attrs[ROCKER_TLV_EVENT_MAC_VLAN_LPORT] ||
+	    !attrs[ROCKER_TLV_EVENT_MAC_VLAN_MAC] ||
+	    !attrs[ROCKER_TLV_EVENT_MAC_VLAN_VLAN_ID])
+		return -EIO;
+	port_number =
+		rocker_tlv_get_u32(attrs[ROCKER_TLV_EVENT_MAC_VLAN_LPORT]) - 1;
+	addr = rocker_tlv_data(attrs[ROCKER_TLV_EVENT_MAC_VLAN_MAC]);
+	vlan_id = rocker_tlv_get_u16(attrs[ROCKER_TLV_EVENT_MAC_VLAN_VLAN_ID]);
+
+	if (port_number >= rocker->port_count)
+		return -EINVAL;
+
+	rocker_port = rocker->ports[port_number];
+
+	if (rocker_port->stp_state != BR_STATE_LEARNING &&
+	    rocker_port->stp_state != BR_STATE_FORWARDING)
+		return 0;
+
+	return rocker_port_fdb(rocker_port, addr, vlan_id, flags);
+}
+
 static int rocker_event_process(struct rocker *rocker,
 				struct rocker_desc_info *desc_info)
 {
@@ -1320,6 +1373,8 @@ static int rocker_event_process(struct rocker *rocker,
 	switch (type) {
 	case ROCKER_TLV_EVENT_TYPE_LINK_CHANGED:
 		return rocker_event_link_change(rocker, info);
+	case ROCKER_TLV_EVENT_TYPE_MAC_VLAN_SEEN:
+		return rocker_event_mac_vlan_seen(rocker, info);
 	}
 
 	return -EOPNOTSUPP;
@@ -2546,6 +2601,104 @@ static int rocker_group_l2_flood(struct rocker_port *rocker_port,
 				       group_id);
 }
 
+static int rocker_port_vlan_flood_group(struct rocker_port *rocker_port,
+					int flags, __be16 vlan_id)
+{
+	struct rocker_port *p;
+	struct rocker *rocker = rocker_port->rocker;
+	u32 group_id = ROCKER_GROUP_L2_FLOOD(vlan_id, 0);
+	u32 group_ids[rocker->port_count];
+	u8 group_count = 0;
+	int err;
+	int i;
+
+	/* Adjust the flood group for this VLAN.  The flood group
+	 * references an L2 interface group for each port in this
+	 * VLAN.
+	 */
+
+	for (i = 0; i < rocker->port_count; i++) {
+		p = rocker->ports[i];
+		if (!rocker_port_is_bridged(p))
+			continue;
+		if (test_bit(ntohs(vlan_id), p->vlan_bitmap)) {
+			group_ids[group_count++] =
+				ROCKER_GROUP_L2_INTERFACE(vlan_id,
+							  p->lport);
+		}
+	}
+
+	/* If there are no bridged ports in this VLAN, we're done */
+	if (group_count == 0)
+		return 0;
+
+	err = rocker_group_l2_flood(rocker_port, flags, vlan_id,
+				    group_count, group_ids,
+				    group_id);
+	if (err)
+		netdev_err(rocker_port->dev,
+			   "Error (%d) port VLAN l2 flood group\n", err);
+
+	return err;
+}
+
+static int rocker_port_vlan_l2_groups(struct rocker_port *rocker_port,
+				      int flags, __be16 vlan_id,
+				      bool pop_vlan)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_port *p;
+	bool adding = !(flags & ROCKER_OP_FLAG_REMOVE);
+	u32 out_lport;
+	int ref = 0;
+	int err;
+	int i;
+
+	/* An L2 interface group for this port in this VLAN, but
+	 * only when port STP state is LEARNING|FORWARDING.
+	 */
+
+	if (rocker_port->stp_state == BR_STATE_LEARNING ||
+	    rocker_port->stp_state == BR_STATE_FORWARDING) {
+		out_lport = rocker_port->lport;
+		err = rocker_group_l2_interface(rocker_port, flags,
+						vlan_id, out_lport,
+						pop_vlan);
+		if (err) {
+			netdev_err(rocker_port->dev,
+				   "Error (%d) port VLAN l2 group for lport %d\n",
+				   err, out_lport);
+			return err;
+		}
+	}
+
+	/* An L2 interface group for this VLAN to CPU port.
+	 * Add when first port joins this VLAN and destroy when
+	 * last port leaves this VLAN.
+	 */
+
+	for (i = 0; i < rocker->port_count; i++) {
+		p = rocker->ports[i];
+		if (test_bit(ntohs(vlan_id), p->vlan_bitmap))
+			ref++;
+	}
+
+	if ((!adding || ref != 1) && (adding || ref != 0))
+		return 0;
+
+	out_lport = 0;
+	err = rocker_group_l2_interface(rocker_port, flags,
+					vlan_id, out_lport,
+					pop_vlan);
+	if (err) {
+		netdev_err(rocker_port->dev,
+			   "Error (%d) port VLAN l2 group for CPU port\n", err);
+		return err;
+	}
+
+	return 0;
+}
+
 static struct rocker_ctrl {
 	const u8 *eth_dst;
 	const u8 *eth_dst_mask;
@@ -2624,6 +2777,30 @@ static int rocker_port_ctrl_vlan_acl(struct rocker_port *rocker_port,
 	return err;
 }
 
+static int rocker_port_ctrl_vlan_bridge(struct rocker_port *rocker_port,
+					int flags, struct rocker_ctrl *ctrl,
+					__be16 vlan_id)
+{
+	enum rocker_of_dpa_table_id goto_tbl =
+		ROCKER_OF_DPA_TABLE_ID_ACL_POLICY;
+	u32 group_id = ROCKER_GROUP_L2_FLOOD(vlan_id, 0);
+	u32 tunnel_id = 0;
+	int err;
+
+	if (!rocker_port_is_bridged(rocker_port))
+		return 0;
+
+	err = rocker_flow_tbl_bridge(rocker_port, flags,
+				     ctrl->eth_dst, ctrl->eth_dst_mask,
+				     vlan_id, tunnel_id,
+				     goto_tbl, group_id, ctrl->copy_to_cpu);
+
+	if (err)
+		netdev_err(rocker_port->dev, "Error (%d) ctrl FLOOD\n", err);
+
+	return err;
+}
+
 static int rocker_port_ctrl_vlan_term(struct rocker_port *rocker_port,
 				      int flags, struct rocker_ctrl *ctrl,
 				      __be16 vlan_id)
@@ -2654,6 +2831,9 @@ static int rocker_port_ctrl_vlan(struct rocker_port *rocker_port, int flags,
 	if (ctrl->acl)
 		return rocker_port_ctrl_vlan_acl(rocker_port, flags,
 						 ctrl, vlan_id);
+	if (ctrl->bridge)
+		return rocker_port_ctrl_vlan_bridge(rocker_port, flags,
+						    ctrl, vlan_id);
 
 	if (ctrl->term)
 		return rocker_port_ctrl_vlan_term(rocker_port, flags,
@@ -2698,6 +2878,64 @@ static int rocker_port_ctrl(struct rocker_port *rocker_port, int flags,
 	return err;
 }
 
+static int rocker_port_vlan(struct rocker_port *rocker_port, int flags,
+			    u16 vid)
+{
+	enum rocker_of_dpa_table_id goto_tbl =
+		ROCKER_OF_DPA_TABLE_ID_TERMINATION_MAC;
+	u32 in_lport = rocker_port->lport;
+	__be16 vlan_id = htons(vid);
+	__be16 vlan_id_mask = htons(0xffff);
+	__be16 internal_vlan_id;
+	bool untagged;
+	bool adding = !(flags & ROCKER_OP_FLAG_REMOVE);
+	int err;
+
+	internal_vlan_id = rocker_port_vid_to_vlan(rocker_port, vid, &untagged);
+
+	if (adding && test_and_set_bit(ntohs(internal_vlan_id),
+				       rocker_port->vlan_bitmap))
+			return 0; /* already added */
+	else if (!adding && !test_and_clear_bit(ntohs(internal_vlan_id),
+						rocker_port->vlan_bitmap))
+			return 0; /* already removed */
+
+	if (adding) {
+		err = rocker_port_ctrl_vlan_add(rocker_port, flags,
+						internal_vlan_id);
+		if (err) {
+			netdev_err(rocker_port->dev,
+				   "Error (%d) port ctrl vlan add\n", err);
+			return err;
+		}
+	}
+
+	err = rocker_port_vlan_l2_groups(rocker_port, flags,
+					 internal_vlan_id, untagged);
+	if (err) {
+		netdev_err(rocker_port->dev,
+			   "Error (%d) port VLAN l2 groups\n", err);
+		return err;
+	}
+
+	err = rocker_port_vlan_flood_group(rocker_port, flags,
+					   internal_vlan_id);
+	if (err) {
+		netdev_err(rocker_port->dev,
+			   "Error (%d) port VLAN l2 flood group\n", err);
+		return err;
+	}
+
+	err = rocker_flow_tbl_vlan(rocker_port, flags,
+				   in_lport, vlan_id, vlan_id_mask,
+				   goto_tbl, untagged, internal_vlan_id);
+	if (err)
+		netdev_err(rocker_port->dev,
+			   "Error (%d) port VLAN table\n", err);
+
+	return err;
+}
+
 static int rocker_port_ig_tbl(struct rocker_port *rocker_port, int flags)
 {
 	enum rocker_of_dpa_table_id goto_tbl;
@@ -2723,6 +2961,158 @@ static int rocker_port_ig_tbl(struct rocker_port *rocker_port, int flags)
 	return err;
 }
 
+struct rocker_fdb_learn_work {
+	struct work_struct work;
+	struct net_device *dev;
+	int flags;
+	u8 addr[ETH_ALEN];
+	u16 vid;
+};
+
+static void rocker_port_fdb_learn_work(struct work_struct *work)
+{
+	struct rocker_fdb_learn_work *lw =
+		container_of(work, struct rocker_fdb_learn_work, work);
+	bool removing = (lw->flags & ROCKER_OP_FLAG_REMOVE);
+	bool learned = (lw->flags & ROCKER_OP_FLAG_LEARNED);
+
+	if (learned & removing)
+		br_fdb_learn_del(lw->dev, lw->addr, lw->vid);
+	else if (learned & !removing)
+		br_fdb_learn_add(lw->dev, lw->addr, lw->vid);
+
+	kfree(work);
+}
+
+static int rocker_port_fdb_learn(struct rocker_port *rocker_port,
+				 int flags, const u8 *addr, __be16 vlan_id)
+{
+	struct rocker_fdb_learn_work *lw;
+	enum rocker_of_dpa_table_id goto_tbl =
+		ROCKER_OF_DPA_TABLE_ID_ACL_POLICY;
+	u32 out_lport = rocker_port->lport;
+	u32 tunnel_id = 0;
+	u32 group_id = ROCKER_GROUP_NONE;
+	bool copy_to_cpu = false;
+	int err;
+
+	if (rocker_port_is_bridged(rocker_port))
+		group_id = ROCKER_GROUP_L2_INTERFACE(vlan_id, out_lport);
+
+	err = rocker_flow_tbl_bridge(rocker_port, flags, addr, NULL,
+				     vlan_id, tunnel_id, goto_tbl,
+				     group_id, copy_to_cpu);
+	if (err)
+		return err;
+
+	if (!rocker_port_is_bridged(rocker_port))
+		return 0;
+
+	lw = kmalloc(sizeof(*lw), rocker_op_flags_gfp(flags));
+	if (!lw)
+		return -ENOMEM;
+
+	INIT_WORK(&lw->work, rocker_port_fdb_learn_work);
+
+	lw->dev = rocker_port->dev;
+	lw->flags = flags;
+	ether_addr_copy(lw->addr, addr);
+	lw->vid = rocker_port_vlan_to_vid(rocker_port, vlan_id);
+
+	schedule_work(&lw->work);
+
+	return 0;
+}
+
+static struct rocker_fdb_tbl_entry *
+rocker_fdb_tbl_find(struct rocker *rocker, struct rocker_fdb_tbl_entry *match)
+{
+	struct rocker_fdb_tbl_entry *found;
+
+	hash_for_each_possible(rocker->fdb_tbl, found, entry, match->key_crc32)
+		if (memcmp(&found->key, &match->key, sizeof(found->key)) == 0)
+			return found;
+
+	return NULL;
+}
+
+static int rocker_port_fdb(struct rocker_port *rocker_port,
+			   const unsigned char *addr,
+			   __be16 vlan_id, int flags)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_fdb_tbl_entry *fdb;
+	struct rocker_fdb_tbl_entry *found;
+	bool removing = (flags & ROCKER_OP_FLAG_REMOVE);
+	unsigned long lock_flags;
+
+	fdb = kzalloc(sizeof(*fdb), rocker_op_flags_gfp(flags));
+	if (!fdb)
+		return -ENOMEM;
+
+	fdb->learned = (flags & ROCKER_OP_FLAG_LEARNED);
+	fdb->key.lport = rocker_port->lport;
+	ether_addr_copy(fdb->key.addr, addr);
+	fdb->key.vlan_id = vlan_id;
+	fdb->key_crc32 = crc32(~0, &fdb->key, sizeof(fdb->key));
+
+	spin_lock_irqsave(&rocker->fdb_tbl_lock, lock_flags);
+
+	found = rocker_fdb_tbl_find(rocker, fdb);
+
+	if (removing && found) {
+		kfree(fdb);
+		hash_del(&found->entry);
+	} else if (!removing && !found) {
+		hash_add(rocker->fdb_tbl, &fdb->entry, fdb->key_crc32);
+	}
+
+	spin_unlock_irqrestore(&rocker->fdb_tbl_lock, lock_flags);
+
+	/* Check if adding and already exists, or removing and can't find */
+	if (!found != !removing) {
+		kfree(fdb);
+		return 0;
+	}
+
+	return rocker_port_fdb_learn(rocker_port, flags, addr, vlan_id);
+}
+
+static int rocker_port_fdb_flush(struct rocker_port *rocker_port)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_fdb_tbl_entry *found;
+	unsigned long lock_flags;
+	int flags = ROCKER_OP_FLAG_NOWAIT | ROCKER_OP_FLAG_REMOVE;
+	struct hlist_node *tmp;
+	int bkt;
+	int err = 0;
+
+	if (rocker_port->stp_state == BR_STATE_LEARNING ||
+	    rocker_port->stp_state == BR_STATE_FORWARDING)
+		return 0;
+
+	spin_lock_irqsave(&rocker->fdb_tbl_lock, lock_flags);
+
+	hash_for_each_safe(rocker->fdb_tbl, bkt, tmp, found, entry) {
+		if (found->key.lport != rocker_port->lport)
+			continue;
+		if (!found->learned)
+			continue;
+		err = rocker_port_fdb_learn(rocker_port, flags,
+					    found->key.addr,
+					    found->key.vlan_id);
+		if (err)
+			goto err_out;
+		hash_del(&found->entry);
+	}
+
+err_out:
+	spin_unlock_irqrestore(&rocker->fdb_tbl_lock, lock_flags);
+
+	return err;
+}
+
 static int rocker_port_router_mac(struct rocker_port *rocker_port,
 				  int flags, __be16 vlan_id)
 {
@@ -2755,6 +3145,98 @@ static int rocker_port_router_mac(struct rocker_port *rocker_port,
 	return err;
 }
 
+static int rocker_port_fwding(struct rocker_port *rocker_port)
+{
+	bool pop_vlan;
+	u32 out_lport;
+	__be16 vlan_id;
+	u16 vid;
+	int flags = ROCKER_OP_FLAG_NOWAIT;
+	int err;
+
+	/* Port will be forwarding-enabled if its STP state is LEARNING
+	 * or FORWARDING.  Traffic from CPU can still egress, regardless of
+	 * port STP state.  Use L2 interface group on port VLANs as a way
+	 * to toggle port forwarding: if forwarding is disabled, L2
+	 * interface group will not exist.
+	 */
+
+	if (rocker_port->stp_state != BR_STATE_LEARNING &&
+	    rocker_port->stp_state != BR_STATE_FORWARDING)
+		flags |= ROCKER_OP_FLAG_REMOVE;
+
+	out_lport = rocker_port->lport;
+	for (vid = 1; vid < VLAN_N_VID; vid++) {
+		if (!test_bit(vid, rocker_port->vlan_bitmap))
+			continue;
+		vlan_id = htons(vid);
+		pop_vlan = rocker_vlan_id_is_internal(vlan_id);
+		err = rocker_group_l2_interface(rocker_port, flags,
+						vlan_id, out_lport,
+						pop_vlan);
+		if (err) {
+			netdev_err(rocker_port->dev,
+				   "Error (%d) port VLAN l2 group for lport %d\n",
+				   err, out_lport);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+static int rocker_port_stp_update(struct net_device *dev, u8 state)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	bool want[ROCKER_CTRL_MAX] = { 0, };
+	int flags;
+	int err;
+	int i;
+
+	if (rocker_port->stp_state == state)
+		return 0;
+
+	rocker_port->stp_state = state;
+
+	switch (state) {
+	case BR_STATE_DISABLED:
+		/* port is completely disabled */
+		break;
+	case BR_STATE_LISTENING:
+	case BR_STATE_BLOCKING:
+		want[ROCKER_CTRL_LINK_LOCAL_MCAST] = true;
+		break;
+	case BR_STATE_LEARNING:
+	case BR_STATE_FORWARDING:
+		want[ROCKER_CTRL_LINK_LOCAL_MCAST] = true;
+		want[ROCKER_CTRL_IPV4_MCAST] = true;
+		want[ROCKER_CTRL_IPV6_MCAST] = true;
+		if (rocker_port_is_bridged(rocker_port))
+			want[ROCKER_CTRL_DFLT_BRIDGING] = true;
+		else
+			want[ROCKER_CTRL_LOCAL_ARP] = true;
+		break;
+	}
+
+	for (i = 0; i < ROCKER_CTRL_MAX; i++) {
+		if (want[i] != rocker_port->ctrls[i]) {
+			flags = ROCKER_OP_FLAG_NOWAIT |
+				(want[i] ? 0 : ROCKER_OP_FLAG_REMOVE);
+			err = rocker_port_ctrl(rocker_port, flags,
+					       &rocker_ctrls[i]);
+			if (err)
+				return err;
+			rocker_port->ctrls[i] = want[i];
+		}
+	}
+
+	err = rocker_port_fdb_flush(rocker_port);
+	if (err)
+		return err;
+
+	return rocker_port_fwding(rocker_port);
+}
+
 static struct rocker_internal_vlan_tbl_entry *
 rocker_internal_vlan_tbl_find(struct rocker *rocker, int ifindex)
 {
@@ -2847,6 +3329,8 @@ not_found:
 static int rocker_port_open(struct net_device *dev)
 {
 	struct rocker_port *rocker_port = netdev_priv(dev);
+	u8 stp_state = rocker_port_is_bridged(rocker_port) ?
+		BR_STATE_BLOCKING : BR_STATE_FORWARDING;
 	int err;
 
 	err = rocker_port_dma_rings_init(rocker_port);
@@ -2869,12 +3353,18 @@ static int rocker_port_open(struct net_device *dev)
 		goto err_request_rx_irq;
 	}
 
+	err = rocker_port_stp_update(dev, stp_state);
+	if (err)
+		goto err_stp_update;
+
 	napi_enable(&rocker_port->napi_tx);
 	napi_enable(&rocker_port->napi_rx);
 	rocker_port_set_enable(rocker_port, true);
 	netif_start_queue(dev);
 	return 0;
 
+err_stp_update:
+	free_irq(rocker_msix_rx_vector(rocker_port), rocker_port);
 err_request_rx_irq:
 	free_irq(rocker_msix_tx_vector(rocker_port), rocker_port);
 err_request_tx_irq:
@@ -2890,6 +3380,7 @@ static int rocker_port_stop(struct net_device *dev)
 	rocker_port_set_enable(rocker_port, false);
 	napi_disable(&rocker_port->napi_rx);
 	napi_disable(&rocker_port->napi_tx);
+	rocker_port_stp_update(dev, BR_STATE_DISABLED);
 	free_irq(rocker_msix_rx_vector(rocker_port), rocker_port);
 	free_irq(rocker_msix_tx_vector(rocker_port), rocker_port);
 	rocker_port_dma_rings_fini(rocker_port);
@@ -3034,6 +3525,33 @@ static int rocker_port_set_mac_address(struct net_device *dev, void *p)
 	return 0;
 }
 
+static int rocker_port_vlan_rx_add_vid(struct net_device *dev,
+				       __be16 proto, u16 vid)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	int err;
+
+	err = rocker_port_vlan(rocker_port, 0, vid);
+	if (err)
+		return err;
+
+	return rocker_port_router_mac(rocker_port, 0, htons(vid));
+}
+
+static int rocker_port_vlan_rx_kill_vid(struct net_device *dev,
+					__be16 proto, u16 vid)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	int err;
+
+	err = rocker_port_router_mac(rocker_port, ROCKER_OP_FLAG_REMOVE,
+				     htons(vid));
+	if (err)
+		return err;
+
+	return rocker_port_vlan(rocker_port, ROCKER_OP_FLAG_REMOVE, vid);
+}
+
 static int rocker_port_sw_parent_id_get(struct net_device *dev,
 					struct netdev_phys_item_id *psid)
 {
@@ -3045,12 +3563,48 @@ static int rocker_port_sw_parent_id_get(struct net_device *dev,
 	return 0;
 }
 
+static int rocker_port_sw_port_fdb_add(struct net_device *dev,
+				       const unsigned char *addr, u16 vid)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	__be16 vlan_id = rocker_port_vid_to_vlan(rocker_port, vid, NULL);
+	int flags = 0;
+
+	if (!rocker_port_is_bridged(rocker_port))
+		return -EINVAL;
+
+	return rocker_port_fdb(rocker_port, addr, vlan_id, flags);
+}
+
+static int rocker_port_sw_port_fdb_del(struct net_device *dev,
+				       const unsigned char *addr, u16 vid)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	__be16 vlan_id = rocker_port_vid_to_vlan(rocker_port, vid, NULL);
+	int flags = ROCKER_OP_FLAG_REMOVE;
+
+	if (!rocker_port_is_bridged(rocker_port))
+		return -EINVAL;
+
+	return rocker_port_fdb(rocker_port, addr, vlan_id, flags);
+}
+
+static int rocker_port_sw_port_stp_update(struct net_device *dev, u8 state)
+{
+	return rocker_port_stp_update(dev, state);
+}
+
 static const struct net_device_ops rocker_port_netdev_ops = {
 	.ndo_open			= rocker_port_open,
 	.ndo_stop			= rocker_port_stop,
 	.ndo_start_xmit			= rocker_port_xmit,
 	.ndo_set_mac_address		= rocker_port_set_mac_address,
+	.ndo_vlan_rx_add_vid		= rocker_port_vlan_rx_add_vid,
+	.ndo_vlan_rx_kill_vid		= rocker_port_vlan_rx_kill_vid,
 	.ndo_sw_parent_id_get		= rocker_port_sw_parent_id_get,
+	.ndo_sw_port_fdb_add		= rocker_port_sw_port_fdb_add,
+	.ndo_sw_port_fdb_del		= rocker_port_sw_port_fdb_del,
+	.ndo_sw_port_stp_update		= rocker_port_sw_port_stp_update,
 };
 
 /********************
@@ -3500,17 +4054,121 @@ static struct pci_driver rocker_pci_driver = {
 	.remove		= rocker_remove,
 };
 
+/************************************
+ * Net device notifier event handler
+ ************************************/
+
+static bool rocker_port_dev_check(struct net_device *dev)
+{
+	return dev->netdev_ops == &rocker_port_netdev_ops;
+}
+
+static int rocker_port_bridge_join(struct rocker_port *rocker_port,
+				   struct net_device *bridge)
+{
+	int err;
+
+	rocker_port_internal_vlan_id_put(rocker_port,
+					 rocker_port->dev->ifindex);
+
+	rocker_port->bridge_dev = bridge;
+
+	/* Use bridge internal VLAN ID for untagged pkts */
+	err = rocker_port_vlan(rocker_port, ROCKER_OP_FLAG_REMOVE, 0);
+	if (err)
+		return err;
+	rocker_port->internal_vlan_id =
+		rocker_port_internal_vlan_id_get(rocker_port,
+						 bridge->ifindex);
+	err = rocker_port_vlan(rocker_port, 0, 0);
+
+	return err;
+}
+
+static int rocker_port_bridge_leave(struct rocker_port *rocker_port)
+{
+	int err;
+
+	rocker_port_internal_vlan_id_put(rocker_port,
+					 rocker_port->bridge_dev->ifindex);
+
+	rocker_port->bridge_dev = NULL;
+
+	/* Use port internal VLAN ID for untagged pkts */
+	err = rocker_port_vlan(rocker_port, ROCKER_OP_FLAG_REMOVE, 0);
+	if (err)
+		return err;
+	rocker_port->internal_vlan_id =
+		rocker_port_internal_vlan_id_get(rocker_port,
+						 rocker_port->dev->ifindex);
+	err = rocker_port_vlan(rocker_port, 0, 0);
+
+	return err;
+}
+
+static int rocker_port_master_changed(struct net_device *dev)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	struct net_device *master = netdev_master_upper_dev_get(dev);
+	int err = 0;
+
+	if (master && master->rtnl_link_ops &&
+	    !strcmp(master->rtnl_link_ops->kind, "bridge"))
+		err = rocker_port_bridge_join(rocker_port, master);
+	else
+		err = rocker_port_bridge_leave(rocker_port);
+
+	return err;
+}
+
+static int rocker_netdevice_event(struct notifier_block *unused,
+				  unsigned long event, void *ptr)
+{
+	struct net_device *dev;
+	int err;
+
+	switch (event) {
+	case NETDEV_CHANGEUPPER:
+		dev = netdev_notifier_info_to_dev(ptr);
+		if (!rocker_port_dev_check(dev))
+			return NOTIFY_DONE;
+		err = rocker_port_master_changed(dev);
+		if (err)
+			netdev_warn(dev,
+				    "failed to reflect master change (err %d)\n",
+				    err);
+		break;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block rocker_netdevice_nb __read_mostly = {
+	.notifier_call = rocker_netdevice_event,
+};
+
 /***********************
  * Module init and exit
  ***********************/
 
 static int __init rocker_module_init(void)
 {
-	return pci_register_driver(&rocker_pci_driver);
+	int err;
+
+	register_netdevice_notifier(&rocker_netdevice_nb);
+	err = pci_register_driver(&rocker_pci_driver);
+	if (err)
+		goto err_pci_register_driver;
+	return 0;
+
+err_pci_register_driver:
+	unregister_netdevice_notifier(&rocker_netdevice_nb);
+	return err;
 }
 
 static void __exit rocker_module_exit(void)
 {
+	unregister_netdevice_notifier(&rocker_netdevice_nb);
 	pci_unregister_driver(&rocker_pci_driver);
 }
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* [patch net-next] bridge: rename fdb_*_hw to fdb_*_hw_addr to avoid confusion
  2014-11-09 10:51 [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload Jiri Pirko
                   ` (9 preceding siblings ...)
  2014-11-09 10:51 ` [patch net-next v2 10/10] rocker: implement L2 bridge offloading Jiri Pirko
@ 2014-11-09 16:40 ` Jiri Pirko
  2014-11-11  2:33   ` David Miller
  2014-11-10  3:31 ` [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload Jamal Hadi Salim
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2014-11-09 16:40 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

The current name might seem that this actually offloads the fdb entry to
hw. So rename it to clearly present that this for hardware address
addition/removal.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 net/bridge/br_fdb.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index e02d21b..3886f84 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -91,7 +91,7 @@ static void fdb_rcu_free(struct rcu_head *head)
  * are then updated with the new information.
  * Called under RTNL.
  */
-static void fdb_add_hw(struct net_bridge *br, const unsigned char *addr)
+static void fdb_add_hw_addr(struct net_bridge *br, const unsigned char *addr)
 {
 	int err;
 	struct net_bridge_port *p;
@@ -119,7 +119,7 @@ undo:
  * the ports with needed information.
  * Called under RTNL.
  */
-static void fdb_del_hw(struct net_bridge *br, const unsigned char *addr)
+static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
 {
 	struct net_bridge_port *p;
 
@@ -134,7 +134,7 @@ static void fdb_del_hw(struct net_bridge *br, const unsigned char *addr)
 static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f)
 {
 	if (f->is_static) {
-		fdb_del_hw(br, f->addr.addr);
+		fdb_del_hw_addr(br, f->addr.addr);
 		if (f->dst)
 			netdev_sw_port_fdb_del(f->dst->dev,
 					       f->addr.addr, f->vlan_id);
@@ -519,7 +519,7 @@ static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source,
 		return -ENOMEM;
 
 	fdb->is_local = fdb->is_static = 1;
-	fdb_add_hw(br, addr);
+	fdb_add_hw_addr(br, addr);
 	fdb_notify(br, fdb, RTM_NEWNEIGH);
 	return 0;
 }
@@ -759,21 +759,21 @@ static int fdb_add_entry(struct net_bridge_port *source, const __u8 *addr,
 			fdb->is_local = 1;
 			if (!fdb->is_static) {
 				fdb->is_static = 1;
-				fdb_add_hw(br, addr);
+				fdb_add_hw_addr(br, addr);
 				netdev_sw_port_fdb_add(source->dev, addr, vid);
 			}
 		} else if (state & NUD_NOARP) {
 			fdb->is_local = 0;
 			if (!fdb->is_static) {
 				fdb->is_static = 1;
-				fdb_add_hw(br, addr);
+				fdb_add_hw_addr(br, addr);
 				netdev_sw_port_fdb_add(source->dev, addr, vid);
 			}
 		} else {
 			fdb->is_local = 0;
 			if (fdb->is_static) {
 				fdb->is_static = 0;
-				fdb_del_hw(br, addr);
+				fdb_del_hw_addr(br, addr);
 				netdev_sw_port_fdb_del(source->dev, addr, vid);
 			}
 		}
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
  2014-11-09 10:51 [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload Jiri Pirko
                   ` (10 preceding siblings ...)
  2014-11-09 16:40 ` [patch net-next] bridge: rename fdb_*_hw to fdb_*_hw_addr to avoid confusion Jiri Pirko
@ 2014-11-10  3:31 ` Jamal Hadi Salim
  2014-11-10  3:46   ` Simon Horman
  2014-11-10  7:23   ` Jiri Pirko
  2014-11-10 16:48 ` Thomas Graf
  2014-11-12 13:44 ` Jiri Pirko
  13 siblings, 2 replies; 100+ messages in thread
From: Jamal Hadi Salim @ 2014-11-10  3:31 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

Jiri,

I am hoping you have considered what Ben Lahaise's, John Fastabend's,
and Roopa's patches after all those discussions and
meetings we have had (in which you promised you will merge patches
in). I am not seeing much of that here or mention of anything of that
sort.
At least please get their sign on - this  is such an important piece of
new work that you should make sure you get consensus.
Otherwise we are back to square one and everyone is going their way with
their patches;

Ben/Roopa/John - please issue either a signed-off as well
if you agree with this approach otherwise i am hoping none of these
patches are merged in.

I will look at the patches and comment.

cheers,
jamal

On 11/09/14 05:51, Jiri Pirko wrote:
> Hi all.
>
> This patchset is just the first phase of switch and switch-ish device
> support api in kernel. Note that the api will extend (our complete work
> can be pulled from https://github.com/jpirko/net-next-rocker).
>
> So what this patchset includes:
> - introduce switchdev api for implementing switch drivers (so far
>    only linux bridge fdb offload is covered)
> - introduce rocker switch driver which implements switchdev api
>
> As to the discussion if there is need to have specific class of device
> representing the switch itself, so far we found no need to introduce that.
> But we are generally ok with the idea and when the time comes and it will
> be needed, it can be easily introduced without any disturbance.
>
> This patchset introduces switch id export through rtnetlink and sysfs,
> which is similar to what we have for port id in SR-IOV. I will send iproute2
> patchset for showing the switch id for port netdevs once this is applied.
>
> For detailed description, please see individual patches.
>
> v1->v2:
> - addressed all DaveM's comments
>
> Jiri Pirko (5):
>    net: rename netdev_phys_port_id to more generic name
>    net: introduce generic switch devices support
>    rtnl: expose physical switch id for particular device
>    net-sysfs: expose physical switch id for particular device
>    rocker: introduce rocker switch driver
>
> Scott Feldman (5):
>    bridge: introduce fdb offloading via switchdev
>    bridge: call netdev_sw_port_stp_update when bridge port STP status
>      changes
>    bridge: add API to notify bridge driver of learned FBD on offloaded
>      device
>    rocker: implement rocker ofdpa flow table manipulation
>    rocker: implement L2 bridge offloading
>
>   Documentation/networking/switchdev.txt           |   59 +
>   MAINTAINERS                                      |   14 +
>   drivers/net/ethernet/Kconfig                     |    1 +
>   drivers/net/ethernet/Makefile                    |    1 +
>   drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |    2 +-
>   drivers/net/ethernet/intel/i40e/i40e_main.c      |    2 +-
>   drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |    2 +-
>   drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |    2 +-
>   drivers/net/ethernet/rocker/Kconfig              |   27 +
>   drivers/net/ethernet/rocker/Makefile             |    5 +
>   drivers/net/ethernet/rocker/rocker.c             | 4182 ++++++++++++++++++++++
>   drivers/net/ethernet/rocker/rocker.h             |  427 +++
>   include/linux/if_bridge.h                        |   18 +
>   include/linux/netdevice.h                        |   48 +-
>   include/net/switchdev.h                          |   53 +
>   include/uapi/linux/if_link.h                     |    1 +
>   net/Kconfig                                      |    1 +
>   net/Makefile                                     |    3 +
>   net/bridge/br_fdb.c                              |   94 +-
>   net/bridge/br_netlink.c                          |    2 +
>   net/bridge/br_stp.c                              |    4 +
>   net/bridge/br_stp_if.c                           |    3 +
>   net/bridge/br_stp_timer.c                        |    2 +
>   net/core/dev.c                                   |    2 +-
>   net/core/net-sysfs.c                             |   26 +-
>   net/core/rtnetlink.c                             |   30 +-
>   net/switchdev/Kconfig                            |   13 +
>   net/switchdev/Makefile                           |    5 +
>   net/switchdev/switchdev.c                        |   93 +
>   29 files changed, 5104 insertions(+), 18 deletions(-)
>   create mode 100644 Documentation/networking/switchdev.txt
>   create mode 100644 drivers/net/ethernet/rocker/Kconfig
>   create mode 100644 drivers/net/ethernet/rocker/Makefile
>   create mode 100644 drivers/net/ethernet/rocker/rocker.c
>   create mode 100644 drivers/net/ethernet/rocker/rocker.h
>   create mode 100644 include/net/switchdev.h
>   create mode 100644 net/switchdev/Kconfig
>   create mode 100644 net/switchdev/Makefile
>   create mode 100644 net/switchdev/switchdev.c
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name
  2014-11-09 10:51 ` [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name Jiri Pirko
@ 2014-11-10  3:35   ` Jamal Hadi Salim
  2014-11-10  5:23     ` David Miller
  2014-11-10  7:43     ` Jiri Pirko
  2014-11-10 21:57   ` John Fastabend
  1 sibling, 2 replies; 100+ messages in thread
From: Jamal Hadi Salim @ 2014-11-10  3:35 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl


On 11/09/14 05:51, Jiri Pirko wrote:
> So this can be reused for identification of other "items" as well.
>




>
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index 9dd0669..55dc4da 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -387,7 +387,7 @@ static ssize_t phys_port_id_show(struct device *dev,
>   		return restart_syscall();
>
>   	if (dev_isalive(netdev)) {
> -		struct netdev_phys_port_id ppid;
> +		struct netdev_phys_item_id ppid;
>
>   		ret = dev_get_phys_port_id(netdev, &ppid);
>   		if (!ret)
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index a688268..1087c6d 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -868,7 +868,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
>   	       + rtnl_port_size(dev, ext_filter_mask) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
>   	       + rtnl_link_get_size(dev) /* IFLA_LINKINFO */
>   	       + rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */
> -	       + nla_total_size(MAX_PHYS_PORT_ID_LEN); /* IFLA_PHYS_PORT_ID */
> +	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN); /* IFLA_PHYS_PORT_ID */
>   }
>

[...]
>   static int rtnl_vf_ports_fill(struct sk_buff *skb, struct net_device *dev)
> @@ -952,7 +952,7 @@ static int rtnl_port_fill(struct sk_buff *skb, struct net_device *dev,
>   static int rtnl_phys_port_id_fill(struct sk_buff *skb, struct net_device *dev)
>   {
>   	int err;
> -	struct netdev_phys_port_id ppid;
> +	struct netdev_phys_item_id ppid;
>
>   	err = dev_get_phys_port_id(dev, &ppid);
>   	if (err) {
> @@ -1196,7 +1196,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
>   	[IFLA_PROMISCUITY]	= { .type = NLA_U32 },
>   	[IFLA_NUM_TX_QUEUES]	= { .type = NLA_U32 },
>   	[IFLA_NUM_RX_QUEUES]	= { .type = NLA_U32 },
> -	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_PORT_ID_LEN },
> +	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
>   	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
>   };


wouldnt this just break an existing ABI? You may need to introduce a new 
attribute.

cheers,
jamal

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 03/10] rtnl: expose physical switch id for particular device
  2014-11-09 10:51 ` [patch net-next v2 03/10] rtnl: expose physical switch id for particular device Jiri Pirko
@ 2014-11-10  3:43   ` Jamal Hadi Salim
  2014-11-10  7:45     ` Jiri Pirko
  2014-11-10 17:58   ` Roopa Prabhu
  2014-11-10 22:01   ` John Fastabend
  2 siblings, 1 reply; 100+ messages in thread
From: Jamal Hadi Salim @ 2014-11-10  3:43 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

On 11/09/14 05:51, Jiri Pirko wrote:
> The netdevice represents a port in a switch, it will expose
> IFLA_PHYS_SWITCH_ID value via rtnl. Two netdevices with the same value
> belong to one physical switch.
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>   include/uapi/linux/if_link.h |  1 +
>   net/core/rtnetlink.c         | 26 +++++++++++++++++++++++++-
>   2 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index 7072d83..4163753 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -145,6 +145,7 @@ enum {
>   	IFLA_CARRIER,
>   	IFLA_PHYS_PORT_ID,
>   	IFLA_CARRIER_CHANGES,
> +	IFLA_PHYS_SWITCH_ID,
>   	__IFLA_MAX
>   };
>


> @@ -1198,6 +1221,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
>   	[IFLA_NUM_RX_QUEUES]	= { .type = NLA_U32 },
>   	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
>   	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
> +	[IFLA_PHYS_SWITCH_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
>   };
>

Ok, looking at this compared to #1 i can see you are introducing 
IFLA_PHYS_SWITCH_ID but then why did you need to change 
IFLA_PHYS_PORT_ID earlier?

cheers,
jamal

>   static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
  2014-11-10  3:31 ` [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload Jamal Hadi Salim
@ 2014-11-10  3:46   ` Simon Horman
  2014-11-10  4:03     ` Jamal Hadi Salim
  2014-11-10  7:23   ` Jiri Pirko
  1 sibling, 1 reply; 100+ messages in thread
From: Simon Horman @ 2014-11-10  3:46 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

Hi Jamal, Hi Jiri,

On a somewhat related note I am also wondering what if any progress has
been made regarding discussions of (and code for) the following:

1. Exposing flow tables to user-space
   - I realise that this is Open vSwitch specific to some extent
     but I am in no way implying that it should be done instead of
     non-Open vSwitch specific work.
   - Jiri, IIRC this was part ~v2 of your earlier offload patchset

2. Describing Switch Hardware
   - I see John Fastabend moving forwards on this in his git repository
     https://github.com/jrfastab/flow-net-next

The way that I see things is that both of the above could be exposed via
netlink. And that the first at least could be backed by NDOs.  As such I
see this work as complementary and perhaps applying on top of this
patchset. If I am mistaken in this regards it would be good to know :)

I am of course also interested to know if the above are moving forwards.
To be clear I am very interested in being able to use these APIs to
perform Open vSwitch offloads and I am very happy to help.
(Jamal: I'm also interested in non-Open vSwitch offloads :)

On Sun, Nov 09, 2014 at 10:31:39PM -0500, Jamal Hadi Salim wrote:
> Jiri,
> 
> I am hoping you have considered what Ben Lahaise's, John Fastabend's,
> and Roopa's patches after all those discussions and
> meetings we have had (in which you promised you will merge patches
> in). I am not seeing much of that here or mention of anything of that
> sort.
> At least please get their sign on - this  is such an important piece of
> new work that you should make sure you get consensus.
> Otherwise we are back to square one and everyone is going their way with
> their patches;
> 
> Ben/Roopa/John - please issue either a signed-off as well
> if you agree with this approach otherwise i am hoping none of these
> patches are merged in.
> 
> I will look at the patches and comment.
> 
> cheers,
> jamal
> 
> On 11/09/14 05:51, Jiri Pirko wrote:
> >Hi all.
> >
> >This patchset is just the first phase of switch and switch-ish device
> >support api in kernel. Note that the api will extend (our complete work
> >can be pulled from https://github.com/jpirko/net-next-rocker).
> >
> >So what this patchset includes:
> >- introduce switchdev api for implementing switch drivers (so far
> >   only linux bridge fdb offload is covered)
> >- introduce rocker switch driver which implements switchdev api
> >
> >As to the discussion if there is need to have specific class of device
> >representing the switch itself, so far we found no need to introduce that.
> >But we are generally ok with the idea and when the time comes and it will
> >be needed, it can be easily introduced without any disturbance.
> >
> >This patchset introduces switch id export through rtnetlink and sysfs,
> >which is similar to what we have for port id in SR-IOV. I will send iproute2
> >patchset for showing the switch id for port netdevs once this is applied.
> >
> >For detailed description, please see individual patches.
> >
> >v1->v2:
> >- addressed all DaveM's comments
> >
> >Jiri Pirko (5):
> >   net: rename netdev_phys_port_id to more generic name
> >   net: introduce generic switch devices support
> >   rtnl: expose physical switch id for particular device
> >   net-sysfs: expose physical switch id for particular device
> >   rocker: introduce rocker switch driver
> >
> >Scott Feldman (5):
> >   bridge: introduce fdb offloading via switchdev
> >   bridge: call netdev_sw_port_stp_update when bridge port STP status
> >     changes
> >   bridge: add API to notify bridge driver of learned FBD on offloaded
> >     device
> >   rocker: implement rocker ofdpa flow table manipulation
> >   rocker: implement L2 bridge offloading
> >
> >  Documentation/networking/switchdev.txt           |   59 +
> >  MAINTAINERS                                      |   14 +
> >  drivers/net/ethernet/Kconfig                     |    1 +
> >  drivers/net/ethernet/Makefile                    |    1 +
> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |    2 +-
> >  drivers/net/ethernet/intel/i40e/i40e_main.c      |    2 +-
> >  drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |    2 +-
> >  drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |    2 +-
> >  drivers/net/ethernet/rocker/Kconfig              |   27 +
> >  drivers/net/ethernet/rocker/Makefile             |    5 +
> >  drivers/net/ethernet/rocker/rocker.c             | 4182 ++++++++++++++++++++++
> >  drivers/net/ethernet/rocker/rocker.h             |  427 +++
> >  include/linux/if_bridge.h                        |   18 +
> >  include/linux/netdevice.h                        |   48 +-
> >  include/net/switchdev.h                          |   53 +
> >  include/uapi/linux/if_link.h                     |    1 +
> >  net/Kconfig                                      |    1 +
> >  net/Makefile                                     |    3 +
> >  net/bridge/br_fdb.c                              |   94 +-
> >  net/bridge/br_netlink.c                          |    2 +
> >  net/bridge/br_stp.c                              |    4 +
> >  net/bridge/br_stp_if.c                           |    3 +
> >  net/bridge/br_stp_timer.c                        |    2 +
> >  net/core/dev.c                                   |    2 +-
> >  net/core/net-sysfs.c                             |   26 +-
> >  net/core/rtnetlink.c                             |   30 +-
> >  net/switchdev/Kconfig                            |   13 +
> >  net/switchdev/Makefile                           |    5 +
> >  net/switchdev/switchdev.c                        |   93 +
> >  29 files changed, 5104 insertions(+), 18 deletions(-)
> >  create mode 100644 Documentation/networking/switchdev.txt
> >  create mode 100644 drivers/net/ethernet/rocker/Kconfig
> >  create mode 100644 drivers/net/ethernet/rocker/Makefile
> >  create mode 100644 drivers/net/ethernet/rocker/rocker.c
> >  create mode 100644 drivers/net/ethernet/rocker/rocker.h
> >  create mode 100644 include/net/switchdev.h
> >  create mode 100644 net/switchdev/Kconfig
> >  create mode 100644 net/switchdev/Makefile
> >  create mode 100644 net/switchdev/switchdev.c
> >
> 

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 06/10] bridge: introduce fdb offloading via switchdev
  2014-11-09 10:51 ` [patch net-next v2 06/10] bridge: introduce fdb offloading via switchdev Jiri Pirko
@ 2014-11-10  3:47   ` Jamal Hadi Salim
  2014-11-10  8:15     ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: Jamal Hadi Salim @ 2014-11-10  3:47 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

On 11/09/14 05:51, Jiri Pirko wrote:
> From: Scott Feldman <sfeldma@gmail.com>
>
> Add two new ndos: ndo_sw_port_fdb_add/del to offload static bridge
> fdb entries.  Static bridge FDB entries are installed, for example,
> using iproute2 bridge cmd:
>
>         bridge fdb add ADDR dev DEV master vlan VID
>
> This would install ADDR into the bridge's FDB for port DEV on vlan VID.  The
> switch driver implements two ndo_swdev ops to add/delete FDB entries in the
> switch device:
>
>         int ndo_sw_port_fdb_add(struct net_device *dev,
>                                 const unsigned char *addr,
>                                 u16 vid);
>
>         int ndo_sw_port_fdb_del(struct net_device *dev,
>                                 const unsigned char *addr,
>                                 u16 vid);
>
> The driver returns 0 on success, negative error code on failure.
>
> Note: the switch driver would not implement ndo_fdb_add/del/dump on a port
> netdev as these are intended for devices maintaining their own FDB.  In our
> case, we want the Linux bridge to own the FBD.
>
> Note: by default, the bridge does not filter on VLAN and only bridges untagged
> traffic.  To enable VLAN support, turn on VLAN filtering:
>
>        echo 1 >/sys/class/net/<bridge>/bridge/vlan_filtering
>

Sorry - why is the current fdb_add/del insufficient? It needs a vlanid
and the master/self flags should indicate intent to add to h/w vs s/w.

cheers,
jamal

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 10/10] rocker: implement L2 bridge offloading
  2014-11-09 10:51 ` [patch net-next v2 10/10] rocker: implement L2 bridge offloading Jiri Pirko
@ 2014-11-10  3:53   ` Jamal Hadi Salim
  2014-11-10  8:18     ` Jiri Pirko
  2014-11-10  8:46     ` Scott Feldman
  0 siblings, 2 replies; 100+ messages in thread
From: Jamal Hadi Salim @ 2014-11-10  3:53 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

On 11/09/14 05:51, Jiri Pirko wrote:
> From: Scott Feldman <sfeldma@gmail.com>
>
> Add L2 bridge offloading support to rocker driver.  Here, the Linux bridge
> driver is used to collect swdev ports into a tagged (or untagged) VLAN
> bridge.  The swdev will offload from the bridge driver the following L2
> bridging functions:
>
>   - Learning of neighbor MAC addresses on VLAN X  Learned mac/vlan is
> installed in bridge FDB.  (And removed when device unlearns mac/vlan).
> Learning must be turned off on each bridge port to disable the feature in
> the bridge driver.
>

I have quiet a few use cases where the above is a no-no. I dont want
learning of any sort (we have a knob for that in the bridge).
And i dont want learning in hardware to be reflected in software.
Basically this is a policy decision. Introduce a knob to choose whether
hardware learnt addresses should be reflected in software.


Have to run, but will comment when i get the time.

cheers,
jamal

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
  2014-11-10  3:46   ` Simon Horman
@ 2014-11-10  4:03     ` Jamal Hadi Salim
  2014-11-10  4:58       ` Simon Horman
  0 siblings, 1 reply; 100+ messages in thread
From: Jamal Hadi Salim @ 2014-11-10  4:03 UTC (permalink / raw)
  To: Simon Horman
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

Hi Simon,

On 11/09/14 22:46, Simon Horman wrote:
> Hi Jamal, Hi Jiri,
>
> On a somewhat related note I am also wondering what if any progress has
> been made regarding discussions of (and code for) the following:
>
> 1. Exposing flow tables to user-space
>     - I realise that this is Open vSwitch specific to some extent
>       but I am in no way implying that it should be done instead of
>       non-Open vSwitch specific work.
>     - Jiri, IIRC this was part ~v2 of your earlier offload patchset
>

I dont know what Rocker crowd is doing; however, I know
John F. has been doing some work which i have stared at
and I was hoping to join in with Ben's effort and show tc flow
offload on the realtek chip in my infinite spare time unles.
(for both Linux bridge and ports).
The priority is to merge the obvious bits first.

> 2. Describing Switch Hardware
>     - I see John Fastabend moving forwards on this in his git repository
>       https://github.com/jrfastab/flow-net-next
>
> The way that I see things is that both of the above could be exposed via
> netlink. And that the first at least could be backed by NDOs.  As such I
> see this work as complementary and perhaps applying on top of this
> patchset. If I am mistaken in this regards it would be good to know :)
>

You are correct - I will let John speak on his work, but
that is the intent.
The challenge is there are many schools of thoughts and i am hoping
it is not an arms race.

> I am of course also interested to know if the above are moving forwards.
> To be clear I am very interested in being able to use these APIs to
> perform Open vSwitch offloads and I am very happy to help.
> (Jamal: I'm also interested in non-Open vSwitch offloads :)
>

Hey, OVS should be able to use these APIs; i am just interested in 
making sure they are not just for OVS or OF. Then we are all happy;->

cheers,
jamal

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
  2014-11-10  4:03     ` Jamal Hadi Salim
@ 2014-11-10  4:58       ` Simon Horman
  2014-11-10 22:23         ` John Fastabend
  0 siblings, 1 reply; 100+ messages in thread
From: Simon Horman @ 2014-11-10  4:58 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

On Sun, Nov 09, 2014 at 11:03:40PM -0500, Jamal Hadi Salim wrote:
> Hi Simon,
> 
> On 11/09/14 22:46, Simon Horman wrote:
> >Hi Jamal, Hi Jiri,
> >
> >On a somewhat related note I am also wondering what if any progress has
> >been made regarding discussions of (and code for) the following:
> >
> >1. Exposing flow tables to user-space
> >    - I realise that this is Open vSwitch specific to some extent
> >      but I am in no way implying that it should be done instead of
> >      non-Open vSwitch specific work.
> >    - Jiri, IIRC this was part ~v2 of your earlier offload patchset
> >
> 
> I dont know what Rocker crowd is doing; however, I know
> John F. has been doing some work which i have stared at
> and I was hoping to join in with Ben's effort and show tc flow
> offload on the realtek chip in my infinite spare time unles.
> (for both Linux bridge and ports).
> The priority is to merge the obvious bits first.

Merging the obvious bits first is quite fine my me.

> >2. Describing Switch Hardware
> >    - I see John Fastabend moving forwards on this in his git repository
> >      https://github.com/jrfastab/flow-net-next
> >
> >The way that I see things is that both of the above could be exposed via
> >netlink. And that the first at least could be backed by NDOs.  As such I
> >see this work as complementary and perhaps applying on top of this
> >patchset. If I am mistaken in this regards it would be good to know :)
> >
> 
> You are correct - I will let John speak on his work, but
> that is the intent.
> The challenge is there are many schools of thoughts and i am hoping
> it is not an arms race.

That is also my hope.

> >I am of course also interested to know if the above are moving forwards.
> >To be clear I am very interested in being able to use these APIs to
> >perform Open vSwitch offloads and I am very happy to help.
> >(Jamal: I'm also interested in non-Open vSwitch offloads :)
> >
> 
> Hey, OVS should be able to use these APIs; i am just interested in making
> sure they are not just for OVS or OF. Then we are all happy;->

I think we are all happy :)

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name
  2014-11-10  3:35   ` Jamal Hadi Salim
@ 2014-11-10  5:23     ` David Miller
  2014-11-10 12:06       ` Jamal Hadi Salim
  2014-11-10  7:43     ` Jiri Pirko
  1 sibling, 1 reply; 100+ messages in thread
From: David Miller @ 2014-11-10  5:23 UTC (permalink / raw)
  To: jhs
  Cc: jiri, netdev, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: Sun, 09 Nov 2014 22:35:12 -0500

> wouldnt this just break an existing ABI? You may need to introduce a
> new attribute.

He isn't breaking anything Jamal, he's just changing the internal
macro name we use for the attribute's maximum length.

Please read his patches carefully instead of jumping to conclusions.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
  2014-11-10  3:31 ` [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload Jamal Hadi Salim
  2014-11-10  3:46   ` Simon Horman
@ 2014-11-10  7:23   ` Jiri Pirko
  2014-11-10 12:16     ` Jamal Hadi Salim
  1 sibling, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2014-11-10  7:23 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

Mon, Nov 10, 2014 at 04:31:39AM CET, jhs@mojatatu.com wrote:
>Jiri,
>
>I am hoping you have considered what Ben Lahaise's, John Fastabend's,
>and Roopa's patches after all those discussions and
>meetings we have had (in which you promised you will merge patches
>in). I am not seeing much of that here or mention of anything of that
>sort.

Hi Jamal.

Yes I looked over their patches. Roopas patche's are about new class of
device which, as I commented in the cover letter, I left out for now and
can be safely added later on.

I went over the Ben's work very carefully as well. The patches are very
rough, mostly rtl-chip specific. But again, my patchset is a base on
which this patches can be build on. I see no issues in that.

>At least please get their sign on - this  is such an important piece of
>new work that you should make sure you get consensus.

Since I did not use their code now, I only put sign off of Scott.

>Otherwise we are back to square one and everyone is going their way with
>their patches;

I do think that we are in sync. I do not see any counter ways. As I
said, their work can be added on to the base made of this patchset.

>
>Ben/Roopa/John - please issue either a signed-off as well
>if you agree with this approach otherwise i am hoping none of these
>patches are merged in.
>
>I will look at the patches and comment.
>
>cheers,
>jamal
>
>On 11/09/14 05:51, Jiri Pirko wrote:
>>Hi all.
>>
>>This patchset is just the first phase of switch and switch-ish device
>>support api in kernel. Note that the api will extend (our complete work
>>can be pulled from https://github.com/jpirko/net-next-rocker).
>>
>>So what this patchset includes:
>>- introduce switchdev api for implementing switch drivers (so far
>>   only linux bridge fdb offload is covered)
>>- introduce rocker switch driver which implements switchdev api
>>
>>As to the discussion if there is need to have specific class of device
>>representing the switch itself, so far we found no need to introduce that.
>>But we are generally ok with the idea and when the time comes and it will
>>be needed, it can be easily introduced without any disturbance.
>>
>>This patchset introduces switch id export through rtnetlink and sysfs,
>>which is similar to what we have for port id in SR-IOV. I will send iproute2
>>patchset for showing the switch id for port netdevs once this is applied.
>>
>>For detailed description, please see individual patches.
>>
>>v1->v2:
>>- addressed all DaveM's comments
>>
>>Jiri Pirko (5):
>>   net: rename netdev_phys_port_id to more generic name
>>   net: introduce generic switch devices support
>>   rtnl: expose physical switch id for particular device
>>   net-sysfs: expose physical switch id for particular device
>>   rocker: introduce rocker switch driver
>>
>>Scott Feldman (5):
>>   bridge: introduce fdb offloading via switchdev
>>   bridge: call netdev_sw_port_stp_update when bridge port STP status
>>     changes
>>   bridge: add API to notify bridge driver of learned FBD on offloaded
>>     device
>>   rocker: implement rocker ofdpa flow table manipulation
>>   rocker: implement L2 bridge offloading
>>
>>  Documentation/networking/switchdev.txt           |   59 +
>>  MAINTAINERS                                      |   14 +
>>  drivers/net/ethernet/Kconfig                     |    1 +
>>  drivers/net/ethernet/Makefile                    |    1 +
>>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |    2 +-
>>  drivers/net/ethernet/intel/i40e/i40e_main.c      |    2 +-
>>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |    2 +-
>>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |    2 +-
>>  drivers/net/ethernet/rocker/Kconfig              |   27 +
>>  drivers/net/ethernet/rocker/Makefile             |    5 +
>>  drivers/net/ethernet/rocker/rocker.c             | 4182 ++++++++++++++++++++++
>>  drivers/net/ethernet/rocker/rocker.h             |  427 +++
>>  include/linux/if_bridge.h                        |   18 +
>>  include/linux/netdevice.h                        |   48 +-
>>  include/net/switchdev.h                          |   53 +
>>  include/uapi/linux/if_link.h                     |    1 +
>>  net/Kconfig                                      |    1 +
>>  net/Makefile                                     |    3 +
>>  net/bridge/br_fdb.c                              |   94 +-
>>  net/bridge/br_netlink.c                          |    2 +
>>  net/bridge/br_stp.c                              |    4 +
>>  net/bridge/br_stp_if.c                           |    3 +
>>  net/bridge/br_stp_timer.c                        |    2 +
>>  net/core/dev.c                                   |    2 +-
>>  net/core/net-sysfs.c                             |   26 +-
>>  net/core/rtnetlink.c                             |   30 +-
>>  net/switchdev/Kconfig                            |   13 +
>>  net/switchdev/Makefile                           |    5 +
>>  net/switchdev/switchdev.c                        |   93 +
>>  29 files changed, 5104 insertions(+), 18 deletions(-)
>>  create mode 100644 Documentation/networking/switchdev.txt
>>  create mode 100644 drivers/net/ethernet/rocker/Kconfig
>>  create mode 100644 drivers/net/ethernet/rocker/Makefile
>>  create mode 100644 drivers/net/ethernet/rocker/rocker.c
>>  create mode 100644 drivers/net/ethernet/rocker/rocker.h
>>  create mode 100644 include/net/switchdev.h
>>  create mode 100644 net/switchdev/Kconfig
>>  create mode 100644 net/switchdev/Makefile
>>  create mode 100644 net/switchdev/switchdev.c
>>
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name
  2014-11-10  3:35   ` Jamal Hadi Salim
  2014-11-10  5:23     ` David Miller
@ 2014-11-10  7:43     ` Jiri Pirko
  2014-11-10 12:17       ` Jamal Hadi Salim
  1 sibling, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2014-11-10  7:43 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

Mon, Nov 10, 2014 at 04:35:12AM CET, jhs@mojatatu.com wrote:
>
>On 11/09/14 05:51, Jiri Pirko wrote:
>>So this can be reused for identification of other "items" as well.
>>
>
>
>
>
>>
>>diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
>>index 9dd0669..55dc4da 100644
>>--- a/net/core/net-sysfs.c
>>+++ b/net/core/net-sysfs.c
>>@@ -387,7 +387,7 @@ static ssize_t phys_port_id_show(struct device *dev,
>>  		return restart_syscall();
>>
>>  	if (dev_isalive(netdev)) {
>>-		struct netdev_phys_port_id ppid;
>>+		struct netdev_phys_item_id ppid;
>>
>>  		ret = dev_get_phys_port_id(netdev, &ppid);
>>  		if (!ret)
>>diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
>>index a688268..1087c6d 100644
>>--- a/net/core/rtnetlink.c
>>+++ b/net/core/rtnetlink.c
>>@@ -868,7 +868,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
>>  	       + rtnl_port_size(dev, ext_filter_mask) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
>>  	       + rtnl_link_get_size(dev) /* IFLA_LINKINFO */
>>  	       + rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */
>>-	       + nla_total_size(MAX_PHYS_PORT_ID_LEN); /* IFLA_PHYS_PORT_ID */
>>+	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN); /* IFLA_PHYS_PORT_ID */
>>  }
>>
>
>[...]
>>  static int rtnl_vf_ports_fill(struct sk_buff *skb, struct net_device *dev)
>>@@ -952,7 +952,7 @@ static int rtnl_port_fill(struct sk_buff *skb, struct net_device *dev,
>>  static int rtnl_phys_port_id_fill(struct sk_buff *skb, struct net_device *dev)
>>  {
>>  	int err;
>>-	struct netdev_phys_port_id ppid;
>>+	struct netdev_phys_item_id ppid;
>>
>>  	err = dev_get_phys_port_id(dev, &ppid);
>>  	if (err) {
>>@@ -1196,7 +1196,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
>>  	[IFLA_PROMISCUITY]	= { .type = NLA_U32 },
>>  	[IFLA_NUM_TX_QUEUES]	= { .type = NLA_U32 },
>>  	[IFLA_NUM_RX_QUEUES]	= { .type = NLA_U32 },
>>-	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_PORT_ID_LEN },
>>+	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
>>  	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
>>  };
>
>
>wouldnt this just break an existing ABI? You may need to introduce a new
>attribute.

I don't see a reason why this would break kabi:

-#define MAX_PHYS_PORT_ID_LEN 32
+#define MAX_PHYS_ITEM_ID_LEN 32



>
>cheers,
>jamal
>
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 03/10] rtnl: expose physical switch id for particular device
  2014-11-10  3:43   ` Jamal Hadi Salim
@ 2014-11-10  7:45     ` Jiri Pirko
  0 siblings, 0 replies; 100+ messages in thread
From: Jiri Pirko @ 2014-11-10  7:45 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

Mon, Nov 10, 2014 at 04:43:58AM CET, jhs@mojatatu.com wrote:
>On 11/09/14 05:51, Jiri Pirko wrote:
>>The netdevice represents a port in a switch, it will expose
>>IFLA_PHYS_SWITCH_ID value via rtnl. Two netdevices with the same value
>>belong to one physical switch.
>>
>>Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>---
>>  include/uapi/linux/if_link.h |  1 +
>>  net/core/rtnetlink.c         | 26 +++++++++++++++++++++++++-
>>  2 files changed, 26 insertions(+), 1 deletion(-)
>>
>>diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
>>index 7072d83..4163753 100644
>>--- a/include/uapi/linux/if_link.h
>>+++ b/include/uapi/linux/if_link.h
>>@@ -145,6 +145,7 @@ enum {
>>  	IFLA_CARRIER,
>>  	IFLA_PHYS_PORT_ID,
>>  	IFLA_CARRIER_CHANGES,
>>+	IFLA_PHYS_SWITCH_ID,
>>  	__IFLA_MAX
>>  };
>>
>
>
>>@@ -1198,6 +1221,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
>>  	[IFLA_NUM_RX_QUEUES]	= { .type = NLA_U32 },
>>  	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
>>  	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
>>+	[IFLA_PHYS_SWITCH_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
>>  };
>>
>
>Ok, looking at this compared to #1 i can see you are introducing
>IFLA_PHYS_SWITCH_ID but then why did you need to change IFLA_PHYS_PORT_ID
>earlier?

I did not change it at all. I only made the name more generic. Please
look closer to the patch 1.

>
>cheers,
>jamal
>
>>  static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
>>
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 06/10] bridge: introduce fdb offloading via switchdev
  2014-11-10  3:47   ` Jamal Hadi Salim
@ 2014-11-10  8:15     ` Jiri Pirko
  2014-11-10  9:30       ` Scott Feldman
                         ` (2 more replies)
  0 siblings, 3 replies; 100+ messages in thread
From: Jiri Pirko @ 2014-11-10  8:15 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

Mon, Nov 10, 2014 at 04:47:48AM CET, jhs@mojatatu.com wrote:
>On 11/09/14 05:51, Jiri Pirko wrote:
>>From: Scott Feldman <sfeldma@gmail.com>
>>
>>Add two new ndos: ndo_sw_port_fdb_add/del to offload static bridge
>>fdb entries.  Static bridge FDB entries are installed, for example,
>>using iproute2 bridge cmd:
>>
>>        bridge fdb add ADDR dev DEV master vlan VID
>>
>>This would install ADDR into the bridge's FDB for port DEV on vlan VID.  The
>>switch driver implements two ndo_swdev ops to add/delete FDB entries in the
>>switch device:
>>
>>        int ndo_sw_port_fdb_add(struct net_device *dev,
>>                                const unsigned char *addr,
>>                                u16 vid);
>>
>>        int ndo_sw_port_fdb_del(struct net_device *dev,
>>                                const unsigned char *addr,
>>                                u16 vid);
>>
>>The driver returns 0 on success, negative error code on failure.
>>
>>Note: the switch driver would not implement ndo_fdb_add/del/dump on a port
>>netdev as these are intended for devices maintaining their own FDB.  In our
>>case, we want the Linux bridge to own the FBD.
>>
>>Note: by default, the bridge does not filter on VLAN and only bridges untagged
>>traffic.  To enable VLAN support, turn on VLAN filtering:
>>
>>       echo 1 >/sys/class/net/<bridge>/bridge/vlan_filtering
>>
>
>Sorry - why is the current fdb_add/del insufficient? It needs a vlanid
>and the master/self flags should indicate intent to add to h/w vs s/w.

Jamal, I believe we discussed this already. The thing is that current
fdb_add/del does not need vlanid and master/self flags, because it
already has that (struct nlattr *tb[]). Here is the whole list of
parameters to these functions:
        NDA_DST,
        NDA_LLADDR,
        NDA_CACHEINFO,
        NDA_PROBES,
        NDA_VLAN,
        NDA_PORT,
        NDA_VNI,
        NDA_IFINDEX,
        NDA_MASTER,

There are few problems in re-using this. It is netlink based so for calling
it from bridge code, we would have to construct netlink message. But
that could be probably changed.
As you can see from the list of parameters, this is no longer about fdb (addr,
vlanid) but this has been extended to something else. See vxlan code for
what this is used for. I believe that fdb_add/del should be renamed to
something else, perhaps l2neigh_add/del or something like that.
The other problem is that fdb_add/del is currently used by various
drivers for different purpose (adding macs to unicast list).

Scott, you may probably want to add something to this.

>
>cheers,
>jamal
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 10/10] rocker: implement L2 bridge offloading
  2014-11-10  3:53   ` Jamal Hadi Salim
@ 2014-11-10  8:18     ` Jiri Pirko
  2014-11-10  9:10       ` Nicolas Dichtel
  2014-11-10  8:46     ` Scott Feldman
  1 sibling, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2014-11-10  8:18 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

Mon, Nov 10, 2014 at 04:53:07AM CET, jhs@mojatatu.com wrote:
>On 11/09/14 05:51, Jiri Pirko wrote:
>>From: Scott Feldman <sfeldma@gmail.com>
>>
>>Add L2 bridge offloading support to rocker driver.  Here, the Linux bridge
>>driver is used to collect swdev ports into a tagged (or untagged) VLAN
>>bridge.  The swdev will offload from the bridge driver the following L2
>>bridging functions:
>>
>>  - Learning of neighbor MAC addresses on VLAN X  Learned mac/vlan is
>>installed in bridge FDB.  (And removed when device unlearns mac/vlan).
>>Learning must be turned off on each bridge port to disable the feature in
>>the bridge driver.
>>
>
>I have quiet a few use cases where the above is a no-no. I dont want
>learning of any sort (we have a knob for that in the bridge).

I would love to see your code. Please do share.

>And i dont want learning in hardware to be reflected in software.
>Basically this is a policy decision. Introduce a knob to choose whether
>hardware learnt addresses should be reflected in software.

This can be easily done as a bridge option I believe. If you are ok with
that, I will add a patch (either in re-post or as a follow-up)

>
>
>Have to run, but will comment when i get the time.
>
>cheers,
>jamal
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 10/10] rocker: implement L2 bridge offloading
  2014-11-10  3:53   ` Jamal Hadi Salim
  2014-11-10  8:18     ` Jiri Pirko
@ 2014-11-10  8:46     ` Scott Feldman
  2014-11-10 12:27       ` Jamal Hadi Salim
  1 sibling, 1 reply; 100+ messages in thread
From: Scott Feldman @ 2014-11-10  8:46 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jiri Pirko, Netdev, David S. Miller, nhorman, Andy Gospodarek,
	Thomas Graf, dborkman, ogerlitz, jesse, pshelar, azhou, ben,
	stephen, Kirsher, Jeffrey T, vyasevic, xiyou.wangcong, Fastabend,
	John R, edumazet, Florian Fainelli, Roopa Prabhu, John Linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye, simon.ho

On Sun, Nov 9, 2014 at 5:53 PM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On 11/09/14 05:51, Jiri Pirko wrote:
>>
>> From: Scott Feldman <sfeldma@gmail.com>
>>
>> Add L2 bridge offloading support to rocker driver.  Here, the Linux bridge
>> driver is used to collect swdev ports into a tagged (or untagged) VLAN
>> bridge.  The swdev will offload from the bridge driver the following L2
>> bridging functions:
>>
>>   - Learning of neighbor MAC addresses on VLAN X  Learned mac/vlan is
>> installed in bridge FDB.  (And removed when device unlearns mac/vlan).
>> Learning must be turned off on each bridge port to disable the feature in
>> the bridge driver.
>>
>
> I have quiet a few use cases where the above is a no-no. I dont want
> learning of any sort (we have a knob for that in the bridge).
> And i dont want learning in hardware to be reflected in software.
> Basically this is a policy decision. Introduce a knob to choose whether
> hardware learnt addresses should be reflected in software.


IFLA_BRPORT_LEARNING is u8 attr and we're only using lower bit to turn
learning on/off.  Maybe we can use another bit to indicate learning to
be done in sw or hw.  I don't think adding another bit would break
existing iproute2.

LEARNING_ENABLED    (1 << 0)
LEARNING_HW              (1 << 1)

Would this work?


>
>
> Have to run, but will comment when i get the time.
>
> cheers,
> jamal
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 10/10] rocker: implement L2 bridge offloading
  2014-11-10  8:18     ` Jiri Pirko
@ 2014-11-10  9:10       ` Nicolas Dichtel
  0 siblings, 0 replies; 100+ messages in thread
From: Nicolas Dichtel @ 2014-11-10  9:10 UTC (permalink / raw)
  To: Jiri Pirko, Jamal Hadi Salim
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

Le 10/11/2014 09:18, Jiri Pirko a écrit :
> Mon, Nov 10, 2014 at 04:53:07AM CET, jhs@mojatatu.com wrote:
>> On 11/09/14 05:51, Jiri Pirko wrote:
>>> From: Scott Feldman <sfeldma@gmail.com>
>>>
>>> Add L2 bridge offloading support to rocker driver.  Here, the Linux bridge
>>> driver is used to collect swdev ports into a tagged (or untagged) VLAN
>>> bridge.  The swdev will offload from the bridge driver the following L2
>>> bridging functions:
>>>
>>>   - Learning of neighbor MAC addresses on VLAN X  Learned mac/vlan is
>>> installed in bridge FDB.  (And removed when device unlearns mac/vlan).
>>> Learning must be turned off on each bridge port to disable the feature in
>>> the bridge driver.
>>>
>>
>> I have quiet a few use cases where the above is a no-no. I dont want
>> learning of any sort (we have a knob for that in the bridge).
>
> I would love to see your code. Please do share.
>
>> And i dont want learning in hardware to be reflected in software.
>> Basically this is a policy decision. Introduce a knob to choose whether
>> hardware learnt addresses should be reflected in software.
>
> This can be easily done as a bridge option I believe. If you are ok with
> that, I will add a patch (either in re-post or as a follow-up)
An option will be nice.


Regards,
Nicolas

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 06/10] bridge: introduce fdb offloading via switchdev
  2014-11-10  8:15     ` Jiri Pirko
@ 2014-11-10  9:30       ` Scott Feldman
  2014-11-10 12:47       ` Jamal Hadi Salim
  2014-11-10 13:51       ` Thomas Graf
  2 siblings, 0 replies; 100+ messages in thread
From: Scott Feldman @ 2014-11-10  9:30 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jamal Hadi Salim, Netdev, David S. Miller, nhorman,
	Andy Gospodarek, Thomas Graf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, Kirsher, Jeffrey T, vyasevic, Cong Wang,
	Fastabend, John R, Eric Dumazet, Florian Fainelli, Roopa Prabhu,
	John Linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, Alexei Starovoitov

On Sun, Nov 9, 2014 at 10:15 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> Mon, Nov 10, 2014 at 04:47:48AM CET, jhs@mojatatu.com wrote:
>>On 11/09/14 05:51, Jiri Pirko wrote:
>>>From: Scott Feldman <sfeldma@gmail.com>
>>>
>>>Add two new ndos: ndo_sw_port_fdb_add/del to offload static bridge
>>>fdb entries.  Static bridge FDB entries are installed, for example,
>>>using iproute2 bridge cmd:
>>>
>>>        bridge fdb add ADDR dev DEV master vlan VID
>>>
>>>This would install ADDR into the bridge's FDB for port DEV on vlan VID.  The
>>>switch driver implements two ndo_swdev ops to add/delete FDB entries in the
>>>switch device:
>>>
>>>        int ndo_sw_port_fdb_add(struct net_device *dev,
>>>                                const unsigned char *addr,
>>>                                u16 vid);
>>>
>>>        int ndo_sw_port_fdb_del(struct net_device *dev,
>>>                                const unsigned char *addr,
>>>                                u16 vid);
>>>
>>>The driver returns 0 on success, negative error code on failure.
>>>
>>>Note: the switch driver would not implement ndo_fdb_add/del/dump on a port
>>>netdev as these are intended for devices maintaining their own FDB.  In our
>>>case, we want the Linux bridge to own the FBD.
>>>
>>>Note: by default, the bridge does not filter on VLAN and only bridges untagged
>>>traffic.  To enable VLAN support, turn on VLAN filtering:
>>>
>>>       echo 1 >/sys/class/net/<bridge>/bridge/vlan_filtering
>>>
>>
>>Sorry - why is the current fdb_add/del insufficient? It needs a vlanid
>>and the master/self flags should indicate intent to add to h/w vs s/w.
>
> Jamal, I believe we discussed this already. The thing is that current
> fdb_add/del does not need vlanid and master/self flags, because it
> already has that (struct nlattr *tb[]). Here is the whole list of
> parameters to these functions:
>         NDA_DST,
>         NDA_LLADDR,
>         NDA_CACHEINFO,
>         NDA_PROBES,
>         NDA_VLAN,
>         NDA_PORT,
>         NDA_VNI,
>         NDA_IFINDEX,
>         NDA_MASTER,
>
> There are few problems in re-using this. It is netlink based so for calling
> it from bridge code, we would have to construct netlink message. But
> that could be probably changed.
> As you can see from the list of parameters, this is no longer about fdb (addr,
> vlanid) but this has been extended to something else. See vxlan code for
> what this is used for. I believe that fdb_add/del should be renamed to
> something else, perhaps l2neigh_add/del or something like that.
> The other problem is that fdb_add/del is currently used by various
> drivers for different purpose (adding macs to unicast list).
>
> Scott, you may probably want to add something to this.

You hit the main point: having to synthesize netlink msg in the bridge
driver and pass it down to port driver using .ndo_fdb_add/del is
awkward.  And then, if the port driver implements .ndo_fdb_add/del
(and dump), then user could by-pass bridge and install fdbs directly
on port, as if the port maintains its own fdb.  We want the bridge fdb
to be the single fdb, and static fdbs installed by user on bridge to
be pushed down to sw port driver to be installed in hw.


>>
>>cheers,
>>jamal
>>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name
  2014-11-10  5:23     ` David Miller
@ 2014-11-10 12:06       ` Jamal Hadi Salim
  2014-11-10 12:33         ` Daniel Borkmann
  2014-11-10 16:28         ` David Miller
  0 siblings, 2 replies; 100+ messages in thread
From: Jamal Hadi Salim @ 2014-11-10 12:06 UTC (permalink / raw)
  To: David Miller
  Cc: jiri, netdev, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

On 11/10/14 00:23, David Miller wrote:
> From: Jamal Hadi Salim <jhs@mojatatu.com>
> Date: Sun, 09 Nov 2014 22:35:12 -0500
>
>> wouldnt this just break an existing ABI? You may need to introduce a
>> new attribute.
>
> He isn't breaking anything Jamal, he's just changing the internal
> macro name we use for the attribute's maximum length.


It is a _user space visible rename_, how about:

#define MAX_PHYS_ITEM_ID_LEN 32
#define MAX_PHYS_PORT_ID_LEN   MAX_PHYS_ITEM_ID_LEN

I did miss the fact that the size didnt change.

cheers,
jamal

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
  2014-11-10  7:23   ` Jiri Pirko
@ 2014-11-10 12:16     ` Jamal Hadi Salim
  2014-11-10 13:12       ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: Jamal Hadi Salim @ 2014-11-10 12:16 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

On 11/10/14 02:23, Jiri Pirko wrote:

>
> Yes I looked over their patches. Roopas patche's are about new class of
> device which, as I commented in the cover letter, I left out for now and
> can be safely added later on.
>
> I went over the Ben's work very carefully as well. The patches are very
> rough, mostly rtl-chip specific. But again, my patchset is a base on
> which this patches can be build on. I see no issues in that.
>
>> At least please get their sign on - this  is such an important piece of
>> new work that you should make sure you get consensus.
>
> Since I did not use their code now, I only put sign off of Scott.
>

Your last comment was "i am going to merge the patches" ;->
At least send an email explaining your plan to people who have worked
hard to cooperate with you or say it in the cover letter.

>> Otherwise we are back to square one and everyone is going their way with
>> their patches;
>
> I do think that we are in sync. I do not see any counter ways. As I
> said, their work can be added on to the base made of this patchset.
>

Ok, I hope so. I spoke for myself - it is important for this patches
you get their sign-on in my opinion.

cheers,
jamal

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name
  2014-11-10  7:43     ` Jiri Pirko
@ 2014-11-10 12:17       ` Jamal Hadi Salim
  2014-11-10 13:16         ` Jiri Pirko
  2014-11-10 16:28         ` David Miller
  0 siblings, 2 replies; 100+ messages in thread
From: Jamal Hadi Salim @ 2014-11-10 12:17 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

On 11/10/14 02:43, Jiri Pirko wrote:
> Mon, Nov 10, 2014 at 04:35:12AM CET, jhs@mojatatu.com wrote:

> I don't see a reason why this would break kabi:
>
> -#define MAX_PHYS_PORT_ID_LEN 32
> +#define MAX_PHYS_ITEM_ID_LEN 32
>

refer to my response to Dave. Just define MAX_PHYS_PORT_ID_LEN to
MAX_PHYS_ITEM_ID_LEN so people dont have to change their code
because a name change.

cheers,
jamal

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 10/10] rocker: implement L2 bridge offloading
  2014-11-10  8:46     ` Scott Feldman
@ 2014-11-10 12:27       ` Jamal Hadi Salim
  2014-11-10 16:12         ` Roopa Prabhu
  2014-11-10 17:22         ` Scott Feldman
  0 siblings, 2 replies; 100+ messages in thread
From: Jamal Hadi Salim @ 2014-11-10 12:27 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Jiri Pirko, Netdev, David S. Miller, nhorman, Andy Gospodarek,
	Thomas Graf, dborkman, ogerlitz, jesse, pshelar, azhou, ben,
	stephen, Kirsher, Jeffrey T, vyasevic, xiyou.wangcong, Fastabend,
	John R, edumazet, Florian Fainelli, Roopa Prabhu, John Linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye, simon.ho

On 11/10/14 03:46, Scott Feldman wrote:

>
> IFLA_BRPORT_LEARNING is u8 attr and we're only using lower bit to turn
> learning on/off.  Maybe we can use another bit to indicate learning to
> be done in sw or hw.  I don't think adding another bit would break
> existing iproute2.
>
> LEARNING_ENABLED    (1 << 0)
> LEARNING_HW              (1 << 1)
>
> Would this work?
>

Yes to making it a bit. But:
This is not *learning*. You are doing a *sync*.
Those are two different things.

Learning on/off exists today. It signals to the L2 whether you
should learn or not.
I like the way fdb_add/del work with a flag which says
it is the software and/or offloaded version. Please keep that
semantic.
What you are doing above is letting the hardware learn then
syncing to software. You need a different flag there. something
like:

SYNC_HW_FDB (1<<1)


cheers,
jamal

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name
  2014-11-10 12:06       ` Jamal Hadi Salim
@ 2014-11-10 12:33         ` Daniel Borkmann
  2014-11-10 12:56           ` Jamal Hadi Salim
  2014-11-10 16:28         ` David Miller
  1 sibling, 1 reply; 100+ messages in thread
From: Daniel Borkmann @ 2014-11-10 12:33 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: David Miller, jiri, netdev, nhorman, andy, tgraf, ogerlitz,
	jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

On 11/10/2014 01:06 PM, Jamal Hadi Salim wrote:
> On 11/10/14 00:23, David Miller wrote:
>> From: Jamal Hadi Salim <jhs@mojatatu.com>
>> Date: Sun, 09 Nov 2014 22:35:12 -0500
>>
>>> wouldnt this just break an existing ABI? You may need to introduce a
>>> new attribute.
>>
>> He isn't breaking anything Jamal, he's just changing the internal
>> macro name we use for the attribute's maximum length.
>
> It is a _user space visible rename_, how about:
>
> #define MAX_PHYS_ITEM_ID_LEN 32
> #define MAX_PHYS_PORT_ID_LEN   MAX_PHYS_ITEM_ID_LEN
>
> I did miss the fact that the size didnt change.

Actually, it's currently not exposed via any uapi header ...

$ git grep -n MAX_PHYS_PORT_ID_LEN
include/linux/netdevice.h:756:#define MAX_PHYS_PORT_ID_LEN 32
include/linux/netdevice.h:762:  unsigned char id[MAX_PHYS_PORT_ID_LEN];
net/core/rtnetlink.c:871:              + nla_total_size(MAX_PHYS_PORT_ID_LEN); /* IFLA_PHYS_PORT_ID */
net/core/rtnetlink.c:1199:      [IFLA_PHYS_PORT_ID]     = { .type = NLA_BINARY, .len = MAX_PHYS_PORT_ID_LEN },

... and based on commit 66cae9ed6bc4 ("rtnl: export physical port id
via RT netlink") only exported as read-only.

Best,
Daniel

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 06/10] bridge: introduce fdb offloading via switchdev
  2014-11-10  8:15     ` Jiri Pirko
  2014-11-10  9:30       ` Scott Feldman
@ 2014-11-10 12:47       ` Jamal Hadi Salim
  2014-11-10 13:47         ` Jiri Pirko
  2014-11-10 13:51       ` Thomas Graf
  2 siblings, 1 reply; 100+ messages in thread
From: Jamal Hadi Salim @ 2014-11-10 12:47 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

On 11/10/14 03:15, Jiri Pirko wrote:
> Mon, Nov 10, 2014 at 04:47:48AM CET, jhs@mojatatu.com wrote:
>> On 11/09/14 05:51, Jiri Pirko wrote:
>>> From: Scott Feldman <sfeldma@gmail.com>
>>>

> Jamal, I believe we discussed this already.

I cant remember how that ended.

> The thing is that current
> fdb_add/del does not need vlanid and master/self flags, because it
> already has that (struct nlattr *tb[]). Here is the whole list of
> parameters to these functions:
>          NDA_DST,
>          NDA_LLADDR,
>          NDA_CACHEINFO,
>          NDA_PROBES,
>          NDA_VLAN,
>          NDA_PORT,
>          NDA_VNI,
>          NDA_IFINDEX,
>          NDA_MASTER,
>
> There are few problems in re-using this. It is netlink based so for calling
> it from bridge code, we would have to construct netlink message. But
> that could be probably changed.

Trying to understand.

A netlink message for a bridge to add an fdb is targeted at the
*bridge port*.
That message has semantic which says "please add this entry
to the software bridge and/or offloaded hardware".
If something is targetted at the bridge port, ->ndo_fdb_add()
is invoked with an internally chewed structure.
Why would you have to construct a new netlink message to the driver?


> As you can see from the list of parameters, this is no longer about fdb (addr,
> vlanid) but this has been extended to something else.

I am still missing understanding that part.
Or maybe are you saying that you dont want to pass netlink
constructs to the driver?

> See vxlan code for
> what this is used for. I believe that fdb_add/del should be renamed to
> something else, perhaps l2neigh_add/del or something like that.
> The other problem is that fdb_add/del is currently used by various
> drivers for different purpose (adding macs to unicast list).
>

Ok, now a small spark ignited in my brain. You did talk about renaming
things to neighXXX in one of the exchanges. I think this is a separate
issue from the question of why you cant refactor ndo_fdb_add/del

The abuse of using this interface for unicast addresses is probably
driven by the fact some of the hardware probably offloads vlanid 0 or
something speacial like 4095 to point to the underlying hardware that
"this belongs to host cpu".
I am not a fan of it (and have posted in exchanges with Vlad in the
past).

cheers,
jamal

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name
  2014-11-10 12:33         ` Daniel Borkmann
@ 2014-11-10 12:56           ` Jamal Hadi Salim
  0 siblings, 0 replies; 100+ messages in thread
From: Jamal Hadi Salim @ 2014-11-10 12:56 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David Miller, jiri, netdev, nhorman, andy, tgraf, ogerlitz,
	jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

On 11/10/14 07:33, Daniel Borkmann wrote:

> $ git grep -n MAX_PHYS_PORT_ID_LEN
> include/linux/netdevice.h:756:#define MAX_PHYS_PORT_ID_LEN 32
> include/linux/netdevice.h:762:  unsigned char id[MAX_PHYS_PORT_ID_LEN];
> net/core/rtnetlink.c:871:              +
> nla_total_size(MAX_PHYS_PORT_ID_LEN); /* IFLA_PHYS_PORT_ID */
> net/core/rtnetlink.c:1199:      [IFLA_PHYS_PORT_ID]     = { .type =
> NLA_BINARY, .len = MAX_PHYS_PORT_ID_LEN },
>
> ... and based on commit 66cae9ed6bc4 ("rtnl: export physical port id
> via RT netlink") only exported as read-only.
>

I guess it is *not exported* if no user space code sees it.
If that is the case, I agree that my suggestion is unneeded.

cheers,
jamal

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 07/10] bridge: call netdev_sw_port_stp_update when bridge port STP status changes
  2014-11-09 10:51 ` [patch net-next v2 07/10] bridge: call netdev_sw_port_stp_update when bridge port STP status changes Jiri Pirko
@ 2014-11-10 13:11   ` Jamal Hadi Salim
  2014-11-10 14:04     ` Thomas Graf
  2014-11-10 15:59     ` Roopa Prabhu
  0 siblings, 2 replies; 100+ messages in thread
From: Jamal Hadi Salim @ 2014-11-10 13:11 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

On 11/09/14 05:51, Jiri Pirko wrote:
> From: Scott Feldman <sfeldma@gmail.com>
>
> To notify switch driver of change in STP state of bridge port, add new
> .ndo op and provide swdev wrapper func to call ndo op. Use it in bridge
> code then.
>
> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>   include/linux/netdevice.h |  6 ++++++
>   include/net/switchdev.h   |  6 ++++++
>   net/bridge/br_netlink.c   |  2 ++
>   net/bridge/br_stp.c       |  4 ++++
>   net/bridge/br_stp_if.c    |  3 +++
>   net/bridge/br_stp_timer.c |  2 ++
>   net/switchdev/switchdev.c | 19 +++++++++++++++++++
>   7 files changed, 42 insertions(+)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 116a19d..35f21a95 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1033,6 +1033,10 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>    *			      const unsigned char *addr,
>    *			      u16 vid);
>    *	Called to delete a fdb from switch device port.
> + *
> + * int (*ndo_sw_port_stp_update)(struct net_device *dev, u8 state);
> + *	Called to notify switch device port of bridge port STP
> + *	state change.

You are unconditionally calling
netdev_sw_port_stp_update(p->dev, p->state);
Again issue is policy. Could you make this work the same
way the fdb_add e.g user intent of whether i want to turn
a port in hardware and/or software to disabled/learning/etc
is reflected?

btw: does _sw_ stand for switch? why not _hw_ ?
Could we have one ndo for all flags instead of individual ones.

I know the current user space code uses u8 as a bitflag; but
maybe we can introduce a new u32 flag bitmask that has all the
flags set for backward compat? I can count about a total of 10.

cheers,
jamal

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
  2014-11-10 12:16     ` Jamal Hadi Salim
@ 2014-11-10 13:12       ` Jiri Pirko
  0 siblings, 0 replies; 100+ messages in thread
From: Jiri Pirko @ 2014-11-10 13:12 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

Mon, Nov 10, 2014 at 01:16:16PM CET, jhs@mojatatu.com wrote:
>On 11/10/14 02:23, Jiri Pirko wrote:
>
>>
>>Yes I looked over their patches. Roopas patche's are about new class of
>>device which, as I commented in the cover letter, I left out for now and
>>can be safely added later on.
>>
>>I went over the Ben's work very carefully as well. The patches are very
>>rough, mostly rtl-chip specific. But again, my patchset is a base on
>>which this patches can be build on. I see no issues in that.
>>
>>>At least please get their sign on - this  is such an important piece of
>>>new work that you should make sure you get consensus.
>>
>>Since I did not use their code now, I only put sign off of Scott.
>>
>
>Your last comment was "i am going to merge the patches" ;->
>At least send an email explaining your plan to people who have worked
>hard to cooperate with you or say it in the cover letter.

Well I had feedback only from Roopa and we discussed it in following
email thread. I never had any feedback from Ben. I only saw you pasting
link to his git.

There's really nothing else to merge at the moment. I would love to went
over patches and merge them into my tree if anyone sends them.

>
>>>Otherwise we are back to square one and everyone is going their way with
>>>their patches;
>>
>>I do think that we are in sync. I do not see any counter ways. As I
>>said, their work can be added on to the base made of this patchset.
>>
>
>Ok, I hope so. I spoke for myself - it is important for this patches
>you get their sign-on in my opinion.
>
>cheers,
>jamal
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name
  2014-11-10 12:17       ` Jamal Hadi Salim
@ 2014-11-10 13:16         ` Jiri Pirko
  2014-11-10 13:20           ` Jamal Hadi Salim
  2014-11-10 16:28         ` David Miller
  1 sibling, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2014-11-10 13:16 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

Mon, Nov 10, 2014 at 01:17:04PM CET, jhs@mojatatu.com wrote:
>On 11/10/14 02:43, Jiri Pirko wrote:
>>Mon, Nov 10, 2014 at 04:35:12AM CET, jhs@mojatatu.com wrote:
>
>>I don't see a reason why this would break kabi:
>>
>>-#define MAX_PHYS_PORT_ID_LEN 32
>>+#define MAX_PHYS_ITEM_ID_LEN 32
>>
>
>refer to my response to Dave. Just define MAX_PHYS_PORT_ID_LEN to
>MAX_PHYS_ITEM_ID_LEN so people dont have to change their code
>because a name change.

Jamal, please look at the patch & code. MAX_PHYS_PORT_ID_LEN is in
include/linux/netdevice.h which is not part of user exported api.

>
>cheers,
>jamal

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name
  2014-11-10 13:16         ` Jiri Pirko
@ 2014-11-10 13:20           ` Jamal Hadi Salim
  0 siblings, 0 replies; 100+ messages in thread
From: Jamal Hadi Salim @ 2014-11-10 13:20 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

On 11/10/14 08:16, Jiri Pirko wrote:

> Jamal, please look at the patch & code. MAX_PHYS_PORT_ID_LEN is in
> include/linux/netdevice.h which is not part of user exported api.
>

I got it - the confusing part was rtnetlink.c was looking for it
as if it was expecting user space to send it.

cheers,
jamal

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 06/10] bridge: introduce fdb offloading via switchdev
  2014-11-10 12:47       ` Jamal Hadi Salim
@ 2014-11-10 13:47         ` Jiri Pirko
  2014-11-10 19:13           ` Jamal Hadi Salim
  0 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2014-11-10 13:47 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

Mon, Nov 10, 2014 at 01:47:33PM CET, jhs@mojatatu.com wrote:
>On 11/10/14 03:15, Jiri Pirko wrote:
>>Mon, Nov 10, 2014 at 04:47:48AM CET, jhs@mojatatu.com wrote:
>>>On 11/09/14 05:51, Jiri Pirko wrote:
>>>>From: Scott Feldman <sfeldma@gmail.com>
>>>>
>
>>Jamal, I believe we discussed this already.
>
>I cant remember how that ended.
>
>>The thing is that current
>>fdb_add/del does not need vlanid and master/self flags, because it
>>already has that (struct nlattr *tb[]). Here is the whole list of
>>parameters to these functions:
>>         NDA_DST,
>>         NDA_LLADDR,
>>         NDA_CACHEINFO,
>>         NDA_PROBES,
>>         NDA_VLAN,
>>         NDA_PORT,
>>         NDA_VNI,
>>         NDA_IFINDEX,
>>         NDA_MASTER,
>>
>>There are few problems in re-using this. It is netlink based so for calling
>>it from bridge code, we would have to construct netlink message. But
>>that could be probably changed.
>
>Trying to understand.
>
>A netlink message for a bridge to add an fdb is targeted at the
>*bridge port*.
>That message has semantic which says "please add this entry
>to the software bridge and/or offloaded hardware".
>If something is targetted at the bridge port, ->ndo_fdb_add()
>is invoked with an internally chewed structure.
>Why would you have to construct a new netlink message to the driver?

Because now, If you would like to pass one of NDA_DST, NDA_LLADDR,
NDA_CACHEINFO, NDA_PROBES, NDA_VLAN, NDA_PORT, NDA_VNI, NDA_IFINDEX,
NDA_MASTER values via ndo_fdb_add/del to the driver, you have to
construct "struct nlattr *tb[]". Preprocessing this tb into struct might
be suitable for some use-case, for some it may not.


>
>
>>As you can see from the list of parameters, this is no longer about fdb (addr,
>>vlanid) but this has been extended to something else.
>
>I am still missing understanding that part.
>Or maybe are you saying that you dont want to pass netlink
>constructs to the driver?

What I try to say is that the naming ndo_fdb_add/del is not accurate
because it is now used for far more than fdb (addr, vlan). See vxlan
code for example.


>
>>See vxlan code for
>>what this is used for. I believe that fdb_add/del should be renamed to
>>something else, perhaps l2neigh_add/del or something like that.
>>The other problem is that fdb_add/del is currently used by various
>>drivers for different purpose (adding macs to unicast list).
>>
>
>Ok, now a small spark ignited in my brain. You did talk about renaming
>things to neighXXX in one of the exchanges. I think this is a separate
>issue from the question of why you cant refactor ndo_fdb_add/del

It can be probably refactored in a way so it fits our fdb offloading
needs. I'm not really sure we would want it. ndo_fdb_* use-case
is dirrerent from what we introduce with ndo_sw_port_fdb_*. The only
similarity is the "fdb" name which in case of ndo_fdb_* is no longer
correct I believe.


>
>The abuse of using this interface for unicast addresses is probably
>driven by the fact some of the hardware probably offloads vlanid 0 or
>something speacial like 4095 to point to the underlying hardware that
>"this belongs to host cpu".
>I am not a fan of it (and have posted in exchanges with Vlad in the
>past).
>
>cheers,
>jamal

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 06/10] bridge: introduce fdb offloading via switchdev
  2014-11-10  8:15     ` Jiri Pirko
  2014-11-10  9:30       ` Scott Feldman
  2014-11-10 12:47       ` Jamal Hadi Salim
@ 2014-11-10 13:51       ` Thomas Graf
  2014-11-10 17:30         ` Andy Gospodarek
  2 siblings, 1 reply; 100+ messages in thread
From: Thomas Graf @ 2014-11-10 13:51 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jamal Hadi Salim, netdev, davem, nhorman, andy, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

On 11/10/14 at 09:15am, Jiri Pirko wrote:
> There are few problems in re-using this. It is netlink based so for calling
> it from bridge code, we would have to construct netlink message. But
> that could be probably changed.
> As you can see from the list of parameters, this is no longer about fdb (addr,
> vlanid) but this has been extended to something else. See vxlan code for
> what this is used for. I believe that fdb_add/del should be renamed to
> something else, perhaps l2neigh_add/del or something like that.
> The other problem is that fdb_add/del is currently used by various
> drivers for different purpose (adding macs to unicast list).

Can you elaborate a bit on the intended semantic differences between
the existing ndo_fdb_add() and ndo_sw_port_fdb_add()? I'm not sure we
need the sw_ prefix for this specific ndo.

I completely agree that relying on Netlink is wrong because we'll have
in-kernel users of the API but I believe that existing ndo_fdb_add()
implementations in i40e, ixgbe, qlcnic and macvlan could use the new
API you propose.

How about we rename the existing ndo_fdb_add() to ndo_neigh_add() as
you propose and convert vxlan over to it and have all others which don't
even depend on the Netlink attributes being passed in (i40e, ixgbe,
qlcnic, macvlan) use ndo_fdb_add() which would have the behaviour of your
proposed ndo_sw_port_fdb_add()?

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 07/10] bridge: call netdev_sw_port_stp_update when bridge port STP status changes
  2014-11-10 13:11   ` Jamal Hadi Salim
@ 2014-11-10 14:04     ` Thomas Graf
  2014-11-10 19:20       ` Jamal Hadi Salim
  2014-11-10 15:59     ` Roopa Prabhu
  1 sibling, 1 reply; 100+ messages in thread
From: Thomas Graf @ 2014-11-10 14:04 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, dborkman, ogerlitz,
	jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

On 11/10/14 at 08:11am, Jamal Hadi Salim wrote:
> You are unconditionally calling
> netdev_sw_port_stp_update(p->dev, p->state);
> Again issue is policy. Could you make this work the same
> way the fdb_add e.g user intent of whether i want to turn
> a port in hardware and/or software to disabled/learning/etc
> is reflected?

Agreed. Can be added in a next series perhaps?

> btw: does _sw_ stand for switch? why not _hw_ ?
> Could we have one ndo for all flags instead of individual ones.
> 
> I know the current user space code uses u8 as a bitflag; but
> maybe we can introduce a new u32 flag bitmask that has all the
> flags set for backward compat? I can count about a total of 10.

I think we can just extend the size of IFLA_BRPORT_STATE, accept
both a u8 and u32, and return a u32 that that is compatible to
existing u8 readers.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 07/10] bridge: call netdev_sw_port_stp_update when bridge port STP status changes
  2014-11-10 13:11   ` Jamal Hadi Salim
  2014-11-10 14:04     ` Thomas Graf
@ 2014-11-10 15:59     ` Roopa Prabhu
  1 sibling, 0 replies; 100+ messages in thread
From: Roopa Prabhu @ 2014-11-10 15:59 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, sfeldma,
	f.fainelli, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

On 11/10/14, 5:11 AM, Jamal Hadi Salim wrote:
> On 11/09/14 05:51, Jiri Pirko wrote:
>> From: Scott Feldman <sfeldma@gmail.com>
>>
>> To notify switch driver of change in STP state of bridge port, add new
>> .ndo op and provide swdev wrapper func to call ndo op. Use it in bridge
>> code then.
>>
>> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>>   include/linux/netdevice.h |  6 ++++++
>>   include/net/switchdev.h   |  6 ++++++
>>   net/bridge/br_netlink.c   |  2 ++
>>   net/bridge/br_stp.c       |  4 ++++
>>   net/bridge/br_stp_if.c    |  3 +++
>>   net/bridge/br_stp_timer.c |  2 ++
>>   net/switchdev/switchdev.c | 19 +++++++++++++++++++
>>   7 files changed, 42 insertions(+)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 116a19d..35f21a95 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -1033,6 +1033,10 @@ typedef u16 (*select_queue_fallback_t)(struct 
>> net_device *dev,
>>    *                  const unsigned char *addr,
>>    *                  u16 vid);
>>    *    Called to delete a fdb from switch device port.
>> + *
>> + * int (*ndo_sw_port_stp_update)(struct net_device *dev, u8 state);
>> + *    Called to notify switch device port of bridge port STP
>> + *    state change.
>
> You are unconditionally calling
> netdev_sw_port_stp_update(p->dev, p->state);
> Again issue is policy. Could you make this work the same
> way the fdb_add e.g user intent of whether i want to turn
> a port in hardware and/or software to disabled/learning/etc
> is reflected?
>
> btw: does _sw_ stand for switch? why not _hw_ ?
> Could we have one ndo for all flags instead of individual ones.

I agree. There is the bridge port state and a bunch of bridge port 
flags. A generic ndo will be good.
>
> I know the current user space code uses u8 as a bitflag; but
> maybe we can introduce a new u32 flag bitmask that has all the
> flags set for backward compat? I can count about a total of 10.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 10/10] rocker: implement L2 bridge offloading
  2014-11-10 12:27       ` Jamal Hadi Salim
@ 2014-11-10 16:12         ` Roopa Prabhu
  2014-11-10 17:36           ` Scott Feldman
  2014-11-10 17:22         ` Scott Feldman
  1 sibling, 1 reply; 100+ messages in thread
From: Roopa Prabhu @ 2014-11-10 16:12 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Scott Feldman, Jiri Pirko, Netdev, David S. Miller, nhorman,
	Andy Gospodarek, Thomas Graf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, Kirsher, Jeffrey T, vyasevic,
	xiyou.wangcong, Fastabend, John R, edumazet, Florian Fainelli,
	John Linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman

On 11/10/14, 4:27 AM, Jamal Hadi Salim wrote:
> On 11/10/14 03:46, Scott Feldman wrote:
>
>>
>> IFLA_BRPORT_LEARNING is u8 attr and we're only using lower bit to turn
>> learning on/off.  Maybe we can use another bit to indicate learning to
>> be done in sw or hw.  I don't think adding another bit would break
>> existing iproute2.
>>
>> LEARNING_ENABLED    (1 << 0)
>> LEARNING_HW              (1 << 1)
>>
>> Would this work?
>>
>
> Yes to making it a bit. But:
> This is not *learning*. You are doing a *sync*.
> Those are two different things.
>
> Learning on/off exists today. It signals to the L2 whether you
> should learn or not.
> I like the way fdb_add/del work with a flag which says
> it is the software and/or offloaded version. Please keep that
> semantic.
> What you are doing above is letting the hardware learn then
> syncing to software. You need a different flag there. something
> like:
>
> SYNC_HW_FDB (1<<1)
>
And in any case, It seems like this policy should be per bridge or per 
switch chip...or per fdb..
entry (like the original fdb_add/del) and not a "port" flag.. ?

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name
  2014-11-10 12:06       ` Jamal Hadi Salim
  2014-11-10 12:33         ` Daniel Borkmann
@ 2014-11-10 16:28         ` David Miller
  1 sibling, 0 replies; 100+ messages in thread
From: David Miller @ 2014-11-10 16:28 UTC (permalink / raw)
  To: jhs
  Cc: jiri, netdev, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: Mon, 10 Nov 2014 07:06:29 -0500

> It is a _user space visible rename_, how about:
> 
> #define MAX_PHYS_ITEM_ID_LEN 32
> #define MAX_PHYS_PORT_ID_LEN   MAX_PHYS_ITEM_ID_LEN
> 
> I did miss the fact that the size didnt change.

The user cannot see this macro Jamal, please really read this
code, instead of, once again, jumping to conclusions.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name
  2014-11-10 12:17       ` Jamal Hadi Salim
  2014-11-10 13:16         ` Jiri Pirko
@ 2014-11-10 16:28         ` David Miller
  2014-11-10 19:03           ` Jamal Hadi Salim
  1 sibling, 1 reply; 100+ messages in thread
From: David Miller @ 2014-11-10 16:28 UTC (permalink / raw)
  To: jhs
  Cc: jiri, netdev, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: Mon, 10 Nov 2014 07:17:04 -0500

> On 11/10/14 02:43, Jiri Pirko wrote:
>> Mon, Nov 10, 2014 at 04:35:12AM CET, jhs@mojatatu.com wrote:
> 
>> I don't see a reason why this would break kabi:
>>
>> -#define MAX_PHYS_PORT_ID_LEN 32
>> +#define MAX_PHYS_ITEM_ID_LEN 32
>>
> 
> refer to my response to Dave. Just define MAX_PHYS_PORT_ID_LEN to
> MAX_PHYS_ITEM_ID_LEN so people dont have to change their code
> because a name change.

Again, nobody has to change anything.

This macro is not visible outside of the kernel.

Jamal, this is really driving me crazy, this is a non-issue.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
  2014-11-09 10:51 [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload Jiri Pirko
                   ` (11 preceding siblings ...)
  2014-11-10  3:31 ` [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload Jamal Hadi Salim
@ 2014-11-10 16:48 ` Thomas Graf
  2014-11-12 13:44 ` Jiri Pirko
  13 siblings, 0 replies; 100+ messages in thread
From: Thomas Graf @ 2014-11-10 16:48 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

On 11/09/14 at 11:51am, Jiri Pirko wrote:
> Hi all.
> 
> This patchset is just the first phase of switch and switch-ish device
> support api in kernel. Note that the api will extend (our complete work
> can be pulled from https://github.com/jpirko/net-next-rocker).

Despite my comment on ndo_fdb_add() which I believe can be reconsidered
later. I like this pach series a lot. Thanks for putting in so much work
on this topic.

Reviewed-by: Thomas Graf <tgraf@suug.ch>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 10/10] rocker: implement L2 bridge offloading
  2014-11-10 12:27       ` Jamal Hadi Salim
  2014-11-10 16:12         ` Roopa Prabhu
@ 2014-11-10 17:22         ` Scott Feldman
  1 sibling, 0 replies; 100+ messages in thread
From: Scott Feldman @ 2014-11-10 17:22 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jiri Pirko, Netdev, David S. Miller, nhorman, Andy Gospodarek,
	Thomas Graf, dborkman, ogerlitz, jesse, pshelar, azhou, ben,
	stephen, Kirsher, Jeffrey T, vyasevic, Cong Wang, Fastabend,
	John R, Eric Dumazet, Florian Fainelli, Roopa Prabhu,
	John Linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, Alexei Starovoitov, Neil.Je

On Mon, Nov 10, 2014 at 2:27 AM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On 11/10/14 03:46, Scott Feldman wrote:
>
>>
>> IFLA_BRPORT_LEARNING is u8 attr and we're only using lower bit to turn
>> learning on/off.  Maybe we can use another bit to indicate learning to
>> be done in sw or hw.  I don't think adding another bit would break
>> existing iproute2.
>>
>> LEARNING_ENABLED    (1 << 0)
>> LEARNING_HW              (1 << 1)
>>
>> Would this work?
>>
>
> Yes to making it a bit. But:
> This is not *learning*. You are doing a *sync*.
> Those are two different things.
>
> Learning on/off exists today. It signals to the L2 whether you
> should learn or not.
> I like the way fdb_add/del work with a flag which says
> it is the software and/or offloaded version. Please keep that
> semantic.
> What you are doing above is letting the hardware learn then
> syncing to software. You need a different flag there. something
> like:
>
> SYNC_HW_FDB (1<<1)

Agreed, that's more accurate.  Thanks for the refinement.

>
> cheers,
> jamal

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 06/10] bridge: introduce fdb offloading via switchdev
  2014-11-10 13:51       ` Thomas Graf
@ 2014-11-10 17:30         ` Andy Gospodarek
  2014-11-10 19:03           ` Roopa Prabhu
  0 siblings, 1 reply; 100+ messages in thread
From: Andy Gospodarek @ 2014-11-10 17:30 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Jiri Pirko, Jamal Hadi Salim, netdev, davem, nhorman, andy,
	dborkman, ogerlitz, jesse, pshelar, azhou, ben, stephen,
	jeffrey.t.kirsher, vyasevic, xiyou.wangcong, john.r.fastabend,
	edumazet, sfeldma, f.fainelli, roopa, linville, jasowang,
	ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd,
	alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, b

On Mon, Nov 10, 2014 at 01:51:00PM +0000, Thomas Graf wrote:
> On 11/10/14 at 09:15am, Jiri Pirko wrote:
> > There are few problems in re-using this. It is netlink based so for calling
> > it from bridge code, we would have to construct netlink message. But
> > that could be probably changed.
> > As you can see from the list of parameters, this is no longer about fdb (addr,
> > vlanid) but this has been extended to something else. See vxlan code for
> > what this is used for. I believe that fdb_add/del should be renamed to
> > something else, perhaps l2neigh_add/del or something like that.
> > The other problem is that fdb_add/del is currently used by various
> > drivers for different purpose (adding macs to unicast list).
> 
> Can you elaborate a bit on the intended semantic differences between
> the existing ndo_fdb_add() and ndo_sw_port_fdb_add()? I'm not sure we
> need the sw_ prefix for this specific ndo.
> 
> I completely agree that relying on Netlink is wrong because we'll have
> in-kernel users of the API but I believe that existing ndo_fdb_add()
> implementations in i40e, ixgbe, qlcnic and macvlan could use the new
> API you propose.
I also think the same API could be used quite easily on the current
drivers that use it.

I was looking at this earlier today and there are only 5 drivers
(outside the bridge code) that support ndo_fdb_add.  The 3 hardware
drivers and vxlan driver seem like they could use this new API.  The
macvlan code appears to simply set the uc and mc lists, which seems like
it could be done other ways -- confirmation from John Fastabend, Roopa,
and mst would be good.

> How about we rename the existing ndo_fdb_add() to ndo_neigh_add() as
> you propose and convert vxlan over to it and have all others which don't
> even depend on the Netlink attributes being passed in (i40e, ixgbe,
> qlcnic, macvlan) use ndo_fdb_add() which would have the behaviour of your
> proposed ndo_sw_port_fdb_add()?
I would much rather see something like Thomas proposes here.  I know you
would like to see these patches get included (I'm anxious to see better
in-kernel offload support too!), but separate, possibly unnecessary
APIs like this can get painful for driver maintainers (upstream and
distro maintainers).

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 10/10] rocker: implement L2 bridge offloading
  2014-11-10 16:12         ` Roopa Prabhu
@ 2014-11-10 17:36           ` Scott Feldman
  2014-11-10 18:35             ` Roopa Prabhu
  2014-11-10 19:25             ` Jamal Hadi Salim
  0 siblings, 2 replies; 100+ messages in thread
From: Scott Feldman @ 2014-11-10 17:36 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Jamal Hadi Salim, Jiri Pirko, Netdev, David S. Miller, nhorman,
	Andy Gospodarek, Thomas Graf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, Kirsher, Jeffrey T, vyasevic, Cong Wang,
	Fastabend, John R, Eric Dumazet, Florian Fainelli, John Linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, Alexei Starovoitov, Neil

On Mon, Nov 10, 2014 at 6:12 AM, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
> On 11/10/14, 4:27 AM, Jamal Hadi Salim wrote:
>>
>> On 11/10/14 03:46, Scott Feldman wrote:
>>
>>>
>>> IFLA_BRPORT_LEARNING is u8 attr and we're only using lower bit to turn
>>> learning on/off.  Maybe we can use another bit to indicate learning to
>>> be done in sw or hw.  I don't think adding another bit would break
>>> existing iproute2.
>>>
>>> LEARNING_ENABLED    (1 << 0)
>>> LEARNING_HW              (1 << 1)
>>>
>>> Would this work?
>>>
>>
>> Yes to making it a bit. But:
>> This is not *learning*. You are doing a *sync*.
>> Those are two different things.
>>
>> Learning on/off exists today. It signals to the L2 whether you
>> should learn or not.
>> I like the way fdb_add/del work with a flag which says
>> it is the software and/or offloaded version. Please keep that
>> semantic.
>> What you are doing above is letting the hardware learn then
>> syncing to software. You need a different flag there. something
>> like:
>>
>> SYNC_HW_FDB (1<<1)
>>
> And in any case, It seems like this policy should be per bridge or per
> switch chip...or per fdb..
> entry (like the original fdb_add/del) and not a "port" flag.. ?

Per-port gives more flexibility, and it looks like we can extend
existing IFLA_BRPORT_LEARNING without much trouble.

I didn't follow the fdb_add/del comment?  Isn't an fdb entry
port-specific by nature?  fdb entry = {port, mac, vlan} tuple.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 03/10] rtnl: expose physical switch id for particular device
  2014-11-09 10:51 ` [patch net-next v2 03/10] rtnl: expose physical switch id for particular device Jiri Pirko
  2014-11-10  3:43   ` Jamal Hadi Salim
@ 2014-11-10 17:58   ` Roopa Prabhu
  2014-11-10 20:02     ` Scott Feldman
  2014-11-10 22:14     ` Jiri Pirko
  2014-11-10 22:01   ` John Fastabend
  2 siblings, 2 replies; 100+ messages in thread
From: Roopa Prabhu @ 2014-11-10 17:58 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

On 11/9/14, 2:51 AM, Jiri Pirko wrote:
> The netdevice represents a port in a switch, it will expose
> IFLA_PHYS_SWITCH_ID value via rtnl. Two netdevices with the same value
> belong to one physical switch.
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>   include/uapi/linux/if_link.h |  1 +
>   net/core/rtnetlink.c         | 26 +++++++++++++++++++++++++-
>   2 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index 7072d83..4163753 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -145,6 +145,7 @@ enum {
>   	IFLA_CARRIER,
>   	IFLA_PHYS_PORT_ID,
>   	IFLA_CARRIER_CHANGES,
> +	IFLA_PHYS_SWITCH_ID,

Jiri, since we have not really converged on the switchdev class or 
having a separate switchdev instance,
am thinking it is better if we dont expose any such switch_id to 
userspace yet until absolutely needed. Do you need it today ?
There is no real in kernel hw switch driver that will use it today. And 
quite likely this will need to change when we introduce real hw switch 
drivers.


>   	__IFLA_MAX
>   };
>   
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 1087c6d..f839354 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -43,6 +43,7 @@
>   
>   #include <linux/inet.h>
>   #include <linux/netdevice.h>
> +#include <net/switchdev.h>
>   #include <net/ip.h>
>   #include <net/protocol.h>
>   #include <net/arp.h>
> @@ -868,7 +869,8 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
>   	       + rtnl_port_size(dev, ext_filter_mask) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
>   	       + rtnl_link_get_size(dev) /* IFLA_LINKINFO */
>   	       + rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */
> -	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN); /* IFLA_PHYS_PORT_ID */
> +	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_PORT_ID */
> +	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN); /* IFLA_PHYS_SWITCH_ID */
>   }
>   
>   static int rtnl_vf_ports_fill(struct sk_buff *skb, struct net_device *dev)
> @@ -967,6 +969,24 @@ static int rtnl_phys_port_id_fill(struct sk_buff *skb, struct net_device *dev)
>   	return 0;
>   }
>   
> +static int rtnl_phys_switch_id_fill(struct sk_buff *skb, struct net_device *dev)
> +{
> +	int err;
> +	struct netdev_phys_item_id psid;
> +
> +	err = netdev_sw_parent_id_get(dev, &psid);
> +	if (err) {
> +		if (err == -EOPNOTSUPP)
> +			return 0;
> +		return err;
> +	}
> +
> +	if (nla_put(skb, IFLA_PHYS_SWITCH_ID, psid.id_len, psid.id))
> +		return -EMSGSIZE;
> +
> +	return 0;
> +}
> +
>   static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
>   			    int type, u32 pid, u32 seq, u32 change,
>   			    unsigned int flags, u32 ext_filter_mask)
> @@ -1039,6 +1059,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
>   	if (rtnl_phys_port_id_fill(skb, dev))
>   		goto nla_put_failure;
>   
> +	if (rtnl_phys_switch_id_fill(skb, dev))
> +		goto nla_put_failure;
> +
>   	attr = nla_reserve(skb, IFLA_STATS,
>   			sizeof(struct rtnl_link_stats));
>   	if (attr == NULL)
> @@ -1198,6 +1221,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
>   	[IFLA_NUM_RX_QUEUES]	= { .type = NLA_U32 },
>   	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
>   	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
> +	[IFLA_PHYS_SWITCH_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
>   };
>   
>   static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 10/10] rocker: implement L2 bridge offloading
  2014-11-10 17:36           ` Scott Feldman
@ 2014-11-10 18:35             ` Roopa Prabhu
  2014-11-10 19:27               ` Jamal Hadi Salim
  2014-11-10 19:25             ` Jamal Hadi Salim
  1 sibling, 1 reply; 100+ messages in thread
From: Roopa Prabhu @ 2014-11-10 18:35 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Jamal Hadi Salim, Jiri Pirko, Netdev, David S. Miller, nhorman,
	Andy Gospodarek, Thomas Graf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, Kirsher, Jeffrey T, vyasevic, Cong Wang,
	Fastabend, John R, Eric Dumazet, Florian Fainelli, John Linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, Alexei Starovoitov, Neil

On 11/10/14, 9:36 AM, Scott Feldman wrote:
> On Mon, Nov 10, 2014 at 6:12 AM, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
>> On 11/10/14, 4:27 AM, Jamal Hadi Salim wrote:
>>> On 11/10/14 03:46, Scott Feldman wrote:
>>>
>>>> IFLA_BRPORT_LEARNING is u8 attr and we're only using lower bit to turn
>>>> learning on/off.  Maybe we can use another bit to indicate learning to
>>>> be done in sw or hw.  I don't think adding another bit would break
>>>> existing iproute2.
>>>>
>>>> LEARNING_ENABLED    (1 << 0)
>>>> LEARNING_HW              (1 << 1)
>>>>
>>>> Would this work?
>>>>
>>> Yes to making it a bit. But:
>>> This is not *learning*. You are doing a *sync*.
>>> Those are two different things.
>>>
>>> Learning on/off exists today. It signals to the L2 whether you
>>> should learn or not.
>>> I like the way fdb_add/del work with a flag which says
>>> it is the software and/or offloaded version. Please keep that
>>> semantic.
>>> What you are doing above is letting the hardware learn then
>>> syncing to software. You need a different flag there. something
>>> like:
>>>
>>> SYNC_HW_FDB (1<<1)
>>>
>> And in any case, It seems like this policy should be per bridge or per
>> switch chip...or per fdb..
>> entry (like the original fdb_add/del) and not a "port" flag.. ?
> Per-port gives more flexibility, and it looks like we can extend
> existing IFLA_BRPORT_LEARNING without much trouble.
>
> I didn't follow the fdb_add/del comment?  Isn't an fdb entry
> port-specific by nature?  fdb entry = {port, mac, vlan} tuple.
yes it is, But if i remember correctly, the api (ndo op) could indicate 
offload to hw (or nic in this case)
by giving 'self'. And in those cases the netdev nic port represents the 
switch.
  (Will be nice to check and confirm this though).

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name
  2014-11-10 16:28         ` David Miller
@ 2014-11-10 19:03           ` Jamal Hadi Salim
  0 siblings, 0 replies; 100+ messages in thread
From: Jamal Hadi Salim @ 2014-11-10 19:03 UTC (permalink / raw)
  To: David Miller
  Cc: jiri, netdev, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bc

On 11/10/14 11:28, David Miller wrote:
> From: Jamal Hadi Salim <jhs@mojatatu.com>

>
> Jamal, this is really driving me crazy, this is a non-issue.
>

You are right Dave it is a non-issue, I am sorry. I was still humping on
the wrong tree on the attribute size change.

But please let us have the discussion so the right thing can happen.
I am not trying to beat up on Jiri. I actually think he is doing the
right thing; we just need to make sure input from other people is taken
into consideration because they have implemented on real devices.

cheers,
jamal

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 06/10] bridge: introduce fdb offloading via switchdev
  2014-11-10 17:30         ` Andy Gospodarek
@ 2014-11-10 19:03           ` Roopa Prabhu
  2014-11-12 13:43             ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: Roopa Prabhu @ 2014-11-10 19:03 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Thomas Graf, Jiri Pirko, Jamal Hadi Salim, netdev, davem,
	nhorman, andy, dborkman, ogerlitz, jesse, pshelar, azhou, ben,
	stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet

On 11/10/14, 9:30 AM, Andy Gospodarek wrote:
> On Mon, Nov 10, 2014 at 01:51:00PM +0000, Thomas Graf wrote:
>> On 11/10/14 at 09:15am, Jiri Pirko wrote:
>>> There are few problems in re-using this. It is netlink based so for calling
>>> it from bridge code, we would have to construct netlink message. But
>>> that could be probably changed.
>>> As you can see from the list of parameters, this is no longer about fdb (addr,
>>> vlanid) but this has been extended to something else. See vxlan code for
>>> what this is used for. I believe that fdb_add/del should be renamed to
>>> something else, perhaps l2neigh_add/del or something like that.
>>> The other problem is that fdb_add/del is currently used by various
>>> drivers for different purpose (adding macs to unicast list).
>> Can you elaborate a bit on the intended semantic differences between
>> the existing ndo_fdb_add() and ndo_sw_port_fdb_add()? I'm not sure we
>> need the sw_ prefix for this specific ndo.
>>
>> I completely agree that relying on Netlink is wrong because we'll have
>> in-kernel users of the API but I believe that existing ndo_fdb_add()
>> implementations in i40e, ixgbe, qlcnic and macvlan could use the new
>> API you propose.
> I also think the same API could be used quite easily on the current
> drivers that use it.
>
> I was looking at this earlier today and there are only 5 drivers
> (outside the bridge code) that support ndo_fdb_add.  The 3 hardware
> drivers and vxlan driver seem like they could use this new API.  The
> macvlan code appears to simply set the uc and mc lists, which seems like
> it could be done other ways -- confirmation from John Fastabend, Roopa,
> and mst would be good.
yes, that is correct. The macvlan code, when not set for passthru mode, 
seems to just program the uc-mc lists,
  which again get synched to the lowerdev (and possibly from lowerdev to 
hw in some cases).

I agree that it would be really nice if the existing api's can be made 
to work.
>> How about we rename the existing ndo_fdb_add() to ndo_neigh_add() as
>> you propose and convert vxlan over to it and have all others which don't
>> even depend on the Netlink attributes being passed in (i40e, ixgbe,
>> qlcnic, macvlan) use ndo_fdb_add() which would have the behaviour of your
>> proposed ndo_sw_port_fdb_add()?
> I would much rather see something like Thomas proposes here.  I know you
> would like to see these patches get included (I'm anxious to see better
> in-kernel offload support too!), but separate, possibly unnecessary
> APIs like this can get painful for driver maintainers (upstream and
> distro maintainers).
>
Ack!.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 06/10] bridge: introduce fdb offloading via switchdev
  2014-11-10 13:47         ` Jiri Pirko
@ 2014-11-10 19:13           ` Jamal Hadi Salim
  0 siblings, 0 replies; 100+ messages in thread
From: Jamal Hadi Salim @ 2014-11-10 19:13 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

On 11/10/14 08:47, Jiri Pirko wrote:
> Mon, Nov 10, 2014 at 01:47:33PM CET, jhs@mojatatu.com wrote:

>
> Because now, If you would like to pass one of NDA_DST, NDA_LLADDR,
> NDA_CACHEINFO, NDA_PROBES, NDA_VLAN, NDA_PORT, NDA_VNI, NDA_IFINDEX,
> NDA_MASTER values via ndo_fdb_add/del to the driver, you have to
> construct "struct nlattr *tb[]". Preprocessing this tb into struct might
> be suitable for some use-case, for some it may not.
>

Ok, I see what you mean now. yes, netlink attributes are passed around.

>
> What I try to say is that the naming ndo_fdb_add/del is not accurate
> because it is now used for far more than fdb (addr, vlan). See vxlan
> code for example.
>

In vxlan semantics that is an "fdb".
In any case, I am indifferent to be honest - netlink attributes in
this case seem to be easier. i.e For the driver:
Ignore what you dont want and suck in what you need (whether vxlan
or vlan or nvgre etc).

cheers,
jamal

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 07/10] bridge: call netdev_sw_port_stp_update when bridge port STP status changes
  2014-11-10 14:04     ` Thomas Graf
@ 2014-11-10 19:20       ` Jamal Hadi Salim
  0 siblings, 0 replies; 100+ messages in thread
From: Jamal Hadi Salim @ 2014-11-10 19:20 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, dborkman, ogerlitz,
	jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

On 11/10/14 09:04, Thomas Graf wrote:
> On 11/10/14 at 08:11am, Jamal Hadi Salim wrote:
>> You are unconditionally calling
>> netdev_sw_port_stp_update(p->dev, p->state);
>> Again issue is policy. Could you make this work the same
>> way the fdb_add e.g user intent of whether i want to turn
>> a port in hardware and/or software to disabled/learning/etc
>> is reflected?
>
> Agreed. Can be added in a next series perhaps?
>

Doesnt seem be hard to fix now. As Andy was pointing out, we have
the opportunity to get the basics right in the beggining.


> I think we can just extend the size of IFLA_BRPORT_STATE, accept
> both a u8 and u32, and return a u32 that that is compatible to
> existing u8 readers.
>

Iam thinking we have an opportunity for a totally different new
attribute instead of growing IFLA_BRPORT_STATE to add another bit.
Almost every single u8 that is being carried today in its own attribute
in the bridge code is in fact a boolean (0/1). We could leave just
intro IFLA_BRIDGE_FLAGS and use it for both FDB and BRPORT.

cheers,
jamal

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 10/10] rocker: implement L2 bridge offloading
  2014-11-10 17:36           ` Scott Feldman
  2014-11-10 18:35             ` Roopa Prabhu
@ 2014-11-10 19:25             ` Jamal Hadi Salim
  1 sibling, 0 replies; 100+ messages in thread
From: Jamal Hadi Salim @ 2014-11-10 19:25 UTC (permalink / raw)
  To: Scott Feldman, Roopa Prabhu
  Cc: Jiri Pirko, Netdev, David S. Miller, nhorman, Andy Gospodarek,
	Thomas Graf, dborkman, ogerlitz, jesse, pshelar, azhou, ben,
	stephen, Kirsher, Jeffrey T, vyasevic, Cong Wang, Fastabend,
	John R, Eric Dumazet, Florian Fainelli, John Linville, jasowang,
	ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd,
	Alexei Starovoitov, Neil Jerram, ronye

On 11/10/14 12:36, Scott Feldman wrote:
> On Mon, Nov 10, 2014 at 6:12 AM, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:

> Per-port gives more flexibility, and it looks like we can extend
> existing IFLA_BRPORT_LEARNING without much trouble.
>

I am thinking per port as well...
So we have hardware and/or software selection of per port:
fdb flooding control, fdb learning control,
stp controls, multicast controls
some of those weird vepa controls

This is why i thought all these could use a bit representation each.
You have 32 bit flags and 32 bit selector.

cheers,
jamal

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 10/10] rocker: implement L2 bridge offloading
  2014-11-10 18:35             ` Roopa Prabhu
@ 2014-11-10 19:27               ` Jamal Hadi Salim
  2014-11-10 19:47                 ` Scott Feldman
  0 siblings, 1 reply; 100+ messages in thread
From: Jamal Hadi Salim @ 2014-11-10 19:27 UTC (permalink / raw)
  To: Roopa Prabhu, Scott Feldman
  Cc: Jiri Pirko, Netdev, David S. Miller, nhorman, Andy Gospodarek,
	Thomas Graf, dborkman, ogerlitz, jesse, pshelar, azhou, ben,
	stephen, Kirsher, Jeffrey T, vyasevic, Cong Wang, Fastabend,
	John R, Eric Dumazet, Florian Fainelli, John Linville, jasowang,
	ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd,
	Alexei Starovoitov, Neil Jerram, ronye

On 11/10/14 13:35, Roopa Prabhu wrote:
> On 11/10/14, 9:36 AM, Scott Feldman wrote:
>> On Mon, Nov 10, 2014 at 6:12 AM, Roopa Prabhu
>> <roopa@cumulusnetworks.com> wrote:
>>> On 11/10/14, 4:27 AM, Jamal Hadi Salim wrote:
>>>> On 11/10/14 03:46, Scott Feldman wrote:
>
> yes it is, But if i remember correctly, the api (ndo op) could indicate
> offload to hw (or nic in this case)
> by giving 'self'. And in those cases the netdev nic port represents the
> switch.
>   (Will be nice to check and confirm this though).

No, you are correct. You select to add to the bridge fdb or/and via
the underlying brport fdb.

cheers,
jamal

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 10/10] rocker: implement L2 bridge offloading
  2014-11-10 19:27               ` Jamal Hadi Salim
@ 2014-11-10 19:47                 ` Scott Feldman
  2014-11-10 21:14                   ` Jamal Hadi Salim
  0 siblings, 1 reply; 100+ messages in thread
From: Scott Feldman @ 2014-11-10 19:47 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Roopa Prabhu, Jiri Pirko, Netdev, David S. Miller, nhorman,
	Andy Gospodarek, Thomas Graf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, Kirsher, Jeffrey T, vyasevic, Cong Wang,
	Fastabend, John R, Eric Dumazet, Florian Fainelli, John Linville,
	jasowang, ebiederm, Nicolas Dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, Alexei Starovoitov

On Mon, Nov 10, 2014 at 9:27 AM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On 11/10/14 13:35, Roopa Prabhu wrote:
>>
>> On 11/10/14, 9:36 AM, Scott Feldman wrote:
>>>
>>> On Mon, Nov 10, 2014 at 6:12 AM, Roopa Prabhu
>>> <roopa@cumulusnetworks.com> wrote:
>>>>
>>>> On 11/10/14, 4:27 AM, Jamal Hadi Salim wrote:
>>>>>
>>>>> On 11/10/14 03:46, Scott Feldman wrote:
>>
>>
>> yes it is, But if i remember correctly, the api (ndo op) could indicate
>> offload to hw (or nic in this case)
>> by giving 'self'. And in those cases the netdev nic port represents the
>> switch.
>>   (Will be nice to check and confirm this though).
>
>
> No, you are correct. You select to add to the bridge fdb or/and via
> the underlying brport fdb.


For swdev, I don't care for the model where each port has an fdb and
the bridge has an fdb.  The bridge's fdb lookup/learning/fwding is
what we're offloading to HW, so it makes more sense from the driver
and to the user to use one fdb, the bridge's fdb.  So user types
"bridge fdb show" and static fdbs installed on the bridge and learned
fdbs synced from HW are represented.  One table.

I view the existing ndo_fdb_add/del ops useful for devices working
standalone without the bridge driver that have some HW fwding
capabilities and need to manage their own fdb.  For devices under
bridge, let's use the bridge's fdb, at least for swdev.

Does this make sense?  I hate to use a lot of "I"s in my sentences,
but looks like I did exactly that in above, so take this as an
opinion, within the scope of swdev.

-scott

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 03/10] rtnl: expose physical switch id for particular device
  2014-11-10 17:58   ` Roopa Prabhu
@ 2014-11-10 20:02     ` Scott Feldman
  2014-11-11 13:55       ` Roopa Prabhu
  2014-11-10 22:14     ` Jiri Pirko
  1 sibling, 1 reply; 100+ messages in thread
From: Scott Feldman @ 2014-11-10 20:02 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Jiri Pirko, Netdev, David S. Miller, nhorman, Andy Gospodarek,
	Thomas Graf, dborkman, ogerlitz, jesse, pshelar, azhou, ben,
	stephen, Kirsher, Jeffrey T, vyasevic, Cong Wang, Fastabend,
	John R, Eric Dumazet, Jamal Hadi Salim, Florian Fainelli,
	John Linville, jasowang, ebiederm, Nicolas Dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, Alexei Starovoitov

On Mon, Nov 10, 2014 at 7:58 AM, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
> On 11/9/14, 2:51 AM, Jiri Pirko wrote:
>>
>> The netdevice represents a port in a switch, it will expose
>> IFLA_PHYS_SWITCH_ID value via rtnl. Two netdevices with the same value
>> belong to one physical switch.
>>
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>>   include/uapi/linux/if_link.h |  1 +
>>   net/core/rtnetlink.c         | 26 +++++++++++++++++++++++++-
>>   2 files changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
>> index 7072d83..4163753 100644
>> --- a/include/uapi/linux/if_link.h
>> +++ b/include/uapi/linux/if_link.h
>> @@ -145,6 +145,7 @@ enum {
>>         IFLA_CARRIER,
>>         IFLA_PHYS_PORT_ID,
>>         IFLA_CARRIER_CHANGES,
>> +       IFLA_PHYS_SWITCH_ID,
>
>
> Jiri, since we have not really converged on the switchdev class or having a
> separate switchdev instance,
> am thinking it is better if we dont expose any such switch_id to userspace
> yet until absolutely needed. Do you need it today ?
> There is no real in kernel hw switch driver that will use it today. And
> quite likely this will need to change when we introduce real hw switch
> drivers.

How will it change when real hw switch drivers are introduced?  Will
the real sw driver not be able to give a up unique ID for the switch?

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 10/10] rocker: implement L2 bridge offloading
  2014-11-10 19:47                 ` Scott Feldman
@ 2014-11-10 21:14                   ` Jamal Hadi Salim
  0 siblings, 0 replies; 100+ messages in thread
From: Jamal Hadi Salim @ 2014-11-10 21:14 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Roopa Prabhu, Jiri Pirko, Netdev, David S. Miller, nhorman,
	Andy Gospodarek, Thomas Graf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, Kirsher, Jeffrey T, vyasevic, Cong Wang,
	Fastabend, John R, Eric Dumazet, Florian Fainelli, John Linville,
	jasowang, ebiederm, Nicolas Dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, Alexei Starovoitov

On 11/10/14 14:47, Scott Feldman wrote:
> On Mon, Nov 10, 2014 at 9:27 AM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:

>
> For swdev, I don't care for the model where each port has an fdb and
> the bridge has an fdb.  The bridge's fdb lookup/learning/fwding is
> what we're offloading to HW, so it makes more sense from the driver
> and to the user to use one fdb, the bridge's fdb.  So user types
> "bridge fdb show" and static fdbs installed on the bridge and learned
> fdbs synced from HW are represented.  One table.
>

side note:
I hope we'd be able to tell apart what is in hardware vs software
and what has been synced up from hardware (assuming policy says we
are allowed to sync things).
Yes, there is one fdb table per bridge - but each entry would say
which brport is involved. So the reference point being a bridge point
sounds reasonable to me. i.e
Any fdb entry whether in h/w or s/w would point to an egress
brport *always*. Are you thinking there is only one possible bridge?
Caveat: a piece of hardware could have multiple virtual bridges.
In what Ben showed on the $5 realtek, after boot up we just
know which brports exist and nothing more.
We can then create a bridge and attach brport to each. This gets
reflected into hardware on a per brport level (I think there was
a field called FID?).
Constraint: Each brport connects only to one bridge.

> I view the existing ndo_fdb_add/del ops useful for devices working
> standalone without the bridge driver that have some HW fwding
> capabilities and need to manage their own fdb.

As in offload said fdb?

>For devices under
> bridge, let's use the bridge's fdb, at least for swdev.
>

If the hardware can only do one bridge sure.

> Does this make sense?

not quiet for me but i may be missing something.

I hate to use a lot of "I"s in my sentences,
> but looks like I did exactly that in above, so take this as an
> opinion, within the scope of swdev.
>

"I" is useful for expressing an opinion or an expectation of course ;->
As an example:
_I_ believe we should be able to define how {learning, flooding etc} and 
where {hw vs sw} things are to be installed or learnt-from.
I have use cases where the controller makes all the decisions.
And i tried to provide my motivation in one of the meetings here:
https://linux.cumulusnetworks.com/offload-discussion-1/jamal-NFstatecaching.pdf

cheers,
jamal

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name
  2014-11-09 10:51 ` [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name Jiri Pirko
  2014-11-10  3:35   ` Jamal Hadi Salim
@ 2014-11-10 21:57   ` John Fastabend
  1 sibling, 0 replies; 100+ messages in thread
From: John Fastabend @ 2014-11-10 21:57 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

On 11/09/2014 02:51 AM, Jiri Pirko wrote:
> So this can be reused for identification of other "items" as well.
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---

Looks good to me. Just a simple code refactoring and doesn't
touch any user visible uapi files and makes the next patches a
bit cleaner.

Acked-by: John Fastabend <john.r.fastabend@intel.com>


-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 02/10] net: introduce generic switch devices support
  2014-11-09 10:51 ` [patch net-next v2 02/10] net: introduce generic switch devices support Jiri Pirko
@ 2014-11-10 21:59   ` John Fastabend
  2014-11-11 15:11     ` Jiri Pirko
  2014-11-11  9:49   ` M. Braun
  2014-11-19 13:28   ` Roopa Prabhu
  2 siblings, 1 reply; 100+ messages in thread
From: John Fastabend @ 2014-11-10 21:59 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

On 11/09/2014 02:51 AM, Jiri Pirko wrote:
> The goal of this is to provide a possibility to support various switch
> chips. Drivers should implement relevant ndos to do so. Now there is
> only one ndo defined:
> - for getting physical switch id is in place.
>
> Note that user can use random port netdevice to access the switch.
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>   Documentation/networking/switchdev.txt | 59 ++++++++++++++++++++++++++++++++++
>   MAINTAINERS                            |  7 ++++
>   include/linux/netdevice.h              | 10 ++++++
>   include/net/switchdev.h                | 30 +++++++++++++++++
>   net/Kconfig                            |  1 +
>   net/Makefile                           |  3 ++
>   net/switchdev/Kconfig                  | 13 ++++++++
>   net/switchdev/Makefile                 |  5 +++
>   net/switchdev/switchdev.c              | 33 +++++++++++++++++++
>   9 files changed, 161 insertions(+)
>   create mode 100644 Documentation/networking/switchdev.txt
>   create mode 100644 include/net/switchdev.h
>   create mode 100644 net/switchdev/Kconfig
>   create mode 100644 net/switchdev/Makefile
>   create mode 100644 net/switchdev/switchdev.c
>
> diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
> new file mode 100644
> index 0000000..98be76c
> --- /dev/null
> +++ b/Documentation/networking/switchdev.txt
> @@ -0,0 +1,59 @@
> +Switch (and switch-ish) device drivers HOWTO
> +===========================
> +
> +Please note that the word "switch" is here used in very generic meaning.
> +This include devices supporting L2/L3 but also various flow offloading chips,
> +including switches embedded into SR-IOV NICs.
> +
> +Lets describe a topology a bit. Imagine the following example:
> +
> +       +----------------------------+    +---------------+
> +       |     SOME switch chip       |    |      CPU      |
> +       +----------------------------+    +---------------+
> +       port1 port2 port3 port4 MNGMNT    |     PCI-E     |
> +         |     |     |     |     |       +---------------+
> +        PHY   PHY    |     |     |         |  NIC0 NIC1
> +                     |     |     |         |   |    |
> +                     |     |     +- PCI-E -+   |    |
> +                     |     +------- MII -------+    |
> +                     +------------- MII ------------+
> +
> +In this example, there are two independent lines between the switch silicon
> +and CPU. NIC0 and NIC1 drivers are not aware of a switch presence. They are
> +separate from the switch driver. SOME switch chip is by managed by a driver
> +via PCI-E device MNGMNT. Note that MNGMNT device, NIC0 and NIC1 may be
> +connected to some other type of bus.
> +
> +Now, for the previous example show the representation in kernel:
> +
> +       +----------------------------+    +---------------+
> +       |     SOME switch chip       |    |      CPU      |
> +       +----------------------------+    +---------------+
> +       sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT    |     PCI-E     |
> +         |     |     |     |     |       +---------------+
> +        PHY   PHY    |     |     |         |  eth0 eth1
> +                     |     |     |         |   |    |
> +                     |     |     +- PCI-E -+   |    |
> +                     |     +------- MII -------+    |
> +                     +------------- MII ------------+
> +
> +Lets call the example switch driver for SOME switch chip "SOMEswitch". This
> +driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX
> +created for each port of a switch. These netdevices are instances
> +of "SOMEswitch" driver. sw0pX netdevices serve as a "representation"
> +of the switch chip. eth0 and eth1 are instances of some other existing driver.
> +
> +The only difference of the switch-port netdevice from the ordinary netdevice
> +is that is implements couple more NDOs:
> +
> +	ndo_sw_parent_get_id - This returns the same ID for two port netdevices
> +			       of the same physical switch chip. This is
> +			       mandatory to be implemented by all switch drivers
> +			       and serves the caller for recognition of a port
> +			       netdevice.

What is the connection between ndo_sw_parent_get_id and
ndo_get_phys_port_id(). I'm having a bit of trouble teasing
this out.

For example here is my ascii art for a SR-IOV NIC,

        eth0     eth1     eth2
         |         |        |
         |         |        |
         PF        VF       VF
    +----+---------+--------+----+
    |       embedded bridge      |
    +-------------+--------------+
                  |
                 port

that can do switching between the various uplinks and downlinks.
In IEEE 802.1Q language the embedded bridge acts like an edge
relay. At least that seems to be the current state of the art
for SR-IOV. Edge relay just means it has a single uplink port
to the network and multiple downlinks and also isn't required
to do learning and run loop detection protocols STP, et. al.

Also there are multi-function devices that look the same except
replace the VFs with PFs. It seems to be a common mode for NICs
that do the iSCSI offloads with storage functions.

When something is an embedded bridge vs a SOME switch chip is
not entirely clear.

My understanding is use ndo_sw_parent_get_id() when you have
multiple physical ports all connected to a single switch object.
When you have a single port connected to multiple PCIE functions
or queues representing a netdev (e.g. macvlan offload) use the
ndo_get_phys_port_id(). Just want to be sure we are on the
same page here.

Otherwise patch looks good. I think we can clear the above up
with an addition to the documentation. Could go in after the
initial set and be OK with me.

IMO this patch is needed otherwise user space is at a complete
loss on trying to figure out how netdevs map to switch silicon.
You could have reused ndo_get_phys_port_id() perhaps but then
I think user space may get confused by SR-IOV/VMDQ/etc ports
attached to a switch silicon. For .02$ having a new distinct
identifier is cleaner.


> +	ndo_sw_parent_* - Functions that serve for a manipulation of the switch
> +			  chip itself (it can be though of as a "parent" of the
> +			  port, therefore the name). They are not port-specific.
> +			  Caller might use arbitrary port netdevice of the same
> +			  switch and it will make no difference.
> +	ndo_sw_port_* - Functions that serve for a port-specific manipulation.

[...]

Thanks,
John


-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 03/10] rtnl: expose physical switch id for particular device
  2014-11-09 10:51 ` [patch net-next v2 03/10] rtnl: expose physical switch id for particular device Jiri Pirko
  2014-11-10  3:43   ` Jamal Hadi Salim
  2014-11-10 17:58   ` Roopa Prabhu
@ 2014-11-10 22:01   ` John Fastabend
  2 siblings, 0 replies; 100+ messages in thread
From: John Fastabend @ 2014-11-10 22:01 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

On 11/09/2014 02:51 AM, Jiri Pirko wrote:
> The netdevice represents a port in a switch, it will expose
> IFLA_PHYS_SWITCH_ID value via rtnl. Two netdevices with the same value
> belong to one physical switch.
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---

Yep, I need something like this for my management app to
learn how the switch is laid out. This becomes more relevant
if I have switch silicon mixed with NICs that are not connected
to the switch object in the same platform.

Acked-by: John Fastabend <john.r.fastabend@intel.com>


-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 04/10] net-sysfs: expose physical switch id for particular device
  2014-11-09 10:51 ` [patch net-next v2 04/10] net-sysfs: " Jiri Pirko
@ 2014-11-10 22:01   ` John Fastabend
  0 siblings, 0 replies; 100+ messages in thread
From: John Fastabend @ 2014-11-10 22:01 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

On 11/09/2014 02:51 AM, Jiri Pirko wrote:
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---

Acked-by: John Fastabend <john.r.fastabend@intel.com>



-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 05/10] rocker: introduce rocker switch driver
  2014-11-09 10:51 ` [patch net-next v2 05/10] rocker: introduce rocker switch driver Jiri Pirko
@ 2014-11-10 22:04   ` John Fastabend
  2014-11-11 14:29     ` Thomas Graf
  2014-11-11 15:28     ` Jiri Pirko
  0 siblings, 2 replies; 100+ messages in thread
From: John Fastabend @ 2014-11-10 22:04 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

On 11/09/2014 02:51 AM, Jiri Pirko wrote:
> This patch introduces the first driver to benefit from the switchdev
> infrastructure and to implement newly introduced switch ndos. This is a
> driver for emulated switch chip implemented in qemu:
> https://github.com/sfeldma/qemu-rocker/
>
> This patch is a result of joint work with Scott Feldman.
>
> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>   MAINTAINERS                          |    7 +
>   drivers/net/ethernet/Kconfig         |    1 +
>   drivers/net/ethernet/Makefile        |    1 +
>   drivers/net/ethernet/rocker/Kconfig  |   27 +
>   drivers/net/ethernet/rocker/Makefile |    5 +
>   drivers/net/ethernet/rocker/rocker.c | 2060 ++++++++++++++++++++++++++++++++++
>   drivers/net/ethernet/rocker/rocker.h |  427 +++++++
>   7 files changed, 2528 insertions(+)
>   create mode 100644 drivers/net/ethernet/rocker/Kconfig
>   create mode 100644 drivers/net/ethernet/rocker/Makefile
>   create mode 100644 drivers/net/ethernet/rocker/rocker.c
>   create mode 100644 drivers/net/ethernet/rocker/rocker.h
>

[...]

> +
> +static netdev_tx_t rocker_port_xmit(struct sk_buff *skb, struct net_device *dev)
> +{
> +	struct rocker_port *rocker_port = netdev_priv(dev);
> +	struct rocker *rocker = rocker_port->rocker;
> +	struct rocker_desc_info *desc_info;
> +	struct rocker_tlv *frags;
> +	int i;
> +	int err;
> +
> +	desc_info = rocker_desc_head_get(&rocker_port->tx_ring);
> +	if (unlikely(!desc_info)) {
> +		if (net_ratelimit())

Could you have a netif_stop_queue() here as well same as below? Not
that optimizing the xmit routine is the interesting part of this patch.
But I guess this is just some strange error path because I see you
check this case below.

> +			netdev_err(dev, "tx ring full when queue awake\n");
> +		return NETDEV_TX_BUSY;
> +	}
> +
> +	rocker_desc_cookie_ptr_set(desc_info, skb);
> +
> +	frags = rocker_tlv_nest_start(desc_info, ROCKER_TLV_TX_FRAGS);
> +	if (!frags)
> +		goto out;
> +	err = rocker_tx_desc_frag_map_put(rocker_port, desc_info,
> +					  skb->data, skb_headlen(skb));
> +	if (err)
> +		goto nest_cancel;
> +	if (skb_shinfo(skb)->nr_frags > ROCKER_TX_FRAGS_MAX)
> +		goto nest_cancel;
> +
> +	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> +		const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
> +
> +		err = rocker_tx_desc_frag_map_put(rocker_port, desc_info,
> +						  skb_frag_address(frag),
> +						  skb_frag_size(frag));
> +		if (err)
> +			goto unmap_frags;
> +	}
> +	rocker_tlv_nest_end(desc_info, frags);
> +
> +	rocker_desc_gen_clear(desc_info);
> +	rocker_desc_head_set(rocker, &rocker_port->tx_ring, desc_info);
> +
> +	desc_info = rocker_desc_head_get(&rocker_port->tx_ring);
> +	if (!desc_info)
> +		netif_stop_queue(dev);

I'm not entirely sure I followed the TLV usage here but OK. If it
works...

> +
> +	return NETDEV_TX_OK;
> +
> +unmap_frags:
> +	rocker_tx_desc_frags_unmap(rocker_port, desc_info);
> +nest_cancel:
> +	rocker_tlv_nest_cancel(desc_info, frags);
> +out:
> +	dev_kfree_skb(skb);
> +	return NETDEV_TX_OK;
> +}
> +
> +static int rocker_port_set_mac_address(struct net_device *dev, void *p)
> +{
> +	struct sockaddr *addr = p;
> +	struct rocker_port *rocker_port = netdev_priv(dev);
> +	int err;
> +
> +	if (!is_valid_ether_addr(addr->sa_data))
> +		return -EADDRNOTAVAIL;
> +
> +	err = rocker_cmd_set_port_settings_macaddr(rocker_port, addr->sa_data);
> +	if (err)
> +		return err;
> +	memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
> +	return 0;
> +}
> +
> +static int rocker_port_sw_parent_id_get(struct net_device *dev,
> +					struct netdev_phys_item_id *psid)
> +{
> +	struct rocker_port *rocker_port = netdev_priv(dev);
> +	struct rocker *rocker = rocker_port->rocker;
> +

hmm looks like you read this out of a magic switch register :) but
my switch doesn't have this magic reg. I suposse the switch MAC address
should work.

> +	psid->id_len = sizeof(rocker->hw.id);
> +	memcpy(&psid->id, &rocker->hw.id, psid->id_len);
> +	return 0;
> +}
> +
> +static const struct net_device_ops rocker_port_netdev_ops = {
> +	.ndo_open			= rocker_port_open,
> +	.ndo_stop			= rocker_port_stop,
> +	.ndo_start_xmit			= rocker_port_xmit,
> +	.ndo_set_mac_address		= rocker_port_set_mac_address,
> +	.ndo_sw_parent_id_get		= rocker_port_sw_parent_id_get,
> +};
> +
> +/********************
> + * ethtool interface
> + ********************/
> +
> +static int rocker_port_get_settings(struct net_device *dev,
> +				    struct ethtool_cmd *ecmd)
> +{
> +	struct rocker_port *rocker_port = netdev_priv(dev);
> +
> +	return rocker_cmd_get_port_settings_ethtool(rocker_port, ecmd);
> +}
> +
> +static int rocker_port_set_settings(struct net_device *dev,
> +				    struct ethtool_cmd *ecmd)
> +{
> +	struct rocker_port *rocker_port = netdev_priv(dev);
> +
> +	return rocker_cmd_set_port_settings_ethtool(rocker_port, ecmd);
> +}
> +
> +static void rocker_port_get_drvinfo(struct net_device *dev,
> +				    struct ethtool_drvinfo *drvinfo)
> +{
> +	strlcpy(drvinfo->driver, rocker_driver_name, sizeof(drvinfo->driver));
> +	strlcpy(drvinfo->version, UTS_RELEASE, sizeof(drvinfo->version));
> +}
> +
> +static const struct ethtool_ops rocker_port_ethtool_ops = {
> +	.get_settings		= rocker_port_get_settings,
> +	.set_settings		= rocker_port_set_settings,
> +	.get_drvinfo		= rocker_port_get_drvinfo,
> +	.get_link		= ethtool_op_get_link,
> +};
> +

[...]

Looks reasonable to me, although I mostly scanned it and looked
over the interface parts. My comments are just observations no
need to change anything for them.

Reviewed-by: John Fastabend <john.r.fastabend@intel.com>


-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 03/10] rtnl: expose physical switch id for particular device
  2014-11-10 17:58   ` Roopa Prabhu
  2014-11-10 20:02     ` Scott Feldman
@ 2014-11-10 22:14     ` Jiri Pirko
  2014-11-10 22:31       ` John Fastabend
  1 sibling, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2014-11-10 22:14 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

Mon, Nov 10, 2014 at 06:58:08PM CET, roopa@cumulusnetworks.com wrote:
>On 11/9/14, 2:51 AM, Jiri Pirko wrote:
>>The netdevice represents a port in a switch, it will expose
>>IFLA_PHYS_SWITCH_ID value via rtnl. Two netdevices with the same value
>>belong to one physical switch.
>>
>>Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>---
>>  include/uapi/linux/if_link.h |  1 +
>>  net/core/rtnetlink.c         | 26 +++++++++++++++++++++++++-
>>  2 files changed, 26 insertions(+), 1 deletion(-)
>>
>>diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
>>index 7072d83..4163753 100644
>>--- a/include/uapi/linux/if_link.h
>>+++ b/include/uapi/linux/if_link.h
>>@@ -145,6 +145,7 @@ enum {
>>  	IFLA_CARRIER,
>>  	IFLA_PHYS_PORT_ID,
>>  	IFLA_CARRIER_CHANGES,
>>+	IFLA_PHYS_SWITCH_ID,
>
>Jiri, since we have not really converged on the switchdev class or having a
>separate switchdev instance,
>am thinking it is better if we dont expose any such switch_id to userspace
>yet until absolutely needed. Do you need it today ?
>There is no real in kernel hw switch driver that will use it today. And quite
>likely this will need to change when we introduce real hw switch drivers.

When and if the switchdev class is introduced, switch id can happily
live on. It is nothing against it. Userspace should use this id to group
the ports of physical switch.

>
>
>>  	__IFLA_MAX
>>  };
>>diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
>>index 1087c6d..f839354 100644
>>--- a/net/core/rtnetlink.c
>>+++ b/net/core/rtnetlink.c
>>@@ -43,6 +43,7 @@
>>  #include <linux/inet.h>
>>  #include <linux/netdevice.h>
>>+#include <net/switchdev.h>
>>  #include <net/ip.h>
>>  #include <net/protocol.h>
>>  #include <net/arp.h>
>>@@ -868,7 +869,8 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
>>  	       + rtnl_port_size(dev, ext_filter_mask) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
>>  	       + rtnl_link_get_size(dev) /* IFLA_LINKINFO */
>>  	       + rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */
>>-	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN); /* IFLA_PHYS_PORT_ID */
>>+	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_PORT_ID */
>>+	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN); /* IFLA_PHYS_SWITCH_ID */
>>  }
>>  static int rtnl_vf_ports_fill(struct sk_buff *skb, struct net_device *dev)
>>@@ -967,6 +969,24 @@ static int rtnl_phys_port_id_fill(struct sk_buff *skb, struct net_device *dev)
>>  	return 0;
>>  }
>>+static int rtnl_phys_switch_id_fill(struct sk_buff *skb, struct net_device *dev)
>>+{
>>+	int err;
>>+	struct netdev_phys_item_id psid;
>>+
>>+	err = netdev_sw_parent_id_get(dev, &psid);
>>+	if (err) {
>>+		if (err == -EOPNOTSUPP)
>>+			return 0;
>>+		return err;
>>+	}
>>+
>>+	if (nla_put(skb, IFLA_PHYS_SWITCH_ID, psid.id_len, psid.id))
>>+		return -EMSGSIZE;
>>+
>>+	return 0;
>>+}
>>+
>>  static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
>>  			    int type, u32 pid, u32 seq, u32 change,
>>  			    unsigned int flags, u32 ext_filter_mask)
>>@@ -1039,6 +1059,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
>>  	if (rtnl_phys_port_id_fill(skb, dev))
>>  		goto nla_put_failure;
>>+	if (rtnl_phys_switch_id_fill(skb, dev))
>>+		goto nla_put_failure;
>>+
>>  	attr = nla_reserve(skb, IFLA_STATS,
>>  			sizeof(struct rtnl_link_stats));
>>  	if (attr == NULL)
>>@@ -1198,6 +1221,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
>>  	[IFLA_NUM_RX_QUEUES]	= { .type = NLA_U32 },
>>  	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
>>  	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
>>+	[IFLA_PHYS_SWITCH_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
>>  };
>>  static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
  2014-11-10  4:58       ` Simon Horman
@ 2014-11-10 22:23         ` John Fastabend
  2014-11-11  8:51           ` Simon Horman
  2014-11-13  5:44           ` Simon Horman
  0 siblings, 2 replies; 100+ messages in thread
From: John Fastabend @ 2014-11-10 22:23 UTC (permalink / raw)
  To: Simon Horman
  Cc: Jamal Hadi Salim, Jiri Pirko, netdev, davem, nhorman, andy,
	tgraf, dborkman, ogerlitz, jesse, pshelar, azhou, ben, stephen,
	jeffrey.t.kirsher, vyasevic, xiyou.wangcong, john.r.fastabend,
	edumazet, sfeldma, f.fainelli, roopa, linville, jasowang,
	ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd,
	alexei.starovoitov, Neil.Jerram, ronye, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo

On 11/09/2014 08:58 PM, Simon Horman wrote:
> On Sun, Nov 09, 2014 at 11:03:40PM -0500, Jamal Hadi Salim wrote:
>> Hi Simon,
>>
>> On 11/09/14 22:46, Simon Horman wrote:
>>> Hi Jamal, Hi Jiri,
>>>
>>> On a somewhat related note I am also wondering what if any progress has
>>> been made regarding discussions of (and code for) the following:
>>>
>>> 1. Exposing flow tables to user-space
>>>     - I realise that this is Open vSwitch specific to some extent
>>>       but I am in no way implying that it should be done instead of
>>>       non-Open vSwitch specific work.
>>>     - Jiri, IIRC this was part ~v2 of your earlier offload patchset
>>>
>>
>> I dont know what Rocker crowd is doing; however, I know
>> John F. has been doing some work which i have stared at
>> and I was hoping to join in with Ben's effort and show tc flow
>> offload on the realtek chip in my infinite spare time unles.
>> (for both Linux bridge and ports).
>> The priority is to merge the obvious bits first.
>
> Merging the obvious bits first is quite fine my me.
>

+1

>>> 2. Describing Switch Hardware
>>>     - I see John Fastabend moving forwards on this in his git repository
>>>       https://github.com/jrfastab/flow-net-next
>>>
>>> The way that I see things is that both of the above could be exposed via
>>> netlink. And that the first at least could be backed by NDOs.  As such I
>>> see this work as complementary and perhaps applying on top of this
>>> patchset. If I am mistaken in this regards it would be good to know :)
>>>
>>
>> You are correct - I will let John speak on his work, but
>> that is the intent.
>> The challenge is there are many schools of thoughts and i am hoping
>> it is not an arms race.
>
> That is also my hope.

My intent is to submit the Flow API bits once the base rocker switch
gets committed. I've implemented the Flow API against ixgbe and a
sadly a proprietary SDK. I'll implement it against the rocker switch
as well.

>
>>> I am of course also interested to know if the above are moving forwards.
>>> To be clear I am very interested in being able to use these APIs to
>>> perform Open vSwitch offloads and I am very happy to help.
>>> (Jamal: I'm also interested in non-Open vSwitch offloads :)
>>>
>>

I think they are moving forward. I have some code cleanup to do on
the flow API, but its mostly in place. Then I want to implement
an example on Rocker switch so we could experiment with something
why we wait for a real hardware driver. From my side assuming I at
least got it close to correct should be doable in the next few week.

After that I want to work with Jesse/Jamal and look at integrating
with OVS and other stacks. I thought a bit about the OVS integration
path but I'll hold that discussion for the moment.

Simon, if your feeling adventurous any feedback on the repo link
would be great. I still need to smash the commit log into something
coherent though at the moment you can see all the errors and rewrites,
etc as I made them.

>> Hey, OVS should be able to use these APIs; i am just interested in making
>> sure they are not just for OVS or OF. Then we are all happy;->
>
> I think we are all happy :)

I'm happy :) Also my intent is the flow API is more general then
just OVS. My view is OVS should be one user of the API.

.John

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 03/10] rtnl: expose physical switch id for particular device
  2014-11-10 22:14     ` Jiri Pirko
@ 2014-11-10 22:31       ` John Fastabend
  0 siblings, 0 replies; 100+ messages in thread
From: John Fastabend @ 2014-11-10 22:31 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Roopa Prabhu, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, jhs,
	sfeldma, f.fainelli, linville, jasowang, ebiederm,
	nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd,
	alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, gospo

On 11/10/2014 02:14 PM, Jiri Pirko wrote:
> Mon, Nov 10, 2014 at 06:58:08PM CET, roopa@cumulusnetworks.com wrote:
>> On 11/9/14, 2:51 AM, Jiri Pirko wrote:
>>> The netdevice represents a port in a switch, it will expose
>>> IFLA_PHYS_SWITCH_ID value via rtnl. Two netdevices with the same value
>>> belong to one physical switch.
>>>
>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>> ---
>>>   include/uapi/linux/if_link.h |  1 +
>>>   net/core/rtnetlink.c         | 26 +++++++++++++++++++++++++-
>>>   2 files changed, 26 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
>>> index 7072d83..4163753 100644
>>> --- a/include/uapi/linux/if_link.h
>>> +++ b/include/uapi/linux/if_link.h
>>> @@ -145,6 +145,7 @@ enum {
>>>   	IFLA_CARRIER,
>>>   	IFLA_PHYS_PORT_ID,
>>>   	IFLA_CARRIER_CHANGES,
>>> +	IFLA_PHYS_SWITCH_ID,
>>
>> Jiri, since we have not really converged on the switchdev class or having a
>> separate switchdev instance,
>> am thinking it is better if we dont expose any such switch_id to userspace
>> yet until absolutely needed. Do you need it today ?
>> There is no real in kernel hw switch driver that will use it today. And quite
>> likely this will need to change when we introduce real hw switch drivers.
>
> When and if the switchdev class is introduced, switch id can happily
> live on. It is nothing against it. Userspace should use this id to group
> the ports of physical switch.
>
>

+1 I think we need this otherwise how will userspace "know" how the
ports are related? If I have two switch silicon blocks in a single
platform or perhaps have a switch silicon with traditional host
nics on the same platform.


-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next] bridge: rename fdb_*_hw to fdb_*_hw_addr to avoid confusion
  2014-11-09 16:40 ` [patch net-next] bridge: rename fdb_*_hw to fdb_*_hw_addr to avoid confusion Jiri Pirko
@ 2014-11-11  2:33   ` David Miller
  2014-11-11  7:20     ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: David Miller @ 2014-11-11  2:33 UTC (permalink / raw)
  To: jiri
  Cc: netdev, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

From: Jiri Pirko <jiri@resnulli.us>
Date: Sun,  9 Nov 2014 17:40:16 +0100

> The current name might seem that this actually offloads the fdb entry to
> hw. So rename it to clearly present that this for hardware address
> addition/removal.
> 
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>

This must have been relative to your rocket patch series because it
doesn't apply cleanly.

I already am assuming that the rocket patch set is getting one more
spin, so why don't you add this to that series?

Thanks.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next] bridge: rename fdb_*_hw to fdb_*_hw_addr to avoid confusion
  2014-11-11  2:33   ` David Miller
@ 2014-11-11  7:20     ` Jiri Pirko
  0 siblings, 0 replies; 100+ messages in thread
From: Jiri Pirko @ 2014-11-11  7:20 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

Tue, Nov 11, 2014 at 03:33:06AM CET, davem@davemloft.net wrote:
>From: Jiri Pirko <jiri@resnulli.us>
>Date: Sun,  9 Nov 2014 17:40:16 +0100
>
>> The current name might seem that this actually offloads the fdb entry to
>> hw. So rename it to clearly present that this for hardware address
>> addition/removal.
>> 
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>
>This must have been relative to your rocket patch series because it
>doesn't apply cleanly.
>
>I already am assuming that the rocket patch set is getting one more
>spin, so why don't you add this to that series?

Will do.

>
>Thanks.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
  2014-11-10 22:23         ` John Fastabend
@ 2014-11-11  8:51           ` Simon Horman
  2014-11-13  5:44           ` Simon Horman
  1 sibling, 0 replies; 100+ messages in thread
From: Simon Horman @ 2014-11-11  8:51 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jamal Hadi Salim, Jiri Pirko, netdev, davem, nhorman, andy,
	tgraf, dborkman, ogerlitz, jesse, pshelar, azhou, ben, stephen,
	jeffrey.t.kirsher, vyasevic, xiyou.wangcong, john.r.fastabend,
	edumazet, sfeldma, f.fainelli, roopa, linville, jasowang,
	ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd,
	alexei.starovoitov, Neil.Jerram, ronye, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo

On Mon, Nov 10, 2014 at 02:23:15PM -0800, John Fastabend wrote:
> On 11/09/2014 08:58 PM, Simon Horman wrote:
> >On Sun, Nov 09, 2014 at 11:03:40PM -0500, Jamal Hadi Salim wrote:
> >>Hi Simon,
> >>
> >>On 11/09/14 22:46, Simon Horman wrote:
> >>>Hi Jamal, Hi Jiri,
> >>>
> >>>On a somewhat related note I am also wondering what if any progress has
> >>>been made regarding discussions of (and code for) the following:
> >>>
> >>>1. Exposing flow tables to user-space
> >>>    - I realise that this is Open vSwitch specific to some extent
> >>>      but I am in no way implying that it should be done instead of
> >>>      non-Open vSwitch specific work.
> >>>    - Jiri, IIRC this was part ~v2 of your earlier offload patchset
> >>>
> >>
> >>I dont know what Rocker crowd is doing; however, I know
> >>John F. has been doing some work which i have stared at
> >>and I was hoping to join in with Ben's effort and show tc flow
> >>offload on the realtek chip in my infinite spare time unles.
> >>(for both Linux bridge and ports).
> >>The priority is to merge the obvious bits first.
> >
> >Merging the obvious bits first is quite fine my me.
> >
> 
> +1
> 
> >>>2. Describing Switch Hardware
> >>>    - I see John Fastabend moving forwards on this in his git repository
> >>>      https://github.com/jrfastab/flow-net-next
> >>>
> >>>The way that I see things is that both of the above could be exposed via
> >>>netlink. And that the first at least could be backed by NDOs.  As such I
> >>>see this work as complementary and perhaps applying on top of this
> >>>patchset. If I am mistaken in this regards it would be good to know :)
> >>>
> >>
> >>You are correct - I will let John speak on his work, but
> >>that is the intent.
> >>The challenge is there are many schools of thoughts and i am hoping
> >>it is not an arms race.
> >
> >That is also my hope.
> 
> My intent is to submit the Flow API bits once the base rocker switch
> gets committed. I've implemented the Flow API against ixgbe and a
> sadly a proprietary SDK. I'll implement it against the rocker switch
> as well.

Understood, that seems like a good approach to me.

> >>>I am of course also interested to know if the above are moving forwards.
> >>>To be clear I am very interested in being able to use these APIs to
> >>>perform Open vSwitch offloads and I am very happy to help.
> >>>(Jamal: I'm also interested in non-Open vSwitch offloads :)
> >>>
> >>
> 
> I think they are moving forward. I have some code cleanup to do on
> the flow API, but its mostly in place. Then I want to implement
> an example on Rocker switch so we could experiment with something
> why we wait for a real hardware driver. From my side assuming I at
> least got it close to correct should be doable in the next few week.
> 
> After that I want to work with Jesse/Jamal and look at integrating
> with OVS and other stacks. I thought a bit about the OVS integration
> path but I'll hold that discussion for the moment.

I am looking forward to that discussion.

> Simon, if your feeling adventurous any feedback on the repo link
> would be great. I still need to smash the commit log into something
> coherent though at the moment you can see all the errors and rewrites,
> etc as I made them.

Sure, will do.  I took a look over your code about two weeks ago but I
believe you have made some updates since then.

> >>Hey, OVS should be able to use these APIs; i am just interested in making
> >>sure they are not just for OVS or OF. Then we are all happy;->
> >
> >I think we are all happy :)
> 
> I'm happy :) Also my intent is the flow API is more general then
> just OVS. My view is OVS should be one user of the API.

I agree entirely.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 02/10] net: introduce generic switch devices support
  2014-11-09 10:51 ` [patch net-next v2 02/10] net: introduce generic switch devices support Jiri Pirko
  2014-11-10 21:59   ` John Fastabend
@ 2014-11-11  9:49   ` M. Braun
  2014-11-11 10:04     ` Jiri Pirko
  2014-11-19 13:28   ` Roopa Prabhu
  2 siblings, 1 reply; 100+ messages in thread
From: M. Braun @ 2014-11-11  9:49 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl



Am 09.11.2014 um 11:51 schrieb Jiri Pirko:
> diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
> +	ndo_sw_parent_get_id - ...

here the ndo is called get_id

but

> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> + *
> + * int (*ndo_sw_parent_id_get)(struct net_device *dev,
> + *			       struct netdev_phys_item_id *psid);
> @@ -1168,6 +1174,10 @@ struct net_device_ops {
> +	int			(*ndo_sw_parent_id_get)(struct net_device *dev,
> +							struct netdev_phys_item_id *psid);

here it is call id_get, which is similar but not the same.

Regards,
 M. Braun

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 02/10] net: introduce generic switch devices support
  2014-11-11  9:49   ` M. Braun
@ 2014-11-11 10:04     ` Jiri Pirko
  0 siblings, 0 replies; 100+ messages in thread
From: Jiri Pirko @ 2014-11-11 10:04 UTC (permalink / raw)
  To: M. Braun
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

Tue, Nov 11, 2014 at 10:49:36AM CET, michael-dev@fami-braun.de wrote:
>
>
>Am 09.11.2014 um 11:51 schrieb Jiri Pirko:
>> diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
>> +	ndo_sw_parent_get_id - ...
>
>here the ndo is called get_id
>
>but
>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> + *
>> + * int (*ndo_sw_parent_id_get)(struct net_device *dev,
>> + *			       struct netdev_phys_item_id *psid);
>> @@ -1168,6 +1174,10 @@ struct net_device_ops {
>> +	int			(*ndo_sw_parent_id_get)(struct net_device *dev,
>> +							struct netdev_phys_item_id *psid);
>
>here it is call id_get, which is similar but not the same.


Will fix the docs, thanks.

>
>Regards,
> M. Braun

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 03/10] rtnl: expose physical switch id for particular device
  2014-11-10 20:02     ` Scott Feldman
@ 2014-11-11 13:55       ` Roopa Prabhu
  0 siblings, 0 replies; 100+ messages in thread
From: Roopa Prabhu @ 2014-11-11 13:55 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Jiri Pirko, Netdev, David S. Miller, nhorman, Andy Gospodarek,
	Thomas Graf, dborkman, ogerlitz, jesse, pshelar, azhou, ben,
	stephen, Kirsher, Jeffrey T, vyasevic, Cong Wang, Fastabend,
	John R, Eric Dumazet, Jamal Hadi Salim, Florian Fainelli,
	John Linville, jasowang, ebiederm, Nicolas Dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, Alexei Starovoitov

On 11/10/14, 12:02 PM, Scott Feldman wrote:
> On Mon, Nov 10, 2014 at 7:58 AM, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
>> On 11/9/14, 2:51 AM, Jiri Pirko wrote:
>>> The netdevice represents a port in a switch, it will expose
>>> IFLA_PHYS_SWITCH_ID value via rtnl. Two netdevices with the same value
>>> belong to one physical switch.
>>>
>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>> ---
>>>    include/uapi/linux/if_link.h |  1 +
>>>    net/core/rtnetlink.c         | 26 +++++++++++++++++++++++++-
>>>    2 files changed, 26 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
>>> index 7072d83..4163753 100644
>>> --- a/include/uapi/linux/if_link.h
>>> +++ b/include/uapi/linux/if_link.h
>>> @@ -145,6 +145,7 @@ enum {
>>>          IFLA_CARRIER,
>>>          IFLA_PHYS_PORT_ID,
>>>          IFLA_CARRIER_CHANGES,
>>> +       IFLA_PHYS_SWITCH_ID,
>>
>> Jiri, since we have not really converged on the switchdev class or having a
>> separate switchdev instance,
>> am thinking it is better if we dont expose any such switch_id to userspace
>> yet until absolutely needed. Do you need it today ?
>> There is no real in kernel hw switch driver that will use it today. And
>> quite likely this will need to change when we introduce real hw switch
>> drivers.
> How will it change when real hw switch drivers are introduced?  Will
> the real sw driver not be able to give a up unique ID for the switch?
  With my question i was trying to see if there are other ways to manage 
the relationship between the
switch device and the ports, instead of an random id provided by each 
switch driver. Today the switch id namespace seems
to be with each switch driver. On my systems on the first switch chip 
quite likely i will choose an id 0.,,and possibly some other
driver will choose the same id. The switch id namespace handling was not 
clear to me.

But, to your question, am sure we will have some id to go in there since 
the field is now available.

Thanks,
Roopa

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 08/10] bridge: add API to notify bridge driver of learned FBD on offloaded device
  2014-11-09 10:51 ` [patch net-next v2 08/10] bridge: add API to notify bridge driver of learned FBD on offloaded device Jiri Pirko
@ 2014-11-11 14:21   ` Roopa Prabhu
  2014-11-11 17:38     ` Scott Feldman
  0 siblings, 1 reply; 100+ messages in thread
From: Roopa Prabhu @ 2014-11-11 14:21 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

On 11/9/14, 2:51 AM, Jiri Pirko wrote:
> From: Scott Feldman <sfeldma@gmail.com>
>
> When the swdev device learns a new mac/vlan on a port, it sends some async
> notification to the driver and the driver installs an FDB in the device.
> To give a holistic system view, the learned mac/vlan should be reflected
> in the bridge's FBD table, so the user, using normal iproute2 cmds, can view
> what is currently learned by the device.  This API on the bridge driver gives
> a way for the swdev driver to install an FBD entry in the bridge FBD table.
> (And remove one).
>
> This is equivalent to the device running these cmds:
>
>    bridge fdb [add|del] <mac> dev <dev> vid <vlan id> master
>
> This patch needs some extra eyeballs for review, in paricular around the
> locking and contexts.

scott/jiri, love that you have handled this case!, This will be useful.
But, quick question, Cant this also be done using the same ndo_op that 
is done to add the static fdb..?

Thanks!.
>
> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>   include/linux/if_bridge.h | 18 ++++++++++
>   net/bridge/br_fdb.c       | 84 +++++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 102 insertions(+)
>
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index 808dcb8..27ab217 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -37,6 +37,24 @@ extern void brioctl_set(int (*ioctl_hook)(struct net *, unsigned int, void __use
>   typedef int br_should_route_hook_t(struct sk_buff *skb);
>   extern br_should_route_hook_t __rcu *br_should_route_hook;
>   
> +#if IS_ENABLED(CONFIG_BRIDGE)
> +int br_fdb_learn_add(struct net_device *dev,
> +		     const unsigned char *addr, u16 vid);
> +int br_fdb_learn_del(struct net_device *dev,
> +		     const unsigned char *addr, u16 vid);
> +#else
> +static inline int br_fdb_learn_add(struct net_device *dev,
> +				   const unsigned char *addr, u16 vid)
> +{
> +	return 0;
> +}
> +static inline int br_fdb_learn_del(struct net_device *dev,
> +				   const unsigned char *addr, u16 vid)
> +{
> +	return 0;
> +}
> +#endif
> +
>   #if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_IGMP_SNOOPING)
>   int br_multicast_list_adjacent(struct net_device *dev,
>   			       struct list_head *br_ip_list);
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index f6f8bb5..e02d21b 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -1022,3 +1022,87 @@ void br_fdb_unsync_static(struct net_bridge *br, struct net_bridge_port *p)
>   		}
>   	}
>   }
> +
> +int br_fdb_learn_add(struct net_device *dev, const unsigned char *addr,
> +		     u16 vid)
> +{
> +	struct net_bridge_port *p;
> +	struct net_bridge *br;
> +	struct hlist_head *head;
> +	struct net_bridge_fdb_entry *fdb;
> +	int err = 0;
> +
> +	rtnl_lock();
> +
> +	p = br_port_get_rtnl(dev);
> +	if (p == NULL) {
> +		pr_info("bridge: %s not a bridge port\n", dev->name);
> +		err = -EINVAL;
> +		goto err_rtnl_unlock;
> +	}
> +
> +	br = p->br;
> +
> +	spin_lock(&br->hash_lock);
> +
> +	head = &br->hash[br_mac_hash(addr, vid)];
> +	fdb = fdb_find(head, addr, vid);
> +	if (fdb == NULL) {
> +		fdb = fdb_create(head, p, addr, vid);
> +		if (!fdb) {
> +			err = -ENOMEM;
> +			goto err_unlock;
> +		}
> +		fdb->is_local = 1;
> +		fdb->used = jiffies;
> +		fdb->updated = jiffies;
> +		fdb_notify(br, fdb, RTM_NEWNEIGH);
> +	} else {
> +		err = -EEXIST;
> +	}
> +
> +err_unlock:
> +	spin_unlock(&br->hash_lock);
> +err_rtnl_unlock:
> +	rtnl_unlock();
> +
> +	return err;
> +}
> +EXPORT_SYMBOL(br_fdb_learn_add);
> +
> +int br_fdb_learn_del(struct net_device *dev, const unsigned char *addr,
> +		     u16 vid)
> +{
> +	struct net_bridge_port *p;
> +	struct net_bridge *br;
> +	struct hlist_head *head;
> +	struct net_bridge_fdb_entry *fdb;
> +	int err = 0;
> +
> +	rtnl_lock();
> +
> +	p = br_port_get_rtnl(dev);
> +	if (p == NULL) {
> +		pr_info("bridge: %s not a bridge port\n", dev->name);
> +		err = -EINVAL;
> +		goto err_rtnl_unlock;
> +	}
> +
> +	br = p->br;
> +
> +	spin_lock(&br->hash_lock);
> +
> +	head = &br->hash[br_mac_hash(addr, vid)];
> +	fdb = fdb_find(head, addr, vid);
> +	if (fdb)
> +		fdb_delete(br, fdb);
> +	else
> +		err = -ENOENT;
> +
> +	spin_unlock(&br->hash_lock);
> +err_rtnl_unlock:
> +	rtnl_unlock();
> +
> +	return err;
> +}
> +EXPORT_SYMBOL(br_fdb_learn_del);

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 05/10] rocker: introduce rocker switch driver
  2014-11-10 22:04   ` John Fastabend
@ 2014-11-11 14:29     ` Thomas Graf
  2014-11-11 15:19       ` Jiri Pirko
  2014-11-11 15:28     ` Jiri Pirko
  1 sibling, 1 reply; 100+ messages in thread
From: Thomas Graf @ 2014-11-11 14:29 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, dborkman, ogerlitz,
	jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo

On 11/10/14 at 02:04pm, John Fastabend wrote:
> On 11/09/2014 02:51 AM, Jiri Pirko wrote:
> >+static int rocker_port_sw_parent_id_get(struct net_device *dev,
> >+					struct netdev_phys_item_id *psid)
> >+{
> >+	struct rocker_port *rocker_port = netdev_priv(dev);
> >+	struct rocker *rocker = rocker_port->rocker;
> >+
> 
> hmm looks like you read this out of a magic switch register :) but
> my switch doesn't have this magic reg. I suposse the switch MAC address
> should work.

This needs more work afterwards. Either we define that the switch ID
is only unique in combination with the parent ifindex or we need to
introduce a notation of uniquness into the switch ID itself.

Is the goal to expose a hardware ID here to allow identification of
the hardware chip?

MAC is tempting but I'm pretty sure that we'll have pure L3 devices
being handled by this API at some point.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 02/10] net: introduce generic switch devices support
  2014-11-10 21:59   ` John Fastabend
@ 2014-11-11 15:11     ` Jiri Pirko
  0 siblings, 0 replies; 100+ messages in thread
From: Jiri Pirko @ 2014-11-11 15:11 UTC (permalink / raw)
  To: John Fastabend
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

Mon, Nov 10, 2014 at 10:59:38PM CET, john.fastabend@gmail.com wrote:
>On 11/09/2014 02:51 AM, Jiri Pirko wrote:
>>The goal of this is to provide a possibility to support various switch
>>chips. Drivers should implement relevant ndos to do so. Now there is
>>only one ndo defined:
>>- for getting physical switch id is in place.
>>
>>Note that user can use random port netdevice to access the switch.
>>
>>Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>---
>>  Documentation/networking/switchdev.txt | 59 ++++++++++++++++++++++++++++++++++
>>  MAINTAINERS                            |  7 ++++
>>  include/linux/netdevice.h              | 10 ++++++
>>  include/net/switchdev.h                | 30 +++++++++++++++++
>>  net/Kconfig                            |  1 +
>>  net/Makefile                           |  3 ++
>>  net/switchdev/Kconfig                  | 13 ++++++++
>>  net/switchdev/Makefile                 |  5 +++
>>  net/switchdev/switchdev.c              | 33 +++++++++++++++++++
>>  9 files changed, 161 insertions(+)
>>  create mode 100644 Documentation/networking/switchdev.txt
>>  create mode 100644 include/net/switchdev.h
>>  create mode 100644 net/switchdev/Kconfig
>>  create mode 100644 net/switchdev/Makefile
>>  create mode 100644 net/switchdev/switchdev.c
>>
>>diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
>>new file mode 100644
>>index 0000000..98be76c
>>--- /dev/null
>>+++ b/Documentation/networking/switchdev.txt
>>@@ -0,0 +1,59 @@
>>+Switch (and switch-ish) device drivers HOWTO
>>+===========================
>>+
>>+Please note that the word "switch" is here used in very generic meaning.
>>+This include devices supporting L2/L3 but also various flow offloading chips,
>>+including switches embedded into SR-IOV NICs.
>>+
>>+Lets describe a topology a bit. Imagine the following example:
>>+
>>+       +----------------------------+    +---------------+
>>+       |     SOME switch chip       |    |      CPU      |
>>+       +----------------------------+    +---------------+
>>+       port1 port2 port3 port4 MNGMNT    |     PCI-E     |
>>+         |     |     |     |     |       +---------------+
>>+        PHY   PHY    |     |     |         |  NIC0 NIC1
>>+                     |     |     |         |   |    |
>>+                     |     |     +- PCI-E -+   |    |
>>+                     |     +------- MII -------+    |
>>+                     +------------- MII ------------+
>>+
>>+In this example, there are two independent lines between the switch silicon
>>+and CPU. NIC0 and NIC1 drivers are not aware of a switch presence. They are
>>+separate from the switch driver. SOME switch chip is by managed by a driver
>>+via PCI-E device MNGMNT. Note that MNGMNT device, NIC0 and NIC1 may be
>>+connected to some other type of bus.
>>+
>>+Now, for the previous example show the representation in kernel:
>>+
>>+       +----------------------------+    +---------------+
>>+       |     SOME switch chip       |    |      CPU      |
>>+       +----------------------------+    +---------------+
>>+       sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT    |     PCI-E     |
>>+         |     |     |     |     |       +---------------+
>>+        PHY   PHY    |     |     |         |  eth0 eth1
>>+                     |     |     |         |   |    |
>>+                     |     |     +- PCI-E -+   |    |
>>+                     |     +------- MII -------+    |
>>+                     +------------- MII ------------+
>>+
>>+Lets call the example switch driver for SOME switch chip "SOMEswitch". This
>>+driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX
>>+created for each port of a switch. These netdevices are instances
>>+of "SOMEswitch" driver. sw0pX netdevices serve as a "representation"
>>+of the switch chip. eth0 and eth1 are instances of some other existing driver.
>>+
>>+The only difference of the switch-port netdevice from the ordinary netdevice
>>+is that is implements couple more NDOs:
>>+
>>+	ndo_sw_parent_get_id - This returns the same ID for two port netdevices
>>+			       of the same physical switch chip. This is
>>+			       mandatory to be implemented by all switch drivers
>>+			       and serves the caller for recognition of a port
>>+			       netdevice.
>
>What is the connection between ndo_sw_parent_get_id and
>ndo_get_phys_port_id(). I'm having a bit of trouble teasing
>this out.
>
>For example here is my ascii art for a SR-IOV NIC,
>
>       eth0     eth1     eth2
>        |         |        |
>        |         |        |
>        PF        VF       VF
>   +----+---------+--------+----+
>   |       embedded bridge      |
>   +-------------+--------------+
>                 |
>                port
>
>that can do switching between the various uplinks and downlinks.
>In IEEE 802.1Q language the embedded bridge acts like an edge
>relay. At least that seems to be the current state of the art
>for SR-IOV. Edge relay just means it has a single uplink port
>to the network and multiple downlinks and also isn't required
>to do learning and run loop detection protocols STP, et. al.
>
>Also there are multi-function devices that look the same except
>replace the VFs with PFs. It seems to be a common mode for NICs
>that do the iSCSI offloads with storage functions.
>
>When something is an embedded bridge vs a SOME switch chip is
>not entirely clear.
>
>My understanding is use ndo_sw_parent_get_id() when you have
>multiple physical ports all connected to a single switch object.
>When you have a single port connected to multiple PCIE functions
>or queues representing a netdev (e.g. macvlan offload) use the
>ndo_get_phys_port_id(). Just want to be sure we are on the
>same page here.

Nod. You described that right.


>
>Otherwise patch looks good. I think we can clear the above up
>with an addition to the documentation. Could go in after the
>initial set and be OK with me.
>
>IMO this patch is needed otherwise user space is at a complete
>loss on trying to figure out how netdevs map to switch silicon.
>You could have reused ndo_get_phys_port_id() perhaps but then
>I think user space may get confused by SR-IOV/VMDQ/etc ports
>attached to a switch silicon. For .02$ having a new distinct
>identifier is cleaner.

It most definitelly is. Therefore I went that way.


>
>
>>+	ndo_sw_parent_* - Functions that serve for a manipulation of the switch
>>+			  chip itself (it can be though of as a "parent" of the
>>+			  port, therefore the name). They are not port-specific.
>>+			  Caller might use arbitrary port netdevice of the same
>>+			  switch and it will make no difference.
>>+	ndo_sw_port_* - Functions that serve for a port-specific manipulation.
>
>[...]
>
>Thanks,
>John
>
>
>-- 
>John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 05/10] rocker: introduce rocker switch driver
  2014-11-11 14:29     ` Thomas Graf
@ 2014-11-11 15:19       ` Jiri Pirko
  2014-11-11 15:32         ` Thomas Graf
  0 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2014-11-11 15:19 UTC (permalink / raw)
  To: Thomas Graf
  Cc: John Fastabend, netdev, davem, nhorman, andy, dborkman, ogerlitz,
	jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo

Tue, Nov 11, 2014 at 03:29:46PM CET, tgraf@suug.ch wrote:
>On 11/10/14 at 02:04pm, John Fastabend wrote:
>> On 11/09/2014 02:51 AM, Jiri Pirko wrote:
>> >+static int rocker_port_sw_parent_id_get(struct net_device *dev,
>> >+					struct netdev_phys_item_id *psid)
>> >+{
>> >+	struct rocker_port *rocker_port = netdev_priv(dev);
>> >+	struct rocker *rocker = rocker_port->rocker;
>> >+
>> 
>> hmm looks like you read this out of a magic switch register :) but
>> my switch doesn't have this magic reg. I suposse the switch MAC address
>> should work.
>
>This needs more work afterwards. Either we define that the switch ID
>is only unique in combination with the parent ifindex or we need to
>introduce a notation of uniquness into the switch ID itself.

This is something similar to physical port id. Each driver should take
care of generating that id.

>
>Is the goal to expose a hardware ID here to allow identification of
>the hardware chip?
>
>MAC is tempting but I'm pretty sure that we'll have pure L3 devices
>being handled by this API at some point.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 05/10] rocker: introduce rocker switch driver
  2014-11-10 22:04   ` John Fastabend
  2014-11-11 14:29     ` Thomas Graf
@ 2014-11-11 15:28     ` Jiri Pirko
  1 sibling, 0 replies; 100+ messages in thread
From: Jiri Pirko @ 2014-11-11 15:28 UTC (permalink / raw)
  To: John Fastabend
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

Mon, Nov 10, 2014 at 11:04:26PM CET, john.fastabend@gmail.com wrote:
>On 11/09/2014 02:51 AM, Jiri Pirko wrote:
>>This patch introduces the first driver to benefit from the switchdev
>>infrastructure and to implement newly introduced switch ndos. This is a
>>driver for emulated switch chip implemented in qemu:
>>https://github.com/sfeldma/qemu-rocker/
>>
>>This patch is a result of joint work with Scott Feldman.
>>
>>Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>>Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>---
>>  MAINTAINERS                          |    7 +
>>  drivers/net/ethernet/Kconfig         |    1 +
>>  drivers/net/ethernet/Makefile        |    1 +
>>  drivers/net/ethernet/rocker/Kconfig  |   27 +
>>  drivers/net/ethernet/rocker/Makefile |    5 +
>>  drivers/net/ethernet/rocker/rocker.c | 2060 ++++++++++++++++++++++++++++++++++
>>  drivers/net/ethernet/rocker/rocker.h |  427 +++++++
>>  7 files changed, 2528 insertions(+)
>>  create mode 100644 drivers/net/ethernet/rocker/Kconfig
>>  create mode 100644 drivers/net/ethernet/rocker/Makefile
>>  create mode 100644 drivers/net/ethernet/rocker/rocker.c
>>  create mode 100644 drivers/net/ethernet/rocker/rocker.h
>>
>
>[...]
>
>>+
>>+static netdev_tx_t rocker_port_xmit(struct sk_buff *skb, struct net_device *dev)
>>+{
>>+	struct rocker_port *rocker_port = netdev_priv(dev);
>>+	struct rocker *rocker = rocker_port->rocker;
>>+	struct rocker_desc_info *desc_info;
>>+	struct rocker_tlv *frags;
>>+	int i;
>>+	int err;
>>+
>>+	desc_info = rocker_desc_head_get(&rocker_port->tx_ring);
>>+	if (unlikely(!desc_info)) {
>>+		if (net_ratelimit())
>
>Could you have a netif_stop_queue() here as well same as below? Not
>that optimizing the xmit routine is the interesting part of this patch.
>But I guess this is just some strange error path because I see you
>check this case below.

This code should never be reached.
If the ring is full, the queue is previously stopped by:

        desc_info = rocker_desc_head_get(&rocker_port->tx_ring);
        if (!desc_info)
                netif_stop_queue(dev);





>
>>+			netdev_err(dev, "tx ring full when queue awake\n");
>>+		return NETDEV_TX_BUSY;
>>+	}
>>+
>>+	rocker_desc_cookie_ptr_set(desc_info, skb);
>>+
>>+	frags = rocker_tlv_nest_start(desc_info, ROCKER_TLV_TX_FRAGS);
>>+	if (!frags)
>>+		goto out;
>>+	err = rocker_tx_desc_frag_map_put(rocker_port, desc_info,
>>+					  skb->data, skb_headlen(skb));
>>+	if (err)
>>+		goto nest_cancel;
>>+	if (skb_shinfo(skb)->nr_frags > ROCKER_TX_FRAGS_MAX)
>>+		goto nest_cancel;
>>+
>>+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
>>+		const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
>>+
>>+		err = rocker_tx_desc_frag_map_put(rocker_port, desc_info,
>>+						  skb_frag_address(frag),
>>+						  skb_frag_size(frag));
>>+		if (err)
>>+			goto unmap_frags;
>>+	}
>>+	rocker_tlv_nest_end(desc_info, frags);
>>+
>>+	rocker_desc_gen_clear(desc_info);
>>+	rocker_desc_head_set(rocker, &rocker_port->tx_ring, desc_info);
>>+
>>+	desc_info = rocker_desc_head_get(&rocker_port->tx_ring);
>>+	if (!desc_info)
>>+		netif_stop_queue(dev);
>
>I'm not entirely sure I followed the TLV usage here but OK. If it
>works...
>
>>+
>>+	return NETDEV_TX_OK;
>>+
>>+unmap_frags:
>>+	rocker_tx_desc_frags_unmap(rocker_port, desc_info);
>>+nest_cancel:
>>+	rocker_tlv_nest_cancel(desc_info, frags);
>>+out:
>>+	dev_kfree_skb(skb);
>>+	return NETDEV_TX_OK;
>>+}
>>+
>>+static int rocker_port_set_mac_address(struct net_device *dev, void *p)
>>+{
>>+	struct sockaddr *addr = p;
>>+	struct rocker_port *rocker_port = netdev_priv(dev);
>>+	int err;
>>+
>>+	if (!is_valid_ether_addr(addr->sa_data))
>>+		return -EADDRNOTAVAIL;
>>+
>>+	err = rocker_cmd_set_port_settings_macaddr(rocker_port, addr->sa_data);
>>+	if (err)
>>+		return err;
>>+	memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
>>+	return 0;
>>+}
>>+
>>+static int rocker_port_sw_parent_id_get(struct net_device *dev,
>>+					struct netdev_phys_item_id *psid)
>>+{
>>+	struct rocker_port *rocker_port = netdev_priv(dev);
>>+	struct rocker *rocker = rocker_port->rocker;
>>+
>
>hmm looks like you read this out of a magic switch register :) but
>my switch doesn't have this magic reg. I suposse the switch MAC address
>should work.
>
>>+	psid->id_len = sizeof(rocker->hw.id);
>>+	memcpy(&psid->id, &rocker->hw.id, psid->id_len);
>>+	return 0;
>>+}
>>+
>>+static const struct net_device_ops rocker_port_netdev_ops = {
>>+	.ndo_open			= rocker_port_open,
>>+	.ndo_stop			= rocker_port_stop,
>>+	.ndo_start_xmit			= rocker_port_xmit,
>>+	.ndo_set_mac_address		= rocker_port_set_mac_address,
>>+	.ndo_sw_parent_id_get		= rocker_port_sw_parent_id_get,
>>+};
>>+
>>+/********************
>>+ * ethtool interface
>>+ ********************/
>>+
>>+static int rocker_port_get_settings(struct net_device *dev,
>>+				    struct ethtool_cmd *ecmd)
>>+{
>>+	struct rocker_port *rocker_port = netdev_priv(dev);
>>+
>>+	return rocker_cmd_get_port_settings_ethtool(rocker_port, ecmd);
>>+}
>>+
>>+static int rocker_port_set_settings(struct net_device *dev,
>>+				    struct ethtool_cmd *ecmd)
>>+{
>>+	struct rocker_port *rocker_port = netdev_priv(dev);
>>+
>>+	return rocker_cmd_set_port_settings_ethtool(rocker_port, ecmd);
>>+}
>>+
>>+static void rocker_port_get_drvinfo(struct net_device *dev,
>>+				    struct ethtool_drvinfo *drvinfo)
>>+{
>>+	strlcpy(drvinfo->driver, rocker_driver_name, sizeof(drvinfo->driver));
>>+	strlcpy(drvinfo->version, UTS_RELEASE, sizeof(drvinfo->version));
>>+}
>>+
>>+static const struct ethtool_ops rocker_port_ethtool_ops = {
>>+	.get_settings		= rocker_port_get_settings,
>>+	.set_settings		= rocker_port_set_settings,
>>+	.get_drvinfo		= rocker_port_get_drvinfo,
>>+	.get_link		= ethtool_op_get_link,
>>+};
>>+
>
>[...]
>
>Looks reasonable to me, although I mostly scanned it and looked
>over the interface parts. My comments are just observations no
>need to change anything for them.
>
>Reviewed-by: John Fastabend <john.r.fastabend@intel.com>
>
>
>-- 
>John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 05/10] rocker: introduce rocker switch driver
  2014-11-11 15:19       ` Jiri Pirko
@ 2014-11-11 15:32         ` Thomas Graf
  2014-11-11 15:40           ` Jiri Pirko
  2014-11-11 15:41           ` Roopa Prabhu
  0 siblings, 2 replies; 100+ messages in thread
From: Thomas Graf @ 2014-11-11 15:32 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: John Fastabend, netdev, davem, nhorman, andy, dborkman, ogerlitz,
	jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo

On 11/11/14 at 04:19pm, Jiri Pirko wrote:
> Tue, Nov 11, 2014 at 03:29:46PM CET, tgraf@suug.ch wrote:
> >On 11/10/14 at 02:04pm, John Fastabend wrote:
> >> On 11/09/2014 02:51 AM, Jiri Pirko wrote:
> >> >+static int rocker_port_sw_parent_id_get(struct net_device *dev,
> >> >+					struct netdev_phys_item_id *psid)
> >> >+{
> >> >+	struct rocker_port *rocker_port = netdev_priv(dev);
> >> >+	struct rocker *rocker = rocker_port->rocker;
> >> >+
> >> 
> >> hmm looks like you read this out of a magic switch register :) but
> >> my switch doesn't have this magic reg. I suposse the switch MAC address
> >> should work.
> >
> >This needs more work afterwards. Either we define that the switch ID
> >is only unique in combination with the parent ifindex or we need to
> >introduce a notation of uniquness into the switch ID itself.
> 
> This is something similar to physical port id. Each driver should take
> care of generating that id.

If the ID is only unique within a driver, then the user space cannot
rely on using the ID to group switch ports. Multiple drivers might
come up with the same ID.

Even now, multiple rocker instances would have the same ID.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 05/10] rocker: introduce rocker switch driver
  2014-11-11 15:32         ` Thomas Graf
@ 2014-11-11 15:40           ` Jiri Pirko
  2014-11-11 16:10             ` Thomas Graf
  2014-11-27 14:09             ` Florian Fainelli
  2014-11-11 15:41           ` Roopa Prabhu
  1 sibling, 2 replies; 100+ messages in thread
From: Jiri Pirko @ 2014-11-11 15:40 UTC (permalink / raw)
  To: Thomas Graf
  Cc: John Fastabend, netdev, davem, nhorman, andy, dborkman, ogerlitz,
	jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo

Tue, Nov 11, 2014 at 04:32:32PM CET, tgraf@suug.ch wrote:
>On 11/11/14 at 04:19pm, Jiri Pirko wrote:
>> Tue, Nov 11, 2014 at 03:29:46PM CET, tgraf@suug.ch wrote:
>> >On 11/10/14 at 02:04pm, John Fastabend wrote:
>> >> On 11/09/2014 02:51 AM, Jiri Pirko wrote:
>> >> >+static int rocker_port_sw_parent_id_get(struct net_device *dev,
>> >> >+					struct netdev_phys_item_id *psid)
>> >> >+{
>> >> >+	struct rocker_port *rocker_port = netdev_priv(dev);
>> >> >+	struct rocker *rocker = rocker_port->rocker;
>> >> >+
>> >> 
>> >> hmm looks like you read this out of a magic switch register :) but
>> >> my switch doesn't have this magic reg. I suposse the switch MAC address
>> >> should work.
>> >
>> >This needs more work afterwards. Either we define that the switch ID
>> >is only unique in combination with the parent ifindex or we need to
>> >introduce a notation of uniquness into the switch ID itself.
>> 
>> This is something similar to physical port id. Each driver should take
>> care of generating that id.
>
>If the ID is only unique within a driver, then the user space cannot
>rely on using the ID to group switch ports. Multiple drivers might
>come up with the same ID.

Well, as I said, it is the same as for physical port id. But if needed,
there can be added some simple mechanism for the id registration
ensuring their uniqueness.

>
>Even now, multiple rocker instances would have the same ID.

It depends on what hw returns to driver.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 05/10] rocker: introduce rocker switch driver
  2014-11-11 15:32         ` Thomas Graf
  2014-11-11 15:40           ` Jiri Pirko
@ 2014-11-11 15:41           ` Roopa Prabhu
  2014-11-11 15:44             ` John Fastabend
  1 sibling, 1 reply; 100+ messages in thread
From: Roopa Prabhu @ 2014-11-11 15:41 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Jiri Pirko, John Fastabend, netdev, davem, nhorman, andy,
	dborkman, ogerlitz, jesse, pshelar, azhou, ben, stephen,
	jeffrey.t.kirsher, vyasevic, xiyou.wangcong, john.r.fastabend,
	edumazet, jhs, sfeldma, f.fainelli, linville, jasowang, ebiederm,
	nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd,
	alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, gos

On 11/11/14, 7:32 AM, Thomas Graf wrote:
> On 11/11/14 at 04:19pm, Jiri Pirko wrote:
>> Tue, Nov 11, 2014 at 03:29:46PM CET, tgraf@suug.ch wrote:
>>> On 11/10/14 at 02:04pm, John Fastabend wrote:
>>>> On 11/09/2014 02:51 AM, Jiri Pirko wrote:
>>>>> +static int rocker_port_sw_parent_id_get(struct net_device *dev,
>>>>> +					struct netdev_phys_item_id *psid)
>>>>> +{
>>>>> +	struct rocker_port *rocker_port = netdev_priv(dev);
>>>>> +	struct rocker *rocker = rocker_port->rocker;
>>>>> +
>>>> hmm looks like you read this out of a magic switch register :) but
>>>> my switch doesn't have this magic reg. I suposse the switch MAC address
>>>> should work.
>>> This needs more work afterwards. Either we define that the switch ID
>>> is only unique in combination with the parent ifindex or we need to
>>> introduce a notation of uniquness into the switch ID itself.
>> This is something similar to physical port id. Each driver should take
>> care of generating that id.
> If the ID is only unique within a driver, then the user space cannot
> rely on using the ID to group switch ports. Multiple drivers might
> come up with the same ID.
>
> Even now, multiple rocker instances would have the same ID.
yep, that was my concern on the switch id namespace handling (on the 
other thread).

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 05/10] rocker: introduce rocker switch driver
  2014-11-11 15:41           ` Roopa Prabhu
@ 2014-11-11 15:44             ` John Fastabend
  0 siblings, 0 replies; 100+ messages in thread
From: John Fastabend @ 2014-11-11 15:44 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Thomas Graf, Jiri Pirko, netdev, davem, nhorman, andy, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, jhs,
	sfeldma, f.fainelli, linville, jasowang, ebiederm,
	nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd,
	alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, gospo

On 11/11/2014 07:41 AM, Roopa Prabhu wrote:
> On 11/11/14, 7:32 AM, Thomas Graf wrote:
>> On 11/11/14 at 04:19pm, Jiri Pirko wrote:
>>> Tue, Nov 11, 2014 at 03:29:46PM CET, tgraf@suug.ch wrote:
>>>> On 11/10/14 at 02:04pm, John Fastabend wrote:
>>>>> On 11/09/2014 02:51 AM, Jiri Pirko wrote:
>>>>>> +static int rocker_port_sw_parent_id_get(struct net_device *dev,
>>>>>> +                    struct netdev_phys_item_id *psid)
>>>>>> +{
>>>>>> +    struct rocker_port *rocker_port = netdev_priv(dev);
>>>>>> +    struct rocker *rocker = rocker_port->rocker;
>>>>>> +
>>>>> hmm looks like you read this out of a magic switch register :) but
>>>>> my switch doesn't have this magic reg. I suposse the switch MAC
>>>>> address
>>>>> should work.
>>>> This needs more work afterwards. Either we define that the switch ID
>>>> is only unique in combination with the parent ifindex or we need to
>>>> introduce a notation of uniquness into the switch ID itself.
>>> This is something similar to physical port id. Each driver should take
>>> care of generating that id.
>> If the ID is only unique within a driver, then the user space cannot
>> rely on using the ID to group switch ports. Multiple drivers might
>> come up with the same ID.
>>
>> Even now, multiple rocker instances would have the same ID.
> yep, that was my concern on the switch id namespace handling (on the
> other thread).

At least the devices I work with have a burnt in MAC which could
be used for this.


-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 05/10] rocker: introduce rocker switch driver
  2014-11-11 15:40           ` Jiri Pirko
@ 2014-11-11 16:10             ` Thomas Graf
  2014-11-27 14:09             ` Florian Fainelli
  1 sibling, 0 replies; 100+ messages in thread
From: Thomas Graf @ 2014-11-11 16:10 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: John Fastabend, netdev, davem, nhorman, andy, dborkman, ogerlitz,
	jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo

On 11/11/14 at 04:40pm, Jiri Pirko wrote:
> Tue, Nov 11, 2014 at 04:32:32PM CET, tgraf@suug.ch wrote:
> >If the ID is only unique within a driver, then the user space cannot
> >rely on using the ID to group switch ports. Multiple drivers might
> >come up with the same ID.
> 
> Well, as I said, it is the same as for physical port id. But if needed,
> there can be added some simple mechanism for the id registration
> ensuring their uniqueness.
> 
> >
> >Even now, multiple rocker instances would have the same ID.
> 
> It depends on what hw returns to driver.

I think that falls within the responsibility of the API. I'll propose
a patch to address it after this gets in.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 08/10] bridge: add API to notify bridge driver of learned FBD on offloaded device
  2014-11-11 14:21   ` Roopa Prabhu
@ 2014-11-11 17:38     ` Scott Feldman
  2014-11-11 21:43       ` Roopa Prabhu
  0 siblings, 1 reply; 100+ messages in thread
From: Scott Feldman @ 2014-11-11 17:38 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Jiri Pirko, Netdev, David S. Miller, nhorman, Andy Gospodarek,
	Thomas Graf, dborkman, ogerlitz, jesse, pshelar, azhou, ben,
	stephen, Kirsher, Jeffrey T, vyasevic, Cong Wang, Fastabend,
	John R, Eric Dumazet, Jamal Hadi Salim, Florian Fainelli,
	John Linville, jasowang, ebiederm, Nicolas Dichtel, ryazanov.s.a,
	buytenh, Aviad Raveh, nbd, Alexei Starovoitov

On Tue, Nov 11, 2014 at 4:21 AM, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
> On 11/9/14, 2:51 AM, Jiri Pirko wrote:
>>
>> From: Scott Feldman <sfeldma@gmail.com>
>>
>> When the swdev device learns a new mac/vlan on a port, it sends some async
>> notification to the driver and the driver installs an FDB in the device.
>> To give a holistic system view, the learned mac/vlan should be reflected
>> in the bridge's FBD table, so the user, using normal iproute2 cmds, can
>> view
>> what is currently learned by the device.  This API on the bridge driver
>> gives
>> a way for the swdev driver to install an FBD entry in the bridge FBD
>> table.
>> (And remove one).
>>
>> This is equivalent to the device running these cmds:
>>
>>    bridge fdb [add|del] <mac> dev <dev> vid <vlan id> master
>>
>> This patch needs some extra eyeballs for review, in paricular around the
>> locking and contexts.
>
>
> scott/jiri, love that you have handled this case!, This will be useful.
> But, quick question, Cant this also be done using the same ndo_op that is
> done to add the static fdb..?

Maybe.  I think I tried sending netlink msg from swdev driver to
bridge driver, but wasn't able to make it to work.  I'm not sure if
it's possible to do that, in general, send netlink kernel-to-kernel.
The other option is to synthesize a netlink msg with needed attrs and
call ndo in bridge directly.  That feels a little yucky to me.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 08/10] bridge: add API to notify bridge driver of learned FBD on offloaded device
  2014-11-11 17:38     ` Scott Feldman
@ 2014-11-11 21:43       ` Roopa Prabhu
  0 siblings, 0 replies; 100+ messages in thread
From: Roopa Prabhu @ 2014-11-11 21:43 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Jiri Pirko, Netdev, David S. Miller, nhorman, Andy Gospodarek,
	Thomas Graf, dborkman, ogerlitz, jesse, pshelar, azhou, ben,
	stephen, Kirsher, Jeffrey T, vyasevic, Cong Wang, Fastabend,
	John R, Eric Dumazet, Jamal Hadi Salim, Florian Fainelli,
	John Linville, jasowang, ebiederm, Nicolas Dichtel, ryazanov.s.a,
	buytenh, Aviad Raveh, nbd, Alexei Starovoitov

On 11/11/14, 9:38 AM, Scott Feldman wrote:
> On Tue, Nov 11, 2014 at 4:21 AM, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
>> On 11/9/14, 2:51 AM, Jiri Pirko wrote:
>>> From: Scott Feldman <sfeldma@gmail.com>
>>>
>>> When the swdev device learns a new mac/vlan on a port, it sends some async
>>> notification to the driver and the driver installs an FDB in the device.
>>> To give a holistic system view, the learned mac/vlan should be reflected
>>> in the bridge's FBD table, so the user, using normal iproute2 cmds, can
>>> view
>>> what is currently learned by the device.  This API on the bridge driver
>>> gives
>>> a way for the swdev driver to install an FBD entry in the bridge FBD
>>> table.
>>> (And remove one).
>>>
>>> This is equivalent to the device running these cmds:
>>>
>>>     bridge fdb [add|del] <mac> dev <dev> vid <vlan id> master
>>>
>>> This patch needs some extra eyeballs for review, in paricular around the
>>> locking and contexts.
>>
>> scott/jiri, love that you have handled this case!, This will be useful.
>> But, quick question, Cant this also be done using the same ndo_op that is
>> done to add the static fdb..?
> Maybe.  I think I tried sending netlink msg from swdev driver to
> bridge driver, but wasn't able to make it to work.  I'm not sure if
> it's possible to do that, in general, send netlink kernel-to-kernel.
> The other option is to synthesize a netlink msg with needed attrs and
> call ndo in bridge directly.  That feels a little yucky to me.
ok.
If i understand correctly this is just an api that drivers can use if 
they want to push learnt fdb
entries to the fdb associated with the bridge.  So, the learn 
sync-hw-fdb behavior is switch driver dependent. If the switch driver
allows learning on a port, she can use this api to sync learnt entries 
to the kernel.

How do the policy flags (That jamal and you were discussing on the other 
thread) apply here ?
SYNC_HW_FDB  ?...will this flag be needed...?. I am just trying to see 
how this api relates to the other thread.

Thanks,
Roopa

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 06/10] bridge: introduce fdb offloading via switchdev
  2014-11-10 19:03           ` Roopa Prabhu
@ 2014-11-12 13:43             ` Jiri Pirko
  0 siblings, 0 replies; 100+ messages in thread
From: Jiri Pirko @ 2014-11-12 13:43 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Andy Gospodarek, Thomas Graf, Jamal Hadi Salim, netdev, davem,
	nhorman, andy, dborkman, ogerlitz, jesse, pshelar, azhou, ben,
	stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner, shrij

Mon, Nov 10, 2014 at 08:03:41PM CET, roopa@cumulusnetworks.com wrote:
>On 11/10/14, 9:30 AM, Andy Gospodarek wrote:
>>On Mon, Nov 10, 2014 at 01:51:00PM +0000, Thomas Graf wrote:
>>>On 11/10/14 at 09:15am, Jiri Pirko wrote:
>>>>There are few problems in re-using this. It is netlink based so for calling
>>>>it from bridge code, we would have to construct netlink message. But
>>>>that could be probably changed.
>>>>As you can see from the list of parameters, this is no longer about fdb (addr,
>>>>vlanid) but this has been extended to something else. See vxlan code for
>>>>what this is used for. I believe that fdb_add/del should be renamed to
>>>>something else, perhaps l2neigh_add/del or something like that.
>>>>The other problem is that fdb_add/del is currently used by various
>>>>drivers for different purpose (adding macs to unicast list).
>>>Can you elaborate a bit on the intended semantic differences between
>>>the existing ndo_fdb_add() and ndo_sw_port_fdb_add()? I'm not sure we
>>>need the sw_ prefix for this specific ndo.
>>>
>>>I completely agree that relying on Netlink is wrong because we'll have
>>>in-kernel users of the API but I believe that existing ndo_fdb_add()
>>>implementations in i40e, ixgbe, qlcnic and macvlan could use the new
>>>API you propose.
>>I also think the same API could be used quite easily on the current
>>drivers that use it.
>>
>>I was looking at this earlier today and there are only 5 drivers
>>(outside the bridge code) that support ndo_fdb_add.  The 3 hardware
>>drivers and vxlan driver seem like they could use this new API.  The
>>macvlan code appears to simply set the uc and mc lists, which seems like
>>it could be done other ways -- confirmation from John Fastabend, Roopa,
>>and mst would be good.
>yes, that is correct. The macvlan code, when not set for passthru mode, seems
>to just program the uc-mc lists,

If you look at how drivers implement this, they also only add addr into
uc/mc list. And if the ndo fdb add is not implemented by the driver, core
does it for you - see ndo_dflt_fdb_add.



> which again get synched to the lowerdev (and possibly from lowerdev to hw in
>some cases).
>
>I agree that it would be really nice if the existing api's can be made to
>work.
>>>How about we rename the existing ndo_fdb_add() to ndo_neigh_add() as
>>>you propose and convert vxlan over to it and have all others which don't
>>>even depend on the Netlink attributes being passed in (i40e, ixgbe,
>>>qlcnic, macvlan) use ndo_fdb_add() which would have the behaviour of your
>>>proposed ndo_sw_port_fdb_add()?
>>I would much rather see something like Thomas proposes here.  I know you
>>would like to see these patches get included (I'm anxious to see better
>>in-kernel offload support too!), but separate, possibly unnecessary
>>APIs like this can get painful for driver maintainers (upstream and
>>distro maintainers).
>>
>Ack!.


I will think about this more and prepare something like Thomas proposes.
Stay tuned, I will be out undil Monday so I will post v3 Tue/Wed next
week.

In the meantime, feel free to study the fdb_add code and feel free to
give me some more thoughts/patches about this. Thanks!


>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
  2014-11-09 10:51 [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload Jiri Pirko
                   ` (12 preceding siblings ...)
  2014-11-10 16:48 ` Thomas Graf
@ 2014-11-12 13:44 ` Jiri Pirko
  13 siblings, 0 replies; 100+ messages in thread
From: Jiri Pirko @ 2014-11-12 13:44 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

Going to post v3 with the suggested changes included next Tue/Wed.
Thank you all for review!


Sun, Nov 09, 2014 at 11:51:10AM CET, jiri@resnulli.us wrote:
>Hi all.
>
>This patchset is just the first phase of switch and switch-ish device
>support api in kernel. Note that the api will extend (our complete work
>can be pulled from https://github.com/jpirko/net-next-rocker).
>
>So what this patchset includes:
>- introduce switchdev api for implementing switch drivers (so far
>  only linux bridge fdb offload is covered)
>- introduce rocker switch driver which implements switchdev api
>
>As to the discussion if there is need to have specific class of device
>representing the switch itself, so far we found no need to introduce that.
>But we are generally ok with the idea and when the time comes and it will
>be needed, it can be easily introduced without any disturbance.
>
>This patchset introduces switch id export through rtnetlink and sysfs,
>which is similar to what we have for port id in SR-IOV. I will send iproute2
>patchset for showing the switch id for port netdevs once this is applied.
>
>For detailed description, please see individual patches.
>
>v1->v2:
>- addressed all DaveM's comments
>
>Jiri Pirko (5):
>  net: rename netdev_phys_port_id to more generic name
>  net: introduce generic switch devices support
>  rtnl: expose physical switch id for particular device
>  net-sysfs: expose physical switch id for particular device
>  rocker: introduce rocker switch driver
>
>Scott Feldman (5):
>  bridge: introduce fdb offloading via switchdev
>  bridge: call netdev_sw_port_stp_update when bridge port STP status
>    changes
>  bridge: add API to notify bridge driver of learned FBD on offloaded
>    device
>  rocker: implement rocker ofdpa flow table manipulation
>  rocker: implement L2 bridge offloading
>
> Documentation/networking/switchdev.txt           |   59 +
> MAINTAINERS                                      |   14 +
> drivers/net/ethernet/Kconfig                     |    1 +
> drivers/net/ethernet/Makefile                    |    1 +
> drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |    2 +-
> drivers/net/ethernet/intel/i40e/i40e_main.c      |    2 +-
> drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |    2 +-
> drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |    2 +-
> drivers/net/ethernet/rocker/Kconfig              |   27 +
> drivers/net/ethernet/rocker/Makefile             |    5 +
> drivers/net/ethernet/rocker/rocker.c             | 4182 ++++++++++++++++++++++
> drivers/net/ethernet/rocker/rocker.h             |  427 +++
> include/linux/if_bridge.h                        |   18 +
> include/linux/netdevice.h                        |   48 +-
> include/net/switchdev.h                          |   53 +
> include/uapi/linux/if_link.h                     |    1 +
> net/Kconfig                                      |    1 +
> net/Makefile                                     |    3 +
> net/bridge/br_fdb.c                              |   94 +-
> net/bridge/br_netlink.c                          |    2 +
> net/bridge/br_stp.c                              |    4 +
> net/bridge/br_stp_if.c                           |    3 +
> net/bridge/br_stp_timer.c                        |    2 +
> net/core/dev.c                                   |    2 +-
> net/core/net-sysfs.c                             |   26 +-
> net/core/rtnetlink.c                             |   30 +-
> net/switchdev/Kconfig                            |   13 +
> net/switchdev/Makefile                           |    5 +
> net/switchdev/switchdev.c                        |   93 +
> 29 files changed, 5104 insertions(+), 18 deletions(-)
> create mode 100644 Documentation/networking/switchdev.txt
> create mode 100644 drivers/net/ethernet/rocker/Kconfig
> create mode 100644 drivers/net/ethernet/rocker/Makefile
> create mode 100644 drivers/net/ethernet/rocker/rocker.c
> create mode 100644 drivers/net/ethernet/rocker/rocker.h
> create mode 100644 include/net/switchdev.h
> create mode 100644 net/switchdev/Kconfig
> create mode 100644 net/switchdev/Makefile
> create mode 100644 net/switchdev/switchdev.c
>
>-- 
>1.9.3
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
  2014-11-10 22:23         ` John Fastabend
  2014-11-11  8:51           ` Simon Horman
@ 2014-11-13  5:44           ` Simon Horman
  2014-11-13  6:31             ` John Fastabend
  1 sibling, 1 reply; 100+ messages in thread
From: Simon Horman @ 2014-11-13  5:44 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jamal Hadi Salim, Jiri Pirko, netdev, davem, nhorman, andy,
	tgraf, dborkman, ogerlitz, jesse, pshelar, azhou, ben, stephen,
	jeffrey.t.kirsher, vyasevic, xiyou.wangcong, john.r.fastabend,
	edumazet, sfeldma, f.fainelli, roopa, linville, jasowang,
	ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd,
	alexei.starovoitov, Neil.Jerram, ronye, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo

[snip]

> Simon, if your feeling adventurous any feedback on the repo link
> would be great. I still need to smash the commit log into something
> coherent though at the moment you can see all the errors and rewrites,
> etc as I made them.

Hi John,

here is some preliminary feedback:

* I notice that the parse graph code isn't present yet.
  I suppose this is a difficult piece that naturally follows many
  other piece. None the less it is possibly the piece of most
  interest to me :-)

* Will del and update flows require flows to already exist?
  And similarly, will add flow require flows with the same match to not
  already exist?  If so, the error handling seems tricky of more than one
  flow is to be deleted/updated. IIRC there was some discussion of that
  kind of issue at the (double) round table discussion on the last day of
  LPC14 in Düsseldorf.

* Should the .node_count value of ixgbe_table_node_l2 be 3?
  ixgbe_table_graph_nodes has three elements but perhaps you
  are intentionally excluding the last element ixgbe_table_node_nil?

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
  2014-11-13  5:44           ` Simon Horman
@ 2014-11-13  6:31             ` John Fastabend
  2014-11-21  2:01               ` Simon Horman
  0 siblings, 1 reply; 100+ messages in thread
From: John Fastabend @ 2014-11-13  6:31 UTC (permalink / raw)
  To: Simon Horman, John Fastabend
  Cc: Jamal Hadi Salim, Jiri Pirko, netdev, davem, nhorman, andy,
	tgraf, dborkman, ogerlitz, jesse, pshelar, azhou, ben, stephen,
	jeffrey.t.kirsher, vyasevic, xiyou.wangcong, edumazet, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl

On 11/12/2014 09:44 PM, Simon Horman wrote:
> [snip]
> 
>> Simon, if your feeling adventurous any feedback on the repo link
>> would be great. I still need to smash the commit log into something
>> coherent though at the moment you can see all the errors and rewrites,
>> etc as I made them.
> 
> Hi John,
> 
> here is some preliminary feedback:
> 
> * I notice that the parse graph code isn't present yet.
>   I suppose this is a difficult piece that naturally follows many
>   other piece. None the less it is possibly the piece of most
>   interest to me :-)

I can add this over the next few days. Also I wanted to publish some
more complex examples on top of rocker switch. The nic drivers are
interesting but not as complex as some of the switch devices.

There is also the table graph layout which I wanted tweak a bit. At
the moment I have hardware that can run tables in parallel and some
that executes tables in sequence. It might not be clear from the code
(why I need the cleanup) but the source id is being used to indicate
if the tables are executed in parallel or not.

> 
> * Will del and update flows require flows to already exist?
>   And similarly, will add flow require flows with the same match to not
>   already exist?  If so, the error handling seems tricky of more than one
>   flow is to be deleted/updated. IIRC there was some discussion of that
>   kind of issue at the (double) round table discussion on the last day of
>   LPC14 in Düsseldorf.

I would expect del/updates for flows that don't exist should fail.

I didn't intend to add any checks in the kernel to verify the matches
are unique. My opinion on this is that user space shouldn't add new
duplicate flows. And if it does hardware resources will be wasted.

> 
> * Should the .node_count value of ixgbe_table_node_l2 be 3?
>   ixgbe_table_graph_nodes has three elements but perhaps you
>   are intentionally excluding the last element ixgbe_table_node_nil?
> 

Actually I could just drop the node_count at the moment because I've
been null terminating the arrays with null items.

I should either add a count field to all the structures or null terminate
the arrays. For now I mostly null terminate the arrays when I use
them. For example matches is null terminates same for actions.

.John

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 02/10] net: introduce generic switch devices support
  2014-11-09 10:51 ` [patch net-next v2 02/10] net: introduce generic switch devices support Jiri Pirko
  2014-11-10 21:59   ` John Fastabend
  2014-11-11  9:49   ` M. Braun
@ 2014-11-19 13:28   ` Roopa Prabhu
  2014-11-19 13:46     ` Jiri Pirko
  2 siblings, 1 reply; 100+ messages in thread
From: Roopa Prabhu @ 2014-11-19 13:28 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

On 11/9/14, 2:51 AM, Jiri Pirko wrote:
> The goal of this is to provide a possibility to support various switch
> chips. Drivers should implement relevant ndos to do so. Now there is
> only one ndo defined:
> - for getting physical switch id is in place.
>
> Note that user can use random port netdevice to access the switch.
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>   Documentation/networking/switchdev.txt | 59 ++++++++++++++++++++++++++++++++++
>   MAINTAINERS                            |  7 ++++
>   include/linux/netdevice.h              | 10 ++++++
>   include/net/switchdev.h                | 30 +++++++++++++++++
>   net/Kconfig                            |  1 +
>   net/Makefile                           |  3 ++
>   net/switchdev/Kconfig                  | 13 ++++++++
>   net/switchdev/Makefile                 |  5 +++
>   net/switchdev/switchdev.c              | 33 +++++++++++++++++++
>   9 files changed, 161 insertions(+)
>   create mode 100644 Documentation/networking/switchdev.txt
>   create mode 100644 include/net/switchdev.h
>   create mode 100644 net/switchdev/Kconfig
>   create mode 100644 net/switchdev/Makefile
>   create mode 100644 net/switchdev/switchdev.c
>
> diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
> new file mode 100644
> index 0000000..98be76c
> --- /dev/null
> +++ b/Documentation/networking/switchdev.txt
> @@ -0,0 +1,59 @@
> +Switch (and switch-ish) device drivers HOWTO
> +===========================
> +
> +Please note that the word "switch" is here used in very generic meaning.
> +This include devices supporting L2/L3 but also various flow offloading chips,
> +including switches embedded into SR-IOV NICs.
> +
> +Lets describe a topology a bit. Imagine the following example:
> +
> +       +----------------------------+    +---------------+
> +       |     SOME switch chip       |    |      CPU      |
> +       +----------------------------+    +---------------+
> +       port1 port2 port3 port4 MNGMNT    |     PCI-E     |
> +         |     |     |     |     |       +---------------+
> +        PHY   PHY    |     |     |         |  NIC0 NIC1
> +                     |     |     |         |   |    |
> +                     |     |     +- PCI-E -+   |    |
> +                     |     +------- MII -------+    |
> +                     +------------- MII ------------+
> +
> +In this example, there are two independent lines between the switch silicon
> +and CPU. NIC0 and NIC1 drivers are not aware of a switch presence. They are
> +separate from the switch driver. SOME switch chip is by managed by a driver
> +via PCI-E device MNGMNT. Note that MNGMNT device, NIC0 and NIC1 may be
> +connected to some other type of bus.
> +
> +Now, for the previous example show the representation in kernel:
> +
> +       +----------------------------+    +---------------+
> +       |     SOME switch chip       |    |      CPU      |
> +       +----------------------------+    +---------------+
> +       sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT    |     PCI-E     |
> +         |     |     |     |     |       +---------------+
> +        PHY   PHY    |     |     |         |  eth0 eth1
> +                     |     |     |         |   |    |
> +                     |     |     +- PCI-E -+   |    |
> +                     |     +------- MII -------+    |
> +                     +------------- MII ------------+
> +
> +Lets call the example switch driver for SOME switch chip "SOMEswitch". This
> +driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX
> +created for each port of a switch. These netdevices are instances
> +of "SOMEswitch" driver. sw0pX netdevices serve as a "representation"
> +of the switch chip. eth0 and eth1 are instances of some other existing driver.
> +
> +The only difference of the switch-port netdevice from the ordinary netdevice
> +is that is implements couple more NDOs:
> +
> +	ndo_sw_parent_get_id - This returns the same ID for two port netdevices
> +			       of the same physical switch chip. This is
> +			       mandatory to be implemented by all switch drivers
> +			       and serves the caller for recognition of a port
> +			       netdevice.
> +	ndo_sw_parent_* - Functions that serve for a manipulation of the switch
> +			  chip itself (it can be though of as a "parent" of the
> +			  port, therefore the name). They are not port-specific.
> +			  Caller might use arbitrary port netdevice of the same
> +			  switch and it will make no difference.
> +	ndo_sw_port_* - Functions that serve for a port-specific manipulation.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3a41fb0..776e078 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9003,6 +9003,13 @@ F:	lib/swiotlb.c
>   F:	arch/*/kernel/pci-swiotlb.c
>   F:	include/linux/swiotlb.h
>   
> +SWITCHDEV
> +M:	Jiri Pirko <jiri@resnulli.us>
> +L:	netdev@vger.kernel.org
> +S:	Supported
> +F:	net/switchdev/
> +F:	include/net/switchdev.h
> +
>   SYNOPSYS ARC ARCHITECTURE
>   M:	Vineet Gupta <vgupta@synopsys.com>
>   S:	Supported
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 71922e0..97eade9 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1017,6 +1017,12 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>    *	performing GSO on a packet. The device returns true if it is
>    *	able to GSO the packet, false otherwise. If the return value is
>    *	false the stack will do software GSO.
> + *
> + * int (*ndo_sw_parent_id_get)(struct net_device *dev,
> + *			       struct netdev_phys_item_id *psid);
> + *	Called to get an ID of the switch chip this port is part of.
> + *	If driver implements this, it indicates that it represents a port
> + *	of a switch chip.
>    */
>   struct net_device_ops {
>   	int			(*ndo_init)(struct net_device *dev);
> @@ -1168,6 +1174,10 @@ struct net_device_ops {
>   	int			(*ndo_get_lock_subclass)(struct net_device *dev);
>   	bool			(*ndo_gso_check) (struct sk_buff *skb,
>   						  struct net_device *dev);
> +#ifdef CONFIG_NET_SWITCHDEV
> +	int			(*ndo_sw_parent_id_get)(struct net_device *dev,
> +							struct netdev_phys_item_id *psid);
Can we keep the name generic and not include "sw" which implies switch 
here ?.
I understand that it is under CONFIG_NET_SWITCHDEV but we might find use 
for them in other offload scenarios in the future.
This particular ndo can be just ndo_parent_id_get().
And the others that do specific offloads can have "offload" in them if 
required..?.



> +#endif
>   };
>   
>   /**
> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> new file mode 100644
> index 0000000..79bf9bd
> --- /dev/null
> +++ b/include/net/switchdev.h
> @@ -0,0 +1,30 @@
> +/*
> + * include/net/switchdev.h - Switch device API
> + * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +#ifndef _LINUX_SWITCHDEV_H_
> +#define _LINUX_SWITCHDEV_H_
> +
> +#include <linux/netdevice.h>
> +
> +#ifdef CONFIG_NET_SWITCHDEV
> +
> +int netdev_sw_parent_id_get(struct net_device *dev,
> +			    struct netdev_phys_item_id *psid);
> +
> +#else
> +
> +static inline int netdev_sw_parent_id_get(struct net_device *dev,
> +					  struct netdev_phys_item_id *psid)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +#endif
> +
> +#endif /* _LINUX_SWITCHDEV_H_ */
> diff --git a/net/Kconfig b/net/Kconfig
> index 99815b5..ff9ffc1 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -228,6 +228,7 @@ source "net/vmw_vsock/Kconfig"
>   source "net/netlink/Kconfig"
>   source "net/mpls/Kconfig"
>   source "net/hsr/Kconfig"
> +source "net/switchdev/Kconfig"
>   
>   config RPS
>   	boolean
> diff --git a/net/Makefile b/net/Makefile
> index 7ed1970..95fc694 100644
> --- a/net/Makefile
> +++ b/net/Makefile
> @@ -73,3 +73,6 @@ obj-$(CONFIG_OPENVSWITCH)	+= openvswitch/
>   obj-$(CONFIG_VSOCKETS)	+= vmw_vsock/
>   obj-$(CONFIG_NET_MPLS_GSO)	+= mpls/
>   obj-$(CONFIG_HSR)		+= hsr/
> +ifneq ($(CONFIG_NET_SWITCHDEV),)
> +obj-y				+= switchdev/
> +endif
> diff --git a/net/switchdev/Kconfig b/net/switchdev/Kconfig
> new file mode 100644
> index 0000000..1557545
> --- /dev/null
> +++ b/net/switchdev/Kconfig
> @@ -0,0 +1,13 @@
> +#
> +# Configuration for Switch device support
> +#
> +
> +config NET_SWITCHDEV
> +	boolean "Switch (and switch-ish) device support (EXPERIMENTAL)"
> +	depends on INET
> +	---help---
> +	  This module provides glue between core networking code and device
> +	  drivers in order to support hardware switch chips in very generic
> +	  meaning of the word "switch". This include devices supporting L2/L3 but
> +	  also various flow offloading chips, including switches embedded into
> +	  SR-IOV NICs.
> diff --git a/net/switchdev/Makefile b/net/switchdev/Makefile
> new file mode 100644
> index 0000000..5ed63ed
> --- /dev/null
> +++ b/net/switchdev/Makefile
> @@ -0,0 +1,5 @@
> +#
> +# Makefile for the Switch device API
> +#
> +
> +obj-$(CONFIG_NET_SWITCHDEV) += switchdev.o
> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> new file mode 100644
> index 0000000..5010f646
> --- /dev/null
> +++ b/net/switchdev/switchdev.c
> @@ -0,0 +1,33 @@
> +/*
> + * net/switchdev/switchdev.c - Switch device API
> + * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/init.h>
> +#include <linux/netdevice.h>
> +#include <net/switchdev.h>
> +
> +/**
> + *	netdev_sw_parent_id_get - Get ID of a switch
> + *	@dev: port device
> + *	@psid: switch ID
> + *
> + *	Get ID of a switch this port is part of.
> + */
> +int netdev_sw_parent_id_get(struct net_device *dev,
> +			    struct netdev_phys_item_id *psid)
> +{
> +	const struct net_device_ops *ops = dev->netdev_ops;
> +
> +	if (!ops->ndo_sw_parent_id_get)
> +		return -EOPNOTSUPP;
> +	return ops->ndo_sw_parent_id_get(dev, psid);
> +}
> +EXPORT_SYMBOL(netdev_sw_parent_id_get);

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 02/10] net: introduce generic switch devices support
  2014-11-19 13:28   ` Roopa Prabhu
@ 2014-11-19 13:46     ` Jiri Pirko
  2014-11-19 13:59       ` Roopa Prabhu
  0 siblings, 1 reply; 100+ messages in thread
From: Jiri Pirko @ 2014-11-19 13:46 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

Wed, Nov 19, 2014 at 02:28:10PM CET, roopa@cumulusnetworks.com wrote:
>On 11/9/14, 2:51 AM, Jiri Pirko wrote:
>>The goal of this is to provide a possibility to support various switch
>>chips. Drivers should implement relevant ndos to do so. Now there is
>>only one ndo defined:
>>- for getting physical switch id is in place.
>>
>>Note that user can use random port netdevice to access the switch.
>>
>>Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>---
>>  Documentation/networking/switchdev.txt | 59 ++++++++++++++++++++++++++++++++++
>>  MAINTAINERS                            |  7 ++++
>>  include/linux/netdevice.h              | 10 ++++++
>>  include/net/switchdev.h                | 30 +++++++++++++++++
>>  net/Kconfig                            |  1 +
>>  net/Makefile                           |  3 ++
>>  net/switchdev/Kconfig                  | 13 ++++++++
>>  net/switchdev/Makefile                 |  5 +++
>>  net/switchdev/switchdev.c              | 33 +++++++++++++++++++
>>  9 files changed, 161 insertions(+)
>>  create mode 100644 Documentation/networking/switchdev.txt
>>  create mode 100644 include/net/switchdev.h
>>  create mode 100644 net/switchdev/Kconfig
>>  create mode 100644 net/switchdev/Makefile
>>  create mode 100644 net/switchdev/switchdev.c
>>
>>diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
>>new file mode 100644
>>index 0000000..98be76c
>>--- /dev/null
>>+++ b/Documentation/networking/switchdev.txt
>>@@ -0,0 +1,59 @@
>>+Switch (and switch-ish) device drivers HOWTO
>>+===========================
>>+
>>+Please note that the word "switch" is here used in very generic meaning.
>>+This include devices supporting L2/L3 but also various flow offloading chips,
>>+including switches embedded into SR-IOV NICs.
>>+
>>+Lets describe a topology a bit. Imagine the following example:
>>+
>>+       +----------------------------+    +---------------+
>>+       |     SOME switch chip       |    |      CPU      |
>>+       +----------------------------+    +---------------+
>>+       port1 port2 port3 port4 MNGMNT    |     PCI-E     |
>>+         |     |     |     |     |       +---------------+
>>+        PHY   PHY    |     |     |         |  NIC0 NIC1
>>+                     |     |     |         |   |    |
>>+                     |     |     +- PCI-E -+   |    |
>>+                     |     +------- MII -------+    |
>>+                     +------------- MII ------------+
>>+
>>+In this example, there are two independent lines between the switch silicon
>>+and CPU. NIC0 and NIC1 drivers are not aware of a switch presence. They are
>>+separate from the switch driver. SOME switch chip is by managed by a driver
>>+via PCI-E device MNGMNT. Note that MNGMNT device, NIC0 and NIC1 may be
>>+connected to some other type of bus.
>>+
>>+Now, for the previous example show the representation in kernel:
>>+
>>+       +----------------------------+    +---------------+
>>+       |     SOME switch chip       |    |      CPU      |
>>+       +----------------------------+    +---------------+
>>+       sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT    |     PCI-E     |
>>+         |     |     |     |     |       +---------------+
>>+        PHY   PHY    |     |     |         |  eth0 eth1
>>+                     |     |     |         |   |    |
>>+                     |     |     +- PCI-E -+   |    |
>>+                     |     +------- MII -------+    |
>>+                     +------------- MII ------------+
>>+
>>+Lets call the example switch driver for SOME switch chip "SOMEswitch". This
>>+driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX
>>+created for each port of a switch. These netdevices are instances
>>+of "SOMEswitch" driver. sw0pX netdevices serve as a "representation"
>>+of the switch chip. eth0 and eth1 are instances of some other existing driver.
>>+
>>+The only difference of the switch-port netdevice from the ordinary netdevice
>>+is that is implements couple more NDOs:
>>+
>>+	ndo_sw_parent_get_id - This returns the same ID for two port netdevices
>>+			       of the same physical switch chip. This is
>>+			       mandatory to be implemented by all switch drivers
>>+			       and serves the caller for recognition of a port
>>+			       netdevice.
>>+	ndo_sw_parent_* - Functions that serve for a manipulation of the switch
>>+			  chip itself (it can be though of as a "parent" of the
>>+			  port, therefore the name). They are not port-specific.
>>+			  Caller might use arbitrary port netdevice of the same
>>+			  switch and it will make no difference.
>>+	ndo_sw_port_* - Functions that serve for a port-specific manipulation.
>>diff --git a/MAINTAINERS b/MAINTAINERS
>>index 3a41fb0..776e078 100644
>>--- a/MAINTAINERS
>>+++ b/MAINTAINERS
>>@@ -9003,6 +9003,13 @@ F:	lib/swiotlb.c
>>  F:	arch/*/kernel/pci-swiotlb.c
>>  F:	include/linux/swiotlb.h
>>+SWITCHDEV
>>+M:	Jiri Pirko <jiri@resnulli.us>
>>+L:	netdev@vger.kernel.org
>>+S:	Supported
>>+F:	net/switchdev/
>>+F:	include/net/switchdev.h
>>+
>>  SYNOPSYS ARC ARCHITECTURE
>>  M:	Vineet Gupta <vgupta@synopsys.com>
>>  S:	Supported
>>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>index 71922e0..97eade9 100644
>>--- a/include/linux/netdevice.h
>>+++ b/include/linux/netdevice.h
>>@@ -1017,6 +1017,12 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>>   *	performing GSO on a packet. The device returns true if it is
>>   *	able to GSO the packet, false otherwise. If the return value is
>>   *	false the stack will do software GSO.
>>+ *
>>+ * int (*ndo_sw_parent_id_get)(struct net_device *dev,
>>+ *			       struct netdev_phys_item_id *psid);
>>+ *	Called to get an ID of the switch chip this port is part of.
>>+ *	If driver implements this, it indicates that it represents a port
>>+ *	of a switch chip.
>>   */
>>  struct net_device_ops {
>>  	int			(*ndo_init)(struct net_device *dev);
>>@@ -1168,6 +1174,10 @@ struct net_device_ops {
>>  	int			(*ndo_get_lock_subclass)(struct net_device *dev);
>>  	bool			(*ndo_gso_check) (struct sk_buff *skb,
>>  						  struct net_device *dev);
>>+#ifdef CONFIG_NET_SWITCHDEV
>>+	int			(*ndo_sw_parent_id_get)(struct net_device *dev,
>>+							struct netdev_phys_item_id *psid);
>Can we keep the name generic and not include "sw" which implies switch here
>?.
>I understand that it is under CONFIG_NET_SWITCHDEV but we might find use for
>them in other offload scenarios in the future.
>This particular ndo can be just ndo_parent_id_get().
>And the others that do specific offloads can have "offload" in them if
>required..?.


But this is for getting parent switch id, sw should be there. If comes a
time when this might be reused to something else, we change it then.
This is internal api, easily changeable.

>
>
>
>>+#endif
>>  };
>>  /**
>>diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>new file mode 100644
>>index 0000000..79bf9bd
>>--- /dev/null
>>+++ b/include/net/switchdev.h
>>@@ -0,0 +1,30 @@
>>+/*
>>+ * include/net/switchdev.h - Switch device API
>>+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
>>+ *
>>+ * This program is free software; you can redistribute it and/or modify
>>+ * it under the terms of the GNU General Public License as published by
>>+ * the Free Software Foundation; either version 2 of the License, or
>>+ * (at your option) any later version.
>>+ */
>>+#ifndef _LINUX_SWITCHDEV_H_
>>+#define _LINUX_SWITCHDEV_H_
>>+
>>+#include <linux/netdevice.h>
>>+
>>+#ifdef CONFIG_NET_SWITCHDEV
>>+
>>+int netdev_sw_parent_id_get(struct net_device *dev,
>>+			    struct netdev_phys_item_id *psid);
>>+
>>+#else
>>+
>>+static inline int netdev_sw_parent_id_get(struct net_device *dev,
>>+					  struct netdev_phys_item_id *psid)
>>+{
>>+	return -EOPNOTSUPP;
>>+}
>>+
>>+#endif
>>+
>>+#endif /* _LINUX_SWITCHDEV_H_ */
>>diff --git a/net/Kconfig b/net/Kconfig
>>index 99815b5..ff9ffc1 100644
>>--- a/net/Kconfig
>>+++ b/net/Kconfig
>>@@ -228,6 +228,7 @@ source "net/vmw_vsock/Kconfig"
>>  source "net/netlink/Kconfig"
>>  source "net/mpls/Kconfig"
>>  source "net/hsr/Kconfig"
>>+source "net/switchdev/Kconfig"
>>  config RPS
>>  	boolean
>>diff --git a/net/Makefile b/net/Makefile
>>index 7ed1970..95fc694 100644
>>--- a/net/Makefile
>>+++ b/net/Makefile
>>@@ -73,3 +73,6 @@ obj-$(CONFIG_OPENVSWITCH)	+= openvswitch/
>>  obj-$(CONFIG_VSOCKETS)	+= vmw_vsock/
>>  obj-$(CONFIG_NET_MPLS_GSO)	+= mpls/
>>  obj-$(CONFIG_HSR)		+= hsr/
>>+ifneq ($(CONFIG_NET_SWITCHDEV),)
>>+obj-y				+= switchdev/
>>+endif
>>diff --git a/net/switchdev/Kconfig b/net/switchdev/Kconfig
>>new file mode 100644
>>index 0000000..1557545
>>--- /dev/null
>>+++ b/net/switchdev/Kconfig
>>@@ -0,0 +1,13 @@
>>+#
>>+# Configuration for Switch device support
>>+#
>>+
>>+config NET_SWITCHDEV
>>+	boolean "Switch (and switch-ish) device support (EXPERIMENTAL)"
>>+	depends on INET
>>+	---help---
>>+	  This module provides glue between core networking code and device
>>+	  drivers in order to support hardware switch chips in very generic
>>+	  meaning of the word "switch". This include devices supporting L2/L3 but
>>+	  also various flow offloading chips, including switches embedded into
>>+	  SR-IOV NICs.
>>diff --git a/net/switchdev/Makefile b/net/switchdev/Makefile
>>new file mode 100644
>>index 0000000..5ed63ed
>>--- /dev/null
>>+++ b/net/switchdev/Makefile
>>@@ -0,0 +1,5 @@
>>+#
>>+# Makefile for the Switch device API
>>+#
>>+
>>+obj-$(CONFIG_NET_SWITCHDEV) += switchdev.o
>>diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>new file mode 100644
>>index 0000000..5010f646
>>--- /dev/null
>>+++ b/net/switchdev/switchdev.c
>>@@ -0,0 +1,33 @@
>>+/*
>>+ * net/switchdev/switchdev.c - Switch device API
>>+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
>>+ *
>>+ * This program is free software; you can redistribute it and/or modify
>>+ * it under the terms of the GNU General Public License as published by
>>+ * the Free Software Foundation; either version 2 of the License, or
>>+ * (at your option) any later version.
>>+ */
>>+
>>+#include <linux/kernel.h>
>>+#include <linux/types.h>
>>+#include <linux/init.h>
>>+#include <linux/netdevice.h>
>>+#include <net/switchdev.h>
>>+
>>+/**
>>+ *	netdev_sw_parent_id_get - Get ID of a switch
>>+ *	@dev: port device
>>+ *	@psid: switch ID
>>+ *
>>+ *	Get ID of a switch this port is part of.
>>+ */
>>+int netdev_sw_parent_id_get(struct net_device *dev,
>>+			    struct netdev_phys_item_id *psid)
>>+{
>>+	const struct net_device_ops *ops = dev->netdev_ops;
>>+
>>+	if (!ops->ndo_sw_parent_id_get)
>>+		return -EOPNOTSUPP;
>>+	return ops->ndo_sw_parent_id_get(dev, psid);
>>+}
>>+EXPORT_SYMBOL(netdev_sw_parent_id_get);
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 02/10] net: introduce generic switch devices support
  2014-11-19 13:46     ` Jiri Pirko
@ 2014-11-19 13:59       ` Roopa Prabhu
  2014-11-20 15:55         ` Andy Gospodarek
  0 siblings, 1 reply; 100+ messages in thread
From: Roopa Prabhu @ 2014-11-19 13:59 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo, bcrl

On 11/19/14, 5:46 AM, Jiri Pirko wrote:
> Wed, Nov 19, 2014 at 02:28:10PM CET, roopa@cumulusnetworks.com wrote:
>> On 11/9/14, 2:51 AM, Jiri Pirko wrote:
>>> The goal of this is to provide a possibility to support various switch
>>> chips. Drivers should implement relevant ndos to do so. Now there is
>>> only one ndo defined:
>>> - for getting physical switch id is in place.
>>>
>>> Note that user can use random port netdevice to access the switch.
>>>
>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>> ---
>>>   Documentation/networking/switchdev.txt | 59 ++++++++++++++++++++++++++++++++++
>>>   MAINTAINERS                            |  7 ++++
>>>   include/linux/netdevice.h              | 10 ++++++
>>>   include/net/switchdev.h                | 30 +++++++++++++++++
>>>   net/Kconfig                            |  1 +
>>>   net/Makefile                           |  3 ++
>>>   net/switchdev/Kconfig                  | 13 ++++++++
>>>   net/switchdev/Makefile                 |  5 +++
>>>   net/switchdev/switchdev.c              | 33 +++++++++++++++++++
>>>   9 files changed, 161 insertions(+)
>>>   create mode 100644 Documentation/networking/switchdev.txt
>>>   create mode 100644 include/net/switchdev.h
>>>   create mode 100644 net/switchdev/Kconfig
>>>   create mode 100644 net/switchdev/Makefile
>>>   create mode 100644 net/switchdev/switchdev.c
>>>
>>> diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
>>> new file mode 100644
>>> index 0000000..98be76c
>>> --- /dev/null
>>> +++ b/Documentation/networking/switchdev.txt
>>> @@ -0,0 +1,59 @@
>>> +Switch (and switch-ish) device drivers HOWTO
>>> +===========================
>>> +
>>> +Please note that the word "switch" is here used in very generic meaning.
>>> +This include devices supporting L2/L3 but also various flow offloading chips,
>>> +including switches embedded into SR-IOV NICs.
>>> +
>>> +Lets describe a topology a bit. Imagine the following example:
>>> +
>>> +       +----------------------------+    +---------------+
>>> +       |     SOME switch chip       |    |      CPU      |
>>> +       +----------------------------+    +---------------+
>>> +       port1 port2 port3 port4 MNGMNT    |     PCI-E     |
>>> +         |     |     |     |     |       +---------------+
>>> +        PHY   PHY    |     |     |         |  NIC0 NIC1
>>> +                     |     |     |         |   |    |
>>> +                     |     |     +- PCI-E -+   |    |
>>> +                     |     +------- MII -------+    |
>>> +                     +------------- MII ------------+
>>> +
>>> +In this example, there are two independent lines between the switch silicon
>>> +and CPU. NIC0 and NIC1 drivers are not aware of a switch presence. They are
>>> +separate from the switch driver. SOME switch chip is by managed by a driver
>>> +via PCI-E device MNGMNT. Note that MNGMNT device, NIC0 and NIC1 may be
>>> +connected to some other type of bus.
>>> +
>>> +Now, for the previous example show the representation in kernel:
>>> +
>>> +       +----------------------------+    +---------------+
>>> +       |     SOME switch chip       |    |      CPU      |
>>> +       +----------------------------+    +---------------+
>>> +       sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT    |     PCI-E     |
>>> +         |     |     |     |     |       +---------------+
>>> +        PHY   PHY    |     |     |         |  eth0 eth1
>>> +                     |     |     |         |   |    |
>>> +                     |     |     +- PCI-E -+   |    |
>>> +                     |     +------- MII -------+    |
>>> +                     +------------- MII ------------+
>>> +
>>> +Lets call the example switch driver for SOME switch chip "SOMEswitch". This
>>> +driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX
>>> +created for each port of a switch. These netdevices are instances
>>> +of "SOMEswitch" driver. sw0pX netdevices serve as a "representation"
>>> +of the switch chip. eth0 and eth1 are instances of some other existing driver.
>>> +
>>> +The only difference of the switch-port netdevice from the ordinary netdevice
>>> +is that is implements couple more NDOs:
>>> +
>>> +	ndo_sw_parent_get_id - This returns the same ID for two port netdevices
>>> +			       of the same physical switch chip. This is
>>> +			       mandatory to be implemented by all switch drivers
>>> +			       and serves the caller for recognition of a port
>>> +			       netdevice.
>>> +	ndo_sw_parent_* - Functions that serve for a manipulation of the switch
>>> +			  chip itself (it can be though of as a "parent" of the
>>> +			  port, therefore the name). They are not port-specific.
>>> +			  Caller might use arbitrary port netdevice of the same
>>> +			  switch and it will make no difference.
>>> +	ndo_sw_port_* - Functions that serve for a port-specific manipulation.
>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>> index 3a41fb0..776e078 100644
>>> --- a/MAINTAINERS
>>> +++ b/MAINTAINERS
>>> @@ -9003,6 +9003,13 @@ F:	lib/swiotlb.c
>>>   F:	arch/*/kernel/pci-swiotlb.c
>>>   F:	include/linux/swiotlb.h
>>> +SWITCHDEV
>>> +M:	Jiri Pirko <jiri@resnulli.us>
>>> +L:	netdev@vger.kernel.org
>>> +S:	Supported
>>> +F:	net/switchdev/
>>> +F:	include/net/switchdev.h
>>> +
>>>   SYNOPSYS ARC ARCHITECTURE
>>>   M:	Vineet Gupta <vgupta@synopsys.com>
>>>   S:	Supported
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index 71922e0..97eade9 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -1017,6 +1017,12 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>>>    *	performing GSO on a packet. The device returns true if it is
>>>    *	able to GSO the packet, false otherwise. If the return value is
>>>    *	false the stack will do software GSO.
>>> + *
>>> + * int (*ndo_sw_parent_id_get)(struct net_device *dev,
>>> + *			       struct netdev_phys_item_id *psid);
>>> + *	Called to get an ID of the switch chip this port is part of.
>>> + *	If driver implements this, it indicates that it represents a port
>>> + *	of a switch chip.
>>>    */
>>>   struct net_device_ops {
>>>   	int			(*ndo_init)(struct net_device *dev);
>>> @@ -1168,6 +1174,10 @@ struct net_device_ops {
>>>   	int			(*ndo_get_lock_subclass)(struct net_device *dev);
>>>   	bool			(*ndo_gso_check) (struct sk_buff *skb,
>>>   						  struct net_device *dev);
>>> +#ifdef CONFIG_NET_SWITCHDEV
>>> +	int			(*ndo_sw_parent_id_get)(struct net_device *dev,
>>> +							struct netdev_phys_item_id *psid);
>> Can we keep the name generic and not include "sw" which implies switch here
>> ?.
>> I understand that it is under CONFIG_NET_SWITCHDEV but we might find use for
>> them in other offload scenarios in the future.
>> This particular ndo can be just ndo_parent_id_get().
>> And the others that do specific offloads can have "offload" in them if
>> required..?.
>
> But this is for getting parent switch id, sw should be there.

Since we have not figured out the details or namespace for switchd ids 
yet, its still some parent id to me.

> If comes a
> time when this might be reused to something else, we change it then.
> This is internal api, easily changeable.
also, "sw" seems more "software"  than "switch".

>
>>
>>
>>> +#endif
>>>   };
>>>   /**
>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>> new file mode 100644
>>> index 0000000..79bf9bd
>>> --- /dev/null
>>> +++ b/include/net/switchdev.h
>>> @@ -0,0 +1,30 @@
>>> +/*
>>> + * include/net/switchdev.h - Switch device API
>>> + * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License as published by
>>> + * the Free Software Foundation; either version 2 of the License, or
>>> + * (at your option) any later version.
>>> + */
>>> +#ifndef _LINUX_SWITCHDEV_H_
>>> +#define _LINUX_SWITCHDEV_H_
>>> +
>>> +#include <linux/netdevice.h>
>>> +
>>> +#ifdef CONFIG_NET_SWITCHDEV
>>> +
>>> +int netdev_sw_parent_id_get(struct net_device *dev,
>>> +			    struct netdev_phys_item_id *psid);
>>> +
>>> +#else
>>> +
>>> +static inline int netdev_sw_parent_id_get(struct net_device *dev,
>>> +					  struct netdev_phys_item_id *psid)
>>> +{
>>> +	return -EOPNOTSUPP;
>>> +}
>>> +
>>> +#endif
>>> +
>>> +#endif /* _LINUX_SWITCHDEV_H_ */
>>> diff --git a/net/Kconfig b/net/Kconfig
>>> index 99815b5..ff9ffc1 100644
>>> --- a/net/Kconfig
>>> +++ b/net/Kconfig
>>> @@ -228,6 +228,7 @@ source "net/vmw_vsock/Kconfig"
>>>   source "net/netlink/Kconfig"
>>>   source "net/mpls/Kconfig"
>>>   source "net/hsr/Kconfig"
>>> +source "net/switchdev/Kconfig"
>>>   config RPS
>>>   	boolean
>>> diff --git a/net/Makefile b/net/Makefile
>>> index 7ed1970..95fc694 100644
>>> --- a/net/Makefile
>>> +++ b/net/Makefile
>>> @@ -73,3 +73,6 @@ obj-$(CONFIG_OPENVSWITCH)	+= openvswitch/
>>>   obj-$(CONFIG_VSOCKETS)	+= vmw_vsock/
>>>   obj-$(CONFIG_NET_MPLS_GSO)	+= mpls/
>>>   obj-$(CONFIG_HSR)		+= hsr/
>>> +ifneq ($(CONFIG_NET_SWITCHDEV),)
>>> +obj-y				+= switchdev/
>>> +endif
>>> diff --git a/net/switchdev/Kconfig b/net/switchdev/Kconfig
>>> new file mode 100644
>>> index 0000000..1557545
>>> --- /dev/null
>>> +++ b/net/switchdev/Kconfig
>>> @@ -0,0 +1,13 @@
>>> +#
>>> +# Configuration for Switch device support
>>> +#
>>> +
>>> +config NET_SWITCHDEV
>>> +	boolean "Switch (and switch-ish) device support (EXPERIMENTAL)"
>>> +	depends on INET
>>> +	---help---
>>> +	  This module provides glue between core networking code and device
>>> +	  drivers in order to support hardware switch chips in very generic
>>> +	  meaning of the word "switch". This include devices supporting L2/L3 but
>>> +	  also various flow offloading chips, including switches embedded into
>>> +	  SR-IOV NICs.
>>> diff --git a/net/switchdev/Makefile b/net/switchdev/Makefile
>>> new file mode 100644
>>> index 0000000..5ed63ed
>>> --- /dev/null
>>> +++ b/net/switchdev/Makefile
>>> @@ -0,0 +1,5 @@
>>> +#
>>> +# Makefile for the Switch device API
>>> +#
>>> +
>>> +obj-$(CONFIG_NET_SWITCHDEV) += switchdev.o
>>> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>> new file mode 100644
>>> index 0000000..5010f646
>>> --- /dev/null
>>> +++ b/net/switchdev/switchdev.c
>>> @@ -0,0 +1,33 @@
>>> +/*
>>> + * net/switchdev/switchdev.c - Switch device API
>>> + * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License as published by
>>> + * the Free Software Foundation; either version 2 of the License, or
>>> + * (at your option) any later version.
>>> + */
>>> +
>>> +#include <linux/kernel.h>
>>> +#include <linux/types.h>
>>> +#include <linux/init.h>
>>> +#include <linux/netdevice.h>
>>> +#include <net/switchdev.h>
>>> +
>>> +/**
>>> + *	netdev_sw_parent_id_get - Get ID of a switch
>>> + *	@dev: port device
>>> + *	@psid: switch ID
>>> + *
>>> + *	Get ID of a switch this port is part of.
>>> + */
>>> +int netdev_sw_parent_id_get(struct net_device *dev,
>>> +			    struct netdev_phys_item_id *psid)
>>> +{
>>> +	const struct net_device_ops *ops = dev->netdev_ops;
>>> +
>>> +	if (!ops->ndo_sw_parent_id_get)
>>> +		return -EOPNOTSUPP;
>>> +	return ops->ndo_sw_parent_id_get(dev, psid);
>>> +}
>>> +EXPORT_SYMBOL(netdev_sw_parent_id_get);
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 02/10] net: introduce generic switch devices support
  2014-11-19 13:59       ` Roopa Prabhu
@ 2014-11-20 15:55         ` Andy Gospodarek
  2014-11-21  7:16           ` Jiri Pirko
  0 siblings, 1 reply; 100+ messages in thread
From: Andy Gospodarek @ 2014-11-20 15:55 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, jhs,
	sfeldma, f.fainelli, linville, jasowang, ebiederm,
	nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd,
	alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, bcrl

On Wed, Nov 19, 2014 at 05:59:19AM -0800, Roopa Prabhu wrote:
> On 11/19/14, 5:46 AM, Jiri Pirko wrote:
> >Wed, Nov 19, 2014 at 02:28:10PM CET, roopa@cumulusnetworks.com wrote:
> >>On 11/9/14, 2:51 AM, Jiri Pirko wrote:
> >>>The goal of this is to provide a possibility to support various switch
> >>>chips. Drivers should implement relevant ndos to do so. Now there is
> >>>only one ndo defined:
> >>>- for getting physical switch id is in place.
> >>>
> >>>Note that user can use random port netdevice to access the switch.
> >>>
> >>>Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> >>>---
> >>>  Documentation/networking/switchdev.txt | 59 ++++++++++++++++++++++++++++++++++
> >>>  MAINTAINERS                            |  7 ++++
> >>>  include/linux/netdevice.h              | 10 ++++++
> >>>  include/net/switchdev.h                | 30 +++++++++++++++++
> >>>  net/Kconfig                            |  1 +
> >>>  net/Makefile                           |  3 ++
> >>>  net/switchdev/Kconfig                  | 13 ++++++++
> >>>  net/switchdev/Makefile                 |  5 +++
> >>>  net/switchdev/switchdev.c              | 33 +++++++++++++++++++
> >>>  9 files changed, 161 insertions(+)
> >>>  create mode 100644 Documentation/networking/switchdev.txt
> >>>  create mode 100644 include/net/switchdev.h
> >>>  create mode 100644 net/switchdev/Kconfig
> >>>  create mode 100644 net/switchdev/Makefile
> >>>  create mode 100644 net/switchdev/switchdev.c
> >>>
> >>>diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
> >>>new file mode 100644
> >>>index 0000000..98be76c
> >>>--- /dev/null
> >>>+++ b/Documentation/networking/switchdev.txt
> >>>@@ -0,0 +1,59 @@
> >>>+Switch (and switch-ish) device drivers HOWTO
> >>>+===========================
> >>>+
> >>>+Please note that the word "switch" is here used in very generic meaning.
> >>>+This include devices supporting L2/L3 but also various flow offloading chips,
> >>>+including switches embedded into SR-IOV NICs.
> >>>+
> >>>+Lets describe a topology a bit. Imagine the following example:
> >>>+
> >>>+       +----------------------------+    +---------------+
> >>>+       |     SOME switch chip       |    |      CPU      |
> >>>+       +----------------------------+    +---------------+
> >>>+       port1 port2 port3 port4 MNGMNT    |     PCI-E     |
> >>>+         |     |     |     |     |       +---------------+
> >>>+        PHY   PHY    |     |     |         |  NIC0 NIC1
> >>>+                     |     |     |         |   |    |
> >>>+                     |     |     +- PCI-E -+   |    |
> >>>+                     |     +------- MII -------+    |
> >>>+                     +------------- MII ------------+
> >>>+
> >>>+In this example, there are two independent lines between the switch silicon
> >>>+and CPU. NIC0 and NIC1 drivers are not aware of a switch presence. They are
> >>>+separate from the switch driver. SOME switch chip is by managed by a driver
> >>>+via PCI-E device MNGMNT. Note that MNGMNT device, NIC0 and NIC1 may be
> >>>+connected to some other type of bus.
> >>>+
> >>>+Now, for the previous example show the representation in kernel:
> >>>+
> >>>+       +----------------------------+    +---------------+
> >>>+       |     SOME switch chip       |    |      CPU      |
> >>>+       +----------------------------+    +---------------+
> >>>+       sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT    |     PCI-E     |
> >>>+         |     |     |     |     |       +---------------+
> >>>+        PHY   PHY    |     |     |         |  eth0 eth1
> >>>+                     |     |     |         |   |    |
> >>>+                     |     |     +- PCI-E -+   |    |
> >>>+                     |     +------- MII -------+    |
> >>>+                     +------------- MII ------------+
> >>>+
> >>>+Lets call the example switch driver for SOME switch chip "SOMEswitch". This
> >>>+driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX
> >>>+created for each port of a switch. These netdevices are instances
> >>>+of "SOMEswitch" driver. sw0pX netdevices serve as a "representation"
> >>>+of the switch chip. eth0 and eth1 are instances of some other existing driver.
> >>>+
> >>>+The only difference of the switch-port netdevice from the ordinary netdevice
> >>>+is that is implements couple more NDOs:
> >>>+
> >>>+	ndo_sw_parent_get_id - This returns the same ID for two port netdevices
> >>>+			       of the same physical switch chip. This is
> >>>+			       mandatory to be implemented by all switch drivers
> >>>+			       and serves the caller for recognition of a port
> >>>+			       netdevice.
> >>>+	ndo_sw_parent_* - Functions that serve for a manipulation of the switch
> >>>+			  chip itself (it can be though of as a "parent" of the
> >>>+			  port, therefore the name). They are not port-specific.
> >>>+			  Caller might use arbitrary port netdevice of the same
> >>>+			  switch and it will make no difference.
> >>>+	ndo_sw_port_* - Functions that serve for a port-specific manipulation.
> >>>diff --git a/MAINTAINERS b/MAINTAINERS
> >>>index 3a41fb0..776e078 100644
> >>>--- a/MAINTAINERS
> >>>+++ b/MAINTAINERS
> >>>@@ -9003,6 +9003,13 @@ F:	lib/swiotlb.c
> >>>  F:	arch/*/kernel/pci-swiotlb.c
> >>>  F:	include/linux/swiotlb.h
> >>>+SWITCHDEV
> >>>+M:	Jiri Pirko <jiri@resnulli.us>
> >>>+L:	netdev@vger.kernel.org
> >>>+S:	Supported
> >>>+F:	net/switchdev/
> >>>+F:	include/net/switchdev.h
> >>>+
> >>>  SYNOPSYS ARC ARCHITECTURE
> >>>  M:	Vineet Gupta <vgupta@synopsys.com>
> >>>  S:	Supported
> >>>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >>>index 71922e0..97eade9 100644
> >>>--- a/include/linux/netdevice.h
> >>>+++ b/include/linux/netdevice.h
> >>>@@ -1017,6 +1017,12 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
> >>>   *	performing GSO on a packet. The device returns true if it is
> >>>   *	able to GSO the packet, false otherwise. If the return value is
> >>>   *	false the stack will do software GSO.
> >>>+ *
> >>>+ * int (*ndo_sw_parent_id_get)(struct net_device *dev,
> >>>+ *			       struct netdev_phys_item_id *psid);
> >>>+ *	Called to get an ID of the switch chip this port is part of.
> >>>+ *	If driver implements this, it indicates that it represents a port
> >>>+ *	of a switch chip.
> >>>   */
> >>>  struct net_device_ops {
> >>>  	int			(*ndo_init)(struct net_device *dev);
> >>>@@ -1168,6 +1174,10 @@ struct net_device_ops {
> >>>  	int			(*ndo_get_lock_subclass)(struct net_device *dev);
> >>>  	bool			(*ndo_gso_check) (struct sk_buff *skb,
> >>>  						  struct net_device *dev);
> >>>+#ifdef CONFIG_NET_SWITCHDEV
> >>>+	int			(*ndo_sw_parent_id_get)(struct net_device *dev,
> >>>+							struct netdev_phys_item_id *psid);
> >>Can we keep the name generic and not include "sw" which implies switch here
> >>?.
> >>I understand that it is under CONFIG_NET_SWITCHDEV but we might find use for
> >>them in other offload scenarios in the future.
> >>This particular ndo can be just ndo_parent_id_get().
> >>And the others that do specific offloads can have "offload" in them if
> >>required..?.
> >
> >But this is for getting parent switch id, sw should be there.
> 
> Since we have not figured out the details or namespace for switchd ids yet,
> its still some parent id to me.
> 
> >If comes a
> >time when this might be reused to something else, we change it then.
> >This is internal api, easily changeable.
> also, "sw" seems more "software"  than "switch".

I had voiced this same concern when we met and discussed this in
Dusseldorf.  Let's move to another name such as 'hw' (since we really
are talking about hardware abstraction) or 'offload.'  Just using 'sw'
is confusing as many will not read it as switch.

I'll be happy to post a fix based on your devel patches.


> 
> >
> >>
> >>
> >>>+#endif
> >>>  };
> >>>  /**
> >>>diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> >>>new file mode 100644
> >>>index 0000000..79bf9bd
> >>>--- /dev/null
> >>>+++ b/include/net/switchdev.h
> >>>@@ -0,0 +1,30 @@
> >>>+/*
> >>>+ * include/net/switchdev.h - Switch device API
> >>>+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
> >>>+ *
> >>>+ * This program is free software; you can redistribute it and/or modify
> >>>+ * it under the terms of the GNU General Public License as published by
> >>>+ * the Free Software Foundation; either version 2 of the License, or
> >>>+ * (at your option) any later version.
> >>>+ */
> >>>+#ifndef _LINUX_SWITCHDEV_H_
> >>>+#define _LINUX_SWITCHDEV_H_
> >>>+
> >>>+#include <linux/netdevice.h>
> >>>+
> >>>+#ifdef CONFIG_NET_SWITCHDEV
> >>>+
> >>>+int netdev_sw_parent_id_get(struct net_device *dev,
> >>>+			    struct netdev_phys_item_id *psid);
> >>>+
> >>>+#else
> >>>+
> >>>+static inline int netdev_sw_parent_id_get(struct net_device *dev,
> >>>+					  struct netdev_phys_item_id *psid)
> >>>+{
> >>>+	return -EOPNOTSUPP;
> >>>+}
> >>>+
> >>>+#endif
> >>>+
> >>>+#endif /* _LINUX_SWITCHDEV_H_ */
> >>>diff --git a/net/Kconfig b/net/Kconfig
> >>>index 99815b5..ff9ffc1 100644
> >>>--- a/net/Kconfig
> >>>+++ b/net/Kconfig
> >>>@@ -228,6 +228,7 @@ source "net/vmw_vsock/Kconfig"
> >>>  source "net/netlink/Kconfig"
> >>>  source "net/mpls/Kconfig"
> >>>  source "net/hsr/Kconfig"
> >>>+source "net/switchdev/Kconfig"
> >>>  config RPS
> >>>  	boolean
> >>>diff --git a/net/Makefile b/net/Makefile
> >>>index 7ed1970..95fc694 100644
> >>>--- a/net/Makefile
> >>>+++ b/net/Makefile
> >>>@@ -73,3 +73,6 @@ obj-$(CONFIG_OPENVSWITCH)	+= openvswitch/
> >>>  obj-$(CONFIG_VSOCKETS)	+= vmw_vsock/
> >>>  obj-$(CONFIG_NET_MPLS_GSO)	+= mpls/
> >>>  obj-$(CONFIG_HSR)		+= hsr/
> >>>+ifneq ($(CONFIG_NET_SWITCHDEV),)
> >>>+obj-y				+= switchdev/
> >>>+endif
> >>>diff --git a/net/switchdev/Kconfig b/net/switchdev/Kconfig
> >>>new file mode 100644
> >>>index 0000000..1557545
> >>>--- /dev/null
> >>>+++ b/net/switchdev/Kconfig
> >>>@@ -0,0 +1,13 @@
> >>>+#
> >>>+# Configuration for Switch device support
> >>>+#
> >>>+
> >>>+config NET_SWITCHDEV
> >>>+	boolean "Switch (and switch-ish) device support (EXPERIMENTAL)"
> >>>+	depends on INET
> >>>+	---help---
> >>>+	  This module provides glue between core networking code and device
> >>>+	  drivers in order to support hardware switch chips in very generic
> >>>+	  meaning of the word "switch". This include devices supporting L2/L3 but
> >>>+	  also various flow offloading chips, including switches embedded into
> >>>+	  SR-IOV NICs.
> >>>diff --git a/net/switchdev/Makefile b/net/switchdev/Makefile
> >>>new file mode 100644
> >>>index 0000000..5ed63ed
> >>>--- /dev/null
> >>>+++ b/net/switchdev/Makefile
> >>>@@ -0,0 +1,5 @@
> >>>+#
> >>>+# Makefile for the Switch device API
> >>>+#
> >>>+
> >>>+obj-$(CONFIG_NET_SWITCHDEV) += switchdev.o
> >>>diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> >>>new file mode 100644
> >>>index 0000000..5010f646
> >>>--- /dev/null
> >>>+++ b/net/switchdev/switchdev.c
> >>>@@ -0,0 +1,33 @@
> >>>+/*
> >>>+ * net/switchdev/switchdev.c - Switch device API
> >>>+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
> >>>+ *
> >>>+ * This program is free software; you can redistribute it and/or modify
> >>>+ * it under the terms of the GNU General Public License as published by
> >>>+ * the Free Software Foundation; either version 2 of the License, or
> >>>+ * (at your option) any later version.
> >>>+ */
> >>>+
> >>>+#include <linux/kernel.h>
> >>>+#include <linux/types.h>
> >>>+#include <linux/init.h>
> >>>+#include <linux/netdevice.h>
> >>>+#include <net/switchdev.h>
> >>>+
> >>>+/**
> >>>+ *	netdev_sw_parent_id_get - Get ID of a switch
> >>>+ *	@dev: port device
> >>>+ *	@psid: switch ID
> >>>+ *
> >>>+ *	Get ID of a switch this port is part of.
> >>>+ */
> >>>+int netdev_sw_parent_id_get(struct net_device *dev,
> >>>+			    struct netdev_phys_item_id *psid)
> >>>+{
> >>>+	const struct net_device_ops *ops = dev->netdev_ops;
> >>>+
> >>>+	if (!ops->ndo_sw_parent_id_get)
> >>>+		return -EOPNOTSUPP;
> >>>+	return ops->ndo_sw_parent_id_get(dev, psid);
> >>>+}
> >>>+EXPORT_SYMBOL(netdev_sw_parent_id_get);
> >--
> >To unsubscribe from this list: send the line "unsubscribe netdev" in
> >the body of a message to majordomo@vger.kernel.org
> >More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
  2014-11-13  6:31             ` John Fastabend
@ 2014-11-21  2:01               ` Simon Horman
  2014-11-21  7:20                 ` John Fastabend
  0 siblings, 1 reply; 100+ messages in thread
From: Simon Horman @ 2014-11-21  2:01 UTC (permalink / raw)
  To: John Fastabend
  Cc: John Fastabend, Jamal Hadi Salim, Jiri Pirko, netdev, davem,
	nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar, azhou,
	ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	edumazet, sfeldma, f.fainelli, roopa, linville, jasowang,
	ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd,
	alexei.starovoitov, Neil.Jerram, ronye, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet

Hi John,

On Wed, Nov 12, 2014 at 10:31:06PM -0800, John Fastabend wrote:
> On 11/12/2014 09:44 PM, Simon Horman wrote:
> > [snip]
> > 
> >> Simon, if your feeling adventurous any feedback on the repo link
> >> would be great. I still need to smash the commit log into something
> >> coherent though at the moment you can see all the errors and rewrites,
> >> etc as I made them.
> > 
> > Hi John,
> > 
> > here is some preliminary feedback:
> > 
> > * I notice that the parse graph code isn't present yet.
> >   I suppose this is a difficult piece that naturally follows many
> >   other piece. None the less it is possibly the piece of most
> >   interest to me :-)
> 
> I can add this over the next few days. Also I wanted to publish some
> more complex examples on top of rocker switch. The nic drivers are
> interesting but not as complex as some of the switch devices.

I see that you have added the header graph, which seems pretty nice
from my reading so far.

It seems to allow for arbitrary connections between instances
of net_flow_header_node, including I loops I suppose. This seems
nice and flexible to me.

I have a very minor update to contribute which helped me to
read the code. Please feel free to squash/ignore/...

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_pipeline.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_pipeline.h
index a4818ab..4025a61 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_pipeline.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_pipeline.h
@@ -412,7 +412,7 @@ struct net_flow_jump_table ixgbe_vlan_inner_jump[2] = {
 		   .type = NET_FLOW_FIELD_REF_ATTR_TYPE_U16,
 		   .value_u16 = 0x0800,
 		},
-		.node = 4,
+		.node = HEADER_INSTANCE_IP,
 	},
 	{
 		.field = {0},
@@ -443,7 +443,7 @@ struct net_flow_jump_table ixgbe_ip_jump[2] = {
 		   .type = NET_FLOW_FIELD_REF_ATTR_TYPE_U8,
 		   .value_u8 = 0x06,
 		},
-		.node = 5,
+		.node = HEADER_INSTANCE_TCP,
 	},
 	{
 		.field = {0},

> There is also the table graph layout which I wanted tweak a bit. At
> the moment I have hardware that can run tables in parallel and some
> that executes tables in sequence. It might not be clear from the code
> (why I need the cleanup) but the source id is being used to indicate
> if the tables are executed in parallel or not.

Thanks, that was not clear to me.

> > * Will del and update flows require flows to already exist?
> >   And similarly, will add flow require flows with the same match to not
> >   already exist?  If so, the error handling seems tricky of more than one
> >   flow is to be deleted/updated. IIRC there was some discussion of that
> >   kind of issue at the (double) round table discussion on the last day of
> >   LPC14 in Düsseldorf.
> 
> I would expect del/updates for flows that don't exist should fail.

Am I right in thinking that del and update flow NDOs may take
a list of flows? If so I think some consideration needs
to be made for handling failure of e.g. the last element of
the list when the previous elements succeeded.

I suppose that user-space could dump the flow table if an error and
adjust its state accordingly. But that seems somewhat onerous.

> I didn't intend to add any checks in the kernel to verify the matches
> are unique. My opinion on this is that user space shouldn't add new
> duplicate flows. And if it does hardware resources will be wasted.

I don't have any strong opinions on that at this time.
But it does seem reasonable so long as its clear that is the case.

> > * Should the .node_count value of ixgbe_table_node_l2 be 3?
> >   ixgbe_table_graph_nodes has three elements but perhaps you
> >   are intentionally excluding the last element ixgbe_table_node_nil?
> > 
> 
> Actually I could just drop the node_count at the moment because I've
> been null terminating the arrays with null items.
> 
> I should either add a count field to all the structures or null terminate
> the arrays. For now I mostly null terminate the arrays when I use
> them. For example matches is null terminates same for actions.

It looks like you have move towards the null termination option.
No objections here.

^ permalink raw reply related	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 02/10] net: introduce generic switch devices support
  2014-11-20 15:55         ` Andy Gospodarek
@ 2014-11-21  7:16           ` Jiri Pirko
  0 siblings, 0 replies; 100+ messages in thread
From: Jiri Pirko @ 2014-11-21  7:16 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Roopa Prabhu, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, jhs,
	sfeldma, f.fainelli, linville, jasowang, ebiederm,
	nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd,
	alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, bcrl

Thu, Nov 20, 2014 at 04:55:41PM CET, gospo@cumulusnetworks.com wrote:
>On Wed, Nov 19, 2014 at 05:59:19AM -0800, Roopa Prabhu wrote:
>> On 11/19/14, 5:46 AM, Jiri Pirko wrote:
>> >Wed, Nov 19, 2014 at 02:28:10PM CET, roopa@cumulusnetworks.com wrote:
>> >>On 11/9/14, 2:51 AM, Jiri Pirko wrote:
>> >>>The goal of this is to provide a possibility to support various switch
>> >>>chips. Drivers should implement relevant ndos to do so. Now there is
>> >>>only one ndo defined:
>> >>>- for getting physical switch id is in place.
>> >>>
>> >>>Note that user can use random port netdevice to access the switch.
>> >>>
>> >>>Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> >>>---
>> >>>  Documentation/networking/switchdev.txt | 59 ++++++++++++++++++++++++++++++++++
>> >>>  MAINTAINERS                            |  7 ++++
>> >>>  include/linux/netdevice.h              | 10 ++++++
>> >>>  include/net/switchdev.h                | 30 +++++++++++++++++
>> >>>  net/Kconfig                            |  1 +
>> >>>  net/Makefile                           |  3 ++
>> >>>  net/switchdev/Kconfig                  | 13 ++++++++
>> >>>  net/switchdev/Makefile                 |  5 +++
>> >>>  net/switchdev/switchdev.c              | 33 +++++++++++++++++++
>> >>>  9 files changed, 161 insertions(+)
>> >>>  create mode 100644 Documentation/networking/switchdev.txt
>> >>>  create mode 100644 include/net/switchdev.h
>> >>>  create mode 100644 net/switchdev/Kconfig
>> >>>  create mode 100644 net/switchdev/Makefile
>> >>>  create mode 100644 net/switchdev/switchdev.c
>> >>>
>> >>>diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
>> >>>new file mode 100644
>> >>>index 0000000..98be76c
>> >>>--- /dev/null
>> >>>+++ b/Documentation/networking/switchdev.txt
>> >>>@@ -0,0 +1,59 @@
>> >>>+Switch (and switch-ish) device drivers HOWTO
>> >>>+===========================
>> >>>+
>> >>>+Please note that the word "switch" is here used in very generic meaning.
>> >>>+This include devices supporting L2/L3 but also various flow offloading chips,
>> >>>+including switches embedded into SR-IOV NICs.
>> >>>+
>> >>>+Lets describe a topology a bit. Imagine the following example:
>> >>>+
>> >>>+       +----------------------------+    +---------------+
>> >>>+       |     SOME switch chip       |    |      CPU      |
>> >>>+       +----------------------------+    +---------------+
>> >>>+       port1 port2 port3 port4 MNGMNT    |     PCI-E     |
>> >>>+         |     |     |     |     |       +---------------+
>> >>>+        PHY   PHY    |     |     |         |  NIC0 NIC1
>> >>>+                     |     |     |         |   |    |
>> >>>+                     |     |     +- PCI-E -+   |    |
>> >>>+                     |     +------- MII -------+    |
>> >>>+                     +------------- MII ------------+
>> >>>+
>> >>>+In this example, there are two independent lines between the switch silicon
>> >>>+and CPU. NIC0 and NIC1 drivers are not aware of a switch presence. They are
>> >>>+separate from the switch driver. SOME switch chip is by managed by a driver
>> >>>+via PCI-E device MNGMNT. Note that MNGMNT device, NIC0 and NIC1 may be
>> >>>+connected to some other type of bus.
>> >>>+
>> >>>+Now, for the previous example show the representation in kernel:
>> >>>+
>> >>>+       +----------------------------+    +---------------+
>> >>>+       |     SOME switch chip       |    |      CPU      |
>> >>>+       +----------------------------+    +---------------+
>> >>>+       sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT    |     PCI-E     |
>> >>>+         |     |     |     |     |       +---------------+
>> >>>+        PHY   PHY    |     |     |         |  eth0 eth1
>> >>>+                     |     |     |         |   |    |
>> >>>+                     |     |     +- PCI-E -+   |    |
>> >>>+                     |     +------- MII -------+    |
>> >>>+                     +------------- MII ------------+
>> >>>+
>> >>>+Lets call the example switch driver for SOME switch chip "SOMEswitch". This
>> >>>+driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX
>> >>>+created for each port of a switch. These netdevices are instances
>> >>>+of "SOMEswitch" driver. sw0pX netdevices serve as a "representation"
>> >>>+of the switch chip. eth0 and eth1 are instances of some other existing driver.
>> >>>+
>> >>>+The only difference of the switch-port netdevice from the ordinary netdevice
>> >>>+is that is implements couple more NDOs:
>> >>>+
>> >>>+	ndo_sw_parent_get_id - This returns the same ID for two port netdevices
>> >>>+			       of the same physical switch chip. This is
>> >>>+			       mandatory to be implemented by all switch drivers
>> >>>+			       and serves the caller for recognition of a port
>> >>>+			       netdevice.
>> >>>+	ndo_sw_parent_* - Functions that serve for a manipulation of the switch
>> >>>+			  chip itself (it can be though of as a "parent" of the
>> >>>+			  port, therefore the name). They are not port-specific.
>> >>>+			  Caller might use arbitrary port netdevice of the same
>> >>>+			  switch and it will make no difference.
>> >>>+	ndo_sw_port_* - Functions that serve for a port-specific manipulation.
>> >>>diff --git a/MAINTAINERS b/MAINTAINERS
>> >>>index 3a41fb0..776e078 100644
>> >>>--- a/MAINTAINERS
>> >>>+++ b/MAINTAINERS
>> >>>@@ -9003,6 +9003,13 @@ F:	lib/swiotlb.c
>> >>>  F:	arch/*/kernel/pci-swiotlb.c
>> >>>  F:	include/linux/swiotlb.h
>> >>>+SWITCHDEV
>> >>>+M:	Jiri Pirko <jiri@resnulli.us>
>> >>>+L:	netdev@vger.kernel.org
>> >>>+S:	Supported
>> >>>+F:	net/switchdev/
>> >>>+F:	include/net/switchdev.h
>> >>>+
>> >>>  SYNOPSYS ARC ARCHITECTURE
>> >>>  M:	Vineet Gupta <vgupta@synopsys.com>
>> >>>  S:	Supported
>> >>>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> >>>index 71922e0..97eade9 100644
>> >>>--- a/include/linux/netdevice.h
>> >>>+++ b/include/linux/netdevice.h
>> >>>@@ -1017,6 +1017,12 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>> >>>   *	performing GSO on a packet. The device returns true if it is
>> >>>   *	able to GSO the packet, false otherwise. If the return value is
>> >>>   *	false the stack will do software GSO.
>> >>>+ *
>> >>>+ * int (*ndo_sw_parent_id_get)(struct net_device *dev,
>> >>>+ *			       struct netdev_phys_item_id *psid);
>> >>>+ *	Called to get an ID of the switch chip this port is part of.
>> >>>+ *	If driver implements this, it indicates that it represents a port
>> >>>+ *	of a switch chip.
>> >>>   */
>> >>>  struct net_device_ops {
>> >>>  	int			(*ndo_init)(struct net_device *dev);
>> >>>@@ -1168,6 +1174,10 @@ struct net_device_ops {
>> >>>  	int			(*ndo_get_lock_subclass)(struct net_device *dev);
>> >>>  	bool			(*ndo_gso_check) (struct sk_buff *skb,
>> >>>  						  struct net_device *dev);
>> >>>+#ifdef CONFIG_NET_SWITCHDEV
>> >>>+	int			(*ndo_sw_parent_id_get)(struct net_device *dev,
>> >>>+							struct netdev_phys_item_id *psid);
>> >>Can we keep the name generic and not include "sw" which implies switch here
>> >>?.
>> >>I understand that it is under CONFIG_NET_SWITCHDEV but we might find use for
>> >>them in other offload scenarios in the future.
>> >>This particular ndo can be just ndo_parent_id_get().
>> >>And the others that do specific offloads can have "offload" in them if
>> >>required..?.
>> >
>> >But this is for getting parent switch id, sw should be there.
>> 
>> Since we have not figured out the details or namespace for switchd ids yet,
>> its still some parent id to me.
>> 
>> >If comes a
>> >time when this might be reused to something else, we change it then.
>> >This is internal api, easily changeable.
>> also, "sw" seems more "software"  than "switch".
>
>I had voiced this same concern when we met and discussed this in
>Dusseldorf.  Let's move to another name such as 'hw' (since we really
>are talking about hardware abstraction) or 'offload.'  Just using 'sw'
>is confusing as many will not read it as switch.
>
>I'll be happy to post a fix based on your devel patches.

Np, I'll take care of it.

>
>
>> 
>> >
>> >>
>> >>
>> >>>+#endif
>> >>>  };
>> >>>  /**
>> >>>diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>> >>>new file mode 100644
>> >>>index 0000000..79bf9bd
>> >>>--- /dev/null
>> >>>+++ b/include/net/switchdev.h
>> >>>@@ -0,0 +1,30 @@
>> >>>+/*
>> >>>+ * include/net/switchdev.h - Switch device API
>> >>>+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
>> >>>+ *
>> >>>+ * This program is free software; you can redistribute it and/or modify
>> >>>+ * it under the terms of the GNU General Public License as published by
>> >>>+ * the Free Software Foundation; either version 2 of the License, or
>> >>>+ * (at your option) any later version.
>> >>>+ */
>> >>>+#ifndef _LINUX_SWITCHDEV_H_
>> >>>+#define _LINUX_SWITCHDEV_H_
>> >>>+
>> >>>+#include <linux/netdevice.h>
>> >>>+
>> >>>+#ifdef CONFIG_NET_SWITCHDEV
>> >>>+
>> >>>+int netdev_sw_parent_id_get(struct net_device *dev,
>> >>>+			    struct netdev_phys_item_id *psid);
>> >>>+
>> >>>+#else
>> >>>+
>> >>>+static inline int netdev_sw_parent_id_get(struct net_device *dev,
>> >>>+					  struct netdev_phys_item_id *psid)
>> >>>+{
>> >>>+	return -EOPNOTSUPP;
>> >>>+}
>> >>>+
>> >>>+#endif
>> >>>+
>> >>>+#endif /* _LINUX_SWITCHDEV_H_ */
>> >>>diff --git a/net/Kconfig b/net/Kconfig
>> >>>index 99815b5..ff9ffc1 100644
>> >>>--- a/net/Kconfig
>> >>>+++ b/net/Kconfig
>> >>>@@ -228,6 +228,7 @@ source "net/vmw_vsock/Kconfig"
>> >>>  source "net/netlink/Kconfig"
>> >>>  source "net/mpls/Kconfig"
>> >>>  source "net/hsr/Kconfig"
>> >>>+source "net/switchdev/Kconfig"
>> >>>  config RPS
>> >>>  	boolean
>> >>>diff --git a/net/Makefile b/net/Makefile
>> >>>index 7ed1970..95fc694 100644
>> >>>--- a/net/Makefile
>> >>>+++ b/net/Makefile
>> >>>@@ -73,3 +73,6 @@ obj-$(CONFIG_OPENVSWITCH)	+= openvswitch/
>> >>>  obj-$(CONFIG_VSOCKETS)	+= vmw_vsock/
>> >>>  obj-$(CONFIG_NET_MPLS_GSO)	+= mpls/
>> >>>  obj-$(CONFIG_HSR)		+= hsr/
>> >>>+ifneq ($(CONFIG_NET_SWITCHDEV),)
>> >>>+obj-y				+= switchdev/
>> >>>+endif
>> >>>diff --git a/net/switchdev/Kconfig b/net/switchdev/Kconfig
>> >>>new file mode 100644
>> >>>index 0000000..1557545
>> >>>--- /dev/null
>> >>>+++ b/net/switchdev/Kconfig
>> >>>@@ -0,0 +1,13 @@
>> >>>+#
>> >>>+# Configuration for Switch device support
>> >>>+#
>> >>>+
>> >>>+config NET_SWITCHDEV
>> >>>+	boolean "Switch (and switch-ish) device support (EXPERIMENTAL)"
>> >>>+	depends on INET
>> >>>+	---help---
>> >>>+	  This module provides glue between core networking code and device
>> >>>+	  drivers in order to support hardware switch chips in very generic
>> >>>+	  meaning of the word "switch". This include devices supporting L2/L3 but
>> >>>+	  also various flow offloading chips, including switches embedded into
>> >>>+	  SR-IOV NICs.
>> >>>diff --git a/net/switchdev/Makefile b/net/switchdev/Makefile
>> >>>new file mode 100644
>> >>>index 0000000..5ed63ed
>> >>>--- /dev/null
>> >>>+++ b/net/switchdev/Makefile
>> >>>@@ -0,0 +1,5 @@
>> >>>+#
>> >>>+# Makefile for the Switch device API
>> >>>+#
>> >>>+
>> >>>+obj-$(CONFIG_NET_SWITCHDEV) += switchdev.o
>> >>>diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>> >>>new file mode 100644
>> >>>index 0000000..5010f646
>> >>>--- /dev/null
>> >>>+++ b/net/switchdev/switchdev.c
>> >>>@@ -0,0 +1,33 @@
>> >>>+/*
>> >>>+ * net/switchdev/switchdev.c - Switch device API
>> >>>+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
>> >>>+ *
>> >>>+ * This program is free software; you can redistribute it and/or modify
>> >>>+ * it under the terms of the GNU General Public License as published by
>> >>>+ * the Free Software Foundation; either version 2 of the License, or
>> >>>+ * (at your option) any later version.
>> >>>+ */
>> >>>+
>> >>>+#include <linux/kernel.h>
>> >>>+#include <linux/types.h>
>> >>>+#include <linux/init.h>
>> >>>+#include <linux/netdevice.h>
>> >>>+#include <net/switchdev.h>
>> >>>+
>> >>>+/**
>> >>>+ *	netdev_sw_parent_id_get - Get ID of a switch
>> >>>+ *	@dev: port device
>> >>>+ *	@psid: switch ID
>> >>>+ *
>> >>>+ *	Get ID of a switch this port is part of.
>> >>>+ */
>> >>>+int netdev_sw_parent_id_get(struct net_device *dev,
>> >>>+			    struct netdev_phys_item_id *psid)
>> >>>+{
>> >>>+	const struct net_device_ops *ops = dev->netdev_ops;
>> >>>+
>> >>>+	if (!ops->ndo_sw_parent_id_get)
>> >>>+		return -EOPNOTSUPP;
>> >>>+	return ops->ndo_sw_parent_id_get(dev, psid);
>> >>>+}
>> >>>+EXPORT_SYMBOL(netdev_sw_parent_id_get);
>> >--
>> >To unsubscribe from this list: send the line "unsubscribe netdev" in
>> >the body of a message to majordomo@vger.kernel.org
>> >More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
  2014-11-21  2:01               ` Simon Horman
@ 2014-11-21  7:20                 ` John Fastabend
  0 siblings, 0 replies; 100+ messages in thread
From: John Fastabend @ 2014-11-21  7:20 UTC (permalink / raw)
  To: Simon Horman
  Cc: John Fastabend, Jamal Hadi Salim, Jiri Pirko, netdev, davem,
	nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar, azhou,
	ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	edumazet, sfeldma, f.fainelli, roopa, linville, jasowang,
	ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd,
	alexei.starovoitov, Neil.Jerram, ronye, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet

On 11/20/2014 06:01 PM, Simon Horman wrote:
> Hi John,
> 
> On Wed, Nov 12, 2014 at 10:31:06PM -0800, John Fastabend wrote:
>> On 11/12/2014 09:44 PM, Simon Horman wrote:
>>> [snip]
>>>
>>>> Simon, if your feeling adventurous any feedback on the repo link
>>>> would be great. I still need to smash the commit log into something
>>>> coherent though at the moment you can see all the errors and rewrites,
>>>> etc as I made them.
>>>
>>> Hi John,
>>>
>>> here is some preliminary feedback:
>>>
>>> * I notice that the parse graph code isn't present yet.
>>>   I suppose this is a difficult piece that naturally follows many
>>>   other piece. None the less it is possibly the piece of most
>>>   interest to me :-)
>>
>> I can add this over the next few days. Also I wanted to publish some
>> more complex examples on top of rocker switch. The nic drivers are
>> interesting but not as complex as some of the switch devices.
> 
> I see that you have added the header graph, which seems pretty nice
> from my reading so far.

Great. I'm iterating over some other hardware now to be sure it will
work on some more complex configurations. And about ready to start
the rocker implementation.

> 
> It seems to allow for arbitrary connections between instances
> of net_flow_header_node, including I loops I suppose. This seems
> nice and flexible to me.

Right loops could be supported.


> 
> I have a very minor update to contribute which helped me to
> read the code. Please feel free to squash/ignore/...
> 

I like the patch. This sort of cleanup is needed.

> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_pipeline.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_pipeline.h
> index a4818ab..4025a61 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_pipeline.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_pipeline.h
> @@ -412,7 +412,7 @@ struct net_flow_jump_table ixgbe_vlan_inner_jump[2] = {
>  		   .type = NET_FLOW_FIELD_REF_ATTR_TYPE_U16,
>  		   .value_u16 = 0x0800,
>  		},
> -		.node = 4,
> +		.node = HEADER_INSTANCE_IP,
>  	},
>  	{
>  		.field = {0},
> @@ -443,7 +443,7 @@ struct net_flow_jump_table ixgbe_ip_jump[2] = {
>  		   .type = NET_FLOW_FIELD_REF_ATTR_TYPE_U8,
>  		   .value_u8 = 0x06,
>  		},
> -		.node = 5,
> +		.node = HEADER_INSTANCE_TCP,
>  	},
>  	{
>  		.field = {0},
> 
>> There is also the table graph layout which I wanted tweak a bit. At
>> the moment I have hardware that can run tables in parallel and some
>> that executes tables in sequence. It might not be clear from the code
>> (why I need the cleanup) but the source id is being used to indicate
>> if the tables are executed in parallel or not.
> 
> Thanks, that was not clear to me.
> 
>>> * Will del and update flows require flows to already exist?
>>>   And similarly, will add flow require flows with the same match to not
>>>   already exist?  If so, the error handling seems tricky of more than one
>>>   flow is to be deleted/updated. IIRC there was some discussion of that
>>>   kind of issue at the (double) round table discussion on the last day of
>>>   LPC14 in Düsseldorf.
>>
>> I would expect del/updates for flows that don't exist should fail.
> 
> Am I right in thinking that del and update flow NDOs may take
> a list of flows? If so I think some consideration needs
> to be made for handling failure of e.g. the last element of
> the list when the previous elements succeeded.

Yes the list of add/deletes are needed for user space to push down
bulk commands. At init time or when management software resets or
likely other cases we get large sets of rules being pushed
down doing this in bulk is much better then one-offing the flow commands.

> 
> I suppose that user-space could dump the flow table if an error and
> adjust its state accordingly. But that seems somewhat onerous.

Right we talked about this briefly @ LPC if I recall correctly. I see
a couple ways to do this. Either we do a best effort and return the
first flow to have an error so userspace can learn where the failure
occurred. Another option is to have the driver roll-back and remove any
flows that were added leaving the driver in the same state it was before
the add flow command started. Or we could support both and let user space
tell us if we should use best effort or required modes.

Any thoughts?

> 
>> I didn't intend to add any checks in the kernel to verify the matches
>> are unique. My opinion on this is that user space shouldn't add new
>> duplicate flows. And if it does hardware resources will be wasted.
> 
> I don't have any strong opinions on that at this time.
> But it does seem reasonable so long as its clear that is the case.

When I get to document this I'll call it out explicitly. I think we
should have something in ./Documentation/networkin/ for this API.

> 
>>> * Should the .node_count value of ixgbe_table_node_l2 be 3?
>>>   ixgbe_table_graph_nodes has three elements but perhaps you
>>>   are intentionally excluding the last element ixgbe_table_node_nil?
>>>
>>
>> Actually I could just drop the node_count at the moment because I've
>> been null terminating the arrays with null items.
>>
>> I should either add a count field to all the structures or null terminate
>> the arrays. For now I mostly null terminate the arrays when I use
>> them. For example matches is null terminates same for actions.
> 
> It looks like you have move towards the null termination option.
> No objections here.
> 

yep, the code looked nicer to me at least with this approach.

Thanks for looking it over,
John

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [patch net-next v2 05/10] rocker: introduce rocker switch driver
  2014-11-11 15:40           ` Jiri Pirko
  2014-11-11 16:10             ` Thomas Graf
@ 2014-11-27 14:09             ` Florian Fainelli
  1 sibling, 0 replies; 100+ messages in thread
From: Florian Fainelli @ 2014-11-27 14:09 UTC (permalink / raw)
  To: Jiri Pirko, Thomas Graf
  Cc: John Fastabend, netdev, davem, nhorman, andy, dborkman, ogerlitz,
	jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcr

Le 11/11/2014 07:40, Jiri Pirko a écrit :
> Tue, Nov 11, 2014 at 04:32:32PM CET, tgraf@suug.ch wrote:
>> On 11/11/14 at 04:19pm, Jiri Pirko wrote:
>>> Tue, Nov 11, 2014 at 03:29:46PM CET, tgraf@suug.ch wrote:
>>>> On 11/10/14 at 02:04pm, John Fastabend wrote:
>>>>> On 11/09/2014 02:51 AM, Jiri Pirko wrote:
>>>>>> +static int rocker_port_sw_parent_id_get(struct net_device *dev,
>>>>>> +					struct netdev_phys_item_id *psid)
>>>>>> +{
>>>>>> +	struct rocker_port *rocker_port = netdev_priv(dev);
>>>>>> +	struct rocker *rocker = rocker_port->rocker;
>>>>>> +
>>>>>
>>>>> hmm looks like you read this out of a magic switch register :) but
>>>>> my switch doesn't have this magic reg. I suposse the switch MAC address
>>>>> should work.
>>>>
>>>> This needs more work afterwards. Either we define that the switch ID
>>>> is only unique in combination with the parent ifindex or we need to
>>>> introduce a notation of uniquness into the switch ID itself.
>>>
>>> This is something similar to physical port id. Each driver should take
>>> care of generating that id.
>>
>> If the ID is only unique within a driver, then the user space cannot
>> rely on using the ID to group switch ports. Multiple drivers might
>> come up with the same ID.
> 
> Well, as I said, it is the same as for physical port id. But if needed,
> there can be added some simple mechanism for the id registration
> ensuring their uniqueness.

We could use the idr/ida subsystem to provide a global unique id per
switch device that gets registered, ultimately, I suspect that a
management application might want to get some sense of the topology by
exploiting some unique HW properties such as:

- MDIO bus address for MDIO-connected switches
- SPI chip-select address
- GPIO(s) used to connect
- PCI bus/slot

they are also unique by design, and add to that any revision/OUI
register that is available for the driver. I can't find of a good way to
hash that to produce a unique identifier, but maybe we can use that
information somehow.

> 
>>
>> Even now, multiple rocker instances would have the same ID.
> 
> It depends on what hw returns to driver.
> 

^ permalink raw reply	[flat|nested] 100+ messages in thread

end of thread, other threads:[~2014-11-27  6:09 UTC | newest]

Thread overview: 100+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-09 10:51 [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload Jiri Pirko
2014-11-09 10:51 ` [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name Jiri Pirko
2014-11-10  3:35   ` Jamal Hadi Salim
2014-11-10  5:23     ` David Miller
2014-11-10 12:06       ` Jamal Hadi Salim
2014-11-10 12:33         ` Daniel Borkmann
2014-11-10 12:56           ` Jamal Hadi Salim
2014-11-10 16:28         ` David Miller
2014-11-10  7:43     ` Jiri Pirko
2014-11-10 12:17       ` Jamal Hadi Salim
2014-11-10 13:16         ` Jiri Pirko
2014-11-10 13:20           ` Jamal Hadi Salim
2014-11-10 16:28         ` David Miller
2014-11-10 19:03           ` Jamal Hadi Salim
2014-11-10 21:57   ` John Fastabend
2014-11-09 10:51 ` [patch net-next v2 02/10] net: introduce generic switch devices support Jiri Pirko
2014-11-10 21:59   ` John Fastabend
2014-11-11 15:11     ` Jiri Pirko
2014-11-11  9:49   ` M. Braun
2014-11-11 10:04     ` Jiri Pirko
2014-11-19 13:28   ` Roopa Prabhu
2014-11-19 13:46     ` Jiri Pirko
2014-11-19 13:59       ` Roopa Prabhu
2014-11-20 15:55         ` Andy Gospodarek
2014-11-21  7:16           ` Jiri Pirko
2014-11-09 10:51 ` [patch net-next v2 03/10] rtnl: expose physical switch id for particular device Jiri Pirko
2014-11-10  3:43   ` Jamal Hadi Salim
2014-11-10  7:45     ` Jiri Pirko
2014-11-10 17:58   ` Roopa Prabhu
2014-11-10 20:02     ` Scott Feldman
2014-11-11 13:55       ` Roopa Prabhu
2014-11-10 22:14     ` Jiri Pirko
2014-11-10 22:31       ` John Fastabend
2014-11-10 22:01   ` John Fastabend
2014-11-09 10:51 ` [patch net-next v2 04/10] net-sysfs: " Jiri Pirko
2014-11-10 22:01   ` John Fastabend
2014-11-09 10:51 ` [patch net-next v2 05/10] rocker: introduce rocker switch driver Jiri Pirko
2014-11-10 22:04   ` John Fastabend
2014-11-11 14:29     ` Thomas Graf
2014-11-11 15:19       ` Jiri Pirko
2014-11-11 15:32         ` Thomas Graf
2014-11-11 15:40           ` Jiri Pirko
2014-11-11 16:10             ` Thomas Graf
2014-11-27 14:09             ` Florian Fainelli
2014-11-11 15:41           ` Roopa Prabhu
2014-11-11 15:44             ` John Fastabend
2014-11-11 15:28     ` Jiri Pirko
2014-11-09 10:51 ` [patch net-next v2 06/10] bridge: introduce fdb offloading via switchdev Jiri Pirko
2014-11-10  3:47   ` Jamal Hadi Salim
2014-11-10  8:15     ` Jiri Pirko
2014-11-10  9:30       ` Scott Feldman
2014-11-10 12:47       ` Jamal Hadi Salim
2014-11-10 13:47         ` Jiri Pirko
2014-11-10 19:13           ` Jamal Hadi Salim
2014-11-10 13:51       ` Thomas Graf
2014-11-10 17:30         ` Andy Gospodarek
2014-11-10 19:03           ` Roopa Prabhu
2014-11-12 13:43             ` Jiri Pirko
2014-11-09 10:51 ` [patch net-next v2 07/10] bridge: call netdev_sw_port_stp_update when bridge port STP status changes Jiri Pirko
2014-11-10 13:11   ` Jamal Hadi Salim
2014-11-10 14:04     ` Thomas Graf
2014-11-10 19:20       ` Jamal Hadi Salim
2014-11-10 15:59     ` Roopa Prabhu
2014-11-09 10:51 ` [patch net-next v2 08/10] bridge: add API to notify bridge driver of learned FBD on offloaded device Jiri Pirko
2014-11-11 14:21   ` Roopa Prabhu
2014-11-11 17:38     ` Scott Feldman
2014-11-11 21:43       ` Roopa Prabhu
2014-11-09 10:51 ` [patch net-next v2 09/10] rocker: implement rocker ofdpa flow table manipulation Jiri Pirko
2014-11-09 10:51 ` [patch net-next v2 10/10] rocker: implement L2 bridge offloading Jiri Pirko
2014-11-10  3:53   ` Jamal Hadi Salim
2014-11-10  8:18     ` Jiri Pirko
2014-11-10  9:10       ` Nicolas Dichtel
2014-11-10  8:46     ` Scott Feldman
2014-11-10 12:27       ` Jamal Hadi Salim
2014-11-10 16:12         ` Roopa Prabhu
2014-11-10 17:36           ` Scott Feldman
2014-11-10 18:35             ` Roopa Prabhu
2014-11-10 19:27               ` Jamal Hadi Salim
2014-11-10 19:47                 ` Scott Feldman
2014-11-10 21:14                   ` Jamal Hadi Salim
2014-11-10 19:25             ` Jamal Hadi Salim
2014-11-10 17:22         ` Scott Feldman
2014-11-09 16:40 ` [patch net-next] bridge: rename fdb_*_hw to fdb_*_hw_addr to avoid confusion Jiri Pirko
2014-11-11  2:33   ` David Miller
2014-11-11  7:20     ` Jiri Pirko
2014-11-10  3:31 ` [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload Jamal Hadi Salim
2014-11-10  3:46   ` Simon Horman
2014-11-10  4:03     ` Jamal Hadi Salim
2014-11-10  4:58       ` Simon Horman
2014-11-10 22:23         ` John Fastabend
2014-11-11  8:51           ` Simon Horman
2014-11-13  5:44           ` Simon Horman
2014-11-13  6:31             ` John Fastabend
2014-11-21  2:01               ` Simon Horman
2014-11-21  7:20                 ` John Fastabend
2014-11-10  7:23   ` Jiri Pirko
2014-11-10 12:16     ` Jamal Hadi Salim
2014-11-10 13:12       ` Jiri Pirko
2014-11-10 16:48 ` Thomas Graf
2014-11-12 13:44 ` Jiri Pirko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).