* [next-queue v6 PATCH 0/7] i40e: Add port representor and initial switchdev support
From: Sridhar Samudrala @ 2017-03-30  0:22 UTC
  To: intel-wired-lan, netdev, alexander.h.duyck, anjali.singhai,
	jakub.kicinski, gerlitz.or, jiri, sridhar.samudrala

- Patch 1 introduces a devlink interface to get/set the switch mode.
- Patch 2 adds support for creating control-plane port representor netdevs
  associated with the dataplane PF and VFs. They can be used to control and
  configure the PF/VFs even after those have been moved to a different
  namespace.
- Patch 3 enables syncing link state between the PF/VFs and the associated
  port representors.
- Patch 4 adds a new type to metadata_dst to allow passing a port id to the
  lower device.
- Patch 5 adds TX and RX support to port netdevs.
- Patch 6 exposes HW and SW port statistics via netlink on port netdevs.
- Patch 7 adds support for getting the switch id and port number of port
  netdevs.

v6:
- Port representor netdevs are now created and supported for PFs too.
- Broadcast filters are not disabled by default on VFs in switchdev mode.
  Instead, offload_fwd_mark is set on skbs that are forwarded to port
  netdevs to indicate that the HW has already done the forwarding.
- Random MAC addresses are assigned to port netdevs.
v5:
- Fix an issue with the link state sync patch.
v4:
- Make VFPR ndo_get_stats64 a void function to match a recent upstream
  change.
v3:
- Misc. error handling fixes suggested by Scott Peterson.
- Introduce switchdev_ops and add support for getting the switch id and port
  number of VFPR netdevs. Suggested by Or Gerlitz.
v2:
- Handle i40e_alloc_vfpr_netdev() failures.
- Minor comment/commit msg updates.

Jakub Kicinski (1):
  net: store port/representator id in metadata_dst

Sridhar Samudrala (6):
  i40e: Introduce devlink interface
  i40e: Introduce Port Representor netdevs and switchdev mode.
  i40e: Sync link state between PF/VFs and Port representor netdevs
  i40e: Add TX and RX support over port netdev's in switchdev mode
  i40e: Add support for exposing switch port statistics via port netdevs
  i40e: Add support to get switch id and port number for port netdevs

 drivers/net/ethernet/intel/Kconfig                 |   1 +
 drivers/net/ethernet/intel/i40e/i40e.h             |  37 ++
 drivers/net/ethernet/intel/i40e/i40e_main.c        | 609 ++++++++++++++++++++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        | 170 +++++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h        |   2 +
 drivers/net/ethernet/intel/i40e/i40e_type.h        |   3 +
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |  34 +-
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h |   7 +
 include/net/dst_metadata.h                         |  41 +-
 net/core/dst.c                                     |  15 +-
 net/core/filter.c                                  |   1 +
 net/ipv4/ip_tunnel_core.c                          |   6 +-
 net/openvswitch/flow_netlink.c                     |   4 +-
 13 files changed, 899 insertions(+), 31 deletions(-)

-- 
1.8.3.1

* [next-queue v6 PATCH 1/7] i40e: Introduce devlink interface
From: Sridhar Samudrala @ 2017-03-30  0:22 UTC
  To: intel-wired-lan, netdev, alexander.h.duyck, anjali.singhai,
	jakub.kicinski, gerlitz.or, jiri, sridhar.samudrala

Add initial devlink support to get/set the mode of the SR-IOV switch.
This patch sets the default mode to 'legacy' and supports getting the
mode and setting it to 'legacy'.

The switch mode can be queried and set via the following 'devlink' commands:

# devlink dev eswitch show pci/0000:42:00.0
pci/0000:42:00.0: mode legacy
# devlink dev eswitch set pci/0000:42:00.0 mode switchdev
devlink answers: Operation not supported
# devlink dev eswitch set pci/0000:42:00.0 mode legacy
# devlink dev eswitch show pci/0000:42:00.0
pci/0000:42:00.0: mode legacy

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
---
 drivers/net/ethernet/intel/Kconfig          |  1 +
 drivers/net/ethernet/intel/i40e/i40e.h      |  3 ++
 drivers/net/ethernet/intel/i40e/i40e_main.c | 77 ++++++++++++++++++++++++++---
 3 files changed, 75 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
index 1542a21..ababcae 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -215,6 +215,7 @@ config I40E
 	tristate "Intel(R) Ethernet Controller XL710 Family support"
 	imply PTP_1588_CLOCK
 	depends on PCI
+	depends on MAY_USE_DEVLINK
 	---help---
 	  This driver supports Intel(R) Ethernet Controller XL710 Family of
 	  devices.  For more information on how to identify your adapter, go
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 421ea57..f788125c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -54,6 +54,8 @@
 #include <linux/clocksource.h>
 #include <linux/net_tstamp.h>
 #include <linux/ptp_clock_kernel.h>
+#include <net/devlink.h>
+
 #include "i40e_type.h"
 #include "i40e_prototype.h"
 #include "i40e_client.h"
@@ -517,6 +519,7 @@ struct i40e_pf {
 	u32 ioremap_len;
 	u32 fd_inv;
 	u16 phy_led_val;
+	enum devlink_eswitch_mode eswitch_mode;
 };
 
 /**
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 1a4643c..afcf14d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -10823,6 +10823,57 @@ static void i40e_get_platform_mac_addr(struct pci_dev *pdev, struct i40e_pf *pf)
 }
 
 /**
+ * i40e_devlink_eswitch_mode_get
+ *
+ * @devlink: pointer to devlink struct
+ * @mode: sr-iov switch mode pointer
+ *
+ * Returns the switch mode of the associated PF in the @mode pointer.
+ */
+static int i40e_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode)
+{
+	struct i40e_pf *pf = devlink_priv(devlink);
+
+	*mode = pf->eswitch_mode;
+
+	return 0;
+}
+
+/**
+ * i40e_devlink_eswitch_mode_set
+ *
+ * @devlink: pointer to devlink struct
+ * @mode: sr-iov switch mode
+ *
+ * Set the switch mode of the associated PF.
+ * Returns 0 on success and -EOPNOTSUPP on error.
+ */
+static int i40e_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode)
+{
+	struct i40e_pf *pf = devlink_priv(devlink);
+	int err = 0;
+
+	if (mode == pf->eswitch_mode)
+		goto done;
+
+	switch (mode) {
+	case DEVLINK_ESWITCH_MODE_LEGACY:
+		pf->eswitch_mode = mode;
+		break;
+	default:
+		err = -EOPNOTSUPP;
+		break;
+	}
+done:
+	return err;
+}
+
+static const struct devlink_ops i40e_devlink_ops = {
+	.eswitch_mode_get = i40e_devlink_eswitch_mode_get,
+	.eswitch_mode_set = i40e_devlink_eswitch_mode_set,
+};
+
+/**
  * i40e_probe - Device initialization routine
  * @pdev: PCI device information struct
  * @ent: entry in i40e_pci_tbl
@@ -10839,6 +10890,7 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	struct i40e_pf *pf;
 	struct i40e_hw *hw;
 	static u16 pfs_found;
+	struct devlink *devlink;
 	u16 wol_nvm_bits;
 	u16 link_status;
 	int err, globr_probe = 1;
@@ -10877,11 +10929,15 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	 * the Admin Queue structures and then querying for the
 	 * device's current profile information.
 	 */
-	pf = kzalloc(sizeof(*pf), GFP_KERNEL);
-	if (!pf) {
+
+	devlink = devlink_alloc(&i40e_devlink_ops, sizeof(*pf));
+	if (!devlink) {
+		dev_err(&pdev->dev, "devlink_alloc failed\n");
 		err = -ENOMEM;
-		goto err_pf_alloc;
+		goto err_devlink_alloc;
 	}
+
+	pf = devlink_priv(devlink);
 	pf->next_vsi = 0;
 	pf->pdev = pdev;
 	set_bit(__I40E_DOWN, &pf->state);
@@ -11080,6 +11136,11 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	device_set_wakeup_enable(&pf->pdev->dev, pf->wol_en);
 
 	/* set up the main switch operations */
+	pf->eswitch_mode = DEVLINK_ESWITCH_MODE_LEGACY;
+	err = devlink_register(devlink, &pdev->dev);
+	if (err)
+		goto err_devlink_register;
+
 	i40e_determine_queue_usage(pf);
 	err = i40e_init_interrupt_scheme(pf);
 	if (err)
@@ -11339,6 +11400,8 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 err_switch_setup:
 	i40e_reset_interrupt_capability(pf);
 	del_timer_sync(&pf->service_timer);
+	devlink_unregister(devlink);
+err_devlink_register:
 err_mac_addr:
 err_configure_lan_hmc:
 	(void)i40e_shutdown_lan_hmc(hw);
@@ -11349,8 +11412,8 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 err_pf_reset:
 	iounmap(hw->hw_addr);
 err_ioremap:
-	kfree(pf);
-err_pf_alloc:
+	devlink_free(devlink);
+err_devlink_alloc:
 	pci_disable_pcie_error_reporting(pdev);
 	pci_release_mem_regions(pdev);
 err_pci_reg:
@@ -11372,6 +11435,7 @@ static void i40e_remove(struct pci_dev *pdev)
 {
 	struct i40e_pf *pf = pci_get_drvdata(pdev);
 	struct i40e_hw *hw = &pf->hw;
+	struct devlink *devlink = priv_to_devlink(pf);
 	i40e_status ret_code;
 	int i;
 
@@ -11458,7 +11522,8 @@ static void i40e_remove(struct pci_dev *pdev)
 	kfree(pf->vsi);
 
 	iounmap(hw->hw_addr);
-	kfree(pf);
+	devlink_unregister(devlink);
+	devlink_free(devlink);
 	pci_release_mem_regions(pdev);
 
 	pci_disable_pcie_error_reporting(pdev);
-- 
1.8.3.1

* [next-queue v6 PATCH 2/7] i40e: Introduce Port Representor netdevs and switchdev mode.
From: Sridhar Samudrala @ 2017-03-30  0:22 UTC
  To: intel-wired-lan, netdev, alexander.h.duyck, anjali.singhai,
	jakub.kicinski, gerlitz.or, jiri, sridhar.samudrala

Port representor netdevs are created for each PF and VF if the switch
mode is set to 'switchdev'. These netdevs can be used to control and
configure the VFs and PFs when they are moved to a different namespace.
They allow exposing statistics, configuring and monitoring link state and
MTU, and managing filters and FDB/VLAN entries.

Sample commands to create port representors:
# rmmod i40e; modprobe i40e
# devlink dev eswitch set pci/0000:42:00.0 mode switchdev
# echo 2 > /sys/class/net/p4p1/device/sriov_numvfs
# ip l show
122: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 3c:fd:fe:a3:18:f8 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
    vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
124: p4p1-pf: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 72:8e:34:b2:d0:44 brd ff:ff:ff:ff:ff:ff
125: p4p1-vf0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 02:57:a0:18:2b:ce brd ff:ff:ff:ff:ff:ff
126: p4p1-vf1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 32:7c:77:5f:3e:e3 brd ff:ff:ff:ff:ff:ff
127: p4p1_0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 26:51:28:54:69:43 brd ff:ff:ff:ff:ff:ff
128: p4p1_1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000

p4p1 is the PF and p4p1-pf is its port netdev.
p4p1_0 and p4p1_1 are the VFs, and p4p1-vf0 and p4p1-vf1 are the port netdevs for the two VFs.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h             |  19 +++
 drivers/net/ethernet/intel/i40e/i40e_main.c        | 187 ++++++++++++++++++++-
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |   9 +
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h |   6 +
 4 files changed, 220 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index f788125c..c865803 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -320,6 +320,17 @@ struct i40e_flex_pit {
 	u8 pit_index;
 };
 
+enum i40e_port_netdev_type {
+	I40E_PORT_NETDEV_PF,
+	I40E_PORT_NETDEV_VF
+};
+
+/* Port representor netdev private structure */
+struct i40e_port_netdev_priv {
+	enum i40e_port_netdev_type type;	/* type - PF or VF */
+	void *f;				/* ptr to PF or VF struct */
+};
+
 /* struct that defines the Ethernet device */
 struct i40e_pf {
 	struct pci_dev *pdev;
@@ -328,6 +339,12 @@ struct i40e_pf {
 	struct msix_entry *msix_entries;
 	bool fc_autoneg_status;
 
+	/* PF Port representor netdev that allows control and configuration of
+	 * PFs when they are moved to a different namespace. Enables returning
+	 * PF stats, configuring/monitoring link state, fdb/vlans, filters etc.
+	 */
+	struct net_device *port_netdev;
+
 	u16 eeprom_version;
 	u16 num_vmdq_vsis;         /* num vmdq vsis this PF has set up */
 	u16 num_vmdq_qps;          /* num queue pairs per vmdq pool */
@@ -985,4 +1002,6 @@ bool i40e_dcb_need_reconfig(struct i40e_pf *pf,
 i40e_status i40e_set_npar_bw_setting(struct i40e_pf *pf);
 i40e_status i40e_commit_npar_bw_setting(struct i40e_pf *pf);
 void i40e_print_link_message(struct i40e_vsi *vsi, bool isup);
+int i40e_alloc_port_netdev(void *f, enum i40e_port_netdev_type type);
+void i40e_free_port_netdev(void *f, enum i40e_port_netdev_type type);
 #endif /* _I40E_H_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index afcf14d..e441e39 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9985,6 +9985,11 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type,
 					 ret);
 			}
 		}
+		if (pf->eswitch_mode == DEVLINK_ESWITCH_MODE_SWITCHDEV) {
+			ret = i40e_alloc_port_netdev(pf, I40E_PORT_NETDEV_PF);
+			if (ret)
+				goto err_port_netdev;
+		}
 	case I40E_VSI_VMDQ2:
 		ret = i40e_config_netdev(vsi);
 		if (ret)
@@ -10037,6 +10042,9 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type,
 		vsi->netdev = NULL;
 	}
 err_netdev:
+	if (pf->port_netdev)
+		i40e_free_port_netdev(pf, I40E_PORT_NETDEV_PF);
+err_port_netdev:
 	i40e_aq_delete_element(&pf->hw, vsi->seid, NULL);
 err_vsi:
 	i40e_vsi_clear(vsi);
@@ -10851,13 +10859,38 @@ static int i40e_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode)
 static int i40e_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode)
 {
 	struct i40e_pf *pf = devlink_priv(devlink);
-	int err = 0;
+	struct i40e_vf *vf;
+	int i, j, err = 0;
 
 	if (mode == pf->eswitch_mode)
 		goto done;
 
 	switch (mode) {
 	case DEVLINK_ESWITCH_MODE_LEGACY:
+		for (i = 0; i < pf->num_alloc_vfs; i++) {
+			vf = &pf->vf[i];
+			i40e_free_port_netdev(vf, I40E_PORT_NETDEV_VF);
+		}
+		i40e_free_port_netdev(pf, I40E_PORT_NETDEV_PF);
+		pf->eswitch_mode = mode;
+		break;
+	case DEVLINK_ESWITCH_MODE_SWITCHDEV:
+		err = i40e_alloc_port_netdev(pf, I40E_PORT_NETDEV_PF);
+		if (err)
+			goto done;
+		for (i = 0; i < pf->num_alloc_vfs; i++) {
+			vf = &pf->vf[i];
+			err = i40e_alloc_port_netdev(vf, I40E_PORT_NETDEV_VF);
+			if (err) {
+				for (j = 0; j < i; j++) {
+					vf = &pf->vf[j];
+					i40e_free_port_netdev(vf,
+							I40E_PORT_NETDEV_VF);
+				}
+				i40e_free_port_netdev(pf, I40E_PORT_NETDEV_PF);
+				goto done;
+			}
+		}
 		pf->eswitch_mode = mode;
 		break;
 	default:
@@ -10874,6 +10907,157 @@ static int i40e_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode)
 };
 
 /**
+ * i40e_port_netdev_open
+ * @dev: network interface device structure
+ *
+ * Called when port netdevice is brought up.
+ **/
+static int i40e_port_netdev_open(struct net_device *dev)
+{
+	return 0;
+}
+
+/**
+ * i40e_port_netdev_stop
+ * @dev: network interface device structure
+ *
+ * Called when port netdevice is brought down.
+ **/
+static int i40e_port_netdev_stop(struct net_device *dev)
+{
+	return 0;
+}
+
+static const struct net_device_ops i40e_port_netdev_ops = {
+	.ndo_open		= i40e_port_netdev_open,
+	.ndo_stop		= i40e_port_netdev_stop,
+};
+
+/**
+ * i40e_alloc_port_netdev
+ * @f: pointer to the PF or VF structure
+ * @type: port netdev type
+ *
+ * Create Port representor netdev
+ **/
+int i40e_alloc_port_netdev(void *f, enum i40e_port_netdev_type type)
+{
+	struct net_device *port_netdev;
+	char netdev_name[IFNAMSIZ];
+	struct i40e_port_netdev_priv *priv;
+	struct i40e_pf *pf;
+	struct i40e_vf *vf;
+	struct i40e_vsi *vsi;
+	int err;
+
+	switch (type) {
+	case I40E_PORT_NETDEV_PF:
+		pf = (struct i40e_pf *)f;
+		vsi = pf->vsi[pf->lan_vsi];
+
+		snprintf(netdev_name, IFNAMSIZ, "%s-pf", vsi->netdev->name);
+		port_netdev = alloc_netdev(sizeof(struct i40e_port_netdev_priv),
+					   netdev_name, NET_NAME_UNKNOWN,
+					   ether_setup);
+		if (!port_netdev) {
+			dev_err(&pf->pdev->dev,
+				"alloc_netdev failed for PF:%s port netdev\n",
+				vsi->netdev->name);
+			return -ENOMEM;
+		}
+		pf->port_netdev = port_netdev;
+		priv = netdev_priv(port_netdev);
+		priv->f = pf;
+		priv->type = I40E_PORT_NETDEV_PF;
+		break;
+	case I40E_PORT_NETDEV_VF:
+		vf = (struct i40e_vf *)f;
+		pf = vf->pf;
+		vsi = pf->vsi[pf->lan_vsi];
+
+		snprintf(netdev_name, IFNAMSIZ, "%s-vf%d", vsi->netdev->name,
+			 vf->vf_id);
+		port_netdev = alloc_netdev(sizeof(struct i40e_port_netdev_priv),
+					   netdev_name, NET_NAME_UNKNOWN,
+					   ether_setup);
+		if (!port_netdev) {
+			dev_err(&pf->pdev->dev,
+				"alloc_netdev failed for VF%d port netdev\n",
+				vf->vf_id);
+			return -ENOMEM;
+		}
+		vf->port_netdev = port_netdev;
+		priv = netdev_priv(port_netdev);
+		priv->f = vf;
+		priv->type = I40E_PORT_NETDEV_VF;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	port_netdev->netdev_ops = &i40e_port_netdev_ops;
+	eth_hw_addr_random(port_netdev);
+
+	netif_carrier_off(port_netdev);
+	netif_tx_stop_all_queues(port_netdev);
+
+	err = register_netdev(port_netdev);
+	if (err) {
+		dev_err(&pf->pdev->dev, "register_netdev failed for port netdev: %s\n",
+			port_netdev->name);
+		free_netdev(port_netdev);
+		return err;
+	}
+
+	dev_info(&pf->pdev->dev, "%s Port representor %s created\n",
+		 ((type == I40E_PORT_NETDEV_PF) ? "PF" : "VF"),
+		 port_netdev->name);
+
+	return 0;
+}
+
+/**
+ * i40e_free_port_netdev
+ * @f: pointer to the PF or VF structure
+ * @type: port netdev type
+ *
+ * Free Port representor netdev
+ **/
+void i40e_free_port_netdev(void *f, enum i40e_port_netdev_type type)
+{
+	struct i40e_pf *pf;
+	struct i40e_vf *vf;
+
+	switch (type) {
+	case I40E_PORT_NETDEV_PF:
+		pf = (struct i40e_pf *)f;
+
+		if (!pf->port_netdev)
+			return;
+		dev_info(&pf->pdev->dev, "Freeing PF Port representor %s\n",
+			 pf->port_netdev->name);
+		unregister_netdev(pf->port_netdev);
+		free_netdev(pf->port_netdev);
+		pf->port_netdev = NULL;
+		break;
+	case I40E_PORT_NETDEV_VF:
+		vf = (struct i40e_vf *)f;
+		pf = vf->pf;
+
+		if (!vf->port_netdev)
+			return;
+		dev_info(&pf->pdev->dev, "Freeing VF Port representor %s\n",
+			 vf->port_netdev->name);
+		unregister_netdev(vf->port_netdev);
+		free_netdev(vf->port_netdev);
+		vf->port_netdev = NULL;
+		break;
+	default:
+		break;
+	}
+}
+
+/**
  * i40e_probe - Device initialization routine
  * @pdev: PCI device information struct
  * @ent: entry in i40e_pci_tbl
@@ -11474,6 +11658,7 @@ static void i40e_remove(struct pci_dev *pdev)
 			i40e_switch_branch_release(pf->veb[i]);
 	}
 
+	i40e_free_port_netdev(pf, I40E_PORT_NETDEV_PF);
 	/* Now we can shutdown the PF's VSI, just before we kill
 	 * adminq and hmc.
 	 */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 65c95ff..e89f4c4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1081,6 +1081,9 @@ void i40e_free_vfs(struct i40e_pf *pf)
 			i40e_free_vf_res(&pf->vf[i]);
 		/* disable qp mappings */
 		i40e_disable_vf_mappings(&pf->vf[i]);
+
+		if (pf->eswitch_mode == DEVLINK_ESWITCH_MODE_SWITCHDEV)
+			i40e_free_port_netdev(&pf->vf[i], I40E_PORT_NETDEV_VF);
 	}
 
 	kfree(pf->vf);
@@ -1148,6 +1151,12 @@ int i40e_alloc_vfs(struct i40e_pf *pf, u16 num_alloc_vfs)
 		/* VF resources get allocated during reset */
 		i40e_reset_vf(&vfs[i], false);
 
+		if (pf->eswitch_mode == DEVLINK_ESWITCH_MODE_SWITCHDEV) {
+			ret = i40e_alloc_port_netdev(&vfs[i],
+						     I40E_PORT_NETDEV_VF);
+			if (ret)
+				goto err_alloc;
+		}
 	}
 	pf->num_alloc_vfs = num_alloc_vfs;
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
index 37af437..b24d0c6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
@@ -76,6 +76,12 @@ enum i40e_vf_capabilities {
 struct i40e_vf {
 	struct i40e_pf *pf;
 
+	/* VF Port representor netdev that allows control and configuration
+	 * of VFs from the host. Enables returning VF stats, configuring link
+	 * state, mtu, fdb/vlans, filters etc.
+	 */
+	struct net_device *port_netdev;
+
 	/* VF id in the PF space */
 	s16 vf_id;
 	/* all VF vsis connect to the same parent */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Intel-wired-lan] [next-queue v6 PATCH 2/7] i40e: Introduce Port Representor netdevs and switchdev mode.
@ 2017-03-30  0:22   ` Sridhar Samudrala
  0 siblings, 0 replies; 46+ messages in thread
From: Sridhar Samudrala @ 2017-03-30  0:22 UTC (permalink / raw)
  To: intel-wired-lan

Port Representator netdevs are created for each PF and VF if the switch
mode is set to 'switchdev'. These netdevs can be used to control and
configure VFs and PFs when they are moved to a different namespace.
They enable exposing statistics, configure and monitor link state, mtu,
filters,fdb/vlan entries etc.

Sample script to create port representors
# rmmod i40e; modprobe i40e
# devlink dev eswitch set pci/0000:42:00.0 mode switchdev
# echo 2 > /sys/class/net/p4p1/device/sriov_numvfs
# ip l show
122: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 3c:fd:fe:a3:18:f8 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
    vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
124: p4p1-pf: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 72:8e:34:b2:d0:44 brd ff:ff:ff:ff:ff:ff
125: p4p1-vf0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 02:57:a0:18:2b:ce brd ff:ff:ff:ff:ff:ff
126: p4p1-vf1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 32:7c:77:5f:3e:e3 brd ff:ff:ff:ff:ff:ff
127: p4p1_0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 26:51:28:54:69:43 brd ff:ff:ff:ff:ff:ff
128: p4p1_1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000

p4p1 is the PF and p4p1-pf is its port netdev.
p4p1_0 and p4p1_1 are the VFs, and p4p1-vf0 and p4p1-vf1 are the port netdevs for those two VFs.
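The representor names in this listing come from the PF netdev name. i40e_alloc_port_netdev() formats them with snprintf() as "%s-pf" for the PF and "%s-vf%d" for each VF, bounded by IFNAMSIZ (16 bytes on Linux, including the terminating NUL). The sketch below is a user-space illustration of those two format strings. rep_name() is a hypothetical helper. The IF_NAMESIZE fallback covers C libraries that only expose IFNAMSIZ with extensions enabled.

```c
#include <net/if.h>	/* IF_NAMESIZE, and IFNAMSIZ when available */
#include <stdio.h>

#ifndef IFNAMSIZ
#define IFNAMSIZ IF_NAMESIZE
#endif

/* Hypothetical helper: build a representor name the way the patch does.
 * vf_id < 0 selects the PF representor. Returns snprintf()'s result, so a
 * value >= IFNAMSIZ means the name was truncated.
 */
static int rep_name(char buf[IFNAMSIZ], const char *pf_name, int vf_id)
{
	if (vf_id < 0)
		return snprintf(buf, IFNAMSIZ, "%s-pf", pf_name);
	return snprintf(buf, IFNAMSIZ, "%s-vf%d", pf_name, vf_id);
}
```

rep_name(buf, "p4p1", 1) yields "p4p1-vf1". With a 12-character PF name, "%s-vf%d" already needs 16 characters for VF 1 and is cut to "<name>-vf". In that case every single-digit VF representor would get the same name, so register_netdev() would be expected to fail for the second one.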

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h             |  19 +++
 drivers/net/ethernet/intel/i40e/i40e_main.c        | 187 ++++++++++++++++++++-
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |   9 +
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h |   6 +
 4 files changed, 220 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index f788125c..c865803 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -320,6 +320,17 @@ struct i40e_flex_pit {
 	u8 pit_index;
 };
 
+enum i40e_port_netdev_type {
+	I40E_PORT_NETDEV_PF,
+	I40E_PORT_NETDEV_VF
+};
+
+/* Port representor netdev private structure */
+struct i40e_port_netdev_priv {
+	enum i40e_port_netdev_type type;	/* type - PF or VF */
+	void *f;				/* ptr to PF or VF struct */
+};
+
 /* struct that defines the Ethernet device */
 struct i40e_pf {
 	struct pci_dev *pdev;
@@ -328,6 +339,12 @@ struct i40e_pf {
 	struct msix_entry *msix_entries;
 	bool fc_autoneg_status;
 
+	/* PF Port representor netdev that allows control and configuration of
+	 * PFs when they are moved to a different namespace. Enables returning
+	 * PF stats, configuring/monitoring link state, fdb/vlans, filters etc.
+	 */
+	struct net_device *port_netdev;
+
 	u16 eeprom_version;
 	u16 num_vmdq_vsis;         /* num vmdq vsis this PF has set up */
 	u16 num_vmdq_qps;          /* num queue pairs per vmdq pool */
@@ -985,4 +1002,6 @@ bool i40e_dcb_need_reconfig(struct i40e_pf *pf,
 i40e_status i40e_set_npar_bw_setting(struct i40e_pf *pf);
 i40e_status i40e_commit_npar_bw_setting(struct i40e_pf *pf);
 void i40e_print_link_message(struct i40e_vsi *vsi, bool isup);
+int i40e_alloc_port_netdev(void *f, enum i40e_port_netdev_type type);
+void i40e_free_port_netdev(void *f, enum i40e_port_netdev_type type);
 #endif /* _I40E_H_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index afcf14d..e441e39 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9985,6 +9985,11 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type,
 					 ret);
 			}
 		}
+		if (pf->eswitch_mode == DEVLINK_ESWITCH_MODE_SWITCHDEV) {
+			ret = i40e_alloc_port_netdev(pf, I40E_PORT_NETDEV_PF);
+			if (ret)
+				goto err_port_netdev;
+		}
 	case I40E_VSI_VMDQ2:
 		ret = i40e_config_netdev(vsi);
 		if (ret)
@@ -10037,6 +10042,9 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type,
 		vsi->netdev = NULL;
 	}
 err_netdev:
+	if (pf->port_netdev)
+		i40e_free_port_netdev(pf, I40E_PORT_NETDEV_PF);
+err_port_netdev:
 	i40e_aq_delete_element(&pf->hw, vsi->seid, NULL);
 err_vsi:
 	i40e_vsi_clear(vsi);
@@ -10851,13 +10859,38 @@ static int i40e_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode)
 static int i40e_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode)
 {
 	struct i40e_pf *pf = devlink_priv(devlink);
-	int err = 0;
+	struct i40e_vf *vf;
+	int i, j, err = 0;
 
 	if (mode == pf->eswitch_mode)
 		goto done;
 
 	switch (mode) {
 	case DEVLINK_ESWITCH_MODE_LEGACY:
+		for (i = 0; i < pf->num_alloc_vfs; i++) {
+			vf = &pf->vf[i];
+			i40e_free_port_netdev(vf, I40E_PORT_NETDEV_VF);
+		}
+		i40e_free_port_netdev(pf, I40E_PORT_NETDEV_PF);
+		pf->eswitch_mode = mode;
+		break;
+	case DEVLINK_ESWITCH_MODE_SWITCHDEV:
+		err = i40e_alloc_port_netdev(pf, I40E_PORT_NETDEV_PF);
+		if (err)
+			goto done;
+		for (i = 0; i < pf->num_alloc_vfs; i++) {
+			vf = &pf->vf[i];
+			err = i40e_alloc_port_netdev(vf, I40E_PORT_NETDEV_VF);
+			if (err) {
+				for (j = 0; j < i; j++) {
+					vf = &pf->vf[j];
+					i40e_free_port_netdev(vf,
+							I40E_PORT_NETDEV_VF);
+				}
+				i40e_free_port_netdev(pf, I40E_PORT_NETDEV_PF);
+				goto done;
+			}
+		}
 		pf->eswitch_mode = mode;
 		break;
 	default:
@@ -10874,6 +10907,157 @@ static int i40e_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode)
 };
 
 /**
+ * i40e_port_netdev_open
+ * @dev: network interface device structure
+ *
+ * Called when port netdevice is brought up.
+ **/
+static int i40e_port_netdev_open(struct net_device *dev)
+{
+	return 0;
+}
+
+/**
+ * i40e_port_netdev_stop
+ * @dev: network interface device structure
+ *
+ * Called when port netdevice is brought down.
+ **/
+static int i40e_port_netdev_stop(struct net_device *dev)
+{
+	return 0;
+}
+
+static const struct net_device_ops i40e_port_netdev_ops = {
+	.ndo_open		= i40e_port_netdev_open,
+	.ndo_stop		= i40e_port_netdev_stop,
+};
+
+/**
+ * i40e_alloc_port_netdev
+ * @f: pointer to the PF or VF structure
+ * @type: port netdev type
+ *
+ * Create Port representor netdev
+ **/
+int i40e_alloc_port_netdev(void *f, enum i40e_port_netdev_type type)
+{
+	struct net_device *port_netdev;
+	char netdev_name[IFNAMSIZ];
+	struct i40e_port_netdev_priv *priv;
+	struct i40e_pf *pf;
+	struct i40e_vf *vf;
+	struct i40e_vsi *vsi;
+	int err;
+
+	switch (type) {
+	case I40E_PORT_NETDEV_PF:
+		pf = (struct i40e_pf *)f;
+		vsi = pf->vsi[pf->lan_vsi];
+
+		snprintf(netdev_name, IFNAMSIZ, "%s-pf", vsi->netdev->name);
+		port_netdev = alloc_netdev(sizeof(struct i40e_port_netdev_priv),
+					   netdev_name, NET_NAME_UNKNOWN,
+					   ether_setup);
+		if (!port_netdev) {
+			dev_err(&pf->pdev->dev,
+				"alloc_netdev failed for PF:%s port netdev\n",
+				vsi->netdev->name);
+			return -ENOMEM;
+		}
+		pf->port_netdev = port_netdev;
+		priv = netdev_priv(port_netdev);
+		priv->f = pf;
+		priv->type = I40E_PORT_NETDEV_PF;
+		break;
+	case I40E_PORT_NETDEV_VF:
+		vf = (struct i40e_vf *)f;
+		pf = vf->pf;
+		vsi = pf->vsi[pf->lan_vsi];
+
+		snprintf(netdev_name, IFNAMSIZ, "%s-vf%d", vsi->netdev->name,
+			 vf->vf_id);
+		port_netdev = alloc_netdev(sizeof(struct i40e_port_netdev_priv),
+					   netdev_name, NET_NAME_UNKNOWN,
+					   ether_setup);
+		if (!port_netdev) {
+			dev_err(&pf->pdev->dev,
+				"alloc_netdev failed for VF%d port netdev\n",
+				vf->vf_id);
+			return -ENOMEM;
+		}
+		vf->port_netdev = port_netdev;
+		priv = netdev_priv(port_netdev);
+		priv->f = vf;
+		priv->type = I40E_PORT_NETDEV_VF;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	port_netdev->netdev_ops = &i40e_port_netdev_ops;
+	eth_hw_addr_random(port_netdev);
+
+	netif_carrier_off(port_netdev);
+	netif_tx_stop_all_queues(port_netdev);
+
+	err = register_netdev(port_netdev);
+	if (err) {
+		dev_err(&pf->pdev->dev, "register_netdev failed for port netdev: %s\n",
+			port_netdev->name);
+		free_netdev(port_netdev);
+		return err;
+	}
+
+	dev_info(&pf->pdev->dev, "%s Port representor %s created\n",
+		 ((type == I40E_PORT_NETDEV_PF) ? "PF" : "VF"),
+		 port_netdev->name);
+
+	return 0;
+}
+
+/**
+ * i40e_free_port_netdev
+ * @f: pointer to the PF or VF structure
+ * @type: port netdev type
+ *
+ * Free Port representor netdev
+ **/
+void i40e_free_port_netdev(void *f, enum i40e_port_netdev_type type)
+{
+	struct i40e_pf *pf;
+	struct i40e_vf *vf;
+
+	switch (type) {
+	case I40E_PORT_NETDEV_PF:
+		pf = (struct i40e_pf *)f;
+
+		if (!pf->port_netdev)
+			return;
+		dev_info(&pf->pdev->dev, "Freeing PF Port representor %s\n",
+			 pf->port_netdev->name);
+		unregister_netdev(pf->port_netdev);
+		free_netdev(pf->port_netdev);
+		pf->port_netdev = NULL;
+		break;
+	case I40E_PORT_NETDEV_VF:
+		vf = (struct i40e_vf *)f;
+		pf = vf->pf;
+
+		if (!vf->port_netdev)
+			return;
+		dev_info(&pf->pdev->dev, "Freeing VF Port representor %s\n",
+			 vf->port_netdev->name);
+		unregister_netdev(vf->port_netdev);
+		free_netdev(vf->port_netdev);
+		vf->port_netdev = NULL;
+		break;
+	default:
+		break;
+	}
+}
+
+/**
  * i40e_probe - Device initialization routine
  * @pdev: PCI device information struct
  * @ent: entry in i40e_pci_tbl
@@ -11474,6 +11658,7 @@ static void i40e_remove(struct pci_dev *pdev)
 			i40e_switch_branch_release(pf->veb[i]);
 	}
 
+	i40e_free_port_netdev(pf, I40E_PORT_NETDEV_PF);
 	/* Now we can shutdown the PF's VSI, just before we kill
 	 * adminq and hmc.
 	 */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 65c95ff..e89f4c4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1081,6 +1081,9 @@ void i40e_free_vfs(struct i40e_pf *pf)
 			i40e_free_vf_res(&pf->vf[i]);
 		/* disable qp mappings */
 		i40e_disable_vf_mappings(&pf->vf[i]);
+
+		if (pf->eswitch_mode == DEVLINK_ESWITCH_MODE_SWITCHDEV)
+			i40e_free_port_netdev(&pf->vf[i], I40E_PORT_NETDEV_VF);
 	}
 
 	kfree(pf->vf);
@@ -1148,6 +1151,12 @@ int i40e_alloc_vfs(struct i40e_pf *pf, u16 num_alloc_vfs)
 		/* VF resources get allocated during reset */
 		i40e_reset_vf(&vfs[i], false);
 
+		if (pf->eswitch_mode == DEVLINK_ESWITCH_MODE_SWITCHDEV) {
+			ret = i40e_alloc_port_netdev(&vfs[i],
+						     I40E_PORT_NETDEV_VF);
+			if (ret)
+				goto err_alloc;
+		}
 	}
 	pf->num_alloc_vfs = num_alloc_vfs;
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
index 37af437..b24d0c6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
@@ -76,6 +76,12 @@ enum i40e_vf_capabilities {
 struct i40e_vf {
 	struct i40e_pf *pf;
 
+	/* VF Port representor netdev that allows control and configuration
+	 * of VFs from the host. Enables returning VF stats, configuring link
+	 * state, mtu, fdb/vlans, filters etc.
+	 */
+	struct net_device *port_netdev;
+
 	/* VF id in the PF space */
 	s16 vf_id;
 	/* all VF vsis connect to the same parent */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [next-queue v6 PATCH 3/7] i40e: Sync link state between PF/VFs and Port representor netdevs
  2017-03-30  0:22 ` [Intel-wired-lan] " Sridhar Samudrala
@ 2017-03-30  0:22   ` Sridhar Samudrala
  -1 siblings, 0 replies; 46+ messages in thread
From: Sridhar Samudrala @ 2017-03-30  0:22 UTC (permalink / raw)
  To: intel-wired-lan, netdev, alexander.h.duyck, anjali.singhai,
	jakub.kicinski, gerlitz.or, jiri, sridhar.samudrala

This patch syncs the link state between the PF/VFs and their port
representor netdevs:
- The carrier state of a port netdev follows the admin state of its PF/VF,
  and the carrier state of a PF/VF follows the admin state of the
  associated port netdev.
- Bringing a VF port netdev up/down sends a notification to the VF to
  update its link state.
- Bringing a VF up/down (i.e. the VF enabling/disabling its queues) updates
  the carrier state of its port netdev.
- Enabling/disabling the VF link state via ndo_set_vf_link_state updates
  the admin state of the associated VF port netdev.
- Bringing the PF port netdev up/down updates the PF link state based on
  the HW link info.
- Bringing the PF up/down updates the carrier state of the PF port netdev.
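The rules above can be condensed into a few boolean functions. This is a minimal user-space model of the steady state after open/close, assuming switchdev mode with the PF netdev open. struct link_model and the function names are hypothetical. Only the conditions are taken from the diff: i40e_up_complete() and i40e_vsi_link_event() for the PF, i40e_open()/i40e_close() for the PF representor, i40e_port_netdev_open()/_stop() for the VF link, and the virtchnl enable/disable queue handlers for the VF representor.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the states the patch couples together. */
struct link_model {
	bool phy_link_up;     /* I40E_AQ_LINK_UP in hw.phy.link_info */
	bool has_pf_rep;      /* pf->port_netdev != NULL (switchdev mode) */
	bool pf_rep_admin_up; /* IFF_UP set on the PF representor */
	bool pf_admin_up;     /* PF netdev opened via i40e_open() */
	bool vf_rep_admin_up; /* IFF_UP set on the VF representor */
	bool vf_queues_on;    /* VF enabled its queues over virtchnl */
};

/* PF netdev carrier: PHY link, plus an "up" PF representor if one exists. */
static bool pf_carrier(const struct link_model *m)
{
	return m->pf_admin_up && m->phy_link_up &&
	       (!m->has_pf_rep || m->pf_rep_admin_up);
}

/* PF representor carrier: set in i40e_open(), cleared in i40e_close(). */
static bool pf_rep_carrier(const struct link_model *m)
{
	return m->has_pf_rep && m->pf_admin_up;
}

/* Link reported to the VF: forced to the VF representor's admin state. */
static bool vf_link(const struct link_model *m)
{
	return m->vf_rep_admin_up;
}

/* VF representor carrier: follows the VF's queue enable/disable requests. */
static bool vf_rep_carrier(const struct link_model *m)
{
	return m->vf_queues_on;
}
```

In the session below, p4p1 up with p4p1-pf down gives pf_carrier() == false (p4p1 NO-CARRIER). Setting p4p1-pf up makes it true. Without a PF representor (legacy mode), the PHY link alone decides.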

PF: p4p1, VFs: p4p1_0, p4p1_1
PF port netdev: p4p1-pf, VF port netdevs: p4p1-vf0, p4p1-vf1
# rmmod i40e; modprobe i40e
# devlink dev eswitch set pci/0000:42:00.0 mode switchdev
# echo 2 > /sys/class/net/p4p1/device/sriov_numvfs

# ip link set p4p1 up
# ip link show p4p1-pf
29: p4p1-pf: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 02:25:9e:c6:a7:3f brd ff:ff:ff:ff:ff:ff

/* p4p1-pf DOWN -> p4p1 NO-CARRIER */
# ip link show p4p1
27: p4p1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    link/ether 3c:fd:fe:a3:18:f8 brd ff:ff:ff:ff:ff:ff

# ip link set p4p1-pf up
# ip link show p4p1-pf
29: p4p1-pf: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 02:25:9e:c6:a7:3f brd ff:ff:ff:ff:ff:ff

/* p4p1-pf UP -> p4p1 CARRIER ON */
# ip link show p4p1
27: p4p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 3c:fd:fe:a3:18:f8 brd ff:ff:ff:ff:ff:ff

# ip link set p4p1-vf0 up
# ip link set p4p1_0 up

/* p4p1_0 UP -> p4p1-vf0 CARRIER ON */
# ip link show p4p1-vf0
30: p4p1-vf0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether d2:7c:19:e6:e2:ef brd ff:ff:ff:ff:ff:ff

/* p4p1-vf0 UP -> p4p1_0 CARRIER ON */
# ip link show p4p1_0
32: p4p1_0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether ba:29:a3:bb:a0:d5 brd ff:ff:ff:ff:ff:ff

# ip link set p4p1_1 up

/* p4p1-vf1 DOWN -> p4p1_1 NO-CARRIER */
# ip link show p4p1_1
33: p4p1_1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    link/ether 3e:fd:68:79:91:02 brd ff:ff:ff:ff:ff:ff

# ip link set p4p1_1 down
# ip link set p4p1-vf1 up

/* p4p1_1 DOWN -> p4p1-vf1 NO-CARRIER */
# ip link show p4p1-vf1
31: p4p1-vf1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT qlen 1000
    link/ether e2:a0:20:20:c8:b4 brd ff:ff:ff:ff:ff:ff

# ip link set p4p1-vf0 down

/* p4p1-vf0 DOWN -> p4p1_0 NO-CARRIER */
# ip link show p4p1_0
32: p4p1_0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    link/ether 52:36:3d:37:1e:73 brd ff:ff:ff:ff:ff:ff

# ip -d link show p4p1
27: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop portid 6805ca27268 state DOWN mode DEFAULT group default qlen 1000
    link/ether 68:05:ca:2e:72:68 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64
    vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state disable, trust off
    vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state enable, trust off
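The last listing shows the per-VF link-state setting (auto/enable/disable) that ndo_set_vf_link_state controls. In i40e_ndo_set_vf_link_state(), the requested state sets vf->link_forced and vf->link_up. This patch then mirrors vf->link_up onto the VF representor's IFF_UP flag via dev_change_flags(). The sketch below is a user-space model of that mapping. It assumes, as in the upstream driver of that period, that the auto case only clears vf->link_forced. The enum mirrors the IFLA_VF_LINK_STATE_AUTO/ENABLE/DISABLE values (0, 1, 2) from <linux/if_link.h>. struct vf_model and set_vf_link_state() are hypothetical.

```c
#include <net/if.h>	/* IFF_UP when the C library exposes it */
#include <stdbool.h>

#ifndef IFF_UP
#define IFF_UP 0x1
#endif

/* Mirrors IFLA_VF_LINK_STATE_{AUTO,ENABLE,DISABLE} from <linux/if_link.h>. */
enum vf_link_state {
	VF_LINK_STATE_AUTO = 0,
	VF_LINK_STATE_ENABLE = 1,
	VF_LINK_STATE_DISABLE = 2,
};

/* Hypothetical subset of struct i40e_vf plus its representor's flags. */
struct vf_model {
	bool link_forced;
	bool link_up;
	bool has_rep;
	unsigned int rep_flags;
};

/* Returns 0 on success, -1 for an unknown state (the driver uses -EINVAL). */
static int set_vf_link_state(struct vf_model *vf, int link)
{
	switch (link) {
	case VF_LINK_STATE_AUTO:
		vf->link_forced = false;	/* link_up left as it was */
		break;
	case VF_LINK_STATE_ENABLE:
		vf->link_forced = true;
		vf->link_up = true;
		break;
	case VF_LINK_STATE_DISABLE:
		vf->link_forced = true;
		vf->link_up = false;
		break;
	default:
		return -1;
	}
	/* Patch 3/7: reflect vf->link_up in the representor's admin state. */
	if (vf->has_rep) {
		if (vf->link_up)
			vf->rep_flags |= IFF_UP;
		else
			vf->rep_flags &= ~(unsigned int)IFF_UP;
	}
	return 0;
}
```

The model omits one side effect. In the driver, dev_change_flags() runs the representor's ndo_open/ndo_stop whenever IFF_UP actually changes, and those handlers set vf->link_forced to true again. As a result, an "auto" request that flips the representor's admin state ends up forced.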

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c        | 106 ++++++++++++++++++++-
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |  21 +++-
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h |   1 +
 3 files changed, 122 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index e441e39..683aa20 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5316,7 +5316,8 @@ static int i40e_up_complete(struct i40e_vsi *vsi)
 	i40e_vsi_enable_irq(vsi);
 
 	if ((pf->hw.phy.link_info.link_info & I40E_AQ_LINK_UP) &&
-	    (vsi->netdev)) {
+	    (vsi->netdev) && (!pf->port_netdev ||
+			      (pf->port_netdev->flags & IFF_UP))) {
 		i40e_print_link_message(vsi, true);
 		netif_tx_start_all_queues(vsi->netdev);
 		netif_carrier_on(vsi->netdev);
@@ -5518,6 +5519,9 @@ int i40e_open(struct net_device *netdev)
 
 	udp_tunnel_get_rx_info(netdev);
 
+	if (pf->port_netdev)
+		netif_carrier_on(pf->port_netdev);
+
 	return 0;
 }
 
@@ -5667,9 +5671,13 @@ int i40e_close(struct net_device *netdev)
 {
 	struct i40e_netdev_priv *np = netdev_priv(netdev);
 	struct i40e_vsi *vsi = np->vsi;
+	struct i40e_pf *pf = vsi->back;
 
 	i40e_vsi_close(vsi);
 
+	if (pf->port_netdev)
+		netif_carrier_off(pf->port_netdev);
+
 	return 0;
 }
 
@@ -6181,10 +6189,15 @@ static void i40e_vsi_link_event(struct i40e_vsi *vsi, bool link_up)
 
 	switch (vsi->type) {
 	case I40E_VSI_MAIN:
+	{
+		struct i40e_pf *pf = vsi->back;
+		struct net_device *port_netdev = pf->port_netdev;
+
 		if (!vsi->netdev || !vsi->netdev_registered)
 			break;
 
-		if (link_up) {
+		if (link_up && (!port_netdev ||
+				(port_netdev->flags & IFF_UP))) {
 			netif_carrier_on(vsi->netdev);
 			netif_tx_wake_all_queues(vsi->netdev);
 		} else {
@@ -6192,7 +6205,7 @@ static void i40e_vsi_link_event(struct i40e_vsi *vsi, bool link_up)
 			netif_tx_stop_all_queues(vsi->netdev);
 		}
 		break;
-
+	}
 	case I40E_VSI_SRIOV:
 	case I40E_VSI_VMDQ2:
 	case I40E_VSI_CTRL:
@@ -10914,7 +10927,35 @@ static int i40e_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode)
  **/
 static int i40e_port_netdev_open(struct net_device *dev)
 {
-	return 0;
+	struct i40e_port_netdev_priv *priv = netdev_priv(dev);
+	struct i40e_pf *pf;
+	struct i40e_vf *vf;
+	struct i40e_vsi *vsi;
+	int err = 0;
+
+	switch (priv->type) {
+	case I40E_PORT_NETDEV_VF:
+		vf = (struct i40e_vf *)priv->f;
+		vf->link_forced = true;
+		vf->link_up = true;
+		i40e_vc_notify_vf_link_state(vf);
+		break;
+	case I40E_PORT_NETDEV_PF:
+		pf = (struct i40e_pf *)priv->f;
+		vsi = pf->vsi[pf->lan_vsi];
+		if (pf->hw.phy.link_info.link_info & I40E_AQ_LINK_UP) {
+			netif_carrier_on(vsi->netdev);
+			netif_tx_start_all_queues(vsi->netdev);
+		} else {
+			err = -ENETDOWN;
+		}
+		break;
+	default:
+		err = -EINVAL;
+		break;
+	}
+
+	return err;
 }
 
 /**
@@ -10925,7 +10966,31 @@ static int i40e_port_netdev_open(struct net_device *dev)
  **/
 static int i40e_port_netdev_stop(struct net_device *dev)
 {
-	return 0;
+	struct i40e_port_netdev_priv *priv = netdev_priv(dev);
+	struct i40e_pf *pf;
+	struct i40e_vf *vf;
+	struct i40e_vsi *vsi;
+	int err = 0;
+
+	switch (priv->type) {
+	case I40E_PORT_NETDEV_VF:
+		vf = (struct i40e_vf *)priv->f;
+		vf->link_forced = true;
+		vf->link_up = false;
+		i40e_vc_notify_vf_link_state(vf);
+		break;
+	case I40E_PORT_NETDEV_PF:
+		pf = (struct i40e_pf *)priv->f;
+		vsi = pf->vsi[pf->lan_vsi];
+		netif_carrier_off(vsi->netdev);
+		netif_tx_stop_all_queues(vsi->netdev);
+		break;
+	default:
+		err = -EINVAL;
+		break;
+	}
+
+	return err;
 }
 
 static const struct net_device_ops i40e_port_netdev_ops = {
@@ -11013,6 +11078,26 @@ int i40e_alloc_port_netdev(void *f, enum i40e_port_netdev_type type)
 		 ((type == I40E_PORT_NETDEV_PF) ? "PF" : "VF"),
 		 port_netdev->name);
 
+	switch (type) {
+	case I40E_PORT_NETDEV_PF:
+		/* Reset PF link as we are changing the mode to 'switchdev'.
+		 * Port netdev needs to be brought up to enable PF link.
+		 */
+		netif_carrier_off(vsi->netdev);
+		netif_tx_stop_all_queues(vsi->netdev);
+		if (pf->hw.phy.link_info.link_info & I40E_AQ_LINK_UP)
+			netif_carrier_on(port_netdev);
+		break;
+	case I40E_PORT_NETDEV_VF:
+		/* Reset VF link as we are changing the mode to 'switchdev'.
+		 * Port netdev needs to be brought up to enable VF link.
+		 */
+		vf->link_forced = true;
+		vf->link_up = false;
+		i40e_vc_notify_vf_link_state(vf);
+		break;
+	}
+
 	return 0;
 }
 
@@ -11027,10 +11112,12 @@ void i40e_free_port_netdev(void *f, enum i40e_port_netdev_type type)
 {
 	struct i40e_pf *pf;
 	struct i40e_vf *vf;
+	struct i40e_vsi *vsi;
 
 	switch (type) {
 	case I40E_PORT_NETDEV_PF:
 		pf = (struct i40e_pf *)f;
+		vsi = pf->vsi[pf->lan_vsi];
 
 		if (!pf->port_netdev)
 			return;
@@ -11039,6 +11126,11 @@ void i40e_free_port_netdev(void *f, enum i40e_port_netdev_type type)
 		unregister_netdev(pf->port_netdev);
 		free_netdev(pf->port_netdev);
 		pf->port_netdev = NULL;
+		/* In legacy mode, PF link is not controlled by Port netdev */
+		if (pf->hw.phy.link_info.link_info & I40E_AQ_LINK_UP) {
+			netif_carrier_on(vsi->netdev);
+			netif_tx_start_all_queues(vsi->netdev);
+		}
 		break;
 	case I40E_PORT_NETDEV_VF:
 		vf = (struct i40e_vf *)f;
@@ -11051,6 +11143,10 @@ void i40e_free_port_netdev(void *f, enum i40e_port_netdev_type type)
 		unregister_netdev(vf->port_netdev);
 		free_netdev(vf->port_netdev);
 		vf->port_netdev = NULL;
+
+		/* In legacy mode, VF link is not controlled by Port netdev */
+		vf->link_forced = false;
+		i40e_vc_notify_vf_link_state(vf);
 		break;
 	default:
 		break;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index e89f4c4..7c2e7b0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -68,7 +68,7 @@ static void i40e_vc_vf_broadcast(struct i40e_pf *pf,
  *
  * send a link status message to a single VF
  **/
-static void i40e_vc_notify_vf_link_state(struct i40e_vf *vf)
+void i40e_vc_notify_vf_link_state(struct i40e_vf *vf)
 {
 	struct i40e_virtchnl_pf_event pfe;
 	struct i40e_pf *pf = vf->pf;
@@ -1805,6 +1805,10 @@ static int i40e_vc_enable_queues_msg(struct i40e_vf *vf, u8 *msg, u16 msglen)
 
 	if (i40e_vsi_start_rings(pf->vsi[vf->lan_vsi_idx]))
 		aq_ret = I40E_ERR_TIMEOUT;
+
+	if ((aq_ret == 0) && vf->port_netdev)
+		netif_carrier_on(vf->port_netdev);
+
 error_param:
 	/* send the response to the VF */
 	return i40e_vc_send_resp_to_vf(vf, I40E_VIRTCHNL_OP_ENABLE_QUEUES,
@@ -1844,6 +1848,9 @@ static int i40e_vc_disable_queues_msg(struct i40e_vf *vf, u8 *msg, u16 msglen)
 
 	i40e_vsi_stop_rings(pf->vsi[vf->lan_vsi_idx]);
 
+	if ((aq_ret == 0) && vf->port_netdev)
+		netif_carrier_off(vf->port_netdev);
+
 error_param:
 	/* send the response to the VF */
 	return i40e_vc_send_resp_to_vf(vf, I40E_VIRTCHNL_OP_DISABLE_QUEUES,
@@ -3080,6 +3087,7 @@ int i40e_ndo_set_vf_link_state(struct net_device *netdev, int vf_id, int link)
 	struct i40e_pf *pf = np->vsi->back;
 	struct i40e_virtchnl_pf_event pfe;
 	struct i40e_hw *hw = &pf->hw;
+	struct net_device *port_netdev;
 	struct i40e_vf *vf;
 	int abs_vf_id;
 	int ret = 0;
@@ -3121,6 +3129,17 @@ int i40e_ndo_set_vf_link_state(struct net_device *netdev, int vf_id, int link)
 		ret = -EINVAL;
 		goto error_out;
 	}
+
+	port_netdev = vf->port_netdev;
+	if (port_netdev) {
+		unsigned int flags = port_netdev->flags;
+
+		if (vf->link_up)
+			dev_change_flags(port_netdev, flags | IFF_UP);
+		else
+			dev_change_flags(port_netdev, flags & ~IFF_UP);
+	}
+
 	/* Notify the VF of its new link state */
 	i40e_aq_send_msg_to_vf(hw, abs_vf_id, I40E_VIRTCHNL_OP_EVENT,
 			       0, (u8 *)&pfe, sizeof(pfe), NULL);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
index b24d0c6..3e1f8f6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
@@ -146,5 +146,6 @@ int i40e_ndo_get_vf_config(struct net_device *netdev,
 
 void i40e_vc_notify_link_state(struct i40e_pf *pf);
 void i40e_vc_notify_reset(struct i40e_pf *pf);
+void i40e_vc_notify_vf_link_state(struct i40e_vf *vf);
 
 #endif /* _I40E_VIRTCHNL_PF_H_ */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
index b24d0c6..3e1f8f6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
@@ -146,5 +146,6 @@ int i40e_ndo_get_vf_config(struct net_device *netdev,
 
 void i40e_vc_notify_link_state(struct i40e_pf *pf);
 void i40e_vc_notify_reset(struct i40e_pf *pf);
+void i40e_vc_notify_vf_link_state(struct i40e_vf *vf);
 
 #endif /* _I40E_VIRTCHNL_PF_H_ */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread
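The link-state coupling introduced by patch 3/7 can be summarised as a small state machine. Below is a userspace model, not driver code. `struct vf_model` and the helper names are hypothetical. The reporting rule follows the `link_forced`/`link_up` fields used above: a forced link overrides the physical one. The sketch also assumes that opening the VF port netdev is what raises the forced link, as the "Port netdev needs to be brought up to enable VF link" comment suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-VF fields touched by this patch. */
struct vf_model {
	bool link_forced;      /* PF overrides the physical link state */
	bool link_up;          /* forced state, valid when link_forced */
	bool has_port_netdev;  /* switchdev mode: a port representor exists */
};

/* Link state reported to the VF: the forced value wins, else physical. */
static bool vf_reported_link(const struct vf_model *vf, bool phys_link_up)
{
	return vf->link_forced ? vf->link_up : phys_link_up;
}

/* Entering switchdev mode: force the VF link down until its port
 * netdev is brought up (cf. i40e_alloc_port_netdev()). */
static void enter_switchdev(struct vf_model *vf)
{
	vf->has_port_netdev = true;
	vf->link_forced = true;
	vf->link_up = false;
}

/* Assumed behaviour: opening/closing the VF port netdev drives the
 * forced link state seen by the VF. */
static void port_netdev_set_up(struct vf_model *vf, bool up)
{
	if (vf->has_port_netdev)
		vf->link_up = up;
}

/* Back to legacy mode: the VF link follows the physical link again
 * (cf. i40e_free_port_netdev()). */
static void enter_legacy(struct vf_model *vf)
{
	vf->has_port_netdev = false;
	vf->link_forced = false;
}
```

In this model the only difference between the two modes is whether `link_forced` is set. That matches how the patch toggles `vf->link_forced` when the port netdev is allocated and freed.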

* [next-queue v6 PATCH 4/7] net: store port/representator id in metadata_dst
  2017-03-30  0:22 ` [Intel-wired-lan] " Sridhar Samudrala
@ 2017-03-30  0:22   ` Sridhar Samudrala
  -1 siblings, 0 replies; 46+ messages in thread
From: Sridhar Samudrala @ 2017-03-30  0:22 UTC (permalink / raw)
  To: intel-wired-lan, netdev, alexander.h.duyck, anjali.singhai,
	jakub.kicinski, gerlitz.or, jiri, sridhar.samudrala

From: Jakub Kicinski <jakub.kicinski@netronome.com>

Switches and modern SR-IOV enabled NICs may multiplex traffic from port
representators and control messages over a single set of hardware queues.
Control messages and muxed traffic may need ordered delivery.

Those requirements make it hard to comfortably use TC infrastructure today
unless we have a way of attaching metadata to skbs at the upper device.
Because a single set of queues is used for many netdevs, reliably stopping
the TC/sched queues of all of them is impossible, so the lower device has
to fall back to returning NETDEV_TX_BUSY and usually has to take extra
locks on the fast path.

This patch lets port/representator devices attach metadata carrying the
port id to skbs. This way representators can be queueless and all queuing
can be performed at the lower netdev in the usual way.
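Here is a minimal userspace sketch of that split. It uses hypothetical stand-in types (`struct pkt`, `struct port_md`, `struct lower_dev`, `struct repr_dev`), not kernel structures. Each representator owns one preallocated port-id record. Its transmit path only tags the packet and hands it to the single queue owned by the lower device, so backpressure is applied in one place. In the kernel, the tagging step would instead take a reference on a shared metadata_dst (dst_hold()) and attach it with skb_dst_set().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct lower_dev;

/* Stand-in for struct hw_port_info. */
struct port_md {
	const struct lower_dev *lower_dev;
	uint32_t port_id;
};

/* Stand-in for an skb carrying a metadata dst. */
struct pkt {
	const struct port_md *md;
	int payload;
};

#define QLEN 8

/* The lower device owns the only TX queue (a small ring buffer). */
struct lower_dev {
	struct pkt *q[QLEN];
	size_t head, count;
};

/* A representator has no queue: just its port id and the lower dev. */
struct repr_dev {
	struct port_md md;
	struct lower_dev *lower;
};

static int lower_enqueue(struct lower_dev *ld, struct pkt *p)
{
	if (ld->count == QLEN)
		return -1;             /* backpressure only at the lower dev */
	ld->q[(ld->head + ld->count) % QLEN] = p;
	ld->count++;
	return 0;
}

static struct pkt *lower_dequeue(struct lower_dev *ld)
{
	struct pkt *p;

	if (ld->count == 0)
		return NULL;
	p = ld->q[ld->head];
	ld->head = (ld->head + 1) % QLEN;
	ld->count--;
	return p;
}

/* Representator xmit: tag with the port id and hand down. */
static int repr_xmit(struct repr_dev *rd, struct pkt *p)
{
	p->md = &rd->md;
	return lower_enqueue(rd->lower, p);
}
```

Because every representator shares the same ring, packets from different ports stay in submission order. This is the ordered-delivery property the commit message asks for.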

Traffic arriving on the port/representator interfaces will have metadata
attached and will subsequently be queued to the lower device for
transmission. The lower device should recognize the metadata and translate
it to a HW-specific format, most likely either a special header inserted
before the network headers or descriptor/metadata fields.
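The following sketch illustrates the "special header" option. It prepends a hypothetical 4-byte big-endian port-id header to a frame. The header layout and the function name are assumptions for illustration only, since real hardware defines its own format. i40e itself uses the descriptor-field option instead (patch 5/7).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header: 4-byte big-endian port id placed before the frame. */
#define HYPO_PORT_HDR_LEN 4

/* Writes header + frame into out; returns the total length, or -1 if
 * out is too small to hold both. */
static long hw_prepend_port_hdr(uint8_t *out, size_t out_len,
				const uint8_t *frame, size_t frame_len,
				uint32_t port_id)
{
	if (frame_len > out_len || out_len - frame_len < HYPO_PORT_HDR_LEN)
		return -1;
	out[0] = (uint8_t)(port_id >> 24);
	out[1] = (uint8_t)(port_id >> 16);
	out[2] = (uint8_t)(port_id >> 8);
	out[3] = (uint8_t)port_id;
	memcpy(out + HYPO_PORT_HDR_LEN, frame, frame_len);
	return (long)(HYPO_PORT_HDR_LEN + frame_len);
}
```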

Metadata is associated with the lower device by storing the netdev pointer
along with the port id, so that if TC redirects or mirrors the skb, the new
netdev will not try to interpret it.
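The ownership check described above might look like the following sketch. The lower device interprets port metadata only when two conditions hold: the type is the HW port mux type, and the stored lower_dev pointer is the device itself. The types here are hypothetical userspace stand-ins for struct metadata_dst and struct hw_port_info.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct netdev { const char *name; };

enum md_type { MD_IP_TUNNEL, MD_HW_PORT_MUX };

struct md {
	enum md_type type;
	const struct netdev *lower_dev;
	uint32_t port_id;
};

/* Returns true and stores the port id only if the metadata was created
 * for this very device; otherwise the frame is treated as ordinary
 * traffic (e.g. after a TC redirect or mirror to another netdev). */
static bool md_port_for_dev(const struct md *md, const struct netdev *dev,
			    uint32_t *port_id)
{
	if (!md || md->type != MD_HW_PORT_MUX || md->lower_dev != dev)
		return false;
	*port_id = md->port_id;
	return true;
}
```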

This is mostly for SR-IOV devices since switches don't have lower netdevs
today.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
---
 include/net/dst_metadata.h     | 41 ++++++++++++++++++++++++++++++++---------
 net/core/dst.c                 | 15 ++++++++++-----
 net/core/filter.c              |  1 +
 net/ipv4/ip_tunnel_core.c      |  6 ++++--
 net/openvswitch/flow_netlink.c |  4 +++-
 5 files changed, 50 insertions(+), 17 deletions(-)

diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 701fc81..a803129 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -5,10 +5,22 @@
 #include <net/ip_tunnels.h>
 #include <net/dst.h>
 
+enum metadata_type {
+	METADATA_IP_TUNNEL,
+	METADATA_HW_PORT_MUX,
+};
+
+struct hw_port_info {
+	struct net_device *lower_dev;
+	u32 port_id;
+};
+
 struct metadata_dst {
 	struct dst_entry		dst;
+	enum metadata_type		type;
 	union {
 		struct ip_tunnel_info	tun_info;
+		struct hw_port_info	port_info;
 	} u;
 };
 
@@ -27,7 +39,7 @@ static inline struct ip_tunnel_info *skb_tunnel_info(struct sk_buff *skb)
 	struct metadata_dst *md_dst = skb_metadata_dst(skb);
 	struct dst_entry *dst;
 
-	if (md_dst)
+	if (md_dst && md_dst->type == METADATA_IP_TUNNEL)
 		return &md_dst->u.tun_info;
 
 	dst = skb_dst(skb);
@@ -55,22 +67,33 @@ static inline int skb_metadata_dst_cmp(const struct sk_buff *skb_a,
 	a = (const struct metadata_dst *) skb_dst(skb_a);
 	b = (const struct metadata_dst *) skb_dst(skb_b);
 
-	if (!a != !b || a->u.tun_info.options_len != b->u.tun_info.options_len)
+	if (!a != !b || a->type != b->type)
 		return 1;
 
-	return memcmp(&a->u.tun_info, &b->u.tun_info,
-		      sizeof(a->u.tun_info) + a->u.tun_info.options_len);
+	switch (a->type) {
+	case METADATA_HW_PORT_MUX:
+		return memcmp(&a->u.port_info, &b->u.port_info,
+			      sizeof(a->u.port_info));
+	case METADATA_IP_TUNNEL:
+		return memcmp(&a->u.tun_info, &b->u.tun_info,
+			      sizeof(a->u.tun_info) +
+					 a->u.tun_info.options_len);
+	default:
+		return 1;
+	}
 }
 
 void metadata_dst_free(struct metadata_dst *);
-struct metadata_dst *metadata_dst_alloc(u8 optslen, gfp_t flags);
-struct metadata_dst __percpu *metadata_dst_alloc_percpu(u8 optslen, gfp_t flags);
+struct metadata_dst *metadata_dst_alloc(u8 optslen, enum metadata_type type,
+					gfp_t flags);
+struct metadata_dst __percpu *
+metadata_dst_alloc_percpu(u8 optslen, enum metadata_type type, gfp_t flags);
 
 static inline struct metadata_dst *tun_rx_dst(int md_size)
 {
 	struct metadata_dst *tun_dst;
 
-	tun_dst = metadata_dst_alloc(md_size, GFP_ATOMIC);
+	tun_dst = metadata_dst_alloc(md_size, METADATA_IP_TUNNEL, GFP_ATOMIC);
 	if (!tun_dst)
 		return NULL;
 
@@ -85,11 +108,11 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
 	int md_size;
 	struct metadata_dst *new_md;
 
-	if (!md_dst)
+	if (!md_dst || md_dst->type != METADATA_IP_TUNNEL)
 		return ERR_PTR(-EINVAL);
 
 	md_size = md_dst->u.tun_info.options_len;
-	new_md = metadata_dst_alloc(md_size, GFP_ATOMIC);
+	new_md = metadata_dst_alloc(md_size, METADATA_IP_TUNNEL, GFP_ATOMIC);
 	if (!new_md)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/net/core/dst.c b/net/core/dst.c
index 960e503..230e430 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -366,7 +366,9 @@ static int dst_md_discard(struct sk_buff *skb)
 	return 0;
 }
 
-static void __metadata_dst_init(struct metadata_dst *md_dst, u8 optslen)
+static void __metadata_dst_init(struct metadata_dst *md_dst,
+				enum metadata_type type, u8 optslen)
+
 {
 	struct dst_entry *dst;
 
@@ -378,9 +380,11 @@ static void __metadata_dst_init(struct metadata_dst *md_dst, u8 optslen)
 	dst->output = dst_md_discard_out;
 
 	memset(dst + 1, 0, sizeof(*md_dst) + optslen - sizeof(*dst));
+	md_dst->type = type;
 }
 
-struct metadata_dst *metadata_dst_alloc(u8 optslen, gfp_t flags)
+struct metadata_dst *metadata_dst_alloc(u8 optslen, enum metadata_type type,
+					gfp_t flags)
 {
 	struct metadata_dst *md_dst;
 
@@ -388,7 +392,7 @@ struct metadata_dst *metadata_dst_alloc(u8 optslen, gfp_t flags)
 	if (!md_dst)
 		return NULL;
 
-	__metadata_dst_init(md_dst, optslen);
+	__metadata_dst_init(md_dst, type, optslen);
 
 	return md_dst;
 }
@@ -402,7 +406,8 @@ void metadata_dst_free(struct metadata_dst *md_dst)
 	kfree(md_dst);
 }
 
-struct metadata_dst __percpu *metadata_dst_alloc_percpu(u8 optslen, gfp_t flags)
+struct metadata_dst __percpu *
+metadata_dst_alloc_percpu(u8 optslen, enum metadata_type type, gfp_t flags)
 {
 	int cpu;
 	struct metadata_dst __percpu *md_dst;
@@ -413,7 +418,7 @@ struct metadata_dst __percpu *metadata_dst_alloc_percpu(u8 optslen, gfp_t flags)
 		return NULL;
 
 	for_each_possible_cpu(cpu)
-		__metadata_dst_init(per_cpu_ptr(md_dst, cpu), optslen);
+		__metadata_dst_init(per_cpu_ptr(md_dst, cpu), type, optslen);
 
 	return md_dst;
 }
diff --git a/net/core/filter.c b/net/core/filter.c
index dfb9f61..ab29297 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2531,6 +2531,7 @@ static unsigned short bpf_tunnel_key_af(u64 flags)
 		 * that is holding verifier mutex.
 		 */
 		md_dst = metadata_dst_alloc_percpu(IP_TUNNEL_OPTS_MAX,
+						   METADATA_IP_TUNNEL,
 						   GFP_KERNEL);
 		if (!md_dst)
 			return NULL;
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index a31f47c..cb1328e 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -134,10 +134,12 @@ struct metadata_dst *iptunnel_metadata_reply(struct metadata_dst *md,
 	struct metadata_dst *res;
 	struct ip_tunnel_info *dst, *src;
 
-	if (!md || md->u.tun_info.mode & IP_TUNNEL_INFO_TX)
+	if (!md || md->type != METADATA_IP_TUNNEL ||
+	    md->u.tun_info.mode & IP_TUNNEL_INFO_TX)
+
 		return NULL;
 
-	res = metadata_dst_alloc(0, flags);
+	res = metadata_dst_alloc(0, METADATA_IP_TUNNEL, flags);
 	if (!res)
 		return NULL;
 
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index df82b81..a36efae 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -2202,7 +2202,9 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 	if (start < 0)
 		return start;
 
-	tun_dst = metadata_dst_alloc(key.tun_opts_len, GFP_KERNEL);
+	tun_dst = metadata_dst_alloc(key.tun_opts_len, METADATA_IP_TUNNEL,
+				     GFP_KERNEL);
+
 	if (!tun_dst)
 		return -ENOMEM;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread
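The type field added to struct metadata_dst turns the union into a tagged union. Every accessor must check the type before touching u.tun_info or u.port_info, as skb_tunnel_info(), tun_dst_unclone() and iptunnel_metadata_reply() now do. A minimal userspace model of that discipline follows, and the `_model` types are hypothetical simplifications. The model compares fields explicitly instead of using memcmp(), because a stack-allocated struct may contain uninitialised padding. The kernel can use memcmp() because __metadata_dst_init() zeroes everything after the dst_entry before setting the type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum metadata_type_model { MDM_IP_TUNNEL, MDM_HW_PORT_MUX };

struct tun_info_model {
	uint32_t tun_id;
	uint8_t options_len;       /* valid bytes in options[] */
	uint8_t options[8];
};

struct port_info_model {
	const void *lower_dev;
	uint32_t port_id;
};

struct md_model {
	enum metadata_type_model type;
	union {
		struct tun_info_model tun_info;
		struct port_info_model port_info;
	} u;
};

/* Like skb_tunnel_info(): hand out tun_info only for tunnel metadata. */
static const struct tun_info_model *md_tun_info(const struct md_model *md)
{
	return (md && md->type == MDM_IP_TUNNEL) ? &md->u.tun_info : NULL;
}

/* Like skb_metadata_dst_cmp(): 0 means equal, non-zero means different. */
static int md_cmp(const struct md_model *a, const struct md_model *b)
{
	if (!a != !b)
		return 1;
	if (!a)
		return 0;              /* neither packet carries metadata */
	if (a->type != b->type)
		return 1;

	switch (a->type) {
	case MDM_HW_PORT_MUX:
		return a->u.port_info.lower_dev != b->u.port_info.lower_dev ||
		       a->u.port_info.port_id != b->u.port_info.port_id;
	case MDM_IP_TUNNEL:
		if (a->u.tun_info.options_len > sizeof(a->u.tun_info.options))
			return 1;      /* malformed, never equal */
		return a->u.tun_info.tun_id != b->u.tun_info.tun_id ||
		       a->u.tun_info.options_len != b->u.tun_info.options_len ||
		       memcmp(a->u.tun_info.options, b->u.tun_info.options,
			      a->u.tun_info.options_len) != 0;
	default:
		return 1;
	}
}
```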

* [next-queue v6 PATCH 5/7] i40e: Add TX and RX support over port netdev's in switchdev mode
  2017-03-30  0:22 ` [Intel-wired-lan] " Sridhar Samudrala
@ 2017-03-30  0:22   ` Sridhar Samudrala
  -1 siblings, 0 replies; 46+ messages in thread
From: Sridhar Samudrala @ 2017-03-30  0:22 UTC (permalink / raw)
  To: intel-wired-lan, netdev, alexander.h.duyck, anjali.singhai,
	jakub.kicinski, gerlitz.or, jiri, sridhar.samudrala

In switchdev mode, broadcasts from VFs are received by the PF and passed
to the corresponding port representor netdev.
Any frames sent via port netdevs are sent as directed transmits to the
corresponding VF (or, for the PF port netdev, to the PF's main VSI). To
enable directed transmit, the skb metadata dst is used to pass the port
id, and the frame is requeued to call the PF's transmit routine. The VF id
is used as the port id for VFs, and the PF port id is defined as
I40E_MAIN_VSI_PORT_ID.
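The directed-transmit path can be sketched as follows, under stated assumptions. The lookup follows the port-id convention described here and the bounds check in i40e_tvsi(). It then ORs a switch command and the target VSI number into quad word 1 of the context descriptor. The shift, mask and command values are illustrative placeholders, not the definitions from i40e_type.h. `port_to_vsi()` and `ctx_qw1_directed()` are hypothetical names. In the patch, this encoding is applied only when i40e_tso() did not set up TSO, since the two are treated as mutually exclusive.

```c
#include <assert.h>
#include <stdint.h>

#define MAIN_VSI_PORT_ID   (1u << 15)   /* as I40E_MAIN_VSI_PORT_ID */

/* Illustrative field layout only; not copied from i40e_type.h. */
#define CTX_CMD_SHIFT      4
#define CTX_CMD_SWTCH_VSI  0x3ull
#define CTX_VSI_SHIFT      50
#define CTX_VSI_MASK       (0x3ffull << CTX_VSI_SHIFT)

/* Map a port id to a target VSI number: VF index for VFs,
 * MAIN_VSI_PORT_ID for the PF. Returns -1 for an unknown port id. */
static int port_to_vsi(uint32_t port_id, uint32_t num_vfs,
		       const uint16_t *vf_vsi_ids, uint16_t pf_vsi_id)
{
	if (port_id == MAIN_VSI_PORT_ID)
		return pf_vsi_id;
	if (port_id >= num_vfs)
		return -1;
	return vf_vsi_ids[port_id];
}

/* OR the "switch to VSI" command and the target VSI into qword 1. */
static uint64_t ctx_qw1_directed(uint64_t qw1, uint16_t vsi_id)
{
	uint64_t tvsi = ((uint64_t)vsi_id << CTX_VSI_SHIFT) & CTX_VSI_MASK;

	return qw1 | (CTX_CMD_SWTCH_VSI << CTX_CMD_SHIFT) | tvsi;
}
```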

A small script to demonstrate inter-VF and PF-to-VF pings in switchdev mode.
PF: p4p1, VFs: p4p1_0, p4p1_1, VF port reps: p4p1-vf0, p4p1-vf1,
PF port rep: p4p1-pf

# rmmod i40e; modprobe i40e
# devlink dev eswitch set pci/0000:05:00.0 mode switchdev
# echo 2 > /sys/class/net/p4p1/device/sriov_numvfs
# ip link set p4p1 vf 0 mac 00:11:22:33:44:55
# ip link set p4p1 vf 1 mac 00:11:22:33:44:56
# rmmod i40evf; modprobe i40evf

/* Create 2 namespaces and move the VFs to the corresponding ns */
# ip netns add ns0
# ip link set p4p1_0 netns ns0
# ip netns exec ns0 ip addr add 192.168.1.10/24 dev p4p1_0
# ip netns exec ns0 ip link set p4p1_0 up
# ip netns add ns1
# ip link set p4p1_1 netns ns1
# ip netns exec ns1 ip addr add 192.168.1.11/24 dev p4p1_1
# ip netns exec ns1 ip link set p4p1_1 up

/* bring up pf and port netdevs */
# ip addr add 192.168.1.1/24 dev p4p1
# ip link set p4p1 up
# ip link set p4p1-vf0 up
# ip link set p4p1-vf1 up
# ip link set p4p1-pf up

# ip netns exec ns0 ping -c3 192.168.1.11  /* VF0 -> VF1 */
# ip netns exec ns1 ping -c3 192.168.1.10  /* VF1 -> VF0 */
# ping -c3 192.168.1.10   /* PF -> VF0 */
# ping -c3 192.168.1.11   /* PF -> VF1 */

/* VF0 -> IP in same subnet - broadcasts will be seen on p4p1-vf0 & p4p1 */
# ip netns exec ns0 ping -c1 -W1 192.168.1.200
/* VF1 -> IP in same subnet - broadcasts will be seen on p4p1-vf1 & p4p1 */
# ip netns exec ns1 ping -c1 -W1 192.168.1.200
/* port rep VF0 -> IP in same subnet - broadcasts will be seen on p4p1_0 */
# ping -I p4p1-vf0 -c1 -W1 192.168.1.200
/* port rep VF1 -> IP in same subnet - broadcasts will be seen on p4p1_1 */
# ping -I p4p1-vf1 -c1 -W1 192.168.1.200
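The VF broadcasts that show up on p4p1-vf0/p4p1-vf1 in the commands above come from the loopback path in i40e_handle_lpbk_skb(). That path handles broadcast, multicast or other-host frames flagged as loopback. The PF looks up the VF whose default MAC matches the source address and passes a clone, marked with offload_fwd_mark, to that VF's port netdev. The original frame still goes up the PF netdev. The sketch below models only the lookup. `struct vf_entry`, `enum pkt_kind` and the integer netdev handles are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

struct vf_entry {
	uint8_t mac[MAC_LEN];   /* like vf->default_lan_addr.addr */
	int port_netdev;        /* hypothetical handle for vf->port_netdev */
};

enum pkt_kind { PKT_HOST, PKT_BROADCAST, PKT_MULTICAST, PKT_OTHERHOST };

/* Returns the port netdev handle that should also receive a copy of a
 * looped-back frame, or -1 if no port representor should see it. */
static int lpbk_target(const struct vf_entry *vfs, int num_vfs,
		       enum pkt_kind kind, const uint8_t *src_mac)
{
	int i;

	if (kind == PKT_HOST)   /* only bcast/mcast/otherhost are copied */
		return -1;
	for (i = 0; i < num_vfs; i++)
		if (memcmp(vfs[i].mac, src_mac, MAC_LEN) == 0)
			return vfs[i].port_netdev;
	return -1;
}
```

The lookup is a linear scan over the allocated VFs, as in the patch. That is cheap for the small VF counts in the script but grows with num_alloc_vfs.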

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h             |   4 +
 drivers/net/ethernet/intel/i40e/i40e_main.c        |  27 +++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        | 148 ++++++++++++++++++++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h        |   2 +
 drivers/net/ethernet/intel/i40e/i40e_type.h        |   3 +
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |   8 +-
 6 files changed, 184 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index c865803..ac11005 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -55,6 +55,7 @@
 #include <linux/net_tstamp.h>
 #include <linux/ptp_clock_kernel.h>
 #include <net/devlink.h>
+#include <net/dst_metadata.h>
 
 #include "i40e_type.h"
 #include "i40e_prototype.h"
@@ -320,6 +321,8 @@ struct i40e_flex_pit {
 	u8 pit_index;
 };
 
+#define I40E_MAIN_VSI_PORT_ID	(1 << 15)
+
 enum i40e_port_netdev_type {
 	I40E_PORT_NETDEV_PF,
 	I40E_PORT_NETDEV_VF
@@ -328,6 +331,7 @@ enum i40e_port_netdev_type {
 /* Port representor netdev private structure */
 struct i40e_port_netdev_priv {
 	enum i40e_port_netdev_type type;	/* type - PF or VF */
+	struct metadata_dst *dst;		/* port id */
 	void *f;				/* ptr to PF or VF struct */
 };
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 683aa20..e9c5c6b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5519,8 +5519,10 @@ int i40e_open(struct net_device *netdev)
 
 	udp_tunnel_get_rx_info(netdev);
 
-	if (pf->port_netdev)
+	if (pf->port_netdev) {
 		netif_carrier_on(pf->port_netdev);
+		netif_tx_start_all_queues(pf->port_netdev);
+	}
 
 	return 0;
 }
@@ -5675,8 +5677,10 @@ int i40e_close(struct net_device *netdev)
 
 	i40e_vsi_close(vsi);
 
-	if (pf->port_netdev)
+	if (pf->port_netdev) {
 		netif_carrier_off(pf->port_netdev);
+		netif_tx_stop_all_queues(pf->port_netdev);
+	}
 
 	return 0;
 }
@@ -10872,6 +10876,7 @@ static int i40e_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode)
 static int i40e_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode)
 {
 	struct i40e_pf *pf = devlink_priv(devlink);
+	struct i40e_vsi *vsi = pf->vsi[pf->lan_vsi];
 	struct i40e_vf *vf;
 	int i, j, err = 0;
 
@@ -10886,6 +10891,8 @@ static int i40e_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode)
 		}
 		i40e_free_port_netdev(pf, I40E_PORT_NETDEV_PF);
 		pf->eswitch_mode = mode;
+		vsi->netdev->priv_flags |=
+			(IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM);
 		break;
 	case DEVLINK_ESWITCH_MODE_SWITCHDEV:
 		err = i40e_alloc_port_netdev(pf, I40E_PORT_NETDEV_PF);
@@ -10905,6 +10912,7 @@ static int i40e_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode)
 			}
 		}
 		pf->eswitch_mode = mode;
+		netif_keep_dst(vsi->netdev);
 		break;
 	default:
 		err = -EOPNOTSUPP;
@@ -10996,6 +11004,7 @@ static int i40e_port_netdev_stop(struct net_device *dev)
 static const struct net_device_ops i40e_port_netdev_ops = {
 	.ndo_open		= i40e_port_netdev_open,
 	.ndo_stop		= i40e_port_netdev_stop,
+	.ndo_start_xmit		= i40e_port_netdev_start_xmit,
 };
 
 /**
@@ -11034,6 +11043,10 @@ int i40e_alloc_port_netdev(void *f, enum i40e_port_netdev_type type)
 		priv = netdev_priv(port_netdev);
 		priv->f = pf;
 		priv->type = I40E_PORT_NETDEV_PF;
+		priv->dst = metadata_dst_alloc(0, METADATA_HW_PORT_MUX,
+					       GFP_KERNEL);
+		priv->dst->u.port_info.lower_dev = vsi->netdev;
+		priv->dst->u.port_info.port_id = I40E_MAIN_VSI_PORT_ID;
 		break;
 	case I40E_PORT_NETDEV_VF:
 		vf = (struct i40e_vf *)f;
@@ -11055,6 +11068,10 @@ int i40e_alloc_port_netdev(void *f, enum i40e_port_netdev_type type)
 		priv = netdev_priv(port_netdev);
 		priv->f = vf;
 		priv->type = I40E_PORT_NETDEV_VF;
+		priv->dst = metadata_dst_alloc(0, METADATA_HW_PORT_MUX,
+					       GFP_KERNEL);
+		priv->dst->u.port_info.lower_dev = vsi->netdev;
+		priv->dst->u.port_info.port_id = vf->vf_id;
 		break;
 	default:
 		return -EINVAL;
@@ -11070,6 +11087,7 @@ int i40e_alloc_port_netdev(void *f, enum i40e_port_netdev_type type)
 	if (err) {
 		dev_err(&pf->pdev->dev, "register_netdev failed for port netdev: %s\n",
 			port_netdev->name);
+		dst_release((struct dst_entry *)priv->dst);
 		free_netdev(port_netdev);
 		return err;
 	}
@@ -11110,6 +11128,7 @@ int i40e_alloc_port_netdev(void *f, enum i40e_port_netdev_type type)
  **/
 void i40e_free_port_netdev(void *f, enum i40e_port_netdev_type type)
 {
+	struct i40e_port_netdev_priv *priv;
 	struct i40e_pf *pf;
 	struct i40e_vf *vf;
 	struct i40e_vsi *vsi;
@@ -11123,6 +11142,8 @@ void i40e_free_port_netdev(void *f, enum i40e_port_netdev_type type)
 			return;
 		dev_info(&pf->pdev->dev, "Freeing PF Port representor %s\n",
 			 pf->port_netdev->name);
+		priv = netdev_priv(pf->port_netdev);
+		dst_release((struct dst_entry *)priv->dst);
 		unregister_netdev(pf->port_netdev);
 		free_netdev(pf->port_netdev);
 		pf->port_netdev = NULL;
@@ -11140,6 +11161,8 @@ void i40e_free_port_netdev(void *f, enum i40e_port_netdev_type type)
 			return;
 		dev_info(&pf->pdev->dev, "Freeing VF Port representor %s\n",
 			 vf->port_netdev->name);
+		priv = netdev_priv(vf->port_netdev);
+		dst_release((struct dst_entry *)priv->dst);
 		unregister_netdev(vf->port_netdev);
 		free_netdev(vf->port_netdev);
 		vf->port_netdev = NULL;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index ebffca0..86d2510 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1302,20 +1302,64 @@ static bool i40e_alloc_mapped_page(struct i40e_ring *rx_ring,
 }
 
 /**
+ * i40e_handle_lpbk_skb - Pass a clone of a loopback frame to its port netdev
+ * @rx_ring: rx ring in play
+ * @skb: packet to send up
+ **/
+static void i40e_handle_lpbk_skb(struct i40e_ring *rx_ring, struct sk_buff *skb)
+{
+	struct i40e_q_vector *q_vector = rx_ring->q_vector;
+	struct i40e_pf *pf = rx_ring->vsi->back;
+	struct sk_buff *nskb;
+	struct i40e_vf *vf;
+	struct ethhdr *eth;
+	int vf_id;
+
+	if ((skb->pkt_type != PACKET_BROADCAST) &&
+	    (skb->pkt_type != PACKET_MULTICAST) &&
+	    (skb->pkt_type != PACKET_OTHERHOST))
+		return;
+
+	eth = (struct ethhdr *)skb_mac_header(skb);
+
+	/* If a loopback packet is received in switchdev mode, clone the skb
+	 * and pass it to the corresponding port netdev based on the source MAC.
+	 */
+	for (vf_id = 0; vf_id < pf->num_alloc_vfs; vf_id++) {
+		vf = &pf->vf[vf_id];
+		if (ether_addr_equal(eth->h_source,
+				     vf->default_lan_addr.addr)) {
+			nskb = skb_clone(skb, GFP_ATOMIC);
+			if (!nskb)
+				break;
+			nskb->offload_fwd_mark = 1;
+			nskb->dev = vf->port_netdev;
+			napi_gro_receive(&q_vector->napi, nskb);
+			break;
+		}
+	}
+}
+
+/**
  * i40e_receive_skb - Send a completed packet up the stack
  * @rx_ring:  rx ring in play
  * @skb: packet to send up
  * @vlan_tag: vlan tag for packet
+ * @lpbk: is it a loopback frame?
  **/
 static void i40e_receive_skb(struct i40e_ring *rx_ring,
-			     struct sk_buff *skb, u16 vlan_tag)
+			     struct sk_buff *skb, u16 vlan_tag, bool lpbk)
 {
 	struct i40e_q_vector *q_vector = rx_ring->q_vector;
+	struct i40e_pf *pf = rx_ring->vsi->back;
 
 	if ((rx_ring->netdev->features & NETIF_F_HW_VLAN_CTAG_RX) &&
 	    (vlan_tag & VLAN_VID_MASK))
 		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlan_tag);
 
+	if ((pf->eswitch_mode == DEVLINK_ESWITCH_MODE_SWITCHDEV) && lpbk)
+		i40e_handle_lpbk_skb(rx_ring, skb);
+
 	napi_gro_receive(&q_vector->napi, skb);
 }
 
@@ -1528,6 +1572,7 @@ static inline void i40e_rx_hash(struct i40e_ring *ring,
  * @rx_desc: pointer to the EOP Rx descriptor
  * @skb: pointer to current skb being populated
  * @rx_ptype: the packet type decoded by hardware
+ * @lpbk: is it a loopback frame?
  *
  * This function checks the ring, descriptor, and packet information in
  * order to populate the hash, checksum, VLAN, protocol, and
@@ -1536,7 +1581,7 @@ static inline void i40e_rx_hash(struct i40e_ring *ring,
 static inline
 void i40e_process_skb_fields(struct i40e_ring *rx_ring,
 			     union i40e_rx_desc *rx_desc, struct sk_buff *skb,
-			     u8 rx_ptype)
+			     u8 rx_ptype, bool *lpbk)
 {
 	u64 qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
 	u32 rx_status = (qword & I40E_RXD_QW1_STATUS_MASK) >>
@@ -1545,6 +1590,9 @@ void i40e_process_skb_fields(struct i40e_ring *rx_ring,
 	u32 tsyn = (rx_status & I40E_RXD_QW1_STATUS_TSYNINDX_MASK) >>
 		   I40E_RXD_QW1_STATUS_TSYNINDX_SHIFT;
 
+	*lpbk = !!((rx_status & I40E_RXD_QW1_STATUS_LPBK_MASK) >>
+		I40E_RXD_QW1_STATUS_LPBK_SHIFT);
+
 	if (unlikely(tsynvalid))
 		i40e_ptp_rx_hwtstamp(rx_ring->vsi->back, skb, tsyn);
 
@@ -1898,6 +1946,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 		u16 vlan_tag;
 		u8 rx_ptype;
 		u64 qword;
+		bool lpbk;
 
 		/* return some buffers to hardware, one at a time is too slow */
 		if (cleaned_count >= I40E_RX_BUFFER_WRITE) {
@@ -1970,12 +2019,12 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 			   I40E_RXD_QW1_PTYPE_SHIFT;
 
 		/* populate checksum, VLAN, and protocol */
-		i40e_process_skb_fields(rx_ring, rx_desc, skb, rx_ptype);
+		i40e_process_skb_fields(rx_ring, rx_desc, skb, rx_ptype, &lpbk);
 
 		vlan_tag = (qword & BIT(I40E_RX_DESC_STATUS_L2TAG1P_SHIFT)) ?
 			   le16_to_cpu(rx_desc->wb.qword0.lo_dword.l2tag1) : 0;
 
-		i40e_receive_skb(rx_ring, skb, vlan_tag);
+		i40e_receive_skb(rx_ring, skb, vlan_tag, lpbk);
 		skb = NULL;
 
 		/* update budget accounting */
@@ -3037,6 +3086,58 @@ static inline void i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 }
 
 /**
+ * i40e_tvsi - set up the target vsi in TX context descriptor
+ * @skb:     send buffer
+ * @tx_ring: ring to send buffer on
+ * @cd_type_cmd_tso_mss: Quad Word 1
+ *
+ * Returns 0 on success, -EINVAL on error
+ **/
+static int i40e_tvsi(struct sk_buff *skb, struct i40e_ring *tx_ring,
+		     u64 *cd_type_cmd_tso_mss)
+{
+	struct metadata_dst *md_dst = skb_metadata_dst(skb);
+	struct i40e_pf *pf;
+	struct i40e_vsi *t_vsi = NULL;
+	struct i40e_vf *t_vf;
+	u64 cd_cmd, cd_tvsi;
+	u32 port_id;
+
+	/* If skb metadata dst points to a port id, do a directed transmit to
+	 * that VSI. TSO is mutually exclusive with this option, so a directed
+	 * transmit is only set up for frames that do not use TSO.
+	 */
+	if (!md_dst || (md_dst->type != METADATA_HW_PORT_MUX))
+		return 0;
+
+	port_id = md_dst->u.port_info.port_id;
+
+	pf = tx_ring->vsi->back;
+	if ((port_id >= pf->num_alloc_vfs) &&
+	    (port_id != I40E_MAIN_VSI_PORT_ID)) {
+		WARN_ONCE(1, "Unexpected port_id: %d num_vfs:%d\n",
+			  md_dst->u.port_info.port_id, pf->num_alloc_vfs);
+		return -EINVAL;
+	}
+
+	if (port_id == I40E_MAIN_VSI_PORT_ID) {
+		t_vsi = pf->vsi[pf->lan_vsi];
+	} else {
+		t_vf = &pf->vf[port_id];
+		t_vsi = pf->vsi[t_vf->lan_vsi_idx];
+	}
+
+	cd_cmd = I40E_TX_CTX_DESC_SWTCH_VSI;
+	cd_tvsi = t_vsi->id;
+	cd_tvsi = (cd_tvsi << I40E_TXD_CTX_QW1_VSI_SHIFT) &
+		  I40E_TXD_CTX_QW1_VSI_MASK;
+	*cd_type_cmd_tso_mss |= (cd_cmd << I40E_TXD_CTX_QW1_CMD_SHIFT) |
+				 cd_tvsi;
+
+	return 0;
+}
+
+/**
  * i40e_xmit_frame_ring - Sends buffer on Tx ring
  * @skb:     send buffer
  * @tx_ring: ring to send buffer on
@@ -3101,6 +3202,8 @@ static netdev_tx_t i40e_xmit_frame_ring(struct sk_buff *skb,
 		tx_flags |= I40E_TX_FLAGS_IPV6;
 
 	tso = i40e_tso(first, &hdr_len, &cd_type_cmd_tso_mss);
+	if (!tso)
+		tso = i40e_tvsi(skb, tx_ring, &cd_type_cmd_tso_mss);
 
 	if (tso < 0)
 		goto out_drop;
@@ -3164,3 +3267,40 @@ netdev_tx_t i40e_lan_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 
 	return i40e_xmit_frame_ring(skb, tx_ring);
 }
+
+/**
+ * i40e_port_netdev_start_xmit - transmit a frame from a port netdev
+ * @skb:    send buffer
+ * @netdev: network interface device structure
+ *
+ * Sets skb->dev to the PF netdev and the port id in skb->dst, then
+ * requeues the skb via dev_queue_xmit()
+ **/
+netdev_tx_t i40e_port_netdev_start_xmit(struct sk_buff *skb,
+					struct net_device *netdev)
+{
+	struct i40e_port_netdev_priv *priv = netdev_priv(netdev);
+	struct i40e_vsi *vsi;
+	struct i40e_pf *pf;
+	struct i40e_vf *vf;
+
+	switch (priv->type) {
+	case I40E_PORT_NETDEV_VF:
+		vf = (struct i40e_vf *)priv->f;
+		pf = vf->pf;
+		break;
+	case I40E_PORT_NETDEV_PF:
+		pf = (struct i40e_pf *)priv->f;
+		break;
+	default:
+		dev_kfree_skb_any(skb);
+		return NETDEV_TX_OK;
+	}
+
+	vsi = pf->vsi[pf->lan_vsi];
+	dst_hold(&priv->dst->dst);
+	skb_dst_set(skb, &priv->dst->dst);
+	skb->dev = vsi->netdev;
+
+	return dev_queue_xmit(skb);
+}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index d6609de..715de92 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -392,6 +392,8 @@ struct i40e_ring_container {
 
 bool i40e_alloc_rx_buffers(struct i40e_ring *rxr, u16 cleaned_count);
 netdev_tx_t i40e_lan_xmit_frame(struct sk_buff *skb, struct net_device *netdev);
+netdev_tx_t i40e_port_netdev_start_xmit(struct sk_buff *skb,
+					struct net_device *netdev);
 void i40e_clean_tx_ring(struct i40e_ring *tx_ring);
 void i40e_clean_rx_ring(struct i40e_ring *rx_ring);
 int i40e_setup_tx_descriptors(struct i40e_ring *tx_ring);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h
index 9200f2d..08364a4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
@@ -729,6 +729,9 @@ enum i40e_rx_desc_status_bits {
 #define I40E_RXD_QW1_STATUS_TSYNVALID_SHIFT  I40E_RX_DESC_STATUS_TSYNVALID_SHIFT
 #define I40E_RXD_QW1_STATUS_TSYNVALID_MASK \
 				    BIT_ULL(I40E_RXD_QW1_STATUS_TSYNVALID_SHIFT)
+#define I40E_RXD_QW1_STATUS_LPBK_SHIFT  I40E_RX_DESC_STATUS_LPBK_SHIFT
+#define I40E_RXD_QW1_STATUS_LPBK_MASK \
+				BIT_ULL(I40E_RXD_QW1_STATUS_LPBK_SHIFT)
 
 enum i40e_rx_desc_fltstat_values {
 	I40E_RX_DESC_FLTSTAT_NO_DATA	= 0,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 7c2e7b0..f8d25cb 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1806,8 +1806,10 @@ static int i40e_vc_enable_queues_msg(struct i40e_vf *vf, u8 *msg, u16 msglen)
 	if (i40e_vsi_start_rings(pf->vsi[vf->lan_vsi_idx]))
 		aq_ret = I40E_ERR_TIMEOUT;
 
-	if ((aq_ret == 0) && vf->port_netdev)
+	if ((aq_ret == 0) && vf->port_netdev) {
 		netif_carrier_on(vf->port_netdev);
+		netif_tx_start_all_queues(vf->port_netdev);
+	}
 
 error_param:
 	/* send the response to the VF */
@@ -1848,8 +1850,10 @@ static int i40e_vc_disable_queues_msg(struct i40e_vf *vf, u8 *msg, u16 msglen)
 
 	i40e_vsi_stop_rings(pf->vsi[vf->lan_vsi_idx]);
 
-	if ((aq_ret == 0) && vf->port_netdev)
+	if ((aq_ret == 0) && vf->port_netdev) {
+		netif_tx_stop_all_queues(vf->port_netdev);
 		netif_carrier_off(vf->port_netdev);
+	}
 
 error_param:
 	/* send the response to the VF */
-- 
1.8.3.1


* [Intel-wired-lan] [next-queue v6 PATCH 5/7] i40e: Add TX and RX support over port netdev's in switchdev mode
From: Sridhar Samudrala @ 2017-03-30  0:22 UTC (permalink / raw)
  To: intel-wired-lan

In switchdev mode, broadcast, multicast and other-host frames from VFs that
are looped back by the HW are received by the PF, and a clone is passed to
the port representor netdev of the VF whose MAC matches the source address.
Any frames sent via port netdevs are sent as directed transmits to the
corresponding VFs (or PF). To enable directed transmit, the skb metadata dst
is used to pass the port id, and the frame is requeued to call the PF's
transmit routine. The VF id is used as the port id for VFs, and the PF port
id is defined as I40E_MAIN_VSI_PORT_ID.
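
The port-id scheme above can be sketched in user-space C. This is a hedged
illustration, not driver code: pf_model, resolve_target_vsi() and
encode_tvsi_qw1() are hypothetical names, the VSI ids are made up, and the
shift/mask/command values are assumptions modelled on i40e_type.h (take the
authoritative values from that header). Like i40e_tvsi(), it accepts either
a VF index below the number of allocated VFs or the PF port id, rejects
anything else, and ORs the "switch to VSI" command plus the target VSI id
into quad word 1 of the context descriptor.

```c
#include <assert.h>
#include <stdint.h>

/* Port id reserved for the PF (I40E_MAIN_VSI_PORT_ID in this patch). */
#define MAIN_VSI_PORT_ID (1u << 15)

/* ASSUMED context descriptor layout, modelled on i40e_type.h. */
#define CTX_QW1_CMD_SHIFT  4
#define CTX_DESC_SWTCH_VSI 0x30ULL
#define CTX_QW1_VSI_SHIFT  50
#define CTX_QW1_VSI_MASK   (0x1FFULL << CTX_QW1_VSI_SHIFT)

/* Hypothetical, simplified view of the PF: VSI ids only. */
struct pf_model {
	uint16_t main_vsi_id;       /* VSI id of the PF LAN VSI */
	const uint16_t *vf_vsi_ids; /* VSI id of each VF LAN VSI */
	uint32_t num_vfs;
};

/* Map a port id to a target VSI id; -1 for an unexpected port id. */
static int resolve_target_vsi(const struct pf_model *pf, uint32_t port_id)
{
	if (port_id == MAIN_VSI_PORT_ID)
		return pf->main_vsi_id;
	if (port_id >= pf->num_vfs)
		return -1;
	return pf->vf_vsi_ids[port_id];
}

/* OR the "switch to VSI" command and the target VSI id into QW1. */
static void encode_tvsi_qw1(uint64_t *qw1, uint16_t vsi_id)
{
	uint64_t tvsi = ((uint64_t)vsi_id << CTX_QW1_VSI_SHIFT) &
			CTX_QW1_VSI_MASK;

	*qw1 |= (CTX_DESC_SWTCH_VSI << CTX_QW1_CMD_SHIFT) | tvsi;
}
```

Keeping the PF port id far above any VF index means a single range check
separates "VF", "PF" and "invalid".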

A small script to demonstrate inter-VF and PF-to-VF pings in switchdev mode.
PF: p4p1, VFs: p4p1_0, p4p1_1, VF port reps: p4p1-vf0, p4p1-vf1,
PF port rep: p4p1-pf

# rmmod i40e; modprobe i40e
# devlink dev eswitch set pci/0000:05:00.0 mode switchdev
# echo 2 > /sys/class/net/p4p1/device/sriov_numvfs
# ip link set p4p1 vf 0 mac 00:11:22:33:44:55
# ip link set p4p1 vf 1 mac 00:11:22:33:44:56
# rmmod i40evf; modprobe i40evf

/* Create 2 namespaces and move the VFs to the corresponding ns */
# ip netns add ns0
# ip link set p4p1_0 netns ns0
# ip netns exec ns0 ip addr add 192.168.1.10/24 dev p4p1_0
# ip netns exec ns0 ip link set p4p1_0 up
# ip netns add ns1
# ip link set p4p1_1 netns ns1
# ip netns exec ns1 ip addr add 192.168.1.11/24 dev p4p1_1
# ip netns exec ns1 ip link set p4p1_1 up

/* bring up pf and port netdevs */
# ip addr add 192.168.1.1/24 dev p4p1
# ip link set p4p1 up
# ip link set p4p1-vf0 up
# ip link set p4p1-vf1 up
# ip link set p4p1-pf up

# ip netns exec ns0 ping -c3 192.168.1.11  /* VF0 -> VF1 */
# ip netns exec ns1 ping -c3 192.168.1.10  /* VF1 -> VF0 */
# ping -c3 192.168.1.10   /* PF -> VF0 */
# ping -c3 192.168.1.11   /* PF -> VF1 */

/* VF0 -> IP in same subnet - broadcasts will be seen on p4p1-vf0 & p4p1 */
# ip netns exec ns0 ping -c1 -W1 192.168.1.200
/* VF1 -> IP in same subnet - broadcasts will be seen on p4p1-vf1 & p4p1 */
# ip netns exec ns1 ping -c1 -W1 192.168.1.200
/* port rep VF0 -> IP in same subnet - broadcasts will be seen on p4p1_0 */
# ping -I p4p1-vf0 -c1 -W1 192.168.1.200
/* port rep VF1 -> IP in same subnet - broadcasts will be seen on p4p1_1 */
# ping -I p4p1-vf1 -c1 -W1 192.168.1.200
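
The RX side of this demo can be modelled the same way. Below is a hedged
user-space sketch of the lookup that i40e_handle_lpbk_skb() performs: only
broadcast, multicast or other-host loopback frames are considered, and the
representor is chosen by comparing the source MAC with each VF's default MAC.
The pkt_type enum, find_rep_for_lpbk() and the MAC table are hypothetical
stand-ins for PACKET_*, the driver function and pf->vf[].default_lan_addr;
the driver itself clones the skb and hands the clone to the matching port
netdev instead of returning an index.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-ins for the kernel's PACKET_* skb->pkt_type values. */
enum pkt_type { PKT_HOST, PKT_BROADCAST, PKT_MULTICAST, PKT_OTHERHOST };

/*
 * Return the index of the VF whose default MAC equals src, or -1 when the
 * frame is not broadcast/multicast/other-host or no VF matches.
 */
static int find_rep_for_lpbk(enum pkt_type type, const uint8_t src[ETH_ALEN],
			     const uint8_t vf_macs[][ETH_ALEN], int num_vfs)
{
	int vf_id;

	if (type != PKT_BROADCAST && type != PKT_MULTICAST &&
	    type != PKT_OTHERHOST)
		return -1;

	for (vf_id = 0; vf_id < num_vfs; vf_id++)
		if (memcmp(src, vf_macs[vf_id], ETH_ALEN) == 0)
			return vf_id;
	return -1;
}
```

With the MACs assigned in the script, an ARP broadcast from p4p1_1
(00:11:22:33:44:56) maps to index 1, i.e. p4p1-vf1.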

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h             |   4 +
 drivers/net/ethernet/intel/i40e/i40e_main.c        |  27 +++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        | 148 ++++++++++++++++++++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h        |   2 +
 drivers/net/ethernet/intel/i40e/i40e_type.h        |   3 +
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |   8 +-
 6 files changed, 184 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index c865803..ac11005 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -55,6 +55,7 @@
 #include <linux/net_tstamp.h>
 #include <linux/ptp_clock_kernel.h>
 #include <net/devlink.h>
+#include <net/dst_metadata.h>
 
 #include "i40e_type.h"
 #include "i40e_prototype.h"
@@ -320,6 +321,8 @@ struct i40e_flex_pit {
 	u8 pit_index;
 };
 
+#define I40E_MAIN_VSI_PORT_ID	(1 << 15)
+
 enum i40e_port_netdev_type {
 	I40E_PORT_NETDEV_PF,
 	I40E_PORT_NETDEV_VF
@@ -328,6 +331,7 @@ enum i40e_port_netdev_type {
 /* Port representor netdev private structure */
 struct i40e_port_netdev_priv {
 	enum i40e_port_netdev_type type;	/* type - PF or VF */
+	struct metadata_dst *dst;		/* port id */
 	void *f;				/* ptr to PF or VF struct */
 };
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 683aa20..e9c5c6b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5519,8 +5519,10 @@ int i40e_open(struct net_device *netdev)
 
 	udp_tunnel_get_rx_info(netdev);
 
-	if (pf->port_netdev)
+	if (pf->port_netdev) {
 		netif_carrier_on(pf->port_netdev);
+		netif_tx_start_all_queues(pf->port_netdev);
+	}
 
 	return 0;
 }
@@ -5675,8 +5677,10 @@ int i40e_close(struct net_device *netdev)
 
 	i40e_vsi_close(vsi);
 
-	if (pf->port_netdev)
+	if (pf->port_netdev) {
 		netif_carrier_off(pf->port_netdev);
+		netif_tx_stop_all_queues(pf->port_netdev);
+	}
 
 	return 0;
 }
@@ -10872,6 +10876,7 @@ static int i40e_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode)
 static int i40e_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode)
 {
 	struct i40e_pf *pf = devlink_priv(devlink);
+	struct i40e_vsi *vsi = pf->vsi[pf->lan_vsi];
 	struct i40e_vf *vf;
 	int i, j, err = 0;
 
@@ -10886,6 +10891,8 @@ static int i40e_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode)
 		}
 		i40e_free_port_netdev(pf, I40E_PORT_NETDEV_PF);
 		pf->eswitch_mode = mode;
+		vsi->netdev->priv_flags |=
+			(IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM);
 		break;
 	case DEVLINK_ESWITCH_MODE_SWITCHDEV:
 		err = i40e_alloc_port_netdev(pf, I40E_PORT_NETDEV_PF);
@@ -10905,6 +10912,7 @@ static int i40e_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode)
 			}
 		}
 		pf->eswitch_mode = mode;
+		netif_keep_dst(vsi->netdev);
 		break;
 	default:
 		err = -EOPNOTSUPP;
@@ -10996,6 +11004,7 @@ static int i40e_port_netdev_stop(struct net_device *dev)
 static const struct net_device_ops i40e_port_netdev_ops = {
 	.ndo_open		= i40e_port_netdev_open,
 	.ndo_stop		= i40e_port_netdev_stop,
+	.ndo_start_xmit		= i40e_port_netdev_start_xmit,
 };
 
 /**
@@ -11034,6 +11043,10 @@ int i40e_alloc_port_netdev(void *f, enum i40e_port_netdev_type type)
 		priv = netdev_priv(port_netdev);
 		priv->f = pf;
 		priv->type = I40E_PORT_NETDEV_PF;
+		priv->dst = metadata_dst_alloc(0, METADATA_HW_PORT_MUX,
+					       GFP_KERNEL);
+		priv->dst->u.port_info.lower_dev = vsi->netdev;
+		priv->dst->u.port_info.port_id = I40E_MAIN_VSI_PORT_ID;
 		break;
 	case I40E_PORT_NETDEV_VF:
 		vf = (struct i40e_vf *)f;
@@ -11055,6 +11068,10 @@ int i40e_alloc_port_netdev(void *f, enum i40e_port_netdev_type type)
 		priv = netdev_priv(port_netdev);
 		priv->f = vf;
 		priv->type = I40E_PORT_NETDEV_VF;
+		priv->dst = metadata_dst_alloc(0, METADATA_HW_PORT_MUX,
+					       GFP_KERNEL);
+		priv->dst->u.port_info.lower_dev = vsi->netdev;
+		priv->dst->u.port_info.port_id = vf->vf_id;
 		break;
 	default:
 		return -EINVAL;
@@ -11070,6 +11087,7 @@ int i40e_alloc_port_netdev(void *f, enum i40e_port_netdev_type type)
 	if (err) {
 		dev_err(&pf->pdev->dev, "register_netdev failed for port netdev: %s\n",
 			port_netdev->name);
+		dst_release((struct dst_entry *)priv->dst);
 		free_netdev(port_netdev);
 		return err;
 	}
@@ -11110,6 +11128,7 @@ int i40e_alloc_port_netdev(void *f, enum i40e_port_netdev_type type)
  **/
 void i40e_free_port_netdev(void *f, enum i40e_port_netdev_type type)
 {
+	struct i40e_port_netdev_priv *priv;
 	struct i40e_pf *pf;
 	struct i40e_vf *vf;
 	struct i40e_vsi *vsi;
@@ -11123,6 +11142,8 @@ void i40e_free_port_netdev(void *f, enum i40e_port_netdev_type type)
 			return;
 		dev_info(&pf->pdev->dev, "Freeing PF Port representor %s\n",
 			 pf->port_netdev->name);
+		priv = netdev_priv(pf->port_netdev);
+		dst_release((struct dst_entry *)priv->dst);
 		unregister_netdev(pf->port_netdev);
 		free_netdev(pf->port_netdev);
 		pf->port_netdev = NULL;
@@ -11140,6 +11161,8 @@ void i40e_free_port_netdev(void *f, enum i40e_port_netdev_type type)
 			return;
 		dev_info(&pf->pdev->dev, "Freeing VF Port representor %s\n",
 			 vf->port_netdev->name);
+		priv = netdev_priv(vf->port_netdev);
+		dst_release((struct dst_entry *)priv->dst);
 		unregister_netdev(vf->port_netdev);
 		free_netdev(vf->port_netdev);
 		vf->port_netdev = NULL;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index ebffca0..86d2510 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1302,20 +1302,64 @@ static bool i40e_alloc_mapped_page(struct i40e_ring *rx_ring,
 }
 
 /**
+ * i40e_handle_lpbk_skb - Update skb->dev of a loopback frame
+ * @rx_ring: rx ring in play
+ * @skb: packet to send up
+ **/
+static void i40e_handle_lpbk_skb(struct i40e_ring *rx_ring, struct sk_buff *skb)
+{
+	struct i40e_q_vector *q_vector = rx_ring->q_vector;
+	struct i40e_pf *pf = rx_ring->vsi->back;
+	struct sk_buff *nskb;
+	struct i40e_vf *vf;
+	struct ethhdr *eth;
+	int vf_id;
+
+	if ((skb->pkt_type != PACKET_BROADCAST) &&
+	    (skb->pkt_type != PACKET_MULTICAST) &&
+	    (skb->pkt_type != PACKET_OTHERHOST))
+		return;
+
+	eth = (struct ethhdr *)skb_mac_header(skb);
+
+	/* If a loopback packet is received in switchdev mode, clone the skb
+	 * and pass it to the corresponding port netdev based on the source MAC.
+	 */
+	for (vf_id = 0; vf_id < pf->num_alloc_vfs; vf_id++) {
+		vf = &pf->vf[vf_id];
+		if (ether_addr_equal(eth->h_source,
+				     vf->default_lan_addr.addr)) {
+			nskb = skb_clone(skb, GFP_ATOMIC);
+			if (!nskb)
+				break;
+			nskb->offload_fwd_mark = 1;
+			nskb->dev = vf->port_netdev;
+			napi_gro_receive(&q_vector->napi, nskb);
+			break;
+		}
+	}
+}
+
+/**
  * i40e_receive_skb - Send a completed packet up the stack
  * @rx_ring:  rx ring in play
  * @skb: packet to send up
  * @vlan_tag: vlan tag for packet
+ * @lpbk: is it a loopback frame?
  **/
 static void i40e_receive_skb(struct i40e_ring *rx_ring,
-			     struct sk_buff *skb, u16 vlan_tag)
+			     struct sk_buff *skb, u16 vlan_tag, bool lpbk)
 {
 	struct i40e_q_vector *q_vector = rx_ring->q_vector;
+	struct i40e_pf *pf = rx_ring->vsi->back;
 
 	if ((rx_ring->netdev->features & NETIF_F_HW_VLAN_CTAG_RX) &&
 	    (vlan_tag & VLAN_VID_MASK))
 		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlan_tag);
 
+	if ((pf->eswitch_mode == DEVLINK_ESWITCH_MODE_SWITCHDEV) && lpbk)
+		i40e_handle_lpbk_skb(rx_ring, skb);
+
 	napi_gro_receive(&q_vector->napi, skb);
 }
 
@@ -1528,6 +1572,7 @@ static inline void i40e_rx_hash(struct i40e_ring *ring,
  * @rx_desc: pointer to the EOP Rx descriptor
  * @skb: pointer to current skb being populated
  * @rx_ptype: the packet type decoded by hardware
+ * @lpbk: is it a loopback frame?
  *
  * This function checks the ring, descriptor, and packet information in
  * order to populate the hash, checksum, VLAN, protocol, and
@@ -1536,7 +1581,7 @@ static inline void i40e_rx_hash(struct i40e_ring *ring,
 static inline
 void i40e_process_skb_fields(struct i40e_ring *rx_ring,
 			     union i40e_rx_desc *rx_desc, struct sk_buff *skb,
-			     u8 rx_ptype)
+			     u8 rx_ptype, bool *lpbk)
 {
 	u64 qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
 	u32 rx_status = (qword & I40E_RXD_QW1_STATUS_MASK) >>
@@ -1545,6 +1590,9 @@ void i40e_process_skb_fields(struct i40e_ring *rx_ring,
 	u32 tsyn = (rx_status & I40E_RXD_QW1_STATUS_TSYNINDX_MASK) >>
 		   I40E_RXD_QW1_STATUS_TSYNINDX_SHIFT;
 
+	*lpbk = !!((rx_status & I40E_RXD_QW1_STATUS_LPBK_MASK) >>
+		I40E_RXD_QW1_STATUS_LPBK_SHIFT);
+
 	if (unlikely(tsynvalid))
 		i40e_ptp_rx_hwtstamp(rx_ring->vsi->back, skb, tsyn);
 
@@ -1898,6 +1946,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 		u16 vlan_tag;
 		u8 rx_ptype;
 		u64 qword;
+		bool lpbk;
 
 		/* return some buffers to hardware, one at a time is too slow */
 		if (cleaned_count >= I40E_RX_BUFFER_WRITE) {
@@ -1970,12 +2019,12 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 			   I40E_RXD_QW1_PTYPE_SHIFT;
 
 		/* populate checksum, VLAN, and protocol */
-		i40e_process_skb_fields(rx_ring, rx_desc, skb, rx_ptype);
+		i40e_process_skb_fields(rx_ring, rx_desc, skb, rx_ptype, &lpbk);
 
 		vlan_tag = (qword & BIT(I40E_RX_DESC_STATUS_L2TAG1P_SHIFT)) ?
 			   le16_to_cpu(rx_desc->wb.qword0.lo_dword.l2tag1) : 0;
 
-		i40e_receive_skb(rx_ring, skb, vlan_tag);
+		i40e_receive_skb(rx_ring, skb, vlan_tag, lpbk);
 		skb = NULL;
 
 		/* update budget accounting */
@@ -3037,6 +3086,58 @@ static inline void i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 }
 
 /**
+ * i40e_tvsi - set up the target VSI in the TX context descriptor
+ * @skb:     send buffer
+ * @tx_ring: ring to send buffer on
+ * @cd_type_cmd_tso_mss: ptr to Quad Word 1 of the context descriptor
+ *
+ * Returns 0 on success, -EINVAL on error
+ **/
+static int i40e_tvsi(struct sk_buff *skb, struct i40e_ring *tx_ring,
+		     u64 *cd_type_cmd_tso_mss)
+{
+	struct metadata_dst *md_dst = skb_metadata_dst(skb);
+	struct i40e_pf *pf;
+	struct i40e_vsi *t_vsi = NULL;
+	struct i40e_vf *t_vf;
+	u64 cd_cmd, cd_tvsi;
+	u32 port_id;
+
+	/* If skb metadata dst points to a port id, do a directed transmit to
+	 * that VSI. TSO is mutually exclusive with this option, so a directed
+	 * transmit is only set up for frames that do not use TSO.
+	 */
+	if (!md_dst || (md_dst->type != METADATA_HW_PORT_MUX))
+		return 0;
+
+	port_id = md_dst->u.port_info.port_id;
+
+	pf = tx_ring->vsi->back;
+	if ((port_id >= pf->num_alloc_vfs) &&
+	    (port_id != I40E_MAIN_VSI_PORT_ID)) {
+		WARN_ONCE(1, "Unexpected port_id: %d num_vfs:%d\n",
+			  md_dst->u.port_info.port_id, pf->num_alloc_vfs);
+		return -EINVAL;
+	}
+
+	if (port_id == I40E_MAIN_VSI_PORT_ID) {
+		t_vsi = pf->vsi[pf->lan_vsi];
+	} else {
+		t_vf = &pf->vf[port_id];
+		t_vsi = pf->vsi[t_vf->lan_vsi_idx];
+	}
+
+	cd_cmd = I40E_TX_CTX_DESC_SWTCH_VSI;
+	cd_tvsi = t_vsi->id;
+	cd_tvsi = (cd_tvsi << I40E_TXD_CTX_QW1_VSI_SHIFT) &
+		  I40E_TXD_CTX_QW1_VSI_MASK;
+	*cd_type_cmd_tso_mss |= (cd_cmd << I40E_TXD_CTX_QW1_CMD_SHIFT) |
+				 cd_tvsi;
+
+	return 0;
+}
+
+/**
  * i40e_xmit_frame_ring - Sends buffer on Tx ring
  * @skb:     send buffer
  * @tx_ring: ring to send buffer on
@@ -3101,6 +3202,8 @@ static netdev_tx_t i40e_xmit_frame_ring(struct sk_buff *skb,
 		tx_flags |= I40E_TX_FLAGS_IPV6;
 
 	tso = i40e_tso(first, &hdr_len, &cd_type_cmd_tso_mss);
+	if (!tso)
+		tso = i40e_tvsi(skb, tx_ring, &cd_type_cmd_tso_mss);
 
 	if (tso < 0)
 		goto out_drop;
@@ -3164,3 +3267,40 @@ netdev_tx_t i40e_lan_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 
 	return i40e_xmit_frame_ring(skb, tx_ring);
 }
+
+/**
+ * i40e_port_netdev_start_xmit - transmit a frame from a port netdev
+ * @skb:    send buffer
+ * @netdev: network interface device structure
+ *
+ * Sets skb->dev to the PF netdev and the port id in skb->dst, then
+ * requeues the skb via dev_queue_xmit()
+ **/
+netdev_tx_t i40e_port_netdev_start_xmit(struct sk_buff *skb,
+					struct net_device *netdev)
+{
+	struct i40e_port_netdev_priv *priv = netdev_priv(netdev);
+	struct i40e_vsi *vsi;
+	struct i40e_pf *pf;
+	struct i40e_vf *vf;
+
+	switch (priv->type) {
+	case I40E_PORT_NETDEV_VF:
+		vf = (struct i40e_vf *)priv->f;
+		pf = vf->pf;
+		break;
+	case I40E_PORT_NETDEV_PF:
+		pf = (struct i40e_pf *)priv->f;
+		break;
+	default:
+		dev_kfree_skb_any(skb);
+		return NETDEV_TX_OK;
+	}
+
+	vsi = pf->vsi[pf->lan_vsi];
+	dst_hold(&priv->dst->dst);
+	skb_dst_set(skb, &priv->dst->dst);
+	skb->dev = vsi->netdev;
+
+	return dev_queue_xmit(skb);
+}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index d6609de..715de92 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -392,6 +392,8 @@ struct i40e_ring_container {
 
 bool i40e_alloc_rx_buffers(struct i40e_ring *rxr, u16 cleaned_count);
 netdev_tx_t i40e_lan_xmit_frame(struct sk_buff *skb, struct net_device *netdev);
+netdev_tx_t i40e_port_netdev_start_xmit(struct sk_buff *skb,
+					struct net_device *netdev);
 void i40e_clean_tx_ring(struct i40e_ring *tx_ring);
 void i40e_clean_rx_ring(struct i40e_ring *rx_ring);
 int i40e_setup_tx_descriptors(struct i40e_ring *tx_ring);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h
index 9200f2d..08364a4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
@@ -729,6 +729,9 @@ enum i40e_rx_desc_status_bits {
 #define I40E_RXD_QW1_STATUS_TSYNVALID_SHIFT  I40E_RX_DESC_STATUS_TSYNVALID_SHIFT
 #define I40E_RXD_QW1_STATUS_TSYNVALID_MASK \
 				    BIT_ULL(I40E_RXD_QW1_STATUS_TSYNVALID_SHIFT)
+#define I40E_RXD_QW1_STATUS_LPBK_SHIFT  I40E_RX_DESC_STATUS_LPBK_SHIFT
+#define I40E_RXD_QW1_STATUS_LPBK_MASK \
+				BIT_ULL(I40E_RXD_QW1_STATUS_LPBK_SHIFT)
 
 enum i40e_rx_desc_fltstat_values {
 	I40E_RX_DESC_FLTSTAT_NO_DATA	= 0,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 7c2e7b0..f8d25cb 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1806,8 +1806,10 @@ static int i40e_vc_enable_queues_msg(struct i40e_vf *vf, u8 *msg, u16 msglen)
 	if (i40e_vsi_start_rings(pf->vsi[vf->lan_vsi_idx]))
 		aq_ret = I40E_ERR_TIMEOUT;
 
-	if ((aq_ret == 0) && vf->port_netdev)
+	if ((aq_ret == 0) && vf->port_netdev) {
 		netif_carrier_on(vf->port_netdev);
+		netif_tx_start_all_queues(vf->port_netdev);
+	}
 
 error_param:
 	/* send the response to the VF */
@@ -1848,8 +1850,10 @@ static int i40e_vc_disable_queues_msg(struct i40e_vf *vf, u8 *msg, u16 msglen)
 
 	i40e_vsi_stop_rings(pf->vsi[vf->lan_vsi_idx]);
 
-	if ((aq_ret == 0) && vf->port_netdev)
+	if ((aq_ret == 0) && vf->port_netdev) {
+		netif_tx_stop_all_queues(vf->port_netdev);
 		netif_carrier_off(vf->port_netdev);
+	}
 
 error_param:
 	/* send the response to the VF */
-- 
1.8.3.1



* [next-queue v6 PATCH 6/7] i40e: Add support for exposing switch port statistics via port netdevs
From: Sridhar Samudrala @ 2017-03-30  0:22 UTC (permalink / raw)
  To: intel-wired-lan, netdev, alexander.h.duyck, anjali.singhai,
	jakub.kicinski, gerlitz.or, jiri, sridhar.samudrala

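By default, statistics counted by the HW are returned via the original
ndo_get_stats64() API. Statistics counted in SW (frames sent or received
by the CPU via the port netdev) are returned via the ndo_get_offload_stats()
API as IFLA_OFFLOAD_XSTATS_CPU_HIT.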
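
The HW stats reported on a port netdev are the VSI counters seen from the
switch port, which amounts to swapping directions: what the VSI transmitted,
the switch port received, and vice versa. A hedged sketch of that mapping
from i40e_port_netdev_get_stats64(), using hypothetical trimmed-down structs
(vsi_counters, port_stats) rather than the kernel's i40e_eth_stats and
rtnl_link_stats64. The test feeds in the p4p1_0 numbers from the HW STATS
output below (all packets counted as unicast for simplicity) and gets the
p4p1-vf0 numbers back.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the per-VSI HW counters (cf. i40e_eth_stats). */
struct vsi_counters {
	uint64_t rx_unicast, rx_multicast, rx_broadcast, rx_bytes, rx_discards;
	uint64_t tx_unicast, tx_multicast, tx_broadcast, tx_bytes, tx_discards;
};

/* Hypothetical subset of the netlink stats (cf. rtnl_link_stats64). */
struct port_stats {
	uint64_t rx_packets, tx_packets, rx_bytes, tx_bytes;
	uint64_t rx_dropped, tx_dropped;
};

/* Report VSI counters from the switch port's point of view: RX <-> TX. */
static struct port_stats port_view(const struct vsi_counters *v)
{
	struct port_stats s = {
		.rx_packets = v->tx_unicast + v->tx_multicast + v->tx_broadcast,
		.tx_packets = v->rx_unicast + v->rx_multicast + v->rx_broadcast,
		.rx_bytes   = v->tx_bytes,
		.tx_bytes   = v->rx_bytes,
		.rx_dropped = v->tx_discards,
		.tx_dropped = v->rx_discards,
	};

	return s;
}
```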

A small script to demonstrate port stats in switchdev mode.
PF: p4p1, VFs: p4p1_0, p4p1_1, VF port reps: p4p1-vf0, p4p1-vf1,
PF port rep: p4p1-pf

# rmmod i40e; modprobe i40e
# devlink dev eswitch set pci/0000:05:00.0 mode switchdev
# echo 2 > /sys/class/net/p4p1/device/sriov_numvfs
# ip link set p4p1 vf 0 mac 00:11:22:33:44:55
# ip link set p4p1 vf 1 mac 00:11:22:33:44:56
# rmmod i40evf; modprobe i40evf

/* Create 2 namespaces and move the VFs to the corresponding ns */
# ip netns add ns0
# ip link set p4p1_0 netns ns0
# ip netns exec ns0 ip addr add 192.168.1.10/24 dev p4p1_0
# ip netns exec ns0 ip link set p4p1_0 up
# ip netns add ns1
# ip link set p4p1_1 netns ns1
# ip netns exec ns1 ip addr add 192.168.1.11/24 dev p4p1_1
# ip netns exec ns1 ip link set p4p1_1 up

/* bring up pf and port netdevs */
# ip addr add 192.168.1.1/24 dev p4p1
# ip link set p4p1 up
# ip link set p4p1-vf0 up
# ip link set p4p1-vf1 up
# ip link set p4p1-pf up

# ip netns exec ns0 ping -c3 192.168.1.11  /* VF0 -> VF1 */
# ip netns exec ns1 ping -c3 192.168.1.10  /* VF1 -> VF0 */
# ping -c3 192.168.1.10   /* PF -> VF0 */
# ping -c3 192.168.1.11   /* PF -> VF1 */

/* VF0 -> IP in same subnet - broadcasts will be seen on p4p1-vf0 */
# ip netns exec ns0 ping -c1 -W1 192.168.1.200
/* VF1 -> IP in same subnet - broadcasts will be seen on p4p1-vf1 */
# ip netns exec ns1 ping -c1 -W1 192.168.1.200
/* port rep VF0 -> IP in same subnet - broadcasts will be seen on p4p1_0 */
# ping -I p4p1-vf0 -c1 -W1 192.168.1.200
/* port rep VF1 -> IP in same subnet - broadcasts will be seen on p4p1_1 */
# ping -I p4p1-vf1 -c1 -W1 192.168.1.200

HW STATS
# ip netns exec ns0 ip -s l show p4p1_0
41: p4p1_0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    1274       21       0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    980        14       0       0       0       0
# ip -s l show p4p1-vf0
37: p4p1-vf0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether f6:07:98:0e:cd:97 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    980        14       0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    1274       21       0       0       0       0
# ip netns exec ns1 ip -s l show p4p1_1
42: p4p1_1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 00:11:22:33:44:56 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    1246       19       0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    1078       15       0       0       0       0
# ip -s l show p4p1-vf1
38: p4p1-vf1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 2a:cf:ff:6a:f3:66 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    1078       15       0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    1246       19       0       0       0       0
# ip -s l show p4p1
34: p4p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 3c:fd:fe:a3:18:f8 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    1134       17       0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    966        19       0       0       0       0
# ip -s l show p4p1-pf
36: p4p1-pf: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether da:0f:67:fe:2e:66 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    966        19       0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    882        17       0       0       0       0

SW STATS
# ifstat -a -x c
#kernel
Interface        RX Pkts/Rate    TX Pkts/Rate    RX Data/Rate    TX Data/Rate
                 RX Errs/Drop    TX Errs/Drop    RX Over/Rate    TX Coll/Rate
p4p1-pf                0 0             3 0             0 0           126 0
                       0 0             0 0             0 0             0 0
p4p1-vf0               4 0             6 0           184 0           252 0
                       0 0             0 0             0 0             0 0
p4p1-vf1               3 0             3 0           138 0           126 0
                       0 0             0 0             0 0             0 0
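
The SW (CPU hit) counters shown by ifstat are kept per CPU and summed on
read, with u64_stats_fetch_begin_irq()/u64_stats_fetch_retry_irq() retrying
a snapshot that raced with a writer. Below is a hedged, single-threaded
user-space model of that read loop: pcpu_stats, the plain seq counter and
both helpers are hypothetical stand-ins for struct port_netdev_pcpu_stats
and u64_stats_sync (which, in the kernel, only does real work on 32-bit
builds). It shows the shape of the aggregation, not a race-free
implementation; the test reproduces the p4p1-vf0 row above split across
two CPUs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-CPU counters; seq is odd while a writer is updating. */
struct pcpu_stats {
	unsigned int seq;
	uint64_t tx_packets, tx_bytes, rx_packets, rx_bytes;
};

struct totals {
	uint64_t tx_packets, tx_bytes, rx_packets, rx_bytes;
};

/* Writer side: bump seq around the update (odd = update in progress). */
static void pcpu_add_rx(struct pcpu_stats *s, uint64_t bytes)
{
	s->seq++;
	s->rx_packets++;
	s->rx_bytes += bytes;
	s->seq++;
}

/* Reader side: snapshot each CPU, retrying if seq was odd or moved. */
static struct totals sum_pcpu(const struct pcpu_stats *cpus, int ncpus)
{
	struct totals t = {0};
	int i;

	for (i = 0; i < ncpus; i++) {
		const struct pcpu_stats *s = &cpus[i];
		struct pcpu_stats snap;
		unsigned int start;

		do {
			start = s->seq;
			snap = *s;
		} while ((start & 1) || s->seq != start);

		t.tx_packets += snap.tx_packets;
		t.tx_bytes += snap.tx_bytes;
		t.rx_packets += snap.rx_packets;
		t.rx_bytes += snap.rx_bytes;
	}
	return t;
}
```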

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h      |  10 +++
 drivers/net/ethernet/intel/i40e/i40e_main.c | 125 ++++++++++++++++++++++++++++
 drivers/net/ethernet/intel/i40e/i40e_txrx.c |  24 +++++-
 3 files changed, 158 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index ac11005..72e11b2 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -328,8 +328,18 @@ enum i40e_port_netdev_type {
 	I40E_PORT_NETDEV_VF
 };
 
+struct port_netdev_pcpu_stats {
+	u64			tx_packets;
+	u64			tx_bytes;
+	u64			tx_drops;
+	u64			rx_packets;
+	u64			rx_bytes;
+	struct u64_stats_sync	syncp;
+};
+
 /* Port representor netdev private structure */
 struct i40e_port_netdev_priv {
+	struct port_netdev_pcpu_stats __percpu *stats;
 	enum i40e_port_netdev_type type;	/* type - PF or VF */
 	struct metadata_dst *dst;		/* port id */
 	void *f;				/* ptr to PF or VF struct */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index e9c5c6b..4f0eebc 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -11001,10 +11001,123 @@ static int i40e_port_netdev_stop(struct net_device *dev)
 	return err;
 }
 
+/**
+ * i40e_port_netdev_get_stats64 - get HW stats of a port netdev
+ * @netdev: network interface device structure
+ * @stats: netlink stats structure
+ *
+ * Fills the HW statistics from the VSI associated with the port
+ * netdev
+ **/
+static void
+i40e_port_netdev_get_stats64(struct net_device *netdev,
+			     struct rtnl_link_stats64 *stats)
+{
+	struct i40e_port_netdev_priv *priv = netdev_priv(netdev);
+	struct i40e_vf *vf;
+	struct i40e_pf *pf;
+	struct i40e_vsi *vsi;
+	struct i40e_eth_stats *estats;
+
+	switch (priv->type) {
+	case I40E_PORT_NETDEV_VF:
+		vf = (struct i40e_vf *)priv->f;
+		pf = vf->pf;
+		vsi = pf->vsi[vf->lan_vsi_idx];
+		break;
+	case I40E_PORT_NETDEV_PF:
+		pf = (struct i40e_pf *)priv->f;
+		vsi = pf->vsi[pf->lan_vsi];
+		break;
+	default:
+		return;
+	}
+
+	i40e_update_stats(vsi);
+	estats = &vsi->eth_stats;
+
+	/* TX and RX stats are flipped as we are returning the stats as seen
+	 * at the switch port corresponding to the VF or PF.
+	 */
+	stats->rx_packets = estats->tx_unicast + estats->tx_multicast +
+			    estats->tx_broadcast;
+	stats->tx_packets = estats->rx_unicast + estats->rx_multicast +
+			    estats->rx_broadcast;
+	stats->rx_bytes = estats->tx_bytes;
+	stats->tx_bytes = estats->rx_bytes;
+	stats->rx_dropped = estats->tx_discards;
+	stats->tx_dropped = estats->rx_discards;
+}
+
+/**
+ * i40e_port_netdev_get_cpu_hit_stats64 - get SW stats of a port netdev
+ * @dev: network interface device structure
+ * @stats: netlink stats structure
+ *
+ * Stats are summed from the per-CPU counters in the priv structure and
+ * correspond to the packets sent/received by the CPU via the port netdev.
+ **/
+static int
+i40e_port_netdev_get_cpu_hit_stats64(const struct net_device *dev,
+				     struct rtnl_link_stats64 *stats)
+{
+	struct i40e_port_netdev_priv *priv = netdev_priv(dev);
+	int i;
+
+	for_each_possible_cpu(i) {
+		struct port_netdev_pcpu_stats *port_netdev_stats;
+		u64 tbytes, tpkts, tdrops, rbytes, rpkts;
+		unsigned int start;
+
+		port_netdev_stats = per_cpu_ptr(priv->stats, i);
+		do {
+			start = u64_stats_fetch_begin_irq(&port_netdev_stats->syncp);
+			tbytes = port_netdev_stats->tx_bytes;
+			tpkts = port_netdev_stats->tx_packets;
+			tdrops = port_netdev_stats->tx_drops;
+			rbytes = port_netdev_stats->rx_bytes;
+			rpkts = port_netdev_stats->rx_packets;
+		} while (u64_stats_fetch_retry_irq(&port_netdev_stats->syncp, start));
+		stats->tx_bytes += tbytes;
+		stats->tx_packets += tpkts;
+		stats->tx_dropped += tdrops;
+		stats->rx_bytes += rbytes;
+		stats->rx_packets += rpkts;
+	}
+
+	return 0;
+}
+
+static bool
+i40e_port_netdev_has_offload_stats(const struct net_device *dev, int attr_id)
+{
+	switch (attr_id) {
+	case IFLA_OFFLOAD_XSTATS_CPU_HIT:
+		return true;
+	}
+
+	return false;
+}
+
+static int
+i40e_port_netdev_get_offload_stats(int attr_id, const struct net_device *dev,
+				   void *sp)
+{
+	switch (attr_id) {
+	case IFLA_OFFLOAD_XSTATS_CPU_HIT:
+		return i40e_port_netdev_get_cpu_hit_stats64(dev, sp);
+	}
+
+	return -EINVAL;
+}
+
 static const struct net_device_ops i40e_port_netdev_ops = {
 	.ndo_open		= i40e_port_netdev_open,
 	.ndo_stop		= i40e_port_netdev_stop,
 	.ndo_start_xmit		= i40e_port_netdev_start_xmit,
+	.ndo_get_stats64	= i40e_port_netdev_get_stats64,
+	.ndo_has_offload_stats	= i40e_port_netdev_has_offload_stats,
+	.ndo_get_offload_stats	= i40e_port_netdev_get_offload_stats,
 };
 
 /**
@@ -11077,6 +11190,16 @@ int i40e_alloc_port_netdev(void *f, enum i40e_port_netdev_type type)
 		return -EINVAL;
 	}
 
+	priv->stats = netdev_alloc_pcpu_stats(struct port_netdev_pcpu_stats);
+	if (!priv->stats) {
+		dev_err(&pf->pdev->dev,
+			"alloc_pcpu_stats failed for port netdev: %s\n",
+			port_netdev->name);
+		dst_release((struct dst_entry *)priv->dst);
+		free_netdev(port_netdev);
+		return -ENOMEM;
+	}
+
 	port_netdev->netdev_ops = &i40e_port_netdev_ops;
 	eth_hw_addr_random(port_netdev);
 
@@ -11144,6 +11267,7 @@ void i40e_free_port_netdev(void *f, enum i40e_port_netdev_type type)
 			 pf->port_netdev->name);
 		priv = netdev_priv(pf->port_netdev);
 		dst_release((struct dst_entry *)priv->dst);
 		unregister_netdev(pf->port_netdev);
+		free_percpu(priv->stats);
 		free_netdev(pf->port_netdev);
 		pf->port_netdev = NULL;
@@ -11163,6 +11287,7 @@ void i40e_free_port_netdev(void *f, enum i40e_port_netdev_type type)
 			 vf->port_netdev->name);
 		priv = netdev_priv(vf->port_netdev);
 		dst_release((struct dst_entry *)priv->dst);
 		unregister_netdev(vf->port_netdev);
+		free_percpu(priv->stats);
 		free_netdev(vf->port_netdev);
 		vf->port_netdev = NULL;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 86d2510..449a35c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1309,7 +1309,9 @@ static bool i40e_alloc_mapped_page(struct i40e_ring *rx_ring,
 static void i40e_handle_lpbk_skb(struct i40e_ring *rx_ring, struct sk_buff *skb)
 {
 	struct i40e_q_vector *q_vector = rx_ring->q_vector;
+	struct port_netdev_pcpu_stats *port_netdev_stats;
 	struct i40e_pf *pf = rx_ring->vsi->back;
+	struct i40e_port_netdev_priv *priv;
 	struct sk_buff *nskb;
 	struct i40e_vf *vf;
 	struct ethhdr *eth;
@@ -1334,6 +1336,12 @@ static void i40e_handle_lpbk_skb(struct i40e_ring *rx_ring, struct sk_buff *skb)
 				break;
 			nskb->offload_fwd_mark = 1;
 			nskb->dev = vf->port_netdev;
+			priv = netdev_priv(vf->port_netdev);
+			port_netdev_stats = this_cpu_ptr(priv->stats);
+			u64_stats_update_begin(&port_netdev_stats->syncp);
+			port_netdev_stats->rx_packets++;
+			port_netdev_stats->rx_bytes += nskb->len;
+			u64_stats_update_end(&port_netdev_stats->syncp);
 			napi_gro_receive(&q_vector->napi, nskb);
 			break;
 		}
@@ -3283,6 +3291,7 @@ netdev_tx_t i40e_port_netdev_start_xmit(struct sk_buff *skb,
 	struct i40e_vsi *vsi;
 	struct i40e_pf *pf;
 	struct i40e_vf *vf;
+	int len, ret;
 
 	switch (priv->type) {
 	case I40E_PORT_NETDEV_VF:
@@ -3302,5 +3311,18 @@ netdev_tx_t i40e_port_netdev_start_xmit(struct sk_buff *skb,
 	skb_dst_set(skb, &priv->dst->dst);
 	skb->dev = vsi->netdev;
 
-	return dev_queue_xmit(skb);
+	len = skb->len;	/* skb may be freed by dev_queue_xmit() */
+	ret = dev_queue_xmit(skb);
+	if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
+		struct port_netdev_pcpu_stats *port_netdev_stats;
+
+		port_netdev_stats = this_cpu_ptr(priv->stats);
+		u64_stats_update_begin(&port_netdev_stats->syncp);
+		port_netdev_stats->tx_packets++;
+		port_netdev_stats->tx_bytes += len;
+		u64_stats_update_end(&port_netdev_stats->syncp);
+	} else {
+		this_cpu_inc(priv->stats->tx_drops);
+	}
+	return ret;
 }
-- 
1.8.3.1
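The CPU-hit counters in this patch follow the usual per-CPU stats pattern: each CPU updates only its own slot between u64_stats_update_begin() and u64_stats_update_end(), and the reader sums all slots, retrying each snapshot with u64_stats_fetch_begin_irq()/u64_stats_fetch_retry_irq(). Below is a minimal, single-threaded userspace model of that retry logic. The model_* names are hypothetical. The plain sequence counter stands in for struct u64_stats_sync, which, as far as I know, only carries a seqcount on 32-bit SMP kernels. A real concurrent version would also need the memory barriers that the kernel helpers provide.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for the model */

/* Same shape as struct port_netdev_pcpu_stats (TX side only), with a
 * plain sequence counter standing in for struct u64_stats_sync. */
struct model_pcpu_stats {
	unsigned int seq;	/* odd while an update is in progress */
	uint64_t tx_packets;
	uint64_t tx_bytes;
};

struct model_totals {
	uint64_t tx_packets;
	uint64_t tx_bytes;
};

/* Writer: bracket the update with two sequence bumps, the role played by
 * u64_stats_update_begin()/u64_stats_update_end(). */
static void model_count_tx(struct model_pcpu_stats *s, unsigned int len)
{
	s->seq++;		/* odd: a reader must retry */
	s->tx_packets++;
	s->tx_bytes += len;
	s->seq++;		/* even again: snapshot is consistent */
}

/* Reader: snapshot one CPU's counters and retry while an update is in
 * progress or the sequence moved, like the fetch_begin/fetch_retry loop. */
static void model_read_cpu(const struct model_pcpu_stats *s,
			   uint64_t *pkts, uint64_t *bytes)
{
	unsigned int start;

	do {
		start = s->seq;
		*pkts = s->tx_packets;
		*bytes = s->tx_bytes;
	} while ((start & 1) || start != s->seq);
}

/* Sum every CPU's snapshot, like the for_each_possible_cpu() loop. */
static struct model_totals model_sum(const struct model_pcpu_stats *pcpu,
				     int nr_cpus)
{
	struct model_totals t = { 0, 0 };
	int cpu;

	assert(nr_cpus > 0 && nr_cpus <= MODEL_NR_CPUS);
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		uint64_t pkts, bytes;

		model_read_cpu(&pcpu[cpu], &pkts, &bytes);
		t.tx_packets += pkts;
		t.tx_bytes += bytes;
	}
	return t;
}
```

For example, two 42-byte transmits counted on CPU 0 and one 84-byte transmit on CPU 2 sum to 3 packets and 168 bytes. Each writer only ever touches its own slot, so the hot path needs no shared lock.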

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [next-queue v6 PATCH 7/7] i40e: Add support to get switch id and port number for port netdevs
  2017-03-30  0:22 ` [Intel-wired-lan] " Sridhar Samudrala
@ 2017-03-30  0:22   ` Sridhar Samudrala
  -1 siblings, 0 replies; 46+ messages in thread
From: Sridhar Samudrala @ 2017-03-30  0:22 UTC (permalink / raw)
  To: intel-wired-lan, netdev, alexander.h.duyck, anjali.singhai,
	jakub.kicinski, gerlitz.or, jiri, sridhar.samudrala

Introduce switchdev_ops for the PF and port netdevs to return the switch id
via the SWITCHDEV_ATTR_ID_PORT_PARENT_ID attribute.
Also, add ndo_get_phys_port_name() support to port netdevs to return the
port number.
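The port name returned by i40e_port_netdev_get_phys_port_name() is the VF id for a VF rep, or a fixed id for the PF rep. A truncated name is treated as an error instead of being returned half-written. A userspace sketch of that logic follows. MODEL_MAIN_VSI_PORT_ID = 65535 is an assumption taken from the "portname 65535" shown for p4p1-pf in the output below, and the model_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Assumed value of I40E_MAIN_VSI_PORT_ID (inferred from "portname 65535"). */
#define MODEL_MAIN_VSI_PORT_ID 65535

enum model_port_type { MODEL_PORT_PF, MODEL_PORT_VF };

/* Format the phys port name: VF id for VF reps, fixed id for the PF rep. */
static int model_phys_port_name(enum model_port_type type, int vf_id,
				char *buf, size_t len)
{
	int ret;

	assert(buf != NULL);
	switch (type) {
	case MODEL_PORT_VF:
		ret = snprintf(buf, len, "%d", vf_id);
		break;
	case MODEL_PORT_PF:
		ret = snprintf(buf, len, "%d", MODEL_MAIN_VSI_PORT_ID);
		break;
	default:
		return -EOPNOTSUPP;
	}

	/* snprintf() returns the length it wanted to write, so ret >= len
	 * means the name was truncated. */
	if (ret < 0 || (size_t)ret >= len)
		return -EOPNOTSUPP;
	return 0;
}
```

With a 16-byte buffer, VF 1 yields "1" and the PF rep yields "65535". A 4-byte buffer is too small for "65535" and returns -EOPNOTSUPP.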

PF: p4p1, VFs: p4p1_0, p4p1_1, VF port reps: p4p1-vf0, p4p1-vf1,
PF port rep: p4p1-pf
# rmmod i40e; modprobe i40e
# devlink dev eswitch set pci/0000:42:00.0 mode switchdev
# echo 2 > /sys/class/net/enp5s0f0/device/sriov_numvfs
# ip -d l show p4p1
27: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 3c:fd:fe:a3:18:f8 brd ff:ff:ff:ff:ff:ff promiscuity 0 numtxqueues 64 numrxqueues 64 gso_max_size 65536 gso_max_segs 65535 portid 3cfdfea318f8 switchid 3cfdfea318f8
    vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state disable, trust off
    vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state disable, trust off
# ip -d l show p4p1-pf
29: p4p1-pf: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 42:7a:b5:dc:85:11 brd ff:ff:ff:ff:ff:ff promiscuity 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 portname 65535 switchid 3cfdfea318f8
# ip -d l show p4p1-vf0
30: p4p1-vf0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 6e:ff:0b:5a:63:6d brd ff:ff:ff:ff:ff:ff promiscuity 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 portname 0 switchid 3cfdfea318f8
# ip -d l show p4p1-vf1
31: p4p1-vf1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 92:6e:ff:35:05:d5 brd ff:ff:ff:ff:ff:ff promiscuity 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 portname 1 switchid 3cfdfea318f8
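All four netdevs above report the same switchid because the PF netdev and every port rep fill SWITCHDEV_ATTR_ID_PORT_PARENT_ID with the PF MAC address (3c:fd:fe:a3:18:f8). The sketch below models that and renders the ID as the lowercase hex string shown in the `ip -d l show` output. The struct and function names are hypothetical stand-ins, not the switchdev API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MODEL_ETH_ALEN 6

/* Stand-in for the ppid part of struct switchdev_attr. */
struct model_ppid {
	unsigned char id[MODEL_ETH_ALEN];
	unsigned char id_len;
};

/* PF netdev and port reps alike copy the PF MAC as the parent ID. */
static void model_get_parent_id(const unsigned char *pf_mac,
				struct model_ppid *ppid)
{
	ppid->id_len = MODEL_ETH_ALEN;
	memcpy(ppid->id, pf_mac, MODEL_ETH_ALEN);
}

/* Render the ID as concatenated lowercase hex bytes; returns buf. */
static char *model_format_switch_id(const struct model_ppid *ppid,
				    char *buf, size_t buf_len)
{
	size_t i;

	assert(buf_len >= 2u * ppid->id_len + 1);
	for (i = 0; i < ppid->id_len; i++)
		snprintf(buf + 2 * i, buf_len - 2 * i, "%02x", ppid->id[i]);
	buf[2 * ppid->id_len] = '\0';
	return buf;
}
```

Because the ID comes from the PF MAC, changing the PF MAC address would also change the reported switchid.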

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h      |  1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c | 97 +++++++++++++++++++++++++++++
 2 files changed, 98 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 72e11b2..9eb2ba5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -56,6 +56,7 @@
 #include <linux/ptp_clock_kernel.h>
 #include <net/devlink.h>
 #include <net/dst_metadata.h>
+#include <net/switchdev.h>
 
 #include "i40e_type.h"
 #include "i40e_prototype.h"
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 4f0eebc..85f214d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -4862,6 +4862,32 @@ static int i40e_vsi_configure_bw_alloc(struct i40e_vsi *vsi, u8 enabled_tc,
 	return 0;
 }
 
+static int i40e_switchdev_pf_attr_get(struct net_device *dev,
+				      struct switchdev_attr *attr)
+{
+	struct i40e_netdev_priv *np = netdev_priv(dev);
+	struct i40e_vsi *vsi = np->vsi;
+	struct i40e_pf *pf = vsi->back;
+
+	if (pf->eswitch_mode == DEVLINK_ESWITCH_MODE_LEGACY)
+		return -EOPNOTSUPP;
+
+	switch (attr->id) {
+	case SWITCHDEV_ATTR_ID_PORT_PARENT_ID:
+		attr->u.ppid.id_len = ETH_ALEN;
+		ether_addr_copy(attr->u.ppid.id, dev->dev_addr);
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+static const struct switchdev_ops i40e_switchdev_pf_ops = {
+	.switchdev_port_attr_get	= i40e_switchdev_pf_attr_get,
+};
+
 /**
  * i40e_vsi_config_netdev_tc - Setup the netdev TC configuration
  * @vsi: the VSI being configured
@@ -9364,6 +9390,9 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
 	netdev->netdev_ops = &i40e_netdev_ops;
 	netdev->watchdog_timeo = 5 * HZ;
 	i40e_set_ethtool_ops(netdev);
+#ifdef CONFIG_NET_SWITCHDEV
+	netdev->switchdev_ops = &i40e_switchdev_pf_ops;
+#endif
 
 	/* MTU range: 68 - 9706 */
 	netdev->min_mtu = ETH_MIN_MTU;
@@ -11111,6 +11140,32 @@ static int i40e_port_netdev_stop(struct net_device *dev)
 	return -EINVAL;
 }
 
+static int
+i40e_port_netdev_get_phys_port_name(struct net_device *dev, char *buf,
+				    size_t len)
+{
+	struct i40e_port_netdev_priv *priv = netdev_priv(dev);
+	struct i40e_vf *vf;
+	int ret;
+
+	switch (priv->type) {
+	case I40E_PORT_NETDEV_VF:
+		vf = (struct i40e_vf *)priv->f;
+		ret = snprintf(buf, len, "%d", vf->vf_id);
+		break;
+	case I40E_PORT_NETDEV_PF:
+		ret = snprintf(buf, len, "%d", I40E_MAIN_VSI_PORT_ID);
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	if (ret >= len)
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
 static const struct net_device_ops i40e_port_netdev_ops = {
 	.ndo_open		= i40e_port_netdev_open,
 	.ndo_stop		= i40e_port_netdev_stop,
@@ -11118,6 +11173,44 @@ static int i40e_port_netdev_stop(struct net_device *dev)
 	.ndo_get_stats64	= i40e_port_netdev_get_stats64,
 	.ndo_has_offload_stats	= i40e_port_netdev_has_offload_stats,
 	.ndo_get_offload_stats	= i40e_port_netdev_get_offload_stats,
+	.ndo_get_phys_port_name	= i40e_port_netdev_get_phys_port_name,
+};
+
+static int i40e_switchdev_port_attr_get(struct net_device *dev,
+					struct switchdev_attr *attr)
+{
+	struct i40e_port_netdev_priv *priv = netdev_priv(dev);
+	struct i40e_vsi *vsi;
+	struct i40e_pf *pf;
+	struct i40e_vf *vf;
+
+	switch (priv->type) {
+	case I40E_PORT_NETDEV_VF:
+		vf = (struct i40e_vf *)priv->f;
+		pf = vf->pf;
+		break;
+	case I40E_PORT_NETDEV_PF:
+		pf = (struct i40e_pf *)priv->f;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	vsi = pf->vsi[pf->lan_vsi];
+	switch (attr->id) {
+	case SWITCHDEV_ATTR_ID_PORT_PARENT_ID:
+		attr->u.ppid.id_len = ETH_ALEN;
+		ether_addr_copy(attr->u.ppid.id, vsi->netdev->dev_addr);
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+static const struct switchdev_ops i40e_switchdev_port_ops = {
+	.switchdev_port_attr_get	= i40e_switchdev_port_attr_get,
 };
 
 /**
@@ -11203,6 +11296,10 @@ int i40e_alloc_port_netdev(void *f, enum i40e_port_netdev_type type)
 	port_netdev->netdev_ops = &i40e_port_netdev_ops;
 	eth_hw_addr_random(port_netdev);
 
+#ifdef CONFIG_NET_SWITCHDEV
+	port_netdev->switchdev_ops = &i40e_switchdev_port_ops;
+#endif
+
 	netif_carrier_off(port_netdev);
 	netif_tx_stop_all_queues(port_netdev);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Intel-wired-lan] [next-queue v6 PATCH 2/7] i40e: Introduce Port Representor netdevs and switchdev mode.
  2017-03-30  0:22   ` [Intel-wired-lan] " Sridhar Samudrala
  (?)
@ 2017-03-30  7:17   ` Or Gerlitz
  2017-04-03 18:41     ` Samudrala, Sridhar
  -1 siblings, 1 reply; 46+ messages in thread
From: Or Gerlitz @ 2017-03-30  7:17 UTC (permalink / raw)
  To: intel-wired-lan

On Thu, Mar 30, 2017 at 3:22 AM, Sridhar Samudrala
<sridhar.samudrala@intel.com> wrote:

> Port Representator netdevs are created for each PF and VF if the switch
> mode is set to 'switchdev'. These netdevs can be used to control and
> configure VFs and PFs when they are moved to a different namespace.
> They enable exposing statistics, configure and monitor link state, mtu,
> filters,fdb/vlan entries etc.
>


What netdev represents the uplink (wire port) in your impl?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [next-queue v6 PATCH 2/7] i40e: Introduce Port Representor netdevs and switchdev mode.
  2017-03-30  0:22   ` [Intel-wired-lan] " Sridhar Samudrala
@ 2017-03-30  9:17     ` Or Gerlitz
  -1 siblings, 0 replies; 46+ messages in thread
From: Or Gerlitz @ 2017-03-30  9:17 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: intel-wired-lan, Linux Netdev List, Alexander Duyck,
	Anjali Singhai Jain, Jakub Kicinski, Jiri Pirko

On Thu, Mar 30, 2017 at 3:22 AM, Sridhar Samudrala
<sridhar.samudrala@intel.com> wrote:
> Port Representator netdevs are created for each PF and VF if the switch
> mode is set to 'switchdev'. These netdevs can be used to control and
> configure VFs and PFs when they are moved to a different namespace.
> They enable exposing statistics, configure and monitor link state, mtu,
> filters,fdb/vlan entries etc.


What netdev represents the uplink (wire port) in your impl?

* Re: [next-queue v6 PATCH 5/7] i40e: Add TX and RX support over port netdev's in switchdev mode
  2017-03-30  0:22   ` [Intel-wired-lan] " Sridhar Samudrala
@ 2017-03-30  9:26     ` Or Gerlitz
  -1 siblings, 0 replies; 46+ messages in thread
From: Or Gerlitz @ 2017-03-30  9:26 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: intel-wired-lan, Linux Netdev List, Alexander Duyck,
	Anjali Singhai Jain, Jakub Kicinski, Jiri Pirko

On Thu, Mar 30, 2017 at 3:22 AM, Sridhar Samudrala
<sridhar.samudrala@intel.com> wrote:
> Any frames sent via port netdevs are sent as directed transmits to the
> corresponding VFs.

okay, cool

> In switchdev mode, broadcasts from VFs are received by the PF and passed
> to the corresponding port representor netdev.

not following.

If a VF sends a packet and it doesn't match any HW steering rule, then
it has to hit some default rule. Such a rule can be fwd to host CPU, drop,
or something else.

E.g. in mlx5 it is currently fwd to CPU --> the packet is delivered to
the HW queue of the corresponding VF rep and is received into the host
networking stack from there (the VF rep does netif_rx).

In this series you are not doing any offloading, right? So 100% of the
packets sent by VFs should hit your default rule, which I assume you want
to be fwd to host CPU (--> VF rep).

Is broadcast a special case that will remain in place even when you add
fdb/tc offloading? Why not let the HW steering configuration for all types
of traffic be dictated by offloading SW switching rules?

FWIW, I will not be online till Tuesday, so I will only see your reply then.

Or.
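
To make the default-rule behaviour described above concrete, here is a minimal userspace model of it, assuming a simple exact-match table keyed on destination MAC. Every name in it (steer_rule, steer_lookup, the ACTION_* values) is hypothetical and does not correspond to mlx5 or i40e code. The only point it illustrates is that, with no rules offloaded, every frame sent by a VF falls through to the default rule, which here forwards it to the host CPU, i.e. to that VF's representor.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical actions a HW steering rule can take. */
enum steer_action {
	ACTION_FWD_VPORT,  /* deliver to another VF/PF vport */
	ACTION_DROP,
	ACTION_FWD_TO_REP, /* deliver to the sending VF's representor on the host */
};

/* Hypothetical exact-match rule keyed on destination MAC. */
struct steer_rule {
	unsigned char dst_mac[6];
	enum steer_action action;
	int dst_vport;     /* only meaningful for ACTION_FWD_VPORT */
};

struct steer_result {
	enum steer_action action;
	int vport;         /* destination vport, or the source VF for FWD_TO_REP */
};

/*
 * Look up a frame sent by VF 'src_vf'. If no offloaded rule matches,
 * the default rule applies; in this sketch the default is "forward to
 * host CPU", which lands the frame on the representor of the sender.
 */
static struct steer_result steer_lookup(const struct steer_rule *rules,
					size_t n_rules,
					const unsigned char dst_mac[6],
					int src_vf)
{
	struct steer_result res = { ACTION_FWD_TO_REP, src_vf };
	size_t i;

	for (i = 0; i < n_rules; i++) {
		if (memcmp(rules[i].dst_mac, dst_mac, 6) == 0) {
			res.action = rules[i].action;
			res.vport = rules[i].dst_vport;
			return res;
		}
	}
	return res; /* default rule: no match -> VF representor */
}
```

With an empty rule table (the "no offloading" case in this series), steer_lookup() returns ACTION_FWD_TO_REP for every frame, regardless of whether it is broadcast or unicast.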

* Re: [next-queue v6 PATCH 7/7] i40e: Add support to get switch id and port number for port netdevs
  2017-03-30  0:22   ` [Intel-wired-lan] " Sridhar Samudrala
@ 2017-03-30 21:45     ` Jakub Kicinski
  -1 siblings, 0 replies; 46+ messages in thread
From: Jakub Kicinski @ 2017-03-30 21:45 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: intel-wired-lan, netdev, alexander.h.duyck, anjali.singhai,
	gerlitz.or, jiri, Simon Horman

On Wed, 29 Mar 2017 17:22:55 -0700, Sridhar Samudrala wrote:
> Introduce switchdev_ops to PF and port netdevs to return the switch id via
> SWITCHDEV_ATTR_ID_PORT_PARENT_ID attribute.
> Also, ndo_get_phys_port_name() support is added to port netdevs to return
> the port number.
> 
...
> +static int
> +i40e_port_netdev_get_phys_port_name(struct net_device *dev, char *buf,
> +				    size_t len)
> +{
> +	struct i40e_port_netdev_priv *priv = netdev_priv(dev);
> +	struct i40e_vf *vf;
> +	int ret;
> +
> +	switch (priv->type) {
> +	case I40E_PORT_NETDEV_VF:
> +		vf = (struct i40e_vf *)priv->f;
> +		ret = snprintf(buf, len, "%d", vf->vf_id);
> +		break;
> +	case I40E_PORT_NETDEV_PF:
> +		ret = snprintf(buf, len, "%d", I40E_MAIN_VSI_PORT_ID);
> +		break;
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +
> +	if (ret >= len)
> +		return -EOPNOTSUPP;
> +
> +	return 0;
> +}

You are using only an integer here, which forces you to manually name
the netdev in patch 2, and that is what phys_port_name is supposed to
help avoid doing AFAIU.

We have naming rules in Documentation/networking/switchdev.txt for
switch ports suggested as pX for physical ports or pXsY for ports which
are broken out/split.  Could we establish similar suggestion for vf and
pf representors and document it? (note: we may need pf representors for
multi-host devices.)

IMHO naming representors pfr%d or vfr%d would make sense.  This way
actual VF and PF netdevs could be called pf%d and vf%d, and
udev/systemd will give all netdevs nice, meaningful names without any
custom rules.

Sorry for the bike shedding but I was hoping we could save some user
pain by establishing those rules (more or less) upfront.
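
A minimal sketch of the naming convention proposed above, written as a userspace model of an ndo_get_phys_port_name() handler. The rep_type enum and rep_phys_port_name() helper are hypothetical stand-ins for the driver's I40E_PORT_NETDEV_* types, and the "pfr%d"/"vfr%d" format is only the suggestion made in this thread, not an agreed convention. Like the quoted patch, it reports truncation as -EOPNOTSUPP.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical representor types, mirroring I40E_PORT_NETDEV_{PF,VF}. */
enum rep_type { REP_PF, REP_VF };

/*
 * Format a phys_port_name using the "pfr%d"/"vfr%d" convention suggested
 * above. Returns 0 on success and -EOPNOTSUPP for unknown types or when
 * the name does not fit in 'len' bytes.
 */
static int rep_phys_port_name(enum rep_type type, int id, char *buf,
			      size_t len)
{
	int ret;

	switch (type) {
	case REP_PF:
		ret = snprintf(buf, len, "pfr%d", id);
		break;
	case REP_VF:
		ret = snprintf(buf, len, "vfr%d", id);
		break;
	default:
		return -EOPNOTSUPP;
	}

	/* snprintf() returns the untruncated length; >= len means truncated. */
	if (ret < 0 || (size_t)ret >= len)
		return -EOPNOTSUPP;

	return 0;
}
```

The idea is that user space can then tell a VF representor from a split physical port (pXsY) by the prefix alone, without driver-assigned netdev names.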

* Re: [next-queue v6 PATCH 7/7] i40e: Add support to get switch id and port number for port netdevs
  2017-03-30 21:45     ` [Intel-wired-lan] " Jakub Kicinski
@ 2017-03-30 22:31       ` Alexander Duyck
  -1 siblings, 0 replies; 46+ messages in thread
From: Alexander Duyck @ 2017-03-30 22:31 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Sridhar Samudrala, intel-wired-lan, Netdev, Duyck, Alexander H,
	Anjali Singhai Jain, Or Gerlitz, Jiri Pirko, Simon Horman

On Thu, Mar 30, 2017 at 2:45 PM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> On Wed, 29 Mar 2017 17:22:55 -0700, Sridhar Samudrala wrote:
>> Introduce switchdev_ops to PF and port netdevs to return the switch id via
>> SWITCHDEV_ATTR_ID_PORT_PARENT_ID attribute.
>> Also, ndo_get_phys_port_name() support is added to port netdevs to return
>> the port number.
>>
> ...
>> +static int
>> +i40e_port_netdev_get_phys_port_name(struct net_device *dev, char *buf,
>> +                                 size_t len)
>> +{
>> +     struct i40e_port_netdev_priv *priv = netdev_priv(dev);
>> +     struct i40e_vf *vf;
>> +     int ret;
>> +
>> +     switch (priv->type) {
>> +     case I40E_PORT_NETDEV_VF:
>> +             vf = (struct i40e_vf *)priv->f;
>> +             ret = snprintf(buf, len, "%d", vf->vf_id);
>> +             break;
>> +     case I40E_PORT_NETDEV_PF:
>> +             ret = snprintf(buf, len, "%d", I40E_MAIN_VSI_PORT_ID);
>> +             break;
>> +     default:
>> +             return -EOPNOTSUPP;
>> +     }
>> +
>> +     if (ret >= len)
>> +             return -EOPNOTSUPP;
>> +
>> +     return 0;
>> +}
>
> You are using only an integer here, which forces you to manually name
> the netdev in patch 2, and that is what phys_port_name is supposed to
> help avoid doing AFAIU.
>
> We have naming rules in Documentation/networking/switchdev.txt for
> switch ports suggested as pX for physical ports or pXsY for ports which
> are broken out/split.  Could we establish similar suggestion for vf and
> pf representors and document it? (note: we may need pf representors for
> multi-host devices.)
>
> IMHO naming representors pfr%d or vfr%d would make sense.  This way
> actual VF and PF netdevs could be called pf%d and vf%d, and
> udev/systemd will give all netdevs nice, meaningful names without any
> custom rules.
>
> Sorry for the bike shedding but I was hoping we could save some user
> pain by establishing those rules (more or less) upfront.

This is something we should probably discuss at netdev/netconf next
week. It seems like the convention has been to just use an integer, and
I think we might want to look at doing something like you are
suggesting, where if nothing else we come up with a way of identifying
a VF versus something like a segmented port, which is the only
thing currently defined in the documentation.

- Alex

* Re: [next-queue v6 PATCH 7/7] i40e: Add support to get switch id and port number for port netdevs
  2017-03-30 22:31       ` [Intel-wired-lan] " Alexander Duyck
@ 2017-03-31  1:16         ` Jakub Kicinski
  -1 siblings, 0 replies; 46+ messages in thread
From: Jakub Kicinski @ 2017-03-31  1:16 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Sridhar Samudrala, intel-wired-lan, Netdev, Duyck, Alexander H,
	Anjali Singhai Jain, Or Gerlitz, Jiri Pirko, Simon Horman

On Thu, 30 Mar 2017 15:31:01 -0700, Alexander Duyck wrote:
> On Thu, Mar 30, 2017 at 2:45 PM, Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
> > On Wed, 29 Mar 2017 17:22:55 -0700, Sridhar Samudrala wrote:  
> >> Introduce switchdev_ops to PF and port netdevs to return the switch id via
> >> SWITCHDEV_ATTR_ID_PORT_PARENT_ID attribute.
> >> Also, ndo_get_phys_port_name() support is added to port netdevs to return
> >> the port number.
> >>  
> > ...  
> >> +static int
> >> +i40e_port_netdev_get_phys_port_name(struct net_device *dev, char *buf,
> >> +                                 size_t len)
> >> +{
> >> +     struct i40e_port_netdev_priv *priv = netdev_priv(dev);
> >> +     struct i40e_vf *vf;
> >> +     int ret;
> >> +
> >> +     switch (priv->type) {
> >> +     case I40E_PORT_NETDEV_VF:
> >> +             vf = (struct i40e_vf *)priv->f;
> >> +             ret = snprintf(buf, len, "%d", vf->vf_id);
> >> +             break;
> >> +     case I40E_PORT_NETDEV_PF:
> >> +             ret = snprintf(buf, len, "%d", I40E_MAIN_VSI_PORT_ID);
> >> +             break;
> >> +     default:
> >> +             return -EOPNOTSUPP;
> >> +     }
> >> +
> >> +     if (ret >= len)
> >> +             return -EOPNOTSUPP;
> >> +
> >> +     return 0;
> >> +}  
> >
> > You are using only an integer here, which forces you to manually name
> > the netdev in patch 2, and that is what phys_port_name is supposed to
> > help avoid doing AFAIU.
> >
> > We have naming rules in Documentation/networking/switchdev.txt for
> > switch ports suggested as pX for physical ports or pXsY for ports which
> > are broken out/split.  Could we establish similar suggestion for vf and
> > pf representors and document it? (note: we may need pf representors for
> > multi-host devices.)
> >
> > IMHO naming representors pfr%d or vfr%d would make sense.  This way
> > actual VF and PF netdevs could be called pf%d and vf%d, and
> > udev/systemd will give all netdevs nice, meaningful names without any
> > custom rules.
> >
> > Sorry for the bike shedding but I was hoping we could save some user
> > pain by establishing those rules (more or less) upfront.  
> 
> This is something we should probably discuss at netdev/netconf next
> week. It seems like the convention has been to just use an integer, and
> I think we might want to look at doing something like you are
> suggesting, where if nothing else we come up with a way of identifying
> a VF versus something like a segmented port, which is the only
> thing currently defined in the documentation.

Sure.  If we want to talk about this at netdev, there is another,
more minor, thing we were pondering: how to nicely represent the VF --
PCI dev -- representor netdev relation, e.g. for OpenStack integration?

AFAIU, when a PCI device is added to a VM, user space should add the
representors to the appropriate bridges and fire the legacy SR-IOV ndos
to set MAC/VLAN.  VF PCI dev and PF PCI dev are nicely linked in sysfs
via virtfnX and physfn files.  But going from the VF PCI dev to the
representor requires iterating over all representor netdevs to find the
right switchdev_id + phys_port_name combination.

One way to solve this would be to SET_NETDEV_DEV() the representor
netdev to the VF pci dev, but then representors may not share the base
enpXsYfZ name since they will be using different PCI devices as the
parent...
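
The VF PCI dev to representor lookup described above amounts to a linear scan. A minimal userspace sketch, assuming the switch id and phys_port_name of every netdev have already been read (for example from the phys_switch_id and phys_port_name attributes under /sys/class/net/<ifname>/). struct netdev_info and find_representor() are hypothetical names, and which port name corresponds to a given VF depends on the driver's phys_port_name format.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-netdev snapshot gathered by user space. */
struct netdev_info {
	const char *ifname;
	const char *switch_id;      /* hex string, NULL if not a switch port */
	const char *phys_port_name; /* NULL if the netdev has none */
};

/*
 * Scan every netdev for the one whose switch id and phys_port_name both
 * match. Nothing links the VF PCI device to its representor directly,
 * so the cost is O(number of netdevs) per lookup.
 */
static const struct netdev_info *
find_representor(const struct netdev_info *devs, size_t n,
		 const char *switch_id, const char *port_name)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!devs[i].switch_id || !devs[i].phys_port_name)
			continue;
		if (strcmp(devs[i].switch_id, switch_id) == 0 &&
		    strcmp(devs[i].phys_port_name, port_name) == 0)
			return &devs[i];
	}
	return NULL;
}
```

The switch id would come from the PF (reachable via the VF's physfn link), and the expected port name from the VF index (derived from the PF's virtfnX links).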

* [Intel-wired-lan] [next-queue v6 PATCH 1/7] i40e: Introduce devlink interface
  2017-03-30  0:22   ` [Intel-wired-lan] " Sridhar Samudrala
  (?)
@ 2017-03-31 19:35   ` Bowers, AndrewX
  -1 siblings, 0 replies; 46+ messages in thread
From: Bowers, AndrewX @ 2017-03-31 19:35 UTC (permalink / raw)
  To: intel-wired-lan

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces at lists.osuosl.org] On
> Behalf Of Sridhar Samudrala
> Sent: Wednesday, March 29, 2017 5:23 PM
> To: intel-wired-lan at lists.osuosl.org; netdev at vger.kernel.org; Duyck,
> Alexander H <alexander.h.duyck@intel.com>; Singhai, Anjali
> <anjali.singhai@intel.com>; jakub.kicinski at netronome.com;
> gerlitz.or at gmail.com; jiri at resnulli.us; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>
> Subject: [Intel-wired-lan] [next-queue v6 PATCH 1/7] i40e: Introduce devlink
> interface
> 
> Add initial devlink support to get/set the mode of the SR-IOV switch.
> This patch sets the default mode to 'legacy' and enables getting the mode
> and setting it to 'legacy'.
> 
> The switch mode can be queried/set via the following 'devlink' commands.
> 
> # devlink dev eswitch show pci/0000:42:00.0
> pci/0000:42:00.0: mode legacy
> # devlink dev eswitch set pci/0000:42:00.0 mode switchdev
> devlink answers: Operation not supported
> # devlink dev eswitch set pci/0000:42:00.0 mode legacy
> # devlink dev eswitch show pci/0000:42:00.0
> pci/0000:05:00.0: mode legacy
> 
> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> ---
>  drivers/net/ethernet/intel/Kconfig          |  1 +
>  drivers/net/ethernet/intel/i40e/i40e.h      |  3 ++
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 77 ++++++++++++++++++++++++++---
>  3 files changed, 75 insertions(+), 6 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
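
A minimal userspace model of the behaviour this patch describes: the mode defaults to 'legacy', it can be read back, and only 'legacy' is accepted on set, so a switchdev request fails the way the devlink output above shows. It is loosely modelled on devlink's eswitch_mode_get/eswitch_mode_set callbacks, but the types and signatures here are simplified and hypothetical (the real callbacks take a struct devlink pointer).

```c
#include <errno.h>

/* Hypothetical mirror of the devlink eswitch modes. */
enum eswitch_mode { ESWITCH_MODE_LEGACY, ESWITCH_MODE_SWITCHDEV };

struct pf_state {
	enum eswitch_mode mode;
};

/* The commit message states the default mode is 'legacy'. */
static void pf_init(struct pf_state *pf)
{
	pf->mode = ESWITCH_MODE_LEGACY;
}

static int eswitch_mode_get(const struct pf_state *pf, enum eswitch_mode *mode)
{
	*mode = pf->mode;
	return 0;
}

/*
 * Only 'legacy' is accepted at this stage of the series; requesting
 * 'switchdev' fails with -EOPNOTSUPP, which the devlink tool reports
 * as "Operation not supported".
 */
static int eswitch_mode_set(struct pf_state *pf, enum eswitch_mode mode)
{
	if (mode != ESWITCH_MODE_LEGACY)
		return -EOPNOTSUPP;
	pf->mode = mode;
	return 0;
}
```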

* [Intel-wired-lan] [next-queue v6 PATCH 2/7] i40e: Introduce Port Representor netdevs and switchdev mode.
  2017-03-30  0:22   ` [Intel-wired-lan] " Sridhar Samudrala
                     ` (2 preceding siblings ...)
  (?)
@ 2017-03-31 19:35   ` Bowers, AndrewX
  -1 siblings, 0 replies; 46+ messages in thread
From: Bowers, AndrewX @ 2017-03-31 19:35 UTC (permalink / raw)
  To: intel-wired-lan

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces at lists.osuosl.org] On
> Behalf Of Sridhar Samudrala
> Sent: Wednesday, March 29, 2017 5:23 PM
> To: intel-wired-lan at lists.osuosl.org; netdev at vger.kernel.org; Duyck,
> Alexander H <alexander.h.duyck@intel.com>; Singhai, Anjali
> <anjali.singhai@intel.com>; jakub.kicinski at netronome.com;
> gerlitz.or at gmail.com; jiri at resnulli.us; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>
> Subject: [Intel-wired-lan] [next-queue v6 PATCH 2/7] i40e: Introduce Port
> Representor netdevs and switchdev mode.
> 
> Port representor netdevs are created for each PF and VF if the switch
> mode is set to 'switchdev'. These netdevs can be used to control and
> configure VFs and PFs when they are moved to a different namespace.
> They enable exposing statistics, configuring and monitoring link state,
> MTU, filters, fdb/vlan entries, etc.
> 
> Sample script to create port representors:
> # rmmod i40e; modprobe i40e
> # devlink dev eswitch set pci/0000:42:00.0 mode switchdev
> # echo 2 > /sys/class/net/p4p1/device/sriov_numvfs
> # ip l show
> 122: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
>     link/ether 3c:fd:fe:a3:18:f8 brd ff:ff:ff:ff:ff:ff
>     vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
>     vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
> 124: p4p1-pf: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
>     link/ether 72:8e:34:b2:d0:44 brd ff:ff:ff:ff:ff:ff
> 125: p4p1-vf0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
>     link/ether 02:57:a0:18:2b:ce brd ff:ff:ff:ff:ff:ff
> 126: p4p1-vf1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
>     link/ether 32:7c:77:5f:3e:e3 brd ff:ff:ff:ff:ff:ff
> 127: p4p1_0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
>     link/ether 26:51:28:54:69:43 brd ff:ff:ff:ff:ff:ff
> 128: p4p1_1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
> 
> p4p1 is the PF. p4p1-pf is the port netdev for the PF.
> p4p1_0 and p4p1_1 are the VFs, and p4p1-vf0 and p4p1-vf1 are the port
> netdevs for the 2 VFs.
> 
> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e.h             |  19 +++
>  drivers/net/ethernet/intel/i40e/i40e_main.c        | 187 ++++++++++++++++++++-
>  drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |   9 +
>  drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h |   6 +
>  4 files changed, 220 insertions(+), 1 deletion(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>

* [Intel-wired-lan] [next-queue v6 PATCH 3/7] i40e: Sync link state between PF/VFs and Port representor netdevs
  2017-03-30  0:22   ` [Intel-wired-lan] " Sridhar Samudrala
  (?)
@ 2017-03-31 19:37   ` Bowers, AndrewX
  -1 siblings, 0 replies; 46+ messages in thread
From: Bowers, AndrewX @ 2017-03-31 19:37 UTC (permalink / raw)
  To: intel-wired-lan

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces at lists.osuosl.org] On
> Behalf Of Sridhar Samudrala
> Sent: Wednesday, March 29, 2017 5:23 PM
> To: intel-wired-lan at lists.osuosl.org; netdev at vger.kernel.org; Duyck,
> Alexander H <alexander.h.duyck@intel.com>; Singhai, Anjali
> <anjali.singhai@intel.com>; jakub.kicinski at netronome.com;
> gerlitz.or at gmail.com; jiri at resnulli.us; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>
> Subject: [Intel-wired-lan] [next-queue v6 PATCH 3/7] i40e: Sync link state
> between PF/VFs and Port representor netdevs
> 
> This patch enables:
> - reflecting the link state of the port netdev based on the PF/VF admin
>   state, and the link state of the PF/VF based on the admin state of the
>   associated port netdev.
> - bringing up/down the VF port netdev sends a notification to update the
>   VF link state.
> - bringing up/down the VF updates the link state of the VF port netdev.
> - enabling/disabling the VF link state via ndo_set_vf_link_state updates
>   the admin state of the associated VF port netdev.
> - bringing up/down the PF port netdev updates the link state of the PF
>   based on the HW link info.
> - bringing up/down the PF updates the link state of the PF port netdev.
> 
> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_main.c        | 106 ++++++++++++++++++++-
>  drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |  21 +++-
>  drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h |   1 +
>  3 files changed, 122 insertions(+), 6 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
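
The link-state rules listed in this patch reduce to a few assignments. A minimal userspace sketch, assuming one boolean admin state and one boolean link/carrier state per function and per representor. struct func_state, struct rep_state and the sync_*() helpers are hypothetical, and set_vf_link_state() collapses the ndo_set_vf_link_state() enable/disable cases into a bool (ignoring 'auto').

```c
#include <stdbool.h>

/* Hypothetical per-function (PF or VF) state. */
struct func_state {
	bool admin_up; /* PF/VF netdev administratively up */
	bool link_up;  /* link state reported to the PF/VF */
};

/* Hypothetical port representor state. */
struct rep_state {
	bool admin_up; /* representor netdev administratively up */
	bool carrier;  /* link state reported on the representor */
};

/* VF: its link follows the representor's admin state, and the
 * representor's carrier follows the VF's admin state. */
static void sync_vf(struct func_state *vf, struct rep_state *rep)
{
	vf->link_up = rep->admin_up;
	rep->carrier = vf->admin_up;
}

/* PF: like a VF, but its link additionally requires the HW link. */
static void sync_pf(struct func_state *pf, struct rep_state *rep,
		    bool hw_link_up)
{
	pf->link_up = rep->admin_up && hw_link_up;
	rep->carrier = pf->admin_up;
}

/* Enable/disable via the legacy SR-IOV ndo is mapped onto the
 * representor's admin state, after which the VF is re-synced. */
static void set_vf_link_state(struct func_state *vf, struct rep_state *rep,
			      bool enable)
{
	rep->admin_up = enable;
	sync_vf(vf, rep);
}
```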


* [Intel-wired-lan] [next-queue v6 PATCH 6/7] i40e: Add support for exposing switch port statistics via port netdevs
  2017-03-30  0:22   ` [Intel-wired-lan] " Sridhar Samudrala
  (?)
@ 2017-03-31 19:39   ` Bowers, AndrewX
  -1 siblings, 0 replies; 46+ messages in thread
From: Bowers, AndrewX @ 2017-03-31 19:39 UTC (permalink / raw)
  To: intel-wired-lan

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces at lists.osuosl.org] On
> Behalf Of Sridhar Samudrala
> Sent: Wednesday, March 29, 2017 5:23 PM
> To: intel-wired-lan at lists.osuosl.org; netdev at vger.kernel.org; Duyck,
> Alexander H <alexander.h.duyck@intel.com>; Singhai, Anjali
> <anjali.singhai@intel.com>; jakub.kicinski at netronome.com;
> gerlitz.or at gmail.com; jiri at resnulli.us; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>
> Subject: [Intel-wired-lan] [next-queue v6 PATCH 6/7] i40e: Add support for
> exposing switch port statistics via port netdevs
> 
> By default, stats counted by HW are returned via the original
> ndo_get_stats64() API. Stats counted in SW are returned via the
> ndo_get_offload_stats() API.
> 
> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e.h      |  10 +++
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 125 ++++++++++++++++++++++++++++
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c |  24 +++++-
>  3 files changed, 158 insertions(+), 1 deletion(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>

* [Intel-wired-lan] [next-queue v6 PATCH 4/7] net: store port/representator id in metadata_dst
  2017-03-30  0:22   ` [Intel-wired-lan] " Sridhar Samudrala
  (?)
@ 2017-03-31 19:42   ` Bowers, AndrewX
  -1 siblings, 0 replies; 46+ messages in thread
From: Bowers, AndrewX @ 2017-03-31 19:42 UTC (permalink / raw)
  To: intel-wired-lan

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces at lists.osuosl.org] On
> Behalf Of Sridhar Samudrala
> Sent: Wednesday, March 29, 2017 5:23 PM
> To: intel-wired-lan at lists.osuosl.org; netdev at vger.kernel.org; Duyck,
> Alexander H <alexander.h.duyck@intel.com>; Singhai, Anjali
> <anjali.singhai@intel.com>; jakub.kicinski at netronome.com;
> gerlitz.or at gmail.com; jiri at resnulli.us; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>
> Subject: [Intel-wired-lan] [next-queue v6 PATCH 4/7] net: store
> port/representator id in metadata_dst
> 
> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> 
> Switches and modern SR-IOV enabled NICs may multiplex traffic from Port
> representators and control messages over single set of hardware queues.
> Control messages and muxed traffic may need ordered delivery.
> 
> Those requirements make it hard to comfortably use TC infrastructure today
> unless we have a way of attaching metadata to skbs at the upper device.
> Because single set of queues is used for many netdevs stopping TC/sched
> queues of all of them reliably is impossible and lower device has to retreat to
> returning NETDEV_TX_BUSY and usually has to take extra locks on the
> fastpath.
> 
> This patch attempts to enable port/representative devs to attach metadata
> to skbs which carry port id.  This way representatives can be queueless and
> all queuing can be performed at the lower netdev in the usual way.
> 
> Traffic arriving on the port/representative interfaces will have metadata
> attached and will subsequently be queued to the lower device for
> transmission. The lower device should recognize the metadata and translate
> it to a HW-specific format, which is most likely either a special header inserted
> before the network headers or descriptor/metadata fields.
> 
> Metadata is associated with the lower device by storing the netdev pointer
> along with the port id, so that if TC decides to redirect or mirror the skb,
> the new netdev will not try to interpret it.
> 
> This is mostly for SR-IOV devices since switches don't have lower netdevs
> today.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> ---
>  include/net/dst_metadata.h     | 41 ++++++++++++++++++++++++++++++++---------
>  net/core/dst.c                 | 15 ++++++++++-----
>  net/core/filter.c              |  1 +
>  net/ipv4/ip_tunnel_core.c      |  6 ++++--
>  net/openvswitch/flow_netlink.c |  4 +++-
>  5 files changed, 50 insertions(+), 17 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
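The commit message above has the upper (port/representator) device attach a port id to each skb, together with a pointer to the lower device that id is meant for, so that a device reached later through a TC redirect or mirror ignores it. The following is a userspace sketch of just that ownership check; `struct mock_netdev`, `struct hw_port_md` and `lower_dev_port_id()` are hypothetical stand-ins, not the kernel's `metadata_dst` API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct net_device. */
struct mock_netdev {
	const char *name;
};

/* Hypothetical stand-in for the port-mux metadata carried by the skb. */
struct hw_port_md {
	const struct mock_netdev *lower_dev; /* device the port id is meant for */
	uint32_t port_id;                    /* e.g. a VF index on that device */
};

/* Returns true and stores the port id only when the metadata was attached
 * for @dev. Any other device (for instance after a TC redirect or mirror)
 * must treat the frame as untagged. */
static bool lower_dev_port_id(const struct mock_netdev *dev,
			      const struct hw_port_md *md, uint32_t *port_id)
{
	if (md == NULL || md->lower_dev != dev)
		return false;
	*port_id = md->port_id;
	return true;
}
```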




* [Intel-wired-lan] [next-queue v6 PATCH 7/7] i40e: Add support to get switch id and port number for port netdevs
  2017-03-30  0:22   ` [Intel-wired-lan] " Sridhar Samudrala
@ 2017-03-31 21:09   ` Bowers, AndrewX
  -1 siblings, 0 replies; 46+ messages in thread
From: Bowers, AndrewX @ 2017-03-31 21:09 UTC (permalink / raw)
  To: intel-wired-lan

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces at lists.osuosl.org] On
> Behalf Of Sridhar Samudrala
> Sent: Wednesday, March 29, 2017 5:23 PM
> To: intel-wired-lan at lists.osuosl.org; netdev at vger.kernel.org; Duyck,
> Alexander H <alexander.h.duyck@intel.com>; Singhai, Anjali
> <anjali.singhai@intel.com>; jakub.kicinski at netronome.com;
> gerlitz.or at gmail.com; jiri at resnulli.us; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>
> Subject: [Intel-wired-lan] [next-queue v6 PATCH 7/7] i40e: Add support to
> get switch id and port number for port netdevs
> 
> Introduce switchdev_ops to PF and port netdevs to return the switch id via
> SWITCHDEV_ATTR_ID_PORT_PARENT_ID attribute.
> Also, ndo_get_phys_port_name() support is added to port netdevs to return
> the port number.
> 
> PF: p4p1, VFs: p4p1_0,p4p1_1, VF port reps: p4p1-vf0, p4p1-vf1, PF port rep: p4p1-pf
>
> # rmmod i40e; modprobe i40e
> # devlink dev eswitch set pci/0000:42:00.0 mode switchdev
> # echo 2 > /sys/class/net/enp5s0f0/device/sriov_numvfs
> # ip -d l show p4p1
> 27: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>     link/ether 3c:fd:fe:a3:18:f8 brd ff:ff:ff:ff:ff:ff promiscuity 0 numtxqueues 64 numrxqueues 64 gso_max_size 65536 gso_max_segs 65535 portid 3cfdfea318f8 switchid 3cfdfea318f8
>     vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state disable, trust off
>     vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state disable, trust off
> # ip -d l show p4p1-pf
> 29: p4p1-pf: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>     link/ether 42:7a:b5:dc:85:11 brd ff:ff:ff:ff:ff:ff promiscuity 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 portname 65535 switchid 3cfdfea318f8
> # ip -d l show p4p1-vf0
> 30: p4p1-vf0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>     link/ether 6e:ff:0b:5a:63:6d brd ff:ff:ff:ff:ff:ff promiscuity 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 portname 0 switchid 3cfdfea318f8
> # ip -d l show p4p1-vf1
> 31: p4p1-vf1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>     link/ether 92:6e:ff:35:05:d5 brd ff:ff:ff:ff:ff:ff promiscuity 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 portname 1 switchid 3cfdfea318f8
> 
> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e.h      |  1 +
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 97 +++++++++++++++++++++++++++++
>  2 files changed, 98 insertions(+)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
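The `ip -d l show` output above suggests that the switch id is the PF's MAC address rendered as hex and that the port name is the decimal port id: the VF index for VF port netdevs and 65535 for the PF port netdev. Below is a userspace sketch that reproduces those strings; the helper names and `MOCK_PF_PORT_ID` are inferred from the output, not taken from the patch.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

#define MOCK_PF_PORT_ID 0xFFFFu /* inferred from "portname 65535" above */

/* Render an id the way `ip -d link` prints switchid/portid: lowercase hex,
 * two digits per byte, no separators. Stops early rather than emitting a
 * partial byte if @buf is too small. */
static void format_hex_id(const uint8_t *id, size_t len, char *buf, size_t buflen)
{
	size_t i;

	if (buflen == 0)
		return;
	buf[0] = '\0';
	for (i = 0; i < len && 2 * i + 2 < buflen; i++)
		snprintf(buf + 2 * i, buflen - 2 * i, "%02x", (unsigned int)id[i]);
}

/* Mimics what a port netdev's ndo_get_phys_port_name() could report: the
 * port id as a decimal string. Returns 0 on success, -1 if it did not fit. */
static int format_port_name(uint32_t port_id, char *buf, size_t len)
{
	int n = snprintf(buf, len, "%" PRIu32, port_id);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```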




* [Intel-wired-lan] [next-queue v6 PATCH 2/7] i40e: Introduce Port Representor netdevs and switchdev mode.
  2017-03-30  7:17   ` Or Gerlitz
@ 2017-04-03 18:41     ` Samudrala, Sridhar
  2017-04-04 11:58         ` [Intel-wired-lan] " Or Gerlitz
  0 siblings, 1 reply; 46+ messages in thread
From: Samudrala, Sridhar @ 2017-04-03 18:41 UTC (permalink / raw)
  To: intel-wired-lan



On 3/30/2017 12:17 AM, Or Gerlitz wrote:
> On Thu, Mar 30, 2017 at 3:22 AM, Sridhar Samudrala
> <sridhar.samudrala at intel.com> wrote:
>
>     Port Representor netdevs are created for each PF and VF if the switch
>     mode is set to 'switchdev'. These netdevs can be used to control and
>     configure VFs and PFs when they are moved to a different namespace.
>     They enable exposing statistics, configuring and monitoring link state,
>     mtu, filters, fdb/vlan entries etc.
>
>
>
> What netdev represents the uplink (wire port) in your impl?
>
We don't have a port netdev representing the uplink in this implementation as we
cannot control the frames going out the uplink via sw rules with the current
generation of hw/fw.

-Sridhar


* Re: [next-queue v6 PATCH 5/7] i40e: Add TX and RX support over port netdev's in switchdev mode
  2017-03-30  9:26     ` [Intel-wired-lan] " Or Gerlitz
@ 2017-04-03 18:52       ` Samudrala, Sridhar
  -1 siblings, 0 replies; 46+ messages in thread
From: Samudrala, Sridhar @ 2017-04-03 18:52 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: intel-wired-lan, Linux Netdev List, Alexander Duyck,
	Anjali Singhai Jain, Jakub Kicinski, Jiri Pirko

On 3/30/2017 2:26 AM, Or Gerlitz wrote:
> On Thu, Mar 30, 2017 at 3:22 AM, Sridhar Samudrala
> <sridhar.samudrala@intel.com> wrote:
>> Any frames sent via port netdevs are sent as directed transmits to the
>> corresponding VFs.
> okay, cool
>
>> In switchdev mode, broadcasts from VFs are received by the PF and passed
>> to corresponding port representor netdev.
> not following.
>
> If a VF sends a packet and it doesn't match any HW steering rule, then
> it has to meet some default rule. Such rule can be fwd to host CPU or drop
> or something else.
>
> E.g. in mlx5 currently it's fwd to CPU --> the packet is delivered to
> the HW queue of the corresponding VF rep and is received into the host
> networking stack from there (the VF rep does netif_rx).
fwd to CPU as default rule is not possible with the current generation of hw/fw.
So we would like to enable switchdev to expose the port representors and start
adding offloads in an incremental way.

>
> In this series you are not doing any offloading, right? So 100% of the packets
> sent by VFs should meet your default rule, which I assume you want to be
> fwd to host CPU (--> vf rep)
>
> Is that broadcast a special case which will remain in place also when you
> add fdb/tc offloading? why not let the HW steering configuration for all types
> of traffic be dictated by offloading some SW switching rules?
>
> FWIW - I will not be online till Tues, so will see your reply only then
>
> Or.
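A userspace model of the default rule Or describes: on e-switch ingress from a VF vport, offloaded rules are tried first, and an unmatched packet falls through to a default "send to CPU" action that surfaces it on the sending VF's representor (the mlx5 behaviour he cites). The rule layout and enum names are hypothetical; per Sridhar's reply, i40e cannot install this default with the current hw/fw.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum esw_action { ESW_FWD_VPORT, ESW_DROP, ESW_TO_REP };

/* Hypothetical offloaded rule: match on sending vport + destination MAC. */
struct esw_rule {
	uint16_t src_vport;
	uint8_t dmac[6];
	enum esw_action action;
	uint16_t dst_vport;          /* meaningful only for ESW_FWD_VPORT */
};

struct esw_verdict {
	enum esw_action action;
	uint16_t vport;              /* destination vport, or rep of the sender */
};

/* Ingress lookup for a packet sent by @src_vport. Offloaded rules win;
 * anything unmatched hits the default "send to CPU" action, i.e. it is
 * delivered to the representor of the sending vport. */
static struct esw_verdict esw_ingress(const struct esw_rule *rules, size_t n,
				      uint16_t src_vport, const uint8_t dmac[6])
{
	struct esw_verdict v = { ESW_TO_REP, src_vport };
	size_t i;

	for (i = 0; i < n; i++) {
		if (rules[i].src_vport == src_vport &&
		    memcmp(rules[i].dmac, dmac, 6) == 0) {
			v.action = rules[i].action;
			v.vport = rules[i].dst_vport;
			break;
		}
	}
	return v;
}
```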

* Re: [next-queue v6 PATCH 2/7] i40e: Introduce Port Representor netdevs and switchdev mode.
  2017-04-03 18:41     ` Samudrala, Sridhar
@ 2017-04-04 11:58         ` Or Gerlitz
  0 siblings, 0 replies; 46+ messages in thread
From: Or Gerlitz @ 2017-04-04 11:58 UTC (permalink / raw)
  To: Samudrala, Sridhar, Anjali Singhai Jain, Alexander Duyck
  Cc: intel-wired-lan, Linux Netdev List, Jakub Kicinski, Jiri Pirko

On Mon, Apr 3, 2017 at 9:41 PM, Samudrala, Sridhar
<sridhar.samudrala@intel.com> wrote:
> On 3/30/2017 12:17 AM, Or Gerlitz wrote:
>> On Thu, Mar 30, 2017, Sridhar Samudrala wrote:

>>> Port Representor netdevs are created for each PF and VF if the switch
>>> mode is set to 'switchdev'. These netdevs can be used to control and
>>> configure VFs and PFs when they are moved to a different namespace.
>>> They enable exposing statistics, configuring and monitoring link state, mtu,
>>> filters, fdb/vlan entries etc.

>>> In switchdev mode, broadcasts from VFs are received by the PF and passed
>>> to corresponding port representor netdev.

>> What netdev represents the uplink (wire port) in your impl?

combining your replies from the two emails:

> We don't have a port netdev representing the uplink in this implementation as we
> cannot control the frames going out the uplink via sw rules with the current
> generation of hw/fw.

> fwd to CPU as default rule is not possible with the current generation of hw/fw.
> So we would like to enable switchdev to expose the port representors and start
> adding offloads in an incremental way.

I lost you even deeper

I was asking about frames coming in from the uplink, not frames going out
the uplink.

This is about offloading to HW a switching model where the steering
(matching and actions) comes into play on the port ingress. E.g.

VF NIC xmit ---> VF vport e-switch rep recv --> SW or HW steering

other node xmit --> UPLINK vport e-switch rep recv --> SW or HW steering

If your current HW can't let you have "send to CPU" as the default action on ingress
for the VFs and uplink ports, I am not clear what use-cases you can do in the slow path
(only reps, no offloaded SW rules) and in the fast path (reps + offloaded SW rules)...

Can you please elaborate on such use-cases, so the bigger picture is more clear?

Or.

* Re: [Intel-wired-lan] [next-queue v6 PATCH 2/7] i40e: Introduce Port Representor netdevs and switchdev mode.
  2017-04-04 11:58         ` [Intel-wired-lan] " Or Gerlitz
@ 2017-04-04 15:29           ` Alexander Duyck
  -1 siblings, 0 replies; 46+ messages in thread
From: Alexander Duyck @ 2017-04-04 15:29 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Samudrala, Sridhar, Anjali Singhai Jain, Alexander Duyck,
	Jiri Pirko, Linux Netdev List, Jakub Kicinski, intel-wired-lan

On Tue, Apr 4, 2017 at 4:58 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Mon, Apr 3, 2017 at 9:41 PM, Samudrala, Sridhar
> <sridhar.samudrala@intel.com> wrote:
>> On 3/30/2017 12:17 AM, Or Gerlitz wrote:
>>> On Thu, Mar 30, 2017, Sridhar Samudrala wrote:
>
>>>> Port Representor netdevs are created for each PF and VF if the switch
>>>> mode is set to 'switchdev'. These netdevs can be used to control and
>>>> configure VFs and PFs when they are moved to a different namespace.
>>>> They enable exposing statistics, configuring and monitoring link state, mtu,
>>>> filters, fdb/vlan entries etc.
>
>>>> In switchdev mode, broadcasts from VFs are received by the PF and passed
>>>> to corresponding port representor netdev.
>
>>> What netdev represents the uplink (wire port) in your impl?
>
> combining your replies from the two emails:
>
>> We don't have a port netdev representing the uplink in this implementation as we
>> cannot control the frames going out the uplink via sw rules with the current
>> generation of hw/fw.
>
>> fwd to CPU as default rule is not possible with the current generation of hw/fw.
>> So we would like to enable switchdev to expose the port representors and start
>> adding offloads in an incremental way.
>
> I lost you even deeper
>
> I was asking on frames getting in from the uplink and not getting out
> the uplink.

Frames coming from the uplink will by default be routed to the PF. So
are you saying you want a representor for the uplink to handle the
packets that don't have any rules set up for them, correct?

I think we could set something like this up as we do have the concept
of a "default" entity that everything falls back into. It is just a
bit muddled since that currently exists as a part of the PF.

> This is about offloading to HW a switching model where the steering
> (matching and actions) comes into play on the port ingress. E.g.
>
> VF NIC xmit ---> VF vport e-switch rep recv --> SW or HW steering

So this bit we can't really support very well with the i40e hardware.
The problem is that unless there is a rule that exists to route it to
another PF/VF, there is a default rule in the hardware that would send
it out the uplink port. The only data we can really catch on the port
representors is broadcast/multicast, because the hardware replicates it.
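The VF transmit behaviour described above, modelled in userspace: known unicast destinations (PF or VF MACs) are switched locally, broadcast/multicast is replicated (so a copy reaches the PF and can be passed to the port representors), and everything else leaves on the uplink. This only illustrates the description; the names are hypothetical, and the real i40e switch likely considers more than the destination MAC (VLAN filters, for instance).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum veb_dest { VEB_TO_FUNCTION, VEB_REPLICATE, VEB_TO_UPLINK };

/* Hypothetical model of where a frame sent by a VF ends up. @local_macs
 * holds the MACs of the PF/VFs on this port; on VEB_TO_FUNCTION, *func is
 * set to the matching index. Only VEB_REPLICATE gives the host a copy. */
static enum veb_dest veb_vf_egress(const uint8_t dmac[6],
				   const uint8_t (*local_macs)[6], size_t n,
				   size_t *func)
{
	size_t i;

	if (dmac[0] & 0x01)          /* group bit set: broadcast or multicast */
		return VEB_REPLICATE;
	for (i = 0; i < n; i++) {
		if (memcmp(dmac, local_macs[i], 6) == 0) {
			*func = i;
			return VEB_TO_FUNCTION;
		}
	}
	return VEB_TO_UPLINK;        /* default rule: out the wire */
}
```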

> other node xmit --> UPLINK vport e-switch rep recv --> SW or HW steering

This part I think we can do. The default behavior would be to send a
packet to the default entity which in this case is the PF.
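For the uplink direction, a sketch of the fallback described here: an offloaded filter (for example one programmed from a TC rule) may steer a frame to a VF, and anything unmatched is delivered to the default entity, the PF. The filter format and `DEFAULT_ENTITY_PF` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEFAULT_ENTITY_PF (-1) /* hypothetical marker for "deliver to the PF" */

/* Hypothetical offloaded filter: destination MAC -> VF index. */
struct uplink_filter {
	uint8_t dmac[6];
	int vf;
};

/* Returns the VF a frame received on the uplink is steered to, or
 * DEFAULT_ENTITY_PF when no offloaded filter matches. */
static int uplink_ingress(const struct uplink_filter *f, size_t n,
			  const uint8_t dmac[6])
{
	size_t i;

	for (i = 0; i < n; i++)
		if (memcmp(f[i].dmac, dmac, 6) == 0)
			return f[i].vf;
	return DEFAULT_ENTITY_PF;
}
```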

> If your current HW can't let you have "send to CPU" as the default action on ingress
> for the VFs and uplink ports, I am not clear what use-cases you can do in the slow path
> (only reps, no offloaded SW rules) and in the fast path (reps + offloaded SW rules)...
>
> Can you please elaborate on such use-cases, so the bigger picture is more clear?

So the main goal with all of this is to support TC offloads so that we
can program filters to route packets from the default entity to the
VF. I agree that we are missing the uplink port. We probably
just need to add it as the "default" handler for packets that
originate with a source MAC address that is not the PF or one of the
VFs.
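The proposal above amounts to classifying frames by source MAC: PF and VF MACs map to their existing port netdevs, and anything else to an uplink representor acting as the "default" handler. Here is a hypothetical sketch of that classification (no uplink representor exists in this series); it performs the same kind of source-MAC walk that i40e_handle_lpbk_skb() in patch 5 does over pf->vf[].

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum rep_kind { REP_PF, REP_VF, REP_UPLINK };

/* Hypothetical classifier: attribute a frame to the PF port netdev, to a
 * VF port netdev (index returned in *vf_id), or to an uplink representor
 * that would act as the "default" handler. */
static enum rep_kind classify_src_mac(const uint8_t smac[6],
				      const uint8_t pf_mac[6],
				      const uint8_t (*vf_macs)[6], size_t num_vfs,
				      size_t *vf_id)
{
	size_t i;

	if (memcmp(smac, pf_mac, 6) == 0)
		return REP_PF;
	for (i = 0; i < num_vfs; i++) {
		if (memcmp(smac, vf_macs[i], 6) == 0) {
			*vf_id = i;
			return REP_VF;
		}
	}
	return REP_UPLINK;
}
```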

We can discuss this further at netdev/netconf.

- Alex

* Re: [Intel-wired-lan] [next-queue v6 PATCH 2/7] i40e: Introduce Port Representor netdevs and switchdev mode.
  2017-04-04 15:29           ` Alexander Duyck
@ 2017-04-05 13:41             ` Or Gerlitz
  -1 siblings, 0 replies; 46+ messages in thread
From: Or Gerlitz @ 2017-04-05 13:41 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Samudrala, Sridhar, Anjali Singhai Jain, Alexander Duyck,
	Jiri Pirko, Linux Netdev List, Jakub Kicinski, intel-wired-lan

On Tue, Apr 4, 2017 at 6:29 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Tue, Apr 4, 2017 at 4:58 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:

>> I was asking on frames getting in from the uplink and not getting out
>> the uplink.

> Frames coming from the uplink will by default be routed to the PF. So
> are you saying you want a representor for the uplink to handle the
> packets that don't have any rules set up for them, correct?

In our impl, currently the PF serves the uplink representor when we
are in the switchdev mode.

> I think we could set something like this up as we do have the concept
> of a "default" entity that everything falls back into. It is just a
> bit muddled since that currently exists as a part of the PF.

>> This is about offloading to HW a switching model where the steering
>> (matching and actions) comes into play on the port ingress. E.g
>> VF NIC xmit ---> VF vport e-switch rep recv --> SW or HW steering

> So this bit we can't really support very well with the i40e hardware.
> The problem is that unless there is a rule that exists to route it to
> another PF/VF there is a default rule in the hardware that would send
> it out the uplink port. The only data we can really catch on the port
> representors is broadcast/multicast because it does replication.

Can't you put a black hole (matching on nothing) rule saying that if
the source is a VF, send it to the PF RX queues and not to the wire? Then,
from the PF receive descriptor, somehow realize which VF the packet originated
from and inject it into the host kernel as if it was received from the rep
of that VF?

Later, when you add offloads, you give this rule the lowest prio.
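The suggestion above, sketched as a priority-ordered rule table: a wildcard rule at the lowest priority catches everything a VF sends and steers it to the PF RX queues, and more specific offloaded rules installed later win over it. The structures are hypothetical; whether the i40e hardware/firmware can express such a rule is exactly what this thread is trying to settle.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum fwd_action { FWD_TO_VF, FWD_TO_PF_RXQ };

/* Hypothetical steering rule keyed on the sending VF. */
struct steer_rule {
	unsigned int prio;        /* lower value is matched first */
	int src_vf;               /* sending VF this rule applies to */
	bool match_dmac;          /* false: wildcard, matches every frame */
	uint8_t dmac[6];
	enum fwd_action action;
	int dst_vf;               /* meaningful only for FWD_TO_VF */
};

/* Returns the highest-priority rule matching a frame from @src_vf, or NULL.
 * With a wildcard FWD_TO_PF_RXQ rule at the lowest priority, unmatched VF
 * traffic lands on the PF instead of the wire, and the PF could then hand
 * it to the representor of @src_vf. */
static const struct steer_rule *steer_lookup(const struct steer_rule *r, size_t n,
					     int src_vf, const uint8_t dmac[6])
{
	const struct steer_rule *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (r[i].src_vf != src_vf)
			continue;
		if (r[i].match_dmac && memcmp(r[i].dmac, dmac, 6) != 0)
			continue;
		if (best == NULL || r[i].prio < best->prio)
			best = &r[i];
	}
	return best;
}
```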

>> other node xmit --> UPLINK vport e-switch rep recv --> SW or HW steering

> This part I think we can do. The default behavior would be to send a
> packet to the default entity which in this case is the PF.

good

>> Can you please elaborate on such use-cases, so the bigger picture is more clear?

> So the main goal with all of this is to support TC offloads so that we
> can program filters to route packets from the default entity to the VF.

This is somehow too limited and I don't see what use case it can serve :(

> I agree that I think we are missing the uplink port. We probably
> just need to add it as the "default" handler for packets that
> originate with a source MAC address that is not the PF or one of the VFs.

> We can discuss this further at netdev/netconf.

yeah, but I will not be there (still asked everyone to get me a pack
of maple cookies), so feel free to discuss with the MLNX folks, Rony and Jiri

Or.

* Re: [Intel-wired-lan] [next-queue v6 PATCH 5/7] i40e: Add TX and RX support over port netdev's in switchdev mode
  2017-03-30  0:22   ` [Intel-wired-lan] " Sridhar Samudrala
@ 2017-04-14 16:47     ` Alexander Duyck
  -1 siblings, 0 replies; 46+ messages in thread
From: Alexander Duyck @ 2017-04-14 16:47 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: intel-wired-lan, Netdev, Duyck, Alexander H, Anjali Singhai Jain,
	Jakub Kicinski, Or Gerlitz, Jiri Pirko

On Wed, Mar 29, 2017 at 5:22 PM, Sridhar Samudrala
<sridhar.samudrala@intel.com> wrote:
> In switchdev mode, broadcasts from VFs are received by the PF and passed
> to corresponding port representor netdev.
> Any frames sent via port netdevs are sent as directed transmits to the
> corresponding VFs. To enable directed transmit, skb metadata dst is used
> to pass the port id and the frame is requeued to call the PF's transmit
> routine. VF id is used as port id for VFs and PF port id is defined as
> I40_MAIN_VSI_PORT_ID.
>
> Small script to demonstrate inter VF and PF to VF pings in switchdev mode.
> PF: p4p1, VFs: p4p1_0,p4p1_1 VF Port Reps:p4p1-vf0, p4p1-vf1
> PF Port rep: p4p1-pf
>
> # rmmod i40e; modprobe i40e
> # devlink dev eswitch set pci/0000:05:00.0 mode switchdev
> # echo 2 > /sys/class/net/p4p1/device/sriov_numvfs
> # ip link set p4p1 vf 0 mac 00:11:22:33:44:55
> # ip link set p4p1 vf 1 mac 00:11:22:33:44:56
> # rmmod i40evf; modprobe i40evf
>
> /* Create 2 namespaces and move the VFs to the corresponding ns */
> # ip netns add ns0
> # ip link set p4p1_0 netns ns0
> # ip netns exec ns0 ip addr add 192.168.1.10/24 dev p4p1_0
> # ip netns exec ns0 ip link set p4p1_0 up
> # ip netns add ns1
> # ip link set p4p1_1 netns ns1
> # ip netns exec ns1 ip addr add 192.168.1.11/24 dev p4p1_1
> # ip netns exec ns1 ip link set p4p1_1 up
>
> /* bring up pf and port netdevs */
> # ip addr add 192.168.1.1/24 dev p4p1
> # ip link set p4p1 up
> # ip link set p4p1-vf0 up
> # ip link set p4p1-vf1 up
> # ip link set p4p1-pf up
>
> # ip netns exec ns0 ping -c3 192.168.1.11  /* VF0 -> VF1 */
> # ip netns exec ns1 ping -c3 192.168.1.10  /* VF1 -> VF0 */
> # ping -c3 192.168.1.10   /* PF -> VF0 */
> # ping -c3 192.168.1.11   /* PF -> VF1 */
>
> /* VF0 -> IP in same subnet - broadcasts will be seen on p4p1-vf0 & p4p1 */
> # ip netns exec ns0 ping -c1 -W1 192.168.1.200
> /* VF1 -> IP in same subnet - broadcasts will be seen on p4p1-vf1 & p4p1 */
> # ip netns exec ns1 ping -c1 -W1 192.168.1.200
> /* port rep VF0 -> IP in same subnet - broadcasts will be seen on p4p1_0 */
> # ping -I p4p1-vf0 -c1 -W1 192.168.1.200
> /* port rep VF1 -> IP in same subnet  - broadcasts will be seen on p4p1_1 */
> # ping -I p4p1-vf1 -c1 -W1 192.168.1.200
>
> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e.h             |   4 +
>  drivers/net/ethernet/intel/i40e/i40e_main.c        |  27 +++-
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c        | 148 ++++++++++++++++++++-
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h        |   2 +
>  drivers/net/ethernet/intel/i40e/i40e_type.h        |   3 +
>  drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |   8 +-
>  6 files changed, 184 insertions(+), 8 deletions(-)
>
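The directed-transmit path described in the commit message, modelled in userspace: the port netdev tags the frame with its port id and hands it to the PF transmit routine, which uses the id to choose the target VF (or the PF). `struct mock_skb` stands in for an skb carrying port-mux `metadata_dst`, and `MOCK_MAIN_VSI_PORT_ID` for the PF port id constant; its value 0xFFFF is only inferred from the "portname 65535" output in patch 7.

```c
#include <stdint.h>

#define MOCK_MAIN_VSI_PORT_ID 0xFFFFu /* stand-in for the PF port id */

/* Hypothetical skb carrying (or not) port-mux metadata. */
struct mock_skb {
	int has_port_md;
	uint32_t port_id;
};

struct mock_tx_target {
	int directed;        /* 1: directed transmit to a PF/VF, 0: normal */
	int vf_id;           /* -1 when the target is the PF itself */
};

/* Port netdev transmit: tag the frame with the port netdev's id before it
 * is requeued to the PF (a real driver would requeue the skb on the PF). */
static void rep_xmit_tag(struct mock_skb *skb, uint32_t rep_port_id)
{
	skb->has_port_md = 1;
	skb->port_id = rep_port_id;
}

/* PF transmit: a tagged frame becomes a directed transmit to the function
 * named by the port id; an untagged frame takes the normal path. */
static struct mock_tx_target pf_xmit_target(const struct mock_skb *skb)
{
	struct mock_tx_target t = { 0, -1 };

	if (!skb->has_port_md)
		return t;
	t.directed = 1;
	t.vf_id = skb->port_id == MOCK_MAIN_VSI_PORT_ID ? -1 : (int)skb->port_id;
	return t;
}
```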

<snip>

> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index ebffca0..86d2510 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -1302,20 +1302,64 @@ static bool i40e_alloc_mapped_page(struct i40e_ring *rx_ring,
>  }
>
>  /**
> + * i40e_handle_lpbk_skb - Update skb->dev of a loopback frame
> + * @rx_ring: rx ring in play
> + * @skb: packet to send up
> + **/
> +static void i40e_handle_lpbk_skb(struct i40e_ring *rx_ring, struct sk_buff *skb)
> +{
> +       struct i40e_q_vector *q_vector = rx_ring->q_vector;
> +       struct i40e_pf *pf = rx_ring->vsi->back;
> +       struct sk_buff *nskb;
> +       struct i40e_vf *vf;
> +       struct ethhdr *eth;
> +       int vf_id;
> +
> +       if ((skb->pkt_type != PACKET_BROADCAST) &&
> +           (skb->pkt_type != PACKET_MULTICAST) &&
> +           (skb->pkt_type != PACKET_OTHERHOST))
> +               return;
> +
> +       eth = (struct ethhdr *)skb_mac_header(skb);
> +
> +       /* If a loopback packet is received in switchdev mode, clone the skb
> +        * and pass it to the corresponding port netdev based on the source MAC.
> +        */
> +       for (vf_id = 0; vf_id < pf->num_alloc_vfs; vf_id++) {
> +               vf = &pf->vf[vf_id];
> +               if (ether_addr_equal(eth->h_source,
> +                                    vf->default_lan_addr.addr)) {
> +                       nskb = skb_clone(skb, GFP_ATOMIC);
> +                       if (!nskb)
> +                               break;
> +                       nskb->offload_fwd_mark = 1;

This line causes build errors when switchdev is not enabled, since the
offload_fwd_mark member of struct sk_buff only exists when
CONFIG_NET_SWITCHDEV is set. The whole function should probably be
wrapped in a check for whether switchdev support is enabled.

> +                       nskb->dev = vf->port_netdev;
> +                       napi_gro_receive(&q_vector->napi, nskb);
> +                       break;
> +               }
> +       }
> +}
> +
> +/**
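
The receive-side decision in the quoted i40e_handle_lpbk_skb() is a filter on the packet type followed by a source-MAC lookup over the VF table. Below is a minimal user-space sketch of just that decision, under stated assumptions: enum pkt_kind, struct vf_entry and lpbk_target_vf() are hypothetical stand-ins, not driver or kernel APIs, and the skb_clone() and napi_gro_receive() hand-off to the port netdev is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6

/* Simplified packet classes; the kernel's PACKET_* values come from
 * <linux/if_packet.h>, these names are local to this sketch. */
enum pkt_kind {
	PKT_HOST,
	PKT_BROADCAST,
	PKT_MULTICAST,
	PKT_OTHERHOST,
};

/* Hypothetical, trimmed-down VF record: only the default MAC is kept. */
struct vf_entry {
	unsigned char default_mac[SKETCH_ETH_ALEN];
};

/*
 * Return the index of the VF whose port representor should receive a copy
 * of a looped-back frame, or -1 if no copy should be made. As in the quoted
 * function, only broadcast, multicast and other-host frames qualify, and
 * the first VF whose default MAC equals the source MAC wins.
 */
static int lpbk_target_vf(enum pkt_kind kind,
			  const unsigned char src_mac[SKETCH_ETH_ALEN],
			  const struct vf_entry *vfs, size_t num_vfs)
{
	size_t i;

	assert(vfs != NULL || num_vfs == 0);

	if (kind != PKT_BROADCAST && kind != PKT_MULTICAST &&
	    kind != PKT_OTHERHOST)
		return -1;

	for (i = 0; i < num_vfs; i++) {
		/* the driver uses ether_addr_equal(); memcmp() here */
		if (memcmp(src_mac, vfs[i].default_mac, SKETCH_ETH_ALEN) == 0)
			return (int)i;
	}
	return -1;
}
```

With the MACs from the demo script (00:11:22:33:44:55 for VF0, 00:11:22:33:44:56 for VF1), a broadcast sourced from VF1 maps to index 1, i.e. p4p1-vf1, while frames classified as PACKET_HOST are never copied.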


* Re: [Intel-wired-lan] [next-queue v6 PATCH 5/7] i40e: Add TX and RX support over port netdev's in switchdev mode
  2017-04-14 16:47     ` Alexander Duyck
@ 2017-04-14 18:26       ` Samudrala, Sridhar
From: Samudrala, Sridhar @ 2017-04-14 18:26 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: intel-wired-lan, Netdev, Duyck, Alexander H, Anjali Singhai Jain,
	Jakub Kicinski, Or Gerlitz, Jiri Pirko



On 4/14/2017 9:47 AM, Alexander Duyck wrote:
> On Wed, Mar 29, 2017 at 5:22 PM, Sridhar Samudrala
> <sridhar.samudrala@intel.com> wrote:
>> In switchdev mode, broadcasts from VFs are received by the PF and passed
>> to the corresponding port representor netdev.
>> Any frames sent via port netdevs are sent as directed transmits to the
>> corresponding VFs. To enable directed transmit, skb metadata dst is used
>> to pass the port id, and the frame is requeued to call the PF's transmit
>> routine. The VF id is used as the port id for VFs, and the PF port id is
>> defined as I40_MAIN_VSI_PORT_ID.
>>
>> A small script demonstrating inter-VF and PF-to-VF pings in switchdev mode.
>> PF: p4p1, VFs: p4p1_0, p4p1_1, VF port reps: p4p1-vf0, p4p1-vf1,
>> PF port rep: p4p1-pf
>>
>> # rmmod i40e; modprobe i40e
>> # devlink dev eswitch set pci/0000:05:00.0 mode switchdev
>> # echo 2 > /sys/class/net/p4p1/device/sriov_numvfs
>> # ip link set p4p1 vf 0 mac 00:11:22:33:44:55
>> # ip link set p4p1 vf 1 mac 00:11:22:33:44:56
>> # rmmod i40evf; modprobe i40evf
>>
>> /* Create 2 namespaces and move the VFs to the corresponding ns */
>> # ip netns add ns0
>> # ip link set p4p1_0 netns ns0
>> # ip netns exec ns0 ip addr add 192.168.1.10/24 dev p4p1_0
>> # ip netns exec ns0 ip link set p4p1_0 up
>> # ip netns add ns1
>> # ip link set p4p1_1 netns ns1
>> # ip netns exec ns1 ip addr add 192.168.1.11/24 dev p4p1_1
>> # ip netns exec ns1 ip link set p4p1_1 up
>>
>> /* Bring up the PF and port netdevs */
>> # ip addr add 192.168.1.1/24 dev p4p1
>> # ip link set p4p1 up
>> # ip link set p4p1-vf0 up
>> # ip link set p4p1-vf1 up
>> # ip link set p4p1-pf up
>>
>> # ip netns exec ns0 ping -c3 192.168.1.11  /* VF0 -> VF1 */
>> # ip netns exec ns1 ping -c3 192.168.1.10  /* VF1 -> VF0 */
>> # ping -c3 192.168.1.10   /* PF -> VF0 */
>> # ping -c3 192.168.1.11   /* PF -> VF1 */
>>
>> /* VF0 -> IP in same subnet - broadcasts will be seen on p4p1-vf0 & p4p1 */
>> # ip netns exec ns0 ping -c1 -W1 192.168.1.200
>> /* VF1 -> IP in same subnet - broadcasts will be seen on p4p1-vf1 & p4p1 */
>> # ip netns exec ns1 ping -c1 -W1 192.168.1.200
>> /* port rep VF0 -> IP in same subnet - broadcasts will be seen on p4p1_0 */
>> # ping -I p4p1-vf0 -c1 -W1 192.168.1.200
>> /* port rep VF1 -> IP in same subnet - broadcasts will be seen on p4p1_1 */
>> # ping -I p4p1-vf1 -c1 -W1 192.168.1.200
>>
>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> ---
>>   drivers/net/ethernet/intel/i40e/i40e.h             |   4 +
>>   drivers/net/ethernet/intel/i40e/i40e_main.c        |  27 +++-
>>   drivers/net/ethernet/intel/i40e/i40e_txrx.c        | 148 ++++++++++++++++++++-
>>   drivers/net/ethernet/intel/i40e/i40e_txrx.h        |   2 +
>>   drivers/net/ethernet/intel/i40e/i40e_type.h        |   3 +
>>   drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |   8 +-
>>   6 files changed, 184 insertions(+), 8 deletions(-)
>>
> <snip>
>
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
>> index ebffca0..86d2510 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
>> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
>> @@ -1302,20 +1302,64 @@ static bool i40e_alloc_mapped_page(struct i40e_ring *rx_ring,
>>   }
>>
>>   /**
>> + * i40e_handle_lpbk_skb - Update skb->dev of a loopback frame
>> + * @rx_ring: rx ring in play
>> + * @skb: packet to send up
>> + **/
>> +static void i40e_handle_lpbk_skb(struct i40e_ring *rx_ring, struct sk_buff *skb)
>> +{
>> +       struct i40e_q_vector *q_vector = rx_ring->q_vector;
>> +       struct i40e_pf *pf = rx_ring->vsi->back;
>> +       struct sk_buff *nskb;
>> +       struct i40e_vf *vf;
>> +       struct ethhdr *eth;
>> +       int vf_id;
>> +
>> +       if ((skb->pkt_type != PACKET_BROADCAST) &&
>> +           (skb->pkt_type != PACKET_MULTICAST) &&
>> +           (skb->pkt_type != PACKET_OTHERHOST))
>> +               return;
>> +
>> +       eth = (struct ethhdr *)skb_mac_header(skb);
>> +
>> +       /* If a loopback packet is received in switchdev mode, clone the skb
>> +        * and pass it to the corresponding port netdev based on the source MAC.
>> +        */
>> +       for (vf_id = 0; vf_id < pf->num_alloc_vfs; vf_id++) {
>> +               vf = &pf->vf[vf_id];
>> +               if (ether_addr_equal(eth->h_source,
>> +                                    vf->default_lan_addr.addr)) {
>> +                       nskb = skb_clone(skb, GFP_ATOMIC);
>> +                       if (!nskb)
>> +                               break;
>> +                       nskb->offload_fwd_mark = 1;
> So this line is causing build errors when switchdev is not enabled.
> This whole function should probably be wrapped in a check to see if
> switchdev support is enabled or not.
Yes, will fix it in the next revision.

Thanks
Sridhar
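
One way the fix could look, offered as a sketch rather than the actual follow-up patch. Because offload_fwd_mark is only declared in struct sk_buff when CONFIG_NET_SWITCHDEV is set, the assignment has to be compiled out entirely; a plain #ifdef does that, whereas an if (IS_ENABLED(CONFIG_NET_SWITCHDEV)) test would still leave the member access for the compiler to parse. The code emulates the pattern in user space; struct fake_skb, mark_hw_forwarded(), fwd_mark_of() and check_fwd_mark() are hypothetical names. In the driver, the same #ifdef would go around i40e_handle_lpbk_skb() (with an empty stub in the #else branch) or around the offload_fwd_mark line itself.

```c
#include <assert.h>

/* Build with -DCONFIG_NET_SWITCHDEV to emulate a switchdev-enabled kernel. */
#ifdef CONFIG_NET_SWITCHDEV
#define SWITCHDEV_BUILT 1u
#else
#define SWITCHDEV_BUILT 0u
#endif

/* Hypothetical stand-in for struct sk_buff: as in the kernel, the
 * offload_fwd_mark bit only exists when CONFIG_NET_SWITCHDEV is defined. */
struct fake_skb {
	unsigned int len;
#ifdef CONFIG_NET_SWITCHDEV
	unsigned int offload_fwd_mark:1;
#endif
};

#ifdef CONFIG_NET_SWITCHDEV
/* Tell the stack that the hardware already forwarded this frame. */
static void mark_hw_forwarded(struct fake_skb *skb)
{
	skb->offload_fwd_mark = 1;
}

static unsigned int fwd_mark_of(const struct fake_skb *skb)
{
	return skb->offload_fwd_mark;
}
#else
/* Stubs: without switchdev there is no field to touch, so do nothing. */
static void mark_hw_forwarded(struct fake_skb *skb)
{
	(void)skb;
}

static unsigned int fwd_mark_of(const struct fake_skb *skb)
{
	(void)skb;
	return 0;
}
#endif

/* A marked frame reports the mark only when switchdev is compiled in. */
static void check_fwd_mark(void)
{
	struct fake_skb skb = { 0 };

	mark_hw_forwarded(&skb);
	assert(fwd_mark_of(&skb) == SWITCHDEV_BUILT);
}
```

Callers use mark_hw_forwarded() unconditionally; both configurations build, and only the switchdev build actually sets the bit.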


end of thread, other threads:[~2017-04-14 18:26 UTC | newest]

Thread overview: 46+ messages
2017-03-30  0:22 [next-queue v6 PATCH 0/7] i40e: Add port representor and initial switchdev support Sridhar Samudrala
2017-03-30  0:22 ` [Intel-wired-lan] " Sridhar Samudrala
2017-03-30  0:22 ` [next-queue v6 PATCH 1/7] i40e: Introduce devlink interface Sridhar Samudrala
2017-03-30  0:22   ` [Intel-wired-lan] " Sridhar Samudrala
2017-03-31 19:35   ` Bowers, AndrewX
2017-03-30  0:22 ` [next-queue v6 PATCH 2/7] i40e: Introduce Port Representor netdevs and switchdev mode Sridhar Samudrala
2017-03-30  0:22   ` [Intel-wired-lan] " Sridhar Samudrala
2017-03-30  7:17   ` Or Gerlitz
2017-04-03 18:41     ` Samudrala, Sridhar
2017-04-04 11:58       ` Or Gerlitz
2017-04-04 11:58         ` [Intel-wired-lan] " Or Gerlitz
2017-04-04 15:29         ` Alexander Duyck
2017-04-04 15:29           ` Alexander Duyck
2017-04-05 13:41           ` Or Gerlitz
2017-04-05 13:41             ` Or Gerlitz
2017-03-30  9:17   ` Or Gerlitz
2017-03-30  9:17     ` [Intel-wired-lan] " Or Gerlitz
2017-03-31 19:35   ` Bowers, AndrewX
2017-03-30  0:22 ` [next-queue v6 PATCH 3/7] i40e: Sync link state between PF/VFs and Port representor netdevs Sridhar Samudrala
2017-03-30  0:22   ` [Intel-wired-lan] " Sridhar Samudrala
2017-03-31 19:37   ` Bowers, AndrewX
2017-03-30  0:22 ` [next-queue v6 PATCH 4/7] net: store port/representator id in metadata_dst Sridhar Samudrala
2017-03-30  0:22   ` [Intel-wired-lan] " Sridhar Samudrala
2017-03-31 19:42   ` Bowers, AndrewX
2017-03-30  0:22 ` [next-queue v6 PATCH 5/7] i40e: Add TX and RX support over port netdev's in switchdev mode Sridhar Samudrala
2017-03-30  0:22   ` [Intel-wired-lan] " Sridhar Samudrala
2017-03-30  9:26   ` Or Gerlitz
2017-03-30  9:26     ` [Intel-wired-lan] " Or Gerlitz
2017-04-03 18:52     ` Samudrala, Sridhar
2017-04-03 18:52       ` [Intel-wired-lan] " Samudrala, Sridhar
2017-04-14 16:47   ` Alexander Duyck
2017-04-14 16:47     ` Alexander Duyck
2017-04-14 18:26     ` Samudrala, Sridhar
2017-04-14 18:26       ` Samudrala, Sridhar
2017-03-30  0:22 ` [next-queue v6 PATCH 6/7] i40e: Add support for exposing switch port statistics via port netdevs Sridhar Samudrala
2017-03-30  0:22   ` [Intel-wired-lan] " Sridhar Samudrala
2017-03-31 19:39   ` Bowers, AndrewX
2017-03-30  0:22 ` [next-queue v6 PATCH 7/7] i40e: Add support to get switch id and port number for " Sridhar Samudrala
2017-03-30  0:22   ` [Intel-wired-lan] " Sridhar Samudrala
2017-03-30 21:45   ` Jakub Kicinski
2017-03-30 21:45     ` [Intel-wired-lan] " Jakub Kicinski
2017-03-30 22:31     ` Alexander Duyck
2017-03-30 22:31       ` [Intel-wired-lan] " Alexander Duyck
2017-03-31  1:16       ` Jakub Kicinski
2017-03-31  1:16         ` [Intel-wired-lan] " Jakub Kicinski
2017-03-31 21:09   ` Bowers, AndrewX