All of lore.kernel.org
 help / color / mirror / Atom feed
* [net-next 0/5][pull request] 100GbE Intel Wired LAN Driver Updates 2020-07-13
@ 2020-07-13 17:43 Tony Nguyen
  2020-07-13 17:43 ` [net-next 1/5] ice: add the virtchnl handler for AdminQ command Tony Nguyen
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Tony Nguyen @ 2020-07-13 17:43 UTC (permalink / raw)
  To: davem; +Cc: Tony Nguyen, netdev, nhorman, sassmann, jeffrey.t.kirsher

This series contains updates to ice driver and virtchnl header file.

The iproute2 and ethtool are evolving to expose the NIC hardware
capability. But these available orchestration methods in Linux kernel are
limited in their capability to exercise advanced functionality in the
hardware, since different vendors may have different data programming
method.

Intel Ethernet Adaptive Virtual Function (AVF) is the common hardware
interface for SR-IOV, it has the defined message format to talk with the
PF.

To make good use of the advanced functionality like Switch (Binary
Classifier). The ice PF driver introduces a DCF (Device Config Function)
mode to extend the AVF feature.

The DCF (Device Config Function) method wraps a raw flow admin queue
command in a virthcnl message and sends it to the PF to be executed. This
is required because it doesn't have the privilege level to issue the admin
queue commands, so it acts as a proxy PF. So that the user can customize
the AVF feature, and use their own programming language to translate the
flow rule management data into ice raw flow, these raw flows then can be
executed in PF's sandbox.

And the kernel PF driver fully controls the behavior of DCF for security,
like only the trusted VF with ID zero can run in DCF mode, and also for
being compatible with the common AVF feature, the VF needs to advertise and
acquire DCF capability first.

The following are changes since commit 94339443686b36d3223bc032b7947267474e2679:
  net: bridge: notify on vlan tunnel changes done via the old api
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 100GbE

Haiyue Wang (5):
  ice: add the virtchnl handler for AdminQ command
  ice: add DCF cap negotiation and state machine
  ice: support to get the VSI mapping
  ice: enable DDP package info querying
  ice: add switch rule management for DCF

 drivers/net/ethernet/intel/ice/Makefile       |   2 +-
 drivers/net/ethernet/intel/ice/ice.h          |   2 +
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   6 +
 drivers/net/ethernet/intel/ice/ice_dcf.c      | 833 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_dcf.h      |  91 ++
 drivers/net/ethernet/intel/ice/ice_main.c     |   2 +
 drivers/net/ethernet/intel/ice/ice_switch.c   |  16 +-
 drivers/net/ethernet/intel/ice/ice_switch.h   |  27 +-
 drivers/net/ethernet/intel/ice/ice_type.h     |   9 +
 .../net/ethernet/intel/ice/ice_virtchnl_pf.c  | 366 ++++++++
 .../net/ethernet/intel/ice/ice_virtchnl_pf.h  |   1 +
 include/linux/avf/virtchnl.h                  |  63 ++
 12 files changed, 1392 insertions(+), 26 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ice/ice_dcf.c
 create mode 100644 drivers/net/ethernet/intel/ice/ice_dcf.h

-- 
2.26.2


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [net-next 1/5] ice: add the virtchnl handler for AdminQ command
  2020-07-13 17:43 [net-next 0/5][pull request] 100GbE Intel Wired LAN Driver Updates 2020-07-13 Tony Nguyen
@ 2020-07-13 17:43 ` Tony Nguyen
  2020-07-13 22:48   ` Jakub Kicinski
  2020-07-13 17:43 ` [net-next 2/5] ice: add DCF cap negotiation and state machine Tony Nguyen
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Tony Nguyen @ 2020-07-13 17:43 UTC (permalink / raw)
  To: davem
  Cc: Haiyue Wang, netdev, nhorman, sassmann, jeffrey.t.kirsher,
	anthony.l.nguyen, Nannan Lu, Andrew Bowers

From: Haiyue Wang <haiyue.wang@intel.com>

The DCF (Device Config Function) is a named trust VF (always with ID 0,
single entity per PF port) that can act as a sole controlling entity to
exercise advance functionality such as adding switch rules for the rest
of VFs.

To achieve this approach, this VF is permitted to send some basic AdminQ
commands to the PF through virtual channel (mailbox), then the PF driver
sends these commands to the firmware, and returns the response to the VF
again through virtual channel.

The AdminQ command from DCF is split into two parts: one is the AdminQ
descriptor, the other is the buffer (the descriptor has BUF flag set).
These two parts should be sent in order, so that the PF can handle them
correctly.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Nannan Lu <nannan.lu@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/Makefile       |   2 +-
 drivers/net/ethernet/intel/ice/ice.h          |   2 +
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   6 +
 drivers/net/ethernet/intel/ice/ice_dcf.c      |  49 +++++++
 drivers/net/ethernet/intel/ice/ice_dcf.h      |  20 +++
 .../net/ethernet/intel/ice/ice_virtchnl_pf.c  | 130 ++++++++++++++++++
 .../net/ethernet/intel/ice/ice_virtchnl_pf.h  |   1 +
 include/linux/avf/virtchnl.h                  |  10 ++
 8 files changed, 219 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/intel/ice/ice_dcf.c
 create mode 100644 drivers/net/ethernet/intel/ice/ice_dcf.h

diff --git a/drivers/net/ethernet/intel/ice/Makefile b/drivers/net/ethernet/intel/ice/Makefile
index 980bbcc64b4b..eb83b5fe11c3 100644
--- a/drivers/net/ethernet/intel/ice/Makefile
+++ b/drivers/net/ethernet/intel/ice/Makefile
@@ -24,7 +24,7 @@ ice-y := ice_main.o	\
 	 ice_flow.o	\
 	 ice_devlink.o	\
 	 ice_ethtool.o
-ice-$(CONFIG_PCI_IOV) += ice_virtchnl_pf.o ice_sriov.o
+ice-$(CONFIG_PCI_IOV) += ice_virtchnl_pf.o ice_sriov.o ice_dcf.o
 ice-$(CONFIG_DCB) += ice_dcb.o ice_dcb_nl.o ice_dcb_lib.o
 ice-$(CONFIG_RFS_ACCEL) += ice_arfs.o
 ice-$(CONFIG_XDP_SOCKETS) += ice_xsk.o
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 7486d010a619..a6e419a3f547 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -435,6 +435,8 @@ struct ice_pf {
 	u32 tx_timeout_recovery_level;
 	char int_name[ICE_INT_NAME_STR_LEN];
 	u32 sw_int_count;
+
+	struct ice_dcf dcf;
 };
 
 struct ice_netdev_priv {
diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 99c39249613a..89b91ccaf533 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1830,6 +1830,12 @@ enum ice_adminq_opc {
 	ice_aqc_opc_update_vsi				= 0x0211,
 	ice_aqc_opc_free_vsi				= 0x0213,
 
+	/* recipe commands */
+	ice_aqc_opc_add_recipe				= 0x0290,
+	ice_aqc_opc_recipe_to_profile			= 0x0291,
+	ice_aqc_opc_get_recipe				= 0x0292,
+	ice_aqc_opc_get_recipe_to_profile		= 0x0293,
+
 	/* switch rules population commands */
 	ice_aqc_opc_add_sw_rules			= 0x02A0,
 	ice_aqc_opc_update_sw_rules			= 0x02A1,
diff --git a/drivers/net/ethernet/intel/ice/ice_dcf.c b/drivers/net/ethernet/intel/ice/ice_dcf.c
new file mode 100644
index 000000000000..cbe60a0cb2d2
--- /dev/null
+++ b/drivers/net/ethernet/intel/ice/ice_dcf.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2018-2020, Intel Corporation. */
+
+#include "ice.h"
+
+static const enum ice_adminq_opc aqc_permitted_tbl[] = {
+	/* Generic Firmware Admin commands */
+	ice_aqc_opc_get_ver,
+	ice_aqc_opc_req_res,
+	ice_aqc_opc_release_res,
+	ice_aqc_opc_list_func_caps,
+	ice_aqc_opc_list_dev_caps,
+
+	/* Package Configuration Admin Commands */
+	ice_aqc_opc_update_pkg,
+	ice_aqc_opc_get_pkg_info_list,
+
+	/* PHY commands */
+	ice_aqc_opc_get_phy_caps,
+	ice_aqc_opc_get_link_status,
+
+	/* Switch Block */
+	ice_aqc_opc_get_sw_cfg,
+	ice_aqc_opc_alloc_res,
+	ice_aqc_opc_free_res,
+	ice_aqc_opc_add_recipe,
+	ice_aqc_opc_recipe_to_profile,
+	ice_aqc_opc_get_recipe,
+	ice_aqc_opc_get_recipe_to_profile,
+	ice_aqc_opc_add_sw_rules,
+	ice_aqc_opc_update_sw_rules,
+	ice_aqc_opc_remove_sw_rules,
+};
+
+/**
+ * ice_dcf_aq_cmd_permitted - validate the AdminQ command permitted or not
+ * @desc: descriptor describing the command
+ */
+bool ice_dcf_aq_cmd_permitted(struct ice_aq_desc *desc)
+{
+	u16 opc = le16_to_cpu(desc->opcode);
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(aqc_permitted_tbl); i++)
+		if (opc == aqc_permitted_tbl[i])
+			return true;
+
+	return false;
+}
diff --git a/drivers/net/ethernet/intel/ice/ice_dcf.h b/drivers/net/ethernet/intel/ice/ice_dcf.h
new file mode 100644
index 000000000000..9edb2d5d9d8f
--- /dev/null
+++ b/drivers/net/ethernet/intel/ice/ice_dcf.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2018-2020, Intel Corporation. */
+
+#ifndef _ICE_DCF_H_
+#define _ICE_DCF_H_
+
+struct ice_dcf {
+	/* Handle the AdminQ command between the DCF (Device Config Function)
+	 * and the firmware.
+	 */
+#define ICE_DCF_AQ_DESC_TIMEOUT	(HZ / 10)
+	struct ice_aq_desc aq_desc;
+	u8 aq_desc_received;
+	unsigned long aq_desc_expires;
+};
+
+#ifdef CONFIG_PCI_IOV
+bool ice_dcf_aq_cmd_permitted(struct ice_aq_desc *desc);
+#endif /* CONFIG_PCI_IOV */
+#endif /* _ICE_DCF_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 16a2f2526ccc..0851944b8fd4 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -3652,6 +3652,130 @@ static int ice_vf_init_vlan_stripping(struct ice_vf *vf)
 		return ice_vsi_manage_vlan_stripping(vsi, false);
 }
 
+/**
+ * ice_dcf_handle_aq_cmd - handle the AdminQ command from DCF to FW
+ * @vf: pointer to the VF info
+ * @aq_desc: the AdminQ command descriptor
+ * @aq_buf: the AdminQ command buffer if aq_buf_size is non-zero
+ * @aq_buf_size: the AdminQ command buffer size
+ *
+ * The VF splits the AdminQ command into two parts: one is the descriptor of
+ * AdminQ command, the other is the buffer of AdminQ command (the descriptor
+ * has BUF flag set). When both of them are received by PF, this function will
+ * forward them to firmware once to get the AdminQ's response. And also, the
+ * filled descriptor and buffer of the response will be sent back to VF one by
+ * one through the virtchnl.
+ */
+static int
+ice_dcf_handle_aq_cmd(struct ice_vf *vf, struct ice_aq_desc *aq_desc,
+		      u8 *aq_buf, u16 aq_buf_size)
+{
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct ice_pf *pf = vf->pf;
+	enum virtchnl_ops v_op;
+	enum ice_status aq_ret;
+	u16 v_msg_len = 0;
+	u8 *v_msg = NULL;
+	int ret;
+
+	pf->dcf.aq_desc_received = false;
+
+	if ((aq_buf && !aq_buf_size) || (!aq_buf && aq_buf_size))
+		return -EINVAL;
+
+	aq_ret = ice_aq_send_cmd(&pf->hw, aq_desc, aq_buf, aq_buf_size, NULL);
+	/* It needs to send back the AQ response message if ICE_ERR_AQ_ERROR
+	 * returns, some AdminQ handlers will use the error code filled by FW
+	 * to do exception handling.
+	 */
+	if (aq_ret && aq_ret != ICE_ERR_AQ_ERROR) {
+		v_ret = VIRTCHNL_STATUS_ERR_ADMIN_QUEUE_ERROR;
+		v_op = VIRTCHNL_OP_DCF_CMD_DESC;
+		goto err;
+	}
+
+	ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DCF_CMD_DESC, v_ret,
+				    (u8 *)aq_desc, sizeof(*aq_desc));
+	/* Bail out so we don't send the VIRTCHNL_OP_DCF_CMD_BUFF message
+	 * below if failure happens or no AdminQ command buffer.
+	 */
+	if (ret || !aq_buf_size)
+		return ret;
+
+	v_op = VIRTCHNL_OP_DCF_CMD_BUFF;
+	v_msg_len = le16_to_cpu(aq_desc->datalen);
+
+	/* buffer is not updated if data length exceeds buffer size */
+	if (v_msg_len > aq_buf_size)
+		v_msg_len = 0;
+	else if (v_msg_len)
+		v_msg = aq_buf;
+
+	/* send the response back to the VF */
+err:
+	return ice_vc_send_msg_to_vf(vf, v_op, v_ret, v_msg, v_msg_len);
+}
+
+/**
+ * ice_vc_dcf_cmd_desc_msg - handle the DCF AdminQ command descriptor
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer which holds the command descriptor
+ * @len: length of the message
+ */
+static int ice_vc_dcf_cmd_desc_msg(struct ice_vf *vf, u8 *msg, u16 len)
+{
+	struct ice_aq_desc *aq_desc = (struct ice_aq_desc *)msg;
+	struct ice_pf *pf = vf->pf;
+
+	if (len != sizeof(*aq_desc) || !ice_dcf_aq_cmd_permitted(aq_desc)) {
+		/* In case to avoid the VIRTCHNL_OP_DCF_CMD_DESC message with
+		 * the ICE_AQ_FLAG_BUF set followed by another bad message
+		 * VIRTCHNL_OP_DCF_CMD_DESC.
+		 */
+		pf->dcf.aq_desc_received = false;
+		goto err;
+	}
+
+	/* The AdminQ descriptor needs to be stored for use when the followed
+	 * VIRTCHNL_OP_DCF_CMD_BUFF is received.
+	 */
+	if (aq_desc->flags & cpu_to_le16(ICE_AQ_FLAG_BUF)) {
+		pf->dcf.aq_desc = *aq_desc;
+		pf->dcf.aq_desc_received = true;
+		pf->dcf.aq_desc_expires = jiffies + ICE_DCF_AQ_DESC_TIMEOUT;
+		return 0;
+	}
+
+	return ice_dcf_handle_aq_cmd(vf, aq_desc, NULL, 0);
+
+	/* send the response back to the VF */
+err:
+	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DCF_CMD_DESC,
+				     VIRTCHNL_STATUS_ERR_PARAM, NULL, 0);
+}
+
+/**
+ * ice_vc_dcf_cmd_buff_msg - handle the DCF AdminQ command buffer
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer which holds the command buffer
+ * @len: length of the message
+ */
+static int ice_vc_dcf_cmd_buff_msg(struct ice_vf *vf, u8 *msg, u16 len)
+{
+	struct ice_pf *pf = vf->pf;
+
+	if (!len || !pf->dcf.aq_desc_received ||
+	    time_is_before_jiffies(pf->dcf.aq_desc_expires))
+		goto err;
+
+	return ice_dcf_handle_aq_cmd(vf, &pf->dcf.aq_desc, msg, len);
+
+	/* send the response back to the VF */
+err:
+	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DCF_CMD_BUFF,
+				     VIRTCHNL_STATUS_ERR_PARAM, NULL, 0);
+}
+
 /**
  * ice_vc_process_vf_msg - Process request from VF
  * @pf: pointer to the PF structure
@@ -3762,6 +3886,12 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
 	case VIRTCHNL_OP_DISABLE_VLAN_STRIPPING:
 		err = ice_vc_dis_vlan_stripping(vf);
 		break;
+	case VIRTCHNL_OP_DCF_CMD_DESC:
+		err = ice_vc_dcf_cmd_desc_msg(vf, msg, msglen);
+		break;
+	case VIRTCHNL_OP_DCF_CMD_BUFF:
+		err = ice_vc_dcf_cmd_buff_msg(vf, msg, msglen);
+		break;
 	case VIRTCHNL_OP_UNKNOWN:
 	default:
 		dev_err(dev, "Unsupported opcode %d from VF %d\n", v_opcode,
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
index 67aa9110fdd1..4a257415f6a5 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
@@ -4,6 +4,7 @@
 #ifndef _ICE_VIRTCHNL_PF_H_
 #define _ICE_VIRTCHNL_PF_H_
 #include "ice.h"
+#include "ice_dcf.h"
 
 /* Restrict number of MAC Addr and VLAN that non-trusted VF can programmed */
 #define ICE_MAX_VLAN_PER_VF		8
diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index 40bad71865ea..cd627a217b1c 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -136,6 +136,9 @@ enum virtchnl_ops {
 	VIRTCHNL_OP_DISABLE_CHANNELS = 31,
 	VIRTCHNL_OP_ADD_CLOUD_FILTER = 32,
 	VIRTCHNL_OP_DEL_CLOUD_FILTER = 33,
+	/* opcode 34, 35, 36, 37 and 38 are reserved */
+	VIRTCHNL_OP_DCF_CMD_DESC = 39,
+	VIRTCHNL_OP_DCF_CMD_BUFF = 40,
 };
 
 /* These macros are used to generate compilation errors if a structure/union
@@ -828,6 +831,13 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
 	case VIRTCHNL_OP_DEL_CLOUD_FILTER:
 		valid_len = sizeof(struct virtchnl_filter);
 		break;
+	case VIRTCHNL_OP_DCF_CMD_DESC:
+	case VIRTCHNL_OP_DCF_CMD_BUFF:
+		/* These two opcodes are specific to handle the AdminQ command,
+		 * so the validation needs to be done in PF's context.
+		 */
+		valid_len = msglen;
+		break;
 	/* These are always errors coming from the VF. */
 	case VIRTCHNL_OP_EVENT:
 	case VIRTCHNL_OP_UNKNOWN:
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [net-next 2/5] ice: add DCF cap negotiation and state machine
  2020-07-13 17:43 [net-next 0/5][pull request] 100GbE Intel Wired LAN Driver Updates 2020-07-13 Tony Nguyen
  2020-07-13 17:43 ` [net-next 1/5] ice: add the virtchnl handler for AdminQ command Tony Nguyen
@ 2020-07-13 17:43 ` Tony Nguyen
  2020-07-13 17:43 ` [net-next 3/5] ice: support to get the VSI mapping Tony Nguyen
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Tony Nguyen @ 2020-07-13 17:43 UTC (permalink / raw)
  To: davem
  Cc: Haiyue Wang, netdev, nhorman, sassmann, jeffrey.t.kirsher,
	anthony.l.nguyen, Xiaoyun Li, Nannan Lu, Andrew Bowers

From: Haiyue Wang <haiyue.wang@intel.com>

The trust VF0 needs to negotiate the DCF capability firstly. Then the PF
driver may allow this VF to enter into DCF "ON" state if various checks
are passed. In DCF "ON" state, the VF0 can send the AdminQ command to do
advanced switch rules creation for other VFs.

If one of the VFs resets, its hardware VSI number may be changed, so the
VF0 will enter into the DCF "BUSY" state immediately to avoid adding the
wrong rule. After the reset is done, the DCF state is changed to "PAUSE"
mode, and a DCF_VSI_MAP_UPDATE event will be sent to the VF0. This event
notifies the VF0 to restart negotiating the DCF capability again.

Also the VF0 can exits the DCF service gracefully by issuing the virtual
channel command OP_DCF_DISABLE.

The system administrator can disable the DCF service by changing the
trust mode to untrusted.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Nannan Lu <nannan.lu@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_dcf.c      | 77 +++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_dcf.h      | 24 ++++++
 .../net/ethernet/intel/ice/ice_virtchnl_pf.c  | 85 ++++++++++++++++++-
 include/linux/avf/virtchnl.h                  |  9 ++
 4 files changed, 194 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_dcf.c b/drivers/net/ethernet/intel/ice/ice_dcf.c
index cbe60a0cb2d2..e7d37735aaa5 100644
--- a/drivers/net/ethernet/intel/ice/ice_dcf.c
+++ b/drivers/net/ethernet/intel/ice/ice_dcf.c
@@ -47,3 +47,80 @@ bool ice_dcf_aq_cmd_permitted(struct ice_aq_desc *desc)
 
 	return false;
 }
+
+/**
+ * ice_check_dcf_allowed - check if DCF is allowed based on various checks
+ * @vf: pointer to the VF to check
+ */
+bool ice_check_dcf_allowed(struct ice_vf *vf)
+{
+	struct ice_pf *pf = vf->pf;
+	struct device *dev;
+
+	dev = ice_pf_to_dev(pf);
+
+	if (vf->vf_id != ICE_DCF_VFID) {
+		dev_err(dev, "VF %d requested DCF capability, but only VF %d is allowed to request DCF capability\n",
+			vf->vf_id, ICE_DCF_VFID);
+		return false;
+	}
+
+	if (!vf->trusted) {
+		dev_err(dev, "VF needs to be trusted to configure DCF capability\n");
+		return false;
+	}
+
+	return true;
+}
+
+/**
+ * ice_vf_is_dcf - helper to check if the assigned VF is a DCF
+ * @vf: the assigned VF to be checked
+ */
+bool ice_is_vf_dcf(struct ice_vf *vf)
+{
+	return vf == vf->pf->dcf.vf;
+}
+
+/**
+ * ice_dcf_get_state - Get DCF state of the associated PF
+ * @pf: PF instance
+ */
+enum ice_dcf_state ice_dcf_get_state(struct ice_pf *pf)
+{
+	return pf->dcf.vf ? pf->dcf.state : ICE_DCF_STATE_OFF;
+}
+
+/**
+ * ice_dcf_state_str - convert DCF state code to a string
+ * @state: the DCF state code to convert
+ */
+static const char *ice_dcf_state_str(enum ice_dcf_state state)
+{
+	switch (state) {
+	case ICE_DCF_STATE_OFF:
+		return "ICE_DCF_STATE_OFF";
+	case ICE_DCF_STATE_ON:
+		return "ICE_DCF_STATE_ON";
+	case ICE_DCF_STATE_BUSY:
+		return "ICE_DCF_STATE_BUSY";
+	case ICE_DCF_STATE_PAUSE:
+		return "ICE_DCF_STATE_PAUSE";
+	}
+
+	return "ICE_DCF_STATE_UNKNOWN";
+}
+
+/**
+ * ice_dcf_set_state - Set DCF state for the associated PF
+ * @pf: PF instance
+ * @state: new DCF state
+ */
+void ice_dcf_set_state(struct ice_pf *pf, enum ice_dcf_state state)
+{
+	dev_dbg(ice_pf_to_dev(pf), "DCF state is changing from %s to %s\n",
+		ice_dcf_state_str(pf->dcf.state),
+		ice_dcf_state_str(state));
+
+	pf->dcf.state = state;
+}
diff --git a/drivers/net/ethernet/intel/ice/ice_dcf.h b/drivers/net/ethernet/intel/ice/ice_dcf.h
index 9edb2d5d9d8f..1dabcca6f753 100644
--- a/drivers/net/ethernet/intel/ice/ice_dcf.h
+++ b/drivers/net/ethernet/intel/ice/ice_dcf.h
@@ -4,7 +4,27 @@
 #ifndef _ICE_DCF_H_
 #define _ICE_DCF_H_
 
+struct ice_vf;
+struct ice_pf;
+
+#define ICE_DCF_VFID	0
+
+/* DCF mode states */
+enum ice_dcf_state {
+	/* DCF mode is fully off */
+	ICE_DCF_STATE_OFF = 0,
+	/* Process is live, acquired capability to send DCF CMD */
+	ICE_DCF_STATE_ON,
+	/* Kernel is busy, deny DCF CMD */
+	ICE_DCF_STATE_BUSY,
+	/* Kernel is ready for Process to Re-establish, deny DCF CMD */
+	ICE_DCF_STATE_PAUSE,
+};
+
 struct ice_dcf {
+	struct ice_vf *vf;
+	enum ice_dcf_state state;
+
 	/* Handle the AdminQ command between the DCF (Device Config Function)
 	 * and the firmware.
 	 */
@@ -16,5 +36,9 @@ struct ice_dcf {
 
 #ifdef CONFIG_PCI_IOV
 bool ice_dcf_aq_cmd_permitted(struct ice_aq_desc *desc);
+bool ice_check_dcf_allowed(struct ice_vf *vf);
+bool ice_is_vf_dcf(struct ice_vf *vf);
+enum ice_dcf_state ice_dcf_get_state(struct ice_pf *pf);
+void ice_dcf_set_state(struct ice_pf *pf, enum ice_dcf_state state);
 #endif /* CONFIG_PCI_IOV */
 #endif /* _ICE_DCF_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 0851944b8fd4..86ca35d0942f 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -360,6 +360,11 @@ void ice_free_vfs(struct ice_pf *pf)
 	else
 		dev_warn(dev, "VFs are assigned - not disabling SR-IOV\n");
 
+	if (ice_dcf_get_state(pf) != ICE_DCF_STATE_OFF) {
+		ice_dcf_set_state(pf, ICE_DCF_STATE_OFF);
+		pf->dcf.vf = NULL;
+	}
+
 	/* Avoid wait time by stopping all VFs at the same time */
 	ice_for_each_vf(pf, i)
 		if (test_bit(ICE_VF_STATE_QS_ENA, pf->vf[i].vf_states))
@@ -1285,6 +1290,9 @@ bool ice_reset_vf(struct ice_vf *vf, bool is_vflr)
 	set_bit(ICE_VF_STATE_DIS, vf->vf_states);
 	ice_trigger_vf_reset(vf, is_vflr, false);
 
+	if (ice_dcf_get_state(pf) == ICE_DCF_STATE_ON)
+		ice_dcf_set_state(pf, ICE_DCF_STATE_BUSY);
+
 	vsi = pf->vsi[vf->lan_vsi_idx];
 
 	if (test_bit(ICE_VF_STATE_QS_ENA, vf->vf_states))
@@ -1340,6 +1348,21 @@ bool ice_reset_vf(struct ice_vf *vf, bool is_vflr)
 	ice_vf_rebuild_vsi_with_release(vf);
 	ice_vf_post_vsi_rebuild(vf);
 
+	if (ice_dcf_get_state(pf) == ICE_DCF_STATE_BUSY) {
+		struct virtchnl_pf_event pfe = { 0 };
+
+		ice_dcf_set_state(pf, ICE_DCF_STATE_PAUSE);
+
+		pfe.event = VIRTCHNL_EVENT_DCF_VSI_MAP_UPDATE;
+		pfe.event_data.vf_vsi_map.vf_id = vf->vf_id;
+		pfe.event_data.vf_vsi_map.vsi_id = vf->lan_vsi_num;
+
+		ice_aq_send_msg_to_vf(&pf->hw, ICE_DCF_VFID,
+				      VIRTCHNL_OP_EVENT,
+				      VIRTCHNL_STATUS_SUCCESS,
+				      (u8 *)&pfe, sizeof(pfe), NULL);
+	}
+
 	return true;
 }
 
@@ -1977,6 +2000,24 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
 	if (vf->driver_caps & VIRTCHNL_VF_CAP_ADV_LINK_SPEED)
 		vfres->vf_cap_flags |= VIRTCHNL_VF_CAP_ADV_LINK_SPEED;
 
+	/* Negotiate DCF capability. */
+	if (vf->driver_caps & VIRTCHNL_VF_CAP_DCF) {
+		if (!ice_check_dcf_allowed(vf)) {
+			v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+			goto err;
+		}
+		vfres->vf_cap_flags |= VIRTCHNL_VF_CAP_DCF;
+		pf->dcf.vf = vf;
+		ice_dcf_set_state(pf, ICE_DCF_STATE_ON);
+		dev_info(ice_pf_to_dev(pf), "Grant request for DCF functionality to VF%d\n",
+			 ICE_DCF_VFID);
+	} else if (ice_is_vf_dcf(vf) &&
+		   ice_dcf_get_state(pf) != ICE_DCF_STATE_OFF) {
+		ice_dcf_set_state(pf, ICE_DCF_STATE_OFF);
+		pf->dcf.vf = NULL;
+		ice_reset_vf(vf, false);
+	}
+
 	vfres->num_vsis = 1;
 	/* Tx and Rx queue are equal for VF */
 	vfres->num_queue_pairs = vsi->num_txq;
@@ -3727,6 +3768,9 @@ static int ice_vc_dcf_cmd_desc_msg(struct ice_vf *vf, u8 *msg, u16 len)
 	struct ice_aq_desc *aq_desc = (struct ice_aq_desc *)msg;
 	struct ice_pf *pf = vf->pf;
 
+	if (!ice_is_vf_dcf(vf) || ice_dcf_get_state(pf) != ICE_DCF_STATE_ON)
+		goto err;
+
 	if (len != sizeof(*aq_desc) || !ice_dcf_aq_cmd_permitted(aq_desc)) {
 		/* In case to avoid the VIRTCHNL_OP_DCF_CMD_DESC message with
 		 * the ICE_AQ_FLAG_BUF set followed by another bad message
@@ -3764,7 +3808,8 @@ static int ice_vc_dcf_cmd_buff_msg(struct ice_vf *vf, u8 *msg, u16 len)
 {
 	struct ice_pf *pf = vf->pf;
 
-	if (!len || !pf->dcf.aq_desc_received ||
+	if (!ice_is_vf_dcf(vf) || ice_dcf_get_state(pf) != ICE_DCF_STATE_ON ||
+	    !len || !pf->dcf.aq_desc_received ||
 	    time_is_before_jiffies(pf->dcf.aq_desc_expires))
 		goto err;
 
@@ -3776,6 +3821,34 @@ static int ice_vc_dcf_cmd_buff_msg(struct ice_vf *vf, u8 *msg, u16 len)
 				     VIRTCHNL_STATUS_ERR_PARAM, NULL, 0);
 }
 
+/**
+ * ice_vc_dis_dcf_cap - disable DCF capability for the VF
+ * @vf: pointer to the VF
+ */
+static int ice_vc_dis_dcf_cap(struct ice_vf *vf)
+{
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (!ice_is_vf_dcf(vf)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (vf->driver_caps & VIRTCHNL_VF_CAP_DCF) {
+		vf->driver_caps &= ~VIRTCHNL_VF_CAP_DCF;
+		ice_dcf_set_state(vf->pf, ICE_DCF_STATE_OFF);
+		vf->pf->dcf.vf = NULL;
+	}
+err:
+	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DCF_DISABLE,
+				     v_ret, NULL, 0);
+}
+
 /**
  * ice_vc_process_vf_msg - Process request from VF
  * @pf: pointer to the PF structure
@@ -3892,6 +3965,9 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
 	case VIRTCHNL_OP_DCF_CMD_BUFF:
 		err = ice_vc_dcf_cmd_buff_msg(vf, msg, msglen);
 		break;
+	case VIRTCHNL_OP_DCF_DISABLE:
+		err = ice_vc_dis_dcf_cap(vf);
+		break;
 	case VIRTCHNL_OP_UNKNOWN:
 	default:
 		dev_err(dev, "Unsupported opcode %d from VF %d\n", v_opcode,
@@ -4068,6 +4144,13 @@ int ice_set_vf_trust(struct net_device *netdev, int vf_id, bool trusted)
 	if (trusted == vf->trusted)
 		return 0;
 
+	if (ice_is_vf_dcf(vf) && !trusted &&
+	    ice_dcf_get_state(pf) != ICE_DCF_STATE_OFF) {
+		ice_dcf_set_state(pf, ICE_DCF_STATE_OFF);
+		pf->dcf.vf = NULL;
+		vf->driver_caps &= ~VIRTCHNL_VF_CAP_DCF;
+	}
+
 	vf->trusted = trusted;
 	ice_vc_reset_vf(vf);
 	dev_info(ice_pf_to_dev(pf), "VF %u is now %strusted\n",
diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index cd627a217b1c..2ff8e31f3172 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -139,6 +139,7 @@ enum virtchnl_ops {
 	/* opcode 34, 35, 36, 37 and 38 are reserved */
 	VIRTCHNL_OP_DCF_CMD_DESC = 39,
 	VIRTCHNL_OP_DCF_CMD_BUFF = 40,
+	VIRTCHNL_OP_DCF_DISABLE = 41,
 };
 
 /* These macros are used to generate compilation errors if a structure/union
@@ -250,6 +251,7 @@ VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_vsi_resource);
 #define VIRTCHNL_VF_OFFLOAD_ENCAP_CSUM		0X00200000
 #define VIRTCHNL_VF_OFFLOAD_RX_ENCAP_CSUM	0X00400000
 #define VIRTCHNL_VF_OFFLOAD_ADQ			0X00800000
+#define VIRTCHNL_VF_CAP_DCF			0X40000000
 
 /* Define below the capability flags that are not offloads */
 #define VIRTCHNL_VF_CAP_ADV_LINK_SPEED		0x00000080
@@ -592,6 +594,7 @@ enum virtchnl_event_codes {
 	VIRTCHNL_EVENT_LINK_CHANGE,
 	VIRTCHNL_EVENT_RESET_IMPENDING,
 	VIRTCHNL_EVENT_PF_DRIVER_CLOSE,
+	VIRTCHNL_EVENT_DCF_VSI_MAP_UPDATE,
 };
 
 #define PF_EVENT_SEVERITY_INFO		0
@@ -618,6 +621,10 @@ struct virtchnl_pf_event {
 			u8 link_status;
 			u8 pad[3];
 		} link_event_adv;
+		struct {
+			u16 vf_id;
+			u16 vsi_id;
+		} vf_vsi_map;
 	} event_data;
 
 	int severity;
@@ -838,6 +845,8 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
 		 */
 		valid_len = msglen;
 		break;
+	case VIRTCHNL_OP_DCF_DISABLE:
+		break;
 	/* These are always errors coming from the VF. */
 	case VIRTCHNL_OP_EVENT:
 	case VIRTCHNL_OP_UNKNOWN:
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [net-next 3/5] ice: support to get the VSI mapping
  2020-07-13 17:43 [net-next 0/5][pull request] 100GbE Intel Wired LAN Driver Updates 2020-07-13 Tony Nguyen
  2020-07-13 17:43 ` [net-next 1/5] ice: add the virtchnl handler for AdminQ command Tony Nguyen
  2020-07-13 17:43 ` [net-next 2/5] ice: add DCF cap negotiation and state machine Tony Nguyen
@ 2020-07-13 17:43 ` Tony Nguyen
  2020-07-13 17:43 ` [net-next 4/5] ice: enable DDP package info querying Tony Nguyen
  2020-07-13 17:43 ` [net-next 5/5] ice: add switch rule management for DCF Tony Nguyen
  4 siblings, 0 replies; 21+ messages in thread
From: Tony Nguyen @ 2020-07-13 17:43 UTC (permalink / raw)
  To: davem
  Cc: Haiyue Wang, netdev, nhorman, sassmann, jeffrey.t.kirsher,
	anthony.l.nguyen, Beilei Xing, Nannan Lu, Andrew Bowers

From: Haiyue Wang <haiyue.wang@intel.com>

The DCF needs the mapping information of the VF ID to logical hardware
VSI ID, so that it can create the switch flow rules for other VFs.

Signed-off-by: Beilei Xing <beilei.xing@intel.com>
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Nannan Lu <nannan.lu@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 .../net/ethernet/intel/ice/ice_virtchnl_pf.c  | 61 +++++++++++++++++++
 include/linux/avf/virtchnl.h                  | 21 +++++++
 2 files changed, 82 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 86ca35d0942f..45050bf189d5 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -3849,6 +3849,64 @@ static int ice_vc_dis_dcf_cap(struct ice_vf *vf)
 				     v_ret, NULL, 0);
 }
 
+/**
+ * ice_vc_dcf_get_vsi_map - get VSI mapping table
+ * @vf: pointer to the VF info
+ */
+static int ice_vc_dcf_get_vsi_map(struct ice_vf *vf)
+{
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_dcf_vsi_map *vsi_map = NULL;
+	struct ice_pf *pf = vf->pf;
+	struct ice_vsi *pf_vsi;
+	u16 len = 0;
+	int vf_id;
+	int ret;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (!ice_is_vf_dcf(vf) || ice_dcf_get_state(pf) != ICE_DCF_STATE_ON) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	len = struct_size(vsi_map, vf_vsi, pf->num_alloc_vfs - 1);
+	vsi_map = kzalloc(len, GFP_KERNEL);
+	if (!vsi_map) {
+		v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
+		len = 0;
+		goto err;
+	}
+
+	pf_vsi = ice_get_main_vsi(pf);
+	if (!pf_vsi) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		len = 0;
+		goto err;
+	}
+
+	vsi_map->pf_vsi = pf_vsi->vsi_num;
+	vsi_map->num_vfs = pf->num_alloc_vfs;
+
+	ice_for_each_vf(pf, vf_id) {
+		struct ice_vf *tmp_vf = &pf->vf[vf_id];
+
+		if (!ice_is_vf_disabled(tmp_vf) &&
+		    test_bit(ICE_VF_STATE_INIT, tmp_vf->vf_states))
+			vsi_map->vf_vsi[vf_id] = tmp_vf->lan_vsi_num |
+				VIRTCHNL_DCF_VF_VSI_VALID;
+	}
+
+err:
+	ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DCF_GET_VSI_MAP, v_ret,
+				    (u8 *)vsi_map, len);
+	kfree(vsi_map);
+	return ret;
+}
+
 /**
  * ice_vc_process_vf_msg - Process request from VF
  * @pf: pointer to the PF structure
@@ -3968,6 +4026,9 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
 	case VIRTCHNL_OP_DCF_DISABLE:
 		err = ice_vc_dis_dcf_cap(vf);
 		break;
+	case VIRTCHNL_OP_DCF_GET_VSI_MAP:
+		err = ice_vc_dcf_get_vsi_map(vf);
+		break;
 	case VIRTCHNL_OP_UNKNOWN:
 	default:
 		dev_err(dev, "Unsupported opcode %d from VF %d\n", v_opcode,
diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index 2ff8e31f3172..a34f0a529f0a 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -140,6 +140,7 @@ enum virtchnl_ops {
 	VIRTCHNL_OP_DCF_CMD_DESC = 39,
 	VIRTCHNL_OP_DCF_CMD_BUFF = 40,
 	VIRTCHNL_OP_DCF_DISABLE = 41,
+	VIRTCHNL_OP_DCF_GET_VSI_MAP = 42,
 };
 
 /* These macros are used to generate compilation errors if a structure/union
@@ -584,6 +585,25 @@ struct virtchnl_filter {
 
 VIRTCHNL_CHECK_STRUCT_LEN(272, virtchnl_filter);
 
+/* VIRTCHNL_OP_DCF_GET_VSI_MAP
+ * VF sends this message to get VSI mapping table.
+ * PF responds with an indirect message containing VF's
+ * HW VSI IDs.
+ * The index of vf_vsi array is the logical VF ID, the
+ * value of vf_vsi array is the VF's HW VSI ID with its
+ * valid configuration.
+ */
+struct virtchnl_dcf_vsi_map {
+	u16 pf_vsi;	/* PF's HW VSI ID */
+	u16 num_vfs;	/* The actual number of VFs allocated */
+#define VIRTCHNL_DCF_VF_VSI_ID_S	0
+#define VIRTCHNL_DCF_VF_VSI_ID_M	(0xFFF << VIRTCHNL_DCF_VF_VSI_ID_S)
+#define VIRTCHNL_DCF_VF_VSI_VALID	BIT(15)
+	u16 vf_vsi[1];
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(6, virtchnl_dcf_vsi_map);
+
 /* VIRTCHNL_OP_EVENT
  * PF sends this message to inform the VF driver of events that may affect it.
  * No direct response is expected from the VF, though it may generate other
@@ -846,6 +866,7 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
 		valid_len = msglen;
 		break;
 	case VIRTCHNL_OP_DCF_DISABLE:
+	case VIRTCHNL_OP_DCF_GET_VSI_MAP:
 		break;
 	/* These are always errors coming from the VF. */
 	case VIRTCHNL_OP_EVENT:
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [net-next 4/5] ice: enable DDP package info querying
  2020-07-13 17:43 [net-next 0/5][pull request] 100GbE Intel Wired LAN Driver Updates 2020-07-13 Tony Nguyen
                   ` (2 preceding siblings ...)
  2020-07-13 17:43 ` [net-next 3/5] ice: support to get the VSI mapping Tony Nguyen
@ 2020-07-13 17:43 ` Tony Nguyen
  2020-07-13 17:43 ` [net-next 5/5] ice: add switch rule management for DCF Tony Nguyen
  4 siblings, 0 replies; 21+ messages in thread
From: Tony Nguyen @ 2020-07-13 17:43 UTC (permalink / raw)
  To: davem
  Cc: Haiyue Wang, netdev, nhorman, sassmann, jeffrey.t.kirsher,
	anthony.l.nguyen, Leyi Rong, Ting Xu, Nannan Lu, Andrew Bowers

From: Haiyue Wang <haiyue.wang@intel.com>

Since the firmware doesn't support reading the DDP package data that PF
is using. The DCF has to find the PF's DDP package file directly.

For searching the right DDP package that the PF uses, the DCF needs the
DDP package characteristic information such as the PF's device serial
number which is used to find the package loading path, and the exact DDP
track ID, package name, version.

Only with the matched DDP package, the DCF can get the right metadata to
create switch rules etc.

Signed-off-by: Leyi Rong <leyi.rong@intel.com>
Signed-off-by: Ting Xu <ting.xu@intel.com>
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Nannan Lu <nannan.lu@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_dcf.h      |  6 +++
 drivers/net/ethernet/intel/ice/ice_main.c     |  2 +
 .../net/ethernet/intel/ice/ice_virtchnl_pf.c  | 52 +++++++++++++++++++
 include/linux/avf/virtchnl.h                  | 23 ++++++++
 4 files changed, 83 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_dcf.h b/drivers/net/ethernet/intel/ice/ice_dcf.h
index 1dabcca6f753..1ca228f89a19 100644
--- a/drivers/net/ethernet/intel/ice/ice_dcf.h
+++ b/drivers/net/ethernet/intel/ice/ice_dcf.h
@@ -32,6 +32,12 @@ struct ice_dcf {
 	struct ice_aq_desc aq_desc;
 	u8 aq_desc_received;
 	unsigned long aq_desc_expires;
+
+	/* Save the current Device Serial Number when searching the package
+	 * path for later query.
+	 */
+#define ICE_DSN_NUM_LEN 8
+	u8 dsn[ICE_DSN_NUM_LEN];
 };
 
 #ifdef CONFIG_PCI_IOV
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index a1cef089201a..983b5e21b436 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3243,6 +3243,8 @@ static char *ice_get_opt_fw_name(struct ice_pf *pf)
 	snprintf(opt_fw_filename, NAME_MAX, "%sice-%016llx.pkg",
 		 ICE_DDP_PKG_PATH, dsn);
 
+	memcpy(pf->dcf.dsn, &dsn, sizeof(pf->dcf.dsn));
+
 	return opt_fw_filename;
 }
 
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 45050bf189d5..c9dd9d473922 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -3907,6 +3907,55 @@ static int ice_vc_dcf_get_vsi_map(struct ice_vf *vf)
 	return ret;
 }
 
+/**
+ * ice_vc_dcf_query_pkg_info - query DDP package info from PF
+ * @vf: pointer to VF info
+ *
+ * Called from VF to query DDP package information loaded in PF,
+ * including track ID, package name, version and device serial
+ * number.
+ */
+static int ice_vc_dcf_query_pkg_info(struct ice_vf *vf)
+{
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_pkg_info *pkg_info = NULL;
+	struct ice_hw *hw = &vf->pf->hw;
+	struct ice_pf *pf = vf->pf;
+	int len = 0;
+	int ret;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (!ice_is_vf_dcf(vf) || ice_dcf_get_state(pf) != ICE_DCF_STATE_ON) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	len = sizeof(struct virtchnl_pkg_info);
+	pkg_info = kzalloc(len, GFP_KERNEL);
+	if (!pkg_info) {
+		v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
+		len = 0;
+		goto err;
+	}
+
+	pkg_info->track_id = hw->active_track_id;
+	memcpy(&pkg_info->pkg_ver, &hw->active_pkg_ver,
+	       sizeof(pkg_info->pkg_ver));
+	memcpy(pkg_info->pkg_name, hw->active_pkg_name,
+	       sizeof(pkg_info->pkg_name));
+	memcpy(pkg_info->dsn, pf->dcf.dsn, sizeof(pkg_info->dsn));
+
+err:
+	ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DCF_GET_PKG_INFO,
+				    v_ret, (u8 *)pkg_info, len);
+	kfree(pkg_info);
+	return ret;
+}
+
 /**
  * ice_vc_process_vf_msg - Process request from VF
  * @pf: pointer to the PF structure
@@ -4029,6 +4078,9 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
 	case VIRTCHNL_OP_DCF_GET_VSI_MAP:
 		err = ice_vc_dcf_get_vsi_map(vf);
 		break;
+	case VIRTCHNL_OP_DCF_GET_PKG_INFO:
+		err = ice_vc_dcf_query_pkg_info(vf);
+		break;
 	case VIRTCHNL_OP_UNKNOWN:
 	default:
 		dev_err(dev, "Unsupported opcode %d from VF %d\n", v_opcode,
diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index a34f0a529f0a..0191df7e5f9f 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -141,6 +141,7 @@ enum virtchnl_ops {
 	VIRTCHNL_OP_DCF_CMD_BUFF = 40,
 	VIRTCHNL_OP_DCF_DISABLE = 41,
 	VIRTCHNL_OP_DCF_GET_VSI_MAP = 42,
+	VIRTCHNL_OP_DCF_GET_PKG_INFO = 43,
 };
 
 /* These macros are used to generate compilation errors if a structure/union
@@ -604,6 +605,27 @@ struct virtchnl_dcf_vsi_map {
 
 VIRTCHNL_CHECK_STRUCT_LEN(6, virtchnl_dcf_vsi_map);
 
+#define PKG_NAME_SIZE	32
+#define DSN_SIZE	8
+
+struct pkg_version {
+	u8 major;
+	u8 minor;
+	u8 update;
+	u8 draft;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(4, pkg_version);
+
+struct virtchnl_pkg_info {
+	struct pkg_version pkg_ver;
+	u32 track_id;
+	char pkg_name[PKG_NAME_SIZE];
+	u8 dsn[DSN_SIZE];
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(48, virtchnl_pkg_info);
+
 /* VIRTCHNL_OP_EVENT
  * PF sends this message to inform the VF driver of events that may affect it.
  * No direct response is expected from the VF, though it may generate other
@@ -867,6 +889,7 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
 		break;
 	case VIRTCHNL_OP_DCF_DISABLE:
 	case VIRTCHNL_OP_DCF_GET_VSI_MAP:
+	case VIRTCHNL_OP_DCF_GET_PKG_INFO:
 		break;
 	/* These are always errors coming from the VF. */
 	case VIRTCHNL_OP_EVENT:
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [net-next 5/5] ice: add switch rule management for DCF
  2020-07-13 17:43 [net-next 0/5][pull request] 100GbE Intel Wired LAN Driver Updates 2020-07-13 Tony Nguyen
                   ` (3 preceding siblings ...)
  2020-07-13 17:43 ` [net-next 4/5] ice: enable DDP package info querying Tony Nguyen
@ 2020-07-13 17:43 ` Tony Nguyen
  4 siblings, 0 replies; 21+ messages in thread
From: Tony Nguyen @ 2020-07-13 17:43 UTC (permalink / raw)
  To: davem
  Cc: Haiyue Wang, netdev, nhorman, sassmann, jeffrey.t.kirsher,
	anthony.l.nguyen, Xiao Zhang, Beilei Xing, Nannan Lu,
	Andrew Bowers

From: Haiyue Wang <haiyue.wang@intel.com>

The PF shall track all the outstanding switch filters (filter IDs to be
precise) added by the DCF.

Upon a VF reset event, the PF shall clear all outstanding switch filters
for the given VF. Upon completion of either VF or PF reset, the PF shall
skip replay of filters that were added by the DCF. The PF shall however
continue to replay the filters that were not added by DCF for the VF(s).

If the trust mode of the DCF is taken away without the DCF gracefully
relinquishing the DCF functionality (by way appropriate virtchnl message
exchanges), then the PF shall remove all switch filters that were added
by the DCF. The PF shall transition back from DCF mode to regular mode,
the VF shall be treated as any other untrusted VF, and the PF will reset
the VF.

If a designated DCF requests AVF functionality from the same VF (VF-ID)
without the DCF gracefully relinquishing the DCF functionality first (by
way appropriate virtchnl message exchanges), the PF shall remove all the
switch filters that were added by the DCF.

Signed-off-by: Xiao Zhang <xiao.zhang@intel.com>
Signed-off-by: Beilei Xing <beilei.xing@intel.com>
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Nannan Lu <nannan.lu@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_dcf.c      | 707 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_dcf.h      |  41 +
 drivers/net/ethernet/intel/ice/ice_switch.c   |  16 +-
 drivers/net/ethernet/intel/ice/ice_switch.h   |  27 +-
 drivers/net/ethernet/intel/ice/ice_type.h     |   9 +
 .../net/ethernet/intel/ice/ice_virtchnl_pf.c  |  40 +
 6 files changed, 815 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_dcf.c b/drivers/net/ethernet/intel/ice/ice_dcf.c
index e7d37735aaa5..5234ec581d32 100644
--- a/drivers/net/ethernet/intel/ice/ice_dcf.c
+++ b/drivers/net/ethernet/intel/ice/ice_dcf.c
@@ -124,3 +124,710 @@ void ice_dcf_set_state(struct ice_pf *pf, enum ice_dcf_state state)
 
 	pf->dcf.state = state;
 }
+
+/**
+ * ice_dcf_rm_sw_rule_to_vsi - remove switch rule of "forward to VSI"
+ * @pf: pointer to the PF struct
+ * @s_entry: pointer to switch rule entry to remove
+ */
+static int
+ice_dcf_rm_sw_rule_to_vsi(struct ice_pf *pf,
+			  struct ice_dcf_sw_rule_entry *s_entry)
+{
+	struct ice_aqc_sw_rules_elem *s_rule;
+	enum ice_status status;
+
+	s_rule = kzalloc(ICE_SW_RULE_RX_TX_NO_HDR_SIZE, GFP_KERNEL);
+	if (!s_rule)
+		return -ENOMEM;
+
+	s_rule->type = cpu_to_le16(ICE_AQC_SW_RULES_T_LKUP_RX);
+	s_rule->pdata.lkup_tx_rx.act = 0;
+	s_rule->pdata.lkup_tx_rx.hdr_len = 0;
+	s_rule->pdata.lkup_tx_rx.index = cpu_to_le16(s_entry->rule_id);
+	status = ice_aq_sw_rules(&pf->hw, s_rule, ICE_SW_RULE_RX_TX_NO_HDR_SIZE,
+				 1, ice_aqc_opc_remove_sw_rules, NULL);
+	kfree(s_rule);
+	if (status)
+		return -EIO;
+
+	list_del(&s_entry->list_entry);
+	kfree(s_entry);
+	return 0;
+}
+
+/**
+ * ice_dcf_rm_sw_rule_to_vsi_list - remove switch rule of "forward to VSI list"
+ * @pf: pointer to the PF struct
+ * @s_entry: pointer to switch rule entry to remove
+ */
+static int
+ice_dcf_rm_sw_rule_to_vsi_list(struct ice_pf *pf,
+			       struct ice_dcf_sw_rule_entry *s_entry)
+{
+	struct ice_dcf_vsi_list_info *vsi_list_info = s_entry->vsi_list_info;
+	struct ice_aqc_alloc_free_res_elem *res_buf;
+	struct ice_aqc_sw_rules_elem *s_rule;
+	enum ice_status status;
+	u16 rule_sz;
+	u16 vsi_id;
+	int i = 0;
+
+	if (!vsi_list_info)
+		return -EINVAL;
+
+	/* The VSI list is empty, it can be freed immediately */
+	if (!vsi_list_info->vsi_count)
+		goto free_vsi_list;
+
+	rule_sz = ICE_SW_RULE_VSI_LIST_SIZE(vsi_list_info->vsi_count);
+	s_rule = kzalloc(rule_sz, GFP_KERNEL);
+	if (!s_rule)
+		return -ENOMEM;
+
+	s_rule->type = cpu_to_le16(ICE_AQC_SW_RULES_T_VSI_LIST_CLEAR);
+	s_rule->pdata.vsi_list.index = cpu_to_le16(vsi_list_info->list_id);
+	s_rule->pdata.vsi_list.number_vsi =
+					cpu_to_le16(vsi_list_info->vsi_count);
+	for_each_set_bit(vsi_id, vsi_list_info->hw_vsi_map, ICE_HW_VSI_ID_MAX)
+		s_rule->pdata.vsi_list.vsi[i++] = cpu_to_le16(vsi_id);
+
+	bitmap_zero(vsi_list_info->hw_vsi_map, ICE_HW_VSI_ID_MAX);
+	vsi_list_info->vsi_count = 0;
+
+	status = ice_aq_sw_rules(&pf->hw, s_rule, rule_sz, 1,
+				 ice_aqc_opc_update_sw_rules, NULL);
+	kfree(s_rule);
+	if (status)
+		return -EIO;
+
+free_vsi_list:
+	res_buf = kzalloc(sizeof(*res_buf), GFP_KERNEL);
+	if (!res_buf)
+		return -ENOMEM;
+
+	res_buf->res_type = cpu_to_le16(ICE_AQC_RES_TYPE_VSI_LIST_REP);
+	res_buf->num_elems = cpu_to_le16(1);
+	res_buf->elem[0].e.sw_resp = cpu_to_le16(vsi_list_info->list_id);
+	status = ice_aq_alloc_free_res(&pf->hw, 1, res_buf, sizeof(*res_buf),
+				       ice_aqc_opc_free_res, NULL);
+	kfree(res_buf);
+	if (status)
+		return -EIO;
+
+	list_del(&vsi_list_info->list_entry);
+	kfree(vsi_list_info);
+	s_entry->vsi_list_info = NULL;
+
+	return ice_dcf_rm_sw_rule_to_vsi(pf, s_entry);
+}
+
+/**
+ * ice_dcf_rm_vsi_from_list - remove VSI from switch rule forward VSI list
+ * @pf: pointer to the PF struct
+ * @vsi_list_info: pointer to the VSI list info
+ * @hw_vsi_id: the Hardware VSI number
+ */
+static int
+ice_dcf_rm_vsi_from_list(struct ice_pf *pf,
+			 struct ice_dcf_vsi_list_info *vsi_list_info,
+			 u16 hw_vsi_id)
+{
+	struct ice_aqc_sw_rules_elem *s_rule;
+	enum ice_status status;
+
+	if (!vsi_list_info || !vsi_list_info->vsi_count ||
+	    !test_bit(hw_vsi_id, vsi_list_info->hw_vsi_map))
+		return -ENOENT;
+
+	s_rule = kzalloc(ICE_SW_RULE_VSI_LIST_SIZE(1), GFP_KERNEL);
+	if (!s_rule)
+		return -ENOMEM;
+
+	s_rule->type = cpu_to_le16(ICE_AQC_SW_RULES_T_VSI_LIST_CLEAR);
+	s_rule->pdata.vsi_list.index = cpu_to_le16(vsi_list_info->list_id);
+	s_rule->pdata.vsi_list.number_vsi = cpu_to_le16(1);
+	s_rule->pdata.vsi_list.vsi[0] = cpu_to_le16(hw_vsi_id);
+	status = ice_aq_sw_rules(&pf->hw, s_rule,
+				 ICE_SW_RULE_VSI_LIST_SIZE(1), 1,
+				 ice_aqc_opc_update_sw_rules, NULL);
+	kfree(s_rule);
+	if (status)
+		return -EIO;
+
+	/* When the VF resets gracefully, it should keep the VSI list and its
+	 * rule, just clears the VSI from list, so that the DCF can replay the
+	 * rule by updating this VF to list successfully.
+	 */
+	vsi_list_info->vsi_count--;
+	clear_bit(hw_vsi_id, vsi_list_info->hw_vsi_map);
+
+	return 0;
+}
+
+/**
+ * ice_rm_dcf_sw_vsi_rule - remove switch rules added by DCF to VSI
+ * @pf: pointer to the PF struct
+ * @hw_vsi_id: hardware VSI ID of the VF
+ */
+void ice_rm_dcf_sw_vsi_rule(struct ice_pf *pf, u16 hw_vsi_id)
+{
+	struct ice_dcf_sw_rule_entry *s_entry, *tmp;
+	int ret;
+
+	list_for_each_entry_safe(s_entry, tmp, &pf->dcf.sw_rule_head,
+				 list_entry)
+		if (s_entry->fltr_act == ICE_FWD_TO_VSI_LIST) {
+			ret = ice_dcf_rm_vsi_from_list(pf,
+						       s_entry->vsi_list_info,
+						       hw_vsi_id);
+			if (ret && ret != -ENOENT)
+				dev_err(ice_pf_to_dev(pf),
+					"Failed to remove VSI %u from VSI list : %d\n",
+					hw_vsi_id, ret);
+		} else if (s_entry->fwd_id.hw_vsi_id == hw_vsi_id) {
+			ret = ice_dcf_rm_sw_rule_to_vsi(pf, s_entry);
+			if (ret)
+				dev_err(ice_pf_to_dev(pf),
+					"Failed to remove VSI %u switch rule : %d\n",
+					hw_vsi_id, ret);
+		}
+}
+
+/**
+ * ice_dcf_init_sw_rule_mgmt - initializes DCF rule filter mgmt struct
+ * @pf: pointer to the PF struct
+ */
+void ice_dcf_init_sw_rule_mgmt(struct ice_pf *pf)
+{
+	INIT_LIST_HEAD(&pf->dcf.sw_rule_head);
+	INIT_LIST_HEAD(&pf->dcf.vsi_list_info_head);
+}
+
+/**
+ * ice_rm_all_dcf_sw_rules - remove switch rules configured by DCF
+ * @pf: pointer to the PF struct
+ */
+void ice_rm_all_dcf_sw_rules(struct ice_pf *pf)
+{
+	struct ice_dcf_vsi_list_info *vsi_list_info, *list_info_tmp;
+	struct ice_dcf_sw_rule_entry *sw_rule, *rule_tmp;
+	u16 rule_id, list_id;
+	int ret;
+
+	list_for_each_entry_safe(sw_rule, rule_tmp, &pf->dcf.sw_rule_head,
+				 list_entry)
+		if (sw_rule->fltr_act == ICE_FWD_TO_VSI_LIST) {
+			list_id = sw_rule->fwd_id.vsi_list_id;
+			rule_id = sw_rule->rule_id;
+			ret = ice_dcf_rm_sw_rule_to_vsi_list(pf, sw_rule);
+			if (ret)
+				dev_err(ice_pf_to_dev(pf),
+					"Failed to remove switch rule 0x%04x with list id %u : %d\n",
+					rule_id, list_id, ret);
+		} else {
+			rule_id = sw_rule->rule_id;
+			ret = ice_dcf_rm_sw_rule_to_vsi(pf, sw_rule);
+			if (ret)
+				dev_err(ice_pf_to_dev(pf),
+					"Failed to remove switch rule 0x%04x : %d\n",
+					rule_id, ret);
+		}
+
+	/* clears rule filter management data if AdminQ command has error */
+	list_for_each_entry_safe(vsi_list_info, list_info_tmp,
+				 &pf->dcf.vsi_list_info_head,
+				 list_entry) {
+		list_del(&vsi_list_info->list_entry);
+		kfree(vsi_list_info);
+	}
+
+	list_for_each_entry_safe(sw_rule, rule_tmp, &pf->dcf.sw_rule_head,
+				 list_entry) {
+		list_del(&sw_rule->list_entry);
+		kfree(sw_rule);
+	}
+}
+
+/**
+ * ice_dcf_find_vsi_list_info - find the VSI list by ID
+ * @pf: pointer to the PF info
+ * @vsi_list_id: VSI list ID
+ */
+static struct ice_dcf_vsi_list_info *
+ice_dcf_find_vsi_list_info(struct ice_pf *pf, u16 vsi_list_id)
+{
+	struct ice_dcf_vsi_list_info *list_info;
+
+	list_for_each_entry(list_info, &pf->dcf.vsi_list_info_head, list_entry)
+		if (list_info->list_id == vsi_list_id)
+			return list_info;
+
+	return NULL;
+}
+
+/**
+ * ice_dcf_add_vsi_id - add new VSI ID into list
+ * @vsi_list_info: pointer to the VSI list info
+ * @hw_vsi_id: the VSI ID
+ */
+static void
+ice_dcf_add_vsi_id(struct ice_dcf_vsi_list_info *vsi_list_info, u16 hw_vsi_id)
+{
+	if (!test_and_set_bit(hw_vsi_id, vsi_list_info->hw_vsi_map))
+		vsi_list_info->vsi_count++;
+}
+
+/**
+ * ice_dcf_del_vsi_id - delete the VSI ID from list
+ * @vsi_list_info: pointer to the VSI list info
+ * @hw_vsi_id: the VSI ID
+ */
+static void
+ice_dcf_del_vsi_id(struct ice_dcf_vsi_list_info *vsi_list_info, u16 hw_vsi_id)
+{
+	if (test_and_clear_bit(hw_vsi_id, vsi_list_info->hw_vsi_map))
+		vsi_list_info->vsi_count--;
+}
+
+/**
+ * ice_dcf_parse_alloc_vsi_list_res - parse the allocate VSI list resource
+ * @pf: pointer to the PF info
+ * @res: pointer to the VSI list resource
+ */
+static enum virtchnl_status_code
+ice_dcf_parse_alloc_vsi_list_res(struct ice_pf *pf,
+				 struct ice_aqc_res_elem *res)
+{
+	struct ice_dcf_vsi_list_info *vsi_list_info;
+	u16 list_id = le16_to_cpu(res->e.sw_resp);
+
+	vsi_list_info = ice_dcf_find_vsi_list_info(pf, list_id);
+	if (vsi_list_info)
+		return VIRTCHNL_STATUS_SUCCESS;
+
+	vsi_list_info = kzalloc(sizeof(*vsi_list_info), GFP_KERNEL);
+	if (!vsi_list_info)
+		return VIRTCHNL_STATUS_ERR_NO_MEMORY;
+
+	vsi_list_info->list_id = list_id;
+	list_add(&vsi_list_info->list_entry, &pf->dcf.vsi_list_info_head);
+
+	return VIRTCHNL_STATUS_SUCCESS;
+}
+
+/**
+ * ice_dcf_parse_free_vsi_list_res - parse the free VSI list resource
+ * @pf: pointer to the PF info
+ * @res: pointer to the VSI list resource
+ */
+static enum virtchnl_status_code
+ice_dcf_parse_free_vsi_list_res(struct ice_pf *pf,
+				struct ice_aqc_res_elem *res)
+{
+	struct ice_dcf_vsi_list_info *vsi_list_info;
+	u16 list_id = le16_to_cpu(res->e.sw_resp);
+
+	vsi_list_info = ice_dcf_find_vsi_list_info(pf, list_id);
+	if (!vsi_list_info)
+		return VIRTCHNL_STATUS_ERR_PARAM;
+
+	if (vsi_list_info->vsi_count)
+		dev_warn(ice_pf_to_dev(pf),
+			 "VSI list %u still has %u VSIs to be removed!\n",
+			 list_id, vsi_list_info->vsi_count);
+
+	if (vsi_list_info->sw_rule)
+		vsi_list_info->sw_rule->vsi_list_info = NULL;
+
+	list_del(&vsi_list_info->list_entry);
+	kfree(vsi_list_info);
+
+	return VIRTCHNL_STATUS_SUCCESS;
+}
+
+/**
+ * ice_dcf_set_vsi_list - set the VSI to VSI list
+ * @pf: pointer to the PF info
+ * @vsi_list: pointer to the VSI ID list to be set
+ */
+static enum virtchnl_status_code
+ice_dcf_set_vsi_list(struct ice_pf *pf, struct ice_aqc_sw_rules_elem *vsi_list)
+{
+	struct ice_dcf_vsi_list_info *vsi_list_info;
+	int i;
+
+	vsi_list_info = ice_dcf_find_vsi_list_info(pf,
+						   le16_to_cpu(vsi_list->pdata.vsi_list.index));
+	if (!vsi_list_info)
+		return VIRTCHNL_STATUS_ERR_PARAM;
+
+	for (i = 0; i < le16_to_cpu(vsi_list->pdata.vsi_list.number_vsi); i++)
+		ice_dcf_add_vsi_id(vsi_list_info,
+				   le16_to_cpu(vsi_list->pdata.vsi_list.vsi[i]));
+
+	return VIRTCHNL_STATUS_SUCCESS;
+}
+
+/**
+ * ice_dcf_clear_vsi_list - clear the VSI from VSI list
+ * @pf: pointer to the PF info
+ * @vsi_list: pointer to the VSI ID list to be cleared
+ */
+static enum virtchnl_status_code
+ice_dcf_clear_vsi_list(struct ice_pf *pf, struct ice_aqc_sw_rules_elem *vsi_list)
+{
+	struct ice_dcf_vsi_list_info *vsi_list_info;
+	int i;
+
+	vsi_list_info = ice_dcf_find_vsi_list_info(pf,
+						   le16_to_cpu(vsi_list->pdata.vsi_list.index));
+	if (!vsi_list_info)
+		return VIRTCHNL_STATUS_ERR_PARAM;
+
+	for (i = 0; i < le16_to_cpu(vsi_list->pdata.vsi_list.number_vsi); i++)
+		ice_dcf_del_vsi_id(vsi_list_info,
+				   le16_to_cpu(vsi_list->pdata.vsi_list.vsi[i]));
+
+	return VIRTCHNL_STATUS_SUCCESS;
+}
+
+/**
+ * ice_dcf_find_sw_rule - find the switch rule by ID
+ * @pf: pointer to the PF info
+ * @rule_id: switch rule ID
+ */
+static struct ice_dcf_sw_rule_entry *
+ice_dcf_find_sw_rule(struct ice_pf *pf, u16 rule_id)
+{
+	struct ice_dcf_sw_rule_entry *sw_rule;
+
+	list_for_each_entry(sw_rule, &pf->dcf.sw_rule_head, list_entry)
+		if (sw_rule->rule_id == rule_id)
+			return sw_rule;
+
+	return NULL;
+}
+
+/**
+ * ice_dcf_parse_add_sw_rule_data - parse the add switch rule data
+ * @pf: pointer to the PF info
+ * @lkup: pointer to the add switch rule data
+ */
+static enum virtchnl_status_code
+ice_dcf_parse_add_sw_rule_data(struct ice_pf *pf, struct ice_aqc_sw_rules_elem *lkup)
+{
+	struct ice_dcf_sw_rule_entry *sw_rule;
+	u32 act;
+
+	sw_rule = kzalloc(sizeof(*sw_rule), GFP_KERNEL);
+	if (!sw_rule)
+		return VIRTCHNL_STATUS_ERR_NO_MEMORY;
+
+	act = le32_to_cpu(lkup->pdata.lkup_tx_rx.act);
+	sw_rule->fltr_act = ICE_FWD_TO_VSI;
+	sw_rule->fwd_id.hw_vsi_id = (act & ICE_SINGLE_ACT_VSI_ID_M) >>
+					ICE_SINGLE_ACT_VSI_ID_S;
+	sw_rule->rule_id = le16_to_cpu(lkup->pdata.lkup_tx_rx.index);
+
+	list_add(&sw_rule->list_entry, &pf->dcf.sw_rule_head);
+
+	return VIRTCHNL_STATUS_SUCCESS;
+}
+
+/**
+ * ice_dcf_parse_updt_sw_rule_data - parse the update switch rule data
+ * @pf: pointer to the PF info
+ * @lkup: pointer to the update switch rule data
+ */
+static enum virtchnl_status_code
+ice_dcf_parse_updt_sw_rule_data(struct ice_pf *pf, struct ice_aqc_sw_rules_elem *lkup)
+{
+	struct ice_dcf_vsi_list_info *vsi_list_info;
+	struct ice_dcf_sw_rule_entry *sw_rule;
+	u16 vsi_list_id, rule_id;
+	u32 act;
+
+	rule_id = le16_to_cpu(lkup->pdata.lkup_tx_rx.index);
+	sw_rule = ice_dcf_find_sw_rule(pf, rule_id);
+	if (!sw_rule)
+		return VIRTCHNL_STATUS_ERR_PARAM;
+
+	act = le32_to_cpu(lkup->pdata.lkup_tx_rx.act);
+	if (!(act & ICE_SINGLE_ACT_VSI_LIST)) {
+		u16 vsi_hw_id = (act & ICE_SINGLE_ACT_VSI_ID_M) >>
+				ICE_SINGLE_ACT_VSI_ID_S;
+
+		sw_rule->fltr_act = ICE_FWD_TO_VSI;
+		sw_rule->fwd_id.hw_vsi_id = vsi_hw_id;
+
+		return VIRTCHNL_STATUS_SUCCESS;
+	}
+
+	vsi_list_id = (act & ICE_SINGLE_ACT_VSI_LIST_ID_M) >>
+				ICE_SINGLE_ACT_VSI_LIST_ID_S;
+	if (sw_rule->vsi_list_info) {
+		if (sw_rule->vsi_list_info->list_id == vsi_list_id)
+			return VIRTCHNL_STATUS_SUCCESS;
+
+		dev_err(ice_pf_to_dev(pf),
+			"The switch rule 0x%04x is running on VSI list %u\n",
+			rule_id, sw_rule->vsi_list_info->list_id);
+		return VIRTCHNL_STATUS_ERR_PARAM;
+	}
+
+	vsi_list_info = ice_dcf_find_vsi_list_info(pf, vsi_list_id);
+	if (!vsi_list_info) {
+		dev_err(ice_pf_to_dev(pf),
+			"No VSI list %u found to bind the switch rule 0x%04x\n",
+			vsi_list_id, rule_id);
+		return VIRTCHNL_STATUS_ERR_PARAM;
+	}
+
+	if (vsi_list_info->sw_rule) {
+		if (vsi_list_info->sw_rule->rule_id == rule_id)
+			return VIRTCHNL_STATUS_SUCCESS;
+
+		dev_err(ice_pf_to_dev(pf),
+			"The VSI list %u is running on switch rule 0x%04x\n",
+			vsi_list_id, vsi_list_info->sw_rule->rule_id);
+		return VIRTCHNL_STATUS_ERR_PARAM;
+	}
+
+	vsi_list_info->sw_rule = sw_rule;
+
+	sw_rule->fltr_act = ICE_FWD_TO_VSI_LIST;
+	sw_rule->fwd_id.vsi_list_id = vsi_list_id;
+	sw_rule->vsi_list_info = vsi_list_info;
+
+	return VIRTCHNL_STATUS_SUCCESS;
+}
+
+/**
+ * ice_dcf_parse_rm_sw_rule_data - parse the remove switch rule data
+ * @pf: pointer to the PF info
+ * @lkup: pointer to the remove switch rule data
+ */
+static enum virtchnl_status_code
+ice_dcf_parse_rm_sw_rule_data(struct ice_pf *pf, struct ice_aqc_sw_rules_elem *lkup)
+{
+	u16 rule_id = le16_to_cpu(lkup->pdata.lkup_tx_rx.index);
+	struct ice_dcf_sw_rule_entry *sw_rule, *tmp;
+
+	list_for_each_entry_safe(sw_rule, tmp, &pf->dcf.sw_rule_head,
+				 list_entry)
+		if (sw_rule->rule_id == rule_id) {
+			list_del(&sw_rule->list_entry);
+			kfree(sw_rule);
+		}
+
+	return VIRTCHNL_STATUS_SUCCESS;
+}
+
+/**
+ * ice_dcf_handle_add_sw_rule_rsp - handle the add switch rule response
+ * @pf: pointer to the PF info
+ * @aq_buf: pointer to the add switch rule command buffer
+ */
+static enum virtchnl_status_code
+ice_dcf_handle_add_sw_rule_rsp(struct ice_pf *pf, u8 *aq_buf)
+{
+	enum virtchnl_status_code status = VIRTCHNL_STATUS_SUCCESS;
+	struct ice_aqc_sw_rules_elem *em =
+			(struct ice_aqc_sw_rules_elem *)aq_buf;
+	u16 type = le16_to_cpu(em->type);
+
+	if (type == ICE_AQC_SW_RULES_T_VSI_LIST_SET)
+		status = ice_dcf_set_vsi_list(pf, em);
+	else if (type == ICE_AQC_SW_RULES_T_LKUP_RX)
+		status = ice_dcf_parse_add_sw_rule_data(pf, em);
+
+	return status;
+}
+
+/**
+ * ice_dcf_handle_updt_sw_rule_rsp - handle the update switch rule response
+ * @pf: pointer to the PF info
+ * @aq_buf: pointer to the update switch rule command buffer
+ */
+static enum virtchnl_status_code
+ice_dcf_handle_updt_sw_rule_rsp(struct ice_pf *pf, u8 *aq_buf)
+{
+	enum virtchnl_status_code status = VIRTCHNL_STATUS_SUCCESS;
+	struct ice_aqc_sw_rules_elem *em =
+			(struct ice_aqc_sw_rules_elem *)aq_buf;
+	u16 type = le16_to_cpu(em->type);
+
+	if (type == ICE_AQC_SW_RULES_T_VSI_LIST_SET)
+		status = ice_dcf_set_vsi_list(pf, em);
+	else if (type == ICE_AQC_SW_RULES_T_VSI_LIST_CLEAR)
+		status = ice_dcf_clear_vsi_list(pf, em);
+	else if (type == ICE_AQC_SW_RULES_T_LKUP_RX)
+		status = ice_dcf_parse_updt_sw_rule_data(pf, em);
+
+	return status;
+}
+
+/**
+ * ice_dcf_handle_rm_sw_rule_rsp - handle the remove switch rule response
+ * @pf: pointer to the PF info
+ * @aq_buf: pointer to the remove switch rule command buffer
+ */
+static enum virtchnl_status_code
+ice_dcf_handle_rm_sw_rule_rsp(struct ice_pf *pf, u8 *aq_buf)
+{
+	enum virtchnl_status_code status = VIRTCHNL_STATUS_SUCCESS;
+	struct ice_aqc_sw_rules_elem *em =
+			(struct ice_aqc_sw_rules_elem *)aq_buf;
+	u16 type = le16_to_cpu(em->type);
+
+	if (type == ICE_AQC_SW_RULES_T_LKUP_RX)
+		status = ice_dcf_parse_rm_sw_rule_data(pf, em);
+
+	return status;
+}
+
+/**
+ * ice_dcf_handle_alloc_res_rsp - handle the allocate resource response
+ * @pf: pointer to the PF info
+ * @aq_buf: pointer to the allocate resource command buffer
+ */
+static enum virtchnl_status_code
+ice_dcf_handle_alloc_res_rsp(struct ice_pf *pf, u8 *aq_buf)
+{
+	enum virtchnl_status_code status = VIRTCHNL_STATUS_SUCCESS;
+	struct ice_aqc_alloc_free_res_elem *res_buf =
+		 (struct ice_aqc_alloc_free_res_elem *)aq_buf;
+	u16 type = (le16_to_cpu(res_buf->res_type) &
+		    ICE_AQC_RES_TYPE_M) >> ICE_AQC_RES_TYPE_S;
+
+	if (type == ICE_AQC_RES_TYPE_VSI_LIST_REP)
+		status = ice_dcf_parse_alloc_vsi_list_res(pf,
+							  &res_buf->elem[0]);
+
+	return status;
+}
+
+/**
+ * ice_dcf_handle_free_res_rsp - handle the free resource response
+ * @pf: pointer to the PF info
+ * @aq_buf: pointer to the free resource command buffer
+ */
+static enum virtchnl_status_code
+ice_dcf_handle_free_res_rsp(struct ice_pf *pf, u8 *aq_buf)
+{
+	enum virtchnl_status_code status = VIRTCHNL_STATUS_SUCCESS;
+	struct ice_aqc_alloc_free_res_elem *res_buf =
+		 (struct ice_aqc_alloc_free_res_elem *)aq_buf;
+	u16 type = (le16_to_cpu(res_buf->res_type) &
+		    ICE_AQC_RES_TYPE_M) >> ICE_AQC_RES_TYPE_S;
+
+	if (type == ICE_AQC_RES_TYPE_VSI_LIST_REP)
+		status = ice_dcf_parse_free_vsi_list_res(pf,
+							 &res_buf->elem[0]);
+
+	return status;
+}
+
+/**
+ * ice_dcf_post_aq_send_cmd - get the data from firmware successful response
+ * @pf: pointer to the PF info
+ * @aq_desc: descriptor describing the command
+ * @aq_buf: the AdminQ command buffer
+ */
+enum virtchnl_status_code
+ice_dcf_post_aq_send_cmd(struct ice_pf *pf, struct ice_aq_desc *aq_desc,
+			 u8 *aq_buf)
+{
+	enum virtchnl_status_code status = VIRTCHNL_STATUS_SUCCESS;
+	u16 opc = le16_to_cpu(aq_desc->opcode);
+
+	if (!aq_buf)
+		return VIRTCHNL_STATUS_SUCCESS;
+
+	if (opc == ice_aqc_opc_add_sw_rules)
+		status = ice_dcf_handle_add_sw_rule_rsp(pf, aq_buf);
+	else if (opc == ice_aqc_opc_update_sw_rules)
+		status = ice_dcf_handle_updt_sw_rule_rsp(pf, aq_buf);
+	else if (opc == ice_aqc_opc_remove_sw_rules)
+		status = ice_dcf_handle_rm_sw_rule_rsp(pf, aq_buf);
+	else if (opc == ice_aqc_opc_alloc_res)
+		status = ice_dcf_handle_alloc_res_rsp(pf, aq_buf);
+	else if (opc == ice_aqc_opc_free_res)
+		status = ice_dcf_handle_free_res_rsp(pf, aq_buf);
+
+	return status;
+}
+
+/**
+ * ice_dcf_pre_aq_send_cmd - check if it needs to send the command to firmware
+ * @vf: pointer to the VF info
+ * @aq_desc: descriptor describing the command
+ * @aq_buf: the AdminQ command buffer
+ * @aq_buf_size: the AdminQ command buffer size
+ */
+bool
+ice_dcf_pre_aq_send_cmd(struct ice_vf *vf, struct ice_aq_desc *aq_desc,
+			u8 *aq_buf, u16 aq_buf_size)
+{
+	struct ice_pf *pf = vf->pf;
+
+	switch (le16_to_cpu(aq_desc->opcode)) {
+	case ice_aqc_opc_update_sw_rules:
+	{
+		struct ice_dcf_vsi_list_info *vsi_list_info;
+		struct ice_aqc_sw_rules_elem *s_rule;
+		u16 list_id, vsi_id;
+
+		if (aq_buf_size < ICE_SW_RULE_VSI_LIST_SIZE(1))
+			break;
+
+		s_rule = (struct ice_aqc_sw_rules_elem *)aq_buf;
+		if (le16_to_cpu(s_rule->type) !=
+					ICE_AQC_SW_RULES_T_VSI_LIST_CLEAR ||
+		    le16_to_cpu(s_rule->pdata.vsi_list.number_vsi) != 1)
+			break;
+
+		list_id = le16_to_cpu(s_rule->pdata.vsi_list.index);
+		vsi_list_info = ice_dcf_find_vsi_list_info(pf, list_id);
+		if (!vsi_list_info)
+			break;
+
+		vsi_id = le16_to_cpu(s_rule->pdata.vsi_list.vsi[0]);
+		if (vsi_id >= ICE_HW_VSI_ID_MAX ||
+		    test_bit(vsi_id, vsi_list_info->hw_vsi_map))
+			break;
+
+		/* The VSI is removed from list already, no need to send the
+		 * command to firmware.
+		 */
+		return true;
+	}
+	case ice_aqc_opc_remove_sw_rules:
+	{
+		struct ice_aqc_sw_rules_elem *s_rule;
+		u16 rule_id;
+
+		if (aq_buf_size < ICE_SW_RULE_RX_TX_NO_HDR_SIZE)
+			break;
+
+		s_rule = (struct ice_aqc_sw_rules_elem *)aq_buf;
+		if (le16_to_cpu(s_rule->type) != ICE_AQC_SW_RULES_T_LKUP_RX)
+			break;
+
+		rule_id = le16_to_cpu(s_rule->pdata.lkup_tx_rx.index);
+		if (ice_dcf_find_sw_rule(pf, rule_id))
+			break;
+
+		/* The switch rule is removed already, no need to send the
+		 * command to firmware.
+		 */
+		return true;
+	}
+
+	default:
+		break;
+	}
+
+	return false;
+}
diff --git a/drivers/net/ethernet/intel/ice/ice_dcf.h b/drivers/net/ethernet/intel/ice/ice_dcf.h
index 1ca228f89a19..23842db0a884 100644
--- a/drivers/net/ethernet/intel/ice/ice_dcf.h
+++ b/drivers/net/ethernet/intel/ice/ice_dcf.h
@@ -21,10 +21,42 @@ enum ice_dcf_state {
 	ICE_DCF_STATE_PAUSE,
 };
 
+struct ice_dcf_sw_rule_entry;
+
+#define ICE_HW_VSI_ID_MAX	BIT(10) /* The AQ VSI number uses 10 bits */
+
+struct ice_dcf_vsi_list_info {
+	struct list_head list_entry;
+	struct ice_dcf_sw_rule_entry *sw_rule;
+	u16 list_id;
+
+	u16 vsi_count;
+	DECLARE_BITMAP(hw_vsi_map, ICE_HW_VSI_ID_MAX);
+};
+
+struct ice_dcf_sw_rule_entry {
+	struct list_head list_entry;
+	u16 rule_id;
+
+	/* Only support ICE_FWD_TO_VSI and ICE_FWD_TO_VSI_LIST */
+	enum ice_sw_fwd_act_type fltr_act;
+	/* Depending on filter action */
+	union {
+		u16 hw_vsi_id:10;
+		u16 vsi_list_id:10;
+	} fwd_id;
+
+	struct ice_dcf_vsi_list_info *vsi_list_info;
+};
+
 struct ice_dcf {
 	struct ice_vf *vf;
 	enum ice_dcf_state state;
 
+	/* Trace the switch rules added/removed by DCF */
+	struct list_head sw_rule_head;
+	struct list_head vsi_list_info_head;
+
 	/* Handle the AdminQ command between the DCF (Device Config Function)
 	 * and the firmware.
 	 */
@@ -46,5 +78,14 @@ bool ice_check_dcf_allowed(struct ice_vf *vf);
 bool ice_is_vf_dcf(struct ice_vf *vf);
 enum ice_dcf_state ice_dcf_get_state(struct ice_pf *pf);
 void ice_dcf_set_state(struct ice_pf *pf, enum ice_dcf_state state);
+void ice_dcf_init_sw_rule_mgmt(struct ice_pf *pf);
+void ice_rm_all_dcf_sw_rules(struct ice_pf *pf);
+void ice_rm_dcf_sw_vsi_rule(struct ice_pf *pf, u16 hw_vsi_id);
+bool
+ice_dcf_pre_aq_send_cmd(struct ice_vf *vf, struct ice_aq_desc *aq_desc,
+			u8 *aq_buf, u16 aq_buf_size);
+enum virtchnl_status_code
+ice_dcf_post_aq_send_cmd(struct ice_pf *pf, struct ice_aq_desc *aq_desc,
+			 u8 *aq_buf);
 #endif /* CONFIG_PCI_IOV */
 #endif /* _ICE_DCF_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_switch.c b/drivers/net/ethernet/intel/ice/ice_switch.c
index ccbe1cc64295..ee434b8d794d 100644
--- a/drivers/net/ethernet/intel/ice/ice_switch.c
+++ b/drivers/net/ethernet/intel/ice/ice_switch.c
@@ -23,24 +23,10 @@
  *	In case of Ether type filter it is treated as header without VLAN tag
  *	and byte 12 and 13 is used to program a given Ether type instead
  */
-#define DUMMY_ETH_HDR_LEN		16
 static const u8 dummy_eth_header[DUMMY_ETH_HDR_LEN] = { 0x2, 0, 0, 0, 0, 0,
 							0x2, 0, 0, 0, 0, 0,
 							0x81, 0, 0, 0};
 
-#define ICE_SW_RULE_RX_TX_ETH_HDR_SIZE \
-	(offsetof(struct ice_aqc_sw_rules_elem, pdata.lkup_tx_rx.hdr) + \
-	 (DUMMY_ETH_HDR_LEN * \
-	  sizeof(((struct ice_sw_rule_lkup_rx_tx *)0)->hdr[0])))
-#define ICE_SW_RULE_RX_TX_NO_HDR_SIZE \
-	(offsetof(struct ice_aqc_sw_rules_elem, pdata.lkup_tx_rx.hdr))
-#define ICE_SW_RULE_LG_ACT_SIZE(n) \
-	(offsetof(struct ice_aqc_sw_rules_elem, pdata.lg_act.act) + \
-	 ((n) * sizeof(((struct ice_sw_rule_lg_act *)0)->act[0])))
-#define ICE_SW_RULE_VSI_LIST_SIZE(n) \
-	(offsetof(struct ice_aqc_sw_rules_elem, pdata.vsi_list.vsi) + \
-	 ((n) * sizeof(((struct ice_sw_rule_vsi_list *)0)->vsi[0])))
-
 /**
  * ice_init_def_sw_recp - initialize the recipe book keeping tables
  * @hw: pointer to the HW struct
@@ -490,7 +476,7 @@ ice_aq_alloc_free_vsi_list(struct ice_hw *hw, u16 *vsi_list_id,
  *
  * Add(0x02a0)/Update(0x02a1)/Remove(0x02a2) switch rules commands to firmware
  */
-static enum ice_status
+enum ice_status
 ice_aq_sw_rules(struct ice_hw *hw, void *rule_list, u16 rule_list_sz,
 		u8 num_rules, enum ice_adminq_opc opc, struct ice_sq_cd *cd)
 {
diff --git a/drivers/net/ethernet/intel/ice/ice_switch.h b/drivers/net/ethernet/intel/ice/ice_switch.h
index 8b4f9d35c860..05cd6ea24d44 100644
--- a/drivers/net/ethernet/intel/ice/ice_switch.h
+++ b/drivers/net/ethernet/intel/ice/ice_switch.h
@@ -14,6 +14,20 @@
 #define ICE_VSI_INVAL_ID 0xffff
 #define ICE_INVAL_Q_HANDLE 0xFFFF
 
+#define DUMMY_ETH_HDR_LEN		16
+#define ICE_SW_RULE_RX_TX_ETH_HDR_SIZE \
+	(offsetof(struct ice_aqc_sw_rules_elem, pdata.lkup_tx_rx.hdr) + \
+	 (DUMMY_ETH_HDR_LEN * \
+	  sizeof(((struct ice_sw_rule_lkup_rx_tx *)0)->hdr[0])))
+#define ICE_SW_RULE_RX_TX_NO_HDR_SIZE \
+	(offsetof(struct ice_aqc_sw_rules_elem, pdata.lkup_tx_rx.hdr))
+#define ICE_SW_RULE_LG_ACT_SIZE(n) \
+	(offsetof(struct ice_aqc_sw_rules_elem, pdata.lg_act.act) + \
+	 ((n) * sizeof(((struct ice_sw_rule_lg_act *)0)->act[0])))
+#define ICE_SW_RULE_VSI_LIST_SIZE(n) \
+	(offsetof(struct ice_aqc_sw_rules_elem, pdata.vsi_list.vsi) + \
+	 ((n) * sizeof(((struct ice_sw_rule_vsi_list *)0)->vsi[0])))
+
 /* VSI context structure for add/get/update/free operations */
 struct ice_vsi_ctx {
 	u16 vsi_num;
@@ -28,15 +42,6 @@ struct ice_vsi_ctx {
 	struct ice_q_ctx *lan_q_ctx[ICE_MAX_TRAFFIC_CLASS];
 };
 
-enum ice_sw_fwd_act_type {
-	ICE_FWD_TO_VSI = 0,
-	ICE_FWD_TO_VSI_LIST, /* Do not use this when adding filter */
-	ICE_FWD_TO_Q,
-	ICE_FWD_TO_QGRP,
-	ICE_DROP_PACKET,
-	ICE_INVAL_ACT
-};
-
 /* Switch recipe ID enum values are specific to hardware */
 enum ice_sw_lkup_type {
 	ICE_SW_LKUP_ETHERTYPE = 0,
@@ -247,5 +252,7 @@ bool ice_is_vsi_valid(struct ice_hw *hw, u16 vsi_handle);
 
 enum ice_status ice_replay_vsi_all_fltr(struct ice_hw *hw, u16 vsi_handle);
 void ice_rm_all_sw_replay_rule_info(struct ice_hw *hw);
-
+enum ice_status
+ice_aq_sw_rules(struct ice_hw *hw, void *rule_list, u16 rule_list_sz,
+		u8 num_rules, enum ice_adminq_opc opc, struct ice_sq_cd *cd);
 #endif /* _ICE_SWITCH_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
index c1ad8622e65c..c2eff68d5469 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -707,6 +707,15 @@ struct ice_hw_port_stats {
 	u64 fd_sb_match;
 };
 
+enum ice_sw_fwd_act_type {
+	ICE_FWD_TO_VSI = 0,
+	ICE_FWD_TO_VSI_LIST, /* Do not use this when adding filter */
+	ICE_FWD_TO_Q,
+	ICE_FWD_TO_QGRP,
+	ICE_DROP_PACKET,
+	ICE_INVAL_ACT
+};
+
 /* Checksum and Shadow RAM pointers */
 #define ICE_SR_BOOT_CFG_PTR		0x132
 #define ICE_NVM_OROM_VER_OFF		0x02
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index c9dd9d473922..a0ec716c193b 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -361,6 +361,7 @@ void ice_free_vfs(struct ice_pf *pf)
 		dev_warn(dev, "VFs are assigned - not disabling SR-IOV\n");
 
 	if (ice_dcf_get_state(pf) != ICE_DCF_STATE_OFF) {
+		ice_rm_all_dcf_sw_rules(pf);
 		ice_dcf_set_state(pf, ICE_DCF_STATE_OFF);
 		pf->dcf.vf = NULL;
 	}
@@ -1055,6 +1056,9 @@ static void ice_vf_clear_counters(struct ice_vf *vf)
  */
 static void ice_vf_pre_vsi_rebuild(struct ice_vf *vf)
 {
+	/* Remove switch rules associated with the reset VF */
+	ice_rm_dcf_sw_vsi_rule(vf->pf, vf->lan_vsi_num);
+
 	ice_vf_clear_counters(vf);
 	ice_clear_vf_reset_trigger(vf);
 }
@@ -1584,6 +1588,8 @@ static int ice_ena_vfs(struct ice_pf *pf, u16 num_vfs)
 	if (ret)
 		goto err_pci_disable_sriov;
 
+	ice_dcf_init_sw_rule_mgmt(pf);
+
 	if (ice_set_per_vf_res(pf)) {
 		dev_err(dev, "Not enough resources for %d VFs, try with fewer number of VFs\n",
 			num_vfs);
@@ -2013,6 +2019,13 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
 			 ICE_DCF_VFID);
 	} else if (ice_is_vf_dcf(vf) &&
 		   ice_dcf_get_state(pf) != ICE_DCF_STATE_OFF) {
+		/* If a designated DCF requests AVF functionality from the
+		 * same VF without the DCF gracefully relinquishing the DCF
+		 * functionality first, remove ALL switch filters that were
+		 * added by the DCF.
+		 */
+		dev_info(ice_pf_to_dev(pf), "DCF is not in the OFF state, removing all switch filters that were added by the DCF\n");
+		ice_rm_all_dcf_sw_rules(pf);
 		ice_dcf_set_state(pf, ICE_DCF_STATE_OFF);
 		pf->dcf.vf = NULL;
 		ice_reset_vf(vf, false);
@@ -3724,6 +3737,18 @@ ice_dcf_handle_aq_cmd(struct ice_vf *vf, struct ice_aq_desc *aq_desc,
 	if ((aq_buf && !aq_buf_size) || (!aq_buf && aq_buf_size))
 		return -EINVAL;
 
+	if (ice_dcf_pre_aq_send_cmd(vf, aq_desc, aq_buf, aq_buf_size)) {
+		ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DCF_CMD_DESC,
+					    VIRTCHNL_STATUS_SUCCESS,
+					    (u8 *)aq_desc, sizeof(*aq_desc));
+		if (ret || !aq_buf_size)
+			return ret;
+
+		v_op = VIRTCHNL_OP_DCF_CMD_BUFF;
+		v_ret = VIRTCHNL_STATUS_SUCCESS;
+		goto err;
+	}
+
 	aq_ret = ice_aq_send_cmd(&pf->hw, aq_desc, aq_buf, aq_buf_size, NULL);
 	/* It needs to send back the AQ response message if ICE_ERR_AQ_ERROR
 	 * returns, some AdminQ handlers will use the error code filled by FW
@@ -3735,6 +3760,14 @@ ice_dcf_handle_aq_cmd(struct ice_vf *vf, struct ice_aq_desc *aq_desc,
 		goto err;
 	}
 
+	if (aq_ret != ICE_ERR_AQ_ERROR) {
+		v_ret = ice_dcf_post_aq_send_cmd(pf, aq_desc, aq_buf);
+		if (v_ret != VIRTCHNL_STATUS_SUCCESS) {
+			v_op = VIRTCHNL_OP_DCF_CMD_DESC;
+			goto err;
+		}
+	}
+
 	ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DCF_CMD_DESC, v_ret,
 				    (u8 *)aq_desc, sizeof(*aq_desc));
 	/* Bail out so we don't send the VIRTCHNL_OP_DCF_CMD_BUFF message
@@ -3841,6 +3874,7 @@ static int ice_vc_dis_dcf_cap(struct ice_vf *vf)
 
 	if (vf->driver_caps & VIRTCHNL_VF_CAP_DCF) {
 		vf->driver_caps &= ~VIRTCHNL_VF_CAP_DCF;
+		ice_rm_all_dcf_sw_rules(vf->pf);
 		ice_dcf_set_state(vf->pf, ICE_DCF_STATE_OFF);
 		vf->pf->dcf.vf = NULL;
 	}
@@ -4257,8 +4291,14 @@ int ice_set_vf_trust(struct net_device *netdev, int vf_id, bool trusted)
 	if (trusted == vf->trusted)
 		return 0;
 
+	/* If the trust mode of a given DCF is taken away without the DCF
+	 * gracefully relinquishing the DCF functionality, remove ALL switch
+	 * filters that were added by the DCF and treat this VF as any other
+	 * untrusted AVF.
+	 */
 	if (ice_is_vf_dcf(vf) && !trusted &&
 	    ice_dcf_get_state(pf) != ICE_DCF_STATE_OFF) {
+		ice_rm_all_dcf_sw_rules(pf);
 		ice_dcf_set_state(pf, ICE_DCF_STATE_OFF);
 		pf->dcf.vf = NULL;
 		vf->driver_caps &= ~VIRTCHNL_VF_CAP_DCF;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
  2020-07-13 17:43 ` [net-next 1/5] ice: add the virtchnl handler for AdminQ command Tony Nguyen
@ 2020-07-13 22:48   ` Jakub Kicinski
  2020-07-14  1:29     ` Wang, Haiyue
  0 siblings, 1 reply; 21+ messages in thread
From: Jakub Kicinski @ 2020-07-13 22:48 UTC (permalink / raw)
  To: Tony Nguyen
  Cc: davem, Haiyue Wang, netdev, nhorman, sassmann, jeffrey.t.kirsher,
	Nannan Lu, Andrew Bowers

On Mon, 13 Jul 2020 10:43:16 -0700 Tony Nguyen wrote:
> From: Haiyue Wang <haiyue.wang@intel.com>
> 
> The DCF (Device Config Function) is a named trust VF (always with ID 0,
> single entity per PF port) that can act as a sole controlling entity to
> exercise advance functionality such as adding switch rules for the rest
> of VFs.

But why? This looks like a bifurcated driver to me.

> To achieve this approach, this VF is permitted to send some basic AdminQ
> commands to the PF through virtual channel (mailbox), then the PF driver
> sends these commands to the firmware, and returns the response to the VF
> again through virtual channel.
> 
> The AdminQ command from DCF is split into two parts: one is the AdminQ
> descriptor, the other is the buffer (the descriptor has BUF flag set).
> These two parts should be sent in order, so that the PF can handle them
> correctly.
> 
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Tested-by: Nannan Lu <nannan.lu@intel.com>
> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
  2020-07-13 22:48   ` Jakub Kicinski
@ 2020-07-14  1:29     ` Wang, Haiyue
  2020-07-14 18:24       ` Jakub Kicinski
  0 siblings, 1 reply; 21+ messages in thread
From: Wang, Haiyue @ 2020-07-14  1:29 UTC (permalink / raw)
  To: Jakub Kicinski, Nguyen, Anthony L
  Cc: davem, netdev, nhorman, sassmann, Kirsher, Jeffrey T, Lu, Nannan,
	Bowers, AndrewX

> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Tuesday, July 14, 2020 06:49
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>
> Cc: davem@davemloft.net; Wang, Haiyue <haiyue.wang@intel.com>; netdev@vger.kernel.org;
> nhorman@redhat.com; sassmann@redhat.com; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; Lu, Nannan
> <nannan.lu@intel.com>; Bowers, AndrewX <andrewx.bowers@intel.com>
> Subject: Re: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
> 
> On Mon, 13 Jul 2020 10:43:16 -0700 Tony Nguyen wrote:
> > From: Haiyue Wang <haiyue.wang@intel.com>
> >
> > The DCF (Device Config Function) is a named trust VF (always with ID 0,
> > single entity per PF port) that can act as a sole controlling entity to
> > exercise advance functionality such as adding switch rules for the rest
> > of VFs.
> 
> But why? This looks like a bifurcated driver to me.
> 

Yes, like bifurcated about flow control. This expands Intel AVF virtual channel
commands, so that VF can talk to hardware indirectly, which is under control of
PF. Then VF can set up the flow control for other VFs. This enrich current PF's
Flow Director filter for PF itself only by ethtool. 

> > To achieve this approach, this VF is permitted to send some basic AdminQ
> > commands to the PF through virtual channel (mailbox), then the PF driver
> > sends these commands to the firmware, and returns the response to the VF
> > again through virtual channel.
> >
> > The AdminQ command from DCF is split into two parts: one is the AdminQ
> > descriptor, the other is the buffer (the descriptor has BUF flag set).
> > These two parts should be sent in order, so that the PF can handle them
> > correctly.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Tested-by: Nannan Lu <nannan.lu@intel.com>
> > Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
> > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
  2020-07-14  1:29     ` Wang, Haiyue
@ 2020-07-14 18:24       ` Jakub Kicinski
  2020-07-15  1:17         ` Wang, Haiyue
  0 siblings, 1 reply; 21+ messages in thread
From: Jakub Kicinski @ 2020-07-14 18:24 UTC (permalink / raw)
  To: Wang, Haiyue
  Cc: Nguyen, Anthony L, davem, netdev, nhorman, sassmann, Kirsher,
	Jeffrey T, Lu, Nannan, Bowers, AndrewX

On Tue, 14 Jul 2020 01:29:40 +0000 Wang, Haiyue wrote:
> > On Mon, 13 Jul 2020 10:43:16 -0700 Tony Nguyen wrote:  
> > > From: Haiyue Wang <haiyue.wang@intel.com>
> > >
> > > The DCF (Device Config Function) is a named trust VF (always with ID 0,
> > > single entity per PF port) that can act as a sole controlling entity to
> > > exercise advance functionality such as adding switch rules for the rest
> > > of VFs.  
> > 
> > But why? This looks like a bifurcated driver to me.
> 
> Yes, like bifurcated about flow control. This expands Intel AVF virtual channel
> commands, so that VF can talk to hardware indirectly, which is under control of
> PF. Then VF can set up the flow control for other VFs. This enrich current PF's
> Flow Director filter for PF itself only by ethtool. 

Could you say a little more about the application and motivation for
this?

We are talking about a single control domain here, correct?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
  2020-07-14 18:24       ` Jakub Kicinski
@ 2020-07-15  1:17         ` Wang, Haiyue
  2020-07-15 18:03           ` Jakub Kicinski
  0 siblings, 1 reply; 21+ messages in thread
From: Wang, Haiyue @ 2020-07-15  1:17 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Nguyen, Anthony L, davem, netdev, nhorman, sassmann, Kirsher,
	Jeffrey T, Lu, Nannan, Bowers, AndrewX

> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Wednesday, July 15, 2020 02:24
> To: Wang, Haiyue <haiyue.wang@intel.com>
> Cc: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; davem@davemloft.net; netdev@vger.kernel.org;
> nhorman@redhat.com; sassmann@redhat.com; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; Lu, Nannan
> <nannan.lu@intel.com>; Bowers, AndrewX <andrewx.bowers@intel.com>
> Subject: Re: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
> 
> On Tue, 14 Jul 2020 01:29:40 +0000 Wang, Haiyue wrote:
> > > On Mon, 13 Jul 2020 10:43:16 -0700 Tony Nguyen wrote:
> > > > From: Haiyue Wang <haiyue.wang@intel.com>
> > > >
> > > > The DCF (Device Config Function) is a named trust VF (always with ID 0,
> > > > single entity per PF port) that can act as a sole controlling entity to
> > > > exercise advance functionality such as adding switch rules for the rest
> > > > of VFs.
> > >
> > > But why? This looks like a bifurcated driver to me.
> >
> > Yes, like bifurcated about flow control. This expands Intel AVF virtual channel
> > commands, so that VF can talk to hardware indirectly, which is under control of
> > PF. Then VF can set up the flow control for other VFs. This enrich current PF's
> > Flow Director filter for PF itself only by ethtool.
> 
> Could you say a little more about the application and motivation for
> this?
> 

Sure, I will try to describe the whole story.

> We are talking about a single control domain here, correct?

Correct.

As you know, with the help of vfio-pci kernel module, we can write the user space
driver for PCI devices, like DPDK. ;-)

We have
  1). user space iavf framework:
	http://git.dpdk.org/dpdk/tree/drivers/common/iavf

  2). user space iavf driver:
      http://git.dpdk.org/dpdk/tree/drivers/net/iavf

  3). user space ice driver with no SR-IOV support:
      http://git.dpdk.org/dpdk/tree/drivers/net/ice

Nowadays, the concept of control path and data path separation is popular, we tried
to design a software defined control path by the above software portfolio, and the
scenario is described in:
      http://doc.dpdk.org/guides/nics/ice.html 23.4.3. Device Config Function (DCF)

With the patch in user space ice driver:
      http://git.dpdk.org/dpdk/commit/?id=c5dccda9f2ae6ecc716892c233a0dadc94e013da

and this patch set in ice kernel driver, we can now promote the VF from iAVF (data
path we called) to DCF (control path) for each PF device.





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
  2020-07-15  1:17         ` Wang, Haiyue
@ 2020-07-15 18:03           ` Jakub Kicinski
  2020-07-15 18:18             ` Wang, Haiyue
  2020-07-23  0:37             ` Venkataramanan, Anirudh
  0 siblings, 2 replies; 21+ messages in thread
From: Jakub Kicinski @ 2020-07-15 18:03 UTC (permalink / raw)
  To: Wang, Haiyue
  Cc: Nguyen, Anthony L, davem, netdev, nhorman, sassmann, Kirsher,
	Jeffrey T, Lu, Nannan, Bowers, AndrewX

On Wed, 15 Jul 2020 01:17:26 +0000 Wang, Haiyue wrote:
> > Could you say a little more about the application and motivation for
> > this?
> 
> Sure, I will try to describe the whole story.
> 
> > We are talking about a single control domain here, correct?  
> 
> Correct.

We have a long standing policy of not supporting or helping bifurcated
drivers.

I'm tossing this from patchwork.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
  2020-07-15 18:03           ` Jakub Kicinski
@ 2020-07-15 18:18             ` Wang, Haiyue
  2020-07-23  0:37             ` Venkataramanan, Anirudh
  1 sibling, 0 replies; 21+ messages in thread
From: Wang, Haiyue @ 2020-07-15 18:18 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Nguyen, Anthony L, davem, netdev, nhorman, sassmann, Kirsher,
	Jeffrey T, Lu, Nannan, Bowers, AndrewX, Liang, Cunming

> -----Original Message-----
> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On Behalf Of Jakub Kicinski
> Sent: Thursday, July 16, 2020 02:04
> To: Wang, Haiyue <haiyue.wang@intel.com>
> Cc: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; davem@davemloft.net; netdev@vger.kernel.org;
> nhorman@redhat.com; sassmann@redhat.com; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; Lu, Nannan
> <nannan.lu@intel.com>; Bowers, AndrewX <andrewx.bowers@intel.com>
> Subject: Re: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
> 
> On Wed, 15 Jul 2020 01:17:26 +0000 Wang, Haiyue wrote:
> > > Could you say a little more about the application and motivation for
> > > this?
> >
> > Sure, I will try to describe the whole story.
> >
> > > We are talking about a single control domain here, correct?
> >
> > Correct.
> 
> We have a long standing policy of not supporting or helping bifurcated
> drivers.

I searched the concept about 'Bifurcated', not sure the below is the corret:
 "Queue Splitting (Bifurcated Driver) is a design that allows for directing
  some traffic to user space, while keeping the remaining traffic in the
  traditional Linux networking stack."

What we did is some path finding about how to partition the hardware function
in real world to meet customer's requirement flexibly.

> 
> I'm tossing this from patchwork.

Thanks for your time, I understand your concern from another point. We fix the
real world issue by our limited wisdom, and it is nice to open source our design
to get the feedback of the net community.

Problems
-	User app expects to take advantage of a few SR-IOV PF capabilities (e.g.
      binary/ternary classify) in raw manner
-	It doesn't expect to take control of entire SR-IOV PF device from user
      space
 
Motivation
-	Single control domain is always managed by kernel SR-IOV PF driver
-	It allows app to issue access intent for raw capabilities via named
      trust virtchnl

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
  2020-07-15 18:03           ` Jakub Kicinski
  2020-07-15 18:18             ` Wang, Haiyue
@ 2020-07-23  0:37             ` Venkataramanan, Anirudh
  2020-07-23  1:07               ` Jakub Kicinski
  1 sibling, 1 reply; 21+ messages in thread
From: Venkataramanan, Anirudh @ 2020-07-23  0:37 UTC (permalink / raw)
  To: kuba, Wang, Haiyue
  Cc: davem, nhorman, sassmann, Bowers, AndrewX, Kirsher, Jeffrey T,
	netdev, Nguyen, Anthony L, Lu, Nannan

On Wed, 2020-07-15 at 11:03 -0700, Jakub Kicinski wrote:
> On Wed, 15 Jul 2020 01:17:26 +0000 Wang, Haiyue wrote:
> > > Could you say a little more about the application and motivation
> > > for
> > > this?
> > 
> > Sure, I will try to describe the whole story.
> > 
> > > We are talking about a single control domain here, correct?  
> > 
> > Correct.
> 
> We have a long standing policy of not supporting or helping
> bifurcated
> drivers.

Jakub,
 
From what I understand, a bifurcated driver carves out the host
interface netdev's queues (resources if you want to further generalize
this) and allows another entity to control/use them. If this is the
definition, then DCF is not bifurcating the driver.
 
DCF is on top of SR-IOV VFs and the device resources required for the
DCF VF are made available by the device through the PCI configuration
space during SR-IOV init. This part isn't anything different from the
usual SR-IOV VF instantiation. The DCF feature adds the ability for a
trusted VF to add flow rules. So this is really just an extension of
the VF-PF interface that allows advanced flow/switch rules programming.
Additionally, the driver also has a list of operations that the DCF VF
is allowed to execute.

Can you please clarify how you (and the community) define bifurcated
driver?

Thanks,
Ani

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
  2020-07-23  0:37             ` Venkataramanan, Anirudh
@ 2020-07-23  1:07               ` Jakub Kicinski
  2020-07-25 16:39                 ` Tom Herbert
  0 siblings, 1 reply; 21+ messages in thread
From: Jakub Kicinski @ 2020-07-23  1:07 UTC (permalink / raw)
  To: Venkataramanan, Anirudh
  Cc: Wang, Haiyue, davem, nhorman, sassmann, Bowers, AndrewX, Kirsher,
	Jeffrey T, netdev, Nguyen, Anthony L, Lu, Nannan

On Thu, 23 Jul 2020 00:37:29 +0000 Venkataramanan, Anirudh wrote:
> Can you please clarify how you (and the community) define bifurcated
> driver?

No amount of clarification from me will change the fact that you need
this for DPDK.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
  2020-07-23  1:07               ` Jakub Kicinski
@ 2020-07-25 16:39                 ` Tom Herbert
  2020-07-27 23:04                   ` Jakub Kicinski
  0 siblings, 1 reply; 21+ messages in thread
From: Tom Herbert @ 2020-07-25 16:39 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Venkataramanan, Anirudh, Wang, Haiyue, davem, nhorman, sassmann,
	Bowers, AndrewX, Kirsher, Jeffrey T, netdev, Nguyen, Anthony L,
	Lu, Nannan

On Wed, Jul 22, 2020 at 6:07 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 23 Jul 2020 00:37:29 +0000 Venkataramanan, Anirudh wrote:
> > Can you please clarify how you (and the community) define bifurcated
> > driver?
>
> No amount of clarification from me will change the fact that you need
> this for DPDK.

Jakub,

The fundamental problem we have wrt DPDK is that they are not just
satisfied to just bypass the kernel datapath, they are endeavouring to
bypass the kernel control path as well with things like RTE flow API.
The purpose of this patch set, AFAICT, is to keep the kernel in the
control plane path so that we can maintain one consistent management
view of device resources. The unpleasant alternative is that DPDK will
send control messages directly to the device thereby bypassing the
kernel control plane and thus resulting in two independent entities
managing the same device and forcing a bifurcated control plane in the
device (which of course results in a complete mess!).

Tom

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
  2020-07-25 16:39                 ` Tom Herbert
@ 2020-07-27 23:04                   ` Jakub Kicinski
  2020-08-03 10:39                     ` Wang, Haiyue
  0 siblings, 1 reply; 21+ messages in thread
From: Jakub Kicinski @ 2020-07-27 23:04 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Venkataramanan, Anirudh, Wang, Haiyue, davem, nhorman, sassmann,
	Bowers, AndrewX, Kirsher, Jeffrey T, netdev, Nguyen, Anthony L,
	Lu, Nannan

On Sat, 25 Jul 2020 09:39:07 -0700 Tom Herbert wrote:
> Jakub,
> 
> The fundamental problem we have wrt DPDK is that they are not just
> satisfied to just bypass the kernel datapath, they are endeavouring to
> bypass the kernel control path as well with things like RTE flow API.
> The purpose of this patch set, AFAICT, is to keep the kernel in the
> control plane path so that we can maintain one consistent management
> view of device resources. The unpleasant alternative is that DPDK will
> send control messages directly to the device thereby bypassing the
> kernel control plane and thus resulting in two independent entities
> managing the same device and forcing a bifurcated control plane in the
> device (which of course results in a complete mess!).

Sorry for a late reply.

I try to do my best to predict what effect the community pushing back
on merging features will have. It does appear that the unpleasant
alternative you mention is becoming more and more prevalent. I believe
this is a result of multiple factors - convenience of the single point
of control, backward/forward compat, the growing size of the driver SW
stack, relative SW vs Si development cost in a NIC project, software
distribution models..

My day to day experience working with NICs shows that FW has already
taken over high perf NICs, and I hate it. I'd take DPDK over closed
source FW any time of the day, but I will not fool myself into thinking
that expansion of FW control can be stopped by the kernel opening the
floodgates to anything the vendors want to throw at us. 

Ergo the lesser evil argument does not apply.


In this case, I'm guessing, Intel can reuse RTE flow -> AQ code written
to run on PFs on the special VF.

This community has selected switchdev + flower for programming flows.
I believe implementing flower offloads would solve your use case, and
at the same time be most beneficial to the netdev community.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
  2020-07-27 23:04                   ` Jakub Kicinski
@ 2020-08-03 10:39                     ` Wang, Haiyue
  2020-08-03 20:45                       ` Jakub Kicinski
  0 siblings, 1 reply; 21+ messages in thread
From: Wang, Haiyue @ 2020-08-03 10:39 UTC (permalink / raw)
  To: Jakub Kicinski, Tom Herbert
  Cc: Venkataramanan, Anirudh, davem, nhorman, sassmann, Bowers,
	AndrewX, Kirsher, Jeffrey T, netdev, Nguyen, Anthony L, Lu,
	Nannan, Liang, Cunming

> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Tuesday, July 28, 2020 07:04
> To: Tom Herbert <tom@herbertland.com>
> Cc: Venkataramanan, Anirudh <anirudh.venkataramanan@intel.com>; Wang, Haiyue <haiyue.wang@intel.com>;
> davem@davemloft.net; nhorman@redhat.com; sassmann@redhat.com; Bowers, AndrewX
> <andrewx.bowers@intel.com>; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; netdev@vger.kernel.org;
> Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Lu, Nannan <nannan.lu@intel.com>
> Subject: Re: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
> 


> In this case, I'm guessing, Intel can reuse RTE flow -> AQ code written
> to run on PFs on the special VF.
> 
> This community has selected switchdev + flower for programming flows.
> I believe implementing flower offloads would solve your use case, and
> at the same time be most beneficial to the netdev community.

Jakub,

Thanks, I deep into the switchdev, it is kernel software bridge for hardware
offload, and each port is registered with register_netdev. So this solution
is not suitable for current case: VF can be assigned to VMs.

BR,
Haiyue

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
  2020-08-03 10:39                     ` Wang, Haiyue
@ 2020-08-03 20:45                       ` Jakub Kicinski
  2020-08-05  1:06                         ` Wang, Haiyue
  0 siblings, 1 reply; 21+ messages in thread
From: Jakub Kicinski @ 2020-08-03 20:45 UTC (permalink / raw)
  To: Wang, Haiyue
  Cc: Tom Herbert, Venkataramanan, Anirudh, davem, nhorman, sassmann,
	Bowers, AndrewX, Kirsher, Jeffrey T, netdev, Nguyen, Anthony L,
	Lu, Nannan, Liang, Cunming

On Mon, 3 Aug 2020 10:39:52 +0000 Wang, Haiyue wrote:
> > In this case, I'm guessing, Intel can reuse RTE flow -> AQ code written
> > to run on PFs on the special VF.
> > 
> > This community has selected switchdev + flower for programming flows.
> > I believe implementing flower offloads would solve your use case, and
> > at the same time be most beneficial to the netdev community.  
> 
> Jakub,
> 
> Thanks, I deep into the switchdev, it is kernel software bridge for hardware
> offload, and each port is registered with register_netdev. So this solution
> is not suitable for current case: VF can be assigned to VMs.

You may be missing the concept of a representor.

Sridhar from Intel was investigating this, I believe, at some point.
Perhaps sync with him?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
  2020-08-03 20:45                       ` Jakub Kicinski
@ 2020-08-05  1:06                         ` Wang, Haiyue
  2020-08-26  2:45                           ` Liang, Cunming
  2020-08-26  2:53                           ` Liang, Cunming
  0 siblings, 2 replies; 21+ messages in thread
From: Wang, Haiyue @ 2020-08-05  1:06 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Tom Herbert, Venkataramanan, Anirudh, davem, nhorman, sassmann,
	Bowers, AndrewX, Kirsher, Jeffrey T, netdev, Nguyen, Anthony L,
	Lu, Nannan, Liang, Cunming

> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Tuesday, August 4, 2020 04:46
> To: Wang, Haiyue <haiyue.wang@intel.com>
> Cc: Tom Herbert <tom@herbertland.com>; Venkataramanan, Anirudh <anirudh.venkataramanan@intel.com>;
> davem@davemloft.net; nhorman@redhat.com; sassmann@redhat.com; Bowers, AndrewX
> <andrewx.bowers@intel.com>; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; netdev@vger.kernel.org;
> Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Lu, Nannan <nannan.lu@intel.com>; Liang, Cunming
> <cunming.liang@intel.com>
> Subject: Re: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
> 
> On Mon, 3 Aug 2020 10:39:52 +0000 Wang, Haiyue wrote:
> > > In this case, I'm guessing, Intel can reuse RTE flow -> AQ code written
> > > to run on PFs on the special VF.
> > >
> > > This community has selected switchdev + flower for programming flows.
> > > I believe implementing flower offloads would solve your use case, and
> > > at the same time be most beneficial to the netdev community.
> >
> > Jakub,
> >
> > Thanks, I deep into the switchdev, it is kernel software bridge for hardware
> > offload, and each port is registered with register_netdev. So this solution
> > is not suitable for current case: VF can be assigned to VMs.
> 
> You may be missing the concept of a representor.
> 

I found the concept, thanks, missed it!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
  2020-08-05  1:06                         ` Wang, Haiyue
@ 2020-08-26  2:45                           ` Liang, Cunming
  2020-08-26  2:53                           ` Liang, Cunming
  1 sibling, 0 replies; 21+ messages in thread
From: Liang, Cunming @ 2020-08-26  2:45 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Wang, Haiyue, Singhai, Anjali, Samudrala, Sridhar, Sarangam,
	Parthasarathy, Jamal Hadi Salim, shm, Tom Herbert,
	Venkataramanan, Anirudh, davem, nhorman, sassmann, Bowers,
	AndrewX, Kirsher, Jeffrey T, netdev, Nguyen, Anthony L, Lu,
	Nannan, Brandeburg, Jesse, Parikh, Neerav



> -----Original Message-----
> From: Wang, Haiyue <haiyue.wang@intel.com>
> Sent: Wednesday, August 5, 2020 9:06 AM
> To: Jakub Kicinski <kuba@kernel.org>
> Cc: Tom Herbert <tom@herbertland.com>; Venkataramanan, Anirudh
> <anirudh.venkataramanan@intel.com>; davem@davemloft.net;
> nhorman@redhat.com; sassmann@redhat.com; Bowers, AndrewX
> <andrewx.bowers@intel.com>; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>;
> netdev@vger.kernel.org; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Lu,
> Nannan <nannan.lu@intel.com>; Liang, Cunming <cunming.liang@intel.com>
> Subject: RE: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
> 
> > -----Original Message-----
> > From: Jakub Kicinski <kuba@kernel.org>
> > Sent: Tuesday, August 4, 2020 04:46
> > To: Wang, Haiyue <haiyue.wang@intel.com>
> > Cc: Tom Herbert <tom@herbertland.com>; Venkataramanan, Anirudh
> > <anirudh.venkataramanan@intel.com>;
> > davem@davemloft.net; nhorman@redhat.com; sassmann@redhat.com;
> Bowers,
> > AndrewX <andrewx.bowers@intel.com>; Kirsher, Jeffrey T
> > <jeffrey.t.kirsher@intel.com>; netdev@vger.kernel.org; Nguyen, Anthony
> > L <anthony.l.nguyen@intel.com>; Lu, Nannan <nannan.lu@intel.com>;
> > Liang, Cunming <cunming.liang@intel.com>
> > Subject: Re: [net-next 1/5] ice: add the virtchnl handler for AdminQ
> > command
> >
> > On Mon, 3 Aug 2020 10:39:52 +0000 Wang, Haiyue wrote:
> > > > In this case, I'm guessing, Intel can reuse RTE flow -> AQ code
> > > > written to run on PFs on the special VF.
> > > >
> > > > This community has selected switchdev + flower for programming flows.
> > > > I believe implementing flower offloads would solve your use case,
> > > > and at the same time be most beneficial to the netdev community.

Jacob, thanks for the feedback. We proposed the previous solution in our eagerness to satisfy customers who were using mature, and validated (for their data centers) host kernels and still enable rapid adaption to new network control planes.

When revisiting the real problems we were facing, basically we're looking for a rapid self-iteration control plane independent of a mature deployed host kernel. Definitely kernel is the most suitable path for a control plane and we need to enhance the kernel to add the missing piece required for this solution. Best way to achieve this is allow such use cases is to deploy control plane on latest kernel running as virtual machine. We shared some thoughts on netdev 0x14 workshop, attached link as https://github.com/seaturn/switchdev-trust-vf/blob/master/netconf-workshop.pdf .

As a follow-up, we'll continue work on tc-generic proposal and look for programming rate improvement. As an independent effort of enhancing tc-generic/switchdev on trusted VF, delegating device specific capabilities (e.g. eswitch) to an assignable trusted VF brings all the benefit of a separated kernel to upgrade up-to-date features in the pace of applications, and always prevent host stack from any connectivity (e.g. stable access) issues.

Will be happy to answer any queries...and thank you for guiding us in the right path.

> > >
> > > Jakub,
> > >
> > > Thanks, I deep into the switchdev, it is kernel software bridge for
> > > hardware offload, and each port is registered with register_netdev.
> > > So this solution is not suitable for current case: VF can be assigned to VMs.
> >
> > You may be missing the concept of a representor.
> >
> 
> I found the concept, thanks, missed it!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
  2020-08-05  1:06                         ` Wang, Haiyue
  2020-08-26  2:45                           ` Liang, Cunming
@ 2020-08-26  2:53                           ` Liang, Cunming
  1 sibling, 0 replies; 21+ messages in thread
From: Liang, Cunming @ 2020-08-26  2:53 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Wang, Haiyue, Singhai, Anjali, Samudrala, Sridhar, Sarangam,
	Parthasarathy, Jamal Hadi Salim, shm, Tom Herbert,
	Venkataramanan, Anirudh, davem, nhorman, sassmann, Bowers,
	AndrewX, Kirsher, Jeffrey T, netdev, Nguyen, Anthony L, Lu,
	Nannan, Brandeburg, Jesse, Parikh, Neerav



> -----Original Message-----
> From: Wang, Haiyue <haiyue.wang@intel.com>
> Sent: Wednesday, August 5, 2020 9:06 AM
> To: Jakub Kicinski <kuba@kernel.org>
> Cc: Tom Herbert <tom@herbertland.com>; Venkataramanan, Anirudh
> <anirudh.venkataramanan@intel.com>; davem@davemloft.net;
> nhorman@redhat.com; sassmann@redhat.com; Bowers, AndrewX
> <andrewx.bowers@intel.com>; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>;
> netdev@vger.kernel.org; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Lu,
> Nannan <nannan.lu@intel.com>; Liang, Cunming <cunming.liang@intel.com>
> Subject: RE: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
> 
> > -----Original Message-----
> > From: Jakub Kicinski <kuba@kernel.org>
> > Sent: Tuesday, August 4, 2020 04:46
> > To: Wang, Haiyue <haiyue.wang@intel.com>
> > Cc: Tom Herbert <tom@herbertland.com>; Venkataramanan, Anirudh
> > <anirudh.venkataramanan@intel.com>;
> > davem@davemloft.net; nhorman@redhat.com; sassmann@redhat.com;
> Bowers,
> > AndrewX <andrewx.bowers@intel.com>; Kirsher, Jeffrey T
> > <jeffrey.t.kirsher@intel.com>; netdev@vger.kernel.org; Nguyen, Anthony
> > L <anthony.l.nguyen@intel.com>; Lu, Nannan <nannan.lu@intel.com>;
> > Liang, Cunming <cunming.liang@intel.com>
> > Subject: Re: [net-next 1/5] ice: add the virtchnl handler for AdminQ
> > command
> >
> > On Mon, 3 Aug 2020 10:39:52 +0000 Wang, Haiyue wrote:
> > > > In this case, I'm guessing, Intel can reuse RTE flow -> AQ code
> > > > written to run on PFs on the special VF.
> > > >
> > > > This community has selected switchdev + flower for programming flows.
> > > > I believe implementing flower offloads would solve your use case,
> > > > and at the same time be most beneficial to the netdev community.

Jakub, thanks for the feedback. We proposed the previous solution in our eagerness to satisfy customers who were using mature, and validated (for their data centers) host kernels and still enable rapid adaption to new network control planes.

When revisiting the real problems we were facing, basically we're looking for a rapid self-iteration control plane independent of a mature deployed host kernel. Definitely kernel is the most suitable path for a control plane and we need to enhance the kernel to add the missing piece required for this solution. Best way to achieve this is allow such use cases is to deploy control plane on latest kernel running as virtual machine. We shared some thoughts on netdev 0x14 workshop, attached link as https://github.com/seaturn/switchdev-trust-vf/blob/master/netconf-workshop.pdf.

As a follow-up, we'll continue work on tc-generic proposal and look for programming rate improvement. As an independent effort of enhancing tc-generic/switchdev on trusted VF, delegating device specific capabilities (e.g. eswitch) to an assignable trusted VF brings all the benefit of a separated kernel to upgrade up-to-date features in the pace of applications, and always prevent host stack from any connectivity (e.g. stable access) issues.

Will be happy to answer any queries...and thank you for guiding us in the right path.

> > >
> > > Jakub,
> > >
> > > Thanks, I deep into the switchdev, it is kernel software bridge for
> > > hardware offload, and each port is registered with register_netdev.
> > > So this solution is not suitable for current case: VF can be assigned to VMs.
> >
> > You may be missing the concept of a representor.
> >
> 
> I found the concept, thanks, missed it!

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-08-26  2:53 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-13 17:43 [net-next 0/5][pull request] 100GbE Intel Wired LAN Driver Updates 2020-07-13 Tony Nguyen
2020-07-13 17:43 ` [net-next 1/5] ice: add the virtchnl handler for AdminQ command Tony Nguyen
2020-07-13 22:48   ` Jakub Kicinski
2020-07-14  1:29     ` Wang, Haiyue
2020-07-14 18:24       ` Jakub Kicinski
2020-07-15  1:17         ` Wang, Haiyue
2020-07-15 18:03           ` Jakub Kicinski
2020-07-15 18:18             ` Wang, Haiyue
2020-07-23  0:37             ` Venkataramanan, Anirudh
2020-07-23  1:07               ` Jakub Kicinski
2020-07-25 16:39                 ` Tom Herbert
2020-07-27 23:04                   ` Jakub Kicinski
2020-08-03 10:39                     ` Wang, Haiyue
2020-08-03 20:45                       ` Jakub Kicinski
2020-08-05  1:06                         ` Wang, Haiyue
2020-08-26  2:45                           ` Liang, Cunming
2020-08-26  2:53                           ` Liang, Cunming
2020-07-13 17:43 ` [net-next 2/5] ice: add DCF cap negotiation and state machine Tony Nguyen
2020-07-13 17:43 ` [net-next 3/5] ice: support to get the VSI mapping Tony Nguyen
2020-07-13 17:43 ` [net-next 4/5] ice: enable DDP package info querying Tony Nguyen
2020-07-13 17:43 ` [net-next 5/5] ice: add switch rule management for DCF Tony Nguyen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.