* [PATCH net-next 0/3] net: Add support to configure SR-IOV VF queues.
@ 2018-05-29  8:18 Michael Chan
  2018-05-29  8:18 ` [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues Michael Chan
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Michael Chan @ 2018-05-29  8:18 UTC (permalink / raw)
  To: davem; +Cc: netdev

VF queue resources are always limited and there is currently no
infrastructure to allow the administrator on the host to add or reduce
queue resources for any particular VF.  This series adds the
infrastructure to do that and adds the corresponding functionality to
the bnxt_en driver.

The "ip link set" command will subsequently be patched to support the new
operation.

v1:
- Changed the meaning of the min parameters to be strictly the minimum
guaranteed value, as suggested by Jakub Kicinski.
- More complete implementation in the bnxt_en driver.

Michael Chan (3):
  net: Add support to configure SR-IOV VF minimum and maximum queues.
  bnxt_en: Store min/max tx/rx rings for individual VFs.
  bnxt_en: Implement .ndo_set_vf_queues().

 drivers/net/ethernet/broadcom/bnxt/bnxt.c       |   1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.h       |   9 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 157 +++++++++++++++++++++++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h |   2 +
 include/linux/if_link.h                         |   4 +
 include/linux/netdevice.h                       |   6 +
 include/uapi/linux/if_link.h                    |   9 ++
 net/core/rtnetlink.c                            |  32 ++++-
 8 files changed, 213 insertions(+), 7 deletions(-)

-- 
1.8.3.1

* [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
  2018-05-29  8:18 [PATCH net-next 0/3] net: Add support to configure SR-IOV VF queues Michael Chan
@ 2018-05-29  8:18 ` Michael Chan
  2018-05-29 20:46   ` Samudrala, Sridhar
  2018-05-29  8:18 ` [PATCH net-next 2/3] bnxt_en: Store min/max tx/rx rings for individual VFs Michael Chan
  2018-05-29  8:18 ` [PATCH net-next 3/3] bnxt_en: Implement .ndo_set_vf_queues() Michael Chan
  2 siblings, 1 reply; 14+ messages in thread
From: Michael Chan @ 2018-05-29  8:18 UTC (permalink / raw)
  To: davem; +Cc: netdev

VF queue resources are always limited and there is currently no
infrastructure to allow the administrator on the host to add or reduce
queue resources for any particular VF.  With an ever-increasing number
of VFs being supported, it is desirable to allow the administrator to
configure queue resources differently for the VFs.  Some VFs may require
more or fewer queues due to different bandwidth requirements or a
different number of vCPUs in the VM.  This patch adds the infrastructure
to do that by adding an IFLA_VF_QUEUES netlink attribute and a new
.ndo_set_vf_queues() hook to the net_device_ops.

Four parameters are exposed for each VF:

o min_tx_queues - Guaranteed tx queues available to the VF.

o max_tx_queues - Maximum but not necessarily guaranteed tx queues
  available to the VF.

o min_rx_queues - Guaranteed rx queues available to the VF.

o max_rx_queues - Maximum but not necessarily guaranteed rx queues
  available to the VF.

The "ip link set" command will subsequently be patched to support the new
operation to set the above parameters.

After the administrator changes the above parameters, the corresponding
VF will have a new range of channels to set using ethtool -L.  The VF may
have to go through an interface down/up cycle before the new queues take
effect.  Queue counts up to the min values are guaranteed; counts up to
the max values are possible but not guaranteed.
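
For illustration only (not part of this series, and the pending iproute2
change may expose different keywords), here is a minimal userspace sketch
of how the new attribute payload could be filled before being nested
under IFLA_VFINFO_LIST / IFLA_VF_INFO in an RTM_SETLINK request.  The
helper name fill_vf_queues() and the example queue counts are
placeholders, and the remaining netlink plumbing is omitted:

#include <string.h>
#include <linux/if_link.h>	/* struct ifla_vf_queues, IFLA_VF_QUEUES */

/* Illustrative sketch: build the IFLA_VF_QUEUES payload for one VF.
 * The min_* values are guaranteed by the PF driver; the max_* values
 * are a best-effort upper bound, matching the semantics above.
 */
static void fill_vf_queues(struct ifla_vf_queues *ivq, __u32 vf_id)
{
	memset(ivq, 0, sizeof(*ivq));
	ivq->vf = vf_id;
	ivq->min_tx_queues = 4;		/* guaranteed tx queues */
	ivq->max_tx_queues = 8;		/* best-effort tx queue limit */
	ivq->min_rx_queues = 4;		/* guaranteed rx queues */
	ivq->max_rx_queues = 8;		/* best-effort rx queue limit */
}

On the kernel side, do_setvfinfo() hands these values unchanged to the
driver's .ndo_set_vf_queues(), as added below.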

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 include/linux/if_link.h      |  4 ++++
 include/linux/netdevice.h    |  6 ++++++
 include/uapi/linux/if_link.h |  9 +++++++++
 net/core/rtnetlink.c         | 32 +++++++++++++++++++++++++++++---
 4 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 622658d..8e81121 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -29,5 +29,9 @@ struct ifla_vf_info {
 	__u32 rss_query_en;
 	__u32 trusted;
 	__be16 vlan_proto;
+	__u32 min_tx_queues;
+	__u32 max_tx_queues;
+	__u32 min_rx_queues;
+	__u32 max_rx_queues;
 };
 #endif /* _LINUX_IF_LINK_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8452f72..17f5892 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1023,6 +1023,8 @@ struct dev_ifalias {
  *      with PF and querying it may introduce a theoretical security risk.
  * int (*ndo_set_vf_rss_query_en)(struct net_device *dev, int vf, bool setting);
  * int (*ndo_get_vf_port)(struct net_device *dev, int vf, struct sk_buff *skb);
+ * int (*ndo_set_vf_queues)(struct net_device *dev, int vf, int min_txq,
+ *			    int max_txq, int min_rxq, int max_rxq);
  * int (*ndo_setup_tc)(struct net_device *dev, enum tc_setup_type type,
  *		       void *type_data);
  *	Called to setup any 'tc' scheduler, classifier or action on @dev.
@@ -1276,6 +1278,10 @@ struct net_device_ops {
 	int			(*ndo_set_vf_rss_query_en)(
 						   struct net_device *dev,
 						   int vf, bool setting);
+	int			(*ndo_set_vf_queues)(struct net_device *dev,
+						     int vf,
+						     int min_txq, int max_txq,
+						     int min_rxq, int max_rxq);
 	int			(*ndo_setup_tc)(struct net_device *dev,
 						enum tc_setup_type type,
 						void *type_data);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index cf01b68..81bbc4e 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -659,6 +659,7 @@ enum {
 	IFLA_VF_IB_NODE_GUID,	/* VF Infiniband node GUID */
 	IFLA_VF_IB_PORT_GUID,	/* VF Infiniband port GUID */
 	IFLA_VF_VLAN_LIST,	/* nested list of vlans, option for QinQ */
+	IFLA_VF_QUEUES,		/* Min and Max TX/RX queues */
 	__IFLA_VF_MAX,
 };
 
@@ -749,6 +750,14 @@ struct ifla_vf_trust {
 	__u32 setting;
 };
 
+struct ifla_vf_queues {
+	__u32 vf;
+	__u32 min_tx_queues;	/* min guaranteed tx queues */
+	__u32 max_tx_queues;	/* max non guaranteed tx queues */
+	__u32 min_rx_queues;	/* min guaranteed rx queues */
+	__u32 max_rx_queues;	/* max non guaranteed rx queues */
+};
+
 /* VF ports management section
  *
  *	Nested layout of set/get msg is:
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 8080254..e21ab8a 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -921,7 +921,8 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev,
 			 nla_total_size_64bit(sizeof(__u64)) +
 			 /* IFLA_VF_STATS_TX_DROPPED */
 			 nla_total_size_64bit(sizeof(__u64)) +
-			 nla_total_size(sizeof(struct ifla_vf_trust)));
+			 nla_total_size(sizeof(struct ifla_vf_trust)) +
+			 nla_total_size(sizeof(struct ifla_vf_queues)));
 		return size;
 	} else
 		return 0;
@@ -1181,6 +1182,7 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 	struct ifla_vf_vlan_info vf_vlan_info;
 	struct ifla_vf_spoofchk vf_spoofchk;
 	struct ifla_vf_tx_rate vf_tx_rate;
+	struct ifla_vf_queues vf_queues;
 	struct ifla_vf_stats vf_stats;
 	struct ifla_vf_trust vf_trust;
 	struct ifla_vf_vlan vf_vlan;
@@ -1198,6 +1200,10 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 	ivi.spoofchk = -1;
 	ivi.rss_query_en = -1;
 	ivi.trusted = -1;
+	ivi.min_tx_queues = -1;
+	ivi.max_tx_queues = -1;
+	ivi.min_rx_queues = -1;
+	ivi.max_rx_queues = -1;
 	/* The default value for VF link state is "auto"
 	 * IFLA_VF_LINK_STATE_AUTO which equals zero
 	 */
@@ -1217,7 +1223,8 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 		vf_spoofchk.vf =
 		vf_linkstate.vf =
 		vf_rss_query_en.vf =
-		vf_trust.vf = ivi.vf;
+		vf_trust.vf =
+		vf_queues.vf = ivi.vf;
 
 	memcpy(vf_mac.mac, ivi.mac, sizeof(ivi.mac));
 	vf_vlan.vlan = ivi.vlan;
@@ -1232,6 +1239,10 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 	vf_linkstate.link_state = ivi.linkstate;
 	vf_rss_query_en.setting = ivi.rss_query_en;
 	vf_trust.setting = ivi.trusted;
+	vf_queues.min_tx_queues = ivi.min_tx_queues;
+	vf_queues.max_tx_queues = ivi.max_tx_queues;
+	vf_queues.min_rx_queues = ivi.min_rx_queues;
+	vf_queues.max_rx_queues = ivi.max_rx_queues;
 	vf = nla_nest_start(skb, IFLA_VF_INFO);
 	if (!vf)
 		goto nla_put_vfinfo_failure;
@@ -1249,7 +1260,9 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 		    sizeof(vf_rss_query_en),
 		    &vf_rss_query_en) ||
 	    nla_put(skb, IFLA_VF_TRUST,
-		    sizeof(vf_trust), &vf_trust))
+		    sizeof(vf_trust), &vf_trust) ||
+	    nla_put(skb, IFLA_VF_QUEUES,
+		    sizeof(vf_queues), &vf_queues))
 		goto nla_put_vf_failure;
 	vfvlanlist = nla_nest_start(skb, IFLA_VF_VLAN_LIST);
 	if (!vfvlanlist)
@@ -1706,6 +1719,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	[IFLA_VF_TRUST]		= { .len = sizeof(struct ifla_vf_trust) },
 	[IFLA_VF_IB_NODE_GUID]	= { .len = sizeof(struct ifla_vf_guid) },
 	[IFLA_VF_IB_PORT_GUID]	= { .len = sizeof(struct ifla_vf_guid) },
+	[IFLA_VF_QUEUES]	= { .len = sizeof(struct ifla_vf_queues) },
 };
 
 static const struct nla_policy ifla_port_policy[IFLA_PORT_MAX+1] = {
@@ -2208,6 +2222,18 @@ static int do_setvfinfo(struct net_device *dev, struct nlattr **tb)
 		return handle_vf_guid(dev, ivt, IFLA_VF_IB_PORT_GUID);
 	}
 
+	if (tb[IFLA_VF_QUEUES]) {
+		struct ifla_vf_queues *ivq = nla_data(tb[IFLA_VF_QUEUES]);
+
+		err = -EOPNOTSUPP;
+		if (ops->ndo_set_vf_queues)
+			err = ops->ndo_set_vf_queues(dev, ivq->vf,
+					ivq->min_tx_queues, ivq->max_tx_queues,
+					ivq->min_rx_queues, ivq->max_rx_queues);
+		if (err < 0)
+			return err;
+	}
+
 	return err;
 }
 
-- 
1.8.3.1

* [PATCH net-next 2/3] bnxt_en: Store min/max tx/rx rings for individual VFs.
  2018-05-29  8:18 [PATCH net-next 0/3] net: Add support to configure SR-IOV VF queues Michael Chan
  2018-05-29  8:18 ` [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues Michael Chan
@ 2018-05-29  8:18 ` Michael Chan
  2018-05-29  8:18 ` [PATCH net-next 3/3] bnxt_en: Implement .ndo_set_vf_queues() Michael Chan
  2 siblings, 0 replies; 14+ messages in thread
From: Michael Chan @ 2018-05-29  8:18 UTC (permalink / raw)
  To: davem; +Cc: netdev

With the new infrastructure to configure queues differently for each VF,
we need to store the current min/max rx/tx rings and other resources
for each VF.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h       |  9 +++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 27 +++++++++++++++++++++----
 2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 9b14eb6..531c77d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -837,6 +837,14 @@ struct bnxt_vf_info {
 	u32	func_flags; /* func cfg flags */
 	u32	min_tx_rate;
 	u32	max_tx_rate;
+	u16	min_tx_rings;
+	u16	max_tx_rings;
+	u16	min_rx_rings;
+	u16	max_rx_rings;
+	u16	min_cp_rings;
+	u16	min_stat_ctxs;
+	u16	min_ring_grps;
+	u16	min_vnics;
 	void	*hwrm_cmd_req_addr;
 	dma_addr_t	hwrm_cmd_req_dma_addr;
 };
@@ -1351,6 +1359,7 @@ struct bnxt {
 #ifdef CONFIG_BNXT_SRIOV
 	int			nr_vfs;
 	struct bnxt_vf_info	vf;
+	struct hwrm_func_vf_resource_cfg_input vf_resc_cfg_input;
 	wait_queue_head_t	sriov_cfg_wait;
 	bool			sriov_cfg;
 #define BNXT_SRIOV_CFG_WAIT_TMO	msecs_to_jiffies(10000)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index a649108..7a92125 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -171,6 +171,10 @@ int bnxt_get_vf_config(struct net_device *dev, int vf_id,
 		ivi->linkstate = IFLA_VF_LINK_STATE_ENABLE;
 	else
 		ivi->linkstate = IFLA_VF_LINK_STATE_DISABLE;
+	ivi->min_tx_queues = vf->min_tx_rings;
+	ivi->max_tx_queues = vf->max_tx_rings;
+	ivi->min_rx_queues = vf->min_rx_rings;
+	ivi->max_rx_queues = vf->max_rx_rings;
 
 	return 0;
 }
@@ -498,6 +502,8 @@ static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs)
 
 	mutex_lock(&bp->hwrm_cmd_lock);
 	for (i = 0; i < num_vfs; i++) {
+		struct bnxt_vf_info *vf = &pf->vf[i];
+
 		req.vf_id = cpu_to_le16(pf->first_vf_id + i);
 		rc = _hwrm_send_message(bp, &req, sizeof(req),
 					HWRM_CMD_TIMEOUT);
@@ -506,7 +512,15 @@ static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs)
 			break;
 		}
 		pf->active_vfs = i + 1;
-		pf->vf[i].fw_fid = pf->first_vf_id + i;
+		vf->fw_fid = pf->first_vf_id + i;
+		vf->min_tx_rings = le16_to_cpu(req.min_tx_rings);
+		vf->max_tx_rings = vf_tx_rings;
+		vf->min_rx_rings = le16_to_cpu(req.min_rx_rings);
+		vf->max_rx_rings = vf_rx_rings;
+		vf->min_cp_rings = le16_to_cpu(req.min_cmpl_rings);
+		vf->min_stat_ctxs = le16_to_cpu(req.min_stat_ctx);
+		vf->min_ring_grps = le16_to_cpu(req.min_hw_ring_grps);
+		vf->min_vnics = le16_to_cpu(req.min_vnics);
 	}
 	mutex_unlock(&bp->hwrm_cmd_lock);
 	if (pf->active_vfs) {
@@ -521,6 +535,7 @@ static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs)
 		hw_resc->max_stat_ctxs -= le16_to_cpu(req.min_stat_ctx) * n;
 		hw_resc->max_vnics -= le16_to_cpu(req.min_vnics) * n;
 
+		memcpy(&bp->vf_resc_cfg_input, &req, sizeof(req));
 		rc = pf->active_vfs;
 	}
 	return rc;
@@ -585,6 +600,7 @@ static int bnxt_hwrm_func_cfg(struct bnxt *bp, int num_vfs)
 
 	mutex_lock(&bp->hwrm_cmd_lock);
 	for (i = 0; i < num_vfs; i++) {
+		struct bnxt_vf_info *vf = &pf->vf[i];
 		int vf_tx_rsvd = vf_tx_rings;
 
 		req.fid = cpu_to_le16(pf->first_vf_id + i);
@@ -593,12 +609,15 @@ static int bnxt_hwrm_func_cfg(struct bnxt *bp, int num_vfs)
 		if (rc)
 			break;
 		pf->active_vfs = i + 1;
-		pf->vf[i].fw_fid = le16_to_cpu(req.fid);
-		rc = __bnxt_hwrm_get_tx_rings(bp, pf->vf[i].fw_fid,
-					      &vf_tx_rsvd);
+		vf->fw_fid = le16_to_cpu(req.fid);
+		rc = __bnxt_hwrm_get_tx_rings(bp, vf->fw_fid, &vf_tx_rsvd);
 		if (rc)
 			break;
 		total_vf_tx_rings += vf_tx_rsvd;
+		vf->min_tx_rings = vf_tx_rsvd;
+		vf->max_tx_rings = vf_tx_rsvd;
+		vf->min_rx_rings = vf_rx_rings;
+		vf->max_rx_rings = vf_rx_rings;
 	}
 	mutex_unlock(&bp->hwrm_cmd_lock);
 	if (rc)
-- 
1.8.3.1

* [PATCH net-next 3/3] bnxt_en: Implement .ndo_set_vf_queues().
  2018-05-29  8:18 [PATCH net-next 0/3] net: Add support to configure SR-IOV VF queues Michael Chan
  2018-05-29  8:18 ` [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues Michael Chan
  2018-05-29  8:18 ` [PATCH net-next 2/3] bnxt_en: Store min/max tx/rx rings for individual VFs Michael Chan
@ 2018-05-29  8:18 ` Michael Chan
  2 siblings, 0 replies; 14+ messages in thread
From: Michael Chan @ 2018-05-29  8:18 UTC (permalink / raw)
  To: davem; +Cc: netdev

Implement .ndo_set_vf_queues() in the PF driver to configure the queue
parameters for individual VFs.  This allows the administrator on the host
to increase or decrease the number of queues for individual VFs.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c       |   1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 130 ++++++++++++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h |   2 +
 3 files changed, 133 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index dfa0839..2ce9779 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -8373,6 +8373,7 @@ static int bnxt_swdev_port_attr_get(struct net_device *dev,
 	.ndo_set_vf_link_state	= bnxt_set_vf_link_state,
 	.ndo_set_vf_spoofchk	= bnxt_set_vf_spoofchk,
 	.ndo_set_vf_trust	= bnxt_set_vf_trust,
+	.ndo_set_vf_queues	= bnxt_set_vf_queues,
 #endif
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= bnxt_poll_controller,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index 7a92125..a34a32f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -138,6 +138,136 @@ int bnxt_set_vf_trust(struct net_device *dev, int vf_id, bool trusted)
 	return 0;
 }
 
+static bool bnxt_param_ok(int new, u16 curr, u16 avail)
+{
+	int delta;
+
+	if (new <= curr)
+		return true;
+
+	delta = new - curr;
+	if (delta <= avail)
+		return true;
+	return false;
+}
+
+static void bnxt_adjust_ring_resc(struct bnxt *bp, struct bnxt_vf_info *vf,
+				  struct hwrm_func_vf_resource_cfg_input *req)
+{
+	struct bnxt_hw_resc *hw_resc = &bp->hw_resc;
+	u16 avail_cp_rings, avail_stat_ctx;
+	u16 avail_vnics, avail_ring_grps;
+	u16 cp, grp, stat, vnic;
+	u16 min_tx, min_rx;
+
+	min_tx = le16_to_cpu(req->min_tx_rings);
+	min_rx = le16_to_cpu(req->min_rx_rings);
+	avail_cp_rings = hw_resc->max_cp_rings - bp->cp_nr_rings;
+	avail_stat_ctx = hw_resc->max_stat_ctxs - bp->num_stat_ctxs;
+	avail_ring_grps = hw_resc->max_hw_ring_grps - bp->rx_nr_rings;
+	avail_vnics = hw_resc->max_vnics - bp->nr_vnics;
+
+	cp = max_t(u16, 2 * min_tx, min_rx);
+	if (cp > vf->min_cp_rings)
+		cp = min_t(u16, cp, avail_cp_rings + vf->min_cp_rings);
+	grp = min_tx;
+	if (grp > vf->min_ring_grps)
+		grp = min_t(u16, grp, avail_ring_grps + vf->min_ring_grps);
+	stat = min_rx;
+	if (stat > vf->min_stat_ctxs)
+		stat = min_t(u16, stat, avail_stat_ctx + vf->min_stat_ctxs);
+	vnic = min_rx;
+	if (vnic > vf->min_vnics)
+		vnic = min_t(u16, vnic, avail_vnics + vf->min_vnics);
+
+	req->min_cmpl_rings = req->max_cmpl_rings = cpu_to_le16(cp);
+	req->min_hw_ring_grps = req->max_hw_ring_grps = cpu_to_le16(grp);
+	req->min_stat_ctx = req->max_stat_ctx = cpu_to_le16(stat);
+	req->min_vnics = req->max_vnics = cpu_to_le16(vnic);
+}
+
+static void bnxt_record_ring_resc(struct bnxt *bp, struct bnxt_vf_info *vf,
+				  struct hwrm_func_vf_resource_cfg_input *req)
+{
+	struct bnxt_hw_resc *hw_resc = &bp->hw_resc;
+
+	hw_resc->max_tx_rings += vf->min_tx_rings;
+	hw_resc->max_rx_rings += vf->min_rx_rings;
+	vf->min_tx_rings = le16_to_cpu(req->min_tx_rings);
+	vf->max_tx_rings = le16_to_cpu(req->max_tx_rings);
+	vf->min_rx_rings = le16_to_cpu(req->min_rx_rings);
+	vf->max_rx_rings = le16_to_cpu(req->max_rx_rings);
+	hw_resc->max_tx_rings -= vf->min_tx_rings;
+	hw_resc->max_rx_rings -= vf->min_rx_rings;
+	if (bp->pf.vf_resv_strategy == BNXT_VF_RESV_STRATEGY_MAXIMAL) {
+		hw_resc->max_cp_rings += vf->min_cp_rings;
+		hw_resc->max_hw_ring_grps += vf->min_ring_grps;
+		hw_resc->max_stat_ctxs += vf->min_stat_ctxs;
+		hw_resc->max_vnics += vf->min_vnics;
+		vf->min_cp_rings = le16_to_cpu(req->min_cmpl_rings);
+		vf->min_ring_grps = le16_to_cpu(req->min_hw_ring_grps);
+		vf->min_stat_ctxs = le16_to_cpu(req->min_stat_ctx);
+		vf->min_vnics = le16_to_cpu(req->min_vnics);
+		hw_resc->max_cp_rings -= vf->min_cp_rings;
+		hw_resc->max_hw_ring_grps -= vf->min_ring_grps;
+		hw_resc->max_stat_ctxs -= vf->min_stat_ctxs;
+		hw_resc->max_vnics -= vf->min_vnics;
+	}
+}
+
+int bnxt_set_vf_queues(struct net_device *dev, int vf_id, int min_txq,
+		       int max_txq, int min_rxq, int max_rxq)
+{
+	struct hwrm_func_vf_resource_cfg_input req = {0};
+	struct bnxt *bp = netdev_priv(dev);
+	u16 avail_tx_rings, avail_rx_rings;
+	struct bnxt_hw_resc *hw_resc;
+	struct bnxt_vf_info *vf;
+	int rc;
+
+	if (bnxt_vf_ndo_prep(bp, vf_id))
+		return -EINVAL;
+
+	if (!(bp->flags & BNXT_FLAG_NEW_RM) || bp->hwrm_spec_code < 0x10902)
+		return -EOPNOTSUPP;
+
+	vf = &bp->pf.vf[vf_id];
+	hw_resc = &bp->hw_resc;
+
+	avail_tx_rings = hw_resc->max_tx_rings - bp->tx_nr_rings;
+	if (bp->flags & BNXT_FLAG_AGG_RINGS)
+		avail_rx_rings = hw_resc->max_rx_rings - bp->rx_nr_rings * 2;
+	else
+		avail_rx_rings = hw_resc->max_rx_rings - bp->rx_nr_rings;
+
+	if (!bnxt_param_ok(min_txq, vf->min_tx_rings, avail_tx_rings))
+		return -ENOBUFS;
+	if (!bnxt_param_ok(min_rxq, vf->min_rx_rings, avail_rx_rings))
+		return -ENOBUFS;
+	if (!bnxt_param_ok(max_txq, vf->max_tx_rings, avail_tx_rings))
+		return -ENOBUFS;
+	if (!bnxt_param_ok(max_rxq, vf->max_rx_rings, avail_rx_rings))
+		return -ENOBUFS;
+
+	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_FUNC_VF_RESOURCE_CFG, -1, -1);
+	memcpy(&req, &bp->vf_resc_cfg_input, sizeof(req));
+	req.vf_id = cpu_to_le16(vf->fw_fid);
+	req.min_tx_rings = cpu_to_le16(min_txq);
+	req.min_rx_rings = cpu_to_le16(min_rxq);
+	req.max_tx_rings = cpu_to_le16(max_txq);
+	req.max_rx_rings = cpu_to_le16(max_rxq);
+
+	if (bp->pf.vf_resv_strategy == BNXT_VF_RESV_STRATEGY_MAXIMAL)
+		bnxt_adjust_ring_resc(bp, vf, &req);
+
+	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+	if (rc)
+		return -EIO;
+
+	bnxt_record_ring_resc(bp, vf, &req);
+	return 0;
+}
+
 int bnxt_get_vf_config(struct net_device *dev, int vf_id,
 		       struct ifla_vf_info *ivi)
 {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
index e9b20cd..325b412 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
@@ -35,6 +35,8 @@
 int bnxt_set_vf_link_state(struct net_device *, int, int);
 int bnxt_set_vf_spoofchk(struct net_device *, int, bool);
 int bnxt_set_vf_trust(struct net_device *dev, int vf_id, bool trust);
+int bnxt_set_vf_queues(struct net_device *dev, int vf_id, int min_txq,
+		       int max_txq, int min_rxq, int max_rxq);
 int bnxt_sriov_configure(struct pci_dev *pdev, int num_vfs);
 void bnxt_sriov_disable(struct bnxt *);
 void bnxt_hwrm_exec_fwd_req(struct bnxt *);
-- 
1.8.3.1

* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
  2018-05-29  8:18 ` [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues Michael Chan
@ 2018-05-29 20:46   ` Samudrala, Sridhar
  2018-05-30  3:19     ` Michael Chan
  0 siblings, 1 reply; 14+ messages in thread
From: Samudrala, Sridhar @ 2018-05-29 20:46 UTC (permalink / raw)
  To: Michael Chan, davem; +Cc: netdev

On 5/29/2018 1:18 AM, Michael Chan wrote:
> VF Queue resources are always limited and there is currently no
> infrastructure to allow the admin. on the host to add or reduce queue
> resources for any particular VF.  With ever increasing number of VFs
> being supported, it is desirable to allow the admin. to configure queue
> resources differently for the VFs.  Some VFs may require more or fewer
> queues due to different bandwidth requirements or different number of
> vCPUs in the VM.  This patch adds the infrastructure to do that by
> adding IFLA_VF_QUEUES netlink attribute and a new .ndo_set_vf_queues()
> to the net_device_ops.
>
> Four parameters are exposed for each VF:
>
> o min_tx_queues - Guaranteed tx queues available to the VF.
>
> o max_tx_queues - Maximum but not necessarily guaranteed tx queues
>    available to the VF.
>
> o min_rx_queues - Guaranteed rx queues available to the VF.
>
> o max_rx_queues - Maximum but not necessarily guaranteed rx queues
>    available to the VF.
>
> The "ip link set" command will subsequently be patched to support the new
> operation to set the above parameters.
>
> After the admin. makes a change to the above parameters, the corresponding
> VF will have a new range of channels to set using ethtool -L.  The VF may
> have to go through IF down/up before the new queues will take effect.  Up
> to the min values are guaranteed.  Up to the max values are possible but not
> guaranteed.
>
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
> ---
>   include/linux/if_link.h      |  4 ++++
>   include/linux/netdevice.h    |  6 ++++++
>   include/uapi/linux/if_link.h |  9 +++++++++
>   net/core/rtnetlink.c         | 32 +++++++++++++++++++++++++++++---
>   4 files changed, 48 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> index 622658d..8e81121 100644
> --- a/include/linux/if_link.h
> +++ b/include/linux/if_link.h
> @@ -29,5 +29,9 @@ struct ifla_vf_info {
>   	__u32 rss_query_en;
>   	__u32 trusted;
>   	__be16 vlan_proto;
> +	__u32 min_tx_queues;
> +	__u32 max_tx_queues;
> +	__u32 min_rx_queues;
> +	__u32 max_rx_queues;
>   };
>   #endif /* _LINUX_IF_LINK_H */
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 8452f72..17f5892 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1023,6 +1023,8 @@ struct dev_ifalias {
>    *      with PF and querying it may introduce a theoretical security risk.
>    * int (*ndo_set_vf_rss_query_en)(struct net_device *dev, int vf, bool setting);
>    * int (*ndo_get_vf_port)(struct net_device *dev, int vf, struct sk_buff *skb);
> + * int (*ndo_set_vf_queues)(struct net_device *dev, int vf, int min_txq,
> + *			    int max_txq, int min_rxq, int max_rxq);

Isn't ndo_set_vf_xxx() considered a legacy interface and not planned to be extended?
Shouldn't we enable this via ethtool on the port representor netdev?

* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
  2018-05-29 20:46   ` Samudrala, Sridhar
@ 2018-05-30  3:19     ` Michael Chan
  2018-05-30 22:36       ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Michael Chan @ 2018-05-30  3:19 UTC (permalink / raw)
  To: Samudrala, Sridhar; +Cc: David Miller, Netdev

On Tue, May 29, 2018 at 1:46 PM, Samudrala, Sridhar
<sridhar.samudrala@intel.com> wrote:

>
> Isn't ndo_set_vf_xxx() considered a legacy interface and not planned to be
> extended?

I didn't know about that.

> Shouldn't we enable this via ethtool on the port representor netdev?
>
>

We discussed about this.  ethtool on the VF representor will only work
in switchdev mode and also will not support min/max values.

* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
  2018-05-30  3:19     ` Michael Chan
@ 2018-05-30 22:36       ` Jakub Kicinski
  0 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2018-05-30 22:36 UTC (permalink / raw)
  To: Michael Chan; +Cc: Samudrala, Sridhar, David Miller, Netdev

On Wed, 30 May 2018 00:18:39 -0700, Michael Chan wrote:
> On Tue, May 29, 2018 at 11:33 PM, Jakub Kicinski wrote:
> > At some points you (Broadcom) were working whole bunch of devlink
> > configuration options for the PCIe side of the ASIC.  The number of
> > queues relates to things like number of allocated MSI-X vectors, which
> > if memory serves me was in your devlink patch set.  In an ideal world
> > we would try to keep all those in one place :)  
> 
> Yeah, another colleague is now working with Mellanox on something similar.
> 
> One difference between those devlink parameters and these queue
> parameters is that the former are more permanent and global settings.
> For example, number of VFs or number of MSIX per VF are persistent
> settings once they are set and after PCIe reset.  On the other hand,
> these queue settings are pure run-time settings and may be unique for
> each VF.  These are not stored as there is no room in NVRAM to store
> 128 sets or more of these parameters.

Indeed, I think the API must be flexible as to what is persistent and
what is not because HW will certainly differ in that regard.  And
agreed, queues may be a bit of a stretch here, but worth a try.

> Anyway, let me discuss this with my colleague to see if there is a
> natural fit for these queue parameters in the devlink infrastructure
> that they are working on.

Thank you!

* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
  2018-05-30 22:53       ` Jakub Kicinski
@ 2018-05-31  3:35         ` Samudrala, Sridhar
  0 siblings, 0 replies; 14+ messages in thread
From: Samudrala, Sridhar @ 2018-05-31  3:35 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Michael Chan, David Miller, Netdev, Or Gerlitz

On 5/30/2018 3:53 PM, Jakub Kicinski wrote:
> On Wed, 30 May 2018 14:23:06 -0700, Samudrala, Sridhar wrote:
>> On 5/29/2018 11:33 PM, Jakub Kicinski wrote:
>>> On Tue, 29 May 2018 23:08:11 -0700, Michael Chan wrote:
>>>> On Tue, May 29, 2018 at 10:56 PM, Jakub Kicinski wrote:
>>>>> On Tue, 29 May 2018 20:19:54 -0700, Michael Chan wrote:
>>>>>> On Tue, May 29, 2018 at 1:46 PM, Samudrala, Sridhar wrote:
>>>>>>> Isn't ndo_set_vf_xxx() considered a legacy interface and not planned to be
>>>>>>> extended?
>>>>> +1 it's painful to see this feature being added to the legacy
>>>>> API :(  Another duplicated configuration knob.
>>>>>   
>>>>>> I didn't know about that.
>>>>>>   
>>>>>>> Shouldn't we enable this via ethtool on the port representor netdev?
>>>>>> We discussed about this.  ethtool on the VF representor will only work
>>>>>> in switchdev mode and also will not support min/max values.
>>>>> Ethtool channel API may be overdue a rewrite in devlink anyway, but I
>>>>> feel like implementing switchdev mode and rewriting features in devlink
>>>>> may be too much to ask.
>>>> Totally agreed.  And switchdev mode doesn't seem to be that widely
>>>> used at the moment.  Do you have other suggestions besides NDO?
>>> At some points you (Broadcom) were working whole bunch of devlink
>>> configuration options for the PCIe side of the ASIC.  The number of
>>> queues relates to things like number of allocated MSI-X vectors, which
>>> if memory serves me was in your devlink patch set.  In an ideal world
>>> we would try to keep all those in one place :)
>>>
>>> For PCIe config there is always the question of what can be configured
>>> at runtime, and what requires a HW reset.  Therefore that devlink API
>>> which could configure current as well as persistent device settings was
>>> quite nice.  I'm not sure if reallocating queues would ever require
>>> PCIe block reset but maybe...  Certainly it seems the notion of min
>>> queues would make more sense in PCIe configuration devlink API than
>>> ethtool channel API to me as well.
>>>
>>> Queues are in the grey area between netdev and non-netdev constructs.
>>> They make sense both from PCIe resource allocation perspective (i.e.
>>> devlink PCIe settings) and netdev perspective (ethtool) because they
>>> feed into things like qdisc offloads, maybe per-queue stats etc.
>>>
>>> So yes...  IMHO it would be nice to add this to a devlink SR-IOV config
>>> API and/or switchdev representors.  But neither of those are really an
>>> option for you today so IDK :)
>> One reason why 'switchdev' mode is not yet widely used or enabled by default
>> could be due to the requirement to program the flow rules only via slow path.
> Do you mean the fallback traffic requirement?

Yes.

>
>> Would it make sense to relax this requirement and support a mode where port
>> representors are created and let the PF driver implement a default policy that
>> adds flow rules for all the VFs to enable connectivity and let the user
>> add/modify the rules via port representors?
> I definitely share your concerns, stopping a major HW vendor from using
> this new and preferred mode is not helping us make progress.
>
> The problem is that if we allow this diversion, i.e. driver to implement
> some special policy, or pre-populate a bridge in a configuration that
> suits the HW we may condition users to expect that as the standard Linux
> behaviour.  And we will be stuck with it forever even tho your next gen
> HW (ice?) may support correct behaviour.

Yes. ice can support slowpath behavior as required to support OVS offload.
However, I was just wondering if we should have an option to allow switchdev
without slowpath so that the user can use alternate mechanisms to program
the flow rules instead of having to use OVS.


>
> We should perhaps separate switchdev mode from TC flower/OvS offloads.
> Is your objective to implement OvS offload or just switchdev mode?
>
> For OvS without proper fallback behaviour you may struggle.
>
> Switchdev mode could be within your reach even without changing the
> default rules.  What if you spawned all port netdevs (I dislike the
> term representor, sorry, it's confusing people) in down state and then
> refuse to bring them up unless user instantiated a bridge that would
> behave in a way that your HW can support?  If ports are down you won't
> have fallback traffic so no problem to solve.

If we want to use the port netdev's admin state to control the link state of
the VFs, then this will not work.
We need to disable only TX/RX, but admin state and link state need to be
supported on the port netdevs.

* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
  2018-05-30 21:23     ` Samudrala, Sridhar
@ 2018-05-30 22:53       ` Jakub Kicinski
  2018-05-31  3:35         ` Samudrala, Sridhar
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2018-05-30 22:53 UTC (permalink / raw)
  To: Samudrala, Sridhar; +Cc: Michael Chan, David Miller, Netdev, Or Gerlitz

On Wed, 30 May 2018 14:23:06 -0700, Samudrala, Sridhar wrote:
> On 5/29/2018 11:33 PM, Jakub Kicinski wrote:
> > On Tue, 29 May 2018 23:08:11 -0700, Michael Chan wrote:  
> >> On Tue, May 29, 2018 at 10:56 PM, Jakub Kicinski wrote:  
> >>> On Tue, 29 May 2018 20:19:54 -0700, Michael Chan wrote:  
> >>>> On Tue, May 29, 2018 at 1:46 PM, Samudrala, Sridhar wrote:  
> >>>>> Isn't ndo_set_vf_xxx() considered a legacy interface and not planned to be
> >>>>> extended?  
> >>> +1 it's painful to see this feature being added to the legacy
> >>> API :(  Another duplicated configuration knob.
> >>>  
> >>>> I didn't know about that.
> >>>>  
> >>>>> Shouldn't we enable this via ethtool on the port representor netdev?  
> >>>> We discussed about this.  ethtool on the VF representor will only work
> >>>> in switchdev mode and also will not support min/max values.  
> >>> Ethtool channel API may be overdue a rewrite in devlink anyway, but I
> >>> feel like implementing switchdev mode and rewriting features in devlink
> >>> may be too much to ask.  
> >> Totally agreed.  And switchdev mode doesn't seem to be that widely
> >> used at the moment.  Do you have other suggestions besides NDO?  
> > At some points you (Broadcom) were working whole bunch of devlink
> > configuration options for the PCIe side of the ASIC.  The number of
> > queues relates to things like number of allocated MSI-X vectors, which
> > if memory serves me was in your devlink patch set.  In an ideal world
> > we would try to keep all those in one place :)
> >
> > For PCIe config there is always the question of what can be configured
> > at runtime, and what requires a HW reset.  Therefore that devlink API
> > which could configure current as well as persistent device settings was
> > quite nice.  I'm not sure if reallocating queues would ever require
> > PCIe block reset but maybe...  Certainly it seems the notion of min
> > queues would make more sense in PCIe configuration devlink API than
> > ethtool channel API to me as well.
> >
> > Queues are in the grey area between netdev and non-netdev constructs.
> > They make sense both from PCIe resource allocation perspective (i.e.
> > devlink PCIe settings) and netdev perspective (ethtool) because they
> > feed into things like qdisc offloads, maybe per-queue stats etc.
> >
> > So yes...  IMHO it would be nice to add this to a devlink SR-IOV config
> > API and/or switchdev representors.  But neither of those are really an
> > option for you today so IDK :)  
> 
> One reason why 'switchdev' mode is not yet widely used or enabled by default
> could be due to the requirement to program the flow rules only via slow path.

Do you mean the fallback traffic requirement?

> Would it make sense to relax this requirement and support a mode where port
> representors are created and let the PF driver implement a default policy that
> adds flow rules for all the VFs to enable connectivity and let the user
> add/modify the rules via port representors?

I definitely share your concerns, stopping a major HW vendor from using
this new and preferred mode is not helping us make progress.

The problem is that if we allow this diversion, i.e. driver to implement
some special policy, or pre-populate a bridge in a configuration that
suits the HW we may condition users to expect that as the standard Linux
behaviour.  And we will be stuck with it forever even tho your next gen
HW (ice?) may support correct behaviour.

We should perhaps separate switchdev mode from TC flower/OvS offloads.
Is your objective to implement OvS offload or just switchdev mode?  

For OvS without proper fallback behaviour you may struggle.

Switchdev mode could be within your reach even without changing the
default rules.  What if you spawned all port netdevs (I dislike the
term representor, sorry, it's confusing people) in down state and then
refuse to bring them up unless user instantiated a bridge that would
behave in a way that your HW can support?  If ports are down you won't
have fallback traffic so no problem to solve.

* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
  2018-05-30  6:33   ` Jakub Kicinski
  2018-05-30  7:18     ` Michael Chan
@ 2018-05-30 21:23     ` Samudrala, Sridhar
  2018-05-30 22:53       ` Jakub Kicinski
  1 sibling, 1 reply; 14+ messages in thread
From: Samudrala, Sridhar @ 2018-05-30 21:23 UTC (permalink / raw)
  To: Jakub Kicinski, Michael Chan; +Cc: David Miller, Netdev, Or Gerlitz

On 5/29/2018 11:33 PM, Jakub Kicinski wrote:
> On Tue, 29 May 2018 23:08:11 -0700, Michael Chan wrote:
>> On Tue, May 29, 2018 at 10:56 PM, Jakub Kicinski wrote:
>>> On Tue, 29 May 2018 20:19:54 -0700, Michael Chan wrote:
>>>> On Tue, May 29, 2018 at 1:46 PM, Samudrala, Sridhar wrote:
>>>>> Isn't ndo_set_vf_xxx() considered a legacy interface and not planned to be
>>>>> extended?
>>> +1 it's painful to see this feature being added to the legacy
>>> API :(  Another duplicated configuration knob.
>>>
>>>> I didn't know about that.
>>>>
>>>>> Shouldn't we enable this via ethtool on the port representor netdev?
>>>> We discussed about this.  ethtool on the VF representor will only work
>>>> in switchdev mode and also will not support min/max values.
>>> Ethtool channel API may be overdue a rewrite in devlink anyway, but I
>>> feel like implementing switchdev mode and rewriting features in devlink
>>> may be too much to ask.
>> Totally agreed.  And switchdev mode doesn't seem to be that widely
>> used at the moment.  Do you have other suggestions besides NDO?
> At some points you (Broadcom) were working whole bunch of devlink
> configuration options for the PCIe side of the ASIC.  The number of
> queues relates to things like number of allocated MSI-X vectors, which
> if memory serves me was in your devlink patch set.  In an ideal world
> we would try to keep all those in one place :)
>
> For PCIe config there is always the question of what can be configured
> at runtime, and what requires a HW reset.  Therefore that devlink API
> which could configure current as well as persistent device settings was
> quite nice.  I'm not sure if reallocating queues would ever require
> PCIe block reset but maybe...  Certainly it seems the notion of min
> queues would make more sense in PCIe configuration devlink API than
> ethtool channel API to me as well.
>
> Queues are in the grey area between netdev and non-netdev constructs.
> They make sense both from PCIe resource allocation perspective (i.e.
> devlink PCIe settings) and netdev perspective (ethtool) because they
> feed into things like qdisc offloads, maybe per-queue stats etc.
>
> So yes...  IMHO it would be nice to add this to a devlink SR-IOV config
> API and/or switchdev representors.  But neither of those are really an
> option for you today so IDK :)

One reason why 'switchdev' mode is not yet widely used or enabled by default
could be due to the requirement to program the flow rules only via slow path.

Would it make sense to relax this requirement and support a mode where port
representors are created and let the PF driver implement a default policy that
adds flow rules for all the VFs to enable connectivity and let the user
add/modify the rules via port representors?

* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
  2018-05-30  6:33   ` Jakub Kicinski
@ 2018-05-30  7:18     ` Michael Chan
  2018-05-30 21:23     ` Samudrala, Sridhar
  1 sibling, 0 replies; 14+ messages in thread
From: Michael Chan @ 2018-05-30  7:18 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Samudrala, Sridhar, David Miller, Netdev, Or Gerlitz

On Tue, May 29, 2018 at 11:33 PM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:

>
> At some points you (Broadcom) were working whole bunch of devlink
> configuration options for the PCIe side of the ASIC.  The number of
> queues relates to things like number of allocated MSI-X vectors, which
> if memory serves me was in your devlink patch set.  In an ideal world
> we would try to keep all those in one place :)

Yeah, another colleague is now working with Mellanox on something similar.

One difference between those devlink parameters and these queue
parameters is that the former are more permanent and global settings.
For example, number of VFs or number of MSIX per VF are persistent
settings once they are set and after PCIe reset.  On the other hand,
these queue settings are pure run-time settings and may be unique for
each VF.  These are not stored as there is no room in NVRAM to store
128 sets or more of these parameters.

Anyway, let me discuss this with my colleague to see if there is a
natural fit for these queue parameters in the devlink infrastructure
that they are working on.

>
> For PCIe config there is always the question of what can be configured
> at runtime, and what requires a HW reset.  Therefore that devlink API
> which could configure current as well as persistent device settings was
> quite nice.  I'm not sure if reallocating queues would ever require
> PCIe block reset but maybe...  Certainly it seems the notion of min
> queues would make more sense in PCIe configuration devlink API than
> ethtool channel API to me as well.
>
> Queues are in the grey area between netdev and non-netdev constructs.
> They make sense both from PCIe resource allocation perspective (i.e.
> devlink PCIe settings) and netdev perspective (ethtool) because they
> feed into things like qdisc offloads, maybe per-queue stats etc.
>
> So yes...  IMHO it would be nice to add this to a devlink SR-IOV config
> API and/or switchdev representors.  But neither of those are really an
> option for you today so IDK :)

* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
  2018-05-30  6:08 ` Michael Chan
@ 2018-05-30  6:33   ` Jakub Kicinski
  2018-05-30  7:18     ` Michael Chan
  2018-05-30 21:23     ` Samudrala, Sridhar
  0 siblings, 2 replies; 14+ messages in thread
From: Jakub Kicinski @ 2018-05-30  6:33 UTC (permalink / raw)
  To: Michael Chan; +Cc: Samudrala, Sridhar, David Miller, Netdev, Or Gerlitz

On Tue, 29 May 2018 23:08:11 -0700, Michael Chan wrote:
> On Tue, May 29, 2018 at 10:56 PM, Jakub Kicinski wrote:
> > On Tue, 29 May 2018 20:19:54 -0700, Michael Chan wrote:
> >> On Tue, May 29, 2018 at 1:46 PM, Samudrala, Sridhar wrote:
> >> > Isn't ndo_set_vf_xxx() considered a legacy interface and not planned to be
> >> > extended?
> >
> > +1 it's painful to see this feature being added to the legacy
> > API :(  Another duplicated configuration knob.
> >
> >> I didn't know about that.
> >>
> >> > Shouldn't we enable this via ethtool on the port representor netdev?
> >>
> >> We discussed about this.  ethtool on the VF representor will only work
> >> in switchdev mode and also will not support min/max values.
> >
> > Ethtool channel API may be overdue a rewrite in devlink anyway, but I
> > feel like implementing switchdev mode and rewriting features in devlink
> > may be too much to ask.
>
> Totally agreed.  And switchdev mode doesn't seem to be that widely
> used at the moment.  Do you have other suggestions besides NDO?

At some points you (Broadcom) were working whole bunch of devlink
configuration options for the PCIe side of the ASIC.  The number of
queues relates to things like number of allocated MSI-X vectors, which
if memory serves me was in your devlink patch set.  In an ideal world
we would try to keep all those in one place :)

For PCIe config there is always the question of what can be configured
at runtime, and what requires a HW reset.  Therefore that devlink API
which could configure current as well as persistent device settings was
quite nice.  I'm not sure if reallocating queues would ever require
PCIe block reset but maybe...  Certainly it seems the notion of min
queues would make more sense in PCIe configuration devlink API than
ethtool channel API to me as well.

Queues are in the grey area between netdev and non-netdev constructs.
They make sense both from PCIe resource allocation perspective (i.e.
devlink PCIe settings) and netdev perspective (ethtool) because they
feed into things like qdisc offloads, maybe per-queue stats etc.

So yes...  IMHO it would be nice to add this to a devlink SR-IOV config
API and/or switchdev representors.  But neither of those are really an
option for you today so IDK :)

* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
  2018-05-30  5:56 [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues Jakub Kicinski
@ 2018-05-30  6:08 ` Michael Chan
  2018-05-30  6:33   ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Michael Chan @ 2018-05-30  6:08 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Samudrala, Sridhar, David Miller, Netdev, Or Gerlitz

On Tue, May 29, 2018 at 10:56 PM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> On Tue, 29 May 2018 20:19:54 -0700, Michael Chan wrote:
>> On Tue, May 29, 2018 at 1:46 PM, Samudrala, Sridhar wrote:
>> > Isn't ndo_set_vf_xxx() considered a legacy interface and not planned to be
>> > extended?
>
> +1 it's painful to see this feature being added to the legacy
> API :(  Another duplicated configuration knob.
>
>> I didn't know about that.
>>
>> > Shouldn't we enable this via ethtool on the port representor netdev?
>>
>> We discussed about this.  ethtool on the VF representor will only work
>> in switchdev mode and also will not support min/max values.
>
> Ethtool channel API may be overdue a rewrite in devlink anyway, but I
> feel like implementing switchdev mode and rewriting features in devlink
> may be too much to ask.

Totally agreed.  And switchdev mode doesn't seem to be that widely
used at the moment.  Do you have other suggestions besides NDO?

* Re: [PATCH net-next 1/3] net: Add support to configure SR-IOV VF minimum and maximum queues.
@ 2018-05-30  5:56 Jakub Kicinski
  2018-05-30  6:08 ` Michael Chan
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2018-05-30  5:56 UTC (permalink / raw)
  To: Michael Chan, Samudrala, Sridhar; +Cc: David Miller, Netdev, Or Gerlitz

On Tue, 29 May 2018 20:19:54 -0700, Michael Chan wrote:
> On Tue, May 29, 2018 at 1:46 PM, Samudrala, Sridhar wrote:
> > Isn't ndo_set_vf_xxx() considered a legacy interface and not planned to be
> > extended?

+1 it's painful to see this feature being added to the legacy
API :(  Another duplicated configuration knob.

> I didn't know about that.
>
> > Shouldn't we enable this via ethtool on the port representor netdev?
>
> We discussed about this.  ethtool on the VF representor will only work
> in switchdev mode and also will not support min/max values.

Ethtool channel API may be overdue a rewrite in devlink anyway, but I
feel like implementing switchdev mode and rewriting features in devlink
may be too much to ask.
