Netdev Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH net-next 1/1] ice: Improve MSI-X vector enablement fallback logic
@ 2021-01-13 23:42 Tony Nguyen
  2021-01-15  0:42 ` Jakub Kicinski
  0 siblings, 1 reply; 6+ messages in thread
From: Tony Nguyen @ 2021-01-13 23:42 UTC (permalink / raw)
  To: davem, kuba
  Cc: Brett Creeley, netdev, sassmann, anthony.l.nguyen, Tony Brelinski

From: Brett Creeley <brett.creeley@intel.com>

The current MSI-X enablement logic either tries to enable best-case
MSI-X vectors and if that fails we only support a bare-minimum set.
This is not very flexible and the current logic is broken when actually
allocating and reserving MSI-X in the driver. Fix this by improving the
fall-back logic and fixing the allocation/reservation of MSI-X when the
best-case MSI-X vectors are not received from the OS.

The new fall-back logic is described below with each [#] being an
attempt at enabling a certain number of MSI-X. If any of the steps
succeed, then return the number of MSI-X enabled from
ice_ena_msix_range(). If any of the attempts fail, then goto the next
step.

Attempt [0]: Enable the best-case scenario MSI-X vectors.

Attempt [1]: Enable MSI-X vectors with the number of pf->num_lan_msix
reduced by a factor of 2 from the previous attempt (i.e.
num_online_cpus() / 2).

Attempt [2]: Same as attempt [1], except reduce by a factor of 4.

Attempt [3]: Enable the bare-minimum MSI-X vectors.

Also, if the adjusted_base_msix ever hits the minimum required for LAN,
then just set the needed MSI-X for that feature to the minimum
(similar to attempt [3]).

To fix the allocation/reservation of MSI-X, the PF VSI needs to take
into account the pf->num_lan_msix available and only allocate up to that
many MSI-X vectors. To do this, limit the number of Tx and Rx queues
based on pf->num_lan_msix. This is done because we don't want more than
1 Tx/Rx queue per interrupt due to performance concerns. Also, limit the
number of MSI-X based on pf->num_lan_msix available.

Also, prevent users from enabling more combined queues than there are
MSI-X available via ethtool.

Fixes: 152b978a1f90 ("ice: Rework ice_ena_msix_range")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h         |   4 +-
 drivers/net/ethernet/intel/ice/ice_ethtool.c |   8 +-
 drivers/net/ethernet/intel/ice/ice_lib.c     |  15 +-
 drivers/net/ethernet/intel/ice/ice_main.c    | 161 ++++++++++---------
 4 files changed, 102 insertions(+), 86 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 56725356a17b..3cf6f2bdca41 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -68,8 +68,10 @@
 #define ICE_INT_NAME_STR_LEN	(IFNAMSIZ + 16)
 #define ICE_AQ_LEN		64
 #define ICE_MBXSQ_LEN		64
-#define ICE_MIN_MSIX		2
 #define ICE_FDIR_MSIX		1
+#define ICE_MIN_LAN_MSIX	1
+#define ICE_OICR_MSIX		1
+#define ICE_MIN_MSIX		(ICE_MIN_LAN_MSIX + ICE_OICR_MSIX)
 #define ICE_NO_VSI		0xffff
 #define ICE_VSI_MAP_CONTIG	0
 #define ICE_VSI_MAP_SCATTER	1
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index 9e8e9531cd87..69c113a4de7e 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -3258,8 +3258,8 @@ ice_set_rxfh(struct net_device *netdev, const u32 *indir, const u8 *key,
  */
 static int ice_get_max_txq(struct ice_pf *pf)
 {
-	return min_t(int, num_online_cpus(),
-		     pf->hw.func_caps.common_cap.num_txq);
+	return min3(pf->num_lan_msix, (u16)num_online_cpus(),
+		    (u16)pf->hw.func_caps.common_cap.num_txq);
 }
 
 /**
@@ -3268,8 +3268,8 @@ static int ice_get_max_txq(struct ice_pf *pf)
  */
 static int ice_get_max_rxq(struct ice_pf *pf)
 {
-	return min_t(int, num_online_cpus(),
-		     pf->hw.func_caps.common_cap.num_rxq);
+	return min3(pf->num_lan_msix, (u16)num_online_cpus(),
+		    (u16)pf->hw.func_caps.common_cap.num_rxq);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 3df67486d42d..01da18ee17da 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -161,8 +161,10 @@ static void ice_vsi_set_num_qs(struct ice_vsi *vsi, u16 vf_id)
 
 	switch (vsi->type) {
 	case ICE_VSI_PF:
-		vsi->alloc_txq = min_t(int, ice_get_avail_txq_count(pf),
-				       num_online_cpus());
+		/* default to 1 Tx queue per MSI-X to not hurt our performance */
+		vsi->alloc_txq = min3(pf->num_lan_msix,
+				      ice_get_avail_txq_count(pf),
+				      (u16)num_online_cpus());
 		if (vsi->req_txq) {
 			vsi->alloc_txq = vsi->req_txq;
 			vsi->num_txq = vsi->req_txq;
@@ -174,8 +176,10 @@ static void ice_vsi_set_num_qs(struct ice_vsi *vsi, u16 vf_id)
 		if (!test_bit(ICE_FLAG_RSS_ENA, pf->flags)) {
 			vsi->alloc_rxq = 1;
 		} else {
-			vsi->alloc_rxq = min_t(int, ice_get_avail_rxq_count(pf),
-					       num_online_cpus());
+			/* default to 1 Rx queue per MSI-X to not hurt our performance */
+			vsi->alloc_rxq = min3(pf->num_lan_msix,
+					      ice_get_avail_rxq_count(pf),
+					      (u16)num_online_cpus());
 			if (vsi->req_rxq) {
 				vsi->alloc_rxq = vsi->req_rxq;
 				vsi->num_rxq = vsi->req_rxq;
@@ -184,7 +188,8 @@ static void ice_vsi_set_num_qs(struct ice_vsi *vsi, u16 vf_id)
 
 		pf->num_lan_rx = vsi->alloc_rxq;
 
-		vsi->num_q_vectors = max_t(int, vsi->alloc_rxq, vsi->alloc_txq);
+		vsi->num_q_vectors = min_t(int, pf->num_lan_msix,
+					   max_t(int, vsi->alloc_rxq, vsi->alloc_txq));
 		break;
 	case ICE_VSI_VF:
 		vf = &pf->vf[vsi->vf_id];
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 6e251dfffc91..040e868a0637 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3367,95 +3367,105 @@ static int ice_init_pf(struct ice_pf *pf)
 	return 0;
 }
 
+static int ice_alloc_msix_entries(struct ice_pf *pf, u16 num_entries)
+{
+	u16 i;
+
+	pf->msix_entries = devm_kcalloc(ice_pf_to_dev(pf), num_entries,
+					sizeof(*pf->msix_entries), GFP_KERNEL);
+	if (!pf->msix_entries)
+		return -ENOMEM;
+
+	for (i = 0; i < num_entries; i++)
+		pf->msix_entries[i].entry = i;
+
+	return 0;
+}
+
+static void ice_free_msix_entries(struct ice_pf *pf)
+{
+	devm_kfree(ice_pf_to_dev(pf), pf->msix_entries);
+	pf->msix_entries = NULL;
+}
+
 /**
- * ice_ena_msix_range - Request a range of MSIX vectors from the OS
+ * ice_ena_msix_range - request a range of MSI-X vectors from the OS
  * @pf: board private structure
  *
- * compute the number of MSIX vectors required (v_budget) and request from
- * the OS. Return the number of vectors reserved or negative on failure
+ * The driver first tries to enable best-case scenario MSI-X vectors. If that
+ * doesn't succeed then a fall-back method is employed.
+ *
+ * The fall-back logic is described below with each [#] being an attempt at
+ * enabling a certain number of MSI-X. If any of the steps succeed, then return
+ * the number of MSI-X enabled from  pci_ena_msix_exact(). If any of the
+ * attempts fail, then goto the next step.
+ *
+ * Attempt [0]: Enable the best-case scenario MSI-X vectors.
+ *
+ * Attempt [1]: Enable MSI-X vectors with the number of pf->num_lan_msix reduced
+ * by a factor of 2 from the previous attempts (i.e. num_online_cpus() / 2).
+ *
+ * Attempt [2]: Same as attempt [1], except reduce both by a factor of 4.
+ *
+ * Attempt [3]: Enable the bare-minimum MSI-X vectors.
+ *
+ * Also, if the adjusted_base_msix ever hits the mimimum required for LAN, then
+ * just set the needed MSI-X for that feature to the minimum (similar to
+ * attempt [3]).
  */
 static int ice_ena_msix_range(struct ice_pf *pf)
 {
+	int err = -ENOSPC, num_cpus, attempt, adjusted_msix_divisor = 1, needed;
 	struct device *dev = ice_pf_to_dev(pf);
-	int v_left, v_actual, v_budget = 0;
-	int needed, err, i;
-
-	v_left = pf->hw.func_caps.common_cap.num_msix_vectors;
-
-	/* reserve one vector for miscellaneous handler */
-	needed = 1;
-	if (v_left < needed)
-		goto no_hw_vecs_left_err;
-	v_budget += needed;
-	v_left -= needed;
-
-	/* reserve vectors for LAN traffic */
-	needed = min_t(int, num_online_cpus(), v_left);
-	if (v_left < needed)
-		goto no_hw_vecs_left_err;
-	pf->num_lan_msix = needed;
-	v_budget += needed;
-	v_left -= needed;
-
-	/* reserve one vector for flow director */
-	if (test_bit(ICE_FLAG_FD_ENA, pf->flags)) {
-		needed = ICE_FDIR_MSIX;
-		if (v_left < needed)
-			goto no_hw_vecs_left_err;
-		v_budget += needed;
-		v_left -= needed;
-	}
-
-	pf->msix_entries = devm_kcalloc(dev, v_budget,
-					sizeof(*pf->msix_entries), GFP_KERNEL);
 
-	if (!pf->msix_entries) {
-		err = -ENOMEM;
-		goto exit_err;
-	}
+	num_cpus = num_online_cpus();
 
-	for (i = 0; i < v_budget; i++)
-		pf->msix_entries[i].entry = i;
+#define ICE_MAX_ENABLE_MSIX_ATTEMPTS 3
+	/* make multiple passes at enabling MSI-X vectors in case there aren't
+	 * enough available for the best-case scenario
+	 */
+	for (attempt = 0; attempt <= ICE_MAX_ENABLE_MSIX_ATTEMPTS; attempt++) {
+		int adjusted_base_msix = num_cpus / adjusted_msix_divisor;
 
-	/* actually reserve the vectors */
-	v_actual = pci_enable_msix_range(pf->pdev, pf->msix_entries,
-					 ICE_MIN_MSIX, v_budget);
-
-	if (v_actual < 0) {
-		dev_err(dev, "unable to reserve MSI-X vectors\n");
-		err = v_actual;
-		goto msix_err;
-	}
-
-	if (v_actual < v_budget) {
-		dev_warn(dev, "not enough OS MSI-X vectors. requested = %d, obtained = %d\n",
-			 v_budget, v_actual);
-/* 2 vectors each for LAN and RDMA (traffic + OICR), one for flow director */
-#define ICE_MIN_LAN_VECS 2
-#define ICE_MIN_RDMA_VECS 2
-#define ICE_MIN_VECS (ICE_MIN_LAN_VECS + ICE_MIN_RDMA_VECS + 1)
-
-		if (v_actual < ICE_MIN_LAN_VECS) {
-			/* error if we can't get minimum vectors */
-			pci_disable_msix(pf->pdev);
-			err = -ERANGE;
-			goto msix_err;
+		/* attempt to enable minimum MSI-X range */
+		if (attempt == ICE_MAX_ENABLE_MSIX_ATTEMPTS) {
+			needed = ICE_MIN_MSIX;
+			pf->num_lan_msix = ICE_MIN_LAN_MSIX;
 		} else {
-			pf->num_lan_msix = ICE_MIN_LAN_VECS;
+			if (adjusted_base_msix > ICE_MIN_LAN_MSIX)
+				pf->num_lan_msix = adjusted_base_msix;
+			else
+				pf->num_lan_msix = ICE_MIN_LAN_MSIX;
+
+			needed = pf->num_lan_msix + ICE_OICR_MSIX;
 		}
-	}
 
-	return v_actual;
+		if (test_bit(ICE_FLAG_FD_ENA, pf->flags))
+			needed += ICE_FDIR_MSIX;
 
-msix_err:
-	devm_kfree(dev, pf->msix_entries);
-	goto exit_err;
+		err = ice_alloc_msix_entries(pf, needed);
+		if (err)
+			goto err_out;
+
+		dev_dbg(dev, "attempting to enable %d MSI-X vectors\n", needed);
+		err = pci_enable_msix_exact(pf->pdev, pf->msix_entries, needed);
+		if (err < 0) {
+			ice_free_msix_entries(pf);
+			dev_notice(dev, "Couldn't get %d MSI-X vectors due to OS, Platform, and/or PCI-function limitations. Reducing request and retrying.",
+				   needed);
+
+			adjusted_msix_divisor *= 2;
+		} else {
+			if (pf->num_lan_msix != num_cpus)
+				dev_notice(dev, "Enabled %d MSI-X vectors for LAN traffic.\n",
+					   pf->num_lan_msix);
+
+			return needed;
+		}
+	}
 
-no_hw_vecs_left_err:
-	dev_err(dev, "not enough device MSI-X vectors. requested = %d, available = %d\n",
-		needed, v_left);
-	err = -ERANGE;
-exit_err:
+err_out:
+	dev_err(dev, "failed to enable MSI-X vectors\n");
 	pf->num_lan_msix = 0;
 	return err;
 }
@@ -3467,8 +3477,7 @@ static int ice_ena_msix_range(struct ice_pf *pf)
 static void ice_dis_msix(struct ice_pf *pf)
 {
 	pci_disable_msix(pf->pdev);
-	devm_kfree(ice_pf_to_dev(pf), pf->msix_entries);
-	pf->msix_entries = NULL;
+	ice_free_msix_entries(pf);
 }
 
 /**
-- 
2.26.2


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next 1/1] ice: Improve MSI-X vector enablement fallback logic
  2021-01-13 23:42 [PATCH net-next 1/1] ice: Improve MSI-X vector enablement fallback logic Tony Nguyen
@ 2021-01-15  0:42 ` Jakub Kicinski
  2021-01-20  0:12   ` Venkataramanan, Anirudh
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2021-01-15  0:42 UTC (permalink / raw)
  To: Tony Nguyen; +Cc: davem, Brett Creeley, netdev, sassmann, Tony Brelinski

On Wed, 13 Jan 2021 15:42:26 -0800 Tony Nguyen wrote:
> From: Brett Creeley <brett.creeley@intel.com>
> 
> The current MSI-X enablement logic either tries to enable best-case
> MSI-X vectors and if that fails we only support a bare-minimum set.
> This is not very flexible and the current logic is broken when actually
> allocating and reserving MSI-X in the driver. Fix this by improving the
> fall-back logic and fixing the allocation/reservation of MSI-X when the
> best-case MSI-X vectors are not received from the OS.
> 
> The new fall-back logic is described below with each [#] being an
> attempt at enabling a certain number of MSI-X. If any of the steps
> succeed, then return the number of MSI-X enabled from
> ice_ena_msix_range(). If any of the attempts fail, then goto the next
> step.
> 
> Attempt [0]: Enable the best-case scenario MSI-X vectors.
> 
> Attempt [1]: Enable MSI-X vectors with the number of pf->num_lan_msix
> reduced by a factor of 2 from the previous attempt (i.e.
> num_online_cpus() / 2).
> 
> Attempt [2]: Same as attempt [1], except reduce by a factor of 4.
> 
> Attempt [3]: Enable the bare-minimum MSI-X vectors.
> 
> Also, if the adjusted_base_msix ever hits the minimum required for LAN,
> then just set the needed MSI-X for that feature to the minimum
> (similar to attempt [3]).

I don't really get why you switch to this manual "exponential back-off"
rather than continuing to use pci_enable_msix_range(), but fixing the
capping to ICE_MIN_LAN_VECS.

> To fix the allocation/reservation of MSI-X, the PF VSI needs to take
> into account the pf->num_lan_msix available and only allocate up to that
> many MSI-X vectors. To do this, limit the number of Tx and Rx queues
> based on pf->num_lan_msix. This is done because we don't want more than
> 1 Tx/Rx queue per interrupt due to performance concerns. Also, limit the
> number of MSI-X based on pf->num_lan_msix available.
> 
> Also, prevent users from enabling more combined queues than there are
> MSI-X available via ethtool.

Right, this part sounds like a fix, and should be a separate patch
against net, not net-next.

> Fixes: 152b978a1f90 ("ice: Rework ice_ena_msix_range")
> Signed-off-by: Brett Creeley <brett.creeley@intel.com>
> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next 1/1] ice: Improve MSI-X vector enablement fallback logic
  2021-01-15  0:42 ` Jakub Kicinski
@ 2021-01-20  0:12   ` Venkataramanan, Anirudh
  2021-01-20  0:41     ` Jakub Kicinski
  0 siblings, 1 reply; 6+ messages in thread
From: Venkataramanan, Anirudh @ 2021-01-20  0:12 UTC (permalink / raw)
  To: kuba, Nguyen, Anthony L
  Cc: Brelinski, TonyX, sassmann, Creeley, Brett, davem, netdev

On Thu, 2021-01-14 at 16:42 -0800, Jakub Kicinski wrote:
> On Wed, 13 Jan 2021 15:42:26 -0800 Tony Nguyen wrote:
> > From: Brett Creeley <brett.creeley@intel.com>
> > 
> > The current MSI-X enablement logic either tries to enable best-case
> > MSI-X vectors and if that fails we only support a bare-minimum set.
> > This is not very flexible and the current logic is broken when
> > actually
> > allocating and reserving MSI-X in the driver. Fix this by improving
> > the
> > fall-back logic and fixing the allocation/reservation of MSI-X when
> > the
> > best-case MSI-X vectors are not received from the OS.
> > 
> > The new fall-back logic is described below with each [#] being an
> > attempt at enabling a certain number of MSI-X. If any of the steps
> > succeed, then return the number of MSI-X enabled from
> > ice_ena_msix_range(). If any of the attempts fail, then goto the
> > next
> > step.
> > 
> > Attempt [0]: Enable the best-case scenario MSI-X vectors.
> > 
> > Attempt [1]: Enable MSI-X vectors with the number of pf-
> > >num_lan_msix
> > reduced by a factor of 2 from the previous attempt (i.e.
> > num_online_cpus() / 2).
> > 
> > Attempt [2]: Same as attempt [1], except reduce by a factor of 4.
> > 
> > Attempt [3]: Enable the bare-minimum MSI-X vectors.
> > 
> > Also, if the adjusted_base_msix ever hits the minimum required for
> > LAN,
> > then just set the needed MSI-X for that feature to the minimum
> > (similar to attempt [3]).
> 
> I don't really get why you switch to this manual "exponential back-
> off"
> rather than continuing to use pci_enable_msix_range(), but fixing the
> capping to ICE_MIN_LAN_VECS.

As per the current logic, if the driver does not get the number of MSI-
X vectors it needs, it will immediately drop to "Do I have at least two
(ICE_MIN_LAN_VECS) MSI-X vectors?". If yes, the driver will enable a
single Tx/Rx traffic queue pair, bound to one of the two MSI-X vectors.

This is a bit of an all-or-nothing type approach. There's a mid-ground
that can allow more queues to be enabled (ex. driver asked for 300
vectors, but got 68 vectors, so enabled 64 data queues) and this patch
implements the mid-ground logic. 

This mid-ground logic can also be implemented based on the return value
of pci_enable_msix_range() but IMHO the implementation in this patch
using pci_enable_msix_exact is better because it's always only
enabling/reserving as many MSI-X vectors as required, not more, not
less.

> 
> > To fix the allocation/reservation of MSI-X, the PF VSI needs to
> > take
> > into account the pf->num_lan_msix available and only allocate up to
> > that
> > many MSI-X vectors. To do this, limit the number of Tx and Rx
> > queues
> > based on pf->num_lan_msix. This is done because we don't want more
> > than
> > 1 Tx/Rx queue per interrupt due to performance concerns. Also,
> > limit the
> > number of MSI-X based on pf->num_lan_msix available.
> > 
> > Also, prevent users from enabling more combined queues than there
> > are
> > MSI-X available via ethtool.
> 
> Right, this part sounds like a fix, and should be a separate patch
> against net, not net-next.

ACK. We will do a different patch for this.

> 
> > Fixes: 152b978a1f90 ("ice: Rework ice_ena_msix_range")
> > Signed-off-by: Brett Creeley <brett.creeley@intel.com>
> > Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
> > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next 1/1] ice: Improve MSI-X vector enablement fallback logic
  2021-01-20  0:12   ` Venkataramanan, Anirudh
@ 2021-01-20  0:41     ` Jakub Kicinski
  2021-01-20  2:13       ` Venkataramanan, Anirudh
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2021-01-20  0:41 UTC (permalink / raw)
  To: Venkataramanan, Anirudh
  Cc: Nguyen, Anthony L, Brelinski, TonyX, sassmann, Creeley, Brett,
	davem, netdev

On Wed, 20 Jan 2021 00:12:26 +0000 Venkataramanan, Anirudh wrote:
> > > Attempt [0]: Enable the best-case scenario MSI-X vectors.
> > > 
> > > Attempt [1]: Enable MSI-X vectors with the number of pf-  
> > > >num_lan_msix  
> > > reduced by a factor of 2 from the previous attempt (i.e.
> > > num_online_cpus() / 2).
> > > 
> > > Attempt [2]: Same as attempt [1], except reduce by a factor of 4.
> > > 
> > > Attempt [3]: Enable the bare-minimum MSI-X vectors.
> > > 
> > > Also, if the adjusted_base_msix ever hits the minimum required for
> > > LAN,
> > > then just set the needed MSI-X for that feature to the minimum
> > > (similar to attempt [3]).  
> > 
> > I don't really get why you switch to this manual "exponential back-
> > off"
> > rather than continuing to use pci_enable_msix_range(), but fixing the
> > capping to ICE_MIN_LAN_VECS.  
> 
> As per the current logic, if the driver does not get the number of MSI-
> X vectors it needs, it will immediately drop to "Do I have at least two
> (ICE_MIN_LAN_VECS) MSI-X vectors?". If yes, the driver will enable a
> single Tx/Rx traffic queue pair, bound to one of the two MSI-X vectors.
> 
> This is a bit of an all-or-nothing type approach. There's a mid-ground
> that can allow more queues to be enabled (ex. driver asked for 300
> vectors, but got 68 vectors, so enabled 64 data queues) and this patch
> implements the mid-ground logic. 
> 
> This mid-ground logic can also be implemented based on the return value
> of pci_enable_msix_range() but IMHO the implementation in this patch
> using pci_enable_msix_exact is better because it's always only
> enabling/reserving as many MSI-X vectors as required, not more, not
> less.

What do you mean by "required" in the last sentence? The driver
requests num_online_cpus()-worth of IRQs, so it must work with any
number of IRQs. Why is num_cpus() / 1,2,4,8 "required"?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next 1/1] ice: Improve MSI-X vector enablement fallback logic
  2021-01-20  0:41     ` Jakub Kicinski
@ 2021-01-20  2:13       ` Venkataramanan, Anirudh
  2021-01-20  2:29         ` Jakub Kicinski
  0 siblings, 1 reply; 6+ messages in thread
From: Venkataramanan, Anirudh @ 2021-01-20  2:13 UTC (permalink / raw)
  To: kuba
  Cc: Brelinski, TonyX, sassmann, Nguyen, Anthony L, davem, Creeley,
	Brett, netdev

On Tue, 2021-01-19 at 16:41 -0800, Jakub Kicinski wrote:
> On Wed, 20 Jan 2021 00:12:26 +0000 Venkataramanan, Anirudh wrote:
> > > > Attempt [0]: Enable the best-case scenario MSI-X vectors.
> > > > 
> > > > Attempt [1]: Enable MSI-X vectors with the number of pf-  
> > > > > num_lan_msix  
> > > > reduced by a factor of 2 from the previous attempt (i.e.
> > > > num_online_cpus() / 2).
> > > > 
> > > > Attempt [2]: Same as attempt [1], except reduce by a factor of
> > > > 4.
> > > > 
> > > > Attempt [3]: Enable the bare-minimum MSI-X vectors.
> > > > 
> > > > Also, if the adjusted_base_msix ever hits the minimum required
> > > > for
> > > > LAN,
> > > > then just set the needed MSI-X for that feature to the minimum
> > > > (similar to attempt [3]).  
> > > 
> > > I don't really get why you switch to this manual "exponential
> > > back-
> > > off"
> > > rather than continuing to use pci_enable_msix_range(), but fixing
> > > the
> > > capping to ICE_MIN_LAN_VECS.  
> > 
> > As per the current logic, if the driver does not get the number of
> > MSI-
> > X vectors it needs, it will immediately drop to "Do I have at least
> > two
> > (ICE_MIN_LAN_VECS) MSI-X vectors?". If yes, the driver will enable
> > a
> > single Tx/Rx traffic queue pair, bound to one of the two MSI-X
> > vectors.
> > 
> > This is a bit of an all-or-nothing type approach. There's a mid-
> > ground
> > that can allow more queues to be enabled (ex. driver asked for 300
> > vectors, but got 68 vectors, so enabled 64 data queues) and this
> > patch
> > implements the mid-ground logic. 
> > 
> > This mid-ground logic can also be implemented based on the return
> > value
> > of pci_enable_msix_range() but IMHO the implementation in this
> > patch
> > using pci_enable_msix_exact is better because it's always only
> > enabling/reserving as many MSI-X vectors as required, not more, not
> > less.
> 
> What do you mean by "required" in the last sentence? 

.. as "required" in that particular iteration of the loop.

> The driver
> requests num_online_cpus()-worth of IRQs, so it must work with any
> number of IRQs. Why is num_cpus() / 1,2,4,8 "required"?

Let me back up a bit here. 

Ultimately, the issue we are trying to solve here is "what happens when
the driver doesn't get as many MSI-X vectors as it needs, and how it's
interpreted by the end user"

Let's say there are these two systems, each with 256 cores but the
response to pci_enable_msix_range() is different:

System 1: 256 cores, pci_enable_msix_range returns 75 vectors
System 2: 256 cores, pci_enable_msix_range returns 220 vectors 

In this case, the number of queues the user would see enabled on each
of these systems would be very different (73 on system 1 and 218 on
system 2). This variabilty makes it difficult to define what the
expected behavior should be, because it's not exactly obvious to the
user how many free MSI-X vectors a given system has. Instead, if the
driver reduced it's demand for vectors in a well defined manner
(num_cpus() / 1,2,4,8), the user visible difference between the two
systems wouldn't be so drastic.

If this is plain wrong or if there's a preferred approach, I'd be happy
to discuss further.

Ani

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next 1/1] ice: Improve MSI-X vector enablement fallback logic
  2021-01-20  2:13       ` Venkataramanan, Anirudh
@ 2021-01-20  2:29         ` Jakub Kicinski
  0 siblings, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2021-01-20  2:29 UTC (permalink / raw)
  To: Venkataramanan, Anirudh
  Cc: Brelinski, TonyX, sassmann, Nguyen, Anthony L, davem, Creeley,
	Brett, netdev

On Wed, 20 Jan 2021 02:13:36 +0000 Venkataramanan, Anirudh wrote:
> > > As per the current logic, if the driver does not get the number of
> > > MSI-
> > > X vectors it needs, it will immediately drop to "Do I have at least
> > > two
> > > (ICE_MIN_LAN_VECS) MSI-X vectors?". If yes, the driver will enable
> > > a
> > > single Tx/Rx traffic queue pair, bound to one of the two MSI-X
> > > vectors.
> > > 
> > > This is a bit of an all-or-nothing type approach. There's a mid-
> > > ground
> > > that can allow more queues to be enabled (ex. driver asked for 300
> > > vectors, but got 68 vectors, so enabled 64 data queues) and this
> > > patch
> > > implements the mid-ground logic. 
> > > 
> > > This mid-ground logic can also be implemented based on the return
> > > value
> > > of pci_enable_msix_range() but IMHO the implementation in this
> > > patch
> > > using pci_enable_msix_exact is better because it's always only
> > > enabling/reserving as many MSI-X vectors as required, not more, not
> > > less.  
> > 
> > What do you mean by "required" in the last sentence?   
> 
> .. as "required" in that particular iteration of the loop.
> 
> > The driver
> > requests num_online_cpus()-worth of IRQs, so it must work with any
> > number of IRQs. Why is num_cpus() / 1,2,4,8 "required"?  
> 
> Let me back up a bit here. 
> 
> Ultimately, the issue we are trying to solve here is "what happens when
> the driver doesn't get as many MSI-X vectors as it needs, and how it's
> interpreted by the end user"
> 
> Let's say there are these two systems, each with 256 cores but the
> response to pci_enable_msix_range() is different:
> 
> System 1: 256 cores, pci_enable_msix_range returns 75 vectors
> System 2: 256 cores, pci_enable_msix_range returns 220 vectors 
> 
> In this case, the number of queues the user would see enabled on each
> of these systems would be very different (73 on system 1 and 218 on
> system 2). This variabilty makes it difficult to define what the
> expected behavior should be, because it's not exactly obvious to the
> user how many free MSI-X vectors a given system has. Instead, if the
> driver reduced it's demand for vectors in a well defined manner
> (num_cpus() / 1,2,4,8), the user visible difference between the two
> systems wouldn't be so drastic.
> 
> If this is plain wrong or if there's a preferred approach, I'd be happy
> to discuss further.

Let's stick to the standard Linux way of handling IRQ exhaustion, and
rely on pci_enable_msix_range() to pick the number. If the current
behavior of pci_enable_msix_range() is now what users want we can
change it. Each driver creating its own heuristic is worst of all
choices as most brownfield deployments will have a mix of NICs.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, back to index

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-13 23:42 [PATCH net-next 1/1] ice: Improve MSI-X vector enablement fallback logic Tony Nguyen
2021-01-15  0:42 ` Jakub Kicinski
2021-01-20  0:12   ` Venkataramanan, Anirudh
2021-01-20  0:41     ` Jakub Kicinski
2021-01-20  2:13       ` Venkataramanan, Anirudh
2021-01-20  2:29         ` Jakub Kicinski

Netdev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/netdev/0 netdev/git/0.git
	git clone --mirror https://lore.kernel.org/netdev/1 netdev/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 netdev netdev/ https://lore.kernel.org/netdev \
		netdev@vger.kernel.org
	public-inbox-index netdev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.netdev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git