* [PATCH net-next 1/1] ice: Improve MSI-X vector enablement fallback logic
From: Tony Nguyen @ 2021-01-13 23:42 UTC
To: davem, kuba
Cc: Brett Creeley, netdev, sassmann, anthony.l.nguyen, Tony Brelinski
From: Brett Creeley <brett.creeley@intel.com>
The current MSI-X enablement logic tries to enable best-case MSI-X
vectors and, if that fails, falls back to a bare-minimum set.
This is not very flexible and the current logic is broken when actually
allocating and reserving MSI-X in the driver. Fix this by improving the
fall-back logic and fixing the allocation/reservation of MSI-X when the
best-case MSI-X vectors are not received from the OS.
The new fall-back logic is described below with each [#] being an
attempt at enabling a certain number of MSI-X. If any of the steps
succeed, then return the number of MSI-X enabled from
ice_ena_msix_range(). If an attempt fails, then go to the next step.
Attempt [0]: Enable the best-case scenario MSI-X vectors.
Attempt [1]: Enable MSI-X vectors with the number of pf->num_lan_msix
reduced by a factor of 2 from the previous attempt (i.e.
num_online_cpus() / 2).
Attempt [2]: Same as attempt [1], except reduce by a factor of 4.
Attempt [3]: Enable the bare-minimum MSI-X vectors.
Also, if the adjusted_base_msix ever hits the minimum required for LAN,
then just set the needed MSI-X for that feature to the minimum
(similar to attempt [3]).
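To illustrate (hypothetical numbers, and assuming the flow director
flag is clear so that only the single OICR vector is added on top of
pf->num_lan_msix), a system with 16 online CPUs would request:
Attempt [0]: 16 / 1 + 1 (OICR) = 17 vectors
Attempt [1]: 16 / 2 + 1 = 9 vectors
Attempt [2]: 16 / 4 + 1 = 5 vectors
Attempt [3]: ICE_MIN_MSIX = 2 vectors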
To fix the allocation/reservation of MSI-X, the PF VSI needs to take
into account the pf->num_lan_msix available and only allocate up to that
many MSI-X vectors. To do this, limit the number of Tx and Rx queues
based on pf->num_lan_msix. This is done because we don't want more than
1 Tx/Rx queue per interrupt due to performance concerns. Also, limit the
number of MSI-X based on pf->num_lan_msix available.
Also, prevent users from enabling more combined queues than there are
MSI-X available via ethtool.
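For example (hypothetical interface name and counts), if the fall-back
logic settled on 8 LAN MSI-X vectors, a request such as
"ethtool -L eth0 combined 16" would now be rejected, since the driver
can only service 8 queue/vector pairs.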
Fixes: 152b978a1f90 ("ice: Rework ice_ena_msix_range")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
drivers/net/ethernet/intel/ice/ice.h | 4 +-
drivers/net/ethernet/intel/ice/ice_ethtool.c | 8 +-
drivers/net/ethernet/intel/ice/ice_lib.c | 15 +-
drivers/net/ethernet/intel/ice/ice_main.c | 161 ++++++++++---------
4 files changed, 102 insertions(+), 86 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 56725356a17b..3cf6f2bdca41 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -68,8 +68,10 @@
#define ICE_INT_NAME_STR_LEN (IFNAMSIZ + 16)
#define ICE_AQ_LEN 64
#define ICE_MBXSQ_LEN 64
-#define ICE_MIN_MSIX 2
#define ICE_FDIR_MSIX 1
+#define ICE_MIN_LAN_MSIX 1
+#define ICE_OICR_MSIX 1
+#define ICE_MIN_MSIX (ICE_MIN_LAN_MSIX + ICE_OICR_MSIX)
#define ICE_NO_VSI 0xffff
#define ICE_VSI_MAP_CONTIG 0
#define ICE_VSI_MAP_SCATTER 1
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index 9e8e9531cd87..69c113a4de7e 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -3258,8 +3258,8 @@ ice_set_rxfh(struct net_device *netdev, const u32 *indir, const u8 *key,
*/
static int ice_get_max_txq(struct ice_pf *pf)
{
- return min_t(int, num_online_cpus(),
- pf->hw.func_caps.common_cap.num_txq);
+ return min3(pf->num_lan_msix, (u16)num_online_cpus(),
+ (u16)pf->hw.func_caps.common_cap.num_txq);
}
/**
@@ -3268,8 +3268,8 @@ static int ice_get_max_txq(struct ice_pf *pf)
*/
static int ice_get_max_rxq(struct ice_pf *pf)
{
- return min_t(int, num_online_cpus(),
- pf->hw.func_caps.common_cap.num_rxq);
+ return min3(pf->num_lan_msix, (u16)num_online_cpus(),
+ (u16)pf->hw.func_caps.common_cap.num_rxq);
}
/**
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 3df67486d42d..01da18ee17da 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -161,8 +161,10 @@ static void ice_vsi_set_num_qs(struct ice_vsi *vsi, u16 vf_id)
switch (vsi->type) {
case ICE_VSI_PF:
- vsi->alloc_txq = min_t(int, ice_get_avail_txq_count(pf),
- num_online_cpus());
+ /* default to 1 Tx queue per MSI-X to not hurt our performance */
+ vsi->alloc_txq = min3(pf->num_lan_msix,
+ ice_get_avail_txq_count(pf),
+ (u16)num_online_cpus());
if (vsi->req_txq) {
vsi->alloc_txq = vsi->req_txq;
vsi->num_txq = vsi->req_txq;
@@ -174,8 +176,10 @@ static void ice_vsi_set_num_qs(struct ice_vsi *vsi, u16 vf_id)
if (!test_bit(ICE_FLAG_RSS_ENA, pf->flags)) {
vsi->alloc_rxq = 1;
} else {
- vsi->alloc_rxq = min_t(int, ice_get_avail_rxq_count(pf),
- num_online_cpus());
+ /* default to 1 Rx queue per MSI-X to not hurt our performance */
+ vsi->alloc_rxq = min3(pf->num_lan_msix,
+ ice_get_avail_rxq_count(pf),
+ (u16)num_online_cpus());
if (vsi->req_rxq) {
vsi->alloc_rxq = vsi->req_rxq;
vsi->num_rxq = vsi->req_rxq;
@@ -184,7 +188,8 @@ static void ice_vsi_set_num_qs(struct ice_vsi *vsi, u16 vf_id)
pf->num_lan_rx = vsi->alloc_rxq;
- vsi->num_q_vectors = max_t(int, vsi->alloc_rxq, vsi->alloc_txq);
+ vsi->num_q_vectors = min_t(int, pf->num_lan_msix,
+ max_t(int, vsi->alloc_rxq, vsi->alloc_txq));
break;
case ICE_VSI_VF:
vf = &pf->vf[vsi->vf_id];
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 6e251dfffc91..040e868a0637 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3367,95 +3367,105 @@ static int ice_init_pf(struct ice_pf *pf)
return 0;
}
+static int ice_alloc_msix_entries(struct ice_pf *pf, u16 num_entries)
+{
+ u16 i;
+
+ pf->msix_entries = devm_kcalloc(ice_pf_to_dev(pf), num_entries,
+ sizeof(*pf->msix_entries), GFP_KERNEL);
+ if (!pf->msix_entries)
+ return -ENOMEM;
+
+ for (i = 0; i < num_entries; i++)
+ pf->msix_entries[i].entry = i;
+
+ return 0;
+}
+
+static void ice_free_msix_entries(struct ice_pf *pf)
+{
+ devm_kfree(ice_pf_to_dev(pf), pf->msix_entries);
+ pf->msix_entries = NULL;
+}
+
/**
- * ice_ena_msix_range - Request a range of MSIX vectors from the OS
+ * ice_ena_msix_range - request a range of MSI-X vectors from the OS
* @pf: board private structure
*
- * compute the number of MSIX vectors required (v_budget) and request from
- * the OS. Return the number of vectors reserved or negative on failure
+ * The driver first tries to enable best-case scenario MSI-X vectors. If that
+ * doesn't succeed then a fall-back method is employed.
+ *
+ * The fall-back logic is described below with each [#] being an attempt at
+ * enabling a certain number of MSI-X. If any of the steps succeed, then return
+ * the number of MSI-X enabled from pci_enable_msix_exact(). If an attempt
+ * fails, then go to the next step.
+ *
+ * Attempt [0]: Enable the best-case scenario MSI-X vectors.
+ *
+ * Attempt [1]: Enable MSI-X vectors with the number of pf->num_lan_msix reduced
+ * by a factor of 2 from the previous attempt (i.e. num_online_cpus() / 2).
+ *
+ * Attempt [2]: Same as attempt [1], except reduce by a factor of 4.
+ *
+ * Attempt [3]: Enable the bare-minimum MSI-X vectors.
+ *
+ * Also, if the adjusted_base_msix ever hits the minimum required for LAN, then
+ * just set the needed MSI-X for that feature to the minimum (similar to
+ * attempt [3]).
*/
static int ice_ena_msix_range(struct ice_pf *pf)
{
+ int err = -ENOSPC, num_cpus, attempt, adjusted_msix_divisor = 1, needed;
struct device *dev = ice_pf_to_dev(pf);
- int v_left, v_actual, v_budget = 0;
- int needed, err, i;
-
- v_left = pf->hw.func_caps.common_cap.num_msix_vectors;
-
- /* reserve one vector for miscellaneous handler */
- needed = 1;
- if (v_left < needed)
- goto no_hw_vecs_left_err;
- v_budget += needed;
- v_left -= needed;
-
- /* reserve vectors for LAN traffic */
- needed = min_t(int, num_online_cpus(), v_left);
- if (v_left < needed)
- goto no_hw_vecs_left_err;
- pf->num_lan_msix = needed;
- v_budget += needed;
- v_left -= needed;
-
- /* reserve one vector for flow director */
- if (test_bit(ICE_FLAG_FD_ENA, pf->flags)) {
- needed = ICE_FDIR_MSIX;
- if (v_left < needed)
- goto no_hw_vecs_left_err;
- v_budget += needed;
- v_left -= needed;
- }
-
- pf->msix_entries = devm_kcalloc(dev, v_budget,
- sizeof(*pf->msix_entries), GFP_KERNEL);
- if (!pf->msix_entries) {
- err = -ENOMEM;
- goto exit_err;
- }
+ num_cpus = num_online_cpus();
- for (i = 0; i < v_budget; i++)
- pf->msix_entries[i].entry = i;
+#define ICE_MAX_ENABLE_MSIX_ATTEMPTS 3
+ /* make multiple passes at enabling MSI-X vectors in case there aren't
+ * enough available for the best-case scenario
+ */
+ for (attempt = 0; attempt <= ICE_MAX_ENABLE_MSIX_ATTEMPTS; attempt++) {
+ int adjusted_base_msix = num_cpus / adjusted_msix_divisor;
- /* actually reserve the vectors */
- v_actual = pci_enable_msix_range(pf->pdev, pf->msix_entries,
- ICE_MIN_MSIX, v_budget);
-
- if (v_actual < 0) {
- dev_err(dev, "unable to reserve MSI-X vectors\n");
- err = v_actual;
- goto msix_err;
- }
-
- if (v_actual < v_budget) {
- dev_warn(dev, "not enough OS MSI-X vectors. requested = %d, obtained = %d\n",
- v_budget, v_actual);
-/* 2 vectors each for LAN and RDMA (traffic + OICR), one for flow director */
-#define ICE_MIN_LAN_VECS 2
-#define ICE_MIN_RDMA_VECS 2
-#define ICE_MIN_VECS (ICE_MIN_LAN_VECS + ICE_MIN_RDMA_VECS + 1)
-
- if (v_actual < ICE_MIN_LAN_VECS) {
- /* error if we can't get minimum vectors */
- pci_disable_msix(pf->pdev);
- err = -ERANGE;
- goto msix_err;
+ /* attempt to enable minimum MSI-X range */
+ if (attempt == ICE_MAX_ENABLE_MSIX_ATTEMPTS) {
+ needed = ICE_MIN_MSIX;
+ pf->num_lan_msix = ICE_MIN_LAN_MSIX;
} else {
- pf->num_lan_msix = ICE_MIN_LAN_VECS;
+ if (adjusted_base_msix > ICE_MIN_LAN_MSIX)
+ pf->num_lan_msix = adjusted_base_msix;
+ else
+ pf->num_lan_msix = ICE_MIN_LAN_MSIX;
+
+ needed = pf->num_lan_msix + ICE_OICR_MSIX;
}
- }
- return v_actual;
+ if (test_bit(ICE_FLAG_FD_ENA, pf->flags))
+ needed += ICE_FDIR_MSIX;
-msix_err:
- devm_kfree(dev, pf->msix_entries);
- goto exit_err;
+ err = ice_alloc_msix_entries(pf, needed);
+ if (err)
+ goto err_out;
+
+ dev_dbg(dev, "attempting to enable %d MSI-X vectors\n", needed);
+ err = pci_enable_msix_exact(pf->pdev, pf->msix_entries, needed);
+ if (err < 0) {
+ ice_free_msix_entries(pf);
+ dev_notice(dev, "Couldn't get %d MSI-X vectors due to OS, Platform, and/or PCI-function limitations. Reducing request and retrying.",
+ needed);
+
+ adjusted_msix_divisor *= 2;
+ } else {
+ if (pf->num_lan_msix != num_cpus)
+ dev_notice(dev, "Enabled %d MSI-X vectors for LAN traffic.\n",
+ pf->num_lan_msix);
+
+ return needed;
+ }
+ }
-no_hw_vecs_left_err:
- dev_err(dev, "not enough device MSI-X vectors. requested = %d, available = %d\n",
- needed, v_left);
- err = -ERANGE;
-exit_err:
+err_out:
+ dev_err(dev, "failed to enable MSI-X vectors\n");
pf->num_lan_msix = 0;
return err;
}
@@ -3467,8 +3477,7 @@ static int ice_ena_msix_range(struct ice_pf *pf)
static void ice_dis_msix(struct ice_pf *pf)
{
pci_disable_msix(pf->pdev);
- devm_kfree(ice_pf_to_dev(pf), pf->msix_entries);
- pf->msix_entries = NULL;
+ ice_free_msix_entries(pf);
}
/**
--
2.26.2
* Re: [PATCH net-next 1/1] ice: Improve MSI-X vector enablement fallback logic
From: Jakub Kicinski @ 2021-01-15 0:42 UTC
To: Tony Nguyen; +Cc: davem, Brett Creeley, netdev, sassmann, Tony Brelinski
On Wed, 13 Jan 2021 15:42:26 -0800 Tony Nguyen wrote:
> From: Brett Creeley <brett.creeley@intel.com>
>
> The current MSI-X enablement logic tries to enable best-case MSI-X
> vectors and, if that fails, falls back to a bare-minimum set.
> This is not very flexible and the current logic is broken when actually
> allocating and reserving MSI-X in the driver. Fix this by improving the
> fall-back logic and fixing the allocation/reservation of MSI-X when the
> best-case MSI-X vectors are not received from the OS.
>
> The new fall-back logic is described below with each [#] being an
> attempt at enabling a certain number of MSI-X. If any of the steps
> succeed, then return the number of MSI-X enabled from
> ice_ena_msix_range(). If an attempt fails, then go to the next step.
>
> Attempt [0]: Enable the best-case scenario MSI-X vectors.
>
> Attempt [1]: Enable MSI-X vectors with the number of pf->num_lan_msix
> reduced by a factor of 2 from the previous attempt (i.e.
> num_online_cpus() / 2).
>
> Attempt [2]: Same as attempt [1], except reduce by a factor of 4.
>
> Attempt [3]: Enable the bare-minimum MSI-X vectors.
>
> Also, if the adjusted_base_msix ever hits the minimum required for LAN,
> then just set the needed MSI-X for that feature to the minimum
> (similar to attempt [3]).
I don't really get why you switch to this manual "exponential back-off"
rather than continuing to use pci_enable_msix_range(), but fixing the
capping to ICE_MIN_LAN_VECS.
> To fix the allocation/reservation of MSI-X, the PF VSI needs to take
> into account the pf->num_lan_msix available and only allocate up to that
> many MSI-X vectors. To do this, limit the number of Tx and Rx queues
> based on pf->num_lan_msix. This is done because we don't want more than
> 1 Tx/Rx queue per interrupt due to performance concerns. Also, limit the
> number of MSI-X based on pf->num_lan_msix available.
>
> Also, prevent users from enabling more combined queues than there are
> MSI-X available via ethtool.
Right, this part sounds like a fix, and should be a separate patch
against net, not net-next.
> Fixes: 152b978a1f90 ("ice: Rework ice_ena_msix_range")
> Signed-off-by: Brett Creeley <brett.creeley@intel.com>
> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
* Re: [PATCH net-next 1/1] ice: Improve MSI-X vector enablement fallback logic
From: Venkataramanan, Anirudh @ 2021-01-20 0:12 UTC
To: kuba, Nguyen, Anthony L
Cc: Brelinski, TonyX, sassmann, Creeley, Brett, davem, netdev
On Thu, 2021-01-14 at 16:42 -0800, Jakub Kicinski wrote:
> On Wed, 13 Jan 2021 15:42:26 -0800 Tony Nguyen wrote:
> > From: Brett Creeley <brett.creeley@intel.com>
> >
> > The current MSI-X enablement logic tries to enable best-case MSI-X
> > vectors and, if that fails, falls back to a bare-minimum set. This
> > is not very flexible and the current logic is broken when actually
> > allocating and reserving MSI-X in the driver. Fix this by improving
> > the fall-back logic and fixing the allocation/reservation of MSI-X
> > when the best-case MSI-X vectors are not received from the OS.
> >
> > The new fall-back logic is described below with each [#] being an
> > attempt at enabling a certain number of MSI-X. If any of the steps
> > succeed, then return the number of MSI-X enabled from
> > ice_ena_msix_range(). If an attempt fails, then go to the next step.
> >
> > Attempt [0]: Enable the best-case scenario MSI-X vectors.
> >
> > Attempt [1]: Enable MSI-X vectors with the number of
> > pf->num_lan_msix reduced by a factor of 2 from the previous attempt
> > (i.e. num_online_cpus() / 2).
> >
> > Attempt [2]: Same as attempt [1], except reduce by a factor of 4.
> >
> > Attempt [3]: Enable the bare-minimum MSI-X vectors.
> >
> > Also, if the adjusted_base_msix ever hits the minimum required for
> > LAN, then just set the needed MSI-X for that feature to the minimum
> > (similar to attempt [3]).
>
> I don't really get why you switch to this manual "exponential
> back-off" rather than continuing to use pci_enable_msix_range(), but
> fixing the capping to ICE_MIN_LAN_VECS.
As per the current logic, if the driver does not get the number of
MSI-X vectors it needs, it will immediately drop to "Do I have at
least two (ICE_MIN_LAN_VECS) MSI-X vectors?". If yes, the driver will
enable a single Tx/Rx traffic queue pair, bound to one of the two
MSI-X vectors.
This is a bit of an all-or-nothing type approach. There's a mid-ground
that can allow more queues to be enabled (ex. driver asked for 300
vectors, but got 68 vectors, so enabled 64 data queues) and this patch
implements the mid-ground logic.
This mid-ground logic can also be implemented based on the return value
of pci_enable_msix_range(), but IMHO the implementation in this patch
using pci_enable_msix_exact() is better because it's always only
enabling/reserving as many MSI-X vectors as required, not more, not
less.
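For reference, a rough and untested sketch of the range-based
alternative being discussed (assuming pf->msix_entries has already
been allocated and sized for the best case, as in this patch):

	v_wanted = num_online_cpus() + ICE_OICR_MSIX;
	if (test_bit(ICE_FLAG_FD_ENA, pf->flags))
		v_wanted += ICE_FDIR_MSIX;

	v_actual = pci_enable_msix_range(pf->pdev, pf->msix_entries,
					 ICE_MIN_MSIX, v_wanted);
	if (v_actual < 0)
		return v_actual;

	/* LAN keeps whatever the OS granted beyond the reserved vectors */
	pf->num_lan_msix = v_actual - (v_wanted - num_online_cpus());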
>
> > To fix the allocation/reservation of MSI-X, the PF VSI needs to take
> > into account the pf->num_lan_msix available and only allocate up to
> > that many MSI-X vectors. To do this, limit the number of Tx and Rx
> > queues based on pf->num_lan_msix. This is done because we don't want
> > more than 1 Tx/Rx queue per interrupt due to performance concerns.
> > Also, limit the number of MSI-X based on pf->num_lan_msix available.
> >
> > Also, prevent users from enabling more combined queues than there
> > are MSI-X available via ethtool.
>
> Right, this part sounds like a fix, and should be a separate patch
> against net, not net-next.
ACK. We will do a different patch for this.
>
> > Fixes: 152b978a1f90 ("ice: Rework ice_ena_msix_range")
> > Signed-off-by: Brett Creeley <brett.creeley@intel.com>
> > Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
> > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
* Re: [PATCH net-next 1/1] ice: Improve MSI-X vector enablement fallback logic
From: Jakub Kicinski @ 2021-01-20 0:41 UTC
To: Venkataramanan, Anirudh
Cc: Nguyen, Anthony L, Brelinski, TonyX, sassmann, Creeley, Brett,
davem, netdev
On Wed, 20 Jan 2021 00:12:26 +0000 Venkataramanan, Anirudh wrote:
> > > Attempt [0]: Enable the best-case scenario MSI-X vectors.
> > >
> > > Attempt [1]: Enable MSI-X vectors with the number of
> > > pf->num_lan_msix reduced by a factor of 2 from the previous
> > > attempt (i.e. num_online_cpus() / 2).
> > >
> > > Attempt [2]: Same as attempt [1], except reduce by a factor of 4.
> > >
> > > Attempt [3]: Enable the bare-minimum MSI-X vectors.
> > >
> > > Also, if the adjusted_base_msix ever hits the minimum required
> > > for LAN, then just set the needed MSI-X for that feature to the
> > > minimum (similar to attempt [3]).
> >
> > I don't really get why you switch to this manual "exponential
> > back-off" rather than continuing to use pci_enable_msix_range(),
> > but fixing the capping to ICE_MIN_LAN_VECS.
>
> As per the current logic, if the driver does not get the number of
> MSI-X vectors it needs, it will immediately drop to "Do I have at
> least two (ICE_MIN_LAN_VECS) MSI-X vectors?". If yes, the driver will
> enable a single Tx/Rx traffic queue pair, bound to one of the two
> MSI-X vectors.
>
> This is a bit of an all-or-nothing type approach. There's a mid-ground
> that can allow more queues to be enabled (ex. driver asked for 300
> vectors, but got 68 vectors, so enabled 64 data queues) and this patch
> implements the mid-ground logic.
>
> This mid-ground logic can also be implemented based on the return
> value of pci_enable_msix_range(), but IMHO the implementation in this
> patch using pci_enable_msix_exact() is better because it's always only
> enabling/reserving as many MSI-X vectors as required, not more, not
> less.
What do you mean by "required" in the last sentence? The driver
requests num_online_cpus()-worth of IRQs, so it must work with any
number of IRQs. Why is num_cpus() / 1,2,4,8 "required"?
* Re: [PATCH net-next 1/1] ice: Improve MSI-X vector enablement fallback logic
From: Venkataramanan, Anirudh @ 2021-01-20 2:13 UTC
To: kuba
Cc: Brelinski, TonyX, sassmann, Nguyen, Anthony L, davem, Creeley,
Brett, netdev
On Tue, 2021-01-19 at 16:41 -0800, Jakub Kicinski wrote:
> On Wed, 20 Jan 2021 00:12:26 +0000 Venkataramanan, Anirudh wrote:
> > > > Attempt [0]: Enable the best-case scenario MSI-X vectors.
> > > >
> > > > Attempt [1]: Enable MSI-X vectors with the number of
> > > > pf->num_lan_msix reduced by a factor of 2 from the previous
> > > > attempt (i.e. num_online_cpus() / 2).
> > > >
> > > > Attempt [2]: Same as attempt [1], except reduce by a factor of 4.
> > > >
> > > > Attempt [3]: Enable the bare-minimum MSI-X vectors.
> > > >
> > > > Also, if the adjusted_base_msix ever hits the minimum required
> > > > for LAN, then just set the needed MSI-X for that feature to the
> > > > minimum (similar to attempt [3]).
> > >
> > > I don't really get why you switch to this manual "exponential
> > > back-off" rather than continuing to use pci_enable_msix_range(),
> > > but fixing the capping to ICE_MIN_LAN_VECS.
> >
> > As per the current logic, if the driver does not get the number of
> > MSI-X vectors it needs, it will immediately drop to "Do I have at
> > least two (ICE_MIN_LAN_VECS) MSI-X vectors?". If yes, the driver
> > will enable a single Tx/Rx traffic queue pair, bound to one of the
> > two MSI-X vectors.
> >
> > This is a bit of an all-or-nothing type approach. There's a
> > mid-ground that can allow more queues to be enabled (ex. driver
> > asked for 300 vectors, but got 68 vectors, so enabled 64 data
> > queues) and this patch implements the mid-ground logic.
> >
> > This mid-ground logic can also be implemented based on the return
> > value of pci_enable_msix_range(), but IMHO the implementation in
> > this patch using pci_enable_msix_exact() is better because it's
> > always only enabling/reserving as many MSI-X vectors as required,
> > not more, not less.
>
> What do you mean by "required" in the last sentence?
.. as "required" in that particular iteration of the loop.
> The driver requests num_online_cpus()-worth of IRQs, so it must work
> with any number of IRQs. Why is num_cpus() / 1,2,4,8 "required"?
Let me back up a bit here.
Ultimately, the issue we are trying to solve here is "what happens when
the driver doesn't get as many MSI-X vectors as it needs, and how it's
interpreted by the end user"
Let's say there are these two systems, each with 256 cores but the
response to pci_enable_msix_range() is different:
System 1: 256 cores, pci_enable_msix_range returns 75 vectors
System 2: 256 cores, pci_enable_msix_range returns 220 vectors
In this case, the number of queues the user would see enabled on each
of these systems would be very different (73 on system 1 and 218 on
system 2, after the OICR and flow director vectors are reserved). This
variability makes it difficult to define what the expected behavior
should be, because it's not exactly obvious to the user how many free
MSI-X vectors a given system has. Instead, if the driver reduced its
demand for vectors in a well-defined manner (num_cpus() / 1,2,4,8),
the user-visible difference between the two systems wouldn't be so
drastic.
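To make that concrete (hypothetical numbers, ignoring the flow
director vector): with 256 CPUs the attempts would ask for exactly
256 + 1, 128 + 1 and 64 + 1 vectors, so system 1 (75 free) would
settle at 64 LAN vectors and system 2 (220 free) at 128. Both land on
predictable power-of-two steps instead of arbitrary leftovers.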
If this is plain wrong or if there's a preferred approach, I'd be happy
to discuss further.
Ani
* Re: [PATCH net-next 1/1] ice: Improve MSI-X vector enablement fallback logic
From: Jakub Kicinski @ 2021-01-20 2:29 UTC
To: Venkataramanan, Anirudh
Cc: Brelinski, TonyX, sassmann, Nguyen, Anthony L, davem, Creeley,
Brett, netdev
On Wed, 20 Jan 2021 02:13:36 +0000 Venkataramanan, Anirudh wrote:
> > > As per the current logic, if the driver does not get the number
> > > of MSI-X vectors it needs, it will immediately drop to "Do I have
> > > at least two (ICE_MIN_LAN_VECS) MSI-X vectors?". If yes, the
> > > driver will enable a single Tx/Rx traffic queue pair, bound to
> > > one of the two MSI-X vectors.
> > >
> > > This is a bit of an all-or-nothing type approach. There's a
> > > mid-ground that can allow more queues to be enabled (ex. driver
> > > asked for 300 vectors, but got 68 vectors, so enabled 64 data
> > > queues) and this patch implements the mid-ground logic.
> > >
> > > This mid-ground logic can also be implemented based on the return
> > > value of pci_enable_msix_range(), but IMHO the implementation in
> > > this patch using pci_enable_msix_exact() is better because it's
> > > always only enabling/reserving as many MSI-X vectors as required,
> > > not more, not less.
> >
> > What do you mean by "required" in the last sentence?
>
> .. as "required" in that particular iteration of the loop.
>
> > The driver requests num_online_cpus()-worth of IRQs, so it must
> > work with any number of IRQs. Why is num_cpus() / 1,2,4,8
> > "required"?
>
> Let me back up a bit here.
>
> Ultimately, the issue we are trying to solve here is "what happens when
> the driver doesn't get as many MSI-X vectors as it needs, and how it's
> interpreted by the end user"
>
> Let's say there are these two systems, each with 256 cores but the
> response to pci_enable_msix_range() is different:
>
> System 1: 256 cores, pci_enable_msix_range returns 75 vectors
> System 2: 256 cores, pci_enable_msix_range returns 220 vectors
>
> In this case, the number of queues the user would see enabled on each
> of these systems would be very different (73 on system 1 and 218 on
> system 2, after the OICR and flow director vectors are reserved). This
> variability makes it difficult to define what the expected behavior
> should be, because it's not exactly obvious to the user how many free
> MSI-X vectors a given system has. Instead, if the driver reduced its
> demand for vectors in a well-defined manner (num_cpus() / 1,2,4,8),
> the user-visible difference between the two systems wouldn't be so
> drastic.
>
> If this is plain wrong or if there's a preferred approach, I'd be happy
> to discuss further.
Let's stick to the standard Linux way of handling IRQ exhaustion, and
rely on pci_enable_msix_range() to pick the number. If the current
behavior of pci_enable_msix_range() is not what users want, we can
change it. Each driver creating its own heuristic is the worst of all
choices as most brownfield deployments will have a mix of NICs.
* Re: [PATCH net-next 1/1] ice: Improve MSI-X vector enablement fallback logic
From: Venkataramanan, Anirudh @ 2021-01-21 17:41 UTC
To: kuba
Cc: Brelinski, TonyX, sassmann, Nguyen, Anthony L, davem, Creeley,
Brett, netdev
On Tue, 2021-01-19 at 18:29 -0800, Jakub Kicinski wrote:
> On Wed, 20 Jan 2021 02:13:36 +0000 Venkataramanan, Anirudh wrote:
> > > > As per the current logic, if the driver does not get the number
> > > > of MSI-X vectors it needs, it will immediately drop to "Do I
> > > > have at least two (ICE_MIN_LAN_VECS) MSI-X vectors?". If yes,
> > > > the driver will enable a single Tx/Rx traffic queue pair, bound
> > > > to one of the two MSI-X vectors.
> > > >
> > > > This is a bit of an all-or-nothing type approach. There's a
> > > > mid-ground that can allow more queues to be enabled (ex. driver
> > > > asked for 300 vectors, but got 68 vectors, so enabled 64 data
> > > > queues) and this patch implements the mid-ground logic.
> > > >
> > > > This mid-ground logic can also be implemented based on the
> > > > return value of pci_enable_msix_range(), but IMHO the
> > > > implementation in this patch using pci_enable_msix_exact() is
> > > > better because it's always only enabling/reserving as many MSI-X
> > > > vectors as required, not more, not less.
> > >
> > > What do you mean by "required" in the last sentence?
> >
> > .. as "required" in that particular iteration of the loop.
> >
> > > The driver requests num_online_cpus()-worth of IRQs, so it must
> > > work with any number of IRQs. Why is num_cpus() / 1,2,4,8
> > > "required"?
> >
> > Let me back up a bit here.
> >
> > Ultimately, the issue we are trying to solve here is "what happens
> > when the driver doesn't get as many MSI-X vectors as it needs, and
> > how it's interpreted by the end user"
> >
> > Let's say there are these two systems, each with 256 cores but the
> > response to pci_enable_msix_range() is different:
> >
> > System 1: 256 cores, pci_enable_msix_range returns 75 vectors
> > System 2: 256 cores, pci_enable_msix_range returns 220 vectors
> >
> > In this case, the number of queues the user would see enabled on
> > each of these systems would be very different (73 on system 1 and
> > 218 on system 2, after the OICR and flow director vectors are
> > reserved). This variability makes it difficult to define what the
> > expected behavior should be, because it's not exactly obvious to
> > the user how many free MSI-X vectors a given system has. Instead,
> > if the driver reduced its demand for vectors in a well-defined
> > manner (num_cpus() / 1,2,4,8), the user-visible difference between
> > the two systems wouldn't be so drastic.
> >
> > If this is plain wrong or if there's a preferred approach, I'd be
> > happy to discuss further.
>
> Let's stick to the standard Linux way of handling IRQ exhaustion, and
> rely on pci_enable_msix_range() to pick the number. If the current
> behavior of pci_enable_msix_range() is not what users want, we can
> change it. Each driver creating its own heuristic is the worst of all
> choices as most brownfield deployments will have a mix of NICs.
Okay, we will rework the fallback improvement logic.
Just so you are aware, we will be posting a couple of bug-fix patches
that fix issues around the current fallback logic.