* [PATCH net-next 00/13] resource management using devlink reload
@ 2022-11-14 12:57 Michal Swiatkowski
  2022-11-14 12:57 ` [PATCH net-next 01/13] ice: move RDMA init to ice_idc.c Michal Swiatkowski
                   ` (13 more replies)
  0 siblings, 14 replies; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-14 12:57 UTC (permalink / raw)
  To: netdev, davem, kuba, pabeni, edumazet
  Cc: intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	sridhar.samudrala, wojciech.drewek, lukasz.czapnik,
	shiraz.saleem, jesse.brandeburg, mustafa.ismail,
	przemyslaw.kitszel, piotr.raczynski, jacob.e.keller,
	david.m.ertman, leszek.kaliszczuk, Michal Swiatkowski

Currently the default number of PF vectors is the number of CPUs.
Because of that there are cases where all vectors are consumed by the PF
and the user can't create more VFs. It is hard to pick a default that is
right for all the different use cases, so instead let the user choose
how many vectors should be used for the various features. After
subdevices are implemented, this mechanism will also be used to set the
number of vectors for subfunctions.

The idea is to set the vectors for eth or VFs using the devlink resource
API. The new vector count takes effect after a devlink reload. Example
commands:
$ sudo devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
$ sudo devlink dev reload pci/0000:31:00.0
After the reload the driver will use 16 vectors for eth instead of
num_cpus.
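
The configured sizes can be inspected with the standard resource dump
(the msix/msix_eth naming follows the resource layout registered in
patch 12):
$ sudo devlink resource show pci/0000:31:00.0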

The default number of queues is implicitly derived from the interrupt
vectors and can later be changed by ethtool.
To decrease the queues used for eth, the user can decrease the eth
vectors; the result is the same. The user can still change the number of
queues using ethtool:
$ sudo ethtool -L enp24s0f0 tx 72 rx 72
but the maximum number of queues is bounded by the number of vectors.

Most of this patchset is about implementing the driver reload mechanism.
To avoid duplication, the reload path reuses code from probe and
rebuild. To allow this reuse, the probe and rebuild paths are split into
smaller functions.

Patch "ice: split ice_vsi_setup into smaller functions" changes
boolean variable in function call to integer and adds define
for it. Instead of having the function called with true/false now it
can be called with readable defines ICE_VSI_FLAG_INIT or
ICE_VSI_FLAG_NO_INIT. It was suggested by Jacob Keller and probably this
mechanism will be implemented across ice driver in follow up patchset.
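
A minimal illustration, using the defines exactly as added in the
ice_lib.h hunk of that patch:

#define ICE_VSI_FLAG_INIT	BIT(0)
#define ICE_VSI_FLAG_NO_INIT	0

/* before: ice_vsi_rebuild(vsi, true);  */
err = ice_vsi_rebuild(vsi, ICE_VSI_FLAG_INIT);
/* before: ice_vsi_rebuild(vsi, false); */
err = ice_vsi_rebuild(vsi, ICE_VSI_FLAG_NO_INIT);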

Patch 1 - 10  -> clean up code so that most of the already implemented
                 functions can be reused in the reload path
Patch 11      -> implement the devlink reload API
Patch 12      -> prepare the interrupt reservation, make irdma see the
                 changeable vector count
Patch 13      -> change the number of vectors


Jacob Keller (1):
  ice: stop hard coding the ICE_VSI_CTRL location

Michal Kubiak (1):
  devlink, ice: add MSIX vectors as devlink resource

Michal Swiatkowski (11):
  ice: move RDMA init to ice_idc.c
  ice: alloc id for RDMA using xa_array
  ice: cleanup in VSI config/deconfig code
  ice: split ice_vsi_setup into smaller functions
  ice: split probe into smaller functions
  ice: sync netdev filters after clearing VSI
  ice: move VSI delete outside deconfig
  ice: update VSI instead of init in some case
  ice: implement devlink reinit action
  ice: introduce eswitch capable flag
  ice, irdma: prepare reservation of MSI-X to reload

 .../networking/devlink/devlink-resource.rst   |   10 +
 drivers/infiniband/hw/irdma/main.c            |    2 +-
 drivers/net/ethernet/intel/ice/ice.h          |   23 +-
 drivers/net/ethernet/intel/ice/ice_common.c   |   11 +-
 drivers/net/ethernet/intel/ice/ice_devlink.c  |  263 +++-
 drivers/net/ethernet/intel/ice/ice_devlink.h  |    2 +
 drivers/net/ethernet/intel/ice/ice_eswitch.c  |    6 +
 drivers/net/ethernet/intel/ice/ice_ethtool.c  |    6 +-
 drivers/net/ethernet/intel/ice/ice_fltr.c     |    5 +
 drivers/net/ethernet/intel/ice/ice_idc.c      |   57 +-
 drivers/net/ethernet/intel/ice/ice_lib.c      |  789 +++++-----
 drivers/net/ethernet/intel/ice/ice_lib.h      |    8 +-
 drivers/net/ethernet/intel/ice/ice_main.c     | 1354 ++++++++++-------
 drivers/net/ethernet/intel/ice/ice_sriov.c    |    3 +-
 drivers/net/ethernet/intel/ice/ice_vf_lib.c   |    2 +-
 include/net/devlink.h                         |   14 +
 16 files changed, 1517 insertions(+), 1038 deletions(-)

-- 
2.36.1



* [PATCH net-next 01/13] ice: move RDMA init to ice_idc.c
  2022-11-14 12:57 [PATCH net-next 00/13] resource management using devlink reload Michal Swiatkowski
@ 2022-11-14 12:57 ` Michal Swiatkowski
  2022-11-14 12:57 ` [PATCH net-next 02/13] ice: alloc id for RDMA using xa_array Michal Swiatkowski
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-14 12:57 UTC (permalink / raw)
  To: netdev, davem, kuba, pabeni, edumazet
  Cc: intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	sridhar.samudrala, wojciech.drewek, lukasz.czapnik,
	shiraz.saleem, jesse.brandeburg, mustafa.ismail,
	przemyslaw.kitszel, piotr.raczynski, jacob.e.keller,
	david.m.ertman, leszek.kaliszczuk, Michal Swiatkowski

Simplify the probe flow by moving all RDMA-related code to
ice_init_rdma(). Unroll the IRQ allocation if RDMA initialization fails.

Implement ice_deinit_rdma() and use it in the remove flow.
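
The resulting init/deinit pairing, condensed from the diff below (the
error labels unwind in reverse order of setup):

	ice_init_rdma(pf)
		ida_alloc(&ice_aux_ida, ...)   /* pf->aux_idx */
		ice_reserve_rdma_qvector(pf)   /* on error: free aux_idx */
		ice_plug_aux_dev(pf)           /* on error: free qvector + aux_idx */

	ice_deinit_rdma(pf)
		ice_unplug_aux_dev(pf)
		ice_free_rdma_qvector(pf)
		ida_free(&ice_aux_ida, pf->aux_idx)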

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Acked-by: Dave Ertman <david.m.ertman@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h      |  1 +
 drivers/net/ethernet/intel/ice/ice_idc.c  | 52 ++++++++++++++++++++++-
 drivers/net/ethernet/intel/ice/ice_main.c | 29 +++----------
 3 files changed, 57 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index f88ee051e71c..c0079e88cda7 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -900,6 +900,7 @@ void ice_print_link_msg(struct ice_vsi *vsi, bool isup);
 int ice_plug_aux_dev(struct ice_pf *pf);
 void ice_unplug_aux_dev(struct ice_pf *pf);
 int ice_init_rdma(struct ice_pf *pf);
+void ice_deinit_rdma(struct ice_pf *pf);
 const char *ice_aq_str(enum ice_aq_err aq_err);
 bool ice_is_wol_supported(struct ice_hw *hw);
 void ice_fdir_del_all_fltrs(struct ice_vsi *vsi);
diff --git a/drivers/net/ethernet/intel/ice/ice_idc.c b/drivers/net/ethernet/intel/ice/ice_idc.c
index 895c32bcc8b5..579d2a433ea1 100644
--- a/drivers/net/ethernet/intel/ice/ice_idc.c
+++ b/drivers/net/ethernet/intel/ice/ice_idc.c
@@ -6,6 +6,8 @@
 #include "ice_lib.h"
 #include "ice_dcb_lib.h"
 
+static DEFINE_IDA(ice_aux_ida);
+
 /**
  * ice_get_auxiliary_drv - retrieve iidc_auxiliary_drv struct
  * @pf: pointer to PF struct
@@ -245,6 +247,17 @@ static int ice_reserve_rdma_qvector(struct ice_pf *pf)
 	return 0;
 }
 
+/**
+ * ice_free_rdma_qvector - free vector resources reserved for RDMA driver
+ * @pf: board private structure to initialize
+ */
+static void ice_free_rdma_qvector(struct ice_pf *pf)
+{
+	pf->num_avail_sw_msix -= pf->num_rdma_msix;
+	ice_free_res(pf->irq_tracker, pf->rdma_base_vector,
+		     ICE_RES_RDMA_VEC_ID);
+}
+
 /**
  * ice_adev_release - function to be mapped to AUX dev's release op
  * @dev: pointer to device to free
@@ -331,12 +344,47 @@ int ice_init_rdma(struct ice_pf *pf)
 	struct device *dev = &pf->pdev->dev;
 	int ret;
 
+	if (!ice_is_rdma_ena(pf)) {
+		dev_warn(dev, "RDMA is not supported on this device\n");
+		return 0;
+	}
+
+	pf->aux_idx = ida_alloc(&ice_aux_ida, GFP_KERNEL);
+	if (pf->aux_idx < 0) {
+		dev_err(dev, "Failed to allocate device ID for AUX driver\n");
+		return -ENOMEM;
+	}
+
 	/* Reserve vector resources */
 	ret = ice_reserve_rdma_qvector(pf);
 	if (ret < 0) {
 		dev_err(dev, "failed to reserve vectors for RDMA\n");
-		return ret;
+		goto err_reserve_rdma_qvector;
 	}
 	pf->rdma_mode |= IIDC_RDMA_PROTOCOL_ROCEV2;
-	return ice_plug_aux_dev(pf);
+	ret = ice_plug_aux_dev(pf);
+	if (ret)
+		goto err_plug_aux_dev;
+	return 0;
+
+err_plug_aux_dev:
+	ice_free_rdma_qvector(pf);
+err_reserve_rdma_qvector:
+	pf->adev = NULL;
+	ida_free(&ice_aux_ida, pf->aux_idx);
+	return ret;
+}
+
+/**
+ * ice_deinit_rdma - deinitialize RDMA on PF
+ * @pf: ptr to ice_pf
+ */
+void ice_deinit_rdma(struct ice_pf *pf)
+{
+	if (!ice_is_rdma_ena(pf))
+		return;
+
+	ice_unplug_aux_dev(pf);
+	ice_free_rdma_qvector(pf);
+	ida_free(&ice_aux_ida, pf->aux_idx);
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index a9fc89aebebe..d5bd56a213ae 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -44,7 +44,6 @@ MODULE_PARM_DESC(debug, "netif level (0=none,...,16=all), hw debug_mask (0x8XXXX
 MODULE_PARM_DESC(debug, "netif level (0=none,...,16=all)");
 #endif /* !CONFIG_DYNAMIC_DEBUG */
 
-static DEFINE_IDA(ice_aux_ida);
 DEFINE_STATIC_KEY_FALSE(ice_xdp_locking_key);
 EXPORT_SYMBOL(ice_xdp_locking_key);
 
@@ -4904,30 +4903,16 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 
 	/* ready to go, so clear down state bit */
 	clear_bit(ICE_DOWN, pf->state);
-	if (ice_is_rdma_ena(pf)) {
-		pf->aux_idx = ida_alloc(&ice_aux_ida, GFP_KERNEL);
-		if (pf->aux_idx < 0) {
-			dev_err(dev, "Failed to allocate device ID for AUX driver\n");
-			err = -ENOMEM;
-			goto err_devlink_reg_param;
-		}
-
-		err = ice_init_rdma(pf);
-		if (err) {
-			dev_err(dev, "Failed to initialize RDMA: %d\n", err);
-			err = -EIO;
-			goto err_init_aux_unroll;
-		}
-	} else {
-		dev_warn(dev, "RDMA is not supported on this device\n");
+	err = ice_init_rdma(pf);
+	if (err) {
+		dev_err(dev, "Failed to initialize RDMA: %d\n", err);
+		err = -EIO;
+		goto err_devlink_reg_param;
 	}
 
 	ice_devlink_register(pf);
 	return 0;
 
-err_init_aux_unroll:
-	pf->adev = NULL;
-	ida_free(&ice_aux_ida, pf->aux_idx);
 err_devlink_reg_param:
 	ice_devlink_unregister_params(pf);
 err_netdev_reg:
@@ -5040,9 +5025,7 @@ static void ice_remove(struct pci_dev *pdev)
 	ice_service_task_stop(pf);
 
 	ice_aq_cancel_waiting_tasks(pf);
-	ice_unplug_aux_dev(pf);
-	if (pf->aux_idx >= 0)
-		ida_free(&ice_aux_ida, pf->aux_idx);
+	ice_deinit_rdma(pf);
 	ice_devlink_unregister_params(pf);
 	set_bit(ICE_DOWN, pf->state);
 
-- 
2.36.1



* [PATCH net-next 02/13] ice: alloc id for RDMA using xa_array
  2022-11-14 12:57 [PATCH net-next 00/13] resource management using devlink reload Michal Swiatkowski
  2022-11-14 12:57 ` [PATCH net-next 01/13] ice: move RDMA init to ice_idc.c Michal Swiatkowski
@ 2022-11-14 12:57 ` Michal Swiatkowski
  2022-11-14 12:57 ` [PATCH net-next 03/13] ice: cleanup in VSI config/deconfig code Michal Swiatkowski
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-14 12:57 UTC (permalink / raw)
  To: netdev, davem, kuba, pabeni, edumazet
  Cc: intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	sridhar.samudrala, wojciech.drewek, lukasz.czapnik,
	shiraz.saleem, jesse.brandeburg, mustafa.ismail,
	przemyslaw.kitszel, piotr.raczynski, jacob.e.keller,
	david.m.ertman, leszek.kaliszczuk, Michal Swiatkowski

Use an xarray instead of the deprecated IDA API to allocate the ID for
the RDMA aux driver. Note that xa_alloc() returns 0 or a negative errno
and stores the allocated ID through a pointer, unlike ida_alloc() which
returns the ID itself, hence the changed error check.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
 drivers/net/ethernet/intel/ice/ice_idc.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_idc.c b/drivers/net/ethernet/intel/ice/ice_idc.c
index 579d2a433ea1..e6bc2285071e 100644
--- a/drivers/net/ethernet/intel/ice/ice_idc.c
+++ b/drivers/net/ethernet/intel/ice/ice_idc.c
@@ -6,7 +6,7 @@
 #include "ice_lib.h"
 #include "ice_dcb_lib.h"
 
-static DEFINE_IDA(ice_aux_ida);
+static DEFINE_XARRAY_ALLOC1(ice_aux_id);
 
 /**
  * ice_get_auxiliary_drv - retrieve iidc_auxiliary_drv struct
@@ -349,8 +349,9 @@ int ice_init_rdma(struct ice_pf *pf)
 		return 0;
 	}
 
-	pf->aux_idx = ida_alloc(&ice_aux_ida, GFP_KERNEL);
-	if (pf->aux_idx < 0) {
+	ret = xa_alloc(&ice_aux_id, &pf->aux_idx, NULL, XA_LIMIT(1, INT_MAX),
+		       GFP_KERNEL);
+	if (ret) {
 		dev_err(dev, "Failed to allocate device ID for AUX driver\n");
 		return -ENOMEM;
 	}
@@ -371,7 +372,7 @@ int ice_init_rdma(struct ice_pf *pf)
 	ice_free_rdma_qvector(pf);
 err_reserve_rdma_qvector:
 	pf->adev = NULL;
-	ida_free(&ice_aux_ida, pf->aux_idx);
+	xa_erase(&ice_aux_id, pf->aux_idx);
 	return ret;
 }
 
@@ -386,5 +387,5 @@ void ice_deinit_rdma(struct ice_pf *pf)
 
 	ice_unplug_aux_dev(pf);
 	ice_free_rdma_qvector(pf);
-	ida_free(&ice_aux_ida, pf->aux_idx);
+	xa_erase(&ice_aux_id, pf->aux_idx);
 }
-- 
2.36.1



* [PATCH net-next 03/13] ice: cleanup in VSI config/deconfig code
  2022-11-14 12:57 [PATCH net-next 00/13] resource management using devlink reload Michal Swiatkowski
  2022-11-14 12:57 ` [PATCH net-next 01/13] ice: move RDMA init to ice_idc.c Michal Swiatkowski
  2022-11-14 12:57 ` [PATCH net-next 02/13] ice: alloc id for RDMA using xa_array Michal Swiatkowski
@ 2022-11-14 12:57 ` Michal Swiatkowski
  2022-11-14 12:57 ` [PATCH net-next 04/13] ice: split ice_vsi_setup into smaller functions Michal Swiatkowski
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-14 12:57 UTC (permalink / raw)
  To: netdev, davem, kuba, pabeni, edumazet
  Cc: intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	sridhar.samudrala, wojciech.drewek, lukasz.czapnik,
	shiraz.saleem, jesse.brandeburg, mustafa.ismail,
	przemyslaw.kitszel, piotr.raczynski, jacob.e.keller,
	david.m.ertman, leszek.kaliszczuk, Michal Swiatkowski

Do a few small cleanups:

1) Rename the function to reflect that it doesn't configure everything
related to the VSI. ice_vsi_cfg_lan() better fits what the function is
doing.

The name ice_vsi_cfg() can then be used for a function that configures
the whole VSI.

2) Remove the unused ethtype field from the VSI. There is no need to set
the ethtype here, because it is never used.

3) Remove the unnecessary check for ICE_VSI_CHNL. ice_vsi_get_qs()
already checks for ICE_VSI_CHNL, so there is no need to check it before
calling the function.

4) Simplify the ice_vsi_alloc() call. There is no need to check the VSI
type before calling ice_vsi_alloc(). For ICE_VSI_CHNL, vf is always NULL
(ice_vsi_setup() is called with vf=NULL).
For ICE_VSI_VF or ICE_VSI_CTRL, ch is always NULL, and for the other VSI
types both ch and vf are always NULL.

5) Remove the unnecessary call to ice_vsi_dis_irq(). ice_vsi_dis_irq()
is already called in the ice_vsi_close() flow (ice_vsi_close() ->
ice_vsi_down() -> ice_vsi_dis_irq()).

6) Don't remove specific filters in release. All HW filters are removed
in ice_fltr_remove_all(), which is always called in the VSI release
flow. There is no need to remove only the ethertype filters before
calling it.

7) Rename ice_vsi_clear() to ice_vsi_free(). As ice_vsi_clear() only
frees memory allocated in ice_vsi_alloc(), rename it to ice_vsi_free(),
which better reflects what the function is doing.

8) Free the coalesce array in rebuild. There is a potential memory leak
if the VSI LAN configuration fails. Free coalesce to avoid it.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h         |  3 +-
 drivers/net/ethernet/intel/ice/ice_ethtool.c |  2 +-
 drivers/net/ethernet/intel/ice/ice_lib.c     | 51 +++++++-------------
 drivers/net/ethernet/intel/ice/ice_lib.h     |  2 +-
 drivers/net/ethernet/intel/ice/ice_main.c    | 12 ++---
 5 files changed, 26 insertions(+), 44 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index c0079e88cda7..5c5b188fb3e9 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -347,7 +347,6 @@ struct ice_vsi {
 
 	struct ice_vf *vf;		/* VF associated with this VSI */
 
-	u16 ethtype;			/* Ethernet protocol for pause frame */
 	u16 num_gfltr;
 	u16 num_bfltr;
 
@@ -882,7 +881,7 @@ ice_fetch_u64_stats_per_ring(struct u64_stats_sync *syncp,
 int ice_up(struct ice_vsi *vsi);
 int ice_down(struct ice_vsi *vsi);
 int ice_down_up(struct ice_vsi *vsi);
-int ice_vsi_cfg(struct ice_vsi *vsi);
+int ice_vsi_cfg_lan(struct ice_vsi *vsi);
 struct ice_vsi *ice_lb_vsi_setup(struct ice_pf *pf, struct ice_port_info *pi);
 int ice_vsi_determine_xdp_res(struct ice_vsi *vsi);
 int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog);
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index f71a7521c7bd..1dfce48cfdc2 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -656,7 +656,7 @@ static int ice_lbtest_prepare_rings(struct ice_vsi *vsi)
 	if (status)
 		goto err_setup_rx_ring;
 
-	status = ice_vsi_cfg(vsi);
+	status = ice_vsi_cfg_lan(vsi);
 	if (status)
 		goto err_setup_rx_ring;
 
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 7276badfa19e..42203a0efcac 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -348,7 +348,7 @@ static void ice_vsi_free_arrays(struct ice_vsi *vsi)
 }
 
 /**
- * ice_vsi_clear - clean up and deallocate the provided VSI
+ * ice_vsi_free - clean up and deallocate the provided VSI
  * @vsi: pointer to VSI being cleared
  *
  * This deallocates the VSI's queue resources, removes it from the PF's
@@ -356,7 +356,7 @@ static void ice_vsi_free_arrays(struct ice_vsi *vsi)
  *
  * Returns 0 on success, negative on failure
  */
-int ice_vsi_clear(struct ice_vsi *vsi)
+int ice_vsi_free(struct ice_vsi *vsi)
 {
 	struct ice_pf *pf = NULL;
 	struct device *dev;
@@ -2516,12 +2516,7 @@ ice_vsi_setup(struct ice_pf *pf, struct ice_port_info *pi,
 	struct ice_vsi *vsi;
 	int ret, i;
 
-	if (vsi_type == ICE_VSI_CHNL)
-		vsi = ice_vsi_alloc(pf, vsi_type, ch, NULL);
-	else if (vsi_type == ICE_VSI_VF || vsi_type == ICE_VSI_CTRL)
-		vsi = ice_vsi_alloc(pf, vsi_type, NULL, vf);
-	else
-		vsi = ice_vsi_alloc(pf, vsi_type, NULL, NULL);
+	vsi = ice_vsi_alloc(pf, vsi_type, ch, vf);
 
 	if (!vsi) {
 		dev_err(dev, "could not allocate VSI\n");
@@ -2530,17 +2525,13 @@ ice_vsi_setup(struct ice_pf *pf, struct ice_port_info *pi,
 
 	vsi->port_info = pi;
 	vsi->vsw = pf->first_sw;
-	if (vsi->type == ICE_VSI_PF)
-		vsi->ethtype = ETH_P_PAUSE;
 
 	ice_alloc_fd_res(vsi);
 
-	if (vsi_type != ICE_VSI_CHNL) {
-		if (ice_vsi_get_qs(vsi)) {
-			dev_err(dev, "Failed to allocate queues. vsi->idx = %d\n",
-				vsi->idx);
-			goto unroll_vsi_alloc;
-		}
+	if (ice_vsi_get_qs(vsi)) {
+		dev_err(dev, "Failed to allocate queues. vsi->idx = %d\n",
+			vsi->idx);
+		goto unroll_vsi_alloc;
 	}
 
 	/* set RSS capabilities */
@@ -2692,7 +2683,7 @@ ice_vsi_setup(struct ice_pf *pf, struct ice_port_info *pi,
 unroll_vsi_alloc:
 	if (vsi_type == ICE_VSI_VF)
 		ice_enable_lag(pf->lag);
-	ice_vsi_clear(vsi);
+	ice_vsi_free(vsi);
 
 	return NULL;
 }
@@ -3016,9 +3007,6 @@ int ice_vsi_release(struct ice_vsi *vsi)
 	if (test_bit(ICE_FLAG_RSS_ENA, pf->flags))
 		ice_rss_clean(vsi);
 
-	/* Disable VSI and free resources */
-	if (vsi->type != ICE_VSI_LB)
-		ice_vsi_dis_irq(vsi);
 	ice_vsi_close(vsi);
 
 	/* SR-IOV determines needed MSIX resources all at once instead of per
@@ -3034,18 +3022,12 @@ int ice_vsi_release(struct ice_vsi *vsi)
 		pf->num_avail_sw_msix += vsi->num_q_vectors;
 	}
 
-	if (!ice_is_safe_mode(pf)) {
-		if (vsi->type == ICE_VSI_PF) {
-			ice_fltr_remove_eth(vsi, ETH_P_PAUSE, ICE_FLTR_TX,
-					    ICE_DROP_PACKET);
-			ice_cfg_sw_lldp(vsi, true, false);
-			/* The Rx rule will only exist to remove if the LLDP FW
-			 * engine is currently stopped
-			 */
-			if (!test_bit(ICE_FLAG_FW_LLDP_AGENT, pf->flags))
-				ice_cfg_sw_lldp(vsi, false, false);
-		}
-	}
+	/* The Rx rule will only exist to remove if the LLDP FW
+	 * engine is currently stopped
+	 */
+	if (!ice_is_safe_mode(pf) && vsi->type == ICE_VSI_PF &&
+	    !test_bit(ICE_FLAG_FW_LLDP_AGENT, pf->flags))
+		ice_cfg_sw_lldp(vsi, false, false);
 
 	if (ice_is_vsi_dflt_vsi(vsi))
 		ice_clear_dflt_vsi(vsi);
@@ -3085,7 +3067,7 @@ int ice_vsi_release(struct ice_vsi *vsi)
 	 * for ex: during rmmod.
 	 */
 	if (!ice_is_reset_in_progress(pf->state))
-		ice_vsi_clear(vsi);
+		ice_vsi_free(vsi);
 
 	return 0;
 }
@@ -3384,6 +3366,7 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, bool init_vsi)
 			ret = -EIO;
 			goto err_vectors;
 		} else {
+			kfree(coalesce);
 			return ice_schedule_reset(pf, ICE_RESET_PFR);
 		}
 	}
@@ -3402,7 +3385,7 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, bool init_vsi)
 		vsi->netdev = NULL;
 	}
 err_vsi:
-	ice_vsi_clear(vsi);
+	ice_vsi_free(vsi);
 	set_bit(ICE_RESET_FAILED, pf->state);
 	kfree(coalesce);
 	return ret;
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h b/drivers/net/ethernet/intel/ice/ice_lib.h
index dcdf69a693e9..6203114b805c 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_lib.h
@@ -42,7 +42,7 @@ void ice_cfg_sw_lldp(struct ice_vsi *vsi, bool tx, bool create);
 int ice_set_link(struct ice_vsi *vsi, bool ena);
 
 void ice_vsi_delete(struct ice_vsi *vsi);
-int ice_vsi_clear(struct ice_vsi *vsi);
+int ice_vsi_free(struct ice_vsi *vsi);
 
 int ice_vsi_cfg_tc(struct ice_vsi *vsi, u8 ena_tc);
 
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index d5bd56a213ae..d39ceb3a3b72 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -6091,12 +6091,12 @@ static int ice_vsi_vlan_setup(struct ice_vsi *vsi)
 }
 
 /**
- * ice_vsi_cfg - Setup the VSI
+ * ice_vsi_cfg_lan - Setup the VSI lan related config
  * @vsi: the VSI being configured
  *
  * Return 0 on success and negative value on error
  */
-int ice_vsi_cfg(struct ice_vsi *vsi)
+int ice_vsi_cfg_lan(struct ice_vsi *vsi)
 {
 	int err;
 
@@ -6314,7 +6314,7 @@ int ice_up(struct ice_vsi *vsi)
 {
 	int err;
 
-	err = ice_vsi_cfg(vsi);
+	err = ice_vsi_cfg_lan(vsi);
 	if (!err)
 		err = ice_up_complete(vsi);
 
@@ -6855,7 +6855,7 @@ int ice_vsi_open_ctrl(struct ice_vsi *vsi)
 	if (err)
 		goto err_setup_rx;
 
-	err = ice_vsi_cfg(vsi);
+	err = ice_vsi_cfg_lan(vsi);
 	if (err)
 		goto err_setup_rx;
 
@@ -6909,7 +6909,7 @@ int ice_vsi_open(struct ice_vsi *vsi)
 	if (err)
 		goto err_setup_rx;
 
-	err = ice_vsi_cfg(vsi);
+	err = ice_vsi_cfg_lan(vsi);
 	if (err)
 		goto err_setup_rx;
 
@@ -8343,7 +8343,7 @@ static void ice_remove_q_channels(struct ice_vsi *vsi, bool rem_fltr)
 		ice_vsi_delete(ch->ch_vsi);
 
 		/* Delete VSI from PF and HW VSI arrays */
-		ice_vsi_clear(ch->ch_vsi);
+		ice_vsi_free(ch->ch_vsi);
 
 		/* free the channel */
 		kfree(ch);
-- 
2.36.1



* [PATCH net-next 04/13] ice: split ice_vsi_setup into smaller functions
  2022-11-14 12:57 [PATCH net-next 00/13] resource management using devlink reload Michal Swiatkowski
                   ` (2 preceding siblings ...)
  2022-11-14 12:57 ` [PATCH net-next 03/13] ice: cleanup in VSI config/deconfig code Michal Swiatkowski
@ 2022-11-14 12:57 ` Michal Swiatkowski
  2022-11-15  5:08   ` Jakub Kicinski
  2022-11-14 12:57 ` [PATCH net-next 05/13] ice: stop hard coding the ICE_VSI_CTRL location Michal Swiatkowski
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-14 12:57 UTC (permalink / raw)
  To: netdev, davem, kuba, pabeni, edumazet
  Cc: intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	sridhar.samudrala, wojciech.drewek, lukasz.czapnik,
	shiraz.saleem, jesse.brandeburg, mustafa.ismail,
	przemyslaw.kitszel, piotr.raczynski, jacob.e.keller,
	david.m.ertman, leszek.kaliszczuk, Michal Swiatkowski

The main goal is to reuse the same functions in the VSI config and
rebuild paths.
To do this, split ice_vsi_setup() into smaller pieces and reuse them
during rebuild.

ice_vsi_alloc() should only allocate memory, not set the default values
for the VSI.
Move setting the defaults to a separate function. This allows
configuring an already allocated VSI, for example in the reload path.

The patch mostly moves code around without introducing new
functionality. The functions ice_vsi_cfg() and ice_vsi_decfg() were
added, but they use code that already exists.

Use a flag to pass information about VSI initialization during rebuild
instead of a boolean value.
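
A condensed view of the resulting structure, taken from the ice_lib.c
hunks below:

	ice_vsi_setup()
		ice_vsi_alloc()               /* alloc struct, store in pf->vsi[] */
		ice_vsi_cfg()
			ice_vsi_cfg_def()     /* queues, vectors, rings, RSS, ... */
			ice_vsi_cfg_tc_lan()  /* TC/LAN queue scheduler config */

	ice_vsi_rebuild()
		ice_vsi_decfg()               /* undo ice_vsi_cfg_def() */
		ice_vsi_cfg_def()
		ice_vsi_cfg_tc_lan()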

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Co-developed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_lib.c    | 648 +++++++++-----------
 drivers/net/ethernet/intel/ice/ice_lib.h    |   7 +-
 drivers/net/ethernet/intel/ice/ice_main.c   |  12 +-
 drivers/net/ethernet/intel/ice/ice_vf_lib.c |   2 +-
 4 files changed, 301 insertions(+), 368 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 42203a0efcac..298bb901e4e7 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -447,9 +447,55 @@ static irqreturn_t ice_eswitch_msix_clean_rings(int __always_unused irq, void *d
 	return IRQ_HANDLED;
 }
 
+/**
+ * ice_vsi_alloc_def - set default values for already allocated VSI
+ * @vsi: pointer to VSI
+ * @vf: VF for ICE_VSI_VF and ICE_VSI_CTRL
+ * @ch: ptr to channel
+ */
+static int
+ice_vsi_alloc_def(struct ice_vsi *vsi, struct ice_vf *vf,
+		  struct ice_channel *ch)
+{
+	if (vsi->type != ICE_VSI_CHNL) {
+		ice_vsi_set_num_qs(vsi, vf);
+		if (ice_vsi_alloc_arrays(vsi))
+			return -ENOMEM;
+	}
+
+	switch (vsi->type) {
+	case ICE_VSI_SWITCHDEV_CTRL:
+		/* Setup eswitch MSIX irq handler for VSI */
+		vsi->irq_handler = ice_eswitch_msix_clean_rings;
+		break;
+	case ICE_VSI_PF:
+		/* Setup default MSIX irq handler for VSI */
+		vsi->irq_handler = ice_msix_clean_rings;
+		break;
+	case ICE_VSI_CTRL:
+		/* Setup ctrl VSI MSIX irq handler */
+		vsi->irq_handler = ice_msix_clean_ctrl_vsi;
+		break;
+	case ICE_VSI_CHNL:
+		if (!ch)
+			return -EINVAL;
+
+		vsi->num_rxq = ch->num_rxq;
+		vsi->num_txq = ch->num_txq;
+		vsi->next_base_q = ch->base_q;
+		break;
+	default:
+		ice_vsi_free_arrays(vsi);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /**
  * ice_vsi_alloc - Allocates the next available struct VSI in the PF
  * @pf: board private structure
+ * @pi: pointer to the port_info instance
  * @vsi_type: type of VSI
  * @ch: ptr to channel
  * @vf: VF for ICE_VSI_VF and ICE_VSI_CTRL
@@ -461,8 +507,9 @@ static irqreturn_t ice_eswitch_msix_clean_rings(int __always_unused irq, void *d
  * returns a pointer to a VSI on success, NULL on failure.
  */
 static struct ice_vsi *
-ice_vsi_alloc(struct ice_pf *pf, enum ice_vsi_type vsi_type,
-	      struct ice_channel *ch, struct ice_vf *vf)
+ice_vsi_alloc(struct ice_pf *pf, struct ice_port_info *pi,
+	      enum ice_vsi_type vsi_type, struct ice_channel *ch,
+	      struct ice_vf *vf)
 {
 	struct device *dev = ice_pf_to_dev(pf);
 	struct ice_vsi *vsi = NULL;
@@ -488,61 +535,11 @@ ice_vsi_alloc(struct ice_pf *pf, enum ice_vsi_type vsi_type,
 
 	vsi->type = vsi_type;
 	vsi->back = pf;
+	vsi->port_info = pi;
+	/* For VSIs which don't have a connected VF, this will be NULL */
+	vsi->vf = vf;
 	set_bit(ICE_VSI_DOWN, vsi->state);
 
-	if (vsi_type == ICE_VSI_VF)
-		ice_vsi_set_num_qs(vsi, vf);
-	else if (vsi_type != ICE_VSI_CHNL)
-		ice_vsi_set_num_qs(vsi, NULL);
-
-	switch (vsi->type) {
-	case ICE_VSI_SWITCHDEV_CTRL:
-		if (ice_vsi_alloc_arrays(vsi))
-			goto err_rings;
-
-		/* Setup eswitch MSIX irq handler for VSI */
-		vsi->irq_handler = ice_eswitch_msix_clean_rings;
-		break;
-	case ICE_VSI_PF:
-		if (ice_vsi_alloc_arrays(vsi))
-			goto err_rings;
-
-		/* Setup default MSIX irq handler for VSI */
-		vsi->irq_handler = ice_msix_clean_rings;
-		break;
-	case ICE_VSI_CTRL:
-		if (ice_vsi_alloc_arrays(vsi))
-			goto err_rings;
-
-		/* Setup ctrl VSI MSIX irq handler */
-		vsi->irq_handler = ice_msix_clean_ctrl_vsi;
-
-		/* For the PF control VSI this is NULL, for the VF control VSI
-		 * this will be the first VF to allocate it.
-		 */
-		vsi->vf = vf;
-		break;
-	case ICE_VSI_VF:
-		if (ice_vsi_alloc_arrays(vsi))
-			goto err_rings;
-		vsi->vf = vf;
-		break;
-	case ICE_VSI_CHNL:
-		if (!ch)
-			goto err_rings;
-		vsi->num_rxq = ch->num_rxq;
-		vsi->num_txq = ch->num_txq;
-		vsi->next_base_q = ch->base_q;
-		break;
-	case ICE_VSI_LB:
-		if (ice_vsi_alloc_arrays(vsi))
-			goto err_rings;
-		break;
-	default:
-		dev_warn(dev, "Unknown VSI type %d\n", vsi->type);
-		goto unlock_pf;
-	}
-
 	if (vsi->type == ICE_VSI_CTRL && !vf) {
 		/* Use the last VSI slot as the index for PF control VSI */
 		vsi->idx = pf->num_alloc_vsi - 1;
@@ -560,11 +557,7 @@ ice_vsi_alloc(struct ice_pf *pf, enum ice_vsi_type vsi_type,
 
 	if (vsi->type == ICE_VSI_CTRL && vf)
 		vf->ctrl_vsi_idx = vsi->idx;
-	goto unlock_pf;
 
-err_rings:
-	devm_kfree(dev, vsi);
-	vsi = NULL;
 unlock_pf:
 	mutex_unlock(&pf->sw_mutex);
 	return vsi;
@@ -1129,12 +1122,12 @@ ice_chnl_vsi_setup_q_map(struct ice_vsi *vsi, struct ice_vsi_ctx *ctxt)
 /**
  * ice_vsi_init - Create and initialize a VSI
  * @vsi: the VSI being configured
- * @init_vsi: is this call creating a VSI
+ * @init_vsi: flag telling if the VSI needs to be initialized
  *
  * This initializes a VSI context depending on the VSI type to be added and
  * passes it down to the add_vsi aq command to create a new VSI.
  */
-static int ice_vsi_init(struct ice_vsi *vsi, bool init_vsi)
+static int ice_vsi_init(struct ice_vsi *vsi, int init_vsi)
 {
 	struct ice_pf *pf = vsi->back;
 	struct ice_hw *hw = &pf->hw;
@@ -1196,7 +1189,7 @@ static int ice_vsi_init(struct ice_vsi *vsi, bool init_vsi)
 		/* if updating VSI context, make sure to set valid_section:
 		 * to indicate which section of VSI context being updated
 		 */
-		if (!init_vsi)
+		if (!(init_vsi & ICE_VSI_FLAG_INIT))
 			ctxt->info.valid_sections |=
 				cpu_to_le16(ICE_AQ_VSI_PROP_Q_OPT_VALID);
 	}
@@ -1209,7 +1202,8 @@ static int ice_vsi_init(struct ice_vsi *vsi, bool init_vsi)
 		if (ret)
 			goto out;
 
-		if (!init_vsi) /* means VSI being updated */
+		if (!(init_vsi & ICE_VSI_FLAG_INIT))
+			/* means VSI being updated */
 			/* must to indicate which section of VSI context are
 			 * being modified
 			 */
@@ -1224,7 +1218,7 @@ static int ice_vsi_init(struct ice_vsi *vsi, bool init_vsi)
 			cpu_to_le16(ICE_AQ_VSI_PROP_SECURITY_VALID);
 	}
 
-	if (init_vsi) {
+	if (init_vsi & ICE_VSI_FLAG_INIT) {
 		ret = ice_add_vsi(hw, vsi->idx, ctxt, NULL);
 		if (ret) {
 			dev_err(dev, "Add VSI failed, err %d\n", ret);
@@ -2493,39 +2487,89 @@ static void ice_set_agg_vsi(struct ice_vsi *vsi)
 }
 
 /**
- * ice_vsi_setup - Set up a VSI by a given type
- * @pf: board private structure
- * @pi: pointer to the port_info instance
- * @vsi_type: VSI type
- * @vf: pointer to VF to which this VSI connects. This field is used primarily
- *      for the ICE_VSI_VF type. Other VSI types should pass NULL.
- * @ch: ptr to channel
- *
- * This allocates the sw VSI structure and its queue resources.
+ * ice_free_vf_ctrl_res - Free the VF control VSI resource
+ * @pf: pointer to PF structure
+ * @vsi: the VSI to free resources for
  *
- * Returns pointer to the successfully allocated and configured VSI sw struct on
- * success, NULL on failure.
+ * Check if the VF control VSI resource is still in use. If no VF is using it
+ * any more, release the VSI resource. Otherwise, leave it to be cleaned up
+ * once no other VF uses it.
  */
-struct ice_vsi *
-ice_vsi_setup(struct ice_pf *pf, struct ice_port_info *pi,
-	      enum ice_vsi_type vsi_type, struct ice_vf *vf,
-	      struct ice_channel *ch)
+static void ice_free_vf_ctrl_res(struct ice_pf *pf,  struct ice_vsi *vsi)
+{
+	struct ice_vf *vf;
+	unsigned int bkt;
+
+	rcu_read_lock();
+	ice_for_each_vf_rcu(pf, bkt, vf) {
+		if (vf != vsi->vf && vf->ctrl_vsi_idx != ICE_NO_VSI) {
+			rcu_read_unlock();
+			return;
+		}
+	}
+	rcu_read_unlock();
+
+	/* No other VFs left that have control VSI. It is now safe to reclaim
+	 * SW interrupts back to the common pool.
+	 */
+	ice_free_res(pf->irq_tracker, vsi->base_vector,
+		     ICE_RES_VF_CTRL_VEC_ID);
+	pf->num_avail_sw_msix += vsi->num_q_vectors;
+}
+
+static int ice_vsi_cfg_tc_lan(struct ice_pf *pf, struct ice_vsi *vsi)
 {
 	u16 max_txqs[ICE_MAX_TRAFFIC_CLASS] = { 0 };
 	struct device *dev = ice_pf_to_dev(pf);
-	struct ice_vsi *vsi;
 	int ret, i;
 
-	vsi = ice_vsi_alloc(pf, vsi_type, ch, vf);
+	/* configure VSI nodes based on number of queues and TC's */
+	ice_for_each_traffic_class(i) {
+		if (!(vsi->tc_cfg.ena_tc & BIT(i)))
+			continue;
 
-	if (!vsi) {
-		dev_err(dev, "could not allocate VSI\n");
-		return NULL;
+		if (vsi->type == ICE_VSI_CHNL) {
+			if (!vsi->alloc_txq && vsi->num_txq)
+				max_txqs[i] = vsi->num_txq;
+			else
+				max_txqs[i] = pf->num_lan_tx;
+		} else {
+			max_txqs[i] = vsi->alloc_txq;
+		}
 	}
 
-	vsi->port_info = pi;
+	dev_dbg(dev, "vsi->tc_cfg.ena_tc = %d\n", vsi->tc_cfg.ena_tc);
+	ret = ice_cfg_vsi_lan(vsi->port_info, vsi->idx, vsi->tc_cfg.ena_tc,
+			      max_txqs);
+	if (ret) {
+		dev_err(dev, "VSI %d failed lan queue config, error %d\n",
+			vsi->vsi_num, ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+/**
+ * ice_vsi_cfg_def - configure default VSI based on the type
+ * @vsi: pointer to VSI
+ * @vf: pointer to VF to which this VSI connects. This field is used primarily
+ *      for the ICE_VSI_VF type. Other VSI types should pass NULL.
+ * @ch: ptr to channel
+ */
+static int
+ice_vsi_cfg_def(struct ice_vsi *vsi, struct ice_vf *vf, struct ice_channel *ch)
+{
+	struct device *dev = ice_pf_to_dev(vsi->back);
+	struct ice_pf *pf = vsi->back;
+	int ret;
+
 	vsi->vsw = pf->first_sw;
 
+	ret = ice_vsi_alloc_def(vsi, vf, ch);
+	if (ret)
+		return ret;
+
 	ice_alloc_fd_res(vsi);
 
 	if (ice_vsi_get_qs(vsi)) {
@@ -2568,6 +2612,14 @@ ice_vsi_setup(struct ice_pf *pf, struct ice_port_info *pi,
 			goto unroll_vector_base;
 
 		ice_vsi_map_rings_to_vectors(vsi);
+		if (ice_is_xdp_ena_vsi(vsi)) {
+			ret = ice_vsi_determine_xdp_res(vsi);
+			if (ret)
+				goto unroll_vector_base;
+			ret = ice_prepare_xdp_rings(vsi, vsi->xdp_prog);
+			if (ret)
+				goto unroll_vector_base;
+		}
 
 		/* ICE_VSI_CTRL does not need RSS so skip RSS processing */
 		if (vsi->type != ICE_VSI_CTRL)
@@ -2624,29 +2676,135 @@ ice_vsi_setup(struct ice_pf *pf, struct ice_port_info *pi,
 		goto unroll_vsi_init;
 	}
 
-	/* configure VSI nodes based on number of queues and TC's */
-	ice_for_each_traffic_class(i) {
-		if (!(vsi->tc_cfg.ena_tc & BIT(i)))
-			continue;
+	return 0;
 
-		if (vsi->type == ICE_VSI_CHNL) {
-			if (!vsi->alloc_txq && vsi->num_txq)
-				max_txqs[i] = vsi->num_txq;
-			else
-				max_txqs[i] = pf->num_lan_tx;
-		} else {
-			max_txqs[i] = vsi->alloc_txq;
-		}
+unroll_vector_base:
+	/* reclaim SW interrupts back to the common pool */
+	ice_free_res(pf->irq_tracker, vsi->base_vector, vsi->idx);
+	pf->num_avail_sw_msix += vsi->num_q_vectors;
+unroll_alloc_q_vector:
+	ice_vsi_free_q_vectors(vsi);
+unroll_vsi_init:
+	ice_vsi_delete(vsi);
+unroll_get_qs:
+	ice_vsi_put_qs(vsi);
+unroll_vsi_alloc:
+	ice_vsi_free_arrays(vsi);
+	return ret;
+}
+
+/**
+ * ice_vsi_cfg - configure VSI and tc on it
+ * @vsi: pointer to VSI
+ * @vf: pointer to VF to which this VSI connects. This field is used primarily
+ *      for the ICE_VSI_VF type. Other VSI types should pass NULL.
+ * @ch: ptr to channel
+ */
+int ice_vsi_cfg(struct ice_vsi *vsi, struct ice_vf *vf, struct ice_channel *ch)
+{
+	int ret;
+
+	ret = ice_vsi_cfg_def(vsi, vf, ch);
+	if (ret)
+		return ret;
+
+	ret = ice_vsi_cfg_tc_lan(vsi->back, vsi);
+	if (ret)
+		ice_vsi_decfg(vsi);
+
+	return ret;
+}
+
+/**
+ * ice_vsi_decfg - remove all VSI configuration
+ * @vsi: pointer to VSI
+ */
+void ice_vsi_decfg(struct ice_vsi *vsi)
+{
+	struct ice_pf *pf = vsi->back;
+	int err;
+
+	/* The Rx rule will only exist to remove if the LLDP FW
+	 * engine is currently stopped
+	 */
+	if (!ice_is_safe_mode(pf) && vsi->type == ICE_VSI_PF &&
+	    !test_bit(ICE_FLAG_FW_LLDP_AGENT, pf->flags))
+		ice_cfg_sw_lldp(vsi, false, false);
+
+	ice_fltr_remove_all(vsi);
+	ice_rm_vsi_lan_cfg(vsi->port_info, vsi->idx);
+	err = ice_rm_vsi_rdma_cfg(vsi->port_info, vsi->idx);
+	if (err)
+		dev_err(ice_pf_to_dev(pf), "Failed to remove RDMA scheduler config for VSI %u, err %d\n",
+			vsi->vsi_num, err);
+
+	if (ice_is_xdp_ena_vsi(vsi))
+		/* return value check can be skipped here, it always returns
+		 * 0 if reset is in progress
+		 */
+		ice_destroy_xdp_rings(vsi);
+
+	ice_vsi_clear_rings(vsi);
+	ice_vsi_free_q_vectors(vsi);
+	ice_vsi_delete(vsi);
+	ice_vsi_put_qs(vsi);
+	ice_vsi_free_arrays(vsi);
+
+	/* SR-IOV determines needed MSIX resources all at once instead of per
+	 * VSI since when VFs are spawned we know how many VFs there are and how
+	 * many interrupts each VF needs. SR-IOV MSIX resources are also
+	 * cleared in the same manner.
+	 */
+	if (vsi->type == ICE_VSI_CTRL && vsi->vf) {
+		ice_free_vf_ctrl_res(pf, vsi);
+	} else if (vsi->type != ICE_VSI_VF) {
+		/* reclaim SW interrupts back to the common pool */
+		ice_free_res(pf->irq_tracker, vsi->base_vector, vsi->idx);
+		pf->num_avail_sw_msix += vsi->num_q_vectors;
+		vsi->base_vector = 0;
 	}
 
-	dev_dbg(dev, "vsi->tc_cfg.ena_tc = %d\n", vsi->tc_cfg.ena_tc);
-	ret = ice_cfg_vsi_lan(vsi->port_info, vsi->idx, vsi->tc_cfg.ena_tc,
-			      max_txqs);
-	if (ret) {
-		dev_err(dev, "VSI %d failed lan queue config, error %d\n",
-			vsi->vsi_num, ret);
-		goto unroll_clear_rings;
+	if (vsi->type == ICE_VSI_VF &&
+	    vsi->agg_node && vsi->agg_node->valid)
+		vsi->agg_node->num_vsis--;
+	if (vsi->agg_node) {
+		vsi->agg_node->valid = false;
+		vsi->agg_node->agg_id = 0;
 	}
+}
+
+/**
+ * ice_vsi_setup - Set up a VSI by a given type
+ * @pf: board private structure
+ * @pi: pointer to the port_info instance
+ * @vsi_type: VSI type
+ * @vf: pointer to VF to which this VSI connects. This field is used primarily
+ *      for the ICE_VSI_VF type. Other VSI types should pass NULL.
+ * @ch: ptr to channel
+ *
+ * This allocates the sw VSI structure and its queue resources.
+ *
+ * Returns pointer to the successfully allocated and configured VSI sw struct on
+ * success, NULL on failure.
+ */
+struct ice_vsi *
+ice_vsi_setup(struct ice_pf *pf, struct ice_port_info *pi,
+	      enum ice_vsi_type vsi_type, struct ice_vf *vf,
+	      struct ice_channel *ch)
+{
+	struct device *dev = ice_pf_to_dev(pf);
+	struct ice_vsi *vsi;
+	int ret;
+
+	vsi = ice_vsi_alloc(pf, pi, vsi_type, ch, vf);
+	if (!vsi) {
+		dev_err(dev, "could not allocate VSI\n");
+		return NULL;
+	}
+
+	ret = ice_vsi_cfg(vsi, vf, ch);
+	if (ret)
+		goto err_vsi_cfg;
 
 	/* Add switch rule to drop all Tx Flow Control Frames, of look up
 	 * type ETHERTYPE from VSIs, and restrict malicious VF from sending
@@ -2657,30 +2815,18 @@ ice_vsi_setup(struct ice_pf *pf, struct ice_port_info *pi,
 	 * be dropped so that VFs cannot send LLDP packets to reconfig DCB
 	 * settings in the HW.
 	 */
-	if (!ice_is_safe_mode(pf))
-		if (vsi->type == ICE_VSI_PF) {
-			ice_fltr_add_eth(vsi, ETH_P_PAUSE, ICE_FLTR_TX,
-					 ICE_DROP_PACKET);
-			ice_cfg_sw_lldp(vsi, true, true);
-		}
+	if (!ice_is_safe_mode(pf) && vsi->type == ICE_VSI_PF) {
+		ice_fltr_add_eth(vsi, ETH_P_PAUSE, ICE_FLTR_TX,
+				 ICE_DROP_PACKET);
+		ice_cfg_sw_lldp(vsi, true, true);
+	}
 
 	if (!vsi->agg_node)
 		ice_set_agg_vsi(vsi);
+
 	return vsi;
 
-unroll_clear_rings:
-	ice_vsi_clear_rings(vsi);
-unroll_vector_base:
-	/* reclaim SW interrupts back to the common pool */
-	ice_free_res(pf->irq_tracker, vsi->base_vector, vsi->idx);
-	pf->num_avail_sw_msix += vsi->num_q_vectors;
-unroll_alloc_q_vector:
-	ice_vsi_free_q_vectors(vsi);
-unroll_vsi_init:
-	ice_vsi_delete(vsi);
-unroll_get_qs:
-	ice_vsi_put_qs(vsi);
-unroll_vsi_alloc:
+err_vsi_cfg:
 	if (vsi_type == ICE_VSI_VF)
 		ice_enable_lag(pf->lag);
 	ice_vsi_free(vsi);
@@ -2946,37 +3092,6 @@ void ice_napi_del(struct ice_vsi *vsi)
 		netif_napi_del(&vsi->q_vectors[v_idx]->napi);
 }
 
-/**
- * ice_free_vf_ctrl_res - Free the VF control VSI resource
- * @pf: pointer to PF structure
- * @vsi: the VSI to free resources for
- *
- * Check if the VF control VSI resource is still in use. If no VF is using it
- * any more, release the VSI resource. Otherwise, leave it to be cleaned up
- * once no other VF uses it.
- */
-static void ice_free_vf_ctrl_res(struct ice_pf *pf,  struct ice_vsi *vsi)
-{
-	struct ice_vf *vf;
-	unsigned int bkt;
-
-	rcu_read_lock();
-	ice_for_each_vf_rcu(pf, bkt, vf) {
-		if (vf != vsi->vf && vf->ctrl_vsi_idx != ICE_NO_VSI) {
-			rcu_read_unlock();
-			return;
-		}
-	}
-	rcu_read_unlock();
-
-	/* No other VFs left that have control VSI. It is now safe to reclaim
-	 * SW interrupts back to the common pool.
-	 */
-	ice_free_res(pf->irq_tracker, vsi->base_vector,
-		     ICE_RES_VF_CTRL_VEC_ID);
-	pf->num_avail_sw_msix += vsi->num_q_vectors;
-}
-
 /**
  * ice_vsi_release - Delete a VSI and free its resources
  * @vsi: the VSI being removed
@@ -2986,7 +3101,6 @@ static void ice_free_vf_ctrl_res(struct ice_pf *pf,  struct ice_vsi *vsi)
 int ice_vsi_release(struct ice_vsi *vsi)
 {
 	struct ice_pf *pf;
-	int err;
 
 	if (!vsi->back)
 		return -ENODEV;
@@ -3004,41 +3118,14 @@ int ice_vsi_release(struct ice_vsi *vsi)
 		clear_bit(ICE_VSI_NETDEV_REGISTERED, vsi->state);
 	}
 
+	if (vsi->type == ICE_VSI_PF)
+		ice_devlink_destroy_pf_port(pf);
+
 	if (test_bit(ICE_FLAG_RSS_ENA, pf->flags))
 		ice_rss_clean(vsi);
 
 	ice_vsi_close(vsi);
-
-	/* SR-IOV determines needed MSIX resources all at once instead of per
-	 * VSI since when VFs are spawned we know how many VFs there are and how
-	 * many interrupts each VF needs. SR-IOV MSIX resources are also
-	 * cleared in the same manner.
-	 */
-	if (vsi->type == ICE_VSI_CTRL && vsi->vf) {
-		ice_free_vf_ctrl_res(pf, vsi);
-	} else if (vsi->type != ICE_VSI_VF) {
-		/* reclaim SW interrupts back to the common pool */
-		ice_free_res(pf->irq_tracker, vsi->base_vector, vsi->idx);
-		pf->num_avail_sw_msix += vsi->num_q_vectors;
-	}
-
-	/* The Rx rule will only exist to remove if the LLDP FW
-	 * engine is currently stopped
-	 */
-	if (!ice_is_safe_mode(pf) && vsi->type == ICE_VSI_PF &&
-	    !test_bit(ICE_FLAG_FW_LLDP_AGENT, pf->flags))
-		ice_cfg_sw_lldp(vsi, false, false);
-
-	if (ice_is_vsi_dflt_vsi(vsi))
-		ice_clear_dflt_vsi(vsi);
-	ice_fltr_remove_all(vsi);
-	ice_rm_vsi_lan_cfg(vsi->port_info, vsi->idx);
-	err = ice_rm_vsi_rdma_cfg(vsi->port_info, vsi->idx);
-	if (err)
-		dev_err(ice_pf_to_dev(vsi->back), "Failed to remove RDMA scheduler config for VSI %u, err %d\n",
-			vsi->vsi_num, err);
-	ice_vsi_delete(vsi);
-	ice_vsi_free_q_vectors(vsi);
+	ice_vsi_decfg(vsi);
 
 	if (vsi->netdev) {
 		if (test_bit(ICE_VSI_NETDEV_REGISTERED, vsi->state)) {
@@ -3052,16 +3139,6 @@ int ice_vsi_release(struct ice_vsi *vsi)
 		}
 	}
 
-	if (vsi->type == ICE_VSI_PF)
-		ice_devlink_destroy_pf_port(pf);
-
-	if (vsi->type == ICE_VSI_VF &&
-	    vsi->agg_node && vsi->agg_node->valid)
-		vsi->agg_node->num_vsis--;
-	ice_vsi_clear_rings(vsi);
-
-	ice_vsi_put_qs(vsi);
-
 	/* retain SW VSI data structure since it is needed to unregister and
 	 * free VSI netdev when PF is not in reset recovery pending state,\
 	 * for ex: during rmmod.
@@ -3189,29 +3266,24 @@ ice_vsi_rebuild_set_coalesce(struct ice_vsi *vsi,
 /**
  * ice_vsi_rebuild - Rebuild VSI after reset
  * @vsi: VSI to be rebuild
- * @init_vsi: is this an initialization or a reconfigure of the VSI
+ * @init_vsi: flag telling if the VSI needs to be initialized
  *
  * Returns 0 on success and negative value on failure
  */
-int ice_vsi_rebuild(struct ice_vsi *vsi, bool init_vsi)
+int ice_vsi_rebuild(struct ice_vsi *vsi, int init_vsi)
 {
-	u16 max_txqs[ICE_MAX_TRAFFIC_CLASS] = { 0 };
 	struct ice_coalesce_stored *coalesce;
-	int prev_num_q_vectors = 0;
-	enum ice_vsi_type vtype;
+	int prev_num_q_vectors;
 	struct ice_pf *pf;
-	int ret, i;
+	int ret;
 
 	if (!vsi)
 		return -EINVAL;
 
 	pf = vsi->back;
-	vtype = vsi->type;
-	if (WARN_ON(vtype == ICE_VSI_VF && !vsi->vf))
+	if (WARN_ON(vsi->type == ICE_VSI_VF && !vsi->vf))
 		return -EINVAL;
 
-	ice_vsi_init_vlan_ops(vsi);
-
 	coalesce = kcalloc(vsi->num_q_vectors,
 			   sizeof(struct ice_coalesce_stored), GFP_KERNEL);
 	if (!coalesce)
@@ -3219,174 +3291,30 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, bool init_vsi)
 
 	prev_num_q_vectors = ice_vsi_rebuild_get_coalesce(vsi, coalesce);
 
-	ice_rm_vsi_lan_cfg(vsi->port_info, vsi->idx);
-	ret = ice_rm_vsi_rdma_cfg(vsi->port_info, vsi->idx);
+	ice_vsi_decfg(vsi);
+	ret = ice_vsi_cfg_def(vsi, vsi->vf, vsi->ch);
 	if (ret)
-		dev_err(ice_pf_to_dev(vsi->back), "Failed to remove RDMA scheduler config for VSI %u, err %d\n",
-			vsi->vsi_num, ret);
-	ice_vsi_free_q_vectors(vsi);
-
-	/* SR-IOV determines needed MSIX resources all at once instead of per
-	 * VSI since when VFs are spawned we know how many VFs there are and how
-	 * many interrupts each VF needs. SR-IOV MSIX resources are also
-	 * cleared in the same manner.
-	 */
-	if (vtype != ICE_VSI_VF) {
-		/* reclaim SW interrupts back to the common pool */
-		ice_free_res(pf->irq_tracker, vsi->base_vector, vsi->idx);
-		pf->num_avail_sw_msix += vsi->num_q_vectors;
-		vsi->base_vector = 0;
-	}
-
-	if (ice_is_xdp_ena_vsi(vsi))
-		/* return value check can be skipped here, it always returns
-		 * 0 if reset is in progress
-		 */
-		ice_destroy_xdp_rings(vsi);
-	ice_vsi_put_qs(vsi);
-	ice_vsi_clear_rings(vsi);
-	ice_vsi_free_arrays(vsi);
-	if (vtype == ICE_VSI_VF)
-		ice_vsi_set_num_qs(vsi, vsi->vf);
-	else
-		ice_vsi_set_num_qs(vsi, NULL);
-
-	ret = ice_vsi_alloc_arrays(vsi);
-	if (ret < 0)
-		goto err_vsi;
-
-	ice_vsi_get_qs(vsi);
-
-	ice_alloc_fd_res(vsi);
-	ice_vsi_set_tc_cfg(vsi);
-
-	/* Initialize VSI struct elements and create VSI in FW */
-	ret = ice_vsi_init(vsi, init_vsi);
-	if (ret < 0)
-		goto err_vsi;
-
-	switch (vtype) {
-	case ICE_VSI_CTRL:
-	case ICE_VSI_SWITCHDEV_CTRL:
-	case ICE_VSI_PF:
-		ret = ice_vsi_alloc_q_vectors(vsi);
-		if (ret)
-			goto err_rings;
-
-		ret = ice_vsi_setup_vector_base(vsi);
-		if (ret)
-			goto err_vectors;
-
-		ret = ice_vsi_set_q_vectors_reg_idx(vsi);
-		if (ret)
-			goto err_vectors;
-
-		ret = ice_vsi_alloc_rings(vsi);
-		if (ret)
-			goto err_vectors;
-
-		ice_vsi_map_rings_to_vectors(vsi);
-		if (ice_is_xdp_ena_vsi(vsi)) {
-			ret = ice_vsi_determine_xdp_res(vsi);
-			if (ret)
-				goto err_vectors;
-			ret = ice_prepare_xdp_rings(vsi, vsi->xdp_prog);
-			if (ret)
-				goto err_vectors;
-		}
-		/* ICE_VSI_CTRL does not need RSS so skip RSS processing */
-		if (vtype != ICE_VSI_CTRL)
-			/* Do not exit if configuring RSS had an issue, at
-			 * least receive traffic on first queue. Hence no
-			 * need to capture return value
-			 */
-			if (test_bit(ICE_FLAG_RSS_ENA, pf->flags))
-				ice_vsi_cfg_rss_lut_key(vsi);
-
-		/* disable or enable CRC stripping */
-		if (vsi->netdev)
-			ice_vsi_cfg_crc_strip(vsi, !!(vsi->netdev->features &
-					      NETIF_F_RXFCS));
-
-		break;
-	case ICE_VSI_VF:
-		ret = ice_vsi_alloc_q_vectors(vsi);
-		if (ret)
-			goto err_rings;
-
-		ret = ice_vsi_set_q_vectors_reg_idx(vsi);
-		if (ret)
-			goto err_vectors;
-
-		ret = ice_vsi_alloc_rings(vsi);
-		if (ret)
-			goto err_vectors;
-
-		break;
-	case ICE_VSI_CHNL:
-		if (test_bit(ICE_FLAG_RSS_ENA, pf->flags)) {
-			ice_vsi_cfg_rss_lut_key(vsi);
-			ice_vsi_set_rss_flow_fld(vsi);
-		}
-		break;
-	default:
-		break;
-	}
-
-	/* configure VSI nodes based on number of queues and TC's */
-	for (i = 0; i < vsi->tc_cfg.numtc; i++) {
-		/* configure VSI nodes based on number of queues and TC's.
-		 * ADQ creates VSIs for each TC/Channel but doesn't
-		 * allocate queues instead it reconfigures the PF queues
-		 * as per the TC command. So max_txqs should point to the
-		 * PF Tx queues.
-		 */
-		if (vtype == ICE_VSI_CHNL)
-			max_txqs[i] = pf->num_lan_tx;
-		else
-			max_txqs[i] = vsi->alloc_txq;
-
-		if (ice_is_xdp_ena_vsi(vsi))
-			max_txqs[i] += vsi->num_xdp_txq;
-	}
-
-	if (test_bit(ICE_FLAG_TC_MQPRIO, pf->flags))
-		/* If MQPRIO is set, means channel code path, hence for main
-		 * VSI's, use TC as 1
-		 */
-		ret = ice_cfg_vsi_lan(vsi->port_info, vsi->idx, 1, max_txqs);
-	else
-		ret = ice_cfg_vsi_lan(vsi->port_info, vsi->idx,
-				      vsi->tc_cfg.ena_tc, max_txqs);
+		goto err_vsi_cfg;
 
+	ret = ice_vsi_cfg_tc_lan(pf, vsi);
 	if (ret) {
-		dev_err(ice_pf_to_dev(pf), "VSI %d failed lan queue config, error %d\n",
-			vsi->vsi_num, ret);
-		if (init_vsi) {
+		if (init_vsi & ICE_VSI_FLAG_INIT) {
 			ret = -EIO;
-			goto err_vectors;
+			goto err_vsi_cfg_tc_lan;
 		} else {
 			kfree(coalesce);
 			return ice_schedule_reset(pf, ICE_RESET_PFR);
 		}
 	}
+
 	ice_vsi_rebuild_set_coalesce(vsi, coalesce, prev_num_q_vectors);
 	kfree(coalesce);
 
 	return 0;
 
-err_vectors:
-	ice_vsi_free_q_vectors(vsi);
-err_rings:
-	if (vsi->netdev) {
-		vsi->current_netdev_flags = 0;
-		unregister_netdev(vsi->netdev);
-		free_netdev(vsi->netdev);
-		vsi->netdev = NULL;
-	}
-err_vsi:
-	ice_vsi_free(vsi);
-	set_bit(ICE_RESET_FAILED, pf->state);
+err_vsi_cfg_tc_lan:
+	ice_vsi_decfg(vsi);
+err_vsi_cfg:
 	kfree(coalesce);
 	return ret;
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h b/drivers/net/ethernet/intel/ice/ice_lib.h
index 6203114b805c..ad4d5314ca76 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_lib.h
@@ -61,8 +61,11 @@ int ice_vsi_release(struct ice_vsi *vsi);
 
 void ice_vsi_close(struct ice_vsi *vsi);
 
+int ice_vsi_cfg(struct ice_vsi *vsi, struct ice_vf *vf,
+		struct ice_channel *ch);
 int ice_ena_vsi(struct ice_vsi *vsi, bool locked);
 
+void ice_vsi_decfg(struct ice_vsi *vsi);
 void ice_dis_vsi(struct ice_vsi *vsi, bool locked);
 
 int ice_free_res(struct ice_res_tracker *res, u16 index, u16 id);
@@ -70,7 +73,9 @@ int ice_free_res(struct ice_res_tracker *res, u16 index, u16 id);
 int
 ice_get_res(struct ice_pf *pf, struct ice_res_tracker *res, u16 needed, u16 id);
 
-int ice_vsi_rebuild(struct ice_vsi *vsi, bool init_vsi);
+#define ICE_VSI_FLAG_INIT	BIT(0)
+#define ICE_VSI_FLAG_NO_INIT	0
+int ice_vsi_rebuild(struct ice_vsi *vsi, int init_vsi);
 
 bool ice_is_reset_in_progress(unsigned long *state);
 int ice_wait_for_reset(struct ice_pf *pf, unsigned long timeout);
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index d39ceb3a3b72..c71f60c3bb15 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -4204,13 +4204,13 @@ int ice_vsi_recfg_qs(struct ice_vsi *vsi, int new_rx, int new_tx)
 
 	/* set for the next time the netdev is started */
 	if (!netif_running(vsi->netdev)) {
-		ice_vsi_rebuild(vsi, false);
+		ice_vsi_rebuild(vsi, ICE_VSI_FLAG_NO_INIT);
 		dev_dbg(ice_pf_to_dev(pf), "Link is down, queue count change happens when link is brought up\n");
 		goto done;
 	}
 
 	ice_vsi_close(vsi);
-	ice_vsi_rebuild(vsi, false);
+	ice_vsi_rebuild(vsi, ICE_VSI_FLAG_NO_INIT);
 	ice_pf_dcb_recfg(pf);
 	ice_vsi_open(vsi);
 done:
@@ -6994,7 +6994,7 @@ static int ice_vsi_rebuild_by_type(struct ice_pf *pf, enum ice_vsi_type type)
 			continue;
 
 		/* rebuild the VSI */
-		err = ice_vsi_rebuild(vsi, true);
+		err = ice_vsi_rebuild(vsi, ICE_VSI_FLAG_INIT);
 		if (err) {
 			dev_err(dev, "rebuild VSI failed, err %d, VSI index %d, type %s\n",
 				err, vsi->idx, ice_vsi_type_str(type));
@@ -8403,7 +8403,7 @@ static int ice_rebuild_channels(struct ice_pf *pf)
 		type = vsi->type;
 
 		/* rebuild ADQ VSI */
-		err = ice_vsi_rebuild(vsi, true);
+		err = ice_vsi_rebuild(vsi, ICE_VSI_FLAG_INIT);
 		if (err) {
 			dev_err(dev, "VSI (type:%s) at index %d rebuild failed, err %d\n",
 				ice_vsi_type_str(type), vsi->idx, err);
@@ -8629,14 +8629,14 @@ static int ice_setup_tc_mqprio_qdisc(struct net_device *netdev, void *type_data)
 	cur_rxq = vsi->num_rxq;
 
 	/* proceed with rebuild main VSI using correct number of queues */
-	ret = ice_vsi_rebuild(vsi, false);
+	ret = ice_vsi_rebuild(vsi, ICE_VSI_FLAG_NO_INIT);
 	if (ret) {
 		/* fallback to current number of queues */
 		dev_info(dev, "Rebuild failed with new queues, try with current number of queues\n");
 		vsi->req_txq = cur_txq;
 		vsi->req_rxq = cur_rxq;
 		clear_bit(ICE_RESET_FAILED, pf->state);
-		if (ice_vsi_rebuild(vsi, false)) {
+		if (ice_vsi_rebuild(vsi, ICE_VSI_FLAG_NO_INIT)) {
 			dev_err(dev, "Rebuild of main VSI failed again\n");
 			return ret;
 		}
diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.c b/drivers/net/ethernet/intel/ice/ice_vf_lib.c
index 1c51778db951..149642765874 100644
--- a/drivers/net/ethernet/intel/ice/ice_vf_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.c
@@ -256,7 +256,7 @@ static int ice_vf_rebuild_vsi(struct ice_vf *vf)
 	if (WARN_ON(!vsi))
 		return -EINVAL;
 
-	if (ice_vsi_rebuild(vsi, true)) {
+	if (ice_vsi_rebuild(vsi, ICE_VSI_FLAG_INIT)) {
 		dev_err(ice_pf_to_dev(pf), "failed to rebuild VF %d VSI\n",
 			vf->vf_id);
 		return -EIO;
-- 
2.36.1



* [PATCH net-next 05/13] ice: stop hard coding the ICE_VSI_CTRL location
  2022-11-14 12:57 [PATCH net-next 00/13] resource management using devlink reload Michal Swiatkowski
                   ` (3 preceding siblings ...)
  2022-11-14 12:57 ` [PATCH net-next 04/13] ice: split ice_vsi_setup into smaller functions Michal Swiatkowski
@ 2022-11-14 12:57 ` Michal Swiatkowski
  2022-11-14 12:57 ` [PATCH net-next 06/13] ice: split probe into smaller functions Michal Swiatkowski
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-14 12:57 UTC (permalink / raw)
  To: netdev, davem, kuba, pabeni, edumazet
  Cc: intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	sridhar.samudrala, wojciech.drewek, lukasz.czapnik,
	shiraz.saleem, jesse.brandeburg, mustafa.ismail,
	przemyslaw.kitszel, piotr.raczynski, jacob.e.keller,
	david.m.ertman, leszek.kaliszczuk, Michal Swiatkowski

From: Jacob Keller <jacob.e.keller@intel.com>

When allocating the ICE_VSI_CTRL, the allocated struct ice_vsi pointer is
stored into the PF's pf->vsi array at a fixed location. This was
historically done on the basis that it could provide an O(1) lookup for the
special control VSI.

Since we store the ctrl_vsi_idx, we already have O(1) lookup regardless of
where in the array we store this VSI.

Simplify the logic in ice_vsi_alloc by using the same method of storing the
control VSI as other types of VSIs.
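
For reference, the lookup that this relies on stays O(1) either way (a
sketch, assuming the PF pointer is at hand):

	/* PF control VSI */
	vsi = pf->vsi[pf->ctrl_vsi_idx];
	/* VF control VSI */
	vsi = pf->vsi[vf->ctrl_vsi_idx];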

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
 drivers/net/ethernet/intel/ice/ice_lib.c | 34 +++++++++++-------------
 1 file changed, 15 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 298bb901e4e7..dc672af10c25 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -379,10 +379,7 @@ int ice_vsi_free(struct ice_vsi *vsi)
 	/* updates the PF for this cleared VSI */
 
 	pf->vsi[vsi->idx] = NULL;
-	if (vsi->idx < pf->next_vsi && vsi->type != ICE_VSI_CTRL)
-		pf->next_vsi = vsi->idx;
-	if (vsi->idx < pf->next_vsi && vsi->type == ICE_VSI_CTRL && vsi->vf)
-		pf->next_vsi = vsi->idx;
+	pf->next_vsi = vsi->idx;
 
 	ice_vsi_free_arrays(vsi);
 	mutex_unlock(&pf->sw_mutex);
@@ -540,23 +537,22 @@ ice_vsi_alloc(struct ice_pf *pf, struct ice_port_info *pi,
 	vsi->vf = vf;
 	set_bit(ICE_VSI_DOWN, vsi->state);
 
-	if (vsi->type == ICE_VSI_CTRL && !vf) {
-		/* Use the last VSI slot as the index for PF control VSI */
-		vsi->idx = pf->num_alloc_vsi - 1;
-		pf->ctrl_vsi_idx = vsi->idx;
-		pf->vsi[vsi->idx] = vsi;
-	} else {
-		/* fill slot and make note of the index */
-		vsi->idx = pf->next_vsi;
-		pf->vsi[pf->next_vsi] = vsi;
+	/* fill slot and make note of the index */
+	vsi->idx = pf->next_vsi;
+	pf->vsi[pf->next_vsi] = vsi;
 
-		/* prepare pf->next_vsi for next use */
-		pf->next_vsi = ice_get_free_slot(pf->vsi, pf->num_alloc_vsi,
-						 pf->next_vsi);
-	}
+	/* prepare pf->next_vsi for next use */
+	pf->next_vsi = ice_get_free_slot(pf->vsi, pf->num_alloc_vsi,
+					 pf->next_vsi);
 
-	if (vsi->type == ICE_VSI_CTRL && vf)
-		vf->ctrl_vsi_idx = vsi->idx;
+	if (vsi->type == ICE_VSI_CTRL) {
+		if (vf) {
+			vf->ctrl_vsi_idx = vsi->idx;
+		} else {
+			WARN_ON(pf->ctrl_vsi_idx != ICE_NO_VSI);
+			pf->ctrl_vsi_idx = vsi->idx;
+		}
+	}
 
 unlock_pf:
 	mutex_unlock(&pf->sw_mutex);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH net-next 06/13] ice: split probe into smaller functions
  2022-11-14 12:57 [PATCH net-next 00/13] resource management using devlink reload Michal Swiatkowski
                   ` (4 preceding siblings ...)
  2022-11-14 12:57 ` [PATCH net-next 05/13] ice: stop hard coding the ICE_VSI_CTRL location Michal Swiatkowski
@ 2022-11-14 12:57 ` Michal Swiatkowski
  2022-11-14 12:57 ` [PATCH net-next 07/13] ice: sync netdev filters after clearing VSI Michal Swiatkowski
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-14 12:57 UTC (permalink / raw)
  To: netdev, davem, kuba, pabeni, edumazet
  Cc: intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	sridhar.samudrala, wojciech.drewek, lukasz.czapnik,
	shiraz.saleem, jesse.brandeburg, mustafa.ismail,
	przemyslaw.kitszel, piotr.raczynski, jacob.e.keller,
	david.m.ertman, leszek.kaliszczuk, Michal Swiatkowski

Part of the code from probe can be reused in the reload flow. Move this
code into separate functions. Create an unroll function for each part
of the initialization, for example ice_init_dev() and ice_deinit_dev().
This simplifies unrolling and the deinit helpers can also be used in
the remove flow.

Avoid freeing the port info, as it can be reused in the reload path. It
will still be freed in the remove path, since it is allocated via
devm_kzalloc().
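
For orientation, after this split the probe path boils down to a short
sequence of init stages with matching deinit helpers unrolled in
reverse on failure; a simplified view (derived from the diff below) is:

	err = ice_init(pf);         /* dev, VSIs, PF switch, wakeup, link */
	if (err)
		return err;
	err = ice_init_eth(pf);     /* netdev, rmap, registration, TC blocks */
	if (err)
		goto err_init_eth;
	err = ice_init_rdma(pf);    /* auxiliary RDMA device */
	if (err)
		goto err_init_rdma;
	err = ice_init_devlink(pf); /* params, regions, devlink registration */
	if (err)
		goto err_init_devlink;
	ice_init_features(pf);      /* PTP, GNSS, FDIR, DCB, LAG */
	return 0;
	/* unroll: ice_deinit_rdma(), ice_deinit_eth(), ice_deinit() */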

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h        |   2 +
 drivers/net/ethernet/intel/ice/ice_common.c |  11 +-
 drivers/net/ethernet/intel/ice/ice_main.c   | 896 ++++++++++++--------
 3 files changed, 562 insertions(+), 347 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 5c5b188fb3e9..62219f995cf2 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -924,6 +924,8 @@ int ice_open(struct net_device *netdev);
 int ice_open_internal(struct net_device *netdev);
 int ice_stop(struct net_device *netdev);
 void ice_service_task_schedule(struct ice_pf *pf);
+int ice_load(struct ice_pf *pf);
+void ice_unload(struct ice_pf *pf);
 
 /**
  * ice_set_rdma_cap - enable RDMA support
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 039342a0ed15..9c48e6cf5a0e 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -1088,8 +1088,10 @@ int ice_init_hw(struct ice_hw *hw)
 	if (status)
 		goto err_unroll_cqinit;
 
-	hw->port_info = devm_kzalloc(ice_hw_to_dev(hw),
-				     sizeof(*hw->port_info), GFP_KERNEL);
+	if (!hw->port_info)
+		hw->port_info = devm_kzalloc(ice_hw_to_dev(hw),
+					     sizeof(*hw->port_info),
+					     GFP_KERNEL);
 	if (!hw->port_info) {
 		status = -ENOMEM;
 		goto err_unroll_cqinit;
@@ -1214,11 +1216,6 @@ void ice_deinit_hw(struct ice_hw *hw)
 	ice_free_hw_tbls(hw);
 	mutex_destroy(&hw->tnl_lock);
 
-	if (hw->port_info) {
-		devm_kfree(ice_hw_to_dev(hw), hw->port_info);
-		hw->port_info = NULL;
-	}
-
 	/* Attempt to disable FW logging before shutting down control queues */
 	ice_cfg_fw_log(hw, false);
 	ice_destroy_all_ctrlq(hw);
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index c71f60c3bb15..bf98334fadbc 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3405,53 +3405,6 @@ static void ice_set_netdev_features(struct net_device *netdev)
 	netdev->hw_features |= NETIF_F_RXFCS;
 }
 
-/**
- * ice_cfg_netdev - Allocate, configure and register a netdev
- * @vsi: the VSI associated with the new netdev
- *
- * Returns 0 on success, negative value on failure
- */
-static int ice_cfg_netdev(struct ice_vsi *vsi)
-{
-	struct ice_netdev_priv *np;
-	struct net_device *netdev;
-	u8 mac_addr[ETH_ALEN];
-
-	netdev = alloc_etherdev_mqs(sizeof(*np), vsi->alloc_txq,
-				    vsi->alloc_rxq);
-	if (!netdev)
-		return -ENOMEM;
-
-	set_bit(ICE_VSI_NETDEV_ALLOCD, vsi->state);
-	vsi->netdev = netdev;
-	np = netdev_priv(netdev);
-	np->vsi = vsi;
-
-	ice_set_netdev_features(netdev);
-
-	ice_set_ops(netdev);
-
-	if (vsi->type == ICE_VSI_PF) {
-		SET_NETDEV_DEV(netdev, ice_pf_to_dev(vsi->back));
-		ether_addr_copy(mac_addr, vsi->port_info->mac.perm_addr);
-		eth_hw_addr_set(netdev, mac_addr);
-		ether_addr_copy(netdev->perm_addr, mac_addr);
-	}
-
-	netdev->priv_flags |= IFF_UNICAST_FLT;
-
-	/* Setup netdev TC information */
-	ice_vsi_cfg_netdev_tc(vsi, vsi->tc_cfg.ena_tc);
-
-	/* setup watchdog timeout value to be 5 second */
-	netdev->watchdog_timeo = 5 * HZ;
-
-	netdev->min_mtu = ETH_MIN_MTU;
-	netdev->max_mtu = ICE_MAX_MTU;
-
-	return 0;
-}
-
 /**
  * ice_fill_rss_lut - Fill the RSS lookup table with default values
  * @lut: Lookup table
@@ -3704,78 +3657,6 @@ static int ice_tc_indir_block_register(struct ice_vsi *vsi)
 	return flow_indr_dev_register(ice_indr_setup_tc_cb, np);
 }
 
-/**
- * ice_setup_pf_sw - Setup the HW switch on startup or after reset
- * @pf: board private structure
- *
- * Returns 0 on success, negative value on failure
- */
-static int ice_setup_pf_sw(struct ice_pf *pf)
-{
-	struct device *dev = ice_pf_to_dev(pf);
-	bool dvm = ice_is_dvm_ena(&pf->hw);
-	struct ice_vsi *vsi;
-	int status;
-
-	if (ice_is_reset_in_progress(pf->state))
-		return -EBUSY;
-
-	status = ice_aq_set_port_params(pf->hw.port_info, dvm, NULL);
-	if (status)
-		return -EIO;
-
-	vsi = ice_pf_vsi_setup(pf, pf->hw.port_info);
-	if (!vsi)
-		return -ENOMEM;
-
-	/* init channel list */
-	INIT_LIST_HEAD(&vsi->ch_list);
-
-	status = ice_cfg_netdev(vsi);
-	if (status)
-		goto unroll_vsi_setup;
-	/* netdev has to be configured before setting frame size */
-	ice_vsi_cfg_frame_size(vsi);
-
-	/* init indirect block notifications */
-	status = ice_tc_indir_block_register(vsi);
-	if (status) {
-		dev_err(dev, "Failed to register netdev notifier\n");
-		goto unroll_cfg_netdev;
-	}
-
-	/* Setup DCB netlink interface */
-	ice_dcbnl_setup(vsi);
-
-	/* registering the NAPI handler requires both the queues and
-	 * netdev to be created, which are done in ice_pf_vsi_setup()
-	 * and ice_cfg_netdev() respectively
-	 */
-	ice_napi_add(vsi);
-
-	status = ice_init_mac_fltr(pf);
-	if (status)
-		goto unroll_napi_add;
-
-	return 0;
-
-unroll_napi_add:
-	ice_tc_indir_block_unregister(vsi);
-unroll_cfg_netdev:
-	if (vsi) {
-		ice_napi_del(vsi);
-		if (vsi->netdev) {
-			clear_bit(ICE_VSI_NETDEV_ALLOCD, vsi->state);
-			free_netdev(vsi->netdev);
-			vsi->netdev = NULL;
-		}
-	}
-
-unroll_vsi_setup:
-	ice_vsi_release(vsi);
-	return status;
-}
-
 /**
  * ice_get_avail_q_count - Get count of queues in use
  * @pf_qmap: bitmap to get queue use count from
@@ -4473,6 +4354,21 @@ static int ice_init_fdir(struct ice_pf *pf)
 	return err;
 }
 
+static void ice_deinit_fdir(struct ice_pf *pf)
+{
+	struct ice_vsi *vsi = ice_get_ctrl_vsi(pf);
+
+	if (!vsi)
+		return;
+
+	ice_vsi_manage_fdir(vsi, false);
+	ice_vsi_release(vsi);
+	if (pf->ctrl_vsi_idx != ICE_NO_VSI) {
+		pf->vsi[pf->ctrl_vsi_idx] = NULL;
+		pf->ctrl_vsi_idx = ICE_NO_VSI;
+	}
+}
+
 /**
  * ice_get_opt_fw_name - return optional firmware file name or NULL
  * @pf: pointer to the PF instance
@@ -4573,14 +4469,13 @@ static void ice_print_wake_reason(struct ice_pf *pf)
 
 /**
  * ice_register_netdev - register netdev and devlink port
- * @pf: pointer to the PF struct
+ * @vsi: pointer to the VSI struct
  */
-static int ice_register_netdev(struct ice_pf *pf)
+static int ice_register_netdev(struct ice_vsi *vsi)
 {
-	struct ice_vsi *vsi;
-	int err = 0;
+	struct ice_pf *pf = vsi->back;
+	int err;
 
-	vsi = ice_get_main_vsi(pf);
 	if (!vsi || !vsi->netdev)
 		return -EIO;
 
@@ -4598,168 +4493,208 @@ static int ice_register_netdev(struct ice_pf *pf)
 	netif_tx_stop_all_queues(vsi->netdev);
 
 	return 0;
+
 err_register_netdev:
 	ice_devlink_destroy_pf_port(pf);
 err_devlink_create:
-	free_netdev(vsi->netdev);
-	vsi->netdev = NULL;
-	clear_bit(ICE_VSI_NETDEV_ALLOCD, vsi->state);
+	unregister_netdev(vsi->netdev);
+	clear_bit(ICE_VSI_NETDEV_REGISTERED, vsi->state);
 	return err;
 }
 
+static void ice_unregister_netdev(struct ice_vsi *vsi)
+{
+	if (!vsi || !vsi->netdev)
+		return;
+
+	unregister_netdev(vsi->netdev);
+	ice_devlink_destroy_pf_port(vsi->back);
+	clear_bit(ICE_VSI_NETDEV_REGISTERED, vsi->state);
+}
+
 /**
- * ice_probe - Device initialization routine
- * @pdev: PCI device information struct
- * @ent: entry in ice_pci_tbl
+ * ice_cfg_netdev - Allocate, configure and register a netdev
+ * @vsi: the VSI associated with the new netdev
  *
- * Returns 0 on success, negative on failure
+ * Returns 0 on success, negative value on failure
  */
-static int
-ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
+static int ice_cfg_netdev(struct ice_vsi *vsi)
 {
-	struct device *dev = &pdev->dev;
-	struct ice_pf *pf;
-	struct ice_hw *hw;
-	int i, err;
+	struct ice_netdev_priv *np;
+	struct net_device *netdev;
+	u8 mac_addr[ETH_ALEN];
 
-	if (pdev->is_virtfn) {
-		dev_err(dev, "can't probe a virtual function\n");
-		return -EINVAL;
-	}
+	netdev = alloc_etherdev_mqs(sizeof(*np), vsi->alloc_txq,
+				    vsi->alloc_rxq);
+	if (!netdev)
+		return -ENOMEM;
 
-	/* this driver uses devres, see
-	 * Documentation/driver-api/driver-model/devres.rst
-	 */
-	err = pcim_enable_device(pdev);
-	if (err)
-		return err;
+	set_bit(ICE_VSI_NETDEV_ALLOCD, vsi->state);
+	vsi->netdev = netdev;
+	np = netdev_priv(netdev);
+	np->vsi = vsi;
 
-	err = pcim_iomap_regions(pdev, BIT(ICE_BAR0), dev_driver_string(dev));
-	if (err) {
-		dev_err(dev, "BAR0 I/O map error %d\n", err);
-		return err;
+	ice_set_netdev_features(netdev);
+	ice_set_ops(netdev);
+
+	if (vsi->type == ICE_VSI_PF) {
+		SET_NETDEV_DEV(netdev, ice_pf_to_dev(vsi->back));
+		ether_addr_copy(mac_addr, vsi->port_info->mac.perm_addr);
+		eth_hw_addr_set(netdev, mac_addr);
 	}
 
-	pf = ice_allocate_pf(dev);
-	if (!pf)
-		return -ENOMEM;
+	netdev->priv_flags |= IFF_UNICAST_FLT;
 
-	/* initialize Auxiliary index to invalid value */
-	pf->aux_idx = -1;
+	/* Setup netdev TC information */
+	ice_vsi_cfg_netdev_tc(vsi, vsi->tc_cfg.ena_tc);
 
-	/* set up for high or low DMA */
-	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
-	if (err) {
-		dev_err(dev, "DMA configuration failed: 0x%x\n", err);
-		return err;
-	}
+	netdev->max_mtu = ICE_MAX_MTU;
 
-	pci_enable_pcie_error_reporting(pdev);
-	pci_set_master(pdev);
+	return 0;
+}
 
-	pf->pdev = pdev;
-	pci_set_drvdata(pdev, pf);
-	set_bit(ICE_DOWN, pf->state);
-	/* Disable service task until DOWN bit is cleared */
-	set_bit(ICE_SERVICE_DIS, pf->state);
+static void ice_decfg_netdev(struct ice_vsi *vsi)
+{
+	clear_bit(ICE_VSI_NETDEV_ALLOCD, vsi->state);
+	free_netdev(vsi->netdev);
+	vsi->netdev = NULL;
+}
 
-	hw = &pf->hw;
-	hw->hw_addr = pcim_iomap_table(pdev)[ICE_BAR0];
-	pci_save_state(pdev);
+static int ice_start_eth(struct ice_vsi *vsi)
+{
+	int err;
 
-	hw->back = pf;
-	hw->vendor_id = pdev->vendor;
-	hw->device_id = pdev->device;
-	pci_read_config_byte(pdev, PCI_REVISION_ID, &hw->revision_id);
-	hw->subsystem_vendor_id = pdev->subsystem_vendor;
-	hw->subsystem_device_id = pdev->subsystem_device;
-	hw->bus.device = PCI_SLOT(pdev->devfn);
-	hw->bus.func = PCI_FUNC(pdev->devfn);
-	ice_set_ctrlq_len(hw);
+	err = ice_init_mac_fltr(vsi->back);
+	if (err)
+		return err;
 
-	pf->msg_enable = netif_msg_init(debug, ICE_DFLT_NETIF_M);
+	rtnl_lock();
+	err = ice_vsi_open(vsi);
+	rtnl_unlock();
 
-#ifndef CONFIG_DYNAMIC_DEBUG
-	if (debug < -1)
-		hw->debug_mask = debug;
-#endif
+	return err;
+}
 
-	err = ice_init_hw(hw);
-	if (err) {
-		dev_err(dev, "ice_init_hw failed: %d\n", err);
-		err = -EIO;
-		goto err_exit_unroll;
-	}
+static int ice_init_eth(struct ice_pf *pf)
+{
+	struct ice_vsi *vsi = ice_get_main_vsi(pf);
+	struct device *dev = ice_pf_to_dev(pf);
+	int err;
 
-	ice_init_feature_support(pf);
+	if (!vsi)
+		return -EINVAL;
 
-	ice_request_fw(pf);
+	/* init channel list */
+	INIT_LIST_HEAD(&vsi->ch_list);
 
-	/* if ice_request_fw fails, ICE_FLAG_ADV_FEATURES bit won't be
-	 * set in pf->state, which will cause ice_is_safe_mode to return
-	 * true
-	 */
-	if (ice_is_safe_mode(pf)) {
-		/* we already got function/device capabilities but these don't
-		 * reflect what the driver needs to do in safe mode. Instead of
-		 * adding conditional logic everywhere to ignore these
-		 * device/function capabilities, override them.
-		 */
-		ice_set_safe_mode_caps(hw);
-	}
+	err = ice_cfg_netdev(vsi);
+	if (err)
+		return err;
+	/* Setup DCB netlink interface */
+	ice_dcbnl_setup(vsi);
 
-	err = ice_init_pf(pf);
+	err = ice_set_cpu_rx_rmap(vsi);
 	if (err) {
-		dev_err(dev, "ice_init_pf failed: %d\n", err);
-		goto err_init_pf_unroll;
+		dev_err(dev, "Failed to set CPU Rx map VSI %d error %d\n",
+			vsi->vsi_num, err);
+		goto err_set_cpu_rx_rmap;
 	}
+	err = ice_init_mac_fltr(pf);
+	if (err)
+		goto err_init_mac_fltr;
 
-	ice_devlink_init_regions(pf);
+	err = ice_register_netdev(vsi);
+	if (err)
+		goto err_register_netdev;
 
-	pf->hw.udp_tunnel_nic.set_port = ice_udp_tunnel_set_port;
-	pf->hw.udp_tunnel_nic.unset_port = ice_udp_tunnel_unset_port;
+	err = ice_tc_indir_block_register(vsi);
+	if (err)
+		goto err_tc_indir_block_register;
+
+	ice_napi_add(vsi);
+
+	return 0;
+
+err_tc_indir_block_register:
+	ice_unregister_netdev(vsi);
+err_register_netdev:
+err_init_mac_fltr:
+	ice_free_cpu_rx_rmap(vsi);
+err_set_cpu_rx_rmap:
+	ice_decfg_netdev(vsi);
+	return err;
+}
+
+static void ice_deinit_eth(struct ice_pf *pf)
+{
+	struct ice_vsi *vsi = ice_get_main_vsi(pf);
+
+	if (!vsi)
+		return;
+
+	ice_vsi_close(vsi);
+	ice_unregister_netdev(vsi);
+	ice_free_cpu_rx_rmap(vsi);
+	ice_tc_indir_block_unregister(vsi);
+	ice_decfg_netdev(vsi);
+}
+
+static int ice_init_dev(struct ice_pf *pf)
+{
+	struct device *dev = ice_pf_to_dev(pf);
+	int err;
+
+	err = ice_init_hw(&pf->hw);
+	if (err) {
+		dev_err(dev, "ice_init_hw failed: %d\n", err);
+		return err;
+	}
+
+	ice_init_feature_support(pf);
+
+	ice_request_fw(pf);
+
+	/* if ice_request_fw fails, ICE_FLAG_ADV_FEATURES bit won't be
+	 * set in pf->state, which will cause ice_is_safe_mode to return
+	 * true
+	 */
+	if (ice_is_safe_mode(pf)) {
+		/* we already got function/device capabilities but these don't
+		 * reflect what the driver needs to do in safe mode. Instead of
+		 * adding conditional logic everywhere to ignore these
+		 * device/function capabilities, override them.
+		 */
+		ice_set_safe_mode_caps(&pf->hw);
+	}
+
+	err = ice_init_pf(pf);
+	if (err) {
+		dev_err(dev, "ice_init_pf failed: %d\n", err);
+		goto err_init_pf;
+	}
+
+	pf->hw.udp_tunnel_nic.set_port = ice_udp_tunnel_set_port;
+	pf->hw.udp_tunnel_nic.unset_port = ice_udp_tunnel_unset_port;
 	pf->hw.udp_tunnel_nic.flags = UDP_TUNNEL_NIC_INFO_MAY_SLEEP;
 	pf->hw.udp_tunnel_nic.shared = &pf->hw.udp_tunnel_shared;
-	i = 0;
 	if (pf->hw.tnl.valid_count[TNL_VXLAN]) {
-		pf->hw.udp_tunnel_nic.tables[i].n_entries =
+		pf->hw.udp_tunnel_nic.tables[0].n_entries =
 			pf->hw.tnl.valid_count[TNL_VXLAN];
-		pf->hw.udp_tunnel_nic.tables[i].tunnel_types =
+		pf->hw.udp_tunnel_nic.tables[0].tunnel_types =
 			UDP_TUNNEL_TYPE_VXLAN;
-		i++;
 	}
 	if (pf->hw.tnl.valid_count[TNL_GENEVE]) {
-		pf->hw.udp_tunnel_nic.tables[i].n_entries =
+		pf->hw.udp_tunnel_nic.tables[1].n_entries =
 			pf->hw.tnl.valid_count[TNL_GENEVE];
-		pf->hw.udp_tunnel_nic.tables[i].tunnel_types =
+		pf->hw.udp_tunnel_nic.tables[1].tunnel_types =
 			UDP_TUNNEL_TYPE_GENEVE;
-		i++;
-	}
-
-	pf->num_alloc_vsi = hw->func_caps.guar_num_vsi;
-	if (!pf->num_alloc_vsi) {
-		err = -EIO;
-		goto err_init_pf_unroll;
-	}
-	if (pf->num_alloc_vsi > UDP_TUNNEL_NIC_MAX_SHARING_DEVICES) {
-		dev_warn(&pf->pdev->dev,
-			 "limiting the VSI count due to UDP tunnel limitation %d > %d\n",
-			 pf->num_alloc_vsi, UDP_TUNNEL_NIC_MAX_SHARING_DEVICES);
-		pf->num_alloc_vsi = UDP_TUNNEL_NIC_MAX_SHARING_DEVICES;
-	}
-
-	pf->vsi = devm_kcalloc(dev, pf->num_alloc_vsi, sizeof(*pf->vsi),
-			       GFP_KERNEL);
-	if (!pf->vsi) {
-		err = -ENOMEM;
-		goto err_init_pf_unroll;
 	}
 
 	err = ice_init_interrupt_scheme(pf);
 	if (err) {
 		dev_err(dev, "ice_init_interrupt_scheme failed: %d\n", err);
 		err = -EIO;
-		goto err_init_vsi_unroll;
+		goto err_init_interrupt_scheme;
 	}
 
 	/* In case of MSIX we are going to setup the misc vector right here
@@ -4770,49 +4705,94 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 	err = ice_req_irq_msix_misc(pf);
 	if (err) {
 		dev_err(dev, "setup of misc vector failed: %d\n", err);
-		goto err_init_interrupt_unroll;
+		goto err_req_irq_msix_misc;
 	}
 
-	/* create switch struct for the switch element created by FW on boot */
-	pf->first_sw = devm_kzalloc(dev, sizeof(*pf->first_sw), GFP_KERNEL);
-	if (!pf->first_sw) {
-		err = -ENOMEM;
-		goto err_msix_misc_unroll;
-	}
+	return 0;
 
-	if (hw->evb_veb)
-		pf->first_sw->bridge_mode = BRIDGE_MODE_VEB;
-	else
-		pf->first_sw->bridge_mode = BRIDGE_MODE_VEPA;
+err_req_irq_msix_misc:
+	ice_clear_interrupt_scheme(pf);
+err_init_interrupt_scheme:
+	ice_deinit_pf(pf);
+err_init_pf:
+	ice_deinit_hw(&pf->hw);
+	return err;
+}
 
-	pf->first_sw->pf = pf;
+static void ice_deinit_dev(struct ice_pf *pf)
+{
+	ice_free_irq_msix_misc(pf);
+	ice_clear_interrupt_scheme(pf);
+	ice_deinit_pf(pf);
+	ice_deinit_hw(&pf->hw);
+}
 
-	/* record the sw_id available for later use */
-	pf->first_sw->sw_id = hw->port_info->sw_id;
+static void ice_init_features(struct ice_pf *pf)
+{
+	struct device *dev = ice_pf_to_dev(pf);
 
-	err = ice_setup_pf_sw(pf);
-	if (err) {
-		dev_err(dev, "probe failed due to setup PF switch: %d\n", err);
-		goto err_alloc_sw_unroll;
-	}
+	if (ice_is_safe_mode(pf))
+		return;
 
-	clear_bit(ICE_SERVICE_DIS, pf->state);
+	/* initialize DDP driven features */
+	if (test_bit(ICE_FLAG_PTP_SUPPORTED, pf->flags))
+		ice_ptp_init(pf);
 
-	/* tell the firmware we are up */
-	err = ice_send_version(pf);
-	if (err) {
-		dev_err(dev, "probe failed sending driver version %s. error: %d\n",
-			UTS_RELEASE, err);
-		goto err_send_version_unroll;
+	if (ice_is_feature_supported(pf, ICE_F_GNSS))
+		ice_gnss_init(pf);
+
+	/* Note: Flow director init failure is non-fatal to load */
+	if (ice_init_fdir(pf))
+		dev_err(dev, "could not initialize flow director\n");
+
+	/* Note: DCB init failure is non-fatal to load */
+	if (ice_init_pf_dcb(pf, false)) {
+		clear_bit(ICE_FLAG_DCB_CAPABLE, pf->flags);
+		clear_bit(ICE_FLAG_DCB_ENA, pf->flags);
+	} else {
+		ice_cfg_lldp_mib_change(&pf->hw, true);
 	}
 
-	/* since everything is good, start the service timer */
-	mod_timer(&pf->serv_tmr, round_jiffies(jiffies + pf->serv_tmr_period));
+	if (ice_init_lag(pf))
+		dev_warn(dev, "Failed to init link aggregation support\n");
+}
+
+static void ice_deinit_features(struct ice_pf *pf)
+{
+	ice_deinit_lag(pf);
+	if (test_bit(ICE_FLAG_DCB_CAPABLE, pf->flags))
+		ice_cfg_lldp_mib_change(&pf->hw, false);
+	ice_deinit_fdir(pf);
+	if (ice_is_feature_supported(pf, ICE_F_GNSS))
+		ice_gnss_exit(pf);
+	if (test_bit(ICE_FLAG_PTP_SUPPORTED, pf->flags))
+		ice_ptp_release(pf);
+}
+
+static void ice_init_wakeup(struct ice_pf *pf)
+{
+	/* Save wakeup reason register for later use */
+	pf->wakeup_reason = rd32(&pf->hw, PFPM_WUS);
+
+	/* check for a power management event */
+	ice_print_wake_reason(pf);
+
+	/* clear wake status, all bits */
+	wr32(&pf->hw, PFPM_WUS, U32_MAX);
+
+	/* Disable WoL at init, wait for user to enable */
+	device_set_wakeup_enable(ice_pf_to_dev(pf), false);
+}
+
+static int ice_init_link(struct ice_pf *pf)
+{
+	struct device *dev = ice_pf_to_dev(pf);
+	int err;
 
 	err = ice_init_link_events(pf->hw.port_info);
 	if (err) {
 		dev_err(dev, "ice_init_link_events failed: %d\n", err);
-		goto err_send_version_unroll;
+		return err;
 	}
 
 	/* not a fatal error if this fails */
@@ -4848,91 +4828,336 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 		set_bit(ICE_FLAG_NO_MEDIA, pf->flags);
 	}
 
-	ice_verify_cacheline_size(pf);
+	return err;
+}
 
-	/* Save wakeup reason register for later use */
-	pf->wakeup_reason = rd32(hw, PFPM_WUS);
+static int ice_init_pf_sw(struct ice_pf *pf)
+{
+	bool dvm = ice_is_dvm_ena(&pf->hw);
+	struct ice_vsi *vsi;
+	int err;
 
-	/* check for a power management event */
-	ice_print_wake_reason(pf);
+	/* create switch struct for the switch element created by FW on boot */
+	pf->first_sw = kzalloc(sizeof(*pf->first_sw), GFP_KERNEL);
+	if (!pf->first_sw)
+		return -ENOMEM;
 
-	/* clear wake status, all bits */
-	wr32(hw, PFPM_WUS, U32_MAX);
+	if (pf->hw.evb_veb)
+		pf->first_sw->bridge_mode = BRIDGE_MODE_VEB;
+	else
+		pf->first_sw->bridge_mode = BRIDGE_MODE_VEPA;
 
-	/* Disable WoL at init, wait for user to enable */
-	device_set_wakeup_enable(dev, false);
+	pf->first_sw->pf = pf;
 
-	if (ice_is_safe_mode(pf)) {
-		ice_set_safe_mode_vlan_cfg(pf);
-		goto probe_done;
+	/* record the sw_id available for later use */
+	pf->first_sw->sw_id = pf->hw.port_info->sw_id;
+
+	err = ice_aq_set_port_params(pf->hw.port_info, dvm, NULL);
+	if (err)
+		goto err_aq_set_port_params;
+
+	vsi = ice_pf_vsi_setup(pf, pf->hw.port_info);
+	if (!vsi) {
+		err = -ENOMEM;
+		goto err_pf_vsi_setup;
 	}
 
-	/* initialize DDP driven features */
-	if (test_bit(ICE_FLAG_PTP_SUPPORTED, pf->flags))
-		ice_ptp_init(pf);
+	return 0;
 
-	if (ice_is_feature_supported(pf, ICE_F_GNSS))
-		ice_gnss_init(pf);
+err_pf_vsi_setup:
+err_aq_set_port_params:
+	kfree(pf->first_sw);
+	return err;
+}
 
-	/* Note: Flow director init failure is non-fatal to load */
-	if (ice_init_fdir(pf))
-		dev_err(dev, "could not initialize flow director\n");
+static void ice_deinit_pf_sw(struct ice_pf *pf)
+{
+	struct ice_vsi *vsi = ice_get_main_vsi(pf);
 
-	/* Note: DCB init failure is non-fatal to load */
-	if (ice_init_pf_dcb(pf, false)) {
-		clear_bit(ICE_FLAG_DCB_CAPABLE, pf->flags);
-		clear_bit(ICE_FLAG_DCB_ENA, pf->flags);
-	} else {
-		ice_cfg_lldp_mib_change(&pf->hw, true);
+	if (!vsi)
+		return;
+
+	ice_vsi_release(vsi);
+	kfree(pf->first_sw);
+}
+
+static int ice_alloc_vsis(struct ice_pf *pf)
+{
+	struct device *dev = ice_pf_to_dev(pf);
+
+	pf->num_alloc_vsi = pf->hw.func_caps.guar_num_vsi;
+	if (!pf->num_alloc_vsi)
+		return -EIO;
+
+	if (pf->num_alloc_vsi > UDP_TUNNEL_NIC_MAX_SHARING_DEVICES) {
+		dev_warn(dev,
+			 "limiting the VSI count due to UDP tunnel limitation %d > %d\n",
+			 pf->num_alloc_vsi, UDP_TUNNEL_NIC_MAX_SHARING_DEVICES);
+		pf->num_alloc_vsi = UDP_TUNNEL_NIC_MAX_SHARING_DEVICES;
 	}
 
-	if (ice_init_lag(pf))
-		dev_warn(dev, "Failed to init link aggregation support\n");
+	pf->vsi = devm_kcalloc(dev, pf->num_alloc_vsi, sizeof(*pf->vsi),
+			       GFP_KERNEL);
+	if (!pf->vsi)
+		return -ENOMEM;
 
-	/* print PCI link speed and width */
-	pcie_print_link_status(pf->pdev);
+	return 0;
+}
 
-probe_done:
-	err = ice_register_netdev(pf);
-	if (err)
-		goto err_netdev_reg;
+static void ice_dealloc_vsis(struct ice_pf *pf)
+{
+	pf->num_alloc_vsi = 0;
+	devm_kfree(ice_pf_to_dev(pf), pf->vsi);
+	pf->vsi = NULL;
+}
+
+static int ice_init_devlink(struct ice_pf *pf)
+{
+	int err;
 
 	err = ice_devlink_register_params(pf);
 	if (err)
-		goto err_netdev_reg;
+		return err;
+
+	ice_devlink_init_regions(pf);
+	ice_devlink_register(pf);
+
+	return 0;
+}
+
+static void ice_deinit_devlink(struct ice_pf *pf)
+{
+	ice_devlink_unregister(pf);
+	ice_devlink_destroy_regions(pf);
+	ice_devlink_unregister_params(pf);
+}
+
+static int ice_init(struct ice_pf *pf)
+{
+	int err;
+
+	err = ice_init_dev(pf);
+	if (err)
+		return err;
+
+	err = ice_alloc_vsis(pf);
+	if (err)
+		goto err_alloc_vsis;
+
+	err = ice_init_pf_sw(pf);
+	if (err)
+		goto err_init_pf_sw;
+
+	ice_init_wakeup(pf);
+
+	err = ice_init_link(pf);
+	if (err)
+		goto err_init_link;
+
+	err = ice_send_version(pf);
+	if (err)
+		goto err_init_link;
+
+	ice_verify_cacheline_size(pf);
+
+	if (ice_is_safe_mode(pf))
+		ice_set_safe_mode_vlan_cfg(pf);
+	else
+		/* print PCI link speed and width */
+		pcie_print_link_status(pf->pdev);
 
 	/* ready to go, so clear down state bit */
 	clear_bit(ICE_DOWN, pf->state);
+	clear_bit(ICE_SERVICE_DIS, pf->state);
+
+	/* since everything is good, start the service timer */
+	mod_timer(&pf->serv_tmr, round_jiffies(jiffies + pf->serv_tmr_period));
+
+	return 0;
+
+err_init_link:
+	ice_deinit_pf_sw(pf);
+err_init_pf_sw:
+	ice_dealloc_vsis(pf);
+err_alloc_vsis:
+	ice_deinit_dev(pf);
+	return err;
+}
+
+static void ice_deinit(struct ice_pf *pf)
+{
+	set_bit(ICE_SERVICE_DIS, pf->state);
+	set_bit(ICE_DOWN, pf->state);
+
+	ice_deinit_dev(pf);
+	ice_dealloc_vsis(pf);
+	ice_deinit_pf_sw(pf);
+}
+
+/**
+ * ice_load - load pf by init hw and starting VSI
+ * @pf: pointer to the pf instance
+ */
+int ice_load(struct ice_pf *pf)
+{
+	struct ice_vsi *vsi;
+	int err;
+
+	err = ice_reset(&pf->hw, ICE_RESET_PFR);
+	if (err)
+		return err;
+
+	err = ice_init_dev(pf);
+	if (err)
+		return err;
+
+	vsi = ice_get_main_vsi(pf);
+	err = ice_vsi_cfg(vsi, NULL, NULL);
+	if (err)
+		goto err_vsi_cfg;
+
+	err = ice_start_eth(ice_get_main_vsi(pf));
+	if (err)
+		goto err_start_eth;
+
 	err = ice_init_rdma(pf);
+	if (err)
+		goto err_init_rdma;
+
+	ice_init_features(pf);
+	ice_service_task_restart(pf);
+
+	clear_bit(ICE_DOWN, pf->state);
+
+	return 0;
+
+err_init_rdma:
+	ice_vsi_close(ice_get_main_vsi(pf));
+err_start_eth:
+	ice_vsi_decfg(ice_get_main_vsi(pf));
+err_vsi_cfg:
+	ice_deinit_dev(pf);
+	return err;
+}
+
+/**
+ * ice_unload - unload pf by stopping VSI and deinit hw
+ * @pf: pointer to the pf instance
+ */
+void ice_unload(struct ice_pf *pf)
+{
+	ice_deinit_features(pf);
+	ice_deinit_rdma(pf);
+	ice_vsi_close(ice_get_main_vsi(pf));
+	ice_vsi_decfg(ice_get_main_vsi(pf));
+	ice_deinit_dev(pf);
+}
+
+/**
+ * ice_probe - Device initialization routine
+ * @pdev: PCI device information struct
+ * @ent: entry in ice_pci_tbl
+ *
+ * Returns 0 on success, negative on failure
+ */
+static int
+ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
+{
+	struct device *dev = &pdev->dev;
+	struct ice_pf *pf;
+	struct ice_hw *hw;
+	int err;
+
+	if (pdev->is_virtfn) {
+		dev_err(dev, "can't probe a virtual function\n");
+		return -EINVAL;
+	}
+
+	/* this driver uses devres, see
+	 * Documentation/driver-api/driver-model/devres.rst
+	 */
+	err = pcim_enable_device(pdev);
+	if (err)
+		return err;
+
+	err = pcim_iomap_regions(pdev, BIT(ICE_BAR0), dev_driver_string(dev));
 	if (err) {
-		dev_err(dev, "Failed to initialize RDMA: %d\n", err);
-		err = -EIO;
-		goto err_devlink_reg_param;
+		dev_err(dev, "BAR0 I/O map error %d\n", err);
+		return err;
 	}
 
-	ice_devlink_register(pf);
-	return 0;
+	pf = ice_allocate_pf(dev);
+	if (!pf)
+		return -ENOMEM;
 
-err_devlink_reg_param:
-	ice_devlink_unregister_params(pf);
-err_netdev_reg:
-err_send_version_unroll:
-	ice_vsi_release_all(pf);
-err_alloc_sw_unroll:
-	set_bit(ICE_SERVICE_DIS, pf->state);
+	/* initialize Auxiliary index to invalid value */
+	pf->aux_idx = -1;
+
+	/* set up for high or low DMA */
+	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
+	if (err) {
+		dev_err(dev, "DMA configuration failed: 0x%x\n", err);
+		return err;
+	}
+
+	pci_enable_pcie_error_reporting(pdev);
+	pci_set_master(pdev);
+
+	pf->pdev = pdev;
+	pci_set_drvdata(pdev, pf);
 	set_bit(ICE_DOWN, pf->state);
-	devm_kfree(dev, pf->first_sw);
-err_msix_misc_unroll:
-	ice_free_irq_msix_misc(pf);
-err_init_interrupt_unroll:
-	ice_clear_interrupt_scheme(pf);
-err_init_vsi_unroll:
-	devm_kfree(dev, pf->vsi);
-err_init_pf_unroll:
-	ice_deinit_pf(pf);
-	ice_devlink_destroy_regions(pf);
-	ice_deinit_hw(hw);
-err_exit_unroll:
+	/* Disable service task until DOWN bit is cleared */
+	set_bit(ICE_SERVICE_DIS, pf->state);
+
+	hw = &pf->hw;
+	hw->hw_addr = pcim_iomap_table(pdev)[ICE_BAR0];
+	pci_save_state(pdev);
+
+	hw->back = pf;
+	hw->port_info = NULL;
+	hw->vendor_id = pdev->vendor;
+	hw->device_id = pdev->device;
+	pci_read_config_byte(pdev, PCI_REVISION_ID, &hw->revision_id);
+	hw->subsystem_vendor_id = pdev->subsystem_vendor;
+	hw->subsystem_device_id = pdev->subsystem_device;
+	hw->bus.device = PCI_SLOT(pdev->devfn);
+	hw->bus.func = PCI_FUNC(pdev->devfn);
+	ice_set_ctrlq_len(hw);
+
+	pf->msg_enable = netif_msg_init(debug, ICE_DFLT_NETIF_M);
+
+#ifndef CONFIG_DYNAMIC_DEBUG
+	if (debug < -1)
+		hw->debug_mask = debug;
+#endif
+
+	err = ice_init(pf);
+	if (err)
+		goto err_init;
+
+	err = ice_init_eth(pf);
+	if (err)
+		goto err_init_eth;
+
+	err = ice_init_rdma(pf);
+	if (err)
+		goto err_init_rdma;
+
+	err = ice_init_devlink(pf);
+	if (err)
+		goto err_init_devlink;
+
+	ice_init_features(pf);
+
+	return 0;
+
+err_init_devlink:
+	ice_deinit_rdma(pf);
+err_init_rdma:
+	ice_deinit_eth(pf);
+err_init_eth:
+	ice_deinit(pf);
+err_init:
 	pci_disable_pcie_error_reporting(pdev);
 	pci_disable_device(pdev);
 	return err;
@@ -5008,7 +5233,7 @@ static void ice_remove(struct pci_dev *pdev)
 	struct ice_pf *pf = pci_get_drvdata(pdev);
 	int i;
 
-	ice_devlink_unregister(pf);
+	ice_deinit_devlink(pf);
 	for (i = 0; i < ICE_MAX_RESET_WAIT; i++) {
 		if (!ice_is_reset_in_progress(pf->state))
 			break;
@@ -5025,30 +5250,22 @@ static void ice_remove(struct pci_dev *pdev)
 	ice_service_task_stop(pf);
 
 	ice_aq_cancel_waiting_tasks(pf);
-	ice_deinit_rdma(pf);
-	ice_devlink_unregister_params(pf);
 	set_bit(ICE_DOWN, pf->state);
 
-	ice_deinit_lag(pf);
-	if (test_bit(ICE_FLAG_PTP_SUPPORTED, pf->flags))
-		ice_ptp_release(pf);
-	if (ice_is_feature_supported(pf, ICE_F_GNSS))
-		ice_gnss_exit(pf);
+	ice_deinit_features(pf);
+	ice_deinit_rdma(pf);
 	if (!ice_is_safe_mode(pf))
 		ice_remove_arfs(pf);
 	ice_setup_mc_magic_wake(pf);
 	ice_vsi_release_all(pf);
 	mutex_destroy(&(&pf->hw)->fdir_fltr_lock);
 	ice_set_wake(pf);
-	ice_free_irq_msix_misc(pf);
+	ice_deinit_dev(pf);
 	ice_for_each_vsi(pf, i) {
 		if (!pf->vsi[i])
 			continue;
 		ice_vsi_free_q_vectors(pf->vsi[i]);
 	}
-	ice_deinit_pf(pf);
-	ice_devlink_destroy_regions(pf);
-	ice_deinit_hw(&pf->hw);
 
 	/* Issue a PFR as part of the prescribed driver unload flow.  Do not
 	 * do it via ice_schedule_reset() since there is no need to rebuild
@@ -5056,7 +5273,6 @@ static void ice_remove(struct pci_dev *pdev)
 	 */
 	ice_reset(&pf->hw, ICE_RESET_PFR);
 	pci_wait_for_pending_transaction(pdev);
-	ice_clear_interrupt_scheme(pf);
 	pci_disable_pcie_error_reporting(pdev);
 	pci_disable_device(pdev);
 }
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH net-next 07/13] ice: sync netdev filters after clearing VSI
  2022-11-14 12:57 [PATCH net-next 00/13] resource management using devlink reload Michal Swiatkowski
                   ` (5 preceding siblings ...)
  2022-11-14 12:57 ` [PATCH net-next 06/13] ice: split probe into smaller functions Michal Swiatkowski
@ 2022-11-14 12:57 ` Michal Swiatkowski
  2022-11-14 12:57 ` [PATCH net-next 08/13] ice: move VSI delete outside deconfig Michal Swiatkowski
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-14 12:57 UTC (permalink / raw)
  To: netdev, davem, kuba, pabeni, edumazet
  Cc: intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	sridhar.samudrala, wojciech.drewek, lukasz.czapnik,
	shiraz.saleem, jesse.brandeburg, mustafa.ismail,
	przemyslaw.kitszel, piotr.raczynski, jacob.e.keller,
	david.m.ertman, leszek.kaliszczuk, Michal Swiatkowski

In the driver reload path the netdev isn't removed, but the VSI is.
Remove the filters on the netdev right after removing them from the
VSI; otherwise the netdev's address lists would still be treated as
synced and the filters would not be re-added to the newly created VSI
after reload.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
 drivers/net/ethernet/intel/ice/ice_fltr.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_fltr.c b/drivers/net/ethernet/intel/ice/ice_fltr.c
index 40e678cfb507..aff7a141c30d 100644
--- a/drivers/net/ethernet/intel/ice/ice_fltr.c
+++ b/drivers/net/ethernet/intel/ice/ice_fltr.c
@@ -208,6 +208,11 @@ static int ice_fltr_remove_eth_list(struct ice_vsi *vsi, struct list_head *list)
 void ice_fltr_remove_all(struct ice_vsi *vsi)
 {
 	ice_remove_vsi_fltr(&vsi->back->hw, vsi->idx);
+	/* sync netdev filters if exist */
+	if (vsi->netdev) {
+		__dev_uc_unsync(vsi->netdev, NULL);
+		__dev_mc_unsync(vsi->netdev, NULL);
+	}
 }
 
 /**
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH net-next 08/13] ice: move VSI delete outside deconfig
  2022-11-14 12:57 [PATCH net-next 00/13] resource management using devlink reload Michal Swiatkowski
                   ` (6 preceding siblings ...)
  2022-11-14 12:57 ` [PATCH net-next 07/13] ice: sync netdev filters after clearing VSI Michal Swiatkowski
@ 2022-11-14 12:57 ` Michal Swiatkowski
  2022-11-14 12:57 ` [PATCH net-next 09/13] ice: update VSI instead of init in some case Michal Swiatkowski
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-14 12:57 UTC (permalink / raw)
  To: netdev, davem, kuba, pabeni, edumazet
  Cc: intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	sridhar.samudrala, wojciech.drewek, lukasz.czapnik,
	shiraz.saleem, jesse.brandeburg, mustafa.ismail,
	przemyslaw.kitszel, piotr.raczynski, jacob.e.keller,
	david.m.ertman, leszek.kaliszczuk, Michal Swiatkowski

During deconfig the VSI shouldn't be deleted from HW.

Rewrite the VSI delete functions to reflect that sometimes only
removing the VSI from HW is needed, without freeing the memory:
ice_vsi_delete() -> delete from HW and free the memory
ice_vsi_delete_from_hw() -> delete only from HW

The value returned from ice_vsi_free() is never used. Change the return
type to void.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
 drivers/net/ethernet/intel/ice/ice_lib.c  | 28 +++++++++++------------
 drivers/net/ethernet/intel/ice/ice_lib.h  |  1 -
 drivers/net/ethernet/intel/ice/ice_main.c |  5 +---
 3 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index dc672af10c25..32da2da74056 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -282,10 +282,10 @@ static int ice_get_free_slot(void *array, int size, int curr)
 }
 
 /**
- * ice_vsi_delete - delete a VSI from the switch
+ * ice_vsi_delete_from_hw - delete a VSI from the switch
  * @vsi: pointer to VSI being removed
  */
-void ice_vsi_delete(struct ice_vsi *vsi)
+static void ice_vsi_delete_from_hw(struct ice_vsi *vsi)
 {
 	struct ice_pf *pf = vsi->back;
 	struct ice_vsi_ctx *ctxt;
@@ -353,26 +353,21 @@ static void ice_vsi_free_arrays(struct ice_vsi *vsi)
  *
  * This deallocates the VSI's queue resources, removes it from the PF's
  * VSI array if necessary, and deallocates the VSI
- *
- * Returns 0 on success, negative on failure
  */
-int ice_vsi_free(struct ice_vsi *vsi)
+static void ice_vsi_free(struct ice_vsi *vsi)
 {
 	struct ice_pf *pf = NULL;
 	struct device *dev;
 
-	if (!vsi)
-		return 0;
-
-	if (!vsi->back)
-		return -EINVAL;
+	if (!vsi || !vsi->back)
+		return;
 
 	pf = vsi->back;
 	dev = ice_pf_to_dev(pf);
 
 	if (!pf->vsi[vsi->idx] || pf->vsi[vsi->idx] != vsi) {
 		dev_dbg(dev, "vsi does not exist at pf->vsi[%d]\n", vsi->idx);
-		return -EINVAL;
+		return;
 	}
 
 	mutex_lock(&pf->sw_mutex);
@@ -384,8 +379,12 @@ int ice_vsi_free(struct ice_vsi *vsi)
 	ice_vsi_free_arrays(vsi);
 	mutex_unlock(&pf->sw_mutex);
 	devm_kfree(dev, vsi);
+}
 
-	return 0;
+void ice_vsi_delete(struct ice_vsi *vsi)
+{
+	ice_vsi_delete_from_hw(vsi);
+	ice_vsi_free(vsi);
 }
 
 /**
@@ -2681,7 +2680,7 @@ ice_vsi_cfg_def(struct ice_vsi *vsi, struct ice_vf *vf, struct ice_channel *ch)
 unroll_alloc_q_vector:
 	ice_vsi_free_q_vectors(vsi);
 unroll_vsi_init:
-	ice_vsi_delete(vsi);
+	ice_vsi_delete_from_hw(vsi);
 unroll_get_qs:
 	ice_vsi_put_qs(vsi);
 unroll_vsi_alloc:
@@ -2742,7 +2741,6 @@ void ice_vsi_decfg(struct ice_vsi *vsi)
 
 	ice_vsi_clear_rings(vsi);
 	ice_vsi_free_q_vectors(vsi);
-	ice_vsi_delete(vsi);
 	ice_vsi_put_qs(vsi);
 	ice_vsi_free_arrays(vsi);
 
@@ -3140,7 +3138,7 @@ int ice_vsi_release(struct ice_vsi *vsi)
 	 * for ex: during rmmod.
 	 */
 	if (!ice_is_reset_in_progress(pf->state))
-		ice_vsi_free(vsi);
+		ice_vsi_delete(vsi);
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h b/drivers/net/ethernet/intel/ice/ice_lib.h
index ad4d5314ca76..8905f8721a76 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_lib.h
@@ -42,7 +42,6 @@ void ice_cfg_sw_lldp(struct ice_vsi *vsi, bool tx, bool create);
 int ice_set_link(struct ice_vsi *vsi, bool ena);
 
 void ice_vsi_delete(struct ice_vsi *vsi);
-int ice_vsi_free(struct ice_vsi *vsi);
 
 int ice_vsi_cfg_tc(struct ice_vsi *vsi, u8 ena_tc);
 
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index bf98334fadbc..49d29eb61c17 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -8555,12 +8555,9 @@ static void ice_remove_q_channels(struct ice_vsi *vsi, bool rem_fltr)
 		/* clear the VSI from scheduler tree */
 		ice_rm_vsi_lan_cfg(ch->ch_vsi->port_info, ch->ch_vsi->idx);
 
-		/* Delete VSI from FW */
+		/* Delete VSI from FW, PF and HW VSI arrays */
 		ice_vsi_delete(ch->ch_vsi);
 
-		/* Delete VSI from PF and HW VSI arrays */
-		ice_vsi_free(ch->ch_vsi);
-
 		/* free the channel */
 		kfree(ch);
 	}
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH net-next 09/13] ice: update VSI instead of init in some case
  2022-11-14 12:57 [PATCH net-next 00/13] resource management using devlink reload Michal Swiatkowski
                   ` (7 preceding siblings ...)
  2022-11-14 12:57 ` [PATCH net-next 08/13] ice: move VSI delete outside deconfig Michal Swiatkowski
@ 2022-11-14 12:57 ` Michal Swiatkowski
  2022-11-14 12:57 ` [PATCH net-next 10/13] ice: implement devlink reinit action Michal Swiatkowski
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-14 12:57 UTC (permalink / raw)
  To: netdev, davem, kuba, pabeni, edumazet
  Cc: intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	sridhar.samudrala, wojciech.drewek, lukasz.czapnik,
	shiraz.saleem, jesse.brandeburg, mustafa.ismail,
	przemyslaw.kitszel, piotr.raczynski, jacob.e.keller,
	david.m.ertman, leszek.kaliszczuk, Michal Swiatkowski

ice_vsi_cfg() is called from different contexts:
1) the VSI exists in HW but is being reconfigured, for example because
   the number of queues changed -> an update command should be used
   instead of init
2) the VSI doesn't exist, because a reset has happened -> an init
   command should be sent

To support both cases, pass a flag that tells which type of command has
to be sent to HW.
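
As an illustration, the first call below mirrors ice_vsi_setup() in
this patch, while the second shows a hypothetical reconfigure caller
(names and context assumed, not taken verbatim from the driver):

	/* VSI not present in HW (first setup or after reset): add VSI */
	ret = ice_vsi_cfg(vsi, vf, ch, ICE_VSI_FLAG_INIT);

	/* VSI already present in HW, only being reconfigured: update VSI */
	ret = ice_vsi_cfg(vsi, NULL, NULL, ICE_VSI_FLAG_NO_INIT);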

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
 drivers/net/ethernet/intel/ice/ice_lib.c  | 16 ++++++++++------
 drivers/net/ethernet/intel/ice/ice_lib.h  |  4 ++--
 drivers/net/ethernet/intel/ice/ice_main.c |  2 +-
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 32da2da74056..bc04f2b9f8a2 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -2551,9 +2551,11 @@ static int ice_vsi_cfg_tc_lan(struct ice_pf *pf, struct ice_vsi *vsi)
  * @vf: pointer to VF to which this VSI connects. This field is used primarily
  *      for the ICE_VSI_VF type. Other VSI types should pass NULL.
  * @ch: ptr to channel
+ * @init_vsi: is this an initialization or a reconfigure of the VSI
  */
 static int
-ice_vsi_cfg_def(struct ice_vsi *vsi, struct ice_vf *vf, struct ice_channel *ch)
+ice_vsi_cfg_def(struct ice_vsi *vsi, struct ice_vf *vf, struct ice_channel *ch,
+		int init_vsi)
 {
 	struct device *dev = ice_pf_to_dev(vsi->back);
 	struct ice_pf *pf = vsi->back;
@@ -2580,7 +2582,7 @@ ice_vsi_cfg_def(struct ice_vsi *vsi, struct ice_vf *vf, struct ice_channel *ch)
 	ice_vsi_set_tc_cfg(vsi);
 
 	/* create the VSI */
-	ret = ice_vsi_init(vsi, true);
+	ret = ice_vsi_init(vsi, init_vsi);
 	if (ret)
 		goto unroll_get_qs;
 
@@ -2694,12 +2696,14 @@ ice_vsi_cfg_def(struct ice_vsi *vsi, struct ice_vf *vf, struct ice_channel *ch)
  * @vf: pointer to VF to which this VSI connects. This field is used primarily
  *      for the ICE_VSI_VF type. Other VSI types should pass NULL.
  * @ch: ptr to channel
+ * @init_vsi: is this an initialization or a reconfigure of the VSI
  */
-int ice_vsi_cfg(struct ice_vsi *vsi, struct ice_vf *vf, struct ice_channel *ch)
+int ice_vsi_cfg(struct ice_vsi *vsi, struct ice_vf *vf, struct ice_channel *ch,
+		int init_vsi)
 {
 	int ret;
 
-	ret = ice_vsi_cfg_def(vsi, vf, ch);
+	ret = ice_vsi_cfg_def(vsi, vf, ch, init_vsi);
 	if (ret)
 		return ret;
 
@@ -2796,7 +2800,7 @@ ice_vsi_setup(struct ice_pf *pf, struct ice_port_info *pi,
 		return NULL;
 	}
 
-	ret = ice_vsi_cfg(vsi, vf, ch);
+	ret = ice_vsi_cfg(vsi, vf, ch, ICE_VSI_FLAG_INIT);
 	if (ret)
 		goto err_vsi_cfg;
 
@@ -3286,7 +3290,7 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, int init_vsi)
 	prev_num_q_vectors = ice_vsi_rebuild_get_coalesce(vsi, coalesce);
 
 	ice_vsi_decfg(vsi);
-	ret = ice_vsi_cfg_def(vsi, vsi->vf, vsi->ch);
+	ret = ice_vsi_cfg_def(vsi, vsi->vf, vsi->ch, init_vsi);
 	if (ret)
 		goto err_vsi_cfg;
 
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h b/drivers/net/ethernet/intel/ice/ice_lib.h
index 8905f8721a76..b76f05e1f8a3 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_lib.h
@@ -60,8 +60,6 @@ int ice_vsi_release(struct ice_vsi *vsi);
 
 void ice_vsi_close(struct ice_vsi *vsi);
 
-int ice_vsi_cfg(struct ice_vsi *vsi, struct ice_vf *vf,
-		struct ice_channel *ch);
 int ice_ena_vsi(struct ice_vsi *vsi, bool locked);
 
 void ice_vsi_decfg(struct ice_vsi *vsi);
@@ -75,6 +73,8 @@ ice_get_res(struct ice_pf *pf, struct ice_res_tracker *res, u16 needed, u16 id);
 #define ICE_VSI_FLAG_INIT	BIT(0)
 #define ICE_VSI_FLAG_NO_INIT	0
 int ice_vsi_rebuild(struct ice_vsi *vsi, int init_vsi);
+int ice_vsi_cfg(struct ice_vsi *vsi, struct ice_vf *vf,
+		struct ice_channel *ch, int init_vsi);
 
 bool ice_is_reset_in_progress(unsigned long *state);
 int ice_wait_for_reset(struct ice_pf *pf, unsigned long timeout);
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 49d29eb61c17..9766032e95ec 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -5012,7 +5012,7 @@ int ice_load(struct ice_pf *pf)
 		return err;
 
 	vsi = ice_get_main_vsi(pf);
-	err = ice_vsi_cfg(vsi, NULL, NULL);
+	err = ice_vsi_cfg(vsi, NULL, NULL, ICE_VSI_FLAG_INIT);
 	if (err)
 		goto err_vsi_cfg;
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH net-next 10/13] ice: implement devlink reinit action
  2022-11-14 12:57 [PATCH net-next 00/13] resource management using devlink reload Michal Swiatkowski
                   ` (8 preceding siblings ...)
  2022-11-14 12:57 ` [PATCH net-next 09/13] ice: update VSI instead of init in some case Michal Swiatkowski
@ 2022-11-14 12:57 ` Michal Swiatkowski
  2022-11-14 12:57 ` [PATCH net-next 11/13] ice: introduce eswitch capable flag Michal Swiatkowski
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-14 12:57 UTC (permalink / raw)
  To: netdev, davem, kuba, pabeni, edumazet
  Cc: intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	sridhar.samudrala, wojciech.drewek, lukasz.czapnik,
	shiraz.saleem, jesse.brandeburg, mustafa.ismail,
	przemyslaw.kitszel, piotr.raczynski, jacob.e.keller,
	david.m.ertman, leszek.kaliszczuk, Michal Swiatkowski

Call ice_unload() and ice_load() in the driver reinit flow.

Block reinit when switchdev, ADQ or SR-IOV is active. In the reload
path we don't want to rebuild all features. Ask the user to remove them
instead of quietly removing them in the reload path.
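
With this in place, reload_down maps to ice_unload() and reload_up to
ice_load() for the reinit action. Assuming iproute2's devlink and a
placeholder PCI address, the two supported actions would be invoked as:

$ devlink dev reload pci/<pci-address> action driver_reinit
$ devlink dev reload pci/<pci-address> action fw_activate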

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
 drivers/net/ethernet/intel/ice/ice_devlink.c | 103 +++++++++++++++----
 1 file changed, 81 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
index 455489e9457d..8f73c4008a56 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.c
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
@@ -376,10 +376,7 @@ static int ice_devlink_info_get(struct devlink *devlink,
 
 /**
  * ice_devlink_reload_empr_start - Start EMP reset to activate new firmware
- * @devlink: pointer to the devlink instance to reload
- * @netns_change: if true, the network namespace is changing
- * @action: the action to perform. Must be DEVLINK_RELOAD_ACTION_FW_ACTIVATE
- * @limit: limits on what reload should do, such as not resetting
+ * @pf: pointer to the pf instance
  * @extack: netlink extended ACK structure
  *
  * Allow user to activate new Embedded Management Processor firmware by
@@ -392,12 +389,9 @@ static int ice_devlink_info_get(struct devlink *devlink,
  * any source.
  */
 static int
-ice_devlink_reload_empr_start(struct devlink *devlink, bool netns_change,
-			      enum devlink_reload_action action,
-			      enum devlink_reload_limit limit,
+ice_devlink_reload_empr_start(struct ice_pf *pf,
 			      struct netlink_ext_ack *extack)
 {
-	struct ice_pf *pf = devlink_priv(devlink);
 	struct device *dev = ice_pf_to_dev(pf);
 	struct ice_hw *hw = &pf->hw;
 	u8 pending;
@@ -435,12 +429,52 @@ ice_devlink_reload_empr_start(struct devlink *devlink, bool netns_change,
 	return 0;
 }
 
+/**
+ * ice_devlink_reload_down - prepare for reload
+ * @devlink: pointer to the devlink instance to reload
+ * @netns_change: if true, the network namespace is changing
+ * @action: the action to perform
+ * @limit: limits on what reload should do, such as not resetting
+ * @extack: netlink extended ACK structure
+ */
+static int
+ice_devlink_reload_down(struct devlink *devlink, bool netns_change,
+			enum devlink_reload_action action,
+			enum devlink_reload_limit limit,
+			struct netlink_ext_ack *extack)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+
+	switch (action) {
+	case DEVLINK_RELOAD_ACTION_DRIVER_REINIT:
+		if (ice_is_eswitch_mode_switchdev(pf)) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "Go to legacy mode before doing reinit\n");
+			return -EOPNOTSUPP;
+		}
+		if (ice_is_adq_active(pf)) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "Turn off ADQ before doing reinit\n");
+			return -EOPNOTSUPP;
+		}
+		if (ice_has_vfs(pf)) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "Remove all VFs before doing reinit\n");
+			return -EOPNOTSUPP;
+		}
+		ice_unload(pf);
+		return 0;
+	case DEVLINK_RELOAD_ACTION_FW_ACTIVATE:
+		return ice_devlink_reload_empr_start(pf, extack);
+	default:
+		WARN_ON(1);
+		return -EOPNOTSUPP;
+	}
+}
+
 /**
  * ice_devlink_reload_empr_finish - Wait for EMP reset to finish
- * @devlink: pointer to the devlink instance reloading
- * @action: the action requested
- * @limit: limits imposed by userspace, such as not resetting
- * @actions_performed: on return, indicate what actions actually performed
+ * @pf: pointer to the pf instance
  * @extack: netlink extended ACK structure
  *
  * Wait for driver to finish rebuilding after EMP reset is completed. This
@@ -448,17 +482,11 @@ ice_devlink_reload_empr_start(struct devlink *devlink, bool netns_change,
  * for the driver's rebuild to complete.
  */
 static int
-ice_devlink_reload_empr_finish(struct devlink *devlink,
-			       enum devlink_reload_action action,
-			       enum devlink_reload_limit limit,
-			       u32 *actions_performed,
+ice_devlink_reload_empr_finish(struct ice_pf *pf,
 			       struct netlink_ext_ack *extack)
 {
-	struct ice_pf *pf = devlink_priv(devlink);
 	int err;
 
-	*actions_performed = BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE);
-
 	err = ice_wait_for_reset(pf, 60 * HZ);
 	if (err) {
 		NL_SET_ERR_MSG_MOD(extack, "Device still resetting after 1 minute");
@@ -713,12 +741,43 @@ ice_devlink_port_unsplit(struct devlink *devlink, struct devlink_port *port,
 	return ice_devlink_port_split(devlink, port, 1, extack);
 }
 
+/**
+ * ice_devlink_reload_up - do reload up after reinit
+ * @devlink: pointer to the devlink instance reloading
+ * @action: the action requested
+ * @limit: limits imposed by userspace, such as not resetting
+ * @actions_performed: on return, indicate what actions actually performed
+ * @extack: netlink extended ACK structure
+ */
+static int
+ice_devlink_reload_up(struct devlink *devlink,
+		      enum devlink_reload_action action,
+		      enum devlink_reload_limit limit,
+		      u32 *actions_performed,
+		      struct netlink_ext_ack *extack)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+
+	switch (action) {
+	case DEVLINK_RELOAD_ACTION_DRIVER_REINIT:
+		*actions_performed = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT);
+		return ice_load(pf);
+	case DEVLINK_RELOAD_ACTION_FW_ACTIVATE:
+		*actions_performed = BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE);
+		return ice_devlink_reload_empr_finish(pf, extack);
+	default:
+		WARN_ON(1);
+		return -EOPNOTSUPP;
+	}
+}
+
 static const struct devlink_ops ice_devlink_ops = {
 	.supported_flash_update_params = DEVLINK_SUPPORT_FLASH_UPDATE_OVERWRITE_MASK,
-	.reload_actions = BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE),
+	.reload_actions = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT) |
+			  BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE),
 	/* The ice driver currently does not support driver reinit */
-	.reload_down = ice_devlink_reload_empr_start,
-	.reload_up = ice_devlink_reload_empr_finish,
+	.reload_down = ice_devlink_reload_down,
+	.reload_up = ice_devlink_reload_up,
 	.port_split = ice_devlink_port_split,
 	.port_unsplit = ice_devlink_port_unsplit,
 	.eswitch_mode_get = ice_eswitch_mode_get,
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH net-next 11/13] ice: introduce eswitch capable flag
  2022-11-14 12:57 [PATCH net-next 00/13] resource management using devlink reload Michal Swiatkowski
                   ` (9 preceding siblings ...)
  2022-11-14 12:57 ` [PATCH net-next 10/13] ice: implement devlink reinit action Michal Swiatkowski
@ 2022-11-14 12:57 ` Michal Swiatkowski
  2022-11-14 12:57 ` [PATCH net-next 12/13] ice, irdma: prepare reservation of MSI-X to reload Michal Swiatkowski
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-14 12:57 UTC (permalink / raw)
  To: netdev, davem, kuba, pabeni, edumazet
  Cc: intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	sridhar.samudrala, wojciech.drewek, lukasz.czapnik,
	shiraz.saleem, jesse.brandeburg, mustafa.ismail,
	przemyslaw.kitszel, piotr.raczynski, jacob.e.keller,
	david.m.ertman, leszek.kaliszczuk, Michal Swiatkowski

The flag is used to check whether the hardware supports eswitch mode.
Block switching to switchdev if the flag is unset.

It can also be used to turn the eswitch feature off in order to save an
MSI-X interrupt.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h         | 1 +
 drivers/net/ethernet/intel/ice/ice_eswitch.c | 6 ++++++
 drivers/net/ethernet/intel/ice/ice_main.c    | 5 ++++-
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 62219f995cf2..df1f6d85cd43 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -501,6 +501,7 @@ enum ice_pf_flags {
 	ICE_FLAG_PLUG_AUX_DEV,
 	ICE_FLAG_MTU_CHANGED,
 	ICE_FLAG_GNSS,			/* GNSS successfully initialized */
+	ICE_FLAG_ESWITCH_CAPABLE,	/* switchdev mode can be supported */
 	ICE_PF_FLAGS_NBITS		/* must be last */
 };
 
diff --git a/drivers/net/ethernet/intel/ice/ice_eswitch.c b/drivers/net/ethernet/intel/ice/ice_eswitch.c
index f9f15acae90a..8532d5c47bad 100644
--- a/drivers/net/ethernet/intel/ice/ice_eswitch.c
+++ b/drivers/net/ethernet/intel/ice/ice_eswitch.c
@@ -554,6 +554,12 @@ ice_eswitch_mode_set(struct devlink *devlink, u16 mode,
 		return -EOPNOTSUPP;
 	}
 
+	if (!test_bit(ICE_FLAG_ESWITCH_CAPABLE, pf->flags)) {
+		dev_info(ice_pf_to_dev(pf), "There is no support for switchdev in hardware");
+		NL_SET_ERR_MSG_MOD(extack, "There is no support for switchdev in hardware");
+		return -EOPNOTSUPP;
+	}
+
 	switch (mode) {
 	case DEVLINK_ESWITCH_MODE_LEGACY:
 		dev_info(ice_pf_to_dev(pf), "PF %d changed eswitch mode to legacy",
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 9766032e95ec..0dfc427e623a 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3739,8 +3739,10 @@ static void ice_set_pf_caps(struct ice_pf *pf)
 	if (func_caps->common_cap.dcb)
 		set_bit(ICE_FLAG_DCB_CAPABLE, pf->flags);
 	clear_bit(ICE_FLAG_SRIOV_CAPABLE, pf->flags);
+	clear_bit(ICE_FLAG_ESWITCH_CAPABLE, pf->flags);
 	if (func_caps->common_cap.sr_iov_1_1) {
 		set_bit(ICE_FLAG_SRIOV_CAPABLE, pf->flags);
+		set_bit(ICE_FLAG_ESWITCH_CAPABLE, pf->flags);
 		pf->vfs.num_supported = min_t(int, func_caps->num_allocd_vfs,
 					      ICE_MAX_SRIOV_VFS);
 	}
@@ -3881,7 +3883,8 @@ static int ice_ena_msix_range(struct ice_pf *pf)
 		v_other += ICE_FDIR_MSIX;
 
 	/* switchdev */
-	v_other += ICE_ESWITCH_MSIX;
+	if (test_bit(ICE_FLAG_ESWITCH_CAPABLE, pf->flags))
+		v_other += ICE_ESWITCH_MSIX;
 
 	v_wanted = v_other;
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH net-next 12/13] ice, irdma: prepare reservation of MSI-X to reload
  2022-11-14 12:57 [PATCH net-next 00/13] resource management using devlink reload Michal Swiatkowski
                   ` (10 preceding siblings ...)
  2022-11-14 12:57 ` [PATCH net-next 11/13] ice: introduce eswitch capable flag Michal Swiatkowski
@ 2022-11-14 12:57 ` Michal Swiatkowski
  2022-11-15  5:08   ` Jakub Kicinski
  2022-11-14 12:57 ` [PATCH net-next 13/13] devlink, ice: add MSIX vectors as devlink resource Michal Swiatkowski
  2022-11-14 13:23 ` [PATCH net-next 00/13] resource management using devlink reload Leon Romanovsky
  13 siblings, 1 reply; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-14 12:57 UTC (permalink / raw)
  To: netdev, davem, kuba, pabeni, edumazet
  Cc: intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	sridhar.samudrala, wojciech.drewek, lukasz.czapnik,
	shiraz.saleem, jesse.brandeburg, mustafa.ismail,
	przemyslaw.kitszel, piotr.raczynski, jacob.e.keller,
	david.m.ertman, leszek.kaliszczuk, Michal Swiatkowski

Move the MSI-X numbers for LAN and RDMA into a structure to keep them
in one place. Use req_msix to store how many MSI-X vectors the user
requested for each feature. The msix structure stores the number of
MSI-X vectors currently in use.

The MSI-X numbers need to be adjusted if the kernel or the hardware
doesn't support that many MSI-X vectors. Rewrite the MSI-X adjustment
function so it can be used in both cases.

Use the same algorithm as previously: first allocate the minimum MSI-X
for each feature, then, if there are enough vectors left, distribute
them equally between the features (a sketch follows below).
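
A minimal sketch of that distribution (illustrative only, with assumed
per-feature minimums; not the driver's actual adjustment function):

	/* Give each feature its minimum, then split the remainder evenly. */
	static void ice_adjust_msix_sketch(u16 total, u16 min_eth, u16 min_rdma,
					   u16 *eth, u16 *rdma)
	{
		u16 rest;

		*eth = min_eth;
		*rdma = min_rdma;
		if (total <= min_eth + min_rdma)
			return;

		rest = total - min_eth - min_rdma;
		*eth += rest / 2;
		*rdma += rest - rest / 2;
	}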

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Co-developed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
---
 drivers/infiniband/hw/irdma/main.c           |   2 +-
 drivers/net/ethernet/intel/ice/ice.h         |  16 +-
 drivers/net/ethernet/intel/ice/ice_ethtool.c |   4 +-
 drivers/net/ethernet/intel/ice/ice_idc.c     |   6 +-
 drivers/net/ethernet/intel/ice/ice_lib.c     |   8 +-
 drivers/net/ethernet/intel/ice/ice_main.c    | 284 +++++++++++--------
 6 files changed, 189 insertions(+), 131 deletions(-)

diff --git a/drivers/infiniband/hw/irdma/main.c b/drivers/infiniband/hw/irdma/main.c
index 514453777e07..0a70b27b93a3 100644
--- a/drivers/infiniband/hw/irdma/main.c
+++ b/drivers/infiniband/hw/irdma/main.c
@@ -230,7 +230,7 @@ static void irdma_fill_device_info(struct irdma_device *iwdev, struct ice_pf *pf
 	rf->gen_ops.unregister_qset = irdma_lan_unregister_qset;
 	rf->hw.hw_addr = pf->hw.hw_addr;
 	rf->pcidev = pf->pdev;
-	rf->msix_count =  pf->num_rdma_msix;
+	rf->msix_count =  pf->msix.rdma;
 	rf->pf_id = pf->hw.pf_id;
 	rf->msix_entries = &pf->msix_entries[pf->rdma_base_vector];
 	rf->default_vsi.vsi_idx = vsi->vsi_num;
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index df1f6d85cd43..ef9ea43470bf 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -518,6 +518,14 @@ struct ice_agg_node {
 	u8 valid;
 };
 
+struct ice_msix {
+	u16 all;			/* All reserved MSI-X */
+	u16 misc;			/* MSI-X reserved for misc */
+	u16 eth;			/* MSI-X reserved for eth */
+	u16 rdma;			/* MSI-X reserved for RDMA */
+	u16 vf;				/* MSI-X reserved for VF */
+};
+
 struct ice_pf {
 	struct pci_dev *pdev;
 
@@ -531,6 +539,10 @@ struct ice_pf {
 	/* OS reserved IRQ details */
 	struct msix_entry *msix_entries;
 	struct ice_res_tracker *irq_tracker;
+	/* Store MSI-X requested by user via devlink */
+	struct ice_msix req_msix;
+	/* Store currently used MSI-X */
+	struct ice_msix msix;
 	/* First MSIX vector used by SR-IOV VFs. Calculated by subtracting the
 	 * number of MSIX vectors needed for all SR-IOV VFs from the number of
 	 * MSIX vectors allowed on this PF.
@@ -561,7 +573,6 @@ struct ice_pf {
 	struct tty_driver *ice_gnss_tty_driver;
 	struct tty_port *gnss_tty_port[ICE_GNSS_TTY_MINOR_DEVICES];
 	struct gnss_serial *gnss_serial[ICE_GNSS_TTY_MINOR_DEVICES];
-	u16 num_rdma_msix;		/* Total MSIX vectors for RDMA driver */
 	u16 rdma_base_vector;
 
 	/* spinlock to protect the AdminQ wait list */
@@ -578,7 +589,6 @@ struct ice_pf {
 	u16 num_avail_sw_msix;	/* remaining MSIX SW vectors left unclaimed */
 	u16 max_pf_txqs;	/* Total Tx queues PF wide */
 	u16 max_pf_rxqs;	/* Total Rx queues PF wide */
-	u16 num_lan_msix;	/* Total MSIX vectors for base driver */
 	u16 num_lan_tx;		/* num LAN Tx queues setup */
 	u16 num_lan_rx;		/* num LAN Rx queues setup */
 	u16 next_vsi;		/* Next free slot in pf->vsi[] - 0-based! */
@@ -934,7 +944,7 @@ void ice_unload(struct ice_pf *pf);
  */
 static inline void ice_set_rdma_cap(struct ice_pf *pf)
 {
-	if (pf->hw.func_caps.common_cap.rdma && pf->num_rdma_msix) {
+	if (pf->hw.func_caps.common_cap.rdma && pf->msix.rdma) {
 		set_bit(ICE_FLAG_RDMA_ENA, pf->flags);
 		set_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags);
 	}
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index 1dfce48cfdc2..65f71da6bc70 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -3513,7 +3513,7 @@ ice_get_ts_info(struct net_device *dev, struct ethtool_ts_info *info)
  */
 static int ice_get_max_txq(struct ice_pf *pf)
 {
-	return min3(pf->num_lan_msix, (u16)num_online_cpus(),
+	return min3(pf->msix.eth, (u16)num_online_cpus(),
 		    (u16)pf->hw.func_caps.common_cap.num_txq);
 }
 
@@ -3523,7 +3523,7 @@ static int ice_get_max_txq(struct ice_pf *pf)
  */
 static int ice_get_max_rxq(struct ice_pf *pf)
 {
-	return min3(pf->num_lan_msix, (u16)num_online_cpus(),
+	return min3(pf->msix.eth, (u16)num_online_cpus(),
 		    (u16)pf->hw.func_caps.common_cap.num_rxq);
 }
 
diff --git a/drivers/net/ethernet/intel/ice/ice_idc.c b/drivers/net/ethernet/intel/ice/ice_idc.c
index e6bc2285071e..2534a3460596 100644
--- a/drivers/net/ethernet/intel/ice/ice_idc.c
+++ b/drivers/net/ethernet/intel/ice/ice_idc.c
@@ -237,11 +237,11 @@ static int ice_reserve_rdma_qvector(struct ice_pf *pf)
 	if (ice_is_rdma_ena(pf)) {
 		int index;
 
-		index = ice_get_res(pf, pf->irq_tracker, pf->num_rdma_msix,
+		index = ice_get_res(pf, pf->irq_tracker, pf->msix.rdma,
 				    ICE_RES_RDMA_VEC_ID);
 		if (index < 0)
 			return index;
-		pf->num_avail_sw_msix -= pf->num_rdma_msix;
+		pf->num_avail_sw_msix -= pf->msix.rdma;
 		pf->rdma_base_vector = (u16)index;
 	}
 	return 0;
@@ -253,7 +253,7 @@ static int ice_reserve_rdma_qvector(struct ice_pf *pf)
  */
 static void ice_free_rdma_qvector(struct ice_pf *pf)
 {
-	pf->num_avail_sw_msix -= pf->num_rdma_msix;
+	pf->num_avail_sw_msix -= pf->msix.rdma;
 	ice_free_res(pf->irq_tracker, pf->rdma_base_vector,
 		     ICE_RES_RDMA_VEC_ID);
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index bc04f2b9f8a2..11b0b3cca76c 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -184,7 +184,7 @@ static void ice_vsi_set_num_qs(struct ice_vsi *vsi, struct ice_vf *vf)
 			vsi->alloc_txq = vsi->req_txq;
 			vsi->num_txq = vsi->req_txq;
 		} else {
-			vsi->alloc_txq = min3(pf->num_lan_msix,
+			vsi->alloc_txq = min3(pf->msix.eth,
 					      ice_get_avail_txq_count(pf),
 					      (u16)num_online_cpus());
 		}
@@ -199,7 +199,7 @@ static void ice_vsi_set_num_qs(struct ice_vsi *vsi, struct ice_vf *vf)
 				vsi->alloc_rxq = vsi->req_rxq;
 				vsi->num_rxq = vsi->req_rxq;
 			} else {
-				vsi->alloc_rxq = min3(pf->num_lan_msix,
+				vsi->alloc_rxq = min3(pf->msix.eth,
 						      ice_get_avail_rxq_count(pf),
 						      (u16)num_online_cpus());
 			}
@@ -207,7 +207,7 @@ static void ice_vsi_set_num_qs(struct ice_vsi *vsi, struct ice_vf *vf)
 
 		pf->num_lan_rx = vsi->alloc_rxq;
 
-		vsi->num_q_vectors = min_t(int, pf->num_lan_msix,
+		vsi->num_q_vectors = min_t(int, pf->msix.eth,
 					   max_t(int, vsi->alloc_rxq,
 						 vsi->alloc_txq));
 		break;
@@ -1100,7 +1100,7 @@ ice_chnl_vsi_setup_q_map(struct ice_vsi *vsi, struct ice_vsi_ctx *ctxt)
 	u8 offset = 0;
 	int pow;
 
-	qcount = min_t(int, vsi->num_rxq, pf->num_lan_msix);
+	qcount = min_t(int, vsi->num_rxq, pf->msix.eth);
 
 	pow = order_base_2(qcount);
 	qmap = ((offset << ICE_AQ_VSI_TC_Q_OFFSET_S) &
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 0dfc427e623a..a4c283bc8da0 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3818,119 +3818,193 @@ static int ice_init_pf(struct ice_pf *pf)
 }
 
 /**
- * ice_reduce_msix_usage - Reduce usage of MSI-X vectors
+ * ice_distribute_msix - distribute amount of MSI-X across all features
  * @pf: board private structure
- * @v_remain: number of remaining MSI-X vectors to be distributed
+ * @msix: structure that stores MSI-X amount for all features
+ * @num_msix: amount of MSI-X to distribute
+ * @req_lan: amount of requested LAN MSI-X
+ * @req_rdma: amount of requested RDMA MSI-X
  *
- * Reduce the usage of MSI-X vectors when entire request cannot be fulfilled.
- * pf->num_lan_msix and pf->num_rdma_msix values are set based on number of
- * remaining vectors.
+ * The function tries to distribute MSI-X across all supported features.
+ * As a side effect the function can turn off non-critical features if the
+ * number of MSI-X is too low.
+ *
+ * The function returns -ERANGE if the amount of MSI-X is too low to support
+ * basic driver operations. This is a critical error.
+ *
+ * In all other cases the function returns the total number of used MSI-X.
+ *
+ * First try to distribute the minimum amount of MSI-X across all features.
+ * If there are still remaining MSI-X, distribute them equally.
+ *
+ * The function can be called to distribute the MSI-X amount received from
+ * hardware or from the kernel. To support both cases MSI-X for VF isn't set
+ * here. It should be set outside this function based on the returned value.
  */
-static void ice_reduce_msix_usage(struct ice_pf *pf, int v_remain)
+static int
+ice_distribute_msix(struct ice_pf *pf, struct ice_msix *msix, int num_msix,
+		    int req_lan, int req_rdma)
 {
-	int v_rdma;
+	struct device *dev = ice_pf_to_dev(pf);
+	int rem_msix = num_msix;
+	int additional_msix;
 
-	if (!ice_is_rdma_ena(pf)) {
-		pf->num_lan_msix = v_remain;
-		return;
+	msix->all = num_msix;
+	msix->eth = 0;
+	msix->rdma = 0;
+	msix->misc = 0;
+
+	if (num_msix < ICE_MIN_MSIX) {
+		dev_warn(dev, "not enough MSI-X vectors. wanted = %d, available = %d\n",
+			 ICE_MIN_MSIX, num_msix);
+		return -ERANGE;
 	}
 
-	/* RDMA needs at least 1 interrupt in addition to AEQ MSIX */
-	v_rdma = ICE_RDMA_NUM_AEQ_MSIX + 1;
+	msix->misc = ICE_MIN_LAN_OICR_MSIX;
+	/* Minimum LAN */
+	msix->eth = ICE_MIN_LAN_TXRX_MSIX;
+	req_lan -= ICE_MIN_LAN_TXRX_MSIX;
+	rem_msix = rem_msix - msix->misc - msix->eth;
+	additional_msix = req_lan;
 
-	if (v_remain < ICE_MIN_LAN_TXRX_MSIX + ICE_MIN_RDMA_MSIX) {
-		dev_warn(ice_pf_to_dev(pf), "Not enough MSI-X vectors to support RDMA.\n");
-		clear_bit(ICE_FLAG_RDMA_ENA, pf->flags);
+	/* MSI-X distribution for misc */
+	if (test_bit(ICE_FLAG_FD_ENA, pf->flags)) {
+		if (rem_msix >= ICE_FDIR_MSIX) {
+			msix->misc += ICE_FDIR_MSIX;
+			rem_msix -= ICE_FDIR_MSIX;
+		} else {
+			dev_warn(dev, "not enough MSI-X vectors. wanted = %d, available = %d\n",
+				 ICE_FDIR_MSIX, rem_msix);
+			dev_warn(dev, "turning flow director off\n");
+			clear_bit(ICE_FLAG_FD_ENA, pf->flags);
+			goto exit;
+		}
+	}
+
+	if (test_bit(ICE_FLAG_ESWITCH_CAPABLE, pf->flags)) {
+		if (rem_msix >= ICE_ESWITCH_MSIX) {
+			msix->misc += ICE_ESWITCH_MSIX;
+			rem_msix -= ICE_ESWITCH_MSIX;
+		} else {
+			dev_warn(dev, "not enough MSI-X vectors. wanted = %d, available = %d\n",
+				 ICE_ESWITCH_MSIX, rem_msix);
+			dev_warn(dev, "turning eswitch off\n");
+			clear_bit(ICE_FLAG_ESWITCH_CAPABLE, pf->flags);
+			goto exit;
+		}
+	}
 
-		pf->num_rdma_msix = 0;
-		pf->num_lan_msix = ICE_MIN_LAN_TXRX_MSIX;
-	} else if ((v_remain < ICE_MIN_LAN_TXRX_MSIX + v_rdma) ||
-		   (v_remain - v_rdma < v_rdma)) {
-		/* Support minimum RDMA and give remaining vectors to LAN MSIX */
-		pf->num_rdma_msix = ICE_MIN_RDMA_MSIX;
-		pf->num_lan_msix = v_remain - ICE_MIN_RDMA_MSIX;
+	/* Minimum RDMA
+	 * The minimum MSI-X for RDMA is ICE_MIN_RDMA_MSIX, but it is better
+	 * to have ICE_RDMA_NUM_AEQ_MSIX + 1 for the queue. Instead of starting
+	 * with ICE_MIN_RDMA_MSIX and moving on to the LAN distribution, try
+	 * ICE_RDMA_NUM_AEQ_MSIX + 1 first if there are enough MSI-X for it.
+	 */
+	if (ice_is_rdma_ena(pf)) {
+		if (rem_msix >= ICE_RDMA_NUM_AEQ_MSIX + 1 &&
+		    req_rdma >= ICE_RDMA_NUM_AEQ_MSIX + 1) {
+			msix->rdma = ICE_RDMA_NUM_AEQ_MSIX + 1;
+			rem_msix -= msix->rdma;
+			req_rdma -= msix->rdma;
+			additional_msix += req_rdma;
+		} else if (rem_msix >= ICE_MIN_RDMA_MSIX) {
+			msix->rdma = ICE_MIN_RDMA_MSIX;
+			rem_msix -= ICE_MIN_RDMA_MSIX;
+			req_rdma -= ICE_MIN_RDMA_MSIX;
+			additional_msix += req_rdma;
+		} else {
+			dev_warn(dev, "not enough MSI-X vectors. wanted = %d, available = %d\n",
+				 ICE_MIN_RDMA_MSIX, rem_msix);
+			dev_warn(dev, "turning RDMA off\n");
+			clear_bit(ICE_FLAG_RDMA_ENA, pf->flags);
+			goto exit;
+		}
 	} else {
-		/* Split remaining MSIX with RDMA after accounting for AEQ MSIX
-		 */
-		pf->num_rdma_msix = (v_remain - ICE_RDMA_NUM_AEQ_MSIX) / 2 +
-				    ICE_RDMA_NUM_AEQ_MSIX;
-		pf->num_lan_msix = v_remain - pf->num_rdma_msix;
+		req_rdma = 0;
 	}
-}
 
-/**
- * ice_ena_msix_range - Request a range of MSIX vectors from the OS
- * @pf: board private structure
- *
- * Compute the number of MSIX vectors wanted and request from the OS. Adjust
- * device usage if there are not enough vectors. Return the number of vectors
- * reserved or negative on failure.
- */
-static int ice_ena_msix_range(struct ice_pf *pf)
-{
-	int num_cpus, hw_num_msix, v_other, v_wanted, v_actual;
-	struct device *dev = ice_pf_to_dev(pf);
-	int err, i;
+	/* Check if there is enough additional MSI-X for all used features */
+	if (rem_msix >= additional_msix) {
+		/* Best case scenario */
+		msix->eth += req_lan;
+		rem_msix -= req_lan;
 
-	hw_num_msix = pf->hw.func_caps.common_cap.num_msix_vectors;
-	num_cpus = num_online_cpus();
+		msix->rdma += req_rdma;
+		rem_msix -= req_rdma;
+	} else if (rem_msix) {
+		int factor = DIV_ROUND_UP(additional_msix, rem_msix);
+		int rdma = req_rdma / factor;
 
-	/* LAN miscellaneous handler */
-	v_other = ICE_MIN_LAN_OICR_MSIX;
+		dev_warn(dev, "not enough MSI-X vectors. requested = %d, obtained = %d\n",
+			 additional_msix, rem_msix);
 
-	/* Flow Director */
-	if (test_bit(ICE_FLAG_FD_ENA, pf->flags))
-		v_other += ICE_FDIR_MSIX;
+		msix->rdma += rdma;
+		rem_msix -= rdma;
 
-	/* switchdev */
-	if (test_bit(ICE_FLAG_ESWITCH_CAPABLE, pf->flags))
-		v_other += ICE_ESWITCH_MSIX;
+		msix->eth += rem_msix;
+		rem_msix = 0;
 
-	v_wanted = v_other;
+		dev_notice(dev, "enable %d MSI-X vectors for LAN and %d for RDMA\n",
+			   msix->eth, msix->rdma);
+	}
 
-	/* LAN traffic */
-	pf->num_lan_msix = num_cpus;
-	v_wanted += pf->num_lan_msix;
+	/* It is fine if rem_msix isn't 0 here, as it can be used for VFs.
+	 * It isn't done here because this function also covers distribution
+	 * of the MSI-X amount returned from the kernel.
+	 */
 
-	/* RDMA auxiliary driver */
-	if (ice_is_rdma_ena(pf)) {
-		pf->num_rdma_msix = num_cpus + ICE_RDMA_NUM_AEQ_MSIX;
-		v_wanted += pf->num_rdma_msix;
-	}
+exit:
+	/* Return the number of used MSI-X instead of the remaining count */
+	return num_msix - rem_msix;
+}
 
-	if (v_wanted > hw_num_msix) {
-		int v_remain;
+/**
+ * ice_set_def_msix - set default value for MSI-X based on hardware
+ * @pf: board private structure
+ *
+ * The function returns -ERANGE when there is a critical error and the
+ * minimum amount of MSI-X can't be used. On success it returns 0.
+ */
+static int
+ice_set_def_msix(struct ice_pf *pf)
+{
+	int hw_msix = pf->hw.func_caps.common_cap.num_msix_vectors;
+	int num_cpus = num_present_cpus();
+	int used_msix;
 
-		dev_warn(dev, "not enough device MSI-X vectors. wanted = %d, available = %d\n",
-			 v_wanted, hw_num_msix);
+	/* set default only if it wasn't previously set in devlink API */
+	if (pf->req_msix.eth)
+		return 0;
 
-		if (hw_num_msix < ICE_MIN_MSIX) {
-			err = -ERANGE;
-			goto exit_err;
-		}
+	used_msix = ice_distribute_msix(pf, &pf->req_msix, hw_msix, num_cpus,
+					num_cpus + ICE_RDMA_NUM_AEQ_MSIX);
 
-		v_remain = hw_num_msix - v_other;
-		if (v_remain < ICE_MIN_LAN_TXRX_MSIX) {
-			v_other = ICE_MIN_MSIX - ICE_MIN_LAN_TXRX_MSIX;
-			v_remain = ICE_MIN_LAN_TXRX_MSIX;
-		}
+	if (used_msix < 0)
+		return -ERANGE;
 
-		ice_reduce_msix_usage(pf, v_remain);
-		v_wanted = pf->num_lan_msix + pf->num_rdma_msix + v_other;
+	pf->req_msix.vf = hw_msix - used_msix;
 
-		dev_notice(dev, "Reducing request to %d MSI-X vectors for LAN traffic.\n",
-			   pf->num_lan_msix);
-		if (ice_is_rdma_ena(pf))
-			dev_notice(dev, "Reducing request to %d MSI-X vectors for RDMA.\n",
-				   pf->num_rdma_msix);
-	}
+	return 0;
+}
 
+/**
+ * ice_ena_msix_range - Request a range of MSIX vectors from the OS
+ * @pf: board private structure
+ *
+ * Compute the number of MSIX vectors wanted and request from the OS. Adjust
+ * device usage if there are not enough vectors. Return the number of vectors
+ * reserved or negative on failure.
+ */
+static int ice_ena_msix_range(struct ice_pf *pf)
+{
+	struct device *dev = ice_pf_to_dev(pf);
+	int v_wanted, v_actual, err, i;
+
+	v_wanted = pf->req_msix.eth + pf->req_msix.rdma + pf->req_msix.misc;
 	pf->msix_entries = devm_kcalloc(dev, v_wanted,
 					sizeof(*pf->msix_entries), GFP_KERNEL);
-	if (!pf->msix_entries) {
-		err = -ENOMEM;
-		goto exit_err;
-	}
+	if (!pf->msix_entries)
+		return -ENOMEM;
 
 	for (i = 0; i < v_wanted; i++)
 		pf->msix_entries[i].entry = i;
@@ -3938,46 +4012,17 @@ static int ice_ena_msix_range(struct ice_pf *pf)
 	/* actually reserve the vectors */
 	v_actual = pci_enable_msix_range(pf->pdev, pf->msix_entries,
 					 ICE_MIN_MSIX, v_wanted);
-	if (v_actual < 0) {
-		dev_err(dev, "unable to reserve MSI-X vectors\n");
-		err = v_actual;
-		goto msix_err;
-	}
 
-	if (v_actual < v_wanted) {
-		dev_warn(dev, "not enough OS MSI-X vectors. requested = %d, obtained = %d\n",
-			 v_wanted, v_actual);
-
-		if (v_actual < ICE_MIN_MSIX) {
-			/* error if we can't get minimum vectors */
-			pci_disable_msix(pf->pdev);
-			err = -ERANGE;
-			goto msix_err;
-		} else {
-			int v_remain = v_actual - v_other;
-
-			if (v_remain < ICE_MIN_LAN_TXRX_MSIX)
-				v_remain = ICE_MIN_LAN_TXRX_MSIX;
-
-			ice_reduce_msix_usage(pf, v_remain);
-
-			dev_notice(dev, "Enabled %d MSI-X vectors for LAN traffic.\n",
-				   pf->num_lan_msix);
-
-			if (ice_is_rdma_ena(pf))
-				dev_notice(dev, "Enabled %d MSI-X vectors for RDMA.\n",
-					   pf->num_rdma_msix);
-		}
-	}
+	v_actual = ice_distribute_msix(pf, &pf->msix, v_actual,
+				       pf->req_msix.eth, pf->req_msix.rdma);
+	if (v_actual < 0)
+		goto msix_err;
 
 	return v_actual;
 
 msix_err:
 	devm_kfree(dev, pf->msix_entries);
 
-exit_err:
-	pf->num_rdma_msix = 0;
-	pf->num_lan_msix = 0;
 	return err;
 }
 
@@ -4014,6 +4059,9 @@ static int ice_init_interrupt_scheme(struct ice_pf *pf)
 {
 	int vectors;
 
+	if (ice_set_def_msix(pf))
+		return -ERANGE;
+
 	vectors = ice_ena_msix_range(pf);
 
 	if (vectors < 0)
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH net-next 13/13] devlink, ice: add MSIX vectors as devlink resource
  2022-11-14 12:57 [PATCH net-next 00/13] resource management using devlink reload Michal Swiatkowski
                   ` (11 preceding siblings ...)
  2022-11-14 12:57 ` [PATCH net-next 12/13] ice, irdma: prepare reservation of MSI-X to reload Michal Swiatkowski
@ 2022-11-14 12:57 ` Michal Swiatkowski
  2022-11-14 15:28   ` Jiri Pirko
  2022-11-14 13:23 ` [PATCH net-next 00/13] resource management using devlink reload Leon Romanovsky
  13 siblings, 1 reply; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-14 12:57 UTC (permalink / raw)
  To: netdev, davem, kuba, pabeni, edumazet
  Cc: intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	sridhar.samudrala, wojciech.drewek, lukasz.czapnik,
	shiraz.saleem, jesse.brandeburg, mustafa.ismail,
	przemyslaw.kitszel, piotr.raczynski, jacob.e.keller,
	david.m.ertman, leszek.kaliszczuk, Michal Kubiak,
	Michal Swiatkowski

From: Michal Kubiak <michal.kubiak@intel.com>

Implement devlink resources to control how many MSI-X vectors are
used for eth, VF and RDMA. Show misc MSI-X as read-only.

This is a first approach to controlling the mix of resources managed
by the ice driver. This commit registers the number of available MSI-X
vectors as a devlink resource and also adds specific resources for eth,
VF and RDMA.

Also, make those resources generic.

$ devlink resource show pci/0000:31:00.0
  name msix size 1024 occ 172 unit entry dpipe_tables none
    resources:
      name msix_misc size 4 unit entry dpipe_tables none
      name msix_eth size 92 occ 92 unit entry size_min 1 size_max
	128 size_gran 1 dpipe_tables none
      name msix_vf size 128 occ 0 unit entry size_min 0 size_max
	1020 size_gran 1 dpipe_tables none
      name msix_rdma size 76 occ 76 unit entry size_min 0 size_max
	132 size_gran 1 dpipe_tables none

example commands:
$ devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
$ devlink resource set pci/0000:31:00.0 path msix/msix_vf size 512
$ devlink dev reload pci/0000:31:00.0

Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Co-developed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Co-developed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
 .../networking/devlink/devlink-resource.rst   |  10 ++
 drivers/net/ethernet/intel/ice/ice_devlink.c  | 160 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_devlink.h  |   2 +
 drivers/net/ethernet/intel/ice/ice_main.c     |   5 +
 drivers/net/ethernet/intel/ice/ice_sriov.c    |   3 +-
 include/net/devlink.h                         |  14 ++
 6 files changed, 192 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/devlink/devlink-resource.rst b/Documentation/networking/devlink/devlink-resource.rst
index 3d5ae51e65a2..dcd3d124ebfb 100644
--- a/Documentation/networking/devlink/devlink-resource.rst
+++ b/Documentation/networking/devlink/devlink-resource.rst
@@ -36,6 +36,16 @@ device drivers and their description must be added to the following table:
      - Description
    * - ``physical_ports``
      - A limited capacity of physical ports that the switch ASIC can support
+   * - ``msix``
+     - Amount of MSIX interrupt vectors that the device can support
+   * - ``msix_misc``
+     - Amount of MSIX interrupt vectors that can be used for miscellaneous purposes
+   * - ``msix_eth``
+     - Amount of MSIX interrupt vectors that can be used by ethernet
+   * - ``msix_vf``
+     - Amount of MSIX interrupt vectors that can be used by virtual functions
+   * - ``msix_rdma``
+     - Amount of MSIX interrupt vectors that can be used by RDMA
 
 example usage
 -------------
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
index 8f73c4008a56..00f01cf84812 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.c
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
@@ -496,6 +496,27 @@ ice_devlink_reload_empr_finish(struct ice_pf *pf,
 	return 0;
 }
 
+static void ice_devlink_read_resources_size(struct ice_pf *pf)
+{
+	struct devlink *devlink = priv_to_devlink(pf);
+	u64 size_new;
+
+	devl_resource_size_get(devlink,
+			       DEVLINK_RESOURCE_GENERIC_ID_MSIX_ETH,
+			       &size_new);
+	pf->req_msix.eth = size_new;
+
+	devl_resource_size_get(devlink,
+			       DEVLINK_RESOURCE_GENERIC_ID_MSIX_VF,
+			       &size_new);
+	pf->msix.vf = size_new;
+
+	devl_resource_size_get(devlink,
+			       DEVLINK_RESOURCE_GENERIC_ID_MSIX_RDMA,
+			       &size_new);
+	pf->req_msix.rdma = size_new;
+}
+
 /**
  * ice_devlink_port_opt_speed_str - convert speed to a string
  * @speed: speed value
@@ -761,6 +782,7 @@ ice_devlink_reload_up(struct devlink *devlink,
 	switch (action) {
 	case DEVLINK_RELOAD_ACTION_DRIVER_REINIT:
 		*actions_performed = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT);
+		ice_devlink_read_resources_size(pf);
 		return ice_load(pf);
 	case DEVLINK_RELOAD_ACTION_FW_ACTIVATE:
 		*actions_performed = BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE);
@@ -955,6 +977,144 @@ void ice_devlink_unregister(struct ice_pf *pf)
 	devlink_unregister(priv_to_devlink(pf));
 }
 
+static u64 ice_devlink_res_msix_pf_occ_get(void *priv)
+{
+	struct ice_pf *pf = priv;
+
+	return pf->msix.eth;
+}
+
+static u64 ice_devlink_res_msix_vf_occ_get(void *priv)
+{
+	struct ice_pf *pf = priv;
+
+	return pf->vfs.num_msix_per * ice_get_num_vfs(pf);
+}
+
+static u64 ice_devlink_res_msix_rdma_occ_get(void *priv)
+{
+	struct ice_pf *pf = priv;
+
+	return pf->msix.rdma;
+}
+
+static u64 ice_devlink_res_msix_occ_get(void *priv)
+{
+	struct ice_pf *pf = priv;
+
+	return ice_devlink_res_msix_pf_occ_get(priv) +
+	       ice_devlink_res_msix_vf_occ_get(priv) +
+	       ice_devlink_res_msix_rdma_occ_get(priv) +
+	       pf->msix.misc;
+}
+
+int ice_devlink_register_resources(struct ice_pf *pf)
+{
+	struct devlink *devlink = priv_to_devlink(pf);
+	struct devlink_resource_size_params params;
+	struct device *dev = ice_pf_to_dev(pf);
+	struct ice_msix *req = &pf->req_msix;
+	int err, max_pf_msix, max_rdma_msix;
+	const char *res_name;
+
+	max_pf_msix = min_t(int, num_online_cpus(), ICE_MAX_LG_RSS_QS);
+	max_rdma_msix = max_pf_msix + ICE_RDMA_NUM_AEQ_MSIX;
+
+	devlink_resource_size_params_init(&params, req->all, req->all, 1,
+					  DEVLINK_RESOURCE_UNIT_ENTRY);
+	res_name = DEVLINK_RESOURCE_GENERIC_NAME_MSIX;
+	err = devlink_resource_register(devlink, res_name, req->all,
+					DEVLINK_RESOURCE_GENERIC_ID_MSIX,
+					DEVLINK_RESOURCE_ID_PARENT_TOP,
+					&params);
+	if (err)
+		goto res_create_err;
+
+	devlink_resource_size_params_init(&params, req->misc, req->misc, 1,
+					  DEVLINK_RESOURCE_UNIT_ENTRY);
+	res_name = DEVLINK_RESOURCE_GENERIC_NAME_MSIX_MISC;
+	err = devlink_resource_register(devlink, res_name, req->misc,
+					DEVLINK_RESOURCE_GENERIC_ID_MSIX_MISC,
+					DEVLINK_RESOURCE_GENERIC_ID_MSIX,
+					&params);
+	if (err)
+		goto res_create_err;
+
+	devlink_resource_size_params_init(&params, ICE_MIN_LAN_TXRX_MSIX,
+					  max_pf_msix, 1,
+					  DEVLINK_RESOURCE_UNIT_ENTRY);
+	res_name = DEVLINK_RESOURCE_GENERIC_NAME_MSIX_ETH;
+	err = devlink_resource_register(devlink, res_name, req->eth,
+					DEVLINK_RESOURCE_GENERIC_ID_MSIX_ETH,
+					DEVLINK_RESOURCE_GENERIC_ID_MSIX,
+					&params);
+	if (err)
+		goto res_create_err;
+
+	devlink_resource_size_params_init(&params, 0, req->all - req->misc, 1,
+					  DEVLINK_RESOURCE_UNIT_ENTRY);
+
+	res_name = DEVLINK_RESOURCE_GENERIC_NAME_MSIX_VF;
+	err = devlink_resource_register(devlink, res_name, req->vf,
+					DEVLINK_RESOURCE_GENERIC_ID_MSIX_VF,
+					DEVLINK_RESOURCE_GENERIC_ID_MSIX,
+					&params);
+	if (err)
+		goto res_create_err;
+
+	devlink_resource_size_params_init(&params, ICE_MIN_RDMA_MSIX,
+					  max_rdma_msix, 1,
+					  DEVLINK_RESOURCE_UNIT_ENTRY);
+
+	res_name = DEVLINK_RESOURCE_GENERIC_NAME_MSIX_RDMA;
+	err = devlink_resource_register(devlink, res_name, req->rdma,
+					DEVLINK_RESOURCE_GENERIC_ID_MSIX_RDMA,
+					DEVLINK_RESOURCE_GENERIC_ID_MSIX,
+					&params);
+	if (err)
+		goto res_create_err;
+
+	devlink_resource_occ_get_register(devlink,
+					  DEVLINK_RESOURCE_GENERIC_ID_MSIX,
+					  ice_devlink_res_msix_occ_get, pf);
+
+	devlink_resource_occ_get_register(devlink,
+					  DEVLINK_RESOURCE_GENERIC_ID_MSIX_ETH,
+					  ice_devlink_res_msix_pf_occ_get, pf);
+
+	devlink_resource_occ_get_register(devlink,
+					  DEVLINK_RESOURCE_GENERIC_ID_MSIX_VF,
+					  ice_devlink_res_msix_vf_occ_get, pf);
+
+	devlink_resource_occ_get_register(devlink,
+					  DEVLINK_RESOURCE_GENERIC_ID_MSIX_RDMA,
+					  ice_devlink_res_msix_rdma_occ_get,
+					  pf);
+	return 0;
+
+res_create_err:
+	dev_err(dev, "Failed to register devlink resource: %s error: %pe\n",
+		res_name, ERR_PTR(err));
+	devlink_resources_unregister(devlink);
+
+	return err;
+}
+
+void ice_devlink_unregister_resources(struct ice_pf *pf)
+{
+	struct devlink *devlink = priv_to_devlink(pf);
+
+	devlink_resource_occ_get_unregister(devlink,
+					    DEVLINK_RESOURCE_GENERIC_ID_MSIX);
+	devlink_resource_occ_get_unregister(devlink,
+					    DEVLINK_RESOURCE_GENERIC_ID_MSIX_ETH);
+	devlink_resource_occ_get_unregister(devlink,
+					    DEVLINK_RESOURCE_GENERIC_ID_MSIX_VF);
+	devlink_resource_occ_get_unregister(devlink,
+					    DEVLINK_RESOURCE_GENERIC_ID_MSIX_RDMA);
+	devlink_resources_unregister(devlink);
+}
+
 /**
  * ice_devlink_set_switch_id - Set unique switch id based on pci dsn
  * @pf: the PF to create a devlink port for
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.h b/drivers/net/ethernet/intel/ice/ice_devlink.h
index fe006d9946f8..35233ea087fc 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.h
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.h
@@ -10,6 +10,8 @@ void ice_devlink_register(struct ice_pf *pf);
 void ice_devlink_unregister(struct ice_pf *pf);
 int ice_devlink_register_params(struct ice_pf *pf);
 void ice_devlink_unregister_params(struct ice_pf *pf);
+int ice_devlink_register_resources(struct ice_pf *pf);
+void ice_devlink_unregister_resources(struct ice_pf *pf);
 int ice_devlink_create_pf_port(struct ice_pf *pf);
 void ice_devlink_destroy_pf_port(struct ice_pf *pf);
 int ice_devlink_create_vf_port(struct ice_vf *vf);
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index a4c283bc8da0..ba6f93e7e640 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -4970,6 +4970,10 @@ static int ice_init_devlink(struct ice_pf *pf)
 	if (err)
 		return err;
 
+	err = ice_devlink_register_resources(pf);
+	if (err)
+		return err;
+
 	ice_devlink_init_regions(pf);
 	ice_devlink_register(pf);
 
@@ -4980,6 +4984,7 @@ static void ice_deinit_devlink(struct ice_pf *pf)
 {
 	ice_devlink_unregister(pf);
 	ice_devlink_destroy_regions(pf);
+	ice_devlink_unregister_resources(pf);
 	ice_devlink_unregister_params(pf);
 }
 
diff --git a/drivers/net/ethernet/intel/ice/ice_sriov.c b/drivers/net/ethernet/intel/ice/ice_sriov.c
index 3ba1408c56a9..d42df5983494 100644
--- a/drivers/net/ethernet/intel/ice/ice_sriov.c
+++ b/drivers/net/ethernet/intel/ice/ice_sriov.c
@@ -518,8 +518,7 @@ static int ice_set_per_vf_res(struct ice_pf *pf, u16 num_vfs)
 		return -ENOSPC;
 
 	/* determine MSI-X resources per VF */
-	msix_avail_for_sriov = pf->hw.func_caps.common_cap.num_msix_vectors -
-		pf->irq_tracker->num_entries;
+	msix_avail_for_sriov = pf->msix.vf;
 	msix_avail_per_vf = msix_avail_for_sriov / num_vfs;
 	if (msix_avail_per_vf >= ICE_NUM_VF_MSIX_MED) {
 		num_msix_per_vf = ICE_NUM_VF_MSIX_MED;
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 611a23a3deb2..a041c50f72c6 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -409,7 +409,21 @@ typedef u64 devlink_resource_occ_get_t(void *priv);
 
 #define DEVLINK_RESOURCE_ID_PARENT_TOP 0
 
+enum devlink_msix_resource_id {
+	/* generic resource for MSIX */
+	DEVLINK_RESOURCE_GENERIC_ID_MSIX = 1,
+	DEVLINK_RESOURCE_GENERIC_ID_MSIX_MISC,
+	DEVLINK_RESOURCE_GENERIC_ID_MSIX_ETH,
+	DEVLINK_RESOURCE_GENERIC_ID_MSIX_VF,
+	DEVLINK_RESOURCE_GENERIC_ID_MSIX_RDMA,
+};
+
 #define DEVLINK_RESOURCE_GENERIC_NAME_PORTS "physical_ports"
+#define DEVLINK_RESOURCE_GENERIC_NAME_MSIX "msix"
+#define DEVLINK_RESOURCE_GENERIC_NAME_MSIX_MISC "msix_misc"
+#define DEVLINK_RESOURCE_GENERIC_NAME_MSIX_ETH "msix_eth"
+#define DEVLINK_RESOURCE_GENERIC_NAME_MSIX_VF "msix_vf"
+#define DEVLINK_RESOURCE_GENERIC_NAME_MSIX_RDMA "msix_rdma"
 
 #define __DEVLINK_PARAM_MAX_STRING_VALUE 32
 enum devlink_param_type {
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-14 12:57 [PATCH net-next 00/13] resource management using devlink reload Michal Swiatkowski
                   ` (12 preceding siblings ...)
  2022-11-14 12:57 ` [PATCH net-next 13/13] devlink, ice: add MSIX vectors as devlink resource Michal Swiatkowski
@ 2022-11-14 13:23 ` Leon Romanovsky
  2022-11-14 15:31   ` Samudrala, Sridhar
  13 siblings, 1 reply; 49+ messages in thread
From: Leon Romanovsky @ 2022-11-14 13:23 UTC (permalink / raw)
  To: Michal Swiatkowski
  Cc: netdev, davem, kuba, pabeni, edumazet, intel-wired-lan, jiri,
	anthony.l.nguyen, alexandr.lobakin, sridhar.samudrala,
	wojciech.drewek, lukasz.czapnik, shiraz.saleem, jesse.brandeburg,
	mustafa.ismail, przemyslaw.kitszel, piotr.raczynski,
	jacob.e.keller, david.m.ertman, leszek.kaliszczuk

On Mon, Nov 14, 2022 at 01:57:42PM +0100, Michal Swiatkowski wrote:
> Currently the default value for number of PF vectors is number of CPUs.
> Because of that there are cases when all vectors are used for PF
> and user can't create more VFs. It is hard to set default number of
> CPUs right for all different use cases. Instead allow user to choose
> how many vectors should be used for various features. After implementing
> subdevices this mechanism will be also used to set number of vectors
> for subfunctions.
> 
> The idea is to set vectors for eth or VFs using devlink resource API.
> New value of vectors will be used after devlink reinit. Example
> commands:
> $ sudo devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
> $ sudo devlink dev reload pci/0000:31:00.0
> After reload driver will work with 16 vectors used for eth instead of
> num_cpus.

By saying "vectors", are you referring to MSI-X vectors?
If yes, you have specific interface for that.
https://lore.kernel.org/linux-pci/20210314124256.70253-1-leon@kernel.org/

Thanks

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 13/13] devlink, ice: add MSIX vectors as devlink resource
  2022-11-14 12:57 ` [PATCH net-next 13/13] devlink, ice: add MSIX vectors as devlink resource Michal Swiatkowski
@ 2022-11-14 15:28   ` Jiri Pirko
  2022-11-14 16:03     ` Piotr Raczynski
  0 siblings, 1 reply; 49+ messages in thread
From: Jiri Pirko @ 2022-11-14 15:28 UTC (permalink / raw)
  To: Michal Swiatkowski
  Cc: netdev, davem, kuba, pabeni, edumazet, intel-wired-lan,
	anthony.l.nguyen, alexandr.lobakin, sridhar.samudrala,
	wojciech.drewek, lukasz.czapnik, shiraz.saleem, jesse.brandeburg,
	mustafa.ismail, przemyslaw.kitszel, piotr.raczynski,
	jacob.e.keller, david.m.ertman, leszek.kaliszczuk, Michal Kubiak

Mon, Nov 14, 2022 at 01:57:55PM CET, michal.swiatkowski@linux.intel.com wrote:
>From: Michal Kubiak <michal.kubiak@intel.com>
>
>Implement devlink resources to control how many MSI-X vectors are
>used for eth, VF and RDMA. Show misc MSI-X as read-only.
>
>This is a first approach to controlling the mix of resources managed
>by the ice driver. This commit registers the number of available MSI-X
>vectors as a devlink resource and also adds specific resources for eth,
>VF and RDMA.
>
>Also, make those resources generic.
>
>$ devlink resource show pci/0000:31:00.0
>  name msix size 1024 occ 172 unit entry dpipe_tables none


So, 1024 is the total vector count available in your hw?


>    resources:
>      name msix_misc size 4 unit entry dpipe_tables none

What's misc? Why don't you show occupancy for it? Yet, it seems to be
accounted in the total (172)

Also, drop the "msix_" prefix from all, you already have parent called
"msix".


>      name msix_eth size 92 occ 92 unit entry size_min 1 size_max

Why "size_min is not 0 here?


>	128 size_gran 1 dpipe_tables none
>      name msix_vf size 128 occ 0 unit entry size_min 0 size_max
>	1020 size_gran 1 dpipe_tables none
>      name msix_rdma size 76 occ 76 unit entry size_min 0 size_max

Okay, this means that for eth and rdma, the vectors are fully used, no
VF is instantiated?



>	132 size_gran 1 dpipe_tables none
>
>example commands:
>$ devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
>$ devlink resource set pci/0000:31:00.0 path msix/msix_vf size 512
>$ devlink dev reload pci/0000:31:00.0

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-14 13:23 ` [PATCH net-next 00/13] resource management using devlink reload Leon Romanovsky
@ 2022-11-14 15:31   ` Samudrala, Sridhar
  2022-11-14 16:58     ` Keller, Jacob E
  2022-11-14 17:07     ` Leon Romanovsky
  0 siblings, 2 replies; 49+ messages in thread
From: Samudrala, Sridhar @ 2022-11-14 15:31 UTC (permalink / raw)
  To: Leon Romanovsky, Michal Swiatkowski
  Cc: netdev, davem, kuba, pabeni, edumazet, intel-wired-lan, jiri,
	anthony.l.nguyen, alexandr.lobakin, wojciech.drewek,
	lukasz.czapnik, shiraz.saleem, jesse.brandeburg, mustafa.ismail,
	przemyslaw.kitszel, piotr.raczynski, jacob.e.keller,
	david.m.ertman, leszek.kaliszczuk

On 11/14/2022 7:23 AM, Leon Romanovsky wrote:
> On Mon, Nov 14, 2022 at 01:57:42PM +0100, Michal Swiatkowski wrote:
>> Currently the default value for number of PF vectors is number of CPUs.
>> Because of that there are cases when all vectors are used for PF
>> and user can't create more VFs. It is hard to set default number of
>> CPUs right for all different use cases. Instead allow user to choose
>> how many vectors should be used for various features. After implementing
>> subdevices this mechanism will be also used to set number of vectors
>> for subfunctions.
>>
>> The idea is to set vectors for eth or VFs using devlink resource API.
>> New value of vectors will be used after devlink reinit. Example
>> commands:
>> $ sudo devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
>> $ sudo devlink dev reload pci/0000:31:00.0
>> After reload driver will work with 16 vectors used for eth instead of
>> num_cpus.
> By saying "vectors", are you referring to MSI-X vectors?
> If yes, you have specific interface for that.
> https://lore.kernel.org/linux-pci/20210314124256.70253-1-leon@kernel.org/

This patch series is exposing a resources API to split the device level MSI-X vectors
across the different functions supported by the device (PF, RDMA, SR-IOV VFs and
in future subfunctions). Today this is all hidden in a policy implemented within
the PF driver.

The patch you are referring to seems to be providing an interface to change the
msix count for a particular VF. This patch is providing an interface to set the total
msix count for all the possible VFs from the available device level pool of
msix-vectors.
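
For example (resource sizes here are arbitrary, chosen only to show the
device-level split across eth, RDMA and the VF pool), the flow would be:

$ devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
$ devlink resource set pci/0000:31:00.0 path msix/msix_rdma size 32
$ devlink resource set pci/0000:31:00.0 path msix/msix_vf size 512
$ devlink dev reload pci/0000:31:00.0
$ devlink resource show pci/0000:31:00.0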




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 13/13] devlink, ice: add MSIX vectors as devlink resource
  2022-11-14 15:28   ` Jiri Pirko
@ 2022-11-14 16:03     ` Piotr Raczynski
  2022-11-15  6:56       ` Michal Swiatkowski
  2022-11-15 12:08       ` Jiri Pirko
  0 siblings, 2 replies; 49+ messages in thread
From: Piotr Raczynski @ 2022-11-14 16:03 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Michal Swiatkowski, netdev, davem, kuba, pabeni, edumazet,
	intel-wired-lan, anthony.l.nguyen, alexandr.lobakin,
	sridhar.samudrala, wojciech.drewek, lukasz.czapnik,
	shiraz.saleem, jesse.brandeburg, mustafa.ismail,
	przemyslaw.kitszel, jacob.e.keller, david.m.ertman,
	leszek.kaliszczuk, Michal Kubiak

On Mon, Nov 14, 2022 at 04:28:38PM +0100, Jiri Pirko wrote:
> Mon, Nov 14, 2022 at 01:57:55PM CET, michal.swiatkowski@linux.intel.com wrote:
> >From: Michal Kubiak <michal.kubiak@intel.com>
> >
> >Implement devlink resources to control how many MSI-X vectors are
> >used for eth, VF and RDMA. Show misc MSI-X as read-only.
> >
> >This is a first approach to controlling the mix of resources managed
> >by the ice driver. This commit registers the number of available MSI-X
> >vectors as a devlink resource and also adds specific resources for eth,
> >VF and RDMA.
> >
> >Also, make those resources generic.
> >
> >$ devlink resource show pci/0000:31:00.0
> >  name msix size 1024 occ 172 unit entry dpipe_tables none
> 
> 
> So, 1024 is the total vector count available in your hw?
> 

For this particular device and physical function, yes.


> 
> >    resources:
> >      name msix_misc size 4 unit entry dpipe_tables none
> 
> What's misc? Why don't you show occupancy for it? Yet, it seems to be
> accounted in the total (172)
> 
> Also, drop the "msix_" prefix from all, you already have parent called
> "msix".

misc interrupts are for miscellaneous purposes like communication with
Firmware or other control plane interrupts (if any).

> 
> 
> >      name msix_eth size 92 occ 92 unit entry size_min 1 size_max
> 
> Why "size_min is not 0 here?

Thanks, actually 0 would mean disabling the default eth netdev entirely.
It could be done, however it is not implemented in this patchset. But
for cases when the default port is not needed at all, it seems like a
good idea.

> 
> 
> >	128 size_gran 1 dpipe_tables none
> >      name msix_vf size 128 occ 0 unit entry size_min 0 size_max
> >	1020 size_gran 1 dpipe_tables none
> >      name msix_rdma size 76 occ 76 unit entry size_min 0 size_max
> 
> Okay, this means that for eth and rdma, the vectors are fully used, no
> VF is instantiated?

Yes, in this driver implementation, both eth and rdma will most probably
always be fully utilized, but the moment you change the size and execute
`devlink reload`, they will be reconfigured with the new values.

The VF allocation here is the maximum number of interrupt vectors that
can be assigned to the VFs that are actually created. The occ value then
shows how many are actually utilized by the VFs.
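
As a rough sketch of that flow (the PCI address is the one from the
commit message; the resource size and VF count are made up):

$ devlink resource set pci/0000:31:00.0 path msix/msix_vf size 512
$ devlink dev reload pci/0000:31:00.0
$ echo 4 > /sys/bus/pci/devices/0000:31:00.0/sriov_numvfs
$ devlink resource show pci/0000:31:00.0

After the VFs are created, occ for msix_vf reflects the vectors actually
assigned to those VFs, while size stays at the configured maximum.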

> 
> 
> 
> >	132 size_gran 1 dpipe_tables none
> >
> >example commands:
> >$ devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
> >$ devlink resource set pci/0000:31:00.0 path msix/msix_vf size 512
> >$ devlink dev reload pci/0000:31:00.0

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-14 15:31   ` Samudrala, Sridhar
@ 2022-11-14 16:58     ` Keller, Jacob E
  2022-11-14 17:09       ` Leon Romanovsky
  2022-11-14 17:07     ` Leon Romanovsky
  1 sibling, 1 reply; 49+ messages in thread
From: Keller, Jacob E @ 2022-11-14 16:58 UTC (permalink / raw)
  To: Samudrala, Sridhar, Leon Romanovsky, Michal Swiatkowski
  Cc: netdev, davem, kuba, pabeni, edumazet, intel-wired-lan, jiri,
	Nguyen, Anthony L, Lobakin, Alexandr, Drewek, Wojciech, Czapnik,
	Lukasz, Saleem, Shiraz, Brandeburg, Jesse, Ismail, Mustafa,
	Kitszel, Przemyslaw, Raczynski, Piotr, Ertman, David M,
	Kaliszczuk, Leszek



> -----Original Message-----
> From: Samudrala, Sridhar <sridhar.samudrala@intel.com>
> Sent: Monday, November 14, 2022 7:31 AM
> To: Leon Romanovsky <leon@kernel.org>; Michal Swiatkowski
> <michal.swiatkowski@linux.intel.com>
> Cc: netdev@vger.kernel.org; davem@davemloft.net; kuba@kernel.org;
> pabeni@redhat.com; edumazet@google.com; intel-wired-lan@lists.osuosl.org;
> jiri@nvidia.com; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Lobakin,
> Alexandr <alexandr.lobakin@intel.com>; Drewek, Wojciech
> <wojciech.drewek@intel.com>; Czapnik, Lukasz <lukasz.czapnik@intel.com>;
> Saleem, Shiraz <shiraz.saleem@intel.com>; Brandeburg, Jesse
> <jesse.brandeburg@intel.com>; Ismail, Mustafa <mustafa.ismail@intel.com>;
> Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; Raczynski, Piotr
> <piotr.raczynski@intel.com>; Keller, Jacob E <jacob.e.keller@intel.com>; Ertman,
> David M <david.m.ertman@intel.com>; Kaliszczuk, Leszek
> <leszek.kaliszczuk@intel.com>
> Subject: Re: [PATCH net-next 00/13] resource management using devlink reload
> 
> On 11/14/2022 7:23 AM, Leon Romanovsky wrote:
> > On Mon, Nov 14, 2022 at 01:57:42PM +0100, Michal Swiatkowski wrote:
> >> Currently the default value for number of PF vectors is number of CPUs.
> >> Because of that there are cases when all vectors are used for PF
> >> and user can't create more VFs. It is hard to set default number of
> >> CPUs right for all different use cases. Instead allow user to choose
> >> how many vectors should be used for various features. After implementing
> >> subdevices this mechanism will be also used to set number of vectors
> >> for subfunctions.
> >>
> >> The idea is to set vectors for eth or VFs using devlink resource API.
> >> New value of vectors will be used after devlink reinit. Example
> >> commands:
> >> $ sudo devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
> >> $ sudo devlink dev reload pci/0000:31:00.0
> >> After reload driver will work with 16 vectors used for eth instead of
> >> num_cpus.
> > By saying "vectors", are you referring to MSI-X vectors?
> > If yes, you have specific interface for that.
> > https://lore.kernel.org/linux-pci/20210314124256.70253-1-leon@kernel.org/
> 
> This patch series is exposing a resources API to split the device level MSI-X vectors
> across the different functions supported by the device (PF, RDMA, SR-IOV VFs
> and
> in future subfunctions). Today this is all hidden in a policy implemented within
> the PF driver.
> 
> The patch you are referring to seems to be providing an interface to change the
> > msix count for a particular VF. This patch is providing an interface to set the total
> msix count for all the possible VFs from the available device level pool of
> msix-vectors.
> 

It looks like we should implement both: resources to configure the "pool" of available vectors for each VF, and the sysfs VF Interface to allow configuring individual VFs.
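
A rough sketch of how the two could complement each other (the per-VF
sysfs attribute name here is taken from the linked series and, like the
VF address, is only an assumption):

# size the device-level pool reserved for all VFs, then reload
$ devlink resource set pci/0000:31:00.0 path msix/msix_vf size 512
$ devlink dev reload pci/0000:31:00.0
# redistribute vectors between individual VFs out of that pool
$ echo 32 > /sys/bus/pci/devices/0000:31:01.0/sriov_vf_msix_count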

Thanks,
Jake

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-14 15:31   ` Samudrala, Sridhar
  2022-11-14 16:58     ` Keller, Jacob E
@ 2022-11-14 17:07     ` Leon Romanovsky
  2022-11-15  7:12       ` Michal Swiatkowski
  1 sibling, 1 reply; 49+ messages in thread
From: Leon Romanovsky @ 2022-11-14 17:07 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: Michal Swiatkowski, netdev, davem, kuba, pabeni, edumazet,
	intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	wojciech.drewek, lukasz.czapnik, shiraz.saleem, jesse.brandeburg,
	mustafa.ismail, przemyslaw.kitszel, piotr.raczynski,
	jacob.e.keller, david.m.ertman, leszek.kaliszczuk

On Mon, Nov 14, 2022 at 09:31:11AM -0600, Samudrala, Sridhar wrote:
> On 11/14/2022 7:23 AM, Leon Romanovsky wrote:
> > On Mon, Nov 14, 2022 at 01:57:42PM +0100, Michal Swiatkowski wrote:
> > > Currently the default value for number of PF vectors is number of CPUs.
> > > Because of that there are cases when all vectors are used for PF
> > > and user can't create more VFs. It is hard to set default number of
> > > CPUs right for all different use cases. Instead allow user to choose
> > > how many vectors should be used for various features. After implementing
> > > subdevices this mechanism will be also used to set number of vectors
> > > for subfunctions.
> > > 
> > > The idea is to set vectors for eth or VFs using devlink resource API.
> > > New value of vectors will be used after devlink reinit. Example
> > > commands:
> > > $ sudo devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
> > > $ sudo devlink dev reload pci/0000:31:00.0
> > > After reload driver will work with 16 vectors used for eth instead of
> > > num_cpus.
> > By saying "vectors", are you referring to MSI-X vectors?
> > If yes, you have specific interface for that.
> > https://lore.kernel.org/linux-pci/20210314124256.70253-1-leon@kernel.org/
> 
> This patch series is exposing a resources API to split the device level MSI-X vectors
> across the different functions supported by the device (PF, RDMA, SR-IOV VFs and
> in future subfunctions). Today this is all hidden in a policy implemented within
> the PF driver.

Maybe we are talking about different VFs, but if you refer to PCI VFs,
the amount of MSI-X comes from PCI config space for that specific VF.

You shouldn't set any value through netdev as it will cause a difference
in output between lspci (which doesn't require any driver) and your
newly set number.

Also, in the RDMA case, it is not clear what you will achieve by this
setting.

Thanks

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-14 16:58     ` Keller, Jacob E
@ 2022-11-14 17:09       ` Leon Romanovsky
  2022-11-15  7:00         ` Michal Swiatkowski
  0 siblings, 1 reply; 49+ messages in thread
From: Leon Romanovsky @ 2022-11-14 17:09 UTC (permalink / raw)
  To: Keller, Jacob E
  Cc: Samudrala, Sridhar, Michal Swiatkowski, netdev, davem, kuba,
	pabeni, edumazet, intel-wired-lan, jiri, Nguyen, Anthony L,
	Lobakin, Alexandr, Drewek, Wojciech, Czapnik, Lukasz, Saleem,
	Shiraz, Brandeburg, Jesse, Ismail, Mustafa, Kitszel, Przemyslaw,
	Raczynski, Piotr, Ertman, David M, Kaliszczuk, Leszek

On Mon, Nov 14, 2022 at 04:58:57PM +0000, Keller, Jacob E wrote:
> 
> 
> > -----Original Message-----
> > From: Samudrala, Sridhar <sridhar.samudrala@intel.com>
> > Sent: Monday, November 14, 2022 7:31 AM
> > To: Leon Romanovsky <leon@kernel.org>; Michal Swiatkowski
> > <michal.swiatkowski@linux.intel.com>
> > Cc: netdev@vger.kernel.org; davem@davemloft.net; kuba@kernel.org;
> > pabeni@redhat.com; edumazet@google.com; intel-wired-lan@lists.osuosl.org;
> > jiri@nvidia.com; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Lobakin,
> > Alexandr <alexandr.lobakin@intel.com>; Drewek, Wojciech
> > <wojciech.drewek@intel.com>; Czapnik, Lukasz <lukasz.czapnik@intel.com>;
> > Saleem, Shiraz <shiraz.saleem@intel.com>; Brandeburg, Jesse
> > <jesse.brandeburg@intel.com>; Ismail, Mustafa <mustafa.ismail@intel.com>;
> > Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; Raczynski, Piotr
> > <piotr.raczynski@intel.com>; Keller, Jacob E <jacob.e.keller@intel.com>; Ertman,
> > David M <david.m.ertman@intel.com>; Kaliszczuk, Leszek
> > <leszek.kaliszczuk@intel.com>
> > Subject: Re: [PATCH net-next 00/13] resource management using devlink reload
> > 
> > On 11/14/2022 7:23 AM, Leon Romanovsky wrote:
> > > On Mon, Nov 14, 2022 at 01:57:42PM +0100, Michal Swiatkowski wrote:
> > >> Currently the default value for number of PF vectors is number of CPUs.
> > >> Because of that there are cases when all vectors are used for PF
> > >> and user can't create more VFs. It is hard to set default number of
> > >> CPUs right for all different use cases. Instead allow user to choose
> > >> how many vectors should be used for various features. After implementing
> > >> subdevices this mechanism will be also used to set number of vectors
> > >> for subfunctions.
> > >>
> > >> The idea is to set vectors for eth or VFs using devlink resource API.
> > >> New value of vectors will be used after devlink reinit. Example
> > >> commands:
> > >> $ sudo devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
> > >> $ sudo devlink dev reload pci/0000:31:00.0
> > >> After reload driver will work with 16 vectors used for eth instead of
> > >> num_cpus.
> > > By saying "vectors", are you referring to MSI-X vectors?
> > > If yes, you have specific interface for that.
> > > https://lore.kernel.org/linux-pci/20210314124256.70253-1-leon@kernel.org/
> > 
> > This patch series is exposing a resources API to split the device level MSI-X vectors
> > across the different functions supported by the device (PF, RDMA, SR-IOV VFs
> > and
> > in future subfunctions). Today this is all hidden in a policy implemented within
> > the PF driver.
> > 
> > The patch you are referring to seems to be providing an interface to change the
> > msix count for a particular VF. This patch is providing an interface to set the total
> > msix count for all the possible VFs from the available device level pool of
> > msix-vectors.
> > 
> 
> It looks like we should implement both: resources to configure the "pool" of available vectors for each VF, and the sysfs VF Interface to allow configuring individual VFs.

Yes, to be aligned with the PCI spec and to see coherent lspci output for VFs.

Thanks

> 
> Thanks,
> Jake

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 04/13] ice: split ice_vsi_setup into smaller functions
  2022-11-14 12:57 ` [PATCH net-next 04/13] ice: split ice_vsi_setup into smaller functions Michal Swiatkowski
@ 2022-11-15  5:08   ` Jakub Kicinski
  2022-11-15  6:49     ` Michal Swiatkowski
  0 siblings, 1 reply; 49+ messages in thread
From: Jakub Kicinski @ 2022-11-15  5:08 UTC (permalink / raw)
  To: Michal Swiatkowski
  Cc: netdev, davem, pabeni, edumazet, intel-wired-lan, jiri,
	anthony.l.nguyen, alexandr.lobakin, sridhar.samudrala,
	wojciech.drewek, lukasz.czapnik, shiraz.saleem, jesse.brandeburg,
	mustafa.ismail, przemyslaw.kitszel, piotr.raczynski,
	jacob.e.keller, david.m.ertman, leszek.kaliszczuk

On Mon, 14 Nov 2022 13:57:46 +0100 Michal Swiatkowski wrote:
> Main goal is to reuse the same functions in VSI config and rebuild
> paths.
> To do this split ice_vsi_setup into smaller pieces and reuse it during
> rebuild.
> 
> ice_vsi_alloc() should only alloc memory, not set the default values
> for VSI.
> Move setting defaults to separate function. This will allow config of
> already allocated VSI, for example in reload path.
> 
> The path is mostly moving code around without introducing new
> functionality. Functions ice_vsi_cfg() and ice_vsi_decfg() were
> added, but they are using code that already exist.
> 
> Use flag to pass information about VSI initialization during rebuild
> instead of using boolean value.
> 
> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> Co-developed-by: Jacob Keller <jacob.e.keller@intel.com>
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

nit:

drivers/net/ethernet/intel/ice/ice_lib.c:459: warning: Function parameter or member 'vsi' not described in 'ice_vsi_alloc_def'
drivers/net/ethernet/intel/ice/ice_lib.c:459: warning: Excess function parameter 'vsi_type' description in 'ice_vsi_alloc_def'

Sorry, didn't get to actually reviewing because of the weekend backlog.
Will do tomorrow.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 12/13] ice, irdma: prepare reservation of MSI-X to reload
  2022-11-14 12:57 ` [PATCH net-next 12/13] ice, irdma: prepare reservation of MSI-X to reload Michal Swiatkowski
@ 2022-11-15  5:08   ` Jakub Kicinski
  2022-11-15  6:49     ` Michal Swiatkowski
  0 siblings, 1 reply; 49+ messages in thread
From: Jakub Kicinski @ 2022-11-15  5:08 UTC (permalink / raw)
  To: Michal Swiatkowski
  Cc: netdev, davem, pabeni, edumazet, intel-wired-lan, jiri,
	anthony.l.nguyen, alexandr.lobakin, sridhar.samudrala,
	wojciech.drewek, lukasz.czapnik, shiraz.saleem, jesse.brandeburg,
	mustafa.ismail, przemyslaw.kitszel, piotr.raczynski,
	jacob.e.keller, david.m.ertman, leszek.kaliszczuk

On Mon, 14 Nov 2022 13:57:54 +0100 Michal Swiatkowski wrote:
> Move the MSI-X counts for LAN and RDMA into a structure to keep them in
> one place. Use req_msix to store how many MSI-X vectors the user
> requested for each feature. The msix structure stores the number of
> MSI-X vectors currently in use.
> 
> The MSI-X count needs to be adjusted if the kernel or the hardware
> cannot support that many vectors. Rewrite the MSI-X adjustment function
> so it can be used in both cases.
> 
> Use the same algorithm as before: first allocate the minimum MSI-X for
> each feature, then, if there are enough MSI-X left, distribute them
> equally between the features.

drivers/net/ethernet/intel/ice/ice_lib.c:455: warning: Function parameter or member 'vsi' not described in 'ice_vsi_alloc_def'
drivers/net/ethernet/intel/ice/ice_lib.c:455: warning: Excess function parameter 'vsi_type' description in 'ice_vsi_alloc_def'
drivers/net/ethernet/intel/ice/ice_main.c:4026:9: warning: variable 'err' is uninitialized when used here [-Wuninitialized]
        return err;
               ^~~
drivers/net/ethernet/intel/ice/ice_main.c:4001:29: note: initialize the variable 'err' to silence this warning
        int v_wanted, v_actual, err, i;
                                   ^
                                    = 0

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 04/13] ice: split ice_vsi_setup into smaller functions
  2022-11-15  5:08   ` Jakub Kicinski
@ 2022-11-15  6:49     ` Michal Swiatkowski
  0 siblings, 0 replies; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-15  6:49 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, davem, pabeni, edumazet, intel-wired-lan, jiri,
	anthony.l.nguyen, alexandr.lobakin, sridhar.samudrala,
	wojciech.drewek, lukasz.czapnik, shiraz.saleem, jesse.brandeburg,
	mustafa.ismail, przemyslaw.kitszel, piotr.raczynski,
	jacob.e.keller, david.m.ertman, leszek.kaliszczuk

On Mon, Nov 14, 2022 at 09:08:25PM -0800, Jakub Kicinski wrote:
> On Mon, 14 Nov 2022 13:57:46 +0100 Michal Swiatkowski wrote:
> > Main goal is to reuse the same functions in VSI config and rebuild
> > paths.
> > To do this split ice_vsi_setup into smaller pieces and reuse it during
> > rebuild.
> > 
> > ice_vsi_alloc() should only alloc memory, not set the default values
> > for VSI.
> > Move setting defaults to separate function. This will allow config of
> > already allocated VSI, for example in reload path.
> > 
> > The path is mostly moving code around without introducing new
> > functionality. Functions ice_vsi_cfg() and ice_vsi_decfg() were
> > added, but they are using code that already exist.
> > 
> > Use flag to pass information about VSI initialization during rebuild
> > instead of using boolean value.
> > 
> > Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> > Co-developed-by: Jacob Keller <jacob.e.keller@intel.com>
> > Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> 
> nit:
> 
> drivers/net/ethernet/intel/ice/ice_lib.c:459: warning: Function parameter or member 'vsi' not described in 'ice_vsi_alloc_def'
> drivers/net/ethernet/intel/ice/ice_lib.c:459: warning: Excess function parameter 'vsi_type' description in 'ice_vsi_alloc_def'
> 
> Sorry, didn't get to actually reviewing because of the weekend backlog.
> Will do tomorrow.

Thanks, I will fix it in next version.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 12/13] ice, irdma: prepare reservation of MSI-X to reload
  2022-11-15  5:08   ` Jakub Kicinski
@ 2022-11-15  6:49     ` Michal Swiatkowski
  0 siblings, 0 replies; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-15  6:49 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, davem, pabeni, edumazet, intel-wired-lan, jiri,
	anthony.l.nguyen, alexandr.lobakin, sridhar.samudrala,
	wojciech.drewek, lukasz.czapnik, shiraz.saleem, jesse.brandeburg,
	mustafa.ismail, przemyslaw.kitszel, piotr.raczynski,
	jacob.e.keller, david.m.ertman, leszek.kaliszczuk

On Mon, Nov 14, 2022 at 09:08:56PM -0800, Jakub Kicinski wrote:
> On Mon, 14 Nov 2022 13:57:54 +0100 Michal Swiatkowski wrote:
> > Move the MSI-X counts for LAN and RDMA into a structure to keep them in
> > one place. Use req_msix to store how many MSI-X vectors the user
> > requested for each feature. The msix structure stores the number of
> > MSI-X vectors currently in use.
> > 
> > The MSI-X count needs to be adjusted if the kernel or the hardware
> > cannot support that many vectors. Rewrite the MSI-X adjustment function
> > so it can be used in both cases.
> > 
> > Use the same algorithm as before: first allocate the minimum MSI-X for
> > each feature, then, if there are enough MSI-X left, distribute them
> > equally between the features.
> 
> drivers/net/ethernet/intel/ice/ice_lib.c:455: warning: Function parameter or member 'vsi' not described in 'ice_vsi_alloc_def'
> drivers/net/ethernet/intel/ice/ice_lib.c:455: warning: Excess function parameter 'vsi_type' description in 'ice_vsi_alloc_def'
> drivers/net/ethernet/intel/ice/ice_main.c:4026:9: warning: variable 'err' is uninitialized when used here [-Wuninitialized]
>         return err;
>                ^~~
> drivers/net/ethernet/intel/ice/ice_main.c:4001:29: note: initialize the variable 'err' to silence this warning
>         int v_wanted, v_actual, err, i;
>                                    ^
>                                     = 0

Thanks, will fix.
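
(One minimal way to silence the -Wuninitialized warning above, following
the compiler's own note; the real fix may instead assign err on every
path:)

	int v_wanted, v_actual, err = 0, i;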

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 13/13] devlink, ice: add MSIX vectors as devlink resource
  2022-11-14 16:03     ` Piotr Raczynski
@ 2022-11-15  6:56       ` Michal Swiatkowski
  2022-11-15 12:08       ` Jiri Pirko
  1 sibling, 0 replies; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-15  6:56 UTC (permalink / raw)
  To: Piotr Raczynski
  Cc: Jiri Pirko, netdev, davem, kuba, pabeni, edumazet,
	intel-wired-lan, anthony.l.nguyen, alexandr.lobakin,
	sridhar.samudrala, wojciech.drewek, lukasz.czapnik,
	shiraz.saleem, jesse.brandeburg, mustafa.ismail,
	przemyslaw.kitszel, jacob.e.keller, david.m.ertman,
	leszek.kaliszczuk, Michal Kubiak

On Mon, Nov 14, 2022 at 05:03:43PM +0100, Piotr Raczynski wrote:
> On Mon, Nov 14, 2022 at 04:28:38PM +0100, Jiri Pirko wrote:
> > Mon, Nov 14, 2022 at 01:57:55PM CET, michal.swiatkowski@linux.intel.com wrote:
> > >From: Michal Kubiak <michal.kubiak@intel.com>
> > >
> > >Implement devlink resource to control how many MSI-X vectors are
> > >used for eth, VF and RDMA. Show misc MSI-X as read only.
> > >
> > >This is first approach to control the mix of resources managed
> > >by ice driver. This commit registers number of available MSI-X
> > >as devlink resource and also add specific resources for eth, vf and RDMA.
> > >
> > >Also, make those resources generic.
> > >
> > >$ devlink resource show pci/0000:31:00.0
> > >  name msix size 1024 occ 172 unit entry dpipe_tables none
> > 
> > 
> > So, 1024 is the total vector count available in your hw?
> > 
> 
> For this particular device and physical function, yes.
> 
> 
> > 
> > >    resources:
> > >      name msix_misc size 4 unit entry dpipe_tables none
> > 
> > What's misc? Why you don't show occupancy for it? Yet, it seems to be
> > accounted in the total (172)
> > 
> > Also, drop the "msix_" prefix from all, you already have parent called
> > "msix".
> 
> misc interrupts are for miscellaneous purposes like communication with
> Firmware or other control plane interrupts (if any).
> 

I will drop the msix_ prefix. I didn't show the occupancy because it is
constant and the user can't change it. But you are right, it is
accounted for, so I will also add occupancy here in the next version.
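
(A rough sketch of what adding occupancy means on the driver side; the
resource id define and the ice_pf field name below are made up for
illustration, the real patch may look different:)

static u64 ice_devlink_msix_misc_occ_get(void *priv)
{
	struct ice_pf *pf = priv;

	/* misc vectors are static, so occupancy equals the reserved count */
	return pf->num_misc_msix;
}

/* ...and in the devlink resource registration path, right after the
 * "misc" resource itself is registered:
 */
devlink_resource_occ_get_register(devlink, ICE_RES_ID_MSIX_MISC,
				  ice_devlink_msix_misc_occ_get, pf);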

> > 
> > 
> > >      name msix_eth size 92 occ 92 unit entry size_min 1 size_max
> > 
> > Why "size_min is not 0 here?
> 
> Thanks, actually 0 would mean disable the eth, default, netdev at all.
> It could be done, however not implemented in this patchset. But for
> cases when the default port is not needed at all, it seems like a good
> idea.
>

I will try to do it in the next version, thanks.

> > 
> > 
> > >	128 size_gran 1 dpipe_tables none
> > >      name msix_vf size 128 occ 0 unit entry size_min 0 size_max
> > >	1020 size_gran 1 dpipe_tables none
> > >      name msix_rdma size 76 occ 76 unit entry size_min 0 size_max
> > 
> > Okay, this means that for eth and rdma, the vectors are fully used, no
> > VF is instanciated?
> 
> Yes, in this driver implementation, both eth and rdma will most probably
> be always fully utilized, but the moment you change the size and execute
> `devlink reload` then they will reconfigure with new values.
> 
> The VF allocation here is the maximum number of interrupt vectors that
> can be assigned to actually created VFs. If so, then occ shows how many
> are actually utilized by the VFs.
> 
> > 
> > 
> > 
> > >	132 size_gran 1 dpipe_tables none
> > >
> > >example commands:
> > >$ devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
> > >$ devlink resource set pci/0000:31:00.0 path msix/msix_vf size 512
> > >$ devlink dev reload pci/0000:31:00.0

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-14 17:09       ` Leon Romanovsky
@ 2022-11-15  7:00         ` Michal Swiatkowski
  0 siblings, 0 replies; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-15  7:00 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Keller, Jacob E, Samudrala, Sridhar, netdev, davem, kuba, pabeni,
	edumazet, intel-wired-lan, jiri, Nguyen, Anthony L, Lobakin,
	Alexandr, Drewek, Wojciech, Czapnik, Lukasz, Saleem, Shiraz,
	Brandeburg, Jesse, Ismail, Mustafa, Kitszel, Przemyslaw,
	Raczynski, Piotr, Ertman, David M, Kaliszczuk, Leszek

On Mon, Nov 14, 2022 at 07:09:36PM +0200, Leon Romanovsky wrote:
> On Mon, Nov 14, 2022 at 04:58:57PM +0000, Keller, Jacob E wrote:
> > 
> > 
> > > -----Original Message-----
> > > From: Samudrala, Sridhar <sridhar.samudrala@intel.com>
> > > Sent: Monday, November 14, 2022 7:31 AM
> > > To: Leon Romanovsky <leon@kernel.org>; Michal Swiatkowski
> > > <michal.swiatkowski@linux.intel.com>
> > > Cc: netdev@vger.kernel.org; davem@davemloft.net; kuba@kernel.org;
> > > pabeni@redhat.com; edumazet@google.com; intel-wired-lan@lists.osuosl.org;
> > > jiri@nvidia.com; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Lobakin,
> > > Alexandr <alexandr.lobakin@intel.com>; Drewek, Wojciech
> > > <wojciech.drewek@intel.com>; Czapnik, Lukasz <lukasz.czapnik@intel.com>;
> > > Saleem, Shiraz <shiraz.saleem@intel.com>; Brandeburg, Jesse
> > > <jesse.brandeburg@intel.com>; Ismail, Mustafa <mustafa.ismail@intel.com>;
> > > Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; Raczynski, Piotr
> > > <piotr.raczynski@intel.com>; Keller, Jacob E <jacob.e.keller@intel.com>; Ertman,
> > > David M <david.m.ertman@intel.com>; Kaliszczuk, Leszek
> > > <leszek.kaliszczuk@intel.com>
> > > Subject: Re: [PATCH net-next 00/13] resource management using devlink reload
> > > 
> > > On 11/14/2022 7:23 AM, Leon Romanovsky wrote:
> > > > On Mon, Nov 14, 2022 at 01:57:42PM +0100, Michal Swiatkowski wrote:
> > > >> Currently the default value for number of PF vectors is number of CPUs.
> > > >> Because of that there are cases when all vectors are used for PF
> > > >> and user can't create more VFs. It is hard to set default number of
> > > >> CPUs right for all different use cases. Instead allow user to choose
> > > >> how many vectors should be used for various features. After implementing
> > > >> subdevices this mechanism will be also used to set number of vectors
> > > >> for subfunctions.
> > > >>
> > > >> The idea is to set vectors for eth or VFs using devlink resource API.
> > > >> New value of vectors will be used after devlink reinit. Example
> > > >> commands:
> > > >> $ sudo devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
> > > >> $ sudo devlink dev reload pci/0000:31:00.0
> > > >> After reload driver will work with 16 vectors used for eth instead of
> > > >> num_cpus.
> > > > By saying "vectors", are you referring to MSI-X vectors?
> > > > If yes, you have specific interface for that.
> > > > https://lore.kernel.org/linux-pci/20210314124256.70253-1-leon@kernel.org/
> > > 
> > > This patch series is exposing a resources API to split the device level MSI-X vectors
> > > across the different functions supported by the device (PF, RDMA, SR-IOV VFs
> > > and
> > > in future subfunctions). Today this is all hidden in a policy implemented within
> > > the PF driver.
> > > 
> > > The patch you are referring to seems to be providing an interface to change the
> > > msix count for a particular VF. This patch is providing a interface to set the total
> > > msix count for all the possible VFs from the available device level pool of
> > > msix-vectors.
> > > 
> > 
> > It looks like we should implement both: resources to configure the "pool" of available vectors for each VF, and the sysfs VF Interface to allow configuring individual VFs.
> 
> Yes, to be aligned with PCI spec and see coherent lspci output for VFs.
> 
> Thanks
> 

Thanks, I will implement this sysfs interface to configure individual
VFs.
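
(For context, the sysfs interface from the patch linked above is backed by
two callbacks on struct pci_driver; a minimal sketch of what wiring them up
in ice could look like. The helper bodies and field names here are
assumptions, not the actual implementation:)

static int ice_sriov_set_msix_vec_count(struct pci_dev *vf_dev,
					int msix_vec_count)
{
	/* invoked when the admin writes the VF's sriov_vf_msix_count file */
	return -EOPNOTSUPP;	/* placeholder */
}

static u32 ice_sriov_get_vf_total_msix(struct pci_dev *pdev)
{
	struct ice_pf *pf = pci_get_drvdata(pdev);

	/* size of the MSI-X pool reserved for VFs; field name is a guess */
	return pf->num_avail_sw_msix;
}

static struct pci_driver ice_driver = {
	/* ... existing fields ... */
	.sriov_set_msix_vec_count = ice_sriov_set_msix_vec_count,
	.sriov_get_vf_total_msix = ice_sriov_get_vf_total_msix,
};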

> > 
> > Thanks,
> > Jake

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-14 17:07     ` Leon Romanovsky
@ 2022-11-15  7:12       ` Michal Swiatkowski
  2022-11-15  8:11         ` Leon Romanovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-15  7:12 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Samudrala, Sridhar, netdev, davem, kuba, pabeni, edumazet,
	intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	wojciech.drewek, lukasz.czapnik, shiraz.saleem, jesse.brandeburg,
	mustafa.ismail, przemyslaw.kitszel, piotr.raczynski,
	jacob.e.keller, david.m.ertman, leszek.kaliszczuk

On Mon, Nov 14, 2022 at 07:07:54PM +0200, Leon Romanovsky wrote:
> On Mon, Nov 14, 2022 at 09:31:11AM -0600, Samudrala, Sridhar wrote:
> > On 11/14/2022 7:23 AM, Leon Romanovsky wrote:
> > > On Mon, Nov 14, 2022 at 01:57:42PM +0100, Michal Swiatkowski wrote:
> > > > Currently the default value for number of PF vectors is number of CPUs.
> > > > Because of that there are cases when all vectors are used for PF
> > > > and user can't create more VFs. It is hard to set default number of
> > > > CPUs right for all different use cases. Instead allow user to choose
> > > > how many vectors should be used for various features. After implementing
> > > > subdevices this mechanism will be also used to set number of vectors
> > > > for subfunctions.
> > > > 
> > > > The idea is to set vectors for eth or VFs using devlink resource API.
> > > > New value of vectors will be used after devlink reinit. Example
> > > > commands:
> > > > $ sudo devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
> > > > $ sudo devlink dev reload pci/0000:31:00.0
> > > > After reload driver will work with 16 vectors used for eth instead of
> > > > num_cpus.
> > > By saying "vectors", are you referring to MSI-X vectors?
> > > If yes, you have specific interface for that.
> > > https://lore.kernel.org/linux-pci/20210314124256.70253-1-leon@kernel.org/
> > 
> > This patch series is exposing a resources API to split the device level MSI-X vectors
> > across the different functions supported by the device (PF, RDMA, SR-IOV VFs and
> > in future subfunctions). Today this is all hidden in a policy implemented within
> > the PF driver.
> 
> Maybe we are talking about different VFs, but if you refer to PCI VFs,
> the amount of MSI-X comes from PCI config space for that specific VF.
> 
> You shouldn't set any value through netdev as it will cause to
> difference in output between lspci (which doesn't require any driver)
> and your newly set number.

If I understand correctly, lspci shows the MSI-X number for an individual
VF. The value set via devlink is the total number of MSI-X vectors that
can be used when creating VFs. As Jake said, I will fix the code to track
both values. Thanks for pointing out the patch.

> 
> Also in RDMA case, it is not clear what will you achieve by this
> setting too.
>

We have a limited number of MSI-X vectors (1024) in the device. Because
of that, the amount of MSI-X for each feature is set to the best default
we could pick: half for ethernet, half for RDMA. This patchset allows the
user to change these values. If they want more MSI-X for ethernet, they
can decrease MSI-X for RDMA.

> Thanks

Thanks for reviewing

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-15  7:12       ` Michal Swiatkowski
@ 2022-11-15  8:11         ` Leon Romanovsky
  2022-11-15  9:04           ` Michal Swiatkowski
  0 siblings, 1 reply; 49+ messages in thread
From: Leon Romanovsky @ 2022-11-15  8:11 UTC (permalink / raw)
  To: Michal Swiatkowski
  Cc: Samudrala, Sridhar, netdev, davem, kuba, pabeni, edumazet,
	intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	wojciech.drewek, lukasz.czapnik, shiraz.saleem, jesse.brandeburg,
	mustafa.ismail, przemyslaw.kitszel, piotr.raczynski,
	jacob.e.keller, david.m.ertman, leszek.kaliszczuk

On Tue, Nov 15, 2022 at 08:12:52AM +0100, Michal Swiatkowski wrote:
> On Mon, Nov 14, 2022 at 07:07:54PM +0200, Leon Romanovsky wrote:
> > On Mon, Nov 14, 2022 at 09:31:11AM -0600, Samudrala, Sridhar wrote:
> > > On 11/14/2022 7:23 AM, Leon Romanovsky wrote:
> > > > On Mon, Nov 14, 2022 at 01:57:42PM +0100, Michal Swiatkowski wrote:
> > > > > Currently the default value for number of PF vectors is number of CPUs.
> > > > > Because of that there are cases when all vectors are used for PF
> > > > > and user can't create more VFs. It is hard to set default number of
> > > > > CPUs right for all different use cases. Instead allow user to choose
> > > > > how many vectors should be used for various features. After implementing
> > > > > subdevices this mechanism will be also used to set number of vectors
> > > > > for subfunctions.
> > > > > 
> > > > > The idea is to set vectors for eth or VFs using devlink resource API.
> > > > > New value of vectors will be used after devlink reinit. Example
> > > > > commands:
> > > > > $ sudo devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
> > > > > $ sudo devlink dev reload pci/0000:31:00.0
> > > > > After reload driver will work with 16 vectors used for eth instead of
> > > > > num_cpus.
> > > > By saying "vectors", are you referring to MSI-X vectors?
> > > > If yes, you have specific interface for that.
> > > > https://lore.kernel.org/linux-pci/20210314124256.70253-1-leon@kernel.org/
> > > 
> > > This patch series is exposing a resources API to split the device level MSI-X vectors
> > > across the different functions supported by the device (PF, RDMA, SR-IOV VFs and
> > > in future subfunctions). Today this is all hidden in a policy implemented within
> > > the PF driver.
> > 
> > Maybe we are talking about different VFs, but if you refer to PCI VFs,
> > the amount of MSI-X comes from PCI config space for that specific VF.
> > 
> > You shouldn't set any value through netdev as it will cause to
> > difference in output between lspci (which doesn't require any driver)
> > and your newly set number.
> 
> If I understand correctly, lspci shows the MSI-X number for individual
> VF. Value set via devlink is the total number of MSI-X that can be used
> when creating VFs. 

Yes and no, lspci shows how many MSI-X vectors exist from the HW point of
view. The driver can use fewer than that. It is exactly like your proposed
devlink interface.


> As Jake said I will fix the code to track both values. Thanks for pointing the patch.
> 
> > 
> > Also in RDMA case, it is not clear what will you achieve by this
> > setting too.
> >
> 
> We have limited number of MSI-X (1024) in the device. Because of that
> the amount of MSI-X for each feature is set to the best values. Half for
> ethernet, half for RDMA. This patchset allow user to change this values.
> If he wants more MSI-X for ethernet, he can decrease MSI-X for RDMA.

RDMA devices don't have PCI logic; everything is controlled through
your main core module. It means that when you create the RDMA auxiliary
device, it will be connected to the netdev (RoCE and iWARP), and that
netdev should deal with vectors. So I still don't understand what "half
for RDMA" means.

Thanks

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-15  8:11         ` Leon Romanovsky
@ 2022-11-15  9:04           ` Michal Swiatkowski
  2022-11-15  9:32             ` Leon Romanovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-15  9:04 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Samudrala, Sridhar, netdev, davem, kuba, pabeni, edumazet,
	intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	wojciech.drewek, lukasz.czapnik, shiraz.saleem, jesse.brandeburg,
	mustafa.ismail, przemyslaw.kitszel, piotr.raczynski,
	jacob.e.keller, david.m.ertman, leszek.kaliszczuk

On Tue, Nov 15, 2022 at 10:11:10AM +0200, Leon Romanovsky wrote:
> On Tue, Nov 15, 2022 at 08:12:52AM +0100, Michal Swiatkowski wrote:
> > On Mon, Nov 14, 2022 at 07:07:54PM +0200, Leon Romanovsky wrote:
> > > On Mon, Nov 14, 2022 at 09:31:11AM -0600, Samudrala, Sridhar wrote:
> > > > On 11/14/2022 7:23 AM, Leon Romanovsky wrote:
> > > > > On Mon, Nov 14, 2022 at 01:57:42PM +0100, Michal Swiatkowski wrote:
> > > > > > Currently the default value for number of PF vectors is number of CPUs.
> > > > > > Because of that there are cases when all vectors are used for PF
> > > > > > and user can't create more VFs. It is hard to set default number of
> > > > > > CPUs right for all different use cases. Instead allow user to choose
> > > > > > how many vectors should be used for various features. After implementing
> > > > > > subdevices this mechanism will be also used to set number of vectors
> > > > > > for subfunctions.
> > > > > > 
> > > > > > The idea is to set vectors for eth or VFs using devlink resource API.
> > > > > > New value of vectors will be used after devlink reinit. Example
> > > > > > commands:
> > > > > > $ sudo devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
> > > > > > $ sudo devlink dev reload pci/0000:31:00.0
> > > > > > After reload driver will work with 16 vectors used for eth instead of
> > > > > > num_cpus.
> > > > > By saying "vectors", are you referring to MSI-X vectors?
> > > > > If yes, you have specific interface for that.
> > > > > https://lore.kernel.org/linux-pci/20210314124256.70253-1-leon@kernel.org/
> > > > 
> > > > This patch series is exposing a resources API to split the device level MSI-X vectors
> > > > across the different functions supported by the device (PF, RDMA, SR-IOV VFs and
> > > > in future subfunctions). Today this is all hidden in a policy implemented within
> > > > the PF driver.
> > > 
> > > Maybe we are talking about different VFs, but if you refer to PCI VFs,
> > > the amount of MSI-X comes from PCI config space for that specific VF.
> > > 
> > > You shouldn't set any value through netdev as it will cause to
> > > difference in output between lspci (which doesn't require any driver)
> > > and your newly set number.
> > 
> > If I understand correctly, lspci shows the MSI-X number for individual
> > VF. Value set via devlink is the total number of MSI-X that can be used
> > when creating VFs. 
> 
> Yes and no, lspci shows how much MSI-X vectors exist from HW point of
> view. Driver can use less than that. It is exactly as your proposed
> devlink interface.
> 
> 

OK, I have to take a closer look at it. So, are you saying that we should
drop this devlink solution and use the sysfs interface for VFs, or are you
fine with having both? What about MSI-X allocation for subfunctions?

> > As Jake said I will fix the code to track both values. Thanks for pointing the patch.
> > 
> > > 
> > > Also in RDMA case, it is not clear what will you achieve by this
> > > setting too.
> > >
> > 
> > We have limited number of MSI-X (1024) in the device. Because of that
> > the amount of MSI-X for each feature is set to the best values. Half for
> > ethernet, half for RDMA. This patchset allow user to change this values.
> > If he wants more MSI-X for ethernet, he can decrease MSI-X for RDMA.
> 
> RDMA devices doesn't have PCI logic and everything is controlled through
> you main core module. It means that when you create RDMA auxiliary device,
> it will be connected to netdev (RoCE and iWARP) and that netdev should
> deal with vectors. So I still don't understand what does it mean "half
> for RDMA".
>

Yes, it is controlled by the module, but during probe the MSI-X vectors
for RDMA are reserved and can't be used by ethernet. For example, I have
64 CPUs; when loading, I get 64 vectors from HW for ethernet and 64 for
RDMA. The vectors for RDMA will be consumed by the irdma driver, so I
won't be able to use them for ethernet, and vice versa.

By saying they can't be used I mean that the irdma driver received the
MSI-X vector numbers and is using them (connected them to RDMA interrupts).

The devlink resource is a way to change the number of MSI-X vectors that
will be reserved for RDMA. You wrote that the netdev should deal with
vectors, but how will the netdev know how many vectors should go to the
RDMA aux device? Is there an interface for setting the number of vectors
for the RDMA device?

Thanks

> Thanks

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-15  9:04           ` Michal Swiatkowski
@ 2022-11-15  9:32             ` Leon Romanovsky
  2022-11-15 10:16               ` Michal Swiatkowski
  0 siblings, 1 reply; 49+ messages in thread
From: Leon Romanovsky @ 2022-11-15  9:32 UTC (permalink / raw)
  To: Michal Swiatkowski
  Cc: Samudrala, Sridhar, netdev, davem, kuba, pabeni, edumazet,
	intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	wojciech.drewek, lukasz.czapnik, shiraz.saleem, jesse.brandeburg,
	mustafa.ismail, przemyslaw.kitszel, piotr.raczynski,
	jacob.e.keller, david.m.ertman, leszek.kaliszczuk

On Tue, Nov 15, 2022 at 10:04:49AM +0100, Michal Swiatkowski wrote:
> On Tue, Nov 15, 2022 at 10:11:10AM +0200, Leon Romanovsky wrote:
> > On Tue, Nov 15, 2022 at 08:12:52AM +0100, Michal Swiatkowski wrote:
> > > On Mon, Nov 14, 2022 at 07:07:54PM +0200, Leon Romanovsky wrote:
> > > > On Mon, Nov 14, 2022 at 09:31:11AM -0600, Samudrala, Sridhar wrote:
> > > > > On 11/14/2022 7:23 AM, Leon Romanovsky wrote:
> > > > > > On Mon, Nov 14, 2022 at 01:57:42PM +0100, Michal Swiatkowski wrote:
> > > > > > > Currently the default value for number of PF vectors is number of CPUs.
> > > > > > > Because of that there are cases when all vectors are used for PF
> > > > > > > and user can't create more VFs. It is hard to set default number of
> > > > > > > CPUs right for all different use cases. Instead allow user to choose
> > > > > > > how many vectors should be used for various features. After implementing
> > > > > > > subdevices this mechanism will be also used to set number of vectors
> > > > > > > for subfunctions.
> > > > > > > 
> > > > > > > The idea is to set vectors for eth or VFs using devlink resource API.
> > > > > > > New value of vectors will be used after devlink reinit. Example
> > > > > > > commands:
> > > > > > > $ sudo devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
> > > > > > > $ sudo devlink dev reload pci/0000:31:00.0
> > > > > > > After reload driver will work with 16 vectors used for eth instead of
> > > > > > > num_cpus.
> > > > > > By saying "vectors", are you referring to MSI-X vectors?
> > > > > > If yes, you have specific interface for that.
> > > > > > https://lore.kernel.org/linux-pci/20210314124256.70253-1-leon@kernel.org/
> > > > > 
> > > > > This patch series is exposing a resources API to split the device level MSI-X vectors
> > > > > across the different functions supported by the device (PF, RDMA, SR-IOV VFs and
> > > > > in future subfunctions). Today this is all hidden in a policy implemented within
> > > > > the PF driver.
> > > > 
> > > > Maybe we are talking about different VFs, but if you refer to PCI VFs,
> > > > the amount of MSI-X comes from PCI config space for that specific VF.
> > > > 
> > > > You shouldn't set any value through netdev as it will cause to
> > > > difference in output between lspci (which doesn't require any driver)
> > > > and your newly set number.
> > > 
> > > If I understand correctly, lspci shows the MSI-X number for individual
> > > VF. Value set via devlink is the total number of MSI-X that can be used
> > > when creating VFs. 
> > 
> > Yes and no, lspci shows how much MSI-X vectors exist from HW point of
> > view. Driver can use less than that. It is exactly as your proposed
> > devlink interface.
> > 
> > 
> 
> Ok, I have to take a closer look at it. So, are You saing that we should
> drop this devlink solution and use sysfs interface fo VFs or are You
> fine with having both? What with MSI-X allocation for subfunction?

You should drop it for VFs and PFs and keep it for SFs only.

> 
> > > As Jake said I will fix the code to track both values. Thanks for pointing the patch.
> > > 
> > > > 
> > > > Also in RDMA case, it is not clear what will you achieve by this
> > > > setting too.
> > > >
> > > 
> > > We have limited number of MSI-X (1024) in the device. Because of that
> > > the amount of MSI-X for each feature is set to the best values. Half for
> > > ethernet, half for RDMA. This patchset allow user to change this values.
> > > If he wants more MSI-X for ethernet, he can decrease MSI-X for RDMA.
> > 
> > RDMA devices doesn't have PCI logic and everything is controlled through
> > you main core module. It means that when you create RDMA auxiliary device,
> > it will be connected to netdev (RoCE and iWARP) and that netdev should
> > deal with vectors. So I still don't understand what does it mean "half
> > for RDMA".
> >
> 
> Yes, it is controlled by module, but during probe, MSI-X vectors for RDMA
> are reserved and can't be used by ethernet. For example I have
> 64 CPUs, when loading I get 64 vectors from HW for ethernet and 64 for
> RDMA. The vectors for RDMA will be consumed by irdma driver, so I won't
> be able to use it in ethernet and vice versa.
> 
> By saing it can't be used I mean that irdma driver received the MSI-X
> vectors number and it is using them (connected them with RDMA interrupts).
> 
> Devlink resource is a way to change the number of MSI-X vectors that
> will be reserved for RDMA. You wrote that netdev should deal with
> vectors, but how netdev will know how many vectors should go to RDMA aux
> device? Does there an interface for setting the vectors amount for RDMA
> device?

When the RDMA core adds the device, it calls irdma_init_rdma_device(),
and num_comp_vectors is actually the number of MSI-X vectors which you
want to give to that device.

I'm trying to say that you probably shouldn't reserve separate vectors
for ethernet and RDMA, but simply share the same vectors. RDMA applications
that care about performance set comp_vector through user-space verbs anyway.
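
(For reference, this is roughly the hand-off being described: the provider
just reports a count of completion vectors, it does not pin them per user.
A short sketch based on how irdma does it today; the exact field names are
from memory and may differ:)

	/* in irdma_init_rdma_device(): one completion vector per CEQ/MSI-X
	 * vector that the ice driver handed over
	 */
	iwdev->ibdev.num_comp_vectors = iwdev->rf->ceqs_count;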

Thanks

> 
> Thanks
> 
> > Thanks

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-15  9:32             ` Leon Romanovsky
@ 2022-11-15 10:16               ` Michal Swiatkowski
  2022-11-15 12:12                 ` Leon Romanovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-15 10:16 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Samudrala, Sridhar, netdev, davem, kuba, pabeni, edumazet,
	intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	wojciech.drewek, lukasz.czapnik, shiraz.saleem, jesse.brandeburg,
	mustafa.ismail, przemyslaw.kitszel, piotr.raczynski,
	jacob.e.keller, david.m.ertman, leszek.kaliszczuk

On Tue, Nov 15, 2022 at 11:32:14AM +0200, Leon Romanovsky wrote:
> On Tue, Nov 15, 2022 at 10:04:49AM +0100, Michal Swiatkowski wrote:
> > On Tue, Nov 15, 2022 at 10:11:10AM +0200, Leon Romanovsky wrote:
> > > On Tue, Nov 15, 2022 at 08:12:52AM +0100, Michal Swiatkowski wrote:
> > > > On Mon, Nov 14, 2022 at 07:07:54PM +0200, Leon Romanovsky wrote:
> > > > > On Mon, Nov 14, 2022 at 09:31:11AM -0600, Samudrala, Sridhar wrote:
> > > > > > On 11/14/2022 7:23 AM, Leon Romanovsky wrote:
> > > > > > > On Mon, Nov 14, 2022 at 01:57:42PM +0100, Michal Swiatkowski wrote:
> > > > > > > > Currently the default value for number of PF vectors is number of CPUs.
> > > > > > > > Because of that there are cases when all vectors are used for PF
> > > > > > > > and user can't create more VFs. It is hard to set default number of
> > > > > > > > CPUs right for all different use cases. Instead allow user to choose
> > > > > > > > how many vectors should be used for various features. After implementing
> > > > > > > > subdevices this mechanism will be also used to set number of vectors
> > > > > > > > for subfunctions.
> > > > > > > > 
> > > > > > > > The idea is to set vectors for eth or VFs using devlink resource API.
> > > > > > > > New value of vectors will be used after devlink reinit. Example
> > > > > > > > commands:
> > > > > > > > $ sudo devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
> > > > > > > > $ sudo devlink dev reload pci/0000:31:00.0
> > > > > > > > After reload driver will work with 16 vectors used for eth instead of
> > > > > > > > num_cpus.
> > > > > > > By saying "vectors", are you referring to MSI-X vectors?
> > > > > > > If yes, you have specific interface for that.
> > > > > > > https://lore.kernel.org/linux-pci/20210314124256.70253-1-leon@kernel.org/
> > > > > > 
> > > > > > This patch series is exposing a resources API to split the device level MSI-X vectors
> > > > > > across the different functions supported by the device (PF, RDMA, SR-IOV VFs and
> > > > > > in future subfunctions). Today this is all hidden in a policy implemented within
> > > > > > the PF driver.
> > > > > 
> > > > > Maybe we are talking about different VFs, but if you refer to PCI VFs,
> > > > > the amount of MSI-X comes from PCI config space for that specific VF.
> > > > > 
> > > > > You shouldn't set any value through netdev as it will cause to
> > > > > difference in output between lspci (which doesn't require any driver)
> > > > > and your newly set number.
> > > > 
> > > > If I understand correctly, lspci shows the MSI-X number for individual
> > > > VF. Value set via devlink is the total number of MSI-X that can be used
> > > > when creating VFs. 
> > > 
> > > Yes and no, lspci shows how much MSI-X vectors exist from HW point of
> > > view. Driver can use less than that. It is exactly as your proposed
> > > devlink interface.
> > > 
> > > 
> > 
> > Ok, I have to take a closer look at it. So, are You saing that we should
> > drop this devlink solution and use sysfs interface fo VFs or are You
> > fine with having both? What with MSI-X allocation for subfunction?
> 
> You should drop for VFs and PFs and keep it for SFs only.
> 

I understand that MSI-X for VFs can be set via the sysfs interface, but
what about PFs? Should we always allow the max MSI-X for PFs, i.e.
hw_max - used - SFs? Is it safe to always call pci_enable_msix with the
max vectors supported by the device?

I added the value for PFs because we faced a problem with MSI-X
allocation on an 8-port device. Our default value (num_cpus) was too big
(not enough vectors in HW). Changing the number of vectors that can be
used on the PFs solved the issue.

Let me write an example. The default MSI-X count for a PF is num_cpus,
the platform has 128 CPUs, an 8-port device is installed there, and we
still only have 1024 vectors in HW (simplified, because I don't count
additional interrupts). We run out of vectors; there are 0 vectors left
that can be used for VFs. Sure, it is easy to handle: we can divide the
PF interrupts by 2 and end up with 512 vectors for VFs. I assume that
with the current sysfs interface, in this situation MSI-X for VFs can be
set from 0 to 512? What if the user wants more? If there is a PF MSI-X
value which can be set by the user, the user can decrease it and use more
vectors for VFs. Is that possible with the current VF sysfs interface? I
mean, setting the VF MSI-X vector count to a value that requires
decreasing MSI-X for the PFs.
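
(As an aside on the pci_enable_msix question above: the kernel API already
lets a driver request a range rather than a fixed count, so asking for the
theoretical maximum up front is not required. A minimal sketch, where 'min'
and 'want' are made-up bounds:)

	/* request up to 'want' vectors but accept as few as 'min';
	 * the return value is how many were actually allocated
	 */
	int got = pci_alloc_irq_vectors(pf->pdev, min, want, PCI_IRQ_MSIX);

	if (got < 0)
		return got;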

> > 
> > > > As Jake said I will fix the code to track both values. Thanks for pointing the patch.
> > > > 
> > > > > 
> > > > > Also in RDMA case, it is not clear what will you achieve by this
> > > > > setting too.
> > > > >
> > > > 
> > > > We have limited number of MSI-X (1024) in the device. Because of that
> > > > the amount of MSI-X for each feature is set to the best values. Half for
> > > > ethernet, half for RDMA. This patchset allow user to change this values.
> > > > If he wants more MSI-X for ethernet, he can decrease MSI-X for RDMA.
> > > 
> > > RDMA devices doesn't have PCI logic and everything is controlled through
> > > you main core module. It means that when you create RDMA auxiliary device,
> > > it will be connected to netdev (RoCE and iWARP) and that netdev should
> > > deal with vectors. So I still don't understand what does it mean "half
> > > for RDMA".
> > >
> > 
> > Yes, it is controlled by module, but during probe, MSI-X vectors for RDMA
> > are reserved and can't be used by ethernet. For example I have
> > 64 CPUs, when loading I get 64 vectors from HW for ethernet and 64 for
> > RDMA. The vectors for RDMA will be consumed by irdma driver, so I won't
> > be able to use it in ethernet and vice versa.
> > 
> > By saing it can't be used I mean that irdma driver received the MSI-X
> > vectors number and it is using them (connected them with RDMA interrupts).
> > 
> > Devlink resource is a way to change the number of MSI-X vectors that
> > will be reserved for RDMA. You wrote that netdev should deal with
> > vectors, but how netdev will know how many vectors should go to RDMA aux
> > device? Does there an interface for setting the vectors amount for RDMA
> > device?
> 
> When RDMA core adds device, it calls to irdma_init_rdma_device() and
> num_comp_vectors is actually the number of MSI-X vectors which you want
> to give to that device.
> 
> I'm trying to say that probably you shouldn't reserve specific vectors
> for both ethernet and RDMA and simply share same vectors. RDMA applications
> that care about performance set comp_vector through user space verbs anyway.
> 

Thanks for the explanation, I appreciate it. In our driver,
num_comp_vectors for RDMA is set during driver probe. Do we have any
interface to change num_comp_vectors while the driver is running?

Sorry, I am not fully familiar with RDMA. Can a user-space RDMA
application set comp_vector to any value, or only up to the
num_comp_vectors given to RDMA when the aux device is created?

Assuming I gave 64 MSI-X vectors to RDMA by setting num_comp_vectors to
64, how will I know whether I can use these vectors for ethernet or not?

Thanks

> Thanks
> 
> > 
> > Thanks
> > 
> > > Thanks

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 13/13] devlink, ice: add MSIX vectors as devlink resource
  2022-11-14 16:03     ` Piotr Raczynski
  2022-11-15  6:56       ` Michal Swiatkowski
@ 2022-11-15 12:08       ` Jiri Pirko
  1 sibling, 0 replies; 49+ messages in thread
From: Jiri Pirko @ 2022-11-15 12:08 UTC (permalink / raw)
  To: Piotr Raczynski
  Cc: Michal Swiatkowski, netdev, davem, kuba, pabeni, edumazet,
	intel-wired-lan, anthony.l.nguyen, alexandr.lobakin,
	sridhar.samudrala, wojciech.drewek, lukasz.czapnik,
	shiraz.saleem, jesse.brandeburg, mustafa.ismail,
	przemyslaw.kitszel, jacob.e.keller, david.m.ertman,
	leszek.kaliszczuk, Michal Kubiak

Mon, Nov 14, 2022 at 05:03:43PM CET, piotr.raczynski@intel.com wrote:
>On Mon, Nov 14, 2022 at 04:28:38PM +0100, Jiri Pirko wrote:
>> Mon, Nov 14, 2022 at 01:57:55PM CET, michal.swiatkowski@linux.intel.com wrote:
>> >From: Michal Kubiak <michal.kubiak@intel.com>
>> >
>> >Implement devlink resource to control how many MSI-X vectors are
>> >used for eth, VF and RDMA. Show misc MSI-X as read only.
>> >
>> >This is first approach to control the mix of resources managed
>> >by ice driver. This commit registers number of available MSI-X
>> >as devlink resource and also add specific resources for eth, vf and RDMA.
>> >
>> >Also, make those resources generic.
>> >
>> >$ devlink resource show pci/0000:31:00.0
>> >  name msix size 1024 occ 172 unit entry dpipe_tables none
>> 
>> 
>> So, 1024 is the total vector count available in your hw?
>> 
>
>For this particular device and physical function, yes.
>
>
>> 
>> >    resources:
>> >      name msix_misc size 4 unit entry dpipe_tables none
>> 
>> What's misc? Why you don't show occupancy for it? Yet, it seems to be
>> accounted in the total (172)
>> 
>> Also, drop the "msix_" prefix from all, you already have parent called
>> "msix".
>
>misc interrupts are for miscellaneous purposes like communication with
>Firmware or other control plane interrupts (if any).
>
>> 
>> 
>> >      name msix_eth size 92 occ 92 unit entry size_min 1 size_max
>> 
>> Why "size_min is not 0 here?
>
>Thanks, actually 0 would mean disable the eth, default, netdev at all.
>It could be done, however not implemented in this patchset. But for
>cases when the default port is not needed at all, it seems like a good
>idea.

So the behaviour you describe is: if you set the number of vectors to 0,
the driver won't instantiate a netdev? That sounds quite odd. In that
case, the minimal value should really be 1. The same goes for RDMA.


>
>> 
>> 
>> >	128 size_gran 1 dpipe_tables none
>> >      name msix_vf size 128 occ 0 unit entry size_min 0 size_max
>> >	1020 size_gran 1 dpipe_tables none
>> >      name msix_rdma size 76 occ 76 unit entry size_min 0 size_max
>> 
>> Okay, this means that for eth and rdma, the vectors are fully used, no
>> VF is instanciated?
>
>Yes, in this driver implementation, both eth and rdma will most probably
>be always fully utilized, but the moment you change the size and execute
>`devlink reload` then they will reconfigure with new values.
>
>The VF allocation here is the maximum number of interrupt vectors that
>can be assigned to actually created VFs. If so, then occ shows how many
>are actually utilized by the VFs.
>
>> 
>> 
>> 
>> >	132 size_gran 1 dpipe_tables none
>> >
>> >example commands:
>> >$ devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
>> >$ devlink resource set pci/0000:31:00.0 path msix/msix_vf size 512
>> >$ devlink dev reload pci/0000:31:00.0

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-15 10:16               ` Michal Swiatkowski
@ 2022-11-15 12:12                 ` Leon Romanovsky
  2022-11-15 14:02                   ` Michal Swiatkowski
  0 siblings, 1 reply; 49+ messages in thread
From: Leon Romanovsky @ 2022-11-15 12:12 UTC (permalink / raw)
  To: Michal Swiatkowski
  Cc: Samudrala, Sridhar, netdev, davem, kuba, pabeni, edumazet,
	intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	wojciech.drewek, lukasz.czapnik, shiraz.saleem, jesse.brandeburg,
	mustafa.ismail, przemyslaw.kitszel, piotr.raczynski,
	jacob.e.keller, david.m.ertman, leszek.kaliszczuk

On Tue, Nov 15, 2022 at 11:16:58AM +0100, Michal Swiatkowski wrote:
> On Tue, Nov 15, 2022 at 11:32:14AM +0200, Leon Romanovsky wrote:
> > On Tue, Nov 15, 2022 at 10:04:49AM +0100, Michal Swiatkowski wrote:
> > > On Tue, Nov 15, 2022 at 10:11:10AM +0200, Leon Romanovsky wrote:
> > > > On Tue, Nov 15, 2022 at 08:12:52AM +0100, Michal Swiatkowski wrote:
> > > > > On Mon, Nov 14, 2022 at 07:07:54PM +0200, Leon Romanovsky wrote:
> > > > > > On Mon, Nov 14, 2022 at 09:31:11AM -0600, Samudrala, Sridhar wrote:
> > > > > > > On 11/14/2022 7:23 AM, Leon Romanovsky wrote:
> > > > > > > > On Mon, Nov 14, 2022 at 01:57:42PM +0100, Michal Swiatkowski wrote:
> > > > > > > > > Currently the default value for number of PF vectors is number of CPUs.
> > > > > > > > > Because of that there are cases when all vectors are used for PF
> > > > > > > > > and user can't create more VFs. It is hard to set default number of
> > > > > > > > > CPUs right for all different use cases. Instead allow user to choose
> > > > > > > > > how many vectors should be used for various features. After implementing
> > > > > > > > > subdevices this mechanism will be also used to set number of vectors
> > > > > > > > > for subfunctions.
> > > > > > > > > 
> > > > > > > > > The idea is to set vectors for eth or VFs using devlink resource API.
> > > > > > > > > New value of vectors will be used after devlink reinit. Example
> > > > > > > > > commands:
> > > > > > > > > $ sudo devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
> > > > > > > > > $ sudo devlink dev reload pci/0000:31:00.0
> > > > > > > > > After reload driver will work with 16 vectors used for eth instead of
> > > > > > > > > num_cpus.
> > > > > > > > By saying "vectors", are you referring to MSI-X vectors?
> > > > > > > > If yes, you have specific interface for that.
> > > > > > > > https://lore.kernel.org/linux-pci/20210314124256.70253-1-leon@kernel.org/
> > > > > > > 
> > > > > > > This patch series is exposing a resources API to split the device level MSI-X vectors
> > > > > > > across the different functions supported by the device (PF, RDMA, SR-IOV VFs and
> > > > > > > in future subfunctions). Today this is all hidden in a policy implemented within
> > > > > > > the PF driver.
> > > > > > 
> > > > > > Maybe we are talking about different VFs, but if you refer to PCI VFs,
> > > > > > the amount of MSI-X comes from PCI config space for that specific VF.
> > > > > > 
> > > > > > You shouldn't set any value through netdev as it will cause to
> > > > > > difference in output between lspci (which doesn't require any driver)
> > > > > > and your newly set number.
> > > > > 
> > > > > If I understand correctly, lspci shows the MSI-X number for individual
> > > > > VF. Value set via devlink is the total number of MSI-X that can be used
> > > > > when creating VFs. 
> > > > 
> > > > Yes and no, lspci shows how much MSI-X vectors exist from HW point of
> > > > view. Driver can use less than that. It is exactly as your proposed
> > > > devlink interface.
> > > > 
> > > > 
> > > 
> > > Ok, I have to take a closer look at it. So, are You saing that we should
> > > drop this devlink solution and use sysfs interface fo VFs or are You
> > > fine with having both? What with MSI-X allocation for subfunction?
> > 
> > You should drop for VFs and PFs and keep it for SFs only.
> > 
> 
> I understand that MSI-X for VFs can be set via sysfs interface, but what
> with PFs? 

PFs are even trickier than VFs, as you are changing that number while the
driver is bound. This makes me wonder what the lspci output will be, as
you will need to show the right number before the driver starts to load.

You need to present the right value if the user decides to unbind the
driver from the PF, too.

> Should we always allow max MSI-X for PFs? So hw_max - used -
> sfs? Is it save to call pci_enable_msix always with max vectors
> supported by device?

I'm not sure. I think that it won't give you much if you enable
more than num_online_cpus().

> 
> I added the value for PFs, because we faced a problem with MSI-X
> allocation on 8 port device. Our default value (num_cpus) was too big
> (not enough vectors in hw). Changing the amount of vectors that can be
> used on PFs was solving the issue.

We had something similar for mlx5 SFs, where we don't have enough vectors.
Our solution is simply to move to automatic shared MSI-X mode. I would
advocate for that for you as well. 

> 
> Let me write an example. As default MSI-X for PF is set to num_cpus, the
> platform have 128 CPUs, we have 8 port device installed there and still
> have 1024 vectors in HW (I simplified because I don't count additional
> interrupts). We run out of vectors, there is 0 vectors that can be used
> for VFs. Sure, it is easy to handle, we can divide PFs interrupts by 2
> and will end with 512 vectors for VFs. I assume that with current sysfs
> interface in this situation MSI-X for VFs can be set from 0 to 512? What
> if user wants more? If there is a PFs MSI-X value which can be set by
> user, user can decrease the value and use more vectors for VFs. Is it
> possible in current VFs sysfs interface? I mean, setting VFs MSI-X
> vectors to value that will need to decrease MSI-X for PFs.

You can't do it, and this limitation exists because the PF is bound. You
can't change that number while the driver is running. AFAIR, such a
change would be a PCI spec violation.

> 
> > > 
> > > > > As Jake said I will fix the code to track both values. Thanks for pointing the patch.
> > > > > 
> > > > > > 
> > > > > > Also in RDMA case, it is not clear what will you achieve by this
> > > > > > setting too.
> > > > > >
> > > > > 
> > > > > We have limited number of MSI-X (1024) in the device. Because of that
> > > > > the amount of MSI-X for each feature is set to the best values. Half for
> > > > > ethernet, half for RDMA. This patchset allow user to change this values.
> > > > > If he wants more MSI-X for ethernet, he can decrease MSI-X for RDMA.
> > > > 
> > > > RDMA devices doesn't have PCI logic and everything is controlled through
> > > > you main core module. It means that when you create RDMA auxiliary device,
> > > > it will be connected to netdev (RoCE and iWARP) and that netdev should
> > > > deal with vectors. So I still don't understand what does it mean "half
> > > > for RDMA".
> > > >
> > > 
> > > Yes, it is controlled by module, but during probe, MSI-X vectors for RDMA
> > > are reserved and can't be used by ethernet. For example I have
> > > 64 CPUs, when loading I get 64 vectors from HW for ethernet and 64 for
> > > RDMA. The vectors for RDMA will be consumed by irdma driver, so I won't
> > > be able to use it in ethernet and vice versa.
> > > 
> > > By saing it can't be used I mean that irdma driver received the MSI-X
> > > vectors number and it is using them (connected them with RDMA interrupts).
> > > 
> > > Devlink resource is a way to change the number of MSI-X vectors that
> > > will be reserved for RDMA. You wrote that netdev should deal with
> > > vectors, but how netdev will know how many vectors should go to RDMA aux
> > > device? Does there an interface for setting the vectors amount for RDMA
> > > device?
> > 
> > When RDMA core adds device, it calls to irdma_init_rdma_device() and
> > num_comp_vectors is actually the number of MSI-X vectors which you want
> > to give to that device.
> > 
> > I'm trying to say that probably you shouldn't reserve specific vectors
> > for both ethernet and RDMA and simply share same vectors. RDMA applications
> > that care about performance set comp_vector through user space verbs anyway.
> > 
> 
> Thanks for explanation, appriciate that. In our driver num_comp_vectors for
> RDMA is set during driver probe. Do we have any interface to change
> num_comp_vectors while driver is working?

No, as this number is indirectly exposed to the user space.

> 
> Sorry, I am not fully familiar with RDMA. Can user app for RDMA set
> comp_vector to any value or only to max which is num_comp_vectors given
> for RDMA while creating aux device?

comp_vector is logically equal to an IRQ number, and this is how RDMA
applications control which interrupt their completions are delivered to.

" The CQ will use the completion vector comp_vector for signaling
  completion events; it must be at least zero and less than
  context->num_comp_vectors. "
https://man7.org/linux/man-pages/man3/ibv_create_cq.3.html
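
(To make that concrete, a minimal user-space sketch of picking the vector;
cq_index and the CQ depth below are made up:)

	/* spread CQs across whatever the provider (irdma here) reported
	 * in num_comp_vectors
	 */
	int vec = cq_index % ctx->num_comp_vectors;
	struct ibv_cq *cq = ibv_create_cq(ctx, 256 /* cqe */, NULL, NULL, vec);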

> 
> Assuming that I gave 64 MSI-X for RDMA by setting num_comp_vectors to
> 64, how I will know if I can or can't use these vectors in ethernet?

Why would you need to know? Vectors are not exclusive, and they can be
used by many applications at the same time. The thing is that it is
far-fetched to expect that high-performance RDMA applications and
high-performance ethernet can coexist on the same device at the same time.

Thanks

> 
> Thanks
> 
> > Thanks
> > 
> > > 
> > > Thanks
> > > 
> > > > Thanks

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-15 12:12                 ` Leon Romanovsky
@ 2022-11-15 14:02                   ` Michal Swiatkowski
  2022-11-15 17:57                     ` Leon Romanovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-15 14:02 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Samudrala, Sridhar, netdev, davem, kuba, pabeni, edumazet,
	intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	wojciech.drewek, lukasz.czapnik, shiraz.saleem, jesse.brandeburg,
	mustafa.ismail, przemyslaw.kitszel, piotr.raczynski,
	jacob.e.keller, david.m.ertman, leszek.kaliszczuk

On Tue, Nov 15, 2022 at 02:12:12PM +0200, Leon Romanovsky wrote:
> On Tue, Nov 15, 2022 at 11:16:58AM +0100, Michal Swiatkowski wrote:
> > On Tue, Nov 15, 2022 at 11:32:14AM +0200, Leon Romanovsky wrote:
> > > On Tue, Nov 15, 2022 at 10:04:49AM +0100, Michal Swiatkowski wrote:
> > > > On Tue, Nov 15, 2022 at 10:11:10AM +0200, Leon Romanovsky wrote:
> > > > > On Tue, Nov 15, 2022 at 08:12:52AM +0100, Michal Swiatkowski wrote:
> > > > > > On Mon, Nov 14, 2022 at 07:07:54PM +0200, Leon Romanovsky wrote:
> > > > > > > On Mon, Nov 14, 2022 at 09:31:11AM -0600, Samudrala, Sridhar wrote:
> > > > > > > > On 11/14/2022 7:23 AM, Leon Romanovsky wrote:
> > > > > > > > > On Mon, Nov 14, 2022 at 01:57:42PM +0100, Michal Swiatkowski wrote:
> > > > > > > > > > Currently the default value for number of PF vectors is number of CPUs.
> > > > > > > > > > Because of that there are cases when all vectors are used for PF
> > > > > > > > > > and user can't create more VFs. It is hard to set default number of
> > > > > > > > > > CPUs right for all different use cases. Instead allow user to choose
> > > > > > > > > > how many vectors should be used for various features. After implementing
> > > > > > > > > > subdevices this mechanism will be also used to set number of vectors
> > > > > > > > > > for subfunctions.
> > > > > > > > > > 
> > > > > > > > > > The idea is to set vectors for eth or VFs using devlink resource API.
> > > > > > > > > > New value of vectors will be used after devlink reinit. Example
> > > > > > > > > > commands:
> > > > > > > > > > $ sudo devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
> > > > > > > > > > $ sudo devlink dev reload pci/0000:31:00.0
> > > > > > > > > > After reload driver will work with 16 vectors used for eth instead of
> > > > > > > > > > num_cpus.
> > > > > > > > > By saying "vectors", are you referring to MSI-X vectors?
> > > > > > > > > If yes, you have specific interface for that.
> > > > > > > > > https://lore.kernel.org/linux-pci/20210314124256.70253-1-leon@kernel.org/
> > > > > > > > 
> > > > > > > > This patch series is exposing a resources API to split the device level MSI-X vectors
> > > > > > > > across the different functions supported by the device (PF, RDMA, SR-IOV VFs and
> > > > > > > > in future subfunctions). Today this is all hidden in a policy implemented within
> > > > > > > > the PF driver.
> > > > > > > 
> > > > > > > Maybe we are talking about different VFs, but if you refer to PCI VFs,
> > > > > > > the amount of MSI-X comes from PCI config space for that specific VF.
> > > > > > > 
> > > > > > > You shouldn't set any value through netdev as it will cause to
> > > > > > > difference in output between lspci (which doesn't require any driver)
> > > > > > > and your newly set number.
> > > > > > 
> > > > > > If I understand correctly, lspci shows the MSI-X number for individual
> > > > > > VF. Value set via devlink is the total number of MSI-X that can be used
> > > > > > when creating VFs. 
> > > > > 
> > > > > Yes and no, lspci shows how much MSI-X vectors exist from HW point of
> > > > > view. Driver can use less than that. It is exactly as your proposed
> > > > > devlink interface.
> > > > > 
> > > > > 
> > > > 
> > > > Ok, I have to take a closer look at it. So, are You saing that we should
> > > > drop this devlink solution and use sysfs interface fo VFs or are You
> > > > fine with having both? What with MSI-X allocation for subfunction?
> > > 
> > > You should drop for VFs and PFs and keep it for SFs only.
> > > 
> > 
> > I understand that MSI-X for VFs can be set via sysfs interface, but what
> > with PFs? 
> 
> PFs are even more tricker than VFs, as you are changing that number
> while driver is bound. This makes me wonder what will be lspci output,
> as you will need to show right number before driver starts to load.
> 
> You need to present right value if user decided to unbind driver from PF too.
>

In the case of the ice driver, lspci -vs shows:
Capabilities: [70] MSI-X: Enable+ Count=1024 Masked

so all vectors that the HW supports (PFs, VFs, misc, etc.). Because of
that, the total number of MSI-X in the devlink example from the cover
letter is 1024.

I see that Mellanox shows:
Capabilities: [9c] MSI-X: Enable+ Count=64 Masked

I assume that 64 in this case is the MSI-X count for only this one PF
(it makes sense).
To be honest, I don't know why we show the maximum MSI-X for the whole
device there, but because of that the value will stay the same after
changing the allocation of MSI-X across features.

Isn't the MSI-X capability read from a HW register?
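
(For reference: the Count that lspci prints is the MSI-X Table Size field
of the function's Message Control register in config space, so it
describes the whole function's table, not how the driver splits it up. In
the kernel the same value can be read with pci_msix_vec_count(); a
one-line sketch:)

	int hw_vecs = pci_msix_vec_count(pf->pdev);	/* 1024 on this device */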

> > Should we always allow max MSI-X for PFs? So hw_max - used -
> > sfs? Is it save to call pci_enable_msix always with max vectors
> > supported by device?
> 
> I'm not sure. I think that it won't give you much if you enable
> more than num_online_cpu().
> 

Oh, yes, correct, I missed that.

> > 
> > I added the value for PFs, because we faced a problem with MSI-X
> > allocation on 8 port device. Our default value (num_cpus) was too big
> > (not enough vectors in hw). Changing the amount of vectors that can be
> > used on PFs was solving the issue.
> 
> We had something similar for mlx5 SFs, where we don't have enough vectors.
> Our solution is simply to move to automatic shared MSI-X mode. I would
> advocate for that for you as well. 
>

Thanks for proposing a solution, I will take a look at how this works in
mlx5.

> > 
> > Let me write an example. As default MSI-X for PF is set to num_cpus, the
> > platform have 128 CPUs, we have 8 port device installed there and still
> > have 1024 vectors in HW (I simplified because I don't count additional
> > interrupts). We run out of vectors, there is 0 vectors that can be used
> > for VFs. Sure, it is easy to handle, we can divide PFs interrupts by 2
> > and will end with 512 vectors for VFs. I assume that with current sysfs
> > interface in this situation MSI-X for VFs can be set from 0 to 512? What
> > if user wants more? If there is a PFs MSI-X value which can be set by
> > user, user can decrease the value and use more vectors for VFs. Is it
> > possible in current VFs sysfs interface? I mean, setting VFs MSI-X
> > vectors to value that will need to decrease MSI-X for PFs.
> 
> You can't do it and this limitation is because PF is bound. You can't change
> that number while driver is running. AFAIR, such change will be PCI spec
> violation.
>

As in my previous comment, we have a different value in the MSI-X
capability field (the max number of vectors for the whole device), so I
won't allow the user to change this value. I don't know why it is done
like that (the MSI-X amount for the whole device in the PCI caps). I will
try to dig into it.

> > 
> > > > 
> > > > > > As Jake said I will fix the code to track both values. Thanks for pointing the patch.
> > > > > > 
> > > > > > > 
> > > > > > > Also in RDMA case, it is not clear what will you achieve by this
> > > > > > > setting too.
> > > > > > >
> > > > > > 
> > > > > > We have limited number of MSI-X (1024) in the device. Because of that
> > > > > > the amount of MSI-X for each feature is set to the best values. Half for
> > > > > > ethernet, half for RDMA. This patchset allow user to change this values.
> > > > > > If he wants more MSI-X for ethernet, he can decrease MSI-X for RDMA.
> > > > > 
> > > > > RDMA devices doesn't have PCI logic and everything is controlled through
> > > > > you main core module. It means that when you create RDMA auxiliary device,
> > > > > it will be connected to netdev (RoCE and iWARP) and that netdev should
> > > > > deal with vectors. So I still don't understand what does it mean "half
> > > > > for RDMA".
> > > > >
> > > > 
> > > > Yes, it is controlled by module, but during probe, MSI-X vectors for RDMA
> > > > are reserved and can't be used by ethernet. For example I have
> > > > 64 CPUs, when loading I get 64 vectors from HW for ethernet and 64 for
> > > > RDMA. The vectors for RDMA will be consumed by irdma driver, so I won't
> > > > be able to use it in ethernet and vice versa.
> > > > 
> > > > By saing it can't be used I mean that irdma driver received the MSI-X
> > > > vectors number and it is using them (connected them with RDMA interrupts).
> > > > 
> > > > Devlink resource is a way to change the number of MSI-X vectors that
> > > > will be reserved for RDMA. You wrote that netdev should deal with
> > > > vectors, but how netdev will know how many vectors should go to RDMA aux
> > > > device? Does there an interface for setting the vectors amount for RDMA
> > > > device?
> > > 
> > > When RDMA core adds device, it calls to irdma_init_rdma_device() and
> > > num_comp_vectors is actually the number of MSI-X vectors which you want
> > > to give to that device.
> > > 
> > > I'm trying to say that probably you shouldn't reserve specific vectors
> > > for both ethernet and RDMA and simply share same vectors. RDMA applications
> > > that care about performance set comp_vector through user space verbs anyway.
> > > 
> > 
> > Thanks for explanation, appriciate that. In our driver num_comp_vectors for
> > RDMA is set during driver probe. Do we have any interface to change
> > num_comp_vectors while driver is working?
> 
> No, as this number is indirectly exposed to the user space.
> 

Ok, thanks.

> > 
> > Sorry, I am not fully familiar with RDMA. Can user app for RDMA set
> > comp_vector to any value or only to max which is num_comp_vectors given
> > for RDMA while creating aux device?
> 
> comp_vector logically equal to IRQ number and this is how RDMA
> applications controls to which interrupt deliver completions.
> 
> " The CQ will use the completion vector comp_vector for signaling
>   completion events; it must be at least zero and less than
>   context->num_comp_vectors. "
> https://man7.org/linux/man-pages/man3/ibv_create_cq.3.html
>

Thank You
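
If I read the man page right, from the application side this is just the
last argument of ibv_create_cq(). A minimal sketch (not tied to ice/irdma,
names here are generic):

#include <infiniband/verbs.h>

/* Create a CQ whose completion events will be delivered on the given
 * completion vector; comp_vector must be < context->num_comp_vectors.
 */
static struct ibv_cq *create_cq_on_vector(struct ibv_context *ctx, int vec)
{
	if (vec < 0 || vec >= ctx->num_comp_vectors)
		return NULL;

	return ibv_create_cq(ctx, 256 /* cqe */, NULL /* cq_context */,
			     NULL /* channel */, vec);
}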

> > 
> > Assuming that I gave 64 MSI-X for RDMA by setting num_comp_vectors to
> > 64, how I will know if I can or can't use these vectors in ethernet?
> 
> Why should you need to know? Vectors are not exclusive and they can be
> used by many applications at the same time. The thing is that it is far
> fetch to except that high performance RDMA applications and high
> performance ethernet can coexist on same device at the same time.
>

Yes, but after loading the aux device, part of the vectors (num_comp_vectors)
is reserved only for the RDMA use case (at least in the ice driver). We
thought that the devlink resource interface could be a good way to change
num_comp_vectors in this case.

Maybe I am wrong, but I think that the vectors we reserved for RDMA
can't be used by ethernet (in the ice driver). We sent specific MSI-X
entries to the rdma driver, so I don't think that the same entries can be
reused by ethernet. I am talking only about the ice + irdma case, am I
wrong (probably a question to Intel devs)?

Huge thanks for Your comments here, I understand more now.

> Thanks
> 
> > 
> > Thanks
> > 
> > > Thanks
> > > 
> > > > 
> > > > Thanks
> > > > 
> > > > > Thanks

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-15 14:02                   ` Michal Swiatkowski
@ 2022-11-15 17:57                     ` Leon Romanovsky
  2022-11-16  1:59                       ` Samudrala, Sridhar
  0 siblings, 1 reply; 49+ messages in thread
From: Leon Romanovsky @ 2022-11-15 17:57 UTC (permalink / raw)
  To: Michal Swiatkowski
  Cc: Samudrala, Sridhar, netdev, davem, kuba, pabeni, edumazet,
	intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	wojciech.drewek, lukasz.czapnik, shiraz.saleem, jesse.brandeburg,
	mustafa.ismail, przemyslaw.kitszel, piotr.raczynski,
	jacob.e.keller, david.m.ertman, leszek.kaliszczuk

On Tue, Nov 15, 2022 at 03:02:40PM +0100, Michal Swiatkowski wrote:
> On Tue, Nov 15, 2022 at 02:12:12PM +0200, Leon Romanovsky wrote:
> > On Tue, Nov 15, 2022 at 11:16:58AM +0100, Michal Swiatkowski wrote:
> > > On Tue, Nov 15, 2022 at 11:32:14AM +0200, Leon Romanovsky wrote:
> > > > On Tue, Nov 15, 2022 at 10:04:49AM +0100, Michal Swiatkowski wrote:
> > > > > On Tue, Nov 15, 2022 at 10:11:10AM +0200, Leon Romanovsky wrote:
> > > > > > On Tue, Nov 15, 2022 at 08:12:52AM +0100, Michal Swiatkowski wrote:
> > > > > > > On Mon, Nov 14, 2022 at 07:07:54PM +0200, Leon Romanovsky wrote:
> > > > > > > > On Mon, Nov 14, 2022 at 09:31:11AM -0600, Samudrala, Sridhar wrote:
> > > > > > > > > On 11/14/2022 7:23 AM, Leon Romanovsky wrote:
> > > > > > > > > > On Mon, Nov 14, 2022 at 01:57:42PM +0100, Michal Swiatkowski wrote:
> > > > > > > > > > > Currently the default value for number of PF vectors is number of CPUs.
> > > > > > > > > > > Because of that there are cases when all vectors are used for PF
> > > > > > > > > > > and user can't create more VFs. It is hard to set default number of
> > > > > > > > > > > CPUs right for all different use cases. Instead allow user to choose
> > > > > > > > > > > how many vectors should be used for various features. After implementing
> > > > > > > > > > > subdevices this mechanism will be also used to set number of vectors
> > > > > > > > > > > for subfunctions.
> > > > > > > > > > > 
> > > > > > > > > > > The idea is to set vectors for eth or VFs using devlink resource API.
> > > > > > > > > > > New value of vectors will be used after devlink reinit. Example
> > > > > > > > > > > commands:
> > > > > > > > > > > $ sudo devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
> > > > > > > > > > > $ sudo devlink dev reload pci/0000:31:00.0
> > > > > > > > > > > After reload driver will work with 16 vectors used for eth instead of
> > > > > > > > > > > num_cpus.
> > > > > > > > > > By saying "vectors", are you referring to MSI-X vectors?
> > > > > > > > > > If yes, you have specific interface for that.
> > > > > > > > > > https://lore.kernel.org/linux-pci/20210314124256.70253-1-leon@kernel.org/
> > > > > > > > > 
> > > > > > > > > This patch series is exposing a resources API to split the device level MSI-X vectors
> > > > > > > > > across the different functions supported by the device (PF, RDMA, SR-IOV VFs and
> > > > > > > > > in future subfunctions). Today this is all hidden in a policy implemented within
> > > > > > > > > the PF driver.
> > > > > > > > 
> > > > > > > > Maybe we are talking about different VFs, but if you refer to PCI VFs,
> > > > > > > > the amount of MSI-X comes from PCI config space for that specific VF.
> > > > > > > > 
> > > > > > > > You shouldn't set any value through netdev as it will cause to
> > > > > > > > difference in output between lspci (which doesn't require any driver)
> > > > > > > > and your newly set number.
> > > > > > > 
> > > > > > > If I understand correctly, lspci shows the MSI-X number for individual
> > > > > > > VF. Value set via devlink is the total number of MSI-X that can be used
> > > > > > > when creating VFs. 
> > > > > > 
> > > > > > Yes and no, lspci shows how much MSI-X vectors exist from HW point of
> > > > > > view. Driver can use less than that. It is exactly as your proposed
> > > > > > devlink interface.
> > > > > > 
> > > > > > 
> > > > > 
> > > > > Ok, I have to take a closer look at it. So, are You saing that we should
> > > > > drop this devlink solution and use sysfs interface fo VFs or are You
> > > > > fine with having both? What with MSI-X allocation for subfunction?
> > > > 
> > > > You should drop for VFs and PFs and keep it for SFs only.
> > > > 
> > > 
> > > I understand that MSI-X for VFs can be set via sysfs interface, but what
> > > with PFs? 
> > 
> > PFs are even more tricker than VFs, as you are changing that number
> > while driver is bound. This makes me wonder what will be lspci output,
> > as you will need to show right number before driver starts to load.
> > 
> > You need to present right value if user decided to unbind driver from PF too.
> >
> 
> In case of ice driver lspci -vs shows:
> Capabilities: [70] MSI-X: Enable+ Count=1024 Masked
> 
> so all vectors that hw supports (PFs, VFs, misc, etc). Because of that
> total number of MSI-X in the devlink example from cover letter is 1024.
> 
> I see that mellanox shows:
> Capabilities: [9c] MSI-X: Enable+ Count=64 Masked
> 
> I assume that 64 is in this case MSI-X ony for this one PF (it make
> sense).

Yes and PF MSI-X count can be changed through FW configuration tool, as
we need to write new value when the driver is unbound and we need it to
be persistent. Users are expecting to see "stable" number any time they
reboot the server. It is not the case for VFs, as they are explicitly
created after reboots and start "fresh" after every boot.

So we set large enough but not too large value as a default for PFs.
If you find sane model of how to change it through kernel, you can count
on our support.
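
For reference, with our FW configuration tool it looks roughly like this
(device address and parameter name are only an example here, the exact
name depends on the device/FW generation):

$ mlxconfig -d 0000:08:00.0 set NUM_PF_MSIX=63
$ mlxconfig -d 0000:08:00.0 query | grep MSIX

and the new value shows up in lspci after a FW reset/reboot.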

> To be honest I don't know why we show maximum MSI-X for the device
> there, but because of that the value will be the same afer changing
> allocation of MSI-X across features.
> 
> Isn't the MSI-X capabilities read from HW register?

Yes, it comes from Message Control register.

> 
> > > Should we always allow max MSI-X for PFs? So hw_max - used -
> > > sfs? Is it save to call pci_enable_msix always with max vectors
> > > supported by device?
> > 
> > I'm not sure. I think that it won't give you much if you enable
> > more than num_online_cpu().
> > 
> 
> Oh, yes, correct, I missed that.
> 
> > > 
> > > I added the value for PFs, because we faced a problem with MSI-X
> > > allocation on 8 port device. Our default value (num_cpus) was too big
> > > (not enough vectors in hw). Changing the amount of vectors that can be
> > > used on PFs was solving the issue.
> > 
> > We had something similar for mlx5 SFs, where we don't have enough vectors.
> > Our solution is simply to move to automatic shared MSI-X mode. I would
> > advocate for that for you as well. 
> >
> 
> Thanks for proposing solution, I will take a look how this work in mlx5.

BTW, the PCI spec talks about something like that in a short paragraph,
"Handling MSI-X Vector Shortages".

<...>

> > > 
> > > Assuming that I gave 64 MSI-X for RDMA by setting num_comp_vectors to
> > > 64, how I will know if I can or can't use these vectors in ethernet?
> > 
> > Why should you need to know? Vectors are not exclusive and they can be
> > used by many applications at the same time. The thing is that it is far
> > fetch to except that high performance RDMA applications and high
> > performance ethernet can coexist on same device at the same time.
> >
> 
> Yes, but after loading aux device part of vectors (num_comp_vectors) are
> reserved for only RDMA use case (at least in ice driver). We though that
> devlink resource interface can be a good way to change the
> num_comp_vectors in this case.

It can be, but it is much better to save users from devlink magic. :)

Thanks

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-15 17:57                     ` Leon Romanovsky
@ 2022-11-16  1:59                       ` Samudrala, Sridhar
  2022-11-16  6:04                         ` Leon Romanovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Samudrala, Sridhar @ 2022-11-16  1:59 UTC (permalink / raw)
  To: Leon Romanovsky, Michal Swiatkowski
  Cc: netdev, davem, kuba, pabeni, edumazet, intel-wired-lan, jiri,
	anthony.l.nguyen, alexandr.lobakin, wojciech.drewek,
	lukasz.czapnik, shiraz.saleem, jesse.brandeburg, mustafa.ismail,
	przemyslaw.kitszel, piotr.raczynski, jacob.e.keller,
	david.m.ertman, leszek.kaliszczuk

On 11/15/2022 11:57 AM, Leon Romanovsky wrote:
> On Tue, Nov 15, 2022 at 03:02:40PM +0100, Michal Swiatkowski wrote:
>> On Tue, Nov 15, 2022 at 02:12:12PM +0200, Leon Romanovsky wrote:
>>> On Tue, Nov 15, 2022 at 11:16:58AM +0100, Michal Swiatkowski wrote:
>>>> On Tue, Nov 15, 2022 at 11:32:14AM +0200, Leon Romanovsky wrote:
>>>>> On Tue, Nov 15, 2022 at 10:04:49AM +0100, Michal Swiatkowski wrote:
>>>>>> On Tue, Nov 15, 2022 at 10:11:10AM +0200, Leon Romanovsky wrote:
>>>>>>> On Tue, Nov 15, 2022 at 08:12:52AM +0100, Michal Swiatkowski wrote:
>>>>>>>> On Mon, Nov 14, 2022 at 07:07:54PM +0200, Leon Romanovsky wrote:
>>>>>>>>> On Mon, Nov 14, 2022 at 09:31:11AM -0600, Samudrala, Sridhar wrote:
>>>>>>>>>> On 11/14/2022 7:23 AM, Leon Romanovsky wrote:
>>>>>>>>>>> On Mon, Nov 14, 2022 at 01:57:42PM +0100, Michal Swiatkowski wrote:
>>>>>>>>>>>> Currently the default value for number of PF vectors is number of CPUs.
>>>>>>>>>>>> Because of that there are cases when all vectors are used for PF
>>>>>>>>>>>> and user can't create more VFs. It is hard to set default number of
>>>>>>>>>>>> CPUs right for all different use cases. Instead allow user to choose
>>>>>>>>>>>> how many vectors should be used for various features. After implementing
>>>>>>>>>>>> subdevices this mechanism will be also used to set number of vectors
>>>>>>>>>>>> for subfunctions.
>>>>>>>>>>>>
>>>>>>>>>>>> The idea is to set vectors for eth or VFs using devlink resource API.
>>>>>>>>>>>> New value of vectors will be used after devlink reinit. Example
>>>>>>>>>>>> commands:
>>>>>>>>>>>> $ sudo devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16
>>>>>>>>>>>> $ sudo devlink dev reload pci/0000:31:00.0
>>>>>>>>>>>> After reload driver will work with 16 vectors used for eth instead of
>>>>>>>>>>>> num_cpus.
>>>>>>>>>>> By saying "vectors", are you referring to MSI-X vectors?
>>>>>>>>>>> If yes, you have specific interface for that.
>>>>>>>>>>> https://lore.kernel.org/linux-pci/20210314124256.70253-1-leon@kernel.org/
>>>>>>>>>> This patch series is exposing a resources API to split the device level MSI-X vectors
>>>>>>>>>> across the different functions supported by the device (PF, RDMA, SR-IOV VFs and
>>>>>>>>>> in future subfunctions). Today this is all hidden in a policy implemented within
>>>>>>>>>> the PF driver.
>>>>>>>>> Maybe we are talking about different VFs, but if you refer to PCI VFs,
>>>>>>>>> the amount of MSI-X comes from PCI config space for that specific VF.
>>>>>>>>>
>>>>>>>>> You shouldn't set any value through netdev as it will cause to
>>>>>>>>> difference in output between lspci (which doesn't require any driver)
>>>>>>>>> and your newly set number.
>>>>>>>> If I understand correctly, lspci shows the MSI-X number for individual
>>>>>>>> VF. Value set via devlink is the total number of MSI-X that can be used
>>>>>>>> when creating VFs.
>>>>>>> Yes and no, lspci shows how much MSI-X vectors exist from HW point of
>>>>>>> view. Driver can use less than that. It is exactly as your proposed
>>>>>>> devlink interface.
>>>>>>>
>>>>>>>
>>>>>> Ok, I have to take a closer look at it. So, are You saing that we should
>>>>>> drop this devlink solution and use sysfs interface fo VFs or are You
>>>>>> fine with having both? What with MSI-X allocation for subfunction?
>>>>> You should drop for VFs and PFs and keep it for SFs only.
>>>>>
>>>> I understand that MSI-X for VFs can be set via sysfs interface, but what
>>>> with PFs?
>>> PFs are even more tricker than VFs, as you are changing that number
>>> while driver is bound. This makes me wonder what will be lspci output,
>>> as you will need to show right number before driver starts to load.
>>>
>>> You need to present right value if user decided to unbind driver from PF too.
>>>
>> In case of ice driver lspci -vs shows:
>> Capabilities: [70] MSI-X: Enable+ Count=1024 Masked
>>
>> so all vectors that hw supports (PFs, VFs, misc, etc). Because of that
>> total number of MSI-X in the devlink example from cover letter is 1024.
>>
>> I see that mellanox shows:
>> Capabilities: [9c] MSI-X: Enable+ Count=64 Masked
>>
>> I assume that 64 is in this case MSI-X ony for this one PF (it make
>> sense).
> Yes and PF MSI-X count can be changed through FW configuration tool, as
> we need to write new value when the driver is unbound and we need it to
> be persistent. Users are expecting to see "stable" number any time they
> reboot the server. It is not the case for VFs, as they are explicitly
> created after reboots and start "fresh" after every boot.
>
> So we set large enough but not too large value as a default for PFs.
> If you find sane model of how to change it through kernel, you can count
> on our support.

I guess one main difference is that in the case of ice, the PF driver manages
resources for all its associated functions, not the FW. So the MSI-X count
reported for the PF shows the total vectors (PF netdev, VFs, rdma, SFs). VFs
talk to the PF over a mailbox to get their MSI-X vector information.




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-16  1:59                       ` Samudrala, Sridhar
@ 2022-11-16  6:04                         ` Leon Romanovsky
  2022-11-16 12:04                           ` Michal Swiatkowski
  0 siblings, 1 reply; 49+ messages in thread
From: Leon Romanovsky @ 2022-11-16  6:04 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: Michal Swiatkowski, netdev, davem, kuba, pabeni, edumazet,
	intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	wojciech.drewek, lukasz.czapnik, shiraz.saleem, jesse.brandeburg,
	mustafa.ismail, przemyslaw.kitszel, piotr.raczynski,
	jacob.e.keller, david.m.ertman, leszek.kaliszczuk

On Tue, Nov 15, 2022 at 07:59:06PM -0600, Samudrala, Sridhar wrote:
> On 11/15/2022 11:57 AM, Leon Romanovsky wrote:

<...>

> > > In case of ice driver lspci -vs shows:
> > > Capabilities: [70] MSI-X: Enable+ Count=1024 Masked
> > > 
> > > so all vectors that hw supports (PFs, VFs, misc, etc). Because of that
> > > total number of MSI-X in the devlink example from cover letter is 1024.
> > > 
> > > I see that mellanox shows:
> > > Capabilities: [9c] MSI-X: Enable+ Count=64 Masked
> > > 
> > > I assume that 64 is in this case MSI-X ony for this one PF (it make
> > > sense).
> > Yes and PF MSI-X count can be changed through FW configuration tool, as
> > we need to write new value when the driver is unbound and we need it to
> > be persistent. Users are expecting to see "stable" number any time they
> > reboot the server. It is not the case for VFs, as they are explicitly
> > created after reboots and start "fresh" after every boot.
> > 
> > So we set large enough but not too large value as a default for PFs.
> > If you find sane model of how to change it through kernel, you can count
> > on our support.
> 
> I guess one main difference is that in case of ice, PF driver manager resources
> for all its associated functions, not the FW. So the MSI-X count reported for PF
> shows the total vectors(PF netdev, VFs, rdma, SFs). VFs talk to PF over a mailbox
> to get their MSI-X vector information.

What is the output of lspci for ice VF when the driver is not bound?

Thanks

> 
> 
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-16  6:04                         ` Leon Romanovsky
@ 2022-11-16 12:04                           ` Michal Swiatkowski
  2022-11-16 17:59                             ` Leon Romanovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-16 12:04 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Samudrala, Sridhar, netdev, davem, kuba, pabeni, edumazet,
	intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	wojciech.drewek, lukasz.czapnik, shiraz.saleem, jesse.brandeburg,
	mustafa.ismail, przemyslaw.kitszel, piotr.raczynski,
	jacob.e.keller, david.m.ertman, leszek.kaliszczuk

On Wed, Nov 16, 2022 at 08:04:56AM +0200, Leon Romanovsky wrote:
> On Tue, Nov 15, 2022 at 07:59:06PM -0600, Samudrala, Sridhar wrote:
> > On 11/15/2022 11:57 AM, Leon Romanovsky wrote:
> 
> <...>
> 
> > > > In case of ice driver lspci -vs shows:
> > > > Capabilities: [70] MSI-X: Enable+ Count=1024 Masked
> > > > 
> > > > so all vectors that hw supports (PFs, VFs, misc, etc). Because of that
> > > > total number of MSI-X in the devlink example from cover letter is 1024.
> > > > 
> > > > I see that mellanox shows:
> > > > Capabilities: [9c] MSI-X: Enable+ Count=64 Masked
> > > > 
> > > > I assume that 64 is in this case MSI-X ony for this one PF (it make
> > > > sense).
> > > Yes and PF MSI-X count can be changed through FW configuration tool, as
> > > we need to write new value when the driver is unbound and we need it to
> > > be persistent. Users are expecting to see "stable" number any time they
> > > reboot the server. It is not the case for VFs, as they are explicitly
> > > created after reboots and start "fresh" after every boot.
> > > 
> > > So we set large enough but not too large value as a default for PFs.
> > > If you find sane model of how to change it through kernel, you can count
> > > on our support.
> > 
> > I guess one main difference is that in case of ice, PF driver manager resources
> > for all its associated functions, not the FW. So the MSI-X count reported for PF
> > shows the total vectors(PF netdev, VFs, rdma, SFs). VFs talk to PF over a mailbox
> > to get their MSI-X vector information.
> 
> What is the output of lspci for ice VF when the driver is not bound?
> 
> Thanks
> 

It is the same after creating and after unbinding:
Capabilities: [70] MSI-X: Enable- Count=17 Masked-

17, because 16 for traffic and 1 for mailbox.
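
(Output taken with something like the following; the VF address is just an
example:

$ echo 1 > /sys/class/net/<pf netdev>/device/sriov_numvfs
$ lspci -s 0000:31:01.0 -vv | grep MSI-X
)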

Thanks
> > 
> > 
> > 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-16 12:04                           ` Michal Swiatkowski
@ 2022-11-16 17:59                             ` Leon Romanovsky
  2022-11-17 11:10                               ` Michal Swiatkowski
  0 siblings, 1 reply; 49+ messages in thread
From: Leon Romanovsky @ 2022-11-16 17:59 UTC (permalink / raw)
  To: Michal Swiatkowski
  Cc: Samudrala, Sridhar, netdev, davem, kuba, pabeni, edumazet,
	intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	wojciech.drewek, lukasz.czapnik, shiraz.saleem, jesse.brandeburg,
	mustafa.ismail, przemyslaw.kitszel, piotr.raczynski,
	jacob.e.keller, david.m.ertman, leszek.kaliszczuk

On Wed, Nov 16, 2022 at 01:04:36PM +0100, Michal Swiatkowski wrote:
> On Wed, Nov 16, 2022 at 08:04:56AM +0200, Leon Romanovsky wrote:
> > On Tue, Nov 15, 2022 at 07:59:06PM -0600, Samudrala, Sridhar wrote:
> > > On 11/15/2022 11:57 AM, Leon Romanovsky wrote:
> > 
> > <...>
> > 
> > > > > In case of ice driver lspci -vs shows:
> > > > > Capabilities: [70] MSI-X: Enable+ Count=1024 Masked
> > > > > 
> > > > > so all vectors that hw supports (PFs, VFs, misc, etc). Because of that
> > > > > total number of MSI-X in the devlink example from cover letter is 1024.
> > > > > 
> > > > > I see that mellanox shows:
> > > > > Capabilities: [9c] MSI-X: Enable+ Count=64 Masked
> > > > > 
> > > > > I assume that 64 is in this case MSI-X ony for this one PF (it make
> > > > > sense).
> > > > Yes and PF MSI-X count can be changed through FW configuration tool, as
> > > > we need to write new value when the driver is unbound and we need it to
> > > > be persistent. Users are expecting to see "stable" number any time they
> > > > reboot the server. It is not the case for VFs, as they are explicitly
> > > > created after reboots and start "fresh" after every boot.
> > > > 
> > > > So we set large enough but not too large value as a default for PFs.
> > > > If you find sane model of how to change it through kernel, you can count
> > > > on our support.
> > > 
> > > I guess one main difference is that in case of ice, PF driver manager resources
> > > for all its associated functions, not the FW. So the MSI-X count reported for PF
> > > shows the total vectors(PF netdev, VFs, rdma, SFs). VFs talk to PF over a mailbox
> > > to get their MSI-X vector information.
> > 
> > What is the output of lspci for ice VF when the driver is not bound?
> > 
> > Thanks
> > 
> 
> It is the same after creating and after unbonding:
> Capabilities: [70] MSI-X: Enable- Count=17 Masked-
> 
> 17, because 16 for traffic and 1 for mailbox.

Interesting, I think that your PF violates the PCI spec, as it always
uses the word "function" and not "device" when talking about MSI-X related
registers.

Thanks

> 
> Thanks
> > > 
> > > 
> > > 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-16 17:59                             ` Leon Romanovsky
@ 2022-11-17 11:10                               ` Michal Swiatkowski
  2022-11-17 11:45                                 ` Leon Romanovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-17 11:10 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Samudrala, Sridhar, netdev, davem, kuba, pabeni, edumazet,
	intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	wojciech.drewek, lukasz.czapnik, shiraz.saleem, jesse.brandeburg,
	mustafa.ismail, przemyslaw.kitszel, piotr.raczynski,
	jacob.e.keller, david.m.ertman, leszek.kaliszczuk

On Wed, Nov 16, 2022 at 07:59:43PM +0200, Leon Romanovsky wrote:
> On Wed, Nov 16, 2022 at 01:04:36PM +0100, Michal Swiatkowski wrote:
> > On Wed, Nov 16, 2022 at 08:04:56AM +0200, Leon Romanovsky wrote:
> > > On Tue, Nov 15, 2022 at 07:59:06PM -0600, Samudrala, Sridhar wrote:
> > > > On 11/15/2022 11:57 AM, Leon Romanovsky wrote:
> > > 
> > > <...>
> > > 
> > > > > > In case of ice driver lspci -vs shows:
> > > > > > Capabilities: [70] MSI-X: Enable+ Count=1024 Masked
> > > > > > 
> > > > > > so all vectors that hw supports (PFs, VFs, misc, etc). Because of that
> > > > > > total number of MSI-X in the devlink example from cover letter is 1024.
> > > > > > 
> > > > > > I see that mellanox shows:
> > > > > > Capabilities: [9c] MSI-X: Enable+ Count=64 Masked
> > > > > > 
> > > > > > I assume that 64 is in this case MSI-X ony for this one PF (it make
> > > > > > sense).
> > > > > Yes and PF MSI-X count can be changed through FW configuration tool, as
> > > > > we need to write new value when the driver is unbound and we need it to
> > > > > be persistent. Users are expecting to see "stable" number any time they
> > > > > reboot the server. It is not the case for VFs, as they are explicitly
> > > > > created after reboots and start "fresh" after every boot.
> > > > > 
> > > > > So we set large enough but not too large value as a default for PFs.
> > > > > If you find sane model of how to change it through kernel, you can count
> > > > > on our support.
> > > > 
> > > > I guess one main difference is that in case of ice, PF driver manager resources
> > > > for all its associated functions, not the FW. So the MSI-X count reported for PF
> > > > shows the total vectors(PF netdev, VFs, rdma, SFs). VFs talk to PF over a mailbox
> > > > to get their MSI-X vector information.
> > > 
> > > What is the output of lspci for ice VF when the driver is not bound?
> > > 
> > > Thanks
> > > 
> > 
> > It is the same after creating and after unbonding:
> > Capabilities: [70] MSI-X: Enable- Count=17 Masked-
> > 
> > 17, because 16 for traffic and 1 for mailbox.
> 
> Interesting, I think that your PF violates PCI spec as it always
> uses word "function" and not "device" while talks about MSI-X related
> registers.
> 
> Thanks
> 

I made a mistake in one comment. 1024 isn't the MSI-X amount for the device.
On ice we have 2048 for the whole device. On a two-port card each PF has
1024 MSI-X. Our control register mapping to the internal space looks
like this (assuming a two-port card; one VF with 5 MSI-X created):
INT[PF0].FIRST	0
		1
		2
		
		.
		.
		.

		1019	INT[VF0].FIRST	__
		1020			  | interrupts used
		1021			  | by VF on PF0
		1022			  |
INT[PF0].LAST	1023	INT[VF0].LAST	__|
INT[PF1].FIRST	1024
		1025
		1026

		.
		.
		.
		
INT[PF1].LAST	2047

The MSI-X entry table size for PF0 is 1024, but the entry table for the VF is
a part of PF0's physical space.

Do You mean that "sharing" the entries between PF and VF is a violation of
the PCI spec? That the sum of the MSI-X count on all functions within a
device shouldn't be greater than 2048? It is hard to find something about
this in the spec. I only read that: "MSI-X supports a maximum table size of
2048 entries". I will continue searching for information about that.

I don't think that from the driver perspective we can change the table size
located in the Message Control register.
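
(For completeness, the Table Size field can be read directly from config
space; assuming the PF is at 0000:31:00.0 as in the earlier examples:

$ sudo setpci -s 0000:31:00.0 CAP_MSIX+2.w

bits 10:0 of the returned Message Control word encode Table Size - 1, so
0x3ff there means 1024 entries.)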

I assume in mlnx the tool that You mentioned can modify this table size?

Thanks

> > 
> > Thanks
> > > > 
> > > > 
> > > > 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-17 11:10                               ` Michal Swiatkowski
@ 2022-11-17 11:45                                 ` Leon Romanovsky
  2022-11-17 13:39                                   ` Michal Swiatkowski
  0 siblings, 1 reply; 49+ messages in thread
From: Leon Romanovsky @ 2022-11-17 11:45 UTC (permalink / raw)
  To: Michal Swiatkowski
  Cc: Samudrala, Sridhar, netdev, davem, kuba, pabeni, edumazet,
	intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	wojciech.drewek, lukasz.czapnik, shiraz.saleem, jesse.brandeburg,
	mustafa.ismail, przemyslaw.kitszel, piotr.raczynski,
	jacob.e.keller, david.m.ertman, leszek.kaliszczuk

On Thu, Nov 17, 2022 at 12:10:21PM +0100, Michal Swiatkowski wrote:
> On Wed, Nov 16, 2022 at 07:59:43PM +0200, Leon Romanovsky wrote:
> > On Wed, Nov 16, 2022 at 01:04:36PM +0100, Michal Swiatkowski wrote:
> > > On Wed, Nov 16, 2022 at 08:04:56AM +0200, Leon Romanovsky wrote:
> > > > On Tue, Nov 15, 2022 at 07:59:06PM -0600, Samudrala, Sridhar wrote:
> > > > > On 11/15/2022 11:57 AM, Leon Romanovsky wrote:
> > > > 
> > > > <...>
> > > > 
> > > > > > > In case of ice driver lspci -vs shows:
> > > > > > > Capabilities: [70] MSI-X: Enable+ Count=1024 Masked
> > > > > > > 
> > > > > > > so all vectors that hw supports (PFs, VFs, misc, etc). Because of that
> > > > > > > total number of MSI-X in the devlink example from cover letter is 1024.
> > > > > > > 
> > > > > > > I see that mellanox shows:
> > > > > > > Capabilities: [9c] MSI-X: Enable+ Count=64 Masked
> > > > > > > 
> > > > > > > I assume that 64 is in this case MSI-X ony for this one PF (it make
> > > > > > > sense).
> > > > > > Yes and PF MSI-X count can be changed through FW configuration tool, as
> > > > > > we need to write new value when the driver is unbound and we need it to
> > > > > > be persistent. Users are expecting to see "stable" number any time they
> > > > > > reboot the server. It is not the case for VFs, as they are explicitly
> > > > > > created after reboots and start "fresh" after every boot.
> > > > > > 
> > > > > > So we set large enough but not too large value as a default for PFs.
> > > > > > If you find sane model of how to change it through kernel, you can count
> > > > > > on our support.
> > > > > 
> > > > > I guess one main difference is that in case of ice, PF driver manager resources
> > > > > for all its associated functions, not the FW. So the MSI-X count reported for PF
> > > > > shows the total vectors(PF netdev, VFs, rdma, SFs). VFs talk to PF over a mailbox
> > > > > to get their MSI-X vector information.
> > > > 
> > > > What is the output of lspci for ice VF when the driver is not bound?
> > > > 
> > > > Thanks
> > > > 
> > > 
> > > It is the same after creating and after unbonding:
> > > Capabilities: [70] MSI-X: Enable- Count=17 Masked-
> > > 
> > > 17, because 16 for traffic and 1 for mailbox.
> > 
> > Interesting, I think that your PF violates PCI spec as it always
> > uses word "function" and not "device" while talks about MSI-X related
> > registers.
> > 
> > Thanks
> > 
> 
> I made mistake in one comment. 1024 isn't MSI-X amount for device. On
> ice we have 2048 for the whole device. On two ports card each PF have
> 1024 MSI-X. Our control register mapping to the internal space looks
> like that (Assuming two port card; one VF with 5 MSI-X created):
> INT[PF0].FIRST	0
> 		1
> 		2
> 		
> 		.
> 		.
> 		.
> 
> 		1019	INT[VF0].FIRST	__
> 		1020			  | interrupts used
> 		1021			  | by VF on PF0
> 		1022			  |
> INT[PF0].LAST	1023	INT[VF0].LAST	__|
> INT[PF1].FIRST	1024
> 		1025
> 		1026
> 
> 		.
> 		.
> 		.
> 		
> INT[PF1].LAST	2047
> 
> MSI-X entry table size for PF0 is 1024, but entry table for VF is a part
> of PF0 physical space.
> 
> Do You mean that "sharing" the entry between PF and VF is a violation of
> PCI spec? 

You should consult with your PCI specification experts. It was my
spec interpretation, which can be wrong.


> Sum of MSI-X count on all function within device shouldn't be
> grater than 2048? 

No, it is 2K per Message Control register, i.e. per function.


> It is hard to find sth about this in spec. I only read
> that: "MSI-X supports a maximum table size of 2048 entries". I will
> continue searching for information about that.
> 
> I don't think that from driver perspective we can change the table size
> located in message control register.

No, you can't, unless you decide to explicitly violate the spec.

> 
> I assume in mlnx the tool that You mentioned can modify this table size?

Yes, it is FW configuration tool.

Thanks

> 
> Thanks
> 
> > > 
> > > Thanks
> > > > > 
> > > > > 
> > > > > 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-17 11:45                                 ` Leon Romanovsky
@ 2022-11-17 13:39                                   ` Michal Swiatkowski
  2022-11-17 17:38                                     ` Leon Romanovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Michal Swiatkowski @ 2022-11-17 13:39 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Samudrala, Sridhar, netdev, davem, kuba, pabeni, edumazet,
	intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	wojciech.drewek, lukasz.czapnik, shiraz.saleem, jesse.brandeburg,
	mustafa.ismail, przemyslaw.kitszel, piotr.raczynski,
	jacob.e.keller, david.m.ertman, leszek.kaliszczuk

On Thu, Nov 17, 2022 at 01:45:38PM +0200, Leon Romanovsky wrote:
> On Thu, Nov 17, 2022 at 12:10:21PM +0100, Michal Swiatkowski wrote:
> > On Wed, Nov 16, 2022 at 07:59:43PM +0200, Leon Romanovsky wrote:
> > > On Wed, Nov 16, 2022 at 01:04:36PM +0100, Michal Swiatkowski wrote:
> > > > On Wed, Nov 16, 2022 at 08:04:56AM +0200, Leon Romanovsky wrote:
> > > > > On Tue, Nov 15, 2022 at 07:59:06PM -0600, Samudrala, Sridhar wrote:
> > > > > > On 11/15/2022 11:57 AM, Leon Romanovsky wrote:
> > > > > 
> > > > > <...>
> > > > > 
> > > > > > > > In case of ice driver lspci -vs shows:
> > > > > > > > Capabilities: [70] MSI-X: Enable+ Count=1024 Masked
> > > > > > > > 
> > > > > > > > so all vectors that hw supports (PFs, VFs, misc, etc). Because of that
> > > > > > > > total number of MSI-X in the devlink example from cover letter is 1024.
> > > > > > > > 
> > > > > > > > I see that mellanox shows:
> > > > > > > > Capabilities: [9c] MSI-X: Enable+ Count=64 Masked
> > > > > > > > 
> > > > > > > > I assume that 64 is in this case MSI-X ony for this one PF (it make
> > > > > > > > sense).
> > > > > > > Yes and PF MSI-X count can be changed through FW configuration tool, as
> > > > > > > we need to write new value when the driver is unbound and we need it to
> > > > > > > be persistent. Users are expecting to see "stable" number any time they
> > > > > > > reboot the server. It is not the case for VFs, as they are explicitly
> > > > > > > created after reboots and start "fresh" after every boot.
> > > > > > > 
> > > > > > > So we set large enough but not too large value as a default for PFs.
> > > > > > > If you find sane model of how to change it through kernel, you can count
> > > > > > > on our support.
> > > > > > 
> > > > > > I guess one main difference is that in case of ice, PF driver manager resources
> > > > > > for all its associated functions, not the FW. So the MSI-X count reported for PF
> > > > > > shows the total vectors(PF netdev, VFs, rdma, SFs). VFs talk to PF over a mailbox
> > > > > > to get their MSI-X vector information.
> > > > > 
> > > > > What is the output of lspci for ice VF when the driver is not bound?
> > > > > 
> > > > > Thanks
> > > > > 
> > > > 
> > > > It is the same after creating and after unbonding:
> > > > Capabilities: [70] MSI-X: Enable- Count=17 Masked-
> > > > 
> > > > 17, because 16 for traffic and 1 for mailbox.
> > > 
> > > Interesting, I think that your PF violates PCI spec as it always
> > > uses word "function" and not "device" while talks about MSI-X related
> > > registers.
> > > 
> > > Thanks
> > > 
> > 
> > I made mistake in one comment. 1024 isn't MSI-X amount for device. On
> > ice we have 2048 for the whole device. On two ports card each PF have
> > 1024 MSI-X. Our control register mapping to the internal space looks
> > like that (Assuming two port card; one VF with 5 MSI-X created):
> > INT[PF0].FIRST	0
> > 		1
> > 		2
> > 		
> > 		.
> > 		.
> > 		.
> > 
> > 		1019	INT[VF0].FIRST	__
> > 		1020			  | interrupts used
> > 		1021			  | by VF on PF0
> > 		1022			  |
> > INT[PF0].LAST	1023	INT[VF0].LAST	__|
> > INT[PF1].FIRST	1024
> > 		1025
> > 		1026
> > 
> > 		.
> > 		.
> > 		.
> > 		
> > INT[PF1].LAST	2047
> > 
> > MSI-X entry table size for PF0 is 1024, but entry table for VF is a part
> > of PF0 physical space.
> > 
> > Do You mean that "sharing" the entry between PF and VF is a violation of
> > PCI spec? 
> 
> You should consult with your PCI specification experts. It was my
> spec interpretation, which can be wrong.
> 

Sure, I will

> 
> > Sum of MSI-X count on all function within device shouldn't be
> > grater than 2048? 
> 
> No, it is 2K per-control message/per-function.
> 
> 
> > It is hard to find sth about this in spec. I only read
> > that: "MSI-X supports a maximum table size of 2048 entries". I will
> > continue searching for information about that.
> > 
> > I don't think that from driver perspective we can change the table size
> > located in message control register.
> 
> No, you can't, unless you decide explicitly violate spec.
> 
> > 
> > I assume in mlnx the tool that You mentioned can modify this table size?
> 
> Yes, it is FW configuration tool.
>

Thanks for the clarification.
In summary, if this is really a violation of the PCI spec we will for sure
have to fix that, and the resource management implemented in this patchset
will not make sense (as the config for the PF MSI-X amount will have to
happen in FW).

If it isn't, is it still a NACK from You? I mean, if we implement the
devlink resource management (maybe not a generic one, only defined in ice)
and sysfs for setting VF MSI-X, will You be ok with that? Or is using
devlink to manage MSI-X resources not a good solution in general?

Our motivation here is to allow the user to configure the MSI-X allocation
for his own use case. For example, giving only 1 MSI-X to ethernet and RDMA
and spending the rest on VFs. If I understand correctly, on a mlnx card the
user can set the PF MSI-X to 1 in the FW configuration tool and achieve the
same. Currently we don't have another mechanism to change the default
number of MSI-X on ethernet.
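
With the interface from this series that would look more or less like this
(msix_eth is from the cover letter; msix_rdma is just an illustrative name
for the RDMA resource here):

$ devlink resource set pci/0000:31:00.0 path msix/msix_eth size 1
$ devlink resource set pci/0000:31:00.0 path msix/msix_rdma size 1
$ devlink dev reload pci/0000:31:00.0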

> Thanks
> 
> > 
> > Thanks
> > 
> > > > 
> > > > Thanks
> > > > > > 
> > > > > > 
> > > > > > 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-17 13:39                                   ` Michal Swiatkowski
@ 2022-11-17 17:38                                     ` Leon Romanovsky
  2022-11-18  3:36                                       ` Jakub Kicinski
  0 siblings, 1 reply; 49+ messages in thread
From: Leon Romanovsky @ 2022-11-17 17:38 UTC (permalink / raw)
  To: Michal Swiatkowski
  Cc: Samudrala, Sridhar, netdev, davem, kuba, pabeni, edumazet,
	intel-wired-lan, jiri, anthony.l.nguyen, alexandr.lobakin,
	wojciech.drewek, lukasz.czapnik, shiraz.saleem, jesse.brandeburg,
	mustafa.ismail, przemyslaw.kitszel, piotr.raczynski,
	jacob.e.keller, david.m.ertman, leszek.kaliszczuk

On Thu, Nov 17, 2022 at 02:39:50PM +0100, Michal Swiatkowski wrote:
> On Thu, Nov 17, 2022 at 01:45:38PM +0200, Leon Romanovsky wrote:
> > On Thu, Nov 17, 2022 at 12:10:21PM +0100, Michal Swiatkowski wrote:
> > > On Wed, Nov 16, 2022 at 07:59:43PM +0200, Leon Romanovsky wrote:
> > > > On Wed, Nov 16, 2022 at 01:04:36PM +0100, Michal Swiatkowski wrote:
> > > > > On Wed, Nov 16, 2022 at 08:04:56AM +0200, Leon Romanovsky wrote:
> > > > > > On Tue, Nov 15, 2022 at 07:59:06PM -0600, Samudrala, Sridhar wrote:
> > > > > > > On 11/15/2022 11:57 AM, Leon Romanovsky wrote:

<...>

> 
> Thanks for clarification.
> In summary, if this is really violation of PCI spec we for sure will
> have to fix that and resource managment implemented in this patchset
> will not make sense (as config for PF MSI-X amount will have to happen
> in FW).
> 
> If it isn't, is it still NACK from You? I mean, if we implement the
> devlink resources managment (maybe not generic one, only define in ice)
> and sysfs for setting VF MSI-X, will You be ok with that? Or using
> devlink to manage MSI-X resources isn't in general good solution?

NACK/ACK question is not for me, it is not my decision :)

I don't think that management of PCI specific parameters in devlink is
the right idea. PCI has its own subsystem with rules and assumptions, and
netdev shouldn't mangle them. In the MSI-X case, this is even more
troublesome, as users see these values through lspci without any driver
attached to it.

For example (a very popular case), a user creates VFs, unbinds the driver
from all functions (PF and VFs) and passes them to VMs (binds to the vfio
driver).

Your solution, being in the wrong layer, won't work there.

Thanks

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-17 17:38                                     ` Leon Romanovsky
@ 2022-11-18  3:36                                       ` Jakub Kicinski
  2022-11-18  6:20                                         ` Leon Romanovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Jakub Kicinski @ 2022-11-18  3:36 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Michal Swiatkowski, Samudrala, Sridhar, netdev, davem, pabeni,
	edumazet, intel-wired-lan, jiri, anthony.l.nguyen,
	alexandr.lobakin, wojciech.drewek, lukasz.czapnik, shiraz.saleem,
	jesse.brandeburg, mustafa.ismail, przemyslaw.kitszel,
	piotr.raczynski, jacob.e.keller, david.m.ertman,
	leszek.kaliszczuk, Bjorn Helgaas

On Thu, 17 Nov 2022 19:38:48 +0200 Leon Romanovsky wrote:
> I don't think that management of PCI specific parameters in devlink is
> right idea. PCI has his own subsystem with rules and assumptions, netdev
> shouldn't mangle them.

Not netdev, devlink, which covers netdev, RDMA and others.

> In MSI-X case, this even more troublesome as users
> sees these values through lspci without driver attached to it.

I'm no PCI expert either but FWIW to me the patch set seems reasonable.
Whether management FW policies are configured via core PCI sysfs or
subsystem-specific-ish APIs is a toss up. Adding Bjorn and please CC him
on the next rev.

Link to the series:
https://lore.kernel.org/all/20221114125755.13659-1-michal.swiatkowski@linux.intel.com/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-18  3:36                                       ` Jakub Kicinski
@ 2022-11-18  6:20                                         ` Leon Romanovsky
  2022-11-18 14:23                                           ` Saleem, Shiraz
  0 siblings, 1 reply; 49+ messages in thread
From: Leon Romanovsky @ 2022-11-18  6:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Michal Swiatkowski, Samudrala, Sridhar, netdev, davem, pabeni,
	edumazet, intel-wired-lan, jiri, anthony.l.nguyen,
	alexandr.lobakin, wojciech.drewek, lukasz.czapnik, shiraz.saleem,
	jesse.brandeburg, mustafa.ismail, przemyslaw.kitszel,
	piotr.raczynski, jacob.e.keller, david.m.ertman,
	leszek.kaliszczuk, Bjorn Helgaas

On Thu, Nov 17, 2022 at 07:36:18PM -0800, Jakub Kicinski wrote:
> On Thu, 17 Nov 2022 19:38:48 +0200 Leon Romanovsky wrote:
> > I don't think that management of PCI specific parameters in devlink is
> > right idea. PCI has his own subsystem with rules and assumptions, netdev
> > shouldn't mangle them.
> 
> Not netdev, devlink, which covers netdev, RDMA and others.

devlink is located in net/*, it is netdev.
Regarding RDMA, it is not fully accurate. We use devlink to convey
information to FW through pci device located in netdev. Some of such
parameters are RDMA related. However, we don't configure RDMA properties
through devlink, there is a special tool for that (rdmatool).

> 
> > In MSI-X case, this even more troublesome as users
> > sees these values through lspci without driver attached to it.
> 
> I'm no PCI expert either but FWIW to me the patch set seems reasonable.
> Whether management FW policies are configured via core PCI sysfs or
> subsystem-specific-ish APIs is a toss up. Adding Bjorn and please CC him
> on the next rev.

The thing is that it must be managed by FW. Earlier in this thread, I pointed
to the pretty normal use case where you need to have a valid PCI structure
without any driver involvement.

> 
> Link to the series:
> https://lore.kernel.org/all/20221114125755.13659-1-michal.swiatkowski@linux.intel.com/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-18  6:20                                         ` Leon Romanovsky
@ 2022-11-18 14:23                                           ` Saleem, Shiraz
  2022-11-18 17:31                                             ` Leon Romanovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Saleem, Shiraz @ 2022-11-18 14:23 UTC (permalink / raw)
  To: Leon Romanovsky, Jakub Kicinski
  Cc: Michal Swiatkowski, Samudrala, Sridhar, netdev, davem, pabeni,
	edumazet, intel-wired-lan, jiri, Nguyen, Anthony L, Lobakin,
	Alexandr, Drewek, Wojciech, Czapnik, Lukasz, Brandeburg, Jesse,
	Ismail, Mustafa, Kitszel, Przemyslaw, Raczynski, Piotr, Keller,
	Jacob E, Ertman, David M, Kaliszczuk, Leszek, Bjorn Helgaas

> Subject: Re: [PATCH net-next 00/13] resource management using devlink reload
> 
> On Thu, Nov 17, 2022 at 07:36:18PM -0800, Jakub Kicinski wrote:
> > On Thu, 17 Nov 2022 19:38:48 +0200 Leon Romanovsky wrote:
> > > I don't think that management of PCI specific parameters in devlink
> > > is right idea. PCI has his own subsystem with rules and assumptions,
> > > netdev shouldn't mangle them.
> >
> > Not netdev, devlink, which covers netdev, RDMA and others.
> 
> devlink is located in net/*, it is netdev.
> Regarding RDMA, it is not fully accurate. We use devlink to convey information to
> FW through pci device located in netdev. Some of such parameters are RDMA
> related. However, we don't configure RDMA properties through devlink, there is a
> special tool for that (rdmatool).

rdmatool, though, is usable only once the rdma driver probe() completes and the ib_device is registered.
And it cannot be used for any configuration at driver init time.

Don't we already have PCI specific parameters managed through devlink today?

https://docs.kernel.org/networking/devlink/devlink-params.html
enable_sriov
ignore_ari
msix_vec_per_pf_max
msix_vec_per_pf_min

How are these in a different bracket from what Michal is trying to do? Or were these ones a bad idea in hindsight?
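
For reference, those are set like any other devlink param, e.g. something
along these lines for bnxt (value and cmode are only illustrative):

$ devlink dev param set pci/0000:af:00.0 name msix_vec_per_pf_max value 256 cmode permanent

(with permanent cmode the new value takes effect after the next reboot/FW
reset).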

Shiraz

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-18 14:23                                           ` Saleem, Shiraz
@ 2022-11-18 17:31                                             ` Leon Romanovsky
  2022-11-20 22:24                                               ` Samudrala, Sridhar
  0 siblings, 1 reply; 49+ messages in thread
From: Leon Romanovsky @ 2022-11-18 17:31 UTC (permalink / raw)
  To: Saleem, Shiraz
  Cc: Jakub Kicinski, Michal Swiatkowski, Samudrala, Sridhar, netdev,
	davem, pabeni, edumazet, intel-wired-lan, jiri, Nguyen,
	Anthony L, Lobakin, Alexandr, Drewek, Wojciech, Czapnik, Lukasz,
	Brandeburg, Jesse, Ismail, Mustafa, Kitszel, Przemyslaw,
	Raczynski, Piotr, Keller, Jacob E, Ertman, David M, Kaliszczuk,
	Leszek, Bjorn Helgaas

On Fri, Nov 18, 2022 at 02:23:50PM +0000, Saleem, Shiraz wrote:
> > Subject: Re: [PATCH net-next 00/13] resource management using devlink reload
> > 
> > On Thu, Nov 17, 2022 at 07:36:18PM -0800, Jakub Kicinski wrote:
> > > On Thu, 17 Nov 2022 19:38:48 +0200 Leon Romanovsky wrote:
> > > > I don't think that management of PCI specific parameters in devlink
> > > > is right idea. PCI has his own subsystem with rules and assumptions,
> > > > netdev shouldn't mangle them.
> > >
> > > Not netdev, devlink, which covers netdev, RDMA and others.
> > 
> > devlink is located in net/*, it is netdev.
> > Regarding RDMA, it is not fully accurate. We use devlink to convey information to
> > FW through pci device located in netdev. Some of such parameters are RDMA
> > related. However, we don't configure RDMA properties through devlink, there is a
> > special tool for that (rdmatool).
> 
> rdmatool though is usable only once the rdma driver probe() completes and the ib_device is registered.
> And cannot be used for any configurations at driver init time.

Like I said, we use devlink to configure FW and "core" device to which
ib_device is connected. We don't configure RDMA specific properties, but
only device specific ones.

> 
> Don't we already have PCI specific parameters managed through devlink today?
> 
> https://docs.kernel.org/networking/devlink/devlink-params.html
> enable_sriov
> ignore_ari
> msix_vec_per_pf_max
> msix_vec_per_pf_min
> 
> How are these in a different bracket from what Michal is trying to do? Or were these ones a bad idea in hindsight?

Yes, as it doesn't belong to net/* and it exists there for just one
reason: the ability to write to the FW of a specific device.

At least for ARI, I don't see that the bnxt driver masked the ARI Extended
Capability and informed the PCI subsystem about it, so that the PCI core
would recreate the device.

Thanks

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH net-next 00/13] resource management using devlink reload
  2022-11-18 17:31                                             ` Leon Romanovsky
@ 2022-11-20 22:24                                               ` Samudrala, Sridhar
  0 siblings, 0 replies; 49+ messages in thread
From: Samudrala, Sridhar @ 2022-11-20 22:24 UTC (permalink / raw)
  To: Leon Romanovsky, Saleem, Shiraz
  Cc: Jakub Kicinski, Michal Swiatkowski, netdev, davem, pabeni,
	edumazet, intel-wired-lan, jiri, Nguyen, Anthony L, Lobakin,
	Alexandr, Drewek, Wojciech, Czapnik, Lukasz, Brandeburg, Jesse,
	Ismail, Mustafa, Kitszel, Przemyslaw, Raczynski, Piotr, Keller,
	Jacob E, Ertman, David M, Kaliszczuk, Leszek, Bjorn Helgaas

On 11/18/2022 11:31 AM, Leon Romanovsky wrote:
> On Fri, Nov 18, 2022 at 02:23:50PM +0000, Saleem, Shiraz wrote:
>>> Subject: Re: [PATCH net-next 00/13] resource management using devlink reload
>>>
>>> On Thu, Nov 17, 2022 at 07:36:18PM -0800, Jakub Kicinski wrote:
>>>> On Thu, 17 Nov 2022 19:38:48 +0200 Leon Romanovsky wrote:
>>>>> I don't think that management of PCI specific parameters in devlink
>>>>> is right idea. PCI has his own subsystem with rules and assumptions,
>>>>> netdev shouldn't mangle them.
>>>> Not netdev, devlink, which covers netdev, RDMA and others.
>>> devlink is located in net/*, it is netdev.
>>> Regarding RDMA, it is not fully accurate. We use devlink to convey information to
>>> FW through pci device located in netdev. Some of such parameters are RDMA
>>> related. However, we don't configure RDMA properties through devlink, there is a
>>> special tool for that (rdmatool).
>> rdmatool though is usable only once the rdma driver probe() completes and the ib_device is registered.
>> And cannot be used for any configurations at driver init time.
> Like I said, we use devlink to configure FW and "core" device to which
> ib_device is connected. We don't configure RDMA specific properties, but
> only device specific ones.

I don't think this patch series is configuring any PCI specific parameters OR per-PCI device parameters.
The FW splits the device level MSI-X vectors across its PFs (1 per port).
Each PF manages its own MSI-X vectors and distributes them across the different functions associated with that PF.
So far the PF driver has been doing this with a hard-coded policy implemented within the driver.

We are exposing that policy via devlink and allowing a host admin to split the MSI-X vectors that are
assigned to a PF by the FW across its different functions (PF, all its VFs, all its SFs and RDMA) based
on the use cases/workload running on the host.





^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2022-11-20 22:24 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-14 12:57 [PATCH net-next 00/13] resource management using devlink reload Michal Swiatkowski
2022-11-14 12:57 ` [PATCH net-next 01/13] ice: move RDMA init to ice_idc.c Michal Swiatkowski
2022-11-14 12:57 ` [PATCH net-next 02/13] ice: alloc id for RDMA using xa_array Michal Swiatkowski
2022-11-14 12:57 ` [PATCH net-next 03/13] ice: cleanup in VSI config/deconfig code Michal Swiatkowski
2022-11-14 12:57 ` [PATCH net-next 04/13] ice: split ice_vsi_setup into smaller functions Michal Swiatkowski
2022-11-15  5:08   ` Jakub Kicinski
2022-11-15  6:49     ` Michal Swiatkowski
2022-11-14 12:57 ` [PATCH net-next 05/13] ice: stop hard coding the ICE_VSI_CTRL location Michal Swiatkowski
2022-11-14 12:57 ` [PATCH net-next 06/13] ice: split probe into smaller functions Michal Swiatkowski
2022-11-14 12:57 ` [PATCH net-next 07/13] ice: sync netdev filters after clearing VSI Michal Swiatkowski
2022-11-14 12:57 ` [PATCH net-next 08/13] ice: move VSI delete outside deconfig Michal Swiatkowski
2022-11-14 12:57 ` [PATCH net-next 09/13] ice: update VSI instead of init in some case Michal Swiatkowski
2022-11-14 12:57 ` [PATCH net-next 10/13] ice: implement devlink reinit action Michal Swiatkowski
2022-11-14 12:57 ` [PATCH net-next 11/13] ice: introduce eswitch capable flag Michal Swiatkowski
2022-11-14 12:57 ` [PATCH net-next 12/13] ice, irdma: prepare reservation of MSI-X to reload Michal Swiatkowski
2022-11-15  5:08   ` Jakub Kicinski
2022-11-15  6:49     ` Michal Swiatkowski
2022-11-14 12:57 ` [PATCH net-next 13/13] devlink, ice: add MSIX vectors as devlink resource Michal Swiatkowski
2022-11-14 15:28   ` Jiri Pirko
2022-11-14 16:03     ` Piotr Raczynski
2022-11-15  6:56       ` Michal Swiatkowski
2022-11-15 12:08       ` Jiri Pirko
2022-11-14 13:23 ` [PATCH net-next 00/13] resource management using devlink reload Leon Romanovsky
2022-11-14 15:31   ` Samudrala, Sridhar
2022-11-14 16:58     ` Keller, Jacob E
2022-11-14 17:09       ` Leon Romanovsky
2022-11-15  7:00         ` Michal Swiatkowski
2022-11-14 17:07     ` Leon Romanovsky
2022-11-15  7:12       ` Michal Swiatkowski
2022-11-15  8:11         ` Leon Romanovsky
2022-11-15  9:04           ` Michal Swiatkowski
2022-11-15  9:32             ` Leon Romanovsky
2022-11-15 10:16               ` Michal Swiatkowski
2022-11-15 12:12                 ` Leon Romanovsky
2022-11-15 14:02                   ` Michal Swiatkowski
2022-11-15 17:57                     ` Leon Romanovsky
2022-11-16  1:59                       ` Samudrala, Sridhar
2022-11-16  6:04                         ` Leon Romanovsky
2022-11-16 12:04                           ` Michal Swiatkowski
2022-11-16 17:59                             ` Leon Romanovsky
2022-11-17 11:10                               ` Michal Swiatkowski
2022-11-17 11:45                                 ` Leon Romanovsky
2022-11-17 13:39                                   ` Michal Swiatkowski
2022-11-17 17:38                                     ` Leon Romanovsky
2022-11-18  3:36                                       ` Jakub Kicinski
2022-11-18  6:20                                         ` Leon Romanovsky
2022-11-18 14:23                                           ` Saleem, Shiraz
2022-11-18 17:31                                             ` Leon Romanovsky
2022-11-20 22:24                                               ` Samudrala, Sridhar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).