* [PATCH v3 for-4.13 0/6] Automatic affinity settings for nvme over rdma
From: Sagi Grimberg @ 2017-06-05  6:35 UTC
  To: Doug Ledford
  Cc: Christoph Hellwig, Leon Romanovsky, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org

Doug, please consider this patch set for 4.13.

This patch set aims to automatically find the optimal
queue <-> irq multi-queue assignments in storage ULPs (demonstrated
on nvme-rdma), based on the underlying rdma device's irq affinity
settings.
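
To illustrate the end result, here is a minimal sketch (based on the
blk-mq helper introduced in patch 5): each blk-mq hw queue gets mapped
to the CPUs that have irq affinity for the matching completion vector
(a NULL mask falls back to the naive mapping in the real code):

	for (queue = 0; queue < set->nr_hw_queues; queue++) {
		const struct cpumask *mask =
			ib_get_vector_affinity(dev, first_vec + queue);

		/* every CPU in the vector's affinity mask gets this queue */
		for_each_cpu(cpu, mask)
			set->mq_map[cpu] = queue;
	}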

Changes from v2:
- rebased to 4.12
- added review tags

Changes from v1:
- Removed mlx5e_get_cpu as Christoph suggested
- Fixed up nvme-rdma queue comp_vector selection to get a better match
- Added a comment on why we limit on @dev->num_comp_vectors
- rebased to Jens's for-4.12/block
- Collected review tags

Sagi Grimberg (6):
  mlx5: convert to generic pci_alloc_irq_vectors
  mlx5: move affinity hints assignments to generic code
  RDMA/core: expose affinity mappings per completion vector
  mlx5: support ->get_vector_affinity
  block: Add rdma affinity based queue mapping helper
  nvme-rdma: use intelligent affinity based queue mappings

 block/Kconfig                                      |   5 +
 block/Makefile                                     |   1 +
 block/blk-mq-rdma.c                                |  54 +++++++++++
 drivers/infiniband/hw/mlx5/main.c                  |  10 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  14 +--
 drivers/net/ethernet/mellanox/mlx5/core/eq.c       |   9 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/health.c   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c     | 108 +++------------------
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h    |   1 -
 drivers/nvme/host/rdma.c                           |  29 ++++--
 include/linux/blk-mq-rdma.h                        |  10 ++
 include/linux/mlx5/driver.h                        |   2 -
 include/rdma/ib_verbs.h                            |  25 ++++-
 14 files changed, 149 insertions(+), 123 deletions(-)
 create mode 100644 block/blk-mq-rdma.c
 create mode 100644 include/linux/blk-mq-rdma.h

-- 
2.7.4


* [PATCH v3 for-4.13 1/6] mlx5: convert to generic pci_alloc_irq_vectors
From: Sagi Grimberg @ 2017-06-05  6:35 UTC
  To: Doug Ledford
  Cc: Christoph Hellwig, Leon Romanovsky, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org

Now that we have generic code to allocate an array of irq
vectors, correctly spread their affinity, correctly handle
cpu hotplug events, and more, we're much better off using it.
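
For reference, the usage pattern this conversion moves to looks roughly
like the following sketch (error handling elided):

	/* the PCI core allocates and tracks the MSI-X vectors for us */
	nvec = pci_alloc_irq_vectors(pdev, min_vecs, max_vecs, PCI_IRQ_MSIX);
	if (nvec < 0)
		return nvec;

	/* look up the Linux irq number of a given vector index */
	irq = pci_irq_vector(pdev, vecidx);

	/* teardown replaces pci_disable_msix() plus the msix_arr bookkeeping */
	pci_free_irq_vectors(pdev);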

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c       |  9 ++----
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/health.c   |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c     | 33 ++++++++--------------
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h    |  1 -
 include/linux/mlx5/driver.h                        |  1 -
 7 files changed, 17 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 41cd22a223dc..2a3c59e55dcf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -385,7 +385,7 @@ static void mlx5e_enable_async_events(struct mlx5e_priv *priv)
 static void mlx5e_disable_async_events(struct mlx5e_priv *priv)
 {
 	clear_bit(MLX5E_STATE_ASYNC_EVENTS_ENABLED, &priv->state);
-	synchronize_irq(mlx5_get_msix_vec(priv->mdev, MLX5_EQ_VEC_ASYNC));
+	synchronize_irq(pci_irq_vector(priv->mdev->pdev, MLX5_EQ_VEC_ASYNC));
 }
 
 static inline int mlx5e_get_wqe_mtt_sz(void)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index ea5d8d37a75c..e2c33c493b89 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -575,7 +575,7 @@ int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 vecidx,
 		 name, pci_name(dev->pdev));
 
 	eq->eqn = MLX5_GET(create_eq_out, out, eq_number);
-	eq->irqn = priv->msix_arr[vecidx].vector;
+	eq->irqn = pci_irq_vector(dev->pdev, vecidx);
 	eq->dev = dev;
 	eq->doorbell = priv->uar->map + MLX5_EQ_DOORBEL_OFFSET;
 	err = request_irq(eq->irqn, handler, 0,
@@ -610,7 +610,7 @@ int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 vecidx,
 	return 0;
 
 err_irq:
-	free_irq(priv->msix_arr[vecidx].vector, eq);
+	free_irq(eq->irqn, eq);
 
 err_eq:
 	mlx5_cmd_destroy_eq(dev, eq->eqn);
@@ -651,11 +651,6 @@ int mlx5_destroy_unmap_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
 }
 EXPORT_SYMBOL_GPL(mlx5_destroy_unmap_eq);
 
-u32 mlx5_get_msix_vec(struct mlx5_core_dev *dev, int vecidx)
-{
-	return dev->priv.msix_arr[MLX5_EQ_VEC_ASYNC].vector;
-}
-
 int mlx5_eq_init(struct mlx5_core_dev *dev)
 {
 	int err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 2e34d95ea776..e9256b7017b6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1592,7 +1592,7 @@ static void esw_disable_vport(struct mlx5_eswitch *esw, int vport_num)
 	/* Mark this vport as disabled to discard new events */
 	vport->enabled = false;
 
-	synchronize_irq(mlx5_get_msix_vec(esw->dev, MLX5_EQ_VEC_ASYNC));
+	synchronize_irq(pci_irq_vector(esw->dev->pdev, MLX5_EQ_VEC_ASYNC));
 	/* Wait for current already scheduled events to complete */
 	flush_workqueue(esw->work_queue);
 	/* Disable events from this vport */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index d0515391d33b..8b38d5cfd4c5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -80,7 +80,7 @@ static void trigger_cmd_completions(struct mlx5_core_dev *dev)
 	u64 vector;
 
 	/* wait for pending handlers to complete */
-	synchronize_irq(dev->priv.msix_arr[MLX5_EQ_VEC_CMD].vector);
+	synchronize_irq(pci_irq_vector(dev->pdev, MLX5_EQ_VEC_CMD));
 	spin_lock_irqsave(&dev->cmd.alloc_lock, flags);
 	vector = ~dev->cmd.bitmask & ((1ul << (1 << dev->cmd.log_sz)) - 1);
 	if (!vector)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 0c123d571b4c..e4431aacce9d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -308,13 +308,12 @@ static void release_bar(struct pci_dev *pdev)
 	pci_release_regions(pdev);
 }
 
-static int mlx5_enable_msix(struct mlx5_core_dev *dev)
+static int mlx5_alloc_irq_vectors(struct mlx5_core_dev *dev)
 {
 	struct mlx5_priv *priv = &dev->priv;
 	struct mlx5_eq_table *table = &priv->eq_table;
 	int num_eqs = 1 << MLX5_CAP_GEN(dev, log_max_eq);
 	int nvec;
-	int i;
 
 	nvec = MLX5_CAP_GEN(dev, num_ports) * num_online_cpus() +
 	       MLX5_EQ_VEC_COMP_BASE;
@@ -322,17 +321,13 @@ static int mlx5_enable_msix(struct mlx5_core_dev *dev)
 	if (nvec <= MLX5_EQ_VEC_COMP_BASE)
 		return -ENOMEM;
 
-	priv->msix_arr = kcalloc(nvec, sizeof(*priv->msix_arr), GFP_KERNEL);
-
 	priv->irq_info = kcalloc(nvec, sizeof(*priv->irq_info), GFP_KERNEL);
-	if (!priv->msix_arr || !priv->irq_info)
+	if (!priv->irq_info)
 		goto err_free_msix;
 
-	for (i = 0; i < nvec; i++)
-		priv->msix_arr[i].entry = i;
-
-	nvec = pci_enable_msix_range(dev->pdev, priv->msix_arr,
-				     MLX5_EQ_VEC_COMP_BASE + 1, nvec);
+	nvec = pci_alloc_irq_vectors(dev->pdev,
+			MLX5_EQ_VEC_COMP_BASE + 1, nvec,
+			PCI_IRQ_MSIX);
 	if (nvec < 0)
 		return nvec;
 
@@ -342,7 +337,6 @@ static int mlx5_enable_msix(struct mlx5_core_dev *dev)
 
 err_free_msix:
 	kfree(priv->irq_info);
-	kfree(priv->msix_arr);
 	return -ENOMEM;
 }
 
@@ -350,9 +344,8 @@ static void mlx5_disable_msix(struct mlx5_core_dev *dev)
 {
 	struct mlx5_priv *priv = &dev->priv;
 
-	pci_disable_msix(dev->pdev);
+	pci_free_irq_vectors(dev->pdev);
 	kfree(priv->irq_info);
-	kfree(priv->msix_arr);
 }
 
 struct mlx5_reg_host_endianess {
@@ -610,8 +603,7 @@ u64 mlx5_read_internal_timer(struct mlx5_core_dev *dev)
 static int mlx5_irq_set_affinity_hint(struct mlx5_core_dev *mdev, int i)
 {
 	struct mlx5_priv *priv  = &mdev->priv;
-	struct msix_entry *msix = priv->msix_arr;
-	int irq                 = msix[i + MLX5_EQ_VEC_COMP_BASE].vector;
+	int irq = pci_irq_vector(mdev->pdev, MLX5_EQ_VEC_COMP_BASE + i);
 	int err;
 
 	if (!zalloc_cpumask_var(&priv->irq_info[i].mask, GFP_KERNEL)) {
@@ -639,8 +631,7 @@ static int mlx5_irq_set_affinity_hint(struct mlx5_core_dev *mdev, int i)
 static void mlx5_irq_clear_affinity_hint(struct mlx5_core_dev *mdev, int i)
 {
 	struct mlx5_priv *priv  = &mdev->priv;
-	struct msix_entry *msix = priv->msix_arr;
-	int irq                 = msix[i + MLX5_EQ_VEC_COMP_BASE].vector;
+	int irq = pci_irq_vector(mdev->pdev, MLX5_EQ_VEC_COMP_BASE + i);
 
 	irq_set_affinity_hint(irq, NULL);
 	free_cpumask_var(priv->irq_info[i].mask);
@@ -763,8 +754,8 @@ static int alloc_comp_eqs(struct mlx5_core_dev *dev)
 		}
 
 #ifdef CONFIG_RFS_ACCEL
-		irq_cpu_rmap_add(dev->rmap,
-				 dev->priv.msix_arr[i + MLX5_EQ_VEC_COMP_BASE].vector);
+		irq_cpu_rmap_add(dev->rmap, pci_irq_vector(dev->pdev,
+				 MLX5_EQ_VEC_COMP_BASE + i));
 #endif
 		snprintf(name, MLX5_MAX_IRQ_NAME, "mlx5_comp%d", i);
 		err = mlx5_create_map_eq(dev, eq,
@@ -1101,9 +1092,9 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 		goto err_stop_poll;
 	}
 
-	err = mlx5_enable_msix(dev);
+	err = mlx5_alloc_irq_vectors(dev);
 	if (err) {
-		dev_err(&pdev->dev, "enable msix failed\n");
+		dev_err(&pdev->dev, "alloc irq vectors failed\n");
 		goto err_cleanup_once;
 	}
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index fbc6e9e9e305..521768c56073 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -109,7 +109,6 @@ int mlx5_destroy_scheduling_element_cmd(struct mlx5_core_dev *dev, u8 hierarchy,
 					u32 element_id);
 int mlx5_wait_for_vf_pages(struct mlx5_core_dev *dev);
 u64 mlx5_read_internal_timer(struct mlx5_core_dev *dev);
-u32 mlx5_get_msix_vec(struct mlx5_core_dev *dev, int vecidx);
 struct mlx5_eq *mlx5_eqn2eq(struct mlx5_core_dev *dev, int eqn);
 void mlx5_cq_tasklet_cb(unsigned long data);
 
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index bcdf739ee41a..4843fab18b83 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -590,7 +590,6 @@ struct mlx5_port_module_event_stats {
 struct mlx5_priv {
 	char			name[MLX5_MAX_NAME_LEN];
 	struct mlx5_eq_table	eq_table;
-	struct msix_entry	*msix_arr;
 	struct mlx5_irq_info	*irq_info;
 
 	/* pages stuff */
-- 
2.7.4


* [PATCH v3 for-4.13 2/6] mlx5: move affinity hints assignments to generic code
From: Sagi Grimberg @ 2017-06-05  6:35 UTC
  To: Doug Ledford
  Cc: Christoph Hellwig, Leon Romanovsky, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org

The generic API takes care of spreading affinity similarly to
what mlx5 open-coded (and it even handles asymmetric
configurations better). Ask the generic API to spread affinity
for us, and feed it the pre_vectors that do not participate
in affinity settings (which is an improvement over what we
had before).

The affinity assignments should match what mlx5 tried to
do earlier, but now we do not set affinity on the dedicated
async, cmd and pages vectors.

Also, remove the mlx5e_get_cpu routine, as we have generic helpers
to get the cpumask and node given an irq vector, so use them
directly.
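
The pre_vectors mechanism mentioned above works roughly as in this
sketch of the generic API (not mlx5-specific code):

	struct irq_affinity irqdesc = {
		/* vectors below MLX5_EQ_VEC_COMP_BASE (async, cmd, pages)
		 * are excluded from the affinity spreading
		 */
		.pre_vectors = MLX5_EQ_VEC_COMP_BASE,
	};

	nvec = pci_alloc_irq_vectors_affinity(pdev, min_vecs, max_vecs,
			PCI_IRQ_MSIX | PCI_IRQ_AFFINITY, &irqdesc);

	/* per-vector spread results are then queryable generically */
	mask = pci_irq_get_affinity(pdev, MLX5_EQ_VEC_COMP_BASE + i);
	node = pci_irq_get_node(pdev, MLX5_EQ_VEC_COMP_BASE + i);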

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 12 ++--
 drivers/net/ethernet/mellanox/mlx5/core/main.c    | 83 ++---------------------
 include/linux/mlx5/driver.h                       |  1 -
 3 files changed, 10 insertions(+), 86 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 2a3c59e55dcf..ebfda1eae6b4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1565,11 +1565,6 @@ static void mlx5e_close_cq(struct mlx5e_cq *cq)
 	mlx5e_free_cq(cq);
 }
 
-static int mlx5e_get_cpu(struct mlx5e_priv *priv, int ix)
-{
-	return cpumask_first(priv->mdev->priv.irq_info[ix].mask);
-}
-
 static int mlx5e_open_tx_cqs(struct mlx5e_channel *c,
 			     struct mlx5e_params *params,
 			     struct mlx5e_channel_param *cparam)
@@ -1718,11 +1713,11 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
 {
 	struct mlx5e_cq_moder icocq_moder = {0, 0};
 	struct net_device *netdev = priv->netdev;
-	int cpu = mlx5e_get_cpu(priv, ix);
 	struct mlx5e_channel *c;
 	int err;
 
-	c = kzalloc_node(sizeof(*c), GFP_KERNEL, cpu_to_node(cpu));
+	c = kzalloc_node(sizeof(*c), GFP_KERNEL,
+		pci_irq_get_node(priv->mdev->pdev, MLX5_EQ_VEC_COMP_BASE + ix));
 	if (!c)
 		return -ENOMEM;
 
@@ -1730,7 +1725,8 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
 	c->mdev     = priv->mdev;
 	c->tstamp   = &priv->tstamp;
 	c->ix       = ix;
-	c->cpu      = cpu;
+	c->cpu      = cpumask_first(pci_irq_get_affinity(priv->mdev->pdev,
+			MLX5_EQ_VEC_COMP_BASE + ix));
 	c->pdev     = &priv->mdev->pdev->dev;
 	c->netdev   = priv->netdev;
 	c->mkey_be  = cpu_to_be32(priv->mdev->mlx5e_res.mkey.key);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index e4431aacce9d..7b9e7301929b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -312,6 +312,9 @@ static int mlx5_alloc_irq_vectors(struct mlx5_core_dev *dev)
 {
 	struct mlx5_priv *priv = &dev->priv;
 	struct mlx5_eq_table *table = &priv->eq_table;
+	struct irq_affinity irqdesc = {
+		.pre_vectors = MLX5_EQ_VEC_COMP_BASE,
+	};
 	int num_eqs = 1 << MLX5_CAP_GEN(dev, log_max_eq);
 	int nvec;
 
@@ -325,9 +328,10 @@ static int mlx5_alloc_irq_vectors(struct mlx5_core_dev *dev)
 	if (!priv->irq_info)
 		goto err_free_msix;
 
-	nvec = pci_alloc_irq_vectors(dev->pdev,
+	nvec = pci_alloc_irq_vectors_affinity(dev->pdev,
 			MLX5_EQ_VEC_COMP_BASE + 1, nvec,
-			PCI_IRQ_MSIX);
+			PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
+			&irqdesc);
 	if (nvec < 0)
 		return nvec;
 
@@ -600,71 +604,6 @@ u64 mlx5_read_internal_timer(struct mlx5_core_dev *dev)
 	return (u64)timer_l | (u64)timer_h1 << 32;
 }
 
-static int mlx5_irq_set_affinity_hint(struct mlx5_core_dev *mdev, int i)
-{
-	struct mlx5_priv *priv  = &mdev->priv;
-	int irq = pci_irq_vector(mdev->pdev, MLX5_EQ_VEC_COMP_BASE + i);
-	int err;
-
-	if (!zalloc_cpumask_var(&priv->irq_info[i].mask, GFP_KERNEL)) {
-		mlx5_core_warn(mdev, "zalloc_cpumask_var failed");
-		return -ENOMEM;
-	}
-
-	cpumask_set_cpu(cpumask_local_spread(i, priv->numa_node),
-			priv->irq_info[i].mask);
-
-	err = irq_set_affinity_hint(irq, priv->irq_info[i].mask);
-	if (err) {
-		mlx5_core_warn(mdev, "irq_set_affinity_hint failed,irq 0x%.4x",
-			       irq);
-		goto err_clear_mask;
-	}
-
-	return 0;
-
-err_clear_mask:
-	free_cpumask_var(priv->irq_info[i].mask);
-	return err;
-}
-
-static void mlx5_irq_clear_affinity_hint(struct mlx5_core_dev *mdev, int i)
-{
-	struct mlx5_priv *priv  = &mdev->priv;
-	int irq = pci_irq_vector(mdev->pdev, MLX5_EQ_VEC_COMP_BASE + i);
-
-	irq_set_affinity_hint(irq, NULL);
-	free_cpumask_var(priv->irq_info[i].mask);
-}
-
-static int mlx5_irq_set_affinity_hints(struct mlx5_core_dev *mdev)
-{
-	int err;
-	int i;
-
-	for (i = 0; i < mdev->priv.eq_table.num_comp_vectors; i++) {
-		err = mlx5_irq_set_affinity_hint(mdev, i);
-		if (err)
-			goto err_out;
-	}
-
-	return 0;
-
-err_out:
-	for (i--; i >= 0; i--)
-		mlx5_irq_clear_affinity_hint(mdev, i);
-
-	return err;
-}
-
-static void mlx5_irq_clear_affinity_hints(struct mlx5_core_dev *mdev)
-{
-	int i;
-
-	for (i = 0; i < mdev->priv.eq_table.num_comp_vectors; i++)
-		mlx5_irq_clear_affinity_hint(mdev, i);
-}
-
 int mlx5_vector2eqn(struct mlx5_core_dev *dev, int vector, int *eqn,
 		    unsigned int *irqn)
 {
@@ -1116,12 +1055,6 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 		goto err_stop_eqs;
 	}
 
-	err = mlx5_irq_set_affinity_hints(dev);
-	if (err) {
-		dev_err(&pdev->dev, "Failed to alloc affinity hint cpumask\n");
-		goto err_affinity_hints;
-	}
-
 	err = mlx5_init_fs(dev);
 	if (err) {
 		dev_err(&pdev->dev, "Failed to init flow steering\n");
@@ -1165,9 +1098,6 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 	mlx5_cleanup_fs(dev);
 
 err_fs:
-	mlx5_irq_clear_affinity_hints(dev);
-
-err_affinity_hints:
 	free_comp_eqs(dev);
 
 err_stop_eqs:
@@ -1234,7 +1164,6 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 	mlx5_eswitch_detach(dev->priv.eswitch);
 #endif
 	mlx5_cleanup_fs(dev);
-	mlx5_irq_clear_affinity_hints(dev);
 	free_comp_eqs(dev);
 	mlx5_stop_eqs(dev);
 	mlx5_put_uars_page(dev, priv->uar);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 4843fab18b83..963e3d59d740 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -527,7 +527,6 @@ struct mlx5_core_sriov {
 };
 
 struct mlx5_irq_info {
-	cpumask_var_t mask;
 	char name[MLX5_MAX_IRQ_NAME];
 };
 
-- 
2.7.4


* [PATCH v3 for-4.13 3/6] RDMA/core: expose affinity mappings per completion vector
From: Sagi Grimberg @ 2017-06-05  6:35 UTC
  To: Doug Ledford
  Cc: Christoph Hellwig, Leon Romanovsky, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org

This will allow ULPs to intelligently locate threads based
on completion vector cpu affinity mappings. In case the
driver does not expose a get_vector_affinity callout, return
NULL so the caller can fall back to its own logic.
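
A sketch of how a ULP caller is expected to consume this (the helper
names in the fallback and assignment are hypothetical; the blk-mq
helper later in this series is the real equivalent):

	const struct cpumask *mask;

	mask = ib_get_vector_affinity(ibdev, comp_vector);
	if (!mask) {
		/* no ->get_vector_affinity: keep the ULP's own policy */
		return ulp_default_mapping(ibdev);	/* hypothetical */
	}

	for_each_cpu(cpu, mask)
		/* cpu has irq affinity for comp_vector */
		ulp_assign_cpu(ulp, cpu, comp_vector);	/* hypothetical */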

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 include/rdma/ib_verbs.h | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index ba8314ec5768..2349143297c9 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2245,6 +2245,8 @@ struct ib_device {
 	 */
 	int (*get_port_immutable)(struct ib_device *, u8, struct ib_port_immutable *);
 	void (*get_dev_fw_str)(struct ib_device *, char *str, size_t str_len);
+	const struct cpumask *(*get_vector_affinity)(struct ib_device *ibdev,
+						     int comp_vector);
 };
 
 struct ib_client {
@@ -3609,7 +3611,6 @@ static inline void rdma_ah_set_interface_id(struct rdma_ah_attr *attr,
 
 	grh->dgid.global.interface_id = if_id;
 }
-
 static inline void rdma_ah_set_grh(struct rdma_ah_attr *attr,
 				   union ib_gid *dgid, u32 flow_label,
 				   u8 sgid_index, u8 hop_limit,
@@ -3639,4 +3640,26 @@ static inline enum rdma_ah_attr_type rdma_ah_find_type(struct ib_device *dev,
 	else
 		return RDMA_AH_ATTR_TYPE_IB;
 }
+
+/**
+ * ib_get_vector_affinity - Get the affinity mappings of a given completion
+ *   vector
+ * @device:         the rdma device
+ * @comp_vector:    index of completion vector
+ *
+ * Returns NULL if the device driver doesn't implement get_vector_affinity
+ * or if @comp_vector is out of range, otherwise the cpu map of the
+ * requested completion vector.
+ */
+static inline const struct cpumask *
+ib_get_vector_affinity(struct ib_device *device, int comp_vector)
+{
+	if (comp_vector < 0 || comp_vector >= device->num_comp_vectors ||
+	    !device->get_vector_affinity)
+		return NULL;
+
+	return device->get_vector_affinity(device, comp_vector);
+
+}
+
 #endif /* IB_VERBS_H */
-- 
2.7.4


* [PATCH v3 for-4.13 4/6] mlx5: support ->get_vector_affinity
From: Sagi Grimberg @ 2017-06-05  6:35 UTC
  To: Doug Ledford
  Cc: Christoph Hellwig, Leon Romanovsky, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org

Simply refer to the generic affinity mask helper.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/infiniband/hw/mlx5/main.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index d45772da0963..dc272ece106d 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3550,6 +3550,15 @@ static void mlx5_ib_free_rdma_netdev(struct net_device *netdev)
 	return mlx5_rdma_netdev_free(netdev);
 }
 
+const struct cpumask *mlx5_ib_get_vector_affinity(struct ib_device *ibdev,
+		int comp_vector)
+{
+	struct mlx5_ib_dev *dev = to_mdev(ibdev);
+
+	return pci_irq_get_affinity(dev->mdev->pdev,
+			MLX5_EQ_VEC_COMP_BASE + comp_vector);
+}
+
 static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 {
 	struct mlx5_ib_dev *dev;
@@ -3682,6 +3691,7 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 	dev->ib_dev.get_dev_fw_str      = get_dev_fw_str;
 	dev->ib_dev.alloc_rdma_netdev	= mlx5_ib_alloc_rdma_netdev;
 	dev->ib_dev.free_rdma_netdev	= mlx5_ib_free_rdma_netdev;
+	dev->ib_dev.get_vector_affinity	= mlx5_ib_get_vector_affinity;
 	if (mlx5_core_is_pf(mdev)) {
 		dev->ib_dev.get_vf_config	= mlx5_ib_get_vf_config;
 		dev->ib_dev.set_vf_link_state	= mlx5_ib_set_vf_link_state;
-- 
2.7.4


* [PATCH v3 for-4.13 5/6] block: Add rdma affinity based queue mapping helper
From: Sagi Grimberg @ 2017-06-05  6:35 UTC
  To: Doug Ledford
  Cc: Christoph Hellwig, Leon Romanovsky, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org

Like pci and virtio, we add an rdma helper for affinity
spreading. This achieves optimal mq affinity assignments
according to the underlying rdma device affinity maps.
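
A minimal usage sketch, matching what patch 6 wires up for nvme-rdma:

	static int nvme_rdma_map_queues(struct blk_mq_tag_set *set)
	{
		struct nvme_rdma_ctrl *ctrl = set->driver_data;

		/* all completion vectors are usable for I/O queues
		 * here, so first_vec is 0
		 */
		return blk_mq_rdma_map_queues(set, ctrl->device->dev, 0);
	}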

Reviewed-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 block/Kconfig               |  5 +++++
 block/Makefile              |  1 +
 block/blk-mq-rdma.c         | 54 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/blk-mq-rdma.h | 10 +++++++++
 4 files changed, 70 insertions(+)
 create mode 100644 block/blk-mq-rdma.c
 create mode 100644 include/linux/blk-mq-rdma.h

diff --git a/block/Kconfig b/block/Kconfig
index 89cd28f8d051..3ab42bbb06d5 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -206,4 +206,9 @@ config BLK_MQ_VIRTIO
 	depends on BLOCK && VIRTIO
 	default y
 
+config BLK_MQ_RDMA
+	bool
+	depends on BLOCK && INFINIBAND
+	default y
+
 source block/Kconfig.iosched
diff --git a/block/Makefile b/block/Makefile
index 2b281cf258a0..9396ebc85d24 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -29,6 +29,7 @@ obj-$(CONFIG_BLK_CMDLINE_PARSER)	+= cmdline-parser.o
 obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o
 obj-$(CONFIG_BLK_MQ_PCI)	+= blk-mq-pci.o
 obj-$(CONFIG_BLK_MQ_VIRTIO)	+= blk-mq-virtio.o
+obj-$(CONFIG_BLK_MQ_RDMA)	+= blk-mq-rdma.o
 obj-$(CONFIG_BLK_DEV_ZONED)	+= blk-zoned.o
 obj-$(CONFIG_BLK_WBT)		+= blk-wbt.o
 obj-$(CONFIG_BLK_DEBUG_FS)	+= blk-mq-debugfs.o
diff --git a/block/blk-mq-rdma.c b/block/blk-mq-rdma.c
new file mode 100644
index 000000000000..7dc07b43858b
--- /dev/null
+++ b/block/blk-mq-rdma.c
@@ -0,0 +1,54 @@
+/*
+ * Copyright (c) 2017 Sagi Grimberg.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#include <linux/blk-mq.h>
+#include <linux/blk-mq-rdma.h>
+#include <rdma/ib_verbs.h>
+
+/**
+ * blk_mq_rdma_map_queues - provide a default queue mapping for rdma device
+ * @set:	tagset to provide the mapping for
+ * @dev:	rdma device associated with @set.
+ * @first_vec:	first interrupt vector to use for queues (usually 0)
+ *
+ * This function assumes the rdma device @dev has at least as many available
+ * interrupt vectors as @set has queues.  It will then query each vector's
+ * affinity mask and build a queue mapping that maps a queue to the CPUs that
+ * have irq affinity for the corresponding vector.
+ *
+ * In case either the driver passed a @dev with fewer vectors than
+ * @set->nr_hw_queues, or @dev does not provide an affinity mask for a
+ * vector, we fall back to the naive mapping.
+ */
+int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
+		struct ib_device *dev, int first_vec)
+{
+	const struct cpumask *mask;
+	unsigned int queue, cpu;
+
+	if (set->nr_hw_queues > dev->num_comp_vectors)
+		goto fallback;
+
+	for (queue = 0; queue < set->nr_hw_queues; queue++) {
+		mask = ib_get_vector_affinity(dev, first_vec + queue);
+		if (!mask)
+			goto fallback;
+
+		for_each_cpu(cpu, mask)
+			set->mq_map[cpu] = queue;
+	}
+
+	return 0;
+fallback:
+	return blk_mq_map_queues(set);
+}
+EXPORT_SYMBOL_GPL(blk_mq_rdma_map_queues);
diff --git a/include/linux/blk-mq-rdma.h b/include/linux/blk-mq-rdma.h
new file mode 100644
index 000000000000..b4ade198007d
--- /dev/null
+++ b/include/linux/blk-mq-rdma.h
@@ -0,0 +1,10 @@
+#ifndef _LINUX_BLK_MQ_RDMA_H
+#define _LINUX_BLK_MQ_RDMA_H
+
+struct blk_mq_tag_set;
+struct ib_device;
+
+int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
+		struct ib_device *dev, int first_vec);
+
+#endif /* _LINUX_BLK_MQ_RDMA_H */
-- 
2.7.4


* [PATCH v3 for-4.13 6/6] nvme-rdma: use intelligent affinity based queue mappings
From: Sagi Grimberg @ 2017-06-05  6:36 UTC
  To: Doug Ledford
  Cc: Christoph Hellwig, Leon Romanovsky, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org

Use the generic block layer affinity mapping helper. Also,
limit nr_hw_queues to the rdma device's number of irq vectors,
as we don't really need more. Note that I/O queue idx (1-based)
now uses completion vector idx - 1, which lines up with passing
first_vec = 0 to blk_mq_rdma_map_queues().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/rdma.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index d4a940ae9a56..f44811fbceee 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -19,6 +19,7 @@
 #include <linux/string.h>
 #include <linux/atomic.h>
 #include <linux/blk-mq.h>
+#include <linux/blk-mq-rdma.h>
 #include <linux/types.h>
 #include <linux/list.h>
 #include <linux/mutex.h>
@@ -469,14 +470,10 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
 	ibdev = queue->device->dev;
 
 	/*
-	 * The admin queue is barely used once the controller is live, so don't
-	 * bother to spread it out.
+	 * Spread I/O queue completion vectors according to their queue index.
+	 * Admin queues can always go on completion vector 0.
 	 */
-	if (idx == 0)
-		comp_vector = 0;
-	else
-		comp_vector = idx % ibdev->num_comp_vectors;
-
+	comp_vector = idx == 0 ? idx : idx - 1;
 
 	/* +1 for ib_stop_cq */
 	queue->ib_cq = ib_alloc_cq(ibdev, queue,
@@ -616,10 +613,20 @@ static int nvme_rdma_connect_io_queues(struct nvme_rdma_ctrl *ctrl)
 static int nvme_rdma_init_io_queues(struct nvme_rdma_ctrl *ctrl)
 {
 	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
+	struct ib_device *ibdev = ctrl->device->dev;
 	unsigned int nr_io_queues;
 	int i, ret;
 
 	nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
+
+	/*
+	 * We map queues according to the device irq vectors for
+	 * optimal locality, so we don't need more queues than
+	 * completion vectors.
+	 */
+	nr_io_queues = min_t(unsigned int, nr_io_queues,
+				ibdev->num_comp_vectors);
+
 	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
 	if (ret)
 		return ret;
@@ -1494,6 +1501,13 @@ static void nvme_rdma_complete_rq(struct request *rq)
 	nvme_complete_rq(rq);
 }
 
+static int nvme_rdma_map_queues(struct blk_mq_tag_set *set)
+{
+	struct nvme_rdma_ctrl *ctrl = set->driver_data;
+
+	return blk_mq_rdma_map_queues(set, ctrl->device->dev, 0);
+}
+
 static const struct blk_mq_ops nvme_rdma_mq_ops = {
 	.queue_rq	= nvme_rdma_queue_rq,
 	.complete	= nvme_rdma_complete_rq,
@@ -1503,6 +1517,7 @@ static const struct blk_mq_ops nvme_rdma_mq_ops = {
 	.init_hctx	= nvme_rdma_init_hctx,
 	.poll		= nvme_rdma_poll,
 	.timeout	= nvme_rdma_timeout,
+	.map_queues	= nvme_rdma_map_queues,
 };
 
 static const struct blk_mq_ops nvme_rdma_admin_mq_ops = {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 36+ messages in thread
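
To see why the "idx == 0 ? idx : idx - 1" selection above is safe, note
that nr_io_queues is clamped to ibdev->num_comp_vectors, so an I/O queue
with index idx always finds vector idx - 1. A tiny standalone sketch
(illustration only, not kernel code) of the resulting assignment:

#include <stdio.h>

/*
 * Illustration of the comp_vector selection in
 * nvme_rdma_create_queue_ib(): the admin queue (idx 0) shares vector 0
 * with I/O queue 1, and I/O queue idx uses vector idx - 1, matching the
 * blk-mq queue <-> vector mapping one to one.
 */
int main(void)
{
	int idx;

	for (idx = 0; idx < 5; idx++) {
		int comp_vector = idx == 0 ? idx : idx - 1;

		printf("%s queue idx %d -> comp_vector %d\n",
		       idx == 0 ? "admin" : "I/O", idx, comp_vector);
	}
	return 0;
}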

* Re: [PATCH v3 for-4.13 0/6] Automatic affinity settings for nvme over rdma
  2017-06-05  6:35 ` Sagi Grimberg
@ 2017-06-06  9:31     ` Christoph Hellwig
  -1 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2017-06-06  9:31 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Doug Ledford, Christoph Hellwig, Leon Romanovsky,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Mon, Jun 05, 2017 at 09:35:54AM +0300, Sagi Grimberg wrote:
> Doug, please consider this patch set for 4.13.

Yes, please.  It would also be great if the maintainers of non-mlx5
HCA drivers could look into supporting this scheme.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 for-4.13 1/6] mlx5: convert to generic pci_alloc_irq_vectors
  2017-06-05  6:35     ` Sagi Grimberg
@ 2017-06-07  5:26         ` Saeed Mahameed
  -1 siblings, 0 replies; 36+ messages in thread
From: Saeed Mahameed @ 2017-06-07  5:26 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Doug Ledford, Christoph Hellwig, Leon Romanovsky,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Mon, Jun 5, 2017 at 9:35 AM, Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org> wrote:
> Now that we have generic code to allocate an array
> of irq vectors and even correctly spread their affinity,
> correctly handle cpu hotplug events and more, we're much
> better off using it.
>
> Reviewed-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> Acked-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>

Hey Sagi, I am sorry for the late review; an old mail rule
made me lose some rdma mails.
Anyway, I have one small comment here, but if the rest of the series
is OK I don't want to bother you with another version,
so you can just ignore it if you wish.

> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/eq.c       |  9 ++----
>  drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/health.c   |  2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/main.c     | 33 ++++++++--------------
>  .../net/ethernet/mellanox/mlx5/core/mlx5_core.h    |  1 -
>  include/linux/mlx5/driver.h                        |  1 -
>  7 files changed, 17 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 41cd22a223dc..2a3c59e55dcf 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -385,7 +385,7 @@ static void mlx5e_enable_async_events(struct mlx5e_priv *priv)
>  static void mlx5e_disable_async_events(struct mlx5e_priv *priv)
>  {
>         clear_bit(MLX5E_STATE_ASYNC_EVENTS_ENABLED, &priv->state);
> -       synchronize_irq(mlx5_get_msix_vec(priv->mdev, MLX5_EQ_VEC_ASYNC));
> +       synchronize_irq(pci_irq_vector(priv->mdev->pdev, MLX5_EQ_VEC_ASYNC));
>  }
>
>  static inline int mlx5e_get_wqe_mtt_sz(void)
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> index ea5d8d37a75c..e2c33c493b89 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> @@ -575,7 +575,7 @@ int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 vecidx,
>                  name, pci_name(dev->pdev));
>
>         eq->eqn = MLX5_GET(create_eq_out, out, eq_number);
> -       eq->irqn = priv->msix_arr[vecidx].vector;
> +       eq->irqn = pci_irq_vector(dev->pdev, vecidx);
>         eq->dev = dev;
>         eq->doorbell = priv->uar->map + MLX5_EQ_DOORBEL_OFFSET;
>         err = request_irq(eq->irqn, handler, 0,
> @@ -610,7 +610,7 @@ int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 vecidx,
>         return 0;
>
>  err_irq:
> -       free_irq(priv->msix_arr[vecidx].vector, eq);
> +       free_irq(eq->irqn, eq);
>
>  err_eq:
>         mlx5_cmd_destroy_eq(dev, eq->eqn);
> @@ -651,11 +651,6 @@ int mlx5_destroy_unmap_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
>  }
>  EXPORT_SYMBOL_GPL(mlx5_destroy_unmap_eq);
>
> -u32 mlx5_get_msix_vec(struct mlx5_core_dev *dev, int vecidx)
> -{
> -       return dev->priv.msix_arr[MLX5_EQ_VEC_ASYNC].vector;
> -}
> -
>  int mlx5_eq_init(struct mlx5_core_dev *dev)
>  {
>         int err;
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> index 2e34d95ea776..e9256b7017b6 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> @@ -1592,7 +1592,7 @@ static void esw_disable_vport(struct mlx5_eswitch *esw, int vport_num)
>         /* Mark this vport as disabled to discard new events */
>         vport->enabled = false;
>
> -       synchronize_irq(mlx5_get_msix_vec(esw->dev, MLX5_EQ_VEC_ASYNC));
> +       synchronize_irq(pci_irq_vector(esw->dev->pdev, MLX5_EQ_VEC_ASYNC));
>         /* Wait for current already scheduled events to complete */
>         flush_workqueue(esw->work_queue);
>         /* Disable events from this vport */
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
> index d0515391d33b..8b38d5cfd4c5 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
> @@ -80,7 +80,7 @@ static void trigger_cmd_completions(struct mlx5_core_dev *dev)
>         u64 vector;
>
>         /* wait for pending handlers to complete */
> -       synchronize_irq(dev->priv.msix_arr[MLX5_EQ_VEC_CMD].vector);
> +       synchronize_irq(pci_irq_vector(dev->pdev, MLX5_EQ_VEC_CMD));
>         spin_lock_irqsave(&dev->cmd.alloc_lock, flags);
>         vector = ~dev->cmd.bitmask & ((1ul << (1 << dev->cmd.log_sz)) - 1);
>         if (!vector)
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index 0c123d571b4c..e4431aacce9d 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -308,13 +308,12 @@ static void release_bar(struct pci_dev *pdev)
>         pci_release_regions(pdev);
>  }
>
> -static int mlx5_enable_msix(struct mlx5_core_dev *dev)
> +static int mlx5_alloc_irq_vectors(struct mlx5_core_dev *dev)
>  {
>         struct mlx5_priv *priv = &dev->priv;
>         struct mlx5_eq_table *table = &priv->eq_table;
>         int num_eqs = 1 << MLX5_CAP_GEN(dev, log_max_eq);
>         int nvec;
> -       int i;
>
>         nvec = MLX5_CAP_GEN(dev, num_ports) * num_online_cpus() +
>                MLX5_EQ_VEC_COMP_BASE;
> @@ -322,17 +321,13 @@ static int mlx5_enable_msix(struct mlx5_core_dev *dev)
>         if (nvec <= MLX5_EQ_VEC_COMP_BASE)
>                 return -ENOMEM;
>
> -       priv->msix_arr = kcalloc(nvec, sizeof(*priv->msix_arr), GFP_KERNEL);
> -
>         priv->irq_info = kcalloc(nvec, sizeof(*priv->irq_info), GFP_KERNEL);
> -       if (!priv->msix_arr || !priv->irq_info)
> +       if (!priv->irq_info)
>                 goto err_free_msix;
>
> -       for (i = 0; i < nvec; i++)
> -               priv->msix_arr[i].entry = i;
> -
> -       nvec = pci_enable_msix_range(dev->pdev, priv->msix_arr,
> -                                    MLX5_EQ_VEC_COMP_BASE + 1, nvec);
> +       nvec = pci_alloc_irq_vectors(dev->pdev,
> +                       MLX5_EQ_VEC_COMP_BASE + 1, nvec,
> +                       PCI_IRQ_MSIX);
>         if (nvec < 0)
>                 return nvec;
>
> @@ -342,7 +337,6 @@ static int mlx5_enable_msix(struct mlx5_core_dev *dev)
>
>  err_free_msix:
>         kfree(priv->irq_info);
> -       kfree(priv->msix_arr);
>         return -ENOMEM;
>  }
>
> @@ -350,9 +344,8 @@ static void mlx5_disable_msix(struct mlx5_core_dev *dev)
>  {

Rename to mlx5_free_irq_vectors to keep the function naming symmetric.

>         struct mlx5_priv *priv = &dev->priv;
>
> -       pci_disable_msix(dev->pdev);
> +       pci_free_irq_vectors(dev->pdev);
>         kfree(priv->irq_info);
> -       kfree(priv->msix_arr);
>  }
>
>  struct mlx5_reg_host_endianess {
> @@ -610,8 +603,7 @@ u64 mlx5_read_internal_timer(struct mlx5_core_dev *dev)
>  static int mlx5_irq_set_affinity_hint(struct mlx5_core_dev *mdev, int i)
>  {
>         struct mlx5_priv *priv  = &mdev->priv;
> -       struct msix_entry *msix = priv->msix_arr;
> -       int irq                 = msix[i + MLX5_EQ_VEC_COMP_BASE].vector;
> +       int irq = pci_irq_vector(mdev->pdev, MLX5_EQ_VEC_COMP_BASE + i);
>         int err;
>
>         if (!zalloc_cpumask_var(&priv->irq_info[i].mask, GFP_KERNEL)) {
> @@ -639,8 +631,7 @@ static int mlx5_irq_set_affinity_hint(struct mlx5_core_dev *mdev, int i)
>  static void mlx5_irq_clear_affinity_hint(struct mlx5_core_dev *mdev, int i)
>  {
>         struct mlx5_priv *priv  = &mdev->priv;
> -       struct msix_entry *msix = priv->msix_arr;
> -       int irq                 = msix[i + MLX5_EQ_VEC_COMP_BASE].vector;
> +       int irq = pci_irq_vector(mdev->pdev, MLX5_EQ_VEC_COMP_BASE + i);
>
>         irq_set_affinity_hint(irq, NULL);
>         free_cpumask_var(priv->irq_info[i].mask);
> @@ -763,8 +754,8 @@ static int alloc_comp_eqs(struct mlx5_core_dev *dev)
>                 }
>
>  #ifdef CONFIG_RFS_ACCEL
> -               irq_cpu_rmap_add(dev->rmap,
> -                                dev->priv.msix_arr[i + MLX5_EQ_VEC_COMP_BASE].vector);
> +               irq_cpu_rmap_add(dev->rmap, pci_irq_vector(dev->pdev,
> +                                MLX5_EQ_VEC_COMP_BASE + i));
>  #endif
>                 snprintf(name, MLX5_MAX_IRQ_NAME, "mlx5_comp%d", i);
>                 err = mlx5_create_map_eq(dev, eq,
> @@ -1101,9 +1092,9 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
>                 goto err_stop_poll;
>         }
>
> -       err = mlx5_enable_msix(dev);
> +       err = mlx5_alloc_irq_vectors(dev);
>         if (err) {
> -               dev_err(&pdev->dev, "enable msix failed\n");
> +               dev_err(&pdev->dev, "alloc irq vectors failed\n");
>                 goto err_cleanup_once;
>         }
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
> index fbc6e9e9e305..521768c56073 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
> @@ -109,7 +109,6 @@ int mlx5_destroy_scheduling_element_cmd(struct mlx5_core_dev *dev, u8 hierarchy,
>                                         u32 element_id);
>  int mlx5_wait_for_vf_pages(struct mlx5_core_dev *dev);
>  u64 mlx5_read_internal_timer(struct mlx5_core_dev *dev);
> -u32 mlx5_get_msix_vec(struct mlx5_core_dev *dev, int vecidx);
>  struct mlx5_eq *mlx5_eqn2eq(struct mlx5_core_dev *dev, int eqn);
>  void mlx5_cq_tasklet_cb(unsigned long data);
>
> diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
> index bcdf739ee41a..4843fab18b83 100644
> --- a/include/linux/mlx5/driver.h
> +++ b/include/linux/mlx5/driver.h
> @@ -590,7 +590,6 @@ struct mlx5_port_module_event_stats {
>  struct mlx5_priv {
>         char                    name[MLX5_MAX_NAME_LEN];
>         struct mlx5_eq_table    eq_table;
> -       struct msix_entry       *msix_arr;
>         struct mlx5_irq_info    *irq_info;
>
>         /* pages stuff */
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 36+ messages in thread
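
For readers unfamiliar with the API being converted to, the generic
pattern looks roughly like the sketch below (a hypothetical driver
"foo", not code from the patch): the PCI core owns the vector table,
and pci_irq_vector() replaces the private msix_arr lookups.

#include <linux/pci.h>
#include <linux/interrupt.h>
#include <linux/cpumask.h>

/*
 * Minimal sketch of the generic MSI-X pattern for a hypothetical
 * driver "foo": no private msix_entry array; the PCI core tracks the
 * vectors and pci_irq_vector() maps an index to a Linux irq number.
 */
static int foo_setup_irqs(struct pci_dev *pdev, irq_handler_t handler,
			  void *data)
{
	int nvec, i, err;

	/* at least 1 vector, at most one per online CPU, MSI-X only */
	nvec = pci_alloc_irq_vectors(pdev, 1, num_online_cpus(), PCI_IRQ_MSIX);
	if (nvec < 0)
		return nvec;

	for (i = 0; i < nvec; i++) {
		err = request_irq(pci_irq_vector(pdev, i), handler, 0,
				  "foo", data);
		if (err)
			goto err_free;
	}
	return nvec;

err_free:
	while (--i >= 0)
		free_irq(pci_irq_vector(pdev, i), data);
	pci_free_irq_vectors(pdev);	/* replaces pci_disable_msix() */
	return err;
}

A matching teardown would free_irq() each vector and then call
pci_free_irq_vectors(); naming that path mlx5_free_irq_vectors is
exactly the symmetry Saeed asks for above.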

* Re: [PATCH v3 for-4.13 1/6] mlx5: convert to generic pci_alloc_irq_vectors
  2017-06-07  5:26         ` Saeed Mahameed
@ 2017-06-07  5:46             ` Sagi Grimberg
  -1 siblings, 0 replies; 36+ messages in thread
From: Sagi Grimberg @ 2017-06-07  5:46 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Doug Ledford, Christoph Hellwig, Leon Romanovsky,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Saeed,

> Hey Sagi, I am sorry for the late review; an old mail rule
> made me lose some rdma mails.
> Anyway, I have one small comment here, but if the rest of the series
> is OK I don't want to bother you with another version,
> so you can just ignore it if you wish.
> 

...

>> @@ -350,9 +344,8 @@ static void mlx5_disable_msix(struct mlx5_core_dev *dev)
>>   {
> 
> Rename to mlx5_free_irq_vectors to keep the function naming symmetric.

Yeah, I should have done that.

I'll fix and send v4.

Thanks

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 for-4.13 2/6] mlx5: move affinity hints assignments to generic code
  2017-06-05  6:35     ` Sagi Grimberg
@ 2017-06-07  6:16         ` Saeed Mahameed
  -1 siblings, 0 replies; 36+ messages in thread
From: Saeed Mahameed @ 2017-06-07  6:16 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Doug Ledford, Christoph Hellwig, Leon Romanovsky,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Mon, Jun 5, 2017 at 9:35 AM, Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org> wrote:
> The generic API takes care of spreading affinity similarly to
> what mlx5 open coded (and even handles asymmetric
> configurations better). Ask the generic API to spread affinity
> for us, and feed it pre_vectors that do not participate
> in affinity settings (which is an improvement over what we
> had before).
>
> The affinity assignments should match what mlx5 tried to
> do earlier, but now we do not set affinity to async, cmd
> and pages dedicated vectors.
>

I am not sure the new assignment will match what we tried to do before
this patch, and I would like to preserve that behavior:
before, we simply spread comp vectors to the close numa cpus first,
then to the other cores uniformly,
i.e. we prefer the first IRQs to go to close numa cores.

For example, if you have 2 numa nodes with 4 cpus each, and the device
is on the 2nd numa node:
Numa 1 cpus: 0 1 2 3
Numa 2 cpus: 4 5 6 7

This should be the affinity:

IRQ[0] -> cpu[4] (Numa 2)
IRQ[1] -> cpu[5]
IRQ[2] -> cpu[6]
IRQ[3] -> cpu[7]

IRQ[4] -> cpu[0] (Numa 1)
IRQ[5] -> cpu[1]
IRQ[6] -> cpu[2]
IRQ[7] -> cpu[3]

Looking at irq_create_affinity_masks, it seems this is not the case!
"nodemask_t nodemsk = NODE_MASK_NONE;" doesn't seem to prefer any numa node.

I am sure that there is a way to force our mlx5 affinity strategy and
override the default one with the new API.

>
> Also, remove mlx5e_get_cpu routine as we have generic helpers
> to get cpumask and node given an irq vector, so use them
> directly.
>
> Reviewed-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> Acked-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 12 ++--
>  drivers/net/ethernet/mellanox/mlx5/core/main.c    | 83 ++---------------------
>  include/linux/mlx5/driver.h                       |  1 -
>  3 files changed, 10 insertions(+), 86 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 2a3c59e55dcf..ebfda1eae6b4 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -1565,11 +1565,6 @@ static void mlx5e_close_cq(struct mlx5e_cq *cq)
>         mlx5e_free_cq(cq);
>  }
>
> -static int mlx5e_get_cpu(struct mlx5e_priv *priv, int ix)
> -{
> -       return cpumask_first(priv->mdev->priv.irq_info[ix].mask);
> -}
>

Let's keep this abstraction, and even consider moving it to a
helper function in the mlx5_core driver's main.c;
it is not right for mlx5_ib and the mlx5e netdev to know about internal
mdev structures and implementation details.

I suggest moving mlx5_ib_get_vector_affinity from patch #4 into
drivers/net/ethernet/../mlx5/core/main.c,
renaming it to mlx5_get_vector_affinity, and using it from both the
rdma and netdevice code,

and change the above function to:

static int mlx5e_get_cpu(struct mlx5e_priv *priv, int ix)
{
       return cpumask_first(mlx5_get_vector_affinity(priv->mdev, ix));
}

Also, this way you don't have to touch all the lines that use
mlx5e_get_cpu in this file.

>                              struct mlx5e_params *params,
>                              struct mlx5e_channel_param *cparam)
> @@ -1718,11 +1713,11 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
>  {
>         struct mlx5e_cq_moder icocq_moder = {0, 0};
>         struct net_device *netdev = priv->netdev;
> -       int cpu = mlx5e_get_cpu(priv, ix);
>         struct mlx5e_channel *c;
>         int err;
>
> -       c = kzalloc_node(sizeof(*c), GFP_KERNEL, cpu_to_node(cpu));
> +       c = kzalloc_node(sizeof(*c), GFP_KERNEL,
> +               pci_irq_get_node(priv->mdev->pdev, MLX5_EQ_VEC_COMP_BASE + ix));

This might yield different behavior from what was originally intended: we
want to get the node of the CPU and not of the IRQ. Maybe there is
no difference, but
let's keep the mlx5e_get_cpu abstraction as above.

>         if (!c)
>                 return -ENOMEM;
>
> @@ -1730,7 +1725,8 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
>         c->mdev     = priv->mdev;
>         c->tstamp   = &priv->tstamp;
>         c->ix       = ix;
> -       c->cpu      = cpu;
> +       c->cpu      = cpumask_first(pci_irq_get_affinity(priv->mdev->pdev,
> +                       MLX5_EQ_VEC_COMP_BASE + ix));
>         c->pdev     = &priv->mdev->pdev->dev;
>         c->netdev   = priv->netdev;
>         c->mkey_be  = cpu_to_be32(priv->mdev->mlx5e_res.mkey.key);
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index e4431aacce9d..7b9e7301929b 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -312,6 +312,9 @@ static int mlx5_alloc_irq_vectors(struct mlx5_core_dev *dev)
>  {
>         struct mlx5_priv *priv = &dev->priv;
>         struct mlx5_eq_table *table = &priv->eq_table;
> +       struct irq_affinity irqdesc = {
> +               .pre_vectors = MLX5_EQ_VEC_COMP_BASE,
> +       };
>         int num_eqs = 1 << MLX5_CAP_GEN(dev, log_max_eq);
>         int nvec;
>
> @@ -325,9 +328,10 @@ static int mlx5_alloc_irq_vectors(struct mlx5_core_dev *dev)
>         if (!priv->irq_info)
>                 goto err_free_msix;
>
> -       nvec = pci_alloc_irq_vectors(dev->pdev,
> +       nvec = pci_alloc_irq_vectors_affinity(dev->pdev,
>                         MLX5_EQ_VEC_COMP_BASE + 1, nvec,
> -                       PCI_IRQ_MSIX);
> +                       PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
> +                       &irqdesc);
>         if (nvec < 0)
>                 return nvec;
>
> @@ -600,71 +604,6 @@ u64 mlx5_read_internal_timer(struct mlx5_core_dev *dev)
>         return (u64)timer_l | (u64)timer_h1 << 32;
>  }
>
> -static int mlx5_irq_set_affinity_hint(struct mlx5_core_dev *mdev, int i)
> -{
> -       struct mlx5_priv *priv  = &mdev->priv;
> -       int irq = pci_irq_vector(mdev->pdev, MLX5_EQ_VEC_COMP_BASE + i);
> -       int err;
> -
> -       if (!zalloc_cpumask_var(&priv->irq_info[i].mask, GFP_KERNEL)) {
> -               mlx5_core_warn(mdev, "zalloc_cpumask_var failed");
> -               return -ENOMEM;
> -       }
> -
> -       cpumask_set_cpu(cpumask_local_spread(i, priv->numa_node),
> -                       priv->irq_info[i].mask);
> -
> -       err = irq_set_affinity_hint(irq, priv->irq_info[i].mask);
> -       if (err) {
> -               mlx5_core_warn(mdev, "irq_set_affinity_hint failed,irq 0x%.4x",
> -                              irq);
> -               goto err_clear_mask;
> -       }
> -
> -       return 0;
> -
> -err_clear_mask:
> -       free_cpumask_var(priv->irq_info[i].mask);
> -       return err;
> -}
> -
> -static void mlx5_irq_clear_affinity_hint(struct mlx5_core_dev *mdev, int i)
> -{
> -       struct mlx5_priv *priv  = &mdev->priv;
> -       int irq = pci_irq_vector(mdev->pdev, MLX5_EQ_VEC_COMP_BASE + i);
> -
> -       irq_set_affinity_hint(irq, NULL);
> -       free_cpumask_var(priv->irq_info[i].mask);
> -}
> -
> -static int mlx5_irq_set_affinity_hints(struct mlx5_core_dev *mdev)
> -{
> -       int err;
> -       int i;
> -
> -       for (i = 0; i < mdev->priv.eq_table.num_comp_vectors; i++) {
> -               err = mlx5_irq_set_affinity_hint(mdev, i);
> -               if (err)
> -                       goto err_out;
> -       }
> -
> -       return 0;
> -
> -err_out:
> -       for (i--; i >= 0; i--)
> -               mlx5_irq_clear_affinity_hint(mdev, i);
> -
> -       return err;
> -}
> -
> -static void mlx5_irq_clear_affinity_hints(struct mlx5_core_dev *mdev)
> -{
> -       int i;
> -
> -       for (i = 0; i < mdev->priv.eq_table.num_comp_vectors; i++)
> -               mlx5_irq_clear_affinity_hint(mdev, i);
> -}
> -
>  int mlx5_vector2eqn(struct mlx5_core_dev *dev, int vector, int *eqn,
>                     unsigned int *irqn)
>  {
> @@ -1116,12 +1055,6 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
>                 goto err_stop_eqs;
>         }
>
> -       err = mlx5_irq_set_affinity_hints(dev);
> -       if (err) {
> -               dev_err(&pdev->dev, "Failed to alloc affinity hint cpumask\n");
> -               goto err_affinity_hints;
> -       }
> -
>         err = mlx5_init_fs(dev);
>         if (err) {
>                 dev_err(&pdev->dev, "Failed to init flow steering\n");
> @@ -1165,9 +1098,6 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
>         mlx5_cleanup_fs(dev);
>
>  err_fs:
> -       mlx5_irq_clear_affinity_hints(dev);
> -
> -err_affinity_hints:
>         free_comp_eqs(dev);
>
>  err_stop_eqs:
> @@ -1234,7 +1164,6 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
>         mlx5_eswitch_detach(dev->priv.eswitch);
>  #endif
>         mlx5_cleanup_fs(dev);
> -       mlx5_irq_clear_affinity_hints(dev);
>         free_comp_eqs(dev);
>         mlx5_stop_eqs(dev);
>         mlx5_put_uars_page(dev, priv->uar);
> diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
> index 4843fab18b83..963e3d59d740 100644
> --- a/include/linux/mlx5/driver.h
> +++ b/include/linux/mlx5/driver.h
> @@ -527,7 +527,6 @@ struct mlx5_core_sriov {
>  };
>
>  struct mlx5_irq_info {
> -       cpumask_var_t mask;
>         char name[MLX5_MAX_IRQ_NAME];
>  };
>
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 36+ messages in thread
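
The helper Saeed proposes could look roughly like this (a hypothetical
sketch, not code from the posted series), built on the PCI-layer
pci_irq_get_affinity() helper so that neither mlx5_ib nor mlx5e has to
peek into mdev internals:

/* hypothetical mlx5_core helper, along the lines Saeed suggests */
const struct cpumask *mlx5_get_vector_affinity(struct mlx5_core_dev *dev,
					       int vector)
{
	/*
	 * pci_irq_get_affinity() may return NULL for a vector without
	 * an affinity mask; real callers would need to handle that.
	 */
	return pci_irq_get_affinity(dev->pdev, MLX5_EQ_VEC_COMP_BASE + vector);
}

static int mlx5e_get_cpu(struct mlx5e_priv *priv, int ix)
{
	return cpumask_first(mlx5_get_vector_affinity(priv->mdev, ix));
}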

* Re: [PATCH v3 for-4.13 2/6] mlx5: move affinity hints assignments to generic code
  2017-06-07  6:16         ` Saeed Mahameed
@ 2017-06-07  8:31             ` Christoph Hellwig
  -1 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2017-06-07  8:31 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Sagi Grimberg, Doug Ledford, Christoph Hellwig, Leon Romanovsky,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Wed, Jun 07, 2017 at 09:16:47AM +0300, Saeed Mahameed wrote:
> On Mon, Jun 5, 2017 at 9:35 AM, Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org> wrote:
> > The generic API takes care of spreading affinity similarly to
> > what mlx5 open coded (and even handles asymmetric
> > configurations better). Ask the generic API to spread affinity
> > for us, and feed it pre_vectors that do not participate
> > in affinity settings (which is an improvement over what we
> > had before).
> >
> > The affinity assignments should match what mlx5 tried to
> > do earlier, but now we do not set affinity to async, cmd
> > and pages dedicated vectors.
> >
> 
> I am not sure the new assignment will match what we tried to do before
> this patch, and I would like to preserve that behavior:
> before, we simply spread comp vectors to the close numa cpus first,
> then to the other cores uniformly,
> i.e. we prefer the first IRQs to go to close numa cores.
> 
> For example, if you have 2 numa nodes with 4 cpus each, and the device
> is on the 2nd numa node:
> Numa 1 cpus: 0 1 2 3
> Numa 2 cpus: 4 5 6 7
> 
> This should be the affinity:
> 
> IRQ[0] -> cpu[4] (Numa 2)
> IRQ[1] -> cpu[5]
> IRQ[2] -> cpu[6]
> IRQ[3] -> cpu[7]
> 
> IRQ[4] -> cpu[0] (Numa 1)
> IRQ[5] -> cpu[1]
> IRQ[6] -> cpu[2]
> IRQ[7] -> cpu[3]
> 
> Looking at irq_create_affinity_masks, it seems this is not the case!
> "nodemask_t nodemsk = NODE_MASK_NONE;" doesn't seem to prefer any numa node.

nodemsk is set up by get_nodes_in_cpumask.  The mapping you should
get with the new code is:

IRQ[0] -> cpu[0] (Numa 1)
IRQ[1] -> cpu[1]
IRQ[2] -> cpu[2]
IRQ[3] -> cpu[3]

IRQ[4] -> cpu[4] (Numa 2)
IRQ[5] -> cpu[5]
IRQ[6] -> cpu[6]
IRQ[7] -> cpu[7]

Is there any reason you want to start assigning vectors on the local
node?  This is doable, but would complicate the code quite a bit,
so it needs a good argument.
 
> I am sure that there is a way to force our mlx5 affinity strategy and
> override the default one with the new API.

No, there is not.  The whole point is that we want to come up with
a common policy instead of each driver doing its own weird little
thing.

> > -static int mlx5e_get_cpu(struct mlx5e_priv *priv, int ix)
> > -{
> > -       return cpumask_first(priv->mdev->priv.irq_info[ix].mask);
> > -}
> >
> 
> Let's keep this abstraction, and even consider moving it to a
> helper function in the mlx5_core driver's main.c;
> it is not right for mlx5_ib and the mlx5e netdev to know about internal
> mdev structures and implementation details.
> 
> I suggest moving mlx5_ib_get_vector_affinity from patch #4 into
> drivers/net/ethernet/../mlx5/core/main.c,
> renaming it to mlx5_get_vector_affinity, and using it from both the
> rdma and netdevice code,
> 
> and change the above function to:
> 
> static int mlx5e_get_cpu(struct mlx5e_priv *priv, int ix)
> {
>        return cpumask_first(mlx5_get_vector_affinity(priv->mdev, ix));
> }

Take a look at my comment on Sagi's repost.  The driver never
actually cares about this weird cpu value - it cares about a node
for the vectors, and the PCI layer provides the pci_irq_get_node helper
for that.  We could wrap this in a mlx5e helper, but that's not
really the normal style in the kernel.

> >         int err;
> >
> > -       c = kzalloc_node(sizeof(*c), GFP_KERNEL, cpu_to_node(cpu));
> > +       c = kzalloc_node(sizeof(*c), GFP_KERNEL,
> > +               pci_irq_get_node(priv->mdev->pdev, MLX5_EQ_VEC_COMP_BASE + ix));
> 
> This might yield different behavior from what was originally intended: we
> want to get the node of the CPU and not of the IRQ. Maybe there is
> no difference, but
> let's keep the mlx5e_get_cpu abstraction as above.

It's a completely bogus abstraction.

^ permalink raw reply	[flat|nested] 36+ messages in thread
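
To make the disagreement concrete, here is a small user-space sketch
(illustration only, based on the mappings described in this thread) that
prints both spreads for Saeed's 2-node/8-CPU example with the device on
node 2: the linear spread Christoph expects from the generic code, and
the local-node-first spread the old mlx5 code preferred.

#include <stdio.h>

#define NVEC 8	/* 2 numa nodes, 4 cpus each; device sits on node 2 */

static int node_of(int cpu)
{
	return cpu < 4 ? 1 : 2;
}

int main(void)
{
	int irq;

	/* linear spread: IRQ i -> cpu i, regardless of device locality */
	for (irq = 0; irq < NVEC; irq++)
		printf("linear:      IRQ[%d] -> cpu[%d] (Numa %d)\n",
		       irq, irq, node_of(irq));

	/* local-node-first spread: start at the device node's cpus */
	for (irq = 0; irq < NVEC; irq++) {
		int cpu = (irq + 4) % NVEC;

		printf("local-first: IRQ[%d] -> cpu[%d] (Numa %d)\n",
		       irq, cpu, node_of(cpu));
	}
	return 0;
}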

* Re: [PATCH v3 for-4.13 2/6] mlx5: move affinity hints assignments to generic code
  2017-06-07  8:31             ` Christoph Hellwig
@ 2017-06-07  9:48                 ` Sagi Grimberg
  -1 siblings, 0 replies; 36+ messages in thread
From: Sagi Grimberg @ 2017-06-07  9:48 UTC (permalink / raw)
  To: Christoph Hellwig, Saeed Mahameed
  Cc: Doug Ledford, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


>> I am not sure the new assignment will match what we tried to do before
>> this patch, and I would like to preserve that behavior:
>> before, we simply spread comp vectors to the close numa cpus first,
>> then to other cores uniformly,
>> i.e. we prefer the first IRQs to go to close numa cores.
>>
>> for example if you have 2 numa nodes each have 4 cpus, and the device
>> is on 2nd numa,
>> Numa 1 cpus: 0 1 2 3
>> Numa 2 cpus: 4 5 6 7
>>
>> this should be the affinity:
>>
>> IRQ[0] -> cpu[4] (Numa 2)
>> IRQ[1] -> cpu[5]
>> IRQ[2] -> cpu[6]
>> IRQ[3] -> cpu[7]
>>
>> IRQ[4] -> cpu[0] (Numa 1)
>> IRQ[5] -> cpu[1]
>> IRQ[6] -> cpu[2]
>> IRQ[7] -> cpu[3]
>>
>> Looking at irq_create_affinity_masks, it seems not to be the case:
>> "nodemask_t nodemsk = NODE_MASK_NONE;" doesn't seem to prefer any numa node.
> 
> nodemsk is set up by get_nodes_in_cpumask.  The mask you should
> get with the new code is:
> 
> IRQ[0] -> cpu[0] (Numa 1)
> IRQ[1] -> cpu[1]
> IRQ[2] -> cpu[2]
> IRQ[3] -> cpu[3]
> 
> IRQ[4] -> cpu[4] (Numa 2)
> IRQ[5] -> cpu[5]
> IRQ[6] -> cpu[6]
> IRQ[7] -> cpu[7]
> 
> is there any reason you want to start assigning vectors on the local
> node?  This is doable, but would complicate the code quite a bit
> so it needs a good argument.

My interpretation is that mlx5 tried to do this for the (rather esoteric
in my mind) case where the platform does not have enough vectors for the
driver to allocate percpu. In this case, the next best thing is to stay
as close to the device affinity as possible.
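
As an aside, a rough sketch (my own, not the series' exact code, and
the function name is made up) of the request the generic layer serves
here; the min/max range lets the core grant fewer vectors on such
platforms, and it spreads whatever was granted across the nodes:

static int sketch_request_comp_irqs(struct pci_dev *pdev, int num_comp)
{
	/* Keep the async/cmd/pages vectors out of affinity spreading. */
	struct irq_affinity affd = {
		.pre_vectors = MLX5_EQ_VEC_COMP_BASE,
	};

	/* May return fewer than the max on vector-starved platforms. */
	return pci_alloc_irq_vectors_affinity(pdev,
			MLX5_EQ_VEC_COMP_BASE + 1,
			MLX5_EQ_VEC_COMP_BASE + num_comp,
			PCI_IRQ_MSIX, &affd);
}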

>> I am sure that there is a way to force our mlx5 affinity strategy and
>> override the default one with the new API.
> 
> No, there is not.  The whole point is that we want to come up with
> a common policy instead of each driver doing their own weird little
> thing.

Agreed.

>>> -static int mlx5e_get_cpu(struct mlx5e_priv *priv, int ix)
>>> -{
>>> -       return cpumask_first(priv->mdev->priv.irq_info[ix].mask);
>>> -}
>>>
>>
>> let's keep this abstraction; let's even consider moving it to a
>> helper function in the mlx5_core driver's main.c.
>> It is not right for mlx5_ib and the mlx5e netdev to know about
>> internal mdev structures and implementation details.
>>
>> I suggest moving mlx5_ib_get_vector_affinity from patch #4 into
>> drivers/net/ethernet/../mlx5/core/main.c, renaming it to
>> mlx5_get_vector_affinity, and using it from both the rdma and
>> netdevice code,
>>
>> and changing the above function to:
>>
>> static int mlx5e_get_cpu(struct mlx5e_priv *priv, int ix)
>> {
>>         return cpumask_first(mlx5_get_vector_affinity(priv->mdev, ix));
>> }
> 
> Take a look at my comment to Sagi's repost.  The driver never
> actually cares about this weird cpu value - it cares about a node
> for the vectors and PCI layer provides the pci_irq_get_node helper
> for that.  We could wrap this with a mlx5e helper, but that's not
> really the normal style in the kernel.
> 
>>>          int err;
>>>
>>> -       c = kzalloc_node(sizeof(*c), GFP_KERNEL, cpu_to_node(cpu));
>>> +       c = kzalloc_node(sizeof(*c), GFP_KERNEL,
>>> +               pci_irq_get_node(priv->mdev->pdev, MLX5_EQ_VEC_COMP_BASE + ix));
>>
>> this might yield different behavior from what was originally intended:
>> we want the node of the CPU, not of the IRQ. Maybe there is
>> no difference, but

There is no difference, the node of the CPU _is_ the node of the IRQs
(it originates from irq affinity).

>> let's keep the mlx5e_get_cpu abstraction as above.
> 
> It's a completely bogus abstraction.

I tend to agree, but can easily change it.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 for-4.13 2/6] mlx5: move affinity hints assignments to generic code
  2017-06-07  9:48                 ` Sagi Grimberg
@ 2017-06-08  9:28                     ` Saeed Mahameed
  -1 siblings, 0 replies; 36+ messages in thread
From: Saeed Mahameed @ 2017-06-08  9:28 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Doug Ledford, Leon Romanovsky,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Wed, Jun 7, 2017 at 12:48 PM, Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org> wrote:
>
>>> I am not sure the new assignment will match what we tried to do before
>>> this patch, and I would like to preserve that behavior:
>>> before, we simply spread comp vectors to the close numa cpus first,
>>> then to other cores uniformly,
>>> i.e. we prefer the first IRQs to go to close numa cores.
>>>
>>> for example if you have 2 numa nodes each have 4 cpus, and the device
>>> is on 2nd numa,
>>> Numa 1 cpus: 0 1 2 3
>>> Numa 2 cpus: 4 5 6 7
>>>
>>> this should be the affinity:
>>>
>>> IRQ[0] -> cpu[4] (Numa 2)
>>> IRQ[1] -> cpu[5]
>>> IRQ[2] -> cpu[6]
>>> IRQ[3] -> cpu[7]
>>>
>>> IRQ[4] -> cpu[0] (Numa 1)
>>> IRQ[5] -> cpu[1]
>>> IRQ[6] -> cpu[2]
>>> IRQ[7] -> cpu[3]
>>>
>>> Looking at irq_create_affinity_masks, it seems not to be the case:
>>> "nodemask_t nodemsk = NODE_MASK_NONE;" doesn't seem to prefer any numa
>>> node.
>>
>>
>> nodemsk is set up by get_nodes_in_cpumask.  The mask you should
>> get with the new code is:
>>
>> IRQ[0] -> cpu[0] (Numa 1)
>> IRQ[1] -> cpu[1]
>> IRQ[2] -> cpu[2]
>> IRQ[3] -> cpu[3]
>>
>> IRQ[4] -> cpu[4] (Numa 2)
>> IRQ[5] -> cpu[5]
>> IRQ[6] -> cpu[6]
>> IRQ[7] -> cpu[7]
>>
>> is there any reason you want to start assigning vectors on the local
>> node?  This is doable, but would complicate the code quite a bit
>> so it needs a good argument.
>
>
> My interpretation is that mlx5 tried to do this for the (rather esoteric
> in my mind) case where the platform does not have enough vectors for the
> driver to allocate percpu. In this case, the next best thing is to stay
> as close to the device affinity as possible.
>

No, we did it because the mlx5e netdevice assumes that
IRQ[0]..IRQ[#num_numa/#cpu_per_numa]
are always bound to the numa node close to the device, and the mlx5e
driver chooses those IRQs to spread
the RSS hash into them only, never using the other IRQs/cores.

This means that with the current mlx5e code
(mlx5e_build_default_indir_rqt), there is a good chance the driver
will only use cpus/IRQs on the far numa node for its RX traffic.
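
For reference, roughly what the current clamping looks like (a sketch
paraphrasing mlx5e_build_default_indir_rqt, not a verbatim quote):

	int node_cores = cpumask_weight(cpumask_of_node(mdev->priv.numa_node));

	/* Never point the RSS indirection table beyond the local node. */
	if (node_cores)
		num_channels = min_t(int, num_channels, node_cores);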

One way to fix this is to change mlx5e_build_default_indir_rqt to not
make this assumption and to spread the RSS hash across all the IRQs.

But this will increase the risk of conflicting with current net-next.
Is Doug OK with this? Did we run a merge test?

>>> I am sure that there is a way to force our mlx5 affinity strategy and
>>> override the default one with the new API.
>>
>>
>> No, there is not.  The whole point is that we want to come up with
>> a common policy instead of each driver doing their own weird little
>> thing.
>
>
> Agreed.
>
>

I can live with that, but please address the above, since it will be a
regression.

>>>> -static int mlx5e_get_cpu(struct mlx5e_priv *priv, int ix)
>>>> -{
>>>> -       return cpumask_first(priv->mdev->priv.irq_info[ix].mask);
>>>> -}
>>>>
>>>
>>> let's keep this abstraction; let's even consider moving it to a
>>> helper function in the mlx5_core driver's main.c.
>>> It is not right for mlx5_ib and the mlx5e netdev to know about
>>> internal mdev structures and implementation details.
>>>
>>> I suggest moving mlx5_ib_get_vector_affinity from patch #4 into
>>> drivers/net/ethernet/../mlx5/core/main.c, renaming it to
>>> mlx5_get_vector_affinity, and using it from both the rdma and
>>> netdevice code,
>>>
>>> and changing the above function to:
>>>
>>> static int mlx5e_get_cpu(struct mlx5e_priv *priv, int ix)
>>> {
>>>         return cpumask_first(mlx5_get_vector_affinity(priv->mdev, ix));
>>> }
>>
>>
>> Take a look at my comment to Sagi's repost.  The driver never
>> actually cares about this weird cpu value - it cares about a node
>> for the vectors and PCI layer provides the pci_irq_get_node helper
>> for that.  We could wrap this with a mlx5e helper, but that's not
>> really the normal style in the kernel.
>>
>>>>          int err;
>>>>
>>>> -       c = kzalloc_node(sizeof(*c), GFP_KERNEL, cpu_to_node(cpu));
>>>> +       c = kzalloc_node(sizeof(*c), GFP_KERNEL,
>>>> +               pci_irq_get_node(priv->mdev->pdev, MLX5_EQ_VEC_COMP_BASE
>>>> + ix));
>>>
>>>
>>> this might yield different behavior from what was originally intended:
>>> we want the node of the CPU, not of the IRQ. Maybe there is
>>> no difference, but
>
>
> There is no difference, the node of the CPU _is_ the node of the IRQs
> (it originates from irq affinity).
>
>>> let's keep the mlx5e_get_cpu abstraction as above.
>>
>>
>> It's a completely bogus abstraction.
>
>
> I tend to agree, but can easily change it.

At least change it to mlx5e_get_node! As I said, I don't want to
pepper the mlx5e code with expressions like
(priv->mdev->pdev, MLX5_EQ_VEC_COMP_BASE + ix); just call
mlx5e_get_channel_node(priv, channel_ix);
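
Something like this sketch of the suggested wrapper (name as proposed
above; the body assumes the pci_irq_get_node style from Christoph's
comment):

static int mlx5e_get_node(struct mlx5e_priv *priv, int ix)
{
	/* Hide the EQ vector offset from the mlx5e call sites. */
	return pci_irq_get_node(priv->mdev->pdev,
				MLX5_EQ_VEC_COMP_BASE + ix);
}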

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 for-4.13 2/6] mlx5: move affinity hints assignments to generic code
  2017-06-08  9:28                     ` Saeed Mahameed
@ 2017-06-08 10:16                         ` Sagi Grimberg
  -1 siblings, 0 replies; 36+ messages in thread
From: Sagi Grimberg @ 2017-06-08 10:16 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Christoph Hellwig, Doug Ledford, Leon Romanovsky,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


>>> is there any reason you want to start assigning vectors on the local
>>> node?  This is doable, but would complicate the code quite a bit
>>> so it needs a good argument.
>>
>>
>> My interpretation is that mlx5 tried to do this for the (rather esoteric
>> in my mind) case where the platform does not have enough vectors for the
>> driver to allocate percpu. In this case, the next best thing is to stay
>> as close to the device affinity as possible.
>>
> 
> No, we did it because the mlx5e netdevice assumes that
> IRQ[0]..IRQ[#num_numa/#cpu_per_numa]
> are always bound to the numa node close to the device, and the mlx5e
> driver chooses those IRQs to spread
> the RSS hash into them only, never using the other IRQs/cores.

OK, that explains a lot of weirdness I've seen with mlx5e.

Can you explain why you're using only a single numa node for your RSS
table? What does it buy you? You open RX rings for _all_ cpus but
only spread on part of them? I must be missing something here...

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 for-4.13 2/6] mlx5: move affinity hints assignments to generic code
  2017-06-08 10:16                         ` Sagi Grimberg
@ 2017-06-08 11:42                             ` Saeed Mahameed
  -1 siblings, 0 replies; 36+ messages in thread
From: Saeed Mahameed @ 2017-06-08 11:42 UTC (permalink / raw)
  To: Sagi Grimberg, Tariq Toukan
  Cc: Christoph Hellwig, Doug Ledford, Leon Romanovsky,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Thu, Jun 8, 2017 at 1:16 PM, Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org> wrote:
>
>>>> is there any reason you want to start assigning vectors on the local
>>>> node?  This is doable, but would complicate the code quite a bit
>>>> so it needs a good argument.
>>>
>>>
>>>
>>> My interpretation is that mlx5 tried to do this for the (rather esoteric
>>> in my mind) case where the platform does not have enough vectors for the
>>> driver to allocate percpu. In this case, the next best thing is to stay
>>> as close to the device affinity as possible.
>>>
>>
>> No, we did it because the mlx5e netdevice assumes that
>> IRQ[0]..IRQ[#num_numa/#cpu_per_numa]
>> are always bound to the numa node close to the device, and the mlx5e
>> driver chooses those IRQs to spread
>> the RSS hash into them only, never using the other IRQs/cores.
>
>
> OK, that explains a lot of weirdness I've seen with mlx5e.
>
> Can you explain why you're using only a single numa node for your RSS
> table? What does it buy you? You open RX rings for _all_ cpus but
> only spread on part of them? I must be missing something here...

Adding Tariq,

This is also part of the weirdness :). We do that to make sure that
any OOB test you run always gets the best performance,
and we guarantee to always use close numa cores.

We open RX rings on all of the cores in case the user wants to
change the RSS table to point to all of them on the fly with "ethtool
-X".

But we are willing to change that; Tariq can provide the patch.
Without this change, mlx5e is broken.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 for-4.13 2/6] mlx5: move affinity hints assignments to generic code
  2017-06-08 11:42                             ` Saeed Mahameed
@ 2017-06-08 12:29                                 ` Sagi Grimberg
  -1 siblings, 0 replies; 36+ messages in thread
From: Sagi Grimberg @ 2017-06-08 12:29 UTC (permalink / raw)
  To: Saeed Mahameed, Tariq Toukan
  Cc: Christoph Hellwig, Doug Ledford, Leon Romanovsky,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


>>>> My interpretation is that mlx5 tried to do this for the (rather esoteric
>>>> in my mind) case where the platform does not have enough vectors for the
>>>> driver to allocate percpu. In this case, the next best thing is to stay
>>>> as close to the device affinity as possible.
>>>>
>>>
>>> No, we did it because the mlx5e netdevice assumes that
>>> IRQ[0]..IRQ[#num_numa/#cpu_per_numa]
>>> are always bound to the numa node close to the device, and the mlx5e
>>> driver chooses those IRQs to spread
>>> the RSS hash into them only, never using the other IRQs/cores.
>>
>>
>> OK, that explains a lot of weirdness I've seen with mlx5e.
>>
>> Can you explain why you're using only a single numa node for your RSS
>> table? What does it buy you? You open RX rings for _all_ cpus but
>> only spread on part of them? I must be missing something here...
> 
> Adding Tariq,
> 
> This is also part of the weirdness :). We do that to make sure that
> any OOB test you run always gets the best performance,
> and we guarantee to always use close numa cores.

Well, I wish I knew that before :( I got to a point where I started
to seriously doubt the mathematical strength of xor/toeplitz hashing :)

I'm sure you ran plenty of performance tests, but from my experience,
application locality makes much more difference than device locality,
especially when the application needs to touch the data...

> We open RX rings on all of the cores in case the user wants to
> change the RSS table to point to all of them on the fly with "ethtool
> -X".

That is very counter-intuitive AFAICT; is it documented anywhere?

Users might rely on the (absolutely reasonable) assumption that if a
NIC exposes X RX rings, RX hashing spreads across all of them and
not just a subset.

> But we are willing to change that; Tariq can provide the patch.
> Without this change, mlx5e is broken.

What patch? To modify the RSS spread? What exactly is broken?

So I'm not sure how to move forward here. Should we modify the
indirection table construction to not rely on the unique affinity
mappings?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 for-4.13 2/6] mlx5: move affinity hints assignments to generic code
  2017-06-08 12:29                                 ` Sagi Grimberg
@ 2017-06-11  9:32                                     ` Saeed Mahameed
  -1 siblings, 0 replies; 36+ messages in thread
From: Saeed Mahameed @ 2017-06-11  9:32 UTC (permalink / raw)
  To: Sagi Grimberg, Majd Dibbiny, Eran Ben Elisha,
	amira-VPRAkNaXOzVWk0Htik3J/w
  Cc: Tariq Toukan, Christoph Hellwig, Doug Ledford, Leon Romanovsky,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Thu, Jun 8, 2017 at 3:29 PM, Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org> wrote:
>
>>>>> My interpretation is that mlx5 tried to do this for the (rather
>>>>> esoteric
>>>>> in my mind) case where the platform does not have enough vectors for
>>>>> the
>>>>> driver to allocate percpu. In this case, the next best thing is to stay
>>>>> as close to the device affinity as possible.
>>>>>
>>>>
>>>> No, we did it because the mlx5e netdevice assumes that
>>>> IRQ[0]..IRQ[#num_numa/#cpu_per_numa]
>>>> are always bound to the numa node close to the device, and the mlx5e
>>>> driver chooses those IRQs to spread
>>>> the RSS hash into them only, never using the other IRQs/cores.
>>>
>>>
>>>
>>> OK, that explains a lot of weirdness I've seen with mlx5e.
>>>
>>> Can you explain why you're using only a single numa node for your RSS
>>> table? What does it buy you? You open RX rings for _all_ cpus but
>>> only spread on part of them? I must be missing something here...
>>
>>
>> Adding Tariq,
>>
>> This is also part of the weirdness :). We do that to make sure that
>> any OOB test you run always gets the best performance,
>> and we guarantee to always use close numa cores.
>
>
> Well, I wish I knew that before :( I got to a point where I started
> to seriously doubt the mathematical strength of xor/toeplitz hashing :)
>
> I'm sure you ran plenty of performance tests, but from my experience,
> application locality makes much more difference than device locality,
> especially when the application needs to touch the data...
>
>> We open RX rings on all of the cores in case the user wants to
>> change the RSS table to point to all of them on the fly with "ethtool
>> -X".
>
>
> That is very counter-intuitive AFAICT; is it documented anywhere?
>
> Users might rely on the (absolutely reasonable) assumption that if a
> NIC exposes X RX rings, RX hashing spreads across all of them and
> not just a subset.
>

This is why we want to remove this assumption from mlx5e

>> But we are willing to change that; Tariq can provide the patch.
>> Without this change, mlx5e is broken.
>
>
> What patch? To modify the RSS spread? What exactly is broken?

The current mlx5 netdev with your patches might spread traffic _ONLY_ to
the far numa node.

To fix this in mlx5e we need something like this:

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 41cd22a223dc..15499865784f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3733,18 +3733,8 @@ void mlx5e_build_default_indir_rqt(struct mlx5_core_dev *mdev,
                                   u32 *indirection_rqt, int len,
                                   int num_channels)
 {
-       int node = mdev->priv.numa_node;
-       int node_num_of_cores;
        int i;

-       if (node == -1)
-               node = first_online_node;
-
-       node_num_of_cores = cpumask_weight(cpumask_of_node(node));
-
-       if (node_num_of_cores)
-               num_channels = min_t(int, num_channels, node_num_of_cores);
-
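
With the clamp gone, the table construction reduces to a plain
round-robin over all channels; a sketch of what remains (assuming the
loop body is otherwise unchanged):

	for (i = 0; i < len; i++)
		indirection_rqt[i] = i % num_channels;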

We are working on such a patch, and we would like to have it submitted
and accepted along with, or before, your series.
Also, we will have to run a performance & functional regression cycle
on all the patches combined.

>
> So I'm not sure how to move forward here. Should we modify the
> indirection table construction to not rely on the unique affinity
> mappings?

^ permalink raw reply related	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2017-06-11  9:32 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-05  6:35 [PATCH v3 for-4.13 0/6] Automatic affinity settings for nvme over rdma Sagi Grimberg
2017-06-05  6:35 ` [PATCH v3 for-4.13 1/6] mlx5: convert to generic pci_alloc_irq_vectors Sagi Grimberg
2017-06-07  5:26   ` Saeed Mahameed
2017-06-07  5:46     ` Sagi Grimberg
2017-06-05  6:35 ` [PATCH v3 for-4.13 2/6] mlx5: move affinity hints assignments to generic code Sagi Grimberg
2017-06-07  6:16   ` Saeed Mahameed
2017-06-07  8:31     ` Christoph Hellwig
2017-06-07  9:48       ` Sagi Grimberg
2017-06-08  9:28         ` Saeed Mahameed
2017-06-08 10:16           ` Sagi Grimberg
2017-06-08 11:42             ` Saeed Mahameed
2017-06-08 12:29               ` Sagi Grimberg
2017-06-11  9:32                 ` Saeed Mahameed
2017-06-05  6:35 ` [PATCH v3 for-4.13 3/6] RDMA/core: expose affinity mappings per completion vector Sagi Grimberg
2017-06-05  6:35 ` [PATCH v3 for-4.13 4/6] mlx5: support ->get_vector_affinity Sagi Grimberg
2017-06-05  6:35 ` [PATCH v3 for-4.13 5/6] block: Add rdma affinity based queue mapping helper Sagi Grimberg
2017-06-05  6:36 ` [PATCH v3 for-4.13 6/6] nvme-rdma: use intelligent affinity based queue mappings Sagi Grimberg
2017-06-06  9:31 ` [PATCH v3 for-4.13 0/6] Automatic affinity settings for nvme over rdma Christoph Hellwig