netdev.vger.kernel.org archive mirror
* [PATCH net-next V1 0/3] net/mlx4: Mellanox driver update 01-01-2014
@ 2014-02-19 12:58 Amir Vadai
  2014-02-19 12:58 ` [PATCH net-next V1 1/3] net/mlx4_en: Use affinity hint Amir Vadai
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Amir Vadai @ 2014-02-19 12:58 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Yevgeny Petrilin, Or Gerlitz, Amir Vadai

Hi,

V0 of this patchset was sent before the previous net-next closed, and we would
now like to resume it.

Yuval has reworked the affinity hint patch according to Ben's comments; the
patch was effectively rewritten.
After a discussion with Yuval Mintz, the use of netif_get_num_default_rss_queues()
is not reverted, but is now done in the right place: instead of limiting the
number of IRQs for the driver, it limits the number of RSS queues.

Patchset was applied and tested against commit: cb6e926 "ipv6:fix checkpatch
errors with assignment in if condition"

Thanks,
Amir

Ido Shamay (2):
  net/mlx4: Set number of RX rings in a utility function
  net/mlx4: Fix limiting number of IRQ's instead of RSS queues

Yuval Atias (1):
  net/mlx4_en: Use affinity hint

 drivers/infiniband/hw/mlx4/main.c              |  2 +-
 drivers/net/ethernet/mellanox/mlx4/en_cq.c     |  5 +-
 drivers/net/ethernet/mellanox/mlx4/en_main.c   | 15 +----
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 88 ++++++++++++++++++++++++--
 drivers/net/ethernet/mellanox/mlx4/en_rx.c     | 32 +++++++++-
 drivers/net/ethernet/mellanox/mlx4/eq.c        | 14 +++-
 drivers/net/ethernet/mellanox/mlx4/main.c      |  3 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |  5 +-
 include/linux/mlx4/device.h                    |  2 +-
 9 files changed, 138 insertions(+), 28 deletions(-)

-- 
1.8.3.4

* [PATCH net-next V1 1/3] net/mlx4_en: Use affinity hint
  2014-02-19 12:58 [PATCH net-next V1 0/3] net/mlx4: Mellanox driver update 01-01-2014 Amir Vadai
@ 2014-02-19 12:58 ` Amir Vadai
  2014-02-19 12:58 ` [PATCH net-next V1 2/3] net/mlx4: Set number of RX rings in a utility function Amir Vadai
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Amir Vadai @ 2014-02-19 12:58 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Yevgeny Petrilin, Or Gerlitz, Yuval Atias, Ben Hutchings,
	Amir Vadai

From: Yuval Atias <yuvala@mellanox.com>

The affinity hint mechanism is used by the user space daemon irqbalance
to indicate a preferred CPU mask for IRQs. irqbalance can use this hint
to balance the IRQs between the CPUs indicated by the mask.

We wish the HCA to preferentially map the IRQs it uses to NUMA cores
close to it.
To accomplish this, we use the affinity hint: first we map IRQs to the
close NUMA cores.
If these are exhausted, the remaining IRQs are mapped to the far NUMA
cores.
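
In other words, the basic pattern is roughly the sketch below (an
illustrative example only, not the patch itself: the helper name is made
up, and the hint mask must stay allocated for as long as the hint is
installed and cleared with irq_set_affinity_hint(irq, NULL) before
free_irq()):

#include <linux/cpumask.h>
#include <linux/interrupt.h>
#include <linux/topology.h>

/* Illustrative sketch: hint one IRQ towards the device's NUMA node. */
static int example_hint_irq_to_node(unsigned int irq, int numa_node,
				    cpumask_var_t hint)
{
	const struct cpumask *node_cpus = (numa_node >= 0) ?
			cpumask_of_node(numa_node) : cpu_online_mask;
	unsigned int cpu;

	/* Prefer an online CPU on the device's node... */
	cpu = cpumask_any_and(node_cpus, cpu_online_mask);
	/* ...otherwise fall back to any online CPU */
	if (cpu >= nr_cpu_ids)
		cpu = cpumask_first(cpu_online_mask);

	cpumask_clear(hint);
	cpumask_set_cpu(cpu, hint);

	/* Exposed to irqbalance as /proc/irq/<irq>/affinity_hint */
	return irq_set_affinity_hint(irq, hint);
}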

CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Yuval Atias <yuvala@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>

---

Changes from V0:
Use numa bit mask to set affinity hint.
Move dynamic allocation to mlx4_en_start_port function.
Handle error when device numa node is not known.
Change rx_ring affinity_mask var type to cpumask_var_t.

 drivers/infiniband/hw/mlx4/main.c              |  2 +-
 drivers/net/ethernet/mellanox/mlx4/en_cq.c     |  5 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 88 ++++++++++++++++++++++++--
 drivers/net/ethernet/mellanox/mlx4/en_rx.c     |  7 +-
 drivers/net/ethernet/mellanox/mlx4/eq.c        | 14 +++-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |  3 +-
 include/linux/mlx4/device.h                    |  2 +-
 7 files changed, 109 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index c2702f5..a7dcfa7 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1763,7 +1763,7 @@ static void mlx4_ib_alloc_eqs(struct mlx4_dev *dev, struct mlx4_ib_dev *ibdev)
 				i, j, dev->pdev->bus->name);
 			/* Set IRQ for specific name (per ring) */
 			if (mlx4_assign_eq(dev, name, NULL,
-					   &ibdev->eq_table[eq])) {
+					   &ibdev->eq_table[eq], NULL)) {
 				/* Use legacy (same as mlx4_en driver) */
 				pr_warn("Can't allocate EQ %d; reverting to legacy\n", eq);
 				ibdev->eq_table[eq] =
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
index 70e9532..5ef3f7a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
@@ -96,7 +96,7 @@ err_cq:
 }
 
 int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
-			int cq_idx)
+			int cq_idx, cpumask_var_t affinity_mask)
 {
 	struct mlx4_en_dev *mdev = priv->mdev;
 	int err = 0;
@@ -123,7 +123,8 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
 					cq->ring);
 				/* Set IRQ for specific name (per ring) */
 				if (mlx4_assign_eq(mdev->dev, name, rmap,
-						   &cq->vector)) {
+						   &cq->vector,
+						   affinity_mask)) {
 					cq->vector = (cq->ring + 1 + priv->port)
 					    % mdev->dev->caps.num_comp_vectors;
 					mlx4_warn(mdev, "Failed Assigning an EQ to "
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index fad4531..88f16c0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1533,6 +1533,55 @@ static void mlx4_en_linkstate(struct work_struct *work)
 	mutex_unlock(&mdev->state_lock);
 }
 
+static int mlx4_en_rings_affinity_hint(struct mlx4_en_priv *priv,
+				cpumask_var_t non_numa_cores_mask,
+				cpumask_var_t numa_cores_mask,
+				const struct cpumask **p_affinity_numa_mask,
+				int *affinity_cpu)
+{
+	if (priv->mdev->dev->numa_node == -1)
+		goto err;
+	*p_affinity_numa_mask = cpumask_of_node(priv->mdev->dev->numa_node);
+	if (!*p_affinity_numa_mask)
+		goto err;
+	cpumask_copy(numa_cores_mask, *p_affinity_numa_mask);
+	if (!cpumask_and(numa_cores_mask,
+			 cpu_online_mask, numa_cores_mask)) {
+		en_warn(priv, "Failed to find online cores for numa\n");
+		goto err;
+	}
+	cpumask_xor(non_numa_cores_mask, cpu_online_mask,
+		    numa_cores_mask);
+	*p_affinity_numa_mask = numa_cores_mask;
+	*affinity_cpu = cpumask_first(*p_affinity_numa_mask);
+
+	return 0;
+
+err:
+	*affinity_cpu = -1;
+	return -EINVAL;
+}
+
+static int mlx4_en_set_affinity_hint(struct mlx4_en_priv *priv,
+			      cpumask_var_t non_numa_cores_mask,
+			      cpumask_var_t numa_cores_mask,
+			      const struct cpumask **p_affinity_numa_mask,
+			      int *affinity_cpu, int ring)
+{
+	if (*affinity_cpu == -1)
+		return -EINVAL;
+	cpumask_set_cpu(*affinity_cpu,
+			priv->rx_ring[ring]->affinity_mask);
+	*affinity_cpu = cpumask_next(*affinity_cpu, *p_affinity_numa_mask);
+	if (*affinity_cpu >= nr_cpu_ids) {
+		*p_affinity_numa_mask =
+			*p_affinity_numa_mask == numa_cores_mask ?
+			non_numa_cores_mask : numa_cores_mask;
+		*affinity_cpu = cpumask_first(*p_affinity_numa_mask);
+	}
+
+	return 0;
+}
 
 int mlx4_en_start_port(struct net_device *dev)
 {
@@ -1540,6 +1589,10 @@ int mlx4_en_start_port(struct net_device *dev)
 	struct mlx4_en_dev *mdev = priv->mdev;
 	struct mlx4_en_cq *cq;
 	struct mlx4_en_tx_ring *tx_ring;
+	const struct cpumask *affinity_numa_mask;
+	cpumask_var_t numa_cores_mask = NULL;
+	cpumask_var_t non_numa_cores_mask = NULL;
+	int affinity_cpu;
 	int rx_index = 0;
 	int tx_index = 0;
 	int err = 0;
@@ -1551,7 +1604,12 @@ int mlx4_en_start_port(struct net_device *dev)
 		en_dbg(DRV, priv, "start port called while port already up\n");
 		return 0;
 	}
-
+	if (!zalloc_cpumask_var(&numa_cores_mask, GFP_KERNEL) ||
+	    !zalloc_cpumask_var(&non_numa_cores_mask, GFP_KERNEL)) {
+		en_err(priv, "Failed to allocate core mask\n");
+		err = -EINVAL;
+		goto affinity_err;
+	}
 	INIT_LIST_HEAD(&priv->mc_list);
 	INIT_LIST_HEAD(&priv->curr_list);
 	INIT_LIST_HEAD(&priv->ethtool_list);
@@ -1569,12 +1627,28 @@ int mlx4_en_start_port(struct net_device *dev)
 		en_err(priv, "Failed to activate RX rings\n");
 		return err;
 	}
+	err = mlx4_en_rings_affinity_hint(priv,
+					  non_numa_cores_mask,
+					  numa_cores_mask,
+					  &affinity_numa_mask,
+					  &affinity_cpu);
+	if (err)
+		en_err(priv, "Failed to set affinity hints\n");
+
 	for (i = 0; i < priv->rx_ring_num; i++) {
 		cq = priv->rx_cq[i];
 
 		mlx4_en_cq_init_lock(cq);
-
-		err = mlx4_en_activate_cq(priv, cq, i);
+		err = mlx4_en_set_affinity_hint(priv,
+						non_numa_cores_mask,
+						numa_cores_mask,
+						&affinity_numa_mask,
+						&affinity_cpu, i);
+		if (err)
+			en_err(priv, "Failed setting affinity hint\n");
+		err = mlx4_en_activate_cq(priv, cq, i,
+				affinity_cpu == -1 ? NULL :
+				priv->rx_ring[i]->affinity_mask);
 		if (err) {
 			en_err(priv, "Failed activating Rx CQ\n");
 			goto cq_err;
@@ -1615,7 +1689,7 @@ int mlx4_en_start_port(struct net_device *dev)
 	for (i = 0; i < priv->tx_ring_num; i++) {
 		/* Configure cq */
 		cq = priv->tx_cq[i];
-		err = mlx4_en_activate_cq(priv, cq, i);
+		err = mlx4_en_activate_cq(priv, cq, i, NULL);
 		if (err) {
 			en_err(priv, "Failed allocating Tx CQ\n");
 			goto tx_err;
@@ -1704,7 +1778,8 @@ int mlx4_en_start_port(struct net_device *dev)
 	priv->port_up = true;
 	netif_tx_start_all_queues(dev);
 	netif_device_attach(dev);
-
+	free_cpumask_var(non_numa_cores_mask);
+	free_cpumask_var(numa_cores_mask);
 	return 0;
 
 tx_err:
@@ -1722,6 +1797,9 @@ cq_err:
 		mlx4_en_deactivate_cq(priv, priv->rx_cq[rx_index]);
 	for (i = 0; i < priv->rx_ring_num; i++)
 		mlx4_en_deactivate_rx_ring(priv, priv->rx_ring[i]);
+affinity_err:
+	free_cpumask_var(non_numa_cores_mask);
+	free_cpumask_var(numa_cores_mask);
 
 	return err; /* need to close devices */
 }
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 890922c..b0eba6d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -335,7 +335,11 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
 			return -ENOMEM;
 		}
 	}
-
+	if (!zalloc_cpumask_var(&ring->affinity_mask, GFP_KERNEL)) {
+		en_err(priv, "Failed to allocate core mask\n");
+		err = -ENOMEM;
+		goto err_ring;
+	}
 	ring->prod = 0;
 	ring->cons = 0;
 	ring->size = size;
@@ -470,6 +474,7 @@ void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv,
 	mlx4_free_hwq_res(mdev->dev, &ring->wqres, size * stride + TXBB_SIZE);
 	vfree(ring->rx_info);
 	ring->rx_info = NULL;
+	free_cpumask_var(ring->affinity_mask);
 	kfree(ring);
 	*pring = NULL;
 #ifdef CONFIG_RFS_ACCEL
diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
index 8992b38..a3d8502 100644
--- a/drivers/net/ethernet/mellanox/mlx4/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
@@ -1311,7 +1311,7 @@ int mlx4_test_interrupts(struct mlx4_dev *dev)
 EXPORT_SYMBOL(mlx4_test_interrupts);
 
 int mlx4_assign_eq(struct mlx4_dev *dev, char *name, struct cpu_rmap *rmap,
-		   int *vector)
+		   int *vector, cpumask_var_t cpu_hint_mask)
 {
 
 	struct mlx4_priv *priv = mlx4_priv(dev);
@@ -1344,6 +1344,16 @@ int mlx4_assign_eq(struct mlx4_dev *dev, char *name, struct cpu_rmap *rmap,
 				continue;
 				/*we dont want to break here*/
 			}
+			if (cpu_hint_mask) {
+				err = irq_set_affinity_hint(
+						priv->eq_table.eq[vec].irq,
+						cpu_hint_mask);
+				if (err) {
+					mlx4_warn(dev, "Failed setting affinity hint\n");
+					/*we dont want to break here*/
+				}
+			}
+
 			eq_set_ci(&priv->eq_table.eq[vec], 1);
 		}
 	}
@@ -1370,6 +1380,8 @@ void mlx4_release_eq(struct mlx4_dev *dev, int vec)
 		  Belonging to a legacy EQ*/
 		mutex_lock(&priv->msix_ctl.pool_lock);
 		if (priv->msix_ctl.pool_bm & 1ULL << i) {
+			irq_set_affinity_hint(priv->eq_table.eq[vec].irq,
+					      NULL);
 			free_irq(priv->eq_table.eq[vec].irq,
 				 &priv->eq_table.eq[vec]);
 			priv->msix_ctl.pool_bm &= ~(1ULL << i);
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 3af04c3..101d636 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -303,6 +303,7 @@ struct mlx4_en_rx_ring {
 	unsigned long csum_ok;
 	unsigned long csum_none;
 	int hwtstamp_rx_filter;
+	cpumask_var_t affinity_mask;
 };
 
 struct mlx4_en_cq {
@@ -716,7 +717,7 @@ int mlx4_en_create_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq **pcq,
 		      int entries, int ring, enum cq_type mode, int node);
 void mlx4_en_destroy_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq **pcq);
 int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
-			int cq_idx);
+			int cq_idx, cpumask_var_t affinity_mask);
 void mlx4_en_deactivate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq);
 int mlx4_en_set_cq_moder(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq);
 int mlx4_en_arm_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq);
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 5edd2c6..f8c253f 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -1148,7 +1148,7 @@ int mlx4_fmr_free(struct mlx4_dev *dev, struct mlx4_fmr *fmr);
 int mlx4_SYNC_TPT(struct mlx4_dev *dev);
 int mlx4_test_interrupts(struct mlx4_dev *dev);
 int mlx4_assign_eq(struct mlx4_dev *dev, char *name, struct cpu_rmap *rmap,
-		   int *vector);
+		   int *vector, cpumask_t *cpu_hint_mask);
 void mlx4_release_eq(struct mlx4_dev *dev, int vec);
 
 int mlx4_get_phys_port_id(struct mlx4_dev *dev);
-- 
1.8.3.4

* [PATCH net-next V1 2/3] net/mlx4: Set number of RX rings in a utility function
  2014-02-19 12:58 [PATCH net-next V1 0/3] net/mlx4: Mellanox driver update 01-01-2014 Amir Vadai
  2014-02-19 12:58 ` [PATCH net-next V1 1/3] net/mlx4_en: Use affinity hint Amir Vadai
@ 2014-02-19 12:58 ` Amir Vadai
  2014-02-19 12:58 ` [PATCH net-next V1 3/3] net/mlx4: Fix limiting number of IRQ's instead of RSS queues Amir Vadai
  2014-02-19 21:50 ` [PATCH net-next V1 0/3] net/mlx4: Mellanox driver update 01-01-2014 David Miller
  3 siblings, 0 replies; 11+ messages in thread
From: Amir Vadai @ 2014-02-19 12:58 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Yevgeny Petrilin, Or Gerlitz, Ido Shamay, Amir Vadai

From: Ido Shamay <idos@mellanox.com>

mlx4_en_add() is too long.
Move the setting of the number of RX rings to a utility function to
improve readability and modularity of the code.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_main.c | 15 ++-------------
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   | 22 ++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |  2 +-
 3 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_main.c b/drivers/net/ethernet/mellanox/mlx4/en_main.c
index d357bf5..fa2f6e7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_main.c
@@ -274,19 +274,8 @@ static void *mlx4_en_add(struct mlx4_dev *dev)
 	if (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_TS)
 		mlx4_en_init_timestamp(mdev);
 
-	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_ETH) {
-		if (!dev->caps.comp_pool) {
-			mdev->profile.prof[i].rx_ring_num =
-				rounddown_pow_of_two(max_t(int, MIN_RX_RINGS,
-							   min_t(int,
-								 dev->caps.num_comp_vectors,
-								 DEF_RX_RINGS)));
-		} else {
-			mdev->profile.prof[i].rx_ring_num = rounddown_pow_of_two(
-				min_t(int, dev->caps.comp_pool/
-				      dev->caps.num_ports - 1 , MAX_MSIX_P_PORT - 1));
-		}
-	}
+	/* Set default number of RX rings*/
+	mlx4_en_set_num_rx_rings(mdev);
 
 	/* Create our own workqueue for reset/multicast tasks
 	 * Note: we cannot use the shared workqueue because of deadlocks caused
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index b0eba6d..61e44ae 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -318,6 +318,28 @@ static void mlx4_en_free_rx_buf(struct mlx4_en_priv *priv,
 	}
 }
 
+void mlx4_en_set_num_rx_rings(struct mlx4_en_dev *mdev)
+{
+	int i;
+	int num_of_eqs;
+	struct mlx4_dev *dev = mdev->dev;
+
+	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_ETH) {
+		if (!dev->caps.comp_pool)
+			num_of_eqs = max_t(int, MIN_RX_RINGS,
+					   min_t(int,
+						 dev->caps.num_comp_vectors,
+						 DEF_RX_RINGS));
+		else
+			num_of_eqs = min_t(int, MAX_MSIX_P_PORT,
+					   dev->caps.comp_pool/
+					   dev->caps.num_ports) - 1;
+
+		mdev->profile.prof[i].rx_ring_num =
+			rounddown_pow_of_two(num_of_eqs);
+	}
+}
+
 int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
 			   struct mlx4_en_rx_ring **pring,
 			   u32 size, u16 stride, int node)
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 101d636..f97a0d3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -738,7 +738,7 @@ int mlx4_en_activate_tx_ring(struct mlx4_en_priv *priv,
 			     int cq, int user_prio);
 void mlx4_en_deactivate_tx_ring(struct mlx4_en_priv *priv,
 				struct mlx4_en_tx_ring *ring);
-
+void mlx4_en_set_num_rx_rings(struct mlx4_en_dev *mdev);
 int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
 			   struct mlx4_en_rx_ring **pring,
 			   u32 size, u16 stride, int node);
-- 
1.8.3.4

* [PATCH net-next V1 3/3] net/mlx4: Fix limiting number of IRQ's instead of RSS queues
  2014-02-19 12:58 [PATCH net-next V1 0/3] net/mlx4: Mellanox driver update 01-01-2014 Amir Vadai
  2014-02-19 12:58 ` [PATCH net-next V1 1/3] net/mlx4_en: Use affinity hint Amir Vadai
  2014-02-19 12:58 ` [PATCH net-next V1 2/3] net/mlx4: Set number of RX rings in a utility function Amir Vadai
@ 2014-02-19 12:58 ` Amir Vadai
  2014-02-19 21:50 ` [PATCH net-next V1 0/3] net/mlx4: Mellanox driver update 01-01-2014 David Miller
  3 siblings, 0 replies; 11+ messages in thread
From: Amir Vadai @ 2014-02-19 12:58 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Yevgeny Petrilin, Or Gerlitz, Ido Shamay, Yuval Mintz,
	Amir Vadai

From: Ido Shamay <idos@mellanox.com>

This fixes a performance bug introduced by commit 90b1ebe "mlx4: set
maximal number of default RSS queues", which limits the number of IRQs
opened by the core module.
The limit should be on the number of queues in the indirection table -
rx_rings - and not on the number of IRQs. Also, applying the limit at
mlx4_core initialization instead of in mlx4_en prevented using "ethtool
-L" to utilize all the CPUs when performance mode is preferred, since
limiting this number to 8 reduces the overall packet rate by 15%-50% in
multiple TCP stream applications.

For example, after running ethtool -L <ethx> rx 16

          Packet rate
Before the fix  897799
After the fix   1142070

Results were obtained using netperf:

S=200 ; ( for i in $(seq 1 $S) ; do ( \
  netperf -H 11.7.13.55 -t TCP_RR -l 30 &) ; \
  wait ; done | grep "1        1" | awk '{SUM+=$6} END {print SUM}' )


CC: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 5 ++++-
 drivers/net/ethernet/mellanox/mlx4/main.c  | 3 +--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 61e44ae..630ec03 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -322,6 +322,7 @@ void mlx4_en_set_num_rx_rings(struct mlx4_en_dev *mdev)
 {
 	int i;
 	int num_of_eqs;
+	int num_rx_rings;
 	struct mlx4_dev *dev = mdev->dev;
 
 	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_ETH) {
@@ -335,8 +336,10 @@ void mlx4_en_set_num_rx_rings(struct mlx4_en_dev *mdev)
 					   dev->caps.comp_pool/
 					   dev->caps.num_ports) - 1;
 
+		num_rx_rings = min_t(int, num_of_eqs,
+				     netif_get_num_default_rss_queues());
 		mdev->profile.prof[i].rx_ring_num =
-			rounddown_pow_of_two(num_of_eqs);
+			rounddown_pow_of_two(num_rx_rings);
 	}
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index d711158..8c82a6b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -41,7 +41,6 @@
 #include <linux/slab.h>
 #include <linux/io-mapping.h>
 #include <linux/delay.h>
-#include <linux/netdevice.h>
 #include <linux/kmod.h>
 
 #include <linux/mlx4/device.h>
@@ -1974,7 +1973,7 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev)
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	struct msix_entry *entries;
 	int nreq = min_t(int, dev->caps.num_ports *
-			 min_t(int, netif_get_num_default_rss_queues() + 1,
+			 min_t(int, num_online_cpus() + 1,
 			       MAX_MSIX_P_PORT) + MSIX_LEGACY_SZ, MAX_MSIX);
 	int err;
 	int i;
-- 
1.8.3.4

* Re: [PATCH net-next V1 0/3] net/mlx4: Mellanox driver update 01-01-2014
  2014-02-19 12:58 [PATCH net-next V1 0/3] net/mlx4: Mellanox driver update 01-01-2014 Amir Vadai
                   ` (2 preceding siblings ...)
  2014-02-19 12:58 ` [PATCH net-next V1 3/3] net/mlx4: Fix limiting number of IRQ's instead of RSS queues Amir Vadai
@ 2014-02-19 21:50 ` David Miller
  2014-02-20 20:15   ` Amir Vadai
  2014-02-22  1:33   ` [PATCH net-next V1 0/3] " Ben Hutchings
  3 siblings, 2 replies; 11+ messages in thread
From: David Miller @ 2014-02-19 21:50 UTC (permalink / raw)
  To: amirv; +Cc: netdev, yevgenyp, ogerlitz

From: Amir Vadai <amirv@mellanox.com>
Date: Wed, 19 Feb 2014 14:58:01 +0200

> V0 of this patch was sent before previous net-next got closed, and
> now we would like to resume it.
> 
> Yuval has reworked the affinity hint patch, according to Ben's comments. The
> patch was actually rewritten.
> After a discussion with Yuval Mintz, use of netif_get_num_default_rss_queues()
> is not reverted, but done in the right place. Instead of limiting the number of
> IRQ's for the driver it will limit the number of queues in RSS.
> 
> Patchset was applied and tested against commit: cb6e926 "ipv6:fix checkpatch
> errors with assignment in if condition"

Influencing IRQs to be allocated on the same NUMA node as the one where
the card resides doesn't sound like an mlx4 specific desire to me.

Other devices, both networking and non-networking, probably might like
that as well.

Therefore doing this by hand in a specific driver doesn't seem
appropriate at all.

* Re: net/mlx4: Mellanox driver update 01-01-2014
  2014-02-19 21:50 ` [PATCH net-next V1 0/3] net/mlx4: Mellanox driver update 01-01-2014 David Miller
@ 2014-02-20 20:15   ` Amir Vadai
  2014-02-20 21:11     ` David Miller
  2014-02-22  1:33   ` [PATCH net-next V1 0/3] " Ben Hutchings
  1 sibling, 1 reply; 11+ messages in thread
From: Amir Vadai @ 2014-02-20 20:15 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, yevgenyp, ogerlitz

On 19/02/14 16:50 -0500, David Miller wrote:
> From: Amir Vadai <amirv@mellanox.com>
> Date: Wed, 19 Feb 2014 14:58:01 +0200
> 
> > V0 of this patch was sent before previous net-next got closed, and
> > now we would like to resume it.
> > 
> > Yuval has reworked the affinity hint patch, according to Ben's comments. The
> > patch was actually rewritten.
> > After a discussion with Yuval Mintz, use of netif_get_num_default_rss_queues()
> > is not reverted, but done in the right place. Instead of limiting the number of
> > IRQ's for the driver it will limit the number of queues in RSS.
> > 
> > Patchset was applied and tested against commit: cb6e926 "ipv6:fix checkpatch
> > errors with assignment in if condition"
> 
> Influencing IRQs to be allocated on the same NUMA code as the one where
> the card resides doesn't sound like an mlx4 specific desire to me.
> 
> Other devices, both networking and non-networking, probably might like
> that as well.
> 
> Therefore doing this by hand in a specific driver doesn't seem
> appropriate at all.

I agree. I will try to find a way to make it generic and resubmit this
patch.
Please continue with the review of the other patches by Ido, which fix
the way netif_get_num_default_rss_queues() is used in the Mellanox
driver.

Thanks,
Amir

* Re: net/mlx4: Mellanox driver update 01-01-2014
  2014-02-20 20:15   ` Amir Vadai
@ 2014-02-20 21:11     ` David Miller
  0 siblings, 0 replies; 11+ messages in thread
From: David Miller @ 2014-02-20 21:11 UTC (permalink / raw)
  To: amirv; +Cc: netdev, yevgenyp, ogerlitz

From: Amir Vadai <amirv@mellanox.com>
Date: Thu, 20 Feb 2014 22:15:47 +0200

> Please continue with the review of the other patches by Ido, that will
> fix the way netif_get_num_default_rss_queues() is used in Mellanox
> driver.

Please resubmit the series with this contentious patch omitted if
you'd like us to do that, thank you.

* Re: [PATCH net-next V1 0/3] net/mlx4: Mellanox driver update 01-01-2014
  2014-02-19 21:50 ` [PATCH net-next V1 0/3] net/mlx4: Mellanox driver update 01-01-2014 David Miller
  2014-02-20 20:15   ` Amir Vadai
@ 2014-02-22  1:33   ` Ben Hutchings
  2014-02-23  9:01     ` Amir Vadai
  1 sibling, 1 reply; 11+ messages in thread
From: Ben Hutchings @ 2014-02-22  1:33 UTC (permalink / raw)
  To: David Miller; +Cc: amirv, netdev, yevgenyp, ogerlitz

On Wed, 2014-02-19 at 16:50 -0500, David Miller wrote:
> From: Amir Vadai <amirv@mellanox.com>
> Date: Wed, 19 Feb 2014 14:58:01 +0200
> 
> > V0 of this patch was sent before previous net-next got closed, and
> > now we would like to resume it.
> > 
> > Yuval has reworked the affinity hint patch, according to Ben's comments. The
> > patch was actually rewritten.
> > After a discussion with Yuval Mintz, use of netif_get_num_default_rss_queues()
> > is not reverted, but done in the right place. Instead of limiting the number of
> > IRQ's for the driver it will limit the number of queues in RSS.
> > 
> > Patchset was applied and tested against commit: cb6e926 "ipv6:fix checkpatch
> > errors with assignment in if condition"
> 
> Influencing IRQs to be allocated on the same NUMA code as the one where
> the card resides doesn't sound like an mlx4 specific desire to me.
> 
> Other devices, both networking and non-networking, probably might like
> that as well.
> 
> Therefore doing this by hand in a specific driver doesn't seem
> appropriate at all.

Handling network traffic only on the local node can be really good on
recent Intel processors, where DMA writes will usually go into cache on
the local node.  But on other architectures, AMD processors, older Intel
processors... I don't think there's such a big difference.  Also, where
the system and device implement PCIe Transaction Processing Hints, DMA
writes to cache should work on all nodes (following interrupt
affinity)... in theory.

So this sort of policy not only shouldn't be implemented in specific
drivers, but also ought to be configurable.

Ben.

-- 
Ben Hutchings
I haven't lost my mind; it's backed up on tape somewhere.

* Re: net/mlx4: Mellanox driver update 01-01-2014
  2014-02-22  1:33   ` [PATCH net-next V1 0/3] " Ben Hutchings
@ 2014-02-23  9:01     ` Amir Vadai
  2014-02-23 17:06       ` Ben Hutchings
  0 siblings, 1 reply; 11+ messages in thread
From: Amir Vadai @ 2014-02-23  9:01 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, netdev, yevgenyp, ogerlitz, yuvala

On 22/02/14 01:33 +0000, Ben Hutchings wrote:
> On Wed, 2014-02-19 at 16:50 -0500, David Miller wrote:
> > From: Amir Vadai <amirv@mellanox.com>
> > Date: Wed, 19 Feb 2014 14:58:01 +0200
> > 
> > > V0 of this patch was sent before previous net-next got closed, and
> > > now we would like to resume it.
> > > 
> > > Yuval has reworked the affinity hint patch, according to Ben's comments. The
> > > patch was actually rewritten.
> > > After a discussion with Yuval Mintz, use of netif_get_num_default_rss_queues()
> > > is not reverted, but done in the right place. Instead of limiting the number of
> > > IRQ's for the driver it will limit the number of queues in RSS.
> > > 
> > > Patchset was applied and tested against commit: cb6e926 "ipv6:fix checkpatch
> > > errors with assignment in if condition"
> > 
> > Influencing IRQs to be allocated on the same NUMA code as the one where
> > the card resides doesn't sound like an mlx4 specific desire to me.
> > 
> > Other devices, both networking and non-networking, probably might like
> > that as well.
> > 
> > Therefore doing this by hand in a specific driver doesn't seem
> > appropriate at all.
> 
> Handling network traffic only on the local node can be really good on
> recent Intel processors, where DMA writes will usually go into cache on
> the local node.  But on other architectures, AMD processors, older Intel
> processors... I don't think there's such a big difference.  Also, where
> the system and device implement PCIe Transaction Processing Hints, DMA
> writes to cache should work on all nodes (following interrupt
> affinity)... in theory.
> 
> So this sort of policy not only shouldn't be implemented in specific
> drivers, but also ought to be configurable.
> 
> Ben.

Hi,

The idea here is to prefer a local node over a remote one - not to
*always* use local nodes - a kind of best-effort approach.
The patch is relevant when the number of rings is smaller (or bigger)
than the number of CPUs - in this case the idea is to try to put the
majority of RX queues on the local node, rather than spread them evenly
across the NUMA nodes.
So, on Intel architectures you will get better performance on average,
and on other architectures things will stay the same.

Therefore, I don't think a tunable is needed for that.

If a user wants a more aggressive mode - handling all incoming traffic
only on local nodes - he'll be able to do it by changing the number of
RX channels to the number of local cores - so we actually have the
tunable you had in mind.

Amir

> 
> -- 
> Ben Hutchings
> I haven't lost my mind; it's backed up on tape somewhere.

* Re: net/mlx4: Mellanox driver update 01-01-2014
  2014-02-23  9:01     ` Amir Vadai
@ 2014-02-23 17:06       ` Ben Hutchings
  2014-02-23 17:25         ` Amir Vadai
  0 siblings, 1 reply; 11+ messages in thread
From: Ben Hutchings @ 2014-02-23 17:06 UTC (permalink / raw)
  To: Amir Vadai; +Cc: David Miller, netdev, yevgenyp, ogerlitz, yuvala

On Sun, 2014-02-23 at 11:01 +0200, Amir Vadai wrote:
> On 22/02/14 01:33 +0000, Ben Hutchings wrote:
[...]
> > So this sort of policy not only shouldn't be implemented in specific
> > drivers, but also ought to be configurable.
> > 
> > Ben.
> 
> Hi,
> 
> The idea here is to prefer a local node than a remote one - and not to
> *always* use local nodes - kind of best effort approach.
> The patch is relevant when number of rings is smaller (or bigger) than
> number of CPU's - in this case the idea is to try to put the majority
> of rx queues on the local node, and not to spread it evenly on the
> numa nodes.
> So, for Intel architecture you will get better performance in average,
> and on other architectures, things will stay the same.
> 
> Therefore, I don't think it is needed to have a tunable for that.
> 
> If a user wants to use a more aggressive mode: to handle all incoming
> traffic only by local nodes, he'll be able to do it by changing number
> of rx channels to number of local cores - so we actually have the
> tunable you had in mind.

Right, that does sound pretty good as a default.  And I accept that it
would be reasonable to implement that initially without a tunable beyond
total number of IRQs.

I would like to have a central mechanism for this that would allow the
administrator to set a policy of spreading IRQs across all threads,
cores, packages, local cores, etc.  (The out-of-tree version of sfc has
such options.)  If you add the mechanism and default that you've
proposed, then someone (maybe me) can get round to the configurable
policy later.

Ben.

-- 
Ben Hutchings
All the simple programs have been written, and all the good names taken.

* Re: net/mlx4: Mellanox driver update 01-01-2014
  2014-02-23 17:06       ` Ben Hutchings
@ 2014-02-23 17:25         ` Amir Vadai
  0 siblings, 0 replies; 11+ messages in thread
From: Amir Vadai @ 2014-02-23 17:25 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, netdev, yevgenyp, ogerlitz, yuvala

On 23/02/14 17:06 +0000, Ben Hutchings wrote:
> On Sun, 2014-02-23 at 11:01 +0200, Amir Vadai wrote:
> > On 22/02/14 01:33 +0000, Ben Hutchings wrote:
[...]
> 
> Right, that does sound pretty good as a default.  And I accept that it
> would be reasonable to implement that initially without a tunable beyond
> total number of IRQs.
> 
> I would like to have a central mechanism for this that would allow the
> administrator to set a policy of spreading IRQs across all threads,
> cores, packages, local cores, etc.  (The out-of-tree version of sfc has
> such options.)  If you add the mechanism and default that you've
> proposed, then someone (maybe me) can get round to the configurable
> policy later.
> 
> Ben.
> 

What I'm preparing now is just a very simple helper function that takes
the local NUMA node and the queue number as its inputs and returns the
recommended cpumask.
A driver that would like to use it will use the returned mask to set
the affinity_hint.

This function does seem to be a convenient place to plug in different
policies.
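
Roughly along these lines (the name and signature below are only an
illustration of what I have in mind, not the final API):

#include <linux/cpumask.h>
#include <linux/errno.h>
#include <linux/topology.h>

/*
 * Illustration only: given the device's NUMA node and an RX queue
 * index, pick a recommended CPU - online CPUs on the local node first,
 * then the remaining online CPUs - and return it as a one-CPU mask
 * suitable for the affinity hint.
 */
static int example_queue_affinity_mask(int numa_node, unsigned int queue,
				       struct cpumask *mask)
{
	const struct cpumask *local = (numa_node >= 0) ?
			cpumask_of_node(numa_node) : cpu_online_mask;
	unsigned int target, i = 0;
	int cpu;

	if (!num_online_cpus())
		return -EINVAL;

	/* Wrap the queue index over the online CPUs */
	target = queue % num_online_cpus();

	/* First pass: online CPUs on the device's node */
	for_each_cpu(cpu, local) {
		if (!cpu_online(cpu))
			continue;
		if (i++ == target)
			goto found;
	}
	/* Second pass: the remaining (remote) online CPUs */
	for_each_cpu(cpu, cpu_online_mask) {
		if (cpumask_test_cpu(cpu, local))
			continue;
		if (i++ == target)
			goto found;
	}
	return -EINVAL;

found:
	cpumask_clear(mask);
	cpumask_set_cpu(cpu, mask);
	return 0;
}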

Amir.
