* [PATCH net V1 0/3] Mellanox EN driver fixes 2014-06-23
@ 2014-06-29  8:54 Amir Vadai
  2014-06-29  8:54 ` [PATCH net V1 1/3] net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map Amir Vadai
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Amir Vadai @ 2014-06-29  8:54 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Amir Vadai, Or Gerlitz, Yevgeny Petrilin

Hi,

Below are some fixes to patches submitted to 3.16.

The first patch follows discussions with Ben [1] and Thomas [2]: do not use
the affinity notifier, since it breaks RFS. Instead, detect changes in the
IRQ affinity map by checking whether the current CPU is set in the affinity
map on NAPI poll. The two other patches fix bugs introduced in commit [3].

Patches were applied and tested on top of commit dba6311: ('powerpc: bpf: Fix
the broken LD_VLAN_TAG_PRESENT test')

Thanks,
Amir

[1] - http://www.spinics.net/lists/netdev/msg283096.html
[2] - https://lkml.org/lkml/2014/5/25/31
[3] - 554e101: ('lib/cpumask: cpumask_set_cpu_local_first to use all cores when
      numa node is not defined')

Changes from V0:
- Patch 1/3: net/mlx4_en: Don't use irq_affinity_notifier to track changes...
  - Added missing include to fix build issue on s390x arch

Amir Vadai (3):
  net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ
    affinity map
  lib/cpumask: cpumask_set_cpu_local_first to use all cores when numa
    node is not defined
  net/mlx4_en: IRQ affinity hint is not cleared on port down

 drivers/net/ethernet/mellanox/mlx4/cq.c      |  2 -
 drivers/net/ethernet/mellanox/mlx4/en_cq.c   |  7 ++-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   | 16 +++++--
 drivers/net/ethernet/mellanox/mlx4/en_tx.c   |  6 ---
 drivers/net/ethernet/mellanox/mlx4/eq.c      | 69 ++++------------------------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |  1 +
 include/linux/mlx4/device.h                  |  4 +-
 lib/cpumask.c                                |  2 +-
 8 files changed, 30 insertions(+), 77 deletions(-)

-- 
1.8.3.4

* [PATCH net V1 1/3] net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map
  2014-06-29  8:54 [PATCH net V1 0/3] Mellanox EN driver fixes 2014-06-23 Amir Vadai
@ 2014-06-29  8:54 ` Amir Vadai
  2014-06-30  6:41   ` Eric Dumazet
  2014-06-29  8:54 ` [PATCH net V1 2/3] lib/cpumask: cpumask_set_cpu_local_first to use all cores when numa node is not defined Amir Vadai
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Amir Vadai @ 2014-06-29  8:54 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Amir Vadai, Or Gerlitz, Yevgeny Petrilin,
	Thomas Gleixner, Ben Hutchings

An IRQ can have only a single affinity notifier, and that one is already
used by the cpu_rmap notifier, so it cannot be used to track changes in
the IRQ affinity map. Instead, detect affinity changes by comparing the
current CPU to the IRQ affinity map during NAPI poll.

CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ben Hutchings <ben@decadent.org.uk>
Fixes: 2eacc23 ("net/mlx4_core: Enforce irq affinity changes immediatly")
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/cq.c      |  2 -
 drivers/net/ethernet/mellanox/mlx4/en_cq.c   |  4 ++
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   | 16 +++++--
 drivers/net/ethernet/mellanox/mlx4/en_tx.c   |  6 ---
 drivers/net/ethernet/mellanox/mlx4/eq.c      | 69 ++++------------------------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |  1 +
 include/linux/mlx4/device.h                  |  4 +-
 7 files changed, 28 insertions(+), 74 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c b/drivers/net/ethernet/mellanox/mlx4/cq.c
index 80f7252..56022d6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cq.c
@@ -294,8 +294,6 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent,
 	init_completion(&cq->free);
 
 	cq->irq = priv->eq_table.eq[cq->vector].irq;
-	cq->irq_affinity_change = false;
-
 	return 0;
 
 err_radix:
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
index 4b21307..1213cc7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
@@ -128,6 +128,10 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
 					mlx4_warn(mdev, "Failed assigning an EQ to %s, falling back to legacy EQ's\n",
 						  name);
 				}
+
+				cq->irq_desc =
+					irq_to_desc(mlx4_eq_get_irq(mdev->dev,
+								    cq->vector));
 			}
 		} else {
 			cq->vector = (cq->ring + 1 + priv->port) %
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index d2d4157..9672417 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -40,6 +40,7 @@
 #include <linux/if_ether.h>
 #include <linux/if_vlan.h>
 #include <linux/vmalloc.h>
+#include <linux/irq.h>
 
 #include "mlx4_en.h"
 
@@ -896,16 +897,25 @@ int mlx4_en_poll_rx_cq(struct napi_struct *napi, int budget)
 
 	/* If we used up all the quota - we're probably not done yet... */
 	if (done == budget) {
+		int cpu_curr;
+		const struct cpumask *aff;
+
 		INC_PERF_COUNTER(priv->pstats.napi_quota);
-		if (unlikely(cq->mcq.irq_affinity_change)) {
-			cq->mcq.irq_affinity_change = false;
+
+		cpu_curr = smp_processor_id();
+		aff = irq_desc_get_irq_data(cq->irq_desc)->affinity;
+
+		if (unlikely(!cpumask_test_cpu(cpu_curr, aff))) {
+			/* Current CPU is not in the IRQ affinity map -
+			 * affinity has probably changed; stop this NAPI
+			 * poll and restart it on the right CPU
+			 */
 			napi_complete(napi);
 			mlx4_en_arm_cq(priv, cq);
 			return 0;
 		}
 	} else {
 		/* Done for now */
-		cq->mcq.irq_affinity_change = false;
 		napi_complete(napi);
 		mlx4_en_arm_cq(priv, cq);
 	}
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 8be7483..ac3dead 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -474,15 +474,9 @@ int mlx4_en_poll_tx_cq(struct napi_struct *napi, int budget)
 	/* If we used up all the quota - we're probably not done yet... */
 	if (done < budget) {
 		/* Done for now */
-		cq->mcq.irq_affinity_change = false;
 		napi_complete(napi);
 		mlx4_en_arm_cq(priv, cq);
 		return done;
-	} else if (unlikely(cq->mcq.irq_affinity_change)) {
-		cq->mcq.irq_affinity_change = false;
-		napi_complete(napi);
-		mlx4_en_arm_cq(priv, cq);
-		return 0;
 	}
 	return budget;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
index d954ec1..2a004b3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
@@ -53,11 +53,6 @@ enum {
 	MLX4_EQ_ENTRY_SIZE	= 0x20
 };
 
-struct mlx4_irq_notify {
-	void *arg;
-	struct irq_affinity_notify notify;
-};
-
 #define MLX4_EQ_STATUS_OK	   ( 0 << 28)
 #define MLX4_EQ_STATUS_WRITE_FAIL  (10 << 28)
 #define MLX4_EQ_OWNER_SW	   ( 0 << 24)
@@ -1088,57 +1083,6 @@ static void mlx4_unmap_clr_int(struct mlx4_dev *dev)
 	iounmap(priv->clr_base);
 }
 
-static void mlx4_irq_notifier_notify(struct irq_affinity_notify *notify,
-				     const cpumask_t *mask)
-{
-	struct mlx4_irq_notify *n = container_of(notify,
-						 struct mlx4_irq_notify,
-						 notify);
-	struct mlx4_priv *priv = (struct mlx4_priv *)n->arg;
-	struct radix_tree_iter iter;
-	void **slot;
-
-	radix_tree_for_each_slot(slot, &priv->cq_table.tree, &iter, 0) {
-		struct mlx4_cq *cq = (struct mlx4_cq *)(*slot);
-
-		if (cq->irq == notify->irq)
-			cq->irq_affinity_change = true;
-	}
-}
-
-static void mlx4_release_irq_notifier(struct kref *ref)
-{
-	struct mlx4_irq_notify *n = container_of(ref, struct mlx4_irq_notify,
-						 notify.kref);
-	kfree(n);
-}
-
-static void mlx4_assign_irq_notifier(struct mlx4_priv *priv,
-				     struct mlx4_dev *dev, int irq)
-{
-	struct mlx4_irq_notify *irq_notifier = NULL;
-	int err = 0;
-
-	irq_notifier = kzalloc(sizeof(*irq_notifier), GFP_KERNEL);
-	if (!irq_notifier) {
-		mlx4_warn(dev, "Failed to allocate irq notifier. irq %d\n",
-			  irq);
-		return;
-	}
-
-	irq_notifier->notify.irq = irq;
-	irq_notifier->notify.notify = mlx4_irq_notifier_notify;
-	irq_notifier->notify.release = mlx4_release_irq_notifier;
-	irq_notifier->arg = priv;
-	err = irq_set_affinity_notifier(irq, &irq_notifier->notify);
-	if (err) {
-		kfree(irq_notifier);
-		irq_notifier = NULL;
-		mlx4_warn(dev, "Failed to set irq notifier. irq %d\n", irq);
-	}
-}
-
-
 int mlx4_alloc_eq_table(struct mlx4_dev *dev)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
@@ -1409,8 +1353,6 @@ int mlx4_assign_eq(struct mlx4_dev *dev, char *name, struct cpu_rmap *rmap,
 				continue;
 				/*we dont want to break here*/
 			}
-			mlx4_assign_irq_notifier(priv, dev,
-						 priv->eq_table.eq[vec].irq);
 
 			eq_set_ci(&priv->eq_table.eq[vec], 1);
 		}
@@ -1427,6 +1369,14 @@ int mlx4_assign_eq(struct mlx4_dev *dev, char *name, struct cpu_rmap *rmap,
 }
 EXPORT_SYMBOL(mlx4_assign_eq);
 
+int mlx4_eq_get_irq(struct mlx4_dev *dev, int vec)
+{
+	struct mlx4_priv *priv = mlx4_priv(dev);
+
+	return priv->eq_table.eq[vec].irq;
+}
+EXPORT_SYMBOL(mlx4_eq_get_irq);
+
 void mlx4_release_eq(struct mlx4_dev *dev, int vec)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
@@ -1438,9 +1388,6 @@ void mlx4_release_eq(struct mlx4_dev *dev, int vec)
 		  Belonging to a legacy EQ*/
 		mutex_lock(&priv->msix_ctl.pool_lock);
 		if (priv->msix_ctl.pool_bm & 1ULL << i) {
-			irq_set_affinity_notifier(
-				priv->eq_table.eq[vec].irq,
-				NULL);
 			free_irq(priv->eq_table.eq[vec].irq,
 				 &priv->eq_table.eq[vec]);
 			priv->msix_ctl.pool_bm &= ~(1ULL << i);
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 0e15295..624e193 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -343,6 +343,7 @@ struct mlx4_en_cq {
 #define CQ_USER_PEND (MLX4_EN_CQ_STATE_POLL | MLX4_EN_CQ_STATE_POLL_YIELD)
 	spinlock_t poll_lock; /* protects from LLS/napi conflicts */
 #endif  /* CONFIG_NET_RX_BUSY_POLL */
+	struct irq_desc *irq_desc;
 };
 
 struct mlx4_en_port_profile {
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index b12f4bb..35b51e7 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -578,8 +578,6 @@ struct mlx4_cq {
 	u32			cons_index;
 
 	u16                     irq;
-	bool                    irq_affinity_change;
-
 	__be32		       *set_ci_db;
 	__be32		       *arm_db;
 	int			arm_sn;
@@ -1167,6 +1165,8 @@ int mlx4_assign_eq(struct mlx4_dev *dev, char *name, struct cpu_rmap *rmap,
 		   int *vector);
 void mlx4_release_eq(struct mlx4_dev *dev, int vec);
 
+int mlx4_eq_get_irq(struct mlx4_dev *dev, int vec);
+
 int mlx4_get_phys_port_id(struct mlx4_dev *dev);
 int mlx4_wol_read(struct mlx4_dev *dev, u64 *config, int port);
 int mlx4_wol_write(struct mlx4_dev *dev, u64 config, int port);
-- 
1.8.3.4

* [PATCH net V1 2/3] lib/cpumask: cpumask_set_cpu_local_first to use all cores when numa node is not defined
  2014-06-29  8:54 [PATCH net V1 0/3] Mellanox EN driver fixes 2014-06-23 Amir Vadai
  2014-06-29  8:54 ` [PATCH net V1 1/3] net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map Amir Vadai
@ 2014-06-29  8:54 ` Amir Vadai
  2014-06-29  8:54 ` [PATCH net V1 3/3] net/mlx4_en: IRQ affinity hint is not cleared on port down Amir Vadai
  2014-07-03  1:29 ` [PATCH net V1 0/3] Mellanox EN driver fixes 2014-06-23 David Miller
  3 siblings, 0 replies; 11+ messages in thread
From: Amir Vadai @ 2014-06-29  8:54 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Amir Vadai, Or Gerlitz, Yevgeny Petrilin

When the device is not NUMA aware (numa_node == -1), use all online CPUs.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
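For context, here is a usage sketch (not part of this patch) of how a NIC
driver might call cpumask_set_cpu_local_first() when spreading IRQ affinity
hints across rings. The helper below is hypothetical. Note that the mask
handed to irq_set_affinity_hint() must stay allocated until the hint is
cleared with irq_set_affinity_hint(irq, NULL) - the lifetime issue that
patch 3/3 addresses.

	/* Hypothetical sketch: pick the ring_idx'th CPU - local to
	 * numa_node when it is defined, any online CPU otherwise
	 * (the case this patch fixes) - and publish that single-CPU
	 * mask as the IRQ's affinity hint.
	 */
	static int set_ring_affinity_hint(int irq, int ring_idx, int numa_node,
					  cpumask_t *affinity_mask)
	{
		int err;

		err = cpumask_set_cpu_local_first(ring_idx, numa_node,
						  affinity_mask);
		if (err)
			return err;

		return irq_set_affinity_hint(irq, affinity_mask);
	}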
 lib/cpumask.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/cpumask.c b/lib/cpumask.c
index c101230..b6513a9 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -191,7 +191,7 @@ int cpumask_set_cpu_local_first(int i, int numa_node, cpumask_t *dstp)
 
 	i %= num_online_cpus();
 
-	if (!cpumask_of_node(numa_node)) {
+	if (numa_node == -1 || !cpumask_of_node(numa_node)) {
 		/* Use all online cpu's for non numa aware system */
 		cpumask_copy(mask, cpu_online_mask);
 	} else {
-- 
1.8.3.4

* [PATCH net V1 3/3] net/mlx4_en: IRQ affinity hint is not cleared on port down
  2014-06-29  8:54 [PATCH net V1 0/3] Mellanox EN driver fixes 2014-06-23 Amir Vadai
  2014-06-29  8:54 ` [PATCH net V1 1/3] net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map Amir Vadai
  2014-06-29  8:54 ` [PATCH net V1 2/3] lib/cpumask: cpumask_set_cpu_local_first to use all cores when numa node is not defined Amir Vadai
@ 2014-06-29  8:54 ` Amir Vadai
  2014-07-03  1:29 ` [PATCH net V1 0/3] Mellanox EN driver fixes 2014-06-23 David Miller
  3 siblings, 0 replies; 11+ messages in thread
From: Amir Vadai @ 2014-06-29  8:54 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Amir Vadai, Or Gerlitz, Yevgeny Petrilin

The affinity hint must be removed in mlx4_en_deactivate_cq() and not in
mlx4_en_destroy_cq(), since otherwise affinity_mask might be freed while
still being referenced by procfs.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_cq.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
index 1213cc7..14c0004 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
@@ -191,8 +191,6 @@ void mlx4_en_destroy_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq **pcq)
 	mlx4_en_unmap_buffer(&cq->wqres.buf);
 	mlx4_free_hwq_res(mdev->dev, &cq->wqres, cq->buf_size);
 	if (priv->mdev->dev->caps.comp_pool && cq->vector) {
-		if (!cq->is_tx)
-			irq_set_affinity_hint(cq->mcq.irq, NULL);
 		mlx4_release_eq(priv->mdev->dev, cq->vector);
 	}
 	cq->vector = 0;
@@ -208,6 +206,7 @@ void mlx4_en_deactivate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq)
 	if (!cq->is_tx) {
 		napi_hash_del(&cq->napi);
 		synchronize_rcu();
+		irq_set_affinity_hint(cq->mcq.irq, NULL);
 	}
 	netif_napi_del(&cq->napi);
 
-- 
1.8.3.4

* Re: [PATCH net V1 1/3] net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map
  2014-06-29  8:54 ` [PATCH net V1 1/3] net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map Amir Vadai
@ 2014-06-30  6:41   ` Eric Dumazet
  2014-06-30  8:34     ` Amir Vadai
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2014-06-30  6:41 UTC (permalink / raw)
  To: Amir Vadai
  Cc: David S. Miller, netdev, Or Gerlitz, Yevgeny Petrilin,
	Thomas Gleixner, Ben Hutchings

On Sun, 2014-06-29 at 11:54 +0300, Amir Vadai wrote:
> An IRQ can have only a single affinity notifier, and that one is already
> used by the cpu_rmap notifier, so it cannot be used to track changes in
> the IRQ affinity map. Instead, detect affinity changes by comparing the
> current CPU to the IRQ affinity map during NAPI poll.

...

> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> index 8be7483..ac3dead 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> @@ -474,15 +474,9 @@ int mlx4_en_poll_tx_cq(struct napi_struct *napi, int budget)
>  	/* If we used up all the quota - we're probably not done yet... */
>  	if (done < budget) {
>  		/* Done for now */
> -		cq->mcq.irq_affinity_change = false;
>  		napi_complete(napi);
>  		mlx4_en_arm_cq(priv, cq);
>  		return done;
> -	} else if (unlikely(cq->mcq.irq_affinity_change)) {
> -		cq->mcq.irq_affinity_change = false;
> -		napi_complete(napi);
> -		mlx4_en_arm_cq(priv, cq);
> -		return 0;
>  	}
>  	return budget;
>  }

It seems nothing is done for the TX side after this patch, then?

You might want to drain the whole queue instead of limiting to a 'budget';
otherwise, a cpu might be stuck servicing (soft)irq for the TX
completion, even if irq affinities say otherwise.
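
For illustration, a minimal sketch of that pattern - reap all TX completions
regardless of 'budget', and only complete NAPI once the queue is empty. The
bool return of mlx4_en_process_tx_cq() is an assumption for this sketch (the
in-tree helper does not report completion status at this point):

	static int mlx4_en_poll_tx_cq(struct napi_struct *napi, int budget)
	{
		struct mlx4_en_cq *cq = container_of(napi, struct mlx4_en_cq,
						     napi);
		struct mlx4_en_priv *priv = netdev_priv(cq->dev);

		/* Ignore 'budget': process every completed descriptor. */
		if (!mlx4_en_process_tx_cq(cq->dev, cq))
			return budget;	/* CQ not fully drained; poll again */

		/* Fully drained: finish NAPI and re-arm the CQ. */
		napi_complete(napi);
		mlx4_en_arm_cq(priv, cq);
		return 0;
	}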

* Re: [PATCH net V1 1/3] net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map
  2014-06-30  6:41   ` Eric Dumazet
@ 2014-06-30  8:34     ` Amir Vadai
  2014-06-30  9:11       ` Eric Dumazet
  2014-07-01  3:33       ` David Miller
  0 siblings, 2 replies; 11+ messages in thread
From: Amir Vadai @ 2014-06-30  8:34 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, netdev, Or Gerlitz, Yevgeny Petrilin,
	Thomas Gleixner, Ben Hutchings, amira, Yuval Atias

On 6/30/2014 9:41 AM, Eric Dumazet wrote:
> On Sun, 2014-06-29 at 11:54 +0300, Amir Vadai wrote:
>> An IRQ can have only a single affinity notifier, and that one is already
>> used by the cpu_rmap notifier, so it cannot be used to track changes in
>> the IRQ affinity map. Instead, detect affinity changes by comparing the
>> current CPU to the IRQ affinity map during NAPI poll.
> 
> ...
> 
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> index 8be7483..ac3dead 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> @@ -474,15 +474,9 @@ int mlx4_en_poll_tx_cq(struct napi_struct *napi, int budget)
>>  	/* If we used up all the quota - we're probably not done yet... */
>>  	if (done < budget) {
>>  		/* Done for now */
>> -		cq->mcq.irq_affinity_change = false;
>>  		napi_complete(napi);
>>  		mlx4_en_arm_cq(priv, cq);
>>  		return done;
>> -	} else if (unlikely(cq->mcq.irq_affinity_change)) {
>> -		cq->mcq.irq_affinity_change = false;
>> -		napi_complete(napi);
>> -		mlx4_en_arm_cq(priv, cq);
>> -		return 0;
>>  	}
>>  	return budget;
>>  }
> 
> It seems nothing is done for the TX side after this patch, then?
> 
> You might want to drain the whole queue instead of limiting to a 'budget';
> otherwise, a cpu might be stuck servicing (soft)irq for the TX
> completion, even if irq affinities say otherwise.
> 

TX completions are very quick compared to the skb preparation and
sending, which is not the case for RX completions. Because of that, it
is very easy to reproduce the problem in RX flows, but we never had any
report of that problem in the TX flow.
I prefer not to spend time on the TX, since we plan to send a patch soon
to use the same NAPI for both TX and RX.

Amir

* Re: [PATCH net V1 1/3] net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map
  2014-06-30  8:34     ` Amir Vadai
@ 2014-06-30  9:11       ` Eric Dumazet
  2014-06-30 10:43         ` Amir Vadai
  2014-07-01  3:33       ` David Miller
  1 sibling, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2014-06-30  9:11 UTC (permalink / raw)
  To: amirv
  Cc: David S. Miller, netdev, Or Gerlitz, Yevgeny Petrilin,
	Thomas Gleixner, Ben Hutchings, amira, Yuval Atias

On Mon, 2014-06-30 at 11:34 +0300, Amir Vadai wrote:

> TX completions are very quick compared to the skb preparation and
> sending, which is not the case for RX completions. Because of that, it
> is very easy to reproduce the problem in RX flows, but we never had any
> report of that problem in the TX flow.

This is because reporters probably use the same number of RX and TX queues.

With TCP Small Queues, TX completions are not always quick if thousands
of flows are active.

Some people hit a locked CPU when, say, one CPU has to drain 8 TX
queues, because 7 other CPUs can continuously feed more packets.

> I prefer not to spend time on the TX, since we plan to send a patch soon
> to use the same NAPI for both TX and RX.

Thanks, this sounds great.

* Re: [PATCH net V1 1/3] net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map
  2014-06-30  9:11       ` Eric Dumazet
@ 2014-06-30 10:43         ` Amir Vadai
  0 siblings, 0 replies; 11+ messages in thread
From: Amir Vadai @ 2014-06-30 10:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, netdev, Or Gerlitz, Yevgeny Petrilin,
	Thomas Gleixner, Ben Hutchings, amira, Yuval Atias

On 6/30/2014 12:11 PM, Eric Dumazet wrote:
> On Mon, 2014-06-30 at 11:34 +0300, Amir Vadai wrote:
> 
>> TX completions are very quick compared to the skb preparation and
>> sending, which is not the case for RX completions. Because of that, it
>> is very easy to reproduce the problem in RX flows, but we never had any
>> report of that problem in the TX flow.
> 
> This is because reporters probably use the same number of RX and TX queues.
> 
> With TCP Small Queues, TX completions are not always quick if thousands
> of flows are active.
> 
> Some people hit a locked CPU when, say, one CPU has to drain 8 TX
> queues, because 7 other CPUs can continuously feed more packets.
> 
>> I prefer not to spend time on the TX, since we plan to send a patch soon
>> to use the same NAPI for both TX and RX.
> 
> Thanks, this sounds great.
> 
> 

Ok, so unless anyone objects, the plan is to continue with this patch
(RX only). If the unified NAPI patch is delayed, I will send the TX
fix too.

Amir

* Re: [PATCH net V1 1/3] net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map
  2014-06-30  8:34     ` Amir Vadai
  2014-06-30  9:11       ` Eric Dumazet
@ 2014-07-01  3:33       ` David Miller
  2014-07-01  9:14         ` Amir Vadai
  1 sibling, 1 reply; 11+ messages in thread
From: David Miller @ 2014-07-01  3:33 UTC (permalink / raw)
  To: amirv, amirv.mellanox
  Cc: eric.dumazet, netdev, ogerlitz, yevgenyp, tglx, ben, amira, yuvala

From: Amir Vadai <amirv.mellanox@gmail.com>
Date: Mon, 30 Jun 2014 11:34:22 +0300

> On 6/30/2014 9:41 AM, Eric Dumazet wrote:
>> You might want to drain the whole queue instead of limiting to a 'budget';
>> otherwise, a cpu might be stuck servicing (soft)irq for the TX
>> completion, even if irq affinities say otherwise.
>> 
> 
> TX completions are very quick compared to the skb preparation and
> sending, which is not the case for RX completions. Because of that, it
> is very easy to reproduce the problem in RX flows, but we never had any
> report of that problem in the TX flow.
> I prefer not to spend time on the TX, since we plan to send a patch soon
> to use the same NAPI for both TX and RX.

It is always advised to completely ignore the budget for TX work; this is
what we tell every driver author when discussing NAPI implementations.

Please make your driver conform to this, thanks.

* Re: [PATCH net V1 1/3] net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map
  2014-07-01  3:33       ` David Miller
@ 2014-07-01  9:14         ` Amir Vadai
  0 siblings, 0 replies; 11+ messages in thread
From: Amir Vadai @ 2014-07-01  9:14 UTC (permalink / raw)
  To: David Miller, amirv.mellanox
  Cc: eric.dumazet, netdev, ogerlitz, yevgenyp, tglx, ben, amira, yuvala

On 07/01/2014 06:33 AM, David Miller wrote:
> From: Amir Vadai <amirv.mellanox@gmail.com>
> Date: Mon, 30 Jun 2014 11:34:22 +0300
>
>> On 6/30/2014 9:41 AM, Eric Dumazet wrote:
>>> You might want to drain the whole queue instead of limiting to a 'budget';
>>> otherwise, a cpu might be stuck servicing (soft)irq for the TX
>>> completion, even if irq affinities say otherwise.
>>>
>>
>> TX completions are very quick compared to the skb preparation and
>> sending, which is not the case for RX completions. Because of that, it
>> is very easy to reproduce the problem in RX flows, but we never had any
>> report of that problem in the TX flow.
>> I prefer not to spend time on the TX, since we plan to send a patch soon
>> to use the same NAPI for both TX and RX.
>
> It is always advised to completely ignore the budget for TX work; this is
> what we tell every driver author when discussing NAPI implementations.
>
> Please make your driver conform to this, thanks.
>

Ok.

Please continue the process on this V1 of the patchset. The fix to the TX
poll is not related to this patch - this patch fixes a regression that
broke aRFS in mlx4_en.

I will send a separate fix to purge all packets on TX work later this
week.

Thanks,
Amir

* Re: [PATCH net V1 0/3] Mellanox EN driver fixes 2014-06-23
  2014-06-29  8:54 [PATCH net V1 0/3] Mellanox EN driver fixes 2014-06-23 Amir Vadai
                   ` (2 preceding siblings ...)
  2014-06-29  8:54 ` [PATCH net V1 3/3] net/mlx4_en: IRQ affinity hint is not cleared on port down Amir Vadai
@ 2014-07-03  1:29 ` David Miller
  3 siblings, 0 replies; 11+ messages in thread
From: David Miller @ 2014-07-03  1:29 UTC (permalink / raw)
  To: amirv; +Cc: netdev, ogerlitz, yevgenyp

From: Amir Vadai <amirv@mellanox.com>
Date: Sun, 29 Jun 2014 11:54:54 +0300

> Below are some fixes to patches submitted to 3.16.
> 
> The first patch follows discussions with Ben [1] and Thomas [2]: do not use
> the affinity notifier, since it breaks RFS. Instead, detect changes in the
> IRQ affinity map by checking whether the current CPU is set in the affinity
> map on NAPI poll. The two other patches fix bugs introduced in commit [3].
> 
> Patches were applied and tested on top of commit dba6311: ('powerpc: bpf: Fix
> the broken LD_VLAN_TAG_PRESENT test')

Series applied, thanks.
