* [PATCH for-next 0/2] IB/mlx5: Use tasklet to decrease completions processing time in interrupts
@ 2016-04-17 14:08 Matan Barak
       [not found] ` <1460902121-5567-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Matan Barak @ 2016-04-17 14:08 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Majd Dibbiny, Matan Barak

Hi Doug,

The mlx5 driver handles completion callbacks inside interrupts.
These callbacks could be lengthy and thus cause hard lockups.
In order to avoid these lockups, we introduce a tasklet mechanism.
The mlx5_ib driver uses this mechanism in order to defer its
completion callbacks processing to the tasklet.

This follows the same mechanism we implemented for mlx4, which
successfully decreased the processing time spent in interrupts.

Regards,
Matan

Matan Barak (2):
  net/mlx5_core: Use tasklet for user-space CQ completion events
  IB/mlx5: Fire the CQ completion handler from tasklet

 drivers/infiniband/hw/mlx5/cq.c                    |  5 +-
 drivers/net/ethernet/mellanox/mlx5/core/cq.c       | 59 ++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/eq.c       | 12 ++++-
 drivers/net/ethernet/mellanox/mlx5/core/main.c     | 17 +++++++
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h    |  2 +
 include/linux/mlx5/cq.h                            |  5 ++
 include/linux/mlx5/driver.h                        | 10 ++++
 7 files changed, 108 insertions(+), 2 deletions(-)

-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH for-next 1/2] net/mlx5_core: Use tasklet for user-space CQ completion events
       [not found] ` <1460902121-5567-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-04-17 14:08   ` Matan Barak
  2016-04-17 14:08   ` [PATCH for-next 2/2] IB/mlx5: Fire the CQ completion handler from tasklet Matan Barak
  2016-04-18 13:04   ` [PATCH for-next 0/2] IB/mlx5: Use tasklet to decrease completions processing time in interrupts Christoph Hellwig
  2 siblings, 0 replies; 9+ messages in thread
From: Matan Barak @ 2016-04-17 14:08 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Majd Dibbiny, Matan Barak

Previously, we've fired all our completion callbacks straight from
our ISR.

Some of those callbacks were lightweight (for example, mlx5 Ethernet
napi callbacks), but some of them did more work (for example,
the user-space RDMA stack uverbs' completion handler). Beyond that,
doing more than the minimal work in an ISR is generally considered
wrong, and it could even lead to a hard lockup of the system: when the
hardware generates a lot of completion events, the loop over those
events can run long enough that the system watchdog declares a hard
lockup.

In order to avoid that, add a new way of invoking completion events
callbacks. In the interrupt itself, we add the CQs which receive
completion event to a per-EQ list and schedule a tasklet. In the
tasklet context we loop over all the CQs in the list and invoke the
user callback.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/cq.c       | 59 ++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/eq.c       | 12 ++++-
 drivers/net/ethernet/mellanox/mlx5/core/main.c     | 17 +++++++
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h    |  2 +
 include/linux/mlx5/cq.h                            |  5 ++
 include/linux/mlx5/driver.h                        | 10 ++++
 6 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
index b51e42d..873a631 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
@@ -39,6 +39,53 @@
 #include <linux/mlx5/cq.h>
 #include "mlx5_core.h"
 
+#define TASKLET_MAX_TIME 2
+#define TASKLET_MAX_TIME_JIFFIES msecs_to_jiffies(TASKLET_MAX_TIME)
+
+void mlx5_cq_tasklet_cb(unsigned long data)
+{
+	unsigned long flags;
+	unsigned long end = jiffies + TASKLET_MAX_TIME_JIFFIES;
+	struct mlx5_eq_tasklet *ctx = (struct mlx5_eq_tasklet *)data;
+	struct mlx5_core_cq *mcq;
+	struct mlx5_core_cq *temp;
+
+	spin_lock_irqsave(&ctx->lock, flags);
+	list_splice_tail_init(&ctx->list, &ctx->process_list);
+	spin_unlock_irqrestore(&ctx->lock, flags);
+
+	list_for_each_entry_safe(mcq, temp, &ctx->process_list,
+				 tasklet_ctx.list) {
+		list_del_init(&mcq->tasklet_ctx.list);
+		mcq->tasklet_ctx.comp(mcq);
+		if (atomic_dec_and_test(&mcq->refcount))
+			complete(&mcq->free);
+		if (time_after(jiffies, end))
+			break;
+	}
+
+	if (!list_empty(&ctx->process_list))
+		tasklet_schedule(&ctx->task);
+}
+
+static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq)
+{
+	unsigned long flags;
+	struct mlx5_eq_tasklet *tasklet_ctx = cq->tasklet_ctx.priv;
+
+	spin_lock_irqsave(&tasklet_ctx->lock, flags);
+	/* When migrating CQs between EQs will be implemented, please note
+	 * that you need to sync this point. It is possible that
+	 * while migrating a CQ, completions on the old EQs could
+	 * still arrive.
+	 */
+	if (list_empty_careful(&cq->tasklet_ctx.list)) {
+		atomic_inc(&cq->refcount);
+		list_add_tail(&cq->tasklet_ctx.list, &tasklet_ctx->list);
+	}
+	spin_unlock_irqrestore(&tasklet_ctx->lock, flags);
+}
+
 void mlx5_cq_completion(struct mlx5_core_dev *dev, u32 cqn)
 {
 	struct mlx5_core_cq *cq;
@@ -96,6 +143,13 @@ int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
 	struct mlx5_create_cq_mbox_out out;
 	struct mlx5_destroy_cq_mbox_in din;
 	struct mlx5_destroy_cq_mbox_out dout;
+	int eqn = MLX5_GET(cqc, MLX5_ADDR_OF(create_cq_in, in, cq_context),
+			   c_eqn);
+	struct mlx5_eq *eq;
+
+	eq = mlx5_eqn2eq(dev, eqn);
+	if (IS_ERR(eq))
+		return PTR_ERR(eq);
 
 	in->hdr.opcode = cpu_to_be16(MLX5_CMD_OP_CREATE_CQ);
 	memset(&out, 0, sizeof(out));
@@ -111,6 +165,11 @@ int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
 	cq->arm_sn     = 0;
 	atomic_set(&cq->refcount, 1);
 	init_completion(&cq->free);
+	if (!cq->comp)
+		cq->comp = mlx5_add_cq_to_tasklet;
+	/* assuming CQ will be deleted before the EQ */
+	cq->tasklet_ctx.priv = &eq->tasklet_ctx;
+	INIT_LIST_HEAD(&cq->tasklet_ctx.list);
 
 	spin_lock_irq(&table->lock);
 	err = radix_tree_insert(&table->tree, cq->cqn, cq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 18fccec..0e30602 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -202,7 +202,7 @@ static int mlx5_eq_int(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
 	struct mlx5_eqe *eqe;
 	int eqes_found = 0;
 	int set_ci = 0;
-	u32 cqn;
+	u32 cqn = -1;
 	u32 rsn;
 	u8 port;
 
@@ -320,6 +320,9 @@ static int mlx5_eq_int(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
 
 	eq_update_ci(eq, 1);
 
+	if (cqn != -1)
+		tasklet_schedule(&eq->tasklet_ctx.task);
+
 	return eqes_found;
 }
 
@@ -403,6 +406,12 @@ int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 vecidx,
 	if (err)
 		goto err_irq;
 
+	INIT_LIST_HEAD(&eq->tasklet_ctx.list);
+	INIT_LIST_HEAD(&eq->tasklet_ctx.process_list);
+	spin_lock_init(&eq->tasklet_ctx.lock);
+	tasklet_init(&eq->tasklet_ctx.task, mlx5_cq_tasklet_cb,
+		     (unsigned long)&eq->tasklet_ctx);
+
 	/* EQs are created in ARMED state
 	 */
 	eq_update_ci(eq, 1);
@@ -436,6 +445,7 @@ int mlx5_destroy_unmap_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
 		mlx5_core_warn(dev, "failed to destroy a previously created eq: eqn %d\n",
 			       eq->eqn);
 	synchronize_irq(eq->irqn);
+	tasklet_disable(&eq->tasklet_ctx.task);
 	mlx5_buf_free(dev, &eq->buf);
 
 	return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 3f3b2fa..515b640 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -660,6 +660,23 @@ int mlx5_vector2eqn(struct mlx5_core_dev *dev, int vector, int *eqn,
 }
 EXPORT_SYMBOL(mlx5_vector2eqn);
 
+struct mlx5_eq *mlx5_eqn2eq(struct mlx5_core_dev *dev, int eqn)
+{
+	struct mlx5_eq_table *table = &dev->priv.eq_table;
+	struct mlx5_eq *eq;
+
+	spin_lock(&table->lock);
+	list_for_each_entry(eq, &table->comp_eqs_list, list)
+		if (eq->eqn == eqn) {
+			spin_unlock(&table->lock);
+			return eq;
+		}
+
+	spin_unlock(&table->lock);
+
+	return ERR_PTR(-ENOENT);
+}
+
 static void free_comp_eqs(struct mlx5_core_dev *dev)
 {
 	struct mlx5_eq_table *table = &dev->priv.eq_table;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 0b0b226..f0d8704 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -100,6 +100,8 @@ int mlx5_core_disable_hca(struct mlx5_core_dev *dev, u16 func_id);
 int mlx5_wait_for_vf_pages(struct mlx5_core_dev *dev);
 cycle_t mlx5_read_internal_timer(struct mlx5_core_dev *dev);
 u32 mlx5_get_msix_vec(struct mlx5_core_dev *dev, int vecidx);
+struct mlx5_eq *mlx5_eqn2eq(struct mlx5_core_dev *dev, int eqn);
+void mlx5_cq_tasklet_cb(unsigned long data);
 
 void mlx5e_init(void);
 void mlx5e_cleanup(void);
diff --git a/include/linux/mlx5/cq.h b/include/linux/mlx5/cq.h
index b2c9fad..2be976d 100644
--- a/include/linux/mlx5/cq.h
+++ b/include/linux/mlx5/cq.h
@@ -53,6 +53,11 @@ struct mlx5_core_cq {
 	unsigned		arm_sn;
 	struct mlx5_rsc_debug	*dbg;
 	int			pid;
+	struct {
+		struct list_head list;
+		void (*comp)(struct mlx5_core_cq *);
+		void		*priv;
+	} tasklet_ctx;
 };
 
 
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index dcd5ac8..816fe00 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -41,6 +41,7 @@
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/radix-tree.h>
+#include <linux/interrupt.h>
 
 #include <linux/mlx5/device.h>
 #include <linux/mlx5/doorbell.h>
@@ -304,6 +305,14 @@ struct mlx5_buf {
 	u8			page_shift;
 };
 
+struct mlx5_eq_tasklet {
+	struct list_head list;
+	struct list_head process_list;
+	struct tasklet_struct task;
+	/* lock on completion tasklet list */
+	spinlock_t lock;
+};
+
 struct mlx5_eq {
 	struct mlx5_core_dev   *dev;
 	__be32 __iomem	       *doorbell;
@@ -317,6 +326,7 @@ struct mlx5_eq {
 	struct list_head	list;
 	int			index;
 	struct mlx5_rsc_debug	*dbg;
+	struct mlx5_eq_tasklet	tasklet_ctx;
 };
 
 struct mlx5_core_psv {
-- 
2.5.0


* [PATCH for-next 2/2] IB/mlx5: Fire the CQ completion handler from tasklet
       [not found] ` <1460902121-5567-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-04-17 14:08   ` [PATCH for-next 1/2] net/mlx5_core: Use tasklet for user-space CQ completion events Matan Barak
@ 2016-04-17 14:08   ` Matan Barak
  2016-04-18 13:04   ` [PATCH for-next 0/2] IB/mlx5: Use tasklet to decrease completions processing time in interrupts Christoph Hellwig
  2 siblings, 0 replies; 9+ messages in thread
From: Matan Barak @ 2016-04-17 14:08 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Majd Dibbiny, Matan Barak

Previously, mlx5_ib_cq_comp was executed from interrupt context.
Under heavy load, this could keep the CPU core in interrupt context
for too long.
Instead of executing the handler from the interrupt context, we now
execute it from the much friendlier tasklet context.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/cq.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index a00ba44..dabcc65 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -879,7 +879,10 @@ struct ib_cq *mlx5_ib_create_cq(struct ib_device *ibdev,
 
 	mlx5_ib_dbg(dev, "cqn 0x%x\n", cq->mcq.cqn);
 	cq->mcq.irqn = irqn;
-	cq->mcq.comp  = mlx5_ib_cq_comp;
+	if (context)
+		cq->mcq.tasklet_ctx.comp = mlx5_ib_cq_comp;
+	else
+		cq->mcq.comp  = mlx5_ib_cq_comp;
 	cq->mcq.event = mlx5_ib_cq_event;
 
 	INIT_LIST_HEAD(&cq->wc_list);
-- 
2.5.0


* Re: [PATCH for-next 0/2] IB/mlx5: Use tasklet to decrease completions processing time in interrupts
       [not found] ` <1460902121-5567-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-04-17 14:08   ` [PATCH for-next 1/2] net/mlx5_core: Use tasklet for user-space CQ completion events Matan Barak
  2016-04-17 14:08   ` [PATCH for-next 2/2] IB/mlx5: Fire the CQ completion handler from tasklet Matan Barak
@ 2016-04-18 13:04   ` Christoph Hellwig
       [not found]     ` <20160418130456.GA11508-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2016-04-18 13:04 UTC (permalink / raw)
  To: Matan Barak; +Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Majd Dibbiny

On Sun, Apr 17, 2016 at 05:08:39PM +0300, Matan Barak wrote:
> Hi Doug,
> 
> The mlx5 driver handles completion callbacks inside interrupts.
> These callbacks could be lengthy and thus cause hard lockups.
> In order to avoid these lockups, we introduce a tasklet mechanism.
> The mlx5_ib driver uses this mechanism in order to defer its
> completion callbacks processing to the tasklet.
> 
> This follows the same mechanism we implemented for mlx4 that
> successfully decreased the processing time in interrupts.

Just curious:  how much of this time is spent inside the mlx5 driver,
and how much is spent in the callbacks from the consumers?  We're now
more than halfway done with switching the kernel ULPs to the new CQ
API, which will always offload the callbacks to softirq or workqueue
context, so if we can avoid a prior offload the completions would be a
lot more efficient.

* Re: [PATCH for-next 0/2] IB/mlx5: Use tasklet to decrease completions processing time in interrupts
       [not found]     ` <20160418130456.GA11508-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2016-04-18 13:21       ` Matan Barak (External)
       [not found]         ` <5714DF68.6040509-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Matan Barak (External) @ 2016-04-18 13:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Majd Dibbiny

On 18/04/2016 16:04, Christoph Hellwig wrote:
> On Sun, Apr 17, 2016 at 05:08:39PM +0300, Matan Barak wrote:
>> Hi Doug,
>>
>> The mlx5 driver handles completion callbacks inside interrupts.
>> These callbacks could be lengthy and thus cause hard lockups.
>> In order to avoid these lockups, we introduce a tasklet mechanism.
>> The mlx5_ib driver uses this mechanism in order to defer its
>> completion callbacks processing to the tasklet.
>>
>> This follows the same mechanism we implemented for mlx4 that
>> successfully decreased the processing time in interrupts.
>
> Just curious:  how much of this time is spent inside the mlx5 driver,
> and how much is spent in the callbacks from the consumers?  We've now
> more than half done with switch the kernel ULPs to the new CQ API
> which will always offload the callbacks to softirq or workqueue context,
> so if we can avoid a previous offload the completions would be a lot
> more efficient.
>

In short, you could hit a situation where completions arrive faster 
than the interrupt handler can process them (lab cases that use one 
event queue, but still).
I agree that going to softirqs/WQs (like the rest of the offloads) is a 
good solution too - maybe even better than this one, as the mechanism 
already exists - but why would it be a lot more efficient?


* Re: [PATCH for-next 0/2] IB/mlx5: Use tasklet to decrease completions processing time in interrupts
       [not found]         ` <5714DF68.6040509-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-04-18 14:33           ` Sagi Grimberg
  2016-05-13 20:16           ` Doug Ledford
  1 sibling, 0 replies; 9+ messages in thread
From: Sagi Grimberg @ 2016-04-18 14:33 UTC (permalink / raw)
  To: Matan Barak (External), Christoph Hellwig
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Majd Dibbiny

Hey Matan, Christoph,

>> Just curious:  how much of this time is spent inside the mlx5 driver,
>> and how much is spent in the callbacks from the consumers?  We've now
>> more than half done with switch the kernel ULPs to the new CQ API
>> which will always offload the callbacks to softirq or workqueue context,
>> so if we can avoid a previous offload the completions would be a lot
>> more efficient.
>>
>
> In short, you could hit a situation where processing the completions in
> the interrupt takes longer than the rate at which they arrive (lab cases
> that use one event queue, but still).
> I agree that going to softirqs/WQs (like the rest of the offloads) is a
> good solution too - maybe even better than this one as the mechanism
> already exists, but why would it be a lot more efficient?

The reasons are:
- the irq_poll mechanism correctly budgets the time you spend in soft-IRQ.
- it maintains fairness across all the CQs directed to the same EQ.
- it has the perk that you only re-arm the CQ once you have consumed all
the completions (without staying in soft-IRQ forever).

Basically it's the same answer as to why using napi is better...

* Re: [PATCH for-next 0/2] IB/mlx5: Use tasklet to decrease completions processing time in interrupts
       [not found]         ` <5714DF68.6040509-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-04-18 14:33           ` Sagi Grimberg
@ 2016-05-13 20:16           ` Doug Ledford
       [not found]             ` <a93b20ca-42bf-ca2b-2995-aa2837a1bafe-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 9+ messages in thread
From: Doug Ledford @ 2016-05-13 20:16 UTC (permalink / raw)
  To: Matan Barak (External), Christoph Hellwig
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Majd Dibbiny


On 04/18/2016 09:21 AM, Matan Barak (External) wrote:
> On 18/04/2016 16:04, Christoph Hellwig wrote:
>> On Sun, Apr 17, 2016 at 05:08:39PM +0300, Matan Barak wrote:
>>> Hi Doug,
>>>
>>> The mlx5 driver handles completion callbacks inside interrupts.
>>> These callbacks could be lengthy and thus cause hard lockups.
>>> In order to avoid these lockups, we introduce a tasklet mechanism.
>>> The mlx5_ib driver uses this mechanism in order to defer its
>>> completion callbacks processing to the tasklet.
>>>
>>> This follows the same mechanism we implemented for mlx4 that
>>> successfully decreased the processing time in interrupts.
>>
>> Just curious:  how much of this time is spent inside the mlx5 driver,
>> and how much is spent in the callbacks from the consumers?  We've now
>> more than half done with switch the kernel ULPs to the new CQ API
>> which will always offload the callbacks to softirq or workqueue context,
>> so if we can avoid a previous offload the completions would be a lot
>> more efficient.
>>
> 
> In short, you could hit a situation where processing the completions in
> the interrupt takes longer than the rate at which they arrive (lab cases
> that use one event queue, but still).
> I agree that going to softirqs/WQs (like the rest of the offloads) is a
> good solution too - maybe even better than this one as the mechanism
> already exists, but why would it be a lot more efficient?
> 

Given the amount of the kernel converted to the new CQ API, are you guys
still looking to have this included?

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD




* Re: [PATCH for-next 0/2] IB/mlx5: Use tasklet to decrease completions processing time in interrupts
       [not found]             ` <a93b20ca-42bf-ca2b-2995-aa2837a1bafe-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-05-15  7:31               ` Matan Barak
       [not found]                 ` <868e5bcf-18c5-30f4-fe3f-b361c0ce43ec-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Matan Barak @ 2016-05-15  7:31 UTC (permalink / raw)
  To: Doug Ledford, Christoph Hellwig
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Majd Dibbiny

On 13/05/2016 23:16, Doug Ledford wrote:
> On 04/18/2016 09:21 AM, Matan Barak (External) wrote:
>> On 18/04/2016 16:04, Christoph Hellwig wrote:
>>> On Sun, Apr 17, 2016 at 05:08:39PM +0300, Matan Barak wrote:
>>>> Hi Doug,
>>>>
>>>> The mlx5 driver handles completion callbacks inside interrupts.
>>>> These callbacks could be lengthy and thus cause hard lockups.
>>>> In order to avoid these lockups, we introduce a tasklet mechanism.
>>>> The mlx5_ib driver uses this mechanism in order to defer its
>>>> completion callbacks processing to the tasklet.
>>>>
>>>> This follows the same mechanism we implemented for mlx4 that
>>>> successfully decreased the processing time in interrupts.
>>>
>>> Just curious:  how much of this time is spent inside the mlx5 driver,
>>> and how much is spent in the callbacks from the consumers?  We've now
>>> more than half done with switch the kernel ULPs to the new CQ API
>>> which will always offload the callbacks to softirq or workqueue context,
>>> so if we can avoid a previous offload the completions would be a lot
>>> more efficient.
>>>
>>
>> In short, you could hit a situation where processing the completions in
>> the interrupt takes longer than the rate at which they arrive (lab cases
>> that use one event queue, but still).
>> I agree that going to softirqs/WQs (like the rest of the offloads) is a
>> good solution too - maybe even better than this one as the mechanism
>> already exists, but why would it be a lot more efficient?
>>
>
> Given the amount of the kernel converted to the new CQ API, are you guys
> still looking to have this included?
>

Using irqpoll might be better, although - if I got things right - it 
actually polls the CQ instead of just notifying user-space that it has 
some work to do (probably because it was designed for kernel consumers). 
So, in order to use the current approach for user-space, we might need 
to change it a bit (unless I missed something).

That said, because we're risking a hard lockup here and the 
mlx4 driver already uses this proposed method successfully, I'd be very 
happy if we could merge this and look at migrating to the new CQ API later 
if it makes sense here.

Regards,
Matan

* Re: [PATCH for-next 0/2] IB/mlx5: Use tasklet to decrease completions processing time in interrupts
       [not found]                 ` <868e5bcf-18c5-30f4-fe3f-b361c0ce43ec-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-05-18 14:48                   ` Doug Ledford
  0 siblings, 0 replies; 9+ messages in thread
From: Doug Ledford @ 2016-05-18 14:48 UTC (permalink / raw)
  To: Matan Barak, Christoph Hellwig
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Majd Dibbiny


On 05/15/2016 03:31 AM, Matan Barak wrote:
> On 13/05/2016 23:16, Doug Ledford wrote:
>> On 04/18/2016 09:21 AM, Matan Barak (External) wrote:
>>> On 18/04/2016 16:04, Christoph Hellwig wrote:
>>>> On Sun, Apr 17, 2016 at 05:08:39PM +0300, Matan Barak wrote:
>>>>> Hi Doug,
>>>>>
>>>>> The mlx5 driver handles completion callbacks inside interrupts.
>>>>> These callbacks could be lengthy and thus cause hard lockups.
>>>>> In order to avoid these lockups, we introduce a tasklet mechanism.
>>>>> The mlx5_ib driver uses this mechanism in order to defer its
>>>>> completion callbacks processing to the tasklet.
>>>>>
>>>>> This follows the same mechanism we implemented for mlx4 that
>>>>> successfully decreased the processing time in interrupts.
>>>>
>>>> Just curious:  how much of this time is spent inside the mlx5 driver,
>>>> and how much is spent in the callbacks from the consumers?  We've now
>>>> more than half done with switch the kernel ULPs to the new CQ API
>>>> which will always offload the callbacks to softirq or workqueue
>>>> context,
>>>> so if we can avoid a previous offload the completions would be a lot
>>>> more efficient.
>>>>
>>>
>>> In short, you could hit a situation where processing the completions in
>>> the interrupt takes longer than the rate at which they arrive (lab cases
>>> that use one event queue, but still).
>>> I agree that going to softirqs/WQs (like the rest of the offloads) is a
>>> good solution too - maybe even better than this one as the mechanism
>>> already exists, but why would it be a lot more efficient?
>>>
>>
>> Given the amount of the kernel converted to the new CQ API, are you guys
>> still looking to have this included?
>>
> 
> Using irqpoll might be better, although - If I got things right it
> really polls the CQ instead of just notifying the user-space it has some
> work to do (because it handles kernel usages probably). So, in order to
> use the current approach for user-space, we might needs to change it a
> bit (unless I missed something).
> 
> Saying that, because we're risking here at having a hard lockup and the
> mlx4 driver already uses this proposed method successfully, I'll be very
> happy if we can merge this and look at migrating to the new CQ API later
> if it makes sense here.

I've taken these, but I want you to look into whether or not the CQ API
is the next appropriate things to use here.


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD




end of thread, other threads:[~2016-05-18 14:48 UTC | newest]

Thread overview: 9+ messages
2016-04-17 14:08 [PATCH for-next 0/2] IB/mlx5: Use tasklet to decrease completions processing time in interrupts Matan Barak
     [not found] ` <1460902121-5567-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-04-17 14:08   ` [PATCH for-next 1/2] net/mlx5_core: Use tasklet for user-space CQ completion events Matan Barak
2016-04-17 14:08   ` [PATCH for-next 2/2] IB/mlx5: Fire the CQ completion handler from tasklet Matan Barak
2016-04-18 13:04   ` [PATCH for-next 0/2] IB/mlx5: Use tasklet to decrease completions processing time in interrupts Christoph Hellwig
     [not found]     ` <20160418130456.GA11508-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2016-04-18 13:21       ` Matan Barak (External)
     [not found]         ` <5714DF68.6040509-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-04-18 14:33           ` Sagi Grimberg
2016-05-13 20:16           ` Doug Ledford
     [not found]             ` <a93b20ca-42bf-ca2b-2995-aa2837a1bafe-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-05-15  7:31               ` Matan Barak
     [not found]                 ` <868e5bcf-18c5-30f4-fe3f-b361c0ce43ec-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-05-18 14:48                   ` Doug Ledford
