* [PATCH 0/5] SRP initiator patches for kernel 3.17
From: Bart Van Assche @ 2014-07-03 13:44 UTC (permalink / raw)
  To: Roland Dreier
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer,
	David Dillow, linux-rdma

This patch series consists of the following five patches, of which three
are bug fixes and two are performance optimizations:

0001-scsi_transport_srp-Fix-fast_io_fail_tmo-dev_loss_tmo.patch
0002-IB-srp-Fix-deadlock-between-host-removal-and-multipa.patch
0003-IB-srp-Fix-residual-handling.patch
0004-IB-srp-Use-P_Key-cache-for-P_Key-lookups.patch
0005-IB-srp-Optimize-completion-queue-polling.patch

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH 1/5] scsi_transport_srp: Fix fast_io_fail_tmo=dev_loss_tmo=off behavior
From: Bart Van Assche @ 2014-07-03 13:45 UTC (permalink / raw)
  To: Roland Dreier
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer,
	David Dillow, linux-rdma

If scsi_remove_host() is called while an rport is in the blocked
state then scsi_remove_host() will only finish if the rport is
unblocked from inside a timer function. Make sure that an rport
only enters the blocked state if a timer will be started that
will unblock it. This prevents a deadlock when the ib_srp kernel
module is unloaded after the initiator has been disconnected from the
target system while both the fast_io_fail_tmo and dev_loss_tmo
parameters are set to "off".

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: James Bottomley <jbottomley-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Cc: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org>
Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
---
 drivers/scsi/scsi_transport_srp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index 13e8983..a0c5bfd 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -473,7 +473,8 @@ static void __srp_start_tl_fail_timers(struct srp_rport *rport)
 	if (delay > 0)
 		queue_delayed_work(system_long_wq, &rport->reconnect_work,
 				   1UL * delay * HZ);
-	if (srp_rport_set_state(rport, SRP_RPORT_BLOCKED) == 0) {
+	if ((fast_io_fail_tmo >= 0 || dev_loss_tmo >= 0) &&
+	    srp_rport_set_state(rport, SRP_RPORT_BLOCKED) == 0) {
 		pr_debug("%s new state: %d\n", dev_name(&shost->shost_gendev),
 			 rport->state);
 		scsi_target_block(&shost->shost_gendev);
-- 
1.8.4.5
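The guard added above can be illustrated with a small userspace sketch
(illustrative names and a simplified model, not the kernel API): an rport may
only enter the blocked state when at least one timeout is enabled, because
only an armed timer can later unblock it.

```c
#include <assert.h>

enum rport_state { RPORT_RUNNING, RPORT_BLOCKED };

/* Simplified model of __srp_start_tl_fail_timers() after the fix.  A
 * negative timeout value means "off".  The rport only moves to the
 * BLOCKED state when at least one timer will be started, so something
 * is guaranteed to unblock it again later. */
enum rport_state start_tl_fail_timers(int fast_io_fail_tmo, int dev_loss_tmo)
{
	if (fast_io_fail_tmo >= 0 || dev_loss_tmo >= 0)
		return RPORT_BLOCKED;	/* a timer will fire and unblock */
	return RPORT_RUNNING;		/* both "off": never block */
}
```

With both parameters "off" the rport stays unblocked, so scsi_remove_host()
no longer waits forever for an unblock that never happens.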


* [PATCH 2/5] IB/srp: Fix deadlock between host removal and multipathd
From: Bart Van Assche @ 2014-07-03 13:46 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Sagi Grimberg, Sebastian Parschauer, David Dillow, linux-rdma

If scsi_remove_host() is invoked after a SCSI device has been
blocked, if the fast_io_fail_tmo or dev_loss_tmo work gets
scheduled on the same workqueue that executes srp_remove_work(),
and if an I/O request is submitted (e.g. by multipathd) after the
SCSI device has been blocked, then the following deadlock can occur:

kworker/6:1     D ffff880831f3c460     0   195      2 0x00000000
Call Trace:
 [<ffffffff814aafd9>] schedule+0x29/0x70
 [<ffffffff814aa0ef>] schedule_timeout+0x10f/0x2a0
 [<ffffffff8105af6f>] msleep+0x2f/0x40
 [<ffffffff8123b0ae>] __blk_drain_queue+0x4e/0x180
 [<ffffffff8123d2d5>] blk_cleanup_queue+0x225/0x230
 [<ffffffffa0010732>] __scsi_remove_device+0x62/0xe0 [scsi_mod]
 [<ffffffffa000ed2f>] scsi_forget_host+0x6f/0x80 [scsi_mod]
 [<ffffffffa0002eba>] scsi_remove_host+0x7a/0x130 [scsi_mod]
 [<ffffffffa07cf5c5>] srp_remove_work+0x95/0x180 [ib_srp]
 [<ffffffff8106d7aa>] process_one_work+0x1ea/0x6c0
 [<ffffffff8106dd9b>] worker_thread+0x11b/0x3a0
 [<ffffffff810758bd>] kthread+0xed/0x110
 [<ffffffff814b972c>] ret_from_fork+0x7c/0xb0
multipathd      D ffff880096acc460     0  5340      1 0x00000000
Call Trace:
 [<ffffffff814aafd9>] schedule+0x29/0x70
 [<ffffffff814aa0ef>] schedule_timeout+0x10f/0x2a0
 [<ffffffff814ab79b>] io_schedule_timeout+0x9b/0xf0
 [<ffffffff814abe1c>] wait_for_completion_io_timeout+0xdc/0x110
 [<ffffffff81244b9b>] blk_execute_rq+0x9b/0x100
 [<ffffffff8124f665>] sg_io+0x1a5/0x450
 [<ffffffff8124fd21>] scsi_cmd_ioctl+0x2a1/0x430
 [<ffffffff8124fef2>] scsi_cmd_blk_ioctl+0x42/0x50
 [<ffffffffa00ec97e>] sd_ioctl+0xbe/0x140 [sd_mod]
 [<ffffffff8124bd04>] blkdev_ioctl+0x234/0x840
 [<ffffffff811cb491>] block_ioctl+0x41/0x50
 [<ffffffff811a0df0>] do_vfs_ioctl+0x300/0x520
 [<ffffffff811a1051>] SyS_ioctl+0x41/0x80
 [<ffffffff814b9962>] tracesys+0xd0/0xd5

Fix this by scheduling the removal work on a different workqueue
than the one that runs the transport layer timers.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Cc: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org>
Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 38 +++++++++++++++++++++++++++----------
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index e3c2c5b..7670008 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -130,6 +130,7 @@ static void srp_send_completion(struct ib_cq *cq, void *target_ptr);
 static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event);
 
 static struct scsi_transport_template *ib_srp_transport_template;
+static struct workqueue_struct *srp_remove_wq;
 
 static struct ib_client srp_client = {
 	.name   = "srp",
@@ -731,7 +732,7 @@ static bool srp_queue_remove_work(struct srp_target_port *target)
 	spin_unlock_irq(&target->lock);
 
 	if (changed)
-		queue_work(system_long_wq, &target->remove_work);
+		queue_work(srp_remove_wq, &target->remove_work);
 
 	return changed;
 }
@@ -3261,9 +3262,10 @@ static void srp_remove_one(struct ib_device *device)
 		spin_unlock(&host->target_lock);
 
 		/*
-		 * Wait for target port removal tasks.
+		 * Wait for tl_err and target port removal tasks.
 		 */
 		flush_workqueue(system_long_wq);
+		flush_workqueue(srp_remove_wq);
 
 		kfree(host);
 	}
@@ -3313,16 +3315,22 @@ static int __init srp_init_module(void)
 		indirect_sg_entries = cmd_sg_entries;
 	}
 
+	srp_remove_wq = create_workqueue("srp_remove");
+	if (!srp_remove_wq) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = -ENOMEM;
 	ib_srp_transport_template =
 		srp_attach_transport(&ib_srp_transport_functions);
 	if (!ib_srp_transport_template)
-		return -ENOMEM;
+		goto destroy_wq;
 
 	ret = class_register(&srp_class);
 	if (ret) {
 		pr_err("couldn't register class infiniband_srp\n");
-		srp_release_transport(ib_srp_transport_template);
-		return ret;
+		goto release_tr;
 	}
 
 	ib_sa_register_client(&srp_sa_client);
@@ -3330,13 +3338,22 @@ static int __init srp_init_module(void)
 	ret = ib_register_client(&srp_client);
 	if (ret) {
 		pr_err("couldn't register IB client\n");
-		srp_release_transport(ib_srp_transport_template);
-		ib_sa_unregister_client(&srp_sa_client);
-		class_unregister(&srp_class);
-		return ret;
+		goto unreg_sa;
 	}
 
-	return 0;
+out:
+	return ret;
+
+unreg_sa:
+	ib_sa_unregister_client(&srp_sa_client);
+	class_unregister(&srp_class);
+
+release_tr:
+	srp_release_transport(ib_srp_transport_template);
+
+destroy_wq:
+	destroy_workqueue(srp_remove_wq);
+	goto out;
 }
 
 static void __exit srp_cleanup_module(void)
@@ -3345,6 +3362,7 @@ static void __exit srp_cleanup_module(void)
 	ib_sa_unregister_client(&srp_sa_client);
 	class_unregister(&srp_class);
 	srp_release_transport(ib_srp_transport_template);
+	destroy_workqueue(srp_remove_wq);
 }
 
 module_init(srp_init_module);
-- 
1.8.4.5
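The essence of the fix can be sketched in userspace C (a toy model, not
kernel code; the enum values are illustrative): a work item must not wait,
directly or indirectly, on other work that runs on its own workqueue.

```c
#include <assert.h>
#include <stdbool.h>

enum { SYSTEM_LONG_WQ, SRP_REMOVE_WQ };

/* srp_remove_work() ends up waiting, via scsi_remove_host() and
 * blk_cleanup_queue(), for outstanding requests whose completion
 * depends on the transport-layer timeout work running.  If both works
 * share a single workqueue, the worker is busy executing the remove
 * work and the timeout work can never run, so the wait never finishes.
 * This predicate models that rule. */
bool removal_can_complete(int remove_work_wq, int tl_err_work_wq)
{
	return remove_work_wq != tl_err_work_wq;
}
```

Moving the remove work to a dedicated srp_remove_wq, as the patch does,
breaks the circular wait.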


* [PATCH 3/5] IB/srp: Fix residual handling
From: Bart Van Assche @ 2014-07-03 13:47 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Sagi Grimberg, Sebastian Parschauer, David Dillow, linux-rdma

From Documentation/scsi/scsi_mid_low_api.txt: "resid - an LLD should
set this signed integer to the requested transfer length (i.e.
'request_bufflen') less the number of bytes that are actually
transferred." This means that resid > 0 in case of an underrun and
also that resid < 0 in case of an overrun. Modify the SRP initiator
code such that it matches this requirement.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Cc: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 7670008..6abfff4 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1644,10 +1644,14 @@ static void srp_process_rsp(struct srp_target_port *target, struct srp_rsp *rsp)
 				     SCSI_SENSE_BUFFERSIZE));
 		}
 
-		if (rsp->flags & (SRP_RSP_FLAG_DOOVER | SRP_RSP_FLAG_DOUNDER))
-			scsi_set_resid(scmnd, be32_to_cpu(rsp->data_out_res_cnt));
-		else if (rsp->flags & (SRP_RSP_FLAG_DIOVER | SRP_RSP_FLAG_DIUNDER))
+		if (rsp->flags & SRP_RSP_FLAG_DIUNDER)
 			scsi_set_resid(scmnd, be32_to_cpu(rsp->data_in_res_cnt));
+		else if (rsp->flags & SRP_RSP_FLAG_DIOVER)
+			scsi_set_resid(scmnd, -be32_to_cpu(rsp->data_in_res_cnt));
+		else if (rsp->flags & SRP_RSP_FLAG_DOUNDER)
+			scsi_set_resid(scmnd, be32_to_cpu(rsp->data_out_res_cnt));
+		else if (rsp->flags & SRP_RSP_FLAG_DOOVER)
+			scsi_set_resid(scmnd, -be32_to_cpu(rsp->data_out_res_cnt));
 
 		srp_free_req(target, req, scmnd,
 			     be32_to_cpu(rsp->req_lim_delta));
-- 
1.8.4.5
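The sign convention can be checked with a small userspace sketch. The flag
bit values below are placeholders chosen for illustration; the real
definitions live in include/scsi/srp.h.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag bits (placeholders, not the real srp.h values). */
#define RSP_FLAG_DIUNDER (1u << 0)
#define RSP_FLAG_DIOVER  (1u << 1)
#define RSP_FLAG_DOUNDER (1u << 2)
#define RSP_FLAG_DOOVER  (1u << 3)

/* resid = requested transfer length - bytes actually transferred:
 * positive for an underrun, negative for an overrun, matching the
 * dispatch order used in the patch above. */
int32_t srp_resid(uint32_t flags, uint32_t din_res, uint32_t dout_res)
{
	if (flags & RSP_FLAG_DIUNDER)
		return (int32_t)din_res;
	if (flags & RSP_FLAG_DIOVER)
		return -(int32_t)din_res;
	if (flags & RSP_FLAG_DOUNDER)
		return (int32_t)dout_res;
	if (flags & RSP_FLAG_DOOVER)
		return -(int32_t)dout_res;
	return 0;
}
```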


* [PATCH 4/5] IB/srp: Use P_Key cache for P_Key lookups
From: Bart Van Assche @ 2014-07-03 13:47 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Sagi Grimberg, Sebastian Parschauer, David Dillow, linux-rdma

This change slightly reduces the time needed to log in.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Cc: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 6abfff4..f685e06 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -40,6 +40,7 @@
 #include <linux/parser.h>
 #include <linux/random.h>
 #include <linux/jiffies.h>
+#include <rdma/ib_cache.h>
 
 #include <linux/atomic.h>
 
@@ -260,10 +261,10 @@ static int srp_init_qp(struct srp_target_port *target,
 	if (!attr)
 		return -ENOMEM;
 
-	ret = ib_find_pkey(target->srp_host->srp_dev->dev,
-			   target->srp_host->port,
-			   be16_to_cpu(target->path.pkey),
-			   &attr->pkey_index);
+	ret = ib_find_cached_pkey(target->srp_host->srp_dev->dev,
+				  target->srp_host->port,
+				  be16_to_cpu(target->path.pkey),
+				  &attr->pkey_index);
 	if (ret)
 		goto out;
 
-- 
1.8.4.5
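Why the cached lookup is cheaper: ib_find_pkey() queries the HCA's P_Key
table for every index, while the cached variant scans a copy kept in host
memory. A userspace sketch of such a cached scan follows (simplified to an
exact 16-bit match; the kernel helper also deals with details such as the
membership bit):

```c
#include <assert.h>
#include <stddef.h>

/* Scan a host-memory copy of a port's P_Key table; return 0 and store
 * the matching index, or -1 if the P_Key is absent (the kernel helper
 * returns -ENOENT in that case). */
int find_cached_pkey(const unsigned short *cache, size_t len,
		     unsigned short pkey, unsigned short *index)
{
	for (size_t i = 0; i < len; i++) {
		if (cache[i] == pkey) {
			*index = (unsigned short)i;
			return 0;
		}
	}
	return -1;
}

/* Small demo wrapper over a fixed, made-up P_Key table. */
int demo_lookup(unsigned short pkey)
{
	static const unsigned short cache[] = { 0xffff, 0x8001, 0x7fff };
	unsigned short idx;

	return find_cached_pkey(cache, 3, pkey, &idx) == 0 ? (int)idx : -1;
}
```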


* [PATCH 5/5] IB/srp: Optimize completion queue polling
From: Bart Van Assche @ 2014-07-03 13:48 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Sagi Grimberg, Sebastian Parschauer, David Dillow, linux-rdma

Reduce completion queue lock contention by polling for multiple
work completions at once. Limit the number of poll cycles per
completion notification to preserve fairness if multiple verbs
applications use the same port or if multiple IB interrupts have
been mapped to the same CPU core.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Cc: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 55 +++++++++++++++++++++++++------------
 drivers/infiniband/ulp/srp/ib_srp.h |  4 +++
 2 files changed, 42 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index f685e06..b017a3a 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1818,33 +1818,54 @@ static void srp_handle_qp_err(u64 wr_id, enum ib_wc_status wc_status,
 	target->qp_in_error = true;
 }
 
+static void srp_poll_recv(struct ib_cq *cq, struct srp_target_port *target,
+			  int budget)
+{
+	struct ib_wc *const wc = target->recv_wc;
+	int i, n;
+
+	while ((n = ib_poll_cq(cq, min_t(int, ARRAY_SIZE(target->recv_wc),
+					 budget), wc)) > 0) {
+		budget -= n;
+		for (i = 0; i < n; ++i) {
+			if (likely(wc[i].status == IB_WC_SUCCESS)) {
+				srp_handle_recv(target, &wc[i]);
+			} else {
+				srp_handle_qp_err(wc[i].wr_id, wc[i].status,
+						  false, target);
+			}
+		}
+	}
+}
+
 static void srp_recv_completion(struct ib_cq *cq, void *target_ptr)
 {
 	struct srp_target_port *target = target_ptr;
-	struct ib_wc wc;
+	int n = SRP_POLL_BUDGET;
 
-	ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
-	while (ib_poll_cq(cq, 1, &wc) > 0) {
-		if (likely(wc.status == IB_WC_SUCCESS)) {
-			srp_handle_recv(target, &wc);
-		} else {
-			srp_handle_qp_err(wc.wr_id, wc.status, false, target);
-		}
-	}
+	do {
+		srp_poll_recv(cq, target, n);
+		n = ib_req_notify_cq(cq, IB_CQ_NEXT_COMP |
+				     IB_CQ_REPORT_MISSED_EVENTS);
+	} while (n > 0);
 }
 
 static void srp_send_completion(struct ib_cq *cq, void *target_ptr)
 {
 	struct srp_target_port *target = target_ptr;
-	struct ib_wc wc;
+	struct ib_wc *const wc = target->send_wc;
 	struct srp_iu *iu;
-
-	while (ib_poll_cq(cq, 1, &wc) > 0) {
-		if (likely(wc.status == IB_WC_SUCCESS)) {
-			iu = (struct srp_iu *) (uintptr_t) wc.wr_id;
-			list_add(&iu->list, &target->free_tx);
-		} else {
-			srp_handle_qp_err(wc.wr_id, wc.status, true, target);
+	int i, n;
+
+	while ((n = ib_poll_cq(cq, ARRAY_SIZE(target->send_wc), wc)) > 0) {
+		for (i = 0; i < n; ++i) {
+			if (likely(wc[i].status == IB_WC_SUCCESS)) {
+				iu = (struct srp_iu *) (uintptr_t) wc[i].wr_id;
+				list_add(&iu->list, &target->free_tx);
+			} else {
+				srp_handle_qp_err(wc[i].wr_id, wc[i].status,
+						  true, target);
+			}
 		}
 	}
 }
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index e46ecb1..e81d190 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -70,6 +70,8 @@ enum {
 
 	LOCAL_INV_WR_ID_MASK	= 1,
 	FAST_REG_WR_ID_MASK	= 2,
+
+	SRP_POLL_BUDGET		= 512,
 };
 
 enum srp_target_state {
@@ -151,6 +153,8 @@ struct srp_target_port {
 	unsigned int		cmd_sg_cnt;
 	unsigned int		indirect_size;
 	bool			allow_ext_sg;
+	struct ib_wc		recv_wc[16];
+	struct ib_wc		send_wc[16];
 
 	/* Everything above this point is used in the hot path of
 	 * command processing. Try to keep them packed into cachelines.
-- 
1.8.4.5
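The polling strategy above, batch polling bounded by a budget, can be
modeled in userspace. Here poll_cq() is a stand-in for ib_poll_cq() that
simply drains a counter; all names are illustrative.

```c
#include <assert.h>

#define WC_BATCH 16	/* mirrors ARRAY_SIZE(target->recv_wc) */

struct sim_cq { int pending; };	/* completions waiting on the CQ */

/* Stand-in for ib_poll_cq(): hand out at most `max` completions. */
int poll_cq(struct sim_cq *cq, int max)
{
	int n = cq->pending < max ? cq->pending : max;

	cq->pending -= n;
	return n;
}

/* Budgeted drain, as in srp_poll_recv(): poll in batches of up to
 * WC_BATCH completions, but never more than `budget` in total, so one
 * busy CQ cannot starve other users of the same CPU core. */
int poll_budgeted(struct sim_cq *cq, int budget)
{
	int done = 0, n;

	while ((n = poll_cq(cq, budget < WC_BATCH ? budget : WC_BATCH)) > 0) {
		done += n;
		budget -= n;
	}
	return done;
}

/* Helper: start with `pending` completions, run one budgeted pass. */
int drained(int pending, int budget)
{
	struct sim_cq cq = { pending };

	return poll_budgeted(&cq, budget);
}
```

The real loop additionally re-arms the CQ with ib_req_notify_cq(...,
IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS) and keeps polling while that
call reports missed events, which closes the race between the final poll
and the re-arm.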


* Re: [PATCH 0/5] SRP initiator patches for kernel 3.17
From: James Bottomley @ 2014-07-03 13:55 UTC (permalink / raw)
  To: bvanassche-HInyCGIudOg
  Cc: sagig-VPRAkNaXOzVWk0Htik3J/w,
	sebastian.riemer-EIkl63zCoXaH+58JC4qpiA,
	dave-i1Mk8JYDVaaSihdK6806/g, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, 2014-07-03 at 15:44 +0200, Bart Van Assche wrote:
> This patch series consists of the following five patches, of which three
> are bug fixes and two are performance optimizations:
> 
> 0001-scsi_transport_srp-Fix-fast_io_fail_tmo-dev_loss_tmo.patch
> 0002-IB-srp-Fix-deadlock-between-host-removal-and-multipa.patch
> 0003-IB-srp-Fix-residual-handling.patch
> 0004-IB-srp-Use-P_Key-cache-for-P_Key-lookups.patch
> 0005-IB-srp-Optimize-completion-queue-polling.patch

These are SCSI patches (well at least 1/5 is), so please cc linux-scsi.

James



* Re: [PATCH 0/5] SRP initiator patches for kernel 3.17
From: Bart Van Assche @ 2014-07-03 14:17 UTC (permalink / raw)
  To: Roland Dreier
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer,
	David Dillow, linux-rdma, linux-scsi

(CC-ing linux-scsi)

On 07/03/14 15:44, Bart Van Assche wrote:
> This patch series consists of the following five patches, of which three
> are bug fixes and two are performance optimizations:
> 
> 0001-scsi_transport_srp-Fix-fast_io_fail_tmo-dev_loss_tmo.patch
> 0002-IB-srp-Fix-deadlock-between-host-removal-and-multipa.patch
> 0003-IB-srp-Fix-residual-handling.patch
> 0004-IB-srp-Use-P_Key-cache-for-P_Key-lookups.patch
> 0005-IB-srp-Optimize-completion-queue-polling.patch
> 



* Re: [PATCH 0/5] SRP initiator patches for kernel 3.17
From: Bart Van Assche @ 2014-07-03 14:19 UTC (permalink / raw)
  To: Roland Dreier
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer,
	David Dillow, linux-rdma, linux-scsi-u79uwXL29TY76Z2rM5mHXA

(CC'ing linux-scsi)

On 07/03/14 15:45, Bart Van Assche wrote:
> If scsi_remove_host() is called while an rport is in the blocked
> state then scsi_remove_host() will only finish if the rport is
> unblocked from inside a timer function. Make sure that an rport
> only enters the blocked state if a timer will be started that
> will unblock it. This prevents a deadlock when the ib_srp kernel
> module is unloaded after the initiator has been disconnected from the
> target system while both the fast_io_fail_tmo and dev_loss_tmo
> parameters are set to "off".
> 
> Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
> Cc: James Bottomley <jbottomley-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
> Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
> Cc: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org>
> Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
> ---
>  drivers/scsi/scsi_transport_srp.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
> index 13e8983..a0c5bfd 100644
> --- a/drivers/scsi/scsi_transport_srp.c
> +++ b/drivers/scsi/scsi_transport_srp.c
> @@ -473,7 +473,8 @@ static void __srp_start_tl_fail_timers(struct srp_rport *rport)
>  	if (delay > 0)
>  		queue_delayed_work(system_long_wq, &rport->reconnect_work,
>  				   1UL * delay * HZ);
> -	if (srp_rport_set_state(rport, SRP_RPORT_BLOCKED) == 0) {
> +	if ((fast_io_fail_tmo >= 0 || dev_loss_tmo >= 0) &&
> +	    srp_rport_set_state(rport, SRP_RPORT_BLOCKED) == 0) {
>  		pr_debug("%s new state: %d\n", dev_name(&shost->shost_gendev),
>  			 rport->state);
>  		scsi_target_block(&shost->shost_gendev);
> 


* Re: [PATCH 0/5] SRP initiator patches for kernel 3.17
From: Sagi Grimberg @ 2014-07-03 15:06 UTC (permalink / raw)
  To: Bart Van Assche, Roland Dreier
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer,
	David Dillow, linux-rdma, linux-scsi

On 7/3/2014 5:17 PM, Bart Van Assche wrote:
>> This patch series consists of the following five patches, of which three
>> are bug fixes and two are performance optimizations:
>>
>> 0001-scsi_transport_srp-Fix-fast_io_fail_tmo-dev_loss_tmo.patch
>> 0002-IB-srp-Fix-deadlock-between-host-removal-and-multipa.patch
>> 0003-IB-srp-Fix-residual-handling.patch
>> 0004-IB-srp-Use-P_Key-cache-for-P_Key-lookups.patch
>> 0005-IB-srp-Optimize-completion-queue-polling.patch
>>

Hey Bart,

The series looks good.

You can add to the series:
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>

Specifically to patches #1, #4, #5 you can add:
Tested-by: Sagi Grimberg <sagig@mellanox.com>

Sagi.


* Re: [PATCH 5/5] IB/srp: Optimize completion queue polling
From: Or Gerlitz @ 2014-07-03 16:46 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, Sagi Grimberg, Sebastian Parschauer, David Dillow,
	linux-rdma

On Thu, Jul 3, 2014 at 4:48 PM, Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote:
[...]
>
> Limit the number of poll cycles per completion notification to preserve fairness

[...]

Any reason not to make use of the block layer NAPI like API for that
(blk-iopoll http://lwn.net/Articles/346187/) ?

Or.

* Re: [PATCH 5/5] IB/srp: Optimize completion queue polling
From: David Dillow @ 2014-07-03 16:53 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, Sagi Grimberg, Sebastian Parschauer, linux-rdma

On Thu, 2014-07-03 at 15:48 +0200, Bart Van Assche wrote:
> Reduce completion queue lock contention by polling for multiple
> work completions at once. Limit the number of poll cycles per
> completion notification to preserve fairness if multiple verbs
> applications use the same port or if multiple IB interrupts have
> been mapped to the same CPU core.

Do you have performance/CPU utilization numbers for this change?

When we looked at this back in the 2.6.38 time-frame, there were several
performance anomalies with this -- and with a blk-iopoll conversion -- and
IIRC, neither was a win. It's possible that the issues were specific to
the drivers and that newer hardware is better, but it'd be good to see
the numbers.


* Re: [PATCH 3/5] IB/srp: Fix residual handling
From: David Dillow @ 2014-07-03 17:00 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, Sagi Grimberg, Sebastian Parschauer, linux-rdma

On Thu, 2014-07-03 at 15:47 +0200, Bart Van Assche wrote:

> -		if (rsp->flags & (SRP_RSP_FLAG_DOOVER | SRP_RSP_FLAG_DOUNDER))
> -			scsi_set_resid(scmnd, be32_to_cpu(rsp->data_out_res_cnt));
> -		else if (rsp->flags & (SRP_RSP_FLAG_DIOVER | SRP_RSP_FLAG_DIUNDER))
> +		if (rsp->flags & SRP_RSP_FLAG_DIUNDER)
>  			scsi_set_resid(scmnd, be32_to_cpu(rsp->data_in_res_cnt));
> +		else if (rsp->flags & SRP_RSP_FLAG_DIOVER)
> +			scsi_set_resid(scmnd, -be32_to_cpu(rsp->data_in_res_cnt));
> +		else if (rsp->flags & SRP_RSP_FLAG_DOUNDER)
> +			scsi_set_resid(scmnd, be32_to_cpu(rsp->data_out_res_cnt));
> +		else if (rsp->flags & SRP_RSP_FLAG_DOOVER)
> +			scsi_set_resid(scmnd, -be32_to_cpu(rsp->data_out_res_cnt));

LGTM. I wonder if we're getting to the point where we should hide all the
flag checking behind an if (unlikely(rsp->flags)) { ...


* Re: [PATCH 0/5] SRP initiator patches for kernel 3.17
From: David Dillow @ 2014-07-03 17:03 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, James Bottomley, Sagi Grimberg,
	Sebastian Parschauer, linux-rdma

On Thu, 2014-07-03 at 15:44 +0200, Bart Van Assche wrote:
> This patch series consists of the following five patches, of which three
> are bug fixes and two are performance optimizations:
> 
> 0001-scsi_transport_srp-Fix-fast_io_fail_tmo-dev_loss_tmo.patch
> 0002-IB-srp-Fix-deadlock-between-host-removal-and-multipa.patch
> 0003-IB-srp-Fix-residual-handling.patch
> 0004-IB-srp-Use-P_Key-cache-for-P_Key-lookups.patch
> 0005-IB-srp-Optimize-completion-queue-polling.patch

All except the last can be

Acked-by: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org>

or feel free to s/Acked/Reviewed/



* Re: [PATCH 5/5] IB/srp: Optimize completion queue polling
From: David Dillow @ 2014-07-03 17:05 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, Sagi Grimberg, Sebastian Parschauer, linux-rdma

[Apologies if you get this twice, mailer crashed during the first send]

On Thu, 2014-07-03 at 15:48 +0200, Bart Van Assche wrote:
> Reduce completion queue lock contention by polling for multiple
> work completions at once. Limit the number of poll cycles per
> completion notification to preserve fairness if multiple verbs
> applications use the same port or if multiple IB interrupts have
> been mapped to the same CPU core.

Do you have performance/CPU utilization numbers for this change?

When we looked at this back in the 2.6.38 time-frame, there were several
performance anomalies with this -- and a blk-iopoll conversion -- and
IIRC, neither was a win. It's possible that the issues were specific to
the drivers and that newer hardware is better, but it'd be good to see
the numbers.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/5] IB/srp: Optimize completion queue polling
       [not found]         ` <CAJZOPZ+RPe8B_KhKZ-8-S6g871-EKQSnExZgsZ2Z+bo-9L=P9w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-07-04  9:59           ` Bart Van Assche
       [not found]             ` <53B67B0E.5070004-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Bart Van Assche @ 2014-07-04  9:59 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Roland Dreier, Sagi Grimberg, Sebastian Parschauer, David Dillow,
	linux-rdma

On 07/03/14 18:46, Or Gerlitz wrote:
> On Thu, Jul 3, 2014 at 4:48 PM, Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote:
> [...]
>>
>> Limit the number of poll cycles per completion notification to preserve fairness
> 
> [...]
> 
> Any reason not to make use of the block layer NAPI like API for that
> (blk-iopoll http://lwn.net/Articles/346187/) ?

Hello Or,

As you may remember, about four years ago I experimented with
adding blk-iopoll support to the SRP initiator. What I learned at
that time is that the blk-iopoll framework defers work to softirq
context. This means that a context switch from interrupt to softirq
context has to occur before SRP completion processing can start. Recent
measurements on current hardware have shown that such a context switch
takes about 0.5 microseconds. Since I prefer to keep the latency of SRP
I/O as low as possible I haven't looked further into using blk-iopoll
for the SRP initiator driver.
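
The hand-off being described, a hard-irq handler that only schedules with
completion processing running in a softirq poll function, would look roughly
like the following. This is a pseudocode sketch from memory of the
2.6.3x-era blk-iopoll API; treat the names and signatures as approximate
rather than verbatim, and srp_process_completions() as a hypothetical helper:

```
/* Hard-irq completion handler: no processing here, just punt to softirq. */
static void srp_comp_handler(struct ib_cq *cq, void *ctx)
{
	struct srp_target *target = ctx;

	blk_iopoll_sched(&target->iopoll);  /* raises BLOCK_IOPOLL_SOFTIRQ */
}

/* Softirq poll function: process up to 'budget' completions per call. */
static int srp_iopoll(struct blk_iopoll *iop, int budget)
{
	struct srp_target *target =
		container_of(iop, struct srp_target, iopoll);
	int done = srp_process_completions(target, budget);

	if (done < budget) {
		/* Drained: stop polling and re-arm the interrupt. */
		blk_iopoll_complete(iop);
		ib_req_notify_cq(target->cq, IB_CQ_NEXT_COMP);
	}
	return done;
}

/* Setup: blk_iopoll_init(&target->iopoll, SRP_IOPOLL_WEIGHT, srp_iopoll);
 *        blk_iopoll_enable(&target->iopoll); */
```

The 0.5 microsecond cost mentioned above is paid in the gap between
blk_iopoll_sched() and the first srp_iopoll() invocation.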

Bart.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/5] IB/srp: Optimize completion queue polling
       [not found]         ` <1404407103.32754.3.camel-a7a0dvSY7KqLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
@ 2014-07-04 10:48           ` Bart Van Assche
       [not found]             ` <1404501176.16296.18.camel@haswell.thedillows.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Bart Van Assche @ 2014-07-04 10:48 UTC (permalink / raw)
  To: David Dillow
  Cc: Roland Dreier, Sagi Grimberg, Sebastian Parschauer, linux-rdma

On 07/03/14 19:05, David Dillow wrote:
> When we looked at this back in the 2.6.38 time-frame, there were several
> performance anomalies with this -- and a blk-iopoll conversion -- and
> IIRC, neither was a win. It's possible that the issues were specific to
> the drivers and that newer hardware is better, but it'd be good to see
> the numbers.

Hello Dave,

Do you still have that measurement data available and/or the scripts
that were used to collect that data ? I'd like to have a look at which
test you ran such that I can repeat that test with Linus' master tree. A
lot has been changed since kernel 2.6.38 was released, e.g. several more
SCSI core and SRP initiator driver optimizations have been accepted
upstream since then. I have found an e-mail from you from 2010 with
measurement data attached in an .ods document but it's probably not that
e-mail that you are referring to
(http://www.spinics.net/lists/linux-scsi/msg49216.html).

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/5] IB/srp: Optimize completion queue polling
       [not found]             ` <53B67B0E.5070004-HInyCGIudOg@public.gmane.org>
@ 2014-07-08  7:55               ` Or Gerlitz
       [not found]                 ` <CAJZOPZJokriDMCAxnoXJo+RTRvtDquSMGpQThp8OoDnHjKU32A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Or Gerlitz @ 2014-07-08  7:55 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, Sagi Grimberg, Sebastian Parschauer, David Dillow,
	linux-rdma

On Fri, Jul 4, 2014 at 12:59 PM, Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote:
>
> [...] the blk-iopoll framework defers work to softirq
> context. This means that a context switch from interrupt to softirq
> context has to occur before SRP completion processing can start. Recent
> measurements on current hardware have shown that such a context switch
> takes about 0.5 microseconds. Since I prefer to keep the latency of SRP
> I/O as low as possible I haven't looked further into using blk-iopoll for the SRP initiator driver.

Hi Bart,

In the same manner that softirq context is the right place to process
received packets in the networking space, it should be the right place
for IO completions in the storage space too.  Note that
typical packet latency in the latest/fastest 10g NICs goes below 5us
when using NAPI, so if we (== NIC drivers) are happy with the
context-switch overhead there, we (== SCSI LLD storage drivers, whose
latency is in the area of 15-20us) should be happy here too.
Specifically, the CS cost is amortized across multiple IOs.

Or.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/5] IB/srp: Optimize completion queue polling
       [not found]                 ` <CAJZOPZJokriDMCAxnoXJo+RTRvtDquSMGpQThp8OoDnHjKU32A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-07-08  8:11                   ` Bart Van Assche
       [not found]                     ` <53BBA79A.9000000-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Bart Van Assche @ 2014-07-08  8:11 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Roland Dreier, Sagi Grimberg, Sebastian Parschauer, David Dillow,
	linux-rdma

On 07/08/14 09:55, Or Gerlitz wrote:
> On Fri, Jul 4, 2014 at 12:59 PM, Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote:
>>
>> [...] the blk-iopoll framework defers work to softirq
>> context. This means that a context switch from interrupt to softirq
>> context has to occur before SRP completion processing can start. Recent
>> measurements on current hardware have shown that such a context switch
>> takes about 0.5 microseconds. Since I prefer to keep the latency of SRP
>> I/O as low as possible I haven't looked further into using blk-iopoll for the SRP initiator driver.
> 
> In the same manner that SoftirqD is the right place to process
> received packets in the networking space, it should be
> the case for IO completions in the storage space too.  Note that
> typical packet latency in latest/fastest 10g NICs go below 5us
> when using NAPI so if we (== NIC drivers) are happy with the CS
> overhead there, we (== SCSI LLD storage drivers whose
> latency is in the area of 15-20us) should be happy here too.
> Specifically, the CS cost is amortized across multiple IOs.

Hello Or,

We might each be referring to different concepts here. What you are
referring to is average response time for QD > 1. What I was referring
to is average response time for QD = 1. The measurements I ran with fio
have shown that the response time for QD = 1 is lower when processing
completions in interrupt context compared to processing completions in
softirq context.

Bart.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/5] IB/srp: Optimize completion queue polling
       [not found]                     ` <53BBA79A.9000000-HInyCGIudOg@public.gmane.org>
@ 2014-07-08  9:24                       ` Or Gerlitz
  0 siblings, 0 replies; 23+ messages in thread
From: Or Gerlitz @ 2014-07-08  9:24 UTC (permalink / raw)
  To: Bart Van Assche, Or Gerlitz
  Cc: Roland Dreier, Sagi Grimberg, Sebastian Parschauer, David Dillow,
	linux-rdma

On 08/07/2014 11:11, Bart Van Assche wrote:
> We might each be referring to different concepts here. What you are
> referring to is average response time for QD > 1. What I was referring
> to is average response time for QD = 1. The measurements I ran with fio
> have shown that the response time for QD = 1 is lower when processing
> completions in interrupt context compared to processing completions in
> softirq context.

I see, but wait, I had a few arguments... among them the fact that the
vast majority, if not the entire set, of networking drivers live well with
this context switch to softirqd overhead even for their ping-pong (~ QD=1)
latency tests, and their latency is ~3x smaller than that of block drivers.

Or.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/5] IB/srp: Optimize completion queue polling
       [not found]               ` <1404501176.16296.18.camel-a7a0dvSY7KqLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
@ 2014-07-08 13:49                 ` Bart Van Assche
       [not found]                   ` <53BBF6E3.3040403-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Bart Van Assche @ 2014-07-08 13:49 UTC (permalink / raw)
  To: David Dillow
  Cc: Roland Dreier, Sagi Grimberg, Sebastian Parschauer, linux-rdma

On 07/04/14 21:12, David Dillow wrote:
> On Fri, 2014-07-04 at 12:48 +0200, Bart Van Assche wrote:
>> Do you still have that measurement data available and/or the scripts
>> that were used to collect that data?
> 
> I had looked for them before posting and thought they were lost to the
> sands of time, but your pointer to the email gave me the proper search
> terms, thanks!
> 
> srptest.c is a simple test target that fakes a single-LUN, read-only
> target. It's special, in that it does not actually transfer any data, it
> just responds to the SRP command as though it had. It's intended to do
> the minimum work necessary to try to push the IOPS bottlenecks into the
> initiator.
> 
> run_tests.sh runs the battery, which was saved into an appropriately
> named file for parse.{sh,awk} to process into a csv, which gets turned
> into all.ods.
> 
> In the runs from then, batching (using your patch from that time) saw a
> 2 to 11% decrease in the number of IOPS, though it isn't perfectly clear
> what the noise level is from the pivot table in the spreadsheet. Using
> iopoll (weight of 128, 10, and with the batched CQ patch [not sure of
> weight, probably 10]) shows some scattered small improvements in IOPS
> (1-2%) but quickly fell to a 30+% loss of IOPS. I never had time to
> investigate further.
> 
> In none of the cases did the test target seem to become the bottleneck.
> 
>>  I'd like to have a look at which
>> test you ran such that I can repeat that test with Linus' master tree. A
>> lot has been changed since kernel 2.6.38 was released, e.g. several more
>> SCSI core and SRP initiator driver optimizations have been accepted
>> upstream since then.
> 
> Certainly, things have changed in the code, but I'll be pleasantly
> surprised if the relative results change much -- the only changes were
> the batching, and/or the conversion to iopoll.
> 
> Also, these tests were on QDR on Connect-X (maybe X2) hardware if I
> recall correctly. It would be interesting to see it on X3, or Connect-IB
> to see if they respond better to the changes -- I could easily see the
> batching being pretty hardware-specific in terms of tuning.

Hello Dave,

Thanks for digging up this information and also for sharing it. This is
interesting. What I noticed is that in the SRP target driver
attached to the previous e-mail ("srptest.c") one command at a time is
processed. However, in the SRP target driver I ran my own tests with
(based on SCST) multiple SCSI commands are processed simultaneously by a
single thread. A finite state machine is associated with each SCSI
command and events like IB work completions trigger transitions of that
state machine. So that might be a possible explanation why my
measurement results were different.

However, before I repost (a variant of) this patch I will try to find a
way to combine the advantages of interrupt-based processing (low
latency) and the blk-iopoll approach (minimal time spent in interrupt
context).

Bart.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/5] IB/srp: Optimize completion queue polling
       [not found]                   ` <53BBF6E3.3040403-HInyCGIudOg@public.gmane.org>
@ 2014-07-08 14:45                     ` Sagi Grimberg
  2014-07-09  6:22                     ` David Dillow
  1 sibling, 0 replies; 23+ messages in thread
From: Sagi Grimberg @ 2014-07-08 14:45 UTC (permalink / raw)
  To: Bart Van Assche, David Dillow
  Cc: Roland Dreier, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	Jens Axboe

On 7/8/2014 4:49 PM, Bart Van Assche wrote:

<SNIP>
> Hello Dave,
>
> Thanks for digging up this information and also for sharing it. This is
> interesting. What I noticed is that in the SRP target driver
> attached to the previous e-mail ("srptest.c") one command at a time is
> processed. However, in the SRP target driver I ran my own tests with
> (based on SCST) multiple SCSI commands are processed simultaneously by a
> single thread. A finite state machine is associated with each SCSI
> command and events like IB work completions trigger transitions of that
> state machine. So that might be a possible explanation why my
> measurement results were different.
>
> However, before I repost (a variant of) this patch I will try to find a
> way to combine the advantages of interrupt-based processing (low
> latency) and the blk-iopoll approach (minimal time spent in interrupt
> context).

Hey Bart, Dave & Or (CC'ing Jens)

Or and I did some experiments with blk-iopoll for iSER some time ago.
We noticed that we got better overall performance using this type of
"internal" budget logic (vs. the blk-iopoll approach), while the stats
clearly showed more interrupts occurring. The reason for that is still
unknown to us, given that the iSER completion context is softirq to
begin with (a tasklet).

We didn't submit the "internal" budget solution since we believe that
the correct way to go is using blk-iopoll. So IMHO the correct way to
continue here is the blk-iopoll approach, trying to resolve the
unexplained performance gaps we saw in the past.

I do agree with Or that SRP should benefit from blk-iopoll as well;
the latency lost here seems pretty minor against the gain from
decreasing the number of interrupts.

Sagi.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/5] IB/srp: Optimize completion queue polling
       [not found]                   ` <53BBF6E3.3040403-HInyCGIudOg@public.gmane.org>
  2014-07-08 14:45                     ` Sagi Grimberg
@ 2014-07-09  6:22                     ` David Dillow
  1 sibling, 0 replies; 23+ messages in thread
From: David Dillow @ 2014-07-09  6:22 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, Sagi Grimberg, Sebastian Parschauer, linux-rdma

On Tue, 2014-07-08 at 15:49 +0200, Bart Van Assche wrote:
> Thanks for digging up this information and also for sharing it.

Sure thing; it's a bummer that something in the email must have tickled
vger's taboo filters...

>  This is interesting. What I noticed is that in the SRP target
> driver attached to the previous e-mail ("srptest.c") one command at a
> time is processed. However, in the SRP target driver I ran my own
> tests with (based on SCST) multiple SCSI commands are processed
> simultaneously by a single thread. A finite state machine is
> associated with each SCSI command and events like IB work completions
> trigger transitions of that state machine. So that might be a possible
> explanation why my measurement results were different.

True, I expect to see a difference between those models, and I suspect
we were also measuring different values. I was looking at IOPS, and that
I saw a drop when adding the batching (as well as blk-iopoll) tells me
that the test driver was not the limiting factor -- it did not change
between them.

I'll readily agree that it is not a "real" load -- it purposefully does
not do any RDMA reads or writes; it just responds to the request as fast
as possible. The fact that it is single threaded shouldn't matter in
this context -- the IB streaming benchmarks are also single threaded and
can do many more operations per second.

I'll also admit that perhaps your threaded target bunches up responses a
bit, so perhaps the initiator sees enough work to make the batching
worth it for your tests. It'd be a bit surprising, but not overly so.

Regardless, it'd be nice to see your numbers for this change, and if you
have time to run it against my test target and post those for
comparison, that'd be swell. I no longer have much in the way of access
to relevant IB hardware -- just some SDR and perhaps a few DDR mthca
cards at home, and none are currently in machines.

> However, before I repost (a variant of) this patch I will try to find
> a way to combine the advantages of interrupt-based processing (low
> latency) and the blk-iopoll approach (minimal time spent in interrupt
> context).

Have you run an idle-soaking process like zc (by Andrew Morton, back in
the days of sendfile) to measure the CPU usage of the different
approaches? Those would be some interesting numbers in conjunction with
the latency and IOPS numbers -- it would give everyone a picture of the
trade-offs involved.

Thanks, and I may have plenty of questions, I'm glad you're looking at
this.
Dave


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2014-07-09  6:22 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-03 13:44 [PATCH 0/5] SRP initiator patches for kernel 3.17 Bart Van Assche
2014-07-03 14:17 ` Bart Van Assche
     [not found]   ` <53B565EB.50301-HInyCGIudOg@public.gmane.org>
2014-07-03 14:19     ` Bart Van Assche
2014-07-03 15:06   ` Sagi Grimberg
     [not found] ` <53B55E55.5040907-HInyCGIudOg@public.gmane.org>
2014-07-03 13:45   ` [PATCH 1/5] scsi_transport_srp: Fix fast_io_fail_tmo=dev_loss_tmo=off behavior Bart Van Assche
2014-07-03 13:46   ` [PATCH 2/5] IB/srp: Fix deadlock between host removal and multipathd Bart Van Assche
2014-07-03 13:47   ` [PATCH 3/5] IB/srp: Fix residual handling Bart Van Assche
     [not found]     ` <53B55EE0.1050403-HInyCGIudOg@public.gmane.org>
2014-07-03 17:00       ` David Dillow
2014-07-03 13:47   ` [PATCH 4/5] IB/srp: Use P_Key cache for P_Key lookups Bart Van Assche
2014-07-03 13:48   ` [PATCH 5/5] IB/srp: Optimize completion queue polling Bart Van Assche
     [not found]     ` <53B55F1F.6000704-HInyCGIudOg@public.gmane.org>
2014-07-03 16:46       ` Or Gerlitz
     [not found]         ` <CAJZOPZ+RPe8B_KhKZ-8-S6g871-EKQSnExZgsZ2Z+bo-9L=P9w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-07-04  9:59           ` Bart Van Assche
     [not found]             ` <53B67B0E.5070004-HInyCGIudOg@public.gmane.org>
2014-07-08  7:55               ` Or Gerlitz
     [not found]                 ` <CAJZOPZJokriDMCAxnoXJo+RTRvtDquSMGpQThp8OoDnHjKU32A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-07-08  8:11                   ` Bart Van Assche
     [not found]                     ` <53BBA79A.9000000-HInyCGIudOg@public.gmane.org>
2014-07-08  9:24                       ` Or Gerlitz
2014-07-03 16:53       ` David Dillow
2014-07-03 17:05       ` David Dillow
     [not found]         ` <1404407103.32754.3.camel-a7a0dvSY7KqLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2014-07-04 10:48           ` Bart Van Assche
     [not found]             ` <1404501176.16296.18.camel@haswell.thedillows.org>
     [not found]               ` <1404501176.16296.18.camel-a7a0dvSY7KqLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2014-07-08 13:49                 ` Bart Van Assche
     [not found]                   ` <53BBF6E3.3040403-HInyCGIudOg@public.gmane.org>
2014-07-08 14:45                     ` Sagi Grimberg
2014-07-09  6:22                     ` David Dillow
2014-07-03 13:55   ` [PATCH 0/5] SRP initiator patches for kernel 3.17 James Bottomley
2014-07-03 17:03   ` David Dillow
