All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/14] Make ib_srp better suited for H.A. purposes
@ 2011-12-01 18:54 Bart Van Assche
  2011-12-01 19:05 ` [PATCH 08/14] srp_transport: Document sysfs attributes Bart Van Assche
                   ` (5 more replies)
  0 siblings, 6 replies; 65+ messages in thread
From: Bart Van Assche @ 2011-12-01 18:54 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-scsi-u79uwXL29TY76Z2rM5mHXA, David Dillow, Roland Dreier, Vu Pham

This patch series makes the ib_srp driver better suited for use in a H.A.
setup because:
- Switchover without triggering read or write errors become possible. Such
  errors are bad because these can make a filesystem switch to read-only
  mode.
- A ping mechanism has been added that allows to reduce the switch-over
  time greatly.
- Disconnecting from a target without unloading ib_srp becomes possible.
- Switchover can be triggered explicitly by deleting an initiator device.

Where possible these mechanisms have been modelled after similar
functionality in the open-iscsi kernel code.

According to the measurements I ran these patches do not affect ib_srp
performance.

The individual patches are:
0001-ib_srp-Introduce-pr_fmt.patch
0002-ib_srp-Consolidate-repetitive-sysfs-code.patch
0003-ib_srp-Disallow-duplicate-logins.patch
0004-ib_srp-Set-block-layer-timeout.patch
0005-ib_srp-Avoid-that-late-SRP-replies-cause-trouble.patch
0006-ib_srp-Micro-optimize-completion-handlers.patch
0007-ib_srp-Introduce-srp_handle_qp_err.patch
0008-srp_transport-Document-sysfs-attributes.patch
0009-srp_transport-Fix-atttribute-registration-race.patch
0010-srp_transport-Simplify-attribute-initialization-code.patch
0011-ib_srp-Document-sysfs-attributes.patch
0012-ib_srp-Rework-error-handling.patch
0013-ib_srp-Implement-transport-layer-ping.patch
0014-ib_srp-Allow-SRP-disconnect-through-sysfs.patch
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* [PATCH 01/14] ib_srp: Introduce pr_fmt()
       [not found] ` <201112011954.25811.bvanassche-HInyCGIudOg@public.gmane.org>
@ 2011-12-01 18:55   ` Bart Van Assche
       [not found]     ` <201112011955.47198.bvanassche-HInyCGIudOg@public.gmane.org>
  2011-12-01 18:57   ` [PATCH 02/14] ib_srp: Consolidate repetitive sysfs code Bart Van Assche
                     ` (8 subsequent siblings)
  9 siblings, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-01 18:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, David Dillow; +Cc: Roland Dreier

Make the logging code more a little more brief by replacing
printk(KERN_WARNING PFX ...) by pr_warn(...) and by replacing
printk(KERN_ERR PFX ...) by pr_err(...). Remove one trailing
space to avoid a checkpatch warning.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |   40 ++++++++++++++++++----------------
 1 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 0bfa545..5093cc6 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -30,6 +30,8 @@
  * SOFTWARE.
  */
 
+#define pr_fmt(fmt) PFX fmt
+
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/slab.h>
@@ -165,7 +167,7 @@ static void srp_free_iu(struct srp_host *host, struct srp_iu *iu)
 
 static void srp_qp_event(struct ib_event *event, void *context)
 {
-	printk(KERN_ERR PFX "QP event %d\n", event->event);
+	pr_err("QP event %d\n", event->event);
 }
 
 static int srp_init_qp(struct srp_target_port *target,
@@ -1989,7 +1991,7 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
 				goto out;
 			}
 			if (strlen(p) != 32) {
-				printk(KERN_WARNING PFX "bad dest GID parameter '%s'\n", p);
+				pr_warn("bad dest GID parameter '%s'\n", p);
 				kfree(p);
 				goto out;
 			}
@@ -2004,7 +2006,7 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
 
 		case SRP_OPT_PKEY:
 			if (match_hex(args, &token)) {
-				printk(KERN_WARNING PFX "bad P_Key parameter '%s'\n", p);
+				pr_warn("bad P_Key parameter '%s'\n", p);
 				goto out;
 			}
 			target->path.pkey = cpu_to_be16(token);
@@ -2023,7 +2025,7 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
 
 		case SRP_OPT_MAX_SECT:
 			if (match_int(args, &token)) {
-				printk(KERN_WARNING PFX "bad max sect parameter '%s'\n", p);
+				pr_warn("bad max sect parameter '%s'\n", p);
 				goto out;
 			}
 			target->scsi_host->max_sectors = token;
@@ -2031,7 +2033,7 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
 
 		case SRP_OPT_MAX_CMD_PER_LUN:
 			if (match_int(args, &token)) {
-				printk(KERN_WARNING PFX "bad max cmd_per_lun parameter '%s'\n", p);
+				pr_warn("bad max cmd_per_lun parameter '%s'\n", p);
 				goto out;
 			}
 			target->scsi_host->cmd_per_lun = min(token, SRP_CMD_SQ_SIZE);
@@ -2039,12 +2041,12 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
 
 		case SRP_OPT_IO_CLASS:
 			if (match_hex(args, &token)) {
-				printk(KERN_WARNING PFX "bad  IO class parameter '%s' \n", p);
+				pr_warn("bad  IO class parameter '%s'\n", p);
 				goto out;
 			}
 			if (token != SRP_REV10_IB_IO_CLASS &&
 			    token != SRP_REV16A_IB_IO_CLASS) {
-				printk(KERN_WARNING PFX "unknown IO class parameter value"
+				pr_warn("unknown IO class parameter value"
 				       " %x specified (use %x or %x).\n",
 				       token, SRP_REV10_IB_IO_CLASS, SRP_REV16A_IB_IO_CLASS);
 				goto out;
@@ -2064,7 +2066,7 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
 
 		case SRP_OPT_CMD_SG_ENTRIES:
 			if (match_int(args, &token) || token < 1 || token > 255) {
-				printk(KERN_WARNING PFX "bad max cmd_sg_entries parameter '%s'\n", p);
+				pr_warn("bad max cmd_sg_entries parameter '%s'\n", p);
 				goto out;
 			}
 			target->cmd_sg_cnt = token;
@@ -2072,7 +2074,7 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
 
 		case SRP_OPT_ALLOW_EXT_SG:
 			if (match_int(args, &token)) {
-				printk(KERN_WARNING PFX "bad allow_ext_sg parameter '%s'\n", p);
+				pr_warn("bad allow_ext_sg parameter '%s'\n", p);
 				goto out;
 			}
 			target->allow_ext_sg = !!token;
@@ -2081,14 +2083,14 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
 		case SRP_OPT_SG_TABLESIZE:
 			if (match_int(args, &token) || token < 1 ||
 					token > SCSI_MAX_SG_CHAIN_SEGMENTS) {
-				printk(KERN_WARNING PFX "bad max sg_tablesize parameter '%s'\n", p);
+				pr_warn("bad max sg_tablesize parameter '%s'\n", p);
 				goto out;
 			}
 			target->sg_tablesize = token;
 			break;
 
 		default:
-			printk(KERN_WARNING PFX "unknown parameter or missing value "
+			pr_warn("unknown parameter or missing value "
 			       "'%s' in target creation request\n", p);
 			goto out;
 		}
@@ -2100,7 +2102,7 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
 		for (i = 0; i < ARRAY_SIZE(srp_opt_tokens); ++i)
 			if ((srp_opt_tokens[i].token & SRP_OPT_ALL) &&
 			    !(srp_opt_tokens[i].token & opt_mask))
-				printk(KERN_WARNING PFX "target creation request is "
+				pr_warn("target creation request is "
 				       "missing parameter '%s'\n",
 				       srp_opt_tokens[i].pattern);
 
@@ -2149,7 +2151,7 @@ static ssize_t srp_create_target(struct device *dev,
 
 	if (!host->srp_dev->fmr_pool && !target->allow_ext_sg &&
 				target->cmd_sg_cnt < target->sg_tablesize) {
-		printk(KERN_WARNING PFX "No FMR pool and no external indirect descriptors, limiting sg_tablesize to cmd_sg_cnt\n");
+		pr_warn("No FMR pool and no external indirect descriptors, limiting sg_tablesize to cmd_sg_cnt\n");
 		target->sg_tablesize = target->cmd_sg_cnt;
 	}
 
@@ -2309,7 +2311,7 @@ static void srp_add_one(struct ib_device *device)
 		return;
 
 	if (ib_query_device(device, dev_attr)) {
-		printk(KERN_WARNING PFX "Query device failed for %s\n",
+		pr_warn("Query device failed for %s\n",
 		       device->name);
 		goto free_attr;
 	}
@@ -2459,7 +2461,7 @@ static int __init srp_init_module(void)
 	BUILD_BUG_ON(FIELD_SIZEOF(struct ib_wc, wr_id) < sizeof(void *));
 
 	if (srp_sg_tablesize) {
-		printk(KERN_WARNING PFX "srp_sg_tablesize is deprecated, please use cmd_sg_entries\n");
+		pr_warn("srp_sg_tablesize is deprecated, please use cmd_sg_entries\n");
 		if (!cmd_sg_entries)
 			cmd_sg_entries = srp_sg_tablesize;
 	}
@@ -2468,14 +2470,14 @@ static int __init srp_init_module(void)
 		cmd_sg_entries = SRP_DEF_SG_TABLESIZE;
 
 	if (cmd_sg_entries > 255) {
-		printk(KERN_WARNING PFX "Clamping cmd_sg_entries to 255\n");
+		pr_warn("Clamping cmd_sg_entries to 255\n");
 		cmd_sg_entries = 255;
 	}
 
 	if (!indirect_sg_entries)
 		indirect_sg_entries = cmd_sg_entries;
 	else if (indirect_sg_entries < cmd_sg_entries) {
-		printk(KERN_WARNING PFX "Bumping up indirect_sg_entries to match cmd_sg_entries (%u)\n", cmd_sg_entries);
+		pr_warn("Bumping up indirect_sg_entries to match cmd_sg_entries (%u)\n", cmd_sg_entries);
 		indirect_sg_entries = cmd_sg_entries;
 	}
 
@@ -2486,7 +2488,7 @@ static int __init srp_init_module(void)
 
 	ret = class_register(&srp_class);
 	if (ret) {
-		printk(KERN_ERR PFX "couldn't register class infiniband_srp\n");
+		pr_err("couldn't register class infiniband_srp\n");
 		srp_release_transport(ib_srp_transport_template);
 		return ret;
 	}
@@ -2495,7 +2497,7 @@ static int __init srp_init_module(void)
 
 	ret = ib_register_client(&srp_client);
 	if (ret) {
-		printk(KERN_ERR PFX "couldn't register IB client\n");
+		pr_err("couldn't register IB client\n");
 		srp_release_transport(ib_srp_transport_template);
 		ib_sa_unregister_client(&srp_sa_client);
 		class_unregister(&srp_class);
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 02/14] ib_srp: Consolidate repetitive sysfs code
       [not found] ` <201112011954.25811.bvanassche-HInyCGIudOg@public.gmane.org>
  2011-12-01 18:55   ` [PATCH 01/14] ib_srp: Introduce pr_fmt() Bart Van Assche
@ 2011-12-01 18:57   ` Bart Van Assche
       [not found]     ` <201112011957.19892.bvanassche-HInyCGIudOg@public.gmane.org>
  2011-12-01 18:58   ` [PATCH 03/14] ib_srp: Disallow duplicate logins Bart Van Assche
                     ` (7 subsequent siblings)
  9 siblings, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-01 18:57 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, David Dillow; +Cc: Roland Dreier

Remove sysfs attributes before removing a target instead of
testing the target state in every sysfs attribute callback
method. Note: it is safe to invoke a sysfs attribute removal
method like device_remove_file() twice on the same attribute.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |   49 ++++++++++++----------------------
 1 files changed, 17 insertions(+), 32 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 5093cc6..868e7f8 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -474,6 +474,21 @@ static void srp_free_req_data(struct srp_target_port *target)
 	}
 }
 
+/**
+ * srp_del_scsi_host_attr() - Remove attributes defined in the host template.
+ * @shost: SCSI host whose attributes to remove from sysfs.
+ *
+ * Note: Any attributes defined in the host template and that did not exist
+ * before invocation of this function will be ignored.
+ */
+static void srp_del_scsi_host_attr(struct Scsi_Host *shost)
+{
+	struct device_attribute **attr;
+
+	for (attr = shost->hostt->shost_attrs; attr && *attr; ++attr)
+		device_remove_file(&shost->shost_dev, *attr);
+}
+
 static void srp_remove_work(struct work_struct *work)
 {
 	struct srp_target_port *target =
@@ -486,6 +501,7 @@ static void srp_remove_work(struct work_struct *work)
 	list_del(&target->list);
 	spin_unlock(&target->srp_host->target_lock);
 
+	srp_del_scsi_host_attr(target->scsi_host);
 	srp_remove_host(target->scsi_host);
 	scsi_remove_host(target->scsi_host);
 	ib_destroy_cm_id(target->cm_id);
@@ -1678,10 +1694,6 @@ static ssize_t show_id_ext(struct device *dev, struct device_attribute *attr,
 {
 	struct srp_target_port *target = host_to_target(class_to_shost(dev));
 
-	if (target->state == SRP_TARGET_DEAD ||
-	    target->state == SRP_TARGET_REMOVED)
-		return -ENODEV;
-
 	return sprintf(buf, "0x%016llx\n",
 		       (unsigned long long) be64_to_cpu(target->id_ext));
 }
@@ -1691,10 +1703,6 @@ static ssize_t show_ioc_guid(struct device *dev, struct device_attribute *attr,
 {
 	struct srp_target_port *target = host_to_target(class_to_shost(dev));
 
-	if (target->state == SRP_TARGET_DEAD ||
-	    target->state == SRP_TARGET_REMOVED)
-		return -ENODEV;
-
 	return sprintf(buf, "0x%016llx\n",
 		       (unsigned long long) be64_to_cpu(target->ioc_guid));
 }
@@ -1704,10 +1712,6 @@ static ssize_t show_service_id(struct device *dev,
 {
 	struct srp_target_port *target = host_to_target(class_to_shost(dev));
 
-	if (target->state == SRP_TARGET_DEAD ||
-	    target->state == SRP_TARGET_REMOVED)
-		return -ENODEV;
-
 	return sprintf(buf, "0x%016llx\n",
 		       (unsigned long long) be64_to_cpu(target->service_id));
 }
@@ -1717,10 +1721,6 @@ static ssize_t show_pkey(struct device *dev, struct device_attribute *attr,
 {
 	struct srp_target_port *target = host_to_target(class_to_shost(dev));
 
-	if (target->state == SRP_TARGET_DEAD ||
-	    target->state == SRP_TARGET_REMOVED)
-		return -ENODEV;
-
 	return sprintf(buf, "0x%04x\n", be16_to_cpu(target->path.pkey));
 }
 
@@ -1729,10 +1729,6 @@ static ssize_t show_dgid(struct device *dev, struct device_attribute *attr,
 {
 	struct srp_target_port *target = host_to_target(class_to_shost(dev));
 
-	if (target->state == SRP_TARGET_DEAD ||
-	    target->state == SRP_TARGET_REMOVED)
-		return -ENODEV;
-
 	return sprintf(buf, "%pI6\n", target->path.dgid.raw);
 }
 
@@ -1741,10 +1737,6 @@ static ssize_t show_orig_dgid(struct device *dev,
 {
 	struct srp_target_port *target = host_to_target(class_to_shost(dev));
 
-	if (target->state == SRP_TARGET_DEAD ||
-	    target->state == SRP_TARGET_REMOVED)
-		return -ENODEV;
-
 	return sprintf(buf, "%pI6\n", target->orig_dgid);
 }
 
@@ -1753,10 +1745,6 @@ static ssize_t show_req_lim(struct device *dev,
 {
 	struct srp_target_port *target = host_to_target(class_to_shost(dev));
 
-	if (target->state == SRP_TARGET_DEAD ||
-	    target->state == SRP_TARGET_REMOVED)
-		return -ENODEV;
-
 	return sprintf(buf, "%d\n", target->req_lim);
 }
 
@@ -1765,10 +1753,6 @@ static ssize_t show_zero_req_lim(struct device *dev,
 {
 	struct srp_target_port *target = host_to_target(class_to_shost(dev));
 
-	if (target->state == SRP_TARGET_DEAD ||
-	    target->state == SRP_TARGET_REMOVED)
-		return -ENODEV;
-
 	return sprintf(buf, "%d\n", target->zero_req_lim);
 }
 
@@ -2431,6 +2415,7 @@ static void srp_remove_one(struct ib_device *device)
 
 		list_for_each_entry_safe(target, tmp_target,
 					 &host->target_list, list) {
+			srp_del_scsi_host_attr(target->scsi_host);
 			srp_remove_host(target->scsi_host);
 			scsi_remove_host(target->scsi_host);
 			srp_disconnect_target(target);
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 03/14] ib_srp: Disallow duplicate logins
       [not found] ` <201112011954.25811.bvanassche-HInyCGIudOg@public.gmane.org>
  2011-12-01 18:55   ` [PATCH 01/14] ib_srp: Introduce pr_fmt() Bart Van Assche
  2011-12-01 18:57   ` [PATCH 02/14] ib_srp: Consolidate repetitive sysfs code Bart Van Assche
@ 2011-12-01 18:58   ` Bart Van Assche
       [not found]     ` <201112011958.17339.bvanassche-HInyCGIudOg@public.gmane.org>
  2011-12-01 18:59   ` [PATCH 04/14] ib_srp: Set block layer timeout Bart Van Assche
                     ` (6 subsequent siblings)
  9 siblings, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-01 18:58 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, David Dillow; +Cc: Roland Dreier

Currently the ib_srp driver allows to log in multiple times to the
same target via its sysfs interface. This leads to each target LUN
being imported multiple times at the initiator side, something that
should not be allowed.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |   32 ++++++++++++++++++++++++++++++++
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 868e7f8..2e0169b 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -2095,6 +2095,34 @@ out:
 	return ret;
 }
 
+/**
+ * srp_target_exists() - Check whether a given target already exists.
+ *
+ * Returns -EEXIST if the target already exists and 0 if not.
+ */
+static int srp_target_exists(struct srp_host *host,
+			     struct srp_target_port *target)
+{
+	struct srp_target_port *t;
+	int ret = 0;
+
+	spin_lock(&host->target_lock);
+	list_for_each_entry(t, &host->target_list, list) {
+		WARN_ON(t == target);
+		if (target->id_ext == t->id_ext &&
+		    target->ioc_guid == t->ioc_guid &&
+		    memcmp(target->orig_dgid, t->orig_dgid, 16) == 0 &&
+		    target->path.pkey == t->path.pkey &&
+		    target->service_id == t->service_id) {
+			ret = -EEXIST;
+			break;
+		}
+	}
+	spin_unlock(&host->target_lock);
+
+	return ret;
+}
+
 static ssize_t srp_create_target(struct device *dev,
 				 struct device_attribute *attr,
 				 const char *buf, size_t count)
@@ -2133,6 +2161,10 @@ static ssize_t srp_create_target(struct device *dev,
 	if (ret)
 		goto err;
 
+	ret = srp_target_exists(host, target);
+	if (ret)
+		goto err;
+
 	if (!host->srp_dev->fmr_pool && !target->allow_ext_sg &&
 				target->cmd_sg_cnt < target->sg_tablesize) {
 		pr_warn("No FMR pool and no external indirect descriptors, limiting sg_tablesize to cmd_sg_cnt\n");
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 04/14] ib_srp: Set block layer timeout
       [not found] ` <201112011954.25811.bvanassche-HInyCGIudOg@public.gmane.org>
                     ` (2 preceding siblings ...)
  2011-12-01 18:58   ` [PATCH 03/14] ib_srp: Disallow duplicate logins Bart Van Assche
@ 2011-12-01 18:59   ` Bart Van Assche
       [not found]     ` <201112011959.11956.bvanassche-HInyCGIudOg@public.gmane.org>
  2011-12-01 19:00   ` [PATCH 05/14] ib_srp: Avoid that late SRP replies cause trouble Bart Van Assche
                     ` (5 subsequent siblings)
  9 siblings, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-01 18:59 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: David Dillow, Roland Dreier

Make sure that the default block layer timeout is above the
InfiniBand transport layer timeout. The default block layer
timeout is 30 seconds. Typical values for the QP local ack
timeout and retry count are 19 and 7 respectively, which means
that it can take up to 60.1 seconds before a HCA submits an
error completion. The block layer timeout must be larger than
the transport layer timeout because otherwise it can happen
that an SRP response is received after the SCSI layer has
already killed a command.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |   30 ++++++++++++++++++++++++++++++
 drivers/infiniband/ulp/srp/ib_srp.h |    1 +
 2 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 2e0169b..0e2353d 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1372,6 +1372,17 @@ err:
 	return -ENOMEM;
 }
 
+static int srp_slave_alloc(struct scsi_device *sdev)
+{
+	struct Scsi_Host *shost = sdev->host;
+	struct srp_target_port *target = host_to_target(shost);
+
+	if (!WARN_ON(target->rq_tmo_jiffies == 0))
+		blk_queue_rq_timeout(sdev->request_queue,
+				     target->rq_tmo_jiffies);
+	return 0;
+}
+
 static void srp_cm_rep_handler(struct ib_cm_id *cm_id,
 			       struct srp_login_rsp *lrsp,
 			       struct srp_target_port *target)
@@ -1431,6 +1442,24 @@ static void srp_cm_rep_handler(struct ib_cm_id *cm_id,
 	if (ret)
 		goto error_free;
 
+	if (!WARN_ON((attr_mask & (IB_QP_TIMEOUT | IB_QP_RETRY_CNT)) == 0)) {
+		uint64_t T_tr_ns, max_compl_time_ms;
+
+		/*
+		 * Set target->rq_tmo_jiffies to one second more than the
+		 * largest time it can take before an error completion is
+		 * generated.
+		 */
+		T_tr_ns = 1ULL << (12 + qp_attr->timeout);
+		max_compl_time_ms = qp_attr->retry_cnt * 4 * T_tr_ns;
+		do_div(max_compl_time_ms, 1000 * 1000);
+		target->rq_tmo_jiffies =
+			msecs_to_jiffies(max_compl_time_ms + 1000);
+		pr_debug("max. IB completion time = %lld ms; block layer"
+			 " timeout = %d jiffies\n", max_compl_time_ms,
+			 target->rq_tmo_jiffies);
+	}
+
 	ret = ib_modify_qp(target->qp, qp_attr, attr_mask);
 	if (ret)
 		goto error_free;
@@ -1821,6 +1850,7 @@ static struct scsi_host_template srp_template = {
 	.module				= THIS_MODULE,
 	.name				= "InfiniBand SRP initiator",
 	.proc_name			= DRV_NAME,
+	.slave_alloc			= srp_slave_alloc,
 	.info				= srp_target_info,
 	.queuecommand			= srp_queuecommand,
 	.eh_abort_handler		= srp_abort,
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index 020caf0..f010bd9 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -138,6 +138,7 @@ struct srp_target_port {
 	u32			lkey;
 	u32			rkey;
 	enum srp_target_state	state;
+	u32			rq_tmo_jiffies;
 	unsigned int		max_iu_len;
 	unsigned int		cmd_sg_cnt;
 	unsigned int		indirect_size;
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 05/14] ib_srp: Avoid that late SRP replies cause trouble
       [not found] ` <201112011954.25811.bvanassche-HInyCGIudOg@public.gmane.org>
                     ` (3 preceding siblings ...)
  2011-12-01 18:59   ` [PATCH 04/14] ib_srp: Set block layer timeout Bart Van Assche
@ 2011-12-01 19:00   ` Bart Van Assche
       [not found]     ` <201112012000.04427.bvanassche-HInyCGIudOg@public.gmane.org>
  2011-12-01 19:02   ` [PATCH 06/14] ib_srp: Micro-optimize completion handlers Bart Van Assche
                     ` (4 subsequent siblings)
  9 siblings, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-01 19:00 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: David Dillow, Roland Dreier

If a user misconfigures the block layer timeout such that it is below
the InfiniBand RC timeout it can happen that an SRP reply arrives after
the SCSI error handler has already killed the associated SCSI command.
Avoid that late replies cause a kernel crash.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 0e2353d..a2624bf 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1074,10 +1074,12 @@ static void srp_process_rsp(struct srp_target_port *target, struct srp_rsp *rsp)
 	} else {
 		req = &target->req_ring[rsp->tag];
 		scmnd = req->scmnd;
-		if (!scmnd)
+		if (unlikely(!scmnd)) {
 			shost_printk(KERN_ERR, target->scsi_host,
 				     "Null scmnd for RSP w/tag %016llx\n",
 				     (unsigned long long) rsp->tag);
+			return;
+		}
 		scmnd->result = rsp->status;
 
 		if (rsp->flags & SRP_RSP_FLAG_SNSVALID) {
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 06/14] ib_srp: Micro-optimize completion handlers
       [not found] ` <201112011954.25811.bvanassche-HInyCGIudOg@public.gmane.org>
                     ` (4 preceding siblings ...)
  2011-12-01 19:00   ` [PATCH 05/14] ib_srp: Avoid that late SRP replies cause trouble Bart Van Assche
@ 2011-12-01 19:02   ` Bart Van Assche
       [not found]     ` <201112012002.17829.bvanassche-HInyCGIudOg@public.gmane.org>
  2011-12-01 19:02   ` [PATCH 07/14] ib_srp: Introduce srp_handle_qp_err() Bart Van Assche
                     ` (3 subsequent siblings)
  9 siblings, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-01 19:02 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: David Dillow, Roland Dreier

Help the CPU branch predictor by rearranging two if-statements such
that the most likely code is in the if-part instead of the else-part.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index a2624bf..e6d1aef 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1224,15 +1224,15 @@ static void srp_recv_completion(struct ib_cq *cq, void *target_ptr)
 
 	ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
 	while (ib_poll_cq(cq, 1, &wc) > 0) {
-		if (wc.status) {
+		if (wc.status == IB_WC_SUCCESS) {
+			srp_handle_recv(target, &wc);
+		} else {
 			shost_printk(KERN_ERR, target->scsi_host,
 				     PFX "failed receive status %d\n",
 				     wc.status);
 			target->qp_in_error = 1;
 			break;
 		}
-
-		srp_handle_recv(target, &wc);
 	}
 }
 
@@ -1243,16 +1243,16 @@ static void srp_send_completion(struct ib_cq *cq, void *target_ptr)
 	struct srp_iu *iu;
 
 	while (ib_poll_cq(cq, 1, &wc) > 0) {
-		if (wc.status) {
+		if (wc.status == IB_WC_SUCCESS) {
+			iu = (struct srp_iu *) (uintptr_t) wc.wr_id;
+			list_add(&iu->list, &target->free_tx);
+		} else {
 			shost_printk(KERN_ERR, target->scsi_host,
 				     PFX "failed send status %d\n",
 				     wc.status);
 			target->qp_in_error = 1;
 			break;
 		}
-
-		iu = (struct srp_iu *) (uintptr_t) wc.wr_id;
-		list_add(&iu->list, &target->free_tx);
 	}
 }
 
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 07/14] ib_srp: Introduce srp_handle_qp_err()
       [not found] ` <201112011954.25811.bvanassche-HInyCGIudOg@public.gmane.org>
                     ` (5 preceding siblings ...)
  2011-12-01 19:02   ` [PATCH 06/14] ib_srp: Micro-optimize completion handlers Bart Van Assche
@ 2011-12-01 19:02   ` Bart Van Assche
       [not found]     ` <201112012002.56307.bvanassche-HInyCGIudOg@public.gmane.org>
  2011-12-01 19:08   ` [PATCH 10/14] srp_transport: Simplify attribute initialization code Bart Van Assche
                     ` (2 subsequent siblings)
  9 siblings, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-01 19:02 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: David Dillow, Roland Dreier

Factor out common code into a new function.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |   19 +++++++++++--------
 1 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index e6d1aef..44c810b 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1217,6 +1217,15 @@ static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc)
 			     PFX "Recv failed with error code %d\n", res);
 }
 
+static void srp_handle_qp_err(enum ib_wc_status wc_status,
+			      enum ib_wc_opcode wc_opcode,
+			      struct srp_target_port *target)
+{
+	shost_printk(KERN_ERR, target->scsi_host,
+		     PFX "failed receive status %d\n", wc_status);
+	target->qp_in_error = 1;
+}
+
 static void srp_recv_completion(struct ib_cq *cq, void *target_ptr)
 {
 	struct srp_target_port *target = target_ptr;
@@ -1227,10 +1236,7 @@ static void srp_recv_completion(struct ib_cq *cq, void *target_ptr)
 		if (wc.status == IB_WC_SUCCESS) {
 			srp_handle_recv(target, &wc);
 		} else {
-			shost_printk(KERN_ERR, target->scsi_host,
-				     PFX "failed receive status %d\n",
-				     wc.status);
-			target->qp_in_error = 1;
+			srp_handle_qp_err(wc.status, wc.opcode, target);
 			break;
 		}
 	}
@@ -1247,10 +1253,7 @@ static void srp_send_completion(struct ib_cq *cq, void *target_ptr)
 			iu = (struct srp_iu *) (uintptr_t) wc.wr_id;
 			list_add(&iu->list, &target->free_tx);
 		} else {
-			shost_printk(KERN_ERR, target->scsi_host,
-				     PFX "failed send status %d\n",
-				     wc.status);
-			target->qp_in_error = 1;
+			srp_handle_qp_err(wc.status, wc.opcode, target);
 			break;
 		}
 	}
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 08/14] srp_transport: Document sysfs attributes
  2011-12-01 18:54 [PATCH 00/14] Make ib_srp better suited for H.A. purposes Bart Van Assche
@ 2011-12-01 19:05 ` Bart Van Assche
  2011-12-01 19:06 ` [PATCH 09/14] srp_transport: Fix attribute registration Bart Van Assche
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 65+ messages in thread
From: Bart Van Assche @ 2011-12-01 19:05 UTC (permalink / raw)
  To: linux-rdma; +Cc: linux-scsi, Fujita Tomonori, Brian King

Document the srp_transport sysfs attributes according to the rules specified
in Documentation/ABI/README.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Brian King <brking@linux.vnet.ibm.com>
---
 Documentation/ABI/stable/sysfs-transport-srp |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/ABI/stable/sysfs-transport-srp

diff --git a/Documentation/ABI/stable/sysfs-transport-srp b/Documentation/ABI/stable/sysfs-transport-srp
new file mode 100644
index 0000000..7b0d4a5
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-transport-srp
@@ -0,0 +1,12 @@
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/port_id
+Date:		June 27, 2007
+KernelVersion:	2.6.24
+Contact:	linux-scsi@vger.kernel.org
+Description:	16-byte local SRP port identifier in hexadecimal format. An
+		example: 4c:49:4e:55:58:20:56:49:4f:00:00:00:00:00:00:00.
+
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/roles
+Date:		June 27, 2007
+KernelVersion:	2.6.24
+Contact:	linux-scsi@vger.kernel.org
+Description:	Role of the remote port. Either "SRP Initiator" or "SRP Target".
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 09/14] srp_transport: Fix attribute registration
  2011-12-01 18:54 [PATCH 00/14] Make ib_srp better suited for H.A. purposes Bart Van Assche
  2011-12-01 19:05 ` [PATCH 08/14] srp_transport: Document sysfs attributes Bart Van Assche
@ 2011-12-01 19:06 ` Bart Van Assche
       [not found]   ` <201112012006.50445.bvanassche-HInyCGIudOg@public.gmane.org>
  2011-12-01 19:10 ` [PATCH 12/14] ib_srp: Rework error handling Bart Van Assche
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-01 19:06 UTC (permalink / raw)
  To: linux-rdma, David Dillow, Roland Dreier, stable
  Cc: linux-scsi, Fujita Tomonori, Brian King

Register transport attributes after the attribute array has been
set up instead of before.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: David Dillow <dillowda@ornl.gov>
Cc: Roland Dreier <roland@purestorage.com>
Cc: stable@vger.kernel.org
---
 drivers/scsi/scsi_transport_srp.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index 21a045e..07c4394 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -324,13 +324,14 @@ srp_attach_transport(struct srp_function_template *ft)
 	i->rport_attr_cont.ac.attrs = &i->rport_attrs[0];
 	i->rport_attr_cont.ac.class = &srp_rport_class.class;
 	i->rport_attr_cont.ac.match = srp_rport_match;
-	transport_container_register(&i->rport_attr_cont);
 
 	count = 0;
 	SETUP_RPORT_ATTRIBUTE_RD(port_id);
 	SETUP_RPORT_ATTRIBUTE_RD(roles);
 	i->rport_attrs[count] = NULL;
 
+	transport_container_register(&i->rport_attr_cont);
+
 	i->f = ft;
 
 	return &i->t;
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 10/14] srp_transport: Simplify attribute initialization code
       [not found] ` <201112011954.25811.bvanassche-HInyCGIudOg@public.gmane.org>
                     ` (6 preceding siblings ...)
  2011-12-01 19:02   ` [PATCH 07/14] ib_srp: Introduce srp_handle_qp_err() Bart Van Assche
@ 2011-12-01 19:08   ` Bart Van Assche
       [not found]     ` <201112012008.00502.bvanassche-HInyCGIudOg@public.gmane.org>
  2011-12-01 19:09   ` [PATCH 11/14] ib_srp: Document sysfs attributes Bart Van Assche
  2011-12-01 19:11   ` [PATCH 13/14] ib_srp: Implement transport layer ping Bart Van Assche
  9 siblings, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-01 19:08 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fujita Tomonori, Brian King,
	David Dillow, Roland Dreier
  Cc: linux-scsi-u79uwXL29TY76Z2rM5mHXA

Eliminate the private_rport_attrs[] array and the SETUP_*() macros
used to set up that array since the information in that array
duplicates the information in the static device attributes. Also,
trigger a kernel warning if SRP_RPORT_ATTRS is not large enough
since it is easy to forget to update that macro when adding new
attributes.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: FUJITA Tomonori <fujita.tomonori-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
Cc: Brian King <brking-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
---
 drivers/scsi/scsi_transport_srp.c |   26 ++++----------------------
 1 files changed, 4 insertions(+), 22 deletions(-)

diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index 07c4394..1cbf097 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -47,7 +47,6 @@ struct srp_internal {
 	struct device_attribute *host_attrs[SRP_HOST_ATTRS + 1];
 
 	struct device_attribute *rport_attrs[SRP_RPORT_ATTRS + 1];
-	struct device_attribute private_rport_attrs[SRP_RPORT_ATTRS];
 	struct transport_container rport_attr_cont;
 };
 
@@ -72,24 +71,6 @@ static DECLARE_TRANSPORT_CLASS(srp_host_class, "srp_host", srp_host_setup,
 static DECLARE_TRANSPORT_CLASS(srp_rport_class, "srp_remote_ports",
 			       NULL, NULL, NULL);
 
-#define SETUP_TEMPLATE(attrb, field, perm, test, ro_test, ro_perm)	\
-	i->private_##attrb[count] = dev_attr_##field;		\
-	i->private_##attrb[count].attr.mode = perm;			\
-	if (ro_test) {							\
-		i->private_##attrb[count].attr.mode = ro_perm;		\
-		i->private_##attrb[count].store = NULL;			\
-	}								\
-	i->attrb[count] = &i->private_##attrb[count];			\
-	if (test)							\
-		count++
-
-#define SETUP_RPORT_ATTRIBUTE_RD(field)					\
-	SETUP_TEMPLATE(rport_attrs, field, S_IRUGO, 1, 0, 0)
-
-#define SETUP_RPORT_ATTRIBUTE_RW(field)					\
-	SETUP_TEMPLATE(rport_attrs, field, S_IRUGO | S_IWUSR,		\
-		       1, 1, S_IRUGO)
-
 #define SRP_PID(p) \
 	(p)->port_id[0], (p)->port_id[1], (p)->port_id[2], (p)->port_id[3], \
 	(p)->port_id[4], (p)->port_id[5], (p)->port_id[6], (p)->port_id[7], \
@@ -326,9 +307,10 @@ srp_attach_transport(struct srp_function_template *ft)
 	i->rport_attr_cont.ac.match = srp_rport_match;
 
 	count = 0;
-	SETUP_RPORT_ATTRIBUTE_RD(port_id);
-	SETUP_RPORT_ATTRIBUTE_RD(roles);
-	i->rport_attrs[count] = NULL;
+	i->rport_attrs[count++] = &dev_attr_port_id;
+	i->rport_attrs[count++] = &dev_attr_roles;
+	i->rport_attrs[count++] = NULL;
+	WARN_ON(count > ARRAY_SIZE(i->rport_attrs));
 
 	transport_container_register(&i->rport_attr_cont);
 
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 11/14] ib_srp: Document sysfs attributes
       [not found] ` <201112011954.25811.bvanassche-HInyCGIudOg@public.gmane.org>
                     ` (7 preceding siblings ...)
  2011-12-01 19:08   ` [PATCH 10/14] srp_transport: Simplify attribute initialization code Bart Van Assche
@ 2011-12-01 19:09   ` Bart Van Assche
       [not found]     ` <201112012009.12815.bvanassche-HInyCGIudOg@public.gmane.org>
  2011-12-01 19:11   ` [PATCH 13/14] ib_srp: Implement transport layer ping Bart Van Assche
  9 siblings, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-01 19:09 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: David Dillow, Roland Dreier

Document the ib_srp sysfs attributes according to the rules specified
in Documentation/ABI/README.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
---
 Documentation/ABI/stable/sysfs-driver-ib_srp |  141 ++++++++++++++++++++++++++
 1 files changed, 141 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/ABI/stable/sysfs-driver-ib_srp

diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp b/Documentation/ABI/stable/sysfs-driver-ib_srp
new file mode 100644
index 0000000..6b74c09
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-driver-ib_srp
@@ -0,0 +1,141 @@
+What:		/sys/class/infiniband_srp/srp-<hca>-<port_number>/add_target
+Date:		January 2, 2006
+KernelVersion:	2.6.15
+Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	Interface for making ib_srp connect to a new target.
+		One can request ib_srp to connect to a new target by writing
+		a comma-separated list of login parameters to this sysfs
+		attribute. The supported parameters are:
+		* id_ext, a 16-digit hexadecimal number specifying the last
+		  eight bytes of the 16-byte target port identifier sent by
+		  ib_srp to the target in the SRP_LOGIN_REQ request.
+		* ioc_guid, a 16-digit hexadecimal number specifying the first
+		  eight bytes of the 16-byte target port identifier sent by
+		  ib_srp to the target in the SRP_LOGIN_REQ request.
+		* dgid, a 16-digit hexadecimal number specifying the
+		  destination GID.
+		* pkey, a four-digit hexadecimal number specifying the
+		  InfiniBand partition key.
+		* service_id, a 16-digit hexadecimal number specifying the
+		  InfiniBand service ID used for communication with the SRP
+		  target.
+		* max_sect, a decimal number specifying the maximum number of
+		  sectors to be transferred via a single SRP_CMD request.
+		* max_cmd_per_lun, a decimal number specifying the maximum
+		  number of outstanding commands per LUN.
+		* io_class, a hexadecimal number specifying the SRP I/O class.
+		  Must be either 0xff00 (rev 10) or 0x0100 (rev 16a).
+		* initiator_ext, a 16-digit hexadecimal number specifying the
+		  last eight bytes of the 16-byte initiator port identifier
+		  sent by ib_srp to the target in the SRP_LOGIN_REQ request.
+		* cmd_sg_entries, the maximum number of memory descriptors
+		  that fit in a single SRP command when using the direct data
+		  buffer descriptor format.
+		* allow_ext_sg, whether ib_srp is allowed to send SRP_CMD
+		  requests to the target with an indirect data buffer
+		  descriptor, that is a data buffer descriptor that has to be
+		  obtained by the target via an additional RDMA transfer.
+		* sg_tablesize, a number specifying the maximum length of
+		  scatter/gather lists passed by the SCSI layer to ib_srp.
+
+What:		/sys/class/infiniband_srp/srp-<hca>-<port_number>/ibdev
+Date:		January 2, 2006
+KernelVersion:	2.6.15
+Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	HCA name (<hca>).
+
+What:		/sys/class/infiniband_srp/srp-<hca>-<port_number>/port
+Date:		January 2, 2006
+KernelVersion:	2.6.15
+Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	HCA port number (<port_number>).
+
+What:		/sys/class/scsi_host/host<n>/allow_ext_sg
+Date:		May 19, 2011
+KernelVersion:	2.6.39
+Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	Whether ib_srp is allowed to send SRP_CMD requests to the
+		target with an indirect data buffer descriptor, that is a
+		data buffer descriptor that has to be obtained by the target
+		via an additional RDMA transfer. If set to zero, ib_srp will
+		refuse to process data transfers whose scatter/gather list
+		length exceeds cmd_sg_entries.
+
+What:		/sys/class/scsi_host/host<n>/cmd_sg_entries
+Date:		May 19, 2011
+KernelVersion:	2.6.39
+Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	See also allow_ext_sg.
+
+What:		/sys/class/scsi_host/host<n>/dgid
+Date:		June 17, 2006
+KernelVersion:	2.6.17
+Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	InfiniBand destination GID used for communication with the SRP
+		target. Differs from orig_dgid if port redirection has happened.
+
+What:		/sys/class/scsi_host/host<n>/id_ext
+Date:		June 17, 2006
+KernelVersion:	2.6.17
+Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	Last eight bytes of the 16-byte target port identifier sent by
+		ib_srp to the target in the SRP_LOGIN_REQ request.
+
+What:		/sys/class/scsi_host/host<n>/ioc_guid
+Date:		June 17, 2006
+KernelVersion:	2.6.17
+Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	First eight bytes of the 16-byte target port identifier sent
+		by ib_srp to the target in the SRP_LOGIN_REQ request.
+
+What:		/sys/class/scsi_host/host<n>/local_ib_device
+Date:		November 29, 2006
+KernelVersion:	2.6.19
+Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	Name of the InfiniBand HCA used for communicating with the
+		SRP target.
+
+What:		/sys/class/scsi_host/host<n>/local_ib_port
+Date:		November 29, 2006
+KernelVersion:	2.6.19
+Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	Number of the HCA port used for communicating with the
+		SRP target.
+
+What:		/sys/class/scsi_host/host<n>/orig_dgid
+Date:		June 17, 2006
+KernelVersion:	2.6.17
+Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	InfiniBand destination GID specified in the parameters
+		written to the add_target sysfs attribute.
+
+What:		/sys/class/scsi_host/host<n>/pkey
+Date:		June 17, 2006
+KernelVersion:	2.6.17
+Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	A 16-bit number representing the InfiniBand partition key used
+		for communication with the SRP target.
+
+What:		/sys/class/scsi_host/host<n>/req_lim
+Date:		October 20, 2010
+KernelVersion:	2.6.36
+Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	Number of requests ib_srp can send to the target before it has
+		to wait for more credits. For more information see also the
+		SRP credit algorithm in the SRP specification.
+
+What:		/sys/class/scsi_host/host<n>/service_id
+Date:		June 17, 2006
+KernelVersion:	2.6.17
+Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	InfiniBand service ID used for communication with the SRP
+		target.
+
+What:		/sys/class/scsi_host/host<n>/zero_req_lim
+Date:		September 20, 2006
+KernelVersion:	2.6.18
+Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	Number of times the initiator had to wait with sending a
+		request to the target because it ran out of credits. For more
+		information see also the SRP credit algorithm in the SRP
+		specification.
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 12/14] ib_srp: Rework error handling
  2011-12-01 18:54 [PATCH 00/14] Make ib_srp better suited for H.A. purposes Bart Van Assche
  2011-12-01 19:05 ` [PATCH 08/14] srp_transport: Document sysfs attributes Bart Van Assche
  2011-12-01 19:06 ` [PATCH 09/14] srp_transport: Fix attribute registration Bart Van Assche
@ 2011-12-01 19:10 ` Bart Van Assche
       [not found]   ` <201112012010.37276.bvanassche-HInyCGIudOg@public.gmane.org>
       [not found] ` <201112011954.25811.bvanassche-HInyCGIudOg@public.gmane.org>
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-01 19:10 UTC (permalink / raw)
  To: linux-rdma, David Dillow, Fujita Tomonori, Brian King, Roland Dreier
  Cc: linux-scsi

Add the necessary functions in the SRP transport module to allow
an SRP initiator driver to implement transport layer recovery.

Convert srp_target_port.qp_in_error into a target port state.

Factor out the code for removing an SRP target into the new function
srp_remove_target().

In the ib_srp IB completion handlers, do not stop processing
completions after the first error completion or a CM DREQ has been
received. Block the SCSI target as soon as the first IB error
completion has been received. Eliminate the SCSI target state test
from srp_queuecommand().

Rework ib_srp transport layer error handling. Instead of letting SCSI
commands time out if a transport layer error occurs, block the SCSI
host and try to reconnect until the reconnect timeout elapses or until
the maximum number of reconnect attempts has been exceeded, whichever
happens first.

Rescan LUNs after having unblocked a SCSI target controlled by ib_srp.

Add the sysfs attributes reconnect_tmo, failed_reconnects and
max_reconnects.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: David Dillow <dillowda@ornl.gov>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: Roland Dreier <roland@purestorage.com>
---
 Documentation/ABI/stable/sysfs-driver-ib_srp |   25 ++
 drivers/infiniband/ulp/srp/ib_srp.c          |  398 ++++++++++++++++++++------
 drivers/infiniband/ulp/srp/ib_srp.h          |   27 ++-
 drivers/scsi/scsi_transport_srp.c            |  162 +++++++++++-
 include/scsi/scsi_transport_srp.h            |   31 ++
 5 files changed, 548 insertions(+), 95 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp b/Documentation/ABI/stable/sysfs-driver-ib_srp
index 6b74c09..b1f449c 100644
--- a/Documentation/ABI/stable/sysfs-driver-ib_srp
+++ b/Documentation/ABI/stable/sysfs-driver-ib_srp
@@ -74,6 +74,12 @@ Contact:	linux-rdma@vger.kernel.org
 Description:	InfiniBand destination GID used for communication with the SRP
 		target. Differs from orig_dgid if port redirection has happened.
 
+What:		/sys/class/scsi_host/host<n>/failed_reconnects
+Date:		November 1, 2011
+KernelVersion:	3.3
+Contact:	linux-rdma@vger.kernel.org
+Description:	Number of consecutive failed SRP reconnect attempts.
+
 What:		/sys/class/scsi_host/host<n>/id_ext
 Date:		June 17, 2006
 KernelVersion:	2.6.17
@@ -102,6 +108,14 @@ Contact:	linux-rdma@vger.kernel.org
 Description:	Number of the HCA port used for communicating with the
 		SRP target.
 
+What:		/sys/class/scsi_host/host<n>/max_reconnects
+Date:		November 1, 2011
+KernelVersion:	3.3
+Contact:	linux-rdma@vger.kernel.org
+Description:	Maximum number of times ib_srp should attempt to reconnect
+		to the SRP target after an I/O error occurred. The value -1
+		means that ib_srp should keep trying to reconnect forever.
+
 What:		/sys/class/scsi_host/host<n>/orig_dgid
 Date:		June 17, 2006
 KernelVersion:	2.6.17
@@ -116,6 +130,17 @@ Contact:	linux-rdma@vger.kernel.org
 Description:	A 16-bit number representing the InfiniBand partition key used
 		for communication with the SRP target.
 
+What:		/sys/class/scsi_host/host<n>/reconnect_tmo
+Date:		November 1, 2011
+KernelVersion:	3.3
+Contact:	linux-rdma@vger.kernel.org
+Description:	Time in seconds to wait after a failed connection attempt
+		before trying to reconnect. Setting this parameter to zero or
+		to a negative value prevents ib_srp to attempt to reconnect.
+		Changing this parameter from a value <= 0 into a value > 0 at
+		a time there is no connection with the target will cause
+		ib_srp to try to reconnect with the target.
+
 What:		/sys/class/scsi_host/host<n>/req_lim
 Date:		October 20, 2010
 KernelVersion:	2.6.36
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 44c810b..797a8f1 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2005 Cisco Systems.  All rights reserved.
+ * Copyright (c) 2010-2011 Bart Van Assche <bvanassche@acm.org>.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -430,6 +431,22 @@ static int srp_send_req(struct srp_target_port *target)
 
 static void srp_disconnect_target(struct srp_target_port *target)
 {
+	bool disconnect = false;
+
+	spin_lock_irq(&target->lock);
+	switch (target->state) {
+	case SRP_TARGET_LIVE:
+	case SRP_TARGET_BLOCKED:
+		disconnect = true;
+		target->state = SRP_TARGET_DISCONNECTED;
+	default:
+		break;
+	}
+	spin_unlock_irq(&target->lock);
+
+	if (!disconnect)
+		return;
+
 	/* XXX should send SRP_I_LOGOUT request */
 
 	init_completion(&target->done);
@@ -489,32 +506,77 @@ static void srp_del_scsi_host_attr(struct Scsi_Host *shost)
 		device_remove_file(&shost->shost_dev, *attr);
 }
 
-static void srp_remove_work(struct work_struct *work)
+/**
+ * srp_remove_target() - Remove an SRP target.
+ *
+ * The strategy to remove a target is as follows:
+ * - The caller must have set target->state to SRP_TARGET_REMOVED before
+ *   invoking or queueing this function such that new calls to
+ *   srp_disconnect_target(), srp_reconnect_target(), srp_recovery_timedout()
+ *   or srp_block_work() do not have any effect.
+ * - Cancel block_work and lock and unlock target->mutex to make sure that any
+ *   concurrent srp_block_rport() calls have finished.
+ * - Unblock the rport such that any blocked scanning work resumes.
+ * - Tear down the rport, the SCSI host and the IB resources associated with
+ *   the target.
+ */
+static void srp_remove_target(struct srp_target_port *target)
 {
-	struct srp_target_port *target =
-		container_of(work, struct srp_target_port, work);
-
-	if (!srp_change_state(target, SRP_TARGET_DEAD, SRP_TARGET_REMOVED))
-		return;
-
-	spin_lock(&target->srp_host->target_lock);
-	list_del(&target->list);
-	spin_unlock(&target->srp_host->target_lock);
+	WARN_ON(target->state != SRP_TARGET_REMOVED);
 
 	srp_del_scsi_host_attr(target->scsi_host);
+	cancel_work_sync(&target->block_work);
+	mutex_lock(&target->mutex);
+	mutex_unlock(&target->mutex);
+	srp_unblock_rport(rport);
+	srp_rport_disable_recovery(target->rport);
+	cancel_delayed_work_sync(&target->reconnect_work);
+	srp_stop_rport(target->rport);
 	srp_remove_host(target->scsi_host);
 	scsi_remove_host(target->scsi_host);
+	cancel_work_sync(&target->scan_work);
+	srp_disconnect_target(target);
 	ib_destroy_cm_id(target->cm_id);
 	srp_free_target_ib(target);
 	srp_free_req_data(target);
 	scsi_host_put(target->scsi_host);
 }
 
+static void srp_remove_work(struct work_struct *work)
+{
+	struct srp_target_port *target;
+
+	target = container_of(work, struct srp_target_port, remove_work);
+
+	spin_lock(&target->srp_host->target_lock);
+	list_del(&target->list);
+	spin_unlock(&target->srp_host->target_lock);
+
+	srp_remove_target(target);
+}
+
+static void srp_recovery_timedout(struct srp_rport *rport)
+{
+	struct srp_target_port *target = rport->lld_data;
+
+	pr_debug("recovery timeout: rport = %p; target = %p / state %d\n",
+		 rport, target, target->state);
+
+	cancel_delayed_work_sync(&target->reconnect_work);
+
+	mutex_lock(&target->mutex);
+	if (srp_change_state(target, SRP_TARGET_BLOCKED, SRP_TARGET_LIVE))
+		srp_unblock_rport(rport);
+	mutex_unlock(&target->mutex);
+}
+
 static int srp_connect_target(struct srp_target_port *target)
 {
 	int retries = 3;
 	int ret;
 
+	WARN_ON(target->state != SRP_TARGET_CONNECTING);
+
 	ret = srp_lookup_path(target);
 	if (ret)
 		return ret;
@@ -606,16 +668,40 @@ static void srp_reset_req(struct srp_target_port *target, struct srp_request *re
 	srp_remove_req(target, req, 0);
 }
 
+static void srp_scan_target(struct srp_target_port *target)
+{
+	scsi_scan_target(&target->scsi_host->shost_gendev, 0, target->scsi_id,
+			 SCAN_WILD_CARD, 0);
+}
+
+static void srp_scan_work(struct work_struct *work)
+{
+	struct srp_target_port *target;
+
+	target = container_of(work, struct srp_target_port, scan_work);
+	srp_scan_target(target);
+}
+
 static int srp_reconnect_target(struct srp_target_port *target)
 {
 	struct ib_qp_attr qp_attr;
 	struct ib_wc wc;
 	int i, ret;
 
-	if (!srp_change_state(target, SRP_TARGET_LIVE, SRP_TARGET_CONNECTING))
-		return -EAGAIN;
+	mutex_lock(&target->mutex);
+	if (srp_change_state(target, SRP_TARGET_LIVE, SRP_TARGET_BLOCKED)) {
+		shost_printk(KERN_INFO, target->scsi_host,
+			     PFX "blocked SCSI host.\n");
+		srp_block_rport(target->rport);
+	}
+	mutex_unlock(&target->mutex);
 
 	srp_disconnect_target(target);
+
+	if (!srp_change_state(target, SRP_TARGET_DISCONNECTED,
+			      SRP_TARGET_CONNECTING))
+		return -EAGAIN;
+
 	/*
 	 * Now get a new local CM ID so that we avoid confusing the
 	 * target in case things are really fouled up.
@@ -648,38 +734,65 @@ static int srp_reconnect_target(struct srp_target_port *target)
 	for (i = 0; i < SRP_SQ_SIZE; ++i)
 		list_add(&target->tx_ring[i]->list, &target->free_tx);
 
-	target->qp_in_error = 0;
 	ret = srp_connect_target(target);
 	if (ret)
 		goto err;
 
-	if (!srp_change_state(target, SRP_TARGET_CONNECTING, SRP_TARGET_LIVE))
+	mutex_lock(&target->mutex);
+	if (srp_change_state(target, SRP_TARGET_CONNECTING, SRP_TARGET_LIVE)) {
+		shost_printk(KERN_INFO, target->scsi_host,
+			     PFX "unblocked SCSI host.\n");
+		srp_unblock_rport(target->rport);
+	} else {
 		ret = -EAGAIN;
+	}
+	mutex_unlock(&target->mutex);
+
+	/*
+	 * Since this code can be invoked from the context of the SCSI error
+	 * handler, invoke SCSI scanning asynchronously.
+	 */
+	if (ret == 0)
+		queue_work(system_long_wq, &target->scan_work);
 
 	return ret;
 
 err:
-	shost_printk(KERN_ERR, target->scsi_host,
-		     PFX "reconnect failed (%d), removing target port.\n", ret);
+	srp_change_state(target, SRP_TARGET_CONNECTING, SRP_TARGET_BLOCKED);
+	return ret;
+}
 
-	/*
-	 * We couldn't reconnect, so kill our target port off.
-	 * However, we have to defer the real removal because we
-	 * are in the context of the SCSI error handler now, which
-	 * will deadlock if we call scsi_remove_host().
-	 *
-	 * Schedule our work inside the lock to avoid a race with
-	 * the flush_scheduled_work() in srp_remove_one().
-	 */
-	spin_lock_irq(&target->lock);
-	if (target->state == SRP_TARGET_CONNECTING) {
-		target->state = SRP_TARGET_DEAD;
-		INIT_WORK(&target->work, srp_remove_work);
-		queue_work(ib_wq, &target->work);
+static void srp_reconnect_repeatedly(struct srp_target_port *target)
+{
+	int res, tmo;
+
+	res = srp_reconnect_target(target);
+	if (res == 0 || target->state != SRP_TARGET_BLOCKED)
+		return;
+
+	++target->failed_reconnects;
+
+	shost_printk(KERN_ERR, target->scsi_host,
+		     PFX "reconnect attempt %d failed (%d).\n",
+		     target->failed_reconnects, res);
+
+	tmo = target->reconnect_tmo;
+	if (tmo > 0 &&
+	    (target->max_reconnects < 0 ||
+	     target->failed_reconnects < target->max_reconnects)) {
+		queue_delayed_work(system_long_wq, &target->reconnect_work,
+				   tmo * HZ);
 	}
-	spin_unlock_irq(&target->lock);
+}
 
-	return ret;
+static void srp_reconnect_work(struct work_struct *work)
+{
+	struct srp_target_port *target;
+
+	target = container_of(to_delayed_work(work), struct srp_target_port,
+			      reconnect_work);
+
+	srp_reconnect_repeatedly(target);
 }
 
 static void srp_map_desc(struct srp_map_state *state, dma_addr_t dma_addr,
@@ -1217,13 +1330,39 @@ static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc)
 			     PFX "Recv failed with error code %d\n", res);
 }
 
+static void srp_block_work(struct work_struct *work)
+{
+	struct srp_target_port *target;
+	bool reconnect = false;
+
+	target = container_of(work, struct srp_target_port, block_work);
+
+	mutex_lock(&target->mutex);
+	if (srp_change_state(target, SRP_TARGET_LIVE, SRP_TARGET_BLOCKED)) {
+		shost_printk(KERN_INFO, target->scsi_host,
+			     PFX "blocked SCSI host.\n");
+		srp_block_rport(target->rport);
+		srp_start_recovery_timer(target->rport);
+		reconnect = true;
+	}
+	mutex_unlock(&target->mutex);
+
+	if (reconnect) {
+		target->failed_reconnects = 0;
+		srp_reconnect_repeatedly(target);
+	}
+}
+
 static void srp_handle_qp_err(enum ib_wc_status wc_status,
 			      enum ib_wc_opcode wc_opcode,
 			      struct srp_target_port *target)
 {
-	shost_printk(KERN_ERR, target->scsi_host,
-		     PFX "failed receive status %d\n", wc_status);
-	target->qp_in_error = 1;
+	if (!queue_work(system_long_wq, &target->block_work) ||
+	    target->state == SRP_TARGET_BLOCKED)
+		return;
+
+	shost_printk(KERN_ERR, target->scsi_host, PFX "failed %s status %d\n",
+		     wc_opcode & IB_WC_RECV ? "receive" : "send", wc_status);
 }
 
 static void srp_recv_completion(struct ib_cq *cq, void *target_ptr)
@@ -1233,12 +1372,10 @@ static void srp_recv_completion(struct ib_cq *cq, void *target_ptr)
 
 	ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
 	while (ib_poll_cq(cq, 1, &wc) > 0) {
-		if (wc.status == IB_WC_SUCCESS) {
+		if (wc.status == IB_WC_SUCCESS)
 			srp_handle_recv(target, &wc);
-		} else {
+		else
 			srp_handle_qp_err(wc.status, wc.opcode, target);
-			break;
-		}
 	}
 }
 
@@ -1254,7 +1391,6 @@ static void srp_send_completion(struct ib_cq *cq, void *target_ptr)
 			list_add(&iu->list, &target->free_tx);
 		} else {
 			srp_handle_qp_err(wc.status, wc.opcode, target);
-			break;
 		}
 	}
 }
@@ -1269,16 +1405,6 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 	unsigned long flags;
 	int len;
 
-	if (target->state == SRP_TARGET_CONNECTING)
-		goto err;
-
-	if (target->state == SRP_TARGET_DEAD ||
-	    target->state == SRP_TARGET_REMOVED) {
-		scmnd->result = DID_BAD_TARGET << 16;
-		scmnd->scsi_done(scmnd);
-		return 0;
-	}
-
 	spin_lock_irqsave(&target->lock, flags);
 	iu = __srp_get_tx_iu(target, SRP_IU_CMD);
 	if (!iu)
@@ -1335,7 +1461,6 @@ err_iu:
 err_unlock:
 	spin_unlock_irqrestore(&target->lock, flags);
 
-err:
 	return SCSI_MLQUEUE_HOST_BUSY;
 }
 
@@ -1589,6 +1714,7 @@ static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
 		if (ib_send_cm_drep(cm_id, NULL, 0))
 			shost_printk(KERN_ERR, target->scsi_host,
 				     PFX "Sending CM DREP failed\n");
+		queue_work(system_long_wq, &target->block_work);
 		break;
 
 	case IB_CM_TIMEWAIT_EXIT:
@@ -1623,10 +1749,6 @@ static int srp_send_tsk_mgmt(struct srp_target_port *target,
 	struct srp_iu *iu;
 	struct srp_tsk_mgmt *tsk_mgmt;
 
-	if (target->state == SRP_TARGET_DEAD ||
-	    target->state == SRP_TARGET_REMOVED)
-		return -1;
-
 	init_completion(&target->tsk_mgmt_done);
 
 	spin_lock_irq(&target->lock);
@@ -1669,7 +1791,7 @@ static int srp_abort(struct scsi_cmnd *scmnd)
 
 	shost_printk(KERN_ERR, target->scsi_host, "SRP abort called\n");
 
-	if (!req || target->qp_in_error)
+	if (!req)
 		return FAILED;
 	if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
 			      SRP_TSK_ABORT_TASK))
@@ -1693,8 +1815,6 @@ static int srp_reset_device(struct scsi_cmnd *scmnd)
 
 	shost_printk(KERN_ERR, target->scsi_host, "SRP reset_device called\n");
 
-	if (target->qp_in_error)
-		return FAILED;
 	if (srp_send_tsk_mgmt(target, SRP_TAG_NO_REQ, scmnd->device->lun,
 			      SRP_TSK_LUN_RESET))
 		return FAILED;
@@ -1713,12 +1833,38 @@ static int srp_reset_device(struct scsi_cmnd *scmnd)
 static int srp_reset_host(struct scsi_cmnd *scmnd)
 {
 	struct srp_target_port *target = host_to_target(scmnd->device->host);
+	struct srp_host *host = target->srp_host;
 	int ret = FAILED;
+	bool remove = false;
 
 	shost_printk(KERN_ERR, target->scsi_host, PFX "SRP reset_host called\n");
 
-	if (!srp_reconnect_target(target))
+	if (!srp_reconnect_target(target)) {
 		ret = SUCCESS;
+	} else {
+		/*
+		 * We couldn't reconnect, so kill our target port off.
+		 * However, we have to defer the real removal because we
+		 * are in the context of the SCSI error handler now, which
+		 * will deadlock if we call scsi_remove_host().
+		 *
+		 * Schedule our work inside the lock to avoid a race with
+		 * the flush_work_sync() call in srp_remove_one().
+		 */
+		spin_lock(&host->target_lock);
+		spin_lock_irq(&target->lock);
+		if (target->state != SRP_TARGET_REMOVED) {
+			target->state = SRP_TARGET_REMOVED;
+			remove = true;
+		}
+		spin_unlock_irq(&target->lock);
+		if (remove) {
+			shost_printk(KERN_ERR, target->scsi_host, PFX "recon"
+				     "nect failed, removing target port.\n");
+			queue_work(system_long_wq, &target->remove_work);
+		}
+		spin_unlock(&host->target_lock);
+	}
 
 	return ret;
 }
@@ -1822,6 +1968,66 @@ static ssize_t show_allow_ext_sg(struct device *dev,
 	return sprintf(buf, "%s\n", target->allow_ext_sg ? "true" : "false");
 }
 
+static ssize_t show_reconnect_tmo(struct device *dev,
+				  struct device_attribute *attr, char *buf)
+{
+	struct srp_target_port *target = host_to_target(class_to_shost(dev));
+
+	return sprintf(buf, "%d\n", target->reconnect_tmo);
+}
+
+static ssize_t store_reconnect_tmo(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, const size_t count)
+{
+	struct srp_target_port *target = host_to_target(class_to_shost(dev));
+	char ch[16];
+	int res, tmo;
+
+	sprintf(ch, "%.*s", (int)min(sizeof(ch) - 1, count), buf);
+	res = kstrtoint(ch, 0, &tmo);
+	if (res)
+		goto out;
+	target->reconnect_tmo = tmo;
+	if (tmo > 0 && target->state == SRP_TARGET_BLOCKED)
+		queue_delayed_work(system_long_wq, &target->reconnect_work,
+				   tmo * HZ);
+	else if (tmo <= 0)
+		cancel_delayed_work(&target->reconnect_work);
+	res = count;
+out:
+	return res;
+}
+
+static ssize_t show_failed_reconnects(struct device *dev,
+				      struct device_attribute *attr, char *buf)
+{
+	struct srp_target_port *target = host_to_target(class_to_shost(dev));
+
+	return sprintf(buf, "%d\n", target->failed_reconnects);
+}
+
+static ssize_t show_max_reconnects(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct srp_target_port *target = host_to_target(class_to_shost(dev));
+
+	return sprintf(buf, "%d\n", target->max_reconnects);
+}
+
+static ssize_t store_max_reconnects(struct device *dev,
+				    struct device_attribute *attr,
+				    const char *buf, const size_t count)
+{
+	struct srp_target_port *target = host_to_target(class_to_shost(dev));
+	char ch[16];
+	int res;
+
+	sprintf(ch, "%.*s", (int)min(sizeof(ch) - 1, count), buf);
+	res = kstrtoint(ch, 0, &target->max_reconnects);
+	return res == 0 ? count : res;
+}
+
 static DEVICE_ATTR(id_ext,	    S_IRUGO, show_id_ext,	   NULL);
 static DEVICE_ATTR(ioc_guid,	    S_IRUGO, show_ioc_guid,	   NULL);
 static DEVICE_ATTR(service_id,	    S_IRUGO, show_service_id,	   NULL);
@@ -1834,6 +2040,11 @@ static DEVICE_ATTR(local_ib_port,   S_IRUGO, show_local_ib_port,   NULL);
 static DEVICE_ATTR(local_ib_device, S_IRUGO, show_local_ib_device, NULL);
 static DEVICE_ATTR(cmd_sg_entries,  S_IRUGO, show_cmd_sg_entries,  NULL);
 static DEVICE_ATTR(allow_ext_sg,    S_IRUGO, show_allow_ext_sg,    NULL);
+static DEVICE_ATTR(reconnect_tmo,   S_IRUGO | S_IWUSR, show_reconnect_tmo,
+		   store_reconnect_tmo);
+static DEVICE_ATTR(failed_reconnects, S_IRUGO, show_failed_reconnects, NULL);
+static DEVICE_ATTR(max_reconnects, S_IRUGO | S_IWUSR, show_max_reconnects,
+		   store_max_reconnects);
 
 static struct device_attribute *srp_host_attrs[] = {
 	&dev_attr_id_ext,
@@ -1848,6 +2059,9 @@ static struct device_attribute *srp_host_attrs[] = {
 	&dev_attr_local_ib_device,
 	&dev_attr_cmd_sg_entries,
 	&dev_attr_allow_ext_sg,
+	&dev_attr_reconnect_tmo,
+	&dev_attr_failed_reconnects,
+	&dev_attr_max_reconnects,
 	NULL
 };
 
@@ -1869,6 +2083,10 @@ static struct scsi_host_template srp_template = {
 	.shost_attrs			= srp_host_attrs
 };
 
+static struct srp_function_template ib_srp_transport_functions = {
+	.rport_recovery_timedout = srp_recovery_timedout,
+};
+
 static int srp_add_target(struct srp_host *host, struct srp_target_port *target)
 {
 	struct srp_rport_identifiers ids;
@@ -1889,14 +2107,17 @@ static int srp_add_target(struct srp_host *host, struct srp_target_port *target)
 		return PTR_ERR(rport);
 	}
 
+	rport->ft = &ib_srp_transport_functions;
+	rport->lld_data = target;
+
 	spin_lock(&host->target_lock);
 	list_add_tail(&target->list, &host->target_list);
 	spin_unlock(&host->target_lock);
 
 	target->state = SRP_TARGET_LIVE;
+	target->rport = rport;
 
-	scsi_scan_target(&target->scsi_host->shost_gendev,
-			 0, target->scsi_id, SCAN_WILD_CARD, 0);
+	srp_scan_target(target);
 
 	return 0;
 }
@@ -2213,6 +2434,13 @@ static ssize_t srp_create_target(struct device *dev,
 			     sizeof (struct srp_indirect_buf) +
 			     target->cmd_sg_cnt * sizeof (struct srp_direct_buf);
 
+	mutex_init(&target->mutex);
+	INIT_WORK(&target->block_work, srp_block_work);
+	INIT_WORK(&target->remove_work, srp_remove_work);
+	INIT_WORK(&target->scan_work, srp_scan_work);
+	INIT_DELAYED_WORK(&target->reconnect_work, srp_reconnect_work);
+	target->reconnect_tmo = 10;
+	target->max_reconnects = -1;
 	spin_lock_init(&target->lock);
 	INIT_LIST_HEAD(&target->free_tx);
 	INIT_LIST_HEAD(&target->free_reqs);
@@ -2257,7 +2485,8 @@ static ssize_t srp_create_target(struct device *dev,
 	if (ret)
 		goto err_free_ib;
 
-	target->qp_in_error = 0;
+	target->state = SRP_TARGET_CONNECTING;
+
 	ret = srp_connect_target(target);
 	if (ret) {
 		shost_printk(KERN_ERR, target->scsi_host,
@@ -2448,8 +2677,7 @@ static void srp_remove_one(struct ib_device *device)
 {
 	struct srp_device *srp_dev;
 	struct srp_host *host, *tmp_host;
-	LIST_HEAD(target_list);
-	struct srp_target_port *target, *tmp_target;
+	struct srp_target_port *target;
 
 	srp_dev = ib_get_client_data(device, &srp_client);
 
@@ -2462,36 +2690,31 @@ static void srp_remove_one(struct ib_device *device)
 		wait_for_completion(&host->released);
 
 		/*
-		 * Mark all target ports as removed, so we stop queueing
-		 * commands and don't try to reconnect.
+		 * Mark all target ports as removed so we don't try to
+		 * reconnect. Wait for any removal tasks that may have
+		 * started before we marked our target ports as
+		 * removed.
 		 */
 		spin_lock(&host->target_lock);
-		list_for_each_entry(target, &host->target_list, list) {
+		while (!list_empty(&host->target_list)) {
+			target = list_first_entry(&host->target_list,
+						  struct srp_target_port, list);
 			spin_lock_irq(&target->lock);
 			target->state = SRP_TARGET_REMOVED;
 			spin_unlock_irq(&target->lock);
+			if (work_pending(&target->remove_work)) {
+				spin_unlock(&host->target_lock);
+				flush_work_sync(&target->remove_work);
+				spin_lock(&host->target_lock);
+			} else {
+				list_del(&target->list);
+				spin_unlock(&host->target_lock);
+				srp_remove_target(target);
+				spin_lock(&host->target_lock);
+			}
 		}
 		spin_unlock(&host->target_lock);
 
-		/*
-		 * Wait for any reconnection tasks that may have
-		 * started before we marked our target ports as
-		 * removed, and any target port removal tasks.
-		 */
-		flush_workqueue(ib_wq);
-
-		list_for_each_entry_safe(target, tmp_target,
-					 &host->target_list, list) {
-			srp_del_scsi_host_attr(target->scsi_host);
-			srp_remove_host(target->scsi_host);
-			scsi_remove_host(target->scsi_host);
-			srp_disconnect_target(target);
-			ib_destroy_cm_id(target->cm_id);
-			srp_free_target_ib(target);
-			srp_free_req_data(target);
-			scsi_host_put(target->scsi_host);
-		}
-
 		kfree(host);
 	}
 
@@ -2503,9 +2726,6 @@ static void srp_remove_one(struct ib_device *device)
 	kfree(srp_dev);
 }
 
-static struct srp_function_template ib_srp_transport_functions = {
-};
-
 static int __init srp_init_module(void)
 {
 	int ret;
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index f010bd9..5e30cff 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -78,11 +78,20 @@ enum {
 	SRP_MAP_NO_FMR		= 1,
 };
 
+/**
+ * @SRP_TARGET_LIVE: IB RC connection is established and SCSI host is unblocked.
+ * @SRP_TARGET_CONNECTING: IB RC connection being established.
+ * @SRP_TARGET_BLOCKED: An IB RC error occurred. Recovery timer may be running.
+ *                      SCSI host is blocked.
+ * @SRP_TARGET_DISCONNECTED: IB RC connection has been disconnected.
+ * @SRP_TARGET_REMOVED: SCSI host removal is pending.
+ */
 enum srp_target_state {
 	SRP_TARGET_LIVE,
 	SRP_TARGET_CONNECTING,
-	SRP_TARGET_DEAD,
-	SRP_TARGET_REMOVED
+	SRP_TARGET_BLOCKED,
+	SRP_TARGET_DISCONNECTED,
+	SRP_TARGET_REMOVED,
 };
 
 enum srp_iu_type {
@@ -155,6 +164,7 @@ struct srp_target_port {
 	u16			io_class;
 	struct srp_host	       *srp_host;
 	struct Scsi_Host       *scsi_host;
+	struct srp_rport       *rport;
 	char			target_name[32];
 	unsigned int		scsi_id;
 	unsigned int		sg_tablesize;
@@ -174,15 +184,22 @@ struct srp_target_port {
 	struct srp_iu	       *rx_ring[SRP_RQ_SIZE];
 	struct srp_request	req_ring[SRP_CMD_SQ_SIZE];
 
-	struct work_struct	work;
-
 	struct list_head	list;
 	struct completion	done;
 	int			status;
-	int			qp_in_error;
+	struct mutex		mutex;
 
 	struct completion	tsk_mgmt_done;
 	u8			tsk_mgmt_status;
+
+	struct work_struct	block_work;
+	struct work_struct	remove_work;
+	struct work_struct	scan_work;
+
+	int			reconnect_tmo;
+	int			max_reconnects;
+	int			failed_reconnects;
+	struct delayed_work	reconnect_work;
 };
 
 struct srp_iu {
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index 1cbf097..9a57b7b 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -2,6 +2,7 @@
  * SCSI RDMA (SRP) transport class
  *
  * Copyright (C) 2007 FUJITA Tomonori <tomof@acm.org>
+ * Copyright (C) 2011 Bart Van Assche <bvanassche@acm.org>
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License as
@@ -38,7 +39,7 @@ struct srp_host_attrs {
 #define to_srp_host_attrs(host)	((struct srp_host_attrs *)(host)->shost_data)
 
 #define SRP_HOST_ATTRS 0
-#define SRP_RPORT_ATTRS 2
+#define SRP_RPORT_ATTRS 3
 
 struct srp_internal {
 	struct scsi_transport_template t;
@@ -116,6 +117,158 @@ show_srp_rport_roles(struct device *dev, struct device_attribute *attr,
 
 static DEVICE_ATTR(roles, S_IRUGO, show_srp_rport_roles, NULL);
 
+static ssize_t show_srp_rport_recovery_tmo(struct device *dev,
+					   struct device_attribute *attr,
+					   char *buf)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+
+	return sprintf(buf, "%d\n", rport->recovery_tmo);
+}
+
+static ssize_t store_srp_rport_recovery_tmo(struct device *dev,
+					    struct device_attribute *attr,
+					    const char *buf, size_t count)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+	char ch[16];
+	int res;
+	int recovery_tmo;
+
+	sprintf(ch, "%.*s", (int)min(sizeof(ch) - 1, count), buf);
+	res = kstrtoint(ch, 0, &recovery_tmo);
+	if (res)
+		goto out;
+	rport->recovery_tmo = recovery_tmo;
+	if (recovery_tmo > 0 && rport->state == SRP_RPORT_BLOCKED)
+		queue_delayed_work(system_long_wq, &rport->recovery_work,
+				   recovery_tmo * HZ);
+	else if (recovery_tmo <= 0)
+		cancel_delayed_work(&rport->recovery_work);
+	res = count;
+out:
+	return res;
+}
+
+static DEVICE_ATTR(recovery_tmo, S_IRUGO | S_IWUSR, show_srp_rport_recovery_tmo,
+		   store_srp_rport_recovery_tmo);
+
+/**
+ * rport_recovery_timedout() - Transport layer recovery timeout handler.
+ *
+ * If rport->ft->rport_recovery_timedout is not NULL, invoke that function.
+ * Otherwise unblock the SCSI host.
+ *
+ * Note: rport->ft->rport_recovery_timedout must either unblock or remove the
+ * SCSI host.
+ */
+static void rport_recovery_timedout(struct work_struct *work)
+{
+	struct srp_rport *rport;
+
+	rport = container_of(to_delayed_work(work), struct srp_rport,
+			     recovery_work);
+
+	pr_debug("rport %p recovery timed out after %d secs\n", rport,
+		 rport->recovery_tmo);
+
+	BUG_ON(!rport->ft);
+
+	if (rport->state == SRP_RPORT_BLOCKED &&
+	    rport->ft->rport_recovery_timedout) {
+		rport->ft->rport_recovery_timedout(rport);
+	}
+}
+
+/**
+ * srp_unblock_rport() - Unblock an rport.
+ */
+void srp_unblock_rport(struct srp_rport *rport)
+{
+	unsigned long flags;
+	enum srp_rport_state prev_state;
+	bool unblock = false;
+
+	pr_debug("Trying to unblock rport %p\n", rport);
+
+	spin_lock_irqsave(&rport->lock, flags);
+	prev_state = rport->state;
+	if (prev_state == SRP_RPORT_BLOCKED) {
+		rport->state = SRP_RPORT_LIVE;
+		unblock = true;
+	}
+	spin_unlock_irqrestore(&rport->lock, flags);
+
+	if (!unblock) {
+		pr_debug("Not unblocking rport %p because it is in state %d\n",
+			 rport, prev_state);
+		return;
+	}
+
+	cancel_delayed_work(&rport->recovery_work);
+	scsi_target_unblock(rport->dev.parent);
+
+	pr_debug("Completed unblocking rport %p\n", rport);
+}
+EXPORT_SYMBOL(srp_unblock_rport);
+
+/**
+ * srp_block_rport() - Block an rport.
+ */
+void srp_block_rport(struct srp_rport *rport)
+{
+	unsigned long flags;
+	bool block = false;
+
+	pr_debug("Blocking rport %p\n", rport);
+
+	spin_lock_irqsave(&rport->lock, flags);
+	if (rport->state == SRP_RPORT_LIVE) {
+		rport->state = SRP_RPORT_BLOCKED;
+		block = true;
+	}
+	spin_unlock_irqrestore(&rport->lock, flags);
+
+	scsi_target_block(rport->dev.parent);
+
+	pr_debug("Completed blocking rport %p\n", rport);
+}
+EXPORT_SYMBOL(srp_block_rport);
+
+/**
+ * srp_start_recovery_timer() - Start the rport transport layer recovery timer.
+ */
+void srp_start_recovery_timer(struct srp_rport *rport)
+{
+	WARN_ON(rport->state != SRP_RPORT_BLOCKED);
+
+	if (rport->recovery_tmo >= 0)
+		queue_delayed_work(system_long_wq, &rport->recovery_work,
+				   rport->recovery_tmo * HZ);
+}
+EXPORT_SYMBOL(srp_start_recovery_timer);
+
+/**
+ * srp_rport_disable_recovery() - Disable the transport layer recovery timer.
+ */
+void srp_rport_disable_recovery(struct srp_rport *rport)
+{
+	struct device *dev = rport->dev.parent;
+
+	device_remove_file(dev, &dev_attr_recovery_tmo);
+	cancel_delayed_work_sync(&rport->recovery_work);
+}
+EXPORT_SYMBOL(srp_rport_disable_recovery);
+
+/**
+ * srp_stop_rport() - Stop asynchronous work for an rport.
+ */
+void srp_stop_rport(struct srp_rport *rport)
+{
+	srp_rport_disable_recovery(rport);
+}
+EXPORT_SYMBOL(srp_stop_rport);
+
 static void srp_rport_release(struct device *dev)
 {
 	struct srp_rport *rport = dev_to_rport(dev);
@@ -192,6 +345,11 @@ struct srp_rport *srp_rport_add(struct Scsi_Host *shost,
 	memcpy(rport->port_id, ids->port_id, sizeof(rport->port_id));
 	rport->roles = ids->roles;
 
+	rport->recovery_tmo = 120;
+	INIT_DELAYED_WORK(&rport->recovery_work, rport_recovery_timedout);
+	spin_lock_init(&rport->lock);
+	rport->state = SRP_RPORT_LIVE;
+
 	id = atomic_inc_return(&to_srp_host_attrs(shost)->next_port_id);
 	dev_set_name(&rport->dev, "port-%d:%d", shost->host_no, id);
 
@@ -309,6 +467,8 @@ srp_attach_transport(struct srp_function_template *ft)
 	count = 0;
 	i->rport_attrs[count++] = &dev_attr_port_id;
 	i->rport_attrs[count++] = &dev_attr_roles;
+	if (ft->rport_recovery_timedout)
+		i->rport_attrs[count++] = &dev_attr_recovery_tmo;
 	i->rport_attrs[count++] = NULL;
 	WARN_ON(count > ARRAY_SIZE(i->rport_attrs));
 
diff --git a/include/scsi/scsi_transport_srp.h b/include/scsi/scsi_transport_srp.h
index 9c60ca1..fcda8e3 100644
--- a/include/scsi/scsi_transport_srp.h
+++ b/include/scsi/scsi_transport_srp.h
@@ -8,19 +8,45 @@
 #define SRP_RPORT_ROLE_INITIATOR 0
 #define SRP_RPORT_ROLE_TARGET 1
 
+/**
+ * enum srp_rport_state - Rport state.
+ * SRP_RPORT_LIVE: SCSI host logged in.
+ * SRP_RPORT_BLOCKED: SCSI host blocked because of a transport layer issue.
+ */
+enum srp_rport_state {
+	SRP_RPORT_LIVE,
+	SRP_RPORT_BLOCKED,
+};
+
 struct srp_rport_identifiers {
 	u8 port_id[16];
 	u8 roles;
 };
 
 struct srp_rport {
+	/* for initiator and target drivers */
+
+	struct srp_function_template *ft;
+
 	struct device dev;
 
 	u8 port_id[16];
 	u8 roles;
+
+	/* for initiator drivers */
+
+	void			*lld_data;	/* LLD private data */
+
+	spinlock_t		lock;
+	enum srp_rport_state	state;
+
+	int			recovery_tmo;
+	struct delayed_work	recovery_work;
 };
 
 struct srp_function_template {
+	/* for initiator drivers */
+	void (*rport_recovery_timedout) (struct srp_rport *rport);
 	/* for target drivers */
 	int (* tsk_mgmt_response)(struct Scsi_Host *, u64, u64, int);
 	int (* it_nexus_response)(struct Scsi_Host *, u64, int);
@@ -33,6 +59,11 @@ extern void srp_release_transport(struct scsi_transport_template *);
 extern struct srp_rport *srp_rport_add(struct Scsi_Host *,
 				       struct srp_rport_identifiers *);
 extern void srp_rport_del(struct srp_rport *);
+extern void srp_unblock_rport(struct srp_rport *rport);
+extern void srp_block_rport(struct srp_rport *rport);
+extern void srp_start_recovery_timer(struct srp_rport *rport);
+extern void srp_rport_disable_recovery(struct srp_rport *rport);
+extern void srp_stop_rport(struct srp_rport *rport);
 
 extern void srp_remove_host(struct Scsi_Host *);
 
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 13/14] ib_srp: Implement transport layer ping
       [not found] ` <201112011954.25811.bvanassche-HInyCGIudOg@public.gmane.org>
                     ` (8 preceding siblings ...)
  2011-12-01 19:09   ` [PATCH 11/14] ib_srp: Document sysfs attributes Bart Van Assche
@ 2011-12-01 19:11   ` Bart Van Assche
  2011-12-19  0:50     ` David Dillow
  9 siblings, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-01 19:11 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-scsi-u79uwXL29TY76Z2rM5mHXA, David Dillow, Roland Dreier,
	Fujita Tomonori, Brian King

Add a time-based transport layer test such that fail-over in a multipath
setup can happen quickly.

Add the necessary functions in the SRP transport module to allow an SRP
initiator driver to implement this.

Add a slave_delete callback in the SCSI host template such that SCSI
hosts that hold a reference to a SCSI device can be deleted via the
sysfs SCSI host "delete" attribute.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: FUJITA Tomonori <fujita.tomonori-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
Cc: Brian King <brking-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 Documentation/ABI/stable/sysfs-transport-srp |   18 +++
 drivers/infiniband/ulp/srp/ib_srp.c          |   48 ++++++++
 drivers/scsi/scsi_sysfs.c                    |    7 +-
 drivers/scsi/scsi_transport_srp.c            |  168 +++++++++++++++++++++++++-
 include/scsi/scsi_host.h                     |    9 ++
 include/scsi/scsi_transport_srp.h            |    9 ++
 6 files changed, 257 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-transport-srp b/Documentation/ABI/stable/sysfs-transport-srp
index 7b0d4a5..d8a9048 100644
--- a/Documentation/ABI/stable/sysfs-transport-srp
+++ b/Documentation/ABI/stable/sysfs-transport-srp
@@ -1,3 +1,21 @@
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/ping_interval
+Date:		November 1, 2011
+KernelVersion:	3.3
+Contact:	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	Time in seconds between two sucessive ping attempts. Setting
+		this parameter to zero or a negative value disables the ping
+		mechanism.
+
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/ping_timeout
+Date:		November 1, 2011
+KernelVersion:	3.3
+Contact:	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	If more time has elapsed than the specified number of seconds
+		since the latest successful ping attempt, the SRP initiator
+		driver that enabled this feature is informed about a transport
+		layer timeout by invoking its rport_ping_timedout callback
+		function.
+
 What:		/sys/class/srp_remote_ports/port-<h>:<n>/port_id
 Date:		June 27, 2007
 KernelVersion:	2.6.24
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 797a8f1..82057f2 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -507,6 +507,26 @@ static void srp_del_scsi_host_attr(struct Scsi_Host *shost)
 }
 
 /**
+ * srp_disable_ping() - Stop pinging a target.
+ *
+ * Note: Can be invoked concurrently via the SCSI host sysfs attribute "delete"
+ * and one of the rport callback functions.
+ */
+static void srp_disable_ping(struct scsi_device *sdev)
+{
+	struct Scsi_Host *shost = sdev->host;
+	struct srp_target_port *target = host_to_target(shost);
+	struct srp_rport *rport = target->rport;
+
+	if (rport->sdev == sdev) {
+		pr_debug("Disabling pinging rport %p via sdev %p\n", rport,
+			 sdev);
+		srp_rport_set_sdev(rport, NULL);
+		srp_rport_disable_ping(rport);
+	}
+}
+
+/*
  * srp_remove_target() - Remove an SRP target.
  *
  * The strategy to remove a target is as follows:
@@ -522,6 +542,8 @@ static void srp_del_scsi_host_attr(struct Scsi_Host *shost)
  */
 static void srp_remove_target(struct srp_target_port *target)
 {
+	struct srp_rport *rport = target->rport;
+
 	WARN_ON(target->state != SRP_TARGET_REMOVED);
 
 	srp_del_scsi_host_attr(target->scsi_host);
@@ -529,6 +551,8 @@ static void srp_remove_target(struct srp_target_port *target)
 	mutex_lock(&target->mutex);
 	mutex_unlock(&target->mutex);
 	srp_unblock_rport(rport);
+	if (rport->sdev)
+		srp_disable_ping(rport->sdev);
 	srp_rport_disable_recovery(target->rport);
 	cancel_delayed_work_sync(&target->reconnect_work);
 	srp_stop_rport(target->rport);
@@ -555,6 +579,21 @@ static void srp_remove_work(struct work_struct *work)
 	srp_remove_target(target);
 }
 
+static void srp_ping_timedout(struct srp_rport *rport)
+{
+	struct srp_target_port *target = rport->lld_data;
+
+	pr_debug("ping timeout: rport = %p; target = %p / state %d\n",
+		 rport, target, target->state);
+
+	mutex_lock(&target->mutex);
+	if (srp_change_state(target, SRP_TARGET_LIVE, SRP_TARGET_BLOCKED)) {
+		srp_block_rport(rport);
+		srp_start_recovery_timer(target->rport);
+	}
+	mutex_unlock(&target->mutex);
+}
+
 static void srp_recovery_timedout(struct srp_rport *rport)
 {
 	struct srp_target_port *target = rport->lld_data;
@@ -1506,6 +1545,13 @@ static int srp_slave_alloc(struct scsi_device *sdev)
 {
 	struct Scsi_Host *shost = sdev->host;
 	struct srp_target_port *target = host_to_target(shost);
+	struct srp_rport *rport = target->rport;
+
+	if (!rport->sdev) {
+		pr_debug("Enable pinging rport %p through sdev %p\n", rport,
+			 sdev);
+		srp_rport_set_sdev(rport, sdev);
+	}
 
 	if (!WARN_ON(target->rq_tmo_jiffies == 0))
 		blk_queue_rq_timeout(sdev->request_queue,
@@ -2070,6 +2116,7 @@ static struct scsi_host_template srp_template = {
 	.name				= "InfiniBand SRP initiator",
 	.proc_name			= DRV_NAME,
 	.slave_alloc			= srp_slave_alloc,
+	.slave_delete			= srp_disable_ping,
 	.info				= srp_target_info,
 	.queuecommand			= srp_queuecommand,
 	.eh_abort_handler		= srp_abort,
@@ -2084,6 +2131,7 @@ static struct scsi_host_template srp_template = {
 };
 
 static struct srp_function_template ib_srp_transport_functions = {
+	.rport_ping_timedout	 = srp_ping_timedout,
 	.rport_recovery_timedout = srp_recovery_timedout,
 };
 
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index e0bd3f7..328caa3 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -534,7 +534,12 @@ static DEVICE_ATTR(rescan, S_IWUSR, NULL, store_rescan_field);
 
 static void sdev_store_delete_callback(struct device *dev)
 {
-	scsi_remove_device(to_scsi_device(dev));
+	struct scsi_device *sdev = to_scsi_device(dev);
+	struct scsi_host_template *sht = sdev->host->hostt;
+
+	if (sht->slave_delete)
+		sht->slave_delete(sdev);
+	scsi_remove_device(sdev);
 }
 
 static ssize_t
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index 9a57b7b..135a870 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -29,6 +29,7 @@
 #include <scsi/scsi.h>
 #include <scsi/scsi_device.h>
 #include <scsi/scsi_host.h>
+#include <scsi/scsi_eh.h>
 #include <scsi/scsi_transport.h>
 #include <scsi/scsi_transport_srp.h>
 #include "scsi_transport_srp_internal.h"
@@ -39,7 +40,7 @@ struct srp_host_attrs {
 #define to_srp_host_attrs(host)	((struct srp_host_attrs *)(host)->shost_data)
 
 #define SRP_HOST_ATTRS 0
-#define SRP_RPORT_ATTRS 3
+#define SRP_RPORT_ATTRS 5
 
 struct srp_internal {
 	struct scsi_transport_template t;
@@ -117,6 +118,67 @@ show_srp_rport_roles(struct device *dev, struct device_attribute *attr,
 
 static DEVICE_ATTR(roles, S_IRUGO, show_srp_rport_roles, NULL);
 
+static ssize_t show_srp_rport_ping_interval(struct device *dev,
+					   struct device_attribute *attr,
+					   char *buf)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+
+	return sprintf(buf, "%d\n", rport->ping_itv);
+}
+
+static ssize_t store_srp_rport_ping_interval(struct device *dev,
+					    struct device_attribute *attr,
+					    const char *buf, size_t count)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+	char ch[16];
+	int res, ping_itv;
+
+	sprintf(ch, "%.*s", (int)min(sizeof(ch) - 1, count), buf);
+	res = kstrtoint(ch, 0, &ping_itv);
+	if (res)
+		goto out;
+	rport->ping_itv = ping_itv;
+	pr_debug("rport %p: new ping interval = %d seconds\n", rport, ping_itv);
+	if (ping_itv > 0)
+		queue_delayed_work(system_long_wq, &rport->ping_work,
+				   ping_itv * HZ);
+	else
+		cancel_delayed_work(&rport->ping_work);
+	res = count;
+out:
+	return res;
+}
+
+static DEVICE_ATTR(ping_interval, S_IRUGO | S_IWUSR,
+		   show_srp_rport_ping_interval, store_srp_rport_ping_interval);
+
+static ssize_t show_srp_rport_ping_timeout(struct device *dev,
+					   struct device_attribute *attr,
+					   char *buf)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+
+	return sprintf(buf, "%d\n", rport->ping_tmo);
+}
+
+static ssize_t store_srp_rport_ping_timeout(struct device *dev,
+					    struct device_attribute *attr,
+					    const char *buf, size_t count)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+	char ch[16];
+	int res;
+
+	sprintf(ch, "%.*s", (int)min(sizeof(ch) - 1, count), buf);
+	res = kstrtoint(ch, 0, &rport->ping_tmo);
+	return res == 0 ? count : res;
+}
+
+static DEVICE_ATTR(ping_timeout, S_IRUGO | S_IWUSR, show_srp_rport_ping_timeout,
+		   store_srp_rport_ping_timeout);
+
 static ssize_t show_srp_rport_recovery_tmo(struct device *dev,
 					   struct device_attribute *attr,
 					   char *buf)
@@ -194,6 +256,7 @@ void srp_unblock_rport(struct srp_rport *rport)
 	spin_lock_irqsave(&rport->lock, flags);
 	prev_state = rport->state;
 	if (prev_state == SRP_RPORT_BLOCKED) {
+		rport->latest_ping_response = jiffies;
 		rport->state = SRP_RPORT_LIVE;
 		unblock = true;
 	}
@@ -249,6 +312,20 @@ void srp_start_recovery_timer(struct srp_rport *rport)
 EXPORT_SYMBOL(srp_start_recovery_timer);
 
 /**
+ * srp_rport_disable_ping() - Stop pinging and prevent reenabling pinging.
+ */
+void srp_rport_disable_ping(struct srp_rport *rport)
+{
+	struct device *dev = rport->dev.parent;
+
+	WARN_ON(rport->state == SRP_RPORT_BLOCKED);
+	device_remove_file(dev, &dev_attr_ping_timeout);
+	device_remove_file(dev, &dev_attr_ping_interval);
+	cancel_delayed_work_sync(&rport->ping_work);
+}
+EXPORT_SYMBOL(srp_rport_disable_ping);
+
+/**
  * srp_rport_disable_recovery() - Disable the transport layer recovery timer.
  */
 void srp_rport_disable_recovery(struct srp_rport *rport)
@@ -265,10 +342,92 @@ EXPORT_SYMBOL(srp_rport_disable_recovery);
  */
 void srp_stop_rport(struct srp_rport *rport)
 {
+	srp_rport_disable_ping(rport);
 	srp_rport_disable_recovery(rport);
 }
 EXPORT_SYMBOL(srp_stop_rport);
 
+/**
+ * srp_rport_set_sdev() - Set the SCSI device that will be used for pinging.
+ */
+void srp_rport_set_sdev(struct srp_rport *rport, struct scsi_device *sdev)
+{
+	unsigned long flags;
+
+	if (sdev && !get_device(&sdev->sdev_dev))
+		sdev = NULL;
+
+	spin_lock_irqsave(&rport->lock, flags);
+	swap(rport->sdev, sdev);
+	spin_unlock_irqrestore(&rport->lock, flags);
+
+	if (sdev)
+		put_device(&sdev->sdev_dev);
+}
+EXPORT_SYMBOL(srp_rport_set_sdev);
+
+/**
+ * srp_rport_get_sdev() - Get the SCSI device used for pinging.
+ */
+static struct scsi_device *rport_get_sdev(struct srp_rport *rport)
+{
+	struct scsi_device *sdev;
+	unsigned long flags;
+
+	spin_lock_irqsave(&rport->lock, flags);
+	sdev = rport->sdev;
+	if (sdev && !get_device(&sdev->sdev_dev))
+		sdev = NULL;
+	spin_unlock_irqrestore(&rport->lock, flags);
+
+	return sdev;
+}
+
+/**
+ * rport_ping() - Verify whether the transport layer is still operational.
+ */
+static void rport_ping(struct work_struct *work)
+{
+	struct scsi_sense_hdr sshdr;
+	struct srp_rport *rport;
+	struct scsi_device *sdev;
+	int res, itv, tmo;
+
+	rport = container_of(work, struct srp_rport, ping_work.work);
+	itv = rport->ping_itv;
+	tmo = rport->ping_tmo;
+	sdev = rport_get_sdev(rport);
+	pr_debug("rport %p has state %d; sdev = %p; ping interval = %d\n",
+		 rport, rport->state, sdev, itv);
+	if (itv <= 0)
+		goto out;
+	if (!sdev)
+		goto schedule;
+	if (rport->state == SRP_RPORT_BLOCKED)
+		goto put;
+	memset(&sshdr, 0, sizeof(sshdr));
+	res = scsi_test_unit_ready(sdev, itv, 1, NULL);
+	pr_debug("scsi_test_unit_ready() result = 0x%x / %s%s\n", res,
+		 scsi_sense_valid(&sshdr) ? "sense valid" : "sense not valid",
+		 scsi_sense_valid(&sshdr) &&
+		 sshdr.sense_key == UNIT_ATTENTION ? " (unit attention)" : "");
+	if (scsi_status_is_good(res) || (res & SAM_STAT_CHECK_CONDITION)) {
+		rport->latest_ping_response = jiffies;
+	} else if (tmo > 0 &&
+		   time_after(jiffies, rport->latest_ping_response + tmo)) {
+		shost_printk(KERN_INFO, sdev->host,
+			     "SRP ping timeout elapsed\n");
+		if (rport->ft->rport_ping_timedout)
+			rport->ft->rport_ping_timedout(rport);
+	}
+put:
+	put_device(&sdev->sdev_dev);
+schedule:
+	queue_delayed_work(system_long_wq, &rport->ping_work, itv * HZ);
+out:
+	return;
+}
+
 static void srp_rport_release(struct device *dev)
 {
 	struct srp_rport *rport = dev_to_rport(dev);
@@ -346,6 +505,7 @@ struct srp_rport *srp_rport_add(struct Scsi_Host *shost,
 	rport->roles = ids->roles;
 
 	rport->recovery_tmo = 120;
+	INIT_DELAYED_WORK(&rport->ping_work, rport_ping);
 	INIT_DELAYED_WORK(&rport->recovery_work, rport_recovery_timedout);
 	spin_lock_init(&rport->lock);
 	rport->state = SRP_RPORT_LIVE;
@@ -353,6 +513,8 @@ struct srp_rport *srp_rport_add(struct Scsi_Host *shost,
 	id = atomic_inc_return(&to_srp_host_attrs(shost)->next_port_id);
 	dev_set_name(&rport->dev, "port-%d:%d", shost->host_no, id);
 
+	pr_debug("rport %p has name %s\n", rport, dev_name(&rport->dev));
+
 	transport_setup_device(&rport->dev);
 
 	ret = device_add(&rport->dev);
@@ -467,6 +629,10 @@ srp_attach_transport(struct srp_function_template *ft)
 	count = 0;
 	i->rport_attrs[count++] = &dev_attr_port_id;
 	i->rport_attrs[count++] = &dev_attr_roles;
+	if (ft->rport_ping_timedout) {
+		i->rport_attrs[count++] = &dev_attr_ping_interval;
+		i->rport_attrs[count++] = &dev_attr_ping_timeout;
+	}
 	if (ft->rport_recovery_timedout)
 		i->rport_attrs[count++] = &dev_attr_recovery_tmo;
 	i->rport_attrs[count++] = NULL;
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index f1f2644..16536a6 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -233,6 +233,15 @@ struct scsi_host_template {
 	 */
 	int (* slave_configure)(struct scsi_device *);
 
+	/**
+	 * Callback invoked just before scsi_remove_device() will be invoked
+	 * e.g. if device removal has been requested via the sdev "delete"
+	 * sysfs attribute.
+	 *
+	 * Status: OPTIONAL
+	 */
+	void (* slave_delete)(struct scsi_device *);
+
 	/*
 	 * Immediately prior to deallocating the device and after all activity
 	 * has ceased the mid layer calls this point so that the low level
diff --git a/include/scsi/scsi_transport_srp.h b/include/scsi/scsi_transport_srp.h
index fcda8e3..565cb79 100644
--- a/include/scsi/scsi_transport_srp.h
+++ b/include/scsi/scsi_transport_srp.h
@@ -38,14 +38,20 @@ struct srp_rport {
 	void			*lld_data;	/* LLD private data */
 
 	spinlock_t		lock;
+	struct scsi_device	*sdev;
 	enum srp_rport_state	state;
 
+	int			ping_itv;
+	int			ping_tmo;
+	unsigned long		latest_ping_response;
 	int			recovery_tmo;
+	struct delayed_work	ping_work;
 	struct delayed_work	recovery_work;
 };
 
 struct srp_function_template {
 	/* for initiator drivers */
+	void (*rport_ping_timedout) (struct srp_rport *rport);
 	void (*rport_recovery_timedout) (struct srp_rport *rport);
 	/* for target drivers */
 	int (* tsk_mgmt_response)(struct Scsi_Host *, u64, u64, int);
@@ -62,6 +68,9 @@ extern void srp_rport_del(struct srp_rport *);
 extern void srp_unblock_rport(struct srp_rport *rport);
 extern void srp_block_rport(struct srp_rport *rport);
 extern void srp_start_recovery_timer(struct srp_rport *rport);
+extern void srp_rport_set_sdev(struct srp_rport *rport,
+			       struct scsi_device *sdev);
+extern void srp_rport_disable_ping(struct srp_rport *rport);
 extern void srp_rport_disable_recovery(struct srp_rport *rport);
 extern void srp_stop_rport(struct srp_rport *rport);
 
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 14/14] ib_srp: Allow SRP disconnect through sysfs
  2011-12-01 18:54 [PATCH 00/14] Make ib_srp better suited for H.A. purposes Bart Van Assche
                   ` (3 preceding siblings ...)
       [not found] ` <201112011954.25811.bvanassche-HInyCGIudOg@public.gmane.org>
@ 2011-12-01 19:13 ` Bart Van Assche
  2011-12-19  4:03   ` David Dillow
  2011-12-01 23:26 ` [PATCH 00/14] Make ib_srp better suited for H.A. purposes David Dillow
  5 siblings, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-01 19:13 UTC (permalink / raw)
  To: linux-rdma
  Cc: linux-scsi, David Dillow, Roland Dreier, Fujita Tomonori, Brian King

Make it possible to disconnect via sysfs the IB RC connection used by the
SRP protocol to communicate with a target.

Let the SRP transport layer create a sysfs "delete" attribute for
initiator drivers that support this functionality.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: David Dillow <dillowda@ornl.gov>
Cc: Roland Dreier <roland@purestorage.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Brian King <brking@linux.vnet.ibm.com>
---
 Documentation/ABI/stable/sysfs-transport-srp |    7 ++++
 drivers/infiniband/ulp/srp/ib_srp.c          |   44 +++++++++++++++----------
 drivers/scsi/scsi_transport_srp.c            |   20 +++++++++++-
 include/scsi/scsi_transport_srp.h            |    1 +
 4 files changed, 53 insertions(+), 19 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-transport-srp b/Documentation/ABI/stable/sysfs-transport-srp
index d8a9048..b1c8acd 100644
--- a/Documentation/ABI/stable/sysfs-transport-srp
+++ b/Documentation/ABI/stable/sysfs-transport-srp
@@ -1,3 +1,10 @@
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/delete
+Date:		November 1, 2011
+KernelVersion:	3.3
+Contact:	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org
+Description:	Instructs an SRP initiator to disconnect from a target and to
+		remove all LUNs imported from that target.
+
 What:		/sys/class/srp_remote_ports/port-<h>:<n>/ping_interval
 Date:		November 1, 2011
 KernelVersion:	3.3
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 82057f2..f7e0224 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -579,6 +579,28 @@ static void srp_remove_work(struct work_struct *work)
 	srp_remove_target(target);
 }
 
+static void srp_rport_delete(struct srp_rport *rport)
+{
+	struct srp_target_port *target = rport->lld_data;
+	struct srp_host *host = target->srp_host;
+	bool remove = false;
+
+	/*
+	 * Schedule our work inside the lock to avoid a race with
+	 * the flush_work_sync() call in srp_remove_one().
+	 */
+	spin_lock(&host->target_lock);
+	spin_lock_irq(&target->lock);
+	if (target->state != SRP_TARGET_REMOVED) {
+		target->state = SRP_TARGET_REMOVED;
+		remove = true;
+	}
+	spin_unlock_irq(&target->lock);
+	if (remove)
+		queue_work(system_long_wq, &target->remove_work);
+	spin_unlock(&host->target_lock);
+}
+
 static void srp_ping_timedout(struct srp_rport *rport)
 {
 	struct srp_target_port *target = rport->lld_data;
@@ -1879,9 +1901,7 @@ static int srp_reset_device(struct scsi_cmnd *scmnd)
 static int srp_reset_host(struct scsi_cmnd *scmnd)
 {
 	struct srp_target_port *target = host_to_target(scmnd->device->host);
-	struct srp_host *host = target->srp_host;
 	int ret = FAILED;
-	bool remove = false;
 
 	shost_printk(KERN_ERR, target->scsi_host, PFX "SRP reset_host called\n");
 
@@ -1893,23 +1913,10 @@ static int srp_reset_host(struct scsi_cmnd *scmnd)
 		 * However, we have to defer the real removal because we
 		 * are in the context of the SCSI error handler now, which
 		 * will deadlock if we call scsi_remove_host().
-		 *
-		 * Schedule our work inside the lock to avoid a race with
-		 * the flush_work_sync() call in srp_remove_one().
 		 */
-		spin_lock(&host->target_lock);
-		spin_lock_irq(&target->lock);
-		if (target->state != SRP_TARGET_REMOVED) {
-			target->state = SRP_TARGET_REMOVED;
-			remove = true;
-		}
-		spin_unlock_irq(&target->lock);
-		if (remove) {
-			shost_printk(KERN_ERR, target->scsi_host, PFX "recon"
-				     "nect failed, removing target port.\n");
-			queue_work(system_long_wq, &target->remove_work);
-		}
-		spin_unlock(&host->target_lock);
+		shost_printk(KERN_ERR, target->scsi_host,
+			     PFX "reconnect failed, removing target port.\n");
+		srp_rport_delete(target->rport);
 	}
 
 	return ret;
@@ -2131,6 +2138,7 @@ static struct scsi_host_template srp_template = {
 };
 
 static struct srp_function_template ib_srp_transport_functions = {
+	.rport_delete		 = srp_rport_delete,
 	.rport_ping_timedout	 = srp_ping_timedout,
 	.rport_recovery_timedout = srp_recovery_timedout,
 };
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index 135a870..e0359b4 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -40,7 +40,7 @@ struct srp_host_attrs {
 #define to_srp_host_attrs(host)	((struct srp_host_attrs *)(host)->shost_data)
 
 #define SRP_HOST_ATTRS 0
-#define SRP_RPORT_ATTRS 5
+#define SRP_RPORT_ATTRS 6
 
 struct srp_internal {
 	struct scsi_transport_template t;
@@ -118,6 +118,22 @@ show_srp_rport_roles(struct device *dev, struct device_attribute *attr,
 
 static DEVICE_ATTR(roles, S_IRUGO, show_srp_rport_roles, NULL);
 
+static ssize_t store_srp_rport_delete(struct device *dev,
+				      struct device_attribute *attr,
+				      const char *buf, size_t count)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+
+	if (rport->ft->rport_delete) {
+		rport->ft->rport_delete(rport);
+		return count;
+	} else {
+		return -ENOSYS;
+	}
+}
+
+static DEVICE_ATTR(delete, S_IWUSR, NULL, store_srp_rport_delete);
+
 static ssize_t show_srp_rport_ping_interval(struct device *dev,
 					   struct device_attribute *attr,
 					   char *buf)
@@ -635,6 +651,8 @@ srp_attach_transport(struct srp_function_template *ft)
 	}
 	if (ft->rport_recovery_timedout)
 		i->rport_attrs[count++] = &dev_attr_recovery_tmo;
+	if (ft->rport_delete)
+		i->rport_attrs[count++] = &dev_attr_delete;
 	i->rport_attrs[count++] = NULL;
 	WARN_ON(count > ARRAY_SIZE(i->rport_attrs));
 
diff --git a/include/scsi/scsi_transport_srp.h b/include/scsi/scsi_transport_srp.h
index 565cb79..8782d6a 100644
--- a/include/scsi/scsi_transport_srp.h
+++ b/include/scsi/scsi_transport_srp.h
@@ -51,6 +51,7 @@ struct srp_rport {
 
 struct srp_function_template {
 	/* for initiator drivers */
+	void (*rport_delete)(struct srp_rport *rport);
 	void (*rport_ping_timedout) (struct srp_rport *rport);
 	void (*rport_recovery_timedout) (struct srp_rport *rport);
 	/* for target drivers */
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [PATCH 06/14] ib_srp: Micro-optimize completion handlers
       [not found]     ` <201112012002.17829.bvanassche-HInyCGIudOg@public.gmane.org>
@ 2011-12-01 21:35       ` chas williams - CONTRACTOR
       [not found]         ` <20111201163539.572ca669-KCdNrDJlFBBhNwqIksvPR6qiWZVw4kCD+aIohriVLy8@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: chas williams - CONTRACTOR @ 2011-12-01 21:35 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, David Dillow, Roland Dreier

perhaps this should use the unlikely() macro instead:

	if (unlikely(wc.status)) {
		...
	}

	...

On Thu, 1 Dec 2011 20:02:17 +0100
Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote:

> Help the CPU branch predictor by rearranging two if-statements such
> that the most likely code is in the if-part instead of the else-part.
> 
> Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
> Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
> Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
> ---
>  drivers/infiniband/ulp/srp/ib_srp.c |   14 +++++++-------
>  1 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index a2624bf..e6d1aef 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -1224,15 +1224,15 @@ static void srp_recv_completion(struct ib_cq *cq, void *target_ptr)
>  
>  	ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
>  	while (ib_poll_cq(cq, 1, &wc) > 0) {
> -		if (wc.status) {
> +		if (wc.status == IB_WC_SUCCESS) {
> +			srp_handle_recv(target, &wc);
> +		} else {
>  			shost_printk(KERN_ERR, target->scsi_host,
>  				     PFX "failed receive status %d\n",
>  				     wc.status);
>  			target->qp_in_error = 1;
>  			break;
>  		}
> -
> -		srp_handle_recv(target, &wc);
>  	}
>  }
>  
> @@ -1243,16 +1243,16 @@ static void srp_send_completion(struct ib_cq *cq, void *target_ptr)
>  	struct srp_iu *iu;
>  
>  	while (ib_poll_cq(cq, 1, &wc) > 0) {
> -		if (wc.status) {
> +		if (wc.status == IB_WC_SUCCESS) {
> +			iu = (struct srp_iu *) (uintptr_t) wc.wr_id;
> +			list_add(&iu->list, &target->free_tx);
> +		} else {
>  			shost_printk(KERN_ERR, target->scsi_host,
>  				     PFX "failed send status %d\n",
>  				     wc.status);
>  			target->qp_in_error = 1;
>  			break;
>  		}
> -
> -		iu = (struct srp_iu *) (uintptr_t) wc.wr_id;
> -		list_add(&iu->list, &target->free_tx);
>  	}
>  }
>  

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 00/14] Make ib_srp better suited for H.A. purposes
  2011-12-01 18:54 [PATCH 00/14] Make ib_srp better suited for H.A. purposes Bart Van Assche
                   ` (4 preceding siblings ...)
  2011-12-01 19:13 ` [PATCH 14/14] ib_srp: Allow SRP disconnect through sysfs Bart Van Assche
@ 2011-12-01 23:26 ` David Dillow
  5 siblings, 0 replies; 65+ messages in thread
From: David Dillow @ 2011-12-01 23:26 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-rdma, linux-scsi, Roland Dreier, Vu Pham

On Thu, 2011-12-01 at 13:54 -0500, Bart Van Assche wrote:
> This patch series makes the ib_srp driver better suited for use in a H.A.
> setup because:
> - Switchover without triggering read or write errors become possible. Such
>   errors are bad because these can make a filesystem switch to read-only
>   mode.
> - A ping mechanism has been added that allows to reduce the switch-over
>   time greatly.
> - Disconnecting from a target without unloading ib_srp becomes possible.
> - Switchover can be triggered explicitly by deleting an initiator device.

In general, I'm glad to see this being done. I won't have the time this
week for a detailed review, but will take a look as soon as I can. I
have a number of ideas I'd like to see implemented (and have partially
done), so it will be interesting to see if we're using similar
approaches.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 06/14] ib_srp: Micro-optimize completion handlers
       [not found]         ` <20111201163539.572ca669-KCdNrDJlFBBhNwqIksvPR6qiWZVw4kCD+aIohriVLy8@public.gmane.org>
@ 2011-12-01 23:32           ` David Dillow
       [not found]             ` <1322782369.11664.5.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: David Dillow @ 2011-12-01 23:32 UTC (permalink / raw)
  To: chas williams - CONTRACTOR
  Cc: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier

On Thu, 2011-12-01 at 16:35 -0500, chas williams - CONTRACTOR wrote:
> perhaps this should use the unlikely() macro instead:
> 
> 	if (unlikely(wc.status)) {
> 		...
> 	}

If it needs to change at all, this is the direction it should take.

-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 01/14] ib_srp: Introduce pr_fmt()
       [not found]     ` <201112011955.47198.bvanassche-HInyCGIudOg@public.gmane.org>
@ 2011-12-01 23:40       ` David Dillow
  0 siblings, 0 replies; 65+ messages in thread
From: David Dillow @ 2011-12-01 23:40 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier

On Thu, 2011-12-01 at 13:55 -0500, Bart Van Assche wrote:
> Make the logging code more a little more brief by replacing
> printk(KERN_WARNING PFX ...) by pr_warn(...) and by replacing
> printk(KERN_ERR PFX ...) by pr_err(...). Remove one trailing
> space to avoid a checkpatch warning.

This is a good step; three minor nits:
1) Please go ahead and merge the multiline strings into one, even though
checkpatch will complain about the line length. I'm not a huge fan of
the long lines either, but Linus has clearly stated his preference for
being able to grep the string in the message
2) Please re-align the variables on wrapped lines to just after the (;
below the work token should be pushed over by one space.
3) All the commit message needs to really say is 'Using pr_* is
preferred over printk.' Probably can just be the short summary, and no
commit body needed. 

>  			if (token != SRP_REV10_IB_IO_CLASS &&
>  			    token != SRP_REV16A_IB_IO_CLASS) {
> -				printk(KERN_WARNING PFX "unknown IO class parameter value"
> +				pr_warn("unknown IO class parameter value"
>  				       " %x specified (use %x or %x).\n",
>  				       token, SRP_REV10_IB_IO_CLASS, SRP_REV16A_IB_IO_CLASS);
>  				goto out;

Thanks!
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 06/14] ib_srp: Micro-optimize completion handlers
       [not found]             ` <1322782369.11664.5.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
@ 2011-12-12 11:41               ` Bart Van Assche
       [not found]                 ` <CAO+b5-pFBEQybN+01heAzr=_dNCb7Sr7ri_o_hF1MnOeX4_idQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-12 11:41 UTC (permalink / raw)
  To: David Dillow; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Fri, Dec 2, 2011 at 12:32 AM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
> On Thu, 2011-12-01 at 16:35 -0500, chas williams - CONTRACTOR wrote:
>> perhaps this should use the unlikely() macro instead:
>>
>>       if (unlikely(wc.status)) {
>>               ...
>>       }
>
> If it needs to change at all, this is the direction it should take.

Hi Dave,

Prepatch 3.2-rc5 has been released which means that the 3.3 merge
window is coming closer. Do you have any other comments about this
patch series than those already posted to the list ?

Thanks,

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 06/14] ib_srp: Micro-optimize completion handlers
       [not found]                 ` <CAO+b5-pFBEQybN+01heAzr=_dNCb7Sr7ri_o_hF1MnOeX4_idQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-12 23:12                   ` David Dillow
  0 siblings, 0 replies; 65+ messages in thread
From: David Dillow @ 2011-12-12 23:12 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Mon, 2011-12-12 at 06:41 -0500, Bart Van Assche wrote:
> On Fri, Dec 2, 2011 at 12:32 AM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
> > On Thu, 2011-12-01 at 16:35 -0500, chas williams - CONTRACTOR wrote:
> >> perhaps this should use the unlikely() macro instead:
> >>
> >>       if (unlikely(wc.status)) {
> >>               ...
> >>       }
> >
> > If it needs to change at all, this is the direction it should take.
> 
> Hi Dave,
> 
> Prepatch 3.2-rc5 has been released which means that the 3.3 merge
> window is coming closer. Do you have any other comments about this
> patch series than those already posted to the list ?

I do; I'm sorry to keep you waiting, but other pressures at work have
kept me occupied. I'll have feedback for you this week, or will do it
over the weekend if necessary.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 02/14] ib_srp: Consolidate repetitive sysfs code
       [not found]     ` <201112011957.19892.bvanassche-HInyCGIudOg@public.gmane.org>
@ 2011-12-15 18:26       ` David Dillow
  0 siblings, 0 replies; 65+ messages in thread
From: David Dillow @ 2011-12-15 18:26 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier

On Thu, 2011-12-01 at 19:57 +0100, Bart Van Assche wrote:
> Remove sysfs attributes before removing a target instead of
> testing the target state in every sysfs attribute callback
> method. Note: it is safe to invoke a sysfs attribute removal
> method like device_remove_file() twice on the same attribute.

This is much cleaner, thanks.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 03/14] ib_srp: Disallow duplicate logins
       [not found]     ` <201112011958.17339.bvanassche-HInyCGIudOg@public.gmane.org>
@ 2011-12-15 19:08       ` David Dillow
       [not found]         ` <1323976101.16703.42.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: David Dillow @ 2011-12-15 19:08 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier

On Thu, 2011-12-01 at 19:58 +0100, Bart Van Assche wrote:
> Currently the ib_srp driver allows to log in multiple times to the
> same target via its sysfs interface. This leads to each target LUN
> being imported multiple times at the initiator side, something that
> should not be allowed.

I'm of mixed opinion on this one. It seems like we're working around a
broken target -- we request single RDMA channel operation in our login
requests, so the standard says that the target should disconnect all
other connections for that I_T nexus when it receives a new login
request. That should keep us from having multiple instances of LUNs from
the same target over a single srp_host (HCA).

In the past, we've seen connections having issues but not going away,
and the only way to kill them was to create a new connection to the RAID
controller, which would then kill the old one. It's been a few years
since I've seen this, so the details are a bit fuzzy, but I recall the
current behavior being useful.

However, I'm not adverse to going beyond the standard if it makes sense,
and your later patch to allow disconnecting channels without removing
the module would fix the use case above. What problem are we solving
here other than a potentially confused admin?


> +	spin_lock(&host->target_lock);
> +	list_for_each_entry(t, &host->target_list, list) {
> +		WARN_ON(t == target);
> +		if (target->id_ext == t->id_ext &&
> +		    target->ioc_guid == t->ioc_guid &&
> +		    memcmp(target->orig_dgid, t->orig_dgid, 16) == 0 &&
> +		    target->path.pkey == t->path.pkey &&
> +		    target->service_id == t->service_id) {

The spec discusses the I_T nexus defined by the initiator and target
port identifiers as the unit for disambiguation when running in single
channel mode. The initiator port identifier is controlled by the port's
GUID combined with target->initiator_ext and the target port identifier
is defined by target->{id_ext,ioc_guid}.

>From the standard's perspective, why should we care about pkey, dgid (or
orig_dgid as above) or the service id? Those don't change the I_T nexus,
so the target should disconnect previous channels appropriately.


>  static ssize_t srp_create_target(struct device *dev,
>  				 struct device_attribute *attr,
>  				 const char *buf, size_t count)
> @@ -2133,6 +2161,10 @@ static ssize_t srp_create_target(struct device *dev,
>  	if (ret)
>  		goto err;
>  
> +	ret = srp_target_exists(host, target);
> +	if (ret)
> +		goto err;
> +

We need to tell the user that we didn't create a new connection because
one already exists.

-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 04/14] ib_srp: Set block layer timeout
       [not found]     ` <201112011959.11956.bvanassche-HInyCGIudOg@public.gmane.org>
@ 2011-12-15 19:37       ` David Dillow
       [not found]         ` <1323977841.16703.57.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
  2011-12-17 22:03       ` Or Gerlitz
  1 sibling, 1 reply; 65+ messages in thread
From: David Dillow @ 2011-12-15 19:37 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier

On Thu, 2011-12-01 at 19:59 +0100, Bart Van Assche wrote:
> Make sure that the default block layer timeout is above the
> InfiniBand transport layer timeout. The default block layer
> timeout is 30 seconds. Typical values for the QP local ack
> timeout and retry count are 19 and 7 respectively, which means
> that it can take up to 60.1 seconds before a HCA submits an
> error completion. The block layer timeout must be larger than
> the transport layer timeout because otherwise it can happen
> that an SRP response is received after the SCSI layer has
> already killed a command.

Good idea, but a few nits:


> +static int srp_slave_alloc(struct scsi_device *sdev)
> +{
> +	struct Scsi_Host *shost = sdev->host;
> +	struct srp_target_port *target = host_to_target(shost);
> +
> +	if (!WARN_ON(target->rq_tmo_jiffies == 0))
> +		blk_queue_rq_timeout(sdev->request_queue,
> +				     target->rq_tmo_jiffies);

What does having a WARN_ON here buy us? We should just set
target->rq_tmo_jiffies to a default value in srp_create_target(), and
adjust it as needed based on the CM response.

Also, since this is a call-back routine, please put it after
srp_reset_host() rather than in the middle of the target setup and CM
handling.

>  static void srp_cm_rep_handler(struct ib_cm_id *cm_id,
>  			       struct srp_login_rsp *lrsp,
>  			       struct srp_target_port *target)
> @@ -1431,6 +1442,24 @@ static void srp_cm_rep_handler(struct ib_cm_id *cm_id,
>  	if (ret)
>  		goto error_free;
>  
> +	if (!WARN_ON((attr_mask & (IB_QP_TIMEOUT | IB_QP_RETRY_CNT)) == 0)) {

Mentally unpacking that to realize the body gets executed if either of
those is set took a bit of thought... I think you will be better served
setting a default and reducing line that to

	if (attr_mask & (IP_QP_TIMEOUT | IB_QP_RETRY_CNT)) {

Also, you need to handle the case where only one of those is set, and
provide defaults when they aren't -- unless the spec says they both must
always be set (I haven't looked), in which case you can modify the
attr_mask test.

> +		uint64_t T_tr_ns, max_compl_time_ms;
> +
> +		/*
> +		 * Set target->rq_tmo_jiffies to one second more than the
> +		 * largest time it can take before an error completion is
> +		 * generated.
> +		 */

The calculations have a fair number of magic numbers in them; may be
useful to explain. 

You should be able to use nsecs_to_jiffies() instead of doing the math
yourself, but I see that's not exported. Bummer, but we can leave fixing
that for later, I think.

> +		T_tr_ns = 1ULL << (12 + qp_attr->timeout);
> +		max_compl_time_ms = qp_attr->retry_cnt * 4 * T_tr_ns;
> +		do_div(max_compl_time_ms, 1000 * 1000);
> +		target->rq_tmo_jiffies =
> +			msecs_to_jiffies(max_compl_time_ms + 1000);
> +		pr_debug("max. IB completion time = %lld ms; block layer"
> +			 " timeout = %d jiffies\n", max_compl_time_ms,
> +			 target->rq_tmo_jiffies);

Is the debug statement really needed? Though I suppose it isn't really
hurting anything...


> diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
> index 020caf0..f010bd9 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.h
> +++ b/drivers/infiniband/ulp/srp/ib_srp.h
> @@ -138,6 +138,7 @@ struct srp_target_port {
>  	u32			lkey;
>  	u32			rkey;
>  	enum srp_target_state	state;
> +	u32			rq_tmo_jiffies;
>  	unsigned int		max_iu_len;

Please don't put that in the middle of the options used in the hot path;
it should go further down in the struct, somewhere after the comment

        /* Everything above this point is used in the hot path of
         * command processing. Try to keep them packed into cachelines.
         */

Bonus points if you can shove into a hole caused by packing rules.

-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 05/14] ib_srp: Avoid that late SRP replies cause trouble
       [not found]     ` <201112012000.04427.bvanassche-HInyCGIudOg@public.gmane.org>
@ 2011-12-15 20:03       ` David Dillow
  0 siblings, 0 replies; 65+ messages in thread
From: David Dillow @ 2011-12-15 20:03 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier

On Thu, 2011-12-01 at 20:00 +0100, Bart Van Assche wrote:
> If a user misconfigures the block layer timeout such that it is below
> the InfiniBand RC timeout it can happen that an SRP reply arrives after
> the SCSI error handler has already killed the associated SCSI command.
> Avoid that late replies cause a kernel crash.

I'm not sure that the case you describe can happen. The request do not
get killed until srp_remove_req() is called on them. In the event of
timeouts, this will be through srp_abort() initially, and then
srp_reset_device() and srp_reset_host(). If srp_abort() is successful,
we will never see a reply for that request unless the target is broken.
If it fails, a successful srp_reset_device() gives us the same
assurance. And if that fails, we kill the connection in
srp_reset_host(), so there is no possibility of a late reply.

However, the code as it currently is leaves us open to malformed
requests from the target. We cannot do anything about it if the target
is swapping tags around on the requests, but we should detect invalid
tags and take appropriate action. Checking for a null req->scmnd and
returning isn't the right action, as that leaks the request, and does
nothing to protect against bad tags.

I'm happy to code up a patch that sanity-checks the response for
in-range and outstanding (req->scmnd != NULL) responses, and kills the
connection, or you are welcome to do so as well. I think it would be
better suited later in the patch series, building on top of your work to
disconnect a target without removing the module.

-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 09/14] srp_transport: Fix attribute registration
       [not found]   ` <201112012006.50445.bvanassche-HInyCGIudOg@public.gmane.org>
@ 2011-12-15 20:09     ` David Dillow
  0 siblings, 0 replies; 65+ messages in thread
From: David Dillow @ 2011-12-15 20:09 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier,
	stable-u79uwXL29TY76Z2rM5mHXA, linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	Fujita Tomonori, Brian King

On Thu, 2011-12-01 at 20:06 +0100, Bart Van Assche wrote:
> Register transport attributes after the attribute array has been
> set up instead of before.

Why?

I don't disagree with the change, but there should be at least a little
discussion of why you're doing it. Your last message about this to James
Bottomley had quite a bit of content, perhaps too much, but then his
comment that 'it's best practice to initialize a structure before
registering it' seems appropriate.

> Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
> Cc: FUJITA Tomonori <fujita.tomonori-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
> Cc: Brian King <brking-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
> Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> ---
>  drivers/scsi/scsi_transport_srp.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
> index 21a045e..07c4394 100644
> --- a/drivers/scsi/scsi_transport_srp.c
> +++ b/drivers/scsi/scsi_transport_srp.c
> @@ -324,13 +324,14 @@ srp_attach_transport(struct srp_function_template *ft)
>  	i->rport_attr_cont.ac.attrs = &i->rport_attrs[0];
>  	i->rport_attr_cont.ac.class = &srp_rport_class.class;
>  	i->rport_attr_cont.ac.match = srp_rport_match;
> -	transport_container_register(&i->rport_attr_cont);
>  
>  	count = 0;
>  	SETUP_RPORT_ATTRIBUTE_RD(port_id);
>  	SETUP_RPORT_ATTRIBUTE_RD(roles);
>  	i->rport_attrs[count] = NULL;
>  
> +	transport_container_register(&i->rport_attr_cont);
> +
>  	i->f = ft;
>  
>  	return &i->t;

-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 12/14] ib_srp: Rework error handling
       [not found]   ` <201112012010.37276.bvanassche-HInyCGIudOg@public.gmane.org>
@ 2011-12-15 20:20     ` David Dillow
  2011-12-19  3:36     ` David Dillow
  1 sibling, 0 replies; 65+ messages in thread
From: David Dillow @ 2011-12-15 20:20 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fujita Tomonori, Brian King,
	Roland Dreier, linux-scsi-u79uwXL29TY76Z2rM5mHXA

On Thu, 2011-12-01 at 20:10 +0100, Bart Van Assche wrote:
> Add the necessary functions in the SRP transport module to allow
> an SRP initiator driver to implement transport layer recovery.

I've run out of time today to look further, but I did want to point out
that I think you're doing way too much in this patch. I like that you're
moving some pieces into the transport, as it would be good to handle
in-fabric/out-of-fabric there as well, as done for the FC transport.
Getting there needs to be done through a series of patches rather one
big patches, especially since you're touching code that multiple drivers
use and need to get the core changes reviewed by so many people. I would
need some quite some persuasion to be comfortable making large changes
to srp_transport through the IB tree without ack's by Fujita and Brian.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 04/14] ib_srp: Set block layer timeout
       [not found]         ` <1323977841.16703.57.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
@ 2011-12-17 17:50           ` Bart Van Assche
       [not found]             ` <CAO+b5-p9FQVvdVZtUvRJgMW6qhcnUKrp4RhCch1Hnt8JsjS5qQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-17 17:50 UTC (permalink / raw)
  To: David Dillow; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier

On Thu, Dec 15, 2011 at 8:37 PM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
> > +static int srp_slave_alloc(struct scsi_device *sdev)
> > +{
> > +     struct Scsi_Host *shost = sdev->host;
> > +     struct srp_target_port *target = host_to_target(shost);
> > +
> > +     if (!WARN_ON(target->rq_tmo_jiffies == 0))
> > +             blk_queue_rq_timeout(sdev->request_queue,
> > +                                  target->rq_tmo_jiffies);
>
> What does having a WARN_ON here buy us? We should just set
> target->rq_tmo_jiffies to a default value in srp_create_target(), and
> adjust it as needed based on the CM response.

That WARN_ON() statement is there to verify that rq_tmo_jiffies has
been initialized to a non-zero value - which is the case unless
something bad like memory corruption has happened. I can leave it out
if you want.

> >  static void srp_cm_rep_handler(struct ib_cm_id *cm_id,
> >                              struct srp_login_rsp *lrsp,
> >                              struct srp_target_port *target)
> > @@ -1431,6 +1442,24 @@ static void srp_cm_rep_handler(struct ib_cm_id *cm_id,
> >       if (ret)
> >               goto error_free;
> >
> > +     if (!WARN_ON((attr_mask & (IB_QP_TIMEOUT | IB_QP_RETRY_CNT)) == 0)) {
>
> Mentally unpacking that to realize the body gets executed if either of
> those is set took a bit of thought... I think you will be better served
> setting a default and reducing line that to
>
>        if (attr_mask & (IP_QP_TIMEOUT | IB_QP_RETRY_CNT)) {
>
> Also, you need to handle the case where only one of those is set, and
> provide defaults when they aren't -- unless the spec says they both must
> always be set (I haven't looked), in which case you can modify the
> attr_mask test.

According to table 91 in the IB spec (section 11.2.4.2, Modify Queue
Pair), both the QP timeout and the retry count have to be set for RC
QP's upon the RTR to RTS transition.

> > +             uint64_t T_tr_ns, max_compl_time_ms;
> > +
> > +             /*
> > +              * Set target->rq_tmo_jiffies to one second more than the
> > +              * largest time it can take before an error completion is
> > +              * generated.
> > +              */
>
> The calculations have a fair number of magic numbers in them; may be
> useful to explain.

All these magic numbers come straight from the IB spec. I can add a
comment mentioning that if you want.

> > +             pr_debug("max. IB completion time = %lld ms; block layer"
> > +                      " timeout = %d jiffies\n", max_compl_time_ms,
> > +                      target->rq_tmo_jiffies);
>
> Is the debug statement really needed? Though I suppose it isn't really
> hurting anything...

That debug statement was convenient to verify the result of the
computation. But you are right, it's not really necessary. I'll leave
it out.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 04/14] ib_srp: Set block layer timeout
       [not found]     ` <201112011959.11956.bvanassche-HInyCGIudOg@public.gmane.org>
  2011-12-15 19:37       ` David Dillow
@ 2011-12-17 22:03       ` Or Gerlitz
       [not found]         ` <CAJZOPZKQ4rg6D=ZDt2q+aJbsNAQbgqgh19FmGC5Vi6_EQ4ROFQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 65+ messages in thread
From: Or Gerlitz @ 2011-12-17 22:03 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, David Dillow

Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote:

> The default block layer timeout is 30 seconds.

Could you provider a pointer to where this is defined?

> Typical values for the QP local ack
> timeout and retry count are 19 and 7 respectively, which means
> that it can take up to 60.1 seconds before a HCA submits an error completion.

isn't timeout(R,X) = R * ~4us * 2^X  ==> timeout(7,19) = 7 * ~2seconds
= ~14 seconds

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 04/14] ib_srp: Set block layer timeout
       [not found]             ` <CAO+b5-p9FQVvdVZtUvRJgMW6qhcnUKrp4RhCch1Hnt8JsjS5qQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-17 22:39               ` David Dillow
  0 siblings, 0 replies; 65+ messages in thread
From: David Dillow @ 2011-12-17 22:39 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier

On Sat, 2011-12-17 at 18:50 +0100, Bart Van Assche wrote:
> On Thu, Dec 15, 2011 at 8:37 PM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
> > > +static int srp_slave_alloc(struct scsi_device *sdev)
> > > +{
> > > +     struct Scsi_Host *shost = sdev->host;
> > > +     struct srp_target_port *target = host_to_target(shost);
> > > +
> > > +     if (!WARN_ON(target->rq_tmo_jiffies == 0))
> > > +             blk_queue_rq_timeout(sdev->request_queue,
> > > +                                  target->rq_tmo_jiffies);
> >
> > What does having a WARN_ON here buy us? We should just set
> > target->rq_tmo_jiffies to a default value in srp_create_target(), and
> > adjust it as needed based on the CM response.
> 
> That WARN_ON() statement is there to verify that rq_tmo_jiffies has
> been initialized to a non-zero value - which is the case unless
> something bad like memory corruption has happened. I can leave it out
> if you want.

Better to just set it to a nonzero value that is appropriate for the
defaults and set it up when the CM handler gives you the reply. Sounds
like 61 seconds is the correct value, from your original commit message.
This way you know you have a sane value, even if the CM is broken, and
the user doesn't get a confusing stack trace that doesn't really tell
them what's wrong.

> >        if (attr_mask & (IP_QP_TIMEOUT | IB_QP_RETRY_CNT)) {
> >
> > Also, you need to handle the case where only one of those is set, and
> > provide defaults when they aren't -- unless the spec says they both must
> > always be set (I haven't looked), in which case you can modify the
> > attr_mask test.
> 
> According to table 91 in the IB spec (section 11.2.4.2, Modify Queue
> Pair), both the QP timeout and the retry count have to be set for RC
> QP's upon the RTR to RTS transition.

Ok, then safer to only change from the default set in
srp_create_target() when
(attr_mask & (IP_QP_TIMEOUT|IB_OP_RETRY_CNT) == (IP_OP_TIMEOUT|IB_OP_RETRY_CNT))

> > > +             /*
> > > +              * Set target->rq_tmo_jiffies to one second more than the
> > > +              * largest time it can take before an error completion is
> > > +              * generated.
> > > +              */
> >
> > The calculations have a fair number of magic numbers in them; may be
> > useful to explain.
> 
> All these magic numbers come straight from the IB spec. I can add a
> comment mentioning that if you want.

Changing to something like

/* The maximum detection time for a lost ACK is 4 times the retry timeout,
 * and we'll retry a set number of times before declaring the connection
 * dead (IB spec C9-140, C9-141, C9-142). Set our default IO timeout to be
 * one second longer to make sure we wait long enough before declaring a
 * request dead.
 */
T_tr_ns = 4096 * (1ULL << qp_attr->timeout);
max_compl_time_ms = ...

may make it pretty obvious. Also, you could use USEC_PER_MSEC and
MSEC_PER_SEC from <linux/time.h> to make the other numbers meaningful.

-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 04/14] ib_srp: Set block layer timeout
       [not found]         ` <CAJZOPZKQ4rg6D=ZDt2q+aJbsNAQbgqgh19FmGC5Vi6_EQ4ROFQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-17 22:39           ` David Dillow
  2011-12-18 11:53           ` Bart Van Assche
  1 sibling, 0 replies; 65+ messages in thread
From: David Dillow @ 2011-12-17 22:39 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, 2011-12-18 at 00:03 +0200, Or Gerlitz wrote:
> Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote:
> 
> > The default block layer timeout is 30 seconds.
> 
> Could you provider a pointer to where this is defined?
> 
> > Typical values for the QP local ack
> > timeout and retry count are 19 and 7 respectively, which means
> > that it can take up to 60.1 seconds before a HCA submits an error completion.
> 
> isn't timeout(R,X) = R * ~4us * 2^X  ==> timeout(7,19) = 7 * ~2seconds
> = ~14 seconds

C9-141 says we could take up to 4 times longer than 4us * 2^X to detect
the ACK loss.



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 04/14] ib_srp: Set block layer timeout
       [not found]         ` <CAJZOPZKQ4rg6D=ZDt2q+aJbsNAQbgqgh19FmGC5Vi6_EQ4ROFQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2011-12-17 22:39           ` David Dillow
@ 2011-12-18 11:53           ` Bart Van Assche
  1 sibling, 0 replies; 65+ messages in thread
From: Bart Van Assche @ 2011-12-18 11:53 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, David Dillow

On Sat, Dec 17, 2011 at 5:03 PM, Or Gerlitz <or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote:
>
>> The default block layer timeout is 30 seconds.
>
> Could you provider a pointer to where this is defined?

Sorry, it's not the block layer but sd (SCSI disk) that sets that
timeout value. From drivers/scsi/sd.h: #define SD_TIMEOUT (30 * HZ)

>> Typical values for the QP local ack
>> timeout and retry count are 19 and 7 respectively, which means
>> that it can take up to 60.1 seconds before a HCA submits an error completion.
>
> isn't timeout(R,X) = R * ~4us * 2^X  ==> timeout(7,19) = 7 * ~2seconds
> = ~14 seconds

I started looking at the timeout settings after I noticed while
performing cable pulling tests with an mlx4-controlled HCA that it
could take up to (about) 61 seconds before an error completion was
delivered. And it looks like Dave already answered the question where
the factor four comes from ?

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 03/14] ib_srp: Disallow duplicate logins
       [not found]         ` <1323976101.16703.42.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
@ 2011-12-18 19:20           ` Bart Van Assche
       [not found]             ` <CAO+b5-r6dHt-vmbNjeD4zvcAaRVqhiEm5eZF7hgF6ei35kqjdQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-18 19:20 UTC (permalink / raw)
  To: David Dillow; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier

On Thu, Dec 15, 2011 at 7:08 PM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
> On Thu, 2011-12-01 at 19:58 +0100, Bart Van Assche wrote:
> > Currently the ib_srp driver allows to log in multiple times to the
> > same target via its sysfs interface. This leads to each target LUN
> > being imported multiple times at the initiator side, something that
> > should not be allowed.
>
> I'm of mixed opinion on this one. It seems like we're working around a
> broken target -- we request single RDMA channel operation in our login
> requests, so the standard says that the target should disconnect all
> other connections for that I_T nexus when it receives a new login
> request. That should keep us from having multiple instances of LUNs from
> the same target over a single srp_host (HCA).
>
> In the past, we've seen connections having issues but not going away,
> and the only way to kill them was to create a new connection to the RAID
> controller, which would then kill the old one. It's been a few years
> since I've seen this, so the details are a bit fuzzy, but I recall the
> current behavior being useful.
>
> However, I'm not adverse to going beyond the standard if it makes sense,
> and your later patch to allow disconnecting channels without removing
> the module would fix the use case above. What problem are we solving
> here other than a potentially confused admin?

I agree that relogin is a useful feature. But I doubt the way it works
now is fine because during a certain time window all LUNs appear twice
at the initiator side - half of them working, half of them not.
Relogin is still possible with this patch set, but it requires an
extra step: deleting a target before attempting to log in again.

> > +     spin_lock(&host->target_lock);
> > +     list_for_each_entry(t, &host->target_list, list) {
> > +             WARN_ON(t == target);
> > +             if (target->id_ext == t->id_ext &&
> > +                 target->ioc_guid == t->ioc_guid &&
> > +                 memcmp(target->orig_dgid, t->orig_dgid, 16) == 0 &&
> > +                 target->path.pkey == t->path.pkey &&
> > +                 target->service_id == t->service_id) {
>
> The spec discusses the I_T nexus defined by the initiator and target
> port identifiers as the unit for disambiguation when running in single
> channel mode. The initiator port identifier is controlled by the port's
> GUID combined with target->initiator_ext and the target port identifier
> is defined by target->{id_ext,ioc_guid}.
>
> From the standard's perspective, why should we care about pkey, dgid (or
> orig_dgid as above) or the service id? Those don't change the I_T nexus,
> so the target should disconnect previous channels appropriately.

I'm fine with leaving out the PKey and the service ID. But I'm not
sure it's fine to leave the DGID out because leaving out the DGID will
prevent setting up redundant connections from an initiator to a target
with multiple active IB ports. The reason the orig_dgid is being used
in the above comparison instead of dgid is because these two ID's may
differ due to automatic path migration.

> >  static ssize_t srp_create_target(struct device *dev,
> >                                struct device_attribute *attr,
> >                                const char *buf, size_t count)
> > @@ -2133,6 +2161,10 @@ static ssize_t srp_create_target(struct device *dev,
> >       if (ret)
> >               goto err;
> >
> > +     ret = srp_target_exists(host, target);
> > +     if (ret)
> > +             goto err;
> > +
>
> We need to tell the user that we didn't create a new connection because
> one already exists.

Does that mean that you'd like to see a pr_err() in addition to the
error code returned via the sysfs write() call ?

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 03/14] ib_srp: Disallow duplicate logins
       [not found]             ` <CAO+b5-r6dHt-vmbNjeD4zvcAaRVqhiEm5eZF7hgF6ei35kqjdQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-18 21:40               ` David Dillow
       [not found]                 ` <1324244446.17849.39.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: David Dillow @ 2011-12-18 21:40 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier

On Sun, 2011-12-18 at 19:20 +0000, Bart Van Assche wrote:
> On Thu, Dec 15, 2011 at 7:08 PM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
> > On Thu, 2011-12-01 at 19:58 +0100, Bart Van Assche wrote:
> > > Currently the ib_srp driver allows to log in multiple times to the
> > > same target via its sysfs interface. This leads to each target LUN
> > > being imported multiple times at the initiator side, something that
> > > should not be allowed.
> >
> > I'm of mixed opinion on this one. It seems like we're working around a
> > broken target -- we request single RDMA channel operation in our login
> > requests, so the standard says that the target should disconnect all
> > other connections for that I_T nexus when it receives a new login
> > request. That should keep us from having multiple instances of LUNs from
> > the same target over a single srp_host (HCA).
> >
> > In the past, we've seen connections having issues but not going away,
> > and the only way to kill them was to create a new connection to the RAID
> > controller, which would then kill the old one. It's been a few years
> > since I've seen this, so the details are a bit fuzzy, but I recall the
> > current behavior being useful.
> >
> > However, I'm not adverse to going beyond the standard if it makes sense,
> > and your later patch to allow disconnecting channels without removing
> > the module would fix the use case above. What problem are we solving
> > here other than a potentially confused admin?
> 
> I agree that relogin is a useful feature. But I doubt the way it works
> now is fine because during a certain time window all LUNs appear twice
> at the initiator side - half of them working, half of them not.
> Relogin is still possible with this patch set, but it requires an
> extra step: deleting a target before attempting to log in again.

According to the spec, they should only appear once -- the old
connection should be disconnected before the new login request is
accepted. Of course, the spec does not always match reality, though it
has worked that way on the HW I've tested. But as I said, I'm fine with
adding this to the initiator -- I just want good justification.

> > > +     spin_lock(&host->target_lock);
> > > +     list_for_each_entry(t, &host->target_list, list) {
> > > +             WARN_ON(t == target);
> > > +             if (target->id_ext == t->id_ext &&
> > > +                 target->ioc_guid == t->ioc_guid &&
> > > +                 memcmp(target->orig_dgid, t->orig_dgid, 16) == 0 &&
> > > +                 target->path.pkey == t->path.pkey &&
> > > +                 target->service_id == t->service_id) {
> >
> > The spec discusses the I_T nexus defined by the initiator and target
> > port identifiers as the unit for disambiguation when running in single
> > channel mode. The initiator port identifier is controlled by the port's
> > GUID combined with target->initiator_ext and the target port identifier
> > is defined by target->{id_ext,ioc_guid}.
> >
> > From the standard's perspective, why should we care about pkey, dgid (or
> > orig_dgid as above) or the service id? Those don't change the I_T nexus,
> > so the target should disconnect previous channels appropriately.
> 
> I'm fine with leaving out the PKey and the service ID. But I'm not
> sure it's fine to leave the DGID out because leaving out the DGID will
> prevent setting up redundant connections from an initiator to a target
> with multiple active IB ports. The reason the orig_dgid is being used
> in the above comparison instead of dgid is because these two ID's may
> differ due to automatic path migration.

A SRP target port (for the purposes of an I_T nexus) is defined by the
IOC GUID and extension; if you want to have redundant connections to a
port, multi-channel logins must be supported. The way most targets avoid
this issue is to present a different IOC on each IB port, though you
should be also able to export multiple ServiceEntries in the
IOControllerProfile response to allow different service extensions to be
selected.

As for APM, I thought we had determined the last time it came up that
implementing that would require protocol changes to SRP. I haven't gone
back to refresh my memory -- and it isn't implemented yet either way, so
we could go back and relax this check as part of the implementation.

> > > +     ret = srp_target_exists(host, target);
> > > +     if (ret)
> > > +             goto err;
> > > +
> >
> > We need to tell the user that we didn't create a new connection because
> > one already exists.
> 
> Does that mean that you'd like to see a pr_err() in addition to the
> error code returned via the sysfs write() call ?

Yes, similar to what we do for bad parameters in srp_parse_options().
Most scripts use echo to add a target, and only see succeed/fail; the
admin will need a way to see what failed. The errno is useful to daemons
using write(), though.

-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 07/14] ib_srp: Introduce srp_handle_qp_err()
       [not found]     ` <201112012002.56307.bvanassche-HInyCGIudOg@public.gmane.org>
@ 2011-12-18 23:55       ` David Dillow
  0 siblings, 0 replies; 65+ messages in thread
From: David Dillow @ 2011-12-18 23:55 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier

On Thu, 2011-12-01 at 20:02 +0100, Bart Van Assche wrote:
> +static void srp_handle_qp_err(enum ib_wc_status wc_status,
> +			      enum ib_wc_opcode wc_opcode,
> +			      struct srp_target_port *target)
> +{
> +	shost_printk(KERN_ERR, target->scsi_host,
> +		     PFX "failed receive status %d\n", wc_status);
> +	target->qp_in_error = 1;
> +}

You don't do anything with the wc_opcode here. You add a shost_printk()
in patch 12 -- that part should be in this patch.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 10/14] srp_transport: Simplify attribute initialization code
       [not found]     ` <201112012008.00502.bvanassche-HInyCGIudOg@public.gmane.org>
@ 2011-12-19  0:07       ` David Dillow
       [not found]         ` <1324253243.17849.45.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: David Dillow @ 2011-12-19  0:07 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fujita Tomonori, Brian King,
	Roland Dreier, linux-scsi-u79uwXL29TY76Z2rM5mHXA

On Thu, 2011-12-01 at 20:08 +0100, Bart Van Assche wrote:
> Eliminate the private_rport_attrs[] array and the SETUP_*() macros
> used to set up that array since the information in that array
> duplicates the information in the static device attributes. Also,
> trigger a kernel warning if SRP_RPORT_ATTRS is not large enough
> since it is easy to forget to update that macro when adding new
> attributes.

Not sure I care one way or the other, but it looks to have been written
the way it was to share a common code structure with the FC, SAS, and
other SCSI transport code. They use a BUG_ON instead of WARN_ON, which
seems more appropriate, since we just overran a buffer...

The SCSI guys need to review this one, as it needs their ack.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 11/14] ib_srp: Document sysfs attributes
       [not found]     ` <201112012009.12815.bvanassche-HInyCGIudOg@public.gmane.org>
@ 2011-12-19  0:33       ` David Dillow
       [not found]         ` <1324254811.17849.65.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: David Dillow @ 2011-12-19  0:33 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier

On Thu, 2011-12-01 at 20:09 +0100, Bart Van Assche wrote:
> Document the ib_srp sysfs attributes according to the rules specified
> in Documentation/ABI/README.

%s/ib_srp/SRP initiator/g

> +What:		/sys/class/infiniband_srp/srp-<hca>-<port_number>/add_target
> +Date:		January 2, 2006
> +KernelVersion:	2.6.15
> +Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> +Description:	Interface for making ib_srp connect to a new target.
> +		One can request ib_srp to connect to a new target by writing
> +		a comma-separated list of login parameters to this sysfs
> +		attribute. The supported parameters are:
> +		* id_ext, a 16-digit hexadecimal number specifying the last
> +		  eight bytes of the 16-byte target port identifier sent by
> +		  ib_srp to the target in the SRP_LOGIN_REQ request.

... specifying the eight byte identifier extension portion of the SCSI
target port identifier.

First 8 or last 8 depends on io_class...

> +		* ioc_guid, a 16-digit hexadecimal number specifying the first
> +		  eight bytes of the 16-byte target port identifier sent by
> +		  ib_srp to the target in the SRP_LOGIN_REQ request.

... specifying the eight byte IOController GUID portion of the SCSI
target port identifier.

> +		* dgid, a 16-digit hexadecimal number specifying the
> +		  destination GID.
> +		* pkey, a four-digit hexadecimal number specifying the
> +		  InfiniBand partition key.
> +		* service_id, a 16-digit hexadecimal number specifying the
> +		  InfiniBand service ID used for communication with the SRP
> +		  target.

... used to establish communication with the SRP target.

> +		* max_sect, a decimal number specifying the maximum number of
> +		  sectors to be transferred via a single SRP_CMD request.

512 byte sectors, via a single SCSI command.

> +		* max_cmd_per_lun, a decimal number specifying the maximum
> +		  number of outstanding commands per LUN.

for a given LUN

> +		* io_class, a hexadecimal number specifying the SRP I/O class.
> +		  Must be either 0xff00 (rev 10) or 0x0100 (rev 16a).
> +		* initiator_ext, a 16-digit hexadecimal number specifying the
> +		  last eight bytes of the 16-byte initiator port identifier
> +		  sent by ib_srp to the target in the SRP_LOGIN_REQ request.

... specifying the identifier extension portion of the SCSI initiator
port ID.

> +		* cmd_sg_entries, the maximum number of memory descriptors
> +		  that fit in a single SRP command when using the direct data
> +		  buffer descriptor format.

s/direct/indirect/

> +		* allow_ext_sg, whether ib_srp is allowed to send SRP_CMD
> +		  requests to the target with an indirect data buffer
> +		  descriptor, that is a data buffer descriptor that has to be
> +		  obtained by the target via an additional RDMA transfer.

allow_ext_sg, set to non-zero to allow the initiator to send an indirect
data buffer descriptor with more than cmd_sg_entries; portions of this
descriptor must be obtained by the target via an additional RDMA
transfer. The default is to never send such a request, as the SRP target
must support this mode of operation.

> +		* sg_tablesize, a number specifying the maximum length of
> +		  scatter/gather lists passed by the SCSI layer to ib_srp.

... maximum length of the scatter/gather lists supported by the SRP
initiator. This value may be larger than cmd_sg_entries if allow_ext_sg
is true, or the administrator accepts the (small) risk that fast memory
registration will not be able to collapse the SG list into few enough
entries.


> +
> +What:		/sys/class/scsi_host/host<n>/allow_ext_sg
> +Date:		May 19, 2011
> +KernelVersion:	2.6.39
> +Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> +Description:	Whether ib_srp is allowed to send SRP_CMD requests to the
> +		target with an indirect data buffer descriptor, that is a
> +		data buffer descriptor that has to be obtained by the target
> +		via an additional RDMA transfer. If set to zero, ib_srp will
> +		refuse to process data transfers whose scatter/gather list
> +		length exceeds cmd_sg_entries.

If true, the initiator believes the target will accept an indirect data
buffer descriptor with more than cmd_sg_entries, which requires an
additional RDMA operation. If false and memory registration mechanism
fails to collapse the SG list to few enough entries, the initiator will
fail the command.

> +
> +What:		/sys/class/scsi_host/host<n>/cmd_sg_entries
> +Date:		May 19, 2011
> +KernelVersion:	2.6.39
> +Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> +Description:	See also allow_ext_sg.

Maximum number of indirect data buffer descriptors that will fit inside
the request buffer without requiring an additional RDMA operation.

> +What:		/sys/class/scsi_host/host<n>/id_ext
> +What:		/sys/class/scsi_host/host<n>/ioc_guid

Same comments as for add_target.

> +What:		/sys/class/scsi_host/host<n>/service_id
> +Date:		June 17, 2006
> +KernelVersion:	2.6.17
> +Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> +Description:	InfiniBand service ID used for communication with the SRP
> +		target.

... for the InfiniBand connection management request used to establish
communication with the SRP target.


Interesting that we don't seem to have a sysfs entry for the initiator
ID value used.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 13/14] ib_srp: Implement transport layer ping
  2011-12-01 19:11   ` [PATCH 13/14] ib_srp: Implement transport layer ping Bart Van Assche
@ 2011-12-19  0:50     ` David Dillow
  2011-12-19 10:16       ` Bart Van Assche
  0 siblings, 1 reply; 65+ messages in thread
From: David Dillow @ 2011-12-19  0:50 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-rdma, linux-scsi, Roland Dreier, Fujita Tomonori, Brian King

On Thu, 2011-12-01 at 20:11 +0100, Bart Van Assche wrote:
> Add a time-based transport layer test such that fail-over in a multipath
> setup can happen quickly.

Why should this be done in the kernel? multipathd already verifies all
paths to a SCSI device are up and that the device is reachable.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 12/14] ib_srp: Rework error handling
       [not found]   ` <201112012010.37276.bvanassche-HInyCGIudOg@public.gmane.org>
  2011-12-15 20:20     ` David Dillow
@ 2011-12-19  3:36     ` David Dillow
  2011-12-19 10:38       ` Bart Van Assche
       [not found]       ` <1324265791.17849.92.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
  1 sibling, 2 replies; 65+ messages in thread
From: David Dillow @ 2011-12-19  3:36 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fujita Tomonori, Brian King,
	Roland Dreier, linux-scsi-u79uwXL29TY76Z2rM5mHXA

On Thu, 2011-12-01 at 20:10 +0100, Bart Van Assche wrote:
> Add the necessary functions in the SRP transport module to allow
> an SRP initiator driver to implement transport layer recovery.
> 
> Convert srp_target_port.qp_in_error into a target port state.

This part should probably be folded into patch 7.

> Factor out the code for removing an SRP target into the new function
> srp_remove_target().

This could be a separate patch, making that easy to review.

> In the ib_srp IB completion handlers, do not stop processing
> completions after the first error completion or a CM DREQ has been
> received.

Why do this? It seems like you're just going to create a flurry of error
messages when one would suffice, and we've got to tear down and
reconnect anyway.

> Block the SCSI target as soon as the first IB error
> completion has been received. Eliminate the SCSI target state test
> from srp_queuecommand().

I think you still want to check the SRP target state in
srp_queuecommand() -- you schedule the blocking of the queue, but it
could be a while before it happens. No sense sending down more commands
when you know they are going to fail -- and if you're worried about the
performance of the conditional in the hot path, we can make it
unlikely(). Cleaning up the target states lets us just check for
SRP_TARGET_LIVE, though you would want to change the creation sequence
so that isn't the birth state of the target.

> Rework ib_srp transport layer error handling. Instead of letting SCSI
> commands time out if a transport layer error occurs,

This is good, but should probably be part of the initial disconnect. We
want to run the receive queue dry, processing responses to avoid
unnecessarily resetting a command that was successful.

> block the SCSI
> host and try to reconnect until the reconnect timeout elapses or until
> the maximum number of reconnect attempts has been exceeded, whichever
> happens first.

When we're blocked for a disconnected target, we may want to call the
state something other than SRP_TARGET_BLOCKED. I'd like to eventually
handle the case where the target leaves the fabric temporarily -- we
don't necessarily disconnect in that case, but we need to block commands
until it comes back or we give up on it.

>  /*
>   * Copyright (c) 2005 Cisco Systems.  All rights reserved.
> + * Copyright (c) 2010-2011 Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>.

You've tried to add this in the past; I don't mean to deny you credit,
and I certainly value your work, but I'm not sure where the threshold is
for adding a copyright line.

Roland, care to comment here?


> +static void srp_remove_target(struct srp_target_port *target)
>  {

>  	srp_del_scsi_host_attr(target->scsi_host);
> +	cancel_work_sync(&target->block_work);
> +	mutex_lock(&target->mutex);
> +	mutex_unlock(&target->mutex);

You lock and unlock here without doing anything in the critical section.


>  static int srp_reset_host(struct scsi_cmnd *scmnd)
>  {
>  	struct srp_target_port *target = host_to_target(scmnd->device->host);
> +	struct srp_host *host = target->srp_host;
>  	int ret = FAILED;
> +	bool remove = false;
>  
>  	shost_printk(KERN_ERR, target->scsi_host, PFX "SRP reset_host called\n");
>  
> -	if (!srp_reconnect_target(target))
> +	if (!srp_reconnect_target(target)) {
>  		ret = SUCCESS;
> +	} else {
> +		/*
> +		 * We couldn't reconnect, so kill our target port off.
> +		 * However, we have to defer the real removal because we
> +		 * are in the context of the SCSI error handler now, which
> +		 * will deadlock if we call scsi_remove_host().

Part of me wonders if killing the SCSI host on a failed reconnect is the
correct thing to do. Certainly, the checks in scsi_eh_test_devices() are
going to kick the drives to offline state, but the FC transport blocks
the error handling to avoid that.

I think we need to give our error behavior a bit more thought.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 14/14] ib_srp: Allow SRP disconnect through sysfs
  2011-12-01 19:13 ` [PATCH 14/14] ib_srp: Allow SRP disconnect through sysfs Bart Van Assche
@ 2011-12-19  4:03   ` David Dillow
       [not found]     ` <1324267414.17849.98.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: David Dillow @ 2011-12-19  4:03 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-rdma, linux-scsi, Roland Dreier, Fujita Tomonori, Brian King

On Thu, 2011-12-01 at 20:13 +0100, Bart Van Assche wrote:
> Make it possible to disconnect via sysfs the IB RC connection used by the
> SRP protocol to communicate with a target.
> 
> Let the SRP transport layer create a sysfs "delete" attribute for
> initiator drivers that support this functionality.

This will be a nice feature to have, but I wonder about the proper
behavior. I had thought about having a 'drain' state, where we wouldn't
start new commands and would let existing ones complete or timeout
before killing the connection. That may be in addition to a 'kill it
now' variant.

Though perhaps my gentler delete is best done by having a user script
walk all of the devices under the host and set them to offline, wait for
scsi_host/hostX/host_busy to drop to zero, then hit the SRP delete attr.

-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 11/14] ib_srp: Document sysfs attributes
       [not found]         ` <1324254811.17849.65.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
@ 2011-12-19  8:46           ` Bart Van Assche
       [not found]             ` <CAO+b5-rb=Gtch0UCZPwTSCHhOdsUBecSpgYjaLvj5pbo9_1AeQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-19  8:46 UTC (permalink / raw)
  To: David Dillow; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier

On Mon, Dec 19, 2011 at 12:33 AM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
> On Thu, 2011-12-01 at 20:09 +0100, Bart Van Assche wrote:
>> +             * cmd_sg_entries, the maximum number of memory descriptors
>> +               that fit in a single SRP command when using the direct data
>> +               buffer descriptor format.
>
> s/direct/indirect/

For the current ib_srp implementation cmd_sg_entries is the maximum
number of direct data buffer descriptors allocated.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 14/14] ib_srp: Allow SRP disconnect through sysfs
       [not found]     ` <1324267414.17849.98.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
@ 2011-12-19  9:04       ` Bart Van Assche
  0 siblings, 0 replies; 65+ messages in thread
From: Bart Van Assche @ 2011-12-19  9:04 UTC (permalink / raw)
  To: David Dillow
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Roland Dreier,
	Fujita Tomonori, Brian King

On Mon, Dec 19, 2011 at 4:03 AM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
> On Thu, 2011-12-01 at 20:13 +0100, Bart Van Assche wrote:
>> Make it possible to disconnect via sysfs the IB RC connection used by the
>> SRP protocol to communicate with a target.
>>
>> Let the SRP transport layer create a sysfs "delete" attribute for
>> initiator drivers that support this functionality.
>
> This will be a nice feature to have, but I wonder about the proper
> behavior. I had thought about having a 'drain' state, where we wouldn't
> start new commands and would let existing ones complete or timeout
> before killing the connection. That may be in addition to a 'kill it
> now' variant.
>
> Though perhaps my gentler delete is best done by having a user script
> walk all of the devices under the host and set them to offline, wait for
> scsi_host/hostX/host_busy to drop to zero, then hit the SRP delete attr.

Sorry, but I disagree. One of the goals of this patch set is to make
fast fail-over possible. Waiting for an error completion or time out
contradicts with that goal.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 13/14] ib_srp: Implement transport layer ping
  2011-12-19  0:50     ` David Dillow
@ 2011-12-19 10:16       ` Bart Van Assche
  2011-12-19 22:32         ` David Dillow
  0 siblings, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-19 10:16 UTC (permalink / raw)
  To: David Dillow
  Cc: linux-rdma, linux-scsi, Roland Dreier, Fujita Tomonori, Brian King

On Mon, Dec 19, 2011 at 12:50 AM, David Dillow <dillowda@ornl.gov> wrote:
> On Thu, 2011-12-01 at 20:11 +0100, Bart Van Assche wrote:
>> Add a time-based transport layer test such that fail-over in a multipath
>> setup can happen quickly.
>
> Why should this be done in the kernel? multipathd already verifies all
> paths to a SCSI device are up and that the device is reachable.

I'm afraid it's impossible to make a transport layer check work
reliably from user space. As an example, srp_reset_host() blocks the
SCSI host before reconnecting. Before starting to attempt to
reconnect, that action does block the SCSI host and hence also all
transport layer checks issued from user space. I doubt it's possible
to fix the resulting race between a transport layer reconnect issued
from srp_reset_host() and a transport layer reconnect triggered from
user space.

Bart.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 03/14] ib_srp: Disallow duplicate logins
       [not found]                 ` <1324244446.17849.39.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
@ 2011-12-19 10:31                   ` Bart Van Assche
  0 siblings, 0 replies; 65+ messages in thread
From: Bart Van Assche @ 2011-12-19 10:31 UTC (permalink / raw)
  To: David Dillow; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier

On Sun, Dec 18, 2011 at 9:40 PM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
> A SRP target port (for the purposes of an I_T nexus) is defined by the
> IOC GUID and extension; if you want to have redundant connections to a
> port, multi-channel logins must be supported. The way most targets avoid
> this issue is to present a different IOC on each IB port, though you
> should be also able to export multiple ServiceEntries in the
> IOControllerProfile response to allow different service extensions to be
> selected.

I'm considering to leave out this patch since deleting an SRP target
port already provides a way to avoid that the initiator creates
duplicate LUNs.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 12/14] ib_srp: Rework error handling
  2011-12-19  3:36     ` David Dillow
@ 2011-12-19 10:38       ` Bart Van Assche
  2011-12-19 22:51         ` David Dillow
       [not found]       ` <1324265791.17849.92.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
  1 sibling, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-19 10:38 UTC (permalink / raw)
  To: David Dillow
  Cc: linux-rdma, Fujita Tomonori, Brian King, Roland Dreier, linux-scsi

On Mon, Dec 19, 2011 at 3:36 AM, David Dillow <dillowda@ornl.gov> wrote:
> On Thu, 2011-12-01 at 20:10 +0100, Bart Van Assche wrote:
>> Rework ib_srp transport layer error handling. Instead of letting SCSI
>> commands time out if a transport layer error occurs,
>
> This is good, but should probably be part of the initial disconnect. We
> want to run the receive queue dry, processing responses to avoid
> unnecessarily resetting a command that was successful.

Blocking the SCSI host doesn't prevent ib_srp from continuing to
process IB completion notifications.

>> block the SCSI
>> host and try to reconnect until the reconnect timeout elapses or until
>> the maximum number of reconnect attempts has been exceeded, whichever
>> happens first.
>
> When we're blocked for a disconnected target, we may want to call the
> state something other than SRP_TARGET_BLOCKED. I'd like to eventually
> handle the case where the target leaves the fabric temporarily -- we
> don't necessarily disconnect in that case, but we need to block commands
> until it comes back or we give up on it.

The case where the target leaves the fabric temporarily should already
be covered by this patch series.

>> +static void srp_remove_target(struct srp_target_port *target)
>>  {
>
>>       srp_del_scsi_host_attr(target->scsi_host);
>> +     cancel_work_sync(&target->block_work);
>> +     mutex_lock(&target->mutex);
>> +     mutex_unlock(&target->mutex);
>
> You lock and unlock here without doing anything in the critical section.

That's on purpose.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 11/14] ib_srp: Document sysfs attributes
       [not found]             ` <CAO+b5-rb=Gtch0UCZPwTSCHhOdsUBecSpgYjaLvj5pbo9_1AeQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-19 21:27               ` David Dillow
  0 siblings, 0 replies; 65+ messages in thread
From: David Dillow @ 2011-12-19 21:27 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier

On Mon, 2011-12-19 at 03:46 -0500, Bart Van Assche wrote:
> On Mon, Dec 19, 2011 at 12:33 AM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
> > On Thu, 2011-12-01 at 20:09 +0100, Bart Van Assche wrote:
> >> +             * cmd_sg_entries, the maximum number of memory descriptors
> >> +               that fit in a single SRP command when using the direct data
> >> +               buffer descriptor format.
> >
> > s/direct/indirect/
> 
> For the current ib_srp implementation cmd_sg_entries is the maximum
> number of direct data buffer descriptors allocated.

The direct memory descriptor format is only used when there is a single
memory region for the command; all other cases are described by an
indirect table; cmd_sg_entries limits the number of memory descriptors
from that table that will be cached in the command packet.

-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 13/14] ib_srp: Implement transport layer ping
  2011-12-19 10:16       ` Bart Van Assche
@ 2011-12-19 22:32         ` David Dillow
       [not found]           ` <1324333931.7043.52.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: David Dillow @ 2011-12-19 22:32 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-rdma, linux-scsi, Roland Dreier, Fujita Tomonori, Brian King

On Mon, 2011-12-19 at 05:16 -0500, Bart Van Assche wrote:
> On Mon, Dec 19, 2011 at 12:50 AM, David Dillow <dillowda@ornl.gov> wrote:
> > On Thu, 2011-12-01 at 20:11 +0100, Bart Van Assche wrote:
> >> Add a time-based transport layer test such that fail-over in a multipath
> >> setup can happen quickly.
> >
> > Why should this be done in the kernel? multipathd already verifies all
> > paths to a SCSI device are up and that the device is reachable.
> 
> I'm afraid it's impossible to make a transport layer check work
> reliably from user space. As an example, srp_reset_host() blocks the
> SCSI host before reconnecting. Before starting to attempt to
> reconnect, that action does block the SCSI host and hence also all
> transport layer checks issued from user space. I doubt it's possible
> to fix the resulting race between a transport layer reconnect issued
> from srp_reset_host() and a transport layer reconnect triggered from
> user space.

Part of the problem is introduced by allowing for permanent connections
rather than using the familiar dev_loss_tmo and fast_io_fail_tmo
parameters from other SCSI transports. For instance, in the FC
transport, rports are allowed to disappear for up to dev_loss_tmo
seconds before being removed from the SCSI device tree. Until they have
been gone for fast_fail_io_tmo seconds, they are blocked (as is error
handling to prevent offlining devices). Once they have been gone longer
that fast_fail_io_tmo, they become unblocked and new IO will be failed.

Now, the FC transport is probably a bit more complex than we want right
now, but following it (and the SAS transport's) lead should keep us in
sync with where the rest of the SCSI stack is headed.

As for reliability from user space, multipathd checks that the SCSI
initiator is not blocked before checking the liveness of the path;
blocked paths are assumed to be down. It still seems to default to a 30
second timeout for the test TUR/directIO/etc check, and that doesn't
currently look to be configurable (fixable). These timeouts are
independent of the SCSI layer, and will mark a path down for new traffic
without waiting for the lower level timeout.

The transport layer reconnect from user space (I'm assuming you were
thinking ioctl or using the sg device) would coordinated by the
SCSI-midlayer, using calls to srp_reset_host(), so I think we avoid any
race condition. Or did you mean a manual reconnect attempt from
manipulating srp_host/X/reconnect_tmo via sysfs -- in which case it is
in our code, so we can certainly avoid race conditions.

I still think this is already solved in user space, but the new
reconnect model you've implemented doesn't match up with the expected
semantics. It'd be better to match the rest of the SCSI stack for this.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 12/14] ib_srp: Rework error handling
  2011-12-19 10:38       ` Bart Van Assche
@ 2011-12-19 22:51         ` David Dillow
       [not found]           ` <1324335083.7043.66.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: David Dillow @ 2011-12-19 22:51 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-rdma, Fujita Tomonori, Brian King, Roland Dreier, linux-scsi

On Mon, 2011-12-19 at 05:38 -0500, Bart Van Assche wrote:
> On Mon, Dec 19, 2011 at 3:36 AM, David Dillow <dillowda@ornl.gov> wrote:
> > On Thu, 2011-12-01 at 20:10 +0100, Bart Van Assche wrote:
> >> Rework ib_srp transport layer error handling. Instead of letting SCSI
> >> commands time out if a transport layer error occurs,
> >
> > This is good, but should probably be part of the initial disconnect. We
> > want to run the receive queue dry, processing responses to avoid
> > unnecessarily resetting a command that was successful.
> 
> Blocking the SCSI host doesn't prevent ib_srp from continuing to
> process IB completion notifications.

In your series, it doesn't -- and I was wrong about the message spam,
you check for the proper state and the work being queued.

I haven't parsed it all out from your changes just yet, but I think part
of the reason you may have had problems with req->scmd being null in
srp_handle_recv() is due to a new race between the tear down of the
connection and continuing to process completion notifications.

> >> block the SCSI
> >> host and try to reconnect until the reconnect timeout elapses or until
> >> the maximum number of reconnect attempts has been exceeded, whichever
> >> happens first.
> >
> > When we're blocked for a disconnected target, we may want to call the
> > state something other than SRP_TARGET_BLOCKED. I'd like to eventually
> > handle the case where the target leaves the fabric temporarily -- we
> > don't necessarily disconnect in that case, but we need to block commands
> > until it comes back or we give up on it.
> 
> The case where the target leaves the fabric temporarily should already
> be covered by this patch series.

Only once a command is sent to it and timesout or we get a QP error. The
idea here would be to start the dev_loss_tmo timer once it was detected
that it left the fabric. This was handled in OFED via sysfs attributes,
but it seems like we could register for the events ourselves.

This isn't something I expect you to implement, but I'd like to leave
room for it in the future.

> >> +static void srp_remove_target(struct srp_target_port *target)
> >>  {
> >
> >>       srp_del_scsi_host_attr(target->scsi_host);
> >> +     cancel_work_sync(&target->block_work);
> >> +     mutex_lock(&target->mutex);
> >> +     mutex_unlock(&target->mutex);
> >
> > You lock and unlock here without doing anything in the critical section.
> 
> That's on purpose.

I'm assuming you do this to ensure that everyone has seen the
appropriate state and exited the critical section, then? Best to add a
comment to that effect. Actually, is there any reason to unlock it again
since we're removing the target? I don't have the code in front of me
ATM, so I'm assuming that you don't call any other routines that need
the lock after this. sparse may complain, though.

-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 12/14] ib_srp: Rework error handling
       [not found]           ` <1324335083.7043.66.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
@ 2011-12-20  9:01             ` Bart Van Assche
       [not found]               ` <CAO+b5-qF2taG0B4n9SBwqnuh0wajH5fXFLTb-VAaDrfT9TZ6aQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-20  9:01 UTC (permalink / raw)
  To: David Dillow
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fujita Tomonori, Brian King,
	Roland Dreier, linux-scsi-u79uwXL29TY76Z2rM5mHXA

On Mon, Dec 19, 2011 at 10:51 PM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
> I haven't parsed it all out from your changes just yet, but I think part
> of the reason you may have had problems with req->scmd being null in
> srp_handle_recv() is due to a new race between the tear down of the
> connection and continuing to process completion notifications.

The resources used by the QP completion handler are the srp_target
port data structure and the QP data structure. And as you can see in
srp_remove_target(), the scsi_host_put() call is invoked after
srp_disconnect_target(). That last function waits for the DREP
notification from the IB CM. That should guarantee that no IB
completions arrive during or after the scsi_host_put() call. Or did I
miss something ?

Note: I haven't observed the req->scmd == NULL condition yet.

> > > You lock and unlock here without doing anything in the critical section.
> >
> > That's on purpose.
>
> I'm assuming you do this to ensure that everyone has seen the
> appropriate state and exited the critical section, then? Best to add a
> comment to that effect.

OK, will do.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 13/14] ib_srp: Implement transport layer ping
       [not found]           ` <1324333931.7043.52.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
@ 2011-12-20 10:13             ` Bart Van Assche
       [not found]               ` <CAO+b5-qLxmcXCCxA8+bPYsinjr1eqCDO2JUJbjgVr59N55CU1Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2011-12-20 10:27             ` Bart Van Assche
  1 sibling, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-20 10:13 UTC (permalink / raw)
  To: David Dillow
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Roland Dreier,
	Fujita Tomonori, Brian King

On Mon, Dec 19, 2011 at 10:32 PM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
> I still think this is already solved in user space, but the new
> reconnect model you've implemented doesn't match up with the expected
> semantics. It'd be better to match the rest of the SCSI stack for this.

Sorry, but I'm afraid there is no "rest of the SCSI stack
[semantics]". I've been following the semantics implemented in
open-iscsi. If the semantics implemented for FC differ of those
proposed in this patch set, then that means that the semantics
implemented for FC differ of those for open-iscsi.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 10/14] srp_transport: Simplify attribute initialization code
       [not found]         ` <1324253243.17849.45.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
@ 2011-12-20 10:21           ` Bart Van Assche
  2011-12-21  3:23             ` David Dillow
  0 siblings, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-20 10:21 UTC (permalink / raw)
  To: David Dillow
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fujita Tomonori, Brian King,
	Roland Dreier, linux-scsi-u79uwXL29TY76Z2rM5mHXA

On Mon, Dec 19, 2011 at 12:07 AM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
> On Thu, 2011-12-01 at 20:08 +0100, Bart Van Assche wrote:
>> Eliminate the private_rport_attrs[] array and the SETUP_*() macros
>> used to set up that array since the information in that array
>> duplicates the information in the static device attributes. Also,
>> trigger a kernel warning if SRP_RPORT_ATTRS is not large enough
>> since it is easy to forget to update that macro when adding new
>> attributes.
>
> Not sure I care one way or the other, but it looks to have been written
> the way it was to share a common code structure with the FC, SAS, and
> other SCSI transport code.

My opinion about the SETUP_*() macros is that - at least in the SRP
transport code - they make the code harder to read without having any
real value and hence should be removed.

> They use a BUG_ON instead of WARN_ON, which
> seems more appropriate, since we just overran a buffer...

Sorry, but I disagree. The guideline for Linux kernel code is to use
BUG_ON() only if a crash can't be avoided. With SLUB red zoning
enabled it is still possible to unload kernel modules after a (mild)
buffer overrun. And if you have a look at mm/slub.c, you will see that
slab_err() is invoked in case padding has been overwritten. That last
function does something similar to WARN_ON().

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 13/14] ib_srp: Implement transport layer ping
       [not found]           ` <1324333931.7043.52.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
  2011-12-20 10:13             ` Bart Van Assche
@ 2011-12-20 10:27             ` Bart Van Assche
  2011-12-21  3:05               ` David Dillow
  1 sibling, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-20 10:27 UTC (permalink / raw)
  To: David Dillow
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Roland Dreier,
	Fujita Tomonori, Brian King

On Mon, Dec 19, 2011 at 10:32 PM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
> Part of the problem is introduced by allowing for permanent connections
> rather than using the familiar dev_loss_tmo and fast_io_fail_tmo
> parameters from other SCSI transports. For instance, in the FC
> transport, rports are allowed to disappear for up to dev_loss_tmo
> seconds before being removed from the SCSI device tree. Until they have
> been gone for fast_fail_io_tmo seconds, they are blocked (as is error
> handling to prevent offlining devices). Once they have been gone longer
> that fast_fail_io_tmo, they become unblocked and new IO will be failed.

I'm not convinced an equivalent of fast_fail_io_tmo is necessary for
the SRP transport. If a target disappears briefly from the IB fabric
what will happen with the posted patch series is that the initiator is
blocked during one ping interval and also that a reconnect is
triggered. Also, some SCSI commands may be reissued after
reconnecting. But that shouldn't have any adverse consequences, isn't
it ?

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 13/14] ib_srp: Implement transport layer ping
       [not found]               ` <CAO+b5-qLxmcXCCxA8+bPYsinjr1eqCDO2JUJbjgVr59N55CU1Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-21  2:32                 ` David Dillow
  0 siblings, 0 replies; 65+ messages in thread
From: David Dillow @ 2011-12-21  2:32 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Roland Dreier,
	Fujita Tomonori, Brian King

On Tue, 2011-12-20 at 10:13 +0000, Bart Van Assche wrote:
> On Mon, Dec 19, 2011 at 10:32 PM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
> > I still think this is already solved in user space, but the new
> > reconnect model you've implemented doesn't match up with the expected
> > semantics. It'd be better to match the rest of the SCSI stack for this.
> 
> Sorry, but I'm afraid there is no "rest of the SCSI stack
> [semantics]". I've been following the semantics implemented in
> open-iscsi. If the semantics implemented for FC differ of those
> proposed in this patch set, then that means that the semantics
> implemented for FC differ of those for open-iscsi.

FC, SAS, SPI, (current) SRP, and FCoE currently use the same semantic to
check if the transport is still alive -- user space initiates activity,
usually via multipathd. You are changing SRP to match the iSCSI stack,
and because SRP does not have a NOP command, you have to fake one up
using Test Unit Ready to the first LUN to come through slave_alloc().
You kill the entire connection if that LUN has issues and causes a
timeout, though the rest of the LUNs may be fine and able to
communicate.

This is all handled properly by existing user space code; there's no
need for it to be in kernel, especially if you have to fake it while
going against the grain.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 13/14] ib_srp: Implement transport layer ping
  2011-12-20 10:27             ` Bart Van Assche
@ 2011-12-21  3:05               ` David Dillow
       [not found]                 ` <1324436736.7621.38.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: David Dillow @ 2011-12-21  3:05 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-rdma, linux-scsi, Roland Dreier, Fujita Tomonori, Brian King

On Tue, 2011-12-20 at 10:27 +0000, Bart Van Assche wrote:
> On Mon, Dec 19, 2011 at 10:32 PM, David Dillow <dillowda@ornl.gov> wrote:
> > Part of the problem is introduced by allowing for permanent connections
> > rather than using the familiar dev_loss_tmo and fast_io_fail_tmo
> > parameters from other SCSI transports. For instance, in the FC
> > transport, rports are allowed to disappear for up to dev_loss_tmo
> > seconds before being removed from the SCSI device tree. Until they have
> > been gone for fast_fail_io_tmo seconds, they are blocked (as is error
> > handling to prevent offlining devices). Once they have been gone longer
> > that fast_fail_io_tmo, they become unblocked and new IO will be failed.
> 
> I'm not convinced an equivalent of fast_fail_io_tmo is necessary for
> the SRP transport. If a target disappears briefly from the IB fabric
> what will happen with the posted patch series is that the initiator is
> blocked during one ping interval and also that a reconnect is
> triggered. Also, some SCSI commands may be reissued after
> reconnecting. But that shouldn't have any adverse consequences, isn't
> it ?

We don't want to leave a target blocked indefinitely -- commands caught
in the blocked queue won't be reissued until the queue is unblocked --
but we may want to keep the sdX mappings around for a long time.
fast_io_fail_tmo gives us the ability to do that -- the expiration of
fast_io_fail_tmo unblocks the queue and allows commands to be failed due
to the transport error. See commit f2818663.

Also, these settings can be used to help tune multipath failover for
devices with relatively long LUN transfer times (say with RDAC) vs short
ones (ALUA). You still don't want the mappings to go away, but you want
to move data to the other paths in a reasonably quick fashion.
http://lkml.org/lkml/2008/1/7/244 talks about this a bit.

As for reissued commands, most shouldn't be an issue -- once they get
out of the blocked queue. I would expect there to be problems with
certain vendor specific commands, and tape drives will have issues, but
those are common to any multipath solution.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 10/14] srp_transport: Simplify attribute initialization code
  2011-12-20 10:21           ` Bart Van Assche
@ 2011-12-21  3:23             ` David Dillow
  0 siblings, 0 replies; 65+ messages in thread
From: David Dillow @ 2011-12-21  3:23 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-rdma, Fujita Tomonori, Brian King, Roland Dreier, linux-scsi

On Tue, 2011-12-20 at 10:21 +0000, Bart Van Assche wrote:
> On Mon, Dec 19, 2011 at 12:07 AM, David Dillow <dillowda@ornl.gov> wrote:
> > They use a BUG_ON instead of WARN_ON, which
> > seems more appropriate, since we just overran a buffer...
> 
> Sorry, but I disagree. The guideline for Linux kernel code is to use
> BUG_ON() only if a crash can't be avoided. With SLUB red zoning
> enabled it is still possible to unload kernel modules after a (mild)
> buffer overrun. And if you have a look at mm/slub.c, you will see that
> slab_err() is invoked in case padding has been overwritten. That last
> function does something similar to WARN_ON().

You aren't guaranteed to have SLUB as your allocator, much less having
the red-zoning turned on, so you have no guarantee it is safe to
continue.

-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 12/14] ib_srp: Rework error handling
       [not found]               ` <CAO+b5-qF2taG0B4n9SBwqnuh0wajH5fXFLTb-VAaDrfT9TZ6aQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-21  3:33                 ` David Dillow
       [not found]                   ` <1324438387.7621.53.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: David Dillow @ 2011-12-21  3:33 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fujita Tomonori, Brian King,
	Roland Dreier, linux-scsi-u79uwXL29TY76Z2rM5mHXA

On Tue, 2011-12-20 at 09:01 +0000, Bart Van Assche wrote:
> On Mon, Dec 19, 2011 at 10:51 PM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
> > I haven't parsed it all out from your changes just yet, but I think part
> > of the reason you may have had problems with req->scmd being null in
> > srp_handle_recv() is due to a new race between the tear down of the
> > connection and continuing to process completion notifications.
> 
> The resources used by the QP completion handler are the srp_target
> port data structure and the QP data structure. And as you can see in
> srp_remove_target(), the scsi_host_put() call is invoked after
> srp_disconnect_target(). That last function waits for the DREP
> notification from the IB CM. That should guarantee that no IB
> completions arrive during or after the scsi_host_put() call. Or did I
> miss something ?

What keeps the srp_recv_completion() --> srp_handle_recv() -->
srp_process_rsp() --> etc. call chain from racing with
srp_reconnect_target()?

It looks like the srp_reset_req() in srp_reconnect_tartget() could race
with srp_process_rsp() if receive completions continue to be processed
after a send failure, but perhaps it is I who is missing something.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 12/14] ib_srp: Rework error handling
       [not found]                   ` <1324438387.7621.53.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
@ 2011-12-21 13:26                     ` Bart Van Assche
  0 siblings, 0 replies; 65+ messages in thread
From: Bart Van Assche @ 2011-12-21 13:26 UTC (permalink / raw)
  To: David Dillow
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fujita Tomonori, Brian King,
	Roland Dreier, linux-scsi-u79uwXL29TY76Z2rM5mHXA

On Wed, Dec 21, 2011 at 3:33 AM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
> What keeps the srp_recv_completion() --> srp_handle_recv() -->
> srp_process_rsp() --> etc. call chain from racing with
> srp_reconnect_target()?
>
> It looks like the srp_reset_req() in srp_reconnect_target() could race
> with srp_process_rsp() if receive completions continue to be processed
> after a send failure, but perhaps it is I who is missing something.

In the tests I ran so far no issues came up most likely because the
decision to reconnect is taken a long time after the most recent
response has been received. I'll see what has to change to make this
safe even if there wouldn't be any such delay.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 13/14] ib_srp: Implement transport layer ping
       [not found]                 ` <1324436736.7621.38.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
@ 2011-12-21 14:07                   ` Bart Van Assche
  2011-12-23 22:34                     ` David Dillow
  0 siblings, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-21 14:07 UTC (permalink / raw)
  To: David Dillow
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Roland Dreier,
	Fujita Tomonori, Brian King

On Wed, Dec 21, 2011 at 3:05 AM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
> We don't want to leave a target blocked indefinitely -- commands caught
> in the blocked queue won't be reissued until the queue is unblocked --
> but we may want to keep the sdX mappings around for a long time.
> fast_io_fail_tmo gives us the ability to do that -- the expiration of
> fast_io_fail_tmo unblocks the queue and allows commands to be failed due
> to the transport error. See commit f2818663.

Our opinions may differ here. My opinion is that for some use cases it
is crucial to be able to block a target indefinitely, e.g. when there
is only a single path between initiator and target and when the user
prefers that I/O blocks instead of encountering I/O errors. See e.g.
the "iSCSI settings for iSCSI root" section in the iSCSI README
(http://www.open-iscsi.org/docs/README).

Regarding commit f2818663: as far as I can see what that commit does
is to make it impossible for a user to set fast_io_fail_tmo == -1
together with dev_loss_tmo > SCSI_DEVICE_BLOCK_MAX_TIMEOUT. That
change is specific to the FC code - only the FC transport code limits
the blocked state time to the SCSI_DEVICE_BLOCK_MAX_TIMEOUT limit.

> Also, these settings can be used to help tune multipath failover for
> devices with relatively long LUN transfer times (say with RDAC) vs short
> ones (ALUA). You still don't want the mappings to go away, but you want
> to move data to the other paths in a reasonably quick fashion.
> http://lkml.org/lkml/2008/1/7/244 talks about this a bit.

In a multipath setup the ping_interval, ping_timeout, recovery_tmo and
max_reconnects parameters (all added via this patch series) should be
sufficient to allow control over how quickly multipath failover
happens.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 13/14] ib_srp: Implement transport layer ping
  2011-12-21 14:07                   ` Bart Van Assche
@ 2011-12-23 22:34                     ` David Dillow
       [not found]                       ` <1324679698.3004.12.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: David Dillow @ 2011-12-23 22:34 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-rdma, linux-scsi, Roland Dreier, Fujita Tomonori, Brian King

On Wed, 2011-12-21 at 14:07 +0000, Bart Van Assche wrote:
> On Wed, Dec 21, 2011 at 3:05 AM, David Dillow <dillowda@ornl.gov> wrote:
> > We don't want to leave a target blocked indefinitely -- commands caught
> > in the blocked queue won't be reissued until the queue is unblocked --
> > but we may want to keep the sdX mappings around for a long time.
> > fast_io_fail_tmo gives us the ability to do that -- the expiration of
> > fast_io_fail_tmo unblocks the queue and allows commands to be failed due
> > to the transport error. See commit f2818663.
> 
> Our opinions may differ here. My opinion is that for some use cases it
> is crucial to be able to block a target indefinitely, e.g. when there
> is only a single path between initiator and target and when the user
> prefers that I/O blocks instead of encountering I/O errors. See e.g.
> the "iSCSI settings for iSCSI root" section in the iSCSI README
> (http://www.open-iscsi.org/docs/README).

I can see admins desiring that option, but it's also handled by putting
your root on a dm-multipath volume. Either way, I'm open to letting
things block indefinitely, but I do think we should look into why the
SCSI stack has a concept of the maximum blocking time.

> Regarding commit f2818663: as far as I can see what that commit does
> is to make it impossible for a user to set fast_io_fail_tmo == -1

I was pointing to the commit message as evidence of more prominent SCSI
developer's line of thought on handling missing devices. As you note,
the actual diff is uninteresting.

We should match the rest of the SCSI stack unless there is good reason
to go our own way. The iSCSI transport can do what it does because it
has a native ping, but I'm not convinced they should have introduced new
semantics and/or names when taking advantage of that. As far as I can
tell without deeply studying it, replacement_timeout is equivalent to
fast_io_fail_tmo.

-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 13/14] ib_srp: Implement transport layer ping
       [not found]                       ` <1324679698.3004.12.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
@ 2011-12-23 22:56                         ` Mike Christie
  2011-12-24 20:07                           ` David Dillow
  2011-12-26 20:01                           ` Bart Van Assche
  0 siblings, 2 replies; 65+ messages in thread
From: Mike Christie @ 2011-12-23 22:56 UTC (permalink / raw)
  To: David Dillow
  Cc: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Roland Dreier,
	Fujita Tomonori, Brian King

On 12/23/2011 04:34 PM, David Dillow wrote:
> On Wed, 2011-12-21 at 14:07 +0000, Bart Van Assche wrote:
>> On Wed, Dec 21, 2011 at 3:05 AM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
>>> We don't want to leave a target blocked indefinitely -- commands caught
>>> in the blocked queue won't be reissued until the queue is unblocked --
>>> but we may want to keep the sdX mappings around for a long time.
>>> fast_io_fail_tmo gives us the ability to do that -- the expiration of
>>> fast_io_fail_tmo unblocks the queue and allows commands to be failed due
>>> to the transport error. See commit f2818663.
>>
>> Our opinions may differ here. My opinion is that for some use cases it
>> is crucial to be able to block a target indefinitely, e.g. when there
>> is only a single path between initiator and target and when the user
>> prefers that I/O blocks instead of encountering I/O errors. See e.g.
>> the "iSCSI settings for iSCSI root" section in the iSCSI README
>> (http://www.open-iscsi.org/docs/README).
> 
> I can see admins desiring that option, but it's also handled by putting
> your root on a dm-multipath volume. Either way, I'm open to letting
> things block indefinitely, but I do think we should look into why the
> SCSI stack has a concept of the maximum blocking time.
> 
>> Regarding commit f2818663: as far as I can see what that commit does
>> is to make it impossible for a user to set fast_io_fail_tmo == -1
> 
> I was pointing to the commit message as evidence of more prominent SCSI
> developer's line of thought on handling missing devices. As you note,
> the actual diff is uninteresting.
> 
> We should match the rest of the SCSI stack unless there is good reason
> to go our own way. The iSCSI transport can do what it does because it
> has a native ping, but I'm not convinced they should have introduced new
> semantics and/or names when taking advantage of that. As far as I can
> tell without deeply studying it, replacement_timeout is equivalent to
> fast_io_fail_tmo.
> 

iSCSI replacement_timeout is the same as fast_io_fail_tmo for FC. iSCSI
replacement_timeout actually came first so you should say FC should have
copied our name :)

So FC has fast_io_fail_tmo which controls when to fail IO when a port is
blocked. This can be set to so that it does not ever fire. However, FC
also has the dev_loss_tmo which controls when to remove devices (and
also fail IO) when a port is blocked. This one cannot be turned off, so
it will eventually fire and you will get IO errors (this is due to the
devices being removed and the scsi layer then failing all IO).

iSCSI had replacement_timeout first, and it only fails IO, because at
the time things did not handle hotplug removal well, and I did not know
there was a need for the removal. There is patch to add dev_loss_tmo to
the iSCSI layer. There have been some bugs though, so I have not pushed
it. But it will also work like FC where you cannot turn it off.

SAS has something like the dev_loss_tmo. It does not have something like
the fast_io_fail/replacement_timeout.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 13/14] ib_srp: Implement transport layer ping
  2011-12-23 22:56                         ` Mike Christie
@ 2011-12-24 20:07                           ` David Dillow
  2011-12-26 19:39                             ` Bart Van Assche
  2011-12-26 20:01                           ` Bart Van Assche
  1 sibling, 1 reply; 65+ messages in thread
From: David Dillow @ 2011-12-24 20:07 UTC (permalink / raw)
  To: Mike Christie
  Cc: Bart Van Assche, linux-rdma, linux-scsi, Roland Dreier,
	Fujita Tomonori, Brian King

On Fri, 2011-12-23 at 16:56 -0600, Mike Christie wrote:
> On 12/23/2011 04:34 PM, David Dillow wrote:
> > We should match the rest of the SCSI stack unless there is good reason
> > to go our own way. The iSCSI transport can do what it does because it
> > has a native ping, but I'm not convinced they should have introduced new
> > semantics and/or names when taking advantage of that. As far as I can
> > tell without deeply studying it, replacement_timeout is equivalent to
> > fast_io_fail_tmo.
> > 
> 
> iSCSI replacement_timeout is the same as fast_io_fail_tmo for FC. iSCSI
> replacement_timeout actually came first so you should say FC should have
> copied our name :)

Indeed, some git-foo seems to indicate both stacks had that
functionality mainlined in 2.6.18, with iSCSI getting it first. FC
should have used your name.

> iSCSI had replacement_timeout first, and it only fails IO, because at
> the time things did not handle hotplug removal well, and I did not know
> there was a need for the removal. There is patch to add dev_loss_tmo to
> the iSCSI layer. There have been some bugs though, so I have not pushed
> it. But it will also work like FC where you cannot turn it off.
> 
> SAS has something like the dev_loss_tmo. It does not have something like
> the fast_io_fail/replacement_timeout.

This says to me that SRP should use the dev_loss_tmo semantics, though
the naming of fast_io_fail vs replacement_timeout is a bit more of a
question than I thought. I tend to think of SRP more in terms of FC than
iSCSI, so I still prefer the former, but perhaps not as strongly now.

I still believe consistency is good -- what do the SCSI devs think here?
Should we work on getting a consistent name for this property -- and if
so, which one -- or does the name not matter?
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 12/14] ib_srp: Rework error handling
       [not found]       ` <1324265791.17849.92.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
@ 2011-12-26 19:13         ` Bart Van Assche
  0 siblings, 0 replies; 65+ messages in thread
From: Bart Van Assche @ 2011-12-26 19:13 UTC (permalink / raw)
  To: David Dillow
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Fujita Tomonori, Brian King,
	Roland Dreier, linux-scsi-u79uwXL29TY76Z2rM5mHXA

On Mon, Dec 19, 2011 at 3:36 AM, David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org> wrote:
> On Thu, 2011-12-01 at 20:10 +0100, Bart Van Assche wrote:
> >  /*
> >   * Copyright (c) 2005 Cisco Systems.  All rights reserved.
> > + * Copyright (c) 2010-2011 Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>.
>
> You've tried to add this in the past; I don't mean to deny you credit,
> and I certainly value your work, but I'm not sure where the threshold is
> for adding a copyright line.

The excerpts below might provide some guidance (source: "The Copyright
Handbook: What Every Writer Needs to Know"):
[ ... ]
There are strict technical requirements as to what a copyright notice
must contain. Follow these rules exactly or your notice may be found
to be invalid and not accomplish its intended purpose. A valid
copyright notice contains three elements:
* the copyright symbol
* the year in which the work was published
* the name of the copyright owner.
It is not required that these elements appear in any particular order
in the notice, but most notices are written in the order set forth
above. We'll discuss each element in turn.
[ ... ]
If there are multiple authors, they should all be listed in the
copyright notice. The authors' names can appear in any order.
[ ... ]

If I interpret the above correctly, at least two names are missing in
the copyright header of the source file ib_srp.c.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 13/14] ib_srp: Implement transport layer ping
  2011-12-24 20:07                           ` David Dillow
@ 2011-12-26 19:39                             ` Bart Van Assche
  2011-12-28 23:53                               ` David Dillow
  0 siblings, 1 reply; 65+ messages in thread
From: Bart Van Assche @ 2011-12-26 19:39 UTC (permalink / raw)
  To: David Dillow
  Cc: Mike Christie, linux-rdma, linux-scsi, Roland Dreier,
	Fujita Tomonori, Brian King

On Sat, Dec 24, 2011 at 8:07 PM, David Dillow <dillowda@ornl.gov> wrote:
> This says to me that SRP should use the dev_loss_tmo semantics, though
> the naming of fast_io_fail vs replacement_timeout is a bit more of a
> question than I thought. I tend to think of SRP more in terms of FC than
> iSCSI, so I still prefer the former, but perhaps not as strongly now.

Do we need dev_loss_tmo functionality ? Since multipathd switches over
if the active path is in the blocked state, the posted patch set
already provides a way to make multipathd switch over if communication
is lost.

Bart.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 13/14] ib_srp: Implement transport layer ping
  2011-12-23 22:56                         ` Mike Christie
  2011-12-24 20:07                           ` David Dillow
@ 2011-12-26 20:01                           ` Bart Van Assche
  1 sibling, 0 replies; 65+ messages in thread
From: Bart Van Assche @ 2011-12-26 20:01 UTC (permalink / raw)
  To: Mike Christie
  Cc: David Dillow, linux-rdma, linux-scsi, Roland Dreier,
	Fujita Tomonori, Brian King

On Fri, Dec 23, 2011 at 10:56 PM, Mike Christie <michaelc@cs.wisc.edu> wrote:
> iSCSI replacement_timeout is the same as fast_io_fail_tmo for FC. iSCSI
> replacement_timeout actually came first so you should say FC should have
> copied our name :)

Agreed. But as far as I can see multipathd only supports the FC name
(fast_io_fail_tmo) and not the iSCSI name (recovery_tmo). That looks
like an argument to me to switch to the FC name for ib_srp.

Bart.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 13/14] ib_srp: Implement transport layer ping
  2011-12-26 19:39                             ` Bart Van Assche
@ 2011-12-28 23:53                               ` David Dillow
  0 siblings, 0 replies; 65+ messages in thread
From: David Dillow @ 2011-12-28 23:53 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Mike Christie, linux-rdma, linux-scsi, Roland Dreier,
	Fujita Tomonori, Brian King

On Mon, 2011-12-26 at 19:39 +0000, Bart Van Assche wrote:
> On Sat, Dec 24, 2011 at 8:07 PM, David Dillow <dillowda@ornl.gov> wrote:
> > This says to me that SRP should use the dev_loss_tmo semantics, though
> > the naming of fast_io_fail vs replacement_timeout is a bit more of a
> > question than I thought. I tend to think of SRP more in terms of FC than
> > iSCSI, so I still prefer the former, but perhaps not as strongly now.
> 
> Do we need dev_loss_tmo functionality ? Since multipathd switches over
> if the active path is in the blocked state, the posted patch set
> already provides a way to make multipathd switch over if communication
> is lost.

I think so. multipathd isn't the only use case, though I think we need
to reorganize the SRP device model to fully take advantage of
dev_loss_tmo -- but that can come later. I'd like to see dev_loss_tmo
take the place of max_reconnects to give admins an easy way to set a
maximum time.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


^ permalink raw reply	[flat|nested] 65+ messages in thread

end of thread, other threads:[~2011-12-28 23:53 UTC | newest]

Thread overview: 65+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-01 18:54 [PATCH 00/14] Make ib_srp better suited for H.A. purposes Bart Van Assche
2011-12-01 19:05 ` [PATCH 08/14] srp_transport: Document sysfs attributes Bart Van Assche
2011-12-01 19:06 ` [PATCH 09/14] srp_transport: Fix attribute registration Bart Van Assche
     [not found]   ` <201112012006.50445.bvanassche-HInyCGIudOg@public.gmane.org>
2011-12-15 20:09     ` David Dillow
2011-12-01 19:10 ` [PATCH 12/14] ib_srp: Rework error handling Bart Van Assche
     [not found]   ` <201112012010.37276.bvanassche-HInyCGIudOg@public.gmane.org>
2011-12-15 20:20     ` David Dillow
2011-12-19  3:36     ` David Dillow
2011-12-19 10:38       ` Bart Van Assche
2011-12-19 22:51         ` David Dillow
     [not found]           ` <1324335083.7043.66.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
2011-12-20  9:01             ` Bart Van Assche
     [not found]               ` <CAO+b5-qF2taG0B4n9SBwqnuh0wajH5fXFLTb-VAaDrfT9TZ6aQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-21  3:33                 ` David Dillow
     [not found]                   ` <1324438387.7621.53.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2011-12-21 13:26                     ` Bart Van Assche
     [not found]       ` <1324265791.17849.92.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2011-12-26 19:13         ` Bart Van Assche
     [not found] ` <201112011954.25811.bvanassche-HInyCGIudOg@public.gmane.org>
2011-12-01 18:55   ` [PATCH 01/14] ib_srp: Introduce pr_fmt() Bart Van Assche
     [not found]     ` <201112011955.47198.bvanassche-HInyCGIudOg@public.gmane.org>
2011-12-01 23:40       ` David Dillow
2011-12-01 18:57   ` [PATCH 02/14] ib_srp: Consolidate repetitive sysfs code Bart Van Assche
     [not found]     ` <201112011957.19892.bvanassche-HInyCGIudOg@public.gmane.org>
2011-12-15 18:26       ` David Dillow
2011-12-01 18:58   ` [PATCH 03/14] ib_srp: Disallow duplicate logins Bart Van Assche
     [not found]     ` <201112011958.17339.bvanassche-HInyCGIudOg@public.gmane.org>
2011-12-15 19:08       ` David Dillow
     [not found]         ` <1323976101.16703.42.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
2011-12-18 19:20           ` Bart Van Assche
     [not found]             ` <CAO+b5-r6dHt-vmbNjeD4zvcAaRVqhiEm5eZF7hgF6ei35kqjdQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-18 21:40               ` David Dillow
     [not found]                 ` <1324244446.17849.39.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2011-12-19 10:31                   ` Bart Van Assche
2011-12-01 18:59   ` [PATCH 04/14] ib_srp: Set block layer timeout Bart Van Assche
     [not found]     ` <201112011959.11956.bvanassche-HInyCGIudOg@public.gmane.org>
2011-12-15 19:37       ` David Dillow
     [not found]         ` <1323977841.16703.57.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
2011-12-17 17:50           ` Bart Van Assche
     [not found]             ` <CAO+b5-p9FQVvdVZtUvRJgMW6qhcnUKrp4RhCch1Hnt8JsjS5qQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-17 22:39               ` David Dillow
2011-12-17 22:03       ` Or Gerlitz
     [not found]         ` <CAJZOPZKQ4rg6D=ZDt2q+aJbsNAQbgqgh19FmGC5Vi6_EQ4ROFQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-17 22:39           ` David Dillow
2011-12-18 11:53           ` Bart Van Assche
2011-12-01 19:00   ` [PATCH 05/14] ib_srp: Avoid that late SRP replies cause trouble Bart Van Assche
     [not found]     ` <201112012000.04427.bvanassche-HInyCGIudOg@public.gmane.org>
2011-12-15 20:03       ` David Dillow
2011-12-01 19:02   ` [PATCH 06/14] ib_srp: Micro-optimize completion handlers Bart Van Assche
     [not found]     ` <201112012002.17829.bvanassche-HInyCGIudOg@public.gmane.org>
2011-12-01 21:35       ` chas williams - CONTRACTOR
     [not found]         ` <20111201163539.572ca669-KCdNrDJlFBBhNwqIksvPR6qiWZVw4kCD+aIohriVLy8@public.gmane.org>
2011-12-01 23:32           ` David Dillow
     [not found]             ` <1322782369.11664.5.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
2011-12-12 11:41               ` Bart Van Assche
     [not found]                 ` <CAO+b5-pFBEQybN+01heAzr=_dNCb7Sr7ri_o_hF1MnOeX4_idQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-12 23:12                   ` David Dillow
2011-12-01 19:02   ` [PATCH 07/14] ib_srp: Introduce srp_handle_qp_err() Bart Van Assche
     [not found]     ` <201112012002.56307.bvanassche-HInyCGIudOg@public.gmane.org>
2011-12-18 23:55       ` David Dillow
2011-12-01 19:08   ` [PATCH 10/14] srp_transport: Simplify attribute initialization code Bart Van Assche
     [not found]     ` <201112012008.00502.bvanassche-HInyCGIudOg@public.gmane.org>
2011-12-19  0:07       ` David Dillow
     [not found]         ` <1324253243.17849.45.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2011-12-20 10:21           ` Bart Van Assche
2011-12-21  3:23             ` David Dillow
2011-12-01 19:09   ` [PATCH 11/14] ib_srp: Document sysfs attributes Bart Van Assche
     [not found]     ` <201112012009.12815.bvanassche-HInyCGIudOg@public.gmane.org>
2011-12-19  0:33       ` David Dillow
     [not found]         ` <1324254811.17849.65.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2011-12-19  8:46           ` Bart Van Assche
     [not found]             ` <CAO+b5-rb=Gtch0UCZPwTSCHhOdsUBecSpgYjaLvj5pbo9_1AeQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-19 21:27               ` David Dillow
2011-12-01 19:11   ` [PATCH 13/14] ib_srp: Implement transport layer ping Bart Van Assche
2011-12-19  0:50     ` David Dillow
2011-12-19 10:16       ` Bart Van Assche
2011-12-19 22:32         ` David Dillow
     [not found]           ` <1324333931.7043.52.camel-FqX9LgGZnHWDB2HL1qBt2PIbXMQ5te18@public.gmane.org>
2011-12-20 10:13             ` Bart Van Assche
     [not found]               ` <CAO+b5-qLxmcXCCxA8+bPYsinjr1eqCDO2JUJbjgVr59N55CU1Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-21  2:32                 ` David Dillow
2011-12-20 10:27             ` Bart Van Assche
2011-12-21  3:05               ` David Dillow
     [not found]                 ` <1324436736.7621.38.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2011-12-21 14:07                   ` Bart Van Assche
2011-12-23 22:34                     ` David Dillow
     [not found]                       ` <1324679698.3004.12.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2011-12-23 22:56                         ` Mike Christie
2011-12-24 20:07                           ` David Dillow
2011-12-26 19:39                             ` Bart Van Assche
2011-12-28 23:53                               ` David Dillow
2011-12-26 20:01                           ` Bart Van Assche
2011-12-01 19:13 ` [PATCH 14/14] ib_srp: Allow SRP disconnect through sysfs Bart Van Assche
2011-12-19  4:03   ` David Dillow
     [not found]     ` <1324267414.17849.98.camel-1q1vX8mYZiGLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2011-12-19  9:04       ` Bart Van Assche
2011-12-01 23:26 ` [PATCH 00/14] Make ib_srp better suited for H.A. purposes David Dillow

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.