All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Unblock SCSI devices even if the LLD is being unloaded
@ 2017-03-16 20:56 Bart Van Assche
  2017-03-16 20:56   ` Bart Van Assche
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Bart Van Assche @ 2017-03-16 20:56 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: linux-scsi, Hannes Reinecke, Israel Rukshin, Max Gurtovoy,
	Bart Van Assche

scsi_target_unblock() must unblock SCSI devices even if this function
is called after unloading of the LLD that created these devices has
started. This is necessary to prevent that __scsi_remove_device() hangs
on the SYNCHRONIZE CACHE command issued by the sd driver during shutdown.

Bart Van Assche (3):
  __scsi_iterate_devices(): Make the get and put functions arguments
  Introduce starget_for_all_devices() and shost_for_all_devices()
  Ensure that scsi_target_unblock() examines all devices

 drivers/scsi/scsi.c        | 60 +++++++++++++++++++++++++++++++++++++++++-----
 drivers/scsi/scsi_lib.c    | 22 ++++++++++++++---
 include/scsi/scsi_device.h | 29 ++++++++++++++++++----
 3 files changed, 97 insertions(+), 14 deletions(-)

-- 
2.12.0

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/3] __scsi_iterate_devices(): Make the get and put functions arguments
  2017-03-16 20:56 [PATCH 0/3] Unblock SCSI devices even if the LLD is being unloaded Bart Van Assche
@ 2017-03-16 20:56   ` Bart Van Assche
  2017-03-16 20:56   ` Bart Van Assche
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Bart Van Assche @ 2017-03-16 20:56 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: linux-scsi, Hannes Reinecke, Israel Rukshin, Max Gurtovoy,
	Bart Van Assche, stable

This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Tested-by: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: <stable@vger.kernel.org>
---
 drivers/scsi/scsi.c        |  8 +++++---
 include/scsi/scsi_device.h | 11 ++++++-----
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 7bfbcfa7af40..5ac16fecbdab 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -609,7 +609,9 @@ EXPORT_SYMBOL(scsi_device_put);
 
 /* helper for shost_for_each_device, see that for documentation */
 struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
-					   struct scsi_device *prev)
+					   struct scsi_device *prev,
+					   int (*get)(struct scsi_device *),
+					   void (*put)(struct scsi_device *))
 {
 	struct list_head *list = (prev ? &prev->siblings : &shost->__devices);
 	struct scsi_device *next = NULL;
@@ -619,7 +621,7 @@ struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
 	while (list->next != &shost->__devices) {
 		next = list_entry(list->next, struct scsi_device, siblings);
 		/* skip devices that we can't get a reference to */
-		if (!scsi_device_get(next))
+		if (!get(next))
 			break;
 		next = NULL;
 		list = list->next;
@@ -627,7 +629,7 @@ struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
 	spin_unlock_irqrestore(shost->host_lock, flags);
 
 	if (prev)
-		scsi_device_put(prev);
+		put(prev);
 	return next;
 }
 EXPORT_SYMBOL(__scsi_iterate_devices);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 080c7ce9bae8..434b617c9f76 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -334,7 +334,9 @@ extern void __starget_for_each_device(struct scsi_target *, void *,
 
 /* only exposed to implement shost_for_each_device */
 extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
-						  struct scsi_device *);
+					struct scsi_device *,
+					int (*get)(struct scsi_device *),
+					void (*put)(struct scsi_device *));
 
 /**
  * shost_for_each_device - iterate over all devices of a host
@@ -345,10 +347,9 @@ extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
  * takes a reference on each device and releases it at the end.  If
  * you break out of the loop, you must call scsi_device_put(sdev).
  */
-#define shost_for_each_device(sdev, shost) \
-	for ((sdev) = __scsi_iterate_devices((shost), NULL); \
-	     (sdev); \
-	     (sdev) = __scsi_iterate_devices((shost), (sdev)))
+#define shost_for_each_device(sdev, shost)			\
+	for ((sdev) = NULL; ((sdev) = __scsi_iterate_devices((shost), (sdev), \
+	     scsi_device_get, scsi_device_put)) != NULL; )
 
 /**
  * __shost_for_each_device - iterate over all devices of a host (UNLOCKED)
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 1/3] __scsi_iterate_devices(): Make the get and put functions arguments
@ 2017-03-16 20:56   ` Bart Van Assche
  0 siblings, 0 replies; 18+ messages in thread
From: Bart Van Assche @ 2017-03-16 20:56 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: linux-scsi, Hannes Reinecke, Israel Rukshin, Max Gurtovoy,
	Bart Van Assche, stable

This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Tested-by: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: <stable@vger.kernel.org>
---
 drivers/scsi/scsi.c        |  8 +++++---
 include/scsi/scsi_device.h | 11 ++++++-----
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 7bfbcfa7af40..5ac16fecbdab 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -609,7 +609,9 @@ EXPORT_SYMBOL(scsi_device_put);
 
 /* helper for shost_for_each_device, see that for documentation */
 struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
-					   struct scsi_device *prev)
+					   struct scsi_device *prev,
+					   int (*get)(struct scsi_device *),
+					   void (*put)(struct scsi_device *))
 {
 	struct list_head *list = (prev ? &prev->siblings : &shost->__devices);
 	struct scsi_device *next = NULL;
@@ -619,7 +621,7 @@ struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
 	while (list->next != &shost->__devices) {
 		next = list_entry(list->next, struct scsi_device, siblings);
 		/* skip devices that we can't get a reference to */
-		if (!scsi_device_get(next))
+		if (!get(next))
 			break;
 		next = NULL;
 		list = list->next;
@@ -627,7 +629,7 @@ struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
 	spin_unlock_irqrestore(shost->host_lock, flags);
 
 	if (prev)
-		scsi_device_put(prev);
+		put(prev);
 	return next;
 }
 EXPORT_SYMBOL(__scsi_iterate_devices);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 080c7ce9bae8..434b617c9f76 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -334,7 +334,9 @@ extern void __starget_for_each_device(struct scsi_target *, void *,
 
 /* only exposed to implement shost_for_each_device */
 extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
-						  struct scsi_device *);
+					struct scsi_device *,
+					int (*get)(struct scsi_device *),
+					void (*put)(struct scsi_device *));
 
 /**
  * shost_for_each_device - iterate over all devices of a host
@@ -345,10 +347,9 @@ extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
  * takes a reference on each device and releases it at the end.  If
  * you break out of the loop, you must call scsi_device_put(sdev).
  */
-#define shost_for_each_device(sdev, shost) \
-	for ((sdev) = __scsi_iterate_devices((shost), NULL); \
-	     (sdev); \
-	     (sdev) = __scsi_iterate_devices((shost), (sdev)))
+#define shost_for_each_device(sdev, shost)			\
+	for ((sdev) = NULL; ((sdev) = __scsi_iterate_devices((shost), (sdev), \
+	     scsi_device_get, scsi_device_put)) != NULL; )
 
 /**
  * __shost_for_each_device - iterate over all devices of a host (UNLOCKED)
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/3] Introduce starget_for_all_devices() and shost_for_all_devices()
  2017-03-16 20:56 [PATCH 0/3] Unblock SCSI devices even if the LLD is being unloaded Bart Van Assche
@ 2017-03-16 20:56   ` Bart Van Assche
  2017-03-16 20:56   ` Bart Van Assche
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Bart Van Assche @ 2017-03-16 20:56 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: linux-scsi, Hannes Reinecke, Israel Rukshin, Max Gurtovoy,
	Bart Van Assche, stable

This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Tested-by: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: <stable@vger.kernel.org>
---
 drivers/scsi/scsi.c        | 52 +++++++++++++++++++++++++++++++++++++++++++---
 include/scsi/scsi_device.h | 22 ++++++++++++++++++--
 2 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 5ac16fecbdab..ea408da8de29 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -640,9 +640,11 @@ EXPORT_SYMBOL(__scsi_iterate_devices);
  * @data:	Opaque passed to each function call.
  * @fn:		Function to call on each device
  *
- * This traverses over each device of @starget.  The devices have
- * a reference that must be released by scsi_host_put when breaking
- * out of the loop.
+ * This traverses over each device of @starget except the devices that are in
+ * state SDEV_DEL or SDEV_CANCEL. The devices have a reference that must be
+ * released by scsi_device_put() when breaking out of the loop. If the LLD
+ * associated with the devices is being unloaded, @fn is not called for any
+ * device.
  */
 void starget_for_each_device(struct scsi_target *starget, void *data,
 		     void (*fn)(struct scsi_device *, void *))
@@ -659,6 +661,50 @@ void starget_for_each_device(struct scsi_target *starget, void *data,
 EXPORT_SYMBOL(starget_for_each_device);
 
 /**
+ * scsi_device_get_any() - get a reference to @sdev even if it is being deleted
+ *
+ * See also scsi_device_get().
+ */
+static int scsi_device_get_any(struct scsi_device *sdev)
+{
+	return get_device(&sdev->sdev_gendev) ? 0 : -ENXIO;
+}
+
+/**
+ * scsi_device_put_any() - drop a reference obtained by scsi_device_get_any()
+ *
+ * See also scsi_device_put().
+ */
+static void scsi_device_put_any(struct scsi_device *sdev)
+{
+	put_device(&sdev->sdev_gendev);
+}
+
+/**
+ * starget_for_all_devices - helper to walk all devices of a target
+ * @starget:	target whose devices we want to iterate over.
+ * @data:	Pointer passed to each function call.
+ * @fn:		Function to call on each device
+ *
+ * This traverses over each device of @starget, including the devices in state
+ * SDEV_DEL or SDEV_ANY. The devices have a reference that must be released by
+ * scsi_device_put_any() when breaking out of the loop.
+ */
+void starget_for_all_devices(struct scsi_target *starget, void *data,
+			     void (*fn)(struct scsi_device *, void *))
+{
+	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
+	struct scsi_device *sdev;
+
+	shost_for_all_devices(sdev, shost, scsi_device_get_any,
+			      scsi_device_put_any)
+		if (sdev->channel == starget->channel &&
+		    sdev->id == starget->id)
+			fn(sdev, data);
+}
+EXPORT_SYMBOL(starget_for_all_devices);
+
+/**
  * __starget_for_each_device - helper to walk all devices of a target (UNLOCKED)
  * @starget:	target whose devices we want to iterate over.
  * @data:	parameter for callback @fn()
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 434b617c9f76..cd6a5383e9b7 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -328,6 +328,8 @@ extern struct scsi_device *__scsi_device_lookup_by_target(struct scsi_target *,
 							  u64);
 extern void starget_for_each_device(struct scsi_target *, void *,
 		     void (*fn)(struct scsi_device *, void *));
+extern void starget_for_all_devices(struct scsi_target *, void *,
+				    void (*fn)(struct scsi_device *, void *));
 extern void __starget_for_each_device(struct scsi_target *, void *,
 				      void (*fn)(struct scsi_device *,
 						 void *));
@@ -339,6 +341,22 @@ extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
 					void (*put)(struct scsi_device *));
 
 /**
+ * shost_for_all_devices - iterate over all devices of a host
+ * @sdev: the &struct scsi_device to use as a cursor
+ * @shost: the &struct scsi_host to iterate over
+ * @get: function that obtains a reference to a device and returns 0 upon
+ *       success
+ * @put: function that drops a device reference.
+ *
+ * Iterator that returns each device attached to @shost.  This loop
+ * takes a reference on each device and releases it at the end.  If
+ * you break out of the loop, you must call @put(sdev).
+ */
+#define shost_for_all_devices(sdev, shost, get, put)			\
+	for ((sdev) = NULL; ((sdev) = __scsi_iterate_devices((shost), (sdev), \
+	     (get), (put))) != NULL; )
+
+/**
  * shost_for_each_device - iterate over all devices of a host
  * @sdev: the &struct scsi_device to use as a cursor
  * @shost: the &struct scsi_host to iterate over
@@ -348,8 +366,8 @@ extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
  * you break out of the loop, you must call scsi_device_put(sdev).
  */
 #define shost_for_each_device(sdev, shost)			\
-	for ((sdev) = NULL; ((sdev) = __scsi_iterate_devices((shost), (sdev), \
-	     scsi_device_get, scsi_device_put)) != NULL; )
+	shost_for_all_devices((sdev), (shost), scsi_device_get,	\
+			      scsi_device_put)
 
 /**
  * __shost_for_each_device - iterate over all devices of a host (UNLOCKED)
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/3] Introduce starget_for_all_devices() and shost_for_all_devices()
@ 2017-03-16 20:56   ` Bart Van Assche
  0 siblings, 0 replies; 18+ messages in thread
From: Bart Van Assche @ 2017-03-16 20:56 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: linux-scsi, Hannes Reinecke, Israel Rukshin, Max Gurtovoy,
	Bart Van Assche, stable

This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Tested-by: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: <stable@vger.kernel.org>
---
 drivers/scsi/scsi.c        | 52 +++++++++++++++++++++++++++++++++++++++++++---
 include/scsi/scsi_device.h | 22 ++++++++++++++++++--
 2 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 5ac16fecbdab..ea408da8de29 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -640,9 +640,11 @@ EXPORT_SYMBOL(__scsi_iterate_devices);
  * @data:	Opaque passed to each function call.
  * @fn:		Function to call on each device
  *
- * This traverses over each device of @starget.  The devices have
- * a reference that must be released by scsi_host_put when breaking
- * out of the loop.
+ * This traverses over each device of @starget except the devices that are in
+ * state SDEV_DEL or SDEV_CANCEL. The devices have a reference that must be
+ * released by scsi_device_put() when breaking out of the loop. If the LLD
+ * associated with the devices is being unloaded, @fn is not called for any
+ * device.
  */
 void starget_for_each_device(struct scsi_target *starget, void *data,
 		     void (*fn)(struct scsi_device *, void *))
@@ -659,6 +661,50 @@ void starget_for_each_device(struct scsi_target *starget, void *data,
 EXPORT_SYMBOL(starget_for_each_device);
 
 /**
+ * scsi_device_get_any() - get a reference to @sdev even if it is being deleted
+ *
+ * See also scsi_device_get().
+ */
+static int scsi_device_get_any(struct scsi_device *sdev)
+{
+	return get_device(&sdev->sdev_gendev) ? 0 : -ENXIO;
+}
+
+/**
+ * scsi_device_put_any() - drop a reference obtained by scsi_device_get_any()
+ *
+ * See also scsi_device_put().
+ */
+static void scsi_device_put_any(struct scsi_device *sdev)
+{
+	put_device(&sdev->sdev_gendev);
+}
+
+/**
+ * starget_for_all_devices - helper to walk all devices of a target
+ * @starget:	target whose devices we want to iterate over.
+ * @data:	Pointer passed to each function call.
+ * @fn:		Function to call on each device
+ *
+ * This traverses over each device of @starget, including the devices in state
+ * SDEV_DEL or SDEV_ANY. The devices have a reference that must be released by
+ * scsi_device_put_any() when breaking out of the loop.
+ */
+void starget_for_all_devices(struct scsi_target *starget, void *data,
+			     void (*fn)(struct scsi_device *, void *))
+{
+	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
+	struct scsi_device *sdev;
+
+	shost_for_all_devices(sdev, shost, scsi_device_get_any,
+			      scsi_device_put_any)
+		if (sdev->channel == starget->channel &&
+		    sdev->id == starget->id)
+			fn(sdev, data);
+}
+EXPORT_SYMBOL(starget_for_all_devices);
+
+/**
  * __starget_for_each_device - helper to walk all devices of a target (UNLOCKED)
  * @starget:	target whose devices we want to iterate over.
  * @data:	parameter for callback @fn()
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 434b617c9f76..cd6a5383e9b7 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -328,6 +328,8 @@ extern struct scsi_device *__scsi_device_lookup_by_target(struct scsi_target *,
 							  u64);
 extern void starget_for_each_device(struct scsi_target *, void *,
 		     void (*fn)(struct scsi_device *, void *));
+extern void starget_for_all_devices(struct scsi_target *, void *,
+				    void (*fn)(struct scsi_device *, void *));
 extern void __starget_for_each_device(struct scsi_target *, void *,
 				      void (*fn)(struct scsi_device *,
 						 void *));
@@ -339,6 +341,22 @@ extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
 					void (*put)(struct scsi_device *));
 
 /**
+ * shost_for_all_devices - iterate over all devices of a host
+ * @sdev: the &struct scsi_device to use as a cursor
+ * @shost: the &struct scsi_host to iterate over
+ * @get: function that obtains a reference to a device and returns 0 upon
+ *       success
+ * @put: function that drops a device reference.
+ *
+ * Iterator that returns each device attached to @shost.  This loop
+ * takes a reference on each device and releases it at the end.  If
+ * you break out of the loop, you must call @put(sdev).
+ */
+#define shost_for_all_devices(sdev, shost, get, put)			\
+	for ((sdev) = NULL; ((sdev) = __scsi_iterate_devices((shost), (sdev), \
+	     (get), (put))) != NULL; )
+
+/**
  * shost_for_each_device - iterate over all devices of a host
  * @sdev: the &struct scsi_device to use as a cursor
  * @shost: the &struct scsi_host to iterate over
@@ -348,8 +366,8 @@ extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
  * you break out of the loop, you must call scsi_device_put(sdev).
  */
 #define shost_for_each_device(sdev, shost)			\
-	for ((sdev) = NULL; ((sdev) = __scsi_iterate_devices((shost), (sdev), \
-	     scsi_device_get, scsi_device_put)) != NULL; )
+	shost_for_all_devices((sdev), (shost), scsi_device_get,	\
+			      scsi_device_put)
 
 /**
  * __shost_for_each_device - iterate over all devices of a host (UNLOCKED)
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/3] Ensure that scsi_target_unblock() examines all devices
  2017-03-16 20:56 [PATCH 0/3] Unblock SCSI devices even if the LLD is being unloaded Bart Van Assche
@ 2017-03-16 20:56   ` Bart Van Assche
  2017-03-16 20:56   ` Bart Van Assche
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Bart Van Assche @ 2017-03-16 20:56 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: linux-scsi, Hannes Reinecke, Israel Rukshin, Max Gurtovoy,
	Bart Van Assche, stable

Since scsi_target_unblock() uses starget_for_each_device(), since
starget_for_each_device() uses scsi_device_get(), since
scsi_device_get() fails after unloading of the LLD kernel module
has been started scsi_target_unblock() may skip devices that were
affected by scsi_target_block(). Ensure that scsi_target_block()
examines all SCSI devices. This patch avoids that unloading the
ib_srp kernel module can trigger the following hang:

Call Trace:
 schedule+0x35/0x80
 schedule_timeout+0x237/0x2d0
 io_schedule_timeout+0xa6/0x110
 wait_for_completion_io+0xa3/0x110
 blk_execute_rq+0xdf/0x120
 scsi_execute+0xce/0x150 [scsi_mod]
 scsi_execute_req_flags+0x8f/0xf0 [scsi_mod]
 sd_sync_cache+0xa9/0x190 [sd_mod]
 sd_shutdown+0x6a/0x100 [sd_mod]
 sd_remove+0x64/0xc0 [sd_mod]
 __device_release_driver+0x8d/0x120
 device_release_driver+0x1e/0x30
 bus_remove_device+0xf9/0x170
 device_del+0x127/0x240
 __scsi_remove_device+0xc1/0xd0 [scsi_mod]
 scsi_forget_host+0x57/0x60 [scsi_mod]
 scsi_remove_host+0x72/0x110 [scsi_mod]
 srp_remove_work+0x8b/0x200 [ib_srp]

Reported-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Tested-by: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: <stable@vger.kernel.org>
---
 drivers/scsi/scsi_lib.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 19125d72f322..f03a7867c04f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -3071,21 +3071,37 @@ device_unblock(struct scsi_device *sdev, void *data)
 	scsi_internal_device_unblock(sdev, *(enum scsi_device_state *)data);
 }
 
+/**
+ * target_unblock() - unblock all devices associated with a SCSI target
+ *
+ * Notes:
+ * - Do not use scsi_device_get() nor any of the macros that use this
+ *   function from inside scsi_target_block() because otherwise this function
+ *   won't have any effect when called while the SCSI LLD is being unloaded.
+ * - Do not hold the host lock around the device_unblock() calls because at
+ *   least for blk-sq the block layer queue lock is the outer lock and the
+ *   SCSI host lock is the inner lock.
+ */
 static int
 target_unblock(struct device *dev, void *data)
 {
 	if (scsi_is_target_device(dev))
-		starget_for_each_device(to_scsi_target(dev), data,
+		starget_for_all_devices(to_scsi_target(dev), data,
 					device_unblock);
 	return 0;
 }
 
+/**
+ * scsi_target_unblock() - unblock all devices associated with a SCSI target
+ * @dev: Either a pointer to the dev member of struct scsi_target or a pointer
+ *	to the shost_gendev member of struct Scsi_Host.
+ * @new_state: New SCSI device state.
+ */
 void
 scsi_target_unblock(struct device *dev, enum scsi_device_state new_state)
 {
 	if (scsi_is_target_device(dev))
-		starget_for_each_device(to_scsi_target(dev), &new_state,
-					device_unblock);
+		target_unblock(dev, &new_state);
 	else
 		device_for_each_child(dev, &new_state, target_unblock);
 }
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/3] Ensure that scsi_target_unblock() examines all devices
@ 2017-03-16 20:56   ` Bart Van Assche
  0 siblings, 0 replies; 18+ messages in thread
From: Bart Van Assche @ 2017-03-16 20:56 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: linux-scsi, Hannes Reinecke, Israel Rukshin, Max Gurtovoy,
	Bart Van Assche, stable

Since scsi_target_unblock() uses starget_for_each_device(), since
starget_for_each_device() uses scsi_device_get(), since
scsi_device_get() fails after unloading of the LLD kernel module
has been started scsi_target_unblock() may skip devices that were
affected by scsi_target_block(). Ensure that scsi_target_block()
examines all SCSI devices. This patch avoids that unloading the
ib_srp kernel module can trigger the following hang:

Call Trace:
 schedule+0x35/0x80
 schedule_timeout+0x237/0x2d0
 io_schedule_timeout+0xa6/0x110
 wait_for_completion_io+0xa3/0x110
 blk_execute_rq+0xdf/0x120
 scsi_execute+0xce/0x150 [scsi_mod]
 scsi_execute_req_flags+0x8f/0xf0 [scsi_mod]
 sd_sync_cache+0xa9/0x190 [sd_mod]
 sd_shutdown+0x6a/0x100 [sd_mod]
 sd_remove+0x64/0xc0 [sd_mod]
 __device_release_driver+0x8d/0x120
 device_release_driver+0x1e/0x30
 bus_remove_device+0xf9/0x170
 device_del+0x127/0x240
 __scsi_remove_device+0xc1/0xd0 [scsi_mod]
 scsi_forget_host+0x57/0x60 [scsi_mod]
 scsi_remove_host+0x72/0x110 [scsi_mod]
 srp_remove_work+0x8b/0x200 [ib_srp]

Reported-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Tested-by: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: <stable@vger.kernel.org>
---
 drivers/scsi/scsi_lib.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 19125d72f322..f03a7867c04f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -3071,21 +3071,37 @@ device_unblock(struct scsi_device *sdev, void *data)
 	scsi_internal_device_unblock(sdev, *(enum scsi_device_state *)data);
 }
 
+/**
+ * target_unblock() - unblock all devices associated with a SCSI target
+ *
+ * Notes:
+ * - Do not use scsi_device_get() nor any of the macros that use this
+ *   function from inside scsi_target_block() because otherwise this function
+ *   won't have any effect when called while the SCSI LLD is being unloaded.
+ * - Do not hold the host lock around the device_unblock() calls because at
+ *   least for blk-sq the block layer queue lock is the outer lock and the
+ *   SCSI host lock is the inner lock.
+ */
 static int
 target_unblock(struct device *dev, void *data)
 {
 	if (scsi_is_target_device(dev))
-		starget_for_each_device(to_scsi_target(dev), data,
+		starget_for_all_devices(to_scsi_target(dev), data,
 					device_unblock);
 	return 0;
 }
 
+/**
+ * scsi_target_unblock() - unblock all devices associated with a SCSI target
+ * @dev: Either a pointer to the dev member of struct scsi_target or a pointer
+ *	to the shost_gendev member of struct Scsi_Host.
+ * @new_state: New SCSI device state.
+ */
 void
 scsi_target_unblock(struct device *dev, enum scsi_device_state new_state)
 {
 	if (scsi_is_target_device(dev))
-		starget_for_each_device(to_scsi_target(dev), &new_state,
-					device_unblock);
+		target_unblock(dev, &new_state);
 	else
 		device_for_each_child(dev, &new_state, target_unblock);
 }
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] Unblock SCSI devices even if the LLD is being unloaded
  2017-03-16 20:56 [PATCH 0/3] Unblock SCSI devices even if the LLD is being unloaded Bart Van Assche
                   ` (2 preceding siblings ...)
  2017-03-16 20:56   ` Bart Van Assche
@ 2017-03-16 22:53 ` James Bottomley
  2017-03-16 23:19   ` Bart Van Assche
  3 siblings, 1 reply; 18+ messages in thread
From: James Bottomley @ 2017-03-16 22:53 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: linux-scsi, Hannes Reinecke, Israel Rukshin, Max Gurtovoy

On Thu, 2017-03-16 at 13:56 -0700, Bart Van Assche wrote:
> scsi_target_unblock() must unblock SCSI devices even if this function
> is called after unloading of the LLD that created these devices has
> started. This is necessary to prevent that __scsi_remove_device() 
> hangs on the SYNCHRONIZE CACHE command issued by the sd driver during
> shutdown.

Your special get function misses the try_module_get().  But this is all
really a bit ugly. Since the only problem is the SYNC CACHE triggered
by device_del, isn't a better solution a new state: SDEV_CANCEL_BLOCK. 
 This will make the device visible to scsi_get_device() and we can take
it back from CANCEL_BLOCKED->CANCEL when the queue is unblocked.  I
suspect we could also simply throw away the sync cache command when the
device is blocked (the cache should destage naturally in the time it
takes for the device to be unblocked).

James

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] Unblock SCSI devices even if the LLD is being unloaded
  2017-03-16 22:53 ` [PATCH 0/3] Unblock SCSI devices even if the LLD is being unloaded James Bottomley
@ 2017-03-16 23:19   ` Bart Van Assche
  2017-03-17 12:54     ` James Bottomley
  0 siblings, 1 reply; 18+ messages in thread
From: Bart Van Assche @ 2017-03-16 23:19 UTC (permalink / raw)
  To: James.Bottomley, martin.petersen; +Cc: linux-scsi, maxg, israelr, hare

On Thu, 2017-03-16 at 15:53 -0700, James Bottomley wrote:
> On Thu, 2017-03-16 at 13:56 -0700, Bart Van Assche wrote:
> > scsi_target_unblock() must unblock SCSI devices even if this function
> > is called after unloading of the LLD that created these devices has
> > started. This is necessary to prevent that __scsi_remove_device() 
> > hangs on the SYNCHRONIZE CACHE command issued by the sd driver during
> > shutdown.
> 
> Your special get function misses the try_module_get().  But this is all
> really a bit ugly. Since the only problem is the SYNC CACHE triggered
> by device_del, isn't a better solution a new state: SDEV_CANCEL_BLOCK. 
>  This will make the device visible to scsi_get_device() and we can take
> it back from CANCEL_BLOCKED->CANCEL when the queue is unblocked.  I
> suspect we could also simply throw away the sync cache command when the
> device is blocked (the cache should destage naturally in the time it
> takes for the device to be unblocked).

Hello James,

The purpose of this patch series is to make sure that unblock also occurs
after module unloading has started. My understanding of try_module_get() is
that it fails once module unloading has started. In other words, it is on
purpose that try_module_get() is not called. From the kernel module code:

bool try_module_get(struct module *module)
{
	bool ret = true;

	if (module) {
		preempt_disable();
		/* Note: here, we can fail to get a reference */
		if (likely(module_is_live(module) &&
			   atomic_inc_not_zero(&module->refcnt) != 0))
			trace_module_get(module, _RET_IP_);
		else
			ret = false;
		preempt_enable();
	}
	return ret;
}

static inline int module_is_live(struct module *mod)
{
	return mod->state != MODULE_STATE_GOING;
}

Regarding introducing a new device state: this is something I would like to
avoid. Any code that manipulates the SCSI device state is unnecessarily hard
to modify because multiple types of state information have been mixed up in
a single state variable (blocked / not blocked; created / running / cancel /
offline). Additionally, the SCSI device state is visible to user space.
Adding a new SCSI device state could break existing user space applications.

There is another problem with the introduction of the SDEV_CANCEL_BLOCKED
state: we do not want open() calls to succeed for devices that are in the
SDEV_DEL, SDEV_CANCEL nor for devices that are in the SDEV_CANCEL_BLOCKED
state. scsi_disk_get() in the sd driver relies on scsi_device_get() to check
the SCSI device state. If scsi_device_get() would succeed for devices in the
SDEV_CANCEL_BLOCKED state then an explicit check for that state would have
to be added in several users of scsi_device_get(). In other words, I think
adding the SDEV_CANCEL_BLOCKED state would result in a much more complex and
also harder to test patch.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] Unblock SCSI devices even if the LLD is being unloaded
  2017-03-16 23:19   ` Bart Van Assche
@ 2017-03-17 12:54     ` James Bottomley
  2017-03-17 16:40       ` Bart Van Assche
  2017-04-10 17:46       ` Bart Van Assche
  0 siblings, 2 replies; 18+ messages in thread
From: James Bottomley @ 2017-03-17 12:54 UTC (permalink / raw)
  To: Bart Van Assche, martin.petersen; +Cc: linux-scsi, maxg, israelr, hare

On Thu, 2017-03-16 at 23:19 +0000, Bart Van Assche wrote:
> On Thu, 2017-03-16 at 15:53 -0700, James Bottomley wrote:
> > On Thu, 2017-03-16 at 13:56 -0700, Bart Van Assche wrote:
> > > scsi_target_unblock() must unblock SCSI devices even if this
> > > function
> > > is called after unloading of the LLD that created these devices
> > > has
> > > started. This is necessary to prevent that __scsi_remove_device()
> > > hangs on the SYNCHRONIZE CACHE command issued by the sd driver
> > > during
> > > shutdown.
> > 
> > Your special get function misses the try_module_get().  But this is
> > all
> > really a bit ugly. Since the only problem is the SYNC CACHE 
> > triggered by device_del, isn't a better solution a new state:
> > SDEV_CANCEL_BLOCK.  This will make the device visible to 
> > scsi_get_device() and we can take it back from CANCEL_BLOCKED
> > ->CANCEL when the queue is unblocked.  I suspect we could also 
> > simply throw away the sync cache command when the device is blocked 
> > (the cache should destage naturally in the time it takes for the
> > device to be unblocked).
> 
> Hello James,
> 
> The purpose of this patch series is to make sure that unblock also 
> occurs after module unloading has started. My understanding of
> try_module_get() is that it fails once module unloading has started.
>  In other words, it is on purpose that try_module_get() is not
> called. From the kernel module
> code:
> 
> bool try_module_get(struct module *module)
> {
> 	bool ret = true;
> 
> 	if (module) {
> 		preempt_disable();
> 		/* Note: here, we can fail to get a reference */
> 		if (likely(module_is_live(module) &&
> 			   atomic_inc_not_zero(&module->refcnt) != 0))
> 			trace_module_get(module, _RET_IP_);
> 		else
> 			ret = false;
> 		preempt_enable();
> 	}
> 	return ret;
> }
> 
> static inline int module_is_live(struct module *mod)
> {
> 	return mod->state != MODULE_STATE_GOING;
> }

So it's better to use the module without a reference in place and take
the risk that it may exit and release its code area while we're calling
it?

> Regarding introducing a new device state: this is something I would 
> like to avoid. Any code that manipulates the SCSI device state is
> unnecessarily hard to modify because multiple types of state 
> information have been mixed up in a single state variable (blocked / 
> not blocked; created / running / cancel / offline). Additionally, the 
> SCSI device state is visible to user space.
> Adding a new SCSI device state could break existing user space
> applications.

I'm not sure that's a real concern for a new cancel state, but it's
addressable by not exposing the state to user space (it can just show
up as blocked)

> There is another problem with the introduction of the 
> SDEV_CANCEL_BLOCKED state: we do not want open() calls to succeed for 
> devices that are in the SDEV_DEL, SDEV_CANCEL nor for devices that 
> are in the SDEV_CANCEL_BLOCKED state. scsi_disk_get() in the sd 
> driver relies on scsi_device_get() to check the SCSI device state. If 
> scsi_device_get() would succeed for devices in the
> SDEV_CANCEL_BLOCKED state then an explicit check for that state would 
> have to be added in several users of scsi_device_get().

That really doesn't matter: getting a reference via open is a race. 
 Currently if you do it just before SDEV_CANCEL you end up in the same
position: a properly refcounted open device that can't send any
commands, so this doesn't add any new problems.

>  In other words, I think adding the SDEV_CANCEL_BLOCKED state would
> result in a much more complex and also harder to test patch.

Fine, here's what I thing it would look like; it seems a lot shorter
and simpler to me, but if you want to pursue your approach fixing the
race with module exit is a requirement.

James

---

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ba22866..b952a6a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1248,6 +1248,13 @@ scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
 				    "rejecting I/O to dead device\n");
 			ret = BLKPREP_KILL;
 			break;
+		case SDEV_CANCEL_BLOCK:
+			/*
+			 * silently kill the I/O: the only allowed thing
+			 * is a ULD remove one (like SYNC CACHE)
+			 */
+			ret = BLKPREP_KILL;
+			break;
 		case SDEV_BLOCK:
 		case SDEV_CREATED_BLOCK:
 			ret = BLKPREP_DEFER;
@@ -2604,6 +2611,15 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
 		}
 		break;
 
+	case SDEV_CANCEL_BLOCK:
+		switch (oldstate) {
+		case SDEV_CANCEL:
+			break;
+		default:
+			goto illegal;
+		}
+		break;
+
 	case SDEV_CANCEL:
 		switch (oldstate) {
 		case SDEV_CREATED:
@@ -2612,6 +2628,7 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
 		case SDEV_OFFLINE:
 		case SDEV_TRANSPORT_OFFLINE:
 		case SDEV_BLOCK:
+		case SDEV_CANCEL_BLOCK:
 			break;
 		default:
 			goto illegal;
@@ -3017,6 +3034,8 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
 			sdev->sdev_state = new_state;
 		else
 			sdev->sdev_state = SDEV_CREATED;
+	} else if (sdev->sdev_state == SDEV_CANCEL_BLOCK) {
+		sdev->sdev_state = SDEV_CANCEL;
 	} else if (sdev->sdev_state != SDEV_CANCEL &&
 		 sdev->sdev_state != SDEV_OFFLINE)
 		return -EINVAL;
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 82dfe07..fd1ba1d 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -39,6 +39,7 @@ static const struct {
 	{ SDEV_TRANSPORT_OFFLINE, "transport-offline" },
 	{ SDEV_BLOCK,	"blocked" },
 	{ SDEV_CREATED_BLOCK, "created-blocked" },
+	{ SDEV_CANCEL_BLOCK, "blocked" },
 };
 
 const char *scsi_device_state_name(enum scsi_device_state state)
@@ -972,7 +973,8 @@ sdev_store_dh_state(struct device *dev, struct device_attribute *attr,
 	int err = -EINVAL;
 
 	if (sdev->sdev_state == SDEV_CANCEL ||
-	    sdev->sdev_state == SDEV_DEL)
+	    sdev->sdev_state == SDEV_DEL ||
+	    sdev->sdev_state == SDEV_CANCEL_BLOCK)
 		return -ENODEV;
 
 	if (!sdev->handler) {
@@ -1282,7 +1284,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
 		return;
 
 	if (sdev->is_visible) {
-		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
+		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0
+		    && scsi_device_set_state(sdev, SDEV_CANCEL_BLOCK) != 0)
 			return;
 
 		bsg_unregister_queue(sdev->request_queue);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 6f22b39..a78a17a 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -48,6 +48,7 @@ enum scsi_device_state {
 				 * should be issued to the scsi
 				 * lld. */
 	SDEV_CREATED_BLOCK,	/* same as above but for created devices */
+	SDEV_CANCEL_BLOCK,	/* same as above but for cancelling devices */
 };
 
 enum scsi_scan_mode {

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] Unblock SCSI devices even if the LLD is being unloaded
  2017-03-17 12:54     ` James Bottomley
@ 2017-03-17 16:40       ` Bart Van Assche
  2017-03-18 12:44         ` James Bottomley
  2017-04-10 17:46       ` Bart Van Assche
  1 sibling, 1 reply; 18+ messages in thread
From: Bart Van Assche @ 2017-03-17 16:40 UTC (permalink / raw)
  To: James.Bottomley, martin.petersen; +Cc: linux-scsi, maxg, israelr, hare

On Fri, 2017-03-17 at 05:54 -0700, James Bottomley wrote:
> So it's better to use the module without a reference in place and take
> the risk that it may exit and release its code area while we're calling
> it?

Hello James,

My understanding of scsi_device_get() / scsi_device_put() is that the reason
why these manipulate the module reference count is to avoid that a SCSI LLD
module can be unloaded while a SCSI device is being used from a context that
is not notified about SCSI LLD unloading (e.g. a file handle controlled by
the sd driver or a SCSI ALUA device handler worker thread).

Does your comment mean that you think there is a scenario in which
scsi_target_block() or scsi_target_unblock() can be called while the text
area of a SCSI LLD is being released? I have reviewed all the callers of
these functions but I have not found such a scenario. scsi_target_block()
and scsi_target_unblock() are either called from a SCSI transport layer
implementation (FC, iSCSI, SRP) or from a SCSI LLD kernel module (snic_disc).
All these kernel modules only call scsi_target_*block() for resources (rport
or SCSI target respectively) that are removed before the code area of these
modules is released. This is why I think that calling scsi_target_*block()
without increasing the SCSI LLD module reference count is safe.

> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 82dfe07..fd1ba1d 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -39,6 +39,7 @@ static const struct {
>  	{ SDEV_TRANSPORT_OFFLINE, "transport-offline" },
>  	{ SDEV_BLOCK,	"blocked" },
>  	{ SDEV_CREATED_BLOCK, "created-blocked" },
> +	{ SDEV_CANCEL_BLOCK, "blocked" },
>  };

The multipathd function path_offline() translates "blocked" into PATH_PENDING.
Shouldn't SDEV_CANCEL_BLOCK be translated by multipathd into PATH_DOWN? There
might be other user space applications that interpret the SCSI device state
and that I am not aware of.

Additionally, your patch does not modify scsi_device_get() and hence will
cause scsi_device_get() to succeed for devices that are in state
SDEV_CANCEL_BLOCK. I think that's a subtle behavior change.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] Unblock SCSI devices even if the LLD is being unloaded
  2017-03-17 16:40       ` Bart Van Assche
@ 2017-03-18 12:44         ` James Bottomley
  2017-03-18 20:49           ` Bart Van Assche
  0 siblings, 1 reply; 18+ messages in thread
From: James Bottomley @ 2017-03-18 12:44 UTC (permalink / raw)
  To: Bart Van Assche, martin.petersen; +Cc: linux-scsi, maxg, israelr, hare

On Fri, 2017-03-17 at 16:40 +0000, Bart Van Assche wrote:
> On Fri, 2017-03-17 at 05:54 -0700, James Bottomley wrote:
> > So it's better to use the module without a reference in place and 
> > take the risk that it may exit and release its code area while 
> > we're calling it?
> 
> Hello James,
> 
> My understanding of scsi_device_get() / scsi_device_put() is that the 
> reason why these manipulate the module reference count is to avoid 
> that a SCSI LLD module can be unloaded while a SCSI device is being 
> used from a context that is not notified about SCSI LLD unloading 
> (e.g. a file handle controlled by the sd driver or a SCSI ALUA device
> handler worker thread).

Not just that: it's so the underlying module is pinned for every
potential user as well.  the unblock code is called in places outside
the actual hba driver module, so it needs that protection.

> Does your comment mean that you think there is a scenario in which
> scsi_target_block() or scsi_target_unblock() can be called while the 
> text area of a SCSI LLD is being released? I have reviewed all the 
> callers of these functions but I have not found such a scenario.
> scsi_target_block() and scsi_target_unblock() are either called from 
> a SCSI transport layer implementation (FC, iSCSI, SRP) or from a SCSI 
> LLD kernel module (snic_disc). All these kernel modules only call 
> scsi_target_*block() for resources (rport or SCSI target
> respectively) that are removed before the code area of
> these modules is released. This is why I think that calling
> scsi_target_*block() without increasing the SCSI LLD module reference
> count is safe.

The transport code is above the HBA module code and in that code
unblock could be racing with module removal.  The original premise was
that once the dev/target/host goes into DEL, nothing can call into
queuecommand or get a reference to the device, so nothing halts removal
after that, but you changed that with your code, which is why it's now
unsafe.

> > diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> > index 82dfe07..fd1ba1d 100644
> > --- a/drivers/scsi/scsi_sysfs.c
> > +++ b/drivers/scsi/scsi_sysfs.c
> > @@ -39,6 +39,7 @@ static const struct {
> >  	{ SDEV_TRANSPORT_OFFLINE, "transport-offline" },
> >  	{ SDEV_BLOCK,	"blocked" },
> >  	{ SDEV_CREATED_BLOCK, "created-blocked" },
> > +	{ SDEV_CANCEL_BLOCK, "blocked" },
> >  };
> 
> The multipathd function path_offline() translates "blocked" into 
> PATH_PENDING. Shouldn't SDEV_CANCEL_BLOCK be translated by multipathd 
> into PATH_DOWN? There might be other user space applications that 
> interpret the SCSI device state and that I am not aware of.

Given it's very short lived, I don't think it much matters, but if you
think it does, that can become cancel.

> Additionally, your patch does not modify scsi_device_get() and hence
> will cause scsi_device_get() to succeed for devices that are in state
> SDEV_CANCEL_BLOCK. I think that's a subtle behavior change.

Yes, otherwise device unblock wouldn't work (on the off chance the
device is unblocked before we get to the sync cache).  Again, it's like
the open race: you could have got the reference just before the device
went into cancel and you're in the same position so there's actually no
substantive behaviour change at all, it just elongates the window where
you get a reference to a device you can't send commands to.

James


> Thanks,
> 
> Bart.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] Introduce starget_for_all_devices() and shost_for_all_devices()
  2017-03-16 20:56   ` Bart Van Assche
@ 2017-03-18 17:14     ` kbuild test robot
  -1 siblings, 0 replies; 18+ messages in thread
From: kbuild test robot @ 2017-03-18 17:14 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: kbuild-all, Martin K . Petersen, linux-scsi, Hannes Reinecke,
	Israel Rukshin, Max Gurtovoy, Bart Van Assche, stable

[-- Attachment #1: Type: text/plain, Size: 1655 bytes --]

Hi Bart,

[auto build test WARNING on scsi/for-next]
[also build test WARNING on v4.11-rc2 next-20170310]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Bart-Van-Assche/Unblock-SCSI-devices-even-if-the-LLD-is-being-unloaded/20170318-214016
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

>> drivers/scsi/scsi.c:669: warning: No description found for parameter 'sdev'
   drivers/scsi/scsi.c:679: warning: No description found for parameter 'sdev'
   drivers/scsi/constants.c:1: warning: no structured comments found

vim +/sdev +669 drivers/scsi/scsi.c

   653		struct scsi_device *sdev;
   654	
   655		shost_for_each_device(sdev, shost) {
   656			if ((sdev->channel == starget->channel) &&
   657			    (sdev->id == starget->id))
   658				fn(sdev, data);
   659		}
   660	}
   661	EXPORT_SYMBOL(starget_for_each_device);
   662	
   663	/**
   664	 * scsi_device_get_any() - get a reference to @sdev even if it is being deleted
   665	 *
   666	 * See also scsi_device_get().
   667	 */
   668	static int scsi_device_get_any(struct scsi_device *sdev)
 > 669	{
   670		return get_device(&sdev->sdev_gendev) ? 0 : -ENXIO;
   671	}
   672	
   673	/**
   674	 * scsi_device_put_any() - drop a reference obtained by scsi_device_get_any()
   675	 *
   676	 * See also scsi_device_put().
   677	 */

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6576 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] Introduce starget_for_all_devices() and shost_for_all_devices()
@ 2017-03-18 17:14     ` kbuild test robot
  0 siblings, 0 replies; 18+ messages in thread
From: kbuild test robot @ 2017-03-18 17:14 UTC (permalink / raw)
  Cc: kbuild-all, Martin K . Petersen, linux-scsi, Hannes Reinecke,
	Israel Rukshin, Max Gurtovoy, Bart Van Assche, stable

[-- Attachment #1: Type: text/plain, Size: 1655 bytes --]

Hi Bart,

[auto build test WARNING on scsi/for-next]
[also build test WARNING on v4.11-rc2 next-20170310]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Bart-Van-Assche/Unblock-SCSI-devices-even-if-the-LLD-is-being-unloaded/20170318-214016
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

>> drivers/scsi/scsi.c:669: warning: No description found for parameter 'sdev'
   drivers/scsi/scsi.c:679: warning: No description found for parameter 'sdev'
   drivers/scsi/constants.c:1: warning: no structured comments found

vim +/sdev +669 drivers/scsi/scsi.c

   653		struct scsi_device *sdev;
   654	
   655		shost_for_each_device(sdev, shost) {
   656			if ((sdev->channel == starget->channel) &&
   657			    (sdev->id == starget->id))
   658				fn(sdev, data);
   659		}
   660	}
   661	EXPORT_SYMBOL(starget_for_each_device);
   662	
   663	/**
   664	 * scsi_device_get_any() - get a reference to @sdev even if it is being deleted
   665	 *
   666	 * See also scsi_device_get().
   667	 */
   668	static int scsi_device_get_any(struct scsi_device *sdev)
 > 669	{
   670		return get_device(&sdev->sdev_gendev) ? 0 : -ENXIO;
   671	}
   672	
   673	/**
   674	 * scsi_device_put_any() - drop a reference obtained by scsi_device_get_any()
   675	 *
   676	 * See also scsi_device_put().
   677	 */

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6576 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] Ensure that scsi_target_unblock() examines all devices
  2017-03-16 20:56   ` Bart Van Assche
@ 2017-03-18 20:22     ` kbuild test robot
  -1 siblings, 0 replies; 18+ messages in thread
From: kbuild test robot @ 2017-03-18 20:22 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: kbuild-all, Martin K . Petersen, linux-scsi, Hannes Reinecke,
	Israel Rukshin, Max Gurtovoy, Bart Van Assche, stable

[-- Attachment #1: Type: text/plain, Size: 3262 bytes --]

Hi Bart,

[auto build test WARNING on scsi/for-next]
[also build test WARNING on v4.11-rc2 next-20170310]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Bart-Van-Assche/Unblock-SCSI-devices-even-if-the-LLD-is-being-unloaded/20170318-214016
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   drivers/scsi/scsi.c:669: warning: No description found for parameter 'sdev'
   drivers/scsi/scsi.c:679: warning: No description found for parameter 'sdev'
>> drivers/scsi/scsi_lib.c:3087: warning: No description found for parameter 'dev'
>> drivers/scsi/scsi_lib.c:3087: warning: No description found for parameter 'data'
   drivers/scsi/constants.c:1: warning: no structured comments found

vim +/dev +3087 drivers/scsi/scsi_lib.c

5d9fb5cc1 Mike Christie   2012-05-17  3071  	scsi_internal_device_unblock(sdev, *(enum scsi_device_state *)data);
^1da177e4 Linus Torvalds  2005-04-16  3072  }
^1da177e4 Linus Torvalds  2005-04-16  3073  
ae188a841 Bart Van Assche 2017-03-16  3074  /**
ae188a841 Bart Van Assche 2017-03-16  3075   * target_unblock() - unblock all devices associated with a SCSI target
ae188a841 Bart Van Assche 2017-03-16  3076   *
ae188a841 Bart Van Assche 2017-03-16  3077   * Notes:
ae188a841 Bart Van Assche 2017-03-16  3078   * - Do not use scsi_device_get() nor any of the macros that use this
ae188a841 Bart Van Assche 2017-03-16  3079   *   function from inside scsi_target_block() because otherwise this function
ae188a841 Bart Van Assche 2017-03-16  3080   *   won't have any effect when called while the SCSI LLD is being unloaded.
ae188a841 Bart Van Assche 2017-03-16  3081   * - Do not hold the host lock around the device_unblock() calls because at
ae188a841 Bart Van Assche 2017-03-16  3082   *   least for blk-sq the block layer queue lock is the outer lock and the
ae188a841 Bart Van Assche 2017-03-16  3083   *   SCSI host lock is the inner lock.
ae188a841 Bart Van Assche 2017-03-16  3084   */
^1da177e4 Linus Torvalds  2005-04-16  3085  static int
^1da177e4 Linus Torvalds  2005-04-16  3086  target_unblock(struct device *dev, void *data)
^1da177e4 Linus Torvalds  2005-04-16 @3087  {
^1da177e4 Linus Torvalds  2005-04-16  3088  	if (scsi_is_target_device(dev))
ae188a841 Bart Van Assche 2017-03-16  3089  		starget_for_all_devices(to_scsi_target(dev), data,
^1da177e4 Linus Torvalds  2005-04-16  3090  					device_unblock);
^1da177e4 Linus Torvalds  2005-04-16  3091  	return 0;
^1da177e4 Linus Torvalds  2005-04-16  3092  }
^1da177e4 Linus Torvalds  2005-04-16  3093  
ae188a841 Bart Van Assche 2017-03-16  3094  /**
ae188a841 Bart Van Assche 2017-03-16  3095   * scsi_target_unblock() - unblock all devices associated with a SCSI target

:::::: The code at line 3087 was first introduced by commit
:::::: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 Linux-2.6.12-rc2

:::::: TO: Linus Torvalds <torvalds@ppc970.osdl.org>
:::::: CC: Linus Torvalds <torvalds@ppc970.osdl.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6576 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] Ensure that scsi_target_unblock() examines all devices
@ 2017-03-18 20:22     ` kbuild test robot
  0 siblings, 0 replies; 18+ messages in thread
From: kbuild test robot @ 2017-03-18 20:22 UTC (permalink / raw)
  Cc: kbuild-all, Martin K . Petersen, linux-scsi, Hannes Reinecke,
	Israel Rukshin, Max Gurtovoy, Bart Van Assche, stable

[-- Attachment #1: Type: text/plain, Size: 3262 bytes --]

Hi Bart,

[auto build test WARNING on scsi/for-next]
[also build test WARNING on v4.11-rc2 next-20170310]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Bart-Van-Assche/Unblock-SCSI-devices-even-if-the-LLD-is-being-unloaded/20170318-214016
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   drivers/scsi/scsi.c:669: warning: No description found for parameter 'sdev'
   drivers/scsi/scsi.c:679: warning: No description found for parameter 'sdev'
>> drivers/scsi/scsi_lib.c:3087: warning: No description found for parameter 'dev'
>> drivers/scsi/scsi_lib.c:3087: warning: No description found for parameter 'data'
   drivers/scsi/constants.c:1: warning: no structured comments found

vim +/dev +3087 drivers/scsi/scsi_lib.c

5d9fb5cc1 Mike Christie   2012-05-17  3071  	scsi_internal_device_unblock(sdev, *(enum scsi_device_state *)data);
^1da177e4 Linus Torvalds  2005-04-16  3072  }
^1da177e4 Linus Torvalds  2005-04-16  3073  
ae188a841 Bart Van Assche 2017-03-16  3074  /**
ae188a841 Bart Van Assche 2017-03-16  3075   * target_unblock() - unblock all devices associated with a SCSI target
ae188a841 Bart Van Assche 2017-03-16  3076   *
ae188a841 Bart Van Assche 2017-03-16  3077   * Notes:
ae188a841 Bart Van Assche 2017-03-16  3078   * - Do not use scsi_device_get() nor any of the macros that use this
ae188a841 Bart Van Assche 2017-03-16  3079   *   function from inside scsi_target_block() because otherwise this function
ae188a841 Bart Van Assche 2017-03-16  3080   *   won't have any effect when called while the SCSI LLD is being unloaded.
ae188a841 Bart Van Assche 2017-03-16  3081   * - Do not hold the host lock around the device_unblock() calls because at
ae188a841 Bart Van Assche 2017-03-16  3082   *   least for blk-sq the block layer queue lock is the outer lock and the
ae188a841 Bart Van Assche 2017-03-16  3083   *   SCSI host lock is the inner lock.
ae188a841 Bart Van Assche 2017-03-16  3084   */
^1da177e4 Linus Torvalds  2005-04-16  3085  static int
^1da177e4 Linus Torvalds  2005-04-16  3086  target_unblock(struct device *dev, void *data)
^1da177e4 Linus Torvalds  2005-04-16 @3087  {
^1da177e4 Linus Torvalds  2005-04-16  3088  	if (scsi_is_target_device(dev))
ae188a841 Bart Van Assche 2017-03-16  3089  		starget_for_all_devices(to_scsi_target(dev), data,
^1da177e4 Linus Torvalds  2005-04-16  3090  					device_unblock);
^1da177e4 Linus Torvalds  2005-04-16  3091  	return 0;
^1da177e4 Linus Torvalds  2005-04-16  3092  }
^1da177e4 Linus Torvalds  2005-04-16  3093  
ae188a841 Bart Van Assche 2017-03-16  3094  /**
ae188a841 Bart Van Assche 2017-03-16  3095   * scsi_target_unblock() - unblock all devices associated with a SCSI target

:::::: The code at line 3087 was first introduced by commit
:::::: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 Linux-2.6.12-rc2

:::::: TO: Linus Torvalds <torvalds@ppc970.osdl.org>
:::::: CC: Linus Torvalds <torvalds@ppc970.osdl.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6576 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] Unblock SCSI devices even if the LLD is being unloaded
  2017-03-18 12:44         ` James Bottomley
@ 2017-03-18 20:49           ` Bart Van Assche
  0 siblings, 0 replies; 18+ messages in thread
From: Bart Van Assche @ 2017-03-18 20:49 UTC (permalink / raw)
  To: James.Bottomley, martin.petersen; +Cc: linux-scsi, maxg, israelr, hare

On Sat, 2017-03-18 at 07:44 -0500, James Bottomley wrote:
> On Fri, 2017-03-17 at 16:40 +0000, Bart Van Assche wrote:
> > Does your comment mean that you think there is a scenario in which
> > scsi_target_block() or scsi_target_unblock() can be called while the 
> > text area of a SCSI LLD is being released? I have reviewed all the 
> > callers of these functions but I have not found such a scenario.
> > scsi_target_block() and scsi_target_unblock() are either called from 
> > a SCSI transport layer implementation (FC, iSCSI, SRP) or from a SCSI 
> > LLD kernel module (snic_disc). All these kernel modules only call 
> > scsi_target_*block() for resources (rport or SCSI target
> > respectively) that are removed before the code area of
> > these modules is released. This is why I think that calling
> > scsi_target_*block() without increasing the SCSI LLD module reference
> > count is safe.
> 
> The transport code is above the HBA module code and in that code
> unblock could be racing with module removal.  The original premise was
> that once the dev/target/host goes into DEL, nothing can call into
> queuecommand or get a reference to the device, so nothing halts removal
> after that, but you changed that with your code, which is why it's now
> unsafe.

Hello James,

Thank you for having provided more background information about the design
goals of the SCSI code. However, regarding scsi_target_*block(), I think it
is safe even for transport modules to call these functions without obtaining
an additional reference on the SCSI LLD kernel module. The transport
implementation won't attempt to block or unblock a SCSI target anymore once
unloading of the LLD text has started. SCSI LLDs release the attached
transport before unloading of the SCSI LLD kernel module text starts and
SCSI transport modules guarantee that scsi_target_*block() won't be called
anymore after the transport module has been released.
  
If you do not agree with the above please provide a call sequence for an
existing SCSI LLD or transport module that illustrates the race you
described.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] Unblock SCSI devices even if the LLD is being unloaded
  2017-03-17 12:54     ` James Bottomley
  2017-03-17 16:40       ` Bart Van Assche
@ 2017-04-10 17:46       ` Bart Van Assche
  1 sibling, 0 replies; 18+ messages in thread
From: Bart Van Assche @ 2017-04-10 17:46 UTC (permalink / raw)
  To: James.Bottomley, martin.petersen; +Cc: linux-scsi, maxg, israelr, hare

On Fri, 2017-03-17 at 05:54 -0700, James Bottomley wrote:
> but if you want to pursue your approach fixing the
> race with module exit is a requirement.

Hello James,

Sorry that it took so long but I finally found the time to implement and
test an alternative. I will post the patches that implement that new approach.

Bart.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-04-10 17:46 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-16 20:56 [PATCH 0/3] Unblock SCSI devices even if the LLD is being unloaded Bart Van Assche
2017-03-16 20:56 ` [PATCH 1/3] __scsi_iterate_devices(): Make the get and put functions arguments Bart Van Assche
2017-03-16 20:56   ` Bart Van Assche
2017-03-16 20:56 ` [PATCH 2/3] Introduce starget_for_all_devices() and shost_for_all_devices() Bart Van Assche
2017-03-16 20:56   ` Bart Van Assche
2017-03-18 17:14   ` kbuild test robot
2017-03-18 17:14     ` kbuild test robot
2017-03-16 20:56 ` [PATCH 3/3] Ensure that scsi_target_unblock() examines all devices Bart Van Assche
2017-03-16 20:56   ` Bart Van Assche
2017-03-18 20:22   ` kbuild test robot
2017-03-18 20:22     ` kbuild test robot
2017-03-16 22:53 ` [PATCH 0/3] Unblock SCSI devices even if the LLD is being unloaded James Bottomley
2017-03-16 23:19   ` Bart Van Assche
2017-03-17 12:54     ` James Bottomley
2017-03-17 16:40       ` Bart Van Assche
2017-03-18 12:44         ` James Bottomley
2017-03-18 20:49           ` Bart Van Assche
2017-04-10 17:46       ` Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.