All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v11 0/9] More device removal fixes
@ 2013-06-12 12:48 Bart Van Assche
  2013-06-12 12:49 ` [PATCH v11 1/9] Fix race between starved list and device removal Bart Van Assche
                   ` (8 more replies)
  0 siblings, 9 replies; 51+ messages in thread
From: Bart Van Assche @ 2013-06-12 12:48 UTC (permalink / raw)
  To: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn


Fix a few issues that can be triggered by removing a SCSI device:
- Fix a race between starved list processing and device removal that
   can trigger a kernel oops.
- Avoid that a SCSI LLD callback can get invoked after
   scsi_remove_host() finished.
- Speed up device removal by stopping error handling as soon as
   the SHOST_DEL or SHOST_DEL_RECOVERY state has been reached.
- Save and restore the host_scribble field during error handling.

Changes compared to v10:
- Rebased and retested on top of Linux kernel v3.10-rc5.

Changes compared to v9:
- Changed one WARN_ON() statement into a WARN() statement.

Changes compared to v8:
- Addressed the feedback from Joe Lawrence - dropped the patch that
   makes scsi_remove_host() wait until the last sdev user is gone.
- Eliminated Scsi_Host.tmf_in_progress since it duplicates state
   information available in Scsi_Host.eh_active.
- Added a patch to avoid reenabling I/O after the transport layer
   became offline.

Changes compared to v7:
- Addressed the review comments posted by Hannes Reinecke and Rolf Eike
   Beer.
- Modified patch "Make scsi_remove_host() wait until error handling
   finished" such that it is also safe for SCSI timeout values below
   the maximum LLD response time by modifying scsi_send_eh_cmnd() such
   that it does not invoke any LLD code after scsi_remove_host() started.
- Added a patch to save and restore the host_scribble field.
- Refined / clarified several patch descriptions.
- Rebased and retested on top of kernel v3.8-rc6.

Changes compared to v6:
- Dropped the first six patches since Jens queued these for 3.8.
- Added patch to avoid that __scsi_remove_device() is invoked twice.
- Restore error recovery in the SHOST_CANCEL state.

Changes compared to v5:
- Avoid that block layer work can be scheduled on a dead queue.
- Do not invoke any SCSI LLD callback after scsi_remove_host() finished.
- Stop error handling as soon as scsi_remove_host() started.
- Remove the unused function bsg_goose_queue().
- Avoid that scsi_device_set_state() triggers a race condition.

Changes compared to v4:
- Moved queue_flag_set(QUEUE_FLAG_DEAD, q) from blk_drain_queue() into
   blk_cleanup_queue().
- Declared the new __blk_run_queue_uncond() function inline. Checked in
   the generated assembler code that this function is really inlined in
   __blk_run_queue().
- Elaborated several patch descriptions.
- Added sparse annotations to scsi_request_fn().
- Split several patches.

Changes compared to v3:
- Fixed a race condition by setting QUEUE_FLAG_DEAD earlier.
- Added a patch for fixing a race between starved list processing
   and device removal to this series.

Changes compared to v2:
- Split second patch into two patches.
- Refined patch descriptions.

Changes compared to v1:
- Included a patch to rename QUEUE_FLAG_DEAD.
- Refined the descriptions of the __blk_run_queue_uncond() and
   blk_cleanup_queue() functions.
-- 
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v11 1/9] Fix race between starved list and device removal
  2013-06-12 12:48 [PATCH v11 0/9] More device removal fixes Bart Van Assche
@ 2013-06-12 12:49 ` Bart Van Assche
  2013-06-24 15:38   ` James Bottomley
  2013-06-12 12:51 ` [PATCH v11 2/9] Remove get_device() / put_device() pair from scsi_request_fn() Bart Van Assche
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 51+ messages in thread
From: Bart Van Assche @ 2013-06-12 12:49 UTC (permalink / raw)
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Hannes Reinecke, Mike Christie

scsi_run_queue() examines all SCSI devices that are present on
the starved list. Since scsi_run_queue() unlocks the SCSI host
lock before running a queue a SCSI device can get removed after
it has been removed from the starved list and before its queue
is run. Protect against that race condition by increasing the
sdev refcount before running the sdev queue. Also, remove a
SCSI device from the starved list before __scsi_remove_device()
decreases the sdev refcount such that the get_device() call
added in scsi_run_queue() is guaranteed to succeed.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reported-and-tested-by: Chanho Min <chanho.min@lge.com>
Reference: http://lkml.org/lkml/2012/8/2/96
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Hannes Reinecke <hare@suse.de>
Cc: <stable@vger.kernel.org>
---
 drivers/scsi/scsi_lib.c   |   16 +++++++++++-----
 drivers/scsi/scsi_sysfs.c |   14 +++++++++++++-
 2 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 86d5220..d6ca072 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -456,11 +456,17 @@ static void scsi_run_queue(struct request_queue *q)
 			continue;
 		}
 
-		spin_unlock(shost->host_lock);
-		spin_lock(sdev->request_queue->queue_lock);
-		__blk_run_queue(sdev->request_queue);
-		spin_unlock(sdev->request_queue->queue_lock);
-		spin_lock(shost->host_lock);
+		/*
+		 * Obtain a reference before unlocking the host_lock such that
+		 * the sdev can't disappear before blk_run_queue() is invoked.
+		 */
+		get_device(&sdev->sdev_gendev);
+		spin_unlock_irqrestore(shost->host_lock, flags);
+
+		blk_run_queue(sdev->request_queue);
+		put_device(&sdev->sdev_gendev);
+
+		spin_lock_irqsave(shost->host_lock, flags);
 	}
 	/* put any unprocessed entries back */
 	list_splice(&starved_list, &shost->starved_list);
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 931a7d9..34f1b39 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -345,7 +345,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
 	starget->reap_ref++;
 	list_del(&sdev->siblings);
 	list_del(&sdev->same_target_siblings);
-	list_del(&sdev->starved_entry);
 	spin_unlock_irqrestore(sdev->host->host_lock, flags);
 
 	cancel_work_sync(&sdev->event_work);
@@ -953,6 +952,8 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
 void __scsi_remove_device(struct scsi_device *sdev)
 {
 	struct device *dev = &sdev->sdev_gendev;
+	struct Scsi_Host *shost = sdev->host;
+	unsigned long flags;
 
 	if (sdev->is_visible) {
 		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
@@ -974,6 +975,17 @@ void __scsi_remove_device(struct scsi_device *sdev)
 	blk_cleanup_queue(sdev->request_queue);
 	cancel_work_sync(&sdev->requeue_work);
 
+	/*
+	 * Remove a SCSI device from the starved list after blk_cleanup_queue()
+	 * finished such that scsi_request_fn() can't add it back to that list.
+	 * Also remove an sdev from the starved list before invoking
+	 * put_device() such that get_device() is guaranteed to succeed for an
+	 * sdev present on the starved list.
+	 */
+	spin_lock_irqsave(shost->host_lock, flags);
+	list_del(&sdev->starved_entry);
+	spin_unlock_irqrestore(shost->host_lock, flags);
+
 	if (sdev->host->hostt->slave_destroy)
 		sdev->host->hostt->slave_destroy(sdev);
 	transport_destroy_device(dev);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v11 2/9] Remove get_device() / put_device() pair from scsi_request_fn()
  2013-06-12 12:48 [PATCH v11 0/9] More device removal fixes Bart Van Assche
  2013-06-12 12:49 ` [PATCH v11 1/9] Fix race between starved list and device removal Bart Van Assche
@ 2013-06-12 12:51 ` Bart Van Assche
  2013-06-24  1:29   ` Mike Christie
  2013-06-24  2:36   ` James Bottomley
  2013-06-12 12:52 ` [PATCH v11 3/9] Avoid calling __scsi_remove_device() twice Bart Van Assche
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 51+ messages in thread
From: Bart Van Assche @ 2013-06-12 12:51 UTC (permalink / raw)
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	James Bottomley, Hannes Reinecke, Mike Christie

Now that all scsi_request_fn() callers hold a reference on the
SCSI device that function is invoked for and since
blk_cleanup_queue() waits until scsi_request_fn() has finished
it is safe to remove the get_device() / put_device() pair from
scsi_request_fn().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/scsi_lib.c |   14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index d6ca072..03d4165 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1533,16 +1533,14 @@ static void scsi_softirq_done(struct request *rq)
  * Lock status: IO request lock assumed to be held when called.
  */
 static void scsi_request_fn(struct request_queue *q)
+	__releases(q->queue_lock)
+	__acquires(q->queue_lock)
 {
 	struct scsi_device *sdev = q->queuedata;
 	struct Scsi_Host *shost;
 	struct scsi_cmnd *cmd;
 	struct request *req;
 
-	if(!get_device(&sdev->sdev_gendev))
-		/* We must be tearing the block queue down already */
-		return;
-
 	/*
 	 * To start with, we keep looping until the queue is empty, or until
 	 * the host is no longer able to accept any more requests.
@@ -1631,7 +1629,7 @@ static void scsi_request_fn(struct request_queue *q)
 			goto out_delay;
 	}
 
-	goto out;
+	return;
 
  not_ready:
 	spin_unlock_irq(shost->host_lock);
@@ -1650,12 +1648,6 @@ static void scsi_request_fn(struct request_queue *q)
 out_delay:
 	if (sdev->device_busy == 0)
 		blk_delay_queue(q, SCSI_QUEUE_DELAY);
-out:
-	/* must be careful here...if we trigger the ->remove() function
-	 * we cannot be holding the q lock */
-	spin_unlock_irq(q->queue_lock);
-	put_device(&sdev->sdev_gendev);
-	spin_lock_irq(q->queue_lock);
 }
 
 u64 scsi_calculate_bounce_limit(struct Scsi_Host *shost)
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v11 3/9] Avoid calling __scsi_remove_device() twice
  2013-06-12 12:48 [PATCH v11 0/9] More device removal fixes Bart Van Assche
  2013-06-12 12:49 ` [PATCH v11 1/9] Fix race between starved list and device removal Bart Van Assche
  2013-06-12 12:51 ` [PATCH v11 2/9] Remove get_device() / put_device() pair from scsi_request_fn() Bart Van Assche
@ 2013-06-12 12:52 ` Bart Van Assche
  2013-06-23 21:35   ` Mike Christie
  2013-06-24 17:38   ` James Bottomley
  2013-06-12 12:53 ` [PATCH v11 4/9] Disallow changing the device state via sysfs into "deleted" Bart Van Assche
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 51+ messages in thread
From: Bart Van Assche @ 2013-06-12 12:52 UTC (permalink / raw)
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	James Bottomley, Mike Christie, Hannes Reinecke

SCSI devices are added to the shost->__devices list from inside
scsi_alloc_sdev(). If something goes wrong during LUN scanning,
e.g. a transport layer failure occurs, then __scsi_remove_device()
can get invoked by the LUN scanning code for a SCSI device in
state SDEV_CREATED_BLOCK or SDEV_BLOCKED. If this happens then
the SCSI device has not yet been added to sysfs (is_visible == 0).
Make sure that if this happens these devices are transitioned
into state SDEV_DEL. This avoids that __scsi_remove_device()
gets invoked a second time by scsi_forget_host().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Tejun Heo <tj@kernel.org>
---
 drivers/scsi/scsi_lib.c   |    2 ++
 drivers/scsi/scsi_sysfs.c |    7 ++++---
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 03d4165..d57b5e1 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2175,6 +2175,8 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
 		case SDEV_OFFLINE:
 		case SDEV_TRANSPORT_OFFLINE:
 		case SDEV_CANCEL:
+		case SDEV_BLOCK:
+		case SDEV_CREATED_BLOCK:
 			break;
 		default:
 			goto illegal;
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 34f1b39..f869ef85 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -956,8 +956,9 @@ void __scsi_remove_device(struct scsi_device *sdev)
 	unsigned long flags;
 
 	if (sdev->is_visible) {
-		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
-			return;
+		WARN(scsi_device_set_state(sdev, SDEV_CANCEL) != 0,
+		     "%s: unexpected state %d\n", dev_name(&sdev->sdev_gendev),
+		     sdev->sdev_state);
 
 		bsg_unregister_queue(sdev->request_queue);
 		device_unregister(&sdev->sdev_dev);
@@ -971,7 +972,7 @@ void __scsi_remove_device(struct scsi_device *sdev)
 	 * scsi_run_queue() invocations have finished before tearing down the
 	 * device.
 	 */
-	scsi_device_set_state(sdev, SDEV_DEL);
+	WARN_ON_ONCE(scsi_device_set_state(sdev, SDEV_DEL) != 0);
 	blk_cleanup_queue(sdev->request_queue);
 	cancel_work_sync(&sdev->requeue_work);
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v11 4/9] Disallow changing the device state via sysfs into "deleted"
  2013-06-12 12:48 [PATCH v11 0/9] More device removal fixes Bart Van Assche
                   ` (2 preceding siblings ...)
  2013-06-12 12:52 ` [PATCH v11 3/9] Avoid calling __scsi_remove_device() twice Bart Van Assche
@ 2013-06-12 12:53 ` Bart Van Assche
  2013-06-24  1:05   ` Mike Christie
  2013-06-24 17:59   ` James Bottomley
  2013-06-12 12:54 ` [PATCH v11 5/9] Avoid saving/restoring interrupt state inside scsi_remove_host() Bart Van Assche
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 51+ messages in thread
From: Bart Van Assche @ 2013-06-12 12:53 UTC (permalink / raw)
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	James Bottomley, Mike Christie, Hannes Reinecke

Changing the state of a SCSI device via sysfs into "cancel" or
"deleted" prevents removal of these devices by scsi_remove_host().
Hence do not allow this. Also, introduce the symbolic name
INVALID_SDEV_STATE, representing a value different from any valid
SCSI device state. Update scsi_device_set_state() such that gcc
does not issue a warning about an enumeration value not being
handled inside a switch statement.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Hannes Reinecke <hare@suse.de>
Cc: David Milburn <dmilburn@redhat.com>
---
 drivers/scsi/scsi_lib.c    |    2 ++
 drivers/scsi/scsi_sysfs.c  |    5 +++--
 include/scsi/scsi_device.h |    6 +++++-
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index d57b5e1..e1cc1d1 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2183,6 +2183,8 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
 		}
 		break;
 
+	case INVALID_SDEV_STATE:
+		goto illegal;
 	}
 	sdev->sdev_state = state;
 	return 0;
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index f869ef85..fe3ea686 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -594,7 +594,7 @@ store_state_field(struct device *dev, struct device_attribute *attr,
 {
 	int i;
 	struct scsi_device *sdev = to_scsi_device(dev);
-	enum scsi_device_state state = 0;
+	enum scsi_device_state state = INVALID_SDEV_STATE;
 
 	for (i = 0; i < ARRAY_SIZE(sdev_states); i++) {
 		const int len = strlen(sdev_states[i].name);
@@ -604,7 +604,8 @@ store_state_field(struct device *dev, struct device_attribute *attr,
 			break;
 		}
 	}
-	if (!state)
+	if (state == INVALID_SDEV_STATE || state == SDEV_CANCEL ||
+	    state == SDEV_DEL)
 		return -EINVAL;
 
 	if (scsi_device_set_state(sdev, state))
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index cc64587..cc1b52a 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -29,7 +29,11 @@ struct scsi_mode_data {
  * scsi_lib:scsi_device_set_state().
  */
 enum scsi_device_state {
-	SDEV_CREATED = 1,	/* device created but not added to sysfs
+	INVALID_SDEV_STATE,	/* Not a valid SCSI device state but a
+				 * symbolic name that can be used wherever
+				 * a value is needed that is different from
+				 * any valid SCSI device state. */
+	SDEV_CREATED,		/* device created but not added to sysfs
 				 * Only internal commands allowed (for inq) */
 	SDEV_RUNNING,		/* device properly configured
 				 * All commands allowed */
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v11 5/9] Avoid saving/restoring interrupt state inside scsi_remove_host()
  2013-06-12 12:48 [PATCH v11 0/9] More device removal fixes Bart Van Assche
                   ` (3 preceding siblings ...)
  2013-06-12 12:53 ` [PATCH v11 4/9] Disallow changing the device state via sysfs into "deleted" Bart Van Assche
@ 2013-06-12 12:54 ` Bart Van Assche
  2013-06-24  1:06   ` Mike Christie
  2013-06-12 12:55 ` [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished Bart Van Assche
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 51+ messages in thread
From: Bart Van Assche @ 2013-06-12 12:54 UTC (permalink / raw)
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Mike Christie, James Bottomley, Hannes Reinecke

Since it is not allowed to invoke scsi_remove_host() with interrupts
disabled, avoid saving and restoring the interrupt state inside
scsi_remove_host(). This patch does not change the functionality of
the function scsi_remove_host().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Hannes Reinecke <hare@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <JBottomley@Parallels.com>
---
 drivers/scsi/hosts.c |   12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index df0c3c7..034a567 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -156,27 +156,25 @@ EXPORT_SYMBOL(scsi_host_set_state);
  **/
 void scsi_remove_host(struct Scsi_Host *shost)
 {
-	unsigned long flags;
-
 	mutex_lock(&shost->scan_mutex);
-	spin_lock_irqsave(shost->host_lock, flags);
+	spin_lock_irq(shost->host_lock);
 	if (scsi_host_set_state(shost, SHOST_CANCEL))
 		if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY)) {
-			spin_unlock_irqrestore(shost->host_lock, flags);
+			spin_unlock_irq(shost->host_lock);
 			mutex_unlock(&shost->scan_mutex);
 			return;
 		}
-	spin_unlock_irqrestore(shost->host_lock, flags);
+	spin_unlock_irq(shost->host_lock);
 
 	scsi_autopm_get_host(shost);
 	scsi_forget_host(shost);
 	mutex_unlock(&shost->scan_mutex);
 	scsi_proc_host_rm(shost);
 
-	spin_lock_irqsave(shost->host_lock, flags);
+	spin_lock_irq(shost->host_lock);
 	if (scsi_host_set_state(shost, SHOST_DEL))
 		BUG_ON(scsi_host_set_state(shost, SHOST_DEL_RECOVERY));
-	spin_unlock_irqrestore(shost->host_lock, flags);
+	spin_unlock_irq(shost->host_lock);
 
 	transport_unregister_device(&shost->shost_gendev);
 	device_unregister(&shost->shost_dev);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished
  2013-06-12 12:48 [PATCH v11 0/9] More device removal fixes Bart Van Assche
                   ` (4 preceding siblings ...)
  2013-06-12 12:54 ` [PATCH v11 5/9] Avoid saving/restoring interrupt state inside scsi_remove_host() Bart Van Assche
@ 2013-06-12 12:55 ` Bart Van Assche
  2013-06-24  1:15   ` Mike Christie
  2013-06-24 19:19   ` James Bottomley
  2013-06-12 12:56 ` PATCH v11 7/9] Avoid that scsi_device_set_state() triggers a race Bart Van Assche
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 51+ messages in thread
From: Bart Van Assche @ 2013-06-12 12:55 UTC (permalink / raw)
  To: linux-scsi
  Cc: Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Hannes Reinecke, Mike Christie

A SCSI LLD may start cleaning up host resources as soon as
scsi_remove_host() returns. These host resources may be needed by
the LLD in an implementation of one of the eh_* functions. So if
one of the eh_* functions is in progress when scsi_remove_host()
is invoked, wait until the eh_* function has finished. Also, do
not invoke any of the eh_* functions after scsi_remove_host() has
started. Remove Scsi_Host.tmf_in_progress because it is now
superfluous.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Tejun Heo <tj@kernel.org>
---
 drivers/scsi/hosts.c      |    6 ++++
 drivers/scsi/scsi_error.c |   86 ++++++++++++++++++++++++++++++++++++++-------
 include/scsi/scsi_host.h  |    6 ++--
 3 files changed, 81 insertions(+), 17 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 034a567..17e2ccb 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -176,6 +176,12 @@ void scsi_remove_host(struct Scsi_Host *shost)
 		BUG_ON(scsi_host_set_state(shost, SHOST_DEL_RECOVERY));
 	spin_unlock_irq(shost->host_lock);
 
+	/*
+	 * Wait until the error handler has finished invoking LLD callbacks
+	 * before allowing the LLD to proceed.
+	 */
+	wait_event(shost->host_wait, shost->eh_active == 0);
+
 	transport_unregister_device(&shost->shost_gendev);
 	device_unregister(&shost->shost_dev);
 	device_del(&shost->shost_gendev);
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index f43de1e..1f2f593 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -537,8 +537,53 @@ static void scsi_eh_done(struct scsi_cmnd *scmd)
 }
 
 /**
+ * scsi_begin_eh - start host-related error handling
+ *
+ * Must be called before invoking an LLD callback function to avoid that
+ * scsi_remove_host() returns while one of these callback functions is in
+ * progress.
+ *
+ * Returns 0 if invoking an eh_* function is allowed and a negative value if
+ * not. If this function returns 0 then scsi_end_eh() must be called
+ * eventually.
+ */
+static int scsi_begin_eh(struct Scsi_Host *host)
+{
+	int res;
+
+	spin_lock_irq(host->host_lock);
+	switch (host->shost_state) {
+	case SHOST_DEL:
+	case SHOST_DEL_RECOVERY:
+		res = -ENODEV;
+		break;
+	default:
+		WARN_ON_ONCE(host->eh_active < 0);
+		host->eh_active++;
+		res = 0;
+		break;
+	}
+	spin_unlock_irq(host->host_lock);
+
+	return res;
+}
+
+/**
+ * scsi_end_eh - finish host-related error handling
+ */
+static void scsi_end_eh(struct Scsi_Host *host)
+{
+	spin_lock_irq(host->host_lock);
+	host->eh_active--;
+	WARN_ON_ONCE(host->eh_active < 0);
+	if (host->eh_active == 0)
+		wake_up(&host->host_wait);
+	spin_unlock_irq(host->host_lock);
+}
+
+/**
  * scsi_try_host_reset - ask host adapter to reset itself
- * @scmd:	SCSI cmd to send hsot reset.
+ * @scmd:	SCSI cmd to send host reset.
  */
 static int scsi_try_host_reset(struct scsi_cmnd *scmd)
 {
@@ -553,6 +598,9 @@ static int scsi_try_host_reset(struct scsi_cmnd *scmd)
 	if (!hostt->eh_host_reset_handler)
 		return FAILED;
 
+	if (scsi_begin_eh(host))
+		return FAST_IO_FAIL;
+
 	rtn = hostt->eh_host_reset_handler(scmd);
 
 	if (rtn == SUCCESS) {
@@ -562,6 +610,7 @@ static int scsi_try_host_reset(struct scsi_cmnd *scmd)
 		scsi_report_bus_reset(host, scmd_channel(scmd));
 		spin_unlock_irqrestore(host->host_lock, flags);
 	}
+	scsi_end_eh(host);
 
 	return rtn;
 }
@@ -583,6 +632,9 @@ static int scsi_try_bus_reset(struct scsi_cmnd *scmd)
 	if (!hostt->eh_bus_reset_handler)
 		return FAILED;
 
+	if (scsi_begin_eh(host))
+		return FAST_IO_FAIL;
+
 	rtn = hostt->eh_bus_reset_handler(scmd);
 
 	if (rtn == SUCCESS) {
@@ -592,6 +644,7 @@ static int scsi_try_bus_reset(struct scsi_cmnd *scmd)
 		scsi_report_bus_reset(host, scmd_channel(scmd));
 		spin_unlock_irqrestore(host->host_lock, flags);
 	}
+	scsi_end_eh(host);
 
 	return rtn;
 }
@@ -622,6 +675,9 @@ static int scsi_try_target_reset(struct scsi_cmnd *scmd)
 	if (!hostt->eh_target_reset_handler)
 		return FAILED;
 
+	if (scsi_begin_eh(host))
+		return FAST_IO_FAIL;
+
 	rtn = hostt->eh_target_reset_handler(scmd);
 	if (rtn == SUCCESS) {
 		spin_lock_irqsave(host->host_lock, flags);
@@ -629,6 +685,7 @@ static int scsi_try_target_reset(struct scsi_cmnd *scmd)
 					  __scsi_report_device_reset);
 		spin_unlock_irqrestore(host->host_lock, flags);
 	}
+	scsi_end_eh(host);
 
 	return rtn;
 }
@@ -646,14 +703,20 @@ static int scsi_try_target_reset(struct scsi_cmnd *scmd)
 static int scsi_try_bus_device_reset(struct scsi_cmnd *scmd)
 {
 	int rtn;
-	struct scsi_host_template *hostt = scmd->device->host->hostt;
+	struct Scsi_Host *host = scmd->device->host;
+	struct scsi_host_template *hostt = host->hostt;
 
 	if (!hostt->eh_device_reset_handler)
 		return FAILED;
 
+	if (scsi_begin_eh(host))
+		return FAST_IO_FAIL;
+
 	rtn = hostt->eh_device_reset_handler(scmd);
 	if (rtn == SUCCESS)
 		__scsi_report_device_reset(scmd->device, NULL);
+	scsi_end_eh(host);
+
 	return rtn;
 }
 
@@ -797,6 +860,9 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 	const unsigned long stall_for = msecs_to_jiffies(100);
 	int rtn;
 
+	if (scsi_begin_eh(shost))
+		return FAILED;
+
 retry:
 	scsi_eh_prep_cmnd(scmd, &ses, cmnd, cmnd_size, sense_bytes);
 	shost->eh_action = &done;
@@ -867,6 +933,8 @@ retry:
 			rtn = sdrv->eh_action(scmd, cmnd, cmnd_size, rtn);
 	}
 
+	scsi_end_eh(shost);
+
 	return rtn;
 }
 
@@ -1894,6 +1962,9 @@ int scsi_error_handler(void *data)
 	}
 	__set_current_state(TASK_RUNNING);
 
+	WARN_ONCE(shost->eh_active, "scsi_eh_%d: eh_active = %d\n",
+		  shost->host_no, shost->eh_active);
+
 	SCSI_LOG_ERROR_RECOVERY(1,
 		printk("Error handler scsi_eh_%d exiting\n", shost->host_no));
 	shost->ehandler = NULL;
@@ -1990,7 +2061,6 @@ scsi_reset_provider(struct scsi_device *dev, int flag)
 	struct scsi_cmnd *scmd;
 	struct Scsi_Host *shost = dev->host;
 	struct request req;
-	unsigned long flags;
 	int rtn;
 
 	if (scsi_autopm_get_host(shost) < 0)
@@ -2009,10 +2079,6 @@ scsi_reset_provider(struct scsi_device *dev, int flag)
 
 	scmd->sc_data_direction		= DMA_BIDIRECTIONAL;
 
-	spin_lock_irqsave(shost->host_lock, flags);
-	shost->tmf_in_progress = 1;
-	spin_unlock_irqrestore(shost->host_lock, flags);
-
 	switch (flag) {
 	case SCSI_TRY_RESET_DEVICE:
 		rtn = scsi_try_bus_device_reset(scmd);
@@ -2036,10 +2102,6 @@ scsi_reset_provider(struct scsi_device *dev, int flag)
 		rtn = FAILED;
 	}
 
-	spin_lock_irqsave(shost->host_lock, flags);
-	shost->tmf_in_progress = 0;
-	spin_unlock_irqrestore(shost->host_lock, flags);
-
 	/*
 	 * be sure to wake up anyone who was sleeping or had their queue
 	 * suspended while we performed the TMF.
@@ -2048,8 +2110,6 @@ scsi_reset_provider(struct scsi_device *dev, int flag)
 		printk("%s: waking up host to restart after TMF\n",
 		__func__));
 
-	wake_up(&shost->host_wait);
-
 	scsi_run_host_queues(shost);
 
 	scsi_next_command(scmd);
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 7552435..9785e51 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -578,6 +578,7 @@ struct Scsi_Host {
 	struct task_struct    * ehandler;  /* Error recovery thread. */
 	struct completion     * eh_action; /* Wait for specific actions on the
 					      host. */
+	int			eh_active;
 	wait_queue_head_t       host_wait;
 	struct scsi_host_template *hostt;
 	struct scsi_transport_template *transportt;
@@ -665,9 +666,6 @@ struct Scsi_Host {
 	 */
 	unsigned ordered_tag:1;
 
-	/* Task mgmt function in progress */
-	unsigned tmf_in_progress:1;
-
 	/* Asynchronous scan in progress */
 	unsigned async_scan:1;
 
@@ -771,7 +769,7 @@ static inline int scsi_host_in_recovery(struct Scsi_Host *shost)
 	return shost->shost_state == SHOST_RECOVERY ||
 		shost->shost_state == SHOST_CANCEL_RECOVERY ||
 		shost->shost_state == SHOST_DEL_RECOVERY ||
-		shost->tmf_in_progress;
+		shost->eh_active;
 }
 
 extern int scsi_queue_work(struct Scsi_Host *, struct work_struct *);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* PATCH v11 7/9] Avoid that scsi_device_set_state() triggers a race
  2013-06-12 12:48 [PATCH v11 0/9] More device removal fixes Bart Van Assche
                   ` (5 preceding siblings ...)
  2013-06-12 12:55 ` [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished Bart Van Assche
@ 2013-06-12 12:56 ` Bart Van Assche
  2013-06-12 12:57 ` [PATCH v11 8/9] Save and restore host_scribble during error handling Bart Van Assche
  2013-06-12 12:58 ` [PATCH v11 9/9] Avoid reenabling I/O after the transport became offline Bart Van Assche
  8 siblings, 0 replies; 51+ messages in thread
From: Bart Van Assche @ 2013-06-12 12:56 UTC (permalink / raw)
  To: linux-scsi
  Cc: Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	James Bottomley, Mike Christie, Hannes Reinecke

Make concurrent invocations of scsi_device_set_state() safe.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Hannes Reinecke <hare@suse.de>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/scsi_error.c |    4 ++++
 drivers/scsi/scsi_lib.c   |   43 ++++++++++++++++++++++++++++++++++---------
 drivers/scsi/scsi_scan.c  |   15 ++++++++-------
 drivers/scsi/scsi_sysfs.c |   18 ++++++++++++++----
 4 files changed, 60 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 1f2f593..2c99eab 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1448,7 +1448,11 @@ static void scsi_eh_offline_sdevs(struct list_head *work_q,
 	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
 		sdev_printk(KERN_INFO, scmd->device, "Device offlined - "
 			    "not ready after error recovery\n");
+
+		spin_lock_irq(scmd->device->host->host_lock);
 		scsi_device_set_state(scmd->device, SDEV_OFFLINE);
+		spin_unlock_irq(scmd->device->host->host_lock);
+
 		if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD) {
 			/*
 			 * FIXME: Handle lost cmds.
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index e1cc1d1..10c054f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2078,7 +2078,9 @@ EXPORT_SYMBOL(scsi_test_unit_ready);
  *	@state:	state to change to.
  *
  *	Returns zero if unsuccessful or an error if the requested 
- *	transition is illegal.
+ *	transition is illegal. It is the responsibility of the caller to make
+ *      sure that a call of this function does not race with other code that
+ *      accesses the device state, e.g. by holding the host lock.
  */
 int
 scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
@@ -2359,7 +2361,13 @@ EXPORT_SYMBOL_GPL(sdev_evt_send_simple);
 int
 scsi_device_quiesce(struct scsi_device *sdev)
 {
-	int err = scsi_device_set_state(sdev, SDEV_QUIESCE);
+	struct Scsi_Host *host = sdev->host;
+	int err;
+
+	spin_lock_irq(host->host_lock);
+	err = scsi_device_set_state(sdev, SDEV_QUIESCE);
+	spin_unlock_irq(host->host_lock);
+
 	if (err)
 		return err;
 
@@ -2383,13 +2391,21 @@ EXPORT_SYMBOL(scsi_device_quiesce);
  */
 void scsi_device_resume(struct scsi_device *sdev)
 {
+	struct Scsi_Host *host = sdev->host;
+	int err;
+
 	/* check if the device state was mutated prior to resume, and if
 	 * so assume the state is being managed elsewhere (for example
 	 * device deleted during suspend)
 	 */
-	if (sdev->sdev_state != SDEV_QUIESCE ||
-	    scsi_device_set_state(sdev, SDEV_RUNNING))
+	spin_lock_irq(host->host_lock);
+	err = sdev->sdev_state == SDEV_QUIESCE ?
+		scsi_device_set_state(sdev, SDEV_RUNNING) : -EINVAL;
+	spin_unlock_irq(host->host_lock);
+
+	if (err)
 		return;
+
 	scsi_run_queue(sdev->request_queue);
 }
 EXPORT_SYMBOL(scsi_device_resume);
@@ -2439,17 +2455,19 @@ EXPORT_SYMBOL(scsi_target_resume);
 int
 scsi_internal_device_block(struct scsi_device *sdev)
 {
+	struct Scsi_Host *host = sdev->host;
 	struct request_queue *q = sdev->request_queue;
 	unsigned long flags;
 	int err = 0;
 
+	spin_lock_irqsave(host->host_lock, flags);
 	err = scsi_device_set_state(sdev, SDEV_BLOCK);
-	if (err) {
+	if (err)
 		err = scsi_device_set_state(sdev, SDEV_CREATED_BLOCK);
+	spin_unlock_irqrestore(host->host_lock, flags);
 
-		if (err)
-			return err;
-	}
+	if (err)
+		return err;
 
 	/* 
 	 * The device has transitioned to SDEV_BLOCK.  Stop the
@@ -2484,13 +2502,16 @@ int
 scsi_internal_device_unblock(struct scsi_device *sdev,
 			     enum scsi_device_state new_state)
 {
+	struct Scsi_Host *host = sdev->host;
 	struct request_queue *q = sdev->request_queue; 
 	unsigned long flags;
+	int ret = 0;
 
 	/*
 	 * Try to transition the scsi device to SDEV_RUNNING or one of the
 	 * offlined states and goose the device queue if successful.
 	 */
+	spin_lock_irqsave(host->host_lock, flags);
 	if ((sdev->sdev_state == SDEV_BLOCK) ||
 	    (sdev->sdev_state == SDEV_TRANSPORT_OFFLINE))
 		sdev->sdev_state = new_state;
@@ -2502,7 +2523,11 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
 			sdev->sdev_state = SDEV_CREATED;
 	} else if (sdev->sdev_state != SDEV_CANCEL &&
 		 sdev->sdev_state != SDEV_OFFLINE)
-		return -EINVAL;
+		ret = -EINVAL;
+	spin_unlock_irqrestore(host->host_lock, flags);
+
+	if (ret)
+		return ret;
 
 	spin_lock_irqsave(q->queue_lock, flags);
 	blk_start_queue(q);
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 3e58b22..5041aa8 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -898,18 +898,19 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
 	if (*bflags & BLIST_USE_10_BYTE_MS)
 		sdev->use_10_for_ms = 1;
 
+	spin_lock_irq(sdev->host->host_lock);
 	/* set the device running here so that slave configure
 	 * may do I/O */
 	ret = scsi_device_set_state(sdev, SDEV_RUNNING);
-	if (ret) {
+	if (ret)
 		ret = scsi_device_set_state(sdev, SDEV_BLOCK);
+	spin_unlock_irq(sdev->host->host_lock);
 
-		if (ret) {
-			sdev_printk(KERN_ERR, sdev,
-				    "in wrong state %s to complete scan\n",
-				    scsi_device_state_name(sdev->sdev_state));
-			return SCSI_SCAN_NO_RESPONSE;
-		}
+	if (ret) {
+		sdev_printk(KERN_ERR, sdev,
+			    "in wrong state %s to complete scan\n",
+			    scsi_device_state_name(sdev->sdev_state));
+		return SCSI_SCAN_NO_RESPONSE;
 	}
 
 	if (*bflags & BLIST_MS_192_BYTES_FOR_3F)
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index fe3ea686..11318cc 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -592,7 +592,7 @@ static ssize_t
 store_state_field(struct device *dev, struct device_attribute *attr,
 		  const char *buf, size_t count)
 {
-	int i;
+	int i, ret;
 	struct scsi_device *sdev = to_scsi_device(dev);
 	enum scsi_device_state state = INVALID_SDEV_STATE;
 
@@ -608,9 +608,11 @@ store_state_field(struct device *dev, struct device_attribute *attr,
 	    state == SDEV_DEL)
 		return -EINVAL;
 
-	if (scsi_device_set_state(sdev, state))
-		return -EINVAL;
-	return count;
+	spin_lock_irq(sdev->host->host_lock);
+	ret = scsi_device_set_state(sdev, state);
+	spin_unlock_irq(sdev->host->host_lock);
+
+	return ret < 0 ? ret : count;
 }
 
 static ssize_t
@@ -870,7 +872,10 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
 	struct request_queue *rq = sdev->request_queue;
 	struct scsi_target *starget = sdev->sdev_target;
 
+	spin_lock_irq(sdev->host->host_lock);
 	error = scsi_device_set_state(sdev, SDEV_RUNNING);
+	spin_unlock_irq(sdev->host->host_lock);
+
 	if (error)
 		return error;
 
@@ -957,9 +962,11 @@ void __scsi_remove_device(struct scsi_device *sdev)
 	unsigned long flags;
 
 	if (sdev->is_visible) {
+		spin_lock_irqsave(shost->host_lock, flags);
 		WARN(scsi_device_set_state(sdev, SDEV_CANCEL) != 0,
 		     "%s: unexpected state %d\n", dev_name(&sdev->sdev_gendev),
 		     sdev->sdev_state);
+		spin_unlock_irqrestore(shost->host_lock, flags);
 
 		bsg_unregister_queue(sdev->request_queue);
 		device_unregister(&sdev->sdev_dev);
@@ -973,7 +980,10 @@ void __scsi_remove_device(struct scsi_device *sdev)
 	 * scsi_run_queue() invocations have finished before tearing down the
 	 * device.
 	 */
+	spin_lock_irqsave(shost->host_lock, flags);
 	WARN_ON_ONCE(scsi_device_set_state(sdev, SDEV_DEL) != 0);
+	spin_unlock_irqrestore(shost->host_lock, flags);
+
 	blk_cleanup_queue(sdev->request_queue);
 	cancel_work_sync(&sdev->requeue_work);
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v11 8/9] Save and restore host_scribble during error handling
  2013-06-12 12:48 [PATCH v11 0/9] More device removal fixes Bart Van Assche
                   ` (6 preceding siblings ...)
  2013-06-12 12:56 ` PATCH v11 7/9] Avoid that scsi_device_set_state() triggers a race Bart Van Assche
@ 2013-06-12 12:57 ` Bart Van Assche
  2013-06-24  1:21   ` Mike Christie
  2013-06-12 12:58 ` [PATCH v11 9/9] Avoid reenabling I/O after the transport became offline Bart Van Assche
  8 siblings, 1 reply; 51+ messages in thread
From: Bart Van Assche @ 2013-06-12 12:57 UTC (permalink / raw)
  To: linux-scsi
  Cc: Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Mike Christie, Hannes Reinecke, James Bottomley

A SCSI LLD may overwrite host_scribble in its queuecommand()
implementation. Several drivers need that field to process
requests and aborts correctly. Hence this field must be saved
by scsi_eh_prep_cmnd() and must be restored by
scsi_eh_restore_cmnd().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Tejun Heo <tj@kernel.org>
---
 drivers/scsi/scsi_error.c |    2 ++
 include/scsi/scsi_eh.h    |    1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 2c99eab..728d573 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -771,6 +771,7 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses,
 	ses->result = scmd->result;
 	ses->underflow = scmd->underflow;
 	ses->prot_op = scmd->prot_op;
+	ses->host_scribble = scmd->host_scribble;
 
 	scmd->prot_op = SCSI_PROT_NORMAL;
 	scmd->cmnd = ses->eh_cmnd;
@@ -832,6 +833,7 @@ void scsi_eh_restore_cmnd(struct scsi_cmnd* scmd, struct scsi_eh_save *ses)
 	scmd->result = ses->result;
 	scmd->underflow = ses->underflow;
 	scmd->prot_op = ses->prot_op;
+	scmd->host_scribble = ses->host_scribble;
 }
 EXPORT_SYMBOL(scsi_eh_restore_cmnd);
 
diff --git a/include/scsi/scsi_eh.h b/include/scsi/scsi_eh.h
index 06a8790..7bdb02f 100644
--- a/include/scsi/scsi_eh.h
+++ b/include/scsi/scsi_eh.h
@@ -83,6 +83,7 @@ struct scsi_eh_save {
 	/* new command support */
 	unsigned char eh_cmnd[BLK_MAX_CDB];
 	struct scatterlist sense_sgl;
+	unsigned char *host_scribble;
 };
 
 extern void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd,
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v11 9/9] Avoid reenabling I/O after the transport became offline
  2013-06-12 12:48 [PATCH v11 0/9] More device removal fixes Bart Van Assche
                   ` (7 preceding siblings ...)
  2013-06-12 12:57 ` [PATCH v11 8/9] Save and restore host_scribble during error handling Bart Van Assche
@ 2013-06-12 12:58 ` Bart Van Assche
  8 siblings, 0 replies; 51+ messages in thread
From: Bart Van Assche @ 2013-06-12 12:58 UTC (permalink / raw)
  To: linux-scsi
  Cc: Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	James Bottomley, Mike Christie, Hannes Reinecke

Disallow the SDEV_TRANSPORT_OFFLINE to SDEV_CANCEL transition such
that no I/O is sent to devices for which the transport is offline.
Notes:
- Functions like sd_shutdown() use scsi_execute_req() and hence
  set the REQ_PREEMPT flag. Such requests are passed to the LLD
  queuecommand callback in the SDEV_CANCEL state.
- This patch does not affect Fibre Channel LLD drivers since these
  drivers invoke fc_remote_port_chkready() before submitting a SCSI
  request to the HBA. That prevents a timeout to occur in state
  SDEV_CANCEL if the transport is offline.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Tejun Heo <tj@kernel.org>
---
 drivers/scsi/scsi_lib.c   |    1 -
 drivers/scsi/scsi_sysfs.c |    3 ++-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 10c054f..024e768 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2162,7 +2162,6 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
 		case SDEV_RUNNING:
 		case SDEV_QUIESCE:
 		case SDEV_OFFLINE:
-		case SDEV_TRANSPORT_OFFLINE:
 		case SDEV_BLOCK:
 			break;
 		default:
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 11318cc..dd2005c 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -963,7 +963,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
 
 	if (sdev->is_visible) {
 		spin_lock_irqsave(shost->host_lock, flags);
-		WARN(scsi_device_set_state(sdev, SDEV_CANCEL) != 0,
+		WARN(scsi_device_set_state(sdev, SDEV_CANCEL) != 0 &&
+		     sdev->sdev_state != SDEV_TRANSPORT_OFFLINE,
 		     "%s: unexpected state %d\n", dev_name(&sdev->sdev_gendev),
 		     sdev->sdev_state);
 		spin_unlock_irqrestore(shost->host_lock, flags);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 3/9] Avoid calling __scsi_remove_device() twice
  2013-06-12 12:52 ` [PATCH v11 3/9] Avoid calling __scsi_remove_device() twice Bart Van Assche
@ 2013-06-23 21:35   ` Mike Christie
  2013-06-24  6:29     ` Bart Van Assche
  2013-06-24 17:38   ` James Bottomley
  1 sibling, 1 reply; 51+ messages in thread
From: Mike Christie @ 2013-06-23 21:35 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	James Bottomley, Hannes Reinecke

On 06/12/2013 07:52 AM, Bart Van Assche wrote:
> SCSI devices are added to the shost->__devices list from inside
> scsi_alloc_sdev(). If something goes wrong during LUN scanning,
> e.g. a transport layer failure occurs, then __scsi_remove_device()
> can get invoked by the LUN scanning code for a SCSI device in
> state SDEV_CREATED_BLOCK or SDEV_BLOCKED. If this happens then
> the SCSI device has not yet been added to sysfs (is_visible == 0).
> Make sure that if this happens these devices are transitioned
> into state SDEV_DEL. This avoids that __scsi_remove_device()
> gets invoked a second time by scsi_forget_host().
> 
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Cc: James Bottomley <JBottomley@Parallels.com>
> Cc: Mike Christie <michaelc@cs.wisc.edu>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Tejun Heo <tj@kernel.org>
> ---
>  drivers/scsi/scsi_lib.c   |    2 ++
>  drivers/scsi/scsi_sysfs.c |    7 ++++---
>  2 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 03d4165..d57b5e1 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2175,6 +2175,8 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
>  		case SDEV_OFFLINE:
>  		case SDEV_TRANSPORT_OFFLINE:
>  		case SDEV_CANCEL:
> +		case SDEV_BLOCK:
> +		case SDEV_CREATED_BLOCK:
>  			break;
>  		default:
>  			goto illegal;
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 34f1b39..f869ef85 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -956,8 +956,9 @@ void __scsi_remove_device(struct scsi_device *sdev)
>  	unsigned long flags;
>  
>  	if (sdev->is_visible) {
> -		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
> -			return;
> +		WARN(scsi_device_set_state(sdev, SDEV_CANCEL) != 0,
> +		     "%s: unexpected state %d\n", dev_name(&sdev->sdev_gendev),
> +		     sdev->sdev_state);
>  
>  		bsg_unregister_queue(sdev->request_queue);
>  		device_unregister(&sdev->sdev_dev);
> @@ -971,7 +972,7 @@ void __scsi_remove_device(struct scsi_device *sdev)
>  	 * scsi_run_queue() invocations have finished before tearing down the
>  	 * device.
>  	 */
> -	scsi_device_set_state(sdev, SDEV_DEL);
> +	WARN_ON_ONCE(scsi_device_set_state(sdev, SDEV_DEL) != 0);
>  	blk_cleanup_queue(sdev->request_queue);
>  	cancel_work_sync(&sdev->requeue_work);
>  
> 



The Addition to scsi_device_set_state looks ok. I was not sure why we
need the new WARNs though. Do we still think there are other cases?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 4/9] Disallow changing the device state via sysfs into "deleted"
  2013-06-12 12:53 ` [PATCH v11 4/9] Disallow changing the device state via sysfs into "deleted" Bart Van Assche
@ 2013-06-24  1:05   ` Mike Christie
  2013-06-24  6:35     ` Bart Van Assche
  2013-06-24 17:59   ` James Bottomley
  1 sibling, 1 reply; 51+ messages in thread
From: Mike Christie @ 2013-06-24  1:05 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	James Bottomley, Hannes Reinecke

On 6/12/13 7:53 AM, Bart Van Assche wrote:
> Changing the state of a SCSI device via sysfs into "cancel" or
> "deleted" prevents removal of these devices by scsi_remove_host().
> Hence do not allow this. Also, introduce the symbolic name
> INVALID_SDEV_STATE, representing a value different from any valid


Maybe it should be named SDEV_INVALID_STATE to match the current naming 
scheme? Other than that, it seems ok.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 5/9] Avoid saving/restoring interrupt state inside scsi_remove_host()
  2013-06-12 12:54 ` [PATCH v11 5/9] Avoid saving/restoring interrupt state inside scsi_remove_host() Bart Van Assche
@ 2013-06-24  1:06   ` Mike Christie
  0 siblings, 0 replies; 51+ messages in thread
From: Mike Christie @ 2013-06-24  1:06 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	James Bottomley, Hannes Reinecke

On 6/12/13 7:54 AM, Bart Van Assche wrote:
> Since it is not allowed to invoke scsi_remove_host() with interrupts
> disabled, avoid saving and restoring the interrupt state inside
> scsi_remove_host(). This patch does not change the functionality of
> the function scsi_remove_host().
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Acked-by: Tejun Heo <tj@kernel.org>
> Acked-by: Hannes Reinecke <hare@suse.de>
> Cc: Mike Christie <michaelc@cs.wisc.edu>
> Cc: James Bottomley <JBottomley@Parallels.com>


Seems ok.

Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished
  2013-06-12 12:55 ` [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished Bart Van Assche
@ 2013-06-24  1:15   ` Mike Christie
  2013-06-24  6:49     ` Bart Van Assche
  2013-06-24 19:19   ` James Bottomley
  1 sibling, 1 reply; 51+ messages in thread
From: Mike Christie @ 2013-06-24  1:15 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Hannes Reinecke

On 6/12/13 7:55 AM, Bart Van Assche wrote:
> A SCSI LLD may start cleaning up host resources as soon as
> scsi_remove_host() returns. These host resources may be needed by
> the LLD in an implementation of one of the eh_* functions. So if
> one of the eh_* functions is in progress when scsi_remove_host()
> is invoked, wait until the eh_* function has finished. Also, do
> not invoke any of the eh_* functions after scsi_remove_host() has
> started. Remove Scsi_Host.tmf_in_progress because it is now
> superfluous.
>

I think the patch looks ok for drivers that do not implement their own
eh_strategy_handler, but what about SAS? If you added a scsi_begin_eh in 
scsi_error_handler before the eh_strategy_handler is called and then add 
a scsi_end_eh after it is called, I think it would cover them too.

> @@ -1894,6 +1962,9 @@ int scsi_error_handler(void *data)
>   	}
>   	__set_current_state(TASK_RUNNING);
>
> +	WARN_ONCE(shost->eh_active, "scsi_eh_%d: eh_active = %d\n",
> +		  shost->host_no, shost->eh_active);
> +
>   	SCSI_LOG_ERROR_RECOVERY(1,
>   		printk("Error handler scsi_eh_%d exiting\n", shost->host_no));
>   	shost->ehandler = NULL;

What is the warn for? Is there a chance this can happen with some non 
upstream driver or are you just adding it just in case?


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 8/9] Save and restore host_scribble during error handling
  2013-06-12 12:57 ` [PATCH v11 8/9] Save and restore host_scribble during error handling Bart Van Assche
@ 2013-06-24  1:21   ` Mike Christie
  2013-06-24  2:08     ` James Bottomley
  0 siblings, 1 reply; 51+ messages in thread
From: Mike Christie @ 2013-06-24  1:21 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Hannes Reinecke, James Bottomley

On 6/12/13 7:57 AM, Bart Van Assche wrote:
> A SCSI LLD may overwrite host_scribble in its queuecommand()
> implementation. Several drivers need that field to process
> requests and aborts correctly. Hence this field must be saved
> by scsi_eh_prep_cmnd() and must be restored by
> scsi_eh_restore_cmnd().
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Cc: James Bottomley <JBottomley@Parallels.com>
> Cc: Mike Christie <michaelc@cs.wisc.edu>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Tejun Heo <tj@kernel.org>


How have we not hit this before :) ?

Looks ok.

Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 2/9] Remove get_device() / put_device() pair from scsi_request_fn()
  2013-06-12 12:51 ` [PATCH v11 2/9] Remove get_device() / put_device() pair from scsi_request_fn() Bart Van Assche
@ 2013-06-24  1:29   ` Mike Christie
  2013-06-24  2:36   ` James Bottomley
  1 sibling, 0 replies; 51+ messages in thread
From: Mike Christie @ 2013-06-24  1:29 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	James Bottomley, Hannes Reinecke

On 6/12/13 7:51 AM, Bart Van Assche wrote:
> Now that all scsi_request_fn() callers hold a reference on the
> SCSI device that function is invoked for and since
> blk_cleanup_queue() waits until scsi_request_fn() has finished
> it is safe to remove the get_device() / put_device() pair from
> scsi_request_fn().
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Acked-by: Tejun Heo <tj@kernel.org>
> Cc: James Bottomley <JBottomley@Parallels.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Mike Christie <michaelc@cs.wisc.edu>
> ---


Looks ok.

Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 8/9] Save and restore host_scribble during error handling
  2013-06-24  1:21   ` Mike Christie
@ 2013-06-24  2:08     ` James Bottomley
  0 siblings, 0 replies; 51+ messages in thread
From: James Bottomley @ 2013-06-24  2:08 UTC (permalink / raw)
  To: Mike Christie
  Cc: Bart Van Assche, linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min,
	David Milburn, Hannes Reinecke

On Sun, 2013-06-23 at 20:21 -0500, Mike Christie wrote:
> On 6/12/13 7:57 AM, Bart Van Assche wrote:
> > A SCSI LLD may overwrite host_scribble in its queuecommand()
> > implementation. Several drivers need that field to process
> > requests and aborts correctly. Hence this field must be saved
> > by scsi_eh_prep_cmnd() and must be restored by
> > scsi_eh_restore_cmnd().
> >
> > Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> > Cc: James Bottomley <JBottomley@Parallels.com>
> > Cc: Mike Christie <michaelc@cs.wisc.edu>
> > Cc: Hannes Reinecke <hare@suse.de>
> > Cc: Tejun Heo <tj@kernel.org>
> 
> 
> How have we not hit this before :) ?

Well, because the patch is making bogus assumptions and then acting on
them.   host_scribble may only exist for the lifetime of the command ...
meaning from when we pass the command into the hba to when it comes
back.  When we begin error recovery, that lifetime is over, so
host_scribble should be cleared at this point to reuse the command
because that's how it is on a pristine command, so this patch, if
applied, would cause us to violate that assumption.

Fortunately, I don't think any HBA driver expects host_scribble to be
NULL, so they all unconditionally overwrite the value in their
queuecommand routines if they use it, so I think the net effect of
applying this would have been zero, but it could easily have been
unfortunate.

> Looks ok.

Oops, minus 10 review kudos points for you.  However, you seem to have a
stock of several million of these built up over the years, so the net
effect on your maintainer trust rating for reviews is zero.

James


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 2/9] Remove get_device() / put_device() pair from scsi_request_fn()
  2013-06-12 12:51 ` [PATCH v11 2/9] Remove get_device() / put_device() pair from scsi_request_fn() Bart Van Assche
  2013-06-24  1:29   ` Mike Christie
@ 2013-06-24  2:36   ` James Bottomley
  2013-06-24  7:13     ` Bart Van Assche
  1 sibling, 1 reply; 51+ messages in thread
From: James Bottomley @ 2013-06-24  2:36 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Hannes Reinecke, Mike Christie

On Wed, 2013-06-12 at 14:51 +0200, Bart Van Assche wrote:
> Now that all scsi_request_fn() callers hold a reference on the
> SCSI device that function is invoked for

What makes you think that this is a true statement?  The usual caller is
the block layer, which doesn't really know anything about the
sdev->sdev_gendev.

James


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 3/9] Avoid calling __scsi_remove_device() twice
  2013-06-23 21:35   ` Mike Christie
@ 2013-06-24  6:29     ` Bart Van Assche
  0 siblings, 0 replies; 51+ messages in thread
From: Bart Van Assche @ 2013-06-24  6:29 UTC (permalink / raw)
  To: Mike Christie
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	James Bottomley, Hannes Reinecke

On 06/23/13 23:35, Mike Christie wrote:
> On 06/12/2013 07:52 AM, Bart Van Assche wrote:
>> SCSI devices are added to the shost->__devices list from inside
>> scsi_alloc_sdev(). If something goes wrong during LUN scanning,
>> e.g. a transport layer failure occurs, then __scsi_remove_device()
>> can get invoked by the LUN scanning code for a SCSI device in
>> state SDEV_CREATED_BLOCK or SDEV_BLOCKED. If this happens then
>> the SCSI device has not yet been added to sysfs (is_visible == 0).
>> Make sure that if this happens these devices are transitioned
>> into state SDEV_DEL. This avoids that __scsi_remove_device()
>> gets invoked a second time by scsi_forget_host().
>>
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> Cc: James Bottomley <JBottomley@Parallels.com>
>> Cc: Mike Christie <michaelc@cs.wisc.edu>
>> Cc: Hannes Reinecke <hare@suse.de>
>> Cc: Tejun Heo <tj@kernel.org>
>> ---
>>   drivers/scsi/scsi_lib.c   |    2 ++
>>   drivers/scsi/scsi_sysfs.c |    7 ++++---
>>   2 files changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index 03d4165..d57b5e1 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -2175,6 +2175,8 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
>>   		case SDEV_OFFLINE:
>>   		case SDEV_TRANSPORT_OFFLINE:
>>   		case SDEV_CANCEL:
>> +		case SDEV_BLOCK:
>> +		case SDEV_CREATED_BLOCK:
>>   			break;
>>   		default:
>>   			goto illegal;
>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>> index 34f1b39..f869ef85 100644
>> --- a/drivers/scsi/scsi_sysfs.c
>> +++ b/drivers/scsi/scsi_sysfs.c
>> @@ -956,8 +956,9 @@ void __scsi_remove_device(struct scsi_device *sdev)
>>   	unsigned long flags;
>>
>>   	if (sdev->is_visible) {
>> -		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
>> -			return;
>> +		WARN(scsi_device_set_state(sdev, SDEV_CANCEL) != 0,
>> +		     "%s: unexpected state %d\n", dev_name(&sdev->sdev_gendev),
>> +		     sdev->sdev_state);
>>
>>   		bsg_unregister_queue(sdev->request_queue);
>>   		device_unregister(&sdev->sdev_dev);
>> @@ -971,7 +972,7 @@ void __scsi_remove_device(struct scsi_device *sdev)
>>   	 * scsi_run_queue() invocations have finished before tearing down the
>>   	 * device.
>>   	 */
>> -	scsi_device_set_state(sdev, SDEV_DEL);
>> +	WARN_ON_ONCE(scsi_device_set_state(sdev, SDEV_DEL) != 0);
>>   	blk_cleanup_queue(sdev->request_queue);
>>   	cancel_work_sync(&sdev->requeue_work);
>
> The Addition to scsi_device_set_state looks ok. I was not sure why we
> need the new WARNs though. Do we still think there are other cases?

Hello Mike,

The new WARN statements helped me while testing and debugging this code. 
These statements might also be helpful if anyone would ever add more 
SCSI device states. I can leave these out if you want.

Bart.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 4/9] Disallow changing the device state via sysfs into "deleted"
  2013-06-24  1:05   ` Mike Christie
@ 2013-06-24  6:35     ` Bart Van Assche
  0 siblings, 0 replies; 51+ messages in thread
From: Bart Van Assche @ 2013-06-24  6:35 UTC (permalink / raw)
  To: Mike Christie
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	James Bottomley, Hannes Reinecke

On 06/24/13 03:05, Mike Christie wrote:
> On 6/12/13 7:53 AM, Bart Van Assche wrote:
>> Changing the state of a SCSI device via sysfs into "cancel" or
>> "deleted" prevents removal of these devices by scsi_remove_host().
>> Hence do not allow this. Also, introduce the symbolic name
>> INVALID_SDEV_STATE, representing a value different from any valid
>
> Maybe it should be named SDEV_INVALID_STATE to match the current naming
> scheme? Other than that, it seems ok.

If nobody objects I will rename that new state into SDEV_INVALID_STATE.

Bart.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished
  2013-06-24  1:15   ` Mike Christie
@ 2013-06-24  6:49     ` Bart Van Assche
  0 siblings, 0 replies; 51+ messages in thread
From: Bart Van Assche @ 2013-06-24  6:49 UTC (permalink / raw)
  To: Mike Christie
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Hannes Reinecke

On 06/24/13 03:15, Mike Christie wrote:
> On 6/12/13 7:55 AM, Bart Van Assche wrote:
>> A SCSI LLD may start cleaning up host resources as soon as
>> scsi_remove_host() returns. These host resources may be needed by
>> the LLD in an implementation of one of the eh_* functions. So if
>> one of the eh_* functions is in progress when scsi_remove_host()
>> is invoked, wait until the eh_* function has finished. Also, do
>> not invoke any of the eh_* functions after scsi_remove_host() has
>> started. Remove Scsi_Host.tmf_in_progress because it is now
>> superfluous.
> 
> I think the patch looks ok for drivers that do not implement their own
> eh_strategy_handler, but what about SAS? If you added a scsi_begin_eh in 
> scsi_error_handler before the eh_strategy_handler is called and then add 
> a scsi_end_eh after it is called, I think it would cover them too.

I will start testing the modification below for the patch at the start of
this thread:

--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1950,10 +1950,14 @@ int scsi_error_handler(void *data)
                        continue;
                }
 
-               if (shost->transportt->eh_strategy_handler)
-                       shost->transportt->eh_strategy_handler(shost);
-               else
+               if (shost->transportt->eh_strategy_handler) {
+                       if (scsi_begin_eh(shost) == 0) {
+                               shost->transportt->eh_strategy_handler(shost);
+                               scsi_end_eh(shost);
+                       }
+               } else {
                        scsi_unjam_host(shost);
+               }
 
                /*
                 * Note - if the above fails completely, the action is to take


>> @@ -1894,6 +1962,9 @@ int scsi_error_handler(void *data)
>>       }
>>       __set_current_state(TASK_RUNNING);
>>
>> +    WARN_ONCE(shost->eh_active, "scsi_eh_%d: eh_active = %d\n",
>> +          shost->host_no, shost->eh_active);
>> +
>>       SCSI_LOG_ERROR_RECOVERY(1,
>>           printk("Error handler scsi_eh_%d exiting\n", shost->host_no));
>>       shost->ehandler = NULL;
> 
> What is the warn for? Is there a chance this can happen with some non 
> upstream driver or are you just adding it just in case?

This is code that helped me to test this patch. I can leave it out if
you prefer so.

Bart.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 2/9] Remove get_device() / put_device() pair from scsi_request_fn()
  2013-06-24  2:36   ` James Bottomley
@ 2013-06-24  7:13     ` Bart Van Assche
  2013-06-24 13:34       ` James Bottomley
  0 siblings, 1 reply; 51+ messages in thread
From: Bart Van Assche @ 2013-06-24  7:13 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Hannes Reinecke, Mike Christie

On 06/24/13 04:36, James Bottomley wrote:
> On Wed, 2013-06-12 at 14:51 +0200, Bart Van Assche wrote:
>> Now that all scsi_request_fn() callers hold a reference on the
>> SCSI device that function is invoked for
>
> What makes you think that this is a true statement?  The usual caller is
> the block layer, which doesn't really know anything about the
> sdev->sdev_gendev.

The reasoning behind that comment is as follows:
* The block layer guarantees that the reference count of a request
   queue is >= 1 as long as a request_fn() call is in progress (see also
   blk_cleanup_queue(), the __blk_drain_queue() call in that function
   and the loop in __blk_drain_queue() that waits until
   request_fn_active == 0).
* The SCSI core guarantees that blk_cleanup_queue() is invoked before
   the final put on sdev->sdev_gendev.

Bart.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 2/9] Remove get_device() / put_device() pair from scsi_request_fn()
  2013-06-24  7:13     ` Bart Van Assche
@ 2013-06-24 13:34       ` James Bottomley
  2013-06-24 15:43         ` Bart Van Assche
  0 siblings, 1 reply; 51+ messages in thread
From: James Bottomley @ 2013-06-24 13:34 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Hannes Reinecke, Mike Christie

On Mon, 2013-06-24 at 09:13 +0200, Bart Van Assche wrote:
> On 06/24/13 04:36, James Bottomley wrote:
> > On Wed, 2013-06-12 at 14:51 +0200, Bart Van Assche wrote:
> >> Now that all scsi_request_fn() callers hold a reference on the
> >> SCSI device that function is invoked for
> >
> > What makes you think that this is a true statement?  The usual caller is
> > the block layer, which doesn't really know anything about the
> > sdev->sdev_gendev.
> 
> The reasoning behind that comment is as follows:
> * The block layer guarantees that the reference count of a request
>    queue is >= 1 as long as a request_fn() call is in progress (see also
>    blk_cleanup_queue(), the __blk_drain_queue() call in that function
>    and the loop in __blk_drain_queue() that waits until
>    request_fn_active == 0).
> * The SCSI core guarantees that blk_cleanup_queue() is invoked before
>    the final put on sdev->sdev_gendev.

That's still unclear.

I think a clear explanation is something like:

        scsi_devices may only be removed by by scsi_remove_device()
        which has a call to blk_cleanup_queue() before the final put of
        sdev->sdev_gendev.  Since blk_cleanup_queue() waits for the
        block queue to drain and then tears it down, scsi_request_fn
        cannot be active once blk_cleanup_queue() returns and hence the
        get_device/put_device pairs in scsi_request_fn are unnecessary
        because the final put will always be done by
        scsi_remove_device().

This, by the way, is an optimisation not a fix, so it shouldn't really
be part of a series labelled "device removal fixes"

James



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 1/9] Fix race between starved list and device removal
  2013-06-12 12:49 ` [PATCH v11 1/9] Fix race between starved list and device removal Bart Van Assche
@ 2013-06-24 15:38   ` James Bottomley
  2013-06-24 16:16     ` Bart Van Assche
  2013-06-24 17:24     ` Mike Christie
  0 siblings, 2 replies; 51+ messages in thread
From: James Bottomley @ 2013-06-24 15:38 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Hannes Reinecke, Mike Christie

On Wed, 2013-06-12 at 14:49 +0200, Bart Van Assche wrote:
> scsi_run_queue() examines all SCSI devices that are present on
> the starved list. Since scsi_run_queue() unlocks the SCSI host
> lock before running a queue a SCSI device can get removed after
> it has been removed from the starved list and before its queue
> is run. Protect against that race condition by increasing the
> sdev refcount before running the sdev queue. Also, remove a
> SCSI device from the starved list before __scsi_remove_device()
> decreases the sdev refcount such that the get_device() call
> added in scsi_run_queue() is guaranteed to succeed.
> 
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Reported-and-tested-by: Chanho Min <chanho.min@lge.com>
> Reference: http://lkml.org/lkml/2012/8/2/96
> Acked-by: Tejun Heo <tj@kernel.org>
> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: <stable@vger.kernel.org>
> ---
>  drivers/scsi/scsi_lib.c   |   16 +++++++++++-----
>  drivers/scsi/scsi_sysfs.c |   14 +++++++++++++-
>  2 files changed, 24 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 86d5220..d6ca072 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -456,11 +456,17 @@ static void scsi_run_queue(struct request_queue *q)
>  			continue;
>  		}
>  
> -		spin_unlock(shost->host_lock);
> -		spin_lock(sdev->request_queue->queue_lock);
> -		__blk_run_queue(sdev->request_queue);
> -		spin_unlock(sdev->request_queue->queue_lock);
> -		spin_lock(shost->host_lock);
> +		/*
> +		 * Obtain a reference before unlocking the host_lock such that
> +		 * the sdev can't disappear before blk_run_queue() is invoked.
> +		 */
> +		get_device(&sdev->sdev_gendev);
> +		spin_unlock_irqrestore(shost->host_lock, flags);
> +
> +		blk_run_queue(sdev->request_queue);
> +		put_device(&sdev->sdev_gendev);
> +
> +		spin_lock_irqsave(shost->host_lock, flags);
>  	}
>  	/* put any unprocessed entries back */
>  	list_splice(&starved_list, &shost->starved_list);
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 931a7d9..34f1b39 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -345,7 +345,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
>  	starget->reap_ref++;
>  	list_del(&sdev->siblings);
>  	list_del(&sdev->same_target_siblings);
> -	list_del(&sdev->starved_entry);
>  	spin_unlock_irqrestore(sdev->host->host_lock, flags);
>  
>  	cancel_work_sync(&sdev->event_work);
> @@ -953,6 +952,8 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
>  void __scsi_remove_device(struct scsi_device *sdev)
>  {
>  	struct device *dev = &sdev->sdev_gendev;
> +	struct Scsi_Host *shost = sdev->host;
> +	unsigned long flags;
>  
>  	if (sdev->is_visible) {
>  		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
> @@ -974,6 +975,17 @@ void __scsi_remove_device(struct scsi_device *sdev)
>  	blk_cleanup_queue(sdev->request_queue);
>  	cancel_work_sync(&sdev->requeue_work);
>  
> +	/*
> +	 * Remove a SCSI device from the starved list after blk_cleanup_queue()
> +	 * finished such that scsi_request_fn() can't add it back to that list.
> +	 * Also remove an sdev from the starved list before invoking
> +	 * put_device() such that get_device() is guaranteed to succeed for an
> +	 * sdev present on the starved list.
> +	 */
> +	spin_lock_irqsave(shost->host_lock, flags);
> +	list_del(&sdev->starved_entry);
> +	spin_unlock_irqrestore(shost->host_lock, flags);
> +
>  	if (sdev->host->hostt->slave_destroy)
>  		sdev->host->hostt->slave_destroy(sdev);
>  	transport_destroy_device(dev);

I really don't like this because it's shuffling potentially fragile
lifetime rules since you now have to have the sdev deleted from the
starved list before final put.  That becomes an unstated assumption
within the code.

The theory is that the starved list processing may be racing with a
scsi_remove_device, so when we unlock the host lock, the device (and the
queue) may be destroyed.  OK, so I agree with this, remote a possibility
though it may be.  The easy way of fixing it without making assumptions
is this, isn't it?  All it requires is that the queue be destroyed after
the starved list entry is deleted in the sdev release code.

James

---

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 86d5220..f294cc6 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -434,6 +434,7 @@ static void scsi_run_queue(struct request_queue *q)
 	list_splice_init(&shost->starved_list, &starved_list);
 
 	while (!list_empty(&starved_list)) {
+		struct request_queue *slq;
 		/*
 		 * As long as shost is accepting commands and we have
 		 * starved queues, call blk_run_queue. scsi_request_fn
@@ -456,10 +457,21 @@ static void scsi_run_queue(struct request_queue *q)
 			continue;
 		}
 
+		/*
+		 * once we drop the host lock, a racing scsi_remove_device may
+		 * remove the sdev from the starved list and destroy it and
+		 * the queue.  Mitigate by taking a reference to the queue and
+		 * never touching the sdev again after we drop the host lock.
+		 */
+		slq = sdev->request_queue;
+		if (!blk_get_queue(slq))
+			continue;
+
 		spin_unlock(shost->host_lock);
-		spin_lock(sdev->request_queue->queue_lock);
-		__blk_run_queue(sdev->request_queue);
-		spin_unlock(sdev->request_queue->queue_lock);
+
+		blk_run_queue(slq);
+		blk_put_queue(slq);
+
 		spin_lock(shost->host_lock);
 	}
 	/* put any unprocessed entries back */



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 2/9] Remove get_device() / put_device() pair from scsi_request_fn()
  2013-06-24 13:34       ` James Bottomley
@ 2013-06-24 15:43         ` Bart Van Assche
  0 siblings, 0 replies; 51+ messages in thread
From: Bart Van Assche @ 2013-06-24 15:43 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Hannes Reinecke, Mike Christie

On 06/24/13 15:34, James Bottomley wrote:
> On Mon, 2013-06-24 at 09:13 +0200, Bart Van Assche wrote:
>> On 06/24/13 04:36, James Bottomley wrote:
>>> On Wed, 2013-06-12 at 14:51 +0200, Bart Van Assche wrote:
>>>> Now that all scsi_request_fn() callers hold a reference on the
>>>> SCSI device that function is invoked for
>>>
>>> What makes you think that this is a true statement?  The usual caller is
>>> the block layer, which doesn't really know anything about the
>>> sdev->sdev_gendev.
>>
>> The reasoning behind that comment is as follows:
>> * The block layer guarantees that the reference count of a request
>>     queue is >= 1 as long as a request_fn() call is in progress (see also
>>     blk_cleanup_queue(), the __blk_drain_queue() call in that function
>>     and the loop in __blk_drain_queue() that waits until
>>     request_fn_active == 0).
>> * The SCSI core guarantees that blk_cleanup_queue() is invoked before
>>     the final put on sdev->sdev_gendev.
>
> That's still unclear.
>
> I think a clear explanation is something like:
>
>          scsi_devices may only be removed by by scsi_remove_device()
>          which has a call to blk_cleanup_queue() before the final put of
>          sdev->sdev_gendev.  Since blk_cleanup_queue() waits for the
>          block queue to drain and then tears it down, scsi_request_fn
>          cannot be active once blk_cleanup_queue() returns and hence the
>          get_device/put_device pairs in scsi_request_fn are unnecessary
>          because the final put will always be done by
>          scsi_remove_device().
>
> This, by the way, is an optimisation not a fix, so it shouldn't really
> be part of a series labelled "device removal fixes"

Thanks for the feedback. I will update the patch description and take 
this patch out of the device removal fixes patch series.

Bart.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 1/9] Fix race between starved list and device removal
  2013-06-24 15:38   ` James Bottomley
@ 2013-06-24 16:16     ` Bart Van Assche
  2013-06-24 16:23       ` James Bottomley
  2013-06-24 17:24     ` Mike Christie
  1 sibling, 1 reply; 51+ messages in thread
From: Bart Van Assche @ 2013-06-24 16:16 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Hannes Reinecke, Mike Christie

On 06/24/13 17:38, James Bottomley wrote:
> I really don't like this because it's shuffling potentially fragile
> lifetime rules since you now have to have the sdev deleted from the
> starved list before final put.  That becomes an unstated assumption
> within the code.
>
> The theory is that the starved list processing may be racing with a
> scsi_remove_device, so when we unlock the host lock, the device (and the
> queue) may be destroyed.  OK, so I agree with this, remote a possibility
> though it may be.  The easy way of fixing it without making assumptions
> is this, isn't it?  All it requires is that the queue be destroyed after
> the starved list entry is deleted in the sdev release code.
>
> James
>
> ---
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 86d5220..f294cc6 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -434,6 +434,7 @@ static void scsi_run_queue(struct request_queue *q)
>   	list_splice_init(&shost->starved_list, &starved_list);
>
>   	while (!list_empty(&starved_list)) {
> +		struct request_queue *slq;
>   		/*
>   		 * As long as shost is accepting commands and we have
>   		 * starved queues, call blk_run_queue. scsi_request_fn
> @@ -456,10 +457,21 @@ static void scsi_run_queue(struct request_queue *q)
>   			continue;
>   		}
>
> +		/*
> +		 * once we drop the host lock, a racing scsi_remove_device may
> +		 * remove the sdev from the starved list and destroy it and
> +		 * the queue.  Mitigate by taking a reference to the queue and
> +		 * never touching the sdev again after we drop the host lock.
> +		 */
> +		slq = sdev->request_queue;
> +		if (!blk_get_queue(slq))
> +			continue;
> +
>   		spin_unlock(shost->host_lock);
> -		spin_lock(sdev->request_queue->queue_lock);
> -		__blk_run_queue(sdev->request_queue);
> -		spin_unlock(sdev->request_queue->queue_lock);
> +
> +		blk_run_queue(slq);
> +		blk_put_queue(slq);
> +
>   		spin_lock(shost->host_lock);
>   	}
>   	/* put any unprocessed entries back */

Since the above patch invokes blk_put_queue() with interrupts disabled 
it may cause blk_release_queue() to be invoked with interrupts disabled. 
Sorry but I'm not sure whether that will work fine.

Bart.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 1/9] Fix race between starved list and device removal
  2013-06-24 16:16     ` Bart Van Assche
@ 2013-06-24 16:23       ` James Bottomley
  0 siblings, 0 replies; 51+ messages in thread
From: James Bottomley @ 2013-06-24 16:23 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Hannes Reinecke, Mike Christie

On Mon, 2013-06-24 at 18:16 +0200, Bart Van Assche wrote:
> On 06/24/13 17:38, James Bottomley wrote:
> > I really don't like this because it's shuffling potentially fragile
> > lifetime rules since you now have to have the sdev deleted from the
> > starved list before final put.  That becomes an unstated assumption
> > within the code.
> >
> > The theory is that the starved list processing may be racing with a
> > scsi_remove_device, so when we unlock the host lock, the device (and the
> > queue) may be destroyed.  OK, so I agree with this, remote a possibility
> > though it may be.  The easy way of fixing it without making assumptions
> > is this, isn't it?  All it requires is that the queue be destroyed after
> > the starved list entry is deleted in the sdev release code.
> >
> > James
> >
> > ---
> >
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 86d5220..f294cc6 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -434,6 +434,7 @@ static void scsi_run_queue(struct request_queue *q)
> >   	list_splice_init(&shost->starved_list, &starved_list);
> >
> >   	while (!list_empty(&starved_list)) {
> > +		struct request_queue *slq;
> >   		/*
> >   		 * As long as shost is accepting commands and we have
> >   		 * starved queues, call blk_run_queue. scsi_request_fn
> > @@ -456,10 +457,21 @@ static void scsi_run_queue(struct request_queue *q)
> >   			continue;
> >   		}
> >
> > +		/*
> > +		 * once we drop the host lock, a racing scsi_remove_device may
> > +		 * remove the sdev from the starved list and destroy it and
> > +		 * the queue.  Mitigate by taking a reference to the queue and
> > +		 * never touching the sdev again after we drop the host lock.
> > +		 */
> > +		slq = sdev->request_queue;
> > +		if (!blk_get_queue(slq))
> > +			continue;
> > +
> >   		spin_unlock(shost->host_lock);
> > -		spin_lock(sdev->request_queue->queue_lock);
> > -		__blk_run_queue(sdev->request_queue);
> > -		spin_unlock(sdev->request_queue->queue_lock);
> > +
> > +		blk_run_queue(slq);
> > +		blk_put_queue(slq);
> > +
> >   		spin_lock(shost->host_lock);
> >   	}
> >   	/* put any unprocessed entries back */
> 
> Since the above patch invokes blk_put_queue() with interrupts disabled 
> it may cause blk_release_queue() to be invoked with interrupts disabled. 
> Sorry but I'm not sure whether that will work fine.

I think it is, but make the unlock/lock _irqrestore and _irqsave and the
concern goes away.

James




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 1/9] Fix race between starved list and device removal
  2013-06-24 15:38   ` James Bottomley
  2013-06-24 16:16     ` Bart Van Assche
@ 2013-06-24 17:24     ` Mike Christie
  2013-06-24 17:49       ` James Bottomley
  1 sibling, 1 reply; 51+ messages in thread
From: Mike Christie @ 2013-06-24 17:24 UTC (permalink / raw)
  To: James Bottomley
  Cc: Bart Van Assche, linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min,
	David Milburn, Hannes Reinecke

On 06/24/2013 10:38 AM, James Bottomley wrote:
> On Wed, 2013-06-12 at 14:49 +0200, Bart Van Assche wrote:
>> scsi_run_queue() examines all SCSI devices that are present on
>> the starved list. Since scsi_run_queue() unlocks the SCSI host
>> lock before running a queue a SCSI device can get removed after
>> it has been removed from the starved list and before its queue
>> is run. Protect against that race condition by increasing the
>> sdev refcount before running the sdev queue. Also, remove a
>> SCSI device from the starved list before __scsi_remove_device()
>> decreases the sdev refcount such that the get_device() call
>> added in scsi_run_queue() is guaranteed to succeed.
>>
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> Reported-and-tested-by: Chanho Min <chanho.min@lge.com>
>> Reference: http://lkml.org/lkml/2012/8/2/96
>> Acked-by: Tejun Heo <tj@kernel.org>
>> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
>> Cc: Hannes Reinecke <hare@suse.de>
>> Cc: <stable@vger.kernel.org>
>> ---
>>  drivers/scsi/scsi_lib.c   |   16 +++++++++++-----
>>  drivers/scsi/scsi_sysfs.c |   14 +++++++++++++-
>>  2 files changed, 24 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index 86d5220..d6ca072 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -456,11 +456,17 @@ static void scsi_run_queue(struct request_queue *q)
>>  			continue;
>>  		}
>>  
>> -		spin_unlock(shost->host_lock);
>> -		spin_lock(sdev->request_queue->queue_lock);
>> -		__blk_run_queue(sdev->request_queue);
>> -		spin_unlock(sdev->request_queue->queue_lock);
>> -		spin_lock(shost->host_lock);
>> +		/*
>> +		 * Obtain a reference before unlocking the host_lock such that
>> +		 * the sdev can't disappear before blk_run_queue() is invoked.
>> +		 */
>> +		get_device(&sdev->sdev_gendev);
>> +		spin_unlock_irqrestore(shost->host_lock, flags);
>> +
>> +		blk_run_queue(sdev->request_queue);
>> +		put_device(&sdev->sdev_gendev);
>> +
>> +		spin_lock_irqsave(shost->host_lock, flags);
>>  	}
>>  	/* put any unprocessed entries back */
>>  	list_splice(&starved_list, &shost->starved_list);
>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>> index 931a7d9..34f1b39 100644
>> --- a/drivers/scsi/scsi_sysfs.c
>> +++ b/drivers/scsi/scsi_sysfs.c
>> @@ -345,7 +345,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
>>  	starget->reap_ref++;
>>  	list_del(&sdev->siblings);
>>  	list_del(&sdev->same_target_siblings);
>> -	list_del(&sdev->starved_entry);
>>  	spin_unlock_irqrestore(sdev->host->host_lock, flags);
>>  
>>  	cancel_work_sync(&sdev->event_work);
>> @@ -953,6 +952,8 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
>>  void __scsi_remove_device(struct scsi_device *sdev)
>>  {
>>  	struct device *dev = &sdev->sdev_gendev;
>> +	struct Scsi_Host *shost = sdev->host;
>> +	unsigned long flags;
>>  
>>  	if (sdev->is_visible) {
>>  		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
>> @@ -974,6 +975,17 @@ void __scsi_remove_device(struct scsi_device *sdev)
>>  	blk_cleanup_queue(sdev->request_queue);
>>  	cancel_work_sync(&sdev->requeue_work);
>>  
>> +	/*
>> +	 * Remove a SCSI device from the starved list after blk_cleanup_queue()
>> +	 * finished such that scsi_request_fn() can't add it back to that list.
>> +	 * Also remove an sdev from the starved list before invoking
>> +	 * put_device() such that get_device() is guaranteed to succeed for an
>> +	 * sdev present on the starved list.
>> +	 */
>> +	spin_lock_irqsave(shost->host_lock, flags);
>> +	list_del(&sdev->starved_entry);
>> +	spin_unlock_irqrestore(shost->host_lock, flags);
>> +
>>  	if (sdev->host->hostt->slave_destroy)
>>  		sdev->host->hostt->slave_destroy(sdev);
>>  	transport_destroy_device(dev);
> 
> I really don't like this because it's shuffling potentially fragile
> lifetime rules since you now have to have the sdev deleted from the
> starved list before final put.  That becomes an unstated assumption
> within the code.
> 
> The theory is that the starved list processing may be racing with a
> scsi_remove_device, so when we unlock the host lock, the device (and the
> queue) may be destroyed.  OK, so I agree with this, remote a possibility
> though it may be.  The easy way of fixing it without making assumptions
> is this, isn't it?  All it requires is that the queue be destroyed after
> the starved list entry is deleted in the sdev release code.
> 
> James
> 
> ---
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 86d5220..f294cc6 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -434,6 +434,7 @@ static void scsi_run_queue(struct request_queue *q)
>  	list_splice_init(&shost->starved_list, &starved_list);
>  
>  	while (!list_empty(&starved_list)) {
> +		struct request_queue *slq;
>  		/*
>  		 * As long as shost is accepting commands and we have
>  		 * starved queues, call blk_run_queue. scsi_request_fn
> @@ -456,10 +457,21 @@ static void scsi_run_queue(struct request_queue *q)
>  			continue;
>  		}
>  
> +		/*
> +		 * once we drop the host lock, a racing scsi_remove_device may
> +		 * remove the sdev from the starved list and destroy it and
> +		 * the queue.  Mitigate by taking a reference to the queue and
> +		 * never touching the sdev again after we drop the host lock.
> +		 */
> +		slq = sdev->request_queue;
> +		if (!blk_get_queue(slq))
> +			continue;
> +
>  		spin_unlock(shost->host_lock);
> -		spin_lock(sdev->request_queue->queue_lock);
> -		__blk_run_queue(sdev->request_queue);
> -		spin_unlock(sdev->request_queue->queue_lock);
> +
> +		blk_run_queue(slq);
> +		blk_put_queue(slq);
> +
>  		spin_lock(shost->host_lock);
>  	}
>  	/* put any unprocessed entries back */
> 
> 

I think we will hit issues with the scsi_device being freed too soon still.


1. In thread 1, __scsi_remove_device runs. It cleans up the commands and
it does the last put_device. It left the sdev on the starved list though.

2. In thread 2, scsi_run_queue runs and takes the dev off the starved
list and calls into block layer __blk_run_queue.

3. scsi_device_dev_release_usercontext runs and frees the scsi_device.

4. __blk_run_queue from #2 runs and calls into scsi_request_fn. We now
reference the freed sdev at the top of scsi_request_fn.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 3/9] Avoid calling __scsi_remove_device() twice
  2013-06-12 12:52 ` [PATCH v11 3/9] Avoid calling __scsi_remove_device() twice Bart Van Assche
  2013-06-23 21:35   ` Mike Christie
@ 2013-06-24 17:38   ` James Bottomley
  2013-06-25  8:37     ` Bart Van Assche
  1 sibling, 1 reply; 51+ messages in thread
From: James Bottomley @ 2013-06-24 17:38 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Mike Christie, Hannes Reinecke

On Wed, 2013-06-12 at 14:52 +0200, Bart Van Assche wrote:
> SCSI devices are added to the shost->__devices list from inside
> scsi_alloc_sdev(). If something goes wrong during LUN scanning,
> e.g. a transport layer failure occurs, then __scsi_remove_device()
> can get invoked by the LUN scanning code for a SCSI device in
> state SDEV_CREATED_BLOCK or SDEV_BLOCKED. If this happens then
> the SCSI device has not yet been added to sysfs (is_visible == 0).
> Make sure that if this happens these devices are transitioned
> into state SDEV_DEL. This avoids that __scsi_remove_device()
> gets invoked a second time by scsi_forget_host().

The current principle is that scsi_remove_device can fail, so the
condition you're avoiding is expected.  If you want to make it always
succeed, we have to worry about any device state racing with an
asynchronous remove, which looks like a whole nasty can of worms.

The change log makes it sound like what you actually want to enable is
the ability to remove devices which fail probing but which are in the
blocked state, so why not just respin with only that, which is just
adding the blocked states to the ->SDEV_DEL state transitions?

James


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 1/9] Fix race between starved list and device removal
  2013-06-24 17:24     ` Mike Christie
@ 2013-06-24 17:49       ` James Bottomley
  0 siblings, 0 replies; 51+ messages in thread
From: James Bottomley @ 2013-06-24 17:49 UTC (permalink / raw)
  To: Mike Christie
  Cc: Bart Van Assche, linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min,
	David Milburn, Hannes Reinecke

On Mon, 2013-06-24 at 12:24 -0500, Mike Christie wrote:
> On 06/24/2013 10:38 AM, James Bottomley wrote:
> > On Wed, 2013-06-12 at 14:49 +0200, Bart Van Assche wrote:
> >> scsi_run_queue() examines all SCSI devices that are present on
> >> the starved list. Since scsi_run_queue() unlocks the SCSI host
> >> lock before running a queue a SCSI device can get removed after
> >> it has been removed from the starved list and before its queue
> >> is run. Protect against that race condition by increasing the
> >> sdev refcount before running the sdev queue. Also, remove a
> >> SCSI device from the starved list before __scsi_remove_device()
> >> decreases the sdev refcount such that the get_device() call
> >> added in scsi_run_queue() is guaranteed to succeed.
> >>
> >> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> >> Reported-and-tested-by: Chanho Min <chanho.min@lge.com>
> >> Reference: http://lkml.org/lkml/2012/8/2/96
> >> Acked-by: Tejun Heo <tj@kernel.org>
> >> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
> >> Cc: Hannes Reinecke <hare@suse.de>
> >> Cc: <stable@vger.kernel.org>
> >> ---
> >>  drivers/scsi/scsi_lib.c   |   16 +++++++++++-----
> >>  drivers/scsi/scsi_sysfs.c |   14 +++++++++++++-
> >>  2 files changed, 24 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> >> index 86d5220..d6ca072 100644
> >> --- a/drivers/scsi/scsi_lib.c
> >> +++ b/drivers/scsi/scsi_lib.c
> >> @@ -456,11 +456,17 @@ static void scsi_run_queue(struct request_queue *q)
> >>  			continue;
> >>  		}
> >>  
> >> -		spin_unlock(shost->host_lock);
> >> -		spin_lock(sdev->request_queue->queue_lock);
> >> -		__blk_run_queue(sdev->request_queue);
> >> -		spin_unlock(sdev->request_queue->queue_lock);
> >> -		spin_lock(shost->host_lock);
> >> +		/*
> >> +		 * Obtain a reference before unlocking the host_lock such that
> >> +		 * the sdev can't disappear before blk_run_queue() is invoked.
> >> +		 */
> >> +		get_device(&sdev->sdev_gendev);
> >> +		spin_unlock_irqrestore(shost->host_lock, flags);
> >> +
> >> +		blk_run_queue(sdev->request_queue);
> >> +		put_device(&sdev->sdev_gendev);
> >> +
> >> +		spin_lock_irqsave(shost->host_lock, flags);
> >>  	}
> >>  	/* put any unprocessed entries back */
> >>  	list_splice(&starved_list, &shost->starved_list);
> >> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> >> index 931a7d9..34f1b39 100644
> >> --- a/drivers/scsi/scsi_sysfs.c
> >> +++ b/drivers/scsi/scsi_sysfs.c
> >> @@ -345,7 +345,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
> >>  	starget->reap_ref++;
> >>  	list_del(&sdev->siblings);
> >>  	list_del(&sdev->same_target_siblings);
> >> -	list_del(&sdev->starved_entry);
> >>  	spin_unlock_irqrestore(sdev->host->host_lock, flags);
> >>  
> >>  	cancel_work_sync(&sdev->event_work);
> >> @@ -953,6 +952,8 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
> >>  void __scsi_remove_device(struct scsi_device *sdev)
> >>  {
> >>  	struct device *dev = &sdev->sdev_gendev;
> >> +	struct Scsi_Host *shost = sdev->host;
> >> +	unsigned long flags;
> >>  
> >>  	if (sdev->is_visible) {
> >>  		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
> >> @@ -974,6 +975,17 @@ void __scsi_remove_device(struct scsi_device *sdev)
> >>  	blk_cleanup_queue(sdev->request_queue);
> >>  	cancel_work_sync(&sdev->requeue_work);
> >>  
> >> +	/*
> >> +	 * Remove a SCSI device from the starved list after blk_cleanup_queue()
> >> +	 * finished such that scsi_request_fn() can't add it back to that list.
> >> +	 * Also remove an sdev from the starved list before invoking
> >> +	 * put_device() such that get_device() is guaranteed to succeed for an
> >> +	 * sdev present on the starved list.
> >> +	 */
> >> +	spin_lock_irqsave(shost->host_lock, flags);
> >> +	list_del(&sdev->starved_entry);
> >> +	spin_unlock_irqrestore(shost->host_lock, flags);
> >> +
> >>  	if (sdev->host->hostt->slave_destroy)
> >>  		sdev->host->hostt->slave_destroy(sdev);
> >>  	transport_destroy_device(dev);
> > 
> > I really don't like this because it's shuffling potentially fragile
> > lifetime rules since you now have to have the sdev deleted from the
> > starved list before final put.  That becomes an unstated assumption
> > within the code.
> > 
> > The theory is that the starved list processing may be racing with a
> > scsi_remove_device, so when we unlock the host lock, the device (and the
> > queue) may be destroyed.  OK, so I agree with this, remote a possibility
> > though it may be.  The easy way of fixing it without making assumptions
> > is this, isn't it?  All it requires is that the queue be destroyed after
> > the starved list entry is deleted in the sdev release code.
> > 
> > James
> > 
> > ---
> > 
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 86d5220..f294cc6 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -434,6 +434,7 @@ static void scsi_run_queue(struct request_queue *q)
> >  	list_splice_init(&shost->starved_list, &starved_list);
> >  
> >  	while (!list_empty(&starved_list)) {
> > +		struct request_queue *slq;
> >  		/*
> >  		 * As long as shost is accepting commands and we have
> >  		 * starved queues, call blk_run_queue. scsi_request_fn
> > @@ -456,10 +457,21 @@ static void scsi_run_queue(struct request_queue *q)
> >  			continue;
> >  		}
> >  
> > +		/*
> > +		 * once we drop the host lock, a racing scsi_remove_device may
> > +		 * remove the sdev from the starved list and destroy it and
> > +		 * the queue.  Mitigate by taking a reference to the queue and
> > +		 * never touching the sdev again after we drop the host lock.
> > +		 */
> > +		slq = sdev->request_queue;
> > +		if (!blk_get_queue(slq))
> > +			continue;
> > +
> >  		spin_unlock(shost->host_lock);
> > -		spin_lock(sdev->request_queue->queue_lock);
> > -		__blk_run_queue(sdev->request_queue);
> > -		spin_unlock(sdev->request_queue->queue_lock);
> > +
> > +		blk_run_queue(slq);
> > +		blk_put_queue(slq);
> > +
> >  		spin_lock(shost->host_lock);
> >  	}
> >  	/* put any unprocessed entries back */
> > 
> > 
> 
> I think we will hit issues with the scsi_device being freed too soon still.
> 
> 
> 1. In thread 1, __scsi_remove_device runs. It cleans up the commands and
> it does the last put_device. It left the sdev on the starved list though.
> 
> 2. In thread 2, scsi_run_queue runs and takes the dev off the starved
> list and calls into block layer __blk_run_queue.
> 
> 3. scsi_device_dev_release_usercontext runs and frees the scsi_device.
> 
> 4. __blk_run_queue from #2 runs and calls into scsi_request_fn. We now
> reference the freed sdev at the top of scsi_request_fn.

This last can't happen because in 1. and before 3. the queue has
transitioned to DEAD.  That means you can't get a reference to the
queue, but if you do, it won't run its request function.  If it's
already running its request function in 1. then blk_cleanup_queue() will
wait for that to complete before transitioning the queue to DEAD.

James



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 4/9] Disallow changing the device state via sysfs into "deleted"
  2013-06-12 12:53 ` [PATCH v11 4/9] Disallow changing the device state via sysfs into "deleted" Bart Van Assche
  2013-06-24  1:05   ` Mike Christie
@ 2013-06-24 17:59   ` James Bottomley
  2013-06-25  8:41     ` Bart Van Assche
  1 sibling, 1 reply; 51+ messages in thread
From: James Bottomley @ 2013-06-24 17:59 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Mike Christie, Hannes Reinecke

On Wed, 2013-06-12 at 14:53 +0200, Bart Van Assche wrote:
> Changing the state of a SCSI device via sysfs into "cancel" or
> "deleted" prevents removal of these devices by scsi_remove_host().
> Hence do not allow this. Also, introduce the symbolic name
> INVALID_SDEV_STATE, representing a value different from any valid
> SCSI device state. Update scsi_device_set_state() such that gcc
> does not issue a warning about an enumeration value not being
> handled inside a switch statement.

zero is the invalid state, that's why the SDEV_ states start at 1.
Using a bare zero also means that gcc doesn't have to consider it in the
switch statement, so there's no need to introduce a new one.

If we want to try to babysit user initiated state changes, then it looks
like OFFLINE<->RUNNING might be the only useful ones?

James


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished
  2013-06-12 12:55 ` [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished Bart Van Assche
  2013-06-24  1:15   ` Mike Christie
@ 2013-06-24 19:19   ` James Bottomley
  2013-06-24 20:04     ` Mike Christie
  1 sibling, 1 reply; 51+ messages in thread
From: James Bottomley @ 2013-06-24 19:19 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Hannes Reinecke, Mike Christie

On Wed, 2013-06-12 at 14:55 +0200, Bart Van Assche wrote:
> A SCSI LLD may start cleaning up host resources as soon as
> scsi_remove_host() returns. These host resources may be needed by
> the LLD in an implementation of one of the eh_* functions. So if
> one of the eh_* functions is in progress when scsi_remove_host()
> is invoked, wait until the eh_* function has finished. Also, do
> not invoke any of the eh_* functions after scsi_remove_host() has
> started.

We already have state guards for this, don't we?  That's the
SHOST_*_RECOVERY ones.  When eh functions are active, the host
transitions to a recovery state, so the wait could just wait on that
state rather than implement an open coded counting semaphore.

However, what's the reasoning behind wanting to do this?  In theory all
necessary resources for the eh thread should only be freed in the
release callback.  That means they aren't freed until all error recovery
completes.

James



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished
  2013-06-24 19:19   ` James Bottomley
@ 2013-06-24 20:04     ` Mike Christie
  2013-06-24 22:27       ` James Bottomley
  0 siblings, 1 reply; 51+ messages in thread
From: Mike Christie @ 2013-06-24 20:04 UTC (permalink / raw)
  To: James Bottomley
  Cc: Bart Van Assche, linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min,
	David Milburn, Hannes Reinecke

On 06/24/2013 02:19 PM, James Bottomley wrote:
> On Wed, 2013-06-12 at 14:55 +0200, Bart Van Assche wrote:
>> A SCSI LLD may start cleaning up host resources as soon as
>> scsi_remove_host() returns. These host resources may be needed by
>> the LLD in an implementation of one of the eh_* functions. So if
>> one of the eh_* functions is in progress when scsi_remove_host()
>> is invoked, wait until the eh_* function has finished. Also, do
>> not invoke any of the eh_* functions after scsi_remove_host() has
>> started.
> 
> We already have state guards for this, don't we?  That's the
> SHOST_*_RECOVERY ones.  When eh functions are active, the host
> transitions to a recovery state, so the wait could just wait on that
> state rather than implement an open coded counting semaphore.
> 

That seems better. For the sg_reset_provider case we just would have to
also wait on the tmf_in_progress bit.

I think I used all my credits messing up reviewing this patchset.

> However, what's the reasoning behind wanting to do this?  In theory all
> necessary resources for the eh thread should only be freed in the
> release callback.  That means they aren't freed until all error recovery
> completes.
> 

I think it makes it easier to handle cleanup of driver resources needed
for aborts/resets for some drivers. If after host removal, the scsi eh
can call into the driver after scsi_remove_host is called then we have
to set some internal bits to fail future eh callback calls and handle
waiting on or flushing running eh operations. If we know that after
scsi_host_remove is called the eh callbacks will not be running and will
not be called we can just free the driver resources.

For iscsi and I think drivers that do scsi_remove_target it would be
helpful to have something similar at the target level.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished
  2013-06-24 20:04     ` Mike Christie
@ 2013-06-24 22:27       ` James Bottomley
  2013-06-25  2:26         ` Mike Christie
                           ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: James Bottomley @ 2013-06-24 22:27 UTC (permalink / raw)
  To: Mike Christie
  Cc: Bart Van Assche, linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min,
	David Milburn, Hannes Reinecke

On Mon, 2013-06-24 at 15:04 -0500, Mike Christie wrote:
> On 06/24/2013 02:19 PM, James Bottomley wrote:
> > On Wed, 2013-06-12 at 14:55 +0200, Bart Van Assche wrote:
> >> A SCSI LLD may start cleaning up host resources as soon as
> >> scsi_remove_host() returns. These host resources may be needed by
> >> the LLD in an implementation of one of the eh_* functions. So if
> >> one of the eh_* functions is in progress when scsi_remove_host()
> >> is invoked, wait until the eh_* function has finished. Also, do
> >> not invoke any of the eh_* functions after scsi_remove_host() has
> >> started.
> > 
> > We already have state guards for this, don't we?  That's the
> > SHOST_*_RECOVERY ones.  When eh functions are active, the host
> > transitions to a recovery state, so the wait could just wait on that
> > state rather than implement an open coded counting semaphore.
> > 
> 
> That seems better. For the sg_reset_provider case we just would have to
> also wait on the tmf_in_progress bit.

The simplest way is may just be to move the kthread_stop() from release
to remove.  That synchronously waits for the outstanding error handling
to complete and the eh thread to stop.  Perhaps the eh thread should
also wait for tmf in progress before it dies?

> I think I used all my credits messing up reviewing this patchset.

Hey, you're one of my best reviewers and this doesn't change that.

For a variety of reasons this patch set is incredibly hard to review:
Almost every patch touches pieces in the mid layer where you have to be
sure in minute detail you know what's going on (and what should be going
on), so usually it's a couple of hours with the source code just making
sure you do know this.  Plus it's code where the underlying usage model
has evolved over the years meaning the original guarantees might have
been violated silently somewhere and the ramifications spread out like
tentacles everywhere.  Finally, it's not clear from the change logs why
the changes are actually being made: for instance sentence one of this
change log says "A SCSI LLD may start cleaning up host resources as soon
as scsi_remove_host() returns." which causes my sanity checker to go off
immediately because in a refcounted model, like we use for dev, target
and host, nothing essential is supposed to be freed until the last put
which may or may not happen in the remove function.

> > However, what's the reasoning behind wanting to do this?  In theory all
> > necessary resources for the eh thread should only be freed in the
> > release callback.  That means they aren't freed until all error recovery
> > completes.
> 
> I think it makes it easier to handle cleanup of driver resources
> needed
> for aborts/resets for some drivers. If after host removal, the scsi eh
> can call into the driver after scsi_remove_host is called then we have
> to set some internal bits to fail future eh callback calls and handle
> waiting on or flushing running eh operations. If we know that after
> scsi_host_remove is called the eh callbacks will not be running and
> will
> not be called we can just free the driver resources.
> 
> For iscsi and I think drivers that do scsi_remove_target it would be
> helpful to have something similar at the target level.

I'm wary of this because it combines two models: a definite state model
(where we move from state to state waiting for the completions) with an
event driven one (in theory the current one); such combinations rarely
lead to good things because you get a mixture of actions causing state
transitions some of which are waited for and some of which have an async
transition and that ends up confusing the heck out of everybody no
matter how carefully documented.  Can you give me some use cases of what
you're trying to achieve?  Could it be as simple as an event that fires
on release?

James



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished
  2013-06-24 22:27       ` James Bottomley
@ 2013-06-25  2:26         ` Mike Christie
  2013-06-25  2:56           ` Michael Christie
  2013-06-25  9:01         ` Bart Van Assche
  2013-06-25 11:13         ` Bart Van Assche
  2 siblings, 1 reply; 51+ messages in thread
From: Mike Christie @ 2013-06-25  2:26 UTC (permalink / raw)
  To: James Bottomley
  Cc: Bart Van Assche, linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min,
	David Milburn, Hannes Reinecke

On 06/24/2013 05:27 PM, James Bottomley wrote:
>>> However, what's the reasoning behind wanting to do this?  In theory all
>>> necessary resources for the eh thread should only be freed in the
>>> release callback.  That means they aren't freed until all error recovery
>>> completes.
>>
>> I think it makes it easier to handle cleanup of driver resources
>> needed
>> for aborts/resets for some drivers. If after host removal, the scsi eh
>> can call into the driver after scsi_remove_host is called then we have
>> to set some internal bits to fail future eh callback calls and handle
>> waiting on or flushing running eh operations. If we know that after
>> scsi_host_remove is called the eh callbacks will not be running and
>> will
>> not be called we can just free the driver resources.
>>
>> For iscsi and I think drivers that do scsi_remove_target it would be
>> helpful to have something similar at the target level.
> 
> I'm wary of this because it combines two models: a definite state model
> (where we move from state to state waiting for the completions) with an
> event driven one (in theory the current one); such combinations rarely
> lead to good things because you get a mixture of actions causing state
> transitions some of which are waited for and some of which have an async
> transition and that ends up confusing the heck out of everybody no
> matter how carefully documented.  Can you give me some use cases of what
> you're trying to achieve?  Could it be as simple as an event that fires
> on release?
> 


The problem that we hit in iscsi is that we will call scsi_remove_target
(we used to call scsi_remove_host when we always did a host per target
so we hit the problem at that level before). That will complete, but the
scsi eh might still be trying to abort/reset devices accessed through
that target. To avoid freeing resources that the iscsi scsi eh might be
using, we set internal state bits and wait on host_busy to go to zero
before we tear down the iscsi session, conn and task structs.

I think Bart was hitting a similar issue but a level up in the host
removal case, because srp always does a host per target and so it just
does a scsi_remove_host.

I think some driver/scsi_host_template callbacks that are called when
the host or target is released would work. At least it will for iscsi.
Will let Bart comment on srp, and we can make patches from there.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished
  2013-06-25  2:26         ` Mike Christie
@ 2013-06-25  2:56           ` Michael Christie
  0 siblings, 0 replies; 51+ messages in thread
From: Michael Christie @ 2013-06-25  2:56 UTC (permalink / raw)
  To: James Bottomley
  Cc: Bart Van Assche, linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min,
	David Milburn, Hannes Reinecke

On Jun 24, 2013, at 9:26 PM, Mike Christie <michaelc@cs.wisc.edu> wrote:

> On 06/24/2013 05:27 PM, James Bottomley wrote:
>>>> However, what's the reasoning behind wanting to do this?  In theory all
>>>> necessary resources for the eh thread should only be freed in the
>>>> release callback.  That means they aren't freed until all error recovery
>>>> completes.
>>> 
>>> I think it makes it easier to handle cleanup of driver resources
>>> needed
>>> for aborts/resets for some drivers. If after host removal, the scsi eh
>>> can call into the driver after scsi_remove_host is called then we have
>>> to set some internal bits to fail future eh callback calls and handle
>>> waiting on or flushing running eh operations. If we know that after
>>> scsi_host_remove is called the eh callbacks will not be running and
>>> will
>>> not be called we can just free the driver resources.
>>> 
>>> For iscsi and I think drivers that do scsi_remove_target it would be
>>> helpful to have something similar at the target level.
>> 
>> I'm wary of this because it combines two models: a definite state model
>> (where we move from state to state waiting for the completions) with an
>> event driven one (in theory the current one); such combinations rarely
>> lead to good things because you get a mixture of actions causing state
>> transitions some of which are waited for and some of which have an async
>> transition and that ends up confusing the heck out of everybody no
>> matter how carefully documented.  Can you give me some use cases of what
>> you're trying to achieve?  Could it be as simple as an event that fires
>> on release?
> 
> 
> The problem that we hit in iscsi is that we will call scsi_remove_target
> (we used to call scsi_remove_host when we always did a host per target
> so we hit the problem at that level before). That will complete, but the
> scsi eh might still be trying to abort/reset devices accessed through
> that target. To avoid freeing resources that the iscsi scsi eh might be
> using, we set internal state bits and wait on host_busy to go to zero
> before we tear down the iscsi session, conn and task structs.
> 
> I think Bart was hitting a similar issue but a level up in the host
> removal case, because srp always does a host per target and so it just
> does a scsi_remove_host..

I take this back. I don't think it is a issue anymore and I think I can remove the iscsi hack. With the blk_cleanup_queue/blk_drain_queue code I think the target and the removal of its devices will not complete until the scsi eh is completed. The blk_drain_queue code will now wait for the eh to complete because the rq counters will be incremented.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 3/9] Avoid calling __scsi_remove_device() twice
  2013-06-24 17:38   ` James Bottomley
@ 2013-06-25  8:37     ` Bart Van Assche
  2013-06-25 13:44       ` James Bottomley
  0 siblings, 1 reply; 51+ messages in thread
From: Bart Van Assche @ 2013-06-25  8:37 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Mike Christie, Hannes Reinecke

On 06/24/13 19:38, James Bottomley wrote:
> On Wed, 2013-06-12 at 14:52 +0200, Bart Van Assche wrote:
>> SCSI devices are added to the shost->__devices list from inside
>> scsi_alloc_sdev(). If something goes wrong during LUN scanning,
>> e.g. a transport layer failure occurs, then __scsi_remove_device()
>> can get invoked by the LUN scanning code for a SCSI device in
>> state SDEV_CREATED_BLOCK or SDEV_BLOCKED. If this happens then
>> the SCSI device has not yet been added to sysfs (is_visible == 0).
>> Make sure that if this happens these devices are transitioned
>> into state SDEV_DEL. This avoids that __scsi_remove_device()
>> gets invoked a second time by scsi_forget_host().
> 
> The current principle is that scsi_remove_device can fail, so the
> condition you're avoiding is expected.  If you want to make it always
> succeed, we have to worry about any device state racing with an
> asynchronous remove, which looks like a whole nasty can of worms.
> 
> The change log makes it sound like what you actually want to enable is
> the ability to remove devices which fail probing but which are in the
> blocked state, so why not just respin with only that, which is just
> adding the blocked states to the ->SDEV_DEL state transitions?

If what you had in mind is the patch below, I think we agree:

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index e3d6276..eaea242 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2185,6 +2185,8 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
 		case SDEV_OFFLINE:
 		case SDEV_TRANSPORT_OFFLINE:
 		case SDEV_CANCEL:
+		case SDEV_BLOCK:
+		case SDEV_CREATED_BLOCK:
 			break;
 		default:
 			goto illegal;



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 4/9] Disallow changing the device state via sysfs into "deleted"
  2013-06-24 17:59   ` James Bottomley
@ 2013-06-25  8:41     ` Bart Van Assche
  2013-06-25 13:42       ` James Bottomley
  0 siblings, 1 reply; 51+ messages in thread
From: Bart Van Assche @ 2013-06-25  8:41 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Mike Christie, Hannes Reinecke

On 06/24/13 19:59, James Bottomley wrote:
> On Wed, 2013-06-12 at 14:53 +0200, Bart Van Assche wrote:
>> Changing the state of a SCSI device via sysfs into "cancel" or
>> "deleted" prevents removal of these devices by scsi_remove_host().
>> Hence do not allow this. Also, introduce the symbolic name
>> INVALID_SDEV_STATE, representing a value different from any valid
>> SCSI device state. Update scsi_device_set_state() such that gcc
>> does not issue a warning about an enumeration value not being
>> handled inside a switch statement.
>
> zero is the invalid state, that's why the SDEV_ states start at 1.
> Using a bare zero also means that gcc doesn't have to consider it in the
> switch statement, so there's no need to introduce a new one.
>
> If we want to try to babysit user initiated state changes, then it looks
> like OFFLINE<->RUNNING might be the only useful ones?

How about the BLOCKED<>RUNNING and QUIESCE<>RUNNING transitions ? I 
think it may be useful for a user to trigger these as well.

Bart.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished
  2013-06-24 22:27       ` James Bottomley
  2013-06-25  2:26         ` Mike Christie
@ 2013-06-25  9:01         ` Bart Van Assche
  2013-06-25 13:45           ` James Bottomley
  2013-06-25 11:13         ` Bart Van Assche
  2 siblings, 1 reply; 51+ messages in thread
From: Bart Van Assche @ 2013-06-25  9:01 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Christie, linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min,
	David Milburn, Hannes Reinecke

On 06/25/13 00:27, James Bottomley wrote:
> On Mon, 2013-06-24 at 15:04 -0500, Mike Christie wrote:
>> On 06/24/2013 02:19 PM, James Bottomley wrote:
>>> On Wed, 2013-06-12 at 14:55 +0200, Bart Van Assche wrote:
>>>> A SCSI LLD may start cleaning up host resources as soon as
>>>> scsi_remove_host() returns. These host resources may be needed by
>>>> the LLD in an implementation of one of the eh_* functions. So if
>>>> one of the eh_* functions is in progress when scsi_remove_host()
>>>> is invoked, wait until the eh_* function has finished. Also, do
>>>> not invoke any of the eh_* functions after scsi_remove_host() has
>>>> started.
>>>
>>> We already have state guards for this, don't we?  That's the
>>> SHOST_*_RECOVERY ones.  When eh functions are active, the host
>>> transitions to a recovery state, so the wait could just wait on that
>>> state rather than implement an open coded counting semaphore.
>>
>> That seems better. For the sg_reset_provider case we just would have to
>> also wait on the tmf_in_progress bit.
>
> The simplest way is may just be to move the kthread_stop() from release
> to remove.  That synchronously waits for the outstanding error handling
> to complete and the eh thread to stop.  Perhaps the eh thread should
> also wait for tmf in progress before it dies?

Regarding TMF that are in progress: my preference is to leave it to the 
LLD to wait for any TMF in progress if necessary. At least with SRP over 
RDMA it is possible to prevent receiving further TMF completion 
notifications by closing the connection over which these TMF were sent.

There is a difference though between moving the EH kthread_stop() call 
and the patch at the start of this thread: moving the EH kthread_stop() 
call does not prevent that an ioctl like SG_SCSI_RESET triggers an eh_* 
callback after scsi_remove_host() has finished. However, the 
scsi_begin_eh() / scsi_end_eh() functions do prevent that an ioctl can 
cause an eh_* callback to be invoked after scsi_remove_device() finished.

Bart.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished
  2013-06-24 22:27       ` James Bottomley
  2013-06-25  2:26         ` Mike Christie
  2013-06-25  9:01         ` Bart Van Assche
@ 2013-06-25 11:13         ` Bart Van Assche
  2 siblings, 0 replies; 51+ messages in thread
From: Bart Van Assche @ 2013-06-25 11:13 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Christie, linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min,
	David Milburn, Hannes Reinecke

On 06/25/13 00:27, James Bottomley wrote:
> For a variety of reasons this patch set is incredibly hard to review:
> Almost every patch touches pieces in the mid layer where you have to be
> sure in minute detail you know what's going on (and what should be going
> on), so usually it's a couple of hours with the source code just making
> sure you do know this.  Plus it's code where the underlying usage model
> has evolved over the years meaning the original guarantees might have
> been violated silently somewhere and the ramifications spread out like
> tentacles everywhere.  Finally, it's not clear from the change logs why
> the changes are actually being made: for instance sentence one of this
> change log says "A SCSI LLD may start cleaning up host resources as soon
> as scsi_remove_host() returns." which causes my sanity checker to go off
> immediately because in a refcounted model, like we use for dev, target
> and host, nothing essential is supposed to be freed until the last put
> which may or may not happen in the remove function.

If the invocations of the eh_* callback functions would be visible to 
the block layer then blk_cleanup_queue() would wait until any such eh_* 
invocations have finished. Such an approach could simplify device 
removal in the SCSI mid-layer significantly. It also would avoid that an 
eh_* callback can be invoked via an ioctl after scsi_remove_device() has 
finished.

Bart.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 4/9] Disallow changing the device state via sysfs into "deleted"
  2013-06-25  8:41     ` Bart Van Assche
@ 2013-06-25 13:42       ` James Bottomley
  0 siblings, 0 replies; 51+ messages in thread
From: James Bottomley @ 2013-06-25 13:42 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Mike Christie, Hannes Reinecke

On Tue, 2013-06-25 at 10:41 +0200, Bart Van Assche wrote:
> On 06/24/13 19:59, James Bottomley wrote:
> > On Wed, 2013-06-12 at 14:53 +0200, Bart Van Assche wrote:
> >> Changing the state of a SCSI device via sysfs into "cancel" or
> >> "deleted" prevents removal of these devices by scsi_remove_host().
> >> Hence do not allow this. Also, introduce the symbolic name
> >> INVALID_SDEV_STATE, representing a value different from any valid
> >> SCSI device state. Update scsi_device_set_state() such that gcc
> >> does not issue a warning about an enumeration value not being
> >> handled inside a switch statement.
> >
> > zero is the invalid state, that's why the SDEV_ states start at 1.
> > Using a bare zero also means that gcc doesn't have to consider it in the
> > switch statement, so there's no need to introduce a new one.
> >
> > If we want to try to babysit user initiated state changes, then it looks
> > like OFFLINE<->RUNNING might be the only useful ones?
> 
> How about the BLOCKED<>RUNNING and QUIESCE<>RUNNING transitions ? I 
> think it may be useful for a user to trigger these as well.

They're part of paired state, so the user would tamper with assumptions
the HBA is making ... also, just changing the state doesn't help, the
queue needs to be restarted for these transitions which it currently
isn't.

James


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 3/9] Avoid calling __scsi_remove_device() twice
  2013-06-25  8:37     ` Bart Van Assche
@ 2013-06-25 13:44       ` James Bottomley
  2013-06-25 15:23         ` Bart Van Assche
  0 siblings, 1 reply; 51+ messages in thread
From: James Bottomley @ 2013-06-25 13:44 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Mike Christie, Hannes Reinecke

On Tue, 2013-06-25 at 10:37 +0200, Bart Van Assche wrote:
> On 06/24/13 19:38, James Bottomley wrote:
> > On Wed, 2013-06-12 at 14:52 +0200, Bart Van Assche wrote:
> >> SCSI devices are added to the shost->__devices list from inside
> >> scsi_alloc_sdev(). If something goes wrong during LUN scanning,
> >> e.g. a transport layer failure occurs, then __scsi_remove_device()
> >> can get invoked by the LUN scanning code for a SCSI device in
> >> state SDEV_CREATED_BLOCK or SDEV_BLOCKED. If this happens then
> >> the SCSI device has not yet been added to sysfs (is_visible == 0).
> >> Make sure that if this happens these devices are transitioned
> >> into state SDEV_DEL. This avoids that __scsi_remove_device()
> >> gets invoked a second time by scsi_forget_host().
> > 
> > The current principle is that scsi_remove_device can fail, so the
> > condition you're avoiding is expected.  If you want to make it always
> > succeed, we have to worry about any device state racing with an
> > asynchronous remove, which looks like a whole nasty can of worms.
> > 
> > The change log makes it sound like what you actually want to enable is
> > the ability to remove devices which fail probing but which are in the
> > blocked state, so why not just respin with only that, which is just
> > adding the blocked states to the ->SDEV_DEL state transitions?
> 
> If what you had in mind is the patch below, I think we agree:
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index e3d6276..eaea242 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2185,6 +2185,8 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
>  		case SDEV_OFFLINE:
>  		case SDEV_TRANSPORT_OFFLINE:
>  		case SDEV_CANCEL:
> +		case SDEV_BLOCK:
> +		case SDEV_CREATED_BLOCK:

Something like this, yes.  For the probe lun case, we have to be in
CREATED, so any block action transitions only to CREATED_BLOCK.  The
BLOCK->DEL transition can only be a result of an async remove racing
with bringup, can't it?  Which is something I think we still want to
forbid.

James


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished
  2013-06-25  9:01         ` Bart Van Assche
@ 2013-06-25 13:45           ` James Bottomley
  2013-06-25 15:31             ` Bart Van Assche
  0 siblings, 1 reply; 51+ messages in thread
From: James Bottomley @ 2013-06-25 13:45 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Mike Christie, linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min,
	David Milburn, Hannes Reinecke

On Tue, 2013-06-25 at 11:01 +0200, Bart Van Assche wrote:
> On 06/25/13 00:27, James Bottomley wrote:
> > On Mon, 2013-06-24 at 15:04 -0500, Mike Christie wrote:
> >> On 06/24/2013 02:19 PM, James Bottomley wrote:
> >>> On Wed, 2013-06-12 at 14:55 +0200, Bart Van Assche wrote:
> >>>> A SCSI LLD may start cleaning up host resources as soon as
> >>>> scsi_remove_host() returns. These host resources may be needed by
> >>>> the LLD in an implementation of one of the eh_* functions. So if
> >>>> one of the eh_* functions is in progress when scsi_remove_host()
> >>>> is invoked, wait until the eh_* function has finished. Also, do
> >>>> not invoke any of the eh_* functions after scsi_remove_host() has
> >>>> started.
> >>>
> >>> We already have state guards for this, don't we?  That's the
> >>> SHOST_*_RECOVERY ones.  When eh functions are active, the host
> >>> transitions to a recovery state, so the wait could just wait on that
> >>> state rather than implement an open coded counting semaphore.
> >>
> >> That seems better. For the sg_reset_provider case we just would have to
> >> also wait on the tmf_in_progress bit.
> >
> > The simplest way is may just be to move the kthread_stop() from release
> > to remove.  That synchronously waits for the outstanding error handling
> > to complete and the eh thread to stop.  Perhaps the eh thread should
> > also wait for tmf in progress before it dies?
> 
> Regarding TMF that are in progress: my preference is to leave it to the 
> LLD to wait for any TMF in progress if necessary. At least with SRP over 
> RDMA it is possible to prevent receiving further TMF completion 
> notifications by closing the connection over which these TMF were sent.
> 
> There is a difference though between moving the EH kthread_stop() call 
> and the patch at the start of this thread: moving the EH kthread_stop() 
> call does not prevent that an ioctl like SG_SCSI_RESET triggers an eh_* 
> callback after scsi_remove_host() has finished. However, the 
> scsi_begin_eh() / scsi_end_eh() functions do prevent that an ioctl can 
> cause an eh_* callback to be invoked after scsi_remove_device() finished.

OK, but this doesn't tell me what you're trying to achieve.

An eh function is allowable as long as the host hadn't had the release
callback executed.  That means you must have to have a reference to the
device/host to execute the eh function, which is currently guaranteed
for all invocations.

James



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 3/9] Avoid calling __scsi_remove_device() twice
  2013-06-25 13:44       ` James Bottomley
@ 2013-06-25 15:23         ` Bart Van Assche
  0 siblings, 0 replies; 51+ messages in thread
From: Bart Van Assche @ 2013-06-25 15:23 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Mike Christie, Hannes Reinecke

On 06/25/13 15:44, James Bottomley wrote:
> On Tue, 2013-06-25 at 10:37 +0200, Bart Van Assche wrote:
>> On 06/24/13 19:38, James Bottomley wrote:
>>> On Wed, 2013-06-12 at 14:52 +0200, Bart Van Assche wrote:
>>>> SCSI devices are added to the shost->__devices list from inside
>>>> scsi_alloc_sdev(). If something goes wrong during LUN scanning,
>>>> e.g. a transport layer failure occurs, then __scsi_remove_device()
>>>> can get invoked by the LUN scanning code for a SCSI device in
>>>> state SDEV_CREATED_BLOCK or SDEV_BLOCKED. If this happens then
>>>> the SCSI device has not yet been added to sysfs (is_visible == 0).
>>>> Make sure that if this happens these devices are transitioned
>>>> into state SDEV_DEL. This avoids that __scsi_remove_device()
>>>> gets invoked a second time by scsi_forget_host().
>>>
>>> The current principle is that scsi_remove_device can fail, so the
>>> condition you're avoiding is expected.  If you want to make it always
>>> succeed, we have to worry about any device state racing with an
>>> asynchronous remove, which looks like a whole nasty can of worms.
>>>
>>> The change log makes it sound like what you actually want to enable is
>>> the ability to remove devices which fail probing but which are in the
>>> blocked state, so why not just respin with only that, which is just
>>> adding the blocked states to the ->SDEV_DEL state transitions?
>>
>> If what you had in mind is the patch below, I think we agree:
>>
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index e3d6276..eaea242 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -2185,6 +2185,8 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
>>   		case SDEV_OFFLINE:
>>   		case SDEV_TRANSPORT_OFFLINE:
>>   		case SDEV_CANCEL:
>> +		case SDEV_BLOCK:
>> +		case SDEV_CREATED_BLOCK:
>
> Something like this, yes.  For the probe lun case, we have to be in
> CREATED, so any block action transitions only to CREATED_BLOCK.  The
> BLOCK->DEL transition can only be a result of an async remove racing
> with bringup, can't it?  Which is something I think we still want to
> forbid.

OK, I will leave the BLOCK->DEL transition out.

Bart.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished
  2013-06-25 13:45           ` James Bottomley
@ 2013-06-25 15:31             ` Bart Van Assche
  2013-06-25 16:13               ` Michael Christie
  0 siblings, 1 reply; 51+ messages in thread
From: Bart Van Assche @ 2013-06-25 15:31 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Christie, linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min,
	David Milburn, Hannes Reinecke

On 06/25/13 15:45, James Bottomley wrote:
> On Tue, 2013-06-25 at 11:01 +0200, Bart Van Assche wrote:
>> There is a difference though between moving the EH kthread_stop() call
>> and the patch at the start of this thread: moving the EH kthread_stop()
>> call does not prevent that an ioctl like SG_SCSI_RESET triggers an eh_*
>> callback after scsi_remove_host() has finished. However, the
>> scsi_begin_eh() / scsi_end_eh() functions do prevent that an ioctl can
>> cause an eh_* callback to be invoked after scsi_remove_device() finished.
>
> OK, but this doesn't tell me what you're trying to achieve.
>
> An eh function is allowable as long as the host hadn't had the release
> callback executed.  That means you must have to have a reference to the
> device/host to execute the eh function, which is currently guaranteed
> for all invocations.

That raises a new question: how is an LLD expected to clean up resources 
without triggering a race condition ? What you wrote means that it's not 
safe for an LLD to start cleaning up the resources needed by the eh_* 
callbacks immediately after scsi_remove_device() returns since it it not 
guaranteed that at that time all references to the device have already 
been dropped.

Bart.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished
  2013-06-25 15:31             ` Bart Van Assche
@ 2013-06-25 16:13               ` Michael Christie
  2013-06-25 17:40                 ` James Bottomley
  2014-01-30 19:46                 ` Bart Van Assche
  0 siblings, 2 replies; 51+ messages in thread
From: Michael Christie @ 2013-06-25 16:13 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: James Bottomley, linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min,
	David Milburn, Hannes Reinecke


On Jun 25, 2013, at 10:31 AM, Bart Van Assche <bvanassche@acm.org> wrote:

> On 06/25/13 15:45, James Bottomley wrote:
>> On Tue, 2013-06-25 at 11:01 +0200, Bart Van Assche wrote:
>>> There is a difference though between moving the EH kthread_stop() call
>>> and the patch at the start of this thread: moving the EH kthread_stop()
>>> call does not prevent that an ioctl like SG_SCSI_RESET triggers an eh_*
>>> callback after scsi_remove_host() has finished. However, the
>>> scsi_begin_eh() / scsi_end_eh() functions do prevent that an ioctl can
>>> cause an eh_* callback to be invoked after scsi_remove_device() finished.
>> 
>> OK, but this doesn't tell me what you're trying to achieve.
>> 
>> An eh function is allowable as long as the host hadn't had the release
>> callback executed.  That means you must have to have a reference to the
>> device/host to execute the eh function, which is currently guaranteed
>> for all invocations.
> 
> That raises a new question: how is an LLD expected to clean up resources without triggering a race condition ? What you wrote means that it's not safe for an LLD to start cleaning up the resources needed by the eh_* callbacks immediately after scsi_remove_device() returns since it it not guaranteed that at that time all references to the device have already been dropped.
> 


A callback in the device/target/host (whatever is needed) release function would do this right? If I understand James right, I think he suggested something like this in another mail.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished
  2013-06-25 16:13               ` Michael Christie
@ 2013-06-25 17:40                 ` James Bottomley
  2013-06-25 17:47                   ` Bart Van Assche
  2014-01-30 19:46                 ` Bart Van Assche
  1 sibling, 1 reply; 51+ messages in thread
From: James Bottomley @ 2013-06-25 17:40 UTC (permalink / raw)
  To: Michael Christie
  Cc: Bart Van Assche, linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min,
	David Milburn, Hannes Reinecke

On Tue, 2013-06-25 at 11:13 -0500, Michael Christie wrote:
> On Jun 25, 2013, at 10:31 AM, Bart Van Assche <bvanassche@acm.org> wrote:
> 
> > On 06/25/13 15:45, James Bottomley wrote:
> >> On Tue, 2013-06-25 at 11:01 +0200, Bart Van Assche wrote:
> >>> There is a difference though between moving the EH kthread_stop() call
> >>> and the patch at the start of this thread: moving the EH kthread_stop()
> >>> call does not prevent that an ioctl like SG_SCSI_RESET triggers an eh_*
> >>> callback after scsi_remove_host() has finished. However, the
> >>> scsi_begin_eh() / scsi_end_eh() functions do prevent that an ioctl can
> >>> cause an eh_* callback to be invoked after scsi_remove_device() finished.
> >> 
> >> OK, but this doesn't tell me what you're trying to achieve.
> >> 
> >> An eh function is allowable as long as the host hadn't had the release
> >> callback executed.  That means you must have to have a reference to the
> >> device/host to execute the eh function, which is currently guaranteed
> >> for all invocations.
> > 
> > That raises a new question: how is an LLD expected to clean up resources without triggering a race condition ? What you wrote means that it's not safe for an LLD to start cleaning up the resources needed by the eh_* callbacks immediately after scsi_remove_device() returns since it it not guaranteed that at that time all references to the device have already been dropped.
> > 
> 
> 
> A callback in the device/target/host (whatever is needed) release
> function would do this right? If I understand James right, I think he
> suggested something like this in another mail.

Exactly ... at least that's what we should do.

If I look at what we actually do: all the HBAs treat scsi_remove_host as
a waited for transition.  The reason this works is the loop over
__scsi_remove_device() in scsi_forget_host().  By the time that loop
returns, every scsi_device is gone (and so is every target).  Because
blk_cleanup_queue() induces a synchronous wait for the queue to die in
__scsi_remove_device(), there can be no outstanding I/O and no eh
activity for the device when it returns (and no possibility of starting
any).  Thus at the end of scsi_forget_host, we have no devices to start
I/O and no eh activity, so the final put will be the last.

James





^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished
  2013-06-25 17:40                 ` James Bottomley
@ 2013-06-25 17:47                   ` Bart Van Assche
  0 siblings, 0 replies; 51+ messages in thread
From: Bart Van Assche @ 2013-06-25 17:47 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Joe Lawrence, Tejun Heo, Chanho Min, David Milburn,
	Hannes Reinecke, Mike Christie

On 06/25/13 19:40, James Bottomley wrote:
> If I look at what we actually do: all the HBAs treat scsi_remove_host as
> a waited for transition.  The reason this works is the loop over
> __scsi_remove_device() in scsi_forget_host().  By the time that loop
> returns, every scsi_device is gone (and so is every target).  Because
> blk_cleanup_queue() induces a synchronous wait for the queue to die in
> __scsi_remove_device(), there can be no outstanding I/O and no eh
> activity for the device when it returns (and no possibility of starting
> any).  Thus at the end of scsi_forget_host, we have no devices to start
> I/O and no eh activity, so the final put will be the last.

With the patch at the start of this thread everything works like you 
described above. However, without that patch EH activity can continue 
after scsi_remove_device() has returned. I have seen that occurring 
several times while testing SCSI LLD patches.

Bart.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished
  2013-06-25 16:13               ` Michael Christie
  2013-06-25 17:40                 ` James Bottomley
@ 2014-01-30 19:46                 ` Bart Van Assche
  2014-01-31  5:58                   ` James Bottomley
  1 sibling, 1 reply; 51+ messages in thread
From: Bart Van Assche @ 2014-01-30 19:46 UTC (permalink / raw)
  To: Michael Christie; +Cc: James Bottomley, Hannes Reinecke, linux-scsi

On 06/25/13 18:13, Michael Christie wrote:
> On Jun 25, 2013, at 10:31 AM, Bart Van Assche <bvanassche@acm.org> wrote:
>> On 06/25/13 15:45, James Bottomley wrote:
>>> On Tue, 2013-06-25 at 11:01 +0200, Bart Van Assche wrote:
>>>> There is a difference though between moving the EH kthread_stop() call
>>>> and the patch at the start of this thread: moving the EH kthread_stop()
>>>> call does not prevent that an ioctl like SG_SCSI_RESET triggers an eh_*
>>>> callback after scsi_remove_host() has finished. However, the
>>>> scsi_begin_eh() / scsi_end_eh() functions do prevent that an ioctl can
>>>> cause an eh_* callback to be invoked after scsi_remove_device() finished.
>>>
>>> OK, but this doesn't tell me what you're trying to achieve.
>>>
>>> An eh function is allowable as long as the host hadn't had the release
>>> callback executed.  That means you must have to have a reference to the
>>> device/host to execute the eh function, which is currently guaranteed
>>> for all invocations.
>>
>> That raises a new question: how is an LLD expected to clean up resources
>> without triggering a race condition ? What you wrote means that it's not
>> safe for an LLD to start cleaning up the resources needed by the eh_*
>> callbacks immediately after scsi_remove_device() returns since it it not
>> guaranteed that at that time all references to the device have already
>> been dropped.
>
> A callback in the device/target/host (whatever is needed) release function
> would do this right? If I understand James right, I think he suggested
> something like this in another mail.

(replying to an e-mail of seven months ago - see also
http://thread.gmane.org/gmane.linux.scsi/82572/focus=82822)

Hello Mike,

Sorry but I'm afraid that making the SCSI core invoke a callback
function from a device, target or host release function would create a
new challenge, namely making sure that all these callback functions have
finished before the module is unloaded that contains the SCSI host
template and the code implementing such a callback function. That
challenge is not specific to the SCSI infrastructure but is a general
question that has not yet been solved in the Linux kernel (see e.g.
"[PATCH] kobject: provide kobject_put_wait to fix module unload race"
for a more general discussion about how to ensure that kobject release
functions are invoked before the module is unloaded that owns the
release function - http://thread.gmane.org/gmane.linux.kernel/1622885).

Or maybe this just means that I misunderstood you ?

In case it is not clear why I'm reviving this discussion: now that the
"improved eh timeout handler" patch is upstream (commit
e494f6a728394ab0df194342549ee20e6f0752df) there is an additional way in
which the SCSI core can invoke an EH function concurrently with or after
scsi_remove_host() has finished, namely from the TMF work queue
(tmf_work_q).

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished
  2014-01-30 19:46                 ` Bart Van Assche
@ 2014-01-31  5:58                   ` James Bottomley
  2014-01-31  7:52                     ` Bart Van Assche
  0 siblings, 1 reply; 51+ messages in thread
From: James Bottomley @ 2014-01-31  5:58 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Michael Christie, Hannes Reinecke, linux-scsi

On Thu, 2014-01-30 at 20:46 +0100, Bart Van Assche wrote:
> On 06/25/13 18:13, Michael Christie wrote:
> > On Jun 25, 2013, at 10:31 AM, Bart Van Assche <bvanassche@acm.org> wrote:
> >> On 06/25/13 15:45, James Bottomley wrote:
> >>> On Tue, 2013-06-25 at 11:01 +0200, Bart Van Assche wrote:
> >>>> There is a difference though between moving the EH kthread_stop() call
> >>>> and the patch at the start of this thread: moving the EH kthread_stop()
> >>>> call does not prevent that an ioctl like SG_SCSI_RESET triggers an eh_*
> >>>> callback after scsi_remove_host() has finished. However, the
> >>>> scsi_begin_eh() / scsi_end_eh() functions do prevent that an ioctl can
> >>>> cause an eh_* callback to be invoked after scsi_remove_device() finished.
> >>>
> >>> OK, but this doesn't tell me what you're trying to achieve.
> >>>
> >>> An eh function is allowable as long as the host hadn't had the release
> >>> callback executed.  That means you must have to have a reference to the
> >>> device/host to execute the eh function, which is currently guaranteed
> >>> for all invocations.
> >>
> >> That raises a new question: how is an LLD expected to clean up resources
> >> without triggering a race condition ? What you wrote means that it's not
> >> safe for an LLD to start cleaning up the resources needed by the eh_*
> >> callbacks immediately after scsi_remove_device() returns since it it not
> >> guaranteed that at that time all references to the device have already
> >> been dropped.
> >
> > A callback in the device/target/host (whatever is needed) release function
> > would do this right? If I understand James right, I think he suggested
> > something like this in another mail.
> 
> (replying to an e-mail of seven months ago - see also
> http://thread.gmane.org/gmane.linux.scsi/82572/focus=82822)
> 
> Hello Mike,
> 
> Sorry but I'm afraid that making the SCSI core invoke a callback
> function from a device, target or host release function would create a
> new challenge, namely making sure that all these callback functions have
> finished before the module is unloaded that contains the SCSI host
> template and the code implementing such a callback function. That
> challenge is not specific to the SCSI infrastructure but is a general
> question that has not yet been solved in the Linux kernel (see e.g.
> "[PATCH] kobject: provide kobject_put_wait to fix module unload race"
> for a more general discussion about how to ensure that kobject release
> functions are invoked before the module is unloaded that owns the
> release function - http://thread.gmane.org/gmane.linux.kernel/1622885).

For callbacks, that's easy: it's module_get/module_put

> Or maybe this just means that I misunderstood you ?
> 
> In case it is not clear why I'm reviving this discussion: now that the
> "improved eh timeout handler" patch is upstream (commit
> e494f6a728394ab0df194342549ee20e6f0752df) there is an additional way in
> which the SCSI core can invoke an EH function concurrently with or after
> scsi_remove_host() has finished, namely from the TMF work queue
> (tmf_work_q).

But the fundamental guarantee is that the eh thread for the host (the eh
context if you will) has to be dead before the host can be removed and
the module unloaded.  The thread doesn't die until all the work is done.

James



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished
  2014-01-31  5:58                   ` James Bottomley
@ 2014-01-31  7:52                     ` Bart Van Assche
  0 siblings, 0 replies; 51+ messages in thread
From: Bart Van Assche @ 2014-01-31  7:52 UTC (permalink / raw)
  To: James Bottomley; +Cc: Michael Christie, Hannes Reinecke, linux-scsi

On 01/31/14 06:58, James Bottomley wrote:
> On Thu, 2014-01-30 at 20:46 +0100, Bart Van Assche wrote:
>> On 06/25/13 18:13, Michael Christie wrote:
>> Sorry but I'm afraid that making the SCSI core invoke a callback
>> function from a device, target or host release function would create a
>> new challenge, namely making sure that all these callback functions have
>> finished before the module is unloaded that contains the SCSI host
>> template and the code implementing such a callback function. That
>> challenge is not specific to the SCSI infrastructure but is a general
>> question that has not yet been solved in the Linux kernel (see e.g.
>> "[PATCH] kobject: provide kobject_put_wait to fix module unload race"
>> for a more general discussion about how to ensure that kobject release
>> functions are invoked before the module is unloaded that owns the
>> release function - http://thread.gmane.org/gmane.linux.kernel/1622885).
> 
> For callbacks, that's easy: it's module_get/module_put

Adding a module_get() call in scsi_add_host() and a module_put() call in
scsi_host_dev_release() would cause a behavior change that would be
reported by end users as "system hangs during shutdown". Today it is
possible to unload a kernel module via rmmod that created one or more
SCSI hosts. Since user space does not remove SCSI hosts during system
shutdown, trying to unload a SCSI LLD kernel module that had created one
or more SCSI hosts would cause the module unload to fail because of a
non-zero module reference count.

>> In case it is not clear why I'm reviving this discussion: now that the
>> "improved eh timeout handler" patch is upstream (commit
>> e494f6a728394ab0df194342549ee20e6f0752df) there is an additional way in
>> which the SCSI core can invoke an EH function concurrently with or after
>> scsi_remove_host() has finished, namely from the TMF work queue
>> (tmf_work_q).
> 
> But the fundamental guarantee is that the eh thread for the host (the eh
> context if you will) has to be dead before the host can be removed and
> the module unloaded.  The thread doesn't die until all the work is done.

Maybe I'm misunderstanding something, but as far as I can see
kthread_stop(shost->ehandler) is invoked from scsi_host_dev_release().
That last function is called by the last scsi_host_put(). And LLD's
invoke scsi_host_put() after scsi_remove_host(). In other words, the
SCSI error handler thread and TMF work queue are still active when
scsi_remove_host() returns.

Should I repost the patch at the start of this thread ?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2014-01-31  7:52 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-12 12:48 [PATCH v11 0/9] More device removal fixes Bart Van Assche
2013-06-12 12:49 ` [PATCH v11 1/9] Fix race between starved list and device removal Bart Van Assche
2013-06-24 15:38   ` James Bottomley
2013-06-24 16:16     ` Bart Van Assche
2013-06-24 16:23       ` James Bottomley
2013-06-24 17:24     ` Mike Christie
2013-06-24 17:49       ` James Bottomley
2013-06-12 12:51 ` [PATCH v11 2/9] Remove get_device() / put_device() pair from scsi_request_fn() Bart Van Assche
2013-06-24  1:29   ` Mike Christie
2013-06-24  2:36   ` James Bottomley
2013-06-24  7:13     ` Bart Van Assche
2013-06-24 13:34       ` James Bottomley
2013-06-24 15:43         ` Bart Van Assche
2013-06-12 12:52 ` [PATCH v11 3/9] Avoid calling __scsi_remove_device() twice Bart Van Assche
2013-06-23 21:35   ` Mike Christie
2013-06-24  6:29     ` Bart Van Assche
2013-06-24 17:38   ` James Bottomley
2013-06-25  8:37     ` Bart Van Assche
2013-06-25 13:44       ` James Bottomley
2013-06-25 15:23         ` Bart Van Assche
2013-06-12 12:53 ` [PATCH v11 4/9] Disallow changing the device state via sysfs into "deleted" Bart Van Assche
2013-06-24  1:05   ` Mike Christie
2013-06-24  6:35     ` Bart Van Assche
2013-06-24 17:59   ` James Bottomley
2013-06-25  8:41     ` Bart Van Assche
2013-06-25 13:42       ` James Bottomley
2013-06-12 12:54 ` [PATCH v11 5/9] Avoid saving/restoring interrupt state inside scsi_remove_host() Bart Van Assche
2013-06-24  1:06   ` Mike Christie
2013-06-12 12:55 ` [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished Bart Van Assche
2013-06-24  1:15   ` Mike Christie
2013-06-24  6:49     ` Bart Van Assche
2013-06-24 19:19   ` James Bottomley
2013-06-24 20:04     ` Mike Christie
2013-06-24 22:27       ` James Bottomley
2013-06-25  2:26         ` Mike Christie
2013-06-25  2:56           ` Michael Christie
2013-06-25  9:01         ` Bart Van Assche
2013-06-25 13:45           ` James Bottomley
2013-06-25 15:31             ` Bart Van Assche
2013-06-25 16:13               ` Michael Christie
2013-06-25 17:40                 ` James Bottomley
2013-06-25 17:47                   ` Bart Van Assche
2014-01-30 19:46                 ` Bart Van Assche
2014-01-31  5:58                   ` James Bottomley
2014-01-31  7:52                     ` Bart Van Assche
2013-06-25 11:13         ` Bart Van Assche
2013-06-12 12:56 ` PATCH v11 7/9] Avoid that scsi_device_set_state() triggers a race Bart Van Assche
2013-06-12 12:57 ` [PATCH v11 8/9] Save and restore host_scribble during error handling Bart Van Assche
2013-06-24  1:21   ` Mike Christie
2013-06-24  2:08     ` James Bottomley
2013-06-12 12:58 ` [PATCH v11 9/9] Avoid reenabling I/O after the transport became offline Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.