* [PATCH 0/7 v5] More device removal fixes
@ 2012-10-26 12:00 Bart Van Assche
  2012-10-26 12:01 ` [PATCH 1/7] block: Avoid that blk_drain_queue() finishes early Bart Van Assche
                   ` (7 more replies)
  0 siblings, 8 replies; 24+ messages in thread
From: Bart Van Assche @ 2012-10-26 12:00 UTC (permalink / raw)
  To: linux-scsi, James Bottomley, Mike Christie, Jens Axboe,
	Tejun Heo, Chanho Min

Fix a few race conditions that can be triggered by removing a device:
- Ensure that request_fn() cannot be invoked on a dead queue.
- Ensure that blk_cleanup_queue() cannot finish while request_fn is
   still running.
- Fix a race between starved list processing and device removal.

These patches have been tested on top of kernel v3.7-rc2.

Changes compared to v4:
- Moved queue_flag_set(QUEUE_FLAG_DEAD, q) from blk_drain_queue() into
   blk_cleanup_queue().
- Declared the new __blk_run_queue_uncond() function inline. Checked in
   the generated assembler code that this function is really inlined in
   __blk_run_queue().
- Elaborated several patch descriptions.
- Added sparse annotations to scsi_request_fn().
- Split several patches.

Changes compared to v3:
- Fixed a race condition by setting QUEUE_FLAG_DEAD earlier.
- Added a patch for fixing a race between starved list processing
   and device removal to this series.

Changes compared to v2:
- Split second patch into two patches.
- Refined patch descriptions.

Changes compared to v1:
- Included a patch to rename QUEUE_FLAG_DEAD.
- Refined the descriptions of the __blk_run_queue_uncond() and
   blk_cleanup_queue() functions.


* [PATCH 1/7] block: Avoid that blk_drain_queue() finishes early
  2012-10-26 12:00 [PATCH 0/7 v5] More device removal fixes Bart Van Assche
@ 2012-10-26 12:01 ` Bart Van Assche
  2012-10-29  1:47   ` Tejun Heo
  2012-10-26 12:02 ` [PATCH 2/7] block: Let blk_drain_queue() caller obtain the queue lock Bart Van Assche
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Bart Van Assche @ 2012-10-26 12:01 UTC (permalink / raw)
  To: linux-scsi
  Cc: James Bottomley, Mike Christie, Jens Axboe, Tejun Heo, Chanho Min

Code like "drain |= q->nr_rqs[i]" might cause blk_drain_queue() to
finish early if the expression on the right-hand side is a multiple
of 256, since the drain variable is only eight bits wide and hence
only sees the low byte of that expression. Avoid this by changing
the type of the drain variable from bool into unsigned.

Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Tejun Heo <tj@kernel.org>
Cc: Chanho Min <chanho.min@lge.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-core.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a33870b..ef2e045 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -362,7 +362,7 @@ void blk_drain_queue(struct request_queue *q, bool drain_all)
 	int i;
 
 	while (true) {
-		bool drain = false;
+		unsigned drain = 0;
 
 		spin_lock_irq(q->queue_lock);
 
-- 
1.7.10.4


* [PATCH 2/7] block: Let blk_drain_queue() caller obtain the queue lock
  2012-10-26 12:00 [PATCH 0/7 v5] More device removal fixes Bart Van Assche
  2012-10-26 12:01 ` [PATCH 1/7] block: Avoid that blk_drain_queue() finishes early Bart Van Assche
@ 2012-10-26 12:02 ` Bart Van Assche
  2012-10-29  1:55   ` Tejun Heo
  2012-10-26 12:02 ` [PATCH 3/7] block: Rename queue dead flag Bart Van Assche
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Bart Van Assche @ 2012-10-26 12:02 UTC (permalink / raw)
  To: linux-scsi
  Cc: James Bottomley, Mike Christie, Jens Axboe, Tejun Heo, Chanho Min

Let the caller of blk_drain_queue() obtain the queue lock. This
improves the readability of a later patch in this series, namely
"Avoid that request_fn is invoked on a dead queue".

Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Tejun Heo <tj@kernel.org>
Cc: Chanho Min <chanho.min@lge.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-core.c |   30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ef2e045..76aff1d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -349,7 +349,7 @@ void blk_put_queue(struct request_queue *q)
 EXPORT_SYMBOL(blk_put_queue);
 
 /**
- * blk_drain_queue - drain requests from request_queue
+ * __blk_drain_queue - drain requests from request_queue
  * @q: queue to drain
  * @drain_all: whether to drain all requests or only the ones w/ ELVPRIV
  *
@@ -357,15 +357,17 @@ EXPORT_SYMBOL(blk_put_queue);
  * If not, only ELVPRIV requests are drained.  The caller is responsible
  * for ensuring that no new requests which need to be drained are queued.
  */
-void blk_drain_queue(struct request_queue *q, bool drain_all)
+static void __blk_drain_queue(struct request_queue *q, bool drain_all)
+	__releases(q->queue_lock)
+	__acquires(q->queue_lock)
 {
 	int i;
 
+	lockdep_assert_held(q->queue_lock);
+
 	while (true) {
 		unsigned drain = 0;
 
-		spin_lock_irq(q->queue_lock);
-
 		/*
 		 * The caller might be trying to drain @q before its
 		 * elevator is initialized.
@@ -401,11 +403,14 @@ void blk_drain_queue(struct request_queue *q, bool drain_all)
 			}
 		}
 
-		spin_unlock_irq(q->queue_lock);
-
 		if (!drain)
 			break;
+
+		spin_unlock_irq(q->queue_lock);
+
 		msleep(10);
+
+		spin_lock_irq(q->queue_lock);
 	}
 
 	/*
@@ -416,13 +421,9 @@ void blk_drain_queue(struct request_queue *q, bool drain_all)
 	if (q->request_fn) {
 		struct request_list *rl;
 
-		spin_lock_irq(q->queue_lock);
-
 		blk_queue_for_each_rl(rl, q)
 			for (i = 0; i < ARRAY_SIZE(rl->wait); i++)
 				wake_up_all(&rl->wait[i]);
-
-		spin_unlock_irq(q->queue_lock);
 	}
 }
 
@@ -446,7 +447,10 @@ void blk_queue_bypass_start(struct request_queue *q)
 	spin_unlock_irq(q->queue_lock);
 
 	if (drain) {
-		blk_drain_queue(q, false);
+		spin_lock_irq(q->queue_lock);
+		__blk_drain_queue(q, false);
+		spin_unlock_irq(q->queue_lock);
+
 		/* ensure blk_queue_bypass() is %true inside RCU read lock */
 		synchronize_rcu();
 	}
@@ -504,7 +508,9 @@ void blk_cleanup_queue(struct request_queue *q)
 	mutex_unlock(&q->sysfs_lock);
 
 	/* drain all requests queued before DEAD marking */
-	blk_drain_queue(q, true);
+	spin_lock_irq(lock);
+	__blk_drain_queue(q, true);
+	spin_unlock_irq(lock);
 
 	/* @q won't process any more request, flush async actions */
 	del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer);
-- 
1.7.10.4


* [PATCH 3/7] block: Rename queue dead flag
  2012-10-26 12:00 [PATCH 0/7 v5] More device removal fixes Bart Van Assche
  2012-10-26 12:01 ` [PATCH 1/7] block: Avoid that blk_drain_queue() finishes early Bart Van Assche
  2012-10-26 12:02 ` [PATCH 2/7] block: Let blk_drain_queue() caller obtain the queue lock Bart Van Assche
@ 2012-10-26 12:02 ` Bart Van Assche
  2012-10-26 12:03 ` [PATCH 4/7] block: Avoid that request_fn is invoked on a dead queue Bart Van Assche
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Bart Van Assche @ 2012-10-26 12:02 UTC (permalink / raw)
  To: linux-scsi
  Cc: James Bottomley, Mike Christie, Jens Axboe, Tejun Heo, Chanho Min

QUEUE_FLAG_DEAD is used to indicate that queuing new requests must
stop. After this flag has been set, queue draining starts. However,
during the queue draining phase it is still safe to invoke the
queue's request_fn, so QUEUE_FLAG_DYING is a better name for this
flag.

This patch has been generated by running the following command
over the kernel source tree:

git grep -lEw 'blk_queue_dead|QUEUE_FLAG_DEAD' |
    xargs sed -i.tmp -e 's/blk_queue_dead/blk_queue_dying/g'  \
        -e 's/QUEUE_FLAG_DEAD/QUEUE_FLAG_DYING/g';            \
sed -i.tmp -e "s/QUEUE_FLAG_DYING$(printf \\t)*5/QUEUE_FLAG_DYING$(printf \\t)5/g" \
    include/linux/blkdev.h;                                   \
sed -i.tmp -e 's/ DEAD/ DYING/g' block/blk-core.c

Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Chanho Min <chanho.min@lge.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-cgroup.c      |    2 +-
 block/blk-core.c        |   22 +++++++++++-----------
 block/blk-exec.c        |    2 +-
 block/blk-sysfs.c       |    4 ++--
 block/blk-throttle.c    |    2 +-
 block/blk.h             |    2 +-
 drivers/scsi/scsi_lib.c |    2 +-
 include/linux/blkdev.h  |    4 ++--
 8 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index cafcd74..396c319 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -231,7 +231,7 @@ struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
 	 * we shouldn't allow anything to go through for a bypassing queue.
 	 */
 	if (unlikely(blk_queue_bypass(q)))
-		return ERR_PTR(blk_queue_dead(q) ? -EINVAL : -EBUSY);
+		return ERR_PTR(blk_queue_dying(q) ? -EINVAL : -EBUSY);
 	return __blkg_lookup_create(blkcg, q, NULL);
 }
 EXPORT_SYMBOL_GPL(blkg_lookup_create);
diff --git a/block/blk-core.c b/block/blk-core.c
index 76aff1d..986e38e 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -477,16 +477,16 @@ EXPORT_SYMBOL_GPL(blk_queue_bypass_end);
  * blk_cleanup_queue - shutdown a request queue
  * @q: request queue to shutdown
  *
- * Mark @q DEAD, drain all pending requests, destroy and put it.  All
+ * Mark @q DYING, drain all pending requests, destroy and put it.  All
  * future requests will be failed immediately with -ENODEV.
  */
 void blk_cleanup_queue(struct request_queue *q)
 {
 	spinlock_t *lock = q->queue_lock;
 
-	/* mark @q DEAD, no new request or merges will be allowed afterwards */
+	/* mark @q DYING, no new request or merges will be allowed afterwards */
 	mutex_lock(&q->sysfs_lock);
-	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
+	queue_flag_set_unlocked(QUEUE_FLAG_DYING, q);
 	spin_lock_irq(lock);
 
 	/*
@@ -503,11 +503,11 @@ void blk_cleanup_queue(struct request_queue *q)
 
 	queue_flag_set(QUEUE_FLAG_NOMERGES, q);
 	queue_flag_set(QUEUE_FLAG_NOXMERGES, q);
-	queue_flag_set(QUEUE_FLAG_DEAD, q);
+	queue_flag_set(QUEUE_FLAG_DYING, q);
 	spin_unlock_irq(lock);
 	mutex_unlock(&q->sysfs_lock);
 
-	/* drain all requests queued before DEAD marking */
+	/* drain all requests queued before DYING marking */
 	spin_lock_irq(lock);
 	__blk_drain_queue(q, true);
 	spin_unlock_irq(lock);
@@ -722,7 +722,7 @@ EXPORT_SYMBOL(blk_init_allocated_queue);
 
 bool blk_get_queue(struct request_queue *q)
 {
-	if (likely(!blk_queue_dead(q))) {
+	if (likely(!blk_queue_dying(q))) {
 		__blk_get_queue(q);
 		return true;
 	}
@@ -876,7 +876,7 @@ static struct request *__get_request(struct request_list *rl, int rw_flags,
 	const bool is_sync = rw_is_sync(rw_flags) != 0;
 	int may_queue;
 
-	if (unlikely(blk_queue_dead(q)))
+	if (unlikely(blk_queue_dying(q)))
 		return NULL;
 
 	may_queue = elv_may_queue(q, rw_flags);
@@ -1056,7 +1056,7 @@ retry:
 	if (rq)
 		return rq;
 
-	if (!(gfp_mask & __GFP_WAIT) || unlikely(blk_queue_dead(q))) {
+	if (!(gfp_mask & __GFP_WAIT) || unlikely(blk_queue_dying(q))) {
 		blk_put_rl(rl);
 		return NULL;
 	}
@@ -1916,7 +1916,7 @@ int blk_insert_cloned_request(struct request_queue *q, struct request *rq)
 		return -EIO;
 
 	spin_lock_irqsave(q->queue_lock, flags);
-	if (unlikely(blk_queue_dead(q))) {
+	if (unlikely(blk_queue_dying(q))) {
 		spin_unlock_irqrestore(q->queue_lock, flags);
 		return -ENODEV;
 	}
@@ -2892,7 +2892,7 @@ static void queue_unplugged(struct request_queue *q, unsigned int depth,
 	/*
 	 * Don't mess with dead queue.
 	 */
-	if (unlikely(blk_queue_dead(q))) {
+	if (unlikely(blk_queue_dying(q))) {
 		spin_unlock(q->queue_lock);
 		return;
 	}
@@ -3001,7 +3001,7 @@ void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 		/*
 		 * Short-circuit if @q is dead
 		 */
-		if (unlikely(blk_queue_dead(q))) {
+		if (unlikely(blk_queue_dying(q))) {
 			__blk_end_request_all(rq, -ENODEV);
 			continue;
 		}
diff --git a/block/blk-exec.c b/block/blk-exec.c
index 8b6dc5b..4aec98d 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -60,7 +60,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
 
 	spin_lock_irq(q->queue_lock);
 
-	if (unlikely(blk_queue_dead(q))) {
+	if (unlikely(blk_queue_dying(q))) {
 		rq->errors = -ENXIO;
 		if (rq->end_io)
 			rq->end_io(rq, rq->errors);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index ce62046..7881477 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -466,7 +466,7 @@ queue_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
 	if (!entry->show)
 		return -EIO;
 	mutex_lock(&q->sysfs_lock);
-	if (blk_queue_dead(q)) {
+	if (blk_queue_dying(q)) {
 		mutex_unlock(&q->sysfs_lock);
 		return -ENOENT;
 	}
@@ -488,7 +488,7 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
 
 	q = container_of(kobj, struct request_queue, kobj);
 	mutex_lock(&q->sysfs_lock);
-	if (blk_queue_dead(q)) {
+	if (blk_queue_dying(q)) {
 		mutex_unlock(&q->sysfs_lock);
 		return -ENOENT;
 	}
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index a9664fa..3114622 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -302,7 +302,7 @@ static struct throtl_grp *throtl_lookup_create_tg(struct throtl_data *td,
 		/* if %NULL and @q is alive, fall back to root_tg */
 		if (!IS_ERR(blkg))
 			tg = blkg_to_tg(blkg);
-		else if (!blk_queue_dead(q))
+		else if (!blk_queue_dying(q))
 			tg = td_root_tg(td);
 	}
 
diff --git a/block/blk.h b/block/blk.h
index ca51543..2218a8a 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -96,7 +96,7 @@ static inline struct request *__elv_next_request(struct request_queue *q)
 			q->flush_queue_delayed = 1;
 			return NULL;
 		}
-		if (unlikely(blk_queue_dead(q)) ||
+		if (unlikely(blk_queue_dying(q)) ||
 		    !q->elevator->type->ops.elevator_dispatch_fn(q, 0))
 			return NULL;
 	}
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index da36a3a..f29a1a9 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1406,7 +1406,7 @@ static int scsi_lld_busy(struct request_queue *q)
 	struct scsi_device *sdev = q->queuedata;
 	struct Scsi_Host *shost;
 
-	if (blk_queue_dead(q))
+	if (blk_queue_dying(q))
 		return 0;
 
 	shost = sdev->host;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1756001..aba8246 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -437,7 +437,7 @@ struct request_queue {
 #define QUEUE_FLAG_STOPPED	2	/* queue is stopped */
 #define	QUEUE_FLAG_SYNCFULL	3	/* read queue has been filled */
 #define QUEUE_FLAG_ASYNCFULL	4	/* write queue has been filled */
-#define QUEUE_FLAG_DEAD		5	/* queue being torn down */
+#define QUEUE_FLAG_DYING	5	/* queue being torn down */
 #define QUEUE_FLAG_BYPASS	6	/* act as dumb FIFO queue */
 #define QUEUE_FLAG_BIDI		7	/* queue supports bidi requests */
 #define QUEUE_FLAG_NOMERGES     8	/* disable merge attempts */
@@ -521,7 +521,7 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 
 #define blk_queue_tagged(q)	test_bit(QUEUE_FLAG_QUEUED, &(q)->queue_flags)
 #define blk_queue_stopped(q)	test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags)
-#define blk_queue_dead(q)	test_bit(QUEUE_FLAG_DEAD, &(q)->queue_flags)
+#define blk_queue_dying(q)	test_bit(QUEUE_FLAG_DYING, &(q)->queue_flags)
 #define blk_queue_bypass(q)	test_bit(QUEUE_FLAG_BYPASS, &(q)->queue_flags)
 #define blk_queue_nomerges(q)	test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags)
 #define blk_queue_noxmerges(q)	\
-- 
1.7.10.4


* [PATCH 4/7] block: Avoid that request_fn is invoked on a dead queue
  2012-10-26 12:00 [PATCH 0/7 v5] More device removal fixes Bart Van Assche
                   ` (2 preceding siblings ...)
  2012-10-26 12:02 ` [PATCH 3/7] block: Rename queue dead flag Bart Van Assche
@ 2012-10-26 12:03 ` Bart Van Assche
  2012-10-29  1:59   ` Tejun Heo
  2012-10-26 12:04 ` [PATCH 5/7] block: Make blk_cleanup_queue() wait until request_fn finished Bart Van Assche
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Bart Van Assche @ 2012-10-26 12:03 UTC (permalink / raw)
  To: linux-scsi
  Cc: James Bottomley, Mike Christie, Jens Axboe, Tejun Heo, Chanho Min

A block driver may start cleaning up the resources needed by its
request_fn as soon as blk_cleanup_queue() has finished, so
request_fn must not be invoked after draining has finished.

Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Tejun Heo <tj@kernel.org>
Cc: Chanho Min <chanho.min@lge.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-core.c       |   26 +++++++++++++++++++++++---
 block/blk-exec.c       |    2 +-
 block/blk.h            |    2 ++
 include/linux/blkdev.h |    2 ++
 4 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 986e38e..2f1be7a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -293,6 +293,25 @@ void blk_sync_queue(struct request_queue *q)
 EXPORT_SYMBOL(blk_sync_queue);
 
 /**
+ * __blk_run_queue_uncond - run a queue whether or not it has been stopped
+ * @q:	The queue to run
+ *
+ * Description:
+ *    Invoke request handling on a queue if there are any pending requests.
+ *    May be used to restart request handling after a request has completed.
+ *    This variant runs the queue whether or not the queue has been
+ *    stopped. Must be called with the queue lock held and interrupts
+ *    disabled. See also @blk_run_queue.
+ */
+void inline __blk_run_queue_uncond(struct request_queue *q)
+{
+	if (unlikely(blk_queue_dead(q)))
+		return;
+
+	q->request_fn(q);
+}
+
+/**
  * __blk_run_queue - run a single device queue
  * @q:	The queue to run
  *
@@ -305,7 +324,7 @@ void __blk_run_queue(struct request_queue *q)
 	if (unlikely(blk_queue_stopped(q)))
 		return;
 
-	q->request_fn(q);
+	__blk_run_queue_uncond(q);
 }
 EXPORT_SYMBOL(__blk_run_queue);
 
@@ -477,8 +496,8 @@ EXPORT_SYMBOL_GPL(blk_queue_bypass_end);
  * blk_cleanup_queue - shutdown a request queue
  * @q: request queue to shutdown
  *
- * Mark @q DYING, drain all pending requests, destroy and put it.  All
- * future requests will be failed immediately with -ENODEV.
+ * Mark @q DYING, drain all pending requests, mark @q DEAD, destroy and
+ * put it.  All future requests will be failed immediately with -ENODEV.
  */
 void blk_cleanup_queue(struct request_queue *q)
 {
@@ -510,6 +529,7 @@ void blk_cleanup_queue(struct request_queue *q)
 	/* drain all requests queued before DYING marking */
 	spin_lock_irq(lock);
 	__blk_drain_queue(q, true);
+	queue_flag_set(QUEUE_FLAG_DEAD, q);
 	spin_unlock_irq(lock);
 
 	/* @q won't process any more request, flush async actions */
diff --git a/block/blk-exec.c b/block/blk-exec.c
index 4aec98d..1320e74 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -72,7 +72,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
 	__blk_run_queue(q);
 	/* the queue is stopped so it won't be run */
 	if (rq->cmd_type == REQ_TYPE_PM_RESUME)
-		q->request_fn(q);
+		__blk_run_queue_uncond(q);
 	spin_unlock_irq(q->queue_lock);
 }
 EXPORT_SYMBOL_GPL(blk_execute_rq_nowait);
diff --git a/block/blk.h b/block/blk.h
index 2218a8a..47fdfdd 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -145,6 +145,8 @@ int blk_try_merge(struct request *rq, struct bio *bio);
 
 void blk_queue_congestion_threshold(struct request_queue *q);
 
+void __blk_run_queue_uncond(struct request_queue *q);
+
 int blk_dev_init(void);
 
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index aba8246..8bc46c2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -452,6 +452,7 @@ struct request_queue {
 #define QUEUE_FLAG_ADD_RANDOM  16	/* Contributes to random pool */
 #define QUEUE_FLAG_SECDISCARD  17	/* supports SECDISCARD */
 #define QUEUE_FLAG_SAME_FORCE  18	/* force complete on same CPU */
+#define QUEUE_FLAG_DEAD        19	/* queue tear-down finished */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_STACKABLE)	|	\
@@ -522,6 +523,7 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 #define blk_queue_tagged(q)	test_bit(QUEUE_FLAG_QUEUED, &(q)->queue_flags)
 #define blk_queue_stopped(q)	test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags)
 #define blk_queue_dying(q)	test_bit(QUEUE_FLAG_DYING, &(q)->queue_flags)
+#define blk_queue_dead(q)	test_bit(QUEUE_FLAG_DEAD, &(q)->queue_flags)
 #define blk_queue_bypass(q)	test_bit(QUEUE_FLAG_BYPASS, &(q)->queue_flags)
 #define blk_queue_nomerges(q)	test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags)
 #define blk_queue_noxmerges(q)	\
-- 
1.7.10.4


* [PATCH 5/7] block: Make blk_cleanup_queue() wait until request_fn finished
  2012-10-26 12:00 [PATCH 0/7 v5] More device removal fixes Bart Van Assche
                   ` (3 preceding siblings ...)
  2012-10-26 12:03 ` [PATCH 4/7] block: Avoid that request_fn is invoked on a dead queue Bart Van Assche
@ 2012-10-26 12:04 ` Bart Van Assche
  2012-10-29  2:00   ` Tejun Heo
  2012-10-26 12:05 ` [PATCH 6/7] Fix race between starved list processing and device removal Bart Van Assche
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Bart Van Assche @ 2012-10-26 12:04 UTC (permalink / raw)
  To: linux-scsi
  Cc: James Bottomley, Mike Christie, Jens Axboe, Tejun Heo, Chanho Min

Some request_fn implementations, e.g. scsi_request_fn(), unlock
the queue lock internally. Since a block driver may start cleaning
up the resources needed by its request_fn as soon as
blk_cleanup_queue() has finished, make sure that blk_cleanup_queue()
waits until all active request_fn invocations have finished.

Reported-by: Chanho Min <chanho.min@lge.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-core.c       |    3 +++
 include/linux/blkdev.h |    6 ++++++
 2 files changed, 9 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 2f1be7a..7662d83 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -308,7 +308,9 @@ void inline __blk_run_queue_uncond(struct request_queue *q)
 	if (unlikely(blk_queue_dead(q)))
 		return;
 
+	q->request_fn_active++;
 	q->request_fn(q);
+	q->request_fn_active--;
 }
 
 /**
@@ -407,6 +409,7 @@ static void __blk_drain_queue(struct request_queue *q, bool drain_all)
 			__blk_run_queue(q);
 
 		drain |= q->nr_rqs_elvpriv;
+		drain |= q->request_fn_active;
 
 		/*
 		 * Unfortunately, requests are queued at and tracked from
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 8bc46c2..c9d233e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -378,6 +378,12 @@ struct request_queue {
 
 	unsigned int		nr_sorted;
 	unsigned int		in_flight[2];
+	/*
+	 * Number of active block driver functions for which blk_drain_queue()
+	 * must wait. Must be incremented around functions that unlock the
+	 * queue_lock internally, e.g. scsi_request_fn().
+	 */
+	unsigned int		request_fn_active;
 
 	unsigned int		rq_timeout;
 	struct timer_list	timeout;
-- 
1.7.10.4


* [PATCH 6/7] Fix race between starved list processing and device removal
  2012-10-26 12:00 [PATCH 0/7 v5] More device removal fixes Bart Van Assche
                   ` (4 preceding siblings ...)
  2012-10-26 12:04 ` [PATCH 5/7] block: Make blk_cleanup_queue() wait until request_fn finished Bart Van Assche
@ 2012-10-26 12:05 ` Bart Van Assche
  2012-10-28 18:01   ` Zhuang, Jin Can
  2012-10-29  2:07   ` Tejun Heo
  2012-10-26 12:05 ` [PATCH 7/7] Remove get_device() / put_device() pair from scsi_request_fn() Bart Van Assche
  2012-11-23 10:37 ` [PATCH 0/7 v5] More device removal fixes Bart Van Assche
  7 siblings, 2 replies; 24+ messages in thread
From: Bart Van Assche @ 2012-10-26 12:05 UTC (permalink / raw)
  To: linux-scsi
  Cc: James Bottomley, Mike Christie, Jens Axboe, Tejun Heo, Chanho Min

The SCSI core maintains a "starved list" per SCSI host. This is a
list of devices for which one or more requests have been queued
but that have not yet been passed on to the SCSI LLD. The function
scsi_run_queue() examines all SCSI devices on the starved list.
Since scsi_remove_device() can be invoked concurrently with
scsi_run_queue(), it is important to prevent that function from
accessing a SCSI device after it has been freed. Keep the sdev
reference count from dropping to zero before the queue is run by
scsi_run_queue() by inserting a get_device() / put_device() pair
in that function. Move the code that removes a device from the
starved list from scsi_device_dev_release_usercontext() to
__scsi_remove_device() such that the newly added get_device() call
is guaranteed to succeed.

Reported-and-tested-by: Chanho Min <chanho.min@lge.com>
Reference: http://lkml.org/lkml/2012/8/2/96
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Tejun Heo <tj@kernel.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_lib.c   |    5 +++++
 drivers/scsi/scsi_sysfs.c |    7 ++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f29a1a9..c5d4ec2 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -452,10 +452,15 @@ static void scsi_run_queue(struct request_queue *q)
 			continue;
 		}
 
+		get_device(&sdev->sdev_gendev);
 		spin_unlock(shost->host_lock);
+
 		spin_lock(sdev->request_queue->queue_lock);
 		__blk_run_queue(sdev->request_queue);
 		spin_unlock(sdev->request_queue->queue_lock);
+
+		put_device(&sdev->sdev_gendev);
+
 		spin_lock(shost->host_lock);
 	}
 	/* put any unprocessed entries back */
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index ce5224c..2661a957 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -348,7 +348,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
 	starget->reap_ref++;
 	list_del(&sdev->siblings);
 	list_del(&sdev->same_target_siblings);
-	list_del(&sdev->starved_entry);
 	spin_unlock_irqrestore(sdev->host->host_lock, flags);
 
 	cancel_work_sync(&sdev->event_work);
@@ -956,6 +955,8 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
 void __scsi_remove_device(struct scsi_device *sdev)
 {
 	struct device *dev = &sdev->sdev_gendev;
+	struct Scsi_Host *shost = sdev->host;
+	unsigned long flags;
 
 	if (sdev->is_visible) {
 		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
@@ -977,6 +978,10 @@ void __scsi_remove_device(struct scsi_device *sdev)
 	blk_cleanup_queue(sdev->request_queue);
 	cancel_work_sync(&sdev->requeue_work);
 
+	spin_lock_irqsave(shost->host_lock, flags);
+	list_del(&sdev->starved_entry);
+	spin_unlock_irqrestore(shost->host_lock, flags);
+
 	if (sdev->host->hostt->slave_destroy)
 		sdev->host->hostt->slave_destroy(sdev);
 	transport_destroy_device(dev);
-- 
1.7.10.4


* [PATCH 7/7] Remove get_device() / put_device() pair from scsi_request_fn()
  2012-10-26 12:00 [PATCH 0/7 v5] More device removal fixes Bart Van Assche
                   ` (5 preceding siblings ...)
  2012-10-26 12:05 ` [PATCH 6/7] Fix race between starved list processing and device removal Bart Van Assche
@ 2012-10-26 12:05 ` Bart Van Assche
  2012-10-29  2:08   ` Tejun Heo
  2012-11-23 10:37 ` [PATCH 0/7 v5] More device removal fixes Bart Van Assche
  7 siblings, 1 reply; 24+ messages in thread
From: Bart Van Assche @ 2012-10-26 12:05 UTC (permalink / raw)
  To: linux-scsi
  Cc: James Bottomley, Mike Christie, Jens Axboe, Tejun Heo, Chanho Min

Since all scsi_request_fn() callers hold a reference on the
SCSI device for which that function is invoked, and since
blk_cleanup_queue() waits until scsi_request_fn() has finished,
it is safe to remove the get_device() / put_device() pair from
scsi_request_fn().

Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_lib.c |   12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c5d4ec2..71bddec 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1516,16 +1516,14 @@ static void scsi_softirq_done(struct request *rq)
  * Lock status: IO request lock assumed to be held when called.
  */
 static void scsi_request_fn(struct request_queue *q)
+	__releases(q->queue_lock)
+	__acquires(q->queue_lock)
 {
 	struct scsi_device *sdev = q->queuedata;
 	struct Scsi_Host *shost;
 	struct scsi_cmnd *cmd;
 	struct request *req;
 
-	if(!get_device(&sdev->sdev_gendev))
-		/* We must be tearing the block queue down already */
-		return;
-
 	/*
 	 * To start with, we keep looping until the queue is empty, or until
 	 * the host is no longer able to accept any more requests.
@@ -1634,11 +1632,7 @@ out_delay:
 	if (sdev->device_busy == 0)
 		blk_delay_queue(q, SCSI_QUEUE_DELAY);
 out:
-	/* must be careful here...if we trigger the ->remove() function
-	 * we cannot be holding the q lock */
-	spin_unlock_irq(q->queue_lock);
-	put_device(&sdev->sdev_gendev);
-	spin_lock_irq(q->queue_lock);
+	;
 }
 
 u64 scsi_calculate_bounce_limit(struct Scsi_Host *shost)
-- 
1.7.10.4


* RE: [PATCH 6/7] Fix race between starved list processing and device removal
  2012-10-26 12:05 ` [PATCH 6/7] Fix race between starved list processing and device removal Bart Van Assche
@ 2012-10-28 18:01   ` Zhuang, Jin Can
  2012-10-29 14:32     ` Bart Van Assche
  2012-10-29  2:07   ` Tejun Heo
  1 sibling, 1 reply; 24+ messages in thread
From: Zhuang, Jin Can @ 2012-10-28 18:01 UTC (permalink / raw)
  To: Bart Van Assche, linux-scsi
  Cc: James Bottomley, Mike Christie, Jens Axboe, Tejun Heo, Chanho Min

I recently ran into the same issue.
My test plugs and unplugs a USB disk at one-second intervals, and I found that while sdev1 is being removed, scsi_run_queue() can be triggered by sdev2 and then access every starved SCSI device, including sdev1.

I have adopted the solution below and it has worked fine for me so far.
There is one thing to fix in the patch, though: when put_device() is invoked in scsi_run_queue(), interrupts are disabled. Since put_device() may sleep, interrupts should be enabled before it is called.
So I changed that part to:
		spin_unlock_irq(sdev->request_queue->queue_lock);

		put_device(&sdev->sdev_gendev);

 		spin_lock_irq(shost->host_lock);

-Jincan

-----Original Message-----
From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Bart Van Assche
Sent: Friday, October 26, 2012 8:05 PM
To: linux-scsi
Cc: James Bottomley; Mike Christie; Jens Axboe; Tejun Heo; Chanho Min
Subject: [PATCH 6/7] Fix race between starved list processing and device removal

The SCSI core maintains a "starved list" per SCSI host. This is a
list of devices for which one or more requests have been queued but
that have not yet been passed to the SCSI LLD. The function
scsi_run_queue() examines all SCSI devices on the starved list.

Since scsi_remove_device() can be invoked concurrently with
scsi_run_queue() it is important to avoid that a SCSI device is
accessed by that function after it has been freed.

Avoid that the sdev reference count can drop to zero before the
queue is run by scsi_run_queue() by inserting a get_device() /
put_device() pair in that function. Move the code for removing a
device from the starved list from
scsi_device_dev_release_usercontext() to __scsi_remove_device()
such that it is guaranteed that the newly added get_device() call
succeeds.

Reported-and-tested-by: Chanho Min <chanho.min@lge.com>
Reference: http://lkml.org/lkml/2012/8/2/96
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Tejun Heo <tj@kernel.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_lib.c   |    5 +++++
 drivers/scsi/scsi_sysfs.c |    7 ++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f29a1a9..c5d4ec2 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -452,10 +452,15 @@ static void scsi_run_queue(struct request_queue *q)
 			continue;
 		}
 
+		get_device(&sdev->sdev_gendev);
 		spin_unlock(shost->host_lock);
+
 		spin_lock(sdev->request_queue->queue_lock);
 		__blk_run_queue(sdev->request_queue);
 		spin_unlock(sdev->request_queue->queue_lock);
+
+		put_device(&sdev->sdev_gendev);
+
 		spin_lock(shost->host_lock);
 	}
 	/* put any unprocessed entries back */
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index ce5224c..2661a957 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -348,7 +348,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
 	starget->reap_ref++;
 	list_del(&sdev->siblings);
 	list_del(&sdev->same_target_siblings);
-	list_del(&sdev->starved_entry);
 	spin_unlock_irqrestore(sdev->host->host_lock, flags);
 
 	cancel_work_sync(&sdev->event_work);
@@ -956,6 +955,8 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
 void __scsi_remove_device(struct scsi_device *sdev)
 {
 	struct device *dev = &sdev->sdev_gendev;
+	struct Scsi_Host *shost = sdev->host;
+	unsigned long flags;
 
 	if (sdev->is_visible) {
 		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
@@ -977,6 +978,10 @@ void __scsi_remove_device(struct scsi_device *sdev)
 	blk_cleanup_queue(sdev->request_queue);
 	cancel_work_sync(&sdev->requeue_work);
 
+	spin_lock_irqsave(shost->host_lock, flags);
+	list_del(&sdev->starved_entry);
+	spin_unlock_irqrestore(shost->host_lock, flags);
+
 	if (sdev->host->hostt->slave_destroy)
 		sdev->host->hostt->slave_destroy(sdev);
 	transport_destroy_device(dev);
--
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [PATCH 1/7] block: Avoid that blk_drain_queue() finishes early
  2012-10-26 12:01 ` [PATCH 1/7] block: Avoid that blk_drain_queue() finishes early Bart Van Assche
@ 2012-10-29  1:47   ` Tejun Heo
  2012-10-29  1:52     ` Tejun Heo
  0 siblings, 1 reply; 24+ messages in thread
From: Tejun Heo @ 2012-10-29  1:47 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, James Bottomley, Mike Christie, Jens Axboe, Chanho Min

Hello,

On Fri, Oct 26, 2012 at 02:01:23PM +0200, Bart Van Assche wrote:
> Code like "drain |= q->nr_rqs[i]" might cause blk_drain_queue()
> to finish early if the expression on the RHS is a multiple of 256,
> since the drain variable is only eight bits wide. Avoid this by
> changing the type of the drain variable from bool into unsigned.

No, it doesn't happen that way.  One of the reasons we have bool at
all is to avoid this type of problem caused by implicit type-casting.

Why do you keep pushing this?  It's WRONG.  Please drop it.

Thanks.

-- 
tejun

* Re: [PATCH 1/7] block: Avoid that blk_drain_queue() finishes early
  2012-10-29  1:47   ` Tejun Heo
@ 2012-10-29  1:52     ` Tejun Heo
  2012-10-29 14:35       ` Bart Van Assche
  0 siblings, 1 reply; 24+ messages in thread
From: Tejun Heo @ 2012-10-29  1:52 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, James Bottomley, Mike Christie, Jens Axboe, Chanho Min

On Sun, Oct 28, 2012 at 06:47:22PM -0700, Tejun Heo wrote:
> Hello,
> 
> On Fri, Oct 26, 2012 at 02:01:23PM +0200, Bart Van Assche wrote:
> > Code like "drain |= q->nr_rqs[i]" might cause blk_drain_queue()
> > to finish early if the expression on the RHS is a multiple of 256,
> > since the drain variable is only eight bits wide. Avoid this by
> > changing the type of the drain variable from bool into unsigned.
> 
> No, it doesn't happen that way.  One of the reasons we have bool at
> all is to avoid this type of problem caused by implicit type-casting.
> 
> Why do you keep pushing this?  It's WRONG.  Please drop it.

From the C99 standard draft:

 6.3 Conversions
 6.3.1.2 Boolean type

  When any scalar value is converted to _Bool, the result is 0 if the
  value compares equal to 0; otherwise, the result is 1.

It doesn't care about the width or signedness of the type being converted.
If the original value equals zero, it converts to 0; otherwise to 1.

Thanks.

-- 
tejun

* Re: [PATCH 2/7] block: Let blk_drain_queue() caller obtain the queue lock
  2012-10-26 12:02 ` [PATCH 2/7] block: Let blk_drain_queue() caller obtain the queue lock Bart Van Assche
@ 2012-10-29  1:55   ` Tejun Heo
  0 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2012-10-29  1:55 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, James Bottomley, Mike Christie, Jens Axboe, Chanho Min

On Fri, Oct 26, 2012 at 02:02:08PM +0200, Bart Van Assche wrote:
> Let the caller of blk_drain_queue() obtain the queue lock to improve
> readability of the patch called "Avoid that request_fn is invoked on
> a dead queue".
> 
> Cc: James Bottomley <JBottomley@Parallels.com>
> Cc: Mike Christie <michaelc@cs.wisc.edu>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Chanho Min <chanho.min@lge.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Acked-by: Tejun Heo <tj@kernel.org>

Nit: probably better to make this change right before the said patch.

Thanks.

-- 
tejun

* Re: [PATCH 4/7] block: Avoid that request_fn is invoked on a dead queue
  2012-10-26 12:03 ` [PATCH 4/7] block: Avoid that request_fn is invoked on a dead queue Bart Van Assche
@ 2012-10-29  1:59   ` Tejun Heo
  0 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2012-10-29  1:59 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, James Bottomley, Mike Christie, Jens Axboe, Chanho Min

Hello,

On Fri, Oct 26, 2012 at 02:03:45PM +0200, Bart Van Assche wrote:
> A block driver may start cleaning up resources needed by its
> request_fn as soon as blk_cleanup_queue() finished, so request_fn
> must not be invoked after draining finished.

Please be a lot more detailed.  Please describe an example scenario
where things go wrong, explain why that's happening and how it's
fixed.

> +void inline __blk_run_queue_uncond(struct request_queue *q)
> +{
> +	if (unlikely(blk_queue_dead(q)))
> +		return;
> +
> +	q->request_fn(q);
> +}
...
> @@ -510,6 +529,7 @@ void blk_cleanup_queue(struct request_queue *q)
>  	/* drain all requests queued before DYING marking */
>  	spin_lock_irq(lock);
>  	__blk_drain_queue(q, true);
> +	queue_flag_set(QUEUE_FLAG_DEAD, q);
>  	spin_unlock_irq(lock);

And some comments explaining what the dead thing is doing would be
nice too; other than that, looks good to me.

Thanks!

-- 
tejun

* Re: [PATCH 5/7] block: Make blk_cleanup_queue() wait until request_fn finished
  2012-10-26 12:04 ` [PATCH 5/7] block: Make blk_cleanup_queue() wait until request_fn finished Bart Van Assche
@ 2012-10-29  2:00   ` Tejun Heo
  0 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2012-10-29  2:00 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, James Bottomley, Mike Christie, Jens Axboe, Chanho Min

On Fri, Oct 26, 2012 at 02:04:27PM +0200, Bart Van Assche wrote:
> Some request_fn implementations, e.g. scsi_request_fn(), unlock
> the queue lock. Make sure that blk_cleanup_queue() waits until all
> active request_fn invocations have finished. A block driver may
> start cleaning up resources needed by its request_fn as soon as
> blk_cleanup_queue() finished, so blk_cleanup_queue() must wait
> for all outstanding request_fn invocations to finish.

Ditto.  More verbose explanation *please*.

> Reported-by: Chanho Min <chanho.min@lge.com>
> Cc: James Bottomley <JBottomley@Parallels.com>
> Cc: Mike Christie <michaelc@cs.wisc.edu>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Tejun Heo <tj@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-core.c       |    3 +++
>  include/linux/blkdev.h |    6 ++++++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 2f1be7a..7662d83 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -308,7 +308,9 @@ void inline __blk_run_queue_uncond(struct request_queue *q)
>  	if (unlikely(blk_queue_dead(q)))
>  		return;
>  
> +	q->request_fn_active++;
>  	q->request_fn(q);
> +	q->request_fn_active--;
>  }

Here too.

Thanks.

-- 
tejun

* Re: [PATCH 6/7] Fix race between starved list processing and device removal
  2012-10-26 12:05 ` [PATCH 6/7] Fix race between starved list processing and device removal Bart Van Assche
  2012-10-28 18:01   ` Zhuang, Jin Can
@ 2012-10-29  2:07   ` Tejun Heo
  1 sibling, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2012-10-29  2:07 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, James Bottomley, Mike Christie, Jens Axboe, Chanho Min

Hello, Bart.

On Fri, Oct 26, 2012 at 02:05:01PM +0200, Bart Van Assche wrote:
> The SCSI core maintains a "starved list" per SCSI host. This is a
> list of devices for which one or more requests have been queued
> but that have not yet been passed to the SCSI LLD. The function
> scsi_run_queue() examines all SCSI devices on the starved list.

New paragraph.

> Since scsi_remove_device() can be invoked concurrently with
> scsi_run_queue() it is important to avoid that a SCSI device is
> accessed by that function after it has been freed.

New paragraph.

> Avoid that the
> sdev reference count can drop to zero before the queue is run by
> scsi_run_queue() by inserting a get_device() / put_device() pair
> in that function. Move the code for removing a device from the
> starved list from scsi_device_dev_release_usercontext() to
> __scsi_remove_device() such that it is guaranteed that the newly
> added get_device() call succeeds.
>
> Reported-and-tested-by: Chanho Min <chanho.min@lge.com>
> Reference: http://lkml.org/lkml/2012/8/2/96
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Tejun Heo <tj@kernel.org>
> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Heh, for some reason, the commit message is a hard read for me but I
think it should do.

 Acked-by: Tejun Heo <tj@kernel.org>

Thanks!

-- 
tejun

* Re: [PATCH 7/7] Remove get_device() / put_device() pair from scsi_request_fn()
  2012-10-26 12:05 ` [PATCH 7/7] Remove get_device() / put_device() pair from scsi_request_fn() Bart Van Assche
@ 2012-10-29  2:08   ` Tejun Heo
  0 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2012-10-29  2:08 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, James Bottomley, Mike Christie, Jens Axboe, Chanho Min

On Fri, Oct 26, 2012 at 02:05:47PM +0200, Bart Van Assche wrote:
> Since all scsi_request_fn() callers hold a reference on the

Maybe s/Since/Now that/

> SCSI device that function is invoked for and since
> blk_cleanup_queue() waits until scsi_request_fn() has finished
> it is safe to remove the get_device() / put_device() pair from
> scsi_request_fn().
> 
> Cc: James Bottomley <JBottomley@Parallels.com>
> Cc: Mike Christie <michaelc@cs.wisc.edu>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Tejun Heo <tj@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

* Re: [PATCH 6/7] Fix race between starved list processing and device removal
  2012-10-28 18:01   ` Zhuang, Jin Can
@ 2012-10-29 14:32     ` Bart Van Assche
  2012-10-30  5:40       ` Zhuang, Jin Can
  0 siblings, 1 reply; 24+ messages in thread
From: Bart Van Assche @ 2012-10-29 14:32 UTC (permalink / raw)
  To: Zhuang, Jin Can
  Cc: linux-scsi, James Bottomley, Mike Christie, Jens Axboe,
	Tejun Heo, Chanho Min

On 10/28/12 19:01, Zhuang, Jin Can wrote:
> I recently ran into the same issue.
> The test I did is plug/unplug u-disk in an interval of 1 second. And
> I found when sdev1 is being removed, scsi_run_queue is triggered by
> sdev2, which then accesses all the starving scsi device including sdev1.
>
> I have adopted the solution below which works fine for me so far.
> But there's one thing to fix in the patch below. When it put_device
> in scsi_run_queue, irq is disabled. As put_device may get into sleep,
> irq should be enabled before it's called.

Hello Jincan,

Thanks for testing and the feedback. However, are you sure that 
put_device() for a SCSI device may sleep ? Have you noticed the 
execute_in_process_context() call in scsi_device_dev_release() ?

Bart.


* Re: [PATCH 1/7] block: Avoid that blk_drain_queue() finishes early
  2012-10-29  1:52     ` Tejun Heo
@ 2012-10-29 14:35       ` Bart Van Assche
  0 siblings, 0 replies; 24+ messages in thread
From: Bart Van Assche @ 2012-10-29 14:35 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-scsi, James Bottomley, Mike Christie, Jens Axboe, Chanho Min

On 10/29/12 02:52, Tejun Heo wrote:
> On Sun, Oct 28, 2012 at 06:47:22PM -0700, Tejun Heo wrote:
>> On Fri, Oct 26, 2012 at 02:01:23PM +0200, Bart Van Assche wrote:
>>> Code like "drain |= q->nr_rqs[i]" might cause blk_drain_queue()
>>> to finish early if the expression on the RHS is a multiple of 256,
>>> since the drain variable is only eight bits wide. Avoid this by
>>> changing the type of the drain variable from bool into unsigned.
>>
>> No, it doesn't happen that way.  One of the reasons we have bool at
>> all is to avoid this type of problem caused by implicit type-casting.
>>
>> Why do you keep pushing this?  It's WRONG.  Please drop it.
>
> From the C99 standard draft:
>
>   6.3 Conversions
>   6.3.1.2 Boolean type
>
>    When any scalar value is converted to _Bool, the result is 0 if the
>    value compares equal to 0; otherwise, the result is 1.
>
> It doesn't care about the width or signedness of the type being converted.
> If the original value equals zero, it converts to 0; otherwise to 1.

For one reason or another I was assuming that bool was a typedef for
unsigned char. But you are right, it's a synonym for the C99 type _Bool,
so I'll drop this patch.

Bart.


* RE: [PATCH 6/7] Fix race between starved list processing and device removal
  2012-10-29 14:32     ` Bart Van Assche
@ 2012-10-30  5:40       ` Zhuang, Jin Can
  2012-11-02 10:48         ` Bart Van Assche
       [not found]         ` <026701cdb8c3$d2e3cb50$78ab61f0$@min@lge.com>
  0 siblings, 2 replies; 24+ messages in thread
From: Zhuang, Jin Can @ 2012-10-30  5:40 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi, James Bottomley, Mike Christie, Jens Axboe,
	Tejun Heo, Chanho Min

Hi Bart,

Yes. Here's the warning.
For the trace below, I used scsi_device_get()/scsi_device_put() in scsi_run_queue() (a little different from your patch), but I think it's the same.

10-23 18:15:53.309     8     8 I KERNEL  : [  268.994556] BUG: sleeping function called from invalid context at linux-2.6/kernel/workqueue.c:2500
10-23 18:15:53.309     8     8 I KERNEL  : [  269.006898] in_atomic(): 0, irqs_disabled(): 1, pid: 8, name: kworker/0:1
10-23 18:15:53.309     8     8 I KERNEL  : [  269.013689] Pid: 8, comm: kworker/0:1 Tainted: G        WC  3.0.34-140359-g85a6d67-dirty #43
10-23 18:15:53.309     8     8 I KERNEL  : [  269.022113] Call Trace:
10-23 18:15:53.309     8     8 I KERNEL  : [  269.024567]  [<c1859ea5>] ? printk+0x1d/0x1f
10-23 18:15:53.309     8     8 I KERNEL  : [  269.028828]  [<c123464a>] __might_sleep+0x10a/0x110
10-23 18:15:53.309     8     8 I KERNEL  : [  269.033695]  [<c12628a3>] wait_on_work+0x23/0x1a0
10-23 18:15:53.309     8     8 I KERNEL  : [  269.038390]  [<c1863ee6>] ? _raw_spin_unlock_irqrestore+0x26/0x50
10-23 18:15:53.309     8     8 I KERNEL  : [  269.044476]  [<c152fd66>] ? __pm_runtime_idle+0x66/0xf0
10-23 18:15:53.309     8     8 I KERNEL  : [  269.049706]  [<c165ae3e>] ? ram_console_write+0x4e/0xa0
10-23 18:15:53.309     8     8 I KERNEL  : [  269.054913]  [<c126472a>] __cancel_work_timer+0x6a/0x110
10-23 18:15:53.309     8     8 I KERNEL  : [  269.060217]  [<c12647ff>] cancel_work_sync+0xf/0x20
10-23 18:15:53.309     8     8 I KERNEL  : [  269.065087]  [<c1548d5d>] scsi_device_dev_release_usercontext+0x6d/0x100
10-23 18:15:53.309     8     8 I KERNEL  : [  269.071785]  [<c12626a2>] execute_in_process_context+0x42/0x50
10-23 18:15:53.309     8     8 I KERNEL  : [  269.077609]  [<c1548cc8>] scsi_device_dev_release+0x18/0x20
10-23 18:15:53.309     8     8 I KERNEL  : [  269.083174]  [<c15234a0>] device_release+0x20/0x80
10-23 18:15:53.309     8     8 I KERNEL  : [  269.087958]  [<c124a21e>] ? vprintk+0x2be/0x4e0
10-23 18:15:53.309     8     8 I KERNEL  : [  269.092479]  [<c148d1b4>] kobject_release+0x84/0x1f0
10-23 18:15:53.309     8     8 I KERNEL  : [  269.097439]  [<c1863d22>] ? _raw_spin_lock_irq+0x22/0x30
10-23 18:15:53.309     8     8 I KERNEL  : [  269.102732]  [<c148d130>] ? kobject_del+0x70/0x70
10-23 18:15:53.309     8     8 I KERNEL  : [  269.107430]  [<c148e8ec>] kref_put+0x2c/0x60
10-23 18:15:53.309     8     8 I KERNEL  : [  269.111688]  [<c148d06d>] kobject_put+0x1d/0x50
10-23 18:15:53.309     8     8 I KERNEL  : [  269.116209]  [<c15232a4>] put_device+0x14/0x20
10-23 18:15:53.309     8     8 I KERNEL  : [  269.120646]  [<c153daa7>] scsi_device_put+0x37/0x60
10-23 18:15:53.309     8     8 I KERNEL  : [  269.125515]  [<c1543cc7>] scsi_run_queue+0x247/0x320
10-23 18:15:53.309     8     8 I KERNEL  : [  269.130470]  [<c1545903>] scsi_requeue_run_queue+0x13/0x20
10-23 18:15:53.309     8     8 I KERNEL  : [  269.135941]  [<c1263efe>] process_one_work+0xfe/0x3f0
10-23 18:15:53.309     8     8 I KERNEL  : [  269.140997]  [<c15458f0>] ? scsi_softirq_done+0x120/0x120
10-23 18:15:53.309     8     8 I KERNEL  : [  269.146384]  [<c12644f1>] worker_thread+0x121/0x2f0
10-23 18:15:53.309     8     8 I KERNEL  : [  269.151254]  [<c12643d0>] ? rescuer_thread+0x1e0/0x1e0
10-23 18:15:53.309     8     8 I KERNEL  : [  269.156383]  [<c1267ffd>] kthread+0x6d/0x80
10-23 18:15:53.309     8     8 I KERNEL  : [  269.160558]  [<c1267f90>] ? __init_kthread_worker+0x30/0x30
10-23 18:15:53.309     8     8 I KERNEL  : [  269.166124]  [<c186a27a>] kernel_thread_helper+0x6/0x10

-Jincan

-----Original Message-----
From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Bart Van Assche
Sent: Monday, October 29, 2012 10:32 PM
To: Zhuang, Jin Can
Cc: linux-scsi; James Bottomley; Mike Christie; Jens Axboe; Tejun Heo; Chanho Min
Subject: Re: [PATCH 6/7] Fix race between starved list processing and device removal

On 10/28/12 19:01, Zhuang, Jin Can wrote:
> I recently ran into the same issue.
> The test I did is plug/unplug u-disk in an interval of 1 second. And
> I found when sdev1 is being removed, scsi_run_queue is triggered by
> sdev2, which then accesses all the starving scsi device including sdev1.
>
> I have adopted the solution below which works fine for me so far.
> But there's one thing to fix in the patch below. When it put_device
> in scsi_run_queue, irq is disabled. As put_device may get into sleep,
> irq should be enabled before it's called.

Hello Jincan,

Thanks for testing and the feedback. However, are you sure that
put_device() for a SCSI device may sleep ? Have you noticed the
execute_in_process_context() call in scsi_device_dev_release() ?

Bart.


* Re: [PATCH 6/7] Fix race between starved list processing and device removal
  2012-10-30  5:40       ` Zhuang, Jin Can
@ 2012-11-02 10:48         ` Bart Van Assche
  2012-11-21 11:06           ` Bart Van Assche
       [not found]         ` <026701cdb8c3$d2e3cb50$78ab61f0$@min@lge.com>
  1 sibling, 1 reply; 24+ messages in thread
From: Bart Van Assche @ 2012-11-02 10:48 UTC (permalink / raw)
  To: Zhuang, Jin Can
  Cc: linux-scsi, James Bottomley, Mike Christie, Jens Axboe,
	Tejun Heo, Chanho Min

On 10/30/12 06:40, Zhuang, Jin Can wrote:
> Yes. Here's the warning.
> For the trace below, I used scsi_device_get/scsi_device_put() in scsi_run_queue(). (A little different from your patch). But I think it's the same.
> 
> 10-23 18:15:53.309     8     8 I KERNEL  : [  268.994556] BUG: sleeping function called from invalid context at linux-2.6/kernel/workqueue.c:2500
> 10-23 18:15:53.309     8     8 I KERNEL  : [  269.006898] in_atomic(): 0, irqs_disabled(): 1, pid: 8, name: kworker/0:1
> 10-23 18:15:53.309     8     8 I KERNEL  : [  269.013689] Pid: 8, comm: kworker/0:1 Tainted: G        WC  3.0.34-140359-g85a6d67-dirty #43
> 10-23 18:15:53.309     8     8 I KERNEL  : [  269.022113] Call Trace:
> 10-23 18:15:53.309     8     8 I KERNEL  : [  269.028828]  [<c123464a>] __might_sleep+0x10a/0x110
> 10-23 18:15:53.309     8     8 I KERNEL  : [  269.033695]  [<c12628a3>] wait_on_work+0x23/0x1a0
> 10-23 18:15:53.309     8     8 I KERNEL  : [  269.054913]  [<c126472a>] __cancel_work_timer+0x6a/0x110
> 10-23 18:15:53.309     8     8 I KERNEL  : [  269.060217]  [<c12647ff>] cancel_work_sync+0xf/0x20
> 10-23 18:15:53.309     8     8 I KERNEL  : [  269.065087]  [<c1548d5d>] scsi_device_dev_release_usercontext+0x6d/0x100
> 10-23 18:15:53.309     8     8 I KERNEL  : [  269.071785]  [<c12626a2>] execute_in_process_context+0x42/0x50
> 10-23 18:15:53.309     8     8 I KERNEL  : [  269.077609]  [<c1548cc8>] scsi_device_dev_release+0x18/0x20
> 10-23 18:15:53.309     8     8 I KERNEL  : [  269.083174]  [<c15234a0>] device_release+0x20/0x80
> 10-23 18:15:53.309     8     8 I KERNEL  : [  269.092479]  [<c148d1b4>] kobject_release+0x84/0x1f0
> 10-23 18:15:53.309     8     8 I KERNEL  : [  269.107430]  [<c148e8ec>] kref_put+0x2c/0x60
> 10-23 18:15:53.309     8     8 I KERNEL  : [  269.111688]  [<c148d06d>] kobject_put+0x1d/0x50
> 10-23 18:15:53.309     8     8 I KERNEL  : [  269.116209]  [<c15232a4>] put_device+0x14/0x20
> 10-23 18:15:53.309     8     8 I KERNEL  : [  269.120646]  [<c153daa7>] scsi_device_put+0x37/0x60
> 10-23 18:15:53.309     8     8 I KERNEL  : [  269.125515]  [<c1543cc7>] scsi_run_queue+0x247/0x320
> 10-23 18:15:53.309     8     8 I KERNEL  : [  269.130470]  [<c1545903>] scsi_requeue_run_queue+0x13/0x20
> 10-23 18:15:53.309     8     8 I KERNEL  : [  269.135941]  [<c1263efe>] process_one_work+0xfe/0x3f0
> 10-23 18:15:53.309     8     8 I KERNEL  : [  269.146384]  [<c12644f1>] worker_thread+0x121/0x2f0
> 10-23 18:15:53.309     8     8 I KERNEL  : [  269.156383]  [<c1267ffd>] kthread+0x6d/0x80
> 10-23 18:15:53.309     8     8 I KERNEL  : [  269.166124]  [<c186a27a>] kernel_thread_helper+0x6/0x10

Thanks for the feedback. Something that has kept me busy since I posted
the patch at the start of this thread is how to avoid adding two
atomic operations in a hot path (the get_device() and put_device()
calls in scsi_run_queue()). The patch below should achieve that.
However, since I have so far not been able to trigger the above call
trace, the test I ran apparently wasn't sufficient to exercise
all code paths. So it would be appreciated if anyone could help
test the patch below.

[PATCH] Fix race between starved list processing and device removal

---
 block/blk-core.c          |    9 +++++----
 drivers/scsi/scsi_lib.c   |   20 ++++++++++++++------
 drivers/scsi/scsi_sysfs.c |    9 ++++++++-
 3 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index e4f4e06..565484f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -407,10 +407,11 @@ static void __blk_drain_queue(struct request_queue *q, bool drain_all)
 
 		/*
 		 * This function might be called on a queue which failed
-		 * driver init after queue creation or is not yet fully
-		 * active yet.  Some drivers (e.g. fd and loop) get unhappy
-		 * in such cases.  Kick queue iff dispatch queue has
-		 * something on it and @q has request_fn set.
+		 * driver init after queue creation, is not yet fully active
+		 * or is being cleaned up and doesn't make progress anymore
+		 * (e.g. a SCSI device in state SDEV_DEL). Kick queue iff
+		 * dispatch queue has something on it and @q has request_fn
+		 * set.
 		 */
 		if (!list_empty(&q->queue_head) && q->request_fn)
 			__blk_run_queue(q);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 488035b..1763181 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -447,8 +447,9 @@ static void scsi_run_queue(struct request_queue *q)
 				  struct scsi_device, starved_entry);
 		list_del_init(&sdev->starved_entry);
 		if (scsi_target_is_busy(scsi_target(sdev))) {
-			list_move_tail(&sdev->starved_entry,
-				       &shost->starved_list);
+			if (sdev->sdev_state != SDEV_DEL)
+				list_add_tail(&sdev->starved_entry,
+					      &shost->starved_list);
 			continue;
 		}
 
@@ -1344,7 +1345,9 @@ static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
 	}
 
 	if (scsi_target_is_busy(starget)) {
-		list_move_tail(&sdev->starved_entry, &shost->starved_list);
+		if (sdev->sdev_state != SDEV_DEL)
+			list_move_tail(&sdev->starved_entry,
+				       &shost->starved_list);
 		return 0;
 	}
 
@@ -1377,8 +1380,11 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
 		}
 	}
 	if (scsi_host_is_busy(shost)) {
-		if (list_empty(&sdev->starved_entry))
-			list_add_tail(&sdev->starved_entry, &shost->starved_list);
+		if (list_empty(&sdev->starved_entry) &&
+		    sdev->sdev_state != SDEV_DEL) {
+			list_add_tail(&sdev->starved_entry,
+				      &shost->starved_list);
+		}
 		return 0;
 	}
 
@@ -1571,9 +1577,11 @@ static void scsi_request_fn(struct request_queue *q)
 		 * a run when a tag is freed.
 		 */
 		if (blk_queue_tagged(q) && !blk_rq_tagged(req)) {
-			if (list_empty(&sdev->starved_entry))
+			if (list_empty(&sdev->starved_entry) &&
+			    sdev->sdev_state != SDEV_DEL) {
 				list_add_tail(&sdev->starved_entry,
 					      &shost->starved_list);
+			}
 			goto not_ready;
 		}
 
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index ce5224c..2f0f31e 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -348,7 +348,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
 	starget->reap_ref++;
 	list_del(&sdev->siblings);
 	list_del(&sdev->same_target_siblings);
-	list_del(&sdev->starved_entry);
 	spin_unlock_irqrestore(sdev->host->host_lock, flags);
 
 	cancel_work_sync(&sdev->event_work);
@@ -956,6 +955,8 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
 void __scsi_remove_device(struct scsi_device *sdev)
 {
 	struct device *dev = &sdev->sdev_gendev;
+	struct Scsi_Host *shost = sdev->host;
+	unsigned long flags;
 
 	if (sdev->is_visible) {
 		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
@@ -973,7 +974,13 @@ void __scsi_remove_device(struct scsi_device *sdev)
 	 * scsi_run_queue() invocations have finished before tearing down the
 	 * device.
 	 */
+
 	scsi_device_set_state(sdev, SDEV_DEL);
+
+	spin_lock_irqsave(shost->host_lock, flags);
+	list_del(&sdev->starved_entry);
+	spin_unlock_irqrestore(shost->host_lock, flags);
+
 	blk_cleanup_queue(sdev->request_queue);
 	cancel_work_sync(&sdev->requeue_work);
 
-- 
1.7.10.4



* Re: [PATCH 6/7] Fix race between starved list processing and device removal
  2012-11-02 10:48         ` Bart Van Assche
@ 2012-11-21 11:06           ` Bart Van Assche
  0 siblings, 0 replies; 24+ messages in thread
From: Bart Van Assche @ 2012-11-21 11:06 UTC (permalink / raw)
  To: Zhuang, Jin Can
  Cc: linux-scsi, James Bottomley, Mike Christie, Jens Axboe,
	Tejun Heo, Chanho Min

On 11/02/12 11:48, Bart Van Assche wrote:
> [PATCH] Fix race between starved list processing and device removal
> [ ... ]
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index ce5224c..2f0f31e 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -348,7 +348,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
>   	starget->reap_ref++;
>   	list_del(&sdev->siblings);
>   	list_del(&sdev->same_target_siblings);
> -	list_del(&sdev->starved_entry);
>   	spin_unlock_irqrestore(sdev->host->host_lock, flags);
>
>   	cancel_work_sync(&sdev->event_work);
> @@ -956,6 +955,8 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
>   void __scsi_remove_device(struct scsi_device *sdev)
>   {
>   	struct device *dev = &sdev->sdev_gendev;
> +	struct Scsi_Host *shost = sdev->host;
> +	unsigned long flags;
>
>   	if (sdev->is_visible) {
>   		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
> @@ -973,7 +974,13 @@ void __scsi_remove_device(struct scsi_device *sdev)
>   	 * scsi_run_queue() invocations have finished before tearing down the
>   	 * device.
>   	 */
> +
>   	scsi_device_set_state(sdev, SDEV_DEL);
> +
> +	spin_lock_irqsave(shost->host_lock, flags);
> +	list_del(&sdev->starved_entry);
> +	spin_unlock_irqrestore(shost->host_lock, flags);
> +
>   	blk_cleanup_queue(sdev->request_queue);
>   	cancel_work_sync(&sdev->requeue_work);
>

Please ignore this patch. Even with this patch applied there is still a
race condition present, namely that the __blk_run_queue() call in
scsi_run_queue() can get invoked after __scsi_remove_device() has
invoked put_device().

Bart.


* Re: [PATCH 6/7] Fix race between starved list processing and device removal
       [not found]         ` <026701cdb8c3$d2e3cb50$78ab61f0$@min@lge.com>
@ 2012-11-21 12:10           ` Bart Van Assche
  0 siblings, 0 replies; 24+ messages in thread
From: Bart Van Assche @ 2012-11-21 12:10 UTC (permalink / raw)
  To: Chanho Min
  Cc: 'Zhuang, Jin Can', 'linux-scsi',
	'James Bottomley', 'Mike Christie',
	'Jens Axboe', 'Tejun Heo'

On 11/02/12 07:32, Chanho Min wrote:
>> Yes. Here's the warning.
>> For the trace below, I used scsi_device_get/scsi_device_put() in scsi_run_queue(). (A little different
>> from your patch). But I think it's the same.
> 
> I think it's correct. cancel_work_sync can sleep. It is caught under CONFIG_DEBUG_ATOMIC_SLEEP.
> What if we only enable irqs around cancel_work_sync, as in the patch below?
> 
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index bb7c482..6e17db9 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -350,7 +350,9 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
>          list_del(&sdev->starved_entry);
>          spin_unlock_irqrestore(sdev->host->host_lock, flags);
>   
> +       local_irq_enable();
>          cancel_work_sync(&sdev->event_work);
> +       local_irq_restore(flags);
>   
>          list_for_each_safe(this, tmp, &sdev->event_list) {
>                  struct scsi_event *evt;
> 

As far as I can see this should work, but unfortunately this change
creates a nontrivial dependency between scsi_run_queue() and
scsi_device_dev_release_usercontext(). Personally I would prefer
something like the following follow-up patch:

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 71bddec..20ea2e9 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -453,15 +453,12 @@ static void scsi_run_queue(struct request_queue *q)
                }
 
                get_device(&sdev->sdev_gendev);
-               spin_unlock(shost->host_lock);
-
-               spin_lock(sdev->request_queue->queue_lock);
-               __blk_run_queue(sdev->request_queue);
-               spin_unlock(sdev->request_queue->queue_lock);
+               spin_unlock_irqrestore(shost->host_lock, flags);
 
+               blk_run_queue(sdev->request_queue);
                put_device(&sdev->sdev_gendev);
 
-               spin_lock(shost->host_lock);
+               spin_lock_irqsave(shost->host_lock, flags);
        }
        /* put any unprocessed entries back */
        list_splice(&starved_list, &shost->starved_list);

Bart.


* Re: [PATCH 0/7 v5] More device removal fixes
  2012-10-26 12:00 [PATCH 0/7 v5] More device removal fixes Bart Van Assche
                   ` (6 preceding siblings ...)
  2012-10-26 12:05 ` [PATCH 7/7] Remove get_device() / put_device() pair from scsi_request_fn() Bart Van Assche
@ 2012-11-23 10:37 ` Bart Van Assche
  2012-11-26 17:19   ` Bart Van Assche
  7 siblings, 1 reply; 24+ messages in thread
From: Bart Van Assche @ 2012-11-23 10:37 UTC (permalink / raw)
  To: linux-scsi, James Bottomley, Mike Christie, Jens Axboe,
	Tejun Heo, Chanho Min

On 10/26/12 14:00, Bart Van Assche wrote:
> Fix a few race conditions that can be triggered by removing a device:
> [ ... ]

Hello,

I'd like to add the patch below to this series. This is something I came
up with after analyzing why a crash was triggered during an SRP failover
test. One of the functions in the crash call stack was blk_delay_work().

Bart.


[PATCH] block: Avoid scheduling delayed work on a dead queue

Running a queue must continue after it has been marked dying until
it has been marked dead, but the function blk_run_queue_async() must
not schedule delayed work after blk_cleanup_queue() has marked a queue
dead. Hence add a test for that queue state in blk_run_queue_async()
and make sure that queue_unplugged() invokes that function with the
queue lock held. This ensures that the queue state cannot change
between the test and the mod_delayed_work() invocation. Also drop
the queue dying test from queue_unplugged(), since it is now
superfluous: __blk_run_queue() already tests whether the queue is
dead.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Jens Axboe <axboe@kernel.dk>
---
 block/blk-core.c |   26 +++++---------------------
 1 file changed, 5 insertions(+), 21 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index e4f4e06..212c878 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -343,11 +343,11 @@ EXPORT_SYMBOL(__blk_run_queue);
  *
  * Description:
  *    Tells kblockd to perform the equivalent of @blk_run_queue on behalf
- *    of us.
+ *    of us. The caller must hold the queue lock.
  */
 void blk_run_queue_async(struct request_queue *q)
 {
-	if (likely(!blk_queue_stopped(q)))
+	if (likely(!blk_queue_stopped(q) && !blk_queue_dead(q)))
 		mod_delayed_work(kblockd_workqueue, &q->delay_work, 0);
 }
 EXPORT_SYMBOL(blk_run_queue_async);
@@ -2923,27 +2923,11 @@ static void queue_unplugged(struct request_queue *q, unsigned int depth,
 {
 	trace_block_unplug(q, depth, !from_schedule);
 
-	/*
-	 * Don't mess with a dying queue.
-	 */
-	if (unlikely(blk_queue_dying(q))) {
-		spin_unlock(q->queue_lock);
-		return;
-	}
-
-	/*
-	 * If we are punting this to kblockd, then we can safely drop
-	 * the queue_lock before waking kblockd (which needs to take
-	 * this lock).
-	 */
-	if (from_schedule) {
-		spin_unlock(q->queue_lock);
+	if (from_schedule)
 		blk_run_queue_async(q);
-	} else {
+	else
 		__blk_run_queue(q);
-		spin_unlock(q->queue_lock);
-	}
-
+	spin_unlock(q->queue_lock);
 }
 
 static void flush_plug_callbacks(struct blk_plug *plug, bool from_schedule)
-- 
1.7.10.4
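
The invariant the patch above relies on can be sketched in a few lines
of C (hypothetical names, no real locking or workqueues): the dead-flag
test and the work scheduling happen under one lock, so cleanup cannot
slip in between them.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical names) of the rule the patch introduces:
 * blk_run_queue_async() tests the dead flag and schedules the delayed
 * work while the caller holds the queue lock, so blk_cleanup_queue()
 * cannot mark the queue dead between the test and mod_delayed_work().
 */
struct model_queue {
	bool dead;	/* QUEUE_FLAG_DEAD */
	int scheduled;	/* stands in for pending kblockd delay_work */
};

/* Caller is assumed to hold the queue lock. */
static void model_run_async(struct model_queue *q)
{
	if (!q->dead)
		q->scheduled++;	/* mod_delayed_work(kblockd_workqueue, ...) */
}

/* Done under the same queue lock in the real code. */
static void model_cleanup(struct model_queue *q)
{
	q->dead = true;
}
```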




* Re: [PATCH 0/7 v5] More device removal fixes
  2012-11-23 10:37 ` [PATCH 0/7 v5] More device removal fixes Bart Van Assche
@ 2012-11-26 17:19   ` Bart Van Assche
  0 siblings, 0 replies; 24+ messages in thread
From: Bart Van Assche @ 2012-11-26 17:19 UTC (permalink / raw)
  To: linux-scsi, James Bottomley, Mike Christie, Jens Axboe,
	Tejun Heo, Chanho Min

On 11/23/12 11:37, Bart Van Assche wrote:
> On 10/26/12 14:00, Bart Van Assche wrote:
>> Fix a few race conditions that can be triggered by removing a device:
>> [ ... ]
> 
> I'd like to add the patch below to this series. [ ... ]

Below is another patch (hopefully the last) that I'd like to add to
this series. Note: the reason I only ran into this issue now is that
I only recently started testing with an ib_srp version that has the
host state check removed from srp_queuecommand().

[PATCH] Make scsi_remove_host() wait for device removal

SCSI LLDs may start cleaning up host resources needed by their
queuecommand() callback as soon as scsi_remove_host() has finished.
Hence scsi_remove_host() must wait until blk_cleanup_queue() has
finished for all devices associated with the host.

This patch fixes the following kernel oops:

BUG: spinlock bad magic on CPU#0, multipathd/20128
 lock: 0xffff8801b32e8bc8, .magic: ffff8801, .owner: <none>/-1, .owner_cpu: -1556759232
Pid: 20128, comm: multipathd Not tainted 3.7.0-rc7-debug+ #1
Call Trace:
 [<ffffffff8141206f>] spin_dump+0x8c/0x91
 [<ffffffff81412095>] spin_bug+0x21/0x26
 [<ffffffff81218a6f>] do_raw_spin_lock+0x13f/0x150
 [<ffffffff81417b38>] _raw_spin_lock_irqsave+0x78/0xa0
 [<ffffffffa0293c6c>] srp_queuecommand+0x3c/0xc80 [ib_srp]
 [<ffffffffa0002f18>] scsi_dispatch_cmd+0x148/0x310 [scsi_mod]
 [<ffffffffa000a080>] scsi_request_fn+0x320/0x520 [scsi_mod]
 [<ffffffff811ec397>] __blk_run_queue+0x37/0x50
 [<ffffffff811ec3ee>] queue_unplugged+0x3e/0xd0
 [<ffffffff811eef33>] blk_flush_plug_list+0x1c3/0x240
 [<ffffffff811eefc8>] blk_finish_plug+0x18/0x50
 [<ffffffff8119661e>] do_io_submit+0x29e/0xa00
 [<ffffffff81196d90>] sys_io_submit+0x10/0x20
 [<ffffffff81420d82>] system_call_fastpath+0x16/0x1b
---
 drivers/scsi/hosts.c      |   16 ++++++++++++++++
 drivers/scsi/scsi_priv.h  |    2 +-
 drivers/scsi/scsi_scan.c  |    5 ++++-
 drivers/scsi/scsi_sysfs.c |   15 ++++++++++++---
 include/scsi/scsi_host.h  |    1 +
 5 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 593085a..7b6b0b2 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -150,6 +150,19 @@ int scsi_host_set_state(struct Scsi_Host *shost, enum scsi_host_state state)
 }
 EXPORT_SYMBOL(scsi_host_set_state);
 
+static bool scsi_device_list_empty(struct Scsi_Host *shost)
+{
+	bool res;
+
+	WARN_ON_ONCE(irqs_disabled());
+
+	spin_lock_irq(shost->host_lock);
+	res = list_empty(&shost->__devices);
+	spin_unlock_irq(shost->host_lock);
+
+	return res;
+}
+
 /**
  * scsi_remove_host - remove a scsi host
  * @shost:	a pointer to a scsi host to remove
@@ -178,6 +191,8 @@ void scsi_remove_host(struct Scsi_Host *shost)
 		BUG_ON(scsi_host_set_state(shost, SHOST_DEL_RECOVERY));
 	spin_unlock_irqrestore(shost->host_lock, flags);
 
+	wait_event(shost->device_list_empty, scsi_device_list_empty(shost));
+
 	transport_unregister_device(&shost->shost_gendev);
 	device_unregister(&shost->shost_dev);
 	device_del(&shost->shost_gendev);
@@ -351,6 +366,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
 	shost->shost_state = SHOST_CREATED;
 	INIT_LIST_HEAD(&shost->__devices);
 	INIT_LIST_HEAD(&shost->__targets);
+	init_waitqueue_head(&shost->device_list_empty);
 	INIT_LIST_HEAD(&shost->eh_cmd_q);
 	INIT_LIST_HEAD(&shost->starved_list);
 	init_waitqueue_head(&shost->host_wait);
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 8f9a0ca..86250fb 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -130,7 +130,7 @@ extern int scsi_sysfs_add_sdev(struct scsi_device *);
 extern int scsi_sysfs_add_host(struct Scsi_Host *);
 extern int scsi_sysfs_register(void);
 extern void scsi_sysfs_unregister(void);
-extern void scsi_sysfs_device_initialize(struct scsi_device *);
+extern int scsi_sysfs_device_initialize(struct scsi_device *);
 extern int scsi_sysfs_target_initialize(struct scsi_device *);
 extern struct scsi_transport_template blank_transport_template;
 extern void __scsi_remove_device(struct scsi_device *);
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 3e58b22..ddea318 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -289,7 +289,10 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
 	sdev->request_queue->queuedata = sdev;
 	scsi_adjust_queue_depth(sdev, 0, sdev->host->cmd_per_lun);
 
-	scsi_sysfs_device_initialize(sdev);
+	if (scsi_sysfs_device_initialize(sdev)) {
+		display_failure_msg = 0;
+		goto out_device_destroy;
+	}
 
 	if (shost->hostt->slave_alloc) {
 		ret = shost->hostt->slave_alloc(sdev);
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 2661a957..760fc85 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -348,6 +348,8 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
 	starget->reap_ref++;
 	list_del(&sdev->siblings);
 	list_del(&sdev->same_target_siblings);
+	if (list_empty(&sdev->host->__devices))
+		wake_up(&sdev->host->device_list_empty);
 	spin_unlock_irqrestore(sdev->host->host_lock, flags);
 
 	cancel_work_sync(&sdev->event_work);
@@ -1109,11 +1111,12 @@ static struct device_type scsi_dev_type = {
 	.groups =	scsi_sdev_attr_groups,
 };
 
-void scsi_sysfs_device_initialize(struct scsi_device *sdev)
+int scsi_sysfs_device_initialize(struct scsi_device *sdev)
 {
 	unsigned long flags;
 	struct Scsi_Host *shost = sdev->host;
 	struct scsi_target  *starget = sdev->sdev_target;
+	int ret = -ENODEV;
 
 	device_initialize(&sdev->sdev_gendev);
 	sdev->sdev_gendev.bus = &scsi_bus_type;
@@ -1128,10 +1131,16 @@ void scsi_sysfs_device_initialize(struct scsi_device *sdev)
 		     sdev->host->host_no, sdev->channel, sdev->id, sdev->lun);
 	sdev->scsi_level = starget->scsi_level;
 	transport_setup_device(&sdev->sdev_gendev);
+
 	spin_lock_irqsave(shost->host_lock, flags);
-	list_add_tail(&sdev->same_target_siblings, &starget->devices);
-	list_add_tail(&sdev->siblings, &shost->__devices);
+	if (scsi_host_scan_allowed(shost)) {
+		list_add_tail(&sdev->same_target_siblings, &starget->devices);
+		list_add_tail(&sdev->siblings, &shost->__devices);
+		ret = 0;
+	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
+
+	return ret;
 }
 
 int scsi_is_sdev_device(const struct device *dev)
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 4908480..4ad2d9f 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -561,6 +561,7 @@ struct Scsi_Host {
 	 */
 	struct list_head	__devices;
 	struct list_head	__targets;
+	wait_queue_head_t	device_list_empty;
 	
 	struct scsi_host_cmd_pool *cmd_pool;
 	spinlock_t		free_list_lock;
-- 
1.7.10.4
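
The wait_event()/wake_up() pairing that the patch above adds can be
modeled sequentially in a few lines of C (hypothetical names, no real
wait queues or concurrency): scsi_remove_host() may only proceed once
the release of the last device has run and signaled the waiter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the device-list-empty wait (hypothetical names). */
struct model_host { int ndevices; bool woken; };

/* scsi_device_dev_release_usercontext(): wake the waiter when the
 * last device leaves the host's device list. */
static void model_device_release(struct model_host *h)
{
	if (--h->ndevices == 0)
		h->woken = true;	/* wake_up(&shost->device_list_empty) */
}

/* The condition scsi_remove_host() waits on via wait_event(). */
static bool model_list_empty(const struct model_host *h)
{
	return h->ndevices == 0;
}
```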





end of thread, other threads:[~2012-11-26 17:19 UTC | newest]

Thread overview: 24+ messages
2012-10-26 12:00 [PATCH 0/7 v5] More device removal fixes Bart Van Assche
2012-10-26 12:01 ` [PATCH 1/7] block: Avoid that blk_drain_queue() finishes early Bart Van Assche
2012-10-29  1:47   ` Tejun Heo
2012-10-29  1:52     ` Tejun Heo
2012-10-29 14:35       ` Bart Van Assche
2012-10-26 12:02 ` [PATCH 2/7] block: Let blk_drain_queue() caller obtain the queue lock Bart Van Assche
2012-10-29  1:55   ` Tejun Heo
2012-10-26 12:02 ` [PATCH 3/7] block: Rename queue dead flag Bart Van Assche
2012-10-26 12:03 ` [PATCH 4/7] block: Avoid that request_fn is invoked on a dead queue Bart Van Assche
2012-10-29  1:59   ` Tejun Heo
2012-10-26 12:04 ` [PATCH 5/7] block: Make blk_cleanup_queue() wait until request_fn finished Bart Van Assche
2012-10-29  2:00   ` Tejun Heo
2012-10-26 12:05 ` [PATCH 6/7] Fix race between starved list processing and device removal Bart Van Assche
2012-10-28 18:01   ` Zhuang, Jin Can
2012-10-29 14:32     ` Bart Van Assche
2012-10-30  5:40       ` Zhuang, Jin Can
2012-11-02 10:48         ` Bart Van Assche
2012-11-21 11:06           ` Bart Van Assche
     [not found]         ` <026701cdb8c3$d2e3cb50$78ab61f0$@min@lge.com>
2012-11-21 12:10           ` Bart Van Assche
2012-10-29  2:07   ` Tejun Heo
2012-10-26 12:05 ` [PATCH 7/7] Remove get_device() / put_device() pair from scsi_request_fn() Bart Van Assche
2012-10-29  2:08   ` Tejun Heo
2012-11-23 10:37 ` [PATCH 0/7 v5] More device removal fixes Bart Van Assche
2012-11-26 17:19   ` Bart Van Assche
