* [PATCH v2 0/7] Introduce blk_quiesce_queue() and blk_resume_queue()
@ 2016-09-28 23:57 ` Bart Van Assche
  0 siblings, 0 replies; 75+ messages in thread
From: Bart Van Assche @ 2016-09-28 23:57 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, linux-block, linux-scsi,
	linux-rdma, linux-nvme

Hello Jens,

Multiple block drivers need the ability to stop a request queue and to
wait until all ongoing request_fn() / queue_rq() calls have finished,
without waiting until all outstanding requests have finished. Hence
this patch series, which introduces the blk_mq_quiesce_queue() and
blk_mq_resume_queue() functions. The dm-mq, SRP and nvme patches in
this series are three examples of where these functions are useful.
These patches apply on top of the September 21 version of your
for-4.9/block branch.
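
To illustrate the intended calling pattern, here is a minimal sketch
(illustration only, not part of these patches; mydrv_pause_io() is a
made-up helper modeled on the dm-mq change in patch 5/7):

	/* Hypothetical caller of the new API. */
	static void mydrv_pause_io(struct request_queue *q)
	{
		blk_mq_quiesce_queue(q);	/* wait for in-flight .queue_rq() calls */
		blk_mq_stop_hw_queues(q);	/* keep new dispatches from running */
		blk_mq_resume_queue(q);		/* let blk_queue_enter() proceed again */
	}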

The changes compared to the previous version of this patch series are:
- Dropped the non-blk-mq changes from this patch series.
- Added support for hardware queues with BLK_MQ_F_BLOCKING set.
- Added a call stack to the description of the dm race fix patch.
- Dropped the non-scsi-mq changes from the SRP patch.
- Added a patch that introduces blk_mq_queue_stopped() in the dm driver.

The individual patches in this series are:

0001-blk-mq-Introduce-blk_mq_queue_stopped.patch
0002-dm-Use-BLK_MQ_S_STOPPED-instead-of-QUEUE_FLAG_STOPPE.patch
0003-RFC-nvme-Use-BLK_MQ_S_STOPPED-instead-of-QUEUE_FLAG_.patch
0004-blk-mq-Introduce-blk_quiesce_queue-and-blk_resume_qu.patch
0005-dm-Fix-a-race-condition-related-to-stopping-and-star.patch
0006-SRP-transport-Port-srp_wait_for_queuecommand-to-scsi.patch
0007-RFC-nvme-Fix-a-race-condition.patch

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 1/7] blk-mq: Introduce blk_mq_queue_stopped()
  2016-09-28 23:57 ` Bart Van Assche
@ 2016-09-28 23:57   ` Bart Van Assche
  -1 siblings, 0 replies; 75+ messages in thread
From: Bart Van Assche @ 2016-09-28 23:57 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, linux-block, linux-scsi,
	linux-rdma, linux-nvme

The function blk_queue_stopped() allows testing whether a traditional
request queue has been stopped. Introduce a helper function that allows
block drivers to easily query whether one or more hardware contexts of
a blk-mq queue have been stopped.
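
A typical use, mirroring the dm changes later in this series, is an
early return from a queue-stop path (sketch only; mydrv_stop_queue()
is hypothetical):

	static void mydrv_stop_queue(struct request_queue *q)
	{
		/* Nothing to do if a hardware queue has already been stopped. */
		if (blk_mq_queue_stopped(q))
			return;

		blk_mq_stop_hw_queues(q);
	}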

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.c         | 20 ++++++++++++++++++++
 include/linux/blk-mq.h |  1 +
 2 files changed, 21 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index dc5f47f..d8c45de 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -950,6 +950,26 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
 }
 EXPORT_SYMBOL(blk_mq_run_hw_queues);
 
+/**
+ * blk_mq_queue_stopped() - check whether one or more hctxs have been stopped
+ * @q: request queue.
+ *
+ * The caller is responsible for serializing this function against
+ * blk_mq_{start,stop}_hw_queue().
+ */
+bool blk_mq_queue_stopped(struct request_queue *q)
+{
+	struct blk_mq_hw_ctx *hctx;
+	int i;
+
+	queue_for_each_hw_ctx(q, hctx, i)
+		if (test_bit(BLK_MQ_S_STOPPED, &hctx->state))
+			return true;
+
+	return false;
+}
+EXPORT_SYMBOL(blk_mq_queue_stopped);
+
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
 	cancel_work(&hctx->run_work);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 5daa0ef..368c460d 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -233,6 +233,7 @@ void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs
 void blk_mq_abort_requeue_list(struct request_queue *q);
 void blk_mq_complete_request(struct request *rq, int error);
 
+bool blk_mq_queue_stopped(struct request_queue *q);
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx);
 void blk_mq_start_hw_queue(struct blk_mq_hw_ctx *hctx);
 void blk_mq_stop_hw_queues(struct request_queue *q);
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()
  2016-09-28 23:57 ` Bart Van Assche
@ 2016-09-28 23:59   ` Bart Van Assche
  -1 siblings, 0 replies; 75+ messages in thread
From: Bart Van Assche @ 2016-09-28 23:59 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, linux-block, linux-scsi,
	linux-rdma, linux-nvme

blk_quiesce_queue() prevents new queue_rq() invocations from
occurring and waits until ongoing invocations have finished. This
function does *not* wait until all outstanding requests have
finished (that is, until request.end_io() has been invoked).
blk_resume_queue() resumes normal I/O processing.
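
Synchronization against in-flight queue_rq() calls uses RCU for
non-blocking hardware contexts and a per-hctx mutex for contexts with
BLK_MQ_F_BLOCKING set. A simplified sketch of the scheme implemented
below:

	/* Dispatch side (per hardware context). */
	rcu_read_lock();		/* or mutex_lock(&hctx->queue_rq_mutex) */
	blk_mq_process_rq_list(hctx);	/* may invoke .queue_rq() */
	rcu_read_unlock();		/* or mutex_unlock(&hctx->queue_rq_mutex) */

	/* Quiesce side. */
	synchronize_rcu();		/* or mutex_lock() + mutex_unlock() per hctx */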

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 block/blk-mq.c         | 137 ++++++++++++++++++++++++++++++++++++++++++++-----
 include/linux/blk-mq.h |   2 +
 include/linux/blkdev.h |   5 ++
 3 files changed, 131 insertions(+), 13 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index d8c45de..f5c71ad 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -58,15 +58,23 @@ static void blk_mq_hctx_clear_pending(struct blk_mq_hw_ctx *hctx,
 	sbitmap_clear_bit(&hctx->ctx_map, ctx->index_hw);
 }
 
-void blk_mq_freeze_queue_start(struct request_queue *q)
+static bool __blk_mq_freeze_queue_start(struct request_queue *q,
+					bool kill_percpu_ref)
 {
 	int freeze_depth;
 
 	freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
 	if (freeze_depth == 1) {
-		percpu_ref_kill(&q->q_usage_counter);
+		if (kill_percpu_ref)
+			percpu_ref_kill(&q->q_usage_counter);
 		blk_mq_run_hw_queues(q, false);
 	}
+	return freeze_depth == 1;
+}
+
+void blk_mq_freeze_queue_start(struct request_queue *q)
+{
+	__blk_mq_freeze_queue_start(q, true);
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_start);
 
@@ -102,19 +110,90 @@ void blk_mq_freeze_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue);
 
-void blk_mq_unfreeze_queue(struct request_queue *q)
+static bool __blk_mq_unfreeze_queue(struct request_queue *q,
+				    bool reinit_percpu_ref)
 {
 	int freeze_depth;
 
 	freeze_depth = atomic_dec_return(&q->mq_freeze_depth);
 	WARN_ON_ONCE(freeze_depth < 0);
 	if (!freeze_depth) {
-		percpu_ref_reinit(&q->q_usage_counter);
+		if (reinit_percpu_ref)
+			percpu_ref_reinit(&q->q_usage_counter);
 		wake_up_all(&q->mq_freeze_wq);
 	}
+	return freeze_depth == 0;
+}
+
+void blk_mq_unfreeze_queue(struct request_queue *q)
+{
+	__blk_mq_unfreeze_queue(q, true);
 }
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
 
+/**
+ * blk_mq_quiesce_queue() - wait until all pending queue_rq calls have finished
+ *
+ * Prevent that new I/O requests are queued and wait until all pending
+ * queue_rq() calls have finished. Must not be called if the queue has already
+ * been frozen. Additionally, freezing the queue after having quiesced the
+ * queue and before resuming the queue is not allowed.
+ *
+ * Note: this function does not prevent that the struct request end_io()
+ * callback function is invoked.
+ */
+void blk_mq_quiesce_queue(struct request_queue *q)
+{
+	struct blk_mq_hw_ctx *hctx;
+	unsigned int i;
+	bool res, rcu = false;
+
+	spin_lock_irq(q->queue_lock);
+	WARN_ON_ONCE(blk_queue_quiescing(q));
+	queue_flag_set(QUEUE_FLAG_QUIESCING, q);
+	spin_unlock_irq(q->queue_lock);
+
+	res = __blk_mq_freeze_queue_start(q, false);
+	WARN_ON_ONCE(!res);
+	queue_for_each_hw_ctx(q, hctx, i) {
+		if (hctx->flags & BLK_MQ_F_BLOCKING) {
+			mutex_lock(&hctx->queue_rq_mutex);
+			mutex_unlock(&hctx->queue_rq_mutex);
+		} else {
+			rcu = true;
+		}
+	}
+	if (rcu)
+		synchronize_rcu();
+
+	spin_lock_irq(q->queue_lock);
+	WARN_ON_ONCE(!blk_queue_quiescing(q));
+	queue_flag_clear(QUEUE_FLAG_QUIESCING, q);
+	spin_unlock_irq(q->queue_lock);
+}
+EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
+
+/**
+ * blk_mq_resume_queue() - resume request processing
+ *
+ * Resume blk_queue_enter() calls that have been suspended by
+ * blk_mq_quiesce_queue().
+ *
+ * The caller is responsible for serializing blk_mq_quiesce_queue() and
+ * blk_mq_resume_queue().
+ */
+void blk_mq_resume_queue(struct request_queue *q)
+{
+	bool res;
+
+	res = __blk_mq_unfreeze_queue(q, false);
+	WARN_ON_ONCE(!res);
+	WARN_ON_ONCE(blk_queue_quiescing(q));
+
+	blk_mq_run_hw_queues(q, false);
+}
+EXPORT_SYMBOL_GPL(blk_mq_resume_queue);
+
 void blk_mq_wake_waiters(struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx;
@@ -488,6 +567,9 @@ static void blk_mq_requeue_work(struct work_struct *work)
 	struct request *rq, *next;
 	unsigned long flags;
 
+	if (blk_queue_quiescing(q))
+		return;
+
 	spin_lock_irqsave(&q->requeue_lock, flags);
 	list_splice_init(&q->requeue_list, &rq_list);
 	spin_unlock_irqrestore(&q->requeue_lock, flags);
@@ -782,7 +864,7 @@ static inline unsigned int queued_to_index(unsigned int queued)
  * of IO. In particular, we'd like FIFO behaviour on handling existing
  * items on the hctx->dispatch list. Ignore that for now.
  */
-static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
+static void blk_mq_process_rq_list(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	struct request *rq;
@@ -791,9 +873,6 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	struct list_head *dptr;
 	int queued;
 
-	if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
-		return;
-
 	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
 		cpu_online(hctx->next_cpu));
 
@@ -883,7 +962,24 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 		 *
 		 * blk_mq_run_hw_queue() already checks the STOPPED bit
 		 **/
-		blk_mq_run_hw_queue(hctx, true);
+		if (!blk_queue_quiescing(q))
+			blk_mq_run_hw_queue(hctx, true);
+	}
+}
+
+static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
+{
+	if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
+		return;
+
+	if (hctx->flags & BLK_MQ_F_BLOCKING) {
+		mutex_lock(&hctx->queue_rq_mutex);
+		blk_mq_process_rq_list(hctx);
+		mutex_unlock(&hctx->queue_rq_mutex);
+	} else {
+		rcu_read_lock();
+		blk_mq_process_rq_list(hctx);
+		rcu_read_unlock();
 	}
 }
 
@@ -1341,7 +1437,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_mq_bio_to_request(rq, bio);
 
 		/*
-		 * We do limited pluging. If the bio can be merged, do that.
+		 * We do limited plugging. If the bio can be merged, do that.
 		 * Otherwise the existing request in the plug list will be
 		 * issued. So the plug list will have one request at most
 		 */
@@ -1361,9 +1457,23 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_mq_put_ctx(data.ctx);
 		if (!old_rq)
 			goto done;
-		if (!blk_mq_direct_issue_request(old_rq, &cookie))
-			goto done;
-		blk_mq_insert_request(old_rq, false, true, true);
+
+		if (data.hctx->flags & BLK_MQ_F_BLOCKING) {
+			mutex_lock(&data.hctx->queue_rq_mutex);
+			if (blk_queue_quiescing(q) ||
+			    blk_mq_direct_issue_request(old_rq, &cookie) != 0)
+				blk_mq_insert_request(old_rq, false, true,
+						      true);
+			mutex_unlock(&data.hctx->queue_rq_mutex);
+		} else {
+			rcu_read_lock();
+			if (blk_queue_quiescing(q) ||
+			    blk_mq_direct_issue_request(old_rq, &cookie) != 0)
+				blk_mq_insert_request(old_rq, false, true,
+						      true);
+			rcu_read_unlock();
+		}
+
 		goto done;
 	}
 
@@ -1702,6 +1812,7 @@ static int blk_mq_init_hctx(struct request_queue *q,
 	INIT_DELAYED_WORK(&hctx->delay_work, blk_mq_delay_work_fn);
 	spin_lock_init(&hctx->lock);
 	INIT_LIST_HEAD(&hctx->dispatch);
+	mutex_init(&hctx->queue_rq_mutex);
 	hctx->queue = q;
 	hctx->queue_num = hctx_idx;
 	hctx->flags = set->flags & ~BLK_MQ_F_TAG_SHARED;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 368c460d..4b970f1 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -41,6 +41,8 @@ struct blk_mq_hw_ctx {
 
 	struct blk_mq_tags	*tags;
 
+	struct mutex		queue_rq_mutex;
+
 	unsigned long		queued;
 	unsigned long		run;
 #define BLK_MQ_MAX_DISPATCH_ORDER	7
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358..d365cdf 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -505,6 +505,7 @@ struct request_queue {
 #define QUEUE_FLAG_FUA	       24	/* device supports FUA writes */
 #define QUEUE_FLAG_FLUSH_NQ    25	/* flush not queueuable */
 #define QUEUE_FLAG_DAX         26	/* device supports DAX */
+#define QUEUE_FLAG_QUIESCING   27
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_STACKABLE)	|	\
@@ -595,6 +596,8 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 #define blk_queue_secure_erase(q) \
 	(test_bit(QUEUE_FLAG_SECERASE, &(q)->queue_flags))
 #define blk_queue_dax(q)	test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
+#define blk_queue_quiescing(q)	test_bit(QUEUE_FLAG_QUIESCING,	\
+					 &(q)->queue_flags)
 
 #define blk_noretry_request(rq) \
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
@@ -824,6 +827,8 @@ extern void __blk_run_queue(struct request_queue *q);
 extern void __blk_run_queue_uncond(struct request_queue *q);
 extern void blk_run_queue(struct request_queue *);
 extern void blk_run_queue_async(struct request_queue *q);
+extern void blk_mq_quiesce_queue(struct request_queue *q);
+extern void blk_mq_resume_queue(struct request_queue *q);
 extern int blk_rq_map_user(struct request_queue *, struct request *,
 			   struct rq_map_data *, void __user *, unsigned long,
 			   gfp_t);
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v2 5/7] dm: Fix a race condition related to stopping and starting queues
@ 2016-09-29  0:00   ` Bart Van Assche
  0 siblings, 0 replies; 75+ messages in thread
From: Bart Van Assche @ 2016-09-29  0:00 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, linux-block, linux-scsi,
	linux-rdma, linux-nvme

Ensure that all ongoing dm_mq_queue_rq() and dm_mq_requeue_request()
calls have finished before setting the "queue stopped" flag. This
allows the "queue stopped" test to be removed from dm_mq_queue_rq() and
dm_mq_requeue_request(). This patch fixes a race condition: because
dm_mq_queue_rq() is called without holding the queue lock,
BLK_MQ_S_STOPPED can be set at any time while dm_mq_queue_rq() is in
progress. This patch prevents the following hang, which occurs
sporadically when using dm-mq (a simplified timeline of the race is
sketched below the call trace):

INFO: task systemd-udevd:10111 blocked for more than 480 seconds.
      Not tainted 4.7.0-dbg+ #1
Call Trace:
 [<ffffffff8161f397>] schedule+0x37/0x90
 [<ffffffff816239ef>] schedule_timeout+0x27f/0x470
 [<ffffffff8161e76f>] io_schedule_timeout+0x9f/0x110
 [<ffffffff8161fb36>] bit_wait_io+0x16/0x60
 [<ffffffff8161f929>] __wait_on_bit_lock+0x49/0xa0
 [<ffffffff8114fe69>] __lock_page+0xb9/0xc0
 [<ffffffff81165d90>] truncate_inode_pages_range+0x3e0/0x760
 [<ffffffff81166120>] truncate_inode_pages+0x10/0x20
 [<ffffffff81212a20>] kill_bdev+0x30/0x40
 [<ffffffff81213d41>] __blkdev_put+0x71/0x360
 [<ffffffff81214079>] blkdev_put+0x49/0x170
 [<ffffffff812141c0>] blkdev_close+0x20/0x30
 [<ffffffff811d48e8>] __fput+0xe8/0x1f0
 [<ffffffff811d4a29>] ____fput+0x9/0x10
 [<ffffffff810842d3>] task_work_run+0x83/0xb0
 [<ffffffff8106606e>] do_exit+0x3ee/0xc40
 [<ffffffff8106694b>] do_group_exit+0x4b/0xc0
 [<ffffffff81073d9a>] get_signal+0x2ca/0x940
 [<ffffffff8101bf43>] do_signal+0x23/0x660
 [<ffffffff810022b3>] exit_to_usermode_loop+0x73/0xb0
 [<ffffffff81002cb0>] syscall_return_slowpath+0xb0/0xc0
 [<ffffffff81624e33>] entry_SYSCALL_64_fastpath+0xa6/0xa8

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-rq.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index f2c271a..f7a8ba3 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -102,9 +102,12 @@ static void dm_mq_stop_queue(struct request_queue *q)
 	if (blk_mq_queue_stopped(q))
 		return;
 
+	/* Wait until dm_mq_queue_rq() has finished. */
+	blk_mq_quiesce_queue(q);
 	/* Avoid that requeuing could restart the queue. */
 	blk_mq_cancel_requeue_work(q);
 	blk_mq_stop_hw_queues(q);
+	blk_mq_resume_queue(q);
 }
 
 void dm_stop_queue(struct request_queue *q)
@@ -887,17 +890,6 @@ static int dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 		dm_put_live_table(md, srcu_idx);
 	}
 
-	/*
-	 * On suspend dm_stop_queue() handles stopping the blk-mq
-	 * request_queue BUT: even though the hw_queues are marked
-	 * BLK_MQ_S_STOPPED at that point there is still a race that
-	 * is allowing block/blk-mq.c to call ->queue_rq against a
-	 * hctx that it really shouldn't.  The following check guards
-	 * against this rarity (albeit _not_ race-free).
-	 */
-	if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
-		return BLK_MQ_RQ_QUEUE_BUSY;
-
 	if (ti->type->busy && ti->type->busy(ti))
 		return BLK_MQ_RQ_QUEUE_BUSY;
 
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v2 6/7] SRP transport: Port srp_wait_for_queuecommand() to scsi-mq
  2016-09-28 23:57 ` Bart Van Assche
@ 2016-09-29  0:01   ` Bart Van Assche
  -1 siblings, 0 replies; 75+ messages in thread
From: Bart Van Assche @ 2016-09-29  0:01 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, linux-block, linux-scsi,
	linux-rdma, linux-nvme

Ensure that, if scsi-mq is enabled, srp_wait_for_queuecommand() waits
until ongoing shost->hostt->queuecommand() calls have finished.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: James Bottomley <jejb@linux.vnet.ibm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Doug Ledford <dledford@redhat.com>
---
 drivers/scsi/scsi_transport_srp.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index e3cd3ec..f1d580e 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -25,6 +25,7 @@
 #include <linux/slab.h>
 #include <linux/string.h>
 #include <linux/delay.h>
+#include <linux/blk-mq.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -405,8 +406,6 @@ static void srp_reconnect_work(struct work_struct *work)
 /**
  * scsi_request_fn_active() - number of kernel threads inside scsi_request_fn()
  * @shost: SCSI host for which to count the number of scsi_request_fn() callers.
- *
- * To do: add support for scsi-mq in this function.
  */
 static int scsi_request_fn_active(struct Scsi_Host *shost)
 {
@@ -425,11 +424,28 @@ static int scsi_request_fn_active(struct Scsi_Host *shost)
 	return request_fn_active;
 }
 
+static void srp_mq_wait_for_queuecommand(struct Scsi_Host *shost)
+{
+	struct scsi_device *sdev;
+	struct request_queue *q;
+
+	shost_for_each_device(sdev, shost) {
+		q = sdev->request_queue;
+
+		blk_mq_quiesce_queue(q);
+		blk_mq_resume_queue(q);
+	}
+}
+
 /* Wait until ongoing shost->hostt->queuecommand() calls have finished. */
 static void srp_wait_for_queuecommand(struct Scsi_Host *shost)
 {
-	while (scsi_request_fn_active(shost))
-		msleep(20);
+	if (shost->use_blk_mq) {
+		srp_mq_wait_for_queuecommand(shost);
+	} else {
+		while (scsi_request_fn_active(shost))
+			msleep(20);
+	}
 }
 
 static void __rport_fail_io_fast(struct srp_rport *rport)
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v2 7/7] [RFC] nvme: Fix a race condition
  2016-09-28 23:57 ` Bart Van Assche
@ 2016-09-29  0:01   ` Bart Van Assche
  -1 siblings, 0 replies; 75+ messages in thread
From: Bart Van Assche @ 2016-09-29  0:01 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, linux-block, linux-scsi,
	linux-rdma, linux-nvme

Ensure that nvme_queue_rq() is no longer running when
nvme_stop_queues() returns. Untested.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/core.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d791fba..98f1f29 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -201,13 +201,9 @@ fail:
 
 void nvme_requeue_req(struct request *req)
 {
-	unsigned long flags;
-
 	blk_mq_requeue_request(req);
-	spin_lock_irqsave(req->q->queue_lock, flags);
-	if (!blk_mq_queue_stopped(req->q))
-		blk_mq_kick_requeue_list(req->q);
-	spin_unlock_irqrestore(req->q->queue_lock, flags);
+	WARN_ON_ONCE(blk_mq_queue_stopped(req->q));
+	blk_mq_kick_requeue_list(req->q);
 }
 EXPORT_SYMBOL_GPL(nvme_requeue_req);
 
@@ -2077,11 +2073,19 @@ EXPORT_SYMBOL_GPL(nvme_kill_queues);
 void nvme_stop_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
+	struct request_queue *q;
 
 	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		blk_mq_cancel_requeue_work(ns->queue);
-		blk_mq_stop_hw_queues(ns->queue);
+		q = ns->queue;
+		blk_mq_quiesce_queue(q);
+		blk_mq_cancel_requeue_work(q);
+		blk_mq_stop_hw_queues(q);
+		/*
+		 * Note: blk_mq_resume_queue() does not affect the value
+		 * of the BLK_MQ_S_STOPPED bit.
+		 */
+		blk_mq_resume_queue(q);
 	}
 	mutex_unlock(&ctrl->namespaces_mutex);
 }
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v2 2/7] dm: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
  2016-09-28 23:57 ` Bart Van Assche
@ 2016-09-29  0:02   ` Bart Van Assche
  -1 siblings, 0 replies; 75+ messages in thread
From: Bart Van Assche @ 2016-09-29  0:02 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, linux-block, linux-scsi,
	linux-rdma, linux-nvme

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-rq.c | 18 ++----------------
 1 file changed, 2 insertions(+), 16 deletions(-)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 182b679..f2c271a 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -75,12 +75,6 @@ static void dm_old_start_queue(struct request_queue *q)
 
 static void dm_mq_start_queue(struct request_queue *q)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(q->queue_lock, flags);
-	queue_flag_clear(QUEUE_FLAG_STOPPED, q);
-	spin_unlock_irqrestore(q->queue_lock, flags);
-
 	blk_mq_start_stopped_hw_queues(q, true);
 	blk_mq_kick_requeue_list(q);
 }
@@ -105,16 +99,8 @@ static void dm_old_stop_queue(struct request_queue *q)
 
 static void dm_mq_stop_queue(struct request_queue *q)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(q->queue_lock, flags);
-	if (blk_queue_stopped(q)) {
-		spin_unlock_irqrestore(q->queue_lock, flags);
+	if (blk_mq_queue_stopped(q))
 		return;
-	}
-
-	queue_flag_set(QUEUE_FLAG_STOPPED, q);
-	spin_unlock_irqrestore(q->queue_lock, flags);
 
 	/* Avoid that requeuing could restart the queue. */
 	blk_mq_cancel_requeue_work(q);
@@ -341,7 +327,7 @@ static void __dm_mq_kick_requeue_list(struct request_queue *q, unsigned long mse
 	unsigned long flags;
 
 	spin_lock_irqsave(q->queue_lock, flags);
-	if (!blk_queue_stopped(q))
+	if (!blk_mq_queue_stopped(q))
 		blk_mq_delay_kick_requeue_list(q, msecs);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 }
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v2 3/7] [RFC] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
  2016-09-28 23:57 ` Bart Van Assche
@ 2016-09-29  0:02   ` Bart Van Assche
  -1 siblings, 0 replies; 75+ messages in thread
From: Bart Van Assche @ 2016-09-29  0:02 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, linux-block, linux-scsi,
	linux-rdma, linux-nvme

Make nvme_requeue_req() check BLK_MQ_S_STOPPED instead of
QUEUE_FLAG_STOPPED. Remove the QUEUE_FLAG_STOPPED manipulations
that became superfluous because of this change. This patch fixes
a race condition: using queue_flag_clear_unlocked() is not safe
if any other function that manipulates the queue flags can be
called concurrently, e.g. blk_cleanup_queue(). Untested.
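
An illustration of the lost-update race (simplified, assuming the
unlocked helpers perform a non-atomic read-modify-write of
q->queue_flags):

	/*
	 * CPU 0 (nvme_start_queues)		CPU 1 (blk_cleanup_queue)
	 *
	 * reads queue_flags
	 *					sets QUEUE_FLAG_DYING
	 * clears QUEUE_FLAG_STOPPED in its copy
	 * writes queue_flags back
	 *   -> the concurrently set flag is lost
	 */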

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/core.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 4669c05..d791fba 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -205,7 +205,7 @@ void nvme_requeue_req(struct request *req)
 
 	blk_mq_requeue_request(req);
 	spin_lock_irqsave(req->q->queue_lock, flags);
-	if (!blk_queue_stopped(req->q))
+	if (!blk_mq_queue_stopped(req->q))
 		blk_mq_kick_requeue_list(req->q);
 	spin_unlock_irqrestore(req->q->queue_lock, flags);
 }
@@ -2080,10 +2080,6 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 
 	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		spin_lock_irq(ns->queue->queue_lock);
-		queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
-		spin_unlock_irq(ns->queue->queue_lock);
-
 		blk_mq_cancel_requeue_work(ns->queue);
 		blk_mq_stop_hw_queues(ns->queue);
 	}
@@ -2097,7 +2093,6 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
 
 	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, ns->queue);
 		blk_mq_start_stopped_hw_queues(ns->queue, true);
 		blk_mq_kick_requeue_list(ns->queue);
 	}
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()
@ 2016-09-29  5:52     ` Hannes Reinecke
  0 siblings, 0 replies; 75+ messages in thread
From: Hannes Reinecke @ 2016-09-29  5:52 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, linux-block, linux-scsi,
	linux-rdma, linux-nvme

On 09/29/2016 01:59 AM, Bart Van Assche wrote:
> blk_quiesce_queue() prevents new queue_rq() invocations from
> occurring and waits until ongoing invocations have finished. This
> function does *not* wait until all outstanding requests have
> finished (that is, until request.end_io() has been invoked).
> blk_resume_queue() resumes normal I/O processing.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  block/blk-mq.c         | 137 ++++++++++++++++++++++++++++++++++++++++++++-----
>  include/linux/blk-mq.h |   2 +
>  include/linux/blkdev.h |   5 ++
>  3 files changed, 131 insertions(+), 13 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 6/7] SRP transport: Port srp_wait_for_queuecommand() to scsi-mq
  2016-09-29  0:01   ` Bart Van Assche
@ 2016-09-29  5:54     ` Hannes Reinecke
  -1 siblings, 0 replies; 75+ messages in thread
From: Hannes Reinecke @ 2016-09-29  5:54 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, linux-block, linux-scsi,
	linux-rdma, linux-nvme

On 09/29/2016 02:01 AM, Bart Van Assche wrote:
> Ensure that if scsi-mq is enabled that srp_wait_for_queuecommand()
> waits until ongoing shost->hostt->queuecommand() calls have finished.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: James Bottomley <jejb@linux.vnet.ibm.com>
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Doug Ledford <dledford@redhat.com>
> ---
>  drivers/scsi/scsi_transport_srp.c | 24 ++++++++++++++++++++----
>  1 file changed, 20 insertions(+), 4 deletions(-)
> 
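
Concretely, the scsi-mq path presumably boils down to quiescing every
scsi_device request queue of the SRP host, so that all ongoing
.queue_rq() -> queuecommand() calls have returned before
srp_wait_for_queuecommand() returns. A hedged sketch of that idea (the
exact helper in the patch may differ):

	static void srp_mq_wait_for_queuecommand(struct Scsi_Host *shost)
	{
		struct scsi_device *sdev;

		shost_for_each_device(sdev, shost)
			blk_mq_quiesce_queue(sdev->request_queue);
	}
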
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()
@ 2016-09-29 21:51     ` Ming Lei
  0 siblings, 0 replies; 75+ messages in thread
From: Ming Lei @ 2016-09-29 21:51 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	linux-block, linux-scsi, linux-rdma, linux-nvme

On Thu, Sep 29, 2016 at 7:59 AM, Bart Van Assche
<bart.vanassche@sandisk.com> wrote:
> blk_quiesce_queue() prevents that new queue_rq() invocations

blk_mq_quiesce_queue()

> occur and waits until ongoing invocations have finished. This
> function does *not* wait until all outstanding requests have

I guess it may still end up waiting in the end, since this approach can
exhaust the tag space easily. :-)

> finished (this means invocation of request.end_io()).
> blk_resume_queue() resumes normal I/O processing.
>
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  block/blk-mq.c         | 137 ++++++++++++++++++++++++++++++++++++++++++++-----
>  include/linux/blk-mq.h |   2 +
>  include/linux/blkdev.h |   5 ++
>  3 files changed, 131 insertions(+), 13 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index d8c45de..f5c71ad 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -58,15 +58,23 @@ static void blk_mq_hctx_clear_pending(struct blk_mq_hw_ctx *hctx,
>         sbitmap_clear_bit(&hctx->ctx_map, ctx->index_hw);
>  }
>
> -void blk_mq_freeze_queue_start(struct request_queue *q)
> +static bool __blk_mq_freeze_queue_start(struct request_queue *q,
> +                                       bool kill_percpu_ref)
>  {
>         int freeze_depth;
>
>         freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
>         if (freeze_depth == 1) {
> -               percpu_ref_kill(&q->q_usage_counter);
> +               if (kill_percpu_ref)
> +                       percpu_ref_kill(&q->q_usage_counter);
>                 blk_mq_run_hw_queues(q, false);
>         }
> +       return freeze_depth == 1;
> +}
> +
> +void blk_mq_freeze_queue_start(struct request_queue *q)
> +{
> +       __blk_mq_freeze_queue_start(q, true);
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_start);
>
> @@ -102,19 +110,90 @@ void blk_mq_freeze_queue(struct request_queue *q)
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_freeze_queue);
>
> -void blk_mq_unfreeze_queue(struct request_queue *q)
> +static bool __blk_mq_unfreeze_queue(struct request_queue *q,
> +                                   bool reinit_percpu_ref)
>  {
>         int freeze_depth;
>
>         freeze_depth = atomic_dec_return(&q->mq_freeze_depth);
>         WARN_ON_ONCE(freeze_depth < 0);
>         if (!freeze_depth) {
> -               percpu_ref_reinit(&q->q_usage_counter);
> +               if (reinit_percpu_ref)
> +                       percpu_ref_reinit(&q->q_usage_counter);
>                 wake_up_all(&q->mq_freeze_wq);
>         }
> +       return freeze_depth == 0;
> +}
> +
> +void blk_mq_unfreeze_queue(struct request_queue *q)
> +{
> +       __blk_mq_unfreeze_queue(q, true);
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
>
> +/**
> + * blk_mq_quiesce_queue() - wait until all pending queue_rq calls have finished
> + *
> + * Prevent that new I/O requests are queued and wait until all pending
> + * queue_rq() calls have finished. Must not be called if the queue has already
> + * been frozen. Additionally, freezing the queue after having quiesced the
> + * queue and before resuming the queue is not allowed.
> + *
> + * Note: this function does not prevent that the struct request end_io()
> + * callback function is invoked.
> + */
> +void blk_mq_quiesce_queue(struct request_queue *q)
> +{
> +       struct blk_mq_hw_ctx *hctx;
> +       unsigned int i;
> +       bool res, rcu = false;
> +
> +       spin_lock_irq(q->queue_lock);
> +       WARN_ON_ONCE(blk_queue_quiescing(q));
> +       queue_flag_set(QUEUE_FLAG_QUIESCING, q);
> +       spin_unlock_irq(q->queue_lock);
> +
> +       res = __blk_mq_freeze_queue_start(q, false);

The implementation looks a bit tricky and complicated. If the percpu
usage counter isn't killed, there is no need to touch .mq_freeze_depth,
since QUEUE_FLAG_QUIESCING is already used to set/get this state of
the queue.

Then using synchronize_rcu() together with rcu_read_lock()/
rcu_read_unlock() and the QUIESCING flag should be enough to wait for
ongoing .queue_rq() invocations to complete and to prevent new
.queue_rq() calls, right?
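
In other words, a minimal sketch of that scheme (illustrative only,
reusing the QUEUE_FLAG_QUIESCING flag and queue_flag_set() helper from
this patch; WARN/error handling omitted):

	void blk_mq_quiesce_queue(struct request_queue *q)
	{
		spin_lock_irq(q->queue_lock);
		queue_flag_set(QUEUE_FLAG_QUIESCING, q);
		spin_unlock_irq(q->queue_lock);

		/*
		 * Wait until every .queue_rq() call that started before the
		 * flag was set has left its RCU read-side critical section.
		 */
		synchronize_rcu();
	}

with dispatch running under rcu_read_lock()/rcu_read_unlock() and
skipping the hw queue whenever blk_queue_quiescing(q) is set.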

> +       WARN_ON_ONCE(!res);
> +       queue_for_each_hw_ctx(q, hctx, i) {
> +               if (hctx->flags & BLK_MQ_F_BLOCKING) {
> +                       mutex_lock(&hctx->queue_rq_mutex);
> +                       mutex_unlock(&hctx->queue_rq_mutex);

Could you explain a bit why BLK_MQ_F_BLOCKING is treated so specially?
That flag only means the hw queue always needs to be scheduled
asynchronously, and of course even when the flag isn't set the queue
may still be run asynchronously. So it looks like using RCU alone
should be enough.

> +               } else {
> +                       rcu = true;
> +               }
> +       }
> +       if (rcu)
> +               synchronize_rcu();
> +
> +       spin_lock_irq(q->queue_lock);
> +       WARN_ON_ONCE(!blk_queue_quiescing(q));
> +       queue_flag_clear(QUEUE_FLAG_QUIESCING, q);
> +       spin_unlock_irq(q->queue_lock);
> +}
> +EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
> +
> +/**
> + * blk_mq_resume_queue() - resume request processing
> + *
> + * Resume blk_queue_enter() calls that have been suspended by
> + * blk_mq_quiesce_queue().
> + *
> + * The caller is responsible for serializing blk_mq_quiesce_queue() and
> + * blk_mq_resume_queue().
> + */
> +void blk_mq_resume_queue(struct request_queue *q)
> +{
> +       bool res;
> +
> +       res = __blk_mq_unfreeze_queue(q, false);
> +       WARN_ON_ONCE(!res);
> +       WARN_ON_ONCE(blk_queue_quiescing(q));
> +
> +       blk_mq_run_hw_queues(q, false);
> +}
> +EXPORT_SYMBOL_GPL(blk_mq_resume_queue);
> +
>  void blk_mq_wake_waiters(struct request_queue *q)
>  {
>         struct blk_mq_hw_ctx *hctx;
> @@ -488,6 +567,9 @@ static void blk_mq_requeue_work(struct work_struct *work)
>         struct request *rq, *next;
>         unsigned long flags;
>
> +       if (blk_queue_quiescing(q))
> +               return;
> +
>         spin_lock_irqsave(&q->requeue_lock, flags);
>         list_splice_init(&q->requeue_list, &rq_list);
>         spin_unlock_irqrestore(&q->requeue_lock, flags);
> @@ -782,7 +864,7 @@ static inline unsigned int queued_to_index(unsigned int queued)
>   * of IO. In particular, we'd like FIFO behaviour on handling existing
>   * items on the hctx->dispatch list. Ignore that for now.
>   */
> -static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> +static void blk_mq_process_rq_list(struct blk_mq_hw_ctx *hctx)
>  {
>         struct request_queue *q = hctx->queue;
>         struct request *rq;
> @@ -791,9 +873,6 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
>         struct list_head *dptr;
>         int queued;
>
> -       if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
> -               return;
> -
>         WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>                 cpu_online(hctx->next_cpu));
>
> @@ -883,7 +962,24 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
>                  *
>                  * blk_mq_run_hw_queue() already checks the STOPPED bit
>                  **/
> -               blk_mq_run_hw_queue(hctx, true);
> +               if (!blk_queue_quiescing(q))
> +                       blk_mq_run_hw_queue(hctx, true);
> +       }
> +}
> +
> +static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> +{
> +       if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
> +               return;
> +
> +       if (hctx->flags & BLK_MQ_F_BLOCKING) {
> +               mutex_lock(&hctx->queue_rq_mutex);
> +               blk_mq_process_rq_list(hctx);
> +               mutex_unlock(&hctx->queue_rq_mutex);
> +       } else {
> +               rcu_read_lock();
> +               blk_mq_process_rq_list(hctx);
> +               rcu_read_unlock();
>         }
>  }
>
> @@ -1341,7 +1437,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>                 blk_mq_bio_to_request(rq, bio);
>
>                 /*
> -                * We do limited pluging. If the bio can be merged, do that.
> +                * We do limited plugging. If the bio can be merged, do that.
>                  * Otherwise the existing request in the plug list will be
>                  * issued. So the plug list will have one request at most
>                  */
> @@ -1361,9 +1457,23 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>                 blk_mq_put_ctx(data.ctx);
>                 if (!old_rq)
>                         goto done;
> -               if (!blk_mq_direct_issue_request(old_rq, &cookie))
> -                       goto done;
> -               blk_mq_insert_request(old_rq, false, true, true);
> +
> +               if (data.hctx->flags & BLK_MQ_F_BLOCKING) {
> +                       mutex_lock(&data.hctx->queue_rq_mutex);
> +                       if (blk_queue_quiescing(q) ||
> +                           blk_mq_direct_issue_request(old_rq, &cookie) != 0)
> +                               blk_mq_insert_request(old_rq, false, true,
> +                                                     true);
> +                       mutex_unlock(&data.hctx->queue_rq_mutex);
> +               } else {
> +                       rcu_read_lock();
> +                       if (blk_queue_quiescing(q) ||
> +                           blk_mq_direct_issue_request(old_rq, &cookie) != 0)
> +                               blk_mq_insert_request(old_rq, false, true,
> +                                                     true);
> +                       rcu_read_unlock();
> +               }
> +
>                 goto done;
>         }
>
> @@ -1702,6 +1812,7 @@ static int blk_mq_init_hctx(struct request_queue *q,
>         INIT_DELAYED_WORK(&hctx->delay_work, blk_mq_delay_work_fn);
>         spin_lock_init(&hctx->lock);
>         INIT_LIST_HEAD(&hctx->dispatch);
> +       mutex_init(&hctx->queue_rq_mutex);
>         hctx->queue = q;
>         hctx->queue_num = hctx_idx;
>         hctx->flags = set->flags & ~BLK_MQ_F_TAG_SHARED;
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 368c460d..4b970f1 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -41,6 +41,8 @@ struct blk_mq_hw_ctx {
>
>         struct blk_mq_tags      *tags;
>
> +       struct mutex            queue_rq_mutex;
> +
>         unsigned long           queued;
>         unsigned long           run;
>  #define BLK_MQ_MAX_DISPATCH_ORDER      7
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c47c358..d365cdf 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -505,6 +505,7 @@ struct request_queue {
>  #define QUEUE_FLAG_FUA        24       /* device supports FUA writes */
>  #define QUEUE_FLAG_FLUSH_NQ    25      /* flush not queueuable */
>  #define QUEUE_FLAG_DAX         26      /* device supports DAX */
> +#define QUEUE_FLAG_QUIESCING   27
>
>  #define QUEUE_FLAG_DEFAULT     ((1 << QUEUE_FLAG_IO_STAT) |            \
>                                  (1 << QUEUE_FLAG_STACKABLE)    |       \
> @@ -595,6 +596,8 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
>  #define blk_queue_secure_erase(q) \
>         (test_bit(QUEUE_FLAG_SECERASE, &(q)->queue_flags))
>  #define blk_queue_dax(q)       test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
> +#define blk_queue_quiescing(q) test_bit(QUEUE_FLAG_QUIESCING,  \
> +                                        &(q)->queue_flags)
>
>  #define blk_noretry_request(rq) \
>         ((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
> @@ -824,6 +827,8 @@ extern void __blk_run_queue(struct request_queue *q);
>  extern void __blk_run_queue_uncond(struct request_queue *q);
>  extern void blk_run_queue(struct request_queue *);
>  extern void blk_run_queue_async(struct request_queue *q);
> +extern void blk_mq_quiesce_queue(struct request_queue *q);
> +extern void blk_mq_resume_queue(struct request_queue *q);
>  extern int blk_rq_map_user(struct request_queue *, struct request *,
>                            struct rq_map_data *, void __user *, unsigned long,
>                            gfp_t);
> --
> 2.10.0
>



-- 
Ming Lei

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()
  2016-09-29 21:51     ` Ming Lei
@ 2016-09-30 15:55       ` Bart Van Assche
  -1 siblings, 0 replies; 75+ messages in thread
From: Bart Van Assche @ 2016-09-30 15:55 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	linux-block, linux-scsi, linux-rdma, linux-nvme

[-- Attachment #1: Type: text/plain, Size: 2206 bytes --]

On 09/29/16 14:51, Ming Lei wrote:
> On Thu, Sep 29, 2016 at 7:59 AM, Bart Van Assche
> <bart.vanassche@sandisk.com> wrote:
>> blk_quiesce_queue() prevents that new queue_rq() invocations
>
> blk_mq_quiesce_queue()

Thanks, I will update the patch title and patch description.

>> +void blk_mq_quiesce_queue(struct request_queue *q)
>> +{
>> +       struct blk_mq_hw_ctx *hctx;
>> +       unsigned int i;
>> +       bool res, rcu = false;
>> +
>> +       spin_lock_irq(q->queue_lock);
>> +       WARN_ON_ONCE(blk_queue_quiescing(q));
>> +       queue_flag_set(QUEUE_FLAG_QUIESCING, q);
>> +       spin_unlock_irq(q->queue_lock);
>> +
>> +       res = __blk_mq_freeze_queue_start(q, false);
>
> The implementation looks a bit tricky and complicated. If the percpu
> usage counter isn't killed, there is no need to touch .mq_freeze_depth,
> since QUEUE_FLAG_QUIESCING is already used to set/get this state of
> the queue.
>
> Then using synchronize_rcu() together with rcu_read_lock()/
> rcu_read_unlock() and the QUIESCING flag should be enough to wait for
> ongoing .queue_rq() invocations to complete and to prevent new
> .queue_rq() calls, right?

That's an interesting thought. Can you have a look at the attached patch 
in which blk_mq_quiesce_queue() no longer manipulates the 
mq_freeze_depth counter?

>> +       WARN_ON_ONCE(!res);
>> +       queue_for_each_hw_ctx(q, hctx, i) {
>> +               if (hctx->flags & BLK_MQ_F_BLOCKING) {
>> +                       mutex_lock(&hctx->queue_rq_mutex);
>> +                       mutex_unlock(&hctx->queue_rq_mutex);
>
> Could you explain a bit why BLK_MQ_F_BLOCKING is treated so specially?
> That flag only means the hw queue always needs to be scheduled
> asynchronously, and of course even when the flag isn't set the queue
> may still be run asynchronously. So it looks like using RCU alone
> should be enough.

The mutex manipulations introduce atomic stores in the hot path. Hence 
the use of synchronize_rcu() and rcu_read_lock()/rcu_read_unlock() when 
possible. Since BLK_MQ_F_BLOCKING indicates that .queue_rq() may sleep 
and since sleeping is not allowed while holding the RCU read lock, 
queue_rq_mutex has been introduced for the BLK_MQ_F_BLOCKING case.

Bart.

[-- Attachment #2: 0001-blk-mq-Introduce-blk_mq_quiesce_queue.patch --]
[-- Type: text/x-patch, Size: 6466 bytes --]

From 3a549df53eba84fad94279484e54f9c43b2d6626 Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bart.vanassche@sandisk.com>
Date: Tue, 27 Sep 2016 10:52:36 -0700
Subject: [PATCH] blk-mq: Introduce blk_mq_quiesce_queue()

blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
have finished. This function does *not* wait until all outstanding
requests have finished (this means invocation of request.end_io()).

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 block/blk-mq.c         | 69 +++++++++++++++++++++++++++++++++++++++++++++-----
 include/linux/blk-mq.h |  2 ++
 include/linux/blkdev.h |  4 +++
 3 files changed, 69 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index d8c45de..57701c5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -115,6 +115,32 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
 
+/**
+ * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
+ *
+ * Note: this function does not prevent that the struct request end_io()
+ * callback function is invoked. Additionally, it is not prevented that
+ * new queue_rq() calls occur unless the queue has been stopped first.
+ */
+void blk_mq_quiesce_queue(struct request_queue *q)
+{
+	struct blk_mq_hw_ctx *hctx;
+	unsigned int i;
+	bool res, rcu = false;
+
+	queue_for_each_hw_ctx(q, hctx, i) {
+		if (hctx->flags & BLK_MQ_F_BLOCKING) {
+			mutex_lock(&hctx->queue_rq_mutex);
+			mutex_unlock(&hctx->queue_rq_mutex);
+		} else {
+			rcu = true;
+		}
+	}
+	if (rcu)
+		synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
+
 void blk_mq_wake_waiters(struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx;
@@ -782,7 +808,7 @@ static inline unsigned int queued_to_index(unsigned int queued)
  * of IO. In particular, we'd like FIFO behaviour on handling existing
  * items on the hctx->dispatch list. Ignore that for now.
  */
-static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
+static void blk_mq_process_rq_list(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	struct request *rq;
@@ -791,7 +817,7 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	struct list_head *dptr;
 	int queued;
 
-	if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
+	if (unlikely(blk_queue_quiescing(q)))
 		return;
 
 	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
@@ -887,6 +913,22 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	}
 }
 
+static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
+{
+	if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
+		return;
+
+	if (hctx->flags & BLK_MQ_F_BLOCKING) {
+		mutex_lock(&hctx->queue_rq_mutex);
+		blk_mq_process_rq_list(hctx);
+		mutex_unlock(&hctx->queue_rq_mutex);
+	} else {
+		rcu_read_lock();
+		blk_mq_process_rq_list(hctx);
+		rcu_read_unlock();
+	}
+}
+
 /*
  * It'd be great if the workqueue API had a way to pass
  * in a mask and had some smarts for more clever placement.
@@ -1341,7 +1383,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_mq_bio_to_request(rq, bio);
 
 		/*
-		 * We do limited pluging. If the bio can be merged, do that.
+		 * We do limited plugging. If the bio can be merged, do that.
 		 * Otherwise the existing request in the plug list will be
 		 * issued. So the plug list will have one request at most
 		 */
@@ -1361,9 +1403,23 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_mq_put_ctx(data.ctx);
 		if (!old_rq)
 			goto done;
-		if (!blk_mq_direct_issue_request(old_rq, &cookie))
-			goto done;
-		blk_mq_insert_request(old_rq, false, true, true);
+
+		if (data.hctx->flags & BLK_MQ_F_BLOCKING) {
+			mutex_lock(&data.hctx->queue_rq_mutex);
+			if (blk_queue_quiescing(q) ||
+			    blk_mq_direct_issue_request(old_rq, &cookie) != 0)
+				blk_mq_insert_request(old_rq, false, true,
+						      true);
+			mutex_unlock(&data.hctx->queue_rq_mutex);
+		} else {
+			rcu_read_lock();
+			if (blk_queue_quiescing(q) ||
+			    blk_mq_direct_issue_request(old_rq, &cookie) != 0)
+				blk_mq_insert_request(old_rq, false, true,
+						      true);
+			rcu_read_unlock();
+		}
+
 		goto done;
 	}
 
@@ -1702,6 +1758,7 @@ static int blk_mq_init_hctx(struct request_queue *q,
 	INIT_DELAYED_WORK(&hctx->delay_work, blk_mq_delay_work_fn);
 	spin_lock_init(&hctx->lock);
 	INIT_LIST_HEAD(&hctx->dispatch);
+	mutex_init(&hctx->queue_rq_mutex);
 	hctx->queue = q;
 	hctx->queue_num = hctx_idx;
 	hctx->flags = set->flags & ~BLK_MQ_F_TAG_SHARED;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 368c460d..4b970f1 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -41,6 +41,8 @@ struct blk_mq_hw_ctx {
 
 	struct blk_mq_tags	*tags;
 
+	struct mutex		queue_rq_mutex;
+
 	unsigned long		queued;
 	unsigned long		run;
 #define BLK_MQ_MAX_DISPATCH_ORDER	7
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358..6c2d987 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -505,6 +505,7 @@ struct request_queue {
 #define QUEUE_FLAG_FUA	       24	/* device supports FUA writes */
 #define QUEUE_FLAG_FLUSH_NQ    25	/* flush not queueuable */
 #define QUEUE_FLAG_DAX         26	/* device supports DAX */
+#define QUEUE_FLAG_QUIESCING   27
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_STACKABLE)	|	\
@@ -595,6 +596,8 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 #define blk_queue_secure_erase(q) \
 	(test_bit(QUEUE_FLAG_SECERASE, &(q)->queue_flags))
 #define blk_queue_dax(q)	test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
+#define blk_queue_quiescing(q)	test_bit(QUEUE_FLAG_QUIESCING,	\
+					 &(q)->queue_flags)
 
 #define blk_noretry_request(rq) \
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
@@ -824,6 +827,7 @@ extern void __blk_run_queue(struct request_queue *q);
 extern void __blk_run_queue_uncond(struct request_queue *q);
 extern void blk_run_queue(struct request_queue *);
 extern void blk_run_queue_async(struct request_queue *q);
+extern void blk_mq_quiesce_queue(struct request_queue *q);
 extern int blk_rq_map_user(struct request_queue *, struct request *,
 			   struct rq_map_data *, void __user *, unsigned long,
 			   gfp_t);
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()
@ 2016-10-01 22:56         ` Ming Lei
  0 siblings, 0 replies; 75+ messages in thread
From: Ming Lei @ 2016-10-01 22:56 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	linux-block, linux-scsi, linux-rdma, linux-nvme

On Fri, Sep 30, 2016 at 11:55 PM, Bart Van Assche
<bart.vanassche@sandisk.com> wrote:
> On 09/29/16 14:51, Ming Lei wrote:
>>
>> On Thu, Sep 29, 2016 at 7:59 AM, Bart Van Assche
>> <bart.vanassche@sandisk.com> wrote:
>>>
>>> blk_quiesce_queue() prevents that new queue_rq() invocations
>>
>>
>> blk_mq_quiesce_queue()
>
>
> Thanks, I will update the patch title and patch description.
>
>>> +void blk_mq_quiesce_queue(struct request_queue *q)
>>> +{
>>> +       struct blk_mq_hw_ctx *hctx;
>>> +       unsigned int i;
>>> +       bool res, rcu = false;
>>> +
>>> +       spin_lock_irq(q->queue_lock);
>>> +       WARN_ON_ONCE(blk_queue_quiescing(q));
>>> +       queue_flag_set(QUEUE_FLAG_QUIESCING, q);
>>> +       spin_unlock_irq(q->queue_lock);
>>> +
>>> +       res = __blk_mq_freeze_queue_start(q, false);
>>
>>
>> Looks the implementation is a bit tricky and complicated, if the percpu
>> usage counter isn't killed, it isn't necessary to touch .mq_freeze_depth
>> since you use QUEUE_FLAG_QUIESCING to set/get this status of
>> the queue.
>
>>
>> Then using synchronize_rcu() and rcu_read_lock()/rcu_read_unlock()
>> with the flag of QUIESCING may be enough to wait for completing
>> of ongoing invocations of .queue_rq() and avoid to run new .queue_rq,
>> right?
>
> That's an interesting thought. Can you have a look at the attached patch in
> which blk_mq_quiesce_queue() no longer manipulates the mq_freeze_depth
> counter?

About this part of manipulating 'mq_freeze_depth', your new patch looks
better and cleaner.

>
>>> +       WARN_ON_ONCE(!res);
>>> +       queue_for_each_hw_ctx(q, hctx, i) {
>>> +               if (hctx->flags & BLK_MQ_F_BLOCKING) {
>>> +                       mutex_lock(&hctx->queue_rq_mutex);
>>> +                       mutex_unlock(&hctx->queue_rq_mutex);
>>
>>
>> Could you explain a bit why all BLOCKING is treated so special? And
>> that flag just means the hw queue always need to schedule asynchronously,
>> and of course even though the flag isn't set, it still may be run
>> asynchronously too. So looks it should be enough to just use RCU.
>
>
> The mutex manipulations introduce atomic stores in the hot path. Hence the
> use of synchronize_rcu() and rcu_read_lock()/rcu_read_unlock() when
> possible. Since BLK_MQ_F_BLOCKING indicates that .queue_rq() may sleep and
> since sleeping is not allowed while holding the RCU read lock,
> queue_rq_mutex has been introduced for the BLK_MQ_F_BLOCKING case.

OK, got it.

But this part still does not look good enough:

> +static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> +{
> +       if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
> +               return;
> +
> +       if (hctx->flags & BLK_MQ_F_BLOCKING) {
> +               mutex_lock(&hctx->queue_rq_mutex);
> +               blk_mq_process_rq_list(hctx);
> +               mutex_unlock(&hctx->queue_rq_mutex);

With the new mutex, .queue_rq() can't be run concurrently any more, even
though the hw queue can be mapped to more than one CPU. So maybe use
SRCU for BLK_MQ_F_BLOCKING?

> +       } else {
> +               rcu_read_lock();
> +               blk_mq_process_rq_list(hctx);
> +               rcu_read_unlock();
>         }

...

> -               if (!blk_mq_direct_issue_request(old_rq, &cookie))
> -                       goto done;
> -               blk_mq_insert_request(old_rq, false, true, true);
> +
> +               if (data.hctx->flags & BLK_MQ_F_BLOCKING) {
> +                       mutex_lock(&data.hctx->queue_rq_mutex);
> +                       if (blk_queue_quiescing(q) ||
> +                           blk_mq_direct_issue_request(old_rq, &cookie) != 0)
> +                               blk_mq_insert_request(old_rq, false, true,
> +                                                     true);
> +                       mutex_unlock(&data.hctx->queue_rq_mutex);
> +               } else {
> +                       rcu_read_lock();
> +                       if (blk_queue_quiescing(q) ||
> +                           blk_mq_direct_issue_request(old_rq, &cookie) != 0)
> +                               blk_mq_insert_request(old_rq, false, true,
> +                                                     true);
> +                       rcu_read_unlock();
> +               }
> +

If we just call the rcu/srcu read lock(or the mutex) around .queue_rq(), the
above code needn't to be duplicated any more.
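
A minimal sketch of that idea (not the actual patch; it assumes the
queue_rq_mutex field from the patch under review and a made-up helper
name):

#include <linux/blk-mq.h>

/*
 * Hypothetical helper: every ->queue_rq() dispatch goes through here, so
 * the quiescing guard lives in one place instead of being duplicated in
 * __blk_mq_run_hw_queue() and blk_mq_make_request().
 */
static int blk_mq_dispatch_one(struct blk_mq_hw_ctx *hctx,
                               const struct blk_mq_queue_data *bd)
{
        int ret;

        if (hctx->flags & BLK_MQ_F_BLOCKING) {
                /* queue_rq_mutex is the field from the patch being reviewed. */
                mutex_lock(&hctx->queue_rq_mutex);
                ret = hctx->queue->mq_ops->queue_rq(hctx, bd);
                mutex_unlock(&hctx->queue_rq_mutex);
        } else {
                rcu_read_lock();
                ret = hctx->queue->mq_ops->queue_rq(hctx, bd);
                rcu_read_unlock();
        }

        return ret;
}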

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()
  2016-10-01 22:56         ` Ming Lei
@ 2016-10-05  4:16           ` Bart Van Assche
  -1 siblings, 0 replies; 75+ messages in thread
From: Bart Van Assche @ 2016-10-05  4:16 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	linux-block, linux-scsi, linux-rdma, linux-nvme

[-- Attachment #1: Type: text/plain, Size: 466 bytes --]

On 10/01/16 15:56, Ming Lei wrote:
> If we just call the rcu/srcu read lock(or the mutex) around .queue_rq(), the
> above code needn't to be duplicated any more.

Hello Ming,

Can you have a look at the attached patch? That patch uses an srcu read 
lock for all queue types, whether or not the BLK_MQ_F_BLOCKING flag has 
been set. Additionally, I have dropped the QUEUE_FLAG_QUIESCING flag. 
Just like previous versions, this patch has been tested.

Thanks,

Bart.

[-- Attachment #2: 0001-blk-mq-Introduce-blk_mq_quiesce_queue.patch --]
[-- Type: text/x-patch, Size: 5211 bytes --]

>From 25f02ed7ab7b2308fd18b89d180c0c613e55d416 Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bart.vanassche@sandisk.com>
Date: Tue, 27 Sep 2016 10:52:36 -0700
Subject: [PATCH] blk-mq: Introduce blk_mq_quiesce_queue()

blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
have finished. This function does *not* wait until all outstanding
requests have finished (this means invocation of request.end_io()).

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Ming Lei <tom.leiming@gmail.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 block/blk-mq.c         | 40 ++++++++++++++++++++++++++++++++++------
 include/linux/blk-mq.h |  3 +++
 include/linux/blkdev.h |  1 +
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index d8c45de..38ae685 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -115,6 +115,23 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
 
+/**
+ * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
+ *
+ * Note: this function does not prevent that the struct request end_io()
+ * callback function is invoked. Additionally, it is not prevented that
+ * new queue_rq() calls occur unless the queue has been stopped first.
+ */
+void blk_mq_quiesce_queue(struct request_queue *q)
+{
+	struct blk_mq_hw_ctx *hctx;
+	unsigned int i;
+
+	queue_for_each_hw_ctx(q, hctx, i)
+		synchronize_srcu(&hctx->queue_rq_srcu);
+}
+EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
+
 void blk_mq_wake_waiters(struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx;
@@ -789,11 +806,13 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	LIST_HEAD(rq_list);
 	LIST_HEAD(driver_list);
 	struct list_head *dptr;
-	int queued;
+	int queued, srcu_idx;
 
 	if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
 		return;
 
+	srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
+
 	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
 		cpu_online(hctx->next_cpu));
 
@@ -885,6 +904,8 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 		 **/
 		blk_mq_run_hw_queue(hctx, true);
 	}
+
+	srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
 }
 
 /*
@@ -1298,7 +1319,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 	const int is_flush_fua = bio->bi_opf & (REQ_PREFLUSH | REQ_FUA);
 	struct blk_map_ctx data;
 	struct request *rq;
-	unsigned int request_count = 0;
+	unsigned int request_count = 0, srcu_idx;
 	struct blk_plug *plug;
 	struct request *same_queue_rq = NULL;
 	blk_qc_t cookie;
@@ -1341,7 +1362,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_mq_bio_to_request(rq, bio);
 
 		/*
-		 * We do limited pluging. If the bio can be merged, do that.
+		 * We do limited plugging. If the bio can be merged, do that.
 		 * Otherwise the existing request in the plug list will be
 		 * issued. So the plug list will have one request at most
 		 */
@@ -1361,9 +1382,12 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_mq_put_ctx(data.ctx);
 		if (!old_rq)
 			goto done;
-		if (!blk_mq_direct_issue_request(old_rq, &cookie))
-			goto done;
-		blk_mq_insert_request(old_rq, false, true, true);
+
+		srcu_idx = srcu_read_lock(&data.hctx->queue_rq_srcu);
+		if (blk_mq_direct_issue_request(old_rq, &cookie) != 0)
+			blk_mq_insert_request(old_rq, false, true, true);
+		srcu_read_unlock(&data.hctx->queue_rq_srcu, srcu_idx);
+
 		goto done;
 	}
 
@@ -1659,6 +1683,8 @@ static void blk_mq_exit_hctx(struct request_queue *q,
 	if (set->ops->exit_hctx)
 		set->ops->exit_hctx(hctx, hctx_idx);
 
+	cleanup_srcu_struct(&hctx->queue_rq_srcu);
+
 	blk_mq_unregister_cpu_notifier(&hctx->cpu_notifier);
 	blk_free_flush_queue(hctx->fq);
 	sbitmap_free(&hctx->ctx_map);
@@ -1741,6 +1767,8 @@ static int blk_mq_init_hctx(struct request_queue *q,
 				   flush_start_tag + hctx_idx, node))
 		goto free_fq;
 
+	init_srcu_struct(&hctx->queue_rq_srcu);
+
 	return 0;
 
  free_fq:
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 368c460d..b2ccd3c 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -3,6 +3,7 @@
 
 #include <linux/blkdev.h>
 #include <linux/sbitmap.h>
+#include <linux/srcu.h>
 
 struct blk_mq_tags;
 struct blk_flush_queue;
@@ -41,6 +42,8 @@ struct blk_mq_hw_ctx {
 
 	struct blk_mq_tags	*tags;
 
+	struct srcu_struct	queue_rq_srcu;
+
 	unsigned long		queued;
 	unsigned long		run;
 #define BLK_MQ_MAX_DISPATCH_ORDER	7
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358..8259d87 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -824,6 +824,7 @@ extern void __blk_run_queue(struct request_queue *q);
 extern void __blk_run_queue_uncond(struct request_queue *q);
 extern void blk_run_queue(struct request_queue *);
 extern void blk_run_queue_async(struct request_queue *q);
+extern void blk_mq_quiesce_queue(struct request_queue *q);
 extern int blk_rq_map_user(struct request_queue *, struct request *,
 			   struct rq_map_data *, void __user *, unsigned long,
 			   gfp_t);
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()
@ 2016-10-05  4:32             ` Ming Lei
  0 siblings, 0 replies; 75+ messages in thread
From: Ming Lei @ 2016-10-05  4:32 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	linux-block, linux-scsi, linux-rdma, linux-nvme

On Wed, Oct 5, 2016 at 12:16 PM, Bart Van Assche
<bart.vanassche@sandisk.com> wrote:
> On 10/01/16 15:56, Ming Lei wrote:
>>
>> If we just call the rcu/srcu read lock(or the mutex) around .queue_rq(),
>> the
>> above code needn't to be duplicated any more.
>
>
> Hello Ming,
>
> Can you have a look at the attached patch? That patch uses an srcu read lock
> for all queue types, whether or not the BLK_MQ_F_BLOCKING flag has been set.

That is much cleaner now.

> Additionally, I have dropped the QUEUE_FLAG_QUIESCING flag. Just like
> previous versions, this patch has been tested.

I think the QUEUE_FLAG_QUIESCING flag is still needed, because we have
to set it to prevent newly arriving .queue_rq() calls from being run;
synchronize_srcu() only waits for already-started read-side critical
sections, not for new ones (see the 'Update-Side Primitives' section
in [1]).


[1] https://lwn.net/Articles/202847/
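
To spell the ordering out, the general pattern is that the "no new
dispatches" condition must be published before the grace-period wait; a
stand-alone sketch with made-up names (not block layer code):

#include <linux/srcu.h>

static DEFINE_STATIC_SRCU(dispatch_srcu);
static bool quiescing;

static void do_queue_rq(void);          /* placeholder for ->queue_rq() */

/* Submission (reader) side. */
static void submit_one(void)
{
        int idx = srcu_read_lock(&dispatch_srcu);

        if (!READ_ONCE(quiescing))
                do_queue_rq();
        srcu_read_unlock(&dispatch_srcu, idx);
}

/* Quiesce (updater) side. */
static void quiesce(void)
{
        WRITE_ONCE(quiescing, true);      /* 1) forbid new dispatches       */
        synchronize_srcu(&dispatch_srcu); /* 2) wait for ones already begun */
}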

-- 
Ming Lei

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()
  2016-10-05  4:32             ` Ming Lei
@ 2016-10-05 14:46               ` Bart Van Assche
  -1 siblings, 0 replies; 75+ messages in thread
From: Bart Van Assche @ 2016-10-05 14:46 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	linux-block, linux-scsi, linux-rdma, linux-nvme

On 10/04/16 21:32, Ming Lei wrote:
> On Wed, Oct 5, 2016 at 12:16 PM, Bart Van Assche
> <bart.vanassche@sandisk.com> wrote:
>> On 10/01/16 15:56, Ming Lei wrote:
>>>
>>> If we just call the rcu/srcu read lock(or the mutex) around .queue_rq(),
>>> the above code needn't to be duplicated any more.
>>
>> Can you have a look at the attached patch? That patch uses an srcu read lock
>> for all queue types, whether or not the BLK_MQ_F_BLOCKING flag has been set.
>
> That is much cleaner now.
>
>> Additionally, I have dropped the QUEUE_FLAG_QUIESCING flag. Just like
>> previous versions, this patch has been tested.
>
> I think the QUEUE_FLAG_QUIESCING flag is still needed, because we have
> to set it to prevent newly arriving .queue_rq() calls from being run;
> synchronize_srcu() only waits for already-started read-side critical
> sections, not for new ones (see the 'Update-Side Primitives' section
> in [1]).
>
> [1] https://lwn.net/Articles/202847/

Hello Ming,

How about using the existing flag BLK_MQ_S_STOPPED instead of 
introducing a new QUEUE_FLAG_QUIESCING flag? From the comment above 
blk_mq_quiesce_queue() in the patch that was attached to my previous 
e-mail: "Additionally, it is not prevented that new queue_rq() calls 
occur unless the queue has been stopped first."
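
The caller-side sequence implied by that comment would then look roughly
as follows (a sketch, not code from this series; blk_mq_stop_hw_queues()
and blk_mq_start_stopped_hw_queues() are the existing interfaces):

#include <linux/blk-mq.h>

/* Sketch: quiet a queue by reusing the existing stopped state. */
static void example_quiet_queue(struct request_queue *q)
{
        blk_mq_stop_hw_queues(q);       /* sets BLK_MQ_S_STOPPED on each hctx  */
        blk_mq_quiesce_queue(q);        /* waits for ongoing .queue_rq() calls */

        /* ... the queue is now quiet: reconfigure, fail over, ... */

        blk_mq_start_stopped_hw_queues(q, true);
}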

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()
  2016-10-05 14:46               ` Bart Van Assche
@ 2016-10-05 16:11                 ` Ming Lei
  -1 siblings, 0 replies; 75+ messages in thread
From: Ming Lei @ 2016-10-05 16:11 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	linux-block, linux-scsi, linux-rdma, linux-nvme

On Wed, Oct 5, 2016 at 10:46 PM, Bart Van Assche
<bart.vanassche@sandisk.com> wrote:
> On 10/04/16 21:32, Ming Lei wrote:
>>
>> On Wed, Oct 5, 2016 at 12:16 PM, Bart Van Assche
>> <bart.vanassche@sandisk.com> wrote:
>>>
>>> On 10/01/16 15:56, Ming Lei wrote:
>>>>
>>>>
>>>> If we just call the rcu/srcu read lock(or the mutex) around .queue_rq(),
>>>> the above code needn't to be duplicated any more.
>>>
>>>
>>> Can you have a look at the attached patch? That patch uses an srcu read
>>> lock
>>> for all queue types, whether or not the BLK_MQ_F_BLOCKING flag has been
>>> set.
>>
>>
>> That is much cleaner now.
>>
>>> Additionally, I have dropped the QUEUE_FLAG_QUIESCING flag. Just like
>>> previous versions, this patch has been tested.
>>
>>
>> I think the QUEUE_FLAG_QUIESCING flag is still needed, because we have
>> to set it to prevent newly arriving .queue_rq() calls from being run;
>> synchronize_srcu() only waits for already-started read-side critical
>> sections, not for new ones (see the 'Update-Side Primitives' section
>> in [1]).
>>
>> [1] https://lwn.net/Articles/202847/
>
>
> Hello Ming,
>
> How about using the existing flag BLK_MQ_S_STOPPED instead of introducing a
> new QUEUE_FLAG_QUIESCING flag? From the comment above blk_mq_quiesce_queue()

That looks fine, and we need to stop direct issue first once the hw
queue becomes BLK_MQ_S_STOPPED.

> in the patch that was attached to my previous e-mail: "Additionally, it is
> not prevented that new queue_rq() calls occur unless the queue has been
> stopped first."
>
> Thanks,
>
> Bart.



-- 
Ming Lei

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 6/7] SRP transport: Port srp_wait_for_queuecommand() to scsi-mq
@ 2016-10-05 17:38     ` Sagi Grimberg
  0 siblings, 0 replies; 75+ messages in thread
From: Sagi Grimberg @ 2016-10-05 17:38 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, James Bottomley, Martin K. Petersen, Mike Snitzer,
	linux-rdma, linux-nvme, Keith Busch, Doug Ledford, linux-scsi,
	Christoph Hellwig


> +static void srp_mq_wait_for_queuecommand(struct Scsi_Host *shost)
> +{
> +	struct scsi_device *sdev;
> +	struct request_queue *q;
> +
> +	shost_for_each_device(sdev, shost) {
> +		q = sdev->request_queue;
> +
> +		blk_mq_quiesce_queue(q);
> +		blk_mq_resume_queue(q);
> +	}
> +}
> +

This *should* live in scsi_lib.c. I suspect that
various drivers would really want this functionality.

Thoughts?
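
Something along these lines, presumably (hypothetical name and placement,
not an existing scsi_lib.c function; blk_mq_resume_queue() is the
counterpart introduced elsewhere in this series):

/* drivers/scsi/scsi_lib.c (sketch) */
#include <linux/blk-mq.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_host.h>

void scsi_mq_wait_for_queuecommand(struct Scsi_Host *shost)
{
        struct scsi_device *sdev;

        shost_for_each_device(sdev, shost) {
                blk_mq_quiesce_queue(sdev->request_queue);
                blk_mq_resume_queue(sdev->request_queue);
        }
}
EXPORT_SYMBOL_GPL(scsi_mq_wait_for_queuecommand);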

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 7/7] [RFC] nvme: Fix a race condition
  2016-09-29  0:01   ` Bart Van Assche
@ 2016-10-05 17:40     ` Sagi Grimberg
  -1 siblings, 0 replies; 75+ messages in thread
From: Sagi Grimberg @ 2016-10-05 17:40 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, linux-block, linux-scsi,
	linux-rdma, linux-nvme


> Avoid that nvme_queue_rq() is still running when nvme_stop_queues()
> returns. Untested.
>
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>

Bart, this looks really good! It possibly fixes an issue
I've been chasing with fabrics for a while. I'll take it
for testing, but you can add my:

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
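
For readers without the attachment, the change presumably boils down to
something like the following in nvme_stop_queues() (reconstructed from
the commit message, not the literal diff; field names as in the 4.8-era
nvme core):

/* drivers/nvme/host/core.c (sketch) */
void nvme_stop_queues(struct nvme_ctrl *ctrl)
{
        struct nvme_ns *ns;

        mutex_lock(&ctrl->namespaces_mutex);
        list_for_each_entry(ns, &ctrl->namespaces, list) {
                blk_mq_stop_hw_queues(ns->queue);
                /*
                 * New step (sketch): do not return before every ongoing
                 * nvme_queue_rq() invocation has finished.
                 */
                blk_mq_quiesce_queue(ns->queue);
        }
        mutex_unlock(&ctrl->namespaces_mutex);
}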

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 3/7] [RFC] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
@ 2016-10-05 17:43     ` Sagi Grimberg
  0 siblings, 0 replies; 75+ messages in thread
From: Sagi Grimberg @ 2016-10-05 17:43 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, linux-block, linux-scsi,
	linux-rdma, linux-nvme

> Make nvme_requeue_req() check BLK_MQ_S_STOPPED instead of
> QUEUE_FLAG_STOPPED. Remove the QUEUE_FLAG_STOPPED manipulations
> that became superfluous because of this change. This patch fixes
> a race condition: using queue_flag_clear_unlocked() is not safe
> if any other function that manipulates the queue flags can be
> called concurrently, e.g. blk_cleanup_queue(). Untested.

This looks good to me, but I know Keith had all sorts of
creative ways to challenge this area, so I'd wait for his
input...

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 1/7] blk-mq: Introduce blk_mq_queue_stopped()
  2016-09-28 23:57   ` Bart Van Assche
@ 2016-10-05 17:50     ` Sagi Grimberg
  -1 siblings, 0 replies; 75+ messages in thread
From: Sagi Grimberg @ 2016-10-05 17:50 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, linux-block, linux-scsi,
	linux-rdma, linux-nvme

Looks good,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()
@ 2016-10-05 18:14             ` Sagi Grimberg
  0 siblings, 0 replies; 75+ messages in thread
From: Sagi Grimberg @ 2016-10-05 18:14 UTC (permalink / raw)
  To: Bart Van Assche, Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	linux-block, linux-scsi, linux-rdma, linux-nvme


> Hello Ming,
>
> Can you have a look at the attached patch? That patch uses an srcu read
> lock for all queue types, whether or not the BLK_MQ_F_BLOCKING flag has
> been set. Additionally, I have dropped the QUEUE_FLAG_QUIESCING flag.
> Just like previous versions, this patch has been tested.

Hey Bart,

Do we care about the synchronization of queue_rq and/or
blk_mq_run_hw_queue if the hctx is not stopped?

I'm wondering if we can avoid introducing new barriers in the
submission path if it's not absolutely needed.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()
  2016-10-05 18:14             ` Sagi Grimberg
@ 2016-10-05 19:05               ` Bart Van Assche
  -1 siblings, 0 replies; 75+ messages in thread
From: Bart Van Assche @ 2016-10-05 19:05 UTC (permalink / raw)
  To: Sagi Grimberg, Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	linux-block, linux-scsi, linux-rdma, linux-nvme

On 10/05/2016 11:14 AM, Sagi Grimberg wrote:
>> Hello Ming,
>>
>> Can you have a look at the attached patch? That patch uses an srcu read
>> lock for all queue types, whether or not the BLK_MQ_F_BLOCKING flag has
>> been set. Additionally, I have dropped the QUEUE_FLAG_QUIESCING flag.
>> Just like previous versions, this patch has been tested.
>
> Hey Bart,
>
> Do we care about the synchronization of queue_rq and/or
> blk_mq_run_hw_queue if the hctx is not stopped?
>
> I'm wondering if we can avoid introducing new barriers in the
> submission path if it's not absolutely needed.

Hello Sagi,

I'm not sure whether the new blk_quiesce_queue() function is useful 
without stopping all hardware contexts first. In other words, in my view 
setting BLK_MQ_F_BLOCKING flag before calling blk_quiesce_queue() is 
sufficient and I don't think that a new QUEUE_FLAG_QUIESCING flag is 
necessary.

Bart.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()
  2016-10-05 19:05               ` Bart Van Assche
@ 2016-10-05 19:10                 ` Sagi Grimberg
  -1 siblings, 0 replies; 75+ messages in thread
From: Sagi Grimberg @ 2016-10-05 19:10 UTC (permalink / raw)
  To: Bart Van Assche, Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	linux-block, linux-scsi, linux-rdma, linux-nvme


>>> Hello Ming,
>>>
>>> Can you have a look at the attached patch? That patch uses an srcu read
>>> lock for all queue types, whether or not the BLK_MQ_F_BLOCKING flag has
>>> been set. Additionally, I have dropped the QUEUE_FLAG_QUIESCING flag.
>>> Just like previous versions, this patch has been tested.
>>
>> Hey Bart,
>>
>> Do we care about the synchronization of queue_rq and/or
>> blk_mq_run_hw_queue if the hctx is not stopped?
>>
>> I'm wondering if we can avoid introducing new barriers in the
>> submission path if it's not absolutely needed.
>
> Hello Sagi,

Hey Bart,

>
> I'm not sure whether the new blk_quiesce_queue() function is useful
> without stopping all hardware contexts first. In other words, in my view
> setting BLK_MQ_F_BLOCKING flag before calling blk_quiesce_queue() is
> sufficient and I don't think that a new QUEUE_FLAG_QUIESCING flag is
> necessary.

I was referring to whether we can take srcu in the submission path
conditional on the hctx being STOPPED?

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()
@ 2016-10-05 21:08                   ` Bart Van Assche
  0 siblings, 0 replies; 75+ messages in thread
From: Bart Van Assche @ 2016-10-05 21:08 UTC (permalink / raw)
  To: Sagi Grimberg, Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	linux-block, linux-scsi, linux-rdma, linux-nvme

On 10/05/2016 12:11 PM, Sagi Grimberg wrote:
> I was referring to whether we can take srcu in the submission path
> conditional on the hctx being STOPPED?

Hello Sagi,

Regarding run-time overhead:
* rcu_read_lock() is a no-op on CONFIG_PREEMPT_NONE kernels and is
   translated into preempt_disable() with preemption enabled. The latter
   function modifies a per-cpu variable.
* Checking BLK_MQ_S_STOPPED before taking an rcu or srcu lock is only
   safe if the BLK_MQ_S_STOPPED flag is tested in such a way that the
   compiler is told to reread the hctx flags (READ_ONCE()) and if the
   compiler and CPU are told not to reorder test_bit() with the
   memory accesses in (s)rcu_read_lock(). To avoid races
   BLK_MQ_S_STOPPED will have to be tested a second time after the lock
   has been obtained, similar to the double-checked-locking pattern.
* srcu_read_lock() reads a word from the srcu structure, disables
   preemption, calls __srcu_read_lock() and re-enables preemption. The
   latter function increments two CPU-local variables and triggers a
   memory barrier (smp_mb()).

Swapping srcu_read_lock() and the BLK_MQ_S_STOPPED flag test will make 
the code more complicated. Going back to the implementation that calls 
rcu_read_lock() if .queue_rq() won't sleep will result in an 
implementation that is easier to read and to verify. If I overlooked 
something, please let me know.
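
For completeness, the double-checked variant being weighed here would look
roughly like this (a sketch; queue_rq_srcu and blk_mq_process_rq_list()
come from the patch earlier in this thread, the helper name is made up):

static void blk_mq_process_rq_list_srcu(struct blk_mq_hw_ctx *hctx)
{
        int srcu_idx;

        /* Cheap first check so a stopped hctx skips the SRCU overhead. */
        if (test_bit(BLK_MQ_S_STOPPED, &hctx->state))
                return;

        srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
        /*
         * Re-check inside the read-side critical section: a concurrent
         * "stop the queue + synchronize_srcu()" sequence either waits for
         * this reader or this reader observes BLK_MQ_S_STOPPED and bails.
         */
        if (!test_bit(BLK_MQ_S_STOPPED, &hctx->state))
                blk_mq_process_rq_list(hctx);
        srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
}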

Bart.



^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 6/7] SRP transport: Port srp_wait_for_queuecommand() to scsi-mq
  2016-10-05 17:38     ` Sagi Grimberg
@ 2016-10-05 21:51       ` Bart Van Assche
  -1 siblings, 0 replies; 75+ messages in thread
From: Bart Van Assche @ 2016-10-05 21:51 UTC (permalink / raw)
  To: Sagi Grimberg, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, linux-block, linux-scsi,
	linux-rdma, linux-nvme

On 10/05/2016 10:38 AM, Sagi Grimberg wrote:
>> +static void srp_mq_wait_for_queuecommand(struct Scsi_Host *shost)
>> +{
>> +    struct scsi_device *sdev;
>> +    struct request_queue *q;
>> +
>> +    shost_for_each_device(sdev, shost) {
>> +        q = sdev->request_queue;
>> +
>> +        blk_mq_quiesce_queue(q);
>> +        blk_mq_resume_queue(q);
>> +    }
>> +}
>> +
>
> This *should* live in scsi_lib.c. I suspect that
> various drivers would really want this functionality.

Hello Sagi,

There are multiple direct blk_*() calls in other SCSI transport drivers.
So my proposal is to postpone moving this code into scsi_lib.c until
there is a second user of this code.

Bart.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 6/7] SRP transport: Port srp_wait_for_queuecommand() to scsi-mq
@ 2016-10-05 21:51       ` Bart Van Assche
  0 siblings, 0 replies; 75+ messages in thread
From: Bart Van Assche @ 2016-10-05 21:51 UTC (permalink / raw)


On 10/05/2016 10:38 AM, Sagi Grimberg wrote:
>> +static void srp_mq_wait_for_queuecommand(struct Scsi_Host *shost)
>> +{
>> +    struct scsi_device *sdev;
>> +    struct request_queue *q;
>> +
>> +    shost_for_each_device(sdev, shost) {
>> +        q = sdev->request_queue;
>> +
>> +        blk_mq_quiesce_queue(q);
>> +        blk_mq_resume_queue(q);
>> +    }
>> +}
>> +
>
> This *should* live in scsi_lib.c. I suspect that
> various drivers would really want this functionality.

Hello Sagi,

There are multiple direct blk_*() calls in other SCSI transport drivers.
So my proposal is to postpone moving this code into scsi_lib.c until
there is a second user of this code.

Bart.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()
@ 2016-10-05 22:49                     ` Ming Lei
  0 siblings, 0 replies; 75+ messages in thread
From: Ming Lei @ 2016-10-05 22:49 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Sagi Grimberg, Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	linux-block, linux-scsi, linux-rdma, linux-nvme

On Thu, Oct 6, 2016 at 5:08 AM, Bart Van Assche
<bart.vanassche@sandisk.com> wrote:
> On 10/05/2016 12:11 PM, Sagi Grimberg wrote:
>>
>> I was referring to whether we can take srcu in the submission path
>> conditional on the hctx being STOPPED?
>
>
> Hello Sagi,
>
> Regarding run-time overhead:
> * rcu_read_lock() is a no-op on CONFIG_PREEMPT_NONE kernels and is
>   translated into preempt_disable() with preemption enabled. The latter
>   function modifies a per-cpu variable.
> * Checking BLK_MQ_S_STOPPED before taking an rcu or srcu lock is only
>   safe if the BLK_MQ_S_STOPPED flag is tested in such a way that the
>   compiler is told to reread the hctx flags (READ_ONCE()) and if the
>   compiler and CPU are told not to reorder test_bit() with the
>   memory accesses in (s)rcu_read_lock(). To avoid races
>   BLK_MQ_S_STOPPED will have to be tested a second time after the lock
>   has been obtained, similar to the double-checked-locking pattern.
> * srcu_read_lock() reads a word from the srcu structure, disables
>   preemption, calls __srcu_read_lock() and re-enables preemption. The
>   latter function increments two CPU-local variables and triggers a
>   memory barrier (smp_mb()).

We can use the srcu read lock for BLOCKING and the rcu read lock for
non-BLOCKING hardware queues by putting *_read_lock() and *_read_unlock()
into two wrappers, which should minimize the cost of the srcu read lock
and unlock while keeping the code easy to read and verify.
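
As a rough illustration of that idea (the srcu member name is a placeholder,
not taken from an actual patch), the two wrappers could look like:

static inline void hctx_queue_rq_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
{
	/* SRCU for hw queues whose .queue_rq() may sleep, plain RCU otherwise. */
	if (hctx->flags & BLK_MQ_F_BLOCKING)
		*srcu_idx = srcu_read_lock(&hctx->srcu);
	else
		rcu_read_lock();
}

static inline void hctx_queue_rq_unlock(struct blk_mq_hw_ctx *hctx, int srcu_idx)
{
	if (hctx->flags & BLK_MQ_F_BLOCKING)
		srcu_read_unlock(&hctx->srcu, srcu_idx);
	else
		rcu_read_unlock();
}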

>
> Swapping srcu_read_lock() and the BLK_MQ_S_STOPPED flag test will make the
> code more complicated. Going back to the implementation that calls
> rcu_read_lock() if .queue_rq() won't sleep will result in an implementation
> that is easier to read and to verify.

Yeah, I agree.

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()
@ 2016-10-05 22:49                     ` Ming Lei
  0 siblings, 0 replies; 75+ messages in thread
From: Ming Lei @ 2016-10-05 22:49 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Sagi Grimberg, Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Thu, Oct 6, 2016 at 5:08 AM, Bart Van Assche
<bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org> wrote:
> On 10/05/2016 12:11 PM, Sagi Grimberg wrote:
>>
>> I was referring to whether we can take srcu in the submission path
>> conditional on the hctx being STOPPED?
>
>
> Hello Sagi,
>
> Regarding run-time overhead:
> * rcu_read_lock() is a no-op on CONFIG_PREEMPT_NONE kernels and is
>   translated into preempt_disable() with preemption enabled. The latter
>   function modifies a per-cpu variable.
> * Checking BLK_MQ_S_STOPPED before taking an rcu or srcu lock is only
>   safe if the BLK_MQ_S_STOPPED flag is tested in such a way that the
>   compiler is told to reread the hctx flags (READ_ONCE()) and if the
>   compiler and CPU are told not to reorder test_bit() with the
>   memory accesses in (s)rcu_read_lock(). To avoid races
>   BLK_MQ_S_STOPPED will have to be tested a second time after the lock
>   has been obtained, similar to the double-checked-locking pattern.
> * srcu_read_lock() reads a word from the srcu structure, disables
>   preemption, calls __srcu_read_lock() and re-enables preemption. The
>   latter function increments two CPU-local variables and triggers a
>   memory barrier (smp_mb()).

We can use the srcu read lock for BLOCKING and the rcu read lock for
non-BLOCKING hardware queues by putting *_read_lock() and *_read_unlock()
into two wrappers, which should minimize the cost of the srcu read lock
and unlock while keeping the code easy to read and verify.

>
> Swapping srcu_read_lock() and the BLK_MQ_S_STOPPED flag test will make the
> code more complicated. Going back to the implementation that calls
> rcu_read_lock() if .queue_rq() won't sleep will result in an implementation
> that is easier to read and to verify.

Yeah, I agree.

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()
@ 2016-10-05 22:49                     ` Ming Lei
  0 siblings, 0 replies; 75+ messages in thread
From: Ming Lei @ 2016-10-05 22:49 UTC (permalink / raw)


On Thu, Oct 6, 2016 at 5:08 AM, Bart Van Assche
<bart.vanassche@sandisk.com> wrote:
> On 10/05/2016 12:11 PM, Sagi Grimberg wrote:
>>
>> I was referring to whether we can take srcu in the submission path
>> conditional on the hctx being STOPPED?
>
>
> Hello Sagi,
>
> Regarding run-time overhead:
> * rcu_read_lock() is a no-op on CONFIG_PREEMPT_NONE kernels and is
>   translated into preempt_disable() with preemption enabled. The latter
>   function modifies a per-cpu variable.
> * Checking BLK_MQ_S_STOPPED before taking an rcu or srcu lock is only
>   safe if the BLK_MQ_S_STOPPED flag is tested in such a way that the
>   compiler is told to reread the hctx flags (READ_ONCE()) and if the
>   compiler and CPU are told not to reorder test_bit() with the
>   memory accesses in (s)rcu_read_lock(). To avoid races
>   BLK_MQ_S_STOPPED will have to be tested a second time after the lock
>   has been obtained, similar to the double-checked-locking pattern.
> * srcu_read_lock() reads a word from the srcu structure, disables
>   preemption, calls __srcu_read_lock() and re-enables preemption. The
>   latter function increments two CPU-local variables and triggers a
>   memory barrier (smp_mb()).

We can use the srcu read lock for BLOCKING and the rcu read lock for
non-BLOCKING hardware queues by putting *_read_lock() and *_read_unlock()
into two wrappers, which should minimize the cost of the srcu read lock
and unlock while keeping the code easy to read and verify.

>
> Swapping srcu_read_lock() and the BLK_MQ_S_STOPPED flag test will make the
> code more complicated. Going back to the implementation that calls
> rcu_read_lock() if .queue_rq() won't sleep will result in an implementation
> that is easier to read and to verify.

Yeah, I agree.

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()
  2016-10-05 22:49                     ` Ming Lei
@ 2016-10-05 23:00                       ` Bart Van Assche
  -1 siblings, 0 replies; 75+ messages in thread
From: Bart Van Assche @ 2016-10-05 23:00 UTC (permalink / raw)
  To: Ming Lei
  Cc: Sagi Grimberg, Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	linux-block, linux-scsi, linux-rdma, linux-nvme

On 10/05/2016 03:49 PM, Ming Lei wrote:
> We can use srcu read lock for BLOCKING and rcu read lock for non-BLOCKING,
> by putting *_read_lock() and *_read_unlock() into two wrappers, which
> should minimize the cost of srcu read lock & unlock and the code is still easy
> to read & verify.

Hello Ming,

The lock checking algorithms in the sparse and smatch static checkers 
are unable to deal with code of the type "if (condition) (un)lock()". So 
unless someone has a better proposal my preference is to use the 
approach from the patch at the start of this e-mail thread.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue()
@ 2016-10-05 23:00                       ` Bart Van Assche
  0 siblings, 0 replies; 75+ messages in thread
From: Bart Van Assche @ 2016-10-05 23:00 UTC (permalink / raw)


On 10/05/2016 03:49 PM, Ming Lei wrote:
> We can use srcu read lock for BLOCKING and rcu read lock for non-BLOCKING,
> by putting *_read_lock() and *_read_unlock() into two wrappers, which
> should minimize the cost of srcu read lock & unlock and the code is still easy
> to read & verify.

Hello Ming,

The lock checking algorithms in the sparse and smatch static checkers 
are unable to deal with code of the type "if (condition) (un)lock()". So 
unless someone has a better proposal my preference is to use the 
approach from the patch at the start of this e-mail thread.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 1/7] blk-mq: Introduce blk_mq_queue_stopped()
  2016-09-28 23:57   ` Bart Van Assche
@ 2016-10-11 16:40     ` Christoph Hellwig
  -1 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2016-10-11 16:40 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	linux-block, linux-scsi, linux-rdma, linux-nvme

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 1/7] blk-mq: Introduce blk_mq_queue_stopped()
@ 2016-10-11 16:40     ` Christoph Hellwig
  0 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2016-10-11 16:40 UTC (permalink / raw)


Looks fine,

Reviewed-by: Christoph Hellwig <hch at lst.de>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 6/7] SRP transport: Port srp_wait_for_queuecommand() to scsi-mq
  2016-10-05 21:51       ` Bart Van Assche
  (?)
@ 2016-10-11 16:44         ` Christoph Hellwig
  -1 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2016-10-11 16:44 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Keith Busch, James Bottomley, Sagi Grimberg, Mike Snitzer,
	linux-rdma, linux-nvme, Jens Axboe, Doug Ledford, linux-block,
	Martin K. Petersen, linux-scsi, Christoph Hellwig

On Wed, Oct 05, 2016 at 02:51:50PM -0700, Bart Van Assche wrote:
> There are multiple direct blk_*() calls in other SCSI transport drivers. So
> my proposal is to postpone moving this code into scsi_lib.c until there is
> a second user of this code.

I still don't think these low-level differences between blk-mq and the
legacy request path belong in a SCSI LLDD.  So I concur with Sagi that
this should go into the core SCSI code.

In fact I suspect we should just call it directly from
scsi_internal_device_block(), and maybe even from
scsi_internal_device_unblock() for the case of setting the device offline.
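
A rough sketch of that idea (illustration only, not a patch), with
q = sdev->request_queue inside scsi_internal_device_block():

	if (q->mq_ops) {
		/* Wait until ongoing .queue_rq() calls have finished. */
		blk_mq_quiesce_queue(q);
		/* The SDEV_BLOCK state keeps new commands from being
		 * dispatched, so the queue can be resumed right away. */
		blk_mq_resume_queue(q);
	}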


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 6/7] SRP transport: Port srp_wait_for_queuecommand() to scsi-mq
@ 2016-10-11 16:44         ` Christoph Hellwig
  0 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2016-10-11 16:44 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Sagi Grimberg, Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	linux-block, linux-scsi, linux-rdma, linux-nvme

On Wed, Oct 05, 2016 at 02:51:50PM -0700, Bart Van Assche wrote:
> There are multiple direct blk_*() calls in other SCSI transport drivers. So
> my proposal is to postpone moving this code into scsi_lib.c until there is
> a second user of this code.

I still don't think these low-level differences between blk-mq and the
legacy request path belong in a SCSI LLDD.  So I concur with Sagi that
this should go into the core SCSI code.

In fact I suspect we should just call it directly from
scsi_internal_device_block(), and maybe even from
scsi_internal_device_unblock() for the case of setting the device offline.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 6/7] SRP transport: Port srp_wait_for_queuecommand() to scsi-mq
@ 2016-10-11 16:44         ` Christoph Hellwig
  0 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2016-10-11 16:44 UTC (permalink / raw)


On Wed, Oct 05, 2016@02:51:50PM -0700, Bart Van Assche wrote:
> There are multiple direct blk_*() calls in other SCSI transport drivers. So
> my proposal is to postpone moving this code into scsi_lib.c until there is
> a second user of this code.

I still don't think these low-level differences between blk-mq and the
legacy request path belong in a SCSI LLDD.  So I concur with Sagi that
this should go into the core SCSI code.

In fact I suspect we should just call it directly from
scsi_internal_device_block(), and maybe even from
scsi_internal_device_unblock() for the case of setting the device offline.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 7/7] [RFC] nvme: Fix a race condition
  2016-09-29  0:01   ` Bart Van Assche
@ 2016-10-11 16:46     ` Christoph Hellwig
  -1 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2016-10-11 16:46 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	linux-block, linux-scsi, linux-rdma, linux-nvme

On Wed, Sep 28, 2016 at 05:01:45PM -0700, Bart Van Assche wrote:
> Avoid that nvme_queue_rq() is still running when nvme_stop_queues()
> returns. Untested.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> ---
>  drivers/nvme/host/core.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index d791fba..98f1f29 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -201,13 +201,9 @@ fail:
>  
>  void nvme_requeue_req(struct request *req)
>  {
> -	unsigned long flags;
> -
>  	blk_mq_requeue_request(req);
> -	spin_lock_irqsave(req->q->queue_lock, flags);
> -	if (!blk_mq_queue_stopped(req->q))
> -		blk_mq_kick_requeue_list(req->q);
> -	spin_unlock_irqrestore(req->q->queue_lock, flags);
> +	WARN_ON_ONCE(blk_mq_queue_stopped(req->q));
> +	blk_mq_kick_requeue_list(req->q);
>  }
>  EXPORT_SYMBOL_GPL(nvme_requeue_req);

Can we just add a 'bool kick' argument to blk_mq_requeue_request and
move all this handling to the core?
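
For illustration, the suggested interface could look roughly like this
(hypothetical signature, not an actual patch):

/* blk-mq core: let the caller decide whether to kick the requeue list. */
void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list);

/* The nvme caller then shrinks to: */
void nvme_requeue_req(struct request *req)
{
	/* The queue is not expected to be stopped here (see the
	 * WARN_ON_ONCE() above), so always kick the requeue list. */
	blk_mq_requeue_request(req, true);
}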

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 7/7] [RFC] nvme: Fix a race condition
@ 2016-10-11 16:46     ` Christoph Hellwig
  0 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2016-10-11 16:46 UTC (permalink / raw)


On Wed, Sep 28, 2016@05:01:45PM -0700, Bart Van Assche wrote:
> Avoid that nvme_queue_rq() is still running when nvme_stop_queues()
> returns. Untested.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche at sandisk.com>
> Cc: Keith Busch <keith.busch at intel.com>
> Cc: Christoph Hellwig <hch at lst.de>
> Cc: Sagi Grimberg <sagi at grimberg.me>
> ---
>  drivers/nvme/host/core.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index d791fba..98f1f29 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -201,13 +201,9 @@ fail:
>  
>  void nvme_requeue_req(struct request *req)
>  {
> -	unsigned long flags;
> -
>  	blk_mq_requeue_request(req);
> -	spin_lock_irqsave(req->q->queue_lock, flags);
> -	if (!blk_mq_queue_stopped(req->q))
> -		blk_mq_kick_requeue_list(req->q);
> -	spin_unlock_irqrestore(req->q->queue_lock, flags);
> +	WARN_ON_ONCE(blk_mq_queue_stopped(req->q));
> +	blk_mq_kick_requeue_list(req->q);
>  }
>  EXPORT_SYMBOL_GPL(nvme_requeue_req);

Can we just add a 'bool kick' argument to blk_mq_requeue_request and
move all this handling to the core?

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 7/7] [RFC] nvme: Fix a race condition
  2016-10-11 16:46     ` Christoph Hellwig
@ 2016-10-12  0:41       ` Bart Van Assche
  -1 siblings, 0 replies; 75+ messages in thread
From: Bart Van Assche @ 2016-10-12  0:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, James Bottomley, Martin K. Petersen, Mike Snitzer,
	Doug Ledford, Keith Busch, linux-block, linux-scsi, linux-rdma,
	linux-nvme

On 10/11/16 09:46, Christoph Hellwig wrote:
> On Wed, Sep 28, 2016 at 05:01:45PM -0700, Bart Van Assche wrote:
>> Avoid that nvme_queue_rq() is still running when nvme_stop_queues()
>> returns. Untested.
>>
>> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
>> Cc: Keith Busch <keith.busch@intel.com>
>> Cc: Christoph Hellwig <hch@lst.de>
>> Cc: Sagi Grimberg <sagi@grimberg.me>
>> ---
>>  drivers/nvme/host/core.c | 20 ++++++++++++--------
>>  1 file changed, 12 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index d791fba..98f1f29 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -201,13 +201,9 @@ fail:
>>
>>  void nvme_requeue_req(struct request *req)
>>  {
>> -	unsigned long flags;
>> -
>>  	blk_mq_requeue_request(req);
>> -	spin_lock_irqsave(req->q->queue_lock, flags);
>> -	if (!blk_mq_queue_stopped(req->q))
>> -		blk_mq_kick_requeue_list(req->q);
>> -	spin_unlock_irqrestore(req->q->queue_lock, flags);
>> +	WARN_ON_ONCE(blk_mq_queue_stopped(req->q));
>> +	blk_mq_kick_requeue_list(req->q);
>>  }
>>  EXPORT_SYMBOL_GPL(nvme_requeue_req);
>
> Can we just add a 'bool kick' argument to blk_mq_requeue_request and
> move all this handling to the core?

Hello Christoph,

That sounds like a good idea to me. Thanks also for the other review 
comments you posted on this patch series. I will rework patch 6/7 such 
that the code for waiting is moved into the SCSI core.

Bart.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 7/7] [RFC] nvme: Fix a race condition
@ 2016-10-12  0:41       ` Bart Van Assche
  0 siblings, 0 replies; 75+ messages in thread
From: Bart Van Assche @ 2016-10-12  0:41 UTC (permalink / raw)


On 10/11/16 09:46, Christoph Hellwig wrote:
> On Wed, Sep 28, 2016@05:01:45PM -0700, Bart Van Assche wrote:
>> Avoid that nvme_queue_rq() is still running when nvme_stop_queues()
>> returns. Untested.
>>
>> Signed-off-by: Bart Van Assche <bart.vanassche at sandisk.com>
>> Cc: Keith Busch <keith.busch at intel.com>
>> Cc: Christoph Hellwig <hch at lst.de>
>> Cc: Sagi Grimberg <sagi at grimberg.me>
>> ---
>>  drivers/nvme/host/core.c | 20 ++++++++++++--------
>>  1 file changed, 12 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index d791fba..98f1f29 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -201,13 +201,9 @@ fail:
>>
>>  void nvme_requeue_req(struct request *req)
>>  {
>> -	unsigned long flags;
>> -
>>  	blk_mq_requeue_request(req);
>> -	spin_lock_irqsave(req->q->queue_lock, flags);
>> -	if (!blk_mq_queue_stopped(req->q))
>> -		blk_mq_kick_requeue_list(req->q);
>> -	spin_unlock_irqrestore(req->q->queue_lock, flags);
>> +	WARN_ON_ONCE(blk_mq_queue_stopped(req->q));
>> +	blk_mq_kick_requeue_list(req->q);
>>  }
>>  EXPORT_SYMBOL_GPL(nvme_requeue_req);
>
> Can we just add a 'bool kick' argument to blk_mq_requeue_request and
> move all this handling to the core?

Hello Christoph,

That sounds like a good idea to me. Thanks also for the other review 
comments you posted on this patch series. I will rework patch 6/7 such 
that the code for waiting is moved into the SCSI core.

Bart.

^ permalink raw reply	[flat|nested] 75+ messages in thread

end of thread, other threads:[~2016-10-12  0:41 UTC | newest]

Thread overview: 75+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-28 23:57 [PATCH v2 0/7] Introduce blk_quiesce_queue() and blk_resume_queue() Bart Van Assche
2016-09-28 23:57 ` Bart Van Assche
2016-09-28 23:57 ` [PATCH v2 1/7] blk-mq: Introduce blk_mq_queue_stopped() Bart Van Assche
2016-09-28 23:57   ` Bart Van Assche
2016-10-05 17:50   ` Sagi Grimberg
2016-10-05 17:50     ` Sagi Grimberg
2016-10-11 16:40   ` Christoph Hellwig
2016-10-11 16:40     ` Christoph Hellwig
2016-09-28 23:59 ` [PATCH v2 4/7] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue() Bart Van Assche
2016-09-28 23:59   ` Bart Van Assche
2016-09-29  5:52   ` Hannes Reinecke
2016-09-29  5:52     ` Hannes Reinecke
2016-09-29  5:52     ` Hannes Reinecke
2016-09-29 21:51   ` Ming Lei
2016-09-29 21:51     ` Ming Lei
2016-09-29 21:51     ` Ming Lei
2016-09-30 15:55     ` Bart Van Assche
2016-09-30 15:55       ` Bart Van Assche
2016-10-01 22:56       ` Ming Lei
2016-10-01 22:56         ` Ming Lei
2016-10-01 22:56         ` Ming Lei
2016-10-05  4:16         ` Bart Van Assche
2016-10-05  4:16           ` Bart Van Assche
2016-10-05  4:32           ` Ming Lei
2016-10-05  4:32             ` Ming Lei
2016-10-05  4:32             ` Ming Lei
2016-10-05 14:46             ` Bart Van Assche
2016-10-05 14:46               ` Bart Van Assche
2016-10-05 16:11               ` Ming Lei
2016-10-05 16:11                 ` Ming Lei
2016-10-05 18:14           ` Sagi Grimberg
2016-10-05 18:14             ` Sagi Grimberg
2016-10-05 18:14             ` Sagi Grimberg
2016-10-05 19:05             ` Bart Van Assche
2016-10-05 19:05               ` Bart Van Assche
2016-10-05 19:10               ` Sagi Grimberg
2016-10-05 19:10                 ` Sagi Grimberg
2016-10-05 21:08                 ` Bart Van Assche
2016-10-05 21:08                   ` Bart Van Assche
2016-10-05 21:08                   ` Bart Van Assche
2016-10-05 22:49                   ` Ming Lei
2016-10-05 22:49                     ` Ming Lei
2016-10-05 22:49                     ` Ming Lei
2016-10-05 23:00                     ` Bart Van Assche
2016-10-05 23:00                       ` Bart Van Assche
2016-09-29  0:00 ` [PATCH v2 5/7] dm: Fix a race condition related to stopping and starting queues Bart Van Assche
2016-09-29  0:00   ` Bart Van Assche
2016-09-29  0:00   ` Bart Van Assche
2016-09-29  0:01 ` [PATCH v2 6/7] SRP transport: Port srp_wait_for_queuecommand() to scsi-mq Bart Van Assche
2016-09-29  0:01   ` Bart Van Assche
2016-09-29  5:54   ` Hannes Reinecke
2016-09-29  5:54     ` Hannes Reinecke
2016-10-05 17:38   ` Sagi Grimberg
2016-10-05 17:38     ` Sagi Grimberg
2016-10-05 17:38     ` Sagi Grimberg
2016-10-05 21:51     ` Bart Van Assche
2016-10-05 21:51       ` Bart Van Assche
2016-10-11 16:44       ` Christoph Hellwig
2016-10-11 16:44         ` Christoph Hellwig
2016-10-11 16:44         ` Christoph Hellwig
2016-09-29  0:01 ` [PATCH v2 7/7] [RFC] nvme: Fix a race condition Bart Van Assche
2016-09-29  0:01   ` Bart Van Assche
2016-10-05 17:40   ` Sagi Grimberg
2016-10-05 17:40     ` Sagi Grimberg
2016-10-11 16:46   ` Christoph Hellwig
2016-10-11 16:46     ` Christoph Hellwig
2016-10-12  0:41     ` Bart Van Assche
2016-10-12  0:41       ` Bart Van Assche
2016-09-29  0:02 ` [PATCH v2 2/7] dm: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code Bart Van Assche
2016-09-29  0:02   ` Bart Van Assche
2016-09-29  0:02 ` [PATCH v2 3/7] [RFC] nvme: " Bart Van Assche
2016-09-29  0:02   ` Bart Van Assche
2016-10-05 17:43   ` Sagi Grimberg
2016-10-05 17:43     ` Sagi Grimberg
2016-10-05 17:43     ` Sagi Grimberg
