linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 0/10] block: per-distpatch_queue flush machinery
@ 2014-09-15 13:11 Ming Lei
  2014-09-15 13:11 ` [PATCH v4 01/10] blk-mq: allocate flush_rq in blk_mq_init_flush() Ming Lei
                   ` (9 more replies)
  0 siblings, 10 replies; 24+ messages in thread
From: Ming Lei @ 2014-09-15 13:11 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: Christoph Hellwig

Hi,

Following recent discussion, and in particular Christoph's suggestion, this
patchset implements per-dispatch_queue flush machinery, so that:

	- the current init_request and exit_request callbacks can
	cover the flush request too, so the buggy copying way of
	initializing the flush request's pdu can be fixed (see the
	sketch after this list)

	- flush performance is improved in the multi hw-queue case
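
For illustration only, here is a minimal sketch of a driver-side
init_request callback once this series is applied. The driver data, pdu
type and fields are hypothetical; only the callback signature and the
fact that the per-hctx flush request is now passed through it come from
this series:

	static int my_init_request(void *data, struct request *rq,
				   unsigned int hctx_idx, unsigned int rq_idx,
				   unsigned int numa_node)
	{
		struct my_dev *mdev = data;		/* hypothetical driver data */
		struct my_cmd *cmd = blk_mq_rq_to_pdu(rq);	/* hypothetical pdu */

		/*
		 * With this series the per-hctx flush request is passed
		 * through here as well (with a request index past
		 * queue_depth), so its pdu gets the same one-time setup
		 * as any other request instead of being copied from the
		 * original request at flush time.
		 */
		cmd->dev = mdev;
		sg_init_table(cmd->sg, MY_MAX_SEGS);	/* MY_MAX_SEGS is illustrative */
		return 0;
	}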

About a 70% throughput improvement is observed with sync writes over
multi dispatch-queue virtio-blk; see the commit log of patch 10/10 for
details.

This patchset can be pulled from below tree too:

	git://kernel.ubuntu.com/ming/linux.git v3.17-block-dev-flush_v4

V4:
	- remove pdu copy from original request to flush request
	- don't call blk_free_flush_queue for !q->mq_ops

V3:
	- don't return failure code from blk_alloc_flush_queue() to
	avoid freeing invalid buffer in case of allocation failure
	- remove blk_init_flush() and blk_exit_flush()
	- remove unnecessary WARN_ON() from blk_alloc_flush_queue()
V2:
	- refactor blk_mq_init_hw_queues() and its pair; this also fixes
	the failure path and makes the conversion to per-queue flush simpler
	- allocate/initialize flush queue in blk_mq_init_hw_queues()
	- add sync write tests on virtio-blk backed by an SSD image

V1:
	- commit log typo fix
	- introduce blk_alloc_flush_queue() and its pair earlier, so
	that patches 5 and 8 become easier to review

 block/blk-core.c       |   12 ++--
 block/blk-flush.c      |  138 +++++++++++++++++++++++++------------
 block/blk-mq.c         |  180 +++++++++++++++++++++++++++---------------------
 block/blk-mq.h         |    1 -
 block/blk-sysfs.c      |    4 +-
 block/blk.h            |   35 +++++++++-
 include/linux/blk-mq.h |    2 +
 include/linux/blkdev.h |   10 +--
 8 files changed, 238 insertions(+), 144 deletions(-)

Thanks,
--
Ming Lei



* [PATCH v4 01/10] blk-mq: allocate flush_rq in blk_mq_init_flush()
  2014-09-15 13:11 [PATCH v4 0/10] block: per-distpatch_queue flush machinery Ming Lei
@ 2014-09-15 13:11 ` Ming Lei
  2014-09-15 13:11 ` [PATCH v4 02/10] block: introduce blk_init_flush and its pair Ming Lei
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Ming Lei @ 2014-09-15 13:11 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: Christoph Hellwig, Ming Lei

It is reasonable to allocate the flush request in blk_mq_init_flush().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/blk-flush.c |   11 ++++++++++-
 block/blk-mq.c    |   16 ++++++----------
 block/blk-mq.h    |    2 +-
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 3cb5e9e..75ca6cd0 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -474,7 +474,16 @@ int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
 }
 EXPORT_SYMBOL(blkdev_issue_flush);
 
-void blk_mq_init_flush(struct request_queue *q)
+int blk_mq_init_flush(struct request_queue *q)
 {
+	struct blk_mq_tag_set *set = q->tag_set;
+
 	spin_lock_init(&q->mq_flush_lock);
+
+	q->flush_rq = kzalloc(round_up(sizeof(struct request) +
+				set->cmd_size, cache_line_size()),
+				GFP_KERNEL);
+	if (!q->flush_rq)
+		return -ENOMEM;
+	return 0;
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 854342e..3b79ee7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1831,17 +1831,10 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 	if (set->ops->complete)
 		blk_queue_softirq_done(q, set->ops->complete);
 
-	blk_mq_init_flush(q);
 	blk_mq_init_cpu_queues(q, set->nr_hw_queues);
 
-	q->flush_rq = kzalloc(round_up(sizeof(struct request) +
-				set->cmd_size, cache_line_size()),
-				GFP_KERNEL);
-	if (!q->flush_rq)
-		goto err_hw;
-
 	if (blk_mq_init_hw_queues(q, set))
-		goto err_flush_rq;
+		goto err_hw;
 
 	mutex_lock(&all_q_mutex);
 	list_add_tail(&q->all_q_node, &all_q_list);
@@ -1849,12 +1842,15 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 
 	blk_mq_add_queue_tag_set(set, q);
 
+	if (blk_mq_init_flush(q))
+		goto err_hw_queues;
+
 	blk_mq_map_swqueue(q);
 
 	return q;
 
-err_flush_rq:
-	kfree(q->flush_rq);
+err_hw_queues:
+	blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
 err_hw:
 	blk_cleanup_queue(q);
 err_hctxs:
diff --git a/block/blk-mq.h b/block/blk-mq.h
index ca4964a..b0bd9bc 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -27,7 +27,7 @@ struct blk_mq_ctx {
 
 void __blk_mq_complete_request(struct request *rq);
 void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
-void blk_mq_init_flush(struct request_queue *q);
+int blk_mq_init_flush(struct request_queue *q);
 void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_free_queue(struct request_queue *q);
 void blk_mq_clone_flush_request(struct request *flush_rq,
-- 
1.7.9.5



* [PATCH v4 02/10] block: introduce blk_init_flush and its pair
  2014-09-15 13:11 [PATCH v4 0/10] block: per-distpatch_queue flush machinery Ming Lei
  2014-09-15 13:11 ` [PATCH v4 01/10] blk-mq: allocate flush_rq in blk_mq_init_flush() Ming Lei
@ 2014-09-15 13:11 ` Ming Lei
  2014-09-24 10:18   ` Christoph Hellwig
  2014-09-15 13:11 ` [PATCH v4 03/10] block: move flush initialization to blk_flush_init Ming Lei
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2014-09-15 13:11 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: Christoph Hellwig, Ming Lei

These two temporary functions are introduced to hold flush
initialization and de-initialization, so that the 'flush queue' can be
introduced more easily in the following patches. Once the 'flush queue'
and its allocation/free functions are ready, these helpers will be
removed for the sake of code readability.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/blk-core.c  |    5 ++---
 block/blk-flush.c |   19 ++++++++++++++++++-
 block/blk-mq.c    |    2 +-
 block/blk-mq.h    |    1 -
 block/blk-sysfs.c |    4 ++--
 block/blk.h       |    3 +++
 6 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 6946a42..0a9d172 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -705,8 +705,7 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
 	if (!q)
 		return NULL;
 
-	q->flush_rq = kzalloc(sizeof(struct request), GFP_KERNEL);
-	if (!q->flush_rq)
+	if (blk_init_flush(q))
 		return NULL;
 
 	if (blk_init_rl(&q->root_rl, q, GFP_KERNEL))
@@ -742,7 +741,7 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
 	return q;
 
 fail:
-	kfree(q->flush_rq);
+	blk_exit_flush(q);
 	return NULL;
 }
 EXPORT_SYMBOL(blk_init_allocated_queue);
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 75ca6cd0..6932ee8 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -474,7 +474,7 @@ int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
 }
 EXPORT_SYMBOL(blkdev_issue_flush);
 
-int blk_mq_init_flush(struct request_queue *q)
+static int blk_mq_init_flush(struct request_queue *q)
 {
 	struct blk_mq_tag_set *set = q->tag_set;
 
@@ -487,3 +487,20 @@ int blk_mq_init_flush(struct request_queue *q)
 		return -ENOMEM;
 	return 0;
 }
+
+int blk_init_flush(struct request_queue *q)
+{
+	if (q->mq_ops)
+		return blk_mq_init_flush(q);
+
+	q->flush_rq = kzalloc(sizeof(struct request), GFP_KERNEL);
+	if (!q->flush_rq)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void blk_exit_flush(struct request_queue *q)
+{
+	kfree(q->flush_rq);
+}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3b79ee7..467b1d8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1842,7 +1842,7 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 
 	blk_mq_add_queue_tag_set(set, q);
 
-	if (blk_mq_init_flush(q))
+	if (blk_init_flush(q))
 		goto err_hw_queues;
 
 	blk_mq_map_swqueue(q);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index b0bd9bc..a39cfa9 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -27,7 +27,6 @@ struct blk_mq_ctx {
 
 void __blk_mq_complete_request(struct request *rq);
 void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
-int blk_mq_init_flush(struct request_queue *q);
 void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_free_queue(struct request_queue *q);
 void blk_mq_clone_flush_request(struct request *flush_rq,
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 4db5abf..28d3a11 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -517,11 +517,11 @@ static void blk_release_queue(struct kobject *kobj)
 	if (q->queue_tags)
 		__blk_queue_free_tags(q);
 
+	blk_exit_flush(q);
+
 	if (q->mq_ops)
 		blk_mq_free_queue(q);
 
-	kfree(q->flush_rq);
-
 	blk_trace_shutdown(q);
 
 	bdi_destroy(&q->backing_dev_info);
diff --git a/block/blk.h b/block/blk.h
index 6748c4f..261f734 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -22,6 +22,9 @@ static inline void __blk_get_queue(struct request_queue *q)
 	kobject_get(&q->kobj);
 }
 
+int blk_init_flush(struct request_queue *q);
+void blk_exit_flush(struct request_queue *q);
+
 int blk_init_rl(struct request_list *rl, struct request_queue *q,
 		gfp_t gfp_mask);
 void blk_exit_rl(struct request_list *rl);
-- 
1.7.9.5



* [PATCH v4 03/10] block: move flush initialization to blk_flush_init
  2014-09-15 13:11 [PATCH v4 0/10] block: per-distpatch_queue flush machinery Ming Lei
  2014-09-15 13:11 ` [PATCH v4 01/10] blk-mq: allocate flush_rq in blk_mq_init_flush() Ming Lei
  2014-09-15 13:11 ` [PATCH v4 02/10] block: introduce blk_init_flush and its pair Ming Lei
@ 2014-09-15 13:11 ` Ming Lei
  2014-09-15 13:11 ` [PATCH v4 04/10] block: avoid to use q->flush_rq directly Ming Lei
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Ming Lei @ 2014-09-15 13:11 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: Christoph Hellwig, Ming Lei

These fields are always used with the flush request, so
initialize them together.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/blk-core.c  |    3 ---
 block/blk-flush.c |    4 ++++
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 0a9d172..222fe84 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -600,9 +600,6 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 #ifdef CONFIG_BLK_CGROUP
 	INIT_LIST_HEAD(&q->blkg_list);
 #endif
-	INIT_LIST_HEAD(&q->flush_queue[0]);
-	INIT_LIST_HEAD(&q->flush_queue[1]);
-	INIT_LIST_HEAD(&q->flush_data_in_flight);
 	INIT_DELAYED_WORK(&q->delay_work, blk_delay_work);
 
 	kobject_init(&q->kobj, &blk_queue_ktype);
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 6932ee8..a5b2a00 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -490,6 +490,10 @@ static int blk_mq_init_flush(struct request_queue *q)
 
 int blk_init_flush(struct request_queue *q)
 {
+	INIT_LIST_HEAD(&q->flush_queue[0]);
+	INIT_LIST_HEAD(&q->flush_queue[1]);
+	INIT_LIST_HEAD(&q->flush_data_in_flight);
+
 	if (q->mq_ops)
 		return blk_mq_init_flush(q);
 
-- 
1.7.9.5



* [PATCH v4 04/10] block: avoid to use q->flush_rq directly
  2014-09-15 13:11 [PATCH v4 0/10] block: per-distpatch_queue flush machinery Ming Lei
                   ` (2 preceding siblings ...)
  2014-09-15 13:11 ` [PATCH v4 03/10] block: move flush initialization to blk_flush_init Ming Lei
@ 2014-09-15 13:11 ` Ming Lei
  2014-09-15 13:11 ` [PATCH v4 05/10] block: introduce blk_flush_queue to drive flush machinery Ming Lei
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Ming Lei @ 2014-09-15 13:11 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: Christoph Hellwig, Ming Lei

This patch uses a local variable to access the flush request, so that
the conversion to per-queue flush machinery becomes a bit easier.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/blk-flush.c |   17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index a5b2a00..a59dd1a 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -225,7 +225,7 @@ static void flush_end_io(struct request *flush_rq, int error)
 
 	if (q->mq_ops) {
 		spin_lock_irqsave(&q->mq_flush_lock, flags);
-		q->flush_rq->tag = -1;
+		flush_rq->tag = -1;
 	}
 
 	running = &q->flush_queue[q->flush_running_idx];
@@ -283,6 +283,7 @@ static bool blk_kick_flush(struct request_queue *q)
 	struct list_head *pending = &q->flush_queue[q->flush_pending_idx];
 	struct request *first_rq =
 		list_first_entry(pending, struct request, flush.list);
+	struct request *flush_rq = q->flush_rq;
 
 	/* C1 described at the top of this file */
 	if (q->flush_pending_idx != q->flush_running_idx || list_empty(pending))
@@ -300,16 +301,16 @@ static bool blk_kick_flush(struct request_queue *q)
 	 */
 	q->flush_pending_idx ^= 1;
 
-	blk_rq_init(q, q->flush_rq);
+	blk_rq_init(q, flush_rq);
 	if (q->mq_ops)
-		blk_mq_clone_flush_request(q->flush_rq, first_rq);
+		blk_mq_clone_flush_request(flush_rq, first_rq);
 
-	q->flush_rq->cmd_type = REQ_TYPE_FS;
-	q->flush_rq->cmd_flags = WRITE_FLUSH | REQ_FLUSH_SEQ;
-	q->flush_rq->rq_disk = first_rq->rq_disk;
-	q->flush_rq->end_io = flush_end_io;
+	flush_rq->cmd_type = REQ_TYPE_FS;
+	flush_rq->cmd_flags = WRITE_FLUSH | REQ_FLUSH_SEQ;
+	flush_rq->rq_disk = first_rq->rq_disk;
+	flush_rq->end_io = flush_end_io;
 
-	return blk_flush_queue_rq(q->flush_rq, false);
+	return blk_flush_queue_rq(flush_rq, false);
 }
 
 static void flush_data_end_io(struct request *rq, int error)
-- 
1.7.9.5



* [PATCH v4 05/10] block: introduce blk_flush_queue to drive flush machinery
  2014-09-15 13:11 [PATCH v4 0/10] block: per-distpatch_queue flush machinery Ming Lei
                   ` (3 preceding siblings ...)
  2014-09-15 13:11 ` [PATCH v4 04/10] block: avoid to use q->flush_rq directly Ming Lei
@ 2014-09-15 13:11 ` Ming Lei
  2014-09-15 13:11 ` [PATCH v4 06/10] block: remove blk_init_flush() and its pair Ming Lei
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Ming Lei @ 2014-09-15 13:11 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: Christoph Hellwig, Ming Lei

This patch introduces 'struct blk_flush_queue' and puts all
flush-machinery-related fields into this structure, so that

	- flush implementation details aren't exposed to drivers
	- it becomes easy to convert to per dispatch-queue flush machinery

This patch is basically a mechanical replacement.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/blk-core.c       |    3 +-
 block/blk-flush.c      |  107 +++++++++++++++++++++++++++++-------------------
 block/blk-mq.c         |   10 +++--
 block/blk.h            |   22 +++++++++-
 include/linux/blkdev.h |   10 +----
 5 files changed, 95 insertions(+), 57 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 222fe84..d278a30 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -390,11 +390,12 @@ static void __blk_drain_queue(struct request_queue *q, bool drain_all)
 		 * be drained.  Check all the queues and counters.
 		 */
 		if (drain_all) {
+			struct blk_flush_queue *fq = blk_get_flush_queue(q);
 			drain |= !list_empty(&q->queue_head);
 			for (i = 0; i < 2; i++) {
 				drain |= q->nr_rqs[i];
 				drain |= q->in_flight[i];
-				drain |= !list_empty(&q->flush_queue[i]);
+				drain |= !list_empty(&fq->flush_queue[i]);
 			}
 		}
 
diff --git a/block/blk-flush.c b/block/blk-flush.c
index a59dd1a..db269d4 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -28,7 +28,7 @@
  *
  * The actual execution of flush is double buffered.  Whenever a request
  * needs to execute PRE or POSTFLUSH, it queues at
- * q->flush_queue[q->flush_pending_idx].  Once certain criteria are met, a
+ * fq->flush_queue[fq->flush_pending_idx].  Once certain criteria are met, a
  * flush is issued and the pending_idx is toggled.  When the flush
  * completes, all the requests which were pending are proceeded to the next
  * step.  This allows arbitrary merging of different types of FLUSH/FUA
@@ -157,7 +157,7 @@ static bool blk_flush_queue_rq(struct request *rq, bool add_front)
  * completion and trigger the next step.
  *
  * CONTEXT:
- * spin_lock_irq(q->queue_lock or q->mq_flush_lock)
+ * spin_lock_irq(q->queue_lock or fq->mq_flush_lock)
  *
  * RETURNS:
  * %true if requests were added to the dispatch queue, %false otherwise.
@@ -166,7 +166,8 @@ static bool blk_flush_complete_seq(struct request *rq, unsigned int seq,
 				   int error)
 {
 	struct request_queue *q = rq->q;
-	struct list_head *pending = &q->flush_queue[q->flush_pending_idx];
+	struct blk_flush_queue *fq = blk_get_flush_queue(q);
+	struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
 	bool queued = false, kicked;
 
 	BUG_ON(rq->flush.seq & seq);
@@ -182,12 +183,12 @@ static bool blk_flush_complete_seq(struct request *rq, unsigned int seq,
 	case REQ_FSEQ_POSTFLUSH:
 		/* queue for flush */
 		if (list_empty(pending))
-			q->flush_pending_since = jiffies;
+			fq->flush_pending_since = jiffies;
 		list_move_tail(&rq->flush.list, pending);
 		break;
 
 	case REQ_FSEQ_DATA:
-		list_move_tail(&rq->flush.list, &q->flush_data_in_flight);
+		list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
 		queued = blk_flush_queue_rq(rq, true);
 		break;
 
@@ -222,17 +223,18 @@ static void flush_end_io(struct request *flush_rq, int error)
 	bool queued = false;
 	struct request *rq, *n;
 	unsigned long flags = 0;
+	struct blk_flush_queue *fq = blk_get_flush_queue(q);
 
 	if (q->mq_ops) {
-		spin_lock_irqsave(&q->mq_flush_lock, flags);
+		spin_lock_irqsave(&fq->mq_flush_lock, flags);
 		flush_rq->tag = -1;
 	}
 
-	running = &q->flush_queue[q->flush_running_idx];
-	BUG_ON(q->flush_pending_idx == q->flush_running_idx);
+	running = &fq->flush_queue[fq->flush_running_idx];
+	BUG_ON(fq->flush_pending_idx == fq->flush_running_idx);
 
 	/* account completion of the flush request */
-	q->flush_running_idx ^= 1;
+	fq->flush_running_idx ^= 1;
 
 	if (!q->mq_ops)
 		elv_completed_request(q, flush_rq);
@@ -256,13 +258,13 @@ static void flush_end_io(struct request *flush_rq, int error)
 	 * directly into request_fn may confuse the driver.  Always use
 	 * kblockd.
 	 */
-	if (queued || q->flush_queue_delayed) {
+	if (queued || fq->flush_queue_delayed) {
 		WARN_ON(q->mq_ops);
 		blk_run_queue_async(q);
 	}
-	q->flush_queue_delayed = 0;
+	fq->flush_queue_delayed = 0;
 	if (q->mq_ops)
-		spin_unlock_irqrestore(&q->mq_flush_lock, flags);
+		spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
 }
 
 /**
@@ -273,33 +275,34 @@ static void flush_end_io(struct request *flush_rq, int error)
  * Please read the comment at the top of this file for more info.
  *
  * CONTEXT:
- * spin_lock_irq(q->queue_lock or q->mq_flush_lock)
+ * spin_lock_irq(q->queue_lock or fq->mq_flush_lock)
  *
  * RETURNS:
  * %true if flush was issued, %false otherwise.
  */
 static bool blk_kick_flush(struct request_queue *q)
 {
-	struct list_head *pending = &q->flush_queue[q->flush_pending_idx];
+	struct blk_flush_queue *fq = blk_get_flush_queue(q);
+	struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
 	struct request *first_rq =
 		list_first_entry(pending, struct request, flush.list);
-	struct request *flush_rq = q->flush_rq;
+	struct request *flush_rq = fq->flush_rq;
 
 	/* C1 described at the top of this file */
-	if (q->flush_pending_idx != q->flush_running_idx || list_empty(pending))
+	if (fq->flush_pending_idx != fq->flush_running_idx || list_empty(pending))
 		return false;
 
 	/* C2 and C3 */
-	if (!list_empty(&q->flush_data_in_flight) &&
+	if (!list_empty(&fq->flush_data_in_flight) &&
 	    time_before(jiffies,
-			q->flush_pending_since + FLUSH_PENDING_TIMEOUT))
+			fq->flush_pending_since + FLUSH_PENDING_TIMEOUT))
 		return false;
 
 	/*
 	 * Issue flush and toggle pending_idx.  This makes pending_idx
 	 * different from running_idx, which means flush is in flight.
 	 */
-	q->flush_pending_idx ^= 1;
+	fq->flush_pending_idx ^= 1;
 
 	blk_rq_init(q, flush_rq);
 	if (q->mq_ops)
@@ -331,6 +334,7 @@ static void mq_flush_data_end_io(struct request *rq, int error)
 	struct blk_mq_hw_ctx *hctx;
 	struct blk_mq_ctx *ctx;
 	unsigned long flags;
+	struct blk_flush_queue *fq = blk_get_flush_queue(q);
 
 	ctx = rq->mq_ctx;
 	hctx = q->mq_ops->map_queue(q, ctx->cpu);
@@ -339,10 +343,10 @@ static void mq_flush_data_end_io(struct request *rq, int error)
 	 * After populating an empty queue, kick it to avoid stall.  Read
 	 * the comment in flush_end_io().
 	 */
-	spin_lock_irqsave(&q->mq_flush_lock, flags);
+	spin_lock_irqsave(&fq->mq_flush_lock, flags);
 	if (blk_flush_complete_seq(rq, REQ_FSEQ_DATA, error))
 		blk_mq_run_hw_queue(hctx, true);
-	spin_unlock_irqrestore(&q->mq_flush_lock, flags);
+	spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
 }
 
 /**
@@ -410,11 +414,13 @@ void blk_insert_flush(struct request *rq)
 	rq->cmd_flags |= REQ_FLUSH_SEQ;
 	rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
 	if (q->mq_ops) {
+		struct blk_flush_queue *fq = blk_get_flush_queue(q);
+
 		rq->end_io = mq_flush_data_end_io;
 
-		spin_lock_irq(&q->mq_flush_lock);
+		spin_lock_irq(&fq->mq_flush_lock);
 		blk_flush_complete_seq(rq, REQ_FSEQ_ACTIONS & ~policy, 0);
-		spin_unlock_irq(&q->mq_flush_lock);
+		spin_unlock_irq(&fq->mq_flush_lock);
 		return;
 	}
 	rq->end_io = flush_data_end_io;
@@ -475,31 +481,48 @@ int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
 }
 EXPORT_SYMBOL(blkdev_issue_flush);
 
-static int blk_mq_init_flush(struct request_queue *q)
+static struct blk_flush_queue *blk_alloc_flush_queue(
+		struct request_queue *q)
 {
-	struct blk_mq_tag_set *set = q->tag_set;
+	struct blk_flush_queue *fq;
+	int rq_sz = sizeof(struct request);
 
-	spin_lock_init(&q->mq_flush_lock);
+	fq = kzalloc(sizeof(*fq), GFP_KERNEL);
+	if (!fq)
+		goto fail;
 
-	q->flush_rq = kzalloc(round_up(sizeof(struct request) +
-				set->cmd_size, cache_line_size()),
-				GFP_KERNEL);
-	if (!q->flush_rq)
-		return -ENOMEM;
-	return 0;
+	if (q->mq_ops) {
+		spin_lock_init(&fq->mq_flush_lock);
+		rq_sz = round_up(rq_sz + q->tag_set->cmd_size,
+				cache_line_size());
+	}
+
+	fq->flush_rq = kzalloc(rq_sz, GFP_KERNEL);
+	if (!fq->flush_rq)
+		goto fail_rq;
+
+	INIT_LIST_HEAD(&fq->flush_queue[0]);
+	INIT_LIST_HEAD(&fq->flush_queue[1]);
+	INIT_LIST_HEAD(&fq->flush_data_in_flight);
+
+	return fq;
+
+ fail_rq:
+	kfree(fq);
+ fail:
+	return NULL;
 }
 
-int blk_init_flush(struct request_queue *q)
+static void blk_free_flush_queue(struct blk_flush_queue *fq)
 {
-	INIT_LIST_HEAD(&q->flush_queue[0]);
-	INIT_LIST_HEAD(&q->flush_queue[1]);
-	INIT_LIST_HEAD(&q->flush_data_in_flight);
-
-	if (q->mq_ops)
-		return blk_mq_init_flush(q);
+	kfree(fq->flush_rq);
+	kfree(fq);
+}
 
-	q->flush_rq = kzalloc(sizeof(struct request), GFP_KERNEL);
-	if (!q->flush_rq)
+int blk_init_flush(struct request_queue *q)
+{
+	q->fq = blk_alloc_flush_queue(q);
+	if (!q->fq)
 		return -ENOMEM;
 
 	return 0;
@@ -507,5 +530,5 @@ int blk_init_flush(struct request_queue *q)
 
 void blk_exit_flush(struct request_queue *q)
 {
-	kfree(q->flush_rq);
+	blk_free_flush_queue(q->fq);
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 467b1d8..a819af4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -508,20 +508,22 @@ void blk_mq_kick_requeue_list(struct request_queue *q)
 }
 EXPORT_SYMBOL(blk_mq_kick_requeue_list);
 
-static inline bool is_flush_request(struct request *rq, unsigned int tag)
+static inline bool is_flush_request(struct request *rq,
+		struct blk_flush_queue *fq, unsigned int tag)
 {
 	return ((rq->cmd_flags & REQ_FLUSH_SEQ) &&
-			rq->q->flush_rq->tag == tag);
+			fq->flush_rq->tag == tag);
 }
 
 struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
 {
 	struct request *rq = tags->rqs[tag];
+	struct blk_flush_queue *fq = blk_get_flush_queue(rq->q);
 
-	if (!is_flush_request(rq, tag))
+	if (!is_flush_request(rq, fq, tag))
 		return rq;
 
-	return rq->q->flush_rq;
+	return fq->flush_rq;
 }
 EXPORT_SYMBOL(blk_mq_tag_to_rq);
 
diff --git a/block/blk.h b/block/blk.h
index 261f734..2637349 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -12,11 +12,28 @@
 /* Max future timer expiry for timeouts */
 #define BLK_MAX_TIMEOUT		(5 * HZ)
 
+struct blk_flush_queue {
+	unsigned int		flush_queue_delayed:1;
+	unsigned int		flush_pending_idx:1;
+	unsigned int		flush_running_idx:1;
+	unsigned long		flush_pending_since;
+	struct list_head	flush_queue[2];
+	struct list_head	flush_data_in_flight;
+	struct request		*flush_rq;
+	spinlock_t		mq_flush_lock;
+};
+
 extern struct kmem_cache *blk_requestq_cachep;
 extern struct kmem_cache *request_cachep;
 extern struct kobj_type blk_queue_ktype;
 extern struct ida blk_queue_ida;
 
+static inline struct blk_flush_queue *blk_get_flush_queue(
+		struct request_queue *q)
+{
+	return q->fq;
+}
+
 static inline void __blk_get_queue(struct request_queue *q)
 {
 	kobject_get(&q->kobj);
@@ -91,6 +108,7 @@ void blk_insert_flush(struct request *rq);
 static inline struct request *__elv_next_request(struct request_queue *q)
 {
 	struct request *rq;
+	struct blk_flush_queue *fq = blk_get_flush_queue(q);
 
 	while (1) {
 		if (!list_empty(&q->queue_head)) {
@@ -113,9 +131,9 @@ static inline struct request *__elv_next_request(struct request_queue *q)
 		 * should be restarted later. Please see flush_end_io() for
 		 * details.
 		 */
-		if (q->flush_pending_idx != q->flush_running_idx &&
+		if (fq->flush_pending_idx != fq->flush_running_idx &&
 				!queue_flush_queueable(q)) {
-			q->flush_queue_delayed = 1;
+			fq->flush_queue_delayed = 1;
 			return NULL;
 		}
 		if (unlikely(blk_queue_bypass(q)) ||
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e267bf0..49f3461 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -36,6 +36,7 @@ struct request;
 struct sg_io_hdr;
 struct bsg_job;
 struct blkcg_gq;
+struct blk_flush_queue;
 
 #define BLKDEV_MIN_RQ	4
 #define BLKDEV_MAX_RQ	128	/* Default maximum */
@@ -455,14 +456,7 @@ struct request_queue {
 	 */
 	unsigned int		flush_flags;
 	unsigned int		flush_not_queueable:1;
-	unsigned int		flush_queue_delayed:1;
-	unsigned int		flush_pending_idx:1;
-	unsigned int		flush_running_idx:1;
-	unsigned long		flush_pending_since;
-	struct list_head	flush_queue[2];
-	struct list_head	flush_data_in_flight;
-	struct request		*flush_rq;
-	spinlock_t		mq_flush_lock;
+	struct blk_flush_queue	*fq;
 
 	struct list_head	requeue_list;
 	spinlock_t		requeue_lock;
-- 
1.7.9.5



* [PATCH v4 06/10] block: remove blk_init_flush() and its pair
  2014-09-15 13:11 [PATCH v4 0/10] block: per-distpatch_queue flush machinery Ming Lei
                   ` (4 preceding siblings ...)
  2014-09-15 13:11 ` [PATCH v4 05/10] block: introduce blk_flush_queue to drive flush machinery Ming Lei
@ 2014-09-15 13:11 ` Ming Lei
  2014-09-24 10:20   ` Christoph Hellwig
  2014-09-15 13:11 ` [PATCH v4 07/10] block: flush: avoid to figure out flush queue unnecessarily Ming Lei
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2014-09-15 13:11 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: Christoph Hellwig, Ming Lei

The mission of the two helpers is now over, so just call
blk_alloc_flush_queue() and blk_free_flush_queue() directly.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/blk-core.c  |    5 +++--
 block/blk-flush.c |   19 ++-----------------
 block/blk-mq.c    |    3 ++-
 block/blk-sysfs.c |    2 +-
 block/blk.h       |    4 ++--
 5 files changed, 10 insertions(+), 23 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index d278a30..e55a8eb 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -703,7 +703,8 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
 	if (!q)
 		return NULL;
 
-	if (blk_init_flush(q))
+	q->fq = blk_alloc_flush_queue(q);
+	if (!q->fq)
 		return NULL;
 
 	if (blk_init_rl(&q->root_rl, q, GFP_KERNEL))
@@ -739,7 +740,7 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
 	return q;
 
 fail:
-	blk_exit_flush(q);
+	blk_free_flush_queue(q->fq);
 	return NULL;
 }
 EXPORT_SYMBOL(blk_init_allocated_queue);
diff --git a/block/blk-flush.c b/block/blk-flush.c
index db269d4..a464b18 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -481,8 +481,7 @@ int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
 }
 EXPORT_SYMBOL(blkdev_issue_flush);
 
-static struct blk_flush_queue *blk_alloc_flush_queue(
-		struct request_queue *q)
+struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q)
 {
 	struct blk_flush_queue *fq;
 	int rq_sz = sizeof(struct request);
@@ -513,22 +512,8 @@ static struct blk_flush_queue *blk_alloc_flush_queue(
 	return NULL;
 }
 
-static void blk_free_flush_queue(struct blk_flush_queue *fq)
+void blk_free_flush_queue(struct blk_flush_queue *fq)
 {
 	kfree(fq->flush_rq);
 	kfree(fq);
 }
-
-int blk_init_flush(struct request_queue *q)
-{
-	q->fq = blk_alloc_flush_queue(q);
-	if (!q->fq)
-		return -ENOMEM;
-
-	return 0;
-}
-
-void blk_exit_flush(struct request_queue *q)
-{
-	blk_free_flush_queue(q->fq);
-}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index a819af4..beea082 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1844,7 +1844,8 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 
 	blk_mq_add_queue_tag_set(set, q);
 
-	if (blk_init_flush(q))
+	q->fq = blk_alloc_flush_queue(q);
+	if (!q->fq)
 		goto err_hw_queues;
 
 	blk_mq_map_swqueue(q);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 28d3a11..571cd34 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -517,7 +517,7 @@ static void blk_release_queue(struct kobject *kobj)
 	if (q->queue_tags)
 		__blk_queue_free_tags(q);
 
-	blk_exit_flush(q);
+	blk_free_flush_queue(q->fq);
 
 	if (q->mq_ops)
 		blk_mq_free_queue(q);
diff --git a/block/blk.h b/block/blk.h
index 2637349..d514e3c 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -39,8 +39,8 @@ static inline void __blk_get_queue(struct request_queue *q)
 	kobject_get(&q->kobj);
 }
 
-int blk_init_flush(struct request_queue *q);
-void blk_exit_flush(struct request_queue *q);
+struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q);
+void blk_free_flush_queue(struct blk_flush_queue *fq);
 
 int blk_init_rl(struct request_list *rl, struct request_queue *q,
 		gfp_t gfp_mask);
-- 
1.7.9.5



* [PATCH v4 07/10] block: flush: avoid to figure out flush queue unnecessarily
  2014-09-15 13:11 [PATCH v4 0/10] block: per-distpatch_queue flush machinery Ming Lei
                   ` (5 preceding siblings ...)
  2014-09-15 13:11 ` [PATCH v4 06/10] block: remove blk_init_flush() and its pair Ming Lei
@ 2014-09-15 13:11 ` Ming Lei
  2014-09-15 13:11 ` [PATCH v4 08/10] block: introduce 'blk_mq_ctx' parameter to blk_get_flush_queue Ming Lei
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Ming Lei @ 2014-09-15 13:11 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: Christoph Hellwig, Ming Lei

Figure out the flush queue once, at the entry points where the flush
machinery is kicked off and in the request completion handlers, then
pass it through from there.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/blk-flush.c |   30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index a464b18..30110b6 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -91,7 +91,8 @@ enum {
 	FLUSH_PENDING_TIMEOUT	= 5 * HZ,
 };
 
-static bool blk_kick_flush(struct request_queue *q);
+static bool blk_kick_flush(struct request_queue *q,
+			   struct blk_flush_queue *fq);
 
 static unsigned int blk_flush_policy(unsigned int fflags, struct request *rq)
 {
@@ -150,6 +151,7 @@ static bool blk_flush_queue_rq(struct request *rq, bool add_front)
 /**
  * blk_flush_complete_seq - complete flush sequence
  * @rq: FLUSH/FUA request being sequenced
+ * @fq: flush queue
  * @seq: sequences to complete (mask of %REQ_FSEQ_*, can be zero)
  * @error: whether an error occurred
  *
@@ -162,11 +164,11 @@ static bool blk_flush_queue_rq(struct request *rq, bool add_front)
  * RETURNS:
  * %true if requests were added to the dispatch queue, %false otherwise.
  */
-static bool blk_flush_complete_seq(struct request *rq, unsigned int seq,
-				   int error)
+static bool blk_flush_complete_seq(struct request *rq,
+				   struct blk_flush_queue *fq,
+				   unsigned int seq, int error)
 {
 	struct request_queue *q = rq->q;
-	struct blk_flush_queue *fq = blk_get_flush_queue(q);
 	struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
 	bool queued = false, kicked;
 
@@ -212,7 +214,7 @@ static bool blk_flush_complete_seq(struct request *rq, unsigned int seq,
 		BUG();
 	}
 
-	kicked = blk_kick_flush(q);
+	kicked = blk_kick_flush(q, fq);
 	return kicked | queued;
 }
 
@@ -244,7 +246,7 @@ static void flush_end_io(struct request *flush_rq, int error)
 		unsigned int seq = blk_flush_cur_seq(rq);
 
 		BUG_ON(seq != REQ_FSEQ_PREFLUSH && seq != REQ_FSEQ_POSTFLUSH);
-		queued |= blk_flush_complete_seq(rq, seq, error);
+		queued |= blk_flush_complete_seq(rq, fq, seq, error);
 	}
 
 	/*
@@ -270,6 +272,7 @@ static void flush_end_io(struct request *flush_rq, int error)
 /**
  * blk_kick_flush - consider issuing flush request
  * @q: request_queue being kicked
+ * @fq: flush queue
  *
  * Flush related states of @q have changed, consider issuing flush request.
  * Please read the comment at the top of this file for more info.
@@ -280,9 +283,8 @@ static void flush_end_io(struct request *flush_rq, int error)
  * RETURNS:
  * %true if flush was issued, %false otherwise.
  */
-static bool blk_kick_flush(struct request_queue *q)
+static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
 {
-	struct blk_flush_queue *fq = blk_get_flush_queue(q);
 	struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
 	struct request *first_rq =
 		list_first_entry(pending, struct request, flush.list);
@@ -319,12 +321,13 @@ static bool blk_kick_flush(struct request_queue *q)
 static void flush_data_end_io(struct request *rq, int error)
 {
 	struct request_queue *q = rq->q;
+	struct blk_flush_queue *fq = blk_get_flush_queue(q);
 
 	/*
 	 * After populating an empty queue, kick it to avoid stall.  Read
 	 * the comment in flush_end_io().
 	 */
-	if (blk_flush_complete_seq(rq, REQ_FSEQ_DATA, error))
+	if (blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error))
 		blk_run_queue_async(q);
 }
 
@@ -344,7 +347,7 @@ static void mq_flush_data_end_io(struct request *rq, int error)
 	 * the comment in flush_end_io().
 	 */
 	spin_lock_irqsave(&fq->mq_flush_lock, flags);
-	if (blk_flush_complete_seq(rq, REQ_FSEQ_DATA, error))
+	if (blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error))
 		blk_mq_run_hw_queue(hctx, true);
 	spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
 }
@@ -366,6 +369,7 @@ void blk_insert_flush(struct request *rq)
 	struct request_queue *q = rq->q;
 	unsigned int fflags = q->flush_flags;	/* may change, cache */
 	unsigned int policy = blk_flush_policy(fflags, rq);
+	struct blk_flush_queue *fq = blk_get_flush_queue(q);
 
 	/*
 	 * @policy now records what operations need to be done.  Adjust
@@ -414,18 +418,16 @@ void blk_insert_flush(struct request *rq)
 	rq->cmd_flags |= REQ_FLUSH_SEQ;
 	rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
 	if (q->mq_ops) {
-		struct blk_flush_queue *fq = blk_get_flush_queue(q);
-
 		rq->end_io = mq_flush_data_end_io;
 
 		spin_lock_irq(&fq->mq_flush_lock);
-		blk_flush_complete_seq(rq, REQ_FSEQ_ACTIONS & ~policy, 0);
+		blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
 		spin_unlock_irq(&fq->mq_flush_lock);
 		return;
 	}
 	rq->end_io = flush_data_end_io;
 
-	blk_flush_complete_seq(rq, REQ_FSEQ_ACTIONS & ~policy, 0);
+	blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
 }
 
 /**
-- 
1.7.9.5



* [PATCH v4 08/10] block: introduce 'blk_mq_ctx' parameter to blk_get_flush_queue
  2014-09-15 13:11 [PATCH v4 0/10] block: per-distpatch_queue flush machinery Ming Lei
                   ` (6 preceding siblings ...)
  2014-09-15 13:11 ` [PATCH v4 07/10] block: flush: avoid to figure out flush queue unnecessarily Ming Lei
@ 2014-09-15 13:11 ` Ming Lei
  2014-09-24 10:21   ` Christoph Hellwig
  2014-09-15 13:11 ` [PATCH v4 09/10] blk-mq: handle failure path for initializing hctx Ming Lei
  2014-09-15 13:11 ` [PATCH v4 10/10] blk-mq: support per-distpatch_queue flush machinery Ming Lei
  9 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2014-09-15 13:11 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: Christoph Hellwig, Ming Lei

This patch adds a 'blk_mq_ctx' parameter to blk_get_flush_queue(),
so that the function can find the blk_flush_queue bound to the current
mq context, since the flush queue will become per hw-queue.

For the legacy queue, the parameter can simply be NULL.

For the multiqueue case, the parameter should be the context from which
the related request originated. With this context information, the hw
queue and its flush queue can be found easily.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/blk-core.c  |    2 +-
 block/blk-flush.c |   11 +++++------
 block/blk-mq.c    |    3 ++-
 block/blk.h       |    4 ++--
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index e55a8eb..0238c02 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -390,7 +390,7 @@ static void __blk_drain_queue(struct request_queue *q, bool drain_all)
 		 * be drained.  Check all the queues and counters.
 		 */
 		if (drain_all) {
-			struct blk_flush_queue *fq = blk_get_flush_queue(q);
+			struct blk_flush_queue *fq = blk_get_flush_queue(q, NULL);
 			drain |= !list_empty(&q->queue_head);
 			for (i = 0; i < 2; i++) {
 				drain |= q->nr_rqs[i];
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 30110b6..8ca65fb 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -225,7 +225,7 @@ static void flush_end_io(struct request *flush_rq, int error)
 	bool queued = false;
 	struct request *rq, *n;
 	unsigned long flags = 0;
-	struct blk_flush_queue *fq = blk_get_flush_queue(q);
+	struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
 
 	if (q->mq_ops) {
 		spin_lock_irqsave(&fq->mq_flush_lock, flags);
@@ -321,7 +321,7 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
 static void flush_data_end_io(struct request *rq, int error)
 {
 	struct request_queue *q = rq->q;
-	struct blk_flush_queue *fq = blk_get_flush_queue(q);
+	struct blk_flush_queue *fq = blk_get_flush_queue(q, NULL);
 
 	/*
 	 * After populating an empty queue, kick it to avoid stall.  Read
@@ -335,11 +335,10 @@ static void mq_flush_data_end_io(struct request *rq, int error)
 {
 	struct request_queue *q = rq->q;
 	struct blk_mq_hw_ctx *hctx;
-	struct blk_mq_ctx *ctx;
+	struct blk_mq_ctx *ctx = rq->mq_ctx;
 	unsigned long flags;
-	struct blk_flush_queue *fq = blk_get_flush_queue(q);
+	struct blk_flush_queue *fq = blk_get_flush_queue(q, ctx);
 
-	ctx = rq->mq_ctx;
 	hctx = q->mq_ops->map_queue(q, ctx->cpu);
 
 	/*
@@ -369,7 +368,7 @@ void blk_insert_flush(struct request *rq)
 	struct request_queue *q = rq->q;
 	unsigned int fflags = q->flush_flags;	/* may change, cache */
 	unsigned int policy = blk_flush_policy(fflags, rq);
-	struct blk_flush_queue *fq = blk_get_flush_queue(q);
+	struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
 
 	/*
 	 * @policy now records what operations need to be done.  Adjust
diff --git a/block/blk-mq.c b/block/blk-mq.c
index beea082..3106328 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -518,7 +518,8 @@ static inline bool is_flush_request(struct request *rq,
 struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
 {
 	struct request *rq = tags->rqs[tag];
-	struct blk_flush_queue *fq = blk_get_flush_queue(rq->q);
+	/* mq_ctx of flush rq is always cloned from the corresponding req */
+	struct blk_flush_queue *fq = blk_get_flush_queue(rq->q, rq->mq_ctx);
 
 	if (!is_flush_request(rq, fq, tag))
 		return rq;
diff --git a/block/blk.h b/block/blk.h
index d514e3c..b58c5d9 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -29,7 +29,7 @@ extern struct kobj_type blk_queue_ktype;
 extern struct ida blk_queue_ida;
 
 static inline struct blk_flush_queue *blk_get_flush_queue(
-		struct request_queue *q)
+		struct request_queue *q, struct blk_mq_ctx *ctx)
 {
 	return q->fq;
 }
@@ -108,7 +108,7 @@ void blk_insert_flush(struct request *rq);
 static inline struct request *__elv_next_request(struct request_queue *q)
 {
 	struct request *rq;
-	struct blk_flush_queue *fq = blk_get_flush_queue(q);
+	struct blk_flush_queue *fq = blk_get_flush_queue(q, NULL);
 
 	while (1) {
 		if (!list_empty(&q->queue_head)) {
-- 
1.7.9.5



* [PATCH v4 09/10] blk-mq: handle failure path for initializing hctx
  2014-09-15 13:11 [PATCH v4 0/10] block: per-distpatch_queue flush machinery Ming Lei
                   ` (7 preceding siblings ...)
  2014-09-15 13:11 ` [PATCH v4 08/10] block: introduce 'blk_mq_ctx' parameter to blk_get_flush_queue Ming Lei
@ 2014-09-15 13:11 ` Ming Lei
  2014-09-24 10:22   ` Christoph Hellwig
  2014-09-15 13:11 ` [PATCH v4 10/10] blk-mq: support per-distpatch_queue flush machinery Ming Lei
  9 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2014-09-15 13:11 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: Christoph Hellwig, Ming Lei

Failure to initialize one hctx isn't handled, so this patch introduces
blk_mq_init_hctx() and its pair to handle it explicitly. This also
makes the code cleaner.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/blk-mq.c |  114 ++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 69 insertions(+), 45 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3106328..eb4a90a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1527,6 +1527,20 @@ static int blk_mq_hctx_notify(void *data, unsigned long action,
 	return NOTIFY_OK;
 }
 
+static void blk_mq_exit_hctx(struct request_queue *q,
+		struct blk_mq_tag_set *set,
+		struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
+{
+	blk_mq_tag_idle(hctx);
+
+	if (set->ops->exit_hctx)
+		set->ops->exit_hctx(hctx, hctx_idx);
+
+	blk_mq_unregister_cpu_notifier(&hctx->cpu_notifier);
+	kfree(hctx->ctxs);
+	blk_mq_free_bitmap(&hctx->ctx_map);
+}
+
 static void blk_mq_exit_hw_queues(struct request_queue *q,
 		struct blk_mq_tag_set *set, int nr_queue)
 {
@@ -1536,17 +1550,8 @@ static void blk_mq_exit_hw_queues(struct request_queue *q,
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (i == nr_queue)
 			break;
-
-		blk_mq_tag_idle(hctx);
-
-		if (set->ops->exit_hctx)
-			set->ops->exit_hctx(hctx, i);
-
-		blk_mq_unregister_cpu_notifier(&hctx->cpu_notifier);
-		kfree(hctx->ctxs);
-		blk_mq_free_bitmap(&hctx->ctx_map);
+		blk_mq_exit_hctx(q, set, hctx, i);
 	}
-
 }
 
 static void blk_mq_free_hw_queues(struct request_queue *q,
@@ -1561,53 +1566,72 @@ static void blk_mq_free_hw_queues(struct request_queue *q,
 	}
 }
 
-static int blk_mq_init_hw_queues(struct request_queue *q,
-		struct blk_mq_tag_set *set)
+static int blk_mq_init_hctx(struct request_queue *q,
+		struct blk_mq_tag_set *set,
+		struct blk_mq_hw_ctx *hctx, unsigned hctx_idx)
 {
-	struct blk_mq_hw_ctx *hctx;
-	unsigned int i;
+	int node;
+
+	node = hctx->numa_node;
+	if (node == NUMA_NO_NODE)
+		node = hctx->numa_node = set->numa_node;
+
+	INIT_DELAYED_WORK(&hctx->run_work, blk_mq_run_work_fn);
+	INIT_DELAYED_WORK(&hctx->delay_work, blk_mq_delay_work_fn);
+	spin_lock_init(&hctx->lock);
+	INIT_LIST_HEAD(&hctx->dispatch);
+	hctx->queue = q;
+	hctx->queue_num = hctx_idx;
+	hctx->flags = set->flags;
+	hctx->cmd_size = set->cmd_size;
+
+	blk_mq_init_cpu_notifier(&hctx->cpu_notifier,
+					blk_mq_hctx_notify, hctx);
+	blk_mq_register_cpu_notifier(&hctx->cpu_notifier);
+
+	hctx->tags = set->tags[hctx_idx];
 
 	/*
-	 * Initialize hardware queues
+	 * Allocate space for all possible cpus to avoid allocation at
+	 * runtime
 	 */
-	queue_for_each_hw_ctx(q, hctx, i) {
-		int node;
+	hctx->ctxs = kmalloc_node(nr_cpu_ids * sizeof(void *),
+					GFP_KERNEL, node);
+	if (!hctx->ctxs)
+		goto unregister_cpu_notifier;
 
-		node = hctx->numa_node;
-		if (node == NUMA_NO_NODE)
-			node = hctx->numa_node = set->numa_node;
+	if (blk_mq_alloc_bitmap(&hctx->ctx_map, node))
+		goto free_ctxs;
 
-		INIT_DELAYED_WORK(&hctx->run_work, blk_mq_run_work_fn);
-		INIT_DELAYED_WORK(&hctx->delay_work, blk_mq_delay_work_fn);
-		spin_lock_init(&hctx->lock);
-		INIT_LIST_HEAD(&hctx->dispatch);
-		hctx->queue = q;
-		hctx->queue_num = i;
-		hctx->flags = set->flags;
-		hctx->cmd_size = set->cmd_size;
+	hctx->nr_ctx = 0;
 
-		blk_mq_init_cpu_notifier(&hctx->cpu_notifier,
-						blk_mq_hctx_notify, hctx);
-		blk_mq_register_cpu_notifier(&hctx->cpu_notifier);
+	if (set->ops->init_hctx &&
+	    set->ops->init_hctx(hctx, set->driver_data, hctx_idx))
+		goto free_bitmap;
 
-		hctx->tags = set->tags[i];
+	return 0;
 
-		/*
-		 * Allocate space for all possible cpus to avoid allocation at
-		 * runtime
-		 */
-		hctx->ctxs = kmalloc_node(nr_cpu_ids * sizeof(void *),
-						GFP_KERNEL, node);
-		if (!hctx->ctxs)
-			break;
+ free_bitmap:
+	blk_mq_free_bitmap(&hctx->ctx_map);
+ free_ctxs:
+	kfree(hctx->ctxs);
+ unregister_cpu_notifier:
+	blk_mq_unregister_cpu_notifier(&hctx->cpu_notifier);
 
-		if (blk_mq_alloc_bitmap(&hctx->ctx_map, node))
-			break;
+	return -1;
+}
 
-		hctx->nr_ctx = 0;
+static int blk_mq_init_hw_queues(struct request_queue *q,
+		struct blk_mq_tag_set *set)
+{
+	struct blk_mq_hw_ctx *hctx;
+	unsigned int i;
 
-		if (set->ops->init_hctx &&
-		    set->ops->init_hctx(hctx, set->driver_data, i))
+	/*
+	 * Initialize hardware queues
+	 */
+	queue_for_each_hw_ctx(q, hctx, i) {
+		if (blk_mq_init_hctx(q, set, hctx, i))
 			break;
 	}
 
-- 
1.7.9.5



* [PATCH v4 10/10] blk-mq: support per-distpatch_queue flush machinery
  2014-09-15 13:11 [PATCH v4 0/10] block: per-distpatch_queue flush machinery Ming Lei
                   ` (8 preceding siblings ...)
  2014-09-15 13:11 ` [PATCH v4 09/10] blk-mq: handle failure path for initializing hctx Ming Lei
@ 2014-09-15 13:11 ` Ming Lei
  2014-09-24 10:26   ` Christoph Hellwig
  9 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2014-09-15 13:11 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: Christoph Hellwig, Ming Lei

This patch runs one flush machinery instance for each blk-mq dispatch
queue, so that:

- the current init_request and exit_request callbacks can
cover the flush request too, so the buggy copying way of
initializing the flush request's pdu can be fixed

- flush performance is improved in the multi hw-queue case

In a fio sync write test over virtio-blk (4 hw queues, ioengine=sync,
iodepth=64, numjobs=4, bs=4K), throughput increases significantly in my
test environment (an illustrative fio command line is sketched below):
	- throughput: +70% in case of virtio-blk over null_blk
	- throughput: +30% in case of virtio-blk over an SSD image
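
A fio job along the following lines should reproduce the setup described
above. Only the parameters quoted above come from the test description;
the job name, device path and runtime are illustrative, and --sync=1 is
an assumption about how the FLUSH requests were generated:

	fio --name=flush-test --filename=/dev/vdb --direct=1 --sync=1 \
	    --rw=write --bs=4K --ioengine=sync --iodepth=64 --numjobs=4 \
	    --runtime=60 --time_based --group_reporting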

The multi virtqueue feature isn't merged into QEMU yet; patches for the
feature can be found in the tree below:

	git://kernel.ubuntu.com/ming/qemu.git  	v2.1.0-mq.3

Simply passing 'num_queues=4 vectors=5' should be enough to enable the
multi queue (quad queue) feature for QEMU virtio-blk.
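
As a rough illustration only: with the out-of-tree QEMU series above,
the virtio-blk device might be instantiated along these lines (the image
file, drive id and remaining options are assumptions; the num_queues and
vectors properties are the ones quoted above):

	qemu-system-x86_64 ... \
		-drive file=test.img,if=none,id=drive0,cache=none \
		-device virtio-blk-pci,drive=drive0,num_queues=4,vectors=5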

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/blk-core.c       |    2 +-
 block/blk-flush.c      |   24 +++++++++++++++--------
 block/blk-mq.c         |   50 +++++++++++++++++++++++-------------------------
 block/blk-sysfs.c      |    4 ++--
 block/blk.h            |   16 +++++++++++++---
 include/linux/blk-mq.h |    2 ++
 6 files changed, 58 insertions(+), 40 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 0238c02..122781e 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -703,7 +703,7 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
 	if (!q)
 		return NULL;
 
-	q->fq = blk_alloc_flush_queue(q);
+	q->fq = blk_alloc_flush_queue(q, NULL, 0);
 	if (!q->fq)
 		return NULL;
 
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 8ca65fb..b439670 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -307,8 +307,15 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
 	fq->flush_pending_idx ^= 1;
 
 	blk_rq_init(q, flush_rq);
-	if (q->mq_ops)
-		blk_mq_clone_flush_request(flush_rq, first_rq);
+
+	/*
+	 * Borrow tag from the first request since they can't
+	 * be in flight at the same time.
+	 */
+	if (q->mq_ops) {
+		flush_rq->mq_ctx = first_rq->mq_ctx;
+		flush_rq->tag = first_rq->tag;
+	}
 
 	flush_rq->cmd_type = REQ_TYPE_FS;
 	flush_rq->cmd_flags = WRITE_FLUSH | REQ_FLUSH_SEQ;
@@ -482,22 +489,23 @@ int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
 }
 EXPORT_SYMBOL(blkdev_issue_flush);
 
-struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q)
+struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
+		struct blk_mq_hw_ctx *hctx, int cmd_size)
 {
 	struct blk_flush_queue *fq;
 	int rq_sz = sizeof(struct request);
+	int node = hctx ? hctx->numa_node : NUMA_NO_NODE;
 
-	fq = kzalloc(sizeof(*fq), GFP_KERNEL);
+	fq = kzalloc_node(sizeof(*fq), GFP_KERNEL, node);
 	if (!fq)
 		goto fail;
 
-	if (q->mq_ops) {
+	if (hctx) {
 		spin_lock_init(&fq->mq_flush_lock);
-		rq_sz = round_up(rq_sz + q->tag_set->cmd_size,
-				cache_line_size());
+		rq_sz = round_up(rq_sz + cmd_size, cache_line_size());
 	}
 
-	fq->flush_rq = kzalloc(rq_sz, GFP_KERNEL);
+	fq->flush_rq = kzalloc_node(rq_sz, GFP_KERNEL, node);
 	if (!fq->flush_rq)
 		goto fail_rq;
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index eb4a90a..5d9f660 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -280,26 +280,6 @@ void blk_mq_free_request(struct request *rq)
 	__blk_mq_free_request(hctx, ctx, rq);
 }
 
-/*
- * Clone all relevant state from a request that has been put on hold in
- * the flush state machine into the preallocated flush request that hangs
- * off the request queue.
- *
- * For a driver the flush request should be invisible, that's why we are
- * impersonating the original request here.
- */
-void blk_mq_clone_flush_request(struct request *flush_rq,
-		struct request *orig_rq)
-{
-	struct blk_mq_hw_ctx *hctx =
-		orig_rq->q->mq_ops->map_queue(orig_rq->q, orig_rq->mq_ctx->cpu);
-
-	flush_rq->mq_ctx = orig_rq->mq_ctx;
-	flush_rq->tag = orig_rq->tag;
-	memcpy(blk_mq_rq_to_pdu(flush_rq), blk_mq_rq_to_pdu(orig_rq),
-		hctx->cmd_size);
-}
-
 inline void __blk_mq_end_io(struct request *rq, int error)
 {
 	blk_account_io_done(rq);
@@ -1531,12 +1511,20 @@ static void blk_mq_exit_hctx(struct request_queue *q,
 		struct blk_mq_tag_set *set,
 		struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
 {
+	unsigned flush_start_tag = set->queue_depth;
+
 	blk_mq_tag_idle(hctx);
 
+	if (set->ops->exit_request)
+		set->ops->exit_request(set->driver_data,
+				       hctx->fq->flush_rq, hctx_idx,
+				       flush_start_tag + hctx_idx);
+
 	if (set->ops->exit_hctx)
 		set->ops->exit_hctx(hctx, hctx_idx);
 
 	blk_mq_unregister_cpu_notifier(&hctx->cpu_notifier);
+	blk_free_flush_queue(hctx->fq);
 	kfree(hctx->ctxs);
 	blk_mq_free_bitmap(&hctx->ctx_map);
 }
@@ -1571,6 +1559,7 @@ static int blk_mq_init_hctx(struct request_queue *q,
 		struct blk_mq_hw_ctx *hctx, unsigned hctx_idx)
 {
 	int node;
+	unsigned flush_start_tag = set->queue_depth;
 
 	node = hctx->numa_node;
 	if (node == NUMA_NO_NODE)
@@ -1609,8 +1598,23 @@ static int blk_mq_init_hctx(struct request_queue *q,
 	    set->ops->init_hctx(hctx, set->driver_data, hctx_idx))
 		goto free_bitmap;
 
+	hctx->fq = blk_alloc_flush_queue(q, hctx, set->cmd_size);
+	if (!hctx->fq)
+		goto exit_hctx;
+
+	if (set->ops->init_request &&
+	    set->ops->init_request(set->driver_data,
+				   hctx->fq->flush_rq, hctx_idx,
+				   flush_start_tag + hctx_idx, node))
+		goto free_fq;
+
 	return 0;
 
+ free_fq:
+	kfree(hctx->fq);
+ exit_hctx:
+	if (set->ops->exit_hctx)
+		set->ops->exit_hctx(hctx, hctx_idx);
  free_bitmap:
 	blk_mq_free_bitmap(&hctx->ctx_map);
  free_ctxs:
@@ -1869,16 +1873,10 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 
 	blk_mq_add_queue_tag_set(set, q);
 
-	q->fq = blk_alloc_flush_queue(q);
-	if (!q->fq)
-		goto err_hw_queues;
-
 	blk_mq_map_swqueue(q);
 
 	return q;
 
-err_hw_queues:
-	blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
 err_hw:
 	blk_cleanup_queue(q);
 err_hctxs:
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 571cd34..b986561 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -517,10 +517,10 @@ static void blk_release_queue(struct kobject *kobj)
 	if (q->queue_tags)
 		__blk_queue_free_tags(q);
 
-	blk_free_flush_queue(q->fq);
-
 	if (q->mq_ops)
 		blk_mq_free_queue(q);
+	else
+		blk_free_flush_queue(q->fq);
 
 	blk_trace_shutdown(q);
 
diff --git a/block/blk.h b/block/blk.h
index b58c5d9..9051637 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -2,6 +2,8 @@
 #define BLK_INTERNAL_H
 
 #include <linux/idr.h>
+#include <linux/blk-mq.h>
+#include "blk-mq.h"
 
 /* Amount of time in which a process may batch requests */
 #define BLK_BATCH_TIME	(HZ/50UL)
@@ -31,7 +33,14 @@ extern struct ida blk_queue_ida;
 static inline struct blk_flush_queue *blk_get_flush_queue(
 		struct request_queue *q, struct blk_mq_ctx *ctx)
 {
-	return q->fq;
+	struct blk_mq_hw_ctx *hctx;
+
+	if (!q->mq_ops)
+		return q->fq;
+
+	hctx = q->mq_ops->map_queue(q, ctx->cpu);
+
+	return hctx->fq;
 }
 
 static inline void __blk_get_queue(struct request_queue *q)
@@ -39,8 +48,9 @@ static inline void __blk_get_queue(struct request_queue *q)
 	kobject_get(&q->kobj);
 }
 
-struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q);
-void blk_free_flush_queue(struct blk_flush_queue *fq);
+struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
+		struct blk_mq_hw_ctx *hctx, int cmd_size);
+void blk_free_flush_queue(struct blk_flush_queue *q);
 
 int blk_init_rl(struct request_list *rl, struct request_queue *q,
 		gfp_t gfp_mask);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index a1e31f2..1f3c523 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -4,6 +4,7 @@
 #include <linux/blkdev.h>
 
 struct blk_mq_tags;
+struct blk_flush_queue;
 
 struct blk_mq_cpu_notifier {
 	struct list_head list;
@@ -34,6 +35,7 @@ struct blk_mq_hw_ctx {
 
 	struct request_queue	*queue;
 	unsigned int		queue_num;
+	struct blk_flush_queue	*fq;
 
 	void			*driver_data;
 
-- 
1.7.9.5



* Re: [PATCH v4 02/10] block: introduce blk_init_flush and its pair
  2014-09-15 13:11 ` [PATCH v4 02/10] block: introduce blk_init_flush and its pair Ming Lei
@ 2014-09-24 10:18   ` Christoph Hellwig
  2014-09-24 14:46     ` Ming Lei
  0 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2014-09-24 10:18 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-kernel

On Mon, Sep 15, 2014 at 09:11:06PM +0800, Ming Lei wrote:
> These two temporary functions are introduced for holding flush
> initialization and de-initialization, so that we can
> introduce 'flush queue' easier in the following patch. And
> once 'flush queue' and its allocation/free functions are ready,
> they will be removed for sake of code readability.

Shouldn't we just do the mq work in this helper as well?  blk_mq_init_flush
does exactly the same work, plus initializing a lock, which is harmless
for the !mq case as well.
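
A minimal sketch of what such a merged helper could look like, assuming the
q->flush_rq, q->mq_flush_lock and q->tag_set names used in this series (a
complete version would also call set->ops->init_request() on the mq
flush_rq, as blk_mq_init_flush() does today):

int blk_init_flush(struct request_queue *q)
{
        size_t rq_size = sizeof(struct request);

        /* harmless for !mq: the lock is simply never taken there */
        spin_lock_init(&q->mq_flush_lock);

        /* mq needs extra room for the driver pdu behind the request */
        if (q->mq_ops)
                rq_size += q->tag_set->cmd_size;

        q->flush_rq = kzalloc(rq_size, GFP_KERNEL);
        return q->flush_rq ? 0 : -ENOMEM;
}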


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 06/10] block: remove blk_init_flush() and its pair
  2014-09-15 13:11 ` [PATCH v4 06/10] block: remove blk_init_flush() and its pair Ming Lei
@ 2014-09-24 10:20   ` Christoph Hellwig
  2014-09-24 14:49     ` Ming Lei
  0 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2014-09-24 10:20 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-kernel, Christoph Hellwig

On Mon, Sep 15, 2014 at 09:11:10PM +0800, Ming Lei wrote:
> Now mission of the two helpers is over, and just call
> blk_alloc_flush_queue() and blk_free_flush_queue() directly.

I suspect it might be much easier to fold all previous patches except
for patch 4, which stands alone into one.  There's just too much churn
there which doesn't stay in the final version.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 08/10] block: introduce 'blk_mq_ctx' parameter to blk_get_flush_queue
  2014-09-15 13:11 ` [PATCH v4 08/10] block: introduce 'blk_mq_ctx' parameter to blk_get_flush_queue Ming Lei
@ 2014-09-24 10:21   ` Christoph Hellwig
  2014-09-24 14:56     ` Ming Lei
  0 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2014-09-24 10:21 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-kernel, Christoph Hellwig

On Mon, Sep 15, 2014 at 09:11:12PM +0800, Ming Lei wrote:
> This patch adds 'blk_mq_ctx' parameter to blk_get_flush_queue(),
> so that this function can find the corresponding blk_flush_queue
> bound with current mq context since the flush queue will become
> per hw-queue.
> 
> For legacy queue, the parameter can be simply 'NULL'.
> 
> For multiqueue case, the parameter should be set as the context
> from which the related request is originated. With this context
> info, the hw queue and related flush queue can be found easily.

I think this should be merged into the patch introducing
blk_get_flush_queue.
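
For reference, the call-site difference the quoted description boils down to
is roughly the following sketch, using a hypothetical flush_queue_for()
helper and assuming the originating context is available on the request as
rq->mq_ctx for the mq case:

static struct blk_flush_queue *flush_queue_for(struct request_queue *q,
                                               struct request *rq)
{
        /*
         * The legacy path passes NULL and gets the single q->fq; the mq
         * path passes the originating ctx so map_queue() can pick the
         * hw queue whose per-hctx flush queue should drive this request.
         */
        return blk_get_flush_queue(q, q->mq_ops ? rq->mq_ctx : NULL);
}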

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 09/10] blk-mq: handle failure path for initializing hctx
  2014-09-15 13:11 ` [PATCH v4 09/10] blk-mq: handle failure path for initializing hctx Ming Lei
@ 2014-09-24 10:22   ` Christoph Hellwig
  2014-09-24 14:17     ` Jens Axboe
  0 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2014-09-24 10:22 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-kernel

On Mon, Sep 15, 2014 at 09:11:13PM +0800, Ming Lei wrote:
> Failure of initializing one hctx isn't handled, so this patch
> introduces blk_mq_init_hctx() and its pair to handle it explicitly.
> Also this patch makes code cleaner.
> 
> Signed-off-by: Ming Lei <ming.lei@canonical.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

Shouldn't this go to Jens as a separate fix for 3.18?


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 10/10] blk-mq: support per-distpatch_queue flush machinery
  2014-09-15 13:11 ` [PATCH v4 10/10] blk-mq: support per-distpatch_queue flush machinery Ming Lei
@ 2014-09-24 10:26   ` Christoph Hellwig
  2014-09-24 15:21     ` Ming Lei
  0 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2014-09-24 10:26 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-kernel, Christoph Hellwig

> +struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
> +		struct blk_mq_hw_ctx *hctx, int cmd_size)

I still think this should pass in the numa node instead of the hctx, and
allow node-local allocation for the old code as well.  As mentioned earlier
initializing mq_flush_lock for the !mq case is harmless. 
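
Roughly, the node-based variant being suggested could look like the sketch
below, using the blk_flush_queue field names introduced by this series; the
legacy path could simply pass NUMA_NO_NODE for now:

struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
                                              int node, int cmd_size)
{
        struct blk_flush_queue *fq;
        int rq_sz = sizeof(struct request) + cmd_size;

        fq = kzalloc_node(sizeof(*fq), GFP_KERNEL, node);
        if (!fq)
                return NULL;

        /* node-local flush request, with pdu space for the mq case */
        fq->flush_rq = kzalloc_node(rq_sz, GFP_KERNEL, node);
        if (!fq->flush_rq) {
                kfree(fq);
                return NULL;
        }

        spin_lock_init(&fq->mq_flush_lock);
        INIT_LIST_HEAD(&fq->flush_queue[0]);
        INIT_LIST_HEAD(&fq->flush_queue[1]);
        INIT_LIST_HEAD(&fq->flush_data_in_flight);

        return fq;
}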


We also should clearly document somewhere that for devices that
have flushes enabled, ->init_request can be called for more requests than
the queue depth, as drivers might allocate some sort of pool for them.
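
Something along these lines next to the callback definitions in
include/linux/blk-mq.h would be enough (the wording is only a sketch):

/*
 * Note: when a queue advertises flush/FUA support, init_request() and
 * exit_request() are also invoked for the per-queue (or, with this
 * series, per-hw-queue) flush request, i.e. for more requests than the
 * configured queue depth.  Drivers that size a pool of per-request
 * resources from the queue depth must account for these extra requests.
 */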

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 09/10] blk-mq: handle failure path for initializing hctx
  2014-09-24 10:22   ` Christoph Hellwig
@ 2014-09-24 14:17     ` Jens Axboe
  2014-09-24 15:01       ` Ming Lei
  0 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2014-09-24 14:17 UTC (permalink / raw)
  To: Christoph Hellwig, Ming Lei; +Cc: linux-kernel

On 09/24/2014 04:22 AM, Christoph Hellwig wrote:
> On Mon, Sep 15, 2014 at 09:11:13PM +0800, Ming Lei wrote:
>> Failure of initializing one hctx isn't handled, so this patch
>> introduces blk_mq_init_hctx() and its pair to handle it explicitly.
>> Also this patch makes code cleaner.
>>
>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> 
> Looks good,
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> Shouldn't this go to Jens as a separate fix for 3.18?

I think so, doesn't look like it's tied into the series at all.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 02/10] block: introduce blk_init_flush and its pair
  2014-09-24 10:18   ` Christoph Hellwig
@ 2014-09-24 14:46     ` Ming Lei
  0 siblings, 0 replies; 24+ messages in thread
From: Ming Lei @ 2014-09-24 14:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Linux Kernel Mailing List

On Wed, Sep 24, 2014 at 6:18 PM, Christoph Hellwig <hch@lst.de> wrote:
> On Mon, Sep 15, 2014 at 09:11:06PM +0800, Ming Lei wrote:
>> These two temporary functions are introduced for holding flush
>> initialization and de-initialization, so that we can
>> introduce 'flush queue' easier in the following patch. And
>> once 'flush queue' and its allocation/free functions are ready,
>> they will be removed for sake of code readability.
>
> Shouldn't we just do the mq work in this helper as well?  blk_mq_init_flush
> does exactly the same work, plus initializing a lock, which is harmless
> for the !mq case as well.

The initialization helper is introduced just to gather all flush
initialization in one place; it will all move into blk_alloc_flush_queue()
once the flush queue is introduced.

It is harmless to initialize the mq lock for !mq, but we would lose the
benefit of detecting misuse of the mq lock in the !mq case.

Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 06/10] block: remove blk_init_flush() and its pair
  2014-09-24 10:20   ` Christoph Hellwig
@ 2014-09-24 14:49     ` Ming Lei
  2014-09-24 14:54       ` Christoph Hellwig
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2014-09-24 14:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Linux Kernel Mailing List

On Wed, Sep 24, 2014 at 6:20 PM, Christoph Hellwig <hch@lst.de> wrote:
> On Mon, Sep 15, 2014 at 09:11:10PM +0800, Ming Lei wrote:
>> Now mission of the two helpers is over, and just call
>> blk_alloc_flush_queue() and blk_free_flush_queue() directly.
>
> I suspect it might be much easier to fold all previous patches except
> for patch 4, which stands alone into one.  There's just too much churn
> there which doesn't stay in the final version.

Simpler to merge, but more difficult to split. The split is for easier review,
since patch 5 is a bit big and each patch still does one thing.

Thanks,

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 06/10] block: remove blk_init_flush() and its pair
  2014-09-24 14:49     ` Ming Lei
@ 2014-09-24 14:54       ` Christoph Hellwig
  0 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2014-09-24 14:54 UTC (permalink / raw)
  To: Ming Lei; +Cc: Christoph Hellwig, Jens Axboe, Linux Kernel Mailing List

On Wed, Sep 24, 2014 at 10:49:00PM +0800, Ming Lei wrote:
> Simpler to merge, but more difficult to split. The split is for easier review,
> since patch 5 is a bit big and each patch still does one thing.

I definitely find the current version hard to review - it moves
code around a few times before it settles, so it requires a lot of
memory or applying patches one after another to a tree.  And while
this might be useful in some cases, this is one where merging the
patches doesn't seem to have much of a downside.  We'd still move
various fields into a structure, and add helpers to init it, just
with a little more change on the initialization side.

That being said, if you really prefer the split that's fine with me,
but it doesn't really seem helpful.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 08/10] block: introduce 'blk_mq_ctx' parameter to blk_get_flush_queue
  2014-09-24 10:21   ` Christoph Hellwig
@ 2014-09-24 14:56     ` Ming Lei
  0 siblings, 0 replies; 24+ messages in thread
From: Ming Lei @ 2014-09-24 14:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Linux Kernel Mailing List

On Wed, Sep 24, 2014 at 6:21 PM, Christoph Hellwig <hch@lst.de> wrote:
> On Mon, Sep 15, 2014 at 09:11:12PM +0800, Ming Lei wrote:
>> This patch adds 'blk_mq_ctx' parameter to blk_get_flush_queue(),
>> so that this function can find the corresponding blk_flush_queue
>> bound with current mq context since the flush queue will become
>> per hw-queue.
>>
>> For legacy queue, the parameter can be simply 'NULL'.
>>
>> For multiqueue case, the parameter should be set as the context
>> from which the related request is originated. With this context
>> info, the hw queue and related flush queue can be found easily.
>
> I think this should be merged into the patch introducing
> blk_get_flush_queue.

The 'blk_mq_ctx' parameter needs to be handled carefully in
that function, which is why this patch isn't merged into the previous
one; patch 5 is big enough already.

Again, a small patch is easier to review, and we should try to
do one thing per patch.

Thanks,

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 09/10] blk-mq: handle failure path for initializing hctx
  2014-09-24 14:17     ` Jens Axboe
@ 2014-09-24 15:01       ` Ming Lei
  2014-09-24 15:03         ` Christoph Hellwig
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2014-09-24 15:01 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, Linux Kernel Mailing List

On Wed, Sep 24, 2014 at 10:17 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On 09/24/2014 04:22 AM, Christoph Hellwig wrote:
>> On Mon, Sep 15, 2014 at 09:11:13PM +0800, Ming Lei wrote:
>>> Failure of initializing one hctx isn't handled, so this patch
>>> introduces blk_mq_init_hctx() and its pair to handle it explicitly.
>>> Also this patch makes code cleaner.
>>>
>>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
>>
>> Looks good,
>>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>
>> Shouldn't this go to Jens as a separate fix for 3.18?
>
> I think so, doesn't look like it's tied into the series at all.

Without this one, patch 10 becomes quite ugly.

OK, I will split this one from this patchset.

Thanks,

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 09/10] blk-mq: handle failure path for initializing hctx
  2014-09-24 15:01       ` Ming Lei
@ 2014-09-24 15:03         ` Christoph Hellwig
  0 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2014-09-24 15:03 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, Linux Kernel Mailing List

On Wed, Sep 24, 2014 at 11:01:00PM +0800, Ming Lei wrote:
> Without this one, patch 10 becomes quite ugly.
> 
> OK, I will split this one from this patchset.

It should just be patch 1 of the series so that Jens can merge it ASAP.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 10/10] blk-mq: support per-distpatch_queue flush machinery
  2014-09-24 10:26   ` Christoph Hellwig
@ 2014-09-24 15:21     ` Ming Lei
  0 siblings, 0 replies; 24+ messages in thread
From: Ming Lei @ 2014-09-24 15:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Linux Kernel Mailing List

On Wed, Sep 24, 2014 at 6:26 PM, Christoph Hellwig <hch@lst.de> wrote:
>> +struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
>> +             struct blk_mq_hw_ctx *hctx, int cmd_size)
>
> I still think this should pass in the numa node instead of the hctx, and
> allow node-local allocation for the old code as well.

We can do that, but have to pass NUMA_NO_NODE for old code
since blk_init_allocated_queue() doesn't provide node information
yet.
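
Roughly, as a sketch with the node-based signature (assuming the mq side can
keep using hctx->numa_node and set->cmd_size):

        /* legacy path, e.g. from blk_init_allocated_queue(): no node info yet */
        q->fq = blk_alloc_flush_queue(q, NUMA_NO_NODE, 0);

        /* mq path, from blk_mq_init_hctx(): allocate on the hw queue's node */
        hctx->fq = blk_alloc_flush_queue(q, hctx->numa_node, set->cmd_size);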

> As mentioned earlier
> initializing mq_flush_lock for the !mq case is harmless.

q->mq_ops is another friend for that purpose, :-)

>
> We also should clearly document somewhere that for devices that
> have flushes enabled, ->init_request can be called for more requests than
> the queue depth, as drivers might allocate some sort of pool for them.

That does make sense.

Thanks,

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2014-09-24 15:21 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-15 13:11 [PATCH v4 0/10] block: per-distpatch_queue flush machinery Ming Lei
2014-09-15 13:11 ` [PATCH v4 01/10] blk-mq: allocate flush_rq in blk_mq_init_flush() Ming Lei
2014-09-15 13:11 ` [PATCH v4 02/10] block: introduce blk_init_flush and its pair Ming Lei
2014-09-24 10:18   ` Christoph Hellwig
2014-09-24 14:46     ` Ming Lei
2014-09-15 13:11 ` [PATCH v4 03/10] block: move flush initialization to blk_flush_init Ming Lei
2014-09-15 13:11 ` [PATCH v4 04/10] block: avoid to use q->flush_rq directly Ming Lei
2014-09-15 13:11 ` [PATCH v4 05/10] block: introduce blk_flush_queue to drive flush machinery Ming Lei
2014-09-15 13:11 ` [PATCH v4 06/10] block: remove blk_init_flush() and its pair Ming Lei
2014-09-24 10:20   ` Christoph Hellwig
2014-09-24 14:49     ` Ming Lei
2014-09-24 14:54       ` Christoph Hellwig
2014-09-15 13:11 ` [PATCH v4 07/10] block: flush: avoid to figure out flush queue unnecessarily Ming Lei
2014-09-15 13:11 ` [PATCH v4 08/10] block: introduce 'blk_mq_ctx' parameter to blk_get_flush_queue Ming Lei
2014-09-24 10:21   ` Christoph Hellwig
2014-09-24 14:56     ` Ming Lei
2014-09-15 13:11 ` [PATCH v4 09/10] blk-mq: handle failure path for initializing hctx Ming Lei
2014-09-24 10:22   ` Christoph Hellwig
2014-09-24 14:17     ` Jens Axboe
2014-09-24 15:01       ` Ming Lei
2014-09-24 15:03         ` Christoph Hellwig
2014-09-15 13:11 ` [PATCH v4 10/10] blk-mq: support per-distpatch_queue flush machinery Ming Lei
2014-09-24 10:26   ` Christoph Hellwig
2014-09-24 15:21     ` Ming Lei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).