* [PATCH 0/7] dm: add full blk-mq support to request-based DM
@ 2015-03-12  3:56 Mike Snitzer
  2015-03-12  3:56 ` [PATCH 1/7] blk-mq: fix use of incorrect goto label in blk_mq_init_queue error path Mike Snitzer
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-12  3:56 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch; +Cc: linux-kernel, linux-scsi, dm-devel

This patchset updates request-based DM (used exclusively by DM
multipath) to support either the blk-mq I/O path or the old ->request_fn
I/O path -- the default is still the old ->request_fn I/O path.

As was done for SCSI, I've exposed a dm-mod.ko 'use_blk_mq' module
parameter and a CONFIG_DM_MQ_DEFAULT.  I've queued these patches in
linux-next via linux-dm.git's 'for-next' branch -- my goal is for
these changes to land upstream in 4.1.

I've tested all 4 supported permutations of rq-based queue stacking:
1) blk-mq mpath stacked on blk-mq device(s)
2) blk-mq mpath stacked on ->request_fn device(s)
3) old ->request_fn mpath stacked on blk-mq device(s)
4) old ->request_fn mpath stacked on old ->request_fn device(s)

There is definitely much more room for optimizing the blk-mq mpath
queue (via dm_mq_ops setup) to have more awareness about the
characteristics of the underlying blk-mq device(s) -- Jens offered
some suggestions that are worth pursuing.  The goal is to leverage the
ability to have multiple software submission queues that map to the
underlying paths' HW queues.  I expect Keith to be the most likely
person to pursue these follow-on optimizations given his immediate
access to NVMe devices, etc.

blk-mq allows for the removal of 2 of the 3 mempools and associated
allocations that were traditionally performed using the old ->request_fn
I/O path.  The md->bs bioset still remains for cloning a request's
bios.

I'll be pursuing the possibility of removing the bio cloning that
happens when cloning a request.  That cloning was put in place to allow
partial completion of a request's bios, but the prospect of avoiding the
bio clones warrants further review of how useful those partial
completions really are, see:
http://www.redhat.com/archives/dm-devel/2014-June/msg00030.html

Jens and Keith, special thanks for your help answering more blk-mq
questions than I'd have hoped would be needed while at LSF/MM.

Keith Busch (1):
  blk-mq: don't wait in blk_mq_queue_enter() if __GFP_WAIT isn't set

Mike Snitzer (6):
  blk-mq: fix use of incorrect goto label in blk_mq_init_queue error path
  blk-mq: add blk_mq_init_allocated_queue and export blk_mq_register_disk
  blk-mq: export blk_mq_run_hw_queues
  dm: add full blk-mq support to request-based DM
  dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq
  dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr

 block/blk-mq-sysfs.c   |   1 +
 block/blk-mq.c         |  47 ++++--
 drivers/md/Kconfig     |  11 ++
 drivers/md/dm-mpath.c  |   4 +-
 drivers/md/dm-sysfs.c  |   9 ++
 drivers/md/dm-table.c  |  17 +-
 drivers/md/dm.c        | 414 +++++++++++++++++++++++++++++++++++++++----------
 drivers/md/dm.h        |   5 +-
 include/linux/blk-mq.h |   3 +
 9 files changed, 401 insertions(+), 110 deletions(-)

-- 
1.9.5 (Apple Git-50.3)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/7] blk-mq: fix use of incorrect goto label in blk_mq_init_queue error path
  2015-03-12  3:56 [PATCH 0/7] dm: add full blk-mq support to request-based DM Mike Snitzer
@ 2015-03-12  3:56 ` Mike Snitzer
  2015-03-12  7:48   ` Ming Lei
  2015-03-12 14:27   ` [PATCH 1/7 v2] " Mike Snitzer
  2015-03-12  3:56 ` [PATCH 2/7] blk-mq: add blk_mq_init_allocated_queue and export blk_mq_register_disk Mike Snitzer
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-12  3:56 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch; +Cc: linux-kernel, linux-scsi, dm-devel

If percpu_ref_init() fails the 'err_hctxs' label should be used instead
of 'err_map'.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-mq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4f4bea2..459840c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1938,7 +1938,7 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 	 */
 	if (percpu_ref_init(&q->mq_usage_counter, blk_mq_usage_counter_release,
 			    PERCPU_REF_INIT_ATOMIC, GFP_KERNEL))
-		goto err_map;
+		goto err_hctxs;
 
 	setup_timer(&q->timeout, blk_mq_rq_timer, (unsigned long) q);
 	blk_queue_rq_timeout(q, 30000);
-- 
1.9.5 (Apple Git-50.3)


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/7] blk-mq: add blk_mq_init_allocated_queue and export blk_mq_register_disk
  2015-03-12  3:56 [PATCH 0/7] dm: add full blk-mq support to request-based DM Mike Snitzer
  2015-03-12  3:56 ` [PATCH 1/7] blk-mq: fix use of incorrect goto label in blk_mq_init_queue error path Mike Snitzer
@ 2015-03-12  3:56 ` Mike Snitzer
  2015-03-12  8:10   ` Ming Lei
  2015-03-12 14:29   ` [PATCH 2/7 v2] " Mike Snitzer
  2015-03-12  3:56 ` [PATCH 3/7] blk-mq: export blk_mq_run_hw_queues Mike Snitzer
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-12  3:56 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch; +Cc: linux-kernel, linux-scsi, dm-devel

Add a variant of blk_mq_init_queue that allows a previously allocated
queue to be initialized.  blk_mq_init_allocated_queue models
blk_init_allocated_queue -- which was also created for DM's use.

DM's approach to device creation requires that a placeholder request_queue
be allocated for use with alloc_dev(), but the decision about what type of
request_queue will ultimately be created is deferred until all component
devices referenced in the DM table are processed to determine the table
type (request-based, blk-mq request-based, or bio-based).
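
For illustration only (not part of this patch), a minimal sketch of the
intended calling pattern -- error handling abbreviated, and the 'md'
fields here are just stand-ins for DM's actual members:

  /* at device creation: allocate a bare queue, final type still unknown */
  md->queue = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE);
  if (!md->queue)
          return -ENOMEM;

  /* later, once the table type is known to be blk-mq request-based */
  err = blk_mq_alloc_tag_set(&md->tag_set);
  if (err)
          return err;

  q = blk_mq_init_allocated_queue(&md->tag_set, md->queue);
  if (IS_ERR(q)) {
          blk_mq_free_tag_set(&md->tag_set);
          return PTR_ERR(q);
  }
  md->queue = q;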

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-mq-sysfs.c   |  1 +
 block/blk-mq.c         | 28 +++++++++++++++++++---------
 include/linux/blk-mq.h |  2 ++
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 1630a20..b79685e 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -436,6 +436,7 @@ int blk_mq_register_disk(struct gendisk *disk)
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(blk_mq_register_disk);
 
 void blk_mq_sysfs_unregister(struct request_queue *q)
 {
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 459840c..d576a78 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1891,9 +1891,25 @@ void blk_mq_release(struct request_queue *q)
 
 struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 {
+	struct request_queue *uninit_q, *q;
+
+	uninit_q = blk_alloc_queue_node(GFP_KERNEL, set->numa_node);
+	if (!uninit_q)
+		return ERR_PTR(-ENOMEM);
+
+	q = blk_mq_init_allocated_queue(set, uninit_q);
+	if (!q)
+		blk_cleanup_queue(uninit_q);
+
+	return q;
+}
+EXPORT_SYMBOL(blk_mq_init_queue);
+
+struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
+						  struct request_queue *q)
+{
 	struct blk_mq_hw_ctx **hctxs;
 	struct blk_mq_ctx __percpu *ctx;
-	struct request_queue *q;
 	unsigned int *map;
 	int i;
 
@@ -1928,10 +1944,6 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 		hctxs[i]->queue_num = i;
 	}
 
-	q = blk_alloc_queue_node(GFP_KERNEL, set->numa_node);
-	if (!q)
-		goto err_hctxs;
-
 	/*
 	 * Init percpu_ref in atomic mode so that it's faster to shutdown.
 	 * See blk_register_queue() for details.
@@ -1981,7 +1993,7 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 	blk_mq_init_cpu_queues(q, set->nr_hw_queues);
 
 	if (blk_mq_init_hw_queues(q, set))
-		goto err_hw;
+		goto err_hctxs;
 
 	mutex_lock(&all_q_mutex);
 	list_add_tail(&q->all_q_node, &all_q_list);
@@ -1993,8 +2005,6 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 
 	return q;
 
-err_hw:
-	blk_cleanup_queue(q);
 err_hctxs:
 	kfree(map);
 	for (i = 0; i < set->nr_hw_queues; i++) {
@@ -2009,7 +2019,7 @@ err_percpu:
 	free_percpu(ctx);
 	return ERR_PTR(-ENOMEM);
 }
-EXPORT_SYMBOL(blk_mq_init_queue);
+EXPORT_SYMBOL(blk_mq_init_allocated_queue);
 
 void blk_mq_free_queue(struct request_queue *q)
 {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 7aec861..9a75c88 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -164,6 +164,8 @@ enum {
 		<< BLK_MQ_F_ALLOC_POLICY_START_BIT)
 
 struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *);
+struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
+						  struct request_queue *q);
 void blk_mq_finish_init(struct request_queue *q);
 int blk_mq_register_disk(struct gendisk *);
 void blk_mq_unregister_disk(struct gendisk *);
-- 
1.9.5 (Apple Git-50.3)


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/7] blk-mq: export blk_mq_run_hw_queues
  2015-03-12  3:56 [PATCH 0/7] dm: add full blk-mq support to request-based DM Mike Snitzer
  2015-03-12  3:56 ` [PATCH 1/7] blk-mq: fix use of incorrect goto label in blk_mq_init_queue error path Mike Snitzer
  2015-03-12  3:56 ` [PATCH 2/7] blk-mq: add blk_mq_init_allocated_queue and export blk_mq_register_disk Mike Snitzer
@ 2015-03-12  3:56 ` Mike Snitzer
  2015-03-12  3:56 ` [PATCH 4/7] blk-mq: don't wait in blk_mq_queue_enter() if __GFP_WAIT isn't set Mike Snitzer
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-12  3:56 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch; +Cc: linux-kernel, linux-scsi, dm-devel

Rename blk_mq_run_queues to blk_mq_run_hw_queues, add an async argument,
and export it.

DM's suspend support must be able to run the queue without starting
stopped hw queues.
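
For illustration only (not part of this patch), a sketch of how a
stacking driver can kick whichever queue type it ended up with without
restarting stopped hw queues ('q' is the top-level request_queue):

  if (q->mq_ops) {
          blk_mq_run_hw_queues(q, true);
  } else {
          unsigned long flags;

          spin_lock_irqsave(q->queue_lock, flags);
          blk_run_queue_async(q);
          spin_unlock_irqrestore(q->queue_lock, flags);
  }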

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-mq.c         | 8 ++++----
 include/linux/blk-mq.h | 1 +
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index d576a78..5706dd9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -33,7 +33,6 @@ static DEFINE_MUTEX(all_q_mutex);
 static LIST_HEAD(all_q_list);
 
 static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx);
-static void blk_mq_run_queues(struct request_queue *q);
 
 /*
  * Check if any of the ctx's have pending work in this hardware queue
@@ -118,7 +117,7 @@ void blk_mq_freeze_queue_start(struct request_queue *q)
 
 	if (freeze) {
 		percpu_ref_kill(&q->mq_usage_counter);
-		blk_mq_run_queues(q);
+		blk_mq_run_hw_queues(q, false);
 	}
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_start);
@@ -904,7 +903,7 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 			&hctx->run_work, 0);
 }
 
-static void blk_mq_run_queues(struct request_queue *q)
+void blk_mq_run_hw_queues(struct request_queue *q, bool async)
 {
 	struct blk_mq_hw_ctx *hctx;
 	int i;
@@ -915,9 +914,10 @@ static void blk_mq_run_queues(struct request_queue *q)
 		    test_bit(BLK_MQ_S_STOPPED, &hctx->state))
 			continue;
 
-		blk_mq_run_hw_queue(hctx, false);
+		blk_mq_run_hw_queue(hctx, async);
 	}
 }
+EXPORT_SYMBOL(blk_mq_run_hw_queues);
 
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 9a75c88..ebfe707 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -220,6 +220,7 @@ void blk_mq_start_hw_queue(struct blk_mq_hw_ctx *hctx);
 void blk_mq_stop_hw_queues(struct request_queue *q);
 void blk_mq_start_hw_queues(struct request_queue *q);
 void blk_mq_start_stopped_hw_queues(struct request_queue *q, bool async);
+void blk_mq_run_hw_queues(struct request_queue *q, bool async);
 void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
 void blk_mq_tag_busy_iter(struct blk_mq_hw_ctx *hctx, busy_iter_fn *fn,
 		void *priv);
-- 
1.9.5 (Apple Git-50.3)


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/7] blk-mq: don't wait in blk_mq_queue_enter() if __GFP_WAIT isn't set
  2015-03-12  3:56 [PATCH 0/7] dm: add full blk-mq support to request-based DM Mike Snitzer
                   ` (2 preceding siblings ...)
  2015-03-12  3:56 ` [PATCH 3/7] blk-mq: export blk_mq_run_hw_queues Mike Snitzer
@ 2015-03-12  3:56 ` Mike Snitzer
  2015-03-12  3:56 ` [PATCH 5/7] dm: add full blk-mq support to request-based DM Mike Snitzer
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-12  3:56 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch; +Cc: linux-kernel, linux-scsi, dm-devel

From: Keith Busch <keith.busch@intel.com>

Return -EBUSY if we're unable to enter a queue immediately when
allocating a blk-mq request without __GFP_WAIT.
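
For illustration only (not part of this patch), a sketch of the sort of
caller this enables -- allocating a clone from an underlying blk-mq queue
in atomic context and requeueing rather than blocking (DM_MAPIO_REQUEUE
is dm-mpath's "requeue the original request" return code):

  clone = blk_get_request(bdev_get_queue(bdev), rq_data_dir(rq), GFP_ATOMIC);
  if (IS_ERR(clone))
          /* -ENOMEM or -EBUSY: have the caller requeue the original */
          return DM_MAPIO_REQUEUE;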

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-mq.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5706dd9..01b2912 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -77,7 +77,7 @@ static void blk_mq_hctx_clear_pending(struct blk_mq_hw_ctx *hctx,
 	clear_bit(CTX_TO_BIT(hctx, ctx), &bm->word);
 }
 
-static int blk_mq_queue_enter(struct request_queue *q)
+static int blk_mq_queue_enter(struct request_queue *q, gfp_t gfp)
 {
 	while (true) {
 		int ret;
@@ -85,6 +85,9 @@ static int blk_mq_queue_enter(struct request_queue *q)
 		if (percpu_ref_tryget_live(&q->mq_usage_counter))
 			return 0;
 
+		if (!(gfp & __GFP_WAIT))
+			return -EBUSY;
+
 		ret = wait_event_interruptible(q->mq_freeze_wq,
 				!q->mq_freeze_depth || blk_queue_dying(q));
 		if (blk_queue_dying(q))
@@ -256,7 +259,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, int rw, gfp_t gfp,
 	struct blk_mq_alloc_data alloc_data;
 	int ret;
 
-	ret = blk_mq_queue_enter(q);
+	ret = blk_mq_queue_enter(q, gfp);
 	if (ret)
 		return ERR_PTR(ret);
 
@@ -1186,7 +1189,7 @@ static struct request *blk_mq_map_request(struct request_queue *q,
 	int rw = bio_data_dir(bio);
 	struct blk_mq_alloc_data alloc_data;
 
-	if (unlikely(blk_mq_queue_enter(q))) {
+	if (unlikely(blk_mq_queue_enter(q, GFP_KERNEL))) {
 		bio_endio(bio, -EIO);
 		return NULL;
 	}
-- 
1.9.5 (Apple Git-50.3)


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/7] dm: add full blk-mq support to request-based DM
  2015-03-12  3:56 [PATCH 0/7] dm: add full blk-mq support to request-based DM Mike Snitzer
                   ` (3 preceding siblings ...)
  2015-03-12  3:56 ` [PATCH 4/7] blk-mq: don't wait in blk_mq_queue_enter() if __GFP_WAIT isn't set Mike Snitzer
@ 2015-03-12  3:56 ` Mike Snitzer
  2015-03-12  3:56 ` [PATCH 6/7] dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq Mike Snitzer
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-12  3:56 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch; +Cc: linux-kernel, linux-scsi, dm-devel

Commit e5863d9ad ("dm: allocate requests in target when stacking on
blk-mq devices") served as the first step toward fully utilizing blk-mq
in request-based DM -- it enabled stacking a old-style (request_fn)
request_queue ontop of the underlying blk-mq device(s).  This first step
didn't improve performance of DM multipath ontop of fast blk-mq devices
(e.g. NVMe) because the top-level old-style request_queue was severely
limited by the queue_lock.

The second step offered here enables stacking a blk-mq request_queue
on top of the underlying blk-mq device(s).  This unlocks significant
performance gains on fast blk-mq devices.  Keith Busch tested this on his
NVMe testbed and offered this really positive news:

 "Just providing a performance update. All my fio tests are getting
  roughly equal performance whether accessed through the raw block
  device or the multipath device mapper (~470k IOPS). I could only push
  ~20% of the raw iops through dm before this conversion, so this latest
  tree is looking really solid from a performance standpoint."

Tested-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-mpath.c |   4 +-
 drivers/md/dm-table.c |  11 +-
 drivers/md/dm.c       | 326 +++++++++++++++++++++++++++++++++++++++-----------
 3 files changed, 269 insertions(+), 72 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index add6391..2bb6599 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -428,9 +428,9 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
 	} else {
 		/* blk-mq request-based interface */
 		*__clone = blk_get_request(bdev_get_queue(bdev),
-					   rq_data_dir(rq), GFP_KERNEL);
+					   rq_data_dir(rq), GFP_ATOMIC);
 		if (IS_ERR(*__clone))
-			/* ENOMEM, requeue */
+			/* ENOMEM or EBUSY, requeue */
 			return r;
 		(*__clone)->bio = (*__clone)->biotail = NULL;
 		(*__clone)->rq_disk = bdev->bd_disk;
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 0573120..66600ca 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -18,6 +18,7 @@
 #include <linux/mutex.h>
 #include <linux/delay.h>
 #include <linux/atomic.h>
+#include <linux/blk-mq.h>
 
 #define DM_MSG_PREFIX "table"
 
@@ -1695,9 +1696,13 @@ void dm_table_run_md_queue_async(struct dm_table *t)
 	md = dm_table_get_md(t);
 	queue = dm_get_md_queue(md);
 	if (queue) {
-		spin_lock_irqsave(queue->queue_lock, flags);
-		blk_run_queue_async(queue);
-		spin_unlock_irqrestore(queue->queue_lock, flags);
+		if (queue->mq_ops)
+			blk_mq_run_hw_queues(queue, true);
+		else {
+			spin_lock_irqsave(queue->queue_lock, flags);
+			blk_run_queue_async(queue);
+			spin_unlock_irqrestore(queue->queue_lock, flags);
+		}
 	}
 }
 EXPORT_SYMBOL(dm_table_run_md_queue_async);
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b8cb2d5..b5409ac 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -23,6 +23,7 @@
 #include <linux/kthread.h>
 #include <linux/ktime.h>
 #include <linux/elevator.h> /* for rq_end_sector() */
+#include <linux/blk-mq.h>
 
 #include <trace/events/block.h>
 
@@ -224,6 +225,9 @@ struct mapped_device {
 	int last_rq_rw;
 	sector_t last_rq_pos;
 	ktime_t last_rq_start_time;
+
+	/* for blk-mq request-based DM support */
+	struct blk_mq_tag_set tag_set;
 };
 
 /*
@@ -1022,6 +1026,11 @@ static void end_clone_bio(struct bio *clone, int error)
 	blk_update_request(tio->orig, 0, nr_bytes);
 }
 
+static struct dm_rq_target_io *tio_from_request(struct request *rq)
+{
+	return (rq->q->mq_ops ? blk_mq_rq_to_pdu(rq) : rq->special);
+}
+
 /*
  * Don't touch any member of the md after calling this function because
  * the md may be freed in dm_put() at the end of this function.
@@ -1045,8 +1054,10 @@ static void rq_completed(struct mapped_device *md, int rw, bool run_queue)
 	 * queue lock again.
 	 */
 	if (run_queue) {
-		if (!nr_requests_pending ||
-		    (nr_requests_pending >= md->queue->nr_congestion_on))
+		if (md->queue->mq_ops)
+			blk_mq_run_hw_queues(md->queue, true);
+		else if (!nr_requests_pending ||
+			 (nr_requests_pending >= md->queue->nr_congestion_on))
 			blk_run_queue_async(md->queue);
 	}
 
@@ -1059,13 +1070,17 @@ static void rq_completed(struct mapped_device *md, int rw, bool run_queue)
 static void free_rq_clone(struct request *clone)
 {
 	struct dm_rq_target_io *tio = clone->end_io_data;
+	struct mapped_device *md = tio->md;
 
 	blk_rq_unprep_clone(clone);
+
 	if (clone->q && clone->q->mq_ops)
 		tio->ti->type->release_clone_rq(clone);
 	else
-		free_clone_request(tio->md, clone);
-	free_rq_tio(tio);
+		free_clone_request(md, clone);
+
+	if (!md->queue->mq_ops)
+		free_rq_tio(tio);
 }
 
 /*
@@ -1094,17 +1109,22 @@ static void dm_end_request(struct request *clone, int error)
 	}
 
 	free_rq_clone(clone);
-	blk_end_request_all(rq, error);
+	if (!rq->q->mq_ops)
+		blk_end_request_all(rq, error);
+	else
+		blk_mq_end_request(rq, error);
 	rq_completed(md, rw, true);
 }
 
 static void dm_unprep_request(struct request *rq)
 {
-	struct dm_rq_target_io *tio = rq->special;
+	struct dm_rq_target_io *tio = tio_from_request(rq);
 	struct request *clone = tio->clone;
 
-	rq->special = NULL;
-	rq->cmd_flags &= ~REQ_DONTPREP;
+	if (!rq->q->mq_ops) {
+		rq->special = NULL;
+		rq->cmd_flags &= ~REQ_DONTPREP;
+	}
 
 	if (clone)
 		free_rq_clone(clone);
@@ -1113,18 +1133,29 @@ static void dm_unprep_request(struct request *rq)
 /*
  * Requeue the original request of a clone.
  */
-static void dm_requeue_unmapped_original_request(struct mapped_device *md,
-						 struct request *rq)
+static void old_requeue_request(struct request *rq)
 {
-	int rw = rq_data_dir(rq);
 	struct request_queue *q = rq->q;
 	unsigned long flags;
 
-	dm_unprep_request(rq);
-
 	spin_lock_irqsave(q->queue_lock, flags);
 	blk_requeue_request(q, rq);
 	spin_unlock_irqrestore(q->queue_lock, flags);
+}
+
+static void dm_requeue_unmapped_original_request(struct mapped_device *md,
+						 struct request *rq)
+{
+	int rw = rq_data_dir(rq);
+
+	dm_unprep_request(rq);
+
+	if (!rq->q->mq_ops)
+		old_requeue_request(rq);
+	else {
+		blk_mq_requeue_request(rq);
+		blk_mq_kick_requeue_list(rq->q);
+	}
 
 	rq_completed(md, rw, false);
 }
@@ -1136,35 +1167,44 @@ static void dm_requeue_unmapped_request(struct request *clone)
 	dm_requeue_unmapped_original_request(tio->md, tio->orig);
 }
 
-static void __stop_queue(struct request_queue *q)
-{
-	blk_stop_queue(q);
-}
-
-static void stop_queue(struct request_queue *q)
+static void old_stop_queue(struct request_queue *q)
 {
 	unsigned long flags;
 
+	if (blk_queue_stopped(q))
+		return;
+
 	spin_lock_irqsave(q->queue_lock, flags);
-	__stop_queue(q);
+	blk_stop_queue(q);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 }
 
-static void __start_queue(struct request_queue *q)
+static void stop_queue(struct request_queue *q)
 {
-	if (blk_queue_stopped(q))
-		blk_start_queue(q);
+	if (!q->mq_ops)
+		old_stop_queue(q);
+	else
+		blk_mq_stop_hw_queues(q);
 }
 
-static void start_queue(struct request_queue *q)
+static void old_start_queue(struct request_queue *q)
 {
 	unsigned long flags;
 
 	spin_lock_irqsave(q->queue_lock, flags);
-	__start_queue(q);
+	if (blk_queue_stopped(q))
+		blk_start_queue(q);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 }
 
+static void start_queue(struct request_queue *q)
+{
+	if (!q->mq_ops)
+		old_start_queue(q);
+	else
+		blk_mq_start_stopped_hw_queues(q, true);
+}
+
 static void dm_done(struct request *clone, int error, bool mapped)
 {
 	int r = error;
@@ -1203,13 +1243,20 @@ static void dm_done(struct request *clone, int error, bool mapped)
 static void dm_softirq_done(struct request *rq)
 {
 	bool mapped = true;
-	struct dm_rq_target_io *tio = rq->special;
+	struct dm_rq_target_io *tio = tio_from_request(rq);
 	struct request *clone = tio->clone;
+	int rw;
 
 	if (!clone) {
-		blk_end_request_all(rq, tio->error);
-		rq_completed(tio->md, rq_data_dir(rq), false);
-		free_rq_tio(tio);
+		rw = rq_data_dir(rq);
+		if (!rq->q->mq_ops) {
+			blk_end_request_all(rq, tio->error);
+			rq_completed(tio->md, rw, false);
+			free_rq_tio(tio);
+		} else {
+			blk_mq_end_request(rq, tio->error);
+			rq_completed(tio->md, rw, false);
+		}
 		return;
 	}
 
@@ -1225,7 +1272,7 @@ static void dm_softirq_done(struct request *rq)
  */
 static void dm_complete_request(struct request *rq, int error)
 {
-	struct dm_rq_target_io *tio = rq->special;
+	struct dm_rq_target_io *tio = tio_from_request(rq);
 
 	tio->error = error;
 	blk_complete_request(rq);
@@ -1244,7 +1291,7 @@ static void dm_kill_unmapped_request(struct request *rq, int error)
 }
 
 /*
- * Called with the clone's queue lock held
+ * Called with the clone's queue lock held (for non-blk-mq)
  */
 static void end_clone_request(struct request *clone, int error)
 {
@@ -1805,6 +1852,18 @@ static struct request *clone_rq(struct request *rq, struct mapped_device *md,
 
 static void map_tio_request(struct kthread_work *work);
 
+static void init_tio(struct dm_rq_target_io *tio, struct request *rq,
+		     struct mapped_device *md)
+{
+	tio->md = md;
+	tio->ti = NULL;
+	tio->clone = NULL;
+	tio->orig = rq;
+	tio->error = 0;
+	memset(&tio->info, 0, sizeof(tio->info));
+	init_kthread_work(&tio->work, map_tio_request);
+}
+
 static struct dm_rq_target_io *prep_tio(struct request *rq,
 					struct mapped_device *md, gfp_t gfp_mask)
 {
@@ -1816,13 +1875,7 @@ static struct dm_rq_target_io *prep_tio(struct request *rq,
 	if (!tio)
 		return NULL;
 
-	tio->md = md;
-	tio->ti = NULL;
-	tio->clone = NULL;
-	tio->orig = rq;
-	tio->error = 0;
-	memset(&tio->info, 0, sizeof(tio->info));
-	init_kthread_work(&tio->work, map_tio_request);
+	init_tio(tio, rq, md);
 
 	table = dm_get_live_table(md, &srcu_idx);
 	if (!dm_table_mq_request_based(table)) {
@@ -1866,11 +1919,11 @@ static int dm_prep_fn(struct request_queue *q, struct request *rq)
  * DM_MAPIO_REQUEUE : the original request needs to be requeued
  * < 0              : the request was completed due to failure
  */
-static int map_request(struct dm_target *ti, struct request *rq,
+static int map_request(struct dm_rq_target_io *tio, struct request *rq,
 		       struct mapped_device *md)
 {
 	int r;
-	struct dm_rq_target_io *tio = rq->special;
+	struct dm_target *ti = tio->ti;
 	struct request *clone = NULL;
 
 	if (tio->clone) {
@@ -1885,7 +1938,7 @@ static int map_request(struct dm_target *ti, struct request *rq,
 		}
 		if (IS_ERR(clone))
 			return DM_MAPIO_REQUEUE;
-		if (setup_clone(clone, rq, tio, GFP_KERNEL)) {
+		if (setup_clone(clone, rq, tio, GFP_NOIO)) {
 			/* -ENOMEM */
 			ti->type->release_clone_rq(clone);
 			return DM_MAPIO_REQUEUE;
@@ -1926,13 +1979,16 @@ static void map_tio_request(struct kthread_work *work)
 	struct request *rq = tio->orig;
 	struct mapped_device *md = tio->md;
 
-	if (map_request(tio->ti, rq, md) == DM_MAPIO_REQUEUE)
+	if (map_request(tio, rq, md) == DM_MAPIO_REQUEUE)
 		dm_requeue_unmapped_original_request(md, rq);
 }
 
 static void dm_start_request(struct mapped_device *md, struct request *orig)
 {
-	blk_start_request(orig);
+	if (!orig->q->mq_ops)
+		blk_start_request(orig);
+	else
+		blk_mq_start_request(orig);
 	atomic_inc(&md->pending[rq_data_dir(orig)]);
 
 	if (md->seq_rq_merge_deadline_usecs) {
@@ -2042,7 +2098,7 @@ static void dm_request_fn(struct request_queue *q)
 
 		dm_start_request(md, rq);
 
-		tio = rq->special;
+		tio = tio_from_request(rq);
 		/* Establish tio->ti before queuing work (map_tio_request) */
 		tio->ti = ti;
 		queue_kthread_work(&md->kworker, &tio->work);
@@ -2139,7 +2195,7 @@ static void dm_init_md_queue(struct mapped_device *md)
 {
 	/*
 	 * Request-based dm devices cannot be stacked on top of bio-based dm
-	 * devices.  The type of this dm device has not been decided yet.
+	 * devices.  The type of this dm device may not have been decided yet.
 	 * The type is decided at the first table loading time.
 	 * To prevent problematic device stacking, clear the queue flag
 	 * for request stacking support until then.
@@ -2147,7 +2203,15 @@ static void dm_init_md_queue(struct mapped_device *md)
 	 * This queue is new, so no concurrency on the queue_flags.
 	 */
 	queue_flag_clear_unlocked(QUEUE_FLAG_STACKABLE, md->queue);
+}
 
+static void dm_init_old_md_queue(struct mapped_device *md)
+{
+	dm_init_md_queue(md);
+
+	/*
+	 * Initialize aspects of queue that aren't relevant for blk-mq
+	 */
 	md->queue->queuedata = md;
 	md->queue->backing_dev_info.congested_fn = dm_any_congested;
 	md->queue->backing_dev_info.congested_data = md;
@@ -2270,6 +2334,7 @@ static void unlock_fs(struct mapped_device *md);
 static void free_dev(struct mapped_device *md)
 {
 	int minor = MINOR(disk_devt(md->disk));
+	bool using_blk_mq = !!md->queue->mq_ops;
 
 	unlock_fs(md);
 	bdput(md->bdev);
@@ -2295,6 +2360,8 @@ static void free_dev(struct mapped_device *md)
 
 	put_disk(md->disk);
 	blk_cleanup_queue(md->queue);
+	if (using_blk_mq)
+		blk_mq_free_tag_set(&md->tag_set);
 	dm_stats_cleanup(&md->stats);
 	module_put(THIS_MODULE);
 	kfree(md);
@@ -2452,7 +2519,7 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
 	 * This must be done before setting the queue restrictions,
 	 * because request-based dm may be run just after the setting.
 	 */
-	if (dm_table_request_based(t) && !blk_queue_stopped(q))
+	if (dm_table_request_based(t))
 		stop_queue(q);
 
 	__bind_mempools(md, t);
@@ -2534,14 +2601,6 @@ unsigned dm_get_md_type(struct mapped_device *md)
 	return md->type;
 }
 
-static bool dm_md_type_request_based(struct mapped_device *md)
-{
-	unsigned table_type = dm_get_md_type(md);
-
-	return (table_type == DM_TYPE_REQUEST_BASED ||
-		table_type == DM_TYPE_MQ_REQUEST_BASED);
-}
-
 struct target_type *dm_get_immutable_target_type(struct mapped_device *md)
 {
 	return md->immutable_target_type;
@@ -2558,6 +2617,14 @@ struct queue_limits *dm_get_queue_limits(struct mapped_device *md)
 }
 EXPORT_SYMBOL_GPL(dm_get_queue_limits);
 
+static void init_rq_based_worker_thread(struct mapped_device *md)
+{
+	/* Initialize the request-based DM worker thread */
+	init_kthread_worker(&md->kworker);
+	md->kworker_task = kthread_run(kthread_worker_fn, &md->kworker,
+				       "kdmwork-%s", dm_device_name(md));
+}
+
 /*
  * Fully initialize a request-based queue (->elevator, ->request_fn, etc).
  */
@@ -2566,29 +2633,140 @@ static int dm_init_request_based_queue(struct mapped_device *md)
 	struct request_queue *q = NULL;
 
 	if (md->queue->elevator)
-		return 1;
+		return 0;
 
 	/* Fully initialize the queue */
 	q = blk_init_allocated_queue(md->queue, dm_request_fn, NULL);
 	if (!q)
-		return 0;
+		return -EINVAL;
 
 	/* disable dm_request_fn's merge heuristic by default */
 	md->seq_rq_merge_deadline_usecs = 0;
 
 	md->queue = q;
-	dm_init_md_queue(md);
+	dm_init_old_md_queue(md);
 	blk_queue_softirq_done(md->queue, dm_softirq_done);
 	blk_queue_prep_rq(md->queue, dm_prep_fn);
 
-	/* Also initialize the request-based DM worker thread */
-	init_kthread_worker(&md->kworker);
-	md->kworker_task = kthread_run(kthread_worker_fn, &md->kworker,
-				       "kdmwork-%s", dm_device_name(md));
+	init_rq_based_worker_thread(md);
 
 	elv_register_queue(md->queue);
 
-	return 1;
+	return 0;
+}
+
+static int dm_mq_init_request(void *data, struct request *rq,
+			      unsigned int hctx_idx, unsigned int request_idx,
+			      unsigned int numa_node)
+{
+	struct mapped_device *md = data;
+	struct dm_rq_target_io *tio = blk_mq_rq_to_pdu(rq);
+
+	/*
+	 * Must initialize md member of tio, otherwise it won't
+	 * be available in dm_mq_queue_rq.
+	 */
+	tio->md = md;
+
+	return 0;
+}
+
+static int dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
+			  const struct blk_mq_queue_data *bd)
+{
+	struct request *rq = bd->rq;
+	struct dm_rq_target_io *tio = blk_mq_rq_to_pdu(rq);
+	struct mapped_device *md = tio->md;
+	int srcu_idx;
+	struct dm_table *map = dm_get_live_table(md, &srcu_idx);
+	struct dm_target *ti;
+	sector_t pos;
+
+	/* always use block 0 to find the target for flushes for now */
+	pos = 0;
+	if (!(rq->cmd_flags & REQ_FLUSH))
+		pos = blk_rq_pos(rq);
+
+	ti = dm_table_find_target(map, pos);
+	if (!dm_target_is_valid(ti)) {
+		dm_put_live_table(md, srcu_idx);
+		DMERR_LIMIT("request attempted access beyond the end of device");
+		/*
+		 * Must perform setup, that rq_completed() requires,
+		 * before returning BLK_MQ_RQ_QUEUE_ERROR
+		 */
+		dm_start_request(md, rq);
+		return BLK_MQ_RQ_QUEUE_ERROR;
+	}
+	dm_put_live_table(md, srcu_idx);
+
+	if (ti->type->busy && ti->type->busy(ti))
+		return BLK_MQ_RQ_QUEUE_BUSY;
+
+	dm_start_request(md, rq);
+
+	/* Init tio using md established in .init_request */
+	init_tio(tio, rq, md);
+
+	/* Clone the request if underlying devices aren't blk-mq */
+	if (dm_table_get_type(map) == DM_TYPE_REQUEST_BASED) {
+		// FIXME: make the memory for clone part of the pdu
+		if (!clone_rq(rq, md, tio, GFP_ATOMIC))
+			return BLK_MQ_RQ_QUEUE_BUSY;
+	}
+
+	/* Establish tio->ti before queuing work (map_tio_request) */
+	tio->ti = ti;
+	queue_kthread_work(&md->kworker, &tio->work);
+
+	return BLK_MQ_RQ_QUEUE_OK;
+}
+
+static struct blk_mq_ops dm_mq_ops = {
+	.queue_rq = dm_mq_queue_rq,
+	.map_queue = blk_mq_map_queue,
+	.complete = dm_softirq_done,
+	.init_request = dm_mq_init_request,
+};
+
+static int dm_init_request_based_blk_mq_queue(struct mapped_device *md)
+{
+	struct request_queue *q;
+	int err;
+
+	memset(&md->tag_set, 0, sizeof(md->tag_set));
+	md->tag_set.ops = &dm_mq_ops;
+	md->tag_set.queue_depth = BLKDEV_MAX_RQ;
+	md->tag_set.numa_node = NUMA_NO_NODE;
+	md->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
+	md->tag_set.nr_hw_queues = 1;
+	// FIXME: make the memory for non-blk-mq clone part of the pdu
+	// would need to be done only if new 'use_blk_mq' is set in DM sysfs
+	md->tag_set.cmd_size = sizeof(struct dm_rq_target_io);
+	md->tag_set.driver_data = md;
+
+	err = blk_mq_alloc_tag_set(&md->tag_set);
+	if (err)
+		return err;
+
+	q = blk_mq_init_allocated_queue(&md->tag_set, md->queue);
+	if (IS_ERR(q)) {
+		err = PTR_ERR(q);
+		goto out_tag_set;
+	}
+	md->queue = q;
+	dm_init_md_queue(md);
+
+	/* backfill 'mq' sysfs registration normally done in blk_register_queue */
+	blk_mq_register_disk(md->disk);
+
+	init_rq_based_worker_thread(md);
+
+	return 0;
+
+out_tag_set:
+	blk_mq_free_tag_set(&md->tag_set);
+	return err;
 }
 
 /*
@@ -2596,15 +2774,29 @@ static int dm_init_request_based_queue(struct mapped_device *md)
  */
 int dm_setup_md_queue(struct mapped_device *md)
 {
-	if (dm_md_type_request_based(md)) {
-		if (!dm_init_request_based_queue(md)) {
+	int r;
+	unsigned md_type = dm_get_md_type(md);
+
+	switch (md_type) {
+	case DM_TYPE_REQUEST_BASED:
+		r = dm_init_request_based_queue(md);
+		if (r) {
 			DMWARN("Cannot initialize queue for request-based mapped device");
-			return -EINVAL;
+			return r;
 		}
-	} else {
-		/* bio-based specific initialization */
+		break;
+	case DM_TYPE_MQ_REQUEST_BASED:
+		r = dm_init_request_based_blk_mq_queue(md);
+		if (r) {
+			DMWARN("Cannot initialize queue for request-based blk-mq mapped device");
+			return r;
+		}
+		break;
+	case DM_TYPE_BIO_BASED:
+		dm_init_old_md_queue(md);
 		blk_queue_make_request(md->queue, dm_make_request);
 		blk_queue_merge_bvec(md->queue, dm_merge_bvec);
+		break;
 	}
 
 	return 0;
-- 
1.9.5 (Apple Git-50.3)


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/7] dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq
  2015-03-12  3:56 [PATCH 0/7] dm: add full blk-mq support to request-based DM Mike Snitzer
                   ` (4 preceding siblings ...)
  2015-03-12  3:56 ` [PATCH 5/7] dm: add full blk-mq support to request-based DM Mike Snitzer
@ 2015-03-12  3:56 ` Mike Snitzer
  2015-03-12  3:56 ` [PATCH 7/7] dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr Mike Snitzer
  2015-03-13 14:34 ` [PATCH 0/7] dm: add full blk-mq support to request-based DM Jens Axboe
  7 siblings, 0 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-12  3:56 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch; +Cc: linux-kernel, linux-scsi, dm-devel

dm_mq_queue_rq() is called in atomic context, so care must be taken not
to sleep -- as such, GFP_ATOMIC is used for the md->bs bioset allocations
and dm-mpath's call to blk_get_request().  In the future the bioset
allocations will hopefully go away (by removing support for partial
completions of a request).

But the kthread will still be used to queue work if blk-mq is used on top
of old-style request_fn device(s).  Also prepare for supporting DM
blk-mq on top of old-style request_fn device(s) if the new dm-mod
'use_blk_mq' parameter is set.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm.c | 65 +++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 45 insertions(+), 20 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b5409ac..b0c965a 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1074,9 +1074,10 @@ static void free_rq_clone(struct request *clone)
 
 	blk_rq_unprep_clone(clone);
 
-	if (clone->q && clone->q->mq_ops)
+	if (clone->q->mq_ops)
 		tio->ti->type->release_clone_rq(clone);
-	else
+	else if (!md->queue->mq_ops)
+		/* request_fn queue stacked on request_fn queue(s) */
 		free_clone_request(md, clone);
 
 	if (!md->queue->mq_ops)
@@ -1835,15 +1836,25 @@ static int setup_clone(struct request *clone, struct request *rq,
 static struct request *clone_rq(struct request *rq, struct mapped_device *md,
 				struct dm_rq_target_io *tio, gfp_t gfp_mask)
 {
-	struct request *clone = alloc_clone_request(md, gfp_mask);
+	/*
+	 * Do not allocate a clone if tio->clone was already set
+	 * (see: dm_mq_queue_rq).
+	 */
+	bool alloc_clone = !tio->clone;
+	struct request *clone;
 
-	if (!clone)
-		return NULL;
+	if (alloc_clone) {
+		clone = alloc_clone_request(md, gfp_mask);
+		if (!clone)
+			return NULL;
+	} else
+		clone = tio->clone;
 
 	blk_rq_init(NULL, clone);
 	if (setup_clone(clone, rq, tio, gfp_mask)) {
 		/* -ENOMEM */
-		free_clone_request(md, clone);
+		if (alloc_clone)
+			free_clone_request(md, clone);
 		return NULL;
 	}
 
@@ -1861,7 +1872,8 @@ static void init_tio(struct dm_rq_target_io *tio, struct request *rq,
 	tio->orig = rq;
 	tio->error = 0;
 	memset(&tio->info, 0, sizeof(tio->info));
-	init_kthread_work(&tio->work, map_tio_request);
+	if (md->kworker_task)
+		init_kthread_work(&tio->work, map_tio_request);
 }
 
 static struct dm_rq_target_io *prep_tio(struct request *rq,
@@ -1938,7 +1950,7 @@ static int map_request(struct dm_rq_target_io *tio, struct request *rq,
 		}
 		if (IS_ERR(clone))
 			return DM_MAPIO_REQUEUE;
-		if (setup_clone(clone, rq, tio, GFP_NOIO)) {
+		if (setup_clone(clone, rq, tio, GFP_ATOMIC)) {
 			/* -ENOMEM */
 			ti->type->release_clone_rq(clone);
 			return DM_MAPIO_REQUEUE;
@@ -2403,7 +2415,7 @@ static void __bind_mempools(struct mapped_device *md, struct dm_table *t)
 	p->bs = NULL;
 
 out:
-	/* mempool bind completed, now no need any mempools in the table */
+	/* mempool bind completed, no longer need any mempools in the table */
 	dm_table_free_md_mempools(t);
 }
 
@@ -2708,17 +2720,25 @@ static int dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 	/* Init tio using md established in .init_request */
 	init_tio(tio, rq, md);
 
+	/*
+	 * Establish tio->ti before queuing work (map_tio_request)
+	 * or making direct call to map_request().
+	 */
+	tio->ti = ti;
+
 	/* Clone the request if underlying devices aren't blk-mq */
 	if (dm_table_get_type(map) == DM_TYPE_REQUEST_BASED) {
-		// FIXME: make the memory for clone part of the pdu
+		/* clone request is allocated at the end of the pdu */
+		tio->clone = (void *)blk_mq_rq_to_pdu(rq) + sizeof(struct dm_rq_target_io);
 		if (!clone_rq(rq, md, tio, GFP_ATOMIC))
 			return BLK_MQ_RQ_QUEUE_BUSY;
+		queue_kthread_work(&md->kworker, &tio->work);
+	} else {
+		/* Direct call is fine since .queue_rq allows allocations */
+		if (map_request(tio, rq, md) == DM_MAPIO_REQUEUE)
+			dm_requeue_unmapped_original_request(md, rq);
 	}
 
-	/* Establish tio->ti before queuing work (map_tio_request) */
-	tio->ti = ti;
-	queue_kthread_work(&md->kworker, &tio->work);
-
 	return BLK_MQ_RQ_QUEUE_OK;
 }
 
@@ -2731,6 +2751,7 @@ static struct blk_mq_ops dm_mq_ops = {
 
 static int dm_init_request_based_blk_mq_queue(struct mapped_device *md)
 {
+	unsigned md_type = dm_get_md_type(md);
 	struct request_queue *q;
 	int err;
 
@@ -2740,9 +2761,11 @@ static int dm_init_request_based_blk_mq_queue(struct mapped_device *md)
 	md->tag_set.numa_node = NUMA_NO_NODE;
 	md->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
 	md->tag_set.nr_hw_queues = 1;
-	// FIXME: make the memory for non-blk-mq clone part of the pdu
-	// would need to be done only if new 'use_blk_mq' is set in DM sysfs
-	md->tag_set.cmd_size = sizeof(struct dm_rq_target_io);
+	if (md_type == DM_TYPE_REQUEST_BASED) {
+		/* make the memory for non-blk-mq clone part of the pdu */
+		md->tag_set.cmd_size = sizeof(struct dm_rq_target_io) + sizeof(struct request);
+	} else
+		md->tag_set.cmd_size = sizeof(struct dm_rq_target_io);
 	md->tag_set.driver_data = md;
 
 	err = blk_mq_alloc_tag_set(&md->tag_set);
@@ -2760,7 +2783,8 @@ static int dm_init_request_based_blk_mq_queue(struct mapped_device *md)
 	/* backfill 'mq' sysfs registration normally done in blk_register_queue */
 	blk_mq_register_disk(md->disk);
 
-	init_rq_based_worker_thread(md);
+	if (md_type == DM_TYPE_REQUEST_BASED)
+		init_rq_based_worker_thread(md);
 
 	return 0;
 
@@ -2879,7 +2903,7 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
 	set_bit(DMF_FREEING, &md->flags);
 	spin_unlock(&_minor_lock);
 
-	if (dm_request_based(md))
+	if (dm_request_based(md) && md->kworker_task)
 		flush_kthread_worker(&md->kworker);
 
 	/*
@@ -3133,7 +3157,8 @@ static int __dm_suspend(struct mapped_device *md, struct dm_table *map,
 	 */
 	if (dm_request_based(md)) {
 		stop_queue(md->queue);
-		flush_kthread_worker(&md->kworker);
+		if (md->kworker_task)
+			flush_kthread_worker(&md->kworker);
 	}
 
 	flush_workqueue(md->wq);
-- 
1.9.5 (Apple Git-50.3)


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 7/7] dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr
  2015-03-12  3:56 [PATCH 0/7] dm: add full blk-mq support to request-based DM Mike Snitzer
                   ` (5 preceding siblings ...)
  2015-03-12  3:56 ` [PATCH 6/7] dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq Mike Snitzer
@ 2015-03-12  3:56 ` Mike Snitzer
  2015-03-13 14:34 ` [PATCH 0/7] dm: add full blk-mq support to request-based DM Jens Axboe
  7 siblings, 0 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-12  3:56 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch; +Cc: linux-kernel, linux-scsi, dm-devel

Request-based DM's blk-mq support defaults to off, but a user can easily
change the default using the dm_mod.use_blk_mq module/boot option.

Also, you can check what mode a given request-based DM device is using
with: cat /sys/block/dm-X/dm/use_blk_mq

This change enables further cleanup and reduces work (e.g. the
md->io_pool and md->rq_pool aren't created if using blk-mq).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/Kconfig    | 11 +++++++++++
 drivers/md/dm-sysfs.c |  9 +++++++++
 drivers/md/dm-table.c |  6 +++---
 drivers/md/dm.c       | 51 ++++++++++++++++++++++++++++++++++++++++-----------
 drivers/md/dm.h       |  5 ++++-
 5 files changed, 67 insertions(+), 15 deletions(-)

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 63e05e3..109f9dc 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -196,6 +196,17 @@ config BLK_DEV_DM
 
 	  If unsure, say N.
 
+config DM_MQ_DEFAULT
+	bool "request-based DM: use blk-mq I/O path by default"
+	depends on BLK_DEV_DM
+	---help---
+	  This option enables the blk-mq based I/O path for request-based
+	  DM devices by default.  With the option the dm_mod.use_blk_mq
+	  module/boot option defaults to Y, without it to N, but it can
+	  still be overriden either way.
+
+	  If unsure say N.
+
 config DM_DEBUG
 	bool "Device mapper debugging support"
 	depends on BLK_DEV_DM
diff --git a/drivers/md/dm-sysfs.c b/drivers/md/dm-sysfs.c
index f5bb394..7e818f5 100644
--- a/drivers/md/dm-sysfs.c
+++ b/drivers/md/dm-sysfs.c
@@ -89,15 +89,24 @@ static ssize_t dm_attr_suspended_show(struct mapped_device *md, char *buf)
 	return strlen(buf);
 }
 
+static ssize_t dm_attr_use_blk_mq_show(struct mapped_device *md, char *buf)
+{
+	sprintf(buf, "%d\n", dm_use_blk_mq(md));
+
+	return strlen(buf);
+}
+
 static DM_ATTR_RO(name);
 static DM_ATTR_RO(uuid);
 static DM_ATTR_RO(suspended);
+static DM_ATTR_RO(use_blk_mq);
 static DM_ATTR_RW(rq_based_seq_io_merge_deadline);
 
 static struct attribute *dm_attrs[] = {
 	&dm_attr_name.attr,
 	&dm_attr_uuid.attr,
 	&dm_attr_suspended.attr,
+	&dm_attr_use_blk_mq.attr,
 	&dm_attr_rq_based_seq_io_merge_deadline.attr,
 	NULL,
 };
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 66600ca..8d025f3 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -940,7 +940,7 @@ bool dm_table_mq_request_based(struct dm_table *t)
 	return dm_table_get_type(t) == DM_TYPE_MQ_REQUEST_BASED;
 }
 
-static int dm_table_alloc_md_mempools(struct dm_table *t)
+static int dm_table_alloc_md_mempools(struct dm_table *t, struct mapped_device *md)
 {
 	unsigned type = dm_table_get_type(t);
 	unsigned per_bio_data_size = 0;
@@ -958,7 +958,7 @@ static int dm_table_alloc_md_mempools(struct dm_table *t)
 			per_bio_data_size = max(per_bio_data_size, tgt->per_bio_data_size);
 		}
 
-	t->mempools = dm_alloc_md_mempools(type, t->integrity_supported, per_bio_data_size);
+	t->mempools = dm_alloc_md_mempools(md, type, t->integrity_supported, per_bio_data_size);
 	if (!t->mempools)
 		return -ENOMEM;
 
@@ -1128,7 +1128,7 @@ int dm_table_complete(struct dm_table *t)
 		return r;
 	}
 
-	r = dm_table_alloc_md_mempools(t);
+	r = dm_table_alloc_md_mempools(t, t->md);
 	if (r)
 		DMERR("unable to allocate mempools");
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b0c965a..3165b94 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -228,8 +228,20 @@ struct mapped_device {
 
 	/* for blk-mq request-based DM support */
 	struct blk_mq_tag_set tag_set;
+	bool use_blk_mq;
 };
 
+#ifdef CONFIG_DM_MQ_DEFAULT
+static bool use_blk_mq = true;
+#else
+static bool use_blk_mq = false;
+#endif
+
+bool dm_use_blk_mq(struct mapped_device *md)
+{
+	return md->use_blk_mq;
+}
+
 /*
  * For mempools pre-allocation at the table loading time.
  */
@@ -2219,6 +2231,7 @@ static void dm_init_md_queue(struct mapped_device *md)
 
 static void dm_init_old_md_queue(struct mapped_device *md)
 {
+	md->use_blk_mq = false;
 	dm_init_md_queue(md);
 
 	/*
@@ -2260,6 +2273,7 @@ static struct mapped_device *alloc_dev(int minor)
 	if (r < 0)
 		goto bad_io_barrier;
 
+	md->use_blk_mq = use_blk_mq;
 	md->type = DM_TYPE_NONE;
 	mutex_init(&md->suspend_lock);
 	mutex_init(&md->type_lock);
@@ -2346,7 +2360,6 @@ static void unlock_fs(struct mapped_device *md);
 static void free_dev(struct mapped_device *md)
 {
 	int minor = MINOR(disk_devt(md->disk));
-	bool using_blk_mq = !!md->queue->mq_ops;
 
 	unlock_fs(md);
 	bdput(md->bdev);
@@ -2372,7 +2385,7 @@ static void free_dev(struct mapped_device *md)
 
 	put_disk(md->disk);
 	blk_cleanup_queue(md->queue);
-	if (using_blk_mq)
+	if (md->use_blk_mq)
 		blk_mq_free_tag_set(&md->tag_set);
 	dm_stats_cleanup(&md->stats);
 	module_put(THIS_MODULE);
@@ -2383,7 +2396,7 @@ static void __bind_mempools(struct mapped_device *md, struct dm_table *t)
 {
 	struct dm_md_mempools *p = dm_table_get_md_mempools(t);
 
-	if (md->io_pool && md->bs) {
+	if (md->bs) {
 		/* The md already has necessary mempools. */
 		if (dm_table_get_type(t) == DM_TYPE_BIO_BASED) {
 			/*
@@ -2793,13 +2806,21 @@ out_tag_set:
 	return err;
 }
 
+static unsigned filter_md_type(unsigned type, struct mapped_device *md)
+{
+	if (type == DM_TYPE_BIO_BASED)
+		return type;
+
+	return !md->use_blk_mq ? DM_TYPE_REQUEST_BASED : DM_TYPE_MQ_REQUEST_BASED;
+}
+
 /*
  * Setup the DM device's queue based on md's type
  */
 int dm_setup_md_queue(struct mapped_device *md)
 {
 	int r;
-	unsigned md_type = dm_get_md_type(md);
+	unsigned md_type = filter_md_type(dm_get_md_type(md), md);
 
 	switch (md_type) {
 	case DM_TYPE_REQUEST_BASED:
@@ -3503,16 +3524,19 @@ int dm_noflush_suspending(struct dm_target *ti)
 }
 EXPORT_SYMBOL_GPL(dm_noflush_suspending);
 
-struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity, unsigned per_bio_data_size)
+struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, unsigned type,
+					    unsigned integrity, unsigned per_bio_data_size)
 {
 	struct dm_md_mempools *pools = kzalloc(sizeof(*pools), GFP_KERNEL);
-	struct kmem_cache *cachep;
+	struct kmem_cache *cachep = NULL;
 	unsigned int pool_size = 0;
 	unsigned int front_pad;
 
 	if (!pools)
 		return NULL;
 
+	type = filter_md_type(type, md);
+
 	switch (type) {
 	case DM_TYPE_BIO_BASED:
 		cachep = _io_cache;
@@ -3520,13 +3544,13 @@ struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity, u
 		front_pad = roundup(per_bio_data_size, __alignof__(struct dm_target_io)) + offsetof(struct dm_target_io, clone);
 		break;
 	case DM_TYPE_REQUEST_BASED:
+		cachep = _rq_tio_cache;
 		pool_size = dm_get_reserved_rq_based_ios();
 		pools->rq_pool = mempool_create_slab_pool(pool_size, _rq_cache);
 		if (!pools->rq_pool)
 			goto out;
 		/* fall through to setup remaining rq-based pools */
 	case DM_TYPE_MQ_REQUEST_BASED:
-		cachep = _rq_tio_cache;
 		if (!pool_size)
 			pool_size = dm_get_reserved_rq_based_ios();
 		front_pad = offsetof(struct dm_rq_clone_bio_info, clone);
@@ -3534,12 +3558,14 @@ struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity, u
 		WARN_ON(per_bio_data_size != 0);
 		break;
 	default:
-		goto out;
+		BUG();
 	}
 
-	pools->io_pool = mempool_create_slab_pool(pool_size, cachep);
-	if (!pools->io_pool)
-		goto out;
+	if (cachep) {
+		pools->io_pool = mempool_create_slab_pool(pool_size, cachep);
+		if (!pools->io_pool)
+			goto out;
+	}
 
 	pools->bs = bioset_create_nobvec(pool_size, front_pad);
 	if (!pools->bs)
@@ -3596,6 +3622,9 @@ MODULE_PARM_DESC(reserved_bio_based_ios, "Reserved IOs in bio-based mempools");
 module_param(reserved_rq_based_ios, uint, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(reserved_rq_based_ios, "Reserved IOs in request-based mempools");
 
+module_param(use_blk_mq, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(use_blk_mq, "Use block multiqueue for request-based DM devices");
+
 MODULE_DESCRIPTION(DM_NAME " driver");
 MODULE_AUTHOR("Joe Thornber <dm-devel@redhat.com>");
 MODULE_LICENSE("GPL");
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 5522422..6123c2b 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -211,6 +211,8 @@ int dm_kobject_uevent(struct mapped_device *md, enum kobject_action action,
 void dm_internal_suspend(struct mapped_device *md);
 void dm_internal_resume(struct mapped_device *md);
 
+bool dm_use_blk_mq(struct mapped_device *md);
+
 int dm_io_init(void);
 void dm_io_exit(void);
 
@@ -220,7 +222,8 @@ void dm_kcopyd_exit(void);
 /*
  * Mempool operations
  */
-struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity, unsigned per_bio_data_size);
+struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, unsigned type,
+					    unsigned integrity, unsigned per_bio_data_size);
 void dm_free_md_mempools(struct dm_md_mempools *pools);
 
 /*
-- 
1.9.5 (Apple Git-50.3)


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/7] blk-mq: fix use of incorrect goto label in blk_mq_init_queue error path
  2015-03-12  3:56 ` [PATCH 1/7] blk-mq: fix use of incorrect goto label in blk_mq_init_queue error path Mike Snitzer
@ 2015-03-12  7:48   ` Ming Lei
  2015-03-12 13:51     ` Mike Snitzer
  2015-03-12 14:27   ` [PATCH 1/7 v2] " Mike Snitzer
  1 sibling, 1 reply; 19+ messages in thread
From: Ming Lei @ 2015-03-12  7:48 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jens Axboe, Keith Busch, Linux Kernel Mailing List,
	Linux SCSI List, dm-devel

On Thu, Mar 12, 2015 at 11:56 AM, Mike Snitzer <snitzer@redhat.com> wrote:
> If percpu_ref_init() fails the 'err_hctxs' label should be used instead
> of 'err_map'.
>
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
>  block/blk-mq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4f4bea2..459840c 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1938,7 +1938,7 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
>          */
>         if (percpu_ref_init(&q->mq_usage_counter, blk_mq_usage_counter_release,
>                             PERCPU_REF_INIT_ATOMIC, GFP_KERNEL))
> -               goto err_map;
> +               goto err_hctxs;

If it is changed to 'goto err_hctxs', percpu_ref_init() needs to be
moved before blk_alloc_queue_node(); otherwise just 'goto err_hw'
is enough, but the former is better.

Thanks,

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/7] blk-mq: add blk_mq_init_allocated_queue and export blk_mq_register_disk
  2015-03-12  3:56 ` [PATCH 2/7] blk-mq: add blk_mq_init_allocated_queue and export blk_mq_register_disk Mike Snitzer
@ 2015-03-12  8:10   ` Ming Lei
  2015-03-12 14:29   ` [PATCH 2/7 v2] " Mike Snitzer
  1 sibling, 0 replies; 19+ messages in thread
From: Ming Lei @ 2015-03-12  8:10 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jens Axboe, Keith Busch, Linux Kernel Mailing List,
	Linux SCSI List, dm-devel

On Thu, Mar 12, 2015 at 11:56 AM, Mike Snitzer <snitzer@redhat.com> wrote:
> Add a variant of blk_mq_init_queue that allows a previously allocated
> queue to be initialized.  blk_mq_init_allocated_queue models
> blk_init_allocated_queue -- which was also created for DM's use.
>
> DM's approach to device creation requires a placeholder request_queue be
> allocated for use with alloc_dev() but the decision about what type of
> request_queue will be ultimately created is deferred until all component
> devices referenced in the DM table are processed to determine the table
> type (request-based, blk-mq request-based, or bio-based).
>
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
>  block/blk-mq-sysfs.c   |  1 +
>  block/blk-mq.c         | 28 +++++++++++++++++++---------
>  include/linux/blk-mq.h |  2 ++
>  3 files changed, 22 insertions(+), 9 deletions(-)
>
> diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
> index 1630a20..b79685e 100644
> --- a/block/blk-mq-sysfs.c
> +++ b/block/blk-mq-sysfs.c
> @@ -436,6 +436,7 @@ int blk_mq_register_disk(struct gendisk *disk)
>
>         return 0;
>  }
> +EXPORT_SYMBOL_GPL(blk_mq_register_disk);
>
>  void blk_mq_sysfs_unregister(struct request_queue *q)
>  {
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 459840c..d576a78 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1891,9 +1891,25 @@ void blk_mq_release(struct request_queue *q)
>
>  struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
>  {
> +       struct request_queue *uninit_q, *q;
> +
> +       uninit_q = blk_alloc_queue_node(GFP_KERNEL, set->numa_node);
> +       if (!uninit_q)
> +               return ERR_PTR(-ENOMEM);
> +
> +       q = blk_mq_init_allocated_queue(set, uninit_q);
> +       if (!q)
> +               blk_cleanup_queue(uninit_q);
> +
> +       return q;

Current drivers assume that the queue won't be returned as NULL, so it
looks like you need to change the check on !q to IS_ERR(q) and return
the ERR_PTR() if it is true.
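
A minimal sketch of the suggested change (illustrative):

  q = blk_mq_init_allocated_queue(set, uninit_q);
  if (IS_ERR(q))
          blk_cleanup_queue(uninit_q);

  return q;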

> +}
> +EXPORT_SYMBOL(blk_mq_init_queue);
> +
> +struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
> +                                                 struct request_queue *q)
> +{
>         struct blk_mq_hw_ctx **hctxs;
>         struct blk_mq_ctx __percpu *ctx;
> -       struct request_queue *q;
>         unsigned int *map;
>         int i;
>
> @@ -1928,10 +1944,6 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
>                 hctxs[i]->queue_num = i;
>         }
>
> -       q = blk_alloc_queue_node(GFP_KERNEL, set->numa_node);
> -       if (!q)
> -               goto err_hctxs;
> -
>         /*
>          * Init percpu_ref in atomic mode so that it's faster to shutdown.
>          * See blk_register_queue() for details.
> @@ -1981,7 +1993,7 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
>         blk_mq_init_cpu_queues(q, set->nr_hw_queues);
>
>         if (blk_mq_init_hw_queues(q, set))
> -               goto err_hw;
> +               goto err_hctxs;
>
>         mutex_lock(&all_q_mutex);
>         list_add_tail(&q->all_q_node, &all_q_list);
> @@ -1993,8 +2005,6 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
>
>         return q;
>
> -err_hw:
> -       blk_cleanup_queue(q);
>  err_hctxs:
>         kfree(map);
>         for (i = 0; i < set->nr_hw_queues; i++) {
> @@ -2009,7 +2019,7 @@ err_percpu:
>         free_percpu(ctx);
>         return ERR_PTR(-ENOMEM);
>  }
> -EXPORT_SYMBOL(blk_mq_init_queue);
> +EXPORT_SYMBOL(blk_mq_init_allocated_queue);
>
>  void blk_mq_free_queue(struct request_queue *q)
>  {
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 7aec861..9a75c88 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -164,6 +164,8 @@ enum {
>                 << BLK_MQ_F_ALLOC_POLICY_START_BIT)
>
>  struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *);
> +struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
> +                                                 struct request_queue *q);
>  void blk_mq_finish_init(struct request_queue *q);
>  int blk_mq_register_disk(struct gendisk *);
>  void blk_mq_unregister_disk(struct gendisk *);
> --
> 1.9.5 (Apple Git-50.3)
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Ming Lei

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/7] blk-mq: fix use of incorrect goto label in blk_mq_init_queue error path
  2015-03-12  7:48   ` Ming Lei
@ 2015-03-12 13:51     ` Mike Snitzer
  2015-03-13  3:29       ` Mike Snitzer
  0 siblings, 1 reply; 19+ messages in thread
From: Mike Snitzer @ 2015-03-12 13:51 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Keith Busch, Linux Kernel Mailing List,
	Linux SCSI List, dm-devel

On Thu, Mar 12 2015 at  3:48am -0400,
Ming Lei <tom.leiming@gmail.com> wrote:

> On Thu, Mar 12, 2015 at 11:56 AM, Mike Snitzer <snitzer@redhat.com> wrote:
> > If percpu_ref_init() fails the 'err_hctxs' label should be used instead
> > of 'err_map'.
> >
> > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > ---
> >  block/blk-mq.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 4f4bea2..459840c 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -1938,7 +1938,7 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
> >          */
> >         if (percpu_ref_init(&q->mq_usage_counter, blk_mq_usage_counter_release,
> >                             PERCPU_REF_INIT_ATOMIC, GFP_KERNEL))
> > -               goto err_map;
> > +               goto err_hctxs;
> 
> If it is changed to 'goto err_hctxs', percpu_ref_init() needs to be
> moved before blk_alloc_queue_node(); otherwise just 'goto err_hw'
> is enough, but the former is better.

Yes, you're correct.  Patch 2 happened to eliminate this problem.  But
I'll get it fixed up and post v2.

The reason this thinko happened is that I only noticed the problem after
I moved the blk_alloc_queue_node() out in patch 2; when I rebased to
reorder the fix ahead of that patch, the patch 2 context got lost in the
shuffle.

Thanks for your review!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/7 v2] blk-mq: fix use of incorrect goto label in blk_mq_init_queue error path
  2015-03-12  3:56 ` [PATCH 1/7] blk-mq: fix use of incorrect goto label in blk_mq_init_queue error path Mike Snitzer
  2015-03-12  7:48   ` Ming Lei
@ 2015-03-12 14:27   ` Mike Snitzer
  2015-03-12 17:13     ` Hannes Reinecke
  2015-03-13  3:53     ` [PATCH 1/7 v3] " Mike Snitzer
  1 sibling, 2 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-12 14:27 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch; +Cc: linux-kernel, linux-scsi, dm-devel, Ming Lei

If percpu_ref_init() fails the 'err_hctxs' label should be used instead
of 'err_map'.

Rather than reuse 'err_hw', which was introduced for the case where the
later blk_mq_init_hw_queues() call fails, move percpu_ref_init() before
blk_alloc_queue_node().

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Ming Lei <ming.lei@canonical.com>
---
 block/blk-mq.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4f4bea2..0a877d2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1928,17 +1928,17 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 		hctxs[i]->queue_num = i;
 	}
 
-	q = blk_alloc_queue_node(GFP_KERNEL, set->numa_node);
-	if (!q)
-		goto err_hctxs;
-
 	/*
 	 * Init percpu_ref in atomic mode so that it's faster to shutdown.
 	 * See blk_register_queue() for details.
 	 */
 	if (percpu_ref_init(&q->mq_usage_counter, blk_mq_usage_counter_release,
 			    PERCPU_REF_INIT_ATOMIC, GFP_KERNEL))
-		goto err_map;
+		goto err_hctxs;
+
+	q = blk_alloc_queue_node(GFP_KERNEL, set->numa_node);
+	if (!q)
+		goto err_hctxs;
 
 	setup_timer(&q->timeout, blk_mq_rq_timer, (unsigned long) q);
 	blk_queue_rq_timeout(q, 30000);
-- 
1.9.5 (Apple Git-50.3)


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/7 v2] blk-mq: add blk_mq_init_allocated_queue and export blk_mq_register_disk
  2015-03-12  3:56 ` [PATCH 2/7] blk-mq: add blk_mq_init_allocated_queue and export blk_mq_register_disk Mike Snitzer
  2015-03-12  8:10   ` Ming Lei
@ 2015-03-12 14:29   ` Mike Snitzer
  2015-03-12 17:14     ` Hannes Reinecke
  2015-03-13  3:56     ` [PATCH 2/7 v3] " Mike Snitzer
  1 sibling, 2 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-12 14:29 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch; +Cc: linux-kernel, linux-scsi, dm-devel, Ming Lei

Add a variant of blk_mq_init_queue that allows a previously allocated
queue to be initialized.  blk_mq_init_allocated_queue models
blk_init_allocated_queue -- which was also created for DM's use.

DM's approach to device creation requires a placeholder request_queue be
allocated for use with alloc_dev() but the decision about what type of
request_queue will be ultimately created is deferred until all component
devices referenced in the DM table are processed to determine the table
type (request-based, blk-mq request-based, or bio-based).

Also, because of DM's late finalization of the request_queue type,
the call to blk_mq_register_disk() doesn't happen during alloc_dev().
Must export blk_mq_register_disk() so that DM can backfill the 'mq' dir
once the blk-mq queue is fully allocated.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Ming Lei <ming.lei@canonical.com>
---
 block/blk-mq-sysfs.c   |  1 +
 block/blk-mq.c         | 24 +++++++++++++++++++-----
 include/linux/blk-mq.h |  2 ++
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 1630a20..b79685e 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -436,6 +436,7 @@ int blk_mq_register_disk(struct gendisk *disk)
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(blk_mq_register_disk);
 
 void blk_mq_sysfs_unregister(struct request_queue *q)
 {
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0a877d2..1cab7831 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1891,9 +1891,25 @@ void blk_mq_release(struct request_queue *q)
 
 struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 {
+	struct request_queue *uninit_q, *q;
+
+	uninit_q = blk_alloc_queue_node(GFP_KERNEL, set->numa_node);
+	if (!uninit_q)
+		return ERR_PTR(-ENOMEM);
+
+	q = blk_mq_init_allocated_queue(set, uninit_q);
+	if (IS_ERR(q))
+		blk_cleanup_queue(uninit_q);
+
+	return q;
+}
+EXPORT_SYMBOL(blk_mq_init_queue);
+
+struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
+						  struct request_queue *q)
+{
 	struct blk_mq_hw_ctx **hctxs;
 	struct blk_mq_ctx __percpu *ctx;
-	struct request_queue *q;
 	unsigned int *map;
 	int i;
 
@@ -1981,7 +1997,7 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 	blk_mq_init_cpu_queues(q, set->nr_hw_queues);
 
 	if (blk_mq_init_hw_queues(q, set))
-		goto err_hw;
+		goto err_hctxs;
 
 	mutex_lock(&all_q_mutex);
 	list_add_tail(&q->all_q_node, &all_q_list);
@@ -1993,8 +2009,6 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 
 	return q;
 
-err_hw:
-	blk_cleanup_queue(q);
 err_hctxs:
 	kfree(map);
 	for (i = 0; i < set->nr_hw_queues; i++) {
@@ -2009,7 +2023,7 @@ err_percpu:
 	free_percpu(ctx);
 	return ERR_PTR(-ENOMEM);
 }
-EXPORT_SYMBOL(blk_mq_init_queue);
+EXPORT_SYMBOL(blk_mq_init_allocated_queue);
 
 void blk_mq_free_queue(struct request_queue *q)
 {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 7aec861..9a75c88 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -164,6 +164,8 @@ enum {
 		<< BLK_MQ_F_ALLOC_POLICY_START_BIT)
 
 struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *);
+struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
+						  struct request_queue *q);
 void blk_mq_finish_init(struct request_queue *q);
 int blk_mq_register_disk(struct gendisk *);
 void blk_mq_unregister_disk(struct gendisk *);
-- 
1.9.5 (Apple Git-50.3)



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/7 v2] blk-mq: fix use of incorrect goto label in blk_mq_init_queue error path
  2015-03-12 14:27   ` [PATCH 1/7 v2] " Mike Snitzer
@ 2015-03-12 17:13     ` Hannes Reinecke
  2015-03-13  3:53     ` [PATCH 1/7 v3] " Mike Snitzer
  1 sibling, 0 replies; 19+ messages in thread
From: Hannes Reinecke @ 2015-03-12 17:13 UTC (permalink / raw)
  To: dm-devel

On 03/12/2015 10:27 AM, Mike Snitzer wrote:
> If percpu_ref_init() fails the 'err_hctxs' label should be used instead
> of 'err_map'.
> 
> Rather than reuse 'err_hw', which was introduced if the later
> blk_mq_init_hw_queues() call fails, move percpu_ref_init() before
> blk_alloc_queue_node().
> 
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> Reviewed-by: Ming Lei <ming.lei@canonical.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/7 v2] blk-mq: add blk_mq_init_allocated_queue and export blk_mq_register_disk
  2015-03-12 14:29   ` [PATCH 2/7 v2] " Mike Snitzer
@ 2015-03-12 17:14     ` Hannes Reinecke
  2015-03-13  3:56     ` [PATCH 2/7 v3] " Mike Snitzer
  1 sibling, 0 replies; 19+ messages in thread
From: Hannes Reinecke @ 2015-03-12 17:14 UTC (permalink / raw)
  To: dm-devel

On 03/12/2015 10:29 AM, Mike Snitzer wrote:
> Add a variant of blk_mq_init_queue that allows a previously allocated
> queue to be initialized.  blk_mq_init_allocated_queue models
> blk_init_allocated_queue -- which was also created for DM's use.
> 
> DM's approach to device creation requires a placeholder request_queue be
> allocated for use with alloc_dev() but the decision about what type of
> request_queue will be ultimately created is deferred until all component
> devices referenced in the DM table are processed to determine the table
> type (request-based, blk-mq request-based, or bio-based).
> 
> Also, because of DM's late finalization of the request_queue type
> the call to blk_mq_register_disk() doesn't happen during alloc_dev().
> Must export blk_mq_register_disk() so that DM can backfill the 'mq' dir
> once the blk-mq queue is fully allocated.
> 
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> Reviewed-by: Ming Lei <ming.lei@canonical.com>

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/7] blk-mq: fix use of incorrect goto label in blk_mq_init_queue error path
  2015-03-12 13:51     ` Mike Snitzer
@ 2015-03-13  3:29       ` Mike Snitzer
  0 siblings, 0 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-13  3:29 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Keith Busch, Linux Kernel Mailing List,
	Linux SCSI List, dm-devel

On Thu, Mar 12 2015 at  9:51am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Thu, Mar 12 2015 at  3:48am -0400,
> Ming Lei <tom.leiming@gmail.com> wrote:
> 
> > On Thu, Mar 12, 2015 at 11:56 AM, Mike Snitzer <snitzer@redhat.com> wrote:
> > > If percpu_ref_init() fails the 'err_hctxs' label should be used instead
> > > of 'err_map'.
> > >
> > > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > > ---
> > >  block/blk-mq.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > index 4f4bea2..459840c 100644
> > > --- a/block/blk-mq.c
> > > +++ b/block/blk-mq.c
> > > @@ -1938,7 +1938,7 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
> > >          */
> > >         if (percpu_ref_init(&q->mq_usage_counter, blk_mq_usage_counter_release,
> > >                             PERCPU_REF_INIT_ATOMIC, GFP_KERNEL))
> > > -               goto err_map;
> > > +               goto err_hctxs;
> > 
> > If it is changed to 'goto err_hctxs', percpu_ref_init() needs to be
> > moved before blk_alloc_queue_node(); otherwise just 'goto err_hw'
> > is enough, but the former is better.
> 
> Yes, you're correct.  Patch 2 happened to eliminate this problem.  But
> I'll get it fixed up and post v2.

Actually, percpu_ref_init() clearly cannot go before blk_alloc_queue_node(),
since percpu_ref_init() dereferences q (buildbot just emailed complaining
about crashes with my v2 patch).  I didn't test this 1/7 patch in
isolation but should have -- when 2/7 is applied the problem goes away.
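
For reference, the dependency is just that the ref counter is embedded in
the queue, so the allocation has to come first (sketch of the ordering v3
keeps, not a standalone patch):

        q = blk_alloc_queue_node(GFP_KERNEL, set->numa_node);
        if (!q)
                goto err_hctxs;

        /* &q->mq_usage_counter is only valid once q has been allocated */
        if (percpu_ref_init(&q->mq_usage_counter, blk_mq_usage_counter_release,
                            PERCPU_REF_INIT_ATOMIC, GFP_KERNEL))
                goto err_mq_usage;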

v3 of these first 2 patches will be inbound shortly.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/7 v3] blk-mq: fix use of incorrect goto label in blk_mq_init_queue error path
  2015-03-12 14:27   ` [PATCH 1/7 v2] " Mike Snitzer
  2015-03-12 17:13     ` Hannes Reinecke
@ 2015-03-13  3:53     ` Mike Snitzer
  1 sibling, 0 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-13  3:53 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch; +Cc: dm-devel, Ming Lei, linux-kernel, linux-scsi

If percpu_ref_init() fails the allocated q and hctxs must get cleaned
up; using 'err_map' doesn't allow that to happen.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Ming Lei <ming.lei@canonical.com>
---
 block/blk-mq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4f4bea2..b7b8933 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1938,7 +1938,7 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 	 */
 	if (percpu_ref_init(&q->mq_usage_counter, blk_mq_usage_counter_release,
 			    PERCPU_REF_INIT_ATOMIC, GFP_KERNEL))
-		goto err_map;
+		goto err_mq_usage;
 
 	setup_timer(&q->timeout, blk_mq_rq_timer, (unsigned long) q);
 	blk_queue_rq_timeout(q, 30000);
@@ -1981,7 +1981,7 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 	blk_mq_init_cpu_queues(q, set->nr_hw_queues);
 
 	if (blk_mq_init_hw_queues(q, set))
-		goto err_hw;
+		goto err_mq_usage;
 
 	mutex_lock(&all_q_mutex);
 	list_add_tail(&q->all_q_node, &all_q_list);
@@ -1993,7 +1993,7 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 
 	return q;
 
-err_hw:
+err_mq_usage:
 	blk_cleanup_queue(q);
 err_hctxs:
 	kfree(map);
-- 
1.9.5 (Apple Git-50.3)


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/7 v3] blk-mq: add blk_mq_init_allocated_queue and export blk_mq_register_disk
  2015-03-12 14:29   ` [PATCH 2/7 v2] " Mike Snitzer
  2015-03-12 17:14     ` Hannes Reinecke
@ 2015-03-13  3:56     ` Mike Snitzer
  1 sibling, 0 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-13  3:56 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch; +Cc: dm-devel, Ming Lei, linux-kernel, linux-scsi

Add a variant of blk_mq_init_queue that allows a previously allocated
queue to be initialized.  blk_mq_init_allocated_queue models
blk_init_allocated_queue -- which was also created for DM's use.

DM's approach to device creation requires a placeholder request_queue be
allocated for use with alloc_dev() but the decision about what type of
request_queue will be ultimately created is deferred until all component
devices referenced in the DM table are processed to determine the table
type (request-based, blk-mq request-based, or bio-based).

Also, because of DM's late finalization of the request_queue type,
the call to blk_mq_register_disk() doesn't happen during alloc_dev().
Must export blk_mq_register_disk() so that DM can backfill the 'mq' dir
once the blk-mq queue is fully allocated.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Ming Lei <ming.lei@canonical.com>
---
 block/blk-mq-sysfs.c   |  1 +
 block/blk-mq.c         | 30 ++++++++++++++++++++----------
 include/linux/blk-mq.h |  2 ++
 3 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 1630a20..b79685e 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -436,6 +436,7 @@ int blk_mq_register_disk(struct gendisk *disk)
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(blk_mq_register_disk);
 
 void blk_mq_sysfs_unregister(struct request_queue *q)
 {
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b7b8933..3000121 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1891,9 +1891,25 @@ void blk_mq_release(struct request_queue *q)
 
 struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 {
+	struct request_queue *uninit_q, *q;
+
+	uninit_q = blk_alloc_queue_node(GFP_KERNEL, set->numa_node);
+	if (!uninit_q)
+		return ERR_PTR(-ENOMEM);
+
+	q = blk_mq_init_allocated_queue(set, uninit_q);
+	if (IS_ERR(q))
+		blk_cleanup_queue(uninit_q);
+
+	return q;
+}
+EXPORT_SYMBOL(blk_mq_init_queue);
+
+struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
+						  struct request_queue *q)
+{
 	struct blk_mq_hw_ctx **hctxs;
 	struct blk_mq_ctx __percpu *ctx;
-	struct request_queue *q;
 	unsigned int *map;
 	int i;
 
@@ -1928,17 +1944,13 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 		hctxs[i]->queue_num = i;
 	}
 
-	q = blk_alloc_queue_node(GFP_KERNEL, set->numa_node);
-	if (!q)
-		goto err_hctxs;
-
 	/*
 	 * Init percpu_ref in atomic mode so that it's faster to shutdown.
 	 * See blk_register_queue() for details.
 	 */
 	if (percpu_ref_init(&q->mq_usage_counter, blk_mq_usage_counter_release,
 			    PERCPU_REF_INIT_ATOMIC, GFP_KERNEL))
-		goto err_mq_usage;
+		goto err_hctxs;
 
 	setup_timer(&q->timeout, blk_mq_rq_timer, (unsigned long) q);
 	blk_queue_rq_timeout(q, 30000);
@@ -1981,7 +1993,7 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 	blk_mq_init_cpu_queues(q, set->nr_hw_queues);
 
 	if (blk_mq_init_hw_queues(q, set))
-		goto err_mq_usage;
+		goto err_hctxs;
 
 	mutex_lock(&all_q_mutex);
 	list_add_tail(&q->all_q_node, &all_q_list);
@@ -1993,8 +2005,6 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 
 	return q;
 
-err_mq_usage:
-	blk_cleanup_queue(q);
 err_hctxs:
 	kfree(map);
 	for (i = 0; i < set->nr_hw_queues; i++) {
@@ -2009,7 +2019,7 @@ err_percpu:
 	free_percpu(ctx);
 	return ERR_PTR(-ENOMEM);
 }
-EXPORT_SYMBOL(blk_mq_init_queue);
+EXPORT_SYMBOL(blk_mq_init_allocated_queue);
 
 void blk_mq_free_queue(struct request_queue *q)
 {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 7aec861..9a75c88 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -164,6 +164,8 @@ enum {
 		<< BLK_MQ_F_ALLOC_POLICY_START_BIT)
 
 struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *);
+struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
+						  struct request_queue *q);
 void blk_mq_finish_init(struct request_queue *q);
 int blk_mq_register_disk(struct gendisk *);
 void blk_mq_unregister_disk(struct gendisk *);
-- 
1.9.5 (Apple Git-50.3)


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/7] dm: add full blk-mq support to request-based DM
  2015-03-12  3:56 [PATCH 0/7] dm: add full blk-mq support to request-based DM Mike Snitzer
                   ` (6 preceding siblings ...)
  2015-03-12  3:56 ` [PATCH 7/7] dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr Mike Snitzer
@ 2015-03-13 14:34 ` Jens Axboe
  7 siblings, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2015-03-13 14:34 UTC (permalink / raw)
  To: Mike Snitzer, Keith Busch; +Cc: linux-kernel, linux-scsi, dm-devel

On 03/11/2015 09:56 PM, Mike Snitzer wrote:
> This patchset updates request-based DM (used exclusively by DM
> multipath) to support blk-mq I/O path or the old ->request_fn I/O
> path -- the default is still the old ->request_fn I/O path.
>
> Like was done for SCSI, I've exposed a dm-mod.ko 'use_blk_mq' module
> parameter and a CONFIG_DM_MQ_DEFAULT.  I've queued these patches in
> linux-next via linux-dm.git's 'for-next' branch -- my goal is for
> these changes to land upstream in 4.1.
>
> I've tested all 4 supported permutations of rq-based queue stacking:
> 1) blk-mq mpath stacked on blk-mq device(s)
> 2) blk-mq mpath stacked on ->request_fn device(s)
> 3) old ->request_fn mpath stacked on blk-mq device(s)
> 4) old ->request_fn mpath stacked on old ->request_fn device(s)
>
> There is definitely much more room for optimizing the blk-mq mpath
> queue (via dm_mq_ops setup) to have more awareness about the
> characteristics of the underlying blk-mq device(s) -- Jens offered
> some suggestions that are worth pursuing.  Goal being to leverage the
> ability to have multiple software submission queues that map to the
> underlying paths' HW queue.  I expect Keith to be the most likely
> person to pursue these follow-on optimizations given his immediate
> access to NVMe devices, etc.
>
> blk-mq allows for the removal of 2 of the 3 mempools and associated
> allocations that were traditionally performed using old ->request_fn
> I/O path.  The md->bs bioset still remains for cloning a request's
> bios.
>
> I'll be pursuing the possibility of removing the bio cloning that
> happens when cloning a request (it was put in place for partial
> completion of a request's bios but the prospect of avoiding those bio
> clones warrants further review of how useful those partial completions
> are, see: http://www.redhat.com/archives/dm-devel/2014-June/msg00030.html)
>
> Jens and Keith, special thanks for your help answering more blk-mq
> questions than I'd have hoped would be needed while at LSF/MM.
>
> Keith Busch (1):
>    blk-mq: don't wait in blk_mq_queue_enter() if __GFP_WAIT isn't set
>
> Mike Snitzer (6):
>    blk-mq: fix use of incorrect goto label in blk_mq_init_queue error path
>    blk-mq: add blk_mq_init_allocated_queue and export blk_mq_register_disk
>    blk-mq: export blk_mq_run_hw_queues
>    dm: add full blk-mq support to request-based DM
>    dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq
>    dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr

Applied patch 1 to current tree, marked for stable. 2-4 are applied for 
4.1, in for-4.1/core.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2015-03-13 14:34 UTC | newest]

Thread overview: 19+ messages
2015-03-12  3:56 [PATCH 0/7] dm: add full blk-mq support to request-based DM Mike Snitzer
2015-03-12  3:56 ` [PATCH 1/7] blk-mq: fix use of incorrect goto label in blk_mq_init_queue error path Mike Snitzer
2015-03-12  7:48   ` Ming Lei
2015-03-12 13:51     ` Mike Snitzer
2015-03-13  3:29       ` Mike Snitzer
2015-03-12 14:27   ` [PATCH 1/7 v2] " Mike Snitzer
2015-03-12 17:13     ` Hannes Reinecke
2015-03-13  3:53     ` [PATCH 1/7 v3] " Mike Snitzer
2015-03-12  3:56 ` [PATCH 2/7] blk-mq: add blk_mq_init_allocated_queue and export blk_mq_register_disk Mike Snitzer
2015-03-12  8:10   ` Ming Lei
2015-03-12 14:29   ` [PATCH 2/7 v2] " Mike Snitzer
2015-03-12 17:14     ` Hannes Reinecke
2015-03-13  3:56     ` [PATCH 2/7 v3] " Mike Snitzer
2015-03-12  3:56 ` [PATCH 3/7] blk-mq: export blk_mq_run_hw_queues Mike Snitzer
2015-03-12  3:56 ` [PATCH 4/7] blk-mq: don't wait in blk_mq_queue_enter() if __GFP_WAIT isn't set Mike Snitzer
2015-03-12  3:56 ` [PATCH 5/7] dm: add full blk-mq support to request-based DM Mike Snitzer
2015-03-12  3:56 ` [PATCH 6/7] dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq Mike Snitzer
2015-03-12  3:56 ` [PATCH 7/7] dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr Mike Snitzer
2015-03-13 14:34 ` [PATCH 0/7] dm: add full blk-mq support to request-based DM Jens Axboe
