* [PATCH 0/8] dm: optimize request-based queue processing
@ 2015-03-04  0:47 Mike Snitzer
  2015-03-04  0:47 ` [PATCH 1/8] dm: remove unnecessary wrapper around blk_lld_busy Mike Snitzer
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-04  0:47 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, jmoyer

This patchset is the result of working through the report that
request-based DM (via dm-multipath) performance suffers on very fast
storage.

The last patch in this set introduces a new per-device sysfs parameter
(rq_based_queue_deadline) that can be set to control how long
a request is allowed to remain on the queue to promote merging.
NetApp found that for their filer with an SSD backend and 4 paths they
needed to set this to 10us -- performance jumped roughly 80% (from 280
to 503 MB/s) for their multi-threaded sequential IO workload.

This new parameter is clearly beneficial, yet unfortunate, because
it requires users with really fast multipath storage configs to
test and set this parameter based on running their sequential IO workload
on their specific hardware config.

While I'd love it if we could autodetect the proper value for _any_
storage, I really don't think that autotuning belongs in the kernel.
Not to mention: getting that algorithm 1) correct and 2) efficient across
all storage is difficult.  But I'm open to any suggestions for how to
autotune this parameter in-kernel without impacting slower setups that
have no need to be bothered with this parameter.

Mike Snitzer (8):
  dm: remove unnecessary wrapper around blk_lld_busy
  dm: remove request-based DM queue's lld_busy_fn hook
  dm: remove request-based logic from make_request_fn wrapper
  dm: only run the queue on completion if congested or no requests pending
  dm: don't schedule delayed run of the queue if nothing to do
  dm: don't start current request if it would've merged with the previous
  dm sysfs: introduce ability to add writable attributes
  dm: impose configurable deadline for dm_request_fn's merge heuristic

 block/blk-core.c              |   5 +-
 drivers/md/dm-mpath.c         |   2 +-
 drivers/md/dm-sysfs.c         |  34 +++++++++--
 drivers/md/dm-table.c         |  14 -----
 drivers/md/dm.c               | 128 +++++++++++++++++++++++++++---------------
 drivers/md/dm.h               |   5 +-
 include/linux/blkdev.h        |   2 -
 include/linux/device-mapper.h |   5 --
 8 files changed, 121 insertions(+), 74 deletions(-)

-- 
1.9.3 (Apple Git-50)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/8] dm: remove unnecessary wrapper around blk_lld_busy
  2015-03-04  0:47 [PATCH 0/8] dm: optimize request-based queue processing Mike Snitzer
@ 2015-03-04  0:47 ` Mike Snitzer
  2015-03-04  0:47 ` [PATCH 2/8] dm: remove request-based DM queue's lld_busy_fn hook Mike Snitzer
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-04  0:47 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, jmoyer

There is no need for DM to export a wrapper around the already exported
blk_lld_busy().

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-mpath.c         | 2 +-
 drivers/md/dm.c               | 6 ------
 include/linux/device-mapper.h | 5 -----
 3 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index d376dc8..add6391 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1627,7 +1627,7 @@ static int __pgpath_busy(struct pgpath *pgpath)
 {
 	struct request_queue *q = bdev_get_queue(pgpath->path.dev->bdev);
 
-	return dm_underlying_device_busy(q);
+	return blk_lld_busy(q);
 }
 
 /*
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 9b641b3..e57a2f9 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2003,12 +2003,6 @@ out:
 	dm_put_live_table(md, srcu_idx);
 }
 
-int dm_underlying_device_busy(struct request_queue *q)
-{
-	return blk_lld_busy(q);
-}
-EXPORT_SYMBOL_GPL(dm_underlying_device_busy);
-
 static int dm_lld_busy(struct request_queue *q)
 {
 	int r;
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index fd23978..51cc1de 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -605,9 +605,4 @@ static inline unsigned long to_bytes(sector_t n)
 	return (n << SECTOR_SHIFT);
 }
 
-/*-----------------------------------------------------------------
- * Helper for block layer and dm core operations
- *---------------------------------------------------------------*/
-int dm_underlying_device_busy(struct request_queue *q);
-
 #endif	/* _LINUX_DEVICE_MAPPER_H */
-- 
1.9.3 (Apple Git-50)

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/8] dm: remove request-based DM queue's lld_busy_fn hook
  2015-03-04  0:47 [PATCH 0/8] dm: optimize request-based queue processing Mike Snitzer
  2015-03-04  0:47 ` [PATCH 1/8] dm: remove unnecessary wrapper around blk_lld_busy Mike Snitzer
@ 2015-03-04  0:47 ` Mike Snitzer
  2015-03-04  0:47 ` [PATCH 3/8] dm: remove request-based logic from make_request_fn wrapper Mike Snitzer
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-04  0:47 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, jmoyer

DM multipath is the only caller of blk_lld_busy() -- which calls a
queue's lld_busy_fn hook.  Request-based DM doesn't support stacking
multipath devices so there is no reason to register the lld_busy_fn hook
on a multipath device's queue using blk_queue_lld_busy().

As such, remove functions dm_lld_busy and dm_table_any_busy_target.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-table.c | 14 --------------
 drivers/md/dm.c       | 17 -----------------
 drivers/md/dm.h       |  1 -
 3 files changed, 32 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 6554d91..0573120 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1677,20 +1677,6 @@ int dm_table_any_congested(struct dm_table *t, int bdi_bits)
 	return r;
 }
 
-int dm_table_any_busy_target(struct dm_table *t)
-{
-	unsigned i;
-	struct dm_target *ti;
-
-	for (i = 0; i < t->num_targets; i++) {
-		ti = t->targets + i;
-		if (ti->type->busy && ti->type->busy(ti))
-			return 1;
-	}
-
-	return 0;
-}
-
 struct mapped_device *dm_table_get_md(struct dm_table *t)
 {
 	return t->md;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index e57a2f9..fcc3203 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2003,22 +2003,6 @@ out:
 	dm_put_live_table(md, srcu_idx);
 }
 
-static int dm_lld_busy(struct request_queue *q)
-{
-	int r;
-	struct mapped_device *md = q->queuedata;
-	struct dm_table *map = dm_get_live_table_fast(md);
-
-	if (!map || test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags))
-		r = 1;
-	else
-		r = dm_table_any_busy_target(map);
-
-	dm_put_live_table_fast(md);
-
-	return r;
-}
-
 static int dm_any_congested(void *congested_data, int bdi_bits)
 {
 	int r = bdi_bits;
@@ -2540,7 +2524,6 @@ static int dm_init_request_based_queue(struct mapped_device *md)
 	dm_init_md_queue(md);
 	blk_queue_softirq_done(md->queue, dm_softirq_done);
 	blk_queue_prep_rq(md->queue, dm_prep_fn);
-	blk_queue_lld_busy(md->queue, dm_lld_busy);
 
 	/* Also initialize the request-based DM worker thread */
 	init_kthread_worker(&md->kworker);
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 59f53e7..db49586 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -70,7 +70,6 @@ void dm_table_presuspend_undo_targets(struct dm_table *t);
 void dm_table_postsuspend_targets(struct dm_table *t);
 int dm_table_resume_targets(struct dm_table *t);
 int dm_table_any_congested(struct dm_table *t, int bdi_bits);
-int dm_table_any_busy_target(struct dm_table *t);
 unsigned dm_table_get_type(struct dm_table *t);
 struct target_type *dm_table_get_immutable_target_type(struct dm_table *t);
 bool dm_table_request_based(struct dm_table *t);
-- 
1.9.3 (Apple Git-50)

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/8] dm: remove request-based logic from make_request_fn wrapper
  2015-03-04  0:47 [PATCH 0/8] dm: optimize request-based queue processing Mike Snitzer
  2015-03-04  0:47 ` [PATCH 1/8] dm: remove unnecessary wrapper around blk_lld_busy Mike Snitzer
  2015-03-04  0:47 ` [PATCH 2/8] dm: remove request-based DM queue's lld_busy_fn hook Mike Snitzer
@ 2015-03-04  0:47 ` Mike Snitzer
  2015-03-04  0:47 ` [PATCH 4/8] dm: only run the queue on completion if congested or no requests pending Mike Snitzer
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-04  0:47 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, jmoyer

The old dm_request() method used for q->make_request_fn had a branch for
request-based DM support but it isn't needed given that
dm_init_request_based_queue() sets it to the standard blk_queue_bio()
anyway.

Clean up dm_init_md_queue() to be DM device-type agnostic and have
dm_setup_md_queue() properly finish queue setup based on DM device-type
(bio-based vs request-based).

Also, remove the export for blk_queue_bio() now that DM no longer calls
it directly.  Doing so required a forward declaration in blk-core.c.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-core.c       |  5 +++--
 drivers/md/dm.c        | 29 ++++++++++++-----------------
 include/linux/blkdev.h |  2 --
 3 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 794c3e7..4bcd739 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -719,6 +719,8 @@ blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
 }
 EXPORT_SYMBOL(blk_init_queue_node);
 
+static void blk_queue_bio(struct request_queue *q, struct bio *bio);
+
 struct request_queue *
 blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
 			 spinlock_t *lock)
@@ -1563,7 +1565,7 @@ void init_request_from_bio(struct request *req, struct bio *bio)
 	blk_rq_bio_prep(req->q, req, bio);
 }
 
-void blk_queue_bio(struct request_queue *q, struct bio *bio)
+static void blk_queue_bio(struct request_queue *q, struct bio *bio)
 {
 	const bool sync = !!(bio->bi_rw & REQ_SYNC);
 	struct blk_plug *plug;
@@ -1671,7 +1673,6 @@ out_unlock:
 		spin_unlock_irq(q->queue_lock);
 	}
 }
-EXPORT_SYMBOL_GPL(blk_queue_bio);	/* for device mapper only */
 
 /*
  * If bio->bi_dev is a partition, remap the location
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index fcc3203..6a85bd4 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1690,7 +1690,7 @@ out:
  * The request function that just remaps the bio built up by
  * dm_merge_bvec.
  */
-static void _dm_request(struct request_queue *q, struct bio *bio)
+static void dm_make_request(struct request_queue *q, struct bio *bio)
 {
 	int rw = bio_data_dir(bio);
 	struct mapped_device *md = q->queuedata;
@@ -1722,16 +1722,6 @@ int dm_request_based(struct mapped_device *md)
 	return blk_queue_stackable(md->queue);
 }
 
-static void dm_request(struct request_queue *q, struct bio *bio)
-{
-	struct mapped_device *md = q->queuedata;
-
-	if (dm_request_based(md))
-		blk_queue_bio(q, bio);
-	else
-		_dm_request(q, bio);
-}
-
 static void dm_dispatch_clone_request(struct request *clone, struct request *rq)
 {
 	int r;
@@ -2097,9 +2087,8 @@ static void dm_init_md_queue(struct mapped_device *md)
 	md->queue->queuedata = md;
 	md->queue->backing_dev_info.congested_fn = dm_any_congested;
 	md->queue->backing_dev_info.congested_data = md;
-	blk_queue_make_request(md->queue, dm_request);
+
 	blk_queue_bounce_limit(md->queue, BLK_BOUNCE_ANY);
-	blk_queue_merge_bvec(md->queue, dm_merge_bvec);
 }
 
 /*
@@ -2330,7 +2319,7 @@ int dm_queue_merge_is_compulsory(struct request_queue *q)
 	if (!q->merge_bvec_fn)
 		return 0;
 
-	if (q->make_request_fn == dm_request) {
+	if (q->make_request_fn == dm_make_request) {
 		dev_md = q->queuedata;
 		if (test_bit(DMF_MERGE_IS_OPTIONAL, &dev_md->flags))
 			return 0;
@@ -2540,9 +2529,15 @@ static int dm_init_request_based_queue(struct mapped_device *md)
  */
 int dm_setup_md_queue(struct mapped_device *md)
 {
-	if (dm_md_type_request_based(md) && !dm_init_request_based_queue(md)) {
-		DMWARN("Cannot initialize queue for request-based mapped device");
-		return -EINVAL;
+	if (dm_md_type_request_based(md)) {
+		if (!dm_init_request_based_queue(md)) {
+			DMWARN("Cannot initialize queue for request-based mapped device");
+			return -EINVAL;
+		}
+	} else {
+		/* bio-based specific initialization */
+		blk_queue_make_request(md->queue, dm_make_request);
+		blk_queue_merge_bvec(md->queue, dm_merge_bvec);
 	}
 
 	return 0;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 7f9a516..5d93a66 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -821,8 +821,6 @@ extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 			 struct scsi_ioctl_command __user *);
 
-extern void blk_queue_bio(struct request_queue *q, struct bio *bio);
-
 /*
  * A queue has just exitted congestion.  Note this in the global counter of
  * congested queues, and wake up anyone who was waiting for requests to be
-- 
1.9.3 (Apple Git-50)

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/8] dm: only run the queue on completion if congested or no requests pending
  2015-03-04  0:47 [PATCH 0/8] dm: optimize request-based queue processing Mike Snitzer
                   ` (2 preceding siblings ...)
  2015-03-04  0:47 ` [PATCH 3/8] dm: remove request-based logic from make_request_fn wrapper Mike Snitzer
@ 2015-03-04  0:47 ` Mike Snitzer
  2015-03-04  0:47 ` [PATCH 5/8] dm: don't schedule delayed run of the queue if nothing to do Mike Snitzer
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-04  0:47 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, jmoyer

On really fast storage it can be beneficial to delay running the
request_queue to allow the elevator more opportunity to merge requests.

Otherwise, it has been observed that requests are being sent to
q->request_fn much quicker than is ideal on IOPS-bound backends.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 6a85bd4..8e58df5 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1021,10 +1021,13 @@ static void end_clone_bio(struct bio *clone, int error)
  */
 static void rq_completed(struct mapped_device *md, int rw, bool run_queue)
 {
+	int nr_requests_pending;
+
 	atomic_dec(&md->pending[rw]);
 
 	/* nudge anyone waiting on suspend queue */
-	if (!md_in_flight(md))
+	nr_requests_pending = md_in_flight(md);
+	if (!nr_requests_pending)
 		wake_up(&md->wait);
 
 	/*
@@ -1033,8 +1036,11 @@ static void rq_completed(struct mapped_device *md, int rw, bool run_queue)
 	 * back into ->request_fn() could deadlock attempting to grab the
 	 * queue lock again.
 	 */
-	if (run_queue)
-		blk_run_queue_async(md->queue);
+	if (run_queue) {
+		if (!nr_requests_pending ||
+		    (nr_requests_pending >= md->queue->nr_congestion_on))
+			blk_run_queue_async(md->queue);
+	}
 
 	/*
 	 * dm_put() must be at the end of this function. See the comment above
-- 
1.9.3 (Apple Git-50)

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/8] dm: don't schedule delayed run of the queue if nothing to do
  2015-03-04  0:47 [PATCH 0/8] dm: optimize request-based queue processing Mike Snitzer
                   ` (3 preceding siblings ...)
  2015-03-04  0:47 ` [PATCH 4/8] dm: only run the queue on completion if congested or no requests pending Mike Snitzer
@ 2015-03-04  0:47 ` Mike Snitzer
  2015-03-04  0:47 ` [PATCH 6/8] dm: don't start current request if it would've merged with the previous Mike Snitzer
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-04  0:47 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, jmoyer

In request-based DM's dm_request_fn(), if blk_peek_request() returns
NULL, just return.  This avoids an unnecessary blk_delay_queue().

Reported-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 8e58df5..c13477a 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1960,7 +1960,7 @@ static void dm_request_fn(struct request_queue *q)
 	while (!blk_queue_stopped(q)) {
 		rq = blk_peek_request(q);
 		if (!rq)
-			goto delay_and_out;
+			goto out;
 
 		/* always use block 0 to find the target for flushes for now */
 		pos = 0;
-- 
1.9.3 (Apple Git-50)

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/8] dm: don't start current request if it would've merged with the previous
  2015-03-04  0:47 [PATCH 0/8] dm: optimize request-based queue processing Mike Snitzer
                   ` (4 preceding siblings ...)
  2015-03-04  0:47 ` [PATCH 5/8] dm: don't schedule delayed run of the queue if nothing to do Mike Snitzer
@ 2015-03-04  0:47 ` Mike Snitzer
  2015-03-04  6:36   ` Hannes Reinecke
  2015-03-09  1:04   ` Junichi Nomura
  2015-03-04  0:47 ` [PATCH 7/8] dm sysfs: introduce ability to add writable attributes Mike Snitzer
  2015-03-04  0:47 ` [PATCH 8/8] dm: impose configurable deadline for dm_request_fn's merge heuristic Mike Snitzer
  7 siblings, 2 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-04  0:47 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, jmoyer

Request-based DM's dm_request_fn() is so fast to pull requests off the
queue that steps need to be taken to promote merging by avoiding request
processing if it makes sense.

If the current request would've merged with the previous request, let the
current request stay on the queue longer.

Suggested-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index c13477a..3242f4c 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -21,6 +21,7 @@
 #include <linux/delay.h>
 #include <linux/wait.h>
 #include <linux/kthread.h>
+#include <linux/elevator.h> /* for rq_end_sector() */
 
 #include <trace/events/block.h>
 
@@ -216,6 +217,10 @@ struct mapped_device {
 
 	struct kthread_worker kworker;
 	struct task_struct *kworker_task;
+
+	/* for request-based merge heuristic in dm_request_fn() */
+	sector_t last_rq_pos;
+	int last_rq_rw;
 };
 
 /*
@@ -1927,6 +1932,9 @@ static void dm_start_request(struct mapped_device *md, struct request *orig)
 	blk_start_request(orig);
 	atomic_inc(&md->pending[rq_data_dir(orig)]);
 
+	md->last_rq_pos = rq_end_sector(orig);
+	md->last_rq_rw = rq_data_dir(orig);
+
 	/*
 	 * Hold the md reference here for the in-flight I/O.
 	 * We can't rely on the reference count by device opener,
@@ -1979,6 +1987,10 @@ static void dm_request_fn(struct request_queue *q)
 			continue;
 		}
 
+		if (md_in_flight(md) && rq->bio && rq->bio->bi_vcnt == 1 &&
+		    md->last_rq_pos == pos && md->last_rq_rw == rq_data_dir(rq))
+			goto delay_and_out;
+
 		if (ti->type->busy && ti->type->busy(ti))
 			goto delay_and_out;
 
-- 
1.9.3 (Apple Git-50)

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 7/8] dm sysfs: introduce ability to add writable attributes
  2015-03-04  0:47 [PATCH 0/8] dm: optimize request-based queue processing Mike Snitzer
                   ` (5 preceding siblings ...)
  2015-03-04  0:47 ` [PATCH 6/8] dm: don't start current request if it would've merged with the previous Mike Snitzer
@ 2015-03-04  0:47 ` Mike Snitzer
  2015-03-04  0:47 ` [PATCH 8/8] dm: impose configurable deadline for dm_request_fn's merge heuristic Mike Snitzer
  7 siblings, 0 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-04  0:47 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, jmoyer

Add DM_ATTR_RW() macro and establish .store method in dm_sysfs_ops.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-sysfs.c | 32 +++++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-sysfs.c b/drivers/md/dm-sysfs.c
index c62c5ab..1271c31 100644
--- a/drivers/md/dm-sysfs.c
+++ b/drivers/md/dm-sysfs.c
@@ -11,7 +11,7 @@
 struct dm_sysfs_attr {
 	struct attribute attr;
 	ssize_t (*show)(struct mapped_device *, char *);
-	ssize_t (*store)(struct mapped_device *, char *);
+	ssize_t (*store)(struct mapped_device *, const char *, size_t count);
 };
 
 #define DM_ATTR_RO(_name) \
@@ -39,6 +39,31 @@ static ssize_t dm_attr_show(struct kobject *kobj, struct attribute *attr,
 	return ret;
 }
 
+#define DM_ATTR_RW(_name) \
+struct dm_sysfs_attr dm_attr_##_name = \
+	__ATTR(_name, S_IRUGO | S_IWUSR, dm_attr_##_name##_show, dm_attr_##_name##_store)
+
+static ssize_t dm_attr_store(struct kobject *kobj, struct attribute *attr,
+			     const char *page, size_t count)
+{
+	struct dm_sysfs_attr *dm_attr;
+	struct mapped_device *md;
+	ssize_t ret;
+
+	dm_attr = container_of(attr, struct dm_sysfs_attr, attr);
+	if (!dm_attr->store)
+		return -EIO;
+
+	md = dm_get_from_kobject(kobj);
+	if (!md)
+		return -EINVAL;
+
+	ret = dm_attr->store(md, page, count);
+	dm_put(md);
+
+	return ret;
+}
+
 static ssize_t dm_attr_name_show(struct mapped_device *md, char *buf)
 {
 	if (dm_copy_name_and_uuid(md, buf, NULL))
@@ -77,12 +102,9 @@ static struct attribute *dm_attrs[] = {
 
 static const struct sysfs_ops dm_sysfs_ops = {
 	.show	= dm_attr_show,
+	.store	= dm_attr_store,
 };
 
-/*
- * dm kobject is embedded in mapped_device structure
- * no need to define release function here
- */
 static struct kobj_type dm_ktype = {
 	.sysfs_ops	= &dm_sysfs_ops,
 	.default_attrs	= dm_attrs,
-- 
1.9.3 (Apple Git-50)

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 8/8] dm: impose configurable deadline for dm_request_fn's merge heuristic
  2015-03-04  0:47 [PATCH 0/8] dm: optimize request-based queue processing Mike Snitzer
                   ` (6 preceding siblings ...)
  2015-03-04  0:47 ` [PATCH 7/8] dm sysfs: introduce ability to add writable attributes Mike Snitzer
@ 2015-03-04  0:47 ` Mike Snitzer
  2015-03-06 15:30   ` [PATCH 8/8 v2] " Mike Snitzer
  7 siblings, 1 reply; 19+ messages in thread
From: Mike Snitzer @ 2015-03-04  0:47 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, jmoyer

Otherwise, for sequential workloads, the dm_request_fn can allow
excessive request merging at the expense of increased service time.

Add a per-device sysfs parameter to allow the user to control how long a
request that is a reasonable merge candidate can be queued on the
request queue.  The resolution of this request dispatch deadline is in
microseconds (ranging from 1 to 100000 usecs).  To set a 0.3ms deadline:
  echo 300 > /sys/block/dm-7/dm/rq_based_queue_deadline

The dm_request_fn's merge heuristic and associated extra accounting are
disabled by default (rq_based_queue_deadline is 0).

This parameter is not applicable to bio-based DM devices so it will only
ever report 0 for them.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-sysfs.c |  2 ++
 drivers/md/dm.c       | 58 +++++++++++++++++++++++++++++++++++++++++++++++----
 drivers/md/dm.h       |  4 ++++
 3 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm-sysfs.c b/drivers/md/dm-sysfs.c
index 1271c31..cf5f83b 100644
--- a/drivers/md/dm-sysfs.c
+++ b/drivers/md/dm-sysfs.c
@@ -92,11 +92,13 @@ static ssize_t dm_attr_suspended_show(struct mapped_device *md, char *buf)
 static DM_ATTR_RO(name);
 static DM_ATTR_RO(uuid);
 static DM_ATTR_RO(suspended);
+static DM_ATTR_RW(rq_based_queue_deadline);
 
 static struct attribute *dm_attrs[] = {
 	&dm_attr_name.attr,
 	&dm_attr_uuid.attr,
 	&dm_attr_suspended.attr,
+	&dm_attr_rq_based_queue_deadline.attr,
 	NULL,
 };
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3242f4c..b38361c 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -21,6 +21,7 @@
 #include <linux/delay.h>
 #include <linux/wait.h>
 #include <linux/kthread.h>
+#include <linux/ktime.h>
 #include <linux/elevator.h> /* for rq_end_sector() */
 
 #include <trace/events/block.h>
@@ -219,8 +220,10 @@ struct mapped_device {
 	struct task_struct *kworker_task;
 
 	/* for request-based merge heuristic in dm_request_fn() */
-	sector_t last_rq_pos;
+	unsigned rq_based_queue_deadline_usecs;
 	int last_rq_rw;
+	sector_t last_rq_pos;
+	ktime_t last_rq_start_time;
 };
 
 /*
@@ -1932,8 +1935,11 @@ static void dm_start_request(struct mapped_device *md, struct request *orig)
 	blk_start_request(orig);
 	atomic_inc(&md->pending[rq_data_dir(orig)]);
 
-	md->last_rq_pos = rq_end_sector(orig);
-	md->last_rq_rw = rq_data_dir(orig);
+	if (md->rq_based_queue_deadline_usecs) {
+		md->last_rq_pos = rq_end_sector(orig);
+		md->last_rq_rw = rq_data_dir(orig);
+		md->last_rq_start_time = ktime_get();
+	}
 
 	/*
 	 * Hold the md reference here for the in-flight I/O.
@@ -1945,6 +1951,46 @@ static void dm_start_request(struct mapped_device *md, struct request *orig)
 	dm_get(md);
 }
 
+#define MAX_QUEUE_DEADLINE_USECS 100000 /* 100 ms */
+
+ssize_t dm_attr_rq_based_queue_deadline_show(struct mapped_device *md, char *buf)
+{
+	return sprintf(buf, "%u\n", md->rq_based_queue_deadline_usecs);
+}
+
+ssize_t dm_attr_rq_based_queue_deadline_store(struct mapped_device *md,
+					      const char *buf, size_t count)
+{
+	unsigned deadline;
+
+	if (!dm_request_based(md))
+		return count;
+
+	if (kstrtouint(buf, 10, &deadline))
+		return -EINVAL;
+
+	if (deadline > MAX_QUEUE_DEADLINE_USECS)
+		deadline = MAX_QUEUE_DEADLINE_USECS;
+
+	md->rq_based_queue_deadline_usecs = deadline;
+
+	return count;
+}
+
+static bool dm_request_dispatched_before_queue_deadline(struct mapped_device *md)
+{
+	ktime_t kt_now, kt_deadline;
+
+	if (!md->rq_based_queue_deadline_usecs)
+		return false;
+
+	kt_now = ktime_get();
+	kt_deadline = ns_to_ktime((u64)md->rq_based_queue_deadline_usecs * NSEC_PER_USEC);
+	kt_deadline = ktime_add_safe(md->last_rq_start_time, kt_deadline);
+
+	return !ktime_after(kt_now, kt_deadline);
+}
+
 /*
  * q->request_fn for request-based dm.
  * Called with the queue lock held.
@@ -1987,7 +2033,8 @@ static void dm_request_fn(struct request_queue *q)
 			continue;
 		}
 
-		if (md_in_flight(md) && rq->bio && rq->bio->bi_vcnt == 1 &&
+		if (dm_request_dispatched_before_queue_deadline(md) &&
+		    md_in_flight(md) && rq->bio && rq->bio->bi_vcnt == 1 &&
 		    md->last_rq_pos == pos && md->last_rq_rw == rq_data_dir(rq))
 			goto delay_and_out;
 
@@ -2527,6 +2574,9 @@ static int dm_init_request_based_queue(struct mapped_device *md)
 	if (!q)
 		return 0;
 
+	/* disable dm_request_fn's merge heuristic by default */
+	md->rq_based_queue_deadline_usecs = 0;
+
 	md->queue = q;
 	dm_init_md_queue(md);
 	blk_queue_softirq_done(md->queue, dm_softirq_done);
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index db49586..722fc0f 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -234,4 +234,8 @@ static inline bool dm_message_test_buffer_overflow(char *result, unsigned maxlen
 	return !maxlen || strlen(result) + 1 >= maxlen;
 }
 
+ssize_t dm_attr_rq_based_queue_deadline_show(struct mapped_device *md, char *buf);
+ssize_t dm_attr_rq_based_queue_deadline_store(struct mapped_device *md,
+					      const char *buf, size_t count);
+
 #endif
-- 
1.9.3 (Apple Git-50)

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/8] dm: don't start current request if it would've merged with the previous
  2015-03-04  0:47 ` [PATCH 6/8] dm: don't start current request if it would've merged with the previous Mike Snitzer
@ 2015-03-04  6:36   ` Hannes Reinecke
  2015-03-04 17:26     ` Mike Snitzer
  2015-03-09  1:04   ` Junichi Nomura
  1 sibling, 1 reply; 19+ messages in thread
From: Hannes Reinecke @ 2015-03-04  6:36 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel

On 03/04/2015 01:47 AM, Mike Snitzer wrote:
> Request-based DM's dm_request_fn() is so fast to pull requests off the
> queue that steps need to be taken to promote merging by avoiding request
> processing if it makes sense.
> 
> If the current request would've merged with previous request let the
> current request stay on the queue longer.
> 
> Suggested-by: Jens Axboe <axboe@fb.com>
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
>  drivers/md/dm.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index c13477a..3242f4c 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -21,6 +21,7 @@
>  #include <linux/delay.h>
>  #include <linux/wait.h>
>  #include <linux/kthread.h>
> +#include <linux/elevator.h> /* for rq_end_sector() */
>  
>  #include <trace/events/block.h>
>  
> @@ -216,6 +217,10 @@ struct mapped_device {
>  
>  	struct kthread_worker kworker;
>  	struct task_struct *kworker_task;
> +
> +	/* for request-based merge heuristic in dm_request_fn() */
> +	sector_t last_rq_pos;
> +	int last_rq_rw;
>  };
>  
>  /*
> @@ -1927,6 +1932,9 @@ static void dm_start_request(struct mapped_device *md, struct request *orig)
>  	blk_start_request(orig);
>  	atomic_inc(&md->pending[rq_data_dir(orig)]);
>  
> +	md->last_rq_pos = rq_end_sector(orig);
> +	md->last_rq_rw = rq_data_dir(orig);
> +
>  	/*
>  	 * Hold the md reference here for the in-flight I/O.
>  	 * We can't rely on the reference count by device opener,
> @@ -1979,6 +1987,10 @@ static void dm_request_fn(struct request_queue *q)
>  			continue;
>  		}
>  
> +		if (md_in_flight(md) && rq->bio && rq->bio->bi_vcnt == 1 &&
> +		    md->last_rq_pos == pos && md->last_rq_rw == rq_data_dir(rq))
> +			goto delay_and_out;
> +
>  		if (ti->type->busy && ti->type->busy(ti))
>  			goto delay_and_out;
>  
> 
That's one of the things I'm slightly uneasy with.

It essentially means we're out-guessing the I/O scheduler here, and
assuming that _any_ scheduler will merge requests which are aligned.
While this is apparently true for the existing schedulers, is this a
requirement for any I/O scheduler?

I"ll probably show my ignorance here, but why can we even pull these
requests off the request queue?
_Especially_ as they'll be merged with another bio/request just a few us
later?

Nevertheless, thank you Mike for these patches. They helped a lot.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/8] dm: don't start current request if it would've merged with the previous
  2015-03-04  6:36   ` Hannes Reinecke
@ 2015-03-04 17:26     ` Mike Snitzer
  0 siblings, 0 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-04 17:26 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel

On Wed, Mar 04 2015 at  1:36am -0500,
Hannes Reinecke <hare@suse.de> wrote:

> On 03/04/2015 01:47 AM, Mike Snitzer wrote:
> > Request-based DM's dm_request_fn() is so fast to pull requests off the
> > queue that steps need to be taken to promote merging by avoiding request
> > processing if it makes sense.
> > 
> > If the current request would've merged with previous request let the
> > current request stay on the queue longer.
> > 
> > Suggested-by: Jens Axboe <axboe@fb.com>
> > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > ---
> >  drivers/md/dm.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index c13477a..3242f4c 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -1927,6 +1932,9 @@ static void dm_start_request(struct mapped_device *md, struct request *orig)
> >  	blk_start_request(orig);
> >  	atomic_inc(&md->pending[rq_data_dir(orig)]);
> >  
> > +	md->last_rq_pos = rq_end_sector(orig);
> > +	md->last_rq_rw = rq_data_dir(orig);
> > +
> >  	/*
> >  	 * Hold the md reference here for the in-flight I/O.
> >  	 * We can't rely on the reference count by device opener,
> > @@ -1979,6 +1987,10 @@ static void dm_request_fn(struct request_queue *q)
> >  			continue;
> >  		}
> >  
> > +		if (md_in_flight(md) && rq->bio && rq->bio->bi_vcnt == 1 &&
> > +		    md->last_rq_pos == pos && md->last_rq_rw == rq_data_dir(rq))
> > +			goto delay_and_out;
> > +
> >  		if (ti->type->busy && ti->type->busy(ti))
> >  			goto delay_and_out;
> >  
> > 
> That's one of the things I'm slightly uneasy with.
> 
> It essentially means we're out-guessing the I/O scheduler here, and
> assume that _any_ scheduler will be merging requests which are aligned.
> While this is apparently true for the existing schedulers, is this a
> requirement for any I/O scheduler?

I'm not too interested in predicting the future of IO schedulers.  All
existing IO schedulers at least provide merging (noop included); and
Jens said that even blk-mq provides merging (I've yet to explore that
though).

But yes, needing to second-guess whether the block layer is hurting
throughput by dispatching requests is unfortunate.  Ultimately the
heuristic used here _could_ (should?) be elevated into the block core.
Jens already said that blk-mq will suffer from this same problem (the
queue will be kicked so fast that the window of opportunity to merge
becomes non-existent).

> I"ll probably show my ignorance here, but why can we even pull these
> requests off the request queue?
> _Especially_ as they'll be merged with another bio/request just a few us
> later?

dm_request_fn() hasn't pulled the request off the queue at the time the
above heuristic is used.  It is actually determining whether it is best to
leave the request in question on the queue (where it is currently).  It
isn't until blk_start_request() that the request is taken off the queue
(via blk_dequeue_request).
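
For illustration, here is a simplified sketch of that flow after patches 5
and 6 (not the literal dm.c code -- the function name, the delay value and
the omitted locking/table/flush handling are illustrative only):

static void dm_request_fn_sketch(struct request_queue *q)
{
	struct mapped_device *md = q->queuedata;
	struct request *rq;

	while (!blk_queue_stopped(q)) {
		/* peek only -- the request stays on the queue for now */
		rq = blk_peek_request(q);
		if (!rq)
			return;	/* patch 5: no delayed queue run if nothing to do */

		/*
		 * Patch 6 heuristic: if this request looks like it would have
		 * merged with the previously started one, leave it on the
		 * queue a bit longer so subsequent IO can merge with it.
		 */
		if (md_in_flight(md) && rq->bio && rq->bio->bi_vcnt == 1 &&
		    md->last_rq_pos == blk_rq_pos(rq) &&
		    md->last_rq_rw == rq_data_dir(rq)) {
			blk_delay_queue(q, HZ / 100);	/* illustrative delay */
			return;
		}

		/*
		 * Only here is the request actually dequeued:
		 * dm_start_request() calls blk_start_request(), which
		 * dequeues it via blk_dequeue_request().
		 */
		dm_start_request(md, rq);
		/* ... clone and dispatch to the underlying path ... */
	}
}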

> Nevertheless, thank you Mike for these patches. They helped a lot.

No problem, I'm still exploring how we might _efficiently_ autotune this
parameter.  Certainly not an easy problem.  But given this is to be
paired with multipath, the performance profile can change as paths come
and go (NetApp found ~10us is best with 4 paths, and ~25us with 2
paths).  So it would seem a fixed deadline only gets us
so far.

That said, this is a bit of a unique problem in that the NetApp backend
is IOPS bound.  Other hardware could easily have a similar constraint;
it is this constraint that makes this problem somewhat elusive (at
least in terms of a "simple" yet complete solution).

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 8/8 v2] dm: impose configurable deadline for dm_request_fn's merge heuristic
  2015-03-04  0:47 ` [PATCH 8/8] dm: impose configurable deadline for dm_request_fn's merge heuristic Mike Snitzer
@ 2015-03-06 15:30   ` Mike Snitzer
  0 siblings, 0 replies; 19+ messages in thread
From: Mike Snitzer @ 2015-03-06 15:30 UTC (permalink / raw)
  To: dm-devel; +Cc: axboe, jmoyer

Otherwise, for sequential workloads, the dm_request_fn can allow
excessive request merging at the expense of increased service time.

Add a per-device sysfs attribute to allow the user to control how long a
request that is a reasonable merge candidate can be queued on the
request queue.  The resolution of this request dispatch deadline is in
microseconds (ranging from 1 to 100000 usecs).  To set a 20us deadline:
  echo 20 > /sys/block/dm-7/dm/rq_based_seq_io_merge_deadline

The dm_request_fn's merge heuristic and associated extra accounting are
disabled by default (rq_based_seq_io_merge_deadline is 0).

This sysfs attribute is not applicable to bio-based DM devices so it
will only ever report 0 for them.

By allowing a request to remain on the queue it will block other
requests on the queue.  But introducing a short dequeue delay has proven
very effective at enabling certain sequential IO workloads on really
fast, yet IOPS constrained, devices to build up slightly larger IOs --
yielding 90+% throughput improvements.  Having precise control over the
time taken to wait for larger requests to build affords control beyond
that of waiting for certain IO sizes to accumulate (which would require
a deadline anyway).  This knob will only ever make sense with sequential
IO workloads and the particular value used is storage configuration
specific.

Given the expected niche use-case for when this knob is useful it has
been deemed acceptable to expose this relatively crude method for
crafting optimal IO on specific storage -- especially given the solution
is simple yet effective.  In the context of DM multipath, it is
advisable to tune this sysfs attribute to a value that offers the best
performance for the common case (e.g. if 4 paths are expected active,
tune for that; if paths fail then performance may be slightly reduced).

Alternatives were explored to have request-based DM autotune this value
(e.g. if/when paths fail) but they were quickly deemed too fragile and
complex to warrant further design and development time.  If this problem
proves more common as faster storage emerges we'll have to look at
elevating a generic solution into the block core.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 Documentation/ABI/testing/sysfs-block-dm | 14 ++++++++
 drivers/md/dm-sysfs.c                    |  2 ++
 drivers/md/dm.c                          | 57 +++++++++++++++++++++++++++++---
 drivers/md/dm.h                          |  4 +++
 4 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-block-dm b/Documentation/ABI/testing/sysfs-block-dm
index 87ca569..1becc94 100644
--- a/Documentation/ABI/testing/sysfs-block-dm
+++ b/Documentation/ABI/testing/sysfs-block-dm
@@ -23,3 +23,17 @@ Description:	Device-mapper device suspend state.
 		Contains the value 1 while the device is suspended.
 		Otherwise it contains 0. Read-only attribute.
 Users:		util-linux, device-mapper udev rules
+
+What:		/sys/block/dm-<num>/dm/rq_based_seq_io_merge_deadline
+Date:		March 2015
+KernelVersion:	4.0
+Contact:	dm-devel@redhat.com
+Description:	Allow control over how long a request that is a
+		reasonable merge candidate can be queued on the request
+		queue.  The resolution of this deadline is in
+		microseconds (ranging from 1 to 100000 usecs).
+		Setting this attribute to 0 (the default) will disable
+		request-based DM's merge heuristic and associated extra
+		accounting.  This attribute is not applicable to
+		bio-based DM devices so it will only ever report 0 for
+		them.
diff --git a/drivers/md/dm-sysfs.c b/drivers/md/dm-sysfs.c
index 1271c31..f5bb394 100644
--- a/drivers/md/dm-sysfs.c
+++ b/drivers/md/dm-sysfs.c
@@ -92,11 +92,13 @@ static ssize_t dm_attr_suspended_show(struct mapped_device *md, char *buf)
 static DM_ATTR_RO(name);
 static DM_ATTR_RO(uuid);
 static DM_ATTR_RO(suspended);
+static DM_ATTR_RW(rq_based_seq_io_merge_deadline);
 
 static struct attribute *dm_attrs[] = {
 	&dm_attr_name.attr,
 	&dm_attr_uuid.attr,
 	&dm_attr_suspended.attr,
+	&dm_attr_rq_based_seq_io_merge_deadline.attr,
 	NULL,
 };
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 6efac0c..0c51c01 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -21,6 +21,7 @@
 #include <linux/delay.h>
 #include <linux/wait.h>
 #include <linux/kthread.h>
+#include <linux/ktime.h>
 #include <linux/elevator.h> /* for rq_end_sector() */
 
 #include <trace/events/block.h>
@@ -219,8 +220,10 @@ struct mapped_device {
 	struct task_struct *kworker_task;
 
 	/* for request-based merge heuristic in dm_request_fn() */
-	sector_t last_rq_pos;
+	unsigned seq_rq_merge_deadline_usecs;
 	int last_rq_rw;
+	sector_t last_rq_pos;
+	ktime_t last_rq_start_time;
 };
 
 /*
@@ -1932,8 +1935,11 @@ static void dm_start_request(struct mapped_device *md, struct request *orig)
 	blk_start_request(orig);
 	atomic_inc(&md->pending[rq_data_dir(orig)]);
 
-	md->last_rq_pos = rq_end_sector(orig);
-	md->last_rq_rw = rq_data_dir(orig);
+	if (md->seq_rq_merge_deadline_usecs) {
+		md->last_rq_pos = rq_end_sector(orig);
+		md->last_rq_rw = rq_data_dir(orig);
+		md->last_rq_start_time = ktime_get();
+	}
 
 	/*
 	 * Hold the md reference here for the in-flight I/O.
@@ -1945,6 +1951,45 @@ static void dm_start_request(struct mapped_device *md, struct request *orig)
 	dm_get(md);
 }
 
+#define MAX_SEQ_RQ_MERGE_DEADLINE_USECS 100000
+
+ssize_t dm_attr_rq_based_seq_io_merge_deadline_show(struct mapped_device *md, char *buf)
+{
+	return sprintf(buf, "%u\n", md->seq_rq_merge_deadline_usecs);
+}
+
+ssize_t dm_attr_rq_based_seq_io_merge_deadline_store(struct mapped_device *md,
+						     const char *buf, size_t count)
+{
+	unsigned deadline;
+
+	if (!dm_request_based(md))
+		return count;
+
+	if (kstrtouint(buf, 10, &deadline))
+		return -EINVAL;
+
+	if (deadline > MAX_SEQ_RQ_MERGE_DEADLINE_USECS)
+		deadline = MAX_SEQ_RQ_MERGE_DEADLINE_USECS;
+
+	md->seq_rq_merge_deadline_usecs = deadline;
+
+	return count;
+}
+
+static bool dm_request_peeked_before_merge_deadline(struct mapped_device *md)
+{
+	ktime_t kt_deadline;
+
+	if (!md->seq_rq_merge_deadline_usecs)
+		return false;
+
+	kt_deadline = ns_to_ktime((u64)md->seq_rq_merge_deadline_usecs * NSEC_PER_USEC);
+	kt_deadline = ktime_add_safe(md->last_rq_start_time, kt_deadline);
+
+	return !ktime_after(ktime_get(), kt_deadline);
+}
+
 /*
  * q->request_fn for request-based dm.
  * Called with the queue lock held.
@@ -1987,7 +2032,8 @@ static void dm_request_fn(struct request_queue *q)
 			continue;
 		}
 
-		if (md_in_flight(md) && rq->bio && rq->bio->bi_vcnt == 1 &&
+		if (dm_request_peeked_before_merge_deadline(md) &&
+		    md_in_flight(md) && rq->bio && rq->bio->bi_vcnt == 1 &&
 		    md->last_rq_pos == pos && md->last_rq_rw == rq_data_dir(rq))
 			goto delay_and_out;
 
@@ -2527,6 +2573,9 @@ static int dm_init_request_based_queue(struct mapped_device *md)
 	if (!q)
 		return 0;
 
+	/* disable dm_request_fn's merge heuristic by default */
+	md->seq_rq_merge_deadline_usecs = 0;
+
 	md->queue = q;
 	dm_init_md_queue(md);
 	blk_queue_softirq_done(md->queue, dm_softirq_done);
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index db49586..5522422 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -234,4 +234,8 @@ static inline bool dm_message_test_buffer_overflow(char *result, unsigned maxlen
 	return !maxlen || strlen(result) + 1 >= maxlen;
 }
 
+ssize_t dm_attr_rq_based_seq_io_merge_deadline_show(struct mapped_device *md, char *buf);
+ssize_t dm_attr_rq_based_seq_io_merge_deadline_store(struct mapped_device *md,
+						     const char *buf, size_t count);
+
 #endif
-- 
1.9.3 (Apple Git-50)

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/8] dm: don't start current request if it would've merged with the previous
  2015-03-04  0:47 ` [PATCH 6/8] dm: don't start current request if it would've merged with the previous Mike Snitzer
  2015-03-04  6:36   ` Hannes Reinecke
@ 2015-03-09  1:04   ` Junichi Nomura
  2015-03-09  3:30     ` Merla, ShivaKrishna
  1 sibling, 1 reply; 19+ messages in thread
From: Junichi Nomura @ 2015-03-09  1:04 UTC (permalink / raw)
  To: device-mapper development, Mike Snitzer
  Cc: axboe, jmoyer, Shiva Krishna Merla

On 03/04/15 09:47, Mike Snitzer wrote:
> Request-based DM's dm_request_fn() is so fast to pull requests off the
> queue that steps need to be taken to promote merging by avoiding request
> processing if it makes sense.
> 
> If the current request would've merged with previous request let the
> current request stay on the queue longer.

Hi Mike,

Looking at this thread, I think there are two different problems mixed together.

Firstly, "/dev/skd" is STEC S1120 block driver, which doesn't have
lld_busy function. So back pressure doesn't propagate to request-based
dm device and dm feeds as many request as possible to the lower driver.
("pulling too fast" situation)
If you still have access to the device, can you try the patch like
the attached one?

Secondly, for this comment from Merla ShivaKrishna:

> Yes, indeed this is the exact issue we saw at NetApp. While running sequential
> 4K write I/O with large thread count, 2 paths yield better performance than
> 4 paths and performance drastically drops with 4 paths. The device queue_depth
> as 32 and with blktrace we could see better I/O merging happening and average 
> request size was > 8K through iostat. With 4 paths none of the I/O gets merged and
> always average request size is 4K. Scheduler used was noop as we are using SSD 
> based storage. We could get I/O merging to happen even with 4 paths but with lower
> device queue_depth of 16. Even then the performance was lacking compared to 2 paths.

Have you tried increasing nr_requests of the dm device?
E.g. setting nr_requests to 256.

4 paths with a queue depth of 32 each means there can be 128 I/Os in flight.
With the default nr_requests value of 128, the request queue is almost
always empty and I/O merging cannot happen.
Increasing nr_requests of the dm device allows some more requests to be
queued, so the chance of merging may increase.
Reducing the lower device queue depth could be another solution. But if
the depth is too low, you might not be able to sustain the optimal speed.

----
Jun'ichi Nomura, NEC Corporation


[PATCH] skd: Add lld_busy function for request-based stacking driver

diff --git a/drivers/block/skd_main.c b/drivers/block/skd_main.c
index 1e46eb2..0e8f466 100644
--- a/drivers/block/skd_main.c
+++ b/drivers/block/skd_main.c
@@ -565,6 +565,16 @@ skd_prep_discard_cdb(struct skd_scsi_request *scsi_req,
 	blk_add_request_payload(req, page, len);
 }
 
+static int skd_lld_busy(struct request_queue *q)
+{
+	struct skd_device *skdev = q->queuedata;
+
+	if (skdev->in_flight >= skdev->cur_max_queue_depth)
+		return 1;
+
+	return 0;
+}
+
 static void skd_request_fn_not_online(struct request_queue *q);
 
 static void skd_request_fn(struct request_queue *q)
@@ -4419,6 +4429,8 @@ static int skd_cons_disk(struct skd_device *skdev)
 	/* set sysfs ptimal_io_size to 8K */
 	blk_queue_io_opt(q, 8192);
 
+	/* register feed back function for stacking driver */
+	blk_queue_lld_busy(q, skd_lld_busy);
+
 	/* DISCARD Flag initialization. */
 	q->limits.discard_granularity = 8192;
 	q->limits.discard_alignment = 0;

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/8] dm: don't start current request if it would've merged with the previous
  2015-03-09  1:04   ` Junichi Nomura
@ 2015-03-09  3:30     ` Merla, ShivaKrishna
  2015-03-09  6:09       ` Junichi Nomura
  0 siblings, 1 reply; 19+ messages in thread
From: Merla, ShivaKrishna @ 2015-03-09  3:30 UTC (permalink / raw)
  To: Junichi Nomura, device-mapper development, Mike Snitzer; +Cc: axboe, jmoyer



> -----Original Message-----
> From: Junichi Nomura [mailto:j-nomura@ce.jp.nec.com]
> Sent: Sunday, March 08, 2015 8:05 PM
> To: device-mapper development; Mike Snitzer
> Cc: axboe@kernel.dk; jmoyer@redhat.com; Hannes Reinecke; Merla,
> ShivaKrishna
> Subject: Re: [dm-devel] [PATCH 6/8] dm: don't start current request if it
> would've merged with the previous
> 
> On 03/04/15 09:47, Mike Snitzer wrote:
> > Request-based DM's dm_request_fn() is so fast to pull requests off the
> > queue that steps need to be taken to promote merging by avoiding
> request
> > processing if it makes sense.
> >
> > If the current request would've merged with previous request let the
> > current request stay on the queue longer.
> 
> Hi Mike,
> 
> Looking at this thread, I think there are 2 different problems mixed.
> 
> Firstly, "/dev/skd" is STEC S1120 block driver, which doesn't have
> lld_busy function. So back pressure doesn't propagate to request-based
> dm device and dm feeds as many request as possible to the lower driver.
> ("pulling too fast" situation)
> If you still have access to the device, can you try the patch like
> the attached one?
> 
> Secondly, for this comment from Merla ShivaKrishna:
> 
> > Yes, Indeed this the exact issue we saw at NetApp. While running
> sequential
> > 4K write I/O with large thread count, 2 paths yield better performance than
> > 4 paths and performance drastically drops with 4 paths. The device
> queue_depth
> > as 32 and with blktrace we could see better I/O merging happening and
> average
> > request size was > 8K through iostat. With 4 paths none of the I/O gets
> merged and
> > always average request size is 4K. Scheduler used was noop as we are using
> SSD
> > based storage. We could get I/O merging to happen even with 4 paths but
> with lower
> > device queue_depth of 16. Even then the performance was lacking
> compared to 2 paths.
> 
> Have you tried increasing nr_requests of the dm device?
> E.g. setting nr_requests to 256.
> 
> 4 paths with each queue depth 32 means that it can have 128 I/Os in flight.
> With the default value of nr_requests 128, the request queue is almost
> always empty and I/O merge could not happen.
> Increasing nr_requests of the dm device allows some more requests
> queued,
> thus the chance of merging may increase.
> Reducing the lower device queue depth could be another solution. But if
> the depth is too low, you might not be able to keep the optimal speed.
>
Yes, we have tried this as well but it didn't help. Indeed, we tested with a queue_depth
of 16 on each path as well, with 64 I/Os in flight, and it resulted in the same issue. We did try
reducing the queue_depth with 4 paths, but couldn't achieve performance comparable
to that of 2 paths. With Mike's patch, we see tremendous improvement with just a small delay
of ~20us with 4 paths. This might vary with different configurations, but it has certainly proved
that a tunable to delay dispatches for sequential workloads helps a lot.

 
> ----
> Jun'ichi Nomura, NEC Corporation
> 
> 
> [PATCH] skd: Add lld_busy function for request-based stacking driver
> 
> diff --git a/drivers/block/skd_main.c b/drivers/block/skd_main.c
> index 1e46eb2..0e8f466 100644
> --- a/drivers/block/skd_main.c
> +++ b/drivers/block/skd_main.c
> @@ -565,6 +565,16 @@ skd_prep_discard_cdb(struct skd_scsi_request
> *scsi_req,
>  	blk_add_request_payload(req, page, len);
>  }
> 
> +static int skd_lld_busy(struct request_queue *q)
> +{
> +	struct skd_device *skdev = q->queuedata;
> +
> +	if (skdev->in_flight >= skdev->cur_max_queue_depth)
> +		return 1;
> +
> +	return 0;
> +}
> +
>  static void skd_request_fn_not_online(struct request_queue *q);
> 
>  static void skd_request_fn(struct request_queue *q)
> @@ -4419,6 +4429,8 @@ static int skd_cons_disk(struct skd_device *skdev)
>  	/* set sysfs ptimal_io_size to 8K */
>  	blk_queue_io_opt(q, 8192);
> 
> +	/* register feed back function for stacking driver */
> +	blk_queue_lld_busy(q, skd_lld_busy);
> +
>  	/* DISCARD Flag initialization. */
>  	q->limits.discard_granularity = 8192;
>  	q->limits.discard_alignment = 0;

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/8] dm: don't start current request if it would've merged with the previous
  2015-03-09  3:30     ` Merla, ShivaKrishna
@ 2015-03-09  6:09       ` Junichi Nomura
  2015-03-09 16:10         ` Merla, ShivaKrishna
  0 siblings, 1 reply; 19+ messages in thread
From: Junichi Nomura @ 2015-03-09  6:09 UTC (permalink / raw)
  To: Merla, ShivaKrishna
  Cc: axboe, device-mapper development, jmoyer, Mike Snitzer

On 03/09/15 12:30, Merla, ShivaKrishna wrote:
>> Secondly, for this comment from Merla ShivaKrishna:
>>
>>> Yes, Indeed this the exact issue we saw at NetApp. While running sequential
>>> 4K write I/O with large thread count, 2 paths yield better performance than
>>> 4 paths and performance drastically drops with 4 paths. The device queue_depth
>>> as 32 and with blktrace we could see better I/O merging happening and average
>>> request size was > 8K through iostat. With 4 paths none of the I/O gets merged and
>>> always average request size is 4K. Scheduler used was noop as we are using SSD
>>> based storage. We could get I/O merging to happen even with 4 paths but with lower
>>> device queue_depth of 16. Even then the performance was lacking compared to 2 paths.
>>
>> Have you tried increasing nr_requests of the dm device?
>> E.g. setting nr_requests to 256.
>>
>> 4 paths with each queue depth 32 means that it can have 128 I/Os in flight.
>> With the default value of nr_requests 128, the request queue is almost
>> always empty and I/O merge could not happen.
>> Increasing nr_requests of the dm device allows some more requests
>> queued,
>> thus the chance of merging may increase.
>> Reducing the lower device queue depth could be another solution. But if
>> the depth is too low, you might not be able to keep the optimal speed.
>>
> Yes, we have tried this as well but didn't help. Indeed, we have tested with queue_depth
> of 16 on each path as well with 64 I/O's in flight and resulted in same issue. We did try
> reducing the queue_depth with 4 paths, but couldn't achieve comparable performance
> as of 2 paths. With Mike's patch, we see tremendous improvement with just a small delay 
> of ~20us with 4 paths. This might vary with different configurations but sure have proved 
> that a tunable to delay dispatches with sequential workloads has helped a lot.

Hi,

did you try increasing nr_requests of the dm request queue?
If so, what was the increased value of nr_requests in the case of
device queue_depth 32?

-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/8] dm: don't start current request if it would've merged with the previous
  2015-03-09  6:09       ` Junichi Nomura
@ 2015-03-09 16:10         ` Merla, ShivaKrishna
  2015-03-10  1:05           ` Junichi Nomura
  0 siblings, 1 reply; 19+ messages in thread
From: Merla, ShivaKrishna @ 2015-03-09 16:10 UTC (permalink / raw)
  To: Junichi Nomura; +Cc: axboe, device-mapper development, jmoyer, Mike Snitzer



> -----Original Message-----
> From: Junichi Nomura [mailto:j-nomura@ce.jp.nec.com]
> Sent: Monday, March 09, 2015 1:10 AM
> To: Merla, ShivaKrishna
> Cc: device-mapper development; Mike Snitzer; axboe@kernel.dk;
> jmoyer@redhat.com; Hannes Reinecke
> Subject: Re: [dm-devel] [PATCH 6/8] dm: don't start current request if it
> would've merged with the previous
> 
> On 03/09/15 12:30, Merla, ShivaKrishna wrote:
> >> Secondly, for this comment from Merla ShivaKrishna:
> >>
> >>> Yes, Indeed this the exact issue we saw at NetApp. While running
> sequential
> >>> 4K write I/O with large thread count, 2 paths yield better performance
> than
> >>> 4 paths and performance drastically drops with 4 paths. The device
> queue_depth
> >>> as 32 and with blktrace we could see better I/O merging happening and
> average
> >>> request size was > 8K through iostat. With 4 paths none of the I/O gets
> merged and
> >>> always average request size is 4K. Scheduler used was noop as we are
> using SSD
> >>> based storage. We could get I/O merging to happen even with 4 paths
> but with lower
> >>> device queue_depth of 16. Even then the performance was lacking
> compared to 2 paths.
> >>
> >> Have you tried increasing nr_requests of the dm device?
> >> E.g. setting nr_requests to 256.
> >>
> >> 4 paths with each queue depth 32 means that it can have 128 I/Os in flight.
> >> With the default value of nr_requests 128, the request queue is almost
> >> always empty and I/O merge could not happen.
> >> Increasing nr_requests of the dm device allows some more requests
> >> queued,
> >> thus the chance of merging may increase.
> >> Reducing the lower device queue depth could be another solution. But if
> >> the depth is too low, you might not be able to keep the optimal speed.
> >>
> > Yes, we have tried this as well but didn't help. Indeed, we have tested with
> queue_depth
> > of 16 on each path as well with 64 I/O's in flight and resulted in same issue.
> We did try
> > reducing the queue_depth with 4 paths, but couldn't achieve comparable
> performance
> > as of 2 paths. With Mike's patch, we see tremendous improvement with
> just a small delay
> > of ~20us with 4 paths. This might vary with different configurations but sure
> have proved
> > that a tunable to delay dispatches with sequential workloads has helped a
> lot.
> 
> Hi,
> 
> did you try increasing nr_requests of dm request queue?
> If so, what was the increased value of nr_requests in the case of
> device queue_depth 32?
> 
Yes, we tried increasing it to 256; the average merge count
certainly increased a little bit, but not comparably to Mike's change.


03/09/2015 11:07:54 AM
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdak              0.00     0.00    0.00 21737.00     0.00 101064.00     9.30    11.93    0.55    0.00    0.55   0.04  93.00
sdu               0.00     0.00    0.00 21759.00     0.00 101728.00     9.35    11.55    0.53    0.00    0.53   0.04  93.60
sdm               0.00     0.00    0.00 21669.00     0.00 101168.00     9.34    11.76    0.54    0.00    0.54   0.04  94.00
sdac              0.00     0.00    0.00 21812.00     0.00 101540.00     9.31    11.74    0.54    0.00    0.54   0.04  92.50
dm-6              0.00 14266.00    0.00 86980.00     0.00 405496.00     9.32    48.44    0.56    0.00    0.56   0.01  98.70

With tunable delay of 20us here are the results.

03/09/2015 11:08:43 AM
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdak              0.00     0.00    0.00 11740.00     0.00 135344.00    23.06     4.42    0.38    0.00    0.38   0.05  62.60
sdu               0.00     0.00    0.00 11781.00     0.00 140800.00    23.90     3.23    0.27    0.00    0.27   0.05  62.80
sdm               0.00     0.00    0.00 11770.00     0.00 137592.00    23.38     4.53    0.39    0.00    0.39   0.06  65.60
sdac              0.00     0.00    0.00 11664.00     0.00 137976.00    23.66     3.36    0.29    0.00    0.29   0.05  60.80
dm-6              0.00 88446.00    0.00 46937.00     0.00 551684.00    23.51    17.88    0.38    0.00    0.38   0.02  99.30
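
A quick way to read these iostat columns: avgrq-sz is in 512-byte sectors, so without
the delay dm-6 averages 9.32 sectors (~4.7 KB) per dispatched request, i.e. the 4K writes
go out essentially unmerged, while with the ~20us delay it averages 23.51 sectors (~12 KB),
roughly three 4K writes merged per request. Throughput on dm-6 rises from ~405 MB/s to
~552 MB/s and per-path utilization drops from ~93% to ~63%.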

> --
> Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/8] dm: don't start current request if it would've merged with the previous
  2015-03-09 16:10         ` Merla, ShivaKrishna
@ 2015-03-10  1:05           ` Junichi Nomura
  2015-03-10  1:59             ` Merla, ShivaKrishna
  0 siblings, 1 reply; 19+ messages in thread
From: Junichi Nomura @ 2015-03-10  1:05 UTC (permalink / raw)
  To: Merla, ShivaKrishna
  Cc: axboe, device-mapper development, jmoyer, Mike Snitzer

On 03/10/15 01:10, Merla, ShivaKrishna wrote:
>> did you try increasing nr_requests of dm request queue?
>> If so, what was the increased value of nr_requests in the case of
>> device queue_depth 32?
>>
> Yes, we tried increasing it to 256; the average merge count
> certainly increased a little bit, but not comparable to Mike's change.
> 
> 
> 03/09/2015 11:07:54 AM
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sdak              0.00     0.00    0.00 21737.00     0.00 101064.00     9.30    11.93    0.55    0.00    0.55   0.04  93.00
> sdu               0.00     0.00    0.00 21759.00     0.00 101728.00     9.35    11.55    0.53    0.00    0.53   0.04  93.60
> sdm               0.00     0.00    0.00 21669.00     0.00 101168.00     9.34    11.76    0.54    0.00    0.54   0.04  94.00
> sdac              0.00     0.00    0.00 21812.00     0.00 101540.00     9.31    11.74    0.54    0.00    0.54   0.04  92.50
> dm-6              0.00 14266.00    0.00 86980.00     0.00 405496.00     9.32    48.44    0.56    0.00    0.56   0.01  98.70
> 
> With tunable delay of 20us here are the results.
> 
> 03/09/2015 11:08:43 AM
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sdak              0.00     0.00    0.00 11740.00     0.00 135344.00    23.06     4.42    0.38    0.00    0.38   0.05  62.60
> sdu               0.00     0.00    0.00 11781.00     0.00 140800.00    23.90     3.23    0.27    0.00    0.27   0.05  62.80
> sdm               0.00     0.00    0.00 11770.00     0.00 137592.00    23.38     4.53    0.39    0.00    0.39   0.06  65.60
> sdac              0.00     0.00    0.00 11664.00     0.00 137976.00    23.66     3.36    0.29    0.00    0.29   0.05  60.80
> dm-6              0.00 88446.00    0.00 46937.00     0.00 551684.00    23.51    17.88    0.38    0.00    0.38   0.02  99.30

Oh I see. Thank you.
So it's not that requests weren't queued for merging but that CPUs
could not pile the requests fast enough...

If possible, it would be interesting to see the results with much
lower device queue_depth like 4 or 2.
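
(For anyone wanting to reproduce this: both knobs are ordinary sysfs attributes. A
minimal sketch, using the device names from the iostat output above purely as examples:

  # per-path SCSI queue depth (repeat for each path device)
  cat /sys/block/sdak/device/queue_depth          # e.g. 32
  echo 4 > /sys/block/sdak/device/queue_depth     # lower it for the test

  # request-queue depth of the request-based dm device
  cat /sys/block/dm-6/queue/nr_requests           # default 128
  echo 256 > /sys/block/dm-6/queue/nr_requests    # allow more requests to queue and merge

The multipath device node name (dm-6) varies per system; check with dmsetup ls or lsblk.)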

-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/8] dm: don't start current request if it would've merged with the previous
  2015-03-10  1:05           ` Junichi Nomura
@ 2015-03-10  1:59             ` Merla, ShivaKrishna
  2015-03-10  5:43               ` Junichi Nomura
  0 siblings, 1 reply; 19+ messages in thread
From: Merla, ShivaKrishna @ 2015-03-10  1:59 UTC (permalink / raw)
  To: Junichi Nomura; +Cc: axboe, device-mapper development, jmoyer, Mike Snitzer



> -----Original Message-----
> From: Junichi Nomura [mailto:j-nomura@ce.jp.nec.com]
> Sent: Monday, March 09, 2015 8:06 PM
> To: Merla, ShivaKrishna
> Cc: device-mapper development; Mike Snitzer; axboe@kernel.dk;
> jmoyer@redhat.com; Hannes Reinecke
> Subject: Re: [dm-devel] [PATCH 6/8] dm: don't start current request if it
> would've merged with the previous
> 
> On 03/10/15 01:10, Merla, ShivaKrishna wrote:
> >> did you try increasing nr_requests of dm request queue?
> >> If so, what was the increased value of nr_requests in the case of
> >> device queue_depth 32?
> >>
> > Yes, we tried increasing it to 256; the average merge count
> > certainly increased a little bit, but not comparable to Mike's change.
> >
> >
> > 03/09/2015 11:07:54 AM
> > Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> > sdak              0.00     0.00    0.00 21737.00     0.00 101064.00     9.30    11.93    0.55    0.00    0.55   0.04  93.00
> > sdu               0.00     0.00    0.00 21759.00     0.00 101728.00     9.35    11.55    0.53    0.00    0.53   0.04  93.60
> > sdm               0.00     0.00    0.00 21669.00     0.00 101168.00     9.34    11.76    0.54    0.00    0.54   0.04  94.00
> > sdac              0.00     0.00    0.00 21812.00     0.00 101540.00     9.31    11.74    0.54    0.00    0.54   0.04  92.50
> > dm-6              0.00 14266.00    0.00 86980.00     0.00 405496.00     9.32    48.44    0.56    0.00    0.56   0.01  98.70
> >
> > With tunable delay of 20us here are the results.
> >
> > 03/09/2015 11:08:43 AM
> > Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> > sdak              0.00     0.00    0.00 11740.00     0.00 135344.00    23.06     4.42    0.38    0.00    0.38   0.05  62.60
> > sdu               0.00     0.00    0.00 11781.00     0.00 140800.00    23.90     3.23    0.27    0.00    0.27   0.05  62.80
> > sdm               0.00     0.00    0.00 11770.00     0.00 137592.00    23.38     4.53    0.39    0.00    0.39   0.06  65.60
> > sdac              0.00     0.00    0.00 11664.00     0.00 137976.00    23.66     3.36    0.29    0.00    0.29   0.05  60.80
> > dm-6              0.00 88446.00    0.00 46937.00     0.00 551684.00    23.51    17.88    0.38    0.00    0.38   0.02  99.30
> 
> Oh I see. Thank you.
> So it's not that requests weren't queued for merging but that CPUs
> could not pile the requests fast enough...
> 
> If possible, it would be interesting to see the results with much
> lower device queue_depth like 4 or 2.
I think very low queue_depths will cause multipath_busy() to return 1
even with a large number of paths, and hence lead to better I/O merging.

queue_depth 2

03/09/2015 08:47:04 PM
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdak              0.00     0.00    0.00 10512.00     0.00 106512.00    20.26    12.60    1.20    0.00    1.20   0.10 100.00
sdu               0.00     0.00    0.00 10546.00     0.00 105728.00    20.05    12.92    1.23    0.00    1.23   0.09 100.00
sdm               0.00     0.00    0.00 10518.00     0.00 106108.00    20.18    12.80    1.22    0.00    1.22   0.09  99.90
sdac              0.00     0.00    0.00 10548.00     0.00 106100.00    20.12    13.10    1.24    0.00    1.24   0.09 100.00
dm-6              0.00 62581.00    0.00 42122.00     0.00 424420.00    20.15    53.77    1.28    0.00    1.28   0.02 100.00

queue_depth 4

03/09/2015 08:54:27 PM
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdak              0.00     0.00    0.00 15671.00     0.00 109292.00    13.95     8.64    0.55    0.00    0.55   0.06  99.80
sdu               0.00     0.00    0.00 15733.00     0.00 109204.00    13.88     8.34    0.53    0.00    0.53   0.06  99.40
sdm               0.00     0.00    0.00 15779.00     0.00 106788.00    13.54     8.57    0.54    0.00    0.54   0.06  99.50
sdac              0.00     0.00    0.00 15611.00     0.00 109568.00    14.04     8.31    0.53    0.00    0.53   0.06  98.70
dm-6              0.00 44626.00    0.00 62795.00     0.00 434832.00    13.85    36.12    0.58    0.00    0.58   0.02 100.00

> 
> --
> Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/8] dm: don't start current request if it would've merged with the previous
  2015-03-10  1:59             ` Merla, ShivaKrishna
@ 2015-03-10  5:43               ` Junichi Nomura
  0 siblings, 0 replies; 19+ messages in thread
From: Junichi Nomura @ 2015-03-10  5:43 UTC (permalink / raw)
  To: Merla, ShivaKrishna
  Cc: axboe, device-mapper development, jmoyer, Mike Snitzer

On 03/10/15 10:59, Merla, ShivaKrishna wrote:
>>> 03/09/2015 11:07:54 AM
>>> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>> sdak              0.00     0.00    0.00 21737.00     0.00 101064.00     9.30    11.93    0.55    0.00    0.55   0.04  93.00
>>> sdu               0.00     0.00    0.00 21759.00     0.00 101728.00     9.35    11.55    0.53    0.00    0.53   0.04  93.60
>>> sdm               0.00     0.00    0.00 21669.00     0.00 101168.00     9.34    11.76    0.54    0.00    0.54   0.04  94.00
>>> sdac              0.00     0.00    0.00 21812.00     0.00 101540.00     9.31    11.74    0.54    0.00    0.54   0.04  92.50
>>> dm-6              0.00 14266.00    0.00 86980.00     0.00 405496.00     9.32    48.44    0.56    0.00    0.56   0.01  98.70
>>>
>>> With tunable delay of 20us here are the results.
>>>
>>> 03/09/2015 11:08:43 AM
>>> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>> sdak              0.00     0.00    0.00 11740.00     0.00 135344.00    23.06     4.42    0.38    0.00    0.38   0.05  62.60
>>> sdu               0.00     0.00    0.00 11781.00     0.00 140800.00    23.90     3.23    0.27    0.00    0.27   0.05  62.80
>>> sdm               0.00     0.00    0.00 11770.00     0.00 137592.00    23.38     4.53    0.39    0.00    0.39   0.06  65.60
>>> sdac              0.00     0.00    0.00 11664.00     0.00 137976.00    23.66     3.36    0.29    0.00    0.29   0.05  60.80
>>> dm-6              0.00 88446.00    0.00 46937.00     0.00 551684.00    23.51    17.88    0.38    0.00    0.38   0.02  99.30
>>
>> Oh I see. Thank you.
>> So it's not that requests weren't queued for merging but that CPUs
>> could not pile the requests fast enough...
>>
>> If possible, it would be interesting to see the results with much
>> lower device queue_depth like 4 or 2.
> I think very low queue_depths will cause multipath_busy() to return 1
> even with a large number of paths, and hence lead to better I/O merging.

Yes. But it didn't help very much...

> queue_depth 2
> 
> 03/09/2015 08:47:04 PM
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sdak              0.00     0.00    0.00 10512.00     0.00 106512.00    20.26    12.60    1.20    0.00    1.20   0.10 100.00
> sdu               0.00     0.00    0.00 10546.00     0.00 105728.00    20.05    12.92    1.23    0.00    1.23   0.09 100.00
> sdm               0.00     0.00    0.00 10518.00     0.00 106108.00    20.18    12.80    1.22    0.00    1.22   0.09  99.90
> sdac              0.00     0.00    0.00 10548.00     0.00 106100.00    20.12    13.10    1.24    0.00    1.24   0.09 100.00
> dm-6              0.00 62581.00    0.00 42122.00     0.00 424420.00    20.15    53.77    1.28    0.00    1.28   0.02 100.00
> 
> queue_depth 4
> 
> 03/09/2015 08:54:27 PM
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sdak              0.00     0.00    0.00 15671.00     0.00 109292.00    13.95     8.64    0.55    0.00    0.55   0.06  99.80
> sdu               0.00     0.00    0.00 15733.00     0.00 109204.00    13.88     8.34    0.53    0.00    0.53   0.06  99.40
> sdm               0.00     0.00    0.00 15779.00     0.00 106788.00    13.54     8.57    0.54    0.00    0.54   0.06  99.50
> sdac              0.00     0.00    0.00 15611.00     0.00 109568.00    14.04     8.31    0.53    0.00    0.53   0.06  98.70
> dm-6              0.00 44626.00    0.00 62795.00     0.00 434832.00    13.85    36.12    0.58    0.00    0.58   0.02 100.00
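
Putting the dm-6 lines from the three runs side by side: queue_depth 2 and 4 do merge
noticeably better than the stock queue_depth 32 baseline (avgrq-sz 20.15 and 13.85
sectors, ~424 and ~435 MB/s, versus 9.32 sectors and ~405 MB/s), but both stay well
below the ~552 MB/s and 23.51-sector average reached with the ~20us dispatch delay at
the full queue depth, and they leave the paths pinned at ~100% utilization instead of
~63%.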

-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread

Thread overview: 19+ messages
2015-03-04  0:47 [PATCH 0/8] dm: optimize request-based queue processing Mike Snitzer
2015-03-04  0:47 ` [PATCH 1/8] dm: remove unnecessary wrapper around blk_lld_busy Mike Snitzer
2015-03-04  0:47 ` [PATCH 2/8] dm: remove request-based DM queue's lld_busy_fn hook Mike Snitzer
2015-03-04  0:47 ` [PATCH 3/8] dm: remove request-based logic from make_request_fn wrapper Mike Snitzer
2015-03-04  0:47 ` [PATCH 4/8] dm: only run the queue on completion if congested or no requests pending Mike Snitzer
2015-03-04  0:47 ` [PATCH 5/8] dm: don't schedule delayed run of the queue if nothing to do Mike Snitzer
2015-03-04  0:47 ` [PATCH 6/8] dm: don't start current request if it would've merged with the previous Mike Snitzer
2015-03-04  6:36   ` Hannes Reinecke
2015-03-04 17:26     ` Mike Snitzer
2015-03-09  1:04   ` Junichi Nomura
2015-03-09  3:30     ` Merla, ShivaKrishna
2015-03-09  6:09       ` Junichi Nomura
2015-03-09 16:10         ` Merla, ShivaKrishna
2015-03-10  1:05           ` Junichi Nomura
2015-03-10  1:59             ` Merla, ShivaKrishna
2015-03-10  5:43               ` Junichi Nomura
2015-03-04  0:47 ` [PATCH 7/8] dm sysfs: introduce ability to add writable attributes Mike Snitzer
2015-03-04  0:47 ` [PATCH 8/8] dm: impose configurable deadline for dm_request_fn's merge heuristic Mike Snitzer
2015-03-06 15:30   ` [PATCH 8/8 v2] " Mike Snitzer
