* [PATCH 0/5] dm-mpath: improve I/O schedule
@ 2017-09-15 16:44 ` Ming Lei
  0 siblings, 0 replies; 69+ messages in thread
From: Ming Lei @ 2017-09-15 16:44 UTC (permalink / raw)
  To: dm-devel, Mike Snitzer, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche,
	Laurence Oberman, Ming Lei

Hi,

We depend on the I/O scheduler at the dm-mpath layer, and the
underlying I/O scheduler is essentially bypassed.

An I/O scheduler depends on a queue-busy condition to
trigger I/O merging. Unfortunately, inside dm-mpath the
busy feedback from the underlying queue is not accurate
enough: we just allocate one request and dispatch
it to the underlying queue, no matter whether that queue
is busy or not, so I/O merging is hard to trigger.

This patchset sets the underlying queue's nr_requests to
that queue's queue depth, so that whether the queue is busy
can be determined simply by checking whether a request is
allocated successfully.

From the test results on mq-deadline, sequential I/O performance
improves a lot; see the test results in patch 5's commit log.
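
As a rough illustration of the idea (a hedged sketch, not the exact
code in these patches), the allocation itself becomes the busy probe:

	struct request *clone;

	/* non-blocking allocation fails once the underlying queue is full */
	clone = blk_get_request(q, rq->cmd_flags | REQ_NOWAIT, GFP_ATOMIC);
	if (IS_ERR(clone))
		/* path is busy: keep rq on the dm queue so I/O can merge */
		return DM_MAPIO_REQUEUE;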

Any comments are welcome!

Thanks,
Ming

Ming Lei (5):
  block: don't call blk_mq_delay_run_hw_queue() in case of
    BLK_STS_RESOURCE
  dm-mpath: return DM_MAPIO_REQUEUE in case of rq allocation failure
  dm-mpath: remove annoying message of 'blk_get_request() returned -11'
  block: export blk_update_nr_requests
  dm-mpath: improve I/O schedule

 block/blk-core.c        |  4 +++-
 block/blk-sysfs.c       |  5 +----
 block/blk.h             |  2 --
 drivers/md/dm-mpath.c   | 30 +++++++++++++++++++++++++++---
 drivers/md/dm-rq.c      |  1 -
 drivers/nvme/host/fc.c  |  3 ---
 drivers/scsi/scsi_lib.c |  4 ----
 include/linux/blkdev.h  |  1 +
 8 files changed, 32 insertions(+), 18 deletions(-)

-- 
2.9.5

* [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
  2017-09-15 16:44 ` Ming Lei
@ 2017-09-15 16:44   ` Ming Lei
  -1 siblings, 0 replies; 69+ messages in thread
From: Ming Lei @ 2017-09-15 16:44 UTC (permalink / raw)
  To: dm-devel, Mike Snitzer, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche,
	Laurence Oberman, Ming Lei, Sagi Grimberg, linux-nvme,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

If .queue_rq() returns BLK_STS_RESOURCE, blk-mq will rerun
the queue in three situations:

1) if BLK_MQ_S_SCHED_RESTART is set
- the queue is rerun after one rq completes; see blk_mq_sched_restart(),
which is run from blk_mq_free_request()

2) if BLK_MQ_S_TAG_WAITING is set
- the queue is rerun after one tag is freed

3) otherwise
- the queue is run immediately in blk_mq_dispatch_rq_list()

So calling blk_mq_delay_run_hw_queue() inside .queue_rq() doesn't make
sense: whether or not it is called, the queue will still be rerun soon
in the above three situations, and the busy request can be dispatched
again.
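
The three cases, roughly sketched (hand-condensed from the 4.13-era
blk-mq code, not a verbatim excerpt):

	/* after .queue_rq() has returned BLK_STS_RESOURCE */
	if (test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
		;	/* 1) blk_mq_sched_restart() reruns the queue when a rq completes */
	else if (test_bit(BLK_MQ_S_TAG_WAITING, &hctx->state))
		;	/* 2) the queue is rerun once a tag is freed */
	else
		blk_mq_run_hw_queue(hctx, true);	/* 3) rerun immediately */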

Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/md/dm-rq.c      | 1 -
 drivers/nvme/host/fc.c  | 3 ---
 drivers/scsi/scsi_lib.c | 4 ----
 3 files changed, 8 deletions(-)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index c6ebc5b1e00e..71422cea1c4a 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -761,7 +761,6 @@ static blk_status_t dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 		/* Undo dm_start_request() before requeuing */
 		rq_end_stats(md, rq);
 		rq_completed(md, rq_data_dir(rq), false);
-		blk_mq_delay_run_hw_queue(hctx, 100/*ms*/);
 		return BLK_STS_RESOURCE;
 	}
 
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index d2e882c0f496..17727eee5d5f 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2041,9 +2041,6 @@ nvme_fc_start_fcp_op(struct nvme_fc_ctrl *ctrl, struct nvme_fc_queue *queue,
 	return BLK_STS_OK;
 
 busy:
-	if (!(op->flags & FCOP_FLAGS_AEN) && queue->hctx)
-		blk_mq_delay_run_hw_queue(queue->hctx, NVMEFC_QUEUE_DELAY);
-
 	return BLK_STS_RESOURCE;
 }
 
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9cf6a80fe297..71d5bf1345c9 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2010,11 +2010,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 out:
 	switch (ret) {
 	case BLK_STS_OK:
-		break;
 	case BLK_STS_RESOURCE:
-		if (atomic_read(&sdev->device_busy) == 0 &&
-		    !scsi_device_blocked(sdev))
-			blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
 		break;
 	default:
 		/*
-- 
2.9.5

* [PATCH 2/5] dm-mpath: return DM_MAPIO_REQUEUE in case of rq allocation failure
  2017-09-15 16:44 ` Ming Lei
@ 2017-09-15 16:44 ` Ming Lei
  -1 siblings, 1 reply; 69+ messages in thread
From: Ming Lei @ 2017-09-15 16:44 UTC (permalink / raw)
  To: dm-devel, Mike Snitzer, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche,
	Laurence Oberman, Ming Lei

blk-mq will rerun the queue via RESTART after one request completes,
so there is no need to wait a random time before requeuing; we should
trust blk-mq to do it.
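
For context, a condensed sketch of how dm-rq handles the two return
values (abbreviated from the 4.13-era map_request() in dm-rq.c; not
the verbatim code):

	switch (r) {
	case DM_MAPIO_REQUEUE:
		/* propagated as BLK_STS_RESOURCE; blk-mq holds the request
		 * and reruns the queue via SCHED_RESTART on the next
		 * completion */
		break;
	case DM_MAPIO_DELAY_REQUEUE:
		/* requeue after a fixed delay, independent of completions */
		dm_requeue_original_request(tio, true);
		break;
	}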

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/md/dm-mpath.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 96aedaac2c64..f5a1088a6e79 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -505,7 +505,7 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
 			atomic_inc(&m->pg_init_in_progress);
 			activate_or_offline_path(pgpath);
 		}
-		return DM_MAPIO_DELAY_REQUEUE;
+		return DM_MAPIO_REQUEUE;
 	}
 	clone->bio = clone->biotail = NULL;
 	clone->rq_disk = bdev->bd_disk;
-- 
2.9.5

* [PATCH 3/5] dm-mpath: remove annoying message of 'blk_get_request() returned -11'
  2017-09-15 16:44 ` Ming Lei
@ 2017-09-15 16:44 ` Ming Lei
  -1 siblings, 0 replies; 69+ messages in thread
From: Ming Lei @ 2017-09-15 16:44 UTC (permalink / raw)
  To: dm-devel, Mike Snitzer, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche,
	Laurence Oberman, Ming Lei

Allocation failure here is perfectly normal, so there is no need
to log it and annoy people.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/md/dm-mpath.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index f5a1088a6e79..f57ad8621c4c 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -499,8 +499,6 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
 	if (IS_ERR(clone)) {
 		/* EBUSY, ENODEV or EWOULDBLOCK: requeue */
 		bool queue_dying = blk_queue_dying(q);
-		DMERR_LIMIT("blk_get_request() returned %ld%s - requeuing",
-			    PTR_ERR(clone), queue_dying ? " (path offline)" : "");
 		if (queue_dying) {
 			atomic_inc(&m->pg_init_in_progress);
 			activate_or_offline_path(pgpath);
-- 
2.9.5

* [PATCH 4/5] block: export blk_update_nr_requests
  2017-09-15 16:44 ` Ming Lei
@ 2017-09-15 16:44 ` Ming Lei
  -1 siblings, 0 replies; 69+ messages in thread
From: Ming Lei @ 2017-09-15 16:44 UTC (permalink / raw)
  To: dm-devel, Mike Snitzer, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche,
	Laurence Oberman, Ming Lei

dm-mpath needs this API for improving I/O scheduling.

The I/O scheduling is actually done on the dm-rq (mpath) queue,
not on the underlying devices.

If we set q->nr_requests to q->queue_depth on the underlying
devices, we can get a queue's busy feedback by simply
checking whether blk_get_request() succeeds.

This makes the block layer's I/O scheduling more
effective on the dm-rq device.
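
A minimal usage sketch (assuming a path device whose q->queue_depth is
set, e.g. a SCSI LUN; simplified relative to what patch 5 actually does):

	struct request_queue *q = bdev_get_queue(bdev);
	unsigned int old_nr = q->nr_requests;	/* saved so it can be restored */

	/* cap allocatable requests at the device's effective queue depth */
	if (q->queue_depth)
		blk_update_nr_requests(q, q->queue_depth);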

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c       | 4 +++-
 block/blk-sysfs.c      | 5 +----
 block/blk.h            | 2 --
 include/linux/blkdev.h | 1 +
 4 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index aebe676225e6..9752aac9821c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1108,7 +1108,8 @@ int blk_update_nr_requests(struct request_queue *q, unsigned int nr)
 	struct request_list *rl;
 	int on_thresh, off_thresh;
 
-	WARN_ON_ONCE(q->mq_ops);
+	if (q->mq_ops)
+		return blk_mq_update_nr_requests(q, nr);
 
 	spin_lock_irq(q->queue_lock);
 	q->nr_requests = nr;
@@ -1145,6 +1146,7 @@ int blk_update_nr_requests(struct request_queue *q, unsigned int nr)
 	spin_unlock_irq(q->queue_lock);
 	return 0;
 }
+EXPORT_SYMBOL(blk_update_nr_requests);
 
 /**
  * __get_request - get a free request
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index b8362c0df51d..2dbef5dbd195 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -77,10 +77,7 @@ queue_requests_store(struct request_queue *q, const char *page, size_t count)
 	if (nr < BLKDEV_MIN_RQ)
 		nr = BLKDEV_MIN_RQ;
 
-	if (q->request_fn)
-		err = blk_update_nr_requests(q, nr);
-	else
-		err = blk_mq_update_nr_requests(q, nr);
+	err = blk_update_nr_requests(q, nr);
 
 	if (err)
 		return err;
diff --git a/block/blk.h b/block/blk.h
index fa4f232afc18..5bf662da2417 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -308,8 +308,6 @@ static inline int queue_congestion_off_threshold(struct request_queue *q)
 	return q->nr_congestion_off;
 }
 
-extern int blk_update_nr_requests(struct request_queue *, unsigned int);
-
 /*
  * Contribute to IO statistics IFF:
  *
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 460294bb0fa5..226b97142999 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1229,6 +1229,7 @@ struct request_queue *blk_alloc_queue(gfp_t);
 struct request_queue *blk_alloc_queue_node(gfp_t, int);
 extern void blk_put_queue(struct request_queue *);
 extern void blk_set_queue_dying(struct request_queue *);
+extern int blk_update_nr_requests(struct request_queue *, unsigned int);
 
 /*
  * block layer runtime pm functions
-- 
2.9.5

* [PATCH 5/5] dm-mpath: improve I/O schedule
  2017-09-15 16:44 ` Ming Lei
@ 2017-09-15 16:44 ` Ming Lei
  -1 siblings, 4 replies; 69+ messages in thread
From: Ming Lei @ 2017-09-15 16:44 UTC (permalink / raw)
  To: dm-devel, Mike Snitzer, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche,
	Laurence Oberman, Ming Lei

The actual I/O scheduling is done at the dm-mpath layer, and the
underlying I/O scheduler is simply bypassed.

This patch sets the underlying queue's nr_requests to that queue's
queue_depth; then we can get its busy feedback by simply checking
whether blk_get_request() succeeds.

In this way, dm-mpath can report its queue busy state to the block
layer effectively, so I/O scheduling is much improved.

The test results on lpfc* follow:

- fio(libaio, bs:4k, dio, queue_depth:64, 64 jobs,
      over dm-mpath disk)
- system(12 cores, dual sockets, mem: 64G)

-----------------------------------------------
          |v4.13+         |v4.13+
          |+scsi_mq_perf  |+scsi_mq_perf+patches
-----------------------------------------------
 IOPS(K)  |MQ-DEADLINE    |MQ-DEADLINE
-----------------------------------------------
read      |       30.71   |      343.91
-----------------------------------------------
randread  |       22.98   |       17.17
-----------------------------------------------
write     |       16.45   |      390.88
-----------------------------------------------
randwrite |       16.21   |       16.09
-----------------------------------------------

*:
1) lpfc.lpfc_lun_queue_depth=3, so that it is the same as .cmd_per_lun
2) scsi_mq_perf means the patchset 'blk-mq-sched: improve SCSI-MQ performance (V4)'[1]
3) v4.13+: top commit is 46c1e79fee41 ("Merge branch 'perf-urgent-for-linus'")
4) the patchset 'blk-mq-sched: improve SCSI-MQ performance (V4)' focuses
on improving SCSI-MQ, and all the test results in that cover letter were
against the raw lpfc/ib device (run after 'multipath -F'), not dm-mpath
5) this patchset itself doesn't depend on the scsi_mq_perf patchset[1]

[1] https://marc.info/?t=150436555700002&r=1&w=2

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/md/dm-mpath.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index f57ad8621c4c..02647386d2d9 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -39,6 +39,8 @@ struct pgpath {
 
 	struct dm_path path;
 	struct delayed_work activate_path;
+	unsigned old_nr_requests;
+	unsigned queue_depth;
 
 	bool is_active:1;		/* Path status */
 };
@@ -160,12 +162,34 @@ static struct priority_group *alloc_priority_group(void)
 	return pg;
 }
 
+static void save_path_queue_depth(struct pgpath *p)
+{
+	struct request_queue *q = bdev_get_queue(p->path.dev->bdev);
+
+	p->old_nr_requests = q->nr_requests;
+	p->queue_depth = q->queue_depth;
+
+	/* one extra request for making the pipeline full */
+	if (p->queue_depth)
+		blk_update_nr_requests(q, p->queue_depth + 1);
+}
+
+static void restore_path_queue_depth(struct pgpath *p)
+{
+	struct request_queue *q = bdev_get_queue(p->path.dev->bdev);
+
+	/* restore the old value if nr_requests wasn't changed meanwhile */
+	if (q->nr_requests == p->queue_depth + 1)
+		blk_update_nr_requests(q, p->old_nr_requests);
+}
+
 static void free_pgpaths(struct list_head *pgpaths, struct dm_target *ti)
 {
 	struct pgpath *pgpath, *tmp;
 
 	list_for_each_entry_safe(pgpath, tmp, pgpaths, list) {
 		list_del(&pgpath->list);
+		restore_path_queue_depth(pgpath);
 		dm_put_device(ti, pgpath->path.dev);
 		free_pgpath(pgpath);
 	}
@@ -810,6 +834,8 @@ static struct pgpath *parse_path(struct dm_arg_set *as, struct path_selector *ps
 		goto bad;
 	}
 
+	save_path_queue_depth(p);
+
 	return p;
 
  bad:
-- 
2.9.5

* Re: [PATCH 2/5] dm-mpath: return DM_MAPIO_REQUEUE in case of rq allocation failure
@ 2017-09-15 17:29     ` Bart Van Assche
  0 siblings, 0 replies; 69+ messages in thread
From: Bart Van Assche @ 2017-09-15 17:29 UTC (permalink / raw)
  To: dm-devel, axboe, ming.lei, snitzer
  Cc: Bart Van Assche, hch, linux-block, loberman

On Sat, 2017-09-16 at 00:44 +0800, Ming Lei wrote:
> blk-mq will rerun the queue via RESTART after one request completes,
> so there is no need to wait a random time before requeuing; we should
> trust blk-mq to do it.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/md/dm-mpath.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index 96aedaac2c64..f5a1088a6e79 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -505,7 +505,7 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
>  			atomic_inc(&m->pg_init_in_progress);
>  			activate_or_offline_path(pgpath);
>  		}
> -		return DM_MAPIO_DELAY_REQUEUE;
> +		return DM_MAPIO_REQUEUE;
>  	}
>  	clone->bio = clone->biotail = NULL;
>  	clone->rq_disk = bdev->bd_disk;

So you are reverting the patch below? Thank you very much.

commit 1c23484c355ec360ca2f37914f8a4802c6baeead
Author: Bart Van Assche <bart.vanassche@wdc.com>
Date:   Wed Aug 9 11:32:12 2017 -0700

    dm mpath: do not lock up a CPU with requeuing activity
    
    When using the block layer in single queue mode, get_request()
    returns ERR_PTR(-EAGAIN) if the queue is dying and the REQ_NOWAIT
    flag has been passed to get_request(). Avoid that the kernel
    reports soft lockup complaints in this case due to continuous
    requeuing activity.
    
    Fixes: 7083abbbf ("dm mpath: avoid that path removal can trigger an infinite loop")
    Cc: stable@vger.kernel.org
    Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
    Tested-by: Laurence Oberman <loberman@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Mike Snitzer <snitzer@redhat.com>

* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
@ 2017-09-15 17:57     ` Bart Van Assche
  0 siblings, 0 replies; 69+ messages in thread
From: Bart Van Assche @ 2017-09-15 17:57 UTC (permalink / raw)
  To: dm-devel, axboe, ming.lei, snitzer
  Cc: linux-block, hch, sagi, Bart Van Assche, martin.petersen,
	linux-scsi, linux-nvme, jejb, loberman

On Sat, 2017-09-16 at 00:44 +0800, Ming Lei wrote:
> If .queue_rq() returns BLK_STS_RESOURCE, blk-mq will rerun
> the queue in three situations:
> 
> 1) if BLK_MQ_S_SCHED_RESTART is set
> - the queue is rerun after one rq completes; see blk_mq_sched_restart(),
> which is run from blk_mq_free_request()
> 
> 2) if BLK_MQ_S_TAG_WAITING is set
> - the queue is rerun after one tag is freed
> 
> 3) otherwise
> - the queue is run immediately in blk_mq_dispatch_rq_list()
> 
> So calling blk_mq_delay_run_hw_queue() inside .queue_rq() doesn't make
> sense: whether or not it is called, the queue will still be rerun soon
> in the above three situations, and the busy request can be dispatched
> again.

NAK

Block drivers call blk_mq_delay_run_hw_queue() if it is known that the queue
has to be rerun later even if no request has completed before the delay has
expired. This patch breaks at least the SCSI core and the dm-mpath drivers.

Bart.

* Re: [PATCH 2/5] dm-mpath: return DM_MAPIO_REQUEUE in case of rq allocation failure
  2017-09-15 17:29     ` Bart Van Assche
@ 2017-09-15 20:06     ` Mike Snitzer
  -1 siblings, 2 replies; 69+ messages in thread
From: Mike Snitzer @ 2017-09-15 20:06 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel, axboe, ming.lei, hch, linux-block, loberman

On Fri, Sep 15 2017 at  1:29pm -0400,
Bart Van Assche <Bart.VanAssche@wdc.com> wrote:

> On Sat, 2017-09-16 at 00:44 +0800, Ming Lei wrote:
> > blk-mq will rerun the queue via RESTART after one request completes,
> > so there is no need to wait a random time before requeuing; we should
> > trust blk-mq to do it.
> > 
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  drivers/md/dm-mpath.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> > index 96aedaac2c64..f5a1088a6e79 100644
> > --- a/drivers/md/dm-mpath.c
> > +++ b/drivers/md/dm-mpath.c
> > @@ -505,7 +505,7 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
> >  			atomic_inc(&m->pg_init_in_progress);
> >  			activate_or_offline_path(pgpath);
> >  		}
> > -		return DM_MAPIO_DELAY_REQUEUE;
> > +		return DM_MAPIO_REQUEUE;
> >  	}
> >  	clone->bio = clone->biotail = NULL;
> >  	clone->rq_disk = bdev->bd_disk;
> 
> So you are reverting the patch below? Thank you very much.
> 
> commit 1c23484c355ec360ca2f37914f8a4802c6baeead
> Author: Bart Van Assche <bart.vanassche@wdc.com>
> Date:   Wed Aug 9 11:32:12 2017 -0700
> 
>     dm mpath: do not lock up a CPU with requeuing activity
>     
>     When using the block layer in single queue mode, get_request()
>     returns ERR_PTR(-EAGAIN) if the queue is dying and the REQ_NOWAIT
>     flag has been passed to get_request(). Avoid that the kernel
>     reports soft lockup complaints in this case due to continuous
>     requeuing activity.
>     
>     Fixes: 7083abbbf ("dm mpath: avoid that path removal can trigger an infinite loop")
>     Cc: stable@vger.kernel.org
>     Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
>     Tested-by: Laurence Oberman <loberman@redhat.com>
>     Reviewed-by: Christoph Hellwig <hch@lst.de>
>     Signed-off-by: Mike Snitzer <snitzer@redhat.com>

The problem is that multipath_clone_and_map() is now treated as common
code (thanks to both blk-mq and old .request_fn now enjoying the use of
blk_get_request) BUT: Ming please understand that this code is used by
old .request_fn too.  So it would seem that the use of
DM_MAPIO_DELAY_REQUEUE vs DM_MAPIO_REQUEUE needs to be based on dm-sq vs
dm-mq.

* Re: [PATCH 5/5] dm-mpath: improve I/O schedule
  2017-09-15 16:44 ` [PATCH 5/5] dm-mpath: improve I/O schedule Ming Lei
@ 2017-09-15 20:10   ` Mike Snitzer
  3 siblings, 0 replies; 69+ messages in thread
From: Mike Snitzer @ 2017-09-15 20:10 UTC (permalink / raw)
  To: Ming Lei
  Cc: dm-devel, Jens Axboe, linux-block, Christoph Hellwig,
	Bart Van Assche, Laurence Oberman

On Fri, Sep 15 2017 at 12:44pm -0400,
Ming Lei <ming.lei@redhat.com> wrote:

> The actual I/O scheduling is done at the dm-mpath layer, and the
> underlying I/O scheduler is simply bypassed.
> 
> This patch sets the underlying queue's nr_requests to that queue's
> queue_depth; then we can get its busy feedback by simply checking
> whether blk_get_request() succeeds.
> 
> In this way, dm-mpath can report its queue busy state to the block
> layer effectively, so I/O scheduling is much improved.
> 
> The test results on lpfc* follow:
> 
> - fio(libaio, bs:4k, dio, queue_depth:64, 64 jobs,
>       over dm-mpath disk)
> - system(12 cores, dual sockets, mem: 64G)
> 
> -----------------------------------------------
>           |v4.13+         |v4.13+
>           |+scsi_mq_perf  |+scsi_mq_perf+patches
> -----------------------------------------------
>  IOPS(K)  |MQ-DEADLINE    |MQ-DEADLINE
> -----------------------------------------------
> read      |       30.71   |      343.91
> -----------------------------------------------
> randread  |       22.98   |       17.17
> -----------------------------------------------
> write     |       16.45   |      390.88
> -----------------------------------------------
> randwrite |       16.21   |       16.09
> -----------------------------------------------
> 
> *:
> 1) lpfc.lpfc_lun_queue_depth=3, so that it is the same as .cmd_per_lun
> 2) scsi_mq_perf means the patchset 'blk-mq-sched: improve SCSI-MQ performance (V4)'[1]
> 3) v4.13+: top commit is 46c1e79fee41 ("Merge branch 'perf-urgent-for-linus'")
> 4) the patchset 'blk-mq-sched: improve SCSI-MQ performance (V4)' focuses
> on improving SCSI-MQ, and all the test results in that cover letter were
> against the raw lpfc/ib device (run after 'multipath -F'), not dm-mpath
> 5) this patchset itself doesn't depend on the scsi_mq_perf patchset[1]
> 
> [1] https://marc.info/?t=150436555700002&r=1&w=2
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

Very impressive gains, Ming!

I'll review in more detail next week, but Bart's concerns on patches 1
and 2 definitely need consideration.

Thanks,
Mike

* Re: [PATCH 2/5] dm-mpath: return DM_MAPIO_REQUEUE in case of rq allocation failure
@ 2017-09-15 20:48         ` Bart Van Assche
  0 siblings, 0 replies; 69+ messages in thread
From: Bart Van Assche @ 2017-09-15 20:48 UTC (permalink / raw)
  To: snitzer; +Cc: dm-devel, hch, linux-block, axboe, ming.lei, loberman

On Fri, 2017-09-15 at 16:06 -0400, Mike Snitzer wrote:
> The problem is that multipath_clone_and_map() is now treated as common
> code (thanks to both blk-mq and old .request_fn now enjoying the use of
> blk_get_request) BUT: Ming please understand that this code is used by
> old .request_fn too.  So it would seem that the use of
> DM_MAPIO_DELAY_REQUEUE vs DM_MAPIO_REQUEUE needs to be based on dm-sq vs
> dm-mq.

Hello Mike,

My proposal is to leave out patches 1 and 2 entirely. Since the SCSI core
calls blk_mq_run_hw_queues() anyway after a request has finished, it is not
clear to me what the motivation was behind the development of patches 1 and
2 in this series. If the goal was to rerun a queue after a request has
finished, I think the same approach should be taken for dm as for the SCSI
core, namely to run the queue from inside the end_io callback.

Bart.

* Re: [PATCH 5/5] dm-mpath: improve I/O schedule
@ 2017-09-15 20:56     ` Bart Van Assche
  0 siblings, 0 replies; 69+ messages in thread
From: Bart Van Assche @ 2017-09-15 20:56 UTC (permalink / raw)
  To: dm-devel, axboe, ming.lei, snitzer
  Cc: Bart Van Assche, hch, linux-block, loberman

On Sat, 2017-09-16 at 00:44 +0800, Ming Lei wrote:
> -----------------------------------------------
>           |v4.13+         |v4.13+
>           |+scsi_mq_perf  |+scsi_mq_perf+patches
> -----------------------------------------------
>  IOPS(K)  |MQ-DEADLINE    |MQ-DEADLINE
> -----------------------------------------------
> read      |       30.71   |      343.91
> -----------------------------------------------
> randread  |       22.98   |       17.17
> -----------------------------------------------
> write     |       16.45   |      390.88
> -----------------------------------------------
> randwrite |       16.21   |       16.09
> -----------------------------------------------

What is the reason for the random I/O performance regressions? Users whose
workload triggers a random I/O pattern will be really unhappy with a 33% IOPS
regression for random reads. Does that regression get worse for lower latency
transports, e.g. SRP?

Bart.

* Re: [PATCH 5/5] dm-mpath: improve I/O schedule
@ 2017-09-15 21:06     ` Bart Van Assche
  0 siblings, 0 replies; 69+ messages in thread
From: Bart Van Assche @ 2017-09-15 21:06 UTC (permalink / raw)
  To: dm-devel, axboe, ming.lei, snitzer
  Cc: Bart Van Assche, hch, linux-block, loberman

On Sat, 2017-09-16 at 00:44 +0800, Ming Lei wrote:
> 1) lpfc.lpfc_lun_queue_depth=3, so that it is the same as .cmd_per_lun

Nobody I know uses such a low queue depth for the lpfc driver. Please also
include performance results for a more realistic queue depth.

Thanks,

Bart.

* Re: [PATCH 5/5] dm-mpath: improve I/O schedule
@ 2017-09-15 21:42     ` Bart Van Assche
  0 siblings, 0 replies; 69+ messages in thread
From: Bart Van Assche @ 2017-09-15 21:42 UTC (permalink / raw)
  To: dm-devel, axboe, ming.lei, snitzer
  Cc: Bart Van Assche, hch, linux-block, loberman

On Sat, 2017-09-16 at 00:44 +0800, Ming Lei wrote:
> +static void save_path_queue_depth(struct pgpath *p)
> +{
> +	struct request_queue *q = bdev_get_queue(p->path.dev->bdev);
> +
> +	p->old_nr_requests = q->nr_requests;
> +	p->queue_depth = q->queue_depth;
> +
> +	/* one extra request for making the pipeline full */
> +	if (p->queue_depth)
> +		blk_update_nr_requests(q, p->queue_depth + 1);
> +}

blk_mq_init_allocated_queue() initializes nr_requests to the tag set queue depth.
Does that mean that the above code increases nr_requests by one? If so, does
that change only result in a performance improvement for the queue depth
mentioned in the patch description (3)? Sorry but I doubt that this change will
yield a significant improvement for higher queue depths. Does that mean that
this patch can be left out?

Thanks,

Bart.

* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
  2017-09-15 17:57     ` Bart Van Assche
@ 2017-09-17 12:40       ` Ming Lei
  -1 siblings, 0 replies; 69+ messages in thread
From: Ming Lei @ 2017-09-17 12:40 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: dm-devel, axboe, snitzer, linux-block, hch, sagi,
	martin.petersen, linux-scsi, linux-nvme, jejb, loberman

On Fri, Sep 15, 2017 at 05:57:31PM +0000, Bart Van Assche wrote:
> On Sat, 2017-09-16 at 00:44 +0800, Ming Lei wrote:
> > If .queue_rq() returns BLK_STS_RESOURCE, blk-mq will rerun
> > the queue in three situations:
> > 
> > 1) if BLK_MQ_S_SCHED_RESTART is set
> > - the queue is rerun after one rq completes; see blk_mq_sched_restart(),
> > which is run from blk_mq_free_request()
> > 
> > 2) if BLK_MQ_S_TAG_WAITING is set
> > - the queue is rerun after one tag is freed
> > 
> > 3) otherwise
> > - the queue is run immediately in blk_mq_dispatch_rq_list()
> > 
> > So calling blk_mq_delay_run_hw_queue() inside .queue_rq() doesn't make
> > sense: whether or not it is called, the queue will still be rerun soon
> > in the above three situations, and the busy request can be dispatched
> > again.
> 
> NAK
> 
> Block drivers call blk_mq_delay_run_hw_queue() if it is known that the queue
> has to be rerun later even if no request has completed before the delay has
> expired. This patch breaks at least the SCSI core and the dm-mpath drivers.

"if no request has completed before the delay has expired" can't be a
reason to rerun the queue, because the queue can still be busy.

The only valid reason would be that there aren't any requests
in flight while the queue is still busy, but I'd rather understand
what the exact situation is, instead of using this kind of
workaround. If no such situation exists, we should remove the call.

For SCSI, it might be reasonable to run the hw queue after
a random time when atomic_read(&sdev->device_busy) is zero;
that would mean the queue can be busy even when there aren't any
requests in flight in this queue. Could you or someone share
what that case is? Then we can avoid using this workaround.

For dm-mpath, it might be related to the paths, but I have to say
it is still a workaround.

I suggest we understand the root cause instead of keeping this
ugly random delay, because rerunning the hw queue after 100ms may
be useless 99.99% of the time.

-- 
Ming

* Re: [PATCH 2/5] dm-mpath: return DM_MAPIO_REQUEUE in case of rq allocation failure
  2017-09-15 17:29     ` Bart Van Assche
@ 2017-09-17 12:51     ` Ming Lei
  -1 siblings, 0 replies; 69+ messages in thread
From: Ming Lei @ 2017-09-17 12:51 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel, axboe, snitzer, hch, linux-block, loberman

On Fri, Sep 15, 2017 at 05:29:53PM +0000, Bart Van Assche wrote:
> On Sat, 2017-09-16 at 00:44 +0800, Ming Lei wrote:
> > blk-mq will rerun the queue via RESTART after one request completes,
> > so there is no need to wait a random time before requeuing; we should
> > trust blk-mq to do it.
> > 
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  drivers/md/dm-mpath.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> > index 96aedaac2c64..f5a1088a6e79 100644
> > --- a/drivers/md/dm-mpath.c
> > +++ b/drivers/md/dm-mpath.c
> > @@ -505,7 +505,7 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
> >  			atomic_inc(&m->pg_init_in_progress);
> >  			activate_or_offline_path(pgpath);
> >  		}
> > -		return DM_MAPIO_DELAY_REQUEUE;
> > +		return DM_MAPIO_REQUEUE;
> >  	}
> >  	clone->bio = clone->biotail = NULL;
> >  	clone->rq_disk = bdev->bd_disk;
> 
> So you are reverting the patch below? Thank you very much.
> 
> commit 1c23484c355ec360ca2f37914f8a4802c6baeead
> Author: Bart Van Assche <bart.vanassche@wdc.com>
> Date:   Wed Aug 9 11:32:12 2017 -0700
> 
>     dm mpath: do not lock up a CPU with requeuing activity
>     
>     When using the block layer in single queue mode, get_request()
>     returns ERR_PTR(-EAGAIN) if the queue is dying and the REQ_NOWAIT
>     flag has been passed to get_request(). Avoid that the kernel
>     reports soft lockup complaints in this case due to continuous
>     requeuing activity.

What is the continuous requeuing activity? In the case of BLK_STS_RESOURCE,
blk-mq's SCHED_RESTART (see blk_mq_sched_dispatch_requests()) will be
triggered, and this rq will be dispatched again after another rq completes.

-- 
Ming

* Re: [PATCH 2/5] dm-mpath: return DM_MAPIO_REQUEUE in case of rq allocation failure
  2017-09-15 20:06     ` Mike Snitzer
@ 2017-09-17 13:23       ` Ming Lei
  1 sibling, 1 reply; 69+ messages in thread
From: Ming Lei @ 2017-09-17 13:23 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Bart Van Assche, dm-devel, axboe, hch, linux-block, loberman

On Fri, Sep 15, 2017 at 04:06:55PM -0400, Mike Snitzer wrote:
> On Fri, Sep 15 2017 at  1:29pm -0400,
> Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> 
> > On Sat, 2017-09-16 at 00:44 +0800, Ming Lei wrote:
> > > blk-mq will rerun the queue via RESTART after one request completes,
> > > so there is no need to wait a random time before requeuing; we should
> > > trust blk-mq to do it.
> > > 
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > ---
> > >  drivers/md/dm-mpath.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> > > index 96aedaac2c64..f5a1088a6e79 100644
> > > --- a/drivers/md/dm-mpath.c
> > > +++ b/drivers/md/dm-mpath.c
> > > @@ -505,7 +505,7 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
> > >  			atomic_inc(&m->pg_init_in_progress);
> > >  			activate_or_offline_path(pgpath);
> > >  		}
> > > -		return DM_MAPIO_DELAY_REQUEUE;
> > > +		return DM_MAPIO_REQUEUE;
> > >  	}
> > >  	clone->bio = clone->biotail = NULL;
> > >  	clone->rq_disk = bdev->bd_disk;
> > 
> > So you are reverting the patch below? Thank you very much.
> > 
> > commit 1c23484c355ec360ca2f37914f8a4802c6baeead
> > Author: Bart Van Assche <bart.vanassche@wdc.com>
> > Date:   Wed Aug 9 11:32:12 2017 -0700
> > 
> >     dm mpath: do not lock up a CPU with requeuing activity
> >     
> >     When using the block layer in single queue mode, get_request()
> >     returns ERR_PTR(-EAGAIN) if the queue is dying and the REQ_NOWAIT
> >     flag has been passed to get_request(). Avoid that the kernel
> >     reports soft lockup complaints in this case due to continuous
> >     requeuing activity.
> >     
> >     Fixes: 7083abbbf ("dm mpath: avoid that path removal can trigger an infinite loop")
> >     Cc: stable@vger.kernel.org
> >     Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> >     Tested-by: Laurence Oberman <loberman@redhat.com>
> >     Reviewed-by: Christoph Hellwig <hch@lst.de>
> >     Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> 
> The problem is that multipath_clone_and_map() is now treated as common
> code (thanks to both blk-mq and old .request_fn now enjoying the use of
> blk_get_request) BUT: Ming please understand that this code is used by
> old .request_fn too.  So it would seem that the use of

Hi Mike,

OK, thanks for pointing this out.

> DM_MAPIO_DELAY_REQUEUE vs DM_MAPIO_REQUEUE needs to be based on dm-sq vs
> dm-mq.

Yeah, I just forgot that dm-mq can't work on an underlying queue that
uses the legacy block path, and I've also forgotten the exact reason. :-(

-- 
Ming

* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
@ 2017-09-18 15:18         ` Bart Van Assche
  0 siblings, 0 replies; 69+ messages in thread
From: Bart Van Assche @ 2017-09-18 15:18 UTC (permalink / raw)
  To: ming.lei
  Cc: linux-block, hch, sagi, snitzer, martin.petersen, linux-scsi,
	axboe, linux-nvme, jejb, loberman, dm-devel

On Sun, 2017-09-17 at 20:40 +0800, Ming Lei wrote:
> "if no request has completed before the delay has expired" can't be a
> reason to rerun the queue, because the queue can still be busy.

That statement of yours shows that there are important aspects of the SCSI
core and dm-mpath driver that you don't understand.

> I suggest to understand the root cause, instead of keeping this
> ugly random delay because run hw queue after 100ms may be useless
> in 99.99% times.

If you are still looking at removing the blk_mq_delay_run_hw_queue() calls
then I think you are looking in the wrong direction. What kind of problem
are you trying to solve? Is it perhaps that there can be a delay between
dm-mpath request completion and the queueing of a new request? If so,
adding a queue run call into the dm-mpath end_io callback is probably
sufficient and probably can replace this entire patch series.
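
A minimal sketch of that idea (illustrative only, not a patch that was
posted in this thread), assuming the completion path can get at the dm
device's request_queue:

	/*
	 * Hypothetical helper, to be called from dm-mpath request
	 * completion: rerun the dm device's queue so that requests held
	 * back by an earlier BLK_STS_RESOURCE are retried promptly.
	 * "md_queue" stands for the dm device's request_queue.
	 */
	static void multipath_kick_queue_on_completion(struct request_queue *md_queue)
	{
		blk_mq_run_hw_queues(md_queue, true);
	}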

Bart.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
  2017-09-18 15:18         ` Bart Van Assche
@ 2017-09-19  5:43           ` Ming Lei
  -1 siblings, 0 replies; 69+ messages in thread
From: Ming Lei @ 2017-09-19  5:43 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: hch, jejb, sagi, linux-scsi, axboe, snitzer, linux-nvme,
	linux-block, dm-devel, martin.petersen, loberman

On Mon, Sep 18, 2017 at 03:18:16PM +0000, Bart Van Assche wrote:
> On Sun, 2017-09-17 at 20:40 +0800, Ming Lei wrote:
> > "if no request has completed before the delay has expired" can't be a
> > reason to rerun the queue, because the queue can still be busy.
> 
> That statement of yours shows that there are important aspects of the SCSI
> core and the dm-mpath driver that you don't understand.

Then can you tell me why blk-mq's SCHED_RESTART can't cover
the rerun when there are in-flight requests? What is the case
in which dm-rq can return BUSY while there aren't any in-flight
requests in the meantime?

Also, you are the author of the 'blk_mq_delay_run_hw_queue(
hctx, 100/*ms*/)' call in dm-rq, and you never explained in commit
6077c2d706097c0 ("dm rq: Avoid that request processing stalls
sporadically") what the root cause of your request stall is or
why that patch fixes your issue. You don't even explain why the
delay is 100ms.

So it is a workaround, isn't it?

My concern is that it isn't good to add blk_mq_delay_run_hw_queue(hctx, 100/*ms*/)
in the hot path, since it should be covered by SCHED_RESTART
if there are in-flight requests.
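
For reference, the code being questioned looks roughly like this in
dm_mq_queue_rq() around v4.13 (simplified sketch, not an exact quote):

	if (map_request(tio) == DM_MAPIO_REQUEUE) {
		/* Undo dm_start_request() before requeuing */
		rq_end_stats(md, rq);
		rq_completed(md, rq_data_dir(rq), false);
		/* the fixed 100ms rerun being questioned */
		blk_mq_delay_run_hw_queue(hctx, 100/*ms*/);
		return BLK_STS_RESOURCE;
	}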

> 
> > I suggest understanding the root cause instead of keeping this
> > ugly random delay, because running the hw queue after 100ms may be
> > useless 99.99% of the time.
> 
> If you are still looking at removing the blk_mq_delay_run_hw_queue() calls
> then I think you are looking in the wrong direction. What kind of problem
> are you trying to solve? Is it perhaps that there can be a delay between

Actually the improvement to dm-rq I/O scheduling (patches 2-5) doesn't
need this patch.


-- 
Ming

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 2/5] dm-mpath: return DM_MAPIO_REQUEUE in case of rq allocation failure
  2017-09-17 13:23       ` Ming Lei
@ 2017-09-19 14:41         ` Mike Snitzer
  2017-09-19 15:56           ` Ming Lei
  0 siblings, 1 reply; 69+ messages in thread
From: Mike Snitzer @ 2017-09-19 14:41 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, loberman, hch, axboe, dm-devel, Bart Van Assche

On Sun, Sep 17 2017 at  9:23am -0400,
Ming Lei <ming.lei@redhat.com> wrote:

> On Fri, Sep 15, 2017 at 04:06:55PM -0400, Mike Snitzer wrote:
> > On Fri, Sep 15 2017 at  1:29pm -0400,
> > Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> > 
> > > On Sat, 2017-09-16 at 00:44 +0800, Ming Lei wrote:
> > > > blk-mq will rerun the queue via RESTART after one request completes,
> > > > so it isn't necessary to wait a random time before requeuing; it should
> > > > trust blk-mq to do it.
> > > > 
> > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > ---
> > > >  drivers/md/dm-mpath.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> > > > index 96aedaac2c64..f5a1088a6e79 100644
> > > > --- a/drivers/md/dm-mpath.c
> > > > +++ b/drivers/md/dm-mpath.c
> > > > @@ -505,7 +505,7 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
> > > >  			atomic_inc(&m->pg_init_in_progress);
> > > >  			activate_or_offline_path(pgpath);
> > > >  		}
> > > > -		return DM_MAPIO_DELAY_REQUEUE;
> > > > +		return DM_MAPIO_REQUEUE;
> > > >  	}
> > > >  	clone->bio = clone->biotail = NULL;
> > > >  	clone->rq_disk = bdev->bd_disk;
> > > 
> > > So you are reverting the patch below? Thank you very much.
> > > 
> > > commit 1c23484c355ec360ca2f37914f8a4802c6baeead
> > > Author: Bart Van Assche <bart.vanassche@wdc.com>
> > > Date:   Wed Aug 9 11:32:12 2017 -0700
> > > 
> > >     dm mpath: do not lock up a CPU with requeuing activity
> > >     
> > >     When using the block layer in single queue mode, get_request()
> > >     returns ERR_PTR(-EAGAIN) if the queue is dying and the REQ_NOWAIT
> > >     flag has been passed to get_request(). Avoid that the kernel
> > >     reports soft lockup complaints in this case due to continuous
> > >     requeuing activity.
> > >     
> > >     Fixes: 7083abbbf ("dm mpath: avoid that path removal can trigger an infinite loop")
> > >     Cc: stable@vger.kernel.org
> > >     Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> > >     Tested-by: Laurence Oberman <loberman@redhat.com>
> > >     Reviewed-by: Christoph Hellwig <hch@lst.de>
> > >     Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > 
> > The problem is that multipath_clone_and_map() is now treated as common
> > code (thanks to both blk-mq and old .request_fn now enjoying the use of
> > blk_get_request) BUT: Ming please understand that this code is used by
> > old .request_fn too.  So it would seem that the use of
> 
> Hi Mike,
> 
> OK, thanks for pointing this out.
> 
> > DM_MAPIO_DELAY_REQUEUE vs DM_MAPIO_REQUEUE needs to be based on dm-sq vs
> > dm-mq.
> 
> Yeah, I just forgot that dm-mq can't work on top of an underlying queue
> that uses the block legacy path, and I forget the exact reason, :-(

Not sure how that detail is relevant to your patch?

But here are the 2 patches I posted as RFC to eliminate the restriction:
https://patchwork.kernel.org/patch/9839565/
https://patchwork.kernel.org/patch/9839567/

In the end I'm not too interested in supporting blk-mq multipath on top
of the block legacy path.  But I'd revisit if hch or someone else posted the
blk-mq patch that "removed the ->complete handler" (as hch mentions in
the thread of the first patchwork patch I reference above).

Mike

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
@ 2017-09-19 15:36             ` Bart Van Assche
  0 siblings, 0 replies; 69+ messages in thread
From: Bart Van Assche @ 2017-09-19 15:36 UTC (permalink / raw)
  To: ming.lei
  Cc: linux-block, hch, sagi, snitzer, martin.petersen, linux-scsi,
	axboe, linux-nvme, jejb, loberman, dm-devel

On Tue, 2017-09-19 at 13:43 +0800, Ming Lei wrote:
> On Mon, Sep 18, 2017 at 03:18:16PM +0000, Bart Van Assche wrote:
> > If you are still looking at removing the blk_mq_delay_run_hw_queue() calls
> > then I think you are looking in the wrong direction. What kind of problem
> > are you trying to solve? Is it perhaps that there can be a delay between
> 
> Actually the improvement to dm-rq I/O scheduling (patches 2-5) doesn't
> need this patch.

The approach of this patch series looks wrong to me and patch 5/5 is an ugly
hack in my opinion. That's why I asked you to drop the entire patch series and
to test whether inserting a queue run call into the dm-mpath end_io callback
yields a similar performance improvement to the patches you posted. Please do
not expect me to spend more time on this patch series if you do not come up
with measurement results for the proposed alternative.

Bart.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
  2017-09-19  5:43           ` Ming Lei
@ 2017-09-19 15:48             ` Mike Snitzer
  -1 siblings, 0 replies; 69+ messages in thread
From: Mike Snitzer @ 2017-09-19 15:48 UTC (permalink / raw)
  To: Ming Lei
  Cc: Bart Van Assche, hch, jejb, sagi, linux-scsi, axboe, linux-nvme,
	linux-block, dm-devel, martin.petersen, loberman

On Tue, Sep 19 2017 at  1:43am -0400,
Ming Lei <ming.lei@redhat.com> wrote:

> On Mon, Sep 18, 2017 at 03:18:16PM +0000, Bart Van Assche wrote:
> > On Sun, 2017-09-17 at 20:40 +0800, Ming Lei wrote:
> > > "if no request has completed before the delay has expired" can't be a
> > > reason to rerun the queue, because the queue can still be busy.
> > 
> > That statement of yours shows that there are important aspects of the SCSI
> > core and the dm-mpath driver that you don't understand.
> 
> Then can you tell me why blk-mq's SCHED_RESTART can't cover
> the rerun when there are in-flight requests? What is the case
> in which dm-rq can return BUSY while there aren't any in-flight
> requests in the meantime?
> 
> Also, you are the author of the 'blk_mq_delay_run_hw_queue(
> hctx, 100/*ms*/)' call in dm-rq, and you never explained in commit
> 6077c2d706097c0 ("dm rq: Avoid that request processing stalls
> sporadically") what the root cause of your request stall is or
> why that patch fixes your issue. You don't even explain why the
> delay is 100ms.
> 
> So it is a workaround, isn't it?
> 
> My concern is that it isn't good to add blk_mq_delay_run_hw_queue(hctx, 100/*ms*/)
> in the hot path, since it should be covered by SCHED_RESTART
> if there are in-flight requests.

This thread proves that it is definitely brittle to be relying on fixed
delays like this:
https://patchwork.kernel.org/patch/9703249/

Mike

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
@ 2017-09-19 15:52               ` Bart Van Assche
  0 siblings, 0 replies; 69+ messages in thread
From: Bart Van Assche @ 2017-09-19 15:52 UTC (permalink / raw)
  To: snitzer, ming.lei
  Cc: linux-block, hch, sagi, martin.petersen, linux-scsi, axboe,
	linux-nvme, jejb, loberman, dm-devel

On Tue, 2017-09-19 at 11:48 -0400, Mike Snitzer wrote:
> This thread proves that it is definitely brittle to be relying on fixed
> delays like this:
> https://patchwork.kernel.org/patch/9703249/

Hello Mike,

Sorry but I think that's a misinterpretation of my patch. I came up with that
patch before I had found and fixed the root cause of the high system load.
These fixes are upstream (kernel v4.13) which means that that patch is now
obsolete.

Bart.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
  2017-09-19 15:36             ` Bart Van Assche
@ 2017-09-19 15:56               ` Mike Snitzer
  -1 siblings, 0 replies; 69+ messages in thread
From: Mike Snitzer @ 2017-09-19 15:56 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: ming.lei, linux-block, hch, sagi, martin.petersen, linux-scsi,
	axboe, linux-nvme, jejb, loberman, dm-devel

On Tue, Sep 19 2017 at 11:36am -0400,
Bart Van Assche <Bart.VanAssche@wdc.com> wrote:

> On Tue, 2017-09-19 at 13:43 +0800, Ming Lei wrote:
> > On Mon, Sep 18, 2017 at 03:18:16PM +0000, Bart Van Assche wrote:
> > > If you are still looking at removing the blk_mq_delay_run_hw_queue() calls
> > > then I think you are looking in the wrong direction. What kind of problem
> > > are you trying to solve? Is it perhaps that there can be a delay between
> > 
> > Actually the improvement to dm-rq I/O scheduling (patches 2-5) doesn't
> > need this patch.
> 
> The approach of this patch series looks wrong to me and patch 5/5 is an ugly
> hack in my opinion. That's why I asked you to drop the entire patch series and
> to test whether inserting a queue run call into the dm-mpath end_io callback
> yields a similar performance improvement to the patches you posted. Please do
> not expect me to spend more time on this patch series if you do not come up
> with measurement results for the proposed alternative.

Bart, asserting that Ming's work is a hack doesn't help your apparent
goal of delegitimizing Ming's work.

Nor does it take away from the fact that your indecision on appropriate
timeouts (let alone your ability to defend and/or explain them) validates
Ming calling them into question (which you are now dodging regularly).

But please don't take this feedback and shut down.  Instead please work
together more constructively.  This doesn't need to be adversarial!  I
am at a loss for why there is such animosity from your end Bart.

Please dial it back.  It is just a distraction that fosters needless
in-fighting.

Believe it or not: Ming is just trying to improve the code because he
has a testbed that is showing fairly abysmal performance with dm-mq
multipath (on lpfc with scsi-mq).

Ming, if you can: please see if what Bart has proposed (instead: run
queue at end_io) helps.  Though I don't yet see why that should be
needed.

Mike

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 2/5] dm-mpath: return DM_MAPIO_REQUEUE in case of rq allocation failure
  2017-09-19 14:41         ` Mike Snitzer
@ 2017-09-19 15:56           ` Ming Lei
  0 siblings, 0 replies; 69+ messages in thread
From: Ming Lei @ 2017-09-19 15:56 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: linux-block, loberman, hch, axboe, dm-devel, Bart Van Assche

On Tue, Sep 19, 2017 at 10:41:30AM -0400, Mike Snitzer wrote:
> On Sun, Sep 17 2017 at  9:23am -0400,
> Ming Lei <ming.lei@redhat.com> wrote:
> 
> > On Fri, Sep 15, 2017 at 04:06:55PM -0400, Mike Snitzer wrote:
> > > On Fri, Sep 15 2017 at  1:29pm -0400,
> > > Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> > > 
> > > > On Sat, 2017-09-16 at 00:44 +0800, Ming Lei wrote:
> > > > > blk-mq will rerun the queue via RESTART after one request completes,
> > > > > so it isn't necessary to wait a random time before requeuing; it should
> > > > > trust blk-mq to do it.
> > > > > 
> > > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > > ---
> > > > >  drivers/md/dm-mpath.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> > > > > index 96aedaac2c64..f5a1088a6e79 100644
> > > > > --- a/drivers/md/dm-mpath.c
> > > > > +++ b/drivers/md/dm-mpath.c
> > > > > @@ -505,7 +505,7 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
> > > > >  			atomic_inc(&m->pg_init_in_progress);
> > > > >  			activate_or_offline_path(pgpath);
> > > > >  		}
> > > > > -		return DM_MAPIO_DELAY_REQUEUE;
> > > > > +		return DM_MAPIO_REQUEUE;
> > > > >  	}
> > > > >  	clone->bio = clone->biotail = NULL;
> > > > >  	clone->rq_disk = bdev->bd_disk;
> > > > 
> > > > So you are reverting the patch below? Thank you very much.
> > > > 
> > > > commit 1c23484c355ec360ca2f37914f8a4802c6baeead
> > > > Author: Bart Van Assche <bart.vanassche@wdc.com>
> > > > Date:   Wed Aug 9 11:32:12 2017 -0700
> > > > 
> > > >     dm mpath: do not lock up a CPU with requeuing activity
> > > >     
> > > >     When using the block layer in single queue mode, get_request()
> > > >     returns ERR_PTR(-EAGAIN) if the queue is dying and the REQ_NOWAIT
> > > >     flag has been passed to get_request(). Avoid that the kernel
> > > >     reports soft lockup complaints in this case due to continuous
> > > >     requeuing activity.
> > > >     
> > > >     Fixes: 7083abbbf ("dm mpath: avoid that path removal can trigger an infinite loop")
> > > >     Cc: stable@vger.kernel.org
> > > >     Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> > > >     Tested-by: Laurence Oberman <loberman@redhat.com>
> > > >     Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > >     Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > > 
> > > The problem is that multipath_clone_and_map() is now treated as common
> > > code (thanks to both blk-mq and old .request_fn now enjoying the use of
> > > blk_get_request) BUT: Ming please understand that this code is used by
> > > old .request_fn too.  So it would seem that the use of
> > 
> > Hi Mike,
> > 
> > OK, thanks for pointing this out.
> > 
> > > DM_MAPIO_DELAY_REQUEUE vs DM_MAPIO_REQUEUE needs to be based on dm-sq vs
> > > dm-mq.
> > 
> > Yeah, I just forgot that dm-mq can't work on top of an underlying queue
> > that uses the block legacy path, and I forget the exact reason, :-(
> 
> Not sure how that detail is relevant to your patch?
> 
> But here are the 2 patches I posted as RFC to eliminate the restriction:
> https://patchwork.kernel.org/patch/9839565/
> https://patchwork.kernel.org/patch/9839567/

Thanks for the posting.

> 
> In the end I'm not too interested in supporting blk-mq multipath on top
> of the block legacy path.  But I'd revisit if hch or someone else posted the
> blk-mq patch that "removed the ->complete handler" (as hch mentions in
> the thread of the first patchwork patch I reference above).

If blk-mq mpath can run on top of both the block legacy path and blk-mq,
the legacy .request_fn path can be removed from dm-mpath. That would be
a big simplification, especially since the global use_blk_mq parameter
may otherwise stick around for quite a while.

-- 
Ming

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
  2017-09-19 15:52               ` Bart Van Assche
@ 2017-09-19 16:03                 ` Mike Snitzer
  -1 siblings, 0 replies; 69+ messages in thread
From: Mike Snitzer @ 2017-09-19 16:03 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: ming.lei, linux-block, hch, sagi, martin.petersen, linux-scsi,
	axboe, linux-nvme, jejb, loberman, dm-devel

On Tue, Sep 19 2017 at 11:52am -0400,
Bart Van Assche <Bart.VanAssche@wdc.com> wrote:

> On Tue, 2017-09-19 at 11:48 -0400, Mike Snitzer wrote:
> > This thread proves that it is definitely brittle to be relying on fixed
> > delays like this:
> > https://patchwork.kernel.org/patch/9703249/
> 
> Hello Mike,
> 
> Sorry but I think that's a misinterpretation of my patch. I came up with that
> patch before I had found and fixed the root cause of the high system load.
> These fixes are upstream (kernel v4.13) which means that that patch is now
> obsolete.

OK, fair point.

Though fixed magic values for delay aren't going to stand the test of
time.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
  2017-09-19 15:56               ` Mike Snitzer
@ 2017-09-19 16:04                 ` Ming Lei
  -1 siblings, 0 replies; 69+ messages in thread
From: Ming Lei @ 2017-09-19 16:04 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Bart Van Assche, linux-block, hch, sagi, martin.petersen,
	linux-scsi, axboe, linux-nvme, jejb, loberman, dm-devel

On Tue, Sep 19, 2017 at 11:56:03AM -0400, Mike Snitzer wrote:
> On Tue, Sep 19 2017 at 11:36am -0400,
> Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> 
> > On Tue, 2017-09-19 at 13:43 +0800, Ming Lei wrote:
> > > On Mon, Sep 18, 2017 at 03:18:16PM +0000, Bart Van Assche wrote:
> > > > If you are still looking at removing the blk_mq_delay_run_hw_queue() calls
> > > > then I think you are looking in the wrong direction. What kind of problem
> > > > are you trying to solve? Is it perhaps that there can be a delay between
> > > 
> > > Actually the improvement to dm-rq I/O scheduling (patches 2-5) doesn't
> > > need this patch.
> > 
> > The approach of this patch series looks wrong to me and patch 5/5 is an ugly
> > hack in my opinion. That's why I asked you to drop the entire patch series and
> > to test whether inserting a queue run call into the dm-mpath end_io callback
> > yields a similar performance improvement to the patches you posted. Please do
> > not expect me to spend more time on this patch series if you do not come up
> > with measurement results for the proposed alternative.
> 
> Bart, asserting that Ming's work is a hack doesn't help your apparent
> goal of delegitimizing Ming's work.
> 
> Nor does it take away from the fact that your indecision on appropriate
> timeouts (let alone your ability to defend and/or explain them) validates
> Ming calling them into question (which you are now dodging regularly).
> 
> But please don't take this feedback and shut down.  Instead please work
> together more constructively.  This doesn't need to be adversarial!  I
> am at a loss for why there is such animosity from your end Bart.
> 
> Please dial it back.  It is just a distraction that fosters needless
> in-fighting.
> 
> Believe it or not: Ming is just trying to improve the code because he
> has a testbed that is showing fairly abysmal performance with dm-mq
> multipath (on lpfc with scsi-mq).
> 
> Ming, if you can: please see if what Bart has proposed (instead: run
> queue at end_io) helps.  Though I don't yet see why that should be
> needed.

Running the queue at end_io is definitely wrong, because blk-mq has
SCHED_RESTART to do that already.

The dm-mpath performance issue has nothing to do with that; actually
the issue is very similar to my improvement on SCSI-MQ.
dm_dispatch_clone_request() doesn't know whether the underlying
queue is busy or not, so dm-rq/mpath just keeps dispatching more
requests to the underlying queue even when it is busy, and then I/O
merging can't be done easily in dm-rq/mpath.
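
The idea is to use clone allocation failure as the busy feedback; a
rough sketch of multipath_clone_and_map() with patch 2/5 applied
(simplified, error handling trimmed):

	clone = blk_get_request(q, rq->cmd_flags | REQ_NOWAIT, GFP_ATOMIC);
	if (IS_ERR(clone)) {
		/*
		 * No free request in the underlying queue, so it is busy:
		 * give the request back to blk-mq and rely on SCHED_RESTART
		 * to rerun the queue after a completion.
		 */
		return DM_MAPIO_REQUEUE;
	}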

I am thinking of another way to improve this issue, since some
SCSI devices' queue_depth is different from .cmd_per_lun; I
will post a patchset soon.


-- 
Ming

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
  2017-09-19 15:48             ` Mike Snitzer
@ 2017-09-19 16:07               ` Ming Lei
  -1 siblings, 0 replies; 69+ messages in thread
From: Ming Lei @ 2017-09-19 16:07 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Bart Van Assche, hch, jejb, sagi, linux-scsi, axboe, linux-nvme,
	linux-block, dm-devel, martin.petersen, loberman

On Tue, Sep 19, 2017 at 11:48:23AM -0400, Mike Snitzer wrote:
> On Tue, Sep 19 2017 at  1:43am -0400,
> Ming Lei <ming.lei@redhat.com> wrote:
> 
> > On Mon, Sep 18, 2017 at 03:18:16PM +0000, Bart Van Assche wrote:
> > > On Sun, 2017-09-17 at 20:40 +0800, Ming Lei wrote:
> > > > "if no request has completed before the delay has expired" can't be a
> > > > reason to rerun the queue, because the queue can still be busy.
> > > 
> > > That statement of yours shows that there are important aspects of the SCSI
> > > core and the dm-mpath driver that you don't understand.
> > 
> > Then can you tell me why blk-mq's SCHED_RESTART can't cover
> > the rerun when there are in-flight requests? What is the case
> > in which dm-rq can return BUSY while there aren't any in-flight
> > requests in the meantime?
> > 
> > Also, you are the author of the 'blk_mq_delay_run_hw_queue(
> > hctx, 100/*ms*/)' call in dm-rq, and you never explained in commit
> > 6077c2d706097c0 ("dm rq: Avoid that request processing stalls
> > sporadically") what the root cause of your request stall is or
> > why that patch fixes your issue. You don't even explain why the
> > delay is 100ms.
> > 
> > So it is a workaround, isn't it?
> > 
> > My concern is that it isn't good to add blk_mq_delay_run_hw_queue(hctx, 100/*ms*/)
> > in the hot path, since it should be covered by SCHED_RESTART
> > if there are in-flight requests.
> 
> This thread proves that it is definitely brittle to be relying on fixed
> delays like this:
> https://patchwork.kernel.org/patch/9703249/

I can't agree more, because no one has identified the root cause; maybe
the request stall has been fixed recently. Keeping the workaround in the
hot path is a bit annoying.

-- 
Ming

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
@ 2017-09-19 16:49                   ` Bart Van Assche
  0 siblings, 0 replies; 69+ messages in thread
From: Bart Van Assche @ 2017-09-19 16:49 UTC (permalink / raw)
  To: ming.lei, snitzer
  Cc: linux-block, hch, sagi, martin.petersen, linux-scsi, axboe,
	linux-nvme, jejb, loberman, dm-devel

On Wed, 2017-09-20 at 00:04 +0800, Ming Lei wrote:
> Running the queue at end_io is definitely wrong, because blk-mq has
> SCHED_RESTART to do that already.

Sorry but I disagree. If SCHED_RESTART is set that causes the blk-mq core to
reexamine the software queues and the hctx dispatch list but not the requeue
list. If a block driver returns BLK_STS_RESOURCE then requests end up on the
requeue list. Hence the following code in scsi_end_request():

	if (scsi_target(sdev)->single_lun || !list_empty(&sdev->host->starved_list))
		kblockd_schedule_work(&sdev->requeue_work);
	else
		blk_mq_run_hw_queues(q, true);

Bart.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
  2017-09-19 16:49                   ` Bart Van Assche
@ 2017-09-19 16:55                     ` Ming Lei
  -1 siblings, 0 replies; 69+ messages in thread
From: Ming Lei @ 2017-09-19 16:55 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: ming.lei, snitzer, linux-block, hch, sagi, martin.petersen,
	linux-scsi, axboe, linux-nvme, jejb, loberman, dm-devel

On Wed, Sep 20, 2017 at 12:49 AM, Bart Van Assche
<Bart.VanAssche@wdc.com> wrote:
> On Wed, 2017-09-20 at 00:04 +0800, Ming Lei wrote:
>> Running the queue at end_io is definitely wrong, because blk-mq has
>> SCHED_RESTART to do that already.
>
> Sorry but I disagree. If SCHED_RESTART is set that causes the blk-mq core to
> reexamine the software queues and the hctx dispatch list but not the requeue
> list. If a block driver returns BLK_STS_RESOURCE then requests end up on the
> requeue list. Hence the following code in scsi_end_request():

That doesn't need SCHED_RESTART, because it is the requeue path's
responsibility to do that; see blk_mq_requeue_work(), which runs the
hw queue at the end of that function.
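
Trimmed sketch of that function as of v4.13 (paraphrased, not an exact
quote):

	static void blk_mq_requeue_work(struct work_struct *work)
	{
		struct request_queue *q =
			container_of(work, struct request_queue, requeue_work.work);

		/* ... splice the requeue list and re-insert each request ... */

		blk_mq_run_hw_queues(q, false);	/* the rerun mentioned above */
	}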


-- 
Ming Lei

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
@ 2017-09-19 18:42                       ` Bart Van Assche
  0 siblings, 0 replies; 69+ messages in thread
From: Bart Van Assche @ 2017-09-19 18:42 UTC (permalink / raw)
  To: tom.leiming
  Cc: linux-block, hch, sagi, snitzer, martin.petersen, ming.lei,
	linux-scsi, axboe, linux-nvme, jejb, loberman, dm-devel

On Wed, 2017-09-20 at 00:55 +0800, Ming Lei wrote:
> On Wed, Sep 20, 2017 at 12:49 AM, Bart Van Assche
> <Bart.VanAssche@wdc.com> wrote:
> > On Wed, 2017-09-20 at 00:04 +0800, Ming Lei wrote:
> > > Running the queue at end_io is definitely wrong, because blk-mq has
> > > SCHED_RESTART to do that already.
> > 
> > Sorry but I disagree. If SCHED_RESTART is set that causes the blk-mq core to
> > reexamine the software queues and the hctx dispatch list but not the requeue
> > list. If a block driver returns BLK_STS_RESOURCE then requests end up on the
> > requeue list. Hence the following code in scsi_end_request():
> 
> That doesn't need SCHED_RESTART, because it is the requeue path's
> responsibility to do that; see blk_mq_requeue_work(), which runs the
> hw queue at the end of that function.

That's not what I was trying to explain. What I was trying to explain is that
every block driver that can cause a request to end up on the requeue list is
responsible for kicking the requeue list at a later time. Hence the
kblockd_schedule_work(&sdev->requeue_work) call in the SCSI core and the
blk_mq_kick_requeue_list() and blk_mq_delay_kick_requeue_list() calls in the
dm code. What I would like to see is measurement results for dm-mpath without
this patch series and a call to kick the requeue list added to the dm-mpath
end_io code.
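
For the dm side, the kick pattern in question looks roughly like this
(v4.13-era, simplified sketch):

	/* park the clone's original request, then schedule the requeue
	 * work, optionally after a delay (msecs may be 0) */
	blk_mq_requeue_request(rq, false);
	blk_mq_delay_kick_requeue_list(rq->q, msecs);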

Bart.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
  2017-09-19 18:42                       ` Bart Van Assche
@ 2017-09-19 22:44                         ` Ming Lei
  -1 siblings, 0 replies; 69+ messages in thread
From: Ming Lei @ 2017-09-19 22:44 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: tom.leiming, linux-block, hch, sagi, snitzer, martin.petersen,
	linux-scsi, axboe, linux-nvme, jejb, loberman, dm-devel

On Tue, Sep 19, 2017 at 06:42:30PM +0000, Bart Van Assche wrote:
> On Wed, 2017-09-20 at 00:55 +0800, Ming Lei wrote:
> > On Wed, Sep 20, 2017 at 12:49 AM, Bart Van Assche
> > <Bart.VanAssche@wdc.com> wrote:
> > > On Wed, 2017-09-20 at 00:04 +0800, Ming Lei wrote:
> > > > Running the queue at end_io is definitely wrong, because blk-mq has
> > > > SCHED_RESTART to do that already.
> > > 
> > > Sorry but I disagree. If SCHED_RESTART is set that causes the blk-mq core to
> > > reexamine the software queues and the hctx dispatch list but not the requeue
> > > list. If a block driver returns BLK_STS_RESOURCE then requests end up on the
> > > requeue list. Hence the following code in scsi_end_request():
> > 
> > That doesn't need SCHED_RESTART, because it is the requeue path's
> > responsibility to do that; see blk_mq_requeue_work(), which runs the
> > hw queue at the end of that function.
> 
> That's not what I was trying to explain. What I was trying to explain is that
> every block driver that can cause a request to end up on the requeue list is
> responsible for kicking the requeue list at a later time. Hence the
> kblockd_schedule_work(&sdev->requeue_work) call in the SCSI core and the
> blk_mq_kick_requeue_list() and blk_mq_delay_kick_requeue_list() calls in the
> dm code. What I would like to see is measurement results for dm-mpath without
> this patch series and a call to kick the requeue list added to the dm-mpath
> end_io code.

For this issue, SCSI and dm-rq aren't the same.

We don't need to run the queue in dm's .end_io, and the reasoning is
simple; otherwise this wouldn't be a performance issue, it would be an I/O hang.

1) every dm-rq's request is 1:1 mapped to SCSI's request

2) if there is any mapped SCSI request not finished, either
in-flight or in requeue list or whatever, there will be one
corresponding dm-rq's request in-flight

3) once the mapped SCSI request is completed, dm-rq's completion
path will be triggered and dm-rq's queue will be rerun because of
SCHED_RESTART in dm-rq

So the hw queue of dm-rq has been run in dm-rq's completion path
already, right? Why do we need to do it again in the hot path?
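
Simplified, the restart mechanism already pairs the two sides (sketch of
the v4.13-era blk-mq logic, not an exact quote):

	/* dispatch side: mark the hctx when it has to give up */
	set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);

	/* completion side, via blk_mq_sched_restart(): */
	if (test_and_clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
		blk_mq_run_hw_queue(hctx, true);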

-- 
Ming

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
@ 2017-09-19 22:44                         ` Ming Lei
  0 siblings, 0 replies; 69+ messages in thread
From: Ming Lei @ 2017-09-19 22:44 UTC (permalink / raw)


On Tue, Sep 19, 2017@06:42:30PM +0000, Bart Van Assche wrote:
> On Wed, 2017-09-20@00:55 +0800, Ming Lei wrote:
> > On Wed, Sep 20, 2017 at 12:49 AM, Bart Van Assche
> > <Bart.VanAssche@wdc.com> wrote:
> > > On Wed, 2017-09-20@00:04 +0800, Ming Lei wrote:
> > > > Run queue at end_io is definitely wrong, because blk-mq has SCHED_RESTART
> > > > to do that already.
> > > 
> > > Sorry but I disagree. If SCHED_RESTART is set that causes the blk-mq core to
> > > reexamine the software queues and the hctx dispatch list but not the requeue
> > > list. If a block driver returns BLK_STS_RESOURCE then requests end up on the
> > > requeue list. Hence the following code in scsi_end_request():
> > 
> > That doesn't need SCHED_RESTART, because it is requeue's
> > responsibility to do that,
> > see blk_mq_requeue_work(), which will run hw queue at the end of this func.
> 
> That's not what I was trying to explain. What I was trying to explain is that
> every block driver that can cause a request to end up on the requeue list is
> responsible for kicking the requeue list at a later time. Hence the
> kblockd_schedule_work(&sdev->requeue_work) call in the SCSI core and the
> blk_mq_kick_requeue_list() and blk_mq_delay_kick_requeue_list() calls in the
> dm code. What I would like to see is measurement results for dm-mpath without
> this patch series and a call to kick the requeue list added to the dm-mpath
> end_io code.

For this issue, the situation isn't the same between SCSI and dm-rq.

We don't need to run the queue in dm's .end_io, and the reasoning is
simple; if it were not true, this would be an I/O hang rather than
just a performance issue:

1) every dm-rq request is mapped 1:1 to a SCSI request

2) if any mapped SCSI request is unfinished, whether in-flight,
on the requeue list, or elsewhere, there is one corresponding
dm-rq request still in-flight

3) once the mapped SCSI request completes, dm-rq's completion path
is triggered and dm-rq's queue is rerun because of SCHED_RESTART
in dm-rq

So the hw queue of dm-rq has already been run in dm-rq's completion
path, right? Why do we need to do it again in the hot path?
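
Roughly, the mechanism I am relying on looks like this (a simplified
sketch of blk-mq's restart logic, not the literal kernel code):

	/* dispatch side: the driver returned BLK_STS_RESOURCE, so the
	 * hw queue is marked for a restart
	 */
	set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);

	/* completion side: freeing any request of a marked hctx clears
	 * the flag and reruns the hw queue
	 */
	if (test_and_clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
		blk_mq_run_hw_queue(hctx, true);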

-- 
Ming

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
  2017-09-19 22:44                         ` Ming Lei
  (?)
@ 2017-09-19 23:25                           ` Bart Van Assche
  -1 siblings, 0 replies; 69+ messages in thread
From: Bart Van Assche @ 2017-09-19 23:25 UTC (permalink / raw)
  To: ming.lei
  Cc: linux-block, hch, tom.leiming, sagi, snitzer, martin.petersen,
	linux-scsi, axboe, linux-nvme, jejb, loberman, dm-devel

On Wed, 2017-09-20 at 06:44 +0800, Ming Lei wrote:
> For this issue, the situation isn't the same between SCSI and dm-rq.
> 
> We don't need to run the queue in dm's .end_io, and the reasoning is
> simple; if it were not true, this would be an I/O hang rather than
> just a performance issue:
> 
> 1) every dm-rq request is mapped 1:1 to a SCSI request
> 
> 2) if any mapped SCSI request is unfinished, whether in-flight,
> on the requeue list, or elsewhere, there is one corresponding
> dm-rq request still in-flight
> 
> 3) once the mapped SCSI request completes, dm-rq's completion path
> is triggered and dm-rq's queue is rerun because of SCHED_RESTART
> in dm-rq
> 
> So the hw queue of dm-rq has already been run in dm-rq's completion
> path, right? Why do we need to do it again in the hot path?

The measurement data in the description of patch 5/5 shows a significant
performance regression for an important workload, namely random I/O.
Additionally, the performance improvement for sequential I/O was achieved
for an unrealistically low queue depth. Sorry but given these measurement
results I don't see why I should spend more time on this patch series.

Bart.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
@ 2017-09-19 23:25                           ` Bart Van Assche
  0 siblings, 0 replies; 69+ messages in thread
From: Bart Van Assche @ 2017-09-19 23:25 UTC (permalink / raw)
  To: ming.lei
  Cc: linux-block, hch, tom.leiming, sagi, snitzer, martin.petersen,
	linux-scsi, axboe, linux-nvme, jejb, loberman, dm-devel

On Wed, 2017-09-20 at 06:44 +0800, Ming Lei wrote:
> For this issue, the situation isn't the same between SCSI and dm-rq.
> 
> We don't need to run the queue in dm's .end_io, and the reasoning is
> simple; if it were not true, this would be an I/O hang rather than
> just a performance issue:
> 
> 1) every dm-rq request is mapped 1:1 to a SCSI request
> 
> 2) if any mapped SCSI request is unfinished, whether in-flight,
> on the requeue list, or elsewhere, there is one corresponding
> dm-rq request still in-flight
> 
> 3) once the mapped SCSI request completes, dm-rq's completion path
> is triggered and dm-rq's queue is rerun because of SCHED_RESTART
> in dm-rq
> 
> So the hw queue of dm-rq has already been run in dm-rq's completion
> path, right? Why do we need to do it again in the hot path?

The measurement data in the description of patch 5/5 shows a significant
performance regression for an important workload, namely random I/O.
Additionally, the performance improvement for sequential I/O was achieved
for an unrealistically low queue depth. Sorry but given these measurement
results I don't see why I should spend more time on this patch series.

Bart.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
@ 2017-09-19 23:25                           ` Bart Van Assche
  0 siblings, 0 replies; 69+ messages in thread
From: Bart Van Assche @ 2017-09-19 23:25 UTC (permalink / raw)


On Wed, 2017-09-20 at 06:44 +0800, Ming Lei wrote:
> For this issue, the situation isn't the same between SCSI and dm-rq.
> 
> We don't need to run the queue in dm's .end_io, and the reasoning is
> simple; if it were not true, this would be an I/O hang rather than
> just a performance issue:
> 
> 1) every dm-rq request is mapped 1:1 to a SCSI request
> 
> 2) if any mapped SCSI request is unfinished, whether in-flight,
> on the requeue list, or elsewhere, there is one corresponding
> dm-rq request still in-flight
> 
> 3) once the mapped SCSI request completes, dm-rq's completion path
> is triggered and dm-rq's queue is rerun because of SCHED_RESTART
> in dm-rq
> 
> So the hw queue of dm-rq has already been run in dm-rq's completion
> path, right? Why do we need to do it again in the hot path?

The measurement data in the description of patch 5/5 shows a significant
performance regression for an important workload, namely random I/O.
Additionally, the performance improvement for sequential I/O was achieved
for an unrealistically low queue depth. Sorry but given these measurement
results I don't see why I should spend more time on this patch series.

Bart.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
  2017-09-19 23:25                           ` Bart Van Assche
@ 2017-09-19 23:50                             ` Mike Snitzer
  -1 siblings, 0 replies; 69+ messages in thread
From: Mike Snitzer @ 2017-09-19 23:50 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: ming.lei, linux-block, hch, tom.leiming, sagi, martin.petersen,
	linux-scsi, axboe, linux-nvme, jejb, loberman, dm-devel

On Tue, Sep 19 2017 at  7:25pm -0400,
Bart Van Assche <Bart.VanAssche@wdc.com> wrote:

> On Wed, 2017-09-20 at 06:44 +0800, Ming Lei wrote:
> > For this issue, the situation isn't the same between SCSI and dm-rq.
> > 
> > We don't need to run the queue in dm's .end_io, and the reasoning is
> > simple; if it were not true, this would be an I/O hang rather than
> > just a performance issue:
> > 
> > 1) every dm-rq request is mapped 1:1 to a SCSI request
> > 
> > 2) if any mapped SCSI request is unfinished, whether in-flight,
> > on the requeue list, or elsewhere, there is one corresponding
> > dm-rq request still in-flight
> > 
> > 3) once the mapped SCSI request completes, dm-rq's completion path
> > is triggered and dm-rq's queue is rerun because of SCHED_RESTART
> > in dm-rq
> > 
> > So the hw queue of dm-rq has already been run in dm-rq's completion
> > path, right? Why do we need to do it again in the hot path?
> 
> The measurement data in the description of patch 5/5 shows a significant
> performance regression for an important workload, namely random I/O.
> Additionally, the performance improvement for sequential I/O was achieved
> for an unrealistically low queue depth.

So you've ignored Ming's question entirely and instead decided to focus
on previous questions you raised to Ming that he ignored.  This is
getting tedious.

Especially given that Ming said the first patch that all this fighting
has been over isn't even required to attain the improvements.

Ming, please retest both your baseline and patched setup with a
queue_depth of >= 32.  Also, please do 3 - 5 runs to get an avg and
std dev across the runs.

> Sorry but given these measurement results I don't see why I should
> spend more time on this patch series. 

Bart, I've historically genuinely always appreciated your review and
insight.  But if your future "review" on this patchset would take the
form shown in this thread then please don't spend more time on it.

Mike

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
@ 2017-09-19 23:50                             ` Mike Snitzer
  0 siblings, 0 replies; 69+ messages in thread
From: Mike Snitzer @ 2017-09-19 23:50 UTC (permalink / raw)


On Tue, Sep 19 2017 at  7:25pm -0400,
Bart Van Assche <Bart.VanAssche@wdc.com> wrote:

> On Wed, 2017-09-20 at 06:44 +0800, Ming Lei wrote:
> > For this issue, the situation isn't the same between SCSI and dm-rq.
> > 
> > We don't need to run the queue in dm's .end_io, and the reasoning is
> > simple; if it were not true, this would be an I/O hang rather than
> > just a performance issue:
> > 
> > 1) every dm-rq request is mapped 1:1 to a SCSI request
> > 
> > 2) if any mapped SCSI request is unfinished, whether in-flight,
> > on the requeue list, or elsewhere, there is one corresponding
> > dm-rq request still in-flight
> > 
> > 3) once the mapped SCSI request completes, dm-rq's completion path
> > is triggered and dm-rq's queue is rerun because of SCHED_RESTART
> > in dm-rq
> > 
> > So the hw queue of dm-rq has already been run in dm-rq's completion
> > path, right? Why do we need to do it again in the hot path?
> 
> The measurement data in the description of patch 5/5 shows a significant
> performance regression for an important workload, namely random I/O.
> Additionally, the performance improvement for sequential I/O was achieved
> for an unrealistically low queue depth.

So you've ignored Ming's question entirely and instead decided to focus
on previous questions you raised to Ming that he ignored.  This is
getting tedious.

Especially given that Ming said the first patch that all this fighting
has been over isn't even required to attain the improvements.

Ming, please retest both your baseline and patched setup with a
queue_depth of >= 32.  Also, please do 3 - 5 runs to get an avg and
std dev across the runs.

> Sorry but given these measurement results I don't see why I should
> spend more time on this patch series. 

Bart, I've historically genuinely always appreciated your review and
insight.  But if your future "review" on this patchset would take the
form shown in this thread then please don't spend more time on it.

Mike

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
  2017-09-19 23:50                             ` Mike Snitzer
@ 2017-09-20  1:13                               ` Ming Lei
  -1 siblings, 0 replies; 69+ messages in thread
From: Ming Lei @ 2017-09-20  1:13 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Bart Van Assche, linux-block, hch, tom.leiming, sagi,
	martin.petersen, linux-scsi, axboe, linux-nvme, jejb, loberman,
	dm-devel

Hi Mike,

On Tue, Sep 19, 2017 at 07:50:06PM -0400, Mike Snitzer wrote:
> On Tue, Sep 19 2017 at  7:25pm -0400,
> Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> 
> > On Wed, 2017-09-20 at 06:44 +0800, Ming Lei wrote:
> > > For this issue, the situation isn't the same between SCSI and dm-rq.
> > > 
> > > We don't need to run the queue in dm's .end_io, and the reasoning is
> > > simple; if it were not true, this would be an I/O hang rather than
> > > just a performance issue:
> > > 
> > > 1) every dm-rq request is mapped 1:1 to a SCSI request
> > > 
> > > 2) if any mapped SCSI request is unfinished, whether in-flight,
> > > on the requeue list, or elsewhere, there is one corresponding
> > > dm-rq request still in-flight
> > > 
> > > 3) once the mapped SCSI request completes, dm-rq's completion path
> > > is triggered and dm-rq's queue is rerun because of SCHED_RESTART
> > > in dm-rq
> > > 
> > > So the hw queue of dm-rq has already been run in dm-rq's completion
> > > path, right? Why do we need to do it again in the hot path?
> > 
> > The measurement data in the description of patch 5/5 shows a significant
> > performance regression for an important workload, namely random I/O.
> > Additionally, the performance improvement for sequential I/O was achieved
> > for an unrealistically low queue depth.
> 
> So you've ignored Ming's question entirely and instead decided to focus
> on previous questions you raised to Ming that he ignored.  This is
> getting tedious.

Sorry for not making that clear: I mentioned that I will post a new
version to address the random I/O regression.

> 
> Especially given that Ming said the first patch that all this fighting
> has been over isn't even required to attain the improvements.
> 
> Ming, please retest both your baseline and patched setup with a
> queue_depth of >= 32.  Also, please do 3 - 5 runs to get an avg and
> std dev across the runs.

Using a bigger queue_depth won't help with this issue, and it can make
the situation worse: .cmd_per_lun won't change, and the queue often
becomes busy once the number of in-flight requests exceeds
.cmd_per_lun.
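
The .cmd_per_lun limit bites on the SCSI side roughly as follows
(paraphrasing scsi_dev_queue_ready() in drivers/scsi/scsi_lib.c;
a simplified sketch, not the exact code):

	busy = atomic_inc_return(&sdev->device_busy) - 1;
	if (busy >= sdev->queue_depth) {
		/* queue_depth is initialized from .cmd_per_lun, so the
		 * device reports busy and the request sees
		 * BLK_STS_RESOURCE
		 */
		atomic_dec(&sdev->device_busy);
		return 0;
	}
	return 1;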

I will post a new version that uses another simple way to figure out
whether the underlying queue is busy, so that random I/O performance
won't be affected; however, the new version depends on the following
patchset:

	https://marc.info/?t=150436555700002&r=1&w=2

So it may take a while since that patchset is still under review.

I will post them all together in 'blk-mq-sched: improve SCSI-MQ performance (V5)'.

The approach taken in patch 5 depends on q->queue_depth, but some SCSI
hosts' .cmd_per_lun differs from q->queue_depth, which causes the
random I/O regression.
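
For reference, the core idea of patch 5 is along these lines (a
simplified sketch, not the actual call site in dm-mpath):

	/* limit request allocation on the underlying queue to its
	 * queue depth, so a failed blk_get_request() means the
	 * path is busy
	 */
	blk_update_nr_requests(q, q->queue_depth);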

-- 
Ming

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
@ 2017-09-20  1:13                               ` Ming Lei
  0 siblings, 0 replies; 69+ messages in thread
From: Ming Lei @ 2017-09-20  1:13 UTC (permalink / raw)


Hi Mike,

On Tue, Sep 19, 2017 at 07:50:06PM -0400, Mike Snitzer wrote:
> On Tue, Sep 19 2017 at  7:25pm -0400,
> Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> 
> > On Wed, 2017-09-20 at 06:44 +0800, Ming Lei wrote:
> > > For this issue, the situation isn't the same between SCSI and dm-rq.
> > > 
> > > We don't need to run the queue in dm's .end_io, and the reasoning is
> > > simple; if it were not true, this would be an I/O hang rather than
> > > just a performance issue:
> > > 
> > > 1) every dm-rq request is mapped 1:1 to a SCSI request
> > > 
> > > 2) if any mapped SCSI request is unfinished, whether in-flight,
> > > on the requeue list, or elsewhere, there is one corresponding
> > > dm-rq request still in-flight
> > > 
> > > 3) once the mapped SCSI request completes, dm-rq's completion path
> > > is triggered and dm-rq's queue is rerun because of SCHED_RESTART
> > > in dm-rq
> > > 
> > > So the hw queue of dm-rq has already been run in dm-rq's completion
> > > path, right? Why do we need to do it again in the hot path?
> > 
> > The measurement data in the description of patch 5/5 shows a significant
> > performance regression for an important workload, namely random I/O.
> > Additionally, the performance improvement for sequential I/O was achieved
> > for an unrealistically low queue depth.
> 
> So you've ignored Ming's question entirely and instead decided to focus
> on previous questions you raised to Ming that he ignored.  This is
> getting tedious.

Sorry for not making that clear: I mentioned that I will post a new
version to address the random I/O regression.

> 
> Especially given that Ming said the first patch that all this fighting
> has been over isn't even required to attain the improvements.
> 
> Ming, please retest both your baseline and patched setup with a
> queue_depth of >= 32.  Also, please do 3 - 5 runs to get an avg and
> std dev across the runs.

Using a bigger queue_depth won't help with this issue, and it can make
the situation worse: .cmd_per_lun won't change, and the queue often
becomes busy once the number of in-flight requests exceeds
.cmd_per_lun.
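
The .cmd_per_lun limit bites on the SCSI side roughly as follows
(paraphrasing scsi_dev_queue_ready() in drivers/scsi/scsi_lib.c;
a simplified sketch, not the exact code):

	busy = atomic_inc_return(&sdev->device_busy) - 1;
	if (busy >= sdev->queue_depth) {
		/* queue_depth is initialized from .cmd_per_lun, so the
		 * device reports busy and the request sees
		 * BLK_STS_RESOURCE
		 */
		atomic_dec(&sdev->device_busy);
		return 0;
	}
	return 1;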

I will post a new version that uses another simple way to figure out
whether the underlying queue is busy, so that random I/O performance
won't be affected; however, the new version depends on the following
patchset:

	https://marc.info/?t=150436555700002&r=1&w=2

So it may take a while since that patchset is still under review.

I will post them all together in 'blk-mq-sched: improve SCSI-MQ performance (V5)'.

The approach taken in patch 5 depends on q->queue_depth, but some SCSI
hosts' .cmd_per_lun differs from q->queue_depth, which causes the
random I/O regression.
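
For reference, the core idea of patch 5 is along these lines (a
simplified sketch, not the actual call site in dm-mpath):

	/* limit request allocation on the underlying queue to its
	 * queue depth, so a failed blk_get_request() means the
	 * path is busy
	 */
	blk_update_nr_requests(q, q->queue_depth);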

-- 
Ming

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
  2017-09-19 16:49                   ` Bart Van Assche
@ 2017-09-20  1:19                     ` Ming Lei
  -1 siblings, 0 replies; 69+ messages in thread
From: Ming Lei @ 2017-09-20  1:19 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: snitzer, linux-block, hch, sagi, martin.petersen, linux-scsi,
	axboe, linux-nvme, jejb, loberman, dm-devel

On Tue, Sep 19, 2017 at 04:49:15PM +0000, Bart Van Assche wrote:
> On Wed, 2017-09-20 at 00:04 +0800, Ming Lei wrote:
> > Running the queue at end_io is definitely wrong, because blk-mq
> > already has SCHED_RESTART to do that.
> 
> Sorry but I disagree. If SCHED_RESTART is set that causes the blk-mq core to
> reexamine the software queues and the hctx dispatch list but not the requeue
> list. If a block driver returns BLK_STS_RESOURCE then requests end up on the
> requeue list. Hence the following code in scsi_end_request():
> 
> 	if (scsi_target(sdev)->single_lun || !list_empty(&sdev->host->starved_list))
> 		kblockd_schedule_work(&sdev->requeue_work);
> 	else
> 		blk_mq_run_hw_queues(q, true);

Let's focus on dm-rq.

As I have explained before, dm-rq is different from SCSI here: we no
longer need to requeue the request in dm-rq when blk_get_request()
fails in multipath_clone_and_map(), since SCHED_RESTART definitely
covers that case.

Not to mention that dm_mq_delay_requeue_request() runs the queue
explicitly when that has to be done; that doesn't need SCHED_RESTART
to cover it, right?
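
Concretely, the allocation-failure path ends up looking roughly like
this (a simplified sketch in the spirit of patch 2/5, not the exact
diff; 'q' is the chosen underlying path's queue):

	clone = blk_get_request(q, rq->cmd_flags | REQ_NOMERGE, GFP_ATOMIC);
	if (IS_ERR(clone)) {
		/* no free request on the underlying queue: the path is
		 * busy, and SCHED_RESTART will rerun the dm queue once
		 * a dm-rq request completes
		 */
		return DM_MAPIO_REQUEUE;
	}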

-- 
Ming

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE
@ 2017-09-20  1:19                     ` Ming Lei
  0 siblings, 0 replies; 69+ messages in thread
From: Ming Lei @ 2017-09-20  1:19 UTC (permalink / raw)


On Tue, Sep 19, 2017 at 04:49:15PM +0000, Bart Van Assche wrote:
> On Wed, 2017-09-20 at 00:04 +0800, Ming Lei wrote:
> > Running the queue at end_io is definitely wrong, because blk-mq
> > already has SCHED_RESTART to do that.
> 
> Sorry but I disagree. If SCHED_RESTART is set that causes the blk-mq core to
> reexamine the software queues and the hctx dispatch list but not the requeue
> list. If a block driver returns BLK_STS_RESOURCE then requests end up on the
> requeue list. Hence the following code in scsi_end_request():
> 
> 	if (scsi_target(sdev)->single_lun || !list_empty(&sdev->host->starved_list))
> 		kblockd_schedule_work(&sdev->requeue_work);
> 	else
> 		blk_mq_run_hw_queues(q, true);

Let's focus on dm-rq.

As I have explained before, dm-rq is different from SCSI here: we no
longer need to requeue the request in dm-rq when blk_get_request()
fails in multipath_clone_and_map(), since SCHED_RESTART definitely
covers that case.

Not to mention that dm_mq_delay_requeue_request() runs the queue
explicitly when that has to be done; that doesn't need SCHED_RESTART
to cover it, right?
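
Concretely, the allocation-failure path ends up looking roughly like
this (a simplified sketch in the spirit of patch 2/5, not the exact
diff; 'q' is the chosen underlying path's queue):

	clone = blk_get_request(q, rq->cmd_flags | REQ_NOMERGE, GFP_ATOMIC);
	if (IS_ERR(clone)) {
		/* no free request on the underlying queue: the path is
		 * busy, and SCHED_RESTART will rerun the dm queue once
		 * a dm-rq request completes
		 */
		return DM_MAPIO_REQUEUE;
	}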

-- 
Ming

^ permalink raw reply	[flat|nested] 69+ messages in thread

end of thread, other threads:[~2017-09-20  1:19 UTC | newest]

Thread overview: 69+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-15 16:44 [PATCH 0/5] dm-mpath: improve I/O schedule Ming Lei
2017-09-15 16:44 ` Ming Lei
2017-09-15 16:44 ` [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE Ming Lei
2017-09-15 16:44   ` Ming Lei
2017-09-15 17:57   ` Bart Van Assche
2017-09-15 17:57     ` Bart Van Assche
2017-09-15 17:57     ` Bart Van Assche
2017-09-17 12:40     ` Ming Lei
2017-09-17 12:40       ` Ming Lei
2017-09-18 15:18       ` Bart Van Assche
2017-09-18 15:18         ` Bart Van Assche
2017-09-18 15:18         ` Bart Van Assche
2017-09-19  5:43         ` Ming Lei
2017-09-19  5:43           ` Ming Lei
2017-09-19 15:36           ` Bart Van Assche
2017-09-19 15:36             ` Bart Van Assche
2017-09-19 15:36             ` Bart Van Assche
2017-09-19 15:56             ` Mike Snitzer
2017-09-19 15:56               ` Mike Snitzer
2017-09-19 16:04               ` Ming Lei
2017-09-19 16:04                 ` Ming Lei
2017-09-19 16:49                 ` Bart Van Assche
2017-09-19 16:49                   ` Bart Van Assche
2017-09-19 16:49                   ` Bart Van Assche
2017-09-19 16:55                   ` Ming Lei
2017-09-19 16:55                     ` Ming Lei
2017-09-19 18:42                     ` Bart Van Assche
2017-09-19 18:42                       ` Bart Van Assche
2017-09-19 18:42                       ` Bart Van Assche
2017-09-19 22:44                       ` Ming Lei
2017-09-19 22:44                         ` Ming Lei
2017-09-19 23:25                         ` Bart Van Assche
2017-09-19 23:25                           ` Bart Van Assche
2017-09-19 23:25                           ` Bart Van Assche
2017-09-19 23:50                           ` Mike Snitzer
2017-09-19 23:50                             ` Mike Snitzer
2017-09-20  1:13                             ` Ming Lei
2017-09-20  1:13                               ` Ming Lei
2017-09-20  1:19                   ` Ming Lei
2017-09-20  1:19                     ` Ming Lei
2017-09-19 15:48           ` Mike Snitzer
2017-09-19 15:48             ` Mike Snitzer
2017-09-19 15:52             ` Bart Van Assche
2017-09-19 15:52               ` Bart Van Assche
2017-09-19 15:52               ` Bart Van Assche
2017-09-19 16:03               ` Mike Snitzer
2017-09-19 16:03                 ` Mike Snitzer
2017-09-19 16:07             ` Ming Lei
2017-09-19 16:07               ` Ming Lei
2017-09-15 16:44 ` [PATCH 2/5] dm-mpath: return DM_MAPIO_REQUEUE in case of rq allocation failure Ming Lei
2017-09-15 17:29   ` Bart Van Assche
2017-09-15 17:29     ` Bart Van Assche
2017-09-15 20:06     ` Mike Snitzer
2017-09-15 20:48       ` Bart Van Assche
2017-09-15 20:48         ` Bart Van Assche
2017-09-17 13:23       ` Ming Lei
2017-09-19 14:41         ` Mike Snitzer
2017-09-19 15:56           ` Ming Lei
2017-09-17 12:51     ` Ming Lei
2017-09-15 16:44 ` [PATCH 3/5] dm-mpath: remove annoying message of 'blk_get_request() returned -11' Ming Lei
2017-09-15 16:44 ` [PATCH 4/5] block: export blk_update_nr_requests Ming Lei
2017-09-15 16:44 ` [PATCH 5/5] dm-mpath: improve I/O schedule Ming Lei
2017-09-15 20:10   ` Mike Snitzer
2017-09-15 20:56   ` Bart Van Assche
2017-09-15 20:56     ` Bart Van Assche
2017-09-15 21:06   ` Bart Van Assche
2017-09-15 21:06     ` Bart Van Assche
2017-09-15 21:42   ` Bart Van Assche
2017-09-15 21:42     ` Bart Van Assche
