* [PATCH 0/2] block/SCSI MQ: two RESTART related patches @ 2017-10-17 5:04 Ming Lei 2017-10-17 5:04 ` [PATCH 1/2] SCSI: run idle hctx after delay in scsi_mq_get_budget() Ming Lei ` (2 more replies) 0 siblings, 3 replies; 7+ messages in thread From: Ming Lei @ 2017-10-17 5:04 UTC (permalink / raw) To: Jens Axboe, linux-block, Christoph Hellwig Cc: Bart Van Assche, linux-kernel, linux-scsi, Omar Sandoval, John Garry, Ming Lei Hi Jens, The 1st patch runs idle hctx after dealy in scsi_mq_get_budget(), so that we can keep same behaviour with before, and it can be thought as a fix. The 2nd patch cleans up RESTART, and removes handling for TAG_SHARED from current blk-mq's RESTART mechanism because SCSI_MQ can covers its restart by itself, so that no need to handle TAG_SHARED in blk-mq RESTART. And >20% IOPS boost is observed in my rand read test over scsi_debug. John, please test this two patches and see if it may improve your SAS IO performance, and you can find the two patches in the following branch: https://github.com/ming1/linux/commits/blk_mq_improve_restart_V1 Ming Lei (2): SCSI: run idle hctx after delay in scsi_mq_get_budget() blk-mq: don't handle TAG_SHARED in restart block/blk-mq-sched.c | 78 +++---------------------------------------------- drivers/scsi/scsi_lib.c | 13 +++++++-- 2 files changed, 14 insertions(+), 77 deletions(-) -- 2.9.5 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] SCSI: run idle hctx after delay in scsi_mq_get_budget() 2017-10-17 5:04 [PATCH 0/2] block/SCSI MQ: two RESTART related patches Ming Lei @ 2017-10-17 5:04 ` Ming Lei 2017-10-17 5:04 ` [PATCH 2/2] blk-mq: don't handle TAG_SHARED in restart Ming Lei 2017-10-17 5:12 ` [PATCH 0/2] block/SCSI MQ: two RESTART related patches Ming Lei 2 siblings, 0 replies; 7+ messages in thread From: Ming Lei @ 2017-10-17 5:04 UTC (permalink / raw) To: Jens Axboe, linux-block, Christoph Hellwig Cc: Bart Van Assche, linux-kernel, linux-scsi, Omar Sandoval, John Garry, Ming Lei If there isn't any outstanding request in this queue, both blk-mq's RESTART and SCSI's builtin RESTART can't work, so we have to deal with this case by running this queue after delay. Fixes: d04b6d97d0a1(scsi: implement .get_budget and .put_budget for blk-mq) Signed-off-by: Ming Lei <ming.lei@redhat.com> --- drivers/scsi/scsi_lib.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 6f10afaca25b..08c495364b28 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1959,6 +1959,14 @@ static void scsi_mq_put_budget(struct blk_mq_hw_ctx *hctx) put_device(&sdev->sdev_gendev); } +static void scsi_mq_run_idle_hctx(struct scsi_device *sdev, + struct blk_mq_hw_ctx *hctx) +{ + if (atomic_read(&sdev->device_busy) == 0 && + !scsi_device_blocked(sdev)) + blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY); +} + static blk_status_t scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx) { struct request_queue *q = hctx->queue; @@ -1989,6 +1997,7 @@ static blk_status_t scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx) out_put_device: put_device(&sdev->sdev_gendev); out: + scsi_mq_run_idle_hctx(sdev, hctx); return BLK_STS_RESOURCE; } @@ -2039,9 +2048,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, case BLK_STS_OK: break; case BLK_STS_RESOURCE: - if (atomic_read(&sdev->device_busy) == 0 && - !scsi_device_blocked(sdev)) - blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY); + scsi_mq_run_idle_hctx(sdev, hctx); break; default: /* -- 2.9.5 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/2] blk-mq: don't handle TAG_SHARED in restart 2017-10-17 5:04 [PATCH 0/2] block/SCSI MQ: two RESTART related patches Ming Lei 2017-10-17 5:04 ` [PATCH 1/2] SCSI: run idle hctx after delay in scsi_mq_get_budget() Ming Lei @ 2017-10-17 5:04 ` Ming Lei 2017-10-17 5:12 ` [PATCH 0/2] block/SCSI MQ: two RESTART related patches Ming Lei 2 siblings, 0 replies; 7+ messages in thread From: Ming Lei @ 2017-10-17 5:04 UTC (permalink / raw) To: Jens Axboe, linux-block, Christoph Hellwig Cc: Bart Van Assche, linux-kernel, linux-scsi, Omar Sandoval, John Garry, Ming Lei Now restart is used in the following cases, and TAG_SHARED is for SCSI only. 1) .get_budget() returns BLK_STS_RESOURCE - if resource in target/host level isn't satistifed, this SCSI device will be added in shost->starved_list, and the whole queue will be rerun (via SCSI's built-in RESTART) in scsi_end_request() after any request initiated from this host/targe is completed. Forget to mention, host level resource is always respected by blk-mq before running .queue_rq(). - the same is true if resource in the queue level isn't satisfied. - if there isn't outstanding request on this queue, then SCSI's RESTART can't work(blk-mq's can't work too), and the queue will be run after SCSI_QUEUE_DELAY, and finally all starved sdevs will be handled by SCSI's RESTART when this request is finished 2) scsi_dispatch_cmd() returns BLK_STS_RESOURCE - if there isn't onprogressing request on this queue, the queue will be run after SCSI_QUEUE_DELAY - otherwise, SCSI's RESTART covers the rerun. 3) blk_mq_get_driver_tag() failed - BLK_MQ_S_TAG_WAITING covers the cross-queue RESTART for driver allocation. In one word, SCSI's built-in RESTART is enough to cover itself. So we don't need to pay special attention to TAG_SHARED wrt. restart. Signed-off-by: Ming Lei <ming.lei@redhat.com> --- block/blk-mq-sched.c | 78 +++------------------------------------------------- 1 file changed, 4 insertions(+), 74 deletions(-) diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index df8581bb0a37..daab27feb653 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -68,25 +68,17 @@ static void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx) set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state); } -static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx) +void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx) { if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) - return false; - - if (hctx->flags & BLK_MQ_F_TAG_SHARED) { - struct request_queue *q = hctx->queue; + return; - if (test_and_clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) - atomic_dec(&q->shared_hctx_restart); - } else - clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state); + clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state); if (blk_mq_hctx_has_pending(hctx)) { blk_mq_run_hw_queue(hctx, true); - return true; + return; } - - return false; } /* return true if hctx need to run again */ @@ -385,68 +377,6 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx, return true; } -/** - * list_for_each_entry_rcu_rr - iterate in a round-robin fashion over rcu list - * @pos: loop cursor. - * @skip: the list element that will not be examined. Iteration starts at - * @skip->next. - * @head: head of the list to examine. This list must have at least one - * element, namely @skip. - * @member: name of the list_head structure within typeof(*pos). - */ -#define list_for_each_entry_rcu_rr(pos, skip, head, member) \ - for ((pos) = (skip); \ - (pos = (pos)->member.next != (head) ? list_entry_rcu( \ - (pos)->member.next, typeof(*pos), member) : \ - list_entry_rcu((pos)->member.next->next, typeof(*pos), member)), \ - (pos) != (skip); ) - -/* - * Called after a driver tag has been freed to check whether a hctx needs to - * be restarted. Restarts @hctx if its tag set is not shared. Restarts hardware - * queues in a round-robin fashion if the tag set of @hctx is shared with other - * hardware queues. - */ -void blk_mq_sched_restart(struct blk_mq_hw_ctx *const hctx) -{ - struct blk_mq_tags *const tags = hctx->tags; - struct blk_mq_tag_set *const set = hctx->queue->tag_set; - struct request_queue *const queue = hctx->queue, *q; - struct blk_mq_hw_ctx *hctx2; - unsigned int i, j; - - if (set->flags & BLK_MQ_F_TAG_SHARED) { - /* - * If this is 0, then we know that no hardware queues - * have RESTART marked. We're done. - */ - if (!atomic_read(&queue->shared_hctx_restart)) - return; - - rcu_read_lock(); - list_for_each_entry_rcu_rr(q, queue, &set->tag_list, - tag_set_list) { - queue_for_each_hw_ctx(q, hctx2, i) - if (hctx2->tags == tags && - blk_mq_sched_restart_hctx(hctx2)) - goto done; - } - j = hctx->queue_num + 1; - for (i = 0; i < queue->nr_hw_queues; i++, j++) { - if (j == queue->nr_hw_queues) - j = 0; - hctx2 = queue->queue_hw_ctx[j]; - if (hctx2->tags == tags && - blk_mq_sched_restart_hctx(hctx2)) - break; - } -done: - rcu_read_unlock(); - } else { - blk_mq_sched_restart_hctx(hctx); - } -} - /* * Add flush/fua to the queue. If we fail getting a driver tag, then * punt to the requeue list. Requeue will re-invoke us from a context -- 2.9.5 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 0/2] block/SCSI MQ: two RESTART related patches 2017-10-17 5:04 [PATCH 0/2] block/SCSI MQ: two RESTART related patches Ming Lei 2017-10-17 5:04 ` [PATCH 1/2] SCSI: run idle hctx after delay in scsi_mq_get_budget() Ming Lei 2017-10-17 5:04 ` [PATCH 2/2] blk-mq: don't handle TAG_SHARED in restart Ming Lei @ 2017-10-17 5:12 ` Ming Lei 2017-10-17 15:47 ` John Garry 2 siblings, 1 reply; 7+ messages in thread From: Ming Lei @ 2017-10-17 5:12 UTC (permalink / raw) To: Jens Axboe, linux-block, Christoph Hellwig Cc: Bart Van Assche, linux-kernel, linux-scsi, Omar Sandoval, John Garry On Tue, Oct 17, 2017 at 01:04:16PM +0800, Ming Lei wrote: > Hi Jens, > > The 1st patch runs idle hctx after dealy in scsi_mq_get_budget(), > so that we can keep same behaviour with before, and it can be > thought as a fix. > > The 2nd patch cleans up RESTART, and removes handling for TAG_SHARED > from current blk-mq's RESTART mechanism because SCSI_MQ can covers its > restart by itself, so that no need to handle TAG_SHARED in blk-mq > RESTART. And >20% IOPS boost is observed in my rand read test over > scsi_debug. > > John, please test this two patches and see if it may improve your SAS > IO performance, and you can find the two patches in the following branch: > > https://github.com/ming1/linux/commits/blk_mq_improve_restart_V1 Forget to mention, you need to either pull the above branch directly or apply the two patches against for-next branch of Jens' block tree: git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git #for-next -- Ming ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/2] block/SCSI MQ: two RESTART related patches 2017-10-17 5:12 ` [PATCH 0/2] block/SCSI MQ: two RESTART related patches Ming Lei @ 2017-10-17 15:47 ` John Garry 0 siblings, 0 replies; 7+ messages in thread From: John Garry @ 2017-10-17 15:47 UTC (permalink / raw) To: Ming Lei, Jens Axboe, linux-block, Christoph Hellwig Cc: Bart Van Assche, linux-kernel, linux-scsi, Omar Sandoval, Linuxarm On 17/10/2017 06:12, Ming Lei wrote: > On Tue, Oct 17, 2017 at 01:04:16PM +0800, Ming Lei wrote: >> Hi Jens, >> >> The 1st patch runs idle hctx after dealy in scsi_mq_get_budget(), >> so that we can keep same behaviour with before, and it can be >> thought as a fix. >> >> The 2nd patch cleans up RESTART, and removes handling for TAG_SHARED >> from current blk-mq's RESTART mechanism because SCSI_MQ can covers its >> restart by itself, so that no need to handle TAG_SHARED in blk-mq >> RESTART. And >20% IOPS boost is observed in my rand read test over >> scsi_debug. >> >> John, please test this two patches and see if it may improve your SAS >> IO performance, and you can find the two patches in the following branch: >> >> https://github.com/ming1/linux/commits/blk_mq_improve_restart_V1 > Hi Ming, As requested, here's my figures for blk_mq_improve_restart_V1: without default SCSI_MQ, deadline scheduler, CONFIG PREEMPT off read, rw, write IOPS 989K, 113K/113K, 835K with default SCSI_MQ, mq-deadline scheduler, CONFIG PREEMPT off read, rw, write IOPS 738K, 130K/130K, 686K For axboe for-next tip (21ed538): without default SCSI_MQ, deadline scheduler, CONFIG PREEMPT off read, rw, write IOPS 977K, 117K/117K, 821K with default SCSI_MQ, mq-deadline scheduler, CONFIG PREEMPT off read, rw, write IOPS 733K, 128K/128K, 676K All cases do not have LLDD mq exposed/enabled. So unfortunately not much difference with your branch. cheers, John > Forget to mention, you need to either pull the above branch directly > or apply the two patches against for-next branch of Jens' block > tree: > > git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git #for-next > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/2] block/SCSI MQ: two RESTART related patches @ 2017-10-17 15:47 ` John Garry 0 siblings, 0 replies; 7+ messages in thread From: John Garry @ 2017-10-17 15:47 UTC (permalink / raw) To: Ming Lei, Jens Axboe, linux-block, Christoph Hellwig Cc: Bart Van Assche, linux-kernel, linux-scsi, Omar Sandoval, Linuxarm On 17/10/2017 06:12, Ming Lei wrote: > On Tue, Oct 17, 2017 at 01:04:16PM +0800, Ming Lei wrote: >> Hi Jens, >> >> The 1st patch runs idle hctx after dealy in scsi_mq_get_budget(), >> so that we can keep same behaviour with before, and it can be >> thought as a fix. >> >> The 2nd patch cleans up RESTART, and removes handling for TAG_SHARED >> from current blk-mq's RESTART mechanism because SCSI_MQ can covers its >> restart by itself, so that no need to handle TAG_SHARED in blk-mq >> RESTART. And >20% IOPS boost is observed in my rand read test over >> scsi_debug. >> >> John, please test this two patches and see if it may improve your SAS >> IO performance, and you can find the two patches in the following branch: >> >> https://github.com/ming1/linux/commits/blk_mq_improve_restart_V1 > Hi Ming, As requested, here's my figures for blk_mq_improve_restart_V1: without default SCSI_MQ, deadline scheduler, CONFIG PREEMPT off read, rw, write IOPS 989K, 113K/113K, 835K with default SCSI_MQ, mq-deadline scheduler, CONFIG PREEMPT off read, rw, write IOPS 738K, 130K/130K, 686K For axboe for-next tip (21ed538): without default SCSI_MQ, deadline scheduler, CONFIG PREEMPT off read, rw, write IOPS 977K, 117K/117K, 821K with default SCSI_MQ, mq-deadline scheduler, CONFIG PREEMPT off read, rw, write IOPS 733K, 128K/128K, 676K All cases do not have LLDD mq exposed/enabled. So unfortunately not much difference with your branch. cheers, John > Forget to mention, you need to either pull the above branch directly > or apply the two patches against for-next branch of Jens' block > tree: > > git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git #for-next > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/2] block/SCSI MQ: two RESTART related patches 2017-10-17 15:47 ` John Garry (?) @ 2017-10-18 1:30 ` Ming Lei -1 siblings, 0 replies; 7+ messages in thread From: Ming Lei @ 2017-10-18 1:30 UTC (permalink / raw) To: John Garry Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche, linux-kernel, linux-scsi, Omar Sandoval, Linuxarm On Tue, Oct 17, 2017 at 04:47:23PM +0100, John Garry wrote: > On 17/10/2017 06:12, Ming Lei wrote: > > On Tue, Oct 17, 2017 at 01:04:16PM +0800, Ming Lei wrote: > > > Hi Jens, > > > > > > The 1st patch runs idle hctx after dealy in scsi_mq_get_budget(), > > > so that we can keep same behaviour with before, and it can be > > > thought as a fix. > > > > > > The 2nd patch cleans up RESTART, and removes handling for TAG_SHARED > > > from current blk-mq's RESTART mechanism because SCSI_MQ can covers its > > > restart by itself, so that no need to handle TAG_SHARED in blk-mq > > > RESTART. And >20% IOPS boost is observed in my rand read test over > > > scsi_debug. > > > > > > John, please test this two patches and see if it may improve your SAS > > > IO performance, and you can find the two patches in the following branch: > > > > > > https://github.com/ming1/linux/commits/blk_mq_improve_restart_V1 > > > > Hi Ming, > > As requested, here's my figures for blk_mq_improve_restart_V1: > without default SCSI_MQ, deadline scheduler, CONFIG PREEMPT off > read, rw, write IOPS > 989K, 113K/113K, 835K > > with default SCSI_MQ, mq-deadline scheduler, CONFIG PREEMPT off > read, rw, write IOPS > 738K, 130K/130K, 686K > > For axboe for-next tip (21ed538): > without default SCSI_MQ, deadline scheduler, CONFIG PREEMPT off > read, rw, write IOPS > 977K, 117K/117K, 821K > > with default SCSI_MQ, mq-deadline scheduler, CONFIG PREEMPT off > read, rw, write IOPS > 733K, 128K/128K, 676K > > All cases do not have LLDD mq exposed/enabled. So unfortunately not much > difference with your branch. That looks a bit strange, this patchset should cut half of blk_mq_run_hw_queues() since SCSI-MQ does it by itself in scsi_end_request(). So looks your issue is related with neither IO merge nor RESTART. just wondering what the possible reason is? But this patchset is still correct thing to do, even though your issue can't be addressed. Thanks, Ming ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2017-10-18 1:30 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-10-17 5:04 [PATCH 0/2] block/SCSI MQ: two RESTART related patches Ming Lei 2017-10-17 5:04 ` [PATCH 1/2] SCSI: run idle hctx after delay in scsi_mq_get_budget() Ming Lei 2017-10-17 5:04 ` [PATCH 2/2] blk-mq: don't handle TAG_SHARED in restart Ming Lei 2017-10-17 5:12 ` [PATCH 0/2] block/SCSI MQ: two RESTART related patches Ming Lei 2017-10-17 15:47 ` John Garry 2017-10-17 15:47 ` John Garry 2017-10-18 1:30 ` Ming Lei
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.