All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] block/SCSI MQ: two RESTART related patches
@ 2017-10-17  5:04 Ming Lei
  2017-10-17  5:04 ` [PATCH 1/2] SCSI: run idle hctx after delay in scsi_mq_get_budget() Ming Lei
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Ming Lei @ 2017-10-17  5:04 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, linux-kernel, linux-scsi, Omar Sandoval,
	John Garry, Ming Lei

Hi Jens,

The 1st patch runs idle hctx after dealy in scsi_mq_get_budget(),
so that we can keep same behaviour with before, and it can be
thought as a fix.

The 2nd patch cleans up RESTART, and removes handling for TAG_SHARED
from current blk-mq's RESTART mechanism because SCSI_MQ can covers its
restart by itself, so that no need to handle TAG_SHARED in blk-mq
RESTART. And >20% IOPS boost is observed in my rand read test over
scsi_debug.

John, please test this two patches and see if it may improve your SAS
IO performance, and you can find the two patches in the following branch:

	https://github.com/ming1/linux/commits/blk_mq_improve_restart_V1


Ming Lei (2):
  SCSI: run idle hctx after delay in scsi_mq_get_budget()
  blk-mq: don't handle TAG_SHARED in restart

 block/blk-mq-sched.c    | 78 +++----------------------------------------------
 drivers/scsi/scsi_lib.c | 13 +++++++--
 2 files changed, 14 insertions(+), 77 deletions(-)

-- 
2.9.5

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] SCSI: run idle hctx after delay in scsi_mq_get_budget()
  2017-10-17  5:04 [PATCH 0/2] block/SCSI MQ: two RESTART related patches Ming Lei
@ 2017-10-17  5:04 ` Ming Lei
  2017-10-17  5:04 ` [PATCH 2/2] blk-mq: don't handle TAG_SHARED in restart Ming Lei
  2017-10-17  5:12 ` [PATCH 0/2] block/SCSI MQ: two RESTART related patches Ming Lei
  2 siblings, 0 replies; 7+ messages in thread
From: Ming Lei @ 2017-10-17  5:04 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, linux-kernel, linux-scsi, Omar Sandoval,
	John Garry, Ming Lei

If there isn't any outstanding request in this queue, both
blk-mq's RESTART and SCSI's builtin RESTART can't work,
so we have to deal with this case by running this queue
after delay.

Fixes: d04b6d97d0a1(scsi: implement .get_budget and .put_budget for blk-mq)
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_lib.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 6f10afaca25b..08c495364b28 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1959,6 +1959,14 @@ static void scsi_mq_put_budget(struct blk_mq_hw_ctx *hctx)
 	put_device(&sdev->sdev_gendev);
 }
 
+static void scsi_mq_run_idle_hctx(struct scsi_device *sdev,
+		struct blk_mq_hw_ctx *hctx)
+{
+	if (atomic_read(&sdev->device_busy) == 0 &&
+	    !scsi_device_blocked(sdev))
+		blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
+}
+
 static blk_status_t scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
@@ -1989,6 +1997,7 @@ static blk_status_t scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
 out_put_device:
 	put_device(&sdev->sdev_gendev);
 out:
+	scsi_mq_run_idle_hctx(sdev, hctx);
 	return BLK_STS_RESOURCE;
 }
 
@@ -2039,9 +2048,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 	case BLK_STS_OK:
 		break;
 	case BLK_STS_RESOURCE:
-		if (atomic_read(&sdev->device_busy) == 0 &&
-		    !scsi_device_blocked(sdev))
-			blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
+		scsi_mq_run_idle_hctx(sdev, hctx);
 		break;
 	default:
 		/*
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] blk-mq: don't handle TAG_SHARED in restart
  2017-10-17  5:04 [PATCH 0/2] block/SCSI MQ: two RESTART related patches Ming Lei
  2017-10-17  5:04 ` [PATCH 1/2] SCSI: run idle hctx after delay in scsi_mq_get_budget() Ming Lei
@ 2017-10-17  5:04 ` Ming Lei
  2017-10-17  5:12 ` [PATCH 0/2] block/SCSI MQ: two RESTART related patches Ming Lei
  2 siblings, 0 replies; 7+ messages in thread
From: Ming Lei @ 2017-10-17  5:04 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, linux-kernel, linux-scsi, Omar Sandoval,
	John Garry, Ming Lei

Now restart is used in the following cases, and TAG_SHARED is for
SCSI only.

1) .get_budget() returns BLK_STS_RESOURCE
- if resource in target/host level isn't satistifed, this SCSI device
will be added in shost->starved_list, and the whole queue will be rerun
(via SCSI's built-in RESTART) in scsi_end_request() after any request
initiated from this host/targe is completed. Forget to mention, host level
resource is always respected by blk-mq before running .queue_rq().

- the same is true if resource in the queue level isn't satisfied.

- if there isn't outstanding request on this queue, then SCSI's RESTART
can't work(blk-mq's can't work too), and the queue will be run after
SCSI_QUEUE_DELAY, and finally all starved sdevs will be handled by SCSI's
RESTART when this request is finished

2) scsi_dispatch_cmd() returns BLK_STS_RESOURCE
- if there isn't onprogressing request on this queue, the queue
will be run after SCSI_QUEUE_DELAY

- otherwise, SCSI's RESTART covers the rerun.

3) blk_mq_get_driver_tag() failed
- BLK_MQ_S_TAG_WAITING covers the cross-queue RESTART for driver
allocation.

In one word, SCSI's built-in RESTART is enough to cover itself.
So we don't need to pay special attention to TAG_SHARED wrt. restart.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c | 78 +++-------------------------------------------------
 1 file changed, 4 insertions(+), 74 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index df8581bb0a37..daab27feb653 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -68,25 +68,17 @@ static void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx)
 		set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 }
 
-static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
+void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
 {
 	if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
-		return false;
-
-	if (hctx->flags & BLK_MQ_F_TAG_SHARED) {
-		struct request_queue *q = hctx->queue;
+		return;
 
-		if (test_and_clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
-			atomic_dec(&q->shared_hctx_restart);
-	} else
-		clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
+	clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 
 	if (blk_mq_hctx_has_pending(hctx)) {
 		blk_mq_run_hw_queue(hctx, true);
-		return true;
+		return;
 	}
-
-	return false;
 }
 
 /* return true if hctx need to run again */
@@ -385,68 +377,6 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
 	return true;
 }
 
-/**
- * list_for_each_entry_rcu_rr - iterate in a round-robin fashion over rcu list
- * @pos:    loop cursor.
- * @skip:   the list element that will not be examined. Iteration starts at
- *          @skip->next.
- * @head:   head of the list to examine. This list must have at least one
- *          element, namely @skip.
- * @member: name of the list_head structure within typeof(*pos).
- */
-#define list_for_each_entry_rcu_rr(pos, skip, head, member)		\
-	for ((pos) = (skip);						\
-	     (pos = (pos)->member.next != (head) ? list_entry_rcu(	\
-			(pos)->member.next, typeof(*pos), member) :	\
-	      list_entry_rcu((pos)->member.next->next, typeof(*pos), member)), \
-	     (pos) != (skip); )
-
-/*
- * Called after a driver tag has been freed to check whether a hctx needs to
- * be restarted. Restarts @hctx if its tag set is not shared. Restarts hardware
- * queues in a round-robin fashion if the tag set of @hctx is shared with other
- * hardware queues.
- */
-void blk_mq_sched_restart(struct blk_mq_hw_ctx *const hctx)
-{
-	struct blk_mq_tags *const tags = hctx->tags;
-	struct blk_mq_tag_set *const set = hctx->queue->tag_set;
-	struct request_queue *const queue = hctx->queue, *q;
-	struct blk_mq_hw_ctx *hctx2;
-	unsigned int i, j;
-
-	if (set->flags & BLK_MQ_F_TAG_SHARED) {
-		/*
-		 * If this is 0, then we know that no hardware queues
-		 * have RESTART marked. We're done.
-		 */
-		if (!atomic_read(&queue->shared_hctx_restart))
-			return;
-
-		rcu_read_lock();
-		list_for_each_entry_rcu_rr(q, queue, &set->tag_list,
-					   tag_set_list) {
-			queue_for_each_hw_ctx(q, hctx2, i)
-				if (hctx2->tags == tags &&
-				    blk_mq_sched_restart_hctx(hctx2))
-					goto done;
-		}
-		j = hctx->queue_num + 1;
-		for (i = 0; i < queue->nr_hw_queues; i++, j++) {
-			if (j == queue->nr_hw_queues)
-				j = 0;
-			hctx2 = queue->queue_hw_ctx[j];
-			if (hctx2->tags == tags &&
-			    blk_mq_sched_restart_hctx(hctx2))
-				break;
-		}
-done:
-		rcu_read_unlock();
-	} else {
-		blk_mq_sched_restart_hctx(hctx);
-	}
-}
-
 /*
  * Add flush/fua to the queue. If we fail getting a driver tag, then
  * punt to the requeue list. Requeue will re-invoke us from a context
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/2] block/SCSI MQ: two RESTART related patches
  2017-10-17  5:04 [PATCH 0/2] block/SCSI MQ: two RESTART related patches Ming Lei
  2017-10-17  5:04 ` [PATCH 1/2] SCSI: run idle hctx after delay in scsi_mq_get_budget() Ming Lei
  2017-10-17  5:04 ` [PATCH 2/2] blk-mq: don't handle TAG_SHARED in restart Ming Lei
@ 2017-10-17  5:12 ` Ming Lei
  2017-10-17 15:47     ` John Garry
  2 siblings, 1 reply; 7+ messages in thread
From: Ming Lei @ 2017-10-17  5:12 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, linux-kernel, linux-scsi, Omar Sandoval, John Garry

On Tue, Oct 17, 2017 at 01:04:16PM +0800, Ming Lei wrote:
> Hi Jens,
> 
> The 1st patch runs idle hctx after dealy in scsi_mq_get_budget(),
> so that we can keep same behaviour with before, and it can be
> thought as a fix.
> 
> The 2nd patch cleans up RESTART, and removes handling for TAG_SHARED
> from current blk-mq's RESTART mechanism because SCSI_MQ can covers its
> restart by itself, so that no need to handle TAG_SHARED in blk-mq
> RESTART. And >20% IOPS boost is observed in my rand read test over
> scsi_debug.
> 
> John, please test this two patches and see if it may improve your SAS
> IO performance, and you can find the two patches in the following branch:
> 
> 	https://github.com/ming1/linux/commits/blk_mq_improve_restart_V1

Forget to mention, you need to either pull the above branch directly
or apply the two patches against for-next branch of Jens' block
tree:

	git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git #for-next

-- 
Ming

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/2] block/SCSI MQ: two RESTART related patches
  2017-10-17  5:12 ` [PATCH 0/2] block/SCSI MQ: two RESTART related patches Ming Lei
@ 2017-10-17 15:47     ` John Garry
  0 siblings, 0 replies; 7+ messages in thread
From: John Garry @ 2017-10-17 15:47 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, linux-kernel, linux-scsi, Omar Sandoval, Linuxarm

On 17/10/2017 06:12, Ming Lei wrote:
> On Tue, Oct 17, 2017 at 01:04:16PM +0800, Ming Lei wrote:
>> Hi Jens,
>>
>> The 1st patch runs idle hctx after dealy in scsi_mq_get_budget(),
>> so that we can keep same behaviour with before, and it can be
>> thought as a fix.
>>
>> The 2nd patch cleans up RESTART, and removes handling for TAG_SHARED
>> from current blk-mq's RESTART mechanism because SCSI_MQ can covers its
>> restart by itself, so that no need to handle TAG_SHARED in blk-mq
>> RESTART. And >20% IOPS boost is observed in my rand read test over
>> scsi_debug.
>>
>> John, please test this two patches and see if it may improve your SAS
>> IO performance, and you can find the two patches in the following branch:
>>
>> 	https://github.com/ming1/linux/commits/blk_mq_improve_restart_V1
>

Hi Ming,

As requested, here's my figures for blk_mq_improve_restart_V1:
without default SCSI_MQ, deadline scheduler, CONFIG PREEMPT off
read, rw, write IOPS	
989K, 113K/113K, 835K

with default SCSI_MQ, mq-deadline scheduler, CONFIG PREEMPT off
read, rw, write IOPS	
738K, 130K/130K, 686K

For axboe for-next tip (21ed538):
without default SCSI_MQ, deadline scheduler, CONFIG PREEMPT off
read, rw, write IOPS	
977K, 117K/117K, 821K

with default SCSI_MQ, mq-deadline scheduler, CONFIG PREEMPT off
read, rw, write IOPS	
733K, 128K/128K, 676K

All cases do not have LLDD mq exposed/enabled. So unfortunately not much 
difference with your branch.

cheers,
John

> Forget to mention, you need to either pull the above branch directly
> or apply the two patches against for-next branch of Jens' block
> tree:
>
> 	git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git #for-next
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/2] block/SCSI MQ: two RESTART related patches
@ 2017-10-17 15:47     ` John Garry
  0 siblings, 0 replies; 7+ messages in thread
From: John Garry @ 2017-10-17 15:47 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, linux-kernel, linux-scsi, Omar Sandoval, Linuxarm

On 17/10/2017 06:12, Ming Lei wrote:
> On Tue, Oct 17, 2017 at 01:04:16PM +0800, Ming Lei wrote:
>> Hi Jens,
>>
>> The 1st patch runs idle hctx after dealy in scsi_mq_get_budget(),
>> so that we can keep same behaviour with before, and it can be
>> thought as a fix.
>>
>> The 2nd patch cleans up RESTART, and removes handling for TAG_SHARED
>> from current blk-mq's RESTART mechanism because SCSI_MQ can covers its
>> restart by itself, so that no need to handle TAG_SHARED in blk-mq
>> RESTART. And >20% IOPS boost is observed in my rand read test over
>> scsi_debug.
>>
>> John, please test this two patches and see if it may improve your SAS
>> IO performance, and you can find the two patches in the following branch:
>>
>> 	https://github.com/ming1/linux/commits/blk_mq_improve_restart_V1
>

Hi Ming,

As requested, here's my figures for blk_mq_improve_restart_V1:
without default SCSI_MQ, deadline scheduler, CONFIG PREEMPT off
read, rw, write IOPS	
989K, 113K/113K, 835K

with default SCSI_MQ, mq-deadline scheduler, CONFIG PREEMPT off
read, rw, write IOPS	
738K, 130K/130K, 686K

For axboe for-next tip (21ed538):
without default SCSI_MQ, deadline scheduler, CONFIG PREEMPT off
read, rw, write IOPS	
977K, 117K/117K, 821K

with default SCSI_MQ, mq-deadline scheduler, CONFIG PREEMPT off
read, rw, write IOPS	
733K, 128K/128K, 676K

All cases do not have LLDD mq exposed/enabled. So unfortunately not much 
difference with your branch.

cheers,
John

> Forget to mention, you need to either pull the above branch directly
> or apply the two patches against for-next branch of Jens' block
> tree:
>
> 	git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git #for-next
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/2] block/SCSI MQ: two RESTART related patches
  2017-10-17 15:47     ` John Garry
  (?)
@ 2017-10-18  1:30     ` Ming Lei
  -1 siblings, 0 replies; 7+ messages in thread
From: Ming Lei @ 2017-10-18  1:30 UTC (permalink / raw)
  To: John Garry
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-kernel, linux-scsi, Omar Sandoval, Linuxarm

On Tue, Oct 17, 2017 at 04:47:23PM +0100, John Garry wrote:
> On 17/10/2017 06:12, Ming Lei wrote:
> > On Tue, Oct 17, 2017 at 01:04:16PM +0800, Ming Lei wrote:
> > > Hi Jens,
> > > 
> > > The 1st patch runs idle hctx after dealy in scsi_mq_get_budget(),
> > > so that we can keep same behaviour with before, and it can be
> > > thought as a fix.
> > > 
> > > The 2nd patch cleans up RESTART, and removes handling for TAG_SHARED
> > > from current blk-mq's RESTART mechanism because SCSI_MQ can covers its
> > > restart by itself, so that no need to handle TAG_SHARED in blk-mq
> > > RESTART. And >20% IOPS boost is observed in my rand read test over
> > > scsi_debug.
> > > 
> > > John, please test this two patches and see if it may improve your SAS
> > > IO performance, and you can find the two patches in the following branch:
> > > 
> > > 	https://github.com/ming1/linux/commits/blk_mq_improve_restart_V1
> > 
> 
> Hi Ming,
> 
> As requested, here's my figures for blk_mq_improve_restart_V1:
> without default SCSI_MQ, deadline scheduler, CONFIG PREEMPT off
> read, rw, write IOPS	
> 989K, 113K/113K, 835K
> 
> with default SCSI_MQ, mq-deadline scheduler, CONFIG PREEMPT off
> read, rw, write IOPS	
> 738K, 130K/130K, 686K
> 
> For axboe for-next tip (21ed538):
> without default SCSI_MQ, deadline scheduler, CONFIG PREEMPT off
> read, rw, write IOPS	
> 977K, 117K/117K, 821K
> 
> with default SCSI_MQ, mq-deadline scheduler, CONFIG PREEMPT off
> read, rw, write IOPS	
> 733K, 128K/128K, 676K
> 
> All cases do not have LLDD mq exposed/enabled. So unfortunately not much
> difference with your branch.

That looks a bit strange, this patchset should cut half of
blk_mq_run_hw_queues() since SCSI-MQ does it by itself in
scsi_end_request().

So looks your issue is related with neither IO merge nor RESTART.
just wondering what the possible reason is?

But this patchset is still correct thing to do, even though your
issue can't be addressed.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-10-18  1:30 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-17  5:04 [PATCH 0/2] block/SCSI MQ: two RESTART related patches Ming Lei
2017-10-17  5:04 ` [PATCH 1/2] SCSI: run idle hctx after delay in scsi_mq_get_budget() Ming Lei
2017-10-17  5:04 ` [PATCH 2/2] blk-mq: don't handle TAG_SHARED in restart Ming Lei
2017-10-17  5:12 ` [PATCH 0/2] block/SCSI MQ: two RESTART related patches Ming Lei
2017-10-17 15:47   ` John Garry
2017-10-17 15:47     ` John Garry
2017-10-18  1:30     ` Ming Lei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.