linux-block.vger.kernel.org archive mirror
* [PATCH] block: Do not pull requests from the scheduler when we cannot dispatch them
@ 2021-05-20 11:25 Jan Kara
  2021-05-21  1:29 ` Ming Lei
  2021-06-02  9:25 ` Ming Lei
  0 siblings, 2 replies; 7+ messages in thread
From: Jan Kara @ 2021-05-20 11:25 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Paolo Valente, linux-block, Jan Kara

Provided the device driver does not implement dispatch budget accounting
(which only SCSI does), the loop in __blk_mq_do_dispatch_sched() pulls
requests from the IO scheduler for as long as the scheduler is willing
to give any out. That defeats the scheduling heuristics inside the
scheduler by creating the false impression that the device can take
more IO when it in fact cannot.

For example, with the BFQ IO scheduler on top of a virtio-blk device,
setting the blkio cgroup weight has barely any impact on the observed
throughput of async IO, because __blk_mq_do_dispatch_sched() always
sucks all the queued IO out of BFQ. BFQ first submits IO from the
higher-weight cgroups, but once that is all dispatched, it gives out
the IO of the lower-weight cgroups as well. We then have to wait for
all of this IO to be dispatched to the disk (which means a lot of it
actually has to complete) before the IO scheduler is queried again for
more requests. This completely destroys any service differentiation.

So grab the driver tag for a request already when pulling it out of
the IO scheduler in __blk_mq_do_dispatch_sched(), and stop pulling
requests once we cannot get a tag, because we are unlikely to be able
to dispatch them. That way only a single request ever waits in the
dispatch list for a tag to free up.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/blk-mq-sched.c | 12 +++++++++++-
 block/blk-mq.c       |  2 +-
 block/blk-mq.h       |  2 ++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 996a4b2f73aa..714e678f516a 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -168,9 +168,19 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		 * in blk_mq_dispatch_rq_list().
 		 */
 		list_add_tail(&rq->queuelist, &rq_list);
+		count++;
 		if (rq->mq_hctx != hctx)
 			multi_hctxs = true;
-	} while (++count < max_dispatch);
+
+		/*
+		 * If we cannot get a driver tag for the request, stop
+		 * dequeueing requests from the IO scheduler. We are unlikely
+		 * to be able to submit them anyway, and it creates a false
+		 * impression for the scheduling heuristics that the device
+		 * can take more IO.
+		 */
+		if (!blk_mq_get_driver_tag(rq))
+			break;
+	} while (count < max_dispatch);
 
 	if (!count) {
 		if (run_queue)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c86c01bfecdb..bc2cf80d2c3b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1100,7 +1100,7 @@ static bool __blk_mq_get_driver_tag(struct request *rq)
 	return true;
 }
 
-static bool blk_mq_get_driver_tag(struct request *rq)
+bool blk_mq_get_driver_tag(struct request *rq)
 {
 	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 9ce64bc4a6c8..81a775171be7 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -259,6 +259,8 @@ static inline void blk_mq_put_driver_tag(struct request *rq)
 	__blk_mq_put_driver_tag(rq->mq_hctx, rq);
 }
 
+bool blk_mq_get_driver_tag(struct request *rq);
+
 static inline void blk_mq_clear_mq_map(struct blk_mq_queue_map *qmap)
 {
 	int cpu;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] block: Do not pull requests from the scheduler when we cannot dispatch them
  2021-05-20 11:25 [PATCH] block: Do not pull requests from the scheduler when we cannot dispatch them Jan Kara
@ 2021-05-21  1:29 ` Ming Lei
  2021-05-21 11:20   ` Jan Kara
  2021-06-02  9:25 ` Ming Lei
  1 sibling, 1 reply; 7+ messages in thread
From: Ming Lei @ 2021-05-21  1:29 UTC (permalink / raw)
  To: Jan Kara; +Cc: Jens Axboe, Paolo Valente, linux-block

On Thu, May 20, 2021 at 01:25:28PM +0200, Jan Kara wrote:
> Provided the device driver does not implement dispatch budget accounting
> (which only SCSI does) the loop in __blk_mq_do_dispatch_sched() pulls
> requests from the IO scheduler as long as it is willing to give out any.
> That defeats scheduling heuristics inside the scheduler by creating
> false impression that the device can take more IO when it in fact
> cannot.

So hctx->dispatch_busy isn't set to true in this case?

> 
> For example with BFQ IO scheduler on top of virtio-blk device setting
> blkio cgroup weight has barely any impact on observed throughput of
> async IO because __blk_mq_do_dispatch_sched() always sucks out all the
> IO queued in BFQ. BFQ first submits IO from higher weight cgroups but
> when that is all dispatched, it will give out IO of lower weight cgroups
> as well. And then we have to wait for all this IO to be dispatched to
> the disk (which means lot of it actually has to complete) before the
> IO scheduler is queried again for dispatching more requests. This
> completely destroys any service differentiation.
> 
> So grab request tag for a request pulled out of the IO scheduler already
> in __blk_mq_do_dispatch_sched() and do not pull any more requests if we
> cannot get it because we are unlikely to be able to dispatch it. That
> way only single request is going to wait in the dispatch list for some
> tag to free.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  block/blk-mq-sched.c | 12 +++++++++++-
>  block/blk-mq.c       |  2 +-
>  block/blk-mq.h       |  2 ++
>  3 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 996a4b2f73aa..714e678f516a 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -168,9 +168,19 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
>  		 * in blk_mq_dispatch_rq_list().
>  		 */
>  		list_add_tail(&rq->queuelist, &rq_list);
> +		count++;
>  		if (rq->mq_hctx != hctx)
>  			multi_hctxs = true;
> -	} while (++count < max_dispatch);
> +
> +		/*
> +		 * If we cannot get tag for the request, stop dequeueing
> +		 * requests from the IO scheduler. We are unlikely to be able
> +		 * to submit them anyway and it creates false impression for
> +		 * scheduling heuristics that the device can take more IO.
> +		 */
> +		if (!blk_mq_get_driver_tag(rq))
> +			break;

By default, BFQ's queue depth is the same as virtblk_queue_depth (both
are 256), so it looks like you are using a non-default setting?

Also, when driver tags run out, hctx->dispatch_busy should have been
set to true to avoid batched dequeuing. Does the following patch make
a difference for you?


diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 045b6878b8c5..c2ce3091ad6e 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -107,6 +107,13 @@ static bool blk_mq_dispatch_hctx_list(struct list_head *rq_list)
 
 #define BLK_MQ_BUDGET_DELAY	3		/* ms units */
 
+static int blk_mq_sched_max_dispatch(struct blk_mq_hw_ctx *hctx)
+{
+	if (!hctx->dispatch_busy)
+		return hctx->queue->nr_requests;
+	return 1;
+}
+
 /*
  * Only SCSI implements .get_budget and .put_budget, and SCSI restarts
  * its queue by itself in its completion handler, so we don't need to
@@ -121,15 +128,9 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 	struct elevator_queue *e = q->elevator;
 	bool multi_hctxs = false, run_queue = false;
 	bool dispatched = false, busy = false;
-	unsigned int max_dispatch;
 	LIST_HEAD(rq_list);
 	int count = 0;
 
-	if (hctx->dispatch_busy)
-		max_dispatch = 1;
-	else
-		max_dispatch = hctx->queue->nr_requests;
-
 	do {
 		struct request *rq;
 		int budget_token;
@@ -170,7 +171,7 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		list_add_tail(&rq->queuelist, &rq_list);
 		if (rq->mq_hctx != hctx)
 			multi_hctxs = true;
-	} while (++count < max_dispatch);
+	} while (++count < blk_mq_sched_max_dispatch(hctx));
 
 	if (!count) {
 		if (run_queue)


Thanks,
Ming



* Re: [PATCH] block: Do not pull requests from the scheduler when we cannot dispatch them
  2021-05-21  1:29 ` Ming Lei
@ 2021-05-21 11:20   ` Jan Kara
  2021-05-21 11:27     ` Jan Kara
  2021-05-21 13:18     ` Ming Lei
  0 siblings, 2 replies; 7+ messages in thread
From: Jan Kara @ 2021-05-21 11:20 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jan Kara, Jens Axboe, Paolo Valente, linux-block

On Fri 21-05-21 09:29:33, Ming Lei wrote:
> On Thu, May 20, 2021 at 01:25:28PM +0200, Jan Kara wrote:
> > Provided the device driver does not implement dispatch budget accounting
> > (which only SCSI does) the loop in __blk_mq_do_dispatch_sched() pulls
> > requests from the IO scheduler as long as it is willing to give out any.
> > That defeats scheduling heuristics inside the scheduler by creating
> > false impression that the device can take more IO when it in fact
> > cannot.
> 
> So hctx->dispatch_busy isn't set as true in this case?

No. blk_mq_update_dispatch_busy() has:

        if (hctx->queue->elevator)
                return;

> > For example with BFQ IO scheduler on top of virtio-blk device setting
> > blkio cgroup weight has barely any impact on observed throughput of
> > async IO because __blk_mq_do_dispatch_sched() always sucks out all the
> > IO queued in BFQ. BFQ first submits IO from higher weight cgroups but
> > when that is all dispatched, it will give out IO of lower weight cgroups
> > as well. And then we have to wait for all this IO to be dispatched to
> > the disk (which means lot of it actually has to complete) before the
> > IO scheduler is queried again for dispatching more requests. This
> > completely destroys any service differentiation.
> > 
> > So grab request tag for a request pulled out of the IO scheduler already
> > in __blk_mq_do_dispatch_sched() and do not pull any more requests if we
> > cannot get it because we are unlikely to be able to dispatch it. That
> > way only single request is going to wait in the dispatch list for some
> > tag to free.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  block/blk-mq-sched.c | 12 +++++++++++-
> >  block/blk-mq.c       |  2 +-
> >  block/blk-mq.h       |  2 ++
> >  3 files changed, 14 insertions(+), 2 deletions(-)
> > 
> > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > index 996a4b2f73aa..714e678f516a 100644
> > --- a/block/blk-mq-sched.c
> > +++ b/block/blk-mq-sched.c
> > @@ -168,9 +168,19 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> >  		 * in blk_mq_dispatch_rq_list().
> >  		 */
> >  		list_add_tail(&rq->queuelist, &rq_list);
> > +		count++;
> >  		if (rq->mq_hctx != hctx)
> >  			multi_hctxs = true;
> > -	} while (++count < max_dispatch);
> > +
> > +		/*
> > +		 * If we cannot get tag for the request, stop dequeueing
> > +		 * requests from the IO scheduler. We are unlikely to be able
> > +		 * to submit them anyway and it creates false impression for
> > +		 * scheduling heuristics that the device can take more IO.
> > +		 */
> > +		if (!blk_mq_get_driver_tag(rq))
> > +			break;
> 
> At default BFQ's queue depth is same with virtblk_queue_depth, both are
> 256, so looks you use non-default setting?

Ah yes, I forgot to mention that. I actually had nr_requests set to 1024,
as otherwise all the requests get sucked into virtio-blk and no IO
scheduling happens at all.
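For reference, this non-default setup can be reproduced with something
like the following (a sketch: the device name, cgroup names and weights
are hypothetical, and the paths assume cgroup v2 with the io controller
enabled):

```shell
# Hypothetical device (vda) and cgroup names -- adjust for your system.
# Use BFQ with a scheduler queue larger than the device queue depth
# (virtio-blk defaults to 256 tags), so that requests actually queue
# up in BFQ instead of all being absorbed by the device.
echo bfq  > /sys/block/vda/queue/scheduler
echo 1024 > /sys/block/vda/queue/nr_requests

# cgroup v2 (assumes "io" is enabled in cgroup.subtree_control):
# give one group 8x the BFQ weight of the other.
mkdir /sys/fs/cgroup/fast /sys/fs/cgroup/slow
echo 800 > /sys/fs/cgroup/fast/io.bfq.weight
echo 100 > /sys/fs/cgroup/slow/io.bfq.weight
```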

> Also in case of running out of driver tag, hctx->dispatch_busy should have
> been set as true for avoiding batching dequeuing, does the following
> patch make a difference for you?

I'll try it after modifying blk_mq_update_dispatch_busy() so that it is
also updated when an elevator is in use...

								Honza
> 
> 
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 045b6878b8c5..c2ce3091ad6e 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -107,6 +107,13 @@ static bool blk_mq_dispatch_hctx_list(struct list_head *rq_list)
>  
>  #define BLK_MQ_BUDGET_DELAY	3		/* ms units */
>  
> +static int blk_mq_sched_max_disaptch(struct blk_mq_hw_ctx *hctx)
> +{
> +	if (!hctx->dispatch_busy)
> +		return hctx->queue->nr_requests;
> +	return 1;
> +}
> +
>  /*
>   * Only SCSI implements .get_budget and .put_budget, and SCSI restarts
>   * its queue by itself in its completion handler, so we don't need to
> @@ -121,15 +128,9 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
>  	struct elevator_queue *e = q->elevator;
>  	bool multi_hctxs = false, run_queue = false;
>  	bool dispatched = false, busy = false;
> -	unsigned int max_dispatch;
>  	LIST_HEAD(rq_list);
>  	int count = 0;
>  
> -	if (hctx->dispatch_busy)
> -		max_dispatch = 1;
> -	else
> -		max_dispatch = hctx->queue->nr_requests;
> -
>  	do {
>  		struct request *rq;
>  		int budget_token;
> @@ -170,7 +171,7 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
>  		list_add_tail(&rq->queuelist, &rq_list);
>  		if (rq->mq_hctx != hctx)
>  			multi_hctxs = true;
> -	} while (++count < max_dispatch);
> +	} while (++count < blk_mq_sched_max_disaptch(hctx));
>  
>  	if (!count) {
>  		if (run_queue)
> 
> 
> Thanks,
> Ming
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH] block: Do not pull requests from the scheduler when we cannot dispatch them
  2021-05-21 11:20   ` Jan Kara
@ 2021-05-21 11:27     ` Jan Kara
  2021-05-21 13:18     ` Ming Lei
  1 sibling, 0 replies; 7+ messages in thread
From: Jan Kara @ 2021-05-21 11:27 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jan Kara, Jens Axboe, Paolo Valente, linux-block

On Fri 21-05-21 13:20:16, Jan Kara wrote:
> On Fri 21-05-21 09:29:33, Ming Lei wrote:
> > On Thu, May 20, 2021 at 01:25:28PM +0200, Jan Kara wrote:
> > > Provided the device driver does not implement dispatch budget accounting
> > > (which only SCSI does) the loop in __blk_mq_do_dispatch_sched() pulls
> > > requests from the IO scheduler as long as it is willing to give out any.
> > > That defeats scheduling heuristics inside the scheduler by creating
> > > false impression that the device can take more IO when it in fact
> > > cannot.
> > 
> > So hctx->dispatch_busy isn't set as true in this case?
> 
> No. blk_mq_update_dispatch_busy() has:
> 
>         if (hctx->queue->elevator)
>                 return;
> 
> > > For example with BFQ IO scheduler on top of virtio-blk device setting
> > > blkio cgroup weight has barely any impact on observed throughput of
> > > async IO because __blk_mq_do_dispatch_sched() always sucks out all the
> > > IO queued in BFQ. BFQ first submits IO from higher weight cgroups but
> > > when that is all dispatched, it will give out IO of lower weight cgroups
> > > as well. And then we have to wait for all this IO to be dispatched to
> > > the disk (which means lot of it actually has to complete) before the
> > > IO scheduler is queried again for dispatching more requests. This
> > > completely destroys any service differentiation.
> > > 
> > > So grab request tag for a request pulled out of the IO scheduler already
> > > in __blk_mq_do_dispatch_sched() and do not pull any more requests if we
> > > cannot get it because we are unlikely to be able to dispatch it. That
> > > way only single request is going to wait in the dispatch list for some
> > > tag to free.
> > > 
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > ---
> > >  block/blk-mq-sched.c | 12 +++++++++++-
> > >  block/blk-mq.c       |  2 +-
> > >  block/blk-mq.h       |  2 ++
> > >  3 files changed, 14 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > > index 996a4b2f73aa..714e678f516a 100644
> > > --- a/block/blk-mq-sched.c
> > > +++ b/block/blk-mq-sched.c
> > > @@ -168,9 +168,19 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > >  		 * in blk_mq_dispatch_rq_list().
> > >  		 */
> > >  		list_add_tail(&rq->queuelist, &rq_list);
> > > +		count++;
> > >  		if (rq->mq_hctx != hctx)
> > >  			multi_hctxs = true;
> > > -	} while (++count < max_dispatch);
> > > +
> > > +		/*
> > > +		 * If we cannot get tag for the request, stop dequeueing
> > > +		 * requests from the IO scheduler. We are unlikely to be able
> > > +		 * to submit them anyway and it creates false impression for
> > > +		 * scheduling heuristics that the device can take more IO.
> > > +		 */
> > > +		if (!blk_mq_get_driver_tag(rq))
> > > +			break;
> > 
> > At default BFQ's queue depth is same with virtblk_queue_depth, both are
> > 256, so looks you use non-default setting?
> 
> Ah yes, I forgot to mention that. I actually had nr_requests set to 1024 as
> otherwise all requests get sucked into virtio-blk and no IO scheduling
> happens either.
> 
> > Also in case of running out of driver tag, hctx->dispatch_busy should have
> > been set as true for avoiding batching dequeuing, does the following
> > patch make a difference for you?
> 
> I'll try it with modifying blk_mq_update_dispatch_busy() to be updated when
> elevator is in use...

OK, tried your patch with the modification and it seems to work OK! Thanks!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH] block: Do not pull requests from the scheduler when we cannot dispatch them
  2021-05-21 11:20   ` Jan Kara
  2021-05-21 11:27     ` Jan Kara
@ 2021-05-21 13:18     ` Ming Lei
  1 sibling, 0 replies; 7+ messages in thread
From: Ming Lei @ 2021-05-21 13:18 UTC (permalink / raw)
  To: Jan Kara; +Cc: Jens Axboe, Paolo Valente, linux-block

On Fri, May 21, 2021 at 01:20:16PM +0200, Jan Kara wrote:
> On Fri 21-05-21 09:29:33, Ming Lei wrote:
> > On Thu, May 20, 2021 at 01:25:28PM +0200, Jan Kara wrote:
> > > Provided the device driver does not implement dispatch budget accounting
> > > (which only SCSI does) the loop in __blk_mq_do_dispatch_sched() pulls
> > > requests from the IO scheduler as long as it is willing to give out any.
> > > That defeats scheduling heuristics inside the scheduler by creating
> > > false impression that the device can take more IO when it in fact
> > > cannot.
> > 
> > So hctx->dispatch_busy isn't set as true in this case?
> 
> No. blk_mq_update_dispatch_busy() has:
> 
>         if (hctx->queue->elevator)
>                 return;

Oops, the above check should have been killed in commit 6e6fcbc27e77
("blk-mq: support batching dispatch in case of io"), :-(


Thanks,
Ming



* Re: [PATCH] block: Do not pull requests from the scheduler when we cannot dispatch them
  2021-05-20 11:25 [PATCH] block: Do not pull requests from the scheduler when we cannot dispatch them Jan Kara
  2021-05-21  1:29 ` Ming Lei
@ 2021-06-02  9:25 ` Ming Lei
  2021-06-03 10:45   ` Jan Kara
  1 sibling, 1 reply; 7+ messages in thread
From: Ming Lei @ 2021-06-02  9:25 UTC (permalink / raw)
  To: Jan Kara; +Cc: Jens Axboe, Paolo Valente, linux-block

On Thu, May 20, 2021 at 01:25:28PM +0200, Jan Kara wrote:
> Provided the device driver does not implement dispatch budget accounting
> (which only SCSI does) the loop in __blk_mq_do_dispatch_sched() pulls
> requests from the IO scheduler as long as it is willing to give out any.
> That defeats scheduling heuristics inside the scheduler by creating
> false impression that the device can take more IO when it in fact
> cannot.
> 
> For example with BFQ IO scheduler on top of virtio-blk device setting
> blkio cgroup weight has barely any impact on observed throughput of
> async IO because __blk_mq_do_dispatch_sched() always sucks out all the
> IO queued in BFQ. BFQ first submits IO from higher weight cgroups but
> when that is all dispatched, it will give out IO of lower weight cgroups
> as well. And then we have to wait for all this IO to be dispatched to
> the disk (which means lot of it actually has to complete) before the
> IO scheduler is queried again for dispatching more requests. This
> completely destroys any service differentiation.
> 
> So grab request tag for a request pulled out of the IO scheduler already
> in __blk_mq_do_dispatch_sched() and do not pull any more requests if we
> cannot get it because we are unlikely to be able to dispatch it. That
> way only single request is going to wait in the dispatch list for some
> tag to free.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  block/blk-mq-sched.c | 12 +++++++++++-
>  block/blk-mq.c       |  2 +-
>  block/blk-mq.h       |  2 ++
>  3 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 996a4b2f73aa..714e678f516a 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -168,9 +168,19 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
>  		 * in blk_mq_dispatch_rq_list().
>  		 */
>  		list_add_tail(&rq->queuelist, &rq_list);
> +		count++;
>  		if (rq->mq_hctx != hctx)
>  			multi_hctxs = true;
> -	} while (++count < max_dispatch);
> +
> +		/*
> +		 * If we cannot get tag for the request, stop dequeueing
> +		 * requests from the IO scheduler. We are unlikely to be able
> +		 * to submit them anyway and it creates false impression for
> +		 * scheduling heuristics that the device can take more IO.
> +		 */
> +		if (!blk_mq_get_driver_tag(rq))
> +			break;
> +	} while (count < max_dispatch);
>  
>  	if (!count) {
>  		if (run_queue)
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index c86c01bfecdb..bc2cf80d2c3b 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1100,7 +1100,7 @@ static bool __blk_mq_get_driver_tag(struct request *rq)
>  	return true;
>  }
>  
> -static bool blk_mq_get_driver_tag(struct request *rq)
> +bool blk_mq_get_driver_tag(struct request *rq)
>  {
>  	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
>  
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index 9ce64bc4a6c8..81a775171be7 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -259,6 +259,8 @@ static inline void blk_mq_put_driver_tag(struct request *rq)
>  	__blk_mq_put_driver_tag(rq->mq_hctx, rq);
>  }
>  
> +bool blk_mq_get_driver_tag(struct request *rq);
> +
>  static inline void blk_mq_clear_mq_map(struct blk_mq_queue_map *qmap)
>  {
>  	int cpu;

Thinking about it further, this patch looks fine, and it is safe to use
the driver tag allocation result to decide whether more requests need to
be dequeued, since a queue run always follows when we break out of the
loop. I can also observe that io.bfq.weight differentiation is improved
on virtio-blk, so

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming



* Re: [PATCH] block: Do not pull requests from the scheduler when we cannot dispatch them
  2021-06-02  9:25 ` Ming Lei
@ 2021-06-03 10:45   ` Jan Kara
  0 siblings, 0 replies; 7+ messages in thread
From: Jan Kara @ 2021-06-03 10:45 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jan Kara, Jens Axboe, Paolo Valente, linux-block

On Wed 02-06-21 17:25:52, Ming Lei wrote:
> On Thu, May 20, 2021 at 01:25:28PM +0200, Jan Kara wrote:
> > Provided the device driver does not implement dispatch budget accounting
> > (which only SCSI does) the loop in __blk_mq_do_dispatch_sched() pulls
> > requests from the IO scheduler as long as it is willing to give out any.
> > That defeats scheduling heuristics inside the scheduler by creating
> > false impression that the device can take more IO when it in fact
> > cannot.
> > 
> > For example with BFQ IO scheduler on top of virtio-blk device setting
> > blkio cgroup weight has barely any impact on observed throughput of
> > async IO because __blk_mq_do_dispatch_sched() always sucks out all the
> > IO queued in BFQ. BFQ first submits IO from higher weight cgroups but
> > when that is all dispatched, it will give out IO of lower weight cgroups
> > as well. And then we have to wait for all this IO to be dispatched to
> > the disk (which means lot of it actually has to complete) before the
> > IO scheduler is queried again for dispatching more requests. This
> > completely destroys any service differentiation.
> > 
> > So grab request tag for a request pulled out of the IO scheduler already
> > in __blk_mq_do_dispatch_sched() and do not pull any more requests if we
> > cannot get it because we are unlikely to be able to dispatch it. That
> > way only single request is going to wait in the dispatch list for some
> > tag to free.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  block/blk-mq-sched.c | 12 +++++++++++-
> >  block/blk-mq.c       |  2 +-
> >  block/blk-mq.h       |  2 ++
> >  3 files changed, 14 insertions(+), 2 deletions(-)
> > 
> > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > index 996a4b2f73aa..714e678f516a 100644
> > --- a/block/blk-mq-sched.c
> > +++ b/block/blk-mq-sched.c
> > @@ -168,9 +168,19 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> >  		 * in blk_mq_dispatch_rq_list().
> >  		 */
> >  		list_add_tail(&rq->queuelist, &rq_list);
> > +		count++;
> >  		if (rq->mq_hctx != hctx)
> >  			multi_hctxs = true;
> > -	} while (++count < max_dispatch);
> > +
> > +		/*
> > +		 * If we cannot get tag for the request, stop dequeueing
> > +		 * requests from the IO scheduler. We are unlikely to be able
> > +		 * to submit them anyway and it creates false impression for
> > +		 * scheduling heuristics that the device can take more IO.
> > +		 */
> > +		if (!blk_mq_get_driver_tag(rq))
> > +			break;
> > +	} while (count < max_dispatch);
> >  
> >  	if (!count) {
> >  		if (run_queue)
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index c86c01bfecdb..bc2cf80d2c3b 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -1100,7 +1100,7 @@ static bool __blk_mq_get_driver_tag(struct request *rq)
> >  	return true;
> >  }
> >  
> > -static bool blk_mq_get_driver_tag(struct request *rq)
> > +bool blk_mq_get_driver_tag(struct request *rq)
> >  {
> >  	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
> >  
> > diff --git a/block/blk-mq.h b/block/blk-mq.h
> > index 9ce64bc4a6c8..81a775171be7 100644
> > --- a/block/blk-mq.h
> > +++ b/block/blk-mq.h
> > @@ -259,6 +259,8 @@ static inline void blk_mq_put_driver_tag(struct request *rq)
> >  	__blk_mq_put_driver_tag(rq->mq_hctx, rq);
> >  }
> >  
> > +bool blk_mq_get_driver_tag(struct request *rq);
> > +
> >  static inline void blk_mq_clear_mq_map(struct blk_mq_queue_map *qmap)
> >  {
> >  	int cpu;
> 
> Thinking of further, looks this patch is fine, and it is safe to use driver tag
> allocation result to decide if more requests need to be dequeued since run queue
> always be followed when breaking from the loop. Also I can observe that
> io.bfq.weight is improved on virtio-blk, so
> 
> Reviewed-by: Ming Lei <ming.lei@redhat.com>

OK, thanks for your review!

									Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
