linux-kernel.vger.kernel.org archive mirror
From: Shaohua Li <shli@fb.com>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: <linux-kernel@vger.kernel.org>, <axboe@fb.com>, <hch@lst.de>,
	<neilb@suse.de>
Subject: Re: [PATCH 4/5] blk-mq: do limited block plug for multiple queue case
Date: Mon, 4 May 2015 12:40:39 -0700	[thread overview]
Message-ID: <20150504194037.GA3300441@devbig257.prn2.facebook.com> (raw)
In-Reply-To: <x49d22kyry3.fsf@segfault.boston.devel.redhat.com>

On Fri, May 01, 2015 at 04:16:04PM -0400, Jeff Moyer wrote:
> Shaohua Li <shli@fb.com> writes:
> 
> > Plugging is still helpful for workloads with IO merges, but it can be
> > harmful otherwise, especially with multiple hardware queues: there is
> > (supposedly) no lock contention in that case, and plugging can
> > introduce latency. For multiple queues, do a limited plug, e.g. plug
> > only if there is a request merge. If a request doesn't merge with the
> > following request, it is dispatched immediately.
> >
> > This also fixes a bug: if directly issuing a request fails, we fall
> > back to blk_mq_merge_queue_io(). But the bio was already assigned to
> > a request in blk_mq_bio_to_request(), so blk_mq_merge_queue_io()
> > must not run blk_mq_bio_to_request() again.
> 
> Good catch.  Might've been better to split that out first for easy
> backport to stable kernels, but I won't hold you to that.

It's not a severe bug, but I don't mind. Jens, please let me know if I
should split this into two patches.
 
> > @@ -1243,6 +1277,10 @@ static void blk_mq_make_request(struct request_queue *q, struct bio *bio)
> >  		return;
> >  	}
> >  
> > +	if (likely(!is_flush_fua) && !blk_queue_nomerges(q) &&
> > +	    blk_attempt_plug_merge(q, bio, &request_count))
> > +		return;
> > +
> >  	rq = blk_mq_map_request(q, bio, &data);
> >  	if (unlikely(!rq))
> >  		return;
> 
> After this patch, everything up to this point in blk_mq_make_request and
> blk_sq_make_request is the same.  This can be factored out (in another
> patch) to a common function.

I'll leave this for a separate cleanup if a good function name is found.

> > @@ -1253,38 +1291,38 @@ static void blk_mq_make_request(struct request_queue *q, struct bio *bio)
> >  		goto run_queue;
> >  	}
> >  
> > +	plug = current->plug;
> >  	/*
> >  	 * If the driver supports defer issued based on 'last', then
> >  	 * queue it up like normal since we can potentially save some
> >  	 * CPU this way.
> >  	 */
> > -	if (is_sync && !(data.hctx->flags & BLK_MQ_F_DEFER_ISSUE)) {
> > -		struct blk_mq_queue_data bd = {
> > -			.rq = rq,
> > -			.list = NULL,
> > -			.last = 1
> > -		};
> > -		int ret;
> > +	if ((plug || is_sync) && !(data.hctx->flags & BLK_MQ_F_DEFER_ISSUE)) {
> > +		struct request *old_rq = NULL;
> 
> I would add a !blk_queue_nomerges(q) to that conditional.  There's no
> point holding back an I/O when we won't merge it anyway.

Good catch! Fixed.
 
> That brings up another quirk of the current implementation (not your
> patches) that bugs me.
> 
> BLK_MQ_F_SHOULD_MERGE
> QUEUE_FLAG_NOMERGES
> 
> Those two flags are set independently, one via the driver and the other
> via a sysfs file.  So the user could set the nomerges flag to 1 or 2,
> and still potentially get merges (see blk_mq_merge_queue_io).  That's
> something that should be fixed, albeit that can wait.

Agreed.

> >  		blk_mq_bio_to_request(rq, bio);
> >  
> >  		/*
> > -		 * For OK queue, we are done. For error, kill it. Any other
> > -		 * error (busy), just add it to our list as we previously
> > -		 * would have done
> > +		 * We do limited plugging. If the bio can be merged, do the merge.
> > +		 * Otherwise the existing request in the plug list will be
> > +		 * issued, so the plug list holds at most one request.
> >  		 */
> > -		ret = q->mq_ops->queue_rq(data.hctx, &bd);
> > -		if (ret == BLK_MQ_RQ_QUEUE_OK)
> > -			goto done;
> > -		else {
> > -			__blk_mq_requeue_request(rq);
> > -
> > -			if (ret == BLK_MQ_RQ_QUEUE_ERROR) {
> > -				rq->errors = -EIO;
> > -				blk_mq_end_request(rq, rq->errors);
> > -				goto done;
> > +		if (plug) {
> > +			if (!list_empty(&plug->mq_list)) {
> > +				old_rq = list_first_entry(&plug->mq_list,
> > +					struct request, queuelist);
> > +				list_del_init(&old_rq->queuelist);
> >  			}
> > -		}
> > +			list_add_tail(&rq->queuelist, &plug->mq_list);
> > +		} else /* is_sync */
> > +			old_rq = rq;
> > +		blk_mq_put_ctx(data.ctx);
> > +		if (!old_rq)
> > +			return;
> > +		if (!blk_mq_direct_issue_request(old_rq))
> > +			return;
> > +		blk_mq_insert_request(old_rq, false, true, true);
> > +		return;
> >  	}
> 
> Now there is no way to exit that if block; we always return.  It may be
> worth considering moving that block to its own function, if you can think
> of a good name for it.

I'll leave this for later work.

> Other than those minor issues, this looks good to me.

Thanks for your time!


From b2f2f6fbf72e4b80dffbce3ada6a151754407044 Mon Sep 17 00:00:00 2001
Message-Id: <b2f2f6fbf72e4b80dffbce3ada6a151754407044.1430766392.git.shli@fb.com>
In-Reply-To: <f3bfe60a013827942790c89d658f63c920653437.1430766392.git.shli@fb.com>
References: <f3bfe60a013827942790c89d658f63c920653437.1430766392.git.shli@fb.com>
From: Shaohua Li <shli@fb.com>
Date: Wed, 29 Apr 2015 16:45:40 -0700
Subject: [PATCH 4/5] blk-mq: do limited block plug for multiple queue case

Plugging is still helpful for workloads with IO merges, but it can be
harmful otherwise, especially with multiple hardware queues: there is
(supposedly) no lock contention in that case, and plugging can
introduce latency. For multiple queues, do a limited plug, e.g. plug
only if there is a request merge. If a request doesn't merge with the
following request, it is dispatched immediately.

This also fixes a bug: if directly issuing a request fails, we fall
back to blk_mq_merge_queue_io(). But the bio was already assigned to
a request in blk_mq_bio_to_request(), so blk_mq_merge_queue_io()
must not run blk_mq_bio_to_request() again.

Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 block/blk-mq.c | 82 ++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 60 insertions(+), 22 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7f9d3a1..6a6b6d0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1224,6 +1224,38 @@ static struct request *blk_mq_map_request(struct request_queue *q,
 	return rq;
 }
 
+static int blk_mq_direct_issue_request(struct request *rq)
+{
+	int ret;
+	struct request_queue *q = rq->q;
+	struct blk_mq_hw_ctx *hctx = q->mq_ops->map_queue(q,
+			rq->mq_ctx->cpu);
+	struct blk_mq_queue_data bd = {
+		.rq = rq,
+		.list = NULL,
+		.last = 1
+	};
+
+	/*
+	 * For OK queue, we are done. For error, kill it. Any other
+	 * error (busy), just add it to our list as we previously
+	 * would have done
+	 */
+	ret = q->mq_ops->queue_rq(hctx, &bd);
+	if (ret == BLK_MQ_RQ_QUEUE_OK)
+		return 0;
+	else {
+		__blk_mq_requeue_request(rq);
+
+		if (ret == BLK_MQ_RQ_QUEUE_ERROR) {
+			rq->errors = -EIO;
+			blk_mq_end_request(rq, rq->errors);
+			return 0;
+		}
+		return -1;
+	}
+}
+
 /*
  * Multiple hardware queue variant. This will not use per-process plugs,
  * but will attempt to bypass the hctx queueing if we can go straight to
@@ -1235,6 +1267,8 @@ static void blk_mq_make_request(struct request_queue *q, struct bio *bio)
 	const int is_flush_fua = bio->bi_rw & (REQ_FLUSH | REQ_FUA);
 	struct blk_map_ctx data;
 	struct request *rq;
+	unsigned int request_count = 0;
+	struct blk_plug *plug;
 
 	blk_queue_bounce(q, &bio);
 
@@ -1243,6 +1277,10 @@ static void blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		return;
 	}
 
+	if (likely(!is_flush_fua) && !blk_queue_nomerges(q) &&
+	    blk_attempt_plug_merge(q, bio, &request_count))
+		return;
+
 	rq = blk_mq_map_request(q, bio, &data);
 	if (unlikely(!rq))
 		return;
@@ -1253,38 +1291,39 @@ static void blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		goto run_queue;
 	}
 
+	plug = current->plug;
 	/*
 	 * If the driver supports defer issued based on 'last', then
 	 * queue it up like normal since we can potentially save some
 	 * CPU this way.
 	 */
-	if (is_sync && !(data.hctx->flags & BLK_MQ_F_DEFER_ISSUE)) {
-		struct blk_mq_queue_data bd = {
-			.rq = rq,
-			.list = NULL,
-			.last = 1
-		};
-		int ret;
+	if (((plug && !blk_queue_nomerges(q)) || is_sync) &&
+	    !(data.hctx->flags & BLK_MQ_F_DEFER_ISSUE)) {
+		struct request *old_rq = NULL;
 
 		blk_mq_bio_to_request(rq, bio);
 
 		/*
-		 * For OK queue, we are done. For error, kill it. Any other
-		 * error (busy), just add it to our list as we previously
-		 * would have done
+		 * We do limited plugging. If the bio can be merged, do the merge.
+		 * Otherwise the existing request in the plug list will be
+		 * issued, so the plug list holds at most one request.
 		 */
-		ret = q->mq_ops->queue_rq(data.hctx, &bd);
-		if (ret == BLK_MQ_RQ_QUEUE_OK)
-			goto done;
-		else {
-			__blk_mq_requeue_request(rq);
-
-			if (ret == BLK_MQ_RQ_QUEUE_ERROR) {
-				rq->errors = -EIO;
-				blk_mq_end_request(rq, rq->errors);
-				goto done;
+		if (plug) {
+			if (!list_empty(&plug->mq_list)) {
+				old_rq = list_first_entry(&plug->mq_list,
+					struct request, queuelist);
+				list_del_init(&old_rq->queuelist);
 			}
-		}
+			list_add_tail(&rq->queuelist, &plug->mq_list);
+		} else /* is_sync */
+			old_rq = rq;
+		blk_mq_put_ctx(data.ctx);
+		if (!old_rq)
+			return;
+		if (!blk_mq_direct_issue_request(old_rq))
+			return;
+		blk_mq_insert_request(old_rq, false, true, true);
+		return;
 	}
 
 	if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) {
@@ -1297,7 +1336,6 @@ static void blk_mq_make_request(struct request_queue *q, struct bio *bio)
 run_queue:
 		blk_mq_run_hw_queue(data.hctx, !is_sync || is_flush_fua);
 	}
-done:
 	blk_mq_put_ctx(data.ctx);
 }
 
-- 
1.8.1


Thread overview: 22+ messages
2015-04-30 17:45 [PATCH 0/5] blk plug fixes Shaohua Li
2015-04-30 17:45 ` [PATCH 1/5] blk: clean up plug Shaohua Li
2015-05-01 17:11   ` Christoph Hellwig
2015-04-30 17:45 ` [PATCH 2/5] sched: always use blk_schedule_flush_plug in io_schedule_out Shaohua Li
2015-05-01 17:14   ` Christoph Hellwig
2015-05-01 18:05     ` Shaohua Li
2015-05-01 17:42   ` Jeff Moyer
2015-05-01 18:07     ` Jeff Moyer
2015-05-01 18:28       ` Shaohua Li
2015-05-01 19:37         ` Jeff Moyer
2015-04-30 17:45 ` [PATCH 3/5] blk-mq: fix plugging in blk_sq_make_request Shaohua Li
2015-05-01 17:16   ` Christoph Hellwig
2015-05-01 17:47     ` Jeff Moyer
2015-04-30 17:45 ` [PATCH 4/5] blk-mq: do limited block plug for multiple queue case Shaohua Li
2015-05-01 20:16   ` Jeff Moyer
2015-05-04 19:40     ` Shaohua Li [this message]
2015-05-04 19:46       ` Jens Axboe
2015-05-04 20:33         ` Shaohua Li
2015-05-04 20:35           ` Jens Axboe
2015-04-30 17:45 ` [PATCH 5/5] blk-mq: make plug work for mutiple disks and queues Shaohua Li
2015-05-01 20:55   ` Jeff Moyer
2015-05-04 19:44     ` Shaohua Li
