Date: Tue, 30 Jun 2020 08:57:30 +0800
From: Ming Lei
To: Christoph Hellwig
Cc: Jens Axboe, linux-block@vger.kernel.org, Sagi Grimberg, Baolin Wang
Subject: Re: [PATCH V6 6/6] blk-mq: support batching dispatch in case of io scheduler
Message-ID: <20200630005730.GA2049266@T590>
References: <20200624230349.1046821-1-ming.lei@redhat.com>
 <20200624230349.1046821-7-ming.lei@redhat.com>
 <20200629090452.GA4892@infradead.org>
 <20200629103239.GB1881343@T590>
In-Reply-To: <20200629103239.GB1881343@T590>

On Mon, Jun 29, 2020 at 06:32:39PM +0800, Ming Lei wrote:
> On Mon, Jun 29, 2020 at 10:04:52AM +0100, Christoph Hellwig wrote:
> > > +/*
> > > + * We know bfq and deadline apply single scheduler queue instead of multi
> > > + * queue. However, the two are often used on single queue devices, also
> > > + * the current @hctx should affect the real device status most of times
> > > + * because of locality principle.
> > > + *
> > > + * So use current hctx->dispatch_busy directly for figuring out batching
> > > + * dispatch count.
> > > + */
> > 
> > I don't really understand this comment.  Also I think the code might
> > be cleaner if this function is inlined as an if/else in the only
> > caller.
> > 
> > > +static inline bool blk_mq_do_dispatch_rq_lists(struct blk_mq_hw_ctx *hctx,
> > > +		struct list_head *lists, bool multi_hctxs, unsigned count)
> > > +{
> > > +	bool ret;
> > > +
> > > +	if (!count)
> > > +		return false;
> > > +
> > > +	if (likely(!multi_hctxs))
> > > +		return blk_mq_dispatch_rq_list(hctx, lists, count);
> > 
> > Keeping these checks in the callers would keep things a little cleaner,
> > especially as the multi hctx case only really needs the lists argument.
> > 
> > > +		LIST_HEAD(list);
> > > +		struct request *new, *rq = list_first_entry(lists,
> > > +				struct request, queuelist);
> > > +		unsigned cnt = 0;
> > > +
> > > +		list_for_each_entry(new, lists, queuelist) {
> > > +			if (new->mq_hctx != rq->mq_hctx)
> > > +				break;
> > > +			cnt++;
> > > +		}
> > > +
> > > +		if (new->mq_hctx == rq->mq_hctx)
> > > +			list_splice_tail_init(lists, &list);
> > > +		else
> > > +			list_cut_before(&list, lists, &new->queuelist);
> > > +
> > > +		ret = blk_mq_dispatch_rq_list(rq->mq_hctx, &list, cnt);
> > > +	}
> > 
> > I think this has two issues: for one, ret should be ORed, as any dispatch
> > or error should leave ret set.  Also we need to splice the dispatch
> 
> OK.
> 
> > list back onto the main list here, otherwise we can lose those requests.
> 
> The dispatch list always becomes empty after blk_mq_dispatch_rq_list()
> returns, so there is no need to splice it back.
> 
> > FYI, while reviewing this I ended up editing things into a shape I
> > could better understand.  Let me know what you think of this version?
> > 
> > 
> > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > index 4c72073830f3cb..466dce99699ae4 100644
> > --- a/block/blk-mq-sched.c
> > +++ b/block/blk-mq-sched.c
> > @@ -7,6 +7,7 @@
> >  #include <linux/kernel.h>
> >  #include <linux/module.h>
> >  #include <linux/blk-mq.h>
> > +#include <linux/list_sort.h>
> >  
> >  #include <trace/events/block.h>
> >  
> > @@ -80,6 +81,38 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
> >  	blk_mq_run_hw_queue(hctx, true);
> >  }
> >  
> > +static int sched_rq_cmp(void *priv, struct list_head *a, struct list_head *b)
> > +{
> > +	struct request *rqa = container_of(a, struct request, queuelist);
> > +	struct request *rqb = container_of(b, struct request, queuelist);
> > +
> > +	return rqa->mq_hctx > rqb->mq_hctx;
> > +}
> > +
> > +static bool blk_mq_dispatch_hctx_list(struct list_head *rq_list)
> > +{
> > +	struct blk_mq_hw_ctx *hctx =
> > +		list_first_entry(rq_list, struct request, queuelist)->mq_hctx;
> > +	struct request *rq;
> > +	LIST_HEAD(hctx_list);
> > +	unsigned int count = 0;
> > +	bool ret;
> > +
> > +	list_for_each_entry(rq, rq_list, queuelist) {
> > +		if (rq->mq_hctx != hctx) {
> > +			list_cut_before(&hctx_list, rq_list, &rq->queuelist);
> > +			goto dispatch;
> > +		}
> > +		count++;
> > +	}
> > +	list_splice_tail_init(rq_list, &hctx_list);
> > +
> > +dispatch:
> > +	ret = blk_mq_dispatch_rq_list(hctx, &hctx_list, count);
> > +	list_splice(&hctx_list, rq_list);
> 
> The above line isn't needed; the sketch below shows how the leading
> same-hctx run is carved off, and why nothing has to be put back.
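To make the list semantics concrete, here is a minimal userspace sketch
(illustrative only, NOT kernel code): it re-implements list_add_tail() and
list_cut_before() with the same semantics as the <linux/list.h> helpers, and
uses a hypothetical 'struct req' as a stand-in for struct request, to show
how the leading same-hctx run is carved off the front of rq_list while the
remaining requests stay behind for the next batch:

#include <stdio.h>
#include <stddef.h>

struct list_head { struct list_head *prev, *next; };

static void INIT_LIST_HEAD(struct list_head *h) { h->prev = h->next = h; }

static void list_add_tail(struct list_head *item, struct list_head *head)
{
        item->prev = head->prev;
        item->next = head;
        head->prev->next = item;
        head->prev = item;
}

/* Move everything in front of @entry (exclusive) from @head onto @list,
 * mirroring the kernel helper of the same name. */
static void list_cut_before(struct list_head *list, struct list_head *head,
                            struct list_head *entry)
{
        if (head->next == entry) {
                INIT_LIST_HEAD(list);
                return;
        }
        list->next = head->next;
        list->next->prev = list;
        list->prev = entry->prev;
        list->prev->next = list;
        head->next = entry;
        entry->prev = head;
}

struct req { int hctx; struct list_head queuelist; };

int main(void)
{
        struct list_head rq_list = { &rq_list, &rq_list };
        struct list_head hctx_list;
        struct req rqs[4] = { { .hctx = 0 }, { .hctx = 0 },
                              { .hctx = 1 }, { .hctx = 1 } };
        struct list_head *pos;
        int i;

        for (i = 0; i < 4; i++)
                list_add_tail(&rqs[i].queuelist, &rq_list);

        /* Carve the leading hctx==0 run off the front; the hctx==1
         * requests stay on rq_list for the next round. */
        list_cut_before(&hctx_list, &rq_list, &rqs[2].queuelist);

        for (pos = hctx_list.next; pos != &hctx_list; pos = pos->next) {
                struct req *rq = (struct req *)((char *)pos -
                                offsetof(struct req, queuelist));
                printf("batched rq for hctx %d\n", rq->hctx);
        }
        return 0;
}

Compiled with gcc, this prints "batched rq for hctx 0" twice; each carved
batch is self-contained, which is why the caller only needs to loop until
rq_list is empty.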
> > +	return ret;
> > +}
> > +
> >  #define BLK_MQ_BUDGET_DELAY	3		/* ms units */
> >  
> >  /*
> > @@ -90,20 +123,29 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
> >   * Returns -EAGAIN if hctx->dispatch was found non-empty and run_work has to
> >   * be run again.  This is necessary to avoid starving flushes.
> >   */
> > -static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > +static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> 
> The return type can be changed to 'bool'.
> 
> >  {
> >  	struct request_queue *q = hctx->queue;
> >  	struct elevator_queue *e = q->elevator;
> > +	bool multi_hctxs = false, run_queue = false;
> > +	bool dispatched = false, busy = false;
> > +	unsigned int max_dispatch;
> >  	LIST_HEAD(rq_list);
> > -	int ret = 0;
> > -	struct request *rq;
> > +	int count = 0;
> > +
> > +	if (hctx->dispatch_busy)
> > +		max_dispatch = 1;
> > +	else
> > +		max_dispatch = hctx->queue->nr_requests;
> >  
> >  	do {
> > +		struct request *rq;
> > +
> >  		if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
> >  			break;
> >  
> >  		if (!list_empty_careful(&hctx->dispatch)) {
> > -			ret = -EAGAIN;
> > +			busy = true;
> >  			break;
> >  		}
> >  
> > @@ -120,7 +162,7 @@ static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> >  		 * no guarantee anyone will kick the queue.  Kick it
> >  		 * ourselves.
> >  		 */
> > -		blk_mq_delay_run_hw_queues(q, BLK_MQ_BUDGET_DELAY);
> > +		run_queue = true;
> >  		break;
> >  	}
> >  
> > @@ -130,7 +172,43 @@ static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> >  		 * if this rq won't be queued to driver via .queue_rq()
> >  		 * in blk_mq_dispatch_rq_list().
> >  		 */
> >  		list_add(&rq->queuelist, &rq_list);
> 
> The above should change to list_add_tail(&rq->queuelist, &rq_list).

Hi Christoph,

Here is the revised patch; I will post it as V7 if you are fine with it:

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 4c72073830f3..1c52e56a19b1 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -7,6 +7,7 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/blk-mq.h>
+#include <linux/list_sort.h>
 
 #include <trace/events/block.h>
 
@@ -80,6 +81,37 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
 	blk_mq_run_hw_queue(hctx, true);
 }
 
+static int sched_rq_cmp(void *priv, struct list_head *a, struct list_head *b)
+{
+	struct request *rqa = container_of(a, struct request, queuelist);
+	struct request *rqb = container_of(b, struct request, queuelist);
+
+	return rqa->mq_hctx > rqb->mq_hctx;
+}
+
+static bool blk_mq_dispatch_hctx_list(struct list_head *rq_list)
+{
+	struct blk_mq_hw_ctx *hctx =
+		list_first_entry(rq_list, struct request, queuelist)->mq_hctx;
+	struct request *rq;
+	LIST_HEAD(hctx_list);
+	unsigned int count = 0;
+	bool ret;
+
+	list_for_each_entry(rq, rq_list, queuelist) {
+		if (rq->mq_hctx != hctx) {
+			list_cut_before(&hctx_list, rq_list, &rq->queuelist);
+			goto dispatch;
+		}
+		count++;
+	}
+	list_splice_tail_init(rq_list, &hctx_list);
+
+dispatch:
+	ret = blk_mq_dispatch_rq_list(hctx, &hctx_list, count);
+	return ret;
+}
+
 #define BLK_MQ_BUDGET_DELAY	3		/* ms units */
 
 /*
@@ -90,20 +122,29 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
  * Returns -EAGAIN if hctx->dispatch was found non-empty and run_work has to
  * be run again.  This is necessary to avoid starving flushes.
  */
-static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
+static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	struct elevator_queue *e = q->elevator;
+	bool multi_hctxs = false, run_queue = false;
+	bool dispatched = false, busy = false;
+	unsigned int max_dispatch;
 	LIST_HEAD(rq_list);
-	int ret = 0;
-	struct request *rq;
+	int count = 0;
+
+	if (hctx->dispatch_busy)
+		max_dispatch = 1;
+	else
+		max_dispatch = hctx->queue->nr_requests;
 
 	do {
+		struct request *rq;
+
 		if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
 			break;
 
 		if (!list_empty_careful(&hctx->dispatch)) {
-			ret = -EAGAIN;
+			busy = true;
 			break;
 		}
 
@@ -120,7 +161,7 @@ static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		 * no guarantee anyone will kick the queue.  Kick it
 		 * ourselves.
 		 */
-		blk_mq_delay_run_hw_queues(q, BLK_MQ_BUDGET_DELAY);
+		run_queue = true;
 		break;
 	}
 
@@ -129,8 +170,42 @@ static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		 * if this rq won't be queued to driver via .queue_rq()
 		 * in blk_mq_dispatch_rq_list().
 		 */
-		list_add(&rq->queuelist, &rq_list);
-	} while (blk_mq_dispatch_rq_list(rq->mq_hctx, &rq_list, 1));
+		list_add_tail(&rq->queuelist, &rq_list);
+		if (rq->mq_hctx != hctx)
+			multi_hctxs = true;
+	} while (++count < max_dispatch);
+
+	if (!count) {
+		if (run_queue)
+			blk_mq_delay_run_hw_queues(q, BLK_MQ_BUDGET_DELAY);
+	} else if (multi_hctxs) {
+		/*
+		 * Requests from different hctx may be dequeued from some
+		 * schedulers, such as bfq and deadline.
+		 *
+		 * Sort the requests in the list according to their hctx,
+		 * dispatch batching requests from same hctx at a time.
+		 */
+		list_sort(NULL, &rq_list, sched_rq_cmp);
+		do {
+			dispatched |= blk_mq_dispatch_hctx_list(&rq_list);
+		} while (!list_empty(&rq_list));
+	} else {
+		dispatched = blk_mq_dispatch_rq_list(hctx, &rq_list, count);
+	}
+
+	if (busy)
+		return -EAGAIN;
+	return !!dispatched;
+}
+
+static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
+{
+	int ret;
+
+	do {
+		ret = __blk_mq_do_dispatch_sched(hctx);
+	} while (ret == 1);
 
 	return ret;
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d273a56f11c0..57ae018d5cc8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1323,8 +1323,6 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 	if (list_empty(list))
 		return false;
 
-	WARN_ON(!list_is_singular(list) && nr_budgets);
-
 	/*
 	 * Now process all the entries, sending them to the driver.
 	 */

Thanks,
Ming
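P.S. The multi_hctxs branch above is, at heart, a sort-then-batch pattern:
order the requests so that those sharing an hctx become adjacent, then hand
off each contiguous run as one batch. Below is a hypothetical, array-based
userspace sketch of the same idea (plain C, not the kernel's linked lists;
dispatch_batch() is a made-up stand-in for blk_mq_dispatch_rq_list(), and
the hctx numbers are arbitrary):

#include <stdio.h>
#include <stdlib.h>

struct req { int hctx; int id; };

/* Order requests by hctx so same-queue requests become adjacent,
 * as sched_rq_cmp() does for list_sort(). */
static int cmp_hctx(const void *a, const void *b)
{
        const struct req *ra = a, *rb = b;
        return (ra->hctx > rb->hctx) - (ra->hctx < rb->hctx);
}

static void dispatch_batch(const struct req *rqs, size_t count)
{
        printf("dispatch %zu request(s) to hctx %d:", count, rqs[0].hctx);
        for (size_t i = 0; i < count; i++)
                printf(" rq%d", rqs[i].id);
        printf("\n");
}

int main(void)
{
        /* Requests dequeued from a single-queue scheduler may target
         * different hardware queues, interleaved. */
        struct req rqs[] = {
                {1, 0}, {0, 1}, {1, 2}, {0, 3}, {2, 4}, {1, 5},
        };
        size_t n = sizeof(rqs) / sizeof(rqs[0]);

        qsort(rqs, n, sizeof(rqs[0]), cmp_hctx);

        /* Dispatch each contiguous same-hctx run as one batch,
         * like the list_sort() + blk_mq_dispatch_hctx_list() loop. */
        for (size_t start = 0; start < n; ) {
                size_t end = start + 1;
                while (end < n && rqs[end].hctx == rqs[start].hctx)
                        end++;
                dispatch_batch(&rqs[start], end - start);
                start = end;
        }
        return 0;
}

The kernel code applies the same idea to linked lists of struct request,
which is why sorting only needs to compare mq_hctx pointers for order.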