From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Subject: Re: [PATCH 1/2] blk-mq: add requests in the tail of hctx->dispatch
To: Ming Lei 
Cc: linux-block@vger.kernel.org, Christoph Hellwig ,
 Bart Van Assche , Oleksandr Natalenko 
References: <20170830151935.24253-1-ming.lei@redhat.com>
 <20170830151935.24253-3-ming.lei@redhat.com>
 <567ad683-d577-1817-cf96-eff5aaf47db6@kernel.dk>
 <20170830153929.GB14684@ming.t460p>
From: Jens Axboe 
Message-ID: 
Date: Wed, 30 Aug 2017 09:51:31 -0600
MIME-Version: 1.0
In-Reply-To: <20170830153929.GB14684@ming.t460p>
Content-Type: text/plain; charset=utf-8
List-ID: 

On 08/30/2017 09:39 AM, Ming Lei wrote:
> On Wed, Aug 30, 2017 at 09:22:42AM -0600, Jens Axboe wrote:
>> On 08/30/2017 09:19 AM, Ming Lei wrote:
>>> It is more reasonable to add requests to ->dispatch in
>>> FIFO style instead of LIFO style.
>>>
>>> Also in this way we can allow inserting a request at the front
>>> of the hw queue, which is needed to fix one bug in blk-mq's
>>> implementation of blk_execute_rq().
>>>
>>> Reported-by: Oleksandr Natalenko 
>>> Tested-by: Oleksandr Natalenko 
>>> Signed-off-by: Ming Lei 
>>> ---
>>>  block/blk-mq-sched.c | 2 +-
>>>  block/blk-mq.c       | 2 +-
>>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>>> index 4ab69435708c..8d97df40fc28 100644
>>> --- a/block/blk-mq-sched.c
>>> +++ b/block/blk-mq-sched.c
>>> @@ -272,7 +272,7 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
>>>  	 * the dispatch list.
>>>  	 */
>>>  	spin_lock(&hctx->lock);
>>> -	list_add(&rq->queuelist, &hctx->dispatch);
>>> +	list_add_tail(&rq->queuelist, &hctx->dispatch);
>>>  	spin_unlock(&hctx->lock);
>>>  	return true;
>>>  }
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 4603b115e234..fed3d0c16266 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -1067,7 +1067,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
>>>  		blk_mq_put_driver_tag(rq);
>>>
>>>  		spin_lock(&hctx->lock);
>>> -		list_splice_init(list, &hctx->dispatch);
>>> +		list_splice_tail_init(list, &hctx->dispatch);
>>>  		spin_unlock(&hctx->lock);
>>
>> I'm not convinced this is safe; there's actually a reason why the
>> request is added to the front and not the back. We do have
>> reorder_tags_to_front() as a safeguard, but I'd much rather get rid of
> 
> reorder_tags_to_front() is for reordering the requests in the current
> list, while this patch is for splicing the list into hctx->dispatch, so
> I can't see why it isn't safe. Could you explain it a bit?

If we can get the ordering right, then down the line we won't need the
tag reordering at all. It's an ugly hack that I'd love to see go away.

>> that than make this change.
>>
>> What's your reasoning here? Your changelog doesn't really explain why.
> 
> Firstly, the 2nd patch needs to add one rq (such as an RQF_PM request)
> to the front of the hw queue, and the simple way to do that is to add
> it to the front of hctx->dispatch. Without this change, the 2nd patch
> can't work at all.
> 
> Secondly, this ordering is reasonable on its own:
> 
> - one rq is added to hctx->dispatch because the queue is busy
> - another rq is added to hctx->dispatch too, for the same reason
> 
> so it makes sense to add the list to hctx->dispatch in FIFO style.

Not disagreeing with the logic. But it also raises the question of why
we don't apply the same treatment when we splice leftovers to the
dispatch list; currently we front splice that.
All I'm saying is that you need to tread very carefully with this, and
run it through some careful testing to ensure that we don't introduce
conditions that can now livelock. NVMe is the easy test case; it will
generally always work, since we never run out of tags. The problematic
test case is usually something like SATA with 31 tags, and especially
SATA with flushes that don't queue. One good test case is where you end
up having all tags (or almost all) consumed by flushes, while still
ensuring that we're making forward progress.

-- 
Jens Axboe