From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 5/8] virtio_blk: implement mq_ops->commit_rqs() hook
To: Ming Lei
Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org
References: <20181126163556.5181-1-axboe@kernel.dk>
 <20181126163556.5181-6-axboe@kernel.dk>
 <20181128021029.GF11128@ming.t460p>
 <35b33a34-9e24-5acb-7a4e-57433328bf3d@kernel.dk>
 <20181129012342.GB23249@ming.t460p>
 <20181129025143.GC23390@ming.t460p>
From: Jens Axboe
Message-ID: <41d9d590-6a59-c050-c8a8-2506342b93a4@kernel.dk>
Date: Wed, 28 Nov 2018 20:13:43 -0700
In-Reply-To: <20181129025143.GC23390@ming.t460p>
Content-Type: text/plain; charset=utf-8
X-Mailing-List: linux-block@vger.kernel.org

On 11/28/18 7:51 PM, Ming Lei wrote:
> On Wed, Nov 28, 2018 at 07:19:09PM -0700, Jens Axboe wrote:
>> On 11/28/18 6:23 PM, Ming Lei wrote:
>>> On Tue, Nov 27, 2018 at 07:34:51PM -0700, Jens Axboe wrote:
>>>> On 11/27/18 7:10 PM, Ming Lei wrote:
>>>>> On Mon, Nov 26, 2018 at 09:35:53AM -0700, Jens Axboe wrote:
>>>>>> We need this for blk-mq to kick things into gear, if we told it that
>>>>>> we had more IO coming, but then failed to deliver on that promise.
>>>>>>
>>>>>> Signed-off-by: Jens Axboe
>>>>>> ---
>>>>>>  drivers/block/virtio_blk.c | 15 +++++++++++++++
>>>>>>  1 file changed, 15 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>>>>>> index 6e869d05f91e..b49c57e77780 100644
>>>>>> --- a/drivers/block/virtio_blk.c
>>>>>> +++ b/drivers/block/virtio_blk.c
>>>>>> @@ -214,6 +214,20 @@ static void virtblk_done(struct virtqueue *vq)
>>>>>>          spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
>>>>>>  }
>>>>>>
>>>>>> +static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
>>>>>> +{
>>>>>> +        struct virtio_blk *vblk = hctx->queue->queuedata;
>>>>>> +        int qid = hctx->queue_num;
>>>>>> +        bool kick;
>>>>>> +
>>>>>> +        spin_lock_irq(&vblk->vqs[qid].lock);
>>>>>> +        kick = virtqueue_kick_prepare(vblk->vqs[qid].vq);
>>>>>> +        spin_unlock_irq(&vblk->vqs[qid].lock);
>>>>>> +
>>>>>> +        if (kick)
>>>>>> +                virtqueue_notify(vblk->vqs[qid].vq);
>>>>>> +}
>>>>>> +
>>>>>>  static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>>>>                             const struct blk_mq_queue_data *bd)
>>>>>>  {
>>>>>> @@ -638,6 +652,7 @@ static void virtblk_initialize_rq(struct request *req)
>>>>>>
>>>>>>  static const struct blk_mq_ops virtio_mq_ops = {
>>>>>>          .queue_rq        = virtio_queue_rq,
>>>>>> +        .commit_rqs      = virtio_commit_rqs,
>>>>>>          .complete        = virtblk_request_done,
>>>>>>          .init_request    = virtblk_init_request,
>>>>>>  #ifdef CONFIG_VIRTIO_BLK_SCSI
>>>>>> --
>>>>>> 2.17.1
>>>>>>
>>>>>
>>>>> If .commit_rqs() is implemented, virtqueue_notify() in virtio_queue_rq()
>>>>> should be removed, to save the VM world switch per .queue_rq().
>>>>
>>>> ->commit_rqs() is only for the case where bd->last is set to false,
>>>> and we never make it to the end and flag bd->last == true. If bd->last
>>>> is true, the driver should kick things into gear.
>>>
>>> OK, looks like I misunderstood it. However, virtio-blk doesn't need this
>>> change, since virtio_queue_rq() can handle it well.
>>> This patch may introduce one unnecessary VM world switch when the
>>> queue is busy.
>>
>> No, it won't; it may in the case of some failure outside of the driver.
>
> If the failure is because we are out of tags, blk_mq_dispatch_wake() will
> rerun the queue, and bd->last will be set eventually. Or is there some
> other failure (outside of the driver) that isn't covered?

The point is to make this happen when we commit the IOs, without needing
to do a restart (or relying on IO being in-flight). If we're submitting a
string of requests, we should not rely on failures happening only because
IO is already in flight and will restart us. That defeats the purpose of
even having ->last in the first place.

>> The only reason that virtio-blk doesn't currently hang is because it
>> has restart logic, and the failure case only happens if we already
>> have IO in-flight.
>
> Yeah, virtqueue_kick() is called in case of any error in
> virtio_queue_rq(), so I am still wondering why we have to implement
> .commit_rqs() for virtio-blk.

It's not strictly needed for virtio-blk with the restart logic that it
has, but I think it'd be nicer to kill that, since we have other real use
cases of bd->last at this point.

>>> IMO bd->last won't work well in case of an io scheduler, given that the
>>> rq_list only includes one single request.
>>
>> But that's a fake limitation that definitely should just be lifted;
>> the fact that blk-mq-sched is _currently_ just doing single requests
>> is woefully inefficient.
>
> I agree, but it seems a bit hard given that we have to consider request
> merging.

We don't have to drain everything; it should still be feasible to submit
at least a batch of requests. For basic sequential IO, you want to leave
the last one in the queue if you have IOs going, for instance. But
submitting each and every request individually adds a lot of overhead;
IOPS comparisons of kyber versus no scheduler show that very clearly.

>>> I wrote this kind of patch (never posted) before, using something like
>>> ->commit_rqs() to replace the current bd->last mechanism, which needs
>>> one extra driver tag. That may improve the above case, and the code
>>> gets cleaned up as well.
>>
>> It doesn't need one extra driver tag, we currently get an extra one just
>> to flag ->last correctly. That's not a requirement, that's a limitation
>> of the current implementation. We could get rid of that, and if it
>> proves to be an issue, that's not hard to do.
>
> What do you think about using .commit_rqs() to replace ->last? For
> example, just call .commit_rqs() after the last request has been queued
> to the driver successfully. Then we can remove bd->last and avoid getting
> the extra tag for figuring out bd->last.

I don't want to make ->commit_rqs() part of the regular execution; it
should be relegated to the "failure" case of not being able to fulfil our
promise of sending a request with bd->last == true. Reasons mentioned
earlier, but basically it's more efficient to commit from inside
->queue_rq() if we can, so we don't have to re-grab the submission lock
needlessly. I like the idea of separate ->queue and ->commit, but in
practice I don't see it working out without a performance penalty.
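To make the intended split concrete, here is a rough sketch of the
driver-side pattern being described (illustrative only, not part of this
patch series; mydrv_queue, mydrv_add_request, mydrv_kick_prepare and
mydrv_notify_hw are hypothetical stand-ins for a real driver's queue
state and doorbell helpers). ->queue_rq() kicks the hardware itself
whenever bd->last is true, and ->commit_rqs() acts purely as the
fallback for the case where blk-mq signalled that more requests were
coming (bd->last == false) but never delivered one with bd->last == true:

#include <linux/blk-mq.h>
#include <linux/spinlock.h>

/* Hypothetical per-hw-queue state, playing the role of vblk->vqs[qid]. */
struct mydrv_queue {
        spinlock_t lock;
        /* ring/doorbell state would live here */
};

/* Hypothetical helpers, assumed to exist elsewhere in the driver. */
static void mydrv_add_request(struct mydrv_queue *q, struct request *rq);
static bool mydrv_kick_prepare(struct mydrv_queue *q);
static void mydrv_notify_hw(struct mydrv_queue *q);

static blk_status_t mydrv_queue_rq(struct blk_mq_hw_ctx *hctx,
                                   const struct blk_mq_queue_data *bd)
{
        struct mydrv_queue *q = hctx->driver_data;
        bool notify = false;

        spin_lock_irq(&q->lock);
        mydrv_add_request(q, bd->rq);
        /* End of the batch: kick the hardware from here, the common path. */
        if (bd->last)
                notify = mydrv_kick_prepare(q);
        spin_unlock_irq(&q->lock);

        if (notify)
                mydrv_notify_hw(q);
        return BLK_STS_OK;
}

/*
 * Fallback only: called by blk-mq when it queued requests with
 * bd->last == false and then failed to send the final one.
 */
static void mydrv_commit_rqs(struct blk_mq_hw_ctx *hctx)
{
        struct mydrv_queue *q = hctx->driver_data;
        bool notify;

        spin_lock_irq(&q->lock);
        notify = mydrv_kick_prepare(q);
        spin_unlock_irq(&q->lock);

        if (notify)
                mydrv_notify_hw(q);
}

static const struct blk_mq_ops mydrv_mq_ops = {
        .queue_rq       = mydrv_queue_rq,
        .commit_rqs     = mydrv_commit_rqs,
};

With that split, the normal submission path never pays for an extra lock
round trip, and ->commit_rqs() only runs when the bd->last promise was
broken.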
-- 
Jens Axboe