Subject: Re: [PATCH 3/3] virtio_blk: implement blk_mq_ops->poll()
From: Jason Wang
To: Stefan Hajnoczi
Cc: virtualization@lists.linux-foundation.org, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org, Christoph Hellwig, Paolo Bonzini,
    Jens Axboe, slp@redhat.com, sgarzare@redhat.com, "Michael S. Tsirkin"
Date: Thu, 27 May 2021 13:48:36 +0800
References: <20210520141305.355961-1-stefanha@redhat.com>
 <20210520141305.355961-4-stefanha@redhat.com>
X-Mailing-List: linux-block@vger.kernel.org

On 2021/5/25 4:59 PM, Stefan Hajnoczi wrote:
> On Tue, May 25, 2021 at 11:21:41AM +0800, Jason Wang wrote:
>> On 2021/5/20 10:13 PM, Stefan Hajnoczi wrote:
>>> Request completion latency can be reduced by using polling instead of
>>> irqs. Even Posted Interrupts or similar hardware support doesn't beat
>>> polling. The reason is that disabling virtqueue notifications saves
>>> critical-path CPU cycles on the host by skipping irq injection and in
>>> the guest by skipping the irq handler. So let's add blk_mq_ops->poll()
>>> support to virtio_blk.
>>>
>>> The approach taken by this patch differs from the NVMe driver's
>>> approach. NVMe dedicates hardware queues to polling and submits
>>> REQ_HIPRI requests only on those queues. This patch does not require
>>> exclusive polling queues for virtio_blk. Instead, it switches between
>>> irqs and polling when one or more REQ_HIPRI requests are in flight on a
>>> virtqueue.
>>>
>>> This is possible because toggling virtqueue notifications is cheap even
>>> while the virtqueue is running. NVMe CQs can't do this because irqs are
>>> only enabled/disabled at queue creation time.
>>>
>>> This toggling approach requires no configuration. There is no need to
>>> dedicate queues ahead of time or to teach users and orchestration tools
>>> how to set up polling queues.
>>>
>>> Possible drawbacks of this approach:
>>>
>>> - Hardware virtio_blk implementations may find virtqueue_disable_cb()
>>>   expensive since it requires DMA.
>>
>> Note that this is probably not related to the behavior of the driver but
>> to the design of the event suppression mechanism.
>>
>> The device can choose to ignore the suppression flag and keep sending
>> interrupts.
>
> Yes, it's the design of the event suppression mechanism.
>
> If we use dedicated polling virtqueues then the hardware doesn't need to
> check whether interrupts are enabled for each notification. However,
> there's no mechanism to tell the device that virtqueue interrupts are
> permanently disabled. This means that as of today, even dedicated
> virtqueues cannot suppress interrupts without the device checking for
> each notification.

This can be detected in a transport-specific way. E.g. in the case of MSI,
VIRTIO_MSI_NO_VECTOR could be a hint.

>
>>> If such devices become popular then
>>> the virtio_blk driver could use a similar approach to NVMe when
>>> VIRTIO_F_ACCESS_PLATFORM is detected in the future.
>>>
>>> - If a blk_poll() thread is descheduled it not only hurts polling
>>>   performance but also delays completion of non-REQ_HIPRI requests on
>>>   that virtqueue since vq notifications are disabled.
>>
>> Can we poll only when only high-priority requests are pending?
>
> Yes, that's what this patch does.
>
>> If the backend is a remote one, I think the polling may cause more CPU
>> cycles.
>
> Right, but polling is only done when userspace sets the RWF_HIPRI
> request flag. Most applications don't support it, and for those that do
> it's probably an option that the user needs to enable explicitly.

I see.

>
> Stefan
>
>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>>> index fc0fb1dcd399..f0243dcd745a 100644
>>> --- a/drivers/block/virtio_blk.c
>>> +++ b/drivers/block/virtio_blk.c
>>> @@ -29,6 +29,16 @@ static struct workqueue_struct *virtblk_wq;
>>>  struct virtio_blk_vq {
>>>  	struct virtqueue *vq;
>>>  	spinlock_t lock;
>>> +
>>> +	/* Number of non-REQ_HIPRI requests in flight. Protected by lock. */
>>> +	unsigned int num_lopri;
>>> +
>>> +	/* Number of REQ_HIPRI requests in flight. Protected by lock. */
>>> +	unsigned int num_hipri;
>>> +
>>> +	/* Are vq notifications enabled? Protected by lock. */
>>> +	bool cb_enabled;
>>
>> We had event_flag_shadow, is it sufficient to introduce a new helper
>> virtqueue_cb_is_enabled()?
>
> Yes, I'll try that in the next revision.
>
>>> +
>>>  	char name[VQ_NAME_LEN];
>>>  } ____cacheline_aligned_in_smp;
>>> @@ -171,33 +181,67 @@ static inline void virtblk_request_done(struct request *req)
>>>  	blk_mq_end_request(req, virtblk_result(vbr));
>>>  }
>>> -static void virtblk_done(struct virtqueue *vq)
>>> +/* Returns true if one or more requests completed */
>>> +static bool virtblk_complete_requests(struct virtqueue *vq)
>>>  {
>>>  	struct virtio_blk *vblk = vq->vdev->priv;
>>>  	struct virtio_blk_vq *vbq = &vblk->vqs[vq->index];
>>>  	bool req_done = false;
>>> +	bool last_hipri_done = false;
>>>  	struct virtblk_req *vbr;
>>>  	unsigned long flags;
>>>  	unsigned int len;
>>>  	spin_lock_irqsave(&vbq->lock, flags);
>>> +
>>>  	do {
>>> -		virtqueue_disable_cb(vq);
>>> +		if (vbq->cb_enabled)
>>> +			virtqueue_disable_cb(vq);
>>>  		while ((vbr = virtqueue_get_buf(vq, &len)) != NULL) {
>>>  			struct request *req = blk_mq_rq_from_pdu(vbr);
>>> +			if (req->cmd_flags & REQ_HIPRI) {
>>> +				if (--vbq->num_hipri == 0)
>>> +					last_hipri_done = true;
>>> +			} else
>>> +				vbq->num_lopri--;
>>> +
>>>  			if (likely(!blk_should_fake_timeout(req->q)))
>>>  				blk_mq_complete_request(req);
>>>  			req_done = true;
>>>  		}
>>>  		if (unlikely(virtqueue_is_broken(vq)))
>>>  			break;
>>> -	} while (!virtqueue_enable_cb(vq));
>>> +
>>> +		/* Enable vq notifications if non-polled requests remain */
>>> +		if (last_hipri_done && vbq->num_lopri > 0) {
>>> +			last_hipri_done = false;
>>> +			vbq->cb_enabled = true;
>>> +		}
>>> +	} while (vbq->cb_enabled && !virtqueue_enable_cb(vq));
>>>  	/* In case queue is stopped waiting for more buffers. */
>>>  	if (req_done)
>>>  		blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
>>>  	spin_unlock_irqrestore(&vbq->lock, flags);
>>> +
>>> +	return req_done;
>>> +}
>>> +
>>> +static int virtblk_poll(struct blk_mq_hw_ctx *hctx)
>>> +{
>>> +	struct virtio_blk *vblk = hctx->queue->queuedata;
>>> +	struct virtqueue *vq = vblk->vqs[hctx->queue_num].vq;
>>> +
>>> +	if (!virtqueue_more_used(vq))
>>
>> I'm not familiar with block polling, but what happens if a buffer is made
>> available after virtqueue_more_used() returns false here?
>
> Can you explain the scenario, I'm not sure I understand? "buffer is made
> available" -> are you thinking about additional requests being submitted
> by the driver or an in-flight request being marked used by the device?

Something like this:

1) requests are submitted
2) poll, but virtqueue_more_used() returns false
3) the device makes the buffer used

In this case, will poll() be triggered again by somebody else? (I think the
interrupt is disabled here.)
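For reference, a rough sketch of what the virtqueue_cb_is_enabled() helper
suggested above could look like, assuming it lives in
drivers/virtio/virtio_ring.c and reuses the existing avail_flags_shadow /
event_flags_shadow fields; this is illustrative only, not code from this
series:

/*
 * Sketch only: report whether callbacks (interrupts) are currently
 * enabled for this virtqueue, based on the driver-side shadow flags
 * that virtio_ring.c already maintains.
 */
bool virtqueue_cb_is_enabled(struct virtqueue *_vq)
{
	struct vring_virtqueue *vq = to_vvq(_vq);

	if (vq->packed_ring)
		return vq->packed.event_flags_shadow !=
		       VRING_PACKED_EVENT_FLAG_DISABLE;

	return !(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT);
}

With something like this, virtio_blk could drop its own cb_enabled
bookkeeping and ask the virtqueue directly before deciding whether to call
virtqueue_disable_cb().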
Thanks

>
> Stefan
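As a reference for the RWF_HIPRI path discussed above, here is a minimal
userspace sketch that submits a polled read with preadv2(). It assumes a
virtio-blk disk at /dev/vda (hypothetical) whose queue supports polling
(e.g. /sys/block/vda/queue/io_poll enabled); polling applies to O_DIRECT
I/O, so the buffer is aligned:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
	struct iovec iov;
	void *buf;
	ssize_t ret;
	int fd = open("/dev/vda", O_RDONLY | O_DIRECT);

	if (fd < 0 || posix_memalign(&buf, 4096, 4096))
		return 1;

	iov.iov_base = buf;
	iov.iov_len = 4096;

	/* RWF_HIPRI marks the request REQ_HIPRI; its completion is then
	 * reaped by blk_poll() instead of waiting for an interrupt. */
	ret = preadv2(fd, &iov, 1, 0, RWF_HIPRI);
	if (ret < 0)
		perror("preadv2");

	free(buf);
	close(fd);
	return ret < 0 ? 1 : 0;
}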