Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue - Si-Wei Liu

From: Si-Wei Liu <siwliu.kernel@gmail.com>
To: Eli Cohen <elic@nvidia.com>
Cc: Jason Wang <jasowang@redhat.com>,
	mst@redhat.com, virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	lulu@redhat.com, Si-Wei Liu <si-wei.liu@oracle.com>
Subject: Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue
Date: Tue, 2 Feb 2021 09:54:00 -0800	[thread overview]
Message-ID: <CAPWQSg0tRXoGF88LQSLzUg88ZEi8p+M=R6Qd445iABShfn-o4g@mail.gmail.com> (raw)
In-Reply-To: <20210202092253.GA236663@mtl-vdi-166.wap.labs.mlnx>

On Tue, Feb 2, 2021 at 1:23 AM Eli Cohen <elic@nvidia.com> wrote:
>
> On Tue, Feb 02, 2021 at 12:38:51AM -0800, Si-Wei Liu wrote:
> > Thanks Eli and Jason for clarifications. See inline.
> >
> > On Mon, Feb 1, 2021 at 11:06 PM Eli Cohen <elic@nvidia.com> wrote:
> > >
> > > On Tue, Feb 02, 2021 at 02:02:25PM +0800, Jason Wang wrote:
> > > >
> > > > On 2021/2/2 下午12:15, Si-Wei Liu wrote:
> > > > > On Mon, Feb 1, 2021 at 7:13 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > >
> > > > > > On 2021/2/2 上午3:17, Si-Wei Liu wrote:
> > > > > > > On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu <siwliu.kernel@gmail.com> wrote:
> > > > > > > > On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen <elic@nvidia.com> wrote:
> > > > > > > > > suspend_vq should only suspend the VQ on not save the current available
> > > > > > > > > index. This is done when a change of map occurs when the driver calls
> > > > > > > > > save_channel_info().
> > > > > > > > Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
> > > > > > > > which doesn't save the available index as save_channel_info() doesn't
> > > > > > > > get called in that path at all. How does it handle the case that
> > > > > > > > aget_vq_state() is called from userspace (e.g. QEMU) while the
> > > > > > > > hardware VQ object was torn down, but userspace still wants to access
> > > > > > > > the queue index?
> > > > > > > >
> > > > > > > > Refer to https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei.liu@oracle.com/
> > > > > > > >
> > > > > > > > vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
> > > > > > > > vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
> > > > > > > >
> > > > > > > > QEMU will complain with the above warning while VM is being rebooted
> > > > > > > > or shut down.
> > > > > > > >
> > > > > > > > Looks to me either the kernel driver should cover this requirement, or
> > > > > > > > the userspace has to bear the burden in saving the index and not call
> > > > > > > > into kernel if VQ is destroyed.
> > > > > > > Actually, the userspace doesn't have the insights whether virt queue
> > > > > > > will be destroyed if just changing the device status via set_status().
> > > > > > > Looking at other vdpa driver in tree i.e. ifcvf it doesn't behave like
> > > > > > > so. Hence this still looks to me to be Mellanox specifics and
> > > > > > > mlx5_vdpa implementation detail that shouldn't expose to userspace.
> > > > > >
> > > > > > So I think we can simply drop this patch?
> > > > > Yep, I think so. To be honest I don't know why it has anything to do
> > > > > with the memory hotplug issue.
> > > >
> > > >
> > > > Eli may know more, my understanding is that, during memory hotplut, qemu
> > > > need to propagate new memory mappings via set_map(). For mellanox, it means
> > > > it needs to rebuild memory keys, so the virtqueue needs to be suspended.
> > > >
> > >
> > > I think Siwei was asking why the first patch was related to the hotplug
> > > issue.
> >
> > I was thinking how consistency is assured when saving/restoring this
> > h/w avail_index against the one in the virtq memory, particularly in
> > the region_add/.region_del() context (e.g. the hotplug case). Problem
> > is I don't see explicit memory barrier when guest thread updates the
> > avail_index, how does the device make sure the h/w avail_index is
> > uptodate while guest may race with updating the virtq's avail_index in
> > the mean while? Maybe I completely miss something obvious?
> DKIM-Signature: v1; arsa-sha256; crelaxed/relaxed; dnvidia.com; sn1;
>         t 12257780; bhHnB0z4VEKwRS3WGY8d836MJgxu5Eln/jbFZlNXVxc08;
>         hX-PGP-Universal:Date:From:To:CC:Subject:Message-ID:References:
>          MIME-Version:Content-Type:Content-Disposition:
>          Content-Transfer-Encoding:In-Reply-To:User-Agent:X-Originating-IP:
>          X-ClientProxiedBy;
>         bgGmb8+rcn3/rKzKQ/7QzSnghWzZ+FAU0XntsRZYGQ66sFvT7zsYPHogG3LIWNY77t
>          wNHPw7GCJrNaH3nEXPbOp0FMOZw4Kv4W7UPuYPobbLeTkvuPAidjB8dM42vz+1X61t
>          9IVQT9X4hnAxRjI5CqZOo41GS4Tl1X+ykGoA+VE80BR/R/+nQ3tXDVULfppzeB+vu3
>          TWnnpaZ2GyoNyPlMiyVRkHdXzDVgA4uQHxwHn7otGK5J4lzyu8KrFyQtiP+f6hfu5v
>          crJkYS8e9A+rfzUmKWuyHcKcmhPhAVJ4XdpzZcDXXlMHVxG7nR1o88xttC6D1+oNIP
>          9xHI3DkNBpEuA
> If you're asking about syncronization upon hot plug of memory, the
> hardware always goes to read the available index from memory when a new
> hardware object is associted with a virtqueue. You can argue then that
> you don't need to restore the available index and you may be right but
> this is the currect inteface to the firmware.
>
>
> If you're asking on generally how sync is assured when the guest updates
> the available index, can you please send a pointer to the code where you
> see the update without a memory barrier?

This is a snippet of virtqueue_add_split() where avail_index gets
updated by guest:

        /* Put entry in available array (but don't update avail->idx until they
         * do sync). */
        avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
        vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);

        /* Descriptors and available array need to be set before we expose the
         * new available array entries. */
        virtio_wmb(vq->weak_barriers);
        vq->split.avail_idx_shadow++;
        vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
                                                vq->split.avail_idx_shadow);
        vq->num_added++;

There's memory barrier to make sure the update to descriptor and
available ring is seen before writing to the avail->idx, but there
seems no gurantee that this update would flush to the memory
immmedately either before or after the mlx5-vdpa is suspened and gets
the old avail_index value stashed somewhere. In this case, how does
the hardware ensure the consistency between the guest virtio and host
mlx5-vdpa? Or, it completly relies on guest to update the avail_index
once the next buffer is available, so that the index will be in sync
again?

Thanks,
-Siwei

>
> >
> > Thanks,
> > -Siwei
> >
> > >
> > > But you're correct. When memory is added, I get a new memory map. This
> > > requires me to build a new memory key object which covers the new memory
> > > map. Since the virtqueue objects are referencing this memory key, I need
> > > to destroy them and build new ones referncing the new memory key.
> > >
> > > > Thanks
> > > >
> > > >
> > > > >
> > > > > -Siwei
> > > > >
> > > > > > Thanks
> > > > > >
> > > > > >
> > > > > > > > -Siwei
> > > > > > > >
> > > > > > > >
> > > > > > > > > Signed-off-by: Eli Cohen <elic@nvidia.com>
> > > > > > > > > ---
> > > > > > > > >    drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 --------
> > > > > > > > >    1 file changed, 8 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > index 88dde3455bfd..549ded074ff3 100644
> > > > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > > > > > > > >
> > > > > > > > >    static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > > > > > > > >    {
> > > > > > > > > -       struct mlx5_virtq_attr attr;
> > > > > > > > > -
> > > > > > > > >           if (!mvq->initialized)
> > > > > > > > >                   return;
> > > > > > > > >
> > > > > > > > > @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
> > > > > > > > >
> > > > > > > > >           if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
> > > > > > > > >                   mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
> > > > > > > > > -
> > > > > > > > > -       if (query_virtqueue(ndev, mvq, &attr)) {
> > > > > > > > > -               mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
> > > > > > > > > -               return;
> > > > > > > > > -       }
> > > > > > > > > -       mvq->avail_idx = attr.available_index;
> > > > > > > > >    }
> > > > > > > > >
> > > > > > > > >    static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> > > > > > > > > --
> > > > > > > > > 2.29.2
> > > > > > > > >
> > > >