From: Jason Wang <jasowang@redhat.com>
To: Eugenio Perez Martin <eperezma@redhat.com>
Cc: qemu-devel <qemu-devel@nongnu.org>,
	Gautam Dawar <gdawar@xilinx.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Stefano Garzarella <sgarzare@redhat.com>,
	Eric Blake <eblake@redhat.com>,
	Zhu Lingshan <lingshan.zhu@intel.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Markus Armbruster <armbru@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Parav Pandit <parav@mellanox.com>,
	Laurent Vivier <lvivier@redhat.com>,
	Liuxiangdong <liuxiangdong5@huawei.com>,
	Eli Cohen <eli@mellanox.com>, Cindy Lu <lulu@redhat.com>,
	Harpreet Singh Anand <hanand@xilinx.com>,
	"Gonglei (Arei)" <arei.gonglei@huawei.com>
Subject: Re: [PATCH v2 12/19] vhost: add vhost_svq_poll
Date: Tue, 19 Jul 2022 15:38:32 +0800	[thread overview]
Message-ID: <c5ee2d97-a86a-9f98-c45a-940b720f601a@redhat.com> (raw)
In-Reply-To: <CAJaqyWdf-VQxP4rYasSzdq3yaYLPw1_G6J=Kqq5SXXWEDYpKCA@mail.gmail.com>


On 2022/7/16 01:05, Eugenio Perez Martin wrote:
> On Fri, Jul 15, 2022 at 10:48 AM Jason Wang <jasowang@redhat.com> wrote:
>> On Fri, Jul 15, 2022 at 1:39 PM Eugenio Perez Martin
>> <eperezma@redhat.com> wrote:
>>> On Fri, Jul 15, 2022 at 5:59 AM Jason Wang <jasowang@redhat.com> wrote:
>>>> On Fri, Jul 15, 2022 at 12:32 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>>>>> It allows the Shadow Control VirtQueue to wait for the device to use the
>>>>> available buffers.
>>>>>
>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>>> ---
>>>>>   hw/virtio/vhost-shadow-virtqueue.h |  1 +
>>>>>   hw/virtio/vhost-shadow-virtqueue.c | 22 ++++++++++++++++++++++
>>>>>   2 files changed, 23 insertions(+)
>>>>>
>>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
>>>>> index 1692541cbb..b5c6e3b3b4 100644
>>>>> --- a/hw/virtio/vhost-shadow-virtqueue.h
>>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
>>>>> @@ -89,6 +89,7 @@ void vhost_svq_push_elem(VhostShadowVirtqueue *svq, const SVQElement *elem,
>>>>>   int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
>>>>>                     size_t out_num, const struct iovec *in_sg, size_t in_num,
>>>>>                     SVQElement *elem);
>>>>> +size_t vhost_svq_poll(VhostShadowVirtqueue *svq);
>>>>>
>>>>>   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
>>>>>   void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
>>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
>>>>> index 5244896358..31a267f721 100644
>>>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
>>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
>>>>> @@ -486,6 +486,28 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
>>>>>       } while (!vhost_svq_enable_notification(svq));
>>>>>   }
>>>>>
>>>>> +/**
>>>>> + * Poll the SVQ for one device used buffer.
>>>>> + *
>>>>> + * This function races with the main event loop SVQ polling, so extra
>>>>> + * synchronization is needed.
>>>>> + *
>>>>> + * Returns the length written by the device.
>>>>> + */
>>>>> +size_t vhost_svq_poll(VhostShadowVirtqueue *svq)
>>>>> +{
>>>>> +    do {
>>>>> +        uint32_t len;
>>>>> +        SVQElement *elem = vhost_svq_get_buf(svq, &len);
>>>>> +        if (elem) {
>>>>> +            return len;
>>>>> +        }
>>>>> +
>>>>> +        /* Make sure we read new used_idx */
>>>>> +        smp_rmb();
>>>> There's already one smp_rmb() in vhost_svq_get_buf(). So this seems useless?
>>>>
>>> That rmb is after checking for new entries with (vq->last_used_idx !=
>>> svq->shadow_used_idx), to avoid reordering the used_idx read with the
>>> read of the actual used entry. So my understanding is that the
>>> compiler is free to skip that check within the while loop.
>> What do you mean by "that check" here?
>>
> The first check of (presumably cached) last_used_idx !=
> shadow_used_idx. Right before the SVQ's vring fetch of
> svq->vring.used->idx.
>
>>> Maybe the right solution is to add it in vhost_svq_more_used after the
>>> condition (vq->last_used_idx != svq->shadow_used_idx) is false?
>> I'm not sure I get the goal of the smp_rmb() here. What barrier does it pair?
>>
> It pairs with the actual device memory write.
>
> Note that I'm worried about compiler optimizations or reordering
> causing an infinite loop, not that the memory is updated properly.
>
> Let's say we just returned from vhost_svq_add, and avail_idx -
> used_idx > 0. The device still did not update SVQ vring used idx, and
> qemu is very fast so it completes a full call of vhost_svq_get_buf
> before the device is able to increment the used index. We can trace
> the full vhost_svq_get_buf without a memory barrier.
>
> If the compiler inlines enough and we delete the new smp_rmb barrier,
> this is what it sees:
>
> size_t vhost_svq_poll(VhostShadowVirtqueue *svq)
> {
>      do {
>          more_used = false
>          // The next conditional returns false
>          if (svq->last_used_idx != svq->shadow_used_idx) {
>              goto useful;
>          }
>
>          svq->shadow_used_idx = cpu_to_le16(svq->vring.used->idx);
>
>          // next conditional returns false too
>          if (!(svq->last_used_idx != svq->shadow_used_idx))
>              continue;
>
> useful:
>          // actual code to handle new used buffer
>          break;
>      } while (true);
> }
>
> And qemu does not need to read any of those variables again, since
> nothing modifies them within the loop before the "useful" tag
> (svq->vring.used->idx, svq->last_used_idx, svq->shadow_used_idx). So
> it could freely rewrite this as:
>
> size_t vhost_svq_poll(VhostShadowVirtqueue *svq) {
>      if (svq->last_used_idx == svq->shadow_used_idx &&
>          svq->last_used_idx == svq->vring.used->idx) {
>              for (;;);
>      }
> }
>
> That's why I think the right place for the mb is right after the
> caller code sees the (potentially cached) last_used_idx ==
> shadow_used_idx, and it needs to read a value paired with the
> "device's mb" in the SVQ vring.


I think you need "volatile" instead of the memory barriers. If I
understand correctly, you want a load from memory instead of from a
register here.

Thanks


>
> We didn't have that problem before, since we clear the event_notifier
> right before the do{}while(), and the event loop should hit a memory
> barrier in the next select / poll / read / whatever syscall it uses to
> check that the event notifier fd is set again.
>
>> Since we are in the busy loop, we will read the new used_idx for
>> sure,
> I'm not so sure of that, but maybe I've missed something.
>
> I'm sending v3 with this comment pending, so we can iterate faster.
>
> Thanks!
>
>> and we can't forecast when the used_idx is committed to memory.
>>


