qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Eugenio Perez Martin <eperezma@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: Parav Pandit <parav@mellanox.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Juan Quintela <quintela@redhat.com>,
	Markus Armbruster <armbru@redhat.com>,
	qemu-level <qemu-devel@nongnu.org>,
	Harpreet Singh Anand <hanand@xilinx.com>,
	Xiao W Wang <xiao.w.wang@intel.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Eli Cohen <eli@mellanox.com>,
	virtualization@lists.linux-foundation.org,
	Michael Lilja <ml@napatech.com>,
	Stefano Garzarella <sgarzare@redhat.com>
Subject: Re: [RFC v3 17/29] vhost: Shadow virtqueue buffers forwarding
Date: Wed, 2 Jun 2021 19:18:10 +0200	[thread overview]
Message-ID: <CAJaqyWf7M1fjrd+kr-2bcYj+ibrqZVoREZuTiJ0i+p6dA+Dukw@mail.gmail.com> (raw)
In-Reply-To: <bfd680e5-9434-3fbe-3119-1f3c5fc42f4c@redhat.com>

On Wed, Jun 2, 2021 at 11:51 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2021/5/20 上午12:28, Eugenio Pérez 写道:
> > Initial version of shadow virtqueue that actually forward buffers. The
> > exposed addresses are the qemu's virtual address, so devices with IOMMU
> > that does not allow full mapping of qemu's address space does not work
> > at the moment.
> >
> > Also for simplicity it only supports modern devices, that expects vring
> > in little endian, with split ring and no event idx or indirect
> > descriptors.
> >
> > It reuses the VirtQueue code for the device part. The driver part is
> > based on Linux's virtio_ring driver, but with stripped functionality
> > and optimizations so it's easier to review.
> >
> > Later commits will solve some of these concerns.
>
>
> It would be more more easier to review if you squash those
> "enhancements" into this patch.
>

Ok, they will be in the same commit for the next version.

>
> >
> > Code also need to map used ring (device part) as RW in, and only in,
> > vhost-net. To map (or call vhost_device_iotlb_miss) inconditionally
> > would print an error in case of vhost devices with its own mapping
> > (vdpa).
>
>
> I think we should not depend on the IOTLB miss. Instead, we should
> program the device IOTLB before starting the svq. Or is there anything
> that prevent you from doing this?
>

Sorry for being unclear, that is what I meant in the message: No other
device than kernel vhost needs the map (as "sent iotlb miss ahead"),
so we must make it conditional. Doing it unconditionally would make
nothing but an error appear on the screen, but it is incorrect anyway.

Is it clearer this way?

>
> > To know if this call is needed, vhost_sw_live_migration_start_vq and
> > vhost_sw_live_migration_stop copy the test performed in
> > vhost_dev_start. Testing for the actual backend type could be cleaner,
> > or checking for non-NULL vhost_force_iommu, enable_custom_iommu, or
> > another vhostOp. We could extract this test in its own function too,
> > so its name could give a better hint. Just copy the vhost_dev_start
> > check at the moment.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >   hw/virtio/vhost-shadow-virtqueue.c | 205 +++++++++++++++++++++++++++--
> >   hw/virtio/vhost.c                  | 134 ++++++++++++++++++-
> >   2 files changed, 325 insertions(+), 14 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index ff50f12410..6d767fe248 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -9,6 +9,7 @@
> >
> >   #include "hw/virtio/vhost-shadow-virtqueue.h"
> >   #include "hw/virtio/vhost.h"
> > +#include "hw/virtio/virtio-access.h"
> >
> >   #include "standard-headers/linux/vhost_types.h"
> >
> > @@ -48,9 +49,93 @@ typedef struct VhostShadowVirtqueue {
> >
> >       /* Virtio device */
> >       VirtIODevice *vdev;
> > +
> > +    /* Map for returning guest's descriptors */
> > +    VirtQueueElement **ring_id_maps;
> > +
> > +    /* Next head to expose to device */
> > +    uint16_t avail_idx_shadow;
> > +
> > +    /* Next free descriptor */
> > +    uint16_t free_head;
> > +
> > +    /* Last seen used idx */
> > +    uint16_t shadow_used_idx;
> > +
> > +    /* Next head to consume from device */
> > +    uint16_t used_idx;
> >   } VhostShadowVirtqueue;
> >
> > -/* Forward guest notifications */
> > +static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> > +                                    const struct iovec *iovec,
> > +                                    size_t num, bool more_descs, bool write)
> > +{
> > +    uint16_t i = svq->free_head, last = svq->free_head;
> > +    unsigned n;
> > +    uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
> > +    vring_desc_t *descs = svq->vring.desc;
> > +
> > +    if (num == 0) {
> > +        return;
> > +    }
> > +
> > +    for (n = 0; n < num; n++) {
> > +        if (more_descs || (n + 1 < num)) {
> > +            descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
> > +        } else {
> > +            descs[i].flags = flags;
> > +        }
> > +        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
> > +        descs[i].len = cpu_to_le32(iovec[n].iov_len);
> > +
> > +        last = i;
> > +        i = cpu_to_le16(descs[i].next);
> > +    }
> > +
> > +    svq->free_head = le16_to_cpu(descs[last].next);
> > +}
> > +
> > +static unsigned vhost_shadow_vq_add_split(VhostShadowVirtqueue *svq,
> > +                                          VirtQueueElement *elem)
> > +{
> > +    int head;
> > +    unsigned avail_idx;
> > +    vring_avail_t *avail = svq->vring.avail;
> > +
> > +    head = svq->free_head;
> > +
> > +    /* We need some descriptors here */
> > +    assert(elem->out_num || elem->in_num);
> > +
> > +    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
> > +                            elem->in_num > 0, false);
> > +    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
> > +
> > +    /*
> > +     * Put entry in available array (but don't update avail->idx until they
> > +     * do sync).
> > +     */
> > +    avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
> > +    avail->ring[avail_idx] = cpu_to_le16(head);
> > +    svq->avail_idx_shadow++;
> > +
> > +    /* Expose descriptors to device */
>
>
> It's better to describe the detail order.
>
> E.g "Update avail index after the descriptor is wrote"
>

Agree, I will replace it with your wording.

>
> > +    smp_wmb();
> > +    avail->idx = cpu_to_le16(svq->avail_idx_shadow);
> > +
> > +    return head;
> > +
> > +}
> > +
> > +static void vhost_shadow_vq_add(VhostShadowVirtqueue *svq,
> > +                                VirtQueueElement *elem)
> > +{
> > +    unsigned qemu_head = vhost_shadow_vq_add_split(svq, elem);
> > +
> > +    svq->ring_id_maps[qemu_head] = elem;
> > +}
> > +
> > +/* Handle guest->device notifications */
> >   static void vhost_handle_guest_kick(EventNotifier *n)
> >   {
> >       VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> > @@ -60,7 +145,67 @@ static void vhost_handle_guest_kick(EventNotifier *n)
> >           return;
> >       }
> >
> > -    event_notifier_set(&svq->kick_notifier);
> > +    /* Make available as many buffers as possible */
> > +    do {
> > +        if (virtio_queue_get_notification(svq->vq)) {
> > +            /* No more notifications until process all available */
> > +            virtio_queue_set_notification(svq->vq, false);
> > +        }
> > +
> > +        while (true) {
> > +            VirtQueueElement *elem = virtqueue_pop(svq->vq, sizeof(*elem));
> > +            if (!elem) {
> > +                break;
> > +            }
> > +
> > +            vhost_shadow_vq_add(svq, elem);
> > +            event_notifier_set(&svq->kick_notifier);
> > +        }
> > +
> > +        virtio_queue_set_notification(svq->vq, true);
> > +    } while (!virtio_queue_empty(svq->vq));
> > +}
> > +
> > +static bool vhost_shadow_vq_more_used(VhostShadowVirtqueue *svq)
> > +{
> > +    if (svq->used_idx != svq->shadow_used_idx) {
> > +        return true;
> > +    }
> > +
> > +    /* Get used idx must not be reordered */
> > +    smp_rmb();
> > +    svq->shadow_used_idx = cpu_to_le16(svq->vring.used->idx);
> > +
> > +    return svq->used_idx != svq->shadow_used_idx;
> > +}
> > +
> > +static VirtQueueElement *vhost_shadow_vq_get_buf(VhostShadowVirtqueue *svq)
> > +{
> > +    vring_desc_t *descs = svq->vring.desc;
> > +    const vring_used_t *used = svq->vring.used;
> > +    vring_used_elem_t used_elem;
> > +    uint16_t last_used;
> > +
> > +    if (!vhost_shadow_vq_more_used(svq)) {
> > +        return NULL;
> > +    }
> > +
> > +    last_used = svq->used_idx & (svq->vring.num - 1);
> > +    used_elem.id = le32_to_cpu(used->ring[last_used].id);
> > +    used_elem.len = le32_to_cpu(used->ring[last_used].len);
> > +
> > +    if (unlikely(used_elem.id >= svq->vring.num)) {
> > +        error_report("Device %s says index %u is available", svq->vdev->name,
> > +                     used_elem.id);
> > +        return NULL;
> > +    }
> > +
> > +    descs[used_elem.id].next = svq->free_head;
> > +    svq->free_head = used_elem.id;
> > +
> > +    svq->used_idx++;
> > +    svq->ring_id_maps[used_elem.id]->len = used_elem.len;
> > +    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
> >   }
> >
> >   /* Forward vhost notifications */
> > @@ -69,17 +214,33 @@ static void vhost_shadow_vq_handle_call_no_test(EventNotifier *n)
> >       VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> >                                                call_notifier);
> >       EventNotifier *masked_notifier;
> > +    VirtQueue *vq = svq->vq;
> >
> >       masked_notifier = svq->masked_notifier.n;
> >
> > -    if (!masked_notifier) {
> > -        unsigned n = virtio_get_queue_index(svq->vq);
> > -        virtio_queue_invalidate_signalled_used(svq->vdev, n);
> > -        virtio_notify_irqfd(svq->vdev, svq->vq);
> > -    } else if (!svq->masked_notifier.signaled) {
> > -        svq->masked_notifier.signaled = true;
> > -        event_notifier_set(svq->masked_notifier.n);
> > -    }
> > +    /* Make as many buffers as possible used. */
> > +    do {
> > +        unsigned i = 0;
> > +
> > +        /* TODO: Use VRING_AVAIL_F_NO_INTERRUPT */
> > +        while (true) {
> > +            g_autofree VirtQueueElement *elem = vhost_shadow_vq_get_buf(svq);
> > +            if (!elem) {
> > +                break;
> > +            }
> > +
> > +            assert(i < svq->vring.num);
> > +            virtqueue_fill(vq, elem, elem->len, i++);
> > +        }
> > +
> > +        virtqueue_flush(vq, i);
> > +        if (!masked_notifier) {
> > +            virtio_notify_irqfd(svq->vdev, svq->vq);
>
>
> Any reason that you don't use virtio_notify() here?
>

No reason but to make sure guest_notifier is used. I'm not sure if
this is an implementation detail though.

I can test to switch to virtio_notify, what would be the advantage here?

>
> > +        } else if (!svq->masked_notifier.signaled) {
> > +            svq->masked_notifier.signaled = true;
> > +            event_notifier_set(svq->masked_notifier.n);
> > +        }
>
>
> This is an example of the extra complexity if you do shadow virtqueue at
.> virtio level.
>
> If you do everything at e.g vhost-vdpa, you don't need to care about
> masked_notifer at all.
>

Correct me if I'm wrong, what you mean is to use the backend
vhost_set_vring_call to set the guest notifier (from SVQ point of
view), and then set it unconditionally. The function
vhost_virtqueue_mask would set the masked one by itself, no
modification is needed here.

Backend would still need a conditional checking if SVQ is enabled, so
it either sends call_fd to backend or to SVQ. The call to
virtqueue_fill, would still be needed if we don't want to duplicate
all the device virtio's logic in the vhost-vdpa backend.

Another possibility would be to just store guest_notifier in SVQ, and
replace it with masked notifier and back. I think this is more aligned
with what you have in mind, but it still needs changes to
vhost_virtqueue_mask. Note that the boolean store
masked_notifier.signaled is just a (maybe premature) optimization to
skip the unneeded write syscall, but it could be omitted for brevity.
Or maybe a cleaner solution is to use io_uring for this write? :).

Thanks!

> Thanks
>



  reply	other threads:[~2021-06-02 17:19 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-19 16:28 [RFC v3 00/29] vDPA software assisted live migration Eugenio Pérez
2021-05-19 16:28 ` [RFC v3 01/29] virtio: Add virtio_queue_is_host_notifier_enabled Eugenio Pérez
2021-05-19 16:28 ` [RFC v3 02/29] vhost: Save masked_notifier state Eugenio Pérez
2021-05-19 16:28 ` [RFC v3 03/29] vhost: Add VhostShadowVirtqueue Eugenio Pérez
2021-05-19 16:28 ` [RFC v3 04/29] vhost: Add x-vhost-enable-shadow-vq qmp Eugenio Pérez
2021-05-21  7:05   ` Markus Armbruster
2021-05-24  7:13     ` Eugenio Perez Martin
2021-06-08 14:23       ` Markus Armbruster
2021-06-08 15:26         ` Eugenio Perez Martin
2021-06-09 11:46           ` Markus Armbruster
2021-06-09 14:06             ` Eugenio Perez Martin
2021-05-19 16:28 ` [RFC v3 05/29] virtio: Add VIRTIO_F_QUEUE_STATE Eugenio Pérez
2021-05-19 16:28 ` [RFC v3 06/29] virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED Eugenio Pérez
2021-05-26  1:06   ` Jason Wang
2021-05-26  1:10     ` Jason Wang
2021-06-01  7:13       ` Eugenio Perez Martin
2021-06-03  3:12         ` Jason Wang
2021-05-19 16:28 ` [RFC v3 07/29] vhost: Route guest->host notification through shadow virtqueue Eugenio Pérez
2021-05-19 16:28 ` [RFC v3 08/29] vhost: Route host->guest " Eugenio Pérez
2021-05-19 16:28 ` [RFC v3 09/29] vhost: Avoid re-set masked notifier in shadow vq Eugenio Pérez
2021-05-19 16:28 ` [RFC v3 10/29] virtio: Add vhost_shadow_vq_get_vring_addr Eugenio Pérez
2021-05-19 16:28 ` [RFC v3 11/29] vhost: Add vhost_vring_pause operation Eugenio Pérez
2021-05-19 16:28 ` [RFC v3 12/29] vhost: add vhost_kernel_vring_pause Eugenio Pérez
2021-05-19 16:28 ` [RFC v3 13/29] vhost: Add vhost_get_iova_range operation Eugenio Pérez
2021-05-26  1:14   ` Jason Wang
2021-05-26 17:49     ` Eugenio Perez Martin
2021-05-27  4:51       ` Jason Wang
2021-06-01  7:17         ` Eugenio Perez Martin
2021-06-03  3:13           ` Jason Wang
2021-05-19 16:28 ` [RFC v3 14/29] vhost: add vhost_has_limited_iova_range Eugenio Pérez
2021-05-19 16:28 ` [RFC v3 15/29] vhost: Add enable_custom_iommu to VhostOps Eugenio Pérez
2021-05-31  9:01   ` Jason Wang
2021-06-01  7:49     ` Eugenio Perez Martin
2021-05-19 16:28 ` [RFC v3 16/29] vhost-vdpa: Add vhost_vdpa_enable_custom_iommu Eugenio Pérez
2021-05-19 16:28 ` [RFC v3 17/29] vhost: Shadow virtqueue buffers forwarding Eugenio Pérez
2021-06-02  9:50   ` Jason Wang
2021-06-02 17:18     ` Eugenio Perez Martin [this message]
2021-06-03  3:34       ` Jason Wang
2021-06-04  8:37         ` Eugenio Perez Martin
2021-05-19 16:28 ` [RFC v3 18/29] vhost: Use vhost_enable_custom_iommu to unmap everything if available Eugenio Pérez
2021-05-19 16:28 ` [RFC v3 19/29] vhost: Check for device VRING_USED_F_NO_NOTIFY at shadow virtqueue kick Eugenio Pérez
2021-05-19 16:28 ` [RFC v3 20/29] vhost: Use VRING_AVAIL_F_NO_INTERRUPT at device call on shadow virtqueue Eugenio Pérez
2021-05-19 16:28 ` [RFC v3 21/29] vhost: Add VhostIOVATree Eugenio Pérez
2021-05-31  9:40   ` Jason Wang
2021-06-01  8:15     ` Eugenio Perez Martin
2021-07-14  3:04       ` Jason Wang
2021-07-14  6:54         ` Eugenio Perez Martin
2021-07-14  9:14           ` Eugenio Perez Martin
2021-07-14  9:33             ` Jason Wang
2021-05-19 16:28 ` [RFC v3 22/29] vhost: Add iova_rev_maps_find_iova to IOVAReverseMaps Eugenio Pérez
2021-05-19 16:28 ` [RFC v3 23/29] vhost: Use a tree to store memory mappings Eugenio Pérez
2021-05-19 16:28 ` [RFC v3 24/29] vhost: Add iova_rev_maps_alloc Eugenio Pérez
2021-05-19 16:28 ` [RFC v3 25/29] vhost: Add custom IOTLB translations to SVQ Eugenio Pérez
2021-06-02  9:51   ` Jason Wang
2021-06-02 17:51     ` Eugenio Perez Martin
2021-06-03  3:39       ` Jason Wang
2021-06-04  9:07         ` Eugenio Perez Martin
2021-05-19 16:29 ` [RFC v3 26/29] vhost: Map in vdpa-dev Eugenio Pérez
2021-05-19 16:29 ` [RFC v3 27/29] vhost-vdpa: Implement vhost_vdpa_vring_pause operation Eugenio Pérez
2021-05-19 16:29 ` [RFC v3 28/29] vhost-vdpa: never map with vDPA listener Eugenio Pérez
2021-05-19 16:29 ` [RFC v3 29/29] vhost: Start vhost-vdpa SVQ directly Eugenio Pérez
2021-05-24  9:38 ` [RFC v3 00/29] vDPA software assisted live migration Michael S. Tsirkin
2021-05-24 10:37   ` Eugenio Perez Martin
2021-05-24 11:29     ` Michael S. Tsirkin
2021-07-19 14:13       ` Stefan Hajnoczi
2021-05-25  0:09     ` Jason Wang
2021-06-02  9:59 ` Jason Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJaqyWf7M1fjrd+kr-2bcYj+ibrqZVoREZuTiJ0i+p6dA+Dukw@mail.gmail.com \
    --to=eperezma@redhat.com \
    --cc=armbru@redhat.com \
    --cc=eli@mellanox.com \
    --cc=hanand@xilinx.com \
    --cc=jasowang@redhat.com \
    --cc=ml@napatech.com \
    --cc=mst@redhat.com \
    --cc=parav@mellanox.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=sgarzare@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=xiao.w.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).