From: Eugenio Perez Martin <eperezma@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	qemu-level <qemu-devel@nongnu.org>, Peter Xu <peterx@redhat.com>,
	virtualization <virtualization@lists.linux-foundation.org>,
	Eli Cohen <eli@mellanox.com>, Eric Blake <eblake@redhat.com>,
	Parav Pandit <parav@mellanox.com>, Cindy Lu <lulu@redhat.com>,
	"Fangyi \(Eric\)" <eric.fangyi@huawei.com>,
	Markus Armbruster <armbru@redhat.com>,
	yebiaoxiang@huawei.com, Liuxiangdong <liuxiangdong5@huawei.com>,
	Stefano Garzarella <sgarzare@redhat.com>,
	Laurent Vivier <lvivier@redhat.com>,
	Eduardo Habkost <ehabkost@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	Gautam Dawar <gdawar@xilinx.com>,
	Xiao W Wang <xiao.w.wang@intel.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Juan Quintela <quintela@redhat.com>,
	Harpreet Singh Anand <hanand@xilinx.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Lingshan <lingshan.zhu@intel.com>
Subject: Re: [PATCH v2 09/14] vhost: Add VhostIOVATree
Date: Fri, 4 Mar 2022 09:01:40 +0100
Message-ID: <CAJaqyWcsUv=Kc8up=T103wz8uy8YWd+6gK3Pm5PXwHVVMuLM2Q@mail.gmail.com>
In-Reply-To: <CACGkMEsSBGo7+ox9V4tjY6Eq_rcJp0E6jXaQA=jhNh4AfdOMdw@mail.gmail.com>

On Fri, Mar 4, 2022 at 3:04 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Fri, Mar 4, 2022 at 12:33 AM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Mon, Feb 28, 2022 at 8:06 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > >
> > > > On 2022/2/27 at 9:41 PM, Eugenio Pérez wrote:
> > > > This tree is able to look for a translated address from an IOVA address.
> > > >
> > > > At first glance it is similar to util/iova-tree. However, SVQ working on
> > > > devices with limited IOVA space need more capabilities, like allocating
> > > > IOVA chunks or performing reverse translations (qemu addresses to iova).
> > > >
> > > > The allocation capability, as "assign a free IOVA address to this chunk
> > > > of memory in qemu's address space" allows shadow virtqueue to create a
> > > > new address space that is not restricted by guest's addressable one, so
> > > > we can allocate shadow vqs vrings outside of it.
> > > >
> > > > It duplicates the tree so it can search efficiently in both directions,
> > > > and it will signal overlap if iova or the translated address is present
> > > > in any tree.
> > > >
> > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > ---
> > > >   hw/virtio/vhost-iova-tree.h |  27 +++++++
> > > >   hw/virtio/vhost-iova-tree.c | 155 ++++++++++++++++++++++++++++++++++++
> > > >   hw/virtio/meson.build       |   2 +-
> > > >   3 files changed, 183 insertions(+), 1 deletion(-)
> > > >   create mode 100644 hw/virtio/vhost-iova-tree.h
> > > >   create mode 100644 hw/virtio/vhost-iova-tree.c
> > > >
> > > > diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
> > > > new file mode 100644
> > > > index 0000000000..6a4f24e0f9
> > > > --- /dev/null
> > > > +++ b/hw/virtio/vhost-iova-tree.h
> > > > @@ -0,0 +1,27 @@
> > > > +/*
> > > > + * vhost software live migration iova tree
> > > > + *
> > > > + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
> > > > + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
> > > > + *
> > > > + * SPDX-License-Identifier: GPL-2.0-or-later
> > > > + */
> > > > +
> > > > +#ifndef HW_VIRTIO_VHOST_IOVA_TREE_H
> > > > +#define HW_VIRTIO_VHOST_IOVA_TREE_H
> > > > +
> > > > +#include "qemu/iova-tree.h"
> > > > +#include "exec/memory.h"
> > > > +
> > > > +typedef struct VhostIOVATree VhostIOVATree;
> > > > +
> > > > +VhostIOVATree *vhost_iova_tree_new(uint64_t iova_first, uint64_t iova_last);
> > > > +void vhost_iova_tree_delete(VhostIOVATree *iova_tree);
> > > > +G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_delete);
> > > > +
> > > > +const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *iova_tree,
> > > > +                                        const DMAMap *map);
> > > > +int vhost_iova_tree_map_alloc(VhostIOVATree *iova_tree, DMAMap *map);
> > > > +void vhost_iova_tree_remove(VhostIOVATree *iova_tree, const DMAMap *map);
> > > > +
> > > > +#endif
> > > > diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
> > > > new file mode 100644
> > > > index 0000000000..03496ac075
> > > > --- /dev/null
> > > > +++ b/hw/virtio/vhost-iova-tree.c
> > > > @@ -0,0 +1,155 @@
> > > > +/*
> > > > + * vhost software live migration iova tree
> > > > + *
> > > > + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
> > > > + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
> > > > + *
> > > > + * SPDX-License-Identifier: GPL-2.0-or-later
> > > > + */
> > > > +
> > > > +#include "qemu/osdep.h"
> > > > +#include "qemu/iova-tree.h"
> > > > +#include "vhost-iova-tree.h"
> > > > +
> > > > +#define iova_min_addr qemu_real_host_page_size
> > > > +
> > > > +/**
> > > > + * VhostIOVATree, able to:
> > > > + * - Translate iova address
> > > > + * - Reverse translate iova address (from translated to iova)
> > > > + * - Allocate IOVA regions for translated range (linear operation)
> > > > + */
> > > > +struct VhostIOVATree {
> > > > +    /* First addressable iova address in the device */
> > > > +    uint64_t iova_first;
> > > > +
> > > > +    /* Last addressable iova address in the device */
> > > > +    uint64_t iova_last;
> > > > +
> > > > +    /* IOVA address to qemu memory maps. */
> > > > +    IOVATree *iova_taddr_map;
> > > > +
> > > > +    /* QEMU virtual memory address to iova maps */
> > > > +    GTree *taddr_iova_map;
> > > > +};
> > > > +
> > > > +static gint vhost_iova_tree_cmp_taddr(gconstpointer a, gconstpointer b,
> > > > +                                      gpointer data)
> > > > +{
> > > > +    const DMAMap *m1 = a, *m2 = b;
> > > > +
> > > > +    if (m1->translated_addr > m2->translated_addr + m2->size) {
> > > > +        return 1;
> > > > +    }
> > > > +
> > > > +    if (m1->translated_addr + m1->size < m2->translated_addr) {
> > > > +        return -1;
> > > > +    }
> > > > +
> > > > +    /* Overlapped */
> > > > +    return 0;
> > > > +}
> > > > +
> > > > +/**
> > > > + * Create a new IOVA tree
> > > > + *
> > > > + * Returns the new IOVA tree
> > > > + */
> > > > +VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
> > > > +{
> > > > +    VhostIOVATree *tree = g_new(VhostIOVATree, 1);
> > > > +
> > > > +    /* Some devices do not like 0 addresses */
> > > > +    tree->iova_first = MAX(iova_first, iova_min_addr);
> > > > +    tree->iova_last = iova_last;
> > > > +
> > > > +    tree->iova_taddr_map = iova_tree_new();
> > > > +    tree->taddr_iova_map = g_tree_new_full(vhost_iova_tree_cmp_taddr, NULL,
> > > > +                                           NULL, g_free);
> > > > +    return tree;
> > > > +}
> > > > +
> > > > +/**
> > > > + * Delete an iova tree
> > > > + */
> > > > +void vhost_iova_tree_delete(VhostIOVATree *iova_tree)
> > > > +{
> > > > +    iova_tree_destroy(iova_tree->iova_taddr_map);
> > > > +    g_tree_unref(iova_tree->taddr_iova_map);
> > > > +    g_free(iova_tree);
> > > > +}
> > > > +
> > > > +/**
> > > > + * Find the IOVA address stored from a memory address
> > > > + *
> > > > + * @tree     The iova tree
> > > > + * @map      The map with the memory address
> > > > + *
> > > > + * Return the stored mapping, or NULL if not found.
> > > > + */
> > > > +const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *tree,
> > > > +                                        const DMAMap *map)
> > > > +{
> > > > +    return g_tree_lookup(tree->taddr_iova_map, map);
> > > > +}
> > > > +
> > > > +/**
> > > > + * Allocate a new mapping
> > > > + *
> > > > + * @tree  The iova tree
> > > > + * @map   The iova map
> > > > + *
> > > > + * Returns:
> > > > + * - IOVA_OK if the map fits in the container
> > > > + * - IOVA_ERR_INVALID if the map does not make sense (like size overflow)
> > > > + * - IOVA_ERR_OVERLAP if the tree already contains that map
> > > > + * - IOVA_ERR_NOMEM if tree cannot allocate more space.
> > > > + *
> > > > + * It returns assignated iova in map->iova if return value is VHOST_DMA_MAP_OK.
> > > > + */
> > > > +int vhost_iova_tree_map_alloc(VhostIOVATree *tree, DMAMap *map)
> > > > +{
> > > > +    /* Some vhost devices do not like addr 0. Skip first page */
> > > > +    hwaddr iova_first = tree->iova_first ?: qemu_real_host_page_size;
> > > > +    DMAMap *new;
> > > > +    int r;
> > > > +
> > > > +    if (map->translated_addr + map->size < map->translated_addr ||
> > > > +        map->perm == IOMMU_NONE) {
> > > > +        return IOVA_ERR_INVALID;
> > > > +    }
> > > > +
> > > > +    /* Check for collisions in translated addresses */
> > > > +    if (vhost_iova_tree_find_iova(tree, map)) {
> > > > +        return IOVA_ERR_OVERLAP;
> > > > +    }
> > > > +
> > > > +    /* Allocate a node in IOVA address */
> > > > +    r = iova_tree_alloc_map(tree->iova_taddr_map, map, iova_first,
> > > > +                            tree->iova_last);
> > > > +    if (r != IOVA_OK) {
> > > > +        return r;
> > > > +    }
> > > > +
> > > > +    /* Allocate node in qemu -> iova translations */
> > > > +    new = g_malloc(sizeof(*new));
> > > > +    memcpy(new, map, sizeof(*new));
> > > > +    g_tree_insert(tree->taddr_iova_map, new, new);
> > >
> > >
> > > Can the caller map two IOVA ranges to the same e.g GPA range?
> > >
> >
> > It shouldn't matter, because we are totally ignoring GPA here. HVA
> > could be more problematic.
> >
> > We call it from two places: for the shadow vring addresses and through
> > the memory listener. The SVQ vring addresses should already have
> > translated addresses separate from each other and from the guest's
> > HVA because of malloc semantics.
>
> Right, so SVQ addresses should be fine, the problem is the guest mappings.
>
> >
> > Regarding the listener, it should already report flattened memory with
> > no overlapping between the HVA chunks.
> > vhost_vdpa_listener_skipped_section should skip all problematic
> > sections if I'm not wrong.
> >
> > But I may have missed some scenarios: vdpa devices only care about
> > IOVA -> HVA translation, so two IOVAs could translate to the same HVA
> > in theory and we would not notice until we try with SVQ. Developing an
> > algorithm to handle this seems complicated at the moment: Should we
> > keep the bigger one? The last one mapped? If the listener unmaps one
> > of them, must we suddenly start translating through the one that is
> > still mapped? It seems that some kind of stacking would be needed.
> >
> > Thanks!
>
> It looks to me that we should always try to allocate new iova each
> time, even if the HVA is the same. This means we need to remove the
> reverse mapping tree.
>
> Currently we have:
>
>     /* Check for collisions in translated addresses */
>     if (vhost_iova_tree_find_iova(tree, map)) {
>         return IOVA_ERR_OVERLAP;
>     }
>
> We probably need to remove that. During translation we would instead
> iterate over the whole iova tree to get the reverse mapping, returning
> the largest possible mapping there.
>

I'm not sure that is possible. g_tree_insert() calls the comparison
function to decide where to place the new element, so it has to do
something when the node already exists. Looking at the sources, it
silently destroys the new node. If we call g_tree_replace() instead, we
get the opposite and destroy the old node. Either way, the tree is
expected to have non-overlapping keys.
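
To illustrate why overlapping keys are a problem for a tree, here is a
minimal sketch, not QEMU code. Range, range_cmp() and breaks_ordering()
are made-up names; range_cmp() only mirrors the shape of the comparator
quoted above, and I'm assuming the last byte of a range is taddr + size,
which is how that comparator reads it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for DMAMap; last byte is taddr + size. */
typedef struct {
    uint64_t taddr;
    uint64_t size;
} Range;

/* Same shape as the quoted comparator: 0 means "the ranges overlap". */
int range_cmp(const Range *a, const Range *b)
{
    if (a->taddr > b->taddr + b->size) {
        return 1;
    }
    if (a->taddr + a->size < b->taddr) {
        return -1;
    }
    return 0;
}

/*
 * Returns 1 when c compares equal to both a and b although a and b
 * compare as different, so "equal" is not transitive for these three.
 */
int breaks_ordering(const Range *a, const Range *b, const Range *c)
{
    return range_cmp(a, c) == 0 && range_cmp(b, c) == 0 &&
           range_cmp(a, b) != 0;
}

void demo_overlapping_keys(void)
{
    Range a = { 0x0, 0xfff };
    Range b = { 0x2000, 0xfff };
    Range c = { 0x0, 0x2fff };  /* spans both a and b */

    assert(range_cmp(&a, &b) == -1);
    assert(breaks_ordering(&a, &b, &c));
}
```

Once a key like c is allowed next to a and b, the comparison no longer
defines a consistent order, so a balanced tree cannot place it reliably.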

Apart from that, we would not be using this struct as a tree anymore,
so it would be better to use a list directly in that case.

But even with a list there are still questions about how to handle
overlaps. How should this deletion sequence be handled:

* Allocate translated_addr 0, size 0x1000.
* Allocate translated_addr 0, size 0x2000.
* Delete translated_addr 0, size 0x1000.

Should it delete only the first node? Both of them?
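
The two possible answers can be sketched as two removal policies on a
plain list. This is only an illustration under my own assumptions:
MapList, Entry and the list_*() helpers are hypothetical, nothing here
is taken from the series, and the last byte of an entry is assumed to
be taddr + size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LIST_MAX 8

/* Hypothetical list entry; last byte is taddr + size. */
typedef struct {
    uint64_t taddr;
    uint64_t size;
    bool used;
} Entry;

typedef struct {
    Entry e[LIST_MAX];
} MapList;

/* Store a mapping without any overlap check. Returns false when full. */
bool list_add(MapList *l, uint64_t taddr, uint64_t size)
{
    for (size_t i = 0; i < LIST_MAX; i++) {
        if (!l->e[i].used) {
            l->e[i] = (Entry){ .taddr = taddr, .size = size, .used = true };
            return true;
        }
    }
    return false;
}

size_t list_count(const MapList *l)
{
    size_t n = 0;

    for (size_t i = 0; i < LIST_MAX; i++) {
        if (l->e[i].used) {
            n++;
        }
    }
    return n;
}

/* Policy A: drop only the first entry with identical taddr and size. */
size_t list_remove_exact(MapList *l, uint64_t taddr, uint64_t size)
{
    for (size_t i = 0; i < LIST_MAX; i++) {
        Entry *e = &l->e[i];

        if (e->used && e->taddr == taddr && e->size == size) {
            e->used = false;
            return 1;
        }
    }
    return 0;
}

/* Policy B: drop every entry that overlaps [taddr, taddr + size]. */
size_t list_remove_overlapping(MapList *l, uint64_t taddr, uint64_t size)
{
    size_t removed = 0;

    for (size_t i = 0; i < LIST_MAX; i++) {
        Entry *e = &l->e[i];

        if (e->used && e->taddr <= taddr + size &&
            taddr <= e->taddr + e->size) {
            e->used = false;
            removed++;
        }
    }
    return removed;
}

void demo_deletion_policies(void)
{
    MapList a = {0}, b = {0};

    list_add(&a, 0, 0x1000);
    list_add(&a, 0, 0x2000);
    assert(list_remove_exact(&a, 0, 0x1000) == 1);
    assert(list_count(&a) == 1);    /* the 0x2000 entry survives */

    list_add(&b, 0, 0x1000);
    list_add(&b, 0, 0x2000);
    assert(list_remove_overlapping(&b, 0, 0x1000) == 2);
    assert(list_count(&b) == 0);    /* both entries are gone */
}
```

Policy A keeps translating through the bigger entry after the delete,
while policy B leaves nothing behind. Which one the memory listener
would expect is exactly the open question.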

iova-tree raises similar questions for iova. Inserting (.iova=0,
.size=0x1000) and then deleting (.iova=0, .size=0x800) deletes the
whole node, so we can no longer look up the translation of
(.iova=0x900). Is this expected?
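
The effect can be modelled with a single node and a remove-by-overlap
rule. This is a hedged sketch of the behaviour described above, not the
util/iova-tree implementation: Node, node_covers() and
node_remove_overlap() are hypothetical names, and the last byte of the
node is assumed to be iova + size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-node model; last byte is iova + size. */
typedef struct {
    uint64_t iova;
    uint64_t size;
    bool present;
} Node;

/* True if the node is still stored and contains the given address. */
bool node_covers(const Node *n, uint64_t iova)
{
    return n->present && iova >= n->iova && iova <= n->iova + n->size;
}

/* Remove by overlap: any intersection drops the whole node. */
void node_remove_overlap(Node *n, uint64_t iova, uint64_t size)
{
    if (n->present && n->iova <= iova + size &&
        iova <= n->iova + n->size) {
        n->present = false;
    }
}

void demo_partial_remove(void)
{
    Node n = { .iova = 0, .size = 0x1000, .present = true };

    assert(node_covers(&n, 0x900));
    node_remove_overlap(&n, 0, 0x800);
    /* 0x900 was not part of the removal request, but it is gone too. */
    assert(!node_covers(&n, 0x900));
}
```

An alternative would be to split the node and keep the tail, but that
is a different contract from the one described here.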

> This may degrade performance, but considering that there should not
> be many memslots most of the time, it should be fine.
>
> Thanks
>
>
> >
> > > Thanks
> > >
> > >
> > > > +    return IOVA_OK;
> > > > +}
> > > > +
> > > > +/**
> > > > + * Remove existing mappings from iova tree
> > > > + *
> > > > + * @param  iova_tree  The vhost iova tree
> > > > + * @param  map        The map to remove
> > > > + */
> > > > +void vhost_iova_tree_remove(VhostIOVATree *iova_tree, const DMAMap *map)
> > > > +{
> > > > +    const DMAMap *overlap;
> > > > +
> > > > +    iova_tree_remove(iova_tree->iova_taddr_map, map);
> > > > +    while ((overlap = vhost_iova_tree_find_iova(iova_tree, map))) {
> > > > +        g_tree_remove(iova_tree->taddr_iova_map, overlap);
> > > > +    }
> > > > +}
> > > > diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
> > > > index 2dc87613bc..6047670804 100644
> > > > --- a/hw/virtio/meson.build
> > > > +++ b/hw/virtio/meson.build
> > > > @@ -11,7 +11,7 @@ softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c'))
> > > >
> > > >   virtio_ss = ss.source_set()
> > > >   virtio_ss.add(files('virtio.c'))
> > > > -virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c'))
> > > > +virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c', 'vhost-iova-tree.c'))
> > > >   virtio_ss.add(when: 'CONFIG_VHOST_USER', if_true: files('vhost-user.c'))
> > > >   virtio_ss.add(when: 'CONFIG_VHOST_VDPA', if_true: files('vhost-vdpa.c'))
> > > >   virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: files('virtio-balloon.c'))
> > >
> >
>


