* [Qemu-devel] [RFC 0/7] Rework vhost memory region updates
@ 2017-11-29 18:50 Dr. David Alan Gilbert (git)
  2017-11-29 18:50 ` [Qemu-devel] [RFC 1/7] memory: address_space_iterate Dr. David Alan Gilbert (git)
                   ` (9 more replies)
  0 siblings, 10 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-11-29 18:50 UTC (permalink / raw)
  To: qemu-devel, imammedo; +Cc: maxime.coquelin, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hi,
  This is an experimental set, arising from a discussion with
Igor, that reworks the way the vhost code handles changes in the
physical address space layout.

Instead of updating and trying to merge sections of address
space on each add/remove callback, we wait until the commit phase,
then rebuild an ordered region list by walking the FlatView of
memory.  We compare the new list against the old one to trigger
updates.
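
In outline, the new commit-phase path looks like this (a simplified
sketch of what patches 3-6 add; error handling and the device-state
copy are elided, see the individual patches for the real code):

  static int vhost_update_mem(struct vhost_dev *dev)
  {
      struct vhost_update_mem_tmp vtmp = { .dev = dev };
      hwaddr change_start, change_end;
      int res;

      /* Walk the FlatView in order, merging abutting sections (patch 4) */
      res = address_space_iterate(&address_space_memory,
                                  vhost_update_mem_cb, &vtmp);
      if (!res) {
          /* Diff the new list against dev->mem for changed bounds (patch 5) */
          if (vhost_update_compare_list(dev, &vtmp,
                                        &change_start, &change_end)) {
              /* Copy vtmp.regions into dev->mem (patch 6) */
          }
      }
      g_free(vtmp.regions);
      return res;
  }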

Note, only very lightly tested so far, I'm just trying to see if it's
the right shape.

Igor, is this what you were intending?

Dave

Dr. David Alan Gilbert (7):
  memory: address_space_iterate
  vhost: Move log_dirty check
  vhost: New memory update functions
  vhost: update_mem_cb implementation
  vhost: Compare new and old memory lists
  vhost: Copy updated region data into device state
  vhost: Remove vhost_set_memory and children

 hw/virtio/trace-events |   8 +
 hw/virtio/vhost.c      | 424 ++++++++++++++++++++++---------------------------
 include/exec/memory.h  |  23 +++
 memory.c               |  22 +++
 4 files changed, 241 insertions(+), 236 deletions(-)

-- 
2.14.3

* [Qemu-devel] [RFC 1/7] memory: address_space_iterate
  2017-11-29 18:50 [Qemu-devel] [RFC 0/7] Rework vhost memory region updates Dr. David Alan Gilbert (git)
@ 2017-11-29 18:50 ` Dr. David Alan Gilbert (git)
  2017-11-29 18:50 ` [Qemu-devel] [RFC 2/7] vhost: Move log_dirty check Dr. David Alan Gilbert (git)
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-11-29 18:50 UTC (permalink / raw)
  To: qemu-devel, imammedo; +Cc: maxime.coquelin, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Iterate through an address space, calling a function for each
section.  The iteration is done in address order.
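
For illustration, a hypothetical caller (not part of this patch)
could count the RAM sections in the system address space like so:

  static int count_ram_sections_cb(MemoryRegionSection *mrs, void *opaque)
  {
      unsigned *count = opaque;

      if (memory_region_is_ram(mrs->mr)) {
          (*count)++;
      }
      return 0; /* returning non-zero here would stop the iteration */
  }

  /* ... and at a call site: */
  unsigned count = 0;
  address_space_iterate(&address_space_memory, count_ram_sections_cb, &count);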

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/exec/memory.h | 23 +++++++++++++++++++++++
 memory.c              | 22 ++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 5ed4042f87..f5a9df642e 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1987,6 +1987,29 @@ address_space_write_cached(MemoryRegionCache *cache, hwaddr addr,
     address_space_write(cache->as, cache->xlat + addr, MEMTXATTRS_UNSPECIFIED, buf, len);
 }
 
+/**
+ * ASIterateCallback: Function type called by address_space_iterate
+ *
+ * Return 0 to continue the iteration or non-zero to stop it.
+ *
+ * @mrs: Memory region section for this range
+ * @opaque: The opaque value passed in to the iterator.
+ */
+typedef int (*ASIterateCallback)(MemoryRegionSection *mrs, void *opaque);
+
+/**
+ * address_space_iterate: Call the function for each address range in the
+ *                        AddressSpace, in sorted order.
+ *
+ * Returns 0 on success, or the first non-zero value returned by @cb.
+ *
+ * @as: Address space to iterate over
+ * @cb: Function to call.  If the function returns non-zero, the iteration
+ *     stops.
+ * @opaque: Value to pass to the function
+ */
+int
+address_space_iterate(AddressSpace *as, ASIterateCallback cb, void *opaque);
 #endif
 
 #endif
diff --git a/memory.c b/memory.c
index e26e5a3b1d..f45137f25e 100644
--- a/memory.c
+++ b/memory.c
@@ -2810,6 +2810,28 @@ void address_space_destroy(AddressSpace *as)
     call_rcu(as, do_address_space_destroy, rcu);
 }
 
+int address_space_iterate(AddressSpace *as, ASIterateCallback cb,
+                          void *opaque)
+{
+    int res = 0;
+    FlatView *fv = address_space_to_flatview(as);
+    FlatRange *range;
+
+    flatview_ref(fv);
+
+    FOR_EACH_FLAT_RANGE(range, fv) {
+        MemoryRegionSection mrs = section_from_flat_range(range, fv);
+        res = cb(&mrs, opaque);
+        if (res) {
+            break;
+        }
+    }
+
+    flatview_unref(fv);
+
+    return res;
+}
+
 static const char *memory_region_type(MemoryRegion *mr)
 {
     if (memory_region_is_ram_device(mr)) {
-- 
2.14.3

* [Qemu-devel] [RFC 2/7] vhost: Move log_dirty check
  2017-11-29 18:50 [Qemu-devel] [RFC 0/7] Rework vhost memory region updates Dr. David Alan Gilbert (git)
  2017-11-29 18:50 ` [Qemu-devel] [RFC 1/7] memory: address_space_iterate Dr. David Alan Gilbert (git)
@ 2017-11-29 18:50 ` Dr. David Alan Gilbert (git)
  2017-11-29 18:50 ` [Qemu-devel] [RFC 3/7] vhost: New memory update functions Dr. David Alan Gilbert (git)
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-11-29 18:50 UTC (permalink / raw)
  To: qemu-devel, imammedo; +Cc: maxime.coquelin, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Move the log_dirty check from vhost_set_memory into vhost_section,
so that sections doing dirty-tracking for anything other than
migration (typically VGA areas) are filtered out in one place.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hw/virtio/trace-events |  3 +++
 hw/virtio/vhost.c      | 20 +++++++++++++-------
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 775461ae98..4a493bcd46 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -1,5 +1,8 @@
 # See docs/devel/tracing.txt for syntax documentation.
 
+# hw/virtio/vhost.c
+vhost_section(const char *name, int r) "%s:%d"
+
 # hw/virtio/virtio.c
 virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
 virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned int idx) "vq %p elem %p len %u idx %u"
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index ddc42f0f93..938253d1e8 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -27,6 +27,7 @@
 #include "hw/virtio/virtio-access.h"
 #include "migration/blocker.h"
 #include "sysemu/dma.h"
+#include "trace.h"
 
 /* enabled until disconnected backend stabilizes */
 #define _VHOST_DEBUG 1
@@ -567,18 +568,12 @@ static void vhost_set_memory(MemoryListener *listener,
                                          memory_listener);
     hwaddr start_addr = section->offset_within_address_space;
     ram_addr_t size = int128_get64(section->size);
-    bool log_dirty =
-        memory_region_get_dirty_log_mask(section->mr) & ~(1 << DIRTY_MEMORY_MIGRATION);
     int s = offsetof(struct vhost_memory, regions) +
         (dev->mem->nregions + 1) * sizeof dev->mem->regions[0];
     void *ram;
 
     dev->mem = g_realloc(dev->mem, s);
 
-    if (log_dirty) {
-        add = false;
-    }
-
     assert(size);
 
     /* Optimize no-change case. At least cirrus_vga does this a lot at this time. */
@@ -611,8 +606,19 @@ static void vhost_set_memory(MemoryListener *listener,
 
 static bool vhost_section(MemoryRegionSection *section)
 {
-    return memory_region_is_ram(section->mr) &&
+    bool result;
+    bool log_dirty = memory_region_get_dirty_log_mask(section->mr) &
+                     ~(1 << DIRTY_MEMORY_MIGRATION);
+    result = memory_region_is_ram(section->mr) &&
         !memory_region_is_rom(section->mr);
+
+    /* Vhost doesn't handle any block which is doing dirty-tracking other
+     * than migration; this typically fires on VGA areas.
+     */
+    result &= !log_dirty;
+
+    trace_vhost_section(section->mr->name, result);
+    return result;
 }
 
 static void vhost_begin(MemoryListener *listener)
-- 
2.14.3

* [Qemu-devel] [RFC 3/7] vhost: New memory update functions
  2017-11-29 18:50 [Qemu-devel] [RFC 0/7] Rework vhost memory region updates Dr. David Alan Gilbert (git)
  2017-11-29 18:50 ` [Qemu-devel] [RFC 1/7] memory: address_space_iterate Dr. David Alan Gilbert (git)
  2017-11-29 18:50 ` [Qemu-devel] [RFC 2/7] vhost: Move log_dirty check Dr. David Alan Gilbert (git)
@ 2017-11-29 18:50 ` Dr. David Alan Gilbert (git)
  2017-11-30 15:48   ` Igor Mammedov
  2017-11-29 18:50 ` [Qemu-devel] [RFC 4/7] vhost: update_mem_cb implementation Dr. David Alan Gilbert (git)
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-11-29 18:50 UTC (permalink / raw)
  To: qemu-devel, imammedo; +Cc: maxime.coquelin, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

vhost_update_mem() will replace the existing update mechanism.
It makes use of the FlatView we now have to simplify the update.
This commit just adds the basic structure.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hw/virtio/vhost.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 938253d1e8..c959a59fb3 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -629,6 +629,43 @@ static void vhost_begin(MemoryListener *listener)
     dev->mem_changed_start_addr = -1;
 }
 
+struct vhost_update_mem_tmp {
+    struct vhost_dev   *dev;
+    uint32_t nregions;
+    struct vhost_memory_region *regions;
+};
+
+/* Called for each MRS from vhost_update_mem */
+static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
+{
+    if (!vhost_section(mrs)) {
+        return 0;
+    }
+
+    /* TODO */
+    return 0;
+}
+
+static int vhost_update_mem(struct vhost_dev *dev)
+{
+    int res;
+    struct vhost_update_mem_tmp vtmp;
+    vtmp.regions = NULL;
+    vtmp.nregions = 0;
+    vtmp.dev = dev;
+
+    res = address_space_iterate(&address_space_memory,
+                                vhost_update_mem_cb, &vtmp);
+    if (res) {
+        goto out;
+    }
+
+    /* TODO */
+out:
+    g_free(vtmp.regions);
+    return res;
+}
+
 static void vhost_commit(MemoryListener *listener)
 {
     struct vhost_dev *dev = container_of(listener, struct vhost_dev,
@@ -648,6 +685,10 @@ static void vhost_commit(MemoryListener *listener)
         return;
     }
 
+    if (vhost_update_mem(dev)) {
+        return;
+    }
+
     if (dev->started) {
         start_addr = dev->mem_changed_start_addr;
         size = dev->mem_changed_end_addr - dev->mem_changed_start_addr + 1;
@@ -1523,6 +1564,9 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
         goto fail_features;
     }
 
+    if (vhost_update_mem(hdev)) {
+        goto fail_mem;
+    }
     if (vhost_dev_has_iommu(hdev)) {
         memory_listener_register(&hdev->iommu_listener, vdev->dma_as);
     }
-- 
2.14.3

* [Qemu-devel] [RFC 4/7] vhost: update_mem_cb implementation
  2017-11-29 18:50 [Qemu-devel] [RFC 0/7] Rework vhost memory region updates Dr. David Alan Gilbert (git)
                   ` (2 preceding siblings ...)
  2017-11-29 18:50 ` [Qemu-devel] [RFC 3/7] vhost: New memory update functions Dr. David Alan Gilbert (git)
@ 2017-11-29 18:50 ` Dr. David Alan Gilbert (git)
  2017-11-30 11:27   ` Igor Mammedov
  2017-11-29 18:50 ` [Qemu-devel] [RFC 5/7] vhost: Compare new and old memory lists Dr. David Alan Gilbert (git)
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-11-29 18:50 UTC (permalink / raw)
  To: qemu-devel, imammedo; +Cc: maxime.coquelin, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add the meat of update_mem_cb; this is called for each section,
to add it to our temporary region list.  The temporary list is
kept in order, so we only need to check whether the new section
can be merged with the last entry on the list.
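
For example (hypothetical numbers): if the list currently ends with
GPA 0x100000+0x10000 mapped at host 0x7f4200000000, and the next
section seen is GPA 0x110000+0x8000 at host 0x7f4200010000, both the
GPA and host ranges abut, so the existing entry just grows to
0x100000+0x18000 and no new entry is added.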

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hw/virtio/trace-events |  2 ++
 hw/virtio/vhost.c      | 55 +++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 4a493bcd46..92fadec192 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -2,6 +2,8 @@
 
 # hw/virtio/vhost.c
 vhost_section(const char *name, int r) "%s:%d"
+vhost_update_mem_cb(const char *name, uint64_t gpa, uint64_t size, uint64_t host) "%s: 0x%"PRIx64"+0x%"PRIx64" @ 0x%"PRIx64
+vhost_update_mem_cb_abut(const char *name, uint64_t new_size) "%s: 0x%"PRIx64
 
 # hw/virtio/virtio.c
 virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index c959a59fb3..7e3c6ae032 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -638,11 +638,64 @@ struct vhost_update_mem_tmp {
 /* Called for each MRS from vhost_update_mem */
 static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
 {
+    struct vhost_update_mem_tmp *vtmp = opaque;
+    struct vhost_memory_region *cur_vmr;
+    bool need_add = true;
+    uint64_t mrs_size;
+    uint64_t mrs_gpa;
+    uintptr_t mrs_host;
+
     if (!vhost_section(mrs)) {
         return 0;
     }
+    mrs_size = int128_get64(mrs->size);
+    mrs_gpa  = mrs->offset_within_address_space;
+    mrs_host = (uintptr_t)memory_region_get_ram_ptr(mrs->mr) +
+                         mrs->offset_within_region;
+
+    trace_vhost_update_mem_cb(mrs->mr->name, mrs_gpa, mrs_size,
+                              (uint64_t)mrs_host);
+
+    if (vtmp->nregions) {
+        /* Since we already have at least one region, lets see if
+         * this extends it; since we're scanning in order, we only
+         * have to look at the last one, and the FlatView that calls
+         * us shouldn't have overlaps.
+         */
+        struct vhost_memory_region *prev_vmr = vtmp->regions +
+                                               (vtmp->nregions - 1);
+        uint64_t prev_gpa_start = prev_vmr->guest_phys_addr;
+        uint64_t prev_gpa_end   = range_get_last(prev_gpa_start,
+                                                 prev_vmr->memory_size);
+        uint64_t prev_host_start = prev_vmr->userspace_addr;
+        uint64_t prev_host_end   = range_get_last(prev_host_start,
+                                                  prev_vmr->memory_size);
+
+        if (prev_gpa_end + 1 == mrs_gpa &&
+            prev_host_end + 1 == mrs_host &&
+            (!vtmp->dev->vhost_ops->vhost_backend_can_merge ||
+                vtmp->dev->vhost_ops->vhost_backend_can_merge(vtmp->dev,
+                    mrs_host, mrs_size,
+                    prev_host_start, prev_vmr->memory_size))) {
+            /* The two regions abut */
+            need_add = false;
+            mrs_size = mrs_size + prev_vmr->memory_size;
+            prev_vmr->memory_size = mrs_size;
+            trace_vhost_update_mem_cb_abut(mrs->mr->name, mrs_size);
+        }
+    }
+
+    if (need_add) {
+        vtmp->nregions++;
+        vtmp->regions = g_realloc_n(vtmp->regions, vtmp->nregions,
+                                    sizeof(vtmp->regions[0]));
+        cur_vmr = &vtmp->regions[vtmp->nregions - 1];
+        cur_vmr->guest_phys_addr = mrs_gpa;
+        cur_vmr->memory_size     = mrs_size;
+        cur_vmr->userspace_addr  = mrs_host;
+        cur_vmr->flags_padding = 0;
+    }
 
-    /* TODO */
     return 0;
 }
 
-- 
2.14.3

* [Qemu-devel] [RFC 5/7] vhost: Compare new and old memory lists
  2017-11-29 18:50 [Qemu-devel] [RFC 0/7] Rework vhost memory region updates Dr. David Alan Gilbert (git)
                   ` (3 preceding siblings ...)
  2017-11-29 18:50 ` [Qemu-devel] [RFC 4/7] vhost: update_mem_cb implementation Dr. David Alan Gilbert (git)
@ 2017-11-29 18:50 ` Dr. David Alan Gilbert (git)
  2017-11-29 18:50 ` [Qemu-devel] [RFC 6/7] vhost: Copy updated region data into device state Dr. David Alan Gilbert (git)
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-11-29 18:50 UTC (permalink / raw)
  To: qemu-devel, imammedo; +Cc: maxime.coquelin, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Compare the memory section list we've just built with the old one
and produce lower/upper bounds on the range of addresses changed.
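
For example (hypothetical numbers): if the old list is
{ 0x0+0x80000000 } and the new list after a hotplug is
{ 0x0+0x80000000, 0x100000000+0x40000000 }, the first entry matches
exactly and the second has no old counterpart, so the changed range
comes out as 0x100000000..0x13fffffff; vhost_commit can then use
that to limit the work of vhost_verify_ring_mappings.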

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hw/virtio/trace-events |  3 ++
 hw/virtio/vhost.c      | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 92fadec192..ba85a1855b 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -2,6 +2,9 @@
 
 # hw/virtio/vhost.c
 vhost_section(const char *name, int r) "%s:%d"
+vhost_update_compare_list_loopn(int newi, int oldi, uint64_t gpa, uint64_t size) "%d/%d: (new) 0x%"PRIx64"+0x%"PRIx64
+vhost_update_compare_list_loopo(int newi, int oldi, uint64_t gpa, uint64_t size) "%d/%d: (old) 0x%"PRIx64"+0x%"PRIx64
+vhost_update_mem_comparison(bool update, uint64_t start, uint64_t end) "update: %d  0x%"PRIx64" - 0x%"PRIx64
 vhost_update_mem_cb(const char *name, uint64_t gpa, uint64_t size, uint64_t host) "%s: 0x%"PRIx64"+0x%"PRIx64" @ 0x%"PRIx64
 vhost_update_mem_cb_abut(const char *name, uint64_t new_size) "%s: 0x%"PRIx64
 
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 7e3c6ae032..ae3d57df1d 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -699,10 +699,79 @@ static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
     return 0;
 }
 
+static bool vhost_update_compare_list(struct vhost_dev *dev,
+                                      struct vhost_update_mem_tmp *vtmp,
+                                      hwaddr *change_start, hwaddr *change_end)
+{
+    uint32_t oldi, newi;
+    *change_start = HWADDR_MAX;
+    *change_end = 0;
+
+    for (newi = 0, oldi = 0; newi < vtmp->nregions; newi++) {
+        struct vhost_memory_region *newr = &vtmp->regions[newi];
+        hwaddr newr_last = range_get_last(newr->guest_phys_addr,
+                                          newr->memory_size);
+        trace_vhost_update_compare_list_loopn(newi, oldi,
+                                        newr->guest_phys_addr,
+                                        newr->memory_size);
+        bool whole_change = true;
+        while (oldi < dev->mem->nregions) {
+            struct vhost_memory_region *oldr = &dev->mem->regions[oldi];
+            hwaddr oldr_last = range_get_last(oldr->guest_phys_addr,
+                                          oldr->memory_size);
+            trace_vhost_update_compare_list_loopo(newi, oldi,
+                                        oldr->guest_phys_addr,
+                                        oldr->memory_size);
+            if (newr->guest_phys_addr == oldr->guest_phys_addr &&
+                newr->memory_size == oldr->memory_size) {
+                /* Match in GPA and size, but it could be different
+                 * in host address or flags
+                 */
+                whole_change = newr->userspace_addr != oldr->userspace_addr ||
+                               newr->flags_padding != oldr->flags_padding;
+                oldi++;
+                break; /* inner old loop */
+            }
+            /* There's a difference - need to figure out what */
+            if (oldr_last < newr->guest_phys_addr) {
+                /* There used to be a region before us that's gone */
+                *change_start = MIN(*change_start, oldr->guest_phys_addr);
+                *change_end = MAX(*change_end, oldr_last);
+                oldi++;
+                continue; /* inner old loop */
+            }
+            if (oldr->guest_phys_addr > newr_last) {
+                /* We've passed all the old mappings that could have overlapped
+                 * this one
+                 */
+                break; /* Inner old loop */
+            }
+            /* Overlap case */
+            *change_start = MIN(*change_start,
+                                MIN(oldr->guest_phys_addr,
+                                    newr->guest_phys_addr));
+            *change_end = MAX(*change_end,
+                                MAX(oldr_last, newr_last));
+            whole_change = false;
+            /* There might be more old mappings that overlap */
+            oldi++;
+        }
+        if (whole_change) {
+            /* No old region to compare against, this must be a change */
+            *change_start = MIN(*change_start, newr->guest_phys_addr);
+            *change_end = MAX(*change_end, newr_last);
+        }
+    }
+
+    return *change_end || *change_start != HWADDR_MAX;
+}
+
 static int vhost_update_mem(struct vhost_dev *dev)
 {
     int res;
     struct vhost_update_mem_tmp vtmp;
+    hwaddr change_start, change_end;
+    bool need_update;
     vtmp.regions = NULL;
     vtmp.nregions = 0;
     vtmp.dev = dev;
@@ -713,6 +782,11 @@ static int vhost_update_mem(struct vhost_dev *dev)
         goto out;
     }
 
+    need_update = vhost_update_compare_list(dev, &vtmp,
+                                            &change_start, &change_end);
+    trace_vhost_update_mem_comparison(need_update,
+                                      (uint64_t)change_start,
+                                      (uint64_t)change_end);
     /* TODO */
 out:
     g_free(vtmp.regions);
-- 
2.14.3

* [Qemu-devel] [RFC 6/7] vhost: Copy updated region data into device state
  2017-11-29 18:50 [Qemu-devel] [RFC 0/7] Rework vhost memory region updates Dr. David Alan Gilbert (git)
                   ` (4 preceding siblings ...)
  2017-11-29 18:50 ` [Qemu-devel] [RFC 5/7] vhost: Compare new and old memory lists Dr. David Alan Gilbert (git)
@ 2017-11-29 18:50 ` Dr. David Alan Gilbert (git)
  2017-11-29 18:50 ` [Qemu-devel] [RFC 7/7] vhost: Remove vhost_set_memory and children Dr. David Alan Gilbert (git)
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-11-29 18:50 UTC (permalink / raw)
  To: qemu-devel, imammedo; +Cc: maxime.coquelin, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Copy the temporary region data we calculated into the device state.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hw/virtio/vhost.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index ae3d57df1d..34011f9acb 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -787,7 +787,21 @@ static int vhost_update_mem(struct vhost_dev *dev)
     trace_vhost_update_mem_comparison(need_update,
                                       (uint64_t)change_start,
                                       (uint64_t)change_end);
-    /* TODO */
+    if (need_update) {
+        /* Update the main regions list from our tmp */
+        size_t mem_size = offsetof(struct vhost_memory, regions) +
+            (vtmp.nregions + 1) * sizeof dev->mem->regions[0];
+
+        dev->mem = g_realloc(dev->mem, mem_size);
+        dev->mem->nregions = vtmp.nregions;
+        memcpy(dev->mem->regions, vtmp.regions,
+               vtmp.nregions * sizeof dev->mem->regions[0]);
+        used_memslots = vtmp.nregions;
+
+        dev->mem_changed_start_addr = change_start;
+        dev->mem_changed_end_addr = change_end;
+    }
+
 out:
     g_free(vtmp.regions);
     return res;
-- 
2.14.3

* [Qemu-devel] [RFC 7/7] vhost: Remove vhost_set_memory and children
  2017-11-29 18:50 [Qemu-devel] [RFC 0/7] Rework vhost memory region updates Dr. David Alan Gilbert (git)
                   ` (5 preceding siblings ...)
  2017-11-29 18:50 ` [Qemu-devel] [RFC 6/7] vhost: Copy updated region data into device state Dr. David Alan Gilbert (git)
@ 2017-11-29 18:50 ` Dr. David Alan Gilbert (git)
  2017-11-30 11:22 ` [Qemu-devel] [RFC 0/7] Rework vhost memory region updates Igor Mammedov
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2017-11-29 18:50 UTC (permalink / raw)
  To: qemu-devel, imammedo; +Cc: maxime.coquelin, mst

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Remove the old update mechanism, vhost_set_memory, and the functions
it uses.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hw/virtio/vhost.c | 239 ------------------------------------------------------
 1 file changed, 239 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 34011f9acb..5b9a7d7107 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -156,160 +156,6 @@ static void vhost_log_sync_range(struct vhost_dev *dev,
     }
 }
 
-/* Assign/unassign. Keep an unsorted array of non-overlapping
- * memory regions in dev->mem. */
-static void vhost_dev_unassign_memory(struct vhost_dev *dev,
-                                      uint64_t start_addr,
-                                      uint64_t size)
-{
-    int from, to, n = dev->mem->nregions;
-    /* Track overlapping/split regions for sanity checking. */
-    int overlap_start = 0, overlap_end = 0, overlap_middle = 0, split = 0;
-
-    for (from = 0, to = 0; from < n; ++from, ++to) {
-        struct vhost_memory_region *reg = dev->mem->regions + to;
-        uint64_t reglast;
-        uint64_t memlast;
-        uint64_t change;
-
-        /* clone old region */
-        if (to != from) {
-            memcpy(reg, dev->mem->regions + from, sizeof *reg);
-        }
-
-        /* No overlap is simple */
-        if (!ranges_overlap(reg->guest_phys_addr, reg->memory_size,
-                            start_addr, size)) {
-            continue;
-        }
-
-        /* Split only happens if supplied region
-         * is in the middle of an existing one. Thus it can not
-         * overlap with any other existing region. */
-        assert(!split);
-
-        reglast = range_get_last(reg->guest_phys_addr, reg->memory_size);
-        memlast = range_get_last(start_addr, size);
-
-        /* Remove whole region */
-        if (start_addr <= reg->guest_phys_addr && memlast >= reglast) {
-            --dev->mem->nregions;
-            --to;
-            ++overlap_middle;
-            continue;
-        }
-
-        /* Shrink region */
-        if (memlast >= reglast) {
-            reg->memory_size = start_addr - reg->guest_phys_addr;
-            assert(reg->memory_size);
-            assert(!overlap_end);
-            ++overlap_end;
-            continue;
-        }
-
-        /* Shift region */
-        if (start_addr <= reg->guest_phys_addr) {
-            change = memlast + 1 - reg->guest_phys_addr;
-            reg->memory_size -= change;
-            reg->guest_phys_addr += change;
-            reg->userspace_addr += change;
-            assert(reg->memory_size);
-            assert(!overlap_start);
-            ++overlap_start;
-            continue;
-        }
-
-        /* This only happens if supplied region
-         * is in the middle of an existing one. Thus it can not
-         * overlap with any other existing region. */
-        assert(!overlap_start);
-        assert(!overlap_end);
-        assert(!overlap_middle);
-        /* Split region: shrink first part, shift second part. */
-        memcpy(dev->mem->regions + n, reg, sizeof *reg);
-        reg->memory_size = start_addr - reg->guest_phys_addr;
-        assert(reg->memory_size);
-        change = memlast + 1 - reg->guest_phys_addr;
-        reg = dev->mem->regions + n;
-        reg->memory_size -= change;
-        assert(reg->memory_size);
-        reg->guest_phys_addr += change;
-        reg->userspace_addr += change;
-        /* Never add more than 1 region */
-        assert(dev->mem->nregions == n);
-        ++dev->mem->nregions;
-        ++split;
-    }
-}
-
-/* Called after unassign, so no regions overlap the given range. */
-static void vhost_dev_assign_memory(struct vhost_dev *dev,
-                                    uint64_t start_addr,
-                                    uint64_t size,
-                                    uint64_t uaddr)
-{
-    int from, to;
-    struct vhost_memory_region *merged = NULL;
-    for (from = 0, to = 0; from < dev->mem->nregions; ++from, ++to) {
-        struct vhost_memory_region *reg = dev->mem->regions + to;
-        uint64_t prlast, urlast;
-        uint64_t pmlast, umlast;
-        uint64_t s, e, u;
-
-        /* clone old region */
-        if (to != from) {
-            memcpy(reg, dev->mem->regions + from, sizeof *reg);
-        }
-        prlast = range_get_last(reg->guest_phys_addr, reg->memory_size);
-        pmlast = range_get_last(start_addr, size);
-        urlast = range_get_last(reg->userspace_addr, reg->memory_size);
-        umlast = range_get_last(uaddr, size);
-
-        /* check for overlapping regions: should never happen. */
-        assert(prlast < start_addr || pmlast < reg->guest_phys_addr);
-        /* Not an adjacent or overlapping region - do not merge. */
-        if ((prlast + 1 != start_addr || urlast + 1 != uaddr) &&
-            (pmlast + 1 != reg->guest_phys_addr ||
-             umlast + 1 != reg->userspace_addr)) {
-            continue;
-        }
-
-        if (dev->vhost_ops->vhost_backend_can_merge &&
-            !dev->vhost_ops->vhost_backend_can_merge(dev, uaddr, size,
-                                                     reg->userspace_addr,
-                                                     reg->memory_size)) {
-            continue;
-        }
-
-        if (merged) {
-            --to;
-            assert(to >= 0);
-        } else {
-            merged = reg;
-        }
-        u = MIN(uaddr, reg->userspace_addr);
-        s = MIN(start_addr, reg->guest_phys_addr);
-        e = MAX(pmlast, prlast);
-        uaddr = merged->userspace_addr = u;
-        start_addr = merged->guest_phys_addr = s;
-        size = merged->memory_size = e - s + 1;
-        assert(merged->memory_size);
-    }
-
-    if (!merged) {
-        struct vhost_memory_region *reg = dev->mem->regions + to;
-        memset(reg, 0, sizeof *reg);
-        reg->memory_size = size;
-        assert(reg->memory_size);
-        reg->guest_phys_addr = start_addr;
-        reg->userspace_addr = uaddr;
-        ++to;
-    }
-    assert(to <= dev->mem->nregions + 1);
-    dev->mem->nregions = to;
-}
-
 static uint64_t vhost_get_log_size(struct vhost_dev *dev)
 {
     uint64_t log_size = 0;
@@ -521,89 +367,6 @@ static int vhost_verify_ring_mappings(struct vhost_dev *dev,
     return r;
 }
 
-static struct vhost_memory_region *vhost_dev_find_reg(struct vhost_dev *dev,
-						      uint64_t start_addr,
-						      uint64_t size)
-{
-    int i, n = dev->mem->nregions;
-    for (i = 0; i < n; ++i) {
-        struct vhost_memory_region *reg = dev->mem->regions + i;
-        if (ranges_overlap(reg->guest_phys_addr, reg->memory_size,
-                           start_addr, size)) {
-            return reg;
-        }
-    }
-    return NULL;
-}
-
-static bool vhost_dev_cmp_memory(struct vhost_dev *dev,
-                                 uint64_t start_addr,
-                                 uint64_t size,
-                                 uint64_t uaddr)
-{
-    struct vhost_memory_region *reg = vhost_dev_find_reg(dev, start_addr, size);
-    uint64_t reglast;
-    uint64_t memlast;
-
-    if (!reg) {
-        return true;
-    }
-
-    reglast = range_get_last(reg->guest_phys_addr, reg->memory_size);
-    memlast = range_get_last(start_addr, size);
-
-    /* Need to extend region? */
-    if (start_addr < reg->guest_phys_addr || memlast > reglast) {
-        return true;
-    }
-    /* userspace_addr changed? */
-    return uaddr != reg->userspace_addr + start_addr - reg->guest_phys_addr;
-}
-
-static void vhost_set_memory(MemoryListener *listener,
-                             MemoryRegionSection *section,
-                             bool add)
-{
-    struct vhost_dev *dev = container_of(listener, struct vhost_dev,
-                                         memory_listener);
-    hwaddr start_addr = section->offset_within_address_space;
-    ram_addr_t size = int128_get64(section->size);
-    int s = offsetof(struct vhost_memory, regions) +
-        (dev->mem->nregions + 1) * sizeof dev->mem->regions[0];
-    void *ram;
-
-    dev->mem = g_realloc(dev->mem, s);
-
-    assert(size);
-
-    /* Optimize no-change case. At least cirrus_vga does this a lot at this time. */
-    ram = memory_region_get_ram_ptr(section->mr) + section->offset_within_region;
-    if (add) {
-        if (!vhost_dev_cmp_memory(dev, start_addr, size, (uintptr_t)ram)) {
-            /* Region exists with same address. Nothing to do. */
-            return;
-        }
-    } else {
-        if (!vhost_dev_find_reg(dev, start_addr, size)) {
-            /* Removing region that we don't access. Nothing to do. */
-            return;
-        }
-    }
-
-    vhost_dev_unassign_memory(dev, start_addr, size);
-    if (add) {
-        /* Add given mapping, merging adjacent regions if any */
-        vhost_dev_assign_memory(dev, start_addr, size, (uintptr_t)ram);
-    } else {
-        /* Remove old mapping for this memory, if any. */
-        vhost_dev_unassign_memory(dev, start_addr, size);
-    }
-    dev->mem_changed_start_addr = MIN(dev->mem_changed_start_addr, start_addr);
-    dev->mem_changed_end_addr = MAX(dev->mem_changed_end_addr, start_addr + size - 1);
-    dev->memory_changed = true;
-    used_memslots = dev->mem->nregions;
-}
-
 static bool vhost_section(MemoryRegionSection *section)
 {
     bool result;
@@ -880,7 +643,6 @@ static void vhost_region_add(MemoryListener *listener,
                                 dev->n_mem_sections);
     dev->mem_sections[dev->n_mem_sections - 1] = *section;
     memory_region_ref(section->mr);
-    vhost_set_memory(listener, section, true);
 }
 
 static void vhost_region_del(MemoryListener *listener,
@@ -894,7 +656,6 @@ static void vhost_region_del(MemoryListener *listener,
         return;
     }
 
-    vhost_set_memory(listener, section, false);
     memory_region_unref(section->mr);
     for (i = 0; i < dev->n_mem_sections; ++i) {
         if (dev->mem_sections[i].offset_within_address_space
-- 
2.14.3

* Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates
  2017-11-29 18:50 [Qemu-devel] [RFC 0/7] Rework vhost memory region updates Dr. David Alan Gilbert (git)
                   ` (6 preceding siblings ...)
  2017-11-29 18:50 ` [Qemu-devel] [RFC 7/7] vhost: Remove vhost_set_memory and children Dr. David Alan Gilbert (git)
@ 2017-11-30 11:22 ` Igor Mammedov
  2017-11-30 12:08   ` Dr. David Alan Gilbert
  2017-12-01 10:02 ` Stefan Hajnoczi
  2017-12-01 14:22 ` [Qemu-devel] [RFC] vhost: check if ring mapping is still valid when building memmap Igor Mammedov
  9 siblings, 1 reply; 34+ messages in thread
From: Igor Mammedov @ 2017-11-30 11:22 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, maxime.coquelin, mst

On Wed, 29 Nov 2017 18:50:19 +0000
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Hi,
>   This is an experimental set that reworks the way the vhost
> code handles changes in physical address space layout that
> came from a discussion with Igor.
Thanks for looking into it.

 
> Instead of updating and trying to merge sections of address
> space on each add/remove callback, we wait until the commit phase
> and go through and rebuild a list by walking the Flatview of
> memory and end up producing an ordered list.
> We compare the list to the old list to trigger updates.
> 
> Note, only very lightly tested so far, I'm just trying to see if it's
> the right shape.
> 
> Igor, is this what you were intending?

I was thinking about a little less intrusive approach,

where vhost_region_add/del are modified to maintain an array of
mem_sections sorted by GPA, vhost_dev::mem is dropped altogether,
and the vhost_memory_region array is built/used/freed on every
vhost_commit().
Maintaining the sorted array should roughly cost us O(2 log n) if
binary search is used.
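
Something like this, perhaps (a rough sketch only; the helper name
is made up, but dev->mem_sections/n_mem_sections are the existing
fields):

  /* Keep dev->mem_sections sorted by offset_within_address_space:
   * binary-search for the insertion index, then shift the tail up.
   */
  static void vhost_insert_section_sorted(struct vhost_dev *dev,
                                          MemoryRegionSection *section)
  {
      int lo = 0, hi = dev->n_mem_sections;

      while (lo < hi) {
          int mid = lo + (hi - lo) / 2;

          if (dev->mem_sections[mid].offset_within_address_space <
              section->offset_within_address_space) {
              lo = mid + 1;
          } else {
              hi = mid;
          }
      }

      dev->n_mem_sections++;
      dev->mem_sections = g_realloc_n(dev->mem_sections,
                                      dev->n_mem_sections,
                                      sizeof(dev->mem_sections[0]));
      memmove(&dev->mem_sections[lo + 1], &dev->mem_sections[lo],
              (dev->n_mem_sections - 1 - lo) * sizeof(dev->mem_sections[0]));
      dev->mem_sections[lo] = *section;
  }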

However, I like your idea with the iterator even more, as it has
the potential to make this even faster, O(n), if we get rid of the
quadratic and relatively complex vhost_update_compare_list().

Pls, see comments on individual patches.

> Dave
> 
> Dr. David Alan Gilbert (7):
>   memory: address_space_iterate
>   vhost: Move log_dirty check
>   vhost: New memory update functions
>   vhost: update_mem_cb implementation
>   vhost: Compare new and old memory lists
>   vhost: Copy updated region data into device state
>   vhost: Remove vhost_set_memory and children
> 
>  hw/virtio/trace-events |   8 +
>  hw/virtio/vhost.c      | 424 ++++++++++++++++++++++---------------------------
>  include/exec/memory.h  |  23 +++
>  memory.c               |  22 +++
>  4 files changed, 241 insertions(+), 236 deletions(-)
> 

* Re: [Qemu-devel] [RFC 4/7] vhost: update_mem_cb implementation
  2017-11-29 18:50 ` [Qemu-devel] [RFC 4/7] vhost: update_mem_cb implementation Dr. David Alan Gilbert (git)
@ 2017-11-30 11:27   ` Igor Mammedov
  2017-12-06 20:09     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 34+ messages in thread
From: Igor Mammedov @ 2017-11-30 11:27 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, maxime.coquelin, mst

On Wed, 29 Nov 2017 18:50:23 +0000
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Add the meat of update_mem_cb;  this is called for each region,
> to add a region to our temporary list.
> Our temporary list is in order we look to see if this
> region can be merged with the current head of list.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  hw/virtio/trace-events |  2 ++
>  hw/virtio/vhost.c      | 55 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 56 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index 4a493bcd46..92fadec192 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -2,6 +2,8 @@
>  
>  # hw/virtio/vhost.c
>  vhost_section(const char *name, int r) "%s:%d"
> +vhost_update_mem_cb(const char *name, uint64_t gpa, uint64_t size, uint64_t host) "%s: 0x%"PRIx64"+0x%"PRIx64" @ 0x%"PRIx64
> +vhost_update_mem_cb_abut(const char *name, uint64_t new_size) "%s: 0x%"PRIx64
>  
>  # hw/virtio/virtio.c
>  virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index c959a59fb3..7e3c6ae032 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -638,11 +638,64 @@ struct vhost_update_mem_tmp {
>  /* Called for each MRS from vhost_update_mem */
>  static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
>  {
> +    struct vhost_update_mem_tmp *vtmp = opaque;
> +    struct vhost_memory_region *cur_vmr;
> +    bool need_add = true;
> +    uint64_t mrs_size;
> +    uint64_t mrs_gpa;
> +    uintptr_t mrs_host;
> +
>      if (!vhost_section(mrs)) {
>          return 0;
>      }
> +    mrs_size = int128_get64(mrs->size);
> +    mrs_gpa  = mrs->offset_within_address_space;
> +    mrs_host = (uintptr_t)memory_region_get_ram_ptr(mrs->mr) +
> +                         mrs->offset_within_region;
> +
> +    trace_vhost_update_mem_cb(mrs->mr->name, mrs_gpa, mrs_size,
> +                              (uint64_t)mrs_host);
> +
> +    if (vtmp->nregions) {
What forces you to maintain a helper vhost_memory_region array
instead of a MemoryRegionSection array?

> +        /* Since we already have at least one region, lets see if
> +         * this extends it; since we're scanning in order, we only
> +         * have to look at the last one, and the FlatView that calls
> +         * us shouldn't have overlaps.
> +         */
> +        struct vhost_memory_region *prev_vmr = vtmp->regions +
> +                                               (vtmp->nregions - 1);
> +        uint64_t prev_gpa_start = prev_vmr->guest_phys_addr;
> +        uint64_t prev_gpa_end   = range_get_last(prev_gpa_start,
> +                                                 prev_vmr->memory_size);
> +        uint64_t prev_host_start = prev_vmr->userspace_addr;
> +        uint64_t prev_host_end   = range_get_last(prev_host_start,
> +                                                  prev_vmr->memory_size);
> +
> +        if (prev_gpa_end + 1 == mrs_gpa &&
> +            prev_host_end + 1 == mrs_host &&
> +            (!vtmp->dev->vhost_ops->vhost_backend_can_merge ||
> +                vtmp->dev->vhost_ops->vhost_backend_can_merge(vtmp->dev,
> +                    mrs_host, mrs_size,
> +                    prev_host_start, prev_vmr->memory_size))) {
> +            /* The two regions abut */
> +            need_add = false;
> +            mrs_size = mrs_size + prev_vmr->memory_size;
> +            prev_vmr->memory_size = mrs_size;
> +            trace_vhost_update_mem_cb_abut(mrs->mr->name, mrs_size);
> +        }
> +    }
> +
> +    if (need_add) {
> +        vtmp->nregions++;
> +        vtmp->regions = g_realloc_n(vtmp->regions, vtmp->nregions,
> +                                    sizeof(vtmp->regions[0]));
> +        cur_vmr = &vtmp->regions[vtmp->nregions - 1];
> +        cur_vmr->guest_phys_addr = mrs_gpa;
> +        cur_vmr->memory_size     = mrs_size;
> +        cur_vmr->userspace_addr  = mrs_host;
> +        cur_vmr->flags_padding = 0;
> +    }
>  
> -    /* TODO */
>      return 0;
>  }
>  

* Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates
  2017-11-30 11:22 ` [Qemu-devel] [RFC 0/7] Rework vhost memory region updates Igor Mammedov
@ 2017-11-30 12:08   ` Dr. David Alan Gilbert
  2017-11-30 12:40     ` Igor Mammedov
  0 siblings, 1 reply; 34+ messages in thread
From: Dr. David Alan Gilbert @ 2017-11-30 12:08 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, maxime.coquelin, mst

* Igor Mammedov (imammedo@redhat.com) wrote:
> On Wed, 29 Nov 2017 18:50:19 +0000
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> 
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Hi,
> >   This is an experimental set that reworks the way the vhost
> > code handles changes in physical address space layout that
> > came from a discussion with Igor.
> Thanks for looking into it.
> 
>  
> > Instead of updating and trying to merge sections of address
> > space on each add/remove callback, we wait until the commit phase
> > and go through and rebuild a list by walking the Flatview of
> > memory and end up producing an ordered list.
> > We compare the list to the old list to trigger updates.
> > 
> > Note, only very lightly tested so far, I'm just trying to see if it's
> > the right shape.
> > 
> > Igor, is this what you were intending?
> 
> I was thinking about a little less intrusive approach
> 
> where vhost_region_add/del are modified to maintain
> sorted by GPA array of mem_sections, vhost_dev::mem is dropped
> altogether and vhost_memory_region array is build/used/freed
> on every vhost_commit().
> Maintaining sorted array should roughly cost us O(2 log n) if
> binary search is used.
> 
> However I like your idea with iterator even more as it have
> potential to make it even faster O(n) if we get rid of
> quadratic and relatively complex vhost_update_compare_list().

Note vhost_update_compare_list is complex, but it is O(n): it has
nested loops, but the inner index only moves forward and oldi never
gets reset back to zero.

> Pls, see comments on individual patches.

Thanks; I have fixed a couple of bugs since I posted, so I'm
more interested in the structure/shape.  Any good ideas on how to
test it are welcome.

Dave

> 
> > Dave
> > 
> > Dr. David Alan Gilbert (7):
> >   memory: address_space_iterate
> >   vhost: Move log_dirty check
> >   vhost: New memory update functions
> >   vhost: update_mem_cb implementation
> >   vhost: Compare new and old memory lists
> >   vhost: Copy updated region data into device state
> >   vhost: Remove vhost_set_memory and children
> > 
> >  hw/virtio/trace-events |   8 +
> >  hw/virtio/vhost.c      | 424 ++++++++++++++++++++++---------------------------
> >  include/exec/memory.h  |  23 +++
> >  memory.c               |  22 +++
> >  4 files changed, 241 insertions(+), 236 deletions(-)
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates
  2017-11-30 12:08   ` Dr. David Alan Gilbert
@ 2017-11-30 12:40     ` Igor Mammedov
  2017-11-30 12:47       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 34+ messages in thread
From: Igor Mammedov @ 2017-11-30 12:40 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel, maxime.coquelin, mst

On Thu, 30 Nov 2017 12:08:06 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Igor Mammedov (imammedo@redhat.com) wrote:
> > On Wed, 29 Nov 2017 18:50:19 +0000
> > "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> >   
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Hi,
> > >   This is an experimental set that reworks the way the vhost
> > > code handles changes in physical address space layout that
> > > came from a discussion with Igor.  
> > Thanks for looking into it.
> > 
> >    
> > > Instead of updating and trying to merge sections of address
> > > space on each add/remove callback, we wait until the commit phase
> > > and go through and rebuild a list by walking the Flatview of
> > > memory and end up producing an ordered list.
> > > We compare the list to the old list to trigger updates.
> > > 
> > > Note, only very lightly tested so far, I'm just trying to see if it's
> > > the right shape.
> > > 
> > > Igor, is this what you were intending?  
> > 
> > I was thinking about a little less intrusive approach
> > 
> > where vhost_region_add/del are modified to maintain
> > sorted by GPA array of mem_sections, vhost_dev::mem is dropped
> > altogether and vhost_memory_region array is build/used/freed
> > on every vhost_commit().
> > Maintaining sorted array should roughly cost us O(2 log n) if
> > binary search is used.
> > 
> > However I like your idea with iterator even more as it have
> > potential to make it even faster O(n) if we get rid of
> > quadratic and relatively complex vhost_update_compare_list().  
> 
> Note vhost_update_compare_list is complex,
> but it is O(n) - it's
> got nested loops, but the inner loop moves forward and oldi never
> gets reset back to zero.
While skimming through the patches I overlooked it.

Anyway, why isn't memcmp(old_arr, new_arr) sufficient
to detect a change in the memory map?

> 
> > Pls, see comments on individual patches.  
> 
> Thanks; I have fixed a couple of bugs since I posted, so I'm
> more interested in structure/shape.  Any good ideas how to test
> it are welcome.
> 
> Dave
> 
> >   
> > > Dave
> > > 
> > > Dr. David Alan Gilbert (7):
> > >   memory: address_space_iterate
> > >   vhost: Move log_dirty check
> > >   vhost: New memory update functions
> > >   vhost: update_mem_cb implementation
> > >   vhost: Compare new and old memory lists
> > >   vhost: Copy updated region data into device state
> > >   vhost: Remove vhost_set_memory and children
> > > 
> > >  hw/virtio/trace-events |   8 +
> > >  hw/virtio/vhost.c      | 424 ++++++++++++++++++++++---------------------------
> > >  include/exec/memory.h  |  23 +++
> > >  memory.c               |  22 +++
> > >  4 files changed, 241 insertions(+), 236 deletions(-)
> > >   
> >   
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates
  2017-11-30 12:40     ` Igor Mammedov
@ 2017-11-30 12:47       ` Dr. David Alan Gilbert
  2017-11-30 12:58         ` Igor Mammedov
  0 siblings, 1 reply; 34+ messages in thread
From: Dr. David Alan Gilbert @ 2017-11-30 12:47 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, maxime.coquelin, mst

* Igor Mammedov (imammedo@redhat.com) wrote:
> On Thu, 30 Nov 2017 12:08:06 +0000
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Igor Mammedov (imammedo@redhat.com) wrote:
> > > On Wed, 29 Nov 2017 18:50:19 +0000
> > > "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > >   
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > 
> > > > Hi,
> > > >   This is an experimental set that reworks the way the vhost
> > > > code handles changes in physical address space layout that
> > > > came from a discussion with Igor.  
> > > Thanks for looking into it.
> > > 
> > >    
> > > > Instead of updating and trying to merge sections of address
> > > > space on each add/remove callback, we wait until the commit phase
> > > > and go through and rebuild a list by walking the Flatview of
> > > > memory and end up producing an ordered list.
> > > > We compare the list to the old list to trigger updates.
> > > > 
> > > > Note, only very lightly tested so far, I'm just trying to see if it's
> > > > the right shape.
> > > > 
> > > > Igor, is this what you were intending?  
> > > 
> > > I was thinking about a little less intrusive approach
> > > 
> > > where vhost_region_add/del are modified to maintain
> > > sorted by GPA array of mem_sections, vhost_dev::mem is dropped
> > > altogether and vhost_memory_region array is build/used/freed
> > > on every vhost_commit().
> > > Maintaining sorted array should roughly cost us O(2 log n) if
> > > binary search is used.
> > > 
> > > However I like your idea with iterator even more as it have
> > > potential to make it even faster O(n) if we get rid of
> > > quadratic and relatively complex vhost_update_compare_list().  
> > 
> > Note vhost_update_compare_list is complex,
> > but it is O(n) - it's
> > got nested loops, but the inner loop moves forward and oldi never
> > gets reset back to zero.
> While skimming through patches I've overlooked it.
> 
> Anyways,
> why memcmp(old_arr, new_arr) is not sufficient
> to detect a change in memory map?

It tells you that you've got a change, but doesn't give
the start/end of the range that's changed, and those
are used by vhost_commit to limit the work of
vhost_verify_ring_mappings.

Dave

> > 
> > > Pls, see comments on individual patches.  
> > 
> > Thanks; I have fixed a couple of bugs since I posted, so I'm
> > more interested in structure/shape.  Any good ideas how to test
> > it are welcome.
> > 
> > Dave
> > 
> > >   
> > > > Dave
> > > > 
> > > > Dr. David Alan Gilbert (7):
> > > >   memory: address_space_iterate
> > > >   vhost: Move log_dirty check
> > > >   vhost: New memory update functions
> > > >   vhost: update_mem_cb implementation
> > > >   vhost: Compare new and old memory lists
> > > >   vhost: Copy updated region data into device state
> > > >   vhost: Remove vhost_set_memory and children
> > > > 
> > > >  hw/virtio/trace-events |   8 +
> > > >  hw/virtio/vhost.c      | 424 ++++++++++++++++++++++---------------------------
> > > >  include/exec/memory.h  |  23 +++
> > > >  memory.c               |  22 +++
> > > >  4 files changed, 241 insertions(+), 236 deletions(-)
> > > >   
> > >   
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates
  2017-11-30 12:47       ` Dr. David Alan Gilbert
@ 2017-11-30 12:58         ` Igor Mammedov
  2017-11-30 13:06           ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 34+ messages in thread
From: Igor Mammedov @ 2017-11-30 12:58 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel, maxime.coquelin, mst

On Thu, 30 Nov 2017 12:47:20 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Igor Mammedov (imammedo@redhat.com) wrote:
> > On Thu, 30 Nov 2017 12:08:06 +0000
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> >   
> > > * Igor Mammedov (imammedo@redhat.com) wrote:  
> > > > On Wed, 29 Nov 2017 18:50:19 +0000
> > > > "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > > >     
> > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > 
> > > > > Hi,
> > > > >   This is an experimental set that reworks the way the vhost
> > > > > code handles changes in physical address space layout that
> > > > > came from a discussion with Igor.    
> > > > Thanks for looking into it.
> > > > 
> > > >      
> > > > > Instead of updating and trying to merge sections of address
> > > > > space on each add/remove callback, we wait until the commit phase
> > > > > and go through and rebuild a list by walking the Flatview of
> > > > > memory and end up producing an ordered list.
> > > > > We compare the list to the old list to trigger updates.
> > > > > 
> > > > > Note, only very lightly tested so far, I'm just trying to see if it's
> > > > > the right shape.
> > > > > 
> > > > > Igor, is this what you were intending?    
> > > > 
> > > > I was thinking about a little less intrusive approach
> > > > 
> > > > where vhost_region_add/del are modified to maintain
> > > > sorted by GPA array of mem_sections, vhost_dev::mem is dropped
> > > > altogether and vhost_memory_region array is build/used/freed
> > > > on every vhost_commit().
> > > > Maintaining sorted array should roughly cost us O(2 log n) if
> > > > binary search is used.
> > > > 
> > > > However I like your idea with iterator even more as it have
> > > > potential to make it even faster O(n) if we get rid of
> > > > quadratic and relatively complex vhost_update_compare_list().    
> > > 
> > > Note vhost_update_compare_list is complex,
> > > but it is O(n) - it's
> > > got nested loops, but the inner loop moves forward and oldi never
> > > gets reset back to zero.  
> > While skimming through patches I've overlooked it.
> > 
> > Anyways,
> > why memcmp(old_arr, new_arr) is not sufficient
> > to detect a change in memory map?  
> 
> It tells you that you've got a change, but doesn't give
> the start/end of the range that's changed, and those
> are used by vhost_commit to limit the work of
> vhost_verify_ring_mappings.
Isn't the memmap list sorted, and aren't
dev->mem_changed_[start|end]_addr the lowest/highest
addresses of the whole map?

If so, wouldn't getting the values directly from
the first/last entries of the array be sufficient?



> 
> Dave
> 
> > >   
> > > > Pls, see comments on individual patches.    
> > > 
> > > Thanks; I have fixed a couple of bugs since I posted, so I'm
> > > more interested in structure/shape.  Any good ideas how to test
> > > it are welcome.
> > > 
> > > Dave
> > >   
> > > >     
> > > > > Dave
> > > > > 
> > > > > Dr. David Alan Gilbert (7):
> > > > >   memory: address_space_iterate
> > > > >   vhost: Move log_dirty check
> > > > >   vhost: New memory update functions
> > > > >   vhost: update_mem_cb implementation
> > > > >   vhost: Compare new and old memory lists
> > > > >   vhost: Copy updated region data into device state
> > > > >   vhost: Remove vhost_set_memory and children
> > > > > 
> > > > >  hw/virtio/trace-events |   8 +
> > > > >  hw/virtio/vhost.c      | 424 ++++++++++++++++++++++---------------------------
> > > > >  include/exec/memory.h  |  23 +++
> > > > >  memory.c               |  22 +++
> > > > >  4 files changed, 241 insertions(+), 236 deletions(-)
> > > > >     
> > > >     
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK  
> >   
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates
  2017-11-30 12:58         ` Igor Mammedov
@ 2017-11-30 13:06           ` Dr. David Alan Gilbert
  2017-11-30 15:08             ` Igor Mammedov
  0 siblings, 1 reply; 34+ messages in thread
From: Dr. David Alan Gilbert @ 2017-11-30 13:06 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, maxime.coquelin, mst

* Igor Mammedov (imammedo@redhat.com) wrote:
> On Thu, 30 Nov 2017 12:47:20 +0000
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Igor Mammedov (imammedo@redhat.com) wrote:
> > > On Thu, 30 Nov 2017 12:08:06 +0000
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > >   
> > > > * Igor Mammedov (imammedo@redhat.com) wrote:  
> > > > > On Wed, 29 Nov 2017 18:50:19 +0000
> > > > > "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > > > >     
> > > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > > 
> > > > > > Hi,
> > > > > >   This is an experimental set that reworks the way the vhost
> > > > > > code handles changes in physical address space layout that
> > > > > > came from a discussion with Igor.    
> > > > > Thanks for looking into it.
> > > > > 
> > > > >      
> > > > > > Instead of updating and trying to merge sections of address
> > > > > > space on each add/remove callback, we wait until the commit phase
> > > > > > and go through and rebuild a list by walking the Flatview of
> > > > > > memory and end up producing an ordered list.
> > > > > > We compare the list to the old list to trigger updates.
> > > > > > 
> > > > > > Note, only very lightly tested so far, I'm just trying to see if it's
> > > > > > the right shape.
> > > > > > 
> > > > > > Igor, is this what you were intending?    
> > > > > 
> > > > > I was thinking about a little less intrusive approach,
> > > > >
> > > > > where vhost_region_add/del are modified to maintain an array
> > > > > of mem_sections sorted by GPA, vhost_dev::mem is dropped
> > > > > altogether and the vhost_memory_region array is built/used/freed
> > > > > on every vhost_commit().
> > > > > Maintaining the sorted array should cost us roughly O(2 log n)
> > > > > per update if binary search is used.
> > > > >
> > > > > However I like your idea with the iterator even more, as it has
> > > > > the potential to make this even faster, O(n), if we get rid of
> > > > > the quadratic and relatively complex vhost_update_compare_list().
> > > > 
> > > > Note vhost_update_compare_list is complex,
> > > > but it is O(n) - it's
> > > > got nested loops, but the inner loop moves forward and oldi never
> > > > gets reset back to zero.  
> > > While skimming through the patches I overlooked that.
> > >
> > > Anyway,
> > > why isn't memcmp(old_arr, new_arr) sufficient
> > > to detect a change in the memory map?
> > 
> > It tells you that you've got a change, but doesn't give
> > the start/end of the range that's changed, and those
> > are used by vhost_commit to limit the work of
> > vhost_verify_ring_mappings.
> Isn't the memmap list sorted, and aren't
> dev->mem_changed_[start|end]_addr the lowest|highest
> addresses of the whole map?
>
> If so, wouldn't getting the values directly from
> the 1st/last entries of the array be sufficient?

That wasn't my understanding from the existing code;
my understanding was that changed_start_addr is set by
vhost_region_add->vhost_set_memory when a new region is added
(or removed), and is set to the limits of the section added.
But perhaps I'm misunderstanding.

(The logic in vhost_verify_ring_mappings doesn't make sense
to me either though; if vhost_verify_ring_part_mapping returns 0
on success, why is it doing   if (!r) { break; }  surely it
should be  if (r) { break; })
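
To make the failure mode concrete, a stand-alone sketch (not the real
code; verify_part() stands in for vhost_verify_ring_part_mapping and
returns 0 for "OK, or not in the changed range"):

    #include <stdio.h>

    static int verify_part(int part)
    {
        return part == 1 ? -1 : 0;   /* pretend part 1 (avail) moved */
    }

    int main(void)
    {
        int r = 0;
        for (int part = 0; part < 3; part++) {
            r = verify_part(part);
            if (!r) {
                break;  /* current code: first success ends the loop */
            }
        }
        /* prints 0 - a false success; parts 1 and 2 were never checked */
        printf("r = %d\n", r);
        return 0;
    }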

Dave

> 
> 
> > 
> > Dave
> > 
> > > >   
> > > > > Pls, see comments on individual patches.    
> > > > 
> > > > Thanks; I have fixed a couple of bugs since I posted, so I'm
> > > > more interested in structure/shape.  Any good ideas how to test
> > > > it are welcome.
> > > > 
> > > > Dave
> > > >   
> > > > >     
> > > > > > Dave
> > > > > > 
> > > > > > Dr. David Alan Gilbert (7):
> > > > > >   memory: address_space_iterate
> > > > > >   vhost: Move log_dirty check
> > > > > >   vhost: New memory update functions
> > > > > >   vhost: update_mem_cb implementation
> > > > > >   vhost: Compare new and old memory lists
> > > > > >   vhost: Copy updated region data into device state
> > > > > >   vhost: Remove vhost_set_memory and children
> > > > > > 
> > > > > >  hw/virtio/trace-events |   8 +
> > > > > >  hw/virtio/vhost.c      | 424 ++++++++++++++++++++++---------------------------
> > > > > >  include/exec/memory.h  |  23 +++
> > > > > >  memory.c               |  22 +++
> > > > > >  4 files changed, 241 insertions(+), 236 deletions(-)
> > > > > >     
> > > > >     
> > > > --
> > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK  
> > >   
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates
  2017-11-30 13:06           ` Dr. David Alan Gilbert
@ 2017-11-30 15:08             ` Igor Mammedov
  2017-11-30 15:18               ` Dr. David Alan Gilbert
  2017-11-30 16:51               ` Greg Kurz
  0 siblings, 2 replies; 34+ messages in thread
From: Igor Mammedov @ 2017-11-30 15:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: maxime.coquelin, qemu-devel, mst, groug

On Thu, 30 Nov 2017 13:06:29 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Igor Mammedov (imammedo@redhat.com) wrote:
> > On Thu, 30 Nov 2017 12:47:20 +0000
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> >   
> > > * Igor Mammedov (imammedo@redhat.com) wrote:  
> > > > On Thu, 30 Nov 2017 12:08:06 +0000
> > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > >     
> > > > > * Igor Mammedov (imammedo@redhat.com) wrote:    
> > > > > > On Wed, 29 Nov 2017 18:50:19 +0000
> > > > > > "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > > > > >       
> > > > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > > > 
> > > > > > > Hi,
> > > > > > >   This is an experimental set that reworks the way the vhost
> > > > > > > code handles changes in physical address space layout that
> > > > > > > came from a discussion with Igor.      
> > > > > > Thanks for looking into it.
> > > > > > 
> > > > > >        
> > > > > > > Instead of updating and trying to merge sections of address
> > > > > > > space on each add/remove callback, we wait until the commit phase
> > > > > > > and go through and rebuild a list by walking the Flatview of
> > > > > > > memory and end up producing an ordered list.
> > > > > > > We compare the list to the old list to trigger updates.
> > > > > > > 
> > > > > > > Note, only very lightly tested so far, I'm just trying to see if it's
> > > > > > > the right shape.
> > > > > > > 
> > > > > > > Igor, is this what you were intending?      
> > > > > > 
> > > > > > I was thinking about a little less intrusive approach,
> > > > > >
> > > > > > where vhost_region_add/del are modified to maintain an array
> > > > > > of mem_sections sorted by GPA, vhost_dev::mem is dropped
> > > > > > altogether and the vhost_memory_region array is built/used/freed
> > > > > > on every vhost_commit().
> > > > > > Maintaining the sorted array should cost us roughly O(2 log n)
> > > > > > per update if binary search is used.
> > > > > >
> > > > > > However I like your idea with the iterator even more, as it has
> > > > > > the potential to make this even faster, O(n), if we get rid of
> > > > > > the quadratic and relatively complex vhost_update_compare_list().
> > > > > 
> > > > > Note vhost_update_compare_list is complex,
> > > > > but it is O(n) - it's
> > > > > got nested loops, but the inner loop moves forward and oldi never
> > > > > gets reset back to zero.    
> > > > While skimming through the patches I overlooked that.
> > > >
> > > > Anyway,
> > > > why isn't memcmp(old_arr, new_arr) sufficient
> > > > to detect a change in the memory map?
> > > 
> > > It tells you that you've got a change, but doesn't give
> > > the start/end of the range that's changed, and those
> > > are used by vhost_commit to limit the work of
> > > vhost_verify_ring_mappings.  
> > Isn't the memmap list sorted, and aren't
> > dev->mem_changed_[start|end]_addr the lowest|highest
> > addresses of the whole map?
> >
> > If so, wouldn't getting the values directly from
> > the 1st/last entries of the array be sufficient?
> 
> That wasn't my understanding from the existing code;
> my understanding was that changed_start_addr is set by
> vhost_region_add->vhost_set_memory when a new region is added
> (or removed), and is set to the limits of the section added.
> But perhaps I'm misunderstanding.
changed_*_addr is actually the lower/upper bound of the memory
transaction, and in practice it often includes several memory
sections that get mapped during the transaction (between begin
and commit).

but then again,
  - how expensive is vhost_verify_ring_mappings()?
  - do we really need the optimization here (perhaps finding out the changed range is more expensive)?
  - how about calling vhost_verify_ring_mappings() unconditionally?

> (The logic in vhost_verify_ring_mappings doesn't make sense
> to me either though; if vhost_verify_ring_part_mapping returns 0
> on success, why is it doing   if (!r) { break; }  surely it
> should be  if (r) { break; })
it looks like a bug (CCing Greg)

before f1f9e6c5 ("vhost: adapt vhost_verify_ring_mappings() to virtio 1 ring layout")
the logic used to be

   if changed_*_addr doesn't contain ring
      "IGNORE as we don't care"

   if changed_*_addr contains ring AND ring can't be mapped at the same place
      ABORT

with f1f9e6c5 we have 3 rings, so for any of them the following could happen:
   if "IGNORE as we don't care"
     break => false success,
     since it's possible that the remaining rings in the vq do overlap and didn't get checked

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates
  2017-11-30 15:08             ` Igor Mammedov
@ 2017-11-30 15:18               ` Dr. David Alan Gilbert
  2017-11-30 15:32                 ` Igor Mammedov
  2017-11-30 16:51               ` Greg Kurz
  1 sibling, 1 reply; 34+ messages in thread
From: Dr. David Alan Gilbert @ 2017-11-30 15:18 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: maxime.coquelin, qemu-devel, mst, groug

* Igor Mammedov (imammedo@redhat.com) wrote:
> On Thu, 30 Nov 2017 13:06:29 +0000
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Igor Mammedov (imammedo@redhat.com) wrote:
> > > On Thu, 30 Nov 2017 12:47:20 +0000
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > >   
> > > > * Igor Mammedov (imammedo@redhat.com) wrote:  
> > > > > On Thu, 30 Nov 2017 12:08:06 +0000
> > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > >     
> > > > > > * Igor Mammedov (imammedo@redhat.com) wrote:    
> > > > > > > On Wed, 29 Nov 2017 18:50:19 +0000
> > > > > > > "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > > > > > >       
> > > > > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > > > > 
> > > > > > > > Hi,
> > > > > > > >   This is an experimental set that reworks the way the vhost
> > > > > > > > code handles changes in physical address space layout that
> > > > > > > > came from a discussion with Igor.      
> > > > > > > Thanks for looking into it.
> > > > > > > 
> > > > > > >        
> > > > > > > > Instead of updating and trying to merge sections of address
> > > > > > > > space on each add/remove callback, we wait until the commit phase
> > > > > > > > and go through and rebuild a list by walking the Flatview of
> > > > > > > > memory and end up producing an ordered list.
> > > > > > > > We compare the list to the old list to trigger updates.
> > > > > > > > 
> > > > > > > > Note, only very lightly tested so far, I'm just trying to see if it's
> > > > > > > > the right shape.
> > > > > > > > 
> > > > > > > > Igor, is this what you were intending?      
> > > > > > > 
> > > > > > > I was thinking about a little less intrusive approach,
> > > > > > >
> > > > > > > where vhost_region_add/del are modified to maintain an array
> > > > > > > of mem_sections sorted by GPA, vhost_dev::mem is dropped
> > > > > > > altogether and the vhost_memory_region array is built/used/freed
> > > > > > > on every vhost_commit().
> > > > > > > Maintaining the sorted array should cost us roughly O(2 log n)
> > > > > > > per update if binary search is used.
> > > > > > >
> > > > > > > However I like your idea with the iterator even more, as it has
> > > > > > > the potential to make this even faster, O(n), if we get rid of
> > > > > > > the quadratic and relatively complex vhost_update_compare_list().
> > > > > > 
> > > > > > Note vhost_update_compare_list is complex,
> > > > > > but it is O(n) - it's
> > > > > > got nested loops, but the inner loop moves forward and oldi never
> > > > > > gets reset back to zero.    
> > > > > While skimming through the patches I overlooked that.
> > > > >
> > > > > Anyway,
> > > > > why isn't memcmp(old_arr, new_arr) sufficient
> > > > > to detect a change in the memory map?
> > > > 
> > > > It tells you that you've got a change, but doesn't give
> > > > the start/end of the range that's changed, and those
> > > > are used by vhost_commit to limit the work of
> > > > vhost_verify_ring_mappings.  
> > > Isn't the memmap list sorted, and aren't
> > > dev->mem_changed_[start|end]_addr the lowest|highest
> > > addresses of the whole map?
> > >
> > > If so, wouldn't getting the values directly from
> > > the 1st/last entries of the array be sufficient?
> > 
> > That wasn't my understanding from the existing code;
> > my understanding was that changed_start_addr is set by
> > vhost_region_add->vhost_set_memory when a new region is added
> > (or removed), and is set to the limits of the section added.
> > But perhaps I'm misunderstanding.
> changed_*_addr is actually the lower/upper bound of the memory
> transaction, and in practice it often includes several memory
> sections that get mapped during the transaction (between begin
> and commit).
>
> but then again,
>   - how expensive is vhost_verify_ring_mappings()?
>   - do we really need the optimization here (perhaps finding out the changed range is more expensive)?
>   - how about calling vhost_verify_ring_mappings() unconditionally?

My worry is that:
    vhost_verify_ring_mappings
       vhost_verify_ring_part_mapping
          vhost_memory_map & vhost_memory_unmap
            (non-iommu case...)
            cpu_physical_memory_map & cpu_physical_memory_unmap
              address_space_map/address_space_unmap
                 flatview_translate etc

so it's not cheap at all - I *think* it should end up doing very little
after it's gone all the way through that because it should already be
mapped; but still it's not trivial.
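
For reference, each ring part that overlaps the changed range does
roughly this round trip (a boiled-down sketch of the non-iommu path,
where part/part_addr/part_size are the ring part's old HVA, its GPA
and its size; error handling trimmed):

    hwaddr l = part_size;
    int r = 0;
    /* re-translating the ring's GPA through the new FlatView is
     * the expensive bit */
    void *p = cpu_physical_memory_map(part_addr, &l, true);
    if (!p || l != part_size) {
        r = -ENOMEM;        /* no longer mapped, or the mapping shrank */
    } else if (p != part) {
        r = -EBUSY;         /* the ring moved to a different HVA */
    }
    if (p) {
        cpu_physical_memory_unmap(p, l, 0, 0);
    }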

Dave

> 
> > (The logic in vhost_verify_ring_mappings doesn't make sense
> > to me either though; if vhost_verify_ring_part_mapping returns 0
> > on success, why is it doing   if (!r) { break; }  surely it
> > should be  if (r) { break; })
> it looks like a bug (CCing Greg)
>
> before f1f9e6c5 ("vhost: adapt vhost_verify_ring_mappings() to virtio 1 ring layout")
> the logic used to be
>
>    if changed_*_addr doesn't contain ring
>       "IGNORE as we don't care"
>
>    if changed_*_addr contains ring AND ring can't be mapped at the same place
>       ABORT
>
> with f1f9e6c5 we have 3 rings, so for any of them the following could happen:
>    if "IGNORE as we don't care"
>      break => false success,
>      since it's possible that the remaining rings in the vq do overlap and didn't get checked
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates
  2017-11-30 15:18               ` Dr. David Alan Gilbert
@ 2017-11-30 15:32                 ` Igor Mammedov
  2017-11-30 15:41                   ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 34+ messages in thread
From: Igor Mammedov @ 2017-11-30 15:32 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: maxime.coquelin, qemu-devel, mst, groug

On Thu, 30 Nov 2017 15:18:55 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Igor Mammedov (imammedo@redhat.com) wrote:
> > On Thu, 30 Nov 2017 13:06:29 +0000
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> >   
> > > * Igor Mammedov (imammedo@redhat.com) wrote:  
> > > > On Thu, 30 Nov 2017 12:47:20 +0000
> > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > >     
> > > > > * Igor Mammedov (imammedo@redhat.com) wrote:    
> > > > > > On Thu, 30 Nov 2017 12:08:06 +0000
> > > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > >       
> > > > > > > * Igor Mammedov (imammedo@redhat.com) wrote:      
> > > > > > > > On Wed, 29 Nov 2017 18:50:19 +0000
> > > > > > > > "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > > > > > > >         
> > > > > > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > > > > > 
> > > > > > > > > Hi,
> > > > > > > > >   This is an experimental set that reworks the way the vhost
> > > > > > > > > code handles changes in physical address space layout that
> > > > > > > > > came from a discussion with Igor.        
> > > > > > > > Thanks for looking into it.
> > > > > > > > 
> > > > > > > >          
> > > > > > > > > Instead of updating and trying to merge sections of address
> > > > > > > > > space on each add/remove callback, we wait until the commit phase
> > > > > > > > > and go through and rebuild a list by walking the Flatview of
> > > > > > > > > memory and end up producing an ordered list.
> > > > > > > > > We compare the list to the old list to trigger updates.
> > > > > > > > > 
> > > > > > > > > Note, only very lightly tested so far, I'm just trying to see if it's
> > > > > > > > > the right shape.
> > > > > > > > > 
> > > > > > > > > Igor, is this what you were intending?        
> > > > > > > > 
> > > > > > > > I was thinking about a little less intrusive approach,
> > > > > > > >
> > > > > > > > where vhost_region_add/del are modified to maintain an array
> > > > > > > > of mem_sections sorted by GPA, vhost_dev::mem is dropped
> > > > > > > > altogether and the vhost_memory_region array is built/used/freed
> > > > > > > > on every vhost_commit().
> > > > > > > > Maintaining the sorted array should cost us roughly O(2 log n)
> > > > > > > > per update if binary search is used.
> > > > > > > >
> > > > > > > > However I like your idea with the iterator even more, as it has
> > > > > > > > the potential to make this even faster, O(n), if we get rid of
> > > > > > > > the quadratic and relatively complex vhost_update_compare_list().
> > > > > > > 
> > > > > > > Note vhost_update_compare_list is complex,
> > > > > > > but it is O(n) - it's
> > > > > > > got nested loops, but the inner loop moves forward and oldi never
> > > > > > > gets reset back to zero.      
> > > > > > While skimming through the patches I overlooked that.
> > > > > >
> > > > > > Anyway,
> > > > > > why isn't memcmp(old_arr, new_arr) sufficient
> > > > > > to detect a change in the memory map?
> > > > > 
> > > > > It tells you that you've got a change, but doesn't give
> > > > > the start/end of the range that's changed, and those
> > > > > are used by vhost_commit to limit the work of
> > > > > vhost_verify_ring_mappings.    
> > > > Isn't the memmap list sorted, and aren't
> > > > dev->mem_changed_[start|end]_addr the lowest|highest
> > > > addresses of the whole map?
> > > >
> > > > If so, wouldn't getting the values directly from
> > > > the 1st/last entries of the array be sufficient?
> > > 
> > > That wasn't my understanding from the existing code;
> > > my understanding was that changed_start_addr is set by
> > > vhost_region_add->vhost_set_memory when a new region is added
> > > (or removed), and is set to the limits of the section added.
> > > But perhaps I'm misunderstanding.
> > changed_*_addr is actually the lower/upper bound of the memory
> > transaction, and in practice it often includes several memory
> > sections that get mapped during the transaction (between begin
> > and commit).
> >
> > but then again,
> >   - how expensive is vhost_verify_ring_mappings()?
> >   - do we really need the optimization here (perhaps finding out the changed range is more expensive)?
> >   - how about calling vhost_verify_ring_mappings() unconditionally?
> 
> My worry is that:
>     vhost_verify_ring_mappings
>        vhost_verify_ring_part_mapping
>           vhost_memory_map & vhost_memory_unmap
>             (non-iommu case...)
>             cpu_physical_memory_map & cpu_physical_memory_unmap
>               address_space_map/address_space_unmap
>                  flatview_translate etc
>
> so it's not cheap at all - I *think* it should end up doing very little
> after it's gone all the way through that because it should already be
> mapped; but still it's not trivial.
neither is finding out the changed range trivial.
How often will it be called, and what actual time does it take
for vhost_verify_ring_mappings and vhost_update_compare_list to complete?

note
vhost_memory_map() would be run only on ranges that
overlap with rings (typically 1), while vhost_update_compare_list()
would go over all ranges.
So the question is: does the optimization really save us anything?


> 
> Dave
> 
> >   
> > > (The logic in vhost_verify_ring_mappings doesn't make sense
> > > to me either though; if vhost_verify_ring_part_mapping returns 0
> > > on success, why is it doing   if (!r) { break; }  surely it
> > > should be  if (r) { break; })  
> > it looks like a bug (CCing Greg)
> >
> > before f1f9e6c5 ("vhost: adapt vhost_verify_ring_mappings() to virtio 1 ring layout")
> > the logic used to be
> >
> >    if changed_*_addr doesn't contain ring
> >       "IGNORE as we don't care"
> >
> >    if changed_*_addr contains ring AND ring can't be mapped at the same place
> >       ABORT
> >
> > with f1f9e6c5 we have 3 rings, so for any of them the following could happen:
> >    if "IGNORE as we don't care"
> >      break => false success,
> >      since it's possible that the remaining rings in the vq do overlap and didn't get checked
> >   
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates
  2017-11-30 15:32                 ` Igor Mammedov
@ 2017-11-30 15:41                   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert @ 2017-11-30 15:41 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: maxime.coquelin, qemu-devel, mst, groug

* Igor Mammedov (imammedo@redhat.com) wrote:
> On Thu, 30 Nov 2017 15:18:55 +0000
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Igor Mammedov (imammedo@redhat.com) wrote:
> > > On Thu, 30 Nov 2017 13:06:29 +0000
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > >   
> > > > * Igor Mammedov (imammedo@redhat.com) wrote:  
> > > > > On Thu, 30 Nov 2017 12:47:20 +0000
> > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > >     
> > > > > > * Igor Mammedov (imammedo@redhat.com) wrote:    
> > > > > > > On Thu, 30 Nov 2017 12:08:06 +0000
> > > > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > > >       
> > > > > > > > * Igor Mammedov (imammedo@redhat.com) wrote:      
> > > > > > > > > On Wed, 29 Nov 2017 18:50:19 +0000
> > > > > > > > > "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > > > > > > > >         
> > > > > > > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > > > > > > 
> > > > > > > > > > Hi,
> > > > > > > > > >   This is an experimental set that reworks the way the vhost
> > > > > > > > > > code handles changes in physical address space layout that
> > > > > > > > > > came from a discussion with Igor.        
> > > > > > > > > Thanks for looking into it.
> > > > > > > > > 
> > > > > > > > >          
> > > > > > > > > > Instead of updating and trying to merge sections of address
> > > > > > > > > > space on each add/remove callback, we wait until the commit phase
> > > > > > > > > > and go through and rebuild a list by walking the Flatview of
> > > > > > > > > > memory and end up producing an ordered list.
> > > > > > > > > > We compare the list to the old list to trigger updates.
> > > > > > > > > > 
> > > > > > > > > > Note, only very lightly tested so far, I'm just trying to see if it's
> > > > > > > > > > the right shape.
> > > > > > > > > > 
> > > > > > > > > > Igor, is this what you were intending?        
> > > > > > > > > 
> > > > > > > > > I was thinking about a little less intrusive approach,
> > > > > > > > >
> > > > > > > > > where vhost_region_add/del are modified to maintain an array
> > > > > > > > > of mem_sections sorted by GPA, vhost_dev::mem is dropped
> > > > > > > > > altogether and the vhost_memory_region array is built/used/freed
> > > > > > > > > on every vhost_commit().
> > > > > > > > > Maintaining the sorted array should cost us roughly O(2 log n)
> > > > > > > > > per update if binary search is used.
> > > > > > > > >
> > > > > > > > > However I like your idea with the iterator even more, as it has
> > > > > > > > > the potential to make this even faster, O(n), if we get rid of
> > > > > > > > > the quadratic and relatively complex vhost_update_compare_list().
> > > > > > > > 
> > > > > > > > Note vhost_update_compare_list is complex,
> > > > > > > > but it is O(n) - it's
> > > > > > > > got nested loops, but the inner loop moves forward and oldi never
> > > > > > > > gets reset back to zero.      
> > > > > > > While skimming through the patches I overlooked that.
> > > > > > >
> > > > > > > Anyway,
> > > > > > > why isn't memcmp(old_arr, new_arr) sufficient
> > > > > > > to detect a change in the memory map?
> > > > > > 
> > > > > > It tells you that you've got a change, but doesn't give
> > > > > > the start/end of the range that's changed, and those
> > > > > > are used by vhost_commit to limit the work of
> > > > > > vhost_verify_ring_mappings.    
> > > > > Isn't the memmap list sorted, and aren't
> > > > > dev->mem_changed_[start|end]_addr the lowest|highest
> > > > > addresses of the whole map?
> > > > >
> > > > > If so, wouldn't getting the values directly from
> > > > > the 1st/last entries of the array be sufficient?
> > > > 
> > > > That wasn't my understanding from the existing code;
> > > > my understanding was that changed_start_addr is set by
> > > > vhost_region_add->vhost_set_memory when a new region is added
> > > > (or removed), and is set to the limits of the section added.
> > > > But perhaps I'm misunderstanding.
> > > changed_*_addr is actually the lower/upper bound of the memory
> > > transaction, and in practice it often includes several memory
> > > sections that get mapped during the transaction (between begin
> > > and commit).
> > >
> > > but then again,
> > >   - how expensive is vhost_verify_ring_mappings()?
> > >   - do we really need the optimization here (perhaps finding out the changed range is more expensive)?
> > >   - how about calling vhost_verify_ring_mappings() unconditionally?
> > 
> > My worry is that:
> >     vhost_verify_ring_mappings
> >        vhost_verify_ring_part_mapping
> >           vhost_memory_map & vhost_memory_unmap
> >             (non-iommu case...)
> >             cpu_physical_memory_map & cpu_physical_memory_unmap
> >               address_space_map/address_space_unmap
> >                  flatview_translate etc
> >
> > so it's not cheap at all - I *think* it should end up doing very little
> > after it's gone all the way through that because it should already be
> > mapped; but still it's not trivial.
> neither is finding out the changed range trivial.
> How often will it be called, and what actual time does it take
> for vhost_verify_ring_mappings and vhost_update_compare_list to complete?
>
> note
> vhost_memory_map() would be run only on ranges that
> overlap with rings (typically 1), while vhost_update_compare_list()
> would go over all ranges.
> So the question is: does the optimization really save us anything?

Frankly I don't know - I mean, what causes these changes and how often
do they happen anyway?  In my setup here I don't normally see any.
vhost_update_compare_list is a bit complex - but it is O(n) and none of
the things it does are actually expensive - they're just simple min/max;
so it should be pretty cheap.  The case where the entries match is
especially cheap.
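
The shape of it is just a two-cursor walk over both sorted lists;
boiled right down it's this pattern (the helper names here are made
up, it's not the actual patch):

    uint32_t oldi = 0, newi;
    for (newi = 0; newi < nnew; newi++) {
        /* oldi resumes where it last stopped, so every old entry is
         * visited at most once over the whole outer loop:
         * O(nold + nnew), not O(nold * nnew) */
        while (oldi < nold && old_ends_before(oldi, newi)) {
            note_removed_range(oldi++);
        }
        if (oldi < nold && entries_match(oldi, newi)) {
            oldi++;         /* the cheap fast path: nothing changed */
        } else {
            note_changed_range(newi);
        }
    }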

Dave

> 
> > 
> > Dave
> > 
> > >   
> > > > (The logic in vhost_verify_ring_mappings doesn't make sense
> > > > to me either though; if vhost_verify_ring_part_mapping returns 0
> > > > on success, why is it doing   if (!r) { break; }  surely it
> > > > should be  if (r) { break; })  
> > > it looks like a bug (CCing Greg)
> > >
> > > before f1f9e6c5 ("vhost: adapt vhost_verify_ring_mappings() to virtio 1 ring layout")
> > > the logic used to be
> > >
> > >    if changed_*_addr doesn't contain ring
> > >       "IGNORE as we don't care"
> > >
> > >    if changed_*_addr contains ring AND ring can't be mapped at the same place
> > >       ABORT
> > >
> > > with f1f9e6c5 we have 3 rings, so for any of them the following could happen:
> > >    if "IGNORE as we don't care"
> > >      break => false success,
> > >      since it's possible that the remaining rings in the vq do overlap and didn't get checked
> > >   
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [RFC 3/7] vhost: New memory update functions
  2017-11-29 18:50 ` [Qemu-devel] [RFC 3/7] vhost: New memory update functions Dr. David Alan Gilbert (git)
@ 2017-11-30 15:48   ` Igor Mammedov
  2017-12-05 18:25     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 34+ messages in thread
From: Igor Mammedov @ 2017-11-30 15:48 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, maxime.coquelin, mst

On Wed, 29 Nov 2017 18:50:22 +0000
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> The vhost_update_mem functions will replace the existing update
> mechanism.  They make use of the FlatView we now have to make the
> update simpler.
> This commit just adds the basic structure.
see the suggestion below on dropping vhost_region_add/vhost_region_del

> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  hw/virtio/vhost.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 44 insertions(+)
> 
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 938253d1e8..c959a59fb3 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -629,6 +629,43 @@ static void vhost_begin(MemoryListener *listener)
>      dev->mem_changed_start_addr = -1;
>  }
>  
> +struct vhost_update_mem_tmp {
> +    struct vhost_dev   *dev;
> +    uint32_t nregions;
> +    struct vhost_memory_region *regions;
> +};
> +
> +/* Called for each MRS from vhost_update_mem */
> +static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
> +{
> +    if (!vhost_section(mrs)) {
> +        return 0;
> +    }
> +
it's possible to move section tracking from vhost_region_add() to here

    ++dev->n_mem_sections;
    dev->mem_sections = g_renew(MemoryRegionSection, dev->mem_sections,
                                dev->n_mem_sections);
    dev->mem_sections[dev->n_mem_sections - 1] = *section;
    memory_region_ref(section->mr);
...

> +static int vhost_update_mem(struct vhost_dev *dev)
> +{
> +    int res;
> +    struct vhost_update_mem_tmp vtmp;
> +    vtmp.regions = 0;
> +    vtmp.nregions = 0;
> +    vtmp.dev = dev;
       
 iterate over dev->mem_sections:
    memory_region_unref(section->mr);
 g_free(dev->mem_sections);
 dev->mem_sections = NULL;

with this you won't need the vhost_region_add/del callbacks anymore
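
i.e. something along these lines (only a sketch of the idea, reusing
the names from your patch):

    static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
    {
        struct vhost_update_mem_tmp *vtmp = opaque;
        struct vhost_dev *dev = vtmp->dev;

        if (!vhost_section(mrs)) {
            return 0;
        }

        /* track the section here instead of in vhost_region_add() */
        ++dev->n_mem_sections;
        dev->mem_sections = g_renew(MemoryRegionSection, dev->mem_sections,
                                    dev->n_mem_sections);
        dev->mem_sections[dev->n_mem_sections - 1] = *mrs;
        memory_region_ref(mrs->mr);

        /* ... build vtmp->regions as before ... */
        return 0;
    }

and vhost_update_mem() drops the previous set before iterating:

    for (i = 0; i < dev->n_mem_sections; i++) {
        memory_region_unref(dev->mem_sections[i].mr);
    }
    g_free(dev->mem_sections);
    dev->mem_sections = NULL;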


> +
> +    res = address_space_iterate(&address_space_memory,
> +                                vhost_update_mem_cb, &vtmp);
> +    if (res) {
> +        goto out;
> +    }
> +
> +    /* TODO */
> +out:
> +    g_free(vtmp.regions);
> +    return res;
> +}
> +
>  static void vhost_commit(MemoryListener *listener)
>  {
>      struct vhost_dev *dev = container_of(listener, struct vhost_dev,
> @@ -648,6 +685,10 @@ static void vhost_commit(MemoryListener *listener)
>          return;
>      }
>  
> +    if (vhost_update_mem(dev)) {
> +        return;
> +    }
> +
>      if (dev->started) {
>          start_addr = dev->mem_changed_start_addr;
>          size = dev->mem_changed_end_addr - dev->mem_changed_start_addr + 1;
> @@ -1523,6 +1564,9 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
>          goto fail_features;
>      }
>  
> +    if (vhost_update_mem(hdev)) {
> +        goto fail_mem;
> +    }
>      if (vhost_dev_has_iommu(hdev)) {
>          memory_listener_register(&hdev->iommu_listener, vdev->dma_as);
>      }

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates
  2017-11-30 15:08             ` Igor Mammedov
  2017-11-30 15:18               ` Dr. David Alan Gilbert
@ 2017-11-30 16:51               ` Greg Kurz
  1 sibling, 0 replies; 34+ messages in thread
From: Greg Kurz @ 2017-11-30 16:51 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: Dr. David Alan Gilbert, maxime.coquelin, qemu-devel, mst

On Thu, 30 Nov 2017 16:08:44 +0100
Igor Mammedov <imammedo@redhat.com> wrote:
[...]
> > (The logic in vhost_verify_ring_mappings doesn't make sense
> > to me either though; if vhost_verify_ring_part_mapping returns 0
> > on success, why is it doing   if (!r) { break; }  surely it
> > should be  if (r) { break; })  
> it looks like a bug (CCing Greg)
> 

Wow! It's obviously a bug indeed and I'm amazed it didn't get caught
during the review :-\

I'll send a patch ASAP.

> before f1f9e6c5 ("vhost: adapt vhost_verify_ring_mappings() to virtio 1 ring layout")
> the logic used to be
>
>    if changed_*_addr doesn't contain ring
>       "IGNORE as we don't care"
>
>    if changed_*_addr contains ring AND ring can't be mapped at the same place
>       ABORT
>
> with f1f9e6c5 we have 3 rings, so for any of them the following could happen:
>    if "IGNORE as we don't care"
>      break => false success,
>      since it's possible that the remaining rings in the vq do overlap and didn't get checked
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates
  2017-11-29 18:50 [Qemu-devel] [RFC 0/7] Rework vhost memory region updates Dr. David Alan Gilbert (git)
                   ` (7 preceding siblings ...)
  2017-11-30 11:22 ` [Qemu-devel] [RFC 0/7] Rework vhost memory region updates Igor Mammedov
@ 2017-12-01 10:02 ` Stefan Hajnoczi
  2017-12-01 10:19   ` Dr. David Alan Gilbert
  2017-12-01 14:22 ` [Qemu-devel] [RFC] vhost: check if ring mapping is still valid when building memmap Igor Mammedov
  9 siblings, 1 reply; 34+ messages in thread
From: Stefan Hajnoczi @ 2017-12-01 10:02 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, imammedo, maxime.coquelin, mst

On Wed, Nov 29, 2017 at 06:50:19PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Hi,
>   This is an experimental set that reworks the way the vhost
> code handles changes in physical address space layout that
> came from a discussion with Igor.
> 
> Instead of updating and trying to merge sections of address
> space on each add/remove callback, we wait until the commit phase
> and go through and rebuild a list by walking the Flatview of
> memory and end up producing an ordered list.
> We compare the list to the old list to trigger updates.
> 
> Note, only very lightly tested so far, I'm just trying to see if it's
> the right shape.

The cover letter doesn't mention why this change is necessary.  It would
be helpful to know :).

Stefan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [RFC 0/7] Rework vhost memory region updates
  2017-12-01 10:02 ` Stefan Hajnoczi
@ 2017-12-01 10:19   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert @ 2017-12-01 10:19 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel, imammedo, maxime.coquelin, mst

* Stefan Hajnoczi (stefanha@gmail.com) wrote:
> On Wed, Nov 29, 2017 at 06:50:19PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Hi,
> >   This is an experimental set that reworks the way the vhost
> > code handles changes in physical address space layout that
> > came from a discussion with Igor.
> > 
> > Instead of updating and trying to merge sections of address
> > space on each add/remove callback, we wait until the commit phase
> > and go through and rebuild a list by walking the Flatview of
> > memory and end up producing an ordered list.
> > We compare the list to the old list to trigger updates.
> > 
> > Note, only very lightly tested so far, I'm just trying to see if it's
> > the right shape.
> 
> The cover letter doesn't mention why this change is necessary.  It would
> be helpful to know :).

The hope was to simplify it, and in particular make it easier for me
to add the bits I needed for postcopy support;  in the current code it
does a fairly complex merge procedure on each section add/del call and
doesn't keep the structures in order.   Trying to add any more rules to
that just makes it more and more complex.

The hope is that this is simpler because we just do a linear walk of the
set of sections, which means any merging just happens with the top
element, and also the list is in order, which is easier to work with.
Then it should be easier for me to add the postcopy stuff, which needs
to do some hugepage merging.

(Bizarrely, after having rewritten the code from scratch completely
differently, the line counts are within a few lines of each other.)
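
Because the FlatView walk hands us sections in GPA order, the merge
test only ever has to look at the last entry we produced; cut right
down, the step looks like this (host_abuts() standing in for the real
HVA-contiguity check, using the mrs_gpa/mrs_size/mrs_host values the
callback already computes):

    if (vtmp->nregions) {
        struct vhost_memory_region *prev = &vtmp->regions[vtmp->nregions - 1];
        uint64_t prev_end = prev->guest_phys_addr + prev->memory_size;

        if (prev_end == mrs_gpa && host_abuts(prev, mrs_host)) {
            prev->memory_size += mrs_size;   /* extend in place */
            return 0;
        }
    }
    /* otherwise append a new region to the temporary list */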

Dave

> Stefan


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] [RFC] vhost: check if ring mapping is still valid when building memmap
  2017-11-29 18:50 [Qemu-devel] [RFC 0/7] Rework vhost memory region updates Dr. David Alan Gilbert (git)
                   ` (8 preceding siblings ...)
  2017-12-01 10:02 ` Stefan Hajnoczi
@ 2017-12-01 14:22 ` Igor Mammedov
  2017-12-07 18:17   ` Dr. David Alan Gilbert
  2017-12-08 20:45   ` Dr. David Alan Gilbert
  9 siblings, 2 replies; 34+ messages in thread
From: Igor Mammedov @ 2017-12-01 14:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgilbert, mst, groug, stefanha

vhost_verify_ring_mappings() is used to verify that
the rings are still accessible and the related memory
hasn't been moved after the flatview is updated.

It does its checks by mapping each ring's GPA+len and
checking that the HVA hasn't changed with the new
memory map.  To avoid a possibly expensive mapping
call, we identify the address range that changed and
do the mapping only if the ring is in the changed range.

However, it's not necessary to perform the ring's GPA
mapping, as we already have its current HVA, and all
we need is to verify that the ring's GPA translates to
the same HVA in the updated flatview.

That can be done while we are building the vhost
memmap, where vhost_update_mem_cb() already maps every
RAM MemoryRegionSection to get the section HVA.  This
fact can be used to check that the ring belongs to the
same MemoryRegion in the same place, like this:

  hva_ring_offset = GPA(ring) - GPA(MemoryRegionSection)
  ring_hva == HVA(MemoryRegionSection) + hva_ring_offset
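
For example (made-up numbers): for a section at GPA 0x100000 mapped
at HVA 0x7f2500000000, a ring at GPA 0x180000 gives

  hva_ring_offset = 0x180000 - 0x100000 = 0x80000
  expected ring_hva = 0x7f2500000000 + 0x80000 = 0x7f2500080000

and any other ring_hva means the ring's memory was remapped.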

Doing this allows us to drop ~100 LOC of code which
figures out the changed memory range, and avoids the
unneeded reverse vhost_memory_map(is_write=true) call,
which is overkill for the task at hand.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
PS:
   should be applied on top of David's series

---
 include/hw/virtio/vhost.h |   2 -
 hw/virtio/vhost.c         | 195 ++++++++++++++--------------------------------
 2 files changed, 57 insertions(+), 140 deletions(-)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 467dc77..fc4af5c 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -68,8 +68,6 @@ struct vhost_dev {
     uint64_t log_size;
     Error *migration_blocker;
     bool memory_changed;
-    hwaddr mem_changed_start_addr;
-    hwaddr mem_changed_end_addr;
     const VhostOps *vhost_ops;
     void *opaque;
     struct vhost_log *log;
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 5b9a7d7..026bac3 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -296,35 +296,43 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer,
     }
 }
 
-static int vhost_verify_ring_part_mapping(struct vhost_dev *dev,
-                                          void *part,
-                                          uint64_t part_addr,
-                                          uint64_t part_size,
-                                          uint64_t start_addr,
-                                          uint64_t size)
+static int vhost_verify_ring_part_mapping(void *ring_hva,
+                                          uint64_t ring_gpa,
+                                          uint64_t ring_size,
+                                          void *reg_hva,
+                                          uint64_t reg_gpa,
+                                          uint64_t reg_size)
 {
-    hwaddr l;
-    void *p;
-    int r = 0;
+    uint64_t hva_ring_offset;
+    uint64_t ring_last = range_get_last(ring_gpa, ring_size);
+    uint64_t reg_last = range_get_last(reg_gpa, reg_size);
 
-    if (!ranges_overlap(start_addr, size, part_addr, part_size)) {
+    /* start the check from the end so that the rest of the checks
+     * are done only when the whole ring is in the merged sections range
+     */
+    if (ring_last < reg_last || ring_gpa > reg_last) {
         return 0;
     }
-    l = part_size;
-    p = vhost_memory_map(dev, part_addr, &l, 1);
-    if (!p || l != part_size) {
-        r = -ENOMEM;
+
+    /* check that the whole ring is mapped */
+    if (range_get_last(ring_gpa, ring_size) >
+        range_get_last(reg_gpa, reg_size)) {
+        return -EBUSY;
     }
-    if (p != part) {
-        r = -EBUSY;
+
+    /* check that ring's MemoryRegion wasn't replaced */
+    hva_ring_offset = ring_gpa - reg_gpa;
+    if (ring_hva != reg_hva + hva_ring_offset) {
+        return -ENOMEM;
     }
-    vhost_memory_unmap(dev, p, l, 0, 0);
-    return r;
+
+    return 0;
 }
 
 static int vhost_verify_ring_mappings(struct vhost_dev *dev,
-                                      uint64_t start_addr,
-                                      uint64_t size)
+                                      void *reg_hva,
+                                      uint64_t reg_gpa,
+                                      uint64_t reg_size)
 {
     int i, j;
     int r = 0;
@@ -338,23 +346,26 @@ static int vhost_verify_ring_mappings(struct vhost_dev *dev,
         struct vhost_virtqueue *vq = dev->vqs + i;
 
         j = 0;
-        r = vhost_verify_ring_part_mapping(dev, vq->desc, vq->desc_phys,
-                                           vq->desc_size, start_addr, size);
-        if (!r) {
+        r = vhost_verify_ring_part_mapping(
+                vq->desc, vq->desc_phys, vq->desc_size,
+                reg_hva, reg_gpa, reg_size);
+        if (r) {
             break;
         }
 
         j++;
-        r = vhost_verify_ring_part_mapping(dev, vq->avail, vq->avail_phys,
-                                           vq->avail_size, start_addr, size);
-        if (!r) {
+        r = vhost_verify_ring_part_mapping(
+                vq->avail, vq->avail_phys, vq->avail_size,
+                reg_hva, reg_gpa, reg_size);
+        if (r) {
             break;
         }
 
         j++;
-        r = vhost_verify_ring_part_mapping(dev, vq->used, vq->used_phys,
-                                           vq->used_size, start_addr, size);
-        if (!r) {
+        r = vhost_verify_ring_part_mapping(
+                vq->used, vq->used_phys, vq->used_size,
+                reg_hva, reg_gpa, reg_size);
+        if (r) {
             break;
         }
     }
@@ -384,24 +395,18 @@ static bool vhost_section(MemoryRegionSection *section)
     return result;
 }
 
-static void vhost_begin(MemoryListener *listener)
-{
-    struct vhost_dev *dev = container_of(listener, struct vhost_dev,
-                                         memory_listener);
-    dev->mem_changed_end_addr = 0;
-    dev->mem_changed_start_addr = -1;
-}
-
 struct vhost_update_mem_tmp {
     struct vhost_dev   *dev;
     uint32_t nregions;
     struct vhost_memory_region *regions;
 };
 
+
 /* Called for each MRS from vhost_update_mem */
 static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
 {
     struct vhost_update_mem_tmp *vtmp = opaque;
+    struct vhost_dev   *dev = vtmp->dev;
     struct vhost_memory_region *cur_vmr;
     bool need_add = true;
     uint64_t mrs_size;
@@ -416,8 +421,7 @@ static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
     mrs_host = (uintptr_t)memory_region_get_ram_ptr(mrs->mr) +
                          mrs->offset_within_region;
 
-    trace_vhost_update_mem_cb(mrs->mr->name, mrs_gpa, mrs_size,
-                              (uint64_t)mrs_host);
+    trace_vhost_update_mem_cb(mrs->mr->name, mrs_gpa, mrs_size, mrs_host);
 
     if (vtmp->nregions) {
         /* Since we already have at least one region, lets see if
@@ -459,85 +463,19 @@ static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
         cur_vmr->flags_padding = 0;
     }
 
-    return 0;
-}
-
-static bool vhost_update_compare_list(struct vhost_dev *dev,
-                                      struct vhost_update_mem_tmp *vtmp,
-                                      hwaddr *change_start, hwaddr *change_end)
-{
-    uint32_t oldi, newi;
-    *change_start = 0;
-    *change_end = 0;
-
-    for (newi = 0, oldi = 0; newi < vtmp->nregions; newi++) {
-        struct vhost_memory_region *newr = &vtmp->regions[newi];
-        hwaddr newr_last = range_get_last(newr->guest_phys_addr,
-                                          newr->memory_size);
-        trace_vhost_update_compare_list_loopn(newi, oldi,
-                                        newr->guest_phys_addr,
-                                        newr->memory_size);
-        bool whole_change = true;
-        while (oldi < dev->mem->nregions) {
-            struct vhost_memory_region *oldr = &dev->mem->regions[oldi];
-            hwaddr oldr_last = range_get_last(oldr->guest_phys_addr,
-                                          oldr->memory_size);
-            trace_vhost_update_compare_list_loopo(newi, oldi,
-                                        oldr->guest_phys_addr,
-                                        oldr->memory_size);
-            if (newr->guest_phys_addr == oldr->guest_phys_addr &&
-                newr->memory_size == oldr->memory_size) {
-                /* Match in GPA and size, but it could be different
-                 * in host address or flags
-                 */
-                whole_change = newr->userspace_addr != oldr->userspace_addr ||
-                               newr->flags_padding != oldr->flags_padding;
-                oldi++;
-                break; /* inner old loop */
-            }
-            /* There's a difference - need to figure out what */
-            if (oldr_last < newr->guest_phys_addr) {
-                /* There used to be a region before us that's gone */
-                *change_start = MIN(*change_start, oldr->guest_phys_addr);
-                *change_end = MAX(*change_end, oldr_last);
-                oldi++;
-                continue; /* inner old loop */
-            }
-            if (oldr->guest_phys_addr > newr_last) {
-                /* We've passed all the old mappings that could have overlapped
-                 * this one
-                 */
-                break; /* Inner old loop */
-            }
-            /* Overlap case */
-            *change_start = MIN(*change_start,
-                                MIN(oldr->guest_phys_addr,
-                                    newr->guest_phys_addr));
-            *change_end = MAX(*change_end,
-                                MAX(oldr_last, newr_last));
-            whole_change = false;
-            /* There might be more old mappings that overlap */
-            oldi++;
-        }
-        if (whole_change) {
-            /* No old region to compare against, this must be a change */
-            *change_start = MIN(*change_start, newr->guest_phys_addr);
-            *change_end = MAX(*change_end, newr_last);
-        }
+    if (dev->started &&
+        vhost_verify_ring_mappings(dev, (void *)mrs_host, mrs_gpa, mrs_size)) {
+        abort();
     }
 
-    return *change_start || *change_end;
+    return 0;
 }
 
 static int vhost_update_mem(struct vhost_dev *dev)
 {
     int res;
-    struct vhost_update_mem_tmp vtmp;
-    hwaddr change_start, change_end;
-    bool need_update;
-    vtmp.regions = 0;
-    vtmp.nregions = 0;
-    vtmp.dev = dev;
+    size_t mem_size;
+    struct vhost_update_mem_tmp vtmp = {.dev = dev};
 
     res = address_space_iterate(&address_space_memory,
                                 vhost_update_mem_cb, &vtmp);
@@ -545,24 +483,17 @@ static int vhost_update_mem(struct vhost_dev *dev)
         goto out;
     }
 
-    need_update = vhost_update_compare_list(dev, &vtmp,
-                                            &change_start, &change_end);
-    trace_vhost_update_mem_comparison(need_update,
-                                      (uint64_t)change_start,
-                                      (uint64_t)change_end);
-    if (need_update) {
-        /* Update the main regions list from our tmp */
-        size_t mem_size = offsetof(struct vhost_memory, regions) +
-            (vtmp.nregions + 1) * sizeof dev->mem->regions[0];
+    mem_size = offsetof(struct vhost_memory, regions) +
+               (vtmp.nregions + 1) * sizeof dev->mem->regions[0];
 
+    if (vtmp.nregions != dev->mem->nregions ||
+       memcmp(vtmp.regions, dev->mem->regions, mem_size)) {
+        /* Update the main regions list from our tmp */
         dev->mem = g_realloc(dev->mem, mem_size);
         dev->mem->nregions = vtmp.nregions;
         memcpy(dev->mem->regions, vtmp.regions,
                vtmp.nregions * sizeof dev->mem->regions[0]);
         used_memslots = vtmp.nregions;
-
-        dev->mem_changed_start_addr = change_start;
-        dev->mem_changed_end_addr = change_end;
     }
 
 out:
@@ -574,8 +505,6 @@ static void vhost_commit(MemoryListener *listener)
 {
     struct vhost_dev *dev = container_of(listener, struct vhost_dev,
                                          memory_listener);
-    hwaddr start_addr = 0;
-    ram_addr_t size = 0;
     uint64_t log_size;
     int r;
 
@@ -585,22 +514,11 @@ static void vhost_commit(MemoryListener *listener)
     if (!dev->started) {
         return;
     }
-    if (dev->mem_changed_start_addr > dev->mem_changed_end_addr) {
-        return;
-    }
 
     if (vhost_update_mem(dev)) {
         return;
     }
 
-    if (dev->started) {
-        start_addr = dev->mem_changed_start_addr;
-        size = dev->mem_changed_end_addr - dev->mem_changed_start_addr + 1;
-
-        r = vhost_verify_ring_mappings(dev, start_addr, size);
-        assert(r >= 0);
-    }
-
     if (!dev->log_enabled) {
         r = dev->vhost_ops->vhost_set_mem_table(dev, dev->mem);
         if (r < 0) {
@@ -1235,7 +1153,6 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
     hdev->features = features;
 
     hdev->memory_listener = (MemoryListener) {
-        .begin = vhost_begin,
         .commit = vhost_commit,
         .region_add = vhost_region_add,
         .region_del = vhost_region_del,
@@ -1458,7 +1375,6 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
     /* should only be called after backend is connected */
     assert(hdev->vhost_ops);
 
-    hdev->started = true;
     hdev->vdev = vdev;
 
     r = vhost_dev_set_features(hdev, hdev->log_enabled);
@@ -1469,6 +1385,9 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
     if (vhost_update_mem(hdev)) {
         goto fail_mem;
     }
+
+    hdev->started = true;
+
     if (vhost_dev_has_iommu(hdev)) {
         memory_listener_register(&hdev->iommu_listener, vdev->dma_as);
     }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [RFC 3/7] vhost: New memory update functions
  2017-11-30 15:48   ` Igor Mammedov
@ 2017-12-05 18:25     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert @ 2017-12-05 18:25 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, maxime.coquelin, mst

* Igor Mammedov (imammedo@redhat.com) wrote:
> On Wed, 29 Nov 2017 18:50:22 +0000
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> 
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > The vhost_update_mem functions will replace the existing update
> > mechanism.  They make use of the FlatView we now have to make the
> > update simpler.
> > This commit just adds the basic structure.
> see the suggestion below on dropping vhost_region_add/vhost_region_del
> 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  hw/virtio/vhost.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 44 insertions(+)
> > 
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > index 938253d1e8..c959a59fb3 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -629,6 +629,43 @@ static void vhost_begin(MemoryListener *listener)
> >      dev->mem_changed_start_addr = -1;
> >  }
> >  
> > +struct vhost_update_mem_tmp {
> > +    struct vhost_dev   *dev;
> > +    uint32_t nregions;
> > +    struct vhost_memory_region *regions;
> > +};
> > +
> > +/* Called for each MRS from vhost_update_mem */
> > +static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
> > +{
> > +    if (!vhost_section(mrs)) {
> > +        return 0;
> > +    }
> > +
> it's possible to move section tracking from vhost_region_add() to here
> 
>     ++dev->n_mem_sections;                                                       
>     dev->mem_sections = g_renew(MemoryRegionSection, dev->mem_sections,          
>                                 dev->n_mem_sections);                            
>     dev->mem_sections[dev->n_mem_sections - 1] = *section;
>     memory_region_ref(section->mr);  
> ...
> 
> > +static int vhost_update_mem(struct vhost_dev *dev)
> > +{
> > +    int res;
> > +    struct vhost_update_mem_tmp vtmp;
> > +    vtmp.regions = 0;
> > +    vtmp.nregions = 0;
> > +    vtmp.dev = dev;
>        
>  iterate over dev->mem_sections:
>     memory_region_unref(section->mr);
>  g_free(dev->mem_sections);
>  dev->mem_sections = NULL;
>
> with this you won't need the vhost_region_add/del callbacks anymore

Yes, with the addition of a:
   dev->n_mem_sections = 0;

in the deletion bit, that seems to work nicely.  Added.

Thanks,

Dave

> 
> > +
> > +    res = address_space_iterate(&address_space_memory,
> > +                                vhost_update_mem_cb, &vtmp);
> > +    if (res) {
> > +        goto out;
> > +    }
> > +
> > +    /* TODO */
> > +out:
> > +    g_free(vtmp.regions);
> > +    return res;
> > +}
> > +
> >  static void vhost_commit(MemoryListener *listener)
> >  {
> >      struct vhost_dev *dev = container_of(listener, struct vhost_dev,
> > @@ -648,6 +685,10 @@ static void vhost_commit(MemoryListener *listener)
> >          return;
> >      }
> >  
> > +    if (vhost_update_mem(dev)) {
> > +        return;
> > +    }
> > +
> >      if (dev->started) {
> >          start_addr = dev->mem_changed_start_addr;
> >          size = dev->mem_changed_end_addr - dev->mem_changed_start_addr + 1;
> > @@ -1523,6 +1564,9 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
> >          goto fail_features;
> >      }
> >  
> > +    if (vhost_update_mem(hdev)) {
> > +        goto fail_mem;
> > +    }
> >      if (vhost_dev_has_iommu(hdev)) {
> >          memory_listener_register(&hdev->iommu_listener, vdev->dma_as);
> >      }
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [RFC 4/7] vhost: update_mem_cb implementation
  2017-11-30 11:27   ` Igor Mammedov
@ 2017-12-06 20:09     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert @ 2017-12-06 20:09 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, maxime.coquelin, mst

* Igor Mammedov (imammedo@redhat.com) wrote:
> On Wed, 29 Nov 2017 18:50:23 +0000
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> 
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Add the meat of update_mem_cb; this is called for each region,
> > to add the region to our temporary list.
> > Our temporary list is kept in order, so we only need to look at
> > whether the region can be merged with the entry at the end of the list.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  hw/virtio/trace-events |  2 ++
> >  hw/virtio/vhost.c      | 55 +++++++++++++++++++++++++++++++++++++++++++++++++-
> >  2 files changed, 56 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > index 4a493bcd46..92fadec192 100644
> > --- a/hw/virtio/trace-events
> > +++ b/hw/virtio/trace-events
> > @@ -2,6 +2,8 @@
> >  
> >  # hw/virtio/vhost.c
> >  vhost_section(const char *name, int r) "%s:%d"
> > +vhost_update_mem_cb(const char *name, uint64_t gpa, uint64_t size, uint64_t host) "%s: 0x%"PRIx64"+0x%"PRIx64" @ 0x%"PRIx64
> > +vhost_update_mem_cb_abut(const char *name, uint64_t new_size) "%s: 0x%"PRIx64
> >  
> >  # hw/virtio/virtio.c
> >  virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > index c959a59fb3..7e3c6ae032 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -638,11 +638,64 @@ struct vhost_update_mem_tmp {
> >  /* Called for each MRS from vhost_update_mem */
> >  static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
> >  {
> > +    struct vhost_update_mem_tmp *vtmp = opaque;
> > +    struct vhost_memory_region *cur_vmr;
> > +    bool need_add = true;
> > +    uint64_t mrs_size;
> > +    uint64_t mrs_gpa;
> > +    uintptr_t mrs_host;
> > +
> >      if (!vhost_section(mrs)) {
> >          return 0;
> >      }
> > +    mrs_size = int128_get64(mrs->size);
> > +    mrs_gpa  = mrs->offset_within_address_space;
> > +    mrs_host = (uintptr_t)memory_region_get_ram_ptr(mrs->mr) +
> > +                         mrs->offset_within_region;
> > +
> > +    trace_vhost_update_mem_cb(mrs->mr->name, mrs_gpa, mrs_size,
> > +                              (uint64_t)mrs_host);
> > +
> > +    if (vtmp->nregions) {
> What forces you to maintain a helper vhost_memory_region array
> instead of a MemoryRegionSection array?

I looked at this - neither is nice.
I think I need to keep the real dev->mem->regions to keep vhost
happy.  If I've got to keep that then I've got to produce something
in that format; so producing it here and comparing it at the end
(possibly with your simple memcmp) works nicely.

The other downside of continuing to work with the MemoryRegionSections is
that they carry the size as an Int128, which is a pain to work with.
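
(To illustrate the Int128 point: even a trivial "do these two sections
abut?" test has to go through the int128_* helpers rather than plain
arithmetic -- a sketch only, not code from the series:)

    static bool sections_abut(const MemoryRegionSection *a,
                              const MemoryRegionSection *b)
    {
        /* end of a == start of b, computed via the Int128 API */
        Int128 a_end = int128_add(int128_make64(a->offset_within_address_space),
                                  a->size);
        return int128_eq(a_end,
                         int128_make64(b->offset_within_address_space));
    }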

Dave

> > +        /* Since we already have at least one region, lets see if
> > +         * this extends it; since we're scanning in order, we only
> > +         * have to look at the last one, and the FlatView that calls
> > +         * us shouldn't have overlaps.
> > +         */
> > +        struct vhost_memory_region *prev_vmr = vtmp->regions +
> > +                                               (vtmp->nregions - 1);
> > +        uint64_t prev_gpa_start = prev_vmr->guest_phys_addr;
> > +        uint64_t prev_gpa_end   = range_get_last(prev_gpa_start,
> > +                                                 prev_vmr->memory_size);
> > +        uint64_t prev_host_start = prev_vmr->userspace_addr;
> > +        uint64_t prev_host_end   = range_get_last(prev_host_start,
> > +                                                  prev_vmr->memory_size);
> > +
> > +        if (prev_gpa_end + 1 == mrs_gpa &&
> > +            prev_host_end + 1 == mrs_host &&
> > +            (!vtmp->dev->vhost_ops->vhost_backend_can_merge ||
> > +                vtmp->dev->vhost_ops->vhost_backend_can_merge(vtmp->dev,
> > +                    mrs_host, mrs_size,
> > +                    prev_host_start, prev_vmr->memory_size))) {
> > +            /* The two regions abut */
> > +            need_add = false;
> > +            mrs_size = mrs_size + prev_vmr->memory_size;
> > +            prev_vmr->memory_size = mrs_size;
> > +            trace_vhost_update_mem_cb_abut(mrs->mr->name, mrs_size);
> > +        }
> > +    }
> > +
> > +    if (need_add) {
> > +        vtmp->nregions++;
> > +        vtmp->regions = g_realloc_n(vtmp->regions, vtmp->nregions,
> > +                                    sizeof(vtmp->regions[0]));
> > +        cur_vmr = &vtmp->regions[vtmp->nregions - 1];
> > +        cur_vmr->guest_phys_addr = mrs_gpa;
> > +        cur_vmr->memory_size     = mrs_size;
> > +        cur_vmr->userspace_addr  = mrs_host;
> > +        cur_vmr->flags_padding = 0;
> > +    }
> >  
> > -    /* TODO */
> >      return 0;
> >  }
> >  
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [RFC] vhost: check if ring mapping is still valid when building memmap
  2017-12-01 14:22 ` [Qemu-devel] [RFC] vhost: check if ring mapping is still valid when building memmap Igor Mammedov
@ 2017-12-07 18:17   ` Dr. David Alan Gilbert
  2017-12-08 14:42     ` Igor Mammedov
  2017-12-08 20:45   ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 34+ messages in thread
From: Dr. David Alan Gilbert @ 2017-12-07 18:17 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, mst, groug, stefanha

* Igor Mammedov (imammedo@redhat.com) wrote:
> vhost_verify_ring_mappings() was used to verify that
> rings are still accessible and the related memory hasn't
> been moved after the flatview is updated.
> 
> It did its checks by mapping the ring's GPA+len and
> checking that the HVA hadn't changed with the new memory map.
> To avoid a possibly expensive mapping call, we
> identified the address range that changed and did the
> mapping only if the ring was in the changed range.
> 
> However, it's not necessary to perform the ring's GPA
> mapping as we already have its current HVA, and all
> we need is to verify that the ring's GPA translates to
> the same HVA in the updated flatview.
> 
> That can be done while we are building the vhost
> memmap, where vhost_update_mem_cb() already maps
> every RAM MemoryRegionSection to get the section HVA. This
> fact can be used to check that the ring belongs to the same
> MemoryRegion in the same place, like this:
> 
>   hva_ring_offset = GPA(ring) - GPA(MemoryRegionSection)
>   ring_hva == HVA(MemoryRegionSection) + hva_ring_offset
> 
> Doing this allows us to drop ~100 LOC of code which
> figures out the changed memory range, and avoids an
> unneeded reverse vhost_memory_map(is_write=true) call, which
> is overkill for the task at hand.

There are a few things about this I don't understand; however if
it's right I agree that it means we can kill off my comparison
code.


First, can I check what constraints 'verify_ring' needs to check;
if I'm understanding the original code, it's checking that:
    a) If a queue falls within a region, that the whole queue can
       be mapped
    b) And the start of the queue corresponds to where we thought
       it used to be (in GPA?)


   so that doesn't have any constraint on the ordering of the regions
   or whether the region is the same size or anything.
  Also I don't think it would spot if there was a queue that had no
  region associated with it at all.
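
(For reference, paraphrasing the per-part check being replaced -- its
two failure cases line up with (a) and (b) above:)

    l = part_size;
    p = vhost_memory_map(dev, part_addr, &l, 1);
    if (!p || l != part_size) {
        r = -ENOMEM;    /* (a) failed: part not mappable in one piece */
    } else if (p != part) {
        r = -EBUSY;     /* (b) failed: the HVA we hold no longer matches */
    }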

Now, comments on your code below:


> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> PS:
>    should be applied on top of David's series
> 
> ---
>  include/hw/virtio/vhost.h |   2 -
>  hw/virtio/vhost.c         | 195 ++++++++++++++--------------------------------
>  2 files changed, 57 insertions(+), 140 deletions(-)
> 
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 467dc77..fc4af5c 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -68,8 +68,6 @@ struct vhost_dev {
>      uint64_t log_size;
>      Error *migration_blocker;
>      bool memory_changed;
> -    hwaddr mem_changed_start_addr;
> -    hwaddr mem_changed_end_addr;
>      const VhostOps *vhost_ops;
>      void *opaque;
>      struct vhost_log *log;
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 5b9a7d7..026bac3 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -296,35 +296,43 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer,
>      }
>  }
>  
> -static int vhost_verify_ring_part_mapping(struct vhost_dev *dev,
> -                                          void *part,
> -                                          uint64_t part_addr,
> -                                          uint64_t part_size,
> -                                          uint64_t start_addr,
> -                                          uint64_t size)
> +static int vhost_verify_ring_part_mapping(void *ring_hva,
> +                                          uint64_t ring_gpa,
> +                                          uint64_t ring_size,
> +                                          void *reg_hva,
> +                                          uint64_t reg_gpa,
> +                                          uint64_t reg_size)
>  {
> -    hwaddr l;
> -    void *p;
> -    int r = 0;
> +    uint64_t hva_ring_offset;
> +    uint64_t ring_last = range_get_last(ring_gpa, ring_size);
> +    uint64_t reg_last = range_get_last(reg_gpa, reg_size);
>  
> -    if (!ranges_overlap(start_addr, size, part_addr, part_size)) {
> +    /* start check from the end so that the rest of checks
> +     * are done when whole ring is in merged sections range
> +     */
> +    if (ring_last < reg_last || ring_gpa > reg_last) {
>          return 0;
>      }

  so does that mean if our region never grows to be as big as the ring
we won't spot the problem?

> -    l = part_size;
> -    p = vhost_memory_map(dev, part_addr, &l, 1);
> -    if (!p || l != part_size) {
> -        r = -ENOMEM;
> +
> +    /* check that the whole ring is mapped */
> +    if (range_get_last(ring_gpa, ring_size) >
> +        range_get_last(reg_gpa, reg_size)) {
> +        return -EBUSY;
>      }

can't that be:
       if (ring_last > reg_last) {
           return -EBUSY;
       }

> -    if (p != part) {
> -        r = -EBUSY;
> +
> +    /* check that ring's MemoryRegion wasn't replaced */
> +    hva_ring_offset = ring_gpa - reg_gpa;
> +    if (ring_hva != reg_hva + hva_ring_offset) {
> +        return -ENOMEM;
>      }

Is that the same as:
   if (ring_gpa - reg_gpa != ring_hva - reg_hva) ?
which seems more symmetrical if true.


> -    vhost_memory_unmap(dev, p, l, 0, 0);
> -    return r;
> +
> +    return 0;
>  }
>  
>  static int vhost_verify_ring_mappings(struct vhost_dev *dev,
> -                                      uint64_t start_addr,
> -                                      uint64_t size)
> +                                      void *reg_hva,
> +                                      uint64_t reg_gpa,
> +                                      uint64_t reg_size)
>  {
>      int i, j;
>      int r = 0;
> @@ -338,23 +346,26 @@ static int vhost_verify_ring_mappings(struct vhost_dev *dev,
>          struct vhost_virtqueue *vq = dev->vqs + i;
>  
>          j = 0;
> -        r = vhost_verify_ring_part_mapping(dev, vq->desc, vq->desc_phys,
> -                                           vq->desc_size, start_addr, size);
> -        if (!r) {
> +        r = vhost_verify_ring_part_mapping(
> +                vq->desc, vq->desc_phys, vq->desc_size,
> +                reg_hva, reg_gpa, reg_size);
> +        if (r) {
>              break;
>          }
>  
>          j++;
> -        r = vhost_verify_ring_part_mapping(dev, vq->avail, vq->avail_phys,
> -                                           vq->avail_size, start_addr, size);
> -        if (!r) {
> +        r = vhost_verify_ring_part_mapping(
> +                vq->avail, vq->avail_phys, vq->avail_size,
> +                reg_hva, reg_gpa, reg_size);
> +        if (r) {
>              break;
>          }
>  
>          j++;
> -        r = vhost_verify_ring_part_mapping(dev, vq->used, vq->used_phys,
> -                                           vq->used_size, start_addr, size);
> -        if (!r) {
> +        r = vhost_verify_ring_part_mapping(
> +                vq->used, vq->used_phys, vq->used_size,
> +                reg_hva, reg_gpa, reg_size);
> +        if (r) {
>              break;
>          }
>      }
> @@ -384,24 +395,18 @@ static bool vhost_section(MemoryRegionSection *section)
>      return result;
>  }
>  
> -static void vhost_begin(MemoryListener *listener)
> -{
> -    struct vhost_dev *dev = container_of(listener, struct vhost_dev,
> -                                         memory_listener);
> -    dev->mem_changed_end_addr = 0;
> -    dev->mem_changed_start_addr = -1;
> -}
> -
>  struct vhost_update_mem_tmp {
>      struct vhost_dev   *dev;
>      uint32_t nregions;
>      struct vhost_memory_region *regions;
>  };
>  
> +
>  /* Called for each MRS from vhost_update_mem */
>  static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
>  {
>      struct vhost_update_mem_tmp *vtmp = opaque;
> +    struct vhost_dev   *dev = vtmp->dev;
>      struct vhost_memory_region *cur_vmr;
>      bool need_add = true;
>      uint64_t mrs_size;
> @@ -416,8 +421,7 @@ static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
>      mrs_host = (uintptr_t)memory_region_get_ram_ptr(mrs->mr) +
>                           mrs->offset_within_region;
>  
> -    trace_vhost_update_mem_cb(mrs->mr->name, mrs_gpa, mrs_size,
> -                              (uint64_t)mrs_host);
> +    trace_vhost_update_mem_cb(mrs->mr->name, mrs_gpa, mrs_size, mrs_host);
>  
>      if (vtmp->nregions) {
>          /* Since we already have at least one region, lets see if
> @@ -459,85 +463,19 @@ static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
>          cur_vmr->flags_padding = 0;
>      }
>  
> -    return 0;
> -}
> -
> -static bool vhost_update_compare_list(struct vhost_dev *dev,
> -                                      struct vhost_update_mem_tmp *vtmp,
> -                                      hwaddr *change_start, hwaddr *change_end)
> -{
> -    uint32_t oldi, newi;
> -    *change_start = 0;
> -    *change_end = 0;
> -
> -    for (newi = 0, oldi = 0; newi < vtmp->nregions; newi++) {
> -        struct vhost_memory_region *newr = &vtmp->regions[newi];
> -        hwaddr newr_last = range_get_last(newr->guest_phys_addr,
> -                                          newr->memory_size);
> -        trace_vhost_update_compare_list_loopn(newi, oldi,
> -                                        newr->guest_phys_addr,
> -                                        newr->memory_size);
> -        bool whole_change = true;
> -        while (oldi < dev->mem->nregions) {
> -            struct vhost_memory_region *oldr = &dev->mem->regions[oldi];
> -            hwaddr oldr_last = range_get_last(oldr->guest_phys_addr,
> -                                          oldr->memory_size);
> -            trace_vhost_update_compare_list_loopo(newi, oldi,
> -                                        oldr->guest_phys_addr,
> -                                        oldr->memory_size);
> -            if (newr->guest_phys_addr == oldr->guest_phys_addr &&
> -                newr->memory_size == oldr->memory_size) {
> -                /* Match in GPA and size, but it could be different
> -                 * in host address or flags
> -                 */
> -                whole_change = newr->userspace_addr != oldr->userspace_addr ||
> -                               newr->flags_padding != oldr->flags_padding;
> -                oldi++;
> -                break; /* inner old loop */
> -            }
> -            /* There's a difference - need to figure out what */
> -            if (oldr_last < newr->guest_phys_addr) {
> -                /* There used to be a region before us that's gone */
> -                *change_start = MIN(*change_start, oldr->guest_phys_addr);
> -                *change_end = MAX(*change_end, oldr_last);
> -                oldi++;
> -                continue; /* inner old loop */
> -            }
> -            if (oldr->guest_phys_addr > newr_last) {
> -                /* We've passed all the old mappings that could have overlapped
> -                 * this one
> -                 */
> -                break; /* Inner old loop */
> -            }
> -            /* Overlap case */
> -            *change_start = MIN(*change_start,
> -                                MIN(oldr->guest_phys_addr,
> -                                    newr->guest_phys_addr));
> -            *change_end = MAX(*change_end,
> -                                MAX(oldr_last, newr_last));
> -            whole_change = false;
> -            /* There might be more old mappings that overlap */
> -            oldi++;
> -        }
> -        if (whole_change) {
> -            /* No old region to compare against, this must be a change */
> -            *change_start = MIN(*change_start, newr->guest_phys_addr);
> -            *change_end = MAX(*change_end, newr_last);
> -        }
> +    if (dev->started &&
> +        vhost_verify_ring_mappings(dev, (void *)mrs_host, mrs_gpa, mrs_size)) {
> +        abort();
>      }
>  
> -    return *change_start || *change_end;
> +    return 0;
>  }
>  
>  static int vhost_update_mem(struct vhost_dev *dev)
>  {
>      int res;
> -    struct vhost_update_mem_tmp vtmp;
> -    hwaddr change_start, change_end;
> -    bool need_update;
> -    vtmp.regions = 0;
> -    vtmp.nregions = 0;
> -    vtmp.dev = dev;
> +    size_t mem_size;
> +    struct vhost_update_mem_tmp vtmp = {.dev = dev};
>  
>      res = address_space_iterate(&address_space_memory,
>                                  vhost_update_mem_cb, &vtmp);
> @@ -545,24 +483,17 @@ static int vhost_update_mem(struct vhost_dev *dev)
>          goto out;
>      }
>  
> -    need_update = vhost_update_compare_list(dev, &vtmp,
> -                                            &change_start, &change_end);
> -    trace_vhost_update_mem_comparison(need_update,
> -                                      (uint64_t)change_start,
> -                                      (uint64_t)change_end);
> -    if (need_update) {
> -        /* Update the main regions list from our tmp */
> -        size_t mem_size = offsetof(struct vhost_memory, regions) +
> -            (vtmp.nregions + 1) * sizeof dev->mem->regions[0];
> +    mem_size = offsetof(struct vhost_memory, regions) +
> +               (vtmp.nregions + 1) * sizeof dev->mem->regions[0];
>  
> +    if (vtmp.nregions != dev->mem->nregions ||
> +        memcmp(vtmp.regions, dev->mem->regions,
> +               vtmp.nregions * sizeof dev->mem->regions[0])) {
> +        /* Update the main regions list from our tmp */
>          dev->mem = g_realloc(dev->mem, mem_size);
>          dev->mem->nregions = vtmp.nregions;
>          memcpy(dev->mem->regions, vtmp.regions,
>                 vtmp.nregions * sizeof dev->mem->regions[0]);
>          used_memslots = vtmp.nregions;
> -
> -        dev->mem_changed_start_addr = change_start;
> -        dev->mem_changed_end_addr = change_end;
>      }
>  
>  out:
> @@ -574,8 +505,6 @@ static void vhost_commit(MemoryListener *listener)
>  {
>      struct vhost_dev *dev = container_of(listener, struct vhost_dev,
>                                           memory_listener);
> -    hwaddr start_addr = 0;
> -    ram_addr_t size = 0;
>      uint64_t log_size;
>      int r;
>  
> @@ -585,22 +514,11 @@ static void vhost_commit(MemoryListener *listener)
>      if (!dev->started) {
>          return;
>      }
> -    if (dev->mem_changed_start_addr > dev->mem_changed_end_addr) {
> -        return;
> -    }

This was in the wrong place in the version I posted; this should have
been after the vhost_update_mem;  I think we still need something
telling us whether vhost_update_mem found a change, because we want to
skip the rest of vhost_commit if there's no change at all.

>      if (vhost_update_mem(dev)) {
>          return;
>      }
>  
> -    if (dev->started) {
> -        start_addr = dev->mem_changed_start_addr;
> -        size = dev->mem_changed_end_addr - dev->mem_changed_start_addr + 1;
> -
> -        r = vhost_verify_ring_mappings(dev, start_addr, size);
> -        assert(r >= 0);
> -    }
> -
>      if (!dev->log_enabled) {
>          r = dev->vhost_ops->vhost_set_mem_table(dev, dev->mem);
>          if (r < 0) {
> @@ -1235,7 +1153,6 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
>      hdev->features = features;
>  
>      hdev->memory_listener = (MemoryListener) {
> -        .begin = vhost_begin,
>          .commit = vhost_commit,
>          .region_add = vhost_region_add,
>          .region_del = vhost_region_del,
> @@ -1458,7 +1375,6 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
>      /* should only be called after backend is connected */
>      assert(hdev->vhost_ops);
>  
> -    hdev->started = true;
>      hdev->vdev = vdev;
>  
>      r = vhost_dev_set_features(hdev, hdev->log_enabled);
> @@ -1469,6 +1385,9 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
>      if (vhost_update_mem(hdev)) {
>          goto fail_mem;
>      }
> +
> +    hdev->started = true;
> +

Ah, good spot.

Dave

>      if (vhost_dev_has_iommu(hdev)) {
>          memory_listener_register(&hdev->iommu_listener, vdev->dma_as);
>      }
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [RFC] vhost: check if ring mapping is still valid when building memmap
  2017-12-07 18:17   ` Dr. David Alan Gilbert
@ 2017-12-08 14:42     ` Igor Mammedov
  2017-12-08 17:51       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 34+ messages in thread
From: Igor Mammedov @ 2017-12-08 14:42 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel, mst, groug, stefanha

On Thu, 7 Dec 2017 18:17:51 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Igor Mammedov (imammedo@redhat.com) wrote:
> > vhost_verify_ring_mappings() was used to verify that
> > rings are still accessible and the related memory hasn't
> > been moved after the flatview is updated.
> > 
> > It did its checks by mapping the ring's GPA+len and
> > checking that the HVA hadn't changed with the new memory map.
> > To avoid a possibly expensive mapping call, we
> > identified the address range that changed and did the
> > mapping only if the ring was in the changed range.
> > 
> > However, it's not necessary to perform the ring's GPA
> > mapping as we already have its current HVA, and all
> > we need is to verify that the ring's GPA translates to
> > the same HVA in the updated flatview.
> > 
> > That can be done while we are building the vhost
> > memmap, where vhost_update_mem_cb() already maps
> > every RAM MemoryRegionSection to get the section HVA. This
> > fact can be used to check that the ring belongs to the same
> > MemoryRegion in the same place, like this:
> > 
> >   hva_ring_offset = GPA(ring) - GPA(MemoryRegionSection)
> >   ring_hva == HVA(MemoryRegionSection) + hva_ring_offset
> > 
> > Doing this allows us to drop ~100 LOC of code which
> > figures out the changed memory range, and avoids an
> > unneeded reverse vhost_memory_map(is_write=true) call, which
> > is overkill for the task at hand.
> 
> There are a few things about this I don't understand; however if
> it's right I agree that it means we can kill off my comparison
> code.
> 
> 
> First, can I check what constraints 'verify_ring' needs to check;
> if I'm understanding the original code, it's checking that:
>     a) If a queue falls within a region, that the whole queue can
>        be mapped
         see vhost_verify_ring_part_mapping():

             p = vhost_memory_map(dev, part_addr, &l, 1);                                 
             if (!p || l != part_size) 
                  error_out
             
         1st: (!p) the requested part_addr must be mapped
              i.e. be part of a MemorySection in the flatview
         AND
         2nd: (l != part_size) the mapped range size must match what we asked for
              i.e. mapped as a contiguous range, so that
                 [GPA, GPA + part_size) directly corresponds to [HVA, HVA + part_size)
              and that's possible only within 1 MemorySection && 1 MemoryRegion
              if I read address_space_map() correctly
                     flatview_translate() -> GPA's MemorySection
                     flatview_extend_translation() -> 1:1 GPA->HVA range size
              
         So the answer, in the case of the RAM memory regions we are interested in, would be:
         the queue falls within a MemorySection and the whole queue fits into it

>     b) And the start of the queue corresponds to where we thought
>        it used to be (in GPA?)
         i.e. that the queue HVA cached at vq->desc hasn't changed after the flatview change
            if (p != part)
               error_out
> 
> 
>    so that doesn't have any constraint on the ordering of the regions
>    or whether the region is the same size or anything.
>   Also I don't think it would spot if there was a queue that had no
>   region associated with it at all.
> 
> Now, comments on your code below:
> 
> 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > PS:
> >    should be applied on top of David's series
> > 
> > ---
> >  include/hw/virtio/vhost.h |   2 -
> >  hw/virtio/vhost.c         | 195 ++++++++++++++--------------------------------
> >  2 files changed, 57 insertions(+), 140 deletions(-)
> > 
> > diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> > index 467dc77..fc4af5c 100644
> > --- a/include/hw/virtio/vhost.h
> > +++ b/include/hw/virtio/vhost.h
> > @@ -68,8 +68,6 @@ struct vhost_dev {
> >      uint64_t log_size;
> >      Error *migration_blocker;
> >      bool memory_changed;
> > -    hwaddr mem_changed_start_addr;
> > -    hwaddr mem_changed_end_addr;
> >      const VhostOps *vhost_ops;
> >      void *opaque;
> >      struct vhost_log *log;
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > index 5b9a7d7..026bac3 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -296,35 +296,43 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer,
> >      }
> >  }
> >  
> > -static int vhost_verify_ring_part_mapping(struct vhost_dev *dev,
> > -                                          void *part,
> > -                                          uint64_t part_addr,
> > -                                          uint64_t part_size,
> > -                                          uint64_t start_addr,
> > -                                          uint64_t size)
> > +static int vhost_verify_ring_part_mapping(void *ring_hva,
> > +                                          uint64_t ring_gpa,
> > +                                          uint64_t ring_size,
> > +                                          void *reg_hva,
> > +                                          uint64_t reg_gpa,
> > +                                          uint64_t reg_size)
> >  {
> > -    hwaddr l;
> > -    void *p;
> > -    int r = 0;
> > +    uint64_t hva_ring_offset;
> > +    uint64_t ring_last = range_get_last(ring_gpa, ring_size);
> > +    uint64_t reg_last = range_get_last(reg_gpa, reg_size);
> >  
> > -    if (!ranges_overlap(start_addr, size, part_addr, part_size)) {
> > +    /* start check from the end so that the rest of checks
> > +     * are done when whole ring is in merged sections range
> > +     */
> > +    if (ring_last < reg_last || ring_gpa > reg_last) {
> >          return 0;
> >      }  
> 
>   so does that mean if our region never grows to be as big as the ring
> we won't spot the problem?
I think there is a mistake here; it should be:
   ring_last < reg_gpa || ring_gpa > reg_last

so if the ring is outside of the contiguous region, we do not care => no change

> 
> > -    l = part_size;
> > -    p = vhost_memory_map(dev, part_addr, &l, 1);
> > -    if (!p || l != part_size) {
> > -        r = -ENOMEM;
> > +
> > +    /* check that the whole ring is mapped */
> > +    if (range_get_last(ring_gpa, ring_size) >
> > +        range_get_last(reg_gpa, reg_size)) {
> > +        return -EBUSY;
> >      }  
> 
> can't that be:
>        if (ring_last > reg_last) {
>            return -EBUSY;
>        }
yep 

> > -    if (p != part) {
> > -        r = -EBUSY;
> > +
> > +    /* check that ring's MemoryRegion wasn't replaced */
> > +    hva_ring_offset = ring_gpa - reg_gpa;
> > +    if (ring_hva != reg_hva + hva_ring_offset) {
> > +        return -ENOMEM;
> >      }  
> 
> Is that the same as:
>    if (ring_gpa - reg_gpa != ring_hva - reg_hva) ?
> which seems more symmetrical if true.
it looks more cryptic to me; my more verbose variant should
be easier to read when one forgets how it all works and needs
to figure it out again.


> > -    vhost_memory_unmap(dev, p, l, 0, 0);
> > -    return r;
> > +
> > +    return 0;
> >  }
> >  
> >  static int vhost_verify_ring_mappings(struct vhost_dev *dev,
> > -                                      uint64_t start_addr,
> > -                                      uint64_t size)
> > +                                      void *reg_hva,
> > +                                      uint64_t reg_gpa,
> > +                                      uint64_t reg_size)
> >  {
> >      int i, j;
> >      int r = 0;
> > @@ -338,23 +346,26 @@ static int vhost_verify_ring_mappings(struct vhost_dev *dev,
> >          struct vhost_virtqueue *vq = dev->vqs + i;
> >  
> >          j = 0;
> > -        r = vhost_verify_ring_part_mapping(dev, vq->desc, vq->desc_phys,
> > -                                           vq->desc_size, start_addr, size);
> > -        if (!r) {
> > +        r = vhost_verify_ring_part_mapping(
> > +                vq->desc, vq->desc_phys, vq->desc_size,
> > +                reg_hva, reg_gpa, reg_size);
> > +        if (r) {
> >              break;
> >          }
> >  
> >          j++;
> > -        r = vhost_verify_ring_part_mapping(dev, vq->avail, vq->avail_phys,
> > -                                           vq->avail_size, start_addr, size);
> > -        if (!r) {
> > +        r = vhost_verify_ring_part_mapping(
> > +                vq->avail, vq->avail_phys, vq->avail_size,
> > +                reg_hva, reg_gpa, reg_size);
> > +        if (r) {
> >              break;
> >          }
> >  
> >          j++;
> > -        r = vhost_verify_ring_part_mapping(dev, vq->used, vq->used_phys,
> > -                                           vq->used_size, start_addr, size);
> > -        if (!r) {
> > +        r = vhost_verify_ring_part_mapping(
> > +                vq->used, vq->used_phys, vq->used_size,
> > +                reg_hva, reg_gpa, reg_size);
> > +        if (r) {
> >              break;
> >          }
> >      }
> > @@ -384,24 +395,18 @@ static bool vhost_section(MemoryRegionSection *section)
> >      return result;
> >  }
> >  
> > -static void vhost_begin(MemoryListener *listener)
> > -{
> > -    struct vhost_dev *dev = container_of(listener, struct vhost_dev,
> > -                                         memory_listener);
> > -    dev->mem_changed_end_addr = 0;
> > -    dev->mem_changed_start_addr = -1;
> > -}
> > -
> >  struct vhost_update_mem_tmp {
> >      struct vhost_dev   *dev;
> >      uint32_t nregions;
> >      struct vhost_memory_region *regions;
> >  };
> >  
> > +
> >  /* Called for each MRS from vhost_update_mem */
> >  static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
> >  {
> >      struct vhost_update_mem_tmp *vtmp = opaque;
> > +    struct vhost_dev   *dev = vtmp->dev;
> >      struct vhost_memory_region *cur_vmr;
> >      bool need_add = true;
> >      uint64_t mrs_size;
> > @@ -416,8 +421,7 @@ static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
> >      mrs_host = (uintptr_t)memory_region_get_ram_ptr(mrs->mr) +
> >                           mrs->offset_within_region;
> >  
> > -    trace_vhost_update_mem_cb(mrs->mr->name, mrs_gpa, mrs_size,
> > -                              (uint64_t)mrs_host);
> > +    trace_vhost_update_mem_cb(mrs->mr->name, mrs_gpa, mrs_size, mrs_host);
> >  
> >      if (vtmp->nregions) {
> >          /* Since we already have at least one region, lets see if
> > @@ -459,85 +463,19 @@ static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
> >          cur_vmr->flags_padding = 0;
> >      }
> >  
> > -    return 0;
> > -}
> > -
> > -static bool vhost_update_compare_list(struct vhost_dev *dev,
> > -                                      struct vhost_update_mem_tmp *vtmp,
> > -                                      hwaddr *change_start, hwaddr *change_end)
> > -{
> > -    uint32_t oldi, newi;
> > -    *change_start = 0;
> > -    *change_end = 0;
> > -
> > -    for (newi = 0, oldi = 0; newi < vtmp->nregions; newi++) {
> > -        struct vhost_memory_region *newr = &vtmp->regions[newi];
> > -        hwaddr newr_last = range_get_last(newr->guest_phys_addr,
> > -                                          newr->memory_size);
> > -        trace_vhost_update_compare_list_loopn(newi, oldi,
> > -                                        newr->guest_phys_addr,
> > -                                        newr->memory_size);
> > -        bool whole_change = true;
> > -        while (oldi < dev->mem->nregions) {
> > -            struct vhost_memory_region *oldr = &dev->mem->regions[oldi];
> > -            hwaddr oldr_last = range_get_last(oldr->guest_phys_addr,
> > -                                          oldr->memory_size);
> > -            trace_vhost_update_compare_list_loopo(newi, oldi,
> > -                                        oldr->guest_phys_addr,
> > -                                        oldr->memory_size);
> > -            if (newr->guest_phys_addr == oldr->guest_phys_addr &&
> > -                newr->memory_size == oldr->memory_size) {
> > -                /* Match in GPA and size, but it could be different
> > -                 * in host address or flags
> > -                 */
> > -                whole_change = newr->userspace_addr != oldr->userspace_addr ||
> > -                               newr->flags_padding != oldr->flags_padding;
> > -                oldi++;
> > -                break; /* inner old loop */
> > -            }
> > -            /* There's a difference - need to figure out what */
> > -            if (oldr_last < newr->guest_phys_addr) {
> > -                /* There used to be a region before us that's gone */
> > -                *change_start = MIN(*change_start, oldr->guest_phys_addr);
> > -                *change_end = MAX(*change_end, oldr_last);
> > -                oldi++;
> > -                continue; /* inner old loop */
> > -            }
> > -            if (oldr->guest_phys_addr > newr_last) {
> > -                /* We've passed all the old mappings that could have overlapped
> > -                 * this one
> > -                 */
> > -                break; /* Inner old loop */
> > -            }
> > -            /* Overlap case */
> > -            *change_start = MIN(*change_start,
> > -                                MIN(oldr->guest_phys_addr,
> > -                                    newr->guest_phys_addr));
> > -            *change_end = MAX(*change_end,
> > -                                MAX(oldr_last, newr_last));
> > -            whole_change = false;
> > -            /* There might be more old mappings that overlap */
> > -            oldi++;
> > -        }
> > -        if (whole_change) {
> > -            /* No old region to compare against, this must be a change */
> > -            *change_start = MIN(*change_start, newr->guest_phys_addr);
> > -            *change_end = MAX(*change_end, newr_last);
> > -        }
> > +    if (dev->started &&
> > +        vhost_verify_ring_mappings(dev, (void *)mrs_host, mrs_gpa, mrs_size)) {
> > +        abort();
> >      }
> >  
> > -    return *change_start || *change_end;
> > +    return 0;
> >  }
> >  
> >  static int vhost_update_mem(struct vhost_dev *dev)
> >  {
> >      int res;
> > -    struct vhost_update_mem_tmp vtmp;
> > -    hwaddr change_start, change_end;
> > -    bool need_update;
> > -    vtmp.regions = 0;
> > -    vtmp.nregions = 0;
> > -    vtmp.dev = dev;
> > +    size_t mem_size;
> > +    struct vhost_update_mem_tmp vtmp = {.dev = dev};
> >  
> >      res = address_space_iterate(&address_space_memory,
> >                                  vhost_update_mem_cb, &vtmp);
> > @@ -545,24 +483,17 @@ static int vhost_update_mem(struct vhost_dev *dev)
> >          goto out;
> >      }
> >  
> > -    need_update = vhost_update_compare_list(dev, &vtmp,
> > -                                            &change_start, &change_end);
> > -    trace_vhost_update_mem_comparison(need_update,
> > -                                      (uint64_t)change_start,
> > -                                      (uint64_t)change_end);
> > -    if (need_update) {
> > -        /* Update the main regions list from our tmp */
> > -        size_t mem_size = offsetof(struct vhost_memory, regions) +
> > -            (vtmp.nregions + 1) * sizeof dev->mem->regions[0];
> > +    mem_size = offsetof(struct vhost_memory, regions) +
> > +               (vtmp.nregions + 1) * sizeof dev->mem->regions[0];
> >  
> > +    if (vtmp.nregions != dev->mem->nregions ||
> > +        memcmp(vtmp.regions, dev->mem->regions,
> > +               vtmp.nregions * sizeof dev->mem->regions[0])) {
> > +        /* Update the main regions list from our tmp */
> >          dev->mem = g_realloc(dev->mem, mem_size);
> >          dev->mem->nregions = vtmp.nregions;
> >          memcpy(dev->mem->regions, vtmp.regions,
> >                 vtmp.nregions * sizeof dev->mem->regions[0]);
> >          used_memslots = vtmp.nregions;
> > -
> > -        dev->mem_changed_start_addr = change_start;
> > -        dev->mem_changed_end_addr = change_end;
> >      }
> >  
> >  out:
> > @@ -574,8 +505,6 @@ static void vhost_commit(MemoryListener *listener)
> >  {
> >      struct vhost_dev *dev = container_of(listener, struct vhost_dev,
> >                                           memory_listener);
> > -    hwaddr start_addr = 0;
> > -    ram_addr_t size = 0;
> >      uint64_t log_size;
> >      int r;
> >  
> > @@ -585,22 +514,11 @@ static void vhost_commit(MemoryListener *listener)
> >      if (!dev->started) {
> >          return;
> >      }
> > -    if (dev->mem_changed_start_addr > dev->mem_changed_end_addr) {
> > -        return;
> > -    }  
> 
> This was in the wrong place in the version I posted; this should have
> been after the vhost_update_mem;  I think we still need something
> telling us whether vhost_update_mem found a change, because we want to
> skip the rest of vhost_commit if there's no change at all.
maybe cache memcmp result in dev->memory_changed
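
e.g. something along these lines -- just a sketch, reusing the existing
dev->memory_changed field rather than adding new state:

    /* in vhost_update_mem(), once the new list is built: */
    dev->memory_changed = vtmp.nregions != dev->mem->nregions ||
                          memcmp(vtmp.regions, dev->mem->regions,
                                 vtmp.nregions * sizeof dev->mem->regions[0]);
    if (dev->memory_changed) {
        /* refresh dev->mem from vtmp as before */
    }

    /* and in vhost_commit(), right after vhost_update_mem(dev): */
    if (!dev->memory_changed) {
        return; /* layout unchanged: skip set_mem_table/log resizing */
    }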

> 
> >      if (vhost_update_mem(dev)) {
> >          return;
> >      }
> >  
> > -    if (dev->started) {
> > -        start_addr = dev->mem_changed_start_addr;
> > -        size = dev->mem_changed_end_addr - dev->mem_changed_start_addr + 1;
> > -
> > -        r = vhost_verify_ring_mappings(dev, start_addr, size);
> > -        assert(r >= 0);
> > -    }
> > -
> >      if (!dev->log_enabled) {
> >          r = dev->vhost_ops->vhost_set_mem_table(dev, dev->mem);
> >          if (r < 0) {
> > @@ -1235,7 +1153,6 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
> >      hdev->features = features;
> >  
> >      hdev->memory_listener = (MemoryListener) {
> > -        .begin = vhost_begin,
> >          .commit = vhost_commit,
> >          .region_add = vhost_region_add,
> >          .region_del = vhost_region_del,
> > @@ -1458,7 +1375,6 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
> >      /* should only be called after backend is connected */
> >      assert(hdev->vhost_ops);
> >  
> > -    hdev->started = true;
> >      hdev->vdev = vdev;
> >  
> >      r = vhost_dev_set_features(hdev, hdev->log_enabled);
> > @@ -1469,6 +1385,9 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
> >      if (vhost_update_mem(hdev)) {
> >          goto fail_mem;
> >      }
> > +
> > +    hdev->started = true;
> > +  
> 
> Ah, good spot.
> 
> Dave
> 
> >      if (vhost_dev_has_iommu(hdev)) {
> >          memory_listener_register(&hdev->iommu_listener, vdev->dma_as);
> >      }
> > -- 
> > 2.7.4
> >   
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [RFC] vhost: check if ring mapping is still valid when building memmap
  2017-12-08 14:42     ` Igor Mammedov
@ 2017-12-08 17:51       ` Dr. David Alan Gilbert
  2017-12-11  9:37         ` Igor Mammedov
  0 siblings, 1 reply; 34+ messages in thread
From: Dr. David Alan Gilbert @ 2017-12-08 17:51 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, mst, groug, stefanha

* Igor Mammedov (imammedo@redhat.com) wrote:
> On Thu, 7 Dec 2017 18:17:51 +0000
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Igor Mammedov (imammedo@redhat.com) wrote:
> > > vhost_verify_ring_mappings() was used to verify that
> > > rings are still accessible and the related memory hasn't
> > > been moved after the flatview is updated.
> > > 
> > > It did its checks by mapping the ring's GPA+len and
> > > checking that the HVA hadn't changed with the new memory map.
> > > To avoid a possibly expensive mapping call, we
> > > identified the address range that changed and did the
> > > mapping only if the ring was in the changed range.
> > > 
> > > However, it's not necessary to perform the ring's GPA
> > > mapping as we already have its current HVA, and all
> > > we need is to verify that the ring's GPA translates to
> > > the same HVA in the updated flatview.
> > > 
> > > That can be done while we are building the vhost
> > > memmap, where vhost_update_mem_cb() already maps
> > > every RAM MemoryRegionSection to get the section HVA. This
> > > fact can be used to check that the ring belongs to the same
> > > MemoryRegion in the same place, like this:
> > > 
> > >   hva_ring_offset = GPA(ring) - GPA(MemoryRegionSection)
> > >   ring_hva == HVA(MemoryRegionSection) + hva_ring_offset
> > > 
> > > Doing this allows us to drop ~100 LOC of code which
> > > figures out the changed memory range, and avoids an
> > > unneeded reverse vhost_memory_map(is_write=true) call, which
> > > is overkill for the task at hand.  
> > 
> > There are a few things about this I don't understand; however if
> > it's right I agree that it means we can kill off my comparison
> > code.
> > 
> > 
> > First, can I check what constraints 'verify_ring' needs to check;
> > if I'm understanding the original code, it's checking that:
> >     a) If a queue falls within a region, that the whole queue can
> >        be mapped
>          see vhost_verify_ring_part_mapping():
> 
>              p = vhost_memory_map(dev, part_addr, &l, 1);                                 
>              if (!p || l != part_size) 
>                   error_out
>              
>          1st: (!p) the requested part_addr must be mapped
>               i.e. be part of a MemorySection in the flatview
>          AND
>          2nd: (l != part_size) the mapped range size must match what we asked for
>               i.e. mapped as a contiguous range, so that
>                  [GPA, GPA + part_size) directly corresponds to [HVA, HVA + part_size)
>               and that's possible only within 1 MemorySection && 1 MemoryRegion
>               if I read address_space_map() correctly
>                      flatview_translate() -> GPA's MemorySection
>                      flatview_extend_translation() -> 1:1 GPA->HVA range size
>               
>          So the answer, in the case of the RAM memory regions we are interested in, would be:
>          the queue falls within a MemorySection and the whole queue fits into it

Yes, OK.

> >     b) And the start of the queue corresponds to where we thought
> >        it used to be (in GPA?)
>          i.e. that the queue HVA cached at vq->desc hasn't changed after the flatview change
>             if (p != part)
>                error_out

OK, so it's checking that the HVA hasn't changed, by taking the part_addr
(which is a GPA) and checking the map?

> >    so that doesn't have any constraint on the ordering of the regions
> >    or whether the region is the same size or anything.
> >   Also I don't think it would spot if there was a queue that had no
> >   region associated with it at all.
> > 
> > Now, comments on your code below:
> > 
> > 
> > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > ---
> > > PS:
> > >    should be applied on top of David's series
> > > 
> > > ---
> > >  include/hw/virtio/vhost.h |   2 -
> > >  hw/virtio/vhost.c         | 195 ++++++++++++++--------------------------------
> > >  2 files changed, 57 insertions(+), 140 deletions(-)
> > > 
> > > diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> > > index 467dc77..fc4af5c 100644
> > > --- a/include/hw/virtio/vhost.h
> > > +++ b/include/hw/virtio/vhost.h
> > > @@ -68,8 +68,6 @@ struct vhost_dev {
> > >      uint64_t log_size;
> > >      Error *migration_blocker;
> > >      bool memory_changed;
> > > -    hwaddr mem_changed_start_addr;
> > > -    hwaddr mem_changed_end_addr;
> > >      const VhostOps *vhost_ops;
> > >      void *opaque;
> > >      struct vhost_log *log;
> > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > > index 5b9a7d7..026bac3 100644
> > > --- a/hw/virtio/vhost.c
> > > +++ b/hw/virtio/vhost.c
> > > @@ -296,35 +296,43 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer,
> > >      }
> > >  }
> > >  
> > > -static int vhost_verify_ring_part_mapping(struct vhost_dev *dev,
> > > -                                          void *part,
> > > -                                          uint64_t part_addr,
> > > -                                          uint64_t part_size,
> > > -                                          uint64_t start_addr,
> > > -                                          uint64_t size)
> > > +static int vhost_verify_ring_part_mapping(void *ring_hva,
> > > +                                          uint64_t ring_gpa,
> > > +                                          uint64_t ring_size,
> > > +                                          void *reg_hva,
> > > +                                          uint64_t reg_gpa,
> > > +                                          uint64_t reg_size)
> > >  {
> > > -    hwaddr l;
> > > -    void *p;
> > > -    int r = 0;
> > > +    uint64_t hva_ring_offset;
> > > +    uint64_t ring_last = range_get_last(ring_gpa, ring_size);
> > > +    uint64_t reg_last = range_get_last(reg_gpa, reg_size);
> > >  
> > > -    if (!ranges_overlap(start_addr, size, part_addr, part_size)) {
> > > +    /* start check from the end so that the rest of checks
> > > +     * are done when whole ring is in merged sections range
> > > +     */
> > > +    if (ring_last < reg_last || ring_gpa > reg_last) {
> > >          return 0;
> > >      }  
> > 
> >   so does that mean if our region never grows to be as big as the ring
> > we won't spot the problem?
> I think there is a mistake here; it should be:
>    ring_last < reg_gpa || ring_gpa > reg_last
> 
> so if the ring is outside of the contiguous region, we do not care => no change

OK, I don't see how that corresponds to your 'start check from the end'
comment - I thought it was doing something smarter to deal with this
being called from the _cb routine potentially before another part of the
range had been joined onto it.
In that case, we can just use ranges_overlap like the original
routine did.
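
(i.e. keep the early-out as simple as the original one -- a sketch using
the existing ranges_overlap() helper:)

    /* skip rings that don't touch this merged region at all */
    if (!ranges_overlap(reg_gpa, reg_size, ring_gpa, ring_size)) {
        return 0;
    }
    /* past this point a partial overlap is an error: the whole ring
     * must sit inside [reg_gpa, reg_gpa + reg_size) and still
     * translate to the same HVA */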

> > > -    l = part_size;
> > > -    p = vhost_memory_map(dev, part_addr, &l, 1);
> > > -    if (!p || l != part_size) {
> > > -        r = -ENOMEM;
> > > +
> > > +    /* check that the whole ring is mapped */
> > > +    if (range_get_last(ring_gpa, ring_size) >
> > > +        range_get_last(reg_gpa, reg_size)) {
> > > +        return -EBUSY;
> > >      }  
> > 
> > can't that be:
> >        if (ring_last > reg_last) {
> >            return -EBUSY;
> >        }
> yep 
> 
> > > -    if (p != part) {
> > > -        r = -EBUSY;
> > > +
> > > +    /* check that ring's MemoryRegion wasn't replaced */
> > > +    hva_ring_offset = ring_gpa - reg_gpa;
> > > +    if (ring_hva != reg_hva + hva_ring_offset) {
> > > +        return -ENOMEM;
> > >      }  
> > 
> > Is that the same as:
> >    if (ring_gpa - reg_gpa != ring_hva - reg_hva) ?
> > which seems more symmetrical if true.
> it looks more cryptic to me; my more verbose variant should
> be easier to read when one forgets how it all works and needs
> to figure it out again.

OK, just a difference in taste.

> > > -    vhost_memory_unmap(dev, p, l, 0, 0);
> > > -    return r;
> > > +
> > > +    return 0;
> > >  }
> > >  
> > >  static int vhost_verify_ring_mappings(struct vhost_dev *dev,
> > > -                                      uint64_t start_addr,
> > > -                                      uint64_t size)
> > > +                                      void *reg_hva,
> > > +                                      uint64_t reg_gpa,
> > > +                                      uint64_t reg_size)
> > >  {
> > >      int i, j;
> > >      int r = 0;
> > > @@ -338,23 +346,26 @@ static int vhost_verify_ring_mappings(struct vhost_dev *dev,
> > >          struct vhost_virtqueue *vq = dev->vqs + i;
> > >  
> > >          j = 0;
> > > -        r = vhost_verify_ring_part_mapping(dev, vq->desc, vq->desc_phys,
> > > -                                           vq->desc_size, start_addr, size);
> > > -        if (!r) {
> > > +        r = vhost_verify_ring_part_mapping(
> > > +                vq->desc, vq->desc_phys, vq->desc_size,
> > > +                reg_hva, reg_gpa, reg_size);
> > > +        if (r) {
> > >              break;
> > >          }
> > >  
> > >          j++;
> > > -        r = vhost_verify_ring_part_mapping(dev, vq->avail, vq->avail_phys,
> > > -                                           vq->avail_size, start_addr, size);
> > > -        if (!r) {
> > > +        r = vhost_verify_ring_part_mapping(
> > > +                vq->avail, vq->avail_phys, vq->avail_size,
> > > +                reg_hva, reg_gpa, reg_size);
> > > +        if (r) {
> > >              break;
> > >          }
> > >  
> > >          j++;
> > > -        r = vhost_verify_ring_part_mapping(dev, vq->used, vq->used_phys,
> > > -                                           vq->used_size, start_addr, size);
> > > -        if (!r) {
> > > +        r = vhost_verify_ring_part_mapping(
> > > +                vq->used, vq->used_phys, vq->used_size,
> > > +                reg_hva, reg_gpa, reg_size);
> > > +        if (r) {
> > >              break;
> > >          }
> > >      }
> > > @@ -384,24 +395,18 @@ static bool vhost_section(MemoryRegionSection *section)
> > >      return result;
> > >  }
> > >  
> > > -static void vhost_begin(MemoryListener *listener)
> > > -{
> > > -    struct vhost_dev *dev = container_of(listener, struct vhost_dev,
> > > -                                         memory_listener);
> > > -    dev->mem_changed_end_addr = 0;
> > > -    dev->mem_changed_start_addr = -1;
> > > -}
> > > -
> > >  struct vhost_update_mem_tmp {
> > >      struct vhost_dev   *dev;
> > >      uint32_t nregions;
> > >      struct vhost_memory_region *regions;
> > >  };
> > >  
> > > +
> > >  /* Called for each MRS from vhost_update_mem */
> > >  static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
> > >  {
> > >      struct vhost_update_mem_tmp *vtmp = opaque;
> > > +    struct vhost_dev   *dev = vtmp->dev;
> > >      struct vhost_memory_region *cur_vmr;
> > >      bool need_add = true;
> > >      uint64_t mrs_size;
> > > @@ -416,8 +421,7 @@ static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
> > >      mrs_host = (uintptr_t)memory_region_get_ram_ptr(mrs->mr) +
> > >                           mrs->offset_within_region;
> > >  
> > > -    trace_vhost_update_mem_cb(mrs->mr->name, mrs_gpa, mrs_size,
> > > -                              (uint64_t)mrs_host);
> > > +    trace_vhost_update_mem_cb(mrs->mr->name, mrs_gpa, mrs_size, mrs_host);
> > >  
> > >      if (vtmp->nregions) {
> > >          /* Since we already have at least one region, lets see if
> > > @@ -459,85 +463,19 @@ static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
> > >          cur_vmr->flags_padding = 0;
> > >      }
> > >  
> > > -    return 0;
> > > -}
> > > -
> > > -static bool vhost_update_compare_list(struct vhost_dev *dev,
> > > -                                      struct vhost_update_mem_tmp *vtmp,
> > > -                                      hwaddr *change_start, hwaddr *change_end)
> > > -{
> > > -    uint32_t oldi, newi;
> > > -    *change_start = 0;
> > > -    *change_end = 0;
> > > -
> > > -    for (newi = 0, oldi = 0; newi < vtmp->nregions; newi++) {
> > > -        struct vhost_memory_region *newr = &vtmp->regions[newi];
> > > [...]
> > > @@ -545,24 +483,17 @@ static int vhost_update_mem(struct vhost_dev *dev)
> > >          goto out;
> > >      }
> > >  
> > > -    need_update = vhost_update_compare_list(dev, &vtmp,
> > > -                                            &change_start, &change_end);
> > > -    trace_vhost_update_mem_comparison(need_update,
> > > -                                      (uint64_t)change_start,
> > > -                                      (uint64_t)change_end);
> > > -    if (need_update) {
> > > -        /* Update the main regions list from our tmp */
> > > -        size_t mem_size = offsetof(struct vhost_memory, regions) +
> > > -            (vtmp.nregions + 1) * sizeof dev->mem->regions[0];
> > > +    mem_size = offsetof(struct vhost_memory, regions) +
> > > +               (vtmp.nregions + 1) * sizeof dev->mem->regions[0];
> > >  
> > > +    if (vtmp.nregions != dev->mem->nregions ||
> > > +        memcmp(vtmp.regions, dev->mem->regions,
> > > +               vtmp.nregions * sizeof dev->mem->regions[0])) {
> > > +        /* Update the main regions list from our tmp */
> > >          dev->mem = g_realloc(dev->mem, mem_size);
> > >          dev->mem->nregions = vtmp.nregions;
> > >          memcpy(dev->mem->regions, vtmp.regions,
> > >                 vtmp.nregions * sizeof dev->mem->regions[0]);
> > >          used_memslots = vtmp.nregions;
> > > -
> > > -        dev->mem_changed_start_addr = change_start;
> > > -        dev->mem_changed_end_addr = change_end;
> > >      }
> > >  
> > >  out:
> > > @@ -574,8 +505,6 @@ static void vhost_commit(MemoryListener *listener)
> > >  {
> > >      struct vhost_dev *dev = container_of(listener, struct vhost_dev,
> > >                                           memory_listener);
> > > -    hwaddr start_addr = 0;
> > > -    ram_addr_t size = 0;
> > >      uint64_t log_size;
> > >      int r;
> > >  
> > > @@ -585,22 +514,11 @@ static void vhost_commit(MemoryListener *listener)
> > >      if (!dev->started) {
> > >          return;
> > >      }
> > > -    if (dev->mem_changed_start_addr > dev->mem_changed_end_addr) {
> > > -        return;
> > > -    }  
> > 
> > This was in the wrong place in the version I posted; it should have
> > been after the vhost_update_mem call.  I think we still need something
> > telling us whether vhost_update_mem found a change, because we want to
> > skip the rest of vhost_commit if there's no change at all.
> maybe cache memcmp result in dev->memory_changed

OK, let me see, I'll post a new version of my set, having tried to merge
this in.
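
Roughly, a toy sketch of that suggestion (simplified stand-in types,
not the real QEMU structures; struct vhost_dev does already carry a
bool memory_changed field that could hold the flag):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* simplified stand-ins for the QEMU/vhost types */
struct region { uint64_t gpa, size, hva; };
struct dev {
    uint32_t nregions;
    struct region *regions;
    bool memory_changed;
};

/* update side: remember whether the freshly built table (n, r)
 * differs from the current one */
static void update_mem(struct dev *d, uint32_t n, const struct region *r)
{
    d->memory_changed = n != d->nregions ||
                        memcmp(r, d->regions, n * sizeof *r) != 0;
    if (d->memory_changed) {
        /* ...copy the new table over d->regions as the patch does... */
    }
}

/* commit side: the cached flag replaces the old
 * mem_changed_start_addr/mem_changed_end_addr test */
static void commit(struct dev *d)
{
    if (!d->memory_changed) {
        return;     /* nothing changed: skip set_mem_table and the rest */
    }
    /* ...push d->regions to the backend... */
}

That keeps the comparison in one place and makes the early exit from
the commit path explicit.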

Thanks,

Dave

> > 
> > >      if (vhost_update_mem(dev)) {
> > >          return;
> > >      }
> > >  
> > > -    if (dev->started) {
> > > -        start_addr = dev->mem_changed_start_addr;
> > > -        size = dev->mem_changed_end_addr - dev->mem_changed_start_addr + 1;
> > > -
> > > -        r = vhost_verify_ring_mappings(dev, start_addr, size);
> > > -        assert(r >= 0);
> > > -    }
> > > -
> > >      if (!dev->log_enabled) {
> > >          r = dev->vhost_ops->vhost_set_mem_table(dev, dev->mem);
> > >          if (r < 0) {
> > > @@ -1235,7 +1153,6 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
> > >      hdev->features = features;
> > >  
> > >      hdev->memory_listener = (MemoryListener) {
> > > -        .begin = vhost_begin,
> > >          .commit = vhost_commit,
> > >          .region_add = vhost_region_add,
> > >          .region_del = vhost_region_del,
> > > @@ -1458,7 +1375,6 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
> > >      /* should only be called after backend is connected */
> > >      assert(hdev->vhost_ops);
> > >  
> > > -    hdev->started = true;
> > >      hdev->vdev = vdev;
> > >  
> > >      r = vhost_dev_set_features(hdev, hdev->log_enabled);
> > > @@ -1469,6 +1385,9 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
> > >      if (vhost_update_mem(hdev)) {
> > >          goto fail_mem;
> > >      }
> > > +
> > > +    hdev->started = true;
> > > +  
> > 
> > Ah, good spot.
> > 
> > Dave
> > 
> > >      if (vhost_dev_has_iommu(hdev)) {
> > >          memory_listener_register(&hdev->iommu_listener, vdev->dma_as);
> > >      }
> > > -- 
> > > 2.7.4
> > >   
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [RFC] vhost: check if ring mapping is still valid when building memmap
  2017-12-01 14:22 ` [Qemu-devel] [RFC] vhost: check if ring mapping is still valid when building memmap Igor Mammedov
  2017-12-07 18:17   ` Dr. David Alan Gilbert
@ 2017-12-08 20:45   ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert @ 2017-12-08 20:45 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, mst, groug, stefanha

* Igor Mammedov (imammedo@redhat.com) wrote:
> vhost_verify_ring_mappings() was used to verify that
> rings are still accessible and related memory hasn't
> been moved after the flatview is updated.

I think I've rolled the equivalent into my v2 I've just
posted; please have a look.
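
For reference, the core check the patch below introduces condenses to
a predicate like this (a sketch with simplified types, and with the
early-out overlap test written the way the discussion further down
agrees it should read):

#include <stdbool.h>
#include <stdint.h>

static bool ring_unmoved(uint64_t ring_gpa, uint64_t ring_size,
                         uintptr_t ring_hva,
                         uint64_t reg_gpa, uint64_t reg_size,
                         uintptr_t reg_hva)
{
    uint64_t ring_last = ring_gpa + ring_size - 1;
    uint64_t reg_last = reg_gpa + reg_size - 1;

    if (ring_last < reg_gpa || ring_gpa > reg_last) {
        return true;    /* ring not in this region: nothing to check */
    }
    if (ring_last > reg_last) {
        return false;   /* ring no longer fully covered: the -EBUSY case */
    }
    /* same MemoryRegion in the same place iff the ring's offset into
     * the region is the same in GPA and HVA space; the -ENOMEM case
     * otherwise */
    return ring_hva == reg_hva + (ring_gpa - reg_gpa);
}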

Dave

> It was doing the check by mapping the ring's GPA+len and
> checking that the HVA hadn't changed with the new memory map.
> To avoid a possibly expensive mapping call, we were
> identifying the address range that changed and doing the
> mapping only if the ring was in the changed range.
> 
> However it's not necessary to perform the ring's GPA
> mapping as we already have its current HVA and all
> we need is to verify that the ring's GPA translates to
> the same HVA in the updated flatview.
> 
> That can be done while we are building the vhost
> memmap, where vhost_update_mem_cb() already maps
> every RAM MemoryRegionSection to get the section HVA. This
> fact can be used to check that the ring belongs to the same
> MemoryRegion in the same place, like this:
> 
>   hva_ring_offset = GPA(ring) - GPA(MemoryRegionSection)
>   ring_hva == HVA(MemoryRegionSection) + hva_ring_offset
> 
> Doing this would allow us to drop ~100 LOC of code which
> figures out the changed memory range, and avoid calling the
> unneeded reverse vhost_memory_map(is_write=true), which is
> overkill for the task at hand.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> PS:
>    should be applied on top of David's series
> 
> ---
>  include/hw/virtio/vhost.h |   2 -
>  hw/virtio/vhost.c         | 195 ++++++++++++++--------------------------------
>  2 files changed, 57 insertions(+), 140 deletions(-)
> 
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 467dc77..fc4af5c 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -68,8 +68,6 @@ struct vhost_dev {
>      uint64_t log_size;
>      Error *migration_blocker;
>      bool memory_changed;
> -    hwaddr mem_changed_start_addr;
> -    hwaddr mem_changed_end_addr;
>      const VhostOps *vhost_ops;
>      void *opaque;
>      struct vhost_log *log;
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 5b9a7d7..026bac3 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -296,35 +296,43 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer,
>      }
>  }
>  
> -static int vhost_verify_ring_part_mapping(struct vhost_dev *dev,
> -                                          void *part,
> -                                          uint64_t part_addr,
> -                                          uint64_t part_size,
> -                                          uint64_t start_addr,
> -                                          uint64_t size)
> +static int vhost_verify_ring_part_mapping(void *ring_hva,
> +                                          uint64_t ring_gpa,
> +                                          uint64_t ring_size,
> +                                          void *reg_hva,
> +                                          uint64_t reg_gpa,
> +                                          uint64_t reg_size)
>  {
> -    hwaddr l;
> -    void *p;
> -    int r = 0;
> +    uint64_t hva_ring_offset;
> +    uint64_t ring_last = range_get_last(ring_gpa, ring_size);
> +    uint64_t reg_last = range_get_last(reg_gpa, reg_size);
>  
> -    if (!ranges_overlap(start_addr, size, part_addr, part_size)) {
> +    /* start check from the end so that the rest of checks
> +     * are done when whole ring is in merged sections range
> +     */
> +    if (ring_last < reg_last || ring_gpa > reg_last) {
>          return 0;
>      }
> -    l = part_size;
> -    p = vhost_memory_map(dev, part_addr, &l, 1);
> -    if (!p || l != part_size) {
> -        r = -ENOMEM;
> +
> +    /* check that the whole ring is mapped */
> +    if (range_get_last(ring_gpa, ring_size) >
> +        range_get_last(reg_gpa, reg_size)) {
> +        return -EBUSY;
>      }
> -    if (p != part) {
> -        r = -EBUSY;
> +
> +    /* check that ring's MemoryRegion wasn't replaced */
> +    hva_ring_offset = ring_gpa - reg_gpa;
> +    if (ring_hva != reg_hva + hva_ring_offset) {
> +        return -ENOMEM;
>      }
> -    vhost_memory_unmap(dev, p, l, 0, 0);
> -    return r;
> +
> +    return 0;
>  }
>  
>  static int vhost_verify_ring_mappings(struct vhost_dev *dev,
> -                                      uint64_t start_addr,
> -                                      uint64_t size)
> +                                      void *reg_hva,
> +                                      uint64_t reg_gpa,
> +                                      uint64_t reg_size)
>  {
>      int i, j;
>      int r = 0;
> @@ -338,23 +346,26 @@ static int vhost_verify_ring_mappings(struct vhost_dev *dev,
>          struct vhost_virtqueue *vq = dev->vqs + i;
>  
>          j = 0;
> -        r = vhost_verify_ring_part_mapping(dev, vq->desc, vq->desc_phys,
> -                                           vq->desc_size, start_addr, size);
> -        if (!r) {
> +        r = vhost_verify_ring_part_mapping(
> +                vq->desc, vq->desc_phys, vq->desc_size,
> +                reg_hva, reg_gpa, reg_size);
> +        if (r) {
>              break;
>          }
>  
>          j++;
> -        r = vhost_verify_ring_part_mapping(dev, vq->avail, vq->avail_phys,
> -                                           vq->avail_size, start_addr, size);
> -        if (!r) {
> +        r = vhost_verify_ring_part_mapping(
> +                vq->avail, vq->avail_phys, vq->avail_size,
> +                reg_hva, reg_gpa, reg_size);
> +        if (r) {
>              break;
>          }
>  
>          j++;
> -        r = vhost_verify_ring_part_mapping(dev, vq->used, vq->used_phys,
> -                                           vq->used_size, start_addr, size);
> -        if (!r) {
> +        r = vhost_verify_ring_part_mapping(
> +                vq->used, vq->used_phys, vq->used_size,
> +                reg_hva, reg_gpa, reg_size);
> +        if (r) {
>              break;
>          }
>      }
> @@ -384,24 +395,18 @@ static bool vhost_section(MemoryRegionSection *section)
>      return result;
>  }
>  
> -static void vhost_begin(MemoryListener *listener)
> -{
> -    struct vhost_dev *dev = container_of(listener, struct vhost_dev,
> -                                         memory_listener);
> -    dev->mem_changed_end_addr = 0;
> -    dev->mem_changed_start_addr = -1;
> -}
> -
>  struct vhost_update_mem_tmp {
>      struct vhost_dev   *dev;
>      uint32_t nregions;
>      struct vhost_memory_region *regions;
>  };
>  
> +
>  /* Called for each MRS from vhost_update_mem */
>  static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
>  {
>      struct vhost_update_mem_tmp *vtmp = opaque;
> +    struct vhost_dev   *dev = vtmp->dev;
>      struct vhost_memory_region *cur_vmr;
>      bool need_add = true;
>      uint64_t mrs_size;
> @@ -416,8 +421,7 @@ static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
>      mrs_host = (uintptr_t)memory_region_get_ram_ptr(mrs->mr) +
>                           mrs->offset_within_region;
>  
> -    trace_vhost_update_mem_cb(mrs->mr->name, mrs_gpa, mrs_size,
> -                              (uint64_t)mrs_host);
> +    trace_vhost_update_mem_cb(mrs->mr->name, mrs_gpa, mrs_size, mrs_host);
>  
>      if (vtmp->nregions) {
>          /* Since we already have at least one region, lets see if
> @@ -459,85 +463,19 @@ static int vhost_update_mem_cb(MemoryRegionSection *mrs, void *opaque)
>          cur_vmr->flags_padding = 0;
>      }
>  
> -    return 0;
> -}
> -
> -static bool vhost_update_compare_list(struct vhost_dev *dev,
> -                                      struct vhost_update_mem_tmp *vtmp,
> -                                      hwaddr *change_start, hwaddr *change_end)
> -{
> -    uint32_t oldi, newi;
> -    *change_start = 0;
> -    *change_end = 0;
> -
> -    for (newi = 0, oldi = 0; newi < vtmp->nregions; newi++) {
> -        struct vhost_memory_region *newr = &vtmp->regions[newi];
> -        hwaddr newr_last = range_get_last(newr->guest_phys_addr,
> -                                          newr->memory_size);
> -        trace_vhost_update_compare_list_loopn(newi, oldi,
> -                                        newr->guest_phys_addr,
> -                                        newr->memory_size);
> -        bool whole_change = true;
> -        while (oldi < dev->mem->nregions) {
> -            struct vhost_memory_region *oldr = &dev->mem->regions[oldi];
> -            hwaddr oldr_last = range_get_last(oldr->guest_phys_addr,
> -                                          oldr->memory_size);
> -            trace_vhost_update_compare_list_loopo(newi, oldi,
> -                                        oldr->guest_phys_addr,
> -                                        oldr->memory_size);
> -            if (newr->guest_phys_addr == oldr->guest_phys_addr &&
> -                newr->memory_size == oldr->memory_size) {
> -                /* Match in GPA and size, but it could be different
> -                 * in host address or flags
> -                 */
> -                whole_change = newr->userspace_addr != oldr->userspace_addr ||
> -                               newr->flags_padding != oldr->flags_padding;
> -                oldi++;
> -                break; /* inner old loop */
> -            }
> -            /* There's a difference - need to figure out what */
> -            if (oldr_last < newr->guest_phys_addr) {
> -                /* There used to be a region before us that's gone */
> -                *change_start = MIN(*change_start, oldr->guest_phys_addr);
> -                *change_end = MAX(*change_end, oldr_last);
> -                oldi++;
> -                continue; /* inner old loop */
> -            }
> -            if (oldr->guest_phys_addr > newr_last) {
> -                /* We've passed all the old mappings that could have overlapped
> -                 * this one
> -                 */
> -                break; /* Inner old loop */
> -            }
> -            /* Overlap case */
> -            *change_start = MIN(*change_start,
> -                                MIN(oldr->guest_phys_addr,
> -                                    newr->guest_phys_addr));
> -            *change_end = MAX(*change_end,
> -                                MAX(oldr_last, newr_last));
> -            whole_change = false;
> -            /* There might be more old mappings that overlap */
> -            oldi++;
> -        }
> -        if (whole_change) {
> -            /* No old region to compare against, this must be a change */
> -            *change_start = MIN(*change_start, newr->guest_phys_addr);
> -            *change_end = MAX(*change_end, newr_last);
> -        }
> +    if (dev->started &&
> +        vhost_verify_ring_mappings(dev, (void *)mrs_host, mrs_gpa, mrs_size)) {
> +        abort();
>      }
>  
> -    return *change_start || *change_end;
> +    return 0;
>  }
>  
>  static int vhost_update_mem(struct vhost_dev *dev)
>  {
>      int res;
> -    struct vhost_update_mem_tmp vtmp;
> -    hwaddr change_start, change_end;
> -    bool need_update;
> -    vtmp.regions = 0;
> -    vtmp.nregions = 0;
> -    vtmp.dev = dev;
> +    size_t mem_size;
> +    struct vhost_update_mem_tmp vtmp = {.dev = dev};
>  
>      res = address_space_iterate(&address_space_memory,
>                                  vhost_update_mem_cb, &vtmp);
> @@ -545,24 +483,17 @@ static int vhost_update_mem(struct vhost_dev *dev)
>          goto out;
>      }
>  
> -    need_update = vhost_update_compare_list(dev, &vtmp,
> -                                            &change_start, &change_end);
> -    trace_vhost_update_mem_comparison(need_update,
> -                                      (uint64_t)change_start,
> -                                      (uint64_t)change_end);
> -    if (need_update) {
> -        /* Update the main regions list from our tmp */
> -        size_t mem_size = offsetof(struct vhost_memory, regions) +
> -            (vtmp.nregions + 1) * sizeof dev->mem->regions[0];
> +    mem_size = offsetof(struct vhost_memory, regions) +
> +               (vtmp.nregions + 1) * sizeof dev->mem->regions[0];
>  
> +    if (vtmp.nregions != dev->mem->nregions ||
> +        memcmp(vtmp.regions, dev->mem->regions,
> +               vtmp.nregions * sizeof dev->mem->regions[0])) {
> +        /* Update the main regions list from our tmp */
>          dev->mem = g_realloc(dev->mem, mem_size);
>          dev->mem->nregions = vtmp.nregions;
>          memcpy(dev->mem->regions, vtmp.regions,
>                 vtmp.nregions * sizeof dev->mem->regions[0]);
>          used_memslots = vtmp.nregions;
> -
> -        dev->mem_changed_start_addr = change_start;
> -        dev->mem_changed_end_addr = change_end;
>      }
>  
>  out:
> @@ -574,8 +505,6 @@ static void vhost_commit(MemoryListener *listener)
>  {
>      struct vhost_dev *dev = container_of(listener, struct vhost_dev,
>                                           memory_listener);
> -    hwaddr start_addr = 0;
> -    ram_addr_t size = 0;
>      uint64_t log_size;
>      int r;
>  
> @@ -585,22 +514,11 @@ static void vhost_commit(MemoryListener *listener)
>      if (!dev->started) {
>          return;
>      }
> -    if (dev->mem_changed_start_addr > dev->mem_changed_end_addr) {
> -        return;
> -    }
>  
>      if (vhost_update_mem(dev)) {
>          return;
>      }
>  
> -    if (dev->started) {
> -        start_addr = dev->mem_changed_start_addr;
> -        size = dev->mem_changed_end_addr - dev->mem_changed_start_addr + 1;
> -
> -        r = vhost_verify_ring_mappings(dev, start_addr, size);
> -        assert(r >= 0);
> -    }
> -
>      if (!dev->log_enabled) {
>          r = dev->vhost_ops->vhost_set_mem_table(dev, dev->mem);
>          if (r < 0) {
> @@ -1235,7 +1153,6 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
>      hdev->features = features;
>  
>      hdev->memory_listener = (MemoryListener) {
> -        .begin = vhost_begin,
>          .commit = vhost_commit,
>          .region_add = vhost_region_add,
>          .region_del = vhost_region_del,
> @@ -1458,7 +1375,6 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
>      /* should only be called after backend is connected */
>      assert(hdev->vhost_ops);
>  
> -    hdev->started = true;
>      hdev->vdev = vdev;
>  
>      r = vhost_dev_set_features(hdev, hdev->log_enabled);
> @@ -1469,6 +1385,9 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
>      if (vhost_update_mem(hdev)) {
>          goto fail_mem;
>      }
> +
> +    hdev->started = true;
> +
>      if (vhost_dev_has_iommu(hdev)) {
>          memory_listener_register(&hdev->iommu_listener, vdev->dma_as);
>      }
> -- 
> 2.7.4
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [RFC] vhost: check if ring mapping is still valid when building memmap
  2017-12-08 17:51       ` Dr. David Alan Gilbert
@ 2017-12-11  9:37         ` Igor Mammedov
  2017-12-11 11:03           ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 34+ messages in thread
From: Igor Mammedov @ 2017-12-11  9:37 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel, mst, groug, stefanha

On Fri, 8 Dec 2017 17:51:56 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Igor Mammedov (imammedo@redhat.com) wrote:
> > On Thu, 7 Dec 2017 18:17:51 +0000
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> >   
> > > * Igor Mammedov (imammedo@redhat.com) wrote:  
> > > > > [...]
> > > 
> > > There are a few things about this I don't understand; however if
> > > it's right I agree that it means we can kill off my comparison
> > > code.
> > > 
> > > 
> > > First, can I check what constraints 'verify_ring' needs to check;
> > > if I'm understanding the original code it's checking that:
> > >     a) If a queue falls within a region, that the whole queue can
> > >        be mapped  
> >          see vhost_verify_ring_part_mapping():
> > 
> >              p = vhost_memory_map(dev, part_addr, &l, 1);                                 
> >              if (!p || l != part_size) 
> >                   error_out
> >              
> >          1st: (!p) requested part_addr must be mapped
> >               i.e. be a part of MemorySection in flatview
> >          AND
> >          2nd: (l != part_size) mapped range size must match what we asked for
> >               i.e. mapped as a contiguous range so that
> >                  [GPA, GPA + part_size) could directly correspond to [HVA, HVA + part_size)
> >               and that is possible only within 1 MemorySection && 1 MemoryRegion
> >               if I read address_space_map() correctly
> >                      flatview_translate() -> GPA's MemorySection
> >                      flatview_extend_translation() -> 1:1 GPA->HVA range size
> >               
> >          So the answer, in the case of the RAM memory regions we are interested in, would be:
> >          the queue falls within a MemorySection and the whole queue fits into it
> 
> Yes, OK.
> 
> > >     b) And the start of the queue corresponds to where we thought
> > >        it used to be (in GPA?)  
> >          that the queue HVA cached at vq->desc hasn't changed after the flatview change
> >             if (p != part)
> >                error_out  
> 
> OK, so it's the HVA that must not have changed, based on taking the
> part_addr (which is a GPA) and checking the map?
Yes, I think so.

> > >    so that doesn't have any constraint on the ordering of the regions
> > >    or whether the region is the same size or anything.
> > >   Also I don't think it would spot if there was a queue that had no
> > >   region associated with it at all.
> > > 
> > > Now, comments on your code below:
> > > 
> > >   
> > > > [...]
> > > > +    uint64_t hva_ring_offset;
> > > > +    uint64_t ring_last = range_get_last(ring_gpa, ring_size);
> > > > +    uint64_t reg_last = range_get_last(reg_gpa, reg_size);
> > > >  
> > > > -    if (!ranges_overlap(start_addr, size, part_addr, part_size)) {
> > > > +    /* start check from the end so that the rest of checks
> > > > +     * are done when whole ring is in merged sections range
> > > > +     */
> > > > +    if (ring_last < reg_last || ring_gpa > reg_last) {
> > > >          return 0;
> > > >      }    
> > > 
> > >   so does that mean if our region never grows to be as big as the ring
> > > we won't spot the problem?
> > I think there is a mistake here; it should be:
> >    ring_last < reg_gpa || ring_gpa > reg_last
> > 
> > so if the ring is outside of the contiguous region, we do not care => no change
> 
> OK, I don't see how that corresponds to your 'start check from the end'
> comment - I thought it was doing something smarter to deal with this
> being called from the _cb routine potentially before another part of the
> range had been joined onto it.
> In that case, we can just use ranges_overlap like the original
> routine did.
I suppose a range check will do as well; the reason for checking from the end
was an optimization to make fewer checks than ranges_overlap(),
considering we are comparing against a vhost_memory_region.

But it seems like a bit of overkill, since by commit time the
flatview would contain a 1:1 mapping of MemorySection:vhost_memory_region
(modulo sections we are not interested in). It should be unlikely to
get 2 MemoryRegions allocated one after another and merged
into one vhost_memory_region. Maybe we would be better off just
copying MemorySections into vhost_memory_regions in vhost_update_mem_cb()
and dropping the merging altogether.
I'd even consider 'non-deterministic' merging of 2 MemoryRegions harmful
to migration, since vhost might see a different memory map on the target vs the source.
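
As a toy model of the two strategies (simplified types, not the QEMU
structures), the contiguity test below is the merge; deleting it gives
the deterministic one-section-one-region table suggested above:

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct region { uint64_t gpa, size; uintptr_t hva; };
struct table { uint32_t n; struct region *r; };

static void add_section(struct table *t, uint64_t gpa, uint64_t size,
                        uintptr_t hva)
{
    struct region *prev = t->n ? &t->r[t->n - 1] : NULL;

    /* merge only when both GPA and HVA continue exactly where the
     * previous section ended; whether this fires depends on where
     * the allocator happened to place the blocks, hence the
     * determinism concern */
    if (prev && gpa == prev->gpa + prev->size &&
        hva == prev->hva + prev->size) {
        prev->size += size;
        return;
    }

    t->r = realloc(t->r, (t->n + 1) * sizeof *t->r);
    assert(t->r);
    t->r[t->n++] = (struct region){ .gpa = gpa, .size = size, .hva = hva };
}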

[...]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [RFC] vhost: check if ring mapping is still valid when building memmap
  2017-12-11  9:37         ` Igor Mammedov
@ 2017-12-11 11:03           ` Dr. David Alan Gilbert
  2017-12-11 13:45             ` Igor Mammedov
  0 siblings, 1 reply; 34+ messages in thread
From: Dr. David Alan Gilbert @ 2017-12-11 11:03 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, mst, groug, stefanha

* Igor Mammedov (imammedo@redhat.com) wrote:
> On Fri, 8 Dec 2017 17:51:56 +0000
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > [...]
> > 
> > OK, I don't see how that corresponds to your 'start check from the end'
> > comment - I thought it was doing something smarter to deal with this
> > being called from the _cb routine potentially before another part of the
> > range had been joined onto it.
> > In that case, we can just use ranges_overlap like the original
> > routine did.
> I suppose a range check will do as well; the reason for checking from the end
> was an optimization to make fewer checks than ranges_overlap(),
> considering we are comparing against a vhost_memory_region.

Have a look at the RFC v2 I've posted; I've reworked it a bit so
we call this code afterwards rather than from the _cb.
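
Roughly, the reworked shape is (a sketch reusing the names from the
posted patches, not the actual v2 diff):

/* after address_space_iterate() has filled vtmp, walk the finished
 * table so merged regions are checked whole rather than per section */
static int vhost_verify_rings(struct vhost_dev *dev,
                              struct vhost_update_mem_tmp *vtmp)
{
    uint32_t i;

    for (i = 0; i < vtmp->nregions; i++) {
        struct vhost_memory_region *reg = &vtmp->regions[i];

        if (dev->started &&
            vhost_verify_ring_mappings(dev,
                                       (void *)(uintptr_t)reg->userspace_addr,
                                       reg->guest_phys_addr,
                                       reg->memory_size)) {
            return -1;      /* a ring moved or is no longer mapped */
        }
    }
    return 0;
}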

> But it seems like a bit of overkill, since by commit time the
> flatview would contain a 1:1 mapping of MemorySection:vhost_memory_region
> (modulo sections we are not interested in). It should be unlikely to
> get 2 MemoryRegions allocated one after another and merged
> into one vhost_memory_region. Maybe we would be better off just
> copying MemorySections into vhost_memory_regions in vhost_update_mem_cb()
> and dropping the merging altogether.

Except I need the merging for the hugepage stuff that's coming in the
postcopy world.
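
(For the record: postcopy has to place whole host pages, so with a
hugepage backing the regions need growing out to host-page boundaries,
and two sections sharing a huge page must land in one region.  A
sketch, using QEMU's alignment macros and a hypothetical
huge_page_size:

    uint64_t start = QEMU_ALIGN_DOWN(mrs_gpa, huge_page_size);
    uint64_t size  = QEMU_ALIGN_UP(mrs_gpa + mrs_size, huge_page_size)
                     - start;

which only produces a sane table if adjacent sections inside the same
huge page get merged into a single region.)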

> I'd even consider 'non-deterministic' merging of 2 MemoryRegions harmful
> to migration, since vhost might see a different memory map on the target vs the source.

I think that's come up before and it was decided it's not a problem
as long as the regions are still mapped; the only consistency required
is between the qemu and the vhost (either the kernel or the vhost-user);
it shouldn't be a guest-visible issue if I understand it correctly.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [RFC] vhost: check if ring mapping is still valid when building memmap
  2017-12-11 11:03           ` Dr. David Alan Gilbert
@ 2017-12-11 13:45             ` Igor Mammedov
  2017-12-11 15:43               ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 34+ messages in thread
From: Igor Mammedov @ 2017-12-11 13:45 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel, mst, groug, stefanha

On Mon, 11 Dec 2017 11:03:00 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Igor Mammedov (imammedo@redhat.com) wrote:
> > On Fri, 8 Dec 2017 17:51:56 +0000
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> >   
> > > [...]
> > > 
> > > OK, I don't see how that corresponds to your 'start check from the end'
> > > comment - I thought it was doing something smarter to deal with this
> > > being called from the _cb routine potentially before another part of the
> > > range had been joined onto it.
> > > In that case, we can just use ranges_overlap like the original
> > > routine did.  
> > I suppose a range check will do as well; the reason for checking from the end
> > was an optimization to make fewer checks than ranges_overlap(),
> > considering we are comparing against a vhost_memory_region.
> 
> Have a look at the RFC v2 I've posted; I've reworked it a bit so
> we call this code afterwards rather than from the _cb.
I've skimmed through it already and it looks good to me at first glance
/it needs spelling fixes at least/
I'll do line by line review and testing this week

> 
> > But it seems like a bit of overkill, since by commit time the
> > flatview would contain a 1:1 mapping of MemorySection:vhost_memory_region
> > (modulo sections we are not interested in). It should be unlikely to
> > get 2 MemoryRegions allocated one after another and merged
> > into one vhost_memory_region. Maybe we would be better off just
> > copying MemorySections into vhost_memory_regions in vhost_update_mem_cb()
> > and dropping the merging altogether.
> 
> Except I need the merging for the hugepage stuff that's coming in the
> postcopy world.
> 
> > I'd even consider 'non-deterministic' merging of 2 MemoryRegions harmful
> > to migration, since vhost might see a different memory map on the target vs the source.
> 
> I think that's come up before and it was decided it's not a problem
> as long as the regions are still mapped; the only consistency required
> is between the qemu and the vhost (either the kernel or the vhost-user);
> it shouldn't be a guest-visible issue if I understand it correctly.
Non-consistent merging will cause a change in the number of memory regions
in the vhost memmap, and even if it's not visible to guests it's still
a migration blocker in borderline cases where the number of regions
is around vhost_backend_memslots_limit(): i.e. the source QEMU starts
fine by luck (merging happens) but the target fails as it hits the limit (merging fails).

Anyway, it's a problem that exists in the current code as well and
is not related to this series, so we can ponder it later
(perhaps we don't see the issue because merging doesn't happen
in practice).
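
For reference, the check that bites in that borderline case is the one
vhost_dev_init() already makes (paraphrased, so the exact wording may
differ; see hw/virtio/vhost.c):

    if (used_memslots >
        hdev->vhost_ops->vhost_backend_memslots_limit(hdev)) {
        error_report("vhost backend memory slots limit is less"
                     " than the number of present memory slots");
        r = -1;
        goto fail;
    }

If merging fired on the source but not on the target, used_memslots can
differ between the two sides, and this can pass on one and fail on the
other.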

 
> Dave
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [RFC] vhost: check if ring mapping is still valid when building memmap
  2017-12-11 13:45             ` Igor Mammedov
@ 2017-12-11 15:43               ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 34+ messages in thread
From: Dr. David Alan Gilbert @ 2017-12-11 15:43 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, mst, groug, stefanha

* Igor Mammedov (imammedo@redhat.com) wrote:
> On Mon, 11 Dec 2017 11:03:00 +0000
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Igor Mammedov (imammedo@redhat.com) wrote:
> > > On Fri, 8 Dec 2017 17:51:56 +0000
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > >   
> > > > * Igor Mammedov (imammedo@redhat.com) wrote:  
> > > > > On Thu, 7 Dec 2017 18:17:51 +0000
> > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > >     
> > > > > > * Igor Mammedov (imammedo@redhat.com) wrote:    
> > > > > > > vhost_verify_ring_mappings() were used to verify that
> > > > > > > rings are still accessible and related memory hasn't
> > > > > > > been moved after flatview is updated.
> > > > > > > 
> > > > > > > It were doing checks by mapping ring's GPA+len and
> > > > > > > checking that HVA hasn't changed with new memory map.
> > > > > > > To avoid maybe expensive mapping call, we were
> > > > > > > identifying address range that changed and were doing
> > > > > > > mapping only if ring were in changed range.
> > > > > > > 
> > > > > > > However it's no neccessary to perform ringi's GPA
> > > > > > > mapping as we already have it's current HVA and all
> > > > > > > we need is to verify that ring's GPA translates to
> > > > > > > the same HVA in updated flatview.
> > > > > > > 
> > > > > > > That could be done during time when we are building
> > > > > > > vhost memmap where vhost_update_mem_cb() already maps
> > > > > > > every RAM MemoryRegionSection to get section HVA. This
> > > > > > > fact can be used to check that ring belongs to the same
> > > > > > > MemoryRegion in the same place, like this:
> > > > > > > 
> > > > > > >   hva_ring_offset = GPA(ring) - GPA(MemoryRegionSection)
> > > > > > >   ring_hva == HVA(MemoryRegionSection) + hva_ring_offset
> > > > > > > 
> > > > > > > Doing this would allow us to drop ~100LOC of code which
> > > > > > > figures out changed memory range and avoid calling not
> > > > > > > needed reverse vhost_memory_map(is_write=true) which is
> > > > > > > overkill for the task at hand.      
> > > > > > 
> > > > > > There are a few things about this I don't understand; however if
> > > > > > it's right I agree that it means we can kill off my comparison
> > > > > > code.
> > > > > > 
> > > > > > 
> > > > > > First, can I check what constraints 'verify_ring' needs to check;
> > > > > > if I'm understanding the original code it's checking that :
> > > > > >     a) If a queue falls within a region, that the whole queue can
> > > > > >        be mapped    
> > > > >          see vhost_verify_ring_part_mapping():
> > > > > 
> > > > >              p = vhost_memory_map(dev, part_addr, &l, 1);                                 
> > > > >              if (!p || l != part_size) 
> > > > >                   error_out
> > > > >              
> > > > >          1st: (!p) requested part_addr must be mapped
> > > > >               i.e. be a part of MemorySection in flatview
> > > > >          AND
> > > > >          2nd: (l != part_size) mapped range size must match what we asked for
> > > > >               i.e. mapped as continuous range so that
> > > > >                  [GPA, GPA + part_size) could directly correspond to [HVA, HVA + part_size)
> > > > >               and that's is possible only withing 1 MemorySection && 1 MeoryRegion
> > > > >               if I read address_space_map() correctly
> > > > >                      flatview_translate() -> GPA's MemorySection
> > > > >                      flatview_extend_translation() -> 1:1 GPA->HVA range size
> > > > >               
> > > > >          So answer in case of RAM memory region that we are interested in, would be:
> > > > >          queue falls within a MemorySection and whole queue fits in to it    
> > > > 
> > > > Yes, OK.
> > > >   
> > > > > >     b) And the start of the queue corresponds to where we thought
> > > > > >        it used to be (in GPA?)    
> > > > >          that cached at vq->desc queue HVA hasn't changed after flatview change
> > > > >             if (p != part)
> > > > >                error_out    
> > > > 
> > > > OK, so it's the HVA that's not changed based on taking the part_addr
> > > > which is GPA and checking the map?  
> > > Yes, I think so.
> > >   
> > > > > >    so that doesn't have any constraint on the ordering of the regions
> > > > > >    or whether the region is the same size or anything.
> > > > > >   Also I don't think it would spot if there was a qeueue that had no
> > > > > >   region associated with it at all.
> > > > > > 
> > > > > > Now, comments on your code below:
> > > > > > 
> > > > > >     
> > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > > > > > ---
> > > > > > > PS:
> > > > > > >    should be applied on top of David's series
> > > > > > > 
> > > > > > > ---
> > > > > > >  include/hw/virtio/vhost.h |   2 -
> > > > > > >  hw/virtio/vhost.c         | 195 ++++++++++++++--------------------------------
> > > > > > >  2 files changed, 57 insertions(+), 140 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> > > > > > > index 467dc77..fc4af5c 100644
> > > > > > > --- a/include/hw/virtio/vhost.h
> > > > > > > +++ b/include/hw/virtio/vhost.h
> > > > > > > @@ -68,8 +68,6 @@ struct vhost_dev {
> > > > > > >      uint64_t log_size;
> > > > > > >      Error *migration_blocker;
> > > > > > >      bool memory_changed;
> > > > > > > -    hwaddr mem_changed_start_addr;
> > > > > > > -    hwaddr mem_changed_end_addr;
> > > > > > >      const VhostOps *vhost_ops;
> > > > > > >      void *opaque;
> > > > > > >      struct vhost_log *log;
> > > > > > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > > > > > > index 5b9a7d7..026bac3 100644
> > > > > > > --- a/hw/virtio/vhost.c
> > > > > > > +++ b/hw/virtio/vhost.c
> > > > > > > @@ -296,35 +296,43 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer,
> > > > > > >      }
> > > > > > >  }
> > > > > > >  
> > > > > > > -static int vhost_verify_ring_part_mapping(struct vhost_dev *dev,
> > > > > > > -                                          void *part,
> > > > > > > -                                          uint64_t part_addr,
> > > > > > > -                                          uint64_t part_size,
> > > > > > > -                                          uint64_t start_addr,
> > > > > > > -                                          uint64_t size)
> > > > > > > +static int vhost_verify_ring_part_mapping(void *ring_hva,
> > > > > > > +                                          uint64_t ring_gpa,
> > > > > > > +                                          uint64_t ring_size,
> > > > > > > +                                          void *reg_hva,
> > > > > > > +                                          uint64_t reg_gpa,
> > > > > > > +                                          uint64_t reg_size)
> > > > > > >  {
> > > > > > > -    hwaddr l;
> > > > > > > -    void *p;
> > > > > > > -    int r = 0;
> > > > > > > +    uint64_t hva_ring_offset;
> > > > > > > +    uint64_t ring_last = range_get_last(ring_gpa, ring_size);
> > > > > > > +    uint64_t reg_last = range_get_last(reg_gpa, reg_size);
> > > > > > >  
> > > > > > > -    if (!ranges_overlap(start_addr, size, part_addr, part_size)) {
> > > > > > > +    /* start check from the end so that the rest of checks
> > > > > > > +     * are done when whole ring is in merged sections range
> > > > > > > +     */
> > > > > > > +    if (ring_last < reg_last || ring_gpa > reg_last) {
> > > > > > >          return 0;
> > > > > > >      }      
> > > > > > 
> > > > > >   so does that mean if our region never grows to be as big as the ring
> > > > > > we wont spot the problem?    
> > > > > I think there is mistake here it should be:
> > > > >    ring_last < reg_gpa || ring_gpa > reg_last
> > > > > 
> > > > > so if ring is out side of continuous region, we do not care => no change    
> > > > 
> > > > OK, I don't see how that corresponds to your 'start check from the end'
> > > > comment - I thought it was doing something smarter to deal with this
> > > > being called from the _cb routine potentially before another part of the
> > > > range had been joined onto it.
> > > > In that case, we can just use ranges_overlap like the original
> > > > routine did.  
> > > I suppose a plain range check will do as well; the reason for checking
> > > from the end was an optimization to make fewer checks than ranges_overlap(),
> > > considering we are comparing against a vhost_memory_region.
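
For reference, the corrected test reduces to a standard range-overlap
check.  A minimal sketch, assuming range_get_last() and ranges_overlap()
from include/qemu/range.h (the helper name here is invented):

    #include "qemu/osdep.h"
    #include "qemu/range.h"

    /* true if any part of the ring falls inside the region */
    static bool ring_overlaps_region(uint64_t ring_gpa, uint64_t ring_size,
                                     uint64_t reg_gpa, uint64_t reg_size)
    {
        uint64_t ring_last = range_get_last(ring_gpa, ring_size);
        uint64_t reg_last = range_get_last(reg_gpa, reg_size);

        /* ring entirely outside this region => nothing to verify */
        if (ring_last < reg_gpa || ring_gpa > reg_last) {
            return false;
        }
        /* equivalently: ranges_overlap(reg_gpa, reg_size,
         *                              ring_gpa, ring_size) */
        return true;
    }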
> > 
> > Have a look at the RFC v2 I've posted; I've reworked it a bit so
> > we call this code afterwards rather than from the _cb.
> I've skimmed through it already and it looks good to me at first glance

OK, great.

> /it needs spelling fixes at least/

Sounds normal for my code!  I'll go through and try and clean them
up; I'll also try and make sure it's bisectable.

> I'll do line by line review and testing this week

Thanks.

> > 
> > > But it seems a bit like overkill, since by commit time the
> > > flatview would contain a 1:1 mapping of MemorySection:vhost_memory_region
> > > (modulo sections we are not interested in). It should be unlikely to
> > > get 2 MemoryRegions allocated one after another and merged
> > > into one vhost_memory_region. Maybe we would be better off just
> > > copying MemorySections into vhost_memory_region in vhost_update_mem_cb()
> > > and dropping merging altogether.
> > 
> > Except I need the merging for the hugepage stuff that's coming in the
> > postcopy world.
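
The merge in question: when walking the flatview in order, extend the
previous region when the new range is contiguous with it in both
guest-physical and host-virtual space.  A hypothetical sketch (not the
series code; the function name is invented), using the
vhost_memory_region layout from linux/vhost.h:

    #include <stdint.h>
    #include <linux/vhost.h>

    static void vhost_region_add_or_merge(struct vhost_memory *mem,
                                          uint64_t gpa, uint64_t size,
                                          uint64_t hva)
    {
        struct vhost_memory_region *prev = mem->nregions ?
            &mem->regions[mem->nregions - 1] : NULL;

        /* contiguous in both GPA and HVA space: grow the previous region */
        if (prev &&
            prev->guest_phys_addr + prev->memory_size == gpa &&
            prev->userspace_addr + prev->memory_size == hva) {
            prev->memory_size += size;
            return;
        }

        /* otherwise start a new region; assumes the caller has already
         * reserved space for one more entry */
        struct vhost_memory_region *reg = &mem->regions[mem->nregions++];
        reg->guest_phys_addr = gpa;
        reg->memory_size = size;
        reg->userspace_addr = hva;
        reg->flags_padding = 0;
    }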
> > 
> > > I'd even consider 'non deterministic' merging of 2 MemoryRegions harmful
> > > to migration, since vhost might see a different memory map on the target vs the source.
> > 
> > I think that's come up before and it was decided it's not a problem
> > as long as the regions are still mapped; the only consistency required
> > is between QEMU and the vhost backend (either the kernel or vhost-user);
> > it shouldn't be a guest-visible issue if I understand it correctly.
> Inconsistent merging will cause a change in the number of memory regions
> in the vhost memmap, and even if it's not visible to guests it's still
> a migration blocker in borderline cases where the number of regions
> is around vhost_backend_memslots_limit(), i.e. the source QEMU starts
> fine by luck (merging happens) but the target fails as it hits the limit (merging fails).

Yep, but that's weighed against merging meaning you take up fewer
slots, so you're less likely to run into the limit.
IMHO the real problem is having a fixed, small, arbitrary number of slots
rather than something dynamic.
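
The limit in question is the per-backend one checked at device init;
roughly this shape (a sketch from memory: only error_report() and the
vhost_backend_memslots_limit callback are the real interfaces, the
wrapper is invented):

    /* sketch: fail device init if the backend can't hold the current map */
    static int vhost_check_memslots(struct vhost_dev *hdev,
                                    unsigned used_memslots)
    {
        if (used_memslots >
            hdev->vhost_ops->vhost_backend_memslots_limit(hdev)) {
            error_report("vhost backend memory slots limit is less"
                         " than current number of present memory slots");
            return -1;
        }
        return 0;
    }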

> Anyway, it's a problem that exists in the current code as well and is
> not related to this series, so we can ponder it later
> (perhaps we don't see the issue because merging doesn't happen
> in practice).

I suspect it does, but you get very little variation and you very rarely
get near the limit with merging; I think the worst case is a VGA
mapping case, which can be a bit random, but that probably happens
rarely with a lot of regions.

I tested a simple world with real vhost (rather than vhost-user) and it
survives RAM hotplug.

Dave
> 
>  
> > Dave
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


Thread overview: 34+ messages
2017-11-29 18:50 [Qemu-devel] [RFC 0/7] Rework vhost memory region updates Dr. David Alan Gilbert (git)
2017-11-29 18:50 ` [Qemu-devel] [RFC 1/7] memory: address_space_iterate Dr. David Alan Gilbert (git)
2017-11-29 18:50 ` [Qemu-devel] [RFC 2/7] vhost: Move log_dirty check Dr. David Alan Gilbert (git)
2017-11-29 18:50 ` [Qemu-devel] [RFC 3/7] vhost: New memory update functions Dr. David Alan Gilbert (git)
2017-11-30 15:48   ` Igor Mammedov
2017-12-05 18:25     ` Dr. David Alan Gilbert
2017-11-29 18:50 ` [Qemu-devel] [RFC 4/7] vhost: update_mem_cb implementation Dr. David Alan Gilbert (git)
2017-11-30 11:27   ` Igor Mammedov
2017-12-06 20:09     ` Dr. David Alan Gilbert
2017-11-29 18:50 ` [Qemu-devel] [RFC 5/7] vhost: Compare new and old memory lists Dr. David Alan Gilbert (git)
2017-11-29 18:50 ` [Qemu-devel] [RFC 6/7] vhost: Copy updated region data into device state Dr. David Alan Gilbert (git)
2017-11-29 18:50 ` [Qemu-devel] [RFC 7/7] vhost: Remove vhost_set_memory and children Dr. David Alan Gilbert (git)
2017-11-30 11:22 ` [Qemu-devel] [RFC 0/7] Rework vhost memory region updates Igor Mammedov
2017-11-30 12:08   ` Dr. David Alan Gilbert
2017-11-30 12:40     ` Igor Mammedov
2017-11-30 12:47       ` Dr. David Alan Gilbert
2017-11-30 12:58         ` Igor Mammedov
2017-11-30 13:06           ` Dr. David Alan Gilbert
2017-11-30 15:08             ` Igor Mammedov
2017-11-30 15:18               ` Dr. David Alan Gilbert
2017-11-30 15:32                 ` Igor Mammedov
2017-11-30 15:41                   ` Dr. David Alan Gilbert
2017-11-30 16:51               ` Greg Kurz
2017-12-01 10:02 ` Stefan Hajnoczi
2017-12-01 10:19   ` Dr. David Alan Gilbert
2017-12-01 14:22 ` [Qemu-devel] [RFC] vhost: check if ring mapping is still valid when building memmap Igor Mammedov
2017-12-07 18:17   ` Dr. David Alan Gilbert
2017-12-08 14:42     ` Igor Mammedov
2017-12-08 17:51       ` Dr. David Alan Gilbert
2017-12-11  9:37         ` Igor Mammedov
2017-12-11 11:03           ` Dr. David Alan Gilbert
2017-12-11 13:45             ` Igor Mammedov
2017-12-11 15:43               ` Dr. David Alan Gilbert
2017-12-08 20:45   ` Dr. David Alan Gilbert
