* [PATCH v5 00/10] KVM: Dirty ring support (QEMU part)
@ 2021-03-10 20:32 Peter Xu
  2021-03-10 20:32 ` [PATCH v5 01/10] memory: Introduce log_sync_global() to memory listener Peter Xu
                   ` (11 more replies)
  0 siblings, 12 replies; 29+ messages in thread
From: Peter Xu @ 2021-03-10 20:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Hyman, Dr . David Alan Gilbert, peterx

This is v5 of the qemu dirty ring interface support.

v5:
- rebase
- dropped patch "update-linux-headers: Include const.h" after rebase
- dropped patch "KVM: Fixup kvm_log_clear_one_slot() ioctl return check" since
  similar patch got merged recently (38e0b7904eca7cd32f8953c3)

========= v4 cover letter below =============

It is essentially the same as v3 content-wise, but there are a few things to
mention besides the rebase itself:

  - I picked up two patches from Eric Farman for the linux-header updates (from
    Eric's v3 series) for convenience, just in case either series gets queued
    by a maintainer.

  - One more patch is added: "KVM: Disable manual dirty log when dirty ring
    enabled".  I found this while testing the branch after rebasing to the
    latest qemu: not only is the manual dirty log capability not needed for kvm
    dirty ring, but more importantly INITIALLY_ALL_SET is totally against kvm
    dirty ring and it could silently crash the guest after migration.  For this
    new commit, I touched up "KVM: Add dirty-gfn-count property" a bit.

  - A few more documentation lines in qemu-options.hx.

  - I removed the RFC tag after kernel series got merged.

Again, this is only the 1st step to support dirty ring.  Ideally dirty ring
should allow QEMU to remove the whole layered dirty bitmap, so that dirty ring
would work similarly to auto-converge, only better: we would just throttle
vcpus with the dirty ring kvm exit rather than explicitly adding a timer to
stop the vcpu thread from entering the guest again (like what we do with the
current migration auto-converge).  More information can also be found in the
kvm forum 2020 talk on kvm dirty ring (slides 21/22 [1]).

That next step (removing all the dirty bitmaps, as mentioned above) is still
up for discussion: firstly, I don't know whether there's anything I've
overlooked there.  Meanwhile it also only serves the huge-VM case, and may not
help much in the many common scenarios where VMs are not that huge.

There are probably other ways to fix huge VM migration issues, mainly focusing
on responsiveness and convergence.  For example, Google has proposed a new
userfaultfd kernel capability called "minor modes" [2] to track page minor
faults, which could eventually serve that purpose too using postcopy.  That's
another long story so I'll stop here; I mention it just as a marker along with
the dirty ring series so there'll still be a record to reference.

That said, I still think this series is well worth merging even if we don't
pursue the next steps yet, since dirty ring is disabled by default, and we can
always build upon this series.

Please review, thanks.

V3: https://lore.kernel.org/qemu-devel/20200523232035.1029349-1-peterx@redhat.com/
    (V3 contains all the pre-v3 changelog)

QEMU branch for testing (requires kernel version 5.11-rc1+):
    https://github.com/xzpeter/qemu/tree/kvm-dirty-ring

[1] https://static.sched.com/hosted_files/kvmforum2020/97/kvm_dirty_ring_peter.pdf
[2] https://lore.kernel.org/lkml/20210107190453.3051110-1-axelrasmussen@google.com/

---------------------------8<---------------------------------

Overview
========

KVM dirty ring is a new interface to pass dirty bits from the kernel
to userspace.  Instead of using a bitmap for each memory region, the
dirty ring contains an array of dirtied GPAs to fetch, one ring per
vcpu.

There are a few major changes compared to how the old dirty logging
interface works:

- Granularity of dirty bits

  The KVM dirty ring interface does not offer memory-region-level
  granularity for collecting dirty bits (i.e., per KVM memory slot).
  Instead the dirty bits are collected globally, for all the vcpus at
  once.  The major effect is on the VGA part, because VGA dirty
  tracking is enabled as soon as the device is created, and it used to
  be done at memory region granularity.  Now that operation is
  amplified into a whole-VM sync.  Maybe there's a smarter way to do
  the same thing in VGA with the new interface, but so far I don't see
  it affecting much, at least on regular VMs.

- Collection of dirty bits

  The old dirty logging interface collects KVM dirty bits at
  synchronization time.  The KVM dirty ring interface instead uses a
  standalone (reaper) thread to do that.  So when another thread
  (e.g., the migration thread) wants to synchronize the dirty bits, it
  simply kicks the reaper thread and waits until it has flushed all
  the dirty bits into the ramblock dirty bitmap.
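
For reference, the kernel exposes each ring to userspace as an
mmap()-able array of struct kvm_dirty_gfn (flags/slot/offset), one per
vcpu.  Below is a minimal sketch (not QEMU code; simplified from patch
10) of how userspace drains one ring and hands the entries back to the
kernel.  mark_page_dirty(), ring, nr_entries and fetch are illustrative
names, not real APIs; a 5.11+ <linux/kvm.h> is assumed:

    #include <linux/kvm.h>      /* struct kvm_dirty_gfn, KVM_* defines */
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/ioctl.h>

    /* Illustrative: record one dirty page in our own bookkeeping. */
    static void mark_page_dirty(uint16_t as_id, uint16_t slot_id,
                                uint64_t offset)
    {
        printf("dirty: as %u slot %u page offset %llu\n",
               as_id, slot_id, (unsigned long long)offset);
    }

    static uint32_t reap_one_ring(int vm_fd, struct kvm_dirty_gfn *ring,
                                  uint32_t nr_entries, uint32_t *fetch)
    {
        uint32_t count = 0;

        for (;;) {
            struct kvm_dirty_gfn *cur = &ring[*fetch % nr_entries];

            if (cur->flags != KVM_DIRTY_GFN_F_DIRTY) {
                break;                      /* no more published entries */
            }
            /* cur->slot encodes (as_id << 16) | slot_id; cur->offset is
             * the page offset within that memslot. */
            mark_page_dirty(cur->slot >> 16, cur->slot & 0xffff, cur->offset);
            cur->flags = KVM_DIRTY_GFN_F_RESET; /* mark entry collected */
            (*fetch)++;
            count++;
        }
        if (count) {
            /* Ask KVM to re-protect the collected pages (all rings). */
            ioctl(vm_fd, KVM_RESET_DIRTY_RINGS);
        }
        return count;
    }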

A new parameter "dirty-ring-size" is added to "-accel kvm".  By
default, dirty ring is still disabled (size==0).  To enable it, use:

  -accel kvm,dirty-ring-size=65536

This establishes a 64K dirty ring buffer per vcpu.  Then if we
migrate, it'll switch to dirty ring.  (Note that in this v5 posting
the user-visible property is named "dirty-gfn-count" and takes the
number of ring entries per vcpu; see patch 08.)
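
Outside of QEMU, enabling the ring boils down to the two kernel
interfaces this series uses: KVM_CAP_DIRTY_LOG_RING and the per-vcpu
mmap at KVM_DIRTY_LOG_PAGE_OFFSET.  A rough standalone sketch, assuming
a 5.11+ <linux/kvm.h>; error handling is trimmed and the 4096-entry
count is just an example value:

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <linux/kvm.h>

    int main(void)
    {
        int kvm = open("/dev/kvm", O_RDWR);
        int vm = ioctl(kvm, KVM_CREATE_VM, 0);
        uint32_t gfn_count = 4096;               /* entries per vcpu ring */
        uint64_t ring_bytes = gfn_count * sizeof(struct kvm_dirty_gfn);

        /* Max ring size (bytes) this kernel supports; <= 0: unsupported */
        int max = ioctl(vm, KVM_CHECK_EXTENSION, KVM_CAP_DIRTY_LOG_RING);
        if (max <= 0 || ring_bytes > (uint64_t)max) {
            return 1;
        }

        /* Must be enabled before any vcpu is created */
        struct kvm_enable_cap cap = {
            .cap = KVM_CAP_DIRTY_LOG_RING,
            .args = { ring_bytes },
        };
        ioctl(vm, KVM_ENABLE_CAP, &cap);

        int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
        /* The ring lives at a fixed page offset of the vcpu fd mapping */
        struct kvm_dirty_gfn *ring =
            mmap(NULL, ring_bytes, PROT_READ | PROT_WRITE, MAP_SHARED,
                 vcpu, (off_t)getpagesize() * KVM_DIRTY_LOG_PAGE_OFFSET);

        printf("dirty ring at %p (%u entries)\n", (void *)ring, gfn_count);
        return ring == MAP_FAILED;
    }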

I gave it a shot with a 24G guest, 8 vcpus, using a 10G NIC as the migration
channel.  When idle or with a small dirty workload, I don't observe a major
difference in total migration time.  With a higher random dirty workload
(800MB/s dirty rate over 20G of memory), kvm dirty ring does worse.  Total
migration time (ping-pong migrating 6 times, in seconds):

|-------------------------+---------------|
| dirty ring (4k entries) | dirty logging |
|-------------------------+---------------|
|                      70 |            58 |
|                      78 |            70 |
|                      72 |            48 |
|                      74 |            52 |
|                      83 |            49 |
|                      65 |            54 |
|-------------------------+---------------|

Summary:

dirty ring average:    73s
dirty logging average: 55s

The KVM dirty ring is slower in the above case.  The numbers suggest that
dirty logging is still preferred as the default, because small/medium VMs are
still the major case and high dirty workloads happen frequently too.  That is
what this series does: dirty logging remains the default.

TODO:

- Consider dropping the BQL dependency: then we can run the reaper thread in
  parallel with the main thread.  Needs some thought around the race conditions.

- Consider dropping the kvmslot bitmap: logically this can be dropped with kvm
  dirty ring, not only to save space, but also because it is yet another layer
  linear in guest memory size, which goes against the whole idea of kvm dirty
  ring.  This should make the above numbers (for kvm dirty ring) even smaller
  (though still maybe not as good as dirty logging under such a high workload).

Please refer to the code and comments for more information.

Thanks,

Peter Xu (10):
  memory: Introduce log_sync_global() to memory listener
  KVM: Use a big lock to replace per-kml slots_lock
  KVM: Create the KVMSlot dirty bitmap on flag changes
  KVM: Provide helper to get kvm dirty log
  KVM: Provide helper to sync dirty bitmap from slot to ramblock
  KVM: Simplify dirty log sync in kvm_set_phys_mem
  KVM: Cache kvm slot dirty bitmap size
  KVM: Add dirty-gfn-count property
  KVM: Disable manual dirty log when dirty ring enabled
  KVM: Dirty ring support

 accel/kvm/kvm-all.c      | 585 +++++++++++++++++++++++++++++++++------
 accel/kvm/trace-events   |   7 +
 include/exec/memory.h    |  12 +
 include/hw/core/cpu.h    |   8 +
 include/sysemu/kvm_int.h |   7 +-
 qemu-options.hx          |  12 +
 softmmu/memory.c         |  33 ++-
 7 files changed, 565 insertions(+), 99 deletions(-)

-- 
2.26.2





* [PATCH v5 01/10] memory: Introduce log_sync_global() to memory listener
  2021-03-10 20:32 [PATCH v5 00/10] KVM: Dirty ring support (QEMU part) Peter Xu
@ 2021-03-10 20:32 ` Peter Xu
  2021-03-10 20:32 ` [PATCH v5 02/10] KVM: Use a big lock to replace per-kml slots_lock Peter Xu
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Peter Xu @ 2021-03-10 20:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Hyman, Dr . David Alan Gilbert, peterx

Some memory listeners may want to do log synchronization without being
able to specify a range of memory to sync, only globally.  Such a
memory listener should provide this new method instead of the
log_sync() method.

Obviously we could achieve something similar by putting the global
sync logic into a log_sync() handler.  However that's not efficient,
because memory_global_dirty_log_sync() would then do the global sync N
times, where N is the number of flat ranges in the address space.

Make this new method mutually exclusive with log_sync().
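
For illustration, a listener that can only sync globally would hook up
the new callback roughly like below (a minimal sketch against the QEMU
tree with this patch applied; the my_* names are made up):

    #include "exec/memory.h"

    /*
     * Flush everything we know about dirty pages for the whole address
     * space at once -- this listener has no way to sync per-section.
     */
    static void my_log_sync_global(MemoryListener *listener)
    {
        /* ... */
    }

    static MemoryListener my_listener = {
        .log_sync_global = my_log_sync_global,
        /* .log_sync must stay NULL: the two hooks are mutually exclusive */
        .priority = 10,
    };

    /* memory_listener_register(&my_listener, &address_space_memory); */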

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/exec/memory.h | 12 ++++++++++++
 softmmu/memory.c      | 33 +++++++++++++++++++++++----------
 2 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index c6fb714e499..e52e2c31eb7 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -616,6 +616,18 @@ struct MemoryListener {
      */
     void (*log_sync)(MemoryListener *listener, MemoryRegionSection *section);
 
+    /**
+     * @log_sync_global:
+     *
+     * This is the global version of @log_sync when the listener does
+     * not have a way to synchronize the log with finer granularity.
+     * When the listener registers with @log_sync_global defined, then
+     * its @log_sync must be NULL.  Vice versa.
+     *
+     * @listener: The #MemoryListener.
+     */
+    void (*log_sync_global)(MemoryListener *listener);
+
     /**
      * @log_clear:
      *
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 874a8fccdee..f655ed83129 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2056,6 +2056,10 @@ void memory_region_set_dirty(MemoryRegion *mr, hwaddr addr,
                                         memory_region_get_dirty_log_mask(mr));
 }
 
+/*
+ * If memory region `mr' is NULL, do global sync.  Otherwise, sync
+ * dirty bitmap for the specified memory region.
+ */
 static void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
 {
     MemoryListener *listener;
@@ -2069,18 +2073,24 @@ static void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
      * address space once.
      */
     QTAILQ_FOREACH(listener, &memory_listeners, link) {
-        if (!listener->log_sync) {
-            continue;
-        }
-        as = listener->address_space;
-        view = address_space_get_flatview(as);
-        FOR_EACH_FLAT_RANGE(fr, view) {
-            if (fr->dirty_log_mask && (!mr || fr->mr == mr)) {
-                MemoryRegionSection mrs = section_from_flat_range(fr, view);
-                listener->log_sync(listener, &mrs);
+        if (listener->log_sync) {
+            as = listener->address_space;
+            view = address_space_get_flatview(as);
+            FOR_EACH_FLAT_RANGE(fr, view) {
+                if (fr->dirty_log_mask && (!mr || fr->mr == mr)) {
+                    MemoryRegionSection mrs = section_from_flat_range(fr, view);
+                    listener->log_sync(listener, &mrs);
+                }
             }
+            flatview_unref(view);
+        } else if (listener->log_sync_global) {
+            /*
+             * No matter whether MR is specified, what we can do here
+             * is to do a global sync, because we are not capable to
+             * sync in a finer granularity.
+             */
+            listener->log_sync_global(listener);
         }
-        flatview_unref(view);
     }
 }
 
@@ -2768,6 +2778,9 @@ void memory_listener_register(MemoryListener *listener, AddressSpace *as)
 {
     MemoryListener *other = NULL;
 
+    /* Only one of them can be defined for a listener */
+    assert(!(listener->log_sync && listener->log_sync_global));
+
     listener->address_space = as;
     if (QTAILQ_EMPTY(&memory_listeners)
         || listener->priority >= QTAILQ_LAST(&memory_listeners)->priority) {
-- 
2.26.2




* [PATCH v5 02/10] KVM: Use a big lock to replace per-kml slots_lock
  2021-03-10 20:32 [PATCH v5 00/10] KVM: Dirty ring support (QEMU part) Peter Xu
  2021-03-10 20:32 ` [PATCH v5 01/10] memory: Introduce log_sync_global() to memory listener Peter Xu
@ 2021-03-10 20:32 ` Peter Xu
  2021-03-22 10:47   ` Keqian Zhu
  2021-03-10 20:32 ` [PATCH v5 03/10] KVM: Create the KVMSlot dirty bitmap on flag changes Peter Xu
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Peter Xu @ 2021-03-10 20:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Hyman, Dr . David Alan Gilbert, peterx

A per-kml slots_lock will bring some trouble if we want to take the
slots_lock of all the KMLs, especially when we're in a context where we may
already hold some of the KML slots_locks; then we would even need to figure
out which ones we've taken and which ones we still need to take.

Make this simple by merging all KML slots_lock into a single slots lock.

A per-kml slots_lock isn't all that helpful anyway - so far only x86 has two
address spaces (so, two slots_locks).  All the other archs always have a
single address space, which means there's effectively only one slots_lock, so
behaviour stays the same as before.
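
As a hedged illustration (QEMU-internal sketch, not part of the patch) of why
the single lock helps: code that needs to look at the slots of every address
space can now take one lock around the whole walk, e.g.:

    static void walk_all_slots(KVMState *s)
    {
        int i;

        kvm_slots_lock();                  /* one lock covers every KML */
        for (i = 0; i < s->nr_as; i++) {
            KVMMemoryListener *kml = s->as[i].ml;
            /* ... inspect or update kml->slots[] ... */
        }
        kvm_slots_unlock();
    }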

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 accel/kvm/kvm-all.c      | 32 +++++++++++++++++---------------
 include/sysemu/kvm_int.h |  2 --
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index f88a52393fe..94e881f123b 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -174,8 +174,10 @@ typedef struct KVMResampleFd KVMResampleFd;
 static QLIST_HEAD(, KVMResampleFd) kvm_resample_fd_list =
     QLIST_HEAD_INITIALIZER(kvm_resample_fd_list);
 
-#define kvm_slots_lock(kml)      qemu_mutex_lock(&(kml)->slots_lock)
-#define kvm_slots_unlock(kml)    qemu_mutex_unlock(&(kml)->slots_lock)
+static QemuMutex kml_slots_lock;
+
+#define kvm_slots_lock()  qemu_mutex_lock(&kml_slots_lock)
+#define kvm_slots_unlock()  qemu_mutex_unlock(&kml_slots_lock)
 
 static inline void kvm_resample_fd_remove(int gsi)
 {
@@ -241,9 +243,9 @@ bool kvm_has_free_slot(MachineState *ms)
     bool result;
     KVMMemoryListener *kml = &s->memory_listener;
 
-    kvm_slots_lock(kml);
+    kvm_slots_lock();
     result = !!kvm_get_free_slot(kml);
-    kvm_slots_unlock(kml);
+    kvm_slots_unlock();
 
     return result;
 }
@@ -309,7 +311,7 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram,
     KVMMemoryListener *kml = &s->memory_listener;
     int i, ret = 0;
 
-    kvm_slots_lock(kml);
+    kvm_slots_lock();
     for (i = 0; i < s->nr_slots; i++) {
         KVMSlot *mem = &kml->slots[i];
 
@@ -319,7 +321,7 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram,
             break;
         }
     }
-    kvm_slots_unlock(kml);
+    kvm_slots_unlock();
 
     return ret;
 }
@@ -515,7 +517,7 @@ static int kvm_section_update_flags(KVMMemoryListener *kml,
         return 0;
     }
 
-    kvm_slots_lock(kml);
+    kvm_slots_lock();
 
     while (size && !ret) {
         slot_size = MIN(kvm_max_slot_size, size);
@@ -531,7 +533,7 @@ static int kvm_section_update_flags(KVMMemoryListener *kml,
     }
 
 out:
-    kvm_slots_unlock(kml);
+    kvm_slots_unlock();
     return ret;
 }
 
@@ -819,7 +821,7 @@ static int kvm_physical_log_clear(KVMMemoryListener *kml,
         return ret;
     }
 
-    kvm_slots_lock(kml);
+    kvm_slots_lock();
 
     for (i = 0; i < s->nr_slots; i++) {
         mem = &kml->slots[i];
@@ -845,7 +847,7 @@ static int kvm_physical_log_clear(KVMMemoryListener *kml,
         }
     }
 
-    kvm_slots_unlock(kml);
+    kvm_slots_unlock();
 
     return ret;
 }
@@ -1150,7 +1152,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
     ram = memory_region_get_ram_ptr(mr) + section->offset_within_region +
           (start_addr - section->offset_within_address_space);
 
-    kvm_slots_lock(kml);
+    kvm_slots_lock();
 
     if (!add) {
         do {
@@ -1208,7 +1210,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
     } while (size);
 
 out:
-    kvm_slots_unlock(kml);
+    kvm_slots_unlock();
 }
 
 static void kvm_region_add(MemoryListener *listener,
@@ -1235,9 +1237,9 @@ static void kvm_log_sync(MemoryListener *listener,
     KVMMemoryListener *kml = container_of(listener, KVMMemoryListener, listener);
     int r;
 
-    kvm_slots_lock(kml);
+    kvm_slots_lock();
     r = kvm_physical_sync_dirty_bitmap(kml, section);
-    kvm_slots_unlock(kml);
+    kvm_slots_unlock();
     if (r < 0) {
         abort();
     }
@@ -1337,7 +1339,7 @@ void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
 {
     int i;
 
-    qemu_mutex_init(&kml->slots_lock);
+    qemu_mutex_init(&kml_slots_lock);
     kml->slots = g_malloc0(s->nr_slots * sizeof(KVMSlot));
     kml->as_id = as_id;
 
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index ccb8869f01b..1da30e18841 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -27,8 +27,6 @@ typedef struct KVMSlot
 
 typedef struct KVMMemoryListener {
     MemoryListener listener;
-    /* Protects the slots and all inside them */
-    QemuMutex slots_lock;
     KVMSlot *slots;
     int as_id;
 } KVMMemoryListener;
-- 
2.26.2




* [PATCH v5 03/10] KVM: Create the KVMSlot dirty bitmap on flag changes
  2021-03-10 20:32 [PATCH v5 00/10] KVM: Dirty ring support (QEMU part) Peter Xu
  2021-03-10 20:32 ` [PATCH v5 01/10] memory: Introduce log_sync_global() to memory listener Peter Xu
  2021-03-10 20:32 ` [PATCH v5 02/10] KVM: Use a big lock to replace per-kml slots_lock Peter Xu
@ 2021-03-10 20:32 ` Peter Xu
  2021-03-10 20:32 ` [PATCH v5 04/10] KVM: Provide helper to get kvm dirty log Peter Xu
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Peter Xu @ 2021-03-10 20:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Hyman, Dr . David Alan Gilbert, peterx

Previously we had two places that create the per-KVMSlot dirty
bitmap:

  1. When a newly created KVMSlot has dirty logging enabled,
  2. When the first log_sync() happens for a memory slot.

The 2nd case is lazy-init, while the 1st case is not (it was a fix for
what the 2nd case missed).

To initialize dirty bitmaps explicitly, what's missing is creating the
dirty bitmap when a slot changes from not-dirty-tracked to
dirty-tracked.  Do that in kvm_slot_update_flags().

With that, we can safely remove the 2nd lazy-init.

This change will be needed for kvm dirty ring because kvm dirty ring
does not use the log_sync() interface at all.

Also move all the pre-checks into kvm_slot_init_dirty_bitmap().

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 accel/kvm/kvm-all.c | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 94e881f123b..fa337418636 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -179,6 +179,8 @@ static QemuMutex kml_slots_lock;
 #define kvm_slots_lock()  qemu_mutex_lock(&kml_slots_lock)
 #define kvm_slots_unlock()  qemu_mutex_unlock(&kml_slots_lock)
 
+static void kvm_slot_init_dirty_bitmap(KVMSlot *mem);
+
 static inline void kvm_resample_fd_remove(int gsi)
 {
     KVMResampleFd *rfd;
@@ -502,6 +504,7 @@ static int kvm_slot_update_flags(KVMMemoryListener *kml, KVMSlot *mem,
         return 0;
     }
 
+    kvm_slot_init_dirty_bitmap(mem);
     return kvm_set_user_memory_region(kml, mem, false);
 }
 
@@ -586,8 +589,12 @@ static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
 #define ALIGN(x, y)  (((x)+(y)-1) & ~((y)-1))
 
 /* Allocate the dirty bitmap for a slot  */
-static void kvm_memslot_init_dirty_bitmap(KVMSlot *mem)
+static void kvm_slot_init_dirty_bitmap(KVMSlot *mem)
 {
+    if (!(mem->flags & KVM_MEM_LOG_DIRTY_PAGES) || mem->dirty_bmap) {
+        return;
+    }
+
     /*
      * XXX bad kernel interface alert
      * For dirty bitmap, kernel allocates array of size aligned to
@@ -642,11 +649,6 @@ static int kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
             goto out;
         }
 
-        if (!mem->dirty_bmap) {
-            /* Allocate on the first log_sync, once and for all */
-            kvm_memslot_init_dirty_bitmap(mem);
-        }
-
         d.dirty_bitmap = mem->dirty_bmap;
         d.slot = mem->slot | (kml->as_id << 16);
         ret = kvm_vm_ioctl(s, KVM_GET_DIRTY_LOG, &d);
@@ -1190,14 +1192,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
         mem->start_addr = start_addr;
         mem->ram = ram;
         mem->flags = kvm_mem_flags(mr);
-
-        if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
-            /*
-             * Reallocate the bmap; it means it doesn't disappear in
-             * middle of a migrate.
-             */
-            kvm_memslot_init_dirty_bitmap(mem);
-        }
+        kvm_slot_init_dirty_bitmap(mem);
         err = kvm_set_user_memory_region(kml, mem, true);
         if (err) {
             fprintf(stderr, "%s: error registering slot: %s\n", __func__,
-- 
2.26.2




* [PATCH v5 04/10] KVM: Provide helper to get kvm dirty log
  2021-03-10 20:32 [PATCH v5 00/10] KVM: Dirty ring support (QEMU part) Peter Xu
                   ` (2 preceding siblings ...)
  2021-03-10 20:32 ` [PATCH v5 03/10] KVM: Create the KVMSlot dirty bitmap on flag changes Peter Xu
@ 2021-03-10 20:32 ` Peter Xu
  2021-03-10 20:32 ` [PATCH v5 05/10] KVM: Provide helper to sync dirty bitmap from slot to ramblock Peter Xu
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Peter Xu @ 2021-03-10 20:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Hyman, Dr . David Alan Gilbert, peterx

Provide a helper kvm_slot_get_dirty_log() to make the function
kvm_physical_sync_dirty_bitmap() clearer.  We can also cache the as_id
in the KVMSlot when it is created, so that we don't need to pass it
down every time.

While at it, remove the return value of kvm_physical_sync_dirty_bitmap()
because it should never fail.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 accel/kvm/kvm-all.c      | 52 +++++++++++++++++++++++-----------------
 include/sysemu/kvm_int.h |  2 ++
 2 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index fa337418636..853dfb076bd 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -617,6 +617,30 @@ static void kvm_slot_init_dirty_bitmap(KVMSlot *mem)
     mem->dirty_bmap = g_malloc0(bitmap_size);
 }
 
+/*
+ * Sync dirty bitmap from kernel to KVMSlot.dirty_bmap, return true if
+ * succeeded, false otherwise
+ */
+static bool kvm_slot_get_dirty_log(KVMState *s, KVMSlot *slot)
+{
+    struct kvm_dirty_log d = {};
+    int ret;
+
+    d.dirty_bitmap = slot->dirty_bmap;
+    d.slot = slot->slot | (slot->as_id << 16);
+    ret = kvm_vm_ioctl(s, KVM_GET_DIRTY_LOG, &d);
+
+    if (ret == -ENOENT) {
+        /* kernel does not have dirty bitmap in this slot */
+        ret = 0;
+    }
+    if (ret) {
+        error_report_once("%s: KVM_GET_DIRTY_LOG failed with %d",
+                          __func__, ret);
+    }
+    return ret == 0;
+}
+
 /**
  * kvm_physical_sync_dirty_bitmap - Sync dirty bitmap from kernel space
  *
@@ -628,15 +652,13 @@ static void kvm_slot_init_dirty_bitmap(KVMSlot *mem)
  * @kml: the KVM memory listener object
  * @section: the memory section to sync the dirty bitmap with
  */
-static int kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
-                                          MemoryRegionSection *section)
+static void kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
+                                           MemoryRegionSection *section)
 {
     KVMState *s = kvm_state;
-    struct kvm_dirty_log d = {};
     KVMSlot *mem;
     hwaddr start_addr, size;
     hwaddr slot_size, slot_offset = 0;
-    int ret = 0;
 
     size = kvm_align_section(section, &start_addr);
     while (size) {
@@ -646,19 +668,10 @@ static int kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
         mem = kvm_lookup_matching_slot(kml, start_addr, slot_size);
         if (!mem) {
             /* We don't have a slot if we want to trap every access. */
-            goto out;
+            return;
         }
 
-        d.dirty_bitmap = mem->dirty_bmap;
-        d.slot = mem->slot | (kml->as_id << 16);
-        ret = kvm_vm_ioctl(s, KVM_GET_DIRTY_LOG, &d);
-        if (ret == -ENOENT) {
-            /* kernel does not have dirty bitmap in this slot */
-            ret = 0;
-        } else if (ret < 0) {
-            error_report("ioctl KVM_GET_DIRTY_LOG failed: %d", errno);
-            goto out;
-        } else {
+        if (kvm_slot_get_dirty_log(s, mem)) {
             subsection.offset_within_region += slot_offset;
             subsection.size = int128_make64(slot_size);
             kvm_get_dirty_pages_log_range(&subsection, d.dirty_bitmap);
@@ -668,8 +681,6 @@ static int kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
         start_addr += slot_size;
         size -= slot_size;
     }
-out:
-    return ret;
 }
 
 /* Alignment requirement for KVM_CLEAR_DIRTY_LOG - 64 pages */
@@ -1188,6 +1199,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
     do {
         slot_size = MIN(kvm_max_slot_size, size);
         mem = kvm_alloc_slot(kml);
+        mem->as_id = kml->as_id;
         mem->memory_size = slot_size;
         mem->start_addr = start_addr;
         mem->ram = ram;
@@ -1230,14 +1242,10 @@ static void kvm_log_sync(MemoryListener *listener,
                          MemoryRegionSection *section)
 {
     KVMMemoryListener *kml = container_of(listener, KVMMemoryListener, listener);
-    int r;
 
     kvm_slots_lock();
-    r = kvm_physical_sync_dirty_bitmap(kml, section);
+    kvm_physical_sync_dirty_bitmap(kml, section);
     kvm_slots_unlock();
-    if (r < 0) {
-        abort();
-    }
 }
 
 static void kvm_log_clear(MemoryListener *listener,
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 1da30e18841..e13075f738a 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -23,6 +23,8 @@ typedef struct KVMSlot
     int old_flags;
     /* Dirty bitmap cache for the slot */
     unsigned long *dirty_bmap;
+    /* Cache of the address space ID */
+    int as_id;
 } KVMSlot;
 
 typedef struct KVMMemoryListener {
-- 
2.26.2




* [PATCH v5 05/10] KVM: Provide helper to sync dirty bitmap from slot to ramblock
  2021-03-10 20:32 [PATCH v5 00/10] KVM: Dirty ring support (QEMU part) Peter Xu
                   ` (3 preceding siblings ...)
  2021-03-10 20:32 ` [PATCH v5 04/10] KVM: Provide helper to get kvm dirty log Peter Xu
@ 2021-03-10 20:32 ` Peter Xu
  2021-03-10 20:32 ` [PATCH v5 06/10] KVM: Simplify dirty log sync in kvm_set_phys_mem Peter Xu
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Peter Xu @ 2021-03-10 20:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Hyman, Dr . David Alan Gilbert, peterx

kvm_physical_sync_dirty_bitmap() calculates the ramblock offset in an
awkward way from the MemoryRegionSection passed in by the caller.  In
fact, for each KVMSlot the ramblock offset never changes during its
lifetime.  Cache the ramblock offset for each KVMSlot in the structure
when the KVMSlot is created.

With that, we can further simplify kvm_physical_sync_dirty_bitmap()
with a helper that syncs a specific KVMSlot's dirty bitmap into the
ramblock dirty bitmap.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 accel/kvm/kvm-all.c      | 37 +++++++++++++++++--------------------
 include/sysemu/kvm_int.h |  2 ++
 2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 853dfb076bd..65dc00b0a61 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -575,15 +575,12 @@ static void kvm_log_stop(MemoryListener *listener,
 }
 
 /* get kvm's dirty pages bitmap and update qemu's */
-static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
-                                         unsigned long *bitmap)
+static void kvm_slot_sync_dirty_pages(KVMSlot *slot)
 {
-    ram_addr_t start = section->offset_within_region +
-                       memory_region_get_ram_addr(section->mr);
-    ram_addr_t pages = int128_get64(section->size) / qemu_real_host_page_size;
+    ram_addr_t start = slot->ram_start_offset;
+    ram_addr_t pages = slot->memory_size / qemu_real_host_page_size;
 
-    cpu_physical_memory_set_dirty_lebitmap(bitmap, start, pages);
-    return 0;
+    cpu_physical_memory_set_dirty_lebitmap(slot->dirty_bmap, start, pages);
 }
 
 #define ALIGN(x, y)  (((x)+(y)-1) & ~((y)-1))
@@ -658,26 +655,19 @@ static void kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
     KVMState *s = kvm_state;
     KVMSlot *mem;
     hwaddr start_addr, size;
-    hwaddr slot_size, slot_offset = 0;
+    hwaddr slot_size;
 
     size = kvm_align_section(section, &start_addr);
     while (size) {
-        MemoryRegionSection subsection = *section;
-
         slot_size = MIN(kvm_max_slot_size, size);
         mem = kvm_lookup_matching_slot(kml, start_addr, slot_size);
         if (!mem) {
             /* We don't have a slot if we want to trap every access. */
             return;
         }
-
         if (kvm_slot_get_dirty_log(s, mem)) {
-            subsection.offset_within_region += slot_offset;
-            subsection.size = int128_make64(slot_size);
-            kvm_get_dirty_pages_log_range(&subsection, d.dirty_bitmap);
+            kvm_slot_sync_dirty_pages(mem);
         }
-
-        slot_offset += slot_size;
         start_addr += slot_size;
         size -= slot_size;
     }
@@ -1143,7 +1133,8 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
     int err;
     MemoryRegion *mr = section->mr;
     bool writeable = !mr->readonly && !mr->rom_device;
-    hwaddr start_addr, size, slot_size;
+    hwaddr start_addr, size, slot_size, mr_offset;
+    ram_addr_t ram_start_offset;
     void *ram;
 
     if (!memory_region_is_ram(mr)) {
@@ -1161,9 +1152,13 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
         return;
     }
 
-    /* use aligned delta to align the ram address */
-    ram = memory_region_get_ram_ptr(mr) + section->offset_within_region +
-          (start_addr - section->offset_within_address_space);
+    /* The offset of the kvmslot within the memory region */
+    mr_offset = section->offset_within_region + start_addr -
+        section->offset_within_address_space;
+
+    /* use aligned delta to align the ram address and offset */
+    ram = memory_region_get_ram_ptr(mr) + mr_offset;
+    ram_start_offset = memory_region_get_ram_addr(mr) + mr_offset;
 
     kvm_slots_lock();
 
@@ -1202,6 +1197,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
         mem->as_id = kml->as_id;
         mem->memory_size = slot_size;
         mem->start_addr = start_addr;
+        mem->ram_start_offset = ram_start_offset;
         mem->ram = ram;
         mem->flags = kvm_mem_flags(mr);
         kvm_slot_init_dirty_bitmap(mem);
@@ -1212,6 +1208,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
             abort();
         }
         start_addr += slot_size;
+        ram_start_offset += slot_size;
         ram += slot_size;
         size -= slot_size;
     } while (size);
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index e13075f738a..ab09a150e19 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -25,6 +25,8 @@ typedef struct KVMSlot
     unsigned long *dirty_bmap;
     /* Cache of the address space ID */
     int as_id;
+    /* Cache of the offset in ram address space */
+    ram_addr_t ram_start_offset;
 } KVMSlot;
 
 typedef struct KVMMemoryListener {
-- 
2.26.2




* [PATCH v5 06/10] KVM: Simplify dirty log sync in kvm_set_phys_mem
  2021-03-10 20:32 [PATCH v5 00/10] KVM: Dirty ring support (QEMU part) Peter Xu
                   ` (4 preceding siblings ...)
  2021-03-10 20:32 ` [PATCH v5 05/10] KVM: Provide helper to sync dirty bitmap from slot to ramblock Peter Xu
@ 2021-03-10 20:32 ` Peter Xu
  2021-03-10 20:32 ` [PATCH v5 07/10] KVM: Cache kvm slot dirty bitmap size Peter Xu
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Peter Xu @ 2021-03-10 20:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Hyman, Dr . David Alan Gilbert, peterx

Calling kvm_physical_sync_dirty_bitmap() on the whole section is inaccurate,
because the section can be a superset of the memslot that we're working on.
The result is that if the section covers multiple kvm memslots, we could end
up doing the synchronization multiple times for each kvmslot in the section.

With the two helpers that we just introduced, it's now easy to do this right
by calling them per-memslot.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 accel/kvm/kvm-all.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 65dc00b0a61..20f852a990b 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1170,7 +1170,8 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
                 goto out;
             }
             if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
-                kvm_physical_sync_dirty_bitmap(kml, section);
+                kvm_slot_get_dirty_log(kvm_state, mem);
+                kvm_slot_sync_dirty_pages(mem);
             }
 
             /* unregister the slot */
-- 
2.26.2




* [PATCH v5 07/10] KVM: Cache kvm slot dirty bitmap size
  2021-03-10 20:32 [PATCH v5 00/10] KVM: Dirty ring support (QEMU part) Peter Xu
                   ` (5 preceding siblings ...)
  2021-03-10 20:32 ` [PATCH v5 06/10] KVM: Simplify dirty log sync in kvm_set_phys_mem Peter Xu
@ 2021-03-10 20:32 ` Peter Xu
  2021-03-10 20:32 ` [PATCH v5 08/10] KVM: Add dirty-gfn-count property Peter Xu
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Peter Xu @ 2021-03-10 20:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Hyman, Dr . David Alan Gilbert, peterx

Cache it too because we'll reference it more frequently in the future.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 accel/kvm/kvm-all.c      | 1 +
 include/sysemu/kvm_int.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 20f852a990b..a1e7b1332a1 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -612,6 +612,7 @@ static void kvm_slot_init_dirty_bitmap(KVMSlot *mem)
     hwaddr bitmap_size = ALIGN(mem->memory_size / qemu_real_host_page_size,
                                         /*HOST_LONG_BITS*/ 64) / 8;
     mem->dirty_bmap = g_malloc0(bitmap_size);
+    mem->dirty_bmap_size = bitmap_size;
 }
 
 /*
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index ab09a150e19..c788452cd96 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -23,6 +23,7 @@ typedef struct KVMSlot
     int old_flags;
     /* Dirty bitmap cache for the slot */
     unsigned long *dirty_bmap;
+    unsigned long dirty_bmap_size;
     /* Cache of the address space ID */
     int as_id;
     /* Cache of the offset in ram address space */
-- 
2.26.2




* [PATCH v5 08/10] KVM: Add dirty-gfn-count property
  2021-03-10 20:32 [PATCH v5 00/10] KVM: Dirty ring support (QEMU part) Peter Xu
                   ` (6 preceding siblings ...)
  2021-03-10 20:32 ` [PATCH v5 07/10] KVM: Cache kvm slot dirty bitmap size Peter Xu
@ 2021-03-10 20:32 ` Peter Xu
  2021-03-10 20:33 ` [PATCH v5 09/10] KVM: Disable manual dirty log when dirty ring enabled Peter Xu
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Peter Xu @ 2021-03-10 20:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Hyman, Dr . David Alan Gilbert, peterx

Add a parameter for the dirty gfn count of dirty rings.  If zero, dirty ring
is disabled.  Otherwise dirty ring will be enabled with the specified per-vcpu
gfn count.  If dirty ring cannot be enabled due to an unsupported kernel or an
illegal parameter, it falls back to dirty logging.

By default, dirty ring is not enabled (dirty-gfn-count defaults to 0).
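
As a rough worked example (not part of the patch): each ring entry is a
struct kvm_dirty_gfn, which is 16 bytes on current kernels, so the ring
size passed to the kernel is computed as below; e.g. dirty-gfn-count=4096
gives a 64 KiB ring per vcpu, requested with
"-accel kvm,dirty-gfn-count=4096":

    #include <linux/kvm.h>
    #include <stdint.h>

    /* Mirrors the computation done in kvm_init(): entries -> bytes */
    static inline uint64_t dirty_ring_bytes(uint32_t dirty_gfn_count)
    {
        return (uint64_t)dirty_gfn_count * sizeof(struct kvm_dirty_gfn);
    }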

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 accel/kvm/kvm-all.c | 72 +++++++++++++++++++++++++++++++++++++++++++++
 qemu-options.hx     | 12 ++++++++
 2 files changed, 84 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index a1e7b1332a1..10137b6af11 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -128,6 +128,9 @@ struct KVMState
         KVMMemoryListener *ml;
         AddressSpace *as;
     } *as;
+    bool kvm_dirty_ring_enabled;    /* Whether KVM dirty ring is enabled */
+    uint64_t kvm_dirty_ring_size;   /* Size of the per-vcpu dirty ring */
+    uint32_t kvm_dirty_gfn_count;   /* Number of dirty GFNs per ring */
 };
 
 KVMState *kvm_state;
@@ -2136,6 +2139,40 @@ static int kvm_init(MachineState *ms)
     s->coalesced_pio = s->coalesced_mmio &&
                        kvm_check_extension(s, KVM_CAP_COALESCED_PIO);
 
+    /*
+     * Enable KVM dirty ring if supported, otherwise fall back to
+     * dirty logging mode
+     */
+    if (s->kvm_dirty_gfn_count > 0) {
+        uint64_t ring_size;
+
+        ring_size = s->kvm_dirty_gfn_count * sizeof(struct kvm_dirty_gfn);
+
+        /* Read the max supported pages */
+        ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING);
+        if (ret > 0) {
+            if (ring_size > ret) {
+                error_report("KVM dirty GFN count %" PRIu32 " too big "
+                             "(maximum is %ld).  Please use a smaller value.",
+                             s->kvm_dirty_gfn_count,
+                             ret / sizeof(struct kvm_dirty_gfn));
+                ret = -EINVAL;
+                goto err;
+            }
+
+            ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_size);
+            if (ret) {
+                error_report("Enabling of KVM dirty ring failed: %d. "
+                             "Suggested minimum value is 1024. "
+                             "Please also make sure it's a power of two.", ret);
+                goto err;
+            }
+
+            s->kvm_dirty_ring_size = ring_size;
+            s->kvm_dirty_ring_enabled = true;
+        }
+    }
+
     dirty_log_manual_caps =
         kvm_check_extension(s, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
     dirty_log_manual_caps &= (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
@@ -3179,6 +3216,33 @@ bool kvm_kernel_irqchip_split(void)
     return kvm_state->kernel_irqchip_split == ON_OFF_AUTO_ON;
 }
 
+static void kvm_get_dirty_gfn_count(Object *obj, Visitor *v,
+                                    const char *name, void *opaque,
+                                    Error **errp)
+{
+    KVMState *s = KVM_STATE(obj);
+    uint32_t value = s->kvm_dirty_gfn_count;
+
+    visit_type_uint32(v, name, &value, errp);
+}
+
+static void kvm_set_dirty_gfn_count(Object *obj, Visitor *v,
+                                    const char *name, void *opaque,
+                                    Error **errp)
+{
+    KVMState *s = KVM_STATE(obj);
+    Error *error = NULL;
+    uint32_t value;
+
+    visit_type_uint32(v, name, &value, &error);
+    if (error) {
+        error_propagate(errp, error);
+        return;
+    }
+
+    s->kvm_dirty_gfn_count = value;
+}
+
 static void kvm_accel_instance_init(Object *obj)
 {
     KVMState *s = KVM_STATE(obj);
@@ -3186,6 +3250,8 @@ static void kvm_accel_instance_init(Object *obj)
     s->kvm_shadow_mem = -1;
     s->kernel_irqchip_allowed = true;
     s->kernel_irqchip_split = ON_OFF_AUTO_AUTO;
+    /* KVM dirty ring is by default off */
+    s->kvm_dirty_gfn_count = 0;
 }
 
 static void kvm_accel_class_init(ObjectClass *oc, void *data)
@@ -3207,6 +3273,12 @@ static void kvm_accel_class_init(ObjectClass *oc, void *data)
         NULL, NULL);
     object_class_property_set_description(oc, "kvm-shadow-mem",
         "KVM shadow MMU size");
+
+    object_class_property_add(oc, "dirty-gfn-count", "uint32",
+        kvm_get_dirty_gfn_count, kvm_set_dirty_gfn_count,
+        NULL, NULL);
+    object_class_property_set_description(oc, "dirty-gfn-count",
+        "KVM dirty GFN count (=0 to disable dirty ring)");
 }
 
 static const TypeInfo kvm_accel_type = {
diff --git a/qemu-options.hx b/qemu-options.hx
index 90801286c6e..08becce4ad0 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -141,6 +141,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel,
     "                kvm-shadow-mem=size of KVM shadow MMU in bytes\n"
     "                split-wx=on|off (enable TCG split w^x mapping)\n"
     "                tb-size=n (TCG translation block cache size)\n"
+    "                dirty-gfn-count=n (KVM dirty ring GFN count, default 0)\n"
     "                thread=single|multi (enable multi-threaded TCG)\n", QEMU_ARCH_ALL)
 SRST
 ``-accel name[,prop=value[,...]]``
@@ -181,6 +182,17 @@ SRST
         where both the back-end and front-ends support it and no
         incompatible TCG features have been enabled (e.g.
         icount/replay).
+
+    ``dirty-gfn-count=n``
+        When KVM accelerator is used, it controls the per-vcpu KVM dirty ring
+        size (number of entries one dirty ring contains, per-vcpu). It should
+        be a value that is a power of two, and it should be 1024 or bigger (but
+        still less than the maximum value that the kernel supports).  4096
+        could be a good initial value if you have no idea which is the best.
+        Set this value to 0 to disable the feature.  By default, this feature
+        is disabled (dirty-gfn-count=0).  When enabled, it'll automatically
+        replace the kvm get dirty log feature.
+
 ERST
 
 DEF("smp", HAS_ARG, QEMU_OPTION_smp,
-- 
2.26.2




* [PATCH v5 09/10] KVM: Disable manual dirty log when dirty ring enabled
  2021-03-10 20:32 [PATCH v5 00/10] KVM: Dirty ring support (QEMU part) Peter Xu
                   ` (7 preceding siblings ...)
  2021-03-10 20:32 ` [PATCH v5 08/10] KVM: Add dirty-gfn-count property Peter Xu
@ 2021-03-10 20:33 ` Peter Xu
  2021-03-22  9:17   ` Keqian Zhu
  2021-03-10 20:33 ` [PATCH v5 10/10] KVM: Dirty ring support Peter Xu
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Peter Xu @ 2021-03-10 20:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Hyman, Dr . David Alan Gilbert, peterx

KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is for KVM_CLEAR_DIRTY_LOG, which is only
useful for KVM_GET_DIRTY_LOG.  Skip enabling it for kvm dirty ring.

More importantly, KVM_DIRTY_LOG_INITIALLY_SET will not wr-protect all the pages
initially, which is against how kvm dirty ring is used - there's no way for kvm
dirty ring to re-protect a page before it's first reported as written via a GFN
entry in the ring!  So when KVM_DIRTY_LOG_INITIALLY_SET is enabled together
with dirty ring, we'll see silent data loss after migration.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 accel/kvm/kvm-all.c | 37 +++++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 10137b6af11..ae9393266b2 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2173,20 +2173,29 @@ static int kvm_init(MachineState *ms)
         }
     }
 
-    dirty_log_manual_caps =
-        kvm_check_extension(s, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
-    dirty_log_manual_caps &= (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
-                              KVM_DIRTY_LOG_INITIALLY_SET);
-    s->manual_dirty_log_protect = dirty_log_manual_caps;
-    if (dirty_log_manual_caps) {
-        ret = kvm_vm_enable_cap(s, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2, 0,
-                                   dirty_log_manual_caps);
-        if (ret) {
-            warn_report("Trying to enable capability %"PRIu64" of "
-                        "KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 but failed. "
-                        "Falling back to the legacy mode. ",
-                        dirty_log_manual_caps);
-            s->manual_dirty_log_protect = 0;
+    /*
+     * KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is not needed when dirty ring is
+     * enabled.  More importantly, KVM_DIRTY_LOG_INITIALLY_SET will assume no
+     * page is wr-protected initially, which is against how kvm dirty ring is
+     * used - kvm dirty ring requires all pages to be wr-protected at the very
+     * beginning.  Enabling this feature for dirty ring causes data corruption.
+     */
+    if (!s->kvm_dirty_ring_enabled) {
+        dirty_log_manual_caps =
+            kvm_check_extension(s, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
+        dirty_log_manual_caps &= (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
+                                  KVM_DIRTY_LOG_INITIALLY_SET);
+        s->manual_dirty_log_protect = dirty_log_manual_caps;
+        if (dirty_log_manual_caps) {
+            ret = kvm_vm_enable_cap(s, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2, 0,
+                                    dirty_log_manual_caps);
+            if (ret) {
+                warn_report("Trying to enable capability %"PRIu64" of "
+                            "KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 but failed. "
+                            "Falling back to the legacy mode. ",
+                            dirty_log_manual_caps);
+                s->manual_dirty_log_protect = 0;
+            }
         }
     }
 
-- 
2.26.2




* [PATCH v5 10/10] KVM: Dirty ring support
  2021-03-10 20:32 [PATCH v5 00/10] KVM: Dirty ring support (QEMU part) Peter Xu
                   ` (8 preceding siblings ...)
  2021-03-10 20:33 ` [PATCH v5 09/10] KVM: Disable manual dirty log when dirty ring enabled Peter Xu
@ 2021-03-10 20:33 ` Peter Xu
  2021-03-22 13:37   ` Keqian Zhu
  2021-03-19 18:12 ` [PATCH v5 00/10] KVM: Dirty ring support (QEMU part) Peter Xu
  2021-03-22 14:02 ` Keqian Zhu
  11 siblings, 1 reply; 29+ messages in thread
From: Peter Xu @ 2021-03-10 20:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Hyman, Dr . David Alan Gilbert, peterx

KVM dirty ring is a new interface to pass dirty bits from the kernel to
userspace.  Instead of using a bitmap for each memory region, the dirty ring
contains an array of dirtied GPAs to fetch (in the form of offsets within
slots).  For each vcpu there is one dirty ring bound to it.

kvm_dirty_ring_reap() is the major function to collect dirty rings.  It can be
called either by a standalone reaper thread that runs in the background,
collecting dirty pages for the whole VM, or directly by any thread that holds
the BQL.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 accel/kvm/kvm-all.c    | 332 ++++++++++++++++++++++++++++++++++++++++-
 accel/kvm/trace-events |   7 +
 include/hw/core/cpu.h  |   8 +
 3 files changed, 344 insertions(+), 3 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index ae9393266b2..bf2b21f038b 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -15,6 +15,7 @@
 
 #include "qemu/osdep.h"
 #include <sys/ioctl.h>
+#include <poll.h>
 
 #include <linux/kvm.h>
 
@@ -80,6 +81,25 @@ struct KVMParkedVcpu {
     QLIST_ENTRY(KVMParkedVcpu) node;
 };
 
+enum KVMDirtyRingReaperState {
+    KVM_DIRTY_RING_REAPER_NONE = 0,
+    /* The reaper is sleeping */
+    KVM_DIRTY_RING_REAPER_WAIT,
+    /* The reaper is reaping for dirty pages */
+    KVM_DIRTY_RING_REAPER_REAPING,
+};
+
+/*
+ * KVM reaper instance, responsible for collecting the KVM dirty bits
+ * via the dirty ring.
+ */
+struct KVMDirtyRingReaper {
+    /* The reaper thread */
+    QemuThread reaper_thr;
+    volatile uint64_t reaper_iteration; /* iteration number of reaper thr */
+    volatile enum KVMDirtyRingReaperState reaper_state; /* reap thr state */
+};
+
 struct KVMState
 {
     AccelState parent_obj;
@@ -131,6 +151,7 @@ struct KVMState
     bool kvm_dirty_ring_enabled;    /* Whether KVM dirty ring is enabled */
     uint64_t kvm_dirty_ring_size;   /* Size of the per-vcpu dirty ring */
     uint32_t kvm_dirty_gfn_count;   /* Number of dirty GFNs per ring */
+    struct KVMDirtyRingReaper reaper;
 };
 
 KVMState *kvm_state;
@@ -392,6 +413,13 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
         goto err;
     }
 
+    if (cpu->kvm_dirty_gfns) {
+        ret = munmap(cpu->kvm_dirty_gfns, s->kvm_dirty_ring_size);
+        if (ret < 0) {
+            goto err;
+        }
+    }
+
     vcpu = g_malloc0(sizeof(*vcpu));
     vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
     vcpu->kvm_fd = cpu->kvm_fd;
@@ -468,6 +496,19 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
             (void *)cpu->kvm_run + s->coalesced_mmio * PAGE_SIZE;
     }
 
+    if (s->kvm_dirty_ring_enabled) {
+        /* Use MAP_SHARED to share pages with the kernel */
+        cpu->kvm_dirty_gfns = mmap(NULL, s->kvm_dirty_ring_size,
+                                   PROT_READ | PROT_WRITE, MAP_SHARED,
+                                   cpu->kvm_fd,
+                                   PAGE_SIZE * KVM_DIRTY_LOG_PAGE_OFFSET);
+        if (cpu->kvm_dirty_gfns == MAP_FAILED) {
+            ret = -errno;
+            DPRINTF("mmap'ing vcpu dirty gfns failed: %d\n", ret);
+            goto err;
+        }
+    }
+
     ret = kvm_arch_init_vcpu(cpu);
     if (ret < 0) {
         error_setg_errno(errp, -ret,
@@ -586,6 +627,11 @@ static void kvm_slot_sync_dirty_pages(KVMSlot *slot)
     cpu_physical_memory_set_dirty_lebitmap(slot->dirty_bmap, start, pages);
 }
 
+static void kvm_slot_reset_dirty_pages(KVMSlot *slot)
+{
+    memset(slot->dirty_bmap, 0, slot->dirty_bmap_size);
+}
+
 #define ALIGN(x, y)  (((x)+(y)-1) & ~((y)-1))
 
 /* Allocate the dirty bitmap for a slot  */
@@ -642,6 +688,170 @@ static bool kvm_slot_get_dirty_log(KVMState *s, KVMSlot *slot)
     return ret == 0;
 }
 
+/* Should be with all slots_lock held for the address spaces. */
+static void kvm_dirty_ring_mark_page(KVMState *s, uint32_t as_id,
+                                     uint32_t slot_id, uint64_t offset)
+{
+    KVMMemoryListener *kml;
+    KVMSlot *mem;
+
+    if (as_id >= s->nr_as) {
+        return;
+    }
+
+    kml = s->as[as_id].ml;
+    mem = &kml->slots[slot_id];
+
+    if (!mem->memory_size || offset >= (mem->memory_size / TARGET_PAGE_SIZE)) {
+        return;
+    }
+
+    set_bit(offset, mem->dirty_bmap);
+}
+
+static bool dirty_gfn_is_dirtied(struct kvm_dirty_gfn *gfn)
+{
+    return gfn->flags == KVM_DIRTY_GFN_F_DIRTY;
+}
+
+static void dirty_gfn_set_collected(struct kvm_dirty_gfn *gfn)
+{
+    gfn->flags = KVM_DIRTY_GFN_F_RESET;
+}
+
+/*
+ * Should be with all slots_lock held for the address spaces.  It returns the
+ * dirty page we've collected on this dirty ring.
+ */
+static uint32_t kvm_dirty_ring_reap_one(KVMState *s, CPUState *cpu)
+{
+    struct kvm_dirty_gfn *dirty_gfns = cpu->kvm_dirty_gfns, *cur;
+    uint32_t gfn_count = s->kvm_dirty_gfn_count;
+    uint32_t count = 0, fetch = cpu->kvm_fetch_index;
+
+    assert(dirty_gfns && gfn_count);
+    trace_kvm_dirty_ring_reap_vcpu(cpu->cpu_index);
+
+    while (true) {
+        cur = &dirty_gfns[fetch % gfn_count];
+        if (!dirty_gfn_is_dirtied(cur)) {
+            break;
+        }
+        kvm_dirty_ring_mark_page(s, cur->slot >> 16, cur->slot & 0xffff,
+                                 cur->offset);
+        dirty_gfn_set_collected(cur);
+        trace_kvm_dirty_ring_page(cpu->cpu_index, fetch, cur->offset);
+        fetch++;
+        count++;
+    }
+    cpu->kvm_fetch_index = fetch;
+
+    return count;
+}
+
+/* Must be with slots_lock held */
+static uint64_t kvm_dirty_ring_reap_locked(KVMState *s)
+{
+    int ret;
+    CPUState *cpu;
+    uint64_t total = 0;
+    int64_t stamp;
+
+    stamp = get_clock();
+
+    CPU_FOREACH(cpu) {
+        total += kvm_dirty_ring_reap_one(s, cpu);
+    }
+
+    if (total) {
+        ret = kvm_vm_ioctl(s, KVM_RESET_DIRTY_RINGS);
+        assert(ret == total);
+    }
+
+    stamp = get_clock() - stamp;
+
+    if (total) {
+        trace_kvm_dirty_ring_reap(total, stamp / 1000);
+    }
+
+    return total;
+}
+
+/*
+ * Currently for simplicity, we must hold BQL before calling this.  We can
+ * consider to drop the BQL if we're clear with all the race conditions.
+ */
+static uint64_t kvm_dirty_ring_reap(KVMState *s)
+{
+    uint64_t total;
+
+    /*
+     * We need to lock all kvm slots for all address spaces here,
+     * because:
+     *
+     * (1) We need to mark dirty for dirty bitmaps in multiple slots
+     *     and for tons of pages, so it's better to take the lock here
+     *     once rather than once per page.  And more importantly,
+     *
+     * (2) We must _NOT_ publish dirty bits to the other threads
+     *     (e.g., the migration thread) via the kvm memory slot dirty
+     *     bitmaps before correctly re-protect those dirtied pages.
+     *     Otherwise we can have potential risk of data corruption if
+     *     the page data is read in the other thread before we do
+     *     reset below.
+     */
+    kvm_slots_lock();
+    total = kvm_dirty_ring_reap_locked(s);
+    kvm_slots_unlock();
+
+    return total;
+}
+
+static void do_kvm_cpu_synchronize_kick(CPUState *cpu, run_on_cpu_data arg)
+{
+    /* No need to do anything */
+}
+
+/*
+ * Kick all vcpus out in a synchronized way.  When returned, we
+ * guarantee that every vcpu has been kicked and at least returned to
+ * userspace once.
+ */
+static void kvm_cpu_synchronize_kick_all(void)
+{
+    CPUState *cpu;
+
+    CPU_FOREACH(cpu) {
+        run_on_cpu(cpu, do_kvm_cpu_synchronize_kick, RUN_ON_CPU_NULL);
+    }
+}
+
+/*
+ * Flush all the existing dirty pages to the KVM slot buffers.  When
+ * this call returns, we guarantee that all the touched dirty pages
+ * before calling this function have been put into the per-kvmslot
+ * dirty bitmap.
+ *
+ * This function must be called with BQL held.
+ */
+static void kvm_dirty_ring_flush(struct KVMDirtyRingReaper *r)
+{
+    trace_kvm_dirty_ring_flush(0);
+    /*
+     * The function needs to be serialized.  Since this function
+     * should always be with BQL held, serialization is guaranteed.
+     * However, let's be sure of it.
+     */
+    assert(qemu_mutex_iothread_locked());
+    /*
+     * First make sure to flush the hardware buffers by kicking all
+     * vcpus out in a synchronous way.
+     */
+    kvm_cpu_synchronize_kick_all();
+    kvm_dirty_ring_reap(kvm_state);
+    trace_kvm_dirty_ring_flush(1);
+}
+
 /**
  * kvm_physical_sync_dirty_bitmap - Sync dirty bitmap from kernel space
  *
@@ -1174,7 +1384,24 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
                 goto out;
             }
             if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
-                kvm_slot_get_dirty_log(kvm_state, mem);
+                /*
+                 * NOTE: We should be aware of the fact that here we're only
+                 * doing a best effort to sync dirty bits.  No matter whether
+                 * we're using dirty log or dirty ring, we ignored two facts:
+                 *
+                 * (1) dirty bits can reside in hardware buffers (PML)
+                 *
+                 * (2) after we collected dirty bits here, pages can be dirtied
+                 * again before we do the final KVM_SET_USER_MEMORY_REGION to
+                 * remove the slot.
+                 *
+                 * Not easy.  Let's cross our fingers until it's fixed.
+                 */
+                if (kvm_state->kvm_dirty_ring_enabled) {
+                    kvm_dirty_ring_reap_locked(kvm_state);
+                } else {
+                    kvm_slot_get_dirty_log(kvm_state, mem);
+                }
                 kvm_slot_sync_dirty_pages(mem);
             }
 
@@ -1222,6 +1449,51 @@ out:
     kvm_slots_unlock();
 }
 
+static void *kvm_dirty_ring_reaper_thread(void *data)
+{
+    KVMState *s = data;
+    struct KVMDirtyRingReaper *r = &s->reaper;
+
+    rcu_register_thread();
+
+    trace_kvm_dirty_ring_reaper("init");
+
+    while (true) {
+        r->reaper_state = KVM_DIRTY_RING_REAPER_WAIT;
+        trace_kvm_dirty_ring_reaper("wait");
+        /*
+         * TODO: provide a smarter timeout rather than a constant?
+         */
+        sleep(1);
+
+        trace_kvm_dirty_ring_reaper("wakeup");
+        r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
+
+        qemu_mutex_lock_iothread();
+        kvm_dirty_ring_reap(s);
+        qemu_mutex_unlock_iothread();
+
+        r->reaper_iteration++;
+    }
+
+    trace_kvm_dirty_ring_reaper("exit");
+
+    rcu_unregister_thread();
+
+    return NULL;
+}
+
+static int kvm_dirty_ring_reaper_init(KVMState *s)
+{
+    struct KVMDirtyRingReaper *r = &s->reaper;
+
+    qemu_thread_create(&r->reaper_thr, "kvm-reaper",
+                       kvm_dirty_ring_reaper_thread,
+                       s, QEMU_THREAD_JOINABLE);
+
+    return 0;
+}
+
 static void kvm_region_add(MemoryListener *listener,
                            MemoryRegionSection *section)
 {
@@ -1250,6 +1522,36 @@ static void kvm_log_sync(MemoryListener *listener,
     kvm_slots_unlock();
 }
 
+static void kvm_log_sync_global(MemoryListener *l)
+{
+    KVMMemoryListener *kml = container_of(l, KVMMemoryListener, listener);
+    KVMState *s = kvm_state;
+    KVMSlot *mem;
+    int i;
+
+    /* Flush all kernel dirty addresses into KVMSlot dirty bitmap */
+    kvm_dirty_ring_flush(&s->reaper);
+
+    /*
+     * TODO: make this faster when nr_slots is large but only a few
+     * slots are actually in use (small VMs).
+     */
+    kvm_slots_lock();
+    for (i = 0; i < s->nr_slots; i++) {
+        mem = &kml->slots[i];
+        if (mem->memory_size && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
+            kvm_slot_sync_dirty_pages(mem);
+            /*
+             * This is not needed by KVM_GET_DIRTY_LOG because the
+             * ioctl will unconditionally overwrite the whole region.
+             * However, the kvm dirty ring has no such side effect.
+             */
+            kvm_slot_reset_dirty_pages(mem);
+        }
+    }
+    kvm_slots_unlock();
+}
+
 static void kvm_log_clear(MemoryListener *listener,
                           MemoryRegionSection *section)
 {
@@ -1356,10 +1658,15 @@ void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
     kml->listener.region_del = kvm_region_del;
     kml->listener.log_start = kvm_log_start;
     kml->listener.log_stop = kvm_log_stop;
-    kml->listener.log_sync = kvm_log_sync;
-    kml->listener.log_clear = kvm_log_clear;
     kml->listener.priority = 10;
 
+    if (s->kvm_dirty_ring_enabled) {
+        kml->listener.log_sync_global = kvm_log_sync_global;
+    } else {
+        kml->listener.log_sync = kvm_log_sync;
+        kml->listener.log_clear = kvm_log_clear;
+    }
+
     memory_listener_register(&kml->listener, as);
 
     for (i = 0; i < s->nr_as; ++i) {
@@ -2281,6 +2588,14 @@ static int kvm_init(MachineState *ms)
         ret = ram_block_discard_disable(true);
         assert(!ret);
     }
+
+    if (s->kvm_dirty_ring_enabled) {
+        ret = kvm_dirty_ring_reaper_init(s);
+        if (ret) {
+            goto err;
+        }
+    }
+
     return 0;
 
 err:
@@ -2593,6 +2908,17 @@ int kvm_cpu_exec(CPUState *cpu)
         case KVM_EXIT_INTERNAL_ERROR:
             ret = kvm_handle_internal_error(cpu, run);
             break;
+        case KVM_EXIT_DIRTY_RING_FULL:
+            /*
+             * We shouldn't continue if the dirty ring of this vcpu is
+             * still full.  Got kicked by KVM_RESET_DIRTY_RINGS.
+             */
+            trace_kvm_dirty_ring_full(cpu->cpu_index);
+            qemu_mutex_lock_iothread();
+            kvm_dirty_ring_reap(kvm_state);
+            qemu_mutex_unlock_iothread();
+            ret = 0;
+            break;
         case KVM_EXIT_SYSTEM_EVENT:
             switch (run->system_event.type) {
             case KVM_SYSTEM_EVENT_SHUTDOWN:
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index e15ae8980d3..72a01320a1a 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -18,4 +18,11 @@ kvm_set_ioeventfd_pio(int fd, uint16_t addr, uint32_t val, bool assign, uint32_t
 kvm_set_user_memory(uint32_t slot, uint32_t flags, uint64_t guest_phys_addr, uint64_t memory_size, uint64_t userspace_addr, int ret) "Slot#%d flags=0x%x gpa=0x%"PRIx64 " size=0x%"PRIx64 " ua=0x%"PRIx64 " ret=%d"
 kvm_clear_dirty_log(uint32_t slot, uint64_t start, uint32_t size) "slot#%"PRId32" start 0x%"PRIx64" size 0x%"PRIx32
 kvm_resample_fd_notify(int gsi) "gsi %d"
+kvm_dirty_ring_full(int id) "vcpu %d"
+kvm_dirty_ring_reap_vcpu(int id) "vcpu %d"
+kvm_dirty_ring_page(int vcpu, uint32_t slot, uint64_t offset) "vcpu %d fetch %"PRIu32" offset 0x%"PRIx64
+kvm_dirty_ring_reaper(const char *s) "%s"
+kvm_dirty_ring_reap(uint64_t count, int64_t t) "reaped %"PRIu64" pages (took %"PRIi64" us)"
+kvm_dirty_ring_reaper_kick(const char *reason) "%s"
+kvm_dirty_ring_flush(int finished) "%d"
 
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index c68bc3ba8af..2f0991d93f7 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -323,6 +323,11 @@ struct qemu_work_item;
  * @ignore_memory_transaction_failures: Cached copy of the MachineState
  *    flag of the same name: allows the board to suppress calling of the
  *    CPU do_transaction_failed hook function.
+ * @kvm_dirty_ring_full:
+ *   Whether the kvm dirty ring of this vcpu is soft-full.
+ * @kvm_dirty_ring_avail:
+ *   Semaphore to be posted when the kvm dirty ring of the vcpu is
+ *   available again.
  *
  * State of one CPU core or thread.
  */
@@ -394,9 +399,12 @@ struct CPUState {
      */
     uintptr_t mem_io_pc;
 
+    /* Only used in KVM */
     int kvm_fd;
     struct KVMState *kvm_state;
     struct kvm_run *kvm_run;
+    struct kvm_dirty_gfn *kvm_dirty_gfns;
+    uint32_t kvm_fetch_index;
 
     /* Used for events with 'vcpu' and *without* the 'disabled' properties */
     DECLARE_BITMAP(trace_dstate_delayed, CPU_TRACE_DSTATE_MAX_EVENTS);
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 00/10] KVM: Dirty ring support (QEMU part)
  2021-03-10 20:32 [PATCH v5 00/10] KVM: Dirty ring support (QEMU part) Peter Xu
                   ` (9 preceding siblings ...)
  2021-03-10 20:33 ` [PATCH v5 10/10] KVM: Dirty ring support Peter Xu
@ 2021-03-19 18:12 ` Peter Xu
  2021-03-22 14:02 ` Keqian Zhu
  11 siblings, 0 replies; 29+ messages in thread
From: Peter Xu @ 2021-03-19 18:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Hyman, Dr . David Alan Gilbert

On Wed, Mar 10, 2021 at 03:32:51PM -0500, Peter Xu wrote:
> This is v5 of the qemu dirty ring interface support.
> 
> v5:
> - rebase
> - dropped patch "update-linux-headers: Include const.h" after rebase
> - dropped patch "KVM: Fixup kvm_log_clear_one_slot() ioctl return check" since
>   similar patch got merged recently (38e0b7904eca7cd32f8953c3)

Ping - I think this missed the 6.0 soft freeze, so this is a ping for review comments;
hopefully it can still catch the train for 6.1?  Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 09/10] KVM: Disable manual dirty log when dirty ring enabled
  2021-03-10 20:33 ` [PATCH v5 09/10] KVM: Disable manual dirty log when dirty ring enabled Peter Xu
@ 2021-03-22  9:17   ` Keqian Zhu
  2021-03-22 13:55     ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Keqian Zhu @ 2021-03-22  9:17 UTC (permalink / raw)
  To: Peter Xu, qemu-devel; +Cc: Paolo Bonzini, Hyman, Dr . David Alan Gilbert

Hi Peter,

On 2021/3/11 4:33, Peter Xu wrote:
> KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is for KVM_CLEAR_DIRTY_LOG, which is only
> useful for KVM_GET_DIRTY_LOG.  Skip enabling it for kvm dirty ring.
> 
> More importantly, KVM_DIRTY_LOG_INITIALLY_SET will not wr-protect all the pages
> initially, which is against how kvm dirty ring is used - there's no way for kvm
> dirty ring to re-protect a page before it's notified as being written first
> with a GFN entry in the ring!  So when KVM_DIRTY_LOG_INITIALLY_SET is enabled
> with dirty ring, we'll see silent data loss after migration.
I feel a little regret that the dirty ring cannot work with KVM_DIRTY_LOG_INITIALLY_SET ...
With KVM_DIRTY_LOG_INITIALLY_SET, we can speed up dirty log start. More importantly, we can
enable dirty logging gradually. For write-fault-based dirty logging, it greatly reduces the
side effects of dirty logging on the guest.

I hope we can put forward a similar optimization under dirty ring mode. :)

Thanks,
Keqian

> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  accel/kvm/kvm-all.c | 37 +++++++++++++++++++++++--------------
>  1 file changed, 23 insertions(+), 14 deletions(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 10137b6af11..ae9393266b2 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -2173,20 +2173,29 @@ static int kvm_init(MachineState *ms)
>          }
>      }
>  
> -    dirty_log_manual_caps =
> -        kvm_check_extension(s, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
> -    dirty_log_manual_caps &= (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
> -                              KVM_DIRTY_LOG_INITIALLY_SET);
> -    s->manual_dirty_log_protect = dirty_log_manual_caps;
> -    if (dirty_log_manual_caps) {
> -        ret = kvm_vm_enable_cap(s, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2, 0,
> -                                   dirty_log_manual_caps);
> -        if (ret) {
> -            warn_report("Trying to enable capability %"PRIu64" of "
> -                        "KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 but failed. "
> -                        "Falling back to the legacy mode. ",
> -                        dirty_log_manual_caps);
> -            s->manual_dirty_log_protect = 0;
> +    /*
> +     * KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is not needed when dirty ring is
> +     * enabled.  More importantly, KVM_DIRTY_LOG_INITIALLY_SET will assume no
> +     * page is wr-protected initially, which is against how kvm dirty ring is
> +     * usage - kvm dirty ring requires all pages are wr-protected at the very
> +     * beginning.  Enabling this feature for dirty ring causes data corruption.
> +     */
> +    if (!s->kvm_dirty_ring_enabled) {
> +        dirty_log_manual_caps =
> +            kvm_check_extension(s, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
> +        dirty_log_manual_caps &= (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
> +                                  KVM_DIRTY_LOG_INITIALLY_SET);
> +        s->manual_dirty_log_protect = dirty_log_manual_caps;
> +        if (dirty_log_manual_caps) {
> +            ret = kvm_vm_enable_cap(s, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2, 0,
> +                                    dirty_log_manual_caps);
> +            if (ret) {
> +                warn_report("Trying to enable capability %"PRIu64" of "
> +                            "KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 but failed. "
> +                            "Falling back to the legacy mode. ",
> +                            dirty_log_manual_caps);
> +                s->manual_dirty_log_protect = 0;
> +            }
>          }
>      }
>  
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 02/10] KVM: Use a big lock to replace per-kml slots_lock
  2021-03-10 20:32 ` [PATCH v5 02/10] KVM: Use a big lock to replace per-kml slots_lock Peter Xu
@ 2021-03-22 10:47   ` Keqian Zhu
  2021-03-22 13:54     ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Keqian Zhu @ 2021-03-22 10:47 UTC (permalink / raw)
  To: Peter Xu, qemu-devel; +Cc: Paolo Bonzini, Hyman, Dr . David Alan Gilbert

Hi Peter,

On 2021/3/11 4:32, Peter Xu wrote:
> Per-kml slots_lock will bring some trouble if we want to take all slots_lock of
> all the KMLs, especially when we're in a context that we could have taken some
> of the KML slots_lock, then we even need to figure out what we've taken and
> what we need to take.
> 
> Make this simple by merging all KML slots_lock into a single slots lock.
> 
> Per-kml slots_lock isn't anything that helpful anyway - so far only x86 has two
> address spaces (so, two slots_locks).  All the rest archs will be having one
> address space always, which means there's actually one slots_lock so it will be
> the same as before.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  accel/kvm/kvm-all.c      | 32 +++++++++++++++++---------------
>  include/sysemu/kvm_int.h |  2 --
>  2 files changed, 17 insertions(+), 17 deletions(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index f88a52393fe..94e881f123b 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -174,8 +174,10 @@ typedef struct KVMResampleFd KVMResampleFd;
>  static QLIST_HEAD(, KVMResampleFd) kvm_resample_fd_list =
>      QLIST_HEAD_INITIALIZER(kvm_resample_fd_list);
>  
> -#define kvm_slots_lock(kml)      qemu_mutex_lock(&(kml)->slots_lock)
> -#define kvm_slots_unlock(kml)    qemu_mutex_unlock(&(kml)->slots_lock)
> +static QemuMutex kml_slots_lock;
> +
> +#define kvm_slots_lock()  qemu_mutex_lock(&kml_slots_lock)
> +#define kvm_slots_unlock()  qemu_mutex_unlock(&kml_slots_lock)
nit: the qemu_mutex_lock and qemu_mutex_unlock definitions are not aligned.


>  
>  static inline void kvm_resample_fd_remove(int gsi)
>  {
> @@ -241,9 +243,9 @@ bool kvm_has_free_slot(MachineState *ms)
>      bool result;
>      KVMMemoryListener *kml = &s->memory_listener;
>  
> -    kvm_slots_lock(kml);
> +    kvm_slots_lock();
>      result = !!kvm_get_free_slot(kml);
> -    kvm_slots_unlock(kml);
> +    kvm_slots_unlock();
>  
>      return result;
>  }
> @@ -309,7 +311,7 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram,
>      KVMMemoryListener *kml = &s->memory_listener;
>      int i, ret = 0;
>  
> -    kvm_slots_lock(kml);
> +    kvm_slots_lock();
>      for (i = 0; i < s->nr_slots; i++) {
>          KVMSlot *mem = &kml->slots[i];
>  
> @@ -319,7 +321,7 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram,
>              break;
>          }
>      }
> -    kvm_slots_unlock(kml);
> +    kvm_slots_unlock();
>  
>      return ret;
>  }
> @@ -515,7 +517,7 @@ static int kvm_section_update_flags(KVMMemoryListener *kml,
>          return 0;
>      }
>  
> -    kvm_slots_lock(kml);
> +    kvm_slots_lock();
>  
>      while (size && !ret) {
>          slot_size = MIN(kvm_max_slot_size, size);
> @@ -531,7 +533,7 @@ static int kvm_section_update_flags(KVMMemoryListener *kml,
>      }
>  
>  out:
> -    kvm_slots_unlock(kml);
> +    kvm_slots_unlock();
>      return ret;
>  }
>  
> @@ -819,7 +821,7 @@ static int kvm_physical_log_clear(KVMMemoryListener *kml,
>          return ret;
>      }
>  
> -    kvm_slots_lock(kml);
> +    kvm_slots_lock();
>  
>      for (i = 0; i < s->nr_slots; i++) {
>          mem = &kml->slots[i];
> @@ -845,7 +847,7 @@ static int kvm_physical_log_clear(KVMMemoryListener *kml,
>          }
>      }
>  
> -    kvm_slots_unlock(kml);
> +    kvm_slots_unlock();
>  
>      return ret;
>  }
> @@ -1150,7 +1152,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
>      ram = memory_region_get_ram_ptr(mr) + section->offset_within_region +
>            (start_addr - section->offset_within_address_space);
>  
> -    kvm_slots_lock(kml);
> +    kvm_slots_lock();
>  
>      if (!add) {
>          do {
> @@ -1208,7 +1210,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
>      } while (size);
>  
>  out:
> -    kvm_slots_unlock(kml);
> +    kvm_slots_unlock();
>  }
>  
>  static void kvm_region_add(MemoryListener *listener,
> @@ -1235,9 +1237,9 @@ static void kvm_log_sync(MemoryListener *listener,
>      KVMMemoryListener *kml = container_of(listener, KVMMemoryListener, listener);
>      int r;
>  
> -    kvm_slots_lock(kml);
> +    kvm_slots_lock();
>      r = kvm_physical_sync_dirty_bitmap(kml, section);
> -    kvm_slots_unlock(kml);
> +    kvm_slots_unlock();
>      if (r < 0) {
>          abort();
>      }
> @@ -1337,7 +1339,7 @@ void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
>  {
>      int i;
>  
> -    qemu_mutex_init(&kml->slots_lock);
> +    qemu_mutex_init(&kml_slots_lock);
As you said, x86 has two address spaces; is it a problem that we may initialize kml_slots_lock multiple times?

Thanks,
Keqian

>      kml->slots = g_malloc0(s->nr_slots * sizeof(KVMSlot));
>      kml->as_id = as_id;
>  
> diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
> index ccb8869f01b..1da30e18841 100644
> --- a/include/sysemu/kvm_int.h
> +++ b/include/sysemu/kvm_int.h
> @@ -27,8 +27,6 @@ typedef struct KVMSlot
>  
>  typedef struct KVMMemoryListener {
>      MemoryListener listener;
> -    /* Protects the slots and all inside them */
> -    QemuMutex slots_lock;
>      KVMSlot *slots;
>      int as_id;
>  } KVMMemoryListener;
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 10/10] KVM: Dirty ring support
  2021-03-10 20:33 ` [PATCH v5 10/10] KVM: Dirty ring support Peter Xu
@ 2021-03-22 13:37   ` Keqian Zhu
  2021-03-22 18:52     ` Peter Xu
  0 siblings, 1 reply; 29+ messages in thread
From: Keqian Zhu @ 2021-03-22 13:37 UTC (permalink / raw)
  To: Peter Xu, qemu-devel; +Cc: Paolo Bonzini, Hyman, Dr . David Alan Gilbert



On 2021/3/11 4:33, Peter Xu wrote:
> KVM dirty ring is a new interface to pass over dirty bits from kernel to the
> userspace.  Instead of using a bitmap for each memory region, the dirty ring
> contains an array of dirtied GPAs to fetch (in the form of offset in slots).
> For each vcpu there will be one dirty ring that binds to it.
> 
> kvm_dirty_ring_reap() is the major function to collect dirty rings.  It can be
> called either by a standalone reaper thread that runs in the background,
> collecting dirty pages for the whole VM.  It can also be called directly by any
> thread that has BQL taken.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  accel/kvm/kvm-all.c    | 332 ++++++++++++++++++++++++++++++++++++++++-
>  accel/kvm/trace-events |   7 +
>  include/hw/core/cpu.h  |   8 +
>  3 files changed, 344 insertions(+), 3 deletions(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index ae9393266b2..bf2b21f038b 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -15,6 +15,7 @@
>  
>  #include "qemu/osdep.h"
>  #include <sys/ioctl.h>
> +#include <poll.h>
>  
>  #include <linux/kvm.h>
>  
> @@ -80,6 +81,25 @@ struct KVMParkedVcpu {
>      QLIST_ENTRY(KVMParkedVcpu) node;
>  };
>  
> +enum KVMDirtyRingReaperState {
> +    KVM_DIRTY_RING_REAPER_NONE = 0,
> +    /* The reaper is sleeping */
> +    KVM_DIRTY_RING_REAPER_WAIT,
> +    /* The reaper is reaping for dirty pages */
> +    KVM_DIRTY_RING_REAPER_REAPING,
> +};
> +
> +/*
> + * KVM reaper instance, responsible for collecting the KVM dirty bits
> + * via the dirty ring.
> + */
> +struct KVMDirtyRingReaper {
> +    /* The reaper thread */
> +    QemuThread reaper_thr;
> +    volatile uint64_t reaper_iteration; /* iteration number of reaper thr */
> +    volatile enum KVMDirtyRingReaperState reaper_state; /* reap thr state */
> +};
> +
>  struct KVMState
>  {
>      AccelState parent_obj;
> @@ -131,6 +151,7 @@ struct KVMState
>      bool kvm_dirty_ring_enabled;    /* Whether KVM dirty ring is enabled */
>      uint64_t kvm_dirty_ring_size;   /* Size of the per-vcpu dirty ring */
>      uint32_t kvm_dirty_gfn_count;   /* Number of dirty GFNs per ring */
> +    struct KVMDirtyRingReaper reaper;
>  };
>  
>  KVMState *kvm_state;
> @@ -392,6 +413,13 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
>          goto err;
>      }
>  
> +    if (cpu->kvm_dirty_gfns) {
> +        ret = munmap(cpu->kvm_dirty_gfns, s->kvm_dirty_ring_size);
> +        if (ret < 0) {
> +            goto err;
> +        }
> +    }
> +
>      vcpu = g_malloc0(sizeof(*vcpu));
>      vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
>      vcpu->kvm_fd = cpu->kvm_fd;
> @@ -468,6 +496,19 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
>              (void *)cpu->kvm_run + s->coalesced_mmio * PAGE_SIZE;
>      }
>  
> +    if (s->kvm_dirty_ring_enabled) {
> +        /* Use MAP_SHARED to share pages with the kernel */
> +        cpu->kvm_dirty_gfns = mmap(NULL, s->kvm_dirty_ring_size,
> +                                   PROT_READ | PROT_WRITE, MAP_SHARED,
> +                                   cpu->kvm_fd,
> +                                   PAGE_SIZE * KVM_DIRTY_LOG_PAGE_OFFSET);
> +        if (cpu->kvm_dirty_gfns == MAP_FAILED) {
> +            ret = -errno;
> +            DPRINTF("mmap'ing vcpu dirty gfns failed: %d\n", ret);
> +            goto err;
> +        }
> +    }
> +
>      ret = kvm_arch_init_vcpu(cpu);
>      if (ret < 0) {
>          error_setg_errno(errp, -ret,
> @@ -586,6 +627,11 @@ static void kvm_slot_sync_dirty_pages(KVMSlot *slot)
>      cpu_physical_memory_set_dirty_lebitmap(slot->dirty_bmap, start, pages);
>  }
>  
> +static void kvm_slot_reset_dirty_pages(KVMSlot *slot)
> +{
> +    memset(slot->dirty_bmap, 0, slot->dirty_bmap_size);
> +}
> +
>  #define ALIGN(x, y)  (((x)+(y)-1) & ~((y)-1))
>  
>  /* Allocate the dirty bitmap for a slot  */
> @@ -642,6 +688,170 @@ static bool kvm_slot_get_dirty_log(KVMState *s, KVMSlot *slot)
>      return ret == 0;
>  }
>  
> +/* Should be with all slots_lock held for the address spaces. */
> +static void kvm_dirty_ring_mark_page(KVMState *s, uint32_t as_id,
> +                                     uint32_t slot_id, uint64_t offset)
> +{
> +    KVMMemoryListener *kml;
> +    KVMSlot *mem;
> +
> +    if (as_id >= s->nr_as) {
> +        return;
> +    }
> +
> +    kml = s->as[as_id].ml;
> +    mem = &kml->slots[slot_id];
> +
> +    if (!mem->memory_size || offset >= (mem->memory_size / TARGET_PAGE_SIZE)) {
It seems that TARGET_PAGE_SIZE should be qemu_real_host_page_size.


> +        return;
> +    }
> +
> +    set_bit(offset, mem->dirty_bmap);
> +}
> +
> +static bool dirty_gfn_is_dirtied(struct kvm_dirty_gfn *gfn)
> +{
> +    return gfn->flags == KVM_DIRTY_GFN_F_DIRTY;
> +}
> +
> +static void dirty_gfn_set_collected(struct kvm_dirty_gfn *gfn)
> +{
> +    gfn->flags = KVM_DIRTY_GFN_F_RESET;
> +}
> +
> +/*
> + * Should be with all slots_lock held for the address spaces.  It returns the
> + * dirty page we've collected on this dirty ring.
> + */
> +static uint32_t kvm_dirty_ring_reap_one(KVMState *s, CPUState *cpu)
> +{
> +    struct kvm_dirty_gfn *dirty_gfns = cpu->kvm_dirty_gfns, *cur;
> +    uint32_t gfn_count = s->kvm_dirty_gfn_count;
> +    uint32_t count = 0, fetch = cpu->kvm_fetch_index;
> +
> +    assert(dirty_gfns && gfn_count);
> +    trace_kvm_dirty_ring_reap_vcpu(cpu->cpu_index);
> +
> +    while (true) {
> +        cur = &dirty_gfns[fetch % gfn_count];
> +        if (!dirty_gfn_is_dirtied(cur)) {
> +            break;
> +        }
> +        kvm_dirty_ring_mark_page(s, cur->slot >> 16, cur->slot & 0xffff,
> +                                 cur->offset);
> +        dirty_gfn_set_collected(cur);
> +        trace_kvm_dirty_ring_page(cpu->cpu_index, fetch, cur->offset);
> +        fetch++;
> +        count++;
> +    }
> +    cpu->kvm_fetch_index = fetch;
> +
> +    return count;
> +}
> +
> +/* Must be with slots_lock held */
> +static uint64_t kvm_dirty_ring_reap_locked(KVMState *s)
> +{
> +    int ret;
> +    CPUState *cpu;
> +    uint64_t total = 0;
> +    int64_t stamp;
> +
> +    stamp = get_clock();
> +
> +    CPU_FOREACH(cpu) {
> +        total += kvm_dirty_ring_reap_one(s, cpu);
> +    }
> +
> +    if (total) {
> +        ret = kvm_vm_ioctl(s, KVM_RESET_DIRTY_RINGS);
> +        assert(ret == total);
> +    }
> +
> +    stamp = get_clock() - stamp;
> +
> +    if (total) {
> +        trace_kvm_dirty_ring_reap(total, stamp / 1000);
> +    }
> +
> +    return total;
> +}
> +
> +/*
> + * Currently for simplicity, we must hold BQL before calling this.  We can
> + * consider to drop the BQL if we're clear with all the race conditions.
> + */
> +static uint64_t kvm_dirty_ring_reap(KVMState *s)
> +{
> +    uint64_t total;
> +
> +    /*
> +     * We need to lock all kvm slots for all address spaces here,
> +     * because:
> +     *
> +     * (1) We need to mark dirty for dirty bitmaps in multiple slots
> +     *     and for tons of pages, so it's better to take the lock here
> +     *     once rather than once per page.  And more importantly,
> +     *
> +     * (2) We must _NOT_ publish dirty bits to the other threads
> +     *     (e.g., the migration thread) via the kvm memory slot dirty
> +     *     bitmaps before correctly re-protect those dirtied pages.
> +     *     Otherwise we can have potential risk of data corruption if
> +     *     the page data is read in the other thread before we do
> +     *     reset below.
> +     */
> +    kvm_slots_lock();
> +    total = kvm_dirty_ring_reap_locked(s);
> +    kvm_slots_unlock();
> +
> +    return total;
> +}
> +
> +static void do_kvm_cpu_synchronize_kick(CPUState *cpu, run_on_cpu_data arg)
> +{
> +    /* No need to do anything */
> +}
> +
> +/*
> + * Kick all vcpus out in a synchronized way.  When returned, we
> + * guarantee that every vcpu has been kicked and at least returned to
> + * userspace once.
> + */
> +static void kvm_cpu_synchronize_kick_all(void)
> +{
> +    CPUState *cpu;
> +
> +    CPU_FOREACH(cpu) {
> +        run_on_cpu(cpu, do_kvm_cpu_synchronize_kick, RUN_ON_CPU_NULL);
> +    }
> +}
> +
> +/*
> + * Flush all the existing dirty pages to the KVM slot buffers.  When
> + * this call returns, we guarantee that all the touched dirty pages
> + * before calling this function have been put into the per-kvmslot
> + * dirty bitmap.
> + *
> + * This function must be called with BQL held.
> + */
> +static void kvm_dirty_ring_flush(struct KVMDirtyRingReaper *r)
The argument is not used.

> +{
> +    trace_kvm_dirty_ring_flush(0);
> +    /*
> +     * The function needs to be serialized.  Since this function
> +     * should always be with BQL held, serialization is guaranteed.
> +     * However, let's be sure of it.
> +     */
> +    assert(qemu_mutex_iothread_locked());
> +    /*
> +     * First make sure to flush the hardware buffers by kicking all
> +     * vcpus out in a synchronous way.
> +     */
> +    kvm_cpu_synchronize_kick_all();
Can we make this function architecture-specific?
It seems that kicking vCPUs out is an architecture-specific way to flush hardware buffers
into the dirty ring (for x86 PML).

> +    kvm_dirty_ring_reap(kvm_state);
> +    trace_kvm_dirty_ring_flush(1);
> +}
> +
>  /**
>   * kvm_physical_sync_dirty_bitmap - Sync dirty bitmap from kernel space
>   *
> @@ -1174,7 +1384,24 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
>                  goto out;
>              }
>              if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
> -                kvm_slot_get_dirty_log(kvm_state, mem);
> +                /*
> +                 * NOTE: We should be aware of the fact that here we're only
> +                 * doing a best effort to sync dirty bits.  No matter whether
> +                 * we're using dirty log or dirty ring, we ignored two facts:
> +                 *
> +                 * (1) dirty bits can reside in hardware buffers (PML)
> +                 *
> +                 * (2) after we collected dirty bits here, pages can be dirtied
> +                 * again before we do the final KVM_SET_USER_MEMORY_REGION to
> +                 * remove the slot.
> +                 *
> +                 * Not easy.  Let's cross the fingers until it's fixed.
> +                 */
> +                if (kvm_state->kvm_dirty_ring_enabled) {
> +                    kvm_dirty_ring_reap_locked(kvm_state);
> +                } else {
> +                    kvm_slot_get_dirty_log(kvm_state, mem);
> +                }
>                  kvm_slot_sync_dirty_pages(mem);
>              }
>  
> @@ -1222,6 +1449,51 @@ out:
>      kvm_slots_unlock();
>  }
>  
> +static void *kvm_dirty_ring_reaper_thread(void *data)
> +{
> +    KVMState *s = data;
> +    struct KVMDirtyRingReaper *r = &s->reaper;
> +
> +    rcu_register_thread();
> +
> +    trace_kvm_dirty_ring_reaper("init");
> +
> +    while (true) {
> +        r->reaper_state = KVM_DIRTY_RING_REAPER_WAIT;
> +        trace_kvm_dirty_ring_reaper("wait");
> +        /*
> +         * TODO: provide a smarter timeout rather than a constant?
> +         */
> +        sleep(1);
> +
> +        trace_kvm_dirty_ring_reaper("wakeup");
> +        r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
> +
> +        qemu_mutex_lock_iothread();
> +        kvm_dirty_ring_reap(s);
> +        qemu_mutex_unlock_iothread();
> +
> +        r->reaper_iteration++;
> +    }
I don't see when this loop ever exits.
Also, since we start this reaper thread in kvm_init(), maybe it's better to start it
when dirty logging starts and stop it when dirty logging stops.

> +
> +    trace_kvm_dirty_ring_reaper("exit");
> +
> +    rcu_unregister_thread();
> +
> +    return NULL;
> +}
> +
> +static int kvm_dirty_ring_reaper_init(KVMState *s)
> +{
> +    struct KVMDirtyRingReaper *r = &s->reaper;
> +
> +    qemu_thread_create(&r->reaper_thr, "kvm-reaper",
> +                       kvm_dirty_ring_reaper_thread,
> +                       s, QEMU_THREAD_JOINABLE);
> +
> +    return 0;
> +}
> +
>  static void kvm_region_add(MemoryListener *listener,
>                             MemoryRegionSection *section)
>  {
> @@ -1250,6 +1522,36 @@ static void kvm_log_sync(MemoryListener *listener,
>      kvm_slots_unlock();
>  }
>  
> +static void kvm_log_sync_global(MemoryListener *l)
> +{
> +    KVMMemoryListener *kml = container_of(l, KVMMemoryListener, listener);
> +    KVMState *s = kvm_state;
> +    KVMSlot *mem;
> +    int i;
> +
> +    /* Flush all kernel dirty addresses into KVMSlot dirty bitmap */
> +    kvm_dirty_ring_flush(&s->reaper);
> +
> +    /*
> +     * TODO: make this faster when nr_slots is big while there are
> +     * only a few used slots (small VMs).
> +     */
> +    kvm_slots_lock();
> +    for (i = 0; i < s->nr_slots; i++) {
> +        mem = &kml->slots[i];
> +        if (mem->memory_size && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
> +            kvm_slot_sync_dirty_pages(mem);
> +            /*
> +             * This is not needed by KVM_GET_DIRTY_LOG because the
> +             * ioctl will unconditionally overwrite the whole region.
> +             * However kvm dirty ring has no such side effect.
> +             */
> +            kvm_slot_reset_dirty_pages(mem);
> +        }
> +    }
> +    kvm_slots_unlock();
> +}
> +
>  static void kvm_log_clear(MemoryListener *listener,
>                            MemoryRegionSection *section)
>  {
> @@ -1356,10 +1658,15 @@ void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
>      kml->listener.region_del = kvm_region_del;
>      kml->listener.log_start = kvm_log_start;
>      kml->listener.log_stop = kvm_log_stop;
> -    kml->listener.log_sync = kvm_log_sync;
> -    kml->listener.log_clear = kvm_log_clear;
>      kml->listener.priority = 10;
>  
> +    if (s->kvm_dirty_ring_enabled) {
> +        kml->listener.log_sync_global = kvm_log_sync_global;
> +    } else {
> +        kml->listener.log_sync = kvm_log_sync;
> +        kml->listener.log_clear = kvm_log_clear;
> +    }
> +
>      memory_listener_register(&kml->listener, as);
>  
>      for (i = 0; i < s->nr_as; ++i) {
> @@ -2281,6 +2588,14 @@ static int kvm_init(MachineState *ms)
>          ret = ram_block_discard_disable(true);
>          assert(!ret);
>      }
> +
> +    if (s->kvm_dirty_ring_enabled) {
> +        ret = kvm_dirty_ring_reaper_init(s);
> +        if (ret) {
> +            goto err;
> +        }
> +    }
> +
>      return 0;
>  
>  err:
> @@ -2593,6 +2908,17 @@ int kvm_cpu_exec(CPUState *cpu)
>          case KVM_EXIT_INTERNAL_ERROR:
>              ret = kvm_handle_internal_error(cpu, run);
>              break;
> +        case KVM_EXIT_DIRTY_RING_FULL:
> +            /*
> +             * We shouldn't continue if the dirty ring of this vcpu is
> +             * still full.  Got kicked by KVM_RESET_DIRTY_RINGS.
> +             */
> +            trace_kvm_dirty_ring_full(cpu->cpu_index);
> +            qemu_mutex_lock_iothread();
> +            kvm_dirty_ring_reap(kvm_state);
> +            qemu_mutex_unlock_iothread();
> +            ret = 0;
> +            break;
>          case KVM_EXIT_SYSTEM_EVENT:
>              switch (run->system_event.type) {
>              case KVM_SYSTEM_EVENT_SHUTDOWN:
> diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
> index e15ae8980d3..72a01320a1a 100644
> --- a/accel/kvm/trace-events
> +++ b/accel/kvm/trace-events
> @@ -18,4 +18,11 @@ kvm_set_ioeventfd_pio(int fd, uint16_t addr, uint32_t val, bool assign, uint32_t
>  kvm_set_user_memory(uint32_t slot, uint32_t flags, uint64_t guest_phys_addr, uint64_t memory_size, uint64_t userspace_addr, int ret) "Slot#%d flags=0x%x gpa=0x%"PRIx64 " size=0x%"PRIx64 " ua=0x%"PRIx64 " ret=%d"
>  kvm_clear_dirty_log(uint32_t slot, uint64_t start, uint32_t size) "slot#%"PRId32" start 0x%"PRIx64" size 0x%"PRIx32
>  kvm_resample_fd_notify(int gsi) "gsi %d"
> +kvm_dirty_ring_full(int id) "vcpu %d"
> +kvm_dirty_ring_reap_vcpu(int id) "vcpu %d"
> +kvm_dirty_ring_page(int vcpu, uint32_t slot, uint64_t offset) "vcpu %d fetch %"PRIu32" offset 0x%"PRIx64
> +kvm_dirty_ring_reaper(const char *s) "%s"
> +kvm_dirty_ring_reap(uint64_t count, int64_t t) "reaped %"PRIu64" pages (took %"PRIi64" us)"
> +kvm_dirty_ring_reaper_kick(const char *reason) "%s"
> +kvm_dirty_ring_flush(int finished) "%d"
>  
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index c68bc3ba8af..2f0991d93f7 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -323,6 +323,11 @@ struct qemu_work_item;
>   * @ignore_memory_transaction_failures: Cached copy of the MachineState
>   *    flag of the same name: allows the board to suppress calling of the
>   *    CPU do_transaction_failed hook function.
> + * @kvm_dirty_ring_full:
> + *   Whether the kvm dirty ring of this vcpu is soft-full.
> + * @kvm_dirty_ring_avail:
> + *   Semaphore to be posted when the kvm dirty ring of the vcpu is
> + *   available again.
The doc does not match the code.

>   *
>   * State of one CPU core or thread.
>   */
> @@ -394,9 +399,12 @@ struct CPUState {
>       */
>      uintptr_t mem_io_pc;
>  
> +    /* Only used in KVM */
>      int kvm_fd;
>      struct KVMState *kvm_state;
>      struct kvm_run *kvm_run;
> +    struct kvm_dirty_gfn *kvm_dirty_gfns;
> +    uint32_t kvm_fetch_index;
>  
>      /* Used for events with 'vcpu' and *without* the 'disabled' properties */
>      DECLARE_BITMAP(trace_dstate_delayed, CPU_TRACE_DSTATE_MAX_EVENTS);
> 

Thanks,
Keqian


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 02/10] KVM: Use a big lock to replace per-kml slots_lock
  2021-03-22 10:47   ` Keqian Zhu
@ 2021-03-22 13:54     ` Paolo Bonzini
  2021-03-22 16:27       ` Peter Xu
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2021-03-22 13:54 UTC (permalink / raw)
  To: Keqian Zhu, Peter Xu, qemu-devel; +Cc: Hyman, Dr . David Alan Gilbert

On 22/03/21 11:47, Keqian Zhu wrote:
>> +    qemu_mutex_init(&kml_slots_lock);
> As you said, x86 has two address spaces, is it a problem that we may have multi initialization for kml_slots_lock?

Agreed, the lock can be initialized (if only for cleanliness) in kvm_init.

Paolo



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 09/10] KVM: Disable manual dirty log when dirty ring enabled
  2021-03-22  9:17   ` Keqian Zhu
@ 2021-03-22 13:55     ` Paolo Bonzini
  2021-03-22 16:21       ` Peter Xu
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2021-03-22 13:55 UTC (permalink / raw)
  To: Keqian Zhu, Peter Xu, qemu-devel; +Cc: Hyman, Dr . David Alan Gilbert

On 22/03/21 10:17, Keqian Zhu wrote:
> Hi Peter,
> 
> On 2021/3/11 4:33, Peter Xu wrote:
>> KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is for KVM_CLEAR_DIRTY_LOG, which is only
>> useful for KVM_GET_DIRTY_LOG.  Skip enabling it for kvm dirty ring.
>>
>> More importantly, KVM_DIRTY_LOG_INITIALLY_SET will not wr-protect all the pages
>> initially, which is against how kvm dirty ring is used - there's no way for kvm
>> dirty ring to re-protect a page before it's notified as being written first
>> with a GFN entry in the ring!  So when KVM_DIRTY_LOG_INITIALLY_SET is enabled
>> with dirty ring, we'll see silent data loss after migration.
> I feel a little regret that dirty ring can not work with KVM_DIRTY_LOG_INITIALLY_SET ...
> With KVM_DIRTY_LOG_INITIALLY_SET, we can speedup dirty log start. More important, we can
> enable dirty log gradually. For write fault based dirty log, it greatly reduces the side
> effect of dirty log over guest.
> 
> I hope we can put forward another similar optimization under dirty ring mode. :)

Indeed, perhaps (even though KVM_GET_DIRTY_LOG does not make sense with 
dirty ring) we could allow KVM_CLEAR_DIRTY_LOG.

Paolo



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 00/10] KVM: Dirty ring support (QEMU part)
  2021-03-10 20:32 [PATCH v5 00/10] KVM: Dirty ring support (QEMU part) Peter Xu
                   ` (10 preceding siblings ...)
  2021-03-19 18:12 ` [PATCH v5 00/10] KVM: Dirty ring support (QEMU part) Peter Xu
@ 2021-03-22 14:02 ` Keqian Zhu
  2021-03-22 19:45   ` Peter Xu
  11 siblings, 1 reply; 29+ messages in thread
From: Keqian Zhu @ 2021-03-22 14:02 UTC (permalink / raw)
  To: Peter Xu, qemu-devel; +Cc: Paolo Bonzini, Hyman, Dr . David Alan Gilbert

Hi Peter,

On 2021/3/11 4:32, Peter Xu wrote:
> This is v5 of the qemu dirty ring interface support.
> 
> 
> 
> v5:
> 
> - rebase
> 
> - dropped patch "update-linux-headers: Include const.h" after rebase
> 
> - dropped patch "KVM: Fixup kvm_log_clear_one_slot() ioctl return check" since
> 
>   similar patch got merged recently (38e0b7904eca7cd32f8953c3)
> 
> 
> 
> ========= v4 cover letter below =============
> 
> 
> 
> It is merely the same as v3 content-wise, but there're a few things to mention
> 
> besides the rebase itself:
> 
> 
> 
>   - I picked up two patches from Eric Farman for the linux-header updates (from
> 
>     Eric's v3 series) for convenience just in case any of the series would got
> 
>     queued by any maintainer.
> 
> 
> 
>   - One more patch is added as "KVM: Disable manual dirty log when dirty ring
> 
>     enabled".  I found this when testing the branch after rebasing to latest
> 
>     qemu, that not only the manual dirty log capability is not needed for kvm
> 
>     dirty ring, but more importantly INITIALLY_ALL_SET is totally against kvm
> 
>     dirty ring and it could silently crash the guest after migration.  For this
> 
>     new commit, I touched up "KVM: Add dirty-gfn-count property" a bit.
> 
> 
> 
>   - A few more documentation lines in qemu-options.hx.
> 
> 
> 
>   - I removed the RFC tag after kernel series got merged.
> 
> 
> 
> Again, this is only the 1st step to support dirty ring.  Ideally dirty ring
> 
> should grant QEMU the possibility to remove the whole layered dirty bitmap so
> 
> that dirty ring will work similarly as auto-converge enabled but should better;
> 
> we will just throttle vcpus with the dirty ring kvm exit rather than explicitly
> 
> adding a timer to stop the vcpu thread from entering the guest again (like what
> 
> we did with current migration auto-converge).  Some more information could also
> 
> be found in the kvm forum 2020 talk regarding kvm dirty ring (slides 21/22 [1]).
I have read the pdf and the code, and I have some questions; I hope you can help me. :)

You emphasize that the dirty ring uses "thread-local buffers" while the dirty
bitmap is global, but I don't see what "locking" optimization it has compared
to the dirty bitmap.

"Thread-local" means that a vCPU can flush the hardware buffer into its dirty
ring without locking, but for the bitmap a vCPU can also use an atomic set to
mark a page dirty without locking. Maybe I'm missing something?

The second question is that you observed a longer migration time (55s->73s)
when the guest has 24G of RAM and the dirty rate is 800M/s. I am not clear
about the reason. With the dirty ring enabled, QEMU can get dirty info faster,
which means it handles dirty pages more quickly, and the guest can be
throttled, which means dirty pages are generated more slowly. What's the
rationale for the longer migration time?

PS: Since the dirty ring is still converted into the kvm_slot dirty_bitmap,
"get dirty info faster" may not be true. :-(

Thanks,
Keqian

> 
> 
> 
> That next step (to remove all the dirty bitmaps, as mentioned above) is still
> 
> discussable: firstly I don't know whether there's anything I've overlooked in
> 
> there.  Meanwhile that's also only services huge VM cases, may not be extremely
> 
> helpful with a lot major scenarios where VMs are not that huge.
> 
> 
> 
> There's probably other ways to fix huge VM migration issues, majorly focusing
> 
> on responsiveness and convergence.  For example, Google has proposed some new
> 
> userfaultfd kernel capability called "minor modes" [2] to track page minor
> 
> faults and that could be finally served for that purpose too using postcopy.
> 
> That's another long story so I'll stop here, but just as a marker along with
> 
> the dirty ring series so there'll still be a record to reference.
> 
> 
> 
> Said that, I still think this series is very worth merging even if we don't
> 
> persue the next steps yet, since dirty ring is disabled by default, and we can
> 
> always work upon this series.
> 
> 
> 
> Please review, thanks.
> 
> 
> 
> V3: https://lore.kernel.org/qemu-devel/20200523232035.1029349-1-peterx@redhat.com/
> 
>     (V3 contains all the pre-v3 changelog)
> 
> 
> 
> QEMU branch for testing (requires kernel version 5.11-rc1+):
> 
>     https://github.com/xzpeter/qemu/tree/kvm-dirty-ring
> 
> 
> 
> [1] https://static.sched.com/hosted_files/kvmforum2020/97/kvm_dirty_ring_peter.pdf
> 
> [2] https://lore.kernel.org/lkml/20210107190453.3051110-1-axelrasmussen@google.com/
> 
> 
> 
> ---------------------------8<---------------------------------
> 
> 
> 
> Overview
> 
> ========
> 
> 
> 
> KVM dirty ring is a new interface to pass over dirty bits from kernel
> 
> to the userspace.  Instead of using a bitmap for each memory region,
> 
> the dirty ring contains an array of dirtied GPAs to fetch, one ring
> 
> per vcpu.
> 
> 
> 
> There're a few major changes comparing to how the old dirty logging
> 
> interface would work:
> 
> 
> 
> - Granularity of dirty bits
> 
> 
> 
>   KVM dirty ring interface does not offer memory region level
> 
>   granularity to collect dirty bits (i.e., per KVM memory
> 
>   slot). Instead the dirty bit is collected globally for all the vcpus
> 
>   at once.  The major effect is on VGA part because VGA dirty tracking
> 
>   is enabled as long as the device is created, also it was in memory
> 
>   region granularity.  Now that operation will be amplified to a VM
> 
>   sync.  Maybe there's smarter way to do the same thing in VGA with
> 
>   the new interface, but so far I don't see it affects much at least
> 
>   on regular VMs.
> 
> 
> 
> - Collection of dirty bits
> 
> 
> 
>   The old dirty logging interface collects KVM dirty bits when
> 
>   synchronizing dirty bits.  KVM dirty ring interface instead used a
> 
>   standalone thread to do that.  So when the other thread (e.g., the
> 
>   migration thread) wants to synchronize the dirty bits, it simply
> 
>   kick the thread and wait until it flushes all the dirty bits to the
> 
>   ramblock dirty bitmap.
> 
> 
> 
> A new parameter "dirty-ring-size" is added to "-accel kvm".  By
> 
> default, dirty ring is still disabled (size==0).  To enable it, we
> 
> need to be with:
> 
> 
> 
>   -accel kvm,dirty-ring-size=65536
> 
> 
> 
> This establishes a 64K dirty ring buffer per vcpu.  Then if we
> 
> migrate, it'll switch to dirty ring.
> 
> 
> 
> I gave it a shot with a 24G guest, 8 vcpus, using 10g NIC as migration
> 
> channel.  When idle or dirty workload small, I don't observe major
> 
> difference on total migration time.  When with higher random dirty
> 
> workload (800MB/s dirty rate upon 20G memory, worse for kvm dirty
> 
> ring). Total migration time is (ping pong migrate for 6 times, in
> 
> seconds):
> 
> 
> 
> |-------------------------+---------------|
> 
> | dirty ring (4k entries) | dirty logging |
> 
> |-------------------------+---------------|
> 
> |                      70 |            58 |
> 
> |                      78 |            70 |
> 
> |                      72 |            48 |
> 
> |                      74 |            52 |
> 
> |                      83 |            49 |
> 
> |                      65 |            54 |
> 
> |-------------------------+---------------|
> 
> 
> 
> Summary:
> 
> 
> 
> dirty ring average:    73s
> 
> dirty logging average: 55s
> 
> 
> 
> The KVM dirty ring will be slower in above case.  The number may show
> 
> that the dirty logging is still preferred as a default value because
> 
> small/medium VMs are still major cases, and high dirty workload
> 
> happens frequently too.  And that's what this series did.
> 
> 
> 
> TODO:
> 
> 
> 
> - Consider to drop the BQL dependency: then we can run the reaper thread in
> 
>   parallel of main thread.  Needs some thought around the race conditions.
> 
> 
> 
> - Consider to drop the kvmslot bitmap: logically this can be dropped with kvm
> 
>   dirty ring, not only for space saving, but also it's still another layer
> 
>   linear to guest mem size which is against the whole idea of kvm dirty ring.
> 
>   This should make above number (of kvm dirty ring) even smaller (but still may
> 
>   not be as good as dirty logging when with such high workload).
> 
> 
> 
> Please refer to the code and comment itself for more information.
> 
> 
> 
> Thanks,
> 
> 
> 
> Peter Xu (10):
> 
>   memory: Introduce log_sync_global() to memory listener
> 
>   KVM: Use a big lock to replace per-kml slots_lock
> 
>   KVM: Create the KVMSlot dirty bitmap on flag changes
> 
>   KVM: Provide helper to get kvm dirty log
> 
>   KVM: Provide helper to sync dirty bitmap from slot to ramblock
> 
>   KVM: Simplify dirty log sync in kvm_set_phys_mem
> 
>   KVM: Cache kvm slot dirty bitmap size
> 
>   KVM: Add dirty-gfn-count property
> 
>   KVM: Disable manual dirty log when dirty ring enabled
> 
>   KVM: Dirty ring support
> 
> 
> 
>  accel/kvm/kvm-all.c      | 585 +++++++++++++++++++++++++++++++++------
> 
>  accel/kvm/trace-events   |   7 +
> 
>  include/exec/memory.h    |  12 +
> 
>  include/hw/core/cpu.h    |   8 +
> 
>  include/sysemu/kvm_int.h |   7 +-
> 
>  qemu-options.hx          |  12 +
> 
>  softmmu/memory.c         |  33 ++-
> 
>  7 files changed, 565 insertions(+), 99 deletions(-)
> 
> 
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 09/10] KVM: Disable manual dirty log when dirty ring enabled
  2021-03-22 13:55     ` Paolo Bonzini
@ 2021-03-22 16:21       ` Peter Xu
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Xu @ 2021-03-22 16:21 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Hyman, Keqian Zhu, qemu-devel, Dr . David Alan Gilbert

On Mon, Mar 22, 2021 at 02:55:44PM +0100, Paolo Bonzini wrote:
> On 22/03/21 10:17, Keqian Zhu wrote:
> > Hi Peter,
> > 
> > On 2021/3/11 4:33, Peter Xu wrote:
> > > KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is for KVM_CLEAR_DIRTY_LOG, which is only
> > > useful for KVM_GET_DIRTY_LOG.  Skip enabling it for kvm dirty ring.
> > > 
> > > More importantly, KVM_DIRTY_LOG_INITIALLY_SET will not wr-protect all the pages
> > > initially, which is against how kvm dirty ring is used - there's no way for kvm
> > > dirty ring to re-protect a page before it's notified as being written first
> > > with a GFN entry in the ring!  So when KVM_DIRTY_LOG_INITIALLY_SET is enabled
> > > with dirty ring, we'll see silent data loss after migration.
> > I feel a little regret that dirty ring can not work with KVM_DIRTY_LOG_INITIALLY_SET ...
> > With KVM_DIRTY_LOG_INITIALLY_SET, we can speedup dirty log start. More important, we can
> > enable dirty log gradually. For write fault based dirty log, it greatly reduces the side
> > effect of dirty log over guest.
> > 
> > I hope we can put forward another similar optimization under dirty ring mode. :)
> 
> Indeed, perhaps (even though KVM_GET_DIRTY_LOG does not make sense with
> dirty ring) we could allow KVM_CLEAR_DIRTY_LOG.

Right, KVM_CLEAR_DIRTY_LOG is a good interface to reuse so as to grant
userspace more flexibility to explicitly wr-protect some guest pages.  However,
that'll need kernel rework - obviously when I worked on the kernel part I
didn't notice this issue..
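
(For context, the existing ioctl already takes a slot, a first_page/num_pages
range and a bitmap of which pages to re-protect, so in principle userspace
could feed it exactly the pages it has just collected from the rings.  A
sketch of the call, using the uapi struct as-is - the wrapper itself is made
up for illustration:)

    /* Sketch: explicitly clear+wr-protect a range of pages in one memslot */
    static int clear_dirty_pages(int vm_fd, uint32_t slot, uint64_t first_page,
                                 uint32_t num_pages, void *bitmap)
    {
        struct kvm_clear_dirty_log d = {
            .slot = slot,
            .first_page = first_page,
            .num_pages = num_pages,
            .dirty_bitmap = bitmap,  /* 1 bit per page in the range */
        };

        return ioctl(vm_fd, KVM_CLEAR_DIRTY_LOG, &d);
    }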

To make it complete, IMHO we'll also need QEMU to drop the whole dirty bitmap
in all the layers; then I'd expect the dirty ring idea as a whole to start
making more of a difference on huge VMs.  That all just needs some more work,
which should be based on this series.

Shall we make it a "TODO" though?  E.g., I can add a comment here mentioning
this issue.  I still hope we can have the QEMU series land soon first, since
fully completing it will be a bigger project.  Paolo?

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 02/10] KVM: Use a big lock to replace per-kml slots_lock
  2021-03-22 13:54     ` Paolo Bonzini
@ 2021-03-22 16:27       ` Peter Xu
  2021-03-24 18:08         ` Peter Xu
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Xu @ 2021-03-22 16:27 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Hyman, Keqian Zhu, qemu-devel, Dr . David Alan Gilbert

[-- Attachment #1: Type: text/plain, Size: 473 bytes --]

On Mon, Mar 22, 2021 at 02:54:30PM +0100, Paolo Bonzini wrote:
> On 22/03/21 11:47, Keqian Zhu wrote:
> > > +    qemu_mutex_init(&kml_slots_lock);
> > As you said, x86 has two address spaces, is it a problem that we may have multi initialization for kml_slots_lock?
> 
> Agreed, the lock can be initialized (if only for cleanliness) in kvm_init.

Definitely - I'm surprised I didn't see this... :) Paolo, do you want me to
add another patch (as attached)?

-- 
Peter Xu

[-- Attachment #2: 0001-qemu-thread-Assert-and-check-mutex-re-initialize.patch --]
[-- Type: text/plain, Size: 987 bytes --]

From 0cb7124d111426f255113814cb8395620276cc1f Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Mon, 22 Mar 2021 12:25:18 -0400
Subject: [PATCH] qemu-thread: Assert and check mutex re-initialize

qemu_mutex_post_init() sets mutex->initialized but does not check it yet.  Add
a check so as to detect re-initialization of mutexes.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 util/qemu-thread-common.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/util/qemu-thread-common.h b/util/qemu-thread-common.h
index 2af6b120853..e02059845d8 100644
--- a/util/qemu-thread-common.h
+++ b/util/qemu-thread-common.h
@@ -15,6 +15,7 @@
 
 #include "qemu/thread.h"
 #include "trace.h"
+#include <assert.h>
 
 static inline void qemu_mutex_post_init(QemuMutex *mutex)
 {
@@ -22,6 +23,7 @@ static inline void qemu_mutex_post_init(QemuMutex *mutex)
     mutex->file = NULL;
     mutex->line = 0;
 #endif
+    assert(!mutex->initialized);
     mutex->initialized = true;
 }
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 10/10] KVM: Dirty ring support
  2021-03-22 13:37   ` Keqian Zhu
@ 2021-03-22 18:52     ` Peter Xu
  2021-03-23  1:25       ` Keqian Zhu
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Xu @ 2021-03-22 18:52 UTC (permalink / raw)
  To: Keqian Zhu; +Cc: Paolo Bonzini, Hyman, qemu-devel, Dr . David Alan Gilbert

On Mon, Mar 22, 2021 at 09:37:19PM +0800, Keqian Zhu wrote:
> > +/* Should be with all slots_lock held for the address spaces. */
> > +static void kvm_dirty_ring_mark_page(KVMState *s, uint32_t as_id,
> > +                                     uint32_t slot_id, uint64_t offset)
> > +{
> > +    KVMMemoryListener *kml;
> > +    KVMSlot *mem;
> > +
> > +    if (as_id >= s->nr_as) {
> > +        return;
> > +    }
> > +
> > +    kml = s->as[as_id].ml;
> > +    mem = &kml->slots[slot_id];
> > +
> > +    if (!mem->memory_size || offset >= (mem->memory_size / TARGET_PAGE_SIZE)) {
> It seems that TARGET_PAGE_SIZE should be qemu_real_host_page_size.

Fixed.
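
(Presumably the check then becomes something like
"offset >= mem->memory_size / qemu_real_host_page_size" in the next revision.)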

[...]

> > +/*
> > + * Flush all the existing dirty pages to the KVM slot buffers.  When
> > + * this call returns, we guarantee that all the touched dirty pages
> > + * before calling this function have been put into the per-kvmslot
> > + * dirty bitmap.
> > + *
> > + * This function must be called with BQL held.
> > + */
> > +static void kvm_dirty_ring_flush(struct KVMDirtyRingReaper *r)
> The argument is not used.

Indeed, removed.

> 
> > +{
> > +    trace_kvm_dirty_ring_flush(0);
> > +    /*
> > +     * The function needs to be serialized.  Since this function
> > +     * should always be with BQL held, serialization is guaranteed.
> > +     * However, let's be sure of it.
> > +     */
> > +    assert(qemu_mutex_iothread_locked());
> > +    /*
> > +     * First make sure to flush the hardware buffers by kicking all
> > +     * vcpus out in a synchronous way.
> > +     */
> > +    kvm_cpu_synchronize_kick_all();
> Can we make this function to be architecture specific?
> It seems that kick out vCPU is an architecture specific way to flush hardware buffers
> to dirty ring (for x86 PML).

I can do that, but I'd say it's kind of overkill given that the kernel support
is not there yet, so I still tend to keep it as simple as possible.

[...]

> > +static void *kvm_dirty_ring_reaper_thread(void *data)
> > +{
> > +    KVMState *s = data;
> > +    struct KVMDirtyRingReaper *r = &s->reaper;
> > +
> > +    rcu_register_thread();
> > +
> > +    trace_kvm_dirty_ring_reaper("init");
> > +
> > +    while (true) {
> > +        r->reaper_state = KVM_DIRTY_RING_REAPER_WAIT;
> > +        trace_kvm_dirty_ring_reaper("wait");
> > +        /*
> > +         * TODO: provide a smarter timeout rather than a constant?
> > +         */
> > +        sleep(1);
> > +
> > +        trace_kvm_dirty_ring_reaper("wakeup");
> > +        r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
> > +
> > +        qemu_mutex_lock_iothread();
> > +        kvm_dirty_ring_reap(s);
> > +        qemu_mutex_unlock_iothread();
> > +
> > +        r->reaper_iteration++;
> > +    }
> I don't know when does this iteration exit?
> And I see that we start this reaper_thread in kvm_init(), maybe it's better to start it
> when start dirty log and stop it when stop dirty log.

Yes, we can make it conditional, but note that we can't hook into functions
like memory_global_dirty_log_start() because that is only for migration
purposes.

Currently QEMU exposes dirty tracking for more than that, e.g., to the VGA
code.  We'd need to detect whether any existing MR has its mr->dirty_log_mask
set (besides global_dirty_log being set), and also detect when all of them get
cleared, so as to turn the thread off.

It's just easier for me to run this thread with such a timeout; when it's not
necessary it'll see empty rings and return fast (an index comparison for each
ring).  Not to mention that VGA dirty tracking should be on for most of the VM
lifecycle, so even if we had that knob this thread would probably be running
99% of the time, as long as any MR has its DIRTY_MEMORY_VGA bit set.
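
(For reference, that fast path is essentially the first iteration of
kvm_dirty_ring_reap_one() in the patch; a minimal sketch of the check, reusing
the fields added by this series - the helper itself is not part of the
patches:)

    /* Sketch: true if this vcpu's ring has nothing to reap right now */
    static bool kvm_dirty_ring_empty(KVMState *s, CPUState *cpu)
    {
        struct kvm_dirty_gfn *gfns = cpu->kvm_dirty_gfns;
        uint32_t gfn_count = s->kvm_dirty_gfn_count;
        struct kvm_dirty_gfn *cur = &gfns[cpu->kvm_fetch_index % gfn_count];

        /* KVM has not published a new entry at the fetch index yet */
        return cur->flags != KVM_DIRTY_GFN_F_DIRTY;
    }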

[...]

> > diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> > index c68bc3ba8af..2f0991d93f7 100644
> > --- a/include/hw/core/cpu.h
> > +++ b/include/hw/core/cpu.h
> > @@ -323,6 +323,11 @@ struct qemu_work_item;
> >   * @ignore_memory_transaction_failures: Cached copy of the MachineState
> >   *    flag of the same name: allows the board to suppress calling of the
> >   *    CPU do_transaction_failed hook function.
> > + * @kvm_dirty_ring_full:
> > + *   Whether the kvm dirty ring of this vcpu is soft-full.
> > + * @kvm_dirty_ring_avail:
> > + *   Semaphore to be posted when the kvm dirty ring of the vcpu is
> > + *   available again.
> The doc does not match code.

Right; fixed.

Thanks for taking a look, Keqian.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 00/10] KVM: Dirty ring support (QEMU part)
  2021-03-22 14:02 ` Keqian Zhu
@ 2021-03-22 19:45   ` Peter Xu
  2021-03-23  6:40     ` Keqian Zhu
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Xu @ 2021-03-22 19:45 UTC (permalink / raw)
  To: Keqian Zhu; +Cc: Paolo Bonzini, Hyman, qemu-devel, Dr . David Alan Gilbert

On Mon, Mar 22, 2021 at 10:02:38PM +0800, Keqian Zhu wrote:
> Hi Peter,

Hi, Keqian,

[...]

> You emphasize that dirty ring is a "Thread-local buffers", but dirty bitmap is global,
> but I don't see it has optimization about "locking" compared to dirty bitmap.
> 
> The thread-local means that vCPU can flush hardware buffer into dirty ring without
> locking, but for bitmap, vCPU can also use atomic set to mark dirty without locking.
> Maybe I miss something?

Yes, the atomic ops avoid the need for explicit locking as you said, but afaiu
atomics are already expensive, since at least on x86 I think they need to lock
the memory bus.  IIUC that'll become even slower as core counts grow, as long
as the cores share the memory bus.

The KVM dirty ring is per-vcpu, which means its metadata can be modified locally
without any atomics at all (though we still need READ_ONCE/WRITE_ONCE to
guarantee ordering of the memory accesses).  It should scale better, especially
on hosts with lots of cores.
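
As a purely illustrative sketch of that difference (this is not the KVM or QEMU
implementation -- the entry layout, names and ring size are invented for the
comparison):

  #include <stdint.h>

  /* Global bitmap: every vcpu does a locked read-modify-write on shared memory. */
  static void bitmap_mark_dirty(uint64_t *bitmap, uint64_t gfn)
  {
      __atomic_fetch_or(&bitmap[gfn / 64], 1ULL << (gfn % 64), __ATOMIC_SEQ_CST);
  }

  /* Per-vcpu ring: plain stores into vcpu-local memory; only ordering is
   * needed so the consumer sees a complete entry before the updated tail. */
  struct dirty_entry { uint32_t slot; uint64_t offset; };

  struct vcpu_ring {
      struct dirty_entry entries[4096];
      uint32_t tail;                    /* written only by this vcpu */
  };

  static void ring_mark_dirty(struct vcpu_ring *r, uint32_t slot, uint64_t offset)
  {
      uint32_t idx = r->tail % 4096;

      r->entries[idx].slot = slot;
      r->entries[idx].offset = offset;
      /* Release store: publish the entry before the new tail becomes visible. */
      __atomic_store_n(&r->tail, r->tail + 1, __ATOMIC_RELEASE);
  }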

> 
> The second question is that you observed longer migration time (55s->73s) when guest
> has 24G ram and dirty rate is 800M/s. I am not clear about the reason. As with dirty
> ring enabled, Qemu can get dirty info faster which means it handles dirty page more
> quick, and guest can be throttled which means dirty page is generated slower. What's
> the rationale for the longer migration time?

Because the dirty ring is more sensitive to the dirty rate, while the dirty
bitmap is more sensitive to the memory footprint.  In the above 24G mem +
800MB/s dirty rate condition, the dirty bitmap seems to be more efficient; say,
collecting the dirty bitmap of 24G of mem (24G/4K/8=0.75MB) for each migration
cycle is fast enough.

Not to mention that the current implementation of the dirty ring in QEMU is not
complete - we still have two more layers of dirty bitmap, so it's actually a
mixture of dirty bitmap and dirty ring.  This series is more like a POC of the
dirty ring interface, so as to let QEMU be able to run on the KVM dirty ring.
E.g., we won't have the hang issue when getting dirty pages since it's totally
async, however we'll still have some legacy dirty bitmap issues, e.g. the memory
consumption of the userspace dirty bitmaps is still linear in the memory footprint.

Moreover, IMHO another important feature that the dirty ring provides is actually
the full-exit, where we can pause a vcpu when it dirties too fast, while other
vcpus won't be affected.  That's something I really wanted to POC too but I
don't have enough time.  I think it's a worthwhile project in the future to really
make the full-exit throttle vcpus; then ideally we'll remove all the dirty
bitmaps in QEMU as long as the dirty ring is on.

So I'd say the numbers I got at that time aren't really telling us much - as you
can see, for small VMs it won't make things faster.  Maybe a bit more efficient?
I can't tell.  Design-wise it actually still looks better.  However dirty
logging still has good reason to remain the default interface we use for small
VMs, imho.

> 
> PS: As the dirty ring is still converted into dirty_bitmap of kvm_slot, so the
> "get dirty info faster" maybe not true. :-(

We can get dirty info faster even now, I think, because previously we only did
KVM_GET_DIRTY_LOG once per migration iteration, which could be tens of seconds
for the VM mentioned above with 24G and an 800MB/s dirty rate.  The dirty ring
is fully async; we'll get that info after the reaper thread timeout.  However I
must also confess that "get dirty info faster" doesn't help us a lot with
anything yet, afaict, compared to full-featured dirty logging with clear-dirty-log
and so on.

Hope above helps.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 10/10] KVM: Dirty ring support
  2021-03-22 18:52     ` Peter Xu
@ 2021-03-23  1:25       ` Keqian Zhu
  0 siblings, 0 replies; 29+ messages in thread
From: Keqian Zhu @ 2021-03-23  1:25 UTC (permalink / raw)
  To: Peter Xu; +Cc: Paolo Bonzini, Hyman, qemu-devel, Dr . David Alan Gilbert



On 2021/3/23 2:52, Peter Xu wrote:
> On Mon, Mar 22, 2021 at 09:37:19PM +0800, Keqian Zhu wrote:
>>> +/* Should be with all slots_lock held for the address spaces. */
>>> +static void kvm_dirty_ring_mark_page(KVMState *s, uint32_t as_id,
>>> +                                     uint32_t slot_id, uint64_t offset)
>>> +{
>>> +    KVMMemoryListener *kml;
>>> +    KVMSlot *mem;
>>> +
>>> +    if (as_id >= s->nr_as) {
>>> +        return;
>>> +    }
>>> +
>>> +    kml = s->as[as_id].ml;
>>> +    mem = &kml->slots[slot_id];
>>> +
>>> +    if (!mem->memory_size || offset >= (mem->memory_size / TARGET_PAGE_SIZE)) {
>> It seems that TARGET_PAGE_SIZE should be qemu_real_host_page_size.
> 
> Fixed.
> 
> [...]
> 
>>> +/*
>>> + * Flush all the existing dirty pages to the KVM slot buffers.  When
>>> + * this call returns, we guarantee that all the touched dirty pages
>>> + * before calling this function have been put into the per-kvmslot
>>> + * dirty bitmap.
>>> + *
>>> + * This function must be called with BQL held.
>>> + */
>>> +static void kvm_dirty_ring_flush(struct KVMDirtyRingReaper *r)
>> The argument is not used.
> 
> Indeed, removed.
> 
>>
>>> +{
>>> +    trace_kvm_dirty_ring_flush(0);
>>> +    /*
>>> +     * The function needs to be serialized.  Since this function
>>> +     * should always be with BQL held, serialization is guaranteed.
>>> +     * However, let's be sure of it.
>>> +     */
>>> +    assert(qemu_mutex_iothread_locked());
>>> +    /*
>>> +     * First make sure to flush the hardware buffers by kicking all
>>> +     * vcpus out in a synchronous way.
>>> +     */
>>> +    kvm_cpu_synchronize_kick_all();
>> Can we make this function to be architecture specific?
>> It seems that kick out vCPU is an architecture specific way to flush hardware buffers
>> to dirty ring (for x86 PML).
> 
> I can do that, but I'd say it's kind of an overkill if after all the kernel
> support is not there yet, so I still tend to make it as simple as possible.
OK.

> 
> [...]
> 
>>> +static void *kvm_dirty_ring_reaper_thread(void *data)
>>> +{
>>> +    KVMState *s = data;
>>> +    struct KVMDirtyRingReaper *r = &s->reaper;
>>> +
>>> +    rcu_register_thread();
>>> +
>>> +    trace_kvm_dirty_ring_reaper("init");
>>> +
>>> +    while (true) {
>>> +        r->reaper_state = KVM_DIRTY_RING_REAPER_WAIT;
>>> +        trace_kvm_dirty_ring_reaper("wait");
>>> +        /*
>>> +         * TODO: provide a smarter timeout rather than a constant?
>>> +         */
>>> +        sleep(1);
>>> +
>>> +        trace_kvm_dirty_ring_reaper("wakeup");
>>> +        r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
>>> +
>>> +        qemu_mutex_lock_iothread();
>>> +        kvm_dirty_ring_reap(s);
>>> +        qemu_mutex_unlock_iothread();
>>> +
>>> +        r->reaper_iteration++;
>>> +    }
>> I don't know when does this iteration exit?
>> And I see that we start this reaper_thread in kvm_init(), maybe it's better to start it
>> when start dirty log and stop it when stop dirty log.
> 
> Yes we can make it conditional, but note that we can't hook at functions like
> memory_global_dirty_log_start() because that is only for migration purpose.
> 
> Currently QEMU exports the dirty tracking more than that, e.g., to the VGA
> code.  We'll need to try to detect whether there's any existing MR got its
> mr->dirty_log_mask set (besides global_dirty_log being set).  When all of them
> got cleared we'll need to detect too so as to turn the thread off.
> 
> It's just easier to me to run this thread with such a timeout, then when not
> necessary it'll see empty ring and return fast (index comparison for each
> ring).  Not to mention the VGA dirty tracking should be on for most of the VM
> lifecycle, so even if we have that knob this thread will probably be running
> for 99% of the time as long as any MR has its DIRTY_MEMORY_VGA bit set.
Makes sense. Thanks for your explanation!

Thanks,
Keqian
> 
> [...]
> 
>>> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
>>> index c68bc3ba8af..2f0991d93f7 100644
>>> --- a/include/hw/core/cpu.h
>>> +++ b/include/hw/core/cpu.h
>>> @@ -323,6 +323,11 @@ struct qemu_work_item;
>>>   * @ignore_memory_transaction_failures: Cached copy of the MachineState
>>>   *    flag of the same name: allows the board to suppress calling of the
>>>   *    CPU do_transaction_failed hook function.
>>> + * @kvm_dirty_ring_full:
>>> + *   Whether the kvm dirty ring of this vcpu is soft-full.
>>> + * @kvm_dirty_ring_avail:
>>> + *   Semaphore to be posted when the kvm dirty ring of the vcpu is
>>> + *   available again.
>> The doc does not match code.
> 
> Right; fixed.
> 
> Thanks for taking a look, keqian.
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 00/10] KVM: Dirty ring support (QEMU part)
  2021-03-22 19:45   ` Peter Xu
@ 2021-03-23  6:40     ` Keqian Zhu
  2021-03-23 14:34       ` Peter Xu
  0 siblings, 1 reply; 29+ messages in thread
From: Keqian Zhu @ 2021-03-23  6:40 UTC (permalink / raw)
  To: Peter Xu; +Cc: Paolo Bonzini, Hyman, qemu-devel, Dr . David Alan Gilbert

Hi Peter,

On 2021/3/23 3:45, Peter Xu wrote:
> On Mon, Mar 22, 2021 at 10:02:38PM +0800, Keqian Zhu wrote:
>> Hi Peter,
> 
> Hi, Keqian,
> 
> [...]
> 
>> You emphasize that dirty ring is a "Thread-local buffers", but dirty bitmap is global,
>> but I don't see it has optimization about "locking" compared to dirty bitmap.
>>
>> The thread-local means that vCPU can flush hardware buffer into dirty ring without
>> locking, but for bitmap, vCPU can also use atomic set to mark dirty without locking.
>> Maybe I miss something?
> 
> Yes, the atomic ops guaranteed locking as you said, but afaiu atomics are
> expensive already, since at least on x86 I think it needs to lock the memory
> bus.  IIUC that'll become even slower as cores grow, as long as the cores share
> the memory bus.
> 
> KVM dirty ring is per-vcpu, it means its metadata can be modified locally
> without atomicity at all (but still, we'll need READ_ONCE/WRITE_ONCE to
> guarantee ordering of memory accesses).  It should scale better especially with
> hosts who have lots of cores.
That makes sense to me.

> 
>>
>> The second question is that you observed longer migration time (55s->73s) when guest
>> has 24G ram and dirty rate is 800M/s. I am not clear about the reason. As with dirty
>> ring enabled, Qemu can get dirty info faster which means it handles dirty page more
>> quick, and guest can be throttled which means dirty page is generated slower. What's
>> the rationale for the longer migration time?
> 
> Because dirty ring is more sensitive to dirty rate, while dirty bitmap is more
Emm... Sorry, I'm not very clear about this... I think a higher dirty rate doesn't cause
slower dirty_log_sync compared to that of legacy bitmap mode. Besides, a higher dirty rate
means we may have more full-exits, which can properly limit the dirty rate. So it seems that
the dirty ring "prefers" a higher dirty rate.

> sensitive to memory footprint.  In above 24G mem + 800MB/s dirty rate
> condition, dirty bitmap seems to be more efficient, say, collecting dirty
> bitmap of 24G mem (24G/4K/8=0.75MB) for each migration cycle is fast enough.
> 
> Not to mention that current implementation of dirty ring in QEMU is not
> complete - we still have two more layers of dirty bitmap, so it's actually a
> mixture of dirty bitmap and dirty ring.  This series is more like a POC on
> dirty ring interface, so as to let QEMU be able to run on KVM dirty ring.
> E.g., we won't have hang issue when getting dirty pages since it's totally
> async, however we'll still have some legacy dirty bitmap issues e.g. memory
> consumption of userspace dirty bitmaps are still linear to memory footprint.
The plan looks good and coordinated, but I have a concern. Our dirty ring actually depends
on the structure of the hardware logging buffer (the PML buffer). We can't say it can be
properly adapted to all kinds of hardware designs in the future.

> 
> Moreover, IMHO another important feature that dirty ring provided is actually
> the full-exit, where we can pause a vcpu when it dirties too fast, while other
I think a proper pause time is hard to decide. A short time may have little throttling effect,
but a long time may have a heavy effect on the guest. Do you have a good algorithm?


> vcpus won't be affected.  That's something I really wanted to POC too but I
> don't have enough time.  I think it's a worth project in the future to really
> make the full-exit throttle vcpus, then ideally we'll remove all the dirty
> bitmaps in QEMU as long as dirty ring is on.
> 
> So I'd say the number I got at that time is not really helping a lot - as you
> can see for small VMs it won't make things faster.  Maybe a bit more efficient?
> I can't tell.  From design-wise it looks actually still better.  However dirty
> logging still has the reasoning to be the default interface we use for small
> vms, imho.
I see.

> 
>>
>> PS: As the dirty ring is still converted into dirty_bitmap of kvm_slot, so the
>> "get dirty info faster" maybe not true. :-(
> 
> We can get dirty info faster even now, I think, because previously we only do
> KVM_GET_DIRTY_LOG once per migration iteration, which could be tens of seconds
> for a VM mentioned above with 24G and 800MB/s dirty rate.  Dirty ring is fully
> async, we'll get that after the reaper thread timeout.  However I must also
> confess "get dirty info faster" doesn't help us a lot on anything yet, afaict,
> comparing to a full-featured dirty logging where clear dirty log and so on.
OK.

> 
> Hope above helps.
Sure, thanks. :)


Keqian


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 00/10] KVM: Dirty ring support (QEMU part)
  2021-03-23  6:40     ` Keqian Zhu
@ 2021-03-23 14:34       ` Peter Xu
  2021-03-24  2:56         ` Keqian Zhu
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Xu @ 2021-03-23 14:34 UTC (permalink / raw)
  To: Keqian Zhu; +Cc: Paolo Bonzini, Hyman, qemu-devel, Dr . David Alan Gilbert

Keqian,

On Tue, Mar 23, 2021 at 02:40:43PM +0800, Keqian Zhu wrote:
> >> The second question is that you observed longer migration time (55s->73s) when guest
> >> has 24G ram and dirty rate is 800M/s. I am not clear about the reason. As with dirty
> >> ring enabled, Qemu can get dirty info faster which means it handles dirty page more
> >> quick, and guest can be throttled which means dirty page is generated slower. What's
> >> the rationale for the longer migration time?
> > 
> > Because dirty ring is more sensitive to dirty rate, while dirty bitmap is more
> Emm... Sorry that I'm very clear about this... I think that higher dirty rate doesn't cause
> slower dirty_log_sync compared to that of legacy bitmap mode. Besides, higher dirty rate
> means we may have more full-exit, which can properly limit the dirty rate. So it seems that
> dirty ring "prefers" higher dirty rate.

When I measured the 800MB/s, it was in the guest, after throttling.

Imagine another example: a VM with 1G of memory that keeps dirtying it at 10GB/s.  Dirty
logging will need to collect even less for each iteration because the memory size has
shrunk, and collect even less frequently due to the high dirty rate, while the dirty
ring will use 100% cpu power to collect dirty pages because the ring keeps filling up.

> 
> > sensitive to memory footprint.  In above 24G mem + 800MB/s dirty rate
> > condition, dirty bitmap seems to be more efficient, say, collecting dirty
> > bitmap of 24G mem (24G/4K/8=0.75MB) for each migration cycle is fast enough.
> > 
> > Not to mention that current implementation of dirty ring in QEMU is not
> > complete - we still have two more layers of dirty bitmap, so it's actually a
> > mixture of dirty bitmap and dirty ring.  This series is more like a POC on
> > dirty ring interface, so as to let QEMU be able to run on KVM dirty ring.
> > E.g., we won't have hang issue when getting dirty pages since it's totally
> > async, however we'll still have some legacy dirty bitmap issues e.g. memory
> > consumption of userspace dirty bitmaps are still linear to memory footprint.
> The plan looks good and coordinated, but I have a concern. Our dirty ring actually depends
> on the structure of hardware logging buffer (PML buffer). We can't say it can be properly
> adapted to all kinds of hardware design in the future.

Sorry I don't get it - dirty ring can work with pure page wr-protect too?

> 
> > 
> > Moreover, IMHO another important feature that dirty ring provided is actually
> > the full-exit, where we can pause a vcpu when it dirties too fast, while other
> I think a proper pause time is hard to decide. Short time may have little effect
> of throttle, but long time may have heavy effect on guest. Do you have a good algorithm?

That's the next thing we can discuss.  IMHO the dirty ring is nice already
because we can measure the dirty rate per-vcpu, and we can throttle at vcpu
granularity.  That's something required for a good algorithm: say, we shouldn't
block a vcpu when its dirty rate is small, and in many cases that's true for
e.g. UI threads.  Any algorithm should be based on these facts.
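
Just to illustrate the kind of policy this enables (nothing like this exists in
the series; the threshold, page size and helper name below are invented for the
example), a per-vcpu decision could be as simple as:

  #include <stdbool.h>
  #include <stdint.h>

  #define PAGE_SIZE_BYTES   4096ULL
  #define THROTTLE_MBPS      200ULL   /* made-up threshold, purely illustrative */

  /*
   * Decide whether to throttle one vcpu based on its own dirty rate,
   * measured from the entries it pushed into its dirty ring during the
   * last sampling interval.
   */
  static bool vcpu_should_throttle(uint64_t pages_dirtied, uint64_t interval_ms)
  {
      uint64_t mbps;

      if (!interval_ms) {
          return false;
      }

      mbps = pages_dirtied * PAGE_SIZE_BYTES * 1000 / (interval_ms * 1024 * 1024);

      /* Low-rate vcpus (e.g. ones running UI threads) are left alone. */
      return mbps > THROTTLE_MBPS;
  }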

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 00/10] KVM: Dirty ring support (QEMU part)
  2021-03-23 14:34       ` Peter Xu
@ 2021-03-24  2:56         ` Keqian Zhu
  2021-03-24 15:09           ` Peter Xu
  0 siblings, 1 reply; 29+ messages in thread
From: Keqian Zhu @ 2021-03-24  2:56 UTC (permalink / raw)
  To: Peter Xu; +Cc: Paolo Bonzini, Hyman, qemu-devel, Dr . David Alan Gilbert

Hi Peter,

On 2021/3/23 22:34, Peter Xu wrote:
> Keqian,
> 
> On Tue, Mar 23, 2021 at 02:40:43PM +0800, Keqian Zhu wrote:
>>>> The second question is that you observed longer migration time (55s->73s) when guest
>>>> has 24G ram and dirty rate is 800M/s. I am not clear about the reason. As with dirty
>>>> ring enabled, Qemu can get dirty info faster which means it handles dirty page more
>>>> quick, and guest can be throttled which means dirty page is generated slower. What's
>>>> the rationale for the longer migration time?
>>>
>>> Because dirty ring is more sensitive to dirty rate, while dirty bitmap is more
>> Emm... Sorry that I'm very clear about this... I think that higher dirty rate doesn't cause
>> slower dirty_log_sync compared to that of legacy bitmap mode. Besides, higher dirty rate
>> means we may have more full-exit, which can properly limit the dirty rate. So it seems that
>> dirty ring "prefers" higher dirty rate.
> 
> When I measured the 800MB/s it's in the guest, after throttling.
> 
> Imagine another example: a VM has 1G memory keep dirtying with 10GB/s.  Dirty
> logging will need to collect even less for each iteration because memory size
> shrinked, collect even less frequent due to the high dirty rate, however dirty
> ring will use 100% cpu power to collect dirty pages because the ring keeps full.
Looks good.

We have many places to collect dirty pages: the background reaper, vCPU exit handler,
and the migration thread. I think migration time is closely related to the migration thread.

The migration thread calls kvm_dirty_ring_flush().
1. kvm_cpu_synchronize_kick_all() waits for the vcpus to handle the full-exit.
2. kvm_dirty_ring_reap() collects and resets the dirty pages.
The above two operations will take more time with a higher dirty rate.

But I suddenly realize that the key problem may not be here. Though we have a separate
"reset" operation for the dirty ring, it is actually performed right after we collect the
dirty ring into the kvmslot. So dirty ring mode is like legacy bitmap mode without
manual_dirty_clear.

If we can "reset" dirty ring just before we really handle the dirty pages, we can have
shorter migration time. But the design of dirty ring doesn't allow this, because we must
perform reset to make free space...

> 
>>
>>> sensitive to memory footprint.  In above 24G mem + 800MB/s dirty rate
>>> condition, dirty bitmap seems to be more efficient, say, collecting dirty
>>> bitmap of 24G mem (24G/4K/8=0.75MB) for each migration cycle is fast enough.
>>>
>>> Not to mention that current implementation of dirty ring in QEMU is not
>>> complete - we still have two more layers of dirty bitmap, so it's actually a
>>> mixture of dirty bitmap and dirty ring.  This series is more like a POC on
>>> dirty ring interface, so as to let QEMU be able to run on KVM dirty ring.
>>> E.g., we won't have hang issue when getting dirty pages since it's totally
>>> async, however we'll still have some legacy dirty bitmap issues e.g. memory
>>> consumption of userspace dirty bitmaps are still linear to memory footprint.
>> The plan looks good and coordinated, but I have a concern. Our dirty ring actually depends
>> on the structure of hardware logging buffer (PML buffer). We can't say it can be properly
>> adapted to all kinds of hardware design in the future.
> 
> Sorry I don't get it - dirty ring can work with pure page wr-protect too?
Sure, it can. I just wanted to bring up the many possible kinds of hardware logging buffers.
However, I'd like to stop here; at least the dirty ring works well with PML. :)

> 
>>
>>>
>>> Moreover, IMHO another important feature that dirty ring provided is actually
>>> the full-exit, where we can pause a vcpu when it dirties too fast, while other
>> I think a proper pause time is hard to decide. Short time may have little effect
>> of throttle, but long time may have heavy effect on guest. Do you have a good algorithm?
> 
> That's the next thing we can discuss.  IMHO I think the dirty ring is nice
> already because we can measure dirty rate per-vcpu, also we can throttle in
> vcpu granule.  That's something required for a good algorithm, say we shouldn't
> block vcpu when there's small dirty rate, and in many cases that's the case for
> e.g. UI threads.  Any algorithm should be based on these facts.
OK.

Thanks,
Keqian


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 00/10] KVM: Dirty ring support (QEMU part)
  2021-03-24  2:56         ` Keqian Zhu
@ 2021-03-24 15:09           ` Peter Xu
  2021-03-25  1:21             ` Keqian Zhu
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Xu @ 2021-03-24 15:09 UTC (permalink / raw)
  To: Keqian Zhu; +Cc: Paolo Bonzini, Hyman, qemu-devel, Dr . David Alan Gilbert

On Wed, Mar 24, 2021 at 10:56:22AM +0800, Keqian Zhu wrote:
> Hi Peter,
> 
> On 2021/3/23 22:34, Peter Xu wrote:
> > Keqian,
> > 
> > On Tue, Mar 23, 2021 at 02:40:43PM +0800, Keqian Zhu wrote:
> >>>> The second question is that you observed longer migration time (55s->73s) when guest
> >>>> has 24G ram and dirty rate is 800M/s. I am not clear about the reason. As with dirty
> >>>> ring enabled, Qemu can get dirty info faster which means it handles dirty page more
> >>>> quick, and guest can be throttled which means dirty page is generated slower. What's
> >>>> the rationale for the longer migration time?
> >>>
> >>> Because dirty ring is more sensitive to dirty rate, while dirty bitmap is more
> >> Emm... Sorry that I'm very clear about this... I think that higher dirty rate doesn't cause
> >> slower dirty_log_sync compared to that of legacy bitmap mode. Besides, higher dirty rate
> >> means we may have more full-exit, which can properly limit the dirty rate. So it seems that
> >> dirty ring "prefers" higher dirty rate.
> > 
> > When I measured the 800MB/s it's in the guest, after throttling.
> > 
> > Imagine another example: a VM has 1G memory keep dirtying with 10GB/s.  Dirty
> > logging will need to collect even less for each iteration because memory size
> > shrinked, collect even less frequent due to the high dirty rate, however dirty
> > ring will use 100% cpu power to collect dirty pages because the ring keeps full.
> Looks good.
> 
> We have many places to collect dirty pages: the background reaper, vCPU exit handler,
> and the migration thread. I think migration time is closely related to the migration thread.
> 
> The migration thread calls kvm_dirty_ring_flush().
> 1. kvm_cpu_synchronize_kick_all() will wait vcpu handles full-exit.
> 2. kvm_dirty_ring_reap() collects and resets dirty pages.
> The above two operation will spend more time with higher dirty rate.
> 
> But I suddenly realize that the key problem maybe not at this. Though we have separate
> "reset" operation for dirty ring, actually it is performed right after we collect dirty
> ring to kvmslot. So in dirty ring mode, it likes legacy bitmap mode without manual_dirty_clear.
> 
> If we can "reset" dirty ring just before we really handle the dirty pages, we can have
> shorter migration time. But the design of dirty ring doesn't allow this, because we must
> perform reset to make free space...

This is a very good point.

The dirty ring should already be better in quite a few ways, but from that
POV, as you said, it goes a bit backwards on reprotection of pages (not to
mention that currently we can't even reset the rings per-vcpu; that also
doesn't fully match the locality that the rings provide; Paolo and I discussed
that issue, it's about the expensiveness of TLB flushes, so we still need to
think more about it..).

Ideally the ring could have been both per-vcpu and bi-directional (then we'd
have 2*N rings, N=vcpu number), so as to split the state transition into a
"dirty ring" and a "reprotect ring"; that reprotect ring would then be the
clear-dirty-log equivalent.  That would look more like virtio's used ring.
However we'd still need to think about the TLB flush issue too, as Paolo
mentioned before, since it exists with any per-vcpu flush model (each
reprotect of a page needs a TLB flush on all vcpus).

Or.. maybe we can make the flush ring a standalone one, so we'd need N dirty rings
and one global flush ring.
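
Purely as a strawman for the two layouts above (none of this exists in KVM; the
struct names and fields are invented, only struct kvm_dirty_gfn is the real
uapi type):

  #include <linux/kvm.h>

  /* Option 1: bi-directional, 2*N rings for N vcpus. */
  struct vcpu_ring_pair {
      struct kvm_dirty_gfn *dirty;      /* vcpu -> userspace: newly dirtied gfns */
      struct kvm_dirty_gfn *reprotect;  /* userspace -> vcpu: gfns to write-protect again */
  };

  /* Option 2: N per-vcpu dirty rings plus one global flush (reprotect) ring. */
  struct dirty_ring_layout {
      struct kvm_dirty_gfn **vcpu_dirty;   /* one dirty ring per vcpu */
      struct kvm_dirty_gfn *global_flush;  /* single shared reprotect ring */
  };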

Anyway.. Before that, I still think the next step should be figuring out how to
make QEMU fully leverage the current ring model, so as to be able to throttle
in a per-vcpu fashion.

The major issues (IMHO) with huge VM migration are:

  1. Convergence
  2. Responsiveness

Here we'll have a chance to solve (1) by heavily throttling the working vcpu
threads, while still keeping (2) by not throttling user-interactive threads.
I'm not sure whether this will really work as expected, but that's just to show
what I'm thinking.  These may not matter a lot yet compared with further
improving the ring reset mechanism, which definitely sounds even better, but
seems orthogonal.

That's also why I think we should still merge this series first, as a foundation
for the rest.

> 
> > 
> >>
> >>> sensitive to memory footprint.  In above 24G mem + 800MB/s dirty rate
> >>> condition, dirty bitmap seems to be more efficient, say, collecting dirty
> >>> bitmap of 24G mem (24G/4K/8=0.75MB) for each migration cycle is fast enough.
> >>>
> >>> Not to mention that current implementation of dirty ring in QEMU is not
> >>> complete - we still have two more layers of dirty bitmap, so it's actually a
> >>> mixture of dirty bitmap and dirty ring.  This series is more like a POC on
> >>> dirty ring interface, so as to let QEMU be able to run on KVM dirty ring.
> >>> E.g., we won't have hang issue when getting dirty pages since it's totally
> >>> async, however we'll still have some legacy dirty bitmap issues e.g. memory
> >>> consumption of userspace dirty bitmaps are still linear to memory footprint.
> >> The plan looks good and coordinated, but I have a concern. Our dirty ring actually depends
> >> on the structure of hardware logging buffer (PML buffer). We can't say it can be properly
> >> adapted to all kinds of hardware design in the future.
> > 
> > Sorry I don't get it - dirty ring can work with pure page wr-protect too?
> Sure, it can. I just want to discuss many possible kinds of hardware logging buffer.
> However, I'd like to stop at this, at least dirty ring works well with PML. :)

I see your point.  That'll certainly be a good topic when we'd like to port the
dirty ring to other archs.  However, as you can see, I hope we can start to use
the dirty ring first, find issues, fix them, even redesign some of it, and make
it really beneficial on at least one arch; then it'll make more sense to port
it, or to attract people to port it. :)

QEMU does not have a good solution for huge VM migration yet.  Maybe the dirty
ring is a good start for it, maybe not (e.g., with uffd minor mode, postcopy has
another chance).  We'll see...

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 02/10] KVM: Use a big lock to replace per-kml slots_lock
  2021-03-22 16:27       ` Peter Xu
@ 2021-03-24 18:08         ` Peter Xu
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Xu @ 2021-03-24 18:08 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Hyman, Keqian Zhu, qemu-devel, Dr . David Alan Gilbert

On Mon, Mar 22, 2021 at 12:27:54PM -0400, Peter Xu wrote:
> On Mon, Mar 22, 2021 at 02:54:30PM +0100, Paolo Bonzini wrote:
> > On 22/03/21 11:47, Keqian Zhu wrote:
> > > > +    qemu_mutex_init(&kml_slots_lock);
> > > As you said, x86 has two address spaces, is it a problem that we may have multi initialization for kml_slots_lock?
> > 
> > Agreed, the lock can be initialized (if only for cleanliness) in kvm_init.
> 
> Definitely, I'm surprised why I didn't see this... :) Paolo, do you want me to
> add another patch (as attached)?
> 
> -- 
> Peter Xu

> From 0cb7124d111426f255113814cb8395620276cc1f Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Mon, 22 Mar 2021 12:25:18 -0400
> Subject: [PATCH] qemu-thread: Assert and check mutex re-initialize
> 
> qemu_mutex_post_init() sets mutex->initialized but not check it yet.  Add it,
> so as to detect re-initialization of mutexes.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  util/qemu-thread-common.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/util/qemu-thread-common.h b/util/qemu-thread-common.h
> index 2af6b120853..e02059845d8 100644
> --- a/util/qemu-thread-common.h
> +++ b/util/qemu-thread-common.h
> @@ -15,6 +15,7 @@
>  
>  #include "qemu/thread.h"
>  #include "trace.h"
> +#include <assert.h>
>  
>  static inline void qemu_mutex_post_init(QemuMutex *mutex)
>  {
> @@ -22,6 +23,7 @@ static inline void qemu_mutex_post_init(QemuMutex *mutex)
>      mutex->file = NULL;
>      mutex->line = 0;
>  #endif
> +    assert(!mutex->initialized);
>      mutex->initialized = true;
>  }

I got coredumps after applying this patch when running make check:

#1  0x00007fce1090b895 in __GI_abort
#2  0x00007fce1090b769 in __assert_fail_base
#3  0x00007fce1091ae86 in __GI___assert_fail
#4  0x0000563e3e407248 in qemu_mutex_post_init
#5  0x0000563e3e407491 in qemu_mutex_init
#6  0x0000563e3e410ca4 in rcu_init_complete
#7  0x0000563e3e410e13 in rcu_init_child
#8  0x00007fce1096d60b in __run_fork_handlers
#9  0x00007fce109b42cc in __libc_fork
#10 0x0000563e3e3b5006 in qtest_init_without_qmp_handshake
#11 0x0000563e3e3b51a1 in qtest_init
#12 0x0000563e3e3b116b in test_acpi_one
#13 0x0000563e3e3b1264 in test_acpi_piix4_tcg
#14 0x00007fce10ebe29e in g_test_run_suite_internal
#15 0x00007fce10ebe09b in g_test_run_suite_internal
#16 0x00007fce10ebe09b in g_test_run_suite_internal
#17 0x00007fce10ebe78a in g_test_run_suite
#18 0x00007fce10ebe7a5 in g_test_run
#19 0x0000563e3e3b32b4 in main

Then I saw this commit:

    commit 21b7cf9e07e5991c57b461181cfb5bbb6fe7a9d6
    Author: Paolo Bonzini <pbonzini@redhat.com>
    Date:   Thu Mar 5 16:53:48 2015 +0100

    rcu: handle forks safely
    
    After forking, only the calling thread is duplicated in the child process.
    The call_rcu thread has to be recreated in the child.  Exploit the fact
    that only one thread exists (same as when constructors run), and just redo
    the entire initialization to ensure the threads are in the proper state.
    
    The only additional things to do are emptying the list of threads
    registered with RCU, and unlocking the lock that was taken in the prepare
    callback (implementations are allowed to fail pthread_mutex_init()
    if the mutex is still locked).

It seems we depend on pthread_mutex_init() being callable again on an
already-initialized (and possibly locked) mutex?  I haven't dug deeper yet;
for simplicity I'll just drop the extra assertion patch.
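
For context, a minimal sketch of the pattern that trips the assertion,
paraphrased from the backtrace above (this is not the exact QEMU code; the
bodies are simplified):

  #include <pthread.h>

  static pthread_mutex_t registry_lock;   /* stands in for rcu_registry_lock */

  static void rcu_init_complete(void)
  {
      /* First call happens at startup; after fork() the child re-runs the
       * whole init path, so the same mutex gets initialized a second time --
       * exactly what the new assert(!mutex->initialized) rejects.  (Per the
       * commit message above, the real code also unlocks the mutex taken in
       * the prepare handler first, since implementations may fail to init a
       * still-locked mutex.) */
      pthread_mutex_init(&registry_lock, NULL);
  }

  static void rcu_init_child(void)
  {
      rcu_init_complete();
  }

  static void rcu_register_fork_handlers(void)
  {
      pthread_atfork(NULL, NULL, rcu_init_child);
  }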

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 00/10] KVM: Dirty ring support (QEMU part)
  2021-03-24 15:09           ` Peter Xu
@ 2021-03-25  1:21             ` Keqian Zhu
  0 siblings, 0 replies; 29+ messages in thread
From: Keqian Zhu @ 2021-03-25  1:21 UTC (permalink / raw)
  To: Peter Xu; +Cc: Paolo Bonzini, Hyman, qemu-devel, Dr . David Alan Gilbert

Peter,

On 2021/3/24 23:09, Peter Xu wrote:
> On Wed, Mar 24, 2021 at 10:56:22AM +0800, Keqian Zhu wrote:
>> Hi Peter,
>>
>> On 2021/3/23 22:34, Peter Xu wrote:
>>> Keqian,
>>>
>>> On Tue, Mar 23, 2021 at 02:40:43PM +0800, Keqian Zhu wrote:
>>>>>> The second question is that you observed longer migration time (55s->73s) when guest
>>>>>> has 24G ram and dirty rate is 800M/s. I am not clear about the reason. As with dirty
>>>>>> ring enabled, Qemu can get dirty info faster which means it handles dirty page more
>>>>>> quick, and guest can be throttled which means dirty page is generated slower. What's
>>>>>> the rationale for the longer migration time?
>>>>>
>>>>> Because dirty ring is more sensitive to dirty rate, while dirty bitmap is more
>>>> Emm... Sorry that I'm very clear about this... I think that higher dirty rate doesn't cause
>>>> slower dirty_log_sync compared to that of legacy bitmap mode. Besides, higher dirty rate
>>>> means we may have more full-exit, which can properly limit the dirty rate. So it seems that
>>>> dirty ring "prefers" higher dirty rate.
>>>
>>> When I measured the 800MB/s it's in the guest, after throttling.
>>>
>>> Imagine another example: a VM has 1G memory keep dirtying with 10GB/s.  Dirty
>>> logging will need to collect even less for each iteration because memory size
>>> shrinked, collect even less frequent due to the high dirty rate, however dirty
>>> ring will use 100% cpu power to collect dirty pages because the ring keeps full.
>> Looks good.
>>
>> We have many places to collect dirty pages: the background reaper, vCPU exit handler,
>> and the migration thread. I think migration time is closely related to the migration thread.
>>
>> The migration thread calls kvm_dirty_ring_flush().
>> 1. kvm_cpu_synchronize_kick_all() will wait vcpu handles full-exit.
>> 2. kvm_dirty_ring_reap() collects and resets dirty pages.
>> The above two operation will spend more time with higher dirty rate.
>>
>> But I suddenly realize that the key problem maybe not at this. Though we have separate
>> "reset" operation for dirty ring, actually it is performed right after we collect dirty
>> ring to kvmslot. So in dirty ring mode, it likes legacy bitmap mode without manual_dirty_clear.
>>
>> If we can "reset" dirty ring just before we really handle the dirty pages, we can have
>> shorter migration time. But the design of dirty ring doesn't allow this, because we must
>> perform reset to make free space...
> 
> This is a very good point.
> 
> Dirty ring should have been better in quite some ways already, but from that
> pov as you said it goes a bit backwards on reprotection of pages (not to
> mention currently we can't even reset the ring per-vcpu; that seems to be not
> fully matching the full locality that the rings have provided as well; but
> Paolo and I discussed with that issue, it's about TLB flush expensiveness, so
> we still need to think more of it..).
> 
> Ideally the ring could have been both per-vcpu but also bi-directional (then
> we'll have 2*N rings, N=vcpu number), so as to split the state transition into
> "dirty ring" and "reprotect ring", then that reprotect ring will be the clear
> dirty log.  That'll look more like virtio as used ring.  However we'll still
> need to think about the TLB flush issue too as Paolo used to mention, as
> that'll exist too with any per-vcpu flush model (each reprotect of page will
> need a tlb flush of all vcpus).
> 
> Or.. maybe we can make the flush ring a standalone one, so we need N dirty ring
> and one global flush ring.
Yep, having separate "reprotect" ring(s) is a good idea.

> 
> Anyway.. Before that, I'd still think the next step should be how to integrate
> qemu to fully leverage current ring model, so as to be able to throttle in
> per-vcpu fashion.
> 
> The major issue (IMHO) with huge VM migration is:
> 
>   1. Convergence
>   2. Responsiveness
> 
> Here we'll have a chance to solve (1) by highly throttle the working vcpu
> threads, meanwhile still keep (2) by not throttle user interactive threads.
> I'm not sure whether this will really work as expected, but just show what I'm
> thinking about it.  These may not matter a lot yet with further improving ring
> reset mechanism, which definitely sounds even better, but seems orthogonal.
> 
> That's also why I think we should still merge this series first as a fundation
> for the rest.
I see.

> 
>>
>>>
>>>>
>>>>> sensitive to memory footprint.  In above 24G mem + 800MB/s dirty rate
>>>>> condition, dirty bitmap seems to be more efficient, say, collecting dirty
>>>>> bitmap of 24G mem (24G/4K/8=0.75MB) for each migration cycle is fast enough.
>>>>>
>>>>> Not to mention that current implementation of dirty ring in QEMU is not
>>>>> complete - we still have two more layers of dirty bitmap, so it's actually a
>>>>> mixture of dirty bitmap and dirty ring.  This series is more like a POC on
>>>>> dirty ring interface, so as to let QEMU be able to run on KVM dirty ring.
>>>>> E.g., we won't have hang issue when getting dirty pages since it's totally
>>>>> async, however we'll still have some legacy dirty bitmap issues e.g. memory
>>>>> consumption of userspace dirty bitmaps are still linear to memory footprint.
>>>> The plan looks good and coordinated, but I have a concern. Our dirty ring actually depends
>>>> on the structure of hardware logging buffer (PML buffer). We can't say it can be properly
>>>> adapted to all kinds of hardware design in the future.
>>>
>>> Sorry I don't get it - dirty ring can work with pure page wr-protect too?
>> Sure, it can. I just want to discuss many possible kinds of hardware logging buffer.
>> However, I'd like to stop at this, at least dirty ring works well with PML. :)
> 
> I see your point.  That'll be a good topic at least when we'd like to port
> dirty ring to other archs for sure.  However as you see I hoped we can start to
> use dirty ring first, find issues, fix it, even redesign some of it, make it
> really beneficial at least on one arch, then it'll make more sense to port it,
> or attract people porting it. :)
> 
> QEMU does not yet have a good solution for huge vm migration yet.  Maybe dirty
> ring is a good start for it, maybe not (e.g., with uffd minor mode postcopy has
> the other chance).  We'll see...
OK.

Thanks,
Keqian


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2021-03-25  1:23 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-10 20:32 [PATCH v5 00/10] KVM: Dirty ring support (QEMU part) Peter Xu
2021-03-10 20:32 ` [PATCH v5 01/10] memory: Introduce log_sync_global() to memory listener Peter Xu
2021-03-10 20:32 ` [PATCH v5 02/10] KVM: Use a big lock to replace per-kml slots_lock Peter Xu
2021-03-22 10:47   ` Keqian Zhu
2021-03-22 13:54     ` Paolo Bonzini
2021-03-22 16:27       ` Peter Xu
2021-03-24 18:08         ` Peter Xu
2021-03-10 20:32 ` [PATCH v5 03/10] KVM: Create the KVMSlot dirty bitmap on flag changes Peter Xu
2021-03-10 20:32 ` [PATCH v5 04/10] KVM: Provide helper to get kvm dirty log Peter Xu
2021-03-10 20:32 ` [PATCH v5 05/10] KVM: Provide helper to sync dirty bitmap from slot to ramblock Peter Xu
2021-03-10 20:32 ` [PATCH v5 06/10] KVM: Simplify dirty log sync in kvm_set_phys_mem Peter Xu
2021-03-10 20:32 ` [PATCH v5 07/10] KVM: Cache kvm slot dirty bitmap size Peter Xu
2021-03-10 20:32 ` [PATCH v5 08/10] KVM: Add dirty-gfn-count property Peter Xu
2021-03-10 20:33 ` [PATCH v5 09/10] KVM: Disable manual dirty log when dirty ring enabled Peter Xu
2021-03-22  9:17   ` Keqian Zhu
2021-03-22 13:55     ` Paolo Bonzini
2021-03-22 16:21       ` Peter Xu
2021-03-10 20:33 ` [PATCH v5 10/10] KVM: Dirty ring support Peter Xu
2021-03-22 13:37   ` Keqian Zhu
2021-03-22 18:52     ` Peter Xu
2021-03-23  1:25       ` Keqian Zhu
2021-03-19 18:12 ` [PATCH v5 00/10] KVM: Dirty ring support (QEMU part) Peter Xu
2021-03-22 14:02 ` Keqian Zhu
2021-03-22 19:45   ` Peter Xu
2021-03-23  6:40     ` Keqian Zhu
2021-03-23 14:34       ` Peter Xu
2021-03-24  2:56         ` Keqian Zhu
2021-03-24 15:09           ` Peter Xu
2021-03-25  1:21             ` Keqian Zhu
