* [PATCH v1 0/6] hw/arm/virt: Support dirty ring
@ 2023-02-13  0:39 Gavin Shan
  2023-02-13  0:39 ` [PATCH v1 1/6] linux-headers: Update for " Gavin Shan
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Gavin Shan @ 2023-02-13  0:39 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, pbonzini, peter.maydell, peterx, david, philmd, mst,
	cohuck, quintela, dgilbert, maz, zhenyzha, shan.gavin

This series intends to support the dirty ring for live migration. The dirty
ring uses a discrete buffer to track dirty pages. For ARM64, the particularity
is that a backup bitmap is used to track dirty pages when there is no
running-vcpu context. The backup bitmap needs to be synchronized when the
KVM device "kvm-arm-gicv3" or "arm-its-kvm" has been enabled, and it is
collected in the last stage of migration.
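
For reference, enabling the feature from the command line looks roughly like
the following. The invocation is illustrative only: dirty-ring-size is the
existing KVM accelerator property also used in the tests below, and
gic-version=3/its=on are the usual virt machine options.

    qemu-system-aarch64 -machine virt,gic-version=3,its=on \
    -accel kvm,dirty-ring-size=4096 \
    <other guest options>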

PATCH[1]    Synchronize linux-headers for dirty ring
PATCH[2]    Introduce an indicator for the last stage of migration and pass it
            all the way down
PATCH[3]    Synchronize the backup bitmap in the last stage of live migration
PATCH[4]    Introduce helper kvm_dirty_ring_init() to enable the dirty ring
PATCH[5-6]  Enable dirty ring for hw/arm/virt

RFCv1: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg00171.html

Testing
=======
(1) kvm-unit-tests/its-pending-migration and kvm-unit-tests/its-migration with
    dirty ring or normal dirty page tracking mechanism. All test cases passed.

    QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \
    ./its-pending-migration

    QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \
    ./its-migration

    QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \
    ./its-pending-migration

    QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \
    ./its-migration

(2) Combinations of migration, post-copy migration, e1000e and virtio-net
    devices. All test cases passed.

    -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown  \
    -device e1000e,bus=pcie.5,netdev=net0,mac=52:54:00:f1:26:a0

    -netdev tap,id=vnet0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -device virtio-net-pci,bus=pcie.6,netdev=vnet0,mac=52:54:00:f1:26:b0
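
    The migration itself was driven with the usual monitor commands, roughly
    as follows (illustrative only; the destination address is a placeholder
    and the postcopy capability is set only for the post-copy runs):

    (qemu) migrate_set_capability postcopy-ram on
    (qemu) migrate -d tcp:<dest-host>:4444
    (qemu) migrate_start_postcopy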

Changelog
=========
v1:
  * Combine two patches into one PATCH[v1 2/6] for the last stage indicator    (Peter)
  * Drop the secondary bitmap and use the original one directly                (Juan)
  * Avoid "goto out" in helper kvm_dirty_ring_init()                           (Juan)


Gavin Shan (6):
  linux-headers: Update for dirty ring
  migration: Add last stage indicator to global dirty log
    synchronization
  kvm: Synchronize the backup bitmap in the last stage
  kvm: Add helper kvm_dirty_ring_init()
  hw/arm/virt: Enable backup bitmap for dirty ring
  kvm: Enable dirty ring for arm64

 accel/kvm/kvm-all.c           | 95 ++++++++++++++++++++++++-----------
 hw/arm/virt.c                 | 26 ++++++++++
 include/exec/memory.h         |  5 +-
 include/sysemu/kvm_int.h      |  1 +
 linux-headers/asm-arm64/kvm.h |  1 +
 linux-headers/linux/kvm.h     |  2 +
 migration/dirtyrate.c         |  4 +-
 migration/ram.c               | 20 ++++----
 softmmu/memory.c              | 10 ++--
 target/arm/kvm64.c            | 25 +++++++++
 target/arm/kvm_arm.h          | 12 +++++
 11 files changed, 152 insertions(+), 49 deletions(-)

-- 
2.23.0




* [PATCH v1 1/6] linux-headers: Update for dirty ring
  2023-02-13  0:39 [PATCH v1 0/6] hw/arm/virt: Support dirty ring Gavin Shan
@ 2023-02-13  0:39 ` Gavin Shan
  2023-02-21 16:30   ` Peter Maydell
  2023-02-13  0:39 ` [PATCH v1 2/6] migration: Add last stage indicator to global dirty log synchronization Gavin Shan
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Gavin Shan @ 2023-02-13  0:39 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, pbonzini, peter.maydell, peterx, david, philmd, mst,
	cohuck, quintela, dgilbert, maz, zhenyzha, shan.gavin

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 linux-headers/asm-arm64/kvm.h | 1 +
 linux-headers/linux/kvm.h     | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index 4bf2d7246e..a7cfefb3a8 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -43,6 +43,7 @@
 #define __KVM_HAVE_VCPU_EVENTS
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+#define KVM_DIRTY_LOG_PAGE_OFFSET 64
 
 #define KVM_REG_SIZE(id)						\
 	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index ebdafa576d..5b4e0e4e68 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1175,6 +1175,8 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_VM_DISABLE_NX_HUGE_PAGES 220
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
+#define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
+#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.23.0




* [PATCH v1 2/6] migration: Add last stage indicator to global dirty log synchronization
  2023-02-13  0:39 [PATCH v1 0/6] hw/arm/virt: Support dirty ring Gavin Shan
  2023-02-13  0:39 ` [PATCH v1 1/6] linux-headers: Update for " Gavin Shan
@ 2023-02-13  0:39 ` Gavin Shan
  2023-02-21 17:36   ` Peter Xu
  2023-02-13  0:39 ` [PATCH v1 3/6] kvm: Synchronize the backup bitmap in the last stage Gavin Shan
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Gavin Shan @ 2023-02-13  0:39 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, pbonzini, peter.maydell, peterx, david, philmd, mst,
	cohuck, quintela, dgilbert, maz, zhenyzha, shan.gavin

The global dirty log synchronization is used when KVM and dirty ring
are enabled. There is a particularity for ARM64 where the backup
bitmap is used to track dirty pages in non-running-vcpu situations.
It means the dirty ring works with a combination of the ring buffer
and the backup bitmap. The dirty bits in the backup bitmap need to be
collected in the last stage of live migration.

In order to identify the last stage of live migration and pass it
down, an extra parameter is added to the relevant functions and
callbacks. This last stage indicator isn't used until the dirty
ring is enabled in the subsequent patches.

No functional change intended.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 accel/kvm/kvm-all.c   |  2 +-
 include/exec/memory.h |  5 +++--
 migration/dirtyrate.c |  4 ++--
 migration/ram.c       | 20 ++++++++++----------
 softmmu/memory.c      | 10 +++++-----
 5 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 9b26582655..01a6a026af 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1554,7 +1554,7 @@ static void kvm_log_sync(MemoryListener *listener,
     kvm_slots_unlock();
 }
 
-static void kvm_log_sync_global(MemoryListener *l)
+static void kvm_log_sync_global(MemoryListener *l, bool last_stage)
 {
     KVMMemoryListener *kml = container_of(l, KVMMemoryListener, listener);
     KVMState *s = kvm_state;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2e602a2fad..75b2fd9f48 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -929,8 +929,9 @@ struct MemoryListener {
      * its @log_sync must be NULL.  Vice versa.
      *
      * @listener: The #MemoryListener.
+     * @last_stage: The last stage to synchronize the log during migration
      */
-    void (*log_sync_global)(MemoryListener *listener);
+    void (*log_sync_global)(MemoryListener *listener, bool last_stage);
 
     /**
      * @log_clear:
@@ -2408,7 +2409,7 @@ MemoryRegionSection memory_region_find(MemoryRegion *mr,
  *
  * Synchronizes the dirty page log for all address spaces.
  */
-void memory_global_dirty_log_sync(void);
+void memory_global_dirty_log_sync(bool last_stage);
 
 /**
  * memory_global_dirty_log_sync: synchronize the dirty log for all memory
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 575d48c397..da9b4a1f8d 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -101,7 +101,7 @@ void global_dirty_log_change(unsigned int flag, bool start)
 static void global_dirty_log_sync(unsigned int flag, bool one_shot)
 {
     qemu_mutex_lock_iothread();
-    memory_global_dirty_log_sync();
+    memory_global_dirty_log_sync(false);
     if (one_shot) {
         memory_global_dirty_log_stop(flag);
     }
@@ -553,7 +553,7 @@ static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config)
      * skip it unconditionally and start dirty tracking
      * from 2'round of log sync
      */
-    memory_global_dirty_log_sync();
+    memory_global_dirty_log_sync(false);
 
     /*
      * reset page protect manually and unconditionally.
diff --git a/migration/ram.c b/migration/ram.c
index b966e148c2..52abf015fd 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1176,7 +1176,7 @@ static void migration_trigger_throttle(RAMState *rs)
     }
 }
 
-static void migration_bitmap_sync(RAMState *rs)
+static void migration_bitmap_sync(RAMState *rs, bool last_stage)
 {
     RAMBlock *block;
     int64_t end_time;
@@ -1188,7 +1188,7 @@ static void migration_bitmap_sync(RAMState *rs)
     }
 
     trace_migration_bitmap_sync_start();
-    memory_global_dirty_log_sync();
+    memory_global_dirty_log_sync(last_stage);
 
     qemu_mutex_lock(&rs->bitmap_mutex);
     WITH_RCU_READ_LOCK_GUARD() {
@@ -1222,7 +1222,7 @@ static void migration_bitmap_sync(RAMState *rs)
     }
 }
 
-static void migration_bitmap_sync_precopy(RAMState *rs)
+static void migration_bitmap_sync_precopy(RAMState *rs, bool last_stage)
 {
     Error *local_err = NULL;
 
@@ -1235,7 +1235,7 @@ static void migration_bitmap_sync_precopy(RAMState *rs)
         local_err = NULL;
     }
 
-    migration_bitmap_sync(rs);
+    migration_bitmap_sync(rs, last_stage);
 
     if (precopy_notify(PRECOPY_NOTIFY_AFTER_BITMAP_SYNC, &local_err)) {
         error_report_err(local_err);
@@ -2887,7 +2887,7 @@ void ram_postcopy_send_discard_bitmap(MigrationState *ms)
     RCU_READ_LOCK_GUARD();
 
     /* This should be our last sync, the src is now paused */
-    migration_bitmap_sync(rs);
+    migration_bitmap_sync(rs, false);
 
     /* Easiest way to make sure we don't resume in the middle of a host-page */
     rs->pss[RAM_CHANNEL_PRECOPY].last_sent_block = NULL;
@@ -3077,7 +3077,7 @@ static void ram_init_bitmaps(RAMState *rs)
         /* We don't use dirty log with background snapshots */
         if (!migrate_background_snapshot()) {
             memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
-            migration_bitmap_sync_precopy(rs);
+            migration_bitmap_sync_precopy(rs, false);
         }
     }
     qemu_mutex_unlock_ramlist();
@@ -3392,7 +3392,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 
     WITH_RCU_READ_LOCK_GUARD() {
         if (!migration_in_postcopy()) {
-            migration_bitmap_sync_precopy(rs);
+            migration_bitmap_sync_precopy(rs, true);
         }
 
         ram_control_before_iterate(f, RAM_CONTROL_FINISH);
@@ -3466,7 +3466,7 @@ static void ram_state_pending_exact(void *opaque,
     if (!migration_in_postcopy()) {
         qemu_mutex_lock_iothread();
         WITH_RCU_READ_LOCK_GUARD() {
-            migration_bitmap_sync_precopy(rs);
+            migration_bitmap_sync_precopy(rs, false);
         }
         qemu_mutex_unlock_iothread();
         remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
@@ -3876,7 +3876,7 @@ void colo_incoming_start_dirty_log(void)
     qemu_mutex_lock_iothread();
     qemu_mutex_lock_ramlist();
 
-    memory_global_dirty_log_sync();
+    memory_global_dirty_log_sync(false);
     WITH_RCU_READ_LOCK_GUARD() {
         RAMBLOCK_FOREACH_NOT_IGNORED(block) {
             ramblock_sync_dirty_bitmap(ram_state, block);
@@ -4167,7 +4167,7 @@ void colo_flush_ram_cache(void)
     void *src_host;
     unsigned long offset = 0;
 
-    memory_global_dirty_log_sync();
+    memory_global_dirty_log_sync(false);
     WITH_RCU_READ_LOCK_GUARD() {
         RAMBLOCK_FOREACH_NOT_IGNORED(block) {
             ramblock_sync_dirty_bitmap(ram_state, block);
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 9d64efca26..1cc36ef028 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2224,7 +2224,7 @@ void memory_region_set_dirty(MemoryRegion *mr, hwaddr addr,
  * If memory region `mr' is NULL, do global sync.  Otherwise, sync
  * dirty bitmap for the specified memory region.
  */
-static void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
+static void memory_region_sync_dirty_bitmap(MemoryRegion *mr, bool last_stage)
 {
     MemoryListener *listener;
     AddressSpace *as;
@@ -2254,7 +2254,7 @@ static void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
              * is to do a global sync, because we are not capable to
              * sync in a finer granularity.
              */
-            listener->log_sync_global(listener);
+            listener->log_sync_global(listener, last_stage);
             trace_memory_region_sync_dirty(mr ? mr->name : "(all)", listener->name, 1);
         }
     }
@@ -2318,7 +2318,7 @@ DirtyBitmapSnapshot *memory_region_snapshot_and_clear_dirty(MemoryRegion *mr,
 {
     DirtyBitmapSnapshot *snapshot;
     assert(mr->ram_block);
-    memory_region_sync_dirty_bitmap(mr);
+    memory_region_sync_dirty_bitmap(mr, false);
     snapshot = cpu_physical_memory_snapshot_and_clear_dirty(mr, addr, size, client);
     memory_global_after_dirty_log_sync();
     return snapshot;
@@ -2844,9 +2844,9 @@ bool memory_region_present(MemoryRegion *container, hwaddr addr)
     return mr && mr != container;
 }
 
-void memory_global_dirty_log_sync(void)
+void memory_global_dirty_log_sync(bool last_stage)
 {
-    memory_region_sync_dirty_bitmap(NULL);
+    memory_region_sync_dirty_bitmap(NULL, last_stage);
 }
 
 void memory_global_after_dirty_log_sync(void)
-- 
2.23.0




* [PATCH v1 3/6] kvm: Synchronize the backup bitmap in the last stage
  2023-02-13  0:39 [PATCH v1 0/6] hw/arm/virt: Support dirty ring Gavin Shan
  2023-02-13  0:39 ` [PATCH v1 1/6] linux-headers: Update for " Gavin Shan
  2023-02-13  0:39 ` [PATCH v1 2/6] migration: Add last stage indicator to global dirty log synchronization Gavin Shan
@ 2023-02-13  0:39 ` Gavin Shan
  2023-02-21 17:46   ` Peter Xu
  2023-02-13  0:39 ` [PATCH v1 4/6] kvm: Add helper kvm_dirty_ring_init() Gavin Shan
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Gavin Shan @ 2023-02-13  0:39 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, pbonzini, peter.maydell, peterx, david, philmd, mst,
	cohuck, quintela, dgilbert, maz, zhenyzha, shan.gavin

In the last stage of live migration or memory slot removal, the
backup bitmap needs to be synchronized when it has been enabled.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 accel/kvm/kvm-all.c      | 11 +++++++++++
 include/sysemu/kvm_int.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 01a6a026af..b5e12de522 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1352,6 +1352,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
                  */
                 if (kvm_state->kvm_dirty_ring_size) {
                     kvm_dirty_ring_reap_locked(kvm_state, NULL);
+                    if (kvm_state->kvm_dirty_ring_with_bitmap) {
+                        kvm_slot_sync_dirty_pages(mem);
+                        kvm_slot_get_dirty_log(kvm_state, mem);
+                    }
                 } else {
                     kvm_slot_get_dirty_log(kvm_state, mem);
                 }
@@ -1573,6 +1577,12 @@ static void kvm_log_sync_global(MemoryListener *l, bool last_stage)
         mem = &kml->slots[i];
         if (mem->memory_size && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
             kvm_slot_sync_dirty_pages(mem);
+
+            if (s->kvm_dirty_ring_with_bitmap && last_stage &&
+                kvm_slot_get_dirty_log(s, mem)) {
+                kvm_slot_sync_dirty_pages(mem);
+            }
+
             /*
              * This is not needed by KVM_GET_DIRTY_LOG because the
              * ioctl will unconditionally overwrite the whole region.
@@ -3701,6 +3711,7 @@ static void kvm_accel_instance_init(Object *obj)
     s->kernel_irqchip_split = ON_OFF_AUTO_AUTO;
     /* KVM dirty ring is by default off */
     s->kvm_dirty_ring_size = 0;
+    s->kvm_dirty_ring_with_bitmap = false;
     s->notify_vmexit = NOTIFY_VMEXIT_OPTION_RUN;
     s->notify_window = 0;
 }
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 60b520a13e..fdd5b1bde0 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -115,6 +115,7 @@ struct KVMState
     } *as;
     uint64_t kvm_dirty_ring_bytes;  /* Size of the per-vcpu dirty ring */
     uint32_t kvm_dirty_ring_size;   /* Number of dirty GFNs per ring */
+    bool kvm_dirty_ring_with_bitmap;
     struct KVMDirtyRingReaper reaper;
     NotifyVmexitOption notify_vmexit;
     uint32_t notify_window;
-- 
2.23.0




* [PATCH v1 4/6] kvm: Add helper kvm_dirty_ring_init()
  2023-02-13  0:39 [PATCH v1 0/6] hw/arm/virt: Support dirty ring Gavin Shan
                   ` (2 preceding siblings ...)
  2023-02-13  0:39 ` [PATCH v1 3/6] kvm: Synchronize the backup bitmap in the last stage Gavin Shan
@ 2023-02-13  0:39 ` Gavin Shan
  2023-02-21 20:12   ` Peter Xu
  2023-02-13  0:39 ` [PATCH v1 5/6] hw/arm/virt: Enable backup bitmap for dirty ring Gavin Shan
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Gavin Shan @ 2023-02-13  0:39 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, pbonzini, peter.maydell, peterx, david, philmd, mst,
	cohuck, quintela, dgilbert, maz, zhenyzha, shan.gavin

There are multiple capabilities associated with the dirty ring for different
architectures: KVM_CAP_DIRTY_LOG_RING for x86 and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
for arm64. More work will be needed in order to support the dirty ring for
arm64.

Let's add the helper kvm_dirty_ring_init() to enable the dirty ring. With
this, the code looks a bit cleaner.

No functional change intended.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 accel/kvm/kvm-all.c | 76 ++++++++++++++++++++++++++++-----------------
 1 file changed, 47 insertions(+), 29 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index b5e12de522..e5035026c9 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1453,6 +1453,50 @@ static int kvm_dirty_ring_reaper_init(KVMState *s)
     return 0;
 }
 
+static int kvm_dirty_ring_init(KVMState *s)
+{
+    uint32_t ring_size = s->kvm_dirty_ring_size;
+    uint64_t ring_bytes = ring_size * sizeof(struct kvm_dirty_gfn);
+    int ret;
+
+    s->kvm_dirty_ring_size = 0;
+    s->kvm_dirty_ring_bytes = 0;
+
+    /* Bail if the dirty ring size isn't specified */
+    if (!ring_size) {
+        return 0;
+    }
+
+    /*
+     * Read the max supported pages. Fall back to dirty logging mode
+     * if the dirty ring isn't supported.
+     */
+    ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING);
+    if (ret <= 0) {
+        warn_report("KVM dirty ring not available, using bitmap method");
+        return 0;
+    }
+
+    if (ring_bytes > ret) {
+        error_report("KVM dirty ring size %" PRIu32 " too big "
+                     "(maximum is %ld).  Please use a smaller value.",
+                     ring_size, (long)ret / sizeof(struct kvm_dirty_gfn));
+        return -EINVAL;
+    }
+
+    ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes);
+    if (ret) {
+        error_report("Enabling of KVM dirty ring failed: %s. "
+                     "Suggested minimum value is 1024.", strerror(-ret));
+        return -EIO;
+    }
+
+    s->kvm_dirty_ring_size = ring_size;
+    s->kvm_dirty_ring_bytes = ring_bytes;
+
+    return 0;
+}
+
 static void kvm_region_add(MemoryListener *listener,
                            MemoryRegionSection *section)
 {
@@ -2522,35 +2566,9 @@ static int kvm_init(MachineState *ms)
      * Enable KVM dirty ring if supported, otherwise fall back to
      * dirty logging mode
      */
-    if (s->kvm_dirty_ring_size > 0) {
-        uint64_t ring_bytes;
-
-        ring_bytes = s->kvm_dirty_ring_size * sizeof(struct kvm_dirty_gfn);
-
-        /* Read the max supported pages */
-        ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING);
-        if (ret > 0) {
-            if (ring_bytes > ret) {
-                error_report("KVM dirty ring size %" PRIu32 " too big "
-                             "(maximum is %ld).  Please use a smaller value.",
-                             s->kvm_dirty_ring_size,
-                             (long)ret / sizeof(struct kvm_dirty_gfn));
-                ret = -EINVAL;
-                goto err;
-            }
-
-            ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes);
-            if (ret) {
-                error_report("Enabling of KVM dirty ring failed: %s. "
-                             "Suggested minimum value is 1024.", strerror(-ret));
-                goto err;
-            }
-
-            s->kvm_dirty_ring_bytes = ring_bytes;
-         } else {
-             warn_report("KVM dirty ring not available, using bitmap method");
-             s->kvm_dirty_ring_size = 0;
-        }
+    ret = kvm_dirty_ring_init(s);
+    if (ret < 0) {
+        goto err;
     }
 
     /*
-- 
2.23.0




* [PATCH v1 5/6] hw/arm/virt: Enable backup bitmap for dirty ring
  2023-02-13  0:39 [PATCH v1 0/6] hw/arm/virt: Support dirty ring Gavin Shan
                   ` (3 preceding siblings ...)
  2023-02-13  0:39 ` [PATCH v1 4/6] kvm: Add helper kvm_dirty_ring_init() Gavin Shan
@ 2023-02-13  0:39 ` Gavin Shan
  2023-02-21 16:27   ` Peter Maydell
  2023-02-13  0:39 ` [PATCH v1 6/6] kvm: Enable dirty ring for arm64 Gavin Shan
  2023-02-17  1:59 ` [PATCH v1 0/6] hw/arm/virt: Support dirty ring Zhenyu Zhang
  6 siblings, 1 reply; 25+ messages in thread
From: Gavin Shan @ 2023-02-13  0:39 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, pbonzini, peter.maydell, peterx, david, philmd, mst,
	cohuck, quintela, dgilbert, maz, zhenyzha, shan.gavin

When the KVM device "kvm-arm-gicv3" or "arm-its-kvm" is used, we have to
enable the backup bitmap for the dirty ring. Otherwise, the migration
will fail because those two devices use the backup bitmap to track the
dirty guest memory that corresponds to their hardware tables.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
 hw/arm/virt.c        | 26 ++++++++++++++++++++++++++
 target/arm/kvm64.c   | 25 +++++++++++++++++++++++++
 target/arm/kvm_arm.h | 12 ++++++++++++
 3 files changed, 63 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 75f28947de..ea9bd98a65 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2024,6 +2024,8 @@ static void machvirt_init(MachineState *machine)
     VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(machine);
     MachineClass *mc = MACHINE_GET_CLASS(machine);
     const CPUArchIdList *possible_cpus;
+    const char *gictype = NULL;
+    const char *itsclass = NULL;
     MemoryRegion *sysmem = get_system_memory();
     MemoryRegion *secure_sysmem = NULL;
     MemoryRegion *tag_sysmem = NULL;
@@ -2071,6 +2073,30 @@ static void machvirt_init(MachineState *machine)
      */
     finalize_gic_version(vms);
 
+    /*
+     * When "kvm-arm-gicv3" or "arm-its-kvm" is used, the backup dirty
+     * bitmap has to be enabled for KVM dirty ring, before any memory
+     * slot is added. Otherwise, the migration will fail with the dirty
+     * ring.
+     */
+    if (kvm_enabled()) {
+        if (vms->gic_version != VIRT_GIC_VERSION_2) {
+            gictype = gicv3_class_name();
+        }
+
+        if (vms->gic_version != VIRT_GIC_VERSION_2 && vms->its) {
+            itsclass = its_class_name();
+        }
+
+        if (((gictype && !strcmp(gictype, "kvm-arm-gicv3")) ||
+             (itsclass && !strcmp(itsclass, "arm-its-kvm"))) &&
+            !kvm_arm_enable_dirty_ring_with_bitmap()) {
+            error_report("Failed to enable the backup bitmap for "
+                         "KVM dirty ring");
+            exit(1);
+        }
+    }
+
     if (vms->secure) {
         /*
          * The Secure view of the world is the same as the NonSecure,
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 1197253d12..91960e1cd3 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -764,6 +764,31 @@ bool kvm_arm_steal_time_supported(void)
     return kvm_check_extension(kvm_state, KVM_CAP_STEAL_TIME);
 }
 
+bool kvm_arm_enable_dirty_ring_with_bitmap(void)
+{
+    int ret;
+
+    /* No need to enable the backup bitmap if dirty ring isn't enabled */
+    if (!kvm_dirty_ring_enabled()) {
+        return true;
+    }
+
+    ret = kvm_check_extension(kvm_state, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP);
+    if (!ret) {
+        return false;
+    }
+
+    ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP,
+                            0, 1);
+    if (ret) {
+        return false;
+    }
+
+    kvm_state->kvm_dirty_ring_with_bitmap = true;
+
+    return true;
+}
+
 QEMU_BUILD_BUG_ON(KVM_ARM64_SVE_VQ_MIN != 1);
 
 uint32_t kvm_arm_sve_get_vls(CPUState *cs)
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index 99017b635c..402281e61a 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -282,6 +282,13 @@ void kvm_arm_steal_time_finalize(ARMCPU *cpu, Error **errp);
  */
 bool kvm_arm_steal_time_supported(void);
 
+/**
+ * kvm_arm_enable_dirty_ring_with_bitmap:
+ * Returns: true if KVM dirty ring's backup bitmap is enabled
+ * and false otherwise.
+ */
+bool kvm_arm_enable_dirty_ring_with_bitmap(void);
+
 /**
  * kvm_arm_aarch32_supported:
  *
@@ -395,6 +402,11 @@ static inline bool kvm_arm_steal_time_supported(void)
     return false;
 }
 
+static inline bool kvm_arm_enable_dirty_ring_with_bitmap(void)
+{
+    return false;
+}
+
 /*
  * These functions should never actually be called without KVM support.
  */
-- 
2.23.0




* [PATCH v1 6/6] kvm: Enable dirty ring for arm64
  2023-02-13  0:39 [PATCH v1 0/6] hw/arm/virt: Support dirty ring Gavin Shan
                   ` (4 preceding siblings ...)
  2023-02-13  0:39 ` [PATCH v1 5/6] hw/arm/virt: Enable backup bitmap for dirty ring Gavin Shan
@ 2023-02-13  0:39 ` Gavin Shan
  2023-02-17  1:59 ` [PATCH v1 0/6] hw/arm/virt: Support dirty ring Zhenyu Zhang
  6 siblings, 0 replies; 25+ messages in thread
From: Gavin Shan @ 2023-02-13  0:39 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, pbonzini, peter.maydell, peterx, david, philmd, mst,
	cohuck, quintela, dgilbert, maz, zhenyzha, shan.gavin

arm64 uses a different capability from x86 to enable the dirty ring:
KVM_CAP_DIRTY_LOG_RING_ACQ_REL. Enable it in kvm_dirty_ring_init()
when KVM_CAP_DIRTY_LOG_RING isn't supported.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
 accel/kvm/kvm-all.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index e5035026c9..f6fbeae644 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1457,6 +1457,7 @@ static int kvm_dirty_ring_init(KVMState *s)
 {
     uint32_t ring_size = s->kvm_dirty_ring_size;
     uint64_t ring_bytes = ring_size * sizeof(struct kvm_dirty_gfn);
+    unsigned int capability = KVM_CAP_DIRTY_LOG_RING;
     int ret;
 
     s->kvm_dirty_ring_size = 0;
@@ -1471,7 +1472,12 @@ static int kvm_dirty_ring_init(KVMState *s)
      * Read the max supported pages. Fall back to dirty logging mode
      * if the dirty ring isn't supported.
      */
-    ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING);
+    ret = kvm_vm_check_extension(s, capability);
+    if (ret <= 0) {
+        capability = KVM_CAP_DIRTY_LOG_RING_ACQ_REL;
+        ret = kvm_vm_check_extension(s, capability);
+    }
+
     if (ret <= 0) {
         warn_report("KVM dirty ring not available, using bitmap method");
         return 0;
@@ -1484,7 +1490,7 @@ static int kvm_dirty_ring_init(KVMState *s)
         return -EINVAL;
     }
 
-    ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes);
+    ret = kvm_vm_enable_cap(s, capability, 0, ring_bytes);
     if (ret) {
         error_report("Enabling of KVM dirty ring failed: %s. "
                      "Suggested minimum value is 1024.", strerror(-ret));
-- 
2.23.0




* Re: [PATCH v1 0/6] hw/arm/virt: Support dirty ring
  2023-02-13  0:39 [PATCH v1 0/6] hw/arm/virt: Support dirty ring Gavin Shan
                   ` (5 preceding siblings ...)
  2023-02-13  0:39 ` [PATCH v1 6/6] kvm: Enable dirty ring for arm64 Gavin Shan
@ 2023-02-17  1:59 ` Zhenyu Zhang
  6 siblings, 0 replies; 25+ messages in thread
From: Zhenyu Zhang @ 2023-02-17  1:59 UTC (permalink / raw)
  To: Gavin Shan
  Cc: qemu-arm, qemu-devel, pbonzini, peter.maydell, peterx, david,
	philmd, mst, cohuck, quintela, dgilbert, maz, shan.gavin, Li Jin

[PATCH v1 0/6] hw/arm/virt: Support dirty ring

The patches work well on my arm Ampere host.
The test results are as expected.

Testing
=======
(1) kvm-unit-tests/its-pending-migration, kvm-unit-tests/its-migrate-unmapped-collection
    and kvm-unit-tests/its-migration with dirty ring or normal dirty page
    tracking mechanism. All test cases passed.

    QEMU=/home/zhenyzha/sandbox/qemu/build/qemu-system-aarch64 ACCEL=kvm \
    PROCESSOR=host ./its-migration

    QEMU=/home/zhenyzha/sandbox/qemu/build/qemu-system-aarch64 ACCEL=kvm \
    PROCESSOR=host ./its-migrate-unmapped-collection

    QEMU=/home/zhenyzha/sandbox/qemu/build/qemu-system-aarch64 ACCEL=kvm \
    PROCESSOR=host ./its-pending-migration

    QEMU=/home/zhenyzha/sandbox/qemu/build/qemu-system-aarch64 \
    ACCEL=kvm,dirty-ring-size=65536 \
    PROCESSOR=host ./its-migration

    QEMU=/home/zhenyzha/sandbox/qemu/build/qemu-system-aarch64 \
    ACCEL=kvm,dirty-ring-size=65536 \
    PROCESSOR=host ./its-migrate-unmapped-collection

    QEMU=/home/zhenyzha/sandbox/qemu/build/qemu-system-aarch64 \
    ACCEL=kvm,dirty-ring-size=65536 \
    PROCESSOR=host ./its-pending-migration

(2) Combinations of migration, post-copy migration, e1000e and virtio-net
    devices. All test cases passed.

    -device '{"driver": "virtio-net-pci", "mac": "9a:97:8f:c7:cc:a6",
"rombar": 0, "netdev": "idDGdh30", "bus": "pcie-root-port-4", "addr":
"0x0"}'  \
    -netdev tap,id=idDGdh30,vhost=on

    -device '{"driver": "e1000e", "mac": "9a:fd:93:f1:97:b1",
"netdev": "idXDOtMA", "bus": "pcie-root-port-4", "addr": "0x0"}'  \
    -netdev tap,id=idXDOtMA,vhost=on  \

(3) Simulate heavy memory pressure scenarios and compare the migration
    performance difference between dirty ring and dirty logging.

    I tested with a 200G memory guest, 40 vcpus, using a 10G NIC as the
    migration channel.  When idle or when the dirty workload is small, I
    don't observe a major difference in total migration time.  With a higher
    random dirty workload (anonymous mapping of 180G memory, 256MB/s dirty
    rate on it), the total migration time is (in seconds):


    |-------------------------|---------------|
    | dirty ring (4k entries) | dirty logging |
    |-------------------------|---------------|
    |                      67 |            74 |
    |                      67 |            74 |
    |                      66 |            76 |
    |                      66 |            73 |
    |                      67 |            76 |
    |                      67 |            76 |
    |                      66 |            73 |
    |                      67 |            74 |
    |-------------------------|---------------|

    Summary:

    dirty ring average:    67s
    dirty logging average: 75s

Tested-by: Zhenyu Zhang <zhenzha@redhat.com>


On Mon, Feb 13, 2023 at 8:39 AM Gavin Shan <gshan@redhat.com> wrote:
>
> This series intends to support dirty ring for live migration. The dirty
> ring use discrete buffer to track dirty pages. For ARM64, the speciality
> is to use backup bitmap to track dirty pages when there is no-running-vcpu
> context. It's known that the backup bitmap needs to be synchronized when
> KVM device "kvm-arm-gicv3" or "arm-its-kvm" has been enabled. The backup
> bitmap is collected in the last stage of migration.
>
> PATCH[1]    Synchronize linux-headers for dirty ring
> PATCH[2]    Introduce indicator of the last stage migration and pass it
>             all the way down
> PATCH[3]    Synchronize the backup bitmap in the last stage of live migration
> PATCH[4]    Introduce helper kvm_dirty_ring_init() to enable the dirty ring
> PATCH[5-6]  Enable dirty ring for hw/arm/virt
>
> RFCv1: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg00171.html
>
> Testing
> =======
> (1) kvm-unit-tests/its-pending-migration and kvm-unit-tests/its-migration with
>     dirty ring or normal dirty page tracking mechanism. All test cases passed.
>
>     QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \
>     ./its-pending-migration
>
>     QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \
>     ./its-migration
>
>     QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \
>     ./its-pending-migration
>
>     QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \
>     ./its-migration
>
> (2) Combinations of migration, post-copy migration, e1000e and virtio-net
>     devices. All test cases passed.
>
>     -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown  \
>     -device e1000e,bus=pcie.5,netdev=net0,mac=52:54:00:f1:26:a0
>
>     -netdev tap,id=vnet0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
>     -device virtio-net-pci,bus=pcie.6,netdev=vnet0,mac=52:54:00:f1:26:b0
>
> Changelog
> =========
> v1:
>   * Combine two patches into one PATCH[v1 2/6] for the last stage indicator    (Peter)
>   * Drop the secondary bitmap and use the original one directly                (Juan)
>   * Avoid "goto out" in helper kvm_dirty_ring_init()                           (Juan)
>
>
> Gavin Shan (6):
>   linux-headers: Update for dirty ring
>   migration: Add last stage indicator to global dirty log
>     synchronization
>   kvm: Synchronize the backup bitmap in the last stage
>   kvm: Add helper kvm_dirty_ring_init()
>   hw/arm/virt: Enable backup bitmap for dirty ring
>   kvm: Enable dirty ring for arm64
>
>  accel/kvm/kvm-all.c           | 95 ++++++++++++++++++++++++-----------
>  hw/arm/virt.c                 | 26 ++++++++++
>  include/exec/memory.h         |  5 +-
>  include/sysemu/kvm_int.h      |  1 +
>  linux-headers/asm-arm64/kvm.h |  1 +
>  linux-headers/linux/kvm.h     |  2 +
>  migration/dirtyrate.c         |  4 +-
>  migration/ram.c               | 20 ++++----
>  softmmu/memory.c              | 10 ++--
>  target/arm/kvm64.c            | 25 +++++++++
>  target/arm/kvm_arm.h          | 12 +++++
>  11 files changed, 152 insertions(+), 49 deletions(-)
>
> --
> 2.23.0
>




* Re: [PATCH v1 5/6] hw/arm/virt: Enable backup bitmap for dirty ring
  2023-02-13  0:39 ` [PATCH v1 5/6] hw/arm/virt: Enable backup bitmap for dirty ring Gavin Shan
@ 2023-02-21 16:27   ` Peter Maydell
  2023-02-22  4:35     ` Gavin Shan
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Maydell @ 2023-02-21 16:27 UTC (permalink / raw)
  To: Gavin Shan
  Cc: qemu-arm, qemu-devel, pbonzini, peterx, david, philmd, mst,
	cohuck, quintela, dgilbert, maz, zhenyzha, shan.gavin

On Mon, 13 Feb 2023 at 00:40, Gavin Shan <gshan@redhat.com> wrote:
>
> When KVM device "kvm-arm-gicv3" or "arm-its-kvm" is used, we have to
> enable the backup bitmap for the dirty ring. Otherwise, the migration
> will fail because those two devices are using the backup bitmap to track
> dirty guest memory, corresponding to various hardware tables.
>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> ---
>  hw/arm/virt.c        | 26 ++++++++++++++++++++++++++
>  target/arm/kvm64.c   | 25 +++++++++++++++++++++++++
>  target/arm/kvm_arm.h | 12 ++++++++++++
>  3 files changed, 63 insertions(+)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 75f28947de..ea9bd98a65 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -2024,6 +2024,8 @@ static void machvirt_init(MachineState *machine)
>      VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(machine);
>      MachineClass *mc = MACHINE_GET_CLASS(machine);
>      const CPUArchIdList *possible_cpus;
> +    const char *gictype = NULL;
> +    const char *itsclass = NULL;
>      MemoryRegion *sysmem = get_system_memory();
>      MemoryRegion *secure_sysmem = NULL;
>      MemoryRegion *tag_sysmem = NULL;
> @@ -2071,6 +2073,30 @@ static void machvirt_init(MachineState *machine)
>       */
>      finalize_gic_version(vms);
>
> +    /*
> +     * When "kvm-arm-gicv3" or "arm-its-kvm" is used, the backup dirty
> +     * bitmap has to be enabled for KVM dirty ring, before any memory
> +     * slot is added. Otherwise, the migration will fail with the dirty
> +     * ring.
> +     */
> +    if (kvm_enabled()) {
> +        if (vms->gic_version != VIRT_GIC_VERSION_2) {
> +            gictype = gicv3_class_name();
> +        }
> +
> +        if (vms->gic_version != VIRT_GIC_VERSION_2 && vms->its) {
> +            itsclass = its_class_name();
> +        }
> +
> +        if (((gictype && !strcmp(gictype, "kvm-arm-gicv3")) ||
> +             (itsclass && !strcmp(itsclass, "arm-its-kvm"))) &&
> +            !kvm_arm_enable_dirty_ring_with_bitmap()) {
> +            error_report("Failed to enable the backup bitmap for "
> +                         "KVM dirty ring");
> +            exit(1);
> +        }
> +    }

Why does this need to be board-specific code? Is there
some way we can just do the right thing automatically?
Why does the GIC/ITS matter?

The kernel should already know whether we have asked it
to do something that needs this extra extension, so
I think we ought to be able in the generic "enable the
dirty ring" code say "if the kernel says we need this
extra thing, also enable this extra thing". Or if that's
too early, we can do the extra part in a generic hook a
bit later.

In the future there might be other things, presumably,
that need the backup bitmap, so it would be more future
proof not to need to also change QEMU to add extra
logic checks that duplicate the logic the kernel already has.
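
Purely as an illustration of the shape that could take (not code from this
series; the hook placement and the enable value are assumptions that mirror
what patch 5 does in arch-specific code, reusing the existing
kvm_vm_check_extension()/kvm_vm_enable_cap() helpers):

    /* e.g. in the generic kvm_dirty_ring_init(), once the ring is enabled */
    if (kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) > 0) {
        ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP, 0, 1);
        if (!ret) {
            s->kvm_dirty_ring_with_bitmap = true;
        }
    }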

thanks
-- PMM



* Re: [PATCH v1 1/6] linux-headers: Update for dirty ring
  2023-02-13  0:39 ` [PATCH v1 1/6] linux-headers: Update for " Gavin Shan
@ 2023-02-21 16:30   ` Peter Maydell
  2023-02-21 23:18     ` Gavin Shan
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Maydell @ 2023-02-21 16:30 UTC (permalink / raw)
  To: Gavin Shan
  Cc: qemu-arm, qemu-devel, pbonzini, peterx, david, philmd, mst,
	cohuck, quintela, dgilbert, maz, zhenyzha, shan.gavin

On Mon, 13 Feb 2023 at 00:39, Gavin Shan <gshan@redhat.com> wrote:
>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  linux-headers/asm-arm64/kvm.h | 1 +
>  linux-headers/linux/kvm.h     | 2 ++
>  2 files changed, 3 insertions(+)

For this to be a non-RFC patch, this needs to be a proper
sync of the headers against an upstream kernel tree.
(By-hand tweaks are fine for purposes of working on
and getting patchsets reviewed.)

thanks
-- PMM



* Re: [PATCH v1 2/6] migration: Add last stage indicator to global dirty log synchronization
  2023-02-13  0:39 ` [PATCH v1 2/6] migration: Add last stage indicator to global dirty log synchronization Gavin Shan
@ 2023-02-21 17:36   ` Peter Xu
  2023-02-21 23:20     ` Gavin Shan
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Xu @ 2023-02-21 17:36 UTC (permalink / raw)
  To: Gavin Shan
  Cc: qemu-arm, qemu-devel, pbonzini, peter.maydell, david, philmd,
	mst, cohuck, quintela, dgilbert, maz, zhenyzha, shan.gavin

On Mon, Feb 13, 2023 at 08:39:21AM +0800, Gavin Shan wrote:
> The global dirty log synchronization is used when KVM and dirty ring
> are enabled. There is a particularity for ARM64 where the backup
> bitmap is used to track dirty pages in non-running-vcpu situations.
> It means the dirty ring works with the combination of ring buffer
> and backup bitmap. The dirty bits in the backup bitmap needs to
> collected in the last stage of live migration.
> 
> In order to identify the last stage of live migration and pass it
> down, an extra parameter is added to the relevant functions and
> callbacks. This last stage indicator isn't used until the dirty
> ring is enabled in the subsequent patches.
> 
> No functional change intended.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

One trivial thing to mention below.

> ---
>  accel/kvm/kvm-all.c   |  2 +-
>  include/exec/memory.h |  5 +++--
>  migration/dirtyrate.c |  4 ++--
>  migration/ram.c       | 20 ++++++++++----------
>  softmmu/memory.c      | 10 +++++-----
>  5 files changed, 21 insertions(+), 20 deletions(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 9b26582655..01a6a026af 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -1554,7 +1554,7 @@ static void kvm_log_sync(MemoryListener *listener,
>      kvm_slots_unlock();
>  }
>  
> -static void kvm_log_sync_global(MemoryListener *l)
> +static void kvm_log_sync_global(MemoryListener *l, bool last_stage)
>  {
>      KVMMemoryListener *kml = container_of(l, KVMMemoryListener, listener);
>      KVMState *s = kvm_state;
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 2e602a2fad..75b2fd9f48 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -929,8 +929,9 @@ struct MemoryListener {
>       * its @log_sync must be NULL.  Vice versa.
>       *
>       * @listener: The #MemoryListener.
> +     * @last_stage: The last stage to synchronize the log during migration

IMHO it may be important to mention the vcpu status here: the caller
guarantees to call with last_stage==true only once, and only after all vcpus
are stopped (and vcpus will not be started again if migration succeeded).

>       */
> -    void (*log_sync_global)(MemoryListener *listener);
> +    void (*log_sync_global)(MemoryListener *listener, bool last_stage);

Thanks,

-- 
Peter Xu




* Re: [PATCH v1 3/6] kvm: Synchronize the backup bitmap in the last stage
  2023-02-13  0:39 ` [PATCH v1 3/6] kvm: Synchronize the backup bitmap in the last stage Gavin Shan
@ 2023-02-21 17:46   ` Peter Xu
  2023-02-21 23:44     ` Gavin Shan
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Xu @ 2023-02-21 17:46 UTC (permalink / raw)
  To: Gavin Shan
  Cc: qemu-arm, qemu-devel, pbonzini, peter.maydell, david, philmd,
	mst, cohuck, quintela, dgilbert, maz, zhenyzha, shan.gavin

On Mon, Feb 13, 2023 at 08:39:22AM +0800, Gavin Shan wrote:
> In the last stage of live migration or memory slot removal, the
> backup bitmap needs to be synchronized when it has been enabled.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  accel/kvm/kvm-all.c      | 11 +++++++++++
>  include/sysemu/kvm_int.h |  1 +
>  2 files changed, 12 insertions(+)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 01a6a026af..b5e12de522 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -1352,6 +1352,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
>                   */
>                  if (kvm_state->kvm_dirty_ring_size) {
>                      kvm_dirty_ring_reap_locked(kvm_state, NULL);
> +                    if (kvm_state->kvm_dirty_ring_with_bitmap) {
> +                        kvm_slot_sync_dirty_pages(mem);
> +                        kvm_slot_get_dirty_log(kvm_state, mem);
> +                    }
>                  } else {
>                      kvm_slot_get_dirty_log(kvm_state, mem);
>                  }

IIUC after the memory atomic update changes land in QEMU, we may not need
this sync at all.

My understanding is that we sync dirty log here only because of non-atomic
updates happening in the past and we may lose dirty bits unexpectedly.
Maybe Paolo knows.

But that needs some more justification and history digging, so definitely
more suitable to leave it for later and separate discussion.

Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks,

> @@ -1573,6 +1577,12 @@ static void kvm_log_sync_global(MemoryListener *l, bool last_stage)
>          mem = &kml->slots[i];
>          if (mem->memory_size && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
>              kvm_slot_sync_dirty_pages(mem);
> +
> +            if (s->kvm_dirty_ring_with_bitmap && last_stage &&
> +                kvm_slot_get_dirty_log(s, mem)) {
> +                kvm_slot_sync_dirty_pages(mem);
> +            }
> +
>              /*
>               * This is not needed by KVM_GET_DIRTY_LOG because the
>               * ioctl will unconditionally overwrite the whole region.
> @@ -3701,6 +3711,7 @@ static void kvm_accel_instance_init(Object *obj)
>      s->kernel_irqchip_split = ON_OFF_AUTO_AUTO;
>      /* KVM dirty ring is by default off */
>      s->kvm_dirty_ring_size = 0;
> +    s->kvm_dirty_ring_with_bitmap = false;
>      s->notify_vmexit = NOTIFY_VMEXIT_OPTION_RUN;
>      s->notify_window = 0;
>  }
> diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
> index 60b520a13e..fdd5b1bde0 100644
> --- a/include/sysemu/kvm_int.h
> +++ b/include/sysemu/kvm_int.h
> @@ -115,6 +115,7 @@ struct KVMState
>      } *as;
>      uint64_t kvm_dirty_ring_bytes;  /* Size of the per-vcpu dirty ring */
>      uint32_t kvm_dirty_ring_size;   /* Number of dirty GFNs per ring */
> +    bool kvm_dirty_ring_with_bitmap;
>      struct KVMDirtyRingReaper reaper;
>      NotifyVmexitOption notify_vmexit;
>      uint32_t notify_window;
> -- 
> 2.23.0
> 

-- 
Peter Xu




* Re: [PATCH v1 4/6] kvm: Add helper kvm_dirty_ring_init()
  2023-02-13  0:39 ` [PATCH v1 4/6] kvm: Add helper kvm_dirty_ring_init() Gavin Shan
@ 2023-02-21 20:12   ` Peter Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Xu @ 2023-02-21 20:12 UTC (permalink / raw)
  To: Gavin Shan
  Cc: qemu-arm, qemu-devel, pbonzini, peter.maydell, david, philmd,
	mst, cohuck, quintela, dgilbert, maz, zhenyzha, shan.gavin

On Mon, Feb 13, 2023 at 08:39:23AM +0800, Gavin Shan wrote:
> Due to multiple capabilities associated with the dirty ring for different
> architectures: KVM_CAP_DIRTY_{LOG_RING, LOG_RING_ACQ_REL} for x86 and
> arm64 separately. There will be more to be done in order to support the
> dirty ring for arm64.
> 
> Lets add helper kvm_dirty_ring_init() to enable the dirty ring. With this,
> the code looks a bit clean.
> 
> No functional change intended.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH v1 1/6] linux-headers: Update for dirty ring
  2023-02-21 16:30   ` Peter Maydell
@ 2023-02-21 23:18     ` Gavin Shan
  2023-02-22  8:49       ` Cornelia Huck
  0 siblings, 1 reply; 25+ messages in thread
From: Gavin Shan @ 2023-02-21 23:18 UTC (permalink / raw)
  To: Peter Maydell
  Cc: qemu-arm, qemu-devel, pbonzini, peterx, david, philmd, mst,
	cohuck, quintela, dgilbert, maz, zhenyzha, shan.gavin

On 2/22/23 3:30 AM, Peter Maydell wrote:
> On Mon, 13 Feb 2023 at 00:39, Gavin Shan <gshan@redhat.com> wrote:
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>>   linux-headers/asm-arm64/kvm.h | 1 +
>>   linux-headers/linux/kvm.h     | 2 ++
>>   2 files changed, 3 insertions(+)
> 
> For this to be a non-RFC patch, this needs to be a proper
> sync of the headers against an upstream kernel tree.
> (By-hand tweaks are fine for purposes of working on
> and getting patchsets reviewed.)
> 

Yes, I vaguely remember there is a script to synchronize the Linux header
files, which is './scripts/update-linux-headers.sh'. I think I need to run
the following command to update?

   # ./scripts/update-linux-headers.sh <LINUX_PATH> <QEMU_PATH>

Thanks,
Gavin




* Re: [PATCH v1 2/6] migration: Add last stage indicator to global dirty log synchronization
  2023-02-21 17:36   ` Peter Xu
@ 2023-02-21 23:20     ` Gavin Shan
  0 siblings, 0 replies; 25+ messages in thread
From: Gavin Shan @ 2023-02-21 23:20 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-arm, qemu-devel, pbonzini, peter.maydell, david, philmd,
	mst, cohuck, quintela, dgilbert, maz, zhenyzha, shan.gavin

On 2/22/23 4:36 AM, Peter Xu wrote:
> On Mon, Feb 13, 2023 at 08:39:21AM +0800, Gavin Shan wrote:
>> The global dirty log synchronization is used when KVM and dirty ring
>> are enabled. There is a particularity for ARM64 where the backup
>> bitmap is used to track dirty pages in non-running-vcpu situations.
>> It means the dirty ring works with the combination of ring buffer
>> and backup bitmap. The dirty bits in the backup bitmap needs to
>> collected in the last stage of live migration.
>>
>> In order to identify the last stage of live migration and pass it
>> down, an extra parameter is added to the relevant functions and
>> callbacks. This last stage indicator isn't used until the dirty
>> ring is enabled in the subsequent patches.
>>
>> No functional change intended.
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> 
> One trivial thing to mention below.
> 
>> ---
>>   accel/kvm/kvm-all.c   |  2 +-
>>   include/exec/memory.h |  5 +++--
>>   migration/dirtyrate.c |  4 ++--
>>   migration/ram.c       | 20 ++++++++++----------
>>   softmmu/memory.c      | 10 +++++-----
>>   5 files changed, 21 insertions(+), 20 deletions(-)
>>
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index 9b26582655..01a6a026af 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -1554,7 +1554,7 @@ static void kvm_log_sync(MemoryListener *listener,
>>       kvm_slots_unlock();
>>   }
>>   
>> -static void kvm_log_sync_global(MemoryListener *l)
>> +static void kvm_log_sync_global(MemoryListener *l, bool last_stage)
>>   {
>>       KVMMemoryListener *kml = container_of(l, KVMMemoryListener, listener);
>>       KVMState *s = kvm_state;
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index 2e602a2fad..75b2fd9f48 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -929,8 +929,9 @@ struct MemoryListener {
>>        * its @log_sync must be NULL.  Vice versa.
>>        *
>>        * @listener: The #MemoryListener.
>> +     * @last_stage: The last stage to synchronize the log during migration
> 
> IMHO it may be important to mention the vcpu status here that the caller
> guarantees to call the last_stage==true only once, only after all vcpus are
> stopped (and vcpus will not be started again if migration succeeded).
> 

Yes, I will update the comments in next revision accordingly.
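
Something along these lines, as a rough sketch of the wording (the exact
text is to be settled in the next revision):

     * @last_stage: The last stage to synchronize the log during migration.
     *   The caller guarantees to pass true at most once, and only after
     *   all vCPUs have been stopped; the vCPUs won't be started again if
     *   the migration succeeds.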

>>        */
>> -    void (*log_sync_global)(MemoryListener *listener);
>> +    void (*log_sync_global)(MemoryListener *listener, bool last_stage);

Thanks,
Gavin




* Re: [PATCH v1 3/6] kvm: Synchronize the backup bitmap in the last stage
  2023-02-21 17:46   ` Peter Xu
@ 2023-02-21 23:44     ` Gavin Shan
  2023-02-21 23:58       ` Peter Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Gavin Shan @ 2023-02-21 23:44 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-arm, qemu-devel, pbonzini, peter.maydell, david, philmd,
	mst, cohuck, quintela, dgilbert, maz, zhenyzha, shan.gavin

On 2/22/23 4:46 AM, Peter Xu wrote:
> On Mon, Feb 13, 2023 at 08:39:22AM +0800, Gavin Shan wrote:
>> In the last stage of live migration or memory slot removal, the
>> backup bitmap needs to be synchronized when it has been enabled.
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>>   accel/kvm/kvm-all.c      | 11 +++++++++++
>>   include/sysemu/kvm_int.h |  1 +
>>   2 files changed, 12 insertions(+)
>>
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index 01a6a026af..b5e12de522 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -1352,6 +1352,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
>>                    */
>>                   if (kvm_state->kvm_dirty_ring_size) {
>>                       kvm_dirty_ring_reap_locked(kvm_state, NULL);
>> +                    if (kvm_state->kvm_dirty_ring_with_bitmap) {
>> +                        kvm_slot_sync_dirty_pages(mem);
>> +                        kvm_slot_get_dirty_log(kvm_state, mem);
>> +                    }
>>                   } else {
>>                       kvm_slot_get_dirty_log(kvm_state, mem);
>>                   }
> 
> IIUC after the memory atomic update changes lands QEMU, we may not need
> this sync at all.
> 
> My understanding is that we sync dirty log here only because of non-atomic
> updates happening in the past and we may lose dirty bits unexpectedly.
> Maybe Paolo knows.
> 
> But that needs some more justification and history digging, so definitely
> more suitable to leave it for later and separate discussion.
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> 

Peter, could you please give some hints to help me understand the atomic
and non-atomic updates here? Ok, I will drop this part of the changes in the
next revision with the assumption that we have atomic updates supported for
ARM64.

Thanks,
Gavin

> 
>> @@ -1573,6 +1577,12 @@ static void kvm_log_sync_global(MemoryListener *l, bool last_stage)
>>           mem = &kml->slots[i];
>>           if (mem->memory_size && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
>>               kvm_slot_sync_dirty_pages(mem);
>> +
>> +            if (s->kvm_dirty_ring_with_bitmap && last_stage &&
>> +                kvm_slot_get_dirty_log(s, mem)) {
>> +                kvm_slot_sync_dirty_pages(mem);
>> +            }
>> +
>>               /*
>>                * This is not needed by KVM_GET_DIRTY_LOG because the
>>                * ioctl will unconditionally overwrite the whole region.
>> @@ -3701,6 +3711,7 @@ static void kvm_accel_instance_init(Object *obj)
>>       s->kernel_irqchip_split = ON_OFF_AUTO_AUTO;
>>       /* KVM dirty ring is by default off */
>>       s->kvm_dirty_ring_size = 0;
>> +    s->kvm_dirty_ring_with_bitmap = false;
>>       s->notify_vmexit = NOTIFY_VMEXIT_OPTION_RUN;
>>       s->notify_window = 0;
>>   }
>> diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
>> index 60b520a13e..fdd5b1bde0 100644
>> --- a/include/sysemu/kvm_int.h
>> +++ b/include/sysemu/kvm_int.h
>> @@ -115,6 +115,7 @@ struct KVMState
>>       } *as;
>>       uint64_t kvm_dirty_ring_bytes;  /* Size of the per-vcpu dirty ring */
>>       uint32_t kvm_dirty_ring_size;   /* Number of dirty GFNs per ring */
>> +    bool kvm_dirty_ring_with_bitmap;
>>       struct KVMDirtyRingReaper reaper;
>>       NotifyVmexitOption notify_vmexit;
>>       uint32_t notify_window;
>> -- 
>> 2.23.0
>>
> 




* Re: [PATCH v1 3/6] kvm: Synchronize the backup bitmap in the last stage
  2023-02-21 23:44     ` Gavin Shan
@ 2023-02-21 23:58       ` Peter Xu
  2023-02-22  6:06         ` Gavin Shan
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Xu @ 2023-02-21 23:58 UTC (permalink / raw)
  To: Gavin Shan
  Cc: qemu-arm, qemu-devel, pbonzini, peter.maydell, david, philmd,
	mst, cohuck, quintela, dgilbert, maz, zhenyzha, shan.gavin

Hi, Gavin,

On Wed, Feb 22, 2023 at 10:44:07AM +1100, Gavin Shan wrote:
> Peter, could you please give me some hints to understand the atomic and
> non-atomic updates here? OK, I will drop this part of the changes in the next
> revision, on the assumption that atomic updates are supported for ARM64.

See commit f39b7d2b96.  Please don't remove the change in this patch.

The comment was just something I thought about when reading, not something
I suggested to change.

If we were to remove it, we'd need to remove the whole chunk, not just your
changes here.  Still, please take it with a grain of salt until someone can
confirm it, because I may be missing something here.

In short: before we know anything solidly, your current code is exactly
correct, AFAICT.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 5/6] hw/arm/virt: Enable backup bitmap for dirty ring
  2023-02-21 16:27   ` Peter Maydell
@ 2023-02-22  4:35     ` Gavin Shan
  2023-02-22 15:54       ` Peter Maydell
  0 siblings, 1 reply; 25+ messages in thread
From: Gavin Shan @ 2023-02-22  4:35 UTC (permalink / raw)
  To: Peter Maydell
  Cc: qemu-arm, qemu-devel, pbonzini, peterx, david, philmd, mst,
	cohuck, quintela, dgilbert, maz, zhenyzha, shan.gavin

On 2/22/23 3:27 AM, Peter Maydell wrote:
> On Mon, 13 Feb 2023 at 00:40, Gavin Shan <gshan@redhat.com> wrote:
>>
>> When KVM device "kvm-arm-gicv3" or "arm-its-kvm" is used, we have to
>> enable the backup bitmap for the dirty ring. Otherwise, the migration
>> will fail because those two devices are using the backup bitmap to track
>> dirty guest memory, corresponding to various hardware tables.
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> Reviewed-by: Juan Quintela <quintela@redhat.com>
>> ---
>>   hw/arm/virt.c        | 26 ++++++++++++++++++++++++++
>>   target/arm/kvm64.c   | 25 +++++++++++++++++++++++++
>>   target/arm/kvm_arm.h | 12 ++++++++++++
>>   3 files changed, 63 insertions(+)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 75f28947de..ea9bd98a65 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -2024,6 +2024,8 @@ static void machvirt_init(MachineState *machine)
>>       VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(machine);
>>       MachineClass *mc = MACHINE_GET_CLASS(machine);
>>       const CPUArchIdList *possible_cpus;
>> +    const char *gictype = NULL;
>> +    const char *itsclass = NULL;
>>       MemoryRegion *sysmem = get_system_memory();
>>       MemoryRegion *secure_sysmem = NULL;
>>       MemoryRegion *tag_sysmem = NULL;
>> @@ -2071,6 +2073,30 @@ static void machvirt_init(MachineState *machine)
>>        */
>>       finalize_gic_version(vms);
>>
>> +    /*
>> +     * When "kvm-arm-gicv3" or "arm-its-kvm" is used, the backup dirty
>> +     * bitmap has to be enabled for KVM dirty ring, before any memory
>> +     * slot is added. Otherwise, the migration will fail with the dirty
>> +     * ring.
>> +     */
>> +    if (kvm_enabled()) {
>> +        if (vms->gic_version != VIRT_GIC_VERSION_2) {
>> +            gictype = gicv3_class_name();
>> +        }
>> +
>> +        if (vms->gic_version != VIRT_GIC_VERSION_2 && vms->its) {
>> +            itsclass = its_class_name();
>> +        }
>> +
>> +        if (((gictype && !strcmp(gictype, "kvm-arm-gicv3")) ||
>> +             (itsclass && !strcmp(itsclass, "arm-its-kvm"))) &&
>> +            !kvm_arm_enable_dirty_ring_with_bitmap()) {
>> +            error_report("Failed to enable the backup bitmap for "
>> +                         "KVM dirty ring");
>> +            exit(1);
>> +        }
>> +    }
> 
> Why does this need to be board-specific code? Is there
> some way we can just do the right thing automatically?
> Why does the GIC/ITS matter?
> 
> The kernel should already know whether we have asked it
> to do something that needs this extra extension, so
> I think we ought to be able in the generic "enable the
> dirty ring" code say "if the kernel says we need this
> extra thing, also enable this extra thing". Or if that's
> too early, we can do the extra part in a generic hook a
> bit later.
> 
> In the future there might be other things, presumably,
> that need the backup bitmap, so it would be more future
> proof not to need to also change QEMU to add extra
> logic checks that duplicate the logic the kernel already has.
> 

When the dirty ring is enabled, a per-vcpu buffer is used to track the dirty pages.
The prerequisite for using the per-vcpu buffer is a running VCPU context. As far as
we know, there are two cases where no running VCPU context exists and the backup
bitmap extension is needed: (a) saving/restoring the GICv3 tables; (b) saving/restoring
the ITS tables. These two cases are related to the KVM devices "kvm-arm-gicv3" and
"arm-its-kvm", which are only needed by the virt machine at present. So the backup
bitmap extension isn't needed for other boards.

The host kernel always exports the capability KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
for ARM64. The capability isn't exported for x86 because it isn't needed there. The
generic path to enable the extension would be in kvm_init(). There, I think, the
extension would be enabled unconditionally whenever the host kernel exports it. It
means there will be unnecessary overhead to synchronize the backup bitmap when the
aforementioned KVM devices aren't used, but the overhead should be very small and
acceptable. The only concern is that the host kernel has to allocate the backup
bitmap, which would be completely unused. Please let me know your thoughts, Peter.


qemu_init
   qemu_create_machine
     :
     :
   configure_accelerators
     do_configure_accelerator
       accel_init_machine
         kvm_init                            // Where the dirty ring is enabled. Would be the best
           kvm_arch_init                     // place to enable the backup bitmap extension, regardless
     :                                       // of whether 'kvm-arm-gicv3' or 'arm-its-kvm' is used
     :
   qmp_x_exit_preconfig
     qemu_init_board
       machine_run_board_init
         machvirt_init                      // memory slots are added here, and this is where we know
     :                                      // whether the KVM devices are used
     :
   accel_setup_post                         // no backend for KVM yet, xen only

Note that the capability KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP can't be enabled unless
the dirty ring has already been enabled, and it can't be enabled once memory slots
have been added.
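
To make the ordering constraint above concrete, here is a rough sketch of what the
"enable it unconditionally in kvm_init()" option could look like, reusing the existing
kvm_vm_check_extension() and kvm_vm_enable_cap() helpers. The helper name and its exact
placement are assumptions for illustration only, not the code of this series:

/*
 * Illustrative sketch: called from kvm_init() right after the dirty ring
 * itself has been enabled and before any memory slot is registered.
 */
static int kvm_dirty_ring_enable_bitmap(KVMState *s)
{
    int ret;

    if (!s->kvm_dirty_ring_size ||
        !kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP)) {
        return 0;    /* dirty ring disabled, or backup bitmap not supported */
    }

    ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP, 0);
    if (ret) {
        error_report("Enabling the backup bitmap for KVM dirty ring failed: %s",
                     strerror(-ret));
        return ret;
    }

    s->kvm_dirty_ring_with_bitmap = true;
    return 0;
}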

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 3/6] kvm: Synchronize the backup bitmap in the last stage
  2023-02-21 23:58       ` Peter Xu
@ 2023-02-22  6:06         ` Gavin Shan
  0 siblings, 0 replies; 25+ messages in thread
From: Gavin Shan @ 2023-02-22  6:06 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-arm, qemu-devel, pbonzini, peter.maydell, david, philmd,
	mst, cohuck, quintela, dgilbert, maz, zhenyzha, shan.gavin

On 2/22/23 10:58 AM, Peter Xu wrote:
> On Wed, Feb 22, 2023 at 10:44:07AM +1100, Gavin Shan wrote:
>> Peter, could you please give me some hints to understand the atomic and
>> non-atomic updates here? OK, I will drop this part of the changes in the next
>> revision, on the assumption that atomic updates are supported for ARM64.
> 
> See commit f39b7d2b96.  Please don't remove the change in this patch.
> 
> The comment was just something I thought about when reading, not something
> I suggested to change.
> 
> If we were to remove it, we'd need to remove the whole chunk, not just your
> changes here.  Still, please take it with a grain of salt until someone can
> confirm it, because I may be missing something here.
> 
> In short: before we know anything solidly, your current code is exactly
> correct, AFAICT.
> 

Thanks, Peter. I think that can all be left for later. I will keep the changes,
with your R-b, in the next revision :)

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 1/6] linux-headers: Update for dirty ring
  2023-02-21 23:18     ` Gavin Shan
@ 2023-02-22  8:49       ` Cornelia Huck
  2023-02-22  9:03         ` Gavin Shan
  0 siblings, 1 reply; 25+ messages in thread
From: Cornelia Huck @ 2023-02-22  8:49 UTC (permalink / raw)
  To: Gavin Shan, Peter Maydell
  Cc: qemu-arm, qemu-devel, pbonzini, peterx, david, philmd, mst,
	quintela, dgilbert, maz, zhenyzha, shan.gavin

On Wed, Feb 22 2023, Gavin Shan <gshan@redhat.com> wrote:

> On 2/22/23 3:30 AM, Peter Maydell wrote:
>> On Mon, 13 Feb 2023 at 00:39, Gavin Shan <gshan@redhat.com> wrote:
>>>
>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>> ---
>>>   linux-headers/asm-arm64/kvm.h | 1 +
>>>   linux-headers/linux/kvm.h     | 2 ++
>>>   2 files changed, 3 insertions(+)
>> 
>> For this to be a non-RFC patch, this needs to be a proper
>> sync of the headers against an upstream kernel tree.
>> (By-hand tweaks are fine for purposes of working on
>> and getting patchsets reviewed.)
>> 
>
> Yes, I vaguely remember there is script to synchronize linux header files,
> which is './scripts/update-linux-headers.sh'. I think I need to run the
> following command to update?
>
>    # ./scripts/update-linux-headers.sh <LINUX_PATH> <QEMU_PATH>

Yes, and please include a reference to the exact Linux version you used
in the commit :)



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 1/6] linux-headers: Update for dirty ring
  2023-02-22  8:49       ` Cornelia Huck
@ 2023-02-22  9:03         ` Gavin Shan
  0 siblings, 0 replies; 25+ messages in thread
From: Gavin Shan @ 2023-02-22  9:03 UTC (permalink / raw)
  To: Cornelia Huck, Peter Maydell
  Cc: qemu-arm, qemu-devel, pbonzini, peterx, david, philmd, mst,
	quintela, dgilbert, maz, zhenyzha, shan.gavin

On 2/22/23 7:49 PM, Cornelia Huck wrote:
> On Wed, Feb 22 2023, Gavin Shan <gshan@redhat.com> wrote:
> 
>> On 2/22/23 3:30 AM, Peter Maydell wrote:
>>> On Mon, 13 Feb 2023 at 00:39, Gavin Shan <gshan@redhat.com> wrote:
>>>>
>>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>>> ---
>>>>    linux-headers/asm-arm64/kvm.h | 1 +
>>>>    linux-headers/linux/kvm.h     | 2 ++
>>>>    2 files changed, 3 insertions(+)
>>>
>>> For this to be a non-RFC patch, this needs to be a proper
>>> sync of the headers against an upstream kernel tree.
>>> (By-hand tweaks are fine for purposes of working on
>>> and getting patchsets reviewed.)
>>>
>>
>> Yes, I vaguely remember there is script to synchronize linux header files,
>> which is './scripts/update-linux-headers.sh'. I think I need to run the
>> following command to update?
>>
>>     # ./scripts/update-linux-headers.sh <LINUX_PATH> <QEMU_PATH>
> 
> Yes, and please include a reference to the exact Linux version you used
> in the commit :)
> 

Nice, thanks, Connie :)



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 5/6] hw/arm/virt: Enable backup bitmap for dirty ring
  2023-02-22  4:35     ` Gavin Shan
@ 2023-02-22 15:54       ` Peter Maydell
  2023-02-23  0:52         ` Gavin Shan
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Maydell @ 2023-02-22 15:54 UTC (permalink / raw)
  To: Gavin Shan
  Cc: qemu-arm, qemu-devel, pbonzini, peterx, david, philmd, mst,
	cohuck, quintela, dgilbert, maz, zhenyzha, shan.gavin

On Wed, 22 Feb 2023 at 04:36, Gavin Shan <gshan@redhat.com> wrote:
>
> On 2/22/23 3:27 AM, Peter Maydell wrote:
> > Why does this need to be board-specific code? Is there
> > some way we can just do the right thing automatically?
> > Why does the GIC/ITS matter?
> >
> > The kernel should already know whether we have asked it
> > to do something that needs this extra extension, so
> > I think we ought to be able in the generic "enable the
> > dirty ring" code say "if the kernel says we need this
> > extra thing, also enable this extra thing". Or if that's
> > too early, we can do the extra part in a generic hook a
> > bit later.
> >
> > In the future there might be other things, presumably,
> > that need the backup bitmap, so it would be more future
> > proof not to need to also change QEMU to add extra
> > logic checks that duplicate the logic the kernel already has.
> >
>
> When the dirty ring is enabled, a per-vcpu buffer is used to track the dirty pages.
> The prerequisite for using the per-vcpu buffer is a running VCPU context. As far as
> we know, there are two cases where no running VCPU context exists and the backup
> bitmap extension is needed: (a) saving/restoring the GICv3 tables; (b) saving/restoring
> the ITS tables. These two cases are related to the KVM devices "kvm-arm-gicv3" and
> "arm-its-kvm", which are only needed by the virt machine at present. So the backup
> bitmap extension isn't needed for other boards.

But we might have to for other boards we add later. We shouldn't
put code in per-board if it's not really board specific.

Moreover, I think "we need the backup bitmap if the kernel is
using its GICv3 or ITS implementation" is a kernel implementation
detail. It seems to me that it would be cleaner if QEMU didn't
have to hardcode "we happen to know that these are the situations
when we need to do that". A better API would be "ask the kernel
'do we need this?' and enable it if it says 'yes'". The kernel
knows what its implementations of ITS and GICv3 (and perhaps
future in-kernel memory-using devices) require, after all.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 5/6] hw/arm/virt: Enable backup bitmap for dirty ring
  2023-02-22 15:54       ` Peter Maydell
@ 2023-02-23  0:52         ` Gavin Shan
  2023-02-23 11:51           ` Peter Maydell
  0 siblings, 1 reply; 25+ messages in thread
From: Gavin Shan @ 2023-02-23  0:52 UTC (permalink / raw)
  To: Peter Maydell
  Cc: qemu-arm, qemu-devel, pbonzini, peterx, david, philmd, mst,
	cohuck, quintela, dgilbert, maz, zhenyzha, shan.gavin

On 2/23/23 2:54 AM, Peter Maydell wrote:
> On Wed, 22 Feb 2023 at 04:36, Gavin Shan <gshan@redhat.com> wrote:
>>
>> On 2/22/23 3:27 AM, Peter Maydell wrote:
>>> Why does this need to be board-specific code? Is there
>>> some way we can just do the right thing automatically?
>>> Why does the GIC/ITS matter?
>>>
>>> The kernel should already know whether we have asked it
>>> to do something that needs this extra extension, so
>>> I think we ought to be able in the generic "enable the
>>> dirty ring" code say "if the kernel says we need this
>>> extra thing, also enable this extra thing". Or if that's
>>> too early, we can do the extra part in a generic hook a
>>> bit later.
>>>
>>> In the future there might be other things, presumably,
>>> that need the backup bitmap, so it would be more future
>>> proof not to need to also change QEMU to add extra
>>> logic checks that duplicate the logic the kernel already has.
>>>
>>
>> When the dirty ring is enabled, a per-vcpu buffer is used to track the dirty pages.
>> The prerequisite for using the per-vcpu buffer is a running VCPU context. As far as
>> we know, there are two cases where no running VCPU context exists and the backup
>> bitmap extension is needed: (a) saving/restoring the GICv3 tables; (b) saving/restoring
>> the ITS tables. These two cases are related to the KVM devices "kvm-arm-gicv3" and
>> "arm-its-kvm", which are only needed by the virt machine at present. So the backup
>> bitmap extension isn't needed for other boards.
> 
> But we might have to for other boards we add later. We shouldn't
> put code in per-board if it's not really board specific.
> 
> Moreover, I think "we need the backup bitmap if the kernel is
> using its GICv3 or ITS implementation" is a kernel implementation
> detail. It seems to me that it would be cleaner if QEMU didn't
> have to hardcode "we happen to know that these are the situations
> when we need to do that". A better API would be "ask the kernel
> 'do we need this?' and enable it if it says 'yes'". The kernel
> knows what its implementations of ITS and GICv3 (and perhaps
> future in-kernel memory-using devices) require, after all.
> 

Well, as far as we know, the backup bitmap extension is only required by the
'kvm-arm-gicv3' and 'arm-its-kvm' devices. Those two devices are only used by the virt
machine at present, so it's a board-specific requirement. I'm not sure about the future.
We may need to enable the extension for other devices and other boards, and at that
point the requirement wouldn't be board specific any more. However, the future is uncertain.

In order to cover the future case where the extension is needed by other boards, the
best way I can figure out is to enable the extension in the generic path in kvm_init(),
if the extension is supported by the host kernel. In this way, unnecessary overhead is
introduced for those boards where 'kvm-arm-gicv3' and 'arm-its-kvm' aren't used, but
the overhead should be very small and acceptable. Note that the host kernel doesn't know
whether the 'kvm-arm-gicv3' or 'arm-its-kvm' device is needed by the board in kvm_init(),
which is the generic path.

The 'kvm-arm-gicv3' and 'arm-its-kvm' devices are created in machvirt_init(), where
the memory slots are also added. Prior to that function, the host kernel doesn't know
whether the extension is needed by QEMU. It means we have to enable the extension in
machvirt_init(), which is exactly what we're doing. The difference is that QEMU decides
to enable the extension instead of being told to enable it by the host kernel. The host
kernel doesn't have the answer to "do we need to enable the extension?" until
machvirt_init(), where the devices are created. Besides, machvirt_init() isn't the
generic path if we want to enable the extension for all possible boards. Furthermore,
the extension can't be enabled once memory slots have been added.
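
For reference, the board path boils down to the small arch-specific helper that the
patch calls from machvirt_init(), kvm_arm_enable_dirty_ring_with_bitmap(). A plausible
shape for it, assuming it returns true on success to match that call site, could be
the following sketch (not necessarily the exact target/arm/kvm64.c code in this series):

bool kvm_arm_enable_dirty_ring_with_bitmap(void)
{
    /* Ask KVM to back the per-vcpu dirty rings with the backup bitmap. */
    if (kvm_vm_enable_cap(kvm_state, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP, 0)) {
        return false;
    }

    kvm_state->kvm_dirty_ring_with_bitmap = true;
    return true;
}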

In summary, the best way I can figure out is to enable the extension in kvm_init() if it
is supported by the host kernel, to cover all possible boards for future cases. Otherwise,
we keep doing what we're doing and enable the extension in machvirt_init(). Please let me
know your thoughts, Peter :)

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 5/6] hw/arm/virt: Enable backup bitmap for dirty ring
  2023-02-23  0:52         ` Gavin Shan
@ 2023-02-23 11:51           ` Peter Maydell
  2023-02-24 10:19             ` Gavin Shan
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Maydell @ 2023-02-23 11:51 UTC (permalink / raw)
  To: Gavin Shan
  Cc: qemu-arm, qemu-devel, pbonzini, peterx, david, philmd, mst,
	cohuck, quintela, dgilbert, maz, zhenyzha, shan.gavin

On Thu, 23 Feb 2023 at 00:52, Gavin Shan <gshan@redhat.com> wrote:
>
> On 2/23/23 2:54 AM, Peter Maydell wrote:
> > But we might have to for other boards we add later. We shouldn't
> > put code in per-board if it's not really board specific.
> >
> > Moreover, I think "we need the backup bitmap if the kernel is
> > using its GICv3 or ITS implementation" is a kernel implementation
> > detail. It seems to me that it would be cleaner if QEMU didn't
> > have to hardcode "we happen to know that these are the situations
> > when we need to do that". A better API would be "ask the kernel
> > 'do we need this?' and enable it if it says 'yes'". The kernel
> > knows what its implementations of ITS and GICv3 (and perhaps
> > future in-kernel memory-using devices) require, after all.
> >
>
> Well, as far as we know, the backup bitmap extension is only required by the
> 'kvm-arm-gicv3' and 'arm-its-kvm' devices. Those two devices are only used by the virt
> machine at present, so it's a board-specific requirement. I'm not sure about the future.
> We may need to enable the extension for other devices and other boards, and at that
> point the requirement wouldn't be board specific any more. However, the future is uncertain.

Most boards using KVM are likely to want a GICv3, and
probably an ITS too. A board with no interrupt controller
is useless, and the GICv2 is obsolete.

> In order to cover the future case where the extension is needed by other boards, the
> best way I can figure out is to enable the extension in the generic path in kvm_init(),
> if the extension is supported by the host kernel. In this way, unnecessary overhead is
> introduced for those boards where 'kvm-arm-gicv3' and 'arm-its-kvm' aren't used, but
> the overhead should be very small and acceptable. Note that the host kernel doesn't know
> whether the 'kvm-arm-gicv3' or 'arm-its-kvm' device is needed by the board in kvm_init(),
> which is the generic path.

We can have a generic hook that happens after board init is
done, if we want to do non-board-specific stuff that happens
later. However I suspect that anybody who cares about migration
performance is likely using a GICv3 at least anyway,
so "enable always" should be fine.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v1 5/6] hw/arm/virt: Enable backup bitmap for dirty ring
  2023-02-23 11:51           ` Peter Maydell
@ 2023-02-24 10:19             ` Gavin Shan
  0 siblings, 0 replies; 25+ messages in thread
From: Gavin Shan @ 2023-02-24 10:19 UTC (permalink / raw)
  To: Peter Maydell
  Cc: qemu-arm, qemu-devel, pbonzini, peterx, david, philmd, mst,
	cohuck, quintela, dgilbert, maz, zhenyzha, shan.gavin

On 2/23/23 10:51 PM, Peter Maydell wrote:
> On Thu, 23 Feb 2023 at 00:52, Gavin Shan <gshan@redhat.com> wrote:
>>
>> On 2/23/23 2:54 AM, Peter Maydell wrote:
>>> But we might have to for other boards we add later. We shouldn't
>>> put code in per-board if it's not really board specific.
>>>
>>> Moreover, I think "we need the backup bitmap if the kernel is
>>> using its GICv3 or ITS implementation" is a kernel implementation
>>> detail. It seems to me that it would be cleaner if QEMU didn't
>>> have to hardcode "we happen to know that these are the situations
>>> when we need to do that". A better API would be "ask the kernel
>>> 'do we need this?' and enable it if it says 'yes'". The kernel
>>> knows what its implementations of ITS and GICv3 (and perhaps
>>> future in-kernel memory-using devices) require, after all.
>>>
>>
>> Well, as far as we know, the backup bitmap extension is only required by the
>> 'kvm-arm-gicv3' and 'arm-its-kvm' devices. Those two devices are only used by the virt
>> machine at present, so it's a board-specific requirement. I'm not sure about the future.
>> We may need to enable the extension for other devices and other boards, and at that
>> point the requirement wouldn't be board specific any more. However, the future is uncertain.
> 
> Most boards using KVM are likely to want a GICv3, and
> probably an ITS too. A board with no interrupt controller
> is useless, and the GICv2 is obsolete.
> 

Yes, exactly.

>> In order to cover the future case where the extension is needed by other boards, the
>> best way I can figure out is to enable the extension in the generic path in kvm_init(),
>> if the extension is supported by the host kernel. In this way, unnecessary overhead is
>> introduced for those boards where 'kvm-arm-gicv3' and 'arm-its-kvm' aren't used, but
>> the overhead should be very small and acceptable. Note that the host kernel doesn't know
>> whether the 'kvm-arm-gicv3' or 'arm-its-kvm' device is needed by the board in kvm_init(),
>> which is the generic path.
> 
> We can have a generic hook that happens after board init is
> done, if we want to do non-board-specific stuff that happens
> later. However I suspect that anybody who cares about migration
> performance is likely using a GICv3 at least anyway,
> so "enable always" should be fine.
> 

OK, thanks, Peter. I will always enable it in the next revision. The overhead
is very small and acceptable.

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2023-02-24 10:20 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-13  0:39 [PATCH v1 0/6] hw/arm/virt: Support dirty ring Gavin Shan
2023-02-13  0:39 ` [PATCH v1 1/6] linux-headers: Update for " Gavin Shan
2023-02-21 16:30   ` Peter Maydell
2023-02-21 23:18     ` Gavin Shan
2023-02-22  8:49       ` Cornelia Huck
2023-02-22  9:03         ` Gavin Shan
2023-02-13  0:39 ` [PATCH v1 2/6] migration: Add last stage indicator to global dirty log synchronization Gavin Shan
2023-02-21 17:36   ` Peter Xu
2023-02-21 23:20     ` Gavin Shan
2023-02-13  0:39 ` [PATCH v1 3/6] kvm: Synchronize the backup bitmap in the last stage Gavin Shan
2023-02-21 17:46   ` Peter Xu
2023-02-21 23:44     ` Gavin Shan
2023-02-21 23:58       ` Peter Xu
2023-02-22  6:06         ` Gavin Shan
2023-02-13  0:39 ` [PATCH v1 4/6] kvm: Add helper kvm_dirty_ring_init() Gavin Shan
2023-02-21 20:12   ` Peter Xu
2023-02-13  0:39 ` [PATCH v1 5/6] hw/arm/virt: Enable backup bitmap for dirty ring Gavin Shan
2023-02-21 16:27   ` Peter Maydell
2023-02-22  4:35     ` Gavin Shan
2023-02-22 15:54       ` Peter Maydell
2023-02-23  0:52         ` Gavin Shan
2023-02-23 11:51           ` Peter Maydell
2023-02-24 10:19             ` Gavin Shan
2023-02-13  0:39 ` [PATCH v1 6/6] kvm: Enable dirty ring for arm64 Gavin Shan
2023-02-17  1:59 ` [PATCH v1 0/6] hw/arm/virt: Support dirty ring Zhenyu Zhang
