* [Qemu-devel] [PATCH  0/8] kvm: Fixes, cleanups and live migration
@ 2009-05-01 21:17 Jan Kiszka
  2009-05-01 21:17 ` [Qemu-devel] [PATCH 1/8] kvm: Conditionally apply workaround for KVM slot handling bug Jan Kiszka
                   ` (10 more replies)
  0 siblings, 11 replies; 48+ messages in thread
From: Jan Kiszka @ 2009-05-01 21:17 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Avi Kivity, qemu-devel

Besides refreshed versions of my already posted fixes and cleanups for
KVM, this series comes with patches to enable live migration in KVM
mode. If there is still some migration bit missing compared to qemu-kvm,
please let me know.

Find the patches also at git://git.kiszka.org/qemu.git queues/kvm

Jan Kiszka (8):
      kvm: Conditionally apply workaround for KVM slot handling bug
      kvm: Introduce kvm_set_migration_log
      kvm: Fix dirty log temporary buffer size
      kvm: Rework dirty bitmap synchronization
      kvm: Add missing bits to support live migration
      kvm: Fix framebuffer dirty log sync
      Introduce reset notifier order
      kvm: Rework VCPU reset

 cpu-all.h             |    3 +-
 exec.c                |   14 ++++-
 hw/ac97.c             |    2 +-
 hw/acpi.c             |    2 +-
 hw/adb.c              |    2 +-
 hw/apic.c             |    2 +-
 hw/arm_boot.c         |    2 +-
 hw/axis_dev88.c       |    2 +-
 hw/cirrus_vga.c       |    2 +-
 hw/cs4231.c           |    2 +-
 hw/cs4231a.c          |    2 +-
 hw/cuda.c             |    2 +-
 hw/dma.c              |    2 +-
 hw/dp8393x.c          |    2 +-
 hw/eccmemctl.c        |    2 +-
 hw/eepro100.c         |    2 +-
 hw/es1370.c           |    2 +-
 hw/escc.c             |    4 +-
 hw/esp.c              |    2 +-
 hw/etraxfs.c          |    2 +-
 hw/etraxfs_timer.c    |    2 +-
 hw/fdc.c              |    2 +-
 hw/framebuffer.c      |    5 +--
 hw/fw_cfg.c           |    2 +-
 hw/g364fb.c           |    2 +-
 hw/grackle_pci.c      |    2 +-
 hw/heathrow_pic.c     |    2 +-
 hw/hpet.c             |    2 +-
 hw/hw.h               |    2 +-
 hw/i8254.c            |    2 +-
 hw/i8259.c            |    2 +-
 hw/ide.c              |    8 ++--
 hw/ioapic.c           |    2 +-
 hw/iommu.c            |    2 +-
 hw/lm832x.c           |    2 +-
 hw/m48t59.c           |    2 +-
 hw/mac_dbdma.c        |    2 +-
 hw/mac_nvram.c        |    2 +-
 hw/mips_jazz.c        |    2 +-
 hw/mips_malta.c       |    4 +-
 hw/mips_mipssim.c     |    2 +-
 hw/mips_r4k.c         |    2 +-
 hw/musicpal.c         |    4 +-
 hw/nseries.c          |    2 +-
 hw/omap1.c            |    2 +-
 hw/omap2.c            |    2 +-
 hw/openpic.c          |    4 +-
 hw/parallel.c         |    4 +-
 hw/pc.c               |    2 +-
 hw/pckbd.c            |    4 +-
 hw/pl181.c            |    2 +-
 hw/ppc405_boards.c    |    4 +-
 hw/ppc405_uc.c        |   24 ++++----
 hw/ppc4xx_devs.c      |    6 +-
 hw/ppc4xx_pci.c       |    2 +-
 hw/ppc_newworld.c     |    2 +-
 hw/ppc_oldworld.c     |    2 +-
 hw/ppc_prep.c         |    2 +-
 hw/ps2.c              |    4 +-
 hw/rc4030.c           |    2 +-
 hw/sbi.c              |    2 +-
 hw/serial.c           |    2 +-
 hw/slavio_intctl.c    |    2 +-
 hw/slavio_misc.c      |    2 +-
 hw/slavio_timer.c     |    2 +-
 hw/sparc32_dma.c      |    2 +-
 hw/sun4c_intctl.c     |    2 +-
 hw/sun4m.c            |   10 ++--
 hw/sun4u.c            |    2 +-
 hw/tcx.c              |    2 +-
 hw/tsc2005.c          |    2 +-
 hw/tsc210x.c          |    4 +-
 hw/unin_pci.c         |    2 +-
 hw/usb-ohci.c         |    2 +-
 hw/vga.c              |    2 +-
 hw/virtio.c           |    2 +-
 kvm-all.c             |  139 +++++++++++++++++++++++++++++++++++--------------
 kvm.h                 |    5 +-
 target-i386/machine.c |    4 ++
 target-ppc/machine.c  |    5 ++
 vl.c                  |   16 ++++--
 81 files changed, 240 insertions(+), 155 deletions(-)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Qemu-devel] [PATCH 1/8] kvm: Conditionally apply workaround for KVM slot handling bug
  2009-05-01 21:17 [Qemu-devel] [PATCH 0/8] kvm: Fixes, cleanups and live migration Jan Kiszka
@ 2009-05-01 21:17 ` Jan Kiszka
  2009-05-01 21:17 ` [Qemu-devel] [PATCH 3/8] kvm: Fix dirty log temporary buffer size Jan Kiszka
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 48+ messages in thread
From: Jan Kiszka @ 2009-05-01 21:17 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Avi Kivity, qemu-devel

Only apply the workaround for broken slot joining in KVM if the kernel
does not report the capability that signals the corresponding fix.
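A minimal sketch of the gating pattern, with hypothetical names: fix_probe_fn stands in for kvm_ioctl(s, KVM_CHECK_EXTENSION, KVM_CAP_JOIN_MEMORY_REGIONS_WORKS). The workaround is the default, and it is switched off only when the probe returns a value greater than zero. The real patch additionally wraps the probe in #ifdef KVM_CAP_JOIN_MEMORY_REGIONS_WORKS so that it still builds against older kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the KVM_CHECK_EXTENSION probe:
 * > 0: the kernel advertises the fix, 0: it does not,
 * < 0: the probe itself failed. */
typedef int (*fix_probe_fn)(void);

struct vm_state {
    bool broken_set_mem_region;
};

static void init_slot_workaround(struct vm_state *s, fix_probe_fn probe)
{
    /* Be conservative: keep the workaround unless the kernel positively
     * reports the fix. A probe error is treated like "not supported". */
    s->broken_set_mem_region = true;
    if (probe() > 0) {
        s->broken_set_mem_region = false;
    }
}

/* Example probes, used only to exercise the logic above. */
static int probe_fixed_kernel(void) { return 1; }
static int probe_old_kernel(void) { return 0; }
static int probe_failed(void) { return -1; }
```

Treating a failed probe like an old kernel keeps the behavior safe: at worst the workaround is applied unnecessarily.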

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

 kvm-all.c |   12 +++++++++++-
 1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 36659a9..cde3c5b 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -57,6 +57,7 @@ struct KVMState
     int fd;
     int vmfd;
     int coalesced_mmio;
+    int broken_set_mem_region;
 #ifdef KVM_CAP_SET_GUEST_DEBUG
     struct kvm_sw_breakpoint_head kvm_sw_breakpoints;
 #endif
@@ -398,6 +399,14 @@ int kvm_init(int smp_cpus)
         s->coalesced_mmio = ret;
 #endif
 
+    s->broken_set_mem_region = 1;
+#ifdef KVM_CAP_JOIN_MEMORY_REGIONS_WORKS
+    ret = kvm_ioctl(s, KVM_CHECK_EXTENSION, KVM_CAP_JOIN_MEMORY_REGIONS_WORKS);
+    if (ret > 0) {
+        s->broken_set_mem_region = 0;
+    }
+#endif
+
     ret = kvm_arch_init(s, smp_cpus);
     if (ret < 0)
         goto err;
@@ -631,7 +640,8 @@ void kvm_set_phys_mem(target_phys_addr_t start_addr,
          * address as the first existing one. If not or if some overlapping
          * slot comes around later, we will fail (not seen in practice so far)
          * - and actually require a recent KVM version. */
-        if (old.start_addr == start_addr && old.memory_size < size &&
+        if (s->broken_set_mem_region &&
+            old.start_addr == start_addr && old.memory_size < size &&
             flags < IO_MEM_UNASSIGNED) {
             mem = kvm_alloc_slot(s);
             mem->memory_size = old.memory_size;

* [Qemu-devel] [PATCH 2/8] kvm: Introduce kvm_set_migration_log
  2009-05-01 21:17 [Qemu-devel] [PATCH 0/8] kvm: Fixes, cleanups and live migration Jan Kiszka
                   ` (4 preceding siblings ...)
  2009-05-01 21:17 ` [Qemu-devel] [PATCH 4/8] kvm: Rework dirty bitmap synchronization Jan Kiszka
@ 2009-05-01 21:17 ` Jan Kiszka
  2009-05-01 21:17 ` [Qemu-devel] [PATCH 8/8] kvm: Rework VCPU reset Jan Kiszka
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 48+ messages in thread
From: Jan Kiszka @ 2009-05-01 21:17 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Avi Kivity, qemu-devel

Introduce a global dirty logging flag that enforces logging for all
slots. The live migration code can use it to enable/disable global
logging without destroying the per-slot setting.
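The key idea is that the global flag is ORed into the flags handed to the kernel, while each slot's own flags stay untouched. A minimal sketch with hypothetical names (LOG_DIRTY_PAGES stands in for KVM_MEM_LOG_DIRTY_PAGES, effective_flags() for the computation done in kvm_set_user_memory_region()):

```c
#include <assert.h>
#include <stdbool.h>

#define LOG_DIRTY_PAGES 1u /* stands in for KVM_MEM_LOG_DIRTY_PAGES */

struct slot {
    unsigned flags; /* per-slot setting, e.g. a framebuffer's own logging */
};

static bool migration_log; /* global override, like KVMState::migration_log */

/* Flags that would be passed to the kernel for this slot. The global
 * switch only affects the result, never the stored per-slot value. */
static unsigned effective_flags(const struct slot *s)
{
    unsigned flags = s->flags;

    if (migration_log) {
        flags |= LOG_DIRTY_PAGES;
    }
    return flags;
}
```

Because the per-slot value is never overwritten, turning migration logging off again leaves a framebuffer slot that requested logging on its own still logged.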

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

 kvm-all.c |   46 +++++++++++++++++++++++++++++++++++++++-------
 kvm.h     |    1 +
 2 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index cde3c5b..3844398 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -58,6 +58,7 @@ struct KVMState
     int vmfd;
     int coalesced_mmio;
     int broken_set_mem_region;
+    int migration_log;
 #ifdef KVM_CAP_SET_GUEST_DEBUG
     struct kvm_sw_breakpoint_head kvm_sw_breakpoints;
 #endif
@@ -135,7 +136,9 @@ static int kvm_set_user_memory_region(KVMState *s, KVMSlot *slot)
     mem.memory_size = slot->memory_size;
     mem.userspace_addr = (unsigned long)qemu_get_ram_ptr(slot->phys_offset);
     mem.flags = slot->flags;
-
+    if (s->migration_log) {
+        mem.flags |= KVM_MEM_LOG_DIRTY_PAGES;
+    }
     return kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
 }
 
@@ -196,11 +199,12 @@ int kvm_sync_vcpus(void)
  * dirty pages logging control
  */
 static int kvm_dirty_pages_log_change(target_phys_addr_t phys_addr,
-                                      ram_addr_t size, unsigned flags,
-                                      unsigned mask)
+                                      ram_addr_t size, int flags, int mask)
 {
     KVMState *s = kvm_state;
     KVMSlot *mem = kvm_lookup_matching_slot(s, phys_addr, phys_addr + size);
+    int old_flags;
+
     if (mem == NULL)  {
             fprintf(stderr, "BUG: %s: invalid parameters " TARGET_FMT_plx "-"
                     TARGET_FMT_plx "\n", __func__, phys_addr,
@@ -208,13 +212,19 @@ static int kvm_dirty_pages_log_change(target_phys_addr_t phys_addr,
             return -EINVAL;
     }
 
-    flags = (mem->flags & ~mask) | flags;
-    /* Nothing changed, no need to issue ioctl */
-    if (flags == mem->flags)
-            return 0;
+    old_flags = mem->flags;
 
+    flags = (mem->flags & ~mask) | flags;
     mem->flags = flags;
 
+    /* If nothing changed effectively, no need to issue ioctl */
+    if (s->migration_log) {
+        flags |= KVM_MEM_LOG_DIRTY_PAGES;
+    }
+    if (flags == old_flags) {
+            return 0;
+    }
+
     return kvm_set_user_memory_region(s, mem);
 }
 
@@ -232,6 +242,28 @@ int kvm_log_stop(target_phys_addr_t phys_addr, ram_addr_t size)
                                           KVM_MEM_LOG_DIRTY_PAGES);
 }
 
+int kvm_set_migration_log(int enable)
+{
+    KVMState *s = kvm_state;
+    KVMSlot *mem;
+    int i, err;
+
+    s->migration_log = enable;
+
+    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
+        mem = &s->slots[i];
+
+        if (!!(mem->flags & KVM_MEM_LOG_DIRTY_PAGES) == enable) {
+            continue;
+        }
+        err = kvm_set_user_memory_region(s, mem);
+        if (err) {
+            return err;
+        }
+    }
+    return 0;
+}
+
 /**
  * kvm_physical_sync_dirty_bitmap - Grab dirty bitmap from kernel space
  * This function updates qemu's dirty bitmap using cpu_physical_memory_set_dirty().
diff --git a/kvm.h b/kvm.h
index 0ea2426..6a5f8b3 100644
--- a/kvm.h
+++ b/kvm.h
@@ -45,6 +45,7 @@ void kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
 
 int kvm_log_start(target_phys_addr_t phys_addr, ram_addr_t size);
 int kvm_log_stop(target_phys_addr_t phys_addr, ram_addr_t size);
+int kvm_set_migration_log(int enable);
 
 int kvm_has_sync_mmu(void);
 

* [Qemu-devel] [PATCH 4/8] kvm: Rework dirty bitmap synchronization
  2009-05-01 21:17 [Qemu-devel] [PATCH 0/8] kvm: Fixes, cleanups and live migration Jan Kiszka
                   ` (3 preceding siblings ...)
  2009-05-01 21:17 ` [Qemu-devel] [PATCH 6/8] kvm: Fix framebuffer dirty log sync Jan Kiszka
@ 2009-05-01 21:17 ` Jan Kiszka
  2009-05-03 10:05   ` [Qemu-devel] " Avi Kivity
  2009-05-01 21:17 ` [Qemu-devel] [PATCH 2/8] kvm: Introduce kvm_set_migration_log Jan Kiszka
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 48+ messages in thread
From: Jan Kiszka @ 2009-05-01 21:17 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Avi Kivity, qemu-devel

Extend kvm_physical_sync_dirty_bitmap() so that it can sync across
multiple slots. This is useful for updating the whole dirty log during
migration. Moreover, properly propagate errors through the whole call
chain.
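A simplified model of the multi-slot loop, under these assumptions: slots live in a plain array, the lookup returns the lowest-addressed overlapping slot (the exact selection rule of kvm_lookup_overlapping_slot() is not shown here), and a NULL bitmap simulates a failing KVM_GET_DIRTY_LOG ioctl. All names are hypothetical; QEMU's dirty bitmap is modeled as one byte per guest page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GUEST_PAGE_BITS 12
#define GUEST_PAGE_SIZE ((uint64_t)1 << GUEST_PAGE_BITS)
#define BITS_PER_WORD (sizeof(unsigned long) * 8)

/* Simplified slot: a guest-physical range plus the bitmap the kernel
 * would return for it (one bit per page). NULL simulates an ioctl error. */
struct slot {
    uint64_t start_addr;
    uint64_t memory_size;
    const unsigned long *dirty_bitmap;
};

/* Return the lowest-addressed slot overlapping [start, end), or NULL. */
static const struct slot *lookup_overlapping(const struct slot *slots,
                                             size_t n, uint64_t start,
                                             uint64_t end)
{
    const struct slot *found = NULL;

    for (size_t i = 0; i < n; i++) {
        const struct slot *s = &slots[i];
        if (s->start_addr < end && s->start_addr + s->memory_size > start &&
            (found == NULL || s->start_addr < found->start_addr)) {
            found = s;
        }
    }
    return found;
}

/* Copy the dirty bits of every slot overlapping [start, end) into `dirty`,
 * indexed by guest page number. As in the patch, each overlapping slot is
 * processed as a whole. Returns the number of pages marked, or -1 if a
 * slot's log could not be fetched. */
static int sync_dirty_bitmap(const struct slot *slots, size_t n,
                             uint64_t start, uint64_t end,
                             unsigned char *dirty)
{
    int marked = 0;

    while (start < end) {
        const struct slot *mem = lookup_overlapping(slots, n, start, end);
        if (mem == NULL) {
            break;
        }
        if (mem->dirty_bitmap == NULL) {
            return -1; /* propagate the error instead of ignoring it */
        }
        uint64_t npages = mem->memory_size >> GUEST_PAGE_BITS;
        for (uint64_t nr = 0; nr < npages; nr++) {
            unsigned long word = mem->dirty_bitmap[nr / BITS_PER_WORD];
            if ((word >> (nr % BITS_PER_WORD)) & 1) {
                dirty[(mem->start_addr >> GUEST_PAGE_BITS) + nr] = 1;
                marked++;
            }
        }
        /* Continue with whatever lies behind this slot. */
        start = mem->start_addr + mem->memory_size;
    }
    return marked;
}
```

Advancing start to the end of the processed slot is what guarantees progress; it only visits every slot if the lookup returns slots in ascending address order.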

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

 cpu-all.h |    3 ++-
 exec.c    |    8 +++++--
 kvm-all.c |   73 +++++++++++++++++++++++++++++++++++--------------------------
 kvm.h     |    4 ++-
 4 files changed, 52 insertions(+), 36 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index 0df54b6..e5ed6ff 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -984,7 +984,8 @@ int cpu_physical_memory_set_dirty_tracking(int enable);
 
 int cpu_physical_memory_get_dirty_tracking(void);
 
-void cpu_physical_sync_dirty_bitmap(target_phys_addr_t start_addr, target_phys_addr_t end_addr);
+int cpu_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
+                                   target_phys_addr_t end_addr);
 
 void dump_exec_info(FILE *f,
                     int (*cpu_fprintf)(FILE *f, const char *fmt, ...));
diff --git a/exec.c b/exec.c
index c649381..90b848f 100644
--- a/exec.c
+++ b/exec.c
@@ -1931,10 +1931,14 @@ int cpu_physical_memory_get_dirty_tracking(void)
     return in_migration;
 }
 
-void cpu_physical_sync_dirty_bitmap(target_phys_addr_t start_addr, target_phys_addr_t end_addr)
+int cpu_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
+                                   target_phys_addr_t end_addr)
 {
+    int ret = 0;
+
     if (kvm_enabled())
-        kvm_physical_sync_dirty_bitmap(start_addr, end_addr);
+        ret = kvm_physical_sync_dirty_bitmap(start_addr, end_addr);
+    return ret;
 }
 
 static inline void tlb_update_dirty(CPUTLBEntry *tlb_entry)
diff --git a/kvm-all.c b/kvm-all.c
index 17e5b38..27ad80e 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -272,47 +272,58 @@ int kvm_set_migration_log(int enable)
  * @start_add: start of logged region.
  * @end_addr: end of logged region.
  */
-void kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
-                                    target_phys_addr_t end_addr)
+int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
+                                   target_phys_addr_t end_addr)
 {
     KVMState *s = kvm_state;
-    KVMDirtyLog d;
-    KVMSlot *mem = kvm_lookup_matching_slot(s, start_addr, end_addr);
-    unsigned long alloc_size;
+    unsigned long size, allocated_size = 0;
+    target_phys_addr_t phys_addr;
     ram_addr_t addr;
-    target_phys_addr_t phys_addr = start_addr;
+    KVMDirtyLog d;
+    KVMSlot *mem;
+    int ret = 0;
 
-    dprintf("sync addr: " TARGET_FMT_lx " into %lx\n", start_addr,
-            mem->phys_offset);
-    if (mem == NULL) {
-            fprintf(stderr, "BUG: %s: invalid parameters " TARGET_FMT_plx "-"
-                    TARGET_FMT_plx "\n", __func__, phys_addr, end_addr - 1);
-            return;
-    }
+    d.dirty_bitmap = NULL;
+    while (start_addr < end_addr) {
+        mem = kvm_lookup_overlapping_slot(s, start_addr, end_addr);
+        if (mem == NULL) {
+            break;
+        }
 
-    alloc_size = ((mem->memory_size >> TARGET_PAGE_BITS) + 7) / 8;
-    d.dirty_bitmap = qemu_mallocz(alloc_size);
+        size = ((mem->memory_size >> TARGET_PAGE_BITS) + 7) / 8;
+        if (!d.dirty_bitmap) {
+            d.dirty_bitmap = qemu_malloc(size);
+        } else if (size > allocated_size) {
+            d.dirty_bitmap = qemu_realloc(d.dirty_bitmap, size);
+        }
+        allocated_size = size;
+        memset(d.dirty_bitmap, 0, allocated_size);
 
-    d.slot = mem->slot;
-    dprintf("slot %d, phys_addr %llx, uaddr: %llx\n",
-            d.slot, mem->start_addr, mem->phys_offset);
+        d.slot = mem->slot;
 
-    if (kvm_vm_ioctl(s, KVM_GET_DIRTY_LOG, &d) == -1) {
-        dprintf("ioctl failed %d\n", errno);
-        goto out;
-    }
+        if (kvm_vm_ioctl(s, KVM_GET_DIRTY_LOG, &d) == -1) {
+            dprintf("ioctl failed %d\n", errno);
+            ret = -1;
+            break;
+        }
+
+        for (phys_addr = mem->start_addr, addr = mem->phys_offset;
+             phys_addr < mem->start_addr + mem->memory_size;
+             phys_addr += TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
+            unsigned long *bitmap = (unsigned long *)d.dirty_bitmap;
+            unsigned nr = (phys_addr - mem->start_addr) >> TARGET_PAGE_BITS;
+            unsigned word = nr / (sizeof(*bitmap) * 8);
+            unsigned bit = nr % (sizeof(*bitmap) * 8);
 
-    phys_addr = start_addr;
-    for (addr = mem->phys_offset; phys_addr < end_addr; phys_addr+= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
-        unsigned long *bitmap = (unsigned long *)d.dirty_bitmap;
-        unsigned nr = (phys_addr - start_addr) >> TARGET_PAGE_BITS;
-        unsigned word = nr / (sizeof(*bitmap) * 8);
-        unsigned bit = nr % (sizeof(*bitmap) * 8);
-        if ((bitmap[word] >> bit) & 1)
-            cpu_physical_memory_set_dirty(addr);
+            if ((bitmap[word] >> bit) & 1) {
+                cpu_physical_memory_set_dirty(addr);
+            }
+        }
+        start_addr = phys_addr;
     }
-out:
     qemu_free(d.dirty_bitmap);
+
+    return ret;
 }
 
 int kvm_coalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
diff --git a/kvm.h b/kvm.h
index 6a5f8b3..6e0589a 100644
--- a/kvm.h
+++ b/kvm.h
@@ -40,8 +40,8 @@ void kvm_set_phys_mem(target_phys_addr_t start_addr,
                       ram_addr_t size,
                       ram_addr_t phys_offset);
 
-void kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
-                                    target_phys_addr_t end_addr);
+int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
+                                   target_phys_addr_t end_addr);
 
 int kvm_log_start(target_phys_addr_t phys_addr, ram_addr_t size);
 int kvm_log_stop(target_phys_addr_t phys_addr, ram_addr_t size);

* [Qemu-devel] [PATCH 3/8] kvm: Fix dirty log temporary buffer size
  2009-05-01 21:17 [Qemu-devel] [PATCH 0/8] kvm: Fixes, cleanups and live migration Jan Kiszka
  2009-05-01 21:17 ` [Qemu-devel] [PATCH 1/8] kvm: Conditionally apply workaround for KVM slot handling bug Jan Kiszka
@ 2009-05-01 21:17 ` Jan Kiszka
  2009-05-01 21:17 ` [Qemu-devel] [PATCH 5/8] kvm: Add missing bits to support live migration Jan Kiszka
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 48+ messages in thread
From: Jan Kiszka @ 2009-05-01 21:17 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Avi Kivity, qemu-devel

The buffer passed to KVM_GET_DIRTY_LOG requires one bit per page. Fix
the size calculation in kvm_physical_sync_dirty_bitmap accordingly,
avoiding allocation of extremely oversized buffers.
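Worked through with 4 KiB pages (TARGET_PAGE_BITS = 12, an assumption for this example): a 128 MiB slot has 32768 pages, so its bitmap needs 32768 / 8 = 4096 bytes. In the old expression, '/' binds tighter than '>>', so it evaluates memory_size >> (12 / sizeof(void *)), which is a shift by only 1 on a host with 8-byte pointers and yields a 64 MiB buffer. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define GUEST_PAGE_BITS 12

/* Fixed formula: one bit per page, rounded up to whole bytes. */
static uint64_t dirty_bitmap_bytes(uint64_t memory_size)
{
    return ((memory_size >> GUEST_PAGE_BITS) + 7) / 8;
}

/* Old formula. It parses as
 * memory_size >> (GUEST_PAGE_BITS / sizeof(void *)). */
static uint64_t old_dirty_bitmap_bytes(uint64_t memory_size)
{
    return memory_size >> GUEST_PAGE_BITS / sizeof(void *);
}
```

The "+ 7" rounds up, so a slot whose page count is not a multiple of 8 still gets a byte for its last few pages.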

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

 kvm-all.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 3844398..17e5b38 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -290,7 +290,7 @@ void kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
             return;
     }
 
-    alloc_size = mem->memory_size >> TARGET_PAGE_BITS / sizeof(d.dirty_bitmap);
+    alloc_size = ((mem->memory_size >> TARGET_PAGE_BITS) + 7) / 8;
     d.dirty_bitmap = qemu_mallocz(alloc_size);
 
     d.slot = mem->slot;

* [Qemu-devel] [PATCH 6/8] kvm: Fix framebuffer dirty log sync
  2009-05-01 21:17 [Qemu-devel] [PATCH 0/8] kvm: Fixes, cleanups and live migration Jan Kiszka
                   ` (2 preceding siblings ...)
  2009-05-01 21:17 ` [Qemu-devel] [PATCH 5/8] kvm: Add missing bits to support live migration Jan Kiszka
@ 2009-05-01 21:17 ` Jan Kiszka
  2009-05-01 21:17 ` [Qemu-devel] [PATCH 4/8] kvm: Rework dirty bitmap synchronization Jan Kiszka
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 48+ messages in thread
From: Jan Kiszka @ 2009-05-01 21:17 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Avi Kivity, qemu-devel

kvm_physical_sync_dirty_bitmap() takes the end address as second
argument, not the region size. Moreover, the kvm API should not be used
directly here; cpu_physical_sync_dirty_bitmap() should be called instead.
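Why passing the length breaks things: a framebuffer typically sits at a guest-physical address far larger than its length, so (base, src_len) describes an inverted, empty range. A minimal sketch using a hypothetical helper with the same [start_addr, end_addr) convention; the base address 0xe0000000 is only illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define GUEST_PAGE_BITS 12

/* Hypothetical helper using the [start_addr, end_addr) convention of
 * cpu_physical_sync_dirty_bitmap(). Returns the number of pages the
 * range touches; an empty or inverted range touches none. */
static uint64_t pages_in_range(uint64_t start_addr, uint64_t end_addr)
{
    if (end_addr <= start_addr) {
        return 0;
    }
    uint64_t first = start_addr >> GUEST_PAGE_BITS;
    uint64_t last = (end_addr - 1) >> GUEST_PAGE_BITS;
    return last - first + 1;
}
```

With base = 0xe0000000 and a 300-page framebuffer, pages_in_range(base, base + src_len) covers all 300 pages, while the old call shape pages_in_range(base, src_len) covers none.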

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

 hw/framebuffer.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/hw/framebuffer.c b/hw/framebuffer.c
index 1086ba9..24cdf25 100644
--- a/hw/framebuffer.c
+++ b/hw/framebuffer.c
@@ -17,7 +17,6 @@
 #include "hw.h"
 #include "console.h"
 #include "framebuffer.h"
-#include "kvm.h"
 
 /* Render an image from a shared memory framebuffer.  */
    
@@ -50,9 +49,7 @@ void framebuffer_update_display(
     *first_row = -1;
     src_len = src_width * rows;
 
-    if (kvm_enabled()) {
-        kvm_physical_sync_dirty_bitmap(base, src_len);
-    }
+    cpu_physical_sync_dirty_bitmap(base, base + src_len);
     pd = cpu_get_physical_page_desc(base);
     pd2 = cpu_get_physical_page_desc(base + src_len - 1);
     /* We should reall check that this is a continuous ram region.

* [Qemu-devel] [PATCH 5/8] kvm: Add missing bits to support live migration
  2009-05-01 21:17 [Qemu-devel] [PATCH 0/8] kvm: Fixes, cleanups and live migration Jan Kiszka
  2009-05-01 21:17 ` [Qemu-devel] [PATCH 1/8] kvm: Conditionally apply workaround for KVM slot handling bug Jan Kiszka
  2009-05-01 21:17 ` [Qemu-devel] [PATCH 3/8] kvm: Fix dirty log temporary buffer size Jan Kiszka
@ 2009-05-01 21:17 ` Jan Kiszka
  2009-05-01 21:17 ` [Qemu-devel] [PATCH 6/8] kvm: Fix framebuffer dirty log sync Jan Kiszka
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 48+ messages in thread
From: Jan Kiszka @ 2009-05-01 21:17 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Avi Kivity, qemu-devel

This patch adds the missing hooks to allow live migration in KVM mode.
It adds proper synchronization before/after saving/restoring the VCPU
states (note: PPC is untested), hooks into
cpu_physical_memory_set_dirty_tracking() to enable dirty memory logging
at KVM level, and synchronizes that dirty log into QEMU's view before
running ram_save_live().
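The save/load hooks follow a simple rule: fetch the VCPU state from the kernel before serializing it, and push it back after deserializing. A minimal model, assuming (as the call sites in the diff suggest) that the second argument of cpu_synchronize_state() selects the direction: 0 reads the state from KVM, 1 writes QEMU's copy back. All names below are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of one VCPU: while KVM runs it, the authoritative
 * registers live in the kernel; QEMU's CPUState holds a copy. */
struct regs {
    unsigned long ip;
    unsigned long sp;
};

struct vcpu {
    struct regs kernel; /* state held by KVM */
    struct regs qemu;   /* QEMU's cached view */
};

/* modified == 0: pull from the kernel; modified == 1: push to it. */
static void synchronize_state(struct vcpu *v, int modified)
{
    if (modified) {
        v->kernel = v->qemu;
    } else {
        v->qemu = v->kernel;
    }
}

/* Save hook: refresh QEMU's view first, otherwise stale registers
 * would end up in the migration stream. */
static void save_vcpu(struct vcpu *v, struct regs *out)
{
    synchronize_state(v, 0);
    *out = v->qemu;
}

/* Load hook: after restoring QEMU's view, write it into the kernel so
 * the VCPU resumes with the migrated state. */
static void load_vcpu(struct vcpu *v, const struct regs *in)
{
    v->qemu = *in;
    synchronize_state(v, 1);
}
```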

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

 exec.c                |    6 ++++++
 target-i386/machine.c |    4 ++++
 target-ppc/machine.c  |    5 +++++
 vl.c                  |    7 ++++++-
 4 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/exec.c b/exec.c
index 90b848f..9718787 100644
--- a/exec.c
+++ b/exec.c
@@ -516,6 +516,8 @@ static void cpu_common_save(QEMUFile *f, void *opaque)
 {
     CPUState *env = opaque;
 
+    cpu_synchronize_state(env, 0);
+
     qemu_put_be32s(f, &env->halted);
     qemu_put_be32s(f, &env->interrupt_request);
 }
@@ -533,6 +535,7 @@ static int cpu_common_load(QEMUFile *f, void *opaque, int version_id)
        version_id is increased. */
     env->interrupt_request &= ~0x01;
     tlb_flush(env, 1);
+    cpu_synchronize_state(env, 1);
 
     return 0;
 }
@@ -1923,6 +1926,9 @@ void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end,
 int cpu_physical_memory_set_dirty_tracking(int enable)
 {
     in_migration = enable;
+    if (kvm_enabled()) {
+        return kvm_set_migration_log(enable);
+    }
     return 0;
 }
 
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 1b0d36d..4fc7335 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -4,6 +4,7 @@
 #include "hw/isa.h"
 
 #include "exec-all.h"
+#include "kvm.h"
 
 void register_machines(void)
 {
@@ -38,6 +39,8 @@ void cpu_save(QEMUFile *f, void *opaque)
     int32_t a20_mask;
     int i;
 
+    cpu_synchronize_state(env, 0);
+
     for(i = 0; i < CPU_NB_REGS; i++)
         qemu_put_betls(f, &env->regs[i]);
     qemu_put_betls(f, &env->eip);
@@ -330,5 +333,6 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
     /* XXX: compute redundant hflags bits */
     env->hflags = hflags;
     tlb_flush(env, 1);
+    cpu_synchronize_state(env, 1);
     return 0;
 }
diff --git a/target-ppc/machine.c b/target-ppc/machine.c
index 8b82005..ac8702f 100644
--- a/target-ppc/machine.c
+++ b/target-ppc/machine.c
@@ -1,5 +1,6 @@
 #include "hw/hw.h"
 #include "hw/boards.h"
+#include "kvm.h"
 
 void register_machines(void)
 {
@@ -17,6 +18,8 @@ void cpu_save(QEMUFile *f, void *opaque)
     CPUState *env = (CPUState *)opaque;
     unsigned int i, j;
 
+    cpu_synchronize_state(env, 0);
+
     for (i = 0; i < 32; i++)
         qemu_put_betls(f, &env->gpr[i]);
 #if !defined(TARGET_PPC64)
@@ -185,5 +188,7 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
     qemu_get_sbe32s(f, &env->mmu_idx);
     qemu_get_sbe32s(f, &env->power_mode);
 
+    cpu_synchronize_state(env, 1);
+
     return 0;
 }
diff --git a/vl.c b/vl.c
index 867111c..065cffd 100644
--- a/vl.c
+++ b/vl.c
@@ -3248,13 +3248,18 @@ static int ram_save_live(QEMUFile *f, int stage, void *opaque)
 {
     ram_addr_t addr;
 
+    if (cpu_physical_sync_dirty_bitmap(0, last_ram_offset) != 0) {
+        qemu_file_set_error(f);
+        return 0;
+    }
+
     if (stage == 1) {
         /* Make sure all dirty bits are set */
         for (addr = 0; addr < last_ram_offset; addr += TARGET_PAGE_SIZE) {
             if (!cpu_physical_memory_get_dirty(addr, MIGRATION_DIRTY_FLAG))
                 cpu_physical_memory_set_dirty(addr);
         }
-        
+
         /* Enable dirty memory tracking */
         cpu_physical_memory_set_dirty_tracking(1);
 

* [Qemu-devel] [PATCH 8/8] kvm: Rework VCPU reset
  2009-05-01 21:17 [Qemu-devel] [PATCH 0/8] kvm: Fixes, cleanups and live migration Jan Kiszka
                   ` (5 preceding siblings ...)
  2009-05-01 21:17 ` [Qemu-devel] [PATCH 2/8] kvm: Introduce kvm_set_migration_log Jan Kiszka
@ 2009-05-01 21:17 ` Jan Kiszka
  2009-05-03 15:58   ` [Qemu-devel] " Avi Kivity
  2009-05-01 21:17 ` [Qemu-devel] [PATCH 7/8] Introduce reset notifier order Jan Kiszka
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 48+ messages in thread
From: Jan Kiszka @ 2009-05-01 21:17 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Avi Kivity, qemu-devel

Use the standard reset callback with the highest order to synchronize
the VCPUs on reset after all device callbacks have been executed. This
allows removing the special kvm hook in qemu_system_reset.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

 kvm-all.c |    8 ++++++++
 vl.c      |    2 --
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 27ad80e..2ac5129 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -22,6 +22,7 @@
 
 #include "qemu-common.h"
 #include "sysemu.h"
+#include "hw/hw.h"
 #include "gdbstub.h"
 #include "kvm.h"
 
@@ -364,6 +365,11 @@ int kvm_uncoalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
     return ret;
 }
 
+static void kvm_reset_vcpus(void *opaque)
+{
+    kvm_sync_vcpus();
+}
+
 int kvm_init(int smp_cpus)
 {
     KVMState *s;
@@ -454,6 +460,8 @@ int kvm_init(int smp_cpus)
     if (ret < 0)
         goto err;
 
+    qemu_register_reset(kvm_reset_vcpus, INT_MAX, NULL);
+
     kvm_state = s;
 
     return 0;
diff --git a/vl.c b/vl.c
index f6d3ce5..04fedd4 100644
--- a/vl.c
+++ b/vl.c
@@ -3660,8 +3660,6 @@ void qemu_system_reset(void)
     for(re = first_reset_entry; re != NULL; re = re->next) {
         re->func(re->opaque);
     }
-    if (kvm_enabled())
-        kvm_sync_vcpus();
 }
 
 void qemu_system_reset_request(void)

* [Qemu-devel] [PATCH 7/8] Introduce reset notifier order
  2009-05-01 21:17 [Qemu-devel] [PATCH 0/8] kvm: Fixes, cleanups and live migration Jan Kiszka
                   ` (6 preceding siblings ...)
  2009-05-01 21:17 ` [Qemu-devel] [PATCH 8/8] kvm: Rework VCPU reset Jan Kiszka
@ 2009-05-01 21:17 ` Jan Kiszka
  2009-05-01 23:52   ` Paul Brook
  2009-05-01 22:30 ` [Qemu-devel] Re: [PATCH 0/8] kvm: Fixes, cleanups and live migration Anthony Liguori
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 48+ messages in thread
From: Jan Kiszka @ 2009-05-01 21:17 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Avi Kivity, qemu-devel

Add the parameter 'order' to qemu_register_reset and sort callbacks on
registration. On system reset, callbacks with lower order will be
invoked before those with higher order. Update all existing users to the
standard order 0.

Note: At least for x86, the existing users seem to assume that handlers
are called in their registration order. Therefore, the patch preserves
this property. If someone feels bored, (s)he could try to identify these
dependencies and express them properly on callback registration.
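A minimal sketch of the ordering rule described above. The names (register_reset, system_reset, struct reset_entry) are hypothetical and this is not the actual vl.c code. Insertion skips every entry whose order is less than or equal to the new one, so handlers with equal order keep their registration order.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

typedef void reset_handler_fn(void *opaque);

struct reset_entry {
    reset_handler_fn *func;
    void *opaque;
    int order;
    struct reset_entry *next;
};

static struct reset_entry *first_reset_entry;

/* Insert into a list kept sorted by ascending order. Walking past entries
 * with order <= the new one makes equal-order handlers run in the order
 * they were registered. */
static void register_reset(reset_handler_fn *func, int order, void *opaque)
{
    struct reset_entry **pprev = &first_reset_entry;
    struct reset_entry *re = malloc(sizeof(*re));

    if (re == NULL) {
        abort();
    }
    re->func = func;
    re->opaque = opaque;
    re->order = order;

    while (*pprev != NULL && (*pprev)->order <= order) {
        pprev = &(*pprev)->next;
    }
    re->next = *pprev;
    *pprev = re;
}

static void system_reset(void)
{
    for (struct reset_entry *re = first_reset_entry; re != NULL;
         re = re->next) {
        re->func(re->opaque);
    }
}

/* Records the invocation order, used only to exercise the list. */
static char trace[8];
static int trace_len;

static void record(void *opaque)
{
    if (trace_len < (int)sizeof(trace)) {
        trace[trace_len++] = *(const char *)opaque;
    }
}
```

With this rule, a handler registered with order INT_MAX, such as kvm_reset_vcpus() in PATCH 8/8, runs after all order-0 device handlers regardless of when it was registered.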

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

 hw/ac97.c          |    2 +-
 hw/acpi.c          |    2 +-
 hw/adb.c           |    2 +-
 hw/apic.c          |    2 +-
 hw/arm_boot.c      |    2 +-
 hw/axis_dev88.c    |    2 +-
 hw/cirrus_vga.c    |    2 +-
 hw/cs4231.c        |    2 +-
 hw/cs4231a.c       |    2 +-
 hw/cuda.c          |    2 +-
 hw/dma.c           |    2 +-
 hw/dp8393x.c       |    2 +-
 hw/eccmemctl.c     |    2 +-
 hw/eepro100.c      |    2 +-
 hw/es1370.c        |    2 +-
 hw/escc.c          |    4 ++--
 hw/esp.c           |    2 +-
 hw/etraxfs.c       |    2 +-
 hw/etraxfs_timer.c |    2 +-
 hw/fdc.c           |    2 +-
 hw/fw_cfg.c        |    2 +-
 hw/g364fb.c        |    2 +-
 hw/grackle_pci.c   |    2 +-
 hw/heathrow_pic.c  |    2 +-
 hw/hpet.c          |    2 +-
 hw/hw.h            |    2 +-
 hw/i8254.c         |    2 +-
 hw/i8259.c         |    2 +-
 hw/ide.c           |    8 ++++----
 hw/ioapic.c        |    2 +-
 hw/iommu.c         |    2 +-
 hw/lm832x.c        |    2 +-
 hw/m48t59.c        |    2 +-
 hw/mac_dbdma.c     |    2 +-
 hw/mac_nvram.c     |    2 +-
 hw/mips_jazz.c     |    2 +-
 hw/mips_malta.c    |    4 ++--
 hw/mips_mipssim.c  |    2 +-
 hw/mips_r4k.c      |    2 +-
 hw/musicpal.c      |    4 ++--
 hw/nseries.c       |    2 +-
 hw/omap1.c         |    2 +-
 hw/omap2.c         |    2 +-
 hw/openpic.c       |    4 ++--
 hw/parallel.c      |    4 ++--
 hw/pc.c            |    2 +-
 hw/pckbd.c         |    4 ++--
 hw/pl181.c         |    2 +-
 hw/ppc405_boards.c |    4 ++--
 hw/ppc405_uc.c     |   24 ++++++++++++------------
 hw/ppc4xx_devs.c   |    6 +++---
 hw/ppc4xx_pci.c    |    2 +-
 hw/ppc_newworld.c  |    2 +-
 hw/ppc_oldworld.c  |    2 +-
 hw/ppc_prep.c      |    2 +-
 hw/ps2.c           |    4 ++--
 hw/rc4030.c        |    2 +-
 hw/sbi.c           |    2 +-
 hw/serial.c        |    2 +-
 hw/slavio_intctl.c |    2 +-
 hw/slavio_misc.c   |    2 +-
 hw/slavio_timer.c  |    2 +-
 hw/sparc32_dma.c   |    2 +-
 hw/sun4c_intctl.c  |    2 +-
 hw/sun4m.c         |   10 +++++-----
 hw/sun4u.c         |    2 +-
 hw/tcx.c           |    2 +-
 hw/tsc2005.c       |    2 +-
 hw/tsc210x.c       |    4 ++--
 hw/unin_pci.c      |    2 +-
 hw/usb-ohci.c      |    2 +-
 hw/vga.c           |    2 +-
 hw/virtio.c        |    2 +-
 vl.c               |    7 +++++--
 74 files changed, 107 insertions(+), 104 deletions(-)

diff --git a/hw/ac97.c b/hw/ac97.c
index ade2719..a7777e5 100644
--- a/hw/ac97.c
+++ b/hw/ac97.c
@@ -1374,7 +1374,7 @@ int ac97_init (PCIBus *bus, AudioState *audio)
     pci_register_io_region (&d->dev, 0, 256 * 4, PCI_ADDRESS_SPACE_IO, ac97_map);
     pci_register_io_region (&d->dev, 1, 64 * 4, PCI_ADDRESS_SPACE_IO, ac97_map);
     register_savevm ("ac97", 0, 2, ac97_save, ac97_load, s);
-    qemu_register_reset (ac97_on_reset, s);
+    qemu_register_reset (ac97_on_reset, 0, s);
     AUD_register_card (audio, "ac97", &s->card);
     ac97_on_reset (s);
     return 0;
diff --git a/hw/acpi.c b/hw/acpi.c
index 7e91405..d23c33c 100644
--- a/hw/acpi.c
+++ b/hw/acpi.c
@@ -550,7 +550,7 @@ i2c_bus *piix4_pm_init(PCIBus *bus, int devfn, uint32_t smb_io_base,
 
     s->smbus = i2c_init_bus();
     s->irq = sci_irq;
-    qemu_register_reset(piix4_reset, s);
+    qemu_register_reset(piix4_reset, 0, s);
 
     return s->smbus;
 }
diff --git a/hw/adb.c b/hw/adb.c
index 61a3cdf..ef04366 100644
--- a/hw/adb.c
+++ b/hw/adb.c
@@ -122,7 +122,7 @@ ADBDevice *adb_register_device(ADBBusState *s, int devaddr,
     d->devreq = devreq;
     d->devreset = devreset;
     d->opaque = opaque;
-    qemu_register_reset((QEMUResetHandler *)devreset, d);
+    qemu_register_reset((QEMUResetHandler *)devreset, 0, d);
     d->devreset(d);
     return d;
 }
diff --git a/hw/apic.c b/hw/apic.c
index d63d74b..8c8b2de 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -924,7 +924,7 @@ int apic_init(CPUState *env)
     s->timer = qemu_new_timer(vm_clock, apic_timer, s);
 
     register_savevm("apic", s->id, 2, apic_save, apic_load, s);
-    qemu_register_reset(apic_reset, s);
+    qemu_register_reset(apic_reset, 0, s);
 
     local_apics[s->id] = s;
     return 0;
diff --git a/hw/arm_boot.c b/hw/arm_boot.c
index 35f0130..acfa67e 100644
--- a/hw/arm_boot.c
+++ b/hw/arm_boot.c
@@ -203,7 +203,7 @@ void arm_load_kernel(CPUState *env, struct arm_boot_info *info)
         if (info->nb_cpus == 0)
             info->nb_cpus = 1;
         env->boot_info = info;
-        qemu_register_reset(main_cpu_reset, env);
+        qemu_register_reset(main_cpu_reset, 0, env);
     }
 
     /* Assume that raw images are linux kernels, and ELF images are not.  */
diff --git a/hw/axis_dev88.c b/hw/axis_dev88.c
index 85ca81e..627d760 100644
--- a/hw/axis_dev88.c
+++ b/hw/axis_dev88.c
@@ -272,7 +272,7 @@ void axisdev88_init (ram_addr_t ram_size, int vga_ram_size,
         cpu_model = "crisv32";
     }
     env = cpu_init(cpu_model);
-    qemu_register_reset(main_cpu_reset, env);
+    qemu_register_reset(main_cpu_reset, 0, env);
 
     /* allocate RAM */
     phys_ram = qemu_ram_alloc(ram_size);
diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
index 52ac346..4c4b701 100644
--- a/hw/cirrus_vga.c
+++ b/hw/cirrus_vga.c
@@ -3229,7 +3229,7 @@ static void cirrus_init_common(CirrusVGAState * s, int device_id, int is_pci)
     s->cursor_invalidate = cirrus_cursor_invalidate;
     s->cursor_draw_line = cirrus_cursor_draw_line;
 
-    qemu_register_reset(cirrus_reset, s);
+    qemu_register_reset(cirrus_reset, 0, s);
     cirrus_reset(s);
     register_savevm("cirrus_vga", 0, 2, cirrus_vga_save, cirrus_vga_load, s);
 }
diff --git a/hw/cs4231.c b/hw/cs4231.c
index 59d83e0..7ebf2b1 100644
--- a/hw/cs4231.c
+++ b/hw/cs4231.c
@@ -175,6 +175,6 @@ void cs_init(target_phys_addr_t base, int irq, void *intctl)
     cs_io_memory = cpu_register_io_memory(0, cs_mem_read, cs_mem_write, s);
     cpu_register_physical_memory(base, CS_SIZE, cs_io_memory);
     register_savevm("cs4231", base, 1, cs_save, cs_load, s);
-    qemu_register_reset(cs_reset, s);
+    qemu_register_reset(cs_reset, 0, s);
     cs_reset(s);
 }
diff --git a/hw/cs4231a.c b/hw/cs4231a.c
index 25ad409..c621dcf 100644
--- a/hw/cs4231a.c
+++ b/hw/cs4231a.c
@@ -661,7 +661,7 @@ int cs4231a_init (AudioState *audio, qemu_irq *pic)
     DMA_register_channel (s->dma, cs_dma_read, s);
 
     register_savevm ("cs4231a", 0, 1, cs_save, cs_load, s);
-    qemu_register_reset (cs_reset, s);
+    qemu_register_reset (cs_reset, 0, s);
     cs_reset (s);
 
     AUD_register_card (audio,"cs4231a", &s->card);
diff --git a/hw/cuda.c b/hw/cuda.c
index 0956354..af1fe1d 100644
--- a/hw/cuda.c
+++ b/hw/cuda.c
@@ -762,6 +762,6 @@ void cuda_init (int *cuda_mem_index, qemu_irq irq)
     s->adb_poll_timer = qemu_new_timer(vm_clock, cuda_adb_poll, s);
     *cuda_mem_index = cpu_register_io_memory(0, cuda_read, cuda_write, s);
     register_savevm("cuda", -1, 1, cuda_save, cuda_load, s);
-    qemu_register_reset(cuda_reset, s);
+    qemu_register_reset(cuda_reset, 0, s);
     cuda_reset(s);
 }
diff --git a/hw/dma.c b/hw/dma.c
index b95407b..c8ed6b0 100644
--- a/hw/dma.c
+++ b/hw/dma.c
@@ -493,7 +493,7 @@ static void dma_init2(struct dma_cont *d, int base, int dshift,
         register_ioport_read (base + ((i + 8) << dshift), 1, 1,
                               read_cont, d);
     }
-    qemu_register_reset(dma_reset, d);
+    qemu_register_reset(dma_reset, 0, d);
     dma_reset(d);
     for (i = 0; i < ARRAY_SIZE (d->regs); ++i) {
         d->regs[i].transfer_handler = dma_phony_handler;
diff --git a/hw/dp8393x.c b/hw/dp8393x.c
index 6170588..43b659d 100644
--- a/hw/dp8393x.c
+++ b/hw/dp8393x.c
@@ -892,7 +892,7 @@ void dp83932_init(NICInfo *nd, target_phys_addr_t base, int it_shift,
                                  nic_receive, nic_can_receive, nic_cleanup, s);
 
     qemu_format_nic_info_str(s->vc, nd->macaddr);
-    qemu_register_reset(nic_reset, s);
+    qemu_register_reset(nic_reset, 0, s);
     nic_reset(s);
 
     s->mmio_index = cpu_register_io_memory(0, dp8393x_read, dp8393x_write, s);
diff --git a/hw/eccmemctl.c b/hw/eccmemctl.c
index 28519c8..3bc3fdc 100644
--- a/hw/eccmemctl.c
+++ b/hw/eccmemctl.c
@@ -334,7 +334,7 @@ void * ecc_init(target_phys_addr_t base, qemu_irq irq, uint32_t version)
                                      ecc_io_memory);
     }
     register_savevm("ECC", base, 3, ecc_save, ecc_load, s);
-    qemu_register_reset(ecc_reset, s);
+    qemu_register_reset(ecc_reset, 0, s);
     ecc_reset(s);
     return s;
 }
diff --git a/hw/eepro100.c b/hw/eepro100.c
index 235e598..fe2e273 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -1778,7 +1778,7 @@ static PCIDevice *nic_init(PCIBus * bus, NICInfo * nd, uint32_t device)
 
     qemu_format_nic_info_str(s->vc, s->macaddr);
 
-    qemu_register_reset(nic_reset, s);
+    qemu_register_reset(nic_reset, 0, s);
 
     register_savevm(s->vc->model, -1, 3, nic_save, nic_load, s);
     return (PCIDevice *)d;
diff --git a/hw/es1370.c b/hw/es1370.c
index 50f5a55..251ec41 100644
--- a/hw/es1370.c
+++ b/hw/es1370.c
@@ -1060,7 +1060,7 @@ int es1370_init (PCIBus *bus, AudioState *audio)
 
     pci_register_io_region (&d->dev, 0, 256, PCI_ADDRESS_SPACE_IO, es1370_map);
     register_savevm ("es1370", 0, 2, es1370_save, es1370_load, s);
-    qemu_register_reset (es1370_on_reset, s);
+    qemu_register_reset (es1370_on_reset, 0, s);
 
     AUD_register_card (audio, "es1370", &s->card);
     es1370_reset (s);
diff --git a/hw/escc.c b/hw/escc.c
index 4c299ff..0d67fa9 100644
--- a/hw/escc.c
+++ b/hw/escc.c
@@ -758,7 +758,7 @@ int escc_init(target_phys_addr_t base, qemu_irq irqA, qemu_irq irqB,
         register_savevm("escc", base, 2, escc_save, escc_load, s);
     else
         register_savevm("escc", -1, 2, escc_save, escc_load, s);
-    qemu_register_reset(escc_reset, s);
+    qemu_register_reset(escc_reset, 0, s);
     escc_reset(s);
     return escc_io_memory;
 }
@@ -932,6 +932,6 @@ void slavio_serial_ms_kbd_init(target_phys_addr_t base, qemu_irq irq,
                                  "QEMU Sun Mouse");
     qemu_add_kbd_event_handler(sunkbd_event, &s->chn[1]);
     register_savevm("slavio_serial_mouse", base, 2, escc_save, escc_load, s);
-    qemu_register_reset(escc_reset, s);
+    qemu_register_reset(escc_reset, 0, s);
     escc_reset(s);
 }
diff --git a/hw/esp.c b/hw/esp.c
index aa1a76e..4b4acd8 100644
--- a/hw/esp.c
+++ b/hw/esp.c
@@ -666,7 +666,7 @@ void *esp_init(target_phys_addr_t espaddr, int it_shift,
     esp_reset(s);
 
     register_savevm("esp", espaddr, 3, esp_save, esp_load, s);
-    qemu_register_reset(esp_reset, s);
+    qemu_register_reset(esp_reset, 0, s);
 
     *reset = *qemu_allocate_irqs(parent_esp_reset, s, 1);
 
diff --git a/hw/etraxfs.c b/hw/etraxfs.c
index 03bff48..ba23ee6 100644
--- a/hw/etraxfs.c
+++ b/hw/etraxfs.c
@@ -66,7 +66,7 @@ void bareetraxfs_init (ram_addr_t ram_size, int vga_ram_size,
         cpu_model = "crisv32";
     }
     env = cpu_init(cpu_model);
-    qemu_register_reset(main_cpu_reset, env);
+    qemu_register_reset(main_cpu_reset, 0, env);
 
     /* allocate RAM */
     phys_ram = qemu_ram_alloc(ram_size);
diff --git a/hw/etraxfs_timer.c b/hw/etraxfs_timer.c
index 1144369..fdbb358 100644
--- a/hw/etraxfs_timer.c
+++ b/hw/etraxfs_timer.c
@@ -336,5 +336,5 @@ void etraxfs_timer_init(CPUState *env, qemu_irq *irqs, qemu_irq *nmi,
 	timer_regs = cpu_register_io_memory(0, timer_read, timer_write, t);
 	cpu_register_physical_memory (base, 0x5c, timer_regs);
 
-	qemu_register_reset(etraxfs_timer_reset, t);
+	qemu_register_reset(etraxfs_timer_reset, 0, t);
 }
diff --git a/hw/fdc.c b/hw/fdc.c
index b00a4ec..dc77696 100644
--- a/hw/fdc.c
+++ b/hw/fdc.c
@@ -1883,7 +1883,7 @@ static fdctrl_t *fdctrl_init_common (qemu_irq irq, int dma_chann,
     }
     fdctrl_external_reset(fdctrl);
     register_savevm("fdc", io_base, 2, fdc_save, fdc_load, fdctrl);
-    qemu_register_reset(fdctrl_external_reset, fdctrl);
+    qemu_register_reset(fdctrl_external_reset, 0, fdctrl);
     for (i = 0; i < MAX_FD; i++) {
         fd_revalidate(&fdctrl->drives[i]);
     }
diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c
index e1b19d7..ab23c01 100644
--- a/hw/fw_cfg.c
+++ b/hw/fw_cfg.c
@@ -281,7 +281,7 @@ void *fw_cfg_init(uint32_t ctl_port, uint32_t data_port,
     fw_cfg_add_i16(s, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
 
     register_savevm("fw_cfg", -1, 1, fw_cfg_save, fw_cfg_load, s);
-    qemu_register_reset(fw_cfg_reset, s);
+    qemu_register_reset(fw_cfg_reset, 0, s);
     fw_cfg_reset(s);
 
     return s;
diff --git a/hw/g364fb.c b/hw/g364fb.c
index 44c0685..e0c7f38 100644
--- a/hw/g364fb.c
+++ b/hw/g364fb.c
@@ -598,7 +598,7 @@ int g364fb_mm_init(int vram_size, target_phys_addr_t vram_base,
     s->vram_size = vram_size;
     s->irq = irq;
 
-    qemu_register_reset(g364fb_reset, s);
+    qemu_register_reset(g364fb_reset, 0, s);
     register_savevm("g364fb", 0, 1, g364fb_save, g364fb_load, s);
     g364fb_reset(s);
 
diff --git a/hw/grackle_pci.c b/hw/grackle_pci.c
index 5161727..6a120a7 100644
--- a/hw/grackle_pci.c
+++ b/hw/grackle_pci.c
@@ -176,7 +176,7 @@ PCIBus *pci_grackle_init(uint32_t base, qemu_irq *pic)
     d->config[0x27] = 0x85;
 #endif
     register_savevm("grackle", 0, 1, pci_grackle_save, pci_grackle_load, d);
-    qemu_register_reset(pci_grackle_reset, d);
+    qemu_register_reset(pci_grackle_reset, 0, d);
     pci_grackle_reset(d);
 
     return s->bus;
diff --git a/hw/heathrow_pic.c b/hw/heathrow_pic.c
index f0518bb..647dc5e 100644
--- a/hw/heathrow_pic.c
+++ b/hw/heathrow_pic.c
@@ -230,7 +230,7 @@ qemu_irq *heathrow_pic_init(int *pmem_index,
 
     register_savevm("heathrow_pic", -1, 1, heathrow_pic_save,
                     heathrow_pic_load, s);
-    qemu_register_reset(heathrow_pic_reset, s);
+    qemu_register_reset(heathrow_pic_reset, 0, s);
     heathrow_pic_reset(s);
     return qemu_allocate_irqs(heathrow_pic_set_irq, s, 64);
 }
diff --git a/hw/hpet.c b/hw/hpet.c
index c7945ec..29db325 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -580,7 +580,7 @@ void hpet_init(qemu_irq *irq) {
     }
     hpet_reset(s);
     register_savevm("hpet", -1, 1, hpet_save, hpet_load, s);
-    qemu_register_reset(hpet_reset, s);
+    qemu_register_reset(hpet_reset, 0, s);
     /* HPET Area */
     iomemtype = cpu_register_io_memory(0, hpet_ram_read,
                                        hpet_ram_write, s);
diff --git a/hw/hw.h b/hw/hw.h
index d0cf598..654bb9d 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -243,7 +243,7 @@ void unregister_savevm(const char *idstr, void *opaque);
 
 typedef void QEMUResetHandler(void *opaque);
 
-void qemu_register_reset(QEMUResetHandler *func, void *opaque);
+void qemu_register_reset(QEMUResetHandler *func, int order, void *opaque);
 
 /* handler to set the boot_device for a specific type of QEMUMachine */
 /* return 0 if success */
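The hw/hw.h hunk above adds an `int order` argument to qemu_register_reset(), and every caller converted in this part of the patch passes 0. The following is a minimal C sketch of how an ordered reset-handler list could behave. It is not the vl.c code from this patch. The names ResetEntry, sketch_register_reset, sketch_reset_all and record_reset are hypothetical. The rule that a higher order runs earlier, while equal orders keep registration order, is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Same handler type as declared in hw/hw.h. */
typedef void QEMUResetHandler(void *opaque);

/* Hypothetical list node; the real structure in vl.c may differ. */
typedef struct ResetEntry {
    QEMUResetHandler *func;
    void *opaque;
    int order;
    struct ResetEntry *next;
} ResetEntry;

static ResetEntry *first_entry;

/* Assumption: higher order runs earlier; equal order keeps FIFO order,
 * so passing 0 everywhere behaves like a plain append-to-tail list. */
static void sketch_register_reset(QEMUResetHandler *func, int order,
                                  void *opaque)
{
    ResetEntry **pre = &first_entry;
    ResetEntry *re;

    assert(func != NULL);
    /* Skip every entry that should run before (or alongside) the new one. */
    while (*pre != NULL && (*pre)->order >= order) {
        pre = &(*pre)->next;
    }
    re = malloc(sizeof(*re));
    if (re == NULL) {
        abort();
    }
    re->func = func;
    re->opaque = opaque;
    re->order = order;
    re->next = *pre;   /* splice in before the first lower-order entry */
    *pre = re;
}

/* Invoke every registered handler in list order. */
static void sketch_reset_all(void)
{
    ResetEntry *re;

    for (re = first_entry; re != NULL; re = re->next) {
        re->func(re->opaque);
    }
}

/* Demo handler (hypothetical): logs the int that opaque points to. */
static int reset_log[8];
static int reset_log_len;

static void record_reset(void *opaque)
{
    if (reset_log_len < 8) {
        reset_log[reset_log_len++] = *(int *)opaque;
    }
}
```

Under that assumption, registering handlers a and b with order 0 and then c with order 1 resets c first, then a, then b. Passing 0 at every call site therefore leaves the reset sequence identical to plain registration order.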
diff --git a/hw/i8254.c b/hw/i8254.c
index 44e4531..acdd234 100644
--- a/hw/i8254.c
+++ b/hw/i8254.c
@@ -497,7 +497,7 @@ PITState *pit_init(int base, qemu_irq irq)
 
     register_savevm("i8254", base, 1, pit_save, pit_load, pit);
 
-    qemu_register_reset(pit_reset, pit);
+    qemu_register_reset(pit_reset, 0, pit);
     register_ioport_write(base, 4, 1, pit_ioport_write, pit);
     register_ioport_read(base, 3, 1, pit_ioport_read, pit);
 
diff --git a/hw/i8259.c b/hw/i8259.c
index adabd2b..40f8bee 100644
--- a/hw/i8259.c
+++ b/hw/i8259.c
@@ -508,7 +508,7 @@ static void pic_init1(int io_addr, int elcr_addr, PicState *s)
         register_ioport_read(elcr_addr, 1, 1, elcr_ioport_read, s);
     }
     register_savevm("i8259", io_addr, 1, pic_save, pic_load, s);
-    qemu_register_reset(pic_reset, s);
+    qemu_register_reset(pic_reset, 0, s);
 }
 
 void pic_info(Monitor *mon)
diff --git a/hw/ide.c b/hw/ide.c
index e61cefb..7284010 100644
--- a/hw/ide.c
+++ b/hw/ide.c
@@ -3330,7 +3330,7 @@ void pci_cmd646_ide_init(PCIBus *bus, BlockDriverState **hd_table,
     ide_init2(&d->ide_if[2], hd_table[2], hd_table[3], irq[1]);
 
     register_savevm("ide", 0, 2, pci_ide_save, pci_ide_load, d);
-    qemu_register_reset(cmd646_reset, d);
+    qemu_register_reset(cmd646_reset, 0, d);
     cmd646_reset(d);
 }
 
@@ -3373,7 +3373,7 @@ void pci_piix3_ide_init(PCIBus *bus, BlockDriverState **hd_table, int devfn,
     pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_IDE);
     pci_conf[0x0e] = 0x00; // header_type
 
-    qemu_register_reset(piix3_reset, d);
+    qemu_register_reset(piix3_reset, 0, d);
     piix3_reset(d);
 
     pci_register_io_region((PCIDevice *)d, 4, 0x10,
@@ -3413,7 +3413,7 @@ void pci_piix4_ide_init(PCIBus *bus, BlockDriverState **hd_table, int devfn,
     pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_IDE);
     pci_conf[0x0e] = 0x00; // header_type
 
-    qemu_register_reset(piix3_reset, d);
+    qemu_register_reset(piix3_reset, 0, d);
     piix3_reset(d);
 
     pci_register_io_region((PCIDevice *)d, 4, 0x10,
@@ -3754,7 +3754,7 @@ int pmac_ide_init (BlockDriverState **hd_table, qemu_irq irq,
     pmac_ide_memory = cpu_register_io_memory(0, pmac_ide_read,
                                              pmac_ide_write, d);
     register_savevm("ide", 0, 1, pmac_ide_save, pmac_ide_load, d);
-    qemu_register_reset(pmac_ide_reset, d);
+    qemu_register_reset(pmac_ide_reset, 0, d);
     pmac_ide_reset(d);
 
     return pmac_ide_memory;
diff --git a/hw/ioapic.c b/hw/ioapic.c
index 317c2c2..83ac25e 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -255,7 +255,7 @@ IOAPICState *ioapic_init(void)
     cpu_register_physical_memory(0xfec00000, 0x1000, io_memory);
 
     register_savevm("ioapic", 0, 1, ioapic_save, ioapic_load, s);
-    qemu_register_reset(ioapic_reset, s);
+    qemu_register_reset(ioapic_reset, 0, s);
 
     return s;
 }
diff --git a/hw/iommu.c b/hw/iommu.c
index cde5f1f..fa899b5 100644
--- a/hw/iommu.c
+++ b/hw/iommu.c
@@ -380,7 +380,7 @@ void *iommu_init(target_phys_addr_t addr, uint32_t version, qemu_irq irq)
     cpu_register_physical_memory(addr, IOMMU_NREGS * 4, iommu_io_memory);
 
     register_savevm("iommu", addr, 2, iommu_save, iommu_load, s);
-    qemu_register_reset(iommu_reset, s);
+    qemu_register_reset(iommu_reset, 0, s);
     iommu_reset(s);
     return s;
 }
diff --git a/hw/lm832x.c b/hw/lm832x.c
index 6479e14..f9ebfee 100644
--- a/hw/lm832x.c
+++ b/hw/lm832x.c
@@ -506,7 +506,7 @@ struct i2c_slave *lm8323_init(i2c_bus *bus, qemu_irq nirq)
 
     lm_kbd_reset(s);
 
-    qemu_register_reset((void *) lm_kbd_reset, s);
+    qemu_register_reset((void *) lm_kbd_reset, 0, s);
     register_savevm("LM8323", -1, 0, lm_kbd_save, lm_kbd_load, s);
 
     return &s->i2c;
diff --git a/hw/m48t59.c b/hw/m48t59.c
index 0cfdab3..546fc69 100644
--- a/hw/m48t59.c
+++ b/hw/m48t59.c
@@ -641,7 +641,7 @@ m48t59_t *m48t59_init (qemu_irq IRQ, target_phys_addr_t mem_base,
     }
     qemu_get_timedate(&s->alarm, 0);
 
-    qemu_register_reset(m48t59_reset, s);
+    qemu_register_reset(m48t59_reset, 0, s);
     save_base = mem_base ? mem_base : io_base;
     register_savevm("m48t59", save_base, 1, m48t59_save, m48t59_load, s);
 
diff --git a/hw/mac_dbdma.c b/hw/mac_dbdma.c
index e863980..cd2e0cd 100644
--- a/hw/mac_dbdma.c
+++ b/hw/mac_dbdma.c
@@ -839,7 +839,7 @@ void* DBDMA_init (int *dbdma_mem_index)
 
     *dbdma_mem_index = cpu_register_io_memory(0, dbdma_read, dbdma_write, s);
     register_savevm("dbdma", -1, 1, dbdma_save, dbdma_load, s);
-    qemu_register_reset(dbdma_reset, s);
+    qemu_register_reset(dbdma_reset, 0, s);
     dbdma_reset(s);
 
     dbdma_bh = qemu_bh_new(DBDMA_run_bh, s);
diff --git a/hw/mac_nvram.c b/hw/mac_nvram.c
index ae4d4bb..02eb188 100644
--- a/hw/mac_nvram.c
+++ b/hw/mac_nvram.c
@@ -142,7 +142,7 @@ MacIONVRAMState *macio_nvram_init (int *mem_index, target_phys_addr_t size,
     *mem_index = s->mem_index;
     register_savevm("macio_nvram", -1, 1, macio_nvram_save, macio_nvram_load,
                     s);
-    qemu_register_reset(macio_nvram_reset, s);
+    qemu_register_reset(macio_nvram_reset, 0, s);
     macio_nvram_reset(s);
 
     return s;
diff --git a/hw/mips_jazz.c b/hw/mips_jazz.c
index 9550413..beaddef 100644
--- a/hw/mips_jazz.c
+++ b/hw/mips_jazz.c
@@ -158,7 +158,7 @@ void mips_jazz_init (ram_addr_t ram_size, int vga_ram_size,
         fprintf(stderr, "Unable to find CPU definition\n");
         exit(1);
     }
-    qemu_register_reset(main_cpu_reset, env);
+    qemu_register_reset(main_cpu_reset, 0, env);
 
     /* allocate RAM */
     ram_offset = qemu_ram_alloc(ram_size);
diff --git a/hw/mips_malta.c b/hw/mips_malta.c
index e7504c1..10cebba 100644
--- a/hw/mips_malta.c
+++ b/hw/mips_malta.c
@@ -452,7 +452,7 @@ static MaltaFPGAState *malta_fpga_init(target_phys_addr_t base, qemu_irq uart_ir
     s->uart = serial_mm_init(base + 0x900, 3, uart_irq, 230400, uart_chr, 1);
 
     malta_fpga_reset(s);
-    qemu_register_reset(malta_fpga_reset, s);
+    qemu_register_reset(malta_fpga_reset, 0, s);
 
     return s;
 }
@@ -801,7 +801,7 @@ void mips_malta_init (ram_addr_t ram_size, int vga_ram_size,
         fprintf(stderr, "Unable to find CPU definition\n");
         exit(1);
     }
-    qemu_register_reset(main_cpu_reset, env);
+    qemu_register_reset(main_cpu_reset, 0, env);
 
     /* allocate RAM */
     if (ram_size > (256 << 20)) {
diff --git a/hw/mips_mipssim.c b/hw/mips_mipssim.c
index 676cd26..6207c53 100644
--- a/hw/mips_mipssim.c
+++ b/hw/mips_mipssim.c
@@ -131,7 +131,7 @@ mips_mipssim_init (ram_addr_t ram_size, int vga_ram_size,
         fprintf(stderr, "Unable to find CPU definition\n");
         exit(1);
     }
-    qemu_register_reset(main_cpu_reset, env);
+    qemu_register_reset(main_cpu_reset, 0, env);
 
     /* Allocate RAM. */
     ram_offset = qemu_ram_alloc(ram_size);
diff --git a/hw/mips_r4k.c b/hw/mips_r4k.c
index 51232cf..5656a14 100644
--- a/hw/mips_r4k.c
+++ b/hw/mips_r4k.c
@@ -176,7 +176,7 @@ void mips_r4k_init (ram_addr_t ram_size, int vga_ram_size,
         fprintf(stderr, "Unable to find CPU definition\n");
         exit(1);
     }
-    qemu_register_reset(main_cpu_reset, env);
+    qemu_register_reset(main_cpu_reset, 0, env);
 
     /* allocate RAM */
     if (ram_size > (256 << 20)) {
diff --git a/hw/musicpal.c b/hw/musicpal.c
index fc227e9..2d14e24 100644
--- a/hw/musicpal.c
+++ b/hw/musicpal.c
@@ -450,7 +450,7 @@ static i2c_interface *musicpal_audio_init(qemu_irq irq)
                        musicpal_audio_writefn, s);
     cpu_register_physical_memory(MP_AUDIO_BASE, MP_AUDIO_SIZE, iomemtype);
 
-    qemu_register_reset(musicpal_audio_reset, s);
+    qemu_register_reset(musicpal_audio_reset, 0, s);
 
     return i2c;
 }
@@ -1057,7 +1057,7 @@ static qemu_irq *mv88w8618_pic_init(uint32_t base, qemu_irq parent_irq)
                                        mv88w8618_pic_writefn, s);
     cpu_register_physical_memory(base, MP_PIC_SIZE, iomemtype);
 
-    qemu_register_reset(mv88w8618_pic_reset, s);
+    qemu_register_reset(mv88w8618_pic_reset, 0, s);
 
     return qi;
 }
diff --git a/hw/nseries.c b/hw/nseries.c
index dafc0ba..128835f 100644
--- a/hw/nseries.c
+++ b/hw/nseries.c
@@ -1329,7 +1329,7 @@ static void n8x0_init(ram_addr_t ram_size, const char *boot_device,
         binfo->initrd_filename = initrd_filename;
         arm_load_kernel(s->cpu->env, binfo);
 
-        qemu_register_reset(n8x0_boot_init, s);
+        qemu_register_reset(n8x0_boot_init, 0, s);
         n8x0_boot_init(s);
     }
 
diff --git a/hw/omap1.c b/hw/omap1.c
index c32d3f7..d0f1e60 100644
--- a/hw/omap1.c
+++ b/hw/omap1.c
@@ -4798,7 +4798,7 @@ struct omap_mpu_state_s *omap310_mpu_init(unsigned long sdram_size,
     omap_setup_dsp_mapping(omap15xx_dsp_mm);
     omap_setup_mpui_io(s);
 
-    qemu_register_reset(omap1_mpu_reset, s);
+    qemu_register_reset(omap1_mpu_reset, 0, s);
 
     return s;
 }
diff --git a/hw/omap2.c b/hw/omap2.c
index 20b3811..e7d42a0 100644
--- a/hw/omap2.c
+++ b/hw/omap2.c
@@ -4871,7 +4871,7 @@ struct omap_mpu_state_s *omap2420_mpu_init(unsigned long sdram_size,
      * GPMC registers	6800a000   6800afff
      */
 
-    qemu_register_reset(omap2_mpu_reset, s);
+    qemu_register_reset(omap2_mpu_reset, 0, s);
 
     return s;
 }
diff --git a/hw/openpic.c b/hw/openpic.c
index 733284a..a34d1e2 100644
--- a/hw/openpic.c
+++ b/hw/openpic.c
@@ -1249,7 +1249,7 @@ qemu_irq *openpic_init (PCIBus *bus, int *pmem_index, int nb_cpus,
     opp->need_swap = 1;
 
     register_savevm("openpic", 0, 2, openpic_save, openpic_load, opp);
-    qemu_register_reset(openpic_reset, opp);
+    qemu_register_reset(openpic_reset, 0, opp);
 
     opp->irq_raise = openpic_irq_raise;
     opp->reset = openpic_reset;
@@ -1709,7 +1709,7 @@ qemu_irq *mpic_init (target_phys_addr_t base, int nb_cpus,
     mpp->reset = mpic_reset;
 
     register_savevm("mpic", 0, 2, openpic_save, openpic_load, mpp);
-    qemu_register_reset(mpic_reset, mpp);
+    qemu_register_reset(mpic_reset, 0, mpp);
     mpp->reset(mpp);
 
     return qemu_allocate_irqs(openpic_set_irq, mpp, mpp->max_irq);
diff --git a/hw/parallel.c b/hw/parallel.c
index 0abd0a2..2692ae9 100644
--- a/hw/parallel.c
+++ b/hw/parallel.c
@@ -448,7 +448,7 @@ ParallelState *parallel_init(int base, qemu_irq irq, CharDriverState *chr)
     s->irq = irq;
     s->chr = chr;
     parallel_reset(s);
-    qemu_register_reset(parallel_reset, s);
+    qemu_register_reset(parallel_reset, 0, s);
 
     if (qemu_chr_ioctl(chr, CHR_IOCTL_PP_READ_STATUS, &dummy) == 0) {
         s->hw_driver = 1;
@@ -541,7 +541,7 @@ ParallelState *parallel_mm_init(target_phys_addr_t base, int it_shift, qemu_irq
     s->chr = chr;
     s->it_shift = it_shift;
     parallel_reset(s);
-    qemu_register_reset(parallel_reset, s);
+    qemu_register_reset(parallel_reset, 0, s);
 
     io_sw = cpu_register_io_memory(0, parallel_mm_read_sw, parallel_mm_write_sw, s);
     cpu_register_physical_memory(base, 8 << it_shift, io_sw);
diff --git a/hw/pc.c b/hw/pc.c
index 61f6e7b..971c12c 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -861,7 +861,7 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
             /* XXX: enable it in all cases */
             env->cpuid_features |= CPUID_APIC;
         }
-        qemu_register_reset(main_cpu_reset, env);
+        qemu_register_reset(main_cpu_reset, 0, env);
         if (pci_enabled) {
             apic_init(env);
         }
diff --git a/hw/pckbd.c b/hw/pckbd.c
index dd52651..3ef3594 100644
--- a/hw/pckbd.c
+++ b/hw/pckbd.c
@@ -381,7 +381,7 @@ void i8042_init(qemu_irq kbd_irq, qemu_irq mouse_irq, uint32_t io_base)
 #ifdef TARGET_I386
     vmmouse_init(s->mouse);
 #endif
-    qemu_register_reset(kbd_reset, s);
+    qemu_register_reset(kbd_reset, 0, s);
 }
 
 /* Memory mapped interface */
@@ -438,5 +438,5 @@ void i8042_mm_init(qemu_irq kbd_irq, qemu_irq mouse_irq,
 #ifdef TARGET_I386
     vmmouse_init(s->mouse);
 #endif
-    qemu_register_reset(kbd_reset, s);
+    qemu_register_reset(kbd_reset, 0, s);
 }
diff --git a/hw/pl181.c b/hw/pl181.c
index 8583490..663a2e3 100644
--- a/hw/pl181.c
+++ b/hw/pl181.c
@@ -457,7 +457,7 @@ void pl181_init(uint32_t base, BlockDriverState *bd,
     s->card = sd_init(bd, 0);
     s->irq[0] = irq0;
     s->irq[1] = irq1;
-    qemu_register_reset(pl181_reset, s);
+    qemu_register_reset(pl181_reset, 0, s);
     pl181_reset(s);
     /* ??? Save/restore.  */
 }
diff --git a/hw/ppc405_boards.c b/hw/ppc405_boards.c
index c6ddfef..e80ead0 100644
--- a/hw/ppc405_boards.c
+++ b/hw/ppc405_boards.c
@@ -166,7 +166,7 @@ static void ref405ep_fpga_init (uint32_t base)
                                          ref405ep_fpga_write, fpga);
     cpu_register_physical_memory(base, 0x00000100, fpga_memory);
     ref405ep_fpga_reset(fpga);
-    qemu_register_reset(&ref405ep_fpga_reset, fpga);
+    qemu_register_reset(&ref405ep_fpga_reset, 0, fpga);
 }
 
 static void ref405ep_init (ram_addr_t ram_size, int vga_ram_size,
@@ -484,7 +484,7 @@ static void taihu_cpld_init (uint32_t base)
                                          taihu_cpld_write, cpld);
     cpu_register_physical_memory(base, 0x00000100, cpld_memory);
     taihu_cpld_reset(cpld);
-    qemu_register_reset(&taihu_cpld_reset, cpld);
+    qemu_register_reset(&taihu_cpld_reset, 0, cpld);
 }
 
 static void taihu_405ep_init(ram_addr_t ram_size, int vga_ram_size,
diff --git a/hw/ppc405_uc.c b/hw/ppc405_uc.c
index dfe1905..8dc33c7 100644
--- a/hw/ppc405_uc.c
+++ b/hw/ppc405_uc.c
@@ -173,7 +173,7 @@ void ppc4xx_plb_init (CPUState *env)
     ppc_dcr_register(env, PLB0_BEAR, plb, &dcr_read_plb, &dcr_write_plb);
     ppc_dcr_register(env, PLB0_BESR, plb, &dcr_read_plb, &dcr_write_plb);
     ppc4xx_plb_reset(plb);
-    qemu_register_reset(ppc4xx_plb_reset, plb);
+    qemu_register_reset(ppc4xx_plb_reset, 0, plb);
 }
 
 /*****************************************************************************/
@@ -249,7 +249,7 @@ void ppc4xx_pob_init (CPUState *env)
     ppc_dcr_register(env, POB0_BEAR, pob, &dcr_read_pob, &dcr_write_pob);
     ppc_dcr_register(env, POB0_BESR0, pob, &dcr_read_pob, &dcr_write_pob);
     ppc_dcr_register(env, POB0_BESR1, pob, &dcr_read_pob, &dcr_write_pob);
-    qemu_register_reset(ppc4xx_pob_reset, pob);
+    qemu_register_reset(ppc4xx_pob_reset, 0, pob);
     ppc4xx_pob_reset(env);
 }
 
@@ -386,7 +386,7 @@ void ppc4xx_opba_init (CPUState *env, ppc4xx_mmio_t *mmio,
 #endif
     ppc4xx_mmio_register(env, mmio, offset, 0x002,
                          opba_read, opba_write, opba);
-    qemu_register_reset(ppc4xx_opba_reset, opba);
+    qemu_register_reset(ppc4xx_opba_reset, 0, opba);
     ppc4xx_opba_reset(opba);
 }
 
@@ -580,7 +580,7 @@ void ppc405_ebc_init (CPUState *env)
 
     ebc = qemu_mallocz(sizeof(ppc4xx_ebc_t));
     ebc_reset(ebc);
-    qemu_register_reset(&ebc_reset, ebc);
+    qemu_register_reset(&ebc_reset, 0, ebc);
     ppc_dcr_register(env, EBC0_CFGADDR,
                      ebc, &dcr_read_ebc, &dcr_write_ebc);
     ppc_dcr_register(env, EBC0_CFGDATA,
@@ -672,7 +672,7 @@ void ppc405_dma_init (CPUState *env, qemu_irq irqs[4])
     dma = qemu_mallocz(sizeof(ppc405_dma_t));
     memcpy(dma->irqs, irqs, 4 * sizeof(qemu_irq));
     ppc405_dma_reset(dma);
-    qemu_register_reset(&ppc405_dma_reset, dma);
+    qemu_register_reset(&ppc405_dma_reset, 0, dma);
     ppc_dcr_register(env, DMA0_CR0,
                      dma, &dcr_read_dma, &dcr_write_dma);
     ppc_dcr_register(env, DMA0_CT0,
@@ -837,7 +837,7 @@ void ppc405_gpio_init (CPUState *env, ppc4xx_mmio_t *mmio,
     gpio = qemu_mallocz(sizeof(ppc405_gpio_t));
     gpio->base = offset;
     ppc405_gpio_reset(gpio);
-    qemu_register_reset(&ppc405_gpio_reset, gpio);
+    qemu_register_reset(&ppc405_gpio_reset, 0, gpio);
 #ifdef DEBUG_GPIO
     printf("%s: offset " PADDRX "\n", __func__, offset);
 #endif
@@ -1028,7 +1028,7 @@ void ppc405_ocm_init (CPUState *env)
     ocm = qemu_mallocz(sizeof(ppc405_ocm_t));
     ocm->offset = qemu_ram_alloc(4096);
     ocm_reset(ocm);
-    qemu_register_reset(&ocm_reset, ocm);
+    qemu_register_reset(&ocm_reset, 0, ocm);
     ppc_dcr_register(env, OCM0_ISARC,
                      ocm, &dcr_read_ocm, &dcr_write_ocm);
     ppc_dcr_register(env, OCM0_ISACNTL,
@@ -1280,7 +1280,7 @@ void ppc405_i2c_init (CPUState *env, ppc4xx_mmio_t *mmio,
 #endif
     ppc4xx_mmio_register(env, mmio, offset, 0x011,
                          i2c_read, i2c_write, i2c);
-    qemu_register_reset(ppc4xx_i2c_reset, i2c);
+    qemu_register_reset(ppc4xx_i2c_reset, 0, i2c);
 }
 
 /*****************************************************************************/
@@ -1562,7 +1562,7 @@ void ppc4xx_gpt_init (CPUState *env, ppc4xx_mmio_t *mmio,
 #endif
     ppc4xx_mmio_register(env, mmio, offset, 0x0D4,
                          gpt_read, gpt_write, gpt);
-    qemu_register_reset(ppc4xx_gpt_reset, gpt);
+    qemu_register_reset(ppc4xx_gpt_reset, 0, gpt);
 }
 
 /*****************************************************************************/
@@ -1787,7 +1787,7 @@ void ppc405_mal_init (CPUState *env, qemu_irq irqs[4])
     for (i = 0; i < 4; i++)
         mal->irqs[i] = irqs[i];
     ppc40x_mal_reset(mal);
-    qemu_register_reset(&ppc40x_mal_reset, mal);
+    qemu_register_reset(&ppc40x_mal_reset, 0, mal);
     ppc_dcr_register(env, MAL0_CFG,
                      mal, &dcr_read_mal, &dcr_write_mal);
     ppc_dcr_register(env, MAL0_ESR,
@@ -2171,7 +2171,7 @@ static void ppc405cr_cpc_init (CPUState *env, clk_setup_t clk_setup[7],
     ppc_dcr_register(env, PPC405CR_CPC0_SR, cpc,
                      &dcr_read_crcpc, &dcr_write_crcpc);
     ppc405cr_clk_init(cpc);
-    qemu_register_reset(ppc405cr_cpc_reset, cpc);
+    qemu_register_reset(ppc405cr_cpc_reset, 0, cpc);
     ppc405cr_cpc_reset(cpc);
 }
 
@@ -2493,7 +2493,7 @@ static void ppc405ep_cpc_init (CPUState *env, clk_setup_t clk_setup[8],
     cpc->jtagid = 0x20267049;
     cpc->sysclk = sysclk;
     ppc405ep_cpc_reset(cpc);
-    qemu_register_reset(&ppc405ep_cpc_reset, cpc);
+    qemu_register_reset(&ppc405ep_cpc_reset, 0, cpc);
     ppc_dcr_register(env, PPC405EP_CPC0_BOOT, cpc,
                      &dcr_read_epcpc, &dcr_write_epcpc);
     ppc_dcr_register(env, PPC405EP_CPC0_EPCTL, cpc,
diff --git a/hw/ppc4xx_devs.c b/hw/ppc4xx_devs.c
index ddee8f6..5c8d273 100644
--- a/hw/ppc4xx_devs.c
+++ b/hw/ppc4xx_devs.c
@@ -60,7 +60,7 @@ CPUState *ppc4xx_init (const char *cpu_model,
     tb_clk->opaque = env;
     ppc_dcr_init(env, NULL, NULL);
     /* Register qemu callbacks */
-    qemu_register_reset(&cpu_ppc_reset, env);
+    qemu_register_reset(&cpu_ppc_reset, 0, env);
 
     return env;
 }
@@ -498,7 +498,7 @@ qemu_irq *ppcuic_init (CPUState *env, qemu_irq *irqs,
         ppc_dcr_register(env, dcr_base + i, uic,
                          &dcr_read_uic, &dcr_write_uic);
     }
-    qemu_register_reset(ppcuic_reset, uic);
+    qemu_register_reset(ppcuic_reset, 0, uic);
     ppcuic_reset(uic);
 
     return qemu_allocate_irqs(&ppcuic_set_irq, uic, UIC_MAX_IRQ);
@@ -834,7 +834,7 @@ void ppc4xx_sdram_init (CPUState *env, qemu_irq irq, int nbanks,
     memcpy(sdram->ram_sizes, ram_sizes,
            nbanks * sizeof(target_phys_addr_t));
     sdram_reset(sdram);
-    qemu_register_reset(&sdram_reset, sdram);
+    qemu_register_reset(&sdram_reset, 0, sdram);
     ppc_dcr_register(env, SDRAM0_CFGADDR,
                      sdram, &dcr_read_sdram, &dcr_write_sdram);
     ppc_dcr_register(env, SDRAM0_CFGDATA,
diff --git a/hw/ppc4xx_pci.c b/hw/ppc4xx_pci.c
index 2e67508..f21fb89 100644
--- a/hw/ppc4xx_pci.c
+++ b/hw/ppc4xx_pci.c
@@ -403,7 +403,7 @@ PCIBus *ppc4xx_pci_init(CPUState *env, qemu_irq pci_irqs[4],
         goto free;
     cpu_register_physical_memory(registers, PCI_REG_SIZE, index);
 
-    qemu_register_reset(ppc4xx_pci_reset, controller);
+    qemu_register_reset(ppc4xx_pci_reset, 0, controller);
 
     /* XXX load/save code not tested. */
     register_savevm("ppc4xx_pci", ppc4xx_pci_id++, 1,
diff --git a/hw/ppc_newworld.c b/hw/ppc_newworld.c
index f67b727..8a6f258 100644
--- a/hw/ppc_newworld.c
+++ b/hw/ppc_newworld.c
@@ -128,7 +128,7 @@ static void ppc_core99_init (ram_addr_t ram_size, int vga_ram_size,
 #if 0
         env->osi_call = vga_osi_call;
 #endif
-        qemu_register_reset(&cpu_ppc_reset, env);
+        qemu_register_reset(&cpu_ppc_reset, 0, env);
         envs[i] = env;
     }
 
diff --git a/hw/ppc_oldworld.c b/hw/ppc_oldworld.c
index 06c77b4..b0cb1e9 100644
--- a/hw/ppc_oldworld.c
+++ b/hw/ppc_oldworld.c
@@ -154,7 +154,7 @@ static void ppc_heathrow_init (ram_addr_t ram_size, int vga_ram_size,
         /* Set time-base frequency to 16.6 Mhz */
         cpu_ppc_tb_init(env,  16600000UL);
         env->osi_call = vga_osi_call;
-        qemu_register_reset(&cpu_ppc_reset, env);
+        qemu_register_reset(&cpu_ppc_reset, 0, env);
         envs[i] = env;
     }
 
diff --git a/hw/ppc_prep.c b/hw/ppc_prep.c
index 0e2d581..08d9cdd 100644
--- a/hw/ppc_prep.c
+++ b/hw/ppc_prep.c
@@ -572,7 +572,7 @@ static void ppc_prep_init (ram_addr_t ram_size, int vga_ram_size,
             /* Set time-base frequency to 100 Mhz */
             cpu_ppc_tb_init(env, 100UL * 1000UL * 1000UL);
         }
-        qemu_register_reset(&cpu_ppc_reset, env);
+        qemu_register_reset(&cpu_ppc_reset, 0, env);
         envs[i] = env;
     }
 
diff --git a/hw/ps2.c b/hw/ps2.c
index fb77005..b1352d0 100644
--- a/hw/ps2.c
+++ b/hw/ps2.c
@@ -593,7 +593,7 @@ void *ps2_kbd_init(void (*update_irq)(void *, int), void *update_arg)
     ps2_reset(&s->common);
     register_savevm("ps2kbd", 0, 3, ps2_kbd_save, ps2_kbd_load, s);
     qemu_add_kbd_event_handler(ps2_put_keycode, s);
-    qemu_register_reset(ps2_reset, &s->common);
+    qemu_register_reset(ps2_reset, 0, &s->common);
     return s;
 }
 
@@ -606,6 +606,6 @@ void *ps2_mouse_init(void (*update_irq)(void *, int), void *update_arg)
     ps2_reset(&s->common);
     register_savevm("ps2mouse", 0, 2, ps2_mouse_save, ps2_mouse_load, s);
     qemu_add_mouse_event_handler(ps2_mouse_event, s, 0, "QEMU PS/2 Mouse");
-    qemu_register_reset(ps2_reset, &s->common);
+    qemu_register_reset(ps2_reset, 0, &s->common);
     return s;
 }
diff --git a/hw/rc4030.c b/hw/rc4030.c
index 2f9bb0e..5d0b543 100644
--- a/hw/rc4030.c
+++ b/hw/rc4030.c
@@ -810,7 +810,7 @@ void *rc4030_init(qemu_irq timer, qemu_irq jazz_bus,
     s->timer_irq = timer;
     s->jazz_bus_irq = jazz_bus;
 
-    qemu_register_reset(rc4030_reset, s);
+    qemu_register_reset(rc4030_reset, 0, s);
     register_savevm("rc4030", 0, 2, rc4030_save, rc4030_load, s);
     rc4030_reset(s);
 
diff --git a/hw/sbi.c b/hw/sbi.c
index 9c41f53..a6b5506 100644
--- a/hw/sbi.c
+++ b/hw/sbi.c
@@ -155,7 +155,7 @@ void *sbi_init(target_phys_addr_t addr, qemu_irq **irq, qemu_irq **cpu_irq,
     cpu_register_physical_memory(addr, SBI_SIZE, sbi_io_memory);
 
     register_savevm("sbi", addr, 1, sbi_save, sbi_load, s);
-    qemu_register_reset(sbi_reset, s);
+    qemu_register_reset(sbi_reset, 0, s);
     *irq = qemu_allocate_irqs(sbi_set_irq, s, 32);
     *cpu_irq = qemu_allocate_irqs(sbi_set_timer_irq_cpu, s, MAX_CPUS);
     sbi_reset(s);
diff --git a/hw/serial.c b/hw/serial.c
index ac089fc..a82c29c 100644
--- a/hw/serial.c
+++ b/hw/serial.c
@@ -718,7 +718,7 @@ static void serial_init_core(SerialState *s, qemu_irq irq, int baudbase,
     s->fifo_timeout_timer = qemu_new_timer(vm_clock, (QEMUTimerCB *) fifo_timeout_int, s);
     s->transmit_timer = qemu_new_timer(vm_clock, (QEMUTimerCB *) serial_xmit, s);
 
-    qemu_register_reset(serial_reset, s);
+    qemu_register_reset(serial_reset, 0, s);
     serial_reset(s);
 
     qemu_chr_add_handlers(s->chr, serial_can_receive1, serial_receive1,
diff --git a/hw/slavio_intctl.c b/hw/slavio_intctl.c
index 9ee5ff8..77deb87 100644
--- a/hw/slavio_intctl.c
+++ b/hw/slavio_intctl.c
@@ -407,7 +407,7 @@ void *slavio_intctl_init(target_phys_addr_t addr, target_phys_addr_t addrg,
 
     register_savevm("slavio_intctl", addr, 1, slavio_intctl_save,
                     slavio_intctl_load, s);
-    qemu_register_reset(slavio_intctl_reset, s);
+    qemu_register_reset(slavio_intctl_reset, 0, s);
     *irq = qemu_allocate_irqs(slavio_set_irq, s, 32);
 
     *cpu_irq = qemu_allocate_irqs(slavio_set_timer_irq_cpu, s, MAX_CPUS);
diff --git a/hw/slavio_misc.c b/hw/slavio_misc.c
index 8da7f4a..ec97ae4 100644
--- a/hw/slavio_misc.c
+++ b/hw/slavio_misc.c
@@ -501,7 +501,7 @@ void *slavio_misc_init(target_phys_addr_t base, target_phys_addr_t power_base,
 
     register_savevm("slavio_misc", base, 1, slavio_misc_save, slavio_misc_load,
                     s);
-    qemu_register_reset(slavio_misc_reset, s);
+    qemu_register_reset(slavio_misc_reset, 0, s);
     slavio_misc_reset(s);
 
     return s;
diff --git a/hw/slavio_timer.c b/hw/slavio_timer.c
index 6a29ce2..9896019 100644
--- a/hw/slavio_timer.c
+++ b/hw/slavio_timer.c
@@ -391,7 +391,7 @@ static SLAVIO_TIMERState *slavio_timer_init(target_phys_addr_t addr,
                                      slavio_timer_io_memory);
     register_savevm("slavio_timer", addr, 3, slavio_timer_save,
                     slavio_timer_load, s);
-    qemu_register_reset(slavio_timer_reset, s);
+    qemu_register_reset(slavio_timer_reset, 0, s);
     slavio_timer_reset(s);
 
     return s;
diff --git a/hw/sparc32_dma.c b/hw/sparc32_dma.c
index b1495dd..4ea93f7 100644
--- a/hw/sparc32_dma.c
+++ b/hw/sparc32_dma.c
@@ -256,7 +256,7 @@ void *sparc32_dma_init(target_phys_addr_t daddr, qemu_irq parent_irq,
     cpu_register_physical_memory(daddr, DMA_SIZE, dma_io_memory);
 
     register_savevm("sparc32_dma", daddr, 2, dma_save, dma_load, s);
-    qemu_register_reset(dma_reset, s);
+    qemu_register_reset(dma_reset, 0, s);
     *dev_irq = qemu_allocate_irqs(dma_set_irq, s, 1);
 
     *reset = &s->dev_reset;
diff --git a/hw/sun4c_intctl.c b/hw/sun4c_intctl.c
index 33df653..03c7174 100644
--- a/hw/sun4c_intctl.c
+++ b/hw/sun4c_intctl.c
@@ -213,7 +213,7 @@ void *sun4c_intctl_init(target_phys_addr_t addr, qemu_irq **irq,
     register_savevm("sun4c_intctl", addr, 1, sun4c_intctl_save,
                     sun4c_intctl_load, s);
 
-    qemu_register_reset(sun4c_intctl_reset, s);
+    qemu_register_reset(sun4c_intctl_reset, 0, s);
     *irq = qemu_allocate_irqs(sun4c_set_irq, s, 8);
 
     sun4c_intctl_reset(s);
diff --git a/hw/sun4m.c b/hw/sun4m.c
index 1f1efd0..92c963b 100644
--- a/hw/sun4m.c
+++ b/hw/sun4m.c
@@ -400,9 +400,9 @@ static void sun4m_hw_init(const struct sun4m_hwdef *hwdef, ram_addr_t RAM_size,
         cpu_sparc_set_id(env, i);
         envs[i] = env;
         if (i == 0) {
-            qemu_register_reset(main_cpu_reset, env);
+            qemu_register_reset(main_cpu_reset, 0, env);
         } else {
-            qemu_register_reset(secondary_cpu_reset, env);
+            qemu_register_reset(secondary_cpu_reset, 0, env);
             env->halted = 1;
         }
         cpu_irqs[i] = qemu_allocate_irqs(cpu_set_irq, envs[i], MAX_PILS);
@@ -1190,9 +1190,9 @@ static void sun4d_hw_init(const struct sun4d_hwdef *hwdef, ram_addr_t RAM_size,
         cpu_sparc_set_id(env, i);
         envs[i] = env;
         if (i == 0) {
-            qemu_register_reset(main_cpu_reset, env);
+            qemu_register_reset(main_cpu_reset, 0, env);
         } else {
-            qemu_register_reset(secondary_cpu_reset, env);
+            qemu_register_reset(secondary_cpu_reset, 0, env);
             env->halted = 1;
         }
         cpu_irqs[i] = qemu_allocate_irqs(cpu_set_irq, envs[i], MAX_PILS);
@@ -1413,7 +1413,7 @@ static void sun4c_hw_init(const struct sun4c_hwdef *hwdef, ram_addr_t RAM_size,
 
     cpu_sparc_set_id(env, 0);
 
-    qemu_register_reset(main_cpu_reset, env);
+    qemu_register_reset(main_cpu_reset, 0, env);
     cpu_irqs = qemu_allocate_irqs(cpu_set_irq, env, MAX_PILS);
     env->prom_addr = hwdef->slavio_base;
 
diff --git a/hw/sun4u.c b/hw/sun4u.c
index de635d4..6c34cb0 100644
--- a/hw/sun4u.c
+++ b/hw/sun4u.c
@@ -374,7 +374,7 @@ static void sun4uv_init(ram_addr_t RAM_size, int vga_ram_size,
     reset_info = qemu_mallocz(sizeof(ResetData));
     reset_info->env = env;
     reset_info->reset_addr = hwdef->prom_addr + 0x40ULL;
-    qemu_register_reset(main_cpu_reset, reset_info);
+    qemu_register_reset(main_cpu_reset, 0, reset_info);
     main_cpu_reset(reset_info);
     // Override warm reset address with cold start address
     env->pc = hwdef->prom_addr + 0x20ULL;
diff --git a/hw/tcx.c b/hw/tcx.c
index 99e65a0..450ff9d 100644
--- a/hw/tcx.c
+++ b/hw/tcx.c
@@ -560,7 +560,7 @@ void tcx_init(target_phys_addr_t addr, int vram_size, int width, int height,
                                  dummy_memory);
 
     register_savevm("tcx", addr, 4, tcx_save, tcx_load, s);
-    qemu_register_reset(tcx_reset, s);
+    qemu_register_reset(tcx_reset, 0, s);
     tcx_reset(s);
     qemu_console_resize(s->ds, width, height);
 }
diff --git a/hw/tsc2005.c b/hw/tsc2005.c
index e8d4a85..7eb60a0 100644
--- a/hw/tsc2005.c
+++ b/hw/tsc2005.c
@@ -548,7 +548,7 @@ void *tsc2005_init(qemu_irq pintdav)
     qemu_add_mouse_event_handler(tsc2005_touchscreen_event, s, 1,
                     "QEMU TSC2005-driven Touchscreen");
 
-    qemu_register_reset((void *) tsc2005_reset, s);
+    qemu_register_reset((void *) tsc2005_reset, 0, s);
     register_savevm("tsc2005", -1, 0, tsc2005_save, tsc2005_load, s);
 
     return s;
diff --git a/hw/tsc210x.c b/hw/tsc210x.c
index 41d374f..4466fc8 100644
--- a/hw/tsc210x.c
+++ b/hw/tsc210x.c
@@ -1151,7 +1151,7 @@ struct uwire_slave_s *tsc2102_init(qemu_irq pint, AudioState *audio)
     if (s->audio)
         AUD_register_card(s->audio, s->name, &s->card);
 
-    qemu_register_reset((void *) tsc210x_reset, s);
+    qemu_register_reset((void *) tsc210x_reset, 0, s);
     register_savevm(s->name, -1, 0,
                     tsc210x_save, tsc210x_load, s);
 
@@ -1205,7 +1205,7 @@ struct uwire_slave_s *tsc2301_init(qemu_irq penirq, qemu_irq kbirq,
     if (s->audio)
         AUD_register_card(s->audio, s->name, &s->card);
 
-    qemu_register_reset((void *) tsc210x_reset, s);
+    qemu_register_reset((void *) tsc210x_reset, 0, s);
     register_savevm(s->name, -1, 0, tsc210x_save, tsc210x_load, s);
 
     return &s->chip;
diff --git a/hw/unin_pci.c b/hw/unin_pci.c
index 9fc073a..7f136b0 100644
--- a/hw/unin_pci.c
+++ b/hw/unin_pci.c
@@ -265,7 +265,7 @@ PCIBus *pci_pmac_init(qemu_irq *pic)
     d->config[0x34] = 0x00; // capabilities_pointer
 #endif
     register_savevm("uninorth", 0, 1, pci_unin_save, pci_unin_load, d);
-    qemu_register_reset(pci_unin_reset, d);
+    qemu_register_reset(pci_unin_reset, 0, d);
     pci_unin_reset(d);
 
     return s->bus;
diff --git a/hw/usb-ohci.c b/hw/usb-ohci.c
index 09944d0..1cc1b62 100644
--- a/hw/usb-ohci.c
+++ b/hw/usb-ohci.c
@@ -1695,7 +1695,7 @@ static void usb_ohci_init(OHCIState *ohci, int num_ports, int devfn,
     }
 
     ohci->async_td = 0;
-    qemu_register_reset(ohci_reset, ohci);
+    qemu_register_reset(ohci_reset, 0, ohci);
     ohci_reset(ohci);
 }
 
diff --git a/hw/vga.c b/hw/vga.c
index 517ce3d..79e1d36 100644
--- a/hw/vga.c
+++ b/hw/vga.c
@@ -2306,7 +2306,7 @@ void vga_init(VGAState *s)
 {
     int vga_io_memory;
 
-    qemu_register_reset(vga_reset, s);
+    qemu_register_reset(vga_reset, 0, s);
     register_savevm("vga", 0, 2, vga_save, vga_load, s);
 
     register_ioport_write(0x3c0, 16, 1, vga_ioport_write, s);
diff --git a/hw/virtio.c b/hw/virtio.c
index 4aa5f20..c420eda 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -810,7 +810,7 @@ VirtIODevice *virtio_init_pci(PCIBus *bus, const char *name,
 
     pci_register_io_region(pci_dev, 0, size, PCI_ADDRESS_SPACE_IO,
                            virtio_map);
-    qemu_register_reset(virtio_reset, vdev);
+    qemu_register_reset(virtio_reset, 0, vdev);
 
     return vdev;
 }
diff --git a/vl.c b/vl.c
index 065cffd..f6d3ce5 100644
--- a/vl.c
+++ b/vl.c
@@ -3580,6 +3580,7 @@ void vm_start(void)
 typedef struct QEMUResetEntry {
     QEMUResetHandler *func;
     void *opaque;
+    int order;
     struct QEMUResetEntry *next;
 } QEMUResetEntry;
 
@@ -3635,16 +3636,18 @@ static void do_vm_stop(int reason)
     }
 }
 
-void qemu_register_reset(QEMUResetHandler *func, void *opaque)
+void qemu_register_reset(QEMUResetHandler *func, int order, void *opaque)
 {
     QEMUResetEntry **pre, *re;
 
     pre = &first_reset_entry;
-    while (*pre != NULL)
+    while (*pre != NULL && (*pre)->order >= order) {
         pre = &(*pre)->next;
+    }
     re = qemu_mallocz(sizeof(QEMUResetEntry));
     re->func = func;
     re->opaque = opaque;
+    re->order = order;
     re->next = NULL;
     *pre = re;
 }
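
The patch description says callbacks with lower order run first and that handlers with equal order keep their registration sequence. The standalone sketch below shows one way to get that behavior with a sorted singly linked list. All names in it (`ResetEntry`, `register_reset`, `system_reset`, `log_reset`) are hypothetical and are not QEMU's API. It keeps the list in ascending order and assumes the reset loop walks the list from its head. It also links the new entry to the rest of the list with `re->next = *pre`. With `re->next = NULL`, as in the hunk above, any entries after the insertion point would be unlinked once callers pass differing orders. That cannot happen while every caller uses order 0, because the loop then always runs to the end of the list.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for QEMUResetHandler / QEMUResetEntry. */
typedef void ResetHandler(void *opaque);

typedef struct ResetEntry {
    ResetHandler *func;
    void *opaque;
    int order;
    struct ResetEntry *next;
} ResetEntry;

static ResetEntry *first_entry;

/* Keep the list sorted by ascending order. Skipping every entry whose
 * order is <= the new one places the new entry after all existing
 * entries of the same order, so ties keep their registration sequence. */
static void register_reset(ResetHandler *func, int order, void *opaque)
{
    ResetEntry **pre = &first_entry;
    ResetEntry *re;

    while (*pre != NULL && (*pre)->order <= order) {
        pre = &(*pre)->next;
    }
    re = calloc(1, sizeof(*re));
    if (re == NULL) {
        abort();
    }
    re->func = func;
    re->opaque = opaque;
    re->order = order;
    re->next = *pre;    /* keep the remainder of the list linked */
    *pre = re;
}

/* Invoke every handler from the head of the list, lowest order first. */
static void system_reset(void)
{
    for (ResetEntry *re = first_entry; re != NULL; re = re->next) {
        re->func(re->opaque);
    }
}

/* Demo handler: records the int that opaque points to. */
static int reset_log[16];
static int reset_log_len;

static void log_reset(void *opaque)
{
    assert(reset_log_len < 16);
    reset_log[reset_log_len++] = *(int *)opaque;
}
```

Registering handlers with orders 0, 0, 1 and -1 in that sequence yields a reset sequence of -1, 0, 0, 1, and the two order-0 handlers still run in the order they were registered.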

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [Qemu-devel] Re: [PATCH  0/8] kvm: Fixes, cleanups and live migration
  2009-05-01 21:17 [Qemu-devel] [PATCH 0/8] kvm: Fixes, cleanups and live migration Jan Kiszka
                   ` (7 preceding siblings ...)
  2009-05-01 21:17 ` [Qemu-devel] [PATCH 7/8] Introduce reset notifier order Jan Kiszka
@ 2009-05-01 22:30 ` Anthony Liguori
  2009-05-01 22:49   ` Anthony Liguori
  2009-05-01 22:49   ` Jan Kiszka
  2009-05-01 22:40 ` Anthony Liguori
  2009-05-01 22:49 ` [Qemu-devel] [PATCH 9/8] kvm: Save/restore TSC counter Jan Kiszka
  10 siblings, 2 replies; 48+ messages in thread
From: Anthony Liguori @ 2009-05-01 22:30 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, qemu-devel

Jan Kiszka wrote:
> Besides refreshed versions of my already posted fixes and cleanups for
> KVM, this series comes with patches to enable live migration in KVM
> mode. If there is still some migration bit missing compared to qemu-kvm,
> please let me know.
>
> Find the patches also at git://git.kiszka.org/qemu.git queues/kvm
>   

In the future, can you use topic branches so I can pull more easily?

You've got a fair bit more than this series in that branch.

-- 
Regards,

Anthony Liguori

* [Qemu-devel] Re: [PATCH  0/8] kvm: Fixes, cleanups and live migration
  2009-05-01 21:17 [Qemu-devel] [PATCH 0/8] kvm: Fixes, cleanups and live migration Jan Kiszka
                   ` (8 preceding siblings ...)
  2009-05-01 22:30 ` [Qemu-devel] Re: [PATCH 0/8] kvm: Fixes, cleanups and live migration Anthony Liguori
@ 2009-05-01 22:40 ` Anthony Liguori
  2009-05-01 22:56   ` Jan Kiszka
  2009-05-02  7:40   ` Gleb Natapov
  2009-05-01 22:49 ` [Qemu-devel] [PATCH 9/8] kvm: Save/restore TSC counter Jan Kiszka
  10 siblings, 2 replies; 48+ messages in thread
From: Anthony Liguori @ 2009-05-01 22:40 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, qemu-devel

Jan Kiszka wrote:
> Besides refreshed versions of my already posted fixes and cleanups for
> KVM, this series comes with patches to enable live migration in KVM
> mode. If there is still some migration bit missing compared to qemu-kvm,
> please let me know.
>   

In qemu-kvm, we also save:

MSR_IA32_TSC
mp_state
interrupt_bitmap

In QEMU, we probably should just emulate MSR_IA32_TSC and then we can 
reuse that for save/restore.

interrupt_bitmap is a bit more tricky.  It only ever can have one bit 
set AFAICT but I don't see anything that we're currently saving that 
maps to it which leads me to wonder why KVM needs it and QEMU doesn't.
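
To make the data structure concrete: `interrupt_bitmap` is declared in CPUX86State as `uint64_t interrupt_bitmap[256 / 64]`, i.e. one bit per x86 interrupt vector packed into four 64-bit words, so vector v lives in word v / 64 at bit v % 64. The helpers below are a hypothetical sketch of that layout, not QEMU or KVM functions:

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as CPUX86State::interrupt_bitmap: 256 vectors, one bit
 * per vector, packed into four 64-bit words. Names are hypothetical. */
#define NR_VECTORS    256
#define BITS_PER_WORD 64
#define NR_WORDS      (NR_VECTORS / BITS_PER_WORD)

typedef struct {
    uint64_t words[NR_WORDS];
} IrqBitmap;

/* Vector v lives in word v / 64, bit v % 64. */
static void irq_bitmap_set(IrqBitmap *bm, unsigned vector)
{
    assert(vector < NR_VECTORS);
    bm->words[vector / BITS_PER_WORD] |= UINT64_C(1) << (vector % BITS_PER_WORD);
}

static int irq_bitmap_test(const IrqBitmap *bm, unsigned vector)
{
    assert(vector < NR_VECTORS);
    return (int)((bm->words[vector / BITS_PER_WORD] >> (vector % BITS_PER_WORD)) & 1);
}

/* Lowest set vector, or -1 if the bitmap is empty. */
static int irq_bitmap_first(const IrqBitmap *bm)
{
    for (unsigned w = 0; w < NR_WORDS; w++) {
        for (unsigned b = 0; b < BITS_PER_WORD; b++) {
            if (bm->words[w] & (UINT64_C(1) << b)) {
                return (int)(w * BITS_PER_WORD + b);
            }
        }
    }
    return -1;
}

/* Number of set bits; if at most one vector is ever pending, as
 * suggested above, this stays at 0 or 1. */
static int irq_bitmap_weight(const IrqBitmap *bm)
{
    int n = 0;
    for (unsigned w = 0; w < NR_WORDS; w++) {
        uint64_t x = bm->words[w];
        while (x != 0) {
            x &= x - 1;   /* clear the lowest set bit */
            n++;
        }
    }
    return n;
}
```

For example, vector 0x30 (48) sets bit 48 of word 0, while vector 200 sets bit 8 of word 3.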

Regards,

Anthony Liguori

> Find the patches also at git://git.kiszka.org/qemu.git queues/kvm
>
> Jan Kiszka (8):
>       kvm: Conditionally apply workaround for KVM slot handling bug
>       kvm: Introduce kvm_set_migration_log
>       kvm: Fix dirty log temporary buffer size
>       kvm: Rework dirty bitmap synchronization
>       kvm: Add missing bits to support live migration
>       kvm: Fix framebuffer dirty log sync
>       Introduce reset notifier order
>       kvm: Rework VCPU reset
>
>  cpu-all.h             |    3 +-
>  exec.c                |   14 ++++-
>  hw/ac97.c             |    2 +-
>  hw/acpi.c             |    2 +-
>  hw/adb.c              |    2 +-
>  hw/apic.c             |    2 +-
>  hw/arm_boot.c         |    2 +-
>  hw/axis_dev88.c       |    2 +-
>  hw/cirrus_vga.c       |    2 +-
>  hw/cs4231.c           |    2 +-
>  hw/cs4231a.c          |    2 +-
>  hw/cuda.c             |    2 +-
>  hw/dma.c              |    2 +-
>  hw/dp8393x.c          |    2 +-
>  hw/eccmemctl.c        |    2 +-
>  hw/eepro100.c         |    2 +-
>  hw/es1370.c           |    2 +-
>  hw/escc.c             |    4 +-
>  hw/esp.c              |    2 +-
>  hw/etraxfs.c          |    2 +-
>  hw/etraxfs_timer.c    |    2 +-
>  hw/fdc.c              |    2 +-
>  hw/framebuffer.c      |    5 +--
>  hw/fw_cfg.c           |    2 +-
>  hw/g364fb.c           |    2 +-
>  hw/grackle_pci.c      |    2 +-
>  hw/heathrow_pic.c     |    2 +-
>  hw/hpet.c             |    2 +-
>  hw/hw.h               |    2 +-
>  hw/i8254.c            |    2 +-
>  hw/i8259.c            |    2 +-
>  hw/ide.c              |    8 ++--
>  hw/ioapic.c           |    2 +-
>  hw/iommu.c            |    2 +-
>  hw/lm832x.c           |    2 +-
>  hw/m48t59.c           |    2 +-
>  hw/mac_dbdma.c        |    2 +-
>  hw/mac_nvram.c        |    2 +-
>  hw/mips_jazz.c        |    2 +-
>  hw/mips_malta.c       |    4 +-
>  hw/mips_mipssim.c     |    2 +-
>  hw/mips_r4k.c         |    2 +-
>  hw/musicpal.c         |    4 +-
>  hw/nseries.c          |    2 +-
>  hw/omap1.c            |    2 +-
>  hw/omap2.c            |    2 +-
>  hw/openpic.c          |    4 +-
>  hw/parallel.c         |    4 +-
>  hw/pc.c               |    2 +-
>  hw/pckbd.c            |    4 +-
>  hw/pl181.c            |    2 +-
>  hw/ppc405_boards.c    |    4 +-
>  hw/ppc405_uc.c        |   24 ++++----
>  hw/ppc4xx_devs.c      |    6 +-
>  hw/ppc4xx_pci.c       |    2 +-
>  hw/ppc_newworld.c     |    2 +-
>  hw/ppc_oldworld.c     |    2 +-
>  hw/ppc_prep.c         |    2 +-
>  hw/ps2.c              |    4 +-
>  hw/rc4030.c           |    2 +-
>  hw/sbi.c              |    2 +-
>  hw/serial.c           |    2 +-
>  hw/slavio_intctl.c    |    2 +-
>  hw/slavio_misc.c      |    2 +-
>  hw/slavio_timer.c     |    2 +-
>  hw/sparc32_dma.c      |    2 +-
>  hw/sun4c_intctl.c     |    2 +-
>  hw/sun4m.c            |   10 ++--
>  hw/sun4u.c            |    2 +-
>  hw/tcx.c              |    2 +-
>  hw/tsc2005.c          |    2 +-
>  hw/tsc210x.c          |    4 +-
>  hw/unin_pci.c         |    2 +-
>  hw/usb-ohci.c         |    2 +-
>  hw/vga.c              |    2 +-
>  hw/virtio.c           |    2 +-
>  kvm-all.c             |  139 +++++++++++++++++++++++++++++++++++--------------
>  kvm.h                 |    5 +-
>  target-i386/machine.c |    4 ++
>  target-ppc/machine.c  |    5 ++
>  vl.c                  |   16 ++++--
>  81 files changed, 240 insertions(+), 155 deletions(-)
>
>
>
>   


-- 
Regards,

Anthony Liguori

* [Qemu-devel] Re: [PATCH  0/8] kvm: Fixes, cleanups and live migration
  2009-05-01 22:30 ` [Qemu-devel] Re: [PATCH 0/8] kvm: Fixes, cleanups and live migration Anthony Liguori
@ 2009-05-01 22:49   ` Anthony Liguori
  2009-05-01 22:49   ` Jan Kiszka
  1 sibling, 0 replies; 48+ messages in thread
From: Anthony Liguori @ 2009-05-01 22:49 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, qemu-devel

Anthony Liguori wrote:
> Jan Kiszka wrote:
>> Besides refreshed versions of my already posted fixes and cleanups for
>> KVM, this series comes with patches to enable live migration in KVM
>> mode. If there is still some migration bit missing compared to qemu-kvm,
>> please let me know.
>>
>> Find the patches also at git://git.kiszka.org/qemu.git queues/kvm
>>   
>
> In the future, can you use topic branches so I can pull more easily?
>
> You've got a fair bit more than this series in that branch.

Ignore that, I just misread.

-- 
Regards,

Anthony Liguori

* [Qemu-devel] [PATCH  9/8] kvm: Save/restore TSC counter
  2009-05-01 21:17 [Qemu-devel] [PATCH 0/8] kvm: Fixes, cleanups and live migration Jan Kiszka
                   ` (9 preceding siblings ...)
  2009-05-01 22:40 ` Anthony Liguori
@ 2009-05-01 22:49 ` Jan Kiszka
  2009-05-01 22:51   ` [Qemu-devel] " Anthony Liguori
  2009-05-02  0:08   ` [Qemu-devel] [PATCH 9/8] kvm: x86: Save/restore KVM-specific CPU states Jan Kiszka
  10 siblings, 2 replies; 48+ messages in thread
From: Jan Kiszka @ 2009-05-01 22:49 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Avi Kivity, qemu-devel

This bit was still missing for stable save/restore or migrate. Only KVM
uses CPUState::tsc, so this value was not yet included into the CPU
snapshot.
---

 target-i386/cpu.h     |    2 +-
 target-i386/machine.c |    8 ++++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index c6bca94..538cbb1 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -837,7 +837,7 @@ static inline int cpu_get_time_fast(void)
 #define cpu_signal_handler cpu_x86_signal_handler
 #define cpu_list x86_cpu_list
 
-#define CPU_SAVE_VERSION 8
+#define CPU_SAVE_VERSION 9
 
 /* MMU modes definitions */
 #define MMU_MODE0_SUFFIX _kernel
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 4fc7335..27aebf4 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -149,6 +149,8 @@ void cpu_save(QEMUFile *f, void *opaque)
         qemu_put_be64s(f, &env->mtrr_var[i].base);
         qemu_put_be64s(f, &env->mtrr_var[i].mask);
     }
+
+    qemu_put_be64s(f, &env->tsc);
 }
 
 #ifdef USE_X86LDOUBLE
@@ -183,8 +185,7 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
     uint16_t fpus, fpuc, fptag, fpregs_format;
     int32_t a20_mask;
 
-    if (version_id != 3 && version_id != 4 && version_id != 5
-        && version_id != 6 && version_id != 7 && version_id != 8)
+    if (version_id < 3 || version_id > CPU_SAVE_VERSION)
         return -EINVAL;
     for(i = 0; i < CPU_NB_REGS; i++)
         qemu_get_betls(f, &env->regs[i]);
@@ -328,6 +329,9 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
             qemu_get_be64s(f, &env->mtrr_var[i].mask);
         }
     }
+    if (version_id >= 9) {
+        qemu_get_be64s(f, &env->tsc);
+    }
 
     /* XXX: ensure compatiblity for halted bit ? */
     /* XXX: compute redundant hflags bits */

* [Qemu-devel] Re: [PATCH  0/8] kvm: Fixes, cleanups and live migration
  2009-05-01 22:30 ` [Qemu-devel] Re: [PATCH 0/8] kvm: Fixes, cleanups and live migration Anthony Liguori
  2009-05-01 22:49   ` Anthony Liguori
@ 2009-05-01 22:49   ` Jan Kiszka
  1 sibling, 0 replies; 48+ messages in thread
From: Jan Kiszka @ 2009-05-01 22:49 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Avi Kivity, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 655 bytes --]

Anthony Liguori wrote:
> Jan Kiszka wrote:
>> Besides refreshed versions of my already posted fixes and cleanups for
>> KVM, this series comes with patches to enable live migration in KVM
>> mode. If there is still some migration bit missing compared to qemu-kvm,
>> please let me know.
>>
>> Find the patches also at git://git.kiszka.org/qemu.git queues/kvm
>>   
> 
> In the future, can you use topic branches so I can pull more easily?
> 
> You've got a fair bit more than this series in that branch.
> 

Sorry, I forgot to push, so you likely got old content. Should be fine
now, including the ninth patch I just sent out.

Jan



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]

* [Qemu-devel] Re: [PATCH  9/8] kvm: Save/restore TSC counter
  2009-05-01 22:49 ` [Qemu-devel] [PATCH 9/8] kvm: Save/restore TSC counter Jan Kiszka
@ 2009-05-01 22:51   ` Anthony Liguori
  2009-05-01 22:58     ` Jan Kiszka
  2009-05-02  0:08   ` [Qemu-devel] [PATCH 9/8] kvm: x86: Save/restore KVM-specific CPU states Jan Kiszka
  1 sibling, 1 reply; 48+ messages in thread
From: Anthony Liguori @ 2009-05-01 22:51 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, qemu-devel

Jan Kiszka wrote:
> This bit was still missing for stable save/restore or migrate. Only KVM
> uses CPUState::tsc, so this value was not yet included into the CPU
> snapshot.
>   

Needs a Signed-off-by.

Let's try to collapse all the kvm needed changes into a single version 
bump.  kvm-userspace already is using 9 so we'll need to bump it to 10.

-- 
Regards,

Anthony Liguori

* [Qemu-devel] Re: [PATCH  0/8] kvm: Fixes, cleanups and live migration
  2009-05-01 22:40 ` Anthony Liguori
@ 2009-05-01 22:56   ` Jan Kiszka
  2009-05-02  8:07     ` Avi Kivity
  2009-05-02  7:40   ` Gleb Natapov
  1 sibling, 1 reply; 48+ messages in thread
From: Jan Kiszka @ 2009-05-01 22:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Avi Kivity, qemu-devel

Anthony Liguori wrote:
> Jan Kiszka wrote:
>> Besides refreshed versions of my already posted fixes and cleanups for
>> KVM, this series comes with patches to enable live migration in KVM
>> mode. If there is still some migration bit missing compared to qemu-kvm,
>> please let me know.
>>   
> 
> In qemu-kvm, we also save:
> 
> MSR_IA32_TSC

That should be cured by patch 9 now. At least it works for me (TM).

> mp_state

There is no mp_state upstream yet.

> interrupt_bitmap
> 
> In QEMU, we probably should just emulate MSR_IA32_TSC and then we can
> reuse that for save/restore.
> 
> interrupt_bitmap is a bit more tricky.  It only ever can have one bit
> set AFAICT but I don't see anything that we're currently saving that
> maps to it which leads me to wonder why KVM needs it and QEMU doesn't.

As far as I understood it, interrupt_bitmap will become relevant for
in-kernel irqchips.

Hmm, I wonder if we should pick up the change and align the CPU on-disk
format with KVM in revision 9, even if we do not yet need all fields.
What do you think?

Jan


* [Qemu-devel] Re: [PATCH  9/8] kvm: Save/restore TSC counter
  2009-05-01 22:51   ` [Qemu-devel] " Anthony Liguori
@ 2009-05-01 22:58     ` Jan Kiszka
  2009-05-01 23:09       ` Jan Kiszka
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Kiszka @ 2009-05-01 22:58 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Avi Kivity, qemu-devel

Anthony Liguori wrote:
> Jan Kiszka wrote:
>> This bit was still missing for stable save/restore or migrate. Only KVM
>> uses CPUState::tsc, so this value was not yet included into the CPU
>> snapshot.
>>   
> 
> Needs a Signed-off-by.

Premature submission...

> 
> Let's try to collapse all the kvm needed changes into a single version
> bump.  kvm-userspace already is using 9 so we'll need to bump it to 10.

Can't we make our format 9 compatible to kvm's number 9?

Jan


* [Qemu-devel] Re: [PATCH  9/8] kvm: Save/restore TSC counter
  2009-05-01 22:58     ` Jan Kiszka
@ 2009-05-01 23:09       ` Jan Kiszka
  2009-05-01 23:18         ` Anthony Liguori
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Kiszka @ 2009-05-01 23:09 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Avi Kivity, qemu-devel

Jan Kiszka wrote:
> Anthony Liguori wrote:
>> Let's try to collapse all the kvm needed changes into a single version
>> bump.  kvm-userspace already is using 9 so we'll need to bump it to 10.
> 
> Can't we make our format 9 compatible to kvm's number 9?

qemu-kvm is at v8 ATM, so we could meet at v9 with an identical format.
Just needs a bit more work to sync mp_state and check that
interrupt_bitmap is already properly maintained.

Jan


* [Qemu-devel] Re: [PATCH  9/8] kvm: Save/restore TSC counter
  2009-05-01 23:09       ` Jan Kiszka
@ 2009-05-01 23:18         ` Anthony Liguori
  0 siblings, 0 replies; 48+ messages in thread
From: Anthony Liguori @ 2009-05-01 23:18 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Avi Kivity, qemu-devel

Jan Kiszka wrote:
> Jan Kiszka wrote:
>   
>> Anthony Liguori wrote:
>>     
>>> Let's try to collapse all the kvm needed changes into a single version
>>> bump.  kvm-userspace already is using 9 so we'll need to bump it to 10.
>>>       
>> Can't we make our format 9 compatible to kvm's number 9?
>>     
>
> qemu-kvm is at v8 ATM, so we could meet at v9 with an identical format.
> Just needs a bit more work to sync mp_state and check that
> interrupt_bitmap is already properly maintained.
>   

Yeah, I don't mind making v9 a different format than what is in qemu-kvm.

Regards,

Anthony Liguori

> Jan
>
>   


-- 
Regards,

Anthony Liguori

* Re: [Qemu-devel] [PATCH 7/8] Introduce reset notifier order
  2009-05-01 21:17 ` [Qemu-devel] [PATCH 7/8] Introduce reset notifier order Jan Kiszka
@ 2009-05-01 23:52   ` Paul Brook
  2009-05-02  0:05     ` [Qemu-devel] " Jan Kiszka
  0 siblings, 1 reply; 48+ messages in thread
From: Paul Brook @ 2009-05-01 23:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Kiszka, Anthony Liguori, Avi Kivity

On Friday 01 May 2009, Jan Kiszka wrote:
> Add the parameter 'order' to qemu_register_reset and sort callbacks on
> registration. On system reset, callbacks with lower order will be
> invoked before those with higher order. Update all existing users to the
> standard order 0.
>
> Note: At least for x86, the existing users seem to assume that handlers
> are called in their registration order. Therefore, the patch preserves
> this property. If someone feels bored, (s)he could try to identify this
> dependency and express it properly on callback registration.

Why do we need this? Why isn't creation order good enough?

Paul

* [Qemu-devel] Re: [PATCH 7/8] Introduce reset notifier order
  2009-05-01 23:52   ` Paul Brook
@ 2009-05-02  0:05     ` Jan Kiszka
  2009-05-02  0:34       ` Paul Brook
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Kiszka @ 2009-05-02  0:05 UTC (permalink / raw)
  To: Paul Brook; +Cc: Anthony Liguori, qemu-devel, Avi Kivity

Paul Brook wrote:
> On Friday 01 May 2009, Jan Kiszka wrote:
>> Add the parameter 'order' to qemu_register_reset and sort callbacks on
>> registration. On system reset, callbacks with lower order will be
>> invoked before those with higher order. Update all existing users to the
>> standard order 0.
>>
>> Note: At least for x86, the existing users seem to assume that handlers
>> are called in their registration order. Therefore, the patch preserves
>> this property. If someone feels bored, (s)he could try to identify this
>> dependency and express it properly on callback registration.
> 
> Why do we need this? Why isn't creation order good enough?

At the latest once reset handlers are properly deregistered on device
unplug, the registration order is no longer static. Even today it is only
implied by the code organization, which can break during refactoring, BTW.
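
Jan's point can be made concrete with a toy table. The names `reg`, `unreg` and `position` are hypothetical, and a sorted array stands in for QEMU's list. When every handler uses the same order, unplugging and re-plugging a device moves its handler to the end of the reset sequence. With an explicit order, the handler returns to the same slot:

```c
#include <assert.h>
#include <string.h>

/* Toy reset-handler table, kept sorted by order; names are
 * hypothetical and no QEMU API is implied. */
#define MAX_HANDLERS 8

typedef struct {
    const char *name;
    int order;
} Handler;

static Handler table[MAX_HANDLERS];
static int n_handlers;

/* Insert after every entry whose order is <= the new one. If all
 * orders are equal, this is plain registration (append) order. */
static void reg(const char *name, int order)
{
    int i = n_handlers;

    assert(n_handlers < MAX_HANDLERS);
    while (i > 0 && table[i - 1].order > order) {
        table[i] = table[i - 1];    /* shift larger orders up */
        i--;
    }
    table[i].name = name;
    table[i].order = order;
    n_handlers++;
}

/* Remove a handler, as a device would on unplug. */
static void unreg(const char *name)
{
    for (int i = 0; i < n_handlers; i++) {
        if (strcmp(table[i].name, name) == 0) {
            memmove(&table[i], &table[i + 1],
                    (size_t)(n_handlers - i - 1) * sizeof(table[0]));
            n_handlers--;
            return;
        }
    }
}

/* Index of a handler in the reset sequence, or -1 if absent. */
static int position(const char *name)
{
    for (int i = 0; i < n_handlers; i++) {
        if (strcmp(table[i].name, name) == 0) {
            return i;
        }
    }
    return -1;
}
```

Registering "cpu", "dev" and "pic" all at order 0, then unplugging and re-plugging "dev", moves "dev" from position 1 to position 2. With orders 0, 1 and 2, it stays at position 1.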

Jan


* [Qemu-devel] [PATCH 9/8] kvm: x86: Save/restore KVM-specific CPU states
  2009-05-01 22:49 ` [Qemu-devel] [PATCH 9/8] kvm: Save/restore TSC counter Jan Kiszka
  2009-05-01 22:51   ` [Qemu-devel] " Anthony Liguori
@ 2009-05-02  0:08   ` Jan Kiszka
  2009-05-02  0:20     ` [Qemu-devel] [PATCH 9/8 v2] " Jan Kiszka
  1 sibling, 1 reply; 48+ messages in thread
From: Jan Kiszka @ 2009-05-02  0:08 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Avi Kivity, qemu-devel

Save and restore the KVM-specific CPU states that have been neglected so
far. Handling the TSC stabilizes migration in KVM mode. interrupt_bitmap
and mp_state are currently unused, but they will become relevant for
in-kernel irqchip support. Saving and restoring them now avoids having to
increment CPU_SAVE_VERSION yet again later on.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

 kvm-all.c             |   20 ++++++++++++++++++++
 kvm.h                 |    3 +++
 target-i386/cpu.h     |    3 ++-
 target-i386/kvm.c     |   10 ++++++++++
 target-i386/machine.c |   16 ++++++++++++++--
 5 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 2ac5129..f17055f 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -181,6 +181,26 @@ err:
     return ret;
 }
 
+int kvm_put_mp_state(CPUState *env)
+{
+    struct kvm_mp_state mp_state = { .mp_state = env->mp_state };
+
+    return kvm_vcpu_ioctl(env, KVM_SET_MP_STATE, &mp_state);
+}
+
+int kvm_get_mp_state(CPUState *env)
+{
+    struct kvm_mp_state mp_state;
+    int ret;
+
+    ret = kvm_vcpu_ioctl(env, KVM_GET_MP_STATE, &mp_state);
+    if (ret < 0) {
+        return ret;
+    }
+    env->mp_state = mp_state.mp_state;
+    return 0;
+}
+
 int kvm_sync_vcpus(void)
 {
     CPUState *env;
diff --git a/kvm.h b/kvm.h
index 6e0589a..8256eb6 100644
--- a/kvm.h
+++ b/kvm.h
@@ -72,6 +72,9 @@ int kvm_vm_ioctl(KVMState *s, int type, ...);
 
 int kvm_vcpu_ioctl(CPUState *env, int type, ...);
 
+int kvm_get_mp_state(CPUState *env);
+int kvm_put_mp_state(CPUState *env);
+
 /* Arch specific hooks */
 
 int kvm_arch_post_run(CPUState *env, struct kvm_run *run);
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index c6bca94..eaa623c 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -669,6 +669,7 @@ typedef struct CPUX86State {
 
     /* For KVM */
     uint64_t interrupt_bitmap[256 / 64];
+    uint32_t mp_state;
 
     /* in order to simplify APIC support, we leave this pointer to the
        user */
@@ -837,7 +838,7 @@ static inline int cpu_get_time_fast(void)
 #define cpu_signal_handler cpu_x86_signal_handler
 #define cpu_list x86_cpu_list
 
-#define CPU_SAVE_VERSION 8
+#define CPU_SAVE_VERSION 9
 
 /* MMU modes definitions */
 #define MMU_MODE0_SUFFIX _kernel
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 2de8b81..f65ae00 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -43,6 +43,8 @@ int kvm_arch_init_vcpu(CPUState *env)
     uint32_t limit, i, j, cpuid_i;
     uint32_t unused;
 
+    env->mp_state = KVM_MP_STATE_UNINITIALIZED;
+
     cpuid_i = 0;
 
     cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);
@@ -565,6 +567,10 @@ int kvm_arch_put_registers(CPUState *env)
     if (ret < 0)
         return ret;
 
+    ret = kvm_put_mp_state(env);
+    if (ret < 0)
+        return ret;
+
     return 0;
 }
 
@@ -588,6 +594,10 @@ int kvm_arch_get_registers(CPUState *env)
     if (ret < 0)
         return ret;
 
+    ret = kvm_get_mp_state(env);
+    if (ret < 0)
+        return ret;
+
     return 0;
 }
 
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 4fc7335..e1ba0d5 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -149,6 +149,12 @@ void cpu_save(QEMUFile *f, void *opaque)
         qemu_put_be64s(f, &env->mtrr_var[i].base);
         qemu_put_be64s(f, &env->mtrr_var[i].mask);
     }
+
+    for (i = 0; i < sizeof(env->interrupt_bitmap)/8; i++) {
+        qemu_put_be64s(f, &env->interrupt_bitmap[i]);
+    }
+    qemu_put_be64s(f, &env->tsc);
+    qemu_put_be32s(f, &env->mp_state);
 }
 
 #ifdef USE_X86LDOUBLE
@@ -183,8 +189,7 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
     uint16_t fpus, fpuc, fptag, fpregs_format;
     int32_t a20_mask;
 
-    if (version_id != 3 && version_id != 4 && version_id != 5
-        && version_id != 6 && version_id != 7 && version_id != 8)
+    if (version_id < 3 || version_id > CPU_SAVE_VERSION)
         return -EINVAL;
     for(i = 0; i < CPU_NB_REGS; i++)
         qemu_get_betls(f, &env->regs[i]);
@@ -328,6 +333,13 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
             qemu_get_be64s(f, &env->mtrr_var[i].mask);
         }
     }
+    if (version_id >= 9) {
+        for (i = 0; i < sizeof(env->interrupt_bitmap)/8; i++) {
+            qemu_get_be64s(f, &env->interrupt_bitmap[i]);
+        }
+        qemu_get_be64s(f, &env->tsc);
+        qemu_get_be32s(f, &env->mp_state);
+    }
 
     /* XXX: ensure compatiblity for halted bit ? */
     /* XXX: compute redundant hflags bits */


* [Qemu-devel] [PATCH 9/8 v2] kvm: x86: Save/restore KVM-specific CPU states
  2009-05-02  0:08   ` [Qemu-devel] [PATCH 9/8] kvm: x86: Save/restore KVM-specific CPU states Jan Kiszka
@ 2009-05-02  0:20     ` Jan Kiszka
  0 siblings, 0 replies; 48+ messages in thread
From: Jan Kiszka @ 2009-05-02  0:20 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Avi Kivity, qemu-devel

Save and restore the KVM-specific CPU state that has been neglected so
far. Handling the TSC stabilizes migration in KVM mode. The
interrupt_bitmap and mp_state fields are currently unused, but will
become relevant for in-kernel irqchip support. By saving/restoring them
already now, we avoid having to increment CPU_SAVE_VERSION once again
later on.

v2:
 - initialize mp_state to runnable (KVM_MP_STATE_RUNNABLE) for the boot CPU

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

 kvm-all.c             |   20 ++++++++++++++++++++
 kvm.h                 |    3 +++
 target-i386/cpu.h     |    3 ++-
 target-i386/kvm.c     |   10 ++++++++++
 target-i386/machine.c |   16 ++++++++++++++--
 5 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 2ac5129..f17055f 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -181,6 +181,26 @@ err:
     return ret;
 }
 
+int kvm_put_mp_state(CPUState *env)
+{
+    struct kvm_mp_state mp_state = { .mp_state = env->mp_state };
+
+    return kvm_vcpu_ioctl(env, KVM_SET_MP_STATE, &mp_state);
+}
+
+int kvm_get_mp_state(CPUState *env)
+{
+    struct kvm_mp_state mp_state;
+    int ret;
+
+    ret = kvm_vcpu_ioctl(env, KVM_GET_MP_STATE, &mp_state);
+    if (ret < 0) {
+        return ret;
+    }
+    env->mp_state = mp_state.mp_state;
+    return 0;
+}
+
 int kvm_sync_vcpus(void)
 {
     CPUState *env;
diff --git a/kvm.h b/kvm.h
index 6e0589a..8256eb6 100644
--- a/kvm.h
+++ b/kvm.h
@@ -72,6 +72,9 @@ int kvm_vm_ioctl(KVMState *s, int type, ...);
 
 int kvm_vcpu_ioctl(CPUState *env, int type, ...);
 
+int kvm_get_mp_state(CPUState *env);
+int kvm_put_mp_state(CPUState *env);
+
 /* Arch specific hooks */
 
 int kvm_arch_post_run(CPUState *env, struct kvm_run *run);
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index c6bca94..eaa623c 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -669,6 +669,7 @@ typedef struct CPUX86State {
 
     /* For KVM */
     uint64_t interrupt_bitmap[256 / 64];
+    uint32_t mp_state;
 
     /* in order to simplify APIC support, we leave this pointer to the
        user */
@@ -837,7 +838,7 @@ static inline int cpu_get_time_fast(void)
 #define cpu_signal_handler cpu_x86_signal_handler
 #define cpu_list x86_cpu_list
 
-#define CPU_SAVE_VERSION 8
+#define CPU_SAVE_VERSION 9
 
 /* MMU modes definitions */
 #define MMU_MODE0_SUFFIX _kernel
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 2de8b81..00e5b1a 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -43,6 +43,8 @@ int kvm_arch_init_vcpu(CPUState *env)
     uint32_t limit, i, j, cpuid_i;
     uint32_t unused;
 
+    env->mp_state = KVM_MP_STATE_RUNNABLE;
+
     cpuid_i = 0;
 
     cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);
@@ -565,6 +567,10 @@ int kvm_arch_put_registers(CPUState *env)
     if (ret < 0)
         return ret;
 
+    ret = kvm_put_mp_state(env);
+    if (ret < 0)
+        return ret;
+
     return 0;
 }
 
@@ -588,6 +594,10 @@ int kvm_arch_get_registers(CPUState *env)
     if (ret < 0)
         return ret;
 
+    ret = kvm_get_mp_state(env);
+    if (ret < 0)
+        return ret;
+
     return 0;
 }
 
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 4fc7335..e1ba0d5 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -149,6 +149,12 @@ void cpu_save(QEMUFile *f, void *opaque)
         qemu_put_be64s(f, &env->mtrr_var[i].base);
         qemu_put_be64s(f, &env->mtrr_var[i].mask);
     }
+
+    for (i = 0; i < sizeof(env->interrupt_bitmap)/8; i++) {
+        qemu_put_be64s(f, &env->interrupt_bitmap[i]);
+    }
+    qemu_put_be64s(f, &env->tsc);
+    qemu_put_be32s(f, &env->mp_state);
 }
 
 #ifdef USE_X86LDOUBLE
@@ -183,8 +189,7 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
     uint16_t fpus, fpuc, fptag, fpregs_format;
     int32_t a20_mask;
 
-    if (version_id != 3 && version_id != 4 && version_id != 5
-        && version_id != 6 && version_id != 7 && version_id != 8)
+    if (version_id < 3 || version_id > CPU_SAVE_VERSION)
         return -EINVAL;
     for(i = 0; i < CPU_NB_REGS; i++)
         qemu_get_betls(f, &env->regs[i]);
@@ -328,6 +333,13 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
             qemu_get_be64s(f, &env->mtrr_var[i].mask);
         }
     }
+    if (version_id >= 9) {
+        for (i = 0; i < sizeof(env->interrupt_bitmap)/8; i++) {
+            qemu_get_be64s(f, &env->interrupt_bitmap[i]);
+        }
+        qemu_get_be64s(f, &env->tsc);
+        qemu_get_be32s(f, &env->mp_state);
+    }
 
     /* XXX: ensure compatiblity for halted bit ? */
     /* XXX: compute redundant hflags bits */


* Re: [Qemu-devel] Re: [PATCH 7/8] Introduce reset notifier order
  2009-05-02  0:05     ` [Qemu-devel] " Jan Kiszka
@ 2009-05-02  0:34       ` Paul Brook
  2009-05-04  7:45         ` Jan Kiszka
  0 siblings, 1 reply; 48+ messages in thread
From: Paul Brook @ 2009-05-02  0:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: Anthony Liguori, Jan Kiszka, Avi Kivity

On Saturday 02 May 2009, Jan Kiszka wrote:
> Paul Brook wrote:
> > On Friday 01 May 2009, Jan Kiszka wrote:
> >> Add the parameter 'order' to qemu_register_reset and sort callbacks on
> >> registration. On system reset, callbacks with lower order will be
> >> invoked before those with higher order. Update all existing users to the
> >> standard order 0.
> >>
> >> Note: At least for x86, the existing users seem to assume that handlers
> >> are called in their registration order. Therefore, the patch preserves
> >> this property. If someone feels bored, (s)he could try to identify this
> >> dependency and express it properly on callback registration.
> >
> > Why do we need this? Why isn't creation order good enough?
>
> At latest when properly deregistering reset handlers again on device
> unplug, the registration order is no longer a static thing, manifested
> in the code organization - which can also break due to refactoring, BTW.

I'm afraid I can't make any sense of this. What exactly are you trying to 
solve?

Paul


* Re: [Qemu-devel] Re: [PATCH  0/8] kvm: Fixes, cleanups and live migration
  2009-05-01 22:40 ` Anthony Liguori
  2009-05-01 22:56   ` Jan Kiszka
@ 2009-05-02  7:40   ` Gleb Natapov
  2009-05-02 13:50     ` Anthony Liguori
  1 sibling, 1 reply; 48+ messages in thread
From: Gleb Natapov @ 2009-05-02  7:40 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Jan Kiszka, Avi Kivity, qemu-devel

On Fri, May 01, 2009 at 05:40:00PM -0500, Anthony Liguori wrote:
> Jan Kiszka wrote:
>> Besides refreshed versions of my already posted fixes and cleanups for
>> KVM, this series comes with patches to enable live migration in KVM
>> mode. If there is still some migration bit missing compared to qemu-kvm,
>> please let me know.
>>   
>
> In qemu-kvm, we also save:
>
> MSR_IA32_TSC
> mp_state
> interrupt_bitmap
>
> In QEMU, we probably should just emulate MSR_IA32_TSC and then we can  
> reuse that for save/restore.
>
> interrupt_bitmap is a bit more tricky.  It only ever can have one bit  
> set AFAICT but I don't see anything that we're currently saving that  
> maps to it which leads me to wonder why KVM needs it and QEMU doesn't.
>
KVM stores the pending interrupt there (an interrupt that is no longer in
the irq chip but has not yet been injected into the vcpu). I have a patch
(not yet submitted) that removes the use of interrupt_bitmap from the
kernel; the bitmap is then recreated only for the migration bits. My plan
is to change the migration bits too, but that is trickier if we want to
keep backward compatibility. IMO there is no need to introduce
interrupt_bitmap in upstream qemu. We do need to migrate the pending
interrupt, though. The info that should be migrated is the interrupt
vector, the interrupt type (soft/hw) and the instruction length (needed
for injection of soft interrupts on VMX).

--
			Gleb.


* [Qemu-devel] Re: [PATCH  0/8] kvm: Fixes, cleanups and live migration
  2009-05-01 22:56   ` Jan Kiszka
@ 2009-05-02  8:07     ` Avi Kivity
  0 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-05-02  8:07 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Anthony Liguori, qemu-devel

Jan Kiszka wrote:
> As far as I understood it, interrupt_bitmap will become relevant for
> in-kernel irqchips.
>   

It's relevant for both.  TCG is able to ack an interrupt and inject it
atomically, but KVM is not (since injection is part of guest entry,
which does not have bounded execution time).  Hence the need to split
the ack and the injection into two steps, and to save the intermediate
state.



-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


* Re: [Qemu-devel] Re: [PATCH  0/8] kvm: Fixes, cleanups and live migration
  2009-05-02  7:40   ` Gleb Natapov
@ 2009-05-02 13:50     ` Anthony Liguori
  2009-05-02 17:23       ` Gleb Natapov
  0 siblings, 1 reply; 48+ messages in thread
From: Anthony Liguori @ 2009-05-02 13:50 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Anthony Liguori, Jan Kiszka, Avi Kivity, qemu-devel

Gleb Natapov wrote:
> On Fri, May 01, 2009 at 05:40:00PM -0500, Anthony Liguori wrote:
>   
>> Jan Kiszka wrote:
>>     
>>> Besides refreshed versions of my already posted fixes and cleanups for
>>> KVM, this series comes with patches to enable live migration in KVM
>>> mode. If there is still some migration bit missing compared to qemu-kvm,
>>> please let me know.
>>>   
>>>       
>> In qemu-kvm, we also save:
>>
>> MSR_IA32_TSC
>> mp_state
>> interrupt_bitmap
>>
>> In QEMU, we probably should just emulate MSR_IA32_TSC and then we can  
>> reuse that for save/restore.
>>
>> interrupt_bitmap is a bit more tricky.  It only ever can have one bit  
>> set AFAICT but I don't see anything that we're currently saving that  
>> maps to it which leads me to wonder why KVM needs it and QEMU doesn't.
>>
>>     
> KVM stores pending interrupt there (interrupt that is not in irq chip
> already but not yet injected to vcpu). I have patch (not yet submitted)
> that removes the use of interrupt_bitmap from kernel. It recreated only
> for migration bits. My plan is to change migration bits too, but it is
> trickier if we want to keep backwards compatibility. No need to
> introduce interrupt_bitmap in qemu upstream IMO. We need to migrate
> pending interrupt though. The info that should be migrated it interrupt
> vector, interrupt type (soft/hw) and instruction length (needed for
> injection of soft interrupts on VMX).
>   

I think the right thing to do with this is to introduce a kvm-cpu savevm
section that stores this information, since it isn't relevant to TCG.  I
think it's arguable whether you want the instruction length there (can
you get it reliably on SVM?).

If we did that, version 1 of the section could be interrupt_bitmap, and
version 2 could be this new data layout.
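Here is a hedged C sketch of how a loader for such a separately versioned section might dispatch on version_id, in the same style as cpu_load() in the patch above. Version 1 reads the legacy interrupt_bitmap as four big-endian 64-bit words. Version 2 reads explicit fields. The byte stream (a stand-in for QEMUFile), the struct kvm_cpu_state layout and kvm_cpu_load() are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical byte stream standing in for QEMUFile. */
struct stream {
    const uint8_t *buf;
    size_t len, pos;
};

static int get_be64(struct stream *s, uint64_t *v)
{
    if (s->len - s->pos < 8) {
        return -1;
    }
    *v = 0;
    for (int i = 0; i < 8; i++) {
        *v = (*v << 8) | s->buf[s->pos++];
    }
    return 0;
}

static int get_u8(struct stream *s, uint8_t *v)
{
    if (s->pos >= s->len) {
        return -1;
    }
    *v = s->buf[s->pos++];
    return 0;
}

/* Hypothetical contents of a separate "kvm-cpu" section. */
struct kvm_cpu_state {
    uint8_t has_pending;
    uint8_t vector;
    uint8_t type;       /* 0 = hardware interrupt, 1 = software interrupt */
    uint8_t insn_len;
};

/* Version 1: legacy interrupt_bitmap as 4 big-endian 64-bit words.
 * Version 2: the explicit fields, one byte each. */
int kvm_cpu_load(struct stream *s, struct kvm_cpu_state *st, int version_id)
{
    assert(s != NULL && st != NULL);
    memset(st, 0, sizeof(*st));
    if (version_id == 1) {
        for (int w = 0; w < 4; w++) {
            uint64_t word;

            if (get_be64(s, &word) < 0) {
                return -EIO;
            }
            for (int b = 0; b < 64; b++) {
                if (word & (1ULL << b)) {   /* at most one bit expected */
                    st->has_pending = 1;
                    st->vector = (uint8_t)(w * 64 + b);
                }
            }
        }
        return 0;
    }
    if (version_id == 2) {
        if (get_u8(s, &st->has_pending) < 0 || get_u8(s, &st->vector) < 0 ||
            get_u8(s, &st->type) < 0 || get_u8(s, &st->insn_len) < 0) {
            return -EIO;
        }
        return 0;
    }
    return -EINVAL;
}
```

Old streams keep loading through the version 1 branch, while new streams carry the explicit layout. Unknown versions are rejected, like the range check in cpu_load().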

Regards,

Anthony Liguori

> --
> 			Gleb.
>
>
>   


* Re: [Qemu-devel] Re: [PATCH  0/8] kvm: Fixes, cleanups and live migration
  2009-05-02 13:50     ` Anthony Liguori
@ 2009-05-02 17:23       ` Gleb Natapov
  2009-05-02 19:12         ` Avi Kivity
  0 siblings, 1 reply; 48+ messages in thread
From: Gleb Natapov @ 2009-05-02 17:23 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Anthony Liguori, Jan Kiszka, Avi Kivity, qemu-devel

On Sat, May 02, 2009 at 08:50:08AM -0500, Anthony Liguori wrote:
> Gleb Natapov wrote:
>> On Fri, May 01, 2009 at 05:40:00PM -0500, Anthony Liguori wrote:
>>   
>>> Jan Kiszka wrote:
>>>     
>>>> Besides refreshed versions of my already posted fixes and cleanups for
>>>> KVM, this series comes with patches to enable live migration in KVM
>>>> mode. If there is still some migration bit missing compared to qemu-kvm,
>>>> please let me know.
>>>>         
>>> In qemu-kvm, we also save:
>>>
>>> MSR_IA32_TSC
>>> mp_state
>>> interrupt_bitmap
>>>
>>> In QEMU, we probably should just emulate MSR_IA32_TSC and then we can 
>>>  reuse that for save/restore.
>>>
>>> interrupt_bitmap is a bit more tricky.  It only ever can have one bit 
>>>  set AFAICT but I don't see anything that we're currently saving that 
>>>  maps to it which leads me to wonder why KVM needs it and QEMU 
>>> doesn't.
>>>
>>>     
>> KVM stores pending interrupt there (interrupt that is not in irq chip
>> already but not yet injected to vcpu). I have patch (not yet submitted)
>> that removes the use of interrupt_bitmap from kernel. It recreated only
>> for migration bits. My plan is to change migration bits too, but it is
>> trickier if we want to keep backwards compatibility. No need to
>> introduce interrupt_bitmap in qemu upstream IMO. We need to migrate
>> pending interrupt though. The info that should be migrated it interrupt
>> vector, interrupt type (soft/hw) and instruction length (needed for
>> injection of soft interrupts on VMX).
>>   
>
> I think the right thing to do with this is introduce a kvm-cpu savevm  
> that stores this information since it isn't relevant to TCG.  I think  
> it's arguable whether you want instruction length there (can you get it  
> reliably on SVM?).
>
We can't get it on SVM without instruction decoding, but it is not
required on SVM. It is absolutely essential for soft interrupt/exception
injection on VMX and has to be part of the migratable state.

> If we did that, v1 could be interrupt_bitmap and then v2 can be this new  
> data layout.
>
> Regards,
>
> Anthony Liguori
>
>> --
>> 			Gleb.
>>
>>
>>   

--
			Gleb.


* Re: [Qemu-devel] Re: [PATCH  0/8] kvm: Fixes, cleanups and live migration
  2009-05-02 17:23       ` Gleb Natapov
@ 2009-05-02 19:12         ` Avi Kivity
  2009-05-02 20:07           ` Gleb Natapov
  0 siblings, 1 reply; 48+ messages in thread
From: Avi Kivity @ 2009-05-02 19:12 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Anthony Liguori, Jan Kiszka, qemu-devel

Gleb Natapov wrote:
>>
>> I think the right thing to do with this is introduce a kvm-cpu savevm  
>> that stores this information since it isn't relevant to TCG.  I think  
>> it's arguable whether you want instruction length there (can you get it  
>> reliably on SVM?).
>>
>>     
> We can't get it on SVM without instruction decoding, but it is not required
> on SVM. It is absolutely essential for soft interrupt/exception injection
> on VMX and has to be a part of migratable state.
>   

We need it in some neutral form so cross-vendor migration can work.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


* Re: [Qemu-devel] Re: [PATCH  0/8] kvm: Fixes, cleanups and live migration
  2009-05-02 19:12         ` Avi Kivity
@ 2009-05-02 20:07           ` Gleb Natapov
  2009-05-02 20:09             ` [Qemu-devel] Re: [PATCH 0/8] kvm: Fixes, cleanups and livemigration Anthony Liguori
  2009-05-03  5:57             ` [Qemu-devel] Re: [PATCH 0/8] kvm: Fixes, cleanups and live migration Avi Kivity
  0 siblings, 2 replies; 48+ messages in thread
From: Gleb Natapov @ 2009-05-02 20:07 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Anthony Liguori, Jan Kiszka, qemu-devel

On Sat, May 02, 2009 at 10:12:57PM +0300, Avi Kivity wrote:
> Gleb Natapov wrote:
>>>
>>> I think the right thing to do with this is introduce a kvm-cpu savevm 
>>>  that stores this information since it isn't relevant to TCG.  I 
>>> think  it's arguable whether you want instruction length there (can 
>>> you get it  reliably on SVM?).
>>>
>>>     
>> We can't get it on SVM without instruction decoding, but it is not required
>> on SVM. It is absolutely essential for soft interrupt/exception injection
>> on VMX and has to be a part of migratable state.
>>   
>
> We need it in some neutral form so cross-vendor migration can work.
>
VMX->SVM: no problem.
SVM->VMX: bad luck :)  We will have to decode the instruction ourselves.

--
			Gleb.


* Re: [Qemu-devel] Re: [PATCH  0/8] kvm: Fixes, cleanups and  livemigration
  2009-05-02 20:07           ` Gleb Natapov
@ 2009-05-02 20:09             ` Anthony Liguori
  2009-05-03  5:25               ` Gleb Natapov
  2009-05-03  5:57             ` [Qemu-devel] Re: [PATCH 0/8] kvm: Fixes, cleanups and live migration Avi Kivity
  1 sibling, 1 reply; 48+ messages in thread
From: Anthony Liguori @ 2009-05-02 20:09 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Jan Kiszka, Avi Kivity, qemu-devel

Gleb Natapov wrote:
> On Sat, May 02, 2009 at 10:12:57PM +0300, Avi Kivity wrote:
>   
>> Gleb Natapov wrote:
>>     
>>>> I think the right thing to do with this is introduce a kvm-cpu savevm 
>>>>  that stores this information since it isn't relevant to TCG.  I 
>>>> think  it's arguable whether you want instruction length there (can 
>>>> you get it  reliably on SVM?).
>>>>
>>>>     
>>>>         
>>> We can't get it on SVM without instruction decoding, but it is not required
>>> on SVM. It is absolutely essential for soft interrupt/exception injection
>>> on VMX and has to be a part of migratable state.
>>>   
>>>       
>> We need it in some neutral form so cross-vendor migration can work.
>>
>>     
> VMX->SVM No problem.
> SVM->VMX bad luck :)  We will have to decode instruction ourself. 
>   

Any reason to not just always decode the instruction then?

Regards,

Anthony Liguori


* Re: [Qemu-devel] Re: [PATCH  0/8] kvm: Fixes, cleanups and livemigration
  2009-05-02 20:09             ` [Qemu-devel] Re: [PATCH 0/8] kvm: Fixes, cleanups and livemigration Anthony Liguori
@ 2009-05-03  5:25               ` Gleb Natapov
  0 siblings, 0 replies; 48+ messages in thread
From: Gleb Natapov @ 2009-05-03  5:25 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Jan Kiszka, Avi Kivity, qemu-devel

On Sat, May 02, 2009 at 03:09:27PM -0500, Anthony Liguori wrote:
> Gleb Natapov wrote:
>> On Sat, May 02, 2009 at 10:12:57PM +0300, Avi Kivity wrote:
>>   
>>> Gleb Natapov wrote:
>>>     
>>>>> I think the right thing to do with this is introduce a kvm-cpu 
>>>>> savevm  that stores this information since it isn't relevant to 
>>>>> TCG.  I think  it's arguable whether you want instruction length 
>>>>> there (can you get it  reliably on SVM?).
>>>>>
>>>>>             
>>>> We can't get it on SVM without instruction decoding, but it is not required
>>>> on SVM. It is absolutely essential for soft interrupt/exception injection
>>>> on VMX and has to be a part of migratable state.
>>>>         
>>> We need it in some neutral form so cross-vendor migration can work.
>>>
>>>     
>> VMX->SVM No problem.
>> SVM->VMX bad luck :)  We will have to decode instruction ourself.   
>
> Any reason to not just always decode the instruction then?
>
It's the other way around: is there any reason to decode the instruction
if VMX kindly provides the length? :)

But actually, thinking about it, we may not have to decode the
instruction even during SVM->VMX migration. The instruction will simply
be re-executed after migration to VMX instead of the event being
re-injected. Intel advises against that, but given that migration
(especially cross-vendor) is very rare, and having a pending soft
interrupt during migration is even rarer, I think it will do.

--
			Gleb.


* Re: [Qemu-devel] Re: [PATCH  0/8] kvm: Fixes, cleanups and live migration
  2009-05-02 20:07           ` Gleb Natapov
  2009-05-02 20:09             ` [Qemu-devel] Re: [PATCH 0/8] kvm: Fixes, cleanups and livemigration Anthony Liguori
@ 2009-05-03  5:57             ` Avi Kivity
  2009-05-03  6:05               ` Gleb Natapov
  1 sibling, 1 reply; 48+ messages in thread
From: Avi Kivity @ 2009-05-03  5:57 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Anthony Liguori, Jan Kiszka, qemu-devel

Gleb Natapov wrote:
> On Sat, May 02, 2009 at 10:12:57PM +0300, Avi Kivity wrote:
>   
>> Gleb Natapov wrote:
>>     
>>>> I think the right thing to do with this is introduce a kvm-cpu savevm 
>>>>  that stores this information since it isn't relevant to TCG.  I 
>>>> think  it's arguable whether you want instruction length there (can 
>>>> you get it  reliably on SVM?).
>>>>
>>>>     
>>>>         
>>> We can't get it on SVM without instruction decoding, but it is not required
>>> on SVM. It is absolutely essential for soft interrupt/exception injection
>>> on VMX and has to be a part of migratable state.
>>>   
>>>       
>> We need it in some neutral form so cross-vendor migration can work.
>>
>>     
> VMX->SVM No problem.
> SVM->VMX bad luck :)  We will have to decode instruction ourself. 
>   

I don't think it's necessary.  We can record the software interrupt at
the end of the instruction that generated it, and give it higher
priority than a pending external interrupt.  On VMX, decrement RIP and
set the VM-entry instruction length to 1 before injection.  On SVM, use
EVENTINJ and forget about the instruction length.
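The return-address arithmetic behind the VMX half of this proposal can be checked with a small C sketch. When VMX injects a software interrupt, the pushed return address is the injection RIP plus the VM-entry instruction length. Rewinding RIP by one byte and claiming a one-byte instruction therefore pushes the same address. The function names are hypothetical, and only plain integers are modeled. The sketch says nothing about what happens if delivery itself raises an exception.

```c
#include <assert.h>
#include <stdint.h>

/* When VMX injects a software interrupt, the return address pushed for
 * the guest is the injection RIP plus the VM-entry instruction length.
 * Plain integers only; no VMCS access is modeled. */
uint64_t pushed_return_address(uint64_t entry_rip, uint32_t entry_insn_len)
{
    return entry_rip + entry_insn_len;
}

/* The proposal: the event was recorded after the INT instruction
 * completed, so RIP already points to the next instruction. Rewind by
 * one byte and claim a one-byte instruction. */
void rewind_for_injection(uint64_t next_rip, uint64_t *entry_rip,
                          uint32_t *entry_insn_len)
{
    assert(next_rip > 0);
    *entry_rip = next_rip - 1;
    *entry_insn_len = 1;
}
```

For a two-byte INT n at 0x1000, both variants push 0x1002: injecting at 0x1000 with length 2, and injecting at 0x1001 with length 1. However, 0x1001 is not an instruction boundary.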

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


* Re: [Qemu-devel] Re: [PATCH  0/8] kvm: Fixes, cleanups and live migration
  2009-05-03  5:57             ` [Qemu-devel] Re: [PATCH 0/8] kvm: Fixes, cleanups and live migration Avi Kivity
@ 2009-05-03  6:05               ` Gleb Natapov
  2009-05-03  7:36                 ` Avi Kivity
  0 siblings, 1 reply; 48+ messages in thread
From: Gleb Natapov @ 2009-05-03  6:05 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Anthony Liguori, Jan Kiszka, qemu-devel

On Sun, May 03, 2009 at 08:57:58AM +0300, Avi Kivity wrote:
> Gleb Natapov wrote:
>> On Sat, May 02, 2009 at 10:12:57PM +0300, Avi Kivity wrote:
>>   
>>> Gleb Natapov wrote:
>>>     
>>>>> I think the right thing to do with this is introduce a kvm-cpu 
>>>>> savevm  that stores this information since it isn't relevant to 
>>>>> TCG.  I think  it's arguable whether you want instruction length 
>>>>> there (can you get it  reliably on SVM?).
>>>>>
>>>>>             
>>>> We can't get it on SVM without instruction decoding, but it is not required
>>>> on SVM. It is absolutely essential for soft interrupt/exception injection
>>>> on VMX and has to be a part of migratable state.
>>>>         
>>> We need it in some neutral form so cross-vendor migration can work.
>>>
>>>     
>> VMX->SVM No problem.
>> SVM->VMX bad luck :)  We will have to decode instruction ourself.   
>
> I don't think it's necessary.  We can record the software interrupt at  
> the end of the instruction that generated it, and give it higher  
> priority than a pending external interrupt.  On vmx, decrement RIP and  
> set entry instruction length = 1 before injection.
And get the wrong error value when an exception happens during soft
interrupt delivery? I don't like all these tricks. They work only if
everything happens as you expected, and break completely when it does
not.

>                                                     On svm, use EVENTINJ  
> and forget about the instruction length.
>
On SVM we do not re-inject a soft interrupt/exception at all, but
re-execute the offending instruction.

--
			Gleb.


* Re: [Qemu-devel] Re: [PATCH  0/8] kvm: Fixes, cleanups and live migration
  2009-05-03  6:05               ` Gleb Natapov
@ 2009-05-03  7:36                 ` Avi Kivity
  2009-05-03  7:46                   ` Gleb Natapov
  0 siblings, 1 reply; 48+ messages in thread
From: Avi Kivity @ 2009-05-03  7:36 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Anthony Liguori, Jan Kiszka, qemu-devel

Gleb Natapov wrote:
>> I don't think it's necessary.  We can record the software interrupt at  
>> the end of the instruction that generated it, and give it higher  
>> priority than a pending external interrupt.  On vmx, decrement RIP and  
>> set entry instruction length = 1 before injection.
>>     
> And get wrong error value when exception happens during soft interrupt
> delivery? I don't like all those tricks. They work only if everything
> happens like you expected and breaks completely when it is not.
>
>   

Er, yes.

>>                                                     On svm, use EVENTINJ  
>> and forget about the instruction length.
>>
>>     
> On SVM we do not re-inject soft int/exception at all, but re-execute the
> offending instruction.
>   

Maybe we should unexecute the software interrupt instruction on Intel 
and get the same effect.

-- 
error compiling committee.c: too many arguments to function


* Re: [Qemu-devel] Re: [PATCH  0/8] kvm: Fixes, cleanups and live migration
  2009-05-03  7:36                 ` Avi Kivity
@ 2009-05-03  7:46                   ` Gleb Natapov
  2009-05-03  7:50                     ` Avi Kivity
  0 siblings, 1 reply; 48+ messages in thread
From: Gleb Natapov @ 2009-05-03  7:46 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Anthony Liguori, Jan Kiszka, qemu-devel

On Sun, May 03, 2009 at 10:36:54AM +0300, Avi Kivity wrote:
> Gleb Natapov wrote:
>>> I don't think it's necessary.  We can record the software interrupt 
>>> at  the end of the instruction that generated it, and give it higher  
>>> priority than a pending external interrupt.  On vmx, decrement RIP 
>>> and  set entry instruction length = 1 before injection.
>>>     
>> And get wrong error value when exception happens during soft interrupt
>> delivery? I don't like all those tricks. They work only if everything
>> happens like you expected and breaks completely when it is not.
>>
>>   
>
> Er, yes.
>
>>>                                                     On svm, use 
>>> EVENTINJ  and forget about the instruction length.
>>>
>>>     
>> On SVM we do not re-inject soft int/exception at all, but re-execute the
>> offending instruction.
>>   
>
> Maybe we should unexecute the software interrupt instruction on Intel  
> and get the same effect.
>
We don't need to unexecute anything. We get an exit with RIP pointing to
the offending instruction. The right thing to do on VMX is to inject the
software interrupt with the correct instruction length; the processor
will do the rest. Remind me, please, why are we trying to find problems
where there are none? We will do the right thing and fix the migration
code to do the right thing as well.
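As an illustration of "inject the software interrupt with the correct instruction length", here is a hedged C sketch that encodes the two VM-entry values involved. It uses the VM-entry interruption-information layout from the Intel SDM: bits 7:0 hold the vector, bits 10:8 the event type (4 = software interrupt, 6 = software exception), bit 11 the deliver-error-code flag, and bit 31 the valid flag. The macro names, struct vmx_entry_event and make_soft_event() are local to this sketch. The actual VMCS writes are deliberately omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Layout of the VMX VM-entry interruption-information field (Intel SDM):
 * bits 7:0 vector, bits 10:8 event type, bit 11 deliver error code,
 * bit 31 valid. The macro names are local to this sketch. */
#define EV_TYPE_SHIFT          8
#define EV_TYPE_SOFT_INTR      4u   /* INT n */
#define EV_TYPE_SOFT_EXCEPTION 6u   /* INT3, INTO */
#define EV_VALID               (1u << 31)

/* Values to write into the VM-entry interruption-information and
 * VM-entry instruction-length fields; the VMCS writes are omitted. */
struct vmx_entry_event {
    uint32_t intr_info;
    uint32_t insn_len;
};

struct vmx_entry_event make_soft_event(uint8_t vector, bool is_exception,
                                       uint32_t insn_len)
{
    struct vmx_entry_event ev;
    uint32_t type = is_exception ? EV_TYPE_SOFT_EXCEPTION : EV_TYPE_SOFT_INTR;

    assert(insn_len >= 1 && insn_len <= 15);  /* x86 max instruction length */
    ev.intr_info = EV_VALID | (type << EV_TYPE_SHIFT) | vector;
    ev.insn_len = insn_len;
    return ev;
}
```

For example, INT 0x80 (two bytes) encodes as 0x80000480 with length 2, and INT3 encodes as 0x80000603 with length 1. The processor then derives the pushed return address from the guest RIP and that length.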

--
			Gleb.


* Re: [Qemu-devel] Re: [PATCH  0/8] kvm: Fixes, cleanups and live migration
  2009-05-03  7:46                   ` Gleb Natapov
@ 2009-05-03  7:50                     ` Avi Kivity
  2009-05-03  7:56                       ` Gleb Natapov
  0 siblings, 1 reply; 48+ messages in thread
From: Avi Kivity @ 2009-05-03  7:50 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Anthony Liguori, Jan Kiszka, qemu-devel

Gleb Natapov wrote:

  

>> Maybe we should unexecute the software interrupt instruction on Intel  
>> and get the same effect.
>>
>>     
> We don't need to unexecute anything. We get exit with RIP pointing to
> the offending instruction. The right thing on VMX to do is to inject
> software interrupt with correct instruction length. Processor will do
> the rest. Remind me please why do we try to find problems where there is
> none? We will do right thing and fix migration code to do right thing.
>
>   

I don't want the migration protocol to encode vendor-specific
information. The architectural state is complicated enough; we don't
want microarchitectural state as well.

-- 
error compiling committee.c: too many arguments to function


* Re: [Qemu-devel] Re: [PATCH  0/8] kvm: Fixes, cleanups and live migration
  2009-05-03  7:50                     ` Avi Kivity
@ 2009-05-03  7:56                       ` Gleb Natapov
  2009-05-03  8:01                         ` Avi Kivity
  0 siblings, 1 reply; 48+ messages in thread
From: Gleb Natapov @ 2009-05-03  7:56 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Anthony Liguori, Jan Kiszka, qemu-devel

On Sun, May 03, 2009 at 10:50:12AM +0300, Avi Kivity wrote:
> Gleb Natapov wrote:
>
>>> Maybe we should unexecute the software interrupt instruction on Intel 
>>>  and get the same effect.
>>>
>>>     
>> We don't need to unexecute anything. We get an exit with RIP pointing to
>> the offending instruction. The right thing to do on VMX is to inject the
>> software interrupt with the correct instruction length; the processor will
>> do the rest. Remind me, please: why are we trying to find problems where
>> there are none? We will do the right thing and fix the migration code to
>> do the right thing.
>>
>>   
>
> I don't want the migration protocol to encode vendor specific  
> information. The architectural state is complicated enough, we don't  
> want microarchitectural state as well.
Then I don't see how migration can work correctly. How do you expect
migration to work if you don't migrate part of the processor state? Why
not drop the non-migratable state immediately after the exit, then? (That
is essentially what happens if we don't migrate it.)

--
			Gleb.

* Re: [Qemu-devel] Re: [PATCH  0/8] kvm: Fixes, cleanups and live migration
  2009-05-03  7:56                       ` Gleb Natapov
@ 2009-05-03  8:01                         ` Avi Kivity
  2009-05-03  8:35                           ` Gleb Natapov
  0 siblings, 1 reply; 48+ messages in thread
From: Avi Kivity @ 2009-05-03  8:01 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Anthony Liguori, Jan Kiszka, qemu-devel

Gleb Natapov wrote:
>> I don't want the migration protocol to encode vendor specific  
>> information. The architectural state is complicated enough, we don't  
>> want microarchitectural state as well.
>>     
> Then I don't see how migration can work correctly. How do you expect
> migration to work if you don't migrate part of the processor state? Why
> not drop the non-migratable state immediately after the exit, then? (That
> is essentially what happens if we don't migrate it.)
>   

If we can roll back the state to before the software interrupt executed, 
we are never in the situation where the instruction length is needed.

The whole mess is needed because vmx allows exiting after a software 
interrupt instruction has been executed, but before the software 
interrupt was processed by the cpu. If we unexecute the instruction and 
forget the software interrupt, everything will continue to work.

-- 
error compiling committee.c: too many arguments to function

* Re: [Qemu-devel] Re: [PATCH  0/8] kvm: Fixes, cleanups and live migration
  2009-05-03  8:01                         ` Avi Kivity
@ 2009-05-03  8:35                           ` Gleb Natapov
  0 siblings, 0 replies; 48+ messages in thread
From: Gleb Natapov @ 2009-05-03  8:35 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Anthony Liguori, Jan Kiszka, qemu-devel

On Sun, May 03, 2009 at 11:01:34AM +0300, Avi Kivity wrote:
> Gleb Natapov wrote:
>>> I don't want the migration protocol to encode vendor specific   
>>> information. The architectural state is complicated enough, we don't  
>>> want microarchitectural state as well.
>>>     
>> Then I don't see how migration can work correctly. How do you expect
>> migration to work if you don't migrate part of the processor state? Why
>> not drop the non-migratable state immediately after the exit, then? (That
>> is essentially what happens if we don't migrate it.)
>>   
>
> If we can roll back the state to before the software interrupt executed,  
> we are never in the situation where the instruction length is needed.
>
Not according to Intel :)

> The whole mess is needed because vmx allows exiting after a software  
> interrupt instruction has been executed, but before the software  
> interrupt was processed by the cpu. If we unexecute the instruction and  
> forget the software interrupt, everything will continue to work.
>
VMX exits with RIP pointing to the software interrupt instruction (i.e.
before the instruction is executed), so there is no need to "unexecute"
it. Intel advises injecting the software interrupt as opposed to
re-executing the instruction. If we do not migrate the information needed
to inject the soft interrupt, we will have to re-execute the instruction
after migration. Maybe this is not a big deal.

--
			Gleb.

* [Qemu-devel] Re: [PATCH 4/8] kvm: Rework dirty bitmap synchronization
  2009-05-01 21:17 ` [Qemu-devel] [PATCH 4/8] kvm: Rework dirty bitmap synchronization Jan Kiszka
@ 2009-05-03 10:05   ` Avi Kivity
  2009-05-04  8:52     ` Jan Kiszka
  0 siblings, 1 reply; 48+ messages in thread
From: Avi Kivity @ 2009-05-03 10:05 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Anthony Liguori, qemu-devel

Jan Kiszka wrote:
> Extend kvm_physical_sync_dirty_bitmap() so that it can sync across
> multiple slots. Useful for updating the whole dirty log during
> migration. Moreover, properly pass down errors the whole call chain.
>
>  
> -void cpu_physical_sync_dirty_bitmap(target_phys_addr_t start_addr, target_phys_addr_t end_addr);
> +int cpu_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
> +                                   target_phys_addr_t end_addr);
>  
>   

This is defined in terms of physical addresses, but in patch 5, you

> @@ -3248,13 +3248,18 @@ static int ram_save_live(QEMUFile *f, int stage, void *opaque)
>  {
>      ram_addr_t addr;
>  
> +    if (cpu_physical_sync_dirty_bitmap(0, last_ram_offset) != 0) {
> +        qemu_file_set_error(f);
> +        return 0;
> +    }
> +
>     

Which is in terms of ram addresses; so this will fail with large memory, 
where the last phys address is larger than the last ram address.

Maybe this can be as easy as passing -1 for the end address.  But in 
this case I suggest adding a helper to sync all of memory instead.

-- 
error compiling committee.c: too many arguments to function

* [Qemu-devel] Re: [PATCH 8/8] kvm: Rework VCPU reset
  2009-05-01 21:17 ` [Qemu-devel] [PATCH 8/8] kvm: Rework VCPU reset Jan Kiszka
@ 2009-05-03 15:58   ` Avi Kivity
  2009-05-04  8:54     ` Jan Kiszka
  0 siblings, 1 reply; 48+ messages in thread
From: Avi Kivity @ 2009-05-03 15:58 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Anthony Liguori, qemu-devel

Jan Kiszka wrote:
> Use a standard callback with the highest order to synchronize the VCPU
> on reset after all device callbacks were executed. This allows removing
> the special kvm hook in qemu_system_reset.
>   

Is this needed for the lapic reset callback?

If so, we can express the dependency explicitly rather than with a 
priority, by having cpu reset notifiers invoked when the cpu is reset.  
In the case of the lapic, I don't think we need an abstract mechanism; 
the lapic is part of the cpu, not some random device.

Maybe we should even save/load it as part of the cpu.

-- 
error compiling committee.c: too many arguments to function

* Re: [Qemu-devel] Re: [PATCH 7/8] Introduce reset notifier order
  2009-05-02  0:34       ` Paul Brook
@ 2009-05-04  7:45         ` Jan Kiszka
  0 siblings, 0 replies; 48+ messages in thread
From: Jan Kiszka @ 2009-05-04  7:45 UTC (permalink / raw)
  To: Paul Brook; +Cc: Anthony Liguori, qemu-devel, Avi Kivity

[-- Attachment #1: Type: text/plain, Size: 1362 bytes --]

Paul Brook wrote:
> On Saturday 02 May 2009, Jan Kiszka wrote:
>> Paul Brook wrote:
>>> On Friday 01 May 2009, Jan Kiszka wrote:
>>>> Add the parameter 'order' to qemu_register_reset and sort callbacks on
>>>> registration. On system reset, callbacks with lower order will be
>>>> invoked before those with higher order. Update all existing users to the
>>>> standard order 0.
>>>>
>>>> Note: At least for x86, the existing users seem to assume that handlers
>>>> are called in their registration order. Therefore, the patch preserves
>>>> this property. If someone feels bored, (s)he could try to identify this
>>>> dependency and express it properly on callback registration.
>>> Why do we need this? Why isn't creation order good enough?
>> At the latest when reset handlers are properly deregistered again on
>> device unplug, the registration order is no longer a static property
>> manifested in the code organization (which can also break due to
>> refactoring, BTW).
> 
> I'm afraid I can't make any sense of this. What exactly are you trying to 
> solve?

Thinking about it again, device hot-plugging was a bad example as its
natural order usually also ensures the right reset order.

The problem I'm trying to solve is a set of tricky dependencies on x86
between the CPU, the APIC, and KVM's VCPU. But I have an idea how to solve
it differently.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]

* [Qemu-devel] Re: [PATCH 4/8] kvm: Rework dirty bitmap synchronization
  2009-05-03 10:05   ` [Qemu-devel] " Avi Kivity
@ 2009-05-04  8:52     ` Jan Kiszka
  0 siblings, 0 replies; 48+ messages in thread
From: Jan Kiszka @ 2009-05-04  8:52 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Anthony Liguori, qemu-devel

Avi Kivity wrote:
> Jan Kiszka wrote:
>> Extend kvm_physical_sync_dirty_bitmap() so that it can sync across
>> multiple slots. Useful for updating the whole dirty log during
>> migration. Moreover, properly pass down errors the whole call chain.
>>
>>  
>> -void cpu_physical_sync_dirty_bitmap(target_phys_addr_t start_addr, target_phys_addr_t end_addr);
>> +int cpu_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
>> +                                   target_phys_addr_t end_addr);
>>  
>>   
> 
> This is defined in terms of physical addresses, but in patch 5, you
> 
>> @@ -3248,13 +3248,18 @@ static int ram_save_live(QEMUFile *f, int stage, void *opaque)
>>  {
>>      ram_addr_t addr;
>>  
>> +    if (cpu_physical_sync_dirty_bitmap(0, last_ram_offset) != 0) {
>> +        qemu_file_set_error(f);
>> +        return 0;
>> +    }
>> +
>>     
> 
> Which is in terms of ram addresses; so this will fail with large memory,
> where the last phys address is larger than the last ram address.

True, will fix.

> 
> Maybe this can be as easy as passing -1 for the end address.  But in
> this case I suggest adding a helper to sync all of memory instead.

IMHO, defining a separate helper is overkill, as it would also mean
providing a specialized version of kvm_physical_sync_dirty_bitmap. I
think I will go for TARGET_PHYS_ADDR_MAX.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux

* [Qemu-devel] Re: [PATCH 8/8] kvm: Rework VCPU reset
  2009-05-03 15:58   ` [Qemu-devel] " Avi Kivity
@ 2009-05-04  8:54     ` Jan Kiszka
  2009-05-04  9:12       ` Avi Kivity
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Kiszka @ 2009-05-04  8:54 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Anthony Liguori, qemu-devel

Avi Kivity wrote:
> Jan Kiszka wrote:
>> Use a standard callback with the highest order to synchronize the VCPU
>> on reset after all device callbacks were executed. This allows removing
>> the special kvm hook in qemu_system_reset.
>>   
> 
> Is this needed for the lapic reset callback?
> 
> If so, we can express the dependency explicitly rather than with a
> priority, by having cpu reset notifiers invoked when the cpu is reset. 
> In the case of the lapic, I don't think we need an abstract mechanism;
> the lapic is part of the cpu, not some random device.
> 
> Maybe we should even save/load it as part of the cpu.
> 

QEMU does not only provide the LAPIC but also covers the old dedicated
version. That makes at least the hardware instantiation a bit more complex.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux

* [Qemu-devel] Re: [PATCH 8/8] kvm: Rework VCPU reset
  2009-05-04  8:54     ` Jan Kiszka
@ 2009-05-04  9:12       ` Avi Kivity
  2009-05-04  9:29         ` Jan Kiszka
  0 siblings, 1 reply; 48+ messages in thread
From: Avi Kivity @ 2009-05-04  9:12 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Anthony Liguori, qemu-devel

Jan Kiszka wrote:
> Avi Kivity wrote:
>   
>> Jan Kiszka wrote:
>>     
>>> Use a standard callback with the highest order to synchronize the VCPU
>>> on reset after all device callbacks were executed. This allows removing
>>> the special kvm hook in qemu_system_reset.
>>>   
>>>       
>> Is this needed for the lapic reset callback?
>>
>> If so, we can express the dependency explicitly rather than with a
>> priority, by having cpu reset notifiers invoked when the cpu is reset. 
>> In the case of the lapic, I don't think we need an abstract mechanism;
>> the lapic is part of the cpu, not some random device.
>>
>> Maybe we should even save/load it as part of the cpu.
>>
>>     
>
> QEMU does not only provide the LAPIC but also covers the old dedicated
> version. That makes at least the hardware instantiation a bit more complex.
>
>   

What do you mean by 'old dedicated version'?

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

* [Qemu-devel] Re: [PATCH 8/8] kvm: Rework VCPU reset
  2009-05-04  9:12       ` Avi Kivity
@ 2009-05-04  9:29         ` Jan Kiszka
  2009-05-04 10:01           ` Avi Kivity
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Kiszka @ 2009-05-04  9:29 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Anthony Liguori, qemu-devel

Avi Kivity wrote:
> Jan Kiszka wrote:
>> Avi Kivity wrote:
>>  
>>> Jan Kiszka wrote:
>>>    
>>>> Use a standard callback with the highest order to synchronize the VCPU
>>>> on reset after all device callbacks were executed. This allows removing
>>>> the special kvm hook in qemu_system_reset.
>>>>         
>>> Is this needed for the lapic reset callback?
>>>
>>> If so, we can express the dependency explicitly rather than with a
>>> priority, by having cpu reset notifiers invoked when the cpu is
>>> reset. In the case of the lapic, I don't think we need an abstract
>>> mechanism; the lapic is part of the cpu, not some random device.
>>>
>>> Maybe we should even save/load it as part of the cpu.
>>>
>>>     
>>
>> QEMU does not only provide the LAPIC but also covers the old dedicated
>> version. That makes at least the hardware instantiation a bit more
>> complex.
>>
>>   
> 
> What do you mean by 'old dedicated version'?

A separate chip, not part of the CPU. Some 486 systems used to have this, IIRC.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux

* [Qemu-devel] Re: [PATCH 8/8] kvm: Rework VCPU reset
  2009-05-04  9:29         ` Jan Kiszka
@ 2009-05-04 10:01           ` Avi Kivity
  0 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2009-05-04 10:01 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Anthony Liguori, qemu-devel

Jan Kiszka wrote:
> Avi Kivity wrote:
>   
>> Jan Kiszka wrote:
>>     
>>> Avi Kivity wrote:
>>>  
>>>       
>>>> Jan Kiszka wrote:
>>>>    
>>>>         
>>>>> Use a standard callback with the highest order to synchronize the VCPU
>>>>> on reset after all device callbacks were executed. This allows removing
>>>>> the special kvm hook in qemu_system_reset.
>>>>>         
>>>>>           
>>>> Is this needed for the lapic reset callback?
>>>>
>>>> If so, we can express the dependency explicitly rather than with a
>>>> priority, by having cpu reset notifiers invoked when the cpu is
>>>> reset. In the case of the lapic, I don't think we need an abstract
>>>> mechanism; the lapic is part of the cpu, not some random device.
>>>>
>>>> Maybe we should even save/load it as part of the cpu.
>>>>
>>>>     
>>>>         
>>> QEMU does not only provide the LAPIC but also covers the old dedicated
>>> version. That makes at least the hardware instantiation a bit more
>>> complex.
>>>
>>>   
>>>       
>> What do you mean by 'old dedicated version'?
>>     
>
> A separate chip, not part of the CPU. Some 486 systems used to have this, IIRC.
>
>   

Yes, you're right.

Maybe we can have a vcpu reset callback for the lapic.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

end of thread, other threads:[~2009-05-04 10:02 UTC | newest]

Thread overview: 48+ messages
2009-05-01 21:17 [Qemu-devel] [PATCH 0/8] kvm: Fixes, cleanups and live migration Jan Kiszka
2009-05-01 21:17 ` [Qemu-devel] [PATCH 1/8] kvm: Conditionally apply workaround for KVM slot handling bug Jan Kiszka
2009-05-01 21:17 ` [Qemu-devel] [PATCH 3/8] kvm: Fix dirty log temporary buffer size Jan Kiszka
2009-05-01 21:17 ` [Qemu-devel] [PATCH 5/8] kvm: Add missing bits to support live migration Jan Kiszka
2009-05-01 21:17 ` [Qemu-devel] [PATCH 6/8] kvm: Fix framebuffer dirty log sync Jan Kiszka
2009-05-01 21:17 ` [Qemu-devel] [PATCH 4/8] kvm: Rework dirty bitmap synchronization Jan Kiszka
2009-05-03 10:05   ` [Qemu-devel] " Avi Kivity
2009-05-04  8:52     ` Jan Kiszka
2009-05-01 21:17 ` [Qemu-devel] [PATCH 2/8] kvm: Introduce kvm_set_migration_log Jan Kiszka
2009-05-01 21:17 ` [Qemu-devel] [PATCH 8/8] kvm: Rework VCPU reset Jan Kiszka
2009-05-03 15:58   ` [Qemu-devel] " Avi Kivity
2009-05-04  8:54     ` Jan Kiszka
2009-05-04  9:12       ` Avi Kivity
2009-05-04  9:29         ` Jan Kiszka
2009-05-04 10:01           ` Avi Kivity
2009-05-01 21:17 ` [Qemu-devel] [PATCH 7/8] Introduce reset notifier order Jan Kiszka
2009-05-01 23:52   ` Paul Brook
2009-05-02  0:05     ` [Qemu-devel] " Jan Kiszka
2009-05-02  0:34       ` Paul Brook
2009-05-04  7:45         ` Jan Kiszka
2009-05-01 22:30 ` [Qemu-devel] Re: [PATCH 0/8] kvm: Fixes, cleanups and live migration Anthony Liguori
2009-05-01 22:49   ` Anthony Liguori
2009-05-01 22:49   ` Jan Kiszka
2009-05-01 22:40 ` Anthony Liguori
2009-05-01 22:56   ` Jan Kiszka
2009-05-02  8:07     ` Avi Kivity
2009-05-02  7:40   ` Gleb Natapov
2009-05-02 13:50     ` Anthony Liguori
2009-05-02 17:23       ` Gleb Natapov
2009-05-02 19:12         ` Avi Kivity
2009-05-02 20:07           ` Gleb Natapov
2009-05-02 20:09             ` [Qemu-devel] Re: [PATCH 0/8] kvm: Fixes, cleanups and livemigration Anthony Liguori
2009-05-03  5:25               ` Gleb Natapov
2009-05-03  5:57             ` [Qemu-devel] Re: [PATCH 0/8] kvm: Fixes, cleanups and live migration Avi Kivity
2009-05-03  6:05               ` Gleb Natapov
2009-05-03  7:36                 ` Avi Kivity
2009-05-03  7:46                   ` Gleb Natapov
2009-05-03  7:50                     ` Avi Kivity
2009-05-03  7:56                       ` Gleb Natapov
2009-05-03  8:01                         ` Avi Kivity
2009-05-03  8:35                           ` Gleb Natapov
2009-05-01 22:49 ` [Qemu-devel] [PATCH 9/8] kvm: Save/restore TSC counter Jan Kiszka
2009-05-01 22:51   ` [Qemu-devel] " Anthony Liguori
2009-05-01 22:58     ` Jan Kiszka
2009-05-01 23:09       ` Jan Kiszka
2009-05-01 23:18         ` Anthony Liguori
2009-05-02  0:08   ` [Qemu-devel] [PATCH 9/8] kvm: x86: Save/restore KVM-specific CPU states Jan Kiszka
2009-05-02  0:20     ` [Qemu-devel] [PATCH 9/8 v2] " Jan Kiszka