All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC 0/5] CPU: Detect put cpu register errors for migrations
@ 2022-06-07 23:06 Peter Xu
  2022-06-07 23:06 ` [PATCH RFC 1/5] cpus-common: Introduce run_on_cpu_func2 which allows error returns Peter Xu
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Peter Xu @ 2022-06-07 23:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Sean Christopherson, Leonardo Bras Soares Passos,
	Dr . David Alan Gilbert, Paolo Bonzini, Richard Henderson,
	Igor Mammedov, peterx

[Marking this as RFC]

This series teaches QEMU to detect errors when e.g. putting registers from
QEMU to KVM, and fail migrations properly.

For the rational of this series and why it was posted, please refer to the
bug report here:

https://lore.kernel.org/all/YppVupW+IWsm7Osr@xz-m1.local/

But I'd rather not go into that if the reviewer doesn't have that context,
because we don't really need that complexity..  It can be simple as we
should fail migration early when we see issues happening already, so:

  1) We fail explicitly, rather than afterward with some weird guest
     errors.  In my bug report, it was a guest double fault.  There's
     another bug report that Sean mentioned in the thread from Mike Tancsa
     that can have other sympotons rather than double fault, but anyway
     they'll be hard to diagnose since the processor state can be corrupted
     (please refer to kvm_arch_put_registers() where we stop putting more
     registers to KVM when we see any error).

  2) For precopy, with this early failure the VM won't crash itself since
     we still have a chance to keep running it on src host, while if
     without this patch we will fail later, and it can crash the VM.

In this specific case, when KVM_SET_XSAVE ioctl failed on dest host before
start running the VM there, we should fail the migration already.

After the patchset applied, the above "double fault" issue will become
migration failures, and...

For precopy, we can see some error dumped for precopy on dest, then the VM
will be kept running on src host:

2022-06-07T22:48:48.804234Z qemu-system-x86_64: kvm_arch_put_registers() failed with retval=-22
2022-06-07T22:48:48.804588Z qemu-system-x86_64: load of migration failed: Invalid argument

For postcopy, currently we'll pause the VM immediately for admin to decide
what to do:

2022-06-07T22:47:49.448192Z qemu-system-x86_64: kvm_arch_put_registers() failed with retval=-22
13072@1654642069.518993:runstate_set current_run_state 1 (inmigrate) new_state 4 (paused)

If something like this series is welcomed, we could do better in the future
by telling the src host about this issue and keep running, because
put-register happens right at the switch-over, so we actually have this
chance (no dirty page on dest host yet).

Comments welcomed.  Thanks,

Peter Xu (5):
  cpus-common: Introduce run_on_cpu_func2 which allows error returns
  cpus-common: Add run_on_cpu2()
  accel: Allow synchronize_post_init() to take an Error**
  cpu: Allow cpu_synchronize_all_post_init() to take an errp
  KVM: Hook kvm_arch_put_registers() errors to the caller

 accel/hvf/hvf-accel-ops.c     |  2 +-
 accel/kvm/kvm-all.c           | 15 +++++++---
 accel/kvm/kvm-cpus.h          |  2 +-
 cpus-common.c                 | 55 +++++++++++++++++++++++++++++++++--
 hw/core/machine.c             |  2 +-
 include/hw/core/cpu.h         | 28 ++++++++++++++++++
 include/sysemu/accel-ops.h    |  2 +-
 include/sysemu/cpus.h         |  2 +-
 include/sysemu/hw_accel.h     |  1 +
 migration/savevm.c            | 20 +++++++++++--
 softmmu/cpus.c                | 23 ++++++++++++---
 stubs/cpu-synchronize-state.c |  3 ++
 target/i386/hax/hax-all.c     |  2 +-
 target/i386/nvmm/nvmm-all.c   |  2 +-
 target/i386/whpx/whpx-all.c   |  2 +-
 15 files changed, 139 insertions(+), 22 deletions(-)

-- 
2.32.0



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH RFC 1/5] cpus-common: Introduce run_on_cpu_func2 which allows error returns
  2022-06-07 23:06 [PATCH RFC 0/5] CPU: Detect put cpu register errors for migrations Peter Xu
@ 2022-06-07 23:06 ` Peter Xu
  2022-06-07 23:06 ` [PATCH RFC 2/5] cpus-common: Add run_on_cpu2() Peter Xu
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2022-06-07 23:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Sean Christopherson, Leonardo Bras Soares Passos,
	Dr . David Alan Gilbert, Paolo Bonzini, Richard Henderson,
	Igor Mammedov, peterx

run_on_cpu API does not yet support any way to pass over an error message
to above.  Add a new run_on_cpu_func2 hook to grant possibility of that.

Note that this only changes the cpus-common core, no API is yet introduced
for v2 of the run_on_cpu_func function.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 cpus-common.c         | 28 +++++++++++++++++++++++++---
 include/hw/core/cpu.h |  2 ++
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/cpus-common.c b/cpus-common.c
index db459b41ce..1db7bbbb88 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -116,9 +116,20 @@ __thread CPUState *current_cpu;
 
 struct qemu_work_item {
     QSIMPLEQ_ENTRY(qemu_work_item) node;
-    run_on_cpu_func func;
+    union {
+        run_on_cpu_func func;     /* When has_errp==false */
+        run_on_cpu_func2 func2;   /* When has_errp==true  */
+    };
     run_on_cpu_data data;
     bool free, exclusive, done;
+
+    /*
+     * Below are only used by v2 of work item, where we allow to return
+     * errors for cpu work items.  When has_errp==true, then: (1) we call
+     * func2 rather than func, and (2) we pass in errp into func2() call.
+     */
+    bool has_errp;
+    Error **errp;
 };
 
 static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
@@ -314,6 +325,17 @@ void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func,
     queue_work_on_cpu(cpu, wi);
 }
 
+static void process_one_work_item(struct qemu_work_item *wi, CPUState *cpu)
+{
+    if (wi->has_errp) {
+        /* V2 of work item, allows errors */
+        wi->func2(cpu, wi->data, wi->errp);
+    } else {
+        /* Old version of work item, no error returned */
+        wi->func(cpu, wi->data);
+    }
+}
+
 void process_queued_cpu_work(CPUState *cpu)
 {
     struct qemu_work_item *wi;
@@ -336,11 +358,11 @@ void process_queued_cpu_work(CPUState *cpu)
              */
             qemu_mutex_unlock_iothread();
             start_exclusive();
-            wi->func(cpu, wi->data);
+            process_one_work_item(wi, cpu);
             end_exclusive();
             qemu_mutex_lock_iothread();
         } else {
-            wi->func(cpu, wi->data);
+            process_one_work_item(wi, cpu);
         }
         qemu_mutex_lock(&cpu->work_mutex);
         if (wi->free) {
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 996f94059f..7a303576d0 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -252,6 +252,8 @@ typedef union {
 #define RUN_ON_CPU_NULL           RUN_ON_CPU_HOST_PTR(NULL)
 
 typedef void (*run_on_cpu_func)(CPUState *cpu, run_on_cpu_data data);
+/* Same as run_on_cpu_func but allows to return an error */
+typedef void (*run_on_cpu_func2)(CPUState *cpu, run_on_cpu_data data, Error **errp);
 
 struct qemu_work_item;
 
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH RFC 2/5] cpus-common: Add run_on_cpu2()
  2022-06-07 23:06 [PATCH RFC 0/5] CPU: Detect put cpu register errors for migrations Peter Xu
  2022-06-07 23:06 ` [PATCH RFC 1/5] cpus-common: Introduce run_on_cpu_func2 which allows error returns Peter Xu
@ 2022-06-07 23:06 ` Peter Xu
  2022-06-07 23:06 ` [PATCH RFC 3/5] accel: Allow synchronize_post_init() to take an Error** Peter Xu
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2022-06-07 23:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Sean Christopherson, Leonardo Bras Soares Passos,
	Dr . David Alan Gilbert, Paolo Bonzini, Richard Henderson,
	Igor Mammedov, peterx

This version of run_on_cpu() allows to take an Error** to detect errors.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 cpus-common.c         | 27 +++++++++++++++++++++++++++
 include/hw/core/cpu.h | 26 ++++++++++++++++++++++++++
 softmmu/cpus.c        |  6 ++++++
 3 files changed, 59 insertions(+)

diff --git a/cpus-common.c b/cpus-common.c
index 1db7bbbb88..1d67c0c655 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -167,6 +167,33 @@ void do_run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data data,
     }
 }
 
+void do_run_on_cpu2(CPUState *cpu, run_on_cpu_func2 func2, run_on_cpu_data data,
+                    QemuMutex *mutex, Error **errp)
+{
+    struct qemu_work_item wi;
+
+    if (qemu_cpu_is_self(cpu)) {
+        func2(cpu, data, errp);
+        return;
+    }
+
+    wi.func2 = func2;
+    wi.data = data;
+    wi.done = false;
+    wi.free = false;
+    wi.exclusive = false;
+    wi.has_errp = true;
+    wi.errp = errp;
+
+    queue_work_on_cpu(cpu, &wi);
+    while (!qatomic_mb_read(&wi.done)) {
+        CPUState *self_cpu = current_cpu;
+
+        qemu_cond_wait(&qemu_work_cond, mutex);
+        current_cpu = self_cpu;
+    }
+}
+
 void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data data)
 {
     struct qemu_work_item *wi;
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 7a303576d0..4bb40a03cf 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -709,6 +709,19 @@ bool cpu_is_stopped(CPUState *cpu);
 void do_run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data data,
                    QemuMutex *mutex);
 
+/**
+ * do_run_on_cpu2:
+ * @cpu: The vCPU to run on.
+ * @func2: The function to be executed.
+ * @data: Data to pass to the function.
+ * @mutex: Mutex to release while waiting for @func2 to run.
+ * @errp: The Error** pointer to be passed into @func2.
+ *
+ * Used internally in the implementation of run_on_cpu2.
+ */
+void do_run_on_cpu2(CPUState *cpu, run_on_cpu_func2 func2, run_on_cpu_data data,
+                    QemuMutex *mutex, Error **errp);
+
 /**
  * run_on_cpu:
  * @cpu: The vCPU to run on.
@@ -719,6 +732,19 @@ void do_run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data data,
  */
 void run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data data);
 
+/**
+ * run_on_cpu2:
+ * @cpu: The vCPU to run on.
+ * @func: The function to be executed.
+ * @data: Data to pass to the function.
+ * @errp: The Error** pointer to be passed into @func2.
+ *
+ * Schedules the function @func2 for execution on the vCPU @cpu, capture
+ * any error and put it into *@errp when provided.
+ */
+void run_on_cpu2(CPUState *cpu, run_on_cpu_func2 func2, run_on_cpu_data data,
+                 Error **errp);
+
 /**
  * async_run_on_cpu:
  * @cpu: The vCPU to run on.
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 23b30484b2..898363a1d0 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -391,6 +391,12 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data data)
     do_run_on_cpu(cpu, func, data, &qemu_global_mutex);
 }
 
+void run_on_cpu2(CPUState *cpu, run_on_cpu_func2 func2, run_on_cpu_data data,
+                 Error **errp)
+{
+    do_run_on_cpu2(cpu, func2, data, &qemu_global_mutex, errp);
+}
+
 static void qemu_cpu_stop(CPUState *cpu, bool exit)
 {
     g_assert(qemu_cpu_is_self(cpu));
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH RFC 3/5] accel: Allow synchronize_post_init() to take an Error**
  2022-06-07 23:06 [PATCH RFC 0/5] CPU: Detect put cpu register errors for migrations Peter Xu
  2022-06-07 23:06 ` [PATCH RFC 1/5] cpus-common: Introduce run_on_cpu_func2 which allows error returns Peter Xu
  2022-06-07 23:06 ` [PATCH RFC 2/5] cpus-common: Add run_on_cpu2() Peter Xu
@ 2022-06-07 23:06 ` Peter Xu
  2022-06-07 23:06 ` [PATCH RFC 4/5] cpu: Allow cpu_synchronize_all_post_init() to take an errp Peter Xu
  2022-06-07 23:06 ` [PATCH RFC 5/5] KVM: Hook kvm_arch_put_registers() errors to the caller Peter Xu
  4 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2022-06-07 23:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Sean Christopherson, Leonardo Bras Soares Passos,
	Dr . David Alan Gilbert, Paolo Bonzini, Richard Henderson,
	Igor Mammedov, peterx

It allows accel->synchronize_post_init() hook to return an error upwards.
Add a new cpu_synchronize_post_init_full() for it, then let the existing
cpu_synchronize_post_init() to call it with errp==NULL.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 accel/hvf/hvf-accel-ops.c     |  2 +-
 accel/kvm/kvm-all.c           |  2 +-
 include/sysemu/accel-ops.h    |  2 +-
 include/sysemu/hw_accel.h     |  1 +
 softmmu/cpus.c                | 10 ++++++++--
 stubs/cpu-synchronize-state.c |  3 +++
 target/i386/hax/hax-all.c     |  2 +-
 target/i386/nvmm/nvmm-all.c   |  2 +-
 target/i386/whpx/whpx-all.c   |  2 +-
 9 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index a70e2eb375..b439e0c104 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -228,7 +228,7 @@ static void hvf_cpu_synchronize_post_reset(CPUState *cpu)
     run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
 }
 
-static void hvf_cpu_synchronize_post_init(CPUState *cpu)
+static void hvf_cpu_synchronize_post_init(CPUState *cpu, Error **errp)
 {
     run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
 }
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 32e177bd26..1caed1a295 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2767,7 +2767,7 @@ static void do_kvm_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg)
     cpu->vcpu_dirty = false;
 }
 
-void kvm_cpu_synchronize_post_init(CPUState *cpu)
+void kvm_cpu_synchronize_post_init(CPUState *cpu, Error **errp)
 {
     run_on_cpu(cpu, do_kvm_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
 }
diff --git a/include/sysemu/accel-ops.h b/include/sysemu/accel-ops.h
index a0572ea87a..7e526d3c65 100644
--- a/include/sysemu/accel-ops.h
+++ b/include/sysemu/accel-ops.h
@@ -35,7 +35,7 @@ struct AccelOpsClass {
     bool (*cpu_thread_is_idle)(CPUState *cpu);
 
     void (*synchronize_post_reset)(CPUState *cpu);
-    void (*synchronize_post_init)(CPUState *cpu);
+    void (*synchronize_post_init)(CPUState *cpu, Error **errp);
     void (*synchronize_state)(CPUState *cpu);
     void (*synchronize_pre_loadvm)(CPUState *cpu);
     void (*synchronize_pre_resume)(bool step_pending);
diff --git a/include/sysemu/hw_accel.h b/include/sysemu/hw_accel.h
index 22903a55f7..3ee3508411 100644
--- a/include/sysemu/hw_accel.h
+++ b/include/sysemu/hw_accel.h
@@ -21,6 +21,7 @@
 void cpu_synchronize_state(CPUState *cpu);
 void cpu_synchronize_post_reset(CPUState *cpu);
 void cpu_synchronize_post_init(CPUState *cpu);
+void cpu_synchronize_post_init_full(CPUState *cpu, Error **errp);
 void cpu_synchronize_pre_loadvm(CPUState *cpu);
 
 #endif /* QEMU_HW_ACCEL_H */
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 898363a1d0..464c06201c 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -178,13 +178,19 @@ void cpu_synchronize_post_reset(CPUState *cpu)
     }
 }
 
-void cpu_synchronize_post_init(CPUState *cpu)
+void cpu_synchronize_post_init_full(CPUState *cpu, Error **errp)
 {
     if (cpus_accel->synchronize_post_init) {
-        cpus_accel->synchronize_post_init(cpu);
+        cpus_accel->synchronize_post_init(cpu, errp);
     }
 }
 
+void cpu_synchronize_post_init(CPUState *cpu)
+{
+    /* errp=NULL means we won't capture any error */
+    cpu_synchronize_post_init_full(cpu, NULL);
+}
+
 void cpu_synchronize_pre_loadvm(CPUState *cpu)
 {
     if (cpus_accel->synchronize_pre_loadvm) {
diff --git a/stubs/cpu-synchronize-state.c b/stubs/cpu-synchronize-state.c
index d9211da66c..6d2c9f509a 100644
--- a/stubs/cpu-synchronize-state.c
+++ b/stubs/cpu-synchronize-state.c
@@ -7,3 +7,6 @@ void cpu_synchronize_state(CPUState *cpu)
 void cpu_synchronize_post_init(CPUState *cpu)
 {
 }
+void cpu_synchronize_post_init_full(CPUState *cpu, Error **errp)
+{
+}
diff --git a/target/i386/hax/hax-all.c b/target/i386/hax/hax-all.c
index b185ee8de4..782d83b531 100644
--- a/target/i386/hax/hax-all.c
+++ b/target/i386/hax/hax-all.c
@@ -651,7 +651,7 @@ static void do_hax_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg)
     cpu->vcpu_dirty = false;
 }
 
-void hax_cpu_synchronize_post_init(CPUState *cpu)
+void hax_cpu_synchronize_post_init(CPUState *cpu, Error **errp)
 {
     run_on_cpu(cpu, do_hax_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
 }
diff --git a/target/i386/nvmm/nvmm-all.c b/target/i386/nvmm/nvmm-all.c
index b75738ee9c..f429e940af 100644
--- a/target/i386/nvmm/nvmm-all.c
+++ b/target/i386/nvmm/nvmm-all.c
@@ -869,7 +869,7 @@ void nvmm_cpu_synchronize_post_reset(CPUState *cpu)
     run_on_cpu(cpu, do_nvmm_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
 }
 
-void nvmm_cpu_synchronize_post_init(CPUState *cpu)
+void nvmm_cpu_synchronize_post_init(CPUState *cpu, Error **errp)
 {
     run_on_cpu(cpu, do_nvmm_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
 }
diff --git a/target/i386/whpx/whpx-all.c b/target/i386/whpx/whpx-all.c
index b22a3314b4..09bf5681ce 100644
--- a/target/i386/whpx/whpx-all.c
+++ b/target/i386/whpx/whpx-all.c
@@ -2123,7 +2123,7 @@ void whpx_cpu_synchronize_post_reset(CPUState *cpu)
     run_on_cpu(cpu, do_whpx_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
 }
 
-void whpx_cpu_synchronize_post_init(CPUState *cpu)
+void whpx_cpu_synchronize_post_init(CPUState *cpu, Error **errp)
 {
     run_on_cpu(cpu, do_whpx_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
 }
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH RFC 4/5] cpu: Allow cpu_synchronize_all_post_init() to take an errp
  2022-06-07 23:06 [PATCH RFC 0/5] CPU: Detect put cpu register errors for migrations Peter Xu
                   ` (2 preceding siblings ...)
  2022-06-07 23:06 ` [PATCH RFC 3/5] accel: Allow synchronize_post_init() to take an Error** Peter Xu
@ 2022-06-07 23:06 ` Peter Xu
  2022-06-08 17:05   ` Dr. David Alan Gilbert
  2022-06-07 23:06 ` [PATCH RFC 5/5] KVM: Hook kvm_arch_put_registers() errors to the caller Peter Xu
  4 siblings, 1 reply; 10+ messages in thread
From: Peter Xu @ 2022-06-07 23:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Sean Christopherson, Leonardo Bras Soares Passos,
	Dr . David Alan Gilbert, Paolo Bonzini, Richard Henderson,
	Igor Mammedov, peterx

Allow cpu_synchronize_all_post_init() to fail with an errp when it's set.
Modify both precopy and postcopy to try to detect such error.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/core/machine.c     |  2 +-
 include/sysemu/cpus.h |  2 +-
 migration/savevm.c    | 20 +++++++++++++++++---
 softmmu/cpus.c        |  2 +-
 4 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index c53548d0b1..b5daad82f8 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -1447,7 +1447,7 @@ void qemu_remove_machine_init_done_notifier(Notifier *notify)
 
 void qdev_machine_creation_done(void)
 {
-    cpu_synchronize_all_post_init();
+    cpu_synchronize_all_post_init(NULL);
 
     if (current_machine->boot_config.has_once) {
         qemu_boot_set(current_machine->boot_config.once, &error_fatal);
diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
index b5c87d48b3..a51ee46441 100644
--- a/include/sysemu/cpus.h
+++ b/include/sysemu/cpus.h
@@ -45,7 +45,7 @@ bool cpus_are_resettable(void);
 
 void cpu_synchronize_all_states(void);
 void cpu_synchronize_all_post_reset(void);
-void cpu_synchronize_all_post_init(void);
+void cpu_synchronize_all_post_init(Error **errp);
 void cpu_synchronize_all_pre_loadvm(void);
 
 #ifndef CONFIG_USER_ONLY
diff --git a/migration/savevm.c b/migration/savevm.c
index d9076897b8..1175ddefd4 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2005,7 +2005,17 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
     /* TODO we should move all of this lot into postcopy_ram.c or a shared code
      * in migration.c
      */
-    cpu_synchronize_all_post_init();
+    cpu_synchronize_all_post_init(&local_err);
+    if (local_err) {
+        /*
+         * TODO: a better way to do this is to tell the src that we cannot
+         * run the VM here so hopefully we can keep the VM running on src
+         * and immediately halt the switch-over.  But that needs work.
+         */
+        error_report_err(local_err);
+        local_err = NULL;
+        autostart = false;
+    }
 
     trace_loadvm_postcopy_handle_run_bh("after cpu sync");
 
@@ -2772,7 +2782,11 @@ int qemu_loadvm_state(QEMUFile *f)
     }
 
     qemu_loadvm_state_cleanup();
-    cpu_synchronize_all_post_init();
+    cpu_synchronize_all_post_init(&local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        return -EINVAL;
+    }
 
     return ret;
 }
@@ -2789,7 +2803,7 @@ int qemu_load_device_state(QEMUFile *f)
         return ret;
     }
 
-    cpu_synchronize_all_post_init();
+    cpu_synchronize_all_post_init(NULL);
     return 0;
 }
 
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 464c06201c..59c70fd496 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -146,7 +146,7 @@ void cpu_synchronize_all_post_reset(void)
     }
 }
 
-void cpu_synchronize_all_post_init(void)
+void cpu_synchronize_all_post_init(Error **errp)
 {
     CPUState *cpu;
 
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH RFC 5/5] KVM: Hook kvm_arch_put_registers() errors to the caller
  2022-06-07 23:06 [PATCH RFC 0/5] CPU: Detect put cpu register errors for migrations Peter Xu
                   ` (3 preceding siblings ...)
  2022-06-07 23:06 ` [PATCH RFC 4/5] cpu: Allow cpu_synchronize_all_post_init() to take an errp Peter Xu
@ 2022-06-07 23:06 ` Peter Xu
  4 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2022-06-07 23:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Juan Quintela, Sean Christopherson, Leonardo Bras Soares Passos,
	Dr . David Alan Gilbert, Paolo Bonzini, Richard Henderson,
	Igor Mammedov, peterx

Leverage the new mechanism to pass over errors to upper stack for
kvm_arch_put_registers() when called for the post_init() accel hook.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 accel/kvm/kvm-all.c  | 13 ++++++++++---
 accel/kvm/kvm-cpus.h |  2 +-
 softmmu/cpus.c       |  5 ++++-
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 1caed1a295..71be723d24 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2761,15 +2761,22 @@ void kvm_cpu_synchronize_post_reset(CPUState *cpu)
     run_on_cpu(cpu, do_kvm_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
 }
 
-static void do_kvm_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg)
+static void do_kvm_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg,
+                                             Error **errp)
 {
-    kvm_arch_put_registers(cpu, KVM_PUT_FULL_STATE);
+    int ret = kvm_arch_put_registers(cpu, KVM_PUT_FULL_STATE);
+
+    if (ret) {
+        error_setg(errp, "kvm_arch_put_registers() failed with retval=%d", ret);
+        return;
+    }
+
     cpu->vcpu_dirty = false;
 }
 
 void kvm_cpu_synchronize_post_init(CPUState *cpu, Error **errp)
 {
-    run_on_cpu(cpu, do_kvm_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
+    run_on_cpu2(cpu, do_kvm_cpu_synchronize_post_init, RUN_ON_CPU_NULL, errp);
 }
 
 static void do_kvm_cpu_synchronize_pre_loadvm(CPUState *cpu, run_on_cpu_data arg)
diff --git a/accel/kvm/kvm-cpus.h b/accel/kvm/kvm-cpus.h
index bf0bd1bee4..c9b8262704 100644
--- a/accel/kvm/kvm-cpus.h
+++ b/accel/kvm/kvm-cpus.h
@@ -16,7 +16,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp);
 int kvm_cpu_exec(CPUState *cpu);
 void kvm_destroy_vcpu(CPUState *cpu);
 void kvm_cpu_synchronize_post_reset(CPUState *cpu);
-void kvm_cpu_synchronize_post_init(CPUState *cpu);
+void kvm_cpu_synchronize_post_init(CPUState *cpu, Error **errp);
 void kvm_cpu_synchronize_pre_loadvm(CPUState *cpu);
 
 #endif /* KVM_CPUS_H */
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 59c70fd496..6c0b5b87f0 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -151,7 +151,10 @@ void cpu_synchronize_all_post_init(Error **errp)
     CPUState *cpu;
 
     CPU_FOREACH(cpu) {
-        cpu_synchronize_post_init(cpu);
+        cpu_synchronize_post_init_full(cpu, errp);
+        if (errp && *errp) {
+            break;
+        }
     }
 }
 
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH RFC 4/5] cpu: Allow cpu_synchronize_all_post_init() to take an errp
  2022-06-07 23:06 ` [PATCH RFC 4/5] cpu: Allow cpu_synchronize_all_post_init() to take an errp Peter Xu
@ 2022-06-08 17:05   ` Dr. David Alan Gilbert
  2022-06-09 21:02     ` Peter Xu
  0 siblings, 1 reply; 10+ messages in thread
From: Dr. David Alan Gilbert @ 2022-06-08 17:05 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Juan Quintela, Sean Christopherson,
	Leonardo Bras Soares Passos, Paolo Bonzini, Richard Henderson,
	Igor Mammedov

* Peter Xu (peterx@redhat.com) wrote:
> Allow cpu_synchronize_all_post_init() to fail with an errp when it's set.
> Modify both precopy and postcopy to try to detect such error.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/core/machine.c     |  2 +-
>  include/sysemu/cpus.h |  2 +-
>  migration/savevm.c    | 20 +++++++++++++++++---
>  softmmu/cpus.c        |  2 +-
>  4 files changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index c53548d0b1..b5daad82f8 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -1447,7 +1447,7 @@ void qemu_remove_machine_init_done_notifier(Notifier *notify)
>  
>  void qdev_machine_creation_done(void)
>  {
> -    cpu_synchronize_all_post_init();
> +    cpu_synchronize_all_post_init(NULL);
>  
>      if (current_machine->boot_config.has_once) {
>          qemu_boot_set(current_machine->boot_config.once, &error_fatal);
> diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
> index b5c87d48b3..a51ee46441 100644
> --- a/include/sysemu/cpus.h
> +++ b/include/sysemu/cpus.h
> @@ -45,7 +45,7 @@ bool cpus_are_resettable(void);
>  
>  void cpu_synchronize_all_states(void);
>  void cpu_synchronize_all_post_reset(void);
> -void cpu_synchronize_all_post_init(void);
> +void cpu_synchronize_all_post_init(Error **errp);
>  void cpu_synchronize_all_pre_loadvm(void);
>  
>  #ifndef CONFIG_USER_ONLY
> diff --git a/migration/savevm.c b/migration/savevm.c
> index d9076897b8..1175ddefd4 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2005,7 +2005,17 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
>      /* TODO we should move all of this lot into postcopy_ram.c or a shared code
>       * in migration.c
>       */
> -    cpu_synchronize_all_post_init();
> +    cpu_synchronize_all_post_init(&local_err);
> +    if (local_err) {
> +        /*
> +         * TODO: a better way to do this is to tell the src that we cannot
> +         * run the VM here so hopefully we can keep the VM running on src
> +         * and immediately halt the switch-over.  But that needs work.

Yes, I think it is possible; unlike some of the later errors in the same
function, in this case we know no disks/network/etc have been touched,
so we should be able to recover.
I wonder if we can move the postcopy_state_set(POSTCOPY_INCOMING_RUNNING)
out of loadvm_postcopy_handle_run to after this point.

We've already got the return path, so we should be able to signal the
failure unless we're very unlucky.

Dave

> +         */
> +        error_report_err(local_err);
> +        local_err = NULL;
> +        autostart = false;
> +    }
>  
>      trace_loadvm_postcopy_handle_run_bh("after cpu sync");
>  
> @@ -2772,7 +2782,11 @@ int qemu_loadvm_state(QEMUFile *f)
>      }
>  
>      qemu_loadvm_state_cleanup();
> -    cpu_synchronize_all_post_init();
> +    cpu_synchronize_all_post_init(&local_err);
> +    if (local_err) {
> +        error_report_err(local_err);
> +        return -EINVAL;
> +    }
>  
>      return ret;
>  }
> @@ -2789,7 +2803,7 @@ int qemu_load_device_state(QEMUFile *f)
>          return ret;
>      }
>  
> -    cpu_synchronize_all_post_init();
> +    cpu_synchronize_all_post_init(NULL);
>      return 0;
>  }
>  
> diff --git a/softmmu/cpus.c b/softmmu/cpus.c
> index 464c06201c..59c70fd496 100644
> --- a/softmmu/cpus.c
> +++ b/softmmu/cpus.c
> @@ -146,7 +146,7 @@ void cpu_synchronize_all_post_reset(void)
>      }
>  }
>  
> -void cpu_synchronize_all_post_init(void)
> +void cpu_synchronize_all_post_init(Error **errp)
>  {
>      CPUState *cpu;
>  
> -- 
> 2.32.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RFC 4/5] cpu: Allow cpu_synchronize_all_post_init() to take an errp
  2022-06-08 17:05   ` Dr. David Alan Gilbert
@ 2022-06-09 21:02     ` Peter Xu
  2022-06-10 14:19       ` Peter Xu
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Xu @ 2022-06-09 21:02 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Juan Quintela, Sean Christopherson,
	Leonardo Bras Soares Passos, Paolo Bonzini, Richard Henderson,
	Igor Mammedov

On Wed, Jun 08, 2022 at 06:05:28PM +0100, Dr. David Alan Gilbert wrote:
> > @@ -2005,7 +2005,17 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
> >      /* TODO we should move all of this lot into postcopy_ram.c or a shared code
> >       * in migration.c
> >       */
> > -    cpu_synchronize_all_post_init();
> > +    cpu_synchronize_all_post_init(&local_err);
> > +    if (local_err) {
> > +        /*
> > +         * TODO: a better way to do this is to tell the src that we cannot
> > +         * run the VM here so hopefully we can keep the VM running on src
> > +         * and immediately halt the switch-over.  But that needs work.
> 
> Yes, I think it is possible; unlike some of the later errors in the same
> function, in this case we know no disks/network/etc have been touched,
> so we should be able to recover.
> I wonder if we can move the postcopy_state_set(POSTCOPY_INCOMING_RUNNING)
> out of loadvm_postcopy_handle_run to after this point.
> 
> We've already got the return path, so we should be able to signal the
> failure unless we're very unlucky.

Right.  It's just that for the new ACK we may need to modify the return
path protocol for sure, because none of the existing ones can notify such
an information.

One idea is to reuse MIG_RP_MSG_RESUME_ACK, it was only used for postcopy
recovery before to do the final handshake with offload=1 only (which is
defined as MIGRATION_RESUME_ACK_VALUE).  We could try to fill in the
payload with some !1 value, to tell the source that we NACK the migration
then src fails the migration as long as possible?

That seems to be even compatibile with one old qemu migrating to a new qemu
scenario, because when the old qemu notices the MIG_RP_MSG_RESUME_ACK
message with !1 payload, it'll mark the rp bad:

  if (migrate_handle_rp_resume_ack(ms, tmp32)) {
      mark_source_rp_bad(ms);
      goto out;
  }

  static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
  {
      trace_source_return_path_thread_resume_ack(value);
  
      if (value != MIGRATION_RESUME_ACK_VALUE) {
          error_report("%s: illegal resume_ack value %"PRIu32,
                       __func__, value);
          return -1;
      }
      ...
  }

If it looks generally good, I can try with such a change in v2.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RFC 4/5] cpu: Allow cpu_synchronize_all_post_init() to take an errp
  2022-06-09 21:02     ` Peter Xu
@ 2022-06-10 14:19       ` Peter Xu
  2022-06-13 11:13         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Xu @ 2022-06-10 14:19 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Juan Quintela, Sean Christopherson,
	Leonardo Bras Soares Passos, Paolo Bonzini, Richard Henderson,
	Igor Mammedov

On Thu, Jun 09, 2022 at 05:02:29PM -0400, Peter Xu wrote:
> On Wed, Jun 08, 2022 at 06:05:28PM +0100, Dr. David Alan Gilbert wrote:
> > > @@ -2005,7 +2005,17 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
> > >      /* TODO we should move all of this lot into postcopy_ram.c or a shared code
> > >       * in migration.c
> > >       */
> > > -    cpu_synchronize_all_post_init();
> > > +    cpu_synchronize_all_post_init(&local_err);
> > > +    if (local_err) {
> > > +        /*
> > > +         * TODO: a better way to do this is to tell the src that we cannot
> > > +         * run the VM here so hopefully we can keep the VM running on src
> > > +         * and immediately halt the switch-over.  But that needs work.
> > 
> > Yes, I think it is possible; unlike some of the later errors in the same
> > function, in this case we know no disks/network/etc have been touched,
> > so we should be able to recover.
> > I wonder if we can move the postcopy_state_set(POSTCOPY_INCOMING_RUNNING)
> > out of loadvm_postcopy_handle_run to after this point.
> > 
> > We've already got the return path, so we should be able to signal the
> > failure unless we're very unlucky.
> 
> Right.  It's just that for the new ACK we may need to modify the return
> path protocol for sure, because none of the existing ones can notify such
> an information.
> 
> One idea is to reuse MIG_RP_MSG_RESUME_ACK, it was only used for postcopy
> recovery before to do the final handshake with offload=1 only (which is
> defined as MIGRATION_RESUME_ACK_VALUE).  We could try to fill in the
> payload with some !1 value, to tell the source that we NACK the migration
> then src fails the migration as long as possible?
> 
> That seems to be even compatibile with one old qemu migrating to a new qemu
> scenario, because when the old qemu notices the MIG_RP_MSG_RESUME_ACK
> message with !1 payload, it'll mark the rp bad:

Oh it won't be compatible..  The clean way to do this is we need to modify
the src qemu to halt in postcopy_start() to wait for that ack before
continue.  That may need another cap/param to enable.

The thing is I'm not very sure whether this will be worth it.

Non-compatible migrations should be rare on put register failures.  For the
issue I was working on, it was actually a kernel bug that triggered it but
it's just hard to figure out where's wrong.  With properly working kernels
and matching hosts they should just not really heppen.  I'm worried adding
too much complexity could over-engineer things without much benefits.
  
In that case, I'd think it proper if we start with what this patchset
provides, which at least allows us to fail in a crystal clear way?

> 
>   if (migrate_handle_rp_resume_ack(ms, tmp32)) {
>       mark_source_rp_bad(ms);
>       goto out;
>   }
> 
>   static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
>   {
>       trace_source_return_path_thread_resume_ack(value);
>   
>       if (value != MIGRATION_RESUME_ACK_VALUE) {
>           error_report("%s: illegal resume_ack value %"PRIu32,
>                        __func__, value);
>           return -1;
>       }
>       ...
>   }
> 
> If it looks generally good, I can try with such a change in v2.
> 
> Thanks,
> 
> -- 
> Peter Xu

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RFC 4/5] cpu: Allow cpu_synchronize_all_post_init() to take an errp
  2022-06-10 14:19       ` Peter Xu
@ 2022-06-13 11:13         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 10+ messages in thread
From: Dr. David Alan Gilbert @ 2022-06-13 11:13 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Juan Quintela, Sean Christopherson,
	Leonardo Bras Soares Passos, Paolo Bonzini, Richard Henderson,
	Igor Mammedov

* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Jun 09, 2022 at 05:02:29PM -0400, Peter Xu wrote:
> > On Wed, Jun 08, 2022 at 06:05:28PM +0100, Dr. David Alan Gilbert wrote:
> > > > @@ -2005,7 +2005,17 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
> > > >      /* TODO we should move all of this lot into postcopy_ram.c or a shared code
> > > >       * in migration.c
> > > >       */
> > > > -    cpu_synchronize_all_post_init();
> > > > +    cpu_synchronize_all_post_init(&local_err);
> > > > +    if (local_err) {
> > > > +        /*
> > > > +         * TODO: a better way to do this is to tell the src that we cannot
> > > > +         * run the VM here so hopefully we can keep the VM running on src
> > > > +         * and immediately halt the switch-over.  But that needs work.
> > > 
> > > Yes, I think it is possible; unlike some of the later errors in the same
> > > function, in this case we know no disks/network/etc have been touched,
> > > so we should be able to recover.
> > > I wonder if we can move the postcopy_state_set(POSTCOPY_INCOMING_RUNNING)
> > > out of loadvm_postcopy_handle_run to after this point.
> > > 
> > > We've already got the return path, so we should be able to signal the
> > > failure unless we're very unlucky.
> > 
> > Right.  It's just that for the new ACK we may need to modify the return
> > path protocol for sure, because none of the existing ones can notify such
> > an information.
> > 
> > One idea is to reuse MIG_RP_MSG_RESUME_ACK, it was only used for postcopy
> > recovery before to do the final handshake with offload=1 only (which is
> > defined as MIGRATION_RESUME_ACK_VALUE).  We could try to fill in the
> > payload with some !1 value, to tell the source that we NACK the migration
> > then src fails the migration as long as possible?
> > 
> > That seems to be even compatibile with one old qemu migrating to a new qemu
> > scenario, because when the old qemu notices the MIG_RP_MSG_RESUME_ACK
> > message with !1 payload, it'll mark the rp bad:
> 
> Oh it won't be compatible..  The clean way to do this is we need to modify
> the src qemu to halt in postcopy_start() to wait for that ack before
> continue.  That may need another cap/param to enable.

OK; I was wondering aobut sending a RP_MSG_SHUT with a failure; but if
you'd need to change the source it's still a problem.

> The thing is I'm not very sure whether this will be worth it.
> 
> Non-compatible migrations should be rare on put register failures.  For the
> issue I was working on, it was actually a kernel bug that triggered it but
> it's just hard to figure out where's wrong.  With properly working kernels
> and matching hosts they should just not really heppen.  I'm worried adding
> too much complexity could over-engineer things without much benefits.

OK that makes sense.

> In that case, I'd think it proper if we start with what this patchset
> provides, which at least allows us to fail in a crystal clear way?

Yes, the clear error is important.

Dave

> > 
> >   if (migrate_handle_rp_resume_ack(ms, tmp32)) {
> >       mark_source_rp_bad(ms);
> >       goto out;
> >   }
> > 
> >   static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
> >   {
> >       trace_source_return_path_thread_resume_ack(value);
> >   
> >       if (value != MIGRATION_RESUME_ACK_VALUE) {
> >           error_report("%s: illegal resume_ack value %"PRIu32,
> >                        __func__, value);
> >           return -1;
> >       }
> >       ...
> >   }
> > 
> > If it looks generally good, I can try with such a change in v2.
> > 
> > Thanks,
> > 
> > -- 
> > Peter Xu
> 
> -- 
> Peter Xu
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-06-13 11:14 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-07 23:06 [PATCH RFC 0/5] CPU: Detect put cpu register errors for migrations Peter Xu
2022-06-07 23:06 ` [PATCH RFC 1/5] cpus-common: Introduce run_on_cpu_func2 which allows error returns Peter Xu
2022-06-07 23:06 ` [PATCH RFC 2/5] cpus-common: Add run_on_cpu2() Peter Xu
2022-06-07 23:06 ` [PATCH RFC 3/5] accel: Allow synchronize_post_init() to take an Error** Peter Xu
2022-06-07 23:06 ` [PATCH RFC 4/5] cpu: Allow cpu_synchronize_all_post_init() to take an errp Peter Xu
2022-06-08 17:05   ` Dr. David Alan Gilbert
2022-06-09 21:02     ` Peter Xu
2022-06-10 14:19       ` Peter Xu
2022-06-13 11:13         ` Dr. David Alan Gilbert
2022-06-07 23:06 ` [PATCH RFC 5/5] KVM: Hook kvm_arch_put_registers() errors to the caller Peter Xu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.