All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH RFC v6 0/7] qemu_thread_create: propagate errors to callers to check
@ 2018-10-29 12:58 Fei Li
  2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 1/7] Fix segmentation fault when qemu_signal_init fails Fei Li
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Fei Li @ 2018-10-29 12:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: armbru, dgilbert, famz, peterx, quintela

Hi,

This idea comes from BiteSizedTasks, and this patch series implement
the error checking of qemu_thread_create: make qemu_thread_create
return a flag to indicate if it succeeded rather than failing with an
error; make all callers check it.

The first and the third patch fixes some segmentation faults occured
during the debugging.   The second patch paves the way for the last
patch as that patch is too long.   The last patch modifies the
qemu_thread_create() and makes it return a bool to all direct callers
to indicate if it succeeds.   The middle three fixes some migration
issues.

Please help to review, thanks. :)

v6:
- Add a new migration-multifd related patch. BTW, delete the previous
  vnc related patch as it has been upstreamed.
- Use error_setg_errno() to set the errno when qemu_thread_create()
  fails for both Linux and Windows implementation.
- Optimize the first patch, less codes are needed

v5:
- Remove `errno = err` in qemu_thread_create() for Linux, and change
  `return errno` to `return -1` in qemu_signal_init() to indicate
  the error in case qemu_thread_create() fails.
- Delete the v4-added qemu_cond/mutex_destroy() in iothread_complete()
  as the destroy() will be done by its callers' object_unref().

v4:
- Separate the migration compression patch from this series
- Add one more error handling patch related with migration
- Add more cleaning up code for touched functions

v3:
- Add two migration related patches to fix the segmentaion fault
- Extract the segmentation fault fix from v2's last patch to be a 
  separate patch
- Add cleaning up code for touched functions
- Update some error messages

v2:
- Pass errp straightly instead of using a local_err & error_propagate
- Return a bool: false/true to indicate if one function succeeds
- Merge v1's last two patches into one to avoid the compile error
- Fix one omitted error in patch1 and update some error messages

Fei Li (7):
  Fix segmentation fault when qemu_signal_init fails
  qemu_init_vcpu: add a new Error parameter to propagate
  qemu_thread_join: fix segmentation fault
  migration: fix some segmentation faults when using multifd
  migration: fix the multifd code when receiving less channels
  migration: fix some error handling
  qemu_thread_create: propagate the error to callers to handle

 accel/tcg/user-exec-stub.c      |  3 +-
 cpus.c                          | 79 +++++++++++++++++++++-------------
 dump.c                          |  6 ++-
 hw/misc/edu.c                   |  6 ++-
 hw/ppc/spapr_hcall.c            | 10 ++++-
 hw/rdma/rdma_backend.c          |  4 +-
 hw/usb/ccid-card-emulated.c     | 13 ++++--
 include/qemu/thread.h           |  4 +-
 include/qom/cpu.h               |  2 +-
 io/task.c                       |  3 +-
 iothread.c                      | 16 ++++---
 migration/channel.c             |  7 +++-
 migration/migration.c           | 66 ++++++++++++++++++-----------
 migration/migration.h           |  2 +-
 migration/postcopy-ram.c        | 17 +++++++-
 migration/ram.c                 | 93 ++++++++++++++++++++++++++++++-----------
 migration/ram.h                 |  4 +-
 migration/savevm.c              | 11 +++--
 target/alpha/cpu.c              |  4 +-
 target/arm/cpu.c                |  4 +-
 target/cris/cpu.c               |  4 +-
 target/hppa/cpu.c               |  4 +-
 target/i386/cpu.c               |  4 +-
 target/lm32/cpu.c               |  4 +-
 target/m68k/cpu.c               |  4 +-
 target/microblaze/cpu.c         |  4 +-
 target/mips/cpu.c               |  4 +-
 target/moxie/cpu.c              |  4 +-
 target/nios2/cpu.c              |  4 +-
 target/openrisc/cpu.c           |  4 +-
 target/ppc/translate_init.inc.c |  4 +-
 target/riscv/cpu.c              |  4 +-
 target/s390x/cpu.c              |  4 +-
 target/sh4/cpu.c                |  4 +-
 target/sparc/cpu.c              |  4 +-
 target/tilegx/cpu.c             |  4 +-
 target/tricore/cpu.c            |  4 +-
 target/unicore32/cpu.c          |  4 +-
 target/xtensa/cpu.c             |  4 +-
 tests/atomic_add-bench.c        |  3 +-
 tests/iothread.c                |  2 +-
 tests/qht-bench.c               |  3 +-
 tests/rcutorture.c              |  3 +-
 tests/test-aio.c                |  2 +-
 tests/test-rcu-list.c           |  3 +-
 ui/vnc-jobs.c                   | 17 +++++---
 ui/vnc-jobs.h                   |  2 +-
 ui/vnc.c                        |  4 +-
 util/compatfd.c                 | 12 +++++-
 util/main-loop.c                |  8 ++--
 util/oslib-posix.c              | 17 ++++++--
 util/qemu-thread-posix.c        | 27 ++++++++----
 util/qemu-thread-win32.c        | 18 +++++---
 util/rcu.c                      |  3 +-
 util/thread-pool.c              |  4 +-
 55 files changed, 393 insertions(+), 165 deletions(-)

-- 
2.13.7

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH RFC v6 1/7] Fix segmentation fault when qemu_signal_init fails
  2018-10-29 12:58 [Qemu-devel] [PATCH RFC v6 0/7] qemu_thread_create: propagate errors to callers to check Fei Li
@ 2018-10-29 12:58 ` Fei Li
  2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 2/7] qemu_init_vcpu: add a new Error parameter to propagate Fei Li
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Fei Li @ 2018-10-29 12:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: armbru, dgilbert, famz, peterx, quintela

When qemu_signal_init() fails in qemu_init_main_loop(), we return
without setting an error.  Its callers crash then when they try to
report the error with error_report_err().

To avoid such segmentation fault, add a new Error parameter to make
the call trace to propagate the err to the final caller.

Cc: Markus Armbruster <armbru@redhat.com>
Cc: Fam Zheng <famz@redhat.com>
Signed-off-by: Fei Li <fli@suse.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 util/main-loop.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/util/main-loop.c b/util/main-loop.c
index affe0403c5..443cb4cfe8 100644
--- a/util/main-loop.c
+++ b/util/main-loop.c
@@ -71,7 +71,7 @@ static void sigfd_handler(void *opaque)
     }
 }
 
-static int qemu_signal_init(void)
+static int qemu_signal_init(Error **errp)
 {
     int sigfd;
     sigset_t set;
@@ -96,7 +96,7 @@ static int qemu_signal_init(void)
     sigdelset(&set, SIG_IPI);
     sigfd = qemu_signalfd(&set);
     if (sigfd == -1) {
-        fprintf(stderr, "failed to create signalfd\n");
+        error_setg_errno(errp, errno, "failed to create signalfd");
         return -errno;
     }
 
@@ -109,7 +109,7 @@ static int qemu_signal_init(void)
 
 #else /* _WIN32 */
 
-static int qemu_signal_init(void)
+static int qemu_signal_init(Error **errp)
 {
     return 0;
 }
@@ -148,7 +148,7 @@ int qemu_init_main_loop(Error **errp)
 
     init_clocks(qemu_timer_notify_cb);
 
-    ret = qemu_signal_init();
+    ret = qemu_signal_init(errp);
     if (ret) {
         return ret;
     }
-- 
2.13.7

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH RFC v6 2/7] qemu_init_vcpu: add a new Error parameter to propagate
  2018-10-29 12:58 [Qemu-devel] [PATCH RFC v6 0/7] qemu_thread_create: propagate errors to callers to check Fei Li
  2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 1/7] Fix segmentation fault when qemu_signal_init fails Fei Li
@ 2018-10-29 12:58 ` Fei Li
  2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 3/7] qemu_thread_join: fix segmentation fault Fei Li
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Fei Li @ 2018-10-29 12:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: armbru, dgilbert, famz, peterx, quintela

This patch is to pave the way for a later patch as it is too long:
"qemu_thread_create: propagate the error to callers to handle."

The callers of qemu_init_vcpu() already passed the **errp to handle
errors. In view of this, add a new Error parameter to all the
functions called by qemu_init_vcpu() to propagate the error and let
the further callers check it.

Besides, make qemu_init_vcpu() return a Boolean value to let its
callers know whether it succeeds.

Signed-off-by: Fei Li <fli@suse.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 accel/tcg/user-exec-stub.c      |  3 ++-
 cpus.c                          | 34 +++++++++++++++++++++-------------
 include/qom/cpu.h               |  2 +-
 target/alpha/cpu.c              |  4 +++-
 target/arm/cpu.c                |  4 +++-
 target/cris/cpu.c               |  4 +++-
 target/hppa/cpu.c               |  4 +++-
 target/i386/cpu.c               |  4 +++-
 target/lm32/cpu.c               |  4 +++-
 target/m68k/cpu.c               |  4 +++-
 target/microblaze/cpu.c         |  4 +++-
 target/mips/cpu.c               |  4 +++-
 target/moxie/cpu.c              |  4 +++-
 target/nios2/cpu.c              |  4 +++-
 target/openrisc/cpu.c           |  4 +++-
 target/ppc/translate_init.inc.c |  4 +++-
 target/riscv/cpu.c              |  4 +++-
 target/s390x/cpu.c              |  4 +++-
 target/sh4/cpu.c                |  4 +++-
 target/sparc/cpu.c              |  4 +++-
 target/tilegx/cpu.c             |  4 +++-
 target/tricore/cpu.c            |  4 +++-
 target/unicore32/cpu.c          |  4 +++-
 target/xtensa/cpu.c             |  4 +++-
 24 files changed, 87 insertions(+), 36 deletions(-)

diff --git a/accel/tcg/user-exec-stub.c b/accel/tcg/user-exec-stub.c
index a32b4496af..f8c38a375c 100644
--- a/accel/tcg/user-exec-stub.c
+++ b/accel/tcg/user-exec-stub.c
@@ -10,8 +10,9 @@ void cpu_resume(CPUState *cpu)
 {
 }
 
-void qemu_init_vcpu(CPUState *cpu)
+bool qemu_init_vcpu(CPUState *cpu, Error **errp)
 {
+    return true;
 }
 
 /* User mode emulation does not support record/replay yet.  */
diff --git a/cpus.c b/cpus.c
index 3978f63d8f..ed71618e1f 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1919,7 +1919,7 @@ void cpu_remove_sync(CPUState *cpu)
 /* For temporary buffers for forming a name */
 #define VCPU_THREAD_NAME_SIZE 16
 
-static void qemu_tcg_init_vcpu(CPUState *cpu)
+static void qemu_tcg_init_vcpu(CPUState *cpu, Error **errp)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
     static QemuCond *single_tcg_halt_cond;
@@ -1975,7 +1975,7 @@ static void qemu_tcg_init_vcpu(CPUState *cpu)
     }
 }
 
-static void qemu_hax_start_vcpu(CPUState *cpu)
+static void qemu_hax_start_vcpu(CPUState *cpu, Error **errp)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
 
@@ -1992,7 +1992,7 @@ static void qemu_hax_start_vcpu(CPUState *cpu)
 #endif
 }
 
-static void qemu_kvm_start_vcpu(CPUState *cpu)
+static void qemu_kvm_start_vcpu(CPUState *cpu, Error **errp)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
 
@@ -2005,7 +2005,7 @@ static void qemu_kvm_start_vcpu(CPUState *cpu)
                        cpu, QEMU_THREAD_JOINABLE);
 }
 
-static void qemu_hvf_start_vcpu(CPUState *cpu)
+static void qemu_hvf_start_vcpu(CPUState *cpu, Error **errp)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
 
@@ -2023,7 +2023,7 @@ static void qemu_hvf_start_vcpu(CPUState *cpu)
                        cpu, QEMU_THREAD_JOINABLE);
 }
 
-static void qemu_whpx_start_vcpu(CPUState *cpu)
+static void qemu_whpx_start_vcpu(CPUState *cpu, Error **errp)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
 
@@ -2039,7 +2039,7 @@ static void qemu_whpx_start_vcpu(CPUState *cpu)
 #endif
 }
 
-static void qemu_dummy_start_vcpu(CPUState *cpu)
+static void qemu_dummy_start_vcpu(CPUState *cpu, Error **errp)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
 
@@ -2052,11 +2052,12 @@ static void qemu_dummy_start_vcpu(CPUState *cpu)
                        QEMU_THREAD_JOINABLE);
 }
 
-void qemu_init_vcpu(CPUState *cpu)
+bool qemu_init_vcpu(CPUState *cpu, Error **errp)
 {
     cpu->nr_cores = smp_cores;
     cpu->nr_threads = smp_threads;
     cpu->stopped = true;
+    Error *local_err = NULL;
 
     if (!cpu->as) {
         /* If the target cpu hasn't set up any address spaces itself,
@@ -2067,22 +2068,29 @@ void qemu_init_vcpu(CPUState *cpu)
     }
 
     if (kvm_enabled()) {
-        qemu_kvm_start_vcpu(cpu);
+        qemu_kvm_start_vcpu(cpu, &local_err);
     } else if (hax_enabled()) {
-        qemu_hax_start_vcpu(cpu);
+        qemu_hax_start_vcpu(cpu, &local_err);
     } else if (hvf_enabled()) {
-        qemu_hvf_start_vcpu(cpu);
+        qemu_hvf_start_vcpu(cpu, &local_err);
     } else if (tcg_enabled()) {
-        qemu_tcg_init_vcpu(cpu);
+        qemu_tcg_init_vcpu(cpu, &local_err);
     } else if (whpx_enabled()) {
-        qemu_whpx_start_vcpu(cpu);
+        qemu_whpx_start_vcpu(cpu, &local_err);
     } else {
-        qemu_dummy_start_vcpu(cpu);
+        qemu_dummy_start_vcpu(cpu, &local_err);
+    }
+
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return false;
     }
 
     while (!cpu->created) {
         qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
     }
+
+    return true;
 }
 
 void cpu_stop_current(void)
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 4e238b0d9f..6254cdb2bf 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -1012,7 +1012,7 @@ void end_exclusive(void);
  *
  * Initializes a vCPU.
  */
-void qemu_init_vcpu(CPUState *cpu);
+bool qemu_init_vcpu(CPUState *cpu, Error **errp);
 
 #define SSTEP_ENABLE  0x1  /* Enable simulated HW single stepping */
 #define SSTEP_NOIRQ   0x2  /* Do not use IRQ while single stepping */
diff --git a/target/alpha/cpu.c b/target/alpha/cpu.c
index a953897fcc..bf3c34516d 100644
--- a/target/alpha/cpu.c
+++ b/target/alpha/cpu.c
@@ -66,7 +66,9 @@ static void alpha_cpu_realizefn(DeviceState *dev, Error **errp)
         return;
     }
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     acc->parent_realize(dev, errp);
 }
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 8f16e96b6c..85a96554fb 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1036,7 +1036,9 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
     }
 #endif
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
     cpu_reset(cs);
 
     acc->parent_realize(dev, errp);
diff --git a/target/cris/cpu.c b/target/cris/cpu.c
index a23aba2688..ec92d69781 100644
--- a/target/cris/cpu.c
+++ b/target/cris/cpu.c
@@ -140,7 +140,9 @@ static void cris_cpu_realizefn(DeviceState *dev, Error **errp)
     }
 
     cpu_reset(cs);
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     ccc->parent_realize(dev, errp);
 }
diff --git a/target/hppa/cpu.c b/target/hppa/cpu.c
index 00bf444620..08f600ced9 100644
--- a/target/hppa/cpu.c
+++ b/target/hppa/cpu.c
@@ -98,7 +98,9 @@ static void hppa_cpu_realizefn(DeviceState *dev, Error **errp)
         return;
     }
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
     acc->parent_realize(dev, errp);
 
 #ifndef CONFIG_USER_ONLY
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1469a1be01..81fd4a365c 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5112,7 +5112,9 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
     }
 #endif
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     /*
      * Most Intel and certain AMD CPUs support hyperthreading. Even though QEMU
diff --git a/target/lm32/cpu.c b/target/lm32/cpu.c
index b7499cb627..d50b1e4a43 100644
--- a/target/lm32/cpu.c
+++ b/target/lm32/cpu.c
@@ -139,7 +139,9 @@ static void lm32_cpu_realizefn(DeviceState *dev, Error **errp)
 
     cpu_reset(cs);
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     lcc->parent_realize(dev, errp);
 }
diff --git a/target/m68k/cpu.c b/target/m68k/cpu.c
index 582e3a73b3..4ab53f2d58 100644
--- a/target/m68k/cpu.c
+++ b/target/m68k/cpu.c
@@ -231,7 +231,9 @@ static void m68k_cpu_realizefn(DeviceState *dev, Error **errp)
     m68k_cpu_init_gdb(cpu);
 
     cpu_reset(cs);
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     mcc->parent_realize(dev, errp);
 }
diff --git a/target/microblaze/cpu.c b/target/microblaze/cpu.c
index 9b546a2c18..3906c864a3 100644
--- a/target/microblaze/cpu.c
+++ b/target/microblaze/cpu.c
@@ -161,7 +161,9 @@ static void mb_cpu_realizefn(DeviceState *dev, Error **errp)
         return;
     }
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     env->pvr.regs[0] = PVR0_USE_EXC_MASK \
                        | PVR0_USE_ICACHE_MASK \
diff --git a/target/mips/cpu.c b/target/mips/cpu.c
index e217fb3e36..1e5aa69c57 100644
--- a/target/mips/cpu.c
+++ b/target/mips/cpu.c
@@ -145,7 +145,9 @@ static void mips_cpu_realizefn(DeviceState *dev, Error **errp)
     cpu_mips_realize_env(&cpu->env);
 
     cpu_reset(cs);
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     mcc->parent_realize(dev, errp);
 }
diff --git a/target/moxie/cpu.c b/target/moxie/cpu.c
index 8d67eb6727..8581a6d922 100644
--- a/target/moxie/cpu.c
+++ b/target/moxie/cpu.c
@@ -66,7 +66,9 @@ static void moxie_cpu_realizefn(DeviceState *dev, Error **errp)
         return;
     }
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
     cpu_reset(cs);
 
     mcc->parent_realize(dev, errp);
diff --git a/target/nios2/cpu.c b/target/nios2/cpu.c
index fbfaa2ce26..5c7b4b486e 100644
--- a/target/nios2/cpu.c
+++ b/target/nios2/cpu.c
@@ -94,7 +94,9 @@ static void nios2_cpu_realizefn(DeviceState *dev, Error **errp)
         return;
     }
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
     cpu_reset(cs);
 
     ncc->parent_realize(dev, errp);
diff --git a/target/openrisc/cpu.c b/target/openrisc/cpu.c
index fb7cb5c507..a6dcdb9df9 100644
--- a/target/openrisc/cpu.c
+++ b/target/openrisc/cpu.c
@@ -83,7 +83,9 @@ static void openrisc_cpu_realizefn(DeviceState *dev, Error **errp)
         return;
     }
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
     cpu_reset(cs);
 
     occ->parent_realize(dev, errp);
diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index ee9432eb15..a4c392eb21 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -9694,7 +9694,9 @@ static void ppc_cpu_realize(DeviceState *dev, Error **errp)
                                  32, "power-vsx.xml", 0);
     }
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        goto unrealize;
+    }
 
     pcc->parent_realize(dev, errp);
 
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index a025a0a3ba..9829fd9bc4 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -305,7 +305,9 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
         return;
     }
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
     cpu_reset(cs);
 
     mcc->parent_realize(dev, errp);
diff --git a/target/s390x/cpu.c b/target/s390x/cpu.c
index 18ba7f85a5..2a3eac9761 100644
--- a/target/s390x/cpu.c
+++ b/target/s390x/cpu.c
@@ -222,7 +222,9 @@ static void s390_cpu_realizefn(DeviceState *dev, Error **errp)
     qemu_register_reset(s390_cpu_machine_reset_cb, cpu);
 #endif
     s390_cpu_gdb_init(cs);
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     /*
      * KVM requires the initial CPU reset ioctl to be executed on the target
diff --git a/target/sh4/cpu.c b/target/sh4/cpu.c
index b9f393b7c7..d32ef2e1cb 100644
--- a/target/sh4/cpu.c
+++ b/target/sh4/cpu.c
@@ -196,7 +196,9 @@ static void superh_cpu_realizefn(DeviceState *dev, Error **errp)
     }
 
     cpu_reset(cs);
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     scc->parent_realize(dev, errp);
 }
diff --git a/target/sparc/cpu.c b/target/sparc/cpu.c
index 0f090ece54..9c22f6a7df 100644
--- a/target/sparc/cpu.c
+++ b/target/sparc/cpu.c
@@ -773,7 +773,9 @@ static void sparc_cpu_realizefn(DeviceState *dev, Error **errp)
         return;
     }
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     scc->parent_realize(dev, errp);
 }
diff --git a/target/tilegx/cpu.c b/target/tilegx/cpu.c
index bfe9be59b5..234148fabd 100644
--- a/target/tilegx/cpu.c
+++ b/target/tilegx/cpu.c
@@ -92,7 +92,9 @@ static void tilegx_cpu_realizefn(DeviceState *dev, Error **errp)
     }
 
     cpu_reset(cs);
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     tcc->parent_realize(dev, errp);
 }
diff --git a/target/tricore/cpu.c b/target/tricore/cpu.c
index 2edaef1aef..5482d6ea3f 100644
--- a/target/tricore/cpu.c
+++ b/target/tricore/cpu.c
@@ -96,7 +96,9 @@ static void tricore_cpu_realizefn(DeviceState *dev, Error **errp)
         set_feature(env, TRICORE_FEATURE_13);
     }
     cpu_reset(cs);
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     tcc->parent_realize(dev, errp);
 }
diff --git a/target/unicore32/cpu.c b/target/unicore32/cpu.c
index 2b49d1ca40..0c737c3187 100644
--- a/target/unicore32/cpu.c
+++ b/target/unicore32/cpu.c
@@ -96,7 +96,9 @@ static void uc32_cpu_realizefn(DeviceState *dev, Error **errp)
         return;
     }
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     ucc->parent_realize(dev, errp);
 }
diff --git a/target/xtensa/cpu.c b/target/xtensa/cpu.c
index a54dbe4260..d2351c9b20 100644
--- a/target/xtensa/cpu.c
+++ b/target/xtensa/cpu.c
@@ -131,7 +131,9 @@ static void xtensa_cpu_realizefn(DeviceState *dev, Error **errp)
 
     cs->gdb_num_regs = xcc->config->gdb_regmap.num_regs;
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     xcc->parent_realize(dev, errp);
 }
-- 
2.13.7

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH RFC v6 3/7] qemu_thread_join: fix segmentation fault
  2018-10-29 12:58 [Qemu-devel] [PATCH RFC v6 0/7] qemu_thread_create: propagate errors to callers to check Fei Li
  2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 1/7] Fix segmentation fault when qemu_signal_init fails Fei Li
  2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 2/7] qemu_init_vcpu: add a new Error parameter to propagate Fei Li
@ 2018-10-29 12:58 ` Fei Li
  2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 4/7] migration: fix some segmentation faults when using multifd Fei Li
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Fei Li @ 2018-10-29 12:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: armbru, dgilbert, famz, peterx, quintela

To avoid the segmentation fault in qemu_thread_join(), just directly
return when the QemuThread *thread failed to be created in either
qemu-thread-posix.c or qemu-thread-win32.c.

Signed-off-by: Fei Li <fli@suse.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 util/qemu-thread-posix.c | 3 +++
 util/qemu-thread-win32.c | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index dfa66ff2fb..289af4fab5 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -562,6 +562,9 @@ void *qemu_thread_join(QemuThread *thread)
     int err;
     void *ret;
 
+    if (!thread->thread) {
+        return NULL;
+    }
     err = pthread_join(thread->thread, &ret);
     if (err) {
         error_exit(err, __func__);
diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
index 4a363ca675..1a27e1cf6f 100644
--- a/util/qemu-thread-win32.c
+++ b/util/qemu-thread-win32.c
@@ -366,7 +366,7 @@ void *qemu_thread_join(QemuThread *thread)
     HANDLE handle;
 
     data = thread->data;
-    if (data->mode == QEMU_THREAD_DETACHED) {
+    if (data == NULL || data->mode == QEMU_THREAD_DETACHED) {
         return NULL;
     }
 
-- 
2.13.7

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH RFC v6 4/7] migration: fix some segmentation faults when using multifd
  2018-10-29 12:58 [Qemu-devel] [PATCH RFC v6 0/7] qemu_thread_create: propagate errors to callers to check Fei Li
                   ` (2 preceding siblings ...)
  2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 3/7] qemu_thread_join: fix segmentation fault Fei Li
@ 2018-10-29 12:58 ` Fei Li
  2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 5/7] migration: fix the multifd code when receiving less channels Fei Li
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Fei Li @ 2018-10-29 12:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: armbru, dgilbert, famz, peterx, quintela

When multifd is used during migration, a segmentaion fault will
occur in the source when multifd_save_cleanup() is called again if
the multifd_send_state has been freed in earlier error handling. This
can happen when migrate_fd_connect() fails and multifd_fd_cleanup()
is called, and then multifd_new_send_channel_async() fails and
multifd_save_cleanup() is called again.

If the QIOChannel *c of multifd_recv_state->params[i] (p->c) is not
initialized, there is no need to close the channel. Or else a
segmentation fault will occur in multifd_recv_terminate_threads()
when multifd_recv_initial_packet() fails.

Signed-off-by: Fei Li <fli@suse.com>
---
 migration/ram.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 7e7deec4d8..4db3b3e8f4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -907,6 +907,11 @@ static void multifd_send_terminate_threads(Error *err)
         }
     }
 
+    /* in case multifd_send_state has been freed earlier */
+    if (!multifd_send_state) {
+        return;
+    }
+
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
 
@@ -922,7 +927,7 @@ int multifd_save_cleanup(Error **errp)
     int i;
     int ret = 0;
 
-    if (!migrate_use_multifd()) {
+    if (!migrate_use_multifd() || !multifd_send_state) {
         return 0;
     }
     multifd_send_terminate_threads(NULL);
@@ -960,7 +965,7 @@ static void multifd_send_sync_main(void)
 {
     int i;
 
-    if (!migrate_use_multifd()) {
+    if (!migrate_use_multifd() || !multifd_send_state) {
         return;
     }
     if (multifd_send_state->pages->used) {
@@ -1070,6 +1075,10 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
     QIOChannel *sioc = QIO_CHANNEL(qio_task_get_source(task));
     Error *local_err = NULL;
 
+    if (!multifd_send_state) {
+        return;
+    }
+
     if (qio_task_propagate_error(task, &local_err)) {
         if (multifd_save_cleanup(&local_err) != 0) {
             migrate_set_error(migrate_get_current(), local_err);
@@ -1131,7 +1140,7 @@ struct {
     uint64_t packet_num;
 } *multifd_recv_state;
 
-static void multifd_recv_terminate_threads(Error *err)
+static void multifd_recv_terminate_threads(Error *err, bool channel)
 {
     int i;
 
@@ -1145,6 +1154,11 @@ static void multifd_recv_terminate_threads(Error *err)
         }
     }
 
+    /* in case p->c is not initialized */
+    if (!channel) {
+        return;
+    }
+
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDRecvParams *p = &multifd_recv_state->params[i];
 
@@ -1166,7 +1180,7 @@ int multifd_load_cleanup(Error **errp)
     if (!migrate_use_multifd()) {
         return 0;
     }
-    multifd_recv_terminate_threads(NULL);
+    multifd_recv_terminate_threads(NULL, true);
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDRecvParams *p = &multifd_recv_state->params[i];
 
@@ -1269,7 +1283,7 @@ static void *multifd_recv_thread(void *opaque)
     }
 
     if (local_err) {
-        multifd_recv_terminate_threads(local_err);
+        multifd_recv_terminate_threads(local_err, true);
     }
     qemu_mutex_lock(&p->mutex);
     p->running = false;
@@ -1331,7 +1345,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
 
     id = multifd_recv_initial_packet(ioc, &local_err);
     if (id < 0) {
-        multifd_recv_terminate_threads(local_err);
+        multifd_recv_terminate_threads(local_err, false);
         return false;
     }
 
@@ -1339,7 +1353,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
     if (p->c != NULL) {
         error_setg(&local_err, "multifd: received id '%d' already setup'",
                    id);
-        multifd_recv_terminate_threads(local_err);
+        multifd_recv_terminate_threads(local_err, true);
         return false;
     }
     p->c = ioc;
-- 
2.13.7

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH RFC v6 5/7] migration: fix the multifd code when receiving less channels
  2018-10-29 12:58 [Qemu-devel] [PATCH RFC v6 0/7] qemu_thread_create: propagate errors to callers to check Fei Li
                   ` (3 preceding siblings ...)
  2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 4/7] migration: fix some segmentation faults when using multifd Fei Li
@ 2018-10-29 12:58 ` Fei Li
  2018-10-30  6:05   ` Peter Xu
  2018-10-31 13:50   ` Fei Li
  2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 6/7] migration: fix some error handling Fei Li
  2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 7/7] qemu_thread_create: propagate the error to callers to handle Fei Li
  6 siblings, 2 replies; 17+ messages in thread
From: Fei Li @ 2018-10-29 12:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: armbru, dgilbert, famz, peterx, quintela

In our current code, when multifd is used during migration, if there
is an error before the destination receives all new channels, the
source keeps running, however the destination does not exit but keeps
waiting until the source is killed deliberately.

Fix this by simply killing the destination when it fails to receive
packet via some channel.

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Fei Li <fli@suse.com>
---
 migration/channel.c   |  7 ++++++-
 migration/migration.c |  9 +++++++--
 migration/migration.h |  2 +-
 migration/ram.c       | 17 ++++++++++++++---
 migration/ram.h       |  2 +-
 5 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/migration/channel.c b/migration/channel.c
index 33e0e9b82f..572be4245a 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -44,7 +44,12 @@ void migration_channel_process_incoming(QIOChannel *ioc)
             error_report_err(local_err);
         }
     } else {
-        migration_ioc_process_incoming(ioc);
+        Error *local_err = NULL;
+        migration_ioc_process_incoming(ioc, &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            exit(EXIT_FAILURE);
+        }
     }
 }
 
diff --git a/migration/migration.c b/migration/migration.c
index 8b36e7f184..87dfc7374f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -541,7 +541,7 @@ void migration_fd_process_incoming(QEMUFile *f)
     migration_incoming_process();
 }
 
-void migration_ioc_process_incoming(QIOChannel *ioc)
+void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
     bool start_migration;
@@ -563,9 +563,14 @@ void migration_ioc_process_incoming(QIOChannel *ioc)
          */
         start_migration = !migrate_use_multifd();
     } else {
+        Error *local_err = NULL;
         /* Multiple connections */
         assert(migrate_use_multifd());
-        start_migration = multifd_recv_new_channel(ioc);
+        start_migration = multifd_recv_new_channel(ioc, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
     }
 
     if (start_migration) {
diff --git a/migration/migration.h b/migration/migration.h
index f7813f8261..7df4d426d0 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -229,7 +229,7 @@ struct MigrationState
 void migrate_set_state(int *state, int old_state, int new_state);
 
 void migration_fd_process_incoming(QEMUFile *f);
-void migration_ioc_process_incoming(QIOChannel *ioc);
+void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
 void migration_incoming_process(void);
 
 bool  migration_has_all_channels(void);
diff --git a/migration/ram.c b/migration/ram.c
index 4db3b3e8f4..8f03afe228 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1072,6 +1072,7 @@ out:
 static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
 {
     MultiFDSendParams *p = opaque;
+    MigrationState *s = migrate_get_current();
     QIOChannel *sioc = QIO_CHANNEL(qio_task_get_source(task));
     Error *local_err = NULL;
 
@@ -1080,6 +1081,7 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
     }
 
     if (qio_task_propagate_error(task, &local_err)) {
+        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
         if (multifd_save_cleanup(&local_err) != 0) {
             migrate_set_error(migrate_get_current(), local_err);
         }
@@ -1337,16 +1339,20 @@ bool multifd_recv_all_channels_created(void)
 }
 
 /* Return true if multifd is ready for the migration, otherwise false */
-bool multifd_recv_new_channel(QIOChannel *ioc)
+bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
 {
+    MigrationIncomingState *mis = migration_incoming_get_current();
     MultiFDRecvParams *p;
     Error *local_err = NULL;
     int id;
 
     id = multifd_recv_initial_packet(ioc, &local_err);
     if (id < 0) {
+        error_propagate_prepend(errp, local_err,
+                        "failed to receive packet via multifd channel %x: ",
+                        multifd_recv_state->count);
         multifd_recv_terminate_threads(local_err, false);
-        return false;
+        goto fail;
     }
 
     p = &multifd_recv_state->params[id];
@@ -1354,7 +1360,8 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
         error_setg(&local_err, "multifd: received id '%d' already setup'",
                    id);
         multifd_recv_terminate_threads(local_err, true);
-        return false;
+        error_propagate(errp, local_err);
+        goto fail;
     }
     p->c = ioc;
     object_ref(OBJECT(ioc));
@@ -1366,6 +1373,10 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
                        QEMU_THREAD_JOINABLE);
     atomic_inc(&multifd_recv_state->count);
     return multifd_recv_state->count == migrate_multifd_channels();
+fail:
+    qemu_fclose(mis->from_src_file);
+    mis->from_src_file = NULL;
+    return false;
 }
 
 /**
diff --git a/migration/ram.h b/migration/ram.h
index 83ff1bc11a..046d3074be 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -47,7 +47,7 @@ int multifd_save_cleanup(Error **errp);
 int multifd_load_setup(void);
 int multifd_load_cleanup(Error **errp);
 bool multifd_recv_all_channels_created(void);
-bool multifd_recv_new_channel(QIOChannel *ioc);
+bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
 
 uint64_t ram_pagesize_summary(void);
 int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);
-- 
2.13.7

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH RFC v6 6/7] migration: fix some error handling
  2018-10-29 12:58 [Qemu-devel] [PATCH RFC v6 0/7] qemu_thread_create: propagate errors to callers to check Fei Li
                   ` (4 preceding siblings ...)
  2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 5/7] migration: fix the multifd code when receiving less channels Fei Li
@ 2018-10-29 12:58 ` Fei Li
  2018-10-30 19:49   ` Dr. David Alan Gilbert
  2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 7/7] qemu_thread_create: propagate the error to callers to handle Fei Li
  6 siblings, 1 reply; 17+ messages in thread
From: Fei Li @ 2018-10-29 12:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: armbru, dgilbert, famz, peterx, quintela

Add error handling for qemu_ram_foreach_migratable_block() when
it fails.

Always call migrate_set_error() to set the error state without relying
on whether multifd_save_cleanup() succeeds. As the passed &local_err
is never used in multifd_save_cleanup(), remove it.

Signed-off-by: Fei Li <fli@suse.com>
---
 migration/migration.c    | 5 +----
 migration/postcopy-ram.c | 3 +++
 migration/ram.c          | 7 +++----
 migration/ram.h          | 2 +-
 4 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 87dfc7374f..3b8b7ab4f9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1377,7 +1377,6 @@ static void migrate_fd_cleanup(void *opaque)
     qemu_savevm_state_cleanup();
 
     if (s->to_dst_file) {
-        Error *local_err = NULL;
         QEMUFile *tmp;
 
         trace_migrate_fd_cleanup();
@@ -1388,9 +1387,7 @@ static void migrate_fd_cleanup(void *opaque)
         }
         qemu_mutex_lock_iothread();
 
-        if (multifd_save_cleanup(&local_err) != 0) {
-            error_report_err(local_err);
-        }
+        multifd_save_cleanup();
         qemu_mutex_lock(&s->qemu_file_lock);
         tmp = s->to_dst_file;
         s->to_dst_file = NULL;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index e5c02a32c5..4ca2bc02b3 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1117,6 +1117,9 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 
     /* Mark so that we get notified of accesses to unwritten areas */
     if (qemu_ram_foreach_migratable_block(ram_block_enable_notify, mis)) {
+        error_report("ram_block_enable_notify failed");
+        close(mis->userfault_event_fd);
+        close(mis->userfault_fd);
         return -1;
     }
 
diff --git a/migration/ram.c b/migration/ram.c
index 8f03afe228..57cb1bee3d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -922,7 +922,7 @@ static void multifd_send_terminate_threads(Error *err)
     }
 }
 
-int multifd_save_cleanup(Error **errp)
+int multifd_save_cleanup(void)
 {
     int i;
     int ret = 0;
@@ -1082,9 +1082,8 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
 
     if (qio_task_propagate_error(task, &local_err)) {
         migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
-        if (multifd_save_cleanup(&local_err) != 0) {
-            migrate_set_error(migrate_get_current(), local_err);
-        }
+        multifd_save_cleanup();
+        migrate_set_error(migrate_get_current(), local_err);
     } else {
         p->c = QIO_CHANNEL(sioc);
         qio_channel_set_delay(p->c, false);
diff --git a/migration/ram.h b/migration/ram.h
index 046d3074be..0d1215209e 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -43,7 +43,7 @@ uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_total(void);
 
 int multifd_save_setup(void);
-int multifd_save_cleanup(Error **errp);
+int multifd_save_cleanup(void);
 int multifd_load_setup(void);
 int multifd_load_cleanup(Error **errp);
 bool multifd_recv_all_channels_created(void);
-- 
2.13.7

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH RFC v6 7/7] qemu_thread_create: propagate the error to callers to handle
  2018-10-29 12:58 [Qemu-devel] [PATCH RFC v6 0/7] qemu_thread_create: propagate errors to callers to check Fei Li
                   ` (5 preceding siblings ...)
  2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 6/7] migration: fix some error handling Fei Li
@ 2018-10-29 12:58 ` Fei Li
  6 siblings, 0 replies; 17+ messages in thread
From: Fei Li @ 2018-10-29 12:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: armbru, dgilbert, famz, peterx, quintela

Make qemu_thread_create() return a Boolean to indicate if it succeeds
rather than failing with an error. And add an Error parameter to hold
the error message and let the callers handle it.

Signed-off-by: Fei Li <fli@suse.com>
---
 cpus.c                      | 45 ++++++++++++++++++++++++++-------------
 dump.c                      |  6 ++++--
 hw/misc/edu.c               |  6 ++++--
 hw/ppc/spapr_hcall.c        | 10 +++++++--
 hw/rdma/rdma_backend.c      |  4 +++-
 hw/usb/ccid-card-emulated.c | 13 ++++++++----
 include/qemu/thread.h       |  4 ++--
 io/task.c                   |  3 ++-
 iothread.c                  | 16 +++++++++-----
 migration/migration.c       | 52 +++++++++++++++++++++++++++++----------------
 migration/postcopy-ram.c    | 14 ++++++++++--
 migration/ram.c             | 41 ++++++++++++++++++++++++++---------
 migration/savevm.c          | 11 +++++++---
 tests/atomic_add-bench.c    |  3 ++-
 tests/iothread.c            |  2 +-
 tests/qht-bench.c           |  3 ++-
 tests/rcutorture.c          |  3 ++-
 tests/test-aio.c            |  2 +-
 tests/test-rcu-list.c       |  3 ++-
 ui/vnc-jobs.c               | 17 ++++++++++-----
 ui/vnc-jobs.h               |  2 +-
 ui/vnc.c                    |  4 +++-
 util/compatfd.c             | 12 +++++++++--
 util/oslib-posix.c          | 17 +++++++++++----
 util/qemu-thread-posix.c    | 24 ++++++++++++++-------
 util/qemu-thread-win32.c    | 16 ++++++++++----
 util/rcu.c                  |  3 ++-
 util/thread-pool.c          |  4 +++-
 28 files changed, 240 insertions(+), 100 deletions(-)

diff --git a/cpus.c b/cpus.c
index ed71618e1f..0510f90e06 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1949,15 +1949,20 @@ static void qemu_tcg_init_vcpu(CPUState *cpu, Error **errp)
             snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
                  cpu->cpu_index);
 
-            qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
-                               cpu, QEMU_THREAD_JOINABLE);
+            if (!qemu_thread_create(cpu->thread, thread_name,
+                                    qemu_tcg_cpu_thread_fn, cpu,
+                                    QEMU_THREAD_JOINABLE, errp)) {
+                return;
+            }
 
         } else {
             /* share a single thread for all cpus with TCG */
             snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "ALL CPUs/TCG");
-            qemu_thread_create(cpu->thread, thread_name,
-                               qemu_tcg_rr_cpu_thread_fn,
-                               cpu, QEMU_THREAD_JOINABLE);
+            if (!qemu_thread_create(cpu->thread, thread_name,
+                                    qemu_tcg_rr_cpu_thread_fn, cpu,
+                                    QEMU_THREAD_JOINABLE, errp)) {
+                return;
+            }
 
             single_tcg_halt_cond = cpu->halt_cond;
             single_tcg_cpu_thread = cpu->thread;
@@ -1985,8 +1990,10 @@ static void qemu_hax_start_vcpu(CPUState *cpu, Error **errp)
 
     snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HAX",
              cpu->cpu_index);
-    qemu_thread_create(cpu->thread, thread_name, qemu_hax_cpu_thread_fn,
-                       cpu, QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(cpu->thread, thread_name, qemu_hax_cpu_thread_fn,
+                            cpu, QEMU_THREAD_JOINABLE, errp)) {
+        return;
+    }
 #ifdef _WIN32
     cpu->hThread = qemu_thread_get_handle(cpu->thread);
 #endif
@@ -2001,8 +2008,10 @@ static void qemu_kvm_start_vcpu(CPUState *cpu, Error **errp)
     qemu_cond_init(cpu->halt_cond);
     snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/KVM",
              cpu->cpu_index);
-    qemu_thread_create(cpu->thread, thread_name, qemu_kvm_cpu_thread_fn,
-                       cpu, QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(cpu->thread, thread_name, qemu_kvm_cpu_thread_fn,
+                            cpu, QEMU_THREAD_JOINABLE, errp)) {
+        /* keep 'if' here in case there is further error handling logic */
+    }
 }
 
 static void qemu_hvf_start_vcpu(CPUState *cpu, Error **errp)
@@ -2019,8 +2028,10 @@ static void qemu_hvf_start_vcpu(CPUState *cpu, Error **errp)
 
     snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
              cpu->cpu_index);
-    qemu_thread_create(cpu->thread, thread_name, qemu_hvf_cpu_thread_fn,
-                       cpu, QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(cpu->thread, thread_name, qemu_hvf_cpu_thread_fn,
+                            cpu, QEMU_THREAD_JOINABLE, errp)) {
+        /* keep 'if' here in case there is further error handling logic */
+    }
 }
 
 static void qemu_whpx_start_vcpu(CPUState *cpu, Error **errp)
@@ -2032,8 +2043,10 @@ static void qemu_whpx_start_vcpu(CPUState *cpu, Error **errp)
     qemu_cond_init(cpu->halt_cond);
     snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/WHPX",
              cpu->cpu_index);
-    qemu_thread_create(cpu->thread, thread_name, qemu_whpx_cpu_thread_fn,
-                       cpu, QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(cpu->thread, thread_name, qemu_whpx_cpu_thread_fn,
+                            cpu, QEMU_THREAD_JOINABLE, errp)) {
+        return;
+    }
 #ifdef _WIN32
     cpu->hThread = qemu_thread_get_handle(cpu->thread);
 #endif
@@ -2048,8 +2061,10 @@ static void qemu_dummy_start_vcpu(CPUState *cpu, Error **errp)
     qemu_cond_init(cpu->halt_cond);
     snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/DUMMY",
              cpu->cpu_index);
-    qemu_thread_create(cpu->thread, thread_name, qemu_dummy_cpu_thread_fn, cpu,
-                       QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(cpu->thread, thread_name, qemu_dummy_cpu_thread_fn,
+                           cpu, QEMU_THREAD_JOINABLE, errp)) {
+        /* keep 'if' here in case there is further error handling logic */
+    }
 }
 
 bool qemu_init_vcpu(CPUState *cpu, Error **errp)
diff --git a/dump.c b/dump.c
index 4ec94c5e25..1f003aff9a 100644
--- a/dump.c
+++ b/dump.c
@@ -2020,8 +2020,10 @@ void qmp_dump_guest_memory(bool paging, const char *file,
     if (detach_p) {
         /* detached dump */
         s->detached = true;
-        qemu_thread_create(&s->dump_thread, "dump_thread", dump_thread,
-                           s, QEMU_THREAD_DETACHED);
+        if (!qemu_thread_create(&s->dump_thread, "dump_thread", dump_thread,
+                                s, QEMU_THREAD_DETACHED, errp)) {
+            /* keep 'if' here in case there is further error handling logic */
+        }
     } else {
         /* sync dump */
         dump_process(s, errp);
diff --git a/hw/misc/edu.c b/hw/misc/edu.c
index cdcf550dd7..6684c60a96 100644
--- a/hw/misc/edu.c
+++ b/hw/misc/edu.c
@@ -355,8 +355,10 @@ static void pci_edu_realize(PCIDevice *pdev, Error **errp)
 
     qemu_mutex_init(&edu->thr_mutex);
     qemu_cond_init(&edu->thr_cond);
-    qemu_thread_create(&edu->thread, "edu", edu_fact_thread,
-                       edu, QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(&edu->thread, "edu", edu_fact_thread,
+                            edu, QEMU_THREAD_JOINABLE, errp)) {
+        return;
+    }
 
     memory_region_init_io(&edu->mmio, OBJECT(edu), &edu_mmio_ops, edu,
                     "edu-mmio", 1 * MiB);
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index ae913d070f..7c16ade04a 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -478,6 +478,7 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
     sPAPRPendingHPT *pending = spapr->pending_hpt;
     uint64_t current_ram_size;
     int rc;
+    Error *local_err = NULL;
 
     if (spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED) {
         return H_AUTHORITY;
@@ -538,8 +539,13 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
     pending->shift = shift;
     pending->ret = H_HARDWARE;
 
-    qemu_thread_create(&pending->thread, "sPAPR HPT prepare",
-                       hpt_prepare_thread, pending, QEMU_THREAD_DETACHED);
+    if (!qemu_thread_create(&pending->thread, "sPAPR HPT prepare",
+                            hpt_prepare_thread, pending,
+                            QEMU_THREAD_DETACHED, &local_err)) {
+        error_reportf_err(local_err, "failed to create hpt_prepare_thread: ");
+        g_free(pending);
+        return H_RESOURCE;
+    }
 
     spapr->pending_hpt = pending;
 
diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index d7a4bbd91f..53a2bd0d85 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -164,8 +164,10 @@ static void start_comp_thread(RdmaBackendDev *backend_dev)
     snprintf(thread_name, sizeof(thread_name), "rdma_comp_%s",
              ibv_get_device_name(backend_dev->ib_dev));
     backend_dev->comp_thread.run = true;
+    /* FIXME: let the further caller handle the error instead of abort() here */
     qemu_thread_create(&backend_dev->comp_thread.thread, thread_name,
-                       comp_handler_thread, backend_dev, QEMU_THREAD_DETACHED);
+                       comp_handler_thread, backend_dev,
+                       QEMU_THREAD_DETACHED, &error_abort);
 }
 
 void rdma_backend_register_comp_handler(void (*handler)(int status,
diff --git a/hw/usb/ccid-card-emulated.c b/hw/usb/ccid-card-emulated.c
index 5c8b3c9907..09682b5d30 100644
--- a/hw/usb/ccid-card-emulated.c
+++ b/hw/usb/ccid-card-emulated.c
@@ -538,10 +538,15 @@ static void emulated_realize(CCIDCardState *base, Error **errp)
         error_setg(errp, "%s: failed to initialize vcard", TYPE_EMULATED_CCID);
         return;
     }
-    qemu_thread_create(&card->event_thread_id, "ccid/event", event_thread,
-                       card, QEMU_THREAD_JOINABLE);
-    qemu_thread_create(&card->apdu_thread_id, "ccid/apdu", handle_apdu_thread,
-                       card, QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(&card->event_thread_id, "ccid/event", event_thread,
+                            card, QEMU_THREAD_JOINABLE, errp)) {
+        return;
+    }
+    if (!qemu_thread_create(&card->apdu_thread_id, "ccid/apdu",
+                            handle_apdu_thread, card,
+                            QEMU_THREAD_JOINABLE, errp)) {
+        /* keep 'if' here in case there is further error handling logic */
+    }
 }
 
 static void emulated_unrealize(CCIDCardState *base, Error **errp)
diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index b2661b6672..c230eb5d5b 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -152,9 +152,9 @@ void qemu_event_reset(QemuEvent *ev);
 void qemu_event_wait(QemuEvent *ev);
 void qemu_event_destroy(QemuEvent *ev);
 
-void qemu_thread_create(QemuThread *thread, const char *name,
+bool qemu_thread_create(QemuThread *thread, const char *name,
                         void *(*start_routine)(void *),
-                        void *arg, int mode);
+                        void *arg, int mode, Error **errp);
 void *qemu_thread_join(QemuThread *thread);
 void qemu_thread_get_self(QemuThread *thread);
 bool qemu_thread_is_self(QemuThread *thread);
diff --git a/io/task.c b/io/task.c
index 2886a2c1bc..6d3a18ab80 100644
--- a/io/task.c
+++ b/io/task.c
@@ -149,7 +149,8 @@ void qio_task_run_in_thread(QIOTask *task,
                        "io-task-worker",
                        qio_task_thread_worker,
                        data,
-                       QEMU_THREAD_DETACHED);
+                       QEMU_THREAD_DETACHED,
+                       &error_abort);
 }
 
 
diff --git a/iothread.c b/iothread.c
index 2fb1cdf55d..7335dacf0b 100644
--- a/iothread.c
+++ b/iothread.c
@@ -164,9 +164,7 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
                                 &local_error);
     if (local_error) {
         error_propagate(errp, local_error);
-        aio_context_unref(iothread->ctx);
-        iothread->ctx = NULL;
-        return;
+        goto fail;
     }
 
     qemu_mutex_init(&iothread->init_done_lock);
@@ -178,8 +176,12 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
      */
     name = object_get_canonical_path_component(OBJECT(obj));
     thread_name = g_strdup_printf("IO %s", name);
-    qemu_thread_create(&iothread->thread, thread_name, iothread_run,
-                       iothread, QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(&iothread->thread, thread_name, iothread_run,
+                            iothread, QEMU_THREAD_JOINABLE, errp)) {
+        g_free(thread_name);
+        g_free(name);
+        goto fail;
+    }
     g_free(thread_name);
     g_free(name);
 
@@ -190,6 +192,10 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
                        &iothread->init_done_lock);
     }
     qemu_mutex_unlock(&iothread->init_done_lock);
+    return;
+fail:
+    aio_context_unref(iothread->ctx);
+    iothread->ctx = NULL;
 }
 
 typedef struct {
diff --git a/migration/migration.c b/migration/migration.c
index 3b8b7ab4f9..6bf8398968 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -438,10 +438,8 @@ static void process_incoming_migration_co(void *opaque)
         /* Make sure all file formats flush their mutable metadata */
         bdrv_invalidate_cache_all(&local_err);
         if (local_err) {
-            migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
-                    MIGRATION_STATUS_FAILED);
             error_report_err(local_err);
-            exit(EXIT_FAILURE);
+            goto fail;
         }
 
         if (colo_init_ram_cache() < 0) {
@@ -449,8 +447,13 @@ static void process_incoming_migration_co(void *opaque)
             exit(EXIT_FAILURE);
         }
 
-        qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
-             colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
+        if (!qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
+                                colo_process_incoming_thread, mis,
+                                QEMU_THREAD_JOINABLE, &local_err)) {
+            error_reportf_err(local_err, "failed to create "
+                              "colo_process_incoming_thread: ");
+            goto fail;
+        }
         mis->have_colo_incoming_thread = true;
         qemu_coroutine_yield();
 
@@ -461,20 +464,22 @@ static void process_incoming_migration_co(void *opaque)
     }
 
     if (ret < 0) {
-        Error *local_err = NULL;
-
-        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
-                          MIGRATION_STATUS_FAILED);
         error_report("load of migration failed: %s", strerror(-ret));
-        qemu_fclose(mis->from_src_file);
-        if (multifd_load_cleanup(&local_err) != 0) {
-            error_report_err(local_err);
-        }
-        exit(EXIT_FAILURE);
+        goto fail;
     }
     mis->bh = qemu_bh_new(process_incoming_migration_bh, mis);
     qemu_bh_schedule(mis->bh);
     mis->migration_incoming_co = NULL;
+    return;
+fail:
+    local_err = NULL;
+    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+                      MIGRATION_STATUS_FAILED);
+    qemu_fclose(mis->from_src_file);
+    if (multifd_load_cleanup(&local_err) != 0) {
+        error_report_err(local_err);
+    }
+    exit(EXIT_FAILURE);
 }
 
 static void migration_incoming_setup(QEMUFile *f)
@@ -2336,6 +2341,7 @@ out:
 static int open_return_path_on_source(MigrationState *ms,
                                       bool create_thread)
 {
+    Error *local_err = NULL;
 
     ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
     if (!ms->rp_state.from_dst_file) {
@@ -2349,8 +2355,13 @@ static int open_return_path_on_source(MigrationState *ms,
         return 0;
     }
 
-    qemu_thread_create(&ms->rp_state.rp_thread, "return path",
-                       source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(&ms->rp_state.rp_thread, "return path",
+                            source_return_path_thread, ms,
+                            QEMU_THREAD_JOINABLE, &local_err)) {
+        error_reportf_err(local_err,
+                          "failed to create source_return_path_thread: ");
+        return -1;
+    }
 
     trace_open_return_path_on_source_continue();
 
@@ -3180,8 +3191,13 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
         migrate_fd_cleanup(s);
         return;
     }
-    qemu_thread_create(&s->thread, "live_migration", migration_thread, s,
-                       QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(&s->thread, "live_migration", migration_thread,
+                            s, QEMU_THREAD_JOINABLE, &error_in)) {
+        error_reportf_err(error_in, "failed to create migration_thread: ");
+        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
+        migrate_fd_cleanup(s);
+        return;
+    }
     s->migration_thread_running = true;
 }
 
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 4ca2bc02b3..56c13e3661 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1083,6 +1083,8 @@ retry:
 
 int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 {
+    Error *local_err = NULL;
+
     /* Open the fd for the kernel to give us userfaults */
     mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
     if (mis->userfault_fd == -1) {
@@ -1109,8 +1111,16 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
     }
 
     qemu_sem_init(&mis->fault_thread_sem, 0);
-    qemu_thread_create(&mis->fault_thread, "postcopy/fault",
-                       postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(&mis->fault_thread, "postcopy/fault",
+                            postcopy_ram_fault_thread, mis,
+                            QEMU_THREAD_JOINABLE, &local_err)) {
+        error_reportf_err(local_err,
+                          "failed to create postcopy_ram_fault_thread: ");
+        close(mis->userfault_event_fd);
+        close(mis->userfault_fd);
+        qemu_sem_destroy(&mis->fault_thread_sem);
+        return -1;
+    }
     qemu_sem_wait(&mis->fault_thread_sem);
     qemu_sem_destroy(&mis->fault_thread_sem);
     mis->have_fault_thread = true;
diff --git a/migration/ram.c b/migration/ram.c
index 57cb1bee3d..7097a6475a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -473,6 +473,7 @@ static void compress_threads_save_cleanup(void)
 static int compress_threads_save_setup(void)
 {
     int i, thread_count;
+    Error *local_err = NULL;
 
     if (!migrate_use_compression()) {
         return 0;
@@ -502,9 +503,12 @@ static int compress_threads_save_setup(void)
         comp_param[i].quit = false;
         qemu_mutex_init(&comp_param[i].mutex);
         qemu_cond_init(&comp_param[i].cond);
-        qemu_thread_create(compress_threads + i, "compress",
-                           do_data_compress, comp_param + i,
-                           QEMU_THREAD_JOINABLE);
+        if (!qemu_thread_create(compress_threads + i, "compress",
+                                do_data_compress, comp_param + i,
+                                QEMU_THREAD_JOINABLE, &local_err)) {
+            error_reportf_err(local_err, "failed to create do_data_compress: ");
+            goto exit;
+        }
     }
     return 0;
 
@@ -1088,8 +1092,15 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
         p->c = QIO_CHANNEL(sioc);
         qio_channel_set_delay(p->c, false);
         p->running = true;
-        qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
-                           QEMU_THREAD_JOINABLE);
+        if (!qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
+                                QEMU_THREAD_JOINABLE, &local_err)) {
+            migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
+            multifd_save_cleanup();
+            migrate_set_error(migrate_get_current(), local_err);
+            error_reportf_err(local_err,
+                              "failed to create multifd_send_thread: ");
+            return;
+        }
 
         atomic_inc(&multifd_send_state->count);
     }
@@ -1368,8 +1379,13 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
     p->num_packets = 1;
 
     p->running = true;
-    qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p,
-                       QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(&p->thread, p->name, multifd_recv_thread,
+                            p, QEMU_THREAD_JOINABLE, &local_err)) {
+        error_propagate_prepend(errp, local_err,
+                                "failed to create multifd_recv_thread: ");
+        multifd_recv_terminate_threads(local_err, true);
+        goto fail;
+    }
     atomic_inc(&multifd_recv_state->count);
     return multifd_recv_state->count == migrate_multifd_channels();
 fail:
@@ -3638,6 +3654,7 @@ static void compress_threads_load_cleanup(void)
 static int compress_threads_load_setup(QEMUFile *f)
 {
     int i, thread_count;
+    Error *local_err = NULL;
 
     if (!migrate_use_compression()) {
         return 0;
@@ -3659,9 +3676,13 @@ static int compress_threads_load_setup(QEMUFile *f)
         qemu_cond_init(&decomp_param[i].cond);
         decomp_param[i].done = true;
         decomp_param[i].quit = false;
-        qemu_thread_create(decompress_threads + i, "decompress",
-                           do_data_decompress, decomp_param + i,
-                           QEMU_THREAD_JOINABLE);
+        if (!qemu_thread_create(decompress_threads + i, "decompress",
+                                do_data_decompress, decomp_param + i,
+                                QEMU_THREAD_JOINABLE, &local_err)) {
+            error_reportf_err(local_err,
+                              "failed to create do_data_decompress: ");
+            goto exit;
+        }
     }
     return 0;
 exit:
diff --git a/migration/savevm.c b/migration/savevm.c
index 9992af4db4..127d9a1ae4 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1742,9 +1742,14 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
     mis->have_listen_thread = true;
     /* Start up the listening thread and wait for it to signal ready */
     qemu_sem_init(&mis->listen_thread_sem, 0);
-    qemu_thread_create(&mis->listen_thread, "postcopy/listen",
-                       postcopy_ram_listen_thread, NULL,
-                       QEMU_THREAD_DETACHED);
+    if (!qemu_thread_create(&mis->listen_thread, "postcopy/listen",
+                            postcopy_ram_listen_thread, NULL,
+                            QEMU_THREAD_DETACHED, &local_err)) {
+        error_reportf_err(local_err,
+                          "failed to create postcopy_ram_listen_thread: ");
+        qemu_sem_destroy(&mis->listen_thread_sem);
+        return -1;
+    }
     qemu_sem_wait(&mis->listen_thread_sem);
     qemu_sem_destroy(&mis->listen_thread_sem);
 
diff --git a/tests/atomic_add-bench.c b/tests/atomic_add-bench.c
index 2f6c72f63a..338b9563e3 100644
--- a/tests/atomic_add-bench.c
+++ b/tests/atomic_add-bench.c
@@ -2,6 +2,7 @@
 #include "qemu/thread.h"
 #include "qemu/host-utils.h"
 #include "qemu/processor.h"
+#include "qapi/error.h"
 
 struct thread_info {
     uint64_t r;
@@ -110,7 +111,7 @@ static void create_threads(void)
 
         info->r = (i + 1) ^ time(NULL);
         qemu_thread_create(&threads[i], NULL, thread_func, info,
-                           QEMU_THREAD_JOINABLE);
+                           QEMU_THREAD_JOINABLE, &error_abort);
     }
 }
 
diff --git a/tests/iothread.c b/tests/iothread.c
index 777d9eea46..f4ad992e61 100644
--- a/tests/iothread.c
+++ b/tests/iothread.c
@@ -73,7 +73,7 @@ IOThread *iothread_new(void)
     qemu_mutex_init(&iothread->init_done_lock);
     qemu_cond_init(&iothread->init_done_cond);
     qemu_thread_create(&iothread->thread, NULL, iothread_run,
-                       iothread, QEMU_THREAD_JOINABLE);
+                       iothread, QEMU_THREAD_JOINABLE, &error_abort);
 
     /* Wait for initialization to complete */
     qemu_mutex_lock(&iothread->init_done_lock);
diff --git a/tests/qht-bench.c b/tests/qht-bench.c
index 2089e2bed1..71df567ea2 100644
--- a/tests/qht-bench.c
+++ b/tests/qht-bench.c
@@ -9,6 +9,7 @@
 #include "qemu/atomic.h"
 #include "qemu/qht.h"
 #include "qemu/rcu.h"
+#include "qapi/error.h"
 #include "exec/tb-hash-xx.h"
 
 struct thread_stats {
@@ -247,7 +248,7 @@ th_create_n(QemuThread **threads, struct thread_info **infos, const char *name,
         prepare_thread_info(&info[i], offset + i);
         info[i].func = func;
         qemu_thread_create(&th[i], name, thread_func, &info[i],
-                           QEMU_THREAD_JOINABLE);
+                           QEMU_THREAD_JOINABLE, &error_abort);
     }
 }
 
diff --git a/tests/rcutorture.c b/tests/rcutorture.c
index 49311c82ea..0e799ff256 100644
--- a/tests/rcutorture.c
+++ b/tests/rcutorture.c
@@ -64,6 +64,7 @@
 #include "qemu/atomic.h"
 #include "qemu/rcu.h"
 #include "qemu/thread.h"
+#include "qapi/error.h"
 
 long long n_reads = 0LL;
 long n_updates = 0L;
@@ -90,7 +91,7 @@ static void create_thread(void *(*func)(void *))
         exit(-1);
     }
     qemu_thread_create(&threads[n_threads], "test", func, &data[n_threads],
-                       QEMU_THREAD_JOINABLE);
+                       QEMU_THREAD_JOINABLE, &error_abort);
     n_threads++;
 }
 
diff --git a/tests/test-aio.c b/tests/test-aio.c
index 86fb73b3d5..b3ac261724 100644
--- a/tests/test-aio.c
+++ b/tests/test-aio.c
@@ -154,7 +154,7 @@ static void test_acquire(void)
 
     qemu_thread_create(&thread, "test_acquire_thread",
                        test_acquire_thread,
-                       &data, QEMU_THREAD_JOINABLE);
+                       &data, QEMU_THREAD_JOINABLE, &error_abort);
 
     /* Block in aio_poll(), let other thread kick us and acquire context */
     aio_context_acquire(ctx);
diff --git a/tests/test-rcu-list.c b/tests/test-rcu-list.c
index 2e6f70bd59..0f7da81291 100644
--- a/tests/test-rcu-list.c
+++ b/tests/test-rcu-list.c
@@ -25,6 +25,7 @@
 #include "qemu/rcu.h"
 #include "qemu/thread.h"
 #include "qemu/rcu_queue.h"
+#include "qapi/error.h"
 
 /*
  * Test variables.
@@ -68,7 +69,7 @@ static void create_thread(void *(*func)(void *))
         exit(-1);
     }
     qemu_thread_create(&threads[n_threads], "test", func, &data[n_threads],
-                       QEMU_THREAD_JOINABLE);
+                       QEMU_THREAD_JOINABLE, &error_abort);
     n_threads++;
 }
 
diff --git a/ui/vnc-jobs.c b/ui/vnc-jobs.c
index 929391f85d..35a652d1fd 100644
--- a/ui/vnc-jobs.c
+++ b/ui/vnc-jobs.c
@@ -31,6 +31,7 @@
 #include "vnc-jobs.h"
 #include "qemu/sockets.h"
 #include "qemu/main-loop.h"
+#include "qapi/error.h"
 #include "block/aio.h"
 
 /*
@@ -331,15 +332,21 @@ static bool vnc_worker_thread_running(void)
     return queue; /* Check global queue */
 }
 
-void vnc_start_worker_thread(void)
+bool vnc_start_worker_thread(Error **errp)
 {
     VncJobQueue *q;
 
-    if (vnc_worker_thread_running())
-        return ;
+    if (vnc_worker_thread_running()) {
+        goto out;
+    }
 
     q = vnc_queue_init();
-    qemu_thread_create(&q->thread, "vnc_worker", vnc_worker_thread, q,
-                       QEMU_THREAD_DETACHED);
+    if (!qemu_thread_create(&q->thread, "vnc_worker", vnc_worker_thread,
+                            q, QEMU_THREAD_DETACHED, errp)) {
+        vnc_queue_clear(q);
+        return false;
+    }
     queue = q; /* Set global queue */
+out:
+    return true;
 }
diff --git a/ui/vnc-jobs.h b/ui/vnc-jobs.h
index 59f66bcc35..14640593db 100644
--- a/ui/vnc-jobs.h
+++ b/ui/vnc-jobs.h
@@ -37,7 +37,7 @@ void vnc_job_push(VncJob *job);
 void vnc_jobs_join(VncState *vs);
 
 void vnc_jobs_consume_buffer(VncState *vs);
-void vnc_start_worker_thread(void);
+bool vnc_start_worker_thread(Error **errp);
 
 /* Locks */
 static inline int vnc_trylock_display(VncDisplay *vd)
diff --git a/ui/vnc.c b/ui/vnc.c
index 0c1b477425..0ffe9e6a5d 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -3236,7 +3236,9 @@ void vnc_display_init(const char *id, Error **errp)
     vd->connections_limit = 32;
 
     qemu_mutex_init(&vd->mutex);
-    vnc_start_worker_thread();
+    if (!vnc_start_worker_thread(errp)) {
+        return;
+    }
 
     vd->dcl.ops = &dcl_ops;
     register_displaychangelistener(&vd->dcl);
diff --git a/util/compatfd.c b/util/compatfd.c
index 980bd33e52..886aa249f9 100644
--- a/util/compatfd.c
+++ b/util/compatfd.c
@@ -16,6 +16,7 @@
 #include "qemu/osdep.h"
 #include "qemu-common.h"
 #include "qemu/thread.h"
+#include "qapi/error.h"
 
 #include <sys/syscall.h>
 
@@ -70,6 +71,7 @@ static int qemu_signalfd_compat(const sigset_t *mask)
     struct sigfd_compat_info *info;
     QemuThread thread;
     int fds[2];
+    Error *local_err = NULL;
 
     info = malloc(sizeof(*info));
     if (info == NULL) {
@@ -88,8 +90,14 @@ static int qemu_signalfd_compat(const sigset_t *mask)
     memcpy(&info->mask, mask, sizeof(*mask));
     info->fd = fds[1];
 
-    qemu_thread_create(&thread, "signalfd_compat", sigwait_compat, info,
-                       QEMU_THREAD_DETACHED);
+    if (!qemu_thread_create(&thread, "signalfd_compat", sigwait_compat,
+                            info, QEMU_THREAD_DETACHED, &local_err)) {
+        error_reportf_err(local_err, "failed to create sigwait_compat: ");
+        close(fds[0]);
+        close(fds[1]);
+        free(info);
+        return -1;
+    }
 
     return fds[0];
 }
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index fbd0dc8c57..c05ed9d020 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -437,9 +437,12 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
     size_t size_per_thread;
     char *addr = area;
     int i = 0;
+    int started_thread = 0;
+    Error *local_err = NULL;
 
     memset_thread_failed = false;
     memset_num_threads = get_memset_num_threads(smp_cpus);
+    started_thread = memset_num_threads;
     memset_thread = g_new0(MemsetThread, memset_num_threads);
     numpages_per_thread = (numpages / memset_num_threads);
     size_per_thread = (hpagesize * numpages_per_thread);
@@ -448,13 +451,19 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
         memset_thread[i].numpages = (i == (memset_num_threads - 1)) ?
                                     numpages : numpages_per_thread;
         memset_thread[i].hpagesize = hpagesize;
-        qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
-                           do_touch_pages, &memset_thread[i],
-                           QEMU_THREAD_JOINABLE);
+        if (!qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
+                                do_touch_pages, &memset_thread[i],
+                                QEMU_THREAD_JOINABLE, &local_err)) {
+            error_reportf_err(local_err, "failed to create do_touch_pages: ");
+            memset_thread_failed = true;
+            started_thread = i;
+            goto out;
+        }
         addr += size_per_thread;
         numpages -= numpages_per_thread;
     }
-    for (i = 0; i < memset_num_threads; i++) {
+out:
+    for (i = 0; i < started_thread; i++) {
         qemu_thread_join(&memset_thread[i].pgthread);
     }
     g_free(memset_thread);
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index 289af4fab5..f2246dac02 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -15,6 +15,7 @@
 #include "qemu/atomic.h"
 #include "qemu/notify.h"
 #include "qemu-thread-common.h"
+#include "qapi/error.h"
 
 static bool name_threads;
 
@@ -504,9 +505,9 @@ static void *qemu_thread_start(void *args)
     return start_routine(arg);
 }
 
-void qemu_thread_create(QemuThread *thread, const char *name,
-                       void *(*start_routine)(void*),
-                       void *arg, int mode)
+bool qemu_thread_create(QemuThread *thread, const char *name,
+                        void *(*start_routine)(void *),
+                        void *arg, int mode, Error **errp)
 {
     sigset_t set, oldset;
     int err;
@@ -515,7 +516,9 @@ void qemu_thread_create(QemuThread *thread, const char *name,
 
     err = pthread_attr_init(&attr);
     if (err) {
-        error_exit(err, __func__);
+        error_setg_errno(errp, -err, "pthread_attr_init failed: %s",
+                         strerror(err));
+        return false;
     }
 
     if (mode == QEMU_THREAD_DETACHED) {
@@ -530,16 +533,21 @@ void qemu_thread_create(QemuThread *thread, const char *name,
     qemu_thread_args->name = g_strdup(name);
     qemu_thread_args->start_routine = start_routine;
     qemu_thread_args->arg = arg;
-
     err = pthread_create(&thread->thread, &attr,
                          qemu_thread_start, qemu_thread_args);
-
-    if (err)
-        error_exit(err, __func__);
+    if (err) {
+        error_setg_errno(errp, -err, "pthread_create failed: %s",
+                         strerror(err));
+        pthread_attr_destroy(&attr);
+        g_free(qemu_thread_args->name);
+        g_free(qemu_thread_args);
+        return false;
+    }
 
     pthread_sigmask(SIG_SETMASK, &oldset, NULL);
 
     pthread_attr_destroy(&attr);
+    return true;
 }
 
 void qemu_thread_get_self(QemuThread *thread)
diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
index 1a27e1cf6f..ca4d5329e3 100644
--- a/util/qemu-thread-win32.c
+++ b/util/qemu-thread-win32.c
@@ -20,6 +20,7 @@
 #include "qemu/thread.h"
 #include "qemu/notify.h"
 #include "qemu-thread-common.h"
+#include "qapi/error.h"
 #include <process.h>
 
 static bool name_threads;
@@ -388,9 +389,9 @@ void *qemu_thread_join(QemuThread *thread)
     return ret;
 }
 
-void qemu_thread_create(QemuThread *thread, const char *name,
-                       void *(*start_routine)(void *),
-                       void *arg, int mode)
+bool qemu_thread_create(QemuThread *thread, const char *name,
+                        void *(*start_routine)(void *),
+                        void *arg, int mode, Error **errp)
 {
     HANDLE hThread;
     struct QemuThreadData *data;
@@ -409,10 +410,17 @@ void qemu_thread_create(QemuThread *thread, const char *name,
     hThread = (HANDLE) _beginthreadex(NULL, 0, win32_start_routine,
                                       data, 0, &thread->tid);
     if (!hThread) {
-        error_exit(GetLastError(), __func__);
+        if (data->mode != QEMU_THREAD_DETACHED) {
+            DeleteCriticalSection(&data->cs);
+        }
+        error_setg_errno(errp, errno,
+                         "failed to create win32_start_routine");
+        g_free(data);
+        return false;
     }
     CloseHandle(hThread);
     thread->data = data;
+    return true;
 }
 
 void qemu_thread_get_self(QemuThread *thread)
diff --git a/util/rcu.c b/util/rcu.c
index 5676c22bd1..145dcdb0c6 100644
--- a/util/rcu.c
+++ b/util/rcu.c
@@ -32,6 +32,7 @@
 #include "qemu/atomic.h"
 #include "qemu/thread.h"
 #include "qemu/main-loop.h"
+#include "qapi/error.h"
 #if defined(CONFIG_MALLOC_TRIM)
 #include <malloc.h>
 #endif
@@ -325,7 +326,7 @@ static void rcu_init_complete(void)
      * must have been quiescent even after forking, just recreate it.
      */
     qemu_thread_create(&thread, "call_rcu", call_rcu_thread,
-                       NULL, QEMU_THREAD_DETACHED);
+                       NULL, QEMU_THREAD_DETACHED, &error_abort);
 
     rcu_register_thread();
 }
diff --git a/util/thread-pool.c b/util/thread-pool.c
index 610646d131..ad0f980783 100644
--- a/util/thread-pool.c
+++ b/util/thread-pool.c
@@ -22,6 +22,7 @@
 #include "trace.h"
 #include "block/thread-pool.h"
 #include "qemu/main-loop.h"
+#include "qapi/error.h"
 
 static void do_spawn_thread(ThreadPool *pool);
 
@@ -132,7 +133,8 @@ static void do_spawn_thread(ThreadPool *pool)
     pool->new_threads--;
     pool->pending_threads++;
 
-    qemu_thread_create(&t, "worker", worker_thread, pool, QEMU_THREAD_DETACHED);
+    qemu_thread_create(&t, "worker", worker_thread, pool,
+                       QEMU_THREAD_DETACHED, &error_abort);
 }
 
 static void spawn_thread_bh_fn(void *opaque)
-- 
2.13.7

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v6 5/7] migration: fix the multifd code when receiving less channels
  2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 5/7] migration: fix the multifd code when receiving less channels Fei Li
@ 2018-10-30  6:05   ` Peter Xu
  2018-10-30 10:05     ` Fei Li
  2018-10-31 13:50   ` Fei Li
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Xu @ 2018-10-30  6:05 UTC (permalink / raw)
  To: Fei Li; +Cc: qemu-devel, armbru, dgilbert, famz, quintela

On Mon, Oct 29, 2018 at 08:58:16PM +0800, Fei Li wrote:
> In our current code, when multifd is used during migration, if there
> is an error before the destination receives all new channels, the
> source keeps running, however the destination does not exit but keeps
> waiting until the source is killed deliberately.
> 
> Fix this by simply killing the destination when it fails to receive
> packet via some channel.
> 
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Peter Xu <peterx@redhat.com>
> Signed-off-by: Fei Li <fli@suse.com>
> ---
>  migration/channel.c   |  7 ++++++-
>  migration/migration.c |  9 +++++++--
>  migration/migration.h |  2 +-
>  migration/ram.c       | 17 ++++++++++++++---
>  migration/ram.h       |  2 +-
>  5 files changed, 29 insertions(+), 8 deletions(-)
> 
> diff --git a/migration/channel.c b/migration/channel.c
> index 33e0e9b82f..572be4245a 100644
> --- a/migration/channel.c
> +++ b/migration/channel.c
> @@ -44,7 +44,12 @@ void migration_channel_process_incoming(QIOChannel *ioc)
>              error_report_err(local_err);

[1]

>          }
>      } else {
> -        migration_ioc_process_incoming(ioc);
> +        Error *local_err = NULL;
> +        migration_ioc_process_incoming(ioc, &local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            exit(EXIT_FAILURE);

I would still suggest that you don't quit.  See TLS error at [1], it
only dumps the error.  IMHO users can quit easily for dst vm, I'll
just let them decide if they want.

Then you can merge the error path for both.

> +        }
>      }
>  }
>  
> diff --git a/migration/migration.c b/migration/migration.c
> index 8b36e7f184..87dfc7374f 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -541,7 +541,7 @@ void migration_fd_process_incoming(QEMUFile *f)
>      migration_incoming_process();
>  }
>  
> -void migration_ioc_process_incoming(QIOChannel *ioc)
> +void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
>  {
>      MigrationIncomingState *mis = migration_incoming_get_current();
>      bool start_migration;
> @@ -563,9 +563,14 @@ void migration_ioc_process_incoming(QIOChannel *ioc)
>           */
>          start_migration = !migrate_use_multifd();
>      } else {
> +        Error *local_err = NULL;
>          /* Multiple connections */
>          assert(migrate_use_multifd());
> -        start_migration = multifd_recv_new_channel(ioc);
> +        start_migration = multifd_recv_new_channel(ioc, &local_err);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            return;
> +        }
>      }
>  
>      if (start_migration) {
> diff --git a/migration/migration.h b/migration/migration.h
> index f7813f8261..7df4d426d0 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -229,7 +229,7 @@ struct MigrationState
>  void migrate_set_state(int *state, int old_state, int new_state);
>  
>  void migration_fd_process_incoming(QEMUFile *f);
> -void migration_ioc_process_incoming(QIOChannel *ioc);
> +void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
>  void migration_incoming_process(void);
>  
>  bool  migration_has_all_channels(void);
> diff --git a/migration/ram.c b/migration/ram.c
> index 4db3b3e8f4..8f03afe228 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1072,6 +1072,7 @@ out:
>  static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>  {
>      MultiFDSendParams *p = opaque;
> +    MigrationState *s = migrate_get_current();

This seems to be the source part, then I'll suggest you split the
patch and keep this patch only touches the dest vm path.

>      QIOChannel *sioc = QIO_CHANNEL(qio_task_get_source(task));
>      Error *local_err = NULL;
>  
> @@ -1080,6 +1081,7 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>      }
>  
>      if (qio_task_propagate_error(task, &local_err)) {
> +        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
>          if (multifd_save_cleanup(&local_err) != 0) {
>              migrate_set_error(migrate_get_current(), local_err);
>          }
> @@ -1337,16 +1339,20 @@ bool multifd_recv_all_channels_created(void)
>  }
>  
>  /* Return true if multifd is ready for the migration, otherwise false */
> -bool multifd_recv_new_channel(QIOChannel *ioc)
> +bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
>  {
> +    MigrationIncomingState *mis = migration_incoming_get_current();
>      MultiFDRecvParams *p;
>      Error *local_err = NULL;
>      int id;
>  
>      id = multifd_recv_initial_packet(ioc, &local_err);
>      if (id < 0) {
> +        error_propagate_prepend(errp, local_err,
> +                        "failed to receive packet via multifd channel %x: ",
> +                        multifd_recv_state->count);
>          multifd_recv_terminate_threads(local_err, false);
> -        return false;
> +        goto fail;
>      }
>  
>      p = &multifd_recv_state->params[id];
> @@ -1354,7 +1360,8 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>          error_setg(&local_err, "multifd: received id '%d' already setup'",
>                     id);
>          multifd_recv_terminate_threads(local_err, true);
> -        return false;
> +        error_propagate(errp, local_err);
> +        goto fail;
>      }
>      p->c = ioc;
>      object_ref(OBJECT(ioc));
> @@ -1366,6 +1373,10 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>                         QEMU_THREAD_JOINABLE);
>      atomic_inc(&multifd_recv_state->count);
>      return multifd_recv_state->count == migrate_multifd_channels();
> +fail:
> +    qemu_fclose(mis->from_src_file);
> +    mis->from_src_file = NULL;
> +    return false;

Do we need this?

I'd suggest to put all cleanups into a single function.  For dest vm
I say it's process_incoming_migration_bh.

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v6 5/7] migration: fix the multifd code when receiving less channels
  2018-10-30  6:05   ` Peter Xu
@ 2018-10-30 10:05     ` Fei Li
  2018-10-30 22:18       ` Peter Xu
  0 siblings, 1 reply; 17+ messages in thread
From: Fei Li @ 2018-10-30 10:05 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, armbru, dgilbert, famz, quintela



On 10/30/2018 02:05 PM, Peter Xu wrote:
> On Mon, Oct 29, 2018 at 08:58:16PM +0800, Fei Li wrote:
>> In our current code, when multifd is used during migration, if there
>> is an error before the destination receives all new channels, the
>> source keeps running, however the destination does not exit but keeps
>> waiting until the source is killed deliberately.
>>
>> Fix this by simply killing the destination when it fails to receive
>> packet via some channel.
>>
>> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> Cc: Peter Xu <peterx@redhat.com>
>> Signed-off-by: Fei Li <fli@suse.com>
>> ---
>>   migration/channel.c   |  7 ++++++-
>>   migration/migration.c |  9 +++++++--
>>   migration/migration.h |  2 +-
>>   migration/ram.c       | 17 ++++++++++++++---
>>   migration/ram.h       |  2 +-
>>   5 files changed, 29 insertions(+), 8 deletions(-)
>>
>> diff --git a/migration/channel.c b/migration/channel.c
>> index 33e0e9b82f..572be4245a 100644
>> --- a/migration/channel.c
>> +++ b/migration/channel.c
>> @@ -44,7 +44,12 @@ void migration_channel_process_incoming(QIOChannel *ioc)
>>               error_report_err(local_err);
> [1]
>
>>           }
>>       } else {
>> -        migration_ioc_process_incoming(ioc);
>> +        Error *local_err = NULL;
>> +        migration_ioc_process_incoming(ioc, &local_err);
>> +        if (local_err) {
>> +            error_report_err(local_err);
>> +            exit(EXIT_FAILURE);
> I would still suggest that you don't quit.  See TLS error at [1], it
> only dumps the error.  IMHO users can quit easily for dst vm, I'll
> just let them decide if they want.
>
> Then you can merge the error path for both.
Ok, got it, thanks :)
>
>> +        }
>>       }
>>   }
>>   
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 8b36e7f184..87dfc7374f 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -541,7 +541,7 @@ void migration_fd_process_incoming(QEMUFile *f)
>>       migration_incoming_process();
>>   }
>>   
>> -void migration_ioc_process_incoming(QIOChannel *ioc)
>> +void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
>>   {
>>       MigrationIncomingState *mis = migration_incoming_get_current();
>>       bool start_migration;
>> @@ -563,9 +563,14 @@ void migration_ioc_process_incoming(QIOChannel *ioc)
>>            */
>>           start_migration = !migrate_use_multifd();
>>       } else {
>> +        Error *local_err = NULL;
>>           /* Multiple connections */
>>           assert(migrate_use_multifd());
>> -        start_migration = multifd_recv_new_channel(ioc);
>> +        start_migration = multifd_recv_new_channel(ioc, &local_err);
>> +        if (local_err) {
>> +            error_propagate(errp, local_err);
>> +            return;
>> +        }
>>       }
>>   
>>       if (start_migration) {
>> diff --git a/migration/migration.h b/migration/migration.h
>> index f7813f8261..7df4d426d0 100644
>> --- a/migration/migration.h
>> +++ b/migration/migration.h
>> @@ -229,7 +229,7 @@ struct MigrationState
>>   void migrate_set_state(int *state, int old_state, int new_state);
>>   
>>   void migration_fd_process_incoming(QEMUFile *f);
>> -void migration_ioc_process_incoming(QIOChannel *ioc);
>> +void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
>>   void migration_incoming_process(void);
>>   
>>   bool  migration_has_all_channels(void);
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 4db3b3e8f4..8f03afe228 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1072,6 +1072,7 @@ out:
>>   static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>>   {
>>       MultiFDSendParams *p = opaque;
>> +    MigrationState *s = migrate_get_current();
> This seems to be the source part, then I'll suggest you split the
> patch and keep this patch only touches the dest vm path.
ok
>
>>       QIOChannel *sioc = QIO_CHANNEL(qio_task_get_source(task));
>>       Error *local_err = NULL;
>>   
>> @@ -1080,6 +1081,7 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>>       }
>>   
>>       if (qio_task_propagate_error(task, &local_err)) {
>> +        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
>>           if (multifd_save_cleanup(&local_err) != 0) {
>>               migrate_set_error(migrate_get_current(), local_err);
>>           }
>> @@ -1337,16 +1339,20 @@ bool multifd_recv_all_channels_created(void)
>>   }
>>   
>>   /* Return true if multifd is ready for the migration, otherwise false */
>> -bool multifd_recv_new_channel(QIOChannel *ioc)
>> +bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
>>   {
>> +    MigrationIncomingState *mis = migration_incoming_get_current();
>>       MultiFDRecvParams *p;
>>       Error *local_err = NULL;
>>       int id;
>>   
>>       id = multifd_recv_initial_packet(ioc, &local_err);
>>       if (id < 0) {
>> +        error_propagate_prepend(errp, local_err,
>> +                        "failed to receive packet via multifd channel %x: ",
>> +                        multifd_recv_state->count);
>>           multifd_recv_terminate_threads(local_err, false);
>> -        return false;
>> +        goto fail;
>>       }
>>   
>>       p = &multifd_recv_state->params[id];
>> @@ -1354,7 +1360,8 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>>           error_setg(&local_err, "multifd: received id '%d' already setup'",
>>                      id);
>>           multifd_recv_terminate_threads(local_err, true);
>> -        return false;
>> +        error_propagate(errp, local_err);
>> +        goto fail;
>>       }
>>       p->c = ioc;
>>       object_ref(OBJECT(ioc));
>> @@ -1366,6 +1373,10 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>>                          QEMU_THREAD_JOINABLE);
>>       atomic_inc(&multifd_recv_state->count);
>>       return multifd_recv_state->count == migrate_multifd_channels();
>> +fail:
>> +    qemu_fclose(mis->from_src_file);
>> +    mis->from_src_file = NULL;
>> +    return false;
> Do we need this?
>
> I'd suggest to put all cleanups into a single function.  For dest vm
> I say it's process_incoming_migration_bh.
>
> Regards,
>
Not sure whether I understand correctly, if multifd_recv_new_channel() 
fails,
that means migration_incoming_process() will not be called, then
process_incoming_migration_co() and process_incoming_migration_bh()
will not be called either. In that way, there is no cleanup.

Have a nice day, thanks
Fei

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v6 6/7] migration: fix some error handling
  2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 6/7] migration: fix some error handling Fei Li
@ 2018-10-30 19:49   ` Dr. David Alan Gilbert
  2018-10-31 11:25     ` Fei Li
  0 siblings, 1 reply; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2018-10-30 19:49 UTC (permalink / raw)
  To: Fei Li; +Cc: qemu-devel, armbru, famz, peterx, quintela

* Fei Li (fli@suse.com) wrote:
> Add error handling for qemu_ram_foreach_migratable_block() when
> it fails.
> 
> Always call migrate_set_error() to set the error state without relying
> on whether multifd_save_cleanup() succeeds. As the passed &local_err
> is never used in multifd_save_cleanup(), remove it.
> 
> Signed-off-by: Fei Li <fli@suse.com>
> ---
>  migration/migration.c    | 5 +----
>  migration/postcopy-ram.c | 3 +++
>  migration/ram.c          | 7 +++----
>  migration/ram.h          | 2 +-
>  4 files changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 87dfc7374f..3b8b7ab4f9 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1377,7 +1377,6 @@ static void migrate_fd_cleanup(void *opaque)
>      qemu_savevm_state_cleanup();
>  
>      if (s->to_dst_file) {
> -        Error *local_err = NULL;
>          QEMUFile *tmp;
>  
>          trace_migrate_fd_cleanup();
> @@ -1388,9 +1387,7 @@ static void migrate_fd_cleanup(void *opaque)
>          }
>          qemu_mutex_lock_iothread();
>  
> -        if (multifd_save_cleanup(&local_err) != 0) {
> -            error_report_err(local_err);
> -        }
> +        multifd_save_cleanup();
>          qemu_mutex_lock(&s->qemu_file_lock);
>          tmp = s->to_dst_file;
>          s->to_dst_file = NULL;
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index e5c02a32c5..4ca2bc02b3 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -1117,6 +1117,9 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
>  
>      /* Mark so that we get notified of accesses to unwritten areas */
>      if (qemu_ram_foreach_migratable_block(ram_block_enable_notify, mis)) {
> +        error_report("ram_block_enable_notify failed");
> +        close(mis->userfault_event_fd);
> +        close(mis->userfault_fd);

I don't think these close() calls are safe.  This code is just after
starting the fault thread, and the fault thread has a poll() call on
these fd's, so we can't close them until we've instructed that thread
to exit.

We should fall out through postcopy_ram_incoming_cleanup, and because
the thread exists it should do a notify to the thread, a join and then
only later do the close calls.

Dave

>          return -1;
>      }
>  
> diff --git a/migration/ram.c b/migration/ram.c
> index 8f03afe228..57cb1bee3d 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -922,7 +922,7 @@ static void multifd_send_terminate_threads(Error *err)
>      }
>  }
>  
> -int multifd_save_cleanup(Error **errp)
> +int multifd_save_cleanup(void)
>  {
>      int i;
>      int ret = 0;
> @@ -1082,9 +1082,8 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>  
>      if (qio_task_propagate_error(task, &local_err)) {
>          migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
> -        if (multifd_save_cleanup(&local_err) != 0) {
> -            migrate_set_error(migrate_get_current(), local_err);
> -        }
> +        multifd_save_cleanup();
> +        migrate_set_error(migrate_get_current(), local_err);
>      } else {
>          p->c = QIO_CHANNEL(sioc);
>          qio_channel_set_delay(p->c, false);
> diff --git a/migration/ram.h b/migration/ram.h
> index 046d3074be..0d1215209e 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -43,7 +43,7 @@ uint64_t ram_bytes_remaining(void);
>  uint64_t ram_bytes_total(void);
>  
>  int multifd_save_setup(void);
> -int multifd_save_cleanup(Error **errp);
> +int multifd_save_cleanup(void);
>  int multifd_load_setup(void);
>  int multifd_load_cleanup(Error **errp);
>  bool multifd_recv_all_channels_created(void);
> -- 
> 2.13.7
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v6 5/7] migration: fix the multifd code when receiving less channels
  2018-10-30 10:05     ` Fei Li
@ 2018-10-30 22:18       ` Peter Xu
  2018-10-31 12:26         ` Fei Li
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Xu @ 2018-10-30 22:18 UTC (permalink / raw)
  To: Fei Li; +Cc: qemu-devel, armbru, dgilbert, famz, quintela

On Tue, Oct 30, 2018 at 06:05:18PM +0800, Fei Li wrote:

[...]

> > > @@ -1366,6 +1373,10 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
> > >                          QEMU_THREAD_JOINABLE);
> > >       atomic_inc(&multifd_recv_state->count);
> > >       return multifd_recv_state->count == migrate_multifd_channels();
> > > +fail:
> > > +    qemu_fclose(mis->from_src_file);
> > > +    mis->from_src_file = NULL;
> > > +    return false;
> > Do we need this?
> > 
> > I'd suggest to put all cleanups into a single function.  For dest vm
> > I say it's process_incoming_migration_bh.
> > 
> > Regards,
> > 
> Not sure whether I understand correctly, if multifd_recv_new_channel()
> fails,
> that means migration_incoming_process() will not be called, then
> process_incoming_migration_co() and process_incoming_migration_bh()
> will not be called either. In that way, there is no cleanup.

Sorry the funtion name I wanted to paste is something like
migration_incoming_state_destroy()...  Anyway I still don't feel that
right to close the mis->from_src_file in a multifd special path.

For now, I'll either ignore the cleanup part (AFAIU the TLS failure
will also ignore it when migration_tls_channel_process_incoming fails)
and just print the extra error message, or you can also look into how
to cleanup the dest vm in a better way.  That could be someting like
calling migration_incoming_state_destroy() somewhere in
migration_channel_process_incoming() when failure happens but I'm not
sure.

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v6 6/7] migration: fix some error handling
  2018-10-30 19:49   ` Dr. David Alan Gilbert
@ 2018-10-31 11:25     ` Fei Li
  2018-10-31 16:30       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 17+ messages in thread
From: Fei Li @ 2018-10-31 11:25 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel, armbru, famz, peterx, quintela



On 10/31/2018 03:49 AM, Dr. David Alan Gilbert wrote:
> * Fei Li (fli@suse.com) wrote:
>> Add error handling for qemu_ram_foreach_migratable_block() when
>> it fails.
>>
>> Always call migrate_set_error() to set the error state without relying
>> on whether multifd_save_cleanup() succeeds. As the passed &local_err
>> is never used in multifd_save_cleanup(), remove it.
>>
>> Signed-off-by: Fei Li <fli@suse.com>
>> ---
>>   migration/migration.c    | 5 +----
>>   migration/postcopy-ram.c | 3 +++
>>   migration/ram.c          | 7 +++----
>>   migration/ram.h          | 2 +-
>>   4 files changed, 8 insertions(+), 9 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 87dfc7374f..3b8b7ab4f9 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1377,7 +1377,6 @@ static void migrate_fd_cleanup(void *opaque)
>>       qemu_savevm_state_cleanup();
>>   
>>       if (s->to_dst_file) {
>> -        Error *local_err = NULL;
>>           QEMUFile *tmp;
>>   
>>           trace_migrate_fd_cleanup();
>> @@ -1388,9 +1387,7 @@ static void migrate_fd_cleanup(void *opaque)
>>           }
>>           qemu_mutex_lock_iothread();
>>   
>> -        if (multifd_save_cleanup(&local_err) != 0) {
>> -            error_report_err(local_err);
>> -        }
>> +        multifd_save_cleanup();
>>           qemu_mutex_lock(&s->qemu_file_lock);
>>           tmp = s->to_dst_file;
>>           s->to_dst_file = NULL;
>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>> index e5c02a32c5..4ca2bc02b3 100644
>> --- a/migration/postcopy-ram.c
>> +++ b/migration/postcopy-ram.c
>> @@ -1117,6 +1117,9 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
>>   
>>       /* Mark so that we get notified of accesses to unwritten areas */
>>       if (qemu_ram_foreach_migratable_block(ram_block_enable_notify, mis)) {
>> +        error_report("ram_block_enable_notify failed");
>> +        close(mis->userfault_event_fd);
>> +        close(mis->userfault_fd);
> I don't think these close() calls are safe.  This code is just after
> starting the fault thread, and the fault thread has a poll() call on
> these fd's, so we can't close them until we've instructed that thread
> to exit.
>
> We should fall out through postcopy_ram_incoming_cleanup, and because
> the thread exists it should do a notify to the thread, a join and then
> only later do the close calls.
>
> Dave
I see the postcopy_ram_incoming_cleanup() already include the
notify & join the fault thread & close these two fds and other more cleanup,
thus that means directly replace the above two close() with
postcopy_ram_incoming_cleanup() is ok, right? :)

BTW, does the parameter "&local_err" for multifd_save_cleanup(&local_err) is
reserved for some reason? As I notice no code is using this error parameter,
and if it is reserved for nothing I'd like to delete it just as the 
second paragraph
of the commit message said. :)

Have a nice day, thanks
Fei
>
>>           return -1;
>>       }
>>   
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 8f03afe228..57cb1bee3d 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -922,7 +922,7 @@ static void multifd_send_terminate_threads(Error *err)
>>       }
>>   }
>>   
>> -int multifd_save_cleanup(Error **errp)
>> +int multifd_save_cleanup(void)
>>   {
>>       int i;
>>       int ret = 0;
>> @@ -1082,9 +1082,8 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>>   
>>       if (qio_task_propagate_error(task, &local_err)) {
>>           migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
>> -        if (multifd_save_cleanup(&local_err) != 0) {
>> -            migrate_set_error(migrate_get_current(), local_err);
>> -        }
>> +        multifd_save_cleanup();
>> +        migrate_set_error(migrate_get_current(), local_err);
>>       } else {
>>           p->c = QIO_CHANNEL(sioc);
>>           qio_channel_set_delay(p->c, false);
>> diff --git a/migration/ram.h b/migration/ram.h
>> index 046d3074be..0d1215209e 100644
>> --- a/migration/ram.h
>> +++ b/migration/ram.h
>> @@ -43,7 +43,7 @@ uint64_t ram_bytes_remaining(void);
>>   uint64_t ram_bytes_total(void);
>>   
>>   int multifd_save_setup(void);
>> -int multifd_save_cleanup(Error **errp);
>> +int multifd_save_cleanup(void);
>>   int multifd_load_setup(void);
>>   int multifd_load_cleanup(Error **errp);
>>   bool multifd_recv_all_channels_created(void);
>> -- 
>> 2.13.7
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v6 5/7] migration: fix the multifd code when receiving less channels
  2018-10-30 22:18       ` Peter Xu
@ 2018-10-31 12:26         ` Fei Li
  0 siblings, 0 replies; 17+ messages in thread
From: Fei Li @ 2018-10-31 12:26 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, armbru, dgilbert, famz, quintela



On 10/31/2018 06:18 AM, Peter Xu wrote:
> On Tue, Oct 30, 2018 at 06:05:18PM +0800, Fei Li wrote:
>
> [...]
>
>>>> @@ -1366,6 +1373,10 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>>>>                           QEMU_THREAD_JOINABLE);
>>>>        atomic_inc(&multifd_recv_state->count);
>>>>        return multifd_recv_state->count == migrate_multifd_channels();
>>>> +fail:
>>>> +    qemu_fclose(mis->from_src_file);
>>>> +    mis->from_src_file = NULL;
>>>> +    return false;
>>> Do we need this?
>>>
>>> I'd suggest to put all cleanups into a single function.  For dest vm
>>> I say it's process_incoming_migration_bh.
>>>
>>> Regards,
>>>
>> Not sure whether I understand correctly, if multifd_recv_new_channel()
>> fails,
>> that means migration_incoming_process() will not be called, then
>> process_incoming_migration_co() and process_incoming_migration_bh()
>> will not be called either. In that way, there is no cleanup.
> Sorry the funtion name I wanted to paste is something like
> migration_incoming_state_destroy()...  Anyway I still don't feel that
> right to close the mis->from_src_file in a multifd special path.
>
> For now, I'll either ignore the cleanup part (AFAIU the TLS failure
> will also ignore it when migration_tls_channel_process_incoming fails)
> and just print the extra error message,
I will adopt this option, thanks for the suggestion :)

Have a nice day, thanks
Fei
>   or you can also look into how
> to cleanup the dest vm in a better way.  That could be someting like
> calling migration_incoming_state_destroy() somewhere in
> migration_channel_process_incoming() when failure happens but I'm not
> sure.
>
> Regards,
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v6 5/7] migration: fix the multifd code when receiving less channels
  2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 5/7] migration: fix the multifd code when receiving less channels Fei Li
  2018-10-30  6:05   ` Peter Xu
@ 2018-10-31 13:50   ` Fei Li
  1 sibling, 0 replies; 17+ messages in thread
From: Fei Li @ 2018-10-31 13:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: quintela, famz, armbru, peterx, dgilbert

Hi all,

I create a new thread to inquiry one live migration issue when using 
multifd :)
I am not so sure with the rule that when and how to use the multifd is 
correct,
so I'd like to confirm. This is because when I use the current upstream qemu
code and run into a failed case: "Migration status: failed (Unable to 
write to
socket: Connection reset by peer)".

The detailed is as follows and the "failed" situation can not reproduce 
100%.
But as far as I tested, if I do the live migration using multifd* just 
after the
guest started for less than one minute*, I almost can reproduce this for 
100%.

My steps are:
1. start the vm in the src side;
2. start the -incoming in the dst side;
3. after the vm started for a little while (After I open a file inside 
the vm),
     I begin the live migration, steps are:
- on src: migrate_set_capability x-multifd on
- on src: migrate_set_parameter x-multifd-channels 4
- on dst: migrate_set_capability x-multifd on
- on dst: migrate_set_parameter x-multifd-channels 4
- on src: migrate -d tcp:192.168.120.5:4444

Errors are:
[src]
linux-50ts:/mnt/live-migration # ./sle12-source.sh
QEMU 3.0.50 monitor - type 'help' for more information
(qemu) Running QEMU with SDL 1.2 is deprecated, and will be removed
in a future release. Please switch to SDL 2.0 instead
  migrate_set_capability x-multifd on
(qemu)  migrate_set_parameter x-multifd-channels 4
(qemu) migrate -d tcp:192.168.120.5:4444
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off 
zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off 
release-ram: off block: off return-path: off pause-before-switchover: 
off x-multifd: on dirty-bitmaps: off postcopy-blocktime: off 
late-block-activate: off
Migration status: failed (Unable to write to socket: Connection reset by 
peer)
total time: 0 milliseconds

[dst]
linux-p6v6:/mnt/live-migration # ./sle12-dest.sh
QEMU 3.0.50 monitor - type 'help' for more information
(qemu) migrate_set_capability x-multifd on
(qemu)  migrate_set_parameter x-multifd-channels 4
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on


Hope this does not bother you too much. ;)
Have a nice day, thanks again
Fei

On 10/29/2018 08:58 PM, Fei Li wrote:
> In our current code, when multifd is used during migration, if there
> is an error before the destination receives all new channels, the
> source keeps running, however the destination does not exit but keeps
> waiting until the source is killed deliberately.
>
> Fix this by simply killing the destination when it fails to receive
> packet via some channel.
>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Peter Xu <peterx@redhat.com>
> Signed-off-by: Fei Li <fli@suse.com>
> ---
>   migration/channel.c   |  7 ++++++-
>   migration/migration.c |  9 +++++++--
>   migration/migration.h |  2 +-
>   migration/ram.c       | 17 ++++++++++++++---
>   migration/ram.h       |  2 +-
>   5 files changed, 29 insertions(+), 8 deletions(-)
>
> diff --git a/migration/channel.c b/migration/channel.c
> index 33e0e9b82f..572be4245a 100644
> --- a/migration/channel.c
> +++ b/migration/channel.c
> @@ -44,7 +44,12 @@ void migration_channel_process_incoming(QIOChannel *ioc)
>               error_report_err(local_err);
>           }
>       } else {
> -        migration_ioc_process_incoming(ioc);
> +        Error *local_err = NULL;
> +        migration_ioc_process_incoming(ioc, &local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            exit(EXIT_FAILURE);
> +        }
>       }
>   }
>   
> diff --git a/migration/migration.c b/migration/migration.c
> index 8b36e7f184..87dfc7374f 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -541,7 +541,7 @@ void migration_fd_process_incoming(QEMUFile *f)
>       migration_incoming_process();
>   }
>   
> -void migration_ioc_process_incoming(QIOChannel *ioc)
> +void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
>   {
>       MigrationIncomingState *mis = migration_incoming_get_current();
>       bool start_migration;
> @@ -563,9 +563,14 @@ void migration_ioc_process_incoming(QIOChannel *ioc)
>            */
>           start_migration = !migrate_use_multifd();
>       } else {
> +        Error *local_err = NULL;
>           /* Multiple connections */
>           assert(migrate_use_multifd());
> -        start_migration = multifd_recv_new_channel(ioc);
> +        start_migration = multifd_recv_new_channel(ioc, &local_err);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            return;
> +        }
>       }
>   
>       if (start_migration) {
> diff --git a/migration/migration.h b/migration/migration.h
> index f7813f8261..7df4d426d0 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -229,7 +229,7 @@ struct MigrationState
>   void migrate_set_state(int *state, int old_state, int new_state);
>   
>   void migration_fd_process_incoming(QEMUFile *f);
> -void migration_ioc_process_incoming(QIOChannel *ioc);
> +void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
>   void migration_incoming_process(void);
>   
>   bool  migration_has_all_channels(void);
> diff --git a/migration/ram.c b/migration/ram.c
> index 4db3b3e8f4..8f03afe228 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1072,6 +1072,7 @@ out:
>   static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>   {
>       MultiFDSendParams *p = opaque;
> +    MigrationState *s = migrate_get_current();
>       QIOChannel *sioc = QIO_CHANNEL(qio_task_get_source(task));
>       Error *local_err = NULL;
>   
> @@ -1080,6 +1081,7 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>       }
>   
>       if (qio_task_propagate_error(task, &local_err)) {
> +        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
>           if (multifd_save_cleanup(&local_err) != 0) {
>               migrate_set_error(migrate_get_current(), local_err);
>           }
> @@ -1337,16 +1339,20 @@ bool multifd_recv_all_channels_created(void)
>   }
>   
>   /* Return true if multifd is ready for the migration, otherwise false */
> -bool multifd_recv_new_channel(QIOChannel *ioc)
> +bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
>   {
> +    MigrationIncomingState *mis = migration_incoming_get_current();
>       MultiFDRecvParams *p;
>       Error *local_err = NULL;
>       int id;
>   
>       id = multifd_recv_initial_packet(ioc, &local_err);
>       if (id < 0) {
> +        error_propagate_prepend(errp, local_err,
> +                        "failed to receive packet via multifd channel %x: ",
> +                        multifd_recv_state->count);
>           multifd_recv_terminate_threads(local_err, false);
> -        return false;
> +        goto fail;
>       }
>   
>       p = &multifd_recv_state->params[id];
> @@ -1354,7 +1360,8 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>           error_setg(&local_err, "multifd: received id '%d' already setup'",
>                      id);
>           multifd_recv_terminate_threads(local_err, true);
> -        return false;
> +        error_propagate(errp, local_err);
> +        goto fail;
>       }
>       p->c = ioc;
>       object_ref(OBJECT(ioc));
> @@ -1366,6 +1373,10 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>                          QEMU_THREAD_JOINABLE);
>       atomic_inc(&multifd_recv_state->count);
>       return multifd_recv_state->count == migrate_multifd_channels();
> +fail:
> +    qemu_fclose(mis->from_src_file);
> +    mis->from_src_file = NULL;
> +    return false;
>   }
>   
>   /**
> diff --git a/migration/ram.h b/migration/ram.h
> index 83ff1bc11a..046d3074be 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -47,7 +47,7 @@ int multifd_save_cleanup(Error **errp);
>   int multifd_load_setup(void);
>   int multifd_load_cleanup(Error **errp);
>   bool multifd_recv_all_channels_created(void);
> -bool multifd_recv_new_channel(QIOChannel *ioc);
> +bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
>   
>   uint64_t ram_pagesize_summary(void);
>   int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v6 6/7] migration: fix some error handling
  2018-10-31 11:25     ` Fei Li
@ 2018-10-31 16:30       ` Dr. David Alan Gilbert
  2018-11-01  5:20         ` Fei Li
  0 siblings, 1 reply; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2018-10-31 16:30 UTC (permalink / raw)
  To: Fei Li; +Cc: qemu-devel, armbru, famz, peterx, quintela

* Fei Li (fli@suse.com) wrote:
> 
> 
> On 10/31/2018 03:49 AM, Dr. David Alan Gilbert wrote:
> > * Fei Li (fli@suse.com) wrote:
> > > Add error handling for qemu_ram_foreach_migratable_block() when
> > > it fails.
> > > 
> > > Always call migrate_set_error() to set the error state without relying
> > > on whether multifd_save_cleanup() succeeds. As the passed &local_err
> > > is never used in multifd_save_cleanup(), remove it.
> > > 
> > > Signed-off-by: Fei Li <fli@suse.com>
> > > ---
> > >   migration/migration.c    | 5 +----
> > >   migration/postcopy-ram.c | 3 +++
> > >   migration/ram.c          | 7 +++----
> > >   migration/ram.h          | 2 +-
> > >   4 files changed, 8 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 87dfc7374f..3b8b7ab4f9 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -1377,7 +1377,6 @@ static void migrate_fd_cleanup(void *opaque)
> > >       qemu_savevm_state_cleanup();
> > >       if (s->to_dst_file) {
> > > -        Error *local_err = NULL;
> > >           QEMUFile *tmp;
> > >           trace_migrate_fd_cleanup();
> > > @@ -1388,9 +1387,7 @@ static void migrate_fd_cleanup(void *opaque)
> > >           }
> > >           qemu_mutex_lock_iothread();
> > > -        if (multifd_save_cleanup(&local_err) != 0) {
> > > -            error_report_err(local_err);
> > > -        }
> > > +        multifd_save_cleanup();
> > >           qemu_mutex_lock(&s->qemu_file_lock);
> > >           tmp = s->to_dst_file;
> > >           s->to_dst_file = NULL;
> > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > index e5c02a32c5..4ca2bc02b3 100644
> > > --- a/migration/postcopy-ram.c
> > > +++ b/migration/postcopy-ram.c
> > > @@ -1117,6 +1117,9 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
> > >       /* Mark so that we get notified of accesses to unwritten areas */
> > >       if (qemu_ram_foreach_migratable_block(ram_block_enable_notify, mis)) {
> > > +        error_report("ram_block_enable_notify failed");
> > > +        close(mis->userfault_event_fd);
> > > +        close(mis->userfault_fd);
> > I don't think these close() calls are safe.  This code is just after
> > starting the fault thread, and the fault thread has a poll() call on
> > these fd's, so we can't close them until we've instructed that thread
> > to exit.
> > 
> > We should fall out through postcopy_ram_incoming_cleanup, and because
> > the thread exists it should do a notify to the thread, a join and then
> > only later do the close calls.
> > 
> > Dave
> I see the postcopy_ram_incoming_cleanup() already include the
> notify & join the fault thread & close these two fds and other more cleanup,
> thus that means directly replace the above two close() with
> postcopy_ram_incoming_cleanup() is ok, right? :)

Yes; I think I'd do that in loadvm_postcopy_handle_listen perhaps.
It's a weird case, normally if it fails before 'listen' then we call
cleanup in process_incoming_migration_co,  but after listen we call it
at the bottom of the listen thread.  Here we've got a half-state where
we're trying and failing to enter listen.

> BTW, does the parameter "&local_err" for multifd_save_cleanup(&local_err) is
> reserved for some reason? As I notice no code is using this error parameter,
> and if it is reserved for nothing I'd like to delete it just as the second
> paragraph
> of the commit message said. :)

Yes, that's OK; it might be better as a separate patch;  I suspect maybe
in an earlier version it had an error case.

Dave

> Have a nice day, thanks
> Fei
> > 
> > >           return -1;
> > >       }
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index 8f03afe228..57cb1bee3d 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -922,7 +922,7 @@ static void multifd_send_terminate_threads(Error *err)
> > >       }
> > >   }
> > > -int multifd_save_cleanup(Error **errp)
> > > +int multifd_save_cleanup(void)
> > >   {
> > >       int i;
> > >       int ret = 0;
> > > @@ -1082,9 +1082,8 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> > >       if (qio_task_propagate_error(task, &local_err)) {
> > >           migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
> > > -        if (multifd_save_cleanup(&local_err) != 0) {
> > > -            migrate_set_error(migrate_get_current(), local_err);
> > > -        }
> > > +        multifd_save_cleanup();
> > > +        migrate_set_error(migrate_get_current(), local_err);
> > >       } else {
> > >           p->c = QIO_CHANNEL(sioc);
> > >           qio_channel_set_delay(p->c, false);
> > > diff --git a/migration/ram.h b/migration/ram.h
> > > index 046d3074be..0d1215209e 100644
> > > --- a/migration/ram.h
> > > +++ b/migration/ram.h
> > > @@ -43,7 +43,7 @@ uint64_t ram_bytes_remaining(void);
> > >   uint64_t ram_bytes_total(void);
> > >   int multifd_save_setup(void);
> > > -int multifd_save_cleanup(Error **errp);
> > > +int multifd_save_cleanup(void);
> > >   int multifd_load_setup(void);
> > >   int multifd_load_cleanup(Error **errp);
> > >   bool multifd_recv_all_channels_created(void);
> > > -- 
> > > 2.13.7
> > > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v6 6/7] migration: fix some error handling
  2018-10-31 16:30       ` Dr. David Alan Gilbert
@ 2018-11-01  5:20         ` Fei Li
  0 siblings, 0 replies; 17+ messages in thread
From: Fei Li @ 2018-11-01  5:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: qemu-devel, armbru, famz, peterx, quintela



On 11/01/2018 12:30 AM, Dr. David Alan Gilbert wrote:
> * Fei Li (fli@suse.com) wrote:
>>
>> On 10/31/2018 03:49 AM, Dr. David Alan Gilbert wrote:
>>> * Fei Li (fli@suse.com) wrote:
>>>> Add error handling for qemu_ram_foreach_migratable_block() when
>>>> it fails.
>>>>
>>>> Always call migrate_set_error() to set the error state without relying
>>>> on whether multifd_save_cleanup() succeeds. As the passed &local_err
>>>> is never used in multifd_save_cleanup(), remove it.
>>>>
>>>> Signed-off-by: Fei Li <fli@suse.com>
>>>> ---
>>>>    migration/migration.c    | 5 +----
>>>>    migration/postcopy-ram.c | 3 +++
>>>>    migration/ram.c          | 7 +++----
>>>>    migration/ram.h          | 2 +-
>>>>    4 files changed, 8 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>> index 87dfc7374f..3b8b7ab4f9 100644
>>>> --- a/migration/migration.c
>>>> +++ b/migration/migration.c
>>>> @@ -1377,7 +1377,6 @@ static void migrate_fd_cleanup(void *opaque)
>>>>        qemu_savevm_state_cleanup();
>>>>        if (s->to_dst_file) {
>>>> -        Error *local_err = NULL;
>>>>            QEMUFile *tmp;
>>>>            trace_migrate_fd_cleanup();
>>>> @@ -1388,9 +1387,7 @@ static void migrate_fd_cleanup(void *opaque)
>>>>            }
>>>>            qemu_mutex_lock_iothread();
>>>> -        if (multifd_save_cleanup(&local_err) != 0) {
>>>> -            error_report_err(local_err);
>>>> -        }
>>>> +        multifd_save_cleanup();
>>>>            qemu_mutex_lock(&s->qemu_file_lock);
>>>>            tmp = s->to_dst_file;
>>>>            s->to_dst_file = NULL;
>>>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>>>> index e5c02a32c5..4ca2bc02b3 100644
>>>> --- a/migration/postcopy-ram.c
>>>> +++ b/migration/postcopy-ram.c
>>>> @@ -1117,6 +1117,9 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
>>>>        /* Mark so that we get notified of accesses to unwritten areas */
>>>>        if (qemu_ram_foreach_migratable_block(ram_block_enable_notify, mis)) {
>>>> +        error_report("ram_block_enable_notify failed");
>>>> +        close(mis->userfault_event_fd);
>>>> +        close(mis->userfault_fd);
>>> I don't think these close() calls are safe.  This code is just after
>>> starting the fault thread, and the fault thread has a poll() call on
>>> these fd's, so we can't close them until we've instructed that thread
>>> to exit.
>>>
>>> We should fall out through postcopy_ram_incoming_cleanup, and because
>>> the thread exists it should do a notify to the thread, a join and then
>>> only later do the close calls.
>>>
>>> Dave
>> I see the postcopy_ram_incoming_cleanup() already include the
>> notify & join the fault thread & close these two fds and other more cleanup,
>> thus that means directly replace the above two close() with
>> postcopy_ram_incoming_cleanup() is ok, right? :)
> Yes; I think I'd do that in loadvm_postcopy_handle_listen perhaps.
> It's a weird case, normally if it fails before 'listen' then we call
> cleanup in process_incoming_migration_co,  but after listen we call it
> at the bottom of the listen thread.  Here we've got a half-state where
> we're trying and failing to enter listen.
Ok, I will put the cleanup() inside of loadvm_postcopy_handle_liste(), 
thanks
for the advise. :)
>
>> BTW, does the parameter "&local_err" for multifd_save_cleanup(&local_err) is
>> reserved for some reason? As I notice no code is using this error parameter,
>> and if it is reserved for nothing I'd like to delete it just as the second
>> paragraph
>> of the commit message said. :)
> Yes, that's OK; it might be better as a separate patch;  I suspect maybe
> in an earlier version it had an error case.
>
> Dave
Thanks for the suggestion, will separate into two patches in the next 
version.

Have a nice day
Fei
>
>> Have a nice day, thanks
>> Fei
>>>>            return -1;
>>>>        }
>>>> diff --git a/migration/ram.c b/migration/ram.c
>>>> index 8f03afe228..57cb1bee3d 100644
>>>> --- a/migration/ram.c
>>>> +++ b/migration/ram.c
>>>> @@ -922,7 +922,7 @@ static void multifd_send_terminate_threads(Error *err)
>>>>        }
>>>>    }
>>>> -int multifd_save_cleanup(Error **errp)
>>>> +int multifd_save_cleanup(void)
>>>>    {
>>>>        int i;
>>>>        int ret = 0;
>>>> @@ -1082,9 +1082,8 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>>>>        if (qio_task_propagate_error(task, &local_err)) {
>>>>            migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
>>>> -        if (multifd_save_cleanup(&local_err) != 0) {
>>>> -            migrate_set_error(migrate_get_current(), local_err);
>>>> -        }
>>>> +        multifd_save_cleanup();
>>>> +        migrate_set_error(migrate_get_current(), local_err);
>>>>        } else {
>>>>            p->c = QIO_CHANNEL(sioc);
>>>>            qio_channel_set_delay(p->c, false);
>>>> diff --git a/migration/ram.h b/migration/ram.h
>>>> index 046d3074be..0d1215209e 100644
>>>> --- a/migration/ram.h
>>>> +++ b/migration/ram.h
>>>> @@ -43,7 +43,7 @@ uint64_t ram_bytes_remaining(void);
>>>>    uint64_t ram_bytes_total(void);
>>>>    int multifd_save_setup(void);
>>>> -int multifd_save_cleanup(Error **errp);
>>>> +int multifd_save_cleanup(void);
>>>>    int multifd_load_setup(void);
>>>>    int multifd_load_cleanup(Error **errp);
>>>>    bool multifd_recv_all_channels_created(void);
>>>> -- 
>>>> 2.13.7
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2018-11-01  5:20 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-29 12:58 [Qemu-devel] [PATCH RFC v6 0/7] qemu_thread_create: propagate errors to callers to check Fei Li
2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 1/7] Fix segmentation fault when qemu_signal_init fails Fei Li
2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 2/7] qemu_init_vcpu: add a new Error parameter to propagate Fei Li
2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 3/7] qemu_thread_join: fix segmentation fault Fei Li
2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 4/7] migration: fix some segmentation faults when using multifd Fei Li
2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 5/7] migration: fix the multifd code when receiving less channels Fei Li
2018-10-30  6:05   ` Peter Xu
2018-10-30 10:05     ` Fei Li
2018-10-30 22:18       ` Peter Xu
2018-10-31 12:26         ` Fei Li
2018-10-31 13:50   ` Fei Li
2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 6/7] migration: fix some error handling Fei Li
2018-10-30 19:49   ` Dr. David Alan Gilbert
2018-10-31 11:25     ` Fei Li
2018-10-31 16:30       ` Dr. David Alan Gilbert
2018-11-01  5:20         ` Fei Li
2018-10-29 12:58 ` [Qemu-devel] [PATCH RFC v6 7/7] qemu_thread_create: propagate the error to callers to handle Fei Li

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.