* [Qemu-devel] [PATCH for-4.0 v8 0/7] qemu_thread_create: propagate errors to callers to check
@ 2018-12-11  9:50 Fei Li
  2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 1/7] Fix segmentation fault when qemu_signal_init fails Fei Li
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: Fei Li @ 2018-12-11  9:50 UTC
  To: qemu-devel

Hi,

This idea comes from BiteSizedTasks. This patch series implements
error checking for qemu_thread_create(): make qemu_thread_create()
return a flag to indicate whether it succeeded rather than failing
with an error, and make all callers check it.

The first and the last patches fix segmentation faults that occurred
during debugging.  The second patch paves the way for patch 6/7, which
would otherwise be too long.  Patch 6/7 modifies qemu_thread_create()
and makes it return a bool to all direct callers to indicate whether
it succeeded.  The middle three patches fix some migration issues.
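
As a rough illustration of the calling convention the series moves to,
here is a minimal sketch in plain C. It is not QEMU code: Err,
err_set(), spawn_fn and worker_start() are hypothetical stand-ins (for
Error, error_setg_errno() and the thread-creation call), and the spawn
callback is assumed to return 0 or an errno-style code, the way
pthread_create() does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object; not the real API. */
typedef struct Err {
    int code;        /* errno-style code */
    char msg[128];   /* "<what>: <strerror(code)>" */
} Err;

/* Store a new error in *errp; a NULL errp means the caller ignores errors. */
static void err_set(Err **errp, int code, const char *what)
{
    if (!errp) {
        return;
    }
    assert(*errp == NULL);          /* never overwrite a pending error */
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();
    }
    (*errp)->code = code;
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s: %s",
             what, strerror(code));
}

/* Assumed shape of a thread-spawning primitive: 0 on success, else a code. */
typedef int (*spawn_fn)(void *arg);

/* Report failure to the caller instead of terminating the process. */
static bool worker_start(spawn_fn spawn, void *arg, Err **errp)
{
    int rc = spawn(arg);

    if (rc != 0) {
        err_set(errp, rc, "failed to create thread");
        return false;
    }
    return true;
}

/* Two trivial spawn implementations used to exercise both paths. */
static int spawn_ok(void *arg)     { (void)arg; return 0; }
static int spawn_eagain(void *arg) { (void)arg; return EAGAIN; }
```

A caller would write "if (!worker_start(spawn_ok, NULL, &err)) { ... }"
and clean up on the false branch, which is the shape the call sites in
this series take.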

Actually, almost all patches have got a "Reviewed-by:" except patch 6/7.
Please help to review, thanks. :)


v8:
- Remove previous two patches trying to fix the multifd issue on the
  source side, as we are still waiting for maintainer's opinions.
- Use atomic_read to get multifd_recv_state->count in patch 3/7.
- Get three more "Reviewed-by:".

v7:
- Split the previous multifd-migration into two patches: the src and
  the dst. For the dst, only dump the error instead of quitting.
- Safely do the cleanup for postcopy_ram_enable_notify().
- Split the previous migration-error-handling patch into two patches.

v6:
- Add a new migration-multifd related patch. BTW, delete the previous
  vnc related patch as it has been upstreamed.
- Use error_setg_errno() to set the errno when qemu_thread_create()
  fails for both the Linux and Windows implementations.
- Optimize the first patch; less code is needed

v5:
- Remove `errno = err` in qemu_thread_create() for Linux, and change
  `return errno` to `return -1` in qemu_signal_init() to indicate
  the error in case qemu_thread_create() fails.
- Delete the v4-added qemu_cond/mutex_destroy() in iothread_complete()
  as the destroy() will be done by its callers' object_unref().

v4:
- Separate the migration compression patch from this series
- Add one more error handling patch related to migration
- Add more cleaning up code for touched functions

v3:
- Add two migration related patches to fix the segmentation fault
- Extract the segmentation fault fix from v2's last patch to be a
  separate patch
- Add cleaning up code for touched functions
- Update some error messages

v2:
- Pass errp directly instead of using a local_err & error_propagate
- Return a bool: false/true to indicate whether a function succeeds
- Merge v1's last two patches into one to avoid the compile error
- Fix one omitted error in patch1 and update some error messages

Fei Li (7):
  Fix segmentation fault when qemu_signal_init fails
  qemu_init_vcpu: add a new Error parameter to propagate
  migration: fix the multifd code when receiving less channels
  migration: remove unused &local_err parameter in multifd_save_cleanup
  migration: add more error handling for postcopy_ram_enable_notify
  qemu_thread_create: propagate the error to callers to handle
  qemu_thread_join: fix segmentation fault

 accel/tcg/user-exec-stub.c      |  3 +-
 cpus.c                          | 79 ++++++++++++++++++++++++++---------------
 dump.c                          |  6 ++--
 hw/misc/edu.c                   |  6 ++--
 hw/ppc/spapr_hcall.c            | 10 ++++--
 hw/rdma/rdma_backend.c          |  4 ++-
 hw/usb/ccid-card-emulated.c     | 16 ++++++---
 include/qemu/thread.h           |  4 +--
 include/qom/cpu.h               |  2 +-
 io/task.c                       |  3 +-
 iothread.c                      | 16 ++++++---
 migration/channel.c             | 11 +++---
 migration/migration.c           | 68 ++++++++++++++++++++++-------------
 migration/migration.h           |  2 +-
 migration/postcopy-ram.c        | 15 ++++++--
 migration/ram.c                 | 61 +++++++++++++++++++++----------
 migration/ram.h                 |  4 +--
 migration/savevm.c              | 12 +++++--
 target/alpha/cpu.c              |  4 ++-
 target/arm/cpu.c                |  4 ++-
 target/cris/cpu.c               |  4 ++-
 target/hppa/cpu.c               |  4 ++-
 target/i386/cpu.c               |  4 ++-
 target/lm32/cpu.c               |  4 ++-
 target/m68k/cpu.c               |  4 ++-
 target/microblaze/cpu.c         |  4 ++-
 target/mips/cpu.c               |  4 ++-
 target/moxie/cpu.c              |  4 ++-
 target/nios2/cpu.c              |  4 ++-
 target/openrisc/cpu.c           |  4 ++-
 target/ppc/translate_init.inc.c |  4 ++-
 target/riscv/cpu.c              |  4 ++-
 target/s390x/cpu.c              |  4 ++-
 target/sh4/cpu.c                |  4 ++-
 target/sparc/cpu.c              |  4 ++-
 target/tilegx/cpu.c             |  4 ++-
 target/tricore/cpu.c            |  4 ++-
 target/unicore32/cpu.c          |  4 ++-
 target/xtensa/cpu.c             |  4 ++-
 tests/atomic_add-bench.c        |  3 +-
 tests/iothread.c                |  2 +-
 tests/qht-bench.c               |  3 +-
 tests/rcutorture.c              |  3 +-
 tests/test-aio.c                |  2 +-
 tests/test-rcu-list.c           |  3 +-
 ui/vnc-jobs.c                   | 17 ++++++---
 ui/vnc-jobs.h                   |  2 +-
 ui/vnc.c                        |  4 ++-
 util/compatfd.c                 | 12 +++++--
 util/main-loop.c                |  8 ++---
 util/oslib-posix.c              | 17 ++++++---
 util/qemu-thread-posix.c        | 27 +++++++++-----
 util/qemu-thread-win32.c        | 18 +++++++---
 util/rcu.c                      |  3 +-
 util/thread-pool.c              |  4 ++-
 55 files changed, 369 insertions(+), 165 deletions(-)

-- 
2.13.7


* [Qemu-devel] [PATCH for-4.0 v8 1/7] Fix segmentation fault when qemu_signal_init fails
  2018-12-11  9:50 [Qemu-devel] [PATCH for-4.0 v8 0/7] qemu_thread_create: propagate errors to callers to check Fei Li
@ 2018-12-11  9:50 ` Fei Li
  2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 2/7] qemu_init_vcpu: add a new Error parameter to propagate Fei Li
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Fei Li @ 2018-12-11  9:50 UTC
  To: qemu-devel; +Cc: Paolo Bonzini

When qemu_signal_init() fails in qemu_init_main_loop(), we return
without setting an error.  Its callers then crash when they try to
report the error with error_report_err().

To avoid this segmentation fault, add a new Error parameter to
qemu_signal_init() so that the error propagates up the call chain to
the final caller.
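
The invariant this fix restores is "a non-zero return always comes with
an error object", so that a caller may dereference the error when
reporting it. A minimal sketch of that invariant follows, under loud
assumptions: Err, err_set(), fake_signalfd(), signal_init() and
main_loop_init() are hypothetical stand-ins, not the QEMU functions.
Saving errno into a local before doing anything else is a choice of
this sketch, not something the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object; not the real API. */
typedef struct Err {
    int code;
    char msg[128];
} Err;

static void err_set(Err **errp, int code, const char *what)
{
    if (!errp) {
        return;                     /* caller chose to ignore errors */
    }
    assert(*errp == NULL);
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();
    }
    (*errp)->code = code;
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s: %s",
             what, strerror(code));
}

/* Pretend signalfd wrapper: returns -1 and sets errno on failure. */
static int fake_signalfd(bool fail)
{
    if (fail) {
        errno = EMFILE;
        return -1;
    }
    return 3;                       /* arbitrary valid descriptor number */
}

/* Every failing path sets *errp before returning a negative code. */
static int signal_init(bool fail, Err **errp)
{
    int fd = fake_signalfd(fail);

    if (fd == -1) {
        int saved = errno;          /* capture before further libc calls */
        err_set(errp, saved, "failed to create signalfd");
        return -saved;
    }
    return 0;
}

/* The intermediate caller just forwards errp and the return code. */
static int main_loop_init(bool fail, Err **errp)
{
    int ret = signal_init(fail, errp);

    if (ret) {
        return ret;                 /* *errp is guaranteed to be set here */
    }
    return 0;
}

/* A reporter that dereferences err unconditionally, like a final caller. */
static size_t report_len(const Err *err)
{
    return strlen(err->msg);
}
```

If signal_init() returned -EMFILE without calling err_set(), the final
caller would pass a NULL pointer to report_len(), which is the kind of
crash the commit message describes.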

Fixes: 2f78e491d7b46542158ce0b8132ee4e05bc0ade4
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fei Li <fli@suse.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
---
 util/main-loop.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/util/main-loop.c b/util/main-loop.c
index affe0403c5..443cb4cfe8 100644
--- a/util/main-loop.c
+++ b/util/main-loop.c
@@ -71,7 +71,7 @@ static void sigfd_handler(void *opaque)
     }
 }
 
-static int qemu_signal_init(void)
+static int qemu_signal_init(Error **errp)
 {
     int sigfd;
     sigset_t set;
@@ -96,7 +96,7 @@ static int qemu_signal_init(void)
     sigdelset(&set, SIG_IPI);
     sigfd = qemu_signalfd(&set);
     if (sigfd == -1) {
-        fprintf(stderr, "failed to create signalfd\n");
+        error_setg_errno(errp, errno, "failed to create signalfd");
         return -errno;
     }
 
@@ -109,7 +109,7 @@ static int qemu_signal_init(void)
 
 #else /* _WIN32 */
 
-static int qemu_signal_init(void)
+static int qemu_signal_init(Error **errp)
 {
     return 0;
 }
@@ -148,7 +148,7 @@ int qemu_init_main_loop(Error **errp)
 
     init_clocks(qemu_timer_notify_cb);
 
-    ret = qemu_signal_init();
+    ret = qemu_signal_init(errp);
     if (ret) {
         return ret;
     }
-- 
2.13.7


* [Qemu-devel] [PATCH for-4.0 v8 2/7] qemu_init_vcpu: add a new Error parameter to propagate
  2018-12-11  9:50 [Qemu-devel] [PATCH for-4.0 v8 0/7] qemu_thread_create: propagate errors to callers to check Fei Li
  2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 1/7] Fix segmentation fault when qemu_signal_init fails Fei Li
@ 2018-12-11  9:50 ` Fei Li
  2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 3/7] migration: fix the multifd code when receiving less channels Fei Li
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Fei Li @ 2018-12-11  9:50 UTC
  To: qemu-devel; +Cc: Paolo Bonzini

This patch paves the way for a later patch, which would otherwise be
too long:
"qemu_thread_create: propagate the error to callers to handle."

The callers of qemu_init_vcpu() already take an errp to handle errors.
In view of this, add a new Error parameter to qemu_init_vcpu() and to
all the functions it calls, so that the error propagates and the
callers further up can check it.

Besides, make qemu_init_vcpu() return a Boolean value to let its
callers know whether it succeeded.
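
The local_err pattern used in qemu_init_vcpu() can be sketched as
follows. This is only an illustration: Err, err_set(), err_propagate(),
start_vcpu_thread() and init_vcpu() are hypothetical names, and the
helper's failure is injected through a flag. The point shown is that a
void helper can only signal failure through the error object, so the
caller needs its own local_err in case errp is NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object; not the real API. */
typedef struct Err {
    int code;
    char msg[128];
} Err;

static void err_set(Err **errp, int code, const char *what)
{
    if (!errp) {
        return;
    }
    assert(*errp == NULL);
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();
    }
    (*errp)->code = code;
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s: %s",
             what, strerror(code));
}

/* Hand a local error to the caller, or free it if the caller ignores it. */
static void err_propagate(Err **dst, Err *local)
{
    if (!local) {
        return;
    }
    if (dst && !*dst) {
        *dst = local;
    } else {
        free(local);
    }
}

/* A void helper: the error object is its only way to report failure. */
static void start_vcpu_thread(bool fail, Err **errp)
{
    if (fail) {
        err_set(errp, EAGAIN, "failed to create vcpu thread");
    }
}

/* Works even when errp is NULL, because local_err is always observable. */
static bool init_vcpu(bool fail, Err **errp)
{
    Err *local_err = NULL;

    start_vcpu_thread(fail, &local_err);
    if (local_err) {
        err_propagate(errp, local_err);
        return false;
    }
    return true;
}
```

Once the helpers themselves return bool, the caller can pass errp
straight through and drop local_err.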

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fei Li <fli@suse.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
 accel/tcg/user-exec-stub.c      |  3 ++-
 cpus.c                          | 34 +++++++++++++++++++++-------------
 include/qom/cpu.h               |  2 +-
 target/alpha/cpu.c              |  4 +++-
 target/arm/cpu.c                |  4 +++-
 target/cris/cpu.c               |  4 +++-
 target/hppa/cpu.c               |  4 +++-
 target/i386/cpu.c               |  4 +++-
 target/lm32/cpu.c               |  4 +++-
 target/m68k/cpu.c               |  4 +++-
 target/microblaze/cpu.c         |  4 +++-
 target/mips/cpu.c               |  4 +++-
 target/moxie/cpu.c              |  4 +++-
 target/nios2/cpu.c              |  4 +++-
 target/openrisc/cpu.c           |  4 +++-
 target/ppc/translate_init.inc.c |  4 +++-
 target/riscv/cpu.c              |  4 +++-
 target/s390x/cpu.c              |  4 +++-
 target/sh4/cpu.c                |  4 +++-
 target/sparc/cpu.c              |  4 +++-
 target/tilegx/cpu.c             |  4 +++-
 target/tricore/cpu.c            |  4 +++-
 target/unicore32/cpu.c          |  4 +++-
 target/xtensa/cpu.c             |  4 +++-
 24 files changed, 87 insertions(+), 36 deletions(-)

diff --git a/accel/tcg/user-exec-stub.c b/accel/tcg/user-exec-stub.c
index a32b4496af..f8c38a375c 100644
--- a/accel/tcg/user-exec-stub.c
+++ b/accel/tcg/user-exec-stub.c
@@ -10,8 +10,9 @@ void cpu_resume(CPUState *cpu)
 {
 }
 
-void qemu_init_vcpu(CPUState *cpu)
+bool qemu_init_vcpu(CPUState *cpu, Error **errp)
 {
+    return true;
 }
 
 /* User mode emulation does not support record/replay yet.  */
diff --git a/cpus.c b/cpus.c
index 0ddeeefc14..7b091bda53 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1931,7 +1931,7 @@ void cpu_remove_sync(CPUState *cpu)
 /* For temporary buffers for forming a name */
 #define VCPU_THREAD_NAME_SIZE 16
 
-static void qemu_tcg_init_vcpu(CPUState *cpu)
+static void qemu_tcg_init_vcpu(CPUState *cpu, Error **errp)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
     static QemuCond *single_tcg_halt_cond;
@@ -1987,7 +1987,7 @@ static void qemu_tcg_init_vcpu(CPUState *cpu)
     }
 }
 
-static void qemu_hax_start_vcpu(CPUState *cpu)
+static void qemu_hax_start_vcpu(CPUState *cpu, Error **errp)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
 
@@ -2004,7 +2004,7 @@ static void qemu_hax_start_vcpu(CPUState *cpu)
 #endif
 }
 
-static void qemu_kvm_start_vcpu(CPUState *cpu)
+static void qemu_kvm_start_vcpu(CPUState *cpu, Error **errp)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
 
@@ -2017,7 +2017,7 @@ static void qemu_kvm_start_vcpu(CPUState *cpu)
                        cpu, QEMU_THREAD_JOINABLE);
 }
 
-static void qemu_hvf_start_vcpu(CPUState *cpu)
+static void qemu_hvf_start_vcpu(CPUState *cpu, Error **errp)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
 
@@ -2035,7 +2035,7 @@ static void qemu_hvf_start_vcpu(CPUState *cpu)
                        cpu, QEMU_THREAD_JOINABLE);
 }
 
-static void qemu_whpx_start_vcpu(CPUState *cpu)
+static void qemu_whpx_start_vcpu(CPUState *cpu, Error **errp)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
 
@@ -2051,7 +2051,7 @@ static void qemu_whpx_start_vcpu(CPUState *cpu)
 #endif
 }
 
-static void qemu_dummy_start_vcpu(CPUState *cpu)
+static void qemu_dummy_start_vcpu(CPUState *cpu, Error **errp)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
 
@@ -2064,11 +2064,12 @@ static void qemu_dummy_start_vcpu(CPUState *cpu)
                        QEMU_THREAD_JOINABLE);
 }
 
-void qemu_init_vcpu(CPUState *cpu)
+bool qemu_init_vcpu(CPUState *cpu, Error **errp)
 {
     cpu->nr_cores = smp_cores;
     cpu->nr_threads = smp_threads;
     cpu->stopped = true;
+    Error *local_err = NULL;
 
     if (!cpu->as) {
         /* If the target cpu hasn't set up any address spaces itself,
@@ -2079,22 +2080,29 @@ void qemu_init_vcpu(CPUState *cpu)
     }
 
     if (kvm_enabled()) {
-        qemu_kvm_start_vcpu(cpu);
+        qemu_kvm_start_vcpu(cpu, &local_err);
     } else if (hax_enabled()) {
-        qemu_hax_start_vcpu(cpu);
+        qemu_hax_start_vcpu(cpu, &local_err);
     } else if (hvf_enabled()) {
-        qemu_hvf_start_vcpu(cpu);
+        qemu_hvf_start_vcpu(cpu, &local_err);
     } else if (tcg_enabled()) {
-        qemu_tcg_init_vcpu(cpu);
+        qemu_tcg_init_vcpu(cpu, &local_err);
     } else if (whpx_enabled()) {
-        qemu_whpx_start_vcpu(cpu);
+        qemu_whpx_start_vcpu(cpu, &local_err);
     } else {
-        qemu_dummy_start_vcpu(cpu);
+        qemu_dummy_start_vcpu(cpu, &local_err);
+    }
+
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return false;
     }
 
     while (!cpu->created) {
         qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
     }
+
+    return true;
 }
 
 void cpu_stop_current(void)
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 1396f53e5b..696c3608d2 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -1006,7 +1006,7 @@ void end_exclusive(void);
  *
  * Initializes a vCPU.
  */
-void qemu_init_vcpu(CPUState *cpu);
+bool qemu_init_vcpu(CPUState *cpu, Error **errp);
 
 #define SSTEP_ENABLE  0x1  /* Enable simulated HW single stepping */
 #define SSTEP_NOIRQ   0x2  /* Do not use IRQ while single stepping */
diff --git a/target/alpha/cpu.c b/target/alpha/cpu.c
index a953897fcc..bf3c34516d 100644
--- a/target/alpha/cpu.c
+++ b/target/alpha/cpu.c
@@ -66,7 +66,9 @@ static void alpha_cpu_realizefn(DeviceState *dev, Error **errp)
         return;
     }
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     acc->parent_realize(dev, errp);
 }
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 60411f6bfe..29a68c5114 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1110,7 +1110,9 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
     }
 #endif
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
     cpu_reset(cs);
 
     acc->parent_realize(dev, errp);
diff --git a/target/cris/cpu.c b/target/cris/cpu.c
index a23aba2688..ec92d69781 100644
--- a/target/cris/cpu.c
+++ b/target/cris/cpu.c
@@ -140,7 +140,9 @@ static void cris_cpu_realizefn(DeviceState *dev, Error **errp)
     }
 
     cpu_reset(cs);
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     ccc->parent_realize(dev, errp);
 }
diff --git a/target/hppa/cpu.c b/target/hppa/cpu.c
index 00bf444620..08f600ced9 100644
--- a/target/hppa/cpu.c
+++ b/target/hppa/cpu.c
@@ -98,7 +98,9 @@ static void hppa_cpu_realizefn(DeviceState *dev, Error **errp)
         return;
     }
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
     acc->parent_realize(dev, errp);
 
 #ifndef CONFIG_USER_ONLY
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index f81d35e1f9..85612dc465 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5279,7 +5279,9 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
     }
 #endif
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     /*
      * Most Intel and certain AMD CPUs support hyperthreading. Even though QEMU
diff --git a/target/lm32/cpu.c b/target/lm32/cpu.c
index b7499cb627..d50b1e4a43 100644
--- a/target/lm32/cpu.c
+++ b/target/lm32/cpu.c
@@ -139,7 +139,9 @@ static void lm32_cpu_realizefn(DeviceState *dev, Error **errp)
 
     cpu_reset(cs);
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     lcc->parent_realize(dev, errp);
 }
diff --git a/target/m68k/cpu.c b/target/m68k/cpu.c
index 582e3a73b3..4ab53f2d58 100644
--- a/target/m68k/cpu.c
+++ b/target/m68k/cpu.c
@@ -231,7 +231,9 @@ static void m68k_cpu_realizefn(DeviceState *dev, Error **errp)
     m68k_cpu_init_gdb(cpu);
 
     cpu_reset(cs);
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     mcc->parent_realize(dev, errp);
 }
diff --git a/target/microblaze/cpu.c b/target/microblaze/cpu.c
index 9b546a2c18..3906c864a3 100644
--- a/target/microblaze/cpu.c
+++ b/target/microblaze/cpu.c
@@ -161,7 +161,9 @@ static void mb_cpu_realizefn(DeviceState *dev, Error **errp)
         return;
     }
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     env->pvr.regs[0] = PVR0_USE_EXC_MASK \
                        | PVR0_USE_ICACHE_MASK \
diff --git a/target/mips/cpu.c b/target/mips/cpu.c
index e217fb3e36..1e5aa69c57 100644
--- a/target/mips/cpu.c
+++ b/target/mips/cpu.c
@@ -145,7 +145,9 @@ static void mips_cpu_realizefn(DeviceState *dev, Error **errp)
     cpu_mips_realize_env(&cpu->env);
 
     cpu_reset(cs);
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     mcc->parent_realize(dev, errp);
 }
diff --git a/target/moxie/cpu.c b/target/moxie/cpu.c
index 8d67eb6727..8581a6d922 100644
--- a/target/moxie/cpu.c
+++ b/target/moxie/cpu.c
@@ -66,7 +66,9 @@ static void moxie_cpu_realizefn(DeviceState *dev, Error **errp)
         return;
     }
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
     cpu_reset(cs);
 
     mcc->parent_realize(dev, errp);
diff --git a/target/nios2/cpu.c b/target/nios2/cpu.c
index fbfaa2ce26..5c7b4b486e 100644
--- a/target/nios2/cpu.c
+++ b/target/nios2/cpu.c
@@ -94,7 +94,9 @@ static void nios2_cpu_realizefn(DeviceState *dev, Error **errp)
         return;
     }
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
     cpu_reset(cs);
 
     ncc->parent_realize(dev, errp);
diff --git a/target/openrisc/cpu.c b/target/openrisc/cpu.c
index fb7cb5c507..a6dcdb9df9 100644
--- a/target/openrisc/cpu.c
+++ b/target/openrisc/cpu.c
@@ -83,7 +83,9 @@ static void openrisc_cpu_realizefn(DeviceState *dev, Error **errp)
         return;
     }
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
     cpu_reset(cs);
 
     occ->parent_realize(dev, errp);
diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index 168d0cec28..587a0e7710 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -9711,7 +9711,9 @@ static void ppc_cpu_realize(DeviceState *dev, Error **errp)
                                  32, "power-vsx.xml", 0);
     }
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        goto unrealize;
+    }
 
     pcc->parent_realize(dev, errp);
 
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index a025a0a3ba..9829fd9bc4 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -305,7 +305,9 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
         return;
     }
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
     cpu_reset(cs);
 
     mcc->parent_realize(dev, errp);
diff --git a/target/s390x/cpu.c b/target/s390x/cpu.c
index 18ba7f85a5..2a3eac9761 100644
--- a/target/s390x/cpu.c
+++ b/target/s390x/cpu.c
@@ -222,7 +222,9 @@ static void s390_cpu_realizefn(DeviceState *dev, Error **errp)
     qemu_register_reset(s390_cpu_machine_reset_cb, cpu);
 #endif
     s390_cpu_gdb_init(cs);
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     /*
      * KVM requires the initial CPU reset ioctl to be executed on the target
diff --git a/target/sh4/cpu.c b/target/sh4/cpu.c
index b9f393b7c7..d32ef2e1cb 100644
--- a/target/sh4/cpu.c
+++ b/target/sh4/cpu.c
@@ -196,7 +196,9 @@ static void superh_cpu_realizefn(DeviceState *dev, Error **errp)
     }
 
     cpu_reset(cs);
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     scc->parent_realize(dev, errp);
 }
diff --git a/target/sparc/cpu.c b/target/sparc/cpu.c
index 0f090ece54..9c22f6a7df 100644
--- a/target/sparc/cpu.c
+++ b/target/sparc/cpu.c
@@ -773,7 +773,9 @@ static void sparc_cpu_realizefn(DeviceState *dev, Error **errp)
         return;
     }
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     scc->parent_realize(dev, errp);
 }
diff --git a/target/tilegx/cpu.c b/target/tilegx/cpu.c
index bfe9be59b5..234148fabd 100644
--- a/target/tilegx/cpu.c
+++ b/target/tilegx/cpu.c
@@ -92,7 +92,9 @@ static void tilegx_cpu_realizefn(DeviceState *dev, Error **errp)
     }
 
     cpu_reset(cs);
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     tcc->parent_realize(dev, errp);
 }
diff --git a/target/tricore/cpu.c b/target/tricore/cpu.c
index 2edaef1aef..5482d6ea3f 100644
--- a/target/tricore/cpu.c
+++ b/target/tricore/cpu.c
@@ -96,7 +96,9 @@ static void tricore_cpu_realizefn(DeviceState *dev, Error **errp)
         set_feature(env, TRICORE_FEATURE_13);
     }
     cpu_reset(cs);
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     tcc->parent_realize(dev, errp);
 }
diff --git a/target/unicore32/cpu.c b/target/unicore32/cpu.c
index 2b49d1ca40..0c737c3187 100644
--- a/target/unicore32/cpu.c
+++ b/target/unicore32/cpu.c
@@ -96,7 +96,9 @@ static void uc32_cpu_realizefn(DeviceState *dev, Error **errp)
         return;
     }
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     ucc->parent_realize(dev, errp);
 }
diff --git a/target/xtensa/cpu.c b/target/xtensa/cpu.c
index a54dbe4260..d2351c9b20 100644
--- a/target/xtensa/cpu.c
+++ b/target/xtensa/cpu.c
@@ -131,7 +131,9 @@ static void xtensa_cpu_realizefn(DeviceState *dev, Error **errp)
 
     cs->gdb_num_regs = xcc->config->gdb_regmap.num_regs;
 
-    qemu_init_vcpu(cs);
+    if (!qemu_init_vcpu(cs, errp)) {
+        return;
+    }
 
     xcc->parent_realize(dev, errp);
 }
-- 
2.13.7


* [Qemu-devel] [PATCH for-4.0 v8 3/7] migration: fix the multifd code when receiving less channels
  2018-12-11  9:50 [Qemu-devel] [PATCH for-4.0 v8 0/7] qemu_thread_create: propagate errors to callers to check Fei Li
  2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 1/7] Fix segmentation fault when qemu_signal_init fails Fei Li
  2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 2/7] qemu_init_vcpu: add a new Error parameter to propagate Fei Li
@ 2018-12-11  9:50 ` Fei Li
  2018-12-13  6:17   ` Markus Armbruster
  2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 4/7] migration: remove unused &local_err parameter in multifd_save_cleanup Fei Li
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Fei Li @ 2018-12-11  9:50 UTC
  To: qemu-devel; +Cc: Dr . David Alan Gilbert

In our current code, when multifd is used during migration and an
error occurs before the destination has received all new channels, the
source keeps running, but the destination does not exit: it keeps
waiting until the source is killed deliberately.

Fix this by dumping the specific error and letting users decide
whether to quit on the destination side when it fails to receive a
packet via some channel.
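
A minimal sketch of the "prepend context, then hand the error to the
caller" step is shown below. Everything in it is a hypothetical
stand-in rather than QEMU code: Err, err_new(), err_copy(), RecvState,
terminate_threads(), propagate_prepend() and recv_new_channel(). The
sketch assumes the terminate step keeps its own copy of the error, and
it deliberately gives the error to that step while the function still
owns it, before ownership moves to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object; not the real API. */
typedef struct Err {
    char msg[160];
} Err;

static Err *err_new(const char *msg)
{
    Err *e = calloc(1, sizeof(*e));

    if (!e) {
        abort();
    }
    snprintf(e->msg, sizeof(e->msg), "%s", msg);
    return e;
}

static Err *err_copy(const Err *src)
{
    return err_new(src->msg);
}

/* Shared receive state; remembers the first error it was told about. */
typedef struct RecvState {
    Err *first_error;
} RecvState;

static void terminate_threads(RecvState *s, const Err *err)
{
    if (err && !s->first_error) {
        s->first_error = err_copy(err);   /* keeps a private copy */
    }
}

/* Prefix the message, then transfer ownership (or free if ignored). */
static void propagate_prepend(Err **dst, Err *local, const char *prefix)
{
    char tmp[sizeof(local->msg)];

    assert(local != NULL);
    if (!dst) {
        free(local);
        return;
    }
    snprintf(tmp, sizeof(tmp), "%s%s", prefix, local->msg);
    memcpy(local->msg, tmp, sizeof(tmp));
    *dst = local;
}

/* id < 0 models a failed initial packet on a new channel. */
static bool recv_new_channel(RecvState *s, int id, int channels_so_far,
                             Err **errp)
{
    if (id < 0) {
        Err *local_err = err_new("bad magic");
        char prefix[64];

        snprintf(prefix, sizeof(prefix),
                 "failed to receive packet via multifd channel %d: ",
                 channels_so_far);
        terminate_threads(s, local_err);          /* still owned here */
        propagate_prepend(errp, local_err, prefix); /* ownership leaves */
        return false;
    }
    return true;
}
```

After propagate_prepend() returns, local_err must not be touched again:
it either belongs to the caller or has been freed.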

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Fei Li <fli@suse.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 migration/channel.c   | 11 ++++++-----
 migration/migration.c |  9 +++++++--
 migration/migration.h |  2 +-
 migration/ram.c       | 10 ++++++++--
 migration/ram.h       |  2 +-
 5 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/migration/channel.c b/migration/channel.c
index 33e0e9b82f..20e4c8e2dc 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -30,6 +30,7 @@
 void migration_channel_process_incoming(QIOChannel *ioc)
 {
     MigrationState *s = migrate_get_current();
+    Error *local_err = NULL;
 
     trace_migration_set_incoming_channel(
         ioc, object_get_typename(OBJECT(ioc)));
@@ -38,13 +39,13 @@ void migration_channel_process_incoming(QIOChannel *ioc)
         *s->parameters.tls_creds &&
         !object_dynamic_cast(OBJECT(ioc),
                              TYPE_QIO_CHANNEL_TLS)) {
-        Error *local_err = NULL;
         migration_tls_channel_process_incoming(s, ioc, &local_err);
-        if (local_err) {
-            error_report_err(local_err);
-        }
     } else {
-        migration_ioc_process_incoming(ioc);
+        migration_ioc_process_incoming(ioc, &local_err);
+    }
+
+    if (local_err) {
+        error_report_err(local_err);
     }
 }
 
diff --git a/migration/migration.c b/migration/migration.c
index 49ffb9997a..72106bddf0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -541,7 +541,7 @@ void migration_fd_process_incoming(QEMUFile *f)
     migration_incoming_process();
 }
 
-void migration_ioc_process_incoming(QIOChannel *ioc)
+void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
     bool start_migration;
@@ -563,9 +563,14 @@ void migration_ioc_process_incoming(QIOChannel *ioc)
          */
         start_migration = !migrate_use_multifd();
     } else {
+        Error *local_err = NULL;
         /* Multiple connections */
         assert(migrate_use_multifd());
-        start_migration = multifd_recv_new_channel(ioc);
+        start_migration = multifd_recv_new_channel(ioc, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
     }
 
     if (start_migration) {
diff --git a/migration/migration.h b/migration/migration.h
index e413d4d8b6..02b7304610 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -229,7 +229,7 @@ struct MigrationState
 void migrate_set_state(int *state, int old_state, int new_state);
 
 void migration_fd_process_incoming(QEMUFile *f);
-void migration_ioc_process_incoming(QIOChannel *ioc);
+void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
 void migration_incoming_process(void);
 
 bool  migration_has_all_channels(void);
diff --git a/migration/ram.c b/migration/ram.c
index 7e7deec4d8..c7e3d6b0fd 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1323,7 +1323,7 @@ bool multifd_recv_all_channels_created(void)
 }
 
 /* Return true if multifd is ready for the migration, otherwise false */
-bool multifd_recv_new_channel(QIOChannel *ioc)
+bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
 {
     MultiFDRecvParams *p;
     Error *local_err = NULL;
@@ -1331,6 +1331,10 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
 
     id = multifd_recv_initial_packet(ioc, &local_err);
     if (id < 0) {
         multifd_recv_terminate_threads(local_err);
+        error_propagate_prepend(errp, local_err,
+                                "failed to receive packet"
+                                " via multifd channel %d: ",
+                                atomic_read(&multifd_recv_state->count));
         return false;
     }
@@ -1340,6 +1344,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
         error_setg(&local_err, "multifd: received id '%d' already setup'",
                    id);
         multifd_recv_terminate_threads(local_err);
+        error_propagate(errp, local_err);
         return false;
     }
     p->c = ioc;
@@ -1351,7 +1356,8 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
     qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p,
                        QEMU_THREAD_JOINABLE);
     atomic_inc(&multifd_recv_state->count);
-    return multifd_recv_state->count == migrate_multifd_channels();
+    return atomic_read(&multifd_recv_state->count) ==
+           migrate_multifd_channels();
 }
 
 /**
diff --git a/migration/ram.h b/migration/ram.h
index 83ff1bc11a..046d3074be 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -47,7 +47,7 @@ int multifd_save_cleanup(Error **errp);
 int multifd_load_setup(void);
 int multifd_load_cleanup(Error **errp);
 bool multifd_recv_all_channels_created(void);
-bool multifd_recv_new_channel(QIOChannel *ioc);
+bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
 
 uint64_t ram_pagesize_summary(void);
 int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);
-- 
2.13.7


* [Qemu-devel] [PATCH for-4.0 v8 4/7] migration: remove unused &local_err parameter in multifd_save_cleanup
  2018-12-11  9:50 [Qemu-devel] [PATCH for-4.0 v8 0/7] qemu_thread_create: propagate errors to callers to check Fei Li
                   ` (2 preceding siblings ...)
  2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 3/7] migration: fix the multifd code when receiving less channels Fei Li
@ 2018-12-11  9:50 ` Fei Li
  2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 5/7] migration: add more error handling for postcopy_ram_enable_notify Fei Li
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Fei Li @ 2018-12-11  9:50 UTC
  To: qemu-devel; +Cc: Dr . David Alan Gilbert

Always call migrate_set_error() to set the error state, regardless of
whether multifd_save_cleanup() succeeds.  As the passed &local_err is
never used in multifd_save_cleanup(), remove it and change the
function signature to: void multifd_save_cleanup(void).
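
To see why the old call shape could never record the error, consider
this small sketch. It is not QEMU code: MigState, mig_set_error(),
cleanup_old(), cleanup_new() and the two on_channel_error_*() handlers
are hypothetical, and cleanup_old() is assumed to always return 0, as
the removed "int ret = 0; ... return ret;" suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical migration state that remembers the first error message. */
typedef struct MigState {
    const char *error;
} MigState;

static void mig_set_error(MigState *s, const char *msg)
{
    assert(s != NULL);
    if (!s->error) {
        s->error = msg;             /* first error wins in this sketch */
    }
}

/* Old shape: an int return that is always 0. */
static int cleanup_old(int *released)
{
    (*released)++;
    return 0;
}

/* New shape: nothing to report, so return void. */
static void cleanup_new(int *released)
{
    (*released)++;
}

/* Old handler: the error is recorded only if cleanup "fails", i.e. never. */
static void on_channel_error_old(MigState *s, const char *msg, int *released)
{
    if (cleanup_old(released) != 0) {
        mig_set_error(s, msg);
    }
}

/* New handler: record the error first, then clean up unconditionally. */
static void on_channel_error_new(MigState *s, const char *msg, int *released)
{
    mig_set_error(s, msg);
    cleanup_new(released);
}
```

Both handlers release the resources once, but only the new one leaves
the error visible in the migration state.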

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Fei Li <fli@suse.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
 migration/migration.c |  5 +----
 migration/ram.c       | 11 ++++-------
 migration/ram.h       |  2 +-
 3 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 72106bddf0..0537fc0c26 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1386,7 +1386,6 @@ static void migrate_fd_cleanup(void *opaque)
     qemu_savevm_state_cleanup();
 
     if (s->to_dst_file) {
-        Error *local_err = NULL;
         QEMUFile *tmp;
 
         trace_migrate_fd_cleanup();
@@ -1397,9 +1396,7 @@ static void migrate_fd_cleanup(void *opaque)
         }
         qemu_mutex_lock_iothread();
 
-        if (multifd_save_cleanup(&local_err) != 0) {
-            error_report_err(local_err);
-        }
+        multifd_save_cleanup();
         qemu_mutex_lock(&s->qemu_file_lock);
         tmp = s->to_dst_file;
         s->to_dst_file = NULL;
diff --git a/migration/ram.c b/migration/ram.c
index c7e3d6b0fd..658dfa88a3 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -917,13 +917,12 @@ static void multifd_send_terminate_threads(Error *err)
     }
 }
 
-int multifd_save_cleanup(Error **errp)
+void multifd_save_cleanup(void)
 {
     int i;
-    int ret = 0;
 
     if (!migrate_use_multifd()) {
-        return 0;
+        return;
     }
     multifd_send_terminate_threads(NULL);
     for (i = 0; i < migrate_multifd_channels(); i++) {
@@ -953,7 +952,6 @@ int multifd_save_cleanup(Error **errp)
     multifd_send_state->pages = NULL;
     g_free(multifd_send_state);
     multifd_send_state = NULL;
-    return ret;
 }
 
 static void multifd_send_sync_main(void)
@@ -1071,9 +1069,8 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
     Error *local_err = NULL;
 
     if (qio_task_propagate_error(task, &local_err)) {
-        if (multifd_save_cleanup(&local_err) != 0) {
-            migrate_set_error(migrate_get_current(), local_err);
-        }
+        migrate_set_error(migrate_get_current(), local_err);
+        multifd_save_cleanup();
     } else {
         p->c = QIO_CHANNEL(sioc);
         qio_channel_set_delay(p->c, false);
diff --git a/migration/ram.h b/migration/ram.h
index 046d3074be..936177b3e9 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -43,7 +43,7 @@ uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_total(void);
 
 int multifd_save_setup(void);
-int multifd_save_cleanup(Error **errp);
+void multifd_save_cleanup(void);
 int multifd_load_setup(void);
 int multifd_load_cleanup(Error **errp);
 bool multifd_recv_all_channels_created(void);
-- 
2.13.7


* [Qemu-devel] [PATCH for-4.0 v8 5/7] migration: add more error handling for postcopy_ram_enable_notify
  2018-12-11  9:50 [Qemu-devel] [PATCH for-4.0 v8 0/7] qemu_thread_create: propagate errors to callers to check Fei Li
                   ` (3 preceding siblings ...)
  2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 4/7] migration: remove unused &local_err parameter in multifd_save_cleanup Fei Li
@ 2018-12-11  9:50 ` Fei Li
  2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 6/7] qemu_thread_create: propagate the error to callers to handle Fei Li
  2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 7/7] qemu_thread_join: fix segmentation fault Fei Li
  6 siblings, 0 replies; 27+ messages in thread
From: Fei Li @ 2018-12-11  9:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr . David Alan Gilbert

Call postcopy_ram_incoming_cleanup() to clean up when
postcopy_ram_enable_notify() fails.  Also report an error
message when qemu_ram_foreach_migratable_block() fails.

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Fei Li <fli@suse.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
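Note for readers of the thread: the caller-side cleanup added to
loadvm_postcopy_handle_listen() follows a common pattern, sketched
here with a toy model.  ToyIncoming and every toy_*() function are
hypothetical and only mimic the shape of the real code; the
fail_registration flag exists purely to simulate a failure.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the incoming migration state. */
typedef struct {
    int fd_open;          /* a resource acquired early in setup */
    int notify_enabled;   /* set only when setup fully succeeds */
} ToyIncoming;

/*
 * Setup that can fail half-way: on failure it reports a message and
 * returns -1, leaving the early resource for the caller to release.
 */
int toy_enable_notify(ToyIncoming *mis, int fail_registration)
{
    assert(mis != NULL);
    mis->fd_open = 1;
    if (fail_registration) {
        fprintf(stderr, "toy block registration failed\n");
        return -1;
    }
    mis->notify_enabled = 1;
    return 0;
}

/* Release whatever setup managed to acquire. */
void toy_incoming_cleanup(ToyIncoming *mis)
{
    mis->fd_open = 0;
    mis->notify_enabled = 0;
}

/* Caller: on failure, clean up before propagating the error code. */
int toy_handle_listen(ToyIncoming *mis, int fail_registration)
{
    if (toy_enable_notify(mis, fail_registration)) {
        toy_incoming_cleanup(mis);
        return -1;
    }
    return 0;
}
```

Without the cleanup call in toy_handle_listen(), the failure path
would return with fd_open still set.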
 migration/postcopy-ram.c | 1 +
 migration/savevm.c       | 1 +
 2 files changed, 2 insertions(+)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index e5c02a32c5..fa09dba534 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1117,6 +1117,7 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 
     /* Mark so that we get notified of accesses to unwritten areas */
     if (qemu_ram_foreach_migratable_block(ram_block_enable_notify, mis)) {
+        error_report("ram_block_enable_notify failed");
         return -1;
     }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 9e45fb4f3f..d784e8aa40 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1729,6 +1729,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
      */
     if (migrate_postcopy_ram()) {
         if (postcopy_ram_enable_notify(mis)) {
+            postcopy_ram_incoming_cleanup(mis);
             return -1;
         }
     }
-- 
2.13.7


* [Qemu-devel] [PATCH for-4.0 v8 6/7] qemu_thread_create: propagate the error to callers to handle
  2018-12-11  9:50 [Qemu-devel] [PATCH for-4.0 v8 0/7] qemu_thread_create: propagate errors to callers to check Fei Li
                   ` (4 preceding siblings ...)
  2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 5/7] migration: add more error handling for postcopy_ram_enable_notify Fei Li
@ 2018-12-11  9:50 ` Fei Li
  2018-12-13  7:26   ` Markus Armbruster
  2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 7/7] qemu_thread_join: fix segmentation fault Fei Li
  6 siblings, 1 reply; 27+ messages in thread
From: Fei Li @ 2018-12-11  9:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Markus Armbruster, Daniel P . Berrangé, Dr . David Alan Gilbert

Make qemu_thread_create() return a bool indicating whether it succeeded,
rather than failing with an error itself.  Add an Error **errp parameter
to carry the error message, so that callers can decide how to handle it.

Cc: Markus Armbruster <armbru@redhat.com>
Cc: Daniel P. Berrangé <berrange@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Fei Li <fli@suse.com>
---
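Note for readers of the thread: the two caller shapes used throughout
this patch (pass errp through, or report a local error and return -1)
can be sketched without any QEMU headers.  ToyError and all toy_*()
functions are hypothetical; start_rc merely simulates the return code
of the underlying thread-start call, and no real thread is created.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much simplified error object; not QEMU's Error. */
typedef struct ToyError {
    char msg[128];
} ToyError;

/* Store "what: strerror(err)" in *errp if the caller asked for it. */
static void toy_error_set(ToyError **errp, int err, const char *what)
{
    if (errp == NULL) {
        return;
    }
    *errp = malloc(sizeof(**errp));
    if (*errp != NULL) {
        snprintf((*errp)->msg, sizeof((*errp)->msg), "%s: %s",
                 what, strerror(err));
    }
}

/* Return true on success; on failure set *errp and return false. */
bool toy_thread_create(int start_rc, ToyError **errp)
{
    if (start_rc != 0) {
        toy_error_set(errp, start_rc, "thread creation failed");
        return false;
    }
    return true;
}

/* Caller that already has an errp: pass it straight through. */
bool toy_device_realize(int start_rc, ToyError **errp)
{
    if (!toy_thread_create(start_rc, errp)) {
        return false;
    }
    return true;
}

/* Caller without an errp: report the local error, then return -1. */
int toy_setup(int start_rc)
{
    ToyError *local_err = NULL;

    if (!toy_thread_create(start_rc, &local_err)) {
        fprintf(stderr, "failed to create worker: %s\n",
                local_err ? local_err->msg : "unknown error");
        free(local_err);
        return -1;
    }
    return 0;
}
```

Either way the decision about what a failure means moves from the
thread helper to the caller, which is the point of the patch.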
 cpus.c                      | 45 ++++++++++++++++++++++++-------------
 dump.c                      |  6 +++--
 hw/misc/edu.c               |  6 +++--
 hw/ppc/spapr_hcall.c        | 10 +++++++--
 hw/rdma/rdma_backend.c      |  4 +++-
 hw/usb/ccid-card-emulated.c | 16 ++++++++++----
 include/qemu/thread.h       |  4 ++--
 io/task.c                   |  3 ++-
 iothread.c                  | 16 +++++++++-----
 migration/migration.c       | 54 +++++++++++++++++++++++++++++----------------
 migration/postcopy-ram.c    | 14 ++++++++++--
 migration/ram.c             | 40 ++++++++++++++++++++++++---------
 migration/savevm.c          | 11 ++++++---
 tests/atomic_add-bench.c    |  3 ++-
 tests/iothread.c            |  2 +-
 tests/qht-bench.c           |  3 ++-
 tests/rcutorture.c          |  3 ++-
 tests/test-aio.c            |  2 +-
 tests/test-rcu-list.c       |  3 ++-
 ui/vnc-jobs.c               | 17 +++++++++-----
 ui/vnc-jobs.h               |  2 +-
 ui/vnc.c                    |  4 +++-
 util/compatfd.c             | 12 ++++++++--
 util/oslib-posix.c          | 17 ++++++++++----
 util/qemu-thread-posix.c    | 24 +++++++++++++-------
 util/qemu-thread-win32.c    | 16 ++++++++++----
 util/rcu.c                  |  3 ++-
 util/thread-pool.c          |  4 +++-
 28 files changed, 243 insertions(+), 101 deletions(-)

diff --git a/cpus.c b/cpus.c
index 7b091bda53..e8450e518a 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1961,15 +1961,20 @@ static void qemu_tcg_init_vcpu(CPUState *cpu, Error **errp)
             snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
                  cpu->cpu_index);
 
-            qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
-                               cpu, QEMU_THREAD_JOINABLE);
+            if (!qemu_thread_create(cpu->thread, thread_name,
+                                    qemu_tcg_cpu_thread_fn, cpu,
+                                    QEMU_THREAD_JOINABLE, errp)) {
+                return;
+            }
 
         } else {
             /* share a single thread for all cpus with TCG */
             snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "ALL CPUs/TCG");
-            qemu_thread_create(cpu->thread, thread_name,
-                               qemu_tcg_rr_cpu_thread_fn,
-                               cpu, QEMU_THREAD_JOINABLE);
+            if (!qemu_thread_create(cpu->thread, thread_name,
+                                    qemu_tcg_rr_cpu_thread_fn, cpu,
+                                    QEMU_THREAD_JOINABLE, errp)) {
+                return;
+            }
 
             single_tcg_halt_cond = cpu->halt_cond;
             single_tcg_cpu_thread = cpu->thread;
@@ -1997,8 +2002,10 @@ static void qemu_hax_start_vcpu(CPUState *cpu, Error **errp)
 
     snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HAX",
              cpu->cpu_index);
-    qemu_thread_create(cpu->thread, thread_name, qemu_hax_cpu_thread_fn,
-                       cpu, QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(cpu->thread, thread_name, qemu_hax_cpu_thread_fn,
+                            cpu, QEMU_THREAD_JOINABLE, errp)) {
+        return;
+    }
 #ifdef _WIN32
     cpu->hThread = qemu_thread_get_handle(cpu->thread);
 #endif
@@ -2013,8 +2020,10 @@ static void qemu_kvm_start_vcpu(CPUState *cpu, Error **errp)
     qemu_cond_init(cpu->halt_cond);
     snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/KVM",
              cpu->cpu_index);
-    qemu_thread_create(cpu->thread, thread_name, qemu_kvm_cpu_thread_fn,
-                       cpu, QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(cpu->thread, thread_name, qemu_kvm_cpu_thread_fn,
+                            cpu, QEMU_THREAD_JOINABLE, errp)) {
+        /* keep 'if' here in case there is further error handling logic */
+    }
 }
 
 static void qemu_hvf_start_vcpu(CPUState *cpu, Error **errp)
@@ -2031,8 +2040,10 @@ static void qemu_hvf_start_vcpu(CPUState *cpu, Error **errp)
 
     snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
              cpu->cpu_index);
-    qemu_thread_create(cpu->thread, thread_name, qemu_hvf_cpu_thread_fn,
-                       cpu, QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(cpu->thread, thread_name, qemu_hvf_cpu_thread_fn,
+                            cpu, QEMU_THREAD_JOINABLE, errp)) {
+        /* keep 'if' here in case there is further error handling logic */
+    }
 }
 
 static void qemu_whpx_start_vcpu(CPUState *cpu, Error **errp)
@@ -2044,8 +2055,10 @@ static void qemu_whpx_start_vcpu(CPUState *cpu, Error **errp)
     qemu_cond_init(cpu->halt_cond);
     snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/WHPX",
              cpu->cpu_index);
-    qemu_thread_create(cpu->thread, thread_name, qemu_whpx_cpu_thread_fn,
-                       cpu, QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(cpu->thread, thread_name, qemu_whpx_cpu_thread_fn,
+                            cpu, QEMU_THREAD_JOINABLE, errp)) {
+        return;
+    }
 #ifdef _WIN32
     cpu->hThread = qemu_thread_get_handle(cpu->thread);
 #endif
@@ -2060,8 +2073,10 @@ static void qemu_dummy_start_vcpu(CPUState *cpu, Error **errp)
     qemu_cond_init(cpu->halt_cond);
     snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/DUMMY",
              cpu->cpu_index);
-    qemu_thread_create(cpu->thread, thread_name, qemu_dummy_cpu_thread_fn, cpu,
-                       QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(cpu->thread, thread_name, qemu_dummy_cpu_thread_fn,
+                            cpu, QEMU_THREAD_JOINABLE, errp)) {
+        /* keep 'if' here in case there is further error handling logic */
+    }
 }
 
 bool qemu_init_vcpu(CPUState *cpu, Error **errp)
diff --git a/dump.c b/dump.c
index 4ec94c5e25..1f003aff9a 100644
--- a/dump.c
+++ b/dump.c
@@ -2020,8 +2020,10 @@ void qmp_dump_guest_memory(bool paging, const char *file,
     if (detach_p) {
         /* detached dump */
         s->detached = true;
-        qemu_thread_create(&s->dump_thread, "dump_thread", dump_thread,
-                           s, QEMU_THREAD_DETACHED);
+        if (!qemu_thread_create(&s->dump_thread, "dump_thread", dump_thread,
+                                s, QEMU_THREAD_DETACHED, errp)) {
+            /* keep 'if' here in case there is further error handling logic */
+        }
     } else {
         /* sync dump */
         dump_process(s, errp);
diff --git a/hw/misc/edu.c b/hw/misc/edu.c
index cdcf550dd7..6684c60a96 100644
--- a/hw/misc/edu.c
+++ b/hw/misc/edu.c
@@ -355,8 +355,10 @@ static void pci_edu_realize(PCIDevice *pdev, Error **errp)
 
     qemu_mutex_init(&edu->thr_mutex);
     qemu_cond_init(&edu->thr_cond);
-    qemu_thread_create(&edu->thread, "edu", edu_fact_thread,
-                       edu, QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(&edu->thread, "edu", edu_fact_thread,
+                            edu, QEMU_THREAD_JOINABLE, errp)) {
+        return;
+    }
 
     memory_region_init_io(&edu->mmio, OBJECT(edu), &edu_mmio_ops, edu,
                     "edu-mmio", 1 * MiB);
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index ae913d070f..7c16ade04a 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -478,6 +478,7 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
     sPAPRPendingHPT *pending = spapr->pending_hpt;
     uint64_t current_ram_size;
     int rc;
+    Error *local_err = NULL;
 
     if (spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED) {
         return H_AUTHORITY;
@@ -538,8 +539,13 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
     pending->shift = shift;
     pending->ret = H_HARDWARE;
 
-    qemu_thread_create(&pending->thread, "sPAPR HPT prepare",
-                       hpt_prepare_thread, pending, QEMU_THREAD_DETACHED);
+    if (!qemu_thread_create(&pending->thread, "sPAPR HPT prepare",
+                            hpt_prepare_thread, pending,
+                            QEMU_THREAD_DETACHED, &local_err)) {
+        error_reportf_err(local_err, "failed to create hpt_prepare_thread: ");
+        g_free(pending);
+        return H_RESOURCE;
+    }
 
     spapr->pending_hpt = pending;
 
diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index d7a4bbd91f..53a2bd0d85 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -164,8 +164,10 @@ static void start_comp_thread(RdmaBackendDev *backend_dev)
     snprintf(thread_name, sizeof(thread_name), "rdma_comp_%s",
              ibv_get_device_name(backend_dev->ib_dev));
     backend_dev->comp_thread.run = true;
+    /* FIXME: let the further caller handle the error instead of abort() here */
     qemu_thread_create(&backend_dev->comp_thread.thread, thread_name,
-                       comp_handler_thread, backend_dev, QEMU_THREAD_DETACHED);
+                       comp_handler_thread, backend_dev,
+                       QEMU_THREAD_DETACHED, &error_abort);
 }
 
 void rdma_backend_register_comp_handler(void (*handler)(int status,
diff --git a/hw/usb/ccid-card-emulated.c b/hw/usb/ccid-card-emulated.c
index 25976ed84f..c6783f124a 100644
--- a/hw/usb/ccid-card-emulated.c
+++ b/hw/usb/ccid-card-emulated.c
@@ -33,6 +33,7 @@
 #include "qemu/main-loop.h"
 #include "ccid.h"
 #include "qapi/error.h"
+#include "qemu/error-report.h"
 
 #define DPRINTF(card, lvl, fmt, ...) \
 do {\
@@ -544,10 +545,17 @@ static void emulated_realize(CCIDCardState *base, Error **errp)
         error_setg(errp, "%s: failed to initialize vcard", TYPE_EMULATED_CCID);
         goto out2;
     }
-    qemu_thread_create(&card->event_thread_id, "ccid/event", event_thread,
-                       card, QEMU_THREAD_JOINABLE);
-    qemu_thread_create(&card->apdu_thread_id, "ccid/apdu", handle_apdu_thread,
-                       card, QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(&card->event_thread_id, "ccid/event", event_thread,
+                            card, QEMU_THREAD_JOINABLE, errp)) {
+        error_report("failed to create event_thread");
+        goto out2;
+    }
+    if (!qemu_thread_create(&card->apdu_thread_id, "ccid/apdu",
+                            handle_apdu_thread, card,
+                            QEMU_THREAD_JOINABLE, errp)) {
+        error_report("failed to create handle_apdu_thread");
+        goto out2;
+    }
 
 out2:
     clean_event_notifier(card);
diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index 55d83a907c..12291f4ccd 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -152,9 +152,9 @@ void qemu_event_reset(QemuEvent *ev);
 void qemu_event_wait(QemuEvent *ev);
 void qemu_event_destroy(QemuEvent *ev);
 
-void qemu_thread_create(QemuThread *thread, const char *name,
+bool qemu_thread_create(QemuThread *thread, const char *name,
                         void *(*start_routine)(void *),
-                        void *arg, int mode);
+                        void *arg, int mode, Error **errp);
 void *qemu_thread_join(QemuThread *thread);
 void qemu_thread_get_self(QemuThread *thread);
 bool qemu_thread_is_self(QemuThread *thread);
diff --git a/io/task.c b/io/task.c
index 2886a2c1bc..6d3a18ab80 100644
--- a/io/task.c
+++ b/io/task.c
@@ -149,7 +149,8 @@ void qio_task_run_in_thread(QIOTask *task,
                        "io-task-worker",
                        qio_task_thread_worker,
                        data,
-                       QEMU_THREAD_DETACHED);
+                       QEMU_THREAD_DETACHED,
+                       &error_abort);
 }
 
 
diff --git a/iothread.c b/iothread.c
index 2fb1cdf55d..7335dacf0b 100644
--- a/iothread.c
+++ b/iothread.c
@@ -164,9 +164,7 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
                                 &local_error);
     if (local_error) {
         error_propagate(errp, local_error);
-        aio_context_unref(iothread->ctx);
-        iothread->ctx = NULL;
-        return;
+        goto fail;
     }
 
     qemu_mutex_init(&iothread->init_done_lock);
@@ -178,8 +176,12 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
      */
     name = object_get_canonical_path_component(OBJECT(obj));
     thread_name = g_strdup_printf("IO %s", name);
-    qemu_thread_create(&iothread->thread, thread_name, iothread_run,
-                       iothread, QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(&iothread->thread, thread_name, iothread_run,
+                            iothread, QEMU_THREAD_JOINABLE, errp)) {
+        g_free(thread_name);
+        g_free(name);
+        goto fail;
+    }
     g_free(thread_name);
     g_free(name);
 
@@ -190,6 +192,10 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
                        &iothread->init_done_lock);
     }
     qemu_mutex_unlock(&iothread->init_done_lock);
+    return;
+fail:
+    aio_context_unref(iothread->ctx);
+    iothread->ctx = NULL;
 }
 
 typedef struct {
diff --git a/migration/migration.c b/migration/migration.c
index 0537fc0c26..af6c72ac5d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -438,19 +438,22 @@ static void process_incoming_migration_co(void *opaque)
         /* Make sure all file formats flush their mutable metadata */
         bdrv_invalidate_cache_all(&local_err);
         if (local_err) {
-            migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
-                    MIGRATION_STATUS_FAILED);
             error_report_err(local_err);
-            exit(EXIT_FAILURE);
+            goto fail;
         }
 
         if (colo_init_ram_cache() < 0) {
             error_report("Init ram cache failed");
-            exit(EXIT_FAILURE);
+            goto fail;
         }
 
-        qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
-             colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
+        if (!qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
+                                colo_process_incoming_thread, mis,
+                                QEMU_THREAD_JOINABLE, &local_err)) {
+            error_reportf_err(local_err, "failed to create "
+                              "colo_process_incoming_thread: ");
+            goto fail;
+        }
         mis->have_colo_incoming_thread = true;
         qemu_coroutine_yield();
 
@@ -461,20 +464,22 @@ static void process_incoming_migration_co(void *opaque)
     }
 
     if (ret < 0) {
-        Error *local_err = NULL;
-
-        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
-                          MIGRATION_STATUS_FAILED);
         error_report("load of migration failed: %s", strerror(-ret));
-        qemu_fclose(mis->from_src_file);
-        if (multifd_load_cleanup(&local_err) != 0) {
-            error_report_err(local_err);
-        }
-        exit(EXIT_FAILURE);
+        goto fail;
     }
     mis->bh = qemu_bh_new(process_incoming_migration_bh, mis);
     qemu_bh_schedule(mis->bh);
     mis->migration_incoming_co = NULL;
+    return;
+fail:
+    local_err = NULL;
+    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+                      MIGRATION_STATUS_FAILED);
+    qemu_fclose(mis->from_src_file);
+    if (multifd_load_cleanup(&local_err) != 0) {
+        error_report_err(local_err);
+    }
+    exit(EXIT_FAILURE);
 }
 
 static void migration_incoming_setup(QEMUFile *f)
@@ -2345,6 +2350,7 @@ out:
 static int open_return_path_on_source(MigrationState *ms,
                                       bool create_thread)
 {
+    Error *local_err = NULL;
 
     ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
     if (!ms->rp_state.from_dst_file) {
@@ -2358,8 +2364,13 @@ static int open_return_path_on_source(MigrationState *ms,
         return 0;
     }
 
-    qemu_thread_create(&ms->rp_state.rp_thread, "return path",
-                       source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(&ms->rp_state.rp_thread, "return path",
+                            source_return_path_thread, ms,
+                            QEMU_THREAD_JOINABLE, &local_err)) {
+        error_reportf_err(local_err,
+                          "failed to create source_return_path_thread: ");
+        return -1;
+    }
 
     trace_open_return_path_on_source_continue();
 
@@ -3189,8 +3200,13 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
         migrate_fd_cleanup(s);
         return;
     }
-    qemu_thread_create(&s->thread, "live_migration", migration_thread, s,
-                       QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(&s->thread, "live_migration", migration_thread,
+                            s, QEMU_THREAD_JOINABLE, &error_in)) {
+        error_reportf_err(error_in, "failed to create migration_thread: ");
+        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
+        migrate_fd_cleanup(s);
+        return;
+    }
     s->migration_thread_running = true;
 }
 
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index fa09dba534..80bfa9c4a2 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1083,6 +1083,8 @@ retry:
 
 int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 {
+    Error *local_err = NULL;
+
     /* Open the fd for the kernel to give us userfaults */
     mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
     if (mis->userfault_fd == -1) {
@@ -1109,8 +1111,16 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
     }
 
     qemu_sem_init(&mis->fault_thread_sem, 0);
-    qemu_thread_create(&mis->fault_thread, "postcopy/fault",
-                       postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(&mis->fault_thread, "postcopy/fault",
+                            postcopy_ram_fault_thread, mis,
+                            QEMU_THREAD_JOINABLE, &local_err)) {
+        error_reportf_err(local_err,
+                          "failed to create postcopy_ram_fault_thread: ");
+        close(mis->userfault_event_fd);
+        close(mis->userfault_fd);
+        qemu_sem_destroy(&mis->fault_thread_sem);
+        return -1;
+    }
     qemu_sem_wait(&mis->fault_thread_sem);
     qemu_sem_destroy(&mis->fault_thread_sem);
     mis->have_fault_thread = true;
diff --git a/migration/ram.c b/migration/ram.c
index 658dfa88a3..6e0cccf066 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -473,6 +473,7 @@ static void compress_threads_save_cleanup(void)
 static int compress_threads_save_setup(void)
 {
     int i, thread_count;
+    Error *local_err = NULL;
 
     if (!migrate_use_compression()) {
         return 0;
@@ -502,9 +503,12 @@ static int compress_threads_save_setup(void)
         comp_param[i].quit = false;
         qemu_mutex_init(&comp_param[i].mutex);
         qemu_cond_init(&comp_param[i].cond);
-        qemu_thread_create(compress_threads + i, "compress",
-                           do_data_compress, comp_param + i,
-                           QEMU_THREAD_JOINABLE);
+        if (!qemu_thread_create(compress_threads + i, "compress",
+                                do_data_compress, comp_param + i,
+                                QEMU_THREAD_JOINABLE, &local_err)) {
+            error_reportf_err(local_err, "failed to create do_data_compress: ");
+            goto exit;
+        }
     }
     return 0;
 
@@ -1075,8 +1079,14 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
         p->c = QIO_CHANNEL(sioc);
         qio_channel_set_delay(p->c, false);
         p->running = true;
-        qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
-                           QEMU_THREAD_JOINABLE);
+        if (!qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
+                                QEMU_THREAD_JOINABLE, &local_err)) {
+            migrate_set_error(migrate_get_current(), local_err);
+            error_reportf_err(local_err,
+                              "failed to create multifd_send_thread: ");
+            multifd_save_cleanup();
+            return;
+        }
 
         atomic_inc(&multifd_send_state->count);
     }
@@ -1350,8 +1360,13 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
     p->num_packets = 1;
 
     p->running = true;
-    qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p,
-                       QEMU_THREAD_JOINABLE);
+    if (!qemu_thread_create(&p->thread, p->name, multifd_recv_thread,
+                            p, QEMU_THREAD_JOINABLE, &local_err)) {
+        error_propagate_prepend(errp, local_err,
+                                "failed to create multifd_recv_thread: ");
+        multifd_recv_terminate_threads(local_err);
+        return false;
+    }
     atomic_inc(&multifd_recv_state->count);
     return atomic_read(&multifd_recv_state->count) ==
            migrate_multifd_channels();
@@ -3617,6 +3632,7 @@ static void compress_threads_load_cleanup(void)
 static int compress_threads_load_setup(QEMUFile *f)
 {
     int i, thread_count;
+    Error *local_err = NULL;
 
     if (!migrate_use_compression()) {
         return 0;
@@ -3638,9 +3654,13 @@ static int compress_threads_load_setup(QEMUFile *f)
         qemu_cond_init(&decomp_param[i].cond);
         decomp_param[i].done = true;
         decomp_param[i].quit = false;
-        qemu_thread_create(decompress_threads + i, "decompress",
-                           do_data_decompress, decomp_param + i,
-                           QEMU_THREAD_JOINABLE);
+        if (!qemu_thread_create(decompress_threads + i, "decompress",
+                                do_data_decompress, decomp_param + i,
+                                QEMU_THREAD_JOINABLE, &local_err)) {
+            error_reportf_err(local_err,
+                              "failed to create do_data_decompress: ");
+            goto exit;
+        }
     }
     return 0;
 exit:
diff --git a/migration/savevm.c b/migration/savevm.c
index d784e8aa40..b8bdcde5d8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1747,9 +1747,14 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
     mis->have_listen_thread = true;
     /* Start up the listening thread and wait for it to signal ready */
     qemu_sem_init(&mis->listen_thread_sem, 0);
-    qemu_thread_create(&mis->listen_thread, "postcopy/listen",
-                       postcopy_ram_listen_thread, NULL,
-                       QEMU_THREAD_DETACHED);
+    if (!qemu_thread_create(&mis->listen_thread, "postcopy/listen",
+                            postcopy_ram_listen_thread, NULL,
+                            QEMU_THREAD_DETACHED, &local_err)) {
+        error_reportf_err(local_err,
+                          "failed to create postcopy_ram_listen_thread: ");
+        qemu_sem_destroy(&mis->listen_thread_sem);
+        return -1;
+    }
     qemu_sem_wait(&mis->listen_thread_sem);
     qemu_sem_destroy(&mis->listen_thread_sem);
 
diff --git a/tests/atomic_add-bench.c b/tests/atomic_add-bench.c
index 2f6c72f63a..338b9563e3 100644
--- a/tests/atomic_add-bench.c
+++ b/tests/atomic_add-bench.c
@@ -2,6 +2,7 @@
 #include "qemu/thread.h"
 #include "qemu/host-utils.h"
 #include "qemu/processor.h"
+#include "qapi/error.h"
 
 struct thread_info {
     uint64_t r;
@@ -110,7 +111,7 @@ static void create_threads(void)
 
         info->r = (i + 1) ^ time(NULL);
         qemu_thread_create(&threads[i], NULL, thread_func, info,
-                           QEMU_THREAD_JOINABLE);
+                           QEMU_THREAD_JOINABLE, &error_abort);
     }
 }
 
diff --git a/tests/iothread.c b/tests/iothread.c
index 777d9eea46..f4ad992e61 100644
--- a/tests/iothread.c
+++ b/tests/iothread.c
@@ -73,7 +73,7 @@ IOThread *iothread_new(void)
     qemu_mutex_init(&iothread->init_done_lock);
     qemu_cond_init(&iothread->init_done_cond);
     qemu_thread_create(&iothread->thread, NULL, iothread_run,
-                       iothread, QEMU_THREAD_JOINABLE);
+                       iothread, QEMU_THREAD_JOINABLE, &error_abort);
 
     /* Wait for initialization to complete */
     qemu_mutex_lock(&iothread->init_done_lock);
diff --git a/tests/qht-bench.c b/tests/qht-bench.c
index 2089e2bed1..71df567ea2 100644
--- a/tests/qht-bench.c
+++ b/tests/qht-bench.c
@@ -9,6 +9,7 @@
 #include "qemu/atomic.h"
 #include "qemu/qht.h"
 #include "qemu/rcu.h"
+#include "qapi/error.h"
 #include "exec/tb-hash-xx.h"
 
 struct thread_stats {
@@ -247,7 +248,7 @@ th_create_n(QemuThread **threads, struct thread_info **infos, const char *name,
         prepare_thread_info(&info[i], offset + i);
         info[i].func = func;
         qemu_thread_create(&th[i], name, thread_func, &info[i],
-                           QEMU_THREAD_JOINABLE);
+                           QEMU_THREAD_JOINABLE, &error_abort);
     }
 }
 
diff --git a/tests/rcutorture.c b/tests/rcutorture.c
index 49311c82ea..0e799ff256 100644
--- a/tests/rcutorture.c
+++ b/tests/rcutorture.c
@@ -64,6 +64,7 @@
 #include "qemu/atomic.h"
 #include "qemu/rcu.h"
 #include "qemu/thread.h"
+#include "qapi/error.h"
 
 long long n_reads = 0LL;
 long n_updates = 0L;
@@ -90,7 +91,7 @@ static void create_thread(void *(*func)(void *))
         exit(-1);
     }
     qemu_thread_create(&threads[n_threads], "test", func, &data[n_threads],
-                       QEMU_THREAD_JOINABLE);
+                       QEMU_THREAD_JOINABLE, &error_abort);
     n_threads++;
 }
 
diff --git a/tests/test-aio.c b/tests/test-aio.c
index 86fb73b3d5..b3ac261724 100644
--- a/tests/test-aio.c
+++ b/tests/test-aio.c
@@ -154,7 +154,7 @@ static void test_acquire(void)
 
     qemu_thread_create(&thread, "test_acquire_thread",
                        test_acquire_thread,
-                       &data, QEMU_THREAD_JOINABLE);
+                       &data, QEMU_THREAD_JOINABLE, &error_abort);
 
     /* Block in aio_poll(), let other thread kick us and acquire context */
     aio_context_acquire(ctx);
diff --git a/tests/test-rcu-list.c b/tests/test-rcu-list.c
index 2e6f70bd59..0f7da81291 100644
--- a/tests/test-rcu-list.c
+++ b/tests/test-rcu-list.c
@@ -25,6 +25,7 @@
 #include "qemu/rcu.h"
 #include "qemu/thread.h"
 #include "qemu/rcu_queue.h"
+#include "qapi/error.h"
 
 /*
  * Test variables.
@@ -68,7 +69,7 @@ static void create_thread(void *(*func)(void *))
         exit(-1);
     }
     qemu_thread_create(&threads[n_threads], "test", func, &data[n_threads],
-                       QEMU_THREAD_JOINABLE);
+                       QEMU_THREAD_JOINABLE, &error_abort);
     n_threads++;
 }
 
diff --git a/ui/vnc-jobs.c b/ui/vnc-jobs.c
index 929391f85d..35a652d1fd 100644
--- a/ui/vnc-jobs.c
+++ b/ui/vnc-jobs.c
@@ -31,6 +31,7 @@
 #include "vnc-jobs.h"
 #include "qemu/sockets.h"
 #include "qemu/main-loop.h"
+#include "qapi/error.h"
 #include "block/aio.h"
 
 /*
@@ -331,15 +332,21 @@ static bool vnc_worker_thread_running(void)
     return queue; /* Check global queue */
 }
 
-void vnc_start_worker_thread(void)
+bool vnc_start_worker_thread(Error **errp)
 {
     VncJobQueue *q;
 
-    if (vnc_worker_thread_running())
-        return ;
+    if (vnc_worker_thread_running()) {
+        goto out;
+    }
 
     q = vnc_queue_init();
-    qemu_thread_create(&q->thread, "vnc_worker", vnc_worker_thread, q,
-                       QEMU_THREAD_DETACHED);
+    if (!qemu_thread_create(&q->thread, "vnc_worker", vnc_worker_thread,
+                            q, QEMU_THREAD_DETACHED, errp)) {
+        vnc_queue_clear(q);
+        return false;
+    }
     queue = q; /* Set global queue */
+out:
+    return true;
 }
diff --git a/ui/vnc-jobs.h b/ui/vnc-jobs.h
index 59f66bcc35..14640593db 100644
--- a/ui/vnc-jobs.h
+++ b/ui/vnc-jobs.h
@@ -37,7 +37,7 @@ void vnc_job_push(VncJob *job);
 void vnc_jobs_join(VncState *vs);
 
 void vnc_jobs_consume_buffer(VncState *vs);
-void vnc_start_worker_thread(void);
+bool vnc_start_worker_thread(Error **errp);
 
 /* Locks */
 static inline int vnc_trylock_display(VncDisplay *vd)
diff --git a/ui/vnc.c b/ui/vnc.c
index 0c1b477425..0ffe9e6a5d 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -3236,7 +3236,9 @@ void vnc_display_init(const char *id, Error **errp)
     vd->connections_limit = 32;
 
     qemu_mutex_init(&vd->mutex);
-    vnc_start_worker_thread();
+    if (!vnc_start_worker_thread(errp)) {
+        return;
+    }
 
     vd->dcl.ops = &dcl_ops;
     register_displaychangelistener(&vd->dcl);
diff --git a/util/compatfd.c b/util/compatfd.c
index 980bd33e52..886aa249f9 100644
--- a/util/compatfd.c
+++ b/util/compatfd.c
@@ -16,6 +16,7 @@
 #include "qemu/osdep.h"
 #include "qemu-common.h"
 #include "qemu/thread.h"
+#include "qapi/error.h"
 
 #include <sys/syscall.h>
 
@@ -70,6 +71,7 @@ static int qemu_signalfd_compat(const sigset_t *mask)
     struct sigfd_compat_info *info;
     QemuThread thread;
     int fds[2];
+    Error *local_err = NULL;
 
     info = malloc(sizeof(*info));
     if (info == NULL) {
@@ -88,8 +90,14 @@ static int qemu_signalfd_compat(const sigset_t *mask)
     memcpy(&info->mask, mask, sizeof(*mask));
     info->fd = fds[1];
 
-    qemu_thread_create(&thread, "signalfd_compat", sigwait_compat, info,
-                       QEMU_THREAD_DETACHED);
+    if (!qemu_thread_create(&thread, "signalfd_compat", sigwait_compat,
+                            info, QEMU_THREAD_DETACHED, &local_err)) {
+        error_reportf_err(local_err, "failed to create sigwait_compat: ");
+        close(fds[0]);
+        close(fds[1]);
+        free(info);
+        return -1;
+    }
 
     return fds[0];
 }
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index c1bee2a581..2c779fd634 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -437,9 +437,12 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
     size_t size_per_thread;
     char *addr = area;
     int i = 0;
+    int started_thread = 0;
+    Error *local_err = NULL;
 
     memset_thread_failed = false;
     memset_num_threads = get_memset_num_threads(smp_cpus);
+    started_thread = memset_num_threads;
     memset_thread = g_new0(MemsetThread, memset_num_threads);
     numpages_per_thread = (numpages / memset_num_threads);
     size_per_thread = (hpagesize * numpages_per_thread);
@@ -448,13 +451,19 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
         memset_thread[i].numpages = (i == (memset_num_threads - 1)) ?
                                     numpages : numpages_per_thread;
         memset_thread[i].hpagesize = hpagesize;
-        qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
-                           do_touch_pages, &memset_thread[i],
-                           QEMU_THREAD_JOINABLE);
+        if (!qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
+                                do_touch_pages, &memset_thread[i],
+                                QEMU_THREAD_JOINABLE, &local_err)) {
+            error_reportf_err(local_err, "failed to create do_touch_pages: ");
+            memset_thread_failed = true;
+            started_thread = i;
+            goto out;
+        }
         addr += size_per_thread;
         numpages -= numpages_per_thread;
     }
-    for (i = 0; i < memset_num_threads; i++) {
+out:
+    for (i = 0; i < started_thread; i++) {
         qemu_thread_join(&memset_thread[i].pgthread);
     }
     g_free(memset_thread);
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index 865e476df5..81b40a1ece 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -15,6 +15,7 @@
 #include "qemu/atomic.h"
 #include "qemu/notify.h"
 #include "qemu-thread-common.h"
+#include "qapi/error.h"
 
 static bool name_threads;
 
@@ -500,9 +501,9 @@ static void *qemu_thread_start(void *args)
     return r;
 }
 
-void qemu_thread_create(QemuThread *thread, const char *name,
-                       void *(*start_routine)(void*),
-                       void *arg, int mode)
+bool qemu_thread_create(QemuThread *thread, const char *name,
+                        void *(*start_routine)(void *),
+                        void *arg, int mode, Error **errp)
 {
     sigset_t set, oldset;
     int err;
@@ -511,7 +512,9 @@ void qemu_thread_create(QemuThread *thread, const char *name,
 
     err = pthread_attr_init(&attr);
     if (err) {
-        error_exit(err, __func__);
+        error_setg_errno(errp, -err, "pthread_attr_init failed: %s",
+                         strerror(err));
+        return false;
     }
 
     if (mode == QEMU_THREAD_DETACHED) {
@@ -526,16 +529,21 @@ void qemu_thread_create(QemuThread *thread, const char *name,
     qemu_thread_args->name = g_strdup(name);
     qemu_thread_args->start_routine = start_routine;
     qemu_thread_args->arg = arg;
-
     err = pthread_create(&thread->thread, &attr,
                          qemu_thread_start, qemu_thread_args);
-
-    if (err)
-        error_exit(err, __func__);
+    if (err) {
+        error_setg_errno(errp, -err, "pthread_create failed: %s",
+                         strerror(err));
+        pthread_attr_destroy(&attr);
+        g_free(qemu_thread_args->name);
+        g_free(qemu_thread_args);
+        return false;
+    }
 
     pthread_sigmask(SIG_SETMASK, &oldset, NULL);
 
     pthread_attr_destroy(&attr);
+    return true;
 }
 
 void qemu_thread_get_self(QemuThread *thread)
diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
index 4a363ca675..57b1143e97 100644
--- a/util/qemu-thread-win32.c
+++ b/util/qemu-thread-win32.c
@@ -20,6 +20,7 @@
 #include "qemu/thread.h"
 #include "qemu/notify.h"
 #include "qemu-thread-common.h"
+#include "qapi/error.h"
 #include <process.h>
 
 static bool name_threads;
@@ -388,9 +389,9 @@ void *qemu_thread_join(QemuThread *thread)
     return ret;
 }
 
-void qemu_thread_create(QemuThread *thread, const char *name,
-                       void *(*start_routine)(void *),
-                       void *arg, int mode)
+bool qemu_thread_create(QemuThread *thread, const char *name,
+                        void *(*start_routine)(void *),
+                        void *arg, int mode, Error **errp)
 {
     HANDLE hThread;
     struct QemuThreadData *data;
@@ -409,10 +410,17 @@ void qemu_thread_create(QemuThread *thread, const char *name,
     hThread = (HANDLE) _beginthreadex(NULL, 0, win32_start_routine,
                                       data, 0, &thread->tid);
     if (!hThread) {
-        error_exit(GetLastError(), __func__);
+        if (data->mode != QEMU_THREAD_DETACHED) {
+            DeleteCriticalSection(&data->cs);
+        }
+        error_setg_errno(errp, errno,
+                         "failed to create win32_start_routine");
+        g_free(data);
+        return false;
     }
     CloseHandle(hThread);
     thread->data = data;
+    return true;
 }
 
 void qemu_thread_get_self(QemuThread *thread)
diff --git a/util/rcu.c b/util/rcu.c
index 5676c22bd1..145dcdb0c6 100644
--- a/util/rcu.c
+++ b/util/rcu.c
@@ -32,6 +32,7 @@
 #include "qemu/atomic.h"
 #include "qemu/thread.h"
 #include "qemu/main-loop.h"
+#include "qapi/error.h"
 #if defined(CONFIG_MALLOC_TRIM)
 #include <malloc.h>
 #endif
@@ -325,7 +326,7 @@ static void rcu_init_complete(void)
      * must have been quiescent even after forking, just recreate it.
      */
     qemu_thread_create(&thread, "call_rcu", call_rcu_thread,
-                       NULL, QEMU_THREAD_DETACHED);
+                       NULL, QEMU_THREAD_DETACHED, &error_abort);
 
     rcu_register_thread();
 }
diff --git a/util/thread-pool.c b/util/thread-pool.c
index 610646d131..ad0f980783 100644
--- a/util/thread-pool.c
+++ b/util/thread-pool.c
@@ -22,6 +22,7 @@
 #include "trace.h"
 #include "block/thread-pool.h"
 #include "qemu/main-loop.h"
+#include "qapi/error.h"
 
 static void do_spawn_thread(ThreadPool *pool);
 
@@ -132,7 +133,8 @@ static void do_spawn_thread(ThreadPool *pool)
     pool->new_threads--;
     pool->pending_threads++;
 
-    qemu_thread_create(&t, "worker", worker_thread, pool, QEMU_THREAD_DETACHED);
+    qemu_thread_create(&t, "worker", worker_thread, pool,
+                       QEMU_THREAD_DETACHED, &error_abort);
 }
 
 static void spawn_thread_bh_fn(void *opaque)
-- 
2.13.7

* [Qemu-devel] [PATCH for-4.0 v8 7/7] qemu_thread_join: fix segmentation fault
  2018-12-11  9:50 [Qemu-devel] [PATCH for-4.0 v8 0/7] qemu_thread_create: propagate errors to callers to check Fei Li
                   ` (5 preceding siblings ...)
  2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 6/7] qemu_thread_create: propagate the error to callers to handle Fei Li
@ 2018-12-11  9:50 ` Fei Li
  6 siblings, 0 replies; 27+ messages in thread
From: Fei Li @ 2018-12-11  9:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: Stefan Weil

To avoid a segmentation fault in qemu_thread_join(), return immediately
when the QemuThread *thread was never successfully created, in both
qemu-thread-posix.c and qemu-thread-win32.c.
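
A minimal sketch of such a guard, written against plain pthreads rather
than QEMU's API.  DemoThread, demo_worker(), demo_thread_create() and
demo_thread_join() are hypothetical names, and the explicit `created`
flag is an assumption of this sketch: the patch below tests
thread->thread and thread->data directly instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for QemuThread; not QEMU's real layout. */
typedef struct DemoThread {
    pthread_t thread;
    bool created;   /* set only after pthread_create() succeeds */
} DemoThread;

/* Trivial worker: hands its argument back as the thread's result. */
void *demo_worker(void *arg)
{
    return arg;
}

/* Returns true on success; on failure, t->created stays false. */
bool demo_thread_create(DemoThread *t, void *(*fn)(void *), void *arg)
{
    assert(t && fn);
    t->created = false;
    if (pthread_create(&t->thread, NULL, fn, arg) != 0) {
        return false;
    }
    t->created = true;
    return true;
}

/* Joining a thread that was never created returns NULL, no crash. */
void *demo_thread_join(DemoThread *t)
{
    void *ret = NULL;

    if (!t->created) {
        return NULL;
    }
    if (pthread_join(t->thread, &ret) != 0) {
        return NULL;
    }
    t->created = false;   /* a second join becomes a no-op */
    return ret;
}
```

An explicit flag avoids relying on a zero pthread_t meaning "no
thread", which POSIX does not guarantee.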

Cc: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Fei Li <fli@suse.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 util/qemu-thread-posix.c | 3 +++
 util/qemu-thread-win32.c | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index 81b40a1ece..cb3722a4f4 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -566,6 +566,9 @@ void *qemu_thread_join(QemuThread *thread)
     int err;
     void *ret;
 
+    if (!thread->thread) {
+        return NULL;
+    }
     err = pthread_join(thread->thread, &ret);
     if (err) {
         error_exit(err, __func__);
diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
index 57b1143e97..ca4d5329e3 100644
--- a/util/qemu-thread-win32.c
+++ b/util/qemu-thread-win32.c
@@ -367,7 +367,7 @@ void *qemu_thread_join(QemuThread *thread)
     HANDLE handle;
 
     data = thread->data;
-    if (data->mode == QEMU_THREAD_DETACHED) {
+    if (data == NULL || data->mode == QEMU_THREAD_DETACHED) {
         return NULL;
     }
 
-- 
2.13.7

* Re: [Qemu-devel] [PATCH for-4.0 v8 3/7] migration: fix the multifd code when receiving less channels
  2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 3/7] migration: fix the multifd code when receiving less channels Fei Li
@ 2018-12-13  6:17   ` Markus Armbruster
  2018-12-17 11:45     ` Fei Li
  0 siblings, 1 reply; 27+ messages in thread
From: Markus Armbruster @ 2018-12-13  6:17 UTC (permalink / raw)
  To: Fei Li; +Cc: qemu-devel, Dr . David Alan Gilbert

Fei Li <fli@suse.com> writes:

> In our current code, when multifd is used during migration, if there
> is an error before the destination receives all new channels, the
> source keeps running; the destination, however, does not exit but keeps
> waiting until the source is killed deliberately.
>
> Fix this by dumping the specific error and letting users decide whether
> to quit from the destination side when it fails to receive a packet via
> some channel.
>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Fei Li <fli@suse.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/channel.c   | 11 ++++++-----
>  migration/migration.c |  9 +++++++--
>  migration/migration.h |  2 +-
>  migration/ram.c       | 10 ++++++++--
>  migration/ram.h       |  2 +-
>  5 files changed, 23 insertions(+), 11 deletions(-)
>
> diff --git a/migration/channel.c b/migration/channel.c
> index 33e0e9b82f..20e4c8e2dc 100644
> --- a/migration/channel.c
> +++ b/migration/channel.c
> @@ -30,6 +30,7 @@
>  void migration_channel_process_incoming(QIOChannel *ioc)
>  {
>      MigrationState *s = migrate_get_current();
> +    Error *local_err = NULL;
>  
>      trace_migration_set_incoming_channel(
>          ioc, object_get_typename(OBJECT(ioc)));
> @@ -38,13 +39,13 @@ void migration_channel_process_incoming(QIOChannel *ioc)
>          *s->parameters.tls_creds &&
>          !object_dynamic_cast(OBJECT(ioc),
>                               TYPE_QIO_CHANNEL_TLS)) {
> -        Error *local_err = NULL;
>          migration_tls_channel_process_incoming(s, ioc, &local_err);
> -        if (local_err) {
> -            error_report_err(local_err);
> -        }
>      } else {
> -        migration_ioc_process_incoming(ioc);
> +        migration_ioc_process_incoming(ioc, &local_err);
> +    }
> +
> +    if (local_err) {
> +        error_report_err(local_err);
>      }
>  }
>  
> diff --git a/migration/migration.c b/migration/migration.c
> index 49ffb9997a..72106bddf0 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -541,7 +541,7 @@ void migration_fd_process_incoming(QEMUFile *f)
>      migration_incoming_process();
>  }
>  
> -void migration_ioc_process_incoming(QIOChannel *ioc)
> +void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
>  {
>      MigrationIncomingState *mis = migration_incoming_get_current();
>      bool start_migration;
> @@ -563,9 +563,14 @@ void migration_ioc_process_incoming(QIOChannel *ioc)
>           */
>          start_migration = !migrate_use_multifd();
>      } else {
> +        Error *local_err = NULL;
>          /* Multiple connections */
>          assert(migrate_use_multifd());
> -        start_migration = multifd_recv_new_channel(ioc);
> +        start_migration = multifd_recv_new_channel(ioc, &local_err);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            return;
> +        }
>      }
>  
>      if (start_migration) {
> diff --git a/migration/migration.h b/migration/migration.h
> index e413d4d8b6..02b7304610 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -229,7 +229,7 @@ struct MigrationState
>  void migrate_set_state(int *state, int old_state, int new_state);
>  
>  void migration_fd_process_incoming(QEMUFile *f);
> -void migration_ioc_process_incoming(QIOChannel *ioc);
> +void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
>  void migration_incoming_process(void);
>  
>  bool  migration_has_all_channels(void);
> diff --git a/migration/ram.c b/migration/ram.c
> index 7e7deec4d8..c7e3d6b0fd 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1323,7 +1323,7 @@ bool multifd_recv_all_channels_created(void)
>  }
>  
>  /* Return true if multifd is ready for the migration, otherwise false */
> -bool multifd_recv_new_channel(QIOChannel *ioc)
> +bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
>  {
>      MultiFDRecvParams *p;
>      Error *local_err = NULL;
> @@ -1331,6 +1331,10 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>  
>      id = multifd_recv_initial_packet(ioc, &local_err);
>      if (id < 0) {
> +        error_propagate_prepend(errp, local_err,
> +                                "failed to receive packet"
> +                                " via multifd channel %d: ",
> +                                atomic_read(&multifd_recv_state->count));
>          multifd_recv_terminate_threads(local_err);
>          return false;

Here, we return false after setting an error (error_propagate_prepend()
stores it in errp).

>      }
> @@ -1340,6 +1344,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>          error_setg(&local_err, "multifd: received id '%d' already setup'",
>                     id);
>          multifd_recv_terminate_threads(local_err);
> +        error_propagate(errp, local_err);
>          return false;

Here, we return false after setting an error.

>      }
>      p->c = ioc;
> @@ -1351,7 +1356,8 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>      qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p,
>                         QEMU_THREAD_JOINABLE);
>      atomic_inc(&multifd_recv_state->count);
> -    return multifd_recv_state->count == migrate_multifd_channels();
> +    return atomic_read(&multifd_recv_state->count) ==
> +           migrate_multifd_channels();

Here, we return either true or false without setting an error.

Taken together, there are three cases:

1. Succeed and return true

2. Succeed and return false

3. Fail (set an error) and return false.

Assuming that's what we want: please update the function comment to
spell them out.
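
For illustration only, here is how such a function comment could spell
out the three cases.  This is a self-contained sketch, not the multifd
code: DemoRecvState, DEMO_CHANNELS and demo_recv_new_channel() are
hypothetical, and a plain `const char **` stands in for Error **.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_CHANNELS 2   /* hypothetical number of expected channels */

typedef struct DemoRecvState {
    bool setup[DEMO_CHANNELS];  /* which channel ids are already set up */
    int count;                  /* channels accepted so far */
} DemoRecvState;

/*
 * Register channel @id.
 *
 * 1. Success, all channels now present: return true, *errmsg untouched.
 * 2. Success, more channels expected: return false, *errmsg untouched.
 * 3. Failure: set *errmsg and return false.
 *
 * A false return alone is ambiguous; callers must check *errmsg to
 * tell case 2 from case 3.
 */
bool demo_recv_new_channel(DemoRecvState *s, int id, const char **errmsg)
{
    assert(s && errmsg);

    if (id < 0 || id >= DEMO_CHANNELS) {
        *errmsg = "invalid channel id";
        return false;
    }
    if (s->setup[id]) {
        *errmsg = "channel id already set up";
        return false;
    }
    s->setup[id] = true;
    s->count++;
    return s->count == DEMO_CHANNELS;
}
```

The caller has to look at the error, not at the return value, to decide
whether something went wrong.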

>  }
>  
>  /**
> diff --git a/migration/ram.h b/migration/ram.h
> index 83ff1bc11a..046d3074be 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -47,7 +47,7 @@ int multifd_save_cleanup(Error **errp);
>  int multifd_load_setup(void);
>  int multifd_load_cleanup(Error **errp);
>  bool multifd_recv_all_channels_created(void);
> -bool multifd_recv_new_channel(QIOChannel *ioc);
> +bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
>  
>  uint64_t ram_pagesize_summary(void);
>  int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);

* Re: [Qemu-devel] [PATCH for-4.0 v8 6/7] qemu_thread_create: propagate the error to callers to handle
  2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 6/7] qemu_thread_create: propagate the error to callers to handle Fei Li
@ 2018-12-13  7:26   ` Markus Armbruster
  2018-12-14  0:24     ` David Gibson
  2018-12-17  7:29     ` Fei Li
  0 siblings, 2 replies; 27+ messages in thread
From: Markus Armbruster @ 2018-12-13  7:26 UTC (permalink / raw)
  To: Fei Li; +Cc: qemu-devel, Dr . David Alan Gilbert, David Gibson

There's a question for David Gibson inline.  Please search for /ppc/.

Fei Li <fli@suse.com> writes:

> Make qemu_thread_create() return a Boolean to indicate if it succeeds
> rather than failing with an error. And add an Error parameter to hold
> the error message and let the callers handle it.

The "rather than failing with an error" is misleading.  Before the
patch, we report to stderr and abort().  What about:

    qemu-thread: Make qemu_thread_create() handle errors properly

    qemu_thread_create() abort()s on error.  Not nice.  Give it a
    return value and an Error ** argument, so it can return success /
    failure.
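
A rough sketch of that shape in plain pthreads, to make the proposed
contract concrete.  DemoError, checked_thread_create(),
demo_increment() and demo_start_worker() are hypothetical; DemoError is
only a message buffer standing in for QEMU's Error object.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error: just a message buffer. */
typedef struct DemoError {
    char msg[128];
} DemoError;

/*
 * Create a thread.  Returns true on success.  On failure, returns
 * false and, if errp is non-NULL, stores a description in *errp.
 * Nothing is printed and nothing aborts here.
 */
bool checked_thread_create(pthread_t *thread,
                           void *(*start_routine)(void *),
                           void *arg, DemoError *errp)
{
    int err;

    assert(thread && start_routine);
    err = pthread_create(thread, NULL, start_routine, arg);
    if (err) {
        if (errp) {
            snprintf(errp->msg, sizeof(errp->msg),
                     "pthread_create failed: %s", strerror(err));
        }
        return false;
    }
    return true;
}

/* Trivial worker: increments the int it is given. */
void *demo_increment(void *arg)
{
    ++*(int *)arg;
    return NULL;
}

/* A caller that can itself fail forwards errp and checks the result. */
bool demo_start_worker(int *counter, DemoError *errp)
{
    pthread_t thread;

    if (!checked_thread_create(&thread, demo_increment, counter, errp)) {
        return false;
    }
    return pthread_join(thread, NULL) == 0;
}
```

The decision about what to do on failure moves from the helper to each
caller, which is why the callers matter for the commit message.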

Still missing from the commit message then: how you update the callers.
Let's see below.

>
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Daniel P. Berrangé <berrange@redhat.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Fei Li <fli@suse.com>
> ---
>  cpus.c                      | 45 ++++++++++++++++++++++++-------------
>  dump.c                      |  6 +++--
>  hw/misc/edu.c               |  6 +++--
>  hw/ppc/spapr_hcall.c        | 10 +++++++--
>  hw/rdma/rdma_backend.c      |  4 +++-
>  hw/usb/ccid-card-emulated.c | 16 ++++++++++----
>  include/qemu/thread.h       |  4 ++--
>  io/task.c                   |  3 ++-
>  iothread.c                  | 16 +++++++++-----
>  migration/migration.c       | 54 +++++++++++++++++++++++++++++----------------
>  migration/postcopy-ram.c    | 14 ++++++++++--
>  migration/ram.c             | 40 ++++++++++++++++++++++++---------
>  migration/savevm.c          | 11 ++++++---
>  tests/atomic_add-bench.c    |  3 ++-
>  tests/iothread.c            |  2 +-
>  tests/qht-bench.c           |  3 ++-
>  tests/rcutorture.c          |  3 ++-
>  tests/test-aio.c            |  2 +-
>  tests/test-rcu-list.c       |  3 ++-
>  ui/vnc-jobs.c               | 17 +++++++++-----
>  ui/vnc-jobs.h               |  2 +-
>  ui/vnc.c                    |  4 +++-
>  util/compatfd.c             | 12 ++++++++--
>  util/oslib-posix.c          | 17 ++++++++++----
>  util/qemu-thread-posix.c    | 24 +++++++++++++-------
>  util/qemu-thread-win32.c    | 16 ++++++++++----
>  util/rcu.c                  |  3 ++-
>  util/thread-pool.c          |  4 +++-
>  28 files changed, 243 insertions(+), 101 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index 7b091bda53..e8450e518a 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1961,15 +1961,20 @@ static void qemu_tcg_init_vcpu(CPUState *cpu, Error **errp)
>              snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
>                   cpu->cpu_index);
>  
> -            qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
> -                               cpu, QEMU_THREAD_JOINABLE);
> +            if (!qemu_thread_create(cpu->thread, thread_name,
> +                                    qemu_tcg_cpu_thread_fn, cpu,
> +                                    QEMU_THREAD_JOINABLE, errp)) {
> +                return;
> +            }
>  
>          } else {
>              /* share a single thread for all cpus with TCG */
>              snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "ALL CPUs/TCG");
> -            qemu_thread_create(cpu->thread, thread_name,
> -                               qemu_tcg_rr_cpu_thread_fn,
> -                               cpu, QEMU_THREAD_JOINABLE);
> +            if (!qemu_thread_create(cpu->thread, thread_name,
> +                                    qemu_tcg_rr_cpu_thread_fn, cpu,
> +                                    QEMU_THREAD_JOINABLE, errp)) {
> +                return;
> +            }
>  
>              single_tcg_halt_cond = cpu->halt_cond;
>              single_tcg_cpu_thread = cpu->thread;

This is a caller that sets an error on failure.  You make it set an
error on qemu_thread_create() failure.  Makes sense.

> @@ -1997,8 +2002,10 @@ static void qemu_hax_start_vcpu(CPUState *cpu, Error **errp)
>  
>      snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HAX",
>               cpu->cpu_index);
> -    qemu_thread_create(cpu->thread, thread_name, qemu_hax_cpu_thread_fn,
> -                       cpu, QEMU_THREAD_JOINABLE);
> +    if (!qemu_thread_create(cpu->thread, thread_name, qemu_hax_cpu_thread_fn,
> +                            cpu, QEMU_THREAD_JOINABLE, errp)) {
> +        return;
> +    }
>  #ifdef _WIN32
>      cpu->hThread = qemu_thread_get_handle(cpu->thread);
>  #endif

Likewise.  I'll stop commenting on this pattern now.

> @@ -2013,8 +2020,10 @@ static void qemu_kvm_start_vcpu(CPUState *cpu, Error **errp)
>      qemu_cond_init(cpu->halt_cond);
>      snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/KVM",
>               cpu->cpu_index);
> -    qemu_thread_create(cpu->thread, thread_name, qemu_kvm_cpu_thread_fn,
> -                       cpu, QEMU_THREAD_JOINABLE);
> +    if (!qemu_thread_create(cpu->thread, thread_name, qemu_kvm_cpu_thread_fn,
> +                            cpu, QEMU_THREAD_JOINABLE, errp)) {
> +        /* keep 'if' here in case there is further error handling logic */
> +    }
>  }
>  
>  static void qemu_hvf_start_vcpu(CPUState *cpu, Error **errp)
> @@ -2031,8 +2040,10 @@ static void qemu_hvf_start_vcpu(CPUState *cpu, Error **errp)
>  
>      snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
>               cpu->cpu_index);
> -    qemu_thread_create(cpu->thread, thread_name, qemu_hvf_cpu_thread_fn,
> -                       cpu, QEMU_THREAD_JOINABLE);
> +    if (!qemu_thread_create(cpu->thread, thread_name, qemu_hvf_cpu_thread_fn,
> +                            cpu, QEMU_THREAD_JOINABLE, errp)) {
> +        /* keep 'if' here in case there is further error handling logic */
> +    }
>  }
>  
>  static void qemu_whpx_start_vcpu(CPUState *cpu, Error **errp)
> @@ -2044,8 +2055,10 @@ static void qemu_whpx_start_vcpu(CPUState *cpu, Error **errp)
>      qemu_cond_init(cpu->halt_cond);
>      snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/WHPX",
>               cpu->cpu_index);
> -    qemu_thread_create(cpu->thread, thread_name, qemu_whpx_cpu_thread_fn,
> -                       cpu, QEMU_THREAD_JOINABLE);
> +    if (!qemu_thread_create(cpu->thread, thread_name, qemu_whpx_cpu_thread_fn,
> +                            cpu, QEMU_THREAD_JOINABLE, errp)) {
> +        return;
> +    }
>  #ifdef _WIN32
>      cpu->hThread = qemu_thread_get_handle(cpu->thread);
>  #endif
> @@ -2060,8 +2073,10 @@ static void qemu_dummy_start_vcpu(CPUState *cpu, Error **errp)
>      qemu_cond_init(cpu->halt_cond);
>      snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/DUMMY",
>               cpu->cpu_index);
> -    qemu_thread_create(cpu->thread, thread_name, qemu_dummy_cpu_thread_fn, cpu,
> -                       QEMU_THREAD_JOINABLE);
> +    if (!qemu_thread_create(cpu->thread, thread_name, qemu_dummy_cpu_thread_fn,
> +                            cpu, QEMU_THREAD_JOINABLE, errp)) {
> +        /* keep 'if' here in case there is further error handling logic */
> +    }
>  }
>  
>  bool qemu_init_vcpu(CPUState *cpu, Error **errp)
> diff --git a/dump.c b/dump.c
> index 4ec94c5e25..1f003aff9a 100644
> --- a/dump.c
> +++ b/dump.c
> @@ -2020,8 +2020,10 @@ void qmp_dump_guest_memory(bool paging, const char *file,
>      if (detach_p) {
>          /* detached dump */
>          s->detached = true;
> -        qemu_thread_create(&s->dump_thread, "dump_thread", dump_thread,
> -                           s, QEMU_THREAD_DETACHED);
> +        if (!qemu_thread_create(&s->dump_thread, "dump_thread", dump_thread,
> +                                s, QEMU_THREAD_DETACHED, errp)) {
> +            /* keep 'if' here in case there is further error handling logic */
> +        }
>      } else {
>          /* sync dump */
>          dump_process(s, errp);
> diff --git a/hw/misc/edu.c b/hw/misc/edu.c
> index cdcf550dd7..6684c60a96 100644
> --- a/hw/misc/edu.c
> +++ b/hw/misc/edu.c
> @@ -355,8 +355,10 @@ static void pci_edu_realize(PCIDevice *pdev, Error **errp)
>  
>      qemu_mutex_init(&edu->thr_mutex);
>      qemu_cond_init(&edu->thr_cond);
> -    qemu_thread_create(&edu->thread, "edu", edu_fact_thread,
> -                       edu, QEMU_THREAD_JOINABLE);
> +    if (!qemu_thread_create(&edu->thread, "edu", edu_fact_thread,
> +                            edu, QEMU_THREAD_JOINABLE, errp)) {
> +        return;
> +    }
>  
>      memory_region_init_io(&edu->mmio, OBJECT(edu), &edu_mmio_ops, edu,
>                      "edu-mmio", 1 * MiB);
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index ae913d070f..7c16ade04a 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -478,6 +478,7 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
>      sPAPRPendingHPT *pending = spapr->pending_hpt;
>      uint64_t current_ram_size;
>      int rc;
> +    Error *local_err = NULL;
>  
>      if (spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED) {
>          return H_AUTHORITY;
> @@ -538,8 +539,13 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
>      pending->shift = shift;
>      pending->ret = H_HARDWARE;
>  
> -    qemu_thread_create(&pending->thread, "sPAPR HPT prepare",
> -                       hpt_prepare_thread, pending, QEMU_THREAD_DETACHED);
> +    if (!qemu_thread_create(&pending->thread, "sPAPR HPT prepare",
> +                            hpt_prepare_thread, pending,
> +                            QEMU_THREAD_DETACHED, &local_err)) {
> +        error_reportf_err(local_err, "failed to create hpt_prepare_thread: ");
> +        g_free(pending);
> +        return H_RESOURCE;
> +    }
>  
>      spapr->pending_hpt = pending;
>  

This is a caller that returns an error code on failure.  You change it
to report the error, then return failure.  The return failure part looks
fine.  Whether reporting the error is appropriate I can't say for sure.
No other failure mode reports anything.  David, what do you think?

Fei Li, you could pass &error_abort to side-step this question for now.
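
The "report, clean up, return a failure code" variant can be sketched
in plain C as follows.  Everything named demo_* or DEMO_* is
hypothetical, the status values are arbitrary, and the spawn hook only
stands in for qemu_thread_create() so that both outcomes can be
exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

enum { DEMO_OK = 0, DEMO_RESOURCE = -16 };  /* arbitrary demo codes */

typedef struct DemoPending {
    int shift;
} DemoPending;

/* Hypothetical spawn hook: returns false and fills errbuf on failure. */
typedef bool (*DemoSpawnFn)(DemoPending *pending, char *errbuf, size_t len);

bool demo_spawn_ok(DemoPending *pending, char *errbuf, size_t len)
{
    (void)pending;
    (void)errbuf;
    (void)len;
    return true;
}

bool demo_spawn_fail(DemoPending *pending, char *errbuf, size_t len)
{
    (void)pending;
    snprintf(errbuf, len, "out of threads");
    return false;
}

/* Returns a status code; *out is written only on success. */
int demo_prepare(int shift, DemoSpawnFn spawn, DemoPending **out)
{
    char errbuf[64] = "";
    DemoPending *pending;

    assert(spawn && out);
    pending = malloc(sizeof(*pending));
    if (!pending) {
        return DEMO_RESOURCE;
    }
    pending->shift = shift;

    if (!spawn(pending, errbuf, sizeof(errbuf))) {
        /* Report, undo the allocation, then return the failure code. */
        fprintf(stderr, "failed to create worker: %s\n", errbuf);
        free(pending);
        return DEMO_RESOURCE;
    }

    *out = pending;   /* publish only once the worker exists */
    return DEMO_OK;
}
```

Passing &error_abort instead would remove the whole failure branch and
keep the current abort-on-failure behaviour.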

> diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
> index d7a4bbd91f..53a2bd0d85 100644
> --- a/hw/rdma/rdma_backend.c
> +++ b/hw/rdma/rdma_backend.c
> @@ -164,8 +164,10 @@ static void start_comp_thread(RdmaBackendDev *backend_dev)
>      snprintf(thread_name, sizeof(thread_name), "rdma_comp_%s",
>               ibv_get_device_name(backend_dev->ib_dev));
>      backend_dev->comp_thread.run = true;
> +    /* FIXME: let the further caller handle the error instead of abort() here */
>      qemu_thread_create(&backend_dev->comp_thread.thread, thread_name,
> -                       comp_handler_thread, backend_dev, QEMU_THREAD_DETACHED);
> +                       comp_handler_thread, backend_dev,
> +                       QEMU_THREAD_DETACHED, &error_abort);
>  }
>  

This is a caller that can't return failure.  You pass &error_abort.  No
behavioral change.

I think I'd mark the spot TODO, not FIXME.  Matter of taste, I guess.

>  void rdma_backend_register_comp_handler(void (*handler)(int status,
> diff --git a/hw/usb/ccid-card-emulated.c b/hw/usb/ccid-card-emulated.c
> index 25976ed84f..c6783f124a 100644
> --- a/hw/usb/ccid-card-emulated.c
> +++ b/hw/usb/ccid-card-emulated.c
> @@ -33,6 +33,7 @@
>  #include "qemu/main-loop.h"
>  #include "ccid.h"
>  #include "qapi/error.h"
> +#include "qemu/error-report.h"
>  
>  #define DPRINTF(card, lvl, fmt, ...) \
>  do {\
> @@ -544,10 +545,17 @@ static void emulated_realize(CCIDCardState *base, Error **errp)
>          error_setg(errp, "%s: failed to initialize vcard", TYPE_EMULATED_CCID);
>          goto out2;
>      }
> -    qemu_thread_create(&card->event_thread_id, "ccid/event", event_thread,
> -                       card, QEMU_THREAD_JOINABLE);
> -    qemu_thread_create(&card->apdu_thread_id, "ccid/apdu", handle_apdu_thread,
> -                       card, QEMU_THREAD_JOINABLE);
> +    if (!qemu_thread_create(&card->event_thread_id, "ccid/event", event_thread,
> +                            card, QEMU_THREAD_JOINABLE, errp)) {
> +        error_report("failed to create event_thread");
> +        goto out2;
> +    }
> +    if (!qemu_thread_create(&card->apdu_thread_id, "ccid/apdu",
> +                            handle_apdu_thread, card,
> +                            QEMU_THREAD_JOINABLE, errp)) {
> +        error_report("failed to create handle_apdu_thread");
> +        goto out2;
> +    }
>  
>  out2:
>      clean_event_notifier(card);

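error_report() in a realize() method is almost certainly wrong.  The
failure is already stored in errp by qemu_thread_create(), and errp is
how realize() hands errors to its caller.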
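
A sketch of keeping such a failure inside errp and adding context as a
prefix, loosely in the spirit of QEMU's error_prepend().  DemoError and
all demo_* functions are hypothetical stand-ins, and the thread
creation helper is made to fail unconditionally so the error path is
exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct DemoError {
    char msg[128];
    bool set;
} DemoError;

void demo_error_set(DemoError *errp, const char *msg)
{
    if (errp) {
        snprintf(errp->msg, sizeof(errp->msg), "%s", msg);
        errp->set = true;
    }
}

/* Prefix an already-set error with context; no-op if nothing is set. */
void demo_error_prepend(DemoError *errp, const char *prefix)
{
    char tmp[sizeof(errp->msg)];

    if (!errp || !errp->set) {
        return;
    }
    snprintf(tmp, sizeof(tmp), "%s%s", prefix, errp->msg);
    memcpy(errp->msg, tmp, sizeof(tmp));
}

/* Always fails, so the error path below is taken. */
bool demo_thread_create_fail(DemoError *errp)
{
    demo_error_set(errp,
                   "pthread_create failed: Resource temporarily unavailable");
    return false;
}

/* realize-style function: the failure travels through errp only. */
void demo_realize(DemoError *errp)
{
    if (!demo_thread_create_fail(errp)) {
        demo_error_prepend(errp, "failed to create event thread: ");
        return;
    }
}
```

Whoever called realize() then decides whether and where the combined
message gets printed.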

> diff --git a/include/qemu/thread.h b/include/qemu/thread.h
> index 55d83a907c..12291f4ccd 100644
> --- a/include/qemu/thread.h
> +++ b/include/qemu/thread.h
> @@ -152,9 +152,9 @@ void qemu_event_reset(QemuEvent *ev);
>  void qemu_event_wait(QemuEvent *ev);
>  void qemu_event_destroy(QemuEvent *ev);
>  
> -void qemu_thread_create(QemuThread *thread, const char *name,
> +bool qemu_thread_create(QemuThread *thread, const char *name,
>                          void *(*start_routine)(void *),
> -                        void *arg, int mode);
> +                        void *arg, int mode, Error **errp);
>  void *qemu_thread_join(QemuThread *thread);
>  void qemu_thread_get_self(QemuThread *thread);
>  bool qemu_thread_is_self(QemuThread *thread);
> diff --git a/io/task.c b/io/task.c
> index 2886a2c1bc..6d3a18ab80 100644
> --- a/io/task.c
> +++ b/io/task.c
> @@ -149,7 +149,8 @@ void qio_task_run_in_thread(QIOTask *task,
>                         "io-task-worker",
>                         qio_task_thread_worker,
>                         data,
> -                       QEMU_THREAD_DETACHED);
> +                       QEMU_THREAD_DETACHED,
> +                       &error_abort);
>  }
>  
>  

This is a caller that can't return failure.  You pass &error_abort.  No
behavioral change.  Unlike above, you don't mark this spot FIXME.  Any
particular reason for marking one, but not the other?

I'll stop commenting on this pattern now.

> diff --git a/iothread.c b/iothread.c
> index 2fb1cdf55d..7335dacf0b 100644
> --- a/iothread.c
> +++ b/iothread.c
> @@ -164,9 +164,7 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
>                                  &local_error);
>      if (local_error) {
>          error_propagate(errp, local_error);
> -        aio_context_unref(iothread->ctx);
> -        iothread->ctx = NULL;
> -        return;
> +        goto fail;
>      }
>  
>      qemu_mutex_init(&iothread->init_done_lock);
> @@ -178,8 +176,12 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
>       */
>      name = object_get_canonical_path_component(OBJECT(obj));
>      thread_name = g_strdup_printf("IO %s", name);
> -    qemu_thread_create(&iothread->thread, thread_name, iothread_run,
> -                       iothread, QEMU_THREAD_JOINABLE);
> +    if (!qemu_thread_create(&iothread->thread, thread_name, iothread_run,
> +                            iothread, QEMU_THREAD_JOINABLE, errp)) {
> +        g_free(thread_name);
> +        g_free(name);
> +        goto fail;
> +    }
>      g_free(thread_name);
>      g_free(name);
>  
> @@ -190,6 +192,10 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
>                         &iothread->init_done_lock);
>      }
>      qemu_mutex_unlock(&iothread->init_done_lock);
> +    return;
> +fail:
> +    aio_context_unref(iothread->ctx);
> +    iothread->ctx = NULL;
>  }
>  
>  typedef struct {
> diff --git a/migration/migration.c b/migration/migration.c
> index 0537fc0c26..af6c72ac5d 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -438,19 +438,22 @@ static void process_incoming_migration_co(void *opaque)
>          /* Make sure all file formats flush their mutable metadata */
>          bdrv_invalidate_cache_all(&local_err);
>          if (local_err) {
> -            migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> -                    MIGRATION_STATUS_FAILED);
>              error_report_err(local_err);
> -            exit(EXIT_FAILURE);
> +            goto fail;
>          }
>  
>          if (colo_init_ram_cache() < 0) {
>              error_report("Init ram cache failed");
> -            exit(EXIT_FAILURE);
> +            goto fail;
>          }
>  
> -        qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
> -             colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
> +        if (!qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
> +                                colo_process_incoming_thread, mis,
> +                                QEMU_THREAD_JOINABLE, &local_err)) {
> +            error_reportf_err(local_err, "failed to create "
> +                              "colo_process_incoming_thread: ");
> +            goto fail;
> +        }
>          mis->have_colo_incoming_thread = true;
>          qemu_coroutine_yield();
>  
> @@ -461,20 +464,22 @@ static void process_incoming_migration_co(void *opaque)
>      }
>  
>      if (ret < 0) {
> -        Error *local_err = NULL;
> -
> -        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> -                          MIGRATION_STATUS_FAILED);
>          error_report("load of migration failed: %s", strerror(-ret));
> -        qemu_fclose(mis->from_src_file);
> -        if (multifd_load_cleanup(&local_err) != 0) {
> -            error_report_err(local_err);
> -        }
> -        exit(EXIT_FAILURE);
> +        goto fail;
>      }
>      mis->bh = qemu_bh_new(process_incoming_migration_bh, mis);
>      qemu_bh_schedule(mis->bh);
>      mis->migration_incoming_co = NULL;
> +    return;
> +fail:
> +    local_err = NULL;
> +    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
> +                      MIGRATION_STATUS_FAILED);
> +    qemu_fclose(mis->from_src_file);
> +    if (multifd_load_cleanup(&local_err) != 0) {
> +        error_report_err(local_err);
> +    }
> +    exit(EXIT_FAILURE);
>  }

You change handling of errors other than qemu_thread_create().  Separate
patch, please.  I'd put it before this one.

>  
>  static void migration_incoming_setup(QEMUFile *f)
> @@ -2345,6 +2350,7 @@ out:
>  static int open_return_path_on_source(MigrationState *ms,
>                                        bool create_thread)
>  {
> +    Error *local_err = NULL;
>  
>      ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
>      if (!ms->rp_state.from_dst_file) {
> @@ -2358,8 +2364,13 @@ static int open_return_path_on_source(MigrationState *ms,
>          return 0;
>      }
>  
> -    qemu_thread_create(&ms->rp_state.rp_thread, "return path",
> -                       source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
> +    if (!qemu_thread_create(&ms->rp_state.rp_thread, "return path",
> +                            source_return_path_thread, ms,
> +                            QEMU_THREAD_JOINABLE, &local_err)) {
> +        error_reportf_err(local_err,
> +                          "failed to create source_return_path_thread: ");
> +        return -1;
> +    }
>  
>      trace_open_return_path_on_source_continue();
>  

This is a caller that returns an error code on failure.  You change it
to report the error, then return failure.  This is okay, because its
sole caller also reports errors that way.

> @@ -3189,8 +3200,13 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
>          migrate_fd_cleanup(s);
>          return;
>      }
> -    qemu_thread_create(&s->thread, "live_migration", migration_thread, s,
> -                       QEMU_THREAD_JOINABLE);
> +    if (!qemu_thread_create(&s->thread, "live_migration", migration_thread,
> +                            s, QEMU_THREAD_JOINABLE, &error_in)) {
> +        error_reportf_err(error_in, "failed to create migration_thread: ");
> +        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
> +        migrate_fd_cleanup(s);
> +        return;
> +    }
>      s->migration_thread_running = true;
>  }

This is a caller that reports errors.  You make it handle
qemu_thread_create() the same way.  Good.

>  
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index fa09dba534..80bfa9c4a2 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -1083,6 +1083,8 @@ retry:
>  
>  int postcopy_ram_enable_notify(MigrationIncomingState *mis)
>  {
> +    Error *local_err = NULL;
> +
>      /* Open the fd for the kernel to give us userfaults */
>      mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
>      if (mis->userfault_fd == -1) {
> @@ -1109,8 +1111,16 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
>      }
>  
>      qemu_sem_init(&mis->fault_thread_sem, 0);
> -    qemu_thread_create(&mis->fault_thread, "postcopy/fault",
> -                       postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
> +    if (!qemu_thread_create(&mis->fault_thread, "postcopy/fault",
> +                            postcopy_ram_fault_thread, mis,
> +                            QEMU_THREAD_JOINABLE, &local_err)) {
> +        error_reportf_err(local_err,
> +                          "failed to create postcopy_ram_fault_thread: ");
> +        close(mis->userfault_event_fd);
> +        close(mis->userfault_fd);
> +        qemu_sem_destroy(&mis->fault_thread_sem);
> +        return -1;
> +    }
>      qemu_sem_wait(&mis->fault_thread_sem);
>      qemu_sem_destroy(&mis->fault_thread_sem);
>      mis->have_fault_thread = true;

This is a caller that reports errors, then returns failure.  You make it
handle qemu_thread_create() the same way.  Good.

Not related to this patch, just spotted while reviewing it:

       /* Mark so that we get notified of accesses to unwritten areas */
       if (qemu_ram_foreach_migratable_block(ram_block_enable_notify, mis)) {
           error_report("ram_block_enable_notify failed");
           return -1;
       }

Do we leak mis->userfault_fd, mis->userfault_event_fd,
mis->fault_thread_sem here?
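
If the answer is yes, the usual cure is a single cleanup path.  A
minimal sketch in plain POSIX C, not the actual postcopy code:
enable_notify_sketch() and the step callbacks are hypothetical
stand-ins, a pipe stands in for the two userfault descriptors, and the
semaphore is not modelled.

```c
#include <unistd.h>

/*
 * Hypothetical stand-in for postcopy_ram_enable_notify(): acquires two
 * descriptors, then runs a later step that may fail.  On failure both
 * descriptors are closed before returning, so nothing leaks.
 */
static int enable_notify_sketch(int (*step)(void), int fds_out[2])
{
    int fds[2];

    if (pipe(fds) == -1) {
        return -1;
    }

    if (step() != 0) {
        goto fail;
    }

    /* Success: ownership of both descriptors passes to the caller. */
    fds_out[0] = fds[0];
    fds_out[1] = fds[1];
    return 0;

fail:
    close(fds[0]);
    close(fds[1]);
    return -1;
}

/* Hypothetical steps used to exercise both paths. */
static int step_ok(void)
{
    return 0;
}

static int step_fails(void)
{
    return -1;
}
```

With step_fails the caller's fds_out is left untouched and both
descriptors are closed; with step_ok the caller owns them.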

> diff --git a/migration/ram.c b/migration/ram.c
> index 658dfa88a3..6e0cccf066 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -473,6 +473,7 @@ static void compress_threads_save_cleanup(void)
>  static int compress_threads_save_setup(void)
>  {
>      int i, thread_count;
> +    Error *local_err = NULL;
>  
>      if (!migrate_use_compression()) {
>          return 0;
> @@ -502,9 +503,12 @@ static int compress_threads_save_setup(void)
>          comp_param[i].quit = false;
>          qemu_mutex_init(&comp_param[i].mutex);
>          qemu_cond_init(&comp_param[i].cond);
> -        qemu_thread_create(compress_threads + i, "compress",
> -                           do_data_compress, comp_param + i,
> -                           QEMU_THREAD_JOINABLE);
> +        if (!qemu_thread_create(compress_threads + i, "compress",
> +                                do_data_compress, comp_param + i,
> +                                QEMU_THREAD_JOINABLE, &local_err)) {
> +            error_reportf_err(local_err, "failed to create do_data_compress: ");
> +            goto exit;
> +        }
>      }
>      return 0;
>  

Reviewing the migration changes is getting tiresome...  Is reporting the
error appropriate here, and why?

> @@ -1075,8 +1079,14 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>          p->c = QIO_CHANNEL(sioc);
>          qio_channel_set_delay(p->c, false);
>          p->running = true;
> -        qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
> -                           QEMU_THREAD_JOINABLE);
> +        if (!qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
> +                                QEMU_THREAD_JOINABLE, &local_err)) {
> +            migrate_set_error(migrate_get_current(), local_err);
> +            error_reportf_err(local_err,
> +                              "failed to create multifd_send_thread: ");
> +            multifd_save_cleanup();
> +            return;
> +        }
>  
>          atomic_inc(&multifd_send_state->count);
>      }

Same question.

> @@ -1350,8 +1360,13 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
>      p->num_packets = 1;
>  
>      p->running = true;
> -    qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p,
> -                       QEMU_THREAD_JOINABLE);
> +    if (!qemu_thread_create(&p->thread, p->name, multifd_recv_thread,
> +                            p, QEMU_THREAD_JOINABLE, &local_err)) {
> +        error_propagate_prepend(errp, local_err,
> +                                "failed to create multifd_recv_thread: ");
> +        multifd_recv_terminate_threads(local_err);
> +        return false;
> +    }
>      atomic_inc(&multifd_recv_state->count);
>      return atomic_read(&multifd_recv_state->count) ==
>             migrate_multifd_channels();
> @@ -3617,6 +3632,7 @@ static void compress_threads_load_cleanup(void)
>  static int compress_threads_load_setup(QEMUFile *f)
>  {
>      int i, thread_count;
> +    Error *local_err = NULL;
>  
>      if (!migrate_use_compression()) {
>          return 0;
> @@ -3638,9 +3654,13 @@ static int compress_threads_load_setup(QEMUFile *f)
>          qemu_cond_init(&decomp_param[i].cond);
>          decomp_param[i].done = true;
>          decomp_param[i].quit = false;
> -        qemu_thread_create(decompress_threads + i, "decompress",
> -                           do_data_decompress, decomp_param + i,
> -                           QEMU_THREAD_JOINABLE);
> +        if (!qemu_thread_create(decompress_threads + i, "decompress",
> +                                do_data_decompress, decomp_param + i,
> +                                QEMU_THREAD_JOINABLE, &local_err)) {
> +            error_reportf_err(local_err,
> +                              "failed to create do_data_decompress: ");
> +            goto exit;
> +        }
>      }
>      return 0;
>  exit:

Same question.

> diff --git a/migration/savevm.c b/migration/savevm.c
> index d784e8aa40..b8bdcde5d8 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1747,9 +1747,14 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
>      mis->have_listen_thread = true;
>      /* Start up the listening thread and wait for it to signal ready */
>      qemu_sem_init(&mis->listen_thread_sem, 0);
> -    qemu_thread_create(&mis->listen_thread, "postcopy/listen",
> -                       postcopy_ram_listen_thread, NULL,
> -                       QEMU_THREAD_DETACHED);
> +    if (!qemu_thread_create(&mis->listen_thread, "postcopy/listen",
> +                            postcopy_ram_listen_thread, NULL,
> +                            QEMU_THREAD_DETACHED, &local_err)) {
> +        error_reportf_err(local_err,
> +                          "failed to create postcopy_ram_listen_thread: ");
> +        qemu_sem_destroy(&mis->listen_thread_sem);
> +        return -1;
> +    }
>      qemu_sem_wait(&mis->listen_thread_sem);
>      qemu_sem_destroy(&mis->listen_thread_sem);
>  

This is a caller that reports errors, then returns failure.  You make it
handle qemu_thread_create() the same way.  Good.

I'll stop commenting on this pattern now.

> diff --git a/tests/atomic_add-bench.c b/tests/atomic_add-bench.c
> index 2f6c72f63a..338b9563e3 100644
> --- a/tests/atomic_add-bench.c
> +++ b/tests/atomic_add-bench.c
> @@ -2,6 +2,7 @@
>  #include "qemu/thread.h"
>  #include "qemu/host-utils.h"
>  #include "qemu/processor.h"
> +#include "qapi/error.h"
>  
>  struct thread_info {
>      uint64_t r;
> @@ -110,7 +111,7 @@ static void create_threads(void)
>  
>          info->r = (i + 1) ^ time(NULL);
>          qemu_thread_create(&threads[i], NULL, thread_func, info,
> -                           QEMU_THREAD_JOINABLE);
> +                           QEMU_THREAD_JOINABLE, &error_abort);
>      }
>  }
>  
> diff --git a/tests/iothread.c b/tests/iothread.c
> index 777d9eea46..f4ad992e61 100644
> --- a/tests/iothread.c
> +++ b/tests/iothread.c
> @@ -73,7 +73,7 @@ IOThread *iothread_new(void)
>      qemu_mutex_init(&iothread->init_done_lock);
>      qemu_cond_init(&iothread->init_done_cond);
>      qemu_thread_create(&iothread->thread, NULL, iothread_run,
> -                       iothread, QEMU_THREAD_JOINABLE);
> +                       iothread, QEMU_THREAD_JOINABLE, &error_abort);
>  
>      /* Wait for initialization to complete */
>      qemu_mutex_lock(&iothread->init_done_lock);
> diff --git a/tests/qht-bench.c b/tests/qht-bench.c
> index 2089e2bed1..71df567ea2 100644
> --- a/tests/qht-bench.c
> +++ b/tests/qht-bench.c
> @@ -9,6 +9,7 @@
>  #include "qemu/atomic.h"
>  #include "qemu/qht.h"
>  #include "qemu/rcu.h"
> +#include "qapi/error.h"
>  #include "exec/tb-hash-xx.h"
>  
>  struct thread_stats {
> @@ -247,7 +248,7 @@ th_create_n(QemuThread **threads, struct thread_info **infos, const char *name,
>          prepare_thread_info(&info[i], offset + i);
>          info[i].func = func;
>          qemu_thread_create(&th[i], name, thread_func, &info[i],
> -                           QEMU_THREAD_JOINABLE);
> +                           QEMU_THREAD_JOINABLE, &error_abort);
>      }
>  }
>  
> diff --git a/tests/rcutorture.c b/tests/rcutorture.c
> index 49311c82ea..0e799ff256 100644
> --- a/tests/rcutorture.c
> +++ b/tests/rcutorture.c
> @@ -64,6 +64,7 @@
>  #include "qemu/atomic.h"
>  #include "qemu/rcu.h"
>  #include "qemu/thread.h"
> +#include "qapi/error.h"
>  
>  long long n_reads = 0LL;
>  long n_updates = 0L;
> @@ -90,7 +91,7 @@ static void create_thread(void *(*func)(void *))
>          exit(-1);
>      }
>      qemu_thread_create(&threads[n_threads], "test", func, &data[n_threads],
> -                       QEMU_THREAD_JOINABLE);
> +                       QEMU_THREAD_JOINABLE, &error_abort);
>      n_threads++;
>  }
>  
> diff --git a/tests/test-aio.c b/tests/test-aio.c
> index 86fb73b3d5..b3ac261724 100644
> --- a/tests/test-aio.c
> +++ b/tests/test-aio.c
> @@ -154,7 +154,7 @@ static void test_acquire(void)
>  
>      qemu_thread_create(&thread, "test_acquire_thread",
>                         test_acquire_thread,
> -                       &data, QEMU_THREAD_JOINABLE);
> +                       &data, QEMU_THREAD_JOINABLE, &error_abort);
>  
>      /* Block in aio_poll(), let other thread kick us and acquire context */
>      aio_context_acquire(ctx);
> diff --git a/tests/test-rcu-list.c b/tests/test-rcu-list.c
> index 2e6f70bd59..0f7da81291 100644
> --- a/tests/test-rcu-list.c
> +++ b/tests/test-rcu-list.c
> @@ -25,6 +25,7 @@
>  #include "qemu/rcu.h"
>  #include "qemu/thread.h"
>  #include "qemu/rcu_queue.h"
> +#include "qapi/error.h"
>  
>  /*
>   * Test variables.
> @@ -68,7 +69,7 @@ static void create_thread(void *(*func)(void *))
>          exit(-1);
>      }
>      qemu_thread_create(&threads[n_threads], "test", func, &data[n_threads],
> -                       QEMU_THREAD_JOINABLE);
> +                       QEMU_THREAD_JOINABLE, &error_abort);
>      n_threads++;
>  }
>  
> diff --git a/ui/vnc-jobs.c b/ui/vnc-jobs.c
> index 929391f85d..35a652d1fd 100644
> --- a/ui/vnc-jobs.c
> +++ b/ui/vnc-jobs.c
> @@ -31,6 +31,7 @@
>  #include "vnc-jobs.h"
>  #include "qemu/sockets.h"
>  #include "qemu/main-loop.h"
> +#include "qapi/error.h"
>  #include "block/aio.h"
>  
>  /*
> @@ -331,15 +332,21 @@ static bool vnc_worker_thread_running(void)
>      return queue; /* Check global queue */
>  }
>  
> -void vnc_start_worker_thread(void)
> +bool vnc_start_worker_thread(Error **errp)
>  {
>      VncJobQueue *q;
>  
> -    if (vnc_worker_thread_running())
> -        return ;
> +    if (vnc_worker_thread_running()) {
> +        goto out;
> +    }
>  
>      q = vnc_queue_init();
> -    qemu_thread_create(&q->thread, "vnc_worker", vnc_worker_thread, q,
> -                       QEMU_THREAD_DETACHED);
> +    if (!qemu_thread_create(&q->thread, "vnc_worker", vnc_worker_thread,
> +                            q, QEMU_THREAD_DETACHED, errp)) {
> +        vnc_queue_clear(q);
> +        return false;
> +    }
>      queue = q; /* Set global queue */
> +out:
> +    return true;
>  }

I recommend passing &error_abort to qemu_thread_create() in this patch,
then converting vnc_start_worker_thread() to Error in a subsequent patch.

> diff --git a/ui/vnc-jobs.h b/ui/vnc-jobs.h
> index 59f66bcc35..14640593db 100644
> --- a/ui/vnc-jobs.h
> +++ b/ui/vnc-jobs.h
> @@ -37,7 +37,7 @@ void vnc_job_push(VncJob *job);
>  void vnc_jobs_join(VncState *vs);
>  
>  void vnc_jobs_consume_buffer(VncState *vs);
> -void vnc_start_worker_thread(void);
> +bool vnc_start_worker_thread(Error **errp);
>  
>  /* Locks */
>  static inline int vnc_trylock_display(VncDisplay *vd)
> diff --git a/ui/vnc.c b/ui/vnc.c
> index 0c1b477425..0ffe9e6a5d 100644
> --- a/ui/vnc.c
> +++ b/ui/vnc.c
> @@ -3236,7 +3236,9 @@ void vnc_display_init(const char *id, Error **errp)
>      vd->connections_limit = 32;
>  
>      qemu_mutex_init(&vd->mutex);
> -    vnc_start_worker_thread();
> +    if (!vnc_start_worker_thread(errp)) {
> +        return;
> +    }
>  
>      vd->dcl.ops = &dcl_ops;
>      register_displaychangelistener(&vd->dcl);

These two hunks then also go into the subsequent patch.

> diff --git a/util/compatfd.c b/util/compatfd.c
> index 980bd33e52..886aa249f9 100644
> --- a/util/compatfd.c
> +++ b/util/compatfd.c
> @@ -16,6 +16,7 @@
>  #include "qemu/osdep.h"
>  #include "qemu-common.h"
>  #include "qemu/thread.h"
> +#include "qapi/error.h"
>  
>  #include <sys/syscall.h>
>  
> @@ -70,6 +71,7 @@ static int qemu_signalfd_compat(const sigset_t *mask)
>      struct sigfd_compat_info *info;
>      QemuThread thread;
>      int fds[2];
> +    Error *local_err = NULL;
>  
>      info = malloc(sizeof(*info));
>      if (info == NULL) {
> @@ -88,8 +90,14 @@ static int qemu_signalfd_compat(const sigset_t *mask)
>      memcpy(&info->mask, mask, sizeof(*mask));
>      info->fd = fds[1];
>  
> -    qemu_thread_create(&thread, "signalfd_compat", sigwait_compat, info,
> -                       QEMU_THREAD_DETACHED);
> +    if (!qemu_thread_create(&thread, "signalfd_compat", sigwait_compat,
> +                            info, QEMU_THREAD_DETACHED, &local_err)) {
> +        error_reportf_err(local_err, "failed to create sigwait_compat: ");
> +        close(fds[0]);
> +        close(fds[1]);
> +        free(info);
> +        return -1;
> +    }
>  
>      return fds[0];
>  }

This function implements signalfd() when the kernel doesn't provide
it.

signalfd() sets errno on failure.  The replacement's existing failure
modes set errno.  You add a failure mode that doesn't set errno.  That's
a bug.  To fix it, you can either make qemu_thread_create() set errno,
or you can make it return a value you can use to set errno.  The common
way to do the latter is returning a *negated* errno value.

signalfd() doesn't print anything on failure.  The replacement's
existing failure modes don't print anything.  You add a failure mode
that does print.  I think it shouldn't.
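
To illustrate the second option, here is a minimal sketch in plain
pthreads, not QEMU code.  thread_create_neg_errno(), sketch_worker()
and signalfd_compat_sketch() are made-up names; the only point is that
the helper returns a negated errno value and the caller stores it in
errno without printing anything.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: 0 on success, a negated errno value on failure. */
static int thread_create_neg_errno(pthread_t *thread,
                                   void *(*fn)(void *), void *arg)
{
    /* pthread_create() returns a positive error number, not via errno. */
    int err = pthread_create(thread, NULL, fn, arg);

    return err ? -err : 0;
}

/* Trivial worker so the sketch has something to run. */
static void *sketch_worker(void *arg)
{
    return arg;
}

/*
 * Hypothetical caller following the signalfd() contract: returns a
 * file descriptor, or -1 with errno set.  It prints nothing.
 */
static int signalfd_compat_sketch(pthread_t *thread)
{
    int fds[2];
    int ret;

    if (pipe(fds) == -1) {
        return -1;              /* pipe() already set errno */
    }

    ret = thread_create_neg_errno(thread, sketch_worker, NULL);
    if (ret < 0) {
        close(fds[0]);
        close(fds[1]);
        errno = -ret;           /* set last, so close() can't clobber it */
        return -1;
    }

    /* The real code hands fds[1] to the thread; the sketch has no use for it. */
    close(fds[1]);
    return fds[0];
}
```

The thread is created joinable here only to keep the sketch easy to
exercise; the real function creates a detached thread.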

> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index c1bee2a581..2c779fd634 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -437,9 +437,12 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
>      size_t size_per_thread;
>      char *addr = area;
>      int i = 0;
> +    int started_thread = 0;
> +    Error *local_err = NULL;
>  
>      memset_thread_failed = false;
>      memset_num_threads = get_memset_num_threads(smp_cpus);
> +    started_thread = memset_num_threads;
>      memset_thread = g_new0(MemsetThread, memset_num_threads);
>      numpages_per_thread = (numpages / memset_num_threads);
>      size_per_thread = (hpagesize * numpages_per_thread);
> @@ -448,13 +451,19 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
>          memset_thread[i].numpages = (i == (memset_num_threads - 1)) ?
>                                      numpages : numpages_per_thread;
>          memset_thread[i].hpagesize = hpagesize;
> -        qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
> -                           do_touch_pages, &memset_thread[i],
> -                           QEMU_THREAD_JOINABLE);
> +        if (!qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
> +                                do_touch_pages, &memset_thread[i],
> +                                QEMU_THREAD_JOINABLE, &local_err)) {
> +            error_reportf_err(local_err, "failed to create do_touch_pages: ");
> +            memset_thread_failed = true;
> +            started_thread = i;
> +            goto out;
> +        }
>          addr += size_per_thread;
>          numpages -= numpages_per_thread;
>      }
> -    for (i = 0; i < memset_num_threads; i++) {
> +out:
> +    for (i = 0; i < started_thread; i++) {
>          qemu_thread_join(&memset_thread[i].pgthread);
>      }
>      g_free(memset_thread);

You need to convert this function to Error instead, because its caller
os_mem_prealloc() sets an error on failure.  I recommend passing
&error_abort in this patch, and converting to Error in a subsequent patch.
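
A rough sketch of the control flow after such a conversion, in plain
pthreads rather than QEMU's Error API.  start_workers_sketch(), its
fail_at parameter (which simulates a creation failure) and
SKETCH_MAX_THREADS are invented for illustration.  The function prints
nothing itself: it joins only the threads that actually started and
hands the failure back to its caller.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define SKETCH_MAX_THREADS 8

/* Trivial worker standing in for do_touch_pages(). */
static void *sketch_touch(void *arg)
{
    return arg;
}

/*
 * Hypothetical: start n workers.  Creating thread number fail_at is
 * made to fail with EAGAIN (pass -1 for no simulated failure).
 * Returns 0 on success, a negated errno value on failure.
 */
static int start_workers_sketch(int n, int fail_at)
{
    pthread_t threads[SKETCH_MAX_THREADS];
    int started = 0;
    int ret = 0;
    int i;

    if (n < 0 || n > SKETCH_MAX_THREADS) {
        return -EINVAL;
    }

    for (i = 0; i < n; i++) {
        int err = (i == fail_at)
                  ? EAGAIN
                  : pthread_create(&threads[i], NULL, sketch_touch, NULL);

        if (err) {
            ret = -err;         /* remember the failure, report nothing */
            break;
        }
        started++;
    }

    /* Join exactly the threads that were created, on success or failure. */
    for (i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
    }
    return ret;
}
```

In QEMU the failure would travel through an Error ** argument instead
of the return value, so that os_mem_prealloc() can propagate it.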

> diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
> index 865e476df5..81b40a1ece 100644
> --- a/util/qemu-thread-posix.c
> +++ b/util/qemu-thread-posix.c
> @@ -15,6 +15,7 @@
>  #include "qemu/atomic.h"
>  #include "qemu/notify.h"
>  #include "qemu-thread-common.h"
> +#include "qapi/error.h"
>  
>  static bool name_threads;
>  
> @@ -500,9 +501,9 @@ static void *qemu_thread_start(void *args)
>      return r;
>  }
>  
> -void qemu_thread_create(QemuThread *thread, const char *name,
> -                       void *(*start_routine)(void*),
> -                       void *arg, int mode)
> +bool qemu_thread_create(QemuThread *thread, const char *name,
> +                        void *(*start_routine)(void *),
> +                        void *arg, int mode, Error **errp)
>  {
>      sigset_t set, oldset;
>      int err;
> @@ -511,7 +512,9 @@ void qemu_thread_create(QemuThread *thread, const char *name,
>  
>      err = pthread_attr_init(&attr);
>      if (err) {
> -        error_exit(err, __func__);
> +        error_setg_errno(errp, -err, "pthread_attr_init failed: %s",
> +                         strerror(err));
> +        return false;
>      }
>  
>      if (mode == QEMU_THREAD_DETACHED) {
> @@ -526,16 +529,21 @@ void qemu_thread_create(QemuThread *thread, const char *name,
>      qemu_thread_args->name = g_strdup(name);
>      qemu_thread_args->start_routine = start_routine;
>      qemu_thread_args->arg = arg;
> -

Let's keep the blank line.

>      err = pthread_create(&thread->thread, &attr,
>                           qemu_thread_start, qemu_thread_args);
> -
> -    if (err)
> -        error_exit(err, __func__);
> +    if (err) {
> +        error_setg_errno(errp, -err, "pthread_create failed: %s",
> +                         strerror(err));
> +        pthread_attr_destroy(&attr);
> +        g_free(qemu_thread_args->name);
> +        g_free(qemu_thread_args);
> +        return false;
> +    }
>  
>      pthread_sigmask(SIG_SETMASK, &oldset, NULL);
>  
>      pthread_attr_destroy(&attr);
> +    return true;
>  }
>  
>  void qemu_thread_get_self(QemuThread *thread)
> diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
> index 4a363ca675..57b1143e97 100644
> --- a/util/qemu-thread-win32.c
> +++ b/util/qemu-thread-win32.c
> @@ -20,6 +20,7 @@
>  #include "qemu/thread.h"
>  #include "qemu/notify.h"
>  #include "qemu-thread-common.h"
> +#include "qapi/error.h"
>  #include <process.h>
>  
>  static bool name_threads;
> @@ -388,9 +389,9 @@ void *qemu_thread_join(QemuThread *thread)
>      return ret;
>  }
>  
> -void qemu_thread_create(QemuThread *thread, const char *name,
> -                       void *(*start_routine)(void *),
> -                       void *arg, int mode)
> +bool qemu_thread_create(QemuThread *thread, const char *name,
> +                        void *(*start_routine)(void *),
> +                        void *arg, int mode, Error **errp)
>  {
>      HANDLE hThread;
>      struct QemuThreadData *data;
> @@ -409,10 +410,17 @@ void qemu_thread_create(QemuThread *thread, const char *name,
>      hThread = (HANDLE) _beginthreadex(NULL, 0, win32_start_routine,
>                                        data, 0, &thread->tid);
>      if (!hThread) {
> -        error_exit(GetLastError(), __func__);
> +        if (data->mode != QEMU_THREAD_DETACHED) {
> +            DeleteCriticalSection(&data->cs);
> +        }
> +        error_setg_errno(errp, errno,
> +                         "failed to create win32_start_routine");
> +        g_free(data);
> +        return false;
>      }
>      CloseHandle(hThread);
>      thread->data = data;
> +    return true;
>  }
>  
>  void qemu_thread_get_self(QemuThread *thread)
> diff --git a/util/rcu.c b/util/rcu.c
> index 5676c22bd1..145dcdb0c6 100644
> --- a/util/rcu.c
> +++ b/util/rcu.c
> @@ -32,6 +32,7 @@
>  #include "qemu/atomic.h"
>  #include "qemu/thread.h"
>  #include "qemu/main-loop.h"
> +#include "qapi/error.h"
>  #if defined(CONFIG_MALLOC_TRIM)
>  #include <malloc.h>
>  #endif
> @@ -325,7 +326,7 @@ static void rcu_init_complete(void)
>       * must have been quiescent even after forking, just recreate it.
>       */
>      qemu_thread_create(&thread, "call_rcu", call_rcu_thread,
> -                       NULL, QEMU_THREAD_DETACHED);
> +                       NULL, QEMU_THREAD_DETACHED, &error_abort);
>  
>      rcu_register_thread();
>  }
> diff --git a/util/thread-pool.c b/util/thread-pool.c
> index 610646d131..ad0f980783 100644
> --- a/util/thread-pool.c
> +++ b/util/thread-pool.c
> @@ -22,6 +22,7 @@
>  #include "trace.h"
>  #include "block/thread-pool.h"
>  #include "qemu/main-loop.h"
> +#include "qapi/error.h"
>  
>  static void do_spawn_thread(ThreadPool *pool);
>  
> @@ -132,7 +133,8 @@ static void do_spawn_thread(ThreadPool *pool)
>      pool->new_threads--;
>      pool->pending_threads++;
>  
> -    qemu_thread_create(&t, "worker", worker_thread, pool, QEMU_THREAD_DETACHED);
> +    qemu_thread_create(&t, "worker", worker_thread, pool,
> +                       QEMU_THREAD_DETACHED, &error_abort);
>  }
>  
>  static void spawn_thread_bh_fn(void *opaque)

* Re: [Qemu-devel] [PATCH for-4.0 v8 6/7] qemu_thread_create: propagate the error to callers to handle
  2018-12-13  7:26   ` Markus Armbruster
@ 2018-12-14  0:24     ` David Gibson
  2018-12-19  9:29       ` Markus Armbruster
  2018-12-17  7:29     ` Fei Li
  1 sibling, 1 reply; 27+ messages in thread
From: David Gibson @ 2018-12-14  0:24 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Fei Li, qemu-devel, Dr . David Alan Gilbert

On Thu, 13 Dec 2018 08:26:48 +0100
Markus Armbruster <armbru@redhat.com> wrote:

> There's a question for David Gibson inline.  Please search for /ppc/.
> 
> Fei Li <fli@suse.com> writes:
> 
> > Make qemu_thread_create() return a Boolean to indicate if it succeeds
> > rather than failing with an error. And add an Error parameter to hold
> > the error message and let the callers handle it.  
> 
> The "rather than failing with an error" is misleading.  Before the
> patch, we report to stderr and abort().  What about:
> 
>     qemu-thread: Make qemu_thread_create() handle errors properly
> 
>     qemu_thread_create() abort()s on error.  Not nice.  Give it a
>     return value and an Error ** argument, so it can return success /
>     failure.
> 
> Still missing from the commit message then: how you update the callers.
> Let's see below.

[snip]
> > --- a/hw/ppc/spapr_hcall.c
> > +++ b/hw/ppc/spapr_hcall.c
> > @@ -478,6 +478,7 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
> >      sPAPRPendingHPT *pending = spapr->pending_hpt;
> >      uint64_t current_ram_size;
> >      int rc;
> > +    Error *local_err = NULL;
> >  
> >      if (spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED) {
> >          return H_AUTHORITY;
> > @@ -538,8 +539,13 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
> >      pending->shift = shift;
> >      pending->ret = H_HARDWARE;
> >  
> > -    qemu_thread_create(&pending->thread, "sPAPR HPT prepare",
> > -                       hpt_prepare_thread, pending, QEMU_THREAD_DETACHED);
> > +    if (!qemu_thread_create(&pending->thread, "sPAPR HPT prepare",
> > +                            hpt_prepare_thread, pending,
> > +                            QEMU_THREAD_DETACHED, &local_err)) {
> > +        error_reportf_err(local_err, "failed to create hpt_prepare_thread: ");
> > +        g_free(pending);
> > +        return H_RESOURCE;
> > +    }
> >  
> >      spapr->pending_hpt = pending;
> >    
> 
> This is a caller that returns an error code on failure.  You change it
> to report the error, then return failure.  The return failure part looks
> fine.  Whether reporting the error is appropriate I can't say for sure.
> No other failure mode reports anything.  David, what do you think?

I think it's reasonable here.  In this context error returns and
reported errors are for different audiences.  The error returns are for
the guest, the reported errors are for the guest administrator or
management layers.  This particularly failure is essentially a host
side fault that is mostly relevant to the VM management.  We have to
say *something* to the guest to explain that the action couldn't go
forward and H_RESOURCE makes as much sense as anything.

-- 
David Gibson <dgibson@redhat.com>
Principal Software Engineer, Virtualization, Red Hat


* Re: [Qemu-devel] [PATCH for-4.0 v8 6/7] qemu_thread_create: propagate the error to callers to handle
  2018-12-13  7:26   ` Markus Armbruster
  2018-12-14  0:24     ` David Gibson
@ 2018-12-17  7:29     ` Fei Li
  2018-12-18 12:40       ` Fei Li
  2018-12-19 10:10       ` Markus Armbruster
  1 sibling, 2 replies; 27+ messages in thread
From: Fei Li @ 2018-12-17  7:29 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, Dr . David Alan Gilbert, David Gibson



On 12/13/2018 03:26 PM, Markus Armbruster wrote:
> There's a question for David Gibson inline.  Please search for /ppc/.
>
> Fei Li <fli@suse.com> writes:
>
>> Make qemu_thread_create() return a Boolean to indicate if it succeeds
>> rather than failing with an error. And add an Error parameter to hold
>> the error message and let the callers handle it.
> The "rather than failing with an error" is misleading.  Before the
> patch, we report to stderr and abort().  What about:
>
>      qemu-thread: Make qemu_thread_create() handle errors properly
>
>      qemu_thread_create() abort()s on error.  Not nice.  Give it a
>      return value and an Error ** argument, so it can return success /
>      failure.
A nice commit-amend! Thanks!
> Still missing from the commit message then: how you update the callers.
Yes, agreed. I think the "how" should also be noted here, like:
- propagate the error to callers whose call chain already has an Error
  parameter;
- otherwise pass &error_abort to qemu_thread_create() and mark the spot
  with a "TODO: xxx" comment.
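
The two caller patterns listed above can be sketched in plain C.  This is
only an illustration under stated assumptions: the Error struct,
set_error(), thread_create_stub(), device_realize() and
start_helper_thread() are simplified, hypothetical stand-ins, not QEMU's
real Error API or qemu_thread_create().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for QEMU's Error; the real type is opaque and richer. */
typedef struct Error {
    char msg[128];
} Error;

/* Stand-in for error_setg(): store a message if the caller wants one. */
static void set_error(Error **errp, const char *msg)
{
    if (errp) {
        Error *err = calloc(1, sizeof(*err));
        assert(err != NULL);
        strncpy(err->msg, msg, sizeof(err->msg) - 1);
        *errp = err;
    }
}

/* Stand-in for qemu_thread_create(): returns false and sets *errp on failure. */
static bool thread_create_stub(bool simulate_failure, Error **errp)
{
    if (simulate_failure) {
        set_error(errp, "failed to create thread");
        return false;
    }
    return true;
}

/* Pattern 1: the caller already takes an Error **errp, so pass it down. */
static bool device_realize(bool simulate_failure, Error **errp)
{
    if (!thread_create_stub(simulate_failure, errp)) {
        return false; /* errp is set; the caller decides how to handle it */
    }
    return true;
}

/* Pattern 2: the caller cannot fail, so abort, as &error_abort would. */
static void start_helper_thread(void)
{
    Error *err = NULL;

    /* TODO: let the further caller handle the error instead of aborting */
    if (!thread_create_stub(false, &err)) {
        fprintf(stderr, "%s\n", err->msg);
        abort();
    }
}
```

Pattern 2 keeps today's behavior (abort on failure) while leaving a marker
for a later conversion of the caller.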
> Let's see below.
>
>> Cc: Markus Armbruster <armbru@redhat.com>
>> Cc: Daniel P. Berrangé <berrange@redhat.com>
>> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> Signed-off-by: Fei Li <fli@suse.com>
>> ---
>>   cpus.c                      | 45 ++++++++++++++++++++++++-------------
>>   dump.c                      |  6 +++--
>>   hw/misc/edu.c               |  6 +++--
>>   hw/ppc/spapr_hcall.c        | 10 +++++++--
>>   hw/rdma/rdma_backend.c      |  4 +++-
>>   hw/usb/ccid-card-emulated.c | 16 ++++++++++----
>>   include/qemu/thread.h       |  4 ++--
>>   io/task.c                   |  3 ++-
>>   iothread.c                  | 16 +++++++++-----
>>   migration/migration.c       | 54 +++++++++++++++++++++++++++++----------------
>>   migration/postcopy-ram.c    | 14 ++++++++++--
>>   migration/ram.c             | 40 ++++++++++++++++++++++++---------
>>   migration/savevm.c          | 11 ++++++---
>>   tests/atomic_add-bench.c    |  3 ++-
>>   tests/iothread.c            |  2 +-
>>   tests/qht-bench.c           |  3 ++-
>>   tests/rcutorture.c          |  3 ++-
>>   tests/test-aio.c            |  2 +-
>>   tests/test-rcu-list.c       |  3 ++-
>>   ui/vnc-jobs.c               | 17 +++++++++-----
>>   ui/vnc-jobs.h               |  2 +-
>>   ui/vnc.c                    |  4 +++-
>>   util/compatfd.c             | 12 ++++++++--
>>   util/oslib-posix.c          | 17 ++++++++++----
>>   util/qemu-thread-posix.c    | 24 +++++++++++++-------
>>   util/qemu-thread-win32.c    | 16 ++++++++++----
>>   util/rcu.c                  |  3 ++-
>>   util/thread-pool.c          |  4 +++-
>>   28 files changed, 243 insertions(+), 101 deletions(-)
>>
>> diff --git a/cpus.c b/cpus.c
>> index 7b091bda53..e8450e518a 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -1961,15 +1961,20 @@ static void qemu_tcg_init_vcpu(CPUState *cpu, Error **errp)
>>               snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
>>                    cpu->cpu_index);
>>   
>> -            qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
>> -                               cpu, QEMU_THREAD_JOINABLE);
>> +            if (!qemu_thread_create(cpu->thread, thread_name,
>> +                                    qemu_tcg_cpu_thread_fn, cpu,
>> +                                    QEMU_THREAD_JOINABLE, errp)) {
>> +                return;
>> +            }
>>   
>>           } else {
>>               /* share a single thread for all cpus with TCG */
>>               snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "ALL CPUs/TCG");
>> -            qemu_thread_create(cpu->thread, thread_name,
>> -                               qemu_tcg_rr_cpu_thread_fn,
>> -                               cpu, QEMU_THREAD_JOINABLE);
>> +            if (!qemu_thread_create(cpu->thread, thread_name,
>> +                                    qemu_tcg_rr_cpu_thread_fn, cpu,
>> +                                    QEMU_THREAD_JOINABLE, errp)) {
>> +                return;
>> +            }
>>   
>>               single_tcg_halt_cond = cpu->halt_cond;
>>               single_tcg_cpu_thread = cpu->thread;
> This is a caller that sets an error on failure.  You make it set an
> error on qemu_thread_create() failure.  Makes sense.
Thanks for the comment!
>> @@ -1997,8 +2002,10 @@ static void qemu_hax_start_vcpu(CPUState *cpu, Error **errp)
>>   
>>       snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HAX",
>>                cpu->cpu_index);
>> -    qemu_thread_create(cpu->thread, thread_name, qemu_hax_cpu_thread_fn,
>> -                       cpu, QEMU_THREAD_JOINABLE);
>> +    if (!qemu_thread_create(cpu->thread, thread_name, qemu_hax_cpu_thread_fn,
>> +                            cpu, QEMU_THREAD_JOINABLE, errp)) {
>> +        return;
>> +    }
>>   #ifdef _WIN32
>>       cpu->hThread = qemu_thread_get_handle(cpu->thread);
>>   #endif
> Likewise.  I'll stop commenting on this pattern now.
>
>> @@ -2013,8 +2020,10 @@ static void qemu_kvm_start_vcpu(CPUState *cpu, Error **errp)
>>       qemu_cond_init(cpu->halt_cond);
>>       snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/KVM",
>>                cpu->cpu_index);
>> -    qemu_thread_create(cpu->thread, thread_name, qemu_kvm_cpu_thread_fn,
>> -                       cpu, QEMU_THREAD_JOINABLE);
>> +    if (!qemu_thread_create(cpu->thread, thread_name, qemu_kvm_cpu_thread_fn,
>> +                            cpu, QEMU_THREAD_JOINABLE, errp)) {
>> +        /* keep 'if' here in case there is further error handling logic */
>> +    }
>>   }
>>   
>>   static void qemu_hvf_start_vcpu(CPUState *cpu, Error **errp)
>> @@ -2031,8 +2040,10 @@ static void qemu_hvf_start_vcpu(CPUState *cpu, Error **errp)
>>   
>>       snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
>>                cpu->cpu_index);
>> -    qemu_thread_create(cpu->thread, thread_name, qemu_hvf_cpu_thread_fn,
>> -                       cpu, QEMU_THREAD_JOINABLE);
>> +    if (!qemu_thread_create(cpu->thread, thread_name, qemu_hvf_cpu_thread_fn,
>> +                            cpu, QEMU_THREAD_JOINABLE, errp)) {
>> +        /* keep 'if' here in case there is further error handling logic */
>> +    }
>>   }
>>   
>>   static void qemu_whpx_start_vcpu(CPUState *cpu, Error **errp)
>> @@ -2044,8 +2055,10 @@ static void qemu_whpx_start_vcpu(CPUState *cpu, Error **errp)
>>       qemu_cond_init(cpu->halt_cond);
>>       snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/WHPX",
>>                cpu->cpu_index);
>> -    qemu_thread_create(cpu->thread, thread_name, qemu_whpx_cpu_thread_fn,
>> -                       cpu, QEMU_THREAD_JOINABLE);
>> +    if (!qemu_thread_create(cpu->thread, thread_name, qemu_whpx_cpu_thread_fn,
>> +                            cpu, QEMU_THREAD_JOINABLE, errp)) {
>> +        return;
>> +    }
>>   #ifdef _WIN32
>>       cpu->hThread = qemu_thread_get_handle(cpu->thread);
>>   #endif
>> @@ -2060,8 +2073,10 @@ static void qemu_dummy_start_vcpu(CPUState *cpu, Error **errp)
>>       qemu_cond_init(cpu->halt_cond);
>>       snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/DUMMY",
>>                cpu->cpu_index);
>> -    qemu_thread_create(cpu->thread, thread_name, qemu_dummy_cpu_thread_fn, cpu,
>> -                       QEMU_THREAD_JOINABLE);
>> +    if (!qemu_thread_create(cpu->thread, thread_name, qemu_dummy_cpu_thread_fn,
>> +                            cpu, QEMU_THREAD_JOINABLE, errp)) {
>> +        /* keep 'if' here in case there is further error handling logic */
>> +    }
>>   }
>>   
>>   bool qemu_init_vcpu(CPUState *cpu, Error **errp)
>> diff --git a/dump.c b/dump.c
>> index 4ec94c5e25..1f003aff9a 100644
>> --- a/dump.c
>> +++ b/dump.c
>> @@ -2020,8 +2020,10 @@ void qmp_dump_guest_memory(bool paging, const char *file,
>>       if (detach_p) {
>>           /* detached dump */
>>           s->detached = true;
>> -        qemu_thread_create(&s->dump_thread, "dump_thread", dump_thread,
>> -                           s, QEMU_THREAD_DETACHED);
>> +        if (!qemu_thread_create(&s->dump_thread, "dump_thread", dump_thread,
>> +                                s, QEMU_THREAD_DETACHED, errp)) {
>> +            /* keep 'if' here in case there is further error handling logic */
>> +        }
>>       } else {
>>           /* sync dump */
>>           dump_process(s, errp);
>> diff --git a/hw/misc/edu.c b/hw/misc/edu.c
>> index cdcf550dd7..6684c60a96 100644
>> --- a/hw/misc/edu.c
>> +++ b/hw/misc/edu.c
>> @@ -355,8 +355,10 @@ static void pci_edu_realize(PCIDevice *pdev, Error **errp)
>>   
>>       qemu_mutex_init(&edu->thr_mutex);
>>       qemu_cond_init(&edu->thr_cond);
>> -    qemu_thread_create(&edu->thread, "edu", edu_fact_thread,
>> -                       edu, QEMU_THREAD_JOINABLE);
>> +    if (!qemu_thread_create(&edu->thread, "edu", edu_fact_thread,
>> +                            edu, QEMU_THREAD_JOINABLE, errp)) {
>> +        return;
>> +    }
>>   
>>       memory_region_init_io(&edu->mmio, OBJECT(edu), &edu_mmio_ops, edu,
>>                       "edu-mmio", 1 * MiB);
>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>> index ae913d070f..7c16ade04a 100644
>> --- a/hw/ppc/spapr_hcall.c
>> +++ b/hw/ppc/spapr_hcall.c
>> @@ -478,6 +478,7 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
>>       sPAPRPendingHPT *pending = spapr->pending_hpt;
>>       uint64_t current_ram_size;
>>       int rc;
>> +    Error *local_err = NULL;
>>   
>>       if (spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED) {
>>           return H_AUTHORITY;
>> @@ -538,8 +539,13 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
>>       pending->shift = shift;
>>       pending->ret = H_HARDWARE;
>>   
>> -    qemu_thread_create(&pending->thread, "sPAPR HPT prepare",
>> -                       hpt_prepare_thread, pending, QEMU_THREAD_DETACHED);
>> +    if (!qemu_thread_create(&pending->thread, "sPAPR HPT prepare",
>> +                            hpt_prepare_thread, pending,
>> +                            QEMU_THREAD_DETACHED, &local_err)) {
>> +        error_reportf_err(local_err, "failed to create hpt_prepare_thread: ");
>> +        g_free(pending);
>> +        return H_RESOURCE;
>> +    }
>>   
>>       spapr->pending_hpt = pending;
>>   
> This is a caller that returns an error code on failure.  You change it
> to report the error, then return failure.  The return failure part looks
> fine.  Whether reporting the error is appropriate I can't say for sure.
> No other failure mode reports anything.  David, what do you think?
Just as David explains. :)
> Fei Li, you could pass &error_abort to side-step this question for now.
>
>> diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
>> index d7a4bbd91f..53a2bd0d85 100644
>> --- a/hw/rdma/rdma_backend.c
>> +++ b/hw/rdma/rdma_backend.c
>> @@ -164,8 +164,10 @@ static void start_comp_thread(RdmaBackendDev *backend_dev)
>>       snprintf(thread_name, sizeof(thread_name), "rdma_comp_%s",
>>                ibv_get_device_name(backend_dev->ib_dev));
>>       backend_dev->comp_thread.run = true;
>> +    /* FIXME: let the further caller handle the error instead of abort() here */
>>       qemu_thread_create(&backend_dev->comp_thread.thread, thread_name,
>> -                       comp_handler_thread, backend_dev, QEMU_THREAD_DETACHED);
>> +                       comp_handler_thread, backend_dev,
>> +                       QEMU_THREAD_DETACHED, &error_abort);
>>   }
>>   
> This is a caller that can't return failure.  You pass &error_abort.  No
> behavioral change.
Actually, yes. The reason I did not change anything here is that I am not
quite sure how to handle the failure for the rdma device, especially
which value to set in dev->regs_data[idx] when it fails.
> I think I'd mark the spot TODO, not FIXME.  Matter of taste, I guess.
Sounds good, thanks!
>>   void rdma_backend_register_comp_handler(void (*handler)(int status,
>> diff --git a/hw/usb/ccid-card-emulated.c b/hw/usb/ccid-card-emulated.c
>> index 25976ed84f..c6783f124a 100644
>> --- a/hw/usb/ccid-card-emulated.c
>> +++ b/hw/usb/ccid-card-emulated.c
>> @@ -33,6 +33,7 @@
>>   #include "qemu/main-loop.h"
>>   #include "ccid.h"
>>   #include "qapi/error.h"
>> +#include "qemu/error-report.h"
>>   
>>   #define DPRINTF(card, lvl, fmt, ...) \
>>   do {\
>> @@ -544,10 +545,17 @@ static void emulated_realize(CCIDCardState *base, Error **errp)
>>           error_setg(errp, "%s: failed to initialize vcard", TYPE_EMULATED_CCID);
>>           goto out2;
>>       }
>> -    qemu_thread_create(&card->event_thread_id, "ccid/event", event_thread,
>> -                       card, QEMU_THREAD_JOINABLE);
>> -    qemu_thread_create(&card->apdu_thread_id, "ccid/apdu", handle_apdu_thread,
>> -                       card, QEMU_THREAD_JOINABLE);
>> +    if (!qemu_thread_create(&card->event_thread_id, "ccid/event", event_thread,
>> +                            card, QEMU_THREAD_JOINABLE, errp)) {
>> +        error_report("failed to create event_thread");
>> +        goto out2;
>> +    }
>> +    if (!qemu_thread_create(&card->apdu_thread_id, "ccid/apdu",
>> +                            handle_apdu_thread, card,
>> +                            QEMU_THREAD_JOINABLE, errp)) {
>> +        error_report("failed to create handle_apdu_thread");
>> +        goto out2;
>> +    }
>>   
>>   out2:
>>       clean_event_notifier(card);
> error_report() in a realize() method is almost certainly wrong.
Ok, I will remove these two.
>> diff --git a/include/qemu/thread.h b/include/qemu/thread.h
>> index 55d83a907c..12291f4ccd 100644
>> --- a/include/qemu/thread.h
>> +++ b/include/qemu/thread.h
>> @@ -152,9 +152,9 @@ void qemu_event_reset(QemuEvent *ev);
>>   void qemu_event_wait(QemuEvent *ev);
>>   void qemu_event_destroy(QemuEvent *ev);
>>   
>> -void qemu_thread_create(QemuThread *thread, const char *name,
>> +bool qemu_thread_create(QemuThread *thread, const char *name,
>>                           void *(*start_routine)(void *),
>> -                        void *arg, int mode);
>> +                        void *arg, int mode, Error **errp);
>>   void *qemu_thread_join(QemuThread *thread);
>>   void qemu_thread_get_self(QemuThread *thread);
>>   bool qemu_thread_is_self(QemuThread *thread);
>> diff --git a/io/task.c b/io/task.c
>> index 2886a2c1bc..6d3a18ab80 100644
>> --- a/io/task.c
>> +++ b/io/task.c
>> @@ -149,7 +149,8 @@ void qio_task_run_in_thread(QIOTask *task,
>>                          "io-task-worker",
>>                          qio_task_thread_worker,
>>                          data,
>> -                       QEMU_THREAD_DETACHED);
>> +                       QEMU_THREAD_DETACHED,
>> +                       &error_abort);
>>   }
>>   
>>   
> This is a caller that can't return failure.  You pass &error_abort.  No
> behavioral change.  Unlike above, you don't mark this spot FIXME.  Any
> particular reason for marking one, but not the other?
Emm, it is a little difficult to add an Error parameter to its callers,
and the callers do not seem to need the Error. Thus I think passing
&error_abort in this function, instead of in its further callers, is more
direct. :)
The same reasoning applies to the several cases below.

But just as you mentioned, maybe we should add a "TODO: xxxx" for the
direct &error_abort case, in case the callers need the Error parameter in
the future.
> I'll stop commenting on this pattern now.
>
>> diff --git a/iothread.c b/iothread.c
>> index 2fb1cdf55d..7335dacf0b 100644
>> --- a/iothread.c
>> +++ b/iothread.c
>> @@ -164,9 +164,7 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
>>                                   &local_error);
>>       if (local_error) {
>>           error_propagate(errp, local_error);
>> -        aio_context_unref(iothread->ctx);
>> -        iothread->ctx = NULL;
>> -        return;
>> +        goto fail;
>>       }
>>   
>>       qemu_mutex_init(&iothread->init_done_lock);
>> @@ -178,8 +176,12 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
>>        */
>>       name = object_get_canonical_path_component(OBJECT(obj));
>>       thread_name = g_strdup_printf("IO %s", name);
>> -    qemu_thread_create(&iothread->thread, thread_name, iothread_run,
>> -                       iothread, QEMU_THREAD_JOINABLE);
>> +    if (!qemu_thread_create(&iothread->thread, thread_name, iothread_run,
>> +                            iothread, QEMU_THREAD_JOINABLE, errp)) {
>> +        g_free(thread_name);
>> +        g_free(name);
>> +        goto fail;
>> +    }
>>       g_free(thread_name);
>>       g_free(name);
>>   
>> @@ -190,6 +192,10 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
>>                          &iothread->init_done_lock);
>>       }
>>       qemu_mutex_unlock(&iothread->init_done_lock);
>> +    return;
>> +fail:
>> +    aio_context_unref(iothread->ctx);
>> +    iothread->ctx = NULL;
>>   }
>>   
>>   typedef struct {
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 0537fc0c26..af6c72ac5d 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -438,19 +438,22 @@ static void process_incoming_migration_co(void *opaque)
>>           /* Make sure all file formats flush their mutable metadata */
>>           bdrv_invalidate_cache_all(&local_err);
>>           if (local_err) {
>> -            migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>> -                    MIGRATION_STATUS_FAILED);
>>               error_report_err(local_err);
>> -            exit(EXIT_FAILURE);
>> +            goto fail;
>>           }
>>   
>>           if (colo_init_ram_cache() < 0) {
>>               error_report("Init ram cache failed");
>> -            exit(EXIT_FAILURE);
>> +            goto fail;
>>           }
>>   
>> -        qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
>> -             colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
>> +        if (!qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
>> +                                colo_process_incoming_thread, mis,
>> +                                QEMU_THREAD_JOINABLE, &local_err)) {
>> +            error_reportf_err(local_err, "failed to create "
>> +                              "colo_process_incoming_thread: ");
>> +            goto fail;
>> +        }
>>           mis->have_colo_incoming_thread = true;
>>           qemu_coroutine_yield();
>>   
>> @@ -461,20 +464,22 @@ static void process_incoming_migration_co(void *opaque)
>>       }
>>   
>>       if (ret < 0) {
>> -        Error *local_err = NULL;
>> -
>> -        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>> -                          MIGRATION_STATUS_FAILED);
>>           error_report("load of migration failed: %s", strerror(-ret));
>> -        qemu_fclose(mis->from_src_file);
>> -        if (multifd_load_cleanup(&local_err) != 0) {
>> -            error_report_err(local_err);
>> -        }
>> -        exit(EXIT_FAILURE);
>> +        goto fail;
>>       }
>>       mis->bh = qemu_bh_new(process_incoming_migration_bh, mis);
>>       qemu_bh_schedule(mis->bh);
>>       mis->migration_incoming_co = NULL;
>> +    return;
>> +fail:
>> +    local_err = NULL;
>> +    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>> +                      MIGRATION_STATUS_FAILED);
>> +    qemu_fclose(mis->from_src_file);
>> +    if (multifd_load_cleanup(&local_err) != 0) {
>> +        error_report_err(local_err);
>> +    }
>> +    exit(EXIT_FAILURE);
>>   }
> You change handling of errors other than qemu_thread_create().  Separate
> patch, please.  I'd put it before this one.
Ok, thanks for the reminder. Will update in the next version.
>>   
>>   static void migration_incoming_setup(QEMUFile *f)
>> @@ -2345,6 +2350,7 @@ out:
>>   static int open_return_path_on_source(MigrationState *ms,
>>                                         bool create_thread)
>>   {
>> +    Error *local_err = NULL;
>>   
>>       ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
>>       if (!ms->rp_state.from_dst_file) {
>> @@ -2358,8 +2364,13 @@ static int open_return_path_on_source(MigrationState *ms,
>>           return 0;
>>       }
>>   
>> -    qemu_thread_create(&ms->rp_state.rp_thread, "return path",
>> -                       source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
>> +    if (!qemu_thread_create(&ms->rp_state.rp_thread, "return path",
>> +                            source_return_path_thread, ms,
>> +                            QEMU_THREAD_JOINABLE, &local_err)) {
>> +        error_reportf_err(local_err,
>> +                          "failed to create source_return_path_thread: ");
>> +        return -1;
>> +    }
>>   
>>       trace_open_return_path_on_source_continue();
>>   
> This is a caller that returns an error code on failure.  You change it
> to report the error, then return failure.  This is okay, because its
> sole caller also reports errors that way.
Thanks.
>> @@ -3189,8 +3200,13 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
>>           migrate_fd_cleanup(s);
>>           return;
>>       }
>> -    qemu_thread_create(&s->thread, "live_migration", migration_thread, s,
>> -                       QEMU_THREAD_JOINABLE);
>> +    if (!qemu_thread_create(&s->thread, "live_migration", migration_thread,
>> +                            s, QEMU_THREAD_JOINABLE, &error_in)) {
>> +        error_reportf_err(error_in, "failed to create migration_thread: ");
>> +        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
>> +        migrate_fd_cleanup(s);
>> +        return;
>> +    }
>>       s->migration_thread_running = true;
>>   }
> This is a caller that reports errors.  You make it handle
> qemu_thread_create() the same way.  Good.
Thanks!
>>   
>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>> index fa09dba534..80bfa9c4a2 100644
>> --- a/migration/postcopy-ram.c
>> +++ b/migration/postcopy-ram.c
>> @@ -1083,6 +1083,8 @@ retry:
>>   
>>   int postcopy_ram_enable_notify(MigrationIncomingState *mis)
>>   {
>> +    Error *local_err = NULL;
>> +
>>       /* Open the fd for the kernel to give us userfaults */
>>       mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
>>       if (mis->userfault_fd == -1) {
>> @@ -1109,8 +1111,16 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
>>       }
>>   
>>       qemu_sem_init(&mis->fault_thread_sem, 0);
>> -    qemu_thread_create(&mis->fault_thread, "postcopy/fault",
>> -                       postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
>> +    if (!qemu_thread_create(&mis->fault_thread, "postcopy/fault",
>> +                            postcopy_ram_fault_thread, mis,
>> +                            QEMU_THREAD_JOINABLE, &local_err)) {
>> +        error_reportf_err(local_err,
>> +                          "failed to create postcopy_ram_fault_thread: ");
>> +        close(mis->userfault_event_fd);
>> +        close(mis->userfault_fd);
>> +        qemu_sem_destroy(&mis->fault_thread_sem);
>> +        return -1;
>> +    }
>>       qemu_sem_wait(&mis->fault_thread_sem);
>>       qemu_sem_destroy(&mis->fault_thread_sem);
>>       mis->have_fault_thread = true;
> This is a caller that reports errors, then returns failure.  You make it
> handle qemu_thread_create() the same way.  Good.
>
> Not related to this patch, just spotted while reviewing it:
>
>         /* Mark so that we get notified of accesses to unwritten areas */
>         if (qemu_ram_foreach_migratable_block(ram_block_enable_notify, mis)) {
>             error_report("ram_block_enable_notify failed");
>             return -1;
>         }
>
> Do we leak mis->userfault_fd, mis->userfault_event_fd,
> mis->fault_thread_sem here?
Actually, patch 5/7 fixes this: we leave the cleanup to
postcopy_ram_incoming_cleanup() when enabling the notification fails here.
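
The point of deferring to a single cleanup routine is ordering: a running
thread may still be polling the fds, so they can only be closed after that
thread has been told to quit and has been joined.  Here is a minimal
sketch of that ordering using plain pthreads and a pipe.  The struct
worker, worker_start() and worker_stop() names are hypothetical and are
not the postcopy code.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical state, loosely mirroring a thread that polls an fd. */
struct worker {
    pthread_t thread;
    int quit_fd[2]; /* pipe: [0] is polled by the thread, [1] notifies it */
    int saw_quit;   /* set by the thread once it has been notified */
};

static void *worker_fn(void *opaque)
{
    struct worker *w = opaque;
    struct pollfd pfd = { .fd = w->quit_fd[0], .events = POLLIN };
    int rc;

    /* Block until the owner asks us to quit; retry if interrupted. */
    do {
        rc = poll(&pfd, 1, -1);
    } while (rc < 0 && errno == EINTR);
    w->saw_quit = (rc > 0);
    return NULL;
}

static int worker_start(struct worker *w)
{
    assert(w != NULL);
    w->saw_quit = 0;
    if (pipe(w->quit_fd) < 0) {
        return -1;
    }
    if (pthread_create(&w->thread, NULL, worker_fn, w) != 0) {
        /* No thread exists yet, so closing right away is safe here. */
        close(w->quit_fd[0]);
        close(w->quit_fd[1]);
        return -1;
    }
    return 0;
}

static int worker_stop(struct worker *w)
{
    char c = 'q';

    if (write(w->quit_fd[1], &c, 1) != 1) {   /* 1. notify the thread */
        return -1;
    }
    if (pthread_join(w->thread, NULL) != 0) { /* 2. wait for it to exit */
        return -1;
    }
    close(w->quit_fd[0]);                     /* 3. only now close the fds */
    close(w->quit_fd[1]);
    return 0;
}
```

Note the asymmetry: when thread creation itself fails there is nothing
polling yet, so the immediate close() in worker_start() is fine.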
Looking back at the history, I mistakenly called close() on these fds
right here, but David corrected me. The following is quoted from his
earlier comment:
"
I don't think these close() calls are safe.  This code is just after
starting the fault thread, and the fault thread has a poll() call on
these fd's, so we can't close them until we've instructed that thread
to exit.

We should fall out through postcopy_ram_incoming_cleanup, and because
the thread exists it should do a notify to the thread, a join and then
only later do the close calls.
"
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 658dfa88a3..6e0cccf066 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -473,6 +473,7 @@ static void compress_threads_save_cleanup(void)
>>   static int compress_threads_save_setup(void)
>>   {
>>       int i, thread_count;
>> +    Error *local_err = NULL;
>>   
>>       if (!migrate_use_compression()) {
>>           return 0;
>> @@ -502,9 +503,12 @@ static int compress_threads_save_setup(void)
>>           comp_param[i].quit = false;
>>           qemu_mutex_init(&comp_param[i].mutex);
>>           qemu_cond_init(&comp_param[i].cond);
>> -        qemu_thread_create(compress_threads + i, "compress",
>> -                           do_data_compress, comp_param + i,
>> -                           QEMU_THREAD_JOINABLE);
>> +        if (!qemu_thread_create(compress_threads + i, "compress",
>> +                                do_data_compress, comp_param + i,
>> +                                QEMU_THREAD_JOINABLE, &local_err)) {
>> +            error_reportf_err(local_err, "failed to create do_data_compress: ");
>> +            goto exit;
>> +        }
>>       }
>>       return 0;
>>   
> Reviewing the migration changes is getting tiresome...
Yes, indeed, migration involves a lot! Thanks so much for helping to
review!
>   Is reporting the
> error appropriate here, and why?
I think the QEMU monitor should show administrators the exact reason for
the failure, especially since qemu_thread_create() itself does not print
any message, so without gdb we have no idea which direct caller failed.
IOW, I think David's answer about the ppc error_reportf_err() also
applies here:

"The error returns are for the guest, the reported errors are for the
guest administrator or management layers."

>
>> @@ -1075,8 +1079,14 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>>           p->c = QIO_CHANNEL(sioc);
>>           qio_channel_set_delay(p->c, false);
>>           p->running = true;
>> -        qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
>> -                           QEMU_THREAD_JOINABLE);
>> +        if (!qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
>> +                                QEMU_THREAD_JOINABLE, &local_err)) {
>> +            migrate_set_error(migrate_get_current(), local_err);
>> +            error_reportf_err(local_err,
>> +                              "failed to create multifd_send_thread: ");
>> +            multifd_save_cleanup();
>> +            return;
>> +        }
>>   
>>           atomic_inc(&multifd_send_state->count);
>>       }
> Same question.
>
>> @@ -1350,8 +1360,13 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
>>       p->num_packets = 1;
>>   
>>       p->running = true;
>> -    qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p,
>> -                       QEMU_THREAD_JOINABLE);
>> +    if (!qemu_thread_create(&p->thread, p->name, multifd_recv_thread,
>> +                            p, QEMU_THREAD_JOINABLE, &local_err)) {
>> +        error_propagate_prepend(errp, local_err,
>> +                                "failed to create multifd_recv_thread: ");
>> +        multifd_recv_terminate_threads(local_err);
>> +        return false;
>> +    }
>>       atomic_inc(&multifd_recv_state->count);
>>       return atomic_read(&multifd_recv_state->count) ==
>>              migrate_multifd_channels();
>> @@ -3617,6 +3632,7 @@ static void compress_threads_load_cleanup(void)
>>   static int compress_threads_load_setup(QEMUFile *f)
>>   {
>>       int i, thread_count;
>> +    Error *local_err = NULL;
>>   
>>       if (!migrate_use_compression()) {
>>           return 0;
>> @@ -3638,9 +3654,13 @@ static int compress_threads_load_setup(QEMUFile *f)
>>           qemu_cond_init(&decomp_param[i].cond);
>>           decomp_param[i].done = true;
>>           decomp_param[i].quit = false;
>> -        qemu_thread_create(decompress_threads + i, "decompress",
>> -                           do_data_decompress, decomp_param + i,
>> -                           QEMU_THREAD_JOINABLE);
>> +        if (!qemu_thread_create(decompress_threads + i, "decompress",
>> +                                do_data_decompress, decomp_param + i,
>> +                                QEMU_THREAD_JOINABLE, &local_err)) {
>> +            error_reportf_err(local_err,
>> +                              "failed to create do_data_decompress: ");
>> +            goto exit;
>> +        }
>>       }
>>       return 0;
>>   exit:
> Same question.
>
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index d784e8aa40..b8bdcde5d8 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -1747,9 +1747,14 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
>>       mis->have_listen_thread = true;
>>       /* Start up the listening thread and wait for it to signal ready */
>>       qemu_sem_init(&mis->listen_thread_sem, 0);
>> -    qemu_thread_create(&mis->listen_thread, "postcopy/listen",
>> -                       postcopy_ram_listen_thread, NULL,
>> -                       QEMU_THREAD_DETACHED);
>> +    if (!qemu_thread_create(&mis->listen_thread, "postcopy/listen",
>> +                            postcopy_ram_listen_thread, NULL,
>> +                            QEMU_THREAD_DETACHED, &local_err)) {
>> +        error_reportf_err(local_err,
>> +                          "failed to create postcopy_ram_listen_thread: ");
>> +        qemu_sem_destroy(&mis->listen_thread_sem);
>> +        return -1;
>> +    }
>>       qemu_sem_wait(&mis->listen_thread_sem);
>>       qemu_sem_destroy(&mis->listen_thread_sem);
>>   
> This is a caller that reports errors, then returns failure.  You make it
> handle qemu_thread_create() the same way.  Good.
>
> I'll stop commenting on this pattern now.
Thanks.
>> diff --git a/tests/atomic_add-bench.c b/tests/atomic_add-bench.c
>> index 2f6c72f63a..338b9563e3 100644
>> --- a/tests/atomic_add-bench.c
>> +++ b/tests/atomic_add-bench.c
>> @@ -2,6 +2,7 @@
>>   #include "qemu/thread.h"
>>   #include "qemu/host-utils.h"
>>   #include "qemu/processor.h"
>> +#include "qapi/error.h"
>>   
>>   struct thread_info {
>>       uint64_t r;
>> @@ -110,7 +111,7 @@ static void create_threads(void)
>>   
>>           info->r = (i + 1) ^ time(NULL);
>>           qemu_thread_create(&threads[i], NULL, thread_func, info,
>> -                           QEMU_THREAD_JOINABLE);
>> +                           QEMU_THREAD_JOINABLE, &error_abort);
>>       }
>>   }
... snip the remaining tests/xxx.c hunks, as they all just pass &error_abort ...
>>   
>> diff --git a/ui/vnc-jobs.c b/ui/vnc-jobs.c
>> index 929391f85d..35a652d1fd 100644
>> --- a/ui/vnc-jobs.c
>> +++ b/ui/vnc-jobs.c
>> @@ -31,6 +31,7 @@
>>   #include "vnc-jobs.h"
>>   #include "qemu/sockets.h"
>>   #include "qemu/main-loop.h"
>> +#include "qapi/error.h"
>>   #include "block/aio.h"
>>   
>>   /*
>> @@ -331,15 +332,21 @@ static bool vnc_worker_thread_running(void)
>>       return queue; /* Check global queue */
>>   }
>>   
>> -void vnc_start_worker_thread(void)
>> +bool vnc_start_worker_thread(Error **errp)
>>   {
>>       VncJobQueue *q;
>>   
>> -    if (vnc_worker_thread_running())
>> -        return ;
>> +    if (vnc_worker_thread_running()) {
>> +        goto out;
>> +    }
>>   
>>       q = vnc_queue_init();
>> -    qemu_thread_create(&q->thread, "vnc_worker", vnc_worker_thread, q,
>> -                       QEMU_THREAD_DETACHED);
>> +    if (!qemu_thread_create(&q->thread, "vnc_worker", vnc_worker_thread,
>> +                            q, QEMU_THREAD_DETACHED, errp)) {
>> +        vnc_queue_clear(q);
>> +        return false;
>> +    }
>>       queue = q; /* Set global queue */
>> +out:
>> +    return true;
>>   }
> I recommend to pass &error_abort to qemu_thread_create() in this patch,
> then convert vnc_start_worker_thread() to Error in a subsequent patch.
Ok, thanks! This makes this patch shorter. :)
BTW, would it be better to add a "TODO: xxx" comment before the
&error_abort in this patch, and remove it in the subsequent patch?
If that is ok, I will add the same comment for touch_all_pages() later on.
>> diff --git a/ui/vnc-jobs.h b/ui/vnc-jobs.h
>> index 59f66bcc35..14640593db 100644
>> --- a/ui/vnc-jobs.h
>> +++ b/ui/vnc-jobs.h
>> @@ -37,7 +37,7 @@ void vnc_job_push(VncJob *job);
>>   void vnc_jobs_join(VncState *vs);
>>   
>>   void vnc_jobs_consume_buffer(VncState *vs);
>> -void vnc_start_worker_thread(void);
>> +bool vnc_start_worker_thread(Error **errp);
>>   
>>   /* Locks */
>>   static inline int vnc_trylock_display(VncDisplay *vd)
>> diff --git a/ui/vnc.c b/ui/vnc.c
>> index 0c1b477425..0ffe9e6a5d 100644
>> --- a/ui/vnc.c
>> +++ b/ui/vnc.c
>> @@ -3236,7 +3236,9 @@ void vnc_display_init(const char *id, Error **errp)
>>       vd->connections_limit = 32;
>>   
>>       qemu_mutex_init(&vd->mutex);
>> -    vnc_start_worker_thread();
>> +    if (!vnc_start_worker_thread(errp)) {
>> +        return;
>> +    }
>>   
>>       vd->dcl.ops = &dcl_ops;
>>       register_displaychangelistener(&vd->dcl);
> These two hunks then also go into the subsequent patch.
Ok.
>> diff --git a/util/compatfd.c b/util/compatfd.c
>> index 980bd33e52..886aa249f9 100644
>> --- a/util/compatfd.c
>> +++ b/util/compatfd.c
>> @@ -16,6 +16,7 @@
>>   #include "qemu/osdep.h"
>>   #include "qemu-common.h"
>>   #include "qemu/thread.h"
>> +#include "qapi/error.h"
>>   
>>   #include <sys/syscall.h>
>>   
>> @@ -70,6 +71,7 @@ static int qemu_signalfd_compat(const sigset_t *mask)
>>       struct sigfd_compat_info *info;
>>       QemuThread thread;
>>       int fds[2];
>> +    Error *local_err = NULL;
>>   
>>       info = malloc(sizeof(*info));
>>       if (info == NULL) {
>> @@ -88,8 +90,14 @@ static int qemu_signalfd_compat(const sigset_t *mask)
>>       memcpy(&info->mask, mask, sizeof(*mask));
>>       info->fd = fds[1];
>>   
>> -    qemu_thread_create(&thread, "signalfd_compat", sigwait_compat, info,
>> -                       QEMU_THREAD_DETACHED);
>> +    if (!qemu_thread_create(&thread, "signalfd_compat", sigwait_compat,
>> +                            info, QEMU_THREAD_DETACHED, &local_err)) {
>> +        error_reportf_err(local_err, "failed to create sigwait_compat: ");
>> +        close(fds[0]);
>> +        close(fds[1]);
>> +        free(info);
>> +        return -1;
>> +    }
>>   
>>       return fds[0];
>>   }
> This function implements signalfd() when the kernel doesn't provide
> it.
>
> signalfd() sets errno on failure.  The replacement's existing failure
> modes set errno.  You add a failure mode that doesn't set errno.  That's
> a bug.  To fix it, you can either make qemu_thread_create() set errno,
> or you can make it return a value you can use to set errno.  The common
> way to do the latter is returning a *negated* errno value.
Oops, I forgot to set errno in the Linux implementation! My fault.
I will set errno inside qemu_thread_create() as follows:
      err = pthread_attr_init(&attr);
      if (err) {
-        error_setg_errno(errp, -err, "pthread_attr_init failed: %s",
-                         strerror(err));
+        errno = err;
+        error_setg_errno(errp, errno, "pthread_attr_init failed");
          return false;
      }

> signalfd() doesn't print anything on failure.  The replacement's
> existing failure modes don't print anything.  You add a failure mode
> that does print.  I think it shouldn't.
Ok, I will remove it. Thanks!
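
For reference, a minimal sketch of the other option mentioned above
(returning a negated errno value and letting the wrapper set errno).
All names here (fake_pthread_create, start_worker, signalfd_like) are
hypothetical stand-ins, not QEMU code; the only fact relied on is that
pthread functions return a positive error number and do not set errno.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for pthread_create(): like the real pthread
 * calls, it returns 0 on success or a positive error number, and it
 * does not touch errno.
 */
static int fake_pthread_create(bool simulate_failure)
{
    return simulate_failure ? EAGAIN : 0;
}

/* Returns 0 on success, a negated errno value on failure. */
static int start_worker(bool simulate_failure)
{
    int err = fake_pthread_create(simulate_failure);

    if (err) {
        return -err;
    }
    return 0;
}

/*
 * signalfd()-style wrapper: on failure it returns -1 and sets errno,
 * and it prints nothing, matching the existing failure modes.
 */
static int signalfd_like(bool simulate_failure)
{
    int ret = start_worker(simulate_failure);

    if (ret < 0) {
        errno = -ret;
        return -1;
    }
    return 3; /* placeholder for a real file descriptor */
}
```

Either way, the point is that the caller of the signalfd() replacement
sees the same contract on every failure path: -1 with errno set.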
>> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
>> index c1bee2a581..2c779fd634 100644
>> --- a/util/oslib-posix.c
>> +++ b/util/oslib-posix.c
>> @@ -437,9 +437,12 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
>>       size_t size_per_thread;
>>       char *addr = area;
>>       int i = 0;
>> +    int started_thread = 0;
>> +    Error *local_err = NULL;
>>   
>>       memset_thread_failed = false;
>>       memset_num_threads = get_memset_num_threads(smp_cpus);
>> +    started_thread = memset_num_threads;
>>       memset_thread = g_new0(MemsetThread, memset_num_threads);
>>       numpages_per_thread = (numpages / memset_num_threads);
>>       size_per_thread = (hpagesize * numpages_per_thread);
>> @@ -448,13 +451,19 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
>>           memset_thread[i].numpages = (i == (memset_num_threads - 1)) ?
>>                                       numpages : numpages_per_thread;
>>           memset_thread[i].hpagesize = hpagesize;
>> -        qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
>> -                           do_touch_pages, &memset_thread[i],
>> -                           QEMU_THREAD_JOINABLE);
>> +        if (!qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
>> +                                do_touch_pages, &memset_thread[i],
>> +                                QEMU_THREAD_JOINABLE, &local_err)) {
>> +            error_reportf_err(local_err, "failed to create do_touch_pages: ");
>> +            memset_thread_failed = true;
>> +            started_thread = i;
>> +            goto out;
>> +        }
>>           addr += size_per_thread;
>>           numpages -= numpages_per_thread;
>>       }
>> -    for (i = 0; i < memset_num_threads; i++) {
>> +out:
>> +    for (i = 0; i < started_thread; i++) {
>>           qemu_thread_join(&memset_thread[i].pgthread);
>>       }
>>       g_free(memset_thread);
> You need to convert this function to Error instead, because its caller
> os_mem_prealloc() sets an error on failure.  I recommend to pass
> &error_abort in this patch, and convert to Error in a subsequent patch.
Ok, thanks for the advice.
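
As a note on the cleanup logic in the hunk above: the idea is to join
only the threads that were actually started. A minimal sketch of that
idea, using hypothetical stubs (stub_create, stub_join) in place of
qemu_thread_create() and qemu_thread_join() so that it stays
self-contained:

```c
#include <stdbool.h>

static int joined; /* number of stub_join() calls, for illustration */

/* Hypothetical stub: creation fails only for index == fail_at. */
static bool stub_create(int index, int fail_at)
{
    return index != fail_at;
}

/* Hypothetical stub: just counts how many threads were joined. */
static void stub_join(int index)
{
    (void)index;
    joined++;
}

/* Start num_threads workers; on failure, join only the started ones. */
static bool touch_pages_sketch(int num_threads, int fail_at)
{
    int started = 0;
    bool ok = true;

    for (int i = 0; i < num_threads; i++) {
        if (!stub_create(i, fail_at)) {
            ok = false;
            break;
        }
        started++;
    }
    /* Threads with index >= started never ran, so they are not joined. */
    for (int i = 0; i < started; i++) {
        stub_join(i);
    }
    return ok;
}
```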
>> diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
>> index 865e476df5..81b40a1ece 100644
>> --- a/util/qemu-thread-posix.c
>> +++ b/util/qemu-thread-posix.c
>> @@ -15,6 +15,7 @@
>>   #include "qemu/atomic.h"
>>   #include "qemu/notify.h"
>>   #include "qemu-thread-common.h"
>> +#include "qapi/error.h"
>>   
>>   static bool name_threads;
>>   
>> @@ -500,9 +501,9 @@ static void *qemu_thread_start(void *args)
>>       return r;
>>   }
>>   
>> -void qemu_thread_create(QemuThread *thread, const char *name,
>> -                       void *(*start_routine)(void*),
>> -                       void *arg, int mode)
>> +bool qemu_thread_create(QemuThread *thread, const char *name,
>> +                        void *(*start_routine)(void *),
>> +                        void *arg, int mode, Error **errp)
>>   {
>>       sigset_t set, oldset;
>>       int err;
>> @@ -511,7 +512,9 @@ void qemu_thread_create(QemuThread *thread, const char *name,
>>   
>>       err = pthread_attr_init(&attr);
>>       if (err) {
>> -        error_exit(err, __func__);
>> +        error_setg_errno(errp, -err, "pthread_attr_init failed: %s",
>> +                         strerror(err));
>> +        return false;
>>       }
>>   
>>       if (mode == QEMU_THREAD_DETACHED) {
>> @@ -526,16 +529,21 @@ void qemu_thread_create(QemuThread *thread, const char *name,
>>       qemu_thread_args->name = g_strdup(name);
>>       qemu_thread_args->start_routine = start_routine;
>>       qemu_thread_args->arg = arg;
>> -
> Let's keep the blank line.
ok.

Thanks so much for the review! Have a nice day. :)
Fei
>>       err = pthread_create(&thread->thread, &attr,
>>                            qemu_thread_start, qemu_thread_args);
>> -
>> -    if (err)
>> -        error_exit(err, __func__);
>> +    if (err) {
>> +        error_setg_errno(errp, -err, "pthread_create failed: %s",
>> +                         strerror(err));
>> +        pthread_attr_destroy(&attr);
>> +        g_free(qemu_thread_args->name);
>> +        g_free(qemu_thread_args);
>> +        return false;
>> +    }
>>   
>>       pthread_sigmask(SIG_SETMASK, &oldset, NULL);
>>   
>>       pthread_attr_destroy(&attr);
>> +    return true;
>>   }
>>   
>>   void qemu_thread_get_self(QemuThread *thread)
>> diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
>> index 4a363ca675..57b1143e97 100644
>> --- a/util/qemu-thread-win32.c
>> +++ b/util/qemu-thread-win32.c
>> @@ -20,6 +20,7 @@
>>   #include "qemu/thread.h"
>>   #include "qemu/notify.h"
>>   #include "qemu-thread-common.h"
>> +#include "qapi/error.h"
>>   #include <process.h>
>>   
>>   static bool name_threads;
>> @@ -388,9 +389,9 @@ void *qemu_thread_join(QemuThread *thread)
>>       return ret;
>>   }
>>   
>> -void qemu_thread_create(QemuThread *thread, const char *name,
>> -                       void *(*start_routine)(void *),
>> -                       void *arg, int mode)
>> +bool qemu_thread_create(QemuThread *thread, const char *name,
>> +                        void *(*start_routine)(void *),
>> +                        void *arg, int mode, Error **errp)
>>   {
>>       HANDLE hThread;
>>       struct QemuThreadData *data;
>> @@ -409,10 +410,17 @@ void qemu_thread_create(QemuThread *thread, const char *name,
>>       hThread = (HANDLE) _beginthreadex(NULL, 0, win32_start_routine,
>>                                         data, 0, &thread->tid);
>>       if (!hThread) {
>> -        error_exit(GetLastError(), __func__);
>> +        if (data->mode != QEMU_THREAD_DETACHED) {
>> +            DeleteCriticalSection(&data->cs);
>> +        }
>> +        error_setg_errno(errp, errno,
>> +                         "failed to create win32_start_routine");
>> +        g_free(data);
>> +        return false;
>>       }
>>       CloseHandle(hThread);
>>       thread->data = data;
>> +    return true;
>>   }
>>   
>>   void qemu_thread_get_self(QemuThread *thread)
>> diff --git a/util/rcu.c b/util/rcu.c
>> index 5676c22bd1..145dcdb0c6 100644
>> --- a/util/rcu.c
>> +++ b/util/rcu.c
>> @@ -32,6 +32,7 @@
>>   #include "qemu/atomic.h"
>>   #include "qemu/thread.h"
>>   #include "qemu/main-loop.h"
>> +#include "qapi/error.h"
>>   #if defined(CONFIG_MALLOC_TRIM)
>>   #include <malloc.h>
>>   #endif
>> @@ -325,7 +326,7 @@ static void rcu_init_complete(void)
>>        * must have been quiescent even after forking, just recreate it.
>>        */
>>       qemu_thread_create(&thread, "call_rcu", call_rcu_thread,
>> -                       NULL, QEMU_THREAD_DETACHED);
>> +                       NULL, QEMU_THREAD_DETACHED, &error_abort);
>>   
>>       rcu_register_thread();
>>   }
>> diff --git a/util/thread-pool.c b/util/thread-pool.c
>> index 610646d131..ad0f980783 100644
>> --- a/util/thread-pool.c
>> +++ b/util/thread-pool.c
>> @@ -22,6 +22,7 @@
>>   #include "trace.h"
>>   #include "block/thread-pool.h"
>>   #include "qemu/main-loop.h"
>> +#include "qapi/error.h"
>>   
>>   static void do_spawn_thread(ThreadPool *pool);
>>   
>> @@ -132,7 +133,8 @@ static void do_spawn_thread(ThreadPool *pool)
>>       pool->new_threads--;
>>       pool->pending_threads++;
>>   
>> -    qemu_thread_create(&t, "worker", worker_thread, pool, QEMU_THREAD_DETACHED);
>> +    qemu_thread_create(&t, "worker", worker_thread, pool,
>> +                       QEMU_THREAD_DETACHED, &error_abort);
>>   }
>>   
>>   static void spawn_thread_bh_fn(void *opaque)
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 v8 3/7] migration: fix the multifd code when receiving less channels
  2018-12-13  6:17   ` Markus Armbruster
@ 2018-12-17 11:45     ` Fei Li
  2018-12-19 14:11       ` Markus Armbruster
  0 siblings, 1 reply; 27+ messages in thread
From: Fei Li @ 2018-12-17 11:45 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, Dr . David Alan Gilbert



On 12/13/2018 02:17 PM, Markus Armbruster wrote:
> Fei Li <fli@suse.com> writes:
>
>> In our current code, when multifd is used during migration, if there
>> is an error before the destination receives all new channels, the
>> source keeps running, however the destination does not exit but keeps
>> waiting until the source is killed deliberately.
>>
>> Fix this by dumping the specific error and let users decide whether
>> to quit from the destination side when failing to receive packet via
>> some channel.
>>
>> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> Signed-off-by: Fei Li <fli@suse.com>
>> Reviewed-by: Peter Xu <peterx@redhat.com>
>> ---
>>   migration/channel.c   | 11 ++++++-----
>>   migration/migration.c |  9 +++++++--
>>   migration/migration.h |  2 +-
>>   migration/ram.c       | 10 ++++++++--
>>   migration/ram.h       |  2 +-
>>   5 files changed, 23 insertions(+), 11 deletions(-)
>>
>> diff --git a/migration/channel.c b/migration/channel.c
>> index 33e0e9b82f..20e4c8e2dc 100644
>> --- a/migration/channel.c
>> +++ b/migration/channel.c
>> @@ -30,6 +30,7 @@
>>   void migration_channel_process_incoming(QIOChannel *ioc)
>>   {
>>       MigrationState *s = migrate_get_current();
>> +    Error *local_err = NULL;
>>   
>>       trace_migration_set_incoming_channel(
>>           ioc, object_get_typename(OBJECT(ioc)));
>> @@ -38,13 +39,13 @@ void migration_channel_process_incoming(QIOChannel *ioc)
>>           *s->parameters.tls_creds &&
>>           !object_dynamic_cast(OBJECT(ioc),
>>                                TYPE_QIO_CHANNEL_TLS)) {
>> -        Error *local_err = NULL;
>>           migration_tls_channel_process_incoming(s, ioc, &local_err);
>> -        if (local_err) {
>> -            error_report_err(local_err);
>> -        }
>>       } else {
>> -        migration_ioc_process_incoming(ioc);
>> +        migration_ioc_process_incoming(ioc, &local_err);
>> +    }
>> +
>> +    if (local_err) {
>> +        error_report_err(local_err);
>>       }
>>   }
>>   
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 49ffb9997a..72106bddf0 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -541,7 +541,7 @@ void migration_fd_process_incoming(QEMUFile *f)
>>       migration_incoming_process();
>>   }
>>   
>> -void migration_ioc_process_incoming(QIOChannel *ioc)
>> +void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
>>   {
>>       MigrationIncomingState *mis = migration_incoming_get_current();
>>       bool start_migration;
>> @@ -563,9 +563,14 @@ void migration_ioc_process_incoming(QIOChannel *ioc)
>>            */
>>           start_migration = !migrate_use_multifd();
>>       } else {
>> +        Error *local_err = NULL;
>>           /* Multiple connections */
>>           assert(migrate_use_multifd());
>> -        start_migration = multifd_recv_new_channel(ioc);
>> +        start_migration = multifd_recv_new_channel(ioc, &local_err);
>> +        if (local_err) {
>> +            error_propagate(errp, local_err);
>> +            return;
>> +        }
>>       }
>>   
>>       if (start_migration) {
>> diff --git a/migration/migration.h b/migration/migration.h
>> index e413d4d8b6..02b7304610 100644
>> --- a/migration/migration.h
>> +++ b/migration/migration.h
>> @@ -229,7 +229,7 @@ struct MigrationState
>>   void migrate_set_state(int *state, int old_state, int new_state);
>>   
>>   void migration_fd_process_incoming(QEMUFile *f);
>> -void migration_ioc_process_incoming(QIOChannel *ioc);
>> +void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
>>   void migration_incoming_process(void);
>>   
>>   bool  migration_has_all_channels(void);
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 7e7deec4d8..c7e3d6b0fd 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1323,7 +1323,7 @@ bool multifd_recv_all_channels_created(void)
>>   }
>>   
>>   /* Return true if multifd is ready for the migration, otherwise false */
>> -bool multifd_recv_new_channel(QIOChannel *ioc)
>> +bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
>>   {
>>       MultiFDRecvParams *p;
>>       Error *local_err = NULL;
>> @@ -1331,6 +1331,10 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>>   
>>       id = multifd_recv_initial_packet(ioc, &local_err);
>>       if (id < 0) {
>> +        error_propagate_prepend(errp, local_err,
>> +                                "failed to receive packet"
>> +                                " via multifd channel %d: ",
>> +                                atomic_read(&multifd_recv_state->count));
>>           multifd_recv_terminate_threads(local_err);
>>           return false;
> Here, we return false without setting an error.
I am not sure whether I understand correctly, but I think the
error_propagate_prepend() call above does set the error in errp.
>>       }
>> @@ -1340,6 +1344,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>>           error_setg(&local_err, "multifd: received id '%d' already setup'",
>>                      id);
>>           multifd_recv_terminate_threads(local_err);
>> +        error_propagate(errp, local_err);
>>           return false;
> Here, we return false with setting an error.
>
>>       }
>>       p->c = ioc;
>> @@ -1351,7 +1356,8 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>>       qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p,
>>                          QEMU_THREAD_JOINABLE);
>>       atomic_inc(&multifd_recv_state->count);
>> -    return multifd_recv_state->count == migrate_multifd_channels();
>> +    return atomic_read(&multifd_recv_state->count) ==
>> +           migrate_multifd_channels();
> Here, we return either true or false without setting an error.
yes.
> Taken together, there are three cases:
>
> 1. Succeed and return true
Yes, when all multifd channels are correctly received.
> 2. Succeed and return false
Yes, when the current multifd channel is received correctly, but
not all the channels have been received yet.
> 3. Fail (set an error) and return false.
Yes. And with the propagated error, the code just returns and
reports the error in migration_channel_process_incoming().
> Assuming that's what we want: please update the function comment to
> spell them out.
Ok, I will update the function comment to spell out the three cases.
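
To make the three cases concrete, here is a minimal sketch. It uses a
hypothetical, much simplified stand-in (SketchError) for QEMU's Error
object and a made-up recv_new_channel(); it only illustrates the
return-value contract, not the real multifd code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for QEMU's Error object. */
typedef struct {
    const char *msg;
} SketchError;

enum { CHANNELS_EXPECTED = 2 };
static int channels_received;

/*
 * Returns true when all channels have been received.
 * Returns false without setting an error when this channel was
 * accepted but more channels are still expected.
 * Returns false and sets *errp when the channel id is invalid.
 */
static bool recv_new_channel(int id, SketchError **errp)
{
    static SketchError bad_id = { "invalid channel id" };

    if (id < 0 || id >= CHANNELS_EXPECTED) {
        if (errp) {
            *errp = &bad_id;
        }
        return false;
    }
    channels_received++;
    return channels_received == CHANNELS_EXPECTED;
}
```

Because false covers both "not ready yet" and "failed", the caller has
to check the error object, not the return value, to detect failure.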

Have a nice day, thanks :)
Fei
>
>>   }
>>   
>>   /**
>> diff --git a/migration/ram.h b/migration/ram.h
>> index 83ff1bc11a..046d3074be 100644
>> --- a/migration/ram.h
>> +++ b/migration/ram.h
>> @@ -47,7 +47,7 @@ int multifd_save_cleanup(Error **errp);
>>   int multifd_load_setup(void);
>>   int multifd_load_cleanup(Error **errp);
>>   bool multifd_recv_all_channels_created(void);
>> -bool multifd_recv_new_channel(QIOChannel *ioc);
>> +bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
>>   
>>   uint64_t ram_pagesize_summary(void);
>>   int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 v8 6/7] qemu_thread_create: propagate the error to callers to handle
  2018-12-17  7:29     ` Fei Li
@ 2018-12-18 12:40       ` Fei Li
  2018-12-19 10:11         ` Markus Armbruster
  2018-12-19 10:10       ` Markus Armbruster
  1 sibling, 1 reply; 27+ messages in thread
From: Fei Li @ 2018-12-18 12:40 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: David Gibson, qemu-devel, Dr . David Alan Gilbert



On 12/17/2018 03:29 PM, Fei Li wrote:
>
>
> On 12/13/2018 03:26 PM, Markus Armbruster wrote:
>> There's a question for David Gibson inline.  Please search for /ppc/.
>>
>> Fei Li <fli@suse.com> writes:
>>
>>> Make qemu_thread_create() return a Boolean to indicate if it succeeds
>>> rather than failing with an error. And add an Error parameter to hold
>>> the error message and let the callers handle it.
>> The "rather than failing with an error" is misleading.  Before the
>> patch, we report to stderr and abort().  What about:
>>
>>      qemu-thread: Make qemu_thread_create() handle errors properly
>>
>>      qemu_thread_create() abort()s on error.  Not nice.  Give it a
>>      return value and an Error ** argument, so it can return success /
>>      failure.
> A nice commit-amend! Thanks!
>> Still missing from the commit message then: how you update the callers.
> Yes, agreed. I think the how should also be noted here, like
> - propagating the error to callers whose call chain already has the
> Error parameter;
> - just adding &error_abort for qemu_thread_create() and marking it
> "TODO: xxx";
>> Let's see below.
Following your comments and suggestions below, I have drafted the
second paragraph of the commit message; please help to review,
thanks. :)

/* ...The first paragraph and the middle blank... */
And let's update qemu_thread_create()'s callers by
- setting an error on qemu_thread_create() failure for callers that
   set an error on failure;
- reporting the error and returning failure for callers that return
   an error code on failure;
- reporting the error and setting some state for callers that just
   report errors and choose not to continue;
- passing &error_abort to qemu_thread_create() for callers that
   can't return failure, and marking them "TODO: " for a later change.
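
A minimal sketch of the first two patterns in the list above, for
illustration only. SketchError, sketch_thread_create(), start_vcpu()
and prepare_resize() are hypothetical stand-ins, not the real QEMU
functions or the real Error API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for QEMU's Error object. */
typedef struct {
    const char *msg;
} SketchError;

/* Stand-in for qemu_thread_create(): false plus an error on failure. */
static bool sketch_thread_create(bool fail, SketchError **errp)
{
    static SketchError e = { "thread creation failed" };

    if (fail) {
        if (errp) {
            *errp = &e;
        }
        return false;
    }
    return true;
}

/* Caller that already sets an error on failure: pass errp through. */
static void start_vcpu(bool fail, SketchError **errp)
{
    if (!sketch_thread_create(fail, errp)) {
        return;
    }
    /* further setup would go here */
}

/* Caller that returns an error code: report, then return failure. */
static int prepare_resize(bool fail)
{
    SketchError *local_err = NULL;

    if (!sketch_thread_create(fail, &local_err)) {
        fprintf(stderr, "failed to create thread: %s\n", local_err->msg);
        return -1;
    }
    return 0;
}
```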

Have a nice day
Fei

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 v8 6/7] qemu_thread_create: propagate the error to callers to handle
  2018-12-14  0:24     ` David Gibson
@ 2018-12-19  9:29       ` Markus Armbruster
  2019-01-02  2:29         ` David Gibson
  0 siblings, 1 reply; 27+ messages in thread
From: Markus Armbruster @ 2018-12-19  9:29 UTC (permalink / raw)
  To: David Gibson; +Cc: Fei Li, qemu-devel, Dr . David Alan Gilbert

David Gibson <dgibson@redhat.com> writes:

> On Thu, 13 Dec 2018 08:26:48 +0100
> Markus Armbruster <armbru@redhat.com> wrote:
>
>> There's a question for David Gibson inline.  Please search for /ppc/.
>> 
>> Fei Li <fli@suse.com> writes:
>> 
>> > Make qemu_thread_create() return a Boolean to indicate if it succeeds
>> > rather than failing with an error. And add an Error parameter to hold
>> > the error message and let the callers handle it.  
>> 
>> The "rather than failing with an error" is misleading.  Before the
>> patch, we report to stderr and abort().  What about:
>> 
>>     qemu-thread: Make qemu_thread_create() handle errors properly
>> 
>>     qemu_thread_create() abort()s on error.  Not nice.  Give it a
>>     return value and an Error ** argument, so it can return success /
>>     failure.
>> 
>> Still missing from the commit message then: how you update the callers.
>> Let's see below.
>
> [snip]
>> > --- a/hw/ppc/spapr_hcall.c
>> > +++ b/hw/ppc/spapr_hcall.c
>> > @@ -478,6 +478,7 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
>> >      sPAPRPendingHPT *pending = spapr->pending_hpt;
>> >      uint64_t current_ram_size;
>> >      int rc;
>> > +    Error *local_err = NULL;
>> >  
>> >      if (spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED) {
>> >          return H_AUTHORITY;
>> > @@ -538,8 +539,13 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
>> >      pending->shift = shift;
>> >      pending->ret = H_HARDWARE;
>> >  
>> > -    qemu_thread_create(&pending->thread, "sPAPR HPT prepare",
>> > -                       hpt_prepare_thread, pending, QEMU_THREAD_DETACHED);
>> > +    if (!qemu_thread_create(&pending->thread, "sPAPR HPT prepare",
>> > +                            hpt_prepare_thread, pending,
>> > +                            QEMU_THREAD_DETACHED, &local_err)) {
>> > +        error_reportf_err(local_err, "failed to create hpt_prepare_thread: ");
>> > +        g_free(pending);
>> > +        return H_RESOURCE;
>> > +    }
>> >  
>> >      spapr->pending_hpt = pending;
>> >    
>> 
>> This is a caller that returns an error code on failure.  You change it
>> to report the error, then return failure.  The return failure part looks
>> fine.  Whether reporting the error is appropriate I can't say for sure.
>> No other failure mode reports anything.  David, what do you think?
>
> I think it's reasonable here.  In this context error returns and
> reported errors are for different audiences.  The error returns are for
> the guest, the reported errors are for the guest administrator or
> management layers.  This particular failure is essentially a host
> side fault that is mostly relevant to the VM management.  We have to
> say *something* to the guest to explain that the action couldn't go
> forward and H_RESOURCE makes as much sense as anything.

Double-checking: is it okay to report some failures of this function
(one of two H_RESOURCE failures, to be precise), but not others?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH for-4.0 v8 6/7] qemu_thread_create: propagate the error to callers to handle
  2018-12-17  7:29     ` Fei Li
  2018-12-18 12:40       ` Fei Li
@ 2018-12-19 10:10       ` Markus Armbruster
  2018-12-19 12:14         ` Fei Li
  1 sibling, 1 reply; 27+ messages in thread
From: Markus Armbruster @ 2018-12-19 10:10 UTC (permalink / raw)
  To: Fei Li; +Cc: David Gibson, qemu-devel, Dr . David Alan Gilbert

Fei Li <fli@suse.com> writes:

> On 12/13/2018 03:26 PM, Markus Armbruster wrote:
>> There's a question for David Gibson inline.  Please search for /ppc/.
>>
>> Fei Li <fli@suse.com> writes:
>>
>>> Make qemu_thread_create() return a Boolean to indicate if it succeeds
>>> rather than failing with an error. And add an Error parameter to hold
>>> the error message and let the callers handle it.
>> The "rather than failing with an error" is misleading.  Before the
>> patch, we report to stderr and abort().  What about:
>>
>>      qemu-thread: Make qemu_thread_create() handle errors properly
>>
>>      qemu_thread_create() abort()s on error.  Not nice.  Give it a
>>      return value and an Error ** argument, so it can return success /
>>      failure.
> A nice commit-amend! Thanks!
>> Still missing from the commit message then: how you update the callers.
> Yes, agreed. I think the how should also be noted here, like
> - propagating the error to callers whose call chain already has the
> Error parameter;
> - just adding &error_abort for qemu_thread_create() and marking it
> "TODO: xxx";
>> Let's see below.
>>
>>> Cc: Markus Armbruster <armbru@redhat.com>
>>> Cc: Daniel P. Berrangé <berrange@redhat.com>
>>> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>> Signed-off-by: Fei Li <fli@suse.com>
>>> ---
>>>   cpus.c                      | 45 ++++++++++++++++++++++++-------------
>>>   dump.c                      |  6 +++--
>>>   hw/misc/edu.c               |  6 +++--
>>>   hw/ppc/spapr_hcall.c        | 10 +++++++--
>>>   hw/rdma/rdma_backend.c      |  4 +++-
>>>   hw/usb/ccid-card-emulated.c | 16 ++++++++++----
>>>   include/qemu/thread.h       |  4 ++--
>>>   io/task.c                   |  3 ++-
>>>   iothread.c                  | 16 +++++++++-----
>>>   migration/migration.c       | 54 +++++++++++++++++++++++++++++----------------
>>>   migration/postcopy-ram.c    | 14 ++++++++++--
>>>   migration/ram.c             | 40 ++++++++++++++++++++++++---------
>>>   migration/savevm.c          | 11 ++++++---
>>>   tests/atomic_add-bench.c    |  3 ++-
>>>   tests/iothread.c            |  2 +-
>>>   tests/qht-bench.c           |  3 ++-
>>>   tests/rcutorture.c          |  3 ++-
>>>   tests/test-aio.c            |  2 +-
>>>   tests/test-rcu-list.c       |  3 ++-
>>>   ui/vnc-jobs.c               | 17 +++++++++-----
>>>   ui/vnc-jobs.h               |  2 +-
>>>   ui/vnc.c                    |  4 +++-
>>>   util/compatfd.c             | 12 ++++++++--
>>>   util/oslib-posix.c          | 17 ++++++++++----
>>>   util/qemu-thread-posix.c    | 24 +++++++++++++-------
>>>   util/qemu-thread-win32.c    | 16 ++++++++++----
>>>   util/rcu.c                  |  3 ++-
>>>   util/thread-pool.c          |  4 +++-
>>>   28 files changed, 243 insertions(+), 101 deletions(-)
>>>
>>> diff --git a/cpus.c b/cpus.c
>>> index 7b091bda53..e8450e518a 100644
>>> --- a/cpus.c
>>> +++ b/cpus.c
>>> @@ -1961,15 +1961,20 @@ static void qemu_tcg_init_vcpu(CPUState *cpu, Error **errp)
>>>               snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
>>>                    cpu->cpu_index);
>>>   -            qemu_thread_create(cpu->thread, thread_name,
>>> qemu_tcg_cpu_thread_fn,
>>> -                               cpu, QEMU_THREAD_JOINABLE);
>>> +            if (!qemu_thread_create(cpu->thread, thread_name,
>>> +                                    qemu_tcg_cpu_thread_fn, cpu,
>>> +                                    QEMU_THREAD_JOINABLE, errp)) {
>>> +                return;
>>> +            }
>>>             } else {
>>>               /* share a single thread for all cpus with TCG */
>>>               snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "ALL CPUs/TCG");
>>> -            qemu_thread_create(cpu->thread, thread_name,
>>> -                               qemu_tcg_rr_cpu_thread_fn,
>>> -                               cpu, QEMU_THREAD_JOINABLE);
>>> +            if (!qemu_thread_create(cpu->thread, thread_name,
>>> +                                    qemu_tcg_rr_cpu_thread_fn, cpu,
>>> +                                    QEMU_THREAD_JOINABLE, errp)) {
>>> +                return;
>>> +            }
>>>                 single_tcg_halt_cond = cpu->halt_cond;
>>>               single_tcg_cpu_thread = cpu->thread;
>> This is a caller that sets an error on failure.  You make it set an
>> error on qemu_thread_create() failure.  Makes sense.
> Thanks for the comment!
>>> @@ -1997,8 +2002,10 @@ static void qemu_hax_start_vcpu(CPUState *cpu, Error **errp)
>>>         snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HAX",
>>>                cpu->cpu_index);
>>> -    qemu_thread_create(cpu->thread, thread_name, qemu_hax_cpu_thread_fn,
>>> -                       cpu, QEMU_THREAD_JOINABLE);
>>> +    if (!qemu_thread_create(cpu->thread, thread_name, qemu_hax_cpu_thread_fn,
>>> +                            cpu, QEMU_THREAD_JOINABLE, errp)) {
>>> +        return;
>>> +    }
>>>   #ifdef _WIN32
>>>       cpu->hThread = qemu_thread_get_handle(cpu->thread);
>>>   #endif
>> Likewise.  I'll stop commenting on this pattern now.
>>
>>> @@ -2013,8 +2020,10 @@ static void qemu_kvm_start_vcpu(CPUState *cpu, Error **errp)
>>>       qemu_cond_init(cpu->halt_cond);
>>>       snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/KVM",
>>>                cpu->cpu_index);
>>> -    qemu_thread_create(cpu->thread, thread_name, qemu_kvm_cpu_thread_fn,
>>> -                       cpu, QEMU_THREAD_JOINABLE);
>>> +    if (!qemu_thread_create(cpu->thread, thread_name, qemu_kvm_cpu_thread_fn,
>>> +                            cpu, QEMU_THREAD_JOINABLE, errp)) {
>>> +        /* keep 'if' here in case there is further error handling logic */
>>> +    }
>>>   }
>>>     static void qemu_hvf_start_vcpu(CPUState *cpu, Error **errp)
>>> @@ -2031,8 +2040,10 @@ static void qemu_hvf_start_vcpu(CPUState *cpu, Error **errp)
>>>         snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
>>>                cpu->cpu_index);
>>> -    qemu_thread_create(cpu->thread, thread_name, qemu_hvf_cpu_thread_fn,
>>> -                       cpu, QEMU_THREAD_JOINABLE);
>>> +    if (!qemu_thread_create(cpu->thread, thread_name, qemu_hvf_cpu_thread_fn,
>>> +                            cpu, QEMU_THREAD_JOINABLE, errp)) {
>>> +        /* keep 'if' here in case there is further error handling logic */
>>> +    }
>>>   }
>>>     static void qemu_whpx_start_vcpu(CPUState *cpu, Error **errp)
>>> @@ -2044,8 +2055,10 @@ static void qemu_whpx_start_vcpu(CPUState *cpu, Error **errp)
>>>       qemu_cond_init(cpu->halt_cond);
>>>       snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/WHPX",
>>>                cpu->cpu_index);
>>> -    qemu_thread_create(cpu->thread, thread_name, qemu_whpx_cpu_thread_fn,
>>> -                       cpu, QEMU_THREAD_JOINABLE);
>>> +    if (!qemu_thread_create(cpu->thread, thread_name, qemu_whpx_cpu_thread_fn,
>>> +                            cpu, QEMU_THREAD_JOINABLE, errp)) {
>>> +        return;
>>> +    }
>>>   #ifdef _WIN32
>>>       cpu->hThread = qemu_thread_get_handle(cpu->thread);
>>>   #endif
>>> @@ -2060,8 +2073,10 @@ static void qemu_dummy_start_vcpu(CPUState *cpu, Error **errp)
>>>       qemu_cond_init(cpu->halt_cond);
>>>       snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/DUMMY",
>>>                cpu->cpu_index);
>>> -    qemu_thread_create(cpu->thread, thread_name, qemu_dummy_cpu_thread_fn, cpu,
>>> -                       QEMU_THREAD_JOINABLE);
>>> +    if (!qemu_thread_create(cpu->thread, thread_name, qemu_dummy_cpu_thread_fn,
>>> +                            cpu, QEMU_THREAD_JOINABLE, errp)) {
>>> +        /* keep 'if' here in case there is further error handling logic */
>>> +    }
>>>   }
>>>     bool qemu_init_vcpu(CPUState *cpu, Error **errp)
>>> diff --git a/dump.c b/dump.c
>>> index 4ec94c5e25..1f003aff9a 100644
>>> --- a/dump.c
>>> +++ b/dump.c
>>> @@ -2020,8 +2020,10 @@ void qmp_dump_guest_memory(bool paging, const char *file,
>>>       if (detach_p) {
>>>           /* detached dump */
>>>           s->detached = true;
>>> -        qemu_thread_create(&s->dump_thread, "dump_thread", dump_thread,
>>> -                           s, QEMU_THREAD_DETACHED);
>>> +        if (!qemu_thread_create(&s->dump_thread, "dump_thread", dump_thread,
>>> +                                s, QEMU_THREAD_DETACHED, errp)) {
>>> +            /* keep 'if' here in case there is further error handling logic */
>>> +        }
>>>       } else {
>>>           /* sync dump */
>>>           dump_process(s, errp);
>>> diff --git a/hw/misc/edu.c b/hw/misc/edu.c
>>> index cdcf550dd7..6684c60a96 100644
>>> --- a/hw/misc/edu.c
>>> +++ b/hw/misc/edu.c
>>> @@ -355,8 +355,10 @@ static void pci_edu_realize(PCIDevice *pdev, Error **errp)
>>>         qemu_mutex_init(&edu->thr_mutex);
>>>       qemu_cond_init(&edu->thr_cond);
>>> -    qemu_thread_create(&edu->thread, "edu", edu_fact_thread,
>>> -                       edu, QEMU_THREAD_JOINABLE);
>>> +    if (!qemu_thread_create(&edu->thread, "edu", edu_fact_thread,
>>> +                            edu, QEMU_THREAD_JOINABLE, errp)) {
>>> +        return;
>>> +    }
>>>       memory_region_init_io(&edu->mmio, OBJECT(edu), &edu_mmio_ops, edu,
>>>                       "edu-mmio", 1 * MiB);
>>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>>> index ae913d070f..7c16ade04a 100644
>>> --- a/hw/ppc/spapr_hcall.c
>>> +++ b/hw/ppc/spapr_hcall.c
>>> @@ -478,6 +478,7 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
>>>       sPAPRPendingHPT *pending = spapr->pending_hpt;
>>>       uint64_t current_ram_size;
>>>       int rc;
>>> +    Error *local_err = NULL;
>>>         if (spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED) {
>>>           return H_AUTHORITY;
>>> @@ -538,8 +539,13 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
>>>       pending->shift = shift;
>>>       pending->ret = H_HARDWARE;
>>>   -    qemu_thread_create(&pending->thread, "sPAPR HPT prepare",
>>> -                       hpt_prepare_thread, pending, QEMU_THREAD_DETACHED);
>>> +    if (!qemu_thread_create(&pending->thread, "sPAPR HPT prepare",
>>> +                            hpt_prepare_thread, pending,
>>> +                            QEMU_THREAD_DETACHED, &local_err)) {
>>> +        error_reportf_err(local_err, "failed to create hpt_prepare_thread: ");
>>> +        g_free(pending);
>>> +        return H_RESOURCE;
>>> +    }
>>>         spapr->pending_hpt = pending;
>>>   
>> This is a caller that returns an error code on failure.  You change it
>> to report the error, then return failure.  The return failure part looks
>> fine.  Whether reporting the error is appropriate I can't say for sure.
>> No other failure mode reports anything.  David, what do you think?
> Just as David explains. :)
>> Fei Li, you could pass &error_abort to side-step this question for now.
>>
>>> diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
>>> index d7a4bbd91f..53a2bd0d85 100644
>>> --- a/hw/rdma/rdma_backend.c
>>> +++ b/hw/rdma/rdma_backend.c
>>> @@ -164,8 +164,10 @@ static void start_comp_thread(RdmaBackendDev *backend_dev)
>>>       snprintf(thread_name, sizeof(thread_name), "rdma_comp_%s",
>>>                ibv_get_device_name(backend_dev->ib_dev));
>>>       backend_dev->comp_thread.run = true;
>>> +    /* FIXME: let the further caller handle the error instead of abort() here */
>>>       qemu_thread_create(&backend_dev->comp_thread.thread, thread_name,
>>> -                       comp_handler_thread, backend_dev, QEMU_THREAD_DETACHED);
>>> +                       comp_handler_thread, backend_dev,
>>> +                       QEMU_THREAD_DETACHED, &error_abort);
>>>   }
>>>   
>> This is a caller that can't return failure.  You pass &error_abort.  No
>> behavioral change.
> Actually, yes. The reason I did not change more here is that I am not
> quite sure how to fix this for the rdma device, especially which value
> to set in dev->regs_data[idx] when it fails.

I recommend splitting this patch.  The first part adds the Error **
parameter to qemu_thread_create(), passing &error_abort everywhere.  No
functional change.  Subsequent patches then improve on &error_abort.
This way, each improvement patch can be cc'ed to just that part's
maintainer(s).  Parts you don't want to touch you simply leave at
&error_abort.  Makes sense?
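
A minimal sketch of that first step, in plain C so it builds outside the
QEMU tree.  Everything in it is a stand-in and an assumption: MiniError,
mini_error_set(), mini_error_abort and mini_thread_create() are
hypothetical names that only imitate the shape of the Error **errp
convention; they are not QEMU's code, and the failure is simulated
rather than coming from a real thread library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an error object. */
typedef struct MiniError {
    char msg[128];
} MiniError;

/* Sentinel: passing &mini_error_abort means "abort on any error". */
static MiniError *mini_error_abort;

/* Store an error in *errp; abort for the sentinel; ignore a NULL errp. */
static void mini_error_set(MiniError **errp, const char *msg)
{
    if (errp == &mini_error_abort) {
        fprintf(stderr, "Unexpected error: %s\n", msg);
        abort();
    }
    if (errp == NULL) {
        return;
    }
    assert(*errp == NULL);      /* an error must not already be set */
    *errp = calloc(1, sizeof(**errp));
    if (*errp == NULL) {
        abort();
    }
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/*
 * Stand-in for the converted thread creation function.
 * Returns true on success, false with *errp set on failure.
 * 'simulate_failure' replaces a real creation failure.
 */
static bool mini_thread_create(bool simulate_failure, MiniError **errp)
{
    if (simulate_failure) {
        mini_error_set(errp, "thread creation failed");
        return false;
    }
    return true;
}

/* A caller that can report failure: forward errp, print nothing. */
static bool device_realize(bool simulate_failure, MiniError **errp)
{
    if (!mini_thread_create(simulate_failure, errp)) {
        return false;           /* the caller owns the error now */
    }
    return true;
}

/* A caller that cannot report failure: abort on error, no new behavior. */
static void start_worker(void)
{
    mini_thread_create(false, &mini_error_abort);
}
```

In this shape, the first patch would only add call sites like the one in
start_worker(); later patches would convert them one by one to the
device_realize() style.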

>> I think I'd mark the spot TODO, not FIXME.  Matter of taste, I guess.
> Sounds good, thanks!
>>>   void rdma_backend_register_comp_handler(void (*handler)(int status,
>>> diff --git a/hw/usb/ccid-card-emulated.c b/hw/usb/ccid-card-emulated.c
>>> index 25976ed84f..c6783f124a 100644
>>> --- a/hw/usb/ccid-card-emulated.c
>>> +++ b/hw/usb/ccid-card-emulated.c
>>> @@ -33,6 +33,7 @@
>>>   #include "qemu/main-loop.h"
>>>   #include "ccid.h"
>>>   #include "qapi/error.h"
>>> +#include "qemu/error-report.h"
>>>     #define DPRINTF(card, lvl, fmt, ...) \
>>>   do {\
>>> @@ -544,10 +545,17 @@ static void emulated_realize(CCIDCardState *base, Error **errp)
>>>           error_setg(errp, "%s: failed to initialize vcard", TYPE_EMULATED_CCID);
>>>           goto out2;
>>>       }
>>> -    qemu_thread_create(&card->event_thread_id, "ccid/event", event_thread,
>>> -                       card, QEMU_THREAD_JOINABLE);
>>> -    qemu_thread_create(&card->apdu_thread_id, "ccid/apdu", handle_apdu_thread,
>>> -                       card, QEMU_THREAD_JOINABLE);
>>> +    if (!qemu_thread_create(&card->event_thread_id, "ccid/event", event_thread,
>>> +                            card, QEMU_THREAD_JOINABLE, errp)) {
>>> +        error_report("failed to create event_thread");
>>> +        goto out2;
>>> +    }
>>> +    if (!qemu_thread_create(&card->apdu_thread_id, "ccid/apdu",
>>> +                            handle_apdu_thread, card,
>>> +                            QEMU_THREAD_JOINABLE, errp)) {
>>> +        error_report("failed to create handle_apdu_thread");
>>> +        goto out2;
>>> +    }
>>>     out2:
>>>       clean_event_notifier(card);
>> error_report() in a realize() method is almost certainly wrong.
> Ok, I will remove these two.
>>> diff --git a/include/qemu/thread.h b/include/qemu/thread.h
>>> index 55d83a907c..12291f4ccd 100644
>>> --- a/include/qemu/thread.h
>>> +++ b/include/qemu/thread.h
>>> @@ -152,9 +152,9 @@ void qemu_event_reset(QemuEvent *ev);
>>>   void qemu_event_wait(QemuEvent *ev);
>>>   void qemu_event_destroy(QemuEvent *ev);
>>>   -void qemu_thread_create(QemuThread *thread, const char *name,
>>> +bool qemu_thread_create(QemuThread *thread, const char *name,
>>>                           void *(*start_routine)(void *),
>>> -                        void *arg, int mode);
>>> +                        void *arg, int mode, Error **errp);
>>>   void *qemu_thread_join(QemuThread *thread);
>>>   void qemu_thread_get_self(QemuThread *thread);
>>>   bool qemu_thread_is_self(QemuThread *thread);
>>> diff --git a/io/task.c b/io/task.c
>>> index 2886a2c1bc..6d3a18ab80 100644
>>> --- a/io/task.c
>>> +++ b/io/task.c
>>> @@ -149,7 +149,8 @@ void qio_task_run_in_thread(QIOTask *task,
>>>                          "io-task-worker",
>>>                          qio_task_thread_worker,
>>>                          data,
>>> -                       QEMU_THREAD_DETACHED);
>>> +                       QEMU_THREAD_DETACHED,
>>> +                       &error_abort);
>>>   }
>>>     
>> This is a caller that can't return failure.  You pass &error_abort.  No
>> behavioral change.  Unlike above, you don't mark this spot FIXME.  Any
>> particular reason for marking one, but not the other?
> Emm, it is a little difficult to add an Error parameter to its callers,
> and the callers do not seem to need the Error. Thus I think passing
> &error_abort in this function rather than in its callers is more
> direct. :)
> The same reasoning applies to the several below.
>
> But just as you mentioned, maybe we should add a "TODO: xxxx" for the direct
> &error_abort case in case the callers need the Error parameter in future.

Your use of &error_abort in this patch is fine simply because it's no
worse than before.  I'm merely probing your use of FIXME / TODO.

Adding a FIXME is appropriate when you're convinced the code is actually
broken.

Adding a TODO is appropriate when you believe the code should be
improved.

Both are almost always worth mentioning in the commit message.

If you don't really know, and you're not really changing how the code
behaves, then it's better not to add either kind of comment.

>> I'll stop commenting on this pattern now.
>>
>>> diff --git a/iothread.c b/iothread.c
>>> index 2fb1cdf55d..7335dacf0b 100644
>>> --- a/iothread.c
>>> +++ b/iothread.c
>>> @@ -164,9 +164,7 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
>>>                                   &local_error);
>>>       if (local_error) {
>>>           error_propagate(errp, local_error);
>>> -        aio_context_unref(iothread->ctx);
>>> -        iothread->ctx = NULL;
>>> -        return;
>>> +        goto fail;
>>>       }
>>>         qemu_mutex_init(&iothread->init_done_lock);
>>> @@ -178,8 +176,12 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
>>>        */
>>>       name = object_get_canonical_path_component(OBJECT(obj));
>>>       thread_name = g_strdup_printf("IO %s", name);
>>> -    qemu_thread_create(&iothread->thread, thread_name, iothread_run,
>>> -                       iothread, QEMU_THREAD_JOINABLE);
>>> +    if (!qemu_thread_create(&iothread->thread, thread_name, iothread_run,
>>> +                            iothread, QEMU_THREAD_JOINABLE, errp)) {
>>> +        g_free(thread_name);
>>> +        g_free(name);
>>> +        goto fail;
>>> +    }
>>>       g_free(thread_name);
>>>       g_free(name);
>>> @@ -190,6 +192,10 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
>>>                          &iothread->init_done_lock);
>>>       }
>>>       qemu_mutex_unlock(&iothread->init_done_lock);
>>> +    return;
>>> +fail:
>>> +    aio_context_unref(iothread->ctx);
>>> +    iothread->ctx = NULL;
>>>   }
>>>     typedef struct {
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 0537fc0c26..af6c72ac5d 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -438,19 +438,22 @@ static void process_incoming_migration_co(void *opaque)
>>>           /* Make sure all file formats flush their mutable metadata */
>>>           bdrv_invalidate_cache_all(&local_err);
>>>           if (local_err) {
>>> -            migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>>> -                    MIGRATION_STATUS_FAILED);
>>>               error_report_err(local_err);
>>> -            exit(EXIT_FAILURE);
>>> +            goto fail;
>>>           }
>>>             if (colo_init_ram_cache() < 0) {
>>>               error_report("Init ram cache failed");
>>> -            exit(EXIT_FAILURE);
>>> +            goto fail;
>>>           }
>>> -        qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
>>> -             colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
>>> +        if (!qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
>>> +                                colo_process_incoming_thread, mis,
>>> +                                QEMU_THREAD_JOINABLE, &local_err)) {
>>> +            error_reportf_err(local_err, "failed to create "
>>> +                              "colo_process_incoming_thread: ");
>>> +            goto fail;
>>> +        }
>>>           mis->have_colo_incoming_thread = true;
>>>           qemu_coroutine_yield();
>>> @@ -461,20 +464,22 @@ static void process_incoming_migration_co(void *opaque)
>>>       }
>>>         if (ret < 0) {
>>> -        Error *local_err = NULL;
>>> -
>>> -        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>>> -                          MIGRATION_STATUS_FAILED);
>>>           error_report("load of migration failed: %s", strerror(-ret));
>>> -        qemu_fclose(mis->from_src_file);
>>> -        if (multifd_load_cleanup(&local_err) != 0) {
>>> -            error_report_err(local_err);
>>> -        }
>>> -        exit(EXIT_FAILURE);
>>> +        goto fail;
>>>       }
>>>       mis->bh = qemu_bh_new(process_incoming_migration_bh, mis);
>>>       qemu_bh_schedule(mis->bh);
>>>       mis->migration_incoming_co = NULL;
>>> +    return;
>>> +fail:
>>> +    local_err = NULL;
>>> +    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>>> +                      MIGRATION_STATUS_FAILED);
>>> +    qemu_fclose(mis->from_src_file);
>>> +    if (multifd_load_cleanup(&local_err) != 0) {
>>> +        error_report_err(local_err);
>>> +    }
>>> +    exit(EXIT_FAILURE);
>>>   }
>> You change handling of errors other than qemu_thread_create().  Separate
>> patch, please.  I'd put it before this one.
> Ok, thanks for the reminder. Will update in the next version.
>>>     static void migration_incoming_setup(QEMUFile *f)
>>> @@ -2345,6 +2350,7 @@ out:
>>>   static int open_return_path_on_source(MigrationState *ms,
>>>                                         bool create_thread)
>>>   {
>>> +    Error *local_err = NULL;
>>>       ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
>>>       if (!ms->rp_state.from_dst_file) {
>>> @@ -2358,8 +2364,13 @@ static int open_return_path_on_source(MigrationState *ms,
>>>           return 0;
>>>       }
>>>   -    qemu_thread_create(&ms->rp_state.rp_thread, "return path",
>>> -                       source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
>>> +    if (!qemu_thread_create(&ms->rp_state.rp_thread, "return path",
>>> +                            source_return_path_thread, ms,
>>> +                            QEMU_THREAD_JOINABLE, &local_err)) {
>>> +        error_reportf_err(local_err,
>>> +                          "failed to create source_return_path_thread: ");
>>> +        return -1;
>>> +    }
>>>         trace_open_return_path_on_source_continue();
>>>   
>> This is a caller that returns an error code on failure.  You change it
>> to report the error, then return failure.  This is okay, because its
>> sole caller also reports errors that way.
> Thanks.
>>> @@ -3189,8 +3200,13 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
>>>           migrate_fd_cleanup(s);
>>>           return;
>>>       }
>>> -    qemu_thread_create(&s->thread, "live_migration", migration_thread, s,
>>> -                       QEMU_THREAD_JOINABLE);
>>> +    if (!qemu_thread_create(&s->thread, "live_migration", migration_thread,
>>> +                            s, QEMU_THREAD_JOINABLE, &error_in)) {
>>> +        error_reportf_err(error_in, "failed to create migration_thread: ");
>>> +        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
>>> +        migrate_fd_cleanup(s);
>>> +        return;
>>> +    }
>>>       s->migration_thread_running = true;
>>>   }
>> This is a caller that reports errors.  You make it handle
>> qemu_thread_create() the same way.  Good.
> Thanks!
>>>   diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>>> index fa09dba534..80bfa9c4a2 100644
>>> --- a/migration/postcopy-ram.c
>>> +++ b/migration/postcopy-ram.c
>>> @@ -1083,6 +1083,8 @@ retry:
>>>     int postcopy_ram_enable_notify(MigrationIncomingState *mis)
>>>   {
>>> +    Error *local_err = NULL;
>>> +
>>>       /* Open the fd for the kernel to give us userfaults */
>>>       mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
>>>       if (mis->userfault_fd == -1) {
>>> @@ -1109,8 +1111,16 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
>>>       }
>>>         qemu_sem_init(&mis->fault_thread_sem, 0);
>>> -    qemu_thread_create(&mis->fault_thread, "postcopy/fault",
>>> -                       postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
>>> +    if (!qemu_thread_create(&mis->fault_thread, "postcopy/fault",
>>> +                            postcopy_ram_fault_thread, mis,
>>> +                            QEMU_THREAD_JOINABLE, &local_err)) {
>>> +        error_reportf_err(local_err,
>>> +                          "failed to create postcopy_ram_fault_thread: ");
>>> +        close(mis->userfault_event_fd);
>>> +        close(mis->userfault_fd);
>>> +        qemu_sem_destroy(&mis->fault_thread_sem);
>>> +        return -1;
>>> +    }
>>>       qemu_sem_wait(&mis->fault_thread_sem);
>>>       qemu_sem_destroy(&mis->fault_thread_sem);
>>>       mis->have_fault_thread = true;
>> This is a caller that reports errors, then returns failure.  You make it
>> handle qemu_thread_create() the same way.  Good.
>>
>> Not related to this patch, just spotted while reviewing it:
>>
>>         /* Mark so that we get notified of accesses to unwritten areas */
>>         if (qemu_ram_foreach_migratable_block(ram_block_enable_notify, mis)) {
>>             error_report("ram_block_enable_notify failed");
>>             return -1;
>>         }
>>
>> Do we leak mis->userfault_fd, mis->userfault_event_fd,
>> mis->fault_thread_sem here?
> Actually, patch 5/7 fixes this: we leave the cleanup to
> postcopy_ram_incoming_cleanup() when the notify setup fails here.
> Looking back at the history, I mistakenly did close(these_fds) right here,
> but David corrected me; the following is quoted from his earlier comment:
> "
> I don't think these close() calls are safe.  This code is just after
> starting the fault thread, and the fault thread has a poll() call on
> these fd's, so we can't close them until we've instructed that thread
> to exit.
>
> We should fall out through postcopy_ram_incoming_cleanup, and because
> the thread exists it should do a notify to the thread, a join and then
> only later do the close calls.
> "
>>> diff --git a/migration/ram.c b/migration/ram.c
>>> index 658dfa88a3..6e0cccf066 100644
>>> --- a/migration/ram.c
>>> +++ b/migration/ram.c
>>> @@ -473,6 +473,7 @@ static void compress_threads_save_cleanup(void)
>>>   static int compress_threads_save_setup(void)
>>>   {
>>>       int i, thread_count;
>>> +    Error *local_err = NULL;
>>>         if (!migrate_use_compression()) {
>>>           return 0;
>>> @@ -502,9 +503,12 @@ static int compress_threads_save_setup(void)
>>>           comp_param[i].quit = false;
>>>           qemu_mutex_init(&comp_param[i].mutex);
>>>           qemu_cond_init(&comp_param[i].cond);
>>> -        qemu_thread_create(compress_threads + i, "compress",
>>> -                           do_data_compress, comp_param + i,
>>> -                           QEMU_THREAD_JOINABLE);
>>> +        if (!qemu_thread_create(compress_threads + i, "compress",
>>> +                                do_data_compress, comp_param + i,
>>> +                                QEMU_THREAD_JOINABLE, &local_err)) {
>>> +            error_reportf_err(local_err, "failed to create do_data_compress: ");
>>> +            goto exit;
>>> +        }
>>>       }
>>>       return 0;
>>>   
>> Reviewing the migration changes is getting tiresome...
> Yes, indeed, the migration involves a lot! Thanks so much for helping
> to review!
>>   Is reporting the
>> error appropriate here, and why?
> I think the qemu monitor should display the exact reason for the
> failure to administrators, especially considering that
> qemu_thread_create() itself does not print any message, so we have no
> idea which function failed unless we debug with gdb.
> IOW, I think David's answer about that ppc error_reportf_err() also
> applies here:
>
> "The error returns are for the guest, the reported errors are for the
> guest administrator or management layers."

There could well be an issue with the "management layers" part.  Should
this error be sent to the management layer via QMP somehow?  Migration
maintainers should be able to assist with this question.

>>> @@ -1075,8 +1079,14 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>>>           p->c = QIO_CHANNEL(sioc);
>>>           qio_channel_set_delay(p->c, false);
>>>           p->running = true;
>>> -        qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
>>> -                           QEMU_THREAD_JOINABLE);
>>> +        if (!qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
>>> +                                QEMU_THREAD_JOINABLE, &local_err)) {
>>> +            migrate_set_error(migrate_get_current(), local_err);
>>> +            error_reportf_err(local_err,
>>> +                              "failed to create multifd_send_thread: ");
>>> +            multifd_save_cleanup();
>>> +            return;
>>> +        }
>>>             atomic_inc(&multifd_send_state->count);
>>>       }
>> Same question.
>>
>>> @@ -1350,8 +1360,13 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
>>>       p->num_packets = 1;
>>>         p->running = true;
>>> -    qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p,
>>> -                       QEMU_THREAD_JOINABLE);
>>> +    if (!qemu_thread_create(&p->thread, p->name, multifd_recv_thread,
>>> +                            p, QEMU_THREAD_JOINABLE, &local_err)) {
>>> +        error_propagate_prepend(errp, local_err,
>>> +                                "failed to create multifd_recv_thread: ");
>>> +        multifd_recv_terminate_threads(local_err);
>>> +        return false;
>>> +    }
>>>       atomic_inc(&multifd_recv_state->count);
>>>       return atomic_read(&multifd_recv_state->count) ==
>>>              migrate_multifd_channels();
>>> @@ -3617,6 +3632,7 @@ static void compress_threads_load_cleanup(void)
>>>   static int compress_threads_load_setup(QEMUFile *f)
>>>   {
>>>       int i, thread_count;
>>> +    Error *local_err = NULL;
>>>         if (!migrate_use_compression()) {
>>>           return 0;
>>> @@ -3638,9 +3654,13 @@ static int compress_threads_load_setup(QEMUFile *f)
>>>           qemu_cond_init(&decomp_param[i].cond);
>>>           decomp_param[i].done = true;
>>>           decomp_param[i].quit = false;
>>> -        qemu_thread_create(decompress_threads + i, "decompress",
>>> -                           do_data_decompress, decomp_param + i,
>>> -                           QEMU_THREAD_JOINABLE);
>>> +        if (!qemu_thread_create(decompress_threads + i, "decompress",
>>> +                                do_data_decompress, decomp_param + i,
>>> +                                QEMU_THREAD_JOINABLE, &local_err)) {
>>> +            error_reportf_err(local_err,
>>> +                              "failed to create do_data_decompress: ");
>>> +            goto exit;
>>> +        }
>>>       }
>>>       return 0;
>>>   exit:
>> Same question.
>>
>>> diff --git a/migration/savevm.c b/migration/savevm.c
>>> index d784e8aa40..b8bdcde5d8 100644
>>> --- a/migration/savevm.c
>>> +++ b/migration/savevm.c
>>> @@ -1747,9 +1747,14 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
>>>       mis->have_listen_thread = true;
>>>       /* Start up the listening thread and wait for it to signal ready */
>>>       qemu_sem_init(&mis->listen_thread_sem, 0);
>>> -    qemu_thread_create(&mis->listen_thread, "postcopy/listen",
>>> -                       postcopy_ram_listen_thread, NULL,
>>> -                       QEMU_THREAD_DETACHED);
>>> +    if (!qemu_thread_create(&mis->listen_thread, "postcopy/listen",
>>> +                            postcopy_ram_listen_thread, NULL,
>>> +                            QEMU_THREAD_DETACHED, &local_err)) {
>>> +        error_reportf_err(local_err,
>>> +                          "failed to create postcopy_ram_listen_thread: ");
>>> +        qemu_sem_destroy(&mis->listen_thread_sem);
>>> +        return -1;
>>> +    }
>>>       qemu_sem_wait(&mis->listen_thread_sem);
>>>       qemu_sem_destroy(&mis->listen_thread_sem);
>>>   
>> This is a caller that reports errors, then returns failure.  You make it
>> handle qemu_thread_create() the same way.  Good.
>>
>> I'll stop commenting on this pattern now.
> Thanks.
>>> diff --git a/tests/atomic_add-bench.c b/tests/atomic_add-bench.c
>>> index 2f6c72f63a..338b9563e3 100644
>>> --- a/tests/atomic_add-bench.c
>>> +++ b/tests/atomic_add-bench.c
>>> @@ -2,6 +2,7 @@
>>>   #include "qemu/thread.h"
>>>   #include "qemu/host-utils.h"
>>>   #include "qemu/processor.h"
>>> +#include "qapi/error.h"
>>>     struct thread_info {
>>>       uint64_t r;
>>> @@ -110,7 +111,7 @@ static void create_threads(void)
>>>             info->r = (i + 1) ^ time(NULL);
>>>           qemu_thread_create(&threads[i], NULL, thread_func, info,
>>> -                           QEMU_THREAD_JOINABLE);
>>> +                           QEMU_THREAD_JOINABLE, &error_abort);
>>>       }
>>>   }
> ... snip all tests/xxx.c hunks, as they all just pass &error_abort ...
>>>   diff --git a/ui/vnc-jobs.c b/ui/vnc-jobs.c
>>> index 929391f85d..35a652d1fd 100644
>>> --- a/ui/vnc-jobs.c
>>> +++ b/ui/vnc-jobs.c
>>> @@ -31,6 +31,7 @@
>>>   #include "vnc-jobs.h"
>>>   #include "qemu/sockets.h"
>>>   #include "qemu/main-loop.h"
>>> +#include "qapi/error.h"
>>>   #include "block/aio.h"
>>>     /*
>>> @@ -331,15 +332,21 @@ static bool vnc_worker_thread_running(void)
>>>       return queue; /* Check global queue */
>>>   }
>>>   -void vnc_start_worker_thread(void)
>>> +bool vnc_start_worker_thread(Error **errp)
>>>   {
>>>       VncJobQueue *q;
>>>   -    if (vnc_worker_thread_running())
>>> -        return ;
>>> +    if (vnc_worker_thread_running()) {
>>> +        goto out;
>>> +    }
>>>         q = vnc_queue_init();
>>> -    qemu_thread_create(&q->thread, "vnc_worker", vnc_worker_thread, q,
>>> -                       QEMU_THREAD_DETACHED);
>>> +    if (!qemu_thread_create(&q->thread, "vnc_worker", vnc_worker_thread,
>>> +                            q, QEMU_THREAD_DETACHED, errp)) {
>>> +        vnc_queue_clear(q);
>>> +        return false;
>>> +    }
>>>       queue = q; /* Set global queue */
>>> +out:
>>> +    return true;
>>>   }
>> I recommend passing &error_abort to qemu_thread_create() in this patch,
>> then converting vnc_start_worker_thread() to Error in a subsequent patch.
> Ok, thanks! This makes this patch shorter. :)
> BTW, would it be better to add a "TODO: xxx" comment before the
> &error_abort in this patch, and remove it in the subsequent patch?
> If that is ok, I will do the same for touch_all_pages() further below.

See my remark on use of FIXME and TODO above.

Adding a TODO only to remove it later in the same series is fine.  More
so when it helps avoid review questions like "I think you need to do X
here", followed by "Oh, I see you're doing X here" when the reviewer
gets to the later patch.

>>> diff --git a/ui/vnc-jobs.h b/ui/vnc-jobs.h
>>> index 59f66bcc35..14640593db 100644
>>> --- a/ui/vnc-jobs.h
>>> +++ b/ui/vnc-jobs.h
>>> @@ -37,7 +37,7 @@ void vnc_job_push(VncJob *job);
>>>   void vnc_jobs_join(VncState *vs);
>>>     void vnc_jobs_consume_buffer(VncState *vs);
>>> -void vnc_start_worker_thread(void);
>>> +bool vnc_start_worker_thread(Error **errp);
>>>     /* Locks */
>>>   static inline int vnc_trylock_display(VncDisplay *vd)
>>> diff --git a/ui/vnc.c b/ui/vnc.c
>>> index 0c1b477425..0ffe9e6a5d 100644
>>> --- a/ui/vnc.c
>>> +++ b/ui/vnc.c
>>> @@ -3236,7 +3236,9 @@ void vnc_display_init(const char *id, Error **errp)
>>>       vd->connections_limit = 32;
>>>         qemu_mutex_init(&vd->mutex);
>>> -    vnc_start_worker_thread();
>>> +    if (!vnc_start_worker_thread(errp)) {
>>> +        return;
>>> +    }
>>>         vd->dcl.ops = &dcl_ops;
>>>       register_displaychangelistener(&vd->dcl);
>> These two hunks then also go into the subsequent patch.
> Ok.
>>> diff --git a/util/compatfd.c b/util/compatfd.c
>>> index 980bd33e52..886aa249f9 100644
>>> --- a/util/compatfd.c
>>> +++ b/util/compatfd.c
>>> @@ -16,6 +16,7 @@
>>>   #include "qemu/osdep.h"
>>>   #include "qemu-common.h"
>>>   #include "qemu/thread.h"
>>> +#include "qapi/error.h"
>>>     #include <sys/syscall.h>
>>> @@ -70,6 +71,7 @@ static int qemu_signalfd_compat(const sigset_t *mask)
>>>       struct sigfd_compat_info *info;
>>>       QemuThread thread;
>>>       int fds[2];
>>> +    Error *local_err = NULL;
>>>         info = malloc(sizeof(*info));
>>>       if (info == NULL) {
>>> @@ -88,8 +90,14 @@ static int qemu_signalfd_compat(const sigset_t *mask)
>>>       memcpy(&info->mask, mask, sizeof(*mask));
>>>       info->fd = fds[1];
>>> -    qemu_thread_create(&thread, "signalfd_compat", sigwait_compat, info,
>>> -                       QEMU_THREAD_DETACHED);
>>> +    if (!qemu_thread_create(&thread, "signalfd_compat", sigwait_compat,
>>> +                            info, QEMU_THREAD_DETACHED, &local_err)) {
>>> +        error_reportf_err(local_err, "failed to create sigwait_compat: ");
>>> +        close(fds[0]);
>>> +        close(fds[1]);
>>> +        free(info);
>>> +        return -1;
>>> +    }
>>>         return fds[0];
>>>   }
>> This function implements signalfd() when the kernel doesn't provide
>> it.
>>
>> signalfd() sets errno on failure.  The replacement's existing failure
>> modes set errno.  You add a failure mode that doesn't set errno.  That's
>> a bug.  To fix it, you can either make qemu_thread_create() set errno,
>> or you can make it return a value you can use to set errno.  The common
>> way to do the latter is returning a *negated* errno value.
> Oops, I forgot to set errno in the Linux implementation! My fault.
> I will set errno inside qemu_thread_create() as follows:
>      err = pthread_attr_init(&attr);
>      if (err) {
> -        error_setg_errno(errp, -err, "pthread_attr_init failed: %s",
> -                         strerror(err));
> +        errno = err;
> +        error_setg_errno(errp, errno, "pthread_attr_init failed");
>          return false;
>      }

Make sure to set errno on all failures, not just this one.

Also add a function comment.  I suspect returning negated errno would
lead to a shorter function comment.  Yet another reason to write
function comments!  Making myself document the mess I made has made me
clean it up before I submit it many times :)
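
For illustration, a minimal sketch of the negated-errno variant.  The
names are assumptions made up for this example: worker_create() stands
in for a thread creation function that returns 0 or a negated errno
value, compat_open() stands in for a signalfd()-style caller, and the
failure is simulated instead of coming from pthread_create().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Start a worker (hypothetical stand-in for thread creation).
 * Returns 0 on success, a negated errno value on failure.
 */
static int worker_create(bool simulate_failure)
{
    if (simulate_failure) {
        return -EAGAIN;     /* e.g. insufficient resources for a thread */
    }
    return 0;
}

/*
 * signalfd()-style wrapper: returns @fd on success,
 * -1 with errno set on failure.  Prints nothing.
 */
static int compat_open(int fd, bool simulate_failure)
{
    int ret = worker_create(simulate_failure);

    if (ret < 0) {
        errno = -ret;       /* translate back for errno-style callers */
        return -1;
    }
    assert(ret == 0);
    return fd;
}
```

A caller that wants errno just assigns -ret, as compat_open() does; a
caller that wants an Error object could hand the same value to an
errno-aware error setter.  Either way the function comment stays at two
lines.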

>
>> signalfd() doesn't print anything on failure.  The replacement's
>> existing failure modes don't print anything.  You add a failure mode
>> that does print.  I think it shouldn't.
> Ok, I will remove it. Thanks!
>>> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
>>> index c1bee2a581..2c779fd634 100644
>>> --- a/util/oslib-posix.c
>>> +++ b/util/oslib-posix.c
>>> @@ -437,9 +437,12 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
>>>       size_t size_per_thread;
>>>       char *addr = area;
>>>       int i = 0;
>>> +    int started_thread = 0;
>>> +    Error *local_err = NULL;
>>>         memset_thread_failed = false;
>>>       memset_num_threads = get_memset_num_threads(smp_cpus);
>>> +    started_thread = memset_num_threads;
>>>       memset_thread = g_new0(MemsetThread, memset_num_threads);
>>>       numpages_per_thread = (numpages / memset_num_threads);
>>>       size_per_thread = (hpagesize * numpages_per_thread);
>>> @@ -448,13 +451,19 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
>>>           memset_thread[i].numpages = (i == (memset_num_threads - 1)) ?
>>>                                       numpages : numpages_per_thread;
>>>           memset_thread[i].hpagesize = hpagesize;
>>> -        qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
>>> -                           do_touch_pages, &memset_thread[i],
>>> -                           QEMU_THREAD_JOINABLE);
>>> +        if (!qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
>>> +                                do_touch_pages, &memset_thread[i],
>>> +                                QEMU_THREAD_JOINABLE, &local_err)) {
>>> +            error_reportf_err(local_err, "failed to create do_touch_pages: ");
>>> +            memset_thread_failed = true;
>>> +            started_thread = i;
>>> +            goto out;
>>> +        }
>>>           addr += size_per_thread;
>>>           numpages -= numpages_per_thread;
>>>       }
>>> -    for (i = 0; i < memset_num_threads; i++) {
>>> +out:
>>> +    for (i = 0; i < started_thread; i++) {
>>>           qemu_thread_join(&memset_thread[i].pgthread);
>>>       }
>>>       g_free(memset_thread);
>> You need to convert this function to Error instead, because its caller
>> os_mem_prealloc() sets an error on failure.  I recommend passing
>> &error_abort in this patch, and converting to Error in a subsequent patch.
> Ok, thanks for the advice.
>>> diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
>>> index 865e476df5..81b40a1ece 100644
>>> --- a/util/qemu-thread-posix.c
>>> +++ b/util/qemu-thread-posix.c
>>> @@ -15,6 +15,7 @@
>>>   #include "qemu/atomic.h"
>>>   #include "qemu/notify.h"
>>>   #include "qemu-thread-common.h"
>>> +#include "qapi/error.h"
>>>     static bool name_threads;
>>>   @@ -500,9 +501,9 @@ static void *qemu_thread_start(void *args)
>>>       return r;
>>>   }
>>>   -void qemu_thread_create(QemuThread *thread, const char *name,
>>> -                       void *(*start_routine)(void*),
>>> -                       void *arg, int mode)
>>> +bool qemu_thread_create(QemuThread *thread, const char *name,
>>> +                        void *(*start_routine)(void *),
>>> +                        void *arg, int mode, Error **errp)
>>>   {
>>>       sigset_t set, oldset;
>>>       int err;
>>> @@ -511,7 +512,9 @@ void qemu_thread_create(QemuThread *thread, const char *name,
>>>         err = pthread_attr_init(&attr);
>>>       if (err) {
>>> -        error_exit(err, __func__);
>>> +        error_setg_errno(errp, -err, "pthread_attr_init failed: %s",
>>> +                         strerror(err));

-err is actually wrong: pthread_attr_init() returns a *positive* errno
code on failure.

>>> +        return false;
>>>       }
>>>         if (mode == QEMU_THREAD_DETACHED) {
>>> @@ -526,16 +529,21 @@ void qemu_thread_create(QemuThread *thread, const char *name,
>>>       qemu_thread_args->name = g_strdup(name);
>>>       qemu_thread_args->start_routine = start_routine;
>>>       qemu_thread_args->arg = arg;
>>> -
>> Let's keep the blank line.
> ok.
>
> Thanks so much for the review! Have a nice day. :)
> Fei

You're welcome :)

>>>       err = pthread_create(&thread->thread, &attr,
>>>                            qemu_thread_start, qemu_thread_args);
>>> -
>>> -    if (err)
>>> -        error_exit(err, __func__);
>>> +    if (err) {
>>> +        error_setg_errno(errp, -err, "pthread_create failed: %s",
>>> +                         strerror(err));
>>> +        pthread_attr_destroy(&attr);
>>> +        g_free(qemu_thread_args->name);
>>> +        g_free(qemu_thread_args);
>>> +        return false;
>>> +    }
>>>         pthread_sigmask(SIG_SETMASK, &oldset, NULL);
>>>         pthread_attr_destroy(&attr);
>>> +    return true;
>>>   }
>>>     void qemu_thread_get_self(QemuThread *thread)
>>> diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
>>> index 4a363ca675..57b1143e97 100644
>>> --- a/util/qemu-thread-win32.c
>>> +++ b/util/qemu-thread-win32.c
>>> @@ -20,6 +20,7 @@
>>>   #include "qemu/thread.h"
>>>   #include "qemu/notify.h"
>>>   #include "qemu-thread-common.h"
>>> +#include "qapi/error.h"
>>>   #include <process.h>
>>>     static bool name_threads;
>>> @@ -388,9 +389,9 @@ void *qemu_thread_join(QemuThread *thread)
>>>       return ret;
>>>   }
>>>   -void qemu_thread_create(QemuThread *thread, const char *name,
>>> -                       void *(*start_routine)(void *),
>>> -                       void *arg, int mode)
>>> +bool qemu_thread_create(QemuThread *thread, const char *name,
>>> +                        void *(*start_routine)(void *),
>>> +                        void *arg, int mode, Error **errp)
>>>   {
>>>       HANDLE hThread;
>>>       struct QemuThreadData *data;
>>> @@ -409,10 +410,17 @@ void qemu_thread_create(QemuThread *thread, const char *name,
>>>       hThread = (HANDLE) _beginthreadex(NULL, 0, win32_start_routine,
>>>                                         data, 0, &thread->tid);
>>>       if (!hThread) {
>>> -        error_exit(GetLastError(), __func__);
>>> +        if (data->mode != QEMU_THREAD_DETACHED) {
>>> +            DeleteCriticalSection(&data->cs);
>>> +        }
>>> +        error_setg_errno(errp, errno,
>>> +                         "failed to create win32_start_routine");
>>> +        g_free(data);
>>> +        return false;
>>>       }
>>>       CloseHandle(hThread);
>>>       thread->data = data;
>>> +    return true;
>>>   }
>>>     void qemu_thread_get_self(QemuThread *thread)
>>> diff --git a/util/rcu.c b/util/rcu.c
>>> index 5676c22bd1..145dcdb0c6 100644
>>> --- a/util/rcu.c
>>> +++ b/util/rcu.c
>>> @@ -32,6 +32,7 @@
>>>   #include "qemu/atomic.h"
>>>   #include "qemu/thread.h"
>>>   #include "qemu/main-loop.h"
>>> +#include "qapi/error.h"
>>>   #if defined(CONFIG_MALLOC_TRIM)
>>>   #include <malloc.h>
>>>   #endif
>>> @@ -325,7 +326,7 @@ static void rcu_init_complete(void)
>>>        * must have been quiescent even after forking, just recreate it.
>>>        */
>>>       qemu_thread_create(&thread, "call_rcu", call_rcu_thread,
>>> -                       NULL, QEMU_THREAD_DETACHED);
>>> +                       NULL, QEMU_THREAD_DETACHED, &error_abort);
>>>         rcu_register_thread();
>>>   }
>>> diff --git a/util/thread-pool.c b/util/thread-pool.c
>>> index 610646d131..ad0f980783 100644
>>> --- a/util/thread-pool.c
>>> +++ b/util/thread-pool.c
>>> @@ -22,6 +22,7 @@
>>>   #include "trace.h"
>>>   #include "block/thread-pool.h"
>>>   #include "qemu/main-loop.h"
>>> +#include "qapi/error.h"
>>>     static void do_spawn_thread(ThreadPool *pool);
>>>   @@ -132,7 +133,8 @@ static void do_spawn_thread(ThreadPool *pool)
>>>       pool->new_threads--;
>>>       pool->pending_threads++;
>>>   -    qemu_thread_create(&t, "worker", worker_thread, pool, QEMU_THREAD_DETACHED);
>>> +    qemu_thread_create(&t, "worker", worker_thread, pool,
>>> +                       QEMU_THREAD_DETACHED, &error_abort);
>>>   }
>>>     static void spawn_thread_bh_fn(void *opaque)
>>


* Re: [Qemu-devel] [PATCH for-4.0 v8 6/7] qemu_thread_create: propagate the error to callers to handle
  2018-12-18 12:40       ` Fei Li
@ 2018-12-19 10:11         ` Markus Armbruster
  0 siblings, 0 replies; 27+ messages in thread
From: Markus Armbruster @ 2018-12-19 10:11 UTC (permalink / raw)
  To: Fei Li; +Cc: David Gibson, qemu-devel, Dr . David Alan Gilbert

Fei Li <fli@suse.com> writes:

> On 12/17/2018 03:29 PM, Fei Li wrote:
>>
>>
>> On 12/13/2018 03:26 PM, Markus Armbruster wrote:
>>> There's a question for David Gibson inline.  Please search for /ppc/.
>>>
>>> Fei Li <fli@suse.com> writes:
>>>
>>>> Make qemu_thread_create() return a Boolean to indicate if it succeeds
>>>> rather than failing with an error. And add an Error parameter to hold
>>>> the error message and let the callers handle it.
>>> The "rather than failing with an error" is misleading.  Before the
>>> patch, we report to stderr and abort().  What about:
>>>
>>>      qemu-thread: Make qemu_thread_create() handle errors properly
>>>
>>>      qemu_thread_create() abort()s on error.  Not nice.  Give it a
>>>      return value and an Error ** argument, so it can return success /
>>>      failure.
>> A nice commit-amend! Thanks!
>>> Still missing from the commit message then: how you update the callers.
>> Yes, agree. I think the "how" should also be noted here, like
>> - propagating the err to callers whose call trace already has the
>> Error parameter;
>> - just adding &error_abort for qemu_thread_create() and marking it
>> "TODO: xxx";
>>> Let's see below.
> Following your comments and suggestions below, I have drafted a
> second paragraph for the commit message. Please help review it,
> thanks. :)
>
> /* ...The first paragraph and the middle blank... */
> And let's update qemu_thread_create()'s callers by
> - setting an error on qemu_thread_create() failure for callers that
>   set an error on failure;
> - reporting the error and returning failure for callers that return
>   an error code on failure;
> - reporting the error and setting some state for callers that just
>   report errors and choose not to continue;
> - passing &error_abort to qemu_thread_create() for callers that
>   can't return failure, and marking a "TODO: " for further change.
>
> Have a nice day
> Fei

If you split the patch so that the first part makes all callers pass
&error_abort, the first part's commit message becomes much simpler, and
the subsequent parts' commit messages should be pretty simple to write,
too.
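
For reference, three of the caller categories from the draft commit
message above can be sketched in a self-contained way.  This is only an
illustration under stated assumptions: DemoError, demo_error_set(),
demo_error_abort and demo_thread_create() are hypothetical stand-ins for
QEMU's Error, error_setg(), error_abort and qemu_thread_create(), and the
simulate_failure flag exists purely to exercise the failure paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for QEMU's Error object; it only carries a message. */
typedef struct DemoError {
    const char *msg;
} DemoError;

/* Stand-in for error_abort: callers pass &demo_error_abort. */
static DemoError *demo_error_abort;

static void demo_error_set(DemoError **errp, const char *msg)
{
    if (errp == &demo_error_abort) {
        fprintf(stderr, "Unexpected error: %s\n", msg);
        abort();
    }
    if (errp) {
        DemoError *err = malloc(sizeof(*err));

        assert(*errp == NULL);      /* never overwrite a pending error */
        if (!err) {
            abort();
        }
        err->msg = msg;
        *errp = err;
    }
}

static void demo_error_free(DemoError *err)
{
    free(err);
}

/* Stand-in for the new qemu_thread_create(): true on success. */
static bool demo_thread_create(bool simulate_failure, DemoError **errp)
{
    if (simulate_failure) {
        demo_error_set(errp, "thread creation failed");
        return false;
    }
    return true;
}

/* Caller that sets an error on failure: pass errp straight through. */
static void demo_realize(bool simulate_failure, DemoError **errp)
{
    if (!demo_thread_create(simulate_failure, errp)) {
        return;
    }
    /* ... remaining setup would go here ... */
}

/* Caller that returns an error code: report, then return failure. */
static int demo_open_return_path(bool simulate_failure)
{
    DemoError *local_err = NULL;

    if (!demo_thread_create(simulate_failure, &local_err)) {
        fprintf(stderr, "failed to create thread: %s\n", local_err->msg);
        demo_error_free(local_err);
        return -1;
    }
    return 0;
}

/* Caller that can't return failure: keep aborting, as before the patch. */
static void demo_spawn_worker(void)
{
    demo_thread_create(false, &demo_error_abort);
}
```

In a split series, the first patch would make every caller look like
demo_spawn_worker(), and each follow-up patch would convert one subsystem
to one of the other two shapes.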


* Re: [Qemu-devel] [PATCH for-4.0 v8 6/7] qemu_thread_create: propagate the error to callers to handle
  2018-12-19 10:10       ` Markus Armbruster
@ 2018-12-19 12:14         ` Fei Li
  2018-12-19 17:29           ` Eric Blake
  2018-12-21  9:36           ` Fei Li
  0 siblings, 2 replies; 27+ messages in thread
From: Fei Li @ 2018-12-19 12:14 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: David Gibson, qemu-devel, Dr . David Alan Gilbert



On 12/19/2018 06:10 PM, Markus Armbruster wrote:
> Fei Li <fli@suse.com> writes:
>
>> On 12/13/2018 03:26 PM, Markus Armbruster wrote:
>>> There's a question for David Gibson inline.  Please search for /ppc/.
>>>
>>> Fei Li <fli@suse.com> writes:
>>>
>>>> Make qemu_thread_create() return a Boolean to indicate if it succeeds
>>>> rather than failing with an error. And add an Error parameter to hold
>>>> the error message and let the callers handle it.
>>> The "rather than failing with an error" is misleading.  Before the
>>> patch, we report to stderr and abort().  What about:
>>>
>>>       qemu-thread: Make qemu_thread_create() handle errors properly
>>>
>>>       qemu_thread_create() abort()s on error.  Not nice.  Give it a
>>>       return value and an Error ** argument, so it can return success /
>>>       failure.
>> A nice commit-amend! Thanks!
>>> Still missing from the commit message then: how you update the callers.
>> Yes, agree. I think the "how" should also be noted here, like
>> - propagating the err to callers whose call trace already has the
>> Error parameter;
>> - just adding &error_abort for qemu_thread_create() and marking it
>> "TODO: xxx";
>>> Let's see below.
>>>
>>>> Cc: Markus Armbruster <armbru@redhat.com>
>>>> Cc: Daniel P. Berrangé <berrange@redhat.com>
>>>> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>> Signed-off-by: Fei Li <fli@suse.com>
>>>> ---
>>>>    cpus.c                      | 45 ++++++++++++++++++++++++-------------
>>>>    dump.c                      |  6 +++--
>>>>    hw/misc/edu.c               |  6 +++--
>>>>    hw/ppc/spapr_hcall.c        | 10 +++++++--
>>>>    hw/rdma/rdma_backend.c      |  4 +++-
>>>>    hw/usb/ccid-card-emulated.c | 16 ++++++++++----
>>>>    include/qemu/thread.h       |  4 ++--
>>>>    io/task.c                   |  3 ++-
>>>>    iothread.c                  | 16 +++++++++-----
>>>>    migration/migration.c       | 54 +++++++++++++++++++++++++++++----------------
>>>>    migration/postcopy-ram.c    | 14 ++++++++++--
>>>>    migration/ram.c             | 40 ++++++++++++++++++++++++---------
>>>>    migration/savevm.c          | 11 ++++++---
>>>>    tests/atomic_add-bench.c    |  3 ++-
>>>>    tests/iothread.c            |  2 +-
>>>>    tests/qht-bench.c           |  3 ++-
>>>>    tests/rcutorture.c          |  3 ++-
>>>>    tests/test-aio.c            |  2 +-
>>>>    tests/test-rcu-list.c       |  3 ++-
>>>>    ui/vnc-jobs.c               | 17 +++++++++-----
>>>>    ui/vnc-jobs.h               |  2 +-
>>>>    ui/vnc.c                    |  4 +++-
>>>>    util/compatfd.c             | 12 ++++++++--
>>>>    util/oslib-posix.c          | 17 ++++++++++----
>>>>    util/qemu-thread-posix.c    | 24 +++++++++++++-------
>>>>    util/qemu-thread-win32.c    | 16 ++++++++++----
>>>>    util/rcu.c                  |  3 ++-
>>>>    util/thread-pool.c          |  4 +++-
>>>>    28 files changed, 243 insertions(+), 101 deletions(-)
>>>>
>>>> diff --git a/cpus.c b/cpus.c
>>>> index 7b091bda53..e8450e518a 100644
>>>> --- a/cpus.c
>>>> +++ b/cpus.c
>>>> @@ -1961,15 +1961,20 @@ static void qemu_tcg_init_vcpu(CPUState *cpu, Error **errp)
>>>>                snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
>>>>                     cpu->cpu_index);
>>>>    -            qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
>>>> -                               cpu, QEMU_THREAD_JOINABLE);
>>>> +            if (!qemu_thread_create(cpu->thread, thread_name,
>>>> +                                    qemu_tcg_cpu_thread_fn, cpu,
>>>> +                                    QEMU_THREAD_JOINABLE, errp)) {
>>>> +                return;
>>>> +            }
>>>>              } else {
>>>>                /* share a single thread for all cpus with TCG */
>>>>                snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "ALL CPUs/TCG");
>>>> -            qemu_thread_create(cpu->thread, thread_name,
>>>> -                               qemu_tcg_rr_cpu_thread_fn,
>>>> -                               cpu, QEMU_THREAD_JOINABLE);
>>>> +            if (!qemu_thread_create(cpu->thread, thread_name,
>>>> +                                    qemu_tcg_rr_cpu_thread_fn, cpu,
>>>> +                                    QEMU_THREAD_JOINABLE, errp)) {
>>>> +                return;
>>>> +            }
>>>>                  single_tcg_halt_cond = cpu->halt_cond;
>>>>                single_tcg_cpu_thread = cpu->thread;
>>> This is a caller that sets an error on failure.  You make it set an
>>> error on qemu_thread_create() failure.  Makes sense.
>> Thanks for the comment!
>>>> @@ -1997,8 +2002,10 @@ static void qemu_hax_start_vcpu(CPUState *cpu, Error **errp)
>>>>          snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HAX",
>>>>                 cpu->cpu_index);
>>>> -    qemu_thread_create(cpu->thread, thread_name, qemu_hax_cpu_thread_fn,
>>>> -                       cpu, QEMU_THREAD_JOINABLE);
>>>> +    if (!qemu_thread_create(cpu->thread, thread_name, qemu_hax_cpu_thread_fn,
>>>> +                            cpu, QEMU_THREAD_JOINABLE, errp)) {
>>>> +        return;
>>>> +    }
>>>>    #ifdef _WIN32
>>>>        cpu->hThread = qemu_thread_get_handle(cpu->thread);
>>>>    #endif
>>> Likewise.  I'll stop commenting on this pattern now.
>>>
>>>> @@ -2013,8 +2020,10 @@ static void qemu_kvm_start_vcpu(CPUState *cpu, Error **errp)
>>>>        qemu_cond_init(cpu->halt_cond);
>>>>        snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/KVM",
>>>>                 cpu->cpu_index);
>>>> -    qemu_thread_create(cpu->thread, thread_name, qemu_kvm_cpu_thread_fn,
>>>> -                       cpu, QEMU_THREAD_JOINABLE);
>>>> +    if (!qemu_thread_create(cpu->thread, thread_name, qemu_kvm_cpu_thread_fn,
>>>> +                            cpu, QEMU_THREAD_JOINABLE, errp)) {
>>>> +        /* keep 'if' here in case there is further error handling logic */
>>>> +    }
>>>>    }
>>>>      static void qemu_hvf_start_vcpu(CPUState *cpu, Error **errp)
>>>> @@ -2031,8 +2040,10 @@ static void qemu_hvf_start_vcpu(CPUState *cpu, Error **errp)
>>>>          snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/HVF",
>>>>                 cpu->cpu_index);
>>>> -    qemu_thread_create(cpu->thread, thread_name, qemu_hvf_cpu_thread_fn,
>>>> -                       cpu, QEMU_THREAD_JOINABLE);
>>>> +    if (!qemu_thread_create(cpu->thread, thread_name, qemu_hvf_cpu_thread_fn,
>>>> +                            cpu, QEMU_THREAD_JOINABLE, errp)) {
>>>> +        /* keep 'if' here in case there is further error handling logic */
>>>> +    }
>>>>    }
>>>>      static void qemu_whpx_start_vcpu(CPUState *cpu, Error **errp)
>>>> @@ -2044,8 +2055,10 @@ static void qemu_whpx_start_vcpu(CPUState *cpu, Error **errp)
>>>>        qemu_cond_init(cpu->halt_cond);
>>>>        snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/WHPX",
>>>>                 cpu->cpu_index);
>>>> -    qemu_thread_create(cpu->thread, thread_name, qemu_whpx_cpu_thread_fn,
>>>> -                       cpu, QEMU_THREAD_JOINABLE);
>>>> +    if (!qemu_thread_create(cpu->thread, thread_name, qemu_whpx_cpu_thread_fn,
>>>> +                            cpu, QEMU_THREAD_JOINABLE, errp)) {
>>>> +        return;
>>>> +    }
>>>>    #ifdef _WIN32
>>>>        cpu->hThread = qemu_thread_get_handle(cpu->thread);
>>>>    #endif
>>>> @@ -2060,8 +2073,10 @@ static void qemu_dummy_start_vcpu(CPUState *cpu, Error **errp)
>>>>        qemu_cond_init(cpu->halt_cond);
>>>>        snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/DUMMY",
>>>>                 cpu->cpu_index);
>>>> -    qemu_thread_create(cpu->thread, thread_name, qemu_dummy_cpu_thread_fn, cpu,
>>>> -                       QEMU_THREAD_JOINABLE);
>>>> +    if (!qemu_thread_create(cpu->thread, thread_name, qemu_dummy_cpu_thread_fn,
>>>> +                            cpu, QEMU_THREAD_JOINABLE, errp)) {
>>>> +        /* keep 'if' here in case there is further error handling logic */
>>>> +    }
>>>>    }
>>>>      bool qemu_init_vcpu(CPUState *cpu, Error **errp)
>>>> diff --git a/dump.c b/dump.c
>>>> index 4ec94c5e25..1f003aff9a 100644
>>>> --- a/dump.c
>>>> +++ b/dump.c
>>>> @@ -2020,8 +2020,10 @@ void qmp_dump_guest_memory(bool paging, const char *file,
>>>>        if (detach_p) {
>>>>            /* detached dump */
>>>>            s->detached = true;
>>>> -        qemu_thread_create(&s->dump_thread, "dump_thread", dump_thread,
>>>> -                           s, QEMU_THREAD_DETACHED);
>>>> +        if (!qemu_thread_create(&s->dump_thread, "dump_thread", dump_thread,
>>>> +                                s, QEMU_THREAD_DETACHED, errp)) {
>>>> +            /* keep 'if' here in case there is further error handling logic */
>>>> +        }
>>>>        } else {
>>>>            /* sync dump */
>>>>            dump_process(s, errp);
>>>> diff --git a/hw/misc/edu.c b/hw/misc/edu.c
>>>> index cdcf550dd7..6684c60a96 100644
>>>> --- a/hw/misc/edu.c
>>>> +++ b/hw/misc/edu.c
>>>> @@ -355,8 +355,10 @@ static void pci_edu_realize(PCIDevice *pdev, Error **errp)
>>>>          qemu_mutex_init(&edu->thr_mutex);
>>>>        qemu_cond_init(&edu->thr_cond);
>>>> -    qemu_thread_create(&edu->thread, "edu", edu_fact_thread,
>>>> -                       edu, QEMU_THREAD_JOINABLE);
>>>> +    if (!qemu_thread_create(&edu->thread, "edu", edu_fact_thread,
>>>> +                            edu, QEMU_THREAD_JOINABLE, errp)) {
>>>> +        return;
>>>> +    }
>>>>          memory_region_init_io(&edu->mmio, OBJECT(edu), &edu_mmio_ops, edu,
>>>>                        "edu-mmio", 1 * MiB);
>>>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>>>> index ae913d070f..7c16ade04a 100644
>>>> --- a/hw/ppc/spapr_hcall.c
>>>> +++ b/hw/ppc/spapr_hcall.c
>>>> @@ -478,6 +478,7 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
>>>>        sPAPRPendingHPT *pending = spapr->pending_hpt;
>>>>        uint64_t current_ram_size;
>>>>        int rc;
>>>> +    Error *local_err = NULL;
>>>>          if (spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED) {
>>>>            return H_AUTHORITY;
>>>> @@ -538,8 +539,13 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
>>>>        pending->shift = shift;
>>>>        pending->ret = H_HARDWARE;
>>>>    -    qemu_thread_create(&pending->thread, "sPAPR HPT prepare",
>>>> -                       hpt_prepare_thread, pending, QEMU_THREAD_DETACHED);
>>>> +    if (!qemu_thread_create(&pending->thread, "sPAPR HPT prepare",
>>>> +                            hpt_prepare_thread, pending,
>>>> +                            QEMU_THREAD_DETACHED, &local_err)) {
>>>> +        error_reportf_err(local_err, "failed to create hpt_prepare_thread: ");
>>>> +        g_free(pending);
>>>> +        return H_RESOURCE;
>>>> +    }
>>>>          spapr->pending_hpt = pending;
>>>>    
>>> This is a caller that returns an error code on failure.  You change it
>>> to report the error, then return failure.  The return failure part looks
>>> fine.  Whether reporting the error is appropriate I can't say for sure.
>>> No other failure mode reports anything.  David, what do you think?
>> Just as David explains. :)
>>> Fei Li, you could pass &error_abort to side-step this question for now.
>>>
>>>> diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
>>>> index d7a4bbd91f..53a2bd0d85 100644
>>>> --- a/hw/rdma/rdma_backend.c
>>>> +++ b/hw/rdma/rdma_backend.c
>>>> @@ -164,8 +164,10 @@ static void start_comp_thread(RdmaBackendDev *backend_dev)
>>>>        snprintf(thread_name, sizeof(thread_name), "rdma_comp_%s",
>>>>                 ibv_get_device_name(backend_dev->ib_dev));
>>>>        backend_dev->comp_thread.run = true;
>>>> +    /* FIXME: let the further caller handle the error instead of abort() here */
>>>>        qemu_thread_create(&backend_dev->comp_thread.thread, thread_name,
>>>> -                       comp_handler_thread, backend_dev, QEMU_THREAD_DETACHED);
>>>> +                       comp_handler_thread, backend_dev,
>>>> +                       QEMU_THREAD_DETACHED, &error_abort);
>>>>    }
>>>>    
>>> This is a caller that can't return failure.  You pass &error_abort.  No
>>> behavioral change.
>> Actually, yes. The reason I did not change anything here is that I am
>> not quite sure how to fix this for the rdma device, especially setting
>> a certain value for dev->regs_data[idx] when it fails.
> I recommend splitting this patch.  First part adds the Error ** parameter
> to qemu_thread_create(), passing &error_abort everywhere.  No functional
> change.  Subsequent patches then improve on &error_abort.  This way,
> each improvement patch can be cc'ed to just that part's maintainer(s).
> Parts you don't want to touch you simply leave at &error_abort.  Makes
> sense?
Yes, I think this makes sense, much clearer. :) But I am a little worried
about whether so many subsequent improvement patches (some of them are
quite small changes) would be acceptable.
BTW, regarding the split, I think the previous "[2/7] qemu_init_vcpu: add a
new Error parameter to propagate" should be merged into the later
improvement for qemu_xxx_init_vcpu. What do you think?
>>> I think I'd mark the spot TODO, not FIXME.  Matter of taste, I guess.
>> Sounds good, thanks!
>>>>    void rdma_backend_register_comp_handler(void (*handler)(int status,
>>>> diff --git a/hw/usb/ccid-card-emulated.c b/hw/usb/ccid-card-emulated.c
>>>> index 25976ed84f..c6783f124a 100644
>>>> --- a/hw/usb/ccid-card-emulated.c
>>>> +++ b/hw/usb/ccid-card-emulated.c
>>>> @@ -33,6 +33,7 @@
>>>>    #include "qemu/main-loop.h"
>>>>    #include "ccid.h"
>>>>    #include "qapi/error.h"
>>>> +#include "qemu/error-report.h"
>>>>      #define DPRINTF(card, lvl, fmt, ...) \
>>>>    do {\
>>>> @@ -544,10 +545,17 @@ static void emulated_realize(CCIDCardState *base, Error **errp)
>>>>            error_setg(errp, "%s: failed to initialize vcard", TYPE_EMULATED_CCID);
>>>>            goto out2;
>>>>        }
>>>> -    qemu_thread_create(&card->event_thread_id, "ccid/event", event_thread,
>>>> -                       card, QEMU_THREAD_JOINABLE);
>>>> -    qemu_thread_create(&card->apdu_thread_id, "ccid/apdu", handle_apdu_thread,
>>>> -                       card, QEMU_THREAD_JOINABLE);
>>>> +    if (!qemu_thread_create(&card->event_thread_id, "ccid/event", event_thread,
>>>> +                            card, QEMU_THREAD_JOINABLE, errp)) {
>>>> +        error_report("failed to create event_thread");
>>>> +        goto out2;
>>>> +    }
>>>> +    if (!qemu_thread_create(&card->apdu_thread_id, "ccid/apdu",
>>>> +                            handle_apdu_thread, card,
>>>> +                            QEMU_THREAD_JOINABLE, errp)) {
>>>> +        error_report("failed to create handle_apdu_thread");
>>>> +        goto out2;
>>>> +    }
>>>>      out2:
>>>>        clean_event_notifier(card);
>>> error_report() in a realize() method is almost certainly wrong.
>> Ok, I will remove these two.
>>>> diff --git a/include/qemu/thread.h b/include/qemu/thread.h
>>>> index 55d83a907c..12291f4ccd 100644
>>>> --- a/include/qemu/thread.h
>>>> +++ b/include/qemu/thread.h
>>>> @@ -152,9 +152,9 @@ void qemu_event_reset(QemuEvent *ev);
>>>>    void qemu_event_wait(QemuEvent *ev);
>>>>    void qemu_event_destroy(QemuEvent *ev);
>>>>    -void qemu_thread_create(QemuThread *thread, const char *name,
>>>> +bool qemu_thread_create(QemuThread *thread, const char *name,
>>>>                            void *(*start_routine)(void *),
>>>> -                        void *arg, int mode);
>>>> +                        void *arg, int mode, Error **errp);
>>>>    void *qemu_thread_join(QemuThread *thread);
>>>>    void qemu_thread_get_self(QemuThread *thread);
>>>>    bool qemu_thread_is_self(QemuThread *thread);
>>>> diff --git a/io/task.c b/io/task.c
>>>> index 2886a2c1bc..6d3a18ab80 100644
>>>> --- a/io/task.c
>>>> +++ b/io/task.c
>>>> @@ -149,7 +149,8 @@ void qio_task_run_in_thread(QIOTask *task,
>>>>                           "io-task-worker",
>>>>                           qio_task_thread_worker,
>>>>                           data,
>>>> -                       QEMU_THREAD_DETACHED);
>>>> +                       QEMU_THREAD_DETACHED,
>>>> +                       &error_abort);
>>>>    }
>>>>      
>>> This is a caller that can't return failure.  You pass &error_abort.  No
>>> behavioral change.  Unlike above, you don't mark this spot FIXME.  Any
>>> particular reason for marking one, but not the other?
>> Emm, it is a little difficult to add an Error parameter to its callers, and
>> the callers do not seem to need the Error. Thus I think passing
>> &error_abort in this function instead of in its further callers is more
>> direct. :)
>> The same reasoning applies to the several cases below.
>>
>> But just as you mentioned, maybe we should add a "TODO: xxxx" for the direct
>> &error_abort case in case the callers need the Error parameter in the future.
> Your use of &error_abort in this patch is fine simply because it's no
> worse than before.  I'm merely probing your use of FIXME / TODO.
>
> Adding a FIXME is appropriate when you're convinced the code is actually
> broken.
>
> Adding a TODO is appropriate when you believe the code should be
> improved.
>
> Both are almost always worth mentioning in the commit message.
>
> If you don't really know, and you're not really changing how the code
> behaves, then it's better not to add either kind of comment.
Ok, for such cases, I will not add either comment.
Thanks for the detailed explanation!
>
>>> I'll stop commenting on this pattern now.
>>>
>>>> diff --git a/iothread.c b/iothread.c
>>>> index 2fb1cdf55d..7335dacf0b 100644
>>>> --- a/iothread.c
>>>> +++ b/iothread.c
>>>> @@ -164,9 +164,7 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
>>>>                                    &local_error);
>>>>        if (local_error) {
>>>>            error_propagate(errp, local_error);
>>>> -        aio_context_unref(iothread->ctx);
>>>> -        iothread->ctx = NULL;
>>>> -        return;
>>>> +        goto fail;
>>>>        }
>>>>          qemu_mutex_init(&iothread->init_done_lock);
>>>> @@ -178,8 +176,12 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
>>>>         */
>>>>        name = object_get_canonical_path_component(OBJECT(obj));
>>>>        thread_name = g_strdup_printf("IO %s", name);
>>>> -    qemu_thread_create(&iothread->thread, thread_name, iothread_run,
>>>> -                       iothread, QEMU_THREAD_JOINABLE);
>>>> +    if (!qemu_thread_create(&iothread->thread, thread_name, iothread_run,
>>>> +                            iothread, QEMU_THREAD_JOINABLE, errp)) {
>>>> +        g_free(thread_name);
>>>> +        g_free(name);
>>>> +        goto fail;
>>>> +    }
>>>>        g_free(thread_name);
>>>>        g_free(name);
>>>>    @@ -190,6 +192,10 @@ static void iothread_complete(UserCreatable *obj, Error **errp)
>>>>                           &iothread->init_done_lock);
>>>>        }
>>>>        qemu_mutex_unlock(&iothread->init_done_lock);
>>>> +    return;
>>>> +fail:
>>>> +    aio_context_unref(iothread->ctx);
>>>> +    iothread->ctx = NULL;
>>>>    }
>>>>      typedef struct {
>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>> index 0537fc0c26..af6c72ac5d 100644
>>>> --- a/migration/migration.c
>>>> +++ b/migration/migration.c
>>>> @@ -438,19 +438,22 @@ static void process_incoming_migration_co(void *opaque)
>>>>            /* Make sure all file formats flush their mutable metadata */
>>>>            bdrv_invalidate_cache_all(&local_err);
>>>>            if (local_err) {
>>>> -            migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>>>> -                    MIGRATION_STATUS_FAILED);
>>>>                error_report_err(local_err);
>>>> -            exit(EXIT_FAILURE);
>>>> +            goto fail;
>>>>            }
>>>>              if (colo_init_ram_cache() < 0) {
>>>>                error_report("Init ram cache failed");
>>>> -            exit(EXIT_FAILURE);
>>>> +            goto fail;
>>>>            }
>>>>    -        qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
>>>> -             colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
>>>> +        if (!qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
>>>> +                                colo_process_incoming_thread, mis,
>>>> +                                QEMU_THREAD_JOINABLE, &local_err)) {
>>>> +            error_reportf_err(local_err, "failed to create "
>>>> +                              "colo_process_incoming_thread: ");
>>>> +            goto fail;
>>>> +        }
>>>>            mis->have_colo_incoming_thread = true;
>>>>            qemu_coroutine_yield();
>>>>    @@ -461,20 +464,22 @@ static void process_incoming_migration_co(void *opaque)
>>>>        }
>>>>          if (ret < 0) {
>>>> -        Error *local_err = NULL;
>>>> -
>>>> -        migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>>>> -                          MIGRATION_STATUS_FAILED);
>>>>            error_report("load of migration failed: %s", strerror(-ret));
>>>> -        qemu_fclose(mis->from_src_file);
>>>> -        if (multifd_load_cleanup(&local_err) != 0) {
>>>> -            error_report_err(local_err);
>>>> -        }
>>>> -        exit(EXIT_FAILURE);
>>>> +        goto fail;
>>>>        }
>>>>        mis->bh = qemu_bh_new(process_incoming_migration_bh, mis);
>>>>        qemu_bh_schedule(mis->bh);
>>>>        mis->migration_incoming_co = NULL;
>>>> +    return;
>>>> +fail:
>>>> +    local_err = NULL;
>>>> +    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>>>> +                      MIGRATION_STATUS_FAILED);
>>>> +    qemu_fclose(mis->from_src_file);
>>>> +    if (multifd_load_cleanup(&local_err) != 0) {
>>>> +        error_report_err(local_err);
>>>> +    }
>>>> +    exit(EXIT_FAILURE);
>>>>    }
>>> You change handling of errors other than qemu_thread_create().  Separate
>>> patch, please.  I'd put it before this one.
>> Ok, thanks for the reminder. Will update in the next version.
>>>>      static void migration_incoming_setup(QEMUFile *f)
>>>> @@ -2345,6 +2350,7 @@ out:
>>>>    static int open_return_path_on_source(MigrationState *ms,
>>>>                                          bool create_thread)
>>>>    {
>>>> +    Error *local_err = NULL;
>>>>          ms->rp_state.from_dst_file =
>>>> qemu_file_get_return_path(ms->to_dst_file);
>>>>        if (!ms->rp_state.from_dst_file) {
>>>> @@ -2358,8 +2364,13 @@ static int open_return_path_on_source(MigrationState *ms,
>>>>            return 0;
>>>>        }
>>>>    -    qemu_thread_create(&ms->rp_state.rp_thread, "return path",
>>>> -                       source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
>>>> +    if (!qemu_thread_create(&ms->rp_state.rp_thread, "return path",
>>>> +                            source_return_path_thread, ms,
>>>> +                            QEMU_THREAD_JOINABLE, &local_err)) {
>>>> +        error_reportf_err(local_err,
>>>> +                          "failed to create source_return_path_thread: ");
>>>> +        return -1;
>>>> +    }
>>>>          trace_open_return_path_on_source_continue();
>>>>    
>>> This is a caller that returns an error code on failure.  You change it
>>> to report the error, then return failure.  This is okay, because its
>>> sole caller also reports errors that way.
>> Thanks.
>>>> @@ -3189,8 +3200,13 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
>>>>            migrate_fd_cleanup(s);
>>>>            return;
>>>>        }
>>>> -    qemu_thread_create(&s->thread, "live_migration", migration_thread, s,
>>>> -                       QEMU_THREAD_JOINABLE);
>>>> +    if (!qemu_thread_create(&s->thread, "live_migration", migration_thread,
>>>> +                            s, QEMU_THREAD_JOINABLE, &error_in)) {
>>>> +        error_reportf_err(error_in, "failed to create migration_thread: ");
>>>> +        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
>>>> +        migrate_fd_cleanup(s);
>>>> +        return;
>>>> +    }
>>>>        s->migration_thread_running = true;
>>>>    }
>>> This is a caller that reports errors.  You make it handle
>>> qemu_thread_create() the same way.  Good.
>> Thanks!
>>>>    diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>>>> index fa09dba534..80bfa9c4a2 100644
>>>> --- a/migration/postcopy-ram.c
>>>> +++ b/migration/postcopy-ram.c
>>>> @@ -1083,6 +1083,8 @@ retry:
>>>>      int postcopy_ram_enable_notify(MigrationIncomingState *mis)
>>>>    {
>>>> +    Error *local_err = NULL;
>>>> +
>>>>        /* Open the fd for the kernel to give us userfaults */
>>>>        mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
>>>>        if (mis->userfault_fd == -1) {
>>>> @@ -1109,8 +1111,16 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
>>>>        }
>>>>          qemu_sem_init(&mis->fault_thread_sem, 0);
>>>> -    qemu_thread_create(&mis->fault_thread, "postcopy/fault",
>>>> -                       postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
>>>> +    if (!qemu_thread_create(&mis->fault_thread, "postcopy/fault",
>>>> +                            postcopy_ram_fault_thread, mis,
>>>> +                            QEMU_THREAD_JOINABLE, &local_err)) {
>>>> +        error_reportf_err(local_err,
>>>> +                          "failed to create postcopy_ram_fault_thread: ");
>>>> +        close(mis->userfault_event_fd);
>>>> +        close(mis->userfault_fd);
>>>> +        qemu_sem_destroy(&mis->fault_thread_sem);
>>>> +        return -1;
>>>> +    }
>>>>        qemu_sem_wait(&mis->fault_thread_sem);
>>>>        qemu_sem_destroy(&mis->fault_thread_sem);
>>>>        mis->have_fault_thread = true;
>>> This is a caller that reports errors, then returns failure.  You make it
>>> handle qemu_thread_create() the same way.  Good.
>>>
>>> Not related to this patch, just spotted while reviewing it:
>>>
>>>          /* Mark so that we get notified of accesses to unwritten areas */
>>>          if (qemu_ram_foreach_migratable_block(ram_block_enable_notify, mis)) {
>>>              error_report("ram_block_enable_notify failed");
>>>              return -1;
>>>          }
>>>
>>> Do we leak mis->userfault_fd, mis->userfault_event_fd,
>>> mis->fault_thread_sem here?
>> Actually, patch 5/7 fixes this: we leave the cleanup to
>> postcopy_ram_incoming_cleanup() when the notify setup fails here.
>> Looking back at the history, I mistakenly called close() on these fds
>> right here, but David corrected me. The following is quoted from his
>> earlier comment:
>> "
>> I don't think these close() calls are safe.  This code is just after
>> starting the fault thread, and the fault thread has a poll() call on
>> these fd's, so we can't close them until we've instructed that thread
>> to exit.
>>
>> We should fall out through postcopy_ram_incoming_cleanup, and because
>> the thread exists it should do a notify to the thread, a join and then
>> only later do the close calls.
>> "
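
The ordering described in the quoted comment (notify the thread, join it, and only then close the fds) can be sketched outside QEMU roughly as follows. This is only an illustration under stated assumptions, not the postcopy code: struct poller, poller_thread(), start_poller() and stop_poller() are hypothetical names, and a plain pipe stands in for the userfault fds.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical state: a pipe stands in for the fds the thread polls. */
struct poller {
    pthread_t thread;
    int fds[2];      /* fds[0]: polled by the thread, fds[1]: written to stop it */
    int saw_quit;    /* set by the thread once it has been told to exit */
};

static void *poller_thread(void *opaque)
{
    struct poller *p = opaque;
    struct pollfd pfd = { .fd = p->fds[0], .events = POLLIN };

    for (;;) {
        int n = poll(&pfd, 1, -1);   /* fds[0] must stay open while we wait */

        if (n < 0 && errno == EINTR) {
            continue;
        }
        if (n > 0 && (pfd.revents & POLLIN)) {
            p->saw_quit = 1;
        }
        break;
    }
    return NULL;
}

/* Returns 0 on success, a negated errno value on failure. */
static int start_poller(struct poller *p)
{
    int err;

    assert(p != NULL);
    p->saw_quit = 0;
    if (pipe(p->fds) < 0) {
        return -errno;
    }
    err = pthread_create(&p->thread, NULL, poller_thread, p);
    if (err) {
        /* The thread never started, so closing right away is safe here. */
        close(p->fds[0]);
        close(p->fds[1]);
        return -err;
    }
    return 0;
}

/* Returns 0 on success, a negated errno value on failure. */
static int stop_poller(struct poller *p)
{
    char c = 0;
    int err;

    if (write(p->fds[1], &c, 1) < 0) {      /* 1. notify the thread */
        return -errno;
    }
    err = pthread_join(p->thread, NULL);    /* 2. wait for it to exit */
    if (err) {
        return -err;
    }
    close(p->fds[0]);                       /* 3. only now close the fds */
    close(p->fds[1]);
    return 0;
}
```

Closing the fds immediately is only safe on the path where pthread_create() itself failed, because no thread can be polling them at that point.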
>>>> diff --git a/migration/ram.c b/migration/ram.c
>>>> index 658dfa88a3..6e0cccf066 100644
>>>> --- a/migration/ram.c
>>>> +++ b/migration/ram.c
>>>> @@ -473,6 +473,7 @@ static void compress_threads_save_cleanup(void)
>>>>    static int compress_threads_save_setup(void)
>>>>    {
>>>>        int i, thread_count;
>>>> +    Error *local_err = NULL;
>>>>          if (!migrate_use_compression()) {
>>>>            return 0;
>>>> @@ -502,9 +503,12 @@ static int compress_threads_save_setup(void)
>>>>            comp_param[i].quit = false;
>>>>            qemu_mutex_init(&comp_param[i].mutex);
>>>>            qemu_cond_init(&comp_param[i].cond);
>>>> -        qemu_thread_create(compress_threads + i, "compress",
>>>> -                           do_data_compress, comp_param + i,
>>>> -                           QEMU_THREAD_JOINABLE);
>>>> +        if (!qemu_thread_create(compress_threads + i, "compress",
>>>> +                                do_data_compress, comp_param + i,
>>>> +                                QEMU_THREAD_JOINABLE, &local_err)) {
>>>> +            error_reportf_err(local_err, "failed to create do_data_compress: ");
>>>> +            goto exit;
>>>> +        }
>>>>        }
>>>>        return 0;
>>>>    
>>> Reviewing the migration changes is getting tiresome...
>> Yes, indeed, the migration involves a lot! Thanks so much for helping
>> to review!
>>>    Is reporting the
>>> error appropriate here, and why?
>> I think the QEMU monitor should display the exact failure reason for
>> administrators, especially since qemu_thread_create() itself does not
>> print any message, so we would have no idea which direct caller
>> failed without attaching gdb.
>> In other words, I think David's answer about the ppc
>> error_reportf_err() also applies here:
>>
>> "The error returns are for the guest, the reported errors are for the
>> guest administrator or management layers."
> There could well be an issue with the "management layers" part.  Should
> this error be sent to the management layer via QMP somehow?  Migration
> maintainers should be able to assist with this question.
>
>>>> @@ -1075,8 +1079,14 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>>>>            p->c = QIO_CHANNEL(sioc);
>>>>            qio_channel_set_delay(p->c, false);
>>>>            p->running = true;
>>>> -        qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
>>>> -                           QEMU_THREAD_JOINABLE);
>>>> +        if (!qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
>>>> +                                QEMU_THREAD_JOINABLE, &local_err)) {
>>>> +            migrate_set_error(migrate_get_current(), local_err);
>>>> +            error_reportf_err(local_err,
>>>> +                              "failed to create multifd_send_thread: ");
>>>> +            multifd_save_cleanup();
>>>> +            return;
>>>> +        }
>>>>              atomic_inc(&multifd_send_state->count);
>>>>        }
>>> Same question.
>>>
>>>> @@ -1350,8 +1360,13 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
>>>>        p->num_packets = 1;
>>>>          p->running = true;
>>>> -    qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p,
>>>> -                       QEMU_THREAD_JOINABLE);
>>>> +    if (!qemu_thread_create(&p->thread, p->name, multifd_recv_thread,
>>>> +                            p, QEMU_THREAD_JOINABLE, &local_err)) {
>>>> +        error_propagate_prepend(errp, local_err,
>>>> +                                "failed to create multifd_recv_thread: ");
>>>> +        multifd_recv_terminate_threads(local_err);
>>>> +        return false;
>>>> +    }
>>>>        atomic_inc(&multifd_recv_state->count);
>>>>        return atomic_read(&multifd_recv_state->count) ==
>>>>               migrate_multifd_channels();
>>>> @@ -3617,6 +3632,7 @@ static void compress_threads_load_cleanup(void)
>>>>    static int compress_threads_load_setup(QEMUFile *f)
>>>>    {
>>>>        int i, thread_count;
>>>> +    Error *local_err = NULL;
>>>>          if (!migrate_use_compression()) {
>>>>            return 0;
>>>> @@ -3638,9 +3654,13 @@ static int compress_threads_load_setup(QEMUFile *f)
>>>>            qemu_cond_init(&decomp_param[i].cond);
>>>>            decomp_param[i].done = true;
>>>>            decomp_param[i].quit = false;
>>>> -        qemu_thread_create(decompress_threads + i, "decompress",
>>>> -                           do_data_decompress, decomp_param + i,
>>>> -                           QEMU_THREAD_JOINABLE);
>>>> +        if (!qemu_thread_create(decompress_threads + i, "decompress",
>>>> +                                do_data_decompress, decomp_param + i,
>>>> +                                QEMU_THREAD_JOINABLE, &local_err)) {
>>>> +            error_reportf_err(local_err,
>>>> +                              "failed to create do_data_decompress: ");
>>>> +            goto exit;
>>>> +        }
>>>>        }
>>>>        return 0;
>>>>    exit:
>>> Same question.
>>>
>>>> diff --git a/migration/savevm.c b/migration/savevm.c
>>>> index d784e8aa40..b8bdcde5d8 100644
>>>> --- a/migration/savevm.c
>>>> +++ b/migration/savevm.c
>>>> @@ -1747,9 +1747,14 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
>>>>        mis->have_listen_thread = true;
>>>>        /* Start up the listening thread and wait for it to signal ready */
>>>>        qemu_sem_init(&mis->listen_thread_sem, 0);
>>>> -    qemu_thread_create(&mis->listen_thread, "postcopy/listen",
>>>> -                       postcopy_ram_listen_thread, NULL,
>>>> -                       QEMU_THREAD_DETACHED);
>>>> +    if (!qemu_thread_create(&mis->listen_thread, "postcopy/listen",
>>>> +                            postcopy_ram_listen_thread, NULL,
>>>> +                            QEMU_THREAD_DETACHED, &local_err)) {
>>>> +        error_reportf_err(local_err,
>>>> +                          "failed to create postcopy_ram_listen_thread: ");
>>>> +        qemu_sem_destroy(&mis->listen_thread_sem);
>>>> +        return -1;
>>>> +    }
>>>>        qemu_sem_wait(&mis->listen_thread_sem);
>>>>        qemu_sem_destroy(&mis->listen_thread_sem);
>>>>    
>>> This is a caller that reports errors, then returns failure.  You make it
>>> handle qemu_thread_create() the same way.  Good.
>>>
>>> I'll stop commenting on this pattern now.
>> Thanks.
>>>> diff --git a/tests/atomic_add-bench.c b/tests/atomic_add-bench.c
>>>> index 2f6c72f63a..338b9563e3 100644
>>>> --- a/tests/atomic_add-bench.c
>>>> +++ b/tests/atomic_add-bench.c
>>>> @@ -2,6 +2,7 @@
>>>>    #include "qemu/thread.h"
>>>>    #include "qemu/host-utils.h"
>>>>    #include "qemu/processor.h"
>>>> +#include "qapi/error.h"
>>>>      struct thread_info {
>>>>        uint64_t r;
>>>> @@ -110,7 +111,7 @@ static void create_threads(void)
>>>>              info->r = (i + 1) ^ time(NULL);
>>>>            qemu_thread_create(&threads[i], NULL, thread_func, info,
>>>> -                           QEMU_THREAD_JOINABLE);
>>>> +                           QEMU_THREAD_JOINABLE, &error_abort);
>>>>        }
>>>>    }
>> ... snip for all tests/xxx.c, as the parameter passed is always &error_abort ...
>>>>    diff --git a/ui/vnc-jobs.c b/ui/vnc-jobs.c
>>>> index 929391f85d..35a652d1fd 100644
>>>> --- a/ui/vnc-jobs.c
>>>> +++ b/ui/vnc-jobs.c
>>>> @@ -31,6 +31,7 @@
>>>>    #include "vnc-jobs.h"
>>>>    #include "qemu/sockets.h"
>>>>    #include "qemu/main-loop.h"
>>>> +#include "qapi/error.h"
>>>>    #include "block/aio.h"
>>>>      /*
>>>> @@ -331,15 +332,21 @@ static bool vnc_worker_thread_running(void)
>>>>        return queue; /* Check global queue */
>>>>    }
>>>>    -void vnc_start_worker_thread(void)
>>>> +bool vnc_start_worker_thread(Error **errp)
>>>>    {
>>>>        VncJobQueue *q;
>>>>    -    if (vnc_worker_thread_running())
>>>> -        return ;
>>>> +    if (vnc_worker_thread_running()) {
>>>> +        goto out;
>>>> +    }
>>>>          q = vnc_queue_init();
>>>> -    qemu_thread_create(&q->thread, "vnc_worker", vnc_worker_thread, q,
>>>> -                       QEMU_THREAD_DETACHED);
>>>> +    if (!qemu_thread_create(&q->thread, "vnc_worker", vnc_worker_thread,
>>>> +                            q, QEMU_THREAD_DETACHED, errp)) {
>>>> +        vnc_queue_clear(q);
>>>> +        return false;
>>>> +    }
>>>>        queue = q; /* Set global queue */
>>>> +out:
>>>> +    return true;
>>>>    }
>>> I recommend to pass &error_abort to qemu_thread_create() in this patch,
>>> then convert vnc_start_worker_thread() to Error in a subsequent patch.
>> Ok, thanks! This makes this patch shorter. :)
>> BTW, would it be better to add a "TODO: xxx" comment before the
>> &error_abort in this patch, and remove it in the subsequent patch?
>> If that is OK, I will do the same for the later touch_all_pages().
> See my remark on use of FIXME and TODO above.
>
> Adding a TODO only to remove it later in the same series is fine.  More
> so when it helps avoid review questions like "I think you need to do X
> here", followed by "Oh, I see you're doing X here" when the reviewer
> gets to the later patch.
Ok, will do so then, thanks for the advice.
>
>>>> diff --git a/ui/vnc-jobs.h b/ui/vnc-jobs.h
>>>> index 59f66bcc35..14640593db 100644
>>>> --- a/ui/vnc-jobs.h
>>>> +++ b/ui/vnc-jobs.h
>>>> @@ -37,7 +37,7 @@ void vnc_job_push(VncJob *job);
>>>>    void vnc_jobs_join(VncState *vs);
>>>>      void vnc_jobs_consume_buffer(VncState *vs);
>>>> -void vnc_start_worker_thread(void);
>>>> +bool vnc_start_worker_thread(Error **errp);
>>>>      /* Locks */
>>>>    static inline int vnc_trylock_display(VncDisplay *vd)
>>>> diff --git a/ui/vnc.c b/ui/vnc.c
>>>> index 0c1b477425..0ffe9e6a5d 100644
>>>> --- a/ui/vnc.c
>>>> +++ b/ui/vnc.c
>>>> @@ -3236,7 +3236,9 @@ void vnc_display_init(const char *id, Error **errp)
>>>>        vd->connections_limit = 32;
>>>>          qemu_mutex_init(&vd->mutex);
>>>> -    vnc_start_worker_thread();
>>>> +    if (!vnc_start_worker_thread(errp)) {
>>>> +        return;
>>>> +    }
>>>>          vd->dcl.ops = &dcl_ops;
>>>>        register_displaychangelistener(&vd->dcl);
>>> These two hunks then also go into the subsequent patch.
>> Ok.
>>>> diff --git a/util/compatfd.c b/util/compatfd.c
>>>> index 980bd33e52..886aa249f9 100644
>>>> --- a/util/compatfd.c
>>>> +++ b/util/compatfd.c
>>>> @@ -16,6 +16,7 @@
>>>>    #include "qemu/osdep.h"
>>>>    #include "qemu-common.h"
>>>>    #include "qemu/thread.h"
>>>> +#include "qapi/error.h"
>>>>      #include <sys/syscall.h>
>>>>    @@ -70,6 +71,7 @@ static int qemu_signalfd_compat(const sigset_t
>>>> *mask)
>>>>        struct sigfd_compat_info *info;
>>>>        QemuThread thread;
>>>>        int fds[2];
>>>> +    Error *local_err = NULL;
>>>>          info = malloc(sizeof(*info));
>>>>        if (info == NULL) {
>>>> @@ -88,8 +90,14 @@ static int qemu_signalfd_compat(const sigset_t *mask)
>>>>        memcpy(&info->mask, mask, sizeof(*mask));
>>>>        info->fd = fds[1];
>>>>    -    qemu_thread_create(&thread, "signalfd_compat",
>>>> sigwait_compat, info,
>>>> -                       QEMU_THREAD_DETACHED);
>>>> +    if (!qemu_thread_create(&thread, "signalfd_compat", sigwait_compat,
>>>> +                            info, QEMU_THREAD_DETACHED, &local_err)) {
>>>> +        error_reportf_err(local_err, "failed to create sigwait_compat: ");
>>>> +        close(fds[0]);
>>>> +        close(fds[1]);
>>>> +        free(info);
>>>> +        return -1;
>>>> +    }
>>>>          return fds[0];
>>>>    }
>>> This function implements signalfd() when the kernel doesn't provide
>>> it.
>>>
>>> signalfd() sets errno on failure.  The replacement's existing failure
>>> modes set errno.  You add a failure mode that doesn't set errno.  That's
>>> a bug.  To fix it, you can either make qemu_thread_create() set errno,
>>> or you can make it return a value you can use to set errno.  The common
>>> way to do the latter is returning a *negated* errno value.
>> Oops, I forgot to set errno in the Linux implementation! My fault.
>> I will set errno inside qemu_thread_create() as follows:
>>       err = pthread_attr_init(&attr);
>>       if (err) {
>> -        error_setg_errno(errp, -err, "pthread_attr_init failed: %s",
>> -                         strerror(err));
>> +        errno = err;
>> +        error_setg_errno(errp, errno, "pthread_attr_init failed");
>>           return false;
>>       }
> Make sure to set errno on all failures, not just this one.
Actually, this change is in qemu_thread_create() itself. I think that if
errno is set inside this function, callers do not need to set it
themselves.
Please correct me if I have misunderstood. :)
> Also add a function comment.  I suspect returning negated errno would
> lead to a shorter function comment.
Actually, only one caller needs the errno: the above
qemu_signalfd_compat().
As for the return value, I remember there was once an email thread
about it: returning a bool (and letting the passed errp hold the error
message) keeps consistency with glib. IMO, returning a bool and
returning -errno are equivalent if the callers do not use the return
value again; it only affects the check. As for reusing the return
value, like:
   ret = qemu_thread_create(xx, xx, &local_err);
I do not think it is much needed. What do you think?
>   Yet another reason to write
> function comments!  Making myself document the mess I made has made me
> clean it up before I submit it many times :)
Ok, thanks for the experience. Will add the comment. :)
>
>>> signalfd() doesn't print anything on failure.  The replacement's
>>> existing failure modes don't print anything.  You add a failure mode
>>> that does print.  I think it shouldn't.
>> Ok, I will remove it. Thanks!
>>>> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
>>>> index c1bee2a581..2c779fd634 100644
>>>> --- a/util/oslib-posix.c
>>>> +++ b/util/oslib-posix.c
>>>> @@ -437,9 +437,12 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
>>>>        size_t size_per_thread;
>>>>        char *addr = area;
>>>>        int i = 0;
>>>> +    int started_thread = 0;
>>>> +    Error *local_err = NULL;
>>>>          memset_thread_failed = false;
>>>>        memset_num_threads = get_memset_num_threads(smp_cpus);
>>>> +    started_thread = memset_num_threads;
>>>>        memset_thread = g_new0(MemsetThread, memset_num_threads);
>>>>        numpages_per_thread = (numpages / memset_num_threads);
>>>>        size_per_thread = (hpagesize * numpages_per_thread);
>>>> @@ -448,13 +451,19 @@ static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
>>>>            memset_thread[i].numpages = (i == (memset_num_threads - 1)) ?
>>>>                                        numpages : numpages_per_thread;
>>>>            memset_thread[i].hpagesize = hpagesize;
>>>> -        qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
>>>> -                           do_touch_pages, &memset_thread[i],
>>>> -                           QEMU_THREAD_JOINABLE);
>>>> +        if (!qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
>>>> +                                do_touch_pages, &memset_thread[i],
>>>> +                                QEMU_THREAD_JOINABLE, &local_err)) {
>>>> +            error_reportf_err(local_err, "failed to create do_touch_pages: ");
>>>> +            memset_thread_failed = true;
>>>> +            started_thread = i;
>>>> +            goto out;
>>>> +        }
>>>>            addr += size_per_thread;
>>>>            numpages -= numpages_per_thread;
>>>>        }
>>>> -    for (i = 0; i < memset_num_threads; i++) {
>>>> +out:
>>>> +    for (i = 0; i < started_thread; i++) {
>>>>            qemu_thread_join(&memset_thread[i].pgthread);
>>>>        }
>>>>        g_free(memset_thread);
>>> You need to convert this function to Error instead, because its caller
>>> os_mem_prealloc() sets an error on failure.  I recommend to pass
>>> &error_abort in this patch, and convert to Error in a subsequent patch.
>> Ok, thanks for the advice.
>>>> diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
>>>> index 865e476df5..81b40a1ece 100644
>>>> --- a/util/qemu-thread-posix.c
>>>> +++ b/util/qemu-thread-posix.c
>>>> @@ -15,6 +15,7 @@
>>>>    #include "qemu/atomic.h"
>>>>    #include "qemu/notify.h"
>>>>    #include "qemu-thread-common.h"
>>>> +#include "qapi/error.h"
>>>>      static bool name_threads;
>>>>    @@ -500,9 +501,9 @@ static void *qemu_thread_start(void *args)
>>>>        return r;
>>>>    }
>>>>    -void qemu_thread_create(QemuThread *thread, const char *name,
>>>> -                       void *(*start_routine)(void*),
>>>> -                       void *arg, int mode)
>>>> +bool qemu_thread_create(QemuThread *thread, const char *name,
>>>> +                        void *(*start_routine)(void *),
>>>> +                        void *arg, int mode, Error **errp)
>>>>    {
>>>>        sigset_t set, oldset;
>>>>        int err;
>>>> @@ -511,7 +512,9 @@ void qemu_thread_create(QemuThread *thread, const char *name,
>>>>          err = pthread_attr_init(&attr);
>>>>        if (err) {
>>>> -        error_exit(err, __func__);
>>>> +        error_setg_errno(errp, -err, "pthread_attr_init failed: %s",
>>>> +                         strerror(err));
> -err is actually wrong: pthread_attr_init() returns a *positive* errno
> code on failure.
Yes, that code is definitely wrong. :( pthread_attr_init() returns a
nonzero error number, so I made the update below, assigning the returned
err to errno.

      err = pthread_attr_init(&attr);
      if (err) {
-        error_exit(err, __func__);
+        errno = err;
+        error_setg_errno(errp, errno, "pthread_attr_init failed");
+        return false;
      }
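
The convention discussed here can be checked in isolation: pthread functions return a positive error number and do not set errno themselves. The sketch below is an illustration only, not the patch: thread_attr_set_state() is a hypothetical helper that both sets errno and returns the negated value, and an invalid detach state is used just to provoke a failure.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Hypothetical helper: set the detach state of @attr.
 *
 * Returns 0 on success.  On failure, returns a negated errno value and
 * also stores the positive value in errno, so callers that follow the
 * "returns -1 and sets errno" convention can rely on it.
 */
static int thread_attr_set_state(pthread_attr_t *attr, int detachstate)
{
    /* pthread functions return a positive error number, not -1. */
    int err = pthread_attr_setdetachstate(attr, detachstate);

    assert(err >= 0);
    if (err) {
        errno = err;
        return -err;
    }
    return 0;
}
```

A caller such as a signalfd() replacement could then simply return -1 on failure, since errno has already been set.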

Have a nice day, thanks so much for the review! ;)
Fei
>
>>>> +        return false;
>>>>        }
>>>>          if (mode == QEMU_THREAD_DETACHED) {
>>>> @@ -526,16 +529,21 @@ void qemu_thread_create(QemuThread *thread, const char *name,
>>>>        qemu_thread_args->name = g_strdup(name);
>>>>        qemu_thread_args->start_routine = start_routine;
>>>>        qemu_thread_args->arg = arg;
>>>> -
>>> Let's keep the blank line.
>> ok.
>>
>> Thanks so much for the review! Have a nice day. :)
>> Fei
> You're welcome :)
>
>>>>        err = pthread_create(&thread->thread, &attr,
>>>>                             qemu_thread_start, qemu_thread_args);
>>>> -
>>>> -    if (err)
>>>> -        error_exit(err, __func__);
>>>> +    if (err) {
>>>> +        error_setg_errno(errp, -err, "pthread_create failed: %s",
>>>> +                         strerror(err));
>>>> +        pthread_attr_destroy(&attr);
>>>> +        g_free(qemu_thread_args->name);
>>>> +        g_free(qemu_thread_args);
>>>> +        return false;
>>>> +    }
>>>>          pthread_sigmask(SIG_SETMASK, &oldset, NULL);
>>>>          pthread_attr_destroy(&attr);
>>>> +    return true;
>>>>    }
>>>>      void qemu_thread_get_self(QemuThread *thread)
>>>> diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
>>>> index 4a363ca675..57b1143e97 100644
>>>> --- a/util/qemu-thread-win32.c
>>>> +++ b/util/qemu-thread-win32.c
>>>> @@ -20,6 +20,7 @@
>>>>    #include "qemu/thread.h"
>>>>    #include "qemu/notify.h"
>>>>    #include "qemu-thread-common.h"
>>>> +#include "qapi/error.h"
>>>>    #include <process.h>
>>>>      static bool name_threads;
>>>> @@ -388,9 +389,9 @@ void *qemu_thread_join(QemuThread *thread)
>>>>        return ret;
>>>>    }
>>>>    -void qemu_thread_create(QemuThread *thread, const char *name,
>>>> -                       void *(*start_routine)(void *),
>>>> -                       void *arg, int mode)
>>>> +bool qemu_thread_create(QemuThread *thread, const char *name,
>>>> +                        void *(*start_routine)(void *),
>>>> +                        void *arg, int mode, Error **errp)
>>>>    {
>>>>        HANDLE hThread;
>>>>        struct QemuThreadData *data;
>>>> @@ -409,10 +410,17 @@ void qemu_thread_create(QemuThread *thread, const char *name,
>>>>        hThread = (HANDLE) _beginthreadex(NULL, 0, win32_start_routine,
>>>>                                          data, 0, &thread->tid);
>>>>        if (!hThread) {
>>>> -        error_exit(GetLastError(), __func__);
>>>> +        if (data->mode != QEMU_THREAD_DETACHED) {
>>>> +            DeleteCriticalSection(&data->cs);
>>>> +        }
>>>> +        error_setg_errno(errp, errno,
>>>> +                         "failed to create win32_start_routine");
>>>> +        g_free(data);
>>>> +        return false;
>>>>        }
>>>>        CloseHandle(hThread);
>>>>        thread->data = data;
>>>> +    return true;
>>>>    }
>>>>      void qemu_thread_get_self(QemuThread *thread)
>>>> diff --git a/util/rcu.c b/util/rcu.c
>>>> index 5676c22bd1..145dcdb0c6 100644
>>>> --- a/util/rcu.c
>>>> +++ b/util/rcu.c
>>>> @@ -32,6 +32,7 @@
>>>>    #include "qemu/atomic.h"
>>>>    #include "qemu/thread.h"
>>>>    #include "qemu/main-loop.h"
>>>> +#include "qapi/error.h"
>>>>    #if defined(CONFIG_MALLOC_TRIM)
>>>>    #include <malloc.h>
>>>>    #endif
>>>> @@ -325,7 +326,7 @@ static void rcu_init_complete(void)
>>>>         * must have been quiescent even after forking, just recreate it.
>>>>         */
>>>>        qemu_thread_create(&thread, "call_rcu", call_rcu_thread,
>>>> -                       NULL, QEMU_THREAD_DETACHED);
>>>> +                       NULL, QEMU_THREAD_DETACHED, &error_abort);
>>>>          rcu_register_thread();
>>>>    }
>>>> diff --git a/util/thread-pool.c b/util/thread-pool.c
>>>> index 610646d131..ad0f980783 100644
>>>> --- a/util/thread-pool.c
>>>> +++ b/util/thread-pool.c
>>>> @@ -22,6 +22,7 @@
>>>>    #include "trace.h"
>>>>    #include "block/thread-pool.h"
>>>>    #include "qemu/main-loop.h"
>>>> +#include "qapi/error.h"
>>>>      static void do_spawn_thread(ThreadPool *pool);
>>>>    @@ -132,7 +133,8 @@ static void do_spawn_thread(ThreadPool *pool)
>>>>        pool->new_threads--;
>>>>        pool->pending_threads++;
>>>>    -    qemu_thread_create(&t, "worker", worker_thread, pool,
>>>> QEMU_THREAD_DETACHED);
>>>> +    qemu_thread_create(&t, "worker", worker_thread, pool,
>>>> +                       QEMU_THREAD_DETACHED, &error_abort);
>>>>    }
>>>>      static void spawn_thread_bh_fn(void *opaque)
>

* Re: [Qemu-devel] [PATCH for-4.0 v8 3/7] migration: fix the multifd code when receiving less channels
  2018-12-17 11:45     ` Fei Li
@ 2018-12-19 14:11       ` Markus Armbruster
  2018-12-20  3:27         ` Fei Li
  0 siblings, 1 reply; 27+ messages in thread
From: Markus Armbruster @ 2018-12-19 14:11 UTC (permalink / raw)
  To: Fei Li; +Cc: qemu-devel, Dr . David Alan Gilbert

Fei Li <fli@suse.com> writes:

> On 12/13/2018 02:17 PM, Markus Armbruster wrote:
>> Fei Li <fli@suse.com> writes:
>>
>>> In our current code, when multifd is used during migration, if there
>>> is an error before the destination receives all new channels, the
>>> source keeps running, however the destination does not exit but keeps
>>> waiting until the source is killed deliberately.
>>>
>>> Fix this by dumping the specific error and let users decide whether
>>> to quit from the destination side when failing to receive packet via
>>> some channel.
>>>
>>> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>> Signed-off-by: Fei Li <fli@suse.com>
>>> Reviewed-by: Peter Xu <peterx@redhat.com>
[...]
>>> diff --git a/migration/migration.h b/migration/migration.h
>>> index e413d4d8b6..02b7304610 100644
>>> --- a/migration/migration.h
>>> +++ b/migration/migration.h
>>> @@ -229,7 +229,7 @@ struct MigrationState
>>>   void migrate_set_state(int *state, int old_state, int new_state);
>>>     void migration_fd_process_incoming(QEMUFile *f);
>>> -void migration_ioc_process_incoming(QIOChannel *ioc);
>>> +void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
>>>   void migration_incoming_process(void);
>>>     bool  migration_has_all_channels(void);
>>> diff --git a/migration/ram.c b/migration/ram.c
>>> index 7e7deec4d8..c7e3d6b0fd 100644
>>> --- a/migration/ram.c
>>> +++ b/migration/ram.c
>>> @@ -1323,7 +1323,7 @@ bool multifd_recv_all_channels_created(void)
>>>   }
>>>     /* Return true if multifd is ready for the migration, otherwise
>>> false */
>>> -bool multifd_recv_new_channel(QIOChannel *ioc)
>>> +bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
>>>   {
>>>       MultiFDRecvParams *p;
>>>       Error *local_err = NULL;
>>> @@ -1331,6 +1331,10 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>>>         id = multifd_recv_initial_packet(ioc, &local_err);
>>>       if (id < 0) {
>>> +        error_propagate_prepend(errp, local_err,
>>> +                                "failed to receive packet"
>>> +                                " via multifd channel %d: ",
>>> +                                atomic_read(&multifd_recv_state->count));
>>>           multifd_recv_terminate_threads(local_err);
>>>           return false;
>> Here, we return false without setting an error.
> I am not sure whether I understand correctly, but I think the above
> error_propagate_prepend() sets the error in errp here.

You're right, I got confused.

However, you shouldn't access @local_err after error_propagate() or
similar.  Please insert error_propagate_prepend() after
multifd_recv_terminate_threads(), like you do in the next hunk.
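
The ordering rule can be illustrated with a toy model. This is a sketch under stated assumptions, not QEMU code: ToyError, toy_error_new(), toy_error_propagate(), toy_terminate_threads() and toy_recv_channel() are all hypothetical stand-ins that only mimic the ownership hand-over of the real Error API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for QEMU's Error object; this is not the real API. */
typedef struct ToyError {
    char msg[64];
} ToyError;

static char last_reported[64];   /* what the cleanup step observed */

static ToyError *toy_error_new(const char *msg)
{
    ToyError *err = calloc(1, sizeof(*err));

    assert(err != NULL);
    snprintf(err->msg, sizeof(err->msg), "%s", msg);
    return err;
}

/*
 * Rough model of propagation: ownership of @local moves to *@dst, or
 * @local is freed when there is no destination or it already holds an
 * error.  Either way, the caller must not touch @local afterwards.
 */
static void toy_error_propagate(ToyError **dst, ToyError *local)
{
    if (dst && !*dst) {
        *dst = local;
    } else {
        free(local);
    }
}

/* Stand-in for a cleanup step that wants to look at the error. */
static void toy_terminate_threads(const ToyError *err)
{
    snprintf(last_reported, sizeof(last_reported), "%s", err->msg);
}

static bool toy_recv_channel(int id, ToyError **errp)
{
    if (id < 0) {
        ToyError *local_err = toy_error_new("bad initial packet");

        toy_terminate_threads(local_err);       /* use it first */
        toy_error_propagate(errp, local_err);   /* hand it over last */
        return false;
    }
    return true;
}
```

With the two calls swapped, toy_terminate_threads() would read freed memory whenever the caller passes NULL for errp, which is the kind of bug the ordering avoids.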

>>>       }
>>> @@ -1340,6 +1344,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>>>           error_setg(&local_err, "multifd: received id '%d' already setup'",
>>>                      id);
>>>           multifd_recv_terminate_threads(local_err);
>>> +        error_propagate(errp, local_err);
>>>           return false;
>> Here, we return false with setting an error.
>>
>>>       }
>>>       p->c = ioc;
>>> @@ -1351,7 +1356,8 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>>>       qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p,
>>>                          QEMU_THREAD_JOINABLE);
>>>       atomic_inc(&multifd_recv_state->count);
>>> -    return multifd_recv_state->count == migrate_multifd_channels();
>>> +    return atomic_read(&multifd_recv_state->count) ==
>>> +           migrate_multifd_channels();
>> Here, we return either true or false without setting an error.
> yes.
>> Taken together, there are three cases:
>>
>> 1. Succeed and return true
> Yes, when all multifd channels are correctly received.
>> 2. Succeed and return false
> Yes, when the current multifd channel is received correctly, but
> not all the channels have been received yet.

Aha.

>> 3. Fail (set an error) and return false.
> Yes. And with the propagated error, the code just returns and
> reports the error in migration_channel_process_incoming().
>> Assuming that's what we want: please update the function comment to
>> spell them out.
> Ok, I will update the three cases in the comment to clarify in detail.
>
> Have a nice day, thanks :)

You're welcome!
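
For the function comment, the three cases could be spelled out along
these lines.  This is only a sketch: demo_new_channel() and
demo_classify() are hypothetical stand-ins, and a plain string pointer
replaces the Error object.  The point is that the return value alone
cannot tell case 2 from case 3, so the caller has to look at the error.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    DEMO_ALL_READY,   /* case 1: success, all channels received */
    DEMO_NEED_MORE,   /* case 2: success, more channels expected */
    DEMO_FAILED       /* case 3: failure, error set */
} DemoOutcome;

/*
 * Hypothetical stand-in for a "new channel" function.
 *
 * Returns true when the last expected channel has been received.
 * Returns false without setting *errp when this channel was accepted
 * but more are still expected.
 * Returns false and sets *errp on failure.
 */
static bool demo_new_channel(int id, int *count, int wanted,
                             const char **errp)
{
    if (id < 0 || id >= wanted) {
        *errp = "invalid channel id";
        return false;
    }
    (*count)++;
    return *count == wanted;
}

/* The caller checks the error first, then the return value. */
static DemoOutcome demo_classify(int id, int *count, int wanted)
{
    const char *err = NULL;
    bool all = demo_new_channel(id, count, wanted, &err);

    if (err) {
        return DEMO_FAILED;
    }
    return all ? DEMO_ALL_READY : DEMO_NEED_MORE;
}
```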


* Re: [Qemu-devel] [PATCH for-4.0 v8 6/7] qemu_thread_create: propagate the error to callers to handle
  2018-12-19 12:14         ` Fei Li
@ 2018-12-19 17:29           ` Eric Blake
  2018-12-20  3:20             ` Fei Li
  2018-12-21  9:36           ` Fei Li
  1 sibling, 1 reply; 27+ messages in thread
From: Eric Blake @ 2018-12-19 17:29 UTC (permalink / raw)
  To: Fei Li, Markus Armbruster
  Cc: David Gibson, qemu-devel, Dr . David Alan Gilbert

On 12/19/18 6:14 AM, Fei Li wrote:

>>>>>    28 files changed, 243 insertions(+), 101 deletions(-)

>> I recommend to split this patch.  First part adds the Error ** parameter
>> to qemu_thread_create(), passing &error_abort everywhere.  No functional
>> change.  Subsequent patches then improve on &error_abort.  This way,
>> each improvement patch can be cc'ed to just that part's maintainer(s).
>> Parts you don't want to touch you simply leave at &error_abort.  Makes
>> sense?
> Yes, I think this makes sense, much clearer. :) But I am a little
> worried about whether too many subsequent improvement patches (some of
> them are quite small changes) are acceptable.

A long series of small patches, where each patch is cc'd to an 
appropriate maintainer, will likely get cumulative reviews faster than a 
single monolithic patch where no one person is the expert on every line 
touched.
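
A rough sketch of what such a split could look like, using hypothetical
stand-ins only (demo_error_abort, demo_error_set() and the callers
below are not QEMU's API; they merely imitate an "abort on error"
sentinel).  The first patch would convert every caller the way
demo_legacy_caller() is written; later patches would turn individual
callers into something like demo_converted_caller().

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Sentinel: passing &demo_error_abort means "abort on any error". */
static const char *demo_error_abort;

static void demo_error_set(const char **errp, const char *msg)
{
    if (errp == &demo_error_abort) {
        fprintf(stderr, "Unexpected error: %s\n", msg);
        abort();
    }
    if (errp) {
        *errp = msg;
    }
}

/* Stand-in for thread creation that can be told to fail. */
static bool demo_thread_create(bool simulate_failure, const char **errp)
{
    if (simulate_failure) {
        demo_error_set(errp, "thread creation failed");
        return false;
    }
    return true;
}

/* Step 1: mechanical conversion, behaviour unchanged (abort on error). */
static void demo_legacy_caller(void)
{
    demo_thread_create(false, &demo_error_abort);
}

/* Step 2: a later patch lets this caller pass the error up instead. */
static int demo_converted_caller(bool simulate_failure, const char **errp)
{
    if (!demo_thread_create(simulate_failure, errp)) {
        return -1;
    }
    return 0;
}
```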

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH for-4.0 v8 6/7] qemu_thread_create: propagate the error to callers to handle
  2018-12-19 17:29           ` Eric Blake
@ 2018-12-20  3:20             ` Fei Li
  0 siblings, 0 replies; 27+ messages in thread
From: Fei Li @ 2018-12-20  3:20 UTC (permalink / raw)
  To: Eric Blake, Markus Armbruster
  Cc: David Gibson, qemu-devel, Dr . David Alan Gilbert



On 12/20/2018 01:29 AM, Eric Blake wrote:
> On 12/19/18 6:14 AM, Fei Li wrote:
>
>>>>>>    28 files changed, 243 insertions(+), 101 deletions(-)
>
>>> I recommend to split this patch.  First part adds the Error ** parameter
>>> to qemu_thread_create(), passing &error_abort everywhere. No functional
>>> change.  Subsequent patches then improve on &error_abort. This way,
>>> each improvement patch can be cc'ed to just that part's maintainer(s).
>>> Parts you don't want to touch you simply leave at &error_abort.  Makes
>>> sense?
>> Yes, I think this makes sense, much clearer. :) But I am a little
>> worried about whether too many subsequent improvement patches (some of
>> them are quite small changes) are acceptable.
>
> A long series of small patches, where each patch is cc'd to an 
> appropriate maintainer, will likely get cumulative reviews faster than 
> a single monolithic patch where no one person is the expert on every 
> line touched.
>
Ok, thanks for the advice!

Have a nice day
Fei


* Re: [Qemu-devel] [PATCH for-4.0 v8 3/7] migration: fix the multifd code when receiving less channels
  2018-12-19 14:11       ` Markus Armbruster
@ 2018-12-20  3:27         ` Fei Li
  0 siblings, 0 replies; 27+ messages in thread
From: Fei Li @ 2018-12-20  3:27 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, Dr . David Alan Gilbert



On 12/19/2018 10:11 PM, Markus Armbruster wrote:
> Fei Li <fli@suse.com> writes:
>
>> On 12/13/2018 02:17 PM, Markus Armbruster wrote:
>>> Fei Li <fli@suse.com> writes:
>>>
>>>> In our current code, when multifd is used during migration, if there
>>>> is an error before the destination receives all new channels, the
>>>> source keeps running, however the destination does not exit but keeps
>>>> waiting until the source is killed deliberately.
>>>>
>>>> Fix this by dumping the specific error and let users decide whether
>>>> to quit from the destination side when failing to receive packet via
>>>> some channel.
>>>>
>>>> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>> Signed-off-by: Fei Li <fli@suse.com>
>>>> Reviewed-by: Peter Xu <peterx@redhat.com>
> [...]
>>>> diff --git a/migration/migration.h b/migration/migration.h
>>>> index e413d4d8b6..02b7304610 100644
>>>> --- a/migration/migration.h
>>>> +++ b/migration/migration.h
>>>> @@ -229,7 +229,7 @@ struct MigrationState
>>>>    void migrate_set_state(int *state, int old_state, int new_state);
>>>>      void migration_fd_process_incoming(QEMUFile *f);
>>>> -void migration_ioc_process_incoming(QIOChannel *ioc);
>>>> +void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
>>>>    void migration_incoming_process(void);
>>>>      bool  migration_has_all_channels(void);
>>>> diff --git a/migration/ram.c b/migration/ram.c
>>>> index 7e7deec4d8..c7e3d6b0fd 100644
>>>> --- a/migration/ram.c
>>>> +++ b/migration/ram.c
>>>> @@ -1323,7 +1323,7 @@ bool multifd_recv_all_channels_created(void)
>>>>    }
>>>>      /* Return true if multifd is ready for the migration, otherwise
>>>> false */
>>>> -bool multifd_recv_new_channel(QIOChannel *ioc)
>>>> +bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
>>>>    {
>>>>        MultiFDRecvParams *p;
>>>>        Error *local_err = NULL;
>>>> @@ -1331,6 +1331,10 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>>>>          id = multifd_recv_initial_packet(ioc, &local_err);
>>>>        if (id < 0) {
>>>> +        error_propagate_prepend(errp, local_err,
>>>> +                                "failed to receive packet"
>>>> +                                " via multifd channel %d: ",
>>>> +                                atomic_read(&multifd_recv_state->count));
>>>>            multifd_recv_terminate_threads(local_err);
>>>>            return false;
>>> Here, we return false without setting an error.
>> I am not sure whether I understand correctly, but here I think the above
>> error_propagate_prepend() set the error to errp.
> You're right, I got confused.
>
> However, you shouldn't access @local_err after error_propagate() or
> similar.  Please insert error_propagate_prepend() after
> multifd_recv_terminate_threads(), like you do in the next hunk.
Right, thanks for the reminder.

Have a nice day :)
Fei
>
>>>>        }
>>>> @@ -1340,6 +1344,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>>>>            error_setg(&local_err, "multifd: received id '%d' already setup'",
>>>>                       id);
>>>>            multifd_recv_terminate_threads(local_err);
>>>> +        error_propagate(errp, local_err);
>>>>            return false;
>>> Here, we return false with setting an error.
>>>
>>>>        }
>>>>        p->c = ioc;
>>>> @@ -1351,7 +1356,8 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>>>>        qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p,
>>>>                           QEMU_THREAD_JOINABLE);
>>>>        atomic_inc(&multifd_recv_state->count);
>>>> -    return multifd_recv_state->count == migrate_multifd_channels();
>>>> +    return atomic_read(&multifd_recv_state->count) ==
>>>> +           migrate_multifd_channels();
>>> Here, we return either true or false without setting an error.
>> yes.
>>> Taken together, there are three cases:
>>>
>>> 1. Succeed and return true
>> Yes, when all multifd channels are correctly received.
>>> 2. Succeed and return false
>> Yes, when the current multifd channel is received correctly, but
>> have not received all the channels.
> Aha.
>
>>> 3. Fail (set an error) and return false.
>> Yes. And with the propagated error, the code just returns and
>> report the error in migration_channel_process_incoming().
>>> Assuming that's what we want: please update the function comment to
>>> spell them out.
>> Ok, I will update the three cases in the comment to clarify in detail.
>>
>> Have a nice day, thanks :)
> You're welcome!
>
>


* Re: [Qemu-devel] [PATCH for-4.0 v8 6/7] qemu_thread_create: propagate the error to callers to handle
  2018-12-19 12:14         ` Fei Li
  2018-12-19 17:29           ` Eric Blake
@ 2018-12-21  9:36           ` Fei Li
  2018-12-24  3:34             ` Peter Xu
  1 sibling, 1 reply; 27+ messages in thread
From: Fei Li @ 2018-12-21  9:36 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Juan Quintela, Peter Xu, Dr . David Alan Gilbert, QEMU Developers


On 12/19/2018 08:14 PM, Fei Li wrote:
>
> On 12/19/2018 06:10 PM, Markus Armbruster wrote:
>> Fei Li <fli@suse.com> writes:
>>
>>> On 12/13/2018 03:26 PM, Markus Armbruster wrote:
>>>> There's a question for David Gibson inline.  Please search for /ppc/.
>>>>
>>>> Fei Li <fli@suse.com> writes:
>>>>
>>>>> Make qemu_thread_create() return a Boolean to indicate if it succeeds
>>>>> rather than failing with an error. And add an Error parameter to hold
>>>>> the error message and let the callers handle it.
>>>> The "rather than failing with an error" is misleading. Before the
>>>> patch, we report to stderr and abort().  What about:
>>>>
>>>>       qemu-thread: Make qemu_thread_create() handle errors properly
>>>>
>>>>       qemu_thread_create() abort()s on error.  Not nice. Give it a
>>>>       return value and an Error ** argument, so it can return 
>>>> success /
>>>>       failure.
>>> A nice commit-amend! Thanks!
>>>> Still missing from the commit message then: how you update the 
>>>> callers.
>>> Yes, agree. I think the-how should also be noted here, like
>>> - propagating the err to callers whose call trace already have the
>>> Error parameter;
>>> - just add an &error_abort for qemu_thread_create() and make it a
>>> "TODO: xxx";
>>>> Let's see below.
>>>>
>>>>> Cc: Markus Armbruster <armbru@redhat.com>
>>>>> Cc: Daniel P. Berrangé <berrange@redhat.com>
>>>>> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>>> Signed-off-by: Fei Li <fli@suse.com>
>>>>> ---
>>>>>    cpus.c                      | 45 
>>>>> ++++++++++++++++++++++++-------------
>>>>>    dump.c                      |  6 +++--
>>>>>    hw/misc/edu.c               |  6 +++--
>>>>>    hw/ppc/spapr_hcall.c        | 10 +++++++--
>>>>>    hw/rdma/rdma_backend.c      |  4 +++-
>>>>>    hw/usb/ccid-card-emulated.c | 16 ++++++++++----
>>>>>    include/qemu/thread.h       |  4 ++--
>>>>>    io/task.c                   |  3 ++-
>>>>>    iothread.c                  | 16 +++++++++-----
>>>>>    migration/migration.c       | 54 
>>>>> +++++++++++++++++++++++++++++----------------
>>>>>    migration/postcopy-ram.c    | 14 ++++++++++--
>>>>>    migration/ram.c             | 40 ++++++++++++++++++++++++---------
>>>>>    migration/savevm.c          | 11 ++++++---
>>>>>    tests/atomic_add-bench.c    |  3 ++-
>>>>>    tests/iothread.c            |  2 +-
>>>>>    tests/qht-bench.c           |  3 ++-
>>>>>    tests/rcutorture.c          |  3 ++-
>>>>>    tests/test-aio.c            |  2 +-
>>>>>    tests/test-rcu-list.c       |  3 ++-
>>>>>    ui/vnc-jobs.c               | 17 +++++++++-----
>>>>>    ui/vnc-jobs.h               |  2 +-
>>>>>    ui/vnc.c                    |  4 +++-
>>>>>    util/compatfd.c             | 12 ++++++++--
>>>>>    util/oslib-posix.c          | 17 ++++++++++----
>>>>>    util/qemu-thread-posix.c    | 24 +++++++++++++-------
>>>>>    util/qemu-thread-win32.c    | 16 ++++++++++----
>>>>>    util/rcu.c                  |  3 ++-
>>>>>    util/thread-pool.c          |  4 +++-
>>>>>    28 files changed, 243 insertions(+), 101 deletions(-)
>>>>>
...snip, and only leave the three uncertain small topics...
>>>
>>>>> diff --git a/migration/ram.c b/migration/ram.c
>>>>> index 658dfa88a3..6e0cccf066 100644
>>>>> --- a/migration/ram.c
>>>>> +++ b/migration/ram.c
>>>>> @@ -473,6 +473,7 @@ static void compress_threads_save_cleanup(void)
>>>>>    static int compress_threads_save_setup(void)
>>>>>    {
>>>>>        int i, thread_count;
>>>>> +    Error *local_err = NULL;
>>>>>          if (!migrate_use_compression()) {
>>>>>            return 0;
>>>>> @@ -502,9 +503,12 @@ static int compress_threads_save_setup(void)
>>>>>            comp_param[i].quit = false;
>>>>>            qemu_mutex_init(&comp_param[i].mutex);
>>>>>            qemu_cond_init(&comp_param[i].cond);
>>>>> -        qemu_thread_create(compress_threads + i, "compress",
>>>>> -                           do_data_compress, comp_param + i,
>>>>> -                           QEMU_THREAD_JOINABLE);
>>>>> +        if (!qemu_thread_create(compress_threads + i, "compress",
>>>>> +                                do_data_compress, comp_param + i,
>>>>> +                                QEMU_THREAD_JOINABLE, &local_err)) {
>>>>> +            error_reportf_err(local_err, "failed to create 
>>>>> do_data_compress: ");
>>>>> +            goto exit;
>>>>> +        }
>>>>>        }
>>>>>        return 0;
>>>> Reviewing the migration changes is getting tiresome...
>>> Yes, indeed, the migration involves a lot! Thanks so much for helping
>>> to review!
>>>>    Is reporting the
>>>> error appropriate here, and why?
>>> I think the qemu monitor should display the obvious and exact failing
>>> reason for administrators, esp considering that qemu_thread_create()
>>> itself does not print any message thus we have no idea which direct
>>> function fails if gdb is not enabled.
>>> IOW, I think David's answer to that ppc's error_reportf_err() also
>>> apply here:
>>>
>>> "The error returns are for the guest, the reported errors are for the
>>> guest administrator or management layers."
>> There could well be an issue with the "management layers" part. Should
>> this error be sent to the management layer via QMP somehow? Migration
>> maintainers should be able to assist with this question.
Kindly ping migration maintainers. :)
>
>>>>> diff --git a/util/compatfd.c b/util/compatfd.c
>>>>> index 980bd33e52..886aa249f9 100644
>>>>> --- a/util/compatfd.c
>>>>> +++ b/util/compatfd.c
>>>>> @@ -16,6 +16,7 @@
>>>>>    #include "qemu/osdep.h"
>>>>>    #include "qemu-common.h"
>>>>>    #include "qemu/thread.h"
>>>>> +#include "qapi/error.h"
>>>>>      #include <sys/syscall.h>
>>>>>    @@ -70,6 +71,7 @@ static int qemu_signalfd_compat(const sigset_t
>>>>> *mask)
>>>>>        struct sigfd_compat_info *info;
>>>>>        QemuThread thread;
>>>>>        int fds[2];
>>>>> +    Error *local_err = NULL;
>>>>>          info = malloc(sizeof(*info));
>>>>>        if (info == NULL) {
>>>>> @@ -88,8 +90,14 @@ static int qemu_signalfd_compat(const sigset_t 
>>>>> *mask)
>>>>>        memcpy(&info->mask, mask, sizeof(*mask));
>>>>>        info->fd = fds[1];
>>>>>    -    qemu_thread_create(&thread, "signalfd_compat",
>>>>> sigwait_compat, info,
>>>>> -                       QEMU_THREAD_DETACHED);
>>>>> +    if (!qemu_thread_create(&thread, "signalfd_compat", 
>>>>> sigwait_compat,
>>>>> +                            info, QEMU_THREAD_DETACHED, 
>>>>> &local_err)) {
>>>>> +        error_reportf_err(local_err, "failed to create 
>>>>> sigwait_compat: ");
>>>>> +        close(fds[0]);
>>>>> +        close(fds[1]);
>>>>> +        free(info);
>>>>> +        return -1;
>>>>> +    }
>>>>>          return fds[0];
>>>>>    }
>>>> This function implements signalfd() when the kernel doesn't provide
>>>> it.
>>>>
>>>> signalfd() sets errno on failure.  The replacement's existing failure
>>>> modes set errno.  You add a failure mode that doesn't set errno.  
>>>> That's
>>>> a bug.  To fix it, you can either make qemu_thread_create() set errno,
>>>> or you can make it return a value you can use to set errno. The common
>>>> way to do the latter is returning a *negated* errno value.
>>> Oops, I forgot setting the errno for Linux implementation! My fault..
>>> I will set errno inside qemu_thread_create() as follows:
>>>       err = pthread_attr_init(&attr);
>>>       if (err) {
>>> -        error_setg_errno(errp, -err, "pthread_attr_init failed: %s",
>>> -                         strerror(err));
>>> +        errno = err;
>>> +        error_setg_errno(errp, errno, "pthread_attr_init failed");
>>>           return false;
>>>       }
>> Make sure to set errno on all failures, not just this one.
> Actually, this code update is changed for qemu_thread_create() itself,
> I think if the errno is set in this function, no callers' errno need 
> to be set.
> Please correct me if I understand wrong. :)
>> Also add a function comment.  I suspect returning negated errno would
>> lead to a shorter function comment.
> Actually only one caller needs the errno, namely the above
> qemu_signalfd_compat().
> For the return value, I remember there was once an email thread
> talking about it: returning a bool (and letting the passed errp hold
> the error message) keeps consistency with glib. IMO, returning a bool
> or returning -errno makes no difference to me if the callers only test
> the return value and do not use it again. As for reusing the return
> value, like:
>   ret = qemu_thread_create(xx, xx, &local_err);
> I do not think it is much needed. What do you think?
One place needs to be confirmed. :)
>>   Yet another reason to write
>> function comments!  Making myself document the mess I made has made me
>> clean it up before I submit it many times :)
> Ok, thanks for the experience. Will add the comment. :)
>>
>>>>> diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
>>>>> index 865e476df5..81b40a1ece 100644
>>>>> --- a/util/qemu-thread-posix.c
>>>>> +++ b/util/qemu-thread-posix.c
>>>>> @@ -15,6 +15,7 @@
>>>>>    #include "qemu/atomic.h"
>>>>>    #include "qemu/notify.h"
>>>>>    #include "qemu-thread-common.h"
>>>>> +#include "qapi/error.h"
>>>>>      static bool name_threads;
>>>>>    @@ -500,9 +501,9 @@ static void *qemu_thread_start(void *args)
>>>>>        return r;
>>>>>    }
>>>>>    -void qemu_thread_create(QemuThread *thread, const char *name,
>>>>> -                       void *(*start_routine)(void*),
>>>>> -                       void *arg, int mode)
>>>>> +bool qemu_thread_create(QemuThread *thread, const char *name,
>>>>> +                        void *(*start_routine)(void *),
>>>>> +                        void *arg, int mode, Error **errp)
>>>>>    {
>>>>>        sigset_t set, oldset;
>>>>>        int err;
>>>>> @@ -511,7 +512,9 @@ void qemu_thread_create(QemuThread *thread, 
>>>>> const char *name,
>>>>>          err = pthread_attr_init(&attr);
>>>>>        if (err) {
>>>>> -        error_exit(err, __func__);
>>>>> +        error_setg_errno(errp, -err, "pthread_attr_init failed: %s",
>>>>> +                         strerror(err));
>> -err is actually wrong: pthread_attr_init() returns a *positive* errno
>> code on failure.
> Yes, a definite wrong code.. :( Actually, pthread_attr_init() returns 
> a nonzero error
> number, thus I do the below update by assigning the return err to errno.
>
>      err = pthread_attr_init(&attr);
>      if (err) {
> -        error_exit(err, __func__);
> +        errno = err;
> +        error_setg_errno(errp, errno, "pthread_attr_init failed");
> +        return false;
>      }
>
Another place needs to be confirmed. :)
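
For comparison, the negated-errno alternative mentioned above could look
roughly like this.  It is only a sketch: the demo_*() functions are
hypothetical stand-ins, and demo_pthread_call() merely imitates how
pthread_*() functions report failure (a positive error number as the
return value, with errno left untouched).

```c
#include <errno.h>

/*
 * Stand-in for a pthread_*() call: returns 0 on success or a positive
 * error number on failure, and leaves errno alone.
 */
static int demo_pthread_call(int simulated_rc)
{
    return simulated_rc;
}

/* Hypothetical wrapper: 0 on success, negated errno value on failure. */
static int demo_thread_create(int simulated_rc)
{
    int err = demo_pthread_call(simulated_rc);

    if (err) {
        return -err;
    }
    return 0;
}

/*
 * Stand-in for a signalfd()-style function, whose callers expect
 * -1 with errno set on failure.
 */
static int demo_signalfd_compat(int simulated_rc)
{
    int ret = demo_thread_create(simulated_rc);

    if (ret < 0) {
        errno = -ret;
        return -1;
    }
    return 3; /* placeholder for a file descriptor */
}
```

With this convention only the one caller that needs errno sets it, and
the thread-creation function itself does not have to touch errno.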
>
Have a nice day, thanks
Fei


* Re: [Qemu-devel] [PATCH for-4.0 v8 6/7] qemu_thread_create: propagate the error to callers to handle
  2018-12-21  9:36           ` Fei Li
@ 2018-12-24  3:34             ` Peter Xu
  2018-12-24  6:53               ` Fei Li
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Xu @ 2018-12-24  3:34 UTC (permalink / raw)
  To: Fei Li
  Cc: Markus Armbruster, Juan Quintela, Dr . David Alan Gilbert,
	QEMU Developers

On Fri, Dec 21, 2018 at 05:36:57PM +0800, Fei Li wrote:
> 
> On 12/19/2018 08:14 PM, Fei Li wrote:
> > 
> > On 12/19/2018 06:10 PM, Markus Armbruster wrote:
> > > Fei Li <fli@suse.com> writes:
> > > 
> > > > On 12/13/2018 03:26 PM, Markus Armbruster wrote:
> > > > > There's a question for David Gibson inline.  Please search for /ppc/.
> > > > > 
> > > > > Fei Li <fli@suse.com> writes:
> > > > > 
> > > > > > Make qemu_thread_create() return a Boolean to indicate if it succeeds
> > > > > > rather than failing with an error. And add an Error parameter to hold
> > > > > > the error message and let the callers handle it.
> > > > > The "rather than failing with an error" is misleading. Before the
> > > > > patch, we report to stderr and abort().  What about:
> > > > > 
> > > > >       qemu-thread: Make qemu_thread_create() handle errors properly
> > > > > 
> > > > >       qemu_thread_create() abort()s on error.  Not nice. Give it a
> > > > >       return value and an Error ** argument, so it can
> > > > > return success /
> > > > >       failure.
> > > > A nice commit-amend! Thanks!
> > > > > Still missing from the commit message then: how you update
> > > > > the callers.
> > > > Yes, agree. I think the-how should also be noted here, like
> > > > - propagating the err to callers whose call trace already have the
> > > > Error parameter;
> > > > - just add an &error_abort for qemu_thread_create() and make it a
> > > > "TODO: xxx";
> > > > > Let's see below.
> > > > > 
> > > > > > Cc: Markus Armbruster <armbru@redhat.com>
> > > > > > Cc: Daniel P. Berrangé <berrange@redhat.com>
> > > > > > Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > > > Signed-off-by: Fei Li <fli@suse.com>
> > > > > > ---
> > > > > >    cpus.c                      | 45
> > > > > > ++++++++++++++++++++++++-------------
> > > > > >    dump.c                      |  6 +++--
> > > > > >    hw/misc/edu.c               |  6 +++--
> > > > > >    hw/ppc/spapr_hcall.c        | 10 +++++++--
> > > > > >    hw/rdma/rdma_backend.c      |  4 +++-
> > > > > >    hw/usb/ccid-card-emulated.c | 16 ++++++++++----
> > > > > >    include/qemu/thread.h       |  4 ++--
> > > > > >    io/task.c                   |  3 ++-
> > > > > >    iothread.c                  | 16 +++++++++-----
> > > > > >    migration/migration.c       | 54
> > > > > > +++++++++++++++++++++++++++++----------------
> > > > > >    migration/postcopy-ram.c    | 14 ++++++++++--
> > > > > >    migration/ram.c             | 40 ++++++++++++++++++++++++---------
> > > > > >    migration/savevm.c          | 11 ++++++---
> > > > > >    tests/atomic_add-bench.c    |  3 ++-
> > > > > >    tests/iothread.c            |  2 +-
> > > > > >    tests/qht-bench.c           |  3 ++-
> > > > > >    tests/rcutorture.c          |  3 ++-
> > > > > >    tests/test-aio.c            |  2 +-
> > > > > >    tests/test-rcu-list.c       |  3 ++-
> > > > > >    ui/vnc-jobs.c               | 17 +++++++++-----
> > > > > >    ui/vnc-jobs.h               |  2 +-
> > > > > >    ui/vnc.c                    |  4 +++-
> > > > > >    util/compatfd.c             | 12 ++++++++--
> > > > > >    util/oslib-posix.c          | 17 ++++++++++----
> > > > > >    util/qemu-thread-posix.c    | 24 +++++++++++++-------
> > > > > >    util/qemu-thread-win32.c    | 16 ++++++++++----
> > > > > >    util/rcu.c                  |  3 ++-
> > > > > >    util/thread-pool.c          |  4 +++-
> > > > > >    28 files changed, 243 insertions(+), 101 deletions(-)
> > > > > > 
> ...snip, and only leave the three uncertain small topics...
> > > > 
> > > > > > diff --git a/migration/ram.c b/migration/ram.c
> > > > > > index 658dfa88a3..6e0cccf066 100644
> > > > > > --- a/migration/ram.c
> > > > > > +++ b/migration/ram.c
> > > > > > @@ -473,6 +473,7 @@ static void compress_threads_save_cleanup(void)
> > > > > >    static int compress_threads_save_setup(void)
> > > > > >    {
> > > > > >        int i, thread_count;
> > > > > > +    Error *local_err = NULL;
> > > > > >          if (!migrate_use_compression()) {
> > > > > >            return 0;
> > > > > > @@ -502,9 +503,12 @@ static int compress_threads_save_setup(void)
> > > > > >            comp_param[i].quit = false;
> > > > > >            qemu_mutex_init(&comp_param[i].mutex);
> > > > > >            qemu_cond_init(&comp_param[i].cond);
> > > > > > -        qemu_thread_create(compress_threads + i, "compress",
> > > > > > -                           do_data_compress, comp_param + i,
> > > > > > -                           QEMU_THREAD_JOINABLE);
> > > > > > +        if (!qemu_thread_create(compress_threads + i, "compress",
> > > > > > +                                do_data_compress, comp_param + i,
> > > > > > +                                QEMU_THREAD_JOINABLE, &local_err)) {
> > > > > > +            error_reportf_err(local_err, "failed to
> > > > > > create do_data_compress: ");
> > > > > > +            goto exit;

[1]

> > > > > > +        }
> > > > > >        }
> > > > > >        return 0;
> > > > > Reviewing the migration changes is getting tiresome...
> > > > Yes, indeed, the migration involves a lot! Thanks so much for helping
> > > > to review!
> > > > >    Is reporting the
> > > > > error appropriate here, and why?
> > > > I think the qemu monitor should display the obvious and exact failing
> > > > reason for administrators, esp considering that qemu_thread_create()
> > > > itself does not print any message thus we have no idea which direct
> > > > function fails if gdb is not enabled.
> > > > IOW, I think David's answer to that ppc's error_reportf_err() also
> > > > apply here:
> > > > 
> > > > "The error returns are for the guest, the reported errors are for the
> > > > guest administrator or management layers."
> > > There could well be an issue with the "management layers" part. Should
> > > this error be sent to the management layer via QMP somehow? Migration
> > > maintainers should be able to assist with this question.
> Kindly ping migration maintainers. :)

I think both the maintainers are on holiday so possibly there won't be
any reply from them this week... :)

Regarding error reports of migration via the QMP layer, please have a
look at d59ce6f344 ("migration: add reporting of errors for outgoing
migration", 2016-05-26).  Though I see that even
qemu_savevm_state_setup() is not capturing errors for the management
layer, so if you want to pass this thread creation error upward you'll
possibly need to work on that as well.

Also note that when you "goto exit" at [1] you probably need to touch
up the cleanup path as well, since otherwise the join() could be called
with an invalid thread ID.  You'll possibly need to check that the
thread ID is valid before doing the join() of the compression thread.
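
One possible shape for that check, as a self-contained sketch.  The
DemoWorker type and the demo_*() functions are hypothetical and only
simulate thread creation and joining with flags; the real code would
track validity in whatever way fits the compression thread data.

```c
#include <stdbool.h>

#define DEMO_THREADS 4

typedef struct DemoWorker {
    bool created;   /* set only after creation succeeded */
    bool joined;    /* set once the (simulated) join happened */
} DemoWorker;

/* Simulated thread creation that can be told to fail. */
static bool demo_create(DemoWorker *w, bool fail)
{
    if (fail) {
        return false;
    }
    w->created = true;
    return true;
}

/* Join only the workers that were really created; return how many. */
static int demo_cleanup(DemoWorker *workers, int n)
{
    int joined = 0;

    for (int i = 0; i < n; i++) {
        if (!workers[i].created) {
            continue;   /* never started: nothing valid to join */
        }
        workers[i].joined = true;
        workers[i].created = false;
        joined++;
    }
    return joined;
}

/* Create all workers; on failure clean up the ones already started. */
static int demo_setup(DemoWorker *workers, int n, int fail_at)
{
    for (int i = 0; i < n; i++) {
        if (!demo_create(&workers[i], i == fail_at)) {
            demo_cleanup(workers, n);
            return -1;
        }
    }
    return 0;
}
```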

Regards,

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH for-4.0 v8 6/7] qemu_thread_create: propagate the error to callers to handle
  2018-12-24  3:34             ` Peter Xu
@ 2018-12-24  6:53               ` Fei Li
  2018-12-25 12:18                 ` Fei Li
  0 siblings, 1 reply; 27+ messages in thread
From: Fei Li @ 2018-12-24  6:53 UTC (permalink / raw)
  To: Peter Xu
  Cc: Markus Armbruster, Juan Quintela, Dr . David Alan Gilbert,
	QEMU Developers



On 12/24/2018 11:34 AM, Peter Xu wrote:
> On Fri, Dec 21, 2018 at 05:36:57PM +0800, Fei Li wrote:
>> On 12/19/2018 08:14 PM, Fei Li wrote:
>>> On 12/19/2018 06:10 PM, Markus Armbruster wrote:
>>>> Fei Li <fli@suse.com> writes:
>>>>
>>>>> On 12/13/2018 03:26 PM, Markus Armbruster wrote:
>>>>>> There's a question for David Gibson inline.  Please search for /ppc/.
>>>>>>
>>>>>> Fei Li <fli@suse.com> writes:
>>>>>>
>>>>>>> Make qemu_thread_create() return a Boolean to indicate if it succeeds
>>>>>>> rather than failing with an error. And add an Error parameter to hold
>>>>>>> the error message and let the callers handle it.
>>>>>> The "rather than failing with an error" is misleading. Before the
>>>>>> patch, we report to stderr and abort().  What about:
>>>>>>
>>>>>>        qemu-thread: Make qemu_thread_create() handle errors properly
>>>>>>
>>>>>>        qemu_thread_create() abort()s on error.  Not nice. Give it a
>>>>>>        return value and an Error ** argument, so it can
>>>>>> return success /
>>>>>>        failure.
>>>>> A nice commit-amend! Thanks!
>>>>>> Still missing from the commit message then: how you update
>>>>>> the callers.
>>>>> Yes, agree. I think the-how should also be noted here, like
>>>>> - propagating the err to callers whose call trace already have the
>>>>> Error parameter;
>>>>> - just add an &error_abort for qemu_thread_create() and make it a
>>>>> "TODO: xxx";
>>>>>> Let's see below.
>>>>>>
>>>>>>> Cc: Markus Armbruster <armbru@redhat.com>
>>>>>>> Cc: Daniel P. Berrangé <berrange@redhat.com>
>>>>>>> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>>>>> Signed-off-by: Fei Li <fli@suse.com>
>>>>>>> ---
>>>>>>>     cpus.c                      | 45
>>>>>>> ++++++++++++++++++++++++-------------
>>>>>>>     dump.c                      |  6 +++--
>>>>>>>     hw/misc/edu.c               |  6 +++--
>>>>>>>     hw/ppc/spapr_hcall.c        | 10 +++++++--
>>>>>>>     hw/rdma/rdma_backend.c      |  4 +++-
>>>>>>>     hw/usb/ccid-card-emulated.c | 16 ++++++++++----
>>>>>>>     include/qemu/thread.h       |  4 ++--
>>>>>>>     io/task.c                   |  3 ++-
>>>>>>>     iothread.c                  | 16 +++++++++-----
>>>>>>>     migration/migration.c       | 54
>>>>>>> +++++++++++++++++++++++++++++----------------
>>>>>>>     migration/postcopy-ram.c    | 14 ++++++++++--
>>>>>>>     migration/ram.c             | 40 ++++++++++++++++++++++++---------
>>>>>>>     migration/savevm.c          | 11 ++++++---
>>>>>>>     tests/atomic_add-bench.c    |  3 ++-
>>>>>>>     tests/iothread.c            |  2 +-
>>>>>>>     tests/qht-bench.c           |  3 ++-
>>>>>>>     tests/rcutorture.c          |  3 ++-
>>>>>>>     tests/test-aio.c            |  2 +-
>>>>>>>     tests/test-rcu-list.c       |  3 ++-
>>>>>>>     ui/vnc-jobs.c               | 17 +++++++++-----
>>>>>>>     ui/vnc-jobs.h               |  2 +-
>>>>>>>     ui/vnc.c                    |  4 +++-
>>>>>>>     util/compatfd.c             | 12 ++++++++--
>>>>>>>     util/oslib-posix.c          | 17 ++++++++++----
>>>>>>>     util/qemu-thread-posix.c    | 24 +++++++++++++-------
>>>>>>>     util/qemu-thread-win32.c    | 16 ++++++++++----
>>>>>>>     util/rcu.c                  |  3 ++-
>>>>>>>     util/thread-pool.c          |  4 +++-
>>>>>>>     28 files changed, 243 insertions(+), 101 deletions(-)
>>>>>>>
>> ...snip, and only leave the three uncertain small topics...
>>>>>>> diff --git a/migration/ram.c b/migration/ram.c
>>>>>>> index 658dfa88a3..6e0cccf066 100644
>>>>>>> --- a/migration/ram.c
>>>>>>> +++ b/migration/ram.c
>>>>>>> @@ -473,6 +473,7 @@ static void compress_threads_save_cleanup(void)
>>>>>>>     static int compress_threads_save_setup(void)
>>>>>>>     {
>>>>>>>         int i, thread_count;
>>>>>>> +    Error *local_err = NULL;
>>>>>>>           if (!migrate_use_compression()) {
>>>>>>>             return 0;
>>>>>>> @@ -502,9 +503,12 @@ static int compress_threads_save_setup(void)
>>>>>>>             comp_param[i].quit = false;
>>>>>>>             qemu_mutex_init(&comp_param[i].mutex);
>>>>>>>             qemu_cond_init(&comp_param[i].cond);
>>>>>>> -        qemu_thread_create(compress_threads + i, "compress",
>>>>>>> -                           do_data_compress, comp_param + i,
>>>>>>> -                           QEMU_THREAD_JOINABLE);
>>>>>>> +        if (!qemu_thread_create(compress_threads + i, "compress",
>>>>>>> +                                do_data_compress, comp_param + i,
>>>>>>> +                                QEMU_THREAD_JOINABLE, &local_err)) {
>>>>>>> +            error_reportf_err(local_err, "failed to
>>>>>>> create do_data_compress: ");
>>>>>>> +            goto exit;
> [1]
>
>>>>>>> +        }
>>>>>>>         }
>>>>>>>         return 0;
>>>>>> Reviewing the migration changes is getting tiresome...
>>>>> Yes, indeed, the migration involves a lot! Thanks so much for helping
>>>>> to review!
>>>>>>     Is reporting the
>>>>>> error appropriate here, and why?
>>>>> I think the qemu monitor should display the obvious and exact failing
>>>>> reason for administrators, esp considering that qemu_thread_create()
>>>>> itself does not print any message thus we have no idea which direct
>>>>> function fails if gdb is not enabled.
>>>>> IOW, I think David's answer to that ppc's error_reportf_err() also
>>>>> apply here:
>>>>>
>>>>> "The error returns are for the guest, the reported errors are for the
>>>>> guest administrator or management layers."
>>>> There could well be an issue with the "management layers" part. Should
>>>> this error be sent to the management layer via QMP somehow? Migration
>>>> maintainers should be able to assist with this question.
>> Kindly ping migration maintainers. :)
> I think both the maintainers are on holiday so possibly there won't be
> any reply from them this week... :)
>
> Regarding to error reports of migration via QMP layer, please have a
> look at d59ce6f344 ("migration: add reporting of errors for outgoing
> migration", 2016-05-26).  Though I see that even
> qemu_savevm_state_setup() is not capturing error for the management
> layer so if you want to pass this thread creation error upward you'll
> possibly need to work on that as well.
Thanks for the useful commit. :) I guess "the client app" mentioned there
is not qemu itself but something above it, maybe something inside
OpenStack? I ask because I can see the error message (I mean the above
error_reportf_err(...)) printed to the screen when I run the migration
from the qemu command line via HMP.

As for qemu_savevm_state_setup(), I see that it sets f->last_error
(instead of s->error); migration_detect_error(s) then uses it to decide
whether to stop the migration once we are back in migration_thread().
And regardless of whether qemu_savevm_state_setup() succeeds, the
current code still sets the migration state to ACTIVE. Emm, I am
wondering whether this is on purpose..
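
To make the question concrete, here is a minimal sketch of the alternative
shape: the setup step returns success/failure plus an error object, and the
caller records the message and does not enter the active state when setup
fails. This is not QEMU code. SketchError, SketchState, sketch_setup() and
sketch_start() are hypothetical names invented for the illustration, and
the sketch does not answer whether the current behaviour is on purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an error object; NOT QEMU's Error type. */
typedef struct {
    char msg[128];
} SketchError;

/* Store a message in *errp if the caller asked for one. */
static void sketch_error_set(SketchError **errp, const char *msg)
{
    if (!errp) {
        return;
    }
    *errp = malloc(sizeof(**errp));
    if (*errp) {
        snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
    }
}

/* Hypothetical setup step: returns false and fills *errp on failure. */
static bool sketch_setup(bool simulate_failure, SketchError **errp)
{
    if (simulate_failure) {
        sketch_error_set(errp, "failed to create compress thread");
        return false;
    }
    return true;
}

/* Hypothetical state: last_error is what a status query could expose. */
typedef struct {
    bool active;
    char last_error[128];
} SketchState;

/* Enter the active state only if setup succeeded; otherwise keep the error. */
static bool sketch_start(SketchState *s, bool simulate_failure)
{
    SketchError *err = NULL;

    assert(s != NULL);
    s->active = false;
    s->last_error[0] = '\0';
    if (!sketch_setup(simulate_failure, &err)) {
        if (err) {
            snprintf(s->last_error, sizeof(s->last_error), "%s", err->msg);
            free(err);
        }
        return false;
    }
    s->active = true;
    return true;
}
```

With this shape the message is stored in the state object instead of only
being printed, so a management layer could read it from there.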
> Though here note that when you "goto exit" at [1] you probably also
> need to touch up the cleanup part since otherwise the join() could be
> with an invalid thread ID, so you'll possibly need to check the thread
> ID validity before do the join() of the compression thread.
Thanks for pointing this out. I think my last patch (7/7) fixes this
problem by adding a check in qemu_thread_join():
+    if (!thread->thread) {
+        return NULL;
+    }
Please correct me if this is not the proper solution. :)
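
For comparison, here is a minimal sketch of another way to guard the join,
assuming plain POSIX threads rather than QEMU's wrappers. POSIX treats
pthread_t as an opaque type, so testing a handle against zero is not
guaranteed to be portable; the sketch keeps a separate flag instead. Worker,
workers_setup(), workers_cleanup() and the fail_at parameter are hypothetical
names used only for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrapper: 'created' records that pthread_create() succeeded. */
typedef struct {
    pthread_t handle;
    bool created;
} Worker;

static void *worker_fn(void *arg)
{
    return arg;   /* does no real work in this sketch */
}

/*
 * Start up to n workers.  fail_at simulates a creation failure at that
 * index (pass -1 for no simulated failure).  Returns false on failure and
 * leaves already-created workers for the caller to clean up.
 */
static bool workers_setup(Worker *w, int n, int fail_at)
{
    for (int i = 0; i < n; i++) {
        w[i].created = false;
    }
    for (int i = 0; i < n; i++) {
        if (i == fail_at ||
            pthread_create(&w[i].handle, NULL, worker_fn, NULL) != 0) {
            return false;
        }
        w[i].created = true;
    }
    return true;
}

/* Join only the workers that were really created; return how many. */
static int workers_cleanup(Worker *w, int n)
{
    int joined = 0;

    for (int i = 0; i < n; i++) {
        if (!w[i].created) {
            continue;   /* never started: the handle is not valid to join */
        }
        int rc = pthread_join(w[i].handle, NULL);
        assert(rc == 0);
        (void)rc;
        w[i].created = false;
        joined++;
    }
    return joined;
}
```

Calling workers_cleanup() after a partial workers_setup() joins only the
threads that exist, and calling it a second time is a no-op.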

Have a nice day, thanks :)
Fei
>
> Regards,
>


* Re: [Qemu-devel] [PATCH for-4.0 v8 6/7] qemu_thread_create: propagate the error to callers to handle
  2018-12-24  6:53               ` Fei Li
@ 2018-12-25 12:18                 ` Fei Li
  0 siblings, 0 replies; 27+ messages in thread
From: Fei Li @ 2018-12-25 12:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: QEMU Developers, Markus Armbruster, Dr . David Alan Gilbert,
	Juan Quintela

Hi all,

As I am leaving my current company and most reviewers are on holiday,
I'd like to send a new version now:
v9: "qemu_thread: Make qemu_thread_create() handle errors properly",
although some details are still open, like whether it is appropriate to
report the error so that the management layer can see it. I will use my
new personal email address (shirley17fei@gmail.com) to follow the new
version. :)

Merry Christmas, and have a nice day, thanks all!
Fei

On 12/24/2018 02:53 PM, Fei Li wrote:
>
>
> On 12/24/2018 11:34 AM, Peter Xu wrote:
>> On Fri, Dec 21, 2018 at 05:36:57PM +0800, Fei Li wrote:
>>> On 12/19/2018 08:14 PM, Fei Li wrote:
>>>> On 12/19/2018 06:10 PM, Markus Armbruster wrote:
>>>>> Fei Li <fli@suse.com> writes:
>>>>>
>>>>>> On 12/13/2018 03:26 PM, Markus Armbruster wrote:
>>>>>>> There's a question for David Gibson inline.  Please search for /ppc/.
>>>>>>>
>>>>>>> Fei Li <fli@suse.com> writes:
>>>>>>>
>>>>>>>> Make qemu_thread_create() return a Boolean to indicate if it succeeds
>>>>>>>> rather than failing with an error. And add an Error parameter to hold
>>>>>>>> the error message and let the callers handle it.
>>>>>>> The "rather than failing with an error" is misleading. Before the
>>>>>>> patch, we report to stderr and abort().  What about:
>>>>>>>
>>>>>>>        qemu-thread: Make qemu_thread_create() handle errors properly
>>>>>>>
>>>>>>>        qemu_thread_create() abort()s on error.  Not nice. Give it a
>>>>>>>        return value and an Error ** argument, so it can return
>>>>>>>        success / failure.
>>>>>> A nice commit-amend! Thanks!
>>>>>>> Still missing from the commit message then: how you update
>>>>>>> the callers.
>>>>>> Yes, agree. I think the-how should also be noted here, like
>>>>>> - propagating the err to callers whose call trace already have the
>>>>>> Error parameter;
>>>>>> - just add an &error_abort for qemu_thread_create() and make it a
>>>>>> "TODO: xxx";
>>>>>>> Let's see below.
>>>>>>>
>>>>>>>> Cc: Markus Armbruster <armbru@redhat.com>
>>>>>>>> Cc: Daniel P. Berrangé <berrange@redhat.com>
>>>>>>>> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>>>>>> Signed-off-by: Fei Li <fli@suse.com>
>>>>>>>> ---
>>>>>>>>     cpus.c                      | 45 ++++++++++++++++++++++++-------------
>>>>>>>>     dump.c                      |  6 +++--
>>>>>>>>     hw/misc/edu.c               |  6 +++--
>>>>>>>>     hw/ppc/spapr_hcall.c        | 10 +++++++--
>>>>>>>>     hw/rdma/rdma_backend.c      |  4 +++-
>>>>>>>>     hw/usb/ccid-card-emulated.c | 16 ++++++++++----
>>>>>>>>     include/qemu/thread.h       |  4 ++--
>>>>>>>>     io/task.c                   |  3 ++-
>>>>>>>>     iothread.c                  | 16 +++++++++-----
>>>>>>>>     migration/migration.c       | 54 +++++++++++++++++++++++++++++----------------
>>>>>>>>     migration/postcopy-ram.c    | 14 ++++++++++--
>>>>>>>>     migration/ram.c             | 40 ++++++++++++++++++++++++---------
>>>>>>>>     migration/savevm.c          | 11 ++++++---
>>>>>>>>     tests/atomic_add-bench.c    |  3 ++-
>>>>>>>>     tests/iothread.c            |  2 +-
>>>>>>>>     tests/qht-bench.c           |  3 ++-
>>>>>>>>     tests/rcutorture.c          |  3 ++-
>>>>>>>>     tests/test-aio.c            |  2 +-
>>>>>>>>     tests/test-rcu-list.c       |  3 ++-
>>>>>>>>     ui/vnc-jobs.c               | 17 +++++++++-----
>>>>>>>>     ui/vnc-jobs.h               |  2 +-
>>>>>>>>     ui/vnc.c                    |  4 +++-
>>>>>>>>     util/compatfd.c             | 12 ++++++++--
>>>>>>>>     util/oslib-posix.c          | 17 ++++++++++----
>>>>>>>>     util/qemu-thread-posix.c    | 24 +++++++++++++-------
>>>>>>>>     util/qemu-thread-win32.c    | 16 ++++++++++----
>>>>>>>>     util/rcu.c                  |  3 ++-
>>>>>>>>     util/thread-pool.c          |  4 +++-
>>>>>>>>     28 files changed, 243 insertions(+), 101 deletions(-)
>>>>>>>>
>>> ...snip, and only leave the three uncertain small topics...
>>>>>>>> diff --git a/migration/ram.c b/migration/ram.c
>>>>>>>> index 658dfa88a3..6e0cccf066 100644
>>>>>>>> --- a/migration/ram.c
>>>>>>>> +++ b/migration/ram.c
>>>>>>>> @@ -473,6 +473,7 @@ static void compress_threads_save_cleanup(void)
>>>>>>>>     static int compress_threads_save_setup(void)
>>>>>>>>     {
>>>>>>>>         int i, thread_count;
>>>>>>>> +    Error *local_err = NULL;
>>>>>>>>           if (!migrate_use_compression()) {
>>>>>>>>             return 0;
>>>>>>>> @@ -502,9 +503,12 @@ static int compress_threads_save_setup(void)
>>>>>>>>             comp_param[i].quit = false;
>>>>>>>>             qemu_mutex_init(&comp_param[i].mutex);
>>>>>>>>             qemu_cond_init(&comp_param[i].cond);
>>>>>>>> -        qemu_thread_create(compress_threads + i, "compress",
>>>>>>>> -                           do_data_compress, comp_param + i,
>>>>>>>> -                           QEMU_THREAD_JOINABLE);
>>>>>>>> +        if (!qemu_thread_create(compress_threads + i, "compress",
>>>>>>>> +                                do_data_compress, comp_param + i,
>>>>>>>> +                                QEMU_THREAD_JOINABLE, &local_err)) {
>>>>>>>> +            error_reportf_err(local_err, "failed to create do_data_compress: ");
>>>>>>>> +            goto exit;
>> [1]
>>
>>>>>>>> +        }
>>>>>>>>         }
>>>>>>>>         return 0;
>>>>>>> Reviewing the migration changes is getting tiresome...
>>>>>> Yes, indeed, the migration involves a lot! Thanks so much for helping
>>>>>> to review!
>>>>>>>     Is reporting the
>>>>>>> error appropriate here, and why?
>>>>>> I think the qemu monitor should display the obvious and exact failing
>>>>>> reason for administrators, esp considering that qemu_thread_create()
>>>>>> itself does not print any message thus we have no idea which direct
>>>>>> function fails if gdb is not enabled.
>>>>>> IOW, I think David's answer to that ppc's error_reportf_err() also
>>>>>> apply here:
>>>>>>
>>>>>> "The error returns are for the guest, the reported errors are for the
>>>>>> guest administrator or management layers."
>>>>> There could well be an issue with the "management layers" part. Should
>>>>> this error be sent to the management layer via QMP somehow? Migration
>>>>> maintainers should be able to assist with this question.
>>> Kindly ping migration maintainers. :)
>> I think both the maintainers are on holiday so possibly there won't be
>> any reply from them this week... :)
>>
>> Regarding to error reports of migration via QMP layer, please have a
>> look at d59ce6f344 ("migration: add reporting of errors for outgoing
>> migration", 2016-05-26).  Though I see that even
>> qemu_savevm_state_setup() is not capturing error for the management
>> layer so if you want to pass this thread creation error upward you'll
>> possibly need to work on that as well.
> Thanks for the useful commit. :) I guess the "the client app" mentioned
> is not qemu, but other upper thing, maybe something inside openstack?
> As I have to say that I can see the error message (I mean the above
> error_reportf_err(...) ) be printed to the screen when I use qemu
> command line via hmp to do the migration.
>
> For the qemu_savevm_state_setup(), I see it sets the f->last_error
> (instead of s->error) to indicate whether to stop the migration or not
> when back to migration_thread() in migration_detect_error(s). And no
> matter whether qemu_savevm_state_setup() succeeds, the current code
> continues to set the migration state to be ACTIVE. Emm, I am wondering
> whether this is on purpose..
>> Though here note that when you "goto exit" at [1] you probably also
>> need to touch up the cleanup part since otherwise the join() could be
>> with an invalid thread ID, so you'll possibly need to check the thread
>> ID validity before do the join() of the compression thread.
> Thanks for pointing this out. I think my last patch is to fix this
> problem, that is to add a check in qemu_thread_join():
> +    if (!thread->thread) {
> +        return NULL;
> +    }
> Correct me if this is not the proper solution. :)
>
> Have a nice day, thanks :)
> Fei
>>
>> Regards,
>>
>
>
>
>


* Re: [Qemu-devel] [PATCH for-4.0 v8 6/7] qemu_thread_create: propagate the error to callers to handle
  2018-12-19  9:29       ` Markus Armbruster
@ 2019-01-02  2:29         ` David Gibson
  0 siblings, 0 replies; 27+ messages in thread
From: David Gibson @ 2019-01-02  2:29 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Fei Li, qemu-devel, Dr . David Alan Gilbert

On Wed, 19 Dec 2018 10:29:41 +0100
Markus Armbruster <armbru@redhat.com> wrote:

> David Gibson <dgibson@redhat.com> writes:
> 
>  [...]
> 
> Double-checking: is it okay to report some failures of this function
> (one of two H_RESOURCE failures, to be precise), but not others?

Yes.  The distinction is whether the failure is likely to be of
relevance to the *host* administrator, or just to the guest.

Although.. come to think of it possibly that reported failure should be
H_HARDWARE to the guest, rather than H_RESOURCE.

-- 
David Gibson <dgibson@redhat.com>
Principal Software Engineer, Virtualization, Red Hat


Thread overview: 27+ messages
2018-12-11  9:50 [Qemu-devel] [PATCH for-4.0 v8 0/7] qemu_thread_create: propagate errors to callers to check Fei Li
2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 1/7] Fix segmentation fault when qemu_signal_init fails Fei Li
2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 2/7] qemu_init_vcpu: add a new Error parameter to propagate Fei Li
2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 3/7] migration: fix the multifd code when receiving less channels Fei Li
2018-12-13  6:17   ` Markus Armbruster
2018-12-17 11:45     ` Fei Li
2018-12-19 14:11       ` Markus Armbruster
2018-12-20  3:27         ` Fei Li
2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 4/7] migration: remove unused &local_err parameter in multifd_save_cleanup Fei Li
2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 5/7] migration: add more error handling for postcopy_ram_enable_notify Fei Li
2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 6/7] qemu_thread_create: propagate the error to callers to handle Fei Li
2018-12-13  7:26   ` Markus Armbruster
2018-12-14  0:24     ` David Gibson
2018-12-19  9:29       ` Markus Armbruster
2019-01-02  2:29         ` David Gibson
2018-12-17  7:29     ` Fei Li
2018-12-18 12:40       ` Fei Li
2018-12-19 10:11         ` Markus Armbruster
2018-12-19 10:10       ` Markus Armbruster
2018-12-19 12:14         ` Fei Li
2018-12-19 17:29           ` Eric Blake
2018-12-20  3:20             ` Fei Li
2018-12-21  9:36           ` Fei Li
2018-12-24  3:34             ` Peter Xu
2018-12-24  6:53               ` Fei Li
2018-12-25 12:18                 ` Fei Li
2018-12-11  9:50 ` [Qemu-devel] [PATCH for-4.0 v8 7/7] qemu_thread_join: fix segmentation fault Fei Li
