* [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state
@ 2016-09-01 10:20 Paolo Bonzini
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 01/12] cpus: pass CPUState to run_on_cpu helpers Paolo Bonzini
                   ` (12 more replies)
  0 siblings, 13 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-09-01 10:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Sergey Fedorov, Sergey Fedorov, Emilio G. Cota

This is not the code I promised at the beginning of the week.
It's better, and not only in that this one works. :)  Instead of
reinventing the wheel and then having linux-user's
start_exclusive/end_exclusive use the new wheel, the linux-user/ code is
moved to cpus-common.c and reused as the synchronization mechanism behind
async_safe_run_on_cpu.

The code works and actually satisfies our needs very well.  The only
disadvantage is that safe work items run with a mutex held; however, the
mutex fits decently into QEMU's lock hierarchy, sitting between the BQL
and tb_lock.

For performance, the last patch changes cpu_exec_start/end to avoid
condition variables on the fast path.  (There are still two memory
barriers; if desired, they could be merged with the ones in
rcu_read_lock/rcu_read_unlock.)  I am including a formal model of the
algorithm; together with the new documentation in include/qom/cpu.h, it
accounts for most of the added lines of code.  Still, the model is
completely optional.
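
To make the intended use concrete, here is a minimal sketch of the call
pattern for the new API; the helper name and body are illustrative, not
code taken from the series:

  /* A work item using the run_on_cpu_func signature from patch 1.  */
  static void do_safe_work(CPUState *cpu, void *data)
  {
      /* By the time this runs, all other vCPUs have left their execution
       * loop, so data shared by every vCPU can be modified safely.  */
  }

  /* In a vCPU thread: queue the item and keep going; it runs once the
   * other vCPUs have reached a quiescent state.  */
  async_safe_run_on_cpu(cpu, do_safe_work, NULL);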

Paolo

Alex Bennée (1):
  cpus: pass CPUState to run_on_cpu helpers

Paolo Bonzini (5):
  cpus-common: move CPU list management to common code
  cpus-common: move exclusive work infrastructure from linux-user
  cpus-common: always defer async_run_on_cpu work items
  cpus-common: Introduce async_safe_run_on_cpu()
  cpus-common: lock-free fast path for cpu_exec_start/end

Sergey Fedorov (6):
  cpus: Move common code out of {async_, }run_on_cpu()
  cpus: Rename flush_queued_work()
  linux-user: Use QemuMutex and QemuCond
  linux-user: Add qemu_cpu_is_self() and qemu_cpu_kick()
  cpus-common: move CPU work item management to common code
  tcg: Make tb_flush() thread safe

 Makefile.target            |   2 +-
 bsd-user/main.c            |  30 +----
 cpu-exec.c                 |  12 +-
 cpus-common.c              | 314 +++++++++++++++++++++++++++++++++++++++++++++
 cpus.c                     |  99 +-------------
 docs/tcg-exclusive.promela | 224 ++++++++++++++++++++++++++++++++
 exec.c                     |  30 +----
 hw/i386/kvm/apic.c         |   3 +-
 hw/i386/kvmvapic.c         |   6 +-
 hw/ppc/ppce500_spin.c      |  31 ++---
 hw/ppc/spapr.c             |   6 +-
 hw/ppc/spapr_hcall.c       |  17 +--
 include/exec/cpu-all.h     |   4 +
 include/exec/cpu-common.h  |   2 +
 include/exec/exec-all.h    |  11 --
 include/exec/tb-context.h  |   2 +-
 include/qom/cpu.h          |  95 +++++++++++++-
 kvm-all.c                  |  21 +--
 linux-user/main.c          | 130 ++++++-------------
 target-i386/helper.c       |  19 ++-
 target-i386/kvm.c          |   6 +-
 target-s390x/cpu.c         |   4 +-
 target-s390x/cpu.h         |   7 +-
 target-s390x/kvm.c         |  98 +++++++-------
 target-s390x/misc_helper.c |   4 +-
 translate-all.c            |  38 ++++--
 vl.c                       |   1 +
 27 files changed, 814 insertions(+), 402 deletions(-)
 create mode 100644 cpus-common.c
 create mode 100644 docs/tcg-exclusive.promela

-- 
2.7.4

* [Qemu-devel] [PATCH 01/12] cpus: pass CPUState to run_on_cpu helpers
  2016-09-01 10:20 [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state Paolo Bonzini
@ 2016-09-01 10:20 ` Paolo Bonzini
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 02/12] cpus: Move common code out of {async_, }run_on_cpu() Paolo Bonzini
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-09-01 10:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Sergey Fedorov, Sergey Fedorov, Emilio G. Cota

From: Alex Bennée <alex.bennee@linaro.org>

CPUState is a fairly common pointer to pass to these helpers. This means
that if you need other arguments for the async_run_on_cpu case, you end up
having to g_malloc a structure just to stuff the additional data into the
routine. For the current users this isn't a massive deal, but for MTTCG it
gets cumbersome when the only other parameter is often an address.

This adds the typedef run_on_cpu_func for helper functions, which take an
explicit CPUState * as their first parameter. All users of run_on_cpu and
async_run_on_cpu have had their helpers updated to use the passed-in
CPUState where available.
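
As a minimal before/after illustration of the signature change (do_work_old
and do_work_new are made-up names, not helpers touched by this patch):

  /* Before: the vCPU had to be smuggled through the opaque pointer.  */
  static void do_work_old(void *data)
  {
      CPUState *cpu = data;

      cpu_synchronize_state(cpu);
  }
  /* call site: run_on_cpu(cpu, do_work_old, cpu); */

  /* After: the vCPU arrives as the explicit first argument, leaving the
   * data pointer free for a real payload, or NULL when there is none.  */
  static void do_work_new(CPUState *cpu, void *data)
  {
      cpu_synchronize_state(cpu);
  }
  /* call site: run_on_cpu(cpu, do_work_new, NULL); */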

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
[Sergey Fedorov:
 - eliminate more CPUState in user data;
 - remove unnecessary user data passing;
 - fix target-s390x/kvm.c and target-s390x/misc_helper.c]
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts)
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> (s390 parts)
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <1470158864-17651-3-git-send-email-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 cpus.c                     | 15 ++++---
 hw/i386/kvm/apic.c         |  3 +-
 hw/i386/kvmvapic.c         |  6 +--
 hw/ppc/ppce500_spin.c      | 31 +++++----------
 hw/ppc/spapr.c             |  6 +--
 hw/ppc/spapr_hcall.c       | 17 ++++----
 include/qom/cpu.h          |  8 ++--
 kvm-all.c                  | 21 ++++------
 target-i386/helper.c       | 19 ++++-----
 target-i386/kvm.c          |  6 +--
 target-s390x/cpu.c         |  4 +-
 target-s390x/cpu.h         |  7 +---
 target-s390x/kvm.c         | 98 +++++++++++++++++++++++-----------------------
 target-s390x/misc_helper.c |  4 +-
 14 files changed, 108 insertions(+), 137 deletions(-)

diff --git a/cpus.c b/cpus.c
index 0308431..2508cbf 100644
--- a/cpus.c
+++ b/cpus.c
@@ -556,9 +556,8 @@ static const VMStateDescription vmstate_timers = {
     }
 };
 
-static void cpu_throttle_thread(void *opaque)
+static void cpu_throttle_thread(CPUState *cpu, void *opaque)
 {
-    CPUState *cpu = opaque;
     double pct;
     double throttle_ratio;
     long sleeptime_ns;
@@ -588,7 +587,7 @@ static void cpu_throttle_timer_tick(void *opaque)
     }
     CPU_FOREACH(cpu) {
         if (!atomic_xchg(&cpu->throttle_thread_scheduled, 1)) {
-            async_run_on_cpu(cpu, cpu_throttle_thread, cpu);
+            async_run_on_cpu(cpu, cpu_throttle_thread, NULL);
         }
     }
 
@@ -916,12 +915,12 @@ void qemu_init_cpu_loop(void)
     qemu_thread_get_self(&io_thread);
 }
 
-void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
+void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 {
     struct qemu_work_item wi;
 
     if (qemu_cpu_is_self(cpu)) {
-        func(data);
+        func(cpu, data);
         return;
     }
 
@@ -949,12 +948,12 @@ void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
     }
 }
 
-void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
+void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 {
     struct qemu_work_item *wi;
 
     if (qemu_cpu_is_self(cpu)) {
-        func(data);
+        func(cpu, data);
         return;
     }
 
@@ -1005,7 +1004,7 @@ static void flush_queued_work(CPUState *cpu)
             cpu->queued_work_last = NULL;
         }
         qemu_mutex_unlock(&cpu->work_mutex);
-        wi->func(wi->data);
+        wi->func(cpu, wi->data);
         qemu_mutex_lock(&cpu->work_mutex);
         if (wi->free) {
             g_free(wi);
diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
index 2bd0de8..295b675 100644
--- a/hw/i386/kvm/apic.c
+++ b/hw/i386/kvm/apic.c
@@ -125,10 +125,9 @@ static void kvm_apic_vapic_base_update(APICCommonState *s)
     }
 }
 
-static void do_inject_external_nmi(void *data)
+static void do_inject_external_nmi(CPUState *cpu, void *data)
 {
     APICCommonState *s = data;
-    CPUState *cpu = CPU(s->cpu);
     uint32_t lvt;
     int ret;
 
diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index 3bf1ddd..1bc02fb 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -483,7 +483,7 @@ typedef struct VAPICEnableTPRReporting {
     bool enable;
 } VAPICEnableTPRReporting;
 
-static void vapic_do_enable_tpr_reporting(void *data)
+static void vapic_do_enable_tpr_reporting(CPUState *cpu, void *data)
 {
     VAPICEnableTPRReporting *info = data;
 
@@ -734,10 +734,10 @@ static void vapic_realize(DeviceState *dev, Error **errp)
     nb_option_roms++;
 }
 
-static void do_vapic_enable(void *data)
+static void do_vapic_enable(CPUState *cs, void *data)
 {
     VAPICROMState *s = data;
-    X86CPU *cpu = X86_CPU(first_cpu);
+    X86CPU *cpu = X86_CPU(cs);
 
     static const uint8_t enabled = 1;
     cpu_physical_memory_write(s->vapic_paddr + offsetof(VAPICState, enabled),
diff --git a/hw/ppc/ppce500_spin.c b/hw/ppc/ppce500_spin.c
index 22c584e..8e16f65 100644
--- a/hw/ppc/ppce500_spin.c
+++ b/hw/ppc/ppce500_spin.c
@@ -54,11 +54,6 @@ typedef struct SpinState {
     SpinInfo spin[MAX_CPUS];
 } SpinState;
 
-typedef struct spin_kick {
-    PowerPCCPU *cpu;
-    SpinInfo *spin;
-} SpinKick;
-
 static void spin_reset(void *opaque)
 {
     SpinState *s = opaque;
@@ -89,16 +84,15 @@ static void mmubooke_create_initial_mapping(CPUPPCState *env,
     env->tlb_dirty = true;
 }
 
-static void spin_kick(void *data)
+static void spin_kick(CPUState *cs, void *data)
 {
-    SpinKick *kick = data;
-    CPUState *cpu = CPU(kick->cpu);
-    CPUPPCState *env = &kick->cpu->env;
-    SpinInfo *curspin = kick->spin;
+    PowerPCCPU *cpu = POWERPC_CPU(cs);
+    CPUPPCState *env = &cpu->env;
+    SpinInfo *curspin = data;
     hwaddr map_size = 64 * 1024 * 1024;
     hwaddr map_start;
 
-    cpu_synchronize_state(cpu);
+    cpu_synchronize_state(cs);
     stl_p(&curspin->pir, env->spr[SPR_BOOKE_PIR]);
     env->nip = ldq_p(&curspin->addr) & (map_size - 1);
     env->gpr[3] = ldq_p(&curspin->r3);
@@ -112,10 +106,10 @@ static void spin_kick(void *data)
     map_start = ldq_p(&curspin->addr) & ~(map_size - 1);
     mmubooke_create_initial_mapping(env, 0, map_start, map_size);
 
-    cpu->halted = 0;
-    cpu->exception_index = -1;
-    cpu->stopped = false;
-    qemu_cpu_kick(cpu);
+    cs->halted = 0;
+    cs->exception_index = -1;
+    cs->stopped = false;
+    qemu_cpu_kick(cs);
 }
 
 static void spin_write(void *opaque, hwaddr addr, uint64_t value,
@@ -153,12 +147,7 @@ static void spin_write(void *opaque, hwaddr addr, uint64_t value,
 
     if (!(ldq_p(&curspin->addr) & 1)) {
         /* run CPU */
-        SpinKick kick = {
-            .cpu = POWERPC_CPU(cpu),
-            .spin = curspin,
-        };
-
-        run_on_cpu(cpu, spin_kick, &kick);
+        run_on_cpu(cpu, spin_kick, curspin);
     }
 }
 
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 30d6800..7b28308 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2179,10 +2179,8 @@ static void spapr_machine_finalizefn(Object *obj)
     g_free(spapr->kvm_type);
 }
 
-static void ppc_cpu_do_nmi_on_cpu(void *arg)
+static void ppc_cpu_do_nmi_on_cpu(CPUState *cs, void *arg)
 {
-    CPUState *cs = arg;
-
     cpu_synchronize_state(cs);
     ppc_cpu_do_system_reset(cs);
 }
@@ -2192,7 +2190,7 @@ static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
     CPUState *cs;
 
     CPU_FOREACH(cs) {
-        async_run_on_cpu(cs, ppc_cpu_do_nmi_on_cpu, cs);
+        async_run_on_cpu(cs, ppc_cpu_do_nmi_on_cpu, NULL);
     }
 }
 
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 73af112..e5eca67 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -13,19 +13,18 @@
 #include "kvm_ppc.h"
 
 struct SPRSyncState {
-    CPUState *cs;
     int spr;
     target_ulong value;
     target_ulong mask;
 };
 
-static void do_spr_sync(void *arg)
+static void do_spr_sync(CPUState *cs, void *arg)
 {
     struct SPRSyncState *s = arg;
-    PowerPCCPU *cpu = POWERPC_CPU(s->cs);
+    PowerPCCPU *cpu = POWERPC_CPU(cs);
     CPUPPCState *env = &cpu->env;
 
-    cpu_synchronize_state(s->cs);
+    cpu_synchronize_state(cs);
     env->spr[s->spr] &= ~s->mask;
     env->spr[s->spr] |= s->value;
 }
@@ -34,7 +33,6 @@ static void set_spr(CPUState *cs, int spr, target_ulong value,
                     target_ulong mask)
 {
     struct SPRSyncState s = {
-        .cs = cs,
         .spr = spr,
         .value = value,
         .mask = mask
@@ -907,17 +905,17 @@ static target_ulong cas_get_option_vector(int vector, target_ulong table)
 }
 
 typedef struct {
-    PowerPCCPU *cpu;
     uint32_t cpu_version;
     Error *err;
 } SetCompatState;
 
-static void do_set_compat(void *arg)
+static void do_set_compat(CPUState *cs, void *arg)
 {
+    PowerPCCPU *cpu = POWERPC_CPU(cs);
     SetCompatState *s = arg;
 
-    cpu_synchronize_state(CPU(s->cpu));
-    ppc_set_compat(s->cpu, s->cpu_version, &s->err);
+    cpu_synchronize_state(cs);
+    ppc_set_compat(cpu, s->cpu_version, &s->err);
 }
 
 #define get_compat_level(cpuver) ( \
@@ -1013,7 +1011,6 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
     if (old_cpu_version != cpu_version) {
         CPU_FOREACH(cs) {
             SetCompatState s = {
-                .cpu = POWERPC_CPU(cs),
                 .cpu_version = cpu_version,
                 .err = NULL,
             };
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index ce0c406..4aa9e61 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -232,9 +232,11 @@ struct kvm_run;
 #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
 
 /* work queue */
+typedef void (*run_on_cpu_func)(CPUState *cpu, void *data);
+
 struct qemu_work_item {
     struct qemu_work_item *next;
-    void (*func)(void *data);
+    run_on_cpu_func func;
     void *data;
     int done;
     bool free;
@@ -623,7 +625,7 @@ bool cpu_is_stopped(CPUState *cpu);
  *
  * Schedules the function @func for execution on the vCPU @cpu.
  */
-void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data);
+void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
 
 /**
  * async_run_on_cpu:
@@ -633,7 +635,7 @@ void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data);
  *
  * Schedules the function @func for execution on the vCPU @cpu asynchronously.
  */
-void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data);
+void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
 
 /**
  * qemu_get_cpu:
diff --git a/kvm-all.c b/kvm-all.c
index ebf35b0..d945c2b 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1847,10 +1847,8 @@ void kvm_flush_coalesced_mmio_buffer(void)
     s->coalesced_flush_in_progress = false;
 }
 
-static void do_kvm_cpu_synchronize_state(void *arg)
+static void do_kvm_cpu_synchronize_state(CPUState *cpu, void *arg)
 {
-    CPUState *cpu = arg;
-
     if (!cpu->kvm_vcpu_dirty) {
         kvm_arch_get_registers(cpu);
         cpu->kvm_vcpu_dirty = true;
@@ -1860,34 +1858,30 @@ static void do_kvm_cpu_synchronize_state(void *arg)
 void kvm_cpu_synchronize_state(CPUState *cpu)
 {
     if (!cpu->kvm_vcpu_dirty) {
-        run_on_cpu(cpu, do_kvm_cpu_synchronize_state, cpu);
+        run_on_cpu(cpu, do_kvm_cpu_synchronize_state, NULL);
     }
 }
 
-static void do_kvm_cpu_synchronize_post_reset(void *arg)
+static void do_kvm_cpu_synchronize_post_reset(CPUState *cpu, void *arg)
 {
-    CPUState *cpu = arg;
-
     kvm_arch_put_registers(cpu, KVM_PUT_RESET_STATE);
     cpu->kvm_vcpu_dirty = false;
 }
 
 void kvm_cpu_synchronize_post_reset(CPUState *cpu)
 {
-    run_on_cpu(cpu, do_kvm_cpu_synchronize_post_reset, cpu);
+    run_on_cpu(cpu, do_kvm_cpu_synchronize_post_reset, NULL);
 }
 
-static void do_kvm_cpu_synchronize_post_init(void *arg)
+static void do_kvm_cpu_synchronize_post_init(CPUState *cpu, void *arg)
 {
-    CPUState *cpu = arg;
-
     kvm_arch_put_registers(cpu, KVM_PUT_FULL_STATE);
     cpu->kvm_vcpu_dirty = false;
 }
 
 void kvm_cpu_synchronize_post_init(CPUState *cpu)
 {
-    run_on_cpu(cpu, do_kvm_cpu_synchronize_post_init, cpu);
+    run_on_cpu(cpu, do_kvm_cpu_synchronize_post_init, NULL);
 }
 
 int kvm_cpu_exec(CPUState *cpu)
@@ -2229,7 +2223,7 @@ struct kvm_set_guest_debug_data {
     int err;
 };
 
-static void kvm_invoke_set_guest_debug(void *data)
+static void kvm_invoke_set_guest_debug(CPUState *unused_cpu, void *data)
 {
     struct kvm_set_guest_debug_data *dbg_data = data;
 
@@ -2247,7 +2241,6 @@ int kvm_update_guest_debug(CPUState *cpu, unsigned long reinject_trap)
         data.dbg.control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP;
     }
     kvm_arch_update_guest_debug(cpu, &data.dbg);
-    data.cpu = cpu;
 
     run_on_cpu(cpu, kvm_invoke_set_guest_debug, &data);
     return data.err;
diff --git a/target-i386/helper.c b/target-i386/helper.c
index 1c250b8..9bc961b 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1113,7 +1113,6 @@ out:
 
 typedef struct MCEInjectionParams {
     Monitor *mon;
-    X86CPU *cpu;
     int bank;
     uint64_t status;
     uint64_t mcg_status;
@@ -1122,14 +1121,14 @@ typedef struct MCEInjectionParams {
     int flags;
 } MCEInjectionParams;
 
-static void do_inject_x86_mce(void *data)
+static void do_inject_x86_mce(CPUState *cs, void *data)
 {
     MCEInjectionParams *params = data;
-    CPUX86State *cenv = &params->cpu->env;
-    CPUState *cpu = CPU(params->cpu);
+    X86CPU *cpu = X86_CPU(cs);
+    CPUX86State *cenv = &cpu->env;
     uint64_t *banks = cenv->mce_banks + 4 * params->bank;
 
-    cpu_synchronize_state(cpu);
+    cpu_synchronize_state(cs);
 
     /*
      * If there is an MCE exception being processed, ignore this SRAO MCE
@@ -1149,7 +1148,7 @@ static void do_inject_x86_mce(void *data)
         if ((cenv->mcg_cap & MCG_CTL_P) && cenv->mcg_ctl != ~(uint64_t)0) {
             monitor_printf(params->mon,
                            "CPU %d: Uncorrected error reporting disabled\n",
-                           cpu->cpu_index);
+                           cs->cpu_index);
             return;
         }
 
@@ -1161,7 +1160,7 @@ static void do_inject_x86_mce(void *data)
             monitor_printf(params->mon,
                            "CPU %d: Uncorrected error reporting disabled for"
                            " bank %d\n",
-                           cpu->cpu_index, params->bank);
+                           cs->cpu_index, params->bank);
             return;
         }
 
@@ -1170,7 +1169,7 @@ static void do_inject_x86_mce(void *data)
             monitor_printf(params->mon,
                            "CPU %d: Previous MCE still in progress, raising"
                            " triple fault\n",
-                           cpu->cpu_index);
+                           cs->cpu_index);
             qemu_log_mask(CPU_LOG_RESET, "Triple fault\n");
             qemu_system_reset_request();
             return;
@@ -1182,7 +1181,7 @@ static void do_inject_x86_mce(void *data)
         banks[3] = params->misc;
         cenv->mcg_status = params->mcg_status;
         banks[1] = params->status;
-        cpu_interrupt(cpu, CPU_INTERRUPT_MCE);
+        cpu_interrupt(cs, CPU_INTERRUPT_MCE);
     } else if (!(banks[1] & MCI_STATUS_VAL)
                || !(banks[1] & MCI_STATUS_UC)) {
         if (banks[1] & MCI_STATUS_VAL) {
@@ -1204,7 +1203,6 @@ void cpu_x86_inject_mce(Monitor *mon, X86CPU *cpu, int bank,
     CPUX86State *cenv = &cpu->env;
     MCEInjectionParams params = {
         .mon = mon,
-        .cpu = cpu,
         .bank = bank,
         .status = status,
         .mcg_status = mcg_status,
@@ -1245,7 +1243,6 @@ void cpu_x86_inject_mce(Monitor *mon, X86CPU *cpu, int bank,
             if (other_cs == cs) {
                 continue;
             }
-            params.cpu = X86_CPU(other_cs);
             run_on_cpu(other_cs, do_inject_x86_mce, &params);
         }
     }
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index d1a25c5..791c8b4 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -156,10 +156,8 @@ static int kvm_get_tsc(CPUState *cs)
     return 0;
 }
 
-static inline void do_kvm_synchronize_tsc(void *arg)
+static inline void do_kvm_synchronize_tsc(CPUState *cpu, void *arg)
 {
-    CPUState *cpu = arg;
-
     kvm_get_tsc(cpu);
 }
 
@@ -169,7 +167,7 @@ void kvm_synchronize_all_tsc(void)
 
     if (kvm_enabled()) {
         CPU_FOREACH(cpu) {
-            run_on_cpu(cpu, do_kvm_synchronize_tsc, cpu);
+            run_on_cpu(cpu, do_kvm_synchronize_tsc, NULL);
         }
     }
 }
diff --git a/target-s390x/cpu.c b/target-s390x/cpu.c
index e43e2d6..4f09c64 100644
--- a/target-s390x/cpu.c
+++ b/target-s390x/cpu.c
@@ -188,7 +188,7 @@ static void s390_cpu_machine_reset_cb(void *opaque)
 {
     S390CPU *cpu = opaque;
 
-    run_on_cpu(CPU(cpu), s390_do_cpu_full_reset, CPU(cpu));
+    run_on_cpu(CPU(cpu), s390_do_cpu_full_reset, NULL);
 }
 #endif
 
@@ -238,7 +238,7 @@ static void s390_cpu_realizefn(DeviceState *dev, Error **errp)
     s390_cpu_gdb_init(cs);
     qemu_init_vcpu(cs);
 #if !defined(CONFIG_USER_ONLY)
-    run_on_cpu(cs, s390_do_cpu_full_reset, cs);
+    run_on_cpu(cs, s390_do_cpu_full_reset, NULL);
 #else
     cpu_reset(cs);
 #endif
diff --git a/target-s390x/cpu.h b/target-s390x/cpu.h
index c216bda..0bdb1be 100644
--- a/target-s390x/cpu.h
+++ b/target-s390x/cpu.h
@@ -499,17 +499,14 @@ static inline hwaddr decode_basedisp_s(CPUS390XState *env, uint32_t ipb,
 #define decode_basedisp_rs decode_basedisp_s
 
 /* helper functions for run_on_cpu() */
-static inline void s390_do_cpu_reset(void *arg)
+static inline void s390_do_cpu_reset(CPUState *cs, void *arg)
 {
-    CPUState *cs = arg;
     S390CPUClass *scc = S390_CPU_GET_CLASS(cs);
 
     scc->cpu_reset(cs);
 }
-static inline void s390_do_cpu_full_reset(void *arg)
+static inline void s390_do_cpu_full_reset(CPUState *cs, void *arg)
 {
-    CPUState *cs = arg;
-
     cpu_reset(cs);
 }
 
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 80ac621..53deb44 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -1354,7 +1354,6 @@ static int handle_diag(S390CPU *cpu, struct kvm_run *run, uint32_t ipb)
 }
 
 typedef struct SigpInfo {
-    S390CPU *cpu;
     uint64_t param;
     int cc;
     uint64_t *status_reg;
@@ -1367,38 +1366,40 @@ static void set_sigp_status(SigpInfo *si, uint64_t status)
     si->cc = SIGP_CC_STATUS_STORED;
 }
 
-static void sigp_start(void *arg)
+static void sigp_start(CPUState *cs, void *arg)
 {
+    S390CPU *cpu = S390_CPU(cs);
     SigpInfo *si = arg;
 
-    if (s390_cpu_get_state(si->cpu) != CPU_STATE_STOPPED) {
+    if (s390_cpu_get_state(cpu) != CPU_STATE_STOPPED) {
         si->cc = SIGP_CC_ORDER_CODE_ACCEPTED;
         return;
     }
 
-    s390_cpu_set_state(CPU_STATE_OPERATING, si->cpu);
+    s390_cpu_set_state(CPU_STATE_OPERATING, cpu);
     si->cc = SIGP_CC_ORDER_CODE_ACCEPTED;
 }
 
-static void sigp_stop(void *arg)
+static void sigp_stop(CPUState *cs, void *arg)
 {
+    S390CPU *cpu = S390_CPU(cs);
     SigpInfo *si = arg;
     struct kvm_s390_irq irq = {
         .type = KVM_S390_SIGP_STOP,
     };
 
-    if (s390_cpu_get_state(si->cpu) != CPU_STATE_OPERATING) {
+    if (s390_cpu_get_state(cpu) != CPU_STATE_OPERATING) {
         si->cc = SIGP_CC_ORDER_CODE_ACCEPTED;
         return;
     }
 
     /* disabled wait - sleeping in user space */
-    if (CPU(si->cpu)->halted) {
-        s390_cpu_set_state(CPU_STATE_STOPPED, si->cpu);
+    if (cs->halted) {
+        s390_cpu_set_state(CPU_STATE_STOPPED, cpu);
     } else {
         /* execute the stop function */
-        si->cpu->env.sigp_order = SIGP_STOP;
-        kvm_s390_vcpu_interrupt(si->cpu, &irq);
+        cpu->env.sigp_order = SIGP_STOP;
+        kvm_s390_vcpu_interrupt(cpu, &irq);
     }
     si->cc = SIGP_CC_ORDER_CODE_ACCEPTED;
 }
@@ -1465,56 +1466,58 @@ static int kvm_s390_store_status(S390CPU *cpu, hwaddr addr, bool store_arch)
     return 0;
 }
 
-static void sigp_stop_and_store_status(void *arg)
+static void sigp_stop_and_store_status(CPUState *cs, void *arg)
 {
+    S390CPU *cpu = S390_CPU(cs);
     SigpInfo *si = arg;
     struct kvm_s390_irq irq = {
         .type = KVM_S390_SIGP_STOP,
     };
 
     /* disabled wait - sleeping in user space */
-    if (s390_cpu_get_state(si->cpu) == CPU_STATE_OPERATING &&
-        CPU(si->cpu)->halted) {
-        s390_cpu_set_state(CPU_STATE_STOPPED, si->cpu);
+    if (s390_cpu_get_state(cpu) == CPU_STATE_OPERATING && cs->halted) {
+        s390_cpu_set_state(CPU_STATE_STOPPED, cpu);
     }
 
-    switch (s390_cpu_get_state(si->cpu)) {
+    switch (s390_cpu_get_state(cpu)) {
     case CPU_STATE_OPERATING:
-        si->cpu->env.sigp_order = SIGP_STOP_STORE_STATUS;
-        kvm_s390_vcpu_interrupt(si->cpu, &irq);
+        cpu->env.sigp_order = SIGP_STOP_STORE_STATUS;
+        kvm_s390_vcpu_interrupt(cpu, &irq);
         /* store will be performed when handling the stop intercept */
         break;
     case CPU_STATE_STOPPED:
         /* already stopped, just store the status */
-        cpu_synchronize_state(CPU(si->cpu));
-        kvm_s390_store_status(si->cpu, KVM_S390_STORE_STATUS_DEF_ADDR, true);
+        cpu_synchronize_state(cs);
+        kvm_s390_store_status(cpu, KVM_S390_STORE_STATUS_DEF_ADDR, true);
         break;
     }
     si->cc = SIGP_CC_ORDER_CODE_ACCEPTED;
 }
 
-static void sigp_store_status_at_address(void *arg)
+static void sigp_store_status_at_address(CPUState *cs, void *arg)
 {
+    S390CPU *cpu = S390_CPU(cs);
     SigpInfo *si = arg;
     uint32_t address = si->param & 0x7ffffe00u;
 
     /* cpu has to be stopped */
-    if (s390_cpu_get_state(si->cpu) != CPU_STATE_STOPPED) {
+    if (s390_cpu_get_state(cpu) != CPU_STATE_STOPPED) {
         set_sigp_status(si, SIGP_STAT_INCORRECT_STATE);
         return;
     }
 
-    cpu_synchronize_state(CPU(si->cpu));
+    cpu_synchronize_state(cs);
 
-    if (kvm_s390_store_status(si->cpu, address, false)) {
+    if (kvm_s390_store_status(cpu, address, false)) {
         set_sigp_status(si, SIGP_STAT_INVALID_PARAMETER);
         return;
     }
     si->cc = SIGP_CC_ORDER_CODE_ACCEPTED;
 }
 
-static void sigp_store_adtl_status(void *arg)
+static void sigp_store_adtl_status(CPUState *cs, void *arg)
 {
+    S390CPU *cpu = S390_CPU(cs);
     SigpInfo *si = arg;
 
     if (!kvm_check_extension(kvm_state, KVM_CAP_S390_VECTOR_REGISTERS)) {
@@ -1523,7 +1526,7 @@ static void sigp_store_adtl_status(void *arg)
     }
 
     /* cpu has to be stopped */
-    if (s390_cpu_get_state(si->cpu) != CPU_STATE_STOPPED) {
+    if (s390_cpu_get_state(cpu) != CPU_STATE_STOPPED) {
         set_sigp_status(si, SIGP_STAT_INCORRECT_STATE);
         return;
     }
@@ -1534,31 +1537,32 @@ static void sigp_store_adtl_status(void *arg)
         return;
     }
 
-    cpu_synchronize_state(CPU(si->cpu));
+    cpu_synchronize_state(cs);
 
-    if (kvm_s390_store_adtl_status(si->cpu, si->param)) {
+    if (kvm_s390_store_adtl_status(cpu, si->param)) {
         set_sigp_status(si, SIGP_STAT_INVALID_PARAMETER);
         return;
     }
     si->cc = SIGP_CC_ORDER_CODE_ACCEPTED;
 }
 
-static void sigp_restart(void *arg)
+static void sigp_restart(CPUState *cs, void *arg)
 {
+    S390CPU *cpu = S390_CPU(cs);
     SigpInfo *si = arg;
     struct kvm_s390_irq irq = {
         .type = KVM_S390_RESTART,
     };
 
-    switch (s390_cpu_get_state(si->cpu)) {
+    switch (s390_cpu_get_state(cpu)) {
     case CPU_STATE_STOPPED:
         /* the restart irq has to be delivered prior to any other pending irq */
-        cpu_synchronize_state(CPU(si->cpu));
-        do_restart_interrupt(&si->cpu->env);
-        s390_cpu_set_state(CPU_STATE_OPERATING, si->cpu);
+        cpu_synchronize_state(cs);
+        do_restart_interrupt(&cpu->env);
+        s390_cpu_set_state(CPU_STATE_OPERATING, cpu);
         break;
     case CPU_STATE_OPERATING:
-        kvm_s390_vcpu_interrupt(si->cpu, &irq);
+        kvm_s390_vcpu_interrupt(cpu, &irq);
         break;
     }
     si->cc = SIGP_CC_ORDER_CODE_ACCEPTED;
@@ -1566,20 +1570,18 @@ static void sigp_restart(void *arg)
 
 int kvm_s390_cpu_restart(S390CPU *cpu)
 {
-    SigpInfo si = {
-        .cpu = cpu,
-    };
+    SigpInfo si = {};
 
     run_on_cpu(CPU(cpu), sigp_restart, &si);
     DPRINTF("DONE: KVM cpu restart: %p\n", &cpu->env);
     return 0;
 }
 
-static void sigp_initial_cpu_reset(void *arg)
+static void sigp_initial_cpu_reset(CPUState *cs, void *arg)
 {
+    S390CPU *cpu = S390_CPU(cs);
+    S390CPUClass *scc = S390_CPU_GET_CLASS(cpu);
     SigpInfo *si = arg;
-    CPUState *cs = CPU(si->cpu);
-    S390CPUClass *scc = S390_CPU_GET_CLASS(si->cpu);
 
     cpu_synchronize_state(cs);
     scc->initial_cpu_reset(cs);
@@ -1587,11 +1589,11 @@ static void sigp_initial_cpu_reset(void *arg)
     si->cc = SIGP_CC_ORDER_CODE_ACCEPTED;
 }
 
-static void sigp_cpu_reset(void *arg)
+static void sigp_cpu_reset(CPUState *cs, void *arg)
 {
+    S390CPU *cpu = S390_CPU(cs);
+    S390CPUClass *scc = S390_CPU_GET_CLASS(cpu);
     SigpInfo *si = arg;
-    CPUState *cs = CPU(si->cpu);
-    S390CPUClass *scc = S390_CPU_GET_CLASS(si->cpu);
 
     cpu_synchronize_state(cs);
     scc->cpu_reset(cs);
@@ -1599,12 +1601,13 @@ static void sigp_cpu_reset(void *arg)
     si->cc = SIGP_CC_ORDER_CODE_ACCEPTED;
 }
 
-static void sigp_set_prefix(void *arg)
+static void sigp_set_prefix(CPUState *cs, void *arg)
 {
+    S390CPU *cpu = S390_CPU(cs);
     SigpInfo *si = arg;
     uint32_t addr = si->param & 0x7fffe000u;
 
-    cpu_synchronize_state(CPU(si->cpu));
+    cpu_synchronize_state(cs);
 
     if (!address_space_access_valid(&address_space_memory, addr,
                                     sizeof(struct LowCore), false)) {
@@ -1613,13 +1616,13 @@ static void sigp_set_prefix(void *arg)
     }
 
     /* cpu has to be stopped */
-    if (s390_cpu_get_state(si->cpu) != CPU_STATE_STOPPED) {
+    if (s390_cpu_get_state(cpu) != CPU_STATE_STOPPED) {
         set_sigp_status(si, SIGP_STAT_INCORRECT_STATE);
         return;
     }
 
-    si->cpu->env.psa = addr;
-    cpu_synchronize_post_init(CPU(si->cpu));
+    cpu->env.psa = addr;
+    cpu_synchronize_post_init(cs);
     si->cc = SIGP_CC_ORDER_CODE_ACCEPTED;
 }
 
@@ -1627,7 +1630,6 @@ static int handle_sigp_single_dst(S390CPU *dst_cpu, uint8_t order,
                                   uint64_t param, uint64_t *status_reg)
 {
     SigpInfo si = {
-        .cpu = dst_cpu,
         .param = param,
         .status_reg = status_reg,
     };
diff --git a/target-s390x/misc_helper.c b/target-s390x/misc_helper.c
index 86da194..4df2ec6 100644
--- a/target-s390x/misc_helper.c
+++ b/target-s390x/misc_helper.c
@@ -126,7 +126,7 @@ static int modified_clear_reset(S390CPU *cpu)
     pause_all_vcpus();
     cpu_synchronize_all_states();
     CPU_FOREACH(t) {
-        run_on_cpu(t, s390_do_cpu_full_reset, t);
+        run_on_cpu(t, s390_do_cpu_full_reset, NULL);
     }
     s390_cmma_reset();
     subsystem_reset();
@@ -145,7 +145,7 @@ static int load_normal_reset(S390CPU *cpu)
     pause_all_vcpus();
     cpu_synchronize_all_states();
     CPU_FOREACH(t) {
-        run_on_cpu(t, s390_do_cpu_reset, t);
+        run_on_cpu(t, s390_do_cpu_reset, NULL);
     }
     s390_cmma_reset();
     subsystem_reset();
-- 
2.7.4

* [Qemu-devel] [PATCH 02/12] cpus: Move common code out of {async_, }run_on_cpu()
  2016-09-01 10:20 [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state Paolo Bonzini
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 01/12] cpus: pass CPUState to run_on_cpu helpers Paolo Bonzini
@ 2016-09-01 10:20 ` Paolo Bonzini
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 03/12] cpus: Rename flush_queued_work() Paolo Bonzini
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-09-01 10:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Sergey Fedorov, Alex Bennée, Sergey Fedorov, Emilio G. Cota

From: Sergey Fedorov <serge.fdrv@gmail.com>

Move the code common between run_on_cpu() and async_run_on_cpu() into a
new function queue_work_on_cpu().

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <1470158864-17651-4-git-send-email-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 cpus.c | 42 ++++++++++++++++++------------------------
 1 file changed, 18 insertions(+), 24 deletions(-)

diff --git a/cpus.c b/cpus.c
index 2508cbf..6113eb7 100644
--- a/cpus.c
+++ b/cpus.c
@@ -915,6 +915,22 @@ void qemu_init_cpu_loop(void)
     qemu_thread_get_self(&io_thread);
 }
 
+static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
+{
+    qemu_mutex_lock(&cpu->work_mutex);
+    if (cpu->queued_work_first == NULL) {
+        cpu->queued_work_first = wi;
+    } else {
+        cpu->queued_work_last->next = wi;
+    }
+    cpu->queued_work_last = wi;
+    wi->next = NULL;
+    wi->done = false;
+    qemu_mutex_unlock(&cpu->work_mutex);
+
+    qemu_cpu_kick(cpu);
+}
+
 void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 {
     struct qemu_work_item wi;
@@ -928,18 +944,7 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
     wi.data = data;
     wi.free = false;
 
-    qemu_mutex_lock(&cpu->work_mutex);
-    if (cpu->queued_work_first == NULL) {
-        cpu->queued_work_first = &wi;
-    } else {
-        cpu->queued_work_last->next = &wi;
-    }
-    cpu->queued_work_last = &wi;
-    wi.next = NULL;
-    wi.done = false;
-    qemu_mutex_unlock(&cpu->work_mutex);
-
-    qemu_cpu_kick(cpu);
+    queue_work_on_cpu(cpu, &wi);
     while (!atomic_mb_read(&wi.done)) {
         CPUState *self_cpu = current_cpu;
 
@@ -962,18 +967,7 @@ void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
     wi->data = data;
     wi->free = true;
 
-    qemu_mutex_lock(&cpu->work_mutex);
-    if (cpu->queued_work_first == NULL) {
-        cpu->queued_work_first = wi;
-    } else {
-        cpu->queued_work_last->next = wi;
-    }
-    cpu->queued_work_last = wi;
-    wi->next = NULL;
-    wi->done = false;
-    qemu_mutex_unlock(&cpu->work_mutex);
-
-    qemu_cpu_kick(cpu);
+    queue_work_on_cpu(cpu, wi);
 }
 
 static void qemu_kvm_destroy_vcpu(CPUState *cpu)
-- 
2.7.4

* [Qemu-devel] [PATCH 03/12] cpus: Rename flush_queued_work()
  2016-09-01 10:20 [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state Paolo Bonzini
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 01/12] cpus: pass CPUState to run_on_cpu helpers Paolo Bonzini
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 02/12] cpus: Move common code out of {async_, }run_on_cpu() Paolo Bonzini
@ 2016-09-01 10:20 ` Paolo Bonzini
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 04/12] linux-user: Use QemuMutex and QemuCond Paolo Bonzini
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-09-01 10:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Sergey Fedorov, Alex Bennée, Sergey Fedorov, Emilio G. Cota

From: Sergey Fedorov <serge.fdrv@gmail.com>

To avoid possible confusion, rename flush_queued_work() to
process_queued_cpu_work().

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <1470158864-17651-6-git-send-email-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 cpus.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/cpus.c b/cpus.c
index 6113eb7..ab15214 100644
--- a/cpus.c
+++ b/cpus.c
@@ -982,7 +982,7 @@ static void qemu_tcg_destroy_vcpu(CPUState *cpu)
 {
 }
 
-static void flush_queued_work(CPUState *cpu)
+static void process_queued_cpu_work(CPUState *cpu)
 {
     struct qemu_work_item *wi;
 
@@ -1017,7 +1017,7 @@ static void qemu_wait_io_event_common(CPUState *cpu)
         cpu->stopped = true;
         qemu_cond_broadcast(&qemu_pause_cond);
     }
-    flush_queued_work(cpu);
+    process_queued_cpu_work(cpu);
     cpu->thread_kicked = false;
 }
 
-- 
2.7.4

* [Qemu-devel] [PATCH 04/12] linux-user: Use QemuMutex and QemuCond
  2016-09-01 10:20 [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state Paolo Bonzini
                   ` (2 preceding siblings ...)
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 03/12] cpus: Rename flush_queued_work() Paolo Bonzini
@ 2016-09-01 10:20 ` Paolo Bonzini
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 05/12] linux-user: Add qemu_cpu_is_self() and qemu_cpu_kick() Paolo Bonzini
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-09-01 10:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Sergey Fedorov, Alex Bennée, Sergey Fedorov, Emilio G. Cota

From: Sergey Fedorov <serge.fdrv@gmail.com>

Convert pthread_mutex_t and pthread_cond_t to QemuMutex and QemuCond.
This will make it possible to share some locks and condition variables
between user and system mode emulation.
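
One practical difference is that QemuMutex and QemuCond have no static
initializer, so the statically initialized pthread objects become explicit
init calls at startup (hence the new qemu_init_cpu_loop() below).  A small
sketch of the pattern, with an illustrative lock name:

  /* Before: static initialization, no init call required.  */
  static pthread_mutex_t some_lock_old = PTHREAD_MUTEX_INITIALIZER;

  /* After: QemuMutex must be initialized explicitly at startup.  */
  static QemuMutex some_lock;

  static void init_locks(void)
  {
      qemu_mutex_init(&some_lock);
  }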

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <1470158864-17651-7-git-send-email-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 linux-user/main.c | 53 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 22 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index f2f4d2f..12a5475 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -111,17 +111,25 @@ int cpu_get_pic_interrupt(CPUX86State *env)
    We don't require a full sync, only that no cpus are executing guest code.
    The alternative is to map target atomic ops onto host equivalents,
    which requires quite a lot of per host/target work.  */
-static pthread_mutex_t cpu_list_mutex = PTHREAD_MUTEX_INITIALIZER;
-static pthread_mutex_t exclusive_lock = PTHREAD_MUTEX_INITIALIZER;
-static pthread_cond_t exclusive_cond = PTHREAD_COND_INITIALIZER;
-static pthread_cond_t exclusive_resume = PTHREAD_COND_INITIALIZER;
+static QemuMutex cpu_list_mutex;
+static QemuMutex exclusive_lock;
+static QemuCond exclusive_cond;
+static QemuCond exclusive_resume;
 static int pending_cpus;
 
+void qemu_init_cpu_loop(void)
+{
+    qemu_mutex_init(&cpu_list_mutex);
+    qemu_mutex_init(&exclusive_lock);
+    qemu_cond_init(&exclusive_cond);
+    qemu_cond_init(&exclusive_resume);
+}
+
 /* Make sure everything is in a consistent state for calling fork().  */
 void fork_start(void)
 {
     qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
-    pthread_mutex_lock(&exclusive_lock);
+    qemu_mutex_lock(&exclusive_lock);
     mmap_fork_start();
 }
 
@@ -138,14 +146,14 @@ void fork_end(int child)
             }
         }
         pending_cpus = 0;
-        pthread_mutex_init(&exclusive_lock, NULL);
-        pthread_mutex_init(&cpu_list_mutex, NULL);
-        pthread_cond_init(&exclusive_cond, NULL);
-        pthread_cond_init(&exclusive_resume, NULL);
+        qemu_mutex_init(&exclusive_lock);
+        qemu_mutex_init(&cpu_list_mutex);
+        qemu_cond_init(&exclusive_cond);
+        qemu_cond_init(&exclusive_resume);
         qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
         gdbserver_fork(thread_cpu);
     } else {
-        pthread_mutex_unlock(&exclusive_lock);
+        qemu_mutex_unlock(&exclusive_lock);
         qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
     }
 }
@@ -155,7 +163,7 @@ void fork_end(int child)
 static inline void exclusive_idle(void)
 {
     while (pending_cpus) {
-        pthread_cond_wait(&exclusive_resume, &exclusive_lock);
+        qemu_cond_wait(&exclusive_resume, &exclusive_lock);
     }
 }
 
@@ -165,7 +173,7 @@ static inline void start_exclusive(void)
 {
     CPUState *other_cpu;
 
-    pthread_mutex_lock(&exclusive_lock);
+    qemu_mutex_lock(&exclusive_lock);
     exclusive_idle();
 
     pending_cpus = 1;
@@ -177,7 +185,7 @@ static inline void start_exclusive(void)
         }
     }
     if (pending_cpus > 1) {
-        pthread_cond_wait(&exclusive_cond, &exclusive_lock);
+        qemu_cond_wait(&exclusive_cond, &exclusive_lock);
     }
 }
 
@@ -185,42 +193,42 @@ static inline void start_exclusive(void)
 static inline void __attribute__((unused)) end_exclusive(void)
 {
     pending_cpus = 0;
-    pthread_cond_broadcast(&exclusive_resume);
-    pthread_mutex_unlock(&exclusive_lock);
+    qemu_cond_broadcast(&exclusive_resume);
+    qemu_mutex_unlock(&exclusive_lock);
 }
 
 /* Wait for exclusive ops to finish, and begin cpu execution.  */
 static inline void cpu_exec_start(CPUState *cpu)
 {
-    pthread_mutex_lock(&exclusive_lock);
+    qemu_mutex_lock(&exclusive_lock);
     exclusive_idle();
     cpu->running = true;
-    pthread_mutex_unlock(&exclusive_lock);
+    qemu_mutex_unlock(&exclusive_lock);
 }
 
 /* Mark cpu as not executing, and release pending exclusive ops.  */
 static inline void cpu_exec_end(CPUState *cpu)
 {
-    pthread_mutex_lock(&exclusive_lock);
+    qemu_mutex_lock(&exclusive_lock);
     cpu->running = false;
     if (pending_cpus > 1) {
         pending_cpus--;
         if (pending_cpus == 1) {
-            pthread_cond_signal(&exclusive_cond);
+            qemu_cond_signal(&exclusive_cond);
         }
     }
     exclusive_idle();
-    pthread_mutex_unlock(&exclusive_lock);
+    qemu_mutex_unlock(&exclusive_lock);
 }
 
 void cpu_list_lock(void)
 {
-    pthread_mutex_lock(&cpu_list_mutex);
+    qemu_mutex_lock(&cpu_list_mutex);
 }
 
 void cpu_list_unlock(void)
 {
-    pthread_mutex_unlock(&cpu_list_mutex);
+    qemu_mutex_unlock(&cpu_list_mutex);
 }
 
 
@@ -4222,6 +4230,7 @@ int main(int argc, char **argv, char **envp)
     int ret;
     int execfd;
 
+    qemu_init_cpu_loop();
     module_call_init(MODULE_INIT_QOM);
 
     if ((envlist = envlist_create()) == NULL) {
-- 
2.7.4

* [Qemu-devel] [PATCH 05/12] linux-user: Add qemu_cpu_is_self() and qemu_cpu_kick()
  2016-09-01 10:20 [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state Paolo Bonzini
                   ` (3 preceding siblings ...)
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 04/12] linux-user: Use QemuMutex and QemuCond Paolo Bonzini
@ 2016-09-01 10:20 ` Paolo Bonzini
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 06/12] cpus-common: move CPU list management to common code Paolo Bonzini
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-09-01 10:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Sergey Fedorov, Alex Bennée, Sergey Fedorov, Emilio G. Cota

From: Sergey Fedorov <serge.fdrv@gmail.com>

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <1470158864-17651-9-git-send-email-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 linux-user/main.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/linux-user/main.c b/linux-user/main.c
index 12a5475..6195c68 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -3788,6 +3788,16 @@ void cpu_loop(CPUTLGState *env)
 
 THREAD CPUState *thread_cpu;
 
+bool qemu_cpu_is_self(CPUState *cpu)
+{
+    return thread_cpu == cpu;
+}
+
+void qemu_cpu_kick(CPUState *cpu)
+{
+    cpu_exit(cpu);
+}
+
 void task_settid(TaskState *ts)
 {
     if (ts->ts_tid == 0) {
-- 
2.7.4

* [Qemu-devel] [PATCH 06/12] cpus-common: move CPU list management to common code
  2016-09-01 10:20 [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state Paolo Bonzini
                   ` (4 preceding siblings ...)
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 05/12] linux-user: Add qemu_cpu_is_self() and qemu_cpu_kick() Paolo Bonzini
@ 2016-09-01 10:20 ` Paolo Bonzini
  2016-09-05 10:05   ` Alex Bennée
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 07/12] cpus-common: move CPU work item management to common code Paolo Bonzini
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-09-01 10:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Sergey Fedorov, Sergey Fedorov, Emilio G. Cota

Add a mutex for the CPU list to system emulation, as it will be used to
manage safe work.  Abstract manipulation of the CPU list into the new
functions cpu_list_add and cpu_list_remove.
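
For reference, the expected pattern on the system emulation side is simply
to bracket CPU list walks with the new lock; a sketch (walk_cpus is an
illustrative name, not a function added by this patch):

  static void walk_cpus(void)
  {
      CPUState *cpu;

      /* The CPU list lock nests outside tb_lock (see cpu-all.h).  */
      cpu_list_lock();
      CPU_FOREACH(cpu) {
          qemu_cpu_kick(cpu);     /* or any other per-CPU operation */
      }
      cpu_list_unlock();
  }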

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Makefile.target           |  2 +-
 bsd-user/main.c           |  9 +-----
 cpus-common.c             | 74 +++++++++++++++++++++++++++++++++++++++++++++++
 exec.c                    | 30 ++-----------------
 include/exec/cpu-all.h    |  4 +++
 include/exec/cpu-common.h |  2 ++
 include/exec/exec-all.h   | 11 -------
 include/qom/cpu.h         | 12 ++++++++
 linux-user/main.c         | 17 +++--------
 vl.c                      |  1 +
 10 files changed, 101 insertions(+), 61 deletions(-)
 create mode 100644 cpus-common.c

diff --git a/Makefile.target b/Makefile.target
index a440bcb..73d0c2f 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -86,7 +86,7 @@ all: $(PROGS) stap
 # cpu emulator library
 obj-y = exec.o translate-all.o cpu-exec.o
 obj-y += translate-common.o
-obj-y += cpu-exec-common.o
+obj-y += cpu-exec-common.o cpus-common.o
 obj-y += tcg/tcg.o tcg/tcg-op.o tcg/optimize.o
 obj-$(CONFIG_TCG_INTERPRETER) += tci.o
 obj-y += tcg/tcg-common.o
diff --git a/bsd-user/main.c b/bsd-user/main.c
index b4a0a00..461641a 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -94,14 +94,6 @@ void fork_end(int child)
     }
 }
 
-void cpu_list_lock(void)
-{
-}
-
-void cpu_list_unlock(void)
-{
-}
-
 #ifdef TARGET_I386
 /***********************************************************/
 /* CPUX86 core interface */
@@ -747,6 +739,7 @@ int main(int argc, char **argv)
     if (argc <= 1)
         usage();
 
+    qemu_init_cpu_list();
     module_call_init(MODULE_INIT_QOM);
 
     if ((envlist = envlist_create()) == NULL) {
diff --git a/cpus-common.c b/cpus-common.c
new file mode 100644
index 0000000..642e923
--- /dev/null
+++ b/cpus-common.c
@@ -0,0 +1,74 @@
+/*
+ * CPU thread main loop - common bits for user and system mode emulation
+ *
+ *  Copyright (c) 2003-2005 Fabrice Bellard
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "sysemu/cpus.h"
+#include "exec/memory-internal.h"
+
+static QemuMutex qemu_cpu_list_mutex;
+
+void qemu_init_cpu_list(void)
+{
+    qemu_mutex_init(&qemu_cpu_list_mutex);
+}
+
+void cpu_list_lock(void)
+{
+    qemu_mutex_lock(&qemu_cpu_list_mutex);
+}
+
+void cpu_list_unlock(void)
+{
+    qemu_mutex_unlock(&qemu_cpu_list_mutex);
+}
+
+static int cpu_get_free_index(void)
+{
+    CPUState *some_cpu;
+    int cpu_index = 0;
+
+    CPU_FOREACH(some_cpu) {
+        cpu_index++;
+    }
+    return cpu_index;
+}
+
+void cpu_list_add(CPUState *cpu)
+{
+    qemu_mutex_lock(&qemu_cpu_list_mutex);
+    if (cpu->cpu_index == UNASSIGNED_CPU_INDEX) {
+        cpu->cpu_index = cpu_get_free_index();
+        assert(cpu->cpu_index != UNASSIGNED_CPU_INDEX);
+    }
+
+    QTAILQ_INSERT_TAIL(&cpus, cpu, node);
+    qemu_mutex_unlock(&qemu_cpu_list_mutex);
+}
+
+void cpu_list_remove(CPUState *cpu)
+{
+    /* ??? How is this serialized against CPU_FOREACH? */
+    qemu_mutex_lock(&qemu_cpu_list_mutex);
+    if (QTAILQ_IN_USE(cpu, node)) {
+        QTAILQ_REMOVE(&cpus, cpu, node);
+    }
+    cpu->cpu_index = UNASSIGNED_CPU_INDEX;
+    qemu_mutex_unlock(&qemu_cpu_list_mutex);
+}
diff --git a/exec.c b/exec.c
index 286f5cd..784990c 100644
--- a/exec.c
+++ b/exec.c
@@ -598,31 +598,11 @@ AddressSpace *cpu_get_address_space(CPUState *cpu, int asidx)
 }
 #endif
 
-static int cpu_get_free_index(void)
-{
-    CPUState *some_cpu;
-    int cpu_index = 0;
-
-    CPU_FOREACH(some_cpu) {
-        cpu_index++;
-    }
-    return cpu_index;
-}
-
 void cpu_exec_exit(CPUState *cpu)
 {
     CPUClass *cc = CPU_GET_CLASS(cpu);
 
-    cpu_list_lock();
-    if (!QTAILQ_IN_USE(cpu, node)) {
-        /* there is nothing to undo since cpu_exec_init() hasn't been called */
-        cpu_list_unlock();
-        return;
-    }
-
-    QTAILQ_REMOVE(&cpus, cpu, node);
-    cpu->cpu_index = UNASSIGNED_CPU_INDEX;
-    cpu_list_unlock();
+    cpu_list_remove(cpu);
 
     if (cc->vmsd != NULL) {
         vmstate_unregister(NULL, cc->vmsd, cpu);
@@ -658,13 +638,7 @@ void cpu_exec_init(CPUState *cpu, Error **errp)
     object_ref(OBJECT(cpu->memory));
 #endif
 
-    cpu_list_lock();
-    if (cpu->cpu_index == UNASSIGNED_CPU_INDEX) {
-        cpu->cpu_index = cpu_get_free_index();
-        assert(cpu->cpu_index != UNASSIGNED_CPU_INDEX);
-    }
-    QTAILQ_INSERT_TAIL(&cpus, cpu, node);
-    cpu_list_unlock();
+    cpu_list_add(cpu);
 
 #ifndef CONFIG_USER_ONLY
     if (qdev_get_vmsd(DEVICE(cpu)) == NULL) {
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index b6a7059..c694d16 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -187,6 +187,10 @@ void address_space_stq(AddressSpace *as, hwaddr addr, uint64_t val,
                             MemTxAttrs attrs, MemTxResult *result);
 #endif
 
+/* The CPU list lock nests outside tb_lock/tb_unlock.  */
+void cpu_list_lock(void);
+void cpu_list_unlock(void);
+
 /* page related stuff */
 
 #define TARGET_PAGE_SIZE (1 << TARGET_PAGE_BITS)
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 952bcfe..1d41452 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -23,6 +23,8 @@ typedef struct CPUListState {
     FILE *file;
 } CPUListState;
 
+void qemu_init_cpu_list(void);
+
 #if !defined(CONFIG_USER_ONLY)
 
 enum device_endian {
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index a0e87be..36ab8b6 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -56,17 +56,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
                               target_ulong pc, target_ulong cs_base,
                               uint32_t flags,
                               int cflags);
-#if defined(CONFIG_USER_ONLY)
-void cpu_list_lock(void);
-void cpu_list_unlock(void);
-#else
-static inline void cpu_list_unlock(void)
-{
-}
-static inline void cpu_list_lock(void)
-{
-}
-#endif
 
 void cpu_exec_init(CPUState *cpu, Error **errp);
 void QEMU_NORETURN cpu_loop_exit(CPUState *cpu);
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 4aa9e61..ea3233f 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -545,6 +545,18 @@ static inline int cpu_asidx_from_attrs(CPUState *cpu, MemTxAttrs attrs)
 #endif
 
 /**
+ * cpu_list_add:
+ * @cpu: The CPU to be added to the list of CPUs.
+ */
+void cpu_list_add(CPUState *cpu);
+
+/**
+ * cpu_list_remove:
+ * @cpu: The CPU to be removed from the list of CPUs.
+ */
+void cpu_list_remove(CPUState *cpu);
+
+/**
  * cpu_reset:
  * @cpu: The CPU whose state is to be reset.
  */
diff --git a/linux-user/main.c b/linux-user/main.c
index 6195c68..bd5b58f 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -111,7 +111,6 @@ int cpu_get_pic_interrupt(CPUX86State *env)
    We don't require a full sync, only that no cpus are executing guest code.
    The alternative is to map target atomic ops onto host equivalents,
    which requires quite a lot of per host/target work.  */
-static QemuMutex cpu_list_mutex;
 static QemuMutex exclusive_lock;
 static QemuCond exclusive_cond;
 static QemuCond exclusive_resume;
@@ -119,7 +118,6 @@ static int pending_cpus;
 
 void qemu_init_cpu_loop(void)
 {
-    qemu_mutex_init(&cpu_list_mutex);
     qemu_mutex_init(&exclusive_lock);
     qemu_cond_init(&exclusive_cond);
     qemu_cond_init(&exclusive_resume);
@@ -128,6 +126,7 @@ void qemu_init_cpu_loop(void)
 /* Make sure everything is in a consistent state for calling fork().  */
 void fork_start(void)
 {
+    cpu_list_lock();
     qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
     qemu_mutex_lock(&exclusive_lock);
     mmap_fork_start();
@@ -147,14 +146,15 @@ void fork_end(int child)
         }
         pending_cpus = 0;
         qemu_mutex_init(&exclusive_lock);
-        qemu_mutex_init(&cpu_list_mutex);
         qemu_cond_init(&exclusive_cond);
         qemu_cond_init(&exclusive_resume);
         qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
+        qemu_init_cpu_list();
         gdbserver_fork(thread_cpu);
     } else {
         qemu_mutex_unlock(&exclusive_lock);
         qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
+        cpu_list_unlock();
     }
 }
 
@@ -221,16 +221,6 @@ static inline void cpu_exec_end(CPUState *cpu)
     qemu_mutex_unlock(&exclusive_lock);
 }
 
-void cpu_list_lock(void)
-{
-    qemu_mutex_lock(&cpu_list_mutex);
-}
-
-void cpu_list_unlock(void)
-{
-    qemu_mutex_unlock(&cpu_list_mutex);
-}
-
 
 #ifdef TARGET_I386
 /***********************************************************/
@@ -4240,6 +4230,7 @@ int main(int argc, char **argv, char **envp)
     int ret;
     int execfd;
 
+    qemu_init_cpu_list();
     qemu_init_cpu_loop();
     module_call_init(MODULE_INIT_QOM);
 
diff --git a/vl.c b/vl.c
index b3c80d5..1445935 100644
--- a/vl.c
+++ b/vl.c
@@ -2965,6 +2965,7 @@ int main(int argc, char **argv, char **envp)
     Error *err = NULL;
     bool list_data_dirs = false;
 
+    qemu_init_cpu_list();
     qemu_init_cpu_loop();
     qemu_mutex_lock_iothread();
 
-- 
2.7.4

* [Qemu-devel] [PATCH 07/12] cpus-common: move CPU work item management to common code
  2016-09-01 10:20 [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state Paolo Bonzini
                   ` (5 preceding siblings ...)
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 06/12] cpus-common: move CPU list management to common code Paolo Bonzini
@ 2016-09-01 10:20 ` Paolo Bonzini
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 08/12] cpus-common: move exclusive work infrastructure from linux-user Paolo Bonzini
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-09-01 10:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Sergey Fedorov, Alex Bennée, Sergey Fedorov, Emilio G. Cota

From: Sergey Fedorov <serge.fdrv@gmail.com>

Make the core CPU work functions common between system and user-mode
emulation.  User-mode does not use run_on_cpu(), so it is not implemented
there.
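
With this in place, the system-mode run_on_cpu() reduces to a thin wrapper
that tells do_run_on_cpu() which mutex to drop while waiting; roughly (a
sketch of the idea, the actual wrapper lives in cpus.c):

  void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
  {
      do_run_on_cpu(cpu, func, data, &qemu_global_mutex);
  }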

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <1470158864-17651-10-git-send-email-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 bsd-user/main.c   |  8 ++++--
 cpus-common.c     | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 cpus.c            | 82 +---------------------------------------------------
 include/qom/cpu.h | 18 ++++++++++++
 linux-user/main.c | 25 ++++++++++++++++
 5 files changed, 136 insertions(+), 83 deletions(-)

diff --git a/bsd-user/main.c b/bsd-user/main.c
index 461641a..125067a 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -67,11 +67,11 @@ int cpu_get_pic_interrupt(CPUX86State *env)
 #endif
 
 /* These are no-ops because we are not threadsafe.  */
-static inline void cpu_exec_start(CPUArchState *env)
+static inline void cpu_exec_start(CPUState *cpu)
 {
 }
 
-static inline void cpu_exec_end(CPUArchState *env)
+static inline void cpu_exec_end(CPUState *cpu)
 {
 }
 
@@ -163,7 +163,9 @@ void cpu_loop(CPUX86State *env)
     //target_siginfo_t info;
 
     for(;;) {
+        cpu_exec_start(cs);
         trapnr = cpu_exec(cs);
+        cpu_exec_end(cs);
         switch(trapnr) {
         case 0x80:
             /* syscall from int $0x80 */
@@ -504,7 +506,9 @@ void cpu_loop(CPUSPARCState *env)
     //target_siginfo_t info;
 
     while (1) {
+        cpu_exec_start(cs);
         trapnr = cpu_exec(cs);
+        cpu_exec_end(cs);
 
         switch (trapnr) {
 #ifndef TARGET_SPARC64
diff --git a/cpus-common.c b/cpus-common.c
index 642e923..12729e5 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -23,10 +23,12 @@
 #include "exec/memory-internal.h"
 
 static QemuMutex qemu_cpu_list_mutex;
+static QemuCond qemu_work_cond;
 
 void qemu_init_cpu_list(void)
 {
     qemu_mutex_init(&qemu_cpu_list_mutex);
+    qemu_cond_init(&qemu_work_cond);
 }
 
 void cpu_list_lock(void)
@@ -72,3 +74,87 @@ void cpu_list_remove(CPUState *cpu)
     cpu->cpu_index = UNASSIGNED_CPU_INDEX;
     qemu_mutex_unlock(&qemu_cpu_list_mutex);
 }
+
+static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
+{
+    qemu_mutex_lock(&cpu->work_mutex);
+    if (cpu->queued_work_first == NULL) {
+        cpu->queued_work_first = wi;
+    } else {
+        cpu->queued_work_last->next = wi;
+    }
+    cpu->queued_work_last = wi;
+    wi->next = NULL;
+    wi->done = false;
+    qemu_mutex_unlock(&cpu->work_mutex);
+
+    qemu_cpu_kick(cpu);
+}
+
+void do_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data,
+                   QemuMutex *mutex)
+{
+    struct qemu_work_item wi;
+
+    if (qemu_cpu_is_self(cpu)) {
+        func(cpu, data);
+        return;
+    }
+
+    wi.func = func;
+    wi.data = data;
+    wi.free = false;
+
+    queue_work_on_cpu(cpu, &wi);
+    while (!atomic_mb_read(&wi.done)) {
+        CPUState *self_cpu = current_cpu;
+
+        qemu_cond_wait(&qemu_work_cond, mutex);
+        current_cpu = self_cpu;
+    }
+}
+
+void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
+{
+    struct qemu_work_item *wi;
+
+    if (qemu_cpu_is_self(cpu)) {
+        func(cpu, data);
+        return;
+    }
+
+    wi = g_malloc0(sizeof(struct qemu_work_item));
+    wi->func = func;
+    wi->data = data;
+    wi->free = true;
+
+    queue_work_on_cpu(cpu, wi);
+}
+
+void process_queued_cpu_work(CPUState *cpu)
+{
+    struct qemu_work_item *wi;
+
+    if (cpu->queued_work_first == NULL) {
+        return;
+    }
+
+    qemu_mutex_lock(&cpu->work_mutex);
+    while (cpu->queued_work_first != NULL) {
+        wi = cpu->queued_work_first;
+        cpu->queued_work_first = wi->next;
+        if (!cpu->queued_work_first) {
+            cpu->queued_work_last = NULL;
+        }
+        qemu_mutex_unlock(&cpu->work_mutex);
+        wi->func(cpu, wi->data);
+        qemu_mutex_lock(&cpu->work_mutex);
+        if (wi->free) {
+            g_free(wi);
+        } else {
+            atomic_mb_set(&wi->done, true);
+        }
+    }
+    qemu_mutex_unlock(&cpu->work_mutex);
+    qemu_cond_broadcast(&qemu_work_cond);
+}
diff --git a/cpus.c b/cpus.c
index ab15214..e1bdc16 100644
--- a/cpus.c
+++ b/cpus.c
@@ -901,73 +901,21 @@ static QemuThread io_thread;
 static QemuCond qemu_cpu_cond;
 /* system init */
 static QemuCond qemu_pause_cond;
-static QemuCond qemu_work_cond;
 
 void qemu_init_cpu_loop(void)
 {
     qemu_init_sigbus();
     qemu_cond_init(&qemu_cpu_cond);
     qemu_cond_init(&qemu_pause_cond);
-    qemu_cond_init(&qemu_work_cond);
     qemu_cond_init(&qemu_io_proceeded_cond);
     qemu_mutex_init(&qemu_global_mutex);
 
     qemu_thread_get_self(&io_thread);
 }
 
-static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
-{
-    qemu_mutex_lock(&cpu->work_mutex);
-    if (cpu->queued_work_first == NULL) {
-        cpu->queued_work_first = wi;
-    } else {
-        cpu->queued_work_last->next = wi;
-    }
-    cpu->queued_work_last = wi;
-    wi->next = NULL;
-    wi->done = false;
-    qemu_mutex_unlock(&cpu->work_mutex);
-
-    qemu_cpu_kick(cpu);
-}
-
 void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 {
-    struct qemu_work_item wi;
-
-    if (qemu_cpu_is_self(cpu)) {
-        func(cpu, data);
-        return;
-    }
-
-    wi.func = func;
-    wi.data = data;
-    wi.free = false;
-
-    queue_work_on_cpu(cpu, &wi);
-    while (!atomic_mb_read(&wi.done)) {
-        CPUState *self_cpu = current_cpu;
-
-        qemu_cond_wait(&qemu_work_cond, &qemu_global_mutex);
-        current_cpu = self_cpu;
-    }
-}
-
-void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
-{
-    struct qemu_work_item *wi;
-
-    if (qemu_cpu_is_self(cpu)) {
-        func(cpu, data);
-        return;
-    }
-
-    wi = g_malloc0(sizeof(struct qemu_work_item));
-    wi->func = func;
-    wi->data = data;
-    wi->free = true;
-
-    queue_work_on_cpu(cpu, wi);
+    do_run_on_cpu(cpu, func, data, &qemu_global_mutex);
 }
 
 static void qemu_kvm_destroy_vcpu(CPUState *cpu)
@@ -982,34 +930,6 @@ static void qemu_tcg_destroy_vcpu(CPUState *cpu)
 {
 }
 
-static void process_queued_cpu_work(CPUState *cpu)
-{
-    struct qemu_work_item *wi;
-
-    if (cpu->queued_work_first == NULL) {
-        return;
-    }
-
-    qemu_mutex_lock(&cpu->work_mutex);
-    while (cpu->queued_work_first != NULL) {
-        wi = cpu->queued_work_first;
-        cpu->queued_work_first = wi->next;
-        if (!cpu->queued_work_first) {
-            cpu->queued_work_last = NULL;
-        }
-        qemu_mutex_unlock(&cpu->work_mutex);
-        wi->func(cpu, wi->data);
-        qemu_mutex_lock(&cpu->work_mutex);
-        if (wi->free) {
-            g_free(wi);
-        } else {
-            atomic_mb_set(&wi->done, true);
-        }
-    }
-    qemu_mutex_unlock(&cpu->work_mutex);
-    qemu_cond_broadcast(&qemu_work_cond);
-}
-
 static void qemu_wait_io_event_common(CPUState *cpu)
 {
     if (cpu->stop) {
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index ea3233f..d7688f6 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -630,6 +630,18 @@ void qemu_cpu_kick(CPUState *cpu);
 bool cpu_is_stopped(CPUState *cpu);
 
 /**
+ * do_run_on_cpu:
+ * @cpu: The vCPU to run on.
+ * @func: The function to be executed.
+ * @data: Data to pass to the function.
+ * @mutex: Mutex to release while waiting for @func to run.
+ *
+ * Used internally in the implementation of run_on_cpu.
+ */
+void do_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data,
+                   QemuMutex *mutex);
+
+/**
  * run_on_cpu:
  * @cpu: The vCPU to run on.
  * @func: The function to be executed.
@@ -808,6 +820,12 @@ void cpu_remove(CPUState *cpu);
 void cpu_remove_sync(CPUState *cpu);
 
 /**
+ * process_queued_cpu_work() - process all items on CPU work queue
+ * @cpu: The CPU which work queue to process.
+ */
+void process_queued_cpu_work(CPUState *cpu);
+
+/**
  * qemu_init_vcpu:
  * @cpu: The vCPU to initialize.
  *
diff --git a/linux-user/main.c b/linux-user/main.c
index bd5b58f..4972bbe 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -294,6 +294,8 @@ void cpu_loop(CPUX86State *env)
         cpu_exec_start(cs);
         trapnr = cpu_exec(cs);
         cpu_exec_end(cs);
+        process_queued_cpu_work(cs);
+
         switch(trapnr) {
         case 0x80:
             /* linux syscall from int $0x80 */
@@ -735,6 +737,8 @@ void cpu_loop(CPUARMState *env)
         cpu_exec_start(cs);
         trapnr = cpu_exec(cs);
         cpu_exec_end(cs);
+        process_queued_cpu_work(cs);
+
         switch(trapnr) {
         case EXCP_UDEF:
             {
@@ -1071,6 +1075,7 @@ void cpu_loop(CPUARMState *env)
         cpu_exec_start(cs);
         trapnr = cpu_exec(cs);
         cpu_exec_end(cs);
+        process_queued_cpu_work(cs);
 
         switch (trapnr) {
         case EXCP_SWI:
@@ -1159,6 +1164,8 @@ void cpu_loop(CPUUniCore32State *env)
         cpu_exec_start(cs);
         trapnr = cpu_exec(cs);
         cpu_exec_end(cs);
+        process_queued_cpu_work(cs);
+
         switch (trapnr) {
         case UC32_EXCP_PRIV:
             {
@@ -1364,6 +1371,7 @@ void cpu_loop (CPUSPARCState *env)
         cpu_exec_start(cs);
         trapnr = cpu_exec(cs);
         cpu_exec_end(cs);
+        process_queued_cpu_work(cs);
 
         /* Compute PSR before exposing state.  */
         if (env->cc_op != CC_OP_FLAGS) {
@@ -1636,6 +1644,8 @@ void cpu_loop(CPUPPCState *env)
         cpu_exec_start(cs);
         trapnr = cpu_exec(cs);
         cpu_exec_end(cs);
+        process_queued_cpu_work(cs);
+
         switch(trapnr) {
         case POWERPC_EXCP_NONE:
             /* Just go on */
@@ -2493,6 +2503,8 @@ void cpu_loop(CPUMIPSState *env)
         cpu_exec_start(cs);
         trapnr = cpu_exec(cs);
         cpu_exec_end(cs);
+        process_queued_cpu_work(cs);
+
         switch(trapnr) {
         case EXCP_SYSCALL:
             env->active_tc.PC += 4;
@@ -2733,6 +2745,7 @@ void cpu_loop(CPUOpenRISCState *env)
         cpu_exec_start(cs);
         trapnr = cpu_exec(cs);
         cpu_exec_end(cs);
+        process_queued_cpu_work(cs);
         gdbsig = 0;
 
         switch (trapnr) {
@@ -2827,6 +2840,7 @@ void cpu_loop(CPUSH4State *env)
         cpu_exec_start(cs);
         trapnr = cpu_exec(cs);
         cpu_exec_end(cs);
+        process_queued_cpu_work(cs);
 
         switch (trapnr) {
         case 0x160:
@@ -2893,6 +2907,8 @@ void cpu_loop(CPUCRISState *env)
         cpu_exec_start(cs);
         trapnr = cpu_exec(cs);
         cpu_exec_end(cs);
+        process_queued_cpu_work(cs);
+
         switch (trapnr) {
         case 0xaa:
             {
@@ -2958,6 +2974,8 @@ void cpu_loop(CPUMBState *env)
         cpu_exec_start(cs);
         trapnr = cpu_exec(cs);
         cpu_exec_end(cs);
+        process_queued_cpu_work(cs);
+
         switch (trapnr) {
         case 0xaa:
             {
@@ -3075,6 +3093,8 @@ void cpu_loop(CPUM68KState *env)
         cpu_exec_start(cs);
         trapnr = cpu_exec(cs);
         cpu_exec_end(cs);
+        process_queued_cpu_work(cs);
+
         switch(trapnr) {
         case EXCP_ILLEGAL:
             {
@@ -3218,6 +3238,7 @@ void cpu_loop(CPUAlphaState *env)
         cpu_exec_start(cs);
         trapnr = cpu_exec(cs);
         cpu_exec_end(cs);
+        process_queued_cpu_work(cs);
 
         /* All of the traps imply a transition through PALcode, which
            implies an REI instruction has been executed.  Which means
@@ -3410,6 +3431,8 @@ void cpu_loop(CPUS390XState *env)
         cpu_exec_start(cs);
         trapnr = cpu_exec(cs);
         cpu_exec_end(cs);
+        process_queued_cpu_work(cs);
+
         switch (trapnr) {
         case EXCP_INTERRUPT:
             /* Just indicate that signals should be handled asap.  */
@@ -3719,6 +3742,8 @@ void cpu_loop(CPUTLGState *env)
         cpu_exec_start(cs);
         trapnr = cpu_exec(cs);
         cpu_exec_end(cs);
+        process_queued_cpu_work(cs);
+
         switch (trapnr) {
         case TILEGX_EXCP_SYSCALL:
         {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 08/12] cpus-common: move exclusive work infrastructure from linux-user
  2016-09-01 10:20 [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state Paolo Bonzini
                   ` (6 preceding siblings ...)
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 07/12] cpus-common: move CPU work item management to common Paolo Bonzini
@ 2016-09-01 10:20 ` Paolo Bonzini
  2016-09-05 14:55   ` Alex Bennée
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 09/12] cpus-common: always defer async_run_on_cpu work items Paolo Bonzini
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-09-01 10:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Sergey Fedorov, Sergey Fedorov,
	Emilio G. Cota

This will serve as the base for async_safe_run_on_cpu.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 bsd-user/main.c   | 17 -----------
 cpus-common.c     | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++
 cpus.c            |  2 ++
 include/qom/cpu.h | 40 ++++++++++++++++++++++++-
 linux-user/main.c | 87 -------------------------------------------------------
 5 files changed, 123 insertions(+), 105 deletions(-)

diff --git a/bsd-user/main.c b/bsd-user/main.c
index 125067a..f29e0e3 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -66,23 +66,6 @@ int cpu_get_pic_interrupt(CPUX86State *env)
 }
 #endif
 
-/* These are no-ops because we are not threadsafe.  */
-static inline void cpu_exec_start(CPUState *cpu)
-{
-}
-
-static inline void cpu_exec_end(CPUState *cpu)
-{
-}
-
-static inline void start_exclusive(void)
-{
-}
-
-static inline void end_exclusive(void)
-{
-}
-
 void fork_start(void)
 {
 }
diff --git a/cpus-common.c b/cpus-common.c
index 12729e5..47f7c06 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -23,11 +23,21 @@
 #include "exec/memory-internal.h"
 
 static QemuMutex qemu_cpu_list_mutex;
+static QemuCond exclusive_cond;
+static QemuCond exclusive_resume;
 static QemuCond qemu_work_cond;
 
+static int pending_cpus;
+
 void qemu_init_cpu_list(void)
 {
+    /* This is needed because qemu_init_cpu_list is also called by the
+     * child process in a fork.  */
+    pending_cpus = 0;
+
     qemu_mutex_init(&qemu_cpu_list_mutex);
+    qemu_cond_init(&exclusive_cond);
+    qemu_cond_init(&exclusive_resume);
     qemu_cond_init(&qemu_work_cond);
 }
 
@@ -52,6 +62,12 @@ static int cpu_get_free_index(void)
     return cpu_index;
 }
 
+static void finish_safe_work(CPUState *cpu)
+{
+    cpu_exec_start(cpu);
+    cpu_exec_end(cpu);
+}
+
 void cpu_list_add(CPUState *cpu)
 {
     qemu_mutex_lock(&qemu_cpu_list_mutex);
@@ -62,6 +78,8 @@ void cpu_list_add(CPUState *cpu)
 
     QTAILQ_INSERT_TAIL(&cpus, cpu, node);
     qemu_mutex_unlock(&qemu_cpu_list_mutex);
+
+    finish_safe_work(cpu);
 }
 
 void cpu_list_remove(CPUState *cpu)
@@ -131,6 +149,70 @@ void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
     queue_work_on_cpu(cpu, wi);
 }
 
+/* Wait for pending exclusive operations to complete.  The exclusive lock
+   must be held.  */
+static inline void exclusive_idle(void)
+{
+    while (pending_cpus) {
+        qemu_cond_wait(&exclusive_resume, &qemu_cpu_list_mutex);
+    }
+}
+
+/* Start an exclusive operation.
+   Must only be called from outside cpu_exec, takes
+   qemu_cpu_list_mutex.   */
+void start_exclusive(void)
+{
+    CPUState *other_cpu;
+
+    qemu_mutex_lock(&qemu_cpu_list_mutex);
+    exclusive_idle();
+
+    /* Make all other cpus stop executing.  */
+    pending_cpus = 1;
+    CPU_FOREACH(other_cpu) {
+        if (other_cpu->running) {
+            pending_cpus++;
+            qemu_cpu_kick(other_cpu);
+        }
+    }
+    if (pending_cpus > 1) {
+        qemu_cond_wait(&exclusive_cond, &qemu_cpu_list_mutex);
+    }
+}
+
+/* Finish an exclusive operation.  Releases qemu_cpu_list_mutex.  */
+void end_exclusive(void)
+{
+    pending_cpus = 0;
+    qemu_cond_broadcast(&exclusive_resume);
+    qemu_mutex_unlock(&qemu_cpu_list_mutex);
+}
+
+/* Wait for exclusive ops to finish, and begin cpu execution.  */
+void cpu_exec_start(CPUState *cpu)
+{
+    qemu_mutex_lock(&qemu_cpu_list_mutex);
+    exclusive_idle();
+    cpu->running = true;
+    qemu_mutex_unlock(&qemu_cpu_list_mutex);
+}
+
+/* Mark cpu as not executing, and release pending exclusive ops.  */
+void cpu_exec_end(CPUState *cpu)
+{
+    qemu_mutex_lock(&qemu_cpu_list_mutex);
+    cpu->running = false;
+    if (pending_cpus > 1) {
+        pending_cpus--;
+        if (pending_cpus == 1) {
+            qemu_cond_signal(&exclusive_cond);
+        }
+    }
+    exclusive_idle();
+    qemu_mutex_unlock(&qemu_cpu_list_mutex);
+}
+
 void process_queued_cpu_work(CPUState *cpu)
 {
     struct qemu_work_item *wi;
diff --git a/cpus.c b/cpus.c
index e1bdc16..b024604 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1456,7 +1456,9 @@ static int tcg_cpu_exec(CPUState *cpu)
         cpu->icount_decr.u16.low = decr;
         cpu->icount_extra = count;
     }
+    cpu_exec_start(cpu);
     ret = cpu_exec(cpu);
+    cpu_exec_end(cpu);
 #ifdef CONFIG_PROFILER
     tcg_time += profile_getclock() - ti;
 #endif
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index d7688f6..0e04e8f 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -249,7 +249,8 @@ struct qemu_work_item {
  * @nr_threads: Number of threads within this CPU.
  * @numa_node: NUMA node this CPU is belonging to.
  * @host_tid: Host thread ID.
- * @running: #true if CPU is currently running (usermode).
+ * @running: #true if CPU is currently running;
+ * valid under cpu_list_lock.
  * @created: Indicates whether the CPU thread has been successfully created.
  * @interrupt_request: Indicates a pending interrupt request.
  * @halted: Nonzero if the CPU is in suspended state.
@@ -826,6 +827,43 @@ void cpu_remove_sync(CPUState *cpu);
 void process_queued_cpu_work(CPUState *cpu);
 
 /**
+ * cpu_exec_start:
+ * @cpu: The CPU for the current thread.
+ *
+ * Record that a CPU has started execution and can be interrupted with
+ * cpu_exit.
+ */
+void cpu_exec_start(CPUState *cpu);
+
+/**
+ * cpu_exec_end:
+ * @cpu: The CPU for the current thread.
+ *
+ * Record that a CPU has stopped execution and safe work can be executed
+ * without interrupting it.
+ */
+void cpu_exec_end(CPUState *cpu);
+
+/**
+ * start_exclusive:
+ * @cpu: The CPU for the current thread.
+ *
+ * Similar to async_safe_run_on_cpu, but the work item is only
+ * run on one CPU and is between start_exclusive and end_exclusive.
+ * Returns with the CPU list lock taken (which nests outside all
+ * other locks except the BQL).
+ */
+void start_exclusive(void);
+
+/**
+ * end_exclusive:
+ *
+ * Concludes an exclusive execution section started by start_exclusive.
+ * Releases the CPU list lock.
+ */
+void end_exclusive(void);
+
+/**
  * qemu_init_vcpu:
  * @cpu: The vCPU to initialize.
  *
diff --git a/linux-user/main.c b/linux-user/main.c
index 4972bbe..cecab8d 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -107,28 +107,11 @@ int cpu_get_pic_interrupt(CPUX86State *env)
 /***********************************************************/
 /* Helper routines for implementing atomic operations.  */
 
-/* To implement exclusive operations we force all cpus to syncronise.
-   We don't require a full sync, only that no cpus are executing guest code.
-   The alternative is to map target atomic ops onto host equivalents,
-   which requires quite a lot of per host/target work.  */
-static QemuMutex exclusive_lock;
-static QemuCond exclusive_cond;
-static QemuCond exclusive_resume;
-static int pending_cpus;
-
-void qemu_init_cpu_loop(void)
-{
-    qemu_mutex_init(&exclusive_lock);
-    qemu_cond_init(&exclusive_cond);
-    qemu_cond_init(&exclusive_resume);
-}
-
 /* Make sure everything is in a consistent state for calling fork().  */
 void fork_start(void)
 {
     cpu_list_lock();
     qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
-    qemu_mutex_lock(&exclusive_lock);
     mmap_fork_start();
 }
 
@@ -144,84 +127,15 @@ void fork_end(int child)
                 QTAILQ_REMOVE(&cpus, cpu, node);
             }
         }
-        pending_cpus = 0;
-        qemu_mutex_init(&exclusive_lock);
-        qemu_cond_init(&exclusive_cond);
-        qemu_cond_init(&exclusive_resume);
         qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
         qemu_init_cpu_list();
         gdbserver_fork(thread_cpu);
     } else {
-        qemu_mutex_unlock(&exclusive_lock);
         qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
         cpu_list_unlock();
     }
 }
 
-/* Wait for pending exclusive operations to complete.  The exclusive lock
-   must be held.  */
-static inline void exclusive_idle(void)
-{
-    while (pending_cpus) {
-        qemu_cond_wait(&exclusive_resume, &exclusive_lock);
-    }
-}
-
-/* Start an exclusive operation.
-   Must only be called from outside cpu_exec.   */
-static inline void start_exclusive(void)
-{
-    CPUState *other_cpu;
-
-    qemu_mutex_lock(&exclusive_lock);
-    exclusive_idle();
-
-    pending_cpus = 1;
-    /* Make all other cpus stop executing.  */
-    CPU_FOREACH(other_cpu) {
-        if (other_cpu->running) {
-            pending_cpus++;
-            cpu_exit(other_cpu);
-        }
-    }
-    if (pending_cpus > 1) {
-        qemu_cond_wait(&exclusive_cond, &exclusive_lock);
-    }
-}
-
-/* Finish an exclusive operation.  */
-static inline void __attribute__((unused)) end_exclusive(void)
-{
-    pending_cpus = 0;
-    qemu_cond_broadcast(&exclusive_resume);
-    qemu_mutex_unlock(&exclusive_lock);
-}
-
-/* Wait for exclusive ops to finish, and begin cpu execution.  */
-static inline void cpu_exec_start(CPUState *cpu)
-{
-    qemu_mutex_lock(&exclusive_lock);
-    exclusive_idle();
-    cpu->running = true;
-    qemu_mutex_unlock(&exclusive_lock);
-}
-
-/* Mark cpu as not executing, and release pending exclusive ops.  */
-static inline void cpu_exec_end(CPUState *cpu)
-{
-    qemu_mutex_lock(&exclusive_lock);
-    cpu->running = false;
-    if (pending_cpus > 1) {
-        pending_cpus--;
-        if (pending_cpus == 1) {
-            qemu_cond_signal(&exclusive_cond);
-        }
-    }
-    exclusive_idle();
-    qemu_mutex_unlock(&exclusive_lock);
-}
-
-
 #ifdef TARGET_I386
 /***********************************************************/
 /* CPUX86 core interface */
@@ -4256,7 +4170,6 @@ int main(int argc, char **argv, char **envp)
     int execfd;
 
     qemu_init_cpu_list();
-    qemu_init_cpu_loop();
     module_call_init(MODULE_INIT_QOM);
 
     if ((envlist = envlist_create()) == NULL) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 09/12] cpus-common: always defer async_run_on_cpu work items
  2016-09-01 10:20 [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state Paolo Bonzini
                   ` (7 preceding siblings ...)
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 08/12] cpus-common: move exclusive work infrastructure from Paolo Bonzini
@ 2016-09-01 10:20 ` Paolo Bonzini
  2016-09-05 14:57   ` Alex Bennée
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 10/12] cpus-common: Introduce async_safe_run_on_cpu() Paolo Bonzini
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-09-01 10:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Sergey Fedorov, Sergey Fedorov, Emilio G. Cota

async_run_on_cpu is only called from the I/O thread, not from CPU threads,
so always deferring the work item makes no difference there.  It will make
a difference, however, for async_safe_run_on_cpu.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 cpus-common.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/cpus-common.c b/cpus-common.c
index 47f7c06..59c8dc8 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -136,11 +136,6 @@ void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 {
     struct qemu_work_item *wi;
 
-    if (qemu_cpu_is_self(cpu)) {
-        func(cpu, data);
-        return;
-    }
-
     wi = g_malloc0(sizeof(struct qemu_work_item));
     wi->func = func;
     wi->data = data;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 10/12] cpus-common: Introduce async_safe_run_on_cpu()
  2016-09-01 10:20 [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state Paolo Bonzini
                   ` (8 preceding siblings ...)
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 09/12] cpus-common: always defer async_run_on_cpu work items Paolo Bonzini
@ 2016-09-01 10:20 ` Paolo Bonzini
  2016-09-05 15:08   ` Alex Bennée
  2016-09-12 18:25   ` Pranith Kumar
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 11/12] tcg: Make tb_flush() thread safe Paolo Bonzini
                   ` (2 subsequent siblings)
  12 siblings, 2 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-09-01 10:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Sergey Fedorov, Sergey Fedorov, Emilio G. Cota

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 cpus-common.c     | 25 +++++++++++++++++++++++++
 include/qom/cpu.h | 12 ++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/cpus-common.c b/cpus-common.c
index 59c8dc8..88cf5ec 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -144,6 +144,11 @@ void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
     queue_work_on_cpu(cpu, wi);
 }
 
+typedef struct SafeWorkItem {
+    run_on_cpu_func func;
+    void *data;
+} SafeWorkItem;
+
 /* Wait for pending exclusive operations to complete.  The exclusive lock
    must be held.  */
 static inline void exclusive_idle(void)
@@ -208,6 +213,26 @@ void cpu_exec_end(CPUState *cpu)
     qemu_mutex_unlock(&qemu_cpu_list_mutex);
 }
 
+static void async_safe_run_on_cpu_fn(CPUState *cpu, void *data)
+{
+    SafeWorkItem *w = data;
+
+    start_exclusive();
+    w->func(cpu, w->data);
+    end_exclusive();
+    g_free(w);
+}
+
+void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
+{
+    SafeWorkItem *w = g_new(SafeWorkItem, 1);
+
+    w->func = func;
+    w->data = data;
+
+    async_run_on_cpu(cpu, async_safe_run_on_cpu_fn, w);
+}
+
 void process_queued_cpu_work(CPUState *cpu)
 {
     struct qemu_work_item *wi;
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 0e04e8f..54a875e 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -663,6 +663,18 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
 void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
 
 /**
+ * async_safe_run_on_cpu:
+ * @cpu: The vCPU to run on.
+ * @func: The function to be executed.
+ * @data: Data to pass to the function.
+ *
+ * Schedules the function @func for execution on the vCPU @cpu asynchronously,
+ * while all other vCPUs are sleeping.  @func is called with the CPU list lock
+ * taken (and for system emulation the BQL); any other lock can be taken safely.
+ */
+void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
+
+/**
  * qemu_get_cpu:
  * @index: The CPUState@cpu_index value of the CPU to obtain.
  *
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 11/12] tcg: Make tb_flush() thread safe
  2016-09-01 10:20 [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state Paolo Bonzini
                   ` (9 preceding siblings ...)
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 10/12] cpus-common: Introduce async_safe_run_on_cpu() Paolo Bonzini
@ 2016-09-01 10:20 ` Paolo Bonzini
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 12/12] cpus-common: lock-free fast path for cpu_exec_start/end Paolo Bonzini
  2016-09-05 15:51 ` [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state Alex Bennée
  12 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-09-01 10:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Sergey Fedorov, Alex Bennée, Sergey Fedorov, Emilio G. Cota

From: Sergey Fedorov <serge.fdrv@gmail.com>

Use async_safe_run_on_cpu() to make tb_flush() thread safe.  This is
possible now that code generation does not happen in the middle of
execution.

It can happen that multiple threads schedule safe work items to flush the
translation buffer.  To keep statistics and debugging output sane, always
check whether the translation buffer has already been flushed.
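
The mechanism is essentially a generation counter: tb_flush() snapshots
tb_flush_count and passes it through the work item's data pointer, and the
deferred work only flushes if the count is still the same.  Distilled into
linear form (an illustrative sketch of the patch below; it relies on the
usual translate-all.c includes and elides the actual flushing):

    /* Runs via async_safe_run_on_cpu, i.e. with all other vCPUs stopped.  */
    static void do_tb_flush(CPUState *cpu, void *data)
    {
        unsigned tb_flush_req = (unsigned)(uintptr_t)data;

        tb_lock();
        if (atomic_read(&tcg_ctx.tb_ctx.tb_flush_count) == tb_flush_req) {
            /* ... invalidate all TBs, reset code_gen_ptr ... */
            atomic_inc(&tcg_ctx.tb_ctx.tb_flush_count);
        }
        tb_unlock();
    }

    void tb_flush(CPUState *cpu)
    {
        if (tcg_enabled()) {
            uintptr_t tb_flush_req =
                atomic_read(&tcg_ctx.tb_ctx.tb_flush_count);
            async_safe_run_on_cpu(cpu, do_tb_flush, (void *)tb_flush_req);
        }
    }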

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
[AJB: minor re-base fixes]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <1470158864-17651-13-git-send-email-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 cpu-exec.c                | 12 ++----------
 include/exec/tb-context.h |  2 +-
 include/qom/cpu.h         |  2 --
 translate-all.c           | 38 ++++++++++++++++++++++++++++----------
 4 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index b240b9f..a8ff2a1 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -203,20 +203,16 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
                              TranslationBlock *orig_tb, bool ignore_icount)
 {
     TranslationBlock *tb;
-    bool old_tb_flushed;
 
     /* Should never happen.
        We only end up here when an existing TB is too long.  */
     if (max_cycles > CF_COUNT_MASK)
         max_cycles = CF_COUNT_MASK;
 
-    old_tb_flushed = cpu->tb_flushed;
-    cpu->tb_flushed = false;
     tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
                      max_cycles | CF_NOCACHE
                          | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
-    tb->orig_tb = cpu->tb_flushed ? NULL : orig_tb;
-    cpu->tb_flushed |= old_tb_flushed;
+    tb->orig_tb = orig_tb;
     /* execute the generated code */
     trace_exec_tb_nocache(tb, tb->pc);
     cpu_tb_exec(cpu, tb);
@@ -337,10 +333,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
             tb_lock();
             have_tb_lock = true;
         }
-        /* Check if translation buffer has been flushed */
-        if (cpu->tb_flushed) {
-            cpu->tb_flushed = false;
-        } else if (!tb->invalid) {
+        if (!tb->invalid) {
             tb_add_jump(last_tb, tb_exit, tb);
         }
     }
@@ -605,7 +598,6 @@ int cpu_exec(CPUState *cpu)
                 break;
             }
 
-            atomic_mb_set(&cpu->tb_flushed, false); /* reset before first TB lookup */
             for(;;) {
                 cpu_handle_interrupt(cpu, &last_tb);
                 tb = tb_find(cpu, last_tb, tb_exit);
diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index dce95d9..c7f17f2 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -38,7 +38,7 @@ struct TBContext {
     QemuMutex tb_lock;
 
     /* statistics */
-    int tb_flush_count;
+    unsigned tb_flush_count;
     int tb_phys_invalidate_count;
 };
 
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 54a875e..3817a98 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -260,7 +260,6 @@ struct qemu_work_item {
  * @crash_occurred: Indicates the OS reported a crash (panic) for this CPU
  * @tcg_exit_req: Set to force TCG to stop executing linked TBs for this
  *           CPU and return to its top level loop.
- * @tb_flushed: Indicates the translation buffer has been flushed.
  * @singlestep_enabled: Flags for single-stepping.
  * @icount_extra: Instructions until next timer event.
  * @icount_decr: Number of cycles left, with interrupt flag in high bit.
@@ -313,7 +312,6 @@ struct CPUState {
     bool unplug;
     bool crash_occurred;
     bool exit_request;
-    bool tb_flushed;
     uint32_t interrupt_request;
     int singlestep_enabled;
     int64_t icount_extra;
diff --git a/translate-all.c b/translate-all.c
index b6663dc..ab657e7 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -832,12 +832,19 @@ static void page_flush_tb(void)
 }
 
 /* flush all the translation blocks */
-/* XXX: tb_flush is currently not thread safe */
-void tb_flush(CPUState *cpu)
+static void do_tb_flush(CPUState *cpu, void *data)
 {
-    if (!tcg_enabled()) {
-        return;
+    unsigned tb_flush_req = (unsigned) (uintptr_t) data;
+
+    tb_lock();
+
+    /* If it's already been done on request of another CPU,
+     * just retry.
+     */
+    if (atomic_read(&tcg_ctx.tb_ctx.tb_flush_count) != tb_flush_req) {
+        goto done;
     }
+
 #if defined(DEBUG_FLUSH)
     printf("qemu: flush code_size=%ld nb_tbs=%d avg_tb_size=%ld\n",
            (unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer),
@@ -856,7 +863,6 @@ void tb_flush(CPUState *cpu)
         for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
             atomic_set(&cpu->tb_jmp_cache[i], NULL);
         }
-        atomic_mb_set(&cpu->tb_flushed, true);
     }
 
     tcg_ctx.tb_ctx.nb_tbs = 0;
@@ -866,7 +872,19 @@ void tb_flush(CPUState *cpu)
     tcg_ctx.code_gen_ptr = tcg_ctx.code_gen_buffer;
     /* XXX: flush processor icache at this point if cache flush is
        expensive */
-    tcg_ctx.tb_ctx.tb_flush_count++;
+    atomic_inc(&tcg_ctx.tb_ctx.tb_flush_count);
+
+done:
+    tb_unlock();
+}
+
+void tb_flush(CPUState *cpu)
+{
+    if (tcg_enabled()) {
+        uintptr_t tb_flush_req = (uintptr_t)
+            atomic_read(&tcg_ctx.tb_ctx.tb_flush_count);
+        async_safe_run_on_cpu(cpu, do_tb_flush, (void *) tb_flush_req);
+    }
 }
 
 #ifdef DEBUG_TB_CHECK
@@ -1173,9 +1191,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
  buffer_overflow:
         /* flush must be done */
         tb_flush(cpu);
-        /* cannot fail at this point */
-        tb = tb_alloc(pc);
-        assert(tb != NULL);
+        mmap_unlock();
+        cpu_loop_exit(cpu);
     }
 
     gen_code_buf = tcg_ctx.code_gen_ptr;
@@ -1773,7 +1790,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
     qht_statistics_destroy(&hst);
 
     cpu_fprintf(f, "\nStatistics:\n");
-    cpu_fprintf(f, "TB flush count      %d\n", tcg_ctx.tb_ctx.tb_flush_count);
+    cpu_fprintf(f, "TB flush count      %d\n",
+            atomic_read(&tcg_ctx.tb_ctx.tb_flush_count));
     cpu_fprintf(f, "TB invalidate count %d\n",
             tcg_ctx.tb_ctx.tb_phys_invalidate_count);
     cpu_fprintf(f, "TLB flush count     %d\n", tlb_flush_count);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 12/12] cpus-common: lock-free fast path for cpu_exec_start/end
  2016-09-01 10:20 [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state Paolo Bonzini
                   ` (10 preceding siblings ...)
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 11/12] tcg: Make tb_flush() thread safe Paolo Bonzini
@ 2016-09-01 10:20 ` Paolo Bonzini
  2016-09-05 15:25   ` Alex Bennée
  2016-09-05 15:51 ` [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state Alex Bennée
  12 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-09-01 10:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Sergey Fedorov, Sergey Fedorov, Emilio G. Cota

Set cpu->running without taking the cpu_list lock, only look at it if
there is a concurrent exclusive section.  This requires adding a new
field to CPUState, which records whether a running CPU is being counted
in pending_cpus.  When an exclusive section is started concurrently with
cpu_exec_start, cpu_exec_start can use the new field to wait for the end
of the exclusive section.

This is a separate patch for easier bisection of issues.
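
For reference, the calling pattern that this fast path optimizes (an
illustrative sketch; the wrapper functions are made up, but the sequence
matches the cpu_loop functions in linux-user/main.c):

    #include "qemu/osdep.h"
    #include "qom/cpu.h"
    #include "exec/exec-all.h"

    /* vCPU thread: bracket every execution slice.  */
    static int vcpu_execute_once(CPUState *cpu)
    {
        int trapnr;

        cpu_exec_start(cpu);   /* lock-free unless an exclusive op is pending */
        trapnr = cpu_exec(cpu);
        cpu_exec_end(cpu);     /* wakes the exclusive waiter if we were counted */
        process_queued_cpu_work(cpu);
        return trapnr;
    }

    /* Any other thread that needs all vCPUs quiescent.  */
    static void with_all_vcpus_stopped(void)
    {
        start_exclusive();     /* waits until no vCPU is inside cpu_exec */
        /* ... exclusive work runs here ... */
        end_exclusive();       /* releases the CPU list lock, vCPUs resume */
    }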

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 cpus-common.c              |  72 +++++++++++++--
 docs/tcg-exclusive.promela | 224 +++++++++++++++++++++++++++++++++++++++++++++
 include/qom/cpu.h          |   5 +-
 3 files changed, 289 insertions(+), 12 deletions(-)
 create mode 100644 docs/tcg-exclusive.promela

diff --git a/cpus-common.c b/cpus-common.c
index 88cf5ec..443617a 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -170,8 +170,12 @@ void start_exclusive(void)
 
     /* Make all other cpus stop executing.  */
     pending_cpus = 1;
+
+    /* Write pending_cpus before reading other_cpu->running.  */
+    smp_mb();
     CPU_FOREACH(other_cpu) {
         if (other_cpu->running) {
+            other_cpu->has_waiter = true;
             pending_cpus++;
             qemu_cpu_kick(other_cpu);
         }
@@ -192,25 +196,73 @@ void end_exclusive(void)
 /* Wait for exclusive ops to finish, and begin cpu execution.  */
 void cpu_exec_start(CPUState *cpu)
 {
-    qemu_mutex_lock(&qemu_cpu_list_mutex);
-    exclusive_idle();
     cpu->running = true;
-    qemu_mutex_unlock(&qemu_cpu_list_mutex);
+
+    /* Write cpu->running before reading pending_cpus.  */
+    smp_mb();
+
+    /* 1. start_exclusive saw cpu->running == true and pending_cpus >= 1.
+     * After taking the lock we'll see cpu->has_waiter == true and run---not
+     * for long because start_exclusive kicked us.  cpu_exec_end will
+     * decrement pending_cpus and signal the waiter.
+     *
+     * 2. start_exclusive saw cpu->running == false but pending_cpus >= 1.
+     * This includes the case when an exclusive item is running now.
+     * Then we'll see cpu->has_waiter == false and wait for the item to
+     * complete.
+     *
+     * 3. pending_cpus == 0.  Then start_exclusive is definitely going to
+     * see cpu->running == true, and it will kick the CPU.
+     */
+    if (pending_cpus) {
+        qemu_mutex_lock(&qemu_cpu_list_mutex);
+        if (!cpu->has_waiter) {
+            /* Not counted in pending_cpus, let the exclusive item
+             * run.  Since we have the lock, set cpu->running to true
+             * while holding it instead of retrying.
+             */
+            cpu->running = false;
+            exclusive_idle();
+            cpu->running = true;
+        } else {
+            /* Counted in pending_cpus, go ahead.  */
+        }
+        qemu_mutex_unlock(&qemu_cpu_list_mutex);
+    }
 }
 
 /* Mark cpu as not executing, and release pending exclusive ops.  */
 void cpu_exec_end(CPUState *cpu)
 {
-    qemu_mutex_lock(&qemu_cpu_list_mutex);
     cpu->running = false;
-    if (pending_cpus > 1) {
-        pending_cpus--;
-        if (pending_cpus == 1) {
-            qemu_cond_signal(&exclusive_cond);
+
+    /* Write cpu->running before reading pending_cpus.  */
+    smp_mb();
+
+    /* 1. start_exclusive saw cpu->running == true.  Then it will increment
+     * pending_cpus and wait for exclusive_cond.  After taking the lock
+     * we'll see cpu->has_waiter == true.
+     *
+     * 2. start_exclusive saw cpu->running == false but here pending_cpus >= 1.
+     * This includes the case when an exclusive item is running now.
+     * Then we'll see cpu->has_waiter == false and not touch pending_cpus,
+     * but will run exclusive_idle to wait for the item to complete.
+     *
+     * 3. pending_cpus == 0.  Then start_exclusive is definitely going to
+     * see cpu->running == false, and it can ignore this CPU until the
+     * next cpu_exec_start.
+     */
+    if (pending_cpus) {
+        qemu_mutex_lock(&qemu_cpu_list_mutex);
+        if (cpu->has_waiter) {
+            cpu->has_waiter = false;
+            if (--pending_cpus == 1) {
+                qemu_cond_signal(&exclusive_cond);
+            }
+            exclusive_idle();
         }
+        qemu_mutex_unlock(&qemu_cpu_list_mutex);
     }
-    exclusive_idle();
-    qemu_mutex_unlock(&qemu_cpu_list_mutex);
 }
 
 static void async_safe_run_on_cpu_fn(CPUState *cpu, void *data)
diff --git a/docs/tcg-exclusive.promela b/docs/tcg-exclusive.promela
new file mode 100644
index 0000000..293b530
--- /dev/null
+++ b/docs/tcg-exclusive.promela
@@ -0,0 +1,224 @@
+/*
+ * This model describes the implementation of exclusive sections in
+ * cpus-common.c (start_exclusive, end_exclusive, cpu_exec_start,
+ * cpu_exec_end).
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This file is in the public domain.  If you really want a license,
+ * the WTFPL will do.
+ *
+ * To verify it:
+ *     spin -a docs/tcg-exclusive.promela
+ *     ./a.out -a
+ *
+ * Tunable processor macros: N_CPUS, N_EXCLUSIVE, N_CYCLES, USE_MUTEX,
+ *                           TEST_EXPENSIVE.
+ */
+
+// Define the missing parameters for the model
+#ifndef N_CPUS
+#define N_CPUS 2
+#warning defaulting to 2 CPU processes
+#endif
+
+// the expensive test is not so expensive for <= 2 CPUs
+// If the mutex is used, it's also cheap (300 MB / 4 seconds) for 3 CPUs
+// For 3 CPUs and the lock-free option it needs 1.5 GB of RAM
+#if N_CPUS <= 2 || (N_CPUS <= 3 && defined USE_MUTEX)
+#define TEST_EXPENSIVE
+#endif
+
+#ifndef N_EXCLUSIVE
+# if !defined N_CYCLES || N_CYCLES <= 1 || defined TEST_EXPENSIVE
+#  define N_EXCLUSIVE     2
+#  warning defaulting to 2 concurrent exclusive sections
+# else
+#  define N_EXCLUSIVE     1
+#  warning defaulting to 1 concurrent exclusive sections
+# endif
+#endif
+#ifndef N_CYCLES
+# if N_EXCLUSIVE <= 1 || defined TEST_EXPENSIVE
+#  define N_CYCLES        2
+#  warning defaulting to 2 CPU cycles
+# else
+#  define N_CYCLES        1
+#  warning defaulting to 1 CPU cycles
+# endif
+#endif
+
+
+// synchronization primitives.  condition variables require a
+// process-local "cond_t saved;" variable.
+
+#define mutex_t              byte
+#define MUTEX_LOCK(m)        atomic { m == 0 -> m = 1 }
+#define MUTEX_UNLOCK(m)      m = 0
+
+#define cond_t               int
+#define COND_WAIT(c, m)      {                                  \
+                               saved = c;                       \
+                               MUTEX_UNLOCK(m);                 \
+                               c != saved -> MUTEX_LOCK(m);     \
+                             }
+#define COND_BROADCAST(c)    c++
+
+// this is the logic from cpus-common.c
+
+mutex_t mutex;
+cond_t exclusive_cond;
+cond_t exclusive_resume;
+byte pending_cpus;
+
+byte running[N_CPUS];
+byte has_waiter[N_CPUS];
+
+#define exclusive_idle()                                          \
+  do                                                              \
+      :: pending_cpus -> COND_WAIT(exclusive_resume, mutex);      \
+      :: else         -> break;                                   \
+  od
+
+#define start_exclusive()                                         \
+    MUTEX_LOCK(mutex);                                            \
+    exclusive_idle();                                             \
+    pending_cpus = 1;                                             \
+                                                                  \
+    i = 0;                                                        \
+    do                                                            \
+       :: i < N_CPUS -> {                                         \
+           if                                                     \
+              :: running[i] -> has_waiter[i] = 1; pending_cpus++; \
+              :: else       -> skip;                              \
+           fi;                                                    \
+           i++;                                                   \
+       }                                                          \
+       :: else -> break;                                          \
+    od;                                                           \
+                                                                  \
+    do                                                            \
+      :: pending_cpus > 1 -> COND_WAIT(exclusive_cond, mutex);    \
+      :: else             -> break;                               \
+    od;
+
+#define end_exclusive()                                           \
+    pending_cpus = 0;                                             \
+    COND_BROADCAST(exclusive_resume);                             \
+    MUTEX_UNLOCK(mutex);
+
+#ifdef USE_MUTEX
+// Simple version using mutexes
+#define cpu_exec_start(id)                                                   \
+    MUTEX_LOCK(mutex);                                                       \
+    exclusive_idle();                                                        \
+    running[id] = 1;                                                         \
+    MUTEX_UNLOCK(mutex);
+
+#define cpu_exec_end(id)                                                     \
+    MUTEX_LOCK(mutex);                                                       \
+    running[id] = 0;                                                         \
+    if                                                                       \
+        :: pending_cpus -> {                                                 \
+            pending_cpus--;                                                  \
+            if                                                               \
+                :: pending_cpus == 1 -> COND_BROADCAST(exclusive_cond);      \
+                :: else -> skip;                                             \
+            fi;                                                              \
+            exclusive_idle();                                                \
+        }                                                                    \
+        :: else -> skip;                                                     \
+    fi;                                                                      \
+    MUTEX_UNLOCK(mutex);
+#else
+// Wait-free fast path, only needs mutex when concurrent with
+// an exclusive section
+#define cpu_exec_start(id)                                                   \
+    running[id] = 1;                                                         \
+    if                                                                       \
+        :: pending_cpus -> {                                                 \
+            MUTEX_LOCK(mutex);                                               \
+            if                                                               \
+                :: !has_waiter[id] -> {                                      \
+                    running[id] = 0;                                         \
+                    exclusive_idle();                                        \
+                    running[id] = 1;                                         \
+                }                                                            \
+                :: else -> skip;                                             \
+            fi;                                                              \
+            MUTEX_UNLOCK(mutex);                                             \
+        }                                                                    \
+        :: else -> skip;                                                     \
+    fi;
+
+#define cpu_exec_end(id)                                                     \
+    running[id] = 0;                                                         \
+    if                                                                       \
+        :: pending_cpus -> {                                                 \
+            MUTEX_LOCK(mutex);                                               \
+            if                                                               \
+                :: has_waiter[id] -> {                                       \
+                    has_waiter[id] = 0;                                      \
+                    pending_cpus--;                                          \
+                    if                                                       \
+                        :: pending_cpus == 1 -> COND_BROADCAST(exclusive_cond); \
+                        :: else -> skip;                                     \
+                    fi;                                                      \
+                    exclusive_idle();                                        \
+                }                                                            \
+                :: else -> skip;                                             \
+            fi;                                                              \
+            MUTEX_UNLOCK(mutex);                                             \
+        }                                                                    \
+        :: else -> skip;                                                     \
+    fi
+#endif
+
+// Promela processes
+
+byte done_cpu;
+byte in_cpu;
+active[N_CPUS] proctype cpu()
+{
+    byte id = _pid % N_CPUS;
+    byte cycles = 0;
+    cond_t saved;
+
+    do
+       :: cycles == N_CYCLES -> break;
+       :: else -> {
+           cycles++;
+           cpu_exec_start(id)
+           in_cpu++;
+           done_cpu++;
+           in_cpu--;
+           cpu_exec_end(id)
+       }
+    od;
+}
+
+byte done_exclusive;
+byte in_exclusive;
+active[N_EXCLUSIVE] proctype exclusive()
+{
+    cond_t saved;
+    byte i;
+
+    start_exclusive();
+    in_exclusive = 1;
+    done_exclusive++;
+    in_exclusive = 0;
+    end_exclusive();
+}
+
+#define LIVENESS   (done_cpu == N_CPUS * N_CYCLES && done_exclusive == N_EXCLUSIVE)
+#define SAFETY     !(in_exclusive && in_cpu)
+
+never {    /* ! ([] SAFETY && <> [] LIVENESS) */
+    do
+    // once the liveness property is satisfied, this is not executable
+    // and the never clause is not accepted
+    :: ! LIVENESS -> accept_liveness: skip
+    :: 1          -> assert(SAFETY)
+    od;
+}
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 3817a98..14cf97c 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -249,7 +249,8 @@ struct qemu_work_item {
  * @nr_threads: Number of threads within this CPU.
  * @numa_node: NUMA node this CPU is belonging to.
  * @host_tid: Host thread ID.
- * @running: #true if CPU is currently running;
+ * @running: #true if CPU is currently running (lockless).
+ * @has_waiter: #true if a CPU is currently waiting for the cpu_exec_end;
  * valid under cpu_list_lock.
  * @created: Indicates whether the CPU thread has been successfully created.
  * @interrupt_request: Indicates a pending interrupt request.
@@ -303,7 +304,7 @@ struct CPUState {
 #endif
     int thread_id;
     uint32_t host_tid;
-    bool running;
+    bool running, has_waiter;
     struct QemuCond *halt_cond;
     bool thread_kicked;
     bool created;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 06/12] cpus-common: move CPU list management to common code
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 06/12] cpus-common: move CPU list management to common code Paolo Bonzini
@ 2016-09-05 10:05   ` Alex Bennée
  2016-09-05 10:29     ` Paolo Bonzini
  0 siblings, 1 reply; 26+ messages in thread
From: Alex Bennée @ 2016-09-05 10:05 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Sergey Fedorov, Sergey Fedorov, Emilio G. Cota


Paolo Bonzini <pbonzini@redhat.com> writes:

> Add a mutex for the CPU list to system emulation, as it will be used to
> manage safe work.  Abstract manipulation of the CPU list in new functions
> cpu_list_add and cpu_list_remove.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
<snip>
> index a440bcb..73d0c2f 100644

What tree are you based off? git-am is having trouble applying this
patch and the -3way fallback gets very confused:

    Applying: cpus-common: move CPU list management to common code
    fatal: sha1 information is lacking or useless (exec.c).
    error: could not build fake ancestor
    Patch failed at 0001 cpus-common: move CPU list management to common code
    The copy of the patch that failed is found in: .git/rebase-apply/patch
    When you have resolved this problem, run "git am --continue".
    If you prefer to skip this patch, run "git am --skip" instead.
    To restore the original branch and stop patching, run "git am --abort".

I think this is because it needs to find real SHA1s to replay the diff
against.
Your root commit (0308431) doesn't seem to exist in any public tree.

> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -86,7 +86,7 @@ all: $(PROGS) stap
>  # cpu emulator library
>  obj-y = exec.o translate-all.o cpu-exec.o
>  obj-y += translate-common.o
> -obj-y += cpu-exec-common.o
> +obj-y += cpu-exec-common.o cpus-common.o
>  obj-y += tcg/tcg.o tcg/tcg-op.o tcg/optimize.o
>  obj-$(CONFIG_TCG_INTERPRETER) += tci.o
>  obj-y += tcg/tcg-common.o
> diff --git a/bsd-user/main.c b/bsd-user/main.c
> index b4a0a00..461641a 100644
> --- a/bsd-user/main.c
> +++ b/bsd-user/main.c
> @@ -94,14 +94,6 @@ void fork_end(int child)
>      }
>  }
>
> -void cpu_list_lock(void)
> -{
> -}
> -
> -void cpu_list_unlock(void)
> -{
> -}
> -
>  #ifdef TARGET_I386
>  /***********************************************************/
>  /* CPUX86 core interface */
> @@ -747,6 +739,7 @@ int main(int argc, char **argv)
>      if (argc <= 1)
>          usage();
>
> +    qemu_init_cpu_list();
>      module_call_init(MODULE_INIT_QOM);
>
>      if ((envlist = envlist_create()) == NULL) {
> diff --git a/cpus-common.c b/cpus-common.c
> new file mode 100644
> index 0000000..642e923
> --- /dev/null
> +++ b/cpus-common.c
> @@ -0,0 +1,74 @@
> +/*
> + * CPU thread main loop - common bits for user and system mode emulation
> + *
> + *  Copyright (c) 2003-2005 Fabrice Bellard
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "cpu.h"
> +#include "sysemu/cpus.h"
> +#include "exec/memory-internal.h"
> +
> +static QemuMutex qemu_cpu_list_mutex;
> +
> +void qemu_init_cpu_list(void)
> +{
> +    qemu_mutex_init(&qemu_cpu_list_mutex);
> +}
> +
> +void cpu_list_lock(void)
> +{
> +    qemu_mutex_lock(&qemu_cpu_list_mutex);
> +}
> +
> +void cpu_list_unlock(void)
> +{
> +    qemu_mutex_unlock(&qemu_cpu_list_mutex);
> +}
> +
> +static int cpu_get_free_index(void)
> +{
> +    CPUState *some_cpu;
> +    int cpu_index = 0;
> +
> +    CPU_FOREACH(some_cpu) {
> +        cpu_index++;
> +    }
> +    return cpu_index;
> +}
> +
> +void cpu_list_add(CPUState *cpu)
> +{
> +    qemu_mutex_lock(&qemu_cpu_list_mutex);
> +    if (cpu->cpu_index == UNASSIGNED_CPU_INDEX) {
> +        cpu->cpu_index = cpu_get_free_index();
> +        assert(cpu->cpu_index != UNASSIGNED_CPU_INDEX);
> +    }
> +
> +    QTAILQ_INSERT_TAIL(&cpus, cpu, node);
> +    qemu_mutex_unlock(&qemu_cpu_list_mutex);
> +}
> +
> +void cpu_list_remove(CPUState *cpu)
> +{
> +    /* ??? How is this serialized against CPU_FOREACH? */
> +    qemu_mutex_lock(&qemu_cpu_list_mutex);
> +    if (QTAILQ_IN_USE(cpu, node)) {
> +        QTAILQ_REMOVE(&cpus, cpu, node);
> +    }
> +    cpu->cpu_index = UNASSIGNED_CPU_INDEX;
> +    qemu_mutex_unlock(&qemu_cpu_list_mutex);
> +}
> diff --git a/exec.c b/exec.c
> index 286f5cd..784990c 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -598,31 +598,11 @@ AddressSpace *cpu_get_address_space(CPUState *cpu, int asidx)
>  }
>  #endif
>
> -static int cpu_get_free_index(void)
> -{
> -    CPUState *some_cpu;
> -    int cpu_index = 0;
> -
> -    CPU_FOREACH(some_cpu) {
> -        cpu_index++;
> -    }
> -    return cpu_index;
> -}
> -
>  void cpu_exec_exit(CPUState *cpu)
>  {
>      CPUClass *cc = CPU_GET_CLASS(cpu);
>
> -    cpu_list_lock();
> -    if (!QTAILQ_IN_USE(cpu, node)) {
> -        /* there is nothing to undo since cpu_exec_init() hasn't been called */
> -        cpu_list_unlock();
> -        return;
> -    }
> -
> -    QTAILQ_REMOVE(&cpus, cpu, node);
> -    cpu->cpu_index = UNASSIGNED_CPU_INDEX;
> -    cpu_list_unlock();
> +    cpu_list_remove(cpu);
>
>      if (cc->vmsd != NULL) {
>          vmstate_unregister(NULL, cc->vmsd, cpu);
> @@ -658,13 +638,7 @@ void cpu_exec_init(CPUState *cpu, Error **errp)
>      object_ref(OBJECT(cpu->memory));
>  #endif
>
> -    cpu_list_lock();
> -    if (cpu->cpu_index == UNASSIGNED_CPU_INDEX) {
> -        cpu->cpu_index = cpu_get_free_index();
> -        assert(cpu->cpu_index != UNASSIGNED_CPU_INDEX);
> -    }
> -    QTAILQ_INSERT_TAIL(&cpus, cpu, node);
> -    cpu_list_unlock();
> +    cpu_list_add(cpu);
>
>  #ifndef CONFIG_USER_ONLY
>      if (qdev_get_vmsd(DEVICE(cpu)) == NULL) {
> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> index b6a7059..c694d16 100644
> --- a/include/exec/cpu-all.h
> +++ b/include/exec/cpu-all.h
> @@ -187,6 +187,10 @@ void address_space_stq(AddressSpace *as, hwaddr addr, uint64_t val,
>                              MemTxAttrs attrs, MemTxResult *result);
>  #endif
>
> +/* The CPU list lock nests outside tb_lock/tb_unlock.  */
> +void cpu_list_lock(void);
> +void cpu_list_unlock(void);
> +
>  /* page related stuff */
>
>  #define TARGET_PAGE_SIZE (1 << TARGET_PAGE_BITS)
> diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
> index 952bcfe..1d41452 100644
> --- a/include/exec/cpu-common.h
> +++ b/include/exec/cpu-common.h
> @@ -23,6 +23,8 @@ typedef struct CPUListState {
>      FILE *file;
>  } CPUListState;
>
> +void qemu_init_cpu_list(void);
> +
>  #if !defined(CONFIG_USER_ONLY)
>
>  enum device_endian {
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index a0e87be..36ab8b6 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -56,17 +56,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>                                target_ulong pc, target_ulong cs_base,
>                                uint32_t flags,
>                                int cflags);
> -#if defined(CONFIG_USER_ONLY)
> -void cpu_list_lock(void);
> -void cpu_list_unlock(void);
> -#else
> -static inline void cpu_list_unlock(void)
> -{
> -}
> -static inline void cpu_list_lock(void)
> -{
> -}
> -#endif
>
>  void cpu_exec_init(CPUState *cpu, Error **errp);
>  void QEMU_NORETURN cpu_loop_exit(CPUState *cpu);
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 4aa9e61..ea3233f 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -545,6 +545,18 @@ static inline int cpu_asidx_from_attrs(CPUState *cpu, MemTxAttrs attrs)
>  #endif
>
>  /**
> + * cpu_list_add:
> + * @cpu: The CPU to be added to the list of CPUs.
> + */
> +void cpu_list_add(CPUState *cpu);
> +
> +/**
> + * cpu_list_remove:
> + * @cpu: The CPU to be removed from the list of CPUs.
> + */
> +void cpu_list_remove(CPUState *cpu);
> +
> +/**
>   * cpu_reset:
>   * @cpu: The CPU whose state is to be reset.
>   */
> diff --git a/linux-user/main.c b/linux-user/main.c
> index 6195c68..bd5b58f 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -111,7 +111,6 @@ int cpu_get_pic_interrupt(CPUX86State *env)
>     We don't require a full sync, only that no cpus are executing guest code.
>     The alternative is to map target atomic ops onto host equivalents,
>     which requires quite a lot of per host/target work.  */
> -static QemuMutex cpu_list_mutex;
>  static QemuMutex exclusive_lock;
>  static QemuCond exclusive_cond;
>  static QemuCond exclusive_resume;
> @@ -119,7 +118,6 @@ static int pending_cpus;
>
>  void qemu_init_cpu_loop(void)
>  {
> -    qemu_mutex_init(&cpu_list_mutex);
>      qemu_mutex_init(&exclusive_lock);
>      qemu_cond_init(&exclusive_cond);
>      qemu_cond_init(&exclusive_resume);
> @@ -128,6 +126,7 @@ void qemu_init_cpu_loop(void)
>  /* Make sure everything is in a consistent state for calling fork().  */
>  void fork_start(void)
>  {
> +    cpu_list_lock();
>      qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
>      qemu_mutex_lock(&exclusive_lock);
>      mmap_fork_start();
> @@ -147,14 +146,15 @@ void fork_end(int child)
>          }
>          pending_cpus = 0;
>          qemu_mutex_init(&exclusive_lock);
> -        qemu_mutex_init(&cpu_list_mutex);
>          qemu_cond_init(&exclusive_cond);
>          qemu_cond_init(&exclusive_resume);
>          qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
> +        qemu_init_cpu_list();
>          gdbserver_fork(thread_cpu);
>      } else {
>          qemu_mutex_unlock(&exclusive_lock);
>          qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
> +        cpu_list_unlock();
>      }
>  }
>
> @@ -221,16 +221,6 @@ static inline void cpu_exec_end(CPUState *cpu)
>      qemu_mutex_unlock(&exclusive_lock);
>  }
>
> -void cpu_list_lock(void)
> -{
> -    qemu_mutex_lock(&cpu_list_mutex);
> -}
> -
> -void cpu_list_unlock(void)
> -{
> -    qemu_mutex_unlock(&cpu_list_mutex);
> -}
> -
>
>  #ifdef TARGET_I386
>  /***********************************************************/
> @@ -4240,6 +4230,7 @@ int main(int argc, char **argv, char **envp)
>      int ret;
>      int execfd;
>
> +    qemu_init_cpu_list();
>      qemu_init_cpu_loop();
>      module_call_init(MODULE_INIT_QOM);
>
> diff --git a/vl.c b/vl.c
> index b3c80d5..1445935 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -2965,6 +2965,7 @@ int main(int argc, char **argv, char **envp)
>      Error *err = NULL;
>      bool list_data_dirs = false;
>
> +    qemu_init_cpu_list();
>      qemu_init_cpu_loop();
>      qemu_mutex_lock_iothread();


--
Alex Bennée

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 06/12] cpus-common: move CPU list management to common code
  2016-09-05 10:05   ` Alex Bennée
@ 2016-09-05 10:29     ` Paolo Bonzini
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-09-05 10:29 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel, Sergey Fedorov, Sergey Fedorov, Emilio G. Cota



On 05/09/2016 12:05, Alex Bennée wrote:
> What tree are you based off? git-am is having trouble applying this
> patch and the -3way fall back gets very confused:
> 
>     Applying: cpus-common: move CPU list management to common code
>     fatal: sha1 information is lacking or useless (exec.c).
>     error: could not build fake ancestor
>     Patch failed at 0001 cpus-common: move CPU list management to common code
>     The copy of the patch that failed is found in: .git/rebase-apply/patch
>     When you have resolved this problem, run "git am --continue".
>     If you prefer to skip this patch, run "git am --skip" instead.
>     To restore the original branch and stop patching, run "git am --abort".
> 
> I think this is because it looks to find real SHA1's to replay the diff.
> Your root commit (0308431) doesn't seem to exist in any public tree.

I have this patch in my tree:

Author: Igor Mammedov <imammedo@redhat.com>  2016-07-25 14:47:12
Committer: Paolo Bonzini <pbonzini@redhat.com>  2016-08-30 14:35:52
Parent: 7846e37fa292b44fbc2be863c8b95a8ae7abeea8 (optionrom: cope with
multiple -O options)
Child:  8ea3d1b2bfcfb2637dcb38917650e0847bc31356 (util/qemu-sockets:
revert Yoda Conditions to normal)
Branches: for-2.6, remotes/mirror/for-2.6, remotes/mirror/safe-work
Follows: v2.7.0-rc4
Precedes:

    qtail: clean up direct access to tqe_prev field

    instead of accessing tqe_prev field directly outside
    of queue.h use macros to check if element is in list
    and make sure that after element is removed from list
    tqe_prev field could be used to do the same check.

    Signed-off-by: Igor Mammedov <imammedo@redhat.com>
    Message-Id: <1469450832-84343-1-git-send-email-imammedo@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


Paolo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 08/12] cpus-common: move exclusive work infrastructure from
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 08/12] cpus-common: move exclusive work infrastructure from Paolo Bonzini
@ 2016-09-05 14:55   ` Alex Bennée
  2016-09-05 14:57     ` Paolo Bonzini
  0 siblings, 1 reply; 26+ messages in thread
From: Alex Bennée @ 2016-09-05 14:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, Sergey Fedorov, Sergey Fedorov, Emilio G. Cota linux-user


Subject title seems to be truncated.

Paolo Bonzini <pbonzini@redhat.com> writes:

> This will serve as the base for async_safe_run_on_cpu.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  bsd-user/main.c   | 17 -----------
>  cpus-common.c     | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  cpus.c            |  2 ++
>  include/qom/cpu.h | 40 ++++++++++++++++++++++++-
>  linux-user/main.c | 87 -------------------------------------------------------
>  5 files changed, 123 insertions(+), 105 deletions(-)
>
> diff --git a/bsd-user/main.c b/bsd-user/main.c
> index 125067a..f29e0e3 100644
> --- a/bsd-user/main.c
> +++ b/bsd-user/main.c
> @@ -66,23 +66,6 @@ int cpu_get_pic_interrupt(CPUX86State *env)
>  }
>  #endif
>
> -/* These are no-ops because we are not threadsafe.  */
> -static inline void cpu_exec_start(CPUState *cpu)
> -{
> -}
> -
> -static inline void cpu_exec_end(CPUState *cpu)
> -{
> -}
> -
> -static inline void start_exclusive(void)
> -{
> -}
> -
> -static inline void end_exclusive(void)
> -{
> -}
> -

Does this mean BSD user now gets exclusive support as well or is other
plumbing required for the tb_flush to work?

>  void fork_start(void)
>  {
>  }
> diff --git a/cpus-common.c b/cpus-common.c
> index 12729e5..47f7c06 100644
> --- a/cpus-common.c
> +++ b/cpus-common.c
> @@ -23,11 +23,21 @@
>  #include "exec/memory-internal.h"
>
>  static QemuMutex qemu_cpu_list_mutex;
> +static QemuCond exclusive_cond;
> +static QemuCond exclusive_resume;
>  static QemuCond qemu_work_cond;
>
> +static int pending_cpus;
> +
>  void qemu_init_cpu_list(void)
>  {
> +    /* This is needed because qemu_init_cpu_list is also called by the
> +     * child process in a fork.  */
> +    pending_cpus = 0;
> +
>      qemu_mutex_init(&qemu_cpu_list_mutex);
> +    qemu_cond_init(&exclusive_cond);
> +    qemu_cond_init(&exclusive_resume);
>      qemu_cond_init(&qemu_work_cond);
>  }
>
> @@ -52,6 +62,12 @@ static int cpu_get_free_index(void)
>      return cpu_index;
>  }
>
> +static void finish_safe_work(CPUState *cpu)
> +{
> +    cpu_exec_start(cpu);
> +    cpu_exec_end(cpu);
> +}
> +
>  void cpu_list_add(CPUState *cpu)
>  {
>      qemu_mutex_lock(&qemu_cpu_list_mutex);
> @@ -62,6 +78,8 @@ void cpu_list_add(CPUState *cpu)
>
>      QTAILQ_INSERT_TAIL(&cpus, cpu, node);
>      qemu_mutex_unlock(&qemu_cpu_list_mutex);
> +
> +    finish_safe_work(cpu);
>  }
>
>  void cpu_list_remove(CPUState *cpu)
> @@ -131,6 +149,70 @@ void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>      queue_work_on_cpu(cpu, wi);
>  }
>
> +/* Wait for pending exclusive operations to complete.  The exclusive lock
> +   must be held.  */
> +static inline void exclusive_idle(void)
> +{
> +    while (pending_cpus) {
> +        qemu_cond_wait(&exclusive_resume, &qemu_cpu_list_mutex);
> +    }
> +}
> +
> +/* Start an exclusive operation.
> +   Must only be called from outside cpu_exec, takes
> +   qemu_cpu_list_mutex.   */
> +void start_exclusive(void)
> +{
> +    CPUState *other_cpu;
> +
> +    qemu_mutex_lock(&qemu_cpu_list_mutex);
> +    exclusive_idle();
> +
> +    /* Make all other cpus stop executing.  */
> +    pending_cpus = 1;
> +    CPU_FOREACH(other_cpu) {
> +        if (other_cpu->running) {
> +            pending_cpus++;
> +            qemu_cpu_kick(other_cpu);
> +        }
> +    }
> +    if (pending_cpus > 1) {
> +        qemu_cond_wait(&exclusive_cond, &qemu_cpu_list_mutex);
> +    }
> +}
> +
> +/* Finish an exclusive operation.  Releases qemu_cpu_list_mutex.  */
> +void end_exclusive(void)
> +{
> +    pending_cpus = 0;
> +    qemu_cond_broadcast(&exclusive_resume);
> +    qemu_mutex_unlock(&qemu_cpu_list_mutex);
> +}
> +
> +/* Wait for exclusive ops to finish, and begin cpu execution.  */
> +void cpu_exec_start(CPUState *cpu)
> +{
> +    qemu_mutex_lock(&qemu_cpu_list_mutex);
> +    exclusive_idle();
> +    cpu->running = true;
> +    qemu_mutex_unlock(&qemu_cpu_list_mutex);
> +}
> +
> +/* Mark cpu as not executing, and release pending exclusive ops.  */
> +void cpu_exec_end(CPUState *cpu)
> +{
> +    qemu_mutex_lock(&qemu_cpu_list_mutex);
> +    cpu->running = false;
> +    if (pending_cpus > 1) {
> +        pending_cpus--;
> +        if (pending_cpus == 1) {
> +            qemu_cond_signal(&exclusive_cond);
> +        }
> +    }
> +    exclusive_idle();
> +    qemu_mutex_unlock(&qemu_cpu_list_mutex);
> +}
> +
>  void process_queued_cpu_work(CPUState *cpu)
>  {
>      struct qemu_work_item *wi;
> diff --git a/cpus.c b/cpus.c
> index e1bdc16..b024604 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1456,7 +1456,9 @@ static int tcg_cpu_exec(CPUState *cpu)
>          cpu->icount_decr.u16.low = decr;
>          cpu->icount_extra = count;
>      }
> +    cpu_exec_start(cpu);
>      ret = cpu_exec(cpu);
> +    cpu_exec_end(cpu);
>  #ifdef CONFIG_PROFILER
>      tcg_time += profile_getclock() - ti;
>  #endif
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index d7688f6..0e04e8f 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -249,7 +249,8 @@ struct qemu_work_item {
>   * @nr_threads: Number of threads within this CPU.
>   * @numa_node: NUMA node this CPU is belonging to.
>   * @host_tid: Host thread ID.
> - * @running: #true if CPU is currently running (usermode).
> + * @running: #true if CPU is currently running;
> + * valid under cpu_list_lock.
>   * @created: Indicates whether the CPU thread has been successfully created.
>   * @interrupt_request: Indicates a pending interrupt request.
>   * @halted: Nonzero if the CPU is in suspended state.
> @@ -826,6 +827,43 @@ void cpu_remove_sync(CPUState *cpu);
>  void process_queued_cpu_work(CPUState *cpu);
>
>  /**
> + * cpu_exec_start:
> + * @cpu: The CPU for the current thread.
> + *
> + * Record that a CPU has started execution and can be interrupted with
> + * cpu_exit.
> + */
> +void cpu_exec_start(CPUState *cpu);
> +
> +/**
> + * cpu_exec_end:
> + * @cpu: The CPU for the current thread.
> + *
> + * Record that a CPU has stopped execution and safe work can be executed
> + * without interrupting it.
> + */
> +void cpu_exec_end(CPUState *cpu);
> +
> +/**
> + * start_exclusive:
> + * @cpu: The CPU for the current thread.
> + *
> + * Similar to async_safe_work_run_on_cpu, but the work item is only
> + * run on one CPU and is between start_exclusive and end_exclusive.
> + * Returns with the CPU list lock taken (which nests outside all
> + * other locks except the BQL).
> + */
> +void start_exclusive(void);

Hmm the comment here looks as though it has been transplanted or is
somehow mangled. The function doesn't take @cpu and isn't like
async_safe_work_run_on_cpu.
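
Maybe something along these lines would be clearer (just a sketch):

    /**
     * start_exclusive:
     *
     * Wait for a concurrent exclusive section to finish, then start one of
     * our own: kick all other CPUs out of cpu_exec and wait until they have
     * stopped running.  Returns with the CPU list lock taken (which nests
     * outside all other locks except the BQL).
     */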

> +
> +/**
> + * end_exclusive:
> + *
> + * Concludes an exclusive execution section started by start_exclusive.
> + * Releases the CPU list lock.
> + */
> +void end_exclusive(void);
> +
> +/**
>   * qemu_init_vcpu:
>   * @cpu: The vCPU to initialize.
>   *
> diff --git a/linux-user/main.c b/linux-user/main.c
> index 4972bbe..cecab8d 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -107,28 +107,11 @@ int cpu_get_pic_interrupt(CPUX86State *env)
>  /***********************************************************/
>  /* Helper routines for implementing atomic operations.  */
>
> -/* To implement exclusive operations we force all cpus to syncronise.
> -   We don't require a full sync, only that no cpus are executing guest code.
> -   The alternative is to map target atomic ops onto host equivalents,
> -   which requires quite a lot of per host/target work.  */
> -static QemuMutex exclusive_lock;
> -static QemuCond exclusive_cond;
> -static QemuCond exclusive_resume;
> -static int pending_cpus;
> -
> -void qemu_init_cpu_loop(void)
> -{
> -    qemu_mutex_init(&exclusive_lock);
> -    qemu_cond_init(&exclusive_cond);
> -    qemu_cond_init(&exclusive_resume);
> -}
> -
>  /* Make sure everything is in a consistent state for calling fork().  */
>  void fork_start(void)
>  {
>      cpu_list_lock();
>      qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
> -    qemu_mutex_lock(&exclusive_lock);
>      mmap_fork_start();
>  }
>
> @@ -144,84 +127,15 @@ void fork_end(int child)
>                  QTAILQ_REMOVE(&cpus, cpu, node);
>              }
>          }
> -        pending_cpus = 0;
> -        qemu_mutex_init(&exclusive_lock);
> -        qemu_cond_init(&exclusive_cond);
> -        qemu_cond_init(&exclusive_resume);
>          qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
>          qemu_init_cpu_list();
>          gdbserver_fork(thread_cpu);
>      } else {
> -        qemu_mutex_unlock(&exclusive_lock);
>          qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
>          cpu_list_unlock();
>      }
>  }
>
> -/* Wait for pending exclusive operations to complete.  The exclusive lock
> -   must be held.  */
> -static inline void exclusive_idle(void)
> -{
> -    while (pending_cpus) {
> -        qemu_cond_wait(&exclusive_resume, &exclusive_lock);
> -    }
> -}
> -
> -/* Start an exclusive operation.
> -   Must only be called from outside cpu_exec.   */
> -static inline void start_exclusive(void)
> -{
> -    CPUState *other_cpu;
> -
> -    qemu_mutex_lock(&exclusive_lock);
> -    exclusive_idle();
> -
> -    pending_cpus = 1;
> -    /* Make all other cpus stop executing.  */
> -    CPU_FOREACH(other_cpu) {
> -        if (other_cpu->running) {
> -            pending_cpus++;
> -            cpu_exit(other_cpu);
> -        }
> -    }
> -    if (pending_cpus > 1) {
> -        qemu_cond_wait(&exclusive_cond, &exclusive_lock);
> -    }
> -}
> -
> -/* Finish an exclusive operation.  */
> -static inline void __attribute__((unused)) end_exclusive(void)
> -{
> -    pending_cpus = 0;
> -    qemu_cond_broadcast(&exclusive_resume);
> -    qemu_mutex_unlock(&exclusive_lock);
> -}
> -
> -/* Wait for exclusive ops to finish, and begin cpu execution.  */
> -static inline void cpu_exec_start(CPUState *cpu)
> -{
> -    qemu_mutex_lock(&exclusive_lock);
> -    exclusive_idle();
> -    cpu->running = true;
> -    qemu_mutex_unlock(&exclusive_lock);
> -}
> -
> -/* Mark cpu as not executing, and release pending exclusive ops.  */
> -static inline void cpu_exec_end(CPUState *cpu)
> -{
> -    qemu_mutex_lock(&exclusive_lock);
> -    cpu->running = false;
> -    if (pending_cpus > 1) {
> -        pending_cpus--;
> -        if (pending_cpus == 1) {
> -            qemu_cond_signal(&exclusive_cond);
> -        }
> -    }
> -    exclusive_idle();
> -    qemu_mutex_unlock(&exclusive_lock);
> -}
> -
> -
>  #ifdef TARGET_I386
>  /***********************************************************/
>  /* CPUX86 core interface */
> @@ -4256,7 +4170,6 @@ int main(int argc, char **argv, char **envp)
>      int execfd;
>
>      qemu_init_cpu_list();
> -    qemu_init_cpu_loop();
>      module_call_init(MODULE_INIT_QOM);
>
>      if ((envlist = envlist_create()) == NULL) {


Aside from a few niggles about comments it looks good to me. Fix those
and you can have a:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 08/12] cpus-common: move exclusive work infrastructure from
  2016-09-05 14:55   ` Alex Bennée
@ 2016-09-05 14:57     ` Paolo Bonzini
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-09-05 14:57 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel, Sergey Fedorov, Sergey Fedorov, Emilio G. Cota linux-user



On 05/09/2016 16:55, Alex Bennée wrote:
>> > -/* These are no-ops because we are not threadsafe.  */
>> > -static inline void cpu_exec_start(CPUState *cpu)
>> > -{
>> > -}
>> > -
>> > -static inline void cpu_exec_end(CPUState *cpu)
>> > -{
>> > -}
>> > -
>> > -static inline void start_exclusive(void)
>> > -{
>> > -}
>> > -
>> > -static inline void end_exclusive(void)
>> > -{
>> > -}
>> > -
> 
> Does this mean BSD user now gets exclusive support as well or is other
> plumbing required for the tb_flush to work?

It does, though I have forgotten to add the call to process the work items.
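
Roughly like this in bsd-user's cpu_loop, mirroring what cpus.c now does
(untested sketch, the exact spot still needs checking):

    for (;;) {
        cpu_exec_start(cs);
        trapnr = cpu_exec(cs);
        cpu_exec_end(cs);
        process_queued_cpu_work(cs);

        /* ... dispatch on trapnr as before ... */
    }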

>> >  void fork_start(void)
>> >  {
>> >  }
>> > diff --git a/cpus-common.c b/cpus-common.c
>> > index 12729e5..47f7c06 100644
>> > --- a/cpus-common.c
>> > +++ b/cpus-common.c
>> > @@ -23,11 +23,21 @@
>> >  #include "exec/memory-internal.h"
>> >
>> >  static QemuMutex qemu_cpu_list_mutex;
>> > +static QemuCond exclusive_cond;
>> > +static QemuCond exclusive_resume;
>> >  static QemuCond qemu_work_cond;
>> >
>> > +static int pending_cpus;
>> > +
>> >  void qemu_init_cpu_list(void)
>> >  {
>> > +    /* This is needed because qemu_init_cpu_list is also called by the
>> > +     * child process in a fork.  */
>> > +    pending_cpus = 0;
>> > +
>> >      qemu_mutex_init(&qemu_cpu_list_mutex);
>> > +    qemu_cond_init(&exclusive_cond);
>> > +    qemu_cond_init(&exclusive_resume);
>> >      qemu_cond_init(&qemu_work_cond);
>> >  }
>> >
>> > @@ -52,6 +62,12 @@ static int cpu_get_free_index(void)
>> >      return cpu_index;
>> >  }
>> >
>> > +static void finish_safe_work(CPUState *cpu)
>> > +{
>> > +    cpu_exec_start(cpu);
>> > +    cpu_exec_end(cpu);
>> > +}
>> > +
>> >  void cpu_list_add(CPUState *cpu)
>> >  {
>> >      qemu_mutex_lock(&qemu_cpu_list_mutex);
>> > @@ -62,6 +78,8 @@ void cpu_list_add(CPUState *cpu)
>> >
>> >      QTAILQ_INSERT_TAIL(&cpus, cpu, node);
>> >      qemu_mutex_unlock(&qemu_cpu_list_mutex);
>> > +
>> > +    finish_safe_work(cpu);
>> >  }
>> >
>> >  void cpu_list_remove(CPUState *cpu)
>> > @@ -131,6 +149,70 @@ void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>> >      queue_work_on_cpu(cpu, wi);
>> >  }
>> >
>> > +/* Wait for pending exclusive operations to complete.  The exclusive lock
>> > +   must be held.  */
>> > +static inline void exclusive_idle(void)
>> > +{
>> > +    while (pending_cpus) {
>> > +        qemu_cond_wait(&exclusive_resume, &qemu_cpu_list_mutex);
>> > +    }
>> > +}
>> > +
>> > +/* Start an exclusive operation.
>> > +   Must only be called from outside cpu_exec, takes
>> > +   qemu_cpu_list_mutex.   */
>> > +void start_exclusive(void)
>> > +{
>> > +    CPUState *other_cpu;
>> > +
>> > +    qemu_mutex_lock(&qemu_cpu_list_mutex);
>> > +    exclusive_idle();
>> > +
>> > +    /* Make all other cpus stop executing.  */
>> > +    pending_cpus = 1;
>> > +    CPU_FOREACH(other_cpu) {
>> > +        if (other_cpu->running) {
>> > +            pending_cpus++;
>> > +            qemu_cpu_kick(other_cpu);
>> > +        }
>> > +    }
>> > +    if (pending_cpus > 1) {
>> > +        qemu_cond_wait(&exclusive_cond, &qemu_cpu_list_mutex);
>> > +    }
>> > +}
>> > +
>> > +/* Finish an exclusive operation.  Releases qemu_cpu_list_mutex.  */
>> > +void end_exclusive(void)
>> > +{
>> > +    pending_cpus = 0;
>> > +    qemu_cond_broadcast(&exclusive_resume);
>> > +    qemu_mutex_unlock(&qemu_cpu_list_mutex);
>> > +}
>> > +
>> > +/* Wait for exclusive ops to finish, and begin cpu execution.  */
>> > +void cpu_exec_start(CPUState *cpu)
>> > +{
>> > +    qemu_mutex_lock(&qemu_cpu_list_mutex);
>> > +    exclusive_idle();
>> > +    cpu->running = true;
>> > +    qemu_mutex_unlock(&qemu_cpu_list_mutex);
>> > +}
>> > +
>> > +/* Mark cpu as not executing, and release pending exclusive ops.  */
>> > +void cpu_exec_end(CPUState *cpu)
>> > +{
>> > +    qemu_mutex_lock(&qemu_cpu_list_mutex);
>> > +    cpu->running = false;
>> > +    if (pending_cpus > 1) {
>> > +        pending_cpus--;
>> > +        if (pending_cpus == 1) {
>> > +            qemu_cond_signal(&exclusive_cond);
>> > +        }
>> > +    }
>> > +    exclusive_idle();
>> > +    qemu_mutex_unlock(&qemu_cpu_list_mutex);
>> > +}
>> > +
>> >  void process_queued_cpu_work(CPUState *cpu)
>> >  {
>> >      struct qemu_work_item *wi;
>> > diff --git a/cpus.c b/cpus.c
>> > index e1bdc16..b024604 100644
>> > --- a/cpus.c
>> > +++ b/cpus.c
>> > @@ -1456,7 +1456,9 @@ static int tcg_cpu_exec(CPUState *cpu)
>> >          cpu->icount_decr.u16.low = decr;
>> >          cpu->icount_extra = count;
>> >      }
>> > +    cpu_exec_start(cpu);
>> >      ret = cpu_exec(cpu);
>> > +    cpu_exec_end(cpu);
>> >  #ifdef CONFIG_PROFILER
>> >      tcg_time += profile_getclock() - ti;
>> >  #endif
>> > diff --git a/include/qom/cpu.h b/include/qom/cpu.h
>> > index d7688f6..0e04e8f 100644
>> > --- a/include/qom/cpu.h
>> > +++ b/include/qom/cpu.h
>> > @@ -249,7 +249,8 @@ struct qemu_work_item {
>> >   * @nr_threads: Number of threads within this CPU.
>> >   * @numa_node: NUMA node this CPU is belonging to.
>> >   * @host_tid: Host thread ID.
>> > - * @running: #true if CPU is currently running (usermode).
>> > + * @running: #true if CPU is currently running;
>> > + * valid under cpu_list_lock.
>> >   * @created: Indicates whether the CPU thread has been successfully created.
>> >   * @interrupt_request: Indicates a pending interrupt request.
>> >   * @halted: Nonzero if the CPU is in suspended state.
>> > @@ -826,6 +827,43 @@ void cpu_remove_sync(CPUState *cpu);
>> >  void process_queued_cpu_work(CPUState *cpu);
>> >
>> >  /**
>> > + * cpu_exec_start:
>> > + * @cpu: The CPU for the current thread.
>> > + *
>> > + * Record that a CPU has started execution and can be interrupted with
>> > + * cpu_exit.
>> > + */
>> > +void cpu_exec_start(CPUState *cpu);
>> > +
>> > +/**
>> > + * cpu_exec_end:
>> > + * @cpu: The CPU for the current thread.
>> > + *
>> > + * Record that a CPU has stopped execution and safe work can be executed
>> > + * without interrupting it.
>> > + */
>> > +void cpu_exec_end(CPUState *cpu);
>> > +
>> > +/**
>> > + * start_exclusive:
>> > + * @cpu: The CPU for the current thread.
>> > + *
>> > + * Similar to async_safe_work_run_on_cpu, but the work item is only
>> > + * run on one CPU and is between start_exclusive and end_exclusive.
>> > + * Returns with the CPU list lock taken (which nests outside all
>> > + * other locks except the BQL).
>> > + */
>> > +void start_exclusive(void);
> 
> Hmm the comment here looks as though it has been transplanted or is
> somehow mangled. The function doesn't take @cpu and isn't like
> async_safe_work_run_on_cpu.

It doesn't take @cpu, but it is kind of like async_safe_run_on_cpu.
However, async_safe_run_on_cpu doesn't exist yet. :)

Paolo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 09/12] cpus-common: always defer async_run_on_cpu work items
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 09/12] cpus-common: always defer async_run_on_cpu work items Paolo Bonzini
@ 2016-09-05 14:57   ` Alex Bennée
  0 siblings, 0 replies; 26+ messages in thread
From: Alex Bennée @ 2016-09-05 14:57 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Sergey Fedorov, Sergey Fedorov, Emilio G. Cota


Paolo Bonzini <pbonzini@redhat.com> writes:

> async_run_on_cpu is only called from the I/O thread, not from CPU threads,
> so it doesn't make any difference.  It will make a difference however
> for async_safe_run_on_cpu.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  cpus-common.c | 5 -----
>  1 file changed, 5 deletions(-)
>
> diff --git a/cpus-common.c b/cpus-common.c
> index 47f7c06..59c8dc8 100644
> --- a/cpus-common.c
> +++ b/cpus-common.c
> @@ -136,11 +136,6 @@ void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>  {
>      struct qemu_work_item *wi;
>
> -    if (qemu_cpu_is_self(cpu)) {
> -        func(cpu, data);
> -        return;
> -    }
> -
>      wi = g_malloc0(sizeof(struct qemu_work_item));
>      wi->func = func;
>      wi->data = data;

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 10/12] cpus-common: Introduce async_safe_run_on_cpu()
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 10/12] cpus-common: Introduce async_safe_run_on_cpu() Paolo Bonzini
@ 2016-09-05 15:08   ` Alex Bennée
  2016-09-05 15:14     ` Paolo Bonzini
  2016-09-12 18:25   ` Pranith Kumar
  1 sibling, 1 reply; 26+ messages in thread
From: Alex Bennée @ 2016-09-05 15:08 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Sergey Fedorov, Sergey Fedorov, Emilio G. Cota


Paolo Bonzini <pbonzini@redhat.com> writes:

> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  cpus-common.c     | 25 +++++++++++++++++++++++++
>  include/qom/cpu.h | 12 ++++++++++++
>  2 files changed, 37 insertions(+)
>
> diff --git a/cpus-common.c b/cpus-common.c
> index 59c8dc8..88cf5ec 100644
> --- a/cpus-common.c
> +++ b/cpus-common.c
> @@ -144,6 +144,11 @@ void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>      queue_work_on_cpu(cpu, wi);
>  }
>
> +typedef struct SafeWorkItem {
> +    run_on_cpu_func func;
> +    void *data;
> +} SafeWorkItem;
> +
>  /* Wait for pending exclusive operations to complete.  The exclusive lock
>     must be held.  */
>  static inline void exclusive_idle(void)
> @@ -208,6 +213,26 @@ void cpu_exec_end(CPUState *cpu)
>      qemu_mutex_unlock(&qemu_cpu_list_mutex);
>  }
>
> +static void async_safe_run_on_cpu_fn(CPUState *cpu, void *data)
> +{
> +    SafeWorkItem *w = data;
> +
> +    start_exclusive();
> +    w->func(cpu, w->data);
> +    end_exclusive();
> +    g_free(w);
> +}
> +
> +void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
> +{
> +    SafeWorkItem *w = g_new(SafeWorkItem, 1);

OK so I appreciate this approach is a neat way to embed safe work in the
existing queue but does it really offer that much more for yet another
dynamic allocation vs an extra flag to the WorkItem?

In previous iterations I could DoS QEMU with a guest that does heavy
cross-CPU TLB flushing, which led to a storm of mini allocations (for the
list and associated structures). This caused massive memory usage as
the queue backed up.

I appreciate it was a fairly special test case and I introduced other
mitigations in the base patches cputlb code to get around it however it
was the driver for me experimenting with the pre-allocated array for
holding work items.
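
To be concrete, the sort of thing I had in mind (field names purely
illustrative, not something this series adds):

    struct qemu_work_item {
        struct qemu_work_item *next;
        run_on_cpu_func func;
        void *data;
        bool free;
        bool exclusive;   /* illustrative flag: run under start/end_exclusive */
    };

    void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
    {
        struct qemu_work_item *wi = g_malloc0(sizeof(*wi));

        wi->func = func;
        wi->data = data;
        wi->free = true;
        wi->exclusive = true;

        queue_work_on_cpu(cpu, wi);
    }

i.e. one allocation per item and no trampoline function, at the cost of
teaching process_queued_cpu_work about the flag.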

> +
> +    w->func = func;
> +    w->data = data;
> +
> +    async_run_on_cpu(cpu, async_safe_run_on_cpu_fn, w);
> +}
> +
>  void process_queued_cpu_work(CPUState *cpu)
>  {
>      struct qemu_work_item *wi;
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 0e04e8f..54a875e 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -663,6 +663,18 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
>  void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
>
>  /**
> + * async_safe_run_on_cpu:
> + * @cpu: The vCPU to run on.
> + * @func: The function to be executed.
> + * @data: Data to pass to the function.
> + *
> + * Schedules the function @func for execution on the vCPU @cpu asynchronously,
> + * while all other vCPUs are sleeping.  @func is called with the CPU list lock
> + * taken (and for system emulation the BQL); any other lock can be taken safely.
> + */
> +void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
> +
> +/**
>   * qemu_get_cpu:
>   * @index: The CPUState@cpu_index value of the CPU to obtain.
>   *


--
Alex Bennée

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 10/12] cpus-common: Introduce async_safe_run_on_cpu()
  2016-09-05 15:08   ` Alex Bennée
@ 2016-09-05 15:14     ` Paolo Bonzini
  2016-09-05 15:41       ` Alex Bennée
  0 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-09-05 15:14 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel, Sergey Fedorov, Sergey Fedorov, Emilio G. Cota



On 05/09/2016 17:08, Alex Bennée wrote:
> 
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  cpus-common.c     | 25 +++++++++++++++++++++++++
>>  include/qom/cpu.h | 12 ++++++++++++
>>  2 files changed, 37 insertions(+)
>>
>> diff --git a/cpus-common.c b/cpus-common.c
>> index 59c8dc8..88cf5ec 100644
>> --- a/cpus-common.c
>> +++ b/cpus-common.c
>> @@ -144,6 +144,11 @@ void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>>      queue_work_on_cpu(cpu, wi);
>>  }
>>
>> +typedef struct SafeWorkItem {
>> +    run_on_cpu_func func;
>> +    void *data;
>> +} SafeWorkItem;
>> +
>>  /* Wait for pending exclusive operations to complete.  The exclusive lock
>>     must be held.  */
>>  static inline void exclusive_idle(void)
>> @@ -208,6 +213,26 @@ void cpu_exec_end(CPUState *cpu)
>>      qemu_mutex_unlock(&qemu_cpu_list_mutex);
>>  }
>>
>> +static void async_safe_run_on_cpu_fn(CPUState *cpu, void *data)
>> +{
>> +    SafeWorkItem *w = data;
>> +
>> +    start_exclusive();
>> +    w->func(cpu, w->data);
>> +    end_exclusive();
>> +    g_free(w);
>> +}
>> +
>> +void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>> +{
>> +    SafeWorkItem *w = g_new(SafeWorkItem, 1);
> 
> OK so I appreciate this approach is a neat way to embed safe work in the
> existing queue but does it really offer that much more for yet another
> dynamic allocation vs an extra flag to the WorkItem?

Actually I had it that way in the "first version" (the one that I
promised last Monday).  The questions are:

1) is it so different to have 1 vs. 2 mini allocations

2) is this a fast path

but it's no big deal, I can certainly change back.

Paolo

> In previous iterations I can DoS QEMU with a guest that does heavy
> cross-CPU TLB flushing which led to a storm of mini allocations (for the
> list and associated structures). This caused the massive memory usage as
> the queue backed up.
> 
> I appreciate it was a fairly special test case and I introduced other
> mitigations in the base patches cputlb code to get around it however it
> was the driver for me experimenting with the pre-allocated array for
> holding work items.
> 
>> +
>> +    w->func = func;
>> +    w->data = data;
>> +
>> +    async_run_on_cpu(cpu, async_safe_run_on_cpu_fn, w);
>> +}
>> +
>>  void process_queued_cpu_work(CPUState *cpu)
>>  {
>>      struct qemu_work_item *wi;
>> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
>> index 0e04e8f..54a875e 100644
>> --- a/include/qom/cpu.h
>> +++ b/include/qom/cpu.h
>> @@ -663,6 +663,18 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
>>  void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
>>
>>  /**
>> + * async_safe_run_on_cpu:
>> + * @cpu: The vCPU to run on.
>> + * @func: The function to be executed.
>> + * @data: Data to pass to the function.
>> + *
>> + * Schedules the function @func for execution on the vCPU @cpu asynchronously,
>> + * while all other vCPUs are sleeping.  @func is called with the CPU list lock
>> + * taken (and for system emulation the BQL); any other lock can be taken safely.
>> + */
>> +void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
>> +
>> +/**
>>   * qemu_get_cpu:
>>   * @index: The CPUState@cpu_index value of the CPU to obtain.
>>   *
> 
> 
> --
> Alex Bennée
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 12/12] cpus-common: lock-free fast path for cpu_exec_start/end
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 12/12] cpus-common: lock-free fast path for cpu_exec_start/end Paolo Bonzini
@ 2016-09-05 15:25   ` Alex Bennée
  2016-09-05 16:57     ` Paolo Bonzini
  0 siblings, 1 reply; 26+ messages in thread
From: Alex Bennée @ 2016-09-05 15:25 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Sergey Fedorov, Sergey Fedorov, Emilio G. Cota


Paolo Bonzini <pbonzini@redhat.com> writes:

> Set cpu->running without taking the cpu_list lock, only look at it if
> there is a concurrent exclusive section.  This requires adding a new
> field to CPUState, which records whether a running CPU is being counted
> in pending_cpus.  When an exclusive section is started concurrently with
> cpu_exec_start, cpu_exec_start can use the new field to wait for the end
> of the exclusive section.
>
> This a separate patch for easier bisection of issues.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  cpus-common.c              |  72 +++++++++++++--
>  docs/tcg-exclusive.promela | 224 +++++++++++++++++++++++++++++++++++++++++++++
>  include/qom/cpu.h          |   5 +-
>  3 files changed, 289 insertions(+), 12 deletions(-)
>  create mode 100644 docs/tcg-exclusive.promela
>
> diff --git a/cpus-common.c b/cpus-common.c
> index 88cf5ec..443617a 100644
> --- a/cpus-common.c
> +++ b/cpus-common.c
> @@ -170,8 +170,12 @@ void start_exclusive(void)
>
>      /* Make all other cpus stop executing.  */
>      pending_cpus = 1;
> +
> +    /* Write pending_cpus before reading other_cpu->running.  */
> +    smp_mb();
>      CPU_FOREACH(other_cpu) {
>          if (other_cpu->running) {
> +            other_cpu->has_waiter = true;
>              pending_cpus++;
>              qemu_cpu_kick(other_cpu);
>          }
> @@ -192,25 +196,73 @@ void end_exclusive(void)
>  /* Wait for exclusive ops to finish, and begin cpu execution.  */
>  void cpu_exec_start(CPUState *cpu)
>  {
> -    qemu_mutex_lock(&qemu_cpu_list_mutex);
> -    exclusive_idle();
>      cpu->running = true;
> -    qemu_mutex_unlock(&qemu_cpu_list_mutex);
> +
> +    /* Write cpu->running before reading pending_cpus.  */
> +    smp_mb();
> +
> +    /* 1. start_exclusive saw cpu->running == true and pending_cpus >= 1.
> +     * After taking the lock we'll see cpu->has_waiter == true and run---not
> +     * for long because start_exclusive kicked us.  cpu_exec_end will
> +     * decrement pending_cpus and signal the waiter.
> +     *
> +     * 2. start_exclusive saw cpu->running == false but pending_cpus >= 1.
> +     * This includes the case when an exclusive item is running now.
> +     * Then we'll see cpu->has_waiter == false and wait for the item to
> +     * complete.
> +     *
> +     * 3. pending_cpu == 0.  Then start_exclusive is definitely going to
> +     * see cpu->running == true, and it will kick the CPU.
> +     */
> +    if (pending_cpus) {
> +        qemu_mutex_lock(&qemu_cpu_list_mutex);
> +        if (!cpu->has_waiter) {
> +            /* Not counted in pending_cpus, let the exclusive item
> +             * run.  Since we have the lock, set cpu->running to true
> +             * while holding it instead of retrying.
> +             */
> +            cpu->running = false;
> +            exclusive_idle();
> +            cpu->running = true;
> +        } else {
> +            /* Counted in pending_cpus, go ahead.  */
> +        }
> +        qemu_mutex_unlock(&qemu_cpu_list_mutex);
> +    }
>  }
>
>  /* Mark cpu as not executing, and release pending exclusive ops.  */
>  void cpu_exec_end(CPUState *cpu)
>  {
> -    qemu_mutex_lock(&qemu_cpu_list_mutex);
>      cpu->running = false;
> -    if (pending_cpus > 1) {
> -        pending_cpus--;
> -        if (pending_cpus == 1) {
> -            qemu_cond_signal(&exclusive_cond);
> +
> +    /* Write cpu->running before reading pending_cpus.  */
> +    smp_mb();
> +
> +    /* 1. start_exclusive saw cpu->running == true.  Then it will increment
> +     * pending_cpus and wait for exclusive_cond.  After taking the lock
> +     * we'll see cpu->has_waiter == true.
> +     *
> +     * 2. start_exclusive saw cpu->running == false but here pending_cpus >= 1.
> +     * This includes the case when an exclusive item is running now.
> +     * Then we'll see cpu->has_waiter == false and not touch pending_cpus,
> +     * but will run exclusive_idle to wait for the item to complete.
> +     *
> +     * 3. pending_cpu == 0.  Then start_exclusive is definitely going to
> +     * see cpu->running == false, and it can ignore this CPU until the
> +     * next cpu_exec_start.
> +     */
> +    if (pending_cpus) {
> +        qemu_mutex_lock(&qemu_cpu_list_mutex);
> +        if (cpu->has_waiter) {
> +            cpu->has_waiter = false;
> +            if (--pending_cpus == 1) {
> +                qemu_cond_signal(&exclusive_cond);
> +            }
> +            exclusive_idle();
>          }
> +        qemu_mutex_unlock(&qemu_cpu_list_mutex);
>      }
> -    exclusive_idle();
> -    qemu_mutex_unlock(&qemu_cpu_list_mutex);
>  }
>
>  static void async_safe_run_on_cpu_fn(CPUState *cpu, void *data)
> diff --git a/docs/tcg-exclusive.promela b/docs/tcg-exclusive.promela
> new file mode 100644
> index 0000000..293b530
> --- /dev/null
> +++ b/docs/tcg-exclusive.promela
> @@ -0,0 +1,224 @@
> +/*
> + * This model describes the implementation of exclusive sections in
> + * cpus-common.c (start_exclusive, end_exclusive, cpu_exec_start,
> + * cpu_exec_end).

\o/ nice to have a model ;-)

> + *
> + * Author: Paolo Bonzini <pbonzini@redhat.com>
> + *
> + * This file is in the public domain.  If you really want a license,
> + * the WTFPL will do.
> + *
> + * To verify it:
> + *     spin -a docs/event.promela

wrong docs name

> + *     ./a.out -a

Which version of spin did you run? I grabbed the latest src release
(http://spinroot.com/spin/Src/src645.tar.gz) and had to manually build
the output:

    ~/src/spin/Src6.4.5/spin -a docs/tcg-exclusive.promela
    gcc pan.c
    ../a.out

> + *
> + * Tunable processor macros: N_CPUS, N_EXCLUSIVE, N_CYCLES, USE_MUTEX,
> + *                           TEST_EXPENSIVE.
> + */

How do you pass these? I tried:

    ~/src/spin/Src6.4.5/spin -a docs/tcg-exclusive.promela -DN_CPUS=4
    ~/src/spin/Src6.4.5/spin -a docs/tcg-exclusive.promela -DN_CPUS 4

without any joy.
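
(I'd have guessed the defines need to go before the file name so spin's
preprocessor sees them, something like

    ~/src/spin/Src6.4.5/spin -a -DN_CPUS=4 docs/tcg-exclusive.promela
    gcc -o pan pan.c
    ./pan -a

but maybe I'm missing something.)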

> +
> +// Define the missing parameters for the model
> +#ifndef N_CPUS
> +#define N_CPUS 2
> +#warning defaulting to 2 CPU processes
> +#endif
> +
> +// the expensive test is not so expensive for <= 2 CPUs
> +// If the mutex is used, it's also cheap (300 MB / 4 seconds) for 3 CPUs
> +// For 3 CPUs and the lock-free option it needs 1.5 GB of RAM
> +#if N_CPUS <= 2 || (N_CPUS <= 3 && defined USE_MUTEX)
> +#define TEST_EXPENSIVE
> +#endif
> +
> +#ifndef N_EXCLUSIVE
> +# if !defined N_CYCLES || N_CYCLES <= 1 || defined TEST_EXPENSIVE
> +#  define N_EXCLUSIVE     2
> +#  warning defaulting to 2 concurrent exclusive sections
> +# else
> +#  define N_EXCLUSIVE     1
> +#  warning defaulting to 1 concurrent exclusive sections
> +# endif
> +#endif
> +#ifndef N_CYCLES
> +# if N_EXCLUSIVE <= 1 || defined TEST_EXPENSIVE
> +#  define N_CYCLES        2
> +#  warning defaulting to 2 CPU cycles
> +# else
> +#  define N_CYCLES        1
> +#  warning defaulting to 1 CPU cycles
> +# endif
> +#endif
> +
> +
> +// synchronization primitives.  condition variables require a
> +// process-local "cond_t saved;" variable.
> +
> +#define mutex_t              byte
> +#define MUTEX_LOCK(m)        atomic { m == 0 -> m = 1 }
> +#define MUTEX_UNLOCK(m)      m = 0
> +
> +#define cond_t               int
> +#define COND_WAIT(c, m)      {                                  \
> +                               saved = c;                       \
> +                               MUTEX_UNLOCK(m);                 \
> +                               c != saved -> MUTEX_LOCK(m);     \
> +                             }
> +#define COND_BROADCAST(c)    c++
> +
> +// this is the logic from cpus-common.c
> +
> +mutex_t mutex;
> +cond_t exclusive_cond;
> +cond_t exclusive_resume;
> +byte pending_cpus;
> +
> +byte running[N_CPUS];
> +byte has_waiter[N_CPUS];
> +
> +#define exclusive_idle()                                          \
> +  do                                                              \
> +      :: pending_cpus -> COND_WAIT(exclusive_resume, mutex);      \
> +      :: else         -> break;                                   \
> +  od
> +
> +#define start_exclusive()                                         \
> +    MUTEX_LOCK(mutex);                                            \
> +    exclusive_idle();                                             \
> +    pending_cpus = 1;                                             \
> +                                                                  \
> +    i = 0;                                                        \
> +    do                                                            \
> +       :: i < N_CPUS -> {                                         \
> +           if                                                     \
> +              :: running[i] -> has_waiter[i] = 1; pending_cpus++; \
> +              :: else       -> skip;                              \
> +           fi;                                                    \
> +           i++;                                                   \
> +       }                                                          \
> +       :: else -> break;                                          \
> +    od;                                                           \
> +                                                                  \
> +    do                                                            \
> +      :: pending_cpus > 1 -> COND_WAIT(exclusive_cond, mutex);    \
> +      :: else             -> break;                               \
> +    od;
> +
> +#define end_exclusive()                                           \
> +    pending_cpus = 0;                                             \
> +    COND_BROADCAST(exclusive_resume);                             \
> +    MUTEX_UNLOCK(mutex);
> +
> +#ifdef USE_MUTEX
> +// Simple version using mutexes
> +#define cpu_exec_start(id)                                                   \
> +    MUTEX_LOCK(mutex);                                                       \
> +    exclusive_idle();                                                        \
> +    running[id] = 1;                                                         \
> +    MUTEX_UNLOCK(mutex);
> +
> +#define cpu_exec_end(id)                                                     \
> +    MUTEX_LOCK(mutex);                                                       \
> +    running[id] = 0;                                                         \
> +    if                                                                       \
> +        :: pending_cpus -> {                                                 \
> +            pending_cpus--;                                                  \
> +            if                                                               \
> +                :: pending_cpus == 1 -> COND_BROADCAST(exclusive_cond);      \
> +                :: else -> skip;                                             \
> +            fi;                                                              \
> +            exclusive_idle();                                                \
> +        }                                                                    \
> +        :: else -> skip;                                                     \
> +    fi;                                                                      \
> +    MUTEX_UNLOCK(mutex);
> +#else
> +// Wait-free fast path, only needs mutex when concurrent with
> +// an exclusive section
> +#define cpu_exec_start(id)                                                   \
> +    running[id] = 1;                                                         \
> +    if                                                                       \
> +        :: pending_cpus -> {                                                 \
> +            MUTEX_LOCK(mutex);                                               \
> +            if                                                               \
> +                :: !has_waiter[id] -> {                                      \
> +                    running[id] = 0;                                         \
> +                    exclusive_idle();                                        \
> +                    running[id] = 1;                                         \
> +                }                                                            \
> +                :: else -> skip;                                             \
> +            fi;                                                              \
> +            MUTEX_UNLOCK(mutex);                                             \
> +        }                                                                    \
> +        :: else -> skip;                                                     \
> +    fi;
> +
> +#define cpu_exec_end(id)                                                     \
> +    running[id] = 0;                                                         \
> +    if                                                                       \
> +        :: pending_cpus -> {                                                 \
> +            MUTEX_LOCK(mutex);                                               \
> +            if                                                               \
> +                :: has_waiter[id] -> {                                       \
> +                    has_waiter[id] = 0;                                      \
> +                    pending_cpus--;                                          \
> +                    if                                                       \
> +                        :: pending_cpus == 1 -> COND_BROADCAST(exclusive_cond); \
> +                        :: else -> skip;                                     \
> +                    fi;                                                      \
> +                    exclusive_idle();                                        \
> +                }                                                            \
> +                :: else -> skip;                                             \
> +            fi;                                                              \
> +            MUTEX_UNLOCK(mutex);                                             \
> +        }                                                                    \
> +        :: else -> skip;                                                     \
> +    fi
> +#endif
> +
> +// Promela processes
> +
> +byte done_cpu;
> +byte in_cpu;
> +active[N_CPUS] proctype cpu()
> +{
> +    byte id = _pid % N_CPUS;
> +    byte cycles = 0;
> +    cond_t saved;
> +
> +    do
> +       :: cycles == N_CYCLES -> break;
> +       :: else -> {
> +           cycles++;
> +           cpu_exec_start(id)
> +           in_cpu++;
> +           done_cpu++;
> +           in_cpu--;
> +           cpu_exec_end(id)
> +       }
> +    od;
> +}
> +
> +byte done_exclusive;
> +byte in_exclusive;
> +active[N_EXCLUSIVE] proctype exclusive()
> +{
> +    cond_t saved;
> +    byte i;
> +
> +    start_exclusive();
> +    in_exclusive = 1;
> +    done_exclusive++;
> +    in_exclusive = 0;
> +    end_exclusive();
> +}
> +
> +#define LIVENESS   (done_cpu == N_CPUS * N_CYCLES && done_exclusive == N_EXCLUSIVE)
> +#define SAFETY     !(in_exclusive && in_cpu)
> +
> +never {    /* ! ([] SAFETY && <> [] LIVENESS) */
> +    do
> +    // once the liveness property is satisfied, this is not executable
> +    // and the never clause is not accepted
> +    :: ! LIVENESS -> accept_liveness: skip
> +    :: 1          -> assert(SAFETY)
> +    od;
> +}
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 3817a98..14cf97c 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -249,7 +249,8 @@ struct qemu_work_item {
>   * @nr_threads: Number of threads within this CPU.
>   * @numa_node: NUMA node this CPU is belonging to.
>   * @host_tid: Host thread ID.
> - * @running: #true if CPU is currently running;
> + * @running: #true if CPU is currently running (lockless).
> + * @has_waiter: #true if a CPU is currently waiting for the cpu_exec_end;
>   * valid under cpu_list_lock.
>   * @created: Indicates whether the CPU thread has been successfully created.
>   * @interrupt_request: Indicates a pending interrupt request.
> @@ -303,7 +304,7 @@ struct CPUState {
>  #endif
>      int thread_id;
>      uint32_t host_tid;
> -    bool running;
> +    bool running, has_waiter;
>      struct QemuCond *halt_cond;
>      bool thread_kicked;
>      bool created;


--
Alex Bennée

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 10/12] cpus-common: Introduce async_safe_run_on_cpu()
  2016-09-05 15:14     ` Paolo Bonzini
@ 2016-09-05 15:41       ` Alex Bennée
  0 siblings, 0 replies; 26+ messages in thread
From: Alex Bennée @ 2016-09-05 15:41 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Sergey Fedorov, Sergey Fedorov, Emilio G. Cota


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 05/09/2016 17:08, Alex Bennée wrote:
>>
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>>  cpus-common.c     | 25 +++++++++++++++++++++++++
>>>  include/qom/cpu.h | 12 ++++++++++++
>>>  2 files changed, 37 insertions(+)
>>>
>>> diff --git a/cpus-common.c b/cpus-common.c
>>> index 59c8dc8..88cf5ec 100644
>>> --- a/cpus-common.c
>>> +++ b/cpus-common.c
>>> @@ -144,6 +144,11 @@ void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>>>      queue_work_on_cpu(cpu, wi);
>>>  }
>>>
>>> +typedef struct SafeWorkItem {
>>> +    run_on_cpu_func func;
>>> +    void *data;
>>> +} SafeWorkItem;
>>> +
>>>  /* Wait for pending exclusive operations to complete.  The exclusive lock
>>>     must be held.  */
>>>  static inline void exclusive_idle(void)
>>> @@ -208,6 +213,26 @@ void cpu_exec_end(CPUState *cpu)
>>>      qemu_mutex_unlock(&qemu_cpu_list_mutex);
>>>  }
>>>
>>> +static void async_safe_run_on_cpu_fn(CPUState *cpu, void *data)
>>> +{
>>> +    SafeWorkItem *w = data;
>>> +
>>> +    start_exclusive();
>>> +    w->func(cpu, w->data);
>>> +    end_exclusive();
>>> +    g_free(w);
>>> +}
>>> +
>>> +void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>>> +{
>>> +    SafeWorkItem *w = g_new(SafeWorkItem, 1);
>>
>> OK so I appreciate this approach is a neat way to embed safe work in the
>> existing queue but does it really offer that much more for yet another
>> dynamic allocation vs an extra flag to the WorkItem?
>
> Actually I had it that way in the "first version" (the one that I
> promised last Monday).  The questions are:
>
> 1) is it so different to have 1 vs. 2 mini allocations
>
> 2) is this a fast path
>
> but it's no big deal, I can certainly change back.

When the system is under stress it all adds up - but as I said the test
case was a fairly special one. Keeping it all in one structure does make it
easier to convert to an array later though.

I'll see if I can come up with a differently pathological case and I'll
benchmark the two approaches.

>
> Paolo
>
>> In previous iterations I can DoS QEMU with a guest that does heavy
>> cross-CPU TLB flushing which led to a storm of mini allocations (for the
>> list and associated structures). This caused the massive memory usage as
>> the queue backed up.
>>
>> I appreciate it was a fairly special test case and I introduced other
>> mitigations in the base patches cputlb code to get around it however it
>> was the driver for me experimenting with the pre-allocated array for
>> holding work items.
>>
>>> +
>>> +    w->func = func;
>>> +    w->data = data;
>>> +
>>> +    async_run_on_cpu(cpu, async_safe_run_on_cpu_fn, w);
>>> +}
>>> +
>>>  void process_queued_cpu_work(CPUState *cpu)
>>>  {
>>>      struct qemu_work_item *wi;
>>> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
>>> index 0e04e8f..54a875e 100644
>>> --- a/include/qom/cpu.h
>>> +++ b/include/qom/cpu.h
>>> @@ -663,6 +663,18 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
>>>  void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
>>>
>>>  /**
>>> + * async_safe_run_on_cpu:
>>> + * @cpu: The vCPU to run on.
>>> + * @func: The function to be executed.
>>> + * @data: Data to pass to the function.
>>> + *
>>> + * Schedules the function @func for execution on the vCPU @cpu asynchronously,
>>> + * while all other vCPUs are sleeping.  @func is called with the CPU list lock
>>> + * taken (and for system emulation the BQL); any other lock can be taken safely.
>>> + */
>>> +void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
>>> +
>>> +/**
>>>   * qemu_get_cpu:
>>>   * @index: The CPUState@cpu_index value of the CPU to obtain.
>>>   *
>>
>>
>> --
>> Alex Bennée
>>


--
Alex Bennée

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state
  2016-09-01 10:20 [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state Paolo Bonzini
                   ` (11 preceding siblings ...)
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 12/12] cpus-common: lock-free fast path for cpu_exec_start/end Paolo Bonzini
@ 2016-09-05 15:51 ` Alex Bennée
  2016-09-05 17:00   ` Paolo Bonzini
  12 siblings, 1 reply; 26+ messages in thread
From: Alex Bennée @ 2016-09-05 15:51 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Sergey Fedorov, Sergey Fedorov, Emilio G. Cota


Paolo Bonzini <pbonzini@redhat.com> writes:

> This is not the code I promised at the beginning of the week.
> It's better, and not only in that this one works. :)  Instead of
> reinventing the wheel and using the new wheel for linux-user's
> start_exclusive/end_exclusive, the linux-user/ code is moved to
> cpus-common.c and reused as the synchronization mechanism behind
> async_safe_run_on_cpu.
>
> The code works and actually satisfies our needs very well.  The only
> disadvantage is that safe work items will run with a mutex taken; the
> mutex fits decently in QEMU's hierarchy however, sitting between the
> BQL and the tb_lock.

The series is looking good. I only had a few comments on comments.

So for testing I applied:

1 file changed, 16 insertions(+), 5 deletions(-)
translate-all.c | 21 ++++++++++++++++-----

modified   translate-all.c
@@ -20,6 +20,9 @@
 #include <windows.h>
 #endif
 #include "qemu/osdep.h"
+#include <sys/types.h>
+#include <unistd.h>
+#include <sys/syscall.h>


 #include "qemu-common.h"
@@ -57,7 +60,7 @@
 #include "exec/log.h"

 //#define DEBUG_TB_INVALIDATE
-//#define DEBUG_FLUSH
+#define DEBUG_FLUSH
 /* make various TB consistency checks */
 //#define DEBUG_TB_CHECK

@@ -456,7 +459,8 @@ static inline PageDesc *page_find(tb_page_addr_t index)
    indicated, this is constrained by the range of direct branches on the
    host cpu, as used by the TCG implementation of goto_tb.  */
 #if defined(__x86_64__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
+/* # define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024) */
+# define MAX_CODE_GEN_BUFFER_SIZE  (256 * 1024)
 #elif defined(__sparc__)
 # define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
 #elif defined(__powerpc64__)
@@ -835,18 +839,25 @@ static void page_flush_tb(void)
 static void do_tb_flush(CPUState *cpu, void *data)
 {
     unsigned tb_flush_req = (unsigned) (uintptr_t) data;
-
+    unsigned tb_flush_count;
     tb_lock();

     /* If it's already been done on request of another CPU,
      * just retry.
      */
-    if (atomic_read(&tcg_ctx.tb_ctx.tb_flush_count) != tb_flush_req) {
+    tb_flush_count = atomic_read(&tcg_ctx.tb_ctx.tb_flush_count);
+    if (tb_flush_count != tb_flush_req) {
+#if defined(DEBUG_FLUSH)
+        fprintf(stderr,"%s: (%d/%ld) skipping racing flush %x/%x\n", __func__,
+                getpid(), syscall(SYS_gettid), tb_flush_req, tb_flush_count);
+#endif
         goto done;
     }

 #if defined(DEBUG_FLUSH)
-    printf("qemu: flush code_size=%ld nb_tbs=%d avg_tb_size=%ld\n",
+    fprintf(stderr, "%s: (%d/%ld, %x/%x) code_size=%ld nb_tbs=%d avg_tb_size=%ld\n", __func__,
+            getpid(), syscall(SYS_gettid),
+            tb_flush_req, tb_flush_count,
            (unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer),
            tcg_ctx.tb_ctx.nb_tbs, tcg_ctx.tb_ctx.nb_tbs > 0 ?
            ((unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer)) /

And then ran the test in my pigz testing image:

    ./tests/docker/docker.py update pigz-test:armhf ./arm-linux-user/qemu-arm
    retry.py -n 20 -c -- docker run -i -t --rm pigz-test:armhf pigz data.tar

And observed the results of flushing, including racing flushes which would
ordinarily take out the system.
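
For reference, the entry point after patch 11 ends up looking roughly like
this (a sketch, not the exact hunk):

    void tb_flush(CPUState *cpu)
    {
        if (tcg_enabled()) {
            uintptr_t tb_flush_req =
                atomic_read(&tcg_ctx.tb_ctx.tb_flush_count);
            async_safe_run_on_cpu(cpu, do_tb_flush, (void *)tb_flush_req);
        }
    }

so the actual flush runs in do_tb_flush() with every other vCPU out of the
execution loop, and a racing request is caught by the tb_flush_count
comparison and skipped.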

It would be nice to resurrect some of the tests/tcg stuff and see if we
can stress linux-user with creating and destroying threads. I guess the
work here also replaces Emilio's RCU cpu list.

>
> For performance, the last patch changes it to avoid condition variables
> in the fast path.  (There are still two memory barriers; if desired they
> could be merged with the ones in rcu_read_lock/rcu_read_unlock).  I am
> including a formal model of the algorithm; together with new documentation
> in include/qom/cpu.h, it accounts for most of the added lines of code.
> Still, it is completely optional.
>
> Paolo
>
> Alex Bennée (1):
>   cpus: pass CPUState to run_on_cpu helpers
>
> Paolo Bonzini (5):
>   cpus-common: move CPU list management to common code
>   cpus-common: move exclusive work infrastructure from linux-user
>   cpus-common: always defer async_run_on_cpu work items
>   cpus-common: Introduce async_safe_run_on_cpu()
>   cpus-common: lock-free fast path for cpu_exec_start/end
>
> Sergey Fedorov (6):
>   cpus: Move common code out of {async_, }run_on_cpu()
>   cpus: Rename flush_queued_work()
>   linux-user: Use QemuMutex and QemuCond
>   linux-user: Add qemu_cpu_is_self() and qemu_cpu_kick()
>   cpus-common: move CPU work item management to common code
>   tcg: Make tb_flush() thread safe
>
>  Makefile.target            |   2 +-
>  bsd-user/main.c            |  30 +----
>  cpu-exec.c                 |  12 +-
>  cpus-common.c              | 314 +++++++++++++++++++++++++++++++++++++++++++++
>  cpus.c                     |  99 +-------------
>  docs/tcg-exclusive.promela | 224 ++++++++++++++++++++++++++++++++
>  exec.c                     |  30 +----
>  hw/i386/kvm/apic.c         |   3 +-
>  hw/i386/kvmvapic.c         |   6 +-
>  hw/ppc/ppce500_spin.c      |  31 ++---
>  hw/ppc/spapr.c             |   6 +-
>  hw/ppc/spapr_hcall.c       |  17 +--
>  include/exec/cpu-all.h     |   4 +
>  include/exec/cpu-common.h  |   2 +
>  include/exec/exec-all.h    |  11 --
>  include/exec/tb-context.h  |   2 +-
>  include/qom/cpu.h          |  95 +++++++++++++-
>  kvm-all.c                  |  21 +--
>  linux-user/main.c          | 130 ++++++-------------
>  target-i386/helper.c       |  19 ++-
>  target-i386/kvm.c          |   6 +-
>  target-s390x/cpu.c         |   4 +-
>  target-s390x/cpu.h         |   7 +-
>  target-s390x/kvm.c         |  98 +++++++-------
>  target-s390x/misc_helper.c |   4 +-
>  translate-all.c            |  38 ++++--
>  vl.c                       |   1 +
>  27 files changed, 814 insertions(+), 402 deletions(-)
>  create mode 100644 cpus-common.c
>  create mode 100644 docs/tcg-exclusive.promela


--
Alex Bennée

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 12/12] cpus-common: lock-free fast path for cpu_exec_start/end
  2016-09-05 15:25   ` Alex Bennée
@ 2016-09-05 16:57     ` Paolo Bonzini
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-09-05 16:57 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel, Sergey Fedorov, Sergey Fedorov, Emilio G. Cota



On 05/09/2016 17:25, Alex Bennée wrote:
>> diff --git a/docs/tcg-exclusive.promela b/docs/tcg-exclusive.promela
>> new file mode 100644
>> index 0000000..293b530
>> --- /dev/null
>> +++ b/docs/tcg-exclusive.promela
>> @@ -0,0 +1,224 @@
>> +/*
>> + * This model describes the implementation of exclusive sections in
>> + * cpus-common.c (start_exclusive, end_exclusive, cpu_exec_start,
>> + * cpu_exec_end).
> 
> \o/ nice to have a model ;-)

Yeah, in my tree actually I have a couple optimizations that patch both
cpus-common.c and the model.

>> + *
>> + * Author: Paolo Bonzini <pbonzini@redhat.com>
>> + *
>> + * This file is in the public domain.  If you really want a license,
>> + * the WTFPL will do.
>> + *
>> + * To verify it:
>> + *     spin -a docs/event.promela
> 
> wrong docs name
> 
>> + *     ./a.out -a
> 
> Which version of spin did you run? I grabbed the latest src release
> (http://spinroot.com/spin/Src/src645.tar.gz) and had to manually build
> the output:
> 
>     ~/src/spin/Src6.4.5/spin -a docs/tcg-exclusive.promela
>     gcc pan.c
>     ../a.out

Yes, that's how you use it.  You'd better use -O2, too.

>> + *
>> + * Tunable processor macros: N_CPUS, N_EXCLUSIVE, N_CYCLES, USE_MUTEX,
>> + *                           TEST_EXPENSIVE.
>> + */
> 
> How do you pass these? I tried:
> 
>     ~/src/spin/Src6.4.5/spin -a docs/tcg-exclusive.promela -DN_CPUS=4

This should work...  Maybe put -D before the file name.
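
In other words, roughly (untested here, and the N_CPUS value is only an
example):

    spin -DN_CPUS=4 -a docs/tcg-exclusive.promela
    gcc -O2 pan.c
    ./a.out -a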

Paolo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state
  2016-09-05 15:51 ` [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state Alex Bennée
@ 2016-09-05 17:00   ` Paolo Bonzini
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-09-05 17:00 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel, Sergey Fedorov, Sergey Fedorov, Emilio G. Cota



On 05/09/2016 17:51, Alex Bennée wrote:
> And then ran the test in my pigz testing image:
> 
>     ./tests/docker/docker.py update pigz-test:armhf ./arm-linux-user/qemu-arm
>     retry.py -n 20 -c -- docker run -i -t --rm pigz-test:armhf pigz data.tar
> 
> And observed the results of flushing, including racing flushes which would
> ordinarily take out the system.
> 
> It would be nice to resurrect some of the tests/tcg stuff and see if we
> can stress linux-user with creating and destroying threads. I guess the
> work here also replaces Emilio's RCU cpu list.

Sort of; uses of CPU_FOREACH should be protected by RCU, which isn't yet
the case.  But yes, that was the idea behind using a single mutex.
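
That is, eventually the CPU_FOREACH users would look roughly like this,
rather than relying on the BQL (just an illustration, not a patch in this
series):

    CPUState *cpu;

    rcu_read_lock();
    CPU_FOREACH(cpu) {
        /* ... read-only walk of the CPU list ... */
    }
    rcu_read_unlock();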

Related to this, in fact you don't need to hold the mutex between
start_exclusive and end_exclusive.  You can release it at the end of
start_exclusive and take it again for end_exclusive.  That's a bit nicer
to reason about, and it avoids making the mutex recursive.
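
Something like this, as a rough sketch (not the code in this series; the
comments stand in for the real bookkeeping in cpus-common.c):

    void start_exclusive(void)
    {
        qemu_mutex_lock(&qemu_cpu_list_mutex);
        exclusive_idle();
        /* ... ask the running vCPUs to exit and wait for them ... */
        qemu_mutex_unlock(&qemu_cpu_list_mutex);   /* released here */
    }

    void end_exclusive(void)
    {
        qemu_mutex_lock(&qemu_cpu_list_mutex);     /* taken again here */
        /* ... clear the exclusive state and wake the waiting vCPUs ... */
        qemu_mutex_unlock(&qemu_cpu_list_mutex);
    }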

Paolo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 10/12] cpus-common: Introduce async_safe_run_on_cpu()
  2016-09-01 10:20 ` [Qemu-devel] [PATCH 10/12] cpus-common: Introduce async_safe_run_on_cpu() Paolo Bonzini
  2016-09-05 15:08   ` Alex Bennée
@ 2016-09-12 18:25   ` Pranith Kumar
  1 sibling, 0 replies; 26+ messages in thread
From: Pranith Kumar @ 2016-09-12 18:25 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, Sergey Fedorov, Emilio G. Cota, Alex Bennée,
	Sergey Fedorov


Paolo Bonzini writes:

> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  cpus-common.c     | 25 +++++++++++++++++++++++++
>  include/qom/cpu.h | 12 ++++++++++++
>  2 files changed, 37 insertions(+)
>
> diff --git a/cpus-common.c b/cpus-common.c
> index 59c8dc8..88cf5ec 100644
> --- a/cpus-common.c
> +++ b/cpus-common.c
> @@ -144,6 +144,11 @@ void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>      queue_work_on_cpu(cpu, wi);
>  }
>  
> +typedef struct SafeWorkItem {
> +    run_on_cpu_func func;
> +    void *data;
> +} SafeWorkItem;
> +
>  /* Wait for pending exclusive operations to complete.  The exclusive lock
>     must be held.  */
>  static inline void exclusive_idle(void)
> @@ -208,6 +213,26 @@ void cpu_exec_end(CPUState *cpu)
>      qemu_mutex_unlock(&qemu_cpu_list_mutex);
>  }
>  
> +static void async_safe_run_on_cpu_fn(CPUState *cpu, void *data)
> +{
> +    SafeWorkItem *w = data;
> +
> +    start_exclusive();
> +    w->func(cpu, w->data);
> +    end_exclusive();
> +    g_free(w);
> +}
> +
> +void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
> +{
> +    SafeWorkItem *w = g_new(SafeWorkItem, 1);
> +
> +    w->func = func;
> +    w->data = data;
> +
> +    async_run_on_cpu(cpu, async_safe_run_on_cpu_fn, w);
> +}
> +
>  void process_queued_cpu_work(CPUState *cpu)
>  {
>      struct qemu_work_item *wi;
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 0e04e8f..54a875e 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -663,6 +663,18 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
>  void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
>  
>  /**
> + * async_safe_run_on_cpu:
> + * @cpu: The vCPU to run on.
> + * @func: The function to be executed.
> + * @data: Data to pass to the function.
> + *
> + * Schedules the function @func for execution on the vCPU @cpu asynchronously,
> + * while all other vCPUs are sleeping.  @func is called with the CPU list lock
> + * taken (and for system emulation the BQL); any other lock can be taken safely.
> + */
> +void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
> +
> +/**
>   * qemu_get_cpu:
>   * @index: The CPUState@cpu_index value of the CPU to obtain.
>   *

I used the following diff on top (as suggested by Bonzini) to fix a deadlock I
was seeing while testing.

Fix deadlock caused by holding BQL for safe work

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 cpus-common.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/cpus-common.c b/cpus-common.c
index 443617a..579917f 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -18,6 +18,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 #include "cpu.h"
 #include "sysemu/cpus.h"
 #include "exec/memory-internal.h"
@@ -269,9 +270,11 @@ static void async_safe_run_on_cpu_fn(CPUState *cpu, void *data)
 {
     SafeWorkItem *w = data;
 
+    qemu_mutex_unlock_iothread();
     start_exclusive();
     w->func(cpu, w->data);
     end_exclusive();
+    qemu_mutex_lock_iothread();
     g_free(w);
 }
 
-- 
Pranith

^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2016-09-12 18:27 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-01 10:20 [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state Paolo Bonzini
2016-09-01 10:20 ` [Qemu-devel] [PATCH 01/12] cpus: pass CPUState to run_on_cpu helpers Paolo Bonzini
2016-09-01 10:20 ` [Qemu-devel] [PATCH 02/12] cpus: Move common code out of {async_, }run_on_cpu() Paolo Bonzini
2016-09-01 10:20 ` [Qemu-devel] [PATCH 03/12] cpus: Rename flush_queued_work() Paolo Bonzini
2016-09-01 10:20 ` [Qemu-devel] [PATCH 04/12] linux-user: Use QemuMutex and QemuCond Paolo Bonzini
2016-09-01 10:20 ` [Qemu-devel] [PATCH 05/12] linux-user: Add qemu_cpu_is_self() and qemu_cpu_kick() Paolo Bonzini
2016-09-01 10:20 ` [Qemu-devel] [PATCH 06/12] cpus-common: move CPU list management to common code Paolo Bonzini
2016-09-05 10:05   ` Alex Bennée
2016-09-05 10:29     ` Paolo Bonzini
2016-09-01 10:20 ` [Qemu-devel] [PATCH 07/12] cpus-common: move CPU work item management to common Paolo Bonzini
2016-09-01 10:20 ` [Qemu-devel] [PATCH 08/12] cpus-common: move exclusive work infrastructure from Paolo Bonzini
2016-09-05 14:55   ` Alex Bennée
2016-09-05 14:57     ` Paolo Bonzini
2016-09-01 10:20 ` [Qemu-devel] [PATCH 09/12] cpus-common: always defer async_run_on_cpu work items Paolo Bonzini
2016-09-05 14:57   ` Alex Bennée
2016-09-01 10:20 ` [Qemu-devel] [PATCH 10/12] cpus-common: Introduce async_safe_run_on_cpu() Paolo Bonzini
2016-09-05 15:08   ` Alex Bennée
2016-09-05 15:14     ` Paolo Bonzini
2016-09-05 15:41       ` Alex Bennée
2016-09-12 18:25   ` Pranith Kumar
2016-09-01 10:20 ` [Qemu-devel] [PATCH 11/12] tcg: Make tb_flush() thread safe Paolo Bonzini
2016-09-01 10:20 ` [Qemu-devel] [PATCH 12/12] cpus-common: lock-free fast path for cpu_exec_start/end Paolo Bonzini
2016-09-05 15:25   ` Alex Bennée
2016-09-05 16:57     ` Paolo Bonzini
2016-09-05 15:51 ` [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state Alex Bennée
2016-09-05 17:00   ` Paolo Bonzini
