* [Qemu-devel] [RFC v1 00/11] Base enabling patches for MTTCG
@ 2016-03-18 16:18 Alex Bennée
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 01/11] tcg: move tb_find_fast outside the tb_lock critical section Alex Bennée
                   ` (10 more replies)
  0 siblings, 11 replies; 28+ messages in thread
From: Alex Bennée @ 2016-03-18 16:18 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: Alex Bennée, pbonzini, mark.burton, qemu-devel

Hi,

This RFC patch set aims to provide the basic framework for MTTCG. I've
tried to pull together the non-contentious bits that other trees can
then build on. This is because we have multiple potential solutions
for things like deferring work and handling atomics. The idea is that
those bits can be added on top, and once a given architecture combo is
ready, multi-threading can be turned on automatically for known-good
combinations. Without the additional work, just blindly turning on
MTTCG is likely to fail horribly ;-)

The branch can be found at:

  https://github.com/stsquad/qemu/tree/mttcg/base-patches-v1

It has Serge's tcg-cleanups branch as its base, as those are TCG
clean-ups not directly tied to MTTCG which will hopefully get merged
well before these patches.

The main changes I've made from Fred's last tree are:
  - a new option -tcg mttcg=on|off (see the example below)
  - the kick timer to prevent lock clean-ups breaking single-threaded -smp
  - pulled in a fix from Emilio for locking interrupt handling
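
For illustration, the new option is used like this (hypothetical
command line; only the -tcg flag itself comes from this series):

  $ qemu-system-arm -M virt -smp 2 ... -tcg mttcg=on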

Where I've made tweaks to existing patches I've added [] comments and
some revision history below the ---.

What's left
===========

This series still boots a full virtio-based Debian ARM guest in
single-threaded mode, although currently "shutdown -h now" hangs once
the guest has shut down, as the iowait hangs on the final condition.

There are a few checkpatch warnings that still need to be addressed
with regard to the barrier comments.

Any comments welcome: does this look like a sane approach? Can we
agree on a common base on which everything else can be built?

Cheers,

Alex

Alex Bennée (3):
  target-arm/psci.c: wake up sleeping CPUs
  tcg: cpus rm tcg_exec_all()
  tcg: add kick timer for single-threaded vCPU emulation

Emilio G. Cota (1):
  tcg: grab iothread lock in cpu-exec interrupt handling

KONRAD Frederic (5):
  tcg: move tb_find_fast outside the tb_lock critical section
  tcg: protect TBContext with tb_lock.
  tcg: add options for enabling MTTCG
  tcg: drop global lock during TCG code execution
  tcg: enable thread-per-vCPU

Paolo Bonzini (2):
  cpu-exec: elide more icount code if CONFIG_USER_ONLY
  tcg: comment on which functions have to be called with tb_lock held

 cpu-exec-common.c       |   1 -
 cpu-exec.c              | 128 +++++++++++++------
 cpus.c                  | 323 +++++++++++++++++++++++++++++++-----------------
 cputlb.c                |   4 +
 exec.c                  |  16 +++
 hw/i386/kvmvapic.c      |   6 +
 include/exec/exec-all.h |   5 +-
 include/qom/cpu.h       |  20 +++
 include/sysemu/cpus.h   |   2 +
 qemu-options.hx         |  14 +++
 softmmu_template.h      |  17 +++
 target-arm/psci.c       |   2 +
 tcg/tcg.h               |   3 +
 translate-all.c         | 110 +++++++++++++----
 vl.c                    |  12 +-
 15 files changed, 477 insertions(+), 186 deletions(-)

-- 
2.7.3


* [Qemu-devel] [RFC v1 01/11] tcg: move tb_find_fast outside the tb_lock critical section
  2016-03-18 16:18 [Qemu-devel] [RFC v1 00/11] Base enabling patches for MTTCG Alex Bennée
@ 2016-03-18 16:18 ` Alex Bennée
  2016-03-18 16:54   ` Paolo Bonzini
  2016-03-21 21:50   ` Emilio G. Cota
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 02/11] cpu-exec: elide more icount code if CONFIG_USER_ONLY Alex Bennée
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 28+ messages in thread
From: Alex Bennée @ 2016-03-18 16:18 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: Peter Crosthwaite, mark.burton, qemu-devel, pbonzini,
	Alex Bennée, Andreas Färber, Richard Henderson

From: KONRAD Frederic <fred.konrad@greensocs.com>

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[AJB: minor checkpatch fixes]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1(ajb)
  - checkpatch fixes
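
The lock-free hash-chain lookup relies on a publish/consume pairing
between writer and reader; the new barrier comments in the diff spell
this out. A minimal sketch of the pattern, lifted from the patch for
reference:

    /* writer, cf. tb_link_page(): both smp_wmb()s pair with the
     * reader's smp_read_barrier_depends() */
    smp_wmb();
    tb->phys_hash_next = *ptb;
    smp_wmb();
    *ptb = tb;

    /* reader, cf. tb_find_physical(): dependency barrier before
     * dereferencing the freshly loaded pointer */
    tb = atomic_read(ptb1);
    smp_read_barrier_depends();
    if (tb->pc == pc) { /* candidate found */ }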
---
 cpu-exec.c        | 74 +++++++++++++++++++++++++++++++++----------------------
 include/qom/cpu.h |  2 ++
 tcg/tcg.h         |  1 +
 translate-all.c   | 23 ++++++++++++++++-
 4 files changed, 70 insertions(+), 30 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 07545aa..52f25de 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -225,8 +225,9 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
     phys_page1 = phys_pc & TARGET_PAGE_MASK;
     h = tb_phys_hash_func(phys_pc);
     for (ptb1 = &tcg_ctx.tb_ctx.tb_phys_hash[h];
-         (tb = *ptb1) != NULL;
+         (tb = atomic_read(ptb1)) != NULL;
          ptb1 = &tb->phys_hash_next) {
+        smp_read_barrier_depends();
         if (tb->pc != pc ||
             tb->page_addr[0] != phys_page1 ||
             tb->cs_base != cs_base ||
@@ -254,7 +255,18 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
         *ptb1 = tb->phys_hash_next;
         tb->phys_hash_next = tcg_ctx.tb_ctx.tb_phys_hash[h];
         tcg_ctx.tb_ctx.tb_phys_hash[h] = tb;
+    } else {
+        return NULL;
     }
+
+    /* If tb_flush was called since the last time we released the lock,
+     * forget about this TB.
+     */
+    smp_rmb();
+    if (atomic_read(&cpu->tb_invalidated_flag)) {
+        return NULL;
+    }
+
     return tb;
 }
 
@@ -265,36 +277,31 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
 {
     TranslationBlock *tb;
 
-    tb = tb_find_physical(cpu, pc, cs_base, flags);
-    if (tb) {
-        goto found;
-    }
-
-#ifdef CONFIG_USER_ONLY
-    /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
-     * taken outside tb_lock.  Since we're momentarily dropping
-     * tb_lock, there's a chance that our desired tb has been
-     * translated.
+    /* First try to get the tb.  If we don't find it we need to lock and
+     * compile it.
      */
-    tb_unlock();
-    mmap_lock();
-    tb_lock();
     tb = tb_find_physical(cpu, pc, cs_base, flags);
-    if (tb) {
-        mmap_unlock();
-        goto found;
-    }
-#endif
-
-    /* if no translated code available, then translate it now */
-    cpu->tb_invalidated_flag = 0;
-    tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
-
+    if (!tb) {
 #ifdef CONFIG_USER_ONLY
-    mmap_unlock();
+        /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
+         * taken outside tb_lock.  tb_lock is released later in
+         * cpu_exec.
+         */
+        mmap_lock();
+        tb_lock();
+
+        /* Retry to get the TB in case another CPU just translated it, to
+         * avoid having a duplicate TB in the pool.
+         */
+        tb = tb_find_physical(cpu, pc, cs_base, flags);
 #endif
+        if (!tb) {
+            /* if no translated code available, then translate it now */
+            tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
+        }
+        mmap_unlock();
+    }
 
-found:
     /* we add the TB in the virtual pc hash table */
     cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
     return tb;
@@ -312,6 +319,8 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu)
        is executed. */
     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
     tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
+    /* Read tb_jmp_cache before tb->pc.  */
+    smp_read_barrier_depends();
     if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
                  tb->flags != flags)) {
         tb = tb_find_slow(cpu, pc, cs_base, flags);
@@ -489,15 +498,18 @@ int cpu_exec(CPUState *cpu)
                     cpu->exception_index = EXCP_INTERRUPT;
                     cpu_loop_exit(cpu);
                 }
-                tb_lock();
                 tb = tb_find_fast(cpu);
                 /* Note: we do it here to avoid a gcc bug on Mac OS X when
                    doing it in tb_find_slow */
-                if (cpu->tb_invalidated_flag) {
+                if (atomic_read(&cpu->tb_invalidated_flag)) {
                     /* as some TB could have been invalidated because
                        of a tb_flush while generating the code, we
                        must recompute the hash index here */
                     next_tb = 0;
+
+                    /* Clear the flag, we've now observed the flush.  */
+                    tb_lock_recursive();
+                    cpu->tb_invalidated_flag = 0;
                 }
                 if (qemu_loglevel_mask(CPU_LOG_EXEC)) {
                     qemu_log("Trace %p [" TARGET_FMT_lx "] %s\n",
@@ -508,10 +520,14 @@ int cpu_exec(CPUState *cpu)
                    jump. */
                 if (next_tb != 0 && tb->page_addr[1] == -1
                     && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
+                    tb_lock_recursive();
                     tb_add_jump((TranslationBlock *)(next_tb & ~TB_EXIT_MASK),
                                 next_tb & TB_EXIT_MASK, tb);
                 }
-                tb_unlock();
+                /* The lock may not be taken if we went through the
+                 * fast lookup path and did not have to do any patching.
+                 */
+                tb_lock_reset();
                 if (likely(!cpu->exit_request)) {
                     trace_exec_tb(tb, tb->pc);
                     tc_ptr = tb->tc_ptr;
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 9538f9c..4132108 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -241,6 +241,8 @@ struct kvm_run;
  * @tcg_exit_req: Set to force TCG to stop executing linked TBs for this
  *           CPU and return to its top level loop.
  * @tb_invalidated_flag: Set to tell TCG that tb_flush has been called.
+ * It is only cleared while holding the tb_lock, so that no tb_flush can
+ * happen concurrently.
  * @singlestep_enabled: Flags for single-stepping.
  * @icount_extra: Instructions until next timer event.
  * @icount_decr: Number of cycles left, with interrupt flag in high bit.
diff --git a/tcg/tcg.h b/tcg/tcg.h
index b83f763..aa4e123 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -615,6 +615,7 @@ void tcg_pool_delete(TCGContext *s);
 
 void tb_lock(void);
 void tb_unlock(void);
+bool tb_lock_recursive(void);
 void tb_lock_reset(void);
 
 static inline void *tcg_malloc(int size)
diff --git a/translate-all.c b/translate-all.c
index 8e1edd6..f68dcbc 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -143,6 +143,17 @@ void tb_unlock(void)
 #endif
 }
 
+bool tb_lock_recursive(void)
+{
+#ifdef CONFIG_USER_ONLY
+    if (have_tb_lock) {
+        return false;
+    }
+    tb_lock();
+#endif
+    return true;
+}
+
 void tb_lock_reset(void)
 {
 #ifdef CONFIG_USER_ONLY
@@ -843,7 +854,8 @@ void tb_flush(CPUState *cpu)
     tcg_ctx.tb_ctx.nb_tbs = 0;
 
     CPU_FOREACH(cpu) {
-        cpu->tb_invalidated_flag = 1;
+        atomic_set(&cpu->tb_invalidated_flag, 1);
+        smp_wmb();
         memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
     }
 
@@ -979,6 +991,9 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
     pc = tb->pc;
     tb->pc = -1;
 
+    /* Pairs with smp_read_barrier_depends() in tb_find_fast.  */
+    smp_wmb();
+
     /* Then suppress this TB from the two jump lists.  CPUs will not jump
      * anymore into this translation block.
      */
@@ -1478,7 +1493,13 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
     /* add in the physical hash table */
     h = tb_phys_hash_func(phys_pc);
     ptb = &tcg_ctx.tb_ctx.tb_phys_hash[h];
+
+    /* Both write barriers pair with tb_find_physical's
+     * smp_read_barrier_depends.
+     */
+    smp_wmb();
     tb->phys_hash_next = *ptb;
+    smp_wmb();
     *ptb = tb;
 
     /* add in the page list */
-- 
2.7.3


* [Qemu-devel] [RFC v1 02/11] cpu-exec: elide more icount code if CONFIG_USER_ONLY
  2016-03-18 16:18 [Qemu-devel] [RFC v1 00/11] Base enabling patches for MTTCG Alex Bennée
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 01/11] tcg: move tb_find_fast outside the tb_lock critical section Alex Bennée
@ 2016-03-18 16:18 ` Alex Bennée
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 03/11] tcg: comment on which functions have to be called with tb_lock held Alex Bennée
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Alex Bennée @ 2016-03-18 16:18 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: Peter Crosthwaite, mark.burton, qemu-devel, pbonzini,
	Alex Bennée, Richard Henderson

From: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1
  - #ifndef replay code to match elided functions
---
 cpu-exec.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/cpu-exec.c b/cpu-exec.c
index 52f25de..e71113e 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -183,6 +183,7 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, uint8_t *tb_ptr)
     return next_tb;
 }
 
+#ifndef CONFIG_USER_ONLY
 /* Execute the code without caching the generated code. An interpreter
    could be used if available. */
 static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
@@ -208,6 +209,7 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
     tb_phys_invalidate(tb, -1);
     tb_free(tb);
 }
+#endif
 
 static TranslationBlock *tb_find_physical(CPUState *cpu,
                                           target_ulong pc,
@@ -427,12 +429,14 @@ int cpu_exec(CPUState *cpu)
                     }
 #endif
                 }
+#ifndef CONFIG_USER_ONLY
             } else if (replay_has_exception()
                        && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
                 /* try to cause an exception pending in the log */
                 cpu_exec_nocache(cpu, 1, tb_find_fast(cpu), true);
                 ret = -1;
                 break;
+#endif
             }
 
             next_tb = 0; /* force lookup of first TB */
@@ -553,6 +557,9 @@ int cpu_exec(CPUState *cpu)
                     case TB_EXIT_ICOUNT_EXPIRED:
                     {
                         /* Instruction counter expired.  */
+#ifdef CONFIG_USER_ONLY
+                        abort();
+#else
                         int insns_left = cpu->icount_decr.u32;
                         if (cpu->icount_extra && insns_left >= 0) {
                             /* Refill decrementer and continue execution.  */
@@ -572,6 +579,7 @@ int cpu_exec(CPUState *cpu)
                             cpu_loop_exit(cpu);
                         }
                         break;
+#endif
                     }
                     default:
                         break;
-- 
2.7.3


* [Qemu-devel] [RFC v1 03/11] tcg: comment on which functions have to be called with tb_lock held
  2016-03-18 16:18 [Qemu-devel] [RFC v1 00/11] Base enabling patches for MTTCG Alex Bennée
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 01/11] tcg: move tb_find_fast outside the tb_lock critical section Alex Bennée
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 02/11] cpu-exec: elide more icount code if CONFIG_USER_ONLY Alex Bennée
@ 2016-03-18 16:18 ` Alex Bennée
  2016-03-18 16:59   ` Paolo Bonzini
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 04/11] tcg: protect TBContext with tb_lock Alex Bennée
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Alex Bennée @ 2016-03-18 16:18 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: Peter Crosthwaite, mark.burton, qemu-devel, pbonzini,
	Alex Bennée, Andreas Färber, Richard Henderson

From: Paolo Bonzini <pbonzini@redhat.com>

softmmu requires more functions to be thread-safe, because translation
blocks can be invalidated from e.g. notdirty callbacks.  Probably the
same holds for user-mode emulation; it's just that no one has ever
tried to produce a coherent locking scheme there.

This patch will guide the introduction of more tb_lock and tb_unlock
calls for system emulation.

Note that after this patch some (most) of the mentioned functions are
still called outside tb_lock/tb_unlock.  The next one will rectify this.
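
As a concrete illustration, the convention for system emulation ends up
looking like the cpu_restore_state() hunk in this patch (a sketch; the
callers bracket the documented tb_* functions themselves):

    tb_lock();
    tb_phys_invalidate(tb, -1);
    tb_free(tb);
    tb_unlock();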

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1(ajb):
  - just s-o-b
---
 exec.c                  |  1 +
 include/exec/exec-all.h |  1 +
 include/qom/cpu.h       |  3 +++
 tcg/tcg.h               |  2 ++
 translate-all.c         | 38 ++++++++++++++++++++++++++++++--------
 5 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/exec.c b/exec.c
index f09dd4e..4f0e5ed 100644
--- a/exec.c
+++ b/exec.c
@@ -825,6 +825,7 @@ int cpu_breakpoint_insert(CPUState *cpu, vaddr pc, int flags,
 {
     CPUBreakpoint *bp;
 
+    /* TODO: locking (RCU?) */
     bp = g_malloc(sizeof(*bp));
 
     bp->pc = pc;
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 0ef6ea5..c41d680 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -372,6 +372,7 @@ static inline void tb_set_jmp_target(TranslationBlock *tb,
 
 #endif
 
+/* Called with tb_lock held.  */
 static inline void tb_add_jump(TranslationBlock *tb, int n,
                                TranslationBlock *tb_next)
 {
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 4132108..9124b6d 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -310,7 +310,10 @@ struct CPUState {
 
     void *env_ptr; /* CPUArchState */
     struct TranslationBlock *current_tb;
+
+    /* Protected by tb_lock.  */
     struct TranslationBlock *tb_jmp_cache[TB_JMP_CACHE_SIZE];
+
     struct GDBRegisterState *gdb_regs;
     int gdb_num_regs;
     int gdb_num_g_regs;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index aa4e123..cd7cde7 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -609,6 +609,7 @@ static inline bool tcg_op_buf_full(void)
 
 /* pool based memory allocation */
 
+/* tb_lock must be held for tcg_malloc_internal. */
 void *tcg_malloc_internal(TCGContext *s, int size);
 void tcg_pool_reset(TCGContext *s);
 void tcg_pool_delete(TCGContext *s);
@@ -618,6 +619,7 @@ void tb_unlock(void);
 bool tb_lock_recursive(void);
 void tb_lock_reset(void);
 
+/* Called with tb_lock held.  */
 static inline void *tcg_malloc(int size)
 {
     TCGContext *s = &tcg_ctx;
diff --git a/translate-all.c b/translate-all.c
index f68dcbc..1a02450 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -260,7 +260,9 @@ static int encode_search(TranslationBlock *tb, uint8_t *block)
     return p - block;
 }
 
-/* The cpu state corresponding to 'searched_pc' is restored.  */
+/* The cpu state corresponding to 'searched_pc' is restored.
+ * Called with tb_lock held.
+ */
 static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
                                      uintptr_t searched_pc)
 {
@@ -318,8 +320,10 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t retaddr)
         if (tb->cflags & CF_NOCACHE) {
             /* one-shot translation, invalidate it immediately */
             cpu->current_tb = NULL;
+            tb_lock();
             tb_phys_invalidate(tb, -1);
             tb_free(tb);
+            tb_unlock();
         }
         return true;
     }
@@ -411,6 +415,7 @@ static void page_init(void)
 }
 
 /* If alloc=1:
+ * Called with tb_lock held for system emulation.
  * Called with mmap_lock held for user-mode emulation.
  */
 static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
@@ -766,8 +771,12 @@ bool tcg_enabled(void)
     return tcg_ctx.code_gen_buffer != NULL;
 }
 
-/* Allocate a new translation block. Flush the translation buffer if
-   too many translation blocks or too much generated code. */
+/*
+ * Allocate a new translation block. Flush the translation buffer if
+ * too many translation blocks or too much generated code.
+ *
+ * Called with tb_lock held.
+ */
 static TranslationBlock *tb_alloc(target_ulong pc)
 {
     TranslationBlock *tb;
@@ -781,6 +790,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
     return tb;
 }
 
+/* Called with tb_lock held.  */
 void tb_free(TranslationBlock *tb)
 {
     /* In practice this is mostly used for single use temporary TB
@@ -888,7 +898,10 @@ static void tb_invalidate_check(target_ulong address)
     }
 }
 
-/* verify that all the pages have correct rights for code */
+/* verify that all the pages have correct rights for code
+ *
+ * Called with tb_lock held.
+ */
 static void tb_page_check(void)
 {
     TranslationBlock *tb;
@@ -976,7 +989,10 @@ static inline void tb_jmp_remove(TranslationBlock *tb, int n)
     }
 }
 
-/* invalidate one TB */
+/* invalidate one TB
+ *
+ * Called with tb_lock held.
+ */
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 {
     PageDesc *p;
@@ -1067,7 +1083,7 @@ static void build_page_bitmap(PageDesc *p)
 }
 #endif
 
-/* Called with mmap_lock held for user mode emulation.  */
+/* Called with tb_lock held, and mmap_lock too for user mode emulation.  */
 TranslationBlock *tb_gen_code(CPUState *cpu,
                               target_ulong pc, target_ulong cs_base,
                               int flags, int cflags)
@@ -1333,7 +1349,9 @@ void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
     }
     if (!p->code_bitmap &&
         ++p->code_write_count >= SMC_BITMAP_USE_THRESHOLD) {
-        /* build code bitmap */
+        /* build code bitmap.  FIXME: writes should be protected by
+         * tb_lock, reads by tb_lock or RCU.
+         */
         build_page_bitmap(p);
     }
     if (p->code_bitmap) {
@@ -1423,6 +1441,7 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
 
 /* add the tb in the target page and protect it if necessary
  *
+ * Called with tb_lock held.
  * Called with mmap_lock held for user-mode emulation.
  */
 static inline void tb_alloc_page(TranslationBlock *tb,
@@ -1482,6 +1501,7 @@ static inline void tb_alloc_page(TranslationBlock *tb,
 /* add a new TB and link it to the physical page tables. phys_page2 is
  * (-1) to indicate that only one page contains the TB.
  *
+ * Called with tb_lock held.
  * Called with mmap_lock held for user-mode emulation.
  */
 static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
@@ -1528,7 +1548,8 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
 }
 
 /* find the TB 'tb' such that tb[0].tc_ptr <= tc_ptr <
-   tb[1].tc_ptr. Return NULL if not found */
+ * tb[1].tc_ptr. Return NULL if not found
+ */
 static TranslationBlock *tb_find_pc(uintptr_t tc_ptr)
 {
     int m_min, m_max, m;
@@ -1581,6 +1602,7 @@ void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr)
 }
 #endif /* !defined(CONFIG_USER_ONLY) */
 
+/* Called with tb_lock held.  */
 void tb_check_watchpoint(CPUState *cpu)
 {
     TranslationBlock *tb;
-- 
2.7.3


* [Qemu-devel] [RFC v1 04/11] tcg: protect TBContext with tb_lock.
  2016-03-18 16:18 [Qemu-devel] [RFC v1 00/11] Base enabling patches for MTTCG Alex Bennée
                   ` (2 preceding siblings ...)
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 03/11] tcg: comment on which functions have to be called with tb_lock held Alex Bennée
@ 2016-03-18 16:18 ` Alex Bennée
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 05/11] target-arm/psci.c: wake up sleeping CPUs Alex Bennée
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Alex Bennée @ 2016-03-18 16:18 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: Eduardo Habkost, Peter Crosthwaite, Michael S. Tsirkin,
	mark.burton, qemu-devel, pbonzini, Alex Bennée,
	Richard Henderson

From: KONRAD Frederic <fred.konrad@greensocs.com>

This protects TBContext with tb_lock to make the tb_* functions
thread-safe.

We can still have an issue with tb_flush in the multi-threaded TCG
case: another CPU can be executing code during a flush.

This can be fixed later by making all other TCG threads exit before
calling tb_flush().
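
Concretely, any walk of the tcg_ctx.tb_ctx structures is now bracketed
by the lock. A minimal sketch of the pattern, mirroring the
dump_exec_info() change below (loop body elided):

    tb_lock();
    for (i = 0; i < tcg_ctx.tb_ctx.nb_tbs; i++) {
        TranslationBlock *tb = &tcg_ctx.tb_ctx.tbs[i];
        /* ... safe to read tb's fields here ... */
    }
    tb_unlock();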

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Message-Id: <1439220437-23957-8-git-send-email-fred.konrad@greensocs.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[AJB: moved into tree, clean-up history]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v7 (FK):
  - Drop a tb_lock in already locked restore_state_to_opc.
v6 (FK):
  - Drop a tb_lock arround tb_find_fast in cpu-exec.c.
---
 cpu-exec.c         |  6 ++++++
 exec.c             |  3 +++
 hw/i386/kvmvapic.c |  3 +++
 translate-all.c    | 32 +++++++++++++++++++++++++-------
 4 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index e71113e..3572256 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -196,18 +196,24 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
     if (max_cycles > CF_COUNT_MASK)
         max_cycles = CF_COUNT_MASK;
 
+    tb_lock();
     cpu->tb_invalidated_flag = 0;
     tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
                      max_cycles | CF_NOCACHE
                          | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
     tb->orig_tb = cpu->tb_invalidated_flag ? NULL : orig_tb;
     cpu->current_tb = tb;
+    tb_unlock();
+
     /* execute the generated code */
     trace_exec_tb_nocache(tb, tb->pc);
     cpu_tb_exec(cpu, tb->tc_ptr);
+
+    tb_lock();
     cpu->current_tb = NULL;
     tb_phys_invalidate(tb, -1);
     tb_free(tb);
+    tb_unlock();
 }
 #endif
 
diff --git a/exec.c b/exec.c
index 4f0e5ed..402b751 100644
--- a/exec.c
+++ b/exec.c
@@ -2112,6 +2112,9 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
                     continue;
                 }
                 cpu->watchpoint_hit = wp;
+
+                /* Unlocked by cpu_loop_exit or cpu_resume_from_signal.  */
+                tb_lock();
                 tb_check_watchpoint(cpu);
                 if (wp->flags & BP_STOP_BEFORE_ACCESS) {
                     cpu->exception_index = EXCP_DEBUG;
diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index c69f374..7c0d542 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -14,6 +14,7 @@
 #include "sysemu/kvm.h"
 #include "hw/i386/apic_internal.h"
 #include "hw/sysbus.h"
+#include "tcg/tcg.h"
 
 #define VAPIC_IO_PORT           0x7e
 
@@ -446,6 +447,8 @@ static void patch_instruction(VAPICROMState *s, X86CPU *cpu, target_ulong ip)
     resume_all_vcpus();
 
     if (!kvm_enabled()) {
+        /* Unlocked by cpu_resume_from_signal.  */
+        tb_lock();
         cs->current_tb = NULL;
         tb_gen_code(cs, current_pc, current_cs_base, current_flags, 1);
         cpu_resume_from_signal(cs, NULL);
diff --git a/translate-all.c b/translate-all.c
index 1a02450..1aab243 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -847,7 +847,9 @@ static void page_flush_tb(void)
 }
 
 /* flush all the translation blocks */
-/* XXX: tb_flush is currently not thread safe */
+/* XXX: tb_flush is currently not thread safe.  System emulation calls it only
+ * with tb_lock taken or from safe_work, so no need to take tb_lock here.
+ */
 void tb_flush(CPUState *cpu)
 {
 #if defined(DEBUG_FLUSH)
@@ -1253,6 +1255,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
     /* we remove all the TBs in the range [start, end[ */
     /* XXX: see if in some cases it could be faster to invalidate all
        the code */
+    tb_lock();
     tb = p->first_tb;
     while (tb != NULL) {
         n = (uintptr_t)tb & 3;
@@ -1320,12 +1323,13 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
     if (current_tb_modified) {
         /* we generate a block containing just the instruction
            modifying the memory. It will ensure that it cannot modify
-           itself */
+           itself.  cpu_resume_from_signal unlocks tb_lock.  */
         cpu->current_tb = NULL;
         tb_gen_code(cpu, current_pc, current_cs_base, current_flags, 1);
         cpu_resume_from_signal(cpu, NULL);
     }
 #endif
+    tb_unlock();
 }
 
 #ifdef CONFIG_SOFTMMU
@@ -1392,6 +1396,8 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
     if (!p) {
         return;
     }
+
+    tb_lock();
     tb = p->first_tb;
 #ifdef TARGET_HAS_PRECISE_SMC
     if (tb && pc != 0) {
@@ -1433,9 +1439,12 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
         if (locked) {
             mmap_unlock();
         }
+
+        /* tb_lock released by cpu_resume_from_signal.  */
         cpu_resume_from_signal(cpu, puc);
     }
 #endif
+    tb_unlock();
 }
 #endif
 
@@ -1639,6 +1648,7 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
     target_ulong pc, cs_base;
     uint64_t flags;
 
+    tb_lock();
     tb = tb_find_pc(retaddr);
     if (!tb) {
         cpu_abort(cpu, "cpu_io_recompile: could not find TB for pc=%p",
@@ -1690,11 +1700,15 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
     /* FIXME: In theory this could raise an exception.  In practice
        we have already translated the block once so it's probably ok.  */
     tb_gen_code(cpu, pc, cs_base, flags, cflags);
-    /* TODO: If env->pc != tb->pc (i.e. the faulting instruction was not
-       the first in the TB) then we end up generating a whole new TB and
-       repeating the fault, which is horribly inefficient.
-       Better would be to execute just this insn uncached, or generate a
-       second new TB.  */
+
+    /* This unlocks the tb_lock.
+     *
+     * TODO: If env->pc != tb->pc (i.e. the faulting instruction was not
+     * the first in the TB) then we end up generating a whole new TB and
+     * repeating the fault, which is horribly inefficient.
+     * Better would be to execute just this insn uncached, or generate a
+     * second new TB.
+     */
     cpu_resume_from_signal(cpu, NULL);
 }
 
@@ -1719,6 +1733,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
     int direct_jmp_count, direct_jmp2_count, cross_page;
     TranslationBlock *tb;
 
+    tb_lock();
+
     target_code_size = 0;
     max_target_code_size = 0;
     cross_page = 0;
@@ -1774,6 +1790,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
             tcg_ctx.tb_ctx.tb_phys_invalidate_count);
     cpu_fprintf(f, "TLB flush count     %d\n", tlb_flush_count);
     tcg_dump_info(f, cpu_fprintf);
+
+    tb_unlock();
 }
 
 void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf)
-- 
2.7.3


* [Qemu-devel] [RFC v1 05/11] target-arm/psci.c: wake up sleeping CPUs
  2016-03-18 16:18 [Qemu-devel] [RFC v1 00/11] Base enabling patches for MTTCG Alex Bennée
                   ` (3 preceding siblings ...)
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 04/11] tcg: protect TBContext with tb_lock Alex Bennée
@ 2016-03-18 16:18 ` Alex Bennée
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 06/11] tcg: cpus rm tcg_exec_all() Alex Bennée
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Alex Bennée @ 2016-03-18 16:18 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: Peter Maydell, Alexander Spyridakis, mark.burton, qemu-devel,
	open list:ARM, pbonzini, Alex Bennée

Testing with Alexander's bare-metal synchronisation tests fails in
MTTCG, leaving one CPU spinning forever waiting for the second CPU to
wake up. We simply need to poke the halt_cond once we have processed
the PSCI power-on call.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
CC: Alexander Spyridakis <a.spyridakis@virtualopensystems.com>
Message-Id: <1439220437-23957-20-git-send-email-fred.konrad@greensocs.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target-arm/psci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target-arm/psci.c b/target-arm/psci.c
index c55487f..8e937d8 100644
--- a/target-arm/psci.c
+++ b/target-arm/psci.c
@@ -212,6 +212,8 @@ void arm_handle_psci_call(ARMCPU *cpu)
         }
         target_cpu_class->set_pc(target_cpu_state, entry);
 
+        qemu_cpu_kick(target_cpu_state);
+
         ret = 0;
         break;
     case QEMU_PSCI_0_1_FN_CPU_OFF:
-- 
2.7.3


* [Qemu-devel] [RFC v1 06/11] tcg: cpus rm tcg_exec_all()
  2016-03-18 16:18 [Qemu-devel] [RFC v1 00/11] Base enabling patches for MTTCG Alex Bennée
                   ` (4 preceding siblings ...)
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 05/11] target-arm/psci.c: wake up sleeping CPUs Alex Bennée
@ 2016-03-18 16:18 ` Alex Bennée
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 07/11] tcg: add options for enabling MTTCG Alex Bennée
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Alex Bennée @ 2016-03-18 16:18 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: Peter Crosthwaite, mark.burton, qemu-devel, pbonzini,
	Alex Bennée, Richard Henderson

In preparation for multi-threaded TCG we remove tcg_exec_all() and move
all the CPU cycling into the main thread function. When MTTCG is
enabled we shall use a separate thread function which handles only one
vCPU.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 cpus.c | 64 ++++++++++++++++++++++++++++++----------------------------------
 1 file changed, 30 insertions(+), 34 deletions(-)

diff --git a/cpus.c b/cpus.c
index bc774e2..970d3ae 100644
--- a/cpus.c
+++ b/cpus.c
@@ -66,7 +66,6 @@
 
 #endif /* CONFIG_LINUX */
 
-static CPUState *next_cpu;
 int64_t max_delay;
 int64_t max_advance;
 
@@ -1103,7 +1102,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
 #endif
 }
 
-static void tcg_exec_all(void);
+static int tcg_cpu_exec(CPUState *cpu);
 
 static void *qemu_tcg_cpu_thread_fn(void *arg)
 {
@@ -1134,8 +1133,36 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
     /* process any pending work */
     atomic_mb_set(&exit_request, 1);
 
+    cpu = first_cpu;
+
     while (1) {
-        tcg_exec_all();
+        /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
+        /* qemu_account_warp_timer(); */
+        qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
+
+        if (!cpu) {
+            cpu = first_cpu;
+        }
+
+        for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
+
+            qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
+                              (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
+
+            if (cpu_can_run(cpu)) {
+                int r = tcg_cpu_exec(cpu);
+                if (r == EXCP_DEBUG) {
+                    cpu_handle_guest_debug(cpu);
+                    break;
+                }
+            } else if (cpu->stop || cpu->stopped) {
+                break;
+            }
+
+        } /* for cpu.. */
+
+        /* Pairs with smp_wmb in qemu_cpu_kick.  */
+        atomic_mb_set(&exit_request, 0);
 
         if (use_icount) {
             int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
@@ -1494,37 +1521,6 @@ static int tcg_cpu_exec(CPUState *cpu)
     return ret;
 }
 
-static void tcg_exec_all(void)
-{
-    int r;
-
-    /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
-    qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
-
-    if (next_cpu == NULL) {
-        next_cpu = first_cpu;
-    }
-    for (; next_cpu != NULL && !exit_request; next_cpu = CPU_NEXT(next_cpu)) {
-        CPUState *cpu = next_cpu;
-
-        qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
-                          (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
-
-        if (cpu_can_run(cpu)) {
-            r = tcg_cpu_exec(cpu);
-            if (r == EXCP_DEBUG) {
-                cpu_handle_guest_debug(cpu);
-                break;
-            }
-        } else if (cpu->stop || cpu->stopped) {
-            break;
-        }
-    }
-
-    /* Pairs with smp_wmb in qemu_cpu_kick.  */
-    atomic_mb_set(&exit_request, 0);
-}
-
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)
 {
     /* XXX: implement xxx_cpu_list for targets that still miss it */
-- 
2.7.3


* [Qemu-devel] [RFC v1 07/11] tcg: add options for enabling MTTCG
  2016-03-18 16:18 [Qemu-devel] [RFC v1 00/11] Base enabling patches for MTTCG Alex Bennée
                   ` (5 preceding siblings ...)
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 06/11] tcg: cpus rm tcg_exec_all() Alex Bennée
@ 2016-03-18 16:18 ` Alex Bennée
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 08/11] tcg: add kick timer for single-threaded vCPU emulation Alex Bennée
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Alex Bennée @ 2016-03-18 16:18 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: Peter Crosthwaite, mark.burton, qemu-devel, pbonzini,
	Alex Bennée, Andreas Färber, Richard Henderson

From: KONRAD Frederic <fred.konrad@greensocs.com>

We know there will be cases where MTTCG won't work until additional
work is done in the front/back ends to support it. It will however be
useful to be able to turn it on.

As a result MTTCG will default to off unless the combination is known
to be supported. However the user can turn it on for the sake of
testing.
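
A sketch of how the new query is expected to be consumed (illustrative
only; the actual wiring lands in the final patch of the series):

    if (qemu_tcg_mttcg_enabled()) {
        /* spawn one TCG thread per vCPU */
    } else {
        /* all vCPUs round-robin on the single TCG thread */
    }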

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
[AJB: move to -tcg mttcg=on/off, defaults]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1:
  - merge with add mttcg option.
  - update commit message
---
 cpus.c                | 43 +++++++++++++++++++++++++++++++++++++++++++
 include/qom/cpu.h     | 14 ++++++++++++++
 include/sysemu/cpus.h |  2 ++
 qemu-options.hx       | 14 ++++++++++++++
 vl.c                  | 12 +++++++++++-
 5 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index 970d3ae..76bd321 100644
--- a/cpus.c
+++ b/cpus.c
@@ -25,6 +25,7 @@
 /* Needed early for CONFIG_BSD etc. */
 #include "qemu/osdep.h"
 
+#include "qemu/config-file.h"
 #include "monitor/monitor.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/error-report.h"
@@ -145,6 +146,48 @@ typedef struct TimersState {
 } TimersState;
 
 static TimersState timers_state;
+static bool mttcg_enabled;
+
+static QemuOptsList qemu_tcg_opts = {
+    .name = "tcg",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_tcg_opts.head),
+    .desc = {
+        {
+            .name = "mttcg",
+            .type = QEMU_OPT_BOOL,
+            .help = "Enable/disable multi-threaded TCG",
+        },
+        { /* end of list */ }
+    },
+};
+
+static void tcg_register_config(void)
+{
+    qemu_add_opts(&qemu_tcg_opts);
+}
+
+machine_init(tcg_register_config);
+
+static bool default_mttcg_enabled(void)
+{
+    /*
+     * TODO: Check if we have a chance to have MTTCG working on this guest/host.
+     *       Basically, are the atomic instructions implemented? Are there
+     *       any memory ordering issues?
+     */
+    return false;
+}
+
+void qemu_tcg_configure(QemuOpts *opts)
+{
+    mttcg_enabled = qemu_opt_get_bool(opts, "mttcg", default_mttcg_enabled());
+}
+
+bool qemu_tcg_mttcg_enabled(void)
+{
+    return mttcg_enabled;
+}
+
 
 int64_t cpu_get_icount_raw(void)
 {
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 9124b6d..d6cb7b8 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -373,6 +373,20 @@ extern struct CPUTailQ cpus;
 extern __thread CPUState *current_cpu;
 
 /**
+ * qemu_tcg_enable_mttcg:
+ * Enable the MultiThread TCG support.
+ */
+void qemu_tcg_enable_mttcg(void);
+
+/**
+ * qemu_tcg_mttcg_enabled:
+ * Check whether we are running MultiThread TCG or not.
+ *
+ * Returns: %true if we are in MTTCG mode %false otherwise.
+ */
+bool qemu_tcg_mttcg_enabled(void);
+
+/**
  * cpu_paging_enabled:
  * @cpu: The CPU whose state is to be inspected.
  *
diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
index 3d1e5ba..606426f 100644
--- a/include/sysemu/cpus.h
+++ b/include/sysemu/cpus.h
@@ -26,4 +26,6 @@ extern int smp_threads;
 
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg);
 
+void qemu_tcg_configure(QemuOpts *opts);
+
 #endif
diff --git a/qemu-options.hx b/qemu-options.hx
index 0cf7bb9..52ab1f3 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3199,6 +3199,20 @@ Attach to existing xen domain.
 xend will use this when starting QEMU (XEN only).
 ETEXI
 
+DEF("tcg", HAS_ARG, QEMU_OPTION_tcg, \
+    "-tcg [mttcg=on|off] control TCG options\n", QEMU_ARCH_ALL)
+STEXI
+@item -tcg
+@findex -tcg
+@table @option
+@item mttcg=[on|off]
+Control multi-threaded TCG. By default QEMU will enable multi-threaded
+emulation for front/back-end combinations that are known to work. The
+user may enable it against the defaults; however, odd guest behaviour
+may occur.
+@end table
+ETEXI
+
 DEF("no-reboot", 0, QEMU_OPTION_no_reboot, \
     "-no-reboot      exit instead of rebooting\n", QEMU_ARCH_ALL)
 STEXI
diff --git a/vl.c b/vl.c
index 7a28982..1165506 100644
--- a/vl.c
+++ b/vl.c
@@ -2958,7 +2958,8 @@ int main(int argc, char **argv, char **envp)
     const char *boot_once = NULL;
     DisplayState *ds;
     int cyls, heads, secs, translation;
-    QemuOpts *hda_opts = NULL, *opts, *machine_opts, *icount_opts = NULL;
+    QemuOpts *opts, *machine_opts;
+    QemuOpts *hda_opts = NULL, *icount_opts = NULL, *tcg_opts = NULL;
     QemuOptsList *olist;
     int optind;
     const char *optarg;
@@ -3750,6 +3751,13 @@ int main(int argc, char **argv, char **envp)
             case QEMU_OPTION_no_reboot:
                 no_reboot = 1;
                 break;
+            case QEMU_OPTION_tcg:
+                tcg_opts = qemu_opts_parse_noisily(qemu_find_opts("tcg"),
+                                                   optarg, false);
+                if (!tcg_opts) {
+                    exit(1);
+                }
+                break;
             case QEMU_OPTION_no_shutdown:
                 no_shutdown = 1;
                 break;
@@ -4028,6 +4036,8 @@ int main(int argc, char **argv, char **envp)
      */
     loc_set_none();
 
+    qemu_tcg_configure(tcg_opts);
+
     replay_configure(icount_opts);
 
     machine_class = select_machine();
-- 
2.7.3


* [Qemu-devel] [RFC v1 08/11] tcg: add kick timer for single-threaded vCPU emulation
  2016-03-18 16:18 [Qemu-devel] [RFC v1 00/11] Base enabling patches for MTTCG Alex Bennée
                   ` (6 preceding siblings ...)
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 07/11] tcg: add options for enabling MTTCG Alex Bennée
@ 2016-03-18 16:18 ` Alex Bennée
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 09/11] tcg: drop global lock during TCG code execution Alex Bennée
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Alex Bennée @ 2016-03-18 16:18 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: Peter Crosthwaite, mark.burton, qemu-devel, pbonzini,
	Alex Bennée, Richard Henderson

Currently we rely on the side effect of the main loop grabbing the
iothread_mutex to give any long-running basic block chains a kick and
ensure the next vCPU is scheduled. As this code is being re-factored
and rationalised we now do it explicitly here.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 cpus.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/cpus.c b/cpus.c
index 76bd321..a87fbf9 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1145,11 +1145,30 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
 #endif
 }
 
+/* Single-threaded TCG
+ *
+ * In the single-threaded case each vCPU is simulated in turn. If
+ * there is more than a single vCPU we create a simple timer to kick
+ * the vCPU and ensure we don't get stuck in a tight loop in one vCPU.
+ * This is done explicitly rather than relying on side-effects
+ * elsewhere.
+ */
 static int tcg_cpu_exec(CPUState *cpu);
+static void qemu_cpu_kick_no_halt(void);
+
+static void kick_tcg_thread(void *opaque)
+{
+    QEMUTimer *self = *(QEMUTimer **) opaque;
+    timer_mod(self,
+              qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
+              get_ticks_per_sec() / 10);
+    qemu_cpu_kick_no_halt();
+}
 
 static void *qemu_tcg_cpu_thread_fn(void *arg)
 {
     CPUState *cpu = arg;
+    QEMUTimer *kick_timer;
 
     rcu_register_thread();
 
@@ -1173,6 +1192,14 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
         }
     }
 
+    /* Set to kick if we have to do more than one vCPU */
+    if (CPU_NEXT(first_cpu)) {
+        kick_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, kick_tcg_thread, &kick_timer);
+        timer_mod(kick_timer,
+                  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
+                  get_ticks_per_sec() / 10);
+    }
+
     /* process any pending work */
     atomic_mb_set(&exit_request, 1);
 
-- 
2.7.3


* [Qemu-devel] [RFC v1 09/11] tcg: drop global lock during TCG code execution
  2016-03-18 16:18 [Qemu-devel] [RFC v1 00/11] Base enabling patches for MTTCG Alex Bennée
                   ` (7 preceding siblings ...)
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 08/11] tcg: add kick timer for single-threaded vCPU emulation Alex Bennée
@ 2016-03-18 16:18 ` Alex Bennée
  2016-03-18 16:49   ` Paolo Bonzini
  2016-03-23  9:19   ` KONRAD Frederic
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 10/11] tcg: grab iothread lock in cpu-exec interrupt handling Alex Bennée
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 11/11] tcg: enable thread-per-vCPU Alex Bennée
  10 siblings, 2 replies; 28+ messages in thread
From: Alex Bennée @ 2016-03-18 16:18 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: Eduardo Habkost, Peter Crosthwaite, Michael S. Tsirkin,
	mark.burton, qemu-devel, pbonzini, Alex Bennée,
	Richard Henderson

From: KONRAD Frederic <fred.konrad@greensocs.com>

This finally allows TCG to benefit from the iothread introduction: Drop
the global mutex while running pure TCG CPU code. Reacquire the lock
when entering MMIO or PIO emulation, or when leaving the TCG loop.

We have to revert a few optimizations for the current TCG threading
model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
kicking it in qemu_cpu_kick. We also need to disable RAM block
reordering until we have a more efficient locking mechanism at hand.

Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
These numbers demonstrate where we gain something:

20338 jan       20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
20337 jan       20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm

The guest CPU was fully loaded, but the iothread could still run mostly
independently on a second core. Without the patch we don't get beyond

32206 jan       20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
32204 jan       20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm

We don't benefit significantly, though, when the guest is not fully
loading a host CPU.
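
The heart of the change is visible in the tcg_cpu_exec() and
io_read/io_write hunks below, sketched together here for reference:

    /* tcg_cpu_exec(): run guest code outside the BQL */
    qemu_mutex_unlock_iothread();
    ret = cpu_exec(cpu);
    qemu_mutex_lock_iothread();

    /* io_read/io_write: re-take the BQL only for devices that still
     * rely on the global lock */
    if (mr->global_locking) {
        qemu_mutex_lock_iothread();
        locked = true;
    }
    memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT,
                                iotlbentry->attrs);
    if (locked) {
        qemu_mutex_unlock_iothread();
    }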

Message-Id: <1439220437-23957-10-git-send-email-fred.konrad@greensocs.com>
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
[AJB: -smp single-threaded fix, rm old info from commit msg]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1 (ajb):
  - SMP failure now fixed by previous commit
Changes from Fred Konrad (mttcg-v7 via paolo):
  * Rebase on the current HEAD.
  * Fixes a deadlock in qemu_devices_reset().
  * Remove the mutex in address_space_*
---
 cpu-exec.c         | 10 ++++++++++
 cpus.c             | 26 +++-----------------------
 cputlb.c           |  4 ++++
 exec.c             | 12 ++++++++++++
 hw/i386/kvmvapic.c |  3 +++
 softmmu_template.h | 17 +++++++++++++++++
 translate-all.c    | 11 +++++++++--
 7 files changed, 58 insertions(+), 25 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 3572256..76891fd 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -28,6 +28,7 @@
 #include "qemu/rcu.h"
 #include "exec/tb-hash.h"
 #include "exec/log.h"
+#include "qemu/main-loop.h"
 #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
 #include "hw/i386/apic.h"
 #endif
@@ -449,6 +450,13 @@ int cpu_exec(CPUState *cpu)
             for(;;) {
                 interrupt_request = cpu->interrupt_request;
                 if (unlikely(interrupt_request)) {
+                    /* FIXME: this needs to take the iothread lock.
+                     * For this we need to find all places in
+                     * cc->cpu_exec_interrupt that can call cpu_loop_exit,
+                     * and call qemu_unlock_iothread_mutex() there.  Else,
+                     * add a flag telling cpu_loop_exit() to unlock it.
+                     */
+
                     if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
                         /* Mask out external interrupts for this step. */
                         interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
@@ -501,6 +509,7 @@ int cpu_exec(CPUState *cpu)
                            the program flow was changed */
                         next_tb = 0;
                     }
+
                 }
                 if (unlikely(cpu->exit_request
                              || replay_has_interrupt())) {
@@ -618,6 +627,7 @@ int cpu_exec(CPUState *cpu)
             g_assert(env == &x86_cpu->env);
 #endif
 #endif /* buggy compiler */
+
             cpu->can_do_io = 1;
             tb_lock_reset();
         }
diff --git a/cpus.c b/cpus.c
index a87fbf9..0e3d5f9 100644
--- a/cpus.c
+++ b/cpus.c
@@ -911,8 +911,6 @@ static void qemu_kvm_init_cpu_signals(CPUState *cpu)
 #endif /* _WIN32 */
 
 static QemuMutex qemu_global_mutex;
-static QemuCond qemu_io_proceeded_cond;
-static unsigned iothread_requesting_mutex;
 
 static QemuThread io_thread;
 
@@ -928,7 +926,6 @@ void qemu_init_cpu_loop(void)
     qemu_cond_init(&qemu_cpu_cond);
     qemu_cond_init(&qemu_pause_cond);
     qemu_cond_init(&qemu_work_cond);
-    qemu_cond_init(&qemu_io_proceeded_cond);
     qemu_mutex_init(&qemu_global_mutex);
 
     qemu_thread_get_self(&io_thread);
@@ -1043,10 +1040,6 @@ static void qemu_tcg_wait_io_event(CPUState *cpu)
         qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
     }
 
-    while (iothread_requesting_mutex) {
-        qemu_cond_wait(&qemu_io_proceeded_cond, &qemu_global_mutex);
-    }
-
     CPU_FOREACH(cpu) {
         qemu_wait_io_event_common(cpu);
     }
@@ -1314,22 +1307,7 @@ bool qemu_mutex_iothread_locked(void)
 
 void qemu_mutex_lock_iothread(void)
 {
-    atomic_inc(&iothread_requesting_mutex);
-    /* In the simple case there is no need to bump the VCPU thread out of
-     * TCG code execution.
-     */
-    if (!tcg_enabled() || qemu_in_vcpu_thread() ||
-        !first_cpu || !first_cpu->created) {
-        qemu_mutex_lock(&qemu_global_mutex);
-        atomic_dec(&iothread_requesting_mutex);
-    } else {
-        if (qemu_mutex_trylock(&qemu_global_mutex)) {
-            qemu_cpu_kick_no_halt();
-            qemu_mutex_lock(&qemu_global_mutex);
-        }
-        atomic_dec(&iothread_requesting_mutex);
-        qemu_cond_broadcast(&qemu_io_proceeded_cond);
-    }
+    qemu_mutex_lock(&qemu_global_mutex);
     iothread_locked = true;
 }
 
@@ -1575,7 +1553,9 @@ static int tcg_cpu_exec(CPUState *cpu)
         cpu->icount_decr.u16.low = decr;
         cpu->icount_extra = count;
     }
+    qemu_mutex_unlock_iothread();
     ret = cpu_exec(cpu);
+    qemu_mutex_lock_iothread();
 #ifdef CONFIG_PROFILER
     tcg_time += profile_getclock() - ti;
 #endif
diff --git a/cputlb.c b/cputlb.c
index 2f7a166..d607471 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -30,6 +30,10 @@
 #include "exec/ram_addr.h"
 #include "tcg/tcg.h"
 
+#ifdef CONFIG_SOFTMMU
+#include "qemu/main-loop.h"
+#endif
+
 //#define DEBUG_TLB
 //#define DEBUG_TLB_CHECK
 
diff --git a/exec.c b/exec.c
index 402b751..9986d59 100644
--- a/exec.c
+++ b/exec.c
@@ -2113,6 +2113,12 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
                 }
                 cpu->watchpoint_hit = wp;
 
+                /*
+                 * Unlock iothread mutex before calling cpu_loop_exit
+                 * or cpu_resume_from_signal.
+                 */
+                qemu_mutex_unlock_iothread();
+
                 /* Unlocked by cpu_loop_exit or cpu_resume_from_signal.  */
                 tb_lock();
                 tb_check_watchpoint(cpu);
@@ -2348,8 +2354,14 @@ static void io_mem_init(void)
     memory_region_init_io(&io_mem_rom, NULL, &unassigned_mem_ops, NULL, NULL, UINT64_MAX);
     memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
                           NULL, UINT64_MAX);
+
+    /* io_mem_notdirty calls tb_invalidate_phys_page_fast,
+     * which must be called without the iothread mutex.
+     */
     memory_region_init_io(&io_mem_notdirty, NULL, &notdirty_mem_ops, NULL,
                           NULL, UINT64_MAX);
+    memory_region_clear_global_locking(&io_mem_notdirty);
+
     memory_region_init_io(&io_mem_watch, NULL, &watch_mem_ops, NULL,
                           NULL, UINT64_MAX);
 }
diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index 7c0d542..a0439a8 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -447,6 +447,9 @@ static void patch_instruction(VAPICROMState *s, X86CPU *cpu, target_ulong ip)
     resume_all_vcpus();
 
     if (!kvm_enabled()) {
+        /* Unlock iothread mutex before calling cpu_resume_from_signal.  */
+        qemu_mutex_unlock_iothread();
+
         /* Unlocked by cpu_resume_from_signal.  */
         tb_lock();
         cs->current_tb = NULL;
diff --git a/softmmu_template.h b/softmmu_template.h
index 208f808..0b6c609 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -151,6 +151,7 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
     CPUState *cpu = ENV_GET_CPU(env);
     hwaddr physaddr = iotlbentry->addr;
     MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
+    bool locked = false;
 
     physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
     cpu->mem_io_pc = retaddr;
@@ -159,8 +160,15 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
     }
 
     cpu->mem_io_vaddr = addr;
+    if (mr->global_locking) {
+        qemu_mutex_lock_iothread();
+        locked = true;
+    }
     memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT,
                                 iotlbentry->attrs);
+    if (locked) {
+        qemu_mutex_unlock_iothread();
+    }
     return val;
 }
 #endif
@@ -358,6 +366,7 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
     CPUState *cpu = ENV_GET_CPU(env);
     hwaddr physaddr = iotlbentry->addr;
     MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
+    bool locked = false;
 
     physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
     if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
@@ -366,8 +375,16 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
 
     cpu->mem_io_vaddr = addr;
     cpu->mem_io_pc = retaddr;
+
+    if (mr->global_locking) {
+        qemu_mutex_lock_iothread();
+        locked = true;
+    }
     memory_region_dispatch_write(mr, physaddr, val, 1 << SHIFT,
                                  iotlbentry->attrs);
+    if (locked) {
+        qemu_mutex_unlock_iothread();
+    }
 }
 
 void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
diff --git a/translate-all.c b/translate-all.c
index 1aab243..fe1392a 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1221,6 +1221,7 @@ void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
  * this TB.
  *
  * Called with mmap_lock held for user-mode emulation
+ * If called from generated code, iothread mutex must not be held.
  */
 void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
                                    int is_cpu_write_access)
@@ -1333,7 +1334,9 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
 }
 
 #ifdef CONFIG_SOFTMMU
-/* len must be <= 8 and start must be a multiple of len */
+/* len must be <= 8 and start must be a multiple of len.
+ * Called via softmmu_template.h, with iothread mutex not held.
+ */
 void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
 {
     PageDesc *p;
@@ -1591,6 +1594,7 @@ static TranslationBlock *tb_find_pc(uintptr_t tc_ptr)
 }
 
 #if !defined(CONFIG_USER_ONLY)
+/* If called from generated code, iothread mutex must not be held.  */
 void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr)
 {
     ram_addr_t ram_addr;
@@ -1637,7 +1641,10 @@ void tb_check_watchpoint(CPUState *cpu)
 
 #ifndef CONFIG_USER_ONLY
 /* in deterministic execution mode, instructions doing device I/Os
-   must be at the end of the TB */
+ * must be at the end of the TB.
+ *
+ * Called by softmmu_template.h, with iothread mutex not held.
+ */
 void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
 {
 #if defined(TARGET_MIPS) || defined(TARGET_SH4)
-- 
2.7.3

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [RFC v1 10/11] tcg: grab iothread lock in cpu-exec interrupt handling
  2016-03-18 16:18 [Qemu-devel] [RFC v1 00/11] Base enabling patches for MTTCG Alex Bennée
                   ` (8 preceding siblings ...)
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 09/11] tcg: drop global lock during TCG code execution Alex Bennée
@ 2016-03-18 16:18 ` Alex Bennée
  2016-03-18 16:48   ` Paolo Bonzini
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 11/11] tcg: enable thread-per-vCPU Alex Bennée
  10 siblings, 1 reply; 28+ messages in thread
From: Alex Bennée @ 2016-03-18 16:18 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: Peter Crosthwaite, mark.burton, qemu-devel, pbonzini,
	Alex Bennée, Andreas Färber, Richard Henderson

From: "Emilio G. Cota" <cota@braap.org>

Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1 (ajb)
 - pulled from emilio/mttcg series
---
 cpu-exec.c        | 33 +++++++++++++++++++++++++++------
 include/qom/cpu.h |  1 +
 2 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 76891fd..6acaf25 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -351,6 +351,29 @@ static void cpu_handle_debug_exception(CPUState *cpu)
     cc->debug_excp_handler(cpu);
 }
 
+#ifdef CONFIG_SOFTMMU
+static inline void cpu_exit_loop_lock(CPUState *cpu)
+{
+    qemu_mutex_lock_iothread();
+    cpu->cpu_loop_exit_locked = true;
+}
+
+static inline void cpu_exit_loop_lock_reset(CPUState *cpu)
+{
+    if (cpu->cpu_loop_exit_locked) {
+        cpu->cpu_loop_exit_locked = false;
+        qemu_mutex_unlock_iothread();
+    }
+}
+
+#else
+static inline void cpu_exit_loop_lock(CPUState *cpu)
+{ }
+
+static inline void cpu_exit_loop_lock_reset(CPUState *cpu)
+{ }
+#endif
+
 /* main execution loop */
 
 int cpu_exec(CPUState *cpu)
@@ -450,12 +473,7 @@ int cpu_exec(CPUState *cpu)
             for(;;) {
                 interrupt_request = cpu->interrupt_request;
                 if (unlikely(interrupt_request)) {
-                    /* FIXME: this needs to take the iothread lock.
-                     * For this we need to find all places in
-                     * cc->cpu_exec_interrupt that can call cpu_loop_exit,
-                     * and call qemu_unlock_iothread_mutex() there.  Else,
-                     * add a flag telling cpu_loop_exit() to unlock it.
-                     */
+                    cpu_exit_loop_lock(cpu);
 
                     if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
                         /* Mask out external interrupts for this step. */
@@ -510,6 +528,8 @@ int cpu_exec(CPUState *cpu)
                         next_tb = 0;
                     }
 
+                    cpu_exit_loop_lock_reset(cpu);
+
                 }
                 if (unlikely(cpu->exit_request
                              || replay_has_interrupt())) {
@@ -630,6 +650,7 @@ int cpu_exec(CPUState *cpu)
 
             cpu->can_do_io = 1;
             tb_lock_reset();
+            cpu_exit_loop_lock_reset(cpu);
         }
     } /* for(;;) */
 
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index d6cb7b8..954d97d 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -295,6 +295,7 @@ struct CPUState {
     bool crash_occurred;
     bool exit_request;
     bool tb_invalidated_flag;
+    bool cpu_loop_exit_locked;
     uint32_t interrupt_request;
     int singlestep_enabled;
     int64_t icount_extra;
-- 
2.7.3

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [RFC v1 11/11] tcg: enable thread-per-vCPU
  2016-03-18 16:18 [Qemu-devel] [RFC v1 00/11] Base enabling patches for MTTCG Alex Bennée
                   ` (9 preceding siblings ...)
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 10/11] tcg: grab iothread lock in cpu-exec interrupt handling Alex Bennée
@ 2016-03-18 16:18 ` Alex Bennée
  10 siblings, 0 replies; 28+ messages in thread
From: Alex Bennée @ 2016-03-18 16:18 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: Peter Crosthwaite, mark.burton, qemu-devel, pbonzini,
	Alex Bennée, Richard Henderson

From: KONRAD Frederic <fred.konrad@greensocs.com>

This allows the user to switch on multi-threading. However, it
defaults to off, as individual front-ends and back-ends need
additional fixes. The function default_mttcg_enabled can be tweaked
as support is added.

As the assumptions about tcg_current_cpu no longer hold for the
single-threaded kick routine, we need to save the current CPU
information somewhere else for the timer to use.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[AJB: Some fixes, conditionally, commit rewording]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1 (ajb):
  - fix merge conflicts
  - maintain single-thread approach
---
 cpu-exec-common.c       |   1 -
 cpu-exec.c              |  13 ----
 cpus.c                  | 181 ++++++++++++++++++++++++++++++------------------
 include/exec/exec-all.h |   4 --
 translate-all.c         |  10 ---
 5 files changed, 115 insertions(+), 94 deletions(-)

diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index 1b1731c..3d7eaa3 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -23,7 +23,6 @@
 #include "exec/memory-internal.h"
 
 bool exit_request;
-CPUState *tcg_current_cpu;
 
 /* exit the current TB from a signal handler. The host registers are
    restored in a state compatible with the CPU emulator
diff --git a/cpu-exec.c b/cpu-exec.c
index 6acaf25..f6a8016 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -291,7 +291,6 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
      */
     tb = tb_find_physical(cpu, pc, cs_base, flags);
     if (!tb) {
-#ifdef CONFIG_USER_ONLY
         /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
          * taken outside tb_lock.  tb_lock is released later in
          * cpu_exec.
@@ -303,7 +302,6 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
          * duplicated TB in the pool.
          */
         tb = tb_find_physical(cpu, pc, cs_base, flags);
-#endif
         if (!tb) {
             /* if no translated code available, then translate it now */
             tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
@@ -408,13 +406,8 @@ int cpu_exec(CPUState *cpu)
         cpu->halted = 0;
     }
 
-    atomic_mb_set(&tcg_current_cpu, cpu);
     rcu_read_lock();
 
-    if (unlikely(atomic_mb_read(&exit_request))) {
-        cpu->exit_request = 1;
-    }
-
     cc->cpu_exec_enter(cpu);
 
     /* Calculate difference between guest clock and host clock.
@@ -533,7 +526,6 @@ int cpu_exec(CPUState *cpu)
                 }
                 if (unlikely(cpu->exit_request
                              || replay_has_interrupt())) {
-                    cpu->exit_request = 0;
                     cpu->exception_index = EXCP_INTERRUPT;
                     cpu_loop_exit(cpu);
                 }
@@ -657,10 +649,5 @@ int cpu_exec(CPUState *cpu)
     cc->cpu_exec_exit(cpu);
     rcu_read_unlock();
 
-    /* fail safe : never use current_cpu outside cpu_exec() */
-    current_cpu = NULL;
-
-    /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
-    atomic_set(&tcg_current_cpu, NULL);
     return ret;
 }
diff --git a/cpus.c b/cpus.c
index 0e3d5f9..71f2267 100644
--- a/cpus.c
+++ b/cpus.c
@@ -957,10 +957,7 @@ void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
 
     qemu_cpu_kick(cpu);
     while (!atomic_mb_read(&wi.done)) {
-        CPUState *self_cpu = current_cpu;
-
         qemu_cond_wait(&qemu_work_cond, &qemu_global_mutex);
-        current_cpu = self_cpu;
     }
 }
 
@@ -1022,27 +1019,29 @@ static void flush_queued_work(CPUState *cpu)
 
 static void qemu_wait_io_event_common(CPUState *cpu)
 {
+    atomic_mb_set(&cpu->thread_kicked, false);
     if (cpu->stop) {
         cpu->stop = false;
         cpu->stopped = true;
         qemu_cond_broadcast(&qemu_pause_cond);
     }
     flush_queued_work(cpu);
-    cpu->thread_kicked = false;
 }
 
 static void qemu_tcg_wait_io_event(CPUState *cpu)
 {
-    while (all_cpu_threads_idle()) {
-       /* Start accounting real time to the virtual clock if the CPUs
-          are idle.  */
-        qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
+    while (cpu_thread_is_idle(cpu)) {
+        /* Start accounting real time to the virtual clock if the CPUs
+         * are idle.
+         */
+        if ((all_cpu_threads_idle()) && (cpu->cpu_index == 0)) {
+            /* qemu_account_warp_timer(); */
+            qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
+        }
         qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
     }
 
-    CPU_FOREACH(cpu) {
-        qemu_wait_io_event_common(cpu);
-    }
+    qemu_wait_io_event_common(cpu);
 }
 
 static void qemu_kvm_wait_io_event(CPUState *cpu)
@@ -1109,6 +1108,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
     qemu_thread_get_self(cpu->thread);
     cpu->thread_id = qemu_get_thread_id();
     cpu->can_do_io = 1;
+    current_cpu = cpu;
 
     sigemptyset(&waitset);
     sigaddset(&waitset, SIG_IPI);
@@ -1117,9 +1117,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
     cpu->created = true;
     qemu_cond_signal(&qemu_cpu_cond);
 
-    current_cpu = cpu;
     while (1) {
-        current_cpu = NULL;
         qemu_mutex_unlock_iothread();
         do {
             int sig;
@@ -1130,7 +1128,6 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
             exit(1);
         }
         qemu_mutex_lock_iothread();
-        current_cpu = cpu;
         qemu_wait_io_event_common(cpu);
     }
 
@@ -1147,32 +1144,40 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
  * elsewhere.
  */
 static int tcg_cpu_exec(CPUState *cpu);
-static void qemu_cpu_kick_no_halt(void);
+
+struct kick_info {
+    QEMUTimer *timer;
+    CPUState  *cpu;
+};
 
 static void kick_tcg_thread(void *opaque)
 {
-    QEMUTimer *self = *(QEMUTimer **) opaque;
-    timer_mod(self,
+    struct kick_info *info = (struct kick_info *) opaque;
+    CPUState *cpu = atomic_mb_read(&info->cpu);
+
+    timer_mod(info->timer,
               qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
               get_ticks_per_sec() / 10);
-    qemu_cpu_kick_no_halt();
+
+    if (cpu) {
+        cpu_exit(cpu);
+    }
 }
 
-static void *qemu_tcg_cpu_thread_fn(void *arg)
+static void *qemu_tcg_single_cpu_thread_fn(void *arg)
 {
+    struct kick_info info;
     CPUState *cpu = arg;
-    QEMUTimer *kick_timer;
 
     rcu_register_thread();
 
     qemu_mutex_lock_iothread();
     qemu_thread_get_self(cpu->thread);
 
-    CPU_FOREACH(cpu) {
-        cpu->thread_id = qemu_get_thread_id();
-        cpu->created = true;
-        cpu->can_do_io = 1;
-    }
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->created = true;
+    cpu->can_do_io = 1;
+    current_cpu = cpu;
     qemu_cond_signal(&qemu_cpu_cond);
 
     /* wait for initial kick-off after machine start */
@@ -1187,14 +1192,18 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 
     /* Set to kick if we have to do more than one vCPU */
     if (CPU_NEXT(first_cpu)) {
-        kick_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,  kick_tcg_thread, &kick_timer);
-        timer_mod(kick_timer,
+        info.timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,  kick_tcg_thread, &info);
+        info.cpu = NULL;
+        smp_wmb();
+        timer_mod(info.timer,
                   qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
                   get_ticks_per_sec() / 10);
     }
 
     /* process any pending work */
-    atomic_mb_set(&exit_request, 1);
+    CPU_FOREACH(cpu) {
+        atomic_mb_set(&cpu->exit_request, 1);
+    }
 
     cpu = first_cpu;
 
@@ -1207,7 +1216,8 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
             cpu = first_cpu;
         }
 
-        for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
+        for (; cpu != NULL && !cpu->exit_request; cpu = CPU_NEXT(cpu)) {
+            atomic_mb_set(&info.cpu, cpu);
 
             qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
                               (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
@@ -1224,17 +1234,64 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 
         } /* for cpu.. */
 
+        atomic_mb_set(&info.cpu, NULL);
+
         /* Pairs with smp_wmb in qemu_cpu_kick.  */
-        atomic_mb_set(&exit_request, 0);
+        CPU_FOREACH(cpu) {
+            atomic_mb_set(&cpu->exit_request, 0);
+        }
 
         if (use_icount) {
-            int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
+            int64_t deadline =
+                qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
+
+            if (deadline == 0) {
+                qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+            }
+        }
+
+        qemu_tcg_wait_io_event(first_cpu);
+    }
+
+    return NULL;
+}
+
+/* Multi-threaded TCG
+ *
+ * In the multi-threaded case each vCPU has its own thread. The TLS
+ * variable current_cpu can be used deep in the code to find the
+ * current CPUState for a given thread.
+ */
+
+static void *qemu_tcg_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+
+    rcu_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->created = true;
+    cpu->can_do_io = 1;
+    current_cpu = cpu;
+    qemu_cond_signal(&qemu_cpu_cond);
+
+    while (1) {
+        if (!cpu->stopped) {
+            tcg_cpu_exec(cpu);
 
-            if (deadline == 0) {
-                qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+            if (use_icount) {
+                int64_t deadline =
+                    qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
+
+                if (deadline == 0) {
+                    qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+                }
             }
         }
-        qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
+        qemu_tcg_wait_io_event(cpu);
     }
 
     return NULL;
@@ -1259,24 +1316,11 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
 #endif
 }
 
-static void qemu_cpu_kick_no_halt(void)
-{
-    CPUState *cpu;
-    /* Ensure whatever caused the exit has reached the CPU threads before
-     * writing exit_request.
-     */
-    atomic_mb_set(&exit_request, 1);
-    cpu = atomic_mb_read(&tcg_current_cpu);
-    if (cpu) {
-        cpu_exit(cpu);
-    }
-}
-
 void qemu_cpu_kick(CPUState *cpu)
 {
     qemu_cond_broadcast(cpu->halt_cond);
     if (tcg_enabled()) {
-        qemu_cpu_kick_no_halt();
+        cpu_exit(cpu);
     } else {
         qemu_cpu_kick_thread(cpu);
     }
@@ -1342,13 +1386,6 @@ void pause_all_vcpus(void)
 
     if (qemu_in_vcpu_thread()) {
         cpu_stop_current();
-        if (!kvm_enabled()) {
-            CPU_FOREACH(cpu) {
-                cpu->stop = false;
-                cpu->stopped = true;
-            }
-            return;
-        }
     }
 
     while (!all_vcpus_paused()) {
@@ -1382,29 +1419,41 @@ void resume_all_vcpus(void)
 static void qemu_tcg_init_vcpu(CPUState *cpu)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
-    static QemuCond *tcg_halt_cond;
-    static QemuThread *tcg_cpu_thread;
+    static QemuCond *single_tcg_halt_cond;
+    static QemuThread *single_tcg_cpu_thread;
 
-    /* share a single thread for all cpus with TCG */
-    if (!tcg_cpu_thread) {
+    if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) {
         cpu->thread = g_malloc0(sizeof(QemuThread));
         cpu->halt_cond = g_malloc0(sizeof(QemuCond));
         qemu_cond_init(cpu->halt_cond);
-        tcg_halt_cond = cpu->halt_cond;
-        snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
+
+        if (qemu_tcg_mttcg_enabled()) {
+            /* create a thread per vCPU with TCG (MTTCG) */
+            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
                  cpu->cpu_index);
-        qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
-                           cpu, QEMU_THREAD_JOINABLE);
+
+            qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
+                               cpu, QEMU_THREAD_JOINABLE);
+
+        } else {
+            /* share a single thread for all cpus with TCG */
+            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "ALL CPUs/TCG");
+            qemu_thread_create(cpu->thread, thread_name, qemu_tcg_single_cpu_thread_fn,
+                               cpu, QEMU_THREAD_JOINABLE);
+
+            single_tcg_halt_cond = cpu->halt_cond;
+            single_tcg_cpu_thread = cpu->thread;
+        }
 #ifdef _WIN32
         cpu->hThread = qemu_thread_get_handle(cpu->thread);
 #endif
         while (!cpu->created) {
             qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
         }
-        tcg_cpu_thread = cpu->thread;
     } else {
-        cpu->thread = tcg_cpu_thread;
-        cpu->halt_cond = tcg_halt_cond;
+        /* For non-MTTCG cases we share the thread */
+        cpu->thread = single_tcg_cpu_thread;
+        cpu->halt_cond = single_tcg_halt_cond;
     }
 }
 
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index c41d680..15e30da 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -455,8 +455,4 @@ bool memory_region_is_unassigned(MemoryRegion *mr);
 /* vl.c */
 extern int singlestep;
 
-/* cpu-exec.c, accessed with atomic_mb_read/atomic_mb_set */
-extern CPUState *tcg_current_cpu;
-extern bool exit_request;
-
 #endif
diff --git a/translate-all.c b/translate-all.c
index fe1392a..f129308 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -121,47 +121,37 @@ static void *l1_map[V_L1_SIZE];
 TCGContext tcg_ctx;
 
 /* translation block context */
-#ifdef CONFIG_USER_ONLY
 __thread int have_tb_lock;
-#endif
 
 void tb_lock(void)
 {
-#ifdef CONFIG_USER_ONLY
     assert(!have_tb_lock);
     qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
     have_tb_lock++;
-#endif
 }
 
 void tb_unlock(void)
 {
-#ifdef CONFIG_USER_ONLY
     assert(have_tb_lock);
     have_tb_lock--;
     qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
-#endif
 }
 
 bool tb_lock_recursive(void)
 {
-#ifdef CONFIG_USER_ONLY
     if (have_tb_lock) {
         return false;
     }
     tb_lock();
-#endif
     return true;
 }
 
 void tb_lock_reset(void)
 {
-#ifdef CONFIG_USER_ONLY
     if (have_tb_lock) {
         qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
         have_tb_lock = 0;
     }
-#endif
 }
 
 static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
-- 
2.7.3

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC v1 10/11] tcg: grab iothread lock in cpu-exec interrupt handling
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 10/11] tcg: grab iothread lock in cpu-exec interrupt handling Alex Bennée
@ 2016-03-18 16:48   ` Paolo Bonzini
  2016-03-22 12:03     ` Alex Bennée
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2016-03-18 16:48 UTC (permalink / raw)
  To: Alex Bennée, mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: Richard Henderson, mark.burton, qemu-devel, Andreas Färber,
	Peter Crosthwaite



On 18/03/2016 17:18, Alex Bennée wrote:
> From: "Emilio G. Cota" <cota@braap.org>
> 
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> 
> ---
> v1 (ajb)
>  - pulled from emilio/mttcg series
> ---
>  cpu-exec.c        | 33 +++++++++++++++++++++++++++------
>  include/qom/cpu.h |  1 +
>  2 files changed, 28 insertions(+), 6 deletions(-)
> 
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 76891fd..6acaf25 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -351,6 +351,29 @@ static void cpu_handle_debug_exception(CPUState *cpu)
>      cc->debug_excp_handler(cpu);
>  }
>  
> +#ifdef CONFIG_SOFTMMU
> +static inline void cpu_exit_loop_lock(CPUState *cpu)
> +{
> +    qemu_mutex_lock_iothread();
> +    cpu->cpu_loop_exit_locked = true;
> +}
> +
> +static inline void cpu_exit_loop_lock_reset(CPUState *cpu)
> +{
> +    if (cpu->cpu_loop_exit_locked) {
> +        cpu->cpu_loop_exit_locked = false;
> +        qemu_mutex_unlock_iothread();
> +    }
> +}

In the meanwhile we got qemu_mutex_iothread_locked(), so these two
inlines are not necessary anymore.
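
A minimal sketch of the simplification (untested; it assumes the BQL
can only be held at these points because cpu_exit_loop_lock took it,
i.e. TCG itself now runs without the lock):

    static inline void cpu_exit_loop_lock(CPUState *cpu)
    {
        qemu_mutex_lock_iothread();
    }

    static inline void cpu_exit_loop_lock_reset(CPUState *cpu)
    {
        /* only unlock if the interrupt path above actually locked */
        if (qemu_mutex_iothread_locked()) {
            qemu_mutex_unlock_iothread();
        }
    }

With that, the cpu_loop_exit_locked flag in CPUState can go away too.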

Paolo

> +#else
> +static inline void cpu_exit_loop_lock(CPUState *cpu)
> +{ }
> +
> +static inline void cpu_exit_loop_lock_reset(CPUState *cpu)
> +{ }
> +#endif
> +
>  /* main execution loop */
>  
>  int cpu_exec(CPUState *cpu)
> @@ -450,12 +473,7 @@ int cpu_exec(CPUState *cpu)
>              for(;;) {
>                  interrupt_request = cpu->interrupt_request;
>                  if (unlikely(interrupt_request)) {
> -                    /* FIXME: this needs to take the iothread lock.
> -                     * For this we need to find all places in
> -                     * cc->cpu_exec_interrupt that can call cpu_loop_exit,
> -                     * and call qemu_unlock_iothread_mutex() there.  Else,
> -                     * add a flag telling cpu_loop_exit() to unlock it.
> -                     */
> +                    cpu_exit_loop_lock(cpu);
>  
>                      if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
>                          /* Mask out external interrupts for this step. */
> @@ -510,6 +528,8 @@ int cpu_exec(CPUState *cpu)
>                          next_tb = 0;
>                      }
>  
> +                    cpu_exit_loop_lock_reset(cpu);
> +
>                  }
>                  if (unlikely(cpu->exit_request
>                               || replay_has_interrupt())) {
> @@ -630,6 +650,7 @@ int cpu_exec(CPUState *cpu)
>  
>              cpu->can_do_io = 1;
>              tb_lock_reset();
> +            cpu_exit_loop_lock_reset(cpu);
>          }
>      } /* for(;;) */
>  
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index d6cb7b8..954d97d 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -295,6 +295,7 @@ struct CPUState {
>      bool crash_occurred;
>      bool exit_request;
>      bool tb_invalidated_flag;
> +    bool cpu_loop_exit_locked;
>      uint32_t interrupt_request;
>      int singlestep_enabled;
>      int64_t icount_extra;
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC v1 09/11] tcg: drop global lock during TCG code execution
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 09/11] tcg: drop global lock during TCG code execution Alex Bennée
@ 2016-03-18 16:49   ` Paolo Bonzini
  2016-03-23  9:19   ` KONRAD Frederic
  1 sibling, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2016-03-18 16:49 UTC (permalink / raw)
  To: Alex Bennée, mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: Eduardo Habkost, Michael S. Tsirkin, Peter Crosthwaite,
	mark.burton, qemu-devel, Richard Henderson



On 18/03/2016 17:18, Alex Bennée wrote:
> From: KONRAD Frederic <fred.konrad@greensocs.com>
> 
> This finally allows TCG to benefit from the iothread introduction: Drop
> the global mutex while running pure TCG CPU code. Reacquire the lock
> when entering MMIO or PIO emulation, or when leaving the TCG loop.
> 
> We have to revert a few optimization for the current TCG threading
> model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
> kicking it in qemu_cpu_kick. We also need to disable RAM block
> reordering until we have a more efficient locking mechanism at hand.
> 
> Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
> These numbers demonstrate where we gain something:
> 
> 20338 jan       20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
> 20337 jan       20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm
> 
> The guest CPU was fully loaded, but the iothread could still run mostly
> independent on a second core. Without the patch we don't get beyond
> 
> 32206 jan       20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
> 32204 jan       20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm
> 
> We don't benefit significantly, though, when the guest is not fully
> loading a host CPU.
> 
> Message-Id: <1439220437-23957-10-git-send-email-fred.konrad@greensocs.com>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> [AJB: -smp single-threaded fix, rm old info from commit msg]
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

You should probably squash this and patch 10 together.

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC v1 01/11] tcg: move tb_find_fast outside the tb_lock critical section
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 01/11] tcg: move tb_find_fast outside the tb_lock critical section Alex Bennée
@ 2016-03-18 16:54   ` Paolo Bonzini
  2016-03-21 21:50   ` Emilio G. Cota
  1 sibling, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2016-03-18 16:54 UTC (permalink / raw)
  To: Alex Bennée, mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: Richard Henderson, mark.burton, qemu-devel, Andreas Färber,
	Peter Crosthwaite



On 18/03/2016 17:18, Alex Bennée wrote:
> -#endif
> -
> -    /* if no translated code available, then translate it now */
> -    cpu->tb_invalidated_flag = 0;

Moving this reset from here...

> -                if (cpu->tb_invalidated_flag) {
> +                if (atomic_read(&cpu->tb_invalidated_flag)) {
>                      /* as some TB could have been invalidated because
>                         of a tb_flush while generating the code, we
>                         must recompute the hash index here */
>                      next_tb = 0;
> +
> +                    /* Clear the flag, we've now observed the flush.  */
> +                    tb_lock_recursive();
> +                    cpu->tb_invalidated_flag = 0;
>                  }

... to here can probably be moved earlier, into Sergey's (my? :D)
"make tb_invalidated_flag per-CPU" patch.

Then this patch can just add the tb_lock_recursive.

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC v1 03/11] tcg: comment on which functions have to be called with tb_lock held
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 03/11] tcg: comment on which functions have to be called with tb_lock held Alex Bennée
@ 2016-03-18 16:59   ` Paolo Bonzini
  2016-03-21 21:50     ` Emilio G. Cota
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2016-03-18 16:59 UTC (permalink / raw)
  To: Alex Bennée, mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: Richard Henderson, mark.burton, qemu-devel, Andreas Färber,
	Peter Crosthwaite



On 18/03/2016 17:18, Alex Bennée wrote:
> +
> +    /* Protected by tb_lock.  */

Only writes are protected by tb_lock.  Reads happen outside the lock.

Reads are not quite thread-safe yet, because of tb_flush.  To fix
that, we can either use the async_safe_run() mechanism from Fred or,
preferably, move the code generation buffer under RCU.

Because tb_flush is really rare, my suggestion is simply to allocate two
code generation buffers and do something like

static int which_buffer_is_in_use_bit_mask = 1;
...

   /* in tb_flush */
   assert (which_buffer_is_in_use_bit_mask != 3);
   if (which_buffer_is_in_use_bit_mask == 1) {
       which_buffer_is_in_use_bit_mask |= 2;
       call_rcu(function doing which_buffer_is_in_use_bit_mask &= ~1);
       point TCG to second buffer
    } else if (which_buffer_is_in_use_bit_mask == 2) {
       which_buffer_is_in_use_bit_mask |= 1;
       call_rcu(function doing which_buffer_is_in_use_bit_mask &= ~2);
       point TCG to first buffer
    }

Basically, we just assert that call_rcu makes at least one pass between
two tb_flushes.
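
Fleshed out with call_rcu1(), the flip could look roughly like this
(only a sketch; the two buffers and all names are illustrative):

    static struct rcu_head tb_flush_rcu;
    static int which_buffer_is_in_use_bit_mask = 1;

    static void clear_buffer1_in_use(struct rcu_head *head)
    {
        which_buffer_is_in_use_bit_mask &= ~1;
    }

    static void clear_buffer2_in_use(struct rcu_head *head)
    {
        which_buffer_is_in_use_bit_mask &= ~2;
    }

    /* called from tb_flush, with tb_lock held */
    static void tb_flush_switch_buffer(void)
    {
        assert(which_buffer_is_in_use_bit_mask != 3);
        if (which_buffer_is_in_use_bit_mask == 1) {
            which_buffer_is_in_use_bit_mask |= 2;
            call_rcu1(&tb_flush_rcu, clear_buffer1_in_use);
            /* point TCG to the second buffer */
            tcg_ctx.code_gen_buffer = code_gen_buffer2;
        } else {
            which_buffer_is_in_use_bit_mask |= 1;
            call_rcu1(&tb_flush_rcu, clear_buffer2_in_use);
            /* point TCG to the first buffer */
            tcg_ctx.code_gen_buffer = code_gen_buffer1;
        }
    }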

All this is also a prerequisite for patch 1.

Paolo

>      struct TranslationBlock *tb_jmp_cache[TB_JMP_CACHE_SIZE];
> +

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC v1 01/11] tcg: move tb_find_fast outside the tb_lock critical section
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 01/11] tcg: move tb_find_fast outside the tb_lock critical section Alex Bennée
  2016-03-18 16:54   ` Paolo Bonzini
@ 2016-03-21 21:50   ` Emilio G. Cota
  2016-03-21 22:08     ` Peter Maydell
  2016-03-22 11:55     ` Alex Bennée
  1 sibling, 2 replies; 28+ messages in thread
From: Emilio G. Cota @ 2016-03-21 21:50 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, Peter Crosthwaite, mark.burton, qemu-devel, a.rigo,
	pbonzini, serge.fdrv, Richard Henderson, Andreas Färber,
	fred.konrad

On Fri, Mar 18, 2016 at 16:18:42 +0000, Alex Bennée wrote:
> From: KONRAD Frederic <fred.konrad@greensocs.com>
> 
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> [AJB: minor checkpatch fixes]
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> 
> ---
> v1(ajb)
>   - checkpatch fixes
> ---
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 07545aa..52f25de 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -225,8 +225,9 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
>      phys_page1 = phys_pc & TARGET_PAGE_MASK;
>      h = tb_phys_hash_func(phys_pc);
>      for (ptb1 = &tcg_ctx.tb_ctx.tb_phys_hash[h];
> -         (tb = *ptb1) != NULL;
> +         (tb = atomic_read(ptb1)) != NULL;
>           ptb1 = &tb->phys_hash_next) {
> +        smp_read_barrier_depends();
>          if (tb->pc != pc ||
>              tb->page_addr[0] != phys_page1 ||
>              tb->cs_base != cs_base ||
> @@ -254,7 +255,18 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
[ Adding this missing line to the diff for clarity ]
           /* Move the TB to the head of the list */
>          *ptb1 = tb->phys_hash_next;
>          tb->phys_hash_next = tcg_ctx.tb_ctx.tb_phys_hash[h];
>          tcg_ctx.tb_ctx.tb_phys_hash[h] = tb;

This function, as is, doesn't really just "find"; two concurrent "finders"
could race here by *writing* to the head of the list at the same time.

The fix is to get rid of this write entirely; moving the just-found TB to
the head of the list is not really that necessary thanks to the CPU's
tb_jmp_cache table. This fix would make the function read-only, which
is what the function's name implies.
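
The lookup loop then becomes purely read-only, roughly (i.e. the hunk
above minus the move-to-front; the condition list is abbreviated):

    for (ptb1 = &tcg_ctx.tb_ctx.tb_phys_hash[h];
         (tb = atomic_read(ptb1)) != NULL;
         ptb1 = &tb->phys_hash_next) {
        smp_read_barrier_depends();
        if (tb->pc == pc &&
            tb->page_addr[0] == phys_page1 &&
            tb->cs_base == cs_base &&
            tb->flags == flags) {
            return tb;    /* no write to the list head */
        }
    }
    return NULL;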

Further, I'd like to see tb_phys_hash use the RCU queue primitives; it
makes everything easier to understand (and we avoid sprinkling the code
base with smp_read_barrier_depends()).

I have these two changes queued up as part of my upcoming series, which I'm
basing on your patchset.

Thanks for putting these changes together!

		Emilio

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC v1 03/11] tcg: comment on which functions have to be called with tb_lock held
  2016-03-18 16:59   ` Paolo Bonzini
@ 2016-03-21 21:50     ` Emilio G. Cota
  2016-03-21 22:12       ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Emilio G. Cota @ 2016-03-21 21:50 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: mttcg, Peter Crosthwaite, mark.burton, qemu-devel, a.rigo,
	serge.fdrv, Richard Henderson, Alex Bennée,
	Andreas Färber, fred.konrad

On Fri, Mar 18, 2016 at 17:59:46 +0100, Paolo Bonzini wrote:
> On 18/03/2016 17:18, Alex Bennée wrote:
> > +
> > +    /* Protected by tb_lock.  */
> 
> Only writes are protected by tb_lock.  Reads happen outside the lock.
> 
> Reads are not quite thread-safe yet, because of tb_flush.  To fix
> that, we can either use the async_safe_run() mechanism from Fred or,
> preferably, move the code generation buffer under RCU.

A third approach (which I prefer) is to protect tb_jmp_cache with
a seqlock. That way invalidates (via tlb_flush from other CPUs, or
via tb_flush) are picked up if they're racing with concurrent reads.
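
Roughly like this (a sketch; cpu->tb_cache_seq is a made-up field, the
calls are the existing QemuSeqLock API):

    /* reader: retry the lookup if it raced with an invalidate */
    TranslationBlock *tb;
    unsigned start;

    do {
        start = seqlock_read_begin(&cpu->tb_cache_seq);
        tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
    } while (seqlock_read_retry(&cpu->tb_cache_seq, start));

    /* writer, e.g. when wiping the cache in tlb_flush/tb_flush */
    seqlock_write_lock(&cpu->tb_cache_seq);
    memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
    seqlock_write_unlock(&cpu->tb_cache_seq);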

> Because tb_flush is really rare, my suggestion is simply to allocate two
> code generation buffers and do something like
> 
> static int which_buffer_is_in_use_bit_mask = 1;
> ...
> 
>    /* in tb_flush */
>    assert (which_buffer_is_in_use_bit_mask != 3);
>    if (which_buffer_is_in_use_bit_mask == 1) {
>        which_buffer_is_in_use_bit_mask |= 2;
>        call_rcu(function doing which_buffer_is_in_use_bit_mask &= ~1);
>        point TCG to second buffer
>     } else if (which_buffer_is_in_use_bit_mask == 2) {
>        which_buffer_is_in_use_bit_mask |= 1;
>        call_rcu(function doing which_buffer_is_in_use_bit_mask &= ~2);
>        point TCG to first buffer
>     }
> 
> Basically, we just assert that call_rcu makes at least one pass between
> two tb_flushes.
> 
> All this is also a prerequisite for patch 1.

The problem with this approach is that the "point TCG to second buffer"
is not just a question of pointing code_gen_buffer to a new address;
we'd have to create a new tcg_ctx struct, since tcg_ctx has quite a few
elements that are dependent on code_gen_buffer (e.g. s->code_ptr,
s->code_buf). And this could end up with readers reading a partially
up-to-date (i.e. corrupt) tcg_ctx.

I know you're not enthusiastic about it, but I think a mechanism to "stop
all CPUs and wait until they have indeed stopped" is in this case justified.

I'm preparing an RFC with these two changes (seqlock and stop all cpus mechanism)
on top of these base patches.

Thanks,

		Emilio

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC v1 01/11] tcg: move tb_find_fast outside the tb_lock critical section
  2016-03-21 21:50   ` Emilio G. Cota
@ 2016-03-21 22:08     ` Peter Maydell
  2016-03-21 23:59       ` Emilio G. Cota
  2016-03-22 11:55     ` Alex Bennée
  1 sibling, 1 reply; 28+ messages in thread
From: Peter Maydell @ 2016-03-21 22:08 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: mttcg, Peter Crosthwaite, Mark Burton, Alvise Rigo,
	QEMU Developers, Sergey Fedorov, Paolo Bonzini,
	KONRAD Frédéric, Alex Bennée, Andreas Färber,
	Richard Henderson

On 21 March 2016 at 21:50, Emilio G. Cota <cota@braap.org> wrote:
> This function, as is, doesn't really just "find"; two concurrent "finders"
> could race here by *writing* to the head of the list at the same time.
>
> The fix is to get rid of this write entirely; moving the just-found TB to
> the head of the list is not really that necessary thanks to the CPU's
> tb_jmp_cache table. This fix would make the function read-only, which
> is what the function's name implies.

It is not _necessary_, but it is a performance optimization to
speed up the "missed in the TLB" case. (A TLB flush will wipe
the tb_jmp_cache table.) From the thread where the move-to-front-of-list
behaviour was added in 2010, benefits cited:

# The exact numbers depend on complexity of guest system.
# - For basic Debian system (no X-server) on versatilepb we observed
# 25% decrease of boot time.
# - For to-be released Samsung LIMO platform on S5PC110 board we
# observed 2x (for older version) and 3x (for newer version)
# decrease of boot time.
# - Small CPU-intensive benchmarks are not affected because they are
# completely handled by 'tb_find_fast'.
#
# We also noticed better response time for heavyweight GUI applications,
# but I do not know how to measure it accurately.
(https://lists.gnu.org/archive/html/qemu-devel/2010-12/msg00380.html)

I think what's happening here is that for guest CPUs where TLB
invalidation happens fairly frequently (notably ARM, because
we don't model ASIDs in the QEMU TLB and thus have to flush
the TLB on any context switch) the case of "we didn't hit in
the TLB but we do have this TB and it was used really recently"
happens often enough to make it worthwhile for the
tb_find_physical() code to keep its hash buckets in LRU order.

Obviously that's all five-year-old data now, so a pinch of
salt may be indicated, but I'd rather we didn't just remove
the optimisation without some benchmarking to check that it's
not significant. A 2x difference is huge.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC v1 03/11] tcg: comment on which functions have to be called with tb_lock held
  2016-03-21 21:50     ` Emilio G. Cota
@ 2016-03-21 22:12       ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2016-03-21 22:12 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: mttcg, Peter Crosthwaite, mark.burton, a.rigo, qemu-devel,
	serge.fdrv, fred.konrad, Alex Bennée, Andreas Färber,
	Richard Henderson



On 21/03/2016 22:50, Emilio G. Cota wrote:
> The problem with this approach is that the "point TCG to second buffer"
> is not just a question of pointing code_gen_buffer to a new address;
> we'd have to create a new tcg_ctx struct, since tcg_ctx has quite a few
> elements that are dependent on code_gen_buffer (e.g. s->code_ptr,
> s->code_buf). 

Are these (or other fields similarly dependent on code_gen_buffer) ever
read outside tb_lock?  A quick "git grep -wl" suggests that they are
only used from tcg/, which should only run while tb_lock is held.

If not, it would be enough to call tcg_prologue_init from tb_flush.
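
I.e. something like this in tb_flush, after switching buffers (sketch;
new_buffer/new_size are illustrative names):

    tcg_ctx.code_gen_buffer = new_buffer;
    tcg_ctx.code_gen_buffer_size = new_size;
    /* regenerates the prologue and resets code_buf/code_ptr */
    tcg_prologue_init(&tcg_ctx);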

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC v1 01/11] tcg: move tb_find_fast outside the tb_lock critical section
  2016-03-21 22:08     ` Peter Maydell
@ 2016-03-21 23:59       ` Emilio G. Cota
  2016-03-22  8:29         ` Paolo Bonzini
  2016-03-22 11:59         ` Alex Bennée
  0 siblings, 2 replies; 28+ messages in thread
From: Emilio G. Cota @ 2016-03-21 23:59 UTC (permalink / raw)
  To: Peter Maydell
  Cc: mttcg, Peter Crosthwaite, Mark Burton, Alvise Rigo,
	QEMU Developers, Sergey Fedorov, Paolo Bonzini,
	KONRAD Frédéric, Alex Bennée, Andreas Färber,
	Richard Henderson

On Mon, Mar 21, 2016 at 22:08:06 +0000, Peter Maydell wrote:
> It is not _necessary_, but it is a performance optimization to
> speed up the "missed in the TLB" case. (A TLB flush will wipe
> the tb_jmp_cache table.) From the thread where the move-to-front-of-list
> behaviour was added in 2010, benefits cited:

(snip)
> I think what's happening here is that for guest CPUs where TLB
> invalidation happens fairly frequently (notably ARM, because
> we don't model ASIDs in the QEMU TLB and thus have to flush
> the TLB on any context switch) the case of "we didn't hit in
> the TLB but we do have this TB and it was used really recently"
> happens often enough to make it worthwhile for the
> tb_find_physical() code to keep its hash buckets in LRU order.
> 
> Obviously that's all five year old data now, so a pinch of
> salt may be indicated, but I'd rather we didn't just remove
> the optimisation without some benchmarking to check that it's
> not significant. A 2x difference is huge.

Good point. Most of my tests have been on x86-on-x86, and the
difference there (for many CPU-intensive benchmarks such as SPEC) was
negligible.

Just tested the current master booting Alex' debian ARM image, without
LRU, and I see a 20% increase in boot time.

I'll add per-bucket locks to keep the same behaviour without hurting
scalability.

Thanks,

		Emilio

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC v1 01/11] tcg: move tb_find_fast outside the tb_lock critical section
  2016-03-21 23:59       ` Emilio G. Cota
@ 2016-03-22  8:29         ` Paolo Bonzini
  2016-03-22 11:59         ` Alex Bennée
  1 sibling, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2016-03-22  8:29 UTC (permalink / raw)
  To: Emilio G. Cota, Peter Maydell
  Cc: mttcg, Peter Crosthwaite, Mark Burton, QEMU Developers,
	Alvise Rigo, Sergey Fedorov, Richard Henderson, Alex Bennée,
	Andreas Färber, KONRAD Frédéric



On 22/03/2016 00:59, Emilio G. Cota wrote:
> Good point. Most of my tests have been on x86-on-x86, and the
> difference there (for many CPU-intensive benchmarks such as SPEC) was
> negligible.
> 
> Just tested the current master booting Alex' debian ARM image, without
> LRU, and I see a 20% increase in boot time.
> 
> I'll add per-bucket locks to keep the same behaviour without hurting
> scalability.

You probably should skip the MRU move if the TB is not within the first
N items of the list (e.g. N=4).  Then you take the lock much less.
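
Something along these lines (a sketch; the per-bucket lock and the
bucket type are hypothetical):

    #define MRU_PROMOTE_DEPTH 4

    /* Called once the TB has been found at position 'depth' in its
     * chain.  Plain stores shown; lock-free readers would want
     * atomic_set() and a re-check of ptb1 under the lock.
     */
    static void maybe_promote_tb(TBPhysHashBucket *bucket,
                                 TranslationBlock **ptb1,
                                 TranslationBlock *tb, int depth)
    {
        if (depth < MRU_PROMOTE_DEPTH) {
            return;    /* common case: no lock taken at all */
        }
        qemu_mutex_lock(&bucket->lock);
        *ptb1 = tb->phys_hash_next;
        tb->phys_hash_next = bucket->head;
        bucket->head = tb;
        qemu_mutex_unlock(&bucket->lock);
    }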

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC v1 01/11] tcg: move tb_find_fast outside the tb_lock critical section
  2016-03-21 21:50   ` Emilio G. Cota
  2016-03-21 22:08     ` Peter Maydell
@ 2016-03-22 11:55     ` Alex Bennée
  1 sibling, 0 replies; 28+ messages in thread
From: Alex Bennée @ 2016-03-22 11:55 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: mttcg, Peter Crosthwaite, mark.burton, qemu-devel, a.rigo,
	pbonzini, serge.fdrv, Richard Henderson, Andreas Färber,
	fred.konrad


Emilio G. Cota <cota@braap.org> writes:

> On Fri, Mar 18, 2016 at 16:18:42 +0000, Alex Bennée wrote:
>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>
>> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> [AJB: minor checkpatch fixes]
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>
>> ---
>> v1(ajb)
>>   - checkpatch fixes
>> ---
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index 07545aa..52f25de 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -225,8 +225,9 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
>>      phys_page1 = phys_pc & TARGET_PAGE_MASK;
>>      h = tb_phys_hash_func(phys_pc);
>>      for (ptb1 = &tcg_ctx.tb_ctx.tb_phys_hash[h];
>> -         (tb = *ptb1) != NULL;
>> +         (tb = atomic_read(ptb1)) != NULL;
>>           ptb1 = &tb->phys_hash_next) {
>> +        smp_read_barrier_depends();
>>          if (tb->pc != pc ||
>>              tb->page_addr[0] != phys_page1 ||
>>              tb->cs_base != cs_base ||
>> @@ -254,7 +255,18 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
> [ Adding this missing line to the diff for clarity ]

I have to admit that clarity is one thing the code in this area could do
with. I find it hard to follow on the best of days.

>            /* Move the TB to the head of the list */
>>          *ptb1 = tb->phys_hash_next;
>>          tb->phys_hash_next = tcg_ctx.tb_ctx.tb_phys_hash[h];
>>          tcg_ctx.tb_ctx.tb_phys_hash[h] = tb;
>
> This function, as is, doesn't really just "find"; two concurrent "finders"
> could race here by *writing* to the head of the list at the same time.
>
> The fix is to get rid of this write entirely; moving the just-found TB to
> the head of the list is not really that necessary thanks to the CPU's
> tb_jmp_cache table. This fix would make the function read-only, which
> is what the function's name implies.
>
> Further, I'd like to see tb_phys_hash to use the RCU queue primitives; it
> makes everything easier to understand (and we avoid sprinkling the code
> base with smp_barrier_depends).
>
> I have these two changes queued up as part of my upcoming series, which I'm
> basing on your patchset.

Cool, I look forward to it.

>
> Thanks for putting these changes together!

This was exactly my aim, getting the common base stuff reviewed so the
competing plurality of approaches can build on it as we shake out the
design ;-)

>
> 		Emilio


--
Alex Bennée

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC v1 01/11] tcg: move tb_find_fast outside the tb_lock critical section
  2016-03-21 23:59       ` Emilio G. Cota
  2016-03-22  8:29         ` Paolo Bonzini
@ 2016-03-22 11:59         ` Alex Bennée
  1 sibling, 0 replies; 28+ messages in thread
From: Alex Bennée @ 2016-03-22 11:59 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: mttcg, Peter Maydell, Peter Crosthwaite, Mark Burton,
	Alvise Rigo, QEMU Developers, Sergey Fedorov, Paolo Bonzini,
	KONRAD Frédéric, Andreas Färber,
	Richard Henderson


Emilio G. Cota <cota@braap.org> writes:

> On Mon, Mar 21, 2016 at 22:08:06 +0000, Peter Maydell wrote:
>> It is not _necessary_, but it is a performance optimization to
>> speed up the "missed in the TLB" case. (A TLB flush will wipe
>> the tb_jmp_cache table.) From the thread where the move-to-front-of-list
>> behaviour was added in 2010, benefits cited:
>
> (snip)
>> I think what's happening here is that for guest CPUs where TLB
>> invalidation happens fairly frequently (notably ARM, because
>> we don't model ASIDs in the QEMU TLB and thus have to flush
>> the TLB on any context switch) the case of "we didn't hit in
>> the TLB but we do have this TB and it was used really recently"
>> happens often enough to make it worthwhile for the
>> tb_find_physical() code to keep its hash buckets in LRU order.
>>
>> Obviously that's all five year old data now, so a pinch of
>> salt may be indicated, but I'd rather we didn't just remove
>> the optimisation without some benchmarking to check that it's
>> not significant. A 2x difference is huge.
>
> Good point. Most of my tests have been on x86-on-x86, and the
> difference there (for many CPU-intensive benchmarks such as SPEC) was
> negligible.
>
> Just tested the current master booting Alex' debian ARM image, without
> LRU, and I see a 20% increase in boot time.

Also see:

https://github.com/stsquad/kvm-unit-tests/tree/mttcg/current-tests-v5

./run-tests.sh -g tcg -t

The tcg tests are designed to exercise the TB find and linking logic.
The computed and paged variants of the test always exit the run loop to
look up the next TB. Granted the tests are pathological cases but useful
for comparing different approaches at the edge cases.

>
> I'll add per-bucket locks to keep the same behaviour without hurting
> scalability.
>
> Thanks,
>
> 		Emilio


--
Alex Bennée

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC v1 10/11] tcg: grab iothread lock in cpu-exec interrupt handling
  2016-03-18 16:48   ` Paolo Bonzini
@ 2016-03-22 12:03     ` Alex Bennée
  0 siblings, 0 replies; 28+ messages in thread
From: Alex Bennée @ 2016-03-22 12:03 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: mttcg, Peter Crosthwaite, mark.burton, qemu-devel, a.rigo, cota,
	serge.fdrv, Richard Henderson, Andreas Färber, fred.konrad


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 18/03/2016 17:18, Alex Bennée wrote:
>> From: "Emilio G. Cota" <cota@braap.org>
>>
>> Signed-off-by: Emilio G. Cota <cota@braap.org>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>
>> ---
>> v1 (ajb)
>>  - pulled from emilio/mttcg series
>> ---
>>  cpu-exec.c        | 33 +++++++++++++++++++++++++++------
>>  include/qom/cpu.h |  1 +
>>  2 files changed, 28 insertions(+), 6 deletions(-)
>>
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index 76891fd..6acaf25 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -351,6 +351,29 @@ static void cpu_handle_debug_exception(CPUState *cpu)
>>      cc->debug_excp_handler(cpu);
>>  }
>>
>> +#ifdef CONFIG_SOFTMMU
>> +static inline void cpu_exit_loop_lock(CPUState *cpu)
>> +{
>> +    qemu_mutex_lock_iothread();
>> +    cpu->cpu_loop_exit_locked = true;
>> +}
>> +
>> +static inline void cpu_exit_loop_lock_reset(CPUState *cpu)
>> +{
>> +    if (cpu->cpu_loop_exit_locked) {
>> +        cpu->cpu_loop_exit_locked = false;
>> +        qemu_mutex_unlock_iothread();
>> +    }
>> +}
>
> In the meanwhile we got qemu_mutex_iothread_locked(), so these two
> inlines are not necessary anymore.

Good point, I'll fix that up as I squash them.

>
> Paolo
>
>> +#else
>> +static inline void cpu_exit_loop_lock(CPUState *cpu)
>> +{ }
>> +
>> +static inline void cpu_exit_loop_lock_reset(CPUState *cpu)
>> +{ }
>> +#endif
>> +
>>  /* main execution loop */
>>
>>  int cpu_exec(CPUState *cpu)
>> @@ -450,12 +473,7 @@ int cpu_exec(CPUState *cpu)
>>              for(;;) {
>>                  interrupt_request = cpu->interrupt_request;
>>                  if (unlikely(interrupt_request)) {
>> -                    /* FIXME: this needs to take the iothread lock.
>> -                     * For this we need to find all places in
>> -                     * cc->cpu_exec_interrupt that can call cpu_loop_exit,
>> -                     * and call qemu_unlock_iothread_mutex() there.  Else,
>> -                     * add a flag telling cpu_loop_exit() to unlock it.
>> -                     */
>> +                    cpu_exit_loop_lock(cpu);
>>
>>                      if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
>>                          /* Mask out external interrupts for this step. */
>> @@ -510,6 +528,8 @@ int cpu_exec(CPUState *cpu)
>>                          next_tb = 0;
>>                      }
>>
>> +                    cpu_exit_loop_lock_reset(cpu);
>> +
>>                  }
>>                  if (unlikely(cpu->exit_request
>>                               || replay_has_interrupt())) {
>> @@ -630,6 +650,7 @@ int cpu_exec(CPUState *cpu)
>>
>>              cpu->can_do_io = 1;
>>              tb_lock_reset();
>> +            cpu_exit_loop_lock_reset(cpu);
>>          }
>>      } /* for(;;) */
>>
>> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
>> index d6cb7b8..954d97d 100644
>> --- a/include/qom/cpu.h
>> +++ b/include/qom/cpu.h
>> @@ -295,6 +295,7 @@ struct CPUState {
>>      bool crash_occurred;
>>      bool exit_request;
>>      bool tb_invalidated_flag;
>> +    bool cpu_loop_exit_locked;
>>      uint32_t interrupt_request;
>>      int singlestep_enabled;
>>      int64_t icount_extra;
>>


--
Alex Bennée

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC v1 09/11] tcg: drop global lock during TCG code execution
  2016-03-18 16:18 ` [Qemu-devel] [RFC v1 09/11] tcg: drop global lock during TCG code execution Alex Bennée
  2016-03-18 16:49   ` Paolo Bonzini
@ 2016-03-23  9:19   ` KONRAD Frederic
  2016-03-23 16:27     ` Alex Bennée
  1 sibling, 1 reply; 28+ messages in thread
From: KONRAD Frederic @ 2016-03-23  9:19 UTC (permalink / raw)
  To: Alex Bennée, mttcg, a.rigo, serge.fdrv, cota
  Cc: Eduardo Habkost, Michael S. Tsirkin, jan.kiszka,
	Peter Crosthwaite, mark.burton, qemu-devel, pbonzini,
	Richard Henderson

Hi Alex,

Thanks for pulling all that together!

The original author of this patch was Jan Kiszka
(https://lists.gnu.org/archive/html/qemu-devel/2011-09/msg03323.html).
The attribution has probably been dropped during a rebase.

Thanks,
Fred

Le 18/03/2016 17:18, Alex Bennée a écrit :
> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> This finally allows TCG to benefit from the iothread introduction: Drop
> the global mutex while running pure TCG CPU code. Reacquire the lock
> when entering MMIO or PIO emulation, or when leaving the TCG loop.
>
> We have to revert a few optimization for the current TCG threading
> model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
> kicking it in qemu_cpu_kick. We also need to disable RAM block
> reordering until we have a more efficient locking mechanism at hand.
>
> Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
> These numbers demonstrate where we gain something:
>
> 20338 jan       20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
> 20337 jan       20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm
>
> The guest CPU was fully loaded, but the iothread could still run mostly
> independent on a second core. Without the patch we don't get beyond
>
> 32206 jan       20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
> 32204 jan       20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm
>
> We don't benefit significantly, though, when the guest is not fully
> loading a host CPU.
>
> Message-Id: <1439220437-23957-10-git-send-email-fred.konrad@greensocs.com>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> [AJB: -smp single-threaded fix, rm old info from commit msg]
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
> ---
> v1 (ajb):
>    - SMP failure now fixed by previous commit
> Changes from Fred Konrad (mttcg-v7 via paolo):
>    * Rebase on the current HEAD.
>    * Fixes a deadlock in qemu_devices_reset().
>    * Remove the mutex in address_space_*
> ---
>   cpu-exec.c         | 10 ++++++++++
>   cpus.c             | 26 +++-----------------------
>   cputlb.c           |  4 ++++
>   exec.c             | 12 ++++++++++++
>   hw/i386/kvmvapic.c |  3 +++
>   softmmu_template.h | 17 +++++++++++++++++
>   translate-all.c    | 11 +++++++++--
>   7 files changed, 58 insertions(+), 25 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 3572256..76891fd 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -28,6 +28,7 @@
>   #include "qemu/rcu.h"
>   #include "exec/tb-hash.h"
>   #include "exec/log.h"
> +#include "qemu/main-loop.h"
>   #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
>   #include "hw/i386/apic.h"
>   #endif
> @@ -449,6 +450,13 @@ int cpu_exec(CPUState *cpu)
>               for(;;) {
>                   interrupt_request = cpu->interrupt_request;
>                   if (unlikely(interrupt_request)) {
> +                    /* FIXME: this needs to take the iothread lock.
> +                     * For this we need to find all places in
> +                     * cc->cpu_exec_interrupt that can call cpu_loop_exit,
> +                     * and call qemu_unlock_iothread_mutex() there.  Else,
> +                     * add a flag telling cpu_loop_exit() to unlock it.
> +                     */
> +
>                       if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
>                           /* Mask out external interrupts for this step. */
>                           interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
> @@ -501,6 +509,7 @@ int cpu_exec(CPUState *cpu)
>                              the program flow was changed */
>                           next_tb = 0;
>                       }
> +
>                   }
>                   if (unlikely(cpu->exit_request
>                                || replay_has_interrupt())) {
> @@ -618,6 +627,7 @@ int cpu_exec(CPUState *cpu)
>               g_assert(env == &x86_cpu->env);
>   #endif
>   #endif /* buggy compiler */
> +
>               cpu->can_do_io = 1;
>               tb_lock_reset();
>           }
> diff --git a/cpus.c b/cpus.c
> index a87fbf9..0e3d5f9 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -911,8 +911,6 @@ static void qemu_kvm_init_cpu_signals(CPUState *cpu)
>   #endif /* _WIN32 */
>   
>   static QemuMutex qemu_global_mutex;
> -static QemuCond qemu_io_proceeded_cond;
> -static unsigned iothread_requesting_mutex;
>   
>   static QemuThread io_thread;
>   
> @@ -928,7 +926,6 @@ void qemu_init_cpu_loop(void)
>       qemu_cond_init(&qemu_cpu_cond);
>       qemu_cond_init(&qemu_pause_cond);
>       qemu_cond_init(&qemu_work_cond);
> -    qemu_cond_init(&qemu_io_proceeded_cond);
>       qemu_mutex_init(&qemu_global_mutex);
>   
>       qemu_thread_get_self(&io_thread);
> @@ -1043,10 +1040,6 @@ static void qemu_tcg_wait_io_event(CPUState *cpu)
>           qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
>       }
>   
> -    while (iothread_requesting_mutex) {
> -        qemu_cond_wait(&qemu_io_proceeded_cond, &qemu_global_mutex);
> -    }
> -
>       CPU_FOREACH(cpu) {
>           qemu_wait_io_event_common(cpu);
>       }
> @@ -1314,22 +1307,7 @@ bool qemu_mutex_iothread_locked(void)
>   
>   void qemu_mutex_lock_iothread(void)
>   {
> -    atomic_inc(&iothread_requesting_mutex);
> -    /* In the simple case there is no need to bump the VCPU thread out of
> -     * TCG code execution.
> -     */
> -    if (!tcg_enabled() || qemu_in_vcpu_thread() ||
> -        !first_cpu || !first_cpu->created) {
> -        qemu_mutex_lock(&qemu_global_mutex);
> -        atomic_dec(&iothread_requesting_mutex);
> -    } else {
> -        if (qemu_mutex_trylock(&qemu_global_mutex)) {
> -            qemu_cpu_kick_no_halt();
> -            qemu_mutex_lock(&qemu_global_mutex);
> -        }
> -        atomic_dec(&iothread_requesting_mutex);
> -        qemu_cond_broadcast(&qemu_io_proceeded_cond);
> -    }
> +    qemu_mutex_lock(&qemu_global_mutex);
>       iothread_locked = true;
>   }
>   
> @@ -1575,7 +1553,9 @@ static int tcg_cpu_exec(CPUState *cpu)
>           cpu->icount_decr.u16.low = decr;
>           cpu->icount_extra = count;
>       }
> +    qemu_mutex_unlock_iothread();
>       ret = cpu_exec(cpu);
> +    qemu_mutex_lock_iothread();
>   #ifdef CONFIG_PROFILER
>       tcg_time += profile_getclock() - ti;
>   #endif
> diff --git a/cputlb.c b/cputlb.c
> index 2f7a166..d607471 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -30,6 +30,10 @@
>   #include "exec/ram_addr.h"
>   #include "tcg/tcg.h"
>   
> +#ifdef CONFIG_SOFTMMU
> +#include "qemu/main-loop.h"
> +#endif
> +
>   //#define DEBUG_TLB
>   //#define DEBUG_TLB_CHECK
>   
> diff --git a/exec.c b/exec.c
> index 402b751..9986d59 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2113,6 +2113,12 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
>                   }
>                   cpu->watchpoint_hit = wp;
>   
> +                /*
> +                 * Unlock iothread mutex before calling cpu_loop_exit
> +                 * or cpu_resume_from_signal.
> +                 */
> +                qemu_mutex_unlock_iothread();
> +
>                   /* Unlocked by cpu_loop_exit or cpu_resume_from_signal.  */
>                   tb_lock();
>                   tb_check_watchpoint(cpu);
> @@ -2348,8 +2354,14 @@ static void io_mem_init(void)
>       memory_region_init_io(&io_mem_rom, NULL, &unassigned_mem_ops, NULL, NULL, UINT64_MAX);
>       memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
>                             NULL, UINT64_MAX);
> +
> +    /* io_mem_notdirty calls tb_invalidate_phys_page_fast,
> +     * which must be called without the iothread mutex.
> +     */
>       memory_region_init_io(&io_mem_notdirty, NULL, &notdirty_mem_ops, NULL,
>                             NULL, UINT64_MAX);
> +    memory_region_clear_global_locking(&io_mem_notdirty);
> +
>       memory_region_init_io(&io_mem_watch, NULL, &watch_mem_ops, NULL,
>                             NULL, UINT64_MAX);
>   }
> diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
> index 7c0d542..a0439a8 100644
> --- a/hw/i386/kvmvapic.c
> +++ b/hw/i386/kvmvapic.c
> @@ -447,6 +447,9 @@ static void patch_instruction(VAPICROMState *s, X86CPU *cpu, target_ulong ip)
>       resume_all_vcpus();
>   
>       if (!kvm_enabled()) {
> +        /* Unlock iothread mutex before calling cpu_resume_from_signal.  */
> +        qemu_mutex_unlock_iothread();
> +
>           /* Unlocked by cpu_resume_from_signal.  */
>           tb_lock();
>           cs->current_tb = NULL;
> diff --git a/softmmu_template.h b/softmmu_template.h
> index 208f808..0b6c609 100644
> --- a/softmmu_template.h
> +++ b/softmmu_template.h
> @@ -151,6 +151,7 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
>       CPUState *cpu = ENV_GET_CPU(env);
>       hwaddr physaddr = iotlbentry->addr;
>       MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
> +    bool locked = false;
>   
>       physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
>       cpu->mem_io_pc = retaddr;
> @@ -159,8 +160,15 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
>       }
>   
>       cpu->mem_io_vaddr = addr;
> +    if (mr->global_locking) {
> +        qemu_mutex_lock_iothread();
> +        locked = true;
> +    }
>       memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT,
>                                   iotlbentry->attrs);
> +    if (locked) {
> +        qemu_mutex_unlock_iothread();
> +    }
>       return val;
>   }
>   #endif
> @@ -358,6 +366,7 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
>       CPUState *cpu = ENV_GET_CPU(env);
>       hwaddr physaddr = iotlbentry->addr;
>       MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
> +    bool locked = false;
>   
>       physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
>       if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
> @@ -366,8 +375,16 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
>   
>       cpu->mem_io_vaddr = addr;
>       cpu->mem_io_pc = retaddr;
> +
> +    if (mr->global_locking) {
> +        qemu_mutex_lock_iothread();
> +        locked = true;
> +    }
>       memory_region_dispatch_write(mr, physaddr, val, 1 << SHIFT,
>                                    iotlbentry->attrs);
> +    if (locked) {
> +        qemu_mutex_unlock_iothread();
> +    }
>   }
>   
>   void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
> diff --git a/translate-all.c b/translate-all.c
> index 1aab243..fe1392a 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -1221,6 +1221,7 @@ void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
>    * this TB.
>    *
>    * Called with mmap_lock held for user-mode emulation
> + * If called from generated code, iothread mutex must not be held.
>    */
>   void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>                                      int is_cpu_write_access)
> @@ -1333,7 +1334,9 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>   }
>   
>   #ifdef CONFIG_SOFTMMU
> -/* len must be <= 8 and start must be a multiple of len */
> +/* len must be <= 8 and start must be a multiple of len.
> + * Called via softmmu_template.h, with iothread mutex not held.
> + */
>   void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
>   {
>       PageDesc *p;
> @@ -1591,6 +1594,7 @@ static TranslationBlock *tb_find_pc(uintptr_t tc_ptr)
>   }
>   
>   #if !defined(CONFIG_USER_ONLY)
> +/* If called from generated code, iothread mutex must not be held.  */
>   void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr)
>   {
>       ram_addr_t ram_addr;
> @@ -1637,7 +1641,10 @@ void tb_check_watchpoint(CPUState *cpu)
>   
>   #ifndef CONFIG_USER_ONLY
>   /* in deterministic execution mode, instructions doing device I/Os
> -   must be at the end of the TB */
> + * must be at the end of the TB.
> + *
> + * Called by softmmu_template.h, with iothread mutex not held.
> + */
>   void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
>   {
>   #if defined(TARGET_MIPS) || defined(TARGET_SH4)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC v1 09/11] tcg: drop global lock during TCG code execution
  2016-03-23  9:19   ` KONRAD Frederic
@ 2016-03-23 16:27     ` Alex Bennée
  2016-03-23 20:36       ` Jan Kiszka
  0 siblings, 1 reply; 28+ messages in thread
From: Alex Bennée @ 2016-03-23 16:27 UTC (permalink / raw)
  To: KONRAD Frederic
  Cc: mttcg, Eduardo Habkost, Peter Crosthwaite, jan.kiszka,
	Michael S. Tsirkin, mark.burton, a.rigo, qemu-devel, cota,
	pbonzini, serge.fdrv, Richard Henderson


KONRAD Frederic <fred.konrad@greensocs.com> writes:

> Hi Alex,
>
> Thanks for pulling all that together!
> About this patch: the original author was Jan Kiszka
> (https://lists.gnu.org/archive/html/qemu-devel/2011-09/msg03323.html)
>
> This has probably been dropped during a rebase.

I'll amend the author on v2.

Jan,

The original didn't have an s-o-b tag from you. Are you happy to give one
for its current iteration?
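
(For reference, the heart of this patch is just bracketing the TCG run
loop with a lock drop. Roughly the following, as a simplified sketch
rather than a literal excerpt, with the icount bookkeeping omitted:

    static int tcg_cpu_exec(CPUState *cpu)
    {
        int ret;

        /* Run guest code without the BQL; device emulation retakes
         * it on demand via the softmmu slow path.
         */
        qemu_mutex_unlock_iothread();
        ret = cpu_exec(cpu);
        qemu_mutex_lock_iothread();

        return ret;
    }

Most of the rest of the diff is fixing up places that used to rely on
the lock being held across cpu_exec.)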

>
> Thanks,
> Fred
>
> On 18/03/2016 17:18, Alex Bennée wrote:
>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>
>> This finally allows TCG to benefit from the iothread introduction: Drop
>> the global mutex while running pure TCG CPU code. Reacquire the lock
>> when entering MMIO or PIO emulation, or when leaving the TCG loop.
>>
>> We have to revert a few optimizations for the current TCG threading
>> model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
>> kicking it in qemu_cpu_kick. We also need to disable RAM block
>> reordering until we have a more efficient locking mechanism at hand.
>>
>> Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
>> These numbers demonstrate where we gain something:
>>
>> 20338 jan       20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
>> 20337 jan       20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm
>>
>> The guest CPU was fully loaded, but the iothread could still run mostly
>> independently on a second core. Without the patch we don't get beyond
>>
>> 32206 jan       20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
>> 32204 jan       20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm
>>
>> We don't benefit significantly, though, when the guest is not fully
>> loading a host CPU.
>>
>> Message-Id: <1439220437-23957-10-git-send-email-fred.konrad@greensocs.com>
>> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
>> [AJB: -smp single-threaded fix, rm old info from commit msg]
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>
>> ---
>> v1 (ajb):
>>    - SMP failure now fixed by previous commit
>> Changes from Fred Konrad (mttcg-v7 via paolo):
>>    * Rebase on the current HEAD.
>>    * Fixes a deadlock in qemu_devices_reset().
>>    * Remove the mutex in address_space_*
>> ---
>>   cpu-exec.c         | 10 ++++++++++
>>   cpus.c             | 26 +++-----------------------
>>   cputlb.c           |  4 ++++
>>   exec.c             | 12 ++++++++++++
>>   hw/i386/kvmvapic.c |  3 +++
>>   softmmu_template.h | 17 +++++++++++++++++
>>   translate-all.c    | 11 +++++++++--
>>   7 files changed, 58 insertions(+), 25 deletions(-)
>>
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index 3572256..76891fd 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -28,6 +28,7 @@
>>   #include "qemu/rcu.h"
>>   #include "exec/tb-hash.h"
>>   #include "exec/log.h"
>> +#include "qemu/main-loop.h"
>>   #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
>>   #include "hw/i386/apic.h"
>>   #endif
>> @@ -449,6 +450,13 @@ int cpu_exec(CPUState *cpu)
>>               for(;;) {
>>                   interrupt_request = cpu->interrupt_request;
>>                   if (unlikely(interrupt_request)) {
>> +                    /* FIXME: this needs to take the iothread lock.
>> +                     * For this we need to find all places in
>> +                     * cc->cpu_exec_interrupt that can call cpu_loop_exit,
>> +                     * and call qemu_mutex_unlock_iothread() there.  Else,
>> +                     * add a flag telling cpu_loop_exit() to unlock it.
>> +                     */
>> +
>>                       if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
>>                           /* Mask out external interrupts for this step. */
>>                           interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
>> @@ -501,6 +509,7 @@ int cpu_exec(CPUState *cpu)
>>                              the program flow was changed */
>>                           next_tb = 0;
>>                       }
>> +
>>                   }
>>                   if (unlikely(cpu->exit_request
>>                                || replay_has_interrupt())) {
>> @@ -618,6 +627,7 @@ int cpu_exec(CPUState *cpu)
>>               g_assert(env == &x86_cpu->env);
>>   #endif
>>   #endif /* buggy compiler */
>> +
>>               cpu->can_do_io = 1;
>>               tb_lock_reset();
>>           }
>> diff --git a/cpus.c b/cpus.c
>> index a87fbf9..0e3d5f9 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -911,8 +911,6 @@ static void qemu_kvm_init_cpu_signals(CPUState *cpu)
>>   #endif /* _WIN32 */
>>
>>   static QemuMutex qemu_global_mutex;
>> -static QemuCond qemu_io_proceeded_cond;
>> -static unsigned iothread_requesting_mutex;
>>
>>   static QemuThread io_thread;
>>
>> @@ -928,7 +926,6 @@ void qemu_init_cpu_loop(void)
>>       qemu_cond_init(&qemu_cpu_cond);
>>       qemu_cond_init(&qemu_pause_cond);
>>       qemu_cond_init(&qemu_work_cond);
>> -    qemu_cond_init(&qemu_io_proceeded_cond);
>>       qemu_mutex_init(&qemu_global_mutex);
>>
>>       qemu_thread_get_self(&io_thread);
>> @@ -1043,10 +1040,6 @@ static void qemu_tcg_wait_io_event(CPUState *cpu)
>>           qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
>>       }
>>
>> -    while (iothread_requesting_mutex) {
>> -        qemu_cond_wait(&qemu_io_proceeded_cond, &qemu_global_mutex);
>> -    }
>> -
>>       CPU_FOREACH(cpu) {
>>           qemu_wait_io_event_common(cpu);
>>       }
>> @@ -1314,22 +1307,7 @@ bool qemu_mutex_iothread_locked(void)
>>
>>   void qemu_mutex_lock_iothread(void)
>>   {
>> -    atomic_inc(&iothread_requesting_mutex);
>> -    /* In the simple case there is no need to bump the VCPU thread out of
>> -     * TCG code execution.
>> -     */
>> -    if (!tcg_enabled() || qemu_in_vcpu_thread() ||
>> -        !first_cpu || !first_cpu->created) {
>> -        qemu_mutex_lock(&qemu_global_mutex);
>> -        atomic_dec(&iothread_requesting_mutex);
>> -    } else {
>> -        if (qemu_mutex_trylock(&qemu_global_mutex)) {
>> -            qemu_cpu_kick_no_halt();
>> -            qemu_mutex_lock(&qemu_global_mutex);
>> -        }
>> -        atomic_dec(&iothread_requesting_mutex);
>> -        qemu_cond_broadcast(&qemu_io_proceeded_cond);
>> -    }
>> +    qemu_mutex_lock(&qemu_global_mutex);
>>       iothread_locked = true;
>>   }
>>
>> @@ -1575,7 +1553,9 @@ static int tcg_cpu_exec(CPUState *cpu)
>>           cpu->icount_decr.u16.low = decr;
>>           cpu->icount_extra = count;
>>       }
>> +    qemu_mutex_unlock_iothread();
>>       ret = cpu_exec(cpu);
>> +    qemu_mutex_lock_iothread();
>>   #ifdef CONFIG_PROFILER
>>       tcg_time += profile_getclock() - ti;
>>   #endif
>> diff --git a/cputlb.c b/cputlb.c
>> index 2f7a166..d607471 100644
>> --- a/cputlb.c
>> +++ b/cputlb.c
>> @@ -30,6 +30,10 @@
>>   #include "exec/ram_addr.h"
>>   #include "tcg/tcg.h"
>>
>> +#ifdef CONFIG_SOFTMMU
>> +#include "qemu/main-loop.h"
>> +#endif
>> +
>>   //#define DEBUG_TLB
>>   //#define DEBUG_TLB_CHECK
>>
>> diff --git a/exec.c b/exec.c
>> index 402b751..9986d59 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -2113,6 +2113,12 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
>>                   }
>>                   cpu->watchpoint_hit = wp;
>>
>> +                /*
>> +                 * Unlock iothread mutex before calling cpu_loop_exit
>> +                 * or cpu_resume_from_signal.
>> +                 */
>> +                qemu_mutex_unlock_iothread();
>> +
>>                   /* Unlocked by cpu_loop_exit or cpu_resume_from_signal.  */
>>                   tb_lock();
>>                   tb_check_watchpoint(cpu);
>> @@ -2348,8 +2354,14 @@ static void io_mem_init(void)
>>       memory_region_init_io(&io_mem_rom, NULL, &unassigned_mem_ops, NULL, NULL, UINT64_MAX);
>>       memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
>>                             NULL, UINT64_MAX);
>> +
>> +    /* io_mem_notdirty calls tb_invalidate_phys_page_fast,
>> +     * which must be called without the iothread mutex.
>> +     */
>>       memory_region_init_io(&io_mem_notdirty, NULL, &notdirty_mem_ops, NULL,
>>                             NULL, UINT64_MAX);
>> +    memory_region_clear_global_locking(&io_mem_notdirty);
>> +
>>       memory_region_init_io(&io_mem_watch, NULL, &watch_mem_ops, NULL,
>>                             NULL, UINT64_MAX);
>>   }
>> diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
>> index 7c0d542..a0439a8 100644
>> --- a/hw/i386/kvmvapic.c
>> +++ b/hw/i386/kvmvapic.c
>> @@ -447,6 +447,9 @@ static void patch_instruction(VAPICROMState *s, X86CPU *cpu, target_ulong ip)
>>       resume_all_vcpus();
>>
>>       if (!kvm_enabled()) {
>> +        /* Unlock iothread mutex before calling cpu_resume_from_signal.  */
>> +        qemu_mutex_unlock_iothread();
>> +
>>           /* Unlocked by cpu_resume_from_signal.  */
>>           tb_lock();
>>           cs->current_tb = NULL;
>> diff --git a/softmmu_template.h b/softmmu_template.h
>> index 208f808..0b6c609 100644
>> --- a/softmmu_template.h
>> +++ b/softmmu_template.h
>> @@ -151,6 +151,7 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
>>       CPUState *cpu = ENV_GET_CPU(env);
>>       hwaddr physaddr = iotlbentry->addr;
>>       MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
>> +    bool locked = false;
>>
>>       physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
>>       cpu->mem_io_pc = retaddr;
>> @@ -159,8 +160,15 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
>>       }
>>
>>       cpu->mem_io_vaddr = addr;
>> +    if (mr->global_locking) {
>> +        qemu_mutex_lock_iothread();
>> +        locked = true;
>> +    }
>>       memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT,
>>                                   iotlbentry->attrs);
>> +    if (locked) {
>> +        qemu_mutex_unlock_iothread();
>> +    }
>>       return val;
>>   }
>>   #endif
>> @@ -358,6 +366,7 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
>>       CPUState *cpu = ENV_GET_CPU(env);
>>       hwaddr physaddr = iotlbentry->addr;
>>       MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
>> +    bool locked = false;
>>
>>       physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
>>       if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
>> @@ -366,8 +375,16 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
>>
>>       cpu->mem_io_vaddr = addr;
>>       cpu->mem_io_pc = retaddr;
>> +
>> +    if (mr->global_locking) {
>> +        qemu_mutex_lock_iothread();
>> +        locked = true;
>> +    }
>>       memory_region_dispatch_write(mr, physaddr, val, 1 << SHIFT,
>>                                    iotlbentry->attrs);
>> +    if (locked) {
>> +        qemu_mutex_unlock_iothread();
>> +    }
>>   }
>>
>>   void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>> diff --git a/translate-all.c b/translate-all.c
>> index 1aab243..fe1392a 100644
>> --- a/translate-all.c
>> +++ b/translate-all.c
>> @@ -1221,6 +1221,7 @@ void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
>>    * this TB.
>>    *
>>    * Called with mmap_lock held for user-mode emulation
>> + * If called from generated code, iothread mutex must not be held.
>>    */
>>   void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>>                                      int is_cpu_write_access)
>> @@ -1333,7 +1334,9 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>>   }
>>
>>   #ifdef CONFIG_SOFTMMU
>> -/* len must be <= 8 and start must be a multiple of len */
>> +/* len must be <= 8 and start must be a multiple of len.
>> + * Called via softmmu_template.h, with iothread mutex not held.
>> + */
>>   void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
>>   {
>>       PageDesc *p;
>> @@ -1591,6 +1594,7 @@ static TranslationBlock *tb_find_pc(uintptr_t tc_ptr)
>>   }
>>
>>   #if !defined(CONFIG_USER_ONLY)
>> +/* If called from generated code, iothread mutex must not be held.  */
>>   void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr)
>>   {
>>       ram_addr_t ram_addr;
>> @@ -1637,7 +1641,10 @@ void tb_check_watchpoint(CPUState *cpu)
>>
>>   #ifndef CONFIG_USER_ONLY
>>   /* in deterministic execution mode, instructions doing device I/Os
>> -   must be at the end of the TB */
>> + * must be at the end of the TB.
>> + *
>> + * Called by softmmu_template.h, with iothread mutex not held.
>> + */
>>   void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
>>   {
>>   #if defined(TARGET_MIPS) || defined(TARGET_SH4)


--
Alex Bennée

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC v1 09/11] tcg: drop global lock during TCG code execution
  2016-03-23 16:27     ` Alex Bennée
@ 2016-03-23 20:36       ` Jan Kiszka
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kiszka @ 2016-03-23 20:36 UTC (permalink / raw)
  To: Alex Bennée, KONRAD Frederic
  Cc: mttcg, Eduardo Habkost, Peter Crosthwaite, Michael S. Tsirkin,
	mark.burton, a.rigo, qemu-devel, cota, pbonzini, serge.fdrv,
	Richard Henderson

On 2016-03-23 17:27, Alex Bennée wrote:
> 
> KONRAD Frederic <fred.konrad@greensocs.com> writes:
> 
>> Hi Alex,
>>
>> Thanks for pulling all that together!
>> About this patch: the original author was Jan Kiszka
>> (https://lists.gnu.org/archive/html/qemu-devel/2011-09/msg03323.html)
>>
>> This has probably been dropped during a rebase.
> 
> I'll amend the author on v2.
> 
> Jan,
> 
> The original didn't have an s-o-b tag from you. Are you happy to give one
> for its current iteration?

You can consider it

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

At that point, I didn't consider it ready for upstream (see the log), so
I left out the signature.

Jan

> 
>>
>> Thanks,
>> Fred
>>
>> On 18/03/2016 17:18, Alex Bennée wrote:
>>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>>
>>> This finally allows TCG to benefit from the iothread introduction: Drop
>>> the global mutex while running pure TCG CPU code. Reacquire the lock
>>> when entering MMIO or PIO emulation, or when leaving the TCG loop.
>>>
>>> We have to revert a few optimizations for the current TCG threading
>>> model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
>>> kicking it in qemu_cpu_kick. We also need to disable RAM block
>>> reordering until we have a more efficient locking mechanism at hand.
>>>
>>> Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
>>> These numbers demonstrate where we gain something:
>>>
>>> 20338 jan       20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
>>> 20337 jan       20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm
>>>
>>> The guest CPU was fully loaded, but the iothread could still run mostly
>>> independently on a second core. Without the patch we don't get beyond
>>>
>>> 32206 jan       20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
>>> 32204 jan       20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm
>>>
>>> We don't benefit significantly, though, when the guest is not fully
>>> loading a host CPU.
>>>
>>> Message-Id: <1439220437-23957-10-git-send-email-fred.konrad@greensocs.com>
>>> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
>>> [AJB: -smp single-threaded fix, rm old info from commit msg]
>>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>>
>>> ---
>>> v1 (ajb):
>>>    - SMP failure now fixed by previous commit
>>> Changes from Fred Konrad (mttcg-v7 via paolo):
>>>    * Rebase on the current HEAD.
>>>    * Fixes a deadlock in qemu_devices_reset().
>>>    * Remove the mutex in address_space_*
>>> ---
>>>   cpu-exec.c         | 10 ++++++++++
>>>   cpus.c             | 26 +++-----------------------
>>>   cputlb.c           |  4 ++++
>>>   exec.c             | 12 ++++++++++++
>>>   hw/i386/kvmvapic.c |  3 +++
>>>   softmmu_template.h | 17 +++++++++++++++++
>>>   translate-all.c    | 11 +++++++++--
>>>   7 files changed, 58 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/cpu-exec.c b/cpu-exec.c
>>> index 3572256..76891fd 100644
>>> --- a/cpu-exec.c
>>> +++ b/cpu-exec.c
>>> @@ -28,6 +28,7 @@
>>>   #include "qemu/rcu.h"
>>>   #include "exec/tb-hash.h"
>>>   #include "exec/log.h"
>>> +#include "qemu/main-loop.h"
>>>   #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
>>>   #include "hw/i386/apic.h"
>>>   #endif
>>> @@ -449,6 +450,13 @@ int cpu_exec(CPUState *cpu)
>>>               for(;;) {
>>>                   interrupt_request = cpu->interrupt_request;
>>>                   if (unlikely(interrupt_request)) {
>>> +                    /* FIXME: this needs to take the iothread lock.
>>> +                     * For this we need to find all places in
>>> +                     * cc->cpu_exec_interrupt that can call cpu_loop_exit,
>>> +                     * and call qemu_mutex_unlock_iothread() there.  Else,
>>> +                     * add a flag telling cpu_loop_exit() to unlock it.
>>> +                     */
>>> +
>>>                       if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
>>>                           /* Mask out external interrupts for this step. */
>>>                           interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
>>> @@ -501,6 +509,7 @@ int cpu_exec(CPUState *cpu)
>>>                              the program flow was changed */
>>>                           next_tb = 0;
>>>                       }
>>> +
>>>                   }
>>>                   if (unlikely(cpu->exit_request
>>>                                || replay_has_interrupt())) {
>>> @@ -618,6 +627,7 @@ int cpu_exec(CPUState *cpu)
>>>               g_assert(env == &x86_cpu->env);
>>>   #endif
>>>   #endif /* buggy compiler */
>>> +
>>>               cpu->can_do_io = 1;
>>>               tb_lock_reset();
>>>           }
>>> diff --git a/cpus.c b/cpus.c
>>> index a87fbf9..0e3d5f9 100644
>>> --- a/cpus.c
>>> +++ b/cpus.c
>>> @@ -911,8 +911,6 @@ static void qemu_kvm_init_cpu_signals(CPUState *cpu)
>>>   #endif /* _WIN32 */
>>>
>>>   static QemuMutex qemu_global_mutex;
>>> -static QemuCond qemu_io_proceeded_cond;
>>> -static unsigned iothread_requesting_mutex;
>>>
>>>   static QemuThread io_thread;
>>>
>>> @@ -928,7 +926,6 @@ void qemu_init_cpu_loop(void)
>>>       qemu_cond_init(&qemu_cpu_cond);
>>>       qemu_cond_init(&qemu_pause_cond);
>>>       qemu_cond_init(&qemu_work_cond);
>>> -    qemu_cond_init(&qemu_io_proceeded_cond);
>>>       qemu_mutex_init(&qemu_global_mutex);
>>>
>>>       qemu_thread_get_self(&io_thread);
>>> @@ -1043,10 +1040,6 @@ static void qemu_tcg_wait_io_event(CPUState *cpu)
>>>           qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
>>>       }
>>>
>>> -    while (iothread_requesting_mutex) {
>>> -        qemu_cond_wait(&qemu_io_proceeded_cond, &qemu_global_mutex);
>>> -    }
>>> -
>>>       CPU_FOREACH(cpu) {
>>>           qemu_wait_io_event_common(cpu);
>>>       }
>>> @@ -1314,22 +1307,7 @@ bool qemu_mutex_iothread_locked(void)
>>>
>>>   void qemu_mutex_lock_iothread(void)
>>>   {
>>> -    atomic_inc(&iothread_requesting_mutex);
>>> -    /* In the simple case there is no need to bump the VCPU thread out of
>>> -     * TCG code execution.
>>> -     */
>>> -    if (!tcg_enabled() || qemu_in_vcpu_thread() ||
>>> -        !first_cpu || !first_cpu->created) {
>>> -        qemu_mutex_lock(&qemu_global_mutex);
>>> -        atomic_dec(&iothread_requesting_mutex);
>>> -    } else {
>>> -        if (qemu_mutex_trylock(&qemu_global_mutex)) {
>>> -            qemu_cpu_kick_no_halt();
>>> -            qemu_mutex_lock(&qemu_global_mutex);
>>> -        }
>>> -        atomic_dec(&iothread_requesting_mutex);
>>> -        qemu_cond_broadcast(&qemu_io_proceeded_cond);
>>> -    }
>>> +    qemu_mutex_lock(&qemu_global_mutex);
>>>       iothread_locked = true;
>>>   }
>>>
>>> @@ -1575,7 +1553,9 @@ static int tcg_cpu_exec(CPUState *cpu)
>>>           cpu->icount_decr.u16.low = decr;
>>>           cpu->icount_extra = count;
>>>       }
>>> +    qemu_mutex_unlock_iothread();
>>>       ret = cpu_exec(cpu);
>>> +    qemu_mutex_lock_iothread();
>>>   #ifdef CONFIG_PROFILER
>>>       tcg_time += profile_getclock() - ti;
>>>   #endif
>>> diff --git a/cputlb.c b/cputlb.c
>>> index 2f7a166..d607471 100644
>>> --- a/cputlb.c
>>> +++ b/cputlb.c
>>> @@ -30,6 +30,10 @@
>>>   #include "exec/ram_addr.h"
>>>   #include "tcg/tcg.h"
>>>
>>> +#ifdef CONFIG_SOFTMMU
>>> +#include "qemu/main-loop.h"
>>> +#endif
>>> +
>>>   //#define DEBUG_TLB
>>>   //#define DEBUG_TLB_CHECK
>>>
>>> diff --git a/exec.c b/exec.c
>>> index 402b751..9986d59 100644
>>> --- a/exec.c
>>> +++ b/exec.c
>>> @@ -2113,6 +2113,12 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
>>>                   }
>>>                   cpu->watchpoint_hit = wp;
>>>
>>> +                /*
>>> +                 * Unlock iothread mutex before calling cpu_loop_exit
>>> +                 * or cpu_resume_from_signal.
>>> +                 */
>>> +                qemu_mutex_unlock_iothread();
>>> +
>>>                   /* Unlocked by cpu_loop_exit or cpu_resume_from_signal.  */
>>>                   tb_lock();
>>>                   tb_check_watchpoint(cpu);
>>> @@ -2348,8 +2354,14 @@ static void io_mem_init(void)
>>>       memory_region_init_io(&io_mem_rom, NULL, &unassigned_mem_ops, NULL, NULL, UINT64_MAX);
>>>       memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
>>>                             NULL, UINT64_MAX);
>>> +
>>> +    /* io_mem_notdirty calls tb_invalidate_phys_page_fast,
>>> +     * which must be called without the iothread mutex.
>>> +     */
>>>       memory_region_init_io(&io_mem_notdirty, NULL, &notdirty_mem_ops, NULL,
>>>                             NULL, UINT64_MAX);
>>> +    memory_region_clear_global_locking(&io_mem_notdirty);
>>> +
>>>       memory_region_init_io(&io_mem_watch, NULL, &watch_mem_ops, NULL,
>>>                             NULL, UINT64_MAX);
>>>   }
>>> diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
>>> index 7c0d542..a0439a8 100644
>>> --- a/hw/i386/kvmvapic.c
>>> +++ b/hw/i386/kvmvapic.c
>>> @@ -447,6 +447,9 @@ static void patch_instruction(VAPICROMState *s, X86CPU *cpu, target_ulong ip)
>>>       resume_all_vcpus();
>>>
>>>       if (!kvm_enabled()) {
>>> +        /* Unlock iothread mutex before calling cpu_resume_from_signal.  */
>>> +        qemu_mutex_unlock_iothread();
>>> +
>>>           /* Unlocked by cpu_resume_from_signal.  */
>>>           tb_lock();
>>>           cs->current_tb = NULL;
>>> diff --git a/softmmu_template.h b/softmmu_template.h
>>> index 208f808..0b6c609 100644
>>> --- a/softmmu_template.h
>>> +++ b/softmmu_template.h
>>> @@ -151,6 +151,7 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
>>>       CPUState *cpu = ENV_GET_CPU(env);
>>>       hwaddr physaddr = iotlbentry->addr;
>>>       MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
>>> +    bool locked = false;
>>>
>>>       physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
>>>       cpu->mem_io_pc = retaddr;
>>> @@ -159,8 +160,15 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
>>>       }
>>>
>>>       cpu->mem_io_vaddr = addr;
>>> +    if (mr->global_locking) {
>>> +        qemu_mutex_lock_iothread();
>>> +        locked = true;
>>> +    }
>>>       memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT,
>>>                                   iotlbentry->attrs);
>>> +    if (locked) {
>>> +        qemu_mutex_unlock_iothread();
>>> +    }
>>>       return val;
>>>   }
>>>   #endif
>>> @@ -358,6 +366,7 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
>>>       CPUState *cpu = ENV_GET_CPU(env);
>>>       hwaddr physaddr = iotlbentry->addr;
>>>       MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
>>> +    bool locked = false;
>>>
>>>       physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
>>>       if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
>>> @@ -366,8 +375,16 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
>>>
>>>       cpu->mem_io_vaddr = addr;
>>>       cpu->mem_io_pc = retaddr;
>>> +
>>> +    if (mr->global_locking) {
>>> +        qemu_mutex_lock_iothread();
>>> +        locked = true;
>>> +    }
>>>       memory_region_dispatch_write(mr, physaddr, val, 1 << SHIFT,
>>>                                    iotlbentry->attrs);
>>> +    if (locked) {
>>> +        qemu_mutex_unlock_iothread();
>>> +    }
>>>   }
>>>
>>>   void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>> diff --git a/translate-all.c b/translate-all.c
>>> index 1aab243..fe1392a 100644
>>> --- a/translate-all.c
>>> +++ b/translate-all.c
>>> @@ -1221,6 +1221,7 @@ void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
>>>    * this TB.
>>>    *
>>>    * Called with mmap_lock held for user-mode emulation
>>> + * If called from generated code, iothread mutex must not be held.
>>>    */
>>>   void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>>>                                      int is_cpu_write_access)
>>> @@ -1333,7 +1334,9 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>>>   }
>>>
>>>   #ifdef CONFIG_SOFTMMU
>>> -/* len must be <= 8 and start must be a multiple of len */
>>> +/* len must be <= 8 and start must be a multiple of len.
>>> + * Called via softmmu_template.h, with iothread mutex not held.
>>> + */
>>>   void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
>>>   {
>>>       PageDesc *p;
>>> @@ -1591,6 +1594,7 @@ static TranslationBlock *tb_find_pc(uintptr_t tc_ptr)
>>>   }
>>>
>>>   #if !defined(CONFIG_USER_ONLY)
>>> +/* If called from generated code, iothread mutex must not be held.  */
>>>   void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr)
>>>   {
>>>       ram_addr_t ram_addr;
>>> @@ -1637,7 +1641,10 @@ void tb_check_watchpoint(CPUState *cpu)
>>>
>>>   #ifndef CONFIG_USER_ONLY
>>>   /* in deterministic execution mode, instructions doing device I/Os
>>> -   must be at the end of the TB */
>>> + * must be at the end of the TB.
>>> + *
>>> + * Called by softmmu_template.h, with iothread mutex not held.
>>> + */
>>>   void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
>>>   {
>>>   #if defined(TARGET_MIPS) || defined(TARGET_SH4)
> 
> 
> --
> Alex Bennée
> 


-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread

Thread overview: 28+ messages
2016-03-18 16:18 [Qemu-devel] [RFC v1 00/11] Base enabling patches for MTTCG Alex Bennée
2016-03-18 16:18 ` [Qemu-devel] [RFC v1 01/11] tcg: move tb_find_fast outside the tb_lock critical section Alex Bennée
2016-03-18 16:54   ` Paolo Bonzini
2016-03-21 21:50   ` Emilio G. Cota
2016-03-21 22:08     ` Peter Maydell
2016-03-21 23:59       ` Emilio G. Cota
2016-03-22  8:29         ` Paolo Bonzini
2016-03-22 11:59         ` Alex Bennée
2016-03-22 11:55     ` Alex Bennée
2016-03-18 16:18 ` [Qemu-devel] [RFC v1 02/11] cpu-exec: elide more icount code if CONFIG_USER_ONLY Alex Bennée
2016-03-18 16:18 ` [Qemu-devel] [RFC v1 03/11] tcg: comment on which functions have to be called with tb_lock held Alex Bennée
2016-03-18 16:59   ` Paolo Bonzini
2016-03-21 21:50     ` Emilio G. Cota
2016-03-21 22:12       ` Paolo Bonzini
2016-03-18 16:18 ` [Qemu-devel] [RFC v1 04/11] tcg: protect TBContext with tb_lock Alex Bennée
2016-03-18 16:18 ` [Qemu-devel] [RFC v1 05/11] target-arm/psci.c: wake up sleeping CPUs Alex Bennée
2016-03-18 16:18 ` [Qemu-devel] [RFC v1 06/11] tcg: cpus rm tcg_exec_all() Alex Bennée
2016-03-18 16:18 ` [Qemu-devel] [RFC v1 07/11] tcg: add options for enabling MTTCG Alex Bennée
2016-03-18 16:18 ` [Qemu-devel] [RFC v1 08/11] tcg: add kick timer for single-threaded vCPU emulation Alex Bennée
2016-03-18 16:18 ` [Qemu-devel] [RFC v1 09/11] tcg: drop global lock during TCG code execution Alex Bennée
2016-03-18 16:49   ` Paolo Bonzini
2016-03-23  9:19   ` KONRAD Frederic
2016-03-23 16:27     ` Alex Bennée
2016-03-23 20:36       ` Jan Kiszka
2016-03-18 16:18 ` [Qemu-devel] [RFC v1 10/11] tcg: grab iothread lock in cpu-exec interrupt handling Alex Bennée
2016-03-18 16:48   ` Paolo Bonzini
2016-03-22 12:03     ` Alex Bennée
2016-03-18 16:18 ` [Qemu-devel] [RFC v1 11/11] tcg: enable thread-per-vCPU Alex Bennée
