* [Qemu-devel] [RFC 00/10] MultiThread TCG.
@ 2015-01-16 17:19 fred.konrad
  2015-01-16 17:19 ` [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_* fred.konrad
                   ` (10 more replies)
  0 siblings, 11 replies; 62+ messages in thread
From: fred.konrad @ 2015-01-16 17:19 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, jan.kiszka, mark.burton, agraf, pbonzini, fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

Hi everybody,

This is the start of our work on multithreaded TCG.

We are sending it for comments, to be sure we are taking the right direction.
We have already discussed the first patch, but we keep it in the series for
simplicity.

We chose to keep a common tbs array for all VCPUs but to protect it with the
tb_lock from TBContext. Then, for each PageDesc, we have a TB list per VCPU.
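
For reference, a minimal sketch of the data layout the series moves to (the
names mirror patches 02, 05 and 06; treat this as an outline rather than the
exact final code):

    #define MAX_CPUS 256

    typedef struct PageDesc {
        /* one TB list head per vCPU; the low 2 bits of each pointer
           encode which of a TB's two possible pages the link is for */
        TranslationBlock *first_tb[MAX_CPUS];
        /* ... */
    } PageDesc;

    struct TBContext {
        TranslationBlock *tbs;      /* shared array of all TBs...        */
        int nb_tbs;
        QemuMutex tb_lock;          /* ...guarded by this mutex, along
                                       with the page tables              */
        /* ... */
    };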

Known issues:
  * Some random deadlocks.
  * qemu_pause_cond is broken: we can't quit QEMU.
  * tb_flush is broken: we must make sure no VCPU is executing code.

Jan Kiszka (1):
  Drop global lock during TCG code execution

KONRAD Frederic (9):
  target-arm: protect cpu_exclusive_*.
  use a different translation block list for each cpu.
  replace spinlock by QemuMutex.
  remove unused spinlock.
  extract TBContext from TCGContext.
  protect TBContext with tb_lock.
  tcg: remove tcg_halt_cond global variable.
  cpu: remove exit_request global.
  tcg: switch on multithread.

 cpu-exec.c                |  47 +++++++----
 cpus.c                    | 122 +++++++++++----------------
 cputlb.c                  |   5 ++
 exec.c                    |  25 ++++++
 include/exec/exec-all.h   |   4 +-
 include/exec/spinlock.h   |  49 -----------
 include/qom/cpu.h         |   1 +
 linux-user/main.c         |   6 +-
 scripts/checkpatch.pl     |   9 +-
 softmmu_template.h        |   6 ++
 target-arm/cpu.c          |  14 ++++
 target-arm/cpu.h          |   3 +
 target-arm/helper.h       |   3 +
 target-arm/op_helper.c    |  10 +++
 target-arm/translate.c    |   6 ++
 target-i386/mem_helper.c  |  16 +++-
 target-i386/misc_helper.c |  27 +++++-
 tcg/i386/tcg-target.c     |   8 ++
 tcg/tcg.h                 |   3 +-
 translate-all.c           | 208 +++++++++++++++++++++++++++-------------------
 vl.c                      |   6 ++
 21 files changed, 335 insertions(+), 243 deletions(-)
 delete mode 100644 include/exec/spinlock.h

-- 
1.9.0


* [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_*.
  2015-01-16 17:19 [Qemu-devel] [RFC 00/10] MultiThread TCG fred.konrad
@ 2015-01-16 17:19 ` fred.konrad
  2015-01-27 14:36   ` Alex Bennée
  2015-01-29 15:17   ` Peter Maydell
  2015-01-16 17:19 ` [Qemu-devel] [RFC 02/10] use a different translation block list for each cpu fred.konrad
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 62+ messages in thread
From: fred.konrad @ 2015-01-16 17:19 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, jan.kiszka, mark.burton, agraf, pbonzini, fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

This adds a lock to avoid multiple exclusive accesses happening at the same
time in the multithreaded TCG case.
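
The race this closes: with several TCG threads, two VCPUs can interleave
their updates of cpu_exclusive_addr/cpu_exclusive_val between a guest
load-exclusive and store-exclusive, so both store-exclusives can succeed.
A rough sketch of the serialization the patch establishes, using the helpers
added below (the comments stand in for the generated code):

    gen_helper_exclusive_lock();    /* takes cpu_exclusive_lock          */
    /* LDREX: load the value, record cpu_exclusive_addr/val              */
    gen_helper_exclusive_unlock();
    /* ... guest code between the exclusives ... */
    gen_helper_exclusive_lock();    /* the same lock covers the check    */
    /* STREX: compare against cpu_exclusive_addr/val, store on match     */
    gen_helper_exclusive_unlock();  /* ...and the conditional store      */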

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>

V1 -> V2:
  Removed qemu_mutex_destroy().
---
 target-arm/cpu.c       | 14 ++++++++++++++
 target-arm/cpu.h       |  3 +++
 target-arm/helper.h    |  3 +++
 target-arm/op_helper.c | 10 ++++++++++
 target-arm/translate.c |  6 ++++++
 5 files changed, 36 insertions(+)

diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index 285947f..75bdc5b 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -31,6 +31,19 @@
 #include "sysemu/kvm.h"
 #include "kvm_arm.h"
 
+/* Protect the cpu_exclusive_* variables. */
+QemuMutex cpu_exclusive_lock;
+
+inline void arm_exclusive_lock(void)
+{
+    qemu_mutex_lock(&cpu_exclusive_lock);
+}
+
+inline void arm_exclusive_unlock(void)
+{
+    qemu_mutex_unlock(&cpu_exclusive_lock);
+}
+
 static void arm_cpu_set_pc(CPUState *cs, vaddr value)
 {
     ARMCPU *cpu = ARM_CPU(cs);
@@ -374,6 +387,7 @@ static void arm_cpu_initfn(Object *obj)
         cpu->psci_version = 2; /* TCG implements PSCI 0.2 */
         if (!inited) {
             inited = true;
+            qemu_mutex_init(&cpu_exclusive_lock);
             arm_translate_init();
         }
     }
diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 7ba55f0..2101d85 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -1821,4 +1821,7 @@ enum {
     QEMU_PSCI_CONDUIT_HVC = 2,
 };
 
+void arm_exclusive_lock(void);
+void arm_exclusive_unlock(void);
+
 #endif
diff --git a/target-arm/helper.h b/target-arm/helper.h
index dec3728..ce07711 100644
--- a/target-arm/helper.h
+++ b/target-arm/helper.h
@@ -529,6 +529,9 @@ DEF_HELPER_2(dc_zva, void, env, i64)
 DEF_HELPER_FLAGS_2(neon_pmull_64_lo, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(neon_pmull_64_hi, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 
+DEF_HELPER_0(exclusive_lock, void)
+DEF_HELPER_0(exclusive_unlock, void)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #endif
diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
index 2bed914..d830fd8 100644
--- a/target-arm/op_helper.c
+++ b/target-arm/op_helper.c
@@ -33,6 +33,16 @@ static void raise_exception(CPUARMState *env, int tt)
     cpu_loop_exit(cs);
 }
 
+void HELPER(exclusive_lock)(void)
+{
+    arm_exclusive_lock();
+}
+
+void HELPER(exclusive_unlock)(void)
+{
+    arm_exclusive_unlock();
+}
+
 uint32_t HELPER(neon_tbl)(CPUARMState *env, uint32_t ireg, uint32_t def,
                           uint32_t rn, uint32_t maxindex)
 {
diff --git a/target-arm/translate.c b/target-arm/translate.c
index bdfcdf1..513d151 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -7381,6 +7381,7 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
         abort();
     }
 
+    gen_helper_exclusive_lock();
     if (size == 3) {
         TCGv_i32 tmp2 = tcg_temp_new_i32();
         TCGv_i32 tmp3 = tcg_temp_new_i32();
@@ -7396,11 +7397,14 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
 
     store_reg(s, rt, tmp);
     tcg_gen_extu_i32_i64(cpu_exclusive_addr, addr);
+    gen_helper_exclusive_unlock();
 }
 
 static void gen_clrex(DisasContext *s)
 {
+    gen_helper_exclusive_lock();
     tcg_gen_movi_i64(cpu_exclusive_addr, -1);
+    gen_helper_exclusive_unlock();
 }
 
 #ifdef CONFIG_USER_ONLY
@@ -7431,6 +7435,7 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
     done_label = gen_new_label();
     extaddr = tcg_temp_new_i64();
     tcg_gen_extu_i32_i64(extaddr, addr);
+    gen_helper_exclusive_lock();
     tcg_gen_brcond_i64(TCG_COND_NE, extaddr, cpu_exclusive_addr, fail_label);
     tcg_temp_free_i64(extaddr);
 
@@ -7495,6 +7500,7 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
     tcg_gen_movi_i32(cpu_R[rd], 1);
     gen_set_label(done_label);
     tcg_gen_movi_i64(cpu_exclusive_addr, -1);
+    gen_helper_exclusive_unlock();
 }
 #endif
 
-- 
1.9.0


* [Qemu-devel] [RFC 02/10] use a different translation block list for each cpu.
  2015-01-16 17:19 [Qemu-devel] [RFC 00/10] MultiThread TCG fred.konrad
  2015-01-16 17:19 ` [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_* fred.konrad
@ 2015-01-16 17:19 ` fred.konrad
  2015-01-27 14:45   ` Alex Bennée
                     ` (2 more replies)
  2015-01-16 17:19 ` [Qemu-devel] [RFC 03/10] replace spinlock by QemuMutex fred.konrad
                   ` (8 subsequent siblings)
  10 siblings, 3 replies; 62+ messages in thread
From: fred.konrad @ 2015-01-16 17:19 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, jan.kiszka, mark.burton, agraf, pbonzini, fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

We need a different TranslationBlock list for each core in the multithreaded
TCG case.
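
The per-page lists keep the existing tagged-pointer encoding, now indexed by
VCPU: the low 2 bits of each list pointer record which of the TB's (up to
two) pages the link belongs to. A sketch of the walk as it appears
throughout translate-all.c after this patch (current_cpu is assumed valid
wherever these paths run):

    TranslationBlock *tb = p->first_tb[current_cpu->cpu_index];
    while (tb != NULL) {
        int n = (uintptr_t)tb & 3;                     /* page slot: 0 or 1 */
        tb = (TranslationBlock *)((uintptr_t)tb & ~3); /* strip the tag     */
        /* ... process tb ... */
        tb = tb->page_next[n];                   /* follow this slot's link */
    }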

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
---
 translate-all.c | 40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 8fa4378..0e11c70 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -72,10 +72,11 @@
 #endif
 
 #define SMC_BITMAP_USE_THRESHOLD 10
+#define MAX_CPUS 256
 
 typedef struct PageDesc {
     /* list of TBs intersecting this ram page */
-    TranslationBlock *first_tb;
+    TranslationBlock *first_tb[MAX_CPUS];
     /* in order to optimize self modifying code, we count the number
        of lookups we do to a given page to use a bitmap */
     unsigned int code_write_count;
@@ -750,7 +751,7 @@ static inline void invalidate_page_bitmap(PageDesc *p)
 /* Set to NULL all the 'first_tb' fields in all PageDescs. */
 static void page_flush_tb_1(int level, void **lp)
 {
-    int i;
+    int i, j;
 
     if (*lp == NULL) {
         return;
@@ -759,7 +760,9 @@ static void page_flush_tb_1(int level, void **lp)
         PageDesc *pd = *lp;
 
         for (i = 0; i < V_L2_SIZE; ++i) {
-            pd[i].first_tb = NULL;
+            for (j = 0; j < MAX_CPUS; j++) {
+                pd[i].first_tb[j] = NULL;
+            }
             invalidate_page_bitmap(pd + i);
         }
     } else {
@@ -937,12 +940,12 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
     /* remove the TB from the page list */
     if (tb->page_addr[0] != page_addr) {
         p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);
-        tb_page_remove(&p->first_tb, tb);
+        tb_page_remove(&p->first_tb[current_cpu->cpu_index], tb);
         invalidate_page_bitmap(p);
     }
     if (tb->page_addr[1] != -1 && tb->page_addr[1] != page_addr) {
         p = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS);
-        tb_page_remove(&p->first_tb, tb);
+        tb_page_remove(&p->first_tb[current_cpu->cpu_index], tb);
         invalidate_page_bitmap(p);
     }
 
@@ -1012,7 +1015,7 @@ static void build_page_bitmap(PageDesc *p)
 
     p->code_bitmap = g_malloc0(TARGET_PAGE_SIZE / 8);
 
-    tb = p->first_tb;
+    tb = p->first_tb[current_cpu->cpu_index];
     while (tb != NULL) {
         n = (uintptr_t)tb & 3;
         tb = (TranslationBlock *)((uintptr_t)tb & ~3);
@@ -1138,7 +1141,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
     /* we remove all the TBs in the range [start, end[ */
     /* XXX: see if in some cases it could be faster to invalidate all
        the code */
-    tb = p->first_tb;
+    tb = p->first_tb[cpu->cpu_index];
     while (tb != NULL) {
         n = (uintptr_t)tb & 3;
         tb = (TranslationBlock *)((uintptr_t)tb & ~3);
@@ -1196,7 +1199,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
     }
 #if !defined(CONFIG_USER_ONLY)
     /* if no code remaining, no need to continue to use slow writes */
-    if (!p->first_tb) {
+    if (!p->first_tb[cpu->cpu_index]) {
         invalidate_page_bitmap(p);
         if (is_cpu_write_access) {
             tlb_unprotect_code_phys(cpu, start, cpu->mem_io_vaddr);
@@ -1224,10 +1227,10 @@ void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
 #if 0
     if (1) {
         qemu_log("modifying code at 0x%x size=%d EIP=%x PC=%08x\n",
-                  cpu_single_env->mem_io_vaddr, len,
-                  cpu_single_env->eip,
-                  cpu_single_env->eip +
-                  (intptr_t)cpu_single_env->segs[R_CS].base);
+                  current_cpu->mem_io_vaddr, len,
+                  current_cpu->eip,
+                  current_cpu->eip +
+                  (intptr_t)current_cpu->segs[R_CS].base);
     }
 #endif
     p = page_find(start >> TARGET_PAGE_BITS);
@@ -1269,7 +1272,7 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
     if (!p) {
         return;
     }
-    tb = p->first_tb;
+    tb = p->first_tb[current_cpu->cpu_index];
 #ifdef TARGET_HAS_PRECISE_SMC
     if (tb && pc != 0) {
         current_tb = tb_find_pc(pc);
@@ -1299,7 +1302,7 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
         tb_phys_invalidate(tb, addr);
         tb = tb->page_next[n];
     }
-    p->first_tb = NULL;
+    p->first_tb[current_cpu->cpu_index] = NULL;
 #ifdef TARGET_HAS_PRECISE_SMC
     if (current_tb_modified) {
         /* we generate a block containing just the instruction
@@ -1327,11 +1330,12 @@ static inline void tb_alloc_page(TranslationBlock *tb,
 
     tb->page_addr[n] = page_addr;
     p = page_find_alloc(page_addr >> TARGET_PAGE_BITS, 1);
-    tb->page_next[n] = p->first_tb;
+    tb->page_next[n] = p->first_tb[current_cpu->cpu_index];
 #ifndef CONFIG_USER_ONLY
-    page_already_protected = p->first_tb != NULL;
+    page_already_protected = p->first_tb[current_cpu->cpu_index] != NULL;
 #endif
-    p->first_tb = (TranslationBlock *)((uintptr_t)tb | n);
+    p->first_tb[current_cpu->cpu_index]
+      = (TranslationBlock *)((uintptr_t)tb | n);
     invalidate_page_bitmap(p);
 
 #if defined(TARGET_HAS_SMC) || 1
@@ -1821,7 +1825,7 @@ void page_set_flags(target_ulong start, target_ulong end, int flags)
            the code inside.  */
         if (!(p->flags & PAGE_WRITE) &&
             (flags & PAGE_WRITE) &&
-            p->first_tb) {
+            p->first_tb[current_cpu->cpu_index]) {
             tb_invalidate_phys_page(addr, 0, NULL, false);
         }
         p->flags = flags;
-- 
1.9.0


* [Qemu-devel] [RFC 03/10] replace spinlock by QemuMutex.
  2015-01-16 17:19 [Qemu-devel] [RFC 00/10] MultiThread TCG fred.konrad
  2015-01-16 17:19 ` [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_* fred.konrad
  2015-01-16 17:19 ` [Qemu-devel] [RFC 02/10] use a different translation block list for each cpu fred.konrad
@ 2015-01-16 17:19 ` fred.konrad
  2015-01-29 15:25   ` Peter Maydell
  2015-01-16 17:19 ` [Qemu-devel] [RFC 04/10] remove unused spinlock fred.konrad
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 62+ messages in thread
From: fred.konrad @ 2015-01-16 17:19 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, jan.kiszka, mark.burton, agraf, pbonzini, fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

spinlock is only used in two cases:
  * cpu-exec.c: to protect TranslationBlock
  * mem_helper.c: for the lock helper in target-i386 (which seems broken).

It's a pthread_mutex_t in user mode, so it is better to use QemuMutex directly
in this case.
It also allows us to reuse the tb_lock mutex of TBContext in the multithreaded
TCG case.
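
The conversion itself is mechanical (a sketch; note that unlike
SPIN_LOCK_UNLOCKED there is no static initializer, so each mutex needs a
one-time qemu_mutex_init()):

    /* before */
    static spinlock_t lock = SPIN_LOCK_UNLOCKED;
    spin_lock(&lock);
    spin_unlock(&lock);

    /* after */
    static QemuMutex lock;        /* qemu_mutex_init(&lock) once at setup */
    qemu_mutex_lock(&lock);
    qemu_mutex_unlock(&lock);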

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
---
 cpu-exec.c               | 15 +++++++++++----
 include/exec/exec-all.h  |  4 ++--
 linux-user/main.c        |  6 +++---
 target-i386/mem_helper.c | 16 +++++++++++++---
 tcg/i386/tcg-target.c    |  8 ++++++++
 5 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index a4f0eff..1e7513c 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -335,7 +335,9 @@ int cpu_exec(CPUArchState *env)
     SyncClocks sc;
 
     /* This must be volatile so it is not trashed by longjmp() */
+#if defined(CONFIG_USER_ONLY)
     volatile bool have_tb_lock = false;
+#endif
 
     if (cpu->halted) {
         if (!cpu_has_work(cpu)) {
@@ -451,8 +453,10 @@ int cpu_exec(CPUArchState *env)
                     cpu->exception_index = EXCP_INTERRUPT;
                     cpu_loop_exit(cpu);
                 }
-                spin_lock(&tcg_ctx.tb_ctx.tb_lock);
+#if defined(CONFIG_USER_ONLY)
+                qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
                 have_tb_lock = true;
+#endif
                 tb = tb_find_fast(env);
                 /* Note: we do it here to avoid a gcc bug on Mac OS X when
                    doing it in tb_find_slow */
@@ -474,9 +478,10 @@ int cpu_exec(CPUArchState *env)
                     tb_add_jump((TranslationBlock *)(next_tb & ~TB_EXIT_MASK),
                                 next_tb & TB_EXIT_MASK, tb);
                 }
+#if defined(CONFIG_USER_ONLY)
                 have_tb_lock = false;
-                spin_unlock(&tcg_ctx.tb_ctx.tb_lock);
-
+                qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
+#endif
                 /* cpu_interrupt might be called while translating the
                    TB, but before it is linked into a potentially
                    infinite loop and becomes env->current_tb. Avoid
@@ -549,10 +554,12 @@ int cpu_exec(CPUArchState *env)
 #ifdef TARGET_I386
             x86_cpu = X86_CPU(cpu);
 #endif
+#if defined(CONFIG_USER_ONLY)
             if (have_tb_lock) {
-                spin_unlock(&tcg_ctx.tb_ctx.tb_lock);
+                qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
                 have_tb_lock = false;
             }
+#endif
         }
     } /* for(;;) */
 
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 6a15448..6a450c1 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -173,7 +173,7 @@ struct TranslationBlock {
     struct TranslationBlock *jmp_first;
 };
 
-#include "exec/spinlock.h"
+#include "qemu/thread.h"
 
 typedef struct TBContext TBContext;
 
@@ -183,7 +183,7 @@ struct TBContext {
     TranslationBlock *tb_phys_hash[CODE_GEN_PHYS_HASH_SIZE];
     int nb_tbs;
     /* any access to the tbs or the page table must use this lock */
-    spinlock_t tb_lock;
+    QemuMutex tb_lock;
 
     /* statistics */
     int tb_flush_count;
diff --git a/linux-user/main.c b/linux-user/main.c
index 67b0231..97e7d50 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -107,7 +107,7 @@ static int pending_cpus;
 /* Make sure everything is in a consistent state for calling fork().  */
 void fork_start(void)
 {
-    pthread_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
+    qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
     pthread_mutex_lock(&exclusive_lock);
     mmap_fork_start();
 }
@@ -129,11 +129,11 @@ void fork_end(int child)
         pthread_mutex_init(&cpu_list_mutex, NULL);
         pthread_cond_init(&exclusive_cond, NULL);
         pthread_cond_init(&exclusive_resume, NULL);
-        pthread_mutex_init(&tcg_ctx.tb_ctx.tb_lock, NULL);
+        qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
         gdbserver_fork((CPUArchState *)thread_cpu->env_ptr);
     } else {
         pthread_mutex_unlock(&exclusive_lock);
-        pthread_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
+        qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
     }
 }
 
diff --git a/target-i386/mem_helper.c b/target-i386/mem_helper.c
index 1aec8a5..7106cc3 100644
--- a/target-i386/mem_helper.c
+++ b/target-i386/mem_helper.c
@@ -23,17 +23,27 @@
 
 /* broken thread support */
 
-static spinlock_t global_cpu_lock = SPIN_LOCK_UNLOCKED;
+#if defined(CONFIG_USER_ONLY)
+QemuMutex global_cpu_lock;
 
 void helper_lock(void)
 {
-    spin_lock(&global_cpu_lock);
+    qemu_mutex_lock(&global_cpu_lock);
 }
 
 void helper_unlock(void)
 {
-    spin_unlock(&global_cpu_lock);
+    qemu_mutex_unlock(&global_cpu_lock);
 }
+#else
+void helper_lock(void)
+{
+}
+
+void helper_unlock(void)
+{
+}
+#endif
 
 void helper_cmpxchg8b(CPUX86State *env, target_ulong a0)
 {
diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index 4133dcf..cbb8a4f 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -24,6 +24,10 @@
 
 #include "tcg-be-ldst.h"
 
+#if defined(CONFIG_USER_ONLY)
+extern QemuMutex global_cpu_lock;
+#endif
+
 #ifndef NDEBUG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 #if TCG_TARGET_REG_BITS == 64
@@ -2339,6 +2343,10 @@ static void tcg_target_init(TCGContext *s)
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
 
     tcg_add_target_add_op_defs(x86_op_defs);
+
+#if defined(CONFIG_USER_ONLY)
+    qemu_mutex_init(&global_cpu_lock);
+#endif
 }
 
 typedef struct {
-- 
1.9.0


* [Qemu-devel] [RFC 04/10] remove unused spinlock.
  2015-01-16 17:19 [Qemu-devel] [RFC 00/10] MultiThread TCG fred.konrad
                   ` (2 preceding siblings ...)
  2015-01-16 17:19 ` [Qemu-devel] [RFC 03/10] replace spinlock by QemuMutex fred.konrad
@ 2015-01-16 17:19 ` fred.konrad
  2015-01-16 17:19 ` [Qemu-devel] [RFC 05/10] extract TBContext from TCGContext fred.konrad
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 62+ messages in thread
From: fred.konrad @ 2015-01-16 17:19 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, jan.kiszka, mark.burton, agraf, pbonzini, fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

This just removes the spinlock implementation, as it is not used anymore.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
---
 include/exec/spinlock.h | 49 -------------------------------------------------
 scripts/checkpatch.pl   |  9 ++-------
 2 files changed, 2 insertions(+), 56 deletions(-)
 delete mode 100644 include/exec/spinlock.h

diff --git a/include/exec/spinlock.h b/include/exec/spinlock.h
deleted file mode 100644
index a72edda..0000000
--- a/include/exec/spinlock.h
+++ /dev/null
@@ -1,49 +0,0 @@
-/*
- *  Copyright (c) 2003 Fabrice Bellard
- *
- * This library is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2 of the License, or (at your option) any later version.
- *
- * This library is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with this library; if not, see <http://www.gnu.org/licenses/>
- */
-
-/* configure guarantees us that we have pthreads on any host except
- * mingw32, which doesn't support any of the user-only targets.
- * So we can simply assume we have pthread mutexes here.
- */
-#if defined(CONFIG_USER_ONLY)
-
-#include <pthread.h>
-#define spin_lock pthread_mutex_lock
-#define spin_unlock pthread_mutex_unlock
-#define spinlock_t pthread_mutex_t
-#define SPIN_LOCK_UNLOCKED PTHREAD_MUTEX_INITIALIZER
-
-#else
-
-/* Empty implementations, on the theory that system mode emulation
- * is single-threaded. This means that these functions should only
- * be used from code run in the TCG cpu thread, and cannot protect
- * data structures which might also be accessed from the IO thread
- * or from signal handlers.
- */
-typedef int spinlock_t;
-#define SPIN_LOCK_UNLOCKED 0
-
-static inline void spin_lock(spinlock_t *lock)
-{
-}
-
-static inline void spin_unlock(spinlock_t *lock)
-{
-}
-
-#endif
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 053e432..e97b707 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2655,11 +2655,6 @@ sub process {
 			WARN("Use of volatile is usually wrong: see Documentation/volatile-considered-harmful.txt\n" . $herecurr);
 		}
 
-# SPIN_LOCK_UNLOCKED & RW_LOCK_UNLOCKED are deprecated
-		if ($line =~ /\b(SPIN_LOCK_UNLOCKED|RW_LOCK_UNLOCKED)/) {
-			ERROR("Use of $1 is deprecated: see Documentation/spinlocks.txt\n" . $herecurr);
-		}
-
 # warn about #if 0
 		if ($line =~ /^.\s*\#\s*if\s+0\b/) {
 			CHK("if this code is redundant consider removing it\n" .
@@ -2708,8 +2703,8 @@ sub process {
 			ERROR("exactly one space required after that #$1\n" . $herecurr);
 		}
 
-# check for spinlock_t definitions without a comment.
-		if ($line =~ /^.\s*(struct\s+mutex|spinlock_t)\s+\S+;/ ||
+# check for mutex definitions without a comment.
+		if ($line =~ /^.\s*(struct\s+mutex)\s+\S+;/ ||
 		    $line =~ /^.\s*(DEFINE_MUTEX)\s*\(/) {
 			my $which = $1;
 			if (!ctx_has_comment($first_line, $linenr)) {
-- 
1.9.0


* [Qemu-devel] [RFC 05/10] extract TBContext from TCGContext.
  2015-01-16 17:19 [Qemu-devel] [RFC 00/10] MultiThread TCG fred.konrad
                   ` (3 preceding siblings ...)
  2015-01-16 17:19 ` [Qemu-devel] [RFC 04/10] remove unused spinlock fred.konrad
@ 2015-01-16 17:19 ` fred.konrad
  2015-01-29 15:44   ` Peter Maydell
  2015-01-16 17:19 ` [Qemu-devel] [RFC 06/10] protect TBContext with tb_lock fred.konrad
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 62+ messages in thread
From: fred.konrad @ 2015-01-16 17:19 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, jan.kiszka, mark.burton, agraf, pbonzini, fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

In order to have one TCGContext per thread and a single TBContext, we have to
extract TBContext from TCGContext.
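
Concretely, the split looks like this (a sketch of the tcg.h change; tb_ctx
is defined once in translate-all.c):

    /* before: each TCGContext embeds the translation-block state */
    struct TCGContext {
        /* ... */
        TBContext tb_ctx;
    };
    extern TCGContext tcg_ctx;

    /* after: a single TBContext, independent of any TCGContext */
    struct TCGContext {
        /* ... */
    };
    extern TCGContext tcg_ctx;
    extern TBContext tb_ctx;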

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
---
 cpu-exec.c        | 18 ++++++-------
 linux-user/main.c |  6 ++---
 tcg/tcg.h         |  3 +--
 translate-all.c   | 79 +++++++++++++++++++++++++++----------------------------
 4 files changed, 52 insertions(+), 54 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 1e7513c..4d22252 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -237,13 +237,13 @@ static TranslationBlock *tb_find_slow(CPUArchState *env,
     tb_page_addr_t phys_pc, phys_page1;
     target_ulong virt_page2;
 
-    tcg_ctx.tb_ctx.tb_invalidated_flag = 0;
+    tb_ctx.tb_invalidated_flag = 0;
 
     /* find translated block using physical mappings */
     phys_pc = get_page_addr_code(env, pc);
     phys_page1 = phys_pc & TARGET_PAGE_MASK;
     h = tb_phys_hash_func(phys_pc);
-    ptb1 = &tcg_ctx.tb_ctx.tb_phys_hash[h];
+    ptb1 = &tb_ctx.tb_phys_hash[h];
     for(;;) {
         tb = *ptb1;
         if (!tb)
@@ -275,8 +275,8 @@ static TranslationBlock *tb_find_slow(CPUArchState *env,
     /* Move the last found TB to the head of the list */
     if (likely(*ptb1)) {
         *ptb1 = tb->phys_hash_next;
-        tb->phys_hash_next = tcg_ctx.tb_ctx.tb_phys_hash[h];
-        tcg_ctx.tb_ctx.tb_phys_hash[h] = tb;
+        tb->phys_hash_next = tb_ctx.tb_phys_hash[h];
+        tb_ctx.tb_phys_hash[h] = tb;
     }
     /* we add the TB in the virtual pc hash table */
     cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
@@ -454,18 +454,18 @@ int cpu_exec(CPUArchState *env)
                     cpu_loop_exit(cpu);
                 }
 #if defined(CONFIG_USER_ONLY)
-                qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
+                qemu_mutex_lock(&tb_ctx.tb_lock);
                 have_tb_lock = true;
 #endif
                 tb = tb_find_fast(env);
                 /* Note: we do it here to avoid a gcc bug on Mac OS X when
                    doing it in tb_find_slow */
-                if (tcg_ctx.tb_ctx.tb_invalidated_flag) {
+                if (tb_ctx.tb_invalidated_flag) {
                     /* as some TB could have been invalidated because
                        of memory exceptions while generating the code, we
                        must recompute the hash index here */
                     next_tb = 0;
-                    tcg_ctx.tb_ctx.tb_invalidated_flag = 0;
+                    tb_ctx.tb_invalidated_flag = 0;
                 }
                 if (qemu_loglevel_mask(CPU_LOG_EXEC)) {
                     qemu_log("Trace %p [" TARGET_FMT_lx "] %s\n",
@@ -480,7 +480,7 @@ int cpu_exec(CPUArchState *env)
                 }
 #if defined(CONFIG_USER_ONLY)
                 have_tb_lock = false;
-                qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
+                qemu_mutex_unlock(&tb_ctx.tb_lock);
 #endif
                 /* cpu_interrupt might be called while translating the
                    TB, but before it is linked into a potentially
@@ -556,7 +556,7 @@ int cpu_exec(CPUArchState *env)
 #endif
 #if defined(CONFIG_USER_ONLY)
             if (have_tb_lock) {
-                qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
+                qemu_mutex_unlock(&tb_ctx.tb_lock);
                 have_tb_lock = false;
             }
 #endif
diff --git a/linux-user/main.c b/linux-user/main.c
index 97e7d50..2a4c948 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -107,7 +107,7 @@ static int pending_cpus;
 /* Make sure everything is in a consistent state for calling fork().  */
 void fork_start(void)
 {
-    qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
+    qemu_mutex_lock(&tb_ctx.tb_lock);
     pthread_mutex_lock(&exclusive_lock);
     mmap_fork_start();
 }
@@ -129,11 +129,11 @@ void fork_end(int child)
         pthread_mutex_init(&cpu_list_mutex, NULL);
         pthread_cond_init(&exclusive_cond, NULL);
         pthread_cond_init(&exclusive_resume, NULL);
-        qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
+        qemu_mutex_init(&tb_ctx.tb_lock);
         gdbserver_fork((CPUArchState *)thread_cpu->env_ptr);
     } else {
         pthread_mutex_unlock(&exclusive_lock);
-        qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
+        qemu_mutex_unlock(&tb_ctx.tb_lock);
     }
 }
 
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 944b877..baf053a 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -529,13 +529,12 @@ struct TCGContext {
     size_t code_gen_buffer_max_size;
     void *code_gen_ptr;
 
-    TBContext tb_ctx;
-
     /* The TCGBackendData structure is private to tcg-target.c.  */
     struct TCGBackendData *be;
 };
 
 extern TCGContext tcg_ctx;
+extern TBContext tb_ctx;
 
 /* pool based memory allocation */
 
diff --git a/translate-all.c b/translate-all.c
index 0e11c70..e393d30 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -127,6 +127,9 @@ static void *l1_map[V_L1_SIZE];
 /* code generation context */
 TCGContext tcg_ctx;
 
+/* translation block context */
+TBContext tb_ctx;
+
 static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
                          tb_page_addr_t phys_page2);
 static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
@@ -684,8 +687,8 @@ static inline void code_gen_alloc(size_t tb_size)
         (TCG_MAX_OP_SIZE * OPC_BUF_SIZE);
     tcg_ctx.code_gen_max_blocks = tcg_ctx.code_gen_buffer_size /
             CODE_GEN_AVG_BLOCK_SIZE;
-    tcg_ctx.tb_ctx.tbs =
-            g_malloc(tcg_ctx.code_gen_max_blocks * sizeof(TranslationBlock));
+    tb_ctx.tbs = g_malloc(tcg_ctx.code_gen_max_blocks
+                          * sizeof(TranslationBlock));
 }
 
 /* Must be called before using the QEMU cpus. 'tb_size' is the size
@@ -716,12 +719,12 @@ static TranslationBlock *tb_alloc(target_ulong pc)
 {
     TranslationBlock *tb;
 
-    if (tcg_ctx.tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks ||
+    if (tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks ||
         (tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer) >=
          tcg_ctx.code_gen_buffer_max_size) {
         return NULL;
     }
-    tb = &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs++];
+    tb = &tb_ctx.tbs[tb_ctx.nb_tbs++];
     tb->pc = pc;
     tb->cflags = 0;
     return tb;
@@ -732,10 +735,10 @@ void tb_free(TranslationBlock *tb)
     /* In practice this is mostly used for single use temporary TB
        Ignore the hard cases and just back up if this TB happens to
        be the last one generated.  */
-    if (tcg_ctx.tb_ctx.nb_tbs > 0 &&
-            tb == &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs - 1]) {
+    if (tb_ctx.nb_tbs > 0 &&
+            tb == &tb_ctx.tbs[tb_ctx.nb_tbs - 1]) {
         tcg_ctx.code_gen_ptr = tb->tc_ptr;
-        tcg_ctx.tb_ctx.nb_tbs--;
+        tb_ctx.nb_tbs--;
     }
 }
 
@@ -792,27 +795,27 @@ void tb_flush(CPUArchState *env1)
 #if defined(DEBUG_FLUSH)
     printf("qemu: flush code_size=%ld nb_tbs=%d avg_tb_size=%ld\n",
            (unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer),
-           tcg_ctx.tb_ctx.nb_tbs, tcg_ctx.tb_ctx.nb_tbs > 0 ?
+           tb_ctx.nb_tbs, tb_ctx.nb_tbs > 0 ?
            ((unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer)) /
-           tcg_ctx.tb_ctx.nb_tbs : 0);
+           tb_ctx.nb_tbs : 0);
 #endif
     if ((unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer)
         > tcg_ctx.code_gen_buffer_size) {
         cpu_abort(cpu, "Internal error: code buffer overflow\n");
     }
-    tcg_ctx.tb_ctx.nb_tbs = 0;
+    tb_ctx.nb_tbs = 0;
 
     CPU_FOREACH(cpu) {
         memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
     }
 
-    memset(tcg_ctx.tb_ctx.tb_phys_hash, 0, sizeof(tcg_ctx.tb_ctx.tb_phys_hash));
+    memset(tb_ctx.tb_phys_hash, 0, sizeof(tb_ctx.tb_phys_hash));
     page_flush_tb();
 
     tcg_ctx.code_gen_ptr = tcg_ctx.code_gen_buffer;
     /* XXX: flush processor icache at this point if cache flush is
        expensive */
-    tcg_ctx.tb_ctx.tb_flush_count++;
+    tb_ctx.tb_flush_count++;
 }
 
 #ifdef DEBUG_TB_CHECK
@@ -842,7 +845,7 @@ static void tb_page_check(void)
     int i, flags1, flags2;
 
     for (i = 0; i < CODE_GEN_PHYS_HASH_SIZE; i++) {
-        for (tb = tcg_ctx.tb_ctx.tb_phys_hash[i]; tb != NULL;
+        for (tb = tb_ctx.tb_phys_hash[i]; tb != NULL;
                 tb = tb->phys_hash_next) {
             flags1 = page_get_flags(tb->pc);
             flags2 = page_get_flags(tb->pc + tb->size - 1);
@@ -935,7 +938,7 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
     /* remove the TB from the hash list */
     phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
     h = tb_phys_hash_func(phys_pc);
-    tb_hash_remove(&tcg_ctx.tb_ctx.tb_phys_hash[h], tb);
+    tb_hash_remove(&tb_ctx.tb_phys_hash[h], tb);
 
     /* remove the TB from the page list */
     if (tb->page_addr[0] != page_addr) {
@@ -949,7 +952,7 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
         invalidate_page_bitmap(p);
     }
 
-    tcg_ctx.tb_ctx.tb_invalidated_flag = 1;
+    tb_ctx.tb_invalidated_flag = 1;
 
     /* remove the TB from the hash list */
     h = tb_jmp_cache_hash_func(tb->pc);
@@ -978,7 +981,7 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
     }
     tb->jmp_first = (TranslationBlock *)((uintptr_t)tb | 2); /* fail safe */
 
-    tcg_ctx.tb_ctx.tb_phys_invalidate_count++;
+    tb_ctx.tb_phys_invalidate_count++;
 }
 
 static inline void set_bits(uint8_t *tab, int start, int len)
@@ -1058,7 +1061,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         /* cannot fail at this point */
         tb = tb_alloc(pc);
         /* Don't forget to invalidate previous TB info.  */
-        tcg_ctx.tb_ctx.tb_invalidated_flag = 1;
+        tb_ctx.tb_invalidated_flag = 1;
     }
     tb->tc_ptr = tcg_ctx.code_gen_ptr;
     tb->cs_base = cs_base;
@@ -1392,7 +1395,7 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
     mmap_lock();
     /* add in the physical hash table */
     h = tb_phys_hash_func(phys_pc);
-    ptb = &tcg_ctx.tb_ctx.tb_phys_hash[h];
+    ptb = &tb_ctx.tb_phys_hash[h];
     tb->phys_hash_next = *ptb;
     *ptb = tb;
 
@@ -1430,7 +1433,7 @@ static TranslationBlock *tb_find_pc(uintptr_t tc_ptr)
     uintptr_t v;
     TranslationBlock *tb;
 
-    if (tcg_ctx.tb_ctx.nb_tbs <= 0) {
+    if (tb_ctx.nb_tbs <= 0) {
         return NULL;
     }
     if (tc_ptr < (uintptr_t)tcg_ctx.code_gen_buffer ||
@@ -1439,10 +1442,10 @@ static TranslationBlock *tb_find_pc(uintptr_t tc_ptr)
     }
     /* binary search (cf Knuth) */
     m_min = 0;
-    m_max = tcg_ctx.tb_ctx.nb_tbs - 1;
+    m_max = tb_ctx.nb_tbs - 1;
     while (m_min <= m_max) {
         m = (m_min + m_max) >> 1;
-        tb = &tcg_ctx.tb_ctx.tbs[m];
+        tb = &tb_ctx.tbs[m];
         v = (uintptr_t)tb->tc_ptr;
         if (v == tc_ptr) {
             return tb;
@@ -1452,7 +1455,7 @@ static TranslationBlock *tb_find_pc(uintptr_t tc_ptr)
             m_min = m + 1;
         }
     }
-    return &tcg_ctx.tb_ctx.tbs[m_max];
+    return &tb_ctx.tbs[m_max];
 }
 
 #if defined(TARGET_HAS_ICE) && !defined(CONFIG_USER_ONLY)
@@ -1606,8 +1609,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
     cross_page = 0;
     direct_jmp_count = 0;
     direct_jmp2_count = 0;
-    for (i = 0; i < tcg_ctx.tb_ctx.nb_tbs; i++) {
-        tb = &tcg_ctx.tb_ctx.tbs[i];
+    for (i = 0; i < tb_ctx.nb_tbs; i++) {
+        tb = &tb_ctx.tbs[i];
         target_code_size += tb->size;
         if (tb->size > max_target_code_size) {
             max_target_code_size = tb->size;
@@ -1628,32 +1631,28 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
                 tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer,
                 tcg_ctx.code_gen_buffer_max_size);
     cpu_fprintf(f, "TB count            %d/%d\n",
-            tcg_ctx.tb_ctx.nb_tbs, tcg_ctx.code_gen_max_blocks);
+                tb_ctx.nb_tbs, tcg_ctx.code_gen_max_blocks);
     cpu_fprintf(f, "TB avg target size  %d max=%d bytes\n",
-            tcg_ctx.tb_ctx.nb_tbs ? target_code_size /
-                    tcg_ctx.tb_ctx.nb_tbs : 0,
-            max_target_code_size);
+                tb_ctx.nb_tbs ? target_code_size / tb_ctx.nb_tbs : 0,
+                max_target_code_size);
     cpu_fprintf(f, "TB avg host size    %td bytes (expansion ratio: %0.1f)\n",
-            tcg_ctx.tb_ctx.nb_tbs ? (tcg_ctx.code_gen_ptr -
-                                     tcg_ctx.code_gen_buffer) /
-                                     tcg_ctx.tb_ctx.nb_tbs : 0,
+                tb_ctx.nb_tbs ? (tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer)
+                               / tb_ctx.nb_tbs : 0,
                 target_code_size ? (double) (tcg_ctx.code_gen_ptr -
                                              tcg_ctx.code_gen_buffer) /
                                              target_code_size : 0);
     cpu_fprintf(f, "cross page TB count %d (%d%%)\n", cross_page,
-            tcg_ctx.tb_ctx.nb_tbs ? (cross_page * 100) /
-                                    tcg_ctx.tb_ctx.nb_tbs : 0);
+                tb_ctx.nb_tbs ? (cross_page * 100) / tb_ctx.nb_tbs : 0);
     cpu_fprintf(f, "direct jump count   %d (%d%%) (2 jumps=%d %d%%)\n",
-                direct_jmp_count,
-                tcg_ctx.tb_ctx.nb_tbs ? (direct_jmp_count * 100) /
-                        tcg_ctx.tb_ctx.nb_tbs : 0,
+                direct_jmp_count, tb_ctx.nb_tbs ? (direct_jmp_count * 100) /
+                tb_ctx.nb_tbs : 0,
                 direct_jmp2_count,
-                tcg_ctx.tb_ctx.nb_tbs ? (direct_jmp2_count * 100) /
-                        tcg_ctx.tb_ctx.nb_tbs : 0);
+                tb_ctx.nb_tbs ? (direct_jmp2_count * 100) /
+                tb_ctx.nb_tbs : 0);
     cpu_fprintf(f, "\nStatistics:\n");
-    cpu_fprintf(f, "TB flush count      %d\n", tcg_ctx.tb_ctx.tb_flush_count);
+    cpu_fprintf(f, "TB flush count      %d\n", tb_ctx.tb_flush_count);
     cpu_fprintf(f, "TB invalidate count %d\n",
-            tcg_ctx.tb_ctx.tb_phys_invalidate_count);
+                tb_ctx.tb_phys_invalidate_count);
     cpu_fprintf(f, "TLB flush count     %d\n", tlb_flush_count);
     tcg_dump_info(f, cpu_fprintf);
 }
-- 
1.9.0


* [Qemu-devel] [RFC 06/10] protect TBContext with tb_lock.
  2015-01-16 17:19 [Qemu-devel] [RFC 00/10] MultiThread TCG fred.konrad
                   ` (4 preceding siblings ...)
  2015-01-16 17:19 ` [Qemu-devel] [RFC 05/10] extract TBContext from TCGContext fred.konrad
@ 2015-01-16 17:19 ` fred.konrad
  2015-01-16 17:19 ` [Qemu-devel] [RFC 07/10] tcg: remove tcg_halt_cond global variable fred.konrad
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 62+ messages in thread
From: fred.konrad @ 2015-01-16 17:19 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, jan.kiszka, mark.burton, agraf, pbonzini, fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

This protects TBContext with tb_lock to make the tb_* functions thread safe.

We can still have an issue with tb_flush in the multithreaded TCG case:
another CPU can be executing code during a flush.

This can be fixed later by making all other TCG threads exit before calling
tb_flush().
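
The awkward spot is tb_find_slow(): tb_gen_code() takes tb_lock itself and
QemuMutex is not recursive, so the lock has to be dropped around the call
(hence the FIXME in the hunk below). In outline:

    qemu_mutex_lock(&tb_ctx.tb_lock);
    /* ... search tb_phys_hash for an existing TB ... */
    if (tb == NULL) {
        qemu_mutex_unlock(&tb_ctx.tb_lock);    /* tb_gen_code relocks it */
        tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
        qemu_mutex_lock(&tb_ctx.tb_lock);
    }
    /* ... move tb to the hash head, update tb_jmp_cache ... */
    qemu_mutex_unlock(&tb_ctx.tb_lock);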

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
---
 cpu-exec.c      |  16 ++++++++-
 translate-all.c | 100 +++++++++++++++++++++++++++++++++++++-------------------
 2 files changed, 82 insertions(+), 34 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 4d22252..68654e3 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -237,6 +237,7 @@ static TranslationBlock *tb_find_slow(CPUArchState *env,
     tb_page_addr_t phys_pc, phys_page1;
     target_ulong virt_page2;
 
+    qemu_mutex_lock(&tb_ctx.tb_lock);
     tb_ctx.tb_invalidated_flag = 0;
 
     /* find translated block using physical mappings */
@@ -268,8 +269,14 @@ static TranslationBlock *tb_find_slow(CPUArchState *env,
         ptb1 = &tb->phys_hash_next;
     }
  not_found:
+    /*
+     * FIXME: We need to release this mutex because tb_gen_code needs it.
+     * This can be optimised by adding a flag to tb_gen_code?
+     */
+    qemu_mutex_unlock(&tb_ctx.tb_lock);
     /* if no translated code available, then translate it now */
     tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
+    qemu_mutex_lock(&tb_ctx.tb_lock);
 
  found:
     /* Move the last found TB to the head of the list */
@@ -280,6 +287,7 @@ static TranslationBlock *tb_find_slow(CPUArchState *env,
     }
     /* we add the TB in the virtual pc hash table */
     cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
+    qemu_mutex_unlock(&tb_ctx.tb_lock);
     return tb;
 }
 
@@ -460,6 +468,9 @@ int cpu_exec(CPUArchState *env)
                 tb = tb_find_fast(env);
                 /* Note: we do it here to avoid a gcc bug on Mac OS X when
                    doing it in tb_find_slow */
+#if !defined(CONFIG_USER_ONLY)
+                qemu_mutex_lock(&tb_ctx.tb_lock);
+#endif
                 if (tb_ctx.tb_invalidated_flag) {
                     /* as some TB could have been invalidated because
                        of memory exceptions while generating the code, we
@@ -467,6 +478,9 @@ int cpu_exec(CPUArchState *env)
                     next_tb = 0;
                     tb_ctx.tb_invalidated_flag = 0;
                 }
+#if !defined(CONFIG_USER_ONLY)
+                qemu_mutex_unlock(&tb_ctx.tb_lock);
+#endif
                 if (qemu_loglevel_mask(CPU_LOG_EXEC)) {
                     qemu_log("Trace %p [" TARGET_FMT_lx "] %s\n",
                              tb->tc_ptr, tb->pc, lookup_symbol(tb->pc));
diff --git a/translate-all.c b/translate-all.c
index e393d30..68505c0 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -689,6 +689,7 @@ static inline void code_gen_alloc(size_t tb_size)
             CODE_GEN_AVG_BLOCK_SIZE;
     tb_ctx.tbs = g_malloc(tcg_ctx.code_gen_max_blocks
                           * sizeof(TranslationBlock));
+    qemu_mutex_init(&tb_ctx.tb_lock);
 }
 
 /* Must be called before using the QEMU cpus. 'tb_size' is the size
@@ -713,20 +714,23 @@ bool tcg_enabled(void)
     return tcg_ctx.code_gen_buffer != NULL;
 }
 
-/* Allocate a new translation block. Flush the translation buffer if
-   too many translation blocks or too much generated code. */
+/*
+ * Allocate a new translation block. Flush the translation buffer if
+ * too many translation blocks or too much generated code.
+ * tb_alloc is not thread safe but tb_gen_code is protected by a mutex so this
+ * function is called only by one thread.
+ */
 static TranslationBlock *tb_alloc(target_ulong pc)
 {
-    TranslationBlock *tb;
+    TranslationBlock *tb = NULL;
 
-    if (tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks ||
-        (tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer) >=
+    if (tb_ctx.nb_tbs < tcg_ctx.code_gen_max_blocks &&
+        (tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer) <
          tcg_ctx.code_gen_buffer_max_size) {
-        return NULL;
+        tb = &tb_ctx.tbs[tb_ctx.nb_tbs++];
+        tb->pc = pc;
+        tb->cflags = 0;
     }
-    tb = &tb_ctx.tbs[tb_ctx.nb_tbs++];
-    tb->pc = pc;
-    tb->cflags = 0;
     return tb;
 }
 
@@ -735,11 +739,16 @@ void tb_free(TranslationBlock *tb)
     /* In practice this is mostly used for single use temporary TB
        Ignore the hard cases and just back up if this TB happens to
        be the last one generated.  */
+
+    qemu_mutex_lock(&tb_ctx.tb_lock);
+
     if (tb_ctx.nb_tbs > 0 &&
             tb == &tb_ctx.tbs[tb_ctx.nb_tbs - 1]) {
         tcg_ctx.code_gen_ptr = tb->tc_ptr;
         tb_ctx.nb_tbs--;
     }
+
+    qemu_mutex_unlock(&tb_ctx.tb_lock);
 }
 
 static inline void invalidate_page_bitmap(PageDesc *p)
@@ -792,6 +801,8 @@ void tb_flush(CPUArchState *env1)
 {
     CPUState *cpu = ENV_GET_CPU(env1);
 
+    qemu_mutex_lock(&tb_ctx.tb_lock);
+
 #if defined(DEBUG_FLUSH)
     printf("qemu: flush code_size=%ld nb_tbs=%d avg_tb_size=%ld\n",
            (unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer),
@@ -816,6 +827,8 @@ void tb_flush(CPUArchState *env1)
     /* XXX: flush processor icache at this point if cache flush is
        expensive */
     tb_ctx.tb_flush_count++;
+
+    qemu_mutex_unlock(&tb_ctx.tb_lock);
 }
 
 #ifdef DEBUG_TB_CHECK
@@ -825,6 +838,8 @@ static void tb_invalidate_check(target_ulong address)
     TranslationBlock *tb;
     int i;
 
+    qemu_mutex_lock(&tb_ctx.tb_lock);
+
     address &= TARGET_PAGE_MASK;
     for (i = 0; i < CODE_GEN_PHYS_HASH_SIZE; i++) {
         for (tb = tb_ctx.tb_phys_hash[i]; tb != NULL; tb = tb->phys_hash_next) {
@@ -836,6 +851,8 @@ static void tb_invalidate_check(target_ulong address)
             }
         }
     }
+
+    qemu_mutex_unlock(&tb_ctx.tb_lock);
 }
 
 /* verify that all the pages have correct rights for code */
@@ -844,6 +861,8 @@ static void tb_page_check(void)
     TranslationBlock *tb;
     int i, flags1, flags2;
 
+    qemu_mutex_lock(&tb_ctx.tb_lock);
+
     for (i = 0; i < CODE_GEN_PHYS_HASH_SIZE; i++) {
         for (tb = tb_ctx.tb_phys_hash[i]; tb != NULL;
                 tb = tb->phys_hash_next) {
@@ -855,6 +874,8 @@ static void tb_page_check(void)
             }
         }
     }
+
+    qemu_mutex_unlock(&tb_ctx.tb_lock);
 }
 
 #endif
@@ -935,6 +956,8 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
     tb_page_addr_t phys_pc;
     TranslationBlock *tb1, *tb2;
 
+    qemu_mutex_lock(&tb_ctx.tb_lock);
+
     /* remove the TB from the hash list */
     phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
     h = tb_phys_hash_func(phys_pc);
@@ -982,6 +1005,8 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
     tb->jmp_first = (TranslationBlock *)((uintptr_t)tb | 2); /* fail safe */
 
     tb_ctx.tb_phys_invalidate_count++;
+
+    qemu_mutex_unlock(&tb_ctx.tb_lock);
 }
 
 static inline void set_bits(uint8_t *tab, int start, int len)
@@ -1050,6 +1075,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     target_ulong virt_page2;
     int code_gen_size;
 
+    qemu_mutex_lock(&tb_ctx.tb_lock);
+
     phys_pc = get_page_addr_code(env, pc);
     if (use_icount) {
         cflags |= CF_USE_ICOUNT;
@@ -1078,6 +1105,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         phys_page2 = get_page_addr_code(env, virt_page2);
     }
     tb_link_page(tb, phys_pc, phys_page2);
+
+    qemu_mutex_unlock(&tb_ctx.tb_lock);
     return tb;
 }
 
@@ -1383,7 +1412,7 @@ static inline void tb_alloc_page(TranslationBlock *tb,
 }
 
 /* add a new TB and link it to the physical page tables. phys_page2 is
-   (-1) to indicate that only one page contains the TB. */
+ * (-1) to indicate that only one page contains the TB. */
 static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
                          tb_page_addr_t phys_page2)
 {
@@ -1431,31 +1460,33 @@ static TranslationBlock *tb_find_pc(uintptr_t tc_ptr)
 {
     int m_min, m_max, m;
     uintptr_t v;
-    TranslationBlock *tb;
-
-    if (tb_ctx.nb_tbs <= 0) {
-        return NULL;
-    }
-    if (tc_ptr < (uintptr_t)tcg_ctx.code_gen_buffer ||
-        tc_ptr >= (uintptr_t)tcg_ctx.code_gen_ptr) {
-        return NULL;
-    }
-    /* binary search (cf Knuth) */
-    m_min = 0;
-    m_max = tb_ctx.nb_tbs - 1;
-    while (m_min <= m_max) {
-        m = (m_min + m_max) >> 1;
-        tb = &tb_ctx.tbs[m];
-        v = (uintptr_t)tb->tc_ptr;
-        if (v == tc_ptr) {
-            return tb;
-        } else if (tc_ptr < v) {
-            m_max = m - 1;
-        } else {
-            m_min = m + 1;
+    TranslationBlock *tb = NULL;
+
+    qemu_mutex_lock(&tb_ctx.tb_lock);
+
+    if ((tb_ctx.nb_tbs > 0) && (tc_ptr >= (uintptr_t)tcg_ctx.code_gen_buffer &&
+        tc_ptr < (uintptr_t)tcg_ctx.code_gen_ptr)) {
+        /* binary search (cf Knuth) */
+        m_min = 0;
+        m_max = tb_ctx.nb_tbs - 1;
+        while (m_min <= m_max) {
+            m = (m_min + m_max) >> 1;
+            tb = &tb_ctx.tbs[m];
+            v = (uintptr_t)tb->tc_ptr;
+            if (v == tc_ptr) {
+                qemu_mutex_unlock(&tb_ctx.tb_lock);
+                return tb;
+            } else if (tc_ptr < v) {
+                m_max = m - 1;
+            } else {
+                m_min = m + 1;
+            }
         }
+        tb = &tb_ctx.tbs[m_max];
     }
-    return &tb_ctx.tbs[m_max];
+
+    qemu_mutex_unlock(&tb_ctx.tb_lock);
+    return tb;
 }
 
 #if defined(TARGET_HAS_ICE) && !defined(CONFIG_USER_ONLY)
@@ -1604,6 +1634,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
     int direct_jmp_count, direct_jmp2_count, cross_page;
     TranslationBlock *tb;
 
+    qemu_mutex_lock(&tb_ctx.tb_lock);
+
     target_code_size = 0;
     max_target_code_size = 0;
     cross_page = 0;
@@ -1655,6 +1687,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
                 tb_ctx.tb_phys_invalidate_count);
     cpu_fprintf(f, "TLB flush count     %d\n", tlb_flush_count);
     tcg_dump_info(f, cpu_fprintf);
+
+    qemu_mutex_unlock(&tb_ctx.tb_lock);
 }
 
 void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf)
-- 
1.9.0


* [Qemu-devel] [RFC 07/10] tcg: remove tcg_halt_cond global variable.
  2015-01-16 17:19 [Qemu-devel] [RFC 00/10] MultiThread TCG fred.konrad
                   ` (5 preceding siblings ...)
  2015-01-16 17:19 ` [Qemu-devel] [RFC 06/10] protect TBContext with tb_lock fred.konrad
@ 2015-01-16 17:19 ` fred.konrad
  2015-01-16 17:19 ` [Qemu-devel] [RFC 08/10] Drop global lock during TCG code execution fred.konrad
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 62+ messages in thread
From: fred.konrad @ 2015-01-16 17:19 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, jan.kiszka, mark.burton, agraf, pbonzini, fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

This removes the tcg_halt_cond global variable.
We need one QemuCond per virtual CPU for multithreaded TCG.
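
With one condition variable per VCPU, each TCG thread can later sleep and be
woken independently. A sketch of the per-VCPU setup, matching the hunk below:

    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
    qemu_cond_init(cpu->halt_cond);
    /* ... and in the VCPU thread, while there is no work: */
    qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);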

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
---
 cpus.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/cpus.c b/cpus.c
index 2edb5cd..91a48f2 100644
--- a/cpus.c
+++ b/cpus.c
@@ -796,7 +796,6 @@ static bool iothread_requesting_mutex;
 static QemuThread io_thread;
 
 static QemuThread *tcg_cpu_thread;
-static QemuCond *tcg_halt_cond;
 
 /* cpu creation */
 static QemuCond qemu_cpu_cond;
@@ -902,15 +901,13 @@ static void qemu_wait_io_event_common(CPUState *cpu)
     cpu->thread_kicked = false;
 }
 
-static void qemu_tcg_wait_io_event(void)
+static void qemu_tcg_wait_io_event(CPUState *cpu)
 {
-    CPUState *cpu;
-
     while (all_cpu_threads_idle()) {
        /* Start accounting real time to the virtual clock if the CPUs
           are idle.  */
         qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
-        qemu_cond_wait(tcg_halt_cond, &qemu_global_mutex);
+        qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
     }
 
     while (iothread_requesting_mutex) {
@@ -1030,7 +1027,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 
     /* wait for initial kick-off after machine start */
     while (QTAILQ_FIRST(&cpus)->stopped) {
-        qemu_cond_wait(tcg_halt_cond, &qemu_global_mutex);
+        qemu_cond_wait(QTAILQ_FIRST(&cpus)->halt_cond, &qemu_global_mutex);
 
         /* process any pending work */
         CPU_FOREACH(cpu) {
@@ -1048,7 +1045,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
                 qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
             }
         }
-        qemu_tcg_wait_io_event();
+        qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
     }
 
     return NULL;
@@ -1214,12 +1211,12 @@ static void qemu_tcg_init_vcpu(CPUState *cpu)
 
     tcg_cpu_address_space_init(cpu, cpu->as);
 
+    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+    qemu_cond_init(cpu->halt_cond);
+
     /* share a single thread for all cpus with TCG */
     if (!tcg_cpu_thread) {
         cpu->thread = g_malloc0(sizeof(QemuThread));
-        cpu->halt_cond = g_malloc0(sizeof(QemuCond));
-        qemu_cond_init(cpu->halt_cond);
-        tcg_halt_cond = cpu->halt_cond;
         snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
                  cpu->cpu_index);
         qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
@@ -1233,7 +1230,6 @@ static void qemu_tcg_init_vcpu(CPUState *cpu)
         tcg_cpu_thread = cpu->thread;
     } else {
         cpu->thread = tcg_cpu_thread;
-        cpu->halt_cond = tcg_halt_cond;
     }
 }
 
-- 
1.9.0


* [Qemu-devel] [RFC 08/10] Drop global lock during TCG code execution
  2015-01-16 17:19 [Qemu-devel] [RFC 00/10] MultiThread TCG fred.konrad
                   ` (6 preceding siblings ...)
  2015-01-16 17:19 ` [Qemu-devel] [RFC 07/10] tcg: remove tcg_halt_cond global variable fred.konrad
@ 2015-01-16 17:19 ` fred.konrad
  2015-01-16 17:19 ` [Qemu-devel] [RFC 09/10] cpu: remove exit_request global fred.konrad
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 62+ messages in thread
From: fred.konrad @ 2015-01-16 17:19 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, jan.kiszka, mark.burton, agraf, pbonzini, fred.konrad

From: Jan Kiszka <jan.kiszka@siemens.com>

This finally allows TCG to benefit from the iothread introduction: Drop
the global mutex while running pure TCG CPU code. Reacquire the lock
when entering MMIO or PIO emulation, or when leaving the TCG loop.
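
The central pattern is to hold the iothread mutex only around device access,
never around guest code execution (a sketch of tcg_cpu_exec() after this
patch; the MMIO/PIO and dirty-memory paths retake the lock themselves):

    qemu_mutex_unlock_iothread();  /* run guest code without the global lock */
    ret = cpu_exec(env);
    qemu_mutex_lock_iothread();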

We have to revert a few optimizations for the current TCG threading
model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
kicking it in qemu_cpu_kick. We also need to disable RAM block
reordering until we have a more efficient locking mechanism at hand.

I'm pretty sure some cases are still broken, definitely SMP (we no
longer perform round-robin scheduling "by chance"). Still, a Linux x86
UP guest and my Musicpal ARM model boot fine here. These numbers
demonstrate where we gain something:

20338 jan       20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
20337 jan       20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm

The guest CPU was fully loaded, but the iothread could still run mostly
independently on a second core. Without the patch we don't get beyond

32206 jan       20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
32204 jan       20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm

We don't benefit significantly, though, when the guest is not fully
loading a host CPU.

Note that this patch depends on
http://thread.gmane.org/gmane.comp.emulators.qemu/118657

Changes from Fred Konrad:
  * Rebase on the current HEAD.
  * Fix a deadlock in qemu_devices_reset().
---
 cpus.c                    | 19 +++++++------------
 cputlb.c                  |  5 +++++
 exec.c                    | 25 +++++++++++++++++++++++++
 softmmu_template.h        |  6 ++++++
 target-i386/misc_helper.c | 27 ++++++++++++++++++++++++---
 translate-all.c           |  2 ++
 vl.c                      |  6 ++++++
 7 files changed, 75 insertions(+), 15 deletions(-)

diff --git a/cpus.c b/cpus.c
index 91a48f2..f10c94d 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1017,7 +1017,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
     qemu_tcg_init_cpu_signals();
     qemu_thread_get_self(cpu->thread);
 
-    qemu_mutex_lock(&qemu_global_mutex);
+    qemu_mutex_lock_iothread();
     CPU_FOREACH(cpu) {
         cpu->thread_id = qemu_get_thread_id();
         cpu->created = true;
@@ -1125,17 +1125,7 @@ static bool qemu_in_vcpu_thread(void)
 
 void qemu_mutex_lock_iothread(void)
 {
-    if (!tcg_enabled()) {
-        qemu_mutex_lock(&qemu_global_mutex);
-    } else {
-        iothread_requesting_mutex = true;
-        if (qemu_mutex_trylock(&qemu_global_mutex)) {
-            qemu_cpu_kick_thread(first_cpu);
-            qemu_mutex_lock(&qemu_global_mutex);
-        }
-        iothread_requesting_mutex = false;
-        qemu_cond_broadcast(&qemu_io_proceeded_cond);
-    }
+    qemu_mutex_lock(&qemu_global_mutex);
 }
 
 void qemu_mutex_unlock_iothread(void)
@@ -1356,7 +1346,12 @@ static int tcg_cpu_exec(CPUArchState *env)
         cpu->icount_decr.u16.low = decr;
         cpu->icount_extra = count;
     }
+
+    qemu_mutex_unlock_iothread();
+
     ret = cpu_exec(env);
+
+    qemu_mutex_lock_iothread();
 #ifdef CONFIG_PROFILER
     qemu_time += profile_getclock() - ti;
 #endif
diff --git a/cputlb.c b/cputlb.c
index 3b271d4..4a7e634 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -30,6 +30,9 @@
 #include "exec/ram_addr.h"
 #include "tcg/tcg.h"
 
+void qemu_mutex_lock_iothread(void);
+void qemu_mutex_unlock_iothread(void);
+
 //#define DEBUG_TLB
 //#define DEBUG_TLB_CHECK
 
@@ -125,8 +128,10 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
    can be detected */
 void tlb_protect_code(ram_addr_t ram_addr)
 {
+    qemu_mutex_lock_iothread();
     cpu_physical_memory_reset_dirty(ram_addr, TARGET_PAGE_SIZE,
                                     DIRTY_MEMORY_CODE);
+    qemu_mutex_unlock_iothread();
 }
 
 /* update the TLB so that writes in physical page 'phys_addr' are no longer
diff --git a/exec.c b/exec.c
index 081818e..705d451 100644
--- a/exec.c
+++ b/exec.c
@@ -1786,6 +1786,7 @@ static void check_watchpoint(int offset, int len, int flags)
             }
             wp->hitaddr = vaddr;
             if (!cpu->watchpoint_hit) {
+                qemu_mutex_unlock_iothread();
                 cpu->watchpoint_hit = wp;
                 tb_check_watchpoint(cpu);
                 if (wp->flags & BP_STOP_BEFORE_ACCESS) {
@@ -2557,6 +2558,7 @@ static inline uint32_t ldl_phys_internal(AddressSpace *as, hwaddr addr,
     mr = address_space_translate(as, addr, &addr1, &l, false);
     if (l < 4 || !memory_access_is_direct(mr, false)) {
         /* I/O case */
+        qemu_mutex_lock_iothread();
         io_mem_read(mr, addr1, &val, 4);
 #if defined(TARGET_WORDS_BIGENDIAN)
         if (endian == DEVICE_LITTLE_ENDIAN) {
@@ -2567,6 +2569,7 @@ static inline uint32_t ldl_phys_internal(AddressSpace *as, hwaddr addr,
             val = bswap32(val);
         }
 #endif
+        qemu_mutex_unlock_iothread();
     } else {
         /* RAM case */
         ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(mr)
@@ -2616,6 +2619,7 @@ static inline uint64_t ldq_phys_internal(AddressSpace *as, hwaddr addr,
                                  false);
     if (l < 8 || !memory_access_is_direct(mr, false)) {
         /* I/O case */
+        qemu_mutex_lock_iothread();
         io_mem_read(mr, addr1, &val, 8);
 #if defined(TARGET_WORDS_BIGENDIAN)
         if (endian == DEVICE_LITTLE_ENDIAN) {
@@ -2626,6 +2630,7 @@ static inline uint64_t ldq_phys_internal(AddressSpace *as, hwaddr addr,
             val = bswap64(val);
         }
 #endif
+        qemu_mutex_unlock_iothread();
     } else {
         /* RAM case */
         ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(mr)
@@ -2683,6 +2688,7 @@ static inline uint32_t lduw_phys_internal(AddressSpace *as, hwaddr addr,
                                  false);
     if (l < 2 || !memory_access_is_direct(mr, false)) {
         /* I/O case */
+        qemu_mutex_lock_iothread();
         io_mem_read(mr, addr1, &val, 2);
 #if defined(TARGET_WORDS_BIGENDIAN)
         if (endian == DEVICE_LITTLE_ENDIAN) {
@@ -2693,6 +2699,7 @@ static inline uint32_t lduw_phys_internal(AddressSpace *as, hwaddr addr,
             val = bswap16(val);
         }
 #endif
+        qemu_mutex_unlock_iothread();
     } else {
         /* RAM case */
         ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(mr)
@@ -2741,7 +2748,9 @@ void stl_phys_notdirty(AddressSpace *as, hwaddr addr, uint32_t val)
     mr = address_space_translate(as, addr, &addr1, &l,
                                  true);
     if (l < 4 || !memory_access_is_direct(mr, true)) {
+        qemu_mutex_lock_iothread();
         io_mem_write(mr, addr1, val, 4);
+        qemu_mutex_unlock_iothread();
     } else {
         addr1 += memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK;
         ptr = qemu_get_ram_ptr(addr1);
@@ -2749,10 +2758,12 @@ void stl_phys_notdirty(AddressSpace *as, hwaddr addr, uint32_t val)
 
         if (unlikely(in_migration)) {
             if (cpu_physical_memory_is_clean(addr1)) {
+                qemu_mutex_lock_iothread();
                 /* invalidate code */
                 tb_invalidate_phys_page_range(addr1, addr1 + 4, 0);
                 /* set dirty bit */
                 cpu_physical_memory_set_dirty_range_nocode(addr1, 4);
+                qemu_mutex_unlock_iothread();
             }
         }
     }
@@ -2780,7 +2791,9 @@ static inline void stl_phys_internal(AddressSpace *as,
             val = bswap32(val);
         }
 #endif
+        qemu_mutex_lock_iothread();
         io_mem_write(mr, addr1, val, 4);
+        qemu_mutex_unlock_iothread();
     } else {
         /* RAM case */
         addr1 += memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK;
@@ -2796,7 +2809,9 @@ static inline void stl_phys_internal(AddressSpace *as,
             stl_p(ptr, val);
             break;
         }
+        qemu_mutex_lock_iothread();
         invalidate_and_set_dirty(addr1, 4);
+        qemu_mutex_unlock_iothread();
     }
 }
 
@@ -2843,7 +2858,9 @@ static inline void stw_phys_internal(AddressSpace *as,
             val = bswap16(val);
         }
 #endif
+        qemu_mutex_lock_iothread();
         io_mem_write(mr, addr1, val, 2);
+        qemu_mutex_unlock_iothread();
     } else {
         /* RAM case */
         addr1 += memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK;
@@ -2859,7 +2876,9 @@ static inline void stw_phys_internal(AddressSpace *as,
             stw_p(ptr, val);
             break;
         }
+        qemu_mutex_lock_iothread();
         invalidate_and_set_dirty(addr1, 2);
+        qemu_mutex_unlock_iothread();
     }
 }
 
@@ -2881,20 +2900,26 @@ void stw_be_phys(AddressSpace *as, hwaddr addr, uint32_t val)
 /* XXX: optimize */
 void stq_phys(AddressSpace *as, hwaddr addr, uint64_t val)
 {
+    qemu_mutex_lock_iothread();
     val = tswap64(val);
     address_space_rw(as, addr, (void *) &val, 8, 1);
+    qemu_mutex_unlock_iothread();
 }
 
 void stq_le_phys(AddressSpace *as, hwaddr addr, uint64_t val)
 {
+    qemu_mutex_lock_iothread();
     val = cpu_to_le64(val);
     address_space_rw(as, addr, (void *) &val, 8, 1);
+    qemu_mutex_unlock_iothread();
 }
 
 void stq_be_phys(AddressSpace *as, hwaddr addr, uint64_t val)
 {
+    qemu_mutex_lock_iothread();
     val = cpu_to_be64(val);
     address_space_rw(as, addr, (void *) &val, 8, 1);
+    qemu_mutex_unlock_iothread();
 }
 
 /* virtual memory access for debug (includes writing to ROM) */
diff --git a/softmmu_template.h b/softmmu_template.h
index 6b4e615..e3c6dc8 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -157,8 +157,12 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
         cpu_io_recompile(cpu, retaddr);
     }
 
+    qemu_mutex_lock_iothread();
+
     cpu->mem_io_vaddr = addr;
     io_mem_read(mr, physaddr, &val, 1 << SHIFT);
+
+    qemu_mutex_unlock_iothread();
     return val;
 }
 #endif
@@ -376,9 +380,11 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
         cpu_io_recompile(cpu, retaddr);
     }
 
+    qemu_mutex_lock_iothread();
     cpu->mem_io_vaddr = addr;
     cpu->mem_io_pc = retaddr;
     io_mem_write(mr, physaddr, val, 1 << SHIFT);
+    qemu_mutex_unlock_iothread();
 }
 
 void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
diff --git a/target-i386/misc_helper.c b/target-i386/misc_helper.c
index 4aaf1e4..0a953a9 100644
--- a/target-i386/misc_helper.c
+++ b/target-i386/misc_helper.c
@@ -24,32 +24,53 @@
 
 void helper_outb(uint32_t port, uint32_t data)
 {
+    qemu_mutex_lock_iothread();
     cpu_outb(port, data & 0xff);
+    qemu_mutex_unlock_iothread();
 }
 
 target_ulong helper_inb(uint32_t port)
 {
-    return cpu_inb(port);
+    target_ulong ret;
+
+    qemu_mutex_lock_iothread();
+    ret = cpu_inb(port);
+    qemu_mutex_unlock_iothread();
+    return ret;
 }
 
 void helper_outw(uint32_t port, uint32_t data)
 {
+    qemu_mutex_lock_iothread();
     cpu_outw(port, data & 0xffff);
+    qemu_mutex_unlock_iothread();
 }
 
 target_ulong helper_inw(uint32_t port)
 {
-    return cpu_inw(port);
+    target_ulong ret;
+
+    qemu_mutex_lock_iothread();
+    ret = cpu_inw(port);
+    qemu_mutex_unlock_iothread();
+    return ret;
 }
 
 void helper_outl(uint32_t port, uint32_t data)
 {
+    qemu_mutex_lock_iothread();
     cpu_outl(port, data);
+    qemu_mutex_unlock_iothread();
 }
 
 target_ulong helper_inl(uint32_t port)
 {
-    return cpu_inl(port);
+    target_ulong ret;
+
+    qemu_mutex_lock_iothread();
+    ret = cpu_inl(port);
+    qemu_mutex_unlock_iothread();
+    return ret;
 }
 
 void helper_into(CPUX86State *env, int next_eip_addend)
diff --git a/translate-all.c b/translate-all.c
index 68505c0..a986d61 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1240,6 +1240,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
 #endif
 #ifdef TARGET_HAS_PRECISE_SMC
     if (current_tb_modified) {
+        qemu_mutex_unlock_iothread();
         /* we generate a block containing just the instruction
            modifying the memory. It will ensure that it cannot modify
            itself */
@@ -1337,6 +1338,7 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
     p->first_tb[current_cpu->cpu_index] = NULL;
 #ifdef TARGET_HAS_PRECISE_SMC
     if (current_tb_modified) {
+        qemu_mutex_unlock_iothread();
         /* we generate a block containing just the instruction
            modifying the memory. It will ensure that it cannot modify
            itself */
diff --git a/vl.c b/vl.c
index 7786b2f..160e4a8 100644
--- a/vl.c
+++ b/vl.c
@@ -1608,10 +1608,16 @@ void qemu_devices_reset(void)
 {
     QEMUResetEntry *re, *nre;
 
+    /*
+     * Some device's reset needs to grab the global_mutex. So just release it
+     * here.
+     */
+    qemu_mutex_unlock_iothread();
     /* reset all devices */
     QTAILQ_FOREACH_SAFE(re, &reset_handlers, entry, nre) {
         re->func(re->opaque);
     }
+    qemu_mutex_lock_iothread();
 }
 
 void qemu_system_reset(bool report)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC 09/10] cpu: remove exit_request global.
  2015-01-16 17:19 [Qemu-devel] [RFC 00/10] MultiThread TCG fred.konrad
                   ` (7 preceding siblings ...)
  2015-01-16 17:19 ` [Qemu-devel] [RFC 08/10] Drop global lock during TCG code execution fred.konrad
@ 2015-01-16 17:19 ` fred.konrad
  2015-01-29 15:52   ` Peter Maydell
  2015-01-16 17:19 ` [Qemu-devel] [RFC 10/10] tcg: switch on multithread fred.konrad
  2015-03-27 10:08 ` [Qemu-devel] [RFC 00/10] MultiThread TCG Alex Bennée
  10 siblings, 1 reply; 62+ messages in thread
From: fred.konrad @ 2015-01-16 17:19 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, jan.kiszka, mark.burton, agraf, pbonzini, fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

This removes the exit_request global and adds a variable in CPUState for this.
Only the flag for the first CPU is used for the moment, as we are still running
with one TCG thread.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
---
 cpu-exec.c        |  4 +---
 cpus.c            | 12 +++++++++---
 include/qom/cpu.h |  1 +
 3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 68654e3..3ba2a7a 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -327,8 +327,6 @@ static void cpu_handle_debug_exception(CPUArchState *env)
 
 /* main execution loop */
 
-volatile sig_atomic_t exit_request;
-
 int cpu_exec(CPUArchState *env)
 {
     CPUState *cpu = ENV_GET_CPU(env);
@@ -365,7 +363,7 @@ int cpu_exec(CPUArchState *env)
      * an instruction scheduling constraint on modern architectures.  */
     smp_mb();
 
-    if (unlikely(exit_request)) {
+    if (unlikely(cpu->exit_loop_request)) {
         cpu->exit_request = 1;
     }
 
diff --git a/cpus.c b/cpus.c
index f10c94d..8ae70c2 100644
--- a/cpus.c
+++ b/cpus.c
@@ -646,10 +646,14 @@ static void cpu_handle_guest_debug(CPUState *cpu)
 
 static void cpu_signal(int sig)
 {
+    CPUState *cpu;
     if (current_cpu) {
         cpu_exit(current_cpu);
     }
-    exit_request = 1;
+
+    CPU_FOREACH(cpu) {
+        cpu->exit_loop_request = 1;
+    }
 }
 
 #ifdef CONFIG_LINUX
@@ -1376,7 +1380,8 @@ static void tcg_exec_all(void)
     if (next_cpu == NULL) {
         next_cpu = first_cpu;
     }
-    for (; next_cpu != NULL && !exit_request; next_cpu = CPU_NEXT(next_cpu)) {
+    for (; next_cpu != NULL && !first_cpu->exit_loop_request;
+           next_cpu = CPU_NEXT(next_cpu)) {
         CPUState *cpu = next_cpu;
         CPUArchState *env = cpu->env_ptr;
 
@@ -1393,7 +1398,8 @@ static void tcg_exec_all(void)
             break;
         }
     }
-    exit_request = 0;
+
+    first_cpu->exit_loop_request = 0;
 }
 
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 2098f1c..a2e3208 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -249,6 +249,7 @@ struct CPUState {
     bool created;
     bool stop;
     bool stopped;
+    volatile sig_atomic_t exit_loop_request;
     volatile sig_atomic_t exit_request;
     uint32_t interrupt_request;
     int singlestep_enabled;
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC 10/10] tcg: switch on multithread.
  2015-01-16 17:19 [Qemu-devel] [RFC 00/10] MultiThread TCG fred.konrad
                   ` (8 preceding siblings ...)
  2015-01-16 17:19 ` [Qemu-devel] [RFC 09/10] cpu: remove exit_request global fred.konrad
@ 2015-01-16 17:19 ` fred.konrad
  2015-03-27 10:08 ` [Qemu-devel] [RFC 00/10] MultiThread TCG Alex Bennée
  10 siblings, 0 replies; 62+ messages in thread
From: fred.konrad @ 2015-01-16 17:19 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, jan.kiszka, mark.burton, agraf, pbonzini, fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

This switches on multithreaded TCG, creating one TCG thread per vCPU.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
---
 cpus.c | 85 +++++++++++++++++++++++-------------------------------------------
 1 file changed, 30 insertions(+), 55 deletions(-)

diff --git a/cpus.c b/cpus.c
index 8ae70c2..ab327dd 100644
--- a/cpus.c
+++ b/cpus.c
@@ -64,7 +64,6 @@
 
 #endif /* CONFIG_LINUX */
 
-static CPUState *next_cpu;
 int64_t max_delay;
 int64_t max_advance;
 
@@ -799,8 +798,6 @@ static bool iothread_requesting_mutex;
 
 static QemuThread io_thread;
 
-static QemuThread *tcg_cpu_thread;
-
 /* cpu creation */
 static QemuCond qemu_cpu_cond;
 /* system init */
@@ -907,10 +904,12 @@ static void qemu_wait_io_event_common(CPUState *cpu)
 
 static void qemu_tcg_wait_io_event(CPUState *cpu)
 {
-    while (all_cpu_threads_idle()) {
+    while (cpu_thread_is_idle(cpu)) {
        /* Start accounting real time to the virtual clock if the CPUs
           are idle.  */
-        qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
+        if ((all_cpu_threads_idle()) && (cpu->cpu_index == 0)) {
+          qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
+        }
         qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
     }
 
@@ -918,9 +917,7 @@ static void qemu_tcg_wait_io_event(CPUState *cpu)
         qemu_cond_wait(&qemu_io_proceeded_cond, &qemu_global_mutex);
     }
 
-    CPU_FOREACH(cpu) {
-        qemu_wait_io_event_common(cpu);
-    }
+    qemu_wait_io_event_common(cpu);
 }
 
 static void qemu_kvm_wait_io_event(CPUState *cpu)
@@ -1012,7 +1009,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
 #endif
 }
 
-static void tcg_exec_all(void);
+static void tcg_exec_all(CPUState *cpu);
 
 static void *qemu_tcg_cpu_thread_fn(void *arg)
 {
@@ -1022,34 +1019,26 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
     qemu_thread_get_self(cpu->thread);
 
     qemu_mutex_lock_iothread();
-    CPU_FOREACH(cpu) {
-        cpu->thread_id = qemu_get_thread_id();
-        cpu->created = true;
-        cpu->can_do_io = 1;
-    }
-    qemu_cond_signal(&qemu_cpu_cond);
-
-    /* wait for initial kick-off after machine start */
-    while (QTAILQ_FIRST(&cpus)->stopped) {
-        qemu_cond_wait(QTAILQ_FIRST(&cpus)->halt_cond, &qemu_global_mutex);
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->created = true;
+    cpu->can_do_io = 1;
 
-        /* process any pending work */
-        CPU_FOREACH(cpu) {
-            qemu_wait_io_event_common(cpu);
-        }
-    }
+    qemu_cond_signal(&qemu_cpu_cond);
 
     while (1) {
-        tcg_exec_all();
+        if (!cpu->stopped) {
+            tcg_exec_all(cpu);
 
-        if (use_icount) {
-            int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
+            if (use_icount) {
+                int64_t deadline =
+                    qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
 
-            if (deadline == 0) {
-                qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+                if (deadline == 0) {
+                    qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+                }
             }
         }
-        qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
+        qemu_tcg_wait_io_event(cpu);
     }
 
     return NULL;
@@ -1207,23 +1196,15 @@ static void qemu_tcg_init_vcpu(CPUState *cpu)
 
     cpu->halt_cond = g_malloc0(sizeof(QemuCond));
     qemu_cond_init(cpu->halt_cond);
-
-    /* share a single thread for all cpus with TCG */
-    if (!tcg_cpu_thread) {
-        cpu->thread = g_malloc0(sizeof(QemuThread));
-        snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
-                 cpu->cpu_index);
-        qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
-                           cpu, QEMU_THREAD_JOINABLE);
+    cpu->thread = g_malloc0(sizeof(QemuThread));
+    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG", cpu->cpu_index);
+    qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn, cpu,
+                       QEMU_THREAD_JOINABLE);
 #ifdef _WIN32
-        cpu->hThread = qemu_thread_get_handle(cpu->thread);
+    cpu->hThread = qemu_thread_get_handle(cpu->thread);
 #endif
-        while (!cpu->created) {
-            qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
-        }
-        tcg_cpu_thread = cpu->thread;
-    } else {
-        cpu->thread = tcg_cpu_thread;
+    while (!cpu->created) {
+        qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
     }
 }
 
@@ -1370,21 +1351,15 @@ static int tcg_cpu_exec(CPUArchState *env)
     return ret;
 }
 
-static void tcg_exec_all(void)
+static void tcg_exec_all(CPUState *cpu)
 {
     int r;
+    CPUArchState *env = cpu->env_ptr;
 
     /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
     qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
 
-    if (next_cpu == NULL) {
-        next_cpu = first_cpu;
-    }
-    for (; next_cpu != NULL && !first_cpu->exit_loop_request;
-           next_cpu = CPU_NEXT(next_cpu)) {
-        CPUState *cpu = next_cpu;
-        CPUArchState *env = cpu->env_ptr;
-
+    while (!cpu->exit_request) {
         qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
                           (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
 
@@ -1399,7 +1374,7 @@ static void tcg_exec_all(void)
         }
     }
 
-    first_cpu->exit_loop_request = 0;
+    cpu->exit_loop_request = 0;
 }
 
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_*.
  2015-01-16 17:19 ` [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_* fred.konrad
@ 2015-01-27 14:36   ` Alex Bennée
  2015-01-29 15:17   ` Peter Maydell
  1 sibling, 0 replies; 62+ messages in thread
From: Alex Bennée @ 2015-01-27 14:36 UTC (permalink / raw)
  To: fred.konrad
  Cc: mttcg, peter.maydell, jan.kiszka, mark.burton, qemu-devel, agraf,
	pbonzini


fred.konrad@greensocs.com writes:

> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> This adds a lock to avoid multiple exclusive access at the same time in case of
> TCG multithread.
>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
>
> V1 -> V2:
>   Removed qemu_mutex_destroy().
> ---
>  target-arm/cpu.c       | 14 ++++++++++++++
>  target-arm/cpu.h       |  3 +++
>  target-arm/helper.h    |  3 +++
>  target-arm/op_helper.c | 10 ++++++++++
>  target-arm/translate.c |  6 ++++++
>  5 files changed, 36 insertions(+)
>
<snip>
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index bdfcdf1..513d151 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -7381,6 +7381,7 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
>          abort();
>      }
>  
> +    gen_helper_exclusive_lock();
>      if (size == 3) {
>          TCGv_i32 tmp2 = tcg_temp_new_i32();
>          TCGv_i32 tmp3 = tcg_temp_new_i32();
> @@ -7396,11 +7397,14 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
>  
>      store_reg(s, rt, tmp);
>      tcg_gen_extu_i32_i64(cpu_exclusive_addr, addr);
> +    gen_helper_exclusive_unlock();
>  }
>  
>  static void gen_clrex(DisasContext *s)
>  {
> +    gen_helper_exclusive_lock();
>      tcg_gen_movi_i64(cpu_exclusive_addr, -1);
> +    gen_helper_exclusive_unlock();
>  }
>  
>  #ifdef CONFIG_USER_ONLY
> @@ -7431,6 +7435,7 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
>      done_label = gen_new_label();
>      extaddr = tcg_temp_new_i64();
>      tcg_gen_extu_i32_i64(extaddr, addr);
> +    gen_helper_exclusive_lock();
>      tcg_gen_brcond_i64(TCG_COND_NE, extaddr, cpu_exclusive_addr, fail_label);
>      tcg_temp_free_i64(extaddr);
>  
> @@ -7495,6 +7500,7 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
>      tcg_gen_movi_i32(cpu_R[rd], 1);
>      gen_set_label(done_label);
>      tcg_gen_movi_i64(cpu_exclusive_addr, -1);
> +    gen_helper_exclusive_unlock();
>  }
>  #endif

As the aarch64 code shares a lot of helper code with arm, it is probably
worth adding the locks to both translate.c and translate-a64.c.

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 02/10] use a different translation block list for each cpu.
  2015-01-16 17:19 ` [Qemu-devel] [RFC 02/10] use a different translation block list for each cpu fred.konrad
@ 2015-01-27 14:45   ` Alex Bennée
  2015-01-27 15:16     ` Frederic Konrad
  2015-01-29 15:24   ` Peter Maydell
  2015-02-03 16:17   ` Richard Henderson
  2 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2015-01-27 14:45 UTC (permalink / raw)
  To: fred.konrad
  Cc: mttcg, peter.maydell, jan.kiszka, mark.burton, qemu-devel, agraf,
	pbonzini


fred.konrad@greensocs.com writes:

> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> We need a different TranslationBlock list for each core in case of multithread
> TCG.
>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> ---
>  translate-all.c | 40 ++++++++++++++++++++++------------------
>  1 file changed, 22 insertions(+), 18 deletions(-)
>
> diff --git a/translate-all.c b/translate-all.c
> index 8fa4378..0e11c70 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -72,10 +72,11 @@
>  #endif
>  
>  #define SMC_BITMAP_USE_THRESHOLD 10
> +#define MAX_CPUS 256

Where does this number come from?

>  typedef struct PageDesc {
>      /* list of TBs intersecting this ram page */
> -    TranslationBlock *first_tb;
> +    TranslationBlock *first_tb[MAX_CPUS];

Especially given the size of the PageDesc structure this adds a lot of
bulk, mostly unused. Is access to the TB list via PageDesc frequent
enough to justify avoiding an additional indirection?

>      /* in order to optimize self modifying code, we count the number
>         of lookups we do to a given page to use a bitmap */
>      unsigned int code_write_count;
> @@ -750,7 +751,7 @@ static inline void invalidate_page_bitmap(PageDesc *p)
>  /* Set to NULL all the 'first_tb' fields in all PageDescs. */
>  static void page_flush_tb_1(int level, void **lp)
>  {
> -    int i;
> +    int i, j;
>  
>      if (*lp == NULL) {
>          return;
> @@ -759,7 +760,9 @@ static void page_flush_tb_1(int level, void **lp)
>          PageDesc *pd = *lp;
>  
>          for (i = 0; i < V_L2_SIZE; ++i) {
> -            pd[i].first_tb = NULL;
> +            for (j = 0; j < MAX_CPUS; j++) {
> +                pd[i].first_tb[j] = NULL;
> +            }
>              invalidate_page_bitmap(pd + i);
>          }
>      } else {
> @@ -937,12 +940,12 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>      /* remove the TB from the page list */
>      if (tb->page_addr[0] != page_addr) {
>          p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);
> -        tb_page_remove(&p->first_tb, tb);
> +        tb_page_remove(&p->first_tb[current_cpu->cpu_index], tb);
>          invalidate_page_bitmap(p);
>      }
>      if (tb->page_addr[1] != -1 && tb->page_addr[1] != page_addr) {
>          p = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS);
> -        tb_page_remove(&p->first_tb, tb);
> +        tb_page_remove(&p->first_tb[current_cpu->cpu_index], tb);
>          invalidate_page_bitmap(p);
>      }
>  
> @@ -1012,7 +1015,7 @@ static void build_page_bitmap(PageDesc *p)
>  
>      p->code_bitmap = g_malloc0(TARGET_PAGE_SIZE / 8);
>  
> -    tb = p->first_tb;
> +    tb = p->first_tb[current_cpu->cpu_index];
>      while (tb != NULL) {
>          n = (uintptr_t)tb & 3;
>          tb = (TranslationBlock *)((uintptr_t)tb & ~3);
> @@ -1138,7 +1141,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>      /* we remove all the TBs in the range [start, end[ */
>      /* XXX: see if in some cases it could be faster to invalidate all
>         the code */
> -    tb = p->first_tb;
> +    tb = p->first_tb[cpu->cpu_index];
>      while (tb != NULL) {
>          n = (uintptr_t)tb & 3;
>          tb = (TranslationBlock *)((uintptr_t)tb & ~3);
> @@ -1196,7 +1199,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>      }
>  #if !defined(CONFIG_USER_ONLY)
>      /* if no code remaining, no need to continue to use slow writes */
> -    if (!p->first_tb) {
> +    if (!p->first_tb[cpu->cpu_index]) {
>          invalidate_page_bitmap(p);
>          if (is_cpu_write_access) {
>              tlb_unprotect_code_phys(cpu, start, cpu->mem_io_vaddr);
> @@ -1224,10 +1227,10 @@ void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
>  #if 0
>      if (1) {
>          qemu_log("modifying code at 0x%x size=%d EIP=%x PC=%08x\n",
> -                  cpu_single_env->mem_io_vaddr, len,
> -                  cpu_single_env->eip,
> -                  cpu_single_env->eip +
> -                  (intptr_t)cpu_single_env->segs[R_CS].base);
> +                  current_cpu->mem_io_vaddr, len,
> +                  current_cpu->eip,
> +                  current_cpu->eip +
> +                  (intptr_t)current_cpu->segs[R_CS].base);
>      }
>  #endif
>      p = page_find(start >> TARGET_PAGE_BITS);
> @@ -1269,7 +1272,7 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
>      if (!p) {
>          return;
>      }
> -    tb = p->first_tb;
> +    tb = p->first_tb[current_cpu->cpu_index];
>  #ifdef TARGET_HAS_PRECISE_SMC
>      if (tb && pc != 0) {
>          current_tb = tb_find_pc(pc);
> @@ -1299,7 +1302,7 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
>          tb_phys_invalidate(tb, addr);
>          tb = tb->page_next[n];
>      }
> -    p->first_tb = NULL;
> +    p->first_tb[current_cpu->cpu_index] = NULL;
>  #ifdef TARGET_HAS_PRECISE_SMC
>      if (current_tb_modified) {
>          /* we generate a block containing just the instruction
> @@ -1327,11 +1330,12 @@ static inline void tb_alloc_page(TranslationBlock *tb,
>  
>      tb->page_addr[n] = page_addr;
>      p = page_find_alloc(page_addr >> TARGET_PAGE_BITS, 1);
> -    tb->page_next[n] = p->first_tb;
> +    tb->page_next[n] = p->first_tb[current_cpu->cpu_index];
>  #ifndef CONFIG_USER_ONLY
> -    page_already_protected = p->first_tb != NULL;
> +    page_already_protected = p->first_tb[current_cpu->cpu_index] != NULL;
>  #endif
> -    p->first_tb = (TranslationBlock *)((uintptr_t)tb | n);
> +    p->first_tb[current_cpu->cpu_index]
> +      = (TranslationBlock *)((uintptr_t)tb | n);
>      invalidate_page_bitmap(p);
>  
>  #if defined(TARGET_HAS_SMC) || 1
> @@ -1821,7 +1825,7 @@ void page_set_flags(target_ulong start, target_ulong end, int flags)
>             the code inside.  */
>          if (!(p->flags & PAGE_WRITE) &&
>              (flags & PAGE_WRITE) &&
> -            p->first_tb) {
> +            p->first_tb[current_cpu->cpu_index]) {
>              tb_invalidate_phys_page(addr, 0, NULL, false);
>          }
>          p->flags = flags;

As the TranslationBlock itself has a linked list for page-related
blocks:

  struct TranslationBlock *page_next[2];

could we not just come up with a structure that chains them together
here? 
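
For illustration, one shape that suggestion could take - a sketch only;
the cpu_index field on TranslationBlock and the filtering loop below
are made up, not code from this series:

    /* Keep the single per-page chain, but tag each TB with the CPU
     * that generated it, instead of one list head per CPU: */
    typedef struct PageDesc {
        TranslationBlock *first_tb;   /* one chain head, as today */
        unsigned int code_write_count;
        /* ... */
    } PageDesc;

    /* Walking only one CPU's TBs; the low pointer bits still encode
     * the page slot, as in the existing code: */
    TranslationBlock *tb = p->first_tb;
    int n;

    while (tb != NULL) {
        n = (uintptr_t)tb & 3;
        tb = (TranslationBlock *)((uintptr_t)tb & ~3);
        if (tb->cpu_index == cpu->cpu_index) {  /* assumed new field */
            /* act on this CPU's TB */
        }
        tb = tb->page_next[n];
    }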

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 02/10] use a different translation block list for each cpu.
  2015-01-27 14:45   ` Alex Bennée
@ 2015-01-27 15:16     ` Frederic Konrad
  0 siblings, 0 replies; 62+ messages in thread
From: Frederic Konrad @ 2015-01-27 15:16 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, jan.kiszka, mark.burton, qemu-devel, agraf,
	pbonzini

On 27/01/2015 15:45, Alex Bennée wrote:
> fred.konrad@greensocs.com writes:
>
>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>
>> We need a different TranslationBlock list for each core in case of multithread
>> TCG.
>>
>> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
>> ---
>>   translate-all.c | 40 ++++++++++++++++++++++------------------
>>   1 file changed, 22 insertions(+), 18 deletions(-)
>>
>> diff --git a/translate-all.c b/translate-all.c
>> index 8fa4378..0e11c70 100644
>> --- a/translate-all.c
>> +++ b/translate-all.c
>> @@ -72,10 +72,11 @@
>>   #endif
>>   
>>   #define SMC_BITMAP_USE_THRESHOLD 10
>> +#define MAX_CPUS 256
> Where does this number come from?
>
>>   typedef struct PageDesc {
>>       /* list of TBs intersecting this ram page */
>> -    TranslationBlock *first_tb;
>> +    TranslationBlock *first_tb[MAX_CPUS];
> Especially given the size of the PageDesc structure this adds a lot of
> bulk, mostly unused. Is access to the TB list via PageDesc frequent
> enough to justify avoiding an additional indirection?
>
>>       /* in order to optimize self modifying code, we count the number
>>          of lookups we do to a given page to use a bitmap */
>>       unsigned int code_write_count;
>> @@ -750,7 +751,7 @@ static inline void invalidate_page_bitmap(PageDesc *p)
>>   /* Set to NULL all the 'first_tb' fields in all PageDescs. */
>>   static void page_flush_tb_1(int level, void **lp)
>>   {
>> -    int i;
>> +    int i, j;
>>   
>>       if (*lp == NULL) {
>>           return;
>> @@ -759,7 +760,9 @@ static void page_flush_tb_1(int level, void **lp)
>>           PageDesc *pd = *lp;
>>   
>>           for (i = 0; i < V_L2_SIZE; ++i) {
>> -            pd[i].first_tb = NULL;
>> +            for (j = 0; j < MAX_CPUS; j++) {
>> +                pd[i].first_tb[j] = NULL;
>> +            }
>>               invalidate_page_bitmap(pd + i);
>>           }
>>       } else {
>> @@ -937,12 +940,12 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>>       /* remove the TB from the page list */
>>       if (tb->page_addr[0] != page_addr) {
>>           p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);
>> -        tb_page_remove(&p->first_tb, tb);
>> +        tb_page_remove(&p->first_tb[current_cpu->cpu_index], tb);
>>           invalidate_page_bitmap(p);
>>       }
>>       if (tb->page_addr[1] != -1 && tb->page_addr[1] != page_addr) {
>>           p = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS);
>> -        tb_page_remove(&p->first_tb, tb);
>> +        tb_page_remove(&p->first_tb[current_cpu->cpu_index], tb);
>>           invalidate_page_bitmap(p);
>>       }
>>   
>> @@ -1012,7 +1015,7 @@ static void build_page_bitmap(PageDesc *p)
>>   
>>       p->code_bitmap = g_malloc0(TARGET_PAGE_SIZE / 8);
>>   
>> -    tb = p->first_tb;
>> +    tb = p->first_tb[current_cpu->cpu_index];
>>       while (tb != NULL) {
>>           n = (uintptr_t)tb & 3;
>>           tb = (TranslationBlock *)((uintptr_t)tb & ~3);
>> @@ -1138,7 +1141,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>>       /* we remove all the TBs in the range [start, end[ */
>>       /* XXX: see if in some cases it could be faster to invalidate all
>>          the code */
>> -    tb = p->first_tb;
>> +    tb = p->first_tb[cpu->cpu_index];
>>       while (tb != NULL) {
>>           n = (uintptr_t)tb & 3;
>>           tb = (TranslationBlock *)((uintptr_t)tb & ~3);
>> @@ -1196,7 +1199,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>>       }
>>   #if !defined(CONFIG_USER_ONLY)
>>       /* if no code remaining, no need to continue to use slow writes */
>> -    if (!p->first_tb) {
>> +    if (!p->first_tb[cpu->cpu_index]) {
>>           invalidate_page_bitmap(p);
>>           if (is_cpu_write_access) {
>>               tlb_unprotect_code_phys(cpu, start, cpu->mem_io_vaddr);
>> @@ -1224,10 +1227,10 @@ void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
>>   #if 0
>>       if (1) {
>>           qemu_log("modifying code at 0x%x size=%d EIP=%x PC=%08x\n",
>> -                  cpu_single_env->mem_io_vaddr, len,
>> -                  cpu_single_env->eip,
>> -                  cpu_single_env->eip +
>> -                  (intptr_t)cpu_single_env->segs[R_CS].base);
>> +                  current_cpu->mem_io_vaddr, len,
>> +                  current_cpu->eip,
>> +                  current_cpu->eip +
>> +                  (intptr_t)current_cpu->segs[R_CS].base);
>>       }
>>   #endif
>>       p = page_find(start >> TARGET_PAGE_BITS);
>> @@ -1269,7 +1272,7 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
>>       if (!p) {
>>           return;
>>       }
>> -    tb = p->first_tb;
>> +    tb = p->first_tb[current_cpu->cpu_index];
>>   #ifdef TARGET_HAS_PRECISE_SMC
>>       if (tb && pc != 0) {
>>           current_tb = tb_find_pc(pc);
>> @@ -1299,7 +1302,7 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
>>           tb_phys_invalidate(tb, addr);
>>           tb = tb->page_next[n];
>>       }
>> -    p->first_tb = NULL;
>> +    p->first_tb[current_cpu->cpu_index] = NULL;
>>   #ifdef TARGET_HAS_PRECISE_SMC
>>       if (current_tb_modified) {
>>           /* we generate a block containing just the instruction
>> @@ -1327,11 +1330,12 @@ static inline void tb_alloc_page(TranslationBlock *tb,
>>   
>>       tb->page_addr[n] = page_addr;
>>       p = page_find_alloc(page_addr >> TARGET_PAGE_BITS, 1);
>> -    tb->page_next[n] = p->first_tb;
>> +    tb->page_next[n] = p->first_tb[current_cpu->cpu_index];
>>   #ifndef CONFIG_USER_ONLY
>> -    page_already_protected = p->first_tb != NULL;
>> +    page_already_protected = p->first_tb[current_cpu->cpu_index] != NULL;
>>   #endif
>> -    p->first_tb = (TranslationBlock *)((uintptr_t)tb | n);
>> +    p->first_tb[current_cpu->cpu_index]
>> +      = (TranslationBlock *)((uintptr_t)tb | n);
>>       invalidate_page_bitmap(p);
>>   
>>   #if defined(TARGET_HAS_SMC) || 1
>> @@ -1821,7 +1825,7 @@ void page_set_flags(target_ulong start, target_ulong end, int flags)
>>              the code inside.  */
>>           if (!(p->flags & PAGE_WRITE) &&
>>               (flags & PAGE_WRITE) &&
>> -            p->first_tb) {
>> +            p->first_tb[current_cpu->cpu_index]) {
>>               tb_invalidate_phys_page(addr, 0, NULL, false);
>>           }
>>           p->flags = flags;
> As the TranslationBlock itself has a linked list for page-related
> blocks:
>
>    struct TranslationBlock *page_next[2];
>
> could we not just come up with a structure that chains them together
> here?
>
Hi Alex,

Thanks for looking at this.

We don't know how often this first_tb is accessed right now.
You suggest chaining TBs instead of using an array for this?
That makes sense, but I think it means we will have to protect it with
a mutex as well?

Thanks,
Fred

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_*.
  2015-01-16 17:19 ` [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_* fred.konrad
  2015-01-27 14:36   ` Alex Bennée
@ 2015-01-29 15:17   ` Peter Maydell
  2015-02-02  8:31     ` Frederic Konrad
  2015-02-26 18:09     ` Frederic Konrad
  1 sibling, 2 replies; 62+ messages in thread
From: Peter Maydell @ 2015-01-29 15:17 UTC (permalink / raw)
  To: KONRAD Frédéric
  Cc: mttcg, J. Kiszka, Mark Burton, QEMU Developers, Alexander Graf,
	Paolo Bonzini

On 16 January 2015 at 17:19,  <fred.konrad@greensocs.com> wrote:
> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> This adds a lock to avoid multiple exclusive access at the same time in case of
> TCG multithread.
>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>

All the same comments I had on this patch earlier still apply:

 * I think adding mutex handling code to all the target-*
   frontends rather than providing facilities in common
   code for them to use is the wrong approach
 * You will fail to unlock the mutex if the ldrex or strex
   takes a data abort (see the sketch after this list)
 * This is making no attempt to learn from or unify with
   the existing attempts at handling exclusives in linux-user.
   When we've done this work we should have a single
   mechanism for handling exclusives in a multithreaded
   host environment which is used by both softmmu and useronly
   configs
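
To make the data-abort point concrete, a rough sketch of the hazard
and one possible fix; the flag, mutex and hook names below are made
up, not code from this series:

    /* Hazard: gen_helper_exclusive_lock() runs before the guest load,
     * so if the load faults, cpu_loop_exit() longjmps past the
     * matching unlock and the mutex stays held forever. */
    static QemuMutex cpu_exclusive_mutex;   /* assumed name */
    static bool cpu_exclusive_locked;       /* assumed tracking flag */

    /* One possible fix: release the lock on the exception path,
     * before the data abort is delivered to the guest: */
    static void arm_release_exclusive_on_abort(void)
    {
        if (cpu_exclusive_locked) {
            cpu_exclusive_locked = false;
            qemu_mutex_unlock(&cpu_exclusive_mutex);
        }
    }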

thanks
-- PMM

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 02/10] use a different translation block list for each cpu.
  2015-01-16 17:19 ` [Qemu-devel] [RFC 02/10] use a different translation block list for each cpu fred.konrad
  2015-01-27 14:45   ` Alex Bennée
@ 2015-01-29 15:24   ` Peter Maydell
  2015-01-29 15:33     ` Mark Burton
  2015-02-02  8:39     ` Frederic Konrad
  2015-02-03 16:17   ` Richard Henderson
  2 siblings, 2 replies; 62+ messages in thread
From: Peter Maydell @ 2015-01-29 15:24 UTC (permalink / raw)
  To: KONRAD Frédéric
  Cc: mttcg, J. Kiszka, Mark Burton, QEMU Developers, Alexander Graf,
	Paolo Bonzini

On 16 January 2015 at 17:19,  <fred.konrad@greensocs.com> wrote:
> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> We need a different TranslationBlock list for each core in case of multithread
> TCG.
>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> ---
>  translate-all.c | 40 ++++++++++++++++++++++------------------
>  1 file changed, 22 insertions(+), 18 deletions(-)
>
> diff --git a/translate-all.c b/translate-all.c
> index 8fa4378..0e11c70 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -72,10 +72,11 @@
>  #endif
>
>  #define SMC_BITMAP_USE_THRESHOLD 10
> +#define MAX_CPUS 256
>
>  typedef struct PageDesc {
>      /* list of TBs intersecting this ram page */
> -    TranslationBlock *first_tb;
> +    TranslationBlock *first_tb[MAX_CPUS];

Do we really need to know this for every CPU, or just for
the one that's using this PageDesc? I am assuming we're going to make
the l1_map be per-CPU.

>      /* in order to optimize self modifying code, we count the number
>         of lookups we do to a given page to use a bitmap */
>      unsigned int code_write_count;
> @@ -750,7 +751,7 @@ static inline void invalidate_page_bitmap(PageDesc *p)
>  /* Set to NULL all the 'first_tb' fields in all PageDescs. */
>  static void page_flush_tb_1(int level, void **lp)
>  {
> -    int i;
> +    int i, j;
>
>      if (*lp == NULL) {
>          return;
> @@ -759,7 +760,9 @@ static void page_flush_tb_1(int level, void **lp)
>          PageDesc *pd = *lp;
>
>          for (i = 0; i < V_L2_SIZE; ++i) {
> -            pd[i].first_tb = NULL;
> +            for (j = 0; j < MAX_CPUS; j++) {
> +                pd[i].first_tb[j] = NULL;
> +            }
>              invalidate_page_bitmap(pd + i);
>          }
>      } else {
> @@ -937,12 +940,12 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>      /* remove the TB from the page list */
>      if (tb->page_addr[0] != page_addr) {
>          p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);
> -        tb_page_remove(&p->first_tb, tb);
> +        tb_page_remove(&p->first_tb[current_cpu->cpu_index], tb);

Anything using current_cpu in this code is hugely suspect.
For instance cpu_restore_state() takes a CPUState pointer and
calls this function -- either it should be acting on just that
CPU (which might not be the current one) or on all CPUs. In
any case implicitly working on current_cpu here is wrong.

Probably we need to look at the public-facing functions here
and decide which should have "operate on all CPUs" semantics
and which should have "operate on the CPU passed as a parameter"
and which "operate on the implicit current CPU".

-- PMM

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 03/10] replace spinlock by QemuMutex.
  2015-01-16 17:19 ` [Qemu-devel] [RFC 03/10] replace spinlock by QemuMutex fred.konrad
@ 2015-01-29 15:25   ` Peter Maydell
  2015-02-02  8:45     ` Frederic Konrad
  0 siblings, 1 reply; 62+ messages in thread
From: Peter Maydell @ 2015-01-29 15:25 UTC (permalink / raw)
  To: KONRAD Frédéric
  Cc: mttcg, J. Kiszka, Mark Burton, QEMU Developers, Alexander Graf,
	Paolo Bonzini

On 16 January 2015 at 17:19,  <fred.konrad@greensocs.com> wrote:
> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> spinlock is only used in two cases:
>   * cpu-exec.c: to protect TranslationBlock
>   * mem_helper.c: for lock helper in target-i386 (which seems broken).
>
> It's a pthread_mutex_t in user-mode so better using QemuMutex directly in this
> case.
> It allows as well to reuse tb_lock mutex of TBContext in case of multithread
> TCG.
>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> ---
>  cpu-exec.c               | 15 +++++++++++----
>  include/exec/exec-all.h  |  4 ++--
>  linux-user/main.c        |  6 +++---
>  target-i386/mem_helper.c | 16 +++++++++++++---
>  tcg/i386/tcg-target.c    |  8 ++++++++
>  5 files changed, 37 insertions(+), 12 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index a4f0eff..1e7513c 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -335,7 +335,9 @@ int cpu_exec(CPUArchState *env)
>      SyncClocks sc;
>
>      /* This must be volatile so it is not trashed by longjmp() */
> +#if defined(CONFIG_USER_ONLY)
>      volatile bool have_tb_lock = false;
> +#endif

This work should be removing ifdefs indicating differences between
handling of user-only and softmmu regarding locking and threads,
not adding new ones.

-- PMM

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 02/10] use a different translation block list for each cpu.
  2015-01-29 15:24   ` Peter Maydell
@ 2015-01-29 15:33     ` Mark Burton
  2015-02-02  8:39     ` Frederic Konrad
  1 sibling, 0 replies; 62+ messages in thread
From: Mark Burton @ 2015-01-29 15:33 UTC (permalink / raw)
  To: Peter Maydell
  Cc: mttcg, J. Kiszka, Alexander Graf, QEMU Developers, Paolo Bonzini,
	KONRAD Frédéric

I’ll let Fred answer the other points you make - which might help
explain what we’re finding. But for this one…

The idea for now is to keep things simple and have a thread per CPU
and a ‘cache’ per thread. (Later we can look at reducing the caches.)

What we mean by a ‘cache’ needs to be clearer. I don’t think ‘cache’
is a helpful word, but it’s the one that’s been used elsewhere. What
we (think we) mean is this first_tb list (in each page block). In
other words, as things stand, each CPU that is going to use this page
is also going to build its own translation block list. We end up with
pages x CPUs first_tb lists…

Does that make sense?

Cheers

Mark.

> On 29 Jan 2015, at 16:24, Peter Maydell <peter.maydell@linaro.org> wrote:
> 
>> +    TranslationBlock *first_tb[MAX_CPUS];
> 
> Do we really need to know this for every CPU, or just for
> the one that's using this PageDesc? I am assuming we're going to make
> the l1_map be per-CPU.



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 05/10] extract TBContext from TCGContext.
  2015-01-16 17:19 ` [Qemu-devel] [RFC 05/10] extract TBContext from TCGContext fred.konrad
@ 2015-01-29 15:44   ` Peter Maydell
  2015-02-03 16:30     ` Richard Henderson
  0 siblings, 1 reply; 62+ messages in thread
From: Peter Maydell @ 2015-01-29 15:44 UTC (permalink / raw)
  To: KONRAD Frédéric
  Cc: mttcg, J. Kiszka, Mark Burton, QEMU Developers, Alexander Graf,
	Paolo Bonzini

On 16 January 2015 at 17:19,  <fred.konrad@greensocs.com> wrote:
> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> In order to have one TCGContext per thread and a single TBContext we have to
> extract TBContext from TCGContext.

This seems a bit odd. It's not clear to me what the advantages
are of having one TCGContext per thread but only a single
TBContext (as opposed to either (1) having a single TCGContext
and TBContext with locks protecting against multiple threads
generating code at once, or (2) having each thread have its
own TCGContext and TBContext and completely independent codegen).
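
For concreteness, option (2) might look something like this (a sketch
only; the wrapper struct and thread-local pointer are made up):

    /* Fully independent per-thread codegen state: */
    typedef struct TCGThreadContext {
        TCGContext tcg;   /* per-thread code generator */
        TBContext tb;     /* per-thread translation cache */
    } TCGThreadContext;

    static __thread TCGThreadContext *tcg_thread_ctx;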

Maybe it would help if you sketched out your design in a little
more detail in the cover letter, with emphasis on which data
structures are going to be per-thread and which are going to
be shared (and if so how shared).

(Long term we would want to be able to have multiple
TBContexts to support heterogeneous systems where CPUs
might be different architectures or have different views
of physical memory...)

thanks
-- PMM

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 09/10] cpu: remove exit_request global.
  2015-01-16 17:19 ` [Qemu-devel] [RFC 09/10] cpu: remove exit_request global fred.konrad
@ 2015-01-29 15:52   ` Peter Maydell
  2015-02-02 10:03     ` Paolo Bonzini
  2015-02-03  9:37     ` Frederic Konrad
  0 siblings, 2 replies; 62+ messages in thread
From: Peter Maydell @ 2015-01-29 15:52 UTC (permalink / raw)
  To: KONRAD Frédéric
  Cc: mttcg, J. Kiszka, Mark Burton, QEMU Developers, Alexander Graf,
	Paolo Bonzini

On 16 January 2015 at 17:19,  <fred.konrad@greensocs.com> wrote:
> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> This removes the exit_request global and adds a variable in CPUState for this.
> Only the flag for the first CPU is used for the moment, as we are still running
> with one TCG thread.

> --- a/cpus.c
> +++ b/cpus.c
> @@ -646,10 +646,14 @@ static void cpu_handle_guest_debug(CPUState *cpu)
>
>  static void cpu_signal(int sig)
>  {
> +    CPUState *cpu;
>      if (current_cpu) {
>          cpu_exit(current_cpu);
>      }
> -    exit_request = 1;
> +
> +    CPU_FOREACH(cpu) {
> +        cpu->exit_loop_request = 1;
> +    }
>  }

You can't do this -- this code is a signal handler so it could
get run at any time including while the list of CPUs is being
updated. (This is why we have the exit_request flag in the
first place rather than just setting the exit_request flag in
each CPU...)

Possibly you want exit_request to be a per-thread variable,
but I haven't thought much about it.
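
For example (a sketch of the per-thread idea, untested, and assuming
the signal is delivered to the specific vCPU thread via pthread_kill):

    /* One flag per host thread; the handler only ever touches the
     * flag of the thread the signal was delivered to, so no list
     * traversal is needed. */
    static __thread volatile sig_atomic_t exit_request;

    static void cpu_signal(int sig)
    {
        if (current_cpu) {
            cpu_exit(current_cpu);
        }
        exit_request = 1;   /* this thread's flag only */
    }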

> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -249,6 +249,7 @@ struct CPUState {
>      bool created;
>      bool stop;
>      bool stopped;
> +    volatile sig_atomic_t exit_loop_request;
>      volatile sig_atomic_t exit_request;
>      uint32_t interrupt_request;
>      int singlestep_enabled;

This would duplicate the exit_request and
exit_loop_request flags in the CPU, which is kind of odd.

-- PMM

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_*.
  2015-01-29 15:17   ` Peter Maydell
@ 2015-02-02  8:31     ` Frederic Konrad
  2015-02-02  8:36       ` Peter Maydell
  2015-02-26 18:09     ` Frederic Konrad
  1 sibling, 1 reply; 62+ messages in thread
From: Frederic Konrad @ 2015-02-02  8:31 UTC (permalink / raw)
  To: Peter Maydell
  Cc: mttcg, J. Kiszka, Mark Burton, QEMU Developers, Alexander Graf,
	Paolo Bonzini

On 29/01/2015 16:17, Peter Maydell wrote:
> On 16 January 2015 at 17:19,  <fred.konrad@greensocs.com> wrote:
>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>
>> This adds a lock to avoid multiple exclusive access at the same time in case of
>> TCG multithread.
>>
>> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Hi Peter,
> All the same comments I had on this patch earlier still apply:
>
>   * I think adding mutex handling code to all the target-*
>     frontends rather than providing facilities in common
>     code for them to use is the wrong approach
OK, we will fix that.
>   * You will fail to unlock the mutex if the ldrex or strex
>     takes a data abort
>   * This is making no attempt to learn from or unify with
>     the existing attempts at handling exclusives in linux-user.
>     When we've done this work we should have a single
>     mechanism for handling exclusives in a multithreaded
>     host environment which is used by both softmmu and useronly
>     configs
Can you point me to the existing attempts in linux-user?

Thanks,
Fred
>
> thanks
> -- PMM

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_*.
  2015-02-02  8:31     ` Frederic Konrad
@ 2015-02-02  8:36       ` Peter Maydell
  0 siblings, 0 replies; 62+ messages in thread
From: Peter Maydell @ 2015-02-02  8:36 UTC (permalink / raw)
  To: Frederic Konrad
  Cc: mttcg, J. Kiszka, Mark Burton, QEMU Developers, Alexander Graf,
	Paolo Bonzini

On 2 February 2015 at 08:31, Frederic Konrad <fred.konrad@greensocs.com> wrote:
>>   * This is making no attempt to learn from or unify with
>>     the existing attempts at handling exclusives in linux-user.
>>     When we've done this work we should have a single
>>     mechanism for handling exclusives in a multithreaded
>>     host environment which is used by both softmmu and useronly
>>     configs
>
> Can you point me to the existing attempts in linux-user?

It's in two parts -- the #ifdefed code in target-arm/translate.c
(gen_store_exclusive), and the main loop in linux-user/main.c which
handles EXCP_STREX to call do_strex(). There's a pair of utility
routines start_exclusive() and end_exclusive() which suspend all
other CPUs (which in linux-user are separate threads) so the
strex code can run, and then resume them all afterwards.

Since "handle strex between multiple threads" is exactly
what you're trying to do, when we're done we should have
a single mechanism for both softmmu and linux-user. Whether
that means "replace linux-user's current code with a better
mechanism" or not I don't currently know.
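
Heavily simplified, that mechanism has roughly this shape (an
illustrative reduction, not the exact code; other_cpus_running() is a
made-up helper, and the real code uses raw pthread primitives):

    static QemuMutex exclusive_lock;
    static QemuCond exclusive_cond;     /* signalled as vCPUs park */
    static QemuCond exclusive_resume;   /* wakes parked vCPUs again */
    static int pending_cpus;            /* nonzero: please park */

    void start_exclusive(void)
    {
        qemu_mutex_lock(&exclusive_lock);
        pending_cpus = 1;               /* vCPU loops see this and park */
        while (other_cpus_running()) {
            qemu_cond_wait(&exclusive_cond, &exclusive_lock);
        }
        /* now the only thread running guest code; emulate the strex */
    }

    void end_exclusive(void)
    {
        pending_cpus = 0;
        qemu_cond_broadcast(&exclusive_resume);
        qemu_mutex_unlock(&exclusive_lock);
    }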

-- PMM

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 02/10] use a different translation block list for each cpu.
  2015-01-29 15:24   ` Peter Maydell
  2015-01-29 15:33     ` Mark Burton
@ 2015-02-02  8:39     ` Frederic Konrad
  2015-02-02  8:49       ` Peter Maydell
  1 sibling, 1 reply; 62+ messages in thread
From: Frederic Konrad @ 2015-02-02  8:39 UTC (permalink / raw)
  To: Peter Maydell
  Cc: mttcg, J. Kiszka, Mark Burton, QEMU Developers, Alexander Graf,
	Paolo Bonzini

On 29/01/2015 16:24, Peter Maydell wrote:
> On 16 January 2015 at 17:19,  <fred.konrad@greensocs.com> wrote:
>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>
>> We need a different TranslationBlock list for each core in case of multithread
>> TCG.
>>
>> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
>> ---
>>   translate-all.c | 40 ++++++++++++++++++++++------------------
>>   1 file changed, 22 insertions(+), 18 deletions(-)
>>
>> diff --git a/translate-all.c b/translate-all.c
>> index 8fa4378..0e11c70 100644
>> --- a/translate-all.c
>> +++ b/translate-all.c
>> @@ -72,10 +72,11 @@
>>   #endif
>>
>>   #define SMC_BITMAP_USE_THRESHOLD 10
>> +#define MAX_CPUS 256
>>
>>   typedef struct PageDesc {
>>       /* list of TBs intersecting this ram page */
>> -    TranslationBlock *first_tb;
>> +    TranslationBlock *first_tb[MAX_CPUS];
> Do we really need to know this for every CPU, or just for
> the one that's using this PageDesc? I am assuming we're going to make
> the l1_map be per-CPU.

Do we have any clue which CPU is using this PageDesc?
We did it like that because it is quite simple.
>
>>       /* in order to optimize self modifying code, we count the number
>>          of lookups we do to a given page to use a bitmap */
>>       unsigned int code_write_count;
>> @@ -750,7 +751,7 @@ static inline void invalidate_page_bitmap(PageDesc *p)
>>   /* Set to NULL all the 'first_tb' fields in all PageDescs. */
>>   static void page_flush_tb_1(int level, void **lp)
>>   {
>> -    int i;
>> +    int i, j;
>>
>>       if (*lp == NULL) {
>>           return;
>> @@ -759,7 +760,9 @@ static void page_flush_tb_1(int level, void **lp)
>>           PageDesc *pd = *lp;
>>
>>           for (i = 0; i < V_L2_SIZE; ++i) {
>> -            pd[i].first_tb = NULL;
>> +            for (j = 0; j < MAX_CPUS; j++) {
>> +                pd[i].first_tb[j] = NULL;
>> +            }
>>               invalidate_page_bitmap(pd + i);
>>           }
>>       } else {
>> @@ -937,12 +940,12 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>>       /* remove the TB from the page list */
>>       if (tb->page_addr[0] != page_addr) {
>>           p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);
>> -        tb_page_remove(&p->first_tb, tb);
>> +        tb_page_remove(&p->first_tb[current_cpu->cpu_index], tb);
> Anything using current_cpu in this code is hugely suspect.
> For instance cpu_restore_state() takes a CPUState pointer and
> calls this function -- either it should be acting on just that
> CPU (which might not be the current one) or on all CPUs. In
> any case implicitly working on current_cpu here is wrong.
>
> Probably we need to look at the public-facing functions here
> and decide which should have "operate on all CPUs" semantics
> and which should have "operate on the CPU passed as a parameter"
> and which "operate on the implicit current CPU".

OK, so the idea would be to have e.g. a CPU mask parameter to know which
CPUs to invalidate/restore? Or just a CPU pointer, and invalidate all if
it is NULL?

Thanks,
Fred
>
> -- PMM

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 03/10] replace spinlock by QemuMutex.
  2015-01-29 15:25   ` Peter Maydell
@ 2015-02-02  8:45     ` Frederic Konrad
  0 siblings, 0 replies; 62+ messages in thread
From: Frederic Konrad @ 2015-02-02  8:45 UTC (permalink / raw)
  To: Peter Maydell
  Cc: mttcg, J. Kiszka, Mark Burton, QEMU Developers, Alexander Graf,
	Paolo Bonzini

On 29/01/2015 16:25, Peter Maydell wrote:
> On 16 January 2015 at 17:19,  <fred.konrad@greensocs.com> wrote:
>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>
>> spinlock is only used in two cases:
>>    * cpu-exec.c: to protect TranslationBlock
>>    * mem_helper.c: for lock helper in target-i386 (which seems broken).
>>
>> It's a pthread_mutex_t in user-mode so better using QemuMutex directly in this
>> case.
>> It allows as well to reuse tb_lock mutex of TBContext in case of multithread
>> TCG.
>>
>> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
>> ---
>>   cpu-exec.c               | 15 +++++++++++----
>>   include/exec/exec-all.h  |  4 ++--
>>   linux-user/main.c        |  6 +++---
>>   target-i386/mem_helper.c | 16 +++++++++++++---
>>   tcg/i386/tcg-target.c    |  8 ++++++++
>>   5 files changed, 37 insertions(+), 12 deletions(-)
>>
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index a4f0eff..1e7513c 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -335,7 +335,9 @@ int cpu_exec(CPUArchState *env)
>>       SyncClocks sc;
>>
>>       /* This must be volatile so it is not trashed by longjmp() */
>> +#if defined(CONFIG_USER_ONLY)
>>       volatile bool have_tb_lock = false;
>> +#endif
> This work should be removing ifdefs indicating differences between
> handling of user-only and softmmu regarding locking and threads,
> not adding new ones.
>
> -- PMM
True,
I tried to go step by step, using tb_lock and removing spinlock (which
is qemu_mutex_lock behind ifdefs), to avoid one big patch with a lot of
changes.

The ifdefs should be removed later in the series (I think).

Fred

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 02/10] use a different translation block list for each cpu.
  2015-02-02  8:39     ` Frederic Konrad
@ 2015-02-02  8:49       ` Peter Maydell
  0 siblings, 0 replies; 62+ messages in thread
From: Peter Maydell @ 2015-02-02  8:49 UTC (permalink / raw)
  To: Frederic Konrad
  Cc: mttcg, J. Kiszka, Mark Burton, QEMU Developers, Alexander Graf,
	Paolo Bonzini

On 2 February 2015 at 08:39, Frederic Konrad <fred.konrad@greensocs.com> wrote:
> On 29/01/2015 16:24, Peter Maydell wrote:
>>
>> On 16 January 2015 at 17:19,  <fred.konrad@greensocs.com> wrote:
>>>
>>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>>
>>> We need a different TranslationBlock list for each core in case of
>>> multithread
>>> TCG.
>>>
>>> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
>>> ---
>>>   translate-all.c | 40 ++++++++++++++++++++++------------------
>>>   1 file changed, 22 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/translate-all.c b/translate-all.c
>>> index 8fa4378..0e11c70 100644
>>> --- a/translate-all.c
>>> +++ b/translate-all.c
>>> @@ -72,10 +72,11 @@
>>>   #endif
>>>
>>>   #define SMC_BITMAP_USE_THRESHOLD 10
>>> +#define MAX_CPUS 256
>>>
>>>   typedef struct PageDesc {
>>>       /* list of TBs intersecting this ram page */
>>> -    TranslationBlock *first_tb;
>>> +    TranslationBlock *first_tb[MAX_CPUS];
>>
>> Do we really need to know this for every CPU, or just for
>> the one that's using this PageDesc? I am assuming we're going to make
>> the l1_map be per-CPU.
>
>
> Do we have any clue of which cpu is using this PageDesc?

If you're making the l1_map per-CPU then the PageDescs you
get to from that l1_map are for that CPU. Basically my
(possibly naive) view of the PageDesc struct is that it's
just the leaf descriptors for the l1_map, and the l1_map
has to be per-CPU (because it's a virtual-address-space
indexed data structure). So I think if you have a PageDesc
you know already which CPU it relates to, because you got
it from that CPU thread's l1_map.
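
A minimal sketch of that layout (hypothetical: it reuses names from
translate-all.c, but this per-CPU split is not what the posted series
does):

/* Each CPU thread owns its own radix map, so a PageDesc reached
 * through it belongs to that CPU alone and one TB list suffices. */
typedef struct TranslationBlock TranslationBlock;

#define V_L1_SIZE 8192                  /* assumed; see translate-all.c */
#define MAX_CPUS  256                   /* as in patch 02/10 */

typedef struct PageDesc {
    TranslationBlock *first_tb;         /* TBs of the owning CPU only */
    /* ... code bitmap, flags ... */
} PageDesc;

typedef struct CPUTBMap {
    void *l1_map[V_L1_SIZE];            /* leaves are PageDesc arrays */
} CPUTBMap;

static CPUTBMap cpu_tb_maps[MAX_CPUS];  /* indexed by cpu->cpu_index */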

-- PMM

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 09/10] cpu: remove exit_request global.
  2015-01-29 15:52   ` Peter Maydell
@ 2015-02-02 10:03     ` Paolo Bonzini
  2015-02-02 13:12       ` Peter Maydell
  2015-02-03  9:37     ` Frederic Konrad
  1 sibling, 1 reply; 62+ messages in thread
From: Paolo Bonzini @ 2015-02-02 10:03 UTC (permalink / raw)
  To: Peter Maydell, KONRAD Frédéric
  Cc: mttcg, J. Kiszka, Mark Burton, QEMU Developers, Alexander Graf



On 29/01/2015 16:52, Peter Maydell wrote:
>> > +    CPU_FOREACH(cpu) {
>> > +        cpu->exit_loop_request = 1;
>> > +    }
>> >  }
> You can't do this -- this code is a signal handler so it could
> get run at any time including while the list of CPUs is being
> updated. (This is why we have the exit_request flag in the
> first place rather than just setting the exit_request flag in
> each CPU...)

Actually you can do this if you are careful.

In particular, you can do it while you are under the big QEMU lock.  If
you are not, basically you have to treat the CPU list as RCU-protected,
and this is doable because the CPU object cannot be removed and added
back into the CPU list.

Unfortunately RCU doesn't support QTAILQ, at least not yet, so you'd
have to convert the CPU list to QLIST.  But the basic idea of this patch
can be done.
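
A rough sketch of what the walk could then look like (hedged: it assumes
the CPU list has been converted to a QLIST named "cpus", and uses the
rcu_read_lock/QLIST_FOREACH_RCU primitives from include/qemu/rcu*.h):

CPUState *cpu;

rcu_read_lock();
QLIST_FOREACH_RCU(cpu, &cpus, node) {
    cpu->exit_loop_request = 1;    /* a sig_atomic_t store is atomic */
}
rcu_read_unlock();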

Paolo

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 09/10] cpu: remove exit_request global.
  2015-02-02 10:03     ` Paolo Bonzini
@ 2015-02-02 13:12       ` Peter Maydell
  2015-02-02 13:14         ` Paolo Bonzini
  0 siblings, 1 reply; 62+ messages in thread
From: Peter Maydell @ 2015-02-02 13:12 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: mttcg, J. Kiszka, Mark Burton, Alexander Graf, QEMU Developers,
	KONRAD Frédéric

On 2 February 2015 at 10:03, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> You can't do this -- this code is a signal handler so it could
>> get run at any time including while the list of CPUs is being
>> updated. (This is why we have the exit_request flag in the
>> first place rather than just setting the exit_request flag in
>> each CPU...)
>
> Actually you can do this if you are careful.
>
> In particular, you can do it while you are under the big QEMU lock.

...but this is a signal handler, so we can't guarantee that the
thread holds the big lock.

>  If
> you are not, basically you have to treat the CPU list as RCU-protected,
> and this is doable because the CPU object cannot be removed and added
> back into the CPU list.
>
> Unfortunately RCU doesn't support QTAILQ, at least not yet, so you'd
> have to convert the CPU list to QLIST.  But the basic idea of this patch
> can be done.

...and if we can iterate over CPU lists in signal handlers,
the correct approach is probably to use the existing exit_request
flag rather than adding another one. (Needs investigation to
check and document semantics of that flag.)

-- PMM

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 09/10] cpu: remove exit_request global.
  2015-02-02 13:12       ` Peter Maydell
@ 2015-02-02 13:14         ` Paolo Bonzini
  0 siblings, 0 replies; 62+ messages in thread
From: Paolo Bonzini @ 2015-02-02 13:14 UTC (permalink / raw)
  To: Peter Maydell
  Cc: mttcg, J. Kiszka, Mark Burton, Alexander Graf, QEMU Developers,
	KONRAD Frédéric



On 02/02/2015 14:12, Peter Maydell wrote:
> > In particular, you can do it while you are under the big QEMU lock.
> 
> ...but this is a signal handler, so we can't guarantee that the
> thread holds the big lock.
> 
> ...and if we can iterate over CPU lists in signal handlers,
> the correct approach is probably to use the existing exit_request
> flag rather than adding another one. (Needs investigation to
> check and document semantics of that flag.)

I agree on both counts.

Paolo

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 09/10] cpu: remove exit_request global.
  2015-01-29 15:52   ` Peter Maydell
  2015-02-02 10:03     ` Paolo Bonzini
@ 2015-02-03  9:37     ` Frederic Konrad
  2015-02-03 10:29       ` Peter Maydell
  1 sibling, 1 reply; 62+ messages in thread
From: Frederic Konrad @ 2015-02-03  9:37 UTC (permalink / raw)
  To: Peter Maydell
  Cc: mttcg, J. Kiszka, Mark Burton, QEMU Developers, Alexander Graf,
	Paolo Bonzini

On 29/01/2015 16:52, Peter Maydell wrote:
> On 16 January 2015 at 17:19,  <fred.konrad@greensocs.com> wrote:
>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>
>> This removes exit_request global and adds a variable in CPUState for this.
>> Only the flag for the first cpu is used for the moment as we are still with one
>> TCG thread.
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -646,10 +646,14 @@ static void cpu_handle_guest_debug(CPUState *cpu)
>>
>>   static void cpu_signal(int sig)
>>   {
>> +    CPUState *cpu;
>>       if (current_cpu) {
>>           cpu_exit(current_cpu);
>>       }
>> -    exit_request = 1;
>> +
>> +    CPU_FOREACH(cpu) {
>> +        cpu->exit_loop_request = 1;
>> +    }
>>   }
> You can't do this -- this code is a signal handler so it could
> get run at any time including while the list of CPUs is being
> updated. (This is why we have the exit_request flag in the
> first place rather than just setting the exit_request flag in
> each CPU...)
>
> Possibly you want exit_request to be a per-thread variable,
> but I haven't thought much about it.
>
>> --- a/include/qom/cpu.h
>> +++ b/include/qom/cpu.h
>> @@ -249,6 +249,7 @@ struct CPUState {
>>       bool created;
>>       bool stop;
>>       bool stopped;
>> +    volatile sig_atomic_t exit_loop_request;
>>       volatile sig_atomic_t exit_request;
>>       uint32_t interrupt_request;
>>       int singlestep_enabled;
> This would duplicate the exit_request and
> exit_loop_request flags in the CPU, which is kind of odd.
>
> -- PMM
Actually, what we want to do is remove the exit_request global because
when it exits the loop in tcg_exec_all it does exit_request = 0, and the
other VCPUs don't exit.

It is not clear to me why we have both the exit_request global and
exit_request in CPUState.

Maybe we can just make exit_request thread-local; this might work (see
the sketch below).
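
For illustration, the thread-local variant would be a one-line change
(hedged sketch, not a tested patch):

/* Hypothetical: one flag per TCG thread, so clearing it in one thread's
 * exec loop can no longer swallow another VCPU's exit request. */
static __thread volatile sig_atomic_t exit_request;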

Also, we need to be able to exit whichever VCPU we want from a TCG thread,
e.g. when we want to flush all the TLBs we need to exit all CPUs but one.

Thanks,
Fred

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 09/10] cpu: remove exit_request global.
  2015-02-03  9:37     ` Frederic Konrad
@ 2015-02-03 10:29       ` Peter Maydell
  0 siblings, 0 replies; 62+ messages in thread
From: Peter Maydell @ 2015-02-03 10:29 UTC (permalink / raw)
  To: Frederic Konrad
  Cc: mttcg, J. Kiszka, Mark Burton, QEMU Developers, Alexander Graf,
	Paolo Bonzini

On 3 February 2015 at 09:37, Frederic Konrad <fred.konrad@greensocs.com> wrote:
> Actually, what we want to do is remove exit_requested global because when it
> exits
> the loop in tcg_exec_all it does exit_requested = 0, and other vcpu doesn't
> exit.
>
> This is not clear to me why we have both exit_requested global and
> exit_request in CPUState.

Well, it's not totally clear, but there are two interlinked reasons:
(1) historical/accidental
(2) setting a global flag from a signal handler to be tested
by a main loop is a standard pattern for avoiding race conditions
and other nastiness that can otherwise occur, because "set flag"
is always atomic.

So to clean this up you need to establish what the current code/
data structures are doing and therefore what you can replace them
with.
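
In its minimal form the pattern is just (a generic sketch, not QEMU's
actual code):

#include <signal.h>

static volatile sig_atomic_t exit_request;

static void handler(int sig)
{
    exit_request = 1;           /* the only async-signal-safe action */
}

/* the execution loop polls the flag at a safe point */
static void poll_exit(void)
{
    if (exit_request) {
        exit_request = 0;
        /* ... leave the CPU loop ... */
    }
}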

> Maybe we can just make exit_request thread local this might work.

Almost certainly not the *only* thing you need to do. Analyse
the problem first, and the correct fix will follow :-)
Also, the fact that KVM doesn't use the exit_request global at
all suggests that the better approach might be to delete it
entirely, if you can do it safely.

> Also, we need to be able to exit VCPU we want from a TCG thread.
> eg: We want to flush all the tlb we need to exit all cpus but one..

Compare the linux-user code for getting all CPUs out of their
execution loop...

-- PMM

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 02/10] use a different translation block list for each cpu.
  2015-01-16 17:19 ` [Qemu-devel] [RFC 02/10] use a different translation block list for each cpu fred.konrad
  2015-01-27 14:45   ` Alex Bennée
  2015-01-29 15:24   ` Peter Maydell
@ 2015-02-03 16:17   ` Richard Henderson
  2015-02-03 16:33     ` Paolo Bonzini
  2 siblings, 1 reply; 62+ messages in thread
From: Richard Henderson @ 2015-02-03 16:17 UTC (permalink / raw)
  To: fred.konrad, qemu-devel, mttcg
  Cc: peter.maydell, pbonzini, mark.burton, agraf, jan.kiszka

On 01/16/2015 09:19 AM, fred.konrad@greensocs.com wrote:
> @@ -759,7 +760,9 @@ static void page_flush_tb_1(int level, void **lp)
>          PageDesc *pd = *lp;
>  
>          for (i = 0; i < V_L2_SIZE; ++i) {
> -            pd[i].first_tb = NULL;
> +            for (j = 0; j < MAX_CPUS; j++) {
> +                pd[i].first_tb[j] = NULL;
> +            }
>              invalidate_page_bitmap(pd + i);
>          }
>      } else {

Surely you've got to do some locking somewhere in order to be able to modify
another thread's cpu tb list.

I realize that we do have to solve this problem for x86, but for most other
targets we ought, in principle, to be able to avoid it.  Which simply requires
that we not treat icache flushes as nops.

When the kernel has modified a page, it will also have notified the
other cpus, like so:

        if (smp_call_function(ipi_flush_icache_page, mm, 1)) {

We ought to be able to leverage this to avoid some locking at the qemu level.


r~

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 05/10] extract TBContext from TCGContext.
  2015-01-29 15:44   ` Peter Maydell
@ 2015-02-03 16:30     ` Richard Henderson
  0 siblings, 0 replies; 62+ messages in thread
From: Richard Henderson @ 2015-02-03 16:30 UTC (permalink / raw)
  To: Peter Maydell, KONRAD Frédéric
  Cc: mttcg, J. Kiszka, Mark Burton, QEMU Developers, Alexander Graf,
	Paolo Bonzini

On 01/29/2015 07:44 AM, Peter Maydell wrote:
> On 16 January 2015 at 17:19,  <fred.konrad@greensocs.com> wrote:
>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>
>> In order to have one TCGContext per thread and a single TBContext we have to
>> extract TBContext from TCGContext.
> 
> This seems a bit odd. It's not clear to me what the advantages
> are of having one TCGContext per thread but only a single
> TBContext (as opposed to either (1) having a single TCGContext
> and TBContext with locks protecting against multiple threads
> generating code at once, or (2) having each thread have its
> own TCGContext and TBContext and completely independent codegen).
> 
> Maybe it would help if you sketched out your design in a little
> more detail in the cover letter, with emphasis on which data
> structures are going to be per-thread and which are going to
> be shared (and if so how shared).
> 
> (Long term we would want to be able to have multiple
> TBContexts to support heterogenous systems where CPUs
> might be different architectures or have different views
> of physical memory...)

Seconded.


r~

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 02/10] use a different translation block list for each cpu.
  2015-02-03 16:17   ` Richard Henderson
@ 2015-02-03 16:33     ` Paolo Bonzini
  0 siblings, 0 replies; 62+ messages in thread
From: Paolo Bonzini @ 2015-02-03 16:33 UTC (permalink / raw)
  To: Richard Henderson, fred.konrad, qemu-devel, mttcg
  Cc: peter.maydell, mark.burton, agraf, jan.kiszka



On 03/02/2015 17:17, Richard Henderson wrote:
>> > @@ -759,7 +760,9 @@ static void page_flush_tb_1(int level, void **lp)
>> >          PageDesc *pd = *lp;
>> >  
>> >          for (i = 0; i < V_L2_SIZE; ++i) {
>> > -            pd[i].first_tb = NULL;
>> > +            for (j = 0; j < MAX_CPUS; j++) {
>> > +                pd[i].first_tb[j] = NULL;
>> > +            }
>> >              invalidate_page_bitmap(pd + i);
>> >          }
>> >      } else {
> Surely you've got to do some locking somewhere in order to be able to modify
> another thread's cpu tb list.

But that's probably not even necessary.  page_flush_tb_1 is called from
tb_flush, which in turn is only called in very special circumstances.

It should be possible to have something like the kernel's "stop_machine"
that does the following:

1) schedule a callback on all TCG CPU threads

2) wait for all CPUs to have reached that callback

3) do tb_flush on all CPUs, while it knows they are not holding any lock

4) release all TCG CPU threads

With one TCG thread, just use qemu_bh_new (hidden behind a suitable API
of course!).  Once you have multiple TCG CPU threads, loop on all CPUs
with the same run_on_cpu function that KVM uses.
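
A hedged sketch of that scheme (run_on_cpu_async is an invented name for
the "schedule a callback on a TCG CPU thread" primitive; the mutex/cond
primitives and CPU_FOREACH are real):

static QemuMutex park_lock;
static QemuCond parked_cond, release_cond;
static int cpus_parked;

/* steps 1/2: runs inside each TCG CPU thread at a safe point */
static void park_cpu(void *opaque)
{
    qemu_mutex_lock(&park_lock);
    if (++cpus_parked == smp_cpus) {
        qemu_cond_signal(&parked_cond);   /* last arrival wakes flusher */
    }
    qemu_cond_wait(&release_cond, &park_lock);
    qemu_mutex_unlock(&park_lock);
}

static void stop_machine_tb_flush(CPUArchState *env)
{
    CPUState *cpu;

    CPU_FOREACH(cpu) {
        run_on_cpu_async(cpu, park_cpu, NULL);   /* invented helper */
    }
    qemu_mutex_lock(&park_lock);
    while (cpus_parked < smp_cpus) {
        qemu_cond_wait(&parked_cond, &park_lock);
    }
    tb_flush(env);         /* step 3: nothing executes, no locks held */
    cpus_parked = 0;
    qemu_cond_broadcast(&release_cond);          /* step 4: release */
    qemu_mutex_unlock(&park_lock);
}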

Paolo

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_*.
  2015-01-29 15:17   ` Peter Maydell
  2015-02-02  8:31     ` Frederic Konrad
@ 2015-02-26 18:09     ` Frederic Konrad
  2015-02-26 20:36       ` Alexander Graf
  2015-02-26 22:56       ` Peter Maydell
  1 sibling, 2 replies; 62+ messages in thread
From: Frederic Konrad @ 2015-02-26 18:09 UTC (permalink / raw)
  To: Peter Maydell
  Cc: mttcg, J. Kiszka, Mark Burton, QEMU Developers, Alexander Graf,
	Paolo Bonzini

On 29/01/2015 16:17, Peter Maydell wrote:
> On 16 January 2015 at 17:19,  <fred.konrad@greensocs.com> wrote:
>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>
>> This adds a lock to avoid multiple exclusive access at the same time in case of
>> TCG multithread.
>>
>> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> All the same comments I had on this patch earlier still apply:
>
>   * I think adding mutex handling code to all the target-*
>     frontends rather than providing facilities in common
>     code for them to use is the wrong approach
>   * You will fail to unlock the mutex if the ldrex or strex
>     takes a data abort
>   * This is making no attempt to learn from or unify with
>     the existing attempts at handling exclusives in linux-user.
>     When we've done this work we should have a single
>     mechanism for handling exclusives in a multithreaded
>     host environment which is used by both softmmu and useronly
>     configs
>
> thanks
> -- PMM

Hi,

We see some performance improvement with SMP, but still have a random
deadlock in the guest in this function:

static inline void arch_spin_lock(arch_spinlock_t *lock)
{
     unsigned long tmp;
     u32 newval;
     arch_spinlock_t lockval;

     prefetchw(&lock->slock);
     __asm__ __volatile__(
"1:    ldrex    %0, [%3]\n"
"    add    %1, %0, %4\n"
"    strex    %2, %1, [%3]\n"
"    teq    %2, #0\n"
"    bne    1b"
     : "=&r" (lockval), "=&r" (newval), "=&r" (tmp)
     : "r" (&lock->slock), "I" (1 << TICKET_SHIFT)
     : "cc");

     while (lockval.tickets.next != lockval.tickets.owner) {
         wfe();
         lockval.tickets.owner = ACCESS_ONCE(lock->tickets.owner);
     }

     smp_mb();
}

The deadlock is in the last while loop. That's why we immediately thought
it is caused by our implementation of atomic instructions.

We failed to unlock the mutex a lot of times during boot, not because of
a data abort but, I think, because we can be interrupted during a strex
(e.g. there are two branches)?

We decided to implement the whole atomic instruction inside a helper, but
is it possible to get the data with e.g. cpu_physical_memory_rw instead
of the normal generated code?

One other thing which looks suspicious: it seems there is one pair of
exclusive_addr/exclusive_val per CPU - is that normal?

Thanks,
Fred

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_*.
  2015-02-26 18:09     ` Frederic Konrad
@ 2015-02-26 20:36       ` Alexander Graf
  2015-02-26 22:56       ` Peter Maydell
  1 sibling, 0 replies; 62+ messages in thread
From: Alexander Graf @ 2015-02-26 20:36 UTC (permalink / raw)
  To: Frederic Konrad
  Cc: mttcg, Peter Maydell, J. Kiszka, Mark Burton, QEMU Developers,
	Paolo Bonzini




> Am 26.02.2015 um 19:09 schrieb Frederic Konrad <fred.konrad@greensocs.com>:
> 
>> On 29/01/2015 16:17, Peter Maydell wrote:
>>> On 16 January 2015 at 17:19,  <fred.konrad@greensocs.com> wrote:
>>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>> 
>>> This adds a lock to avoid multiple exclusive access at the same time in case of
>>> TCG multithread.
>>> 
>>> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
>> All the same comments I had on this patch earlier still apply:
>> 
>>  * I think adding mutex handling code to all the target-*
>>    frontends rather than providing facilities in common
>>    code for them to use is the wrong approach
>>  * You will fail to unlock the mutex if the ldrex or strex
>>    takes a data abort
>>  * This is making no attempt to learn from or unify with
>>    the existing attempts at handling exclusives in linux-user.
>>    When we've done this work we should have a single
>>    mechanism for handling exclusives in a multithreaded
>>    host environment which is used by both softmmu and useronly
>>    configs
>> 
>> thanks
>> -- PMM
> 
> Hi,
> 
> We see some performance improvment on this SMP, but still have random deadlock
> in the guest in this function:
> 
> static inline void arch_spin_lock(arch_spinlock_t *lock)
> {
>    unsigned long tmp;
>    u32 newval;
>    arch_spinlock_t lockval;
> 
>    prefetchw(&lock->slock);
>    __asm__ __volatile__(
> "1:    ldrex    %0, [%3]\n"
> "    add    %1, %0, %4\n"
> "    strex    %2, %1, [%3]\n"
> "    teq    %2, #0\n"
> "    bne    1b"
>    : "=&r" (lockval), "=&r" (newval), "=&r" (tmp)
>    : "r" (&lock->slock), "I" (1 << TICKET_SHIFT)
>    : "cc");
> 
>    while (lockval.tickets.next != lockval.tickets.owner) {
>        wfe();
>        lockval.tickets.owner = ACCESS_ONCE(lock->tickets.owner);
>    }
> 
>    smp_mb();
> }
> 
> In the last while loop. That's why we though immediately that it is caused by our
> implementation of atomic instructions.
> 
> We failed to unlock the mutex a lot of time during the boot, not because of data
> abort but I think because we can be interrupted during a strex (eg: there are two
> branches)?
> 
> We decided to implement the whole atomic instruction inside an helper but is that
> possible to get the data with eg: cpu_physical_memory_rw instead of the normal
> generated code?

Yes, but make sure you store pc in env before you enter the helper.

Alex

> 
> One other thing which looks suspicious it seems there is one pair of
> exclusive_addr/exclusive_val per CPU is that normal?
> 
> Thanks,
> Fred

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_*.
  2015-02-26 18:09     ` Frederic Konrad
  2015-02-26 20:36       ` Alexander Graf
@ 2015-02-26 22:56       ` Peter Maydell
  2015-02-27  7:54         ` Mark Burton
  1 sibling, 1 reply; 62+ messages in thread
From: Peter Maydell @ 2015-02-26 22:56 UTC (permalink / raw)
  To: Frederic Konrad
  Cc: mttcg, J. Kiszka, Mark Burton, QEMU Developers, Alexander Graf,
	Paolo Bonzini

On 27 February 2015 at 03:09, Frederic Konrad <fred.konrad@greensocs.com> wrote:
> On 29/01/2015 16:17, Peter Maydell wrote:
>>
>> On 16 January 2015 at 17:19,  <fred.konrad@greensocs.com> wrote:
>>>
>>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>>
>>> This adds a lock to avoid multiple exclusive access at the same time in
>>> case of
>>> TCG multithread.

>> All the same comments I had on this patch earlier still apply:
>>
>>   * I think adding mutex handling code to all the target-*
>>     frontends rather than providing facilities in common
>>     code for them to use is the wrong approach
>>   * You will fail to unlock the mutex if the ldrex or strex
>>     takes a data abort
>>   * This is making no attempt to learn from or unify with
>>     the existing attempts at handling exclusives in linux-user.
>>     When we've done this work we should have a single
>>     mechanism for handling exclusives in a multithreaded
>>     host environment which is used by both softmmu and useronly
>>     configs

> We decided to implement the whole atomic instruction inside an helper

...which is a different approach which still isn't really
addressing any of my remarks in the list above...

> but is
> that
> possible to get the data with eg: cpu_physical_memory_rw instead of the
> normal
> generated code?

cpu_physical_memory_rw would bypass the TLB and so be much slower.
Make sure you use the functions which go via the TLB if you do
this in a helper (and remember that they will longjmp out on a
tlb miss!)
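
For example (a sketch: cpu_ldl_data is one of the cpu_ldst.h accessors
that go via the TLB):

/* Loads through the softmmu TLB.  On a faulting miss this longjmps back
 * to the CPU loop, so any mutex held here must be released on that path
 * (e.g. in raise_exception). */
uint32_t HELPER(atomic_ldl)(CPUARMState *env, uint32_t addr)
{
    return cpu_ldl_data(env, addr);
}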

> One other thing which looks suspicious it seems there is one pair of
> exclusive_addr/exclusive_val per CPU is that normal?

Pretty sure we've already discussed how the current ldrex/strex
implementation is not architecturally correct. I think this is
another of those areas.

In general I'd be much happier seeing a proper sketch of your
design, what data structures etc you intend to share between
CPUs and which are per-CPU, what generic mechanisms you plan
to provide to allow targets to implement atomic instructions, etc.
It's quite hard to see the whole picture at the moment.

-- PMM

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_*.
  2015-02-26 22:56       ` Peter Maydell
@ 2015-02-27  7:54         ` Mark Burton
  2015-03-02 12:27           ` Peter Maydell
  0 siblings, 1 reply; 62+ messages in thread
From: Mark Burton @ 2015-02-27  7:54 UTC (permalink / raw)
  To: Peter Maydell
  Cc: mttcg, J. Kiszka, Alexander Graf, QEMU Developers, Paolo Bonzini,
	KONRAD Frédéric


> On 26 Feb 2015, at 23:56, Peter Maydell <peter.maydell@linaro.org> wrote:
> 
> On 27 February 2015 at 03:09, Frederic Konrad <fred.konrad@greensocs.com> wrote:
>> On 29/01/2015 16:17, Peter Maydell wrote:
>>> 
>>> On 16 January 2015 at 17:19,  <fred.konrad@greensocs.com> wrote:
>>>> 
>>>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>>> 
>>>> This adds a lock to avoid multiple exclusive access at the same time in
>>>> case of
>>>> TCG multithread.
> 
>>> All the same comments I had on this patch earlier still apply:
>>> 
>>>  * I think adding mutex handling code to all the target-*
>>>    frontends rather than providing facilities in common
>>>    code for them to use is the wrong approach
>>>  * You will fail to unlock the mutex if the ldrex or strex
>>>    takes a data abort
>>>  * This is making no attempt to learn from or unify with
>>>    the existing attempts at handling exclusives in linux-user.
>>>    When we've done this work we should have a single
>>>    mechanism for handling exclusives in a multithreaded
>>>    host environment which is used by both softmmu and useronly
>>>    configs
> 
>> We decided to implement the whole atomic instruction inside an helper
> 
> ...which is a different approach which still isn't really
> addressing any of my remarks in the list above…

We agree on the above point. For atomic instructions, I think we discussed
at length what to do. However we chose to ‘ignore’ the problem for now and
to ‘hack’ something just to get it working initially. Basically at this
stage we want something mostly working so we can work on individual bits
of the code and address them in more careful detail. Our issue with
atomicity is that the ‘hack’ we put in place seems to allow a race
condition, and we can’t see why :-(
But - overall, the plan is absolutely to provide a better implementation….

> 
>> but is
>> that
>> possible to get the data with eg: cpu_physical_memory_rw instead of the
>> normal
>> generated code?
> 
> cpu_physical_memory_rw would bypass the TLB and so be much slower.
> Make sure you use the functions which go via the TLB if you do
> this in a helper (and remember that they will longjmp out on a
> tlb miss!)

At this point speed isn’t our main concern - it’s simplicity of
implementation - we want it to work, then we can worry about a better
implementation (which likely should not go this path at all - as
discussed above).
Given that - isn’t it reasonable to pass through cpu_physical_memory_rw -
and hence not have to worry about the longjmp? Or am I missing something?

> 
>> One other thing which looks suspicious it seems there is one pair of
>> exclusive_addr/exclusive_val per CPU is that normal?
> 
> Pretty sure we've already discussed how the current ldrex/strex
> implementation is not architecturally correct. I think this is
> another of those areas.

We have indeed discussed this - but this is a surprise. What we’ve found
is that the ‘globals’ (which, when we discussed it, we assumed were
indeed global) seem not to be global at all.
If 2 CPUs both wanted to write the same strex, and both wrote the address
and current values to these variables, right now I believe it would be
theoretically possible that both CPUs would end up successfully
completing the strex. If the TB exited on the second branch, I suspect
you could have a race condition which would lead to a failure? However I
guess this is unlikely.


> 
> In general I'd be much happier seeing a proper sketch of your
> design, what data structures etc you intend to share between
> CPUs and which are per-CPU, what generic mechanisms you plan
> to provide to allow targets to implement atomic instructions, etc.
> It's quite hard to see the whole picture at the moment.


Agreed - at this point - as I say - we are trying to get something to be ‘just' stable enough that we can then use it to move forward from.
But I totally agree - we should be clearer about the overall picture, (once we can see the wood for the trees)

Cheers

Mark.


> 
> -- PMM



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_*.
  2015-02-27  7:54         ` Mark Burton
@ 2015-03-02 12:27           ` Peter Maydell
  2015-03-03 15:29             ` Mark Burton
  0 siblings, 1 reply; 62+ messages in thread
From: Peter Maydell @ 2015-03-02 12:27 UTC (permalink / raw)
  To: Mark Burton
  Cc: mttcg, J. Kiszka, Alexander Graf, QEMU Developers, Paolo Bonzini,
	KONRAD Frédéric

On 27 February 2015 at 16:54, Mark Burton <mark.burton@greensocs.com> wrote:
>
>> On 26 Feb 2015, at 23:56, Peter Maydell <peter.maydell@linaro.org> wrote:
>> cpu_physical_memory_rw would bypass the TLB and so be much slower.
>> Make sure you use the functions which go via the TLB if you do
>> this in a helper (and remember that they will longjmp out on a
>> tlb miss!)
>
> At this point speed isn’t our main concern - it’s simplicity of implementation - we want it to work, then we can worry about a better implementation (which likely should not go this path at all - as discussed above).
> Given that - isn’t it reasonable to pass through cpu_physical_memory_rw - and hence not have to worry about the long jump ? Or am I missing something?

If you use cpu_physical_memory_rw you need to do the
virt-to-phys translation by hand (preferably via the TLB).
That might be something you needed to do anyway if we want
to have architecturally correct monitors that work on
physaddrs rather than vaddrs, but if not then the two
step process is a bit awkward.
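
A hedged sketch of that two-step process (cpu_get_phys_page_debug is a
real API, but it ignores access permissions, so this is illustration
only, not a correct fast path):

/* Translate the virtual address by hand, then use the physical API. */
static bool read_guest(CPUState *cs, vaddr addr, void *buf, int len)
{
    hwaddr phys = cpu_get_phys_page_debug(cs, addr & TARGET_PAGE_MASK);

    if (phys == -1) {
        return false;          /* unmapped: would have to take a fault */
    }
    phys |= addr & ~TARGET_PAGE_MASK;
    cpu_physical_memory_rw(phys, (uint8_t *)buf, len, 0 /* read */);
    return true;
}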

>> Pretty sure we've already discussed how the current ldrex/strex
>> implementation is not architecturally correct. I think this is
>> another of those areas.
>
> We have indeed discussed this - but this is a surprise.

You're right that I didn't specifically realise this exact
part of our incorrectness earlier.

-- PMM

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_*.
  2015-03-02 12:27           ` Peter Maydell
@ 2015-03-03 15:29             ` Mark Burton
  2015-03-03 15:32               ` Paolo Bonzini
  0 siblings, 1 reply; 62+ messages in thread
From: Mark Burton @ 2015-03-03 15:29 UTC (permalink / raw)
  To: Peter Maydell
  Cc: mttcg, J. Kiszka, Alexander Graf, QEMU Developers, Paolo Bonzini,
	KONRAD Frédéric

Hi Peter, thanks for the feedback

So - what we have discovered is that using the slow path as discussed has
a significant performance hit (no surprise), likewise the user-mode exit
mechanism. However, we are investigating that in parallel...

However, for now, we have left the TCG doing the majority of the work.
(This was supposed to be the ‘quick and easy’ approach - we have already
discussed better approaches to atomic instructions at length - but they
are even more invasive!)

We have tried several schemes, and not found anything totally
satisfactory: we have a gremlin. There seems to be a race condition
somewhere that effectively means the ‘ticket’ mechanism inside the guest
kernel goes off by one, and therefore the guest kernel stalls as it never
gets the ticket it’s looking for… (it’s off by one in the ‘one too few’
sense; both CPUs end up looking for tickets that are higher than the
current finished ticket)...

The mechanism to hand out the tickets is of course an LDREX/STREX,
leading us to believe that is the cause of our problems; however I am
increasingly sceptical.


Our scheme is actually quite simple now:
We keep the same overall scheme as exists upstream today - we store the address and value.
We provide a lock which is used for LDREX, STREX, CLREX, and in the raise_exception code.
For LDREX we check that no other CPU is tagging the address. Because of the lock, we can be sure no STREX is executing so we are always safe to ‘steal’ a tag from another CPU (which is what the ARM ARM defines).
STREX and CLREX are similar to upstream...
We are also careful to invalidate the address tag if we take an exception.
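
A hedged sketch of that scheme as helpers (names and the tag-stealing
hook are invented; cpu_ldl_data/cpu_stl_data are the TLB-going accessors
and may longjmp, in which case raise_exception must drop the lock):

static QemuMutex exclusive_lock;   /* one lock for LDREX/STREX/CLREX */

uint32_t HELPER(ldrex32)(CPUARMState *env, uint32_t addr)
{
    uint32_t val;

    qemu_mutex_lock(&exclusive_lock);
    steal_tag_from_other_cpus(addr);   /* invented: clear others' tags */
    val = cpu_ldl_data(env, addr);     /* may longjmp on a TLB miss */
    env->exclusive_addr = addr;
    env->exclusive_val = val;
    qemu_mutex_unlock(&exclusive_lock);
    return val;
}

uint32_t HELPER(strex32)(CPUARMState *env, uint32_t addr, uint32_t newval)
{
    uint32_t fail = 1;                 /* ARM: 1 = store did not happen */

    qemu_mutex_lock(&exclusive_lock);
    if (env->exclusive_addr == addr &&
        cpu_ldl_data(env, addr) == env->exclusive_val) {
        cpu_stl_data(env, addr, newval);
        fail = 0;
    }
    env->exclusive_addr = -1;          /* always clear the tag */
    qemu_mutex_unlock(&exclusive_lock);
    return fail;
}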

The drawback of this scheme is of course that we are not totally
protected against non-exclusive writes, which could potentially fall
between our reading a value and checking it against our saved value. But
I am not convinced that is our problem (I have checked to make sure we
don't get non-exclusive writes).

This scheme would seem to be elegant and simple - but it suffers from the
one small drawback of still showing the issue described above…


Cheers

Mark.

ps. on our bug - we believe somehow the STREX is being marked as failed,
but actually succeeds in writing something.  There are only 3 ways the
strex can fail:
1/ the address doesn't match - in which case no ST will be attempted
2/ the value doesn't match - which means - no ST attempted
3/ the store starts, but causes a TLB fill/exception…

The 3rd has 2 possibilities - the TLB is filled, and the store goes ahead totally normally - there should be no ‘fail’ - or an exception is generated in which case we will long jump away and never return. 

Am I missing something?





> On 2 Mar 2015, at 13:27, Peter Maydell <peter.maydell@linaro.org> wrote:
> 
> On 27 February 2015 at 16:54, Mark Burton <mark.burton@greensocs.com> wrote:
>> 
>>> On 26 Feb 2015, at 23:56, Peter Maydell <peter.maydell@linaro.org> wrote:
>>> cpu_physical_memory_rw would bypass the TLB and so be much slower.
>>> Make sure you use the functions which go via the TLB if you do
>>> this in a helper (and remember that they will longjmp out on a
>>> tlb miss!)
>> 
>> At this point speed isn’t our main concern - it’s simplicity of implementation - we want it to work, then we can worry about a better implementation (which likely should not go this path at all - as discussed above).
>> Given that - isn’t it reasonable to pass through cpu_physical_memory_rw - and hence not have to worry about the long jump ? Or am I missing something?
> 
> If you use cpu_physical_memory_rw you need to do the
> virt-to-phys translation by hand (preferably via the TLB).
> That might be something you needed to do anyway if we want
> to have architecturally correct monitors that work on
> physaddrs rather than vaddrs, but if not then the two
> step process is a bit awkward.
> 
>>> Pretty sure we've already discussed how the current ldrex/strex
>>> implementation is not architecturally correct. I think this is
>>> another of those areas.
>> 
>> We have indeed discussed this - but this is a surprise.
> 
> You're right that I didn't specifically realise this exact
> part of our incorrectness earlier.
> 
> -- PMM



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_*.
  2015-03-03 15:29             ` Mark Burton
@ 2015-03-03 15:32               ` Paolo Bonzini
  2015-03-03 15:33                 ` Mark Burton
  0 siblings, 1 reply; 62+ messages in thread
From: Paolo Bonzini @ 2015-03-03 15:32 UTC (permalink / raw)
  To: Mark Burton, Peter Maydell
  Cc: mttcg, Alexander Graf, J. Kiszka, QEMU Developers,
	KONRAD Frédéric



On 03/03/2015 16:29, Mark Burton wrote:
> 
> ps. on our bug - we believe somehow the STREX is being marked as
> failed, but actually succeeds to write something.  There are only 3
> ways the strex can fail: 1/ the address doesn't match - in which case
> no ST will be attempted 2/ the value doesn't match - which means - no
> ST attempted 3/ the store starts, but causes a TLB fill/exception…
> 
> The 3rd has 2 possibilities - the TLB is filled, and the store goes
> ahead totally normally - there should be no ‘fail’ - or an exception
> is generated in which case we will long jump away and never return.

When do you release the lock?

Paolo

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_*.
  2015-03-03 15:32               ` Paolo Bonzini
@ 2015-03-03 15:33                 ` Mark Burton
  2015-03-03 15:34                   ` Paolo Bonzini
  2015-03-03 15:47                   ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 62+ messages in thread
From: Mark Burton @ 2015-03-03 15:33 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: mttcg, Peter Maydell, J. Kiszka, Alexander Graf, QEMU Developers,
	KONRAD Frédéric


> On 3 Mar 2015, at 16:32, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> 
> 
> On 03/03/2015 16:29, Mark Burton wrote:
>> 
>> ps. on our bug - we believe somehow the STREX is being marked as
>> failed, but actually succeeds to write something.  There are only 3
>> ways the strex can fail: 1/ the address doesn't match - in which case
>> no ST will be attempted 2/ the value doesn't match - which means - no
>> ST attempted 3/ the store starts, but causes a TLB fill/exception…
>> 
>> The 3rd has 2 possibilities - the TLB is filled, and the store goes
>> ahead totally normally - there should be no ‘fail’ - or an exception
>> is generated in which case we will long jump away and never return.
> 
> When do you release the lock?
> 
(Thanks Paolo!)

We release the lock in either
a) the end of the strex
or
b) in the ‘raise_exception’

Cheers

Mark.

> Paolo



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_*.
  2015-03-03 15:33                 ` Mark Burton
@ 2015-03-03 15:34                   ` Paolo Bonzini
  2015-03-03 15:41                     ` Mark Burton
  2015-03-03 15:47                   ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 62+ messages in thread
From: Paolo Bonzini @ 2015-03-03 15:34 UTC (permalink / raw)
  To: Mark Burton
  Cc: mttcg, Peter Maydell, J. Kiszka, Alexander Graf, QEMU Developers,
	KONRAD Frédéric



On 03/03/2015 16:33, Mark Burton wrote:
> 
>> On 3 Mar 2015, at 16:32, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>>
>>
>> On 03/03/2015 16:29, Mark Burton wrote:
>>>
>>> ps. on our bug - we believe somehow the STREX is being marked as
>>> failed, but actually succeeds to write something.  There are only 3
>>> ways the strex can fail: 1/ the address doesn't match - in which case
>>> no ST will be attempted 2/ the value doesn't match - which means - no
>>> ST attempted 3/ the store starts, but causes a TLB fill/exception…
>>>
>>> The 3rd has 2 possibilities - the TLB is filled, and the store goes
>>> ahead totally normally - there should be no ‘fail’ - or an exception
>>> is generated in which case we will long jump away and never return.
>>
>> When do you release the lock?
>>
> (Thanks Paolo!)
> 
> We release the lock in either
> a) the end of the strex
> or
> b) in the ‘raise_exception’

That seems right... Can you post the patch?

Paolo

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_*.
  2015-03-03 15:34                   ` Paolo Bonzini
@ 2015-03-03 15:41                     ` Mark Burton
  0 siblings, 0 replies; 62+ messages in thread
From: Mark Burton @ 2015-03-03 15:41 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: mttcg, Peter Maydell, J. Kiszka, Alexander Graf, QEMU Developers,
	KONRAD Frédéric


we’ll try and clean a patch up to show just this…….
THANKS!


Cheers
Mark.

> On 3 Mar 2015, at 16:34, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> 
> 
> On 03/03/2015 16:33, Mark Burton wrote:
>> 
>>> On 3 Mar 2015, at 16:32, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> 
>>> 
>>> 
>>> On 03/03/2015 16:29, Mark Burton wrote:
>>>> 
>>>> ps. on our bug - we believe somehow the STREX is being marked as
>>>> failed, but actually succeeds to write something.  There are only 3
>>>> ways the strex can fail: 1/ the address doesn't match - in which case
>>>> no ST will be attempted 2/ the value doesn't match - which means - no
>>>> ST attempted 3/ the store starts, but causes a TLB fill/exception…
>>>> 
>>>> The 3rd has 2 possibilities - the TLB is filled, and the store goes
>>>> ahead totally normally - there should be no ‘fail’ - or an exception
>>>> is generated in which case we will long jump away and never return.
>>> 
>>> When do you release the lock?
>>> 
>> (Thanks Paolo!)
>> 
>> We release the lock in either
>> a) the end of the strex
>> or
>> b) in the ‘raise_exception’
> 
> That seems right... Can you post the patch?
> 
> Paolo



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_*.
  2015-03-03 15:33                 ` Mark Burton
  2015-03-03 15:34                   ` Paolo Bonzini
@ 2015-03-03 15:47                   ` Dr. David Alan Gilbert
  2015-03-13 19:38                     ` Richard Henderson
  1 sibling, 1 reply; 62+ messages in thread
From: Dr. David Alan Gilbert @ 2015-03-03 15:47 UTC (permalink / raw)
  To: Mark Burton
  Cc: mttcg, Peter Maydell, J. Kiszka, Alexander Graf, QEMU Developers,
	Paolo Bonzini, KONRAD Frédéric

* Mark Burton (mark.burton@greensocs.com) wrote:
> 
> > On 3 Mar 2015, at 16:32, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > 
> > 
> > 
> > On 03/03/2015 16:29, Mark Burton wrote:
> >> 
> >> ps. on our bug - we believe somehow the STREX is being marked as
> >> failed, but actually succeeds to write something.  There are only 3
> >> ways the strex can fail: 1/ the address doesn't match - in which case
> >> no ST will be attempted 2/ the value doesn't match - which means - no
> >> ST attempted 3/ the store starts, but causes a TLB fill/exception…
> >> 
> >> The 3rd has 2 possibilities - the TLB is filled, and the store goes
> >> ahead totally normally - there should be no ‘fail’ - or an exception
> >> is generated in which case we will long jump away and never return.
> > 
> > When do you release the lock?
> > 
> (Thanks Paolo!)
> 
> We release the lock in either
> a) the end of the strex
> or
> b) in the ‘raise_exception’

That works for ARM where you have to terminate a ldrex with a strex or clrex,
but not all architectures have the equivalent of a clrex; most as I remember
just let you do an ldrex equivalent, decide you don't want to do the strex equivalent
and get on with life. (Was clrex originally in ARM when ldrex went in?)

Dave

> 
> Cheers
> 
> Mark.
> 
> > Paolo
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_*.
  2015-03-03 15:47                   ` Dr. David Alan Gilbert
@ 2015-03-13 19:38                     ` Richard Henderson
  2015-03-13 20:04                       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 62+ messages in thread
From: Richard Henderson @ 2015-03-13 19:38 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Mark Burton
  Cc: mttcg, Peter Maydell, J. Kiszka, QEMU Developers, Alexander Graf,
	Paolo Bonzini, KONRAD Frédéric

On 03/03/2015 07:47 AM, Dr. David Alan Gilbert wrote:
> That works for ARM where you have to terminate a ldrex with a strex or clrex,
> but not all architectures have the equivalent of a clrex; most as I remember
> just let you do an ldrex equivalent, decide you don't want to do the strex equivalent
> and get on with life.


I'm pretty sure that's not the case.  In fact, I can guarantee you that GCC
never issues clrex, but does in fact simply do nothing like you describe for
other architectures if we decide not to do the store.


r~

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_*.
  2015-03-13 19:38                     ` Richard Henderson
@ 2015-03-13 20:04                       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 62+ messages in thread
From: Dr. David Alan Gilbert @ 2015-03-13 20:04 UTC (permalink / raw)
  To: Richard Henderson
  Cc: mttcg, Peter Maydell, J. Kiszka, Mark Burton, QEMU Developers,
	Alexander Graf, Paolo Bonzini, KONRAD Frédéric

* Richard Henderson (rth@twiddle.net) wrote:
> On 03/03/2015 07:47 AM, Dr. David Alan Gilbert wrote:
> > That works for ARM where you have to terminate a ldrex with a strex or clrex,
> > but not all architectures have the equivalent of a clrex; most as I remember
> > just let you do an ldrex equivalent, decide you don't want to do the strex equivalent
> > and get on with life.
> 
> 
> I'm pretty sure that's not the case.  In fact, I can guarantee you that GCC
> never issues clrex, but does in fact simply do nothing like you describe for
> other architectures if we decide not to do the store.

Oh well, that means this technique won't work even for ARM, where I
thought it might stand a chance, if nowhere else.
I've still got a vague memory that some ARM docs at one point told you that
you should terminate an LDREX by either an STREX or a CLREX; but it's ~3.5 years
since I did any arm wrestling.

Dave

> r~
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.
  2015-01-16 17:19 [Qemu-devel] [RFC 00/10] MultiThread TCG fred.konrad
                   ` (9 preceding siblings ...)
  2015-01-16 17:19 ` [Qemu-devel] [RFC 10/10] tcg: switch on multithread fred.konrad
@ 2015-03-27 10:08 ` Alex Bennée
  2015-03-27 10:37   ` Frederic Konrad
  10 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2015-03-27 10:08 UTC (permalink / raw)
  To: fred.konrad
  Cc: mttcg, peter.maydell, jan.kiszka, mark.burton, qemu-devel, agraf,
	pbonzini


fred.konrad@greensocs.com writes:

> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> Hi everybody,
>
> This is the start of our work on TCG multithread.

It's been a while since the first RFC; are we likely to see a v2 of the
patch series any time soon?

>
> We send it for comment to be sure we are taking the right direction.
> We already discussed the first patch but we keep it for simplicity.
>
> We choice to keep a common tbs array for all VCPU but protect it with the
> tb_lock from TBContext. Then for each PageDesc we have a tb list per VCPU.
>
> Known issues:
>   * Some random deadlock.
>   * qemu_pause_cond is broken we can't quit QEMU.
>   * tb_flush is broken we must make sure no VCPU are executing code.
>
> Jan Kiszka (1):
>   Drop global lock during TCG code execution
>
> KONRAD Frederic (9):
>   target-arm: protect cpu_exclusive_*.
>   use a different translation block list for each cpu.
>   replace spinlock by QemuMutex.
>   remove unused spinlock.
>   extract TBContext from TCGContext.
>   protect TBContext with tb_lock.
>   tcg: remove tcg_halt_cond global variable.
>   cpu: remove exit_request global.
>   tcg: switch on multithread.
>
>  cpu-exec.c                |  47 +++++++----
>  cpus.c                    | 122 +++++++++++----------------
>  cputlb.c                  |   5 ++
>  exec.c                    |  25 ++++++
>  include/exec/exec-all.h   |   4 +-
>  include/exec/spinlock.h   |  49 -----------
>  include/qom/cpu.h         |   1 +
>  linux-user/main.c         |   6 +-
>  scripts/checkpatch.pl     |   9 +-
>  softmmu_template.h        |   6 ++
>  target-arm/cpu.c          |  14 ++++
>  target-arm/cpu.h          |   3 +
>  target-arm/helper.h       |   3 +
>  target-arm/op_helper.c    |  10 +++
>  target-arm/translate.c    |   6 ++
>  target-i386/mem_helper.c  |  16 +++-
>  target-i386/misc_helper.c |  27 +++++-
>  tcg/i386/tcg-target.c     |   8 ++
>  tcg/tcg.h                 |   3 +-
>  translate-all.c           | 208 +++++++++++++++++++++++++++-------------------
>  vl.c                      |   6 ++
>  21 files changed, 335 insertions(+), 243 deletions(-)
>  delete mode 100644 include/exec/spinlock.h

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.
  2015-03-27 10:08 ` [Qemu-devel] [RFC 00/10] MultiThread TCG Alex Bennée
@ 2015-03-27 10:37   ` Frederic Konrad
  2015-03-30  6:52     ` Mark Burton
  0 siblings, 1 reply; 62+ messages in thread
From: Frederic Konrad @ 2015-03-27 10:37 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, jan.kiszka, mark.burton, agraf, qemu-devel,
	pbonzini

Hi,

Yes a v2 will come soon.
Actually I'm trying to unbreak the GDB stub and track down a strange
segfault with smp > 2.

Fred

On 27/03/2015 11:08, Alex Bennée wrote:
> fred.konrad@greensocs.com writes:
>
>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>
>> Hi everybody,
>>
>> This is the start of our work on TCG multithread.
> It's been a while since the first RFC are we likely to see a v2 of the
> patch series any time soon?
>
>> We send it for comment to be sure we are taking the right direction.
>> We already discussed the first patch but we keep it for simplicity.
>>
>> We choice to keep a common tbs array for all VCPU but protect it with the
>> tb_lock from TBContext. Then for each PageDesc we have a tb list per VCPU.
>>
>> Known issues:
>>    * Some random deadlock.
>>    * qemu_pause_cond is broken we can't quit QEMU.
>>    * tb_flush is broken we must make sure no VCPU are executing code.
>>
>> Jan Kiszka (1):
>>    Drop global lock during TCG code execution
>>
>> KONRAD Frederic (9):
>>    target-arm: protect cpu_exclusive_*.
>>    use a different translation block list for each cpu.
>>    replace spinlock by QemuMutex.
>>    remove unused spinlock.
>>    extract TBContext from TCGContext.
>>    protect TBContext with tb_lock.
>>    tcg: remove tcg_halt_cond global variable.
>>    cpu: remove exit_request global.
>>    tcg: switch on multithread.
>>
>>   cpu-exec.c                |  47 +++++++----
>>   cpus.c                    | 122 +++++++++++----------------
>>   cputlb.c                  |   5 ++
>>   exec.c                    |  25 ++++++
>>   include/exec/exec-all.h   |   4 +-
>>   include/exec/spinlock.h   |  49 -----------
>>   include/qom/cpu.h         |   1 +
>>   linux-user/main.c         |   6 +-
>>   scripts/checkpatch.pl     |   9 +-
>>   softmmu_template.h        |   6 ++
>>   target-arm/cpu.c          |  14 ++++
>>   target-arm/cpu.h          |   3 +
>>   target-arm/helper.h       |   3 +
>>   target-arm/op_helper.c    |  10 +++
>>   target-arm/translate.c    |   6 ++
>>   target-i386/mem_helper.c  |  16 +++-
>>   target-i386/misc_helper.c |  27 +++++-
>>   tcg/i386/tcg-target.c     |   8 ++
>>   tcg/tcg.h                 |   3 +-
>>   translate-all.c           | 208 +++++++++++++++++++++++++++-------------------
>>   vl.c                      |   6 ++
>>   21 files changed, 335 insertions(+), 243 deletions(-)
>>   delete mode 100644 include/exec/spinlock.h

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.
  2015-03-27 10:37   ` Frederic Konrad
@ 2015-03-30  6:52     ` Mark Burton
  2015-03-30 21:46       ` Peter Maydell
  0 siblings, 1 reply; 62+ messages in thread
From: Mark Burton @ 2015-03-30  6:52 UTC (permalink / raw)
  To: KONRAD Frédéric
  Cc: mttcg, Peter Maydell, Jan Kiszka, Alexander Graf,
	QEMU Developers, pbonzini, Alex Bennée

to add some detail,
Unfortunately Fred is away this week, so we won’t get this patch set to you as quickly as I’d have liked.

We have a ‘working’ implementation - where ‘working’ is limited to a couple of SMP cores, booting and running Dhrystone. The performance improvement we get is close to the 2x that you might expect (especially for such a simple test as Dhrystone). HOWEVER…

We still have the ‘hacked’ version of tb_first (if you remember, we have made that into an array, addressed by current_cpu). This has two effects (at least):
	1/ it’s probably unnecessary, but we DO need to protect CPUs from each other when they invalidate tb_first (probably not when they write, as the current implementation only allows 1 thread to generate TBs)
	2/ it breaks GDB as current_cpu isn’t set in the IO thread, so that just seg-faults and dies…
We are in the process of working out how to fix this mess cleanly. 

So - Fred is unwilling to send the patch set as it stands, because frankly this part is totally broken.

There is an independent patch set that needs splitting out which deals with just the atomic instruction issue - specifically for ARM (though I guess it’s applicable across the board)…

So - in short - I HOPE to get the patch set onto the reflector sometime next week, and I’m sorry for the delay.

When we do send it - it will be sent RFC.
It is possible that we all agree that the atomic instruction fix is applicable, but the rest is almost certainly not. It’s there to show progress and the direction we’re going and give everybody a basis from which to discuss the issues…. not more.


Cheers

Mark.



> On 27 Mar 2015, at 11:37, Frederic Konrad <fred.konrad@greensocs.com> wrote:
> 
> Hi,
> 
> Yes a v2 will come soon.
> Actually I'm trying to unbreak GDB stub and a strange segfault with smp > 2.
> 
> Fred
> 
> On 27/03/2015 11:08, Alex Bennée wrote:
>> fred.konrad@greensocs.com writes:
>> 
>>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>> 
>>> Hi everybody,
>>> 
>>> This is the start of our work on TCG multithread.
>> It's been a while since the first RFC are we likely to see a v2 of the
>> patch series any time soon?
>> 
>>> We send it for comment to be sure we are taking the right direction.
>>> We already discussed the first patch but we keep it for simplicity.
>>> 
>>> We choice to keep a common tbs array for all VCPU but protect it with the
>>> tb_lock from TBContext. Then for each PageDesc we have a tb list per VCPU.
>>> 
>>> Known issues:
>>>   * Some random deadlock.
>>>   * qemu_pause_cond is broken we can't quit QEMU.
>>>   * tb_flush is broken we must make sure no VCPU are executing code.
>>> 
>>> Jan Kiszka (1):
>>>   Drop global lock during TCG code execution
>>> 
>>> KONRAD Frederic (9):
>>>   target-arm: protect cpu_exclusive_*.
>>>   use a different translation block list for each cpu.
>>>   replace spinlock by QemuMutex.
>>>   remove unused spinlock.
>>>   extract TBContext from TCGContext.
>>>   protect TBContext with tb_lock.
>>>   tcg: remove tcg_halt_cond global variable.
>>>   cpu: remove exit_request global.
>>>   tcg: switch on multithread.
>>> 
>>>  cpu-exec.c                |  47 +++++++----
>>>  cpus.c                    | 122 +++++++++++----------------
>>>  cputlb.c                  |   5 ++
>>>  exec.c                    |  25 ++++++
>>>  include/exec/exec-all.h   |   4 +-
>>>  include/exec/spinlock.h   |  49 -----------
>>>  include/qom/cpu.h         |   1 +
>>>  linux-user/main.c         |   6 +-
>>>  scripts/checkpatch.pl     |   9 +-
>>>  softmmu_template.h        |   6 ++
>>>  target-arm/cpu.c          |  14 ++++
>>>  target-arm/cpu.h          |   3 +
>>>  target-arm/helper.h       |   3 +
>>>  target-arm/op_helper.c    |  10 +++
>>>  target-arm/translate.c    |   6 ++
>>>  target-i386/mem_helper.c  |  16 +++-
>>>  target-i386/misc_helper.c |  27 +++++-
>>>  tcg/i386/tcg-target.c     |   8 ++
>>>  tcg/tcg.h                 |   3 +-
>>>  translate-all.c           | 208 +++++++++++++++++++++++++++-------------------
>>>  vl.c                      |   6 ++
>>>  21 files changed, 335 insertions(+), 243 deletions(-)
>>>  delete mode 100644 include/exec/spinlock.h
> 



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.
  2015-03-30  6:52     ` Mark Burton
@ 2015-03-30 21:46       ` Peter Maydell
  2015-03-31  6:41         ` Mark Burton
  2015-04-10 16:03         ` Frederic Konrad
  0 siblings, 2 replies; 62+ messages in thread
From: Peter Maydell @ 2015-03-30 21:46 UTC (permalink / raw)
  To: Mark Burton
  Cc: mttcg, Jan Kiszka, Alexander Graf, QEMU Developers,
	Paolo Bonzini, Alex Bennée, KONRAD Frédéric

On 30 March 2015 at 07:52, Mark Burton <mark.burton@greensocs.com> wrote:
> So - Fred is unwilling to send the patch set as it stands, because frankly this part is totally broken.
>
> There is an independent patch set that needs splitting out which deals with just the atomic instruction issue - specifically for ARM (though I guess it’s applicable across the board)…
>
> So - in short - I HOPE to get the patch set onto the reflector sometime next week, and I’m sorry for the delay.

What I really want to see is not so much the patch set
but the design sketch I asked for that lists the
various data structures and indicates which ones
are going to be per-cpu, which ones will be shared
(and with what locking), etc.

-- PMM


* Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.
  2015-03-30 21:46       ` Peter Maydell
@ 2015-03-31  6:41         ` Mark Burton
  2015-04-10 16:03         ` Frederic Konrad
  1 sibling, 0 replies; 62+ messages in thread
From: Mark Burton @ 2015-03-31  6:41 UTC (permalink / raw)
  To: Peter Maydell
  Cc: mttcg, Jan Kiszka, Alexander Graf, QEMU Developers,
	Paolo Bonzini, Alex Bennée, KONRAD Frédéric

understood.
Cheers
Mark.

> On 30 Mar 2015, at 23:46, Peter Maydell <peter.maydell@linaro.org> wrote:
> 
> On 30 March 2015 at 07:52, Mark Burton <mark.burton@greensocs.com> wrote:
>> So - Fred is unwilling to send the patch set as it stands, because frankly this part is totally broken.
>> 
>> There is an independent patch set that needs splitting out which deals with just the atomic instruction issue - specifically for ARM (though I guess it’s applicable across the board)…
>> 
>> So - in short - I HOPE to get the patch set onto the reflector sometime next week, and I’m sorry for the delay.
> 
> What I really want to see is not so much the patch set
> but the design sketch I asked for that lists the
> various data structures and indicates which ones
> are going to be per-cpu, which ones will be shared
> (and with what locking), etc.
> 
> -- PMM



* Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.
  2015-03-30 21:46       ` Peter Maydell
  2015-03-31  6:41         ` Mark Burton
@ 2015-04-10 16:03         ` Frederic Konrad
  2015-04-22 12:26           ` Frederic Konrad
  1 sibling, 1 reply; 62+ messages in thread
From: Frederic Konrad @ 2015-04-10 16:03 UTC (permalink / raw)
  To: Peter Maydell, Mark Burton
  Cc: mttcg, Jan Kiszka, QEMU Developers, Alexander Graf,
	Paolo Bonzini, Alex Bennée

On 30/03/2015 23:46, Peter Maydell wrote:
> On 30 March 2015 at 07:52, Mark Burton <mark.burton@greensocs.com> wrote:
>> So - Fred is unwilling to send the patch set as it stands, because frankly this part is totally broken.
>>
>> There is an independent patch set that needs splitting out which deals with just the atomic instruction issue - specifically for ARM (though I guess it’s applicable across the board)…
>>
>> So - in short - I HOPE to get the patch set onto the reflector sometime next week, and I’m sorry for the delay.
> What I really want to see is not so much the patch set
> but the design sketch I asked for that lists the
> various data structures and indicates which ones
> are going to be per-cpu, which ones will be shared
> (and with what locking), etc.
>
> -- PMM
Hi everybody,
Hi Peter,

I tried to recap what we did, how it "works", and what the status is:

All the mechanisms are basically unchanged.

A lot of TCG structures are not thread-safe, yet all TCG threads can run at
the same time and sometimes want to generate code simultaneously.

Translation block related structure:

struct TBContext {

     TranslationBlock *tbs;
     TranslationBlock *tb_phys_hash[CODE_GEN_PHYS_HASH_SIZE];
     int nb_tbs;
     /* any access to the tbs or the page table must use this lock */
     QemuMutex tb_lock;

     /* statistics */
     int tb_flush_count;
     int tb_phys_invalidate_count;

     int tb_invalidated_flag;
};

This structure is used in TCGContext: TBContext tb_ctx;

"tbs" is basically where the translated block are stored and 
tb_phys_hash an
hash table to find them quickly.

There are two solutions to prevent threading issues:
   A/ Keep a separate tb_ctx per vCPU.
   B/ Share one tb_ctx between all CPUs and protect access to it.

We took the second solution so all CPUs can benefit from the translated TBs.
TBContext is written to almost everywhere in translate-all.c.
When there are too many TBs, a tb_flush occurs and destroys the array; we
don't handle this case correctly right now.
tb_lock is already used by user-mode code, so we just convert it to a
QemuMutex so we can reuse it in system-mode.
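
Roughly, the discipline for solution B looks like the following standalone
sketch (simplified stand-in names and types, not the actual QEMU code):
every lookup and insertion in the shared hash, and the translation itself,
happen under tb_lock, so no duplicate TB can enter the pool.

#include <pthread.h>

/* Simplified stand-ins for TranslationBlock and tb_phys_hash. */
typedef struct TB {
    unsigned long phys_pc;
    struct TB *hash_next;
} TB;

#define TB_HASH_SIZE 4096
static TB *tb_hash_sketch[TB_HASH_SIZE];
static pthread_mutex_t tb_lock_sketch = PTHREAD_MUTEX_INITIALIZER;

/* Find a TB shared by all vCPUs, translating under the lock if missing. */
static TB *tb_find_sketch(unsigned long phys_pc,
                          TB *(*translate)(unsigned long))
{
    unsigned long h = phys_pc % TB_HASH_SIZE;
    TB *tb;

    pthread_mutex_lock(&tb_lock_sketch);
    for (tb = tb_hash_sketch[h]; tb; tb = tb->hash_next) {
        if (tb->phys_pc == phys_pc) {
            goto out;               /* another vCPU already translated it */
        }
    }
    tb = translate(phys_pc);        /* generate code under the same lock */
    tb->hash_next = tb_hash_sketch[h];
    tb_hash_sketch[h] = tb;
out:
    pthread_mutex_unlock(&tb_lock_sketch);
    return tb;
}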

struct TCGContext {
     uint8_t *pool_cur, *pool_end;
     TCGPool *pool_first, *pool_current, *pool_first_large;
     TCGLabel *labels;
     int nb_labels;
     int nb_globals;
     int nb_temps;

     /* goto_tb support */
     tcg_insn_unit *code_buf;
     uintptr_t *tb_next;
     uint16_t *tb_next_offset;
     uint16_t *tb_jmp_offset; /* != NULL if USE_DIRECT_JUMP */

     /* liveness analysis */
     uint16_t *op_dead_args; /* for each operation, each bit tells if the
                                corresponding argument is dead */
     uint8_t *op_sync_args;  /* for each operation, each bit tells if the
                                corresponding output argument needs to be
                                sync to memory. */

     /* tells in which temporary a given register is. It does not take
        into account fixed registers */
     int reg_to_temp[TCG_TARGET_NB_REGS];
     TCGRegSet reserved_regs;
     intptr_t current_frame_offset;
     intptr_t frame_start;
     intptr_t frame_end;
     int frame_reg;

     tcg_insn_unit *code_ptr;
     TCGTemp temps[TCG_MAX_TEMPS]; /* globals first, temps after */
     TCGTempSet free_temps[TCG_TYPE_COUNT * 2];

     GHashTable *helpers;

#ifdef CONFIG_PROFILER
     /* profiling info */
     int64_t tb_count1;
     int64_t tb_count;
     int64_t op_count; /* total insn count */
     int op_count_max; /* max insn per TB */
     int64_t temp_count;
     int temp_count_max;
     int64_t del_op_count;
     int64_t code_in_len;
     int64_t code_out_len;
     int64_t interm_time;
     int64_t code_time;
     int64_t la_time;
     int64_t opt_time;
     int64_t restore_count;
     int64_t restore_time;
#endif

#ifdef CONFIG_DEBUG_TCG
     int temps_in_use;
     int goto_tb_issue_mask;
#endif

     uint16_t gen_opc_buf[OPC_BUF_SIZE];
     TCGArg gen_opparam_buf[OPPARAM_BUF_SIZE];

     uint16_t *gen_opc_ptr;
     TCGArg *gen_opparam_ptr;
     target_ulong gen_opc_pc[OPC_BUF_SIZE];
     uint16_t gen_opc_icount[OPC_BUF_SIZE];
     uint8_t gen_opc_instr_start[OPC_BUF_SIZE];

     /* Code generation.  Note that we specifically do not use tcg_insn_unit
        here, because there's too much arithmetic throughout that relies
        on addition and subtraction working on bytes.  Rely on the GCC
        extension that allows arithmetic on void*.  */
     int code_gen_max_blocks;
     void *code_gen_prologue;
     void *code_gen_buffer;
     size_t code_gen_buffer_size;
     /* threshold to flush the translated code buffer */
     size_t code_gen_buffer_max_size;
     void *code_gen_ptr;

     TBContext tb_ctx;

     /* The TCGBackendData structure is private to tcg-target.c.  */
     struct TCGBackendData *be;
};

This structure is used to translate the TBs.
The easiest solution was to protect code generation so that only one CPU
generates code at a time; this is fine since we don't want duplicated TBs in
the pool anyway. It is achieved with the tb_lock used above, as the earlier
sketch shows.

TLB:

The TLB is per-CPU, so it is not really a problem since in our implementation
one CPU = one pthread. But sometimes a CPU wants to flush the TLB, through an
instruction for example, and it is very likely that another CPU in another
thread is executing code at the same time. That's why we chose to create a
tlb_flush mechanism: when a CPU wants to flush, it asks all CPUs to exit TCG,
waits for them, and then exits itself. This can be reused for tb_invalidate
and/or tb_flush as well, as sketched below.
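
A standalone sketch of that rendezvous (illustrative names, and a fixed CPU
count assumed; the real mechanism also has to cope with signals and kicks):

#include <pthread.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 2

static pthread_mutex_t rv_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  rv_cond = PTHREAD_COND_INITIALIZER;
static bool flush_pending;
static int  cpus_in_tcg = NR_CPUS_SKETCH;

/* Called by the vCPU that wants to flush. */
static void flush_rendezvous(void (*do_flush)(void))
{
    pthread_mutex_lock(&rv_lock);
    flush_pending = true;           /* ask every vCPU to leave TCG */
    cpus_in_tcg--;                  /* this vCPU is out as well */
    while (cpus_in_tcg > 0) {
        pthread_cond_wait(&rv_cond, &rv_lock);
    }
    do_flush();                     /* safe: nobody is executing TBs */
    flush_pending = false;
    cpus_in_tcg++;
    pthread_cond_broadcast(&rv_cond);
    pthread_mutex_unlock(&rv_lock);
}

/* Called by every other vCPU when it notices the pending request. */
static void cpu_exit_tcg(void)
{
    pthread_mutex_lock(&rv_lock);
    cpus_in_tcg--;
    pthread_cond_broadcast(&rv_cond);   /* we may be the last one out */
    while (flush_pending) {
        pthread_cond_wait(&rv_cond, &rv_lock);
    }
    cpus_in_tcg++;                  /* flush done, re-enter TCG */
    pthread_mutex_unlock(&rv_lock);
}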

Atomic instructions:

Atomic instructions are quite hard to implement.
The TranslationBlock implementing an atomic instruction can't be interrupted
during its execution (e.g. by an interrupt or a signal); a cmpxchg64 helper
is used for that.
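
The exclusive-access side (patch 01/10) can be pictured with this hedged
sketch: one global lock serialises the emulated LDREX/STREX pair, turning
STREX into a compare-and-swap done under the lock. The names here are
illustrative, not the patch itself.

#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t excl_lock = PTHREAD_MUTEX_INITIALIZER;
static uintptr_t excl_addr = (uintptr_t)-1;
static uint64_t  excl_val;

static uint64_t helper_ldrex_sketch(uint64_t *addr)
{
    uint64_t val;

    pthread_mutex_lock(&excl_lock);
    excl_addr = (uintptr_t)addr;
    excl_val = val = *addr;         /* remember the observed value */
    pthread_mutex_unlock(&excl_lock);
    return val;
}

/* Returns 0 on success, 1 on failure, like the ARM instruction. */
static int helper_strex_sketch(uint64_t *addr, uint64_t new_val)
{
    int fail = 1;

    pthread_mutex_lock(&excl_lock);
    if (excl_addr == (uintptr_t)addr && *addr == excl_val) {
        *addr = new_val;            /* cmpxchg-style store under the lock */
        fail = 0;
    }
    excl_addr = (uintptr_t)-1;      /* clear the monitor either way */
    pthread_mutex_unlock(&excl_lock);
    return fail;
}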

QEMU's global lock:

TCG threads take the lock during code execution. This is not OK for
multi-threading, because it means only one thread runs at a time. That's why
we took Jan's patch to allow TCG to run without the lock and take it only
when needed.
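
Concretely, the vCPU thread then only takes the global lock around device
access, roughly like this (device_read is a hypothetical accessor standing
in for the memory API; the BQL helpers are the existing QEMU functions):

#include <stdint.h>

/* Declarations of QEMU's existing BQL helpers. */
void qemu_mutex_lock_iothread(void);
void qemu_mutex_unlock_iothread(void);

/* Hypothetical device accessor, for illustration only. */
extern uint32_t device_read(void *opaque, uint64_t addr);

/* TBs execute without the BQL; it is taken just for the MMIO itself. */
static uint32_t io_readl_sketch(void *opaque, uint64_t addr)
{
    uint32_t val;

    qemu_mutex_lock_iothread();
    val = device_read(opaque, addr);
    qemu_mutex_unlock_iothread();
    return val;
}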

What is the status:

  * We can start a vexpress-a15 simulation with two A15 cores and run two
    Dhrystones at a time; performance is increased and it is quite stable.

What is missing:

  * tb_flush is not implemented correctly.
  * The PageDesc structure is not protected: the patch which introduced a
    first_tb array was not the right approach and has been removed. This
    implies that tb_invalidate is broken.

For both issues we plan to use the same mechanism as tlb_flush: exit all the
CPUs, flush or invalidate, and let them continue. A generic mechanism must be
implemented for that.

Known issues:

  * The GDB stub is broken because it uses tb_invalidate, which we haven't
    implemented yet, and there are probably other issues.
  * SMP > 2 crashes, probably because of tb_invalidate as well.
  * We don't know the status of the user-mode code, which is probably broken
    by our changes.


* Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.
  2015-04-10 16:03         ` Frederic Konrad
@ 2015-04-22 12:26           ` Frederic Konrad
  2015-04-22 13:18             ` Peter Maydell
                               ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: Frederic Konrad @ 2015-04-22 12:26 UTC (permalink / raw)
  To: Peter Maydell, Mark Burton
  Cc: mttcg, Jan Kiszka, QEMU Developers, Alexander Graf,
	Paolo Bonzini, Alex Bennée

On 10/04/2015 18:03, Frederic Konrad wrote:
> On 30/03/2015 23:46, Peter Maydell wrote:
>> On 30 March 2015 at 07:52, Mark Burton <mark.burton@greensocs.com> 
>> wrote:
>>> So - Fred is unwilling to send the patch set as it stands, because 
>>> frankly this part is totally broken.
>>>
>>> There is an independent patch set that needs splitting out which 
>>> deals with just the atomic instruction issue - specifically for ARM 
>>> (though I guess it’s applicable across the board)…
>>>
>>> So - in short - I HOPE to get the patch set onto the reflector 
>>> sometime next week, and I’m sorry for the delay.
>> What I really want to see is not so much the patch set
>> but the design sketch I asked for that lists the
>> various data structures and indicates which ones
>> are going to be per-cpu, which ones will be shared
>> (and with what locking), etc.
>>
>> -- PMM

Does that make sense?

BTW here is the repository:
git clone git@git.greensocs.com:fkonrad/mttcg.git -b multi_tcg_v4

Thanks,
Fred

> <snip>


* Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.
  2015-04-22 12:26           ` Frederic Konrad
@ 2015-04-22 13:18             ` Peter Maydell
  2015-04-23  7:38               ` Frederic Konrad
  2015-04-23 15:44             ` Alex Bennée
  2015-04-27 17:06             ` Emilio G. Cota
  2 siblings, 1 reply; 62+ messages in thread
From: Peter Maydell @ 2015-04-22 13:18 UTC (permalink / raw)
  To: Frederic Konrad
  Cc: mttcg, Jan Kiszka, Mark Burton, Alexander Graf, QEMU Developers,
	Paolo Bonzini, Alex Bennée

On 22 April 2015 at 13:26, Frederic Konrad <fred.konrad@greensocs.com> wrote:
[re summary of design]
> Does that make sense?

Oops, I think that I lost that email in the flow of stuff through the
mailing list. I think Alex is going to be doing the bulk of the review
on this for Linaro, so I'll let him have a look at it :-)

thanks
-- PMM


* Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.
  2015-04-22 13:18             ` Peter Maydell
@ 2015-04-23  7:38               ` Frederic Konrad
  0 siblings, 0 replies; 62+ messages in thread
From: Frederic Konrad @ 2015-04-23  7:38 UTC (permalink / raw)
  To: Peter Maydell
  Cc: mttcg, Jan Kiszka, Mark Burton, Alexander Graf, QEMU Developers,
	Paolo Bonzini, Alex Bennée

On 22/04/2015 15:18, Peter Maydell wrote:
> On 22 April 2015 at 13:26, Frederic Konrad <fred.konrad@greensocs.com> wrote:
> [re summary of design]
>> Does that make sense?
> Oops, I think that I lost that email in the flow of stuff through the
> mailing list. I think Alex is going to be doing the bulk of the review
> on this for Linaro, so I'll let him have a look at it :-)
>
> thanks
> -- PMM
Ok no problems :),

Thanks,
Fred


* Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.
  2015-04-22 12:26           ` Frederic Konrad
  2015-04-22 13:18             ` Peter Maydell
@ 2015-04-23 15:44             ` Alex Bennée
  2015-04-23 15:46               ` Alex Bennée
  2015-04-27 17:06             ` Emilio G. Cota
  2 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2015-04-23 15:44 UTC (permalink / raw)
  To: Frederic Konrad
  Cc: mttcg, Peter Maydell, Jan Kiszka, Mark Burton, Alexander Graf,
	QEMU Developers, Paolo Bonzini


Frederic Konrad <fred.konrad@greensocs.com> writes:

> On 10/04/2015 18:03, Frederic Konrad wrote:
>> On 30/03/2015 23:46, Peter Maydell wrote:
>>> On 30 March 2015 at 07:52, Mark Burton <mark.burton@greensocs.com> 
>>> wrote:
>>>> So - Fred is unwilling to send the patch set as it stands, because 
>>>> frankly this part is totally broken.
>>>>
>>>> There is an independent patch set that needs splitting out which 
>>>> deals with just the atomic instruction issue - specifically for ARM 
>>>> (though I guess it’s applicable across the board)…
>>>>
>>>> So - in short - I HOPE to get the patch set onto the reflector 
>>>> sometime next week, and I’m sorry for the delay.
>>> What I really want to see is not so much the patch set
>>> but the design sketch I asked for that lists the
>>> various data structures and indicates which ones
>>> are going to be per-cpu, which ones will be shared
>>> (and with what locking), etc.
>>>
>>> -- PMM
>
> Does that makes sense?
>
> BTW here is the repository:
> git clone git@git.greensocs.com:fkonrad/mttcg.git -b multi_tcg_v4

Is there a non-authenticated read-only http or git:// access to this repo?

>
> Thanks,
> Fred
>
>> <snip>

-- 
Alex Bennée


* Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.
  2015-04-23 15:44             ` Alex Bennée
@ 2015-04-23 15:46               ` Alex Bennée
  2015-04-27  7:37                 ` Frederic Konrad
  0 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2015-04-23 15:46 UTC (permalink / raw)
  To: Frederic Konrad
  Cc: mttcg, Peter Maydell, Jan Kiszka, Mark Burton, Alexander Graf,
	QEMU Developers, Paolo Bonzini


Alex Bennée <alex.bennee@linaro.org> writes:

> Frederic Konrad <fred.konrad@greensocs.com> writes:
>
<snip>
>>
>> Does that make sense?
>>
>> BTW here is the repository:
>> git clone git@git.greensocs.com:fkonrad/mttcg.git -b multi_tcg_v4
>
> Is there a non-authenticated read-only http or git:// access to this repo?

I should have done some poking first! I found it at:

http://git.greensocs.com/fkonrad/mttcg.git

Thanks for making your tree available, it will help with the review and
playing around.

-- 
Alex Bennée


* Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.
  2015-04-23 15:46               ` Alex Bennée
@ 2015-04-27  7:37                 ` Frederic Konrad
  0 siblings, 0 replies; 62+ messages in thread
From: Frederic Konrad @ 2015-04-27  7:37 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, Peter Maydell, Jan Kiszka, Mark Burton, Alexander Graf,
	QEMU Developers, Paolo Bonzini

On 23/04/2015 17:46, Alex Bennée wrote:
> Alex Bennée <alex.bennee@linaro.org> writes:
>
>> Frederic Konrad <fred.konrad@greensocs.com> writes:
>>
> <snip>
>>> Does that make sense?
>>>
>>> BTW here is the repository:
>>> git clone git@git.greensocs.com:fkonrad/mttcg.git -b multi_tcg_v4
>> Is there a non-authenticated read-only http or git:// access to this repo?
> I should have done some poking first! I found it at:
>
> http://git.greensocs.com/fkonrad/mttcg.git
>
> Thanks for making your tree available, it will help with the review and
> playing around.
>
Hi Alex,

Thanks for looking at this/testing it.
I'm currently trying to fix tb_invalidate; I'll let you know when it's
ready and pushed to the server.

Thanks,
Fred


* Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.
  2015-04-22 12:26           ` Frederic Konrad
  2015-04-22 13:18             ` Peter Maydell
  2015-04-23 15:44             ` Alex Bennée
@ 2015-04-27 17:06             ` Emilio G. Cota
  2015-04-28  8:17               ` Frederic Konrad
  2015-04-28  9:06               ` Paolo Bonzini
  2 siblings, 2 replies; 62+ messages in thread
From: Emilio G. Cota @ 2015-04-27 17:06 UTC (permalink / raw)
  To: Frederic Konrad
  Cc: mttcg, Peter Maydell, Jan Kiszka, Mark Burton, Alexander Graf,
	QEMU Developers, Paolo Bonzini, Alex Bennée

Hi Fred,

On Wed, Apr 22, 2015 at 14:26:14 +0200, Frederic Konrad wrote:
> git clone git@git.greensocs.com:fkonrad/mttcg.git -b multi_tcg_v4

I've tried to run buildroot's vexpress-a9 with this, but unfortunately
it gets stuck before showing much progress towards boot:
[ messages in brackets are mine ]

$ time arm-softmmu/qemu-system-arm -M vexpress-a9 -kernel img/arm/zImage \
	-drive file=img/arm/rootfs.ext2,if=sd \
	-append "console=ttyAMA0,115200 root=/dev/mmcblk0" -net nic,model=lan9118 \
	-net user -nographic -smp 1

WARNING: Image format was not specified for 'img/arm/rootfs.ext2' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions.
tlb_flush: CPU 0                        [ this 11 times more ]
audio: Could not init `oss' audio driver
tlb_flush: CPU 0                        [ and this 31 times ]
[ then here it stops printing, and one thread takes 100% CPU ]

Note that I'm running with -smp 1. My guess is that the iothread
is starved, since patch 472f4003 "Drop global lock during TCG code execution"
removes from the iothread the ability to kick CPU threads.

Thanks,

		Emilio


* Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.
  2015-04-27 17:06             ` Emilio G. Cota
@ 2015-04-28  8:17               ` Frederic Konrad
  2015-04-28  9:06               ` Paolo Bonzini
  1 sibling, 0 replies; 62+ messages in thread
From: Frederic Konrad @ 2015-04-28  8:17 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: mttcg, Peter Maydell, Jan Kiszka, Mark Burton, QEMU Developers,
	Alexander Graf, Paolo Bonzini, Alex Bennée

Hi Emilio,

On 27/04/2015 19:06, Emilio G. Cota wrote:
> Hi Fred,
>
> On Wed, Apr 22, 2015 at 14:26:14 +0200, Frederic Konrad wrote:
>> git clone git@git.greensocs.com:fkonrad/mttcg.git -b multi_tcg_v4
> I've tried to run buildroot's vexpress-a9 with this, but unfortunately
> it gets stuck before showing much progress towards boot:
> [ messages in brackets are mine ]
>
> $ time arm-softmmu/qemu-system-arm -M vexpress-a9 -kernel img/arm/zImage \
> 	-drive file=img/arm/rootfs.ext2,if=sd \
> 	-append "console=ttyAMA0,115200 root=/dev/mmcblk0" -net nic,model=lan9118 \
> 	-net user -nographic -smp 1
>
> WARNING: Image format was not specified for 'img/arm/rootfs.ext2' and probing guessed raw.
>           Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
>           Specify the 'raw' format explicitly to remove the restrictions.
> tlb_flush: CPU 0                        [ this 11 times more ]
> audio: Could not init `oss' audio driver
> tlb_flush: CPU 0                        [ and this 31 times ]
> [ then here it stops printing, and one thread takes 100% CPU ]
>
> Note that I'm running with -smp 1. My guess is that the iothread
> is starved, since patch 472f4003 "Drop global lock during TCG code execution"
> removes from the iothread the ability to kick CPU threads.
>
> Thanks,
>
> 		Emilio
>
Does this patch really remove the iothread's ability to kick CPU threads?
Can you check where it's blocked?

Thanks,
Fred


* Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.
  2015-04-27 17:06             ` Emilio G. Cota
  2015-04-28  8:17               ` Frederic Konrad
@ 2015-04-28  9:06               ` Paolo Bonzini
  2015-04-28 17:49                 ` Emilio G. Cota
  1 sibling, 1 reply; 62+ messages in thread
From: Paolo Bonzini @ 2015-04-28  9:06 UTC (permalink / raw)
  To: Emilio G. Cota, Frederic Konrad
  Cc: mttcg, Peter Maydell, Jan Kiszka, Mark Burton, Alexander Graf,
	QEMU Developers, Alex Bennée



On 27/04/2015 19:06, Emilio G. Cota wrote:
> Note that I'm running with -smp 1. My guess is that the iothread
> is starved, since patch 472f4003 "Drop global lock during TCG code execution"
> removes from the iothread the ability to kick CPU threads.

In theory that shouldn't be necessary anymore.  The CPU thread should
only hold the global lock for very small periods of time, similar to KVM.

Can you post a backtrace?

Paolo


* Re: [Qemu-devel] [RFC 00/10] MultiThread TCG.
  2015-04-28  9:06               ` Paolo Bonzini
@ 2015-04-28 17:49                 ` Emilio G. Cota
  0 siblings, 0 replies; 62+ messages in thread
From: Emilio G. Cota @ 2015-04-28 17:49 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: mttcg, Peter Maydell, Jan Kiszka, Mark Burton, QEMU Developers,
	Alexander Graf, Alex Bennée, Frederic Konrad

On Tue, Apr 28, 2015 at 11:06:37 +0200, Paolo Bonzini wrote:
> On 27/04/2015 19:06, Emilio G. Cota wrote:
> > Note that I'm running with -smp 1. My guess is that the iothread
> > is starved, since patch 472f4003 "Drop global lock during TCG code execution"
> > removes from the iothread the ability to kick CPU threads.
> 
> In theory that shouldn't be necessary anymore.  The CPU thread should
> only hold the global lock for very small periods of time, similar to KVM.

You're right.

I added printouts around qemu_global_mutex_lock/unlock
and also added printouts around the cond_wait's that take the
BQL. The vcpu goes quiet after a while:

[...]
softmmu_template.h:io_writel:387 UNLO tid 17633
qemu/cputlb.c:tlb_protect_code:196 LOCK tid 17633
cputlb.c:tlb_protect_code:199 UNLO tid 17633
cputlb.c:tlb_protect_code:196 LOCK tid 17633
cputlb.c:tlb_protect_code:199 UNLO tid 17633
cputlb.c:tlb_protect_code:196 LOCK tid 17633
cputlb.c:tlb_protect_code:199 UNLO tid 17633
softmmu_template.h:io_readl:160 LOCK tid 17633
softmmu_template.h:io_readl:165 UNLO tid 17633
main-loop.c:os_host_main_loop_wait:242 LOCK tid 17630
main-loop.c:os_host_main_loop_wait:234 UNLO tid 17630

.. and at this point the last LOCK/UNLO pair repeats indefinitely.
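
For reference, a wrapper of roughly this shape reproduces the trace format
above (a sketch, not the exact instrumentation):

#include <stdio.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Print file:function:line plus the kernel thread id around each
 * lock/unlock, matching the LOCK/UNLO lines in the log. */
#define TRACE_LOCK(mutex) do {                              \
        fprintf(stderr, "%s:%s:%d LOCK tid %ld\n",          \
                __FILE__, __func__, __LINE__,               \
                (long)syscall(SYS_gettid));                 \
        pthread_mutex_lock(mutex);                          \
    } while (0)

#define TRACE_UNLOCK(mutex) do {                            \
        pthread_mutex_unlock(mutex);                        \
        fprintf(stderr, "%s:%s:%d UNLO tid %ld\n",          \
                __FILE__, __func__, __LINE__,               \
                (long)syscall(SYS_gettid));                 \
    } while (0)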

> Can you post a backtrace?

$ sudo gdb --pid=8919
(gdb) info threads
  Id   Target Id         Frame 
  3    Thread 0x7ffff596b700 (LWP 16204) "qemu-system-arm" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  2    Thread 0x7ffff0f69700 (LWP 16206) "qemu-system-arm" 0x00007ffff33179fe in ?? ()
* 1    Thread 0x7ffff7fe4a80 (LWP 16203) "qemu-system-arm" 0x00007ffff5e9b1ef in __GI_ppoll (fds=0x5555569b8f70, nfds=4, 
    timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:56
(gdb) bt
#0  0x00007ffff5e9b1ef in __GI_ppoll (fds=0x5555569b8f70, nfds=4, timeout=<optimized out>, sigmask=0x0)
    at ../sysdeps/unix/sysv/linux/ppoll.c:56
#1  0x00005555559a9e26 in qemu_poll_ns (fds=0x5555569b8f70, nfds=4, timeout=9689027) at qemu-timer.c:326
#2  0x00005555559a8abb in os_host_main_loop_wait (timeout=9689027) at main-loop.c:239
#3  0x00005555559a8bef in main_loop_wait (nonblocking=0) at main-loop.c:494
#4  0x000055555578c8c5 in main_loop () at vl.c:1803
#5  0x0000555555794634 in main (argc=16, argv=0x7fffffffe828, envp=0x7fffffffe8b0) at vl.c:4371
(gdb) thread 3
[Switching to thread 3 (Thread 0x7ffff596b700 (LWP 16204))]
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
38      ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory.
(gdb) bt
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x0000555555a3a061 in futex_wait (ev=0x555556392724 <rcu_call_ready_event>, val=4294967295) at util/qemu-thread-posix.c:305
#2  0x0000555555a3a20b in qemu_event_wait (ev=0x555556392724 <rcu_call_ready_event>) at util/qemu-thread-posix.c:401
#3  0x0000555555a5011d in call_rcu_thread (opaque=0x0) at util/rcu.c:231
#4  0x00007ffff617b182 in start_thread (arg=0x7ffff596b700) at pthread_create.c:312
#5  0x00007ffff5ea847d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) thread 2
[Switching to thread 2 (Thread 0x7ffff0f69700 (LWP 16206))]
#0  0x00007ffff33179fe in ?? ()
(gdb) bt
#0  0x00007ffff33179fe in ?? ()
#1  0x0000555555f0c200 in ?? ()
#2  0x00007fffd40029a0 in ?? ()
#3  0x00007fffd40029c0 in ?? ()
#4  0x13b33d1714a74c00 in ?? ()
#5  0x00007ffff0f685c0 in ?? ()
#6  0x00005555555f9da7 in tcg_out_reloc (s=<error reading variable: Cannot access memory at address 0xffff8ab1>, 
    code_ptr=<error reading variable: Cannot access memory at address 0xffff8aa9>, 
    type=<error reading variable: Cannot access memory at address 0xffff8aa5>, 
    label_index=<error reading variable: Cannot access memory at address 0xffff8aa1>, 
    addend=<error reading variable: Cannot access memory at address 0xffff8a99>) at /local/home/cota/src/qemu/tcg/tcg.c:224
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) q

So it seems that the vcpu thread doesn't come out of the execution loop
from which that last io_readl was performed.

		Emilio
