* [Qemu-devel] [RFC PATCH V6 00/18] Multithread TCG.
@ 2015-06-26 14:47 fred.konrad
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 01/18] cpu: make cpu_thread_is_idle public fred.konrad
                   ` (17 more replies)
  0 siblings, 18 replies; 82+ messages in thread
From: fred.konrad @ 2015-06-26 14:47 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, pbonzini, alex.bennee,
	fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

This is the 6th round of the MTTCG patch series, with hopefully a lot of
improvements since the last version. Basically, the atomic patch has been
significantly improved, some issues have been fixed, and the speed has
increased.

It can be cloned from:
git@git.greensocs.com:fkonrad/mttcg.git branch multi_tcg_v6.

This patch set tries to address the different issues in the global picture of
MTTCG presented on the wiki.

== Needed patch for our work ==

Some preliminaries are needed for our work:
 * current_cpu doesn't make sense in MTTCG, so a tcg_executing flag is added
   to the CPUState.
 * We need to run some work safely while all VCPUs are outside their execution
   loop. This is done with the async_run_safe_work_on_cpu function introduced
   in this series; a short sketch of its use follows this list.
 * A QemuSpin lock is introduced (POSIX only for now) to allow faster handling
   of atomic instructions.
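
As an illustration, here is a minimal sketch of how the safe-work mechanism is
meant to be used; the signature is assumed to mirror async_run_on_cpu, so see
the actual patches for the authoritative interface:

  /* Runs only once every VCPU has left its execution loop. */
  static void do_tb_flush_safe(void *data)
  {
      tb_flush((CPUArchState *)data);
  }

  /* ... somewhere with a CPUState *cpu and CPUArchState *env at hand: */
  async_run_safe_work_on_cpu(cpu, do_tb_flush_safe, env);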

== Code generation and cache ==

As QEMU stands, there is no protection at all against two threads generating
code at the same time or modifying the same TranslationBlock.
The "protect TBContext with tb_lock" patch addresses the issue of code
generation and makes all the tb_* functions thread safe (except tb_flush).
This raised the question of one or multiple caches. We chose to use one
unified cache because it's easier as a first step, and since the structure of
QEMU effectively has a ‘local’ cache per CPU in the form of the jump cache, we
don't see the benefit of having two pools of TBs.
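
The resulting lookup pattern (from the "protect TBContext with tb_lock" patch
below) probes the shared pool without the lock first, and only takes tb_lock
to re-check and translate when the TB is missing:

  tb = tb_find_physical(env, pc, cs_base, flags);
  if (!tb) {
      tb_lock();
      /* Re-check: another VCPU may have translated it meanwhile. */
      tb = tb_find_physical(env, pc, cs_base, flags);
      if (!tb) {
          tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
      }
      tb_unlock();
  }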

== Dirty tracking ==

Protecting the IOs:
To allow all VCPU threads to run at the same time we need to drop the
global_mutex as soon as possible. The IO accesses then need to take the mutex.
This is likely to change once
http://thread.gmane.org/gmane.comp.emulators.qemu/345258 is upstreamed.
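
Concretely (see the "Drop global lock during TCG code execution" patch below),
the lock is released around the TCG execution loop and re-taken only while
dispatching to devices:

  qemu_mutex_unlock_iothread();
  ret = cpu_exec(env);
  qemu_mutex_lock_iothread();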

Invalidation of TranslationBlocks:
We can have all VCPUs running during an invalidation. Each VCPU is able to
clean its own jump cache, as it lives in CPUState, so that part can be handled
by a simple call to async_run_on_cpu. However tb_invalidate also writes to the
TranslationBlock, which is shared as we have only one pool.
This part of the invalidation therefore requires all VCPUs to exit their
execution loop before it can be done, which is why async_run_safe_work_on_cpu
is introduced.
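
A rough sketch of the split (the helper shown here is illustrative, not the
series' exact code):

  /* Per-CPU part: each VCPU scrubs its own jump cache asynchronously. */
  static void clear_jmp_cache(void *data)
  {
      CPUState *cpu = data;

      memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
  }

  CPUState *cpu;

  CPU_FOREACH(cpu) {
      async_run_on_cpu(cpu, clear_jmp_cache, cpu);
  }
  /* The write to the shared TranslationBlock itself is queued through
   * async_run_safe_work_on_cpu instead. */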

== Atomic instructions ==

For now only ARM on x64 is supported, by using a cmpxchg instruction.
Specifically, the limitation of this approach is that it is harder to support
64-bit ARM on a host architecture that is multi-core but only supports 32-bit
cmpxchg (we believe this could be the case for some PPC cores). For now this
case is not correctly handled: the existing atomic patch will attempt to
execute the 64-bit cmpxchg functionality in a non-thread-safe fashion. Our
intention is to provide a new multithreaded ARM atomic patch for 64-bit ARM on
effectively 32-bit hosts.
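
The idea behind the cmpxchg-based helper, reduced to its core (a sketch, not
the series' exact code; 'haddr' stands for a host pointer resolved from the
guest exclusive address):

  static uint32_t strex_w_sketch(CPUARMState *env, uint32_t *haddr,
                                 uint32_t newval)
  {
      uint32_t expected = env->exclusive_val;

      /* The store succeeds only if memory still holds the value read
       * by LDREX, checked in a single atomic operation. */
      if (atomic_cmpxchg(haddr, expected, newval) == expected) {
          return 0;  /* store succeeded */
      }
      return 1;      /* another CPU raced us; the guest retries */
  }
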
This atomic instruction part has been tested with Alexander's atomic stress
test, available here:
https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg05585.html

The execution is a little slower than upstream, probably because of the
different VCPUs fighting for the mutex. Swapping arm_exclusive_lock from a
mutex to a spin lock considerably reduces the difference.
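
Because the critical section around the exclusive monitor is only a few loads
and stores, a spin lock avoids the sleep/wake cost a contended mutex pays. A
sketch of the result using the QemuSpin API from patch 04 (the monitor fields
touched here are illustrative):

  static QemuSpin arm_exclusive_lock;

  static void set_exclusive(CPUARMState *env, uint64_t addr, uint64_t val)
  {
      qemu_spin_lock(&arm_exclusive_lock);
      env->exclusive_addr = addr;   /* illustrative monitor update */
      env->exclusive_val = val;
      qemu_spin_unlock(&arm_exclusive_lock);
  }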

== Testing ==

A simple double-dhrystone test in SMP 2 with vexpress-a15 in a Linux guest
shows a good performance improvement: it takes roughly 18s upstream to
complete vs 10s with MTTCG.

Testing image is available here:
https://cloud.greensocs.com/index.php/s/CfHSLzDH5pmTkW3

Then simply:
./configure --target-list=arm-softmmu
make -j8
./arm-softmmu/qemu-system-arm -M vexpress-a15 -smp 2 -kernel zImage \
    -initrd rootfs.ext2 -dtb vexpress-v2p-ca15-tc1.dtb --nographic \
    --append "console=ttyAMA0"

login: root

The dhrystone command is the last one in the history.
"dhrystone 10000000 & dhrystone 10000000"

The atomic spinlock benchmark from Alexander shows that the atomics basically
work. Just follow the instructions here:
https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg05585.html

== Known issues ==

* Virtio double lock:
  Virtio accidentally takes qemu_mutex_iothread_lock twice.

* GDB stub:
  The GDB stub is not tested right now; it will probably require some changes
  to work.

Changes:
  * Introduce async_safe_work to do the tb_flush and some parts of
    tb_invalidate.
  * Introduce QemuSpin from Guillaume, which allows faster atomic instructions
    (6s to pass Alexander's atomic test instead of 30s before).
  * Don't take tb_lock before tb_find_fast.
  * Handle tb_flush with async_safe_work.
  * Handle tb_invalidate with async_work and async_safe_work.
  * Drop the tlb_flush_request mechanism and use async_work as well.
  * Fix the wrong length in the atomic patch.
  * Fix the wrong return address for exceptions in the atomic patch.

Guillaume Delbergue (1):
  add support for spin lock on POSIX systems exclusively

Jan Kiszka (1):
  Drop global lock during TCG code execution

KONRAD Frederic (16):
  cpu: make cpu_thread_is_idle public.
  replace spinlock by QemuMutex.
  remove unused spinlock.
  protect TBContext with tb_lock.
  tcg: remove tcg_halt_cond global variable.
  cpu: remove exit_request global.
  cpu: add a tcg_executing flag.
  tcg: switch on multithread.
  cpus: make qemu_cpu_kick_thread public.
  Use atomic cmpxchg to atomically check the exclusive value in a STREX
  cpu: introduce async_run_safe_work_on_cpu.
  add a callback when tb_invalidate is called.
  cpu: introduce tlb_flush*_all.
  arm: use tlb_flush*_all
  translate-all: introduces tb_flush_safe.
  translate-all: (wip) use tb_flush_safe when we can't alloc more tb.

 cpu-exec.c                  |  96 ++++++++++++--------
 cpus.c                      | 208 +++++++++++++++++++++++++-----------------
 cputlb.c                    |  81 +++++++++++++++++
 exec.c                      |  25 +++++
 include/exec/exec-all.h     |   7 +-
 include/exec/spinlock.h     |  49 ----------
 include/qemu/thread-posix.h |   4 +
 include/qemu/thread-win32.h |   4 +
 include/qemu/thread.h       |   7 ++
 include/qom/cpu.h           |  35 +++++++
 include/sysemu/cpus.h       |   1 +
 linux-user/main.c           |   6 +-
 qom/cpu.c                   |   1 +
 scripts/checkpatch.pl       |   9 +-
 softmmu_template.h          |   5 +
 target-arm/cpu.c            |  21 +++++
 target-arm/cpu.h            |   6 ++
 target-arm/helper.c         |  58 ++++--------
 target-arm/helper.h         |   4 +
 target-arm/op_helper.c      | 128 +++++++++++++++++++++++++-
 target-arm/translate.c      | 103 +++++----------------
 target-i386/mem_helper.c    |  16 +++-
 target-i386/misc_helper.c   |  27 +++++-
 tcg/i386/tcg-target.c       |   8 ++
 tcg/tcg.h                   |   7 ++
 translate-all.c             | 217 +++++++++++++++++++++++++++++++++++++-------
 util/qemu-thread-posix.c    |  45 +++++++++
 util/qemu-thread-win32.c    |  30 ++++++
 vl.c                        |   6 ++
 29 files changed, 869 insertions(+), 345 deletions(-)
 delete mode 100644 include/exec/spinlock.h

-- 
1.9.0


* [Qemu-devel] [RFC PATCH V6 01/18] cpu: make cpu_thread_is_idle public.
  2015-06-26 14:47 [Qemu-devel] [RFC PATCH V6 00/18] Multithread TCG fred.konrad
@ 2015-06-26 14:47 ` fred.konrad
  2015-07-07  9:47   ` Alex Bennée
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 02/18] replace spinlock by QemuMutex fred.konrad
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 82+ messages in thread
From: fred.konrad @ 2015-06-26 14:47 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, pbonzini, alex.bennee,
	fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
---
 cpus.c            |  2 +-
 include/qom/cpu.h | 11 +++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index 4f0e54d..2d62a35 100644
--- a/cpus.c
+++ b/cpus.c
@@ -74,7 +74,7 @@ bool cpu_is_stopped(CPUState *cpu)
     return cpu->stopped || !runstate_is_running();
 }
 
-static bool cpu_thread_is_idle(CPUState *cpu)
+bool cpu_thread_is_idle(CPUState *cpu)
 {
     if (cpu->stop || cpu->queued_work_first) {
         return false;
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 39f0f19..af3c9e4 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -514,6 +514,17 @@ void qemu_cpu_kick(CPUState *cpu);
 bool cpu_is_stopped(CPUState *cpu);
 
 /**
+ * cpu_thread_is_idle:
+ * @cpu: The CPU to check.
+ *
+ * Checks whether the CPU thread is idle.
+ *
+ * Returns: %true if the thread is idle;
+ * %false otherwise.
+ */
+bool cpu_thread_is_idle(CPUState *cpu);
+
+/**
  * run_on_cpu:
  * @cpu: The vCPU to run on.
  * @func: The function to be executed.
-- 
1.9.0


* [Qemu-devel] [RFC PATCH V6 02/18] replace spinlock by QemuMutex.
  2015-06-26 14:47 [Qemu-devel] [RFC PATCH V6 00/18] Multithread TCG fred.konrad
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 01/18] cpu: make cpu_thread_is_idle public fred.konrad
@ 2015-06-26 14:47 ` fred.konrad
  2015-07-07 10:15   ` Alex Bennée
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 03/18] remove unused spinlock fred.konrad
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 82+ messages in thread
From: fred.konrad @ 2015-06-26 14:47 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, pbonzini, alex.bennee,
	fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

spinlock is only used in two cases:
  * cpu-exec.c: to protect TranslationBlock
  * mem_helper.c: for the lock helper in target-i386 (which seems broken).

It's a pthread_mutex_t in user mode, so it is better to use QemuMutex directly
in this case.
This also allows reusing the tb_lock mutex of TBContext in the multithreaded
TCG case.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
---
 cpu-exec.c               | 15 +++++++++++----
 include/exec/exec-all.h  |  4 ++--
 linux-user/main.c        |  6 +++---
 target-i386/mem_helper.c | 16 +++++++++++++---
 tcg/i386/tcg-target.c    |  8 ++++++++
 5 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 2ffeb6e..d6336d9 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -362,7 +362,9 @@ int cpu_exec(CPUArchState *env)
     SyncClocks sc;
 
     /* This must be volatile so it is not trashed by longjmp() */
+#if defined(CONFIG_USER_ONLY)
     volatile bool have_tb_lock = false;
+#endif
 
     if (cpu->halted) {
         if (!cpu_has_work(cpu)) {
@@ -480,8 +482,10 @@ int cpu_exec(CPUArchState *env)
                     cpu->exception_index = EXCP_INTERRUPT;
                     cpu_loop_exit(cpu);
                 }
-                spin_lock(&tcg_ctx.tb_ctx.tb_lock);
+#if defined(CONFIG_USER_ONLY)
+                qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
                 have_tb_lock = true;
+#endif
                 tb = tb_find_fast(env);
                 /* Note: we do it here to avoid a gcc bug on Mac OS X when
                    doing it in tb_find_slow */
@@ -503,9 +507,10 @@ int cpu_exec(CPUArchState *env)
                     tb_add_jump((TranslationBlock *)(next_tb & ~TB_EXIT_MASK),
                                 next_tb & TB_EXIT_MASK, tb);
                 }
+#if defined(CONFIG_USER_ONLY)
                 have_tb_lock = false;
-                spin_unlock(&tcg_ctx.tb_ctx.tb_lock);
-
+                qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
+#endif
                 /* cpu_interrupt might be called while translating the
                    TB, but before it is linked into a potentially
                    infinite loop and becomes env->current_tb. Avoid
@@ -572,10 +577,12 @@ int cpu_exec(CPUArchState *env)
 #ifdef TARGET_I386
             x86_cpu = X86_CPU(cpu);
 #endif
+#if defined(CONFIG_USER_ONLY)
             if (have_tb_lock) {
-                spin_unlock(&tcg_ctx.tb_ctx.tb_lock);
+                qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
                 have_tb_lock = false;
             }
+#endif
         }
     } /* for(;;) */
 
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 2573e8c..44f3336 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -176,7 +176,7 @@ struct TranslationBlock {
     struct TranslationBlock *jmp_first;
 };
 
-#include "exec/spinlock.h"
+#include "qemu/thread.h"
 
 typedef struct TBContext TBContext;
 
@@ -186,7 +186,7 @@ struct TBContext {
     TranslationBlock *tb_phys_hash[CODE_GEN_PHYS_HASH_SIZE];
     int nb_tbs;
     /* any access to the tbs or the page table must use this lock */
-    spinlock_t tb_lock;
+    QemuMutex tb_lock;
 
     /* statistics */
     int tb_flush_count;
diff --git a/linux-user/main.c b/linux-user/main.c
index c855bcc..bce3a98 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -107,7 +107,7 @@ static int pending_cpus;
 /* Make sure everything is in a consistent state for calling fork().  */
 void fork_start(void)
 {
-    pthread_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
+    qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
     pthread_mutex_lock(&exclusive_lock);
     mmap_fork_start();
 }
@@ -129,11 +129,11 @@ void fork_end(int child)
         pthread_mutex_init(&cpu_list_mutex, NULL);
         pthread_cond_init(&exclusive_cond, NULL);
         pthread_cond_init(&exclusive_resume, NULL);
-        pthread_mutex_init(&tcg_ctx.tb_ctx.tb_lock, NULL);
+        qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
         gdbserver_fork((CPUArchState *)thread_cpu->env_ptr);
     } else {
         pthread_mutex_unlock(&exclusive_lock);
-        pthread_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
+        qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
     }
 }
 
diff --git a/target-i386/mem_helper.c b/target-i386/mem_helper.c
index 1aec8a5..7106cc3 100644
--- a/target-i386/mem_helper.c
+++ b/target-i386/mem_helper.c
@@ -23,17 +23,27 @@
 
 /* broken thread support */
 
-static spinlock_t global_cpu_lock = SPIN_LOCK_UNLOCKED;
+#if defined(CONFIG_USER_ONLY)
+QemuMutex global_cpu_lock;
 
 void helper_lock(void)
 {
-    spin_lock(&global_cpu_lock);
+    qemu_mutex_lock(&global_cpu_lock);
 }
 
 void helper_unlock(void)
 {
-    spin_unlock(&global_cpu_lock);
+    qemu_mutex_unlock(&global_cpu_lock);
 }
+#else
+void helper_lock(void)
+{
+}
+
+void helper_unlock(void)
+{
+}
+#endif
 
 void helper_cmpxchg8b(CPUX86State *env, target_ulong a0)
 {
diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index ff4d9cf..0d7c99c 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -24,6 +24,10 @@
 
 #include "tcg-be-ldst.h"
 
+#if defined(CONFIG_USER_ONLY)
+extern QemuMutex global_cpu_lock;
+#endif
+
 #ifndef NDEBUG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 #if TCG_TARGET_REG_BITS == 64
@@ -2342,6 +2346,10 @@ static void tcg_target_init(TCGContext *s)
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
 
     tcg_add_target_add_op_defs(x86_op_defs);
+
+#if defined(CONFIG_USER_ONLY)
+    qemu_mutex_init(global_cpu_lock);
+#endif
 }
 
 typedef struct {
-- 
1.9.0


* [Qemu-devel] [RFC PATCH V6 03/18] remove unused spinlock.
  2015-06-26 14:47 [Qemu-devel] [RFC PATCH V6 00/18] Multithread TCG fred.konrad
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 01/18] cpu: make cpu_thread_is_idle public fred.konrad
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 02/18] replace spinlock by QemuMutex fred.konrad
@ 2015-06-26 14:47 ` fred.konrad
  2015-06-26 14:53   ` Paolo Bonzini
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 04/18] add support for spin lock on POSIX systems exclusively fred.konrad
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 82+ messages in thread
From: fred.konrad @ 2015-06-26 14:47 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, pbonzini, alex.bennee,
	fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

This just removes the spinlock header, as it is not used anymore.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
---
 include/exec/spinlock.h | 49 -------------------------------------------------
 scripts/checkpatch.pl   |  9 ++-------
 2 files changed, 2 insertions(+), 56 deletions(-)
 delete mode 100644 include/exec/spinlock.h

diff --git a/include/exec/spinlock.h b/include/exec/spinlock.h
deleted file mode 100644
index a72edda..0000000
--- a/include/exec/spinlock.h
+++ /dev/null
@@ -1,49 +0,0 @@
-/*
- *  Copyright (c) 2003 Fabrice Bellard
- *
- * This library is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2 of the License, or (at your option) any later version.
- *
- * This library is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with this library; if not, see <http://www.gnu.org/licenses/>
- */
-
-/* configure guarantees us that we have pthreads on any host except
- * mingw32, which doesn't support any of the user-only targets.
- * So we can simply assume we have pthread mutexes here.
- */
-#if defined(CONFIG_USER_ONLY)
-
-#include <pthread.h>
-#define spin_lock pthread_mutex_lock
-#define spin_unlock pthread_mutex_unlock
-#define spinlock_t pthread_mutex_t
-#define SPIN_LOCK_UNLOCKED PTHREAD_MUTEX_INITIALIZER
-
-#else
-
-/* Empty implementations, on the theory that system mode emulation
- * is single-threaded. This means that these functions should only
- * be used from code run in the TCG cpu thread, and cannot protect
- * data structures which might also be accessed from the IO thread
- * or from signal handlers.
- */
-typedef int spinlock_t;
-#define SPIN_LOCK_UNLOCKED 0
-
-static inline void spin_lock(spinlock_t *lock)
-{
-}
-
-static inline void spin_unlock(spinlock_t *lock)
-{
-}
-
-#endif
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 7f0aae9..d1e482a 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2664,11 +2664,6 @@ sub process {
 			WARN("Use of volatile is usually wrong: see Documentation/volatile-considered-harmful.txt\n" . $herecurr);
 		}
 
-# SPIN_LOCK_UNLOCKED & RW_LOCK_UNLOCKED are deprecated
-		if ($line =~ /\b(SPIN_LOCK_UNLOCKED|RW_LOCK_UNLOCKED)/) {
-			ERROR("Use of $1 is deprecated: see Documentation/spinlocks.txt\n" . $herecurr);
-		}
-
 # warn about #if 0
 		if ($line =~ /^.\s*\#\s*if\s+0\b/) {
 			CHK("if this code is redundant consider removing it\n" .
@@ -2717,8 +2712,8 @@ sub process {
 			ERROR("exactly one space required after that #$1\n" . $herecurr);
 		}
 
-# check for spinlock_t definitions without a comment.
-		if ($line =~ /^.\s*(struct\s+mutex|spinlock_t)\s+\S+;/ ||
+# check for mutex definitions without a comment.
+		if ($line =~ /^.\s*(struct\s+mutex)\s+\S+;/ ||
 		    $line =~ /^.\s*(DEFINE_MUTEX)\s*\(/) {
 			my $which = $1;
 			if (!ctx_has_comment($first_line, $linenr)) {
-- 
1.9.0


* [Qemu-devel] [RFC PATCH V6 04/18] add support for spin lock on POSIX systems exclusively
  2015-06-26 14:47 [Qemu-devel] [RFC PATCH V6 00/18] Multithread TCG fred.konrad
                   ` (2 preceding siblings ...)
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 03/18] remove unused spinlock fred.konrad
@ 2015-06-26 14:47 ` fred.konrad
  2015-06-26 14:55   ` Paolo Bonzini
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 05/18] protect TBContext with tb_lock fred.konrad
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 82+ messages in thread
From: fred.konrad @ 2015-06-26 14:47 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, pbonzini, alex.bennee,
	fred.konrad

From: Guillaume Delbergue <guillaume.delbergue@greensocs.com>

WARNING: spin lock is currently not implemented on WIN32

Signed-off-by: Guillaume Delbergue <guillaume.delbergue@greensocs.com>
---
 include/qemu/thread-posix.h |  4 ++++
 include/qemu/thread-win32.h |  4 ++++
 include/qemu/thread.h       |  7 +++++++
 util/qemu-thread-posix.c    | 45 +++++++++++++++++++++++++++++++++++++++++++++
 util/qemu-thread-win32.c    | 30 ++++++++++++++++++++++++++++++
 5 files changed, 90 insertions(+)

diff --git a/include/qemu/thread-posix.h b/include/qemu/thread-posix.h
index eb5c7a1..8ce8f01 100644
--- a/include/qemu/thread-posix.h
+++ b/include/qemu/thread-posix.h
@@ -7,6 +7,10 @@ struct QemuMutex {
     pthread_mutex_t lock;
 };
 
+struct QemuSpin {
+    pthread_spinlock_t lock;
+};
+
 struct QemuCond {
     pthread_cond_t cond;
 };
diff --git a/include/qemu/thread-win32.h b/include/qemu/thread-win32.h
index 3d58081..310c8bd 100644
--- a/include/qemu/thread-win32.h
+++ b/include/qemu/thread-win32.h
@@ -7,6 +7,10 @@ struct QemuMutex {
     LONG owner;
 };
 
+struct QemuSpin {
+    PKSPIN_LOCK lock;
+};
+
 struct QemuCond {
     LONG waiters, target;
     HANDLE sema;
diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index 5114ec8..f5d1259 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -5,6 +5,7 @@
 #include <stdbool.h>
 
 typedef struct QemuMutex QemuMutex;
+typedef struct QemuSpin QemuSpin;
 typedef struct QemuCond QemuCond;
 typedef struct QemuSemaphore QemuSemaphore;
 typedef struct QemuEvent QemuEvent;
@@ -25,6 +26,12 @@ void qemu_mutex_lock(QemuMutex *mutex);
 int qemu_mutex_trylock(QemuMutex *mutex);
 void qemu_mutex_unlock(QemuMutex *mutex);
 
+void qemu_spin_init(QemuSpin *spin);
+void qemu_spin_destroy(QemuSpin *spin);
+void qemu_spin_lock(QemuSpin *spin);
+int qemu_spin_trylock(QemuSpin *spin);
+void qemu_spin_unlock(QemuSpin *spin);
+
 void qemu_cond_init(QemuCond *cond);
 void qemu_cond_destroy(QemuCond *cond);
 
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index ba67cec..224bacc 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -89,6 +89,51 @@ void qemu_mutex_unlock(QemuMutex *mutex)
         error_exit(err, __func__);
 }
 
+void qemu_spin_init(QemuSpin *spin)
+{
+    int err;
+
+    err = pthread_spin_init(&spin->lock, 0);
+    if (err) {
+        error_exit(err, __func__);
+    }
+}
+
+void qemu_spin_destroy(QemuSpin *spin)
+{
+    int err;
+
+    err = pthread_spin_destroy(&spin->lock);
+    if (err) {
+        error_exit(err, __func__);
+    }
+}
+
+void qemu_spin_lock(QemuSpin *spin)
+{
+    int err;
+
+    err = pthread_spin_lock(&spin->lock);
+    if (err) {
+        error_exit(err, __func__);
+    }
+}
+
+int qemu_spin_trylock(QemuSpin *spin)
+{
+    return pthread_spin_trylock(&spin->lock);
+}
+
+void qemu_spin_unlock(QemuSpin *spin)
+{
+    int err;
+
+    err = pthread_spin_unlock(&spin->lock);
+    if (err) {
+        error_exit(err, __func__);
+    }
+}
+
 void qemu_cond_init(QemuCond *cond)
 {
     int err;
diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
index 406b52f..6fbe6a8 100644
--- a/util/qemu-thread-win32.c
+++ b/util/qemu-thread-win32.c
@@ -80,6 +80,36 @@ void qemu_mutex_unlock(QemuMutex *mutex)
     LeaveCriticalSection(&mutex->lock);
 }
 
+void qemu_spin_init(QemuSpin *spin)
+{
+    printf("spinlock not implemented");
+    abort();
+}
+
+void qemu_spin_destroy(QemuSpin *spin)
+{
+    printf("spinlock not implemented");
+    abort();
+}
+
+void qemu_spin_lock(QemuSpin *spin)
+{
+    printf("spinlock not implemented");
+    abort();
+}
+
+int qemu_spin_trylock(QemuSpin *spin)
+{
+    printf("spinlock not implemented");
+    abort();
+}
+
+void qemu_spin_unlock(QemuSpin *spin)
+{
+    printf("spinlock not implemented");
+    abort();
+}
+
 void qemu_cond_init(QemuCond *cond)
 {
     memset(cond, 0, sizeof(*cond));
-- 
1.9.0


* [Qemu-devel] [RFC PATCH V6 05/18] protect TBContext with tb_lock.
  2015-06-26 14:47 [Qemu-devel] [RFC PATCH V6 00/18] Multithread TCG fred.konrad
                   ` (3 preceding siblings ...)
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 04/18] add support for spin lock on POSIX systems exclusively fred.konrad
@ 2015-06-26 14:47 ` fred.konrad
  2015-06-26 14:56   ` Paolo Bonzini
                     ` (2 more replies)
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 06/18] tcg: remove tcg_halt_cond global variable fred.konrad
                   ` (12 subsequent siblings)
  17 siblings, 3 replies; 82+ messages in thread
From: fred.konrad @ 2015-06-26 14:47 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, pbonzini, alex.bennee,
	fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

This protects TBContext with tb_lock to make the tb_* functions thread safe.

We can still have an issue with tb_flush in the multithreaded TCG case:
  another CPU can be executing code during a flush.

This can be fixed later by making all other TCG threads exit before calling
tb_flush().

tb_find_slow is split into tb_find_slow and tb_find_physical, as the whole of
tb_find_slow doesn't require holding the tb lock.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>

Changes:
V1 -> V2:
  * Drop a tb_lock around tb_find_fast in cpu-exec.c.
---
 cpu-exec.c             |  60 ++++++++++++++--------
 target-arm/translate.c |   5 ++
 tcg/tcg.h              |   7 +++
 translate-all.c        | 137 ++++++++++++++++++++++++++++++++++++++-----------
 4 files changed, 158 insertions(+), 51 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index d6336d9..5d9b518 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -130,6 +130,8 @@ static void init_delay_params(SyncClocks *sc, const CPUState *cpu)
 void cpu_loop_exit(CPUState *cpu)
 {
     cpu->current_tb = NULL;
+    /* Release those mutex before long jump so other thread can work. */
+    tb_lock_reset();
     siglongjmp(cpu->jmp_env, 1);
 }
 
@@ -142,6 +144,8 @@ void cpu_resume_from_signal(CPUState *cpu, void *puc)
     /* XXX: restore cpu registers saved in host registers */
 
     cpu->exception_index = -1;
+    /* Release those mutex before long jump so other thread can work. */
+    tb_lock_reset();
     siglongjmp(cpu->jmp_env, 1);
 }
 
@@ -253,12 +257,9 @@ static void cpu_exec_nocache(CPUArchState *env, int max_cycles,
     tb_free(tb);
 }
 
-static TranslationBlock *tb_find_slow(CPUArchState *env,
-                                      target_ulong pc,
-                                      target_ulong cs_base,
-                                      uint64_t flags)
+static TranslationBlock *tb_find_physical(CPUArchState *env, target_ulong pc,
+                                          target_ulong cs_base, uint64_t flags)
 {
-    CPUState *cpu = ENV_GET_CPU(env);
     TranslationBlock *tb, **ptb1;
     unsigned int h;
     tb_page_addr_t phys_pc, phys_page1;
@@ -273,8 +274,9 @@ static TranslationBlock *tb_find_slow(CPUArchState *env,
     ptb1 = &tcg_ctx.tb_ctx.tb_phys_hash[h];
     for(;;) {
         tb = *ptb1;
-        if (!tb)
-            goto not_found;
+        if (!tb) {
+            return tb;
+        }
         if (tb->pc == pc &&
             tb->page_addr[0] == phys_page1 &&
             tb->cs_base == cs_base &&
@@ -282,28 +284,43 @@ static TranslationBlock *tb_find_slow(CPUArchState *env,
             /* check next page if needed */
             if (tb->page_addr[1] != -1) {
                 tb_page_addr_t phys_page2;
-
                 virt_page2 = (pc & TARGET_PAGE_MASK) +
                     TARGET_PAGE_SIZE;
                 phys_page2 = get_page_addr_code(env, virt_page2);
-                if (tb->page_addr[1] == phys_page2)
-                    goto found;
+                if (tb->page_addr[1] == phys_page2) {
+                    return tb;
+                }
             } else {
-                goto found;
+                return tb;
             }
         }
         ptb1 = &tb->phys_hash_next;
     }
- not_found:
-   /* if no translated code available, then translate it now */
-    tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
-
- found:
-    /* Move the last found TB to the head of the list */
-    if (likely(*ptb1)) {
-        *ptb1 = tb->phys_hash_next;
-        tb->phys_hash_next = tcg_ctx.tb_ctx.tb_phys_hash[h];
-        tcg_ctx.tb_ctx.tb_phys_hash[h] = tb;
+    return tb;
+}
+
+static TranslationBlock *tb_find_slow(CPUArchState *env, target_ulong pc,
+                                      target_ulong cs_base, uint64_t flags)
+{
+    /*
+     * First try to get the tb if we don't find it we need to lock and compile
+     * it.
+     */
+    CPUState *cpu = ENV_GET_CPU(env);
+    TranslationBlock *tb;
+
+    tb = tb_find_physical(env, pc, cs_base, flags);
+    if (!tb) {
+        tb_lock();
+        /*
+         * Retry to get the TB in case a CPU just translate it to avoid having
+         * duplicated TB in the pool.
+         */
+        tb = tb_find_physical(env, pc, cs_base, flags);
+        if (!tb) {
+            tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
+        }
+        tb_unlock();
     }
     /* we add the TB in the virtual pc hash table */
     cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
@@ -326,6 +343,7 @@ static inline TranslationBlock *tb_find_fast(CPUArchState *env)
                  tb->flags != flags)) {
         tb = tb_find_slow(env, pc, cs_base, flags);
     }
+
     return tb;
 }
 
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 971b6db..47345aa 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -11162,6 +11162,8 @@ static inline void gen_intermediate_code_internal(ARMCPU *cpu,
 
     dc->tb = tb;
 
+    tb_lock();
+
     dc->is_jmp = DISAS_NEXT;
     dc->pc = pc_start;
     dc->singlestep_enabled = cs->singlestep_enabled;
@@ -11499,6 +11501,7 @@ done_generating:
         tb->size = dc->pc - pc_start;
         tb->icount = num_insns;
     }
+    tb_unlock();
 }
 
 void gen_intermediate_code(CPUARMState *env, TranslationBlock *tb)
@@ -11567,6 +11570,7 @@ void arm_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
 
 void restore_state_to_opc(CPUARMState *env, TranslationBlock *tb, int pc_pos)
 {
+    tb_lock();
     if (is_a64(env)) {
         env->pc = tcg_ctx.gen_opc_pc[pc_pos];
         env->condexec_bits = 0;
@@ -11574,4 +11578,5 @@ void restore_state_to_opc(CPUARMState *env, TranslationBlock *tb, int pc_pos)
         env->regs[15] = tcg_ctx.gen_opc_pc[pc_pos];
         env->condexec_bits = gen_opc_condexec_bits[pc_pos];
     }
+    tb_unlock();
 }
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 41e4869..032fe10 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -592,17 +592,24 @@ void *tcg_malloc_internal(TCGContext *s, int size);
 void tcg_pool_reset(TCGContext *s);
 void tcg_pool_delete(TCGContext *s);
 
+void tb_lock(void);
+void tb_unlock(void);
+void tb_lock_reset(void);
+
 static inline void *tcg_malloc(int size)
 {
     TCGContext *s = &tcg_ctx;
     uint8_t *ptr, *ptr_end;
+    tb_lock();
     size = (size + sizeof(long) - 1) & ~(sizeof(long) - 1);
     ptr = s->pool_cur;
     ptr_end = ptr + size;
     if (unlikely(ptr_end > s->pool_end)) {
+        tb_unlock();
         return tcg_malloc_internal(&tcg_ctx, size);
     } else {
         s->pool_cur = ptr_end;
+        tb_unlock();
         return ptr;
     }
 }
diff --git a/translate-all.c b/translate-all.c
index b6b0e1c..c25b79b 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -127,6 +127,34 @@ static void *l1_map[V_L1_SIZE];
 /* code generation context */
 TCGContext tcg_ctx;
 
+/* translation block context */
+__thread volatile int have_tb_lock;
+
+void tb_lock(void)
+{
+    if (!have_tb_lock) {
+        qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
+    }
+    have_tb_lock++;
+}
+
+void tb_unlock(void)
+{
+    assert(have_tb_lock > 0);
+    have_tb_lock--;
+    if (!have_tb_lock) {
+        qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
+    }
+}
+
+void tb_lock_reset(void)
+{
+    if (have_tb_lock) {
+        qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
+    }
+    have_tb_lock = 0;
+}
+
 static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
                          tb_page_addr_t phys_page2);
 static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
@@ -215,6 +243,7 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
 #ifdef CONFIG_PROFILER
     ti = profile_getclock();
 #endif
+    tb_lock();
     tcg_func_start(s);
 
     gen_intermediate_code_pc(env, tb);
@@ -228,8 +257,10 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
 
     /* find opc index corresponding to search_pc */
     tc_ptr = (uintptr_t)tb->tc_ptr;
-    if (searched_pc < tc_ptr)
+    if (searched_pc < tc_ptr) {
+        tb_unlock();
         return -1;
+    }
 
     s->tb_next_offset = tb->tb_next_offset;
 #ifdef USE_DIRECT_JUMP
@@ -241,8 +272,10 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
 #endif
     j = tcg_gen_code_search_pc(s, (tcg_insn_unit *)tc_ptr,
                                searched_pc - tc_ptr);
-    if (j < 0)
+    if (j < 0) {
+        tb_unlock();
         return -1;
+    }
     /* now find start of instruction before */
     while (s->gen_opc_instr_start[j] == 0) {
         j--;
@@ -255,6 +288,8 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
     s->restore_time += profile_getclock() - ti;
     s->restore_count++;
 #endif
+
+    tb_unlock();
     return 0;
 }
 
@@ -672,6 +707,7 @@ static inline void code_gen_alloc(size_t tb_size)
             CODE_GEN_AVG_BLOCK_SIZE;
     tcg_ctx.tb_ctx.tbs =
             g_malloc(tcg_ctx.code_gen_max_blocks * sizeof(TranslationBlock));
+    qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
 }
 
 /* Must be called before using the QEMU cpus. 'tb_size' is the size
@@ -696,16 +732,22 @@ bool tcg_enabled(void)
     return tcg_ctx.code_gen_buffer != NULL;
 }
 
-/* Allocate a new translation block. Flush the translation buffer if
-   too many translation blocks or too much generated code. */
+/*
+ * Allocate a new translation block. Flush the translation buffer if
+ * too many translation blocks or too much generated code.
+ * tb_alloc is not thread safe but tb_gen_code is protected by a mutex so this
+ * function is called only by one thread.
+ */
 static TranslationBlock *tb_alloc(target_ulong pc)
 {
-    TranslationBlock *tb;
+    TranslationBlock *tb = NULL;
 
     if (tcg_ctx.tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks ||
         (tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer) >=
          tcg_ctx.code_gen_buffer_max_size) {
-        return NULL;
+        tb = &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs++];
+        tb->pc = pc;
+        tb->cflags = 0;
     }
     tb = &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs++];
     tb->pc = pc;
@@ -718,11 +760,16 @@ void tb_free(TranslationBlock *tb)
     /* In practice this is mostly used for single use temporary TB
        Ignore the hard cases and just back up if this TB happens to
        be the last one generated.  */
+
+    tb_lock();
+
     if (tcg_ctx.tb_ctx.nb_tbs > 0 &&
             tb == &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs - 1]) {
         tcg_ctx.code_gen_ptr = tb->tc_ptr;
         tcg_ctx.tb_ctx.nb_tbs--;
     }
+
+    tb_unlock();
 }
 
 static inline void invalidate_page_bitmap(PageDesc *p)
@@ -773,6 +820,8 @@ void tb_flush(CPUArchState *env1)
 {
     CPUState *cpu = ENV_GET_CPU(env1);
 
+    tb_lock();
+
 #if defined(DEBUG_FLUSH)
     printf("qemu: flush code_size=%ld nb_tbs=%d avg_tb_size=%ld\n",
            (unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer),
@@ -797,6 +846,8 @@ void tb_flush(CPUArchState *env1)
     /* XXX: flush processor icache at this point if cache flush is
        expensive */
     tcg_ctx.tb_ctx.tb_flush_count++;
+
+    tb_unlock();
 }
 
 #ifdef DEBUG_TB_CHECK
@@ -806,6 +857,8 @@ static void tb_invalidate_check(target_ulong address)
     TranslationBlock *tb;
     int i;
 
+    tb_lock();
+
     address &= TARGET_PAGE_MASK;
     for (i = 0; i < CODE_GEN_PHYS_HASH_SIZE; i++) {
         for (tb = tb_ctx.tb_phys_hash[i]; tb != NULL; tb = tb->phys_hash_next) {
@@ -817,6 +870,8 @@ static void tb_invalidate_check(target_ulong address)
             }
         }
     }
+
+    tb_unlock();
 }
 
 /* verify that all the pages have correct rights for code */
@@ -825,6 +880,8 @@ static void tb_page_check(void)
     TranslationBlock *tb;
     int i, flags1, flags2;
 
+    tb_lock();
+
     for (i = 0; i < CODE_GEN_PHYS_HASH_SIZE; i++) {
         for (tb = tcg_ctx.tb_ctx.tb_phys_hash[i]; tb != NULL;
                 tb = tb->phys_hash_next) {
@@ -836,6 +893,8 @@ static void tb_page_check(void)
             }
         }
     }
+
+    tb_unlock();
 }
 
 #endif
@@ -916,6 +975,8 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
     tb_page_addr_t phys_pc;
     TranslationBlock *tb1, *tb2;
 
+    tb_lock();
+
     /* remove the TB from the hash list */
     phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
     h = tb_phys_hash_func(phys_pc);
@@ -963,6 +1024,7 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
     tb->jmp_first = (TranslationBlock *)((uintptr_t)tb | 2); /* fail safe */
 
     tcg_ctx.tb_ctx.tb_phys_invalidate_count++;
+    tb_unlock();
 }
 
 static void build_page_bitmap(PageDesc *p)
@@ -1004,6 +1066,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     target_ulong virt_page2;
     int code_gen_size;
 
+    tb_lock();
+
     phys_pc = get_page_addr_code(env, pc);
     if (use_icount) {
         cflags |= CF_USE_ICOUNT;
@@ -1032,6 +1096,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         phys_page2 = get_page_addr_code(env, virt_page2);
     }
     tb_link_page(tb, phys_pc, phys_page2);
+
+    tb_unlock();
     return tb;
 }
 
@@ -1330,13 +1396,15 @@ static inline void tb_alloc_page(TranslationBlock *tb,
 }
 
 /* add a new TB and link it to the physical page tables. phys_page2 is
-   (-1) to indicate that only one page contains the TB. */
+ * (-1) to indicate that only one page contains the TB. */
 static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
                          tb_page_addr_t phys_page2)
 {
     unsigned int h;
     TranslationBlock **ptb;
 
+    tb_lock();
+
     /* Grab the mmap lock to stop another thread invalidating this TB
        before we are done.  */
     mmap_lock();
@@ -1370,6 +1438,8 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
     tb_page_check();
 #endif
     mmap_unlock();
+
+    tb_unlock();
 }
 
 /* find the TB 'tb' such that tb[0].tc_ptr <= tc_ptr <
@@ -1378,31 +1448,34 @@ static TranslationBlock *tb_find_pc(uintptr_t tc_ptr)
 {
     int m_min, m_max, m;
     uintptr_t v;
-    TranslationBlock *tb;
-
-    if (tcg_ctx.tb_ctx.nb_tbs <= 0) {
-        return NULL;
-    }
-    if (tc_ptr < (uintptr_t)tcg_ctx.code_gen_buffer ||
-        tc_ptr >= (uintptr_t)tcg_ctx.code_gen_ptr) {
-        return NULL;
-    }
-    /* binary search (cf Knuth) */
-    m_min = 0;
-    m_max = tcg_ctx.tb_ctx.nb_tbs - 1;
-    while (m_min <= m_max) {
-        m = (m_min + m_max) >> 1;
-        tb = &tcg_ctx.tb_ctx.tbs[m];
-        v = (uintptr_t)tb->tc_ptr;
-        if (v == tc_ptr) {
-            return tb;
-        } else if (tc_ptr < v) {
-            m_max = m - 1;
-        } else {
-            m_min = m + 1;
+    TranslationBlock *tb = NULL;
+
+    tb_lock();
+
+    if ((tcg_ctx.tb_ctx.nb_tbs > 0)
+    && (tc_ptr >= (uintptr_t)tcg_ctx.code_gen_buffer &&
+        tc_ptr < (uintptr_t)tcg_ctx.code_gen_ptr)) {
+        /* binary search (cf Knuth) */
+        m_min = 0;
+        m_max = tcg_ctx.tb_ctx.nb_tbs - 1;
+        while (m_min <= m_max) {
+            m = (m_min + m_max) >> 1;
+            tb = &tcg_ctx.tb_ctx.tbs[m];
+            v = (uintptr_t)tb->tc_ptr;
+            if (v == tc_ptr) {
+                tb_unlock();
+                return tb;
+            } else if (tc_ptr < v) {
+                m_max = m - 1;
+            } else {
+                m_min = m + 1;
+            }
         }
+        tb = &tcg_ctx.tb_ctx.tbs[m_max];
     }
-    return &tcg_ctx.tb_ctx.tbs[m_max];
+
+    tb_unlock();
+    return tb;
 }
 
 #if !defined(CONFIG_USER_ONLY)
@@ -1564,6 +1637,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
     int direct_jmp_count, direct_jmp2_count, cross_page;
     TranslationBlock *tb;
 
+    tb_lock();
+
     target_code_size = 0;
     max_target_code_size = 0;
     cross_page = 0;
@@ -1619,6 +1694,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
             tcg_ctx.tb_ctx.tb_phys_invalidate_count);
     cpu_fprintf(f, "TLB flush count     %d\n", tlb_flush_count);
     tcg_dump_info(f, cpu_fprintf);
+
+    tb_unlock();
 }
 
 void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf)
-- 
1.9.0


* [Qemu-devel] [RFC PATCH V6 06/18] tcg: remove tcg_halt_cond global variable.
  2015-06-26 14:47 [Qemu-devel] [RFC PATCH V6 00/18] Multithread TCG fred.konrad
                   ` (4 preceding siblings ...)
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 05/18] protect TBContext with tb_lock fred.konrad
@ 2015-06-26 14:47 ` fred.konrad
  2015-06-26 15:02   ` Paolo Bonzini
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 07/18] Drop global lock during TCG code execution fred.konrad
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 82+ messages in thread
From: fred.konrad @ 2015-06-26 14:47 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, pbonzini, alex.bennee,
	fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

This removes the tcg_halt_cond global variable.
We need one QemuCond per virtual CPU for multithreaded TCG.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
---
 cpus.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/cpus.c b/cpus.c
index 2d62a35..79383df 100644
--- a/cpus.c
+++ b/cpus.c
@@ -813,7 +813,6 @@ static unsigned iothread_requesting_mutex;
 static QemuThread io_thread;
 
 static QemuThread *tcg_cpu_thread;
-static QemuCond *tcg_halt_cond;
 
 /* cpu creation */
 static QemuCond qemu_cpu_cond;
@@ -919,15 +918,13 @@ static void qemu_wait_io_event_common(CPUState *cpu)
     cpu->thread_kicked = false;
 }
 
-static void qemu_tcg_wait_io_event(void)
+static void qemu_tcg_wait_io_event(CPUState *cpu)
 {
-    CPUState *cpu;
-
     while (all_cpu_threads_idle()) {
        /* Start accounting real time to the virtual clock if the CPUs
           are idle.  */
         qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
-        qemu_cond_wait(tcg_halt_cond, &qemu_global_mutex);
+        qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
     }
 
     while (iothread_requesting_mutex) {
@@ -1047,7 +1044,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 
     /* wait for initial kick-off after machine start */
     while (first_cpu->stopped) {
-        qemu_cond_wait(tcg_halt_cond, &qemu_global_mutex);
+        qemu_cond_wait(first_cpu->halt_cond, &qemu_global_mutex);
 
         /* process any pending work */
         CPU_FOREACH(cpu) {
@@ -1068,7 +1065,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
                 qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
             }
         }
-        qemu_tcg_wait_io_event();
+        qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
     }
 
     return NULL;
@@ -1235,12 +1232,12 @@ static void qemu_tcg_init_vcpu(CPUState *cpu)
 
     tcg_cpu_address_space_init(cpu, cpu->as);
 
+    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+    qemu_cond_init(cpu->halt_cond);
+
     /* share a single thread for all cpus with TCG */
     if (!tcg_cpu_thread) {
         cpu->thread = g_malloc0(sizeof(QemuThread));
-        cpu->halt_cond = g_malloc0(sizeof(QemuCond));
-        qemu_cond_init(cpu->halt_cond);
-        tcg_halt_cond = cpu->halt_cond;
         snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
                  cpu->cpu_index);
         qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
@@ -1254,7 +1251,6 @@ static void qemu_tcg_init_vcpu(CPUState *cpu)
         tcg_cpu_thread = cpu->thread;
     } else {
         cpu->thread = tcg_cpu_thread;
-        cpu->halt_cond = tcg_halt_cond;
     }
 }
 
-- 
1.9.0


* [Qemu-devel] [RFC PATCH V6 07/18] Drop global lock during TCG code execution
  2015-06-26 14:47 [Qemu-devel] [RFC PATCH V6 00/18] Multithread TCG fred.konrad
                   ` (5 preceding siblings ...)
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 06/18] tcg: remove tcg_halt_cond global variable fred.konrad
@ 2015-06-26 14:47 ` fred.konrad
  2015-06-26 14:56   ` Jan Kiszka
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 08/18] cpu: remove exit_request global fred.konrad
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 82+ messages in thread
From: fred.konrad @ 2015-06-26 14:47 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, Jan Kiszka, pbonzini,
	alex.bennee, fred.konrad

From: Jan Kiszka <jan.kiszka@siemens.com>

This finally allows TCG to benefit from the iothread introduction: Drop
the global mutex while running pure TCG CPU code. Reacquire the lock
when entering MMIO or PIO emulation, or when leaving the TCG loop.

We have to revert a few optimizations for the current TCG threading
model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
kicking it in qemu_cpu_kick. We also need to disable RAM block
reordering until we have a more efficient locking mechanism at hand.

I'm pretty sure some cases are still broken, definitely SMP (we no
longer perform round-robin scheduling "by chance"). Still, a Linux x86
UP guest and my Musicpal ARM model boot fine here. These numbers
demonstrate where we gain something:

20338 jan       20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
20337 jan       20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm

The guest CPU was fully loaded, but the iothread could still run mostly
independently on a second core. Without the patch we don't get beyond

32206 jan       20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
32204 jan       20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm

We don't benefit significantly, though, when the guest is not fully
loading a host CPU.

Note that this patch depends on
http://thread.gmane.org/gmane.comp.emulators.qemu/118657

Changes from Fred Konrad:
  * Rebase on the current HEAD.
  * Fix a deadlock in qemu_devices_reset().
---
 cpus.c                    | 17 ++++-------------
 cputlb.c                  |  5 +++++
 exec.c                    | 25 +++++++++++++++++++++++++
 softmmu_template.h        |  5 +++++
 target-i386/misc_helper.c | 27 ++++++++++++++++++++++++---
 translate-all.c           |  2 ++
 vl.c                      |  6 ++++++
 7 files changed, 71 insertions(+), 16 deletions(-)

diff --git a/cpus.c b/cpus.c
index 79383df..23c316c 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1034,7 +1034,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
     qemu_tcg_init_cpu_signals();
     qemu_thread_get_self(cpu->thread);
 
-    qemu_mutex_lock(&qemu_global_mutex);
+    qemu_mutex_lock_iothread();
     CPU_FOREACH(cpu) {
         cpu->thread_id = qemu_get_thread_id();
         cpu->created = true;
@@ -1145,18 +1145,7 @@ bool qemu_in_vcpu_thread(void)
 
 void qemu_mutex_lock_iothread(void)
 {
-    atomic_inc(&iothread_requesting_mutex);
-    if (!tcg_enabled() || !first_cpu || !first_cpu->thread) {
-        qemu_mutex_lock(&qemu_global_mutex);
-        atomic_dec(&iothread_requesting_mutex);
-    } else {
-        if (qemu_mutex_trylock(&qemu_global_mutex)) {
-            qemu_cpu_kick_thread(first_cpu);
-            qemu_mutex_lock(&qemu_global_mutex);
-        }
-        atomic_dec(&iothread_requesting_mutex);
-        qemu_cond_broadcast(&qemu_io_proceeded_cond);
-    }
+    qemu_mutex_lock(&qemu_global_mutex);
 }
 
 void qemu_mutex_unlock_iothread(void)
@@ -1377,7 +1366,9 @@ static int tcg_cpu_exec(CPUArchState *env)
         cpu->icount_decr.u16.low = decr;
         cpu->icount_extra = count;
     }
+    qemu_mutex_unlock_iothread();
     ret = cpu_exec(env);
+    qemu_mutex_lock_iothread();
 #ifdef CONFIG_PROFILER
     tcg_time += profile_getclock() - ti;
 #endif
diff --git a/cputlb.c b/cputlb.c
index a506086..79fff1c 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -30,6 +30,9 @@
 #include "exec/ram_addr.h"
 #include "tcg/tcg.h"
 
+void qemu_mutex_lock_iothread(void);
+void qemu_mutex_unlock_iothread(void);
+
 //#define DEBUG_TLB
 //#define DEBUG_TLB_CHECK
 
@@ -125,8 +128,10 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
    can be detected */
 void tlb_protect_code(ram_addr_t ram_addr)
 {
+    qemu_mutex_lock_iothread();
     cpu_physical_memory_test_and_clear_dirty(ram_addr, TARGET_PAGE_SIZE,
                                              DIRTY_MEMORY_CODE);
+    qemu_mutex_unlock_iothread();
 }
 
 /* update the TLB so that writes in physical page 'phys_addr' are no longer
diff --git a/exec.c b/exec.c
index f7883d2..964e922 100644
--- a/exec.c
+++ b/exec.c
@@ -1881,6 +1881,7 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
             wp->hitaddr = vaddr;
             wp->hitattrs = attrs;
             if (!cpu->watchpoint_hit) {
+                qemu_mutex_unlock_iothread();
                 cpu->watchpoint_hit = wp;
                 tb_check_watchpoint(cpu);
                 if (wp->flags & BP_STOP_BEFORE_ACCESS) {
@@ -2740,6 +2741,7 @@ static inline uint32_t address_space_ldl_internal(AddressSpace *as, hwaddr addr,
     mr = address_space_translate(as, addr, &addr1, &l, false);
     if (l < 4 || !memory_access_is_direct(mr, false)) {
         /* I/O case */
+        qemu_mutex_lock_iothread();
         r = memory_region_dispatch_read(mr, addr1, &val, 4, attrs);
 #if defined(TARGET_WORDS_BIGENDIAN)
         if (endian == DEVICE_LITTLE_ENDIAN) {
@@ -2750,6 +2752,7 @@ static inline uint32_t address_space_ldl_internal(AddressSpace *as, hwaddr addr,
             val = bswap32(val);
         }
 #endif
+        qemu_mutex_unlock_iothread();
     } else {
         /* RAM case */
         ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(mr)
@@ -2829,6 +2832,7 @@ static inline uint64_t address_space_ldq_internal(AddressSpace *as, hwaddr addr,
                                  false);
     if (l < 8 || !memory_access_is_direct(mr, false)) {
         /* I/O case */
+        qemu_mutex_lock_iothread();
         r = memory_region_dispatch_read(mr, addr1, &val, 8, attrs);
 #if defined(TARGET_WORDS_BIGENDIAN)
         if (endian == DEVICE_LITTLE_ENDIAN) {
@@ -2839,6 +2843,7 @@ static inline uint64_t address_space_ldq_internal(AddressSpace *as, hwaddr addr,
             val = bswap64(val);
         }
 #endif
+        qemu_mutex_unlock_iothread();
     } else {
         /* RAM case */
         ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(mr)
@@ -2938,7 +2943,9 @@ static inline uint32_t address_space_lduw_internal(AddressSpace *as,
                                  false);
     if (l < 2 || !memory_access_is_direct(mr, false)) {
         /* I/O case */
+        qemu_mutex_lock_iothread();
         r = memory_region_dispatch_read(mr, addr1, &val, 2, attrs);
+        qemu_mutex_unlock_iothread();
 #if defined(TARGET_WORDS_BIGENDIAN)
         if (endian == DEVICE_LITTLE_ENDIAN) {
             val = bswap16(val);
@@ -3026,15 +3033,19 @@ void address_space_stl_notdirty(AddressSpace *as, hwaddr addr, uint32_t val,
     mr = address_space_translate(as, addr, &addr1, &l,
                                  true);
     if (l < 4 || !memory_access_is_direct(mr, true)) {
+        qemu_mutex_lock_iothread();
         r = memory_region_dispatch_write(mr, addr1, val, 4, attrs);
+        qemu_mutex_unlock_iothread();
     } else {
         addr1 += memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK;
         ptr = qemu_get_ram_ptr(addr1);
         stl_p(ptr, val);
 
+        qemu_mutex_lock_iothread();
         dirty_log_mask = memory_region_get_dirty_log_mask(mr);
         dirty_log_mask &= ~(1 << DIRTY_MEMORY_CODE);
         cpu_physical_memory_set_dirty_range(addr1, 4, dirty_log_mask);
+        qemu_mutex_unlock_iothread();
         r = MEMTX_OK;
     }
     if (result) {
@@ -3074,7 +3085,9 @@ static inline void address_space_stl_internal(AddressSpace *as,
             val = bswap32(val);
         }
 #endif
+        qemu_mutex_lock_iothread();
         r = memory_region_dispatch_write(mr, addr1, val, 4, attrs);
+        qemu_mutex_unlock_iothread();
     } else {
         /* RAM case */
         addr1 += memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK;
@@ -3090,7 +3103,9 @@ static inline void address_space_stl_internal(AddressSpace *as,
             stl_p(ptr, val);
             break;
         }
+        qemu_mutex_lock_iothread();
         invalidate_and_set_dirty(mr, addr1, 4);
+        qemu_mutex_unlock_iothread();
         r = MEMTX_OK;
     }
     if (result) {
@@ -3178,7 +3193,9 @@ static inline void address_space_stw_internal(AddressSpace *as,
             val = bswap16(val);
         }
 #endif
+        qemu_mutex_lock_iothread();
         r = memory_region_dispatch_write(mr, addr1, val, 2, attrs);
+        qemu_mutex_unlock_iothread();
     } else {
         /* RAM case */
         addr1 += memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK;
@@ -3194,7 +3211,9 @@ static inline void address_space_stw_internal(AddressSpace *as,
             stw_p(ptr, val);
             break;
         }
+        qemu_mutex_lock_iothread();
         invalidate_and_set_dirty(mr, addr1, 2);
+        qemu_mutex_unlock_iothread();
         r = MEMTX_OK;
     }
     if (result) {
@@ -3245,7 +3264,9 @@ void address_space_stq(AddressSpace *as, hwaddr addr, uint64_t val,
 {
     MemTxResult r;
     val = tswap64(val);
+    qemu_mutex_lock_iothread();
     r = address_space_rw(as, addr, attrs, (void *) &val, 8, 1);
+    qemu_mutex_unlock_iothread();
     if (result) {
         *result = r;
     }
@@ -3256,7 +3277,9 @@ void address_space_stq_le(AddressSpace *as, hwaddr addr, uint64_t val,
 {
     MemTxResult r;
     val = cpu_to_le64(val);
+    qemu_mutex_lock_iothread();
     r = address_space_rw(as, addr, attrs, (void *) &val, 8, 1);
+    qemu_mutex_unlock_iothread();
     if (result) {
         *result = r;
     }
@@ -3266,7 +3289,9 @@ void address_space_stq_be(AddressSpace *as, hwaddr addr, uint64_t val,
 {
     MemTxResult r;
     val = cpu_to_be64(val);
+    qemu_mutex_lock_iothread();
     r = address_space_rw(as, addr, attrs, (void *) &val, 8, 1);
+    qemu_mutex_unlock_iothread();
     if (result) {
         *result = r;
     }
diff --git a/softmmu_template.h b/softmmu_template.h
index d42d89d..18871f5 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -158,9 +158,12 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
         cpu_io_recompile(cpu, retaddr);
     }
 
+    qemu_mutex_lock_iothread();
+
     cpu->mem_io_vaddr = addr;
     memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT,
                                 iotlbentry->attrs);
+    qemu_mutex_unlock_iothread();
     return val;
 }
 #endif
@@ -378,10 +381,12 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
         cpu_io_recompile(cpu, retaddr);
     }
 
+    qemu_mutex_lock_iothread();
     cpu->mem_io_vaddr = addr;
     cpu->mem_io_pc = retaddr;
     memory_region_dispatch_write(mr, physaddr, val, 1 << SHIFT,
                                  iotlbentry->attrs);
+    qemu_mutex_unlock_iothread();
 }
 
 void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
diff --git a/target-i386/misc_helper.c b/target-i386/misc_helper.c
index 52c5d65..55f63bf 100644
--- a/target-i386/misc_helper.c
+++ b/target-i386/misc_helper.c
@@ -27,8 +27,10 @@ void helper_outb(CPUX86State *env, uint32_t port, uint32_t data)
 #ifdef CONFIG_USER_ONLY
     fprintf(stderr, "outb: port=0x%04x, data=%02x\n", port, data);
 #else
+    qemu_mutex_lock_iothread();
     address_space_stb(&address_space_io, port, data,
                       cpu_get_mem_attrs(env), NULL);
+    qemu_mutex_unlock_iothread();
 #endif
 }
 
@@ -38,8 +40,13 @@ target_ulong helper_inb(CPUX86State *env, uint32_t port)
     fprintf(stderr, "inb: port=0x%04x\n", port);
     return 0;
 #else
-    return address_space_ldub(&address_space_io, port,
+    target_ulong ret;
+
+    qemu_mutex_lock_iothread();
+    ret = address_space_ldub(&address_space_io, port,
                               cpu_get_mem_attrs(env), NULL);
+    qemu_mutex_unlock_iothread();
+    return ret;
 #endif
 }
 
@@ -48,8 +55,10 @@ void helper_outw(CPUX86State *env, uint32_t port, uint32_t data)
 #ifdef CONFIG_USER_ONLY
     fprintf(stderr, "outw: port=0x%04x, data=%04x\n", port, data);
 #else
+    qemu_mutex_lock_iothread();
     address_space_stw(&address_space_io, port, data,
                       cpu_get_mem_attrs(env), NULL);
+    qemu_mutex_unlock_iothread();
 #endif
 }
 
@@ -59,8 +68,13 @@ target_ulong helper_inw(CPUX86State *env, uint32_t port)
     fprintf(stderr, "inw: port=0x%04x\n", port);
     return 0;
 #else
-    return address_space_lduw(&address_space_io, port,
+    target_ulong ret;
+
+    qemu_mutex_lock_iothread();
+    ret = address_space_lduw(&address_space_io, port,
                               cpu_get_mem_attrs(env), NULL);
+    qemu_mutex_unlock_iothread();
+    return ret;
 #endif
 }
 
@@ -69,8 +83,10 @@ void helper_outl(CPUX86State *env, uint32_t port, uint32_t data)
 #ifdef CONFIG_USER_ONLY
     fprintf(stderr, "outw: port=0x%04x, data=%08x\n", port, data);
 #else
+    qemu_mutex_lock_iothread();
     address_space_stl(&address_space_io, port, data,
                       cpu_get_mem_attrs(env), NULL);
+    qemu_mutex_unlock_iothread();
 #endif
 }
 
@@ -80,8 +96,13 @@ target_ulong helper_inl(CPUX86State *env, uint32_t port)
     fprintf(stderr, "inl: port=0x%04x\n", port);
     return 0;
 #else
-    return address_space_ldl(&address_space_io, port,
+    target_ulong ret;
+
+    qemu_mutex_lock_iothread();
+    ret = address_space_ldl(&address_space_io, port,
                              cpu_get_mem_attrs(env), NULL);
+    qemu_mutex_unlock_iothread();
+    return ret;
 #endif
 }
 
diff --git a/translate-all.c b/translate-all.c
index c25b79b..ade2269 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1222,6 +1222,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
 #endif
 #ifdef TARGET_HAS_PRECISE_SMC
     if (current_tb_modified) {
+        qemu_mutex_unlock_iothread();
         /* we generate a block containing just the instruction
            modifying the memory. It will ensure that it cannot modify
            itself */
@@ -1326,6 +1327,7 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
     p->first_tb = NULL;
 #ifdef TARGET_HAS_PRECISE_SMC
     if (current_tb_modified) {
+        qemu_mutex_unlock_iothread();
         /* we generate a block containing just the instruction
            modifying the memory. It will ensure that it cannot modify
            itself */
diff --git a/vl.c b/vl.c
index 69ad90c..2983d44 100644
--- a/vl.c
+++ b/vl.c
@@ -1698,10 +1698,16 @@ void qemu_devices_reset(void)
 {
     QEMUResetEntry *re, *nre;
 
+    /*
+     * Some devices' reset handlers need to grab the global_mutex, so just
+     * release it here.
+     */
+    qemu_mutex_unlock_iothread();
     /* reset all devices */
     QTAILQ_FOREACH_SAFE(re, &reset_handlers, entry, nre) {
         re->func(re->opaque);
     }
+    qemu_mutex_lock_iothread();
 }
 
 void qemu_system_reset(bool report)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [Qemu-devel] [RFC PATCH V6 08/18] cpu: remove exit_request global.
  2015-06-26 14:47 [Qemu-devel] [RFC PATCH V6 00/18] Multithread TCG fred.konrad
                   ` (6 preceding siblings ...)
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 07/18] Drop global lock during TCG code execution fred.konrad
@ 2015-06-26 14:47 ` fred.konrad
  2015-06-26 15:03   ` Paolo Bonzini
  2015-07-07 13:04   ` Alex Bennée
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 09/18] cpu: add a tcg_executing flag fred.konrad
                   ` (9 subsequent siblings)
  17 siblings, 2 replies; 82+ messages in thread
From: fred.konrad @ 2015-06-26 14:47 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, pbonzini, alex.bennee,
	fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

This removes the exit_request global and adds a per-CPU flag to CPUState
instead.
Only the flag of the first CPU is used for the moment, as we are still
running with a single TCG thread.
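
The pattern, reduced to a minimal sketch outside QEMU (CPUState is cut down
to the one field that matters here; cpu_signal mirrors the patch, while the
thread function name and everything else is illustrative):

#include <signal.h>

typedef struct CPUState {
    volatile sig_atomic_t exit_request;  /* stand-in for the real field */
} CPUState;

/* Each TCG thread publishes "its" CPU here, so an asynchronous signal
 * delivered to that thread can flag exactly that CPU. */
static __thread CPUState *tcg_thread_cpu;

static void cpu_signal(int sig)
{
    /* async-signal context: nothing but a flag write */
    tcg_thread_cpu->exit_request = 1;
}

static void *vcpu_thread_fn(void *arg)
{
    tcg_thread_cpu = arg;   /* first thing each VCPU thread does */
    /* ... install cpu_signal, then run the execution loop, which
     * polls tcg_thread_cpu->exit_request ... */
    return NULL;
}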

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
---
 cpu-exec.c | 15 ---------------
 cpus.c     | 17 ++++++++++++++---
 2 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 5d9b518..0644383 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -364,8 +364,6 @@ static void cpu_handle_debug_exception(CPUArchState *env)
 
 /* main execution loop */
 
-volatile sig_atomic_t exit_request;
-
 int cpu_exec(CPUArchState *env)
 {
     CPUState *cpu = ENV_GET_CPU(env);
@@ -394,20 +392,8 @@ int cpu_exec(CPUArchState *env)
 
     current_cpu = cpu;
 
-    /* As long as current_cpu is null, up to the assignment just above,
-     * requests by other threads to exit the execution loop are expected to
-     * be issued using the exit_request global. We must make sure that our
-     * evaluation of the global value is performed past the current_cpu
-     * value transition point, which requires a memory barrier as well as
-     * an instruction scheduling constraint on modern architectures.  */
-    smp_mb();
-
     rcu_read_lock();
 
-    if (unlikely(exit_request)) {
-        cpu->exit_request = 1;
-    }
-
     cc->cpu_exec_enter(cpu);
 
     /* Calculate difference between guest clock and host clock.
@@ -496,7 +482,6 @@ int cpu_exec(CPUArchState *env)
                     }
                 }
                 if (unlikely(cpu->exit_request)) {
-                    cpu->exit_request = 0;
                     cpu->exception_index = EXCP_INTERRUPT;
                     cpu_loop_exit(cpu);
                 }
diff --git a/cpus.c b/cpus.c
index 23c316c..2541c56 100644
--- a/cpus.c
+++ b/cpus.c
@@ -137,6 +137,8 @@ typedef struct TimersState {
 } TimersState;
 
 static TimersState timers_state;
+/* CPU associated with this thread. */
+static __thread CPUState *tcg_thread_cpu;
 
 int64_t cpu_get_icount_raw(void)
 {
@@ -661,12 +663,18 @@ static void cpu_handle_guest_debug(CPUState *cpu)
     cpu->stopped = true;
 }
 
+/**
+ * cpu_signal
+ * Signal handler when using TCG.
+ */
 static void cpu_signal(int sig)
 {
     if (current_cpu) {
         cpu_exit(current_cpu);
     }
-    exit_request = 1;
+
+    /* FIXME: We might want to check if the cpu is running? */
+    tcg_thread_cpu->exit_request = true;
 }
 
 #ifdef CONFIG_LINUX
@@ -1031,6 +1039,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 {
     CPUState *cpu = arg;
 
+    tcg_thread_cpu = cpu;
     qemu_tcg_init_cpu_signals();
     qemu_thread_get_self(cpu->thread);
 
@@ -1393,7 +1402,8 @@ static void tcg_exec_all(void)
     if (next_cpu == NULL) {
         next_cpu = first_cpu;
     }
-    for (; next_cpu != NULL && !exit_request; next_cpu = CPU_NEXT(next_cpu)) {
+    for (; next_cpu != NULL && !first_cpu->exit_request;
+           next_cpu = CPU_NEXT(next_cpu)) {
         CPUState *cpu = next_cpu;
         CPUArchState *env = cpu->env_ptr;
 
@@ -1410,7 +1420,8 @@ static void tcg_exec_all(void)
             break;
         }
     }
-    exit_request = 0;
+
+    first_cpu->exit_request = 0;
 }
 
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [Qemu-devel] [RFC PATCH V6 09/18] cpu: add a tcg_executing flag.
  2015-06-26 14:47 [Qemu-devel] [RFC PATCH V6 00/18] Multithread TCG fred.konrad
                   ` (7 preceding siblings ...)
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 08/18] cpu: remove exit_request global fred.konrad
@ 2015-06-26 14:47 ` fred.konrad
  2015-07-07 13:23   ` Alex Bennée
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 10/18] tcg: switch on multithread fred.konrad
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 82+ messages in thread
From: fred.konrad @ 2015-06-26 14:47 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, pbonzini, alex.bennee,
	fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

We need to know whether any other VCPU is currently executing code; this flag
makes that check possible.
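
A hedged sketch of the check this enables (CPU_FOREACH and CPUState are
QEMU's; the helper name is invented for illustration):

static bool any_vcpu_executing(void)
{
    CPUState *cpu;

    CPU_FOREACH(cpu) {
        if (cpu->tcg_executing) {
            return true;    /* someone is still inside cpu_exec() */
        }
    }
    return false;           /* safe point: no VCPU runs TCG code */
}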

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
---
 cpu-exec.c        | 1 +
 cpus.c            | 1 +
 include/qom/cpu.h | 3 +++
 qom/cpu.c         | 1 +
 4 files changed, 6 insertions(+)

diff --git a/cpu-exec.c b/cpu-exec.c
index 0644383..de256d6 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -390,6 +390,7 @@ int cpu_exec(CPUArchState *env)
         cpu->halted = 0;
     }
 
+    cpu->tcg_executing = 1;
     current_cpu = cpu;
 
     rcu_read_lock();
diff --git a/cpus.c b/cpus.c
index 2541c56..0291620 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1377,6 +1377,7 @@ static int tcg_cpu_exec(CPUArchState *env)
     }
     qemu_mutex_unlock_iothread();
     ret = cpu_exec(env);
+    cpu->tcg_executing = 0;
     qemu_mutex_lock_iothread();
 #ifdef CONFIG_PROFILER
     tcg_time += profile_getclock() - ti;
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index af3c9e4..1464afa 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -222,6 +222,7 @@ struct kvm_run;
  * @stopped: Indicates the CPU has been artificially stopped.
  * @tcg_exit_req: Set to force TCG to stop executing linked TBs for this
  *           CPU and return to its top level loop.
+ * @tcg_executing: This TCG thread is in cpu_exec().
  * @singlestep_enabled: Flags for single-stepping.
  * @icount_extra: Instructions until next timer event.
  * @icount_decr: Number of cycles left, with interrupt flag in high bit.
@@ -315,6 +316,8 @@ struct CPUState {
        (absolute value) offset as small as possible.  This reduces code
        size, especially for hosts without large memory offsets.  */
     volatile sig_atomic_t tcg_exit_req;
+
+    volatile int tcg_executing;
 };
 
 QTAILQ_HEAD(CPUTailQ, CPUState);
diff --git a/qom/cpu.c b/qom/cpu.c
index 108bfa2..ff41a4c 100644
--- a/qom/cpu.c
+++ b/qom/cpu.c
@@ -249,6 +249,7 @@ static void cpu_common_reset(CPUState *cpu)
     cpu->icount_decr.u32 = 0;
     cpu->can_do_io = 0;
     cpu->exception_index = -1;
+    cpu->tcg_executing = 0;
     memset(cpu->tb_jmp_cache, 0, TB_JMP_CACHE_SIZE * sizeof(void *));
 }
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [Qemu-devel] [RFC PATCH V6 10/18] tcg: switch on multithread.
  2015-06-26 14:47 [Qemu-devel] [RFC PATCH V6 00/18] Multithread TCG fred.konrad
                   ` (8 preceding siblings ...)
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 09/18] cpu: add a tcg_executing flag fred.konrad
@ 2015-06-26 14:47 ` fred.konrad
  2015-07-07 13:40   ` Alex Bennée
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 11/18] cpus: make qemu_cpu_kick_thread public fred.konrad
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 82+ messages in thread
From: fred.konrad @ 2015-06-26 14:47 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, pbonzini, alex.bennee,
	fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

This switches on multithreaded TCG: each VCPU now runs in its own thread (a
sketch of the resulting per-VCPU thread loop follows the changelog below).

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>

Changes V5 -> V6:
  * make qemu_cpu_kick call qemu_cpu_kick_thread in the TCG case as well.
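
The shape each per-VCPU thread takes after this patch, as a rough sketch
under the names the patch actually uses (setup and icount handling omitted):

static void *qemu_tcg_cpu_thread_fn(void *arg)   /* one thread per VCPU */
{
    CPUState *cpu = arg;

    /* per-thread setup: signals, thread id, created flag ... */
    while (1) {
        if (!cpu->stopped) {
            tcg_exec_all(cpu);           /* runs only this VCPU */
        }
        qemu_tcg_wait_io_event(cpu);     /* sleeps on cpu->halt_cond */
    }
    return NULL;
}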
---
 cpus.c | 95 ++++++++++++++++++++++++------------------------------------------
 1 file changed, 34 insertions(+), 61 deletions(-)

diff --git a/cpus.c b/cpus.c
index 0291620..08267ed 100644
--- a/cpus.c
+++ b/cpus.c
@@ -65,7 +65,6 @@
 
 #endif /* CONFIG_LINUX */
 
-static CPUState *next_cpu;
 int64_t max_delay;
 int64_t max_advance;
 
@@ -820,8 +819,6 @@ static unsigned iothread_requesting_mutex;
 
 static QemuThread io_thread;
 
-static QemuThread *tcg_cpu_thread;
-
 /* cpu creation */
 static QemuCond qemu_cpu_cond;
 /* system init */
@@ -928,10 +925,13 @@ static void qemu_wait_io_event_common(CPUState *cpu)
 
 static void qemu_tcg_wait_io_event(CPUState *cpu)
 {
-    while (all_cpu_threads_idle()) {
-       /* Start accounting real time to the virtual clock if the CPUs
-          are idle.  */
-        qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
+    while (cpu_thread_is_idle(cpu)) {
+        /* Start accounting real time to the virtual clock if the CPUs
+         * are idle.
+         */
+        if ((all_cpu_threads_idle()) && (cpu->cpu_index == 0)) {
+            qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
+        }
         qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
     }
 
@@ -939,9 +939,7 @@ static void qemu_tcg_wait_io_event(CPUState *cpu)
         qemu_cond_wait(&qemu_io_proceeded_cond, &qemu_global_mutex);
     }
 
-    CPU_FOREACH(cpu) {
-        qemu_wait_io_event_common(cpu);
-    }
+    qemu_wait_io_event_common(cpu);
 }
 
 static void qemu_kvm_wait_io_event(CPUState *cpu)
@@ -1033,7 +1031,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
 #endif
 }
 
-static void tcg_exec_all(void);
+static void tcg_exec_all(CPUState *cpu);
 
 static void *qemu_tcg_cpu_thread_fn(void *arg)
 {
@@ -1044,37 +1042,26 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
     qemu_thread_get_self(cpu->thread);
 
     qemu_mutex_lock_iothread();
-    CPU_FOREACH(cpu) {
-        cpu->thread_id = qemu_get_thread_id();
-        cpu->created = true;
-        cpu->can_do_io = 1;
-    }
-    qemu_cond_signal(&qemu_cpu_cond);
-
-    /* wait for initial kick-off after machine start */
-    while (first_cpu->stopped) {
-        qemu_cond_wait(first_cpu->halt_cond, &qemu_global_mutex);
-
-        /* process any pending work */
-        CPU_FOREACH(cpu) {
-            qemu_wait_io_event_common(cpu);
-        }
-    }
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->created = true;
+    cpu->can_do_io = 1;
 
-    /* process any pending work */
-    exit_request = 1;
+    qemu_cond_signal(&qemu_cpu_cond);
 
     while (1) {
-        tcg_exec_all();
+        if (!cpu->stopped) {
+            tcg_exec_all(cpu);
 
-        if (use_icount) {
-            int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
+            if (use_icount) {
+                int64_t deadline =
+                    qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
 
-            if (deadline == 0) {
-                qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+                if (deadline == 0) {
+                    qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+                }
             }
         }
-        qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
+        qemu_tcg_wait_io_event(cpu);
     }
 
     return NULL;
@@ -1122,7 +1109,7 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
 void qemu_cpu_kick(CPUState *cpu)
 {
     qemu_cond_broadcast(cpu->halt_cond);
-    if (!tcg_enabled() && !cpu->thread_kicked) {
+    if (!cpu->thread_kicked) {
         qemu_cpu_kick_thread(cpu);
         cpu->thread_kicked = true;
     }
@@ -1232,23 +1219,15 @@ static void qemu_tcg_init_vcpu(CPUState *cpu)
 
     cpu->halt_cond = g_malloc0(sizeof(QemuCond));
     qemu_cond_init(cpu->halt_cond);
-
-    /* share a single thread for all cpus with TCG */
-    if (!tcg_cpu_thread) {
-        cpu->thread = g_malloc0(sizeof(QemuThread));
-        snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
-                 cpu->cpu_index);
-        qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
-                           cpu, QEMU_THREAD_JOINABLE);
+    cpu->thread = g_malloc0(sizeof(QemuThread));
+    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG", cpu->cpu_index);
+    qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn, cpu,
+                       QEMU_THREAD_JOINABLE);
 #ifdef _WIN32
-        cpu->hThread = qemu_thread_get_handle(cpu->thread);
+    cpu->hThread = qemu_thread_get_handle(cpu->thread);
 #endif
-        while (!cpu->created) {
-            qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
-        }
-        tcg_cpu_thread = cpu->thread;
-    } else {
-        cpu->thread = tcg_cpu_thread;
+    while (!cpu->created) {
+        qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
     }
 }
 
@@ -1393,21 +1372,15 @@ static int tcg_cpu_exec(CPUArchState *env)
     return ret;
 }
 
-static void tcg_exec_all(void)
+static void tcg_exec_all(CPUState *cpu)
 {
     int r;
+    CPUArchState *env = cpu->env_ptr;
 
     /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
     qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
 
-    if (next_cpu == NULL) {
-        next_cpu = first_cpu;
-    }
-    for (; next_cpu != NULL && !first_cpu->exit_request;
-           next_cpu = CPU_NEXT(next_cpu)) {
-        CPUState *cpu = next_cpu;
-        CPUArchState *env = cpu->env_ptr;
-
+    while (!cpu->exit_request) {
         qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
                           (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
 
@@ -1422,7 +1395,7 @@ static void tcg_exec_all(void)
         }
     }
 
-    first_cpu->exit_request = 0;
+    cpu->exit_request = 0;
 }
 
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [Qemu-devel] [RFC PATCH V6 11/18] cpus: make qemu_cpu_kick_thread public.
  2015-06-26 14:47 [Qemu-devel] [RFC PATCH V6 00/18] Multithread TCG fred.konrad
                   ` (9 preceding siblings ...)
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 10/18] tcg: switch on multithread fred.konrad
@ 2015-06-26 14:47 ` fred.konrad
  2015-07-07 15:11   ` Alex Bennée
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 12/18] Use atomic cmpxchg to atomically check the exclusive value in a STREX fred.konrad
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 82+ messages in thread
From: fred.konrad @ 2015-06-26 14:47 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, pbonzini, alex.bennee,
	fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

This makes qemu_cpu_kick_thread public.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
---
 cpus.c                | 2 +-
 include/sysemu/cpus.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index 08267ed..5f13d73 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1067,7 +1067,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
     return NULL;
 }
 
-static void qemu_cpu_kick_thread(CPUState *cpu)
+void qemu_cpu_kick_thread(CPUState *cpu)
 {
 #ifndef _WIN32
     int err;
diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
index 3f162a9..4f95b72 100644
--- a/include/sysemu/cpus.h
+++ b/include/sysemu/cpus.h
@@ -6,6 +6,7 @@ void qemu_init_cpu_loop(void);
 void resume_all_vcpus(void);
 void pause_all_vcpus(void);
 void cpu_stop_current(void);
+void qemu_cpu_kick_thread(CPUState *cpu);
 
 void cpu_synchronize_all_states(void);
 void cpu_synchronize_all_post_reset(void);
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [Qemu-devel] [RFC PATCH V6 12/18] Use atomic cmpxchg to atomically check the exclusive value in a STREX
  2015-06-26 14:47 [Qemu-devel] [RFC PATCH V6 00/18] Multithread TCG fred.konrad
                   ` (10 preceding siblings ...)
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 11/18] cpus: make qemu_cpu_kick_thread public fred.konrad
@ 2015-06-26 14:47 ` fred.konrad
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 13/18] cpu: introduce async_run_safe_work_on_cpu fred.konrad
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 82+ messages in thread
From: fred.konrad @ 2015-06-26 14:47 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, pbonzini, alex.bennee,
	fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

This mechanism replaces the existing load/store exclusive mechanism, which
appears to be broken for multithreaded execution.
It follows the intention of the existing mechanism: it stores the target
address and data values during a load operation and checks that they remain
unchanged before a store.

In common with the older approach, this provides weaker semantics than
required, in that a different processor could write the same value back as a
non-exclusive write; in practice, however, this seems to be irrelevant.

The old implementation didn't correctly store its values as globals, but
rather kept a local copy per CPU.

This new mechanism stores the values globally and also uses the atomic
cmpxchg macros to ensure atomicity - it is therefore very efficient and
thread-safe.
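
The core of the check, restated as a self-contained C11 fragment for the
32-bit case (QEMU uses its own atomic_cmpxchg macro; this only illustrates
the intended semantics):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* STREX succeeds (returns 0) only if memory still holds the value the
 * LDREX observed; any intervening conflicting store makes it fail. */
static bool strex_word(_Atomic uint32_t *mem, uint32_t ldrex_val,
                       uint32_t new_val)
{
    uint32_t expected = ldrex_val;
    return atomic_compare_exchange_strong(mem, &expected, new_val);
}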

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>

Changes:
  V5 -> V6:
    * Use spinlock instead of mutex.
    * Fix the length for address map.
    * Fix the return address for tlb_fill.
  V4 -> V5:
    * Remove atomic_check and atomic_release which were unused.
---
 target-arm/cpu.c       |  21 ++++++++
 target-arm/cpu.h       |   6 +++
 target-arm/helper.c    |  13 +++++
 target-arm/helper.h    |   4 ++
 target-arm/op_helper.c | 128 ++++++++++++++++++++++++++++++++++++++++++++++++-
 target-arm/translate.c |  98 +++++++------------------------------
 6 files changed, 188 insertions(+), 82 deletions(-)

diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index 80669a6..817ba6b 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -30,6 +30,26 @@
 #include "sysemu/kvm.h"
 #include "kvm_arm.h"
 
+/* Protect the cpu_exclusive_* variables. */
+__thread bool cpu_have_exclusive_lock;
+QemuSpin cpu_exclusive_lock;
+
+inline void arm_exclusive_lock(void)
+{
+    if (!cpu_have_exclusive_lock) {
+        qemu_spin_lock(&cpu_exclusive_lock);
+        cpu_have_exclusive_lock = true;
+    }
+}
+
+inline void arm_exclusive_unlock(void)
+{
+    if (cpu_have_exclusive_lock) {
+        cpu_have_exclusive_lock = false;
+        qemu_spin_unlock(&cpu_exclusive_lock);
+    }
+}
+
 static void arm_cpu_set_pc(CPUState *cs, vaddr value)
 {
     ARMCPU *cpu = ARM_CPU(cs);
@@ -436,6 +456,7 @@ static void arm_cpu_initfn(Object *obj)
         cpu->psci_version = 2; /* TCG implements PSCI 0.2 */
         if (!inited) {
             inited = true;
+            qemu_spin_init(&cpu_exclusive_lock);
             arm_translate_init();
         }
     }
diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 80297b3..fbbb396 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -515,6 +515,9 @@ static inline bool is_a64(CPUARMState *env)
 int cpu_arm_signal_handler(int host_signum, void *pinfo,
                            void *puc);
 
+bool arm_get_phys_addr(CPUARMState *env, target_ulong address, int access_type,
+                       hwaddr *phys_ptr, int *prot, target_ulong *page_size);
+
 /**
  * pmccntr_sync
  * @env: CPUARMState
@@ -1933,4 +1936,7 @@ enum {
     QEMU_PSCI_CONDUIT_HVC = 2,
 };
 
+void arm_exclusive_lock(void);
+void arm_exclusive_unlock(void);
+
 #endif
diff --git a/target-arm/helper.c b/target-arm/helper.c
index aa34159..ad3d5da 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -24,6 +24,15 @@ static inline bool get_phys_addr(CPUARMState *env, target_ulong address,
 #define PMCRE   0x1
 #endif
 
+bool arm_get_phys_addr(CPUARMState *env, target_ulong address, int access_type,
+                       hwaddr *phys_ptr, int *prot, target_ulong *page_size)
+{
+    MemTxAttrs attrs = {};
+    uint32_t fsr;
+    return get_phys_addr(env, address, access_type, cpu_mmu_index(env),
+                         phys_ptr, &attrs, prot, page_size, &fsr);
+}
+
 static int vfp_gdb_get_reg(CPUARMState *env, uint8_t *buf, int reg)
 {
     int nregs;
@@ -4824,6 +4833,10 @@ void arm_cpu_do_interrupt(CPUState *cs)
 
     arm_log_exception(cs->exception_index);
 
+    arm_exclusive_lock();
+    env->exclusive_addr = -1;
+    arm_exclusive_unlock();
+
     if (arm_is_psci_call(cpu, cs->exception_index)) {
         arm_handle_psci_call(cpu);
         qemu_log_mask(CPU_LOG_INT, "...handled as PSCI call\n");
diff --git a/target-arm/helper.h b/target-arm/helper.h
index fc885de..94a6744 100644
--- a/target-arm/helper.h
+++ b/target-arm/helper.h
@@ -529,6 +529,10 @@ DEF_HELPER_2(dc_zva, void, env, i64)
 DEF_HELPER_FLAGS_2(neon_pmull_64_lo, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(neon_pmull_64_hi, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 
+DEF_HELPER_4(atomic_cmpxchg64, i32, env, i32, i64, i32)
+DEF_HELPER_1(atomic_clear, void, env)
+DEF_HELPER_3(atomic_claim, void, env, i32, i64)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #endif
diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
index 7fa32c4..ae7ceab 100644
--- a/target-arm/op_helper.c
+++ b/target-arm/op_helper.c
@@ -30,12 +30,139 @@ static void raise_exception(CPUARMState *env, uint32_t excp,
     CPUState *cs = CPU(arm_env_get_cpu(env));
 
     assert(!excp_is_internal(excp));
+    arm_exclusive_lock();
     cs->exception_index = excp;
     env->exception.syndrome = syndrome;
     env->exception.target_el = target_el;
+    /*
+     * We MAY already have the lock - in which case we are exiting the
+     * instruction due to an exception. Otherwise we better make sure we are not
+     * about to enter a STREX anyway.
+     */
+    env->exclusive_addr = -1;
+    arm_exclusive_unlock();
     cpu_loop_exit(cs);
 }
 
+/* NB return 1 for fail, 0 for pass */
+uint32_t HELPER(atomic_cmpxchg64)(CPUARMState *env, uint32_t addr,
+                                  uint64_t newval, uint32_t size)
+{
+    ARMCPU *cpu = arm_env_get_cpu(env);
+    CPUState *cs = CPU(cpu);
+
+    uintptr_t retaddr = GETRA();
+    bool result = false;
+    hwaddr len = 1 << size;
+
+    hwaddr paddr;
+    target_ulong page_size;
+    int prot;
+
+    arm_exclusive_lock();
+
+    if (env->exclusive_addr != addr) {
+        arm_exclusive_unlock();
+        return 1;
+    }
+
+    if (arm_get_phys_addr(env, addr, 1, &paddr, &prot, &page_size)) {
+        tlb_fill(ENV_GET_CPU(env), addr, MMU_DATA_STORE, cpu_mmu_index(env),
+                 retaddr);
+        if (arm_get_phys_addr(env, addr, 1, &paddr, &prot, &page_size)) {
+            arm_exclusive_unlock();
+            return 1;
+        }
+    }
+
+    switch (size) {
+    case 0:
+    {
+        uint8_t oldval, *p;
+        p = address_space_map(cs->as, paddr, &len, true);
+        if (len == 1 << size) {
+            oldval = (uint8_t)env->exclusive_val;
+            result = (atomic_cmpxchg(p, oldval, (uint8_t)newval) == oldval);
+        }
+        address_space_unmap(cs->as, p, len, true, result ? len : 0);
+    }
+    break;
+    case 1:
+    {
+        uint16_t oldval, *p;
+        p = address_space_map(cs->as, paddr, &len, true);
+        if (len == 1 << size) {
+            oldval = (uint16_t)env->exclusive_val;
+            result = (atomic_cmpxchg(p, oldval, (uint16_t)newval) == oldval);
+        }
+        address_space_unmap(cs->as, p, len, true, result ? len : 0);
+    }
+    break;
+    case 2:
+    {
+        uint32_t oldval, *p;
+        p = address_space_map(cs->as, paddr, &len, true);
+        if (len == 1 << size) {
+            oldval = (uint32_t)env->exclusive_val;
+            result = (atomic_cmpxchg(p, oldval, (uint32_t)newval) == oldval);
+        }
+        address_space_unmap(cs->as, p, len, true, result ? len : 0);
+    }
+    break;
+    case 3:
+    {
+        uint64_t oldval, *p;
+        p = address_space_map(cs->as, paddr, &len, true);
+        if (len == 1 << size) {
+            oldval = (uint64_t)env->exclusive_val;
+            result = (atomic_cmpxchg(p, oldval, (uint64_t)newval) == oldval);
+        }
+        address_space_unmap(cs->as, p, len, true, result ? len : 0);
+    }
+    break;
+    default:
+        abort();
+    break;
+    }
+
+    env->exclusive_addr = -1;
+    arm_exclusive_unlock();
+    if (result) {
+        return 0;
+    } else {
+        return 1;
+    }
+}
+
+void HELPER(atomic_clear)(CPUARMState *env)
+{
+    /* make sure no STREX is about to start */
+    arm_exclusive_lock();
+    env->exclusive_addr = -1;
+    arm_exclusive_unlock();
+}
+
+void HELPER(atomic_claim)(CPUARMState *env, uint32_t addr, uint64_t val)
+{
+    CPUState *cpu;
+    CPUARMState *current_cpu;
+
+    /* ensure that no STREX is currently executing */
+    arm_exclusive_lock();
+
+    CPU_FOREACH(cpu) {
+        current_cpu = &ARM_CPU(cpu)->env;
+        if (current_cpu->exclusive_addr == addr) {
+            /* We steal the atomic of this CPU. */
+            current_cpu->exclusive_addr = -1;
+        }
+    }
+
+    env->exclusive_val = val;
+    env->exclusive_addr = addr;
+    arm_exclusive_unlock();
+}
+
 static int exception_target_el(CPUARMState *env)
 {
     int target_el = MAX(1, arm_current_el(env));
@@ -583,7 +710,6 @@ void HELPER(exception_return)(CPUARMState *env)
 
     aarch64_save_sp(env, cur_el);
 
-    env->exclusive_addr = -1;
 
     /* We must squash the PSTATE.SS bit to zero unless both of the
      * following hold:
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 47345aa..80302cd 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -65,8 +65,8 @@ TCGv_ptr cpu_env;
 static TCGv_i64 cpu_V0, cpu_V1, cpu_M0;
 static TCGv_i32 cpu_R[16];
 static TCGv_i32 cpu_CF, cpu_NF, cpu_VF, cpu_ZF;
-static TCGv_i64 cpu_exclusive_addr;
 static TCGv_i64 cpu_exclusive_val;
+static TCGv_i64 cpu_exclusive_addr;
 #ifdef CONFIG_USER_ONLY
 static TCGv_i64 cpu_exclusive_test;
 static TCGv_i32 cpu_exclusive_info;
@@ -7391,6 +7391,7 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
                                TCGv_i32 addr, int size)
 {
     TCGv_i32 tmp = tcg_temp_new_i32();
+    TCGv_i64 val = tcg_temp_new_i64();
 
     s->is_ldex = true;
 
@@ -7415,20 +7416,20 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
 
         tcg_gen_addi_i32(tmp2, addr, 4);
         gen_aa32_ld32u(tmp3, tmp2, get_mem_index(s));
+        tcg_gen_concat_i32_i64(val, tmp, tmp3);
         tcg_temp_free_i32(tmp2);
-        tcg_gen_concat_i32_i64(cpu_exclusive_val, tmp, tmp3);
         store_reg(s, rt2, tmp3);
     } else {
-        tcg_gen_extu_i32_i64(cpu_exclusive_val, tmp);
+        tcg_gen_extu_i32_i64(val, tmp);
     }
-
+    gen_helper_atomic_claim(cpu_env, addr, val);
+    tcg_temp_free_i64(val);
     store_reg(s, rt, tmp);
-    tcg_gen_extu_i32_i64(cpu_exclusive_addr, addr);
 }
 
 static void gen_clrex(DisasContext *s)
 {
-    tcg_gen_movi_i64(cpu_exclusive_addr, -1);
+    gen_helper_atomic_clear(cpu_env);
 }
 
 #ifdef CONFIG_USER_ONLY
@@ -7445,84 +7446,19 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
                                 TCGv_i32 addr, int size)
 {
     TCGv_i32 tmp;
-    TCGv_i64 val64, extaddr;
-    TCGLabel *done_label;
-    TCGLabel *fail_label;
-
-    /* if (env->exclusive_addr == addr && env->exclusive_val == [addr]) {
-         [addr] = {Rt};
-         {Rd} = 0;
-       } else {
-         {Rd} = 1;
-       } */
-    fail_label = gen_new_label();
-    done_label = gen_new_label();
-    extaddr = tcg_temp_new_i64();
-    tcg_gen_extu_i32_i64(extaddr, addr);
-    tcg_gen_brcond_i64(TCG_COND_NE, extaddr, cpu_exclusive_addr, fail_label);
-    tcg_temp_free_i64(extaddr);
-
-    tmp = tcg_temp_new_i32();
-    switch (size) {
-    case 0:
-        gen_aa32_ld8u(tmp, addr, get_mem_index(s));
-        break;
-    case 1:
-        gen_aa32_ld16u(tmp, addr, get_mem_index(s));
-        break;
-    case 2:
-    case 3:
-        gen_aa32_ld32u(tmp, addr, get_mem_index(s));
-        break;
-    default:
-        abort();
-    }
-
-    val64 = tcg_temp_new_i64();
-    if (size == 3) {
-        TCGv_i32 tmp2 = tcg_temp_new_i32();
-        TCGv_i32 tmp3 = tcg_temp_new_i32();
-        tcg_gen_addi_i32(tmp2, addr, 4);
-        gen_aa32_ld32u(tmp3, tmp2, get_mem_index(s));
-        tcg_temp_free_i32(tmp2);
-        tcg_gen_concat_i32_i64(val64, tmp, tmp3);
-        tcg_temp_free_i32(tmp3);
-    } else {
-        tcg_gen_extu_i32_i64(val64, tmp);
-    }
-    tcg_temp_free_i32(tmp);
-
-    tcg_gen_brcond_i64(TCG_COND_NE, val64, cpu_exclusive_val, fail_label);
-    tcg_temp_free_i64(val64);
+    TCGv_i32 tmp2;
+    TCGv_i64 val = tcg_temp_new_i64();
+    TCGv_i32 tmp_size = tcg_const_i32(size);
 
     tmp = load_reg(s, rt);
-    switch (size) {
-    case 0:
-        gen_aa32_st8(tmp, addr, get_mem_index(s));
-        break;
-    case 1:
-        gen_aa32_st16(tmp, addr, get_mem_index(s));
-        break;
-    case 2:
-    case 3:
-        gen_aa32_st32(tmp, addr, get_mem_index(s));
-        break;
-    default:
-        abort();
-    }
+    tmp2 = load_reg(s, rt2);
+    tcg_gen_concat_i32_i64(val, tmp, tmp2);
     tcg_temp_free_i32(tmp);
-    if (size == 3) {
-        tcg_gen_addi_i32(addr, addr, 4);
-        tmp = load_reg(s, rt2);
-        gen_aa32_st32(tmp, addr, get_mem_index(s));
-        tcg_temp_free_i32(tmp);
-    }
-    tcg_gen_movi_i32(cpu_R[rd], 0);
-    tcg_gen_br(done_label);
-    gen_set_label(fail_label);
-    tcg_gen_movi_i32(cpu_R[rd], 1);
-    gen_set_label(done_label);
-    tcg_gen_movi_i64(cpu_exclusive_addr, -1);
+    tcg_temp_free_i32(tmp2);
+
+    gen_helper_atomic_cmpxchg64(cpu_R[rd], cpu_env, addr, val, tmp_size);
+    tcg_temp_free_i64(val);
+    tcg_temp_free_i32(tmp_size);
 }
 #endif
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [Qemu-devel] [RFC PATCH V6 13/18] cpu: introduce async_run_safe_work_on_cpu.
  2015-06-26 14:47 [Qemu-devel] [RFC PATCH V6 00/18] Multithread TCG fred.konrad
                   ` (11 preceding siblings ...)
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 12/18] Use atomic cmpxchg to atomically check the exclusive value in a STREX fred.konrad
@ 2015-06-26 14:47 ` fred.konrad
  2015-06-26 15:35   ` Paolo Bonzini
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 14/18] add a callback when tb_invalidate is called fred.konrad
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 82+ messages in thread
From: fred.konrad @ 2015-06-26 14:47 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, pbonzini, alex.bennee,
	fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

We already had async_run_on_cpu, but some tb_flush/invalidate tasks need all
VCPUs to be outside their execution loop:

async_run_safe_work_on_cpu schedules work on a VCPU, but that work only
starts once no VCPU is executing code any more.
While safe work is pending, cpu_has_work returns true, so cpu_exec returns
and the VCPUs cannot enter their execution loop. cpu_thread_is_idle returns
false, so as soon as all VCPUs are stop || stopped, the safe work queue can
be flushed.
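
A hedged usage sketch (do_flush_work is an invented wrapper; tb_flush and
first_cpu are QEMU's):

static void do_flush_work(void *opaque)
{
    /* By the time this runs, no VCPU is inside its execution loop, so
     * the flush cannot pull code out from under a running CPU. */
    tb_flush(opaque);
}

/* from any thread: */
async_run_safe_work_on_cpu(first_cpu, do_flush_work, env);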

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
---
 cpu-exec.c        |  5 ++++
 cpus.c            | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 include/qom/cpu.h | 21 +++++++++++++++++
 3 files changed, 93 insertions(+), 1 deletion(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index de256d6..d6442cd 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -382,6 +382,11 @@ int cpu_exec(CPUArchState *env)
     volatile bool have_tb_lock = false;
 #endif
 
+    if (async_safe_work_pending()) {
+        cpu->exit_request = 1;
+        return 0;
+    }
+
     if (cpu->halted) {
         if (!cpu_has_work(cpu)) {
             return EXCP_HALTED;
diff --git a/cpus.c b/cpus.c
index 5f13d73..aee445a 100644
--- a/cpus.c
+++ b/cpus.c
@@ -75,7 +75,7 @@ bool cpu_is_stopped(CPUState *cpu)
 
 bool cpu_thread_is_idle(CPUState *cpu)
 {
-    if (cpu->stop || cpu->queued_work_first) {
+    if (cpu->stop || cpu->queued_work_first || cpu->queued_safe_work_first) {
         return false;
     }
     if (cpu_is_stopped(cpu)) {
@@ -892,6 +892,69 @@ void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
     qemu_cpu_kick(cpu);
 }
 
+void async_run_safe_work_on_cpu(CPUState *cpu, void (*func)(void *data),
+                                void *data)
+{
+    struct qemu_work_item *wi;
+
+    wi = g_malloc0(sizeof(struct qemu_work_item));
+    wi->func = func;
+    wi->data = data;
+    wi->free = true;
+    if (cpu->queued_safe_work_first == NULL) {
+        cpu->queued_safe_work_first = wi;
+    } else {
+        cpu->queued_safe_work_last->next = wi;
+    }
+    cpu->queued_safe_work_last = wi;
+    wi->next = NULL;
+    wi->done = false;
+
+    CPU_FOREACH(cpu) {
+        qemu_cpu_kick_thread(cpu);
+    }
+}
+
+static void flush_queued_safe_work(CPUState *cpu)
+{
+    struct qemu_work_item *wi;
+    CPUState *other_cpu;
+
+    if (cpu->queued_safe_work_first == NULL) {
+        return;
+    }
+
+    CPU_FOREACH(other_cpu) {
+        if (other_cpu->tcg_executing != 0) {
+            return;
+        }
+    }
+
+    while ((wi = cpu->queued_safe_work_first)) {
+        cpu->queued_safe_work_first = wi->next;
+        wi->func(wi->data);
+        wi->done = true;
+        if (wi->free) {
+            g_free(wi);
+        }
+    }
+    cpu->queued_safe_work_last = NULL;
+    qemu_cond_broadcast(&qemu_work_cond);
+}
+
+bool async_safe_work_pending(void)
+{
+    CPUState *cpu;
+
+    CPU_FOREACH(cpu) {
+        if (cpu->queued_safe_work_first) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
 static void flush_queued_work(CPUState *cpu)
 {
     struct qemu_work_item *wi;
@@ -919,6 +982,9 @@ static void qemu_wait_io_event_common(CPUState *cpu)
         cpu->stopped = true;
         qemu_cond_signal(&qemu_pause_cond);
     }
+    qemu_mutex_unlock_iothread();
+    flush_queued_safe_work(cpu);
+    qemu_mutex_lock_iothread();
     flush_queued_work(cpu);
     cpu->thread_kicked = false;
 }
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 1464afa..8f3fe56 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -260,6 +260,7 @@ struct CPUState {
     bool running;
     struct QemuCond *halt_cond;
     struct qemu_work_item *queued_work_first, *queued_work_last;
+    struct qemu_work_item *queued_safe_work_first, *queued_safe_work_last;
     bool thread_kicked;
     bool created;
     bool stop;
@@ -548,6 +549,26 @@ void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data);
 void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data);
 
 /**
+ * async_run_safe_work_on_cpu:
+ * @cpu: The vCPU to run on.
+ * @func: The function to be executed.
+ * @data: Data to pass to the function.
+ *
+ * Schedules the function @func for execution on the vCPU @cpu asynchronously
+ * when all the VCPUs are outside their loop.
+ */
+void async_run_safe_work_on_cpu(CPUState *cpu, void (*func)(void *data),
+                                void *data);
+
+/**
+ * async_safe_work_pending:
+ *
+ * Check whether any safe work is pending on any VCPUs.
+ * Returns: @true if a safe work is pending, @false otherwise.
+ */
+bool async_safe_work_pending(void);
+
+/**
  * qemu_get_cpu:
  * @index: The CPUState@cpu_index value of the CPU to obtain.
  *
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [Qemu-devel] [RFC PATCH V6 14/18] add a callback when tb_invalidate is called.
  2015-06-26 14:47 [Qemu-devel] [RFC PATCH V6 00/18] Multithread TCG fred.konrad
                   ` (12 preceding siblings ...)
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 13/18] cpu: introduce async_run_safe_work_on_cpu fred.konrad
@ 2015-06-26 14:47 ` fred.konrad
  2015-06-26 16:20   ` Paolo Bonzini
  2015-07-07 15:32   ` Alex Bennée
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 15/18] cpu: introduce tlb_flush*_all fred.konrad
                   ` (3 subsequent siblings)
  17 siblings, 2 replies; 82+ messages in thread
From: fred.konrad @ 2015-06-26 14:47 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, pbonzini, alex.bennee,
	fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

Instead of doing the jump cache invalidation directly in tb_invalidate, delay
it until after the exit, so that no other CPU ends up trying to execute the
code being invalidated.
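
In outline, the invalidation is now split in two (make_params() is an
invented helper; the two async calls are the ones this patch adds):

CPU_FOREACH(cpu) {
    /* per-CPU state: each VCPU purges its own tb_jmp_cache entry */
    async_run_on_cpu(cpu, cpu_discard_tb_from_jmp_cache,
                     make_params(cpu, tb));
}
/* the shared jump lists are only unlinked once every VCPU has left
 * its execution loop */
async_run_safe_work_on_cpu(first_cpu, tb_invalidate_jmp_remove, tb);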

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
---
 translate-all.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 59 insertions(+), 2 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index ade2269..468648d 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -61,6 +61,7 @@
 #include "translate-all.h"
 #include "qemu/bitmap.h"
 #include "qemu/timer.h"
+#include "sysemu/cpus.h"
 
 //#define DEBUG_TB_INVALIDATE
 //#define DEBUG_FLUSH
@@ -966,14 +967,58 @@ static inline void tb_reset_jump(TranslationBlock *tb, int n)
     tb_set_jmp_target(tb, n, (uintptr_t)(tb->tc_ptr + tb->tb_next_offset[n]));
 }
 
+struct CPUDiscardTBParams {
+    CPUState *cpu;
+    TranslationBlock *tb;
+};
+
+static void cpu_discard_tb_from_jmp_cache(void *opaque)
+{
+    unsigned int h;
+    struct CPUDiscardTBParams *params = opaque;
+
+    h = tb_jmp_cache_hash_func(params->tb->pc);
+    if (params->cpu->tb_jmp_cache[h] == params->tb) {
+        params->cpu->tb_jmp_cache[h] = NULL;
+    }
+
+    g_free(opaque);
+}
+
+static void tb_invalidate_jmp_remove(void *opaque)
+{
+    TranslationBlock *tb = opaque;
+    TranslationBlock *tb1, *tb2;
+    unsigned int n1;
+
+    /* suppress this TB from the two jump lists */
+    tb_jmp_remove(tb, 0);
+    tb_jmp_remove(tb, 1);
+
+    /* suppress any remaining jumps to this TB */
+    tb1 = tb->jmp_first;
+    for (;;) {
+        n1 = (uintptr_t)tb1 & 3;
+        if (n1 == 2) {
+            break;
+        }
+        tb1 = (TranslationBlock *)((uintptr_t)tb1 & ~3);
+        tb2 = tb1->jmp_next[n1];
+        tb_reset_jump(tb1, n1);
+        tb1->jmp_next[n1] = NULL;
+        tb1 = tb2;
+    }
+    tb->jmp_first = (TranslationBlock *)((uintptr_t)tb | 2); /* fail safe */
+}
+
 /* invalidate one TB */
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 {
     CPUState *cpu;
     PageDesc *p;
-    unsigned int h, n1;
+    unsigned int h;
     tb_page_addr_t phys_pc;
-    TranslationBlock *tb1, *tb2;
+    struct CPUDiscardTBParams *params;
 
     tb_lock();
 
@@ -996,6 +1041,9 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 
     tcg_ctx.tb_ctx.tb_invalidated_flag = 1;
 
+#if 0 /*MTTCG*/
+    TranslationBlock *tb1, *tb2;
+    unsigned int n1;
     /* remove the TB from the hash list */
     h = tb_jmp_cache_hash_func(tb->pc);
     CPU_FOREACH(cpu) {
@@ -1022,6 +1070,15 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
         tb1 = tb2;
     }
     tb->jmp_first = (TranslationBlock *)((uintptr_t)tb | 2); /* fail safe */
+#else
+    CPU_FOREACH(cpu) {
+        params = g_malloc(sizeof(struct CPUDiscardTBParams));
+        params->cpu = cpu;
+        params->tb = tb;
+        async_run_on_cpu(cpu, cpu_discard_tb_from_jmp_cache, params);
+    }
+    async_run_safe_work_on_cpu(first_cpu, tb_invalidate_jmp_remove, tb);
+#endif /* MTTCG */
 
     tcg_ctx.tb_ctx.tb_phys_invalidate_count++;
     tb_unlock();
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [Qemu-devel] [RFC PATCH V6 15/18] cpu: introduce tlb_flush*_all.
  2015-06-26 14:47 [Qemu-devel] [RFC PATCH V6 00/18] Multithread TCG fred.konrad
                   ` (13 preceding siblings ...)
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 14/18] add a callback when tb_invalidate is called fred.konrad
@ 2015-06-26 14:47 ` fred.konrad
  2015-06-26 15:15   ` Paolo Bonzini
  2015-07-07 15:52   ` Alex Bennée
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 16/18] arm: use tlb_flush*_all fred.konrad
                   ` (2 subsequent siblings)
  17 siblings, 2 replies; 82+ messages in thread
From: fred.konrad @ 2015-06-26 14:47 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, pbonzini, alex.bennee,
	fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

Some architectures allow flushing the TLB of other VCPUs. This is not a
problem when we have only one thread for all VCPUs, but it definitely needs
to be done as asynchronous work once we run truly multithreaded.

TODO: Add some test cases; I fear bad results in case a VCPU executes a
      barrier or something like that.
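
As a usage sketch, the ARM "invalidate all, inner shareable" hook from the
next patch reduces to a single call (code as in patch 16):

static void tlbiall_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                             uint64_t value)
{
    /* self: flushed inline; other VCPUs: flushed via async_run_on_cpu */
    tlb_flush_all(1);
}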

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
---
 cputlb.c                | 76 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/exec/exec-all.h |  2 ++
 2 files changed, 78 insertions(+)

diff --git a/cputlb.c b/cputlb.c
index 79fff1c..e5853fd 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -72,6 +72,45 @@ void tlb_flush(CPUState *cpu, int flush_global)
     tlb_flush_count++;
 }
 
+struct TLBFlushParams {
+    CPUState *cpu;
+    int flush_global;
+};
+
+static void tlb_flush_async_work(void *opaque)
+{
+    struct TLBFlushParams *params = opaque;
+
+    tlb_flush(params->cpu, params->flush_global);
+    g_free(params);
+}
+
+void tlb_flush_all(int flush_global)
+{
+    CPUState *cpu;
+    struct TLBFlushParams *params;
+
+#if 0 /* MTTCG */
+    CPU_FOREACH(cpu) {
+        tlb_flush(cpu, flush_global);
+    }
+#else
+    CPU_FOREACH(cpu) {
+        if (qemu_cpu_is_self(cpu)) {
+            /* async_run_on_cpu would handle this case too, but calling
+             * tlb_flush directly just avoids a malloc here.
+             */
+            tlb_flush(cpu, flush_global);
+        } else {
+            params = g_malloc(sizeof(struct TLBFlushParams));
+            params->cpu = cpu;
+            params->flush_global = flush_global;
+            async_run_on_cpu(cpu, tlb_flush_async_work, params);
+        }
+    }
+#endif /* MTTCG */
+}
+
 static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong addr)
 {
     if (addr == (tlb_entry->addr_read &
@@ -124,6 +163,43 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
     tb_flush_jmp_cache(cpu, addr);
 }
 
+struct TLBFlushPageParams {
+    CPUState *cpu;
+    target_ulong addr;
+};
+
+static void tlb_flush_page_async_work(void *opaque)
+{
+    struct TLBFlushPageParams *params = opaque;
+
+    tlb_flush_page(params->cpu, params->addr);
+    g_free(params);
+}
+
+void tlb_flush_page_all(target_ulong addr)
+{
+    CPUState *cpu;
+    struct TLBFlushPageParams *params;
+
+    CPU_FOREACH(cpu) {
+#if 0 /* !MTTCG */
+        tlb_flush_page(cpu, addr);
+#else
+        if (qemu_cpu_is_self(cpu)) {
+            /* async_run_on_cpu would handle this case too, but calling
+             * tlb_flush_page directly just avoids a malloc here.
+             */
+            tlb_flush_page(cpu, addr);
+        } else {
+            params = g_malloc(sizeof(struct TLBFlushPageParams));
+            params->cpu = cpu;
+            params->addr = addr;
+            async_run_on_cpu(cpu, tlb_flush_page_async_work, params);
+        }
+#endif /* MTTCG */
+    }
+}
+
 /* update the TLBs so that writes to code in the virtual page 'addr'
    can be detected */
 void tlb_protect_code(ram_addr_t ram_addr)
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 44f3336..484c351 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -96,7 +96,9 @@ bool qemu_in_vcpu_thread(void);
 void cpu_reload_memory_map(CPUState *cpu);
 void tcg_cpu_address_space_init(CPUState *cpu, AddressSpace *as);
 /* cputlb.c */
+void tlb_flush_page_all(target_ulong addr);
 void tlb_flush_page(CPUState *cpu, target_ulong addr);
+void tlb_flush_all(int flush_global);
 void tlb_flush(CPUState *cpu, int flush_global);
 void tlb_set_page(CPUState *cpu, target_ulong vaddr,
                   hwaddr paddr, int prot,
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [Qemu-devel] [RFC PATCH V6 16/18] arm: use tlb_flush*_all
  2015-06-26 14:47 [Qemu-devel] [RFC PATCH V6 00/18] Multithread TCG fred.konrad
                   ` (14 preceding siblings ...)
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 15/18] cpu: introduce tlb_flush*_all fred.konrad
@ 2015-06-26 14:47 ` fred.konrad
  2015-07-07 16:14   ` Alex Bennée
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 17/18] translate-all: introduces tb_flush_safe fred.konrad
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 18/18] translate-all: (wip) use tb_flush_safe when we can't alloc more tb fred.konrad
  17 siblings, 1 reply; 82+ messages in thread
From: fred.konrad @ 2015-06-26 14:47 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, pbonzini, alex.bennee,
	fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

This just uses the new mechanism to ensure that each VCPU thread flushes its
own TLB.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
---
 target-arm/helper.c | 45 +++++++--------------------------------------
 1 file changed, 7 insertions(+), 38 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index ad3d5da..1995439 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -411,41 +411,25 @@ static void tlbimvaa_write(CPUARMState *env, const ARMCPRegInfo *ri,
 static void tlbiall_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                              uint64_t value)
 {
-    CPUState *other_cs;
-
-    CPU_FOREACH(other_cs) {
-        tlb_flush(other_cs, 1);
-    }
+    tlb_flush_all(1);
 }
 
 static void tlbiasid_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                              uint64_t value)
 {
-    CPUState *other_cs;
-
-    CPU_FOREACH(other_cs) {
-        tlb_flush(other_cs, value == 0);
-    }
+    tlb_flush_all(value == 0);
 }
 
 static void tlbimva_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                              uint64_t value)
 {
-    CPUState *other_cs;
-
-    CPU_FOREACH(other_cs) {
-        tlb_flush_page(other_cs, value & TARGET_PAGE_MASK);
-    }
+    tlb_flush_page_all(value & TARGET_PAGE_MASK);
 }
 
 static void tlbimvaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                              uint64_t value)
 {
-    CPUState *other_cs;
-
-    CPU_FOREACH(other_cs) {
-        tlb_flush_page(other_cs, value & TARGET_PAGE_MASK);
-    }
+    tlb_flush_page_all(value & TARGET_PAGE_MASK);
 }
 
 static const ARMCPRegInfo cp_reginfo[] = {
@@ -2281,34 +2265,19 @@ static void tlbi_aa64_asid_write(CPUARMState *env, const ARMCPRegInfo *ri,
 static void tlbi_aa64_va_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                   uint64_t value)
 {
-    CPUState *other_cs;
-    uint64_t pageaddr = sextract64(value << 12, 0, 56);
-
-    CPU_FOREACH(other_cs) {
-        tlb_flush_page(other_cs, pageaddr);
-    }
+    tlb_flush_page_all(sextract64(value << 12, 0, 56));
 }
 
 static void tlbi_aa64_vaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                   uint64_t value)
 {
-    CPUState *other_cs;
-    uint64_t pageaddr = sextract64(value << 12, 0, 56);
-
-    CPU_FOREACH(other_cs) {
-        tlb_flush_page(other_cs, pageaddr);
-    }
+    tlb_flush_page_all(sextract64(value << 12, 0, 56));
 }
 
 static void tlbi_aa64_asid_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                   uint64_t value)
 {
-    CPUState *other_cs;
-    int asid = extract64(value, 48, 16);
-
-    CPU_FOREACH(other_cs) {
-        tlb_flush(other_cs, asid == 0);
-    }
+    tlb_flush_all(extract64(value, 48, 16) == 0);
 }
 
 static CPAccessResult aa64_zva_access(CPUARMState *env, const ARMCPRegInfo *ri)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [Qemu-devel] [RFC PATCH V6 17/18] translate-all: introduces tb_flush_safe.
  2015-06-26 14:47 [Qemu-devel] [RFC PATCH V6 00/18] Multithread TCG fred.konrad
                   ` (15 preceding siblings ...)
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 16/18] arm: use tlb_flush*_all fred.konrad
@ 2015-06-26 14:47 ` fred.konrad
  2015-07-07 16:16   ` Alex Bennée
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 18/18] translate-all: (wip) use tb_flush_safe when we can't alloc more tb fred.konrad
  17 siblings, 1 reply; 82+ messages in thread
From: fred.konrad @ 2015-06-26 14:47 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, pbonzini, alex.bennee,
	fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

tb_flush is not thread safe; we definitely need the VCPUs to exit their
execution loop before doing it. This introduces tb_flush_safe, which simply
schedules an async safe work item that will do the tb_flush later.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
---
 include/exec/exec-all.h |  1 +
 translate-all.c         | 15 +++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 484c351..b5e4fb3 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -219,6 +219,7 @@ static inline unsigned int tb_phys_hash_func(tb_page_addr_t pc)
 
 void tb_free(TranslationBlock *tb);
 void tb_flush(CPUArchState *env);
+void tb_flush_safe(CPUArchState *env);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
 
 #if defined(USE_DIRECT_JUMP)
diff --git a/translate-all.c b/translate-all.c
index 468648d..8bd8fe8 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -815,6 +815,21 @@ static void page_flush_tb(void)
     }
 }
 
+static void tb_flush_work(void *opaque)
+{
+    CPUArchState *env = opaque;
+    tb_flush(env);
+}
+
+void tb_flush_safe(CPUArchState *env)
+{
+#if 0 /* !MTTCG */
+    tb_flush(env);
+#else
+    async_run_safe_work_on_cpu(ENV_GET_CPU(env), tb_flush_work, env);
+#endif /* MTTCG */
+}
+
 /* flush all the translation blocks */
 /* XXX: tb_flush is currently not thread safe */
 void tb_flush(CPUArchState *env1)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [Qemu-devel] [RFC PATCH V6 18/18] translate-all: (wip) use tb_flush_safe when we can't alloc more tb.
  2015-06-26 14:47 [Qemu-devel] [RFC PATCH V6 00/18] Multithread TCG fred.konrad
                   ` (16 preceding siblings ...)
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 17/18] translate-all: introduces tb_flush_safe fred.konrad
@ 2015-06-26 14:47 ` fred.konrad
  2015-06-26 16:21   ` Paolo Bonzini
  2015-07-07 16:17   ` Alex Bennée
  17 siblings, 2 replies; 82+ messages in thread
From: fred.konrad @ 2015-06-26 14:47 UTC (permalink / raw)
  To: qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, pbonzini, alex.bennee,
	fred.konrad

From: KONRAD Frederic <fred.konrad@greensocs.com>

This changes just the tb_flush call made from tb_alloc.

TODO:
 * change the other tb_flush call sites.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
---
 translate-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/translate-all.c b/translate-all.c
index 8bd8fe8..9adaffa 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1147,7 +1147,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tb = tb_alloc(pc);
     if (!tb) {
         /* flush must be done */
-        tb_flush(env);
+        tb_flush_safe(env);
         /* cannot fail at this point */
         tb = tb_alloc(pc);
         /* Don't forget to invalidate previous TB info.  */
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 03/18] remove unused spinlock.
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 03/18] remove unused spinlock fred.konrad
@ 2015-06-26 14:53   ` Paolo Bonzini
  2015-06-26 15:29     ` Frederic Konrad
  0 siblings, 1 reply; 82+ messages in thread
From: Paolo Bonzini @ 2015-06-26 14:53 UTC (permalink / raw)
  To: fred.konrad, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee



On 26/06/2015 16:47, fred.konrad@greensocs.com wrote:
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 7f0aae9..d1e482a 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -2664,11 +2664,6 @@ sub process {
>  			WARN("Use of volatile is usually wrong: see Documentation/volatile-considered-harmful.txt\n" . $herecurr);
>  		}
>  
> -# SPIN_LOCK_UNLOCKED & RW_LOCK_UNLOCKED are deprecated
> -		if ($line =~ /\b(SPIN_LOCK_UNLOCKED|RW_LOCK_UNLOCKED)/) {
> -			ERROR("Use of $1 is deprecated: see Documentation/spinlocks.txt\n" . $herecurr);
> -		}
> -
>  # warn about #if 0
>  		if ($line =~ /^.\s*\#\s*if\s+0\b/) {
>  			CHK("if this code is redundant consider removing it\n" .
> @@ -2717,8 +2712,8 @@ sub process {
>  			ERROR("exactly one space required after that #$1\n" . $herecurr);
>  		}
>  
> -# check for spinlock_t definitions without a comment.
> -		if ($line =~ /^.\s*(struct\s+mutex|spinlock_t)\s+\S+;/ ||
> +# check for mutex definitions without a comment.
> +		if ($line =~ /^.\s*(struct\s+mutex)\s+\S+;/ ||
>  		    $line =~ /^.\s*(DEFINE_MUTEX)\s*\(/) {
>  			my $which = $1;
>  			if (!ctx_has_comment($first_line, $linenr)) {

The checkpatch.pl parts simply come from Linux.  They don't matter for
QEMU, but we keep the changes to this script to a minimum.

Paolo

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 04/18] add support for spin lock on POSIX systems exclusively
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 04/18] add support for spin lock on POSIX systems exclusively fred.konrad
@ 2015-06-26 14:55   ` Paolo Bonzini
  2015-06-26 15:31     ` Frederic Konrad
  0 siblings, 1 reply; 82+ messages in thread
From: Paolo Bonzini @ 2015-06-26 14:55 UTC (permalink / raw)
  To: fred.konrad, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee



On 26/06/2015 16:47, fred.konrad@greensocs.com wrote:
> From: Guillaume Delbergue <guillaume.delbergue@greensocs.com>
> 
> WARNING: spin lock is currently not implemented on WIN32

The Windows KSPIN_LOCK is a kernel data structure.  You can implement a
simple, portable test-and-test-and-set spinlock using atomics, and use
it on both POSIX and Win32.
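
For example, a minimal untested sketch using C11 atomics (QEMU would use
its own atomic_* wrappers, but the idea is the same):

    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct QemuSpin {
        atomic_bool locked;
    } QemuSpin;

    static inline void qemu_spin_init(QemuSpin *spin)
    {
        atomic_store_explicit(&spin->locked, false, memory_order_relaxed);
    }

    static inline void qemu_spin_lock(QemuSpin *spin)
    {
        for (;;) {
            if (!atomic_exchange_explicit(&spin->locked, true,
                                          memory_order_acquire)) {
                return;             /* got the lock */
            }
            /* Test with plain loads before the next test-and-set so
             * waiters do not keep stealing the cache line. */
            while (atomic_load_explicit(&spin->locked,
                                        memory_order_relaxed)) {
                /* busy wait */
            }
        }
    }

    static inline void qemu_spin_unlock(QemuSpin *spin)
    {
        atomic_store_explicit(&spin->locked, false, memory_order_release);
    }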

Paolo

> Signed-off-by: Guillaume Delbergue <guillaume.delbergue@greensocs.com>
> ---
>  include/qemu/thread-posix.h |  4 ++++
>  include/qemu/thread-win32.h |  4 ++++
>  include/qemu/thread.h       |  7 +++++++
>  util/qemu-thread-posix.c    | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  util/qemu-thread-win32.c    | 30 ++++++++++++++++++++++++++++++
>  5 files changed, 90 insertions(+)
> 
> diff --git a/include/qemu/thread-posix.h b/include/qemu/thread-posix.h
> index eb5c7a1..8ce8f01 100644
> --- a/include/qemu/thread-posix.h
> +++ b/include/qemu/thread-posix.h
> @@ -7,6 +7,10 @@ struct QemuMutex {
>      pthread_mutex_t lock;
>  };
>  
> +struct QemuSpin {
> +    pthread_spinlock_t lock;
> +};
> +
>  struct QemuCond {
>      pthread_cond_t cond;
>  };
> diff --git a/include/qemu/thread-win32.h b/include/qemu/thread-win32.h
> index 3d58081..310c8bd 100644
> --- a/include/qemu/thread-win32.h
> +++ b/include/qemu/thread-win32.h
> @@ -7,6 +7,10 @@ struct QemuMutex {
>      LONG owner;
>  };
>  
> +struct QemuSpin {
> +    PKSPIN_LOCK lock;
> +};
> +
>  struct QemuCond {
>      LONG waiters, target;
>      HANDLE sema;
> diff --git a/include/qemu/thread.h b/include/qemu/thread.h
> index 5114ec8..f5d1259 100644
> --- a/include/qemu/thread.h
> +++ b/include/qemu/thread.h
> @@ -5,6 +5,7 @@
>  #include <stdbool.h>
>  
>  typedef struct QemuMutex QemuMutex;
> +typedef struct QemuSpin QemuSpin;
>  typedef struct QemuCond QemuCond;
>  typedef struct QemuSemaphore QemuSemaphore;
>  typedef struct QemuEvent QemuEvent;
> @@ -25,6 +26,12 @@ void qemu_mutex_lock(QemuMutex *mutex);
>  int qemu_mutex_trylock(QemuMutex *mutex);
>  void qemu_mutex_unlock(QemuMutex *mutex);
>  
> +void qemu_spin_init(QemuSpin *spin);
> +void qemu_spin_destroy(QemuSpin *spin);
> +void qemu_spin_lock(QemuSpin *spin);
> +int qemu_spin_trylock(QemuSpin *spin);
> +void qemu_spin_unlock(QemuSpin *spin);
> +
>  void qemu_cond_init(QemuCond *cond);
>  void qemu_cond_destroy(QemuCond *cond);
>  
> diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
> index ba67cec..224bacc 100644
> --- a/util/qemu-thread-posix.c
> +++ b/util/qemu-thread-posix.c
> @@ -89,6 +89,51 @@ void qemu_mutex_unlock(QemuMutex *mutex)
>          error_exit(err, __func__);
>  }
>  
> +void qemu_spin_init(QemuSpin *spin)
> +{
> +    int err;
> +
> +    err = pthread_spin_init(&spin->lock, 0);
> +    if (err) {
> +        error_exit(err, __func__);
> +    }
> +}
> +
> +void qemu_spin_destroy(QemuSpin *spin)
> +{
> +    int err;
> +
> +    err = pthread_spin_destroy(&spin->lock);
> +    if (err) {
> +        error_exit(err, __func__);
> +    }
> +}
> +
> +void qemu_spin_lock(QemuSpin *spin)
> +{
> +    int err;
> +
> +    err = pthread_spin_lock(&spin->lock);
> +    if (err) {
> +        error_exit(err, __func__);
> +    }
> +}
> +
> +int qemu_spin_trylock(QemuSpin *spin)
> +{
> +    return pthread_spin_trylock(&spin->lock);
> +}
> +
> +void qemu_spin_unlock(QemuSpin *spin)
> +{
> +    int err;
> +
> +    err = pthread_spin_unlock(&spin->lock);
> +    if (err) {
> +        error_exit(err, __func__);
> +    }
> +}
> +
>  void qemu_cond_init(QemuCond *cond)
>  {
>      int err;
> diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
> index 406b52f..6fbe6a8 100644
> --- a/util/qemu-thread-win32.c
> +++ b/util/qemu-thread-win32.c
> @@ -80,6 +80,36 @@ void qemu_mutex_unlock(QemuMutex *mutex)
>      LeaveCriticalSection(&mutex->lock);
>  }
>  
> +void qemu_spin_init(QemuSpin *spin)
> +{
> +    printf("spinlock not implemented");
> +    abort();
> +}
> +
> +void qemu_spin_destroy(QemuSpin *spin)
> +{
> +    printf("spinlock not implemented");
> +    abort();
> +}
> +
> +void qemu_spin_lock(QemuSpin *spin)
> +{
> +    printf("spinlock not implemented");
> +    abort();
> +}
> +
> +int qemu_spin_trylock(QemuSpin *spin)
> +{
> +    printf("spinlock not implemented");
> +    abort();
> +}
> +
> +void qemu_spin_unlock(QemuSpin *spin)
> +{
> +    printf("spinlock not implemented");
> +    abort();
> +}
> +
>  void qemu_cond_init(QemuCond *cond)
>  {
>      memset(cond, 0, sizeof(*cond));
> 

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 07/18] Drop global lock during TCG code execution
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 07/18] Drop global lock during TCG code execution fred.konrad
@ 2015-06-26 14:56   ` Jan Kiszka
  2015-06-26 15:08     ` Paolo Bonzini
  2015-06-26 15:36     ` Frederic Konrad
  0 siblings, 2 replies; 82+ messages in thread
From: Jan Kiszka @ 2015-06-26 14:56 UTC (permalink / raw)
  To: fred.konrad, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, pbonzini, alex.bennee

On 2015-06-26 16:47, fred.konrad@greensocs.com wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> This finally allows TCG to benefit from the iothread introduction: Drop
> the global mutex while running pure TCG CPU code. Reacquire the lock
> when entering MMIO or PIO emulation, or when leaving the TCG loop.
> 
> We have to revert a few optimizations for the current TCG threading
> model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
> kicking it in qemu_cpu_kick. We also need to disable RAM block
> reordering until we have a more efficient locking mechanism at hand.
> 
> I'm pretty sure some cases are still broken, definitely SMP (we no
> longer perform round-robin scheduling "by chance"). Still, a Linux x86
> UP guest and my Musicpal ARM model boot fine here. These numbers
> demonstrate where we gain something:
> 
> 20338 jan       20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
> 20337 jan       20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm
> 
> The guest CPU was fully loaded, but the iothread could still run mostly
> independently on a second core. Without the patch we don't get beyond
> 
> 32206 jan       20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
> 32204 jan       20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm
> 
> We don't benefit significantly, though, when the guest is not fully
> loading a host CPU.
> 
> Note that this patch depends on
> http://thread.gmane.org/gmane.comp.emulators.qemu/118657
> 
> Changes from Fred Konrad:
>   * Rebase on the current HEAD.
>   * Fixes a deadlock in qemu_devices_reset().
> ---
>  cpus.c                    | 17 ++++-------------
>  cputlb.c                  |  5 +++++
>  exec.c                    | 25 +++++++++++++++++++++++++
>  softmmu_template.h        |  5 +++++
>  target-i386/misc_helper.c | 27 ++++++++++++++++++++++++---
>  translate-all.c           |  2 ++
>  vl.c                      |  6 ++++++
>  7 files changed, 71 insertions(+), 16 deletions(-)
> 
> diff --git a/cpus.c b/cpus.c
> index 79383df..23c316c 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1034,7 +1034,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>      qemu_tcg_init_cpu_signals();
>      qemu_thread_get_self(cpu->thread);
>  
> -    qemu_mutex_lock(&qemu_global_mutex);
> +    qemu_mutex_lock_iothread();
>      CPU_FOREACH(cpu) {
>          cpu->thread_id = qemu_get_thread_id();
>          cpu->created = true;
> @@ -1145,18 +1145,7 @@ bool qemu_in_vcpu_thread(void)
>  
>  void qemu_mutex_lock_iothread(void)
>  {
> -    atomic_inc(&iothread_requesting_mutex);
> -    if (!tcg_enabled() || !first_cpu || !first_cpu->thread) {
> -        qemu_mutex_lock(&qemu_global_mutex);
> -        atomic_dec(&iothread_requesting_mutex);
> -    } else {
> -        if (qemu_mutex_trylock(&qemu_global_mutex)) {
> -            qemu_cpu_kick_thread(first_cpu);
> -            qemu_mutex_lock(&qemu_global_mutex);
> -        }
> -        atomic_dec(&iothread_requesting_mutex);
> -        qemu_cond_broadcast(&qemu_io_proceeded_cond);
> -    }
> +    qemu_mutex_lock(&qemu_global_mutex);
>  }
>  
>  void qemu_mutex_unlock_iothread(void)
> @@ -1377,7 +1366,9 @@ static int tcg_cpu_exec(CPUArchState *env)
>          cpu->icount_decr.u16.low = decr;
>          cpu->icount_extra = count;
>      }
> +    qemu_mutex_unlock_iothread();
>      ret = cpu_exec(env);
> +    qemu_mutex_lock_iothread();
>  #ifdef CONFIG_PROFILER
>      tcg_time += profile_getclock() - ti;
>  #endif
> diff --git a/cputlb.c b/cputlb.c
> index a506086..79fff1c 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -30,6 +30,9 @@
>  #include "exec/ram_addr.h"
>  #include "tcg/tcg.h"
>  
> +void qemu_mutex_lock_iothread(void);
> +void qemu_mutex_unlock_iothread(void);
> +
>  //#define DEBUG_TLB
>  //#define DEBUG_TLB_CHECK
>  
> @@ -125,8 +128,10 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
>     can be detected */
>  void tlb_protect_code(ram_addr_t ram_addr)
>  {
> +    qemu_mutex_lock_iothread();
>      cpu_physical_memory_test_and_clear_dirty(ram_addr, TARGET_PAGE_SIZE,
>                                               DIRTY_MEMORY_CODE);
> +    qemu_mutex_unlock_iothread();
>  }
>  
>  /* update the TLB so that writes in physical page 'phys_addr' are no longer
> diff --git a/exec.c b/exec.c
> index f7883d2..964e922 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1881,6 +1881,7 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
>              wp->hitaddr = vaddr;
>              wp->hitattrs = attrs;
>              if (!cpu->watchpoint_hit) {
> +                qemu_mutex_unlock_iothread();
>                  cpu->watchpoint_hit = wp;
>                  tb_check_watchpoint(cpu);
>                  if (wp->flags & BP_STOP_BEFORE_ACCESS) {
> @@ -2740,6 +2741,7 @@ static inline uint32_t address_space_ldl_internal(AddressSpace *as, hwaddr addr,
>      mr = address_space_translate(as, addr, &addr1, &l, false);
>      if (l < 4 || !memory_access_is_direct(mr, false)) {
>          /* I/O case */
> +        qemu_mutex_lock_iothread();
>          r = memory_region_dispatch_read(mr, addr1, &val, 4, attrs);
>  #if defined(TARGET_WORDS_BIGENDIAN)
>          if (endian == DEVICE_LITTLE_ENDIAN) {
> @@ -2750,6 +2752,7 @@ static inline uint32_t address_space_ldl_internal(AddressSpace *as, hwaddr addr,
>              val = bswap32(val);
>          }
>  #endif
> +        qemu_mutex_unlock_iothread();
>      } else {
>          /* RAM case */
>          ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(mr)
> @@ -2829,6 +2832,7 @@ static inline uint64_t address_space_ldq_internal(AddressSpace *as, hwaddr addr,
>                                   false);
>      if (l < 8 || !memory_access_is_direct(mr, false)) {
>          /* I/O case */
> +        qemu_mutex_lock_iothread();
>          r = memory_region_dispatch_read(mr, addr1, &val, 8, attrs);
>  #if defined(TARGET_WORDS_BIGENDIAN)
>          if (endian == DEVICE_LITTLE_ENDIAN) {
> @@ -2839,6 +2843,7 @@ static inline uint64_t address_space_ldq_internal(AddressSpace *as, hwaddr addr,
>              val = bswap64(val);
>          }
>  #endif
> +        qemu_mutex_unlock_iothread();
>      } else {
>          /* RAM case */
>          ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(mr)
> @@ -2938,7 +2943,9 @@ static inline uint32_t address_space_lduw_internal(AddressSpace *as,
>                                   false);
>      if (l < 2 || !memory_access_is_direct(mr, false)) {
>          /* I/O case */
> +        qemu_mutex_lock_iothread();
>          r = memory_region_dispatch_read(mr, addr1, &val, 2, attrs);
> +        qemu_mutex_unlock_iothread();
>  #if defined(TARGET_WORDS_BIGENDIAN)
>          if (endian == DEVICE_LITTLE_ENDIAN) {
>              val = bswap16(val);
> @@ -3026,15 +3033,19 @@ void address_space_stl_notdirty(AddressSpace *as, hwaddr addr, uint32_t val,
>      mr = address_space_translate(as, addr, &addr1, &l,
>                                   true);
>      if (l < 4 || !memory_access_is_direct(mr, true)) {
> +        qemu_mutex_lock_iothread();
>          r = memory_region_dispatch_write(mr, addr1, val, 4, attrs);
> +        qemu_mutex_unlock_iothread();
>      } else {
>          addr1 += memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK;
>          ptr = qemu_get_ram_ptr(addr1);
>          stl_p(ptr, val);
>  
> +        qemu_mutex_lock_iothread();
>          dirty_log_mask = memory_region_get_dirty_log_mask(mr);
>          dirty_log_mask &= ~(1 << DIRTY_MEMORY_CODE);
>          cpu_physical_memory_set_dirty_range(addr1, 4, dirty_log_mask);
> +        qemu_mutex_unlock_iothread();
>          r = MEMTX_OK;
>      }
>      if (result) {
> @@ -3074,7 +3085,9 @@ static inline void address_space_stl_internal(AddressSpace *as,
>              val = bswap32(val);
>          }
>  #endif
> +        qemu_mutex_lock_iothread();
>          r = memory_region_dispatch_write(mr, addr1, val, 4, attrs);
> +        qemu_mutex_unlock_iothread();
>      } else {
>          /* RAM case */
>          addr1 += memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK;
> @@ -3090,7 +3103,9 @@ static inline void address_space_stl_internal(AddressSpace *as,
>              stl_p(ptr, val);
>              break;
>          }
> +        qemu_mutex_lock_iothread();
>          invalidate_and_set_dirty(mr, addr1, 4);
> +        qemu_mutex_unlock_iothread();
>          r = MEMTX_OK;
>      }
>      if (result) {
> @@ -3178,7 +3193,9 @@ static inline void address_space_stw_internal(AddressSpace *as,
>              val = bswap16(val);
>          }
>  #endif
> +        qemu_mutex_lock_iothread();
>          r = memory_region_dispatch_write(mr, addr1, val, 2, attrs);
> +        qemu_mutex_unlock_iothread();
>      } else {
>          /* RAM case */
>          addr1 += memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK;
> @@ -3194,7 +3211,9 @@ static inline void address_space_stw_internal(AddressSpace *as,
>              stw_p(ptr, val);
>              break;
>          }
> +        qemu_mutex_lock_iothread();
>          invalidate_and_set_dirty(mr, addr1, 2);
> +        qemu_mutex_unlock_iothread();
>          r = MEMTX_OK;
>      }
>      if (result) {
> @@ -3245,7 +3264,9 @@ void address_space_stq(AddressSpace *as, hwaddr addr, uint64_t val,
>  {
>      MemTxResult r;
>      val = tswap64(val);
> +    qemu_mutex_lock_iothread();
>      r = address_space_rw(as, addr, attrs, (void *) &val, 8, 1);
> +    qemu_mutex_unlock_iothread();
>      if (result) {
>          *result = r;
>      }
> @@ -3256,7 +3277,9 @@ void address_space_stq_le(AddressSpace *as, hwaddr addr, uint64_t val,
>  {
>      MemTxResult r;
>      val = cpu_to_le64(val);
> +    qemu_mutex_lock_iothread();
>      r = address_space_rw(as, addr, attrs, (void *) &val, 8, 1);
> +    qemu_mutex_unlock_iothread();
>      if (result) {
>          *result = r;
>      }
> @@ -3266,7 +3289,9 @@ void address_space_stq_be(AddressSpace *as, hwaddr addr, uint64_t val,
>  {
>      MemTxResult r;
>      val = cpu_to_be64(val);
> +    qemu_mutex_lock_iothread();
>      r = address_space_rw(as, addr, attrs, (void *) &val, 8, 1);
> +    qemu_mutex_unlock_iothread();
>      if (result) {
>          *result = r;
>      }
> diff --git a/softmmu_template.h b/softmmu_template.h
> index d42d89d..18871f5 100644
> --- a/softmmu_template.h
> +++ b/softmmu_template.h
> @@ -158,9 +158,12 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
>          cpu_io_recompile(cpu, retaddr);
>      }
>  
> +    qemu_mutex_lock_iothread();
> +
>      cpu->mem_io_vaddr = addr;
>      memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT,
>                                  iotlbentry->attrs);
> +    qemu_mutex_unlock_iothread();
>      return val;
>  }
>  #endif
> @@ -378,10 +381,12 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
>          cpu_io_recompile(cpu, retaddr);
>      }
>  
> +    qemu_mutex_lock_iothread();
>      cpu->mem_io_vaddr = addr;
>      cpu->mem_io_pc = retaddr;
>      memory_region_dispatch_write(mr, physaddr, val, 1 << SHIFT,
>                                   iotlbentry->attrs);
> +    qemu_mutex_unlock_iothread();
>  }
>  
>  void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
> diff --git a/target-i386/misc_helper.c b/target-i386/misc_helper.c
> index 52c5d65..55f63bf 100644
> --- a/target-i386/misc_helper.c
> +++ b/target-i386/misc_helper.c
> @@ -27,8 +27,10 @@ void helper_outb(CPUX86State *env, uint32_t port, uint32_t data)
>  #ifdef CONFIG_USER_ONLY
>      fprintf(stderr, "outb: port=0x%04x, data=%02x\n", port, data);
>  #else
> +    qemu_mutex_lock_iothread();
>      address_space_stb(&address_space_io, port, data,
>                        cpu_get_mem_attrs(env), NULL);
> +    qemu_mutex_unlock_iothread();
>  #endif
>  }
>  
> @@ -38,8 +40,13 @@ target_ulong helper_inb(CPUX86State *env, uint32_t port)
>      fprintf(stderr, "inb: port=0x%04x\n", port);
>      return 0;
>  #else
> -    return address_space_ldub(&address_space_io, port,
> +    target_ulong ret;
> +
> +    qemu_mutex_lock_iothread();
> +    ret = address_space_ldub(&address_space_io, port,
>                                cpu_get_mem_attrs(env), NULL);
> +    qemu_mutex_unlock_iothread();
> +    return ret;
>  #endif
>  }
>  
> @@ -48,8 +55,10 @@ void helper_outw(CPUX86State *env, uint32_t port, uint32_t data)
>  #ifdef CONFIG_USER_ONLY
>      fprintf(stderr, "outw: port=0x%04x, data=%04x\n", port, data);
>  #else
> +    qemu_mutex_lock_iothread();
>      address_space_stw(&address_space_io, port, data,
>                        cpu_get_mem_attrs(env), NULL);
> +    qemu_mutex_unlock_iothread();
>  #endif
>  }
>  
> @@ -59,8 +68,13 @@ target_ulong helper_inw(CPUX86State *env, uint32_t port)
>      fprintf(stderr, "inw: port=0x%04x\n", port);
>      return 0;
>  #else
> -    return address_space_lduw(&address_space_io, port,
> +    target_ulong ret;
> +
> +    qemu_mutex_lock_iothread();
> +    ret = address_space_lduw(&address_space_io, port,
>                                cpu_get_mem_attrs(env), NULL);
> +    qemu_mutex_unlock_iothread();
> +    return ret;
>  #endif
>  }
>  
> @@ -69,8 +83,10 @@ void helper_outl(CPUX86State *env, uint32_t port, uint32_t data)
>  #ifdef CONFIG_USER_ONLY
>      fprintf(stderr, "outw: port=0x%04x, data=%08x\n", port, data);
>  #else
> +    qemu_mutex_lock_iothread();
>      address_space_stl(&address_space_io, port, data,
>                        cpu_get_mem_attrs(env), NULL);
> +    qemu_mutex_unlock_iothread();
>  #endif
>  }
>  
> @@ -80,8 +96,13 @@ target_ulong helper_inl(CPUX86State *env, uint32_t port)
>      fprintf(stderr, "inl: port=0x%04x\n", port);
>      return 0;
>  #else
> -    return address_space_ldl(&address_space_io, port,
> +    target_ulong ret;
> +
> +    qemu_mutex_lock_iothread();
> +    ret = address_space_ldl(&address_space_io, port,
>                               cpu_get_mem_attrs(env), NULL);
> +    qemu_mutex_unlock_iothread();
> +    return ret;
>  #endif
>  }
>  
> diff --git a/translate-all.c b/translate-all.c
> index c25b79b..ade2269 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -1222,6 +1222,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>  #endif
>  #ifdef TARGET_HAS_PRECISE_SMC
>      if (current_tb_modified) {
> +        qemu_mutex_unlock_iothread();
>          /* we generate a block containing just the instruction
>             modifying the memory. It will ensure that it cannot modify
>             itself */
> @@ -1326,6 +1327,7 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
>      p->first_tb = NULL;
>  #ifdef TARGET_HAS_PRECISE_SMC
>      if (current_tb_modified) {
> +        qemu_mutex_unlock_iothread();
>          /* we generate a block containing just the instruction
>             modifying the memory. It will ensure that it cannot modify
>             itself */
> diff --git a/vl.c b/vl.c
> index 69ad90c..2983d44 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1698,10 +1698,16 @@ void qemu_devices_reset(void)
>  {
>      QEMUResetEntry *re, *nre;
>  
> +    /*
> +     * Some device's reset needs to grab the global_mutex. So just release it
> +     * here.

That's a property newly introduced by the patch, or how does this
happen? In turn, are all reset handlers now fine to be called outside of
the BQL? This looks suspicious, but it's been quite a while since I last
stared at this.

Jan

> +     */
> +    qemu_mutex_unlock_iothread();
>      /* reset all devices */
>      QTAILQ_FOREACH_SAFE(re, &reset_handlers, entry, nre) {
>          re->func(re->opaque);
>      }
> +    qemu_mutex_lock_iothread();
>  }
>  
>  void qemu_system_reset(bool report)
> 

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 05/18] protect TBContext with tb_lock.
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 05/18] protect TBContext with tb_lock fred.konrad
@ 2015-06-26 14:56   ` Paolo Bonzini
  2015-06-26 15:39     ` Frederic Konrad
  2015-06-26 16:20   ` Paolo Bonzini
  2015-07-07 12:22   ` Alex Bennée
  2 siblings, 1 reply; 82+ messages in thread
From: Paolo Bonzini @ 2015-06-26 14:56 UTC (permalink / raw)
  To: fred.konrad, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee



On 26/06/2015 16:47, fred.konrad@greensocs.com wrote:
>  
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index 971b6db..47345aa 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -11162,6 +11162,8 @@ static inline void gen_intermediate_code_internal(ARMCPU *cpu,
>  
>      dc->tb = tb;
>  
> +    tb_lock();
> +
>      dc->is_jmp = DISAS_NEXT;
>      dc->pc = pc_start;
>      dc->singlestep_enabled = cs->singlestep_enabled;
> @@ -11499,6 +11501,7 @@ done_generating:
>          tb->size = dc->pc - pc_start;
>          tb->icount = num_insns;
>      }
> +    tb_unlock();
>  }
>  
>  void gen_intermediate_code(CPUARMState *env, TranslationBlock *tb)
> @@ -11567,6 +11570,7 @@ void arm_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
>  
>  void restore_state_to_opc(CPUARMState *env, TranslationBlock *tb, int pc_pos)
>  {
> +    tb_lock();
>      if (is_a64(env)) {
>          env->pc = tcg_ctx.gen_opc_pc[pc_pos];
>          env->condexec_bits = 0;
> @@ -11574,4 +11578,5 @@ void restore_state_to_opc(CPUARMState *env, TranslationBlock *tb, int pc_pos)
>          env->regs[15] = tcg_ctx.gen_opc_pc[pc_pos];
>          env->condexec_bits = gen_opc_condexec_bits[pc_pos];
>      }
> +    tb_unlock();
>  }

Should these instead be added to the callers?

Paolo

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 06/18] tcg: remove tcg_halt_cond global variable.
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 06/18] tcg: remove tcg_halt_cond global variable fred.konrad
@ 2015-06-26 15:02   ` Paolo Bonzini
  2015-06-26 15:41     ` Frederic Konrad
  0 siblings, 1 reply; 82+ messages in thread
From: Paolo Bonzini @ 2015-06-26 15:02 UTC (permalink / raw)
  To: fred.konrad, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee



On 26/06/2015 16:47, fred.konrad@greensocs.com wrote:
> From: KONRAD Frederic <fred.konrad@greensocs.com>
> 
> This removes the tcg_halt_cond global variable.
> We need one QemuCond per virtual cpu for multithread TCG.
> 
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> ---
>  cpus.c | 18 +++++++-----------
>  1 file changed, 7 insertions(+), 11 deletions(-)
> 
> diff --git a/cpus.c b/cpus.c
> index 2d62a35..79383df 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -813,7 +813,6 @@ static unsigned iothread_requesting_mutex;
>  static QemuThread io_thread;
>  
>  static QemuThread *tcg_cpu_thread;
> -static QemuCond *tcg_halt_cond;
>  
>  /* cpu creation */
>  static QemuCond qemu_cpu_cond;
> @@ -919,15 +918,13 @@ static void qemu_wait_io_event_common(CPUState *cpu)
>      cpu->thread_kicked = false;
>  }
>  
> -static void qemu_tcg_wait_io_event(void)
> +static void qemu_tcg_wait_io_event(CPUState *cpu)
>  {
> -    CPUState *cpu;
> -
>      while (all_cpu_threads_idle()) {
>         /* Start accounting real time to the virtual clock if the CPUs
>            are idle.  */
>          qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
> -        qemu_cond_wait(tcg_halt_cond, &qemu_global_mutex);
> +        qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
>      }
>  
>      while (iothread_requesting_mutex) {
> @@ -1047,7 +1044,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>  
>      /* wait for initial kick-off after machine start */
>      while (first_cpu->stopped) {
> -        qemu_cond_wait(tcg_halt_cond, &qemu_global_mutex);
> +        qemu_cond_wait(first_cpu->halt_cond, &qemu_global_mutex);
>  
>          /* process any pending work */
>          CPU_FOREACH(cpu) {
> @@ -1068,7 +1065,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>                  qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
>              }
>          }
> -        qemu_tcg_wait_io_event();
> +        qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));

Does this work (for non-multithreaded TCG) if tcg_thread_fn is waiting
on the "wrong" condition variable?  For example if all CPUs are idle and
the second CPU wakes up, qemu_tcg_wait_io_event won't be kicked out of
the wait.

I think you need to have a CPUThread struct like this:

   struct CPUThread {
       QemuThread thread;
       QemuCond halt_cond;
   };

and in CPUState have a CPUThread * field instead of the thread and
halt_cond fields.

Then single-threaded TCG can point all CPUStates to the same instance of
the struct, while multi-threaded TCG can point each CPUState to a
different struct.
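
For example (illustrative only; the cpu_thread field and the function
name are made up):

    /* single-threaded TCG: all vCPUs share one CPUThread */
    static struct CPUThread *tcg_shared_thread;

    static void qemu_tcg_init_cpu_thread(CPUState *cpu, bool mttcg)
    {
        if (mttcg) {
            cpu->cpu_thread = g_new0(struct CPUThread, 1);
            qemu_cond_init(&cpu->cpu_thread->halt_cond);
        } else {
            if (!tcg_shared_thread) {
                tcg_shared_thread = g_new0(struct CPUThread, 1);
                qemu_cond_init(&tcg_shared_thread->halt_cond);
            }
            cpu->cpu_thread = tcg_shared_thread;
        }
    }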

Paolo

>      }
>  
>      return NULL;
> @@ -1235,12 +1232,12 @@ static void qemu_tcg_init_vcpu(CPUState *cpu)
>  
>      tcg_cpu_address_space_init(cpu, cpu->as);
>  
> +    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
> +    qemu_cond_init(cpu->halt_cond);
> +
>      /* share a single thread for all cpus with TCG */
>      if (!tcg_cpu_thread) {
>          cpu->thread = g_malloc0(sizeof(QemuThread));
> -        cpu->halt_cond = g_malloc0(sizeof(QemuCond));
> -        qemu_cond_init(cpu->halt_cond);
> -        tcg_halt_cond = cpu->halt_cond;
>          snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
>                   cpu->cpu_index);
>          qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
> @@ -1254,7 +1251,6 @@ static void qemu_tcg_init_vcpu(CPUState *cpu)
>          tcg_cpu_thread = cpu->thread;
>      } else {
>          cpu->thread = tcg_cpu_thread;
> -        cpu->halt_cond = tcg_halt_cond;
>      }

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 08/18] cpu: remove exit_request global.
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 08/18] cpu: remove exit_request global fred.konrad
@ 2015-06-26 15:03   ` Paolo Bonzini
  2015-07-07 13:04   ` Alex Bennée
  1 sibling, 0 replies; 82+ messages in thread
From: Paolo Bonzini @ 2015-06-26 15:03 UTC (permalink / raw)
  To: fred.konrad, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee



On 26/06/2015 16:47, fred.konrad@greensocs.com wrote:
> From: KONRAD Frederic <fred.konrad@greensocs.com>
> 
> This removes the exit_request global and adds a variable in CPUState for it.
> Only the flag of the first CPU is used for the moment, as we still run with one
> TCG thread.

I think this should also be added to CPUThread.
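
I.e., something like this (a sketch, extending the struct I suggested
for patch 06/18):

    struct CPUThread {
        QemuThread thread;
        QemuCond halt_cond;
        bool exit_request;   /* replaces the old global */
    };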

Paolo

> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> ---
>  cpu-exec.c | 15 ---------------
>  cpus.c     | 17 ++++++++++++++---
>  2 files changed, 14 insertions(+), 18 deletions(-)
> 
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 5d9b518..0644383 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -364,8 +364,6 @@ static void cpu_handle_debug_exception(CPUArchState *env)
>  
>  /* main execution loop */
>  
> -volatile sig_atomic_t exit_request;
> -
>  int cpu_exec(CPUArchState *env)
>  {
>      CPUState *cpu = ENV_GET_CPU(env);
> @@ -394,20 +392,8 @@ int cpu_exec(CPUArchState *env)
>  
>      current_cpu = cpu;
>  
> -    /* As long as current_cpu is null, up to the assignment just above,
> -     * requests by other threads to exit the execution loop are expected to
> -     * be issued using the exit_request global. We must make sure that our
> -     * evaluation of the global value is performed past the current_cpu
> -     * value transition point, which requires a memory barrier as well as
> -     * an instruction scheduling constraint on modern architectures.  */
> -    smp_mb();
> -
>      rcu_read_lock();
>  
> -    if (unlikely(exit_request)) {
> -        cpu->exit_request = 1;
> -    }
> -
>      cc->cpu_exec_enter(cpu);
>  
>      /* Calculate difference between guest clock and host clock.
> @@ -496,7 +482,6 @@ int cpu_exec(CPUArchState *env)
>                      }
>                  }
>                  if (unlikely(cpu->exit_request)) {
> -                    cpu->exit_request = 0;
>                      cpu->exception_index = EXCP_INTERRUPT;
>                      cpu_loop_exit(cpu);
>                  }
> diff --git a/cpus.c b/cpus.c
> index 23c316c..2541c56 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -137,6 +137,8 @@ typedef struct TimersState {
>  } TimersState;
>  
>  static TimersState timers_state;
> +/* CPU associated to this thread. */
> +static __thread CPUState *tcg_thread_cpu;
>  
>  int64_t cpu_get_icount_raw(void)
>  {
> @@ -661,12 +663,18 @@ static void cpu_handle_guest_debug(CPUState *cpu)
>      cpu->stopped = true;
>  }
>  
> +/**
> + * cpu_signal
> + * Signal handler when using TCG.
> + */
>  static void cpu_signal(int sig)
>  {
>      if (current_cpu) {
>          cpu_exit(current_cpu);
>      }
> -    exit_request = 1;
> +
> +    /* FIXME: We might want to check if the cpu is running? */
> +    tcg_thread_cpu->exit_request = true;
>  }
>  
>  #ifdef CONFIG_LINUX
> @@ -1031,6 +1039,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>  {
>      CPUState *cpu = arg;
>  
> +    tcg_thread_cpu = cpu;
>      qemu_tcg_init_cpu_signals();
>      qemu_thread_get_self(cpu->thread);
>  
> @@ -1393,7 +1402,8 @@ static void tcg_exec_all(void)
>      if (next_cpu == NULL) {
>          next_cpu = first_cpu;
>      }
> -    for (; next_cpu != NULL && !exit_request; next_cpu = CPU_NEXT(next_cpu)) {
> +    for (; next_cpu != NULL && !first_cpu->exit_request;
> +           next_cpu = CPU_NEXT(next_cpu)) {
>          CPUState *cpu = next_cpu;
>          CPUArchState *env = cpu->env_ptr;
>  
> @@ -1410,7 +1420,8 @@ static void tcg_exec_all(void)
>              break;
>          }
>      }
> -    exit_request = 0;
> +
> +    first_cpu->exit_request = 0;
>  }
>  
>  void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)
> 

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 07/18] Drop global lock during TCG code execution
  2015-06-26 14:56   ` Jan Kiszka
@ 2015-06-26 15:08     ` Paolo Bonzini
  2015-06-26 15:36     ` Frederic Konrad
  1 sibling, 0 replies; 82+ messages in thread
From: Paolo Bonzini @ 2015-06-26 15:08 UTC (permalink / raw)
  To: Jan Kiszka, fred.konrad, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee



On 26/06/2015 16:56, Jan Kiszka wrote:
>> > +    /*
>> > +     * Some device's reset needs to grab the global_mutex. So just release it
>> > +     * here.
> That's a property newly introduced by the patch, or how does this
> happen? In turn, are all reset handlers now fine to be called outside of
> BQL? This looks suspicious, but it's been quite a while since I last
> starred at this.

Yes, this looks weird...  I guess this goes with the "RFC" part of the
patch.

Paolo

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 15/18] cpu: introduce tlb_flush*_all.
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 15/18] cpu: introduce tlb_flush*_all fred.konrad
@ 2015-06-26 15:15   ` Paolo Bonzini
  2015-06-26 15:54     ` Frederic Konrad
  2015-07-07 15:52   ` Alex Bennée
  1 sibling, 1 reply; 82+ messages in thread
From: Paolo Bonzini @ 2015-06-26 15:15 UTC (permalink / raw)
  To: fred.konrad, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee



On 26/06/2015 16:47, fred.konrad@greensocs.com wrote:
> +    CPU_FOREACH(cpu) {
> +        if (qemu_cpu_is_self(cpu)) {
> +            /* async_run_on_cpu would handle this case too, but calling
> +             * tlb_flush directly just avoids a malloc here.
> +             */
> +            tlb_flush(cpu, flush_global);
> +        } else {
> +            params = g_malloc(sizeof(struct TLBFlushParams));
> +            params->cpu = cpu;
> +            params->flush_global = flush_global;
> +            async_run_on_cpu(cpu, tlb_flush_async_work, params);

Shouldn't this be synchronous (which you cannot do straightforwardly
because of deadlocks---hence the need to hook cpu_has_work as discussed
earlier)?
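
A synchronous variant would look roughly like this (my sketch, not the
patch; it shows only the completion counting and ignores the deadlock,
which is exactly why the cpu_has_work hook would be needed):

    static int tlb_flushes_pending;

    static void tlb_flush_sync_work(void *opaque)
    {
        struct TLBFlushParams *params = opaque;

        tlb_flush(params->cpu, params->flush_global);
        atomic_dec(&tlb_flushes_pending);
        g_free(params);
    }

    /* caller: atomic_inc(&tlb_flushes_pending) once per queued item,
     * then wait for the remote vCPUs to run the work: */
    while (atomic_read(&tlb_flushes_pending)) {
        /* deadlocks if a target vCPU is itself waiting for us */
    }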

Paolo

> +        }
> +    }

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 03/18] remove unused spinlock.
  2015-06-26 14:53   ` Paolo Bonzini
@ 2015-06-26 15:29     ` Frederic Konrad
  2015-06-26 15:46       ` Paolo Bonzini
  0 siblings, 1 reply; 82+ messages in thread
From: Frederic Konrad @ 2015-06-26 15:29 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee

On 26/06/2015 16:53, Paolo Bonzini wrote:
>
> On 26/06/2015 16:47, fred.konrad@greensocs.com wrote:
>> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
>> index 7f0aae9..d1e482a 100755
>> --- a/scripts/checkpatch.pl
>> +++ b/scripts/checkpatch.pl
>> @@ -2664,11 +2664,6 @@ sub process {
>>   			WARN("Use of volatile is usually wrong: see Documentation/volatile-considered-harmful.txt\n" . $herecurr);
>>   		}
>>   
>> -# SPIN_LOCK_UNLOCKED & RW_LOCK_UNLOCKED are deprecated
>> -		if ($line =~ /\b(SPIN_LOCK_UNLOCKED|RW_LOCK_UNLOCKED)/) {
>> -			ERROR("Use of $1 is deprecated: see Documentation/spinlocks.txt\n" . $herecurr);
>> -		}
>> -
>>   # warn about #if 0
>>   		if ($line =~ /^.\s*\#\s*if\s+0\b/) {
>>   			CHK("if this code is redundant consider removing it\n" .
>> @@ -2717,8 +2712,8 @@ sub process {
>>   			ERROR("exactly one space required after that #$1\n" . $herecurr);
>>   		}
>>   
>> -# check for spinlock_t definitions without a comment.
>> -		if ($line =~ /^.\s*(struct\s+mutex|spinlock_t)\s+\S+;/ ||
>> +# check for mutex definitions without a comment.
>> +		if ($line =~ /^.\s*(struct\s+mutex)\s+\S+;/ ||
>>   		    $line =~ /^.\s*(DEFINE_MUTEX)\s*\(/) {
>>   			my $which = $1;
>>   			if (!ctx_has_comment($first_line, $linenr)) {
> The checkpatch.pl parts simply come from Linux.  They don't matter for
> QEMU, but we're limiting the changes to the minimum in this script.
>
> Paolo
Ok so I can drop this part from the patch?

Thanks,
Fred

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 04/18] add support for spin lock on POSIX systems exclusively
  2015-06-26 14:55   ` Paolo Bonzini
@ 2015-06-26 15:31     ` Frederic Konrad
  0 siblings, 0 replies; 82+ messages in thread
From: Frederic Konrad @ 2015-06-26 15:31 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee

On 26/06/2015 16:55, Paolo Bonzini wrote:
>
> On 26/06/2015 16:47, fred.konrad@greensocs.com wrote:
>> From: Guillaume Delbergue <guillaume.delbergue@greensocs.com>
>>
>> WARNING: spin lock is currently not implemented on WIN32
> The Windows KSPIN_LOCK is a kernel data structure.  You can implement a
> simple, portable test-and-test-and-set spinlock using atomics, and use
> it on both POSIX and Win32.
>
> Paolo

ok we will take a look at atomic instruction.

Fred

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 13/18] cpu: introduce async_run_safe_work_on_cpu.
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 13/18] cpu: introduce async_run_safe_work_on_cpu fred.konrad
@ 2015-06-26 15:35   ` Paolo Bonzini
  2015-06-26 16:09     ` Frederic Konrad
  0 siblings, 1 reply; 82+ messages in thread
From: Paolo Bonzini @ 2015-06-26 15:35 UTC (permalink / raw)
  To: fred.konrad, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee

On 26/06/2015 16:47, fred.konrad@greensocs.com wrote:
> diff --git a/cpu-exec.c b/cpu-exec.c
> index de256d6..d6442cd 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c

Nice solution.  However I still have a few questions that need
clarification.

> @@ -382,6 +382,11 @@ int cpu_exec(CPUArchState *env)
>      volatile bool have_tb_lock = false;
>  #endif
>  
> +    if (async_safe_work_pending()) {
> +        cpu->exit_request = 1;
> +        return 0;
> +    }

Perhaps move this to cpu_can_run()?
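
I.e., roughly (a sketch; the existing stop/stopped checks in
cpu_can_run stay as they are):

    static bool cpu_can_run(CPUState *cpu)
    {
        if (cpu->stop || cpu_is_stopped(cpu)) {
            return false;
        }
        if (async_safe_work_pending()) {
            cpu->exit_request = 1;   /* make the loop exit early */
            return false;
        }
        return true;
    }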

>      if (cpu->halted) {
>          if (!cpu_has_work(cpu)) {
>              return EXCP_HALTED;
> diff --git a/cpus.c b/cpus.c
> index 5f13d73..aee445a 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -75,7 +75,7 @@ bool cpu_is_stopped(CPUState *cpu)
>  
>  bool cpu_thread_is_idle(CPUState *cpu)
>  {
> -    if (cpu->stop || cpu->queued_work_first) {
> +    if (cpu->stop || cpu->queued_work_first || cpu->queued_safe_work_first) {
>          return false;
>      }
>      if (cpu_is_stopped(cpu)) {
> @@ -892,6 +892,69 @@ void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
>      qemu_cpu_kick(cpu);
>  }
>  
> +void async_run_safe_work_on_cpu(CPUState *cpu, void (*func)(void *data),
> +                                void *data)
> +{

Do you need a mutex to protect this data structure?  I would use one
even if not strictly necessary, to avoid introducing new BQL-protected
structures.

Also, can you add a count of how many such work items exist in the whole
system, in order to speed up async_safe_work_pending?
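
For instance (a sketch, using the existing atomic helpers):

    static int safe_work_pending_count;  /* all queued safe work items */

    /* in async_run_safe_work_on_cpu(), after queuing wi: */
    atomic_inc(&safe_work_pending_count);

    /* in flush_queued_safe_work(), after running each wi: */
    atomic_dec(&safe_work_pending_count);

    bool async_safe_work_pending(void)
    {
        return atomic_read(&safe_work_pending_count) != 0;
    }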

> +    struct qemu_work_item *wi;
> +
> +    wi = g_malloc0(sizeof(struct qemu_work_item));
> +    wi->func = func;
> +    wi->data = data;
> +    wi->free = true;
> +    if (cpu->queued_safe_work_first == NULL) {
> +        cpu->queued_safe_work_first = wi;
> +    } else {
> +        cpu->queued_safe_work_last->next = wi;
> +    }
> +    cpu->queued_safe_work_last = wi;
> +    wi->next = NULL;
> +    wi->done = false;
> +
> +    CPU_FOREACH(cpu) {
> +        qemu_cpu_kick_thread(cpu);
> +    }
> +}
> +
> +static void flush_queued_safe_work(CPUState *cpu)
> +{
> +    struct qemu_work_item *wi;
> +    CPUState *other_cpu;
> +
> +    if (cpu->queued_safe_work_first == NULL) {
> +        return;
> +    }
> +
> +    CPU_FOREACH(other_cpu) {
> +        if (other_cpu->tcg_executing != 0) {

This causes the thread to busy wait until everyone has exited, right?
Not a big deal, but worth a comment.
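
Something along these lines, perhaps:

    CPU_FOREACH(other_cpu) {
        if (other_cpu->tcg_executing != 0) {
            /* Another vCPU is still in its execution loop; give up
             * for now.  cpu_exec() checks async_safe_work_pending()
             * on entry, so the last vCPU to leave the loop will end
             * up running the queued work. */
            return;
        }
    }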

Paolo

> +            return;
> +        }
> +    }
> +
> +    while ((wi = cpu->queued_safe_work_first)) {
> +        cpu->queued_safe_work_first = wi->next;
> +        wi->func(wi->data);
> +        wi->done = true;
> +        if (wi->free) {
> +            g_free(wi);
> +        }
> +    }
> +    cpu->queued_safe_work_last = NULL;
> +    qemu_cond_broadcast(&qemu_work_cond);
> +}
> +
> +bool async_safe_work_pending(void)
> +{
> +    CPUState *cpu;
> +
> +    CPU_FOREACH(cpu) {
> +        if (cpu->queued_safe_work_first) {
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 07/18] Drop global lock during TCG code execution
  2015-06-26 14:56   ` Jan Kiszka
  2015-06-26 15:08     ` Paolo Bonzini
@ 2015-06-26 15:36     ` Frederic Konrad
  2015-06-26 15:42       ` Jan Kiszka
  2015-07-07 12:33       ` Alex Bennée
  1 sibling, 2 replies; 82+ messages in thread
From: Frederic Konrad @ 2015-06-26 15:36 UTC (permalink / raw)
  To: Jan Kiszka, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, pbonzini, alex.bennee

On 26/06/2015 16:56, Jan Kiszka wrote:
> On 2015-06-26 16:47, fred.konrad@greensocs.com wrote:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> This finally allows TCG to benefit from the iothread introduction: Drop
>> the global mutex while running pure TCG CPU code. Reacquire the lock
>> when entering MMIO or PIO emulation, or when leaving the TCG loop.
>>
>> We have to revert a few optimizations for the current TCG threading
>> model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
>> kicking it in qemu_cpu_kick. We also need to disable RAM block
>> reordering until we have a more efficient locking mechanism at hand.
>>
>> I'm pretty sure some cases are still broken, definitely SMP (we no
>> longer perform round-robin scheduling "by chance"). Still, a Linux x86
>> UP guest and my Musicpal ARM model boot fine here. These numbers
>> demonstrate where we gain something:
>>
>> 20338 jan       20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
>> 20337 jan       20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm
>>
>> The guest CPU was fully loaded, but the iothread could still run mostly
>> independently on a second core. Without the patch we don't get beyond
>>
>> 32206 jan       20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
>> 32204 jan       20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm
>>
>> We don't benefit significantly, though, when the guest is not fully
>> loading a host CPU.
>>
>> Note that this patch depends on
>> http://thread.gmane.org/gmane.comp.emulators.qemu/118657
>>
>> Changes from Fred Konrad:
>>    * Rebase on the current HEAD.
>>    * Fixes a deadlock in qemu_devices_reset().
>> ---
>>   cpus.c                    | 17 ++++-------------
>>   cputlb.c                  |  5 +++++
>>   exec.c                    | 25 +++++++++++++++++++++++++
>>   softmmu_template.h        |  5 +++++
>>   target-i386/misc_helper.c | 27 ++++++++++++++++++++++++---
>>   translate-all.c           |  2 ++
>>   vl.c                      |  6 ++++++
>>   7 files changed, 71 insertions(+), 16 deletions(-)
>>
>> diff --git a/cpus.c b/cpus.c
>> index 79383df..23c316c 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -1034,7 +1034,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>>       qemu_tcg_init_cpu_signals();
>>       qemu_thread_get_self(cpu->thread);
>>   
>> -    qemu_mutex_lock(&qemu_global_mutex);
>> +    qemu_mutex_lock_iothread();
>>       CPU_FOREACH(cpu) {
>>           cpu->thread_id = qemu_get_thread_id();
>>           cpu->created = true;
>> @@ -1145,18 +1145,7 @@ bool qemu_in_vcpu_thread(void)
>>   
>>   void qemu_mutex_lock_iothread(void)
>>   {
>> -    atomic_inc(&iothread_requesting_mutex);
>> -    if (!tcg_enabled() || !first_cpu || !first_cpu->thread) {
>> -        qemu_mutex_lock(&qemu_global_mutex);
>> -        atomic_dec(&iothread_requesting_mutex);
>> -    } else {
>> -        if (qemu_mutex_trylock(&qemu_global_mutex)) {
>> -            qemu_cpu_kick_thread(first_cpu);
>> -            qemu_mutex_lock(&qemu_global_mutex);
>> -        }
>> -        atomic_dec(&iothread_requesting_mutex);
>> -        qemu_cond_broadcast(&qemu_io_proceeded_cond);
>> -    }
>> +    qemu_mutex_lock(&qemu_global_mutex);
>>   }
>>   
>>   void qemu_mutex_unlock_iothread(void)
>> @@ -1377,7 +1366,9 @@ static int tcg_cpu_exec(CPUArchState *env)
>>           cpu->icount_decr.u16.low = decr;
>>           cpu->icount_extra = count;
>>       }
>> +    qemu_mutex_unlock_iothread();
>>       ret = cpu_exec(env);
>> +    qemu_mutex_lock_iothread();
>>   #ifdef CONFIG_PROFILER
>>       tcg_time += profile_getclock() - ti;
>>   #endif
>> diff --git a/cputlb.c b/cputlb.c
>> index a506086..79fff1c 100644
>> --- a/cputlb.c
>> +++ b/cputlb.c
>> @@ -30,6 +30,9 @@
>>   #include "exec/ram_addr.h"
>>   #include "tcg/tcg.h"
>>   
>> +void qemu_mutex_lock_iothread(void);
>> +void qemu_mutex_unlock_iothread(void);
>> +
>>   //#define DEBUG_TLB
>>   //#define DEBUG_TLB_CHECK
>>   
>> @@ -125,8 +128,10 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
>>      can be detected */
>>   void tlb_protect_code(ram_addr_t ram_addr)
>>   {
>> +    qemu_mutex_lock_iothread();
>>       cpu_physical_memory_test_and_clear_dirty(ram_addr, TARGET_PAGE_SIZE,
>>                                                DIRTY_MEMORY_CODE);
>> +    qemu_mutex_unlock_iothread();
>>   }
>>   
>>   /* update the TLB so that writes in physical page 'phys_addr' are no longer
>> diff --git a/exec.c b/exec.c
>> index f7883d2..964e922 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -1881,6 +1881,7 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
>>               wp->hitaddr = vaddr;
>>               wp->hitattrs = attrs;
>>               if (!cpu->watchpoint_hit) {
>> +                qemu_mutex_unlock_iothread();
>>                   cpu->watchpoint_hit = wp;
>>                   tb_check_watchpoint(cpu);
>>                   if (wp->flags & BP_STOP_BEFORE_ACCESS) {
>> @@ -2740,6 +2741,7 @@ static inline uint32_t address_space_ldl_internal(AddressSpace *as, hwaddr addr,
>>       mr = address_space_translate(as, addr, &addr1, &l, false);
>>       if (l < 4 || !memory_access_is_direct(mr, false)) {
>>           /* I/O case */
>> +        qemu_mutex_lock_iothread();
>>           r = memory_region_dispatch_read(mr, addr1, &val, 4, attrs);
>>   #if defined(TARGET_WORDS_BIGENDIAN)
>>           if (endian == DEVICE_LITTLE_ENDIAN) {
>> @@ -2750,6 +2752,7 @@ static inline uint32_t address_space_ldl_internal(AddressSpace *as, hwaddr addr,
>>               val = bswap32(val);
>>           }
>>   #endif
>> +        qemu_mutex_unlock_iothread();
>>       } else {
>>           /* RAM case */
>>           ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(mr)
>> @@ -2829,6 +2832,7 @@ static inline uint64_t address_space_ldq_internal(AddressSpace *as, hwaddr addr,
>>                                    false);
>>       if (l < 8 || !memory_access_is_direct(mr, false)) {
>>           /* I/O case */
>> +        qemu_mutex_lock_iothread();
>>           r = memory_region_dispatch_read(mr, addr1, &val, 8, attrs);
>>   #if defined(TARGET_WORDS_BIGENDIAN)
>>           if (endian == DEVICE_LITTLE_ENDIAN) {
>> @@ -2839,6 +2843,7 @@ static inline uint64_t address_space_ldq_internal(AddressSpace *as, hwaddr addr,
>>               val = bswap64(val);
>>           }
>>   #endif
>> +        qemu_mutex_unlock_iothread();
>>       } else {
>>           /* RAM case */
>>           ptr = qemu_get_ram_ptr((memory_region_get_ram_addr(mr)
>> @@ -2938,7 +2943,9 @@ static inline uint32_t address_space_lduw_internal(AddressSpace *as,
>>                                    false);
>>       if (l < 2 || !memory_access_is_direct(mr, false)) {
>>           /* I/O case */
>> +        qemu_mutex_lock_iothread();
>>           r = memory_region_dispatch_read(mr, addr1, &val, 2, attrs);
>> +        qemu_mutex_unlock_iothread();
>>   #if defined(TARGET_WORDS_BIGENDIAN)
>>           if (endian == DEVICE_LITTLE_ENDIAN) {
>>               val = bswap16(val);
>> @@ -3026,15 +3033,19 @@ void address_space_stl_notdirty(AddressSpace *as, hwaddr addr, uint32_t val,
>>       mr = address_space_translate(as, addr, &addr1, &l,
>>                                    true);
>>       if (l < 4 || !memory_access_is_direct(mr, true)) {
>> +        qemu_mutex_lock_iothread();
>>           r = memory_region_dispatch_write(mr, addr1, val, 4, attrs);
>> +        qemu_mutex_unlock_iothread();
>>       } else {
>>           addr1 += memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK;
>>           ptr = qemu_get_ram_ptr(addr1);
>>           stl_p(ptr, val);
>>   
>> +        qemu_mutex_lock_iothread();
>>           dirty_log_mask = memory_region_get_dirty_log_mask(mr);
>>           dirty_log_mask &= ~(1 << DIRTY_MEMORY_CODE);
>>           cpu_physical_memory_set_dirty_range(addr1, 4, dirty_log_mask);
>> +        qemu_mutex_unlock_iothread();
>>           r = MEMTX_OK;
>>       }
>>       if (result) {
>> @@ -3074,7 +3085,9 @@ static inline void address_space_stl_internal(AddressSpace *as,
>>               val = bswap32(val);
>>           }
>>   #endif
>> +        qemu_mutex_lock_iothread();
>>           r = memory_region_dispatch_write(mr, addr1, val, 4, attrs);
>> +        qemu_mutex_unlock_iothread();
>>       } else {
>>           /* RAM case */
>>           addr1 += memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK;
>> @@ -3090,7 +3103,9 @@ static inline void address_space_stl_internal(AddressSpace *as,
>>               stl_p(ptr, val);
>>               break;
>>           }
>> +        qemu_mutex_lock_iothread();
>>           invalidate_and_set_dirty(mr, addr1, 4);
>> +        qemu_mutex_unlock_iothread();
>>           r = MEMTX_OK;
>>       }
>>       if (result) {
>> @@ -3178,7 +3193,9 @@ static inline void address_space_stw_internal(AddressSpace *as,
>>               val = bswap16(val);
>>           }
>>   #endif
>> +        qemu_mutex_lock_iothread();
>>           r = memory_region_dispatch_write(mr, addr1, val, 2, attrs);
>> +        qemu_mutex_unlock_iothread();
>>       } else {
>>           /* RAM case */
>>           addr1 += memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK;
>> @@ -3194,7 +3211,9 @@ static inline void address_space_stw_internal(AddressSpace *as,
>>               stw_p(ptr, val);
>>               break;
>>           }
>> +        qemu_mutex_lock_iothread();
>>           invalidate_and_set_dirty(mr, addr1, 2);
>> +        qemu_mutex_unlock_iothread();
>>           r = MEMTX_OK;
>>       }
>>       if (result) {
>> @@ -3245,7 +3264,9 @@ void address_space_stq(AddressSpace *as, hwaddr addr, uint64_t val,
>>   {
>>       MemTxResult r;
>>       val = tswap64(val);
>> +    qemu_mutex_lock_iothread();
>>       r = address_space_rw(as, addr, attrs, (void *) &val, 8, 1);
>> +    qemu_mutex_unlock_iothread();
>>       if (result) {
>>           *result = r;
>>       }
>> @@ -3256,7 +3277,9 @@ void address_space_stq_le(AddressSpace *as, hwaddr addr, uint64_t val,
>>   {
>>       MemTxResult r;
>>       val = cpu_to_le64(val);
>> +    qemu_mutex_lock_iothread();
>>       r = address_space_rw(as, addr, attrs, (void *) &val, 8, 1);
>> +    qemu_mutex_unlock_iothread();
>>       if (result) {
>>           *result = r;
>>       }
>> @@ -3266,7 +3289,9 @@ void address_space_stq_be(AddressSpace *as, hwaddr addr, uint64_t val,
>>   {
>>       MemTxResult r;
>>       val = cpu_to_be64(val);
>> +    qemu_mutex_lock_iothread();
>>       r = address_space_rw(as, addr, attrs, (void *) &val, 8, 1);
>> +    qemu_mutex_unlock_iothread();
>>       if (result) {
>>           *result = r;
>>       }
>> diff --git a/softmmu_template.h b/softmmu_template.h
>> index d42d89d..18871f5 100644
>> --- a/softmmu_template.h
>> +++ b/softmmu_template.h
>> @@ -158,9 +158,12 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
>>           cpu_io_recompile(cpu, retaddr);
>>       }
>>   
>> +    qemu_mutex_lock_iothread();
>> +
>>       cpu->mem_io_vaddr = addr;
>>       memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT,
>>                                   iotlbentry->attrs);
>> +    qemu_mutex_unlock_iothread();
>>       return val;
>>   }
>>   #endif
>> @@ -378,10 +381,12 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
>>           cpu_io_recompile(cpu, retaddr);
>>       }
>>   
>> +    qemu_mutex_lock_iothread();
>>       cpu->mem_io_vaddr = addr;
>>       cpu->mem_io_pc = retaddr;
>>       memory_region_dispatch_write(mr, physaddr, val, 1 << SHIFT,
>>                                    iotlbentry->attrs);
>> +    qemu_mutex_unlock_iothread();
>>   }
>>   
>>   void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>> diff --git a/target-i386/misc_helper.c b/target-i386/misc_helper.c
>> index 52c5d65..55f63bf 100644
>> --- a/target-i386/misc_helper.c
>> +++ b/target-i386/misc_helper.c
>> @@ -27,8 +27,10 @@ void helper_outb(CPUX86State *env, uint32_t port, uint32_t data)
>>   #ifdef CONFIG_USER_ONLY
>>       fprintf(stderr, "outb: port=0x%04x, data=%02x\n", port, data);
>>   #else
>> +    qemu_mutex_lock_iothread();
>>       address_space_stb(&address_space_io, port, data,
>>                         cpu_get_mem_attrs(env), NULL);
>> +    qemu_mutex_unlock_iothread();
>>   #endif
>>   }
>>   
>> @@ -38,8 +40,13 @@ target_ulong helper_inb(CPUX86State *env, uint32_t port)
>>       fprintf(stderr, "inb: port=0x%04x\n", port);
>>       return 0;
>>   #else
>> -    return address_space_ldub(&address_space_io, port,
>> +    target_ulong ret;
>> +
>> +    qemu_mutex_lock_iothread();
>> +    ret = address_space_ldub(&address_space_io, port,
>>                                 cpu_get_mem_attrs(env), NULL);
>> +    qemu_mutex_unlock_iothread();
>> +    return ret;
>>   #endif
>>   }
>>   
>> @@ -48,8 +55,10 @@ void helper_outw(CPUX86State *env, uint32_t port, uint32_t data)
>>   #ifdef CONFIG_USER_ONLY
>>       fprintf(stderr, "outw: port=0x%04x, data=%04x\n", port, data);
>>   #else
>> +    qemu_mutex_lock_iothread();
>>       address_space_stw(&address_space_io, port, data,
>>                         cpu_get_mem_attrs(env), NULL);
>> +    qemu_mutex_unlock_iothread();
>>   #endif
>>   }
>>   
>> @@ -59,8 +68,13 @@ target_ulong helper_inw(CPUX86State *env, uint32_t port)
>>       fprintf(stderr, "inw: port=0x%04x\n", port);
>>       return 0;
>>   #else
>> -    return address_space_lduw(&address_space_io, port,
>> +    target_ulong ret;
>> +
>> +    qemu_mutex_lock_iothread();
>> +    ret = address_space_lduw(&address_space_io, port,
>>                                 cpu_get_mem_attrs(env), NULL);
>> +    qemu_mutex_unlock_iothread();
>> +    return ret;
>>   #endif
>>   }
>>   
>> @@ -69,8 +83,10 @@ void helper_outl(CPUX86State *env, uint32_t port, uint32_t data)
>>   #ifdef CONFIG_USER_ONLY
>>       fprintf(stderr, "outw: port=0x%04x, data=%08x\n", port, data);
>>   #else
>> +    qemu_mutex_lock_iothread();
>>       address_space_stl(&address_space_io, port, data,
>>                         cpu_get_mem_attrs(env), NULL);
>> +    qemu_mutex_unlock_iothread();
>>   #endif
>>   }
>>   
>> @@ -80,8 +96,13 @@ target_ulong helper_inl(CPUX86State *env, uint32_t port)
>>       fprintf(stderr, "inl: port=0x%04x\n", port);
>>       return 0;
>>   #else
>> -    return address_space_ldl(&address_space_io, port,
>> +    target_ulong ret;
>> +
>> +    qemu_mutex_lock_iothread();
>> +    ret = address_space_ldl(&address_space_io, port,
>>                                cpu_get_mem_attrs(env), NULL);
>> +    qemu_mutex_unlock_iothread();
>> +    return ret;
>>   #endif
>>   }
>>   
>> diff --git a/translate-all.c b/translate-all.c
>> index c25b79b..ade2269 100644
>> --- a/translate-all.c
>> +++ b/translate-all.c
>> @@ -1222,6 +1222,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>>   #endif
>>   #ifdef TARGET_HAS_PRECISE_SMC
>>       if (current_tb_modified) {
>> +        qemu_mutex_unlock_iothread();
>>           /* we generate a block containing just the instruction
>>              modifying the memory. It will ensure that it cannot modify
>>              itself */
>> @@ -1326,6 +1327,7 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
>>       p->first_tb = NULL;
>>   #ifdef TARGET_HAS_PRECISE_SMC
>>       if (current_tb_modified) {
>> +        qemu_mutex_unlock_iothread();
>>           /* we generate a block containing just the instruction
>>              modifying the memory. It will ensure that it cannot modify
>>              itself */
>> diff --git a/vl.c b/vl.c
>> index 69ad90c..2983d44 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -1698,10 +1698,16 @@ void qemu_devices_reset(void)
>>   {
>>       QEMUResetEntry *re, *nre;
>>   
>> +    /*
>> +     * Some devices' resets need to grab the global_mutex. So just release it
>> +     * here.
> That's a property newly introduced by the patch, or how does this
> happen? In turn, are all reset handlers now fine to be called outside of
> BQL? This looks suspicious, but it's been quite a while since I last
> stared at this.
>
> Jan
Hi Jan,

Sorry for that, it's a dirty hack :).
Some reset handlers probably load stuff into memory, hence a double lock.
It will probably disappear with:

http://thread.gmane.org/gmane.comp.emulators.qemu/345258
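
To make the double lock concrete, here is a stand-alone pthreads sketch
(plain C, not QEMU code) of a reset handler re-entering the memory path
once that path takes the lock itself:

#include <pthread.h>
#include <stdio.h>

/* stands in for qemu_global_mutex; not recursive */
static pthread_mutex_t bql = PTHREAD_MUTEX_INITIALIZER;

static void device_reset(void)
{
    /* the handler loads something into guest memory; with this series
     * the memory-access path now takes the "BQL" on its own */
    pthread_mutex_lock(&bql);    /* second lock by the same thread: hangs */
    pthread_mutex_unlock(&bql);
}

int main(void)
{
    pthread_mutex_lock(&bql);    /* qemu_devices_reset() runs with BQL held */
    device_reset();              /* never returns */
    pthread_mutex_unlock(&bql);
    printf("not reached\n");
    return 0;
}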

Thanks,
Fred

>> +     */
>> +    qemu_mutex_unlock_iothread();
>>       /* reset all devices */
>>       QTAILQ_FOREACH_SAFE(re, &reset_handlers, entry, nre) {
>>           re->func(re->opaque);
>>       }
>> +    qemu_mutex_lock_iothread();
>>   }
>>   
>>   void qemu_system_reset(bool report)
>>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 05/18] protect TBContext with tb_lock.
  2015-06-26 14:56   ` Paolo Bonzini
@ 2015-06-26 15:39     ` Frederic Konrad
  2015-06-26 15:45       ` Paolo Bonzini
  0 siblings, 1 reply; 82+ messages in thread
From: Frederic Konrad @ 2015-06-26 15:39 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee

On 26/06/2015 16:56, Paolo Bonzini wrote:
>
> On 26/06/2015 16:47, fred.konrad@greensocs.com wrote:
>>   
>> diff --git a/target-arm/translate.c b/target-arm/translate.c
>> index 971b6db..47345aa 100644
>> --- a/target-arm/translate.c
>> +++ b/target-arm/translate.c
>> @@ -11162,6 +11162,8 @@ static inline void gen_intermediate_code_internal(ARMCPU *cpu,
>>   
>>       dc->tb = tb;
>>   
>> +    tb_lock();
>> +
>>       dc->is_jmp = DISAS_NEXT;
>>       dc->pc = pc_start;
>>       dc->singlestep_enabled = cs->singlestep_enabled;
>> @@ -11499,6 +11501,7 @@ done_generating:
>>           tb->size = dc->pc - pc_start;
>>           tb->icount = num_insns;
>>       }
>> +    tb_unlock();
>>   }
>>   
>>   void gen_intermediate_code(CPUARMState *env, TranslationBlock *tb)
>> @@ -11567,6 +11570,7 @@ void arm_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
>>   
>>   void restore_state_to_opc(CPUARMState *env, TranslationBlock *tb, int pc_pos)
>>   {
>> +    tb_lock();
>>       if (is_a64(env)) {
>>           env->pc = tcg_ctx.gen_opc_pc[pc_pos];
>>           env->condexec_bits = 0;
>> @@ -11574,4 +11578,5 @@ void restore_state_to_opc(CPUARMState *env, TranslationBlock *tb, int pc_pos)
>>           env->regs[15] = tcg_ctx.gen_opc_pc[pc_pos];
>>           env->condexec_bits = gen_opc_condexec_bits[pc_pos];
>>       }
>> +    tb_unlock();
>>   }
> Should these instead be added to the callers?
>
> Paolo
Good point,
I see only one caller and the mutex is already locked.

Fred

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 06/18] tcg: remove tcg_halt_cond global variable.
  2015-06-26 15:02   ` Paolo Bonzini
@ 2015-06-26 15:41     ` Frederic Konrad
  2015-07-07 12:27       ` Alex Bennée
  0 siblings, 1 reply; 82+ messages in thread
From: Frederic Konrad @ 2015-06-26 15:41 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee

On 26/06/2015 17:02, Paolo Bonzini wrote:
>
> On 26/06/2015 16:47, fred.konrad@greensocs.com wrote:
>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>
>> This removes tcg_halt_cond global variable.
>> We need one QemuCond per virtual cpu for multithread TCG.
>>
>> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
>> ---
>>   cpus.c | 18 +++++++-----------
>>   1 file changed, 7 insertions(+), 11 deletions(-)
>>
>> diff --git a/cpus.c b/cpus.c
>> index 2d62a35..79383df 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -813,7 +813,6 @@ static unsigned iothread_requesting_mutex;
>>   static QemuThread io_thread;
>>   
>>   static QemuThread *tcg_cpu_thread;
>> -static QemuCond *tcg_halt_cond;
>>   
>>   /* cpu creation */
>>   static QemuCond qemu_cpu_cond;
>> @@ -919,15 +918,13 @@ static void qemu_wait_io_event_common(CPUState *cpu)
>>       cpu->thread_kicked = false;
>>   }
>>   
>> -static void qemu_tcg_wait_io_event(void)
>> +static void qemu_tcg_wait_io_event(CPUState *cpu)
>>   {
>> -    CPUState *cpu;
>> -
>>       while (all_cpu_threads_idle()) {
>>          /* Start accounting real time to the virtual clock if the CPUs
>>             are idle.  */
>>           qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
>> -        qemu_cond_wait(tcg_halt_cond, &qemu_global_mutex);
>> +        qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
>>       }
>>   
>>       while (iothread_requesting_mutex) {
>> @@ -1047,7 +1044,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>>   
>>       /* wait for initial kick-off after machine start */
>>       while (first_cpu->stopped) {
>> -        qemu_cond_wait(tcg_halt_cond, &qemu_global_mutex);
>> +        qemu_cond_wait(first_cpu->halt_cond, &qemu_global_mutex);
>>   
>>           /* process any pending work */
>>           CPU_FOREACH(cpu) {
>> @@ -1068,7 +1065,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>>                   qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
>>               }
>>           }
>> -        qemu_tcg_wait_io_event();
>> +        qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
> Does this work (for non-multithreaded TCG) if tcg_thread_fn is waiting
> on the "wrong" condition variable?  For example if all CPUs are idle and
> the second CPU wakes up, qemu_tcg_wait_io_event won't be kicked out of
> the wait.
>
> I think you need to have a CPUThread struct like this:
>
>     struct CPUThread {
>         QemuThread thread;
>         QemuCond halt_cond;
>     };
>
> and in CPUState have a CPUThread * field instead of the thread and
> halt_cond fields.
>
> Then single-threaded TCG can point all CPUStates to the same instance of
> the struct, while multi-threaded TCG can point each CPUState to a
> different struct.
>
> Paolo

Hmm, probably not, though we didn't pay attention to keeping the non-MTTCG
case working (which is probably not good).
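
For reference, a sketch of the sharing you describe -- the tcg_thread
field, the mttcg flag and the helper name are hypothetical, not from this
series:

typedef struct CPUThread {
    QemuThread thread;
    QemuCond halt_cond;
} CPUThread;

static CPUThread *single_tcg_instance;

static void tcg_assign_cpu_thread(CPUState *cpu, bool mttcg)
{
    if (mttcg) {
        /* multi-threaded TCG: one thread/halt_cond pair per vCPU */
        cpu->tcg_thread = g_new0(CPUThread, 1);
        qemu_cond_init(&cpu->tcg_thread->halt_cond);
    } else {
        /* single-threaded TCG: every vCPU points at the same pair */
        if (!single_tcg_instance) {
            single_tcg_instance = g_new0(CPUThread, 1);
            qemu_cond_init(&single_tcg_instance->halt_cond);
        }
        cpu->tcg_thread = single_tcg_instance;
    }
}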

>
>>       }
>>   
>>       return NULL;
>> @@ -1235,12 +1232,12 @@ static void qemu_tcg_init_vcpu(CPUState *cpu)
>>   
>>       tcg_cpu_address_space_init(cpu, cpu->as);
>>   
>> +    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>> +    qemu_cond_init(cpu->halt_cond);
>> +
>>       /* share a single thread for all cpus with TCG */
>>       if (!tcg_cpu_thread) {
>>           cpu->thread = g_malloc0(sizeof(QemuThread));
>> -        cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>> -        qemu_cond_init(cpu->halt_cond);
>> -        tcg_halt_cond = cpu->halt_cond;
>>           snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
>>                    cpu->cpu_index);
>>           qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
>> @@ -1254,7 +1251,6 @@ static void qemu_tcg_init_vcpu(CPUState *cpu)
>>           tcg_cpu_thread = cpu->thread;
>>       } else {
>>           cpu->thread = tcg_cpu_thread;
>> -        cpu->halt_cond = tcg_halt_cond;
>>       }

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 07/18] Drop global lock during TCG code execution
  2015-06-26 15:36     ` Frederic Konrad
@ 2015-06-26 15:42       ` Jan Kiszka
  2015-06-26 16:11         ` Frederic Konrad
  2015-07-07 12:33       ` Alex Bennée
  1 sibling, 1 reply; 82+ messages in thread
From: Jan Kiszka @ 2015-06-26 15:42 UTC (permalink / raw)
  To: Frederic Konrad, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, pbonzini, alex.bennee

On 2015-06-26 17:36, Frederic Konrad wrote:
> On 26/06/2015 16:56, Jan Kiszka wrote:
>> On 2015-06-26 16:47, fred.konrad@greensocs.com wrote:
>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>
>>> This finally allows TCG to benefit from the iothread introduction: Drop
>>> the global mutex while running pure TCG CPU code. Reacquire the lock
>>> when entering MMIO or PIO emulation, or when leaving the TCG loop.
>>>
>>> We have to revert a few optimizations for the current TCG threading
>>> model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
>>> kicking it in qemu_cpu_kick. We also need to disable RAM block
>>> reordering until we have a more efficient locking mechanism at hand.
>>>
>>> I'm pretty sure some cases are still broken, definitely SMP (we no
>>> longer perform round-robin scheduling "by chance"). Still, a Linux x86
>>> UP guest and my Musicpal ARM model boot fine here. These numbers
>>> demonstrate where we gain something:
>>>
>>> 20338 jan       20   0  331m  75m 6904 R   99  0.9   0:50.95
>>> qemu-system-arm
>>> 20337 jan       20   0  331m  75m 6904 S   20  0.9   0:26.50
>>> qemu-system-arm
>>>
>>> The guest CPU was fully loaded, but the iothread could still run mostly
>>> independent on a second core. Without the patch we don't get beyond
>>>
>>> 32206 jan       20   0  330m  73m 7036 R   82  0.9   1:06.00
>>> qemu-system-arm
>>> 32204 jan       20   0  330m  73m 7036 S   21  0.9   0:17.03
>>> qemu-system-arm
>>>
>>> We don't benefit significantly, though, when the guest is not fully
>>> loading a host CPU.
>>>
>>> Note that this patch depends on
>>> http://thread.gmane.org/gmane.comp.emulators.qemu/118657
>>>
>>> Changes from Fred Konrad:
>>>    * Rebase on the current HEAD.
>>>    * Fixes a deadlock in qemu_devices_reset().
>>> ---
>>>   cpus.c                    | 17 ++++-------------
>>>   cputlb.c                  |  5 +++++
>>>   exec.c                    | 25 +++++++++++++++++++++++++
>>>   softmmu_template.h        |  5 +++++
>>>   target-i386/misc_helper.c | 27 ++++++++++++++++++++++++---
>>>   translate-all.c           |  2 ++
>>>   vl.c                      |  6 ++++++
>>>   7 files changed, 71 insertions(+), 16 deletions(-)
>>>
>>> [hunks for cpus.c, cputlb.c, exec.c, softmmu_template.h,
>>> target-i386/misc_helper.c and translate-all.c snipped; quoted in full
>>> earlier in the thread]
>>> diff --git a/vl.c b/vl.c
>>> index 69ad90c..2983d44 100644
>>> --- a/vl.c
>>> +++ b/vl.c
>>> @@ -1698,10 +1698,16 @@ void qemu_devices_reset(void)
>>>   {
>>>       QEMUResetEntry *re, *nre;
>>>
>>> +    /*
>>> +     * Some devices' resets need to grab the global_mutex. So just release it
>>> +     * here.
>> That's a property newly introduced by the patch, or how does this
>> happen? In turn, are all reset handlers now fine to be called outside of
>> BQL? This looks suspicious, but it's been quite a while since I last
>> stared at this.
>>
>> Jan
> Hi Jan,
> 
> Sorry for that, it's a dirty hack :).
> Some reset handlers probably load stuff into memory, hence a double lock.
> It will probably disappear with:
> 
> http://thread.gmane.org/gmane.comp.emulators.qemu/345258

Hmm, skeptical, at least as long as most devices work under BQL.

Do you have some backtraces from lockups?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 05/18] protect TBContext with tb_lock.
  2015-06-26 15:39     ` Frederic Konrad
@ 2015-06-26 15:45       ` Paolo Bonzini
  0 siblings, 0 replies; 82+ messages in thread
From: Paolo Bonzini @ 2015-06-26 15:45 UTC (permalink / raw)
  To: Frederic Konrad, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee



On 26/06/2015 17:39, Frederic Konrad wrote:
>>>
>>> @@ -11567,6 +11570,7 @@ void arm_cpu_dump_state(CPUState *cs, FILE
>>> *f, fprintf_function cpu_fprintf,
>>>     void restore_state_to_opc(CPUARMState *env, TranslationBlock *tb,
>>> int pc_pos)
>>>   {
>>> +    tb_lock();
>>>       if (is_a64(env)) {
>>>           env->pc = tcg_ctx.gen_opc_pc[pc_pos];
>>>           env->condexec_bits = 0;
>>> @@ -11574,4 +11578,5 @@ void restore_state_to_opc(CPUARMState *env,
>>> TranslationBlock *tb, int pc_pos)
>>>           env->regs[15] = tcg_ctx.gen_opc_pc[pc_pos];
>>>           env->condexec_bits = gen_opc_condexec_bits[pc_pos];
>>>       }
>>> +    tb_unlock();
>>>   }
>> Should these instead be added to the callers?
>>
>> Paolo
> Good point,
> I see only one caller and the mutex is already locked.

Good, then add a comment in include/exec/exec-all.h ("/* Called with
tb_lock held.  */") please!
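
Something like this, next to the declaration (exact placement and
prototype assumed, not checked against the tree):

/* include/exec/exec-all.h */

/* Called with tb_lock held.  */
void restore_state_to_opc(CPUArchState *env, struct TranslationBlock *tb,
                          int pc_pos);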

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 03/18] remove unused spinlock.
  2015-06-26 15:29     ` Frederic Konrad
@ 2015-06-26 15:46       ` Paolo Bonzini
  0 siblings, 0 replies; 82+ messages in thread
From: Paolo Bonzini @ 2015-06-26 15:46 UTC (permalink / raw)
  To: Frederic Konrad, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee



On 26/06/2015 17:29, Frederic Konrad wrote:
>> The checkpatch.pl parts simply come from Linux.  They don't matter for
>> QEMU, but we're limiting the changes to the minimum in this script.
>
> OK, so I can drop this part from the patch?

Yes, please!

Paolo

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 15/18] cpu: introduce tlb_flush*_all.
  2015-06-26 15:15   ` Paolo Bonzini
@ 2015-06-26 15:54     ` Frederic Konrad
  2015-06-26 16:01       ` Paolo Bonzini
  0 siblings, 1 reply; 82+ messages in thread
From: Frederic Konrad @ 2015-06-26 15:54 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee

On 26/06/2015 17:15, Paolo Bonzini wrote:
>
> On 26/06/2015 16:47, fred.konrad@greensocs.com wrote:
>> +    CPU_FOREACH(cpu) {
>> +        if (qemu_cpu_is_self(cpu)) {
>> +            /* async_run_on_cpu handles this case but this just avoids a malloc
>> +             * here.
>> +             */
>> +            tlb_flush(cpu, flush_global);
>> +        } else {
>> +            params = g_malloc(sizeof(struct TLBFlushParams));
>> +            params->cpu = cpu;
>> +            params->flush_global = flush_global;
>> +            async_run_on_cpu(cpu, tlb_flush_async_work, params);
> Shouldn't this be synchronous (which you cannot do straightforwardly
> because of deadlocks---hence the need to hook cpu_has_work as discussed
> earlier)?
>
> Paolo
>
I think it doesn't require being synchronous, as each VCPU only clears its
own TLB here:

void tlb_flush(CPUState *cpu, int flush_global)
{
     CPUArchState *env = cpu->env_ptr;

#if defined(DEBUG_TLB)
     printf("tlb_flush:\n");
#endif
     /* must reset current TB so that interrupts cannot modify the
        links while we are modifying them */
     cpu->current_tb = NULL;

     memset(env->tlb_table, -1, sizeof(env->tlb_table));
     memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
     memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));

     env->vtlb_index = 0;
     env->tlb_flush_addr = -1;
     env->tlb_flush_mask = 0;
     tlb_flush_count++;
}

So what happens is:
An ARM instruction wants to clear the TLB of all VCPUs, e.g. the IS version
of TLBIALL. The VCPU which executes TLBIALL_IS can't flush the TLB of
another VCPU. It will just ask all VCPU threads to exit and to do tlb_flush,
hence the async work.

Maybe the big issue is the memory barrier instructions here, which I didn't
check.

Fred
>> +        }
>> +    }

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 15/18] cpu: introduce tlb_flush*_all.
  2015-06-26 15:54     ` Frederic Konrad
@ 2015-06-26 16:01       ` Paolo Bonzini
  2015-06-26 16:08         ` Peter Maydell
  0 siblings, 1 reply; 82+ messages in thread
From: Paolo Bonzini @ 2015-06-26 16:01 UTC (permalink / raw)
  To: Frederic Konrad, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee



On 26/06/2015 17:54, Frederic Konrad wrote:
>>
> I think it doesn't require being synchronous, as each VCPU only clears
> its own TLB here:
> 
> void tlb_flush(CPUState *cpu, int flush_global)
> {
>     CPUArchState *env = cpu->env_ptr;
> 
> #if defined(DEBUG_TLB)
>     printf("tlb_flush:\n");
> #endif
>     /* must reset current TB so that interrupts cannot modify the
>        links while we are modifying them */
>     cpu->current_tb = NULL;
> 
>     memset(env->tlb_table, -1, sizeof(env->tlb_table));
>     memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
>     memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
> 
>     env->vtlb_index = 0;
>     env->tlb_flush_addr = -1;
>     env->tlb_flush_mask = 0;
>     tlb_flush_count++;
> }
> 
> So what happens is:
> An ARM instruction wants to clear the TLB of all VCPUs, e.g. the IS
> version of TLBIALL. The VCPU which executes TLBIALL_IS can't flush the
> TLB of another VCPU. It will just ask all VCPU threads to exit and to do
> tlb_flush, hence the async work.
> 
> Maybe the big issue is the memory barrier instructions here, which I
> didn't check.

Yeah, ISTR that in some cases you have to wait for other CPUs to
invalidate the TLB before proceeding.  Maybe it's only when you have a
dmb instruction, but it's probably simpler for QEMU to always do it
synchronously.
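
For illustration, a sketch of a synchronous variant (names hypothetical,
not from the series) -- queue the per-CPU flush as async work, then wait
until every vCPU has run it:

typedef struct {
    int pending;        /* remaining vCPUs; protected by qemu_global_mutex */
    QemuCond done;
} TLBFlushSync;

typedef struct {
    TLBFlushSync *sync;
    CPUState *cpu;
    int flush_global;
} TLBFlushWork;

static void tlb_flush_sync_work(void *data)
{
    TLBFlushWork *w = data;

    tlb_flush(w->cpu, w->flush_global);
    if (--w->sync->pending == 0) {   /* work items run with the BQL held */
        qemu_cond_signal(&w->sync->done);
    }
    g_free(w);
}

void tlb_flush_all_sync(CPUState *self, int flush_global)
{
    TLBFlushSync sync = { .pending = 0 };
    CPUState *cpu;

    qemu_cond_init(&sync.done);
    CPU_FOREACH(cpu) {
        if (cpu != self) {
            TLBFlushWork *w = g_new0(TLBFlushWork, 1);

            w->sync = &sync;
            w->cpu = cpu;
            w->flush_global = flush_global;
            sync.pending++;
            async_run_on_cpu(cpu, tlb_flush_sync_work, w);
        }
    }
    tlb_flush(self, flush_global);
    /* Caveat: sleeping with the BQL held deadlocks if another vCPU needs
     * the lock before servicing its work queue -- hence the cpu_has_work
     * hook discussed earlier in the thread. */
    while (sync.pending) {
        qemu_cond_wait(&sync.done, &qemu_global_mutex);
    }
}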

Paolo

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 15/18] cpu: introduce tlb_flush*_all.
  2015-06-26 16:01       ` Paolo Bonzini
@ 2015-06-26 16:08         ` Peter Maydell
  2015-06-26 16:30           ` Frederic Konrad
                             ` (2 more replies)
  0 siblings, 3 replies; 82+ messages in thread
From: Peter Maydell @ 2015-06-26 16:08 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: mttcg, Alexander Spyridakis, Mark Burton, Alexander Graf,
	QEMU Developers, Guillaume Delbergue, Alistair Francis,
	Alex Bennée, Frederic Konrad

On 26 June 2015 at 17:01, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 26/06/2015 17:54, Frederic Konrad wrote:
>> So what happens is:
>> An ARM instruction wants to clear the TLB of all VCPUs, e.g. the IS
>> version of TLBIALL. The VCPU which executes TLBIALL_IS can't flush the
>> TLB of another VCPU. It will just ask all VCPU threads to exit and to do
>> tlb_flush, hence the async work.
>>
>> Maybe the big issue is the memory barrier instructions here, which I
>> didn't check.
>
> Yeah, ISTR that in some cases you have to wait for other CPUs to
> invalidate the TLB before proceeding.  Maybe it's only when you have a
> dmb instruction, but it's probably simpler for QEMU to always do it
> synchronously.

Yeah, the ARM architectural requirement here is that the TLB
operation is complete after a DSB instruction executes. (True for
any TLB op, not just the all-CPUs ones). NB that we also call
tlb_flush() from target-arm/ code for some things like "we just
updated a system register"; some of those have "must take effect
immediately" semantics.

In any case, for generic code we have to also consider the
semantics of non-ARM guests...

thanks
-- PMM

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 13/18] cpu: introduce async_run_safe_work_on_cpu.
  2015-06-26 15:35   ` Paolo Bonzini
@ 2015-06-26 16:09     ` Frederic Konrad
  2015-06-26 16:23       ` Paolo Bonzini
  0 siblings, 1 reply; 82+ messages in thread
From: Frederic Konrad @ 2015-06-26 16:09 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee

On 26/06/2015 17:35, Paolo Bonzini wrote:
> On 26/06/2015 16:47, fred.konrad@greensocs.com wrote:
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index de256d6..d6442cd 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
> Nice solution.  However I still have a few questions that need
> clarification.
>
>> @@ -382,6 +382,11 @@ int cpu_exec(CPUArchState *env)
>>       volatile bool have_tb_lock = false;
>>   #endif
>>   
>> +    if (async_safe_work_pending()) {
>> +        cpu->exit_request = 1;
>> +        return 0;
>> +    }
> Perhaps move this to cpu_can_run()?
Yes, why not.

>
>>       if (cpu->halted) {
>>           if (!cpu_has_work(cpu)) {
>>               return EXCP_HALTED;
>> diff --git a/cpus.c b/cpus.c
>> index 5f13d73..aee445a 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -75,7 +75,7 @@ bool cpu_is_stopped(CPUState *cpu)
>>   
>>   bool cpu_thread_is_idle(CPUState *cpu)
>>   {
>> -    if (cpu->stop || cpu->queued_work_first) {
>> +    if (cpu->stop || cpu->queued_work_first || cpu->queued_safe_work_first) {
>>           return false;
>>       }
>>       if (cpu_is_stopped(cpu)) {
>> @@ -892,6 +892,69 @@ void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
>>       qemu_cpu_kick(cpu);
>>   }
>>   
>> +void async_run_safe_work_on_cpu(CPUState *cpu, void (*func)(void *data),
>> +                                void *data)
>> +{
> Do you need a mutex to protect this data structure?  I would use one
> even if not strictly necessary, to avoid introducing new BQL-protected
> structures.

For the moment it's called by tb_invalidate and tb_flush_safe; the second
lacks a tb_lock/unlock pair, which should be added. I don't need another
mutex unless this is used elsewhere?

> Also, can you add a count of how many such work items exist in the whole
> system, in order to speed up async_safe_work_pending?

Yes, that makes sense.
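
A sketch of that counter, reusing the existing atomic helpers (placement
hypothetical):

static int safe_work_pending_count;

/* in async_run_safe_work_on_cpu(), when queuing an item: */
    atomic_inc(&safe_work_pending_count);

/* in flush_queued_safe_work(), once per completed item: */
    atomic_dec(&safe_work_pending_count);

bool async_safe_work_pending(void)
{
    /* O(1) instead of walking every CPU's queue */
    return atomic_read(&safe_work_pending_count) != 0;
}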
>> +    struct qemu_work_item *wi;
>> +
>> +    wi = g_malloc0(sizeof(struct qemu_work_item));
>> +    wi->func = func;
>> +    wi->data = data;
>> +    wi->free = true;
>> +    if (cpu->queued_safe_work_first == NULL) {
>> +        cpu->queued_safe_work_first = wi;
>> +    } else {
>> +        cpu->queued_safe_work_last->next = wi;
>> +    }
>> +    cpu->queued_safe_work_last = wi;
>> +    wi->next = NULL;
>> +    wi->done = false;
>> +
>> +    CPU_FOREACH(cpu) {
>> +        qemu_cpu_kick_thread(cpu);
>> +    }
>> +}
>> +
>> +static void flush_queued_safe_work(CPUState *cpu)
>> +{
>> +    struct qemu_work_item *wi;
>> +    CPUState *other_cpu;
>> +
>> +    if (cpu->queued_safe_work_first == NULL) {
>> +        return;
>> +    }
>> +
>> +    CPU_FOREACH(other_cpu) {
>> +        if (other_cpu->tcg_executing != 0) {
> This causes the thread to busy wait until everyone has exited, right?
> Not a big deal, but worth a comment.

Right.

Fred
> Paolo
>
>> +            return;
>> +        }
>> +    }
>> +
>> +    while ((wi = cpu->queued_safe_work_first)) {
>> +        cpu->queued_safe_work_first = wi->next;
>> +        wi->func(wi->data);
>> +        wi->done = true;
>> +        if (wi->free) {
>> +            g_free(wi);
>> +        }
>> +    }
>> +    cpu->queued_safe_work_last = NULL;
>> +    qemu_cond_broadcast(&qemu_work_cond);
>> +}
>> +
>> +bool async_safe_work_pending(void)
>> +{
>> +    CPUState *cpu;
>> +
>> +    CPU_FOREACH(cpu) {
>> +        if (cpu->queued_safe_work_first) {
>> +            return true;
>> +        }
>> +    }
>> +
>> +    return false;
>> +}
>> +

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 07/18] Drop global lock during TCG code execution
  2015-06-26 15:42       ` Jan Kiszka
@ 2015-06-26 16:11         ` Frederic Konrad
  0 siblings, 0 replies; 82+ messages in thread
From: Frederic Konrad @ 2015-06-26 16:11 UTC (permalink / raw)
  To: Jan Kiszka, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, pbonzini, alex.bennee

On 26/06/2015 17:42, Jan Kiszka wrote:
> On 2015-06-26 17:36, Frederic Konrad wrote:
>> On 26/06/2015 16:56, Jan Kiszka wrote:
>>> On 2015-06-26 16:47, fred.konrad@greensocs.com wrote:
>>>> [patch description and diff hunks snipped; quoted in full above]
>>>> diff --git a/vl.c b/vl.c
>>>> index 69ad90c..2983d44 100644
>>>> --- a/vl.c
>>>> +++ b/vl.c
>>>> @@ -1698,10 +1698,16 @@ void qemu_devices_reset(void)
>>>>    {
>>>>        QEMUResetEntry *re, *nre;
>>>>    +    /*
>>>> +     * Some device's reset needs to grab the global_mutex. So just
>>>> +     * release it here.
>>> Is that a property newly introduced by the patch, or how does this
>>> happen? In turn, are all reset handlers now fine to be called outside of
>>> the BQL? This looks suspicious, but it's been quite a while since I last
>>> stared at this.
>>>
>>> Jan
>> Hi Jan,
>>
>> Sorry about that, it's a dirty hack :).
>> Some reset handlers probably load stuff into memory, hence a double lock.
>> It will probably disappear with:
>>
>> http://thread.gmane.org/gmane.comp.emulators.qemu/345258
> Hmm, skeptical, at least as long as most devices work under BQL.
>
> Do you have some backtraces from lockups?
>
> Jan
>
I can try to reproduce it, but this hack was introduced a long time ago;
maybe it has already been fixed?

Fred

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 05/18] protect TBContext with tb_lock.
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 05/18] protect TBContext with tb_lock fred.konrad
  2015-06-26 14:56   ` Paolo Bonzini
@ 2015-06-26 16:20   ` Paolo Bonzini
  2015-07-07 12:22   ` Alex Bennée
  2 siblings, 0 replies; 82+ messages in thread
From: Paolo Bonzini @ 2015-06-26 16:20 UTC (permalink / raw)
  To: fred.konrad, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee



On 26/06/2015 16:47, fred.konrad@greensocs.com wrote:
> @@ -273,8 +274,9 @@ static TranslationBlock *tb_find_slow(CPUArchState *env,
>      ptb1 = &tcg_ctx.tb_ctx.tb_phys_hash[h];
>      for(;;) {
>          tb = *ptb1;
> -        if (!tb)
> -            goto not_found;
> +        if (!tb) {
> +            return tb;
> +        }

You are dereferencing tb outside the lock. You need a
smp_read_barrier_depends() here, and a smp_wmb() at the beginning of
tb_link_page.

Paolo

>          if (tb->pc == pc &&
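A minimal sketch of the barrier pairing described above (illustrative only,
not from the series; smp_wmb() and smp_read_barrier_depends() are QEMU's
existing primitives from qemu/atomic.h):

    /* Writer side, at the beginning of tb_link_page(): make the TB's fields
     * visible before the TB becomes reachable through tb_phys_hash. */
    smp_wmb();
    /* ... existing insertion into tcg_ctx.tb_ctx.tb_phys_hash ... */

    /* Reader side, in the lock-free lookup loop: order the pointer load
     * against the later dereference of tb->pc and friends. */
    tb = *ptb1;
    smp_read_barrier_depends();   /* pairs with the smp_wmb() above */
    if (tb && tb->pc == pc) {
        /* ... */
    }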

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 14/18] add a callback when tb_invalidate is called.
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 14/18] add a callback when tb_invalidate is called fred.konrad
@ 2015-06-26 16:20   ` Paolo Bonzini
  2015-06-26 16:40     ` Frederic Konrad
  2015-07-07 15:32   ` Alex Bennée
  1 sibling, 1 reply; 82+ messages in thread
From: Paolo Bonzini @ 2015-06-26 16:20 UTC (permalink / raw)
  To: fred.konrad, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee



On 26/06/2015 16:47, fred.konrad@greensocs.com wrote:
> From: KONRAD Frederic <fred.konrad@greensocs.com>
> 
> Instead of doing the jump cache invalidation directly in tb_invalidate, delay it
> until after the exit so we don't have another CPU trying to execute the code being
> invalidated.
> 
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> ---
>  translate-all.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 59 insertions(+), 2 deletions(-)
> 
> diff --git a/translate-all.c b/translate-all.c
> index ade2269..468648d 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -61,6 +61,7 @@
>  #include "translate-all.h"
>  #include "qemu/bitmap.h"
>  #include "qemu/timer.h"
> +#include "sysemu/cpus.h"
>  
>  //#define DEBUG_TB_INVALIDATE
>  //#define DEBUG_FLUSH
> @@ -966,14 +967,58 @@ static inline void tb_reset_jump(TranslationBlock *tb, int n)
>      tb_set_jmp_target(tb, n, (uintptr_t)(tb->tc_ptr + tb->tb_next_offset[n]));
>  }
>  
> +struct CPUDiscardTBParams {
> +    CPUState *cpu;
> +    TranslationBlock *tb;
> +};
> +
> +static void cpu_discard_tb_from_jmp_cache(void *opaque)
> +{
> +    unsigned int h;
> +    struct CPUDiscardTBParams *params = opaque;
> +
> +    h = tb_jmp_cache_hash_func(params->tb->pc);
> +    if (params->cpu->tb_jmp_cache[h] == params->tb) {
> +        params->cpu->tb_jmp_cache[h] = NULL;
> +    }

It is a bit more tricky, but I think you can avoid async_run_on_cpu by
doing this:

1) introduce a QemuSeqLock in TBContext, e.g. invalidate_seqlock.

2) wrap this "if" with seqlock_write_lock/unlock

3) in cpu-exec.c do this:

     /* we add the TB in the virtual pc hash table */
+    idx = seqlock_read_begin(&tcg_ctx.tb_ctx.invalidate_seqlock);
     cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
+    if (seqlock_read_retry(&tcg_ctx.tb_ctx.invalidate_seqlock)) {
+        /* Another CPU invalidated a tb in the meanwhile.  We do not
+         * know if it's this one, but play it safe and avoid caching
+         * it.
+         */
+        cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = NULL;
+    }

> +    /* suppress this TB from the two jump lists */
> +    tb_jmp_remove(tb, 0);
> +    tb_jmp_remove(tb, 1);

If you do the above synchronously, this part doesn't need to be deferred
either.

Then, immediately after the two tb_jmp_remove calls you can also check
whether "(tb->jmp_first & 3) == 2": if so, the expensive expensive
async_run_safe_work_on_cpu can be skipped.

Paolo

> +#endif /* MTTCG */
>  
>      tcg_ctx.tb_ctx.tb_phys_invalidate_count++;
>      tb_unlock();
> 
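A sketch of the write side of step 2, assuming TBContext grows the
QemuSeqLock field invalidate_seqlock named above (the helper function below
is hypothetical):

    static void tb_jmp_cache_clear(TranslationBlock *tb)
    {
        unsigned int h = tb_jmp_cache_hash_func(tb->pc);
        CPUState *cpu;

        seqlock_write_lock(&tcg_ctx.tb_ctx.invalidate_seqlock);
        CPU_FOREACH(cpu) {
            if (cpu->tb_jmp_cache[h] == tb) {
                cpu->tb_jmp_cache[h] = NULL;
            }
        }
        seqlock_write_unlock(&tcg_ctx.tb_ctx.invalidate_seqlock);
    }

A reader that races with this simply sees seqlock_read_retry() fire (step 3)
and drops its cached entry, so no cross-CPU async work is needed for the
jump cache itself.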

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 18/18] translate-all: (wip) use tb_flush_safe when we can't alloc more tb.
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 18/18] translate-all: (wip) use tb_flush_safe when we can't alloc more tb fred.konrad
@ 2015-06-26 16:21   ` Paolo Bonzini
  2015-06-26 16:38     ` Frederic Konrad
  2015-07-07 16:17   ` Alex Bennée
  1 sibling, 1 reply; 82+ messages in thread
From: Paolo Bonzini @ 2015-06-26 16:21 UTC (permalink / raw)
  To: fred.konrad, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee



On 26/06/2015 16:47, fred.konrad@greensocs.com wrote:
> @@ -1147,7 +1147,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>      tb = tb_alloc(pc);
>      if (!tb) {
>          /* flush must be done */
> -        tb_flush(env);
> +        tb_flush_safe(env);

Should you just call cpu_loop_exit() here, instead of redoing the
tb_alloc etc.?

Paolo

>          /* cannot fail at this point */
>          tb = tb_alloc(pc);
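A sketch of that alternative, assuming tb_flush_safe() queues the flush as
safe work and cpu_loop_exit() restarts the execution loop:

    tb = tb_alloc(pc);
    if (!tb) {
        /* Buffer full: schedule a safe flush and leave the CPU loop; we
         * come back through tb_gen_code() once the flush has run. */
        tb_flush_safe(env);
        cpu_loop_exit(cpu);
    }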

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 13/18] cpu: introduce async_run_safe_work_on_cpu.
  2015-06-26 16:09     ` Frederic Konrad
@ 2015-06-26 16:23       ` Paolo Bonzini
  2015-06-26 16:36         ` Frederic Konrad
  0 siblings, 1 reply; 82+ messages in thread
From: Paolo Bonzini @ 2015-06-26 16:23 UTC (permalink / raw)
  To: Frederic Konrad, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee



On 26/06/2015 18:09, Frederic Konrad wrote:
>>>
>>>   +void async_run_safe_work_on_cpu(CPUState *cpu, void (*func)(void
>>> *data),
>>> +                                void *data)
>>> +{
>> Do you need a mutex to protect this data structure?  I would use one
>> even if not strictly necessary, to avoid introducing new BQL-protected
>> structures.
> 
> For the moment it's called by tb_invalidate and tb_flush_safe; the second
> lacks a tb_lock/unlock, which should be added. I don't need another mutex
> except if this is used elsewhere?

In any case, the locking policy should be documented.

At which point you realize that protecting a CPU's
queued_safe_work_{first,next} fields with the tb_lock is a bit weird. :)
 I would add a mutex inside CPUState, and then later we could also use
it for regular run_on_cpu/async_run_on_cpu.

Paolo
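A sketch of such a CPUState mutex (the work_mutex name and the *_last
pointer are hypothetical; queued_safe_work_first is the field from the
patch):

    void async_run_safe_work_on_cpu(CPUState *cpu, void (*func)(void *data),
                                    void *data)
    {
        struct qemu_work_item *wi = g_malloc0(sizeof(*wi));

        wi->func = func;
        wi->data = data;

        qemu_mutex_lock(&cpu->work_mutex);     /* hypothetical per-CPU mutex */
        if (cpu->queued_safe_work_first == NULL) {
            cpu->queued_safe_work_first = wi;
        } else {
            cpu->queued_safe_work_last->next = wi;
        }
        cpu->queued_safe_work_last = wi;
        wi->next = NULL;
        qemu_mutex_unlock(&cpu->work_mutex);

        qemu_cpu_kick(cpu);
    }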

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 15/18] cpu: introduce tlb_flush*_all.
  2015-06-26 16:08         ` Peter Maydell
@ 2015-06-26 16:30           ` Frederic Konrad
  2015-06-26 16:31             ` Paolo Bonzini
  2015-07-06 14:29             ` Mark Burton
  2015-06-26 16:54           ` Paolo Bonzini
  2015-07-08 15:35           ` Frederic Konrad
  2 siblings, 2 replies; 82+ messages in thread
From: Frederic Konrad @ 2015-06-26 16:30 UTC (permalink / raw)
  To: Peter Maydell, Paolo Bonzini
  Cc: mttcg, Alexander Graf, Alexander Spyridakis, Mark Burton,
	QEMU Developers, Alistair Francis, Guillaume Delbergue,
	Alex Bennée

On 26/06/2015 18:08, Peter Maydell wrote:
> On 26 June 2015 at 17:01, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> On 26/06/2015 17:54, Frederic Konrad wrote:
>>> So what happens is:
>>> An ARM instruction wants to clear the TLB of all VCPUs, e.g. the IS
>>> version of TLBIALL.
>>> The VCPU which executes TLBIALL_IS can't flush the TLBs of the other VCPUs.
>>> It will just ask all VCPU threads to exit and to do a tlb_flush, hence the
>>> async_work.
>>>
>>> Maybe the big issue might be memory barrier instructions here, which I
>>> didn't check.
>> Yeah, ISTR that in some cases you have to wait for other CPUs to
>> invalidate the TLB before proceeding.  Maybe it's only when you have a
>> dmb instruction, but it's probably simpler for QEMU to always do it
>> synchronously.
> Yeah, the ARM architectural requirement here is that the TLB
> operation is complete after a DSB instruction executes. (True for
> any TLB op, not just the all-CPUs ones). NB that we also call
> tlb_flush() from target-arm/ code for some things like "we just
> updated a system register"; some of those have "must take effect
> immediately" semantics.
>
> In any case, for generic code we have to also consider the
> semantics of non-ARM guests...
>
> thanks
> -- PMM
Yes, this is not the case as I implemented it.

The rest of the TB will be executed before the tlb_flush work really happens.
The old version did this; it was slow and was a mess (if two VCPUs want to
tlb_flush at the same time plus another tlb_flush_page, it becomes tricky).

I think it's not really terrible if the other VCPU executes some stuff before
doing the tlb_flush. So the solution would just be to cut the TranslationBlock
after instructions which require a tlb_flush?

Thanks,
Fred
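As a concrete illustration of cutting the TB, a sketch only, assuming the
existing DISAS_UPDATE machinery in target-arm/translate.c; the flush helper
name is hypothetical:

    static void gen_tlbiall_is(DisasContext *s)
    {
        gen_helper_tlb_flush_all(cpu_env);   /* hypothetical flush helper */
        /* End the TB after this instruction so the queued flush work runs
         * before any further guest code from this vCPU. */
        s->is_jmp = DISAS_UPDATE;
    }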

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 15/18] cpu: introduce tlb_flush*_all.
  2015-06-26 16:30           ` Frederic Konrad
@ 2015-06-26 16:31             ` Paolo Bonzini
  2015-06-26 16:35               ` Frederic Konrad
  2015-07-06 14:29             ` Mark Burton
  1 sibling, 1 reply; 82+ messages in thread
From: Paolo Bonzini @ 2015-06-26 16:31 UTC (permalink / raw)
  To: Frederic Konrad, Peter Maydell
  Cc: mttcg, Alexander Graf, Alexander Spyridakis, Mark Burton,
	QEMU Developers, Alistair Francis, Guillaume Delbergue,
	Alex Bennée



On 26/06/2015 18:30, Frederic Konrad wrote:
> Yes, this is not the case as I implemented it.
> 
> The rest of the TB will be executed before the tlb_flush work really
> happens. The old version did this; it was slow and was a mess (if two
> VCPUs want to tlb_flush at the same time plus another
> tlb_flush_page, it becomes tricky).

Have you tried implementing the solution based on cpu->halted?

> I think it's not really terrible if the other VCPU executes some
> stuff before doing the tlb_flush. So the solution would just be to
> cut the TranslationBlock after instructions which require a
> tlb_flush?

Yes, this is required too.

Paolo

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 15/18] cpu: introduce tlb_flush*_all.
  2015-06-26 16:31             ` Paolo Bonzini
@ 2015-06-26 16:35               ` Frederic Konrad
  2015-06-26 16:39                 ` Paolo Bonzini
  0 siblings, 1 reply; 82+ messages in thread
From: Frederic Konrad @ 2015-06-26 16:35 UTC (permalink / raw)
  To: Paolo Bonzini, Peter Maydell
  Cc: mttcg, Alexander Graf, Alexander Spyridakis, Mark Burton,
	QEMU Developers, Alistair Francis, Guillaume Delbergue,
	Alex Bennée

On 26/06/2015 18:31, Paolo Bonzini wrote:
>
> On 26/06/2015 18:30, Frederic Konrad wrote:
>> Yes, this is not the case as I implemented it.
>>
>> The rest of the TB will be executed before the tlb_flush work really
>> happens. The old version did this; it was slow and was a mess (if two
>> VCPUs want to tlb_flush at the same time plus another
>> tlb_flush_page, it becomes tricky).
> Have you tried implementing the solution based on cpu->halted?
You mean based on cpu_has_work?

Yes, it was a little painful (e.g. it required the CPU to be halted, but
maybe that's what you were suggesting?)

>> I think it's not really terrible if the other VCPU executes some
>> stuff before doing the tlb_flush. So the solution would just be to
>> cut the TranslationBlock after instructions which require a
>> tlb_flush?
> Yes, this is required too.
>
> Paolo

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 13/18] cpu: introduce async_run_safe_work_on_cpu.
  2015-06-26 16:23       ` Paolo Bonzini
@ 2015-06-26 16:36         ` Frederic Konrad
  0 siblings, 0 replies; 82+ messages in thread
From: Frederic Konrad @ 2015-06-26 16:36 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee

On 26/06/2015 18:23, Paolo Bonzini wrote:
>
> On 26/06/2015 18:09, Frederic Konrad wrote:
>>>>    +void async_run_safe_work_on_cpu(CPUState *cpu, void (*func)(void
>>>> *data),
>>>> +                                void *data)
>>>> +{
>>> Do you need a mutex to protect this data structure?  I would use one
>>> even if not strictly necessary, to avoid introducing new BQL-protected
>>> structures.
>> For the moment it's called by tb_invalidate and tb_flush_safe the second
>> lacks a
>> tb_lock/unlock which should be added. I don't need an other mutex expect
>> if this is
>> used elsewhere?
> In any case, the locking policy should be documented.
>
> At which point you realize that protecting a CPU's
> queued_safe_work_{first,next} fields with the tb_lock is a bit weird. :)
>   I would add a mutex inside CPUState, and then later we could also use
> it for regular run_on_cpu/async_run_on_cpu.
>
> Paolo
Ok that makes sense :).

Fred

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 18/18] translate-all: (wip) use tb_flush_safe when we can't alloc more tb.
  2015-06-26 16:21   ` Paolo Bonzini
@ 2015-06-26 16:38     ` Frederic Konrad
  0 siblings, 0 replies; 82+ messages in thread
From: Frederic Konrad @ 2015-06-26 16:38 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee

On 26/06/2015 18:21, Paolo Bonzini wrote:
>
> On 26/06/2015 16:47, fred.konrad@greensocs.com wrote:
>> @@ -1147,7 +1147,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>>       tb = tb_alloc(pc);
>>       if (!tb) {
>>           /* flush must be done */
>> -        tb_flush(env);
>> +        tb_flush_safe(env);
> Should you just call cpu_loop_exit() here, instead of redoing the
> tb_alloc etc.?
>
> Paolo
Ah yes good point!

Thanks,
Fred

>>           /* cannot fail at this point */
>>           tb = tb_alloc(pc);

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 15/18] cpu: introduce tlb_flush*_all.
  2015-06-26 16:35               ` Frederic Konrad
@ 2015-06-26 16:39                 ` Paolo Bonzini
  0 siblings, 0 replies; 82+ messages in thread
From: Paolo Bonzini @ 2015-06-26 16:39 UTC (permalink / raw)
  To: Frederic Konrad, Peter Maydell
  Cc: mttcg, Alexander Graf, Alexander Spyridakis, Mark Burton,
	QEMU Developers, Alistair Francis, Guillaume Delbergue,
	Alex Bennée



On 26/06/2015 18:35, Frederic Konrad wrote:
>>>
>> Have you tried implementing the solution based on cpu->halted?
> You mean based on cpu_has_work?
> 
> Yes it was a little painfull (eg: it required cpu to be halted.. but
> maybe it's what you were suggesting?)

Yes. :)  No problem, we can discuss it at KVM Forum.

Paolo

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 14/18] add a callback when tb_invalidate is called.
  2015-06-26 16:20   ` Paolo Bonzini
@ 2015-06-26 16:40     ` Frederic Konrad
  0 siblings, 0 replies; 82+ messages in thread
From: Frederic Konrad @ 2015-06-26 16:40 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel, mttcg
  Cc: peter.maydell, a.spyridakis, mark.burton, agraf,
	alistair.francis, guillaume.delbergue, alex.bennee

On 26/06/2015 18:20, Paolo Bonzini wrote:
>
> On 26/06/2015 16:47, fred.konrad@greensocs.com wrote:
>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>
>> Instead of doing the jump cache invalidation directly in tb_invalidate, delay it
>> until after the exit so we don't have another CPU trying to execute the code being
>> invalidated.
>>
>> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
>> ---
>>   translate-all.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>>   1 file changed, 59 insertions(+), 2 deletions(-)
>>
>> diff --git a/translate-all.c b/translate-all.c
>> index ade2269..468648d 100644
>> --- a/translate-all.c
>> +++ b/translate-all.c
>> @@ -61,6 +61,7 @@
>>   #include "translate-all.h"
>>   #include "qemu/bitmap.h"
>>   #include "qemu/timer.h"
>> +#include "sysemu/cpus.h"
>>   
>>   //#define DEBUG_TB_INVALIDATE
>>   //#define DEBUG_FLUSH
>> @@ -966,14 +967,58 @@ static inline void tb_reset_jump(TranslationBlock *tb, int n)
>>       tb_set_jmp_target(tb, n, (uintptr_t)(tb->tc_ptr + tb->tb_next_offset[n]));
>>   }
>>   
>> +struct CPUDiscardTBParams {
>> +    CPUState *cpu;
>> +    TranslationBlock *tb;
>> +};
>> +
>> +static void cpu_discard_tb_from_jmp_cache(void *opaque)
>> +{
>> +    unsigned int h;
>> +    struct CPUDiscardTBParams *params = opaque;
>> +
>> +    h = tb_jmp_cache_hash_func(params->tb->pc);
>> +    if (params->cpu->tb_jmp_cache[h] == params->tb) {
>> +        params->cpu->tb_jmp_cache[h] = NULL;
>> +    }
> It is a bit more tricky, but I think you can avoid async_run_on_cpu by
> doing this:
>
> 1) introduce a QemuSeqLock in TBContext, e.g. invalidate_seqlock.
>
> 2) wrap this "if" with seqlock_write_lock/unlock
>
> 3) in cpu-exec.c do this:
>
>       /* we add the TB in the virtual pc hash table */
> +    idx = seqlock_read_begin(&tcg_ctx.tb_ctx.invalidate_seqlock);
>       cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
> +    if (seqlock_read_retry(&tcg_ctx.tb_ctx.invalidate_seqlock)) {
> +        /* Another CPU invalidated a tb in the meanwhile.  We do not
> +         * know if it's this one, but play it safe and avoid caching
> +         * it.
> +         */
> +        cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = NULL;
> +    }
>
>> +    /* suppress this TB from the two jump lists */
>> +    tb_jmp_remove(tb, 0);
>> +    tb_jmp_remove(tb, 1);
> If you do the above synchronously, this part doesn't need to be deferred
> either.
>
> Then, immediately after the two tb_jmp_remove calls you can also check
> whether "(tb->jmp_first & 3) == 2": if so, the expensive expensive
> async_run_safe_work_on_cpu can be skipped.
>
> Paolo
Ok seems tricky :) I'll take a look at this.

Fred

>> +#endif /* MTTCG */
>>   
>>       tcg_ctx.tb_ctx.tb_phys_invalidate_count++;
>>       tb_unlock();
>>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 15/18] cpu: introduce tlb_flush*_all.
  2015-06-26 16:08         ` Peter Maydell
  2015-06-26 16:30           ` Frederic Konrad
@ 2015-06-26 16:54           ` Paolo Bonzini
  2015-07-08 15:35           ` Frederic Konrad
  2 siblings, 0 replies; 82+ messages in thread
From: Paolo Bonzini @ 2015-06-26 16:54 UTC (permalink / raw)
  To: Peter Maydell
  Cc: mttcg, Alexander Spyridakis, Mark Burton, Alexander Graf,
	QEMU Developers, Guillaume Delbergue, Alistair Francis,
	Alex Bennée, Frederic Konrad



On 26/06/2015 18:08, Peter Maydell wrote:
>> > Yeah, ISTR that in some cases you have to wait for other CPUs to
>> > invalidate the TLB before proceeding.  Maybe it's only when you have a
>> > dmb instruction, but it's probably simpler for QEMU to always do it
>> > synchronously.
> Yeah, the ARM architectural requirement here is that the TLB
> operation is complete after a DSB instruction executes. (True for
> any TLB op, not just the all-CPUs ones). NB that we also call
> tlb_flush() from target-arm/ code for some things like "we just
> updated a system register"; some of those have "must take effect
> immediately" semantics.
> 
> In any case, for generic code we have to also consider the
> semantics of non-ARM guests...

I think it would be okay to make this an ARM-specific thing.  In most
other architectures that I know of, TLB shootdowns are done in software
through IPIs.

Paolo

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 15/18] cpu: introduce tlb_flush*_all.
  2015-06-26 16:30           ` Frederic Konrad
  2015-06-26 16:31             ` Paolo Bonzini
@ 2015-07-06 14:29             ` Mark Burton
  2015-07-07 16:12               ` Alex Bennée
  1 sibling, 1 reply; 82+ messages in thread
From: Mark Burton @ 2015-07-06 14:29 UTC (permalink / raw)
  To: Paolo Bonzini, Alexander Spyridakis, Alex Bennée
  Cc: mttcg, Peter Maydell, QEMU Developers, KONRAD Frédéric

Paolo, Alex, Alexander,

Fred and I talked after the call about ways of avoiding the ‘stop the world’ (or rather ‘sync the world’) approach - we already discussed this on this thread.
One thing that would be very helpful would be some test cases around this. We could then use Fred’s code to check some of the possible solutions out….

I’m not sure if there is wiggle room in Peter’s statement below. Can the TLB operation be completed on one core, but not ‘seen’ by other cores until they hit an exit…..?

Cheers

Mark.


> On 26 Jun 2015, at 18:30, Frederic Konrad <fred.konrad@greensocs.com> wrote:
> 
> On 26/06/2015 18:08, Peter Maydell wrote:
>> On 26 June 2015 at 17:01, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> On 26/06/2015 17:54, Frederic Konrad wrote:
>>>> So what happens is:
>>>> An ARM instruction wants to clear the TLB of all VCPUs, e.g. the IS
>>>> version of TLBIALL.
>>>> The VCPU which executes TLBIALL_IS can't flush the TLBs of the other VCPUs.
>>>> It will just ask all VCPU threads to exit and to do a tlb_flush, hence the
>>>> async_work.
>>>> 
>>>> Maybe the big issue might be memory barrier instructions here, which I
>>>> didn't check.
>>> Yeah, ISTR that in some cases you have to wait for other CPUs to
>>> invalidate the TLB before proceeding.  Maybe it's only when you have a
>>> dmb instruction, but it's probably simpler for QEMU to always do it
>>> synchronously.
>> Yeah, the ARM architectural requirement here is that the TLB
>> operation is complete after a DSB instruction executes. (True for
>> any TLB op, not just the all-CPUs ones). NB that we also call
>> tlb_flush() from target-arm/ code for some things like "we just
>> updated a system register"; some of those have "must take effect
>> immediately" semantics.
>> 
>> In any case, for generic code we have to also consider the
>> semantics of non-ARM guests...
>> 
>> thanks
>> -- PMM
> Yes, this is not the case as I implemented it.
> 
> The rest of the TB will be executed before the tlb_flush work really happens.
> The old version did this; it was slow and was a mess (if two VCPUs want to tlb_flush
> at the same time plus another tlb_flush_page, it becomes tricky).
> 
> I think it's not really terrible if the other VCPU executes some stuff before doing the
> tlb_flush. So the solution would just be to cut the TranslationBlock after instructions
> which require a tlb_flush?
> 
> Thanks,
> Fred
> 

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 01/18] cpu: make cpu_thread_is_idle public.
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 01/18] cpu: make cpu_thread_is_idle public fred.konrad
@ 2015-07-07  9:47   ` Alex Bennée
  2015-07-07 11:43     ` Frederic Konrad
  0 siblings, 1 reply; 82+ messages in thread
From: Alex Bennée @ 2015-07-07  9:47 UTC (permalink / raw)
  To: fred.konrad
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, agraf,
	qemu-devel, guillaume.delbergue, pbonzini, alistair.francis


fred.konrad@greensocs.com writes:

> From: KONRAD Frederic <fred.konrad@greensocs.com>

Why are we making this visible? Looking at the tree I can't see it being
used outside cpus.c. I see the function is modified later for async
work. Is this something we are planning to use later?

>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> ---
>  cpus.c            |  2 +-
>  include/qom/cpu.h | 11 +++++++++++
>  2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/cpus.c b/cpus.c
> index 4f0e54d..2d62a35 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -74,7 +74,7 @@ bool cpu_is_stopped(CPUState *cpu)
>      return cpu->stopped || !runstate_is_running();
>  }
>  
> -static bool cpu_thread_is_idle(CPUState *cpu)
> +bool cpu_thread_is_idle(CPUState *cpu)
>  {
>      if (cpu->stop || cpu->queued_work_first) {
>          return false;
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 39f0f19..af3c9e4 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -514,6 +514,17 @@ void qemu_cpu_kick(CPUState *cpu);
>  bool cpu_is_stopped(CPUState *cpu);
>  
>  /**
> + * cpu_thread_is_idle:
> + * @cpu: The CPU to check.
> + *
> + * Checks whether the CPU thread is idle.
> + *
> + * Returns: %true if the thread is idle;
> + * %false otherwise.
> + */
> +bool cpu_thread_is_idle(CPUState *cpu);
> +
> +/**
>   * run_on_cpu:
>   * @cpu: The vCPU to run on.
>   * @func: The function to be executed.

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 02/18] replace spinlock by QemuMutex.
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 02/18] replace spinlock by QemuMutex fred.konrad
@ 2015-07-07 10:15   ` Alex Bennée
  2015-07-07 10:22     ` Paolo Bonzini
  2015-07-07 11:46     ` Frederic Konrad
  0 siblings, 2 replies; 82+ messages in thread
From: Alex Bennée @ 2015-07-07 10:15 UTC (permalink / raw)
  To: fred.konrad
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, agraf,
	qemu-devel, guillaume.delbergue, pbonzini, alistair.francis


fred.konrad@greensocs.com writes:

> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> spinlock is only used in two cases:
>   * cpu-exec.c: to protect TranslationBlock
>   * mem_helper.c: for lock helper in target-i386 (which seems broken).
>
> It's a pthread_mutex_t in user mode, so it's better to use QemuMutex directly
> in this case.
> It also allows reusing the tb_lock mutex of TBContext in case of multithreaded
> TCG.
>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> ---
>  cpu-exec.c               | 15 +++++++++++----
>  include/exec/exec-all.h  |  4 ++--
>  linux-user/main.c        |  6 +++---
>  target-i386/mem_helper.c | 16 +++++++++++++---
>  tcg/i386/tcg-target.c    |  8 ++++++++
>  5 files changed, 37 insertions(+), 12 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 2ffeb6e..d6336d9 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -362,7 +362,9 @@ int cpu_exec(CPUArchState *env)
>      SyncClocks sc;
>  
>      /* This must be volatile so it is not trashed by longjmp() */
> +#if defined(CONFIG_USER_ONLY)
>      volatile bool have_tb_lock = false;
> +#endif
>  
>      if (cpu->halted) {
>          if (!cpu_has_work(cpu)) {
> @@ -480,8 +482,10 @@ int cpu_exec(CPUArchState *env)
>                      cpu->exception_index = EXCP_INTERRUPT;
>                      cpu_loop_exit(cpu);
>                  }
> -                spin_lock(&tcg_ctx.tb_ctx.tb_lock);
> +#if defined(CONFIG_USER_ONLY)
> +                qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
>                  have_tb_lock = true;
> +#endif

Why are the locking rules different for CONFIG_USER versus system
emulation? Looking at the final tree:

>                  tb = tb_find_fast(env);

this eventually ends up doing a tb_lock on the find_slow path, which IIRC
is when we might end up doing the actual code generation.

>                  /* Note: we do it here to avoid a gcc bug on Mac OS X when
>                     doing it in tb_find_slow */
> @@ -503,9 +507,10 @@ int cpu_exec(CPUArchState *env)
>                      tb_add_jump((TranslationBlock *)(next_tb & ~TB_EXIT_MASK),
>                                  next_tb & TB_EXIT_MASK, tb);
>                  }
> +#if defined(CONFIG_USER_ONLY)
>                  have_tb_lock = false;
> -                spin_unlock(&tcg_ctx.tb_ctx.tb_lock);
> -
> +                qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
> +#endif
>                  /* cpu_interrupt might be called while translating the
>                     TB, but before it is linked into a potentially
>                     infinite loop and becomes env->current_tb. Avoid
> @@ -572,10 +577,12 @@ int cpu_exec(CPUArchState *env)
>  #ifdef TARGET_I386
>              x86_cpu = X86_CPU(cpu);
>  #endif
> +#if defined(CONFIG_USER_ONLY)
>              if (have_tb_lock) {
> -                spin_unlock(&tcg_ctx.tb_ctx.tb_lock);
> +                qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
>                  have_tb_lock = false;
>              }
> +#endif
>          }
>      } /* for(;;) */
>  
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 2573e8c..44f3336 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -176,7 +176,7 @@ struct TranslationBlock {
>      struct TranslationBlock *jmp_first;
>  };
>  
> -#include "exec/spinlock.h"
> +#include "qemu/thread.h"
>  
>  typedef struct TBContext TBContext;
>  
> @@ -186,7 +186,7 @@ struct TBContext {
>      TranslationBlock *tb_phys_hash[CODE_GEN_PHYS_HASH_SIZE];
>      int nb_tbs;
>      /* any access to the tbs or the page table must use this lock */
> -    spinlock_t tb_lock;
> +    QemuMutex tb_lock;
>  
>      /* statistics */
>      int tb_flush_count;
> diff --git a/linux-user/main.c b/linux-user/main.c
> index c855bcc..bce3a98 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -107,7 +107,7 @@ static int pending_cpus;
>  /* Make sure everything is in a consistent state for calling fork().  */
>  void fork_start(void)
>  {
> -    pthread_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
> +    qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
>      pthread_mutex_lock(&exclusive_lock);
>      mmap_fork_start();
>  }
> @@ -129,11 +129,11 @@ void fork_end(int child)
>          pthread_mutex_init(&cpu_list_mutex, NULL);
>          pthread_cond_init(&exclusive_cond, NULL);
>          pthread_cond_init(&exclusive_resume, NULL);
> -        pthread_mutex_init(&tcg_ctx.tb_ctx.tb_lock, NULL);
> +        qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
>          gdbserver_fork((CPUArchState *)thread_cpu->env_ptr);
>      } else {
>          pthread_mutex_unlock(&exclusive_lock);
> -        pthread_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
> +        qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
>      }
>  }
>  
> diff --git a/target-i386/mem_helper.c b/target-i386/mem_helper.c
> index 1aec8a5..7106cc3 100644
> --- a/target-i386/mem_helper.c
> +++ b/target-i386/mem_helper.c
> @@ -23,17 +23,27 @@
>  
>  /* broken thread support */
>  
> -static spinlock_t global_cpu_lock = SPIN_LOCK_UNLOCKED;
> +#if defined(CONFIG_USER_ONLY)
> +QemuMutex global_cpu_lock;
>  
>  void helper_lock(void)
>  {
> -    spin_lock(&global_cpu_lock);
> +    qemu_mutex_lock(&global_cpu_lock);
>  }
>  
>  void helper_unlock(void)
>  {
> -    spin_unlock(&global_cpu_lock);
> +    qemu_mutex_unlock(&global_cpu_lock);
>  }
> +#else
> +void helper_lock(void)
> +{
> +}
> +
> +void helper_unlock(void)
> +{
> +}
> +#endif
>  
>  void helper_cmpxchg8b(CPUX86State *env, target_ulong a0)
>  {
> diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
> index ff4d9cf..0d7c99c 100644
> --- a/tcg/i386/tcg-target.c
> +++ b/tcg/i386/tcg-target.c
> @@ -24,6 +24,10 @@
>  
>  #include "tcg-be-ldst.h"
>  
> +#if defined(CONFIG_USER_ONLY)
> +extern QemuMutex global_cpu_lock;
> +#endif
> +
>  #ifndef NDEBUG
>  static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
>  #if TCG_TARGET_REG_BITS == 64
> @@ -2342,6 +2346,10 @@ static void tcg_target_init(TCGContext *s)
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
>  
>      tcg_add_target_add_op_defs(x86_op_defs);
> +
> +#if defined(CONFIG_USER_ONLY)
> +    qemu_mutex_init(global_cpu_lock);
> +#endif
>  }
>  
>  typedef struct {

I wonder if it would be better splitting the patches:

 - Convert tb spinlocks to use tb_lock
 - i386: convert lock helpers to QemuMutex

before the final

  - Remove spinlocks

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 02/18] replace spinlock by QemuMutex.
  2015-07-07 10:15   ` Alex Bennée
@ 2015-07-07 10:22     ` Paolo Bonzini
  2015-07-07 11:48       ` Frederic Konrad
  2015-07-07 11:46     ` Frederic Konrad
  1 sibling, 1 reply; 82+ messages in thread
From: Paolo Bonzini @ 2015-07-07 10:22 UTC (permalink / raw)
  To: Alex Bennée, fred.konrad
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, qemu-devel,
	alistair.francis, agraf, guillaume.delbergue



On 07/07/2015 12:15, Alex Bennée wrote:
> Why are the locking rules different for CONFIG_USER versus system
> emulation? Looking at the final tree:
> 
>> >                  tb = tb_find_fast(env);
> this eventually ends up doing a tb_lock on the find_slow path, which IIRC
> is when we might end up doing the actual code generation.
> 

Up to this point, system emulation is using the BQL for everything.  I
guess things change later.

Paolo

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 01/18] cpu: make cpu_thread_is_idle public.
  2015-07-07  9:47   ` Alex Bennée
@ 2015-07-07 11:43     ` Frederic Konrad
  0 siblings, 0 replies; 82+ messages in thread
From: Frederic Konrad @ 2015-07-07 11:43 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, agraf,
	qemu-devel, guillaume.delbergue, pbonzini, alistair.francis

On 07/07/2015 11:47, Alex Bennée wrote:
> fred.konrad@greensocs.com writes:
>
>> From: KONRAD Frederic <fred.konrad@greensocs.com>
> Why are we making this visible? Looking at the tree I can't see it being
> used outside the cpus.c. I see the function is modified later for async
> work. Is this something we are planing to use later?

Thanks for spotting this.
It is probably something we used before I introduced async_safe_work.

Fred
>
>> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
>> ---
>>   cpus.c            |  2 +-
>>   include/qom/cpu.h | 11 +++++++++++
>>   2 files changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/cpus.c b/cpus.c
>> index 4f0e54d..2d62a35 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -74,7 +74,7 @@ bool cpu_is_stopped(CPUState *cpu)
>>       return cpu->stopped || !runstate_is_running();
>>   }
>>   
>> -static bool cpu_thread_is_idle(CPUState *cpu)
>> +bool cpu_thread_is_idle(CPUState *cpu)
>>   {
>>       if (cpu->stop || cpu->queued_work_first) {
>>           return false;
>> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
>> index 39f0f19..af3c9e4 100644
>> --- a/include/qom/cpu.h
>> +++ b/include/qom/cpu.h
>> @@ -514,6 +514,17 @@ void qemu_cpu_kick(CPUState *cpu);
>>   bool cpu_is_stopped(CPUState *cpu);
>>   
>>   /**
>> + * cpu_thread_is_idle:
>> + * @cpu: The CPU to check.
>> + *
>> + * Checks whether the CPU thread is idle.
>> + *
>> + * Returns: %true if the thread is idle;
>> + * %false otherwise.
>> + */
>> +bool cpu_thread_is_idle(CPUState *cpu);
>> +
>> +/**
>>    * run_on_cpu:
>>    * @cpu: The vCPU to run on.
>>    * @func: The function to be executed.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 02/18] replace spinlock by QemuMutex.
  2015-07-07 10:15   ` Alex Bennée
  2015-07-07 10:22     ` Paolo Bonzini
@ 2015-07-07 11:46     ` Frederic Konrad
  1 sibling, 0 replies; 82+ messages in thread
From: Frederic Konrad @ 2015-07-07 11:46 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, agraf,
	qemu-devel, guillaume.delbergue, pbonzini, alistair.francis

On 07/07/2015 12:15, Alex Bennée wrote:
> fred.konrad@greensocs.com writes:
>
>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>
>> spinlock is only used in two cases:
>>    * cpu-exec.c: to protect TranslationBlock
>>    * mem_helper.c: for lock helper in target-i386 (which seems broken).
>>
>> It's a pthread_mutex_t in user mode, so it's better to use QemuMutex directly
>> in this case.
>> It also allows reusing the tb_lock mutex of TBContext in case of multithreaded
>> TCG.
>>
>> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
>> ---
>>   cpu-exec.c               | 15 +++++++++++----
>>   include/exec/exec-all.h  |  4 ++--
>>   linux-user/main.c        |  6 +++---
>>   target-i386/mem_helper.c | 16 +++++++++++++---
>>   tcg/i386/tcg-target.c    |  8 ++++++++
>>   5 files changed, 37 insertions(+), 12 deletions(-)
>>
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index 2ffeb6e..d6336d9 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -362,7 +362,9 @@ int cpu_exec(CPUArchState *env)
>>       SyncClocks sc;
>>   
>>       /* This must be volatile so it is not trashed by longjmp() */
>> +#if defined(CONFIG_USER_ONLY)
>>       volatile bool have_tb_lock = false;
>> +#endif
>>   
>>       if (cpu->halted) {
>>           if (!cpu_has_work(cpu)) {
>> @@ -480,8 +482,10 @@ int cpu_exec(CPUArchState *env)
>>                       cpu->exception_index = EXCP_INTERRUPT;
>>                       cpu_loop_exit(cpu);
>>                   }
>> -                spin_lock(&tcg_ctx.tb_ctx.tb_lock);
>> +#if defined(CONFIG_USER_ONLY)
>> +                qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
>>                   have_tb_lock = true;
>> +#endif
> Why are the locking rules different for CONFIG_USER versus system
> emulation? Looking at the final tree:
>
>>                   tb = tb_find_fast(env);
> this eventually ends up doing a tb_lock on the find_slow path, which IIRC
> is when we might end up doing the actual code generation.

I didn't look at the user code. But yes, we should probably end up with the
same thing for both user-mode and system-mode code. That's what Peter was
suggesting before, but I haven't had time to look at this yet.

>
>>                   /* Note: we do it here to avoid a gcc bug on Mac OS X when
>>                      doing it in tb_find_slow */
>> @@ -503,9 +507,10 @@ int cpu_exec(CPUArchState *env)
>>                       tb_add_jump((TranslationBlock *)(next_tb & ~TB_EXIT_MASK),
>>                                   next_tb & TB_EXIT_MASK, tb);
>>                   }
>> +#if defined(CONFIG_USER_ONLY)
>>                   have_tb_lock = false;
>> -                spin_unlock(&tcg_ctx.tb_ctx.tb_lock);
>> -
>> +                qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
>> +#endif
>>                   /* cpu_interrupt might be called while translating the
>>                      TB, but before it is linked into a potentially
>>                      infinite loop and becomes env->current_tb. Avoid
>> @@ -572,10 +577,12 @@ int cpu_exec(CPUArchState *env)
>>   #ifdef TARGET_I386
>>               x86_cpu = X86_CPU(cpu);
>>   #endif
>> +#if defined(CONFIG_USER_ONLY)
>>               if (have_tb_lock) {
>> -                spin_unlock(&tcg_ctx.tb_ctx.tb_lock);
>> +                qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
>>                   have_tb_lock = false;
>>               }
>> +#endif
>>           }
>>       } /* for(;;) */
>>   
>> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
>> index 2573e8c..44f3336 100644
>> --- a/include/exec/exec-all.h
>> +++ b/include/exec/exec-all.h
>> @@ -176,7 +176,7 @@ struct TranslationBlock {
>>       struct TranslationBlock *jmp_first;
>>   };
>>   
>> -#include "exec/spinlock.h"
>> +#include "qemu/thread.h"
>>   
>>   typedef struct TBContext TBContext;
>>   
>> @@ -186,7 +186,7 @@ struct TBContext {
>>       TranslationBlock *tb_phys_hash[CODE_GEN_PHYS_HASH_SIZE];
>>       int nb_tbs;
>>       /* any access to the tbs or the page table must use this lock */
>> -    spinlock_t tb_lock;
>> +    QemuMutex tb_lock;
>>   
>>       /* statistics */
>>       int tb_flush_count;
>> diff --git a/linux-user/main.c b/linux-user/main.c
>> index c855bcc..bce3a98 100644
>> --- a/linux-user/main.c
>> +++ b/linux-user/main.c
>> @@ -107,7 +107,7 @@ static int pending_cpus;
>>   /* Make sure everything is in a consistent state for calling fork().  */
>>   void fork_start(void)
>>   {
>> -    pthread_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
>> +    qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
>>       pthread_mutex_lock(&exclusive_lock);
>>       mmap_fork_start();
>>   }
>> @@ -129,11 +129,11 @@ void fork_end(int child)
>>           pthread_mutex_init(&cpu_list_mutex, NULL);
>>           pthread_cond_init(&exclusive_cond, NULL);
>>           pthread_cond_init(&exclusive_resume, NULL);
>> -        pthread_mutex_init(&tcg_ctx.tb_ctx.tb_lock, NULL);
>> +        qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
>>           gdbserver_fork((CPUArchState *)thread_cpu->env_ptr);
>>       } else {
>>           pthread_mutex_unlock(&exclusive_lock);
>> -        pthread_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
>> +        qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
>>       }
>>   }
>>   
>> diff --git a/target-i386/mem_helper.c b/target-i386/mem_helper.c
>> index 1aec8a5..7106cc3 100644
>> --- a/target-i386/mem_helper.c
>> +++ b/target-i386/mem_helper.c
>> @@ -23,17 +23,27 @@
>>   
>>   /* broken thread support */
>>   
>> -static spinlock_t global_cpu_lock = SPIN_LOCK_UNLOCKED;
>> +#if defined(CONFIG_USER_ONLY)
>> +QemuMutex global_cpu_lock;
>>   
>>   void helper_lock(void)
>>   {
>> -    spin_lock(&global_cpu_lock);
>> +    qemu_mutex_lock(&global_cpu_lock);
>>   }
>>   
>>   void helper_unlock(void)
>>   {
>> -    spin_unlock(&global_cpu_lock);
>> +    qemu_mutex_unlock(&global_cpu_lock);
>>   }
>> +#else
>> +void helper_lock(void)
>> +{
>> +}
>> +
>> +void helper_unlock(void)
>> +{
>> +}
>> +#endif
>>   
>>   void helper_cmpxchg8b(CPUX86State *env, target_ulong a0)
>>   {
>> diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
>> index ff4d9cf..0d7c99c 100644
>> --- a/tcg/i386/tcg-target.c
>> +++ b/tcg/i386/tcg-target.c
>> @@ -24,6 +24,10 @@
>>   
>>   #include "tcg-be-ldst.h"
>>   
>> +#if defined(CONFIG_USER_ONLY)
>> +extern QemuMutex global_cpu_lock;
>> +#endif
>> +
>>   #ifndef NDEBUG
>>   static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
>>   #if TCG_TARGET_REG_BITS == 64
>> @@ -2342,6 +2346,10 @@ static void tcg_target_init(TCGContext *s)
>>       tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
>>   
>>       tcg_add_target_add_op_defs(x86_op_defs);
>> +
>> +#if defined(CONFIG_USER_ONLY)
>> +    qemu_mutex_init(global_cpu_lock);
>> +#endif
>>   }
>>   
>>   typedef struct {
> I wonder if it would be better splitting the patches:
>
>   - Convert tb spinlocks to use tb_lock
>   - i386: convert lock helpers to QemuMutex
>
> before the final
>
>    - Remove spinlocks

Yes that makes sense I think.

Fred
>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 02/18] replace spinlock by QemuMutex.
  2015-07-07 10:22     ` Paolo Bonzini
@ 2015-07-07 11:48       ` Frederic Konrad
  2015-07-07 12:34         ` Paolo Bonzini
  0 siblings, 1 reply; 82+ messages in thread
From: Frederic Konrad @ 2015-07-07 11:48 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, qemu-devel,
	alistair.francis, agraf, guillaume.delbergue

On 07/07/2015 12:22, Paolo Bonzini wrote:
>
> On 07/07/2015 12:15, Alex Bennée wrote:
>> Why are the locking rules different for CONFIG_USER versus system
>> emulation? Looking at the final tree:
>>
>>>>                   tb = tb_find_fast(env);
>> this eventually ends up doing a tb_lock on the find_slow path, which IIRC
>> is when we might end up doing the actual code generation.
>>
> Up to this point, system emulation is using the BQL for everything.  I
> guess things change later.
>
> Paolo
Actually we use tb_lock to protect all the tb-related structures such as
TBContext etc. Is it better to use the global lock for this?

Fred

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 05/18] protect TBContext with tb_lock.
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 05/18] protect TBContext with tb_lock fred.konrad
  2015-06-26 14:56   ` Paolo Bonzini
  2015-06-26 16:20   ` Paolo Bonzini
@ 2015-07-07 12:22   ` Alex Bennée
  2015-07-07 13:16     ` Frederic Konrad
  2 siblings, 1 reply; 82+ messages in thread
From: Alex Bennée @ 2015-07-07 12:22 UTC (permalink / raw)
  To: fred.konrad
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, agraf,
	qemu-devel, guillaume.delbergue, pbonzini, alistair.francis


fred.konrad@greensocs.com writes:

> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> This protects TBContext with tb_lock to make tb_* thread safe.
>
> We can still have issues with tb_flush in case of multithreaded TCG:
>   another CPU can be executing code during a flush.
>
> This can be fixed later by making all other TCG threads exit before calling
> tb_flush().
>
> tb_find_slow is separated into tb_find_slow and tb_find_physical, as the
> whole of tb_find_slow doesn't require holding the tb lock.
>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>

So my comments from earlier about the different locking between
CONFIG_USER and system emulation still stand. Ultimately we need a good
reason (or an abstraction) before sprinkling #ifdefs in the code if only
for ease of reading.

>
> Changes:
> V1 -> V2:
>   * Drop a tb_lock arround tb_find_fast in cpu-exec.c.
> ---
>  cpu-exec.c             |  60 ++++++++++++++--------
>  target-arm/translate.c |   5 ++
>  tcg/tcg.h              |   7 +++
>  translate-all.c        | 137 ++++++++++++++++++++++++++++++++++++++-----------
>  4 files changed, 158 insertions(+), 51 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index d6336d9..5d9b518 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -130,6 +130,8 @@ static void init_delay_params(SyncClocks *sc, const CPUState *cpu)
>  void cpu_loop_exit(CPUState *cpu)
>  {
>      cpu->current_tb = NULL;
> +    /* Release those mutex before long jump so other thread can work. */
> +    tb_lock_reset();
>      siglongjmp(cpu->jmp_env, 1);
>  }
>  
> @@ -142,6 +144,8 @@ void cpu_resume_from_signal(CPUState *cpu, void *puc)
>      /* XXX: restore cpu registers saved in host registers */
>  
>      cpu->exception_index = -1;
> +    /* Release those mutex before long jump so other thread can work. */
> +    tb_lock_reset();
>      siglongjmp(cpu->jmp_env, 1);
>  }
>  
> @@ -253,12 +257,9 @@ static void cpu_exec_nocache(CPUArchState *env, int max_cycles,
>      tb_free(tb);
>  }
>  
> -static TranslationBlock *tb_find_slow(CPUArchState *env,
> -                                      target_ulong pc,
> -                                      target_ulong cs_base,
> -                                      uint64_t flags)
> +static TranslationBlock *tb_find_physical(CPUArchState *env, target_ulong pc,
> +                                          target_ulong cs_base, uint64_t flags)
>  {

As Paolo has already mentioned, functions that expect to be called with
locks held should carry comments saying so.

> -    CPUState *cpu = ENV_GET_CPU(env);
>      TranslationBlock *tb, **ptb1;
>      unsigned int h;
>      tb_page_addr_t phys_pc, phys_page1;
> @@ -273,8 +274,9 @@ static TranslationBlock *tb_find_slow(CPUArchState *env,
>      ptb1 = &tcg_ctx.tb_ctx.tb_phys_hash[h];
>      for(;;) {
>          tb = *ptb1;
> -        if (!tb)
> -            goto not_found;
> +        if (!tb) {
> +            return tb;
> +        }
>          if (tb->pc == pc &&
>              tb->page_addr[0] == phys_page1 &&
>              tb->cs_base == cs_base &&
> @@ -282,28 +284,43 @@ static TranslationBlock *tb_find_slow(CPUArchState *env,
>              /* check next page if needed */
>              if (tb->page_addr[1] != -1) {
>                  tb_page_addr_t phys_page2;
> -
>                  virt_page2 = (pc & TARGET_PAGE_MASK) +
>                      TARGET_PAGE_SIZE;
>                  phys_page2 = get_page_addr_code(env, virt_page2);
> -                if (tb->page_addr[1] == phys_page2)
> -                    goto found;
> +                if (tb->page_addr[1] == phys_page2) {
> +                    return tb;
> +                }
>              } else {
> -                goto found;
> +                return tb;
>              }
>          }
>          ptb1 = &tb->phys_hash_next;
>      }
> - not_found:
> -   /* if no translated code available, then translate it now */
> -    tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
> -
> - found:
> -    /* Move the last found TB to the head of the list */
> -    if (likely(*ptb1)) {
> -        *ptb1 = tb->phys_hash_next;
> -        tb->phys_hash_next = tcg_ctx.tb_ctx.tb_phys_hash[h];
> -        tcg_ctx.tb_ctx.tb_phys_hash[h] = tb;
> +    return tb;
> +}
> +
> +static TranslationBlock *tb_find_slow(CPUArchState *env, target_ulong pc,
> +                                      target_ulong cs_base, uint64_t flags)
> +{
> +    /*
> +     * First try to get the tb; if we don't find it, we need to take the
> +     * lock and compile it.
> +     */
> +    CPUState *cpu = ENV_GET_CPU(env);
> +    TranslationBlock *tb;
> +
> +    tb = tb_find_physical(env, pc, cs_base, flags);
> +    if (!tb) {
> +        tb_lock();
> +        /*
> +         * Retry to get the TB in case another CPU just translated it, to
> +         * avoid having duplicated TBs in the pool.
> +         */
> +        tb = tb_find_physical(env, pc, cs_base, flags);
> +        if (!tb) {
> +            tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
> +        }
> +        tb_unlock();
>      }
>      /* we add the TB in the virtual pc hash table */
>      cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
> @@ -326,6 +343,7 @@ static inline TranslationBlock *tb_find_fast(CPUArchState *env)
>                   tb->flags != flags)) {
>          tb = tb_find_slow(env, pc, cs_base, flags);
>      }
> +
>      return tb;
>  }
>  
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index 971b6db..47345aa 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -11162,6 +11162,8 @@ static inline void gen_intermediate_code_internal(ARMCPU *cpu,
>  
>      dc->tb = tb;
>  
> +    tb_lock();
> +
>      dc->is_jmp = DISAS_NEXT;
>      dc->pc = pc_start;
>      dc->singlestep_enabled = cs->singlestep_enabled;
> @@ -11499,6 +11501,7 @@ done_generating:
>          tb->size = dc->pc - pc_start;
>          tb->icount = num_insns;
>      }
> +    tb_unlock();
>  }
>  
>  void gen_intermediate_code(CPUARMState *env, TranslationBlock *tb)
> @@ -11567,6 +11570,7 @@ void arm_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
>  
>  void restore_state_to_opc(CPUARMState *env, TranslationBlock *tb, int pc_pos)
>  {
> +    tb_lock();
>      if (is_a64(env)) {
>          env->pc = tcg_ctx.gen_opc_pc[pc_pos];
>          env->condexec_bits = 0;
> @@ -11574,4 +11578,5 @@ void restore_state_to_opc(CPUARMState *env, TranslationBlock *tb, int pc_pos)
>          env->regs[15] = tcg_ctx.gen_opc_pc[pc_pos];
>          env->condexec_bits = gen_opc_condexec_bits[pc_pos];
>      }
> +    tb_unlock();
>  }
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 41e4869..032fe10 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -592,17 +592,24 @@ void *tcg_malloc_internal(TCGContext *s, int size);
>  void tcg_pool_reset(TCGContext *s);
>  void tcg_pool_delete(TCGContext *s);
>  
> +void tb_lock(void);
> +void tb_unlock(void);
> +void tb_lock_reset(void);
> +
>  static inline void *tcg_malloc(int size)
>  {
>      TCGContext *s = &tcg_ctx;
>      uint8_t *ptr, *ptr_end;
> +    tb_lock();
>      size = (size + sizeof(long) - 1) & ~(sizeof(long) - 1);
>      ptr = s->pool_cur;
>      ptr_end = ptr + size;
>      if (unlikely(ptr_end > s->pool_end)) {
> +        tb_unlock();
>          return tcg_malloc_internal(&tcg_ctx, size);

If the purpose of the lock is to protect the global tcg_ctx then we
shouldn't be unlocking before calling the _internal function which also
messes with the context. 
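Something like this would keep the context protected end to end (a sketch
only, leaving tcg_malloc_internal() unchanged):

    static inline void *tcg_malloc(int size)
    {
        TCGContext *s = &tcg_ctx;
        uint8_t *ptr, *ptr_end;
        void *ret;

        tb_lock();
        size = (size + sizeof(long) - 1) & ~(sizeof(long) - 1);
        ptr = s->pool_cur;
        ptr_end = ptr + size;
        if (unlikely(ptr_end > s->pool_end)) {
            ret = tcg_malloc_internal(&tcg_ctx, size); /* still under tb_lock */
        } else {
            s->pool_cur = ptr_end;
            ret = ptr;
        }
        tb_unlock();
        return ret;
    }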

>      } else {
>          s->pool_cur = ptr_end;
> +        tb_unlock();
>          return ptr;
>      }
>  }
> diff --git a/translate-all.c b/translate-all.c
> index b6b0e1c..c25b79b 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -127,6 +127,34 @@ static void *l1_map[V_L1_SIZE];
>  /* code generation context */
>  TCGContext tcg_ctx;
>  
> +/* translation block context */
> +__thread volatile int have_tb_lock;
> +
> +void tb_lock(void)
> +{
> +    if (!have_tb_lock) {
> +        qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
> +    }
> +    have_tb_lock++;
> +}
> +
> +void tb_unlock(void)
> +{
> +    assert(have_tb_lock > 0);
> +    have_tb_lock--;
> +    if (!have_tb_lock) {
> +        qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
> +    }
> +}
> +
> +void tb_lock_reset(void)
> +{
> +    if (have_tb_lock) {
> +        qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
> +    }
> +    have_tb_lock = 0;
> +}
> +
>  static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
>                           tb_page_addr_t phys_page2);
>  static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
> @@ -215,6 +243,7 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
>  #ifdef CONFIG_PROFILER
>      ti = profile_getclock();
>  #endif
> +    tb_lock();
>      tcg_func_start(s);
>  
>      gen_intermediate_code_pc(env, tb);
> @@ -228,8 +257,10 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
>  
>      /* find opc index corresponding to search_pc */
>      tc_ptr = (uintptr_t)tb->tc_ptr;
> -    if (searched_pc < tc_ptr)
> +    if (searched_pc < tc_ptr) {
> +        tb_unlock();
>          return -1;
> +    }
>  
>      s->tb_next_offset = tb->tb_next_offset;
>  #ifdef USE_DIRECT_JUMP
> @@ -241,8 +272,10 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
>  #endif
>      j = tcg_gen_code_search_pc(s, (tcg_insn_unit *)tc_ptr,
>                                 searched_pc - tc_ptr);
> -    if (j < 0)
> +    if (j < 0) {
> +        tb_unlock();
>          return -1;
> +    }
>      /* now find start of instruction before */
>      while (s->gen_opc_instr_start[j] == 0) {
>          j--;
> @@ -255,6 +288,8 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
>      s->restore_time += profile_getclock() - ti;
>      s->restore_count++;
>  #endif
> +
> +    tb_unlock();
>      return 0;
>  }
>  
> @@ -672,6 +707,7 @@ static inline void code_gen_alloc(size_t tb_size)
>              CODE_GEN_AVG_BLOCK_SIZE;
>      tcg_ctx.tb_ctx.tbs =
>              g_malloc(tcg_ctx.code_gen_max_blocks * sizeof(TranslationBlock));
> +    qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
>  }
>  
>  /* Must be called before using the QEMU cpus. 'tb_size' is the size
> @@ -696,16 +732,22 @@ bool tcg_enabled(void)
>      return tcg_ctx.code_gen_buffer != NULL;
>  }
>  
> -/* Allocate a new translation block. Flush the translation buffer if
> -   too many translation blocks or too much generated code. */
> +/*
> + * Allocate a new translation block. Flush the translation buffer if
> + * too many translation blocks or too much generated code.
> + * tb_alloc is not thread safe but tb_gen_code is protected by a mutex so this
> + * function is called only by one thread.

maybe: "..is not thread safe but tb_gen_code is protected by tb_lock so
only one thread calls it at a time."?

> + */
>  static TranslationBlock *tb_alloc(target_ulong pc)
>  {
> -    TranslationBlock *tb;
> +    TranslationBlock *tb = NULL;
>  
>      if (tcg_ctx.tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks ||
>          (tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer) >=
>           tcg_ctx.code_gen_buffer_max_size) {
> -        return NULL;
> +        tb = &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs++];
> +        tb->pc = pc;
> +        tb->cflags = 0;
>      }
>      tb = &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs++];
>      tb->pc = pc;

That looks weird.

If the condition hits we take &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs++],
then fall through and take &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs++]
again, bumping nb_tbs twice?

Also that renders the setting of tb = NULL pointless, as it will always be
from the array?
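
Presumably the body should just have stayed as it is upstream, i.e.
something like (sketch):

    static TranslationBlock *tb_alloc(target_ulong pc)
    {
        TranslationBlock *tb;

        if (tcg_ctx.tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks ||
            (tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer) >=
             tcg_ctx.code_gen_buffer_max_size) {
            return NULL;
        }
        tb = &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs++];
        tb->pc = pc;
        tb->cflags = 0;
        return tb;
    }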

> @@ -718,11 +760,16 @@ void tb_free(TranslationBlock *tb)
>      /* In practice this is mostly used for single use temporary TB
>         Ignore the hard cases and just back up if this TB happens to
>         be the last one generated.  */
> +
> +    tb_lock();
> +
>      if (tcg_ctx.tb_ctx.nb_tbs > 0 &&
>              tb == &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs - 1]) {
>          tcg_ctx.code_gen_ptr = tb->tc_ptr;
>          tcg_ctx.tb_ctx.nb_tbs--;
>      }
> +
> +    tb_unlock();
>  }
>  
>  static inline void invalidate_page_bitmap(PageDesc *p)
> @@ -773,6 +820,8 @@ void tb_flush(CPUArchState *env1)
>  {
>      CPUState *cpu = ENV_GET_CPU(env1);
>  
> +    tb_lock();
> +
>  #if defined(DEBUG_FLUSH)
>      printf("qemu: flush code_size=%ld nb_tbs=%d avg_tb_size=%ld\n",
>             (unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer),
> @@ -797,6 +846,8 @@ void tb_flush(CPUArchState *env1)
>      /* XXX: flush processor icache at this point if cache flush is
>         expensive */
>      tcg_ctx.tb_ctx.tb_flush_count++;
> +
> +    tb_unlock();
>  }
>  
>  #ifdef DEBUG_TB_CHECK
> @@ -806,6 +857,8 @@ static void tb_invalidate_check(target_ulong address)
>      TranslationBlock *tb;
>      int i;
>  
> +    tb_lock();
> +
>      address &= TARGET_PAGE_MASK;
>      for (i = 0; i < CODE_GEN_PHYS_HASH_SIZE; i++) {
>          for (tb = tb_ctx.tb_phys_hash[i]; tb != NULL; tb = tb->phys_hash_next) {
> @@ -817,6 +870,8 @@ static void tb_invalidate_check(target_ulong address)
>              }
>          }
>      }
> +
> +    tb_unlock();
>  }
>  
>  /* verify that all the pages have correct rights for code */
> @@ -825,6 +880,8 @@ static void tb_page_check(void)
>      TranslationBlock *tb;
>      int i, flags1, flags2;
>  
> +    tb_lock();
> +
>      for (i = 0; i < CODE_GEN_PHYS_HASH_SIZE; i++) {
>          for (tb = tcg_ctx.tb_ctx.tb_phys_hash[i]; tb != NULL;
>                  tb = tb->phys_hash_next) {
> @@ -836,6 +893,8 @@ static void tb_page_check(void)
>              }
>          }
>      }
> +
> +    tb_unlock();
>  }
>  
>  #endif
> @@ -916,6 +975,8 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>      tb_page_addr_t phys_pc;
>      TranslationBlock *tb1, *tb2;
>  
> +    tb_lock();
> +
>      /* remove the TB from the hash list */
>      phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
>      h = tb_phys_hash_func(phys_pc);
> @@ -963,6 +1024,7 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>      tb->jmp_first = (TranslationBlock *)((uintptr_t)tb | 2); /* fail safe */
>  
>      tcg_ctx.tb_ctx.tb_phys_invalidate_count++;
> +    tb_unlock();
>  }
>  
>  static void build_page_bitmap(PageDesc *p)
> @@ -1004,6 +1066,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>      target_ulong virt_page2;
>      int code_gen_size;
>  
> +    tb_lock();
> +
>      phys_pc = get_page_addr_code(env, pc);
>      if (use_icount) {
>          cflags |= CF_USE_ICOUNT;
> @@ -1032,6 +1096,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>          phys_page2 = get_page_addr_code(env, virt_page2);
>      }
>      tb_link_page(tb, phys_pc, phys_page2);
> +
> +    tb_unlock();
>      return tb;
>  }
>  
> @@ -1330,13 +1396,15 @@ static inline void tb_alloc_page(TranslationBlock *tb,
>  }
>  
>  /* add a new TB and link it to the physical page tables. phys_page2 is
> -   (-1) to indicate that only one page contains the TB. */
> + * (-1) to indicate that only one page contains the TB. */
>  static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
>                           tb_page_addr_t phys_page2)
>  {
>      unsigned int h;
>      TranslationBlock **ptb;
>  
> +    tb_lock();
> +
>      /* Grab the mmap lock to stop another thread invalidating this TB
>         before we are done.  */
>      mmap_lock();
> @@ -1370,6 +1438,8 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
>      tb_page_check();
>  #endif
>      mmap_unlock();
> +
> +    tb_unlock();
>  }
>  
>  /* find the TB 'tb' such that tb[0].tc_ptr <= tc_ptr <
> @@ -1378,31 +1448,34 @@ static TranslationBlock *tb_find_pc(uintptr_t tc_ptr)
>  {
>      int m_min, m_max, m;
>      uintptr_t v;
> -    TranslationBlock *tb;
> -
> -    if (tcg_ctx.tb_ctx.nb_tbs <= 0) {
> -        return NULL;
> -    }
> -    if (tc_ptr < (uintptr_t)tcg_ctx.code_gen_buffer ||
> -        tc_ptr >= (uintptr_t)tcg_ctx.code_gen_ptr) {
> -        return NULL;
> -    }
> -    /* binary search (cf Knuth) */
> -    m_min = 0;
> -    m_max = tcg_ctx.tb_ctx.nb_tbs - 1;
> -    while (m_min <= m_max) {
> -        m = (m_min + m_max) >> 1;
> -        tb = &tcg_ctx.tb_ctx.tbs[m];
> -        v = (uintptr_t)tb->tc_ptr;
> -        if (v == tc_ptr) {
> -            return tb;
> -        } else if (tc_ptr < v) {
> -            m_max = m - 1;
> -        } else {
> -            m_min = m + 1;
> +    TranslationBlock *tb = NULL;
> +
> +    tb_lock();
> +
> +    if ((tcg_ctx.tb_ctx.nb_tbs > 0)
> +    && (tc_ptr >= (uintptr_t)tcg_ctx.code_gen_buffer &&
> +        tc_ptr < (uintptr_t)tcg_ctx.code_gen_ptr)) {
> +        /* binary search (cf Knuth) */
> +        m_min = 0;
> +        m_max = tcg_ctx.tb_ctx.nb_tbs - 1;
> +        while (m_min <= m_max) {
> +            m = (m_min + m_max) >> 1;
> +            tb = &tcg_ctx.tb_ctx.tbs[m];
> +            v = (uintptr_t)tb->tc_ptr;
> +            if (v == tc_ptr) {
> +                tb_unlock();
> +                return tb;
> +            } else if (tc_ptr < v) {
> +                m_max = m - 1;
> +            } else {
> +                m_min = m + 1;
> +            }
>          }
> +        tb = &tcg_ctx.tb_ctx.tbs[m_max];
>      }
> -    return &tcg_ctx.tb_ctx.tbs[m_max];
> +
> +    tb_unlock();
> +    return tb;
>  }
>  
>  #if !defined(CONFIG_USER_ONLY)
> @@ -1564,6 +1637,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
>      int direct_jmp_count, direct_jmp2_count, cross_page;
>      TranslationBlock *tb;
>  
> +    tb_lock();
> +
>      target_code_size = 0;
>      max_target_code_size = 0;
>      cross_page = 0;
> @@ -1619,6 +1694,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
>              tcg_ctx.tb_ctx.tb_phys_invalidate_count);
>      cpu_fprintf(f, "TLB flush count     %d\n", tlb_flush_count);
>      tcg_dump_info(f, cpu_fprintf);
> +
> +    tb_unlock();
>  }
>  
>  void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf)

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 06/18] tcg: remove tcg_halt_cond global variable.
  2015-06-26 15:41     ` Frederic Konrad
@ 2015-07-07 12:27       ` Alex Bennée
  2015-07-07 13:17         ` Frederic Konrad
  0 siblings, 1 reply; 82+ messages in thread
From: Alex Bennée @ 2015-07-07 12:27 UTC (permalink / raw)
  To: Frederic Konrad
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, agraf,
	qemu-devel, guillaume.delbergue, Paolo Bonzini, alistair.francis


Frederic Konrad <fred.konrad@greensocs.com> writes:

> On 26/06/2015 17:02, Paolo Bonzini wrote:
>>
>> On 26/06/2015 16:47, fred.konrad@greensocs.com wrote:
>>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>>
>>> This removes tcg_halt_cond global variable.
>>> We need one QemuCond per virtual cpu for multithread TCG.
>>>
>>> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
<snip>
>>> @@ -1068,7 +1065,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>>>                   qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
>>>               }
>>>           }
>>> -        qemu_tcg_wait_io_event();
>>> +        qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
>> Does this work (for non-multithreaded TCG) if tcg_thread_fn is waiting
>> on the "wrong" condition variable?  For example if all CPUs are idle and
>> the second CPU wakes up, qemu_tcg_wait_io_event won't be kicked out of
>> the wait.
>>
>> I think you need to have a CPUThread struct like this:
>>
>>     struct CPUThread {
>>         QemuThread thread;
>>         QemuCond halt_cond;
>>     };
>>
>> and in CPUState have a CPUThread * field instead of the thread and
>> halt_cond fields.
>>
>> Then single-threaded TCG can point all CPUStates to the same instance of
>> the struct, while multi-threaded TCG can point each CPUState to a
>> different struct.
>>
>> Paolo
>
> Hmm, probably not, though we didn't pay attention to keeping the
> non-MTTCG case working (which is probably not good).
<snip>

You may want to consider pushing a branch up to a github mirror and
enabling travis-ci on the repo. That way you'll at least know how broken
the rest of the tree is.

I appreciate we are still at the RFC stage here but it will probably pay
off in the long run to try and avoid breaking the rest of the tree ;-)

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 07/18] Drop global lock during TCG code execution
  2015-06-26 15:36     ` Frederic Konrad
  2015-06-26 15:42       ` Jan Kiszka
@ 2015-07-07 12:33       ` Alex Bennée
  2015-07-07 13:18         ` Frederic Konrad
  1 sibling, 1 reply; 82+ messages in thread
From: Alex Bennée @ 2015-07-07 12:33 UTC (permalink / raw)
  To: Frederic Konrad
  Cc: mttcg, peter.maydell, Jan Kiszka, mark.burton, agraf, qemu-devel,
	guillaume.delbergue, a.spyridakis, pbonzini, alistair.francis


Frederic Konrad <fred.konrad@greensocs.com> writes:

> On 26/06/2015 16:56, Jan Kiszka wrote:
>> On 2015-06-26 16:47, fred.konrad@greensocs.com wrote:
>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>
>>> This finally allows TCG to benefit from the iothread introduction: Drop
>>> the global mutex while running pure TCG CPU code. Reacquire the lock
>>> when entering MMIO or PIO emulation, or when leaving the TCG loop.
<snip>
>>> diff --git a/translate-all.c b/translate-all.c
>>> index c25b79b..ade2269 100644
>>> --- a/translate-all.c
>>> +++ b/translate-all.c
>>> @@ -1222,6 +1222,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>>>   #endif
>>>   #ifdef TARGET_HAS_PRECISE_SMC
>>>       if (current_tb_modified) {
>>> +        qemu_mutex_unlock_iothread();
>>>           /* we generate a block containing just the instruction
>>>              modifying the memory. It will ensure that it cannot modify
>>>              itself */
>>> @@ -1326,6 +1327,7 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
>>>       p->first_tb = NULL;
>>>   #ifdef TARGET_HAS_PRECISE_SMC
>>>       if (current_tb_modified) {
>>> +        qemu_mutex_unlock_iothread();
>>>           /* we generate a block containing just the instruction
>>>              modifying the memory. It will ensure that it cannot modify
>>>              itself */
>>> diff --git a/vl.c b/vl.c
>>> index 69ad90c..2983d44 100644
>>> --- a/vl.c
>>> +++ b/vl.c
>>> @@ -1698,10 +1698,16 @@ void qemu_devices_reset(void)
>>>   {
>>>       QEMUResetEntry *re, *nre;
>>>   
>>> +    /*
>>> +     * Some device's reset needs to grab the global_mutex. So just release it
>>> +     * here.
>> That's a property newly introduced by the patch, or how does this
>> happen? In turn, are all reset handlers now fine to be called outside of
>> BQL? This looks suspicious, but it's been quite a while since I last
>> stared at this.
>>
>> Jan
> Hi Jan,
>
> Sorry for that, it's a dirty hack :).
> Some reset handlers probably load stuff into memory, hence a double lock.
> It will probably disappear with:
>
> http://thread.gmane.org/gmane.comp.emulators.qemu/345258

So I guess this patch will shrink a lot once we re-base on top of Paolo's
patches (which should be real soon now).

>
> Thanks,
> Fred
>
>>> +     */
>>> +    qemu_mutex_unlock_iothread();
>>>       /* reset all devices */
>>>       QTAILQ_FOREACH_SAFE(re, &reset_handlers, entry, nre) {
>>>           re->func(re->opaque);
>>>       }
>>> +    qemu_mutex_lock_iothread();
>>>   }
>>>   
>>>   void qemu_system_reset(bool report)
>>>

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 02/18] replace spinlock by QemuMutex.
  2015-07-07 11:48       ` Frederic Konrad
@ 2015-07-07 12:34         ` Paolo Bonzini
  2015-07-07 13:06           ` Frederic Konrad
  0 siblings, 1 reply; 82+ messages in thread
From: Paolo Bonzini @ 2015-07-07 12:34 UTC (permalink / raw)
  To: Frederic Konrad, Alex Bennée
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, qemu-devel,
	alistair.francis, agraf, guillaume.delbergue



On 07/07/2015 13:48, Frederic Konrad wrote:
>>> this eventually ends up doing a tb_lock on the find_slow path, which IIRC
>>> is when we might end up doing the actual code generation.
>>
>> Up to this point, system emulation is using the BQL for everything.  I
>> guess things change later.
>
> Actually we use tb_lock to protect all the tb-related structures such as
> TBContext etc. Is it better to use the global lock for this?

No, on the contrary.  But using the BQL is the status as of patch 2, so
it's okay to keep the #ifdefs.  Thanks for confirming that it changes
later in the patch.

Paolo

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 08/18] cpu: remove exit_request global.
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 08/18] cpu: remove exit_request global fred.konrad
  2015-06-26 15:03   ` Paolo Bonzini
@ 2015-07-07 13:04   ` Alex Bennée
  2015-07-07 13:25     ` Frederic Konrad
  1 sibling, 1 reply; 82+ messages in thread
From: Alex Bennée @ 2015-07-07 13:04 UTC (permalink / raw)
  To: fred.konrad
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, agraf,
	qemu-devel, guillaume.delbergue, pbonzini, alistair.francis


fred.konrad@greensocs.com writes:

> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> This removes exit_request global and adds a variable in CPUState for this.
> Only the flag for the first cpu is used for the moment as we are still with one
> TCG thread.
>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> ---
>  cpu-exec.c | 15 ---------------
>  cpus.c     | 17 ++++++++++++++---
>  2 files changed, 14 insertions(+), 18 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 5d9b518..0644383 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -364,8 +364,6 @@ static void cpu_handle_debug_exception(CPUArchState *env)
>  
>  /* main execution loop */
>  
> -volatile sig_atomic_t exit_request;
> -
>  int cpu_exec(CPUArchState *env)
>  {
>      CPUState *cpu = ENV_GET_CPU(env);
> @@ -394,20 +392,8 @@ int cpu_exec(CPUArchState *env)
>  
>      current_cpu = cpu;
>  
> -    /* As long as current_cpu is null, up to the assignment just above,
> -     * requests by other threads to exit the execution loop are expected to
> -     * be issued using the exit_request global. We must make sure that our
> -     * evaluation of the global value is performed past the current_cpu
> -     * value transition point, which requires a memory barrier as well as
> -     * an instruction scheduling constraint on modern architectures.  */
> -    smp_mb();
> -
>      rcu_read_lock();
>  
> -    if (unlikely(exit_request)) {
> -        cpu->exit_request = 1;
> -    }
> -
>      cc->cpu_exec_enter(cpu);
>  
>      /* Calculate difference between guest clock and host clock.
> @@ -496,7 +482,6 @@ int cpu_exec(CPUArchState *env)
>                      }
>                  }
>                  if (unlikely(cpu->exit_request)) {
> -                    cpu->exit_request = 0;
>                      cpu->exception_index = EXCP_INTERRUPT;
>                      cpu_loop_exit(cpu);
>                  }
> diff --git a/cpus.c b/cpus.c
> index 23c316c..2541c56 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -137,6 +137,8 @@ typedef struct TimersState {
>  } TimersState;
>  
>  static TimersState timers_state;
> +/* CPU associated to this thread. */
> +static __thread CPUState *tcg_thread_cpu;
>  
>  int64_t cpu_get_icount_raw(void)
>  {
> @@ -661,12 +663,18 @@ static void cpu_handle_guest_debug(CPUState *cpu)
>      cpu->stopped = true;
>  }
>  
> +/**
> + * cpu_signal
> + * Signal handler when using TCG.
> + */
>  static void cpu_signal(int sig)
>  {
>      if (current_cpu) {
>          cpu_exit(current_cpu);
>      }
> -    exit_request = 1;
> +
> +    /* FIXME: We might want to check if the cpu is running? */
> +    tcg_thread_cpu->exit_request = true;

I guess the potential problem is race conditions here? What happens if
the cpu is signalled by two different threads for two different reasons?
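
If that's a concern, one option (just a sketch, assuming the atomic_set
helper from qemu/atomic.h) is to make the store explicit, so a second kick
is simply an idempotent re-set of the flag:

    static void cpu_signal(int sig)
    {
        if (current_cpu) {
            cpu_exit(current_cpu);
        }
        /* both kickers store true; nothing is lost, just re-set */
        atomic_set(&tcg_thread_cpu->exit_request, true);
    }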

>  }
>  
>  #ifdef CONFIG_LINUX
> @@ -1031,6 +1039,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>  {
>      CPUState *cpu = arg;
>  
> +    tcg_thread_cpu = cpu;
>      qemu_tcg_init_cpu_signals();
>      qemu_thread_get_self(cpu->thread);
>  
> @@ -1393,7 +1402,8 @@ static void tcg_exec_all(void)
>      if (next_cpu == NULL) {
>          next_cpu = first_cpu;
>      }
> -    for (; next_cpu != NULL && !exit_request; next_cpu = CPU_NEXT(next_cpu)) {
> +    for (; next_cpu != NULL && !first_cpu->exit_request;
> +           next_cpu = CPU_NEXT(next_cpu)) {
>          CPUState *cpu = next_cpu;
>          CPUArchState *env = cpu->env_ptr;
>  
> @@ -1410,7 +1420,8 @@ static void tcg_exec_all(void)
>              break;
>          }
>      }
> -    exit_request = 0;
> +
> +    first_cpu->exit_request = 0;
>  }
>  
>  void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 02/18] replace spinlock by QemuMutex.
  2015-07-07 12:34         ` Paolo Bonzini
@ 2015-07-07 13:06           ` Frederic Konrad
  0 siblings, 0 replies; 82+ messages in thread
From: Frederic Konrad @ 2015-07-07 13:06 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, qemu-devel,
	alistair.francis, agraf, guillaume.delbergue

On 07/07/2015 14:34, Paolo Bonzini wrote:
>
> On 07/07/2015 13:48, Frederic Konrad wrote:
>>>> this eventually ends up doing a tb_lock on the find_slow path, which IIRC
>>>> is when we might end up doing the actual code generation.
>>> Up to this point, system emulation is using the BQL for everything.  I
>>> guess things change later.
>> Actually we use tb_lock to protect all the tb-related structures such as
>> TBContext etc. Is it better to use the global lock for this?
> No, on the contrary.  But using the BQL is the status as of patch 2, so
> it's okay to keep the #ifdefs.  Thanks for confirming that it changes
> later in the patch.
>
> Paolo
In fact I changed nothing in patch 2 except abstracting out the #ifdef from
spinlock_t and using QemuMutex (pthread_mutex_t on Linux) instead of
spinlock_t. The only reason for that is to use tb_lock for both user and
system mode.

And yes, as it's the first patch, tb_lock is not used at this step except
in the user code.

Thanks,
Fred

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 05/18] protect TBContext with tb_lock.
  2015-07-07 12:22   ` Alex Bennée
@ 2015-07-07 13:16     ` Frederic Konrad
  0 siblings, 0 replies; 82+ messages in thread
From: Frederic Konrad @ 2015-07-07 13:16 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, agraf,
	qemu-devel, guillaume.delbergue, pbonzini, alistair.francis

On 07/07/2015 14:22, Alex Bennée wrote:
> fred.konrad@greensocs.com writes:
>
>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>
>> This protects TBContext with tb_lock to make tb_* thread safe.
>>
>> We can still have an issue with tb_flush in case of multithread TCG:
>>    Another CPU can be executing code during a flush.
>>
>> This can be fixed later by making all other TCG thread exiting before calling
>> tb_flush().
>>
>> tb_find_slow is separated into tb_find_slow and tb_find_physical as the whole
>> of tb_find_slow doesn't require locking the tb.
>>
>> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> So my comments from earlier about the different locking between
> CONFIG_USER and system emulation still stand. Ultimately we need a good
> reason (or an abstraction) before sprinkling #ifdefs in the code if only
> for ease of reading.

True.
>> Changes:
>> V1 -> V2:
>>    * Drop a tb_lock arround tb_find_fast in cpu-exec.c.
>> ---
>>   cpu-exec.c             |  60 ++++++++++++++--------
>>   target-arm/translate.c |   5 ++
>>   tcg/tcg.h              |   7 +++
>>   translate-all.c        | 137 ++++++++++++++++++++++++++++++++++++++-----------
>>   4 files changed, 158 insertions(+), 51 deletions(-)
>>
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index d6336d9..5d9b518 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -130,6 +130,8 @@ static void init_delay_params(SyncClocks *sc, const CPUState *cpu)
>>   void cpu_loop_exit(CPUState *cpu)
>>   {
>>       cpu->current_tb = NULL;
>> +    /* Release those mutex before long jump so other thread can work. */
>> +    tb_lock_reset();
>>       siglongjmp(cpu->jmp_env, 1);
>>   }
>>   
>> @@ -142,6 +144,8 @@ void cpu_resume_from_signal(CPUState *cpu, void *puc)
>>       /* XXX: restore cpu registers saved in host registers */
>>   
>>       cpu->exception_index = -1;
>> +    /* Release those mutex before long jump so other thread can work. */
>> +    tb_lock_reset();
>>       siglongjmp(cpu->jmp_env, 1);
>>   }
>>   
>> @@ -253,12 +257,9 @@ static void cpu_exec_nocache(CPUArchState *env, int max_cycles,
>>       tb_free(tb);
>>   }
>>   
>> -static TranslationBlock *tb_find_slow(CPUArchState *env,
>> -                                      target_ulong pc,
>> -                                      target_ulong cs_base,
>> -                                      uint64_t flags)
>> +static TranslationBlock *tb_find_physical(CPUArchState *env, target_ulong pc,
>> +                                          target_ulong cs_base, uint64_t flags)
>>   {
> As Paolo has already mentioned comments on functions expecting to have
> locks held when called.

Ok, will do that.
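
Probably just a one-liner above each helper that assumes the lock,
something like (sketch):

    /* Called with tb_lock held.  */
    static TranslationBlock *tb_alloc(target_ulong pc);
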
>> -    CPUState *cpu = ENV_GET_CPU(env);
>>       TranslationBlock *tb, **ptb1;
>>       unsigned int h;
>>       tb_page_addr_t phys_pc, phys_page1;
>> @@ -273,8 +274,9 @@ static TranslationBlock *tb_find_slow(CPUArchState *env,
>>       ptb1 = &tcg_ctx.tb_ctx.tb_phys_hash[h];
>>       for(;;) {
>>           tb = *ptb1;
>> -        if (!tb)
>> -            goto not_found;
>> +        if (!tb) {
>> +            return tb;
>> +        }
>>           if (tb->pc == pc &&
>>               tb->page_addr[0] == phys_page1 &&
>>               tb->cs_base == cs_base &&
>> @@ -282,28 +284,43 @@ static TranslationBlock *tb_find_slow(CPUArchState *env,
>>               /* check next page if needed */
>>               if (tb->page_addr[1] != -1) {
>>                   tb_page_addr_t phys_page2;
>> -
>>                   virt_page2 = (pc & TARGET_PAGE_MASK) +
>>                       TARGET_PAGE_SIZE;
>>                   phys_page2 = get_page_addr_code(env, virt_page2);
>> -                if (tb->page_addr[1] == phys_page2)
>> -                    goto found;
>> +                if (tb->page_addr[1] == phys_page2) {
>> +                    return tb;
>> +                }
>>               } else {
>> -                goto found;
>> +                return tb;
>>               }
>>           }
>>           ptb1 = &tb->phys_hash_next;
>>       }
>> - not_found:
>> -   /* if no translated code available, then translate it now */
>> -    tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
>> -
>> - found:
>> -    /* Move the last found TB to the head of the list */
>> -    if (likely(*ptb1)) {
>> -        *ptb1 = tb->phys_hash_next;
>> -        tb->phys_hash_next = tcg_ctx.tb_ctx.tb_phys_hash[h];
>> -        tcg_ctx.tb_ctx.tb_phys_hash[h] = tb;
>> +    return tb;
>> +}
>> +
>> +static TranslationBlock *tb_find_slow(CPUArchState *env, target_ulong pc,
>> +                                      target_ulong cs_base, uint64_t flags)
>> +{
>> +    /*
>> +     * First try to get the tb if we don't find it we need to lock and compile
>> +     * it.
>> +     */
>> +    CPUState *cpu = ENV_GET_CPU(env);
>> +    TranslationBlock *tb;
>> +
>> +    tb = tb_find_physical(env, pc, cs_base, flags);
>> +    if (!tb) {
>> +        tb_lock();
>> +        /*
>> +         * Retry to get the TB in case a CPU just translate it to avoid having
>> +         * duplicated TB in the pool.
>> +         */
>> +        tb = tb_find_physical(env, pc, cs_base, flags);
>> +        if (!tb) {
>> +            tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
>> +        }
>> +        tb_unlock();
>>       }
>>       /* we add the TB in the virtual pc hash table */
>>       cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
>> @@ -326,6 +343,7 @@ static inline TranslationBlock *tb_find_fast(CPUArchState *env)
>>                    tb->flags != flags)) {
>>           tb = tb_find_slow(env, pc, cs_base, flags);
>>       }
>> +
>>       return tb;
>>   }
>>   
>> diff --git a/target-arm/translate.c b/target-arm/translate.c
>> index 971b6db..47345aa 100644
>> --- a/target-arm/translate.c
>> +++ b/target-arm/translate.c
>> @@ -11162,6 +11162,8 @@ static inline void gen_intermediate_code_internal(ARMCPU *cpu,
>>   
>>       dc->tb = tb;
>>   
>> +    tb_lock();
>> +
>>       dc->is_jmp = DISAS_NEXT;
>>       dc->pc = pc_start;
>>       dc->singlestep_enabled = cs->singlestep_enabled;
>> @@ -11499,6 +11501,7 @@ done_generating:
>>           tb->size = dc->pc - pc_start;
>>           tb->icount = num_insns;
>>       }
>> +    tb_unlock();
>>   }
>>   
>>   void gen_intermediate_code(CPUARMState *env, TranslationBlock *tb)
>> @@ -11567,6 +11570,7 @@ void arm_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
>>   
>>   void restore_state_to_opc(CPUARMState *env, TranslationBlock *tb, int pc_pos)
>>   {
>> +    tb_lock();
>>       if (is_a64(env)) {
>>           env->pc = tcg_ctx.gen_opc_pc[pc_pos];
>>           env->condexec_bits = 0;
>> @@ -11574,4 +11578,5 @@ void restore_state_to_opc(CPUARMState *env, TranslationBlock *tb, int pc_pos)
>>           env->regs[15] = tcg_ctx.gen_opc_pc[pc_pos];
>>           env->condexec_bits = gen_opc_condexec_bits[pc_pos];
>>       }
>> +    tb_unlock();
>>   }
>> diff --git a/tcg/tcg.h b/tcg/tcg.h
>> index 41e4869..032fe10 100644
>> --- a/tcg/tcg.h
>> +++ b/tcg/tcg.h
>> @@ -592,17 +592,24 @@ void *tcg_malloc_internal(TCGContext *s, int size);
>>   void tcg_pool_reset(TCGContext *s);
>>   void tcg_pool_delete(TCGContext *s);
>>   
>> +void tb_lock(void);
>> +void tb_unlock(void);
>> +void tb_lock_reset(void);
>> +
>>   static inline void *tcg_malloc(int size)
>>   {
>>       TCGContext *s = &tcg_ctx;
>>       uint8_t *ptr, *ptr_end;
>> +    tb_lock();
>>       size = (size + sizeof(long) - 1) & ~(sizeof(long) - 1);
>>       ptr = s->pool_cur;
>>       ptr_end = ptr + size;
>>       if (unlikely(ptr_end > s->pool_end)) {
>> +        tb_unlock();
>>           return tcg_malloc_internal(&tcg_ctx, size);
> If the purpose of the lock is to protect the global tcg_ctx, then we
> shouldn't be unlocking before calling the _internal function, which also
> messes with the context.
>

Good point! I missed that.

>>       } else {
>>           s->pool_cur = ptr_end;
>> +        tb_unlock();
>>           return ptr;
>>       }
>>   }
>> diff --git a/translate-all.c b/translate-all.c
>> index b6b0e1c..c25b79b 100644
>> --- a/translate-all.c
>> +++ b/translate-all.c
>> @@ -127,6 +127,34 @@ static void *l1_map[V_L1_SIZE];
>>   /* code generation context */
>>   TCGContext tcg_ctx;
>>   
>> +/* translation block context */
>> +__thread volatile int have_tb_lock;
>> +
>> +void tb_lock(void)
>> +{
>> +    if (!have_tb_lock) {
>> +        qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
>> +    }
>> +    have_tb_lock++;
>> +}
>> +
>> +void tb_unlock(void)
>> +{
>> +    assert(have_tb_lock > 0);
>> +    have_tb_lock--;
>> +    if (!have_tb_lock) {
>> +        qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
>> +    }
>> +}
>> +
>> +void tb_lock_reset(void)
>> +{
>> +    if (have_tb_lock) {
>> +        qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
>> +    }
>> +    have_tb_lock = 0;
>> +}
>> +
>>   static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
>>                            tb_page_addr_t phys_page2);
>>   static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
>> @@ -215,6 +243,7 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
>>   #ifdef CONFIG_PROFILER
>>       ti = profile_getclock();
>>   #endif
>> +    tb_lock();
>>       tcg_func_start(s);
>>   
>>       gen_intermediate_code_pc(env, tb);
>> @@ -228,8 +257,10 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
>>   
>>       /* find opc index corresponding to search_pc */
>>       tc_ptr = (uintptr_t)tb->tc_ptr;
>> -    if (searched_pc < tc_ptr)
>> +    if (searched_pc < tc_ptr) {
>> +        tb_unlock();
>>           return -1;
>> +    }
>>   
>>       s->tb_next_offset = tb->tb_next_offset;
>>   #ifdef USE_DIRECT_JUMP
>> @@ -241,8 +272,10 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
>>   #endif
>>       j = tcg_gen_code_search_pc(s, (tcg_insn_unit *)tc_ptr,
>>                                  searched_pc - tc_ptr);
>> -    if (j < 0)
>> +    if (j < 0) {
>> +        tb_unlock();
>>           return -1;
>> +    }
>>       /* now find start of instruction before */
>>       while (s->gen_opc_instr_start[j] == 0) {
>>           j--;
>> @@ -255,6 +288,8 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
>>       s->restore_time += profile_getclock() - ti;
>>       s->restore_count++;
>>   #endif
>> +
>> +    tb_unlock();
>>       return 0;
>>   }
>>   
>> @@ -672,6 +707,7 @@ static inline void code_gen_alloc(size_t tb_size)
>>               CODE_GEN_AVG_BLOCK_SIZE;
>>       tcg_ctx.tb_ctx.tbs =
>>               g_malloc(tcg_ctx.code_gen_max_blocks * sizeof(TranslationBlock));
>> +    qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
>>   }
>>   
>>   /* Must be called before using the QEMU cpus. 'tb_size' is the size
>> @@ -696,16 +732,22 @@ bool tcg_enabled(void)
>>       return tcg_ctx.code_gen_buffer != NULL;
>>   }
>>   
>> -/* Allocate a new translation block. Flush the translation buffer if
>> -   too many translation blocks or too much generated code. */
>> +/*
>> + * Allocate a new translation block. Flush the translation buffer if
>> + * too many translation blocks or too much generated code.
>> + * tb_alloc is not thread safe but tb_gen_code is protected by a mutex so this
>> + * function is called only by one thread.
> maybe: "..is not thread safe but tb_gen_code is protected by tb_lock so
> only one thread calls it at a time."?

Yes.
>> + */
>>   static TranslationBlock *tb_alloc(target_ulong pc)
>>   {
>> -    TranslationBlock *tb;
>> +    TranslationBlock *tb = NULL;
>>   
>>       if (tcg_ctx.tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks ||
>>           (tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer) >=
>>            tcg_ctx.code_gen_buffer_max_size) {
>> -        return NULL;
>> +        tb = &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs++];
>> +        tb->pc = pc;
>> +        tb->cflags = 0;
>>       }
>>       tb = &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs++];
>>       tb->pc = pc;
> That looks weird.
>
> If the condition hits we take &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs++],
> then fall through and take &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs++]
> again, bumping nb_tbs twice?
>
> Also that renders the setting of tb = NULL pointless, as it will always be
> from the array?

Oops yes sorry, this is definitely a mistake, those changes should have
disappeared.

Thanks,
Fred
>
>> @@ -718,11 +760,16 @@ void tb_free(TranslationBlock *tb)
>>       /* In practice this is mostly used for single use temporary TB
>>          Ignore the hard cases and just back up if this TB happens to
>>          be the last one generated.  */
>> +
>> +    tb_lock();
>> +
>>       if (tcg_ctx.tb_ctx.nb_tbs > 0 &&
>>               tb == &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs - 1]) {
>>           tcg_ctx.code_gen_ptr = tb->tc_ptr;
>>           tcg_ctx.tb_ctx.nb_tbs--;
>>       }
>> +
>> +    tb_unlock();
>>   }
>>   
>>   static inline void invalidate_page_bitmap(PageDesc *p)
>> @@ -773,6 +820,8 @@ void tb_flush(CPUArchState *env1)
>>   {
>>       CPUState *cpu = ENV_GET_CPU(env1);
>>   
>> +    tb_lock();
>> +
>>   #if defined(DEBUG_FLUSH)
>>       printf("qemu: flush code_size=%ld nb_tbs=%d avg_tb_size=%ld\n",
>>              (unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer),
>> @@ -797,6 +846,8 @@ void tb_flush(CPUArchState *env1)
>>       /* XXX: flush processor icache at this point if cache flush is
>>          expensive */
>>       tcg_ctx.tb_ctx.tb_flush_count++;
>> +
>> +    tb_unlock();
>>   }
>>   
>>   #ifdef DEBUG_TB_CHECK
>> @@ -806,6 +857,8 @@ static void tb_invalidate_check(target_ulong address)
>>       TranslationBlock *tb;
>>       int i;
>>   
>> +    tb_lock();
>> +
>>       address &= TARGET_PAGE_MASK;
>>       for (i = 0; i < CODE_GEN_PHYS_HASH_SIZE; i++) {
>>           for (tb = tb_ctx.tb_phys_hash[i]; tb != NULL; tb = tb->phys_hash_next) {
>> @@ -817,6 +870,8 @@ static void tb_invalidate_check(target_ulong address)
>>               }
>>           }
>>       }
>> +
>> +    tb_unlock();
>>   }
>>   
>>   /* verify that all the pages have correct rights for code */
>> @@ -825,6 +880,8 @@ static void tb_page_check(void)
>>       TranslationBlock *tb;
>>       int i, flags1, flags2;
>>   
>> +    tb_lock();
>> +
>>       for (i = 0; i < CODE_GEN_PHYS_HASH_SIZE; i++) {
>>           for (tb = tcg_ctx.tb_ctx.tb_phys_hash[i]; tb != NULL;
>>                   tb = tb->phys_hash_next) {
>> @@ -836,6 +893,8 @@ static void tb_page_check(void)
>>               }
>>           }
>>       }
>> +
>> +    tb_unlock();
>>   }
>>   
>>   #endif
>> @@ -916,6 +975,8 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>>       tb_page_addr_t phys_pc;
>>       TranslationBlock *tb1, *tb2;
>>   
>> +    tb_lock();
>> +
>>       /* remove the TB from the hash list */
>>       phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
>>       h = tb_phys_hash_func(phys_pc);
>> @@ -963,6 +1024,7 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>>       tb->jmp_first = (TranslationBlock *)((uintptr_t)tb | 2); /* fail safe */
>>   
>>       tcg_ctx.tb_ctx.tb_phys_invalidate_count++;
>> +    tb_unlock();
>>   }
>>   
>>   static void build_page_bitmap(PageDesc *p)
>> @@ -1004,6 +1066,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>>       target_ulong virt_page2;
>>       int code_gen_size;
>>   
>> +    tb_lock();
>> +
>>       phys_pc = get_page_addr_code(env, pc);
>>       if (use_icount) {
>>           cflags |= CF_USE_ICOUNT;
>> @@ -1032,6 +1096,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>>           phys_page2 = get_page_addr_code(env, virt_page2);
>>       }
>>       tb_link_page(tb, phys_pc, phys_page2);
>> +
>> +    tb_unlock();
>>       return tb;
>>   }
>>   
>> @@ -1330,13 +1396,15 @@ static inline void tb_alloc_page(TranslationBlock *tb,
>>   }
>>   
>>   /* add a new TB and link it to the physical page tables. phys_page2 is
>> -   (-1) to indicate that only one page contains the TB. */
>> + * (-1) to indicate that only one page contains the TB. */
>>   static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
>>                            tb_page_addr_t phys_page2)
>>   {
>>       unsigned int h;
>>       TranslationBlock **ptb;
>>   
>> +    tb_lock();
>> +
>>       /* Grab the mmap lock to stop another thread invalidating this TB
>>          before we are done.  */
>>       mmap_lock();
>> @@ -1370,6 +1438,8 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
>>       tb_page_check();
>>   #endif
>>       mmap_unlock();
>> +
>> +    tb_unlock();
>>   }
>>   
>>   /* find the TB 'tb' such that tb[0].tc_ptr <= tc_ptr <
>> @@ -1378,31 +1448,34 @@ static TranslationBlock *tb_find_pc(uintptr_t tc_ptr)
>>   {
>>       int m_min, m_max, m;
>>       uintptr_t v;
>> -    TranslationBlock *tb;
>> -
>> -    if (tcg_ctx.tb_ctx.nb_tbs <= 0) {
>> -        return NULL;
>> -    }
>> -    if (tc_ptr < (uintptr_t)tcg_ctx.code_gen_buffer ||
>> -        tc_ptr >= (uintptr_t)tcg_ctx.code_gen_ptr) {
>> -        return NULL;
>> -    }
>> -    /* binary search (cf Knuth) */
>> -    m_min = 0;
>> -    m_max = tcg_ctx.tb_ctx.nb_tbs - 1;
>> -    while (m_min <= m_max) {
>> -        m = (m_min + m_max) >> 1;
>> -        tb = &tcg_ctx.tb_ctx.tbs[m];
>> -        v = (uintptr_t)tb->tc_ptr;
>> -        if (v == tc_ptr) {
>> -            return tb;
>> -        } else if (tc_ptr < v) {
>> -            m_max = m - 1;
>> -        } else {
>> -            m_min = m + 1;
>> +    TranslationBlock *tb = NULL;
>> +
>> +    tb_lock();
>> +
>> +    if ((tcg_ctx.tb_ctx.nb_tbs > 0)
>> +    && (tc_ptr >= (uintptr_t)tcg_ctx.code_gen_buffer &&
>> +        tc_ptr < (uintptr_t)tcg_ctx.code_gen_ptr)) {
>> +        /* binary search (cf Knuth) */
>> +        m_min = 0;
>> +        m_max = tcg_ctx.tb_ctx.nb_tbs - 1;
>> +        while (m_min <= m_max) {
>> +            m = (m_min + m_max) >> 1;
>> +            tb = &tcg_ctx.tb_ctx.tbs[m];
>> +            v = (uintptr_t)tb->tc_ptr;
>> +            if (v == tc_ptr) {
>> +                tb_unlock();
>> +                return tb;
>> +            } else if (tc_ptr < v) {
>> +                m_max = m - 1;
>> +            } else {
>> +                m_min = m + 1;
>> +            }
>>           }
>> +        tb = &tcg_ctx.tb_ctx.tbs[m_max];
>>       }
>> -    return &tcg_ctx.tb_ctx.tbs[m_max];
>> +
>> +    tb_unlock();
>> +    return tb;
>>   }
>>   
>>   #if !defined(CONFIG_USER_ONLY)
>> @@ -1564,6 +1637,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
>>       int direct_jmp_count, direct_jmp2_count, cross_page;
>>       TranslationBlock *tb;
>>   
>> +    tb_lock();
>> +
>>       target_code_size = 0;
>>       max_target_code_size = 0;
>>       cross_page = 0;
>> @@ -1619,6 +1694,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
>>               tcg_ctx.tb_ctx.tb_phys_invalidate_count);
>>       cpu_fprintf(f, "TLB flush count     %d\n", tlb_flush_count);
>>       tcg_dump_info(f, cpu_fprintf);
>> +
>> +    tb_unlock();
>>   }
>>   
>>   void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf)

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 06/18] tcg: remove tcg_halt_cond global variable.
  2015-07-07 12:27       ` Alex Bennée
@ 2015-07-07 13:17         ` Frederic Konrad
  0 siblings, 0 replies; 82+ messages in thread
From: Frederic Konrad @ 2015-07-07 13:17 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, agraf,
	qemu-devel, guillaume.delbergue, Paolo Bonzini, alistair.francis

On 07/07/2015 14:27, Alex Bennée wrote:
> Frederic Konrad <fred.konrad@greensocs.com> writes:
>
>> On 26/06/2015 17:02, Paolo Bonzini wrote:
>>> On 26/06/2015 16:47, fred.konrad@greensocs.com wrote:
>>>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>>>
>>>> This removes tcg_halt_cond global variable.
>>>> We need one QemuCond per virtual cpu for multithread TCG.
>>>>
>>>> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> <snip>
>>>> @@ -1068,7 +1065,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>>>>                    qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
>>>>                }
>>>>            }
>>>> -        qemu_tcg_wait_io_event();
>>>> +        qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
>>> Does this work (for non-multithreaded TCG) if tcg_thread_fn is waiting
>>> on the "wrong" condition variable?  For example if all CPUs are idle and
>>> the second CPU wakes up, qemu_tcg_wait_io_event won't be kicked out of
>>> the wait.
>>>
>>> I think you need to have a CPUThread struct like this:
>>>
>>>      struct CPUThread {
>>>          QemuThread thread;
>>>          QemuCond halt_cond;
>>>      };
>>>
>>> and in CPUState have a CPUThread * field instead of the thread and
>>> halt_cond fields.
>>>
>>> Then single-threaded TCG can point all CPUStates to the same instance of
>>> the struct, while multi-threaded TCG can point each CPUState to a
>>> different struct.
>>>
>>> Paolo
>> Hmm, probably not, though we didn't pay attention to keeping the
>> non-MTTCG case working (which is probably not good).
> <snip>
>
> You may want to consider pushing a branch up to a github mirror and
> enabling travis-ci on the repo. That way you'll at least know how broken
> the rest of the tree is.
>
> I appreciate we are still at the RFC stage here but it will probably pay
> off in the long run to try and avoid breaking the rest of the tree ;-)
>
Good point :)

Fred

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 07/18] Drop global lock during TCG code execution
  2015-07-07 12:33       ` Alex Bennée
@ 2015-07-07 13:18         ` Frederic Konrad
  0 siblings, 0 replies; 82+ messages in thread
From: Frederic Konrad @ 2015-07-07 13:18 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, Jan Kiszka, mark.burton, agraf, qemu-devel,
	guillaume.delbergue, a.spyridakis, pbonzini, alistair.francis

On 07/07/2015 14:33, Alex Bennée wrote:
> Frederic Konrad <fred.konrad@greensocs.com> writes:
>
>> On 26/06/2015 16:56, Jan Kiszka wrote:
>>> On 2015-06-26 16:47, fred.konrad@greensocs.com wrote:
>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>>
>>>> This finally allows TCG to benefit from the iothread introduction: Drop
>>>> the global mutex while running pure TCG CPU code. Reacquire the lock
>>>> when entering MMIO or PIO emulation, or when leaving the TCG loop.
> <snip>
>>>> diff --git a/translate-all.c b/translate-all.c
>>>> index c25b79b..ade2269 100644
>>>> --- a/translate-all.c
>>>> +++ b/translate-all.c
>>>> @@ -1222,6 +1222,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>>>>    #endif
>>>>    #ifdef TARGET_HAS_PRECISE_SMC
>>>>        if (current_tb_modified) {
>>>> +        qemu_mutex_unlock_iothread();
>>>>            /* we generate a block containing just the instruction
>>>>               modifying the memory. It will ensure that it cannot modify
>>>>               itself */
>>>> @@ -1326,6 +1327,7 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
>>>>        p->first_tb = NULL;
>>>>    #ifdef TARGET_HAS_PRECISE_SMC
>>>>        if (current_tb_modified) {
>>>> +        qemu_mutex_unlock_iothread();
>>>>            /* we generate a block containing just the instruction
>>>>               modifying the memory. It will ensure that it cannot modify
>>>>               itself */
>>>> diff --git a/vl.c b/vl.c
>>>> index 69ad90c..2983d44 100644
>>>> --- a/vl.c
>>>> +++ b/vl.c
>>>> @@ -1698,10 +1698,16 @@ void qemu_devices_reset(void)
>>>>    {
>>>>        QEMUResetEntry *re, *nre;
>>>>    
>>>> +    /*
>>>> +     * Some device's reset needs to grab the global_mutex. So just release it
>>>> +     * here.
>>> That's a property newly introduced by the patch, or how does this
>>> happen? In turn, are all reset handlers now fine to be called outside of
>>> BQL? This looks suspicious, but it's been quite a while since I last
>>> stared at this.
>>>
>>> Jan
>> Hi Jan,
>>
>> Sorry for that, it's a dirty hack :).
>> Some reset handlers probably load stuff into memory, hence a double lock.
>> It will probably disappear with:
>>
>> http://thread.gmane.org/gmane.comp.emulators.qemu/345258
> So I guess this patch will shrink a lot once we re-base on top of Paolo's
> patches (which should be real soon now).

Yes exactly.
>
>> Thanks,
>> Fred
>>
>>>> +     */
>>>> +    qemu_mutex_unlock_iothread();
>>>>        /* reset all devices */
>>>>        QTAILQ_FOREACH_SAFE(re, &reset_handlers, entry, nre) {
>>>>            re->func(re->opaque);
>>>>        }
>>>> +    qemu_mutex_lock_iothread();
>>>>    }
>>>>    
>>>>    void qemu_system_reset(bool report)
>>>>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 09/18] cpu: add a tcg_executing flag.
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 09/18] cpu: add a tcg_executing flag fred.konrad
@ 2015-07-07 13:23   ` Alex Bennée
  2015-07-07 13:30     ` Frederic Konrad
  0 siblings, 1 reply; 82+ messages in thread
From: Alex Bennée @ 2015-07-07 13:23 UTC (permalink / raw)
  To: fred.konrad
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, agraf,
	qemu-devel, guillaume.delbergue, pbonzini, alistair.francis


fred.konrad@greensocs.com writes:

> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> We need to know whether any other VCPU is executing code or not it's possible
> with this flag.

Reword: "This flag indicates if the vCPU is currently executing TCG code"?
>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> ---
>  cpu-exec.c        | 1 +
>  cpus.c            | 1 +
>  include/qom/cpu.h | 3 +++
>  qom/cpu.c         | 1 +
>  4 files changed, 6 insertions(+)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 0644383..de256d6 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -390,6 +390,7 @@ int cpu_exec(CPUArchState *env)
>          cpu->halted = 0;
>      }
>  
> +    cpu->tcg_executing = 1;
>      current_cpu = cpu;
>  
>      rcu_read_lock();
> diff --git a/cpus.c b/cpus.c
> index 2541c56..0291620 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1377,6 +1377,7 @@ static int tcg_cpu_exec(CPUArchState *env)
>      }
>      qemu_mutex_unlock_iothread();
>      ret = cpu_exec(env);
> +    cpu->tcg_executing = 0;


This is an odd pairing, having the set in cpu_exec but the clear in the
outer call to it. Any particular reason it is unbalanced?
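
I'd expect the pair to live together, e.g. both in the caller (sketch,
eliding the iothread locking and profiler bits):

    static int tcg_cpu_exec(CPUArchState *env)
    {
        CPUState *cpu = ENV_GET_CPU(env);
        int ret;

        cpu->tcg_executing = 1;
        ret = cpu_exec(env);
        cpu->tcg_executing = 0;
        return ret;
    }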

>      qemu_mutex_lock_iothread();
>  #ifdef CONFIG_PROFILER
>      tcg_time += profile_getclock() - ti;
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index af3c9e4..1464afa 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -222,6 +222,7 @@ struct kvm_run;
>   * @stopped: Indicates the CPU has been artificially stopped.
>   * @tcg_exit_req: Set to force TCG to stop executing linked TBs for this
>   *           CPU and return to its top level loop.
> + * @tcg_executing: This TCG thread is in cpu_exec().
>   * @singlestep_enabled: Flags for single-stepping.
>   * @icount_extra: Instructions until next timer event.
>   * @icount_decr: Number of cycles left, with interrupt flag in high bit.
> @@ -315,6 +316,8 @@ struct CPUState {
>         (absolute value) offset as small as possible.  This reduces code
>         size, especially for hosts without large memory offsets.  */
>      volatile sig_atomic_t tcg_exit_req;
> +
> +    volatile int tcg_executing;
>  };
>  
>  QTAILQ_HEAD(CPUTailQ, CPUState);
> diff --git a/qom/cpu.c b/qom/cpu.c
> index 108bfa2..ff41a4c 100644
> --- a/qom/cpu.c
> +++ b/qom/cpu.c
> @@ -249,6 +249,7 @@ static void cpu_common_reset(CPUState *cpu)
>      cpu->icount_decr.u32 = 0;
>      cpu->can_do_io = 0;
>      cpu->exception_index = -1;
> +    cpu->tcg_executing = 0;
>      memset(cpu->tb_jmp_cache, 0, TB_JMP_CACHE_SIZE * sizeof(void *));
>  }

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 08/18] cpu: remove exit_request global.
  2015-07-07 13:04   ` Alex Bennée
@ 2015-07-07 13:25     ` Frederic Konrad
  0 siblings, 0 replies; 82+ messages in thread
From: Frederic Konrad @ 2015-07-07 13:25 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, agraf,
	qemu-devel, guillaume.delbergue, pbonzini, alistair.francis

On 07/07/2015 15:04, Alex Bennée wrote:
> fred.konrad@greensocs.com writes:
>
>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>
>> This removes exit_request global and adds a variable in CPUState for this.
>> Only the flag for the first cpu is used for the moment as we are still with one
>> TCG thread.
>>
>> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
>> ---
>>   cpu-exec.c | 15 ---------------
>>   cpus.c     | 17 ++++++++++++++---
>>   2 files changed, 14 insertions(+), 18 deletions(-)
>>
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index 5d9b518..0644383 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -364,8 +364,6 @@ static void cpu_handle_debug_exception(CPUArchState *env)
>>   
>>   /* main execution loop */
>>   
>> -volatile sig_atomic_t exit_request;
>> -
>>   int cpu_exec(CPUArchState *env)
>>   {
>>       CPUState *cpu = ENV_GET_CPU(env);
>> @@ -394,20 +392,8 @@ int cpu_exec(CPUArchState *env)
>>   
>>       current_cpu = cpu;
>>   
>> -    /* As long as current_cpu is null, up to the assignment just above,
>> -     * requests by other threads to exit the execution loop are expected to
>> -     * be issued using the exit_request global. We must make sure that our
>> -     * evaluation of the global value is performed past the current_cpu
>> -     * value transition point, which requires a memory barrier as well as
>> -     * an instruction scheduling constraint on modern architectures.  */
>> -    smp_mb();
>> -
>>       rcu_read_lock();
>>   
>> -    if (unlikely(exit_request)) {
>> -        cpu->exit_request = 1;
>> -    }
>> -
>>       cc->cpu_exec_enter(cpu);
>>   
>>       /* Calculate difference between guest clock and host clock.
>> @@ -496,7 +482,6 @@ int cpu_exec(CPUArchState *env)
>>                       }
>>                   }
>>                   if (unlikely(cpu->exit_request)) {
>> -                    cpu->exit_request = 0;
>>                       cpu->exception_index = EXCP_INTERRUPT;
>>                       cpu_loop_exit(cpu);
>>                   }
>> diff --git a/cpus.c b/cpus.c
>> index 23c316c..2541c56 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -137,6 +137,8 @@ typedef struct TimersState {
>>   } TimersState;
>>   
>>   static TimersState timers_state;
>> +/* CPU associated to this thread. */
>> +static __thread CPUState *tcg_thread_cpu;
>>   
>>   int64_t cpu_get_icount_raw(void)
>>   {
>> @@ -661,12 +663,18 @@ static void cpu_handle_guest_debug(CPUState *cpu)
>>       cpu->stopped = true;
>>   }
>>   
>> +/**
>> + * cpu_signal
>> + * Signal handler when using TCG.
>> + */
>>   static void cpu_signal(int sig)
>>   {
>>       if (current_cpu) {
>>           cpu_exit(current_cpu);
>>       }
>> -    exit_request = 1;
>> +
>> +    /* FIXME: We might want to check if the cpu is running? */
>> +    tcg_thread_cpu->exit_request = true;
> I guess the potential problem is race conditions here? What happens if
> the cpu is signalled by two different threads for two different reasons?

Hmmm yes, I need to take a look at that and check all the reasons why it
should exit.

But maybe it's OK: the first time the cpu gets the signal it will set
exit_request, and the second time it will just do the same?

>>   }
>>   
>>   #ifdef CONFIG_LINUX
>> @@ -1031,6 +1039,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>>   {
>>       CPUState *cpu = arg;
>>   
>> +    tcg_thread_cpu = cpu;
>>       qemu_tcg_init_cpu_signals();
>>       qemu_thread_get_self(cpu->thread);
>>   
>> @@ -1393,7 +1402,8 @@ static void tcg_exec_all(void)
>>       if (next_cpu == NULL) {
>>           next_cpu = first_cpu;
>>       }
>> -    for (; next_cpu != NULL && !exit_request; next_cpu = CPU_NEXT(next_cpu)) {
>> +    for (; next_cpu != NULL && !first_cpu->exit_request;
>> +           next_cpu = CPU_NEXT(next_cpu)) {
>>           CPUState *cpu = next_cpu;
>>           CPUArchState *env = cpu->env_ptr;
>>   
>> @@ -1410,7 +1420,8 @@ static void tcg_exec_all(void)
>>               break;
>>           }
>>       }
>> -    exit_request = 0;
>> +
>> +    first_cpu->exit_request = 0;
>>   }
>>   
>>   void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 09/18] cpu: add a tcg_executing flag.
  2015-07-07 13:23   ` Alex Bennée
@ 2015-07-07 13:30     ` Frederic Konrad
  0 siblings, 0 replies; 82+ messages in thread
From: Frederic Konrad @ 2015-07-07 13:30 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, agraf,
	qemu-devel, guillaume.delbergue, pbonzini, alistair.francis

On 07/07/2015 15:23, Alex Bennée wrote:
> fred.konrad@greensocs.com writes:
>
>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>
>> We need to know whether any other VCPU is executing code or not it's possible
>> with this flag.
> Reword: "This flag indicates if the vCPU is currently executing TCG code"?

Ok
>> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
>> ---
>>   cpu-exec.c        | 1 +
>>   cpus.c            | 1 +
>>   include/qom/cpu.h | 3 +++
>>   qom/cpu.c         | 1 +
>>   4 files changed, 6 insertions(+)
>>
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index 0644383..de256d6 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -390,6 +390,7 @@ int cpu_exec(CPUArchState *env)
>>           cpu->halted = 0;
>>       }
>>   
>> +    cpu->tcg_executing = 1;
>>       current_cpu = cpu;
>>   
>>       rcu_read_lock();
>> diff --git a/cpus.c b/cpus.c
>> index 2541c56..0291620 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -1377,6 +1377,7 @@ static int tcg_cpu_exec(CPUArchState *env)
>>       }
>>       qemu_mutex_unlock_iothread();
>>       ret = cpu_exec(env);
>> +    cpu->tcg_executing = 0;
>
> This is an odd pairing, having the set in cpu_exec but the clear in the
> outer call to it. Any particular reason it is unbalanced?

True, no particular reason; I should move the clear into cpu_exec.
>
>>       qemu_mutex_lock_iothread();
>>   #ifdef CONFIG_PROFILER
>>       tcg_time += profile_getclock() - ti;
>> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
>> index af3c9e4..1464afa 100644
>> --- a/include/qom/cpu.h
>> +++ b/include/qom/cpu.h
>> @@ -222,6 +222,7 @@ struct kvm_run;
>>    * @stopped: Indicates the CPU has been artificially stopped.
>>    * @tcg_exit_req: Set to force TCG to stop executing linked TBs for this
>>    *           CPU and return to its top level loop.
>> + * @tcg_executing: This TCG thread is in cpu_exec().
>>    * @singlestep_enabled: Flags for single-stepping.
>>    * @icount_extra: Instructions until next timer event.
>>    * @icount_decr: Number of cycles left, with interrupt flag in high bit.
>> @@ -315,6 +316,8 @@ struct CPUState {
>>          (absolute value) offset as small as possible.  This reduces code
>>          size, especially for hosts without large memory offsets.  */
>>       volatile sig_atomic_t tcg_exit_req;
>> +
>> +    volatile int tcg_executing;
>>   };
>>   
>>   QTAILQ_HEAD(CPUTailQ, CPUState);
>> diff --git a/qom/cpu.c b/qom/cpu.c
>> index 108bfa2..ff41a4c 100644
>> --- a/qom/cpu.c
>> +++ b/qom/cpu.c
>> @@ -249,6 +249,7 @@ static void cpu_common_reset(CPUState *cpu)
>>       cpu->icount_decr.u32 = 0;
>>       cpu->can_do_io = 0;
>>       cpu->exception_index = -1;
>> +    cpu->tcg_executing = 0;
>>       memset(cpu->tb_jmp_cache, 0, TB_JMP_CACHE_SIZE * sizeof(void *));
>>   }

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 10/18] tcg: switch on multithread.
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 10/18] tcg: switch on multithread fred.konrad
@ 2015-07-07 13:40   ` Alex Bennée
  0 siblings, 0 replies; 82+ messages in thread
From: Alex Bennée @ 2015-07-07 13:40 UTC (permalink / raw)
  To: fred.konrad
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, agraf,
	qemu-devel, guillaume.delbergue, pbonzini, alistair.francis


fred.konrad@greensocs.com writes:

> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> This switches on multithread.
>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
>
> Changes V5 -> V6:
>   * make qemu_cpu_kick calling qemu_cpu_kick_thread in case of TCG.
> ---
>  cpus.c | 95 ++++++++++++++++++++++++------------------------------------------
>  1 file changed, 34 insertions(+), 61 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index 0291620..08267ed 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -65,7 +65,6 @@
>  
>  #endif /* CONFIG_LINUX */
>  
> -static CPUState *next_cpu;
>  int64_t max_delay;
>  int64_t max_advance;
>  
> @@ -820,8 +819,6 @@ static unsigned iothread_requesting_mutex;
>  
>  static QemuThread io_thread;
>  
> -static QemuThread *tcg_cpu_thread;
> -
>  /* cpu creation */
>  static QemuCond qemu_cpu_cond;
>  /* system init */
> @@ -928,10 +925,13 @@ static void qemu_wait_io_event_common(CPUState *cpu)
>  
>  static void qemu_tcg_wait_io_event(CPUState *cpu)
>  {
> -    while (all_cpu_threads_idle()) {
> -       /* Start accounting real time to the virtual clock if the CPUs
> -          are idle.  */
> -        qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
> +    while (cpu_thread_is_idle(cpu)) {
> +        /* Start accounting real time to the virtual clock if the CPUs
> +         * are idle.
> +         */
> +        if ((all_cpu_threads_idle()) && (cpu->cpu_index == 0)) {
> +            qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
> +        }
>          qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
>      }
>  
> @@ -939,9 +939,7 @@ static void qemu_tcg_wait_io_event(CPUState *cpu)
>          qemu_cond_wait(&qemu_io_proceeded_cond, &qemu_global_mutex);
>      }
>  
> -    CPU_FOREACH(cpu) {
> -        qemu_wait_io_event_common(cpu);
> -    }
> +    qemu_wait_io_event_common(cpu);
>  }
>  
>  static void qemu_kvm_wait_io_event(CPUState *cpu)
> @@ -1033,7 +1031,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
>  #endif
>  }
>  
> -static void tcg_exec_all(void);
> +static void tcg_exec_all(CPUState *cpu);
>  
>  static void *qemu_tcg_cpu_thread_fn(void *arg)
>  {

This function could really do with a little comment header marking it
out as the start of life for each TCG vCPU.
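
Something along these lines perhaps (wording is only a suggestion):

    /* qemu_tcg_cpu_thread_fn:
     * Entry point and main loop for a TCG vCPU thread.  With MTTCG each
     * vCPU gets one of these: it registers itself, signals creation
     * back via qemu_cpu_cond, then alternates between executing
     * translated code for its CPU and sleeping in
     * qemu_tcg_wait_io_event() while the vCPU is idle.
     */
    static void *qemu_tcg_cpu_thread_fn(void *arg);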

> @@ -1044,37 +1042,26 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>      qemu_thread_get_self(cpu->thread);
>  
>      qemu_mutex_lock_iothread();
> -    CPU_FOREACH(cpu) {
> -        cpu->thread_id = qemu_get_thread_id();
> -        cpu->created = true;
> -        cpu->can_do_io = 1;
> -    }
> -    qemu_cond_signal(&qemu_cpu_cond);
> -
> -    /* wait for initial kick-off after machine start */
> -    while (first_cpu->stopped) {
> -        qemu_cond_wait(first_cpu->halt_cond, &qemu_global_mutex);
> -
> -        /* process any pending work */
> -        CPU_FOREACH(cpu) {
> -            qemu_wait_io_event_common(cpu);
> -        }
> -    }
> +    cpu->thread_id = qemu_get_thread_id();
> +    cpu->created = true;
> +    cpu->can_do_io = 1;
>  
> -    /* process any pending work */
> -    exit_request = 1;
> +    qemu_cond_signal(&qemu_cpu_cond);
>  
>      while (1) {
> -        tcg_exec_all();
> +        if (!cpu->stopped) {
> +            tcg_exec_all(cpu);
>  
> -        if (use_icount) {
> -            int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
> +            if (use_icount) {
> +                int64_t deadline =
> +                    qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
>  
> -            if (deadline == 0) {
> -                qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
> +                if (deadline == 0) {
> +                    qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
> +                }
>              }
>          }
> -        qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
> +        qemu_tcg_wait_io_event(cpu);
>      }
>  
>      return NULL;
> @@ -1122,7 +1109,7 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
>  void qemu_cpu_kick(CPUState *cpu)
>  {
>      qemu_cond_broadcast(cpu->halt_cond);
> -    if (!tcg_enabled() && !cpu->thread_kicked) {
> +    if (!cpu->thread_kicked) {
>          qemu_cpu_kick_thread(cpu);
>          cpu->thread_kicked = true;
>      }
> @@ -1232,23 +1219,15 @@ static void qemu_tcg_init_vcpu(CPUState *cpu)
>  
>      cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>      qemu_cond_init(cpu->halt_cond);
> -
> -    /* share a single thread for all cpus with TCG */
> -    if (!tcg_cpu_thread) {
> -        cpu->thread = g_malloc0(sizeof(QemuThread));
> -        snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
> -                 cpu->cpu_index);
> -        qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
> -                           cpu, QEMU_THREAD_JOINABLE);
> +    cpu->thread = g_malloc0(sizeof(QemuThread));
> +    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG", cpu->cpu_index);
> +    qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn, cpu,
> +                       QEMU_THREAD_JOINABLE);
>  #ifdef _WIN32
> -        cpu->hThread = qemu_thread_get_handle(cpu->thread);
> +    cpu->hThread = qemu_thread_get_handle(cpu->thread);
>  #endif
> -        while (!cpu->created) {
> -            qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
> -        }
> -        tcg_cpu_thread = cpu->thread;
> -    } else {
> -        cpu->thread = tcg_cpu_thread;
> +    while (!cpu->created) {
> +        qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
>      }
>  }
>  
> @@ -1393,21 +1372,15 @@ static int tcg_cpu_exec(CPUArchState *env)
>      return ret;
>  }
>  
> -static void tcg_exec_all(void)
> +static void tcg_exec_all(CPUState *cpu)

I'd drop the _all and rename the function to tcg_exec() to avoid confusion.
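
i.e. something like:

    static void tcg_exec(CPUState *cpu);   /* was: tcg_exec_all(void) */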

>  {
>      int r;
> +    CPUArchState *env = cpu->env_ptr;
>  
>      /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
>      qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
>  
> -    if (next_cpu == NULL) {
> -        next_cpu = first_cpu;
> -    }
> -    for (; next_cpu != NULL && !first_cpu->exit_request;
> -           next_cpu = CPU_NEXT(next_cpu)) {
> -        CPUState *cpu = next_cpu;
> -        CPUArchState *env = cpu->env_ptr;
> -
> +    while (!cpu->exit_request) {
>          qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
>                            (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
>  
> @@ -1422,7 +1395,7 @@ static void tcg_exec_all(void)
>          }
>      }
>  
> -    first_cpu->exit_request = 0;
> +    cpu->exit_request = 0;
>  }
>  
>  void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 11/18] cpus: make qemu_cpu_kick_thread public.
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 11/18] cpus: make qemu_cpu_kick_thread public fred.konrad
@ 2015-07-07 15:11   ` Alex Bennée
  0 siblings, 0 replies; 82+ messages in thread
From: Alex Bennée @ 2015-07-07 15:11 UTC (permalink / raw)
  To: fred.konrad
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, agraf,
	qemu-devel, guillaume.delbergue, pbonzini, alistair.francis


fred.konrad@greensocs.com writes:

> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> This makes qemu_cpu_kick_thread public.
>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> ---
>  cpus.c                | 2 +-
>  include/sysemu/cpus.h | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/cpus.c b/cpus.c
> index 08267ed..5f13d73 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1067,7 +1067,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>      return NULL;
>  }
>  
> -static void qemu_cpu_kick_thread(CPUState *cpu)
> +void qemu_cpu_kick_thread(CPUState *cpu)
>  {
>  #ifndef _WIN32
>      int err;
> diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
> index 3f162a9..4f95b72 100644
> --- a/include/sysemu/cpus.h
> +++ b/include/sysemu/cpus.h
> @@ -6,6 +6,7 @@ void qemu_init_cpu_loop(void);
>  void resume_all_vcpus(void);
>  void pause_all_vcpus(void);
>  void cpu_stop_current(void);
> +void qemu_cpu_kick_thread(CPUState *cpu);

Again I couldn't see any use outside of cpus.c, which could be solved by
putting the declaration at the top of the file.
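
i.e. keep it file-local, sketch:

    /* near the top of cpus.c, instead of exporting the symbol: */
    static void qemu_cpu_kick_thread(CPUState *cpu);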

>  
>  void cpu_synchronize_all_states(void);
>  void cpu_synchronize_all_post_reset(void);

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 14/18] add a callback when tb_invalidate is called.
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 14/18] add a callback when tb_invalidate is called fred.konrad
  2015-06-26 16:20   ` Paolo Bonzini
@ 2015-07-07 15:32   ` Alex Bennée
  1 sibling, 0 replies; 82+ messages in thread
From: Alex Bennée @ 2015-07-07 15:32 UTC (permalink / raw)
  To: fred.konrad
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, agraf,
	qemu-devel, guillaume.delbergue, pbonzini, alistair.francis


fred.konrad@greensocs.com writes:

> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> Instead of doing the jump cache invalidation directly in tb_invalidate delay it
> after the exit so we don't have an other CPU trying to execute the code being
> invalidated.
>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> ---
>  translate-all.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 59 insertions(+), 2 deletions(-)
>
> diff --git a/translate-all.c b/translate-all.c
> index ade2269..468648d 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -61,6 +61,7 @@
>  #include "translate-all.h"
>  #include "qemu/bitmap.h"
>  #include "qemu/timer.h"
> +#include "sysemu/cpus.h"
>  
>  //#define DEBUG_TB_INVALIDATE
>  //#define DEBUG_FLUSH
> @@ -966,14 +967,58 @@ static inline void tb_reset_jump(TranslationBlock *tb, int n)
>      tb_set_jmp_target(tb, n, (uintptr_t)(tb->tc_ptr + tb->tb_next_offset[n]));
>  }
>  
> +struct CPUDiscardTBParams {
> +    CPUState *cpu;
> +    TranslationBlock *tb;
> +};
> +
> +static void cpu_discard_tb_from_jmp_cache(void *opaque)
> +{
> +    unsigned int h;
> +    struct CPUDiscardTBParams *params = opaque;
> +
> +    h = tb_jmp_cache_hash_func(params->tb->pc);
> +    if (params->cpu->tb_jmp_cache[h] == params->tb) {
> +        params->cpu->tb_jmp_cache[h] = NULL;
> +    }
> +
> +    g_free(opaque);
> +}
> +
> +static void tb_invalidate_jmp_remove(void *opaque)
> +{
> +    TranslationBlock *tb = opaque;
> +    TranslationBlock *tb1, *tb2;
> +    unsigned int n1;
> +
> +    /* suppress this TB from the two jump lists */
> +    tb_jmp_remove(tb, 0);
> +    tb_jmp_remove(tb, 1);
> +
> +    /* suppress any remaining jumps to this TB */
> +    tb1 = tb->jmp_first;
> +    for (;;) {
> +        n1 = (uintptr_t)tb1 & 3;
> +        if (n1 == 2) {
> +            break;
> +        }
> +        tb1 = (TranslationBlock *)((uintptr_t)tb1 & ~3);
> +        tb2 = tb1->jmp_next[n1];
> +        tb_reset_jump(tb1, n1);
> +        tb1->jmp_next[n1] = NULL;
> +        tb1 = tb2;
> +    }
> +    tb->jmp_first = (TranslationBlock *)((uintptr_t)tb | 2); /* fail safe */
> +}
> +
>  /* invalidate one TB */
>  void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>  {
>      CPUState *cpu;
>      PageDesc *p;
> -    unsigned int h, n1;
> +    unsigned int h;
>      tb_page_addr_t phys_pc;
> -    TranslationBlock *tb1, *tb2;
> +    struct CPUDiscardTBParams *params;
>  
>      tb_lock();
>  
> @@ -996,6 +1041,9 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>  
>      tcg_ctx.tb_ctx.tb_invalidated_flag = 1;
>  
> +#if 0 /*MTTCG*/
> +    TranslationBlock *tb1, *tb2;
> +    unsigned int n1;

We may as well bite the bullet and get some build logic in to
conditionally build MTTCG (with the aim that eventually all of it will be).
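
Something like this, assuming a (hypothetical) CONFIG_MTTCG symbol emitted
by configure; the point is just to turn the "#if 0 /*MTTCG*/" markers into
a real switch:

    #ifdef CONFIG_MTTCG                 /* hypothetical config symbol */
        /* deferred invalidation: each vCPU clears its own jump cache
         * via async_run_on_cpu(), the shared TB is unlinked in safe
         * work once every vCPU has exited */
    #else
        /* single-threaded TCG: invalidate everything in place */
    #endif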

>      /* remove the TB from the hash list */
>      h = tb_jmp_cache_hash_func(tb->pc);
>      CPU_FOREACH(cpu) {
> @@ -1022,6 +1070,15 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>          tb1 = tb2;
>      }
>      tb->jmp_first = (TranslationBlock *)((uintptr_t)tb | 2); /* fail safe */
> +#else
> +    CPU_FOREACH(cpu) {
> +        params = g_malloc(sizeof(struct CPUDiscardTBParams));
> +        params->cpu = cpu;
> +        params->tb = tb;
> +        async_run_on_cpu(cpu, cpu_discard_tb_from_jmp_cache, params);
> +    }
> +    async_run_safe_work_on_cpu(first_cpu, tb_invalidate_jmp_remove, tb);
> +#endif /* MTTCG */
>  
>      tcg_ctx.tb_ctx.tb_phys_invalidate_count++;
>      tb_unlock();

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 15/18] cpu: introduce tlb_flush*_all.
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 15/18] cpu: introduce tlb_flush*_all fred.konrad
  2015-06-26 15:15   ` Paolo Bonzini
@ 2015-07-07 15:52   ` Alex Bennée
  1 sibling, 0 replies; 82+ messages in thread
From: Alex Bennée @ 2015-07-07 15:52 UTC (permalink / raw)
  To: fred.konrad
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, agraf,
	qemu-devel, guillaume.delbergue, pbonzini, alistair.francis


fred.konrad@greensocs.com writes:

> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> Some architectures allow to flush the tlb of other VCPUs. This is not a problem
> when we have only one thread for all VCPUs but it definitely needs to be an
> asynchronous work when we are in true multithreaded work.
>
> TODO: Some test case, I fear some bad results in case a VCPUs execute a barrier
>       or something like that.
>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> ---
>  cputlb.c                | 76 +++++++++++++++++++++++++++++++++++++++++++++++++
>  include/exec/exec-all.h |  2 ++
>  2 files changed, 78 insertions(+)
>
> diff --git a/cputlb.c b/cputlb.c
> index 79fff1c..e5853fd 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -72,6 +72,45 @@ void tlb_flush(CPUState *cpu, int flush_global)
>      tlb_flush_count++;
>  }
>  
> +struct TLBFlushParams {
> +    CPUState *cpu;
> +    int flush_global;
> +};
> +
> +static void tlb_flush_async_work(void *opaque)
> +{
> +    struct TLBFlushParams *params = opaque;
> +
> +    tlb_flush(params->cpu, params->flush_global);
> +    g_free(params);
> +}
> +
> +void tlb_flush_all(int flush_global)
> +{
> +    CPUState *cpu;
> +    struct TLBFlushParams *params;
> +
> +#if 0 /* MTTCG */
> +    CPU_FOREACH(cpu) {
> +        tlb_flush(cpu, flush_global);
> +    }
> +#else

As before we might as well add the build machinery - for one thing I
read that as the first leg being MTTCG ;-)
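
At minimum flipping the condition so the guarded leg is named for what it
actually is would help, e.g. (same hypothetical CONFIG_MTTCG symbol as
before):

    #ifdef CONFIG_MTTCG
        /* per-vCPU flush via async_run_on_cpu(), as in the patch */
    #else
        CPU_FOREACH(cpu) {
            tlb_flush(cpu, flush_global);
        }
    #endif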

> +    CPU_FOREACH(cpu) {
> +        if (qemu_cpu_is_self(cpu)) {
> +            /* async_run_on_cpu handle this case but this just avoid a malloc
> +             * here.
> +             */
> +            tlb_flush(cpu, flush_global);
> +        } else {
> +            params = g_malloc(sizeof(struct TLBFlushParams));
> +            params->cpu = cpu;
> +            params->flush_global = flush_global;
> +            async_run_on_cpu(cpu, tlb_flush_async_work, params);
> +        }
> +    }
> +#endif /* MTTCG */
> +}
> +
>  static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong addr)
>  {
>      if (addr == (tlb_entry->addr_read &
> @@ -124,6 +163,43 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
>      tb_flush_jmp_cache(cpu, addr);
>  }
>  
> +struct TLBFlushPageParams {
> +    CPUState *cpu;
> +    target_ulong addr;
> +};
> +
> +static void tlb_flush_page_async_work(void *opaque)
> +{
> +    struct TLBFlushPageParams *params = opaque;
> +
> +    tlb_flush_page(params->cpu, params->addr);
> +    g_free(params);
> +}
> +
> +void tlb_flush_page_all(target_ulong addr)
> +{
> +    CPUState *cpu;
> +    struct TLBFlushPageParams *params;
> +
> +    CPU_FOREACH(cpu) {
> +#if 0 /* !MTTCG */
> +        tlb_flush_page(cpu, addr);
> +#else
> +        if (qemu_cpu_is_self(cpu)) {
> +            /* async_run_on_cpu handle this case but this just avoid a malloc
> +             * here.
> +             */
> +            tlb_flush_page(cpu, addr);
> +        } else {
> +            params = g_malloc(sizeof(struct TLBFlushPageParams));
> +            params->cpu = cpu;
> +            params->addr = addr;
> +            async_run_on_cpu(cpu, tlb_flush_page_async_work, params);
> +        }
> +#endif /* MTTCG */
> +    }
> +}
> +
>  /* update the TLBs so that writes to code in the virtual page 'addr'
>     can be detected */
>  void tlb_protect_code(ram_addr_t ram_addr)
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 44f3336..484c351 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -96,7 +96,9 @@ bool qemu_in_vcpu_thread(void);
>  void cpu_reload_memory_map(CPUState *cpu);
>  void tcg_cpu_address_space_init(CPUState *cpu, AddressSpace *as);
>  /* cputlb.c */
> +void tlb_flush_page_all(target_ulong addr);
>  void tlb_flush_page(CPUState *cpu, target_ulong addr);
> +void tlb_flush_all(int flush_global);
>  void tlb_flush(CPUState *cpu, int flush_global);
>  void tlb_set_page(CPUState *cpu, target_ulong vaddr,
>                    hwaddr paddr, int prot,

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 15/18] cpu: introduce tlb_flush*_all.
  2015-07-06 14:29             ` Mark Burton
@ 2015-07-07 16:12               ` Alex Bennée
  0 siblings, 0 replies; 82+ messages in thread
From: Alex Bennée @ 2015-07-07 16:12 UTC (permalink / raw)
  To: Mark Burton
  Cc: mttcg, Peter Maydell, Alexander Spyridakis, QEMU Developers,
	Paolo Bonzini, KONRAD Frédéric


Mark Burton <mark.burton@greensocs.com> writes:

> Paolo, Alex, Alexander,
>
> Talking to Fred after the call about ways of avoiding the ‘stop the world’ (or rather ‘sync the world’) - we already discussed this on this thread.
> One thing that would be very helpful would be some test cases around
> this. We could then use Fred’s code to check some of the possible
> solutions out….

Yeah we certainly could do with some. I'm currently investigating the
memory barriers but TLB flushing might be easier to write at first.

>
> I’m not sure if there is wiggle room in Peter’s statement below. Can
> the TLB operation be completed on one core, but not ‘seen’ by other
> cores until they hit an exit…..?

I suspect they can - assuming no other guest synchronisation primitive
was in play, who's to say the other cores weren't at their eventual PC
already? However I suspect the key thing is that the first core doesn't
restart until all the other cores have caught up with their flush
operations.
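
As a toy model of that completion requirement (plain pthreads, nothing
QEMU-specific): the requesting core queues the flushes and then blocks
until every other core has acknowledged them, which is the property the
guest's DSB would rely on.

    #include <pthread.h>
    #include <stdio.h>

    #define NCPUS 4

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t done = PTHREAD_COND_INITIALIZER;
    static int pending = NCPUS;  /* flushes not yet acknowledged */

    static void *vcpu_thread(void *arg)
    {
        (void)arg;
        /* ...the vCPU would flush its own TLB here... */
        pthread_mutex_lock(&lock);
        if (--pending == 0) {
            pthread_cond_signal(&done);
        }
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t th[NCPUS];

        for (int i = 0; i < NCPUS; i++) {
            pthread_create(&th[i], NULL, vcpu_thread, NULL);
        }
        /* The "DSB": the requesting core blocks until every flush has
         * been acknowledged before it executes anything further. */
        pthread_mutex_lock(&lock);
        while (pending > 0) {
            pthread_cond_wait(&done, &lock);
        }
        pthread_mutex_unlock(&lock);
        printf("all TLB flushes acknowledged\n");

        for (int i = 0; i < NCPUS; i++) {
            pthread_join(th[i], NULL);
        }
        return 0;
    }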

>
> Cheers
>
> Mark.
>
>
>> On 26 Jun 2015, at 18:30, Frederic Konrad <fred.konrad@greensocs.com> wrote:
>> 
>> On 26/06/2015 18:08, Peter Maydell wrote:
>>> On 26 June 2015 at 17:01, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>> On 26/06/2015 17:54, Frederic Konrad wrote:
>>>>> So what happen is:
>>>>> An arm instruction want to clear tlb of all VCPUs eg: IS version of
>>>>> TLBIALL.
>>>>> The VCPU which execute the TLBIALL_IS can't flush tlb of other VCPU.
>>>>> It will just ask all VCPU thread to exit and to do tlb_flush hence the
>>>>> async_work.
>>>>> 
>>>>> Maybe the big issue might be memory barrier instruction here which I didn't
>>>>> checked.
>>>> Yeah, ISTR that in some cases you have to wait for other CPUs to
>>>> invalidate the TLB before proceeding.  Maybe it's only when you have a
>>>> dmb instruction, but it's probably simpler for QEMU to always do it
>>>> synchronously.
>>> Yeah, the ARM architectural requirement here is that the TLB
>>> operation is complete after a DSB instruction executes. (True for
>>> any TLB op, not just the all-CPUs ones). NB that we also call
>>> tlb_flush() from target-arm/ code for some things like "we just
>>> updated a system register"; some of those have "must take effect
>>> immediately" semantics.
>>> 
>>> In any case, for generic code we have to also consider the
>>> semantics of non-ARM guests...
>>> 
>>> thanks
>>> -- PMM
>> Yes this is not the case as I implemented it.
>> 
>> The rest of the TB will be executed before the tlb_flush work really happen.
>> The old version did this, was slow and was a mess (if two VCPUs want to tlb_flush
>> at the same time and an other tlb_flush_page.. it becomes tricky..)
>> 
>> I think it's not really terrible if the other VCPU execute some stuff before doing the
>> tlb_flush.? So the solution would be only to cut the TranslationBlock after instruction
>> which require a tlb_flush?
>> 
>> Thanks,
>> Fred
>> 

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 16/18] arm: use tlb_flush*_all
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 16/18] arm: use tlb_flush*_all fred.konrad
@ 2015-07-07 16:14   ` Alex Bennée
  0 siblings, 0 replies; 82+ messages in thread
From: Alex Bennée @ 2015-07-07 16:14 UTC (permalink / raw)
  To: fred.konrad
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, agraf,
	qemu-devel, guillaume.delbergue, pbonzini, alistair.francis


fred.konrad@greensocs.com writes:

> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> This just use the new mechanism to ensure that each VCPU thread flush its own
> VCPU.
>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> ---
>  target-arm/helper.c | 45 +++++++--------------------------------------
>  1 file changed, 7 insertions(+), 38 deletions(-)
>
> diff --git a/target-arm/helper.c b/target-arm/helper.c
> index ad3d5da..1995439 100644
> --- a/target-arm/helper.c
> +++ b/target-arm/helper.c
> @@ -411,41 +411,25 @@ static void tlbimvaa_write(CPUARMState *env, const ARMCPRegInfo *ri,
>  static void tlbiall_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
>                               uint64_t value)
>  {
> -    CPUState *other_cs;
> -
> -    CPU_FOREACH(other_cs) {
> -        tlb_flush(other_cs, 1);
> -    }
> +    tlb_flush_all(1);
>  }
>  
>  static void tlbiasid_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
>                               uint64_t value)
>  {
> -    CPUState *other_cs;
> -
> -    CPU_FOREACH(other_cs) {
> -        tlb_flush(other_cs, value == 0);
> -    }
> +    tlb_flush_all(value == 0);
>  }
>  
>  static void tlbimva_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
>                               uint64_t value)
>  {
> -    CPUState *other_cs;
> -
> -    CPU_FOREACH(other_cs) {
> -        tlb_flush_page(other_cs, value & TARGET_PAGE_MASK);
> -    }
> +    tlb_flush_page_all(value & TARGET_PAGE_MASK);
>  }
>  
>  static void tlbimvaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
>                               uint64_t value)
>  {
> -    CPUState *other_cs;
> -
> -    CPU_FOREACH(other_cs) {
> -        tlb_flush_page(other_cs, value & TARGET_PAGE_MASK);
> -    }
> +    tlb_flush_page_all(value & TARGET_PAGE_MASK);
>  }
>  
>  static const ARMCPRegInfo cp_reginfo[] = {
> @@ -2281,34 +2265,19 @@ static void tlbi_aa64_asid_write(CPUARMState *env, const ARMCPRegInfo *ri,
>  static void tlbi_aa64_va_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
>                                    uint64_t value)
>  {
> -    CPUState *other_cs;
> -    uint64_t pageaddr = sextract64(value << 12, 0, 56);
> -
> -    CPU_FOREACH(other_cs) {
> -        tlb_flush_page(other_cs, pageaddr);
> -    }
> +    tlb_flush_page_all(sextract64(value << 12, 0, 56));
>  }

Personally I'd keep the:

uint64_t pageaddr = sextract64(value << 12, 0, 56);

The compiler will optimise it away but the reader will know what those
bits are.
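
i.e.:

    uint64_t pageaddr = sextract64(value << 12, 0, 56);

    tlb_flush_page_all(pageaddr);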

>  
>  static void tlbi_aa64_vaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
>                                    uint64_t value)
>  {
> -    CPUState *other_cs;
> -    uint64_t pageaddr = sextract64(value << 12, 0, 56);
> -
> -    CPU_FOREACH(other_cs) {
> -        tlb_flush_page(other_cs, pageaddr);
> -    }
> +    tlb_flush_page_all(sextract64(value << 12, 0, 56));
>  }

ditto

>  
>  static void tlbi_aa64_asid_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
>                                    uint64_t value)
>  {
> -    CPUState *other_cs;
> -    int asid = extract64(value, 48, 16);
> -
> -    CPU_FOREACH(other_cs) {
> -        tlb_flush(other_cs, asid == 0);
> -    }
> +    tlb_flush_all(extract64(value, 48, 16) == 0);
>  }

ditto

>  
>  static CPAccessResult aa64_zva_access(CPUARMState *env, const ARMCPRegInfo *ri)

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 17/18] translate-all: introduces tb_flush_safe.
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 17/18] translate-all: introduces tb_flush_safe fred.konrad
@ 2015-07-07 16:16   ` Alex Bennée
  0 siblings, 0 replies; 82+ messages in thread
From: Alex Bennée @ 2015-07-07 16:16 UTC (permalink / raw)
  To: fred.konrad
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, agraf,
	qemu-devel, guillaume.delbergue, pbonzini, alistair.francis


fred.konrad@greensocs.com writes:

> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> tb_flush is not thread safe we definitely need to exit VCPUs to do that.
> This introduces tb_flush_safe which just creates an async safe work which will
> do a tb_flush later.
>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> ---
>  include/exec/exec-all.h |  1 +
>  translate-all.c         | 15 +++++++++++++++
>  2 files changed, 16 insertions(+)
>
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 484c351..b5e4fb3 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -219,6 +219,7 @@ static inline unsigned int tb_phys_hash_func(tb_page_addr_t pc)
>  
>  void tb_free(TranslationBlock *tb);
>  void tb_flush(CPUArchState *env);
> +void tb_flush_safe(CPUArchState *env);
>  void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
>  
>  #if defined(USE_DIRECT_JUMP)
> diff --git a/translate-all.c b/translate-all.c
> index 468648d..8bd8fe8 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -815,6 +815,21 @@ static void page_flush_tb(void)
>      }
>  }
>  
> +static void tb_flush_work(void *opaque)
> +{
> +    CPUArchState *env = opaque;
> +    tb_flush(env);
> +}
> +
> +void tb_flush_safe(CPUArchState *env)
> +{
> +#if 0 /* !MTTCG */
> +    tb_flush(env);
> +#else
> +    async_run_safe_work_on_cpu(ENV_GET_CPU(env), tb_flush_work, env);
> +#endif /* MTTCG */
> +}

Same comments about build system fixups.

> +
>  /* flush all the translation blocks */
>  /* XXX: tb_flush is currently not thread safe */
>  void tb_flush(CPUArchState *env1)

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 18/18] translate-all: (wip) use tb_flush_safe when we can't alloc more tb.
  2015-06-26 14:47 ` [Qemu-devel] [RFC PATCH V6 18/18] translate-all: (wip) use tb_flush_safe when we can't alloc more tb fred.konrad
  2015-06-26 16:21   ` Paolo Bonzini
@ 2015-07-07 16:17   ` Alex Bennée
  2015-07-07 16:23     ` Frederic Konrad
  1 sibling, 1 reply; 82+ messages in thread
From: Alex Bennée @ 2015-07-07 16:17 UTC (permalink / raw)
  To: fred.konrad
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, agraf,
	qemu-devel, guillaume.delbergue, pbonzini, alistair.francis


fred.konrad@greensocs.com writes:

> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> This changes just the tb_flush called from tb_alloc.
>
> TODO:
>  * changes the other tb_flush.
>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> ---
>  translate-all.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/translate-all.c b/translate-all.c
> index 8bd8fe8..9adaffa 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -1147,7 +1147,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>      tb = tb_alloc(pc);
>      if (!tb) {
>          /* flush must be done */
> -        tb_flush(env);
> +        tb_flush_safe(env);

Hold on this is async right? What stops us rolling on and then getting
flushed when the other vCPUs come to a halt?

It deserves a comment at least.
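
Concretely, the ordering I'm worried about (a sketch of the hunk above,
not new code):

    tb = tb_alloc(pc);
    if (!tb) {
        /* only *queues* the flush as safe work... */
        tb_flush_safe(env);
        /* ...so this retry can still return NULL, and the "cannot
         * fail at this point" comment below no longer holds until
         * that safe work has actually run */
        tb = tb_alloc(pc);
    }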

>          /* cannot fail at this point */
>          tb = tb_alloc(pc);
>          /* Don't forget to invalidate previous TB info.  */

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 18/18] translate-all: (wip) use tb_flush_safe when we can't alloc more tb.
  2015-07-07 16:17   ` Alex Bennée
@ 2015-07-07 16:23     ` Frederic Konrad
  0 siblings, 0 replies; 82+ messages in thread
From: Frederic Konrad @ 2015-07-07 16:23 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, a.spyridakis, mark.burton, agraf,
	qemu-devel, guillaume.delbergue, pbonzini, alistair.francis

On 07/07/2015 18:17, Alex Bennée wrote:
> fred.konrad@greensocs.com writes:
>
>> From: KONRAD Frederic <fred.konrad@greensocs.com>
>>
>> This changes just the tb_flush called from tb_alloc.
>>
>> TODO:
>>   * changes the other tb_flush.
>>
>> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
>> ---
>>   translate-all.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/translate-all.c b/translate-all.c
>> index 8bd8fe8..9adaffa 100644
>> --- a/translate-all.c
>> +++ b/translate-all.c
>> @@ -1147,7 +1147,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>>       tb = tb_alloc(pc);
>>       if (!tb) {
>>           /* flush must be done */
>> -        tb_flush(env);
>> +        tb_flush_safe(env);
> Hold on this is async right? What stops us rolling on and then getting
> flushed when the other vCPUs come to a halt?
>
> It deserves a comment at least.
>

No, this needs some synchronization when the CPUs are halted.
There is a problem here spotted by Paolo, though.

In the case of tb_flush we do an async safe work because all VCPU threads
must be outside cpu_exec.

Are you suggesting just exiting everybody and waiting here until all
VCPUs have exited? This is possible here, but it is not possible for the
other case; that's why I preferred a "generic" mechanism.
Fred
>>           /* cannot fail at this point */
>>           tb = tb_alloc(pc);
>>           /* Don't forget to invalidate previous TB info.  */

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [RFC PATCH V6 15/18] cpu: introduce tlb_flush*_all.
  2015-06-26 16:08         ` Peter Maydell
  2015-06-26 16:30           ` Frederic Konrad
  2015-06-26 16:54           ` Paolo Bonzini
@ 2015-07-08 15:35           ` Frederic Konrad
  2 siblings, 0 replies; 82+ messages in thread
From: Frederic Konrad @ 2015-07-08 15:35 UTC (permalink / raw)
  To: Peter Maydell, Paolo Bonzini
  Cc: mttcg, Alexander Graf, Alexander Spyridakis, Mark Burton,
	QEMU Developers, Alistair Francis, Guillaume Delbergue,
	Alex Bennée

On 26/06/2015 18:08, Peter Maydell wrote:
> On 26 June 2015 at 17:01, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> On 26/06/2015 17:54, Frederic Konrad wrote:
>>> So what happen is:
>>> An arm instruction want to clear tlb of all VCPUs eg: IS version of
>>> TLBIALL.
>>> The VCPU which execute the TLBIALL_IS can't flush tlb of other VCPU.
>>> It will just ask all VCPU thread to exit and to do tlb_flush hence the
>>> async_work.
>>>
>>> Maybe the big issue might be memory barrier instruction here which I didn't
>>> checked.
>> Yeah, ISTR that in some cases you have to wait for other CPUs to
>> invalidate the TLB before proceeding.  Maybe it's only when you have a
>> dmb instruction, but it's probably simpler for QEMU to always do it
>> synchronously.
> Yeah, the ARM architectural requirement here is that the TLB
> operation is complete after a DSB instruction executes. (True for
> any TLB op, not just the all-CPUs ones). NB that we also call
> tlb_flush() from target-arm/ code for some things like "we just
> updated a system register"; some of those have "must take effect
> immediately" semantics.
>
> In any case, for generic code we have to also consider the
> semantics of non-ARM guests...
>
> thanks
> -- PMM
Hi,

About that, we plan to:
   * make the tlb_flush work synchronous instead of asynchronous (in the
     case of a tlb_flush_all).
   * break the TranslationBlock after a DSB.

In this case, when we have a tlb_flush_all, all VCPU threads will exit
and wait for all VCPUs to be out of cpu_exec before doing the flush.
They then won't be able to re-enter cpu_exec while any flush remains
pending. So in the case of a DSB, if there is any pending tlb_flush the
VCPU won't be able to enter cpu_exec until it is done, which gives us
the right behaviour I think. A sketch of that entry gate is below.
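
Roughly this shape, just a sketch: pending_flushes and flush_done_cond
are made-up names for whatever bookkeeping we end up with, and the check
would sit at the top of cpu_exec() under the global lock:

    static unsigned pending_flushes;   /* assumed: flushes still queued */
    static QemuCond flush_done_cond;   /* assumed: broadcast at zero */

    /* a VCPU may not start executing TBs while any flush is pending */
    static void wait_for_pending_flushes(void)
    {
        qemu_mutex_lock_iothread();
        while (pending_flushes > 0) {
            qemu_cond_wait(&flush_done_cond, &qemu_global_mutex);
        }
        qemu_mutex_unlock_iothread();
    }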

The obscure part is: what should happen if CPU A flushes its own TLB
and CPU B does a DSB? I'm not sure this is really a problem if CPU A
hasn't finished its TLB operation, as the DSB might have happened
before the flush operation?

Does that make sense?

Thanks,
Fred

^ permalink raw reply	[flat|nested] 82+ messages in thread
