* [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG
@ 2016-08-11 15:23 Alex Bennée
  2016-08-11 15:23 ` [Qemu-devel] [RFC v4 01/28] cpus: make all_vcpus_paused() return bool Alex Bennée
                   ` (29 more replies)
  0 siblings, 30 replies; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:23 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée

This is the fourth iteration of the RFC patch set which aims to
provide the basic framework for MTTCG. I hope this will provide a good
base for discussion at KVM Forum later this month.

Prerequisites
=============

This tree has been built on top of two other series of patches:

  - Reduce lock contention on TCG hot-path (v5, in Paolo's tree)
  - cpu-exec: Safe work in quiescent state (v5, in my tree)

You can find the base tree (based off -rc0) at:

  https://github.com/stsquad/qemu/tree/mttcg/async-safe-work-v5

Changes
=======

Since the last posting there have been a number of updates to the
original patches:

   - more updates to docs/multi-thread-tcg.txt design document
   - clean ups of sleep handling (and safe work integration)
   - split the big enable-multi-thread patch
   - split some re-factoring movement stuff into individual patches

As usual the patches themselves have a revision summary below the --- line.

In addition I've brought forward a number of changes from the original
ARM enabling patches to support the various cputlb operations which
are basically generic anyway. These include:

   - making cross-vCPU tlb_flush operations use async_run_on_cpu
   - making tlb_reset_dirty_range atomically apply the TLB_NOTDIRTY flag
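
To make the first of these concrete, here is a minimal sketch of the
deferred cross-vCPU flush (illustrative only -- the names are mine and
the exact async_run_on_cpu() signature depends on the prerequisite
safe-work series, so this is not the real cputlb patch code):

  #include "qom/cpu.h"        /* CPUState, qemu_cpu_is_self, async_run_on_cpu */
  #include "exec/exec-all.h"  /* tlb_flush */

  /* Defer a TLB flush to the target vCPU's own thread so the TLB is
   * never modified behind a running vCPU's back. */
  static void flush_tlb_work(CPUState *cpu, void *opaque)
  {
      tlb_flush(cpu, 1);            /* runs in cpu's own thread */
  }

  static void flush_other_vcpu_tlb(CPUState *target)
  {
      if (qemu_cpu_is_self(target)) {
          tlb_flush(target, 1);     /* flushing our own TLB is safe */
      } else {
          async_run_on_cpu(target, flush_tlb_work, NULL);
      }
  }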

A copy of the tree can be found at:

  https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4

The series includes all the generic work needed; in theory all that
remains is MTTCG-aware atomics and memory barriers before the various
host/guest combinations can be enabled by default.

In practice the memory barrier problems don't show up with an x86
host. In fact I have created a tree which merges in Emilio's cmpxchg
atomics and happily boots ARMv7 Debian systems without any additional
changes. You can find that at:

  https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4-with-cmpxchg-atomics-v2
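
The core trick of the cmpxchg approach can be sketched in a few lines
(a simplification with my own naming -- see Emilio's series for the
real helpers, which also handle guest address translation and memory
barriers):

  #include <stdint.h>
  #include "qemu/atomic.h"

  /* Back a guest 32-bit compare-and-swap with a host atomic,
   * returning the value seen before the (attempted) swap. */
  static uint32_t guest_cmpxchg32(uint32_t *haddr, uint32_t cmp,
                                  uint32_t newval)
  {
      return atomic_cmpxchg(haddr, cmp, newval);
  }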


Testing
=======

I've tested that this boots ARMv7 Debian and passes both the ARMv7 and ARMv8 kvm-unit-tests with:

  -accel tcg,thread=single

In addition I've run the ARMv7 and ARMv8 kvm-unit-tests from the tcg
and tlbflush groups with:

  -accel tcg,thread=multi

These tests are safe as they don't rely on atomics to work, but they
do exercise the parallel execution, invalidation and flushing of code.
The full invocation of all the tests is:

  echo "Running all tests in Single Thread Mode"
  ./run_tests.sh -t -o "-accel tcg,thread=single -name debug-threads=on"
  echo "Running tlbflush in Multi Thread Mode"
  ./run_tests.sh -t -g tlbflush -o "-accel tcg,thread=multi -name debug-threads=on"
  echo "Running TCG in Multi Thread Mode"
  ./run_tests.sh -t -g tcg -o "-accel tcg,thread=multi -name debug-threads=on"


Performance
===========

You can't do full work-load testing on this tree due to the lack of
atomic support (but I will run some numbers on
mttcg/base-patches-v4-with-cmpxchg-atomics-v2). However, you can
certainly see a run-time improvement with the kvm-unit-tests TCG group.

  retry.py called with ['./run_tests.sh', '-t', '-g', 'tcg', '-o', '-accel tcg,thread=single']
  run 1: ret=0 (PASS), time=1047.147924 (1/1)
  run 2: ret=0 (PASS), time=1071.921204 (2/2)
  run 3: ret=0 (PASS), time=1048.141600 (3/3)
  Results summary:
  0: 3 times (100.00%), avg time 1055.737 (196.70 variance/14.02 deviation)
  Ran command 3 times, 3 passes
  retry.py called with ['./run_tests.sh', '-t', '-g', 'tcg', '-o', '-accel tcg,thread=multi']
  run 1: ret=0 (PASS), time=303.074210 (1/1)
  run 2: ret=0 (PASS), time=304.574991 (2/2)
  run 3: ret=0 (PASS), time=303.327408 (3/3)
  Results summary:
  0: 3 times (100.00%), avg time 303.659 (0.65 variance/0.80 deviation)
  Ran command 3 times, 3 passes

The TCG tests run with -smp 4 on my system. While the TCG tests are
purely CPU bound they do exercise the hot and cold paths of TCG
execution (especially when triggering SMC detection). Even so there is
a clear benefit: the multi-threaded run carries only around 15%
overhead compared to the ideal 264 second elapsed time.
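
For reference the overhead figure works out as follows (the ideal time
assumes perfect scaling of the single-threaded average across the 4
vCPUs):

  ideal    = 1055.7s / 4   ~= 264s
  actual   =  303.7s
  overhead =  303.7 / 264  ~= 1.15, i.e. ~15%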

Alex

Alex Bennée (23):
  cpus: make all_vcpus_paused() return bool
  translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH
  translate-all: add DEBUG_LOCKING asserts
  cpu-exec: include cpu_index in CPU_LOG_EXEC messages
  docs: new design document multi-thread-tcg.txt (DRAFTING)
  linux-user/elfload: ensure mmap_lock() held while setting up
  translate-all: Add assert_(memory|tb)_lock annotations
  target-arm/arm-powerctl: wake up sleeping CPUs
  tcg: move tcg_exec_all and helpers above thread fn
  tcg: cpus rm tcg_exec_all()
  tcg: add kick timer for single-threaded vCPU emulation
  tcg: rename tcg_current_cpu to tcg_current_rr_cpu
  cpus: re-factor out handle_icount_deadline
  tcg: remove global exit_request
  tcg: move locking for tb_invalidate_phys_page_range up
  cpus: tweak sleeping and safe_work rules for MTTCG
  tcg: enable tb_lock() for SoftMMU
  tcg: enable thread-per-vCPU
  atomic: introduce cmpxchg_bool
  cputlb: add assert_cpu_is_self checks
  cputlb: tweak qemu_ram_addr_from_host_nofail reporting
  cputlb: make tlb_reset_dirty safe for MTTCG
  cputlb: make tlb_flush_by_mmuidx safe for MTTCG

Jan Kiszka (1):
  tcg: drop global lock during TCG code execution

KONRAD Frederic (3):
  tcg: protect TBContext with tb_lock.
  tcg: add options for enabling MTTCG
  cputlb: introduce tlb_flush_* async work.

Paolo Bonzini (1):
  tcg: comment on which functions have to be called with tb_lock held

 bsd-user/mmap.c           |   5 +
 cpu-exec-common.c         |  19 +-
 cpu-exec.c                |  41 ++--
 cpus.c                    | 510 +++++++++++++++++++++++++++++-----------------
 cputlb.c                  | 279 ++++++++++++++++++-------
 docs/multi-thread-tcg.txt | 310 ++++++++++++++++++++++++++++
 exec.c                    |  28 +++
 hw/i386/kvmvapic.c        |   4 +
 include/exec/cputlb.h     |   2 -
 include/exec/exec-all.h   |   5 +-
 include/qemu/atomic.h     |   9 +
 include/qom/cpu.h         |  27 +++
 include/sysemu/cpus.h     |   2 +
 linux-user/elfload.c      |   4 +
 linux-user/mmap.c         |   5 +
 memory.c                  |   2 +
 qemu-options.hx           |  20 ++
 qom/cpu.c                 |  10 +
 softmmu_template.h        |  17 ++
 target-arm/Makefile.objs  |   2 +-
 target-arm/arm-powerctl.c |   2 +
 target-i386/smm_helper.c  |   7 +
 tcg/tcg.h                 |   2 +
 translate-all.c           | 175 +++++++++++++---
 vl.c                      |  48 ++++-
 25 files changed, 1227 insertions(+), 308 deletions(-)
 create mode 100644 docs/multi-thread-tcg.txt

-- 
2.7.4


* [Qemu-devel] [RFC v4 01/28] cpus: make all_vcpus_paused() return bool
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
@ 2016-08-11 15:23 ` Alex Bennée
  2016-08-11 15:23 ` [Qemu-devel] [RFC v4 02/28] translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH Alex Bennée
                   ` (28 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:23 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>

---
v3
  - add r-b tags
---
 cpus.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/cpus.c b/cpus.c
index 1ea60e4..1700032 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1219,17 +1219,17 @@ void qemu_mutex_unlock_iothread(void)
     qemu_mutex_unlock(&qemu_global_mutex);
 }
 
-static int all_vcpus_paused(void)
+static bool all_vcpus_paused(void)
 {
     CPUState *cpu;
 
     CPU_FOREACH(cpu) {
         if (!cpu->stopped) {
-            return 0;
+            return false;
         }
     }
 
-    return 1;
+    return true;
 }
 
 void pause_all_vcpus(void)
-- 
2.7.4


* [Qemu-devel] [RFC v4 02/28] translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
  2016-08-11 15:23 ` [Qemu-devel] [RFC v4 01/28] cpus: make all_vcpus_paused() return bool Alex Bennée
@ 2016-08-11 15:23 ` Alex Bennée
  2016-08-11 15:23 ` [Qemu-devel] [RFC v4 03/28] translate-all: add DEBUG_LOCKING asserts Alex Bennée
                   ` (27 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:23 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite

Make the debug define consistent with the others. The flush operation is
all about invalidating TranslationBlocks on flush events.

Also fix up the comment style of the other DEBUG defines for the
benefit of checkpatch.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 translate-all.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 60527ad..b7d5adf 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -56,10 +56,10 @@
 #include "qemu/timer.h"
 #include "exec/log.h"
 
-//#define DEBUG_TB_INVALIDATE
-//#define DEBUG_FLUSH
+/* #define DEBUG_TB_INVALIDATE */
+/* #define DEBUG_TB_FLUSH */
 /* make various TB consistency checks */
-//#define DEBUG_TB_CHECK
+/* #define DEBUG_TB_CHECK */
 
 #if !defined(CONFIG_USER_ONLY)
 /* TB consistency checks only implemented for usermode emulation.  */
@@ -837,7 +837,7 @@ static void do_tb_flush(CPUState *cpu, void *data)
     if (tcg_ctx.tb_ctx.nb_tbs == 0) {
         return;
     }
-#if defined(DEBUG_FLUSH)
+#if defined(DEBUG_TB_FLUSH)
     printf("qemu: flush code_size=%ld nb_tbs=%d avg_tb_size=%ld\n",
            (unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer),
            tcg_ctx.tb_ctx.nb_tbs, tcg_ctx.tb_ctx.nb_tbs > 0 ?
-- 
2.7.4


* [Qemu-devel] [RFC v4 03/28] translate-all: add DEBUG_LOCKING asserts
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
  2016-08-11 15:23 ` [Qemu-devel] [RFC v4 01/28] cpus: make all_vcpus_paused() return bool Alex Bennée
  2016-08-11 15:23 ` [Qemu-devel] [RFC v4 02/28] translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH Alex Bennée
@ 2016-08-11 15:23 ` Alex Bennée
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 04/28] cpu-exec: include cpu_index in CPU_LOG_EXEC messages Alex Bennée
                   ` (26 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:23 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite, Riku Voipio

This adds asserts to check the locking on the various translation
engine structures. There are two sets of structures that are protected
by locks.

The first is the l1_map and PageDesc structures used to track which
translation blocks are associated with which physical addresses. In
user-mode this is covered by the mmap_lock.

The second is the TB context related structures, which are protected
by tb_lock; at this point tb_lock is also user-mode only.

Currently the asserts do nothing in SoftMMU mode but this will change
for MTTCG.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 bsd-user/mmap.c         |  5 +++++
 include/exec/exec-all.h |  1 +
 linux-user/mmap.c       |  5 +++++
 translate-all.c         | 41 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 52 insertions(+)

diff --git a/bsd-user/mmap.c b/bsd-user/mmap.c
index 610f91b..ee59073 100644
--- a/bsd-user/mmap.c
+++ b/bsd-user/mmap.c
@@ -42,6 +42,11 @@ void mmap_unlock(void)
     }
 }
 
+bool have_mmap_lock(void)
+{
+    return mmap_lock_count > 0 ? true : false;
+}
+
 /* Grab lock to make sure things are in a consistent state after fork().  */
 void mmap_fork_start(void)
 {
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index ed5b9c8..d5f29bd 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -371,6 +371,7 @@ void tlb_fill(CPUState *cpu, target_ulong addr, MMUAccessType access_type,
 #if defined(CONFIG_USER_ONLY)
 void mmap_lock(void);
 void mmap_unlock(void);
+bool have_mmap_lock(void);
 
 static inline tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
 {
diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index c4371d9..19aeec5 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -43,6 +43,11 @@ void mmap_unlock(void)
     }
 }
 
+bool have_mmap_lock(void)
+{
+    return mmap_lock_count > 0 ? true : false;
+}
+
 /* Grab lock to make sure things are in a consistent state after fork().  */
 void mmap_fork_start(void)
 {
diff --git a/translate-all.c b/translate-all.c
index b7d5adf..7ab1fd6 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -31,6 +31,7 @@
 #include "tcg.h"
 #if defined(CONFIG_USER_ONLY)
 #include "qemu.h"
+#include "exec/exec-all.h"
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
 #include <sys/param.h>
 #if __FreeBSD_version >= 700104
@@ -58,6 +59,7 @@
 
 /* #define DEBUG_TB_INVALIDATE */
 /* #define DEBUG_TB_FLUSH */
+/* #define DEBUG_LOCKING */
 /* make various TB consistency checks */
 /* #define DEBUG_TB_CHECK */
 
@@ -66,6 +68,28 @@
 #undef DEBUG_TB_CHECK
 #endif
 
+/* Access to the various translations structures need to be serialised via locks
+ * for consistency. This is automatic for SoftMMU based system
+ * emulation due to its single threaded nature. In user-mode emulation
+ * access to the memory related structures are protected with the
+ * mmap_lock.
+ */
+#ifdef DEBUG_LOCKING
+#define DEBUG_MEM_LOCKS 1
+#else
+#define DEBUG_MEM_LOCKS 0
+#endif
+
+#ifdef CONFIG_SOFTMMU
+#define assert_memory_lock() do { /* nothing */ } while (0)
+#else
+#define assert_memory_lock() do {               \
+        if (DEBUG_MEM_LOCKS) {                  \
+            g_assert(have_mmap_lock());         \
+        }                                       \
+    } while (0)
+#endif
+
 #define SMC_BITMAP_USE_THRESHOLD 10
 
 typedef struct PageDesc {
@@ -153,6 +177,23 @@ void tb_lock_reset(void)
 #endif
 }
 
+#ifdef DEBUG_LOCKING
+#define DEBUG_TB_LOCKS 1
+#else
+#define DEBUG_TB_LOCKS 0
+#endif
+
+#ifdef CONFIG_SOFTMMU
+#define assert_tb_lock() do { /* nothing */ } while (0)
+#else
+#define assert_tb_lock() do {               \
+        if (DEBUG_TB_LOCKS) {               \
+            g_assert(have_tb_lock);         \
+        }                                   \
+    } while (0)
+#endif
+
+
 static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
 
 void cpu_gen_init(void)
-- 
2.7.4


* [Qemu-devel] [RFC v4 04/28] cpu-exec: include cpu_index in CPU_LOG_EXEC messages
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (2 preceding siblings ...)
  2016-08-11 15:23 ` [Qemu-devel] [RFC v4 03/28] translate-all: add DEBUG_LOCKING asserts Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-09-07  2:21   ` Richard Henderson
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 05/28] docs: new design document multi-thread-tcg.txt (DRAFTING) Alex Bennée
                   ` (25 subsequent siblings)
  29 siblings, 1 reply; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite

Even more important when debugging MTTCG is seeing which vCPU is
currently executing.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 cpu-exec.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index f8cfdbd..f8fbf0d 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -143,8 +143,9 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
     uint8_t *tb_ptr = itb->tc_ptr;
 
     qemu_log_mask_and_addr(CPU_LOG_EXEC, itb->pc,
-                           "Trace %p [" TARGET_FMT_lx "] %s\n",
-                           itb->tc_ptr, itb->pc, lookup_symbol(itb->pc));
+                           "Trace %p [%d: " TARGET_FMT_lx "] %s\n",
+                           itb->tc_ptr, cpu->cpu_index, itb->pc,
+                           lookup_symbol(itb->pc));
 
 #if defined(DEBUG_DISAS)
     if (qemu_loglevel_mask(CPU_LOG_TB_CPU)) {
-- 
2.7.4


* [Qemu-devel] [RFC v4 05/28] docs: new design document multi-thread-tcg.txt (DRAFTING)
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (3 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 04/28] cpu-exec: include cpu_index in CPU_LOG_EXEC messages Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 06/28] tcg: comment on which functions have to be called with tb_lock held Alex Bennée
                   ` (24 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée

This is a current DRAFT of a design proposal for upgrading TCG emulation
to take advantage of modern CPUs by running a thread-per-CPU. The
document goes through the various areas of the code affected by such a
change and proposes design requirements for each part of the solution.

Where solutions have been merged upstream the approach is described
underneath the design requirements. Where the solutions are still under
discussion they are marked with (Current solution[s]) to document the
approaches currently being used. At present there are several proposed
solutions for modelling atomic operations.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1
  - initial version
v2
  - update discussion on locks
  - bit more detail on vCPU scheduling
  - explicitly mention Translation Blocks
  - emulated hardware state already covered by iomutex
  - a few minor rewords
v3
  - mention this covers system-mode
  - describe main main-loop and lookup hot-path
  - mention multi-concurrent-reader lookups
  - enumerate reasons for invalidation
  - add more details on lookup structures
  - describe the softmmu hot-path better
  - mention store-after-load barrier problem
v4
  - mention some cross-over between linux-user/system emulation
  - various minor grammar and scanning fixes
  - fix reference to tb_ctx.htbale
  - describe the solution for hot-path
  - more detail on TB flushing and invalidation
  - add (Current solution) following design requirements
  - more detail on iothread/BQL mutex
  - mention implicit memory barriers
  - add links to current LL/SC and cmpxchg patch sets
  - add TLB flag setting as an additional requirement
---
 docs/multi-thread-tcg.txt | 310 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 310 insertions(+)
 create mode 100644 docs/multi-thread-tcg.txt

diff --git a/docs/multi-thread-tcg.txt b/docs/multi-thread-tcg.txt
new file mode 100644
index 0000000..9f676c0
--- /dev/null
+++ b/docs/multi-thread-tcg.txt
@@ -0,0 +1,310 @@
+Copyright (c) 2015 Linaro Ltd.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.  See
+the COPYING file in the top-level directory.
+
+STATUS: DRAFTING
+
+Introduction
+============
+
+This document outlines the design for multi-threaded TCG system-mode
+emulation. The current user-mode emulation mirrors the thread
+structure of the translated executable. Some of the work will be
+applicable to both system and linux-user emulation.
+
+The original system-mode TCG implementation was single threaded and
+dealt with multiple CPUs with simple round-robin scheduling. This
+simplified a lot of things but became increasingly limited as systems
+being emulated gained additional cores and per-core performance gains
+for host systems started to level off.
+
+vCPU Scheduling
+===============
+
+We introduce a new running mode where each vCPU will run on its own
+user-space thread. This will be enabled by default for all FE/BE
+combinations that have had the required work done to support this
+safely.
+
+In the general case of running translated code there should be no
+inter-vCPU dependencies and all vCPUs should be able to run at full
+speed. Synchronisation will only be required while accessing internal
+shared data structures or when the emulated architecture requires a
+coherent representation of the emulated machine state.
+
+Shared Data Structures
+======================
+
+Main Run Loop
+-------------
+
+Even when no code is being generated there are a number of
+structures associated with the hot-path through the main run-loop,
+namely those used to look up the next translation block to execute.
+These include:
+
+    tb_jmp_cache (per-vCPU, cache of recent jumps)
+    tb_ctx.htable (global hash table, phys address->tb lookup)
+
+As TB linking only occurs between blocks in the same page, this code
+is critical to performance: looking up the next TB to execute is the
+most common reason to exit the generated code.
+
+DESIGN REQUIREMENT: Make access to lookup structures safe with
+multiple reader/writer threads. Minimise lock contention in doing so.
+
+The hot-path avoids using locks where possible. The tb_jmp_cache is
+updated with atomic accesses to ensure consistent results. The
+fall-back QHT-based hash table is also designed for lockless lookups.
+Locks are only taken when code generation is required or
+TranslationBlocks have their block-to-block jumps patched.
+
+Global TCG State
+----------------
+
+We need to protect the entire code generation cycle including any post
+generation patching of the translated code. This also implies a shared
+translation buffer which contains code running on all cores. Any
+execution path that comes to the main run loop will need to hold a
+mutex for code generation. This also includes times when we need to
+flush code or entries from any shared lookups/caches. Structures held
+on a per-vCPU basis won't need locking unless other vCPUs will need
+to modify them.
+
+DESIGN REQUIREMENT: Add locking around all code generation and TB
+patching.
+
+Translation Blocks
+------------------
+
+Currently the whole system shares a single code generation buffer
+which when full will force a flush of all translations and start from
+scratch again. Some operations also force a full flush of translations
+including:
+
+  - debugging operations (breakpoint insertion/removal)
+  - some CPU helper functions
+
+This is done with the async_safe_run_on_cpu() mechanism to ensure all
+vCPUs are quiescent when changes are being made to shared global
+structures.
+
+More granular translation invalidation events are typically due
+to a change of the state of a physical page:
+
+  - code modification (self modify code, patching code)
+  - page changes (new page mapping in linux-user mode)
+
+While setting the invalid flag in a TranslationBlock will stop it
+being used when looked up in the hot-path, there are a number of
+other book-keeping structures that need to be safely cleared.
+
+Any TranslationBlocks which have been patched to jump directly to a
+now invalid block need their jump patches reverted so they will
+return to the C code.
+
+There are a number of look-up caches that need to be properly
+updated, including:
+
+  - jump lookup cache
+  - the physical-to-tb lookup hash table
+  - the global page table
+
+The global page table (l1_map) provides a multi-level look-up for
+PageDesc structures, which contain pointers to the start of a linked
+list of all Translation Blocks in that page (see page_next).
+
+Both the jump patching and the page cache involve linked lists that
+the invalidated TranslationBlock needs to be removed from.
+
+DESIGN REQUIREMENT: Safely handle invalidation of TBs
+                      - safely patch/revert direct jumps
+                      - remove central PageDesc lookup entries
+                      - ensure lookup caches/hashes are safely updated
+
+(Current solution)
+
+The direct jumps themselves are updated atomically by the TCG
+tb_set_jmp_target() code. Modifications to the linked lists that
+allow searching for linked pages are done under the protection of
+tb_lock().
+
+The global page table is protected by the tb_lock() in system-mode and
+mmap_lock() in linux-user mode.
+
+The lookup caches are updated atomically and the lookup hash uses QHT
+which is designed for concurrent safe lookup.
+
+
+Memory maps and TLBs
+--------------------
+
+The memory handling code is fairly critical to the speed of memory
+access in the emulated system. The SoftMMU code is designed so the
+hot-path can be handled entirely within translated code. This is
+handled with a per-vCPU TLB structure which once populated will allow
+a series of accesses to the page to occur without exiting the
+translated code. It is possible to set flags in the TLB address which
+will ensure the slow-path is taken for each access. This can be done
+to support:
+
+  - Memory regions (dividing up access to PIO, MMIO and RAM)
+  - Dirty page tracking (for code gen, SMC detection, migration and display)
+  - Virtual TLB (for translating guest address->real address)
+
+When the TLB tables are updated by a vCPU thread other than its own
+we need to ensure it is done in a safe way so that no inconsistent
+state is seen by the owning vCPU thread.
+
+Some operations require updating the TLBs of a number of vCPUs at
+the same time in a synchronised manner.
+
+DESIGN REQUIREMENTS:
+
+  - TLB Flush All/Page
+    - can be across-vCPUs
+    - cross vCPU TLB flush may need other vCPU brought to halt
+    - change may need to be visible to the calling vCPU immediately
+  - TLB Flag Update
+    - usually cross-vCPU
+    - want change to be visible as soon as possible
+  - TLB Update (update a CPUTLBEntry, via tlb_set_page_with_attrs)
+    - This is a per-vCPU table - by definition can't race
+    - updated by its own thread when the slow-path is forced
+
+(Current solution)
+
+When we detect a cross-vCPU operation we use async_safe_run_on_cpu()
+to defer running the operation until all vCPUs are quiescent. We can
+jump to the top of the run loop if we want the change to be visible
+to the calling vCPU straight away.
+
+Emulated hardware state
+-----------------------
+
+Currently thanks to KVM work any access to IO memory is automatically
+protected by the global iothread mutex, also known as the BQL (Big
+Qemu Lock). Any IO region that doesn't use the global mutex is
+expected to do its own locking.
+
+As the global iothread mutex is shared across the system we push the
+use of the lock as far down into the TCG code as possible to minimise
+contention.
+
+Memory Consistency
+==================
+
+Between emulated guests and host systems there is a range of memory
+consistency models. Even emulating weakly ordered systems on strongly
+ordered hosts needs to ensure things like store-after-load re-ordering
+can be prevented when the guest wants to.
+
+Memory Barriers
+---------------
+
+Barriers (sometimes known as fences) provide a mechanism for software
+to enforce a particular ordering of memory operations from the point
+of view of external observers (e.g. another processor core). They
+can apply to all memory operations or just to loads or stores.
+
+The Linux kernel has an excellent write-up on the various forms of
+memory barrier and the guarantees they can provide [1].
+
+Barriers are often wrapped around synchronisation primitives to
+provide explicit memory ordering semantics. However they can be used
+by themselves to provide safe lockless access by ensuring, for
+example, that a change to a signal flag only becomes visible once the
+changes to the payload are.
+
+DESIGN REQUIREMENT: Add a new tcg_memory_barrier op
+
+This would enforce a strong load/store ordering so all loads/stores
+complete at the memory barrier. On single-core non-SMP strongly
+ordered backends this could become a NOP.
+
+Aside from explicit standalone memory barrier instructions there are
+also implicit memory ordering semantics which come with each guest
+memory access instruction. For example all x86 loads/stores come with
+fairly strong guarantees of sequential consistency whereas ARM has
+special variants of load/store instructions that imply acquire/release
+semantics.
+
+In the case of a strongly ordered guest architecture being emulated on
+a weakly ordered host the scope for a heavy performance impact is
+quite high.
+
+DESIGN REQUIREMENTS: Be efficient with use of memory barriers
+       - host systems with stronger implied guarantees can skip some barriers
+       - merge consecutive barriers to the strongest one
+
+(Current solution)
+
+As the emulation of barriers is also a problem for linux-user
+emulation the current patch set is being developed separately from the
+MTTCG effort and will hopefully be merged before the rest of MTTCG.
+The latest iterations can be found at:
+
+    http://lists.nongnu.org/archive/html/qemu-devel/2016-07/msg03283.html
+
+Memory Control and Maintenance
+------------------------------
+
+This includes a class of instructions for controlling system cache
+behaviour. While QEMU doesn't model cache behaviour these instructions
+are often seen when code modification has taken place to ensure the
+changes take effect.
+
+Synchronisation Primitives
+--------------------------
+
+There are two broad types of synchronisation primitives found in
+modern ISAs: atomic instructions and exclusive regions.
+
+The first type offers a simple atomic instruction which will guarantee
+some sort of test and conditional store will be truly atomic w.r.t.
+other cores sharing access to the memory. The classic example is the
+x86 cmpxchg instruction.
+
+The second type offers a pair of load/store instructions which
+guarantee that a region of memory has not been touched between the
+load and store instructions. An example of this is ARM's ldrex/strex
+pair, where the strex instruction will return a flag indicating a
+successful store only if no other CPU has accessed the memory region
+since the ldrex.
+
+Traditionally TCG has generated a series of operations that work
+because they are within the context of a single translation block so
+will have completed before another CPU is scheduled. However with
+the ability to have multiple threads running to emulate multiple CPUs
+we will need to explicitly expose these semantics.
+
+DESIGN REQUIREMENTS:
+  - Support classic atomic instructions
+  - Support load/store exclusive (or load link/store conditional) pairs
+  - Generic enough infrastructure to support all guest architectures
+CURRENT OPEN QUESTIONS:
+  - How problematic is the ABA problem in general?
+  - How do we solve atomic problems where the host's largest atomic
+  operation is smaller than the guest's (64-bit on 32-bit)?
+
+(Current solutions)
+
+There are a number of solutions currently being proposed. The eventual
+solution will likely involve a trade off between performance and
+completeness. For example in practice most guest synchronisation
+sequences are unlikely to trigger the ABA problem.
+
+The current patch series are:
+
+  - Alvise Rigo's Slow-path for atomic instruction translation
+    http://lists.nongnu.org/archive/html/qemu-devel/2016-04/msg02913.html
+    http://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg04789.html
+
+  - Emilio G. Cota's cmpxchg-based emulation of atomics
+    http://lists.nongnu.org/archive/html/qemu-devel/2016-07/msg00152.html
+
+==========
+
+[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/Documentation/memory-barriers.txt
-- 
2.7.4
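
As an aside, the signal-flag/payload pattern described in the barriers
section of the document above boils down to something like the
following minimal sketch using QEMU's existing qemu/atomic.h wrappers
(illustrative only, not code from this series):

  #include <assert.h>
  #include <stdbool.h>
  #include "qemu/atomic.h"

  static int payload;
  static bool flag;

  static void producer(void)
  {
      payload = 42;
      smp_wmb();                /* payload write completes first... */
      atomic_set(&flag, true);  /* ...then publish the flag */
  }

  static void consumer(void)
  {
      if (atomic_read(&flag)) {
          smp_rmb();                /* read flag before payload */
          assert(payload == 42);    /* guaranteed by barrier pairing */
      }
  }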


* [Qemu-devel] [RFC v4 06/28] tcg: comment on which functions have to be called with tb_lock held
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (4 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 05/28] docs: new design document multi-thread-tcg.txt (DRAFTING) Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-09-07  2:30   ` Richard Henderson
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 07/28] linux-user/elfload: ensure mmap_lock() held while setting up Alex Bennée
                   ` (23 subsequent siblings)
  29 siblings, 1 reply; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite

From: Paolo Bonzini <pbonzini@redhat.com>

softmmu requires more functions to be thread-safe, because translation
blocks can be invalidated from e.g. notdirty callbacks.  Probably the
same holds for user-mode emulation, it's just that no one has ever
tried to produce coherent locking there.

This patch will guide the introduction of more tb_lock and tb_unlock
calls for system emulation.

Note that after this patch some (most) of the mentioned functions are
still called outside tb_lock/tb_unlock.  The next one will rectify this.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1(ajb):
  - just s-o-b
v2
  - clarify write lock on tb_jump_cache
v3
  - drop RCU comment for debug stuff (separate commit now)
---
 include/exec/exec-all.h |  1 +
 include/qom/cpu.h       |  3 +++
 tcg/tcg.h               |  2 ++
 translate-all.c         | 30 ++++++++++++++++++++++++------
 4 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index d5f29bd..d73b0e6 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -315,6 +315,7 @@ static inline void tb_set_jmp_target(TranslationBlock *tb,
 
 #endif
 
+/* Called with tb_lock held.  */
 static inline void tb_add_jump(TranslationBlock *tb, int n,
                                TranslationBlock *tb_next)
 {
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 060a473..8ee770a 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -324,7 +324,10 @@ struct CPUState {
     MemoryRegion *memory;
 
     void *env_ptr; /* CPUArchState */
+
+    /* Writes protected by tb_lock, reads not thread-safe  */
     struct TranslationBlock *tb_jmp_cache[TB_JMP_CACHE_SIZE];
+
     struct GDBRegisterState *gdb_regs;
     int gdb_num_regs;
     int gdb_num_g_regs;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 6046dcd..a7f34db 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -712,6 +712,7 @@ static inline bool tcg_op_buf_full(void)
 
 /* pool based memory allocation */
 
+/* tb_lock must be held for tcg_malloc_internal. */
 void *tcg_malloc_internal(TCGContext *s, int size);
 void tcg_pool_reset(TCGContext *s);
 void tcg_pool_delete(TCGContext *s);
@@ -720,6 +721,7 @@ void tb_lock(void);
 void tb_unlock(void);
 void tb_lock_reset(void);
 
+/* Called with tb_lock held.  */
 static inline void *tcg_malloc(int size)
 {
     TCGContext *s = &tcg_ctx;
diff --git a/translate-all.c b/translate-all.c
index 7ab1fd6..a3448e1 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -288,7 +288,9 @@ static int encode_search(TranslationBlock *tb, uint8_t *block)
     return p - block;
 }
 
-/* The cpu state corresponding to 'searched_pc' is restored.  */
+/* The cpu state corresponding to 'searched_pc' is restored.
+ * Called with tb_lock held.
+ */
 static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
                                      uintptr_t searched_pc)
 {
@@ -438,6 +440,7 @@ static void page_init(void)
 }
 
 /* If alloc=1:
+ * Called with tb_lock held for system emulation.
  * Called with mmap_lock held for user-mode emulation.
  */
 static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
@@ -802,8 +805,12 @@ bool tcg_enabled(void)
     return tcg_ctx.code_gen_buffer != NULL;
 }
 
-/* Allocate a new translation block. Flush the translation buffer if
-   too many translation blocks or too much generated code. */
+/*
+ * Allocate a new translation block. Flush the translation buffer if
+ * too many translation blocks or too much generated code.
+ *
+ * Called with tb_lock held.
+ */
 static TranslationBlock *tb_alloc(target_ulong pc)
 {
     TranslationBlock *tb;
@@ -818,6 +825,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
     return tb;
 }
 
+/* Called with tb_lock held.  */
 void tb_free(TranslationBlock *tb)
 {
     /* In practice this is mostly used for single use temporary TB
@@ -927,7 +935,11 @@ do_tb_invalidate_check(struct qht *ht, void *p, uint32_t hash, void *userp)
     }
 }
 
-static void tb_invalidate_check(target_ulong address)
+/* verify that all the pages have correct rights for code
+ *
+ * Called with tb_lock held.
+ */
+static void tb_invalidate_check(target_ulong address)
 {
     address &= TARGET_PAGE_MASK;
     qht_iter(&tcg_ctx.tb_ctx.htable, do_tb_invalidate_check, &address);
@@ -1031,7 +1043,10 @@ static inline void tb_jmp_unlink(TranslationBlock *tb)
     }
 }
 
-/* invalidate one TB */
+/* invalidate one TB
+ *
+ * Called with tb_lock held.
+ */
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 {
     CPUState *cpu;
@@ -1465,7 +1480,9 @@ void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
     }
     if (!p->code_bitmap &&
         ++p->code_write_count >= SMC_BITMAP_USE_THRESHOLD) {
-        /* build code bitmap */
+        /* build code bitmap.  FIXME: writes should be protected by
+         * tb_lock, reads by tb_lock or RCU.
+         */
         build_page_bitmap(p);
     }
     if (p->code_bitmap) {
@@ -1606,6 +1623,7 @@ void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr)
 }
 #endif /* !defined(CONFIG_USER_ONLY) */
 
+/* Called with tb_lock held.  */
 void tb_check_watchpoint(CPUState *cpu)
 {
     TranslationBlock *tb;
-- 
2.7.4


* [Qemu-devel] [RFC v4 07/28] linux-user/elfload: ensure mmap_lock() held while setting up
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (5 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 06/28] tcg: comment on which functions have to be called with tb_lock held Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-09-07  2:34   ` Richard Henderson
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 08/28] translate-all: Add assert_(memory|tb)_lock annotations Alex Bennée
                   ` (22 subsequent siblings)
  29 siblings, 1 reply; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée, Riku Voipio

Future patches will enforce the holding of mmap_lock() when we are
manipulating internal memory structures. Technically it doesn't matter
in the case of elfload as we haven't started executing yet. However it
is easier to grab the lock when required than to special-case the
translate-all API.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v4
  - split from assert patch
---
 linux-user/elfload.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index f807baf..4b125b9 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1838,6 +1838,8 @@ static void load_elf_image(const char *image_name, int image_fd,
     info->pt_dynamic_addr = 0;
 #endif
 
+    mmap_lock();
+
     /* Find the maximum size of the image and allocate an appropriate
        amount of memory to handle that.  */
     loaddr = -1, hiaddr = 0;
@@ -1998,6 +2000,8 @@ static void load_elf_image(const char *image_name, int image_fd,
         load_symbols(ehdr, image_fd, load_bias);
     }
 
+    mmap_unlock();
+
     close(image_fd);
     return;
 
-- 
2.7.4


* [Qemu-devel] [RFC v4 08/28] translate-all: Add assert_(memory|tb)_lock annotations
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (6 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 07/28] linux-user/elfload: ensure mmap_lock() held while setting up Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-09-07  2:41   ` Richard Henderson
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 09/28] tcg: protect TBContext with tb_lock Alex Bennée
                   ` (21 subsequent siblings)
  29 siblings, 1 reply; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite

This adds assert_(memory|tb)_lock calls to all public APIs which are
documented as needing the locks held for linux-user mode. The asserts
are NOPs for system-mode although these will be converted when MTTCG
is enabled.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v4
  - mention tb_locks as well
  - tweak commit message
---
 translate-all.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/translate-all.c b/translate-all.c
index a3448e1..213685c 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -449,6 +449,10 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
     void **lp;
     int i;
 
+    if (alloc) {
+        assert_memory_lock();
+    }
+
     /* Level 1.  Always allocated.  */
     lp = l1_map + ((index >> V_L1_SHIFT) & (V_L1_SIZE - 1));
 
@@ -815,6 +819,8 @@ static TranslationBlock *tb_alloc(target_ulong pc)
 {
     TranslationBlock *tb;
 
+    assert_tb_lock();
+
     if (tcg_ctx.tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks) {
         return NULL;
     }
@@ -828,6 +834,8 @@ static TranslationBlock *tb_alloc(target_ulong pc)
 /* Called with tb_lock held.  */
 void tb_free(TranslationBlock *tb)
 {
+    assert_tb_lock();
+
     /* In practice this is mostly used for single use temporary TB
        Ignore the hard cases and just back up if this TB happens to
        be the last one generated.  */
@@ -1054,6 +1062,8 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
     uint32_t h;
     tb_page_addr_t phys_pc;
 
+    assert_tb_lock();
+
     atomic_set(&tb->invalid, true);
 
     /* remove the TB from the hash list */
@@ -1111,7 +1121,7 @@ static void build_page_bitmap(PageDesc *p)
             tb_end = tb_start + tb->size;
             if (tb_end > TARGET_PAGE_SIZE) {
                 tb_end = TARGET_PAGE_SIZE;
-            }
+             }
         } else {
             tb_start = 0;
             tb_end = ((tb->pc + tb->size) & ~TARGET_PAGE_MASK);
@@ -1134,6 +1144,8 @@ static inline void tb_alloc_page(TranslationBlock *tb,
     bool page_already_protected;
 #endif
 
+    assert_memory_lock();
+
     tb->page_addr[n] = page_addr;
     p = page_find_alloc(page_addr >> TARGET_PAGE_BITS, 1);
     tb->page_next[n] = p->first_tb;
@@ -1190,6 +1202,8 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
 {
     uint32_t h;
 
+    assert_memory_lock();
+
     /* add in the page list */
     tb_alloc_page(tb, 0, phys_pc & TARGET_PAGE_MASK);
     if (phys_page2 != -1) {
@@ -1221,6 +1235,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 #ifdef CONFIG_PROFILER
     int64_t ti;
 #endif
+    assert_memory_lock();
 
     phys_pc = get_page_addr_code(env, pc);
     if (use_icount && !(cflags & CF_IGNORE_ICOUNT)) {
@@ -1349,6 +1364,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
  */
 void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
 {
+    assert_memory_lock();
+
     while (start < end) {
         tb_invalidate_phys_page_range(start, end, 0);
         start &= TARGET_PAGE_MASK;
@@ -1385,6 +1402,8 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
     uint32_t current_flags = 0;
 #endif /* TARGET_HAS_PRECISE_SMC */
 
+    assert_memory_lock();
+
     p = page_find(start >> TARGET_PAGE_BITS);
     if (!p) {
         return;
@@ -1983,6 +2002,7 @@ void page_set_flags(target_ulong start, target_ulong end, int flags)
     assert(end < ((target_ulong)1 << L1_MAP_ADDR_SPACE_BITS));
 #endif
     assert(start < end);
+    assert_memory_lock();
 
     start = start & TARGET_PAGE_MASK;
     end = TARGET_PAGE_ALIGN(end);
-- 
2.7.4


* [Qemu-devel] [RFC v4 09/28] tcg: protect TBContext with tb_lock.
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (7 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 08/28] translate-all: Add assert_(memory|tb)_lock annotations Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-09-07  2:48   ` Richard Henderson
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 10/28] target-arm/arm-powerctl: wake up sleeping CPUs Alex Bennée
                   ` (20 subsequent siblings)
  29 siblings, 1 reply; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite, Michael S. Tsirkin, Eduardo Habkost

From: KONRAD Frederic <fred.konrad@greensocs.com>

This protects TBContext with tb_lock to make tb_* thread safe.

<more detail here>

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Message-Id: <1439220437-23957-8-git-send-email-fred.konrad@greensocs.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[AJB: moved into tree, clean-up history]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
v4 (base-patches, ajb):
  - protect tb_phys_invalidate with tb_lock
  - drop mention of tb_flush, thread safe flushing in earlier patch series
v3 (base-patches, ajb):
  - more explicit comments on resetting tb_lock
  - more explicit comments about thread safety of user-mode tb_flush
v2 (base-patches, ajb):
  - re-base fixes
v7 (FK, MTTCG):
  - Drop a tb_lock in already locked restore_state_to_opc.
v6 (FK, MTTCG):
  - Drop a tb_lock arround tb_find_fast in cpu-exec.c.
---
 cpu-exec.c         |  6 ++++++
 exec.c             |  6 ++++++
 hw/i386/kvmvapic.c |  4 ++++
 translate-all.c    | 28 ++++++++++++++++++++++++----
 4 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index f8fbf0d..93a0eb1 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -210,15 +210,21 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
     if (max_cycles > CF_COUNT_MASK)
         max_cycles = CF_COUNT_MASK;
 
+    tb_lock();
     tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
                      max_cycles | CF_NOCACHE
                          | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
     tb->orig_tb = orig_tb;
+    tb_unlock();
+
     /* execute the generated code */
     trace_exec_tb_nocache(tb, tb->pc);
     cpu_tb_exec(cpu, tb);
+
+    tb_lock();
     tb_phys_invalidate(tb, -1);
     tb_free(tb);
+    tb_unlock();
 }
 #endif
 
diff --git a/exec.c b/exec.c
index 60cf46a..f2ea554 100644
--- a/exec.c
+++ b/exec.c
@@ -2085,6 +2085,12 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
                     continue;
                 }
                 cpu->watchpoint_hit = wp;
+
+                /* The tb_lock will be reset when cpu_loop_exit or
+                 * cpu_loop_exit_noexc longjmp back into the cpu_exec
+                 * main loop.
+                 */
+                tb_lock();
                 tb_check_watchpoint(cpu);
                 if (wp->flags & BP_STOP_BEFORE_ACCESS) {
                     cpu->exception_index = EXCP_DEBUG;
diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index 1bc02fb..0024b76 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -17,6 +17,7 @@
 #include "sysemu/kvm.h"
 #include "hw/i386/apic_internal.h"
 #include "hw/sysbus.h"
+#include "tcg/tcg.h"
 
 #define VAPIC_IO_PORT           0x7e
 
@@ -449,6 +450,9 @@ static void patch_instruction(VAPICROMState *s, X86CPU *cpu, target_ulong ip)
     resume_all_vcpus();
 
     if (!kvm_enabled()) {
+        /* tb_lock will be reset when cpu_loop_exit_noexc longjmps
+         * back into the cpu_exec loop. */
+        tb_lock();
         tb_gen_code(cs, current_pc, current_cs_base, current_flags, 1);
         cpu_loop_exit_noexc(cs);
     }
diff --git a/translate-all.c b/translate-all.c
index 213685c..d21e5ab 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -347,8 +347,10 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t retaddr)
         cpu_restore_state_from_tb(cpu, tb, retaddr);
         if (tb->cflags & CF_NOCACHE) {
             /* one-shot translation, invalidate it immediately */
+            tb_lock();
             tb_phys_invalidate(tb, -1);
             tb_free(tb);
+            tb_unlock();
         }
         return true;
     }
@@ -1417,6 +1419,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
     /* we remove all the TBs in the range [start, end[ */
     /* XXX: see if in some cases it could be faster to invalidate all
        the code */
+    tb_lock();
     tb = p->first_tb;
     while (tb != NULL) {
         n = (uintptr_t)tb & 3;
@@ -1476,6 +1479,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
         cpu_loop_exit_noexc(cpu);
     }
 #endif
+    tb_unlock();
 }
 
 #ifdef CONFIG_SOFTMMU
@@ -1545,6 +1549,8 @@ static bool tb_invalidate_phys_page(tb_page_addr_t addr, uintptr_t pc)
     if (!p) {
         return false;
     }
+
+    tb_lock();
     tb = p->first_tb;
 #ifdef TARGET_HAS_PRECISE_SMC
     if (tb && pc != 0) {
@@ -1582,9 +1588,13 @@ static bool tb_invalidate_phys_page(tb_page_addr_t addr, uintptr_t pc)
            modifying the memory. It will ensure that it cannot modify
            itself */
         tb_gen_code(cpu, current_pc, current_cs_base, current_flags, 1);
+        /* tb_lock will be reset after cpu_loop_exit_noexc longjmps
+         * back into the cpu_exec loop. */
         return true;
     }
 #endif
+    tb_unlock();
+
     return false;
 }
 #endif
@@ -1679,6 +1689,7 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
     target_ulong pc, cs_base;
     uint32_t flags;
 
+    tb_lock();
     tb = tb_find_pc(retaddr);
     if (!tb) {
         cpu_abort(cpu, "cpu_io_recompile: could not find TB for pc=%p",
@@ -1730,11 +1741,16 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
     /* FIXME: In theory this could raise an exception.  In practice
        we have already translated the block once so it's probably ok.  */
     tb_gen_code(cpu, pc, cs_base, flags, cflags);
+
     /* TODO: If env->pc != tb->pc (i.e. the faulting instruction was not
-       the first in the TB) then we end up generating a whole new TB and
-       repeating the fault, which is horribly inefficient.
-       Better would be to execute just this insn uncached, or generate a
-       second new TB.  */
+     * the first in the TB) then we end up generating a whole new TB and
+     *  repeating the fault, which is horribly inefficient.
+     *  Better would be to execute just this insn uncached, or generate a
+     *  second new TB.
+     *
+     * cpu_loop_exit_noexc will longjmp back to cpu_exec where the
+     * tb_lock gets reset.
+     */
     cpu_loop_exit_noexc(cpu);
 }
 
@@ -1763,6 +1779,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
     size_t hgram_bins;
     char *hgram;
 
+    tb_lock();
+
     target_code_size = 0;
     max_target_code_size = 0;
     cross_page = 0;
@@ -1850,6 +1868,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
             tcg_ctx.tb_ctx.tb_phys_invalidate_count);
     cpu_fprintf(f, "TLB flush count     %d\n", tlb_flush_count);
     tcg_dump_info(f, cpu_fprintf);
+
+    tb_unlock();
 }
 
 void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf)
-- 
2.7.4


* [Qemu-devel] [RFC v4 10/28] target-arm/arm-powerctl: wake up sleeping CPUs
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (8 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 09/28] tcg: protect TBContext with tb_lock Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 11/28] tcg: move tcg_exec_all and helpers above thread fn Alex Bennée
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Alexander Spyridakis, open list:ARM

Testing with Alexander's bare metal synchronisation tests fails in
MTTCG, leaving one CPU spinning forever waiting for the second CPU to
wake up. We simply need to kick the vCPU once we have processed the
PSCI power-on call.

As the power control API is for system emulation only, as is the
qemu_cpu_kick function, we also ensure we only build arm-powerctl for
SoftMMU builds.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
CC: Alexander Spyridakis <a.spyridakis@virtualopensystems.com>
Message-Id: <1439220437-23957-20-git-send-email-fred.konrad@greensocs.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

---
v3
  - re-base caught arm_powerctl re-factor
  - include cpu.h header for kick definition
  - fix Makefile.objs to only build for softmmu
---
 target-arm/Makefile.objs  | 2 +-
 target-arm/arm-powerctl.c | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/target-arm/Makefile.objs b/target-arm/Makefile.objs
index f206411..847fb52 100644
--- a/target-arm/Makefile.objs
+++ b/target-arm/Makefile.objs
@@ -9,4 +9,4 @@ obj-y += neon_helper.o iwmmxt_helper.o
 obj-y += gdbstub.o
 obj-$(TARGET_AARCH64) += cpu64.o translate-a64.o helper-a64.o gdbstub64.o
 obj-y += crypto_helper.o
-obj-y += arm-powerctl.o
+obj-$(CONFIG_SOFTMMU) += arm-powerctl.o
diff --git a/target-arm/arm-powerctl.c b/target-arm/arm-powerctl.c
index 6519d52..fbb7a15 100644
--- a/target-arm/arm-powerctl.c
+++ b/target-arm/arm-powerctl.c
@@ -166,6 +166,8 @@ int arm_set_cpu_on(uint64_t cpuid, uint64_t entry, uint64_t context_id,
     /* Start the new CPU at the requested address */
     cpu_set_pc(target_cpu_state, entry);
 
+    qemu_cpu_kick(target_cpu_state);
+
     /* We are good to go */
     return QEMU_ARM_POWERCTL_RET_SUCCESS;
 }
-- 
2.7.4


* [Qemu-devel] [RFC v4 11/28] tcg: move tcg_exec_all and helpers above thread fn
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (9 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 10/28] target-arm/arm-powerctl: wake up sleeping CPUs Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-09-07  2:53   ` Richard Henderson
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 12/28] tcg: cpus rm tcg_exec_all() Alex Bennée
                   ` (18 subsequent siblings)
  29 siblings, 1 reply; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite

This is a purely mechanical change in preparation for upcoming
re-factoring. Instead of relying on a forward declaration for
tcg_exec_all, the function and its associated helpers are moved in
front of their call site in qemu_tcg_cpu_thread_fn.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v4
  - split from tcg: cpus rm tcg_exec_all() patch
---
 cpus.c | 196 ++++++++++++++++++++++++++++++++---------------------------------
 1 file changed, 97 insertions(+), 99 deletions(-)

diff --git a/cpus.c b/cpus.c
index 1700032..0759a84 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1067,7 +1067,103 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
 #endif
 }
 
-static void tcg_exec_all(void);
+static int64_t tcg_get_icount_limit(void)
+{
+    int64_t deadline;
+
+    if (replay_mode != REPLAY_MODE_PLAY) {
+        deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
+
+        /* Maintain prior (possibly buggy) behaviour where if no deadline
+         * was set (as there is no QEMU_CLOCK_VIRTUAL timer) or it is more than
+         * INT32_MAX nanoseconds ahead, we still use INT32_MAX
+         * nanoseconds.
+         */
+        if ((deadline < 0) || (deadline > INT32_MAX)) {
+            deadline = INT32_MAX;
+        }
+
+        return qemu_icount_round(deadline);
+    } else {
+        return replay_get_instructions();
+    }
+}
+
+static int tcg_cpu_exec(CPUState *cpu)
+{
+    int ret;
+#ifdef CONFIG_PROFILER
+    int64_t ti;
+#endif
+
+#ifdef CONFIG_PROFILER
+    ti = profile_getclock();
+#endif
+    if (use_icount) {
+        int64_t count;
+        int decr;
+        timers_state.qemu_icount -= (cpu->icount_decr.u16.low
+                                    + cpu->icount_extra);
+        cpu->icount_decr.u16.low = 0;
+        cpu->icount_extra = 0;
+        count = tcg_get_icount_limit();
+        timers_state.qemu_icount += count;
+        decr = (count > 0xffff) ? 0xffff : count;
+        count -= decr;
+        cpu->icount_decr.u16.low = decr;
+        cpu->icount_extra = count;
+    }
+    ret = cpu_exec(cpu);
+#ifdef CONFIG_PROFILER
+    tcg_time += profile_getclock() - ti;
+#endif
+    if (use_icount) {
+        /* Fold pending instructions back into the
+           instruction counter, and clear the interrupt flag.  */
+        timers_state.qemu_icount -= (cpu->icount_decr.u16.low
+                        + cpu->icount_extra);
+        cpu->icount_decr.u32 = 0;
+        cpu->icount_extra = 0;
+        replay_account_executed_instructions();
+    }
+    return ret;
+}
+
+static void tcg_exec_all(void)
+{
+    int r;
+
+    /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
+    qemu_account_warp_timer();
+
+    if (next_cpu == NULL) {
+        next_cpu = first_cpu;
+    }
+    for (; next_cpu != NULL && !exit_request; next_cpu = CPU_NEXT(next_cpu)) {
+        CPUState *cpu = next_cpu;
+
+        qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
+                          (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
+
+        if (cpu_can_run(cpu)) {
+            tcg_cpu_exec_start(cpu);
+            r = tcg_cpu_exec(cpu);
+            tcg_cpu_exec_end(cpu);
+            if (r == EXCP_DEBUG) {
+                cpu_handle_guest_debug(cpu);
+                break;
+            }
+        } else if (cpu->stop || cpu->stopped) {
+            if (cpu->unplug) {
+                next_cpu = CPU_NEXT(cpu);
+            }
+            break;
+        }
+    }
+
+    /* Pairs with smp_wmb in qemu_cpu_kick.  */
+    atomic_mb_set(&exit_request, 0);
+}
 
 static void *qemu_tcg_cpu_thread_fn(void *arg)
 {
@@ -1424,104 +1520,6 @@ int vm_stop_force_state(RunState state)
     }
 }
 
-static int64_t tcg_get_icount_limit(void)
-{
-    int64_t deadline;
-
-    if (replay_mode != REPLAY_MODE_PLAY) {
-        deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
-
-        /* Maintain prior (possibly buggy) behaviour where if no deadline
-         * was set (as there is no QEMU_CLOCK_VIRTUAL timer) or it is more than
-         * INT32_MAX nanoseconds ahead, we still use INT32_MAX
-         * nanoseconds.
-         */
-        if ((deadline < 0) || (deadline > INT32_MAX)) {
-            deadline = INT32_MAX;
-        }
-
-        return qemu_icount_round(deadline);
-    } else {
-        return replay_get_instructions();
-    }
-}
-
-static int tcg_cpu_exec(CPUState *cpu)
-{
-    int ret;
-#ifdef CONFIG_PROFILER
-    int64_t ti;
-#endif
-
-#ifdef CONFIG_PROFILER
-    ti = profile_getclock();
-#endif
-    if (use_icount) {
-        int64_t count;
-        int decr;
-        timers_state.qemu_icount -= (cpu->icount_decr.u16.low
-                                    + cpu->icount_extra);
-        cpu->icount_decr.u16.low = 0;
-        cpu->icount_extra = 0;
-        count = tcg_get_icount_limit();
-        timers_state.qemu_icount += count;
-        decr = (count > 0xffff) ? 0xffff : count;
-        count -= decr;
-        cpu->icount_decr.u16.low = decr;
-        cpu->icount_extra = count;
-    }
-    ret = cpu_exec(cpu);
-#ifdef CONFIG_PROFILER
-    tcg_time += profile_getclock() - ti;
-#endif
-    if (use_icount) {
-        /* Fold pending instructions back into the
-           instruction counter, and clear the interrupt flag.  */
-        timers_state.qemu_icount -= (cpu->icount_decr.u16.low
-                        + cpu->icount_extra);
-        cpu->icount_decr.u32 = 0;
-        cpu->icount_extra = 0;
-        replay_account_executed_instructions();
-    }
-    return ret;
-}
-
-static void tcg_exec_all(void)
-{
-    int r;
-
-    /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
-    qemu_account_warp_timer();
-
-    if (next_cpu == NULL) {
-        next_cpu = first_cpu;
-    }
-    for (; next_cpu != NULL && !exit_request; next_cpu = CPU_NEXT(next_cpu)) {
-        CPUState *cpu = next_cpu;
-
-        qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
-                          (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
-
-        if (cpu_can_run(cpu)) {
-            tcg_cpu_exec_start(cpu);
-            r = tcg_cpu_exec(cpu);
-            tcg_cpu_exec_end(cpu);
-            if (r == EXCP_DEBUG) {
-                cpu_handle_guest_debug(cpu);
-                break;
-            }
-        } else if (cpu->stop || cpu->stopped) {
-            if (cpu->unplug) {
-                next_cpu = CPU_NEXT(cpu);
-            }
-            break;
-        }
-    }
-
-    /* Pairs with smp_wmb in qemu_cpu_kick.  */
-    atomic_mb_set(&exit_request, 0);
-}
-
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)
 {
     /* XXX: implement xxx_cpu_list for targets that still miss it */
-- 
2.7.4


* [Qemu-devel] [RFC v4 12/28] tcg: cpus rm tcg_exec_all()
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (10 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 11/28] tcg: move tcg_exec_all and helpers above thread fn Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 13/28] tcg: add options for enabling MTTCG Alex Bennée
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite

In preparation for multi-threaded TCG we remove tcg_exec_all and move
all the CPU cycling into the main thread function. When MTTCG is enabled
we shall use a separate thread function which only handles one vCPU.
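
Condensed (a sketch with error and stop handling elided, not the patch
itself), the thread function's new scheduling loop looks like:

  while (1) {
      /* account any sleep time against QEMU_CLOCK_VIRTUAL */
      qemu_account_warp_timer();

      if (!cpu) {
          cpu = first_cpu;
      }
      for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
          if (cpu_can_run(cpu)) {
              tcg_cpu_exec(cpu);    /* run one vCPU's time slice */
          }
      }

      atomic_mb_set(&exit_request, 0);
      qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
      deal_with_unplugged_cpus();
  }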

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>

---
v2
  - update timer calls to new API on rebase
v3
  - move tcg_cpu_exec above thread function, drop static fwd declaration
v4
  - split mechanical moves into earlier patch
  - moved unplug logic into function, don't break smp boot
---
 cpus.c | 89 +++++++++++++++++++++++++++++++++---------------------------------
 1 file changed, 45 insertions(+), 44 deletions(-)

diff --git a/cpus.c b/cpus.c
index 0759a84..0ff51ea 100644
--- a/cpus.c
+++ b/cpus.c
@@ -69,7 +69,6 @@
 
 #endif /* CONFIG_LINUX */
 
-static CPUState *next_cpu;
 int64_t max_delay;
 int64_t max_advance;
 
@@ -1129,46 +1128,26 @@ static int tcg_cpu_exec(CPUState *cpu)
     return ret;
 }
 
-static void tcg_exec_all(void)
+/* Destroy any remaining vCPUs which have been unplugged and have
+ * finished running
+ */
+static void deal_with_unplugged_cpus(void)
 {
-    int r;
-
-    /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
-    qemu_account_warp_timer();
-
-    if (next_cpu == NULL) {
-        next_cpu = first_cpu;
-    }
-    for (; next_cpu != NULL && !exit_request; next_cpu = CPU_NEXT(next_cpu)) {
-        CPUState *cpu = next_cpu;
-
-        qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
-                          (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
+    CPUState *cpu;
 
-        if (cpu_can_run(cpu)) {
-            tcg_cpu_exec_start(cpu);
-            r = tcg_cpu_exec(cpu);
-            tcg_cpu_exec_end(cpu);
-            if (r == EXCP_DEBUG) {
-                cpu_handle_guest_debug(cpu);
-                break;
-            }
-        } else if (cpu->stop || cpu->stopped) {
-            if (cpu->unplug) {
-                next_cpu = CPU_NEXT(cpu);
-            }
+    CPU_FOREACH(cpu) {
+        if (cpu->unplug && !cpu_can_run(cpu)) {
+            qemu_tcg_destroy_vcpu(cpu);
+            cpu->created = false;
+            qemu_cond_signal(&qemu_cpu_cond);
             break;
         }
     }
-
-    /* Pairs with smp_wmb in qemu_cpu_kick.  */
-    atomic_mb_set(&exit_request, 0);
 }
 
 static void *qemu_tcg_cpu_thread_fn(void *arg)
 {
     CPUState *cpu = arg;
-    CPUState *remove_cpu = NULL;
 
     rcu_register_thread();
 
@@ -1195,8 +1174,41 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
     /* process any pending work */
     atomic_mb_set(&exit_request, 1);
 
+    cpu = first_cpu;
+
     while (1) {
-        tcg_exec_all();
+        /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
+        qemu_account_warp_timer();
+
+        if (!cpu) {
+            cpu = first_cpu;
+        }
+
+        for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
+
+            qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
+                              (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
+
+            if (cpu_can_run(cpu)) {
+                int r;
+                tcg_cpu_exec_start(cpu);
+                r = tcg_cpu_exec(cpu);
+                tcg_cpu_exec_end(cpu);
+                if (r == EXCP_DEBUG) {
+                    cpu_handle_guest_debug(cpu);
+                    break;
+                }
+            } else if (cpu->stop || cpu->stopped) {
+                if (cpu->unplug) {
+                    cpu = CPU_NEXT(cpu);
+                }
+                break;
+            }
+
+        } /* for cpu.. */
+
+        /* Pairs with smp_wmb in qemu_cpu_kick.  */
+        atomic_mb_set(&exit_request, 0);
 
         if (use_icount) {
             int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
@@ -1206,18 +1218,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
             }
         }
         qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
-        CPU_FOREACH(cpu) {
-            if (cpu->unplug && !cpu_can_run(cpu)) {
-                remove_cpu = cpu;
-                break;
-            }
-        }
-        if (remove_cpu) {
-            qemu_tcg_destroy_vcpu(remove_cpu);
-            cpu->created = false;
-            qemu_cond_signal(&qemu_cpu_cond);
-            remove_cpu = NULL;
-        }
+        deal_with_unplugged_cpus();
     }
 
     return NULL;
-- 
2.7.4


* [Qemu-devel] [RFC v4 13/28] tcg: add options for enabling MTTCG
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (11 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 12/28] tcg: cpus rm tcg_exec_all() Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-09-07  3:06   ` Richard Henderson
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 14/28] tcg: add kick timer for single-threaded vCPU emulation Alex Bennée
                   ` (16 subsequent siblings)
  29 siblings, 1 reply; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite

From: KONRAD Frederic <fred.konrad@greensocs.com>

We know there will be cases where MTTCG won't work until additional
work is done in the front/back ends to support it. It will however be
useful to be able to turn it on.

As a result MTTCG will default to off unless the combination is
supported. However the user can turn it on for the sake of testing.
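
For example, once this lands the threading model can be forced from
the command line (the default depends on the CONFIG flags described
above):

  -accel tcg,thread=multi    # one thread per vCPU
  -accel tcg,thread=single   # classic round-robin model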

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
[AJB: move to -accel tcg,thread=multi|single, defaults]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1:
  - merge with add mttcg option.
  - update commit message
v2:
  - machine_init->opts_init
v3:
  - moved from -tcg to -accel tcg,thread=single|multi
  - fix checkpatch warnings
v4:
  - make mttcg_enabled extern, qemu_tcg_mttcg_enabled() now just macro
  - qemu_tcg_configure now propagates Error instead of exiting
  - better error checking of thread=foo
  - use CONFIG flags for default_mttcg_enabled()
  - disable mttcg with icount, error if both forced on
---
 cpus.c                | 40 ++++++++++++++++++++++++++++++++++++++++
 include/qom/cpu.h     |  9 +++++++++
 include/sysemu/cpus.h |  2 ++
 qemu-options.hx       | 20 ++++++++++++++++++++
 vl.c                  | 48 +++++++++++++++++++++++++++++++++++++++++++++++-
 5 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index 0ff51ea..b5b45b8 100644
--- a/cpus.c
+++ b/cpus.c
@@ -25,6 +25,7 @@
 /* Needed early for CONFIG_BSD etc. */
 #include "qemu/osdep.h"
 #include "qemu-common.h"
+#include "qemu/config-file.h"
 #include "cpu.h"
 #include "monitor/monitor.h"
 #include "qapi/qmp/qerror.h"
@@ -148,6 +149,45 @@ typedef struct TimersState {
 } TimersState;
 
 static TimersState timers_state;
+bool mttcg_enabled;
+
+/*
+ * We default to false if we know other options have been enabled
+ * which are currently incompatible with MTTCG. Otherwise when each
+ * guest (target) and host has been updated to support:
+ *   - atomic instructions
+ *   - memory ordering
+ * they can set the appropriate CONFIG flags in ${target}-softmmu.mak
+ * and generated config-host.mak fragments.
+ */
+static bool default_mttcg_enabled(void)
+{
+    if (qemu_find_opts("icount")) {
+        return false;
+    } else {
+#if defined(CONFIG_MTTCG_TARGET) && defined(CONFIG_MTTCG_HOST)
+        return true;
+#else
+        return false;
+#endif
+    }
+}
+
+void qemu_tcg_configure(QemuOpts *opts, Error **errp)
+{
+    const char *t = qemu_opt_get(opts, "thread");
+    if (t) {
+        if (strcmp(t, "multi") == 0) {
+            mttcg_enabled = true;
+        } else if (strcmp(t, "single") == 0) {
+            mttcg_enabled = false;
+        } else {
+            error_setg(errp, "Invalid 'thread' setting %s", t);
+        }
+    } else {
+        mttcg_enabled = default_mttcg_enabled();
+    }
+}
 
 int64_t cpu_get_icount_raw(void)
 {
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 8ee770a..682788b 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -390,6 +390,15 @@ extern struct CPUTailQ cpus;
 extern __thread CPUState *current_cpu;
 
 /**
+ * qemu_tcg_mttcg_enabled:
+ * Check whether we are running MultiThread TCG or not.
+ *
+ * Returns: %true if we are in MTTCG mode %false otherwise.
+ */
+extern bool mttcg_enabled;
+#define qemu_tcg_mttcg_enabled() (mttcg_enabled)
+
+/**
  * cpu_paging_enabled:
  * @cpu: The CPU whose state is to be inspected.
  *
diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
index fe992a8..c5055f9 100644
--- a/include/sysemu/cpus.h
+++ b/include/sysemu/cpus.h
@@ -39,4 +39,6 @@ extern int smp_threads;
 
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg);
 
+void qemu_tcg_configure(QemuOpts *opts, Error **errp);
+
 #endif
diff --git a/qemu-options.hx b/qemu-options.hx
index 8e0d9a5..aa27b54 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -96,6 +96,26 @@ STEXI
 Select CPU model (@code{-cpu help} for list and additional feature selection)
 ETEXI
 
+DEF("accel", HAS_ARG, QEMU_OPTION_accel,
+    "-accel [accel=]accelerator[,thread=single|multi]\n"
+    "               select accelerator ('-accel help for list')\n"
+    "               thread=single|multi (enable multi-threaded TCG)", QEMU_ARCH_ALL)
+STEXI
+@item -accel @var{name}[,prop=@var{value}[,...]]
+@findex -accel
+This is used to enable an accelerator. Depending on the target architecture,
+kvm, xen, or tcg can be available. By default, tcg is used. If there is more
+than one accelerator specified, the next one is used if the previous one fails
+to initialize.
+@table @option
+@item thread=single|multi
+Controls the number of TCG threads. When the TCG is multi-threaded there will
+be one thread per vCPU, therefore taking advantage of additional host cores.
+The default is to enable multi-threading where both the back-end and front-ends
+support it and no incompatible TCG features have been enabled (e.g. icount/replay).
+@end table
+ETEXI
+
 DEF("smp", HAS_ARG, QEMU_OPTION_smp,
     "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,sockets=sockets]\n"
     "                set the number of CPUs to 'n' [default=1]\n"
diff --git a/vl.c b/vl.c
index a455947..eeeea08 100644
--- a/vl.c
+++ b/vl.c
@@ -293,6 +293,26 @@ static QemuOptsList qemu_machine_opts = {
     },
 };
 
+static QemuOptsList qemu_accel_opts = {
+    .name = "accel",
+    .implied_opt_name = "accel",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_accel_opts.head),
+    .merge_lists = true,
+    .desc = {
+        {
+            .name = "accel",
+            .type = QEMU_OPT_STRING,
+            .help = "Select the type of accelerator",
+        },
+        {
+            .name = "thread",
+            .type = QEMU_OPT_STRING,
+            .help = "Enable/disable multi-threaded TCG",
+        },
+        { /* end of list */ }
+    },
+};
+
 static QemuOptsList qemu_boot_opts = {
     .name = "boot-opts",
     .implied_opt_name = "order",
@@ -2936,7 +2956,8 @@ int main(int argc, char **argv, char **envp)
     const char *boot_once = NULL;
     DisplayState *ds;
     int cyls, heads, secs, translation;
-    QemuOpts *hda_opts = NULL, *opts, *machine_opts, *icount_opts = NULL;
+    QemuOpts *opts, *machine_opts;
+    QemuOpts *hda_opts = NULL, *icount_opts = NULL;
     QemuOptsList *olist;
     int optind;
     const char *optarg;
@@ -2986,6 +3007,7 @@ int main(int argc, char **argv, char **envp)
     qemu_add_opts(&qemu_trace_opts);
     qemu_add_opts(&qemu_option_rom_opts);
     qemu_add_opts(&qemu_machine_opts);
+    qemu_add_opts(&qemu_accel_opts);
     qemu_add_opts(&qemu_mem_opts);
     qemu_add_opts(&qemu_smp_opts);
     qemu_add_opts(&qemu_boot_opts);
@@ -3676,6 +3698,27 @@ int main(int argc, char **argv, char **envp)
                 qdev_prop_register_global(&kvm_pit_lost_tick_policy);
                 break;
             }
+            case QEMU_OPTION_accel:
+                opts = qemu_opts_parse_noisily(qemu_find_opts("accel"),
+                                               optarg, true);
+                optarg = qemu_opt_get(opts, "accel");
+
+                olist = qemu_find_opts("machine");
+                if (strcmp("kvm", optarg) == 0) {
+                    qemu_opts_parse_noisily(olist, "accel=kvm", false);
+                } else if (strcmp("xen", optarg) == 0) {
+                    qemu_opts_parse_noisily(olist, "accel=xen", false);
+                } else if (strcmp("tcg", optarg) == 0) {
+                    qemu_opts_parse_noisily(olist, "accel=tcg", false);
+                    qemu_tcg_configure(opts, &error_fatal);
+                } else {
+                    if (!is_help_option(optarg)) {
+                        error_printf("Unknown accelerator: %s", optarg);
+                    }
+                    error_printf("Supported accelerators: kvm, xen, tcg\n");
+                    exit(1);
+                }
+                break;
             case QEMU_OPTION_usb:
                 olist = qemu_find_opts("machine");
                 qemu_opts_parse_noisily(olist, "usb=on", false);
@@ -4340,6 +4383,9 @@ int main(int argc, char **argv, char **envp)
         if (kvm_enabled() || xen_enabled()) {
             error_report("-icount is not allowed with kvm or xen");
             exit(1);
+        } else if (qemu_tcg_mttcg_enabled()) {
+            error_report("-icount does not currently work with MTTCG");
+            exit(1);
         }
         configure_icount(icount_opts, &error_abort);
         qemu_opts_del(icount_opts);
-- 
2.7.4


* [Qemu-devel] [RFC v4 14/28] tcg: add kick timer for single-threaded vCPU emulation
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (12 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 13/28] tcg: add options for enabling MTTCG Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-09-07  3:25   ` Richard Henderson
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 15/28] tcg: rename tcg_current_cpu to tcg_current_rr_cpu Alex Bennée
                   ` (15 subsequent siblings)
  29 siblings, 1 reply; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite

Currently we rely on the side effect of the main loop grabbing the
iothread_mutex to give any long-running basic block chains a kick to
ensure the next vCPU is scheduled. As this code is being re-factored
and rationalised we now do it explicitly here.
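
One detail worth flagging (an annotation, not part of the commit): the
callback receives a QEMUTimer ** rather than the timer itself, because
the opaque pointer must be handed to timer_new_ns() before the
QEMUTimer exists. A minimal sketch of the self-re-arming pattern:

  #define TCG_KICK_PERIOD (NANOSECONDS_PER_SECOND / 10)

  QEMUTimer *kick_timer;

  static void kick_cb(void *opaque)
  {
      QEMUTimer *self = *(QEMUTimer **) opaque;
      /* re-arm for the next period, then kick the running vCPU */
      timer_mod(self, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL)
                      + TCG_KICK_PERIOD);
  }

  /* &kick_timer is valid before the timer itself is created */
  kick_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, kick_cb, &kick_timer);
  timer_mod(kick_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL)
                        + TCG_KICK_PERIOD);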

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v2
  - re-base fixes
  - get_ticks_per_sec() -> NANOSECONDS_PER_SEC
v3
  - add define for TCG_KICK_FREQ
  - fix checkpatch warning
v4
  - wrap next calc in inline qemu_tcg_next_kick() instead of macro
---
 cpus.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/cpus.c b/cpus.c
index b5b45b8..8c49d6c 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1185,9 +1185,34 @@ static void deal_with_unplugged_cpus(void)
     }
 }
 
+/* Single-threaded TCG
+ *
+ * In the single-threaded case each vCPU is simulated in turn. If
+ * there is more than a single vCPU we create a simple timer to kick
+ * the vCPU and ensure we don't get stuck in a tight loop in one vCPU.
+ * This is done explicitly rather than relying on side-effects
+ * elsewhere.
+ */
+static void qemu_cpu_kick_no_halt(void);
+
+#define TCG_KICK_PERIOD (NANOSECONDS_PER_SECOND / 10)
+
+static inline int64_t qemu_tcg_next_kick(void)
+{
+    return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + TCG_KICK_PERIOD;
+}
+
+static void kick_tcg_thread(void *opaque)
+{
+    QEMUTimer *self = *(QEMUTimer **) opaque;
+    timer_mod(self, qemu_tcg_next_kick());
+    qemu_cpu_kick_no_halt();
+}
+
 static void *qemu_tcg_cpu_thread_fn(void *arg)
 {
     CPUState *cpu = arg;
+    QEMUTimer *kick_timer;
 
     rcu_register_thread();
 
@@ -1211,6 +1236,13 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
         }
     }
 
+    /* Set to kick if we have to do more than one vCPU */
+    if (CPU_NEXT(first_cpu)) {
+        kick_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,  kick_tcg_thread,
+                                  &kick_timer);
+        timer_mod(kick_timer, qemu_tcg_next_kick());
+    }
+
     /* process any pending work */
     atomic_mb_set(&exit_request, 1);
 
-- 
2.7.4


* [Qemu-devel] [RFC v4 15/28] tcg: rename tcg_current_cpu to tcg_current_rr_cpu
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (13 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 14/28] tcg: add kick timer for single-threaded vCPU emulation Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-09-07  3:34   ` Richard Henderson
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 16/28] tcg: drop global lock during TCG code execution Alex Bennée
                   ` (14 subsequent siblings)
  29 siblings, 1 reply; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite

...and make the definition local to cpus.c. In preparation for MTTCG
the concept of a global tcg_current_cpu will no longer make sense.
However we still need to keep track of it in the single-threaded case
to be able to exit quickly when required.

qemu_cpu_kick_no_halt() moves and becomes qemu_cpu_kick_rr_cpu() to
emphasise its use-case. qemu_cpu_kick now kicks the relevant cpu as
well as calling qemu_cpu_kick_rr_cpu(), which will become a no-op in
MTTCG.

For the time being the setting of the global exit_request remains.
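
One subtlety (an annotation on the hunk below): the kick must re-check
tcg_current_rr_cpu after calling cpu_exit(), as the round-robin
scheduler may have moved on to another vCPU in the meantime:

  do {
      cpu = atomic_mb_read(&tcg_current_rr_cpu);
      if (cpu) {
          cpu_exit(cpu);    /* may race with the scheduler moving on */
      }
  } while (cpu != atomic_mb_read(&tcg_current_rr_cpu));  /* so retry */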

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v4:
  - keep global exit_request setting for now
  - fix merge conflicts
---
 cpu-exec-common.c       |  3 ++-
 cpu-exec.c              |  3 ---
 cpus.c                  | 41 ++++++++++++++++++++++++-----------------
 include/exec/exec-all.h |  1 -
 4 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index 0bec55a..e220ba7 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -24,7 +24,6 @@
 #include "exec/memory-internal.h"
 
 bool exit_request;
-CPUState *tcg_current_cpu;
 int tcg_pending_threads;
 
 /* exit the current TB, but without causing any exception to be raised */
@@ -140,6 +139,7 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
     struct qemu_work_item wi;
     bool done = false;
 
+    /* Always true when using tcg RR scheduling from a vCPU context */
     if (qemu_cpu_is_self(cpu)) {
         func(cpu, data);
         return;
@@ -163,6 +163,7 @@ void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 {
     struct qemu_work_item wi;
 
+    /* Always true when using tcg RR scheduling from a vCPU context */
     if (qemu_cpu_is_self(cpu)) {
         func(cpu, data);
         return;
diff --git a/cpu-exec.c b/cpu-exec.c
index 93a0eb1..42fbf7d 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -578,7 +578,6 @@ int cpu_exec(CPUState *cpu)
         return EXCP_HALTED;
     }
 
-    atomic_mb_set(&tcg_current_cpu, cpu);
     rcu_read_lock();
 
     if (unlikely(atomic_mb_read(&exit_request))) {
@@ -637,7 +636,5 @@ int cpu_exec(CPUState *cpu)
     /* fail safe : never use current_cpu outside cpu_exec() */
     current_cpu = NULL;
 
-    /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
-    atomic_set(&tcg_current_cpu, NULL);
     return ret;
 }
diff --git a/cpus.c b/cpus.c
index 8c49d6c..732b21f 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1193,7 +1193,6 @@ static void deal_with_unplugged_cpus(void)
  * This is done explicitly rather than relying on side-effects
  * elsewhere.
  */
-static void qemu_cpu_kick_no_halt(void);
 
 #define TCG_KICK_PERIOD (NANOSECONDS_PER_SECOND / 10)
 
@@ -1202,11 +1201,27 @@ static inline int64_t qemu_tcg_next_kick(void)
     return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + TCG_KICK_PERIOD;
 }
 
+/* only used in single-thread tcg mode */
+static CPUState *tcg_current_rr_cpu;
+
+/* Kick the currently round-robin scheduled vCPU */
+static void qemu_cpu_kick_rr_cpu(void)
+{
+    CPUState *cpu;
+    atomic_mb_set(&exit_request, 1);
+    do {
+        cpu = atomic_mb_read(&tcg_current_rr_cpu);
+        if (cpu) {
+            cpu_exit(cpu);
+        }
+    } while (cpu != atomic_mb_read(&tcg_current_rr_cpu));
+}
+
 static void kick_tcg_thread(void *opaque)
 {
     QEMUTimer *self = *(QEMUTimer **) opaque;
     timer_mod(self, qemu_tcg_next_kick());
-    qemu_cpu_kick_no_halt();
+    qemu_cpu_kick_rr_cpu();
 }
 
 static void *qemu_tcg_cpu_thread_fn(void *arg)
@@ -1257,6 +1272,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
         }
 
         for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
+            atomic_mb_set(&tcg_current_rr_cpu, cpu);
 
             qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
                               (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
@@ -1278,6 +1294,8 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
             }
 
         } /* for cpu.. */
+        /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
+        atomic_set(&tcg_current_rr_cpu, NULL);
 
         /* Pairs with smp_wmb in qemu_cpu_kick.  */
         atomic_mb_set(&exit_request, 0);
@@ -1315,24 +1333,13 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
 #endif
 }
 
-static void qemu_cpu_kick_no_halt(void)
-{
-    CPUState *cpu;
-    /* Ensure whatever caused the exit has reached the CPU threads before
-     * writing exit_request.
-     */
-    atomic_mb_set(&exit_request, 1);
-    cpu = atomic_mb_read(&tcg_current_cpu);
-    if (cpu) {
-        cpu_exit(cpu);
-    }
-}
-
 void qemu_cpu_kick(CPUState *cpu)
 {
     qemu_cond_broadcast(cpu->halt_cond);
     if (tcg_enabled()) {
-        qemu_cpu_kick_no_halt();
+        cpu_exit(cpu);
+        /* Also ensure current RR cpu is kicked */
+        qemu_cpu_kick_rr_cpu();
     } else {
         qemu_cpu_kick_thread(cpu);
     }
@@ -1373,7 +1380,7 @@ void qemu_mutex_lock_iothread(void)
         atomic_dec(&iothread_requesting_mutex);
     } else {
         if (qemu_mutex_trylock(&qemu_global_mutex)) {
-            qemu_cpu_kick_no_halt();
+            qemu_cpu_kick_rr_cpu();
             qemu_mutex_lock(&qemu_global_mutex);
         }
         atomic_dec(&iothread_requesting_mutex);
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index d73b0e6..3547dfa 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -408,7 +408,6 @@ bool memory_region_is_unassigned(MemoryRegion *mr);
 extern int singlestep;
 
 /* cpu-exec.c, accessed with atomic_mb_read/atomic_mb_set */
-extern CPUState *tcg_current_cpu;
 extern int tcg_pending_threads;
 extern bool exit_request;
 
-- 
2.7.4


* [Qemu-devel] [RFC v4 16/28] tcg: drop global lock during TCG code execution
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (14 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 15/28] tcg: rename tcg_current_cpu to tcg_current_rr_cpu Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-09-07  4:03   ` Richard Henderson
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 17/28] cpus: re-factor out handle_icount_deadline Alex Bennée
                   ` (13 subsequent siblings)
  29 siblings, 1 reply; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite, Michael S. Tsirkin, Eduardo Habkost

From: Jan Kiszka <jan.kiszka@siemens.com>

This finally allows TCG to benefit from the iothread introduction: Drop
the global mutex while running pure TCG CPU code. Reacquire the lock
when entering MMIO or PIO emulation, or when leaving the TCG loop.

We have to revert a few optimizations for the current TCG threading
model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
kicking it in qemu_cpu_kick. We also need to disable RAM block
reordering until we have a more efficient locking mechanism at hand.

Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
These numbers demonstrate where we gain something:

20338 jan       20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
20337 jan       20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm

The guest CPU was fully loaded, but the iothread could still run
mostly independently on a second core. Without the patch we don't get beyond

32206 jan       20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
32204 jan       20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm

We don't benefit significantly, though, when the guest is not fully
loading a host CPU.
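
The recurring pattern in the MMIO fast paths (condensed from the
softmmu_template.h hunks below) is to take the BQL only for memory
regions that still rely on global locking:

  bool locked = false;

  if (mr->global_locking) {
      qemu_mutex_lock_iothread();
      locked = true;
  }
  memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT,
                              iotlbentry->attrs);
  if (locked) {
      qemu_mutex_unlock_iothread();
  }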

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Message-Id: <1439220437-23957-10-git-send-email-fred.konrad@greensocs.com>
[FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex]
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
[EGC: fixed iothread lock for cpu-exec IRQ handling]
Signed-off-by: Emilio G. Cota <cota@braap.org>
[AJB: -smp single-threaded fix, rm old info from commit msg, review updates]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v4 (ajb, base patches):
 - protect cpu->interrupt updates with BQL
 - fix wording io_mem_notdirty calls
 - s/we/with/
v3 (ajb, base-patches):
  - stale iothread_unlocks removed (cpu_exit/resume_from_signal deals
  with it in the longjmp).
  - fix re-base conflicts
v2 (ajb):
  - merge with tcg: grab iothread lock in cpu-exec interrupt handling
  - use existing fns for tracking lock state
  - lock iothread for mem_region
    - add assert on mem region modification
    - ensure smm_helper holds iothread
  - Add JK s-o-b
  - Fix-up FK s-o-b annotation
v1 (ajb, base-patches):
  - SMP failure now fixed by previous commit

Changes from Fred Konrad (mttcg-v7 via paolo):
  * Rebase on the current HEAD.
  * Fixes a deadlock in qemu_devices_reset().
  * Remove the mutex in address_space_*
---
 cpu-exec.c               | 13 +++++++++++++
 cpus.c                   | 26 +++-----------------------
 cputlb.c                 |  1 +
 exec.c                   | 12 +++++++++---
 hw/i386/kvmvapic.c       |  4 ++--
 include/qom/cpu.h        |  1 +
 memory.c                 |  2 ++
 qom/cpu.c                | 10 ++++++++++
 softmmu_template.h       | 17 +++++++++++++++++
 target-i386/smm_helper.c |  7 +++++++
 translate-all.c          | 11 +++++++++--
 11 files changed, 74 insertions(+), 30 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 42fbf7d..317ad8e 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -29,6 +29,7 @@
 #include "qemu/rcu.h"
 #include "exec/tb-hash.h"
 #include "exec/log.h"
+#include "qemu/main-loop.h"
 #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
 #include "hw/i386/apic.h"
 #endif
@@ -357,8 +358,10 @@ static inline bool cpu_handle_halt(CPUState *cpu)
         if ((cpu->interrupt_request & CPU_INTERRUPT_POLL)
             && replay_interrupt()) {
             X86CPU *x86_cpu = X86_CPU(cpu);
+            qemu_mutex_lock_iothread();
             apic_poll_irq(x86_cpu->apic_state);
             cpu_reset_interrupt(cpu, CPU_INTERRUPT_POLL);
+            qemu_mutex_unlock_iothread();
         }
 #endif
         if (!cpu_has_work(cpu)) {
@@ -441,6 +444,8 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
     int interrupt_request = cpu->interrupt_request;
 
     if (unlikely(interrupt_request)) {
+        qemu_mutex_lock_iothread();
+
         if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
             /* Mask out external interrupts for this step. */
             interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
@@ -495,6 +500,10 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
                the program flow was changed */
             *last_tb = NULL;
         }
+
+        /* If we exit via cpu_loop_exit/longjmp it is reset in cpu_exec */
+        qemu_mutex_unlock_iothread();
+
     }
     if (unlikely(cpu->exit_request || replay_has_interrupt())) {
         cpu->exit_request = 0;
@@ -625,8 +634,12 @@ int cpu_exec(CPUState *cpu)
             g_assert(cpu == current_cpu);
             g_assert(cc == CPU_GET_CLASS(cpu));
 #endif /* buggy compiler */
+
             cpu->can_do_io = 1;
             tb_lock_reset();
+            if (qemu_mutex_iothread_locked()) {
+                qemu_mutex_unlock_iothread();
+            }
         }
     } /* for(;;) */
 
diff --git a/cpus.c b/cpus.c
index 732b21f..d917473 100644
--- a/cpus.c
+++ b/cpus.c
@@ -926,8 +926,6 @@ static void qemu_kvm_init_cpu_signals(CPUState *cpu)
 #endif /* _WIN32 */
 
 static QemuMutex qemu_global_mutex;
-static QemuCond qemu_io_proceeded_cond;
-static unsigned iothread_requesting_mutex;
 
 static QemuThread io_thread;
 
@@ -944,7 +942,6 @@ void qemu_init_cpu_loop(void)
     qemu_cond_init(&qemu_work_cond);
     qemu_cond_init(&qemu_safe_work_cond);
     qemu_cond_init(&qemu_exclusive_cond);
-    qemu_cond_init(&qemu_io_proceeded_cond);
     qemu_mutex_init(&qemu_global_mutex);
 
     qemu_thread_get_self(&io_thread);
@@ -998,10 +995,6 @@ static void qemu_tcg_wait_io_event(CPUState *cpu)
         qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
     }
 
-    while (iothread_requesting_mutex) {
-        qemu_cond_wait(&qemu_io_proceeded_cond, &qemu_global_mutex);
-    }
-
     CPU_FOREACH(cpu) {
         qemu_wait_io_event_common(cpu);
     }
@@ -1152,7 +1145,9 @@ static int tcg_cpu_exec(CPUState *cpu)
         cpu->icount_decr.u16.low = decr;
         cpu->icount_extra = count;
     }
+    qemu_mutex_unlock_iothread();
     ret = cpu_exec(cpu);
+    qemu_mutex_lock_iothread();
 #ifdef CONFIG_PROFILER
     tcg_time += profile_getclock() - ti;
 #endif
@@ -1370,22 +1365,7 @@ bool qemu_mutex_iothread_locked(void)
 
 void qemu_mutex_lock_iothread(void)
 {
-    atomic_inc(&iothread_requesting_mutex);
-    /* In the simple case there is no need to bump the VCPU thread out of
-     * TCG code execution.
-     */
-    if (!tcg_enabled() || qemu_in_vcpu_thread() ||
-        !first_cpu || !first_cpu->created) {
-        qemu_mutex_lock(&qemu_global_mutex);
-        atomic_dec(&iothread_requesting_mutex);
-    } else {
-        if (qemu_mutex_trylock(&qemu_global_mutex)) {
-            qemu_cpu_kick_rr_cpu();
-            qemu_mutex_lock(&qemu_global_mutex);
-        }
-        atomic_dec(&iothread_requesting_mutex);
-        qemu_cond_broadcast(&qemu_io_proceeded_cond);
-    }
+    qemu_mutex_lock(&qemu_global_mutex);
     iothread_locked = true;
 }
 
diff --git a/cputlb.c b/cputlb.c
index d068ee5..6f19daa 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -18,6 +18,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "exec/memory.h"
diff --git a/exec.c b/exec.c
index f2ea554..a39a200 100644
--- a/exec.c
+++ b/exec.c
@@ -2086,9 +2086,9 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
                 }
                 cpu->watchpoint_hit = wp;
 
-                /* The tb_lock will be reset when cpu_loop_exit or
-                 * cpu_loop_exit_noexc longjmp back into the cpu_exec
-                 * main loop.
+                /* Both tb_lock and iothread_mutex will be reset when
+                 * cpu_loop_exit or cpu_loop_exit_noexc longjmp
+                 * back into the cpu_exec main loop.
                  */
                 tb_lock();
                 tb_check_watchpoint(cpu);
@@ -2324,8 +2324,14 @@ static void io_mem_init(void)
     memory_region_init_io(&io_mem_rom, NULL, &unassigned_mem_ops, NULL, NULL, UINT64_MAX);
     memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
                           NULL, UINT64_MAX);
+
+    /* io_mem_notdirty calls tb_invalidate_phys_page_fast,
+     * which can be called without the iothread mutex.
+     */
     memory_region_init_io(&io_mem_notdirty, NULL, &notdirty_mem_ops, NULL,
                           NULL, UINT64_MAX);
+    memory_region_clear_global_locking(&io_mem_notdirty);
+
     memory_region_init_io(&io_mem_watch, NULL, &watch_mem_ops, NULL,
                           NULL, UINT64_MAX);
 }
diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index 0024b76..582f040 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -450,8 +450,8 @@ static void patch_instruction(VAPICROMState *s, X86CPU *cpu, target_ulong ip)
     resume_all_vcpus();
 
     if (!kvm_enabled()) {
-        /* tb_lock will be reset when cpu_loop_exit_noexc longjmps
-         * back into the cpu_exec loop. */
+        /* Both tb_lock and iothread_mutex will be reset when
+         *  longjmps back into the cpu_exec loop. */
         tb_lock();
         tb_gen_code(cs, current_pc, current_cs_base, current_flags, 1);
         cpu_loop_exit_noexc(cs);
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 682788b..5ecbd29 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -310,6 +310,7 @@ struct CPUState {
     bool unplug;
     bool crash_occurred;
     bool exit_request;
+    /* updates protected by BQL */
     uint32_t interrupt_request;
     int singlestep_enabled;
     int64_t icount_extra;
diff --git a/memory.c b/memory.c
index 0eb6895..1875695 100644
--- a/memory.c
+++ b/memory.c
@@ -916,6 +916,8 @@ void memory_region_transaction_commit(void)
     AddressSpace *as;
 
     assert(memory_region_transaction_depth);
+    assert(qemu_mutex_iothread_locked());
+
     --memory_region_transaction_depth;
     if (!memory_region_transaction_depth) {
         if (memory_region_update_pending) {
diff --git a/qom/cpu.c b/qom/cpu.c
index 42b5631..08a39c5 100644
--- a/qom/cpu.c
+++ b/qom/cpu.c
@@ -112,9 +112,19 @@ static void cpu_common_get_memory_mapping(CPUState *cpu,
     error_setg(errp, "Obtaining memory mappings is unsupported on this CPU.");
 }
 
+/* Resetting the IRQ comes from across the code base so we take the
+ * BQL here if we need to.  cpu_interrupt assumes it is held.*/
 void cpu_reset_interrupt(CPUState *cpu, int mask)
 {
+    bool need_lock = !qemu_mutex_iothread_locked();
+
+    if (need_lock) {
+        qemu_mutex_lock_iothread();
+    }
     cpu->interrupt_request &= ~mask;
+    if (need_lock) {
+        qemu_mutex_unlock_iothread();
+    }
 }
 
 void cpu_exit(CPUState *cpu)
diff --git a/softmmu_template.h b/softmmu_template.h
index 284ab2c..c64e819 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -126,6 +126,7 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
     CPUState *cpu = ENV_GET_CPU(env);
     hwaddr physaddr = iotlbentry->addr;
     MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
+    bool locked = false;
 
     physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
     cpu->mem_io_pc = retaddr;
@@ -134,8 +135,15 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
     }
 
     cpu->mem_io_vaddr = addr;
+    if (mr->global_locking) {
+        qemu_mutex_lock_iothread();
+        locked = true;
+    }
     memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT,
                                 iotlbentry->attrs);
+    if (locked) {
+        qemu_mutex_unlock_iothread();
+    }
     return val;
 }
 #endif
@@ -313,6 +321,7 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
     CPUState *cpu = ENV_GET_CPU(env);
     hwaddr physaddr = iotlbentry->addr;
     MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
+    bool locked = false;
 
     physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
     if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
@@ -321,8 +330,16 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
 
     cpu->mem_io_vaddr = addr;
     cpu->mem_io_pc = retaddr;
+
+    if (mr->global_locking) {
+        qemu_mutex_lock_iothread();
+        locked = true;
+    }
     memory_region_dispatch_write(mr, physaddr, val, 1 << SHIFT,
                                  iotlbentry->attrs);
+    if (locked) {
+        qemu_mutex_unlock_iothread();
+    }
 }
 
 void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
diff --git a/target-i386/smm_helper.c b/target-i386/smm_helper.c
index 4dd6a2c..f051a77 100644
--- a/target-i386/smm_helper.c
+++ b/target-i386/smm_helper.c
@@ -18,6 +18,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 #include "cpu.h"
 #include "exec/helper-proto.h"
 #include "exec/log.h"
@@ -42,11 +43,14 @@ void helper_rsm(CPUX86State *env)
 #define SMM_REVISION_ID 0x00020000
 #endif
 
+/* Called with iothread lock taken */
 void cpu_smm_update(X86CPU *cpu)
 {
     CPUX86State *env = &cpu->env;
     bool smm_enabled = (env->hflags & HF_SMM_MASK);
 
+    g_assert(qemu_mutex_iothread_locked());
+
     if (cpu->smram) {
         memory_region_set_enabled(cpu->smram, smm_enabled);
     }
@@ -333,7 +337,10 @@ void helper_rsm(CPUX86State *env)
     }
     env->hflags2 &= ~HF2_SMM_INSIDE_NMI_MASK;
     env->hflags &= ~HF_SMM_MASK;
+
+    qemu_mutex_lock_iothread();
     cpu_smm_update(cpu);
+    qemu_mutex_unlock_iothread();
 
     qemu_log_mask(CPU_LOG_INT, "SMM: after RSM\n");
     log_cpu_state_mask(CPU_LOG_INT, CPU(cpu), CPU_DUMP_CCOP);
diff --git a/translate-all.c b/translate-all.c
index d21e5ab..c650696 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -55,6 +55,7 @@
 #include "translate-all.h"
 #include "qemu/bitmap.h"
 #include "qemu/timer.h"
+#include "qemu/main-loop.h"
 #include "exec/log.h"
 
 /* #define DEBUG_TB_INVALIDATE */
@@ -1483,7 +1484,9 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
 }
 
 #ifdef CONFIG_SOFTMMU
-/* len must be <= 8 and start must be a multiple of len */
+/* len must be <= 8 and start must be a multiple of len.
+ * Called via softmmu_template.h, with iothread mutex not held.
+ */
 void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
 {
     PageDesc *p;
@@ -1678,7 +1681,10 @@ void tb_check_watchpoint(CPUState *cpu)
 
 #ifndef CONFIG_USER_ONLY
 /* in deterministic execution mode, instructions doing device I/Os
-   must be at the end of the TB */
+ * must be at the end of the TB.
+ *
+ * Called by softmmu_template.h, with iothread mutex not held.
+ */
 void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
 {
 #if defined(TARGET_MIPS) || defined(TARGET_SH4)
@@ -1881,6 +1887,7 @@ void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf)
 
 void cpu_interrupt(CPUState *cpu, int mask)
 {
+    g_assert(qemu_mutex_iothread_locked());
     cpu->interrupt_request |= mask;
     cpu->tcg_exit_req = 1;
 }
-- 
2.7.4


* [Qemu-devel] [RFC v4 17/28] cpus: re-factor out handle_icount_deadline
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (15 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 16/28] tcg: drop global lock during TCG code execution Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-09-07  4:06   ` Richard Henderson
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 18/28] tcg: remove global exit_request Alex Bennée
                   ` (12 subsequent siblings)
  29 siblings, 1 reply; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite

In preparation for adding an MTTCG thread we re-factor out a bit of what
will be common code to handle QEMU_CLOCK_VIRTUAL deadline expiration.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v4
  - split from exit_request patch
---
 cpus.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/cpus.c b/cpus.c
index d917473..5ed6dc0 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1121,6 +1121,18 @@ static int64_t tcg_get_icount_limit(void)
     }
 }
 
+static void handle_icount_deadline(void)
+{
+    if (use_icount) {
+        int64_t deadline =
+            qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
+
+        if (deadline == 0) {
+            qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+        }
+    }
+}
+
 static int tcg_cpu_exec(CPUState *cpu)
 {
     int ret;
@@ -1295,13 +1307,8 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
         /* Pairs with smp_wmb in qemu_cpu_kick.  */
         atomic_mb_set(&exit_request, 0);
 
-        if (use_icount) {
-            int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
+        handle_icount_deadline();
 
-            if (deadline == 0) {
-                qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
-            }
-        }
         qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
         deal_with_unplugged_cpus();
     }
-- 
2.7.4


* [Qemu-devel] [RFC v4 18/28] tcg: remove global exit_request
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (16 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 17/28] cpus: re-factor out handle_icount_deadline Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-09-07  4:11   ` Richard Henderson
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 19/28] tcg: move locking for tb_invalidate_phys_page_range up Alex Bennée
                   ` (11 subsequent siblings)
  29 siblings, 1 reply; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite

There are now only two uses of the global exit_request left.

The first ensures we exit the run loop when we first start to process
pending work, and again in the kick handler. This is just as easily
done by setting the first_cpu->exit_request flag.

The second use is in the round-robin kick routine. The global
exit_request ensured every vCPU would set its local exit_request and
cause a full exit of the loop. Now that the iothread lock isn't held
while the vCPUs run, we can just rely on the kick handler to push us
out as intended.

We lightly re-factor the main vCPU thread to ensure cpu->exit_request
causes us to exit the main loop and process any IO requests that might
come along.
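
The net effect on the thread function (sketch): rather than a global
flag that every vCPU polls, only the first vCPU to be scheduled is
flagged, and the round-robin loop clears it once the work has run:

  cpu = first_cpu;
  cpu->exit_request = 1;    /* cleared again inside the RR loop */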

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v4
  - moved to after iothread unlocking patch
  - needed to remove kick exit_request as well.
  - remove extraneous cpu->exit_request check
  - remove stray exit_request setting
  - remove needless atomic operation
---
 cpu-exec-common.c       |  1 -
 cpu-exec.c              |  9 ++-------
 cpus.c                  | 18 ++++++++++--------
 include/exec/exec-all.h |  1 -
 4 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index e220ba7..e29cf6d 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -23,7 +23,6 @@
 #include "exec/exec-all.h"
 #include "exec/memory-internal.h"
 
-bool exit_request;
 int tcg_pending_threads;
 
 /* exit the current TB, but without causing any exception to be raised */
diff --git a/cpu-exec.c b/cpu-exec.c
index 317ad8e..d2e8b9e 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -531,9 +531,8 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
         /* Something asked us to stop executing
          * chained TBs; just continue round the main
          * loop. Whatever requested the exit will also
-         * have set something else (eg exit_request or
-         * interrupt_request) which we will handle
-         * next time around the loop.  But we need to
+         * have set something else (eg interrupt_request) which we
+         * will handle next time around the loop.  But we need to
          * ensure the tcg_exit_req read in generated code
          * comes before the next read of cpu->exit_request
          * or cpu->interrupt_request.
@@ -589,10 +588,6 @@ int cpu_exec(CPUState *cpu)
 
     rcu_read_lock();
 
-    if (unlikely(atomic_mb_read(&exit_request))) {
-        cpu->exit_request = 1;
-    }
-
     cc->cpu_exec_enter(cpu);
 
     /* Calculate difference between guest clock and host clock.
diff --git a/cpus.c b/cpus.c
index 5ed6dc0..8a40a08 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1215,7 +1215,6 @@ static CPUState *tcg_current_rr_cpu;
 static void qemu_cpu_kick_rr_cpu(void)
 {
     CPUState *cpu;
-    atomic_mb_set(&exit_request, 1);
     do {
         cpu = atomic_mb_read(&tcg_current_rr_cpu);
         if (cpu) {
@@ -1265,11 +1264,11 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
         timer_mod(kick_timer, qemu_tcg_next_kick());
     }
 
-    /* process any pending work */
-    atomic_mb_set(&exit_request, 1);
-
     cpu = first_cpu;
 
+    /* process any pending work */
+    cpu->exit_request = 1;
+
     while (1) {
         /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
         qemu_account_warp_timer();
@@ -1278,7 +1277,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
             cpu = first_cpu;
         }
 
-        for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
+        while (cpu && !cpu->exit_request) {
             atomic_mb_set(&tcg_current_rr_cpu, cpu);
 
             qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
@@ -1300,12 +1299,15 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
                 break;
             }
 
-        } /* for cpu.. */
+            cpu = CPU_NEXT(cpu);
+        } /* while (cpu && !cpu->exit_request).. */
+
         /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
         atomic_set(&tcg_current_rr_cpu, NULL);
 
-        /* Pairs with smp_wmb in qemu_cpu_kick.  */
-        atomic_mb_set(&exit_request, 0);
+        if (cpu && cpu->exit_request) {
+            atomic_mb_set(&cpu->exit_request, 0);
+        }
 
         handle_icount_deadline();
 
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 3547dfa..27d1a24 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -409,7 +409,6 @@ extern int singlestep;
 
 /* cpu-exec.c, accessed with atomic_mb_read/atomic_mb_set */
 extern int tcg_pending_threads;
-extern bool exit_request;
 
 /**
  * qemu_work_cond - condition to wait for CPU work items completion
-- 
2.7.4


* [Qemu-devel] [RFC v4 19/28] tcg: move locking for tb_invalidate_phys_page_range up
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (17 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 18/28] tcg: remove global exit_request Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-09-27 15:56   ` Paolo Bonzini
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 20/28] cpus: tweak sleeping and safe_work rules for MTTCG Alex Bennée
                   ` (10 subsequent siblings)
  29 siblings, 1 reply; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite

In the linux-user case all things that involve 'l1_map' and PageDesc
tweaks are protected by the memory lock (mmap_lock). For SoftMMU mode
we previously relied on single-threaded behaviour; with MTTCG we now
use the tb_lock().

As a result we need to do a little re-factoring and push the taking of
this lock up the call tree. This requires a slightly different entry
point into tb_invalidate_phys_range for the SoftMMU and user-mode
cases.

This also means user-mode breakpoint insertion needs to take two locks
but as it previously took none this is an improvement.
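
The resulting lock ordering for user-mode emulation (condensed from
the breakpoint_invalidate() hunk below) is mmap_lock outermost, with
tb_lock nested inside it:

  mmap_lock();
  tb_lock();
  tb_invalidate_phys_page_range(pc, pc + 1, 0);
  tb_unlock();
  mmap_unlock();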

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v4
 - reword commit message
---
 exec.c          | 16 ++++++++++++++++
 translate-all.c | 37 +++++++++++++++++++++++++++++--------
 2 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/exec.c b/exec.c
index a39a200..f418725 100644
--- a/exec.c
+++ b/exec.c
@@ -722,7 +722,11 @@ void cpu_exec_init(CPUState *cpu, Error **errp)
 #if defined(CONFIG_USER_ONLY)
 static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
 {
+    mmap_lock();
+    tb_lock();
     tb_invalidate_phys_page_range(pc, pc + 1, 0);
+    tb_unlock();
+    mmap_unlock();
 }
 #else
 static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
@@ -731,6 +735,7 @@ static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
     hwaddr phys = cpu_get_phys_page_attrs_debug(cpu, pc, &attrs);
     int asidx = cpu_asidx_from_attrs(cpu, attrs);
     if (phys != -1) {
+        /* Locks grabbed by tb_invalidate_phys_addr */
         tb_invalidate_phys_addr(cpu->cpu_ases[asidx].as,
                                 phys | (pc & ~TARGET_PAGE_MASK));
     }
@@ -2009,7 +2014,11 @@ ram_addr_t qemu_ram_addr_from_host(void *ptr)
 static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
                                uint64_t val, unsigned size)
 {
+    bool locked = false;
+
     if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
+        locked = true;
+        tb_lock();
         tb_invalidate_phys_page_fast(ram_addr, size);
     }
     switch (size) {
@@ -2025,6 +2034,11 @@ static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
     default:
         abort();
     }
+
+    if (locked) {
+        tb_unlock();
+    }
+
     /* Set both VGA and migration bits for simplicity and to remove
      * the notdirty callback faster.
      */
@@ -2505,7 +2519,9 @@ static void invalidate_and_set_dirty(MemoryRegion *mr, hwaddr addr,
             cpu_physical_memory_range_includes_clean(addr, length, dirty_log_mask);
     }
     if (dirty_log_mask & (1 << DIRTY_MEMORY_CODE)) {
+        tb_lock();
         tb_invalidate_phys_range(addr, addr + length);
+        tb_unlock();
         dirty_log_mask &= ~(1 << DIRTY_MEMORY_CODE);
     }
     cpu_physical_memory_set_dirty_range(addr, length, dirty_log_mask);
diff --git a/translate-all.c b/translate-all.c
index c650696..c53ae8c 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1363,12 +1363,11 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
  * access: the virtual CPU will exit the current TB if code is modified inside
  * this TB.
  *
- * Called with mmap_lock held for user-mode emulation
+ * Called with mmap_lock held for user-mode emulation, grabs tb_lock
+ * Called with tb_lock held for system-mode emulation
  */
-void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
+static void tb_invalidate_phys_range_1(tb_page_addr_t start, tb_page_addr_t end)
 {
-    assert_memory_lock();
-
     while (start < end) {
         tb_invalidate_phys_page_range(start, end, 0);
         start &= TARGET_PAGE_MASK;
@@ -1376,6 +1375,21 @@ void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
     }
 }
 
+#ifdef CONFIG_SOFTMMU
+void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
+{
+    assert_tb_lock();
+    tb_invalidate_phys_range_1(start, end);
+}
+#else
+void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
+{
+    assert_memory_lock();
+    tb_lock();
+    tb_invalidate_phys_range_1(start, end);
+    tb_unlock();
+}
+#endif
 /*
  * Invalidate all TBs which intersect with the target physical address range
  * [start;end[. NOTE: start and end must refer to the *same* physical page.
@@ -1383,7 +1397,8 @@ void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
  * access: the virtual CPU will exit the current TB if code is modified inside
  * this TB.
  *
- * Called with mmap_lock held for user-mode emulation
+ * Called with tb_lock/mmap_lock held for user-mode emulation
+ * Called with tb_lock held for system-mode emulation
  */
 void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
                                    int is_cpu_write_access)
@@ -1406,6 +1421,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
 #endif /* TARGET_HAS_PRECISE_SMC */
 
     assert_memory_lock();
+    assert_tb_lock();
 
     p = page_find(start >> TARGET_PAGE_BITS);
     if (!p) {
@@ -1420,7 +1436,6 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
     /* we remove all the TBs in the range [start, end[ */
     /* XXX: see if in some cases it could be faster to invalidate all
        the code */
-    tb_lock();
     tb = p->first_tb;
     while (tb != NULL) {
         n = (uintptr_t)tb & 3;
@@ -1480,12 +1495,12 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
         cpu_loop_exit_noexc(cpu);
     }
 #endif
-    tb_unlock();
 }
 
 #ifdef CONFIG_SOFTMMU
 /* len must be <= 8 and start must be a multiple of len.
- * Called via softmmu_template.h, with iothread mutex not held.
+ * Called via softmmu_template.h when code areas are written to with
+ * tb_lock held.
  */
 void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
 {
@@ -1500,6 +1515,8 @@ void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
                   (intptr_t)cpu_single_env->segs[R_CS].base);
     }
 #endif
+    assert_memory_lock();
+
     p = page_find(start >> TARGET_PAGE_BITS);
     if (!p) {
         return;
@@ -1547,6 +1564,8 @@ static bool tb_invalidate_phys_page(tb_page_addr_t addr, uintptr_t pc)
     uint32_t current_flags = 0;
 #endif
 
+    assert_memory_lock();
+
     addr &= TARGET_PAGE_MASK;
     p = page_find(addr >> TARGET_PAGE_BITS);
     if (!p) {
@@ -1650,7 +1669,9 @@ void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr)
         return;
     }
     ram_addr = memory_region_get_ram_addr(mr) + addr;
+    tb_lock();
     tb_invalidate_phys_page_range(ram_addr, ram_addr + 1, 0);
+    tb_unlock();
     rcu_read_unlock();
 }
 #endif /* !defined(CONFIG_USER_ONLY) */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [RFC v4 20/28] cpus: tweak sleeping and safe_work rules for MTTCG
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (18 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 19/28] tcg: move locking for tb_invalidate_phys_page_range up Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-09-07  4:22   ` Richard Henderson
  2016-09-07 10:05   ` Paolo Bonzini
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 21/28] tcg: enable tb_lock() for SoftMMU Alex Bennée
                   ` (9 subsequent siblings)
  29 siblings, 2 replies; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite

Once TCG gains the ability to sleep individual threads we need to make
sure they don't sleep when safe work is pending, as all threads need to
go through process_queued_cpu_work. Also, if we have multiple threads,
wait_safe_cpu_work can now sleep without deadlocking.
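
As a sketch of the caller's view (do_flush is a hypothetical work
function; the helpers come from the async-safe-work series this builds
on):

  /* queue safe work on every vCPU; none may go idle until it runs */
  CPU_FOREACH(cpu) {
      async_safe_run_on_cpu(cpu, do_flush, NULL);
  }
  /* may only sleep when another thread can drain the queue */
  wait_safe_cpu_work();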

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v4
  - new for v4, ontop of async-safe-work-v5
---
 cpu-exec-common.c | 15 ++++++++++++++-
 cpus.c            |  2 +-
 include/qom/cpu.h |  8 ++++++++
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index e29cf6d..84cb789 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -83,6 +83,18 @@ QemuCond qemu_exclusive_cond;
 
 static int safe_work_pending;
 
+/* No vCPUs can sleep while there is safe work pending as we need
+ * everything to finish up in process_queued_cpu_work.
+ */
+bool cpu_has_queued_work(CPUState *cpu)
+{
+    if (cpu->queued_work || atomic_mb_read(&safe_work_pending) > 0) {
+        return true;
+    } else {
+        return false;
+    }
+}
+
 #ifdef CONFIG_USER_ONLY
 #define can_wait_for_safe() (1)
 #else
@@ -91,7 +103,8 @@ static int safe_work_pending;
  * all vCPUs are in the same thread. This will change for MTTCG
  * however.
  */
-#define can_wait_for_safe() (0)
+extern int smp_cpus;
+#define can_wait_for_safe() (mttcg_enabled && smp_cpus > 1)
 #endif
 
 void wait_safe_cpu_work(void)
diff --git a/cpus.c b/cpus.c
index 8a40a08..4fc5e4c 100644
--- a/cpus.c
+++ b/cpus.c
@@ -88,7 +88,7 @@ bool cpu_is_stopped(CPUState *cpu)
 
 static bool cpu_thread_is_idle(CPUState *cpu)
 {
-    if (cpu->stop || cpu->queued_work) {
+    if (cpu->stop || cpu_has_queued_work(cpu)) {
         return false;
     }
     if (cpu_is_stopped(cpu)) {
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 5ecbd29..d0db846 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -661,6 +661,14 @@ void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
 void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
 
 /**
+ * cpu_has_queued_work:
+ * @cpu: The vCPU to check
+ *
+ * Returns true if there is *_run_on_cpu work to be done.
+ */
+bool cpu_has_queued_work(CPUState *cpu);
+
+/**
  * qemu_get_cpu:
  * @index: The CPUState@cpu_index value of the CPU to obtain.
  *
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [RFC v4 21/28] tcg: enable tb_lock() for SoftMMU
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (19 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 20/28] cpus: tweak sleeping and safe_work rules for MTTCG Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-09-07  4:26   ` Richard Henderson
  2016-09-27 16:16   ` Paolo Bonzini
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 22/28] tcg: enable thread-per-vCPU Alex Bennée
                   ` (8 subsequent siblings)
  29 siblings, 2 replies; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite

tb_lock() has long been used for linux-user mode to protect code
generation. By enabling it now we prepare for MTTCG and ensure all code
generation is serialised by this lock. The other major structure that
needs protecting is the l1_map and its PageDesc structures. For the
SoftMMU case we also use tb_lock() to protect these structures instead
of the linux-user mmap_lock(), which as the name suggests serialises
updates to the structure as a result of guest mmap operations.
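
The net effect is that in SoftMMU builds both lock assertions now check
the same per-thread flag (a sketch of the invariant):

  tb_lock();              /* asserts !have_tb_lock, takes tb_ctx.tb_lock */
  /* ... code generation / l1_map updates ... */
  assert_memory_lock();   /* now g_assert(have_tb_lock) for SoftMMU */
  tb_unlock();            /* asserts have_tb_lock */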

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v4
  - split from main tcg: enable thread-per-vCPU patch
---
 translate-all.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index c53ae8c..cfb009a 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -82,7 +82,11 @@
 #endif
 
 #ifdef CONFIG_SOFTMMU
-#define assert_memory_lock() do { /* nothing */ } while (0)
+#define assert_memory_lock() do {           \
+        if (DEBUG_MEM_LOCKS) {              \
+            g_assert(have_tb_lock);         \
+        }                                   \
+    } while (0)
 #else
 #define assert_memory_lock() do {               \
         if (DEBUG_MEM_LOCKS) {                  \
@@ -146,36 +150,28 @@ static void *l1_map[V_L1_SIZE];
 TCGContext tcg_ctx;
 
 /* translation block context */
-#ifdef CONFIG_USER_ONLY
 __thread int have_tb_lock;
-#endif
 
 void tb_lock(void)
 {
-#ifdef CONFIG_USER_ONLY
     assert(!have_tb_lock);
     qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
     have_tb_lock++;
-#endif
 }
 
 void tb_unlock(void)
 {
-#ifdef CONFIG_USER_ONLY
     assert(have_tb_lock);
     have_tb_lock--;
     qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
-#endif
 }
 
 void tb_lock_reset(void)
 {
-#ifdef CONFIG_USER_ONLY
     if (have_tb_lock) {
         qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
         have_tb_lock = 0;
     }
-#endif
 }
 
 #ifdef DEBUG_LOCKING
@@ -184,15 +180,11 @@ void tb_lock_reset(void)
 #define DEBUG_TB_LOCKS 0
 #endif
 
-#ifdef CONFIG_SOFTMMU
-#define assert_tb_lock() do { /* nothing */ } while (0)
-#else
 #define assert_tb_lock() do {               \
         if (DEBUG_TB_LOCKS) {               \
             g_assert(have_tb_lock);         \
         }                                   \
     } while (0)
-#endif
 
 
 static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [RFC v4 22/28] tcg: enable thread-per-vCPU
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (20 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 21/28] tcg: enable tb_lock() for SoftMMU Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 23/28] atomic: introduce cmpxchg_bool Alex Bennée
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite

There are a couple of changes that occur at the same time here:

  - introduce a single vCPU qemu_tcg_cpu_thread_fn

  One of these is spawned per vCPU with its own Thread and Condition
  variables. qemu_tcg_rr_cpu_thread_fn is the new name for the old
  single threaded function.

  - the TLS current_cpu variable is now live for the lifetime of MTTCG
    vCPU threads. This is for future work where async jobs need to know
    the vCPU context they are operating in.

The user can now switch on multi-thread behaviour and spawn a thread
per vCPU. For a simple kvm-unit-test like:

  ./arm/run ./arm/locking-test.flat -smp 4 -accel tcg,thread=multi

Will now use 4 vCPU threads and have an expected FAIL (instead of the
unexpected PASS) as the default mode of the test has no protection when
incrementing a shared variable.

However we still default to a single thread for all vCPUs as individual
front-ends and back-ends need additional fixes to safely support:
  - atomic behaviour
  - tb invalidation
  - memory ordering

The function default_mttcg_enabled can be tweaked as support is added.
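
As a sketch, the thread-creation decision in qemu_tcg_init_vcpu now
amounts to (names as in the patch below):

  if (qemu_tcg_mttcg_enabled()) {
      /* MTTCG: one thread per vCPU */
      qemu_thread_create(cpu->thread, thread_name,
                         qemu_tcg_cpu_thread_fn, cpu, QEMU_THREAD_JOINABLE);
  } else if (!single_tcg_cpu_thread) {
      /* first vCPU creates the shared round-robin thread */
      qemu_thread_create(cpu->thread, thread_name,
                         qemu_tcg_rr_cpu_thread_fn, cpu, QEMU_THREAD_JOINABLE);
  } else {
      /* later vCPUs share the single thread */
      cpu->thread = single_tcg_cpu_thread;
  }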

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[AJB: Some fixes, conditionally, commit rewording]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1 (ajb):
  - fix merge conflicts
  - maintain single-thread approach
v2
  - re-base fixes (no longer has tb_find_fast lock tweak ahead)
  - remove bogus break condition on cpu->stop/stopped
  - only process exiting cpus exit_request
  - handle all cpus idle case (fixes shutdown issues)
  - sleep on EXCP_HALTED in mttcg mode (prevent crash on start-up)
  - move icount timer into helper
v3
  - update the commit message
  - rm kick_timer tweaks (move to earlier tcg_current_cpu tweaks)
  - ensure linux-user clears cpu->exit_request in loop
  - purging of global exit_request and tcg_current_cpu in earlier patches
  - fix checkpatch warnings
v4
  - don't break loop on stopped, we may never schedule next in RR mode
  - make sure we flush iorequests of current cpu if we exited on one
  - add tcg_cpu_exec_start/end wraps for async work functions
  - stop killing of current_cpu on loop exit
  - set current_cpu in the single thread function
  - remove sleep special case, add qemu_tcg_should_sleep() for mttcg
  - no need to atomic set cpu->exit_request going into the loop
  - removed extraneous setting of exit_request
  - split tb_lock() part of patch
  - rename single thread fn to qemu_tcg_rr_cpu_thread_fn
---
 cpu-exec.c |   5 ---
 cpus.c     | 133 +++++++++++++++++++++++++++++++++++++++++++++++--------------
 2 files changed, 103 insertions(+), 35 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index d2e8b9e..9af7260 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -365,7 +365,6 @@ static inline bool cpu_handle_halt(CPUState *cpu)
         }
 #endif
         if (!cpu_has_work(cpu)) {
-            current_cpu = NULL;
             return true;
         }
 
@@ -506,7 +505,6 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
 
     }
     if (unlikely(cpu->exit_request || replay_has_interrupt())) {
-        cpu->exit_request = 0;
         cpu->exception_index = EXCP_INTERRUPT;
         cpu_loop_exit(cpu);
     }
@@ -641,8 +639,5 @@ int cpu_exec(CPUState *cpu)
     cc->cpu_exec_exit(cpu);
     rcu_read_unlock();
 
-    /* fail safe : never use current_cpu outside cpu_exec() */
-    current_cpu = NULL;
-
     return ret;
 }
diff --git a/cpus.c b/cpus.c
index 4fc5e4c..d831d43 100644
--- a/cpus.c
+++ b/cpus.c
@@ -980,24 +980,31 @@ static inline void tcg_cpu_exec_end(CPUState *cpu)
 
 static void qemu_wait_io_event_common(CPUState *cpu)
 {
+    atomic_mb_set(&cpu->thread_kicked, false);
     if (cpu->stop) {
         cpu->stop = false;
         cpu->stopped = true;
         qemu_cond_broadcast(&qemu_pause_cond);
     }
     process_queued_cpu_work(cpu);
-    cpu->thread_kicked = false;
+}
+
+static bool qemu_tcg_should_sleep(CPUState *cpu)
+{
+    if (mttcg_enabled) {
+        return cpu_thread_is_idle(cpu);
+    } else {
+        return all_cpu_threads_idle();
+    }
 }
 
 static void qemu_tcg_wait_io_event(CPUState *cpu)
 {
-    while (all_cpu_threads_idle()) {
+    while (qemu_tcg_should_sleep(cpu)) {
         qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
     }
 
-    CPU_FOREACH(cpu) {
-        qemu_wait_io_event_common(cpu);
-    }
+    qemu_wait_io_event_common(cpu);
 
     wait_safe_cpu_work();
 }
@@ -1070,6 +1077,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
     qemu_thread_get_self(cpu->thread);
     cpu->thread_id = qemu_get_thread_id();
     cpu->can_do_io = 1;
+    current_cpu = cpu;
 
     sigemptyset(&waitset);
     sigaddset(&waitset, SIG_IPI);
@@ -1078,9 +1086,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
     cpu->created = true;
     qemu_cond_signal(&qemu_cpu_cond);
 
-    current_cpu = cpu;
     while (1) {
-        current_cpu = NULL;
         qemu_mutex_unlock_iothread();
         do {
             int sig;
@@ -1091,7 +1097,6 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
             exit(1);
         }
         qemu_mutex_lock_iothread();
-        current_cpu = cpu;
         qemu_wait_io_event_common(cpu);
     }
 
@@ -1230,7 +1235,7 @@ static void kick_tcg_thread(void *opaque)
     qemu_cpu_kick_rr_cpu();
 }
 
-static void *qemu_tcg_cpu_thread_fn(void *arg)
+static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
 {
     CPUState *cpu = arg;
     QEMUTimer *kick_timer;
@@ -1253,6 +1258,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 
         /* process any pending work */
         CPU_FOREACH(cpu) {
+            current_cpu = cpu;
             qemu_wait_io_event_common(cpu);
         }
     }
@@ -1279,6 +1285,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 
         while (cpu && !cpu->exit_request) {
             atomic_mb_set(&tcg_current_rr_cpu, cpu);
+            current_cpu = cpu;
 
             qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
                               (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
@@ -1292,7 +1299,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
                     cpu_handle_guest_debug(cpu);
                     break;
                 }
-            } else if (cpu->stop || cpu->stopped) {
+            } else if (cpu->stop) {
                 if (cpu->unplug) {
                     cpu = CPU_NEXT(cpu);
                 }
@@ -1311,13 +1318,73 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 
         handle_icount_deadline();
 
-        qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
+        qemu_tcg_wait_io_event(cpu ? cpu : QTAILQ_FIRST(&cpus));
         deal_with_unplugged_cpus();
     }
 
     return NULL;
 }
 
+/* Multi-threaded TCG
+ *
+ * In the multi-threaded case each vCPU has its own thread. The TLS
+ * variable current_cpu can be used deep in the code to find the
+ * current CPUState for a given thread.
+ */
+
+static void *qemu_tcg_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+
+    rcu_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->created = true;
+    cpu->can_do_io = 1;
+    current_cpu = cpu;
+    qemu_cond_signal(&qemu_cpu_cond);
+
+    /* process any pending work */
+    cpu->exit_request = 1;
+
+    while (1) {
+        if (cpu_can_run(cpu)) {
+            int r;
+            tcg_cpu_exec_start(cpu);
+            r = tcg_cpu_exec(cpu);
+            tcg_cpu_exec_end(cpu);
+            switch (r) {
+            case EXCP_DEBUG:
+                cpu_handle_guest_debug(cpu);
+                break;
+            case EXCP_HALTED:
+                /* during start-up the vCPU is reset and the thread is
+                 * kicked several times. If we don't ensure we go back
+                 * to sleep in the halted state we won't cleanly
+                 * start-up when the vCPU is enabled.
+                 *
+                 * cpu->halted should ensure we sleep in wait_io_event
+                 */
+                g_assert(cpu->halted);
+                break;
+            default:
+                /* Ignore everything else? */
+                break;
+            }
+        }
+
+        handle_icount_deadline();
+
+        atomic_mb_set(&cpu->exit_request, 0);
+        qemu_tcg_wait_io_event(cpu);
+    }
+
+    return NULL;
+}
+
 static void qemu_cpu_kick_thread(CPUState *cpu)
 {
 #ifndef _WIN32
@@ -1342,7 +1409,7 @@ void qemu_cpu_kick(CPUState *cpu)
     qemu_cond_broadcast(cpu->halt_cond);
     if (tcg_enabled()) {
         cpu_exit(cpu);
-        /* Also ensure current RR cpu is kicked */
+        /* NOP unless doing single-thread RR */
         qemu_cpu_kick_rr_cpu();
     } else {
         qemu_cpu_kick_thread(cpu);
@@ -1409,13 +1476,6 @@ void pause_all_vcpus(void)
 
     if (qemu_in_vcpu_thread()) {
         cpu_stop_current();
-        if (!kvm_enabled()) {
-            CPU_FOREACH(cpu) {
-                cpu->stop = false;
-                cpu->stopped = true;
-            }
-            return;
-        }
     }
 
     while (!all_vcpus_paused()) {
@@ -1464,29 +1524,42 @@ void cpu_remove_sync(CPUState *cpu)
 static void qemu_tcg_init_vcpu(CPUState *cpu)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
-    static QemuCond *tcg_halt_cond;
-    static QemuThread *tcg_cpu_thread;
+    static QemuCond *single_tcg_halt_cond;
+    static QemuThread *single_tcg_cpu_thread;
 
-    /* share a single thread for all cpus with TCG */
-    if (!tcg_cpu_thread) {
+    if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) {
         cpu->thread = g_malloc0(sizeof(QemuThread));
         cpu->halt_cond = g_malloc0(sizeof(QemuCond));
         qemu_cond_init(cpu->halt_cond);
-        tcg_halt_cond = cpu->halt_cond;
-        snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
+
+        if (qemu_tcg_mttcg_enabled()) {
+            /* create a thread per vCPU with TCG (MTTCG) */
+            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
                  cpu->cpu_index);
-        qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
-                           cpu, QEMU_THREAD_JOINABLE);
+
+            qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
+                               cpu, QEMU_THREAD_JOINABLE);
+
+        } else {
+            /* share a single thread for all cpus with TCG */
+            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "ALL CPUs/TCG");
+            qemu_thread_create(cpu->thread, thread_name,
+                               qemu_tcg_rr_cpu_thread_fn,
+                               cpu, QEMU_THREAD_JOINABLE);
+
+            single_tcg_halt_cond = cpu->halt_cond;
+            single_tcg_cpu_thread = cpu->thread;
+        }
 #ifdef _WIN32
         cpu->hThread = qemu_thread_get_handle(cpu->thread);
 #endif
         while (!cpu->created) {
             qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
         }
-        tcg_cpu_thread = cpu->thread;
     } else {
-        cpu->thread = tcg_cpu_thread;
-        cpu->halt_cond = tcg_halt_cond;
+        /* For non-MTTCG cases we share the thread */
+        cpu->thread = single_tcg_cpu_thread;
+        cpu->halt_cond = single_tcg_halt_cond;
     }
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [RFC v4 23/28] atomic: introduce cmpxchg_bool
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (21 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 22/28] tcg: enable thread-per-vCPU Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-09-08  0:12   ` Richard Henderson
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 24/28] cputlb: add assert_cpu_is_self checks Alex Bennée
                   ` (6 subsequent siblings)
  29 siblings, 1 reply; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée

This allows for slightly neater code when checking for success of a
cmpxchg operation.
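
For example, checking whether we won a flag transition goes from
comparing the returned old value to testing the result directly
('pending' here is a hypothetical bool field):

  /* before: atomic_cmpxchg returns the old value */
  if (atomic_cmpxchg(&pending, false, true) == false) {
      /* we set the flag */
  }

  /* after: atomic_bool_cmpxchg returns whether the exchange happened */
  if (atomic_bool_cmpxchg(&pending, false, true)) {
      /* we set the flag */
  }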

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v4 (base-patches)
   - brought forward from ARM enabling patches
   - remove the un-needed extra temps
---
 include/qemu/atomic.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
index 560b1af..aba2c23 100644
--- a/include/qemu/atomic.h
+++ b/include/qemu/atomic.h
@@ -152,6 +152,14 @@
     _old;                                                               \
     })
 
+#define atomic_bool_cmpxchg(ptr, old, new)                              \
+    ({                                                                  \
+    typeof(*ptr) _old = (old), _new = (new);                            \
+    __atomic_compare_exchange(ptr, &_old, &_new, false,                 \
+                                  __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);  \
+    })
+
+
 /* Provide shorter names for GCC atomic builtins, return old value */
 #define atomic_fetch_inc(ptr)  __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST)
 #define atomic_fetch_dec(ptr)  __atomic_fetch_sub(ptr, 1, __ATOMIC_SEQ_CST)
@@ -356,6 +364,7 @@
 #define atomic_fetch_and       __sync_fetch_and_and
 #define atomic_fetch_or        __sync_fetch_and_or
 #define atomic_cmpxchg         __sync_val_compare_and_swap
+#define atomic_bool_cmpxchg    __sync_bool_compare_and_swap
 
 #define atomic_dec_fetch(ptr)  __sync_sub_and_fetch(ptr, 1)
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [RFC v4 24/28] cputlb: add assert_cpu_is_self checks
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (22 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 23/28] atomic: introduce cmpxchg_bool Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-09-08 17:19   ` Richard Henderson
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 25/28] cputlb: introduce tlb_flush_* async work Alex Bennée
                   ` (5 subsequent siblings)
  29 siblings, 1 reply; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite

For SoftMMU the TLB flushes are an example of a task that can be
triggered on one vCPU by another. To deal with this properly we need to
use safe work to ensure these changes are done safely. The new assert
can be enabled while debugging to catch these cases.
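
For example, a cross-vCPU flush issued directly would trip the assert
once the debug gate is enabled; the safe pattern is to defer it to the
target vCPU (do_tlb_flush is a hypothetical work function, the deferral
itself is introduced in a later patch):

  /* unsafe once vCPUs run in their own threads: */
  tlb_flush(cpu, 1);                           /* may race with cpu */

  /* safe: run the flush on the vCPU's own thread */
  async_run_on_cpu(cpu, do_tlb_flush, NULL);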

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 cputlb.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/cputlb.c b/cputlb.c
index 6f19daa..b6833fe 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -59,6 +59,12 @@
     } \
 } while (0)
 
+#define assert_cpu_is_self(this_cpu) do {                         \
+        if (DEBUG_TLB_GATE) {                                     \
+            g_assert(!(this_cpu)->created || qemu_cpu_is_self(this_cpu)); \
+        }                                                         \
+    } while (0)
+
 /* statistics */
 int tlb_flush_count;
 
@@ -78,6 +84,7 @@ void tlb_flush(CPUState *cpu, int flush_global)
 {
     CPUArchState *env = cpu->env_ptr;
 
+    assert_cpu_is_self(cpu);
     tlb_debug("(%d)\n", flush_global);
 
     memset(env->tlb_table, -1, sizeof(env->tlb_table));
@@ -94,6 +101,7 @@ static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp)
 {
     CPUArchState *env = cpu->env_ptr;
 
+    assert_cpu_is_self(cpu);
     tlb_debug("start\n");
 
     for (;;) {
@@ -138,6 +146,7 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
     int i;
     int mmu_idx;
 
+    assert_cpu_is_self(cpu);
     tlb_debug("page :" TARGET_FMT_lx "\n", addr);
 
     /* Check if we need to flush due to large pages.  */
@@ -175,6 +184,7 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
 
     va_start(argp, addr);
 
+    assert_cpu_is_self(cpu);
     tlb_debug("addr "TARGET_FMT_lx"\n", addr);
 
     /* Check if we need to flush due to large pages.  */
@@ -263,6 +273,8 @@ void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length)
 
     int mmu_idx;
 
+    assert_cpu_is_self(cpu);
+
     env = cpu->env_ptr;
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
         unsigned int i;
@@ -294,6 +306,8 @@ void tlb_set_dirty(CPUState *cpu, target_ulong vaddr)
     int i;
     int mmu_idx;
 
+    assert_cpu_is_self(cpu);
+
     vaddr &= TARGET_PAGE_MASK;
     i = (vaddr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
@@ -353,6 +367,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     unsigned vidx = env->vtlb_index++ % CPU_VTLB_SIZE;
     int asidx = cpu_asidx_from_attrs(cpu, attrs);
 
+    assert_cpu_is_self(cpu);
     assert(size >= TARGET_PAGE_SIZE);
     if (size != TARGET_PAGE_SIZE) {
         tlb_add_large_page(env, vaddr, size);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [RFC v4 25/28] cputlb: introduce tlb_flush_* async work.
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (23 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 24/28] cputlb: add assert_cpu_is_self checks Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-09-07 10:08   ` Paolo Bonzini
  2016-09-08 17:23   ` Richard Henderson
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 26/28] cputlb: tweak qemu_ram_addr_from_host_nofail reporting Alex Bennée
                   ` (4 subsequent siblings)
  29 siblings, 2 replies; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite

From: KONRAD Frederic <fred.konrad@greensocs.com>

Some architectures allow one vCPU to flush the TLBs of other vCPUs. This is
not a problem when we have only one thread for all vCPUs, but it definitely
needs to be asynchronous work when we are truly multi-threaded.

This patch doesn't do anything to protect the other cputlb functions from
being called in MTTCG mode to make cross-vCPU changes.
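
The general shape is a defer-or-run-direct check at the public entry
point (a sketch using the names from the patch below):

  if (cpu->created && !qemu_cpu_is_self(cpu)) {
      /* cross-vCPU: queue the flush on the target's own thread */
      async_run_on_cpu(cpu, tlb_flush_global_async_work,
                       GINT_TO_POINTER(flush_global));
  } else {
      /* same thread, or vCPU not yet running: flush in place */
      tlb_flush_nocheck(cpu, flush_global);
  }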

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
[AJB: remove need for g_malloc on defer, make check fixes]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v4 (base_patches)
  - brought forward from arm enabling series
  - restore pending_tlb_flush flag
v1
  - Remove tlb_flush_all just do the check in tlb_flush.
  - remove the need to g_malloc
  - tlb_flush calls direct if !cpu->created
---
 cputlb.c                | 64 +++++++++++++++++++++++++++++++++++++++----------
 include/exec/exec-all.h |  1 +
 include/qom/cpu.h       |  6 +++++
 3 files changed, 58 insertions(+), 13 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index b6833fe..2975bbe 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -65,22 +65,14 @@
         }                                                         \
     } while (0)
 
+/* We need a solution for stuffing 64 bit pointers in 32 bit ones if
+ * we care about this combination */
+QEMU_BUILD_BUG_ON(sizeof(target_ulong) > sizeof(void *));
+
 /* statistics */
 int tlb_flush_count;
 
-/* NOTE:
- * If flush_global is true (the usual case), flush all tlb entries.
- * If flush_global is false, flush (at least) all tlb entries not
- * marked global.
- *
- * Since QEMU doesn't currently implement a global/not-global flag
- * for tlb entries, at the moment tlb_flush() will also flush all
- * tlb entries in the flush_global == false case. This is OK because
- * CPU architectures generally permit an implementation to drop
- * entries from the TLB at any time, so flushing more entries than
- * required is only an efficiency issue, not a correctness issue.
- */
-void tlb_flush(CPUState *cpu, int flush_global)
+static void tlb_flush_nocheck(CPUState *cpu, int flush_global)
 {
     CPUArchState *env = cpu->env_ptr;
 
@@ -95,6 +87,37 @@ void tlb_flush(CPUState *cpu, int flush_global)
     env->tlb_flush_addr = -1;
     env->tlb_flush_mask = 0;
     tlb_flush_count++;
+
+    atomic_mb_set(&cpu->pending_tlb_flush, false);
+}
+
+static void tlb_flush_global_async_work(CPUState *cpu, void *opaque)
+{
+    tlb_flush_nocheck(cpu, GPOINTER_TO_INT(opaque));
+}
+
+/* NOTE:
+ * If flush_global is true (the usual case), flush all tlb entries.
+ * If flush_global is false, flush (at least) all tlb entries not
+ * marked global.
+ *
+ * Since QEMU doesn't currently implement a global/not-global flag
+ * for tlb entries, at the moment tlb_flush() will also flush all
+ * tlb entries in the flush_global == false case. This is OK because
+ * CPU architectures generally permit an implementation to drop
+ * entries from the TLB at any time, so flushing more entries than
+ * required is only an efficiency issue, not a correctness issue.
+ */
+void tlb_flush(CPUState *cpu, int flush_global)
+{
+    if (cpu->created && !qemu_cpu_is_self(cpu)) {
+        if (atomic_bool_cmpxchg(&cpu->pending_tlb_flush, false, true)) {
+            async_run_on_cpu(cpu, tlb_flush_global_async_work,
+                             GINT_TO_POINTER(flush_global));
+        }
+    } else {
+        tlb_flush_nocheck(cpu, flush_global);
+    }
 }
 
 static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp)
@@ -222,6 +245,21 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
     tb_flush_jmp_cache(cpu, addr);
 }
 
+static void tlb_flush_page_async_work(CPUState *cpu, void *opaque)
+{
+    tlb_flush_page(cpu, GPOINTER_TO_UINT(opaque));
+}
+
+void tlb_flush_page_all(target_ulong addr)
+{
+    CPUState *cpu;
+
+    CPU_FOREACH(cpu) {
+        async_run_on_cpu(cpu, tlb_flush_page_async_work,
+                         GUINT_TO_POINTER(addr));
+    }
+}
+
 /* update the TLBs so that writes to code in the virtual page 'addr'
    can be detected */
 void tlb_protect_code(ram_addr_t ram_addr)
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 27d1a24..14a7b5e 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -159,6 +159,7 @@ void tlb_set_page(CPUState *cpu, target_ulong vaddr,
 void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr);
 void probe_write(CPUArchState *env, target_ulong addr, int mmu_idx,
                  uintptr_t retaddr);
+void tlb_flush_page_all(target_ulong addr);
 #else
 static inline void tlb_flush_page(CPUState *cpu, target_ulong addr)
 {
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index d0db846..6b35f37 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -376,6 +376,12 @@ struct CPUState {
        (absolute value) offset as small as possible.  This reduces code
        size, especially for hosts without large memory offsets.  */
     uint32_t tcg_exit_req;
+
+    /* The pending_tlb_flush flag is set and cleared atomically to
+     * avoid potential races. The aim of the flag is to avoid
+     * unnecessary flushes.
+     */
+    bool pending_tlb_flush;
 };
 
 QTAILQ_HEAD(CPUTailQ, CPUState);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [RFC v4 26/28] cputlb: tweak qemu_ram_addr_from_host_nofail reporting
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (24 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 25/28] cputlb: introduce tlb_flush_* async work Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-09-08 17:24   ` Richard Henderson
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 27/28] cputlb: make tlb_reset_dirty safe for MTTCG Alex Bennée
                   ` (3 subsequent siblings)
  29 siblings, 1 reply; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite

This moves the helper function closer to where it is called and updates
the error message to report via error_report instead of the deprecated
fprintf.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 cputlb.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 2975bbe..945ea02 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -293,18 +293,6 @@ void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry, uintptr_t start,
     }
 }
 
-static inline ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr)
-{
-    ram_addr_t ram_addr;
-
-    ram_addr = qemu_ram_addr_from_host(ptr);
-    if (ram_addr == RAM_ADDR_INVALID) {
-        fprintf(stderr, "Bad ram pointer %p\n", ptr);
-        abort();
-    }
-    return ram_addr;
-}
-
 void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length)
 {
     CPUArchState *env;
@@ -516,6 +504,18 @@ static void report_bad_exec(CPUState *cpu, target_ulong addr)
     log_cpu_state_mask(LOG_GUEST_ERROR, cpu, CPU_DUMP_FPU | CPU_DUMP_CCOP);
 }
 
+static inline ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr)
+{
+    ram_addr_t ram_addr;
+
+    ram_addr = qemu_ram_addr_from_host(ptr);
+    if (ram_addr == RAM_ADDR_INVALID) {
+        error_report("Bad ram pointer %p", ptr);
+        abort();
+    }
+    return ram_addr;
+}
+
 /* NOTE: this function can trigger an exception */
 /* NOTE2: the returned address is not exactly the physical address: it
  * is actually a ram_addr_t (in system mode; the user mode emulation
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [RFC v4 27/28] cputlb: make tlb_reset_dirty safe for MTTCG
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (25 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 26/28] cputlb: tweak qemu_ram_addr_from_host_nofail reporting Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-09-08 17:34   ` Richard Henderson
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 28/28] cputlb: make tlb_flush_by_mmuidx " Alex Bennée
                   ` (2 subsequent siblings)
  29 siblings, 1 reply; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite

The main use case for tlb_reset_dirty is to set the TLB_NOTDIRTY flags
in TLB entries to force the slow-path on writes. This is used to mark
page ranges containing code which has been translated so it can be
invalidated if written to. To do this safely we need to ensure the TLB
entries in question for all vCPUs are updated before we attempt to run
the code otherwise a race could be introduced.

To achieve this we atomically set the flag in tlb_reset_dirty_range and
take care when setting it when the TLB entry is filled.

The helper function is made static as it isn't used outside of cputlb.
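
The core of the race-free update is a read/cmpxchg pair on the entry's
addr_write field (a sketch using names from the patch below):

  uintptr_t orig = atomic_mb_read(&tlb_entry->addr_write);
  if ((orig & (TLB_INVALID_MASK | TLB_MMIO | TLB_NOTDIRTY)) == 0) {
      /* if the vCPU refilled the entry in the meantime the cmpxchg
       * simply fails and we leave the fresh entry alone */
      atomic_cmpxchg(&tlb_entry->addr_write, orig, orig | TLB_NOTDIRTY);
  }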

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 cputlb.c              | 55 +++++++++++++++++++++++++++++++++++----------------
 include/exec/cputlb.h |  2 --
 2 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 945ea02..faeb195 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -275,32 +275,50 @@ void tlb_unprotect_code(ram_addr_t ram_addr)
     cpu_physical_memory_set_dirty_flag(ram_addr, DIRTY_MEMORY_CODE);
 }
 
-static bool tlb_is_dirty_ram(CPUTLBEntry *tlbe)
-{
-    return (tlbe->addr_write & (TLB_INVALID_MASK|TLB_MMIO|TLB_NOTDIRTY)) == 0;
-}
 
-void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry, uintptr_t start,
+/*
+ * Dirty write flag handling
+ *
+ * When the TCG code writes to a location it looks up the address in
+ * the TLB and uses that data to compute the final address. If any of
+ * the lower bits of the address are set then the slow path is forced.
+ * There are a number of reasons to do this but for normal RAM the
+ * most usual is detecting writes to code regions which may invalidate
+ * generated code.
+ *
+ * Because we want other vCPUs to respond to changes straight away we
+ * update the te->addr_write field atomically. If the TLB entry has
+ * been changed by the vCPU in the mean time we skip the update.
+ */
+
+static void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry, uintptr_t start,
                            uintptr_t length)
 {
-    uintptr_t addr;
+    /* paired with atomic_mb_set in tlb_set_page_with_attrs */
+    uintptr_t orig_addr = atomic_mb_read(&tlb_entry->addr_write);
+    uintptr_t addr = orig_addr;
 
-    if (tlb_is_dirty_ram(tlb_entry)) {
-        addr = (tlb_entry->addr_write & TARGET_PAGE_MASK) + tlb_entry->addend;
+    if ((addr & (TLB_INVALID_MASK | TLB_MMIO | TLB_NOTDIRTY)) == 0) {
+        addr &= TARGET_PAGE_MASK;
+        addr += atomic_read(&tlb_entry->addend);
         if ((addr - start) < length) {
-            tlb_entry->addr_write |= TLB_NOTDIRTY;
+            uintptr_t notdirty_addr = orig_addr | TLB_NOTDIRTY;
+            atomic_cmpxchg(&tlb_entry->addr_write, orig_addr, notdirty_addr);
         }
     }
 }
 
+/* This is a cross vCPU call (i.e. another vCPU resetting the flags of
+ * the target vCPU). As such care needs to be taken that we don't
+ * dangerously race with another vCPU update. The only thing actually
+ * updated is the target TLB entry ->addr_write flags.
+ */
 void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length)
 {
     CPUArchState *env;
 
     int mmu_idx;
 
-    assert_cpu_is_self(cpu);
-
     env = cpu->env_ptr;
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
         unsigned int i;
@@ -386,7 +404,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     MemoryRegionSection *section;
     unsigned int index;
     target_ulong address;
-    target_ulong code_address;
+    target_ulong code_address, write_address;
     uintptr_t addend;
     CPUTLBEntry *te;
     hwaddr iotlb, xlat, sz;
@@ -443,21 +461,24 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     } else {
         te->addr_code = -1;
     }
+
+    write_address = -1;
     if (prot & PAGE_WRITE) {
         if ((memory_region_is_ram(section->mr) && section->readonly)
             || memory_region_is_romd(section->mr)) {
             /* Write access calls the I/O callback.  */
-            te->addr_write = address | TLB_MMIO;
+            write_address = address | TLB_MMIO;
         } else if (memory_region_is_ram(section->mr)
                    && cpu_physical_memory_is_clean(
                         memory_region_get_ram_addr(section->mr) + xlat)) {
-            te->addr_write = address | TLB_NOTDIRTY;
+            write_address = address | TLB_NOTDIRTY;
         } else {
-            te->addr_write = address;
+            write_address = address;
         }
-    } else {
-        te->addr_write = -1;
     }
+
+    /* Pairs with flag setting in tlb_reset_dirty_range */
+    atomic_mb_set(&te->addr_write, write_address);
 }
 
 /* Add a new TLB entry, but without specifying the memory
diff --git a/include/exec/cputlb.h b/include/exec/cputlb.h
index d454c00..3f94178 100644
--- a/include/exec/cputlb.h
+++ b/include/exec/cputlb.h
@@ -23,8 +23,6 @@
 /* cputlb.c */
 void tlb_protect_code(ram_addr_t ram_addr);
 void tlb_unprotect_code(ram_addr_t ram_addr);
-void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry, uintptr_t start,
-                           uintptr_t length);
 extern int tlb_flush_count;
 
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [RFC v4 28/28] cputlb: make tlb_flush_by_mmuidx safe for MTTCG
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (26 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 27/28] cputlb: make tlb_reset_dirty safe for MTTCG Alex Bennée
@ 2016-08-11 15:24 ` Alex Bennée
  2016-09-07 10:09   ` Paolo Bonzini
  2016-09-08 17:54   ` Richard Henderson
  2016-08-11 17:22 ` [Qemu-devel] [RFC v4 00/28] Base enabling patches " Alex Bennée
  2016-09-06  9:24 ` Alex Bennée
  29 siblings, 2 replies; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 15:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Peter Crosthwaite

These flushes allow per-mmuidx granularity in the TLB flushing and are
currently only used by the ARM model. As it is possible to hammer the
other vCPU threads with flushes (and build up long queues of identical
flushes) we extend the mechanism used for the global tlb_flush and set a
bitmap describing all the pending flushes. The updates are done
atomically to avoid corruption of the bitmap, but repeating a flush is
certainly not a problem.
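
A sketch of the queue-once logic: only mmuidx bits not already pending
trigger a new piece of async work (names from the patch below):

  uint16_t pending = mmu_idx_bitmap &
      ~atomic_mb_read(&cpu->pending_tlb_flush);
  if (pending) {
      atomic_or(&cpu->pending_tlb_flush, pending);   /* mark as queued */
      async_run_on_cpu(cpu, tlb_flush_by_mmuidx_async_work,
                       GUINT_TO_POINTER(pending));
  }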

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 cputlb.c          | 138 ++++++++++++++++++++++++++++++++++++++++--------------
 include/qom/cpu.h |  12 ++---
 2 files changed, 108 insertions(+), 42 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index faeb195..3b741d4 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -68,6 +68,9 @@
 /* We need a solution for stuffing 64 bit pointers in 32 bit ones if
  * we care about this combination */
 QEMU_BUILD_BUG_ON(sizeof(target_ulong) > sizeof(void *));
+QEMU_BUILD_BUG_ON(NB_MMU_MODES > 16);
+
+#define ALL_MMUIDX_BITS ((1 << NB_MMU_MODES) - 1)
 
 /* statistics */
 int tlb_flush_count;
@@ -88,7 +91,7 @@ static void tlb_flush_nocheck(CPUState *cpu, int flush_global)
     env->tlb_flush_mask = 0;
     tlb_flush_count++;
 
-    atomic_mb_set(&cpu->pending_tlb_flush, false);
+    atomic_mb_set(&cpu->pending_tlb_flush, 0);
 }
 
 static void tlb_flush_global_async_work(CPUState *cpu, void *opaque)
@@ -111,7 +114,8 @@ static void tlb_flush_global_async_work(CPUState *cpu, void *opaque)
 void tlb_flush(CPUState *cpu, int flush_global)
 {
     if (cpu->created && !qemu_cpu_is_self(cpu)) {
-        if (atomic_bool_cmpxchg(&cpu->pending_tlb_flush, false, true)) {
+        if (atomic_mb_read(&cpu->pending_tlb_flush) != ALL_MMUIDX_BITS) {
+            atomic_mb_set(&cpu->pending_tlb_flush, ALL_MMUIDX_BITS);
             async_run_on_cpu(cpu, tlb_flush_global_async_work,
                              GINT_TO_POINTER(flush_global));
         }
@@ -120,35 +124,67 @@ void tlb_flush(CPUState *cpu, int flush_global)
     }
 }
 
-static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp)
+static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, void *mmu_bitmask)
 {
     CPUArchState *env = cpu->env_ptr;
+    unsigned long mmu_idx_bitmask = GPOINTER_TO_UINT(mmu_bitmask);
+    int mmu_idx;
 
     assert_cpu_is_self(cpu);
-    tlb_debug("start\n");
 
-    for (;;) {
-        int mmu_idx = va_arg(argp, int);
+    tlb_debug("start\n");
 
-        if (mmu_idx < 0) {
-            break;
-        }
+    for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
 
-        tlb_debug("%d\n", mmu_idx);
+        if (test_bit(mmu_idx, &mmu_idx_bitmask)) {
+            tlb_debug("%d\n", mmu_idx);
 
-        memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0]));
-        memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
+            memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0]));
+            memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
+        }
     }
 
     memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
 }
 
+/* Helper function to slurp va_args list into a bitmap
+ */
+static inline unsigned long make_mmu_index_bitmap(va_list args)
+{
+    unsigned long bitmap = 0;
+    int mmu_index = va_arg(args, int);
+
+    /* An empty va_list would be a bad call */
+    g_assert(mmu_index >= 0);
+
+    do {
+        set_bit(mmu_index, &bitmap);
+        mmu_index = va_arg(args, int);
+    } while (mmu_index >= 0);
+
+    return bitmap;
+}
+
 void tlb_flush_by_mmuidx(CPUState *cpu, ...)
 {
     va_list argp;
+    unsigned long mmu_idx_bitmap;
+
     va_start(argp, cpu);
-    v_tlb_flush_by_mmuidx(cpu, argp);
+    mmu_idx_bitmap = make_mmu_index_bitmap(argp);
     va_end(argp);
+
+    if (!qemu_cpu_is_self(cpu)) {
+        uint16_t pending_flushes =
+            mmu_idx_bitmap & ~atomic_mb_read(&cpu->pending_tlb_flush);
+        if (pending_flushes) {
+            atomic_or(&cpu->pending_tlb_flush, pending_flushes);
+            async_run_on_cpu(cpu, tlb_flush_by_mmuidx_async_work,
+                             GUINT_TO_POINTER(pending_flushes));
+        }
+    } else {
+        tlb_flush_by_mmuidx_async_work(cpu, GUINT_TO_POINTER(mmu_idx_bitmap));
+    }
 }
 
 static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong addr)
@@ -199,15 +235,46 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
     tb_flush_jmp_cache(cpu, addr);
 }
 
+/* As we are going to hijack the bottom bits of the page address for a
+ * mmuidx bit mask we need to fail to build if we can't do that
+ */
+QEMU_BUILD_BUG_ON(NB_MMU_MODES > TARGET_PAGE_BITS);
+
+static void tlb_flush_page_by_mmuidx_async_work(CPUState *cpu, void *data)
+{
+    CPUArchState *env = cpu->env_ptr;
+    uintptr_t page_and_mmuidx = GPOINTER_TO_UINT(data);
+    target_ulong addr = page_and_mmuidx & TARGET_PAGE_MASK;
+    int page = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+    int mmu_idx;
+    int i;
+
+    assert_cpu_is_self(cpu);
+
+    for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
+        if (test_bit(mmu_idx, &page_and_mmuidx)) {
+            tlb_flush_entry(&env->tlb_table[mmu_idx][page], addr);
+
+            /* check whether there are vtlb entries that need to be flushed */
+            for (i = 0; i < CPU_VTLB_SIZE; i++) {
+                tlb_flush_entry(&env->tlb_v_table[mmu_idx][i], addr);
+            }
+        }
+    }
+
+    tb_flush_jmp_cache(cpu, addr);
+}
+
 void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
 {
     CPUArchState *env = cpu->env_ptr;
-    int i, k;
+    unsigned long mmu_idx_bitmap;
     va_list argp;
 
     va_start(argp, addr);
+    mmu_idx_bitmap = make_mmu_index_bitmap(argp);
+    va_end(argp);
 
-    assert_cpu_is_self(cpu);
     tlb_debug("addr "TARGET_FMT_lx"\n", addr);
 
     /* Check if we need to flush due to large pages.  */
@@ -216,33 +283,32 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
                   TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
                   env->tlb_flush_addr, env->tlb_flush_mask);
 
-        v_tlb_flush_by_mmuidx(cpu, argp);
-        va_end(argp);
+
+        if (!qemu_cpu_is_self(cpu)) {
+            uint16_t pending_flushes =
+                mmu_idx_bitmap & ~atomic_mb_read(&cpu->pending_tlb_flush);
+            if (pending_flushes) {
+                atomic_or(&cpu->pending_tlb_flush, pending_flushes);
+                async_run_on_cpu(cpu, tlb_flush_by_mmuidx_async_work,
+                                 GUINT_TO_POINTER(pending_flushes));
+            }
+        } else {
+            tlb_flush_by_mmuidx_async_work(cpu,
+                                           GUINT_TO_POINTER(mmu_idx_bitmap));
+        }
         return;
     }
 
+    /* This should already be page aligned */
     addr &= TARGET_PAGE_MASK;
-    i = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+    addr |= mmu_idx_bitmap;
 
-    for (;;) {
-        int mmu_idx = va_arg(argp, int);
-
-        if (mmu_idx < 0) {
-            break;
-        }
-
-        tlb_debug("idx %d\n", mmu_idx);
-
-        tlb_flush_entry(&env->tlb_table[mmu_idx][i], addr);
-
-        /* check whether there are vltb entries that need to be flushed */
-        for (k = 0; k < CPU_VTLB_SIZE; k++) {
-            tlb_flush_entry(&env->tlb_v_table[mmu_idx][k], addr);
-        }
+    if (!qemu_cpu_is_self(cpu)) {
+        async_run_on_cpu(cpu, tlb_flush_page_by_mmuidx_async_work,
+                         GUINT_TO_POINTER(addr));
+    } else {
+        tlb_flush_page_by_mmuidx_async_work(cpu, GUINT_TO_POINTER(addr));
     }
-    va_end(argp);
-
-    tb_flush_jmp_cache(cpu, addr);
 }
 
 static void tlb_flush_page_async_work(CPUState *cpu, void *opaque)
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 6b35f37..fcec99d 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -371,17 +371,17 @@ struct CPUState {
      */
     bool throttle_thread_scheduled;
 
+    /* The pending_tlb_flush flag is set and cleared atomically to
+     * avoid potential races. The aim of the flag is to avoid
+     * unnecessary flushes.
+     */
+    uint16_t pending_tlb_flush;
+
     /* Note that this is accessed at the start of every TB via a negative
        offset from AREG0.  Leave this field at the end so as to make the
        (absolute value) offset as small as possible.  This reduces code
        size, especially for hosts without large memory offsets.  */
     uint32_t tcg_exit_req;
-
-    /* The pending_tlb_flush flag is set and cleared atomically to
-     * avoid potential races. The aim of the flag is to avoid
-     * unnecessary flushes.
-     */
-    bool pending_tlb_flush;
 };
 
 QTAILQ_HEAD(CPUTailQ, CPUState);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (27 preceding siblings ...)
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 28/28] cputlb: make tlb_flush_by_mmuidx " Alex Bennée
@ 2016-08-11 17:22 ` Alex Bennée
  2016-08-12  8:02   ` Alex Bennée
  2016-09-06  9:24 ` Alex Bennée
  29 siblings, 1 reply; 64+ messages in thread
From: Alex Bennée @ 2016-08-11 17:22 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana


Alex Bennée <alex.bennee@linaro.org> writes:

> This is the fourth iteration of the RFC patch set which aims to
> provide the basic framework for MTTCG. I hope this will provide a good
> base for discussion at KVM Forum later this month.
>
<snip>
>
> In practice the memory barrier problems don't show up with an x86
> host. In fact I have created a tree which merges in the Emilio's
> cmpxchg atomics which happily boots ARMv7 Debian systems without any
> additional changes. You can find that at:
>
>   https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4-with-cmpxchg-atomics-v2
>
<snip>
> Performance
> ===========
>
> You can't do full work-load testing on this tree due to the lack of
> atomic support (but I will run some numbers on
> mttcg/base-patches-v4-with-cmpxchg-atomics-v2).

So here is a more real-world workload run:

  retry.py called with ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm', '-machine', 'type=virt', '-display', 'none', '-smp', '1', '-m', '4096', '-cpu', 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio', '-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device', 'virtio-net-device,netdev=unet', '-drive', 'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none', '-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0 systemd.unit=benchmark-build.service root=/dev/vda1', '-kernel', '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img', '-smp', '4', '-name', 'debug-threads=on', '-accel', 'tcg,thread=single']
  run 1: ret=0 (PASS), time=261.794911 (1/1)
  run 2: ret=0 (PASS), time=257.290045 (2/2)
  run 3: ret=0 (PASS), time=256.536991 (3/3)
  run 4: ret=0 (PASS), time=254.036260 (4/4)
  run 5: ret=0 (PASS), time=256.539165 (5/5)
  Results summary:
  0: 5 times (100.00%), avg time 257.239 (8.00 variance/2.83 deviation)
  Ran command 5 times, 5 passes

  retry.py called with ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm', '-machine', 'type=virt', '-display', 'none', '-smp', '1', '-m', '4096', '-cpu', 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio', '-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device', 'virtio-net-device,netdev=unet', '-drive', 'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none', '-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0 systemd.unit=benchmark-build.service root=/dev/vda1', '-kernel', '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img', '-smp', '4', '-name', 'debug-threads=on', '-accel', 'tcg,thread=multi']
  run 1: ret=0 (PASS), time=86.597459 (1/1)
  run 2: ret=0 (PASS), time=82.843904 (2/2)
  run 3: ret=0 (PASS), time=84.095910 (3/3)
  run 4: ret=0 (PASS), time=83.844595 (4/4)
  run 5: ret=0 (PASS), time=83.594768 (5/5)
  Results summary:
  0: 5 times (100.00%), avg time 84.195 (2.02 variance/1.42 deviation)
  Ran command 5 times, 5 passes

This shows the multi-threaded run carrying roughly 30% overhead over
the ideal 4x scaling (84.2s against an ideal ~64.3s), but it is still a
decent improvement in wall time: about 3x faster than single-threaded.
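
As a sanity check on that 30% figure (a standalone sketch, with the
measured averages above hard-coded; "ideal" assumes perfect linear
scaling across the 4 vCPUs):

  #include <stdio.h>

  int main(void)
  {
      double single = 257.239;      /* avg, -accel tcg,thread=single */
      double multi  = 84.195;       /* avg, -accel tcg,thread=multi  */
      double ideal  = single / 4.0; /* perfect 4-vCPU scaling        */

      printf("speed-up: %.2fx\n", single / multi);     /* ~3.06x */
      printf("overhead: %.0f%%\n",
             100.0 * (multi - ideal) / ideal);         /* ~31%   */
      return 0;
  }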

So the test itself boots the system, then runs
benchmark-build.service:

  # A benchmark target
  #
  # This shuts down once the boot has completed

  [Unit]
  Description=Default
  Requires=basic.target
  After=basic.target
  AllowIsolate=yes

  [Service]
  Type=oneshot
  ExecStart=/root/mysrc/testcases.git/build-dir.sh /root/src/stress-ng.git/
  ExecStartPost=/sbin/poweroff

  [Install]
  WantedBy=multi-user.target

And the build-dir script is simply:

    #!/bin/sh
    # Build the tree given as $1 with one make job per host CPU
    NR_CPUS=$(grep -c ^processor /proc/cpuinfo)
    set -e
    cd "$1"
    make clean
    make -j${NR_CPUS}
    cd -

Measuring this over increasing -smp:

| -smp |    time | time as bar  | theoretical | % of -smp 1 |
|------+---------+--------------+-------------+-------------|
|    1 | 238.184 | WWWWWWWWWWWW |     238.184 |       100.0 |
|    2 | 133.402 | WWWWWWh      |     119.092 |        56.0 |
|    3 |  99.531 | WWWWH        |   79.394667 |        41.8 |
|    4 |  82.760 | WWWW:        |      59.546 |        34.7 |

--
Alex Bennée

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG
  2016-08-11 17:22 ` [Qemu-devel] [RFC v4 00/28] Base enabling patches " Alex Bennée
@ 2016-08-12  8:02   ` Alex Bennée
  0 siblings, 0 replies; 64+ messages in thread
From: Alex Bennée @ 2016-08-12  8:02 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana


Alex Bennée <alex.bennee@linaro.org> writes:

> Alex Bennée <alex.bennee@linaro.org> writes:
>
>> This is the fourth iteration of the RFC patch set which aims to
>> provide the basic framework for MTTCG. I hope this will provide a good
>> base for discussion at KVM Forum later this month.
>>
> <snip>
>>
>> In practice the memory barrier problems don't show up with an x86
>> host. In fact I have created a tree which merges in the Emilio's
>> cmpxchg atomics which happily boots ARMv7 Debian systems without any
>> additional changes. You can find that at:
>>
>>   https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4-with-cmpxchg-atomics-v2
>>
> <snip>
>> Performance
>> ===========
>>
>> You can't do full work-load testing on this tree due to the lack of
>> atomic support (but I will run some numbers on
>> mttcg/base-patches-v4-with-cmpxchg-atomics-v2).
>
> So here is a more real-world workload run:
>
>   retry.py called with ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm', '-machine', 'type=virt', '-display', 'none', '-smp', '1', '-m', '4096', '-cpu', 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio', '-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device', 'virtio-net-device,netdev=unet', '-drive', 'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none', '-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0 systemd.unit=benchmark-build.service root=/dev/vda1', '-kernel', '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img', '-smp', '4', '-name', 'debug-threads=on', '-accel', 'tcg,thread=single']
>   run 1: ret=0 (PASS), time=261.794911 (1/1)
>   run 2: ret=0 (PASS), time=257.290045 (2/2)
>   run 3: ret=0 (PASS), time=256.536991 (3/3)
>   run 4: ret=0 (PASS), time=254.036260 (4/4)
>   run 5: ret=0 (PASS), time=256.539165 (5/5)
>   Results summary:
>   0: 5 times (100.00%), avg time 257.239 (8.00 variance/2.83 deviation)
>   Ran command 5 times, 5 passes
>
>   retry.py called with ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm', '-machine', 'type=virt', '-display', 'none', '-smp', '1', '-m', '4096', '-cpu', 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio', '-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device', 'virtio-net-device,netdev=unet', '-drive', 'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none', '-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0 systemd.unit=benchmark-build.service root=/dev/vda1', '-kernel', '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img', '-smp', '4', '-name', 'debug-threads=on', '-accel', 'tcg,thread=multi']
>   run 1: ret=0 (PASS), time=86.597459 (1/1)
>   run 2: ret=0 (PASS), time=82.843904 (2/2)
>   run 3: ret=0 (PASS), time=84.095910 (3/3)
>   run 4: ret=0 (PASS), time=83.844595 (4/4)
>   run 5: ret=0 (PASS), time=83.594768 (5/5)
>   Results summary:
>   0: 5 times (100.00%), avg time 84.195 (2.02 variance/1.42 deviation)
>   Ran command 5 times, 5 passes
>
> This shows the multi-threaded run carrying roughly 30% overhead over
> the ideal 4x scaling (84.2s against an ideal ~64.3s), but it is still a
> decent improvement in wall time: about 3x faster than single-threaded.
>
> So the test itself boots the system, then runs
> benchmark-build.service:
>
>   # A benchmark target
>   #
>   # This shuts down once the boot has completed
>
>   [Unit]
>   Description=Default
>   Requires=basic.target
>   After=basic.target
>   AllowIsolate=yes
>
>   [Service]
>   Type=oneshot
>   ExecStart=/root/mysrc/testcases.git/build-dir.sh /root/src/stress-ng.git/
>   ExecStartPost=/sbin/poweroff
>
>   [Install]
>   WantedBy=multi-user.target
>
> And the build-dir script is simply:
>
>     #!/bin/sh
>     # Build the tree given as $1 with one make job per host CPU
>     NR_CPUS=$(grep -c ^processor /proc/cpuinfo)
>     set -e
>     cd "$1"
>     make clean
>     make -j${NR_CPUS}
>     cd -
>
> Measuring this over increasing -smp:


Measuring this over increasing -smp:

 -smp     time   theoretical  time as bar   x faster
-----------------------------------------------------
    1  238.184       238.184  WWWWWWWWWWWW     1.000
    2  133.402       119.092  WWWWWWh          1.785
    3   99.531        79.395  WWWWW            2.393
    4   82.760        59.546  WWWW.            2.878
    5   82.513        47.637  WWWW.            2.887
    6   78.922        39.697  WWWH             3.018
    7   87.181        34.026  WWWW;            2.732
    8   87.098        29.773  WWWW;            2.735

So a more complete analysis shows the benefits start to tail off as we
push past 4 vCPUs. However, on my machine, which has 4 cores plus 4
hyperthreads, that could be as much an artefact of the host topology as
a limit of the guest. Indeed the results start getting noisy at 7/8
vCPUs.

Interestingly a perf run against -smp 6 shows gic_update topping the
graph (3.14% of total execution time). That function does have a big
TODO for optimisation on it ;-)
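
For what it's worth the tail-off is roughly what a naive Amdahl's-law
fit predicts up to 4 vCPUs, and worse beyond that (a sketch, not part
of the series; the serial fraction is estimated from the -smp 1 and
-smp 4 rows of the table above):

  #include <stdio.h>

  int main(void)
  {
      /* measured speed-up at n = 4, from the table above */
      double s4 = 238.184 / 82.760;
      /* S(n) = 1 / (s + (1 - s) / n)  =>  s = (n / S - 1) / (n - 1) */
      double s = (4.0 / s4 - 1.0) / 3.0;   /* ~0.13 serial fraction */

      for (int n = 1; n <= 8; n++) {
          printf("n=%d: predicted %.2fx\n",
                 n, 1.0 / (s + (1.0 - s) / n));
      }
      return 0;
  }

The fit predicts ~4.2x at -smp 8 where we only measure ~2.7x, which
supports the suspicion that past 4 vCPUs we are competing for the 4
physical host cores rather than hitting a guest-side scaling limit.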

--
Alex Bennée

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG
  2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
                   ` (28 preceding siblings ...)
  2016-08-11 17:22 ` [Qemu-devel] [RFC v4 00/28] Base enabling patches " Alex Bennée
@ 2016-09-06  9:24 ` Alex Bennée
  29 siblings, 0 replies; 64+ messages in thread
From: Alex Bennée @ 2016-09-06  9:24 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana


Alex Bennée <alex.bennee@linaro.org> writes:

> This is the fourth iteration of the RFC patch set which aims to
> provide the basic framework for MTTCG. I hope this will provide a good
> base for discussion at KVM Forum later this month.

Review ping?

It would be nice to get some review feedback before I re-spin against
the latest async safe work. Unless everyone already thinks the code is
perfect as it is ;-)

--
Alex Bennée

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 04/28] cpu-exec: include cpu_index in CPU_LOG_EXEC messages
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 04/28] cpu-exec: include cpu_index in CPU_LOG_EXEC messages Alex Bennée
@ 2016-09-07  2:21   ` Richard Henderson
  0 siblings, 0 replies; 64+ messages in thread
From: Richard Henderson @ 2016-09-07  2:21 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini

On 08/11/2016 08:24 AM, Alex Bennée wrote:
> Even more important when debugging MTTCG is seeing which vCPU is
> currently executing.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  cpu-exec.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 06/28] tcg: comment on which functions have to be called with tb_lock held
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 06/28] tcg: comment on which functions have to be called with tb_lock held Alex Bennée
@ 2016-09-07  2:30   ` Richard Henderson
  0 siblings, 0 replies; 64+ messages in thread
From: Richard Henderson @ 2016-09-07  2:30 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini

On 08/11/2016 08:24 AM, Alex Bennée wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
>
> softmmu requires more functions to be thread-safe, because translation
> blocks can be invalidated from e.g. notdirty callbacks.  Probably the
> same holds for user-mode emulation; it's just that no one has ever
> tried to produce a coherent locking scheme there.
>
> This patch will guide the introduction of more tb_lock and tb_unlock
> calls for system emulation.
>
> Note that after this patch some (most) of the mentioned functions are
> still called outside tb_lock/tb_unlock.  The next one will rectify this.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 07/28] linux-user/elfload: ensure mmap_lock() held while setting up
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 07/28] linux-user/elfload: ensure mmap_lock() held while setting up Alex Bennée
@ 2016-09-07  2:34   ` Richard Henderson
  0 siblings, 0 replies; 64+ messages in thread
From: Richard Henderson @ 2016-09-07  2:34 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, jan.kiszka, Riku Voipio,
	mark.burton, serge.fdrv, pbonzini

On 08/11/2016 08:24 AM, Alex Bennée wrote:
> Future patches will enforce the holding of mmap_lock() when we are
> manipulating internal memory structures. Technically it doesn't matter
> in the case of elfload as we haven't started executing yet. However it
> is easier to grab the lock when required than to special-case the
> translate-all API.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 08/28] translate-all: Add assert_(memory|tb)_lock annotations
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 08/28] translate-all: Add assert_(memory|tb)_lock annotations Alex Bennée
@ 2016-09-07  2:41   ` Richard Henderson
  2016-09-07  7:08     ` Alex Bennée
  0 siblings, 1 reply; 64+ messages in thread
From: Richard Henderson @ 2016-09-07  2:41 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini

On 08/11/2016 08:24 AM, Alex Bennée wrote:
> This adds calls to the assert_(memory|tb)_lock for all public APIs which
> are documented as needing them held for linux-user mode. The asserts are
> NOPs for system-mode although these will be converted when MTTCG is
> enabled.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>


Reviewed-by: Richard Henderson <rth@twiddle.net>

> @@ -1111,7 +1121,7 @@ static void build_page_bitmap(PageDesc *p)
>              tb_end = tb_start + tb->size;
>              if (tb_end > TARGET_PAGE_SIZE) {
>                  tb_end = TARGET_PAGE_SIZE;
> -            }
> +             }
>          } else {

Introducing a whitespace error?


r~

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 09/28] tcg: protect TBContext with tb_lock.
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 09/28] tcg: protect TBContext with tb_lock Alex Bennée
@ 2016-09-07  2:48   ` Richard Henderson
  0 siblings, 0 replies; 64+ messages in thread
From: Richard Henderson @ 2016-09-07  2:48 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	Michael S. Tsirkin, mark.burton, Eduardo Habkost, serge.fdrv,
	pbonzini

On 08/11/2016 08:24 AM, Alex Bennée wrote:
> <more detail here>

Missing?  Or just remove it.


r~

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 11/28] tcg: move tcg_exec_all and helpers above thread fn
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 11/28] tcg: move tcg_exec_all and helpers above thread fn Alex Bennée
@ 2016-09-07  2:53   ` Richard Henderson
  0 siblings, 0 replies; 64+ messages in thread
From: Richard Henderson @ 2016-09-07  2:53 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini

On 08/11/2016 08:24 AM, Alex Bennée wrote:
> This is a purely mechanical change in preparation for up-coming
> re-factoring. Instead of relying on a forward declaration, tcg_exec_all
> and its associated helper functions are moved in front of the call site
> in qemu_tcg_cpu_thread_fn.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 13/28] tcg: add options for enabling MTTCG
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 13/28] tcg: add options for enabling MTTCG Alex Bennée
@ 2016-09-07  3:06   ` Richard Henderson
  0 siblings, 0 replies; 64+ messages in thread
From: Richard Henderson @ 2016-09-07  3:06 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini

On 08/11/2016 08:24 AM, Alex Bennée wrote:
> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> We know there will be cases where MTTCG won't work until additional work
> is done in the front/back ends to support it. It will however be useful to
> be able to turn it on.
>
> As a result MTTCG will default to off unless the combination is
> supported. However the user can turn it on for the sake of testing.
>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> [AJB: move to -accel tcg,thread=multi|single, defaults]
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 14/28] tcg: add kick timer for single-threaded vCPU emulation
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 14/28] tcg: add kick timer for single-threaded vCPU emulation Alex Bennée
@ 2016-09-07  3:25   ` Richard Henderson
  2016-09-07  5:40     ` Paolo Bonzini
  2016-09-07 10:19     ` Alex Bennée
  0 siblings, 2 replies; 64+ messages in thread
From: Richard Henderson @ 2016-09-07  3:25 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini

On 08/11/2016 08:24 AM, Alex Bennée wrote:
> Currently we rely on the side effect of the main loop grabbing the
> iothread_mutex to give any long running basic block chains a kick to
> ensure the next vCPU is scheduled. As this code is being re-factored and
> rationalised we now do it explicitly here.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
> ---
> v2
>   - re-base fixes
>   - get_ticks_per_sec() -> NANOSECONDS_PER_SEC
> v3
>   - add define for TCG_KICK_FREQ
>   - fix checkpatch warning
> v4
>   - wrap next calc in inline qemu_tcg_next_kick() instead of macro
> ---
>  cpus.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
>
> diff --git a/cpus.c b/cpus.c
> index b5b45b8..8c49d6c 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1185,9 +1185,34 @@ static void deal_with_unplugged_cpus(void)
>      }
>  }
>
> +/* Single-threaded TCG
> + *
> + * In the single-threaded case each vCPU is simulated in turn. If
> + * there is more than a single vCPU we create a simple timer to kick
> + * the vCPU and ensure we don't get stuck in a tight loop in one vCPU.
> + * This is done explicitly rather than relying on side-effects
> + * elsewhere.
> + */
> +static void qemu_cpu_kick_no_halt(void);
> +
> +#define TCG_KICK_PERIOD (NANOSECONDS_PER_SECOND / 10)
> +
> +static inline int64_t qemu_tcg_next_kick(void)
> +{
> +    return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + TCG_KICK_PERIOD;
> +}
> +
> +static void kick_tcg_thread(void *opaque)
> +{
> +    QEMUTimer *self = *(QEMUTimer **) opaque;
> +    timer_mod(self, qemu_tcg_next_kick());
> +    qemu_cpu_kick_no_halt();
> +}
> +
>  static void *qemu_tcg_cpu_thread_fn(void *arg)
>  {
>      CPUState *cpu = arg;
> +    QEMUTimer *kick_timer;
>
>      rcu_register_thread();
>
> @@ -1211,6 +1236,13 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>          }
>      }
>
> +    /* Set to kick if we have to do more than one vCPU */
> +    if (CPU_NEXT(first_cpu)) {
> +        kick_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,  kick_tcg_thread,
> +                                  &kick_timer);

I'm not especially keen on this pointer to local variable thing.  Perhaps better as

   kick_timer = timer_new_ns(..., NULL);
   kick_timer->opaque = kick_timer;

and avoid the indirection in kick_tcg_thread.
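
Spelled out, that would be something like the following (a sketch only;
it assumes setting kick_timer->opaque after timer_new_ns is an
acceptable use of the timer API):

   static void kick_tcg_thread(void *opaque)
   {
       QEMUTimer *self = opaque;    /* no pointer-to-local indirection */
       timer_mod(self, qemu_tcg_next_kick());
       qemu_cpu_kick_no_halt();
   }

   ...
   kick_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, kick_tcg_thread, NULL);
   kick_timer->opaque = kick_timer;    /* timer refers to itself */
   timer_mod(kick_timer, qemu_tcg_next_kick());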

Also, we should shut down the timer when the cpu is removed, surely?


r~

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 15/28] tcg: rename tcg_current_cpu to tcg_current_rr_cpu
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 15/28] tcg: rename tcg_current_cpu to tcg_current_rr_cpu Alex Bennée
@ 2016-09-07  3:34   ` Richard Henderson
  0 siblings, 0 replies; 64+ messages in thread
From: Richard Henderson @ 2016-09-07  3:34 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini

On 08/11/2016 08:24 AM, Alex Bennée wrote:
> ..and make the definition local to cpus. In preparation for MTTCG the
> concept of a global tcg_current_cpu will no longer make sense. However
> we still need to keep track of it in the single-threaded case to be able
> to exit quickly when required.
>
> qemu_cpu_kick_no_halt() moves and becomes qemu_cpu_kick_rr_cpu() to
> emphasise its use-case. qemu_cpu_kick now kicks the relevant cpu as
> well as qemu_kick_rr_cpu() which will become a no-op in MTTCG.
>
> For the time being the setting of the global exit_request remains.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 16/28] tcg: drop global lock during TCG code execution
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 16/28] tcg: drop global lock during TCG code execution Alex Bennée
@ 2016-09-07  4:03   ` Richard Henderson
  2016-09-07  5:43     ` Paolo Bonzini
  0 siblings, 1 reply; 64+ messages in thread
From: Richard Henderson @ 2016-09-07  4:03 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	Michael S. Tsirkin, mark.burton, Eduardo Habkost, serge.fdrv,
	pbonzini

On 08/11/2016 08:24 AM, Alex Bennée wrote:
> +    if (mr->global_locking) {
> +        qemu_mutex_lock_iothread();
> +        locked = true;
> +    }
>      memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT,
>                                  iotlbentry->attrs);
> +    if (locked) {
> +        qemu_mutex_unlock_iothread();
> +    }

I'm not keen on this pattern.

(1) Why not use recursive locks?

(2) If there's a good reason why not, then perhaps

   if (mr->global_locking) {
     qemu_mutex_lock_iothread();
     do_something;
     qemu_mutex_unlock_iothread();
   } else {
     do_something;
   }

is a better pattern to use.
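
Or, to keep the call site flat without duplicating do_something, a pair
of trivial helpers would work (just a sketch; the maybe_*_iothread
names are invented here for illustration):

   static inline bool maybe_lock_iothread(MemoryRegion *mr)
   {
       if (mr->global_locking) {
           qemu_mutex_lock_iothread();
           return true;
       }
       return false;
   }

   static inline void maybe_unlock_iothread(bool locked)
   {
       if (locked) {
           qemu_mutex_unlock_iothread();
       }
   }

   /* call site */
   bool locked = maybe_lock_iothread(mr);
   memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT,
                               iotlbentry->attrs);
   maybe_unlock_iothread(locked);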


r~

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 17/28] cpus: re-factor out handle_icount_deadline
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 17/28] cpus: re-factor out handle_icount_deadline Alex Bennée
@ 2016-09-07  4:06   ` Richard Henderson
  0 siblings, 0 replies; 64+ messages in thread
From: Richard Henderson @ 2016-09-07  4:06 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini

On 08/11/2016 08:24 AM, Alex Bennée wrote:
> In preparation for adding a MTTCG thread we re-factor out a bit of what
> will be common code to handle the QEMU_CLOCK_VIRTUAL expiration.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 18/28] tcg: remove global exit_request
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 18/28] tcg: remove global exit_request Alex Bennée
@ 2016-09-07  4:11   ` Richard Henderson
  0 siblings, 0 replies; 64+ messages in thread
From: Richard Henderson @ 2016-09-07  4:11 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini

On 08/11/2016 08:24 AM, Alex Bennée wrote:
> There are now only two uses of the global exit_request left.
>
> The first ensures we exit the run_loop when we first start to process
> pending work and in the kick handler. This is just as easily done by
> setting the first_cpu->exit_request flag.
>
> The second use is in the round robin kick routine. The global
> exit_request ensured every vCPU would set its local exit_request and
> cause a full exit of the loop. Now that the iothread isn't being held
> while running, we can just rely on the kick handler to push us out as
> intended.
>
> We lightly re-factor the main vCPU thread to ensure cpu->exit_requests
> cause us to exit the main loop and process any IO requests that might
> come along.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 20/28] cpus: tweak sleeping and safe_work rules for MTTCG
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 20/28] cpus: tweak sleeping and safe_work rules for MTTCG Alex Bennée
@ 2016-09-07  4:22   ` Richard Henderson
  2016-09-07 10:05   ` Paolo Bonzini
  1 sibling, 0 replies; 64+ messages in thread
From: Richard Henderson @ 2016-09-07  4:22 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini

On 08/11/2016 08:24 AM, Alex Bennée wrote:
> Once TCG gains the ability to sleep individual threads we need to make
> sure they don't sleep when safe work is pending as all threads need to
> go through the process_queued_work function. Also if we have multiple
> threads wait_for_safe_work can now sleep without deadlocking.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 21/28] tcg: enable tb_lock() for SoftMMU
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 21/28] tcg: enable tb_lock() for SoftMMU Alex Bennée
@ 2016-09-07  4:26   ` Richard Henderson
  2016-09-27 16:16   ` Paolo Bonzini
  1 sibling, 0 replies; 64+ messages in thread
From: Richard Henderson @ 2016-09-07  4:26 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini

On 08/11/2016 08:24 AM, Alex Bennée wrote:
> tb_lock() has long been used for linux-user mode to protect code
> generation. By enabling it now we prepare for MTTCG and ensure all code
> generation is serialised by this lock. The other major structure that
> needs protecting is the l1_map and its PageDesc structures. For the
> SoftMMU case we also use tb_lock() to protect these structures instead
> of linux-user mmap_lock() which as the name suggests serialises updates
> to the structure as a result of guest mmap operations.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 14/28] tcg: add kick timer for single-threaded vCPU emulation
  2016-09-07  3:25   ` Richard Henderson
@ 2016-09-07  5:40     ` Paolo Bonzini
  2016-09-07 10:15       ` Alex Bennée
  2016-09-07 10:19     ` Alex Bennée
  1 sibling, 1 reply; 64+ messages in thread
From: Paolo Bonzini @ 2016-09-07  5:40 UTC (permalink / raw)
  To: Richard Henderson, Alex Bennée, mttcg, qemu-devel,
	fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: peter.maydell, Peter Crosthwaite, claudio.fontana, mark.burton,
	jan.kiszka, serge.fdrv



On 07/09/2016 05:25, Richard Henderson wrote:
>>
>> +    /* Set to kick if we have to do more than one vCPU */
>> +    if (CPU_NEXT(first_cpu)) {
>> +        kick_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,  kick_tcg_thread,
>> +                                  &kick_timer);
> 
> I'm not especially keen on this pointer to local variable thing. 
> Perhaps better as
> 
>   kick_timer = timer_new_ns(..., NULL);
>   kick_timer->opaque = kick_timer;

Or put it in CPUState and pass that.

Paolo

> and avoid the indirection in kick_tcg_thread.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 16/28] tcg: drop global lock during TCG code execution
  2016-09-07  4:03   ` Richard Henderson
@ 2016-09-07  5:43     ` Paolo Bonzini
  2016-09-07  6:43       ` Richard Henderson
  0 siblings, 1 reply; 64+ messages in thread
From: Paolo Bonzini @ 2016-09-07  5:43 UTC (permalink / raw)
  To: Richard Henderson, Alex Bennée, mttcg, qemu-devel,
	fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: peter.maydell, Eduardo Habkost, Michael S. Tsirkin,
	claudio.fontana, Peter Crosthwaite, mark.burton, jan.kiszka,
	serge.fdrv



On 07/09/2016 06:03, Richard Henderson wrote:
> 
>> +    if (mr->global_locking) {
>> +        qemu_mutex_lock_iothread();
>> +        locked = true;
>> +    }
>>      memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT,
>>                                  iotlbentry->attrs);
>> +    if (locked) {
>> +        qemu_mutex_unlock_iothread();
>> +    }
> 
> I'm not keen on this pattern.
> 
> (1) Why not use recursive locks?

Probably I'm biased by looking at this code too long, but... how would
they help?

> (2) If there's a good reason why not, then perhaps
> 
>   if (mr->global_locking) {
>     qemu_mutex_lock_iothread();
>     do_something;
>     qemu_mutex_unlock_iothread();
>   } else {
>     do_something;
>   }

I went with the other one because memory_region_dispatch_read takes
quite a few arguments, but this can work too if Alex prefers it.

Paolo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 16/28] tcg: drop global lock during TCG code execution
  2016-09-07  5:43     ` Paolo Bonzini
@ 2016-09-07  6:43       ` Richard Henderson
  2016-09-07 15:15         ` Paolo Bonzini
  0 siblings, 1 reply; 64+ messages in thread
From: Richard Henderson @ 2016-09-07  6:43 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, mttcg, qemu-devel, fred.konrad,
	a.rigo, cota, bobby.prani, nikunj
  Cc: peter.maydell, Eduardo Habkost, Michael S. Tsirkin,
	claudio.fontana, Peter Crosthwaite, mark.burton, jan.kiszka,
	serge.fdrv

On 09/06/2016 10:43 PM, Paolo Bonzini wrote:
>
>
> On 07/09/2016 06:03, Richard Henderson wrote:
>>
>>> +    if (mr->global_locking) {
>>> +        qemu_mutex_lock_iothread();
>>> +        locked = true;
>>> +    }
>>>      memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT,
>>>                                  iotlbentry->attrs);
>>> +    if (locked) {
>>> +        qemu_mutex_unlock_iothread();
>>> +    }
>>
>> I'm not keen on this pattern.
>>
>> (1) Why not use recursive locks?
>
> Probably I'm biased by looking at this code too long, but... how would
> they help?

You wouldn't check to see if the lock is already taken.


r~

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 08/28] translate-all: Add assert_(memory|tb)_lock annotations
  2016-09-07  2:41   ` Richard Henderson
@ 2016-09-07  7:08     ` Alex Bennée
  0 siblings, 0 replies; 64+ messages in thread
From: Alex Bennée @ 2016-09-07  7:08 UTC (permalink / raw)
  To: Richard Henderson
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani,
	nikunj, peter.maydell, claudio.fontana, Peter Crosthwaite,
	jan.kiszka, mark.burton, serge.fdrv, pbonzini


Richard Henderson <rth@twiddle.net> writes:

> On 08/11/2016 08:24 AM, Alex Bennée wrote:
>> This adds calls to the assert_(memory|tb)_lock for all public APIs which
>> are documented as needing them held for linux-user mode. The asserts are
>> NOPs for system-mode although these will be converted when MTTCG is
>> enabled.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
>
> Reviewed-by: Richard Henderson <rth@twiddle.net>
>
>> @@ -1111,7 +1121,7 @@ static void build_page_bitmap(PageDesc *p)
>>              tb_end = tb_start + tb->size;
>>              if (tb_end > TARGET_PAGE_SIZE) {
>>                  tb_end = TARGET_PAGE_SIZE;
>> -            }
>> +             }
>>          } else {
>
> Introducing a whitespace error?

Oops, that sneaked in.

--
Alex Bennée

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 20/28] cpus: tweak sleeping and safe_work rules for MTTCG
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 20/28] cpus: tweak sleeping and safe_work rules for MTTCG Alex Bennée
  2016-09-07  4:22   ` Richard Henderson
@ 2016-09-07 10:05   ` Paolo Bonzini
  1 sibling, 0 replies; 64+ messages in thread
From: Paolo Bonzini @ 2016-09-07 10:05 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, rth



On 11/08/2016 17:24, Alex Bennée wrote:
> Once TCG gains the ability to sleep individual threads we need to make
> sure they don't sleep when safe work is pending as all threads need to
> go through the process_queued_work function. Also if we have multiple
> threads wait_for_safe_work can now sleep without deadlocking.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> 
> ---
> v4
>   - new for v4, ontop of async-safe-work-v5
> ---
>  cpu-exec-common.c | 15 ++++++++++++++-
>  cpus.c            |  2 +-
>  include/qom/cpu.h |  8 ++++++++
>  3 files changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/cpu-exec-common.c b/cpu-exec-common.c
> index e29cf6d..84cb789 100644
> --- a/cpu-exec-common.c
> +++ b/cpu-exec-common.c
> @@ -83,6 +83,18 @@ QemuCond qemu_exclusive_cond;
>  
>  static int safe_work_pending;
>  
> +/* No vCPUs can sleep while there is safe work pending as we need
> + * everything to finish up in process_cpu_work.
> + */
> +bool cpu_has_queued_work(CPUState *cpu)
> +{
> +    if (cpu->queued_work || atomic_mb_read(&safe_work_pending) > 0) {
> +        return true;
> +    } else {
> +        return false;
> +    }
> +}

This should not be needed anymore with the new version.  If a safe work
item is pending, its CPU will have cpu->queued_work != NULL.  The work
item will then run as soon as the CPU is scheduled because no other CPUs
can be inside TCG.

Paolo

>  #ifdef CONFIG_USER_ONLY
>  #define can_wait_for_safe() (1)
>  #else
> @@ -91,7 +103,8 @@ static int safe_work_pending;
>   * all vCPUs are in the same thread. This will change for MTTCG
>   * however.
>   */
> -#define can_wait_for_safe() (0)
> +extern int smp_cpus;
> +#define can_wait_for_safe() (mttcg_enabled && smp_cpus > 1)
>  #endif
>  
>  void wait_safe_cpu_work(void)
> diff --git a/cpus.c b/cpus.c
> index 8a40a08..4fc5e4c 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -88,7 +88,7 @@ bool cpu_is_stopped(CPUState *cpu)
>  
>  static bool cpu_thread_is_idle(CPUState *cpu)
>  {
> -    if (cpu->stop || cpu->queued_work) {
> +    if (cpu->stop || cpu_has_queued_work(cpu)) {
>          return false;
>      }
>      if (cpu_is_stopped(cpu)) {
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 5ecbd29..d0db846 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -661,6 +661,14 @@ void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
>  void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
>  
>  /**
> + * cpu_has_queued_work:
> + * @cpu: The vCPU to check
> + *
> + * Returns true if there is *_run_on_cpu work to be done.
> + */
> +bool cpu_has_queued_work(CPUState *cpu);
> +
> +/**
>   * qemu_get_cpu:
>   * @index: The CPUState@cpu_index value of the CPU to obtain.
>   *
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 25/28] cputlb: introduce tlb_flush_* async work.
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 25/28] cputlb: introduce tlb_flush_* async work Alex Bennée
@ 2016-09-07 10:08   ` Paolo Bonzini
  2016-09-08 17:23   ` Richard Henderson
  1 sibling, 0 replies; 64+ messages in thread
From: Paolo Bonzini @ 2016-09-07 10:08 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, rth



On 11/08/2016 17:24, Alex Bennée wrote:
> +    if (cpu->created && !qemu_cpu_is_self(cpu)) {

Is the cpu->created check necessary?  It may introduce some potential races
and doesn't really add much.

> +        if (atomic_bool_cmpxchg(&cpu->pending_tlb_flush, false, true)) {

This is slightly cheaper:

	if (!atomic_xchg(&cpu->pending_tlb_flush, true)) {
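
The two are equivalent for this flag, since only the false -> true
transition matters; xchg just stores unconditionally and hands back the
old value, so there is no compare step (a sketch of both forms):

    /* cmpxchg: swap only if currently false; true if we swapped */
    if (atomic_bool_cmpxchg(&cpu->pending_tlb_flush, false, true)) {
        /* flag was clear, so queue the flush */
    }

    /* xchg: always store true; the old value says if it was set */
    if (!atomic_xchg(&cpu->pending_tlb_flush, true)) {
        /* flag was clear, so queue the flush */
    }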

Paolo

> +            async_run_on_cpu(cpu, tlb_flush_global_async_work,
> +                             GINT_TO_POINTER(flush_global));
> +        }
> +    } else {
> +        tlb_flush_nocheck(cpu, flush_global);
> +    }

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 28/28] cputlb: make tlb_flush_by_mmuidx safe for MTTCG
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 28/28] cputlb: make tlb_flush_by_mmuidx " Alex Bennée
@ 2016-09-07 10:09   ` Paolo Bonzini
  2016-09-08 17:54   ` Richard Henderson
  1 sibling, 0 replies; 64+ messages in thread
From: Paolo Bonzini @ 2016-09-07 10:09 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, rth



On 11/08/2016 17:24, Alex Bennée wrote:
> -        if (atomic_bool_cmpxchg(&cpu->pending_tlb_flush, false, true)) {
> +        if (atomic_mb_read(&cpu->pending_tlb_flush) != ALL_MMUIDX_BITS) {
> +            atomic_mb_set(&cpu->pending_tlb_flush, ALL_MMUIDX_BITS);

This can use atomic_xchg as well.  atomic_bool_cmpxchg seems unnecessary
then.  (Not that I have anything against it, but there are no users yet :)).

Paolo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 14/28] tcg: add kick timer for single-threaded vCPU emulation
  2016-09-07  5:40     ` Paolo Bonzini
@ 2016-09-07 10:15       ` Alex Bennée
  0 siblings, 0 replies; 64+ messages in thread
From: Alex Bennée @ 2016-09-07 10:15 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Richard Henderson, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj, peter.maydell, Peter Crosthwaite,
	claudio.fontana, mark.burton, jan.kiszka, serge.fdrv


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 07/09/2016 05:25, Richard Henderson wrote:
>>>
>>> +    /* Set to kick if we have to do more than one vCPU */
>>> +    if (CPU_NEXT(first_cpu)) {
>>> +        kick_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,  kick_tcg_thread,
>>> +                                  &kick_timer);
>>
>> I'm not especially keen on this pointer to local variable thing.
>> Perhaps better as
>>
>>   kick_timer = timer_new_ns(..., NULL);
>>   kick_timer->opaque = kick_timer;
>
> Or put it in CPUState and pass that.

I was trying to avoid expanding CPUState for something that is only
required in one mode of operation. However I appreciate that abusing
the stack is a little magical, even though we never leave the function.

If setting kick_timer->opaque directly doesn't violate the interface
then I'll do that.

>
> Paolo
>
>> and avoid the indirection in kick_tcg_thread.


--
Alex Bennée

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 14/28] tcg: add kick timer for single-threaded vCPU emulation
  2016-09-07  3:25   ` Richard Henderson
  2016-09-07  5:40     ` Paolo Bonzini
@ 2016-09-07 10:19     ` Alex Bennée
  1 sibling, 0 replies; 64+ messages in thread
From: Alex Bennée @ 2016-09-07 10:19 UTC (permalink / raw)
  To: Richard Henderson
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani,
	nikunj, peter.maydell, claudio.fontana, Peter Crosthwaite,
	jan.kiszka, mark.burton, serge.fdrv, pbonzini


Richard Henderson <rth@twiddle.net> writes:

> On 08/11/2016 08:24 AM, Alex Bennée wrote:
>> Currently we rely on the side effect of the main loop grabbing the
>> iothread_mutex to give any long running basic block chains a kick to
>> ensure the next vCPU is scheduled. As this code is being re-factored and
>> rationalised we now do it explicitly here.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>
>> ---
>> v2
>>   - re-base fixes
>>   - get_ticks_per_sec() -> NANOSECONDS_PER_SEC
>> v3
>>   - add define for TCG_KICK_FREQ
>>   - fix checkpatch warning
>> v4
>>   - wrap next calc in inline qemu_tcg_next_kick() instead of macro
>> ---
>>  cpus.c | 32 ++++++++++++++++++++++++++++++++
>>  1 file changed, 32 insertions(+)
>>
>> diff --git a/cpus.c b/cpus.c
>> index b5b45b8..8c49d6c 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -1185,9 +1185,34 @@ static void deal_with_unplugged_cpus(void)
>>      }
>>  }
>>
>> +/* Single-threaded TCG
>> + *
>> + * In the single-threaded case each vCPU is simulated in turn. If
>> + * there is more than a single vCPU we create a simple timer to kick
>> + * the vCPU and ensure we don't get stuck in a tight loop in one vCPU.
>> + * This is done explicitly rather than relying on side-effects
>> + * elsewhere.
>> + */
>> +static void qemu_cpu_kick_no_halt(void);
>> +
>> +#define TCG_KICK_PERIOD (NANOSECONDS_PER_SECOND / 10)
>> +
>> +static inline int64_t qemu_tcg_next_kick(void)
>> +{
>> +    return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + TCG_KICK_PERIOD;
>> +}
>> +
>> +static void kick_tcg_thread(void *opaque)
>> +{
>> +    QEMUTimer *self = *(QEMUTimer **) opaque;
>> +    timer_mod(self, qemu_tcg_next_kick());
>> +    qemu_cpu_kick_no_halt();
>> +}
>> +
>>  static void *qemu_tcg_cpu_thread_fn(void *arg)
>>  {
>>      CPUState *cpu = arg;
>> +    QEMUTimer *kick_timer;
>>
>>      rcu_register_thread();
>>
>> @@ -1211,6 +1236,13 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>>          }
>>      }
>>
>> +    /* Set to kick if we have to do more than one vCPU */
>> +    if (CPU_NEXT(first_cpu)) {
>> +        kick_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,  kick_tcg_thread,
>> +                                  &kick_timer);
>
> I'm not especially keen on this pointer to local variable thing.  Perhaps better as
>
>    kick_timer = timer_new_ns(..., NULL);
>    kick_timer->opaque = kick_timer;

Yeah it is a bit less magical that way.

>
> and avoid the indirection in kick_tcg_thread.
>
> Also, we should shut down the timer when the cpu is removed, surely?

That is an interesting point - but the timer is shared across all CPUs
so we need to keep kicking until there is only one CPU left. AFAICT the
current cpu hot plug code basically just suspends the vCPU forever
rather than trying to clean up all its associated resources.

>
>
> r~


--
Alex Bennée

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 16/28] tcg: drop global lock during TCG code execution
  2016-09-07  6:43       ` Richard Henderson
@ 2016-09-07 15:15         ` Paolo Bonzini
  0 siblings, 0 replies; 64+ messages in thread
From: Paolo Bonzini @ 2016-09-07 15:15 UTC (permalink / raw)
  To: Richard Henderson, Alex Bennée, mttcg, qemu-devel,
	fred.konrad, a.rigo, cota, bobby.prani, nikunj
  Cc: peter.maydell, Eduardo Habkost, Peter Crosthwaite,
	claudio.fontana, Michael S. Tsirkin, mark.burton, jan.kiszka,
	serge.fdrv



On 07/09/2016 08:43, Richard Henderson wrote:
> On 09/06/2016 10:43 PM, Paolo Bonzini wrote:
>>
>>
>> On 07/09/2016 06:03, Richard Henderson wrote:
>>>
>>>> +    if (mr->global_locking) {
>>>> +        qemu_mutex_lock_iothread();
>>>> +        locked = true;
>>>> +    }
>>>>      memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT,
>>>>                                  iotlbentry->attrs);
>>>> +    if (locked) {
>>>> +        qemu_mutex_unlock_iothread();
>>>> +    }
>>>
>>> I'm not keen on this pattern.
>>>
>>> (1) Why not use recursive locks?
>>
>> Probably I'm biased by looking at this code too long, but... how would
>> they help?
> 
> You wouldn't check to see if the lock is already taken.

But it's only using qemu_mutex_iothread_locked() at unlock time, when we
exit from longjmp.  Here it's looking at a MemoryRegion to see if the
MMIO callbacks need the lock or not (since most of them do).

Paolo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 23/28] atomic: introduce cmpxchg_bool
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 23/28] atomic: introduce cmpxchg_bool Alex Bennée
@ 2016-09-08  0:12   ` Richard Henderson
  0 siblings, 0 replies; 64+ messages in thread
From: Richard Henderson @ 2016-09-08  0:12 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, jan.kiszka, mark.burton,
	serge.fdrv, pbonzini

On 08/11/2016 08:24 AM, Alex Bennée wrote:
> +#define atomic_bool_cmpxchg(ptr, old, new)                              \
> +    ({                                                                  \
> +    typeof(*ptr) _old = (old), _new = (new);                            \

Needs updating for typeof_strip_qual.
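
i.e. the first declaration line would become something like this
(sketch; the rest of the macro as posted stays the same):

    typeof_strip_qual(*ptr) _old = (old), _new = (new);                 \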


r~

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 24/28] cputlb: add assert_cpu_is_self checks
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 24/28] cputlb: add assert_cpu_is_self checks Alex Bennée
@ 2016-09-08 17:19   ` Richard Henderson
  0 siblings, 0 replies; 64+ messages in thread
From: Richard Henderson @ 2016-09-08 17:19 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini

On 08/11/2016 08:24 AM, Alex Bennée wrote:
> For SoftMMU the TLB flushes are an example of a task that can be
> triggered on one vCPU by another. To deal with this properly we need to
> use safe work to ensure these changes are done safely. The new assert
> can be enabled while debugging to catch these cases.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  cputlb.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 25/28] cputlb: introduce tlb_flush_* async work.
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 25/28] cputlb: introduce tlb_flush_* async work Alex Bennée
  2016-09-07 10:08   ` Paolo Bonzini
@ 2016-09-08 17:23   ` Richard Henderson
  1 sibling, 0 replies; 64+ messages in thread
From: Richard Henderson @ 2016-09-08 17:23 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini

On 08/11/2016 08:24 AM, Alex Bennée wrote:
> + * Since QEMU doesn't currently implement a global/not-global flag
> + * for tlb entries, at the moment tlb_flush() will also flush all
> + * tlb entries in the flush_global == false case. This is OK because
> + * CPU architectures generally permit an implementation to drop
> + * entries from the TLB at any time, so flushing more entries than
> + * required is only an efficiency issue, not a correctness issue.
> + */
> +void tlb_flush(CPUState *cpu, int flush_global)
> +{
> +    if (cpu->created && !qemu_cpu_is_self(cpu)) {
> +        if (atomic_bool_cmpxchg(&cpu->pending_tlb_flush, false, true)) {
> +            async_run_on_cpu(cpu, tlb_flush_global_async_work,
> +                             GINT_TO_POINTER(flush_global));

Given that we don't actually do anything with flush_global, let's not work so
hard to pass it down.  Especially with something as ugly as GINT_TO_POINTER.

Or indeed, as a cleanup, remove that argument from all callers.  If we want to
retain the documentation for the targets, we could do

#define tlb_flush_local   tlb_flush
#define tlb_flush_global  tlb_flush


r~

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 26/28] cputlb: tweak qemu_ram_addr_from_host_nofail reporting
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 26/28] cputlb: tweak qemu_ram_addr_from_host_nofail reporting Alex Bennée
@ 2016-09-08 17:24   ` Richard Henderson
  0 siblings, 0 replies; 64+ messages in thread
From: Richard Henderson @ 2016-09-08 17:24 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini

On 08/11/2016 08:24 AM, Alex Bennée wrote:
> This moves the helper function closer to where it is called and updates
> the error message to report via error_report instead of the deprecated
> fprintf.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  cputlb.c | 24 ++++++++++++------------
>  1 file changed, 12 insertions(+), 12 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 27/28] cputlb: make tlb_reset_dirty safe for MTTCG
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 27/28] cputlb: make tlb_reset_dirty safe for MTTCG Alex Bennée
@ 2016-09-08 17:34   ` Richard Henderson
  0 siblings, 0 replies; 64+ messages in thread
From: Richard Henderson @ 2016-09-08 17:34 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini

On 08/11/2016 08:24 AM, Alex Bennée wrote:
> The main use case for tlb_reset_dirty is to set the TLB_NOTDIRTY flags
> in TLB entries to force the slow-path on writes. This is used to mark
> page ranges containing code which has been translated so it can be
> invalidated if written to. To do this safely we need to ensure the TLB
> entries in question for all vCPUs are updated before we attempt to run
> the code otherwise a race could be introduced.
> 
> To achieve this we atomically set the flag in tlb_reset_dirty_range and
> take care when setting it when the TLB entry is filled.
> 
> The helper function is made static as it isn't used outside of cputlb.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  cputlb.c              | 55 +++++++++++++++++++++++++++++++++++----------------
>  include/exec/cputlb.h |  2 --
>  2 files changed, 38 insertions(+), 19 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 28/28] cputlb: make tlb_flush_by_mmuidx safe for MTTCG
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 28/28] cputlb: make tlb_flush_by_mmuidx " Alex Bennée
  2016-09-07 10:09   ` Paolo Bonzini
@ 2016-09-08 17:54   ` Richard Henderson
  1 sibling, 0 replies; 64+ messages in thread
From: Richard Henderson @ 2016-09-08 17:54 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, pbonzini

On 08/11/2016 08:24 AM, Alex Bennée wrote:
> +    for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
>  
> -        tlb_debug("%d\n", mmu_idx);
> +        if (test_bit(mmu_idx, &mmu_idx_bitmask)) {
> +            tlb_debug("%d\n", mmu_idx);
>  
> -        memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0]));
> -        memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
> +            memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0]));
> +            memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
> +        }

Maybe better with a

  for (; bitmask; bitmask &= bitmask - 1) {
      int mmu_idx = ctz32(bitmask);

sort of loop?  That would certainly help if bitmasks covering less than
all modes are at all common.

If an all-modes bitmask is common, perhaps we should special case that with a
single much larger memset.
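
Fleshed out, the set-bits-only loop would look something like this
(just a sketch; it assumes the bitmask fits the unsigned type that
ctz32 expects):

   unsigned long work = mmu_idx_bitmask;

   /* work &= work - 1 clears the lowest set bit each iteration */
   for (; work; work &= work - 1) {
       int mmu_idx = ctz32(work);

       tlb_debug("%d\n", mmu_idx);
       memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0]));
       memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
   }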


r~

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [RFC v4 19/28] tcg: move locking for tb_invalidate_phys_page_range up
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 19/28] tcg: move locking for tb_invalidate_phys_page_range up Alex Bennée
@ 2016-09-27 15:56   ` Paolo Bonzini
  0 siblings, 0 replies; 64+ messages in thread
From: Paolo Bonzini @ 2016-09-27 15:56 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, rth



On 11/08/2016 17:24, Alex Bennée wrote:
> In the linux-user case all things that involve 'l1_map' and PageDesc
> tweaks are protected by the memory lock (mmap_lock). For SoftMMU mode
> we previously relied on single-threaded behaviour; with MTTCG we now
> use the tb_lock().
> 
> As a result we need to do a little re-factoring and push the taking of
> this lock up the call tree. This requires a slightly different entry for
> the SoftMMU and user-mode cases from tb_invalidate_phys_range.

What exactly requires pushing the lock up?  IIUC it's really just for
the code bitmap, but then the commit message should say so.

> This also means user-mode breakpoint insertion needs to take two locks
> but it hadn't taken any previously so this is an improvement.

Was it really a problem?  Walking the l1_map needs no lock---only
growing it and accessing the PageDesc does.

Paolo

> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> 
> ---
> v4
>  - reword commit message
> ---
>  exec.c          | 16 ++++++++++++++++
>  translate-all.c | 37 +++++++++++++++++++++++++++++--------
>  2 files changed, 45 insertions(+), 8 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index a39a200..f418725 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -722,7 +722,11 @@ void cpu_exec_init(CPUState *cpu, Error **errp)
>  #if defined(CONFIG_USER_ONLY)
>  static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
>  {
> +    mmap_lock();
> +    tb_lock();
>      tb_invalidate_phys_page_range(pc, pc + 1, 0);
> +    tb_unlock();
> +    mmap_unlock();
>  }
>  #else
>  static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
> @@ -731,6 +735,7 @@ static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
>      hwaddr phys = cpu_get_phys_page_attrs_debug(cpu, pc, &attrs);
>      int asidx = cpu_asidx_from_attrs(cpu, attrs);
>      if (phys != -1) {
> +        /* Locks grabbed by tb_invalidate_phys_addr */
>          tb_invalidate_phys_addr(cpu->cpu_ases[asidx].as,
>                                  phys | (pc & ~TARGET_PAGE_MASK));
>      }
> @@ -2009,7 +2014,11 @@ ram_addr_t qemu_ram_addr_from_host(void *ptr)
>  static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
>                                 uint64_t val, unsigned size)
>  {
> +    bool locked = false;
> +
>      if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
> +        locked = true;
> +        tb_lock();
>          tb_invalidate_phys_page_fast(ram_addr, size);
>      }
>      switch (size) {
> @@ -2025,6 +2034,11 @@ static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
>      default:
>          abort();
>      }
> +
> +    if (locked) {
> +        tb_unlock();
> +    }
> +
>      /* Set both VGA and migration bits for simplicity and to remove
>       * the notdirty callback faster.
>       */
> @@ -2505,7 +2519,9 @@ static void invalidate_and_set_dirty(MemoryRegion *mr, hwaddr addr,
>              cpu_physical_memory_range_includes_clean(addr, length, dirty_log_mask);
>      }
>      if (dirty_log_mask & (1 << DIRTY_MEMORY_CODE)) {
> +        tb_lock();
>          tb_invalidate_phys_range(addr, addr + length);
> +        tb_unlock();
>          dirty_log_mask &= ~(1 << DIRTY_MEMORY_CODE);
>      }
>      cpu_physical_memory_set_dirty_range(addr, length, dirty_log_mask);
> diff --git a/translate-all.c b/translate-all.c
> index c650696..c53ae8c 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -1363,12 +1363,11 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>   * access: the virtual CPU will exit the current TB if code is modified inside
>   * this TB.
>   *
> - * Called with mmap_lock held for user-mode emulation
> + * Called with mmap_lock held for user-mode emulation, grabs tb_lock
> + * Called with tb_lock held for system-mode emulation
>   */
> -void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
> +static void tb_invalidate_phys_range_1(tb_page_addr_t start, tb_page_addr_t end)
>  {
> -    assert_memory_lock();
> -
>      while (start < end) {
>          tb_invalidate_phys_page_range(start, end, 0);
>          start &= TARGET_PAGE_MASK;
> @@ -1376,6 +1375,21 @@ void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
>      }
>  }
>  
> +#ifdef CONFIG_SOFTMMU
> +void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
> +{
> +    assert_tb_lock();
> +    tb_invalidate_phys_range_1(start, end);
> +}
> +#else
> +void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
> +{
> +    assert_memory_lock();
> +    tb_lock();
> +    tb_invalidate_phys_range_1(start, end);
> +    tb_unlock();
> +}
> +#endif
>  /*
>   * Invalidate all TBs which intersect with the target physical address range
>   * [start;end[. NOTE: start and end must refer to the *same* physical page.
> @@ -1383,7 +1397,8 @@ void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
>   * access: the virtual CPU will exit the current TB if code is modified inside
>   * this TB.
>   *
> - * Called with mmap_lock held for user-mode emulation
> + * Called with tb_lock/mmap_lock held for user-mode emulation
> + * Called with tb_lock held for system-mode emulation
>   */
>  void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>                                     int is_cpu_write_access)
> @@ -1406,6 +1421,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>  #endif /* TARGET_HAS_PRECISE_SMC */
>  
>      assert_memory_lock();
> +    assert_tb_lock();
>  
>      p = page_find(start >> TARGET_PAGE_BITS);
>      if (!p) {
> @@ -1420,7 +1436,6 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>      /* we remove all the TBs in the range [start, end[ */
>      /* XXX: see if in some cases it could be faster to invalidate all
>         the code */
> -    tb_lock();
>      tb = p->first_tb;
>      while (tb != NULL) {
>          n = (uintptr_t)tb & 3;
> @@ -1480,12 +1495,12 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>          cpu_loop_exit_noexc(cpu);
>      }
>  #endif
> -    tb_unlock();
>  }
>  
>  #ifdef CONFIG_SOFTMMU
>  /* len must be <= 8 and start must be a multiple of len.
> - * Called via softmmu_template.h, with iothread mutex not held.
> + * Called via softmmu_template.h when code areas are written to with
> + * tb_lock held.
>   */
>  void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
>  {
> @@ -1500,6 +1515,8 @@ void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
>                    (intptr_t)cpu_single_env->segs[R_CS].base);
>      }
>  #endif
> +    assert_memory_lock();
> +
>      p = page_find(start >> TARGET_PAGE_BITS);
>      if (!p) {
>          return;
> @@ -1547,6 +1564,8 @@ static bool tb_invalidate_phys_page(tb_page_addr_t addr, uintptr_t pc)
>      uint32_t current_flags = 0;
>  #endif
>  
> +    assert_memory_lock();
> +
>      addr &= TARGET_PAGE_MASK;
>      p = page_find(addr >> TARGET_PAGE_BITS);
>      if (!p) {
> @@ -1650,7 +1669,9 @@ void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr)
>          return;
>      }
>      ram_addr = memory_region_get_ram_addr(mr) + addr;
> +    tb_lock();
>      tb_invalidate_phys_page_range(ram_addr, ram_addr + 1, 0);
> +    tb_unlock();
>      rcu_read_unlock();
>  }
>  #endif /* !defined(CONFIG_USER_ONLY) */
> 
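
To make the post-patch convention concrete, a sketch of a hypothetical
softmmu-side caller (invalidate_code_region is invented; the functions
it uses are the ones touched by the diff above):

  /* Softmmu: the caller takes tb_lock around the invalidation,
   * mirroring the invalidate_and_set_dirty() hunk above. In user
   * mode the caller would hold mmap_lock instead and
   * tb_invalidate_phys_range() grabs tb_lock internally.
   */
  static void invalidate_code_region(ram_addr_t start, ram_addr_t length)
  {
      tb_lock();
      tb_invalidate_phys_range(start, start + length);
      tb_unlock();
  }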


* Re: [Qemu-devel] [RFC v4 21/28] tcg: enable tb_lock() for SoftMMU
  2016-08-11 15:24 ` [Qemu-devel] [RFC v4 21/28] tcg: enable tb_lock() for SoftMMU Alex Bennée
  2016-09-07  4:26   ` Richard Henderson
@ 2016-09-27 16:16   ` Paolo Bonzini
  1 sibling, 0 replies; 64+ messages in thread
From: Paolo Bonzini @ 2016-09-27 16:16 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani, nikunj
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, serge.fdrv, rth



On 11/08/2016 17:24, Alex Bennée wrote:
> tb_lock() has long been used for linux-user mode to protect code
> generation. By enabling it now we prepare for MTTCG and ensure all code
> generation is serialised by this lock. The other major structure that
> needs protecting is the l1_map and its PageDesc structures. For the
> SoftMMU case we also use tb_lock() to protect these structures instead
> of the linux-user mmap_lock(), which, as the name suggests, serialises
> updates to these structures as a result of guest mmap operations.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> 
> ---
> v4
>   - split from main tcg: enable thread-per-vCPU patch
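
As a concrete (hypothetical) illustration of that scheme, a helper
touching the page table could assert whichever lock applies, using the
assert_tb_lock()/assert_memory_lock() annotations from earlier in the
series:

  /* Sketch only: which lock guards l1_map/PageDesc depends on the
   * build, per the commit message above.
   */
  static inline void assert_page_table_locked(void)
  {
  #ifdef CONFIG_SOFTMMU
      assert_tb_lock();          /* softmmu: tb_lock */
  #else
      assert_memory_lock();      /* user mode: mmap_lock */
  #endif
  }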

Is it possible to extend this patch so that tb_lock/tb_unlock are no-ops
unless tcg_enabled()?  Maybe even move tb_lock/tb_unlock to tcg/tcg.h as
inlines, leaving only the actual TCG locking path in translate-all.c...
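
Something along these lines, perhaps (a sketch of the suggestion;
tb_lock_impl()/tb_unlock_impl() are hypothetical names for the
out-of-line path that would remain in translate-all.c):

  /* tcg/tcg.h -- tb_lock/tb_unlock become no-ops unless TCG is
   * active; only the real locking path stays in translate-all.c.
   */
  void tb_lock_impl(void);
  void tb_unlock_impl(void);

  static inline void tb_lock(void)
  {
      if (tcg_enabled()) {
          tb_lock_impl();
      }
  }

  static inline void tb_unlock(void)
  {
      if (tcg_enabled()) {
          tb_unlock_impl();
      }
  }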

Thanks,

Paolo



Thread overview: 64+ messages
2016-08-11 15:23 [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG Alex Bennée
2016-08-11 15:23 ` [Qemu-devel] [RFC v4 01/28] cpus: make all_vcpus_paused() return bool Alex Bennée
2016-08-11 15:23 ` [Qemu-devel] [RFC v4 02/28] translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH Alex Bennée
2016-08-11 15:23 ` [Qemu-devel] [RFC v4 03/28] translate-all: add DEBUG_LOCKING asserts Alex Bennée
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 04/28] cpu-exec: include cpu_index in CPU_LOG_EXEC messages Alex Bennée
2016-09-07  2:21   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 05/28] docs: new design document multi-thread-tcg.txt (DRAFTING) Alex Bennée
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 06/28] tcg: comment on which functions have to be called with tb_lock held Alex Bennée
2016-09-07  2:30   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 07/28] linux-user/elfload: ensure mmap_lock() held while setting up Alex Bennée
2016-09-07  2:34   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 08/28] translate-all: Add assert_(memory|tb)_lock annotations Alex Bennée
2016-09-07  2:41   ` Richard Henderson
2016-09-07  7:08     ` Alex Bennée
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 09/28] tcg: protect TBContext with tb_lock Alex Bennée
2016-09-07  2:48   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 10/28] target-arm/arm-powerctl: wake up sleeping CPUs Alex Bennée
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 11/28] tcg: move tcg_exec_all and helpers above thread fn Alex Bennée
2016-09-07  2:53   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 12/28] tcg: cpus rm tcg_exec_all() Alex Bennée
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 13/28] tcg: add options for enabling MTTCG Alex Bennée
2016-09-07  3:06   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 14/28] tcg: add kick timer for single-threaded vCPU emulation Alex Bennée
2016-09-07  3:25   ` Richard Henderson
2016-09-07  5:40     ` Paolo Bonzini
2016-09-07 10:15       ` Alex Bennée
2016-09-07 10:19     ` Alex Bennée
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 15/28] tcg: rename tcg_current_cpu to tcg_current_rr_cpu Alex Bennée
2016-09-07  3:34   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 16/28] tcg: drop global lock during TCG code execution Alex Bennée
2016-09-07  4:03   ` Richard Henderson
2016-09-07  5:43     ` Paolo Bonzini
2016-09-07  6:43       ` Richard Henderson
2016-09-07 15:15         ` Paolo Bonzini
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 17/28] cpus: re-factor out handle_icount_deadline Alex Bennée
2016-09-07  4:06   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 18/28] tcg: remove global exit_request Alex Bennée
2016-09-07  4:11   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 19/28] tcg: move locking for tb_invalidate_phys_page_range up Alex Bennée
2016-09-27 15:56   ` Paolo Bonzini
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 20/28] cpus: tweak sleeping and safe_work rules for MTTCG Alex Bennée
2016-09-07  4:22   ` Richard Henderson
2016-09-07 10:05   ` Paolo Bonzini
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 21/28] tcg: enable tb_lock() for SoftMMU Alex Bennée
2016-09-07  4:26   ` Richard Henderson
2016-09-27 16:16   ` Paolo Bonzini
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 22/28] tcg: enable thread-per-vCPU Alex Bennée
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 23/28] atomic: introduce cmpxchg_bool Alex Bennée
2016-09-08  0:12   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 24/28] cputlb: add assert_cpu_is_self checks Alex Bennée
2016-09-08 17:19   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 25/28] cputlb: introduce tlb_flush_* async work Alex Bennée
2016-09-07 10:08   ` Paolo Bonzini
2016-09-08 17:23   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 26/28] cputlb: tweak qemu_ram_addr_from_host_nofail reporting Alex Bennée
2016-09-08 17:24   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 27/28] cputlb: make tlb_reset_dirty safe for MTTCG Alex Bennée
2016-09-08 17:34   ` Richard Henderson
2016-08-11 15:24 ` [Qemu-devel] [RFC v4 28/28] cputlb: make tlb_flush_by_mmuidx " Alex Bennée
2016-09-07 10:09   ` Paolo Bonzini
2016-09-08 17:54   ` Richard Henderson
2016-08-11 17:22 ` [Qemu-devel] [RFC v4 00/28] Base enabling patches " Alex Bennée
2016-08-12  8:02   ` Alex Bennée
2016-09-06  9:24 ` Alex Bennée
