* [Qemu-devel] [RFC v2 00/11] Base enabling patches for MTTCG
@ 2016-04-05 15:32 Alex Bennée
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 01/11] tcg: move tb_invalidated_flag to CPUState Alex Bennée
                   ` (10 more replies)
  0 siblings, 11 replies; 66+ messages in thread
From: Alex Bennée @ 2016-04-05 15:32 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: peter.maydell, claudio.fontana, jan.kiszka, mark.burton,
	qemu-devel, pbonzini, Alex Bennée, rth

Hi,

This is the second iteration of the RFC patch set that aims to provide
the basic framework for MTTCG. There wasn't much feedback on the first
posting, but there have been some changes:

  - fixed bugs in single-thread mode w.r.t shutdown
  - tested multi-thread mode against (non)locking-test
  - included docs/multi-thread-tcg.txt design document
  - clean-ups to attribution and s-o-b tags
  - dropped tcg: move tb_find_fast outside the tb_lock critical section
    - this was breaking the pxe tests in make check
  - pulled in tcg: move tb_invalidated_flag to CPUState (from Serge)
  - dropped cpu-exec: elide more icount code if CONFIG_USER_ONLY (to Serge)

The biggest changes were to the last patch, which enables
multi-threading, to fix various bugs. The single-thread case was
failing to shut down due to not bringing the system to wait on the
halt_cond once no vCPUs could be scheduled any more. Conversely, the
multi-threaded mode was crashing on the power-up of secondary vCPUs,
as the reset of vCPUs at the start would leave the thread spinning
instead of waiting on the halt_cond until PSCI had cleanly enabled the
vCPU.

The branch can be found at:

  https://github.com/stsquad/qemu/tree/mttcg/base-patches-v2

It has been built up on top of:

  https://github.com/stsquad/qemu/tree/mttcg/tb-and-tcg-cleanups

This is just a collection of Serge's TCG clean-ups not directly tied
to MTTCG, which will hopefully get merged in due course, well before
these patches.

What's next
===========

This series should now behave as normal in the default single-threaded
mode - any regressions are bugs that I'd like to know about.

Multi-threaded mode works for any guests that don't make assumptions
about atomicity, memory consistency or cache flushes - basically
simple kvm-unit-test cases. I've been using the simple locking-test:

  ./arm/run ./arm/locking-test.flat -smp 4 [-tcg mttcg=on]

Default:
real    0m42.335s
user    0m42.296s
sys     0m0.024s

Multi-threaded:
real    1m32.928s
user    2m15.704s
sys     3m4.840s

Obviously the performance in this simple test case is a regression
(mainly due to lock contention). This is to be expected, and something
I hope can be addressed as this series is built upon with ways of
reducing lock contention and safely handling the various invalidation
operations.

I hope this works as a good base for further MTTCG development. My
next step will be to re-build the remaining patches from Fred's
multi_tcg_v8 so the changes to enable ARMv7 on x86 can be reviewed.
This basically involves adding:

  - async_safe_work and associated flushes
  - atomic safety
  - defaulting to on

Obviously I await the various trees from Serge, Alvise and Emilio with interest ;-)

In the meantime any review comments gratefully received.

Cheers,

Alex


Alex Bennée (5):
  cpus: make all_vcpus_paused() return bool
  docs: new design document multi-thread-tcg.txt (DRAFTING)
  target-arm/psci.c: wake up sleeping CPUs
  tcg: cpus rm tcg_exec_all()
  tcg: add kick timer for single-threaded vCPU emulation

Jan Kiszka (1):
  tcg: drop global lock during TCG code execution

KONRAD Frederic (3):
  tcg: protect TBContext with tb_lock.
  tcg: add options for enabling MTTCG
  tcg: enable thread-per-vCPU

Paolo Bonzini (2):
  tcg: move tb_invalidated_flag to CPUState
  tcg: comment on which functions have to be called with tb_lock held

 cpu-exec-common.c         |   1 -
 cpu-exec.c                |  46 +++---
 cpus.c                    | 357 +++++++++++++++++++++++++++++++---------------
 cputlb.c                  |   1 +
 docs/multi-thread-tcg.txt | 184 ++++++++++++++++++++++++
 exec.c                    |  16 +++
 hw/i386/kvmvapic.c        |   6 +
 include/exec/exec-all.h   |   7 +-
 include/qom/cpu.h         |  19 +++
 include/sysemu/cpus.h     |   2 +
 memory.c                  |   2 +
 qemu-options.hx           |  14 ++
 softmmu_template.h        |  17 +++
 target-arm/psci.c         |   2 +
 target-i386/smm_helper.c  |   7 +
 tcg/tcg.h                 |   2 +
 translate-all.c           |  89 ++++++++----
 vl.c                      |  12 +-
 18 files changed, 613 insertions(+), 171 deletions(-)
 create mode 100644 docs/multi-thread-tcg.txt

-- 
2.7.4

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [Qemu-devel] [RFC v2 01/11] tcg: move tb_invalidated_flag to CPUState
  2016-04-05 15:32 [Qemu-devel] [RFC v2 00/11] Base enabling patches for MTTCG Alex Bennée
@ 2016-04-05 15:32 ` Alex Bennée
  2016-04-05 15:44   ` Paolo Bonzini
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 02/11] cpus: make all_vcpus_paused() return bool Alex Bennée
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 66+ messages in thread
From: Alex Bennée @ 2016-04-05 15:32 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: peter.maydell, claudio.fontana, Sergey Fedorov,
	Peter Crosthwaite, jan.kiszka, mark.burton, qemu-devel, pbonzini,
	Alex Bennée, Andreas Färber, rth

From: Paolo Bonzini <pbonzini@redhat.com>

This is a baby step towards making tb_flush thread safe.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
[AJB: use bool]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v2
  - use bool properly
---
 cpu-exec.c              | 11 +++++------
 include/exec/exec-all.h |  2 --
 include/qom/cpu.h       |  2 ++
 translate-all.c         |  6 ++----
 4 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 3694234..74065d9 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -205,10 +205,11 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
     if (max_cycles > CF_COUNT_MASK)
         max_cycles = CF_COUNT_MASK;
 
+    cpu->tb_invalidated_flag = false;
     tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
                      max_cycles | CF_NOCACHE
                          | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
-    tb->orig_tb = tcg_ctx.tb_ctx.tb_invalidated_flag ? NULL : orig_tb;
+    tb->orig_tb = cpu->tb_invalidated_flag ? NULL : orig_tb;
     cpu->current_tb = tb;
     /* execute the generated code */
     trace_exec_tb_nocache(tb, tb->pc);
@@ -229,8 +230,6 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
     unsigned int h;
     tb_page_addr_t phys_pc, phys_page1;
 
-    tcg_ctx.tb_ctx.tb_invalidated_flag = 0;
-
     /* find translated block using physical mappings */
     phys_pc = get_page_addr_code(env, pc);
     phys_page1 = phys_pc & TARGET_PAGE_MASK;
@@ -303,6 +302,7 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
 #endif
 
     /* if no translated code available, then translate it now */
+    cpu->tb_invalidated_flag = false;
     tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
 
 #ifdef CONFIG_USER_ONLY
@@ -509,12 +509,11 @@ int cpu_exec(CPUState *cpu)
                 tb = tb_find_fast(cpu);
                 /* Note: we do it here to avoid a gcc bug on Mac OS X when
                    doing it in tb_find_slow */
-                if (tcg_ctx.tb_ctx.tb_invalidated_flag) {
+                if (cpu->tb_invalidated_flag) {
                     /* as some TB could have been invalidated because
-                       of memory exceptions while generating the code, we
+                       of a tb_flush while generating the code, we
                        must recompute the hash index here */
                     next_tb = 0;
-                    tcg_ctx.tb_ctx.tb_invalidated_flag = 0;
                 }
                 /* see if we can patch the calling TB. When the TB
                    spans two pages, we cannot safely do a direct
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 6151a62..bbd9807 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -302,8 +302,6 @@ struct TBContext {
     /* statistics */
     int tb_flush_count;
     int tb_phys_invalidate_count;
-
-    int tb_invalidated_flag;
 };
 
 void tb_free(TranslationBlock *tb);
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index b7a10f7..6931db9 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -238,6 +238,7 @@ struct kvm_run;
  * @crash_occurred: Indicates the OS reported a crash (panic) for this CPU
  * @tcg_exit_req: Set to force TCG to stop executing linked TBs for this
  *           CPU and return to its top level loop.
+ * @tb_invalidated_flag: Set to tell TCG that tb_flush has been called.
  * @singlestep_enabled: Flags for single-stepping.
  * @icount_extra: Instructions until next timer event.
  * @icount_decr: Number of cycles left, with interrupt flag in high bit.
@@ -289,6 +290,7 @@ struct CPUState {
     bool stopped;
     bool crash_occurred;
     bool exit_request;
+    bool tb_invalidated_flag;
     uint32_t interrupt_request;
     int singlestep_enabled;
     int64_t icount_extra;
diff --git a/translate-all.c b/translate-all.c
index 56498e0..d923008 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -841,6 +841,7 @@ void tb_flush(CPUState *cpu)
     tcg_ctx.tb_ctx.nb_tbs = 0;
 
     CPU_FOREACH(cpu) {
+        cpu->tb_invalidated_flag = true;
         memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
     }
 
@@ -1009,12 +1010,11 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
         invalidate_page_bitmap(p);
     }
 
-    tcg_ctx.tb_ctx.tb_invalidated_flag = 1;
-
     /* remove the TB from the hash list */
     h = tb_jmp_cache_hash_func(tb->pc);
     CPU_FOREACH(cpu) {
         if (cpu->tb_jmp_cache[h] == tb) {
+            cpu->tb_invalidated_flag = true;
             cpu->tb_jmp_cache[h] = NULL;
         }
     }
@@ -1176,8 +1176,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         /* cannot fail at this point */
         tb = tb_alloc(pc);
         assert(tb != NULL);
-        /* Don't forget to invalidate previous TB info.  */
-        tcg_ctx.tb_ctx.tb_invalidated_flag = 1;
     }
 
     gen_code_buf = tcg_ctx.code_gen_ptr;
-- 
2.7.4


* [Qemu-devel] [RFC v2 02/11] cpus: make all_vcpus_paused() return bool
  2016-04-05 15:32 [Qemu-devel] [RFC v2 00/11] Base enabling patches for MTTCG Alex Bennée
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 01/11] tcg: move tb_invalidated_flag to CPUState Alex Bennée
@ 2016-04-05 15:32 ` Alex Bennée
  2016-04-11 12:48   ` Sergey Fedorov
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 03/11] docs: new design document multi-thread-tcg.txt (DRAFTING) Alex Bennée
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 66+ messages in thread
From: Alex Bennée @ 2016-04-05 15:32 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, qemu-devel, pbonzini, Alex Bennée, rth

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 cpus.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/cpus.c b/cpus.c
index 8ae4777..e118fdf 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1248,17 +1248,17 @@ void qemu_mutex_unlock_iothread(void)
     qemu_mutex_unlock(&qemu_global_mutex);
 }
 
-static int all_vcpus_paused(void)
+static bool all_vcpus_paused(void)
 {
     CPUState *cpu;
 
     CPU_FOREACH(cpu) {
         if (!cpu->stopped) {
-            return 0;
+            return false;
         }
     }
 
-    return 1;
+    return true;
 }
 
 void pause_all_vcpus(void)
-- 
2.7.4


* [Qemu-devel] [RFC v2 03/11] docs: new design document multi-thread-tcg.txt (DRAFTING)
  2016-04-05 15:32 [Qemu-devel] [RFC v2 00/11] Base enabling patches for MTTCG Alex Bennée
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 01/11] tcg: move tb_invalidated_flag to CPUState Alex Bennée
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 02/11] cpus: make all_vcpus_paused() return bool Alex Bennée
@ 2016-04-05 15:32 ` Alex Bennée
  2016-04-11 20:00   ` Sergey Fedorov
  2016-05-06 11:25   ` Sergey Fedorov
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 04/11] tcg: comment on which functions have to be called with tb_lock held Alex Bennée
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 66+ messages in thread
From: Alex Bennée @ 2016-04-05 15:32 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: peter.maydell, claudio.fontana, jan.kiszka, mark.burton,
	qemu-devel, pbonzini, Alex Bennée, rth

This is a current DRAFT of a design proposal for upgrading TCG emulation
to take advantage of modern CPUs by running a thread-per-CPU. The
document goes through the various areas of the code affected by such a
change and proposes design requirements for each part of the solution.

It has been written *without* explicit reference to the current ongoing
efforts to introduce this feature. The hope is that we can review and
discuss the design choices without assuming that the choices taken by
the current implementation are correct.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1
  - initial version
v2
  - update discussion on locks
  - bit more detail on vCPU scheduling
  - explicitly mention Translation Blocks
  - emulated hardware state already covered by iomutex
  - a few minor rewords
---
 docs/multi-thread-tcg.txt | 184 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 184 insertions(+)
 create mode 100644 docs/multi-thread-tcg.txt

diff --git a/docs/multi-thread-tcg.txt b/docs/multi-thread-tcg.txt
new file mode 100644
index 0000000..32e2f46
--- /dev/null
+++ b/docs/multi-thread-tcg.txt
@@ -0,0 +1,184 @@
+Copyright (c) 2015 Linaro Ltd.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.  See
+the COPYING file in the top-level directory.
+
+STATUS: DRAFTING
+
+Introduction
+============
+
+This document outlines the design for multi-threaded TCG emulation.
+The original TCG implementation was single-threaded and dealt with
+multiple CPUs with simple round-robin scheduling. This simplified a
+lot of things but became increasingly limited as systems being
+emulated gained additional cores and per-core performance gains for
+host systems started to level off.
+
+vCPU Scheduling
+===============
+
+We introduce a new running mode where each vCPU will run on its own
+user-space thread. This will be enabled by default for all
+FE/BE combinations that have had the required work done to support
+this safely.
+
+In the general case of running translated code there should be no
+inter-vCPU dependencies and all vCPUs should be able to run at full
+speed. Synchronisation will only be required while accessing internal
+shared data structures or when the emulated architecture requires a
+coherent representation of the emulated machine state.
+
+Shared Data Structures
+======================
+
+Global TCG State
+----------------
+
+We need to protect the entire code generation cycle including any post
+generation patching of the translated code. This also implies a shared
+translation buffer which contains code running on all cores. Any
+execution path that comes to the main run loop will need to hold a
+mutex for code generation. This also includes times when we need to
+flush code or jumps from the tb_cache.
+
+DESIGN REQUIREMENT: Add locking around all code generation, patching
+and jump cache modification.
+
+Translation Blocks
+------------------
+
+Currently the whole system shares a single code generation buffer
+which when full will force a flush of all translations and start from
+scratch again.
+
+Once a basic block has been translated it will continue to be used
+until it is invalidated. These invalidation events are typically due
+to page changes in system emulation and changes in memory mapping in
+user mode. Debugging operations can also trigger invalidations.
+
+The invalidation also requires removing the TB from look-ups
+(tb_phys_hash and tb_jmp_cache) as well as removing any direct TB to TB
+patched jumps.
+
+DESIGN REQUIREMENT: Safely handle invalidation of TBs
+
+Memory maps and TLBs
+--------------------
+
+The memory handling code is fairly critical to the speed of memory
+access in the emulated system.
+
+  - Memory regions (dividing up access to PIO, MMIO and RAM)
+  - Dirty page tracking (for code gen, migration and display)
+  - Virtual TLB (for translating guest address->real address)
+
+There is both a fast path walked by the generated code and a slow
+path when resolution is required. When the TLB tables are updated we
+need to ensure they are done in a safe way by bringing all executing
+threads to a halt before making the modifications.
+
+DESIGN REQUIREMENTS:
+
+  - TLB Flush All/Page
+    - can be across-CPUs
+    - will need all other CPUs brought to a halt
+  - TLB Update (update a CPUTLBEntry, via tlb_set_page_with_attrs)
+    - This is a per-CPU table - by definition can't race
+    - updated by its own thread when the slow-path is forced
+
+Emulated hardware state
+-----------------------
+
+Currently, thanks to the KVM work, any access to IO memory is
+automatically protected by the global iothread mutex. Any IO region
+that doesn't use the global mutex is expected to do its own locking.
+
+Memory Consistency
+==================
+
+Between emulated guests and host systems there are a range of memory
+consistency models. While emulating weakly ordered systems on strongly
+ordered hosts shouldn't cause any problems, the same is not true for
+the reverse setup.
+
+The proposed design currently does not address the problem of
+emulating strong ordering on a weakly ordered host although even on
+strongly ordered systems software should be using synchronisation
+primitives to ensure correct operation.
+
+Memory Barriers
+---------------
+
+Barriers (sometimes known as fences) provide a mechanism for software
+to enforce a particular ordering of memory operations from the point
+of view of external observers (e.g. another processor core). They can
+apply to all memory operations, or to just loads or just stores.
+
+The Linux kernel has an excellent write-up on the various forms of
+memory barrier and the guarantees they can provide [1].
+
+Barriers are often wrapped around synchronisation primitives to
+provide explicit memory ordering semantics. However they can be used
+by themselves to provide safe lockless access by ensuring for example
+a signal flag will always be set after a payload.
+
+DESIGN REQUIREMENT: Add a new tcg_memory_barrier op
+
+This would enforce a strong load/store ordering so all loads/stores
+complete at the memory barrier. On single-core non-SMP strongly
+ordered backends this could become a NOP.
+
+There may be a case for further refinement if this causes performance
+bottlenecks.
+
+Memory Control and Maintenance
+------------------------------
+
+This includes a class of instructions for controlling system cache
+behaviour. While QEMU doesn't model cache behaviour these instructions
+are often seen when code modification has taken place to ensure the
+changes take effect.
+
+Synchronisation Primitives
+--------------------------
+
+There are two broad types of synchronisation primitives found in
+modern ISAs: atomic instructions and exclusive regions.
+
+The first type offers a simple atomic instruction which guarantees
+that some sort of test and conditional store will be truly atomic w.r.t.
+other cores sharing access to the memory. The classic example is the
+x86 cmpxchg instruction.
+
+The second type offers a pair of load/store instructions which
+guarantee that a region of memory has not been touched between the
+load and store instructions. An example of this is ARM's ldrex/strex
+pair where the strex instruction will return a flag indicating a
+successful store only if no other CPU has accessed the memory region
+since the ldrex.
+
+Traditionally TCG has generated a series of operations that work
+because they are within the context of a single translation block, so
+they will have completed before another CPU is scheduled. However, with
+the ability to have multiple threads running to emulate multiple CPUs
+we will need to explicitly expose these semantics.
+
+DESIGN REQUIREMENTS:
+ - atomics
+   - Introduce some atomic TCG ops for the common semantics
+   - The default fallback helper function will use qemu_atomics
+   - Each backend can then add a more efficient implementation
+ - load/store exclusive
+   [AJB:
+        There are currently a number of proposals of interest:
+     - Greensocs tweaks to ldst ex (using locks)
+     - Slow-path for atomic instruction translation [2]
+     - Helper-based Atomic Instruction Emulation (AIE) [3]
+    ]
+
+==========
+
+[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/Documentation/memory-barriers.txt
+[2] http://thread.gmane.org/gmane.comp.emulators.qemu/334561
+[3] http://thread.gmane.org/gmane.comp.emulators.qemu/335297
-- 
2.7.4


* [Qemu-devel] [RFC v2 04/11] tcg: comment on which functions have to be called with tb_lock held
  2016-04-05 15:32 [Qemu-devel] [RFC v2 00/11] Base enabling patches for MTTCG Alex Bennée
                   ` (2 preceding siblings ...)
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 03/11] docs: new design document multi-thread-tcg.txt (DRAFTING) Alex Bennée
@ 2016-04-05 15:32 ` Alex Bennée
  2016-05-05 14:19   ` Sergey Fedorov
  2016-05-06 18:22   ` Sergey Fedorov
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 05/11] tcg: protect TBContext with tb_lock Alex Bennée
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 66+ messages in thread
From: Alex Bennée @ 2016-04-05 15:32 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, qemu-devel, pbonzini, Alex Bennée,
	Andreas Färber, rth

From: Paolo Bonzini <pbonzini@redhat.com>

softmmu requires more functions to be thread-safe, because translation
blocks can be invalidated from e.g. notdirty callbacks.  Probably the
same holds for user-mode emulation; it's just that no one has ever
tried to produce a coherent locking scheme there.

This patch will guide the introduction of more tb_lock and tb_unlock
calls for system emulation.

Note that after this patch some (most) of the mentioned functions are
still called outside tb_lock/tb_unlock.  The next one will rectify this.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1(ajb):
  - just s-o-b
v2
  - clarify write lock on tb_jump_cache
---
 exec.c                  |  1 +
 include/exec/exec-all.h |  1 +
 include/qom/cpu.h       |  3 +++
 tcg/tcg.h               |  2 ++
 translate-all.c         | 32 ++++++++++++++++++++++++++------
 5 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/exec.c b/exec.c
index f46e596..17f390e 100644
--- a/exec.c
+++ b/exec.c
@@ -826,6 +826,7 @@ int cpu_breakpoint_insert(CPUState *cpu, vaddr pc, int flags,
 {
     CPUBreakpoint *bp;
 
+    /* TODO: locking (RCU?) */
     bp = g_malloc(sizeof(*bp));
 
     bp->pc = pc;
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index bbd9807..60716ae 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -386,6 +386,7 @@ static inline void tb_set_jmp_target(TranslationBlock *tb,
 
 #endif
 
+/* Called with tb_lock held.  */
 static inline void tb_add_jump(TranslationBlock *tb, int n,
                                TranslationBlock *tb_next)
 {
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 6931db9..13eeaae 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -306,7 +306,10 @@ struct CPUState {
 
     void *env_ptr; /* CPUArchState */
     struct TranslationBlock *current_tb;
+
+    /* Writes protected by tb_lock, reads not thread-safe  */
     struct TranslationBlock *tb_jmp_cache[TB_JMP_CACHE_SIZE];
+
     struct GDBRegisterState *gdb_regs;
     int gdb_num_regs;
     int gdb_num_g_regs;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index ea4ff02..a46d17c 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -609,6 +609,7 @@ static inline bool tcg_op_buf_full(void)
 
 /* pool based memory allocation */
 
+/* tb_lock must be held for tcg_malloc_internal. */
 void *tcg_malloc_internal(TCGContext *s, int size);
 void tcg_pool_reset(TCGContext *s);
 void tcg_pool_delete(TCGContext *s);
@@ -617,6 +618,7 @@ void tb_lock(void);
 void tb_unlock(void);
 void tb_lock_reset(void);
 
+/* Called with tb_lock held.  */
 static inline void *tcg_malloc(int size)
 {
     TCGContext *s = &tcg_ctx;
diff --git a/translate-all.c b/translate-all.c
index d923008..a7ff5e7 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -248,7 +248,9 @@ static int encode_search(TranslationBlock *tb, uint8_t *block)
     return p - block;
 }
 
-/* The cpu state corresponding to 'searched_pc' is restored.  */
+/* The cpu state corresponding to 'searched_pc' is restored.
+ * Called with tb_lock held.
+ */
 static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
                                      uintptr_t searched_pc)
 {
@@ -306,8 +308,10 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t retaddr)
         if (tb->cflags & CF_NOCACHE) {
             /* one-shot translation, invalidate it immediately */
             cpu->current_tb = NULL;
+            tb_lock();
             tb_phys_invalidate(tb, -1);
             tb_free(tb);
+            tb_unlock();
         }
         return true;
     }
@@ -399,6 +403,7 @@ static void page_init(void)
 }
 
 /* If alloc=1:
+ * Called with tb_lock held for system emulation.
  * Called with mmap_lock held for user-mode emulation.
  */
 static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
@@ -754,8 +759,12 @@ bool tcg_enabled(void)
     return tcg_ctx.code_gen_buffer != NULL;
 }
 
-/* Allocate a new translation block. Flush the translation buffer if
-   too many translation blocks or too much generated code. */
+/*
+ * Allocate a new translation block. Flush the translation buffer if
+ * too many translation blocks or too much generated code.
+ *
+ * Called with tb_lock held.
+ */
 static TranslationBlock *tb_alloc(target_ulong pc)
 {
     TranslationBlock *tb;
@@ -769,6 +778,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
     return tb;
 }
 
+/* Called with tb_lock held.  */
 void tb_free(TranslationBlock *tb)
 {
     /* In practice this is mostly used for single use temporary TB
@@ -874,7 +884,10 @@ static void tb_invalidate_check(target_ulong address)
     }
 }
 
-/* verify that all the pages have correct rights for code */
+/* verify that all the pages have correct rights for code
+ *
+ * Called with tb_lock held.
+ */
 static void tb_page_check(void)
 {
     TranslationBlock *tb;
@@ -985,7 +998,10 @@ static inline void tb_jmp_unlink(TranslationBlock *tb)
     }
 }
 
-/* invalidate one TB */
+/* invalidate one TB
+ *
+ * Called with tb_lock held.
+ */
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 {
     CPUState *cpu;
@@ -1062,6 +1078,7 @@ static void build_page_bitmap(PageDesc *p)
 
 /* add the tb in the target page and protect it if necessary
  *
+ * Called with tb_lock held.
  * Called with mmap_lock held for user-mode emulation.
  */
 static inline void tb_alloc_page(TranslationBlock *tb,
@@ -1429,7 +1446,9 @@ void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
     }
     if (!p->code_bitmap &&
         ++p->code_write_count >= SMC_BITMAP_USE_THRESHOLD) {
-        /* build code bitmap */
+        /* build code bitmap.  FIXME: writes should be protected by
+         * tb_lock, reads by tb_lock or RCU.
+         */
         build_page_bitmap(p);
     }
     if (p->code_bitmap) {
@@ -1571,6 +1590,7 @@ void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr)
 }
 #endif /* !defined(CONFIG_USER_ONLY) */
 
+/* Called with tb_lock held.  */
 void tb_check_watchpoint(CPUState *cpu)
 {
     TranslationBlock *tb;
-- 
2.7.4


* [Qemu-devel] [RFC v2 05/11] tcg: protect TBContext with tb_lock.
  2016-04-05 15:32 [Qemu-devel] [RFC v2 00/11] Base enabling patches for MTTCG Alex Bennée
                   ` (3 preceding siblings ...)
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 04/11] tcg: comment on which functions have to be called with tb_lock held Alex Bennée
@ 2016-04-05 15:32 ` Alex Bennée
  2016-05-11 12:45   ` Sergey Fedorov
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 06/11] target-arm/psci.c: wake up sleeping CPUs Alex Bennée
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 66+ messages in thread
From: Alex Bennée @ 2016-04-05 15:32 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	Michael S. Tsirkin, mark.burton, qemu-devel, Eduardo Habkost,
	pbonzini, Alex Bennée, rth

From: KONRAD Frederic <fred.konrad@greensocs.com>

This protects TBContext with tb_lock to make tb_* thread safe.

We can still have issues with tb_flush in the case of multi-threaded
TCG: another CPU can be executing code during a flush.

This can be fixed later by making all other TCG threads exit before
calling tb_flush().

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Message-Id: <1439220437-23957-8-git-send-email-fred.konrad@greensocs.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[AJB: moved into tree, clean-up history]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v2 (base-patches, ajb):
  - re-base fixes
v7 (FK, MTTCG):
  - Drop a tb_lock in already locked restore_state_to_opc.
v6 (FK, MTTCG):
  - Drop a tb_lock around tb_find_fast in cpu-exec.c.
---
 cpu-exec.c         |  8 +++++++-
 exec.c             |  3 +++
 hw/i386/kvmvapic.c |  3 +++
 translate-all.c    | 32 +++++++++++++++++++++++++-------
 4 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 74065d9..bd50fef 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -205,18 +205,24 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
     if (max_cycles > CF_COUNT_MASK)
         max_cycles = CF_COUNT_MASK;
 
+    tb_lock();
     cpu->tb_invalidated_flag = false;
     tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
                      max_cycles | CF_NOCACHE
                          | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
     tb->orig_tb = cpu->tb_invalidated_flag ? NULL : orig_tb;
     cpu->current_tb = tb;
+    tb_unlock();
+
     /* execute the generated code */
     trace_exec_tb_nocache(tb, tb->pc);
-    cpu_tb_exec(cpu, tb);
+    cpu_tb_exec(cpu, tb->tc_ptr);
+
+    tb_lock();
     cpu->current_tb = NULL;
     tb_phys_invalidate(tb, -1);
     tb_free(tb);
+    tb_unlock();
 }
 #endif
 
diff --git a/exec.c b/exec.c
index 17f390e..c46c123 100644
--- a/exec.c
+++ b/exec.c
@@ -2111,6 +2111,9 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
                     continue;
                 }
                 cpu->watchpoint_hit = wp;
+
+                /* Unlocked by cpu_loop_exit or cpu_resume_from_signal.  */
+                tb_lock();
                 tb_check_watchpoint(cpu);
                 if (wp->flags & BP_STOP_BEFORE_ACCESS) {
                     cpu->exception_index = EXCP_DEBUG;
diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index c69f374..7c0d542 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -14,6 +14,7 @@
 #include "sysemu/kvm.h"
 #include "hw/i386/apic_internal.h"
 #include "hw/sysbus.h"
+#include "tcg/tcg.h"
 
 #define VAPIC_IO_PORT           0x7e
 
@@ -446,6 +447,8 @@ static void patch_instruction(VAPICROMState *s, X86CPU *cpu, target_ulong ip)
     resume_all_vcpus();
 
     if (!kvm_enabled()) {
+        /* Unlocked by cpu_resume_from_signal.  */
+        tb_lock();
         cs->current_tb = NULL;
         tb_gen_code(cs, current_pc, current_cs_base, current_flags, 1);
         cpu_resume_from_signal(cs, NULL);
diff --git a/translate-all.c b/translate-all.c
index a7ff5e7..935d24c 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -834,7 +834,9 @@ static void page_flush_tb(void)
 }
 
 /* flush all the translation blocks */
-/* XXX: tb_flush is currently not thread safe */
+/* XXX: tb_flush is currently not thread safe.  System emulation calls it only
+ * with tb_lock taken or from safe_work, so no need to take tb_lock here.
+ */
 void tb_flush(CPUState *cpu)
 {
 #if defined(DEBUG_FLUSH)
@@ -1350,6 +1352,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
     /* we remove all the TBs in the range [start, end[ */
     /* XXX: see if in some cases it could be faster to invalidate all
        the code */
+    tb_lock();
     tb = p->first_tb;
     while (tb != NULL) {
         n = (uintptr_t)tb & 3;
@@ -1417,12 +1420,13 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
     if (current_tb_modified) {
         /* we generate a block containing just the instruction
            modifying the memory. It will ensure that it cannot modify
-           itself */
+           itself.  cpu_resume_from_signal unlocks tb_lock.  */
         cpu->current_tb = NULL;
         tb_gen_code(cpu, current_pc, current_cs_base, current_flags, 1);
         cpu_resume_from_signal(cpu, NULL);
     }
 #endif
+    tb_unlock();
 }
 
 #ifdef CONFIG_SOFTMMU
@@ -1489,6 +1493,8 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
     if (!p) {
         return;
     }
+
+    tb_lock();
     tb = p->first_tb;
 #ifdef TARGET_HAS_PRECISE_SMC
     if (tb && pc != 0) {
@@ -1530,9 +1536,12 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
         if (locked) {
             mmap_unlock();
         }
+
+        /* tb_lock released by cpu_resume_from_signal.  */
         cpu_resume_from_signal(cpu, puc);
     }
 #endif
+    tb_unlock();
 }
 #endif
 
@@ -1627,6 +1636,7 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
     target_ulong pc, cs_base;
     uint64_t flags;
 
+    tb_lock();
     tb = tb_find_pc(retaddr);
     if (!tb) {
         cpu_abort(cpu, "cpu_io_recompile: could not find TB for pc=%p",
@@ -1678,11 +1688,15 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
     /* FIXME: In theory this could raise an exception.  In practice
        we have already translated the block once so it's probably ok.  */
     tb_gen_code(cpu, pc, cs_base, flags, cflags);
-    /* TODO: If env->pc != tb->pc (i.e. the faulting instruction was not
-       the first in the TB) then we end up generating a whole new TB and
-       repeating the fault, which is horribly inefficient.
-       Better would be to execute just this insn uncached, or generate a
-       second new TB.  */
+
+    /* This unlocks the tb_lock.
+     *
+     * TODO: If env->pc != tb->pc (i.e. the faulting instruction was not
+     * the first in the TB) then we end up generating a whole new TB and
+     * repeating the fault, which is horribly inefficient.
+     * Better would be to execute just this insn uncached, or generate a
+     * second new TB.
+     */
     cpu_resume_from_signal(cpu, NULL);
 }
 
@@ -1707,6 +1721,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
     int direct_jmp_count, direct_jmp2_count, cross_page;
     TranslationBlock *tb;
 
+    tb_lock();
+
     target_code_size = 0;
     max_target_code_size = 0;
     cross_page = 0;
@@ -1762,6 +1778,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
             tcg_ctx.tb_ctx.tb_phys_invalidate_count);
     cpu_fprintf(f, "TLB flush count     %d\n", tlb_flush_count);
     tcg_dump_info(f, cpu_fprintf);
+
+    tb_unlock();
 }
 
 void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [RFC v2 06/11] target-arm/psci.c: wake up sleeping CPUs
  2016-04-05 15:32 [Qemu-devel] [RFC v2 00/11] Base enabling patches for MTTCG Alex Bennée
                   ` (4 preceding siblings ...)
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 05/11] tcg: protect TBContext with tb_lock Alex Bennée
@ 2016-04-05 15:32 ` Alex Bennée
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 07/11] tcg: cpus rm tcg_exec_all() Alex Bennée
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 66+ messages in thread
From: Alex Bennée @ 2016-04-05 15:32 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: peter.maydell, claudio.fontana, jan.kiszka, mark.burton,
	qemu-devel, open list:ARM, Alexander Spyridakis, pbonzini,
	Alex Bennée, rth

Testing with Alexander's bare-metal synchronisation tests fails under
MTTCG, leaving one CPU spinning forever while waiting for the second CPU
to wake up. We simply need to poke the halt_cond once we have processed
the PSCI power-on call.
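
As a minimal sketch of the halt/kick pattern this relies on (hypothetical
demo types and names, not QEMU code): a powered-off vCPU thread sleeps on
a condition variable and must be explicitly signalled once PSCI marks it
runnable, which is what the added qemu_cpu_kick() call does.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t halt_cond = PTHREAD_COND_INITIALIZER;
static bool powered_on;

/* The "sleeping vCPU": waits on halt_cond until it is powered on. */
static void *vcpu_thread(void *arg)
{
    pthread_mutex_lock(&lock);
    while (!powered_on) {
        pthread_cond_wait(&halt_cond, &lock);
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Stand-in for the PSCI power-on path: flip the state, then kick.
 * Without the signal the waiter would spin/sleep forever. */
static void power_on_and_kick(void)
{
    pthread_mutex_lock(&lock);
    powered_on = true;
    pthread_cond_signal(&halt_cond);
    pthread_mutex_unlock(&lock);
}

int run_demo(void)
{
    pthread_t t;
    pthread_create(&t, NULL, vcpu_thread, NULL);
    power_on_and_kick();
    pthread_join(t, NULL);
    return powered_on ? 1 : 0;
}
```

The key point mirrored from the patch: the state change (set_pc on the
target CPU) is not enough on its own; the waiting thread has to be woken.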

Tested-by: Alex Bennée <alex.bennee@linaro.org>
CC: Alexander Spyridakis <a.spyridakis@virtualopensystems.com>
Message-Id: <1439220437-23957-20-git-send-email-fred.konrad@greensocs.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target-arm/psci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target-arm/psci.c b/target-arm/psci.c
index c55487f..8e937d8 100644
--- a/target-arm/psci.c
+++ b/target-arm/psci.c
@@ -212,6 +212,8 @@ void arm_handle_psci_call(ARMCPU *cpu)
         }
         target_cpu_class->set_pc(target_cpu_state, entry);
 
+        qemu_cpu_kick(target_cpu_state);
+
         ret = 0;
         break;
     case QEMU_PSCI_0_1_FN_CPU_OFF:
-- 
2.7.4


* [Qemu-devel] [RFC v2 07/11] tcg: cpus rm tcg_exec_all()
  2016-04-05 15:32 [Qemu-devel] [RFC v2 00/11] Base enabling patches for MTTCG Alex Bennée
                   ` (5 preceding siblings ...)
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 06/11] target-arm/psci.c: wake up sleeping CPUs Alex Bennée
@ 2016-04-05 15:32 ` Alex Bennée
  2016-05-26 11:03   ` Sergey Fedorov
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 08/11] tcg: add options for enabling MTTCG Alex Bennée
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 66+ messages in thread
From: Alex Bennée @ 2016-04-05 15:32 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, qemu-devel, pbonzini, Alex Bennée, rth

In preparation for multi-threaded TCG we remove tcg_exec_all and move
all the CPU cycling into the main thread function. When MTTCG is enabled
we shall use a separate thread function which only handles one vCPU.
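
The scheduling logic being folded into the thread function can be
sketched like this (demo types only, not QEMU's): resume the round-robin
walk from where we left off last time, wrap back to the first CPU, and
bail out early when an exit request is raised.

```c
#include <stddef.h>
#include <stdbool.h>

typedef struct DemoCPU {
    bool can_run;
    int executed;
} DemoCPU;

/* Walk cpus[*next] onwards, running each runnable vCPU until the end of
 * the list or until exit_request is set.  Returns how many vCPUs ran.
 * *next persists across calls, like the old next_cpu static. */
int demo_schedule(DemoCPU *cpus, size_t n, size_t *next,
                  const bool *exit_request)
{
    int ran = 0;

    if (*next >= n) {
        *next = 0;              /* wrap around to first_cpu */
    }
    for (; *next < n && !*exit_request; (*next)++) {
        if (cpus[*next].can_run) {
            cpus[*next].executed++;
            ran++;
        }
    }
    return ran;
}
```

In the MTTCG case this loop disappears entirely: each thread function
owns exactly one vCPU, so there is nothing to cycle over.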

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v2
  - update timer calls to new API on rebase
---
 cpus.c | 63 +++++++++++++++++++++++++++++----------------------------------
 1 file changed, 29 insertions(+), 34 deletions(-)

diff --git a/cpus.c b/cpus.c
index e118fdf..46732a5 100644
--- a/cpus.c
+++ b/cpus.c
@@ -67,7 +67,6 @@
 
 #endif /* CONFIG_LINUX */
 
-static CPUState *next_cpu;
 int64_t max_delay;
 int64_t max_advance;
 
@@ -1109,7 +1108,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
 #endif
 }
 
-static void tcg_exec_all(void);
+static int tcg_cpu_exec(CPUState *cpu);
 
 static void *qemu_tcg_cpu_thread_fn(void *arg)
 {
@@ -1140,8 +1139,35 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
     /* process any pending work */
     atomic_mb_set(&exit_request, 1);
 
+    cpu = first_cpu;
+
     while (1) {
-        tcg_exec_all();
+        /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
+        qemu_account_warp_timer();
+
+        if (!cpu) {
+            cpu = first_cpu;
+        }
+
+        for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
+
+            qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
+                              (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
+
+            if (cpu_can_run(cpu)) {
+                int r = tcg_cpu_exec(cpu);
+                if (r == EXCP_DEBUG) {
+                    cpu_handle_guest_debug(cpu);
+                    break;
+                }
+            } else if (cpu->stop || cpu->stopped) {
+                break;
+            }
+
+        } /* for cpu.. */
+
+        /* Pairs with smp_wmb in qemu_cpu_kick.  */
+        atomic_mb_set(&exit_request, 0);
 
         if (use_icount) {
             int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
@@ -1500,37 +1526,6 @@ static int tcg_cpu_exec(CPUState *cpu)
     return ret;
 }
 
-static void tcg_exec_all(void)
-{
-    int r;
-
-    /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
-    qemu_account_warp_timer();
-
-    if (next_cpu == NULL) {
-        next_cpu = first_cpu;
-    }
-    for (; next_cpu != NULL && !exit_request; next_cpu = CPU_NEXT(next_cpu)) {
-        CPUState *cpu = next_cpu;
-
-        qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
-                          (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
-
-        if (cpu_can_run(cpu)) {
-            r = tcg_cpu_exec(cpu);
-            if (r == EXCP_DEBUG) {
-                cpu_handle_guest_debug(cpu);
-                break;
-            }
-        } else if (cpu->stop || cpu->stopped) {
-            break;
-        }
-    }
-
-    /* Pairs with smp_wmb in qemu_cpu_kick.  */
-    atomic_mb_set(&exit_request, 0);
-}
-
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)
 {
     /* XXX: implement xxx_cpu_list for targets that still miss it */
-- 
2.7.4


* [Qemu-devel] [RFC v2 08/11] tcg: add options for enabling MTTCG
  2016-04-05 15:32 [Qemu-devel] [RFC v2 00/11] Base enabling patches for MTTCG Alex Bennée
                   ` (6 preceding siblings ...)
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 07/11] tcg: cpus rm tcg_exec_all() Alex Bennée
@ 2016-04-05 15:32 ` Alex Bennée
  2016-04-11 20:50   ` Sergey Fedorov
  2016-04-12 13:23   ` Sergey Fedorov
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 09/11] tcg: add kick timer for single-threaded vCPU emulation Alex Bennée
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 66+ messages in thread
From: Alex Bennée @ 2016-04-05 15:32 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, qemu-devel, pbonzini, Alex Bennée,
	Andreas Färber, rth

From: KONRAD Frederic <fred.konrad@greensocs.com>

We know there will be cases where MTTCG won't work until additional work
is done in the front/back ends to support it. It will, however, be
useful to be able to turn it on.

As a result MTTCG will default to off unless the combination is
supported. However the user can turn it on for the sake of testing.
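
The precedence the patch implements reduces to a simple rule (sketched
below with hypothetical demo names): an explicit mttcg=on/off always
wins; otherwise fall back to whether the front/back-end combination is
known to work, which for now is never.

```c
#include <stdbool.h>

/* Mirrors default_mttcg_enabled(): no combination is vetted yet. */
static bool default_mttcg(void)
{
    return false;
}

/* Mirrors qemu_tcg_configure(): the user's option, when present,
 * overrides the conservative default. */
bool demo_configure(bool opt_present, bool opt_value)
{
    return opt_present ? opt_value : default_mttcg();
}
```

So `-tcg mttcg=on` forces multi-threading on even though the default is
off, which is exactly the "for the sake of testing" escape hatch.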

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
[AJB: move to -tcg mttcg=on/off, defaults]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1:
  - merge with add mttcg option.
  - update commit message
v2:
  - machine_init->opts_init
---
 cpus.c                | 43 +++++++++++++++++++++++++++++++++++++++++++
 include/qom/cpu.h     | 14 ++++++++++++++
 include/sysemu/cpus.h |  2 ++
 qemu-options.hx       | 14 ++++++++++++++
 vl.c                  | 12 +++++++++++-
 5 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index 46732a5..8d27fb0 100644
--- a/cpus.c
+++ b/cpus.c
@@ -25,6 +25,7 @@
 /* Needed early for CONFIG_BSD etc. */
 #include "qemu/osdep.h"
 
+#include "qemu/config-file.h"
 #include "monitor/monitor.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/error-report.h"
@@ -146,6 +147,48 @@ typedef struct TimersState {
 } TimersState;
 
 static TimersState timers_state;
+static bool mttcg_enabled;
+
+static QemuOptsList qemu_tcg_opts = {
+    .name = "tcg",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_tcg_opts.head),
+    .desc = {
+        {
+            .name = "mttcg",
+            .type = QEMU_OPT_BOOL,
+            .help = "Enable/disable multi-threaded TCG",
+        },
+        { /* end of list */ }
+    },
+};
+
+static void tcg_register_config(void)
+{
+    qemu_add_opts(&qemu_tcg_opts);
+}
+
+opts_init(tcg_register_config);
+
+static bool default_mttcg_enabled(void)
+{
+    /*
+     * TODO: Check if we have a chance to have MTTCG working on this guest/host.
+     *       Basically is the atomic instruction implemented? Is there any
+     *       memory ordering issue?
+     */
+    return false;
+}
+
+void qemu_tcg_configure(QemuOpts *opts)
+{
+    mttcg_enabled = qemu_opt_get_bool(opts, "mttcg", default_mttcg_enabled());
+}
+
+bool qemu_tcg_mttcg_enabled(void)
+{
+    return mttcg_enabled;
+}
+
 
 int64_t cpu_get_icount_raw(void)
 {
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 13eeaae..5e3826c 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -369,6 +369,20 @@ extern struct CPUTailQ cpus;
 extern __thread CPUState *current_cpu;
 
 /**
+ * qemu_tcg_enable_mttcg:
+ * Enable the MultiThread TCG support.
+ */
+void qemu_tcg_enable_mttcg(void);
+
+/**
+ * qemu_tcg_mttcg_enabled:
+ * Check whether we are running MultiThread TCG or not.
+ *
+ * Returns: %true if we are in MTTCG mode %false otherwise.
+ */
+bool qemu_tcg_mttcg_enabled(void);
+
+/**
  * cpu_paging_enabled:
  * @cpu: The CPU whose state is to be inspected.
  *
diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
index 3d1e5ba..606426f 100644
--- a/include/sysemu/cpus.h
+++ b/include/sysemu/cpus.h
@@ -26,4 +26,6 @@ extern int smp_threads;
 
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg);
 
+void qemu_tcg_configure(QemuOpts *opts);
+
 #endif
diff --git a/qemu-options.hx b/qemu-options.hx
index a770086..4eca704 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3224,6 +3224,20 @@ Attach to existing xen domain.
 xend will use this when starting QEMU (XEN only).
 ETEXI
 
+DEF("tcg", HAS_ARG, QEMU_OPTION_tcg, \
+    "-tcg [mttcg=on|off] control TCG options\n", QEMU_ARCH_ALL)
+STEXI
+@item -tcg
+@findex -tcg
+@table @option
+@item mttcg=[on|off]
+Control multi-threaded TCG. By default QEMU will enable multi-threaded
+emulation for front/back-end combinations that are known to work. The
+user may enable it against the defaults, but odd guest behaviour may
+occur.
+@end table
+ETEXI
+
 DEF("no-reboot", 0, QEMU_OPTION_no_reboot, \
     "-no-reboot      exit instead of rebooting\n", QEMU_ARCH_ALL)
 STEXI
diff --git a/vl.c b/vl.c
index bd81ea9..51bbdbc 100644
--- a/vl.c
+++ b/vl.c
@@ -2961,7 +2961,8 @@ int main(int argc, char **argv, char **envp)
     const char *boot_once = NULL;
     DisplayState *ds;
     int cyls, heads, secs, translation;
-    QemuOpts *hda_opts = NULL, *opts, *machine_opts, *icount_opts = NULL;
+    QemuOpts *opts, *machine_opts;
+    QemuOpts *hda_opts = NULL, *icount_opts = NULL, *tcg_opts = NULL;
     QemuOptsList *olist;
     int optind;
     const char *optarg;
@@ -3750,6 +3751,13 @@ int main(int argc, char **argv, char **envp)
             case QEMU_OPTION_no_reboot:
                 no_reboot = 1;
                 break;
+            case QEMU_OPTION_tcg:
+                tcg_opts = qemu_opts_parse_noisily(qemu_find_opts("tcg"),
+                                                   optarg, false);
+                if (!tcg_opts) {
+                    exit(1);
+                }
+                break;
             case QEMU_OPTION_no_shutdown:
                 no_shutdown = 1;
                 break;
@@ -4028,6 +4036,8 @@ int main(int argc, char **argv, char **envp)
      */
     loc_set_none();
 
+    qemu_tcg_configure(tcg_opts);
+
     replay_configure(icount_opts);
 
     machine_class = select_machine();
-- 
2.7.4


* [Qemu-devel] [RFC v2 09/11] tcg: add kick timer for single-threaded vCPU emulation
  2016-04-05 15:32 [Qemu-devel] [RFC v2 00/11] Base enabling patches for MTTCG Alex Bennée
                   ` (7 preceding siblings ...)
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 08/11] tcg: add options for enabling MTTCG Alex Bennée
@ 2016-04-05 15:32 ` Alex Bennée
  2016-04-11 21:39   ` Sergey Fedorov
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 10/11] tcg: drop global lock during TCG code execution Alex Bennée
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 11/11] tcg: enable thread-per-vCPU Alex Bennée
  10 siblings, 1 reply; 66+ messages in thread
From: Alex Bennée @ 2016-04-05 15:32 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, qemu-devel, pbonzini, Alex Bennée, rth

Currently we rely on the side effect of the main loop grabbing the
iothread_mutex to give any long-running basic block chains a kick and
ensure the next vCPU is scheduled. As this code is being re-factored and
rationalised, we now do it explicitly here.
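
The kick timer is self re-arming: each expiry kicks the vCPU thread and
then reschedules itself a tenth of a second later, just as
kick_tcg_thread() does with timer_mod(). A sketch with a fake clock
instead of QEMUTimer (all names here are demo-only):

```c
#include <stdint.h>

#define KICK_PERIOD_NS (1000000000LL / 10)   /* NANOSECONDS_PER_SECOND / 10 */

typedef struct DemoTimer {
    int64_t expiry_ns;
    int kicks;
} DemoTimer;

/* Expiry callback: deliver a kick, then re-arm relative to "now",
 * standing in for qemu_cpu_kick_no_halt() + timer_mod(self, ...). */
static void kick_cb(DemoTimer *t, int64_t now_ns)
{
    t->kicks++;
    t->expiry_ns = now_ns + KICK_PERIOD_NS;
}

/* Drive a fake clock forward, firing the timer whenever it is due.
 * Returns the number of kicks delivered. */
int demo_run_clock(int64_t until_ns)
{
    DemoTimer t = { .expiry_ns = KICK_PERIOD_NS, .kicks = 0 };
    int64_t now;

    for (now = 0; now <= until_ns; now += KICK_PERIOD_NS / 10) {
        if (now >= t.expiry_ns) {
            kick_cb(&t, now);
        }
    }
    return t.kicks;
}
```

Over one simulated second the timer fires ten times, i.e. a vCPU stuck
in a tight chained-TB loop gets kicked every 100ms.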

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v2
  - re-base fixes
  - get_ticks_per_sec() -> NANOSECONDS_PER_SEC
---
 cpus.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/cpus.c b/cpus.c
index 8d27fb0..e22bb77 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1151,11 +1151,30 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
 #endif
 }
 
+/* Single-threaded TCG
+ *
+ * In the single-threaded case each vCPU is simulated in turn. If
+ * there is more than a single vCPU we create a simple timer to kick
+ * the vCPU and ensure we don't get stuck in a tight loop in one vCPU.
+ * This is done explicitly rather than relying on side-effects
+ * elsewhere.
+ */
 static int tcg_cpu_exec(CPUState *cpu);
+static void qemu_cpu_kick_no_halt(void);
+
+static void kick_tcg_thread(void *opaque)
+{
+    QEMUTimer *self = *(QEMUTimer **) opaque;
+    timer_mod(self,
+              qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
+              NANOSECONDS_PER_SECOND / 10);
+    qemu_cpu_kick_no_halt();
+}
 
 static void *qemu_tcg_cpu_thread_fn(void *arg)
 {
     CPUState *cpu = arg;
+    QEMUTimer *kick_timer;
 
     rcu_register_thread();
 
@@ -1179,6 +1198,14 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
         }
     }
 
+    /* Set to kick if we have to do more than one vCPU */
+    if (CPU_NEXT(first_cpu)) {
+        kick_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,  kick_tcg_thread, &kick_timer);
+        timer_mod(kick_timer,
+                  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
+                  NANOSECONDS_PER_SECOND / 10);
+    }
+
     /* process any pending work */
     atomic_mb_set(&exit_request, 1);
 
-- 
2.7.4


* [Qemu-devel] [RFC v2 10/11] tcg: drop global lock during TCG code execution
  2016-04-05 15:32 [Qemu-devel] [RFC v2 00/11] Base enabling patches for MTTCG Alex Bennée
                   ` (8 preceding siblings ...)
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 09/11] tcg: add kick timer for single-threaded vCPU emulation Alex Bennée
@ 2016-04-05 15:32 ` Alex Bennée
  2016-05-24 21:28   ` Sergey Fedorov
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 11/11] tcg: enable thread-per-vCPU Alex Bennée
  10 siblings, 1 reply; 66+ messages in thread
From: Alex Bennée @ 2016-04-05 15:32 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	Michael S. Tsirkin, mark.burton, qemu-devel, Eduardo Habkost,
	pbonzini, Alex Bennée, rth

From: Jan Kiszka <jan.kiszka@siemens.com>

This finally allows TCG to benefit from the iothread introduction: Drop
the global mutex while running pure TCG CPU code. Reacquire the lock
when entering MMIO or PIO emulation, or when leaving the TCG loop.

We have to revert a few optimizations for the current TCG threading
model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
kicking it in qemu_cpu_kick. We also need to disable RAM block
reordering until we have a more efficient locking mechanism at hand.

Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
These numbers demonstrate where we gain something:

20338 jan       20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
20337 jan       20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm

The guest CPU was fully loaded, but the iothread could still run mostly
independent on a second core. Without the patch we don't get beyond

32206 jan       20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
32204 jan       20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm

We don't benefit significantly, though, when the guest is not fully
loading a host CPU.
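
The new locking discipline can be summarised in a few lines (sketch with
hypothetical demo types, mirroring the io_read/io_write changes in
softmmu_template.h): the global mutex is dropped for pure TCG execution
and only reacquired around a device access, and then only when the
memory region actually demands global locking.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t demo_iothread_lock = PTHREAD_MUTEX_INITIALIZER;

typedef struct DemoRegion {
    bool global_locking;   /* cleared e.g. for io_mem_notdirty */
    int accesses;
} DemoRegion;

/* Stand-in for the io_read/io_write slow path: take the big lock only
 * for regions that require it, and release it straight after dispatch. */
void demo_mmio_dispatch(DemoRegion *mr)
{
    bool locked = false;

    if (mr->global_locking) {
        pthread_mutex_lock(&demo_iothread_lock);
        locked = true;
    }
    mr->accesses++;        /* stands in for memory_region_dispatch_*() */
    if (locked) {
        pthread_mutex_unlock(&demo_iothread_lock);
    }
}
```

Regions marked with memory_region_clear_global_locking() thus bypass the
lock entirely, which is why io_mem_notdirty must be safe to enter
without it.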

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Message-Id: <1439220437-23957-10-git-send-email-fred.konrad@greensocs.com>
[FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex]
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
[EGC: fixed iothread lock for cpu-exec IRQ handling]
Signed-off-by: Emilio G. Cota <cota@braap.org>
[AJB: -smp single-threaded fix, rm old info from commit msg]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1 (ajb):
  - SMP failure now fixed by previous commit
Changes from Fred Konrad (mttcg-v7 via paolo):
  * Rebase on the current HEAD.
  * Fixes a deadlock in qemu_devices_reset().
  * Remove the mutex in address_space_*
v2 (ajb):
  - merge with tcg: grab iothread lock in cpu-exec interrupt handling
  - use existing fns for tracking lock state
  - lock iothread for mem_region
    - add assert on mem region modification
    - ensure smm_helper holds iothread
  - Add JK s-o-b
  - Fix-up FK s-o-b annotation
---
 cpu-exec.c               | 12 ++++++++++++
 cpus.c                   | 26 +++-----------------------
 cputlb.c                 |  1 +
 exec.c                   | 12 ++++++++++++
 hw/i386/kvmvapic.c       |  3 +++
 memory.c                 |  2 ++
 softmmu_template.h       | 17 +++++++++++++++++
 target-i386/smm_helper.c |  7 +++++++
 translate-all.c          | 11 +++++++++--
 9 files changed, 66 insertions(+), 25 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index bd50fef..f558508 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -28,6 +28,7 @@
 #include "qemu/rcu.h"
 #include "exec/tb-hash.h"
 #include "exec/log.h"
+#include "qemu/main-loop.h"
 #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
 #include "hw/i386/apic.h"
 #endif
@@ -452,6 +453,8 @@ int cpu_exec(CPUState *cpu)
             for(;;) {
                 interrupt_request = cpu->interrupt_request;
                 if (unlikely(interrupt_request)) {
+                    qemu_mutex_lock_iothread();
+
                     if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
                         /* Mask out external interrupts for this step. */
                         interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
@@ -504,6 +507,11 @@ int cpu_exec(CPUState *cpu)
                            the program flow was changed */
                         next_tb = 0;
                     }
+
+                    if (qemu_mutex_iothread_locked()) {
+                        qemu_mutex_unlock_iothread();
+                    }
+
                 }
                 if (unlikely(cpu->exit_request
                              || replay_has_interrupt())) {
@@ -609,8 +617,12 @@ int cpu_exec(CPUState *cpu)
             g_assert(env == &x86_cpu->env);
 #endif
 #endif /* buggy compiler */
+
             cpu->can_do_io = 1;
             tb_lock_reset();
+            if (qemu_mutex_iothread_locked()) {
+                qemu_mutex_unlock_iothread();
+            }
         }
     } /* for(;;) */
 
diff --git a/cpus.c b/cpus.c
index e22bb77..02fae13 100644
--- a/cpus.c
+++ b/cpus.c
@@ -920,8 +920,6 @@ static void qemu_kvm_init_cpu_signals(CPUState *cpu)
 #endif /* _WIN32 */
 
 static QemuMutex qemu_global_mutex;
-static QemuCond qemu_io_proceeded_cond;
-static unsigned iothread_requesting_mutex;
 
 static QemuThread io_thread;
 
@@ -937,7 +935,6 @@ void qemu_init_cpu_loop(void)
     qemu_cond_init(&qemu_cpu_cond);
     qemu_cond_init(&qemu_pause_cond);
     qemu_cond_init(&qemu_work_cond);
-    qemu_cond_init(&qemu_io_proceeded_cond);
     qemu_mutex_init(&qemu_global_mutex);
 
     qemu_thread_get_self(&io_thread);
@@ -1049,10 +1046,6 @@ static void qemu_tcg_wait_io_event(CPUState *cpu)
         qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
     }
 
-    while (iothread_requesting_mutex) {
-        qemu_cond_wait(&qemu_io_proceeded_cond, &qemu_global_mutex);
-    }
-
     CPU_FOREACH(cpu) {
         qemu_wait_io_event_common(cpu);
     }
@@ -1319,22 +1312,7 @@ bool qemu_mutex_iothread_locked(void)
 
 void qemu_mutex_lock_iothread(void)
 {
-    atomic_inc(&iothread_requesting_mutex);
-    /* In the simple case there is no need to bump the VCPU thread out of
-     * TCG code execution.
-     */
-    if (!tcg_enabled() || qemu_in_vcpu_thread() ||
-        !first_cpu || !first_cpu->created) {
-        qemu_mutex_lock(&qemu_global_mutex);
-        atomic_dec(&iothread_requesting_mutex);
-    } else {
-        if (qemu_mutex_trylock(&qemu_global_mutex)) {
-            qemu_cpu_kick_no_halt();
-            qemu_mutex_lock(&qemu_global_mutex);
-        }
-        atomic_dec(&iothread_requesting_mutex);
-        qemu_cond_broadcast(&qemu_io_proceeded_cond);
-    }
+    qemu_mutex_lock(&qemu_global_mutex);
     iothread_locked = true;
 }
 
@@ -1580,7 +1558,9 @@ static int tcg_cpu_exec(CPUState *cpu)
         cpu->icount_decr.u16.low = decr;
         cpu->icount_extra = count;
     }
+    qemu_mutex_unlock_iothread();
     ret = cpu_exec(cpu);
+    qemu_mutex_lock_iothread();
 #ifdef CONFIG_PROFILER
     tcg_time += profile_getclock() - ti;
 #endif
diff --git a/cputlb.c b/cputlb.c
index 466663b..1412049 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -18,6 +18,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "exec/memory.h"
diff --git a/exec.c b/exec.c
index c46c123..9e83673 100644
--- a/exec.c
+++ b/exec.c
@@ -2112,6 +2112,12 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
                 }
                 cpu->watchpoint_hit = wp;
 
+                /*
+                 * Unlock iothread mutex before calling cpu_loop_exit
+                 * or cpu_resume_from_signal.
+                 */
+                qemu_mutex_unlock_iothread();
+
                 /* Unlocked by cpu_loop_exit or cpu_resume_from_signal.  */
                 tb_lock();
                 tb_check_watchpoint(cpu);
@@ -2347,8 +2353,14 @@ static void io_mem_init(void)
     memory_region_init_io(&io_mem_rom, NULL, &unassigned_mem_ops, NULL, NULL, UINT64_MAX);
     memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
                           NULL, UINT64_MAX);
+
+    /* io_mem_notdirty calls tb_invalidate_phys_page_fast,
+     * which must be called without the iothread mutex.
+     */
     memory_region_init_io(&io_mem_notdirty, NULL, &notdirty_mem_ops, NULL,
                           NULL, UINT64_MAX);
+    memory_region_clear_global_locking(&io_mem_notdirty);
+
     memory_region_init_io(&io_mem_watch, NULL, &watch_mem_ops, NULL,
                           NULL, UINT64_MAX);
 }
diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index 7c0d542..a0439a8 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -447,6 +447,9 @@ static void patch_instruction(VAPICROMState *s, X86CPU *cpu, target_ulong ip)
     resume_all_vcpus();
 
     if (!kvm_enabled()) {
+        /* Unlock iothread mutex before calling cpu_resume_from_signal.  */
+        qemu_mutex_unlock_iothread();
+
         /* Unlocked by cpu_resume_from_signal.  */
         tb_lock();
         cs->current_tb = NULL;
diff --git a/memory.c b/memory.c
index f76f85d..2eae609 100644
--- a/memory.c
+++ b/memory.c
@@ -916,6 +916,8 @@ void memory_region_transaction_commit(void)
     AddressSpace *as;
 
     assert(memory_region_transaction_depth);
+    assert(qemu_mutex_iothread_locked());
+
     --memory_region_transaction_depth;
     if (!memory_region_transaction_depth) {
         if (memory_region_update_pending) {
diff --git a/softmmu_template.h b/softmmu_template.h
index 208f808..0b6c609 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -151,6 +151,7 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
     CPUState *cpu = ENV_GET_CPU(env);
     hwaddr physaddr = iotlbentry->addr;
     MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
+    bool locked = false;
 
     physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
     cpu->mem_io_pc = retaddr;
@@ -159,8 +160,15 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
     }
 
     cpu->mem_io_vaddr = addr;
+    if (mr->global_locking) {
+        qemu_mutex_lock_iothread();
+        locked = true;
+    }
     memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT,
                                 iotlbentry->attrs);
+    if (locked) {
+        qemu_mutex_unlock_iothread();
+    }
     return val;
 }
 #endif
@@ -358,6 +366,7 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
     CPUState *cpu = ENV_GET_CPU(env);
     hwaddr physaddr = iotlbentry->addr;
     MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
+    bool locked = false;
 
     physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
     if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
@@ -366,8 +375,16 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
 
     cpu->mem_io_vaddr = addr;
     cpu->mem_io_pc = retaddr;
+
+    if (mr->global_locking) {
+        qemu_mutex_lock_iothread();
+        locked = true;
+    }
     memory_region_dispatch_write(mr, physaddr, val, 1 << SHIFT,
                                  iotlbentry->attrs);
+    if (locked) {
+        qemu_mutex_unlock_iothread();
+    }
 }
 
 void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
diff --git a/target-i386/smm_helper.c b/target-i386/smm_helper.c
index 4dd6a2c..6a5489b 100644
--- a/target-i386/smm_helper.c
+++ b/target-i386/smm_helper.c
@@ -18,6 +18,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 #include "cpu.h"
 #include "exec/helper-proto.h"
 #include "exec/log.h"
@@ -42,11 +43,14 @@ void helper_rsm(CPUX86State *env)
 #define SMM_REVISION_ID 0x00020000
 #endif
 
+/* Called with iothread lock taken */
 void cpu_smm_update(X86CPU *cpu)
 {
     CPUX86State *env = &cpu->env;
     bool smm_enabled = (env->hflags & HF_SMM_MASK);
 
+    g_assert(qemu_mutex_iothread_locked());
+
     if (cpu->smram) {
         memory_region_set_enabled(cpu->smram, smm_enabled);
     }
@@ -333,7 +337,10 @@ void helper_rsm(CPUX86State *env)
     }
     env->hflags2 &= ~HF2_SMM_INSIDE_NMI_MASK;
     env->hflags &= ~HF_SMM_MASK;
+
+    qemu_mutex_lock_iothread();
     cpu_smm_update(cpu);
+    qemu_mutex_unlock_iothread();
 
     qemu_log_mask(CPU_LOG_INT, "SMM: after RSM\n");
     log_cpu_state_mask(CPU_LOG_INT, CPU(cpu), CPU_DUMP_CCOP);
diff --git a/translate-all.c b/translate-all.c
index 935d24c..0c377bb 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1318,6 +1318,7 @@ void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
  * this TB.
  *
  * Called with mmap_lock held for user-mode emulation
+ * If called from generated code, iothread mutex must not be held.
  */
 void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
                                    int is_cpu_write_access)
@@ -1430,7 +1431,9 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
 }
 
 #ifdef CONFIG_SOFTMMU
-/* len must be <= 8 and start must be a multiple of len */
+/* len must be <= 8 and start must be a multiple of len.
+ * Called via softmmu_template.h, with iothread mutex not held.
+ */
 void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
 {
     PageDesc *p;
@@ -1579,6 +1582,7 @@ static TranslationBlock *tb_find_pc(uintptr_t tc_ptr)
 }
 
 #if !defined(CONFIG_USER_ONLY)
+/* If called from generated code, iothread mutex must not be held.  */
 void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr)
 {
     ram_addr_t ram_addr;
@@ -1625,7 +1629,10 @@ void tb_check_watchpoint(CPUState *cpu)
 
 #ifndef CONFIG_USER_ONLY
 /* in deterministic execution mode, instructions doing device I/Os
-   must be at the end of the TB */
+ * must be at the end of the TB.
+ *
+ * Called by softmmu_template.h, with iothread mutex not held.
+ */
 void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
 {
 #if defined(TARGET_MIPS) || defined(TARGET_SH4)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [Qemu-devel] [RFC v2 11/11] tcg: enable thread-per-vCPU
  2016-04-05 15:32 [Qemu-devel] [RFC v2 00/11] Base enabling patches for MTTCG Alex Bennée
                   ` (9 preceding siblings ...)
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 10/11] tcg: drop global lock during TCG code execution Alex Bennée
@ 2016-04-05 15:32 ` Alex Bennée
  2016-05-27 13:57   ` Sergey Fedorov
  10 siblings, 1 reply; 66+ messages in thread
From: Alex Bennée @ 2016-04-05 15:32 UTC (permalink / raw)
  To: mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, qemu-devel, pbonzini, Alex Bennée, rth

From: KONRAD Frederic <fred.konrad@greensocs.com>

This allows the user to switch on multi-threaded behaviour and spawn a
thread per vCPU. A simple test like:

  ./arm/run ./arm/locking-test.flat -smp 4 -tcg mttcg=on

will now use 4 vCPU threads and report an expected FAIL (instead of an
unexpected PASS), as the default mode of the test has no protection
when incrementing a shared variable.

However we still default to a single thread for all vCPUs, as
individual front-ends and back-ends need additional fixes to safely support:
  - atomic behaviour
  - tb invalidation
  - memory ordering

The function default_mttcg_enabled can be tweaked as support is added.
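The gating helper named above is not shown in this series' hunks; as a
purely illustrative sketch (the body here is an assumption, not the
posted code), it could start out as:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch only: keep MTTCG off by default until each front-end and
 * back-end combination is audited for atomics, TB invalidation and
 * memory ordering. */
static bool default_mttcg_enabled(void)
{
    return false;
}
```

Per-target overrides could then flip the default as support lands.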

As assumptions about tcg_current_cpu are no longer relevant to the
single-threaded kick routine we need to save the current information
somewhere else for the timer.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[AJB: Some fixes, conditionally, commit rewording]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1 (ajb):
  - fix merge conflicts
  - maintain single-thread approach
v2
  - re-base fixes (no longer has tb_find_fast lock tweak ahead)
  - remove bogus break condition on cpu->stop/stopped
  - only process exiting cpus exit_request
  - handle all cpus idle case (fixes shutdown issues)
  - sleep on EXCP_HALTED in mttcg mode (prevent crash on start-up)
  - move icount timer into helper
---
 cpu-exec-common.c       |   1 -
 cpu-exec.c              |  15 ----
 cpus.c                  | 216 +++++++++++++++++++++++++++++++++---------------
 include/exec/exec-all.h |   4 -
 translate-all.c         |   8 --
 5 files changed, 150 insertions(+), 94 deletions(-)

diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index 1b1731c..3d7eaa3 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -23,7 +23,6 @@
 #include "exec/memory-internal.h"
 
 bool exit_request;
-CPUState *tcg_current_cpu;
 
 /* exit the current TB from a signal handler. The host registers are
    restored in a state compatible with the CPU emulator
diff --git a/cpu-exec.c b/cpu-exec.c
index f558508..42cec05 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -292,7 +292,6 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
         goto found;
     }
 
-#ifdef CONFIG_USER_ONLY
     /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
      * taken outside tb_lock.  Since we're momentarily dropping
      * tb_lock, there's a chance that our desired tb has been
@@ -306,15 +305,12 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
         mmap_unlock();
         goto found;
     }
-#endif
 
     /* if no translated code available, then translate it now */
     cpu->tb_invalidated_flag = false;
     tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
 
-#ifdef CONFIG_USER_ONLY
     mmap_unlock();
-#endif
 
 found:
     /* we add the TB in the virtual pc hash table */
@@ -388,13 +384,8 @@ int cpu_exec(CPUState *cpu)
         cpu->halted = 0;
     }
 
-    atomic_mb_set(&tcg_current_cpu, cpu);
     rcu_read_lock();
 
-    if (unlikely(atomic_mb_read(&exit_request))) {
-        cpu->exit_request = 1;
-    }
-
     cc->cpu_exec_enter(cpu);
 
     /* Calculate difference between guest clock and host clock.
@@ -515,7 +506,6 @@ int cpu_exec(CPUState *cpu)
                 }
                 if (unlikely(cpu->exit_request
                              || replay_has_interrupt())) {
-                    cpu->exit_request = 0;
                     cpu->exception_index = EXCP_INTERRUPT;
                     cpu_loop_exit(cpu);
                 }
@@ -629,10 +619,5 @@ int cpu_exec(CPUState *cpu)
     cc->cpu_exec_exit(cpu);
     rcu_read_unlock();
 
-    /* fail safe : never use current_cpu outside cpu_exec() */
-    current_cpu = NULL;
-
-    /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
-    atomic_set(&tcg_current_cpu, NULL);
     return ret;
 }
diff --git a/cpus.c b/cpus.c
index 02fae13..f7c7359 100644
--- a/cpus.c
+++ b/cpus.c
@@ -966,10 +966,7 @@ void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
 
     qemu_cpu_kick(cpu);
     while (!atomic_mb_read(&wi.done)) {
-        CPUState *self_cpu = current_cpu;
-
         qemu_cond_wait(&qemu_work_cond, &qemu_global_mutex);
-        current_cpu = self_cpu;
     }
 }
 
@@ -1031,13 +1028,13 @@ static void flush_queued_work(CPUState *cpu)
 
 static void qemu_wait_io_event_common(CPUState *cpu)
 {
+    atomic_mb_set(&cpu->thread_kicked, false);
     if (cpu->stop) {
         cpu->stop = false;
         cpu->stopped = true;
         qemu_cond_broadcast(&qemu_pause_cond);
     }
     flush_queued_work(cpu);
-    cpu->thread_kicked = false;
 }
 
 static void qemu_tcg_wait_io_event(CPUState *cpu)
@@ -1046,9 +1043,7 @@ static void qemu_tcg_wait_io_event(CPUState *cpu)
         qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
     }
 
-    CPU_FOREACH(cpu) {
-        qemu_wait_io_event_common(cpu);
-    }
+    qemu_wait_io_event_common(cpu);
 }
 
 static void qemu_kvm_wait_io_event(CPUState *cpu)
@@ -1115,6 +1110,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
     qemu_thread_get_self(cpu->thread);
     cpu->thread_id = qemu_get_thread_id();
     cpu->can_do_io = 1;
+    current_cpu = cpu;
 
     sigemptyset(&waitset);
     sigaddset(&waitset, SIG_IPI);
@@ -1123,9 +1119,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
     cpu->created = true;
     qemu_cond_signal(&qemu_cpu_cond);
 
-    current_cpu = cpu;
     while (1) {
-        current_cpu = NULL;
         qemu_mutex_unlock_iothread();
         do {
             int sig;
@@ -1136,7 +1130,6 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
             exit(1);
         }
         qemu_mutex_lock_iothread();
-        current_cpu = cpu;
         qemu_wait_io_event_common(cpu);
     }
 
@@ -1153,32 +1146,52 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
  * elsewhere.
  */
 static int tcg_cpu_exec(CPUState *cpu);
-static void qemu_cpu_kick_no_halt(void);
+
+struct kick_info {
+    QEMUTimer *timer;
+    CPUState  *cpu;
+};
 
 static void kick_tcg_thread(void *opaque)
 {
-    QEMUTimer *self = *(QEMUTimer **) opaque;
-    timer_mod(self,
+    struct kick_info *info = (struct kick_info *) opaque;
+    CPUState *cpu = atomic_mb_read(&info->cpu);
+
+    timer_mod(info->timer,
               qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
               NANOSECONDS_PER_SECOND / 10);
-    qemu_cpu_kick_no_halt();
+
+    if (cpu) {
+        cpu_exit(cpu);
+    }
 }
 
-static void *qemu_tcg_cpu_thread_fn(void *arg)
+static void handle_icount_deadline(void)
+{
+    if (use_icount) {
+        int64_t deadline =
+            qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
+
+        if (deadline == 0) {
+            qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+        }
+    }
+}
+
+static void *qemu_tcg_single_cpu_thread_fn(void *arg)
 {
+    struct kick_info info;
     CPUState *cpu = arg;
-    QEMUTimer *kick_timer;
 
     rcu_register_thread();
 
     qemu_mutex_lock_iothread();
     qemu_thread_get_self(cpu->thread);
 
-    CPU_FOREACH(cpu) {
-        cpu->thread_id = qemu_get_thread_id();
-        cpu->created = true;
-        cpu->can_do_io = 1;
-    }
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->created = true;
+    cpu->can_do_io = 1;
+    current_cpu = cpu;
     qemu_cond_signal(&qemu_cpu_cond);
 
     /* wait for initial kick-off after machine start */
@@ -1193,18 +1206,24 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 
     /* Set to kick if we have to do more than one vCPU */
     if (CPU_NEXT(first_cpu)) {
-        kick_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,  kick_tcg_thread, &kick_timer);
-        timer_mod(kick_timer,
+        info.timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,  kick_tcg_thread, &info);
+        info.cpu = NULL;
+        smp_wmb();
+        timer_mod(info.timer,
                   qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
                   NANOSECONDS_PER_SECOND / 10);
     }
 
     /* process any pending work */
-    atomic_mb_set(&exit_request, 1);
+    CPU_FOREACH(cpu) {
+        atomic_mb_set(&cpu->exit_request, 1);
+    }
 
     cpu = first_cpu;
 
     while (1) {
+        bool nothing_ran = true;
+
         /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
         qemu_account_warp_timer();
 
@@ -1212,34 +1231,107 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
             cpu = first_cpu;
         }
 
-        for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
+        for (; cpu != NULL && !cpu->exit_request; cpu = CPU_NEXT(cpu)) {
+            atomic_mb_set(&info.cpu, cpu);
 
             qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
                               (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
 
             if (cpu_can_run(cpu)) {
                 int r = tcg_cpu_exec(cpu);
+                nothing_ran = false;
                 if (r == EXCP_DEBUG) {
                     cpu_handle_guest_debug(cpu);
                     break;
                 }
-            } else if (cpu->stop || cpu->stopped) {
-                break;
             }
 
         } /* for cpu.. */
 
-        /* Pairs with smp_wmb in qemu_cpu_kick.  */
-        atomic_mb_set(&exit_request, 0);
+        atomic_mb_set(&info.cpu, NULL);
+
+        handle_icount_deadline();
 
-        if (use_icount) {
-            int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
+        /* We exit in one of three conditions:
+         *  - cpu is set, because exit_request is true
+         *  - cpu is not set, because we have looped around
+         *  - cpu is not set and nothing ran
+         */
 
-            if (deadline == 0) {
-                qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+        if (cpu) {
+            g_assert(cpu->exit_request);
+            /* Pairs with smp_wmb in qemu_cpu_kick.  */
+            atomic_mb_set(&cpu->exit_request, 0);
+            qemu_tcg_wait_io_event(cpu);
+        } else if (nothing_ran) {
+            while (all_cpu_threads_idle()) {
+                qemu_cond_wait(first_cpu->halt_cond, &qemu_global_mutex);
             }
         }
-        qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
+    }
+
+    return NULL;
+}
+
+/* Multi-threaded TCG
+ *
+ * In the multi-threaded case each vCPU has its own thread. The TLS
+ * variable current_cpu can be used deep in the code to find the
+ * current CPUState for a given thread.
+ */
+
+static void *qemu_tcg_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+
+    rcu_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->created = true;
+    cpu->can_do_io = 1;
+    current_cpu = cpu;
+    qemu_cond_signal(&qemu_cpu_cond);
+
+    /* process any pending work */
+    atomic_mb_set(&cpu->exit_request, 1);
+
+    while (1) {
+        bool sleep = false;
+
+        if (cpu_can_run(cpu)) {
+            int r = tcg_cpu_exec(cpu);
+            switch (r)
+            {
+            case EXCP_DEBUG:
+                cpu_handle_guest_debug(cpu);
+                break;
+            case EXCP_HALTED:
+                /* during start-up the vCPU is reset and the thread is
+                 * kicked several times. If we don't ensure we go back
+                 * to sleep in the halted state we won't cleanly
+                 * start-up when the vCPU is enabled.
+                 */
+                sleep = true;
+                break;
+            default:
+                /* Ignore everything else? */
+                break;
+            }
+        } else {
+            sleep = true;
+        }
+
+        handle_icount_deadline();
+
+        if (sleep) {
+            qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
+        }
+
+        atomic_mb_set(&cpu->exit_request, 0);
+        qemu_tcg_wait_io_event(cpu);
     }
 
     return NULL;
@@ -1264,24 +1356,11 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
 #endif
 }
 
-static void qemu_cpu_kick_no_halt(void)
-{
-    CPUState *cpu;
-    /* Ensure whatever caused the exit has reached the CPU threads before
-     * writing exit_request.
-     */
-    atomic_mb_set(&exit_request, 1);
-    cpu = atomic_mb_read(&tcg_current_cpu);
-    if (cpu) {
-        cpu_exit(cpu);
-    }
-}
-
 void qemu_cpu_kick(CPUState *cpu)
 {
     qemu_cond_broadcast(cpu->halt_cond);
     if (tcg_enabled()) {
-        qemu_cpu_kick_no_halt();
+        cpu_exit(cpu);
     } else {
         qemu_cpu_kick_thread(cpu);
     }
@@ -1347,13 +1426,6 @@ void pause_all_vcpus(void)
 
     if (qemu_in_vcpu_thread()) {
         cpu_stop_current();
-        if (!kvm_enabled()) {
-            CPU_FOREACH(cpu) {
-                cpu->stop = false;
-                cpu->stopped = true;
-            }
-            return;
-        }
     }
 
     while (!all_vcpus_paused()) {
@@ -1387,29 +1459,41 @@ void resume_all_vcpus(void)
 static void qemu_tcg_init_vcpu(CPUState *cpu)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
-    static QemuCond *tcg_halt_cond;
-    static QemuThread *tcg_cpu_thread;
+    static QemuCond *single_tcg_halt_cond;
+    static QemuThread *single_tcg_cpu_thread;
 
-    /* share a single thread for all cpus with TCG */
-    if (!tcg_cpu_thread) {
+    if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) {
         cpu->thread = g_malloc0(sizeof(QemuThread));
         cpu->halt_cond = g_malloc0(sizeof(QemuCond));
         qemu_cond_init(cpu->halt_cond);
-        tcg_halt_cond = cpu->halt_cond;
-        snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
+
+        if (qemu_tcg_mttcg_enabled()) {
+            /* create a thread per vCPU with TCG (MTTCG) */
+            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
                  cpu->cpu_index);
-        qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
-                           cpu, QEMU_THREAD_JOINABLE);
+
+            qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
+                               cpu, QEMU_THREAD_JOINABLE);
+
+        } else {
+            /* share a single thread for all cpus with TCG */
+            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "ALL CPUs/TCG");
+            qemu_thread_create(cpu->thread, thread_name, qemu_tcg_single_cpu_thread_fn,
+                               cpu, QEMU_THREAD_JOINABLE);
+
+            single_tcg_halt_cond = cpu->halt_cond;
+            single_tcg_cpu_thread = cpu->thread;
+        }
 #ifdef _WIN32
         cpu->hThread = qemu_thread_get_handle(cpu->thread);
 #endif
         while (!cpu->created) {
             qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
         }
-        tcg_cpu_thread = cpu->thread;
     } else {
-        cpu->thread = tcg_cpu_thread;
-        cpu->halt_cond = tcg_halt_cond;
+        /* For non-MTTCG cases we share the thread */
+        cpu->thread = single_tcg_cpu_thread;
+        cpu->halt_cond = single_tcg_halt_cond;
     }
 }
 
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 60716ae..cde4b7a 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -479,8 +479,4 @@ bool memory_region_is_unassigned(MemoryRegion *mr);
 /* vl.c */
 extern int singlestep;
 
-/* cpu-exec.c, accessed with atomic_mb_read/atomic_mb_set */
-extern CPUState *tcg_current_cpu;
-extern bool exit_request;
-
 #endif
diff --git a/translate-all.c b/translate-all.c
index 0c377bb..8e70583 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -122,36 +122,28 @@ static void *l1_map[V_L1_SIZE];
 TCGContext tcg_ctx;
 
 /* translation block context */
-#ifdef CONFIG_USER_ONLY
 __thread int have_tb_lock;
-#endif
 
 void tb_lock(void)
 {
-#ifdef CONFIG_USER_ONLY
     assert(!have_tb_lock);
     qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
     have_tb_lock++;
-#endif
 }
 
 void tb_unlock(void)
 {
-#ifdef CONFIG_USER_ONLY
     assert(have_tb_lock);
     have_tb_lock--;
     qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
-#endif
 }
 
 void tb_lock_reset(void)
 {
-#ifdef CONFIG_USER_ONLY
     if (have_tb_lock) {
         qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
         have_tb_lock = 0;
     }
-#endif
 }
 
 static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [RFC v2 01/11] tcg: move tb_invalidated_flag to CPUState
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 01/11] tcg: move tb_invalidated_flag to CPUState Alex Bennée
@ 2016-04-05 15:44   ` Paolo Bonzini
  2016-04-06 10:11     ` Sergey Fedorov
  0 siblings, 1 reply; 66+ messages in thread
From: Paolo Bonzini @ 2016-04-05 15:44 UTC (permalink / raw)
  To: Alex Bennée, mttcg, fred.konrad, a.rigo, serge.fdrv, cota
  Cc: peter.maydell, claudio.fontana, Sergey Fedorov,
	Peter Crosthwaite, jan.kiszka, mark.burton, qemu-devel,
	Andreas Färber, rth



On 05/04/2016 17:32, Alex Bennée wrote:
> +    cpu->tb_invalidated_flag = false;
>      tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
>                       max_cycles | CF_NOCACHE
>                           | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
> -    tb->orig_tb = tcg_ctx.tb_ctx.tb_invalidated_flag ? NULL : orig_tb;
> +    tb->orig_tb = cpu->tb_invalidated_flag ? NULL : orig_tb;
>      cpu->current_tb = tb;

Based on discussion, it's safest to save the old value at the beginning
of the hunk, and "OR" it into cpu->tb_invalidated_flag here.

>      /* if no translated code available, then translate it now */
> +    cpu->tb_invalidated_flag = false;

Please remove this...

>      tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
>  
>  #ifdef CONFIG_USER_ONLY
> @@ -509,12 +509,11 @@ int cpu_exec(CPUState *cpu)
>                  tb = tb_find_fast(cpu);
>                  /* Note: we do it here to avoid a gcc bug on Mac OS X when
>                     doing it in tb_find_slow */
> -                if (tcg_ctx.tb_ctx.tb_invalidated_flag) {
> +                if (cpu->tb_invalidated_flag) {
>                      /* as some TB could have been invalidated because
> -                       of memory exceptions while generating the code, we
> +                       of a tb_flush while generating the code, we
>                         must recompute the hash index here */
>                      next_tb = 0;
> -                    tcg_ctx.tb_ctx.tb_invalidated_flag = 0;

... and leave the zeroing here.

Paolo

>                  }

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [RFC v2 01/11] tcg: move tb_invalidated_flag to CPUState
  2016-04-05 15:44   ` Paolo Bonzini
@ 2016-04-06 10:11     ` Sergey Fedorov
  0 siblings, 0 replies; 66+ messages in thread
From: Sergey Fedorov @ 2016-04-06 10:11 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: peter.maydell, claudio.fontana, Sergey Fedorov,
	Peter Crosthwaite, jan.kiszka, mark.burton, qemu-devel,
	Andreas Färber, rth

On 05/04/16 18:44, Paolo Bonzini wrote:
>
> On 05/04/2016 17:32, Alex Bennée wrote:
>> +    cpu->tb_invalidated_flag = false;
>>      tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
>>                       max_cycles | CF_NOCACHE
>>                           | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
>> -    tb->orig_tb = tcg_ctx.tb_ctx.tb_invalidated_flag ? NULL : orig_tb;
>> +    tb->orig_tb = cpu->tb_invalidated_flag ? NULL : orig_tb;
>>      cpu->current_tb = tb;
> Based on discussion, it's safest to save the old value at the beginning
> of the hunk, and "OR" it into cpu->tb_invalidated_flag here.
>
>>      /* if no translated code available, then translate it now */
>> +    cpu->tb_invalidated_flag = false;
> Please remove this...
>
>>      tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
>>  
>>  #ifdef CONFIG_USER_ONLY
>> @@ -509,12 +509,11 @@ int cpu_exec(CPUState *cpu)
>>                  tb = tb_find_fast(cpu);
>>                  /* Note: we do it here to avoid a gcc bug on Mac OS X when
>>                     doing it in tb_find_slow */
>> -                if (tcg_ctx.tb_ctx.tb_invalidated_flag) {
>> +                if (cpu->tb_invalidated_flag) {
>>                      /* as some TB could have been invalidated because
>> -                       of memory exceptions while generating the code, we
>> +                       of a tb_flush while generating the code, we
>>                         must recompute the hash index here */
>>                      next_tb = 0;
>> -                    tcg_ctx.tb_ctx.tb_invalidated_flag = 0;
> ... and leave the zeroing here.
>

Yes, I'm going to do something like this in a separate patch series or
just append it back to the "TCG misc clean-up" patch series when
re-spinning it.

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [RFC v2 02/11] cpus: make all_vcpus_paused() return bool
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 02/11] cpus: make all_vcpus_paused() return bool Alex Bennée
@ 2016-04-11 12:48   ` Sergey Fedorov
  0 siblings, 0 replies; 66+ messages in thread
From: Sergey Fedorov @ 2016-04-11 12:48 UTC (permalink / raw)
  To: Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: qemu-devel, mark.burton, pbonzini, jan.kiszka, rth,
	peter.maydell, claudio.fontana, Peter Crosthwaite

On 05/04/16 18:32, Alex Bennée wrote:
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>

> ---
>  cpus.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index 8ae4777..e118fdf 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1248,17 +1248,17 @@ void qemu_mutex_unlock_iothread(void)
>      qemu_mutex_unlock(&qemu_global_mutex);
>  }
>  
> -static int all_vcpus_paused(void)
> +static bool all_vcpus_paused(void)
>  {
>      CPUState *cpu;
>  
>      CPU_FOREACH(cpu) {
>          if (!cpu->stopped) {
> -            return 0;
> +            return false;
>          }
>      }
>  
> -    return 1;
> +    return true;
>  }
>  
>  void pause_all_vcpus(void)

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [RFC v2 03/11] docs: new design document multi-thread-tcg.txt (DRAFTING)
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 03/11] docs: new design document multi-thread-tcg.txt (DRAFTING) Alex Bennée
@ 2016-04-11 20:00   ` Sergey Fedorov
  2016-05-25 15:48     ` Sergey Fedorov
  2016-05-06 11:25   ` Sergey Fedorov
  1 sibling, 1 reply; 66+ messages in thread
From: Sergey Fedorov @ 2016-04-11 20:00 UTC (permalink / raw)
  To: Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: qemu-devel, mark.burton, pbonzini, jan.kiszka, rth,
	peter.maydell, claudio.fontana

On 05/04/16 18:32, Alex Bennée wrote:

(snip)
> +Introduction
> +============
> +
> +This document outlines the design for multi-threaded TCG emulation.
> +The original TCG implementation was single threaded and dealt with
> +multiple CPUs with simple round-robin scheduling. This simplified a
> +lot of things but became increasingly limited as systems being
> +emulated gained additional cores and per-core performance gains for
> +host systems started to level off.

This looks like the description of system-mode TCG only. Maybe it would
be worth mentioning the current status of user-mode multithreading
as well?

(snip)
> +Shared Data Structures
> +======================
> +
> +Global TCG State
> +----------------
> +
> +We need to protect the entire code generation cycle including any post
> +generation patching of the translated code. This also implies a shared
> +translation buffer which contains code running on all cores. Any
> +execution path that comes to the main run loop will need to hold a
> +mutex for code generation. This also includes times when we need to
> +flush code or jumps from the tb_cache.
> +
> +DESIGN REQUIREMENT: Add locking around all code generation, patching
> +and jump cache modification.

I think we could also benefit from some kind of "lock-free" algorithms
where possible, so locking as a requirement seems a bit too
restrictive. Regarding the shared translation buffer, how is it
implied? Don't we have an option of a separate per-vCPU code cache?
(Maybe I missed some discussion on this?)

> +
> +Translation Blocks
> +------------------
> +
> +Currently the whole system shares a single code generation buffer
> +which when full will force a flush of all translations and start from
> +scratch again.
> +
> +Once a basic block has been translated it will continue to be used
> +until it is invalidated. These invalidation events are typically due
> +to page changes in system emulation

I didn't dig too deep into this yet, but TLB invalidation after
virtual-to-physical address mapping changes doesn't seem to invalidate
any TBs in system mode...

>  and changes in memory mapping in
> +user mode. Debugging operations 

and self modifying code

> can also trigger invalidations.
> +
> +The invalidation also requires removing the TB from look-ups
> +(tb_phys_hash and tb_jmp_cache) as well as removing any direct TB to TB
> +patched jumps.

We could probably get by with a lazy approach, just preventing
invalidated TBs from being picked up in look-ups. It is also possible
not to remove direct jumps from the TB being invalidated to other TBs,
since it is not going to be executed anyway.

> +
> +DESIGN REQUIREMENT: Safely handle invalidation of TBs

It would probably be a good idea to include translation buffer flush
considerations as well.

> +
> +Memory maps and TLBs
> +--------------------
> +
> +The memory handling code is fairly critical to the speed of memory
> +access in the emulated system.
> +

It would be nice to put some intro sentence for the following bullets :)

> +  - Memory regions (dividing up access to PIO, MMIO and RAM)
> +  - Dirty page tracking (for code gen, migration and display)
> +  - Virtual TLB (for translating guest address->real address)
> +
> +There is both a fast path walked by the generated code and a slow
> +path when resolution is required. When the TLB tables are updated we
> +need to ensure they are done in a safe way by bringing all executing
> +threads to a halt before making the modifications.

Again, I think we could benefit if we can manage to avoid bringing
vCPU threads to a halt.

Nothing about memory regions and dirty page tracking?

> +
> +DESIGN REQUIREMENTS:
> +
> +  - TLB Flush All/Page
> +    - can be across-CPUs
> +    - will need all other CPUs brought to a halt

s/will/may/ ?

> +  - TLB Update (update a CPUTLBEntry, via tlb_set_page_with_attrs)
> +    - This is a per-CPU table - by definition can't race
> +    - updated by its own thread when the slow-path is forced
(snip)
> +Memory Consistency
> +==================
> +
> +Between emulated guests and host systems there are a range of memory
> +consistency models. While emulating weakly ordered systems on strongly
> +ordered hosts shouldn't cause any problems the same is not true for
> +the reverse setup.
> +
> +The proposed design currently does not address the problem of
> +emulating strong ordering on a weakly ordered host although even on
> +strongly ordered systems software should be using synchronisation
> +primitives to ensure correct operation.

e.g. strongly-ordered x86 allows store-after-load reordering and
provides memory fences to synchronize.

> +
> +Memory Barriers
> +---------------
> +
> +Barriers (sometimes known as fences) provide a mechanism for software
> +to enforce a particular ordering of memory operations from the point
> +of view of external observers (e.g. another processor core). They can
> +apply to any memory operations as well as just loads or stores.
> +
> +The Linux kernel has an excellent write-up on the various forms of
> +memory barrier and the guarantees they can provide [1].
> +
> +Barriers are often wrapped around synchronisation primitives to
> +provide explicit memory ordering semantics. However they can be used
> +by themselves to provide safe lockless access by ensuring for example
> +a signal flag will always be set after a payload.
> +
> +DESIGN REQUIREMENT: Add a new tcg_memory_barrier op
> +
> +This would enforce a strong load/store ordering so all loads/stores
> +complete at the memory barrier. On single-core non-SMP strongly
> +ordered backends this could become a NOP.
> +
> +There may be a case for further refinement if this causes performance
> +bottlenecks.

I think we'd better make a provision and add some flags to the memory
barrier operation to specify what kind of barrier is required.

Another idea is that we could associate a memory ordering attribute
with each TCG load/store operation and meet memory ordering
requirements across the various TCG targets and hosts in a uniform way.
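As a hypothetical sketch of that flag idea (none of these names appear
in the patches under review; they are illustrative assumptions), the
barrier op could take a bitmask describing which orderings it must
enforce:

```c
#include <assert.h>

/* Hypothetical ordering flags for a tcg memory barrier op; a strongly
 * ordered host back-end could compile most combinations to a NOP. */
typedef enum {
    TCG_MO_LD_LD = 1 << 0,       /* earlier loads before later loads */
    TCG_MO_LD_ST = 1 << 1,       /* earlier loads before later stores */
    TCG_MO_ST_LD = 1 << 2,       /* earlier stores before later loads */
    TCG_MO_ST_ST = 1 << 3,       /* earlier stores before later stores */
    TCG_MO_ALL   = (1 << 4) - 1, /* full barrier */
} TCGBar;
```

A back-end could then inspect the mask and emit only as strong a fence
as its host architecture actually needs.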

(snip)

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [RFC v2 08/11] tcg: add options for enabling MTTCG
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 08/11] tcg: add options for enabling MTTCG Alex Bennée
@ 2016-04-11 20:50   ` Sergey Fedorov
  2016-04-12 11:48     ` Alex Bennée
  2016-05-09 10:45     ` Paolo Bonzini
  2016-04-12 13:23   ` Sergey Fedorov
  1 sibling, 2 replies; 66+ messages in thread
From: Sergey Fedorov @ 2016-04-11 20:50 UTC (permalink / raw)
  To: Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: qemu-devel, mark.burton, pbonzini, jan.kiszka, rth,
	peter.maydell, claudio.fontana, Peter Crosthwaite,
	Andreas Färber

On 05/04/16 18:32, Alex Bennée wrote:
> diff --git a/qemu-options.hx b/qemu-options.hx
> index a770086..4eca704 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -3224,6 +3224,20 @@ Attach to existing xen domain.
>  xend will use this when starting QEMU (XEN only).
>  ETEXI
>  
> +DEF("tcg", HAS_ARG, QEMU_OPTION_tcg, \
> +    "-tcg [mttcg=on|off] control TCG options\n", QEMU_ARCH_ALL)
> +STEXI
> +@item -tcg
> +@findex -tcg
> +@table @option
> +@item mttcg=[on|off]
> +Control multi-threaded TCG. By default QEMU will enable multi-threaded
> +emulation for front/back-end combinations that are known to work. The
> +user may enable it against the defaults; however, odd guest behaviour
> +may occur.
> +@end table
> +ETEXI

Maybe we'd better use the existing QEMU accelerator framework and
introduce a "-machine accel=mttcg" option?

Kind regards,
Sergey


* Re: [Qemu-devel] [RFC v2 09/11] tcg: add kick timer for single-threaded vCPU emulation
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 09/11] tcg: add kick timer for single-threaded vCPU emulation Alex Bennée
@ 2016-04-11 21:39   ` Sergey Fedorov
  2016-06-02 16:00     ` Alex Bennée
  0 siblings, 1 reply; 66+ messages in thread
From: Sergey Fedorov @ 2016-04-11 21:39 UTC (permalink / raw)
  To: Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: qemu-devel, mark.burton, pbonzini, jan.kiszka, rth,
	peter.maydell, claudio.fontana, Peter Crosthwaite

On 05/04/16 18:32, Alex Bennée wrote:
> +static void kick_tcg_thread(void *opaque)
> +{
> +    QEMUTimer *self = *(QEMUTimer **) opaque;
> +    timer_mod(self,
> +              qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
> +              NANOSECONDS_PER_SECOND / 10);
> +    qemu_cpu_kick_no_halt();
> +}
>  

It would be nice to have a named definition (e.g. a macro) for the TCG
thread kick period.
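As a sketch (the macro name is made up, and the QEMU timer API is reduced to plain arithmetic here), the open-coded NANOSECONDS_PER_SECOND / 10 could become:

```c
#include <assert.h>
#include <stdint.h>

#define NANOSECONDS_PER_SECOND 1000000000LL

/* Hypothetical named constant for the single-threaded TCG kick
 * period, replacing the open-coded NANOSECONDS_PER_SECOND / 10. */
#define TCG_KICK_PERIOD (NANOSECONDS_PER_SECOND / 10)

/* Deadline the kick timer should be re-armed to, given the current
 * QEMU_CLOCK_VIRTUAL time in nanoseconds. */
static int64_t next_kick_deadline(int64_t now_ns)
{
    return now_ns + TCG_KICK_PERIOD;
}
```

With such a constant, both timer_mod() calls in the patch would share one definition instead of duplicating the expression.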

(snip)

> @@ -1179,6 +1198,14 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>          }
>      }
>  
> +    /* Set to kick if we have to do more than one vCPU */
> +    if (CPU_NEXT(first_cpu)) {
> +        kick_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,  kick_tcg_thread, &kick_timer);
> +        timer_mod(kick_timer,
> +                  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
> +                  NANOSECONDS_PER_SECOND / 10);
> +    }
> +    

I think cpu_ticks_init() could be a more natural place to put this
initialization.

Kind regards,
Sergey


* Re: [Qemu-devel] [RFC v2 08/11] tcg: add options for enabling MTTCG
  2016-04-11 20:50   ` Sergey Fedorov
@ 2016-04-12 11:48     ` Alex Bennée
  2016-04-12 11:59       ` Peter Maydell
  2016-04-12 12:48       ` Sergey Fedorov
  2016-05-09 10:45     ` Paolo Bonzini
  1 sibling, 2 replies; 66+ messages in thread
From: Alex Bennée @ 2016-04-12 11:48 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: mttcg, fred.konrad, a.rigo, cota, qemu-devel, mark.burton,
	pbonzini, jan.kiszka, rth, peter.maydell, claudio.fontana,
	Peter Crosthwaite, Andreas Färber


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 05/04/16 18:32, Alex Bennée wrote:
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index a770086..4eca704 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -3224,6 +3224,20 @@ Attach to existing xen domain.
>>  xend will use this when starting QEMU (XEN only).
>>  ETEXI
>>
>> +DEF("tcg", HAS_ARG, QEMU_OPTION_tcg, \
>> +    "-tcg [mttcg=on|off] control TCG options\n", QEMU_ARCH_ALL)
>> +STEXI
>> +@item -tcg
>> +@findex -tcg
>> +@table @option
>> +@item mttcg=[on|off]
>> +Control multi-threaded TCG. By default QEMU will enable multi-threaded
>> +emulation for front/back-end combinations that are known to work. The
>> +user may enable it against the defaults however odd guest behaviour
>> +may occur.
>> +@end table
>> +ETEXI
>
> Maybe we'd better use existing qemu accelerator framework and introduce
> "-machine accel=mttcg" option?

My worry would be breaking existing code which assumes kvm | tcg. We
will want to enable mttcg by default for combos that work without having
to tweak tooling that already uses the -machine options.

--
Alex Bennée


* Re: [Qemu-devel] [RFC v2 08/11] tcg: add options for enabling MTTCG
  2016-04-12 11:48     ` Alex Bennée
@ 2016-04-12 11:59       ` Peter Maydell
  2016-04-12 12:42         ` Sergey Fedorov
  2016-04-12 12:48       ` Sergey Fedorov
  1 sibling, 1 reply; 66+ messages in thread
From: Peter Maydell @ 2016-04-12 11:59 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Sergey Fedorov, MTTCG Devel, KONRAD Frédéric,
	Alvise Rigo, Emilio G. Cota, QEMU Developers, Mark Burton,
	Paolo Bonzini, J. Kiszka, Richard Henderson, Claudio Fontana,
	Peter Crosthwaite, Andreas Färber

On 12 April 2016 at 12:48, Alex Bennée <alex.bennee@linaro.org> wrote:
> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>> Maybe we'd better use existing qemu accelerator framework and introduce
>> "-machine accel=mttcg" option?
>
> My worry would be breaking existing code which assumes kvm | tcg. We
> will want to enable mttcg by default for combos that work without having
> to tweak tooling that already uses the -machine options.

Comments from the peanut gallery:
 * We definitely want to just default to MTTCG where it works:
users shouldn't have to add new command line switches to get the
better performance etc, because in practice that won't happen
 * Anything that adds a new command line option at all has to
be supported practically forever, so options that are only
useful for a transitional period are worth thinking twice about

thanks
-- PMM


* Re: [Qemu-devel] [RFC v2 08/11] tcg: add options for enabling MTTCG
  2016-04-12 11:59       ` Peter Maydell
@ 2016-04-12 12:42         ` Sergey Fedorov
  2016-04-12 12:50           ` KONRAD Frederic
  0 siblings, 1 reply; 66+ messages in thread
From: Sergey Fedorov @ 2016-04-12 12:42 UTC (permalink / raw)
  To: Peter Maydell, Alex Bennée
  Cc: MTTCG Devel, KONRAD Frédéric, Alvise Rigo,
	Emilio G. Cota, QEMU Developers, Mark Burton, Paolo Bonzini,
	J. Kiszka, Richard Henderson, Claudio Fontana, Peter Crosthwaite,
	Andreas Färber

On 12/04/16 14:59, Peter Maydell wrote:
> On 12 April 2016 at 12:48, Alex Bennée <alex.bennee@linaro.org> wrote:
>> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>>> Maybe we'd better use existing qemu accelerator framework and introduce
>>> "-machine accel=mttcg" option?
>> My worry would be breaking existing code which assumes kvm | tcg. We
>> will want to enable mttcg by default for combos that work without having
>> to tweak tooling that already uses the -machine options.
> Comments from the peanut gallery:
>  * We definitely want to just default to MTTCG where it works:
> users shouldn't have to add new command line switches to get the
> better performance etc, because in practice that won't happen
>  * Anything that adds a new command line option at all has to
> be supported practically forever, so options that are only
> useful for a transitional period are worth thinking twice about

Yes, but users may want an option to disable MTTCG for some reason. I'm
also concerned about icount mode: I'm not sure how to account for
virtual time when all vCPUs run in parallel.

Kind regards,
Sergey


* Re: [Qemu-devel] [RFC v2 08/11] tcg: add options for enabling MTTCG
  2016-04-12 11:48     ` Alex Bennée
  2016-04-12 11:59       ` Peter Maydell
@ 2016-04-12 12:48       ` Sergey Fedorov
  1 sibling, 0 replies; 66+ messages in thread
From: Sergey Fedorov @ 2016-04-12 12:48 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, fred.konrad, a.rigo, cota, qemu-devel, mark.burton,
	pbonzini, jan.kiszka, rth, peter.maydell, claudio.fontana,
	Peter Crosthwaite, Andreas Färber

On 12/04/16 14:48, Alex Bennée wrote:
> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>
>> On 05/04/16 18:32, Alex Bennée wrote:
>>> diff --git a/qemu-options.hx b/qemu-options.hx
>>> index a770086..4eca704 100644
>>> --- a/qemu-options.hx
>>> +++ b/qemu-options.hx
>>> @@ -3224,6 +3224,20 @@ Attach to existing xen domain.
>>>  xend will use this when starting QEMU (XEN only).
>>>  ETEXI
>>>
>>> +DEF("tcg", HAS_ARG, QEMU_OPTION_tcg, \
>>> +    "-tcg [mttcg=on|off] control TCG options\n", QEMU_ARCH_ALL)
>>> +STEXI
>>> +@item -tcg
>>> +@findex -tcg
>>> +@table @option
>>> +@item mttcg=[on|off]
>>> +Control multi-threaded TCG. By default QEMU will enable multi-threaded
>>> +emulation for front/back-end combinations that are known to work. The
>>> +user may enable it against the defaults however odd guest behaviour
>>> +may occur.
>>> +@end table
>>> +ETEXI
>> Maybe we'd better use existing qemu accelerator framework and introduce
>> "-machine accel=mttcg" option?
> My worry would be breaking existing code which assumes kvm | tcg. We
> will want to enable mttcg by default for combos that work without having
> to tweak tooling that already uses the -machine options.

You are right, we'd better make MTTCG an option for TCG, not a separate
accelerator.

Kind regards,
Sergey


* Re: [Qemu-devel] [RFC v2 08/11] tcg: add options for enabling MTTCG
  2016-04-12 12:42         ` Sergey Fedorov
@ 2016-04-12 12:50           ` KONRAD Frederic
  2016-04-12 13:00             ` Sergey Fedorov
  0 siblings, 1 reply; 66+ messages in thread
From: KONRAD Frederic @ 2016-04-12 12:50 UTC (permalink / raw)
  To: Sergey Fedorov, Peter Maydell, Alex Bennée
  Cc: MTTCG Devel, Alvise Rigo, Emilio G. Cota, QEMU Developers,
	Mark Burton, Paolo Bonzini, J. Kiszka, Richard Henderson,
	Claudio Fontana, Peter Crosthwaite, Andreas Färber



On 12/04/2016 14:42, Sergey Fedorov wrote:
> On 12/04/16 14:59, Peter Maydell wrote:
>> On 12 April 2016 at 12:48, Alex Bennée <alex.bennee@linaro.org> wrote:
>>> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>>>> Maybe we'd better use existing qemu accelerator framework and introduce
>>>> "-machine accel=mttcg" option?
>>> My worry would be breaking existing code which assumes kvm | tcg. We
>>> will want to enable mttcg by default for combos that work without having
>>> to tweak tooling that already uses the -machine options.
>> Comments from the peanut gallery:
>>   * We definitely want to just default to MTTCG where it works:
>> users shouldn't have to add new command line switches to get the
>> better performance etc, because in practice that won't happen
>>   * Anything that adds a new command line option at all has to
>> be supported practically forever, so options that are only
>> useful for a transitional period are worth thinking twice about
>
> Yes, but users may like to have an option to disable MTTCG for some
> reason. I'm also concerned about icount mode: not sure how to account
> virtual time when all vCPUs run in parallel.
>

Hi,

I'm thinking the same: we don't have a solution for icount yet.
The reverse execution support is probably badly broken as well.

Fred

> Kind regards,
> Sergey
>


* Re: [Qemu-devel] [RFC v2 08/11] tcg: add options for enabling MTTCG
  2016-04-12 12:50           ` KONRAD Frederic
@ 2016-04-12 13:00             ` Sergey Fedorov
  2016-04-12 13:03               ` Pavel Dovgalyuk
  0 siblings, 1 reply; 66+ messages in thread
From: Sergey Fedorov @ 2016-04-12 13:00 UTC (permalink / raw)
  To: KONRAD Frederic, Peter Maydell, Alex Bennée
  Cc: MTTCG Devel, Alvise Rigo, Emilio G. Cota, QEMU Developers,
	Mark Burton, Paolo Bonzini, J. Kiszka, Richard Henderson,
	Claudio Fontana, Peter Crosthwaite, Andreas Färber

On 12/04/16 15:50, KONRAD Frederic wrote:
>
>
> On 12/04/2016 14:42, Sergey Fedorov wrote:
>> On 12/04/16 14:59, Peter Maydell wrote:
>>> On 12 April 2016 at 12:48, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>>>>> Maybe we'd better use existing qemu accelerator framework and
>>>>> introduce
>>>>> "-machine accel=mttcg" option?
>>>> My worry would be breaking existing code which assumes kvm | tcg. We
>>>> will want to enable mttcg by default for combos that work without
>>>> having
>>>> to tweak tooling that already uses the -machine options.
>>> Comments from the peanut gallery:
>>>   * We definitely want to just default to MTTCG where it works:
>>> users shouldn't have to add new command line switches to get the
>>> better performance etc, because in practice that won't happen
>>>   * Anything that adds a new command line option at all has to
>>> be supported practically forever, so options that are only
>>> useful for a transitional period are worth thinking twice about
>>
>> Yes, but users may like to have an option to disable MTTCG for some
>> reason. I'm also concerned about icount mode: not sure how to account
>> virtual time when all vCPUs run in parallel.
>>
>

>
> I'm thinking the same, we don't have a solution for icount yet.
> The reverse execution support is probably badly broken as well.

Reverse execution doesn't even seem to support more than a single core.
As for icount, it looks to be incompatible with MTTCG.

Kind regards,
Sergey


* Re: [Qemu-devel] [RFC v2 08/11] tcg: add options for enabling MTTCG
  2016-04-12 13:00             ` Sergey Fedorov
@ 2016-04-12 13:03               ` Pavel Dovgalyuk
  2016-04-12 13:19                 ` Sergey Fedorov
  2016-04-12 14:23                 ` Alex Bennée
  0 siblings, 2 replies; 66+ messages in thread
From: Pavel Dovgalyuk @ 2016-04-12 13:03 UTC (permalink / raw)
  To: 'Sergey Fedorov', 'KONRAD Frederic',
	'Peter Maydell', 'Alex Bennée'
  Cc: 'MTTCG Devel', 'Alvise Rigo',
	'Emilio G. Cota', 'QEMU Developers',
	'Mark Burton', 'Paolo Bonzini',
	'J. Kiszka', 'Richard Henderson',
	'Claudio Fontana', 'Peter Crosthwaite',
	'Andreas Färber'

> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-request@listserver.greensocs.com]
> On 12/04/16 15:50, KONRAD Frederic wrote:
> >> Yes, but users may like to have an option to disable MTTCG for some
> >> reason. I'm also concerned about icount mode: not sure how to account
> >> virtual time when all vCPUs run in parallel.
> >
> > I'm thinking the same, we don't have a solution for icount yet.
> > The reverse execution support is probably badly broken as well.
> 
> Reverse execution doesn't even seem to support more than a single core.
> As of icount, it looks like to be incompatible with MTTCG.

It doesn't, because there is no multicore support for TCG yet.
I think we have to find some solution for icount (even if it is slow).

Pavel Dovgalyuk


* Re: [Qemu-devel] [RFC v2 08/11] tcg: add options for enabling MTTCG
  2016-04-12 13:03               ` Pavel Dovgalyuk
@ 2016-04-12 13:19                 ` Sergey Fedorov
  2016-04-12 14:23                 ` Alex Bennée
  1 sibling, 0 replies; 66+ messages in thread
From: Sergey Fedorov @ 2016-04-12 13:19 UTC (permalink / raw)
  To: Pavel Dovgalyuk, 'KONRAD Frederic',
	'Peter Maydell', 'Alex Bennée'
  Cc: 'MTTCG Devel', 'Alvise Rigo',
	'Emilio G. Cota', 'QEMU Developers',
	'Mark Burton', 'Paolo Bonzini',
	'J. Kiszka', 'Richard Henderson',
	'Claudio Fontana', 'Peter Crosthwaite',
	'Andreas Färber'

On 12/04/16 16:03, Pavel Dovgalyuk wrote:
>> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-request@listserver.greensocs.com]
>> On 12/04/16 15:50, KONRAD Frederic wrote:
>>>> Yes, but users may like to have an option to disable MTTCG for some
>>>> reason. I'm also concerned about icount mode: not sure how to account
>>>> virtual time when all vCPUs run in parallel.
>>> I'm thinking the same, we don't have a solution for icount yet.
>>> The reverse execution support is probably badly broken as well.
>> Reverse execution doesn't even seem to support more than a single core.
>> As of icount, it looks like to be incompatible with MTTCG.
> It doesn't, because there is no multicore support for tcg yet.
> I think, that we have to find some solution for icount (even if it will be slow).

Synchronize all vCPUs against their instruction counters? I wonder
whether it would be of much use in MTTCG mode if it turns out to be
quite slow. Anyhow, valid point; we should think about it.
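To illustrate why this would likely be slow, here is a toy model (my sketch, not anything proposed on the list) of vCPUs synchronized against their instruction counters: a vCPU may only execute while it is within a fixed window of the slowest vCPU, so every thread ends up rate-limited by the laggard:

```c
#include <assert.h>

/* Max instructions any vCPU may run ahead of the slowest one. */
#define ICOUNT_WINDOW 100

/* May vCPU 'i' execute another translation block, given every
 * vCPU's instruction counter? With a small window this forces
 * near-lockstep execution, hence the performance worry. */
static int may_run(const long long *icount, int ncpus, int i)
{
    long long min = icount[0];
    for (int c = 1; c < ncpus; c++) {
        if (icount[c] < min) {
            min = icount[c];
        }
    }
    return icount[i] - min < ICOUNT_WINDOW;
}
```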

Kind regards,
Sergey


* Re: [Qemu-devel] [RFC v2 08/11] tcg: add options for enabling MTTCG
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 08/11] tcg: add options for enabling MTTCG Alex Bennée
  2016-04-11 20:50   ` Sergey Fedorov
@ 2016-04-12 13:23   ` Sergey Fedorov
  2016-04-12 14:28     ` Alex Bennée
  1 sibling, 1 reply; 66+ messages in thread
From: Sergey Fedorov @ 2016-04-12 13:23 UTC (permalink / raw)
  To: Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: qemu-devel, mark.burton, pbonzini, jan.kiszka, rth,
	peter.maydell, claudio.fontana, Peter Crosthwaite,
	Andreas Färber

On 05/04/16 18:32, Alex Bennée wrote:
> diff --git a/cpus.c b/cpus.c
> index 46732a5..8d27fb0 100644
> --- a/cpus.c
> +++ b/cpus.c

(snip)

> @@ -146,6 +147,48 @@ typedef struct TimersState {
>  } TimersState;
>  
>  static TimersState timers_state;
> +static bool mttcg_enabled;
> +
> +static QemuOptsList qemu_tcg_opts = {
> +    .name = "tcg",
> +    .head = QTAILQ_HEAD_INITIALIZER(qemu_tcg_opts.head),
> +    .desc = {
> +        {
> +            .name = "mttcg",
> +            .type = QEMU_OPT_BOOL,
> +            .help = "Enable/disable multi-threaded TCG",
> +        },
> +        { /* end of list */ }
> +    },
> +};
> +
> +static void tcg_register_config(void)
> +{
> +    qemu_add_opts(&qemu_tcg_opts);
> +}
> +
> +opts_init(tcg_register_config);
> +
> +static bool default_mttcg_enabled(void)
> +{
> +    /*
> +     * TODO: Check if we have a chance to have MTTCG working on this guest/host.
> +     *       Basically is the atomic instruction implemented? Is there any
> +     *       memory ordering issue?
> +     */

I think this could be decided in configure/makefiles.

> +    return false;
> +}
> +
> +void qemu_tcg_configure(QemuOpts *opts)
> +{
> +    mttcg_enabled = qemu_opt_get_bool(opts, "mttcg", default_mttcg_enabled());
> +}
> +
> +bool qemu_tcg_mttcg_enabled(void)
> +{
> +    return mttcg_enabled;
> +}
> +
>  
>  int64_t cpu_get_icount_raw(void)
>  {
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 13eeaae..5e3826c 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -369,6 +369,20 @@ extern struct CPUTailQ cpus;
>  extern __thread CPUState *current_cpu;
>  
>  /**
> + * qemu_tcg_enable_mttcg:
> + * Enable the MultiThread TCG support.
> + */
> +void qemu_tcg_enable_mttcg(void);

Seems to be an orphaned declaration.

Kind regards,
Sergey


* Re: [Qemu-devel] [RFC v2 08/11] tcg: add options for enabling MTTCG
  2016-04-12 13:03               ` Pavel Dovgalyuk
  2016-04-12 13:19                 ` Sergey Fedorov
@ 2016-04-12 14:23                 ` Alex Bennée
  2016-05-09 10:47                   ` Paolo Bonzini
  1 sibling, 1 reply; 66+ messages in thread
From: Alex Bennée @ 2016-04-12 14:23 UTC (permalink / raw)
  To: Pavel Dovgalyuk
  Cc: 'Sergey Fedorov', 'KONRAD Frederic',
	'Peter Maydell', 'MTTCG Devel',
	'Alvise Rigo', 'Emilio G. Cota',
	'QEMU Developers', 'Mark Burton',
	'Paolo Bonzini', 'J. Kiszka',
	'Richard Henderson', 'Claudio Fontana',
	'Peter Crosthwaite', 'Andreas Färber'


Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-request@listserver.greensocs.com]
>> On 12/04/16 15:50, KONRAD Frederic wrote:
>> >> Yes, but users may like to have an option to disable MTTCG for some
>> >> reason. I'm also concerned about icount mode: not sure how to account
>> >> virtual time when all vCPUs run in parallel.
>> >
>> > I'm thinking the same, we don't have a solution for icount yet.
>> > The reverse execution support is probably badly broken as well.
>>
>> Reverse execution doesn't even seem to support more than a single core.
>> As of icount, it looks like to be incompatible with MTTCG.
>
> It doesn't, because there is no multicore support for tcg yet.
> I think, that we have to find some solution for icount (even if it
> will be slow).

Well, one of the reasons I'm keeping the single-threaded behaviour is so
that things like icount/replay support can continue to work. The two
options will be incompatible to start with.

I'm also expecting people will want to rule out MTTCG as the cause of a
problem, at least in the early days.

--
Alex Bennée


* Re: [Qemu-devel] [RFC v2 08/11] tcg: add options for enabling MTTCG
  2016-04-12 13:23   ` Sergey Fedorov
@ 2016-04-12 14:28     ` Alex Bennée
  0 siblings, 0 replies; 66+ messages in thread
From: Alex Bennée @ 2016-04-12 14:28 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: mttcg, fred.konrad, a.rigo, cota, qemu-devel, mark.burton,
	pbonzini, jan.kiszka, rth, peter.maydell, claudio.fontana,
	Peter Crosthwaite, Andreas Färber


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 05/04/16 18:32, Alex Bennée wrote:
>> diff --git a/cpus.c b/cpus.c
>> index 46732a5..8d27fb0 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>
> (snip)
>
>> @@ -146,6 +147,48 @@ typedef struct TimersState {
>>  } TimersState;
>>
>>  static TimersState timers_state;
>> +static bool mttcg_enabled;
>> +
>> +static QemuOptsList qemu_tcg_opts = {
>> +    .name = "tcg",
>> +    .head = QTAILQ_HEAD_INITIALIZER(qemu_tcg_opts.head),
>> +    .desc = {
>> +        {
>> +            .name = "mttcg",
>> +            .type = QEMU_OPT_BOOL,
>> +            .help = "Enable/disable multi-threaded TCG",
>> +        },
>> +        { /* end of list */ }
>> +    },
>> +};
>> +
>> +static void tcg_register_config(void)
>> +{
>> +    qemu_add_opts(&qemu_tcg_opts);
>> +}
>> +
>> +opts_init(tcg_register_config);
>> +
>> +static bool default_mttcg_enabled(void)
>> +{
>> +    /*
>> +     * TODO: Check if we have a chance to have MTTCG working on this guest/host.
>> +     *       Basically is the atomic instruction implemented? Is there any
>> +     *       memory ordering issue?
>> +     */
>
> I think this could be decided in configure/makefiles.

I was thinking we might have other interactions, such as the user
enabling replay/playback mode. There is also an argument that having the
logic in the code makes it easier for developers to see, as people don't
generally grok Makefiles.
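For illustration, a runtime default that also consults such interactions might look like the following. The icount/replay checks and the backend flag are hypothetical stand-ins; the posted patch's default_mttcg_enabled() simply returns false:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for state the real function would query at runtime. */
static bool use_icount;        /* -icount was given */
static bool replay_enabled;    /* record/replay is active */
static bool backend_mttcg_ok;  /* front/back-end combo known to work */

static bool default_mttcg_enabled(void)
{
    /* icount and record/replay have no MTTCG story yet, so they
     * force the single-threaded default whatever the combo. */
    if (use_icount || replay_enabled) {
        return false;
    }
    return backend_mttcg_ok;
}
```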

>
>> +    return false;
>> +}
>> +
>> +void qemu_tcg_configure(QemuOpts *opts)
>> +{
>> +    mttcg_enabled = qemu_opt_get_bool(opts, "mttcg", default_mttcg_enabled());
>> +}
>> +
>> +bool qemu_tcg_mttcg_enabled(void)
>> +{
>> +    return mttcg_enabled;
>> +}
>> +
>>
>>  int64_t cpu_get_icount_raw(void)
>>  {
>> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
>> index 13eeaae..5e3826c 100644
>> --- a/include/qom/cpu.h
>> +++ b/include/qom/cpu.h
>> @@ -369,6 +369,20 @@ extern struct CPUTailQ cpus;
>>  extern __thread CPUState *current_cpu;
>>
>>  /**
>> + * qemu_tcg_enable_mttcg:
>> + * Enable the MultiThread TCG support.
>> + */
>> +void qemu_tcg_enable_mttcg(void);
>
> Seems to be an orphaned declaration.
>
> Kind regards,
> Sergey


--
Alex Bennée


* Re: [Qemu-devel] [RFC v2 04/11] tcg: comment on which functions have to be called with tb_lock held
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 04/11] tcg: comment on which functions have to be called with tb_lock held Alex Bennée
@ 2016-05-05 14:19   ` Sergey Fedorov
  2016-05-05 15:03     ` Alex Bennée
  2016-05-06 18:22   ` Sergey Fedorov
  1 sibling, 1 reply; 66+ messages in thread
From: Sergey Fedorov @ 2016-05-05 14:19 UTC (permalink / raw)
  To: Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: qemu-devel, mark.burton, pbonzini, jan.kiszka, rth,
	peter.maydell, claudio.fontana, Peter Crosthwaite,
	Andreas Färber

On 05/04/16 18:32, Alex Bennée wrote:
> diff --git a/exec.c b/exec.c
> index f46e596..17f390e 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -826,6 +826,7 @@ int cpu_breakpoint_insert(CPUState *cpu, vaddr pc, int flags,
>  {
>      CPUBreakpoint *bp;
>  
> +    /* TODO: locking (RCU?) */
>      bp = g_malloc(sizeof(*bp));
>  
>      bp->pc = pc;

This comment is a little inconsistent. We should make access to the
breakpoint and watchpoint lists thread-safe in all the functions using
them, so if we note this here, it should be noted in all such places.
Also, it's probably not a good idea to put such a comment just above the
g_malloc() invocation; it could be a bit confusing. A bit more detail
would also be nice.

Kind regards,
Sergey


* Re: [Qemu-devel] [RFC v2 04/11] tcg: comment on which functions have to be called with tb_lock held
  2016-05-05 14:19   ` Sergey Fedorov
@ 2016-05-05 15:03     ` Alex Bennée
  2016-05-05 15:25       ` Sergey Fedorov
  0 siblings, 1 reply; 66+ messages in thread
From: Alex Bennée @ 2016-05-05 15:03 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: mttcg, fred.konrad, a.rigo, cota, qemu-devel, mark.burton,
	pbonzini, jan.kiszka, rth, peter.maydell, claudio.fontana,
	Peter Crosthwaite, Andreas Färber


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 05/04/16 18:32, Alex Bennée wrote:
>> diff --git a/exec.c b/exec.c
>> index f46e596..17f390e 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -826,6 +826,7 @@ int cpu_breakpoint_insert(CPUState *cpu, vaddr pc, int flags,
>>  {
>>      CPUBreakpoint *bp;
>>
>> +    /* TODO: locking (RCU?) */
>>      bp = g_malloc(sizeof(*bp));
>>
>>      bp->pc = pc;
>
> This comment is a little inconsistent. We should make access to
> breakpoint and watchpoint lists to be thread-safe in all the functions
> using them. So if we note this, it should be noted in all such places.
> Also, it's probably not a good idea to put such comment just above
> g_malloc() invocation, it could be a bit confusing. A bit more details
> would also be nice.

Good point.

I could really do with some tests to exercise the debugging interface. I
wrote some when I did the ARM KVM GDB stuff (see
261f4d6d3e5445f887e070f047968e756c30cf06) but they are a) not plumbed in
and b) not really a stress test, which is what you want to be sure your
handling is thread safe.

>
> Kind regards,
> Sergey


--
Alex Bennée


* Re: [Qemu-devel] [RFC v2 04/11] tcg: comment on which functions have to be called with tb_lock held
  2016-05-05 15:03     ` Alex Bennée
@ 2016-05-05 15:25       ` Sergey Fedorov
  2016-05-05 15:42         ` Sergey Fedorov
  0 siblings, 1 reply; 66+ messages in thread
From: Sergey Fedorov @ 2016-05-05 15:25 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, fred.konrad, a.rigo, cota, qemu-devel, mark.burton,
	pbonzini, jan.kiszka, rth, peter.maydell, claudio.fontana,
	Peter Crosthwaite, Andreas Färber

On 05/05/16 18:03, Alex Bennée wrote:
> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>
>> On 05/04/16 18:32, Alex Bennée wrote:
>>> diff --git a/exec.c b/exec.c
>>> index f46e596..17f390e 100644
>>> --- a/exec.c
>>> +++ b/exec.c
>>> @@ -826,6 +826,7 @@ int cpu_breakpoint_insert(CPUState *cpu, vaddr pc, int flags,
>>>  {
>>>      CPUBreakpoint *bp;
>>>
>>> +    /* TODO: locking (RCU?) */
>>>      bp = g_malloc(sizeof(*bp));
>>>
>>>      bp->pc = pc;
>> This comment is a little inconsistent. We should make access to
>> breakpoint and watchpoint lists to be thread-safe in all the functions
>> using them. So if we note this, it should be noted in all such places.
>> Also, it's probably not a good idea to put such comment just above
>> g_malloc() invocation, it could be a bit confusing. A bit more details
>> would also be nice.
> Good point.
>
> I could really do with some tests to exercise the debugging interface. I
> did some when I wrote the arm kvm GDB stuff (see
> 261f4d6d3e5445f887e070f047968e756c30cf06) but it is a) not plumbed in
> and b) not really a stress test which is what you want to be sure your
> handling is thread safe.

It seems we can only have a race between a TCG helper inserting/removing
a break-/watchpoint and the gdbstub. So maybe we could just use separate
lists for CPU and GDB breakpoints?

Kind regards,
Sergey


* Re: [Qemu-devel] [RFC v2 04/11] tcg: comment on which functions have to be called with tb_lock held
  2016-05-05 15:25       ` Sergey Fedorov
@ 2016-05-05 15:42         ` Sergey Fedorov
  0 siblings, 0 replies; 66+ messages in thread
From: Sergey Fedorov @ 2016-05-05 15:42 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, fred.konrad, a.rigo, cota, qemu-devel, mark.burton,
	pbonzini, jan.kiszka, rth, peter.maydell, claudio.fontana,
	Peter Crosthwaite, Andreas Färber

On 05/05/16 18:25, Sergey Fedorov wrote:
> On 05/05/16 18:03, Alex Bennée wrote:
>> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>>
>>> On 05/04/16 18:32, Alex Bennée wrote:
>>>> diff --git a/exec.c b/exec.c
>>>> index f46e596..17f390e 100644
>>>> --- a/exec.c
>>>> +++ b/exec.c
>>>> @@ -826,6 +826,7 @@ int cpu_breakpoint_insert(CPUState *cpu, vaddr pc, int flags,
>>>>  {
>>>>      CPUBreakpoint *bp;
>>>>
>>>> +    /* TODO: locking (RCU?) */
>>>>      bp = g_malloc(sizeof(*bp));
>>>>
>>>>      bp->pc = pc;
>>> This comment is a little inconsistent. We should make access to
>>> breakpoint and watchpoint lists to be thread-safe in all the functions
>>> using them. So if we note this, it should be noted in all such places.
>>> Also, it's probably not a good idea to put such comment just above
>>> g_malloc() invocation, it could be a bit confusing. A bit more details
>>> would also be nice.
>> Good point.
>>
>> I could really do with some tests to exercise the debugging interface. I
>> did some when I wrote the arm kvm GDB stuff (see
>> 261f4d6d3e5445f887e070f047968e756c30cf06) but it is a) not plumbed in
>> and b) not really a stress test which is what you want to be sure your
>> handling is thread safe.
> It seems we can only have a race between TCG helper inserting/removing
> break-/watchpoint and gdbstub. So maybe we could just use separate lists
> for CPU and GDB breakpoints?

However, we would still need to synchronize reads in vCPU threads with
insertions/removals in the gdbstub... Not much point in splitting the
lists; we could just use a spinlock to serialize insertions/removals and
RCU to make the reads safe.
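A minimal model of that scheme, with C11 atomics and an atomic_flag spinlock standing in for QEMU's RCU machinery (the real code would use QLIST_INSERT_HEAD_RCU and synchronize_rcu() before freeing removed nodes; removal is omitted here):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct Breakpoint {
    unsigned long pc;
    struct Breakpoint *next;
} Breakpoint;

static _Atomic(Breakpoint *) bp_head;
static atomic_flag bp_lock = ATOMIC_FLAG_INIT; /* serializes writers only */

/* Writer side (gdbstub or a TCG helper): take the spinlock, then
 * publish the new head with release ordering so that lock-free
 * readers always observe a fully initialized node. */
static void bp_insert(unsigned long pc)
{
    Breakpoint *bp = malloc(sizeof(*bp));
    bp->pc = pc;
    while (atomic_flag_test_and_set_explicit(&bp_lock, memory_order_acquire)) {
        /* spin */
    }
    bp->next = atomic_load_explicit(&bp_head, memory_order_relaxed);
    atomic_store_explicit(&bp_head, bp, memory_order_release);
    atomic_flag_clear_explicit(&bp_lock, memory_order_release);
}

/* Reader side (vCPU thread): traverses without taking any lock,
 * as an RCU read-side critical section would. */
static int bp_hit(unsigned long pc)
{
    for (Breakpoint *bp = atomic_load_explicit(&bp_head, memory_order_acquire);
         bp != NULL; bp = bp->next) {
        if (bp->pc == pc) {
            return 1;
        }
    }
    return 0;
}
```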

Kind regards,
Sergey


* Re: [Qemu-devel] [RFC v2 03/11] docs: new design document multi-thread-tcg.txt (DRAFTING)
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 03/11] docs: new design document multi-thread-tcg.txt (DRAFTING) Alex Bennée
  2016-04-11 20:00   ` Sergey Fedorov
@ 2016-05-06 11:25   ` Sergey Fedorov
  1 sibling, 0 replies; 66+ messages in thread
From: Sergey Fedorov @ 2016-05-06 11:25 UTC (permalink / raw)
  To: Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: qemu-devel, mark.burton, pbonzini, jan.kiszka, rth,
	peter.maydell, claudio.fontana

On 05/04/16 18:32, Alex Bennée wrote:
> This is a current DRAFT of a design proposal for upgrading TCG emulation
> to take advantage of modern CPUs by running a thread-per-CPU. The
> document goes through the various areas of the code affected by such a
> change and proposes design requirements for each part of the solution.
>
> It has been written *without* explicit reference to the current ongoing
> efforts to introduce this feature. The hope being we can review and
> discuss the design choices without assuming the current choices taken by
> the implementation are correct.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
> ---
> v1
>   - initial version
> v2
>   - update discussion on locks
>   - bit more detail on vCPU scheduling
>   - explicitly mention Translation Blocks
>   - emulated hardware state already covered by iomutex
>   - a few minor rewords

We could also include a few words about icount mode support in MTTCG.

Kind regards,
Sergey


* Re: [Qemu-devel] [RFC v2 04/11] tcg: comment on which functions have to be called with tb_lock held
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 04/11] tcg: comment on which functions have to be called with tb_lock held Alex Bennée
  2016-05-05 14:19   ` Sergey Fedorov
@ 2016-05-06 18:22   ` Sergey Fedorov
  2016-05-11 12:58     ` Paolo Bonzini
  1 sibling, 1 reply; 66+ messages in thread
From: Sergey Fedorov @ 2016-05-06 18:22 UTC (permalink / raw)
  To: Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: qemu-devel, mark.burton, pbonzini, jan.kiszka, rth,
	peter.maydell, claudio.fontana, Peter Crosthwaite,
	Andreas Färber

On 05/04/16 18:32, Alex Bennée wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
>
> softmmu requires more functions to be thread-safe, because translation
> blocks can be invalidated from e.g. notdirty callbacks.  Probably the
> same holds for user-mode emulation, it's just that no one has ever
> tried to produce a coherent locking there.
>
> This patch will guide the introduction of more tb_lock and tb_unlock
> calls for system emulation.
>
> Note that after this patch some (most) of the mentioned functions are
> still called outside tb_lock/tb_unlock.  The next one will rectify this.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
(snip)
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 6931db9..13eeaae 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -306,7 +306,10 @@ struct CPUState {
>  
>      void *env_ptr; /* CPUArchState */
>      struct TranslationBlock *current_tb;
> +
> +    /* Writes protected by tb_lock, reads not thread-safe  */
>      struct TranslationBlock *tb_jmp_cache[TB_JMP_CACHE_SIZE];

What does "reads not thread-safe" mean? Where is it read without
tb_lock actually held, other than at the call sites covered by comments
added by this patch?

> +
>      struct GDBRegisterState *gdb_regs;
>      int gdb_num_regs;
>      int gdb_num_g_regs;
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index ea4ff02..a46d17c 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -609,6 +609,7 @@ static inline bool tcg_op_buf_full(void)
>  
>  /* pool based memory allocation */
>  
> +/* tb_lock must be held for tcg_malloc_internal. */

How far are we prepared to go in commenting such functions? The functions
can be divided into three categories:
 * extern
 * static, called from only one other function (for better code
structuring)
 * static, called from multiple other functions (for encapsulation of
common code)

I think we should decide how to comment locking requirements and follow
this consistently.

Concerning this particular case, I think there's not much point in making
tcg_malloc_internal() and tcg_malloc() externally visible and commenting
the locking requirement for them. We'd better have a separate header file
under include/ for external TCG interface declarations and treat this one
as internal only in tcg/.

>  void *tcg_malloc_internal(TCGContext *s, int size);
>  void tcg_pool_reset(TCGContext *s);
>  void tcg_pool_delete(TCGContext *s);
> @@ -617,6 +618,7 @@ void tb_lock(void);
>  void tb_unlock(void);
>  void tb_lock_reset(void);
>  
> +/* Called with tb_lock held.  */
>  static inline void *tcg_malloc(int size)
>  {
>      TCGContext *s = &tcg_ctx;
> diff --git a/translate-all.c b/translate-all.c
> index d923008..a7ff5e7 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -248,7 +248,9 @@ static int encode_search(TranslationBlock *tb, uint8_t *block)
>      return p - block;
>  }
>  
> -/* The cpu state corresponding to 'searched_pc' is restored.  */
> +/* The cpu state corresponding to 'searched_pc' is restored.
> + * Called with tb_lock held.
> + */
>  static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
>                                       uintptr_t searched_pc)
>  {
> @@ -306,8 +308,10 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t retaddr)

cpu_restore_state_from_tb() called right here also requires tb_lock().

>          if (tb->cflags & CF_NOCACHE) {
>              /* one-shot translation, invalidate it immediately */
>              cpu->current_tb = NULL;
> +            tb_lock();
>              tb_phys_invalidate(tb, -1);
>              tb_free(tb);
> +            tb_unlock();
>          }
>          return true;
>      }
> @@ -399,6 +403,7 @@ static void page_init(void)
>  }
>  
>  /* If alloc=1:
> + * Called with tb_lock held for system emulation.
>   * Called with mmap_lock held for user-mode emulation.

There are a number of functions whose comments state that tb_lock
should be held for system emulation but mmap_lock for user-mode
emulation. BTW, what is the purpose of mmap_lock? And how is it related
to tb_lock?

>   */
>  static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
(snip)
> @@ -1429,7 +1446,9 @@ void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
>      }
>      if (!p->code_bitmap &&
>          ++p->code_write_count >= SMC_BITMAP_USE_THRESHOLD) {
> -        /* build code bitmap */
> +        /* build code bitmap.  FIXME: writes should be protected by
> +         * tb_lock, reads by tb_lock or RCU.
> +         */

Probably page_find(), called earlier in this function, requires tb_lock
to be held, as does tb_invalidate_phys_page_range(), which can be called
later. Maybe tb_invalidate_phys_page_fast() is a good candidate to
always be called with tb_lock held.

>          build_page_bitmap(p);
>      }
>      if (p->code_bitmap) {
(snip)

A list of candidates (as of rth/tcg-next) to also have such a comment
includes:

    tb_find_physical()
    tb_find_slow()
    tb_hash_remove()
    tb_page_remove()
    tb_remove_from_jmp_list()
    tb_jmp_unlink()
    build_page_bitmap()
    tb_link_page()
    tb_invalidate_phys_range()
    tb_invalidate_phys_page_range()
    page_find()
    invalidate_page_bitmap()
    tb_invalidate_check()
    tb_invalidate_phys_page_fast()
    tb_invalidate_phys_page()
    tb_invalidate_phys_addr()
    cpu_io_recompile()

However, there's no sensible description of what is protected by tb_lock
and mmap_lock. I think we need to have a clear documented description of
the TCG locking scheme in order to be sure we do right things in MTTCG.

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [RFC v2 08/11] tcg: add options for enabling MTTCG
  2016-04-11 20:50   ` Sergey Fedorov
  2016-04-12 11:48     ` Alex Bennée
@ 2016-05-09 10:45     ` Paolo Bonzini
  2016-05-09 11:50       ` Alex Bennée
  1 sibling, 1 reply; 66+ messages in thread
From: Paolo Bonzini @ 2016-05-09 10:45 UTC (permalink / raw)
  To: Sergey Fedorov, Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: qemu-devel, mark.burton, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite, Andreas Färber



On 11/04/2016 22:50, Sergey Fedorov wrote:
>> > +Control multi-threaded TCG. By default QEMU will enable multi-threaded
>> > +emulation for front/back-end combinations that are known to work. The
>> > +user may enable it against the defaults; however, odd guest behaviour
>> > +may occur.
>> > +@end table
>> > +ETEXI
> Maybe we'd better use existing qemu accelerator framework and introduce
> "-machine accel=mttcg" option?

Another possibility is "-accel tcg,thread=single|multi".

Paolo

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [RFC v2 08/11] tcg: add options for enabling MTTCG
  2016-04-12 14:23                 ` Alex Bennée
@ 2016-05-09 10:47                   ` Paolo Bonzini
  0 siblings, 0 replies; 66+ messages in thread
From: Paolo Bonzini @ 2016-05-09 10:47 UTC (permalink / raw)
  To: Alex Bennée, Pavel Dovgalyuk
  Cc: 'Sergey Fedorov', 'KONRAD Frederic',
	'Peter Maydell', 'MTTCG Devel',
	'Alvise Rigo', 'Emilio G. Cota',
	'QEMU Developers', 'Mark Burton',
	'J. Kiszka', 'Richard Henderson',
	'Claudio Fontana', 'Peter Crosthwaite',
	'Andreas Färber'



On 12/04/2016 16:23, Alex Bennée wrote:
> Well one of the reasons I'm keeping single thread behaviour is so things
> like icount/replay support can continue to work. The two options will be
> incompatible to start with.

As long as the single-threaded TCG gets rid of iothread's special
behavior of kicking VCPUs, I don't think it's going to be painful to
keep it around.

Paolo

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [RFC v2 08/11] tcg: add options for enabling MTTCG
  2016-05-09 10:45     ` Paolo Bonzini
@ 2016-05-09 11:50       ` Alex Bennée
  0 siblings, 0 replies; 66+ messages in thread
From: Alex Bennée @ 2016-05-09 11:50 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sergey Fedorov, mttcg, fred.konrad, a.rigo, cota, qemu-devel,
	mark.burton, jan.kiszka, rth, peter.maydell, claudio.fontana,
	Peter Crosthwaite, Andreas Färber


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 11/04/2016 22:50, Sergey Fedorov wrote:
>>> > +Control multi-threaded TCG. By default QEMU will enable multi-threaded
>>> > +emulation for front/back-end combinations that are known to work. The
>>> > +user may enable it against the defaults; however, odd guest behaviour
>>> > +may occur.
>>> > +@end table
>>> > +ETEXI
>> Maybe we'd better use existing qemu accelerator framework and introduce
>> "-machine accel=mttcg" option?
>
> Another possibility is "-accel tcg,thread=single|multi".

That does seem a better choice.

>
> Paolo


--
Alex Bennée

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [RFC v2 05/11] tcg: protect TBContext with tb_lock.
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 05/11] tcg: protect TBContext with tb_lock Alex Bennée
@ 2016-05-11 12:45   ` Sergey Fedorov
  2016-05-11 12:52     ` Paolo Bonzini
  2016-06-01 10:30     ` Alex Bennée
  0 siblings, 2 replies; 66+ messages in thread
From: Sergey Fedorov @ 2016-05-11 12:45 UTC (permalink / raw)
  To: Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: qemu-devel, mark.burton, pbonzini, jan.kiszka, rth,
	peter.maydell, claudio.fontana, Peter Crosthwaite,
	Michael S. Tsirkin, Eduardo Habkost

On 05/04/16 18:32, Alex Bennée wrote:
(snip)
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 74065d9..bd50fef 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -205,18 +205,24 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
>      if (max_cycles > CF_COUNT_MASK)
>          max_cycles = CF_COUNT_MASK;
>  
> +    tb_lock();
>      cpu->tb_invalidated_flag = false;
>      tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
>                       max_cycles | CF_NOCACHE
>                           | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
>      tb->orig_tb = cpu->tb_invalidated_flag ? NULL : orig_tb;
>      cpu->current_tb = tb;
> +    tb_unlock();
> +
>      /* execute the generated code */
>      trace_exec_tb_nocache(tb, tb->pc);
> -    cpu_tb_exec(cpu, tb);
> +    cpu_tb_exec(cpu, tb->tc_ptr);

Very suspicious change. I can't even find which patch changes
cpu_tb_exec() accordingly.

> +
> +    tb_lock();
>      cpu->current_tb = NULL;
>      tb_phys_invalidate(tb, -1);
>      tb_free(tb);
> +    tb_unlock();
>  }
>  #endif
>  
> diff --git a/exec.c b/exec.c
> index 17f390e..c46c123 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2111,6 +2111,9 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
>                      continue;
>                  }
>                  cpu->watchpoint_hit = wp;
> +
> +                /* Unlocked by cpu_loop_exit or cpu_resume_from_signal.  */

In fact, neither cpu_resume_from_signal() nor cpu_loop_exit() unlocks
the lock by itself; it gets unlocked after sigsetjmp() returns via
siglongjmp() back to cpu_exec(). So maybe it would be clearer to say
something like "'tb_lock' gets unlocked after siglongjmp()"?

> +                tb_lock();
>                  tb_check_watchpoint(cpu);
>                  if (wp->flags & BP_STOP_BEFORE_ACCESS) {
>                      cpu->exception_index = EXCP_DEBUG;
(snip)
> diff --git a/translate-all.c b/translate-all.c
> index a7ff5e7..935d24c 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -834,7 +834,9 @@ static void page_flush_tb(void)
>  }
>  
>  /* flush all the translation blocks */
> -/* XXX: tb_flush is currently not thread safe */
> +/* XXX: tb_flush is currently not thread safe.  System emulation calls it only
> + * with tb_lock taken or from safe_work, so no need to take tb_lock here.
> + */

"System emulation"? What about user-mode emulation?

>  void tb_flush(CPUState *cpu)
>  {
>  #if defined(DEBUG_FLUSH)
> @@ -1350,6 +1352,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>      /* we remove all the TBs in the range [start, end[ */
>      /* XXX: see if in some cases it could be faster to invalidate all
>         the code */
> +    tb_lock();

Don't we also need to protect the call to page_find() above? page_find()
calls page_find_alloc(), which is noted to be called with 'tb_lock' held.
However, it might depend on the way we treat 'mmap_lock' in system-mode
emulation. We might also consider taking the lock outside of the
tb_invalidate_phys*() functions because they can be called after
page_find().

>      tb = p->first_tb;
>      while (tb != NULL) {
>          n = (uintptr_t)tb & 3;
> @@ -1417,12 +1420,13 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>      if (current_tb_modified) {
>          /* we generate a block containing just the instruction
>             modifying the memory. It will ensure that it cannot modify
> -           itself */
> +           itself.  cpu_resume_from_signal unlocks tb_lock.  */
>          cpu->current_tb = NULL;
>          tb_gen_code(cpu, current_pc, current_cs_base, current_flags, 1);
>          cpu_resume_from_signal(cpu, NULL);
>      }
>  #endif
> +    tb_unlock();
>  }
>  
>  #ifdef CONFIG_SOFTMMU
(snip)
> @@ -1627,6 +1636,7 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
>      target_ulong pc, cs_base;
>      uint64_t flags;
>  
> +    tb_lock();

We don't have to take 'tb_lock' for either tb_find_pc() or
cpu_restore_state_from_tb() because the lock does not protect against
tb_flush() anyway. I think the lock should be taken just before the
first call to tb_phys_invalidate() in this function.

>      tb = tb_find_pc(retaddr);
>      if (!tb) {
>          cpu_abort(cpu, "cpu_io_recompile: could not find TB for pc=%p",
> @@ -1678,11 +1688,15 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
>      /* FIXME: In theory this could raise an exception.  In practice
>         we have already translated the block once so it's probably ok.  */
>      tb_gen_code(cpu, pc, cs_base, flags, cflags);
> -    /* TODO: If env->pc != tb->pc (i.e. the faulting instruction was not
> -       the first in the TB) then we end up generating a whole new TB and
> -       repeating the fault, which is horribly inefficient.
> -       Better would be to execute just this insn uncached, or generate a
> -       second new TB.  */
> +
> +    /* This unlocks the tb_lock.
> +     *
> +     * TODO: If env->pc != tb->pc (i.e. the faulting instruction was not
> +     * the first in the TB) then we end up generating a whole new TB and
> +     * repeating the fault, which is horribly inefficient.
> +     * Better would be to execute just this insn uncached, or generate a
> +     * second new TB.
> +     */
>      cpu_resume_from_signal(cpu, NULL);
>  }
(snip)

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [RFC v2 05/11] tcg: protect TBContext with tb_lock.
  2016-05-11 12:45   ` Sergey Fedorov
@ 2016-05-11 12:52     ` Paolo Bonzini
  2016-05-11 13:42       ` Sergey Fedorov
  2016-06-01 10:30     ` Alex Bennée
  1 sibling, 1 reply; 66+ messages in thread
From: Paolo Bonzini @ 2016-05-11 12:52 UTC (permalink / raw)
  To: Sergey Fedorov, Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: qemu-devel, mark.burton, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite, Michael S. Tsirkin,
	Eduardo Habkost

Just a couple answers/remarks.

On 11/05/2016 14:45, Sergey Fedorov wrote:
>> diff --git a/exec.c b/exec.c
>> index 17f390e..c46c123 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -2111,6 +2111,9 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
>>                      continue;
>>                  }
>>                  cpu->watchpoint_hit = wp;
>> +
>> +                /* Unlocked by cpu_loop_exit or cpu_resume_from_signal.  */
> 
> In fact, neither cpu_resume_from_signal() nor cpu_loop_exit() unlocks
> the lock by itself; it gets unlocked after sigsetjmp() returns via
> siglongjmp() back to cpu_exec(). So maybe it would be clearer to say
> something like "'tb_lock' gets unlocked after siglongjmp()"?

Yes, or "cpu_exec() unlocks tb_lock after cpu_loop_exit or
cpu_resume_from_signal".  Something like that, anyway.

>>  void tb_flush(CPUState *cpu)
>>  {
>>  #if defined(DEBUG_FLUSH)
>> @@ -1350,6 +1352,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>>      /* we remove all the TBs in the range [start, end[ */
>>      /* XXX: see if in some cases it could be faster to invalidate all
>>         the code */
>> +    tb_lock();
> 
> Don't we also need to protect the call to page_find() above? page_find()
> calls page_find_alloc(), which is noted to be called with 'tb_lock' held.

Only if alloc=1; page_find calls it with alloc=0.

> However, it might depend on the way we treat 'mmap_lock' in system mode
> emulation.

It's just not there; generally speaking it's replaced with tb_lock.

>> @@ -1627,6 +1636,7 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
>>      target_ulong pc, cs_base;
>>      uint64_t flags;
>>  
>> +    tb_lock();
> 
> We don't have to take 'tb_lock' for either tb_find_pc() or
> cpu_restore_state_from_tb() because the lock does not protect against
> tb_flush() anyway. I think the lock should be taken just before the
> first call to tb_phys_invalidate() in this function.

Indeed, this dates back to when cpu_restore_state_from_tb did recompilation.

In general, I don't have a big problem with slightly bigger critical
sections than necessary, if they aren't in a hot path or they avoid
repeated lock-unlock.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [RFC v2 04/11] tcg: comment on which functions have to be called with tb_lock held
  2016-05-06 18:22   ` Sergey Fedorov
@ 2016-05-11 12:58     ` Paolo Bonzini
  2016-05-11 13:36       ` Sergey Fedorov
  0 siblings, 1 reply; 66+ messages in thread
From: Paolo Bonzini @ 2016-05-11 12:58 UTC (permalink / raw)
  To: Sergey Fedorov, Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, qemu-devel, Andreas Färber, rth



On 06/05/2016 20:22, Sergey Fedorov wrote:
> However, there's no sensible description of what is protected by tb_lock
> and mmap_lock. I think we need to have a clear documented description of
> the TCG locking scheme in order to be sure we do right things in MTTCG.

I think there was such a patch somewhere, but: tb_lock basically
protects tcg_ctx, while mmap_lock protects the user-mode emulation page
table (the equivalent for system emulation is the memory map which is
protected by the BQL).  Furthermore, mmap_lock must be taken outside
tb_lock.

Paolo

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [RFC v2 04/11] tcg: comment on which functions have to be called with tb_lock held
  2016-05-11 12:58     ` Paolo Bonzini
@ 2016-05-11 13:36       ` Sergey Fedorov
  2016-05-11 13:46         ` Paolo Bonzini
  0 siblings, 1 reply; 66+ messages in thread
From: Sergey Fedorov @ 2016-05-11 13:36 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, qemu-devel, Andreas Färber, rth

On 11/05/16 15:58, Paolo Bonzini wrote:
>
> On 06/05/2016 20:22, Sergey Fedorov wrote:
>> However, there's no sensible description of what is protected by tb_lock
>> and mmap_lock. I think we need to have a clear documented description of
>> the TCG locking scheme in order to be sure we do right things in MTTCG.
> I think there was such a patch somewhere, but: tb_lock basically
> protects tcg_ctx, while mmap_lock protects the user-mode emulation page
> table (the equivalent for system emulation is the memory map which is
> protected by the BQL).  Furthermore, mmap_lock must be taken outside
> tb_lock.

What's a user-mode emulation page table? 'l1_map'? It is used by system
emulation to keep track of TBs per page and 'code_bitmap'. Shouldn't it
be protected with 'mmap_lock' in system emulation?

Thanks,
Sergey

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [RFC v2 05/11] tcg: protect TBContext with tb_lock.
  2016-05-11 12:52     ` Paolo Bonzini
@ 2016-05-11 13:42       ` Sergey Fedorov
  0 siblings, 0 replies; 66+ messages in thread
From: Sergey Fedorov @ 2016-05-11 13:42 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: qemu-devel, mark.burton, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite, Michael S. Tsirkin,
	Eduardo Habkost

On 11/05/16 15:52, Paolo Bonzini wrote:
> Just a couple answers/remarks.
>
> On 11/05/2016 14:45, Sergey Fedorov wrote:
(snip)
>>> @@ -1350,6 +1352,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>>>      /* we remove all the TBs in the range [start, end[ */
>>>      /* XXX: see if in some cases it could be faster to invalidate all
>>>         the code */
>>> +    tb_lock();
>> Don't we also need to protect the call to page_find() above? page_find()
>> calls page_find_alloc(), which is noted to be called with 'tb_lock' held.
> Only if alloc=1; page_find calls it with alloc=0.

Yeah, right :)

>> However, it might depend on the way we treat 'mmap_lock' in system mode
>> emulation.
> It's just not there; generally speaking it's replaced with tb_lock.

So why do we need yet another lock, 'mmap_lock', for user-mode emulation
but not for system mode?

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [RFC v2 04/11] tcg: comment on which functions have to be called with tb_lock held
  2016-05-11 13:36       ` Sergey Fedorov
@ 2016-05-11 13:46         ` Paolo Bonzini
  2016-05-12 19:32           ` Sergey Fedorov
  0 siblings, 1 reply; 66+ messages in thread
From: Paolo Bonzini @ 2016-05-11 13:46 UTC (permalink / raw)
  To: Sergey Fedorov, Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, qemu-devel, Andreas Färber, rth



On 11/05/2016 15:36, Sergey Fedorov wrote:
> On 11/05/16 15:58, Paolo Bonzini wrote:
>>
>> On 06/05/2016 20:22, Sergey Fedorov wrote:
>>> However, there's no sensible description of what is protected by tb_lock
>>> and mmap_lock. I think we need to have a clear documented description of
>>> the TCG locking scheme in order to be sure we do right things in MTTCG.
>> I think there was such a patch somewhere, but: tb_lock basically
>> protects tcg_ctx, while mmap_lock protects the user-mode emulation page
>> table (the equivalent for system emulation is the memory map which is
>> protected by the BQL).  Furthermore, mmap_lock must be taken outside
>> tb_lock.
> 
> What's a user-mode emulation page table? 'l1_map'?

Yes.  It's used beyond TCG in user-mode emulation.

> It is used by system
> emulation to keep track of TBs per page and 'code_bitmap'. Shouldn't it
> be protected with 'mmap_lock' in system emulation?

tb_lock is used instead because it's taken everywhere system emulation
uses l1_map; so tb_lock is protecting l1_map too in system emulation.

As mentioned above, user-mode emulation uses l1_map in linux-user/mmap.c
via page_{get,set}_flags, which I guess is why the lock is separate.
None of us was involved in the original multi-threaded linux-user work,
we're reverse engineering it just like you. :)

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [RFC v2 04/11] tcg: comment on which functions have to be called with tb_lock held
  2016-05-11 13:46         ` Paolo Bonzini
@ 2016-05-12 19:32           ` Sergey Fedorov
  2016-05-13  9:25             ` Paolo Bonzini
  0 siblings, 1 reply; 66+ messages in thread
From: Sergey Fedorov @ 2016-05-12 19:32 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: peter.maydell, claudio.fontana, Peter Crosthwaite, jan.kiszka,
	mark.burton, qemu-devel, Andreas Färber, rth

On 11/05/16 16:46, Paolo Bonzini wrote:
> On 11/05/2016 15:36, Sergey Fedorov wrote:
>> On 11/05/16 15:58, Paolo Bonzini wrote:
>>> On 06/05/2016 20:22, Sergey Fedorov wrote:
>>>> However, there's no sensible description of what is protected by tb_lock
>>>> and mmap_lock. I think we need to have a clear documented description of
>>>> the TCG locking scheme in order to be sure we do right things in MTTCG.
>>> I think there was such a patch somewhere, but: tb_lock basically
>>> protects tcg_ctx, while mmap_lock protects the user-mode emulation page
>>> table (the equivalent for system emulation is the memory map which is
>>> protected by the BQL).  Furthermore, mmap_lock must be taken outside
>>> tb_lock.
>> What's a user-mode emulation page table? 'l1_map'?
> Yes.  It's used beyond TCG in user-mode emulation.
>
>> It is used by system
>> emulation to keep track of TBs per page and 'code_bitmap'. Shouldn't it
>> be protected with 'mmap_lock' in system emulation?
> tb_lock is used instead because it's taken everywhere system emulation
> uses l1_map; so tb_lock is protecting l1_map too in system emulation.
>
> As mentioned above, user-mode emulation uses l1_map in linux-user/mmap.c
> via page_{get,set}_flags, which I guess is why the lock is separate.
> None of us was involved in the original multi-threaded linux-user work,
> we're reverse engineering it just like you. :)

While investigating 'tb_lock' and 'mmap_lock' usage, I am wondering:
why not put 'l1_map' into 'tcg_ctx'?

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [RFC v2 04/11] tcg: comment on which functions have to be called with tb_lock held
  2016-05-12 19:32           ` Sergey Fedorov
@ 2016-05-13  9:25             ` Paolo Bonzini
  0 siblings, 0 replies; 66+ messages in thread
From: Paolo Bonzini @ 2016-05-13  9:25 UTC (permalink / raw)
  To: Sergey Fedorov, Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: peter.maydell, Peter Crosthwaite, claudio.fontana, mark.burton,
	qemu-devel, jan.kiszka, Andreas Färber, rth



On 12/05/2016 21:32, Sergey Fedorov wrote:
>> >
>> > As mentioned above, user-mode emulation uses l1_map in linux-user/mmap.c
>> > via page_{get,set}_flags, which I guess is why the lock is separate.
>> > None of us was involved in the original multi-threaded linux-user work,
>> > we're reverse engineering it just like you. :)
> While investigating 'tb_lock' and 'mmap_lock' usage, I am wondering:
> why not put 'l1_map' into 'tcg_ctx'?

Yes, that makes sense.

Paolo

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [RFC v2 10/11] tcg: drop global lock during TCG code execution
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 10/11] tcg: drop global lock during TCG code execution Alex Bennée
@ 2016-05-24 21:28   ` Sergey Fedorov
  2016-05-25 10:33     ` Paolo Bonzini
  0 siblings, 1 reply; 66+ messages in thread
From: Sergey Fedorov @ 2016-05-24 21:28 UTC (permalink / raw)
  To: Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: qemu-devel, mark.burton, pbonzini, jan.kiszka, rth,
	peter.maydell, claudio.fontana, Peter Crosthwaite,
	Eduardo Habkost, Michael S. Tsirkin

On 05/04/16 18:32, Alex Bennée wrote:
> diff --git a/cpu-exec.c b/cpu-exec.c
> index bd50fef..f558508 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -28,6 +28,7 @@
>  #include "qemu/rcu.h"
>  #include "exec/tb-hash.h"
>  #include "exec/log.h"
> +#include "qemu/main-loop.h"
>  #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
>  #include "hw/i386/apic.h"
>  #endif
> @@ -452,6 +453,8 @@ int cpu_exec(CPUState *cpu)
>              for(;;) {
>                  interrupt_request = cpu->interrupt_request;
>                  if (unlikely(interrupt_request)) {
> +                    qemu_mutex_lock_iothread();
> +
>                      if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
>                          /* Mask out external interrupts for this step. */
>                          interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
> @@ -504,6 +507,11 @@ int cpu_exec(CPUState *cpu)
>                             the program flow was changed */
>                          next_tb = 0;
>                      }
> +
> +                    if (qemu_mutex_iothread_locked()) {
> +                        qemu_mutex_unlock_iothread();
> +                    }
> +

Why do we need to check for "qemu_mutex_iothread_locked()" here?
iothread lock is always held here, isn't it?

>                  }
>                  if (unlikely(cpu->exit_request
>                               || replay_has_interrupt())) {
(snip)
> diff --git a/cputlb.c b/cputlb.c
> index 466663b..1412049 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -18,6 +18,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/main-loop.h"

No need in this include.

>  #include "cpu.h"
>  #include "exec/exec-all.h"
>  #include "exec/memory.h"
> diff --git a/exec.c b/exec.c
> index c46c123..9e83673 100644
> --- a/exec.c
> +++ b/exec.c
(snip)
> @@ -2347,8 +2353,14 @@ static void io_mem_init(void)
>      memory_region_init_io(&io_mem_rom, NULL, &unassigned_mem_ops, NULL, NULL, UINT64_MAX);
>      memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
>                            NULL, UINT64_MAX);
> +
> +    /* io_mem_notdirty calls tb_invalidate_phys_page_fast,
> +     * which must be called without the iothread mutex.
> +     */

notdirty_mem_write() operates on page dirty flags. Is it safe to do so
outside the iothread lock?

>      memory_region_init_io(&io_mem_notdirty, NULL, &notdirty_mem_ops, NULL,
>                            NULL, UINT64_MAX);
> +    memory_region_clear_global_locking(&io_mem_notdirty);
> +
>      memory_region_init_io(&io_mem_watch, NULL, &watch_mem_ops, NULL,
>                            NULL, UINT64_MAX);
>  }
(snip)
> diff --git a/translate-all.c b/translate-all.c
> index 935d24c..0c377bb 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -1318,6 +1318,7 @@ void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
>   * this TB.
>   *
>   * Called with mmap_lock held for user-mode emulation
> + * If called from generated code, iothread mutex must not be held.

What does that mean? If iothread mutex is not required by the function
then why mention it here at all?

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [RFC v2 10/11] tcg: drop global lock during TCG code execution
  2016-05-24 21:28   ` Sergey Fedorov
@ 2016-05-25 10:33     ` Paolo Bonzini
  2016-05-25 11:07       ` Alex Bennée
  0 siblings, 1 reply; 66+ messages in thread
From: Paolo Bonzini @ 2016-05-25 10:33 UTC (permalink / raw)
  To: Sergey Fedorov, Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: qemu-devel, mark.burton, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite, Eduardo Habkost,
	Michael S. Tsirkin



On 24/05/2016 23:28, Sergey Fedorov wrote:
> On 05/04/16 18:32, Alex Bennée wrote:
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index bd50fef..f558508 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -28,6 +28,7 @@
>>  #include "qemu/rcu.h"
>>  #include "exec/tb-hash.h"
>>  #include "exec/log.h"
>> +#include "qemu/main-loop.h"
>>  #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
>>  #include "hw/i386/apic.h"
>>  #endif
>> @@ -452,6 +453,8 @@ int cpu_exec(CPUState *cpu)
>>              for(;;) {
>>                  interrupt_request = cpu->interrupt_request;
>>                  if (unlikely(interrupt_request)) {
>> +                    qemu_mutex_lock_iothread();
>> +
>>                      if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
>>                          /* Mask out external interrupts for this step. */
>>                          interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
>> @@ -504,6 +507,11 @@ int cpu_exec(CPUState *cpu)
>>                             the program flow was changed */
>>                          next_tb = 0;
>>                      }
>> +
>> +                    if (qemu_mutex_iothread_locked()) {
>> +                        qemu_mutex_unlock_iothread();
>> +                    }
>> +
> 
> Why do we need to check for "qemu_mutex_iothread_locked()" here?
> iothread lock is always held here, isn't it?

Correct.

>>                  }
>>                  if (unlikely(cpu->exit_request
>>                               || replay_has_interrupt())) {
> (snip)
>> diff --git a/cputlb.c b/cputlb.c
>> index 466663b..1412049 100644
>> --- a/cputlb.c
>> +++ b/cputlb.c
>> @@ -18,6 +18,7 @@
>>   */
>>  
>>  #include "qemu/osdep.h"
>> +#include "qemu/main-loop.h"
> 
> No need in this include.
> 
>>  #include "cpu.h"
>>  #include "exec/exec-all.h"
>>  #include "exec/memory.h"
>> diff --git a/exec.c b/exec.c
>> index c46c123..9e83673 100644
>> --- a/exec.c
>> +++ b/exec.c
> (snip)
>> @@ -2347,8 +2353,14 @@ static void io_mem_init(void)
>>      memory_region_init_io(&io_mem_rom, NULL, &unassigned_mem_ops, NULL, NULL, UINT64_MAX);
>>      memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
>>                            NULL, UINT64_MAX);
>> +
>> +    /* io_mem_notdirty calls tb_invalidate_phys_page_fast,
>> +     * which must be called without the iothread mutex.
>> +     */
> 
> notdirty_mem_write() operates on page dirty flags. Is it safe to do so
> out of iothread lock?

Yes, see commit 5f2cb94 ("memory: make
cpu_physical_memory_sync_dirty_bitmap() fully atomic", 2014-12-02) and
those before.

>>      memory_region_init_io(&io_mem_notdirty, NULL, &notdirty_mem_ops, NULL,
>>                            NULL, UINT64_MAX);
>> +    memory_region_clear_global_locking(&io_mem_notdirty);
>> +
>>      memory_region_init_io(&io_mem_watch, NULL, &watch_mem_ops, NULL,
>>                            NULL, UINT64_MAX);
>>  }
> (snip)
>> diff --git a/translate-all.c b/translate-all.c
>> index 935d24c..0c377bb 100644
>> --- a/translate-all.c
>> +++ b/translate-all.c
>> @@ -1318,6 +1318,7 @@ void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
>>   * this TB.
>>   *
>>   * Called with mmap_lock held for user-mode emulation
>> + * If called from generated code, iothread mutex must not be held.
> 
> What does that mean? If iothread mutex is not required by the function
> then why mention it here at all?

If this function can take the iothread mutex itself, this would cause a
deadlock.  I'm not sure if it could though.

Another possibility is that this function can longjmp out into cpu_exec,
and then the iothread mutex would not be released.  I think this is more
likely.

Thanks,

Paolo


* Re: [Qemu-devel] [RFC v2 10/11] tcg: drop global lock during TCG code execution
  2016-05-25 10:33     ` Paolo Bonzini
@ 2016-05-25 11:07       ` Alex Bennée
  2016-05-25 12:46         ` Paolo Bonzini
  0 siblings, 1 reply; 66+ messages in thread
From: Alex Bennée @ 2016-05-25 11:07 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sergey Fedorov, mttcg, fred.konrad, a.rigo, cota, qemu-devel,
	mark.burton, jan.kiszka, rth, peter.maydell, claudio.fontana,
	Peter Crosthwaite, Eduardo Habkost, Michael S. Tsirkin


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 24/05/2016 23:28, Sergey Fedorov wrote:
>> On 05/04/16 18:32, Alex Bennée wrote:
>>> diff --git a/cpu-exec.c b/cpu-exec.c
>>> index bd50fef..f558508 100644
>>> --- a/cpu-exec.c
>>> +++ b/cpu-exec.c
>>> @@ -28,6 +28,7 @@
>>>  #include "qemu/rcu.h"
>>>  #include "exec/tb-hash.h"
>>>  #include "exec/log.h"
>>> +#include "qemu/main-loop.h"
>>>  #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
>>>  #include "hw/i386/apic.h"
>>>  #endif
>>> @@ -452,6 +453,8 @@ int cpu_exec(CPUState *cpu)
>>>              for(;;) {
>>>                  interrupt_request = cpu->interrupt_request;
>>>                  if (unlikely(interrupt_request)) {
>>> +                    qemu_mutex_lock_iothread();
>>> +
>>>                      if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
>>>                          /* Mask out external interrupts for this step. */
>>>                          interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
>>> @@ -504,6 +507,11 @@ int cpu_exec(CPUState *cpu)
>>>                             the program flow was changed */
>>>                          next_tb = 0;
>>>                      }
>>> +
>>> +                    if (qemu_mutex_iothread_locked()) {
>>> +                        qemu_mutex_unlock_iothread();
>>> +                    }
>>> +
>>
>> Why do we need to check for "qemu_mutex_iothread_locked()" here?
>> iothread lock is always held here, isn't it?
>
> Correct.

I'll fix that for the next iteration. The main else leg still needs
the check, though, as it doesn't know whether the mutex was grabbed
before a siglongjmp.


>
>>>                  }
>>>                  if (unlikely(cpu->exit_request
>>>                               || replay_has_interrupt())) {
>> (snip)
>>> diff --git a/cputlb.c b/cputlb.c
>>> index 466663b..1412049 100644
>>> --- a/cputlb.c
>>> +++ b/cputlb.c
>>> @@ -18,6 +18,7 @@
>>>   */
>>>
>>>  #include "qemu/osdep.h"
>>> +#include "qemu/main-loop.h"
>>
>> No need in this include.
>>
>>>  #include "cpu.h"
>>>  #include "exec/exec-all.h"
>>>  #include "exec/memory.h"
>>> diff --git a/exec.c b/exec.c
>>> index c46c123..9e83673 100644
>>> --- a/exec.c
>>> +++ b/exec.c
>> (snip)
>>> @@ -2347,8 +2353,14 @@ static void io_mem_init(void)
>>>      memory_region_init_io(&io_mem_rom, NULL, &unassigned_mem_ops, NULL, NULL, UINT64_MAX);
>>>      memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
>>>                            NULL, UINT64_MAX);
>>> +
>>> +    /* io_mem_notdirty calls tb_invalidate_phys_page_fast,
>>> +     * which must be called without the iothread mutex.
>>> +     */
>>
>> notdirty_mem_write() operates on page dirty flags. Is it safe to do so
>> out of iothread lock?
>
> Yes, see commit 5f2cb94 ("memory: make
> cpu_physical_memory_sync_dirty_bitmap() fully atomic", 2014-12-02) and
> those before.
>
>>>      memory_region_init_io(&io_mem_notdirty, NULL, &notdirty_mem_ops, NULL,
>>>                            NULL, UINT64_MAX);
>>> +    memory_region_clear_global_locking(&io_mem_notdirty);
>>> +
>>>      memory_region_init_io(&io_mem_watch, NULL, &watch_mem_ops, NULL,
>>>                            NULL, UINT64_MAX);
>>>  }
>> (snip)
>>> diff --git a/translate-all.c b/translate-all.c
>>> index 935d24c..0c377bb 100644
>>> --- a/translate-all.c
>>> +++ b/translate-all.c
>>> @@ -1318,6 +1318,7 @@ void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
>>>   * this TB.
>>>   *
>>>   * Called with mmap_lock held for user-mode emulation
>>> + * If called from generated code, iothread mutex must not be held.
>>
>> What does that mean? If iothread mutex is not required by the function
>> then why mention it here at all?
>
> If this function can take the iothread mutex itself, this would cause a
> deadlock.  I'm not sure if it could though.
>
> Another possibility is that this function can longjmp out into cpu_exec,
> and then the iothread mutex would not be released.  I think this is more
> likely.

Any longjmp should:

            tb_lock_reset();
            if (qemu_mutex_iothread_locked()) {
                qemu_mutex_unlock_iothread();
            }

To reset the locks.

>
> Thanks,
>
> Paolo


--
Alex Bennée


* Re: [Qemu-devel] [RFC v2 10/11] tcg: drop global lock during TCG code execution
  2016-05-25 11:07       ` Alex Bennée
@ 2016-05-25 12:46         ` Paolo Bonzini
  0 siblings, 0 replies; 66+ messages in thread
From: Paolo Bonzini @ 2016-05-25 12:46 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, claudio.fontana, Peter Crosthwaite,
	jan.kiszka, Michael S. Tsirkin, mark.burton, qemu-devel,
	Eduardo Habkost, a.rigo, cota, Sergey Fedorov, rth, fred.konrad



On 25/05/2016 13:07, Alex Bennée wrote:
>>>> + * If called from generated code, iothread mutex must not be held.
>>>
>>> What does that mean? If iothread mutex is not required by the function
>>> then why mention it here at all?
>>
>> If this function can take the iothread mutex itself, this would cause a
>> deadlock.  I'm not sure if it could though.
>>
>> Another possibility is that this function can longjmp out into cpu_exec,
>> and then the iothread mutex would not be released.  I think this is more
>> likely.
>
> Any longjmp should:
> 
>             tb_lock_reset();
>             if (qemu_mutex_iothread_locked()) {
>                 qemu_mutex_unlock_iothread();
>             }
> 
> To reset the locks.
> 

Ok, so I think these comments are stale, dating to before you added the
unlock_iothread in the sigsetjmp else.

Thanks,

Paolo


* Re: [Qemu-devel] [RFC v2 03/11] docs: new design document multi-thread-tcg.txt (DRAFTING)
  2016-04-11 20:00   ` Sergey Fedorov
@ 2016-05-25 15:48     ` Sergey Fedorov
  2016-05-25 16:01       ` Alex Bennée
  2016-05-25 18:03       ` Paolo Bonzini
  0 siblings, 2 replies; 66+ messages in thread
From: Sergey Fedorov @ 2016-05-25 15:48 UTC (permalink / raw)
  To: Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: qemu-devel, mark.burton, pbonzini, jan.kiszka, rth,
	peter.maydell, claudio.fontana

On 11/04/16 23:00, Sergey Fedorov wrote:
> On 05/04/16 18:32, Alex Bennée wrote:
>
> (snip)
>> +
>> +Memory maps and TLBs
>> +--------------------
>> +
>> +The memory handling code is fairly critical to the speed of memory
>> +access in the emulated system.
>> +
> It would be nice to put some intro sentence for the following bullets :)
>
>> +  - Memory regions (dividing up access to PIO, MMIO and RAM)
>> +  - Dirty page tracking (for code gen, migration and display)
>> +  - Virtual TLB (for translating guest address->real address)

There's also a global page table - called 'l1_map' - which is used for:
 * keeping a list of TBs generated from a given physical guest page for
   further code invalidation on page writes
 * holding a bitmap to track which regions of a given physical guest page
   actually contain code for optimized code invalidation on page writes
   (used only in system mode emulation)
 * holding page flags, e.g. protection bits (used only in user mode
   emulation)

The page table seems to be protected by 'mmap_lock' in user mode
emulation but by 'tb_lock' in system mode emulation. It may turn out
to be possible to read it safely even with no lock held.

Kind regards,
Sergey

>> +
>> +There is a both a fast path walked by the generated code and a slow
>> +path when resolution is required. When the TLB tables are updated we
>> +need to ensure they are done in a safe way by bringing all executing
>> +threads to a halt before making the modifications.
> Again, I think we could benefit if we could possibly manage to avoid
> bringing vCPU threads to halt.
>
> Nothing about memory regions and dirty page tracking?
>
>> +
>> +DESIGN REQUIREMENTS:
>> +
>> +  - TLB Flush All/Page
>> +    - can be across-CPUs
>> +    - will need all other CPUs brought to a halt
> s/will/may/ ?
>
>> +  - TLB Update (update a CPUTLBEntry, via tlb_set_page_with_attrs)
>> +    - This is a per-CPU table - by definition can't race
>> +    - updated by it's own thread when the slow-path is forced
> (snip)

Kind regards,
Sergey


* Re: [Qemu-devel] [RFC v2 03/11] docs: new design document multi-thread-tcg.txt (DRAFTING)
  2016-05-25 15:48     ` Sergey Fedorov
@ 2016-05-25 16:01       ` Alex Bennée
  2016-05-25 18:03       ` Paolo Bonzini
  1 sibling, 0 replies; 66+ messages in thread
From: Alex Bennée @ 2016-05-25 16:01 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: mttcg, fred.konrad, a.rigo, cota, qemu-devel, mark.burton,
	pbonzini, jan.kiszka, rth, peter.maydell, claudio.fontana


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 11/04/16 23:00, Sergey Fedorov wrote:
>> On 05/04/16 18:32, Alex Bennée wrote:
>>
>> (snip)
>>> +
>>> +Memory maps and TLBs
>>> +--------------------
>>> +
>>> +The memory handling code is fairly critical to the speed of memory
>>> +access in the emulated system.
>>> +
>> It would be nice to put some intro sentence for the following bullets :)
>>
>>> +  - Memory regions (dividing up access to PIO, MMIO and RAM)
>>> +  - Dirty page tracking (for code gen, migration and display)
>>> +  - Virtual TLB (for translating guest address->real address)
>
> There's also a global page table - called 'l1_map' - which is used for:
>  * keeping a list of TBs generated from a given physical guest page for
>    further code invalidation on page writes
>  * holding a bitmap to track which regions of a given physical guest page
>    actually contain code for optimized code invalidation on page writes
>    (used only in system mode emulation)
>  * holding page flags, e.g. protection bits (used only in user mode
>    emulation)
>
> The page table seems to be protected by 'mmap_lock' in user mode
> emulation but by 'tb_lock' in system mode emulation. It may turn out
> to be possible to read it safely even with no lock held.

I've started adding words to that effect to the document.

>
> Kind regards,
> Sergey
>
>>> +
>>> +There is a both a fast path walked by the generated code and a slow
>>> +path when resolution is required. When the TLB tables are updated we
>>> +need to ensure they are done in a safe way by bringing all executing
>>> +threads to a halt before making the modifications.
>> Again, I think we could benefit if we could possibly manage to avoid
>> bringing vCPU threads to halt.
>>
>> Nothing about memory regions and dirty page tracking?
>>
>>> +
>>> +DESIGN REQUIREMENTS:
>>> +
>>> +  - TLB Flush All/Page
>>> +    - can be across-CPUs
>>> +    - will need all other CPUs brought to a halt
>> s/will/may/ ?
>>
>>> +  - TLB Update (update a CPUTLBEntry, via tlb_set_page_with_attrs)
>>> +    - This is a per-CPU table - by definition can't race
>>> +    - updated by it's own thread when the slow-path is forced
>> (snip)
>
> Kind regards,
> Sergey


--
Alex Bennée


* Re: [Qemu-devel] [RFC v2 03/11] docs: new design document multi-thread-tcg.txt (DRAFTING)
  2016-05-25 15:48     ` Sergey Fedorov
  2016-05-25 16:01       ` Alex Bennée
@ 2016-05-25 18:03       ` Paolo Bonzini
  2016-05-25 18:13         ` Sergey Fedorov
  1 sibling, 1 reply; 66+ messages in thread
From: Paolo Bonzini @ 2016-05-25 18:03 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: Alex Bennée, mttcg, fred konrad, a rigo, cota, qemu-devel,
	mark burton, jan kiszka, rth, peter maydell, claudio fontana

> The page table seems to be protected by 'mmap_lock' in user mode
> emulation but by 'tb_lock' in system mode emulation. It may turn out
> to be possible to read it safely even with no lock held.

Yes, it is possible to at least follow the radix tree safely with no
lock held.  The fields in the leaves can be either lockless or protected
by a lock.

The radix tree can be followed without a lock just like you do with RCU.
The difference with RCU is that:

1) the leaves are protected with a lock, so you don't do the "copy";
instead after reading you lock around updates

2) the radix tree is only ever added to, so you don't need to protect
the reads with rcu_read_lock/rcu_read_unlock.  rcu_read_lock and
rcu_read_unlock are only needed to inform the deleters that something
cannot yet go away.  Without deleters, you don't need rcu_read_lock
and rcu_read_unlock (but you still need atomic_rcu_read/atomic_rcu_set).

Paolo


* Re: [Qemu-devel] [RFC v2 03/11] docs: new design document multi-thread-tcg.txt (DRAFTING)
  2016-05-25 18:03       ` Paolo Bonzini
@ 2016-05-25 18:13         ` Sergey Fedorov
  0 siblings, 0 replies; 66+ messages in thread
From: Sergey Fedorov @ 2016-05-25 18:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Alex Bennée, mttcg, fred konrad, a rigo, cota, qemu-devel,
	mark burton, jan kiszka, rth, peter maydell, claudio fontana

On 25/05/16 21:03, Paolo Bonzini wrote:
>> The page table seems to be protected by 'mmap_lock' in user mode
>> emulation but by 'tb_lock' in system mode emulation. It may turn out
>> to be possible to read it safely even with no lock held.
> Yes, it is possible to at least follow the radix tree safely with no
> lock held.  The fields in the leaves can be either lockless or protected
> by a lock.
>
> The radix tree can be followed without a lock just like you do with RCU.
> The difference with RCU is that:
>
> 1) the leaves are protected with a lock, so you don't do the "copy";
> instead after reading you lock around updates
>
> 2) the radix tree is only ever added to, so you don't need to protect
> the reads with rcu_read_lock/rcu_read_unlock.  rcu_read_lock and
> rcu_read_unlock are only needed to inform the deleters that something
> cannot yet go away.  Without deleters, you don't need rcu_read_lock
> and rcu_read_unlock (but you still need atomic_rcu_read/atomic_rcu_set).
>
>

Yes, however, looking closer at how the leaves are used, I can't see
much point in doing this so far...

Thanks,
Sergey


* Re: [Qemu-devel] [RFC v2 07/11] tcg: cpus rm tcg_exec_all()
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 07/11] tcg: cpus rm tcg_exec_all() Alex Bennée
@ 2016-05-26 11:03   ` Sergey Fedorov
  2016-05-26 13:10     ` Alex Bennée
  0 siblings, 1 reply; 66+ messages in thread
From: Sergey Fedorov @ 2016-05-26 11:03 UTC (permalink / raw)
  To: Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: qemu-devel, mark.burton, pbonzini, jan.kiszka, rth,
	peter.maydell, claudio.fontana, Peter Crosthwaite

On 05/04/16 18:32, Alex Bennée wrote:
> diff --git a/cpus.c b/cpus.c
> index e118fdf..46732a5 100644
> --- a/cpus.c
> +++ b/cpus.c
(snip)
> @@ -1109,7 +1108,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
>  #endif
>  }
>  
> -static void tcg_exec_all(void);
> +static int tcg_cpu_exec(CPUState *cpu);

Why not just move tcg_cpu_exec() here and avoid the forward
declaration? Such forward declarations of static functions are a bit
annoying :)

>  
>  static void *qemu_tcg_cpu_thread_fn(void *arg)
>  {
> @@ -1140,8 +1139,35 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>      /* process any pending work */
>      atomic_mb_set(&exit_request, 1);
>  
> +    cpu = first_cpu;
> +
>      while (1) {
> -        tcg_exec_all();
> +        /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
> +        qemu_account_warp_timer();
> +
> +        if (!cpu) {
> +            cpu = first_cpu;
> +        }
> +
> +        for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {

Maybe a "while" loop would be a bit neater here, like:

    while (cpu != NULL && !exit_request) {
        /* ... */
        cpu = CPU_NEXT(cpu);
    }


> +
> +            qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
> +                              (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
> +
> +            if (cpu_can_run(cpu)) {
> +                int r = tcg_cpu_exec(cpu);
> +                if (r == EXCP_DEBUG) {
> +                    cpu_handle_guest_debug(cpu);
> +                    break;
> +                }
> +            } else if (cpu->stop || cpu->stopped) {
> +                break;
> +            }
> +
> +        } /* for cpu.. */
> +
> +        /* Pairs with smp_wmb in qemu_cpu_kick.  */

While at it, we could also fix this comment like this:

    /* Pairs with atomic_mb_read() in cpu_exec(). */


> +        atomic_mb_set(&exit_request, 0);
>  
>          if (use_icount) {
>              int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);

Kind regards,
Sergey


* Re: [Qemu-devel] [RFC v2 07/11] tcg: cpus rm tcg_exec_all()
  2016-05-26 11:03   ` Sergey Fedorov
@ 2016-05-26 13:10     ` Alex Bennée
  0 siblings, 0 replies; 66+ messages in thread
From: Alex Bennée @ 2016-05-26 13:10 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: mttcg, fred.konrad, a.rigo, cota, qemu-devel, mark.burton,
	pbonzini, jan.kiszka, rth, peter.maydell, claudio.fontana,
	Peter Crosthwaite


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 05/04/16 18:32, Alex Bennée wrote:
>> diff --git a/cpus.c b/cpus.c
>> index e118fdf..46732a5 100644
>> --- a/cpus.c
>> +++ b/cpus.c
> (snip)
>> @@ -1109,7 +1108,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
>>  #endif
>>  }
>>
>> -static void tcg_exec_all(void);
>> +static int tcg_cpu_exec(CPUState *cpu);
>
> Why not just move tcg_cpu_exec() here and avoid the forward
> declaration? Such forward declarations of static functions are a bit
> annoying :)

Sounds like a plan.

>
>>
>>  static void *qemu_tcg_cpu_thread_fn(void *arg)
>>  {
>> @@ -1140,8 +1139,35 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>>      /* process any pending work */
>>      atomic_mb_set(&exit_request, 1);
>>
>> +    cpu = first_cpu;
>> +
>>      while (1) {
>> -        tcg_exec_all();
>> +        /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
>> +        qemu_account_warp_timer();
>> +
>> +        if (!cpu) {
>> +            cpu = first_cpu;
>> +        }
>> +
>> +        for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
>
> Maybe a "while" loop would be a bit neater here, like:
>
>     while (cpu != NULL && !exit_request) {
>         /* ... */
>         cpu = CPU_NEXT(cpu);
>     }

Yeah, I prefer a while loop to non-standard for loops.

>
>
>> +
>> +            qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
>> +                              (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
>> +
>> +            if (cpu_can_run(cpu)) {
>> +                int r = tcg_cpu_exec(cpu);
>> +                if (r == EXCP_DEBUG) {
>> +                    cpu_handle_guest_debug(cpu);
>> +                    break;
>> +                }
>> +            } else if (cpu->stop || cpu->stopped) {
>> +                break;
>> +            }
>> +
>> +        } /* for cpu.. */
>> +
>> +        /* Pairs with smp_wmb in qemu_cpu_kick.  */
>
> While at it, we could also fix this comment like this:
>
>     /* Pairs with atomic_mb_read() in cpu_exec(). */

Will do.

>
>> +        atomic_mb_set(&exit_request, 0);
>>
>>          if (use_icount) {
>>              int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
>
> Kind regards,
> Sergey

Thanks,


--
Alex Bennée


* Re: [Qemu-devel] [RFC v2 11/11] tcg: enable thread-per-vCPU
  2016-04-05 15:32 ` [Qemu-devel] [RFC v2 11/11] tcg: enable thread-per-vCPU Alex Bennée
@ 2016-05-27 13:57   ` Sergey Fedorov
  2016-05-27 14:55     ` Paolo Bonzini
  0 siblings, 1 reply; 66+ messages in thread
From: Sergey Fedorov @ 2016-05-27 13:57 UTC (permalink / raw)
  To: Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: qemu-devel, mark.burton, pbonzini, jan.kiszka, rth,
	peter.maydell, claudio.fontana, Peter Crosthwaite

On 05/04/16 18:32, Alex Bennée wrote:
> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> This allows the user to switch on multi-thread behaviour and spawn a
> thread per-vCPU. For a simple test like:
>
>   ./arm/run ./arm/locking-test.flat -smp 4 -tcg mttcg=on
>
> Will now use 4 vCPU threads and have an expected FAIL (instead of the
> unexpected PASS) as the default mode of the test has no protection when
> incrementing a shared variable.
>
> However we still default to a single thread for all vCPUs as individual
> front-end and back-ends need additional fixes to safely support:
>   - atomic behaviour
>   - tb invalidation
>   - memory ordering
>
> The function default_mttcg_enabled can be tweaked as support is added.
>
> As assumptions about tcg_current_cpu are no longer relevant to the
> single-threaded kick routine we need to save the current information
> somewhere else for the timer.

This patch was hard to read the first time through. It pursues
multiple goals at the same time and needs to be split into at least
the following parts:
 1. Make 'cpu->thread_kicked' access atomic
 2. Remove global 'exit_request' and use per-CPU 'exit_request'
 3. Change how 'current_cpu' is set
 4. Reorganize round-robin CPU TCG thread function
 5. Enable 'mmap_lock' for system mode emulation (do we really want this?)
 6. Enable 'tb_lock' for system mode emulation
 7. Introduce per-CPU TCG thread function

It would be easier and faster to review these changes as separate
patches. I have attempted to mark each individual change with the
numbers from the list above.
 
>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> [AJB: Some fixes, conditionally, commit rewording]
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
> ---
> v1 (ajb):
>   - fix merge conflicts
>   - maintain single-thread approach
> v2
>   - re-base fixes (no longer has tb_find_fast lock tweak ahead)
>   - remove bogus break condition on cpu->stop/stopped
>   - only process exiting cpus exit_request
>   - handle all cpus idle case (fixes shutdown issues)
>   - sleep on EXCP_HALTED in mttcg mode (prevent crash on start-up)
>   - move icount timer into helper
> ---
>  cpu-exec-common.c       |   1 -
>  cpu-exec.c              |  15 ----
>  cpus.c                  | 216 +++++++++++++++++++++++++++++++++---------------
>  include/exec/exec-all.h |   4 -
>  translate-all.c         |   8 --
>  5 files changed, 150 insertions(+), 94 deletions(-)
>
> diff --git a/cpu-exec-common.c b/cpu-exec-common.c
> index 1b1731c..3d7eaa3 100644
> --- a/cpu-exec-common.c
> +++ b/cpu-exec-common.c
> @@ -23,7 +23,6 @@
>  #include "exec/memory-internal.h"
>  
>  bool exit_request;
> -CPUState *tcg_current_cpu;

This relates to item 2. There are no more references to this global
'exit_request', so let's just remove it.

>  
>  /* exit the current TB from a signal handler. The host registers are
>     restored in a state compatible with the CPU emulator
> diff --git a/cpu-exec.c b/cpu-exec.c
> index f558508..42cec05 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -292,7 +292,6 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
>          goto found;
>      }
>  
> -#ifdef CONFIG_USER_ONLY

5, 6.

>      /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
>       * taken outside tb_lock.  Since we're momentarily dropping
>       * tb_lock, there's a chance that our desired tb has been
> @@ -306,15 +305,12 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
>          mmap_unlock();
>          goto found;
>      }
> -#endif

5, 6.

>  
>      /* if no translated code available, then translate it now */
>      cpu->tb_invalidated_flag = false;
>      tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
>  
> -#ifdef CONFIG_USER_ONLY
>      mmap_unlock();
> -#endif

5.

>  
>  found:
>      /* we add the TB in the virtual pc hash table */
> @@ -388,13 +384,8 @@ int cpu_exec(CPUState *cpu)
>          cpu->halted = 0;
>      }
>  
> -    atomic_mb_set(&tcg_current_cpu, cpu);
>      rcu_read_lock();
>  
> -    if (unlikely(atomic_mb_read(&exit_request))) {
> -        cpu->exit_request = 1;
> -    }
> -

2.

>      cc->cpu_exec_enter(cpu);
>  
>      /* Calculate difference between guest clock and host clock.
> @@ -515,7 +506,6 @@ int cpu_exec(CPUState *cpu)
>                  }
>                  if (unlikely(cpu->exit_request
>                               || replay_has_interrupt())) {
> -                    cpu->exit_request = 0;

2.

>                      cpu->exception_index = EXCP_INTERRUPT;
>                      cpu_loop_exit(cpu);
>                  }
> @@ -629,10 +619,5 @@ int cpu_exec(CPUState *cpu)
>      cc->cpu_exec_exit(cpu);
>      rcu_read_unlock();
>  
> -    /* fail safe : never use current_cpu outside cpu_exec() */
> -    current_cpu = NULL;

We still want to reset 'current_cpu' leaving cpu_exec(), don't we?

> -
> -    /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
> -    atomic_set(&tcg_current_cpu, NULL);

2.

>      return ret;
>  }
> diff --git a/cpus.c b/cpus.c
> index 02fae13..f7c7359 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -966,10 +966,7 @@ void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
>  
>      qemu_cpu_kick(cpu);
>      while (!atomic_mb_read(&wi.done)) {
> -        CPUState *self_cpu = current_cpu;
> -
>          qemu_cond_wait(&qemu_work_cond, &qemu_global_mutex);
> -        current_cpu = self_cpu;

Don't we need to keep it for the sake of the single-threaded CPU loop?

>      }
>  }
>  
> @@ -1031,13 +1028,13 @@ static void flush_queued_work(CPUState *cpu)
>  
>  static void qemu_wait_io_event_common(CPUState *cpu)
>  {
> +    atomic_mb_set(&cpu->thread_kicked, false);
>      if (cpu->stop) {
>          cpu->stop = false;
>          cpu->stopped = true;
>          qemu_cond_broadcast(&qemu_pause_cond);
>      }
>      flush_queued_work(cpu);
> -    cpu->thread_kicked = false;

1.

>  }
>  
>  static void qemu_tcg_wait_io_event(CPUState *cpu)
> @@ -1046,9 +1043,7 @@ static void qemu_tcg_wait_io_event(CPUState *cpu)
>          qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
>      }
>  
> -    CPU_FOREACH(cpu) {
> -        qemu_wait_io_event_common(cpu);
> -    }
> +    qemu_wait_io_event_common(cpu);

4?

>  }
>  
>  static void qemu_kvm_wait_io_event(CPUState *cpu)
> @@ -1115,6 +1110,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
>      qemu_thread_get_self(cpu->thread);
>      cpu->thread_id = qemu_get_thread_id();
>      cpu->can_do_io = 1;
> +    current_cpu = cpu;

3.

>  
>      sigemptyset(&waitset);
>      sigaddset(&waitset, SIG_IPI);
> @@ -1123,9 +1119,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
>      cpu->created = true;
>      qemu_cond_signal(&qemu_cpu_cond);
>  
> -    current_cpu = cpu;
>      while (1) {
> -        current_cpu = NULL;

3.

>          qemu_mutex_unlock_iothread();
>          do {
>              int sig;
> @@ -1136,7 +1130,6 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
>              exit(1);
>          }
>          qemu_mutex_lock_iothread();
> -        current_cpu = cpu;

3.

>          qemu_wait_io_event_common(cpu);
>      }
>  
> @@ -1153,32 +1146,52 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
>   * elsewhere.
>   */
>  static int tcg_cpu_exec(CPUState *cpu);
> -static void qemu_cpu_kick_no_halt(void);
> +
> +struct kick_info {
> +    QEMUTimer *timer;
> +    CPUState  *cpu;
> +};
>  
>  static void kick_tcg_thread(void *opaque)
>  {
> -    QEMUTimer *self = *(QEMUTimer **) opaque;
> -    timer_mod(self,
> +    struct kick_info *info = (struct kick_info *) opaque;
> +    CPUState *cpu = atomic_mb_read(&info->cpu);
> +
> +    timer_mod(info->timer,
>                qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
>                NANOSECONDS_PER_SECOND / 10);
> -    qemu_cpu_kick_no_halt();
> +
> +    if (cpu) {
> +        cpu_exit(cpu);
> +    }

2. Maybe a separate change.

>  }
>  
> -static void *qemu_tcg_cpu_thread_fn(void *arg)
> +static void handle_icount_deadline(void)
> +{
> +    if (use_icount) {
> +        int64_t deadline =
> +            qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
> +
> +        if (deadline == 0) {
> +            qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
> +        }
> +    }
> +}

4.

> +
> +static void *qemu_tcg_single_cpu_thread_fn(void *arg)

7.

>  {
> +    struct kick_info info;
>      CPUState *cpu = arg;
> -    QEMUTimer *kick_timer;

2. Maybe a separate change.

>  
>      rcu_register_thread();
>  
>      qemu_mutex_lock_iothread();
>      qemu_thread_get_self(cpu->thread);
>  
> -    CPU_FOREACH(cpu) {
> -        cpu->thread_id = qemu_get_thread_id();
> -        cpu->created = true;
> -        cpu->can_do_io = 1;
> -    }
> +    cpu->thread_id = qemu_get_thread_id();
> +    cpu->created = true;
> +    cpu->can_do_io = 1;
> +    current_cpu = cpu;

4?

>      qemu_cond_signal(&qemu_cpu_cond);
>  
>      /* wait for initial kick-off after machine start */
> @@ -1193,18 +1206,24 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>  
>      /* Set to kick if we have to do more than one vCPU */
>      if (CPU_NEXT(first_cpu)) {
> -        kick_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,  kick_tcg_thread, &kick_timer);
> -        timer_mod(kick_timer,
> +        info.timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,  kick_tcg_thread, &info);
> +        info.cpu = NULL;
> +        smp_wmb();
> +        timer_mod(info.timer,
>                    qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
>                    NANOSECONDS_PER_SECOND / 10);

2. Maybe a separate change.

>      }
>  
>      /* process any pending work */
> -    atomic_mb_set(&exit_request, 1);
> +    CPU_FOREACH(cpu) {
> +        atomic_mb_set(&cpu->exit_request, 1);
> +    }

2.

>  
>      cpu = first_cpu;
>  
>      while (1) {
> +        bool nothing_ran = true;
> +

4.

>          /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
>          qemu_account_warp_timer();
>  
> @@ -1212,34 +1231,107 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>              cpu = first_cpu;
>          }
>  
> -        for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
> +        for (; cpu != NULL && !cpu->exit_request; cpu = CPU_NEXT(cpu)) {
> +            atomic_mb_set(&info.cpu, cpu);

2.

>  
>              qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
>                                (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
>  
>              if (cpu_can_run(cpu)) {
>                  int r = tcg_cpu_exec(cpu);
> +                nothing_ran = false;

4.

>                  if (r == EXCP_DEBUG) {
>                      cpu_handle_guest_debug(cpu);
>                      break;
>                  }
> -            } else if (cpu->stop || cpu->stopped) {
> -                break;

4.

>              }
>  
>          } /* for cpu.. */
>  
> -        /* Pairs with smp_wmb in qemu_cpu_kick.  */
> -        atomic_mb_set(&exit_request, 0);

2.

> +        atomic_mb_set(&info.cpu, NULL);

2. Maybe a separate change.

> +
> +        handle_icount_deadline();

4.

>  
> -        if (use_icount) {
> -            int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
> +        /* We exit in one of three conditions:
> +         *  - cpu is set, because exit_request is true
> +         *  - cpu is not set, because we have looped around
> +         *  - cpu is not set and nothing ran
> +         */
>  
> -            if (deadline == 0) {
> -                qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
> +        if (cpu) {
> +            g_assert(cpu->exit_request);
> +            /* Pairs with smp_wmb in qemu_cpu_kick.  */
> +            atomic_mb_set(&cpu->exit_request, 0);
> +            qemu_tcg_wait_io_event(cpu);
> +        } else if (nothing_ran) {
> +            while (all_cpu_threads_idle()) {
> +                qemu_cond_wait(first_cpu->halt_cond, &qemu_global_mutex);

2, 4.

>              }
>          }
> -        qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
> +    }
> +
> +    return NULL;
> +}
> +
> +/* Multi-threaded TCG
> + *
> + * In the multi-threaded case each vCPU has its own thread. The TLS
> + * variable current_cpu can be used deep in the code to find the
> + * current CPUState for a given thread.
> + */
> +
> +static void *qemu_tcg_cpu_thread_fn(void *arg)
> +{
> +    CPUState *cpu = arg;
> +
> +    rcu_register_thread();
> +
> +    qemu_mutex_lock_iothread();
> +    qemu_thread_get_self(cpu->thread);
> +
> +    cpu->thread_id = qemu_get_thread_id();
> +    cpu->created = true;
> +    cpu->can_do_io = 1;
> +    current_cpu = cpu;
> +    qemu_cond_signal(&qemu_cpu_cond);
> +
> +    /* process any pending work */
> +    atomic_mb_set(&cpu->exit_request, 1);
> +
> +    while (1) {
> +        bool sleep = false;
> +
> +        if (cpu_can_run(cpu)) {
> +            int r = tcg_cpu_exec(cpu);
> +            switch (r)
> +            {
> +            case EXCP_DEBUG:
> +                cpu_handle_guest_debug(cpu);
> +                break;
> +            case EXCP_HALTED:
> +                /* during start-up the vCPU is reset and the thread is
> +                 * kicked several times. If we don't ensure we go back
> +                 * to sleep in the halted state we won't cleanly
> +                 * start-up when the vCPU is enabled.
> +                 */
> +                sleep = true;
> +                break;
> +            default:
> +                /* Ignore everything else? */
> +                break;
> +            }
> +        } else {
> +            sleep = true;
> +        }
> +
> +        handle_icount_deadline();
> +
> +        if (sleep) {
> +            qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
> +        }
> +
> +        atomic_mb_set(&cpu->exit_request, 0);
> +        qemu_tcg_wait_io_event(cpu);

7.

>      }
>  
>      return NULL;
> @@ -1264,24 +1356,11 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
>  #endif
>  }
>  
> -static void qemu_cpu_kick_no_halt(void)
> -{
> -    CPUState *cpu;
> -    /* Ensure whatever caused the exit has reached the CPU threads before
> -     * writing exit_request.
> -     */
> -    atomic_mb_set(&exit_request, 1);
> -    cpu = atomic_mb_read(&tcg_current_cpu);
> -    if (cpu) {
> -        cpu_exit(cpu);
> -    }
> -}

2.

> -
>  void qemu_cpu_kick(CPUState *cpu)
>  {
>      qemu_cond_broadcast(cpu->halt_cond);
>      if (tcg_enabled()) {
> -        qemu_cpu_kick_no_halt();
> +        cpu_exit(cpu);

2.

>      } else {
>          qemu_cpu_kick_thread(cpu);
>      }
> @@ -1347,13 +1426,6 @@ void pause_all_vcpus(void)
>  
>      if (qemu_in_vcpu_thread()) {
>          cpu_stop_current();
> -        if (!kvm_enabled()) {
> -            CPU_FOREACH(cpu) {
> -                cpu->stop = false;
> -                cpu->stopped = true;
> -            }
> -            return;
> -        }

4.

>      }
>  
>      while (!all_vcpus_paused()) {
> @@ -1387,29 +1459,41 @@ void resume_all_vcpus(void)
>  static void qemu_tcg_init_vcpu(CPUState *cpu)
>  {
>      char thread_name[VCPU_THREAD_NAME_SIZE];
> -    static QemuCond *tcg_halt_cond;
> -    static QemuThread *tcg_cpu_thread;
> +    static QemuCond *single_tcg_halt_cond;
> +    static QemuThread *single_tcg_cpu_thread;
>  
> -    /* share a single thread for all cpus with TCG */
> -    if (!tcg_cpu_thread) {
> +    if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) {
>          cpu->thread = g_malloc0(sizeof(QemuThread));
>          cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>          qemu_cond_init(cpu->halt_cond);
> -        tcg_halt_cond = cpu->halt_cond;
> -        snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
> +
> +        if (qemu_tcg_mttcg_enabled()) {
> +            /* create a thread per vCPU with TCG (MTTCG) */
> +            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
>                   cpu->cpu_index);
> -        qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
> -                           cpu, QEMU_THREAD_JOINABLE);
> +
> +            qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
> +                               cpu, QEMU_THREAD_JOINABLE);
> +
> +        } else {
> +            /* share a single thread for all cpus with TCG */
> +            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "ALL CPUs/TCG");
> +            qemu_thread_create(cpu->thread, thread_name, qemu_tcg_single_cpu_thread_fn,
> +                               cpu, QEMU_THREAD_JOINABLE);
> +
> +            single_tcg_halt_cond = cpu->halt_cond;
> +            single_tcg_cpu_thread = cpu->thread;
> +        }
>  #ifdef _WIN32
>          cpu->hThread = qemu_thread_get_handle(cpu->thread);
>  #endif
>          while (!cpu->created) {
>              qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
>          }
> -        tcg_cpu_thread = cpu->thread;
>      } else {
> -        cpu->thread = tcg_cpu_thread;
> -        cpu->halt_cond = tcg_halt_cond;
> +        /* For non-MTTCG cases we share the thread */
> +        cpu->thread = single_tcg_cpu_thread;
> +        cpu->halt_cond = single_tcg_halt_cond;
>      }
>  }

7.

>  
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 60716ae..cde4b7a 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -479,8 +479,4 @@ bool memory_region_is_unassigned(MemoryRegion *mr);
>  /* vl.c */
>  extern int singlestep;
>  
> -/* cpu-exec.c, accessed with atomic_mb_read/atomic_mb_set */
> -extern CPUState *tcg_current_cpu;
> -extern bool exit_request;
> -

2.

>  #endif
> diff --git a/translate-all.c b/translate-all.c
> index 0c377bb..8e70583 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -122,36 +122,28 @@ static void *l1_map[V_L1_SIZE];
>  TCGContext tcg_ctx;
>  
>  /* translation block context */
> -#ifdef CONFIG_USER_ONLY
>  __thread int have_tb_lock;
> -#endif
>  
>  void tb_lock(void)
>  {
> -#ifdef CONFIG_USER_ONLY
>      assert(!have_tb_lock);
>      qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
>      have_tb_lock++;
> -#endif
>  }
>  
>  void tb_unlock(void)
>  {
> -#ifdef CONFIG_USER_ONLY
>      assert(have_tb_lock);
>      have_tb_lock--;
>      qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
> -#endif
>  }
>  
>  void tb_lock_reset(void)
>  {
> -#ifdef CONFIG_USER_ONLY
>      if (have_tb_lock) {
>          qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
>          have_tb_lock = 0;
>      }
> -#endif
>  }
>  

6.

>  static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Qemu-devel] [RFC v2 11/11] tcg: enable thread-per-vCPU
  2016-05-27 13:57   ` Sergey Fedorov
@ 2016-05-27 14:55     ` Paolo Bonzini
  2016-05-27 15:07       ` Sergey Fedorov
  0 siblings, 1 reply; 66+ messages in thread
From: Paolo Bonzini @ 2016-05-27 14:55 UTC (permalink / raw)
  To: Sergey Fedorov, Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: qemu-devel, mark.burton, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite



On 27/05/2016 15:57, Sergey Fedorov wrote:
>  1. Make 'cpu->thread_kicked' access atomic
>  2. Remove global 'exit_request' and use per-CPU 'exit_request'
>  3. Change how 'current_cpu' is set
>  4. Reorganize round-robin CPU TCG thread function
>  5. Enable 'mmap_lock' for system mode emulation (do we really want this?)

No, I don't think so.

>  6. Enable 'tb_lock' for system mode emulation
>  7. Introduce per-CPU TCG thread function

At least 2/3/7 must be done at the same time, but I agree that this
patch could use some splitting. :)

Thanks,

Paolo


* Re: [Qemu-devel] [RFC v2 11/11] tcg: enable thread-per-vCPU
  2016-05-27 14:55     ` Paolo Bonzini
@ 2016-05-27 15:07       ` Sergey Fedorov
  2016-05-27 15:25         ` Paolo Bonzini
  0 siblings, 1 reply; 66+ messages in thread
From: Sergey Fedorov @ 2016-05-27 15:07 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: qemu-devel, mark.burton, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite

On 27/05/16 17:55, Paolo Bonzini wrote:
>
> On 27/05/2016 15:57, Sergey Fedorov wrote:
>>  1. Make 'cpu->thread_kicked' access atomic
>>  2. Remove global 'exit_request' and use per-CPU 'exit_request'
>>  3. Change how 'current_cpu' is set
>>  4. Reorganize round-robin CPU TCG thread function
>>  5. Enable 'mmap_lock' for system mode emulation (do we really want this?)
> No, I don't think so.
>
>>  6. Enable 'tb_lock' for system mode emulation
>>  7. Introduce per-CPU TCG thread function
> At least 2/3/7 must be done at the same time, but I agree that this
> patch could use some splitting. :)

Hmm, 2/3 do also change single-threaded CPU loop. I think they should
apply separately from 7.

Thanks,
Sergey


* Re: [Qemu-devel] [RFC v2 11/11] tcg: enable thread-per-vCPU
  2016-05-27 15:07       ` Sergey Fedorov
@ 2016-05-27 15:25         ` Paolo Bonzini
  2016-05-27 18:54           ` Sergey Fedorov
  0 siblings, 1 reply; 66+ messages in thread
From: Paolo Bonzini @ 2016-05-27 15:25 UTC (permalink / raw)
  To: Sergey Fedorov, Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: qemu-devel, mark.burton, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite



On 27/05/2016 17:07, Sergey Fedorov wrote:
>>> >>  1. Make 'cpu->thread_kicked' access atomic
>>> >>  2. Remove global 'exit_request' and use per-CPU 'exit_request'
>>> >>  3. Change how 'current_cpu' is set
>>> >>  4. Reorganize round-robin CPU TCG thread function
>>> >>  5. Enable 'mmap_lock' for system mode emulation (do we really want this?)
>> > No, I don't think so.
>> >
>>> >>  6. Enable 'tb_lock' for system mode emulation
>>> >>  7. Introduce per-CPU TCG thread function
>> > At least 2/3/7 must be done at the same time, but I agree that this
>> > patch could use some splitting. :)
> Hmm, 2/3 do also change single-threaded CPU loop. I think they should
> apply separately from 7.

Reviewed the patch now, and I'm not sure how you can do 2/3 for the
single-threaded CPU loop.  They could be moved out of cpu_exec and into
cpus.c (in a separate patch), but you need exit_request and
tcg_current_cpu to properly kick the single-threaded CPU loop out of
qemu_tcg_cpu_thread_fn.

Thanks,

Paolo


* Re: [Qemu-devel] [RFC v2 11/11] tcg: enable thread-per-vCPU
  2016-05-27 15:25         ` Paolo Bonzini
@ 2016-05-27 18:54           ` Sergey Fedorov
  2016-06-02 16:36             ` Alex Bennée
  0 siblings, 1 reply; 66+ messages in thread
From: Sergey Fedorov @ 2016-05-27 18:54 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée, mttcg, fred.konrad, a.rigo, cota
  Cc: qemu-devel, mark.burton, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite

On 27/05/16 18:25, Paolo Bonzini wrote:
>
> On 27/05/2016 17:07, Sergey Fedorov wrote:
>>>>>>  1. Make 'cpu->thread_kicked' access atomic
>>>>>>  2. Remove global 'exit_request' and use per-CPU 'exit_request'
>>>>>>  3. Change how 'current_cpu' is set
>>>>>>  4. Reorganize round-robin CPU TCG thread function
>>>>>>  5. Enable 'mmap_lock' for system mode emulation (do we really want this?)
>>>> No, I don't think so.
>>>>
>>>>>>  6. Enable 'tb_lock' for system mode emulation
>>>>>>  7. Introduce per-CPU TCG thread function
>>>> At least 2/3/7 must be done at the same time, but I agree that this
>>>> patch could use some splitting. :)
>> Hmm, 2/3 do also change single-threaded CPU loop. I think they should
>> apply separately from 7.
> Reviewed the patch now, and I'm not sure how you can do 2/3 for the
> single-threaded CPU loop.  They could be moved out of cpu_exec and into
> cpus.c (in a separate patch), but you need exit_request and
> tcg_current_cpu to properly kick the single-threaded CPU loop out of
> qemu_tcg_cpu_thread_fn.

Summarizing Paolo's and my chat on IRC, we want run_on_cpu() to be served
as soon as possible so that it does not block the IO thread for too long.
Removing the global 'exit_request' would mean that a run_on_cpu() request
from the IO thread wouldn't be served until the single-threaded CPU loop
schedules the target CPU. This doesn't seem to be acceptable.

NB: Calling run_on_cpu() for another CPU from a CPU thread would cause a
deadlock in the single-threaded round-robin CPU loop.

Thanks,
Sergey


* Re: [Qemu-devel] [RFC v2 05/11] tcg: protect TBContext with tb_lock.
  2016-05-11 12:45   ` Sergey Fedorov
  2016-05-11 12:52     ` Paolo Bonzini
@ 2016-06-01 10:30     ` Alex Bennée
  2016-06-02 14:37       ` Sergey Fedorov
  1 sibling, 1 reply; 66+ messages in thread
From: Alex Bennée @ 2016-06-01 10:30 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: mttcg, fred.konrad, a.rigo, cota, qemu-devel, mark.burton,
	pbonzini, jan.kiszka, rth, peter.maydell, claudio.fontana,
	Peter Crosthwaite, Michael S. Tsirkin, Eduardo Habkost


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 05/04/16 18:32, Alex Bennée wrote:
> (snip)
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index 74065d9..bd50fef 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -205,18 +205,24 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
>>      if (max_cycles > CF_COUNT_MASK)
>>          max_cycles = CF_COUNT_MASK;
>>
>> +    tb_lock();
>>      cpu->tb_invalidated_flag = false;
>>      tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
>>                       max_cycles | CF_NOCACHE
>>                           | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
>>      tb->orig_tb = cpu->tb_invalidated_flag ? NULL : orig_tb;
>>      cpu->current_tb = tb;
>> +    tb_unlock();
>> +
>>      /* execute the generated code */
>>      trace_exec_tb_nocache(tb, tb->pc);
>> -    cpu_tb_exec(cpu, tb);
>> +    cpu_tb_exec(cpu, tb->tc_ptr);
>
> Very suspicious change. I can't even find which patch changes
> cpu_tb_exec() accordingly.

I think that came from a patch this series was based on.
It's gone now.

>
>> +
>> +    tb_lock();
>>      cpu->current_tb = NULL;
>>      tb_phys_invalidate(tb, -1);
>>      tb_free(tb);
>> +    tb_unlock();
>>  }
>>  #endif
>>
>> diff --git a/exec.c b/exec.c
>> index 17f390e..c46c123 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -2111,6 +2111,9 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
>>                      continue;
>>                  }
>>                  cpu->watchpoint_hit = wp;
>> +
>> +                /* Unlocked by cpu_loop_exit or cpu_resume_from_signal.  */
>
> In fact, neither cpu_resume_from_signal() nor cpu_loop_exit() unlocks
> the lock by itself, it gets unlocked after sigsetjmp() returns via
> siglongjmp() back to cpu_exec(). So maybe it would be more clear to say
> something like "'tb_lock' gets unlocked after siglongjmp()"?


"Locks are reset when we longjmp back to the main cpu_exec loop"?

Looking at where the patch is though I think I need to bring that bit
forward from the main series.

>
>> +                tb_lock();
>>                  tb_check_watchpoint(cpu);
>>                  if (wp->flags & BP_STOP_BEFORE_ACCESS) {
>>                      cpu->exception_index = EXCP_DEBUG;
> (snip)
>> diff --git a/translate-all.c b/translate-all.c
>> index a7ff5e7..935d24c 100644
>> --- a/translate-all.c
>> +++ b/translate-all.c
>> @@ -834,7 +834,9 @@ static void page_flush_tb(void)
>>  }
>>
>>  /* flush all the translation blocks */
>> -/* XXX: tb_flush is currently not thread safe */
>> +/* XXX: tb_flush is currently not thread safe.  System emulation calls it only
>> + * with tb_lock taken or from safe_work, so no need to take tb_lock here.
>> + */
>
> "System emulation"? What about user-mode emulation?

It's still not thread safe ;-)

It's a harder problem to solve because we can't just suspend all
threads to reset the translation buffer. I'm not sure we want to try and
fix it in this series.

>
>>  void tb_flush(CPUState *cpu)
>>  {
>>  #if defined(DEBUG_FLUSH)
>> @@ -1350,6 +1352,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>>      /* we remove all the TBs in the range [start, end[ */
>>      /* XXX: see if in some cases it could be faster to invalidate all
>>         the code */
>> +    tb_lock();
>
> Don't we need also protect a call to page_find() above? page_find()
> calls page_find_alloc() which is noted to be called with 'tb_lock' held.
> However, it might depend on the way we treat 'mmap_lock' in system mode
> emulation. We might also consider taking the lock outside of
> tb_invalidate_phys*() functions because they can be called after
> page_find().
>
>>      tb = p->first_tb;
>>      while (tb != NULL) {
>>          n = (uintptr_t)tb & 3;
>> @@ -1417,12 +1420,13 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>>      if (current_tb_modified) {
>>          /* we generate a block containing just the instruction
>>             modifying the memory. It will ensure that it cannot modify
>> -           itself */
>> +           itself.  cpu_resume_from_signal unlocks tb_lock.  */
>>          cpu->current_tb = NULL;
>>          tb_gen_code(cpu, current_pc, current_cs_base, current_flags, 1);
>>          cpu_resume_from_signal(cpu, NULL);
>>      }
>>  #endif
>> +    tb_unlock();
>>  }
>>
>>  #ifdef CONFIG_SOFTMMU
> (snip)
>> @ -1627,6 +1636,7 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
>>      target_ulong pc, cs_base;
>>      uint64_t flags;
>>
>> +    tb_lock();
>
> We don't have to take 'tb_lock' for nether tb_find_pc() nor
> cpu_restore_state_from_tb() because the lock does not protect from
> tb_flush() anyway. I think the lock should be taken just before the
> first call to tb_phys_invalidate() in this function.
>
>>      tb = tb_find_pc(retaddr);
>>      if (!tb) {
>>          cpu_abort(cpu, "cpu_io_recompile: could not find TB for pc=%p",
>> @@ -1678,11 +1688,15 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
>>      /* FIXME: In theory this could raise an exception.  In practice
>>         we have already translated the block once so it's probably ok.  */
>>      tb_gen_code(cpu, pc, cs_base, flags, cflags);
>> -    /* TODO: If env->pc != tb->pc (i.e. the faulting instruction was not
>> -       the first in the TB) then we end up generating a whole new TB and
>> -       repeating the fault, which is horribly inefficient.
>> -       Better would be to execute just this insn uncached, or generate a
>> -       second new TB.  */
>> +
>> +    /* This unlocks the tb_lock.
>> +     *
>> +     * TODO: If env->pc != tb->pc (i.e. the faulting instruction was not
>> +     * the first in the TB) then we end up generating a whole new TB and
>> +     * repeating the fault, which is horribly inefficient.
>> +     * Better would be to execute just this insn uncached, or generate a
>> +     * second new TB.
>> +     */
>>      cpu_resume_from_signal(cpu, NULL);
>>  }
> (snip)
>
> Kind regards,
> Sergey


--
Alex Bennée


* Re: [Qemu-devel] [RFC v2 05/11] tcg: protect TBContext with tb_lock.
  2016-06-01 10:30     ` Alex Bennée
@ 2016-06-02 14:37       ` Sergey Fedorov
  0 siblings, 0 replies; 66+ messages in thread
From: Sergey Fedorov @ 2016-06-02 14:37 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, fred.konrad, a.rigo, cota, qemu-devel, mark.burton,
	pbonzini, jan.kiszka, rth, peter.maydell, claudio.fontana,
	Peter Crosthwaite, Michael S. Tsirkin, Eduardo Habkost

On 01/06/16 13:30, Alex Bennée wrote:
> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>
>> On 05/04/16 18:32, Alex Bennée wrote:
>> (snip)
>>> diff --git a/exec.c b/exec.c
>>> index 17f390e..c46c123 100644
>>> --- a/exec.c
>>> +++ b/exec.c
>>> @@ -2111,6 +2111,9 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
>>>                      continue;
>>>                  }
>>>                  cpu->watchpoint_hit = wp;
>>> +
>>> +                /* Unlocked by cpu_loop_exit or cpu_resume_from_signal.  */
>> In fact, neither cpu_resume_from_signal() nor cpu_loop_exit() unlocks
>> the lock by itself, it gets unlocked after sigsetjmp() returns via
>> siglongjmp() back to cpu_exec(). So maybe it would be more clear to say
>> something like "'tb_lock' gets unlocked after siglongjmp()"?
>
> "Locks are reset when we longjmp back to the main cpu_exec loop"?

Yes, this looks fine.

> Looking at where the patch is though I think I need to bring that bit
> forward from the main series.
>
>>> +                tb_lock();
>>>                  tb_check_watchpoint(cpu);
>>>                  if (wp->flags & BP_STOP_BEFORE_ACCESS) {
>>>                      cpu->exception_index = EXCP_DEBUG;
>> (snip)
>>> diff --git a/translate-all.c b/translate-all.c
>>> index a7ff5e7..935d24c 100644
>>> --- a/translate-all.c
>>> +++ b/translate-all.c
>>> @@ -834,7 +834,9 @@ static void page_flush_tb(void)
>>>  }
>>>
>>>  /* flush all the translation blocks */
>>> -/* XXX: tb_flush is currently not thread safe */
>>> +/* XXX: tb_flush is currently not thread safe.  System emulation calls it only
>>> + * with tb_lock taken or from safe_work, so no need to take tb_lock here.
>>> + */
>> "System emulation"? What about user-mode emulation?
> It's still not thread safe ;-)
>
> It's a harder problem to solve because we can't just suspend all
> threads to reset the translation buffer. I'm not sure we want to try and
> fix it in this series.

I think it could be possible to do something like start_exclusive() to
achieve this in user-only emulation.

>>>  void tb_flush(CPUState *cpu)
>>>  {
>>>  #if defined(DEBUG_FLUSH)

Kind regards,
Sergey


* Re: [Qemu-devel] [RFC v2 09/11] tcg: add kick timer for single-threaded vCPU emulation
  2016-04-11 21:39   ` Sergey Fedorov
@ 2016-06-02 16:00     ` Alex Bennée
  2016-06-02 16:05       ` Sergey Fedorov
  0 siblings, 1 reply; 66+ messages in thread
From: Alex Bennée @ 2016-06-02 16:00 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: mttcg, fred.konrad, a.rigo, cota, qemu-devel, mark.burton,
	pbonzini, jan.kiszka, rth, peter.maydell, claudio.fontana,
	Peter Crosthwaite


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 05/04/16 18:32, Alex Bennée wrote:
>> +static void kick_tcg_thread(void *opaque)
>> +{
>> +    QEMUTimer *self = *(QEMUTimer **) opaque;
>> +    timer_mod(self,
>> +              qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
>> +              NANOSECONDS_PER_SECOND / 10);
>> +    qemu_cpu_kick_no_halt();
>> +}
>>
>
> It would be nice to have some definition (e.g. macro) of TCG thread kick
> period.

Will do.

>
> (snip)
>
>> @@ -1179,6 +1198,14 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>>          }
>>      }
>>
>> +    /* Set to kick if we have to do more than one vCPU */
>> +    if (CPU_NEXT(first_cpu)) {
>> +        kick_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,  kick_tcg_thread, &kick_timer);
>> +        timer_mod(kick_timer,
>> +                  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
>> +                  NANOSECONDS_PER_SECOND / 10);
>> +    }
>> +
>
> I think cpu_ticks_init() could be more natural place to put this
> initialization in.

It would be, but I need somewhere to keep kick_timer, and doing it inside
the thread function keeps it nice and local.

>
> Kind regards,
> Sergey


--
Alex Bennée


* Re: [Qemu-devel] [RFC v2 09/11] tcg: add kick timer for single-threaded vCPU emulation
  2016-06-02 16:00     ` Alex Bennée
@ 2016-06-02 16:05       ` Sergey Fedorov
  0 siblings, 0 replies; 66+ messages in thread
From: Sergey Fedorov @ 2016-06-02 16:05 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, fred.konrad, a.rigo, cota, qemu-devel, mark.burton,
	pbonzini, jan.kiszka, rth, peter.maydell, claudio.fontana,
	Peter Crosthwaite

On 02/06/16 19:00, Alex Bennée wrote:
> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>> On 05/04/16 18:32, Alex Bennée wrote:
>>> @@ -1179,6 +1198,14 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>>>          }
>>>      }
>>>
>>> +    /* Set to kick if we have to do more than one vCPU */
>>> +    if (CPU_NEXT(first_cpu)) {
>>> +        kick_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,  kick_tcg_thread, &kick_timer);
>>> +        timer_mod(kick_timer,
>>> +                  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
>>> +                  NANOSECONDS_PER_SECOND / 10);
>>> +    }
>>> +
>> I think cpu_ticks_init() could be more natural place to put this
>> initialization in.
> It would be, but I need somewhere to keep kick_timer, and doing it inside
> the thread function keeps it nice and local.

Fair enough. By the way, this kick timer is only required for the
round-robin single-threaded CPU loop, right?

Kind regards,
Sergey


* Re: [Qemu-devel] [RFC v2 11/11] tcg: enable thread-per-vCPU
  2016-05-27 18:54           ` Sergey Fedorov
@ 2016-06-02 16:36             ` Alex Bennée
  0 siblings, 0 replies; 66+ messages in thread
From: Alex Bennée @ 2016-06-02 16:36 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: Paolo Bonzini, mttcg, fred.konrad, a.rigo, cota, qemu-devel,
	mark.burton, jan.kiszka, rth, peter.maydell, claudio.fontana,
	Peter Crosthwaite


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 27/05/16 18:25, Paolo Bonzini wrote:
>>
>> On 27/05/2016 17:07, Sergey Fedorov wrote:
>>>>>>>  1. Make 'cpu->thread_kicked' access atomic
>>>>>>>  2. Remove global 'exit_request' and use per-CPU 'exit_request'
>>>>>>>  3. Change how 'current_cpu' is set
>>>>>>>  4. Reorganize round-robin CPU TCG thread function
>>>>>>>  5. Enable 'mmap_lock' for system mode emulation (do we really want this?)
>>>>> No, I don't think so.
>>>>>
>>>>>>>  6. Enable 'tb_lock' for system mode emulation
>>>>>>>  7. Introduce per-CPU TCG thread function
>>>>> At least 2/3/7 must be done at the same time, but I agree that this
>>>>> patch could use some splitting. :)
>>> Hmm, 2/3 do also change single-threaded CPU loop. I think they should
>>> apply separately from 7.
>> Reviewed the patch now, and I'm not sure how you can do 2/3 for the
>> single-threaded CPU loop.  They could be moved out of cpu_exec and into
>> cpus.c (in a separate patch), but you need exit_request and
>> tcg_current_cpu to properly kick the single-threaded CPU loop out of
>> qemu_tcg_cpu_thread_fn.
>
> Summarizing Paolo's and my chat on IRC, we want run_on_cpu() to be served
> as soon as possible so that it does not block the IO thread for too long.
> Removing the global 'exit_request' would mean that a run_on_cpu() request
> from the IO thread wouldn't be served until the single-threaded CPU loop
> schedules the target CPU. This doesn't seem to be acceptable.

So I've fixed this by keeping a tcg_current_rr_cpu for the benefit of
the round-robin scheduling (I needed something similar for the kick
timer once the globals had gone).

I'm seeing if I can pull out the exit_request stuff and other
re-factorings in the big patch to reduce the size of this monster a
little.

> NB: Calling run_on_cpu() for another CPU from a CPU thread would cause a
> deadlock in the single-threaded round-robin CPU loop.

We've established this doesn't happen, as qemu_cpu_is_self() compares
threads, which are all the same for RR TCG.

>
> Thanks,
> Sergey


--
Alex Bennée


2016-05-25 11:07       ` Alex Bennée
2016-05-25 12:46         ` Paolo Bonzini
2016-04-05 15:32 ` [Qemu-devel] [RFC v2 11/11] tcg: enable thread-per-vCPU Alex Bennée
2016-05-27 13:57   ` Sergey Fedorov
2016-05-27 14:55     ` Paolo Bonzini
2016-05-27 15:07       ` Sergey Fedorov
2016-05-27 15:25         ` Paolo Bonzini
2016-05-27 18:54           ` Sergey Fedorov
2016-06-02 16:36             ` Alex Bennée