* [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path
@ 2016-07-12 20:13 Sergey Fedorov
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 01/11] util/qht: Document memory ordering assumptions Sergey Fedorov
                   ` (11 more replies)
  0 siblings, 12 replies; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-12 20:13 UTC (permalink / raw)
  To: qemu-devel, mttcg, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth
  Cc: patches, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Alex Bennée

From: Sergey Fedorov <serge.fdrv@gmail.com>

Hi,

This is my respin of Alex's v2 series [1].

The first 8 patches are preparation for patch 9, the subject matter of
this series, which enables lockless translation block lookup. The main
change here is that Paolo's suggestion is implemented: TBs are marked
with an invalid CPU state early during invalidation. This makes the
lockless lookup safe from races on 'tb_jmp_cache' and direct block
chaining.

Patch 10 is a simple solution to avoid unnecessary bouncing of
'tb_lock' between tb_gen_code() and tb_add_jump(). A local variable is
used to keep track of whether 'tb_lock' has already been taken.
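
The local-flag idea above can be sketched roughly as follows. This is an
illustrative reimplementation with pthreads, not QEMU's actual code; the
function name and the two boolean parameters are hypothetical stand-ins
for the tb_gen_code() and tb_add_jump() call sites:

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t tb_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical sketch: take tb_lock at most once across two call sites,
 * tracking ownership in a local flag instead of unlocking and
 * relocking between them. */
int lookup_and_chain(bool need_codegen, bool need_chain)
{
    bool have_tb_lock = false;
    int lock_acquisitions = 0;

    if (need_codegen) {            /* the tb_gen_code() path needs the lock */
        pthread_mutex_lock(&tb_lock);
        have_tb_lock = true;
        lock_acquisitions++;
    }
    if (need_chain) {              /* tb_add_jump() also needs the lock... */
        if (!have_tb_lock) {       /* ...but only takes it if not held yet */
            pthread_mutex_lock(&tb_lock);
            have_tb_lock = true;
            lock_acquisitions++;
        }
    }
    if (have_tb_lock) {
        pthread_mutex_unlock(&tb_lock);
    }
    return lock_acquisitions;      /* at most one acquisition per call */
}
```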

The last patch is my attempt to restructure tb_find_{fast,slow}() into a
single function, tb_find(). I think the locking scheme will be easier to
follow this way. However, I am afraid this last patch may be
controversial, so it can simply be dropped.

This series can be fetched from the public git repository:

    https://github.com/sergefdrv/qemu.git lockless-tb-lookup-v3

[1] http://thread.gmane.org/gmane.comp.emulators.qemu/424856

Kind regards,
Sergey

Summary of changes in v3:
 - QHT memory ordering assumptions documented
 - 'tb_jmp_cache' reset in tb_flush() made atomic
 - explicit memory barriers removed around 'tb_jmp_cache' access
 - safe access to 'tb_flushed' out of 'tb_lock' prepared
 - TBs marked with invalid CPU state early on invalidation
 - Alex's tb_find_{fast,slow}() roll-up related patches dropped
 - bouncing of tb_lock between tb_gen_code() and tb_add_jump() avoided
   with local variable 'have_tb_lock'
 - tb_find_{fast,slow}() merged

Alex Bennée (2):
  tcg: set up tb->page_addr before insertion
  tcg: cpu-exec: remove tb_lock from the hot-path

Sergey Fedorov (9):
  util/qht: Document memory ordering assumptions
  cpu-exec: Pass last_tb by value to tb_find_fast()
  tcg: Prepare safe tb_jmp_cache lookup out of tb_lock
  tcg: Prepare safe access to tb_flushed out of tb_lock
  target-i386: Remove redundant HF_SOFTMMU_MASK
  tcg: Introduce tb_mark_invalid() and tb_is_invalid()
  tcg: Prepare TB invalidation for lockless TB lookup
  tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump()
  tcg: Merge tb_find_slow() and tb_find_fast()

 cpu-exec.c               | 110 +++++++++++++++++++++--------------------------
 include/exec/exec-all.h  |  10 +++++
 include/qemu/qht.h       |   9 ++++
 target-alpha/cpu.h       |  14 ++++++
 target-arm/cpu.h         |  14 ++++++
 target-cris/cpu.h        |  14 ++++++
 target-i386/cpu.c        |   3 --
 target-i386/cpu.h        |  20 +++++++--
 target-i386/translate.c  |  12 ++----
 target-lm32/cpu.h        |  14 ++++++
 target-m68k/cpu.h        |  14 ++++++
 target-microblaze/cpu.h  |  14 ++++++
 target-mips/cpu.h        |  14 ++++++
 target-moxie/cpu.h       |  14 ++++++
 target-openrisc/cpu.h    |  14 ++++++
 target-ppc/cpu.h         |  14 ++++++
 target-s390x/cpu.h       |  14 ++++++
 target-sh4/cpu.h         |  14 ++++++
 target-sparc/cpu.h       |  14 ++++++
 target-sparc/translate.c |   1 +
 target-tilegx/cpu.h      |  14 ++++++
 target-tricore/cpu.h     |  14 ++++++
 target-unicore32/cpu.h   |  14 ++++++
 target-xtensa/cpu.h      |  14 ++++++
 translate-all.c          |  30 ++++++-------
 util/qht.c               |   8 ++++
 26 files changed, 349 insertions(+), 92 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [Qemu-devel] [PATCH v3 01/11] util/qht: Document memory ordering assumptions
  2016-07-12 20:13 [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path Sergey Fedorov
@ 2016-07-12 20:13 ` Sergey Fedorov
  2016-07-12 23:19   ` Emilio G. Cota
  2016-07-13 11:13   ` Paolo Bonzini
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 02/11] cpu-exec: Pass last_tb by value to tb_find_fast() Sergey Fedorov
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-12 20:13 UTC (permalink / raw)
  To: qemu-devel, mttcg, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth
  Cc: patches, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Alex Bennée, Sergey Fedorov

From: Sergey Fedorov <serge.fdrv@gmail.com>

It is naturally expected that some memory ordering should be provided
around qht_insert(), qht_remove(), and qht_lookup(). Document these
assumptions in the header file and put some comments in the source to
explain how those memory ordering requirements are fulfilled.
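
The guarantee documented here (a write barrier implied before a pointer
is inserted, and read barriers around the lookup) pairs publication of
an object with its consumption. Below is a minimal stand-alone sketch of
that pairing using C11 atomics; this is not the QHT code itself, and the
names are hypothetical:

```c
#include <stdatomic.h>
#include <stddef.h>

struct tb { int payload; };

static _Atomic(struct tb *) slot;   /* one-entry "hash table" */

/* Writer: fill in the object, then publish with release ordering
 * (the analogue of the smp_wmb() implied before insertion). */
void publish(struct tb *tb, int value)
{
    tb->payload = value;
    atomic_store_explicit(&slot, tb, memory_order_release);
}

/* Reader: load with acquire ordering (the analogue of the implied
 * smp_rmb()); the payload written before publication is then visible. */
int lookup(void)
{
    struct tb *tb = atomic_load_explicit(&slot, memory_order_acquire);
    return tb ? tb->payload : -1;   /* -1 means not found */
}
```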

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
 include/qemu/qht.h | 9 +++++++++
 util/qht.c         | 8 ++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/qemu/qht.h b/include/qemu/qht.h
index 70bfc68b8d67..5f633e5d8100 100644
--- a/include/qemu/qht.h
+++ b/include/qemu/qht.h
@@ -69,6 +69,9 @@ void qht_destroy(struct qht *ht);
  * Attempting to insert a NULL @p is a bug.
  * Inserting the same pointer @p with different @hash values is a bug.
  *
+ * In case of successful operation, smp_wmb() is implied before the pointer is
+ * inserted into the hash table.
+ *
 * Returns true on success.
  * Returns false if the @p-@hash pair already exists in the hash table.
  */
@@ -86,6 +89,9 @@ bool qht_insert(struct qht *ht, void *p, uint32_t hash);
  * The user-provided @func compares pointers in QHT against @userp.
  * If the function returns true, a match has been found.
  *
+ * smp_rmb() is implied before and after the pointer is looked up and retrieved
+ * from the hash table.
+ *
  * Returns the corresponding pointer when a match is found.
  * Returns NULL otherwise.
  */
@@ -105,6 +111,9 @@ void *qht_lookup(struct qht *ht, qht_lookup_func_t func, const void *userp,
  * This guarantees that concurrent lookups will always compare against valid
  * data.
  *
+ * In case of successful operation, smp_wmb() is implied after the pointer is
+ * removed from the hash table.
+ *
  * Returns true on success.
  * Returns false if the @p-@hash pair was not found.
  */
diff --git a/util/qht.c b/util/qht.c
index 40d6e218f759..0469d068b4de 100644
--- a/util/qht.c
+++ b/util/qht.c
@@ -466,8 +466,10 @@ void *qht_lookup__slowpath(struct qht_bucket *b, qht_lookup_func_t func,
     void *ret;
 
     do {
+        /* seqlock_read_begin() also serves as the guarantee of smp_rmb() */
         version = seqlock_read_begin(&b->sequence);
         ret = qht_do_lookup(b, func, userp, hash);
+    /* seqlock_read_retry() also serves as the guarantee of smp_rmb() */
     } while (seqlock_read_retry(&b->sequence, version));
     return ret;
 }
@@ -483,8 +485,10 @@ void *qht_lookup(struct qht *ht, qht_lookup_func_t func, const void *userp,
     map = atomic_rcu_read(&ht->map);
     b = qht_map_to_bucket(map, hash);
 
+    /* seqlock_read_begin() also serves as the guarantee of smp_rmb() */
     version = seqlock_read_begin(&b->sequence);
     ret = qht_do_lookup(b, func, userp, hash);
+    /* seqlock_read_retry() also serves as the guarantee of smp_rmb() */
     if (likely(!seqlock_read_retry(&b->sequence, version))) {
         return ret;
     }
@@ -530,6 +534,7 @@ static bool qht_insert__locked(struct qht *ht, struct qht_map *map,
 
  found:
     /* found an empty key: acquire the seqlock and write */
+    /* seqlock_write_begin() also serves as the guarantee of smp_wmb() */
     seqlock_write_begin(&head->sequence);
     if (new) {
         atomic_rcu_set(&prev->next, b);
@@ -661,6 +666,9 @@ bool qht_remove__locked(struct qht_map *map, struct qht_bucket *head,
                 qht_debug_assert(b->hashes[i] == hash);
                 seqlock_write_begin(&head->sequence);
                 qht_bucket_remove_entry(b, i);
+                /* seqlock_write_end() also serves as the guarantee of
+                 * smp_wmb()
+                 */
                 seqlock_write_end(&head->sequence);
                 return true;
             }
-- 
1.9.1


* [Qemu-devel] [PATCH v3 02/11] cpu-exec: Pass last_tb by value to tb_find_fast()
  2016-07-12 20:13 [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path Sergey Fedorov
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 01/11] util/qht: Document memory ordering assumptions Sergey Fedorov
@ 2016-07-12 20:13 ` Sergey Fedorov
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 03/11] tcg: Prepare safe tb_jmp_cache lookup out of tb_lock Sergey Fedorov
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-12 20:13 UTC (permalink / raw)
  To: qemu-devel, mttcg, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth
  Cc: patches, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Alex Bennée, Sergey Fedorov,
	Peter Crosthwaite

From: Sergey Fedorov <serge.fdrv@gmail.com>

This is a small clean-up. tb_find_fast() is the final consumer of this
variable, so there is no need to pass it by reference. 'last_tb' is
always updated by the subsequent cpu_loop_exec_tb() in cpu_exec().

This change also simplifies calling cpu_exec_nocache() in
cpu_handle_exception().

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
 cpu-exec.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index b840e1d2dd41..757f63b08b6b 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -320,7 +320,7 @@ found:
 }
 
 static inline TranslationBlock *tb_find_fast(CPUState *cpu,
-                                             TranslationBlock **last_tb,
+                                             TranslationBlock *last_tb,
                                              int tb_exit)
 {
     CPUArchState *env = (CPUArchState *)cpu->env_ptr;
@@ -351,12 +351,12 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
      * spanning two pages because the mapping for the second page can change.
      */
     if (tb->page_addr[1] != -1) {
-        *last_tb = NULL;
+        last_tb = NULL;
     }
 #endif
     /* See if we can patch the calling TB. */
-    if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
-        tb_add_jump(*last_tb, tb_exit, tb);
+    if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
+        tb_add_jump(last_tb, tb_exit, tb);
     }
     tb_unlock();
     return tb;
@@ -437,8 +437,7 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
     } else if (replay_has_exception()
                && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
         /* try to cause an exception pending in the log */
-        TranslationBlock *last_tb = NULL; /* Avoid chaining TBs */
-        cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, &last_tb, 0), true);
+        cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, NULL, 0), true);
         *ret = -1;
         return true;
 #endif
@@ -622,7 +621,7 @@ int cpu_exec(CPUState *cpu)
             cpu->tb_flushed = false; /* reset before first TB lookup */
             for(;;) {
                 cpu_handle_interrupt(cpu, &last_tb);
-                tb = tb_find_fast(cpu, &last_tb, tb_exit);
+                tb = tb_find_fast(cpu, last_tb, tb_exit);
                 cpu_loop_exec_tb(cpu, tb, &last_tb, &tb_exit, &sc);
                 /* Try to align the host and virtual clocks
                    if the guest is in advance */
-- 
1.9.1


* [Qemu-devel] [PATCH v3 03/11] tcg: Prepare safe tb_jmp_cache lookup out of tb_lock
  2016-07-12 20:13 [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path Sergey Fedorov
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 01/11] util/qht: Document memory ordering assumptions Sergey Fedorov
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 02/11] cpu-exec: Pass last_tb by value to tb_find_fast() Sergey Fedorov
@ 2016-07-12 20:13 ` Sergey Fedorov
  2016-07-14 12:14   ` Alex Bennée
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 04/11] tcg: Prepare safe access to tb_flushed " Sergey Fedorov
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-12 20:13 UTC (permalink / raw)
  To: qemu-devel, mttcg, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth
  Cc: patches, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Alex Bennée, Sergey Fedorov,
	Peter Crosthwaite

From: Sergey Fedorov <serge.fdrv@gmail.com>

Ensure atomicity of the CPU's 'tb_jmp_cache' access for future
translation block lookup out of 'tb_lock'.

Note that this patch does *not* make the CPU's TLB invalidation safe if
it is done from some other thread while the CPU is in its execution
loop.
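
The access pattern this prepares can be sketched stand-alone with C11
atomics. This is an illustrative model, not QEMU's code; the reduced
cache size and the helper names are hypothetical:

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TB_JMP_CACHE_SIZE 16   /* reduced for the sketch */

struct tb { uintptr_t pc; };

static _Atomic(struct tb *) tb_jmp_cache[TB_JMP_CACHE_SIZE];

static unsigned hash(uintptr_t pc)
{
    return pc % TB_JMP_CACHE_SIZE;
}

/* A lookup must tolerate a concurrent reset of the cache slot, so the
 * pointer is loaded once, atomically, and then validated against the
 * requested pc before being trusted. */
struct tb *cache_lookup(uintptr_t pc)
{
    struct tb *tb = atomic_load(&tb_jmp_cache[hash(pc)]);
    if (tb == NULL || tb->pc != pc) {
        return NULL;            /* miss: fall back to the slow path */
    }
    return tb;
}

void cache_insert(struct tb *tb)
{
    atomic_store(&tb_jmp_cache[hash(tb->pc)], tb);
}
```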

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
[AJB: fixed missing atomic set, tweak title]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
[Sergey Fedorov: removed explicit memory barriers;
memset() on 'tb_jmp_cache' replaced with a loop on atomic_set();
tweaked commit title and message]
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>

---
Changes in v3:
 - explicit memory barriers removed
 - memset() on 'tb_jmp_cache' replaced with a loop on atomic_set()
Changes in v2:
 - fix spelling s/con't/can't/
 - add atomic_read while clearing tb_jmp_cache
 - add r-b tags
Changes in v1 (AJB):
 - tweak title
 - fixed missing set of tb_jmp_cache
---
 cpu-exec.c      |  4 ++--
 translate-all.c | 10 +++++++---
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 757f63b08b6b..d6178eab71d4 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -315,7 +315,7 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
 
 found:
     /* we add the TB in the virtual pc hash table */
-    cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
+    atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
     return tb;
 }
 
@@ -333,7 +333,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
        is executed. */
     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
     tb_lock();
-    tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
+    tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
     if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
                  tb->flags != flags)) {
         tb = tb_find_slow(cpu, pc, cs_base, flags);
diff --git a/translate-all.c b/translate-all.c
index 0d47c1c0cf82..fdf520a86d68 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -848,7 +848,11 @@ void tb_flush(CPUState *cpu)
     tcg_ctx.tb_ctx.nb_tbs = 0;
 
     CPU_FOREACH(cpu) {
-        memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
+        int i;
+
+        for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
+            atomic_set(&cpu->tb_jmp_cache[i], NULL);
+        }
         cpu->tb_flushed = true;
     }
 
@@ -1007,8 +1011,8 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
     /* remove the TB from the hash list */
     h = tb_jmp_cache_hash_func(tb->pc);
     CPU_FOREACH(cpu) {
-        if (cpu->tb_jmp_cache[h] == tb) {
-            cpu->tb_jmp_cache[h] = NULL;
+        if (atomic_read(&cpu->tb_jmp_cache[h]) == tb) {
+            atomic_set(&cpu->tb_jmp_cache[h], NULL);
         }
     }
 
-- 
1.9.1


* [Qemu-devel] [PATCH v3 04/11] tcg: Prepare safe access to tb_flushed out of tb_lock
  2016-07-12 20:13 [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path Sergey Fedorov
                   ` (2 preceding siblings ...)
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 03/11] tcg: Prepare safe tb_jmp_cache lookup out of tb_lock Sergey Fedorov
@ 2016-07-12 20:13 ` Sergey Fedorov
  2016-07-14 12:45   ` Alex Bennée
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 05/11] target-i386: Remove redundant HF_SOFTMMU_MASK Sergey Fedorov
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-12 20:13 UTC (permalink / raw)
  To: qemu-devel, mttcg, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth
  Cc: patches, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Alex Bennée, Sergey Fedorov,
	Peter Crosthwaite

From: Sergey Fedorov <serge.fdrv@gmail.com>

Ensure atomicity of the CPU's 'tb_flushed' access for future translation
block lookup out of 'tb_lock'.

This field can only be touched from another thread by tb_flush() in user
mode emulation, so the only accesses that need to be atomic are:
 * the single write in tb_flush();
 * reads/writes out of 'tb_lock'.

In the future, before enabling MTTCG in system mode, tb_flush() must be
made safe and this field will become unnecessary.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
 cpu-exec.c      | 16 +++++++---------
 translate-all.c |  4 ++--
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index d6178eab71d4..c973e3b85922 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -338,13 +338,6 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
                  tb->flags != flags)) {
         tb = tb_find_slow(cpu, pc, cs_base, flags);
     }
-    if (cpu->tb_flushed) {
-        /* Ensure that no TB jump will be modified as the
-         * translation buffer has been flushed.
-         */
-        *last_tb = NULL;
-        cpu->tb_flushed = false;
-    }
 #ifndef CONFIG_USER_ONLY
     /* We don't take care of direct jumps when address mapping changes in
      * system emulation. So it's not safe to make a direct jump to a TB
@@ -356,7 +349,12 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
 #endif
     /* See if we can patch the calling TB. */
     if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
-        tb_add_jump(last_tb, tb_exit, tb);
+        /* Check if translation buffer has been flushed */
+        if (cpu->tb_flushed) {
+            cpu->tb_flushed = false;
+        } else {
+            tb_add_jump(last_tb, tb_exit, tb);
+        }
     }
     tb_unlock();
     return tb;
@@ -618,7 +616,7 @@ int cpu_exec(CPUState *cpu)
             }
 
             last_tb = NULL; /* forget the last executed TB after exception */
-            cpu->tb_flushed = false; /* reset before first TB lookup */
+            atomic_mb_set(&cpu->tb_flushed, false); /* reset before first TB lookup */
             for(;;) {
                 cpu_handle_interrupt(cpu, &last_tb);
                 tb = tb_find_fast(cpu, last_tb, tb_exit);
diff --git a/translate-all.c b/translate-all.c
index fdf520a86d68..788fed1e0765 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -845,7 +845,6 @@ void tb_flush(CPUState *cpu)
         > tcg_ctx.code_gen_buffer_size) {
         cpu_abort(cpu, "Internal error: code buffer overflow\n");
     }
-    tcg_ctx.tb_ctx.nb_tbs = 0;
 
     CPU_FOREACH(cpu) {
         int i;
@@ -853,9 +852,10 @@ void tb_flush(CPUState *cpu)
         for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
             atomic_set(&cpu->tb_jmp_cache[i], NULL);
         }
-        cpu->tb_flushed = true;
+        atomic_mb_set(&cpu->tb_flushed, true);
     }
 
+    tcg_ctx.tb_ctx.nb_tbs = 0;
     qht_reset_size(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
     page_flush_tb();
 
-- 
1.9.1


* [Qemu-devel] [PATCH v3 05/11] target-i386: Remove redundant HF_SOFTMMU_MASK
  2016-07-12 20:13 [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path Sergey Fedorov
                   ` (3 preceding siblings ...)
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 04/11] tcg: Prepare safe access to tb_flushed " Sergey Fedorov
@ 2016-07-12 20:13 ` Sergey Fedorov
  2016-07-14 12:19   ` Alex Bennée
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 06/11] tcg: Introduce tb_mark_invalid() and tb_is_invalid() Sergey Fedorov
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-12 20:13 UTC (permalink / raw)
  To: qemu-devel, mttcg, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth
  Cc: patches, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Alex Bennée, Sergey Fedorov,
	Eduardo Habkost

From: Sergey Fedorov <serge.fdrv@gmail.com>

'HF_SOFTMMU_MASK' is only set when 'CONFIG_SOFTMMU' is defined, so there
is no need for this flag: test 'CONFIG_SOFTMMU' instead.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
 target-i386/cpu.c       |  3 ---
 target-i386/cpu.h       |  3 ---
 target-i386/translate.c | 12 ++++--------
 3 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index fc209ee1cb8a..6e49e4ca8282 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2725,9 +2725,6 @@ static void x86_cpu_reset(CPUState *s)
 
     /* init to reset state */
 
-#ifdef CONFIG_SOFTMMU
-    env->hflags |= HF_SOFTMMU_MASK;
-#endif
     env->hflags2 |= HF2_GIF_MASK;
 
     cpu_x86_update_cr0(env, 0x60000010);
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 5c7a2791f34f..a7c752afdad8 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -129,8 +129,6 @@
    positions to ease oring with eflags. */
 /* current cpl */
 #define HF_CPL_SHIFT         0
-/* true if soft mmu is being used */
-#define HF_SOFTMMU_SHIFT     2
 /* true if hardware interrupts must be disabled for next instruction */
 #define HF_INHIBIT_IRQ_SHIFT 3
 /* 16 or 32 segments */
@@ -160,7 +158,6 @@
 #define HF_MPX_IU_SHIFT     26 /* BND registers in-use */
 
 #define HF_CPL_MASK          (3 << HF_CPL_SHIFT)
-#define HF_SOFTMMU_MASK      (1 << HF_SOFTMMU_SHIFT)
 #define HF_INHIBIT_IRQ_MASK  (1 << HF_INHIBIT_IRQ_SHIFT)
 #define HF_CS32_MASK         (1 << HF_CS32_SHIFT)
 #define HF_SS32_MASK         (1 << HF_SS32_SHIFT)
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 7dea18bd6345..e81fce7bc2b5 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -8224,9 +8224,9 @@ void gen_intermediate_code(CPUX86State *env, TranslationBlock *tb)
     dc->popl_esp_hack = 0;
     /* select memory access functions */
     dc->mem_index = 0;
-    if (flags & HF_SOFTMMU_MASK) {
-	dc->mem_index = cpu_mmu_index(env, false);
-    }
+#ifdef CONFIG_SOFTMMU
+    dc->mem_index = cpu_mmu_index(env, false);
+#endif
     dc->cpuid_features = env->features[FEAT_1_EDX];
     dc->cpuid_ext_features = env->features[FEAT_1_ECX];
     dc->cpuid_ext2_features = env->features[FEAT_8000_0001_EDX];
@@ -8239,11 +8239,7 @@ void gen_intermediate_code(CPUX86State *env, TranslationBlock *tb)
 #endif
     dc->flags = flags;
     dc->jmp_opt = !(dc->tf || cs->singlestep_enabled ||
-                    (flags & HF_INHIBIT_IRQ_MASK)
-#ifndef CONFIG_SOFTMMU
-                    || (flags & HF_SOFTMMU_MASK)
-#endif
-                    );
+                    (flags & HF_INHIBIT_IRQ_MASK));
     /* Do not optimize repz jumps at all in icount mode, because
        rep movsS instructions are execured with different paths
        in !repz_opt and repz_opt modes. The first one was used
-- 
1.9.1


* [Qemu-devel] [PATCH v3 06/11] tcg: Introduce tb_mark_invalid() and tb_is_invalid()
  2016-07-12 20:13 [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path Sergey Fedorov
                   ` (4 preceding siblings ...)
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 05/11] target-i386: Remove redundant HF_SOFTMMU_MASK Sergey Fedorov
@ 2016-07-12 20:13 ` Sergey Fedorov
  2016-07-14 10:25   ` Alex Bennée
  2016-07-14 12:53   ` Alex Bennée
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 07/11] tcg: Prepare TB invalidation for lockless TB lookup Sergey Fedorov
                   ` (5 subsequent siblings)
  11 siblings, 2 replies; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-12 20:13 UTC (permalink / raw)
  To: qemu-devel, mttcg, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth
  Cc: patches, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Alex Bennée, Sergey Fedorov,
	Peter Crosthwaite, Edgar E. Iglesias, Eduardo Habkost,
	Michael Walle, Aurelien Jarno, Leon Alrae, Anthony Green,
	Jia Liu, David Gibson, Alexander Graf, Mark Cave-Ayland,
	Artyom Tarasenko, Bastian Koppelmann, Guan Xuetao, Max Filippov,
	qemu-arm, qemu-ppc

From: Sergey Fedorov <serge.fdrv@gmail.com>

These functions will be used to make translation block invalidation safe
with concurrent lockless lookup in the global hash table.

Most targets don't use 'cs_base', so marking a TB as invalid is as
simple as assigning -1 to 'cs_base'. The SPARC target stores the next
program counter in 'cs_base', and -1 is a fine invalid value there since
the PC must be a multiple of 4 on SPARC. The only odd target is i386,
for which a special flag is introduced in place of the removed
'HF_SOFTMMU_MASK'.
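
A reduced, stand-alone sketch of the generic (non-i386) scheme described
above; the struct here is cut down to just the three fields compared on
lookup, and everything outside of them is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t target_ulong;

/* Simplified TranslationBlock: only the fields compared on lookup. */
struct TranslationBlock {
    target_ulong pc;
    target_ulong cs_base;
    uint32_t flags;
};

/* For most targets cs_base is otherwise unused (or, on SPARC, always a
 * multiple of 4), so -1 is free to serve as the "invalid" marker. */
void tb_mark_invalid(struct TranslationBlock *tb)
{
    tb->cs_base = (target_ulong)-1;
}

bool tb_is_invalid(const struct TranslationBlock *tb)
{
    return tb->cs_base == (target_ulong)-1;
}
```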

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
 include/exec/exec-all.h  | 10 ++++++++++
 target-alpha/cpu.h       | 14 ++++++++++++++
 target-arm/cpu.h         | 14 ++++++++++++++
 target-cris/cpu.h        | 14 ++++++++++++++
 target-i386/cpu.h        | 17 +++++++++++++++++
 target-lm32/cpu.h        | 14 ++++++++++++++
 target-m68k/cpu.h        | 14 ++++++++++++++
 target-microblaze/cpu.h  | 14 ++++++++++++++
 target-mips/cpu.h        | 14 ++++++++++++++
 target-moxie/cpu.h       | 14 ++++++++++++++
 target-openrisc/cpu.h    | 14 ++++++++++++++
 target-ppc/cpu.h         | 14 ++++++++++++++
 target-s390x/cpu.h       | 14 ++++++++++++++
 target-sh4/cpu.h         | 14 ++++++++++++++
 target-sparc/cpu.h       | 14 ++++++++++++++
 target-sparc/translate.c |  1 +
 target-tilegx/cpu.h      | 14 ++++++++++++++
 target-tricore/cpu.h     | 14 ++++++++++++++
 target-unicore32/cpu.h   | 14 ++++++++++++++
 target-xtensa/cpu.h      | 14 ++++++++++++++
 20 files changed, 266 insertions(+)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index db79ab65cebe..61cc3a1fb8f7 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -256,6 +256,16 @@ void tb_free(TranslationBlock *tb);
 void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
 
+static inline void tb_mark_invalid(TranslationBlock *tb)
+{
+    cpu_get_invalid_tb_cpu_state(&tb->pc, &tb->cs_base, &tb->flags);
+}
+
+static inline bool tb_is_invalid(TranslationBlock *tb)
+{
+    return cpu_tb_cpu_state_is_invalidated(tb->pc, tb->cs_base, tb->flags);
+}
+
 #if defined(USE_DIRECT_JUMP)
 
 #if defined(CONFIG_TCG_INTERPRETER)
diff --git a/target-alpha/cpu.h b/target-alpha/cpu.h
index 791da3b4ad67..1ce4526d7a72 100644
--- a/target-alpha/cpu.h
+++ b/target-alpha/cpu.h
@@ -524,4 +524,18 @@ static inline void cpu_get_tb_cpu_state(CPUAlphaState *env, target_ulong *pc,
     *pflags = flags;
 }
 
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+                                                target_ulong *cs_base,
+                                                uint32_t *flags)
+{
+    *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+                                                   target_ulong cs_base,
+                                                   uint32_t flags)
+{
+    return cs_base == -1;
+}
+
 #endif /* !defined (__CPU_ALPHA_H__) */
diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index e2fac46909c6..2855bdae7800 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -2371,6 +2371,20 @@ static inline void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
     *cs_base = 0;
 }
 
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+                                                target_ulong *cs_base,
+                                                uint32_t *flags)
+{
+    *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+                                                   target_ulong cs_base,
+                                                   uint32_t flags)
+{
+    return cs_base == -1;
+}
+
 enum {
     QEMU_PSCI_CONDUIT_DISABLED = 0,
     QEMU_PSCI_CONDUIT_SMC = 1,
diff --git a/target-cris/cpu.h b/target-cris/cpu.h
index e6046d20ca10..fae3219304df 100644
--- a/target-cris/cpu.h
+++ b/target-cris/cpu.h
@@ -295,6 +295,20 @@ static inline void cpu_get_tb_cpu_state(CPUCRISState *env, target_ulong *pc,
 				     | X_FLAG | PFIX_FLAG));
 }
 
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+                                                target_ulong *cs_base,
+                                                uint32_t *flags)
+{
+    *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+                                                   target_ulong cs_base,
+                                                   uint32_t flags)
+{
+    return cs_base == -1;
+}
+
 #define cpu_list cris_cpu_list
 void cris_cpu_list(FILE *f, fprintf_function cpu_fprintf);
 
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index a7c752afdad8..c9125b725ca4 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -129,6 +129,8 @@
    positions to ease oring with eflags. */
 /* current cpl */
 #define HF_CPL_SHIFT         0
+/* used to mark invalidated translation blocks */
+#define HF_INVALID_SHIFT     2
 /* true if hardware interrupts must be disabled for next instruction */
 #define HF_INHIBIT_IRQ_SHIFT 3
 /* 16 or 32 segments */
@@ -158,6 +160,7 @@
 #define HF_MPX_IU_SHIFT     26 /* BND registers in-use */
 
 #define HF_CPL_MASK          (3 << HF_CPL_SHIFT)
+#define HF_INVALID_MASK      (1 << HF_INVALID_SHIFT)
 #define HF_INHIBIT_IRQ_MASK  (1 << HF_INHIBIT_IRQ_SHIFT)
 #define HF_CS32_MASK         (1 << HF_CS32_SHIFT)
 #define HF_SS32_MASK         (1 << HF_SS32_SHIFT)
@@ -1489,6 +1492,20 @@ static inline void cpu_get_tb_cpu_state(CPUX86State *env, target_ulong *pc,
         (env->eflags & (IOPL_MASK | TF_MASK | RF_MASK | VM_MASK | AC_MASK));
 }
 
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+                                                target_ulong *cs_base,
+                                                uint32_t *flags)
+{
+    *flags = HF_INVALID_MASK;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+                                                   target_ulong cs_base,
+                                                   uint32_t flags)
+{
+    return flags == HF_INVALID_MASK;
+}
+
 void do_cpu_init(X86CPU *cpu);
 void do_cpu_sipi(X86CPU *cpu);
 
diff --git a/target-lm32/cpu.h b/target-lm32/cpu.h
index 4efe98d828f8..5f78c3c619e5 100644
--- a/target-lm32/cpu.h
+++ b/target-lm32/cpu.h
@@ -271,4 +271,18 @@ static inline void cpu_get_tb_cpu_state(CPULM32State *env, target_ulong *pc,
     *flags = 0;
 }
 
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+                                                target_ulong *cs_base,
+                                                uint32_t *flags)
+{
+    *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+                                                   target_ulong cs_base,
+                                                   uint32_t flags)
+{
+    return cs_base == -1;
+}
+
 #endif
diff --git a/target-m68k/cpu.h b/target-m68k/cpu.h
index 908776999797..c44f4339e40a 100644
--- a/target-m68k/cpu.h
+++ b/target-m68k/cpu.h
@@ -269,4 +269,18 @@ static inline void cpu_get_tb_cpu_state(CPUM68KState *env, target_ulong *pc,
             | ((env->macsr >> 4) & 0xf);        /* Bits 0-3 */
 }
 
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+                                                target_ulong *cs_base,
+                                                uint32_t *flags)
+{
+    *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+                                                   target_ulong cs_base,
+                                                   uint32_t flags)
+{
+    return cs_base == -1;
+}
+
 #endif
diff --git a/target-microblaze/cpu.h b/target-microblaze/cpu.h
index 16815dfc6abc..22362efce586 100644
--- a/target-microblaze/cpu.h
+++ b/target-microblaze/cpu.h
@@ -371,6 +371,20 @@ static inline void cpu_get_tb_cpu_state(CPUMBState *env, target_ulong *pc,
                  (env->sregs[SR_MSR] & (MSR_UM | MSR_VM | MSR_EE));
 }
 
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+                                                target_ulong *cs_base,
+                                                uint32_t *flags)
+{
+    *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+                                                   target_ulong cs_base,
+                                                   uint32_t flags)
+{
+    return cs_base == -1;
+}
+
 #if !defined(CONFIG_USER_ONLY)
 void mb_cpu_unassigned_access(CPUState *cpu, hwaddr addr,
                               bool is_write, bool is_exec, int is_asi,
diff --git a/target-mips/cpu.h b/target-mips/cpu.h
index 9c4fc816bf9b..a887154274b3 100644
--- a/target-mips/cpu.h
+++ b/target-mips/cpu.h
@@ -901,6 +901,20 @@ static inline void cpu_get_tb_cpu_state(CPUMIPSState *env, target_ulong *pc,
                             MIPS_HFLAG_HWRENA_ULR);
 }
 
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+                                                target_ulong *cs_base,
+                                                uint32_t *flags)
+{
+    *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+                                                   target_ulong cs_base,
+                                                   uint32_t flags)
+{
+    return cs_base == -1;
+}
+
 static inline int mips_vpe_active(CPUMIPSState *env)
 {
     int active = 1;
diff --git a/target-moxie/cpu.h b/target-moxie/cpu.h
index 63d5cafc5528..f0f95a2701e1 100644
--- a/target-moxie/cpu.h
+++ b/target-moxie/cpu.h
@@ -136,6 +136,20 @@ static inline void cpu_get_tb_cpu_state(CPUMoxieState *env, target_ulong *pc,
     *flags = 0;
 }
 
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+                                                target_ulong *cs_base,
+                                                uint32_t *flags)
+{
+    *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+                                                   target_ulong cs_base,
+                                                   uint32_t flags)
+{
+    return cs_base == -1;
+}
+
 int moxie_cpu_handle_mmu_fault(CPUState *cpu, vaddr address,
                                int rw, int mmu_idx);
 
diff --git a/target-openrisc/cpu.h b/target-openrisc/cpu.h
index 9451a7cca691..a5650cbed4b7 100644
--- a/target-openrisc/cpu.h
+++ b/target-openrisc/cpu.h
@@ -398,6 +398,20 @@ static inline void cpu_get_tb_cpu_state(CPUOpenRISCState *env,
     *flags = (env->flags & D_FLAG);
 }
 
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+                                                target_ulong *cs_base,
+                                                uint32_t *flags)
+{
+    *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+                                                   target_ulong cs_base,
+                                                   uint32_t flags)
+{
+    return cs_base == -1;
+}
+
 static inline int cpu_mmu_index(CPUOpenRISCState *env, bool ifetch)
 {
     if (!(env->sr & SR_IME)) {
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 2666a3f80d00..092c7954d19a 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -2294,6 +2294,20 @@ static inline void cpu_get_tb_cpu_state(CPUPPCState *env, target_ulong *pc,
     *flags = env->hflags;
 }
 
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+                                                target_ulong *cs_base,
+                                                uint32_t *flags)
+{
+    *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+                                                   target_ulong cs_base,
+                                                   uint32_t flags)
+{
+    return cs_base == -1;
+}
+
 #if !defined(CONFIG_USER_ONLY)
 static inline int booke206_tlbm_id(CPUPPCState *env, ppcmas_tlb_t *tlbm)
 {
diff --git a/target-s390x/cpu.h b/target-s390x/cpu.h
index 8bcb0f75f399..574cdb51c967 100644
--- a/target-s390x/cpu.h
+++ b/target-s390x/cpu.h
@@ -393,6 +393,20 @@ static inline void cpu_get_tb_cpu_state(CPUS390XState* env, target_ulong *pc,
              ((env->psw.mask & PSW_MASK_32) ? FLAG_MASK_32 : 0);
 }
 
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+                                                target_ulong *cs_base,
+                                                uint32_t *flags)
+{
+    *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+                                                   target_ulong cs_base,
+                                                   uint32_t flags)
+{
+    return cs_base == -1;
+}
+
 /* While the PoO talks about ILC (a number between 1-3) what is actually
    stored in LowCore is shifted left one bit (an even between 2-6).  As
    this is the actual length of the insn and therefore more useful, that
diff --git a/target-sh4/cpu.h b/target-sh4/cpu.h
index 3f9dae2d1f0d..bcddbf31c853 100644
--- a/target-sh4/cpu.h
+++ b/target-sh4/cpu.h
@@ -387,4 +387,18 @@ static inline void cpu_get_tb_cpu_state(CPUSH4State *env, target_ulong *pc,
             | (env->movcal_backup ? TB_FLAG_PENDING_MOVCA : 0); /* Bit 4 */
 }
 
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+                                                target_ulong *cs_base,
+                                                uint32_t *flags)
+{
+    *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+                                                   target_ulong cs_base,
+                                                   uint32_t flags)
+{
+    return cs_base == -1;
+}
+
 #endif				/* _CPU_SH4_H */
diff --git a/target-sparc/cpu.h b/target-sparc/cpu.h
index 604de84624ca..09df29b7b439 100644
--- a/target-sparc/cpu.h
+++ b/target-sparc/cpu.h
@@ -749,6 +749,20 @@ static inline void cpu_get_tb_cpu_state(CPUSPARCState *env, target_ulong *pc,
 #endif
 }
 
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+                                                target_ulong *cs_base,
+                                                uint32_t *flags)
+{
+    *cs_base = -1; /* npc must be a multiple of 4 */
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+                                                   target_ulong cs_base,
+                                                   uint32_t flags)
+{
+    return cs_base == -1;
+}
+
 static inline bool tb_fpu_enabled(int tb_flags)
 {
 #if defined(CONFIG_USER_ONLY)
diff --git a/target-sparc/translate.c b/target-sparc/translate.c
index 0f4faf70624f..cc160bc67be0 100644
--- a/target-sparc/translate.c
+++ b/target-sparc/translate.c
@@ -38,6 +38,7 @@
 #define DYNAMIC_PC  1 /* dynamic pc value */
 #define JUMP_PC     2 /* dynamic pc value which takes only two values
                          according to jump_pc[T2] */
+/* NOTE: -1 is reserved for cpu_get_invalid_tb_cpu_state() */
 
 /* global register indexes */
 static TCGv_env cpu_env;
diff --git a/target-tilegx/cpu.h b/target-tilegx/cpu.h
index d74032925b12..79f84a422de1 100644
--- a/target-tilegx/cpu.h
+++ b/target-tilegx/cpu.h
@@ -174,4 +174,18 @@ static inline void cpu_get_tb_cpu_state(CPUTLGState *env, target_ulong *pc,
     *flags = 0;
 }
 
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+                                                target_ulong *cs_base,
+                                                uint32_t *flags)
+{
+    *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+                                                   target_ulong cs_base,
+                                                   uint32_t flags)
+{
+    return cs_base == -1;
+}
+
 #endif
diff --git a/target-tricore/cpu.h b/target-tricore/cpu.h
index a298d63eea24..8f3ca20241b7 100644
--- a/target-tricore/cpu.h
+++ b/target-tricore/cpu.h
@@ -410,6 +410,20 @@ static inline void cpu_get_tb_cpu_state(CPUTriCoreState *env, target_ulong *pc,
     *flags = 0;
 }
 
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+                                                target_ulong *cs_base,
+                                                uint32_t *flags)
+{
+    *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+                                                   target_ulong cs_base,
+                                                   uint32_t flags)
+{
+    return cs_base == -1;
+}
+
 TriCoreCPU *cpu_tricore_init(const char *cpu_model);
 
 #define cpu_init(cpu_model) CPU(cpu_tricore_init(cpu_model))
diff --git a/target-unicore32/cpu.h b/target-unicore32/cpu.h
index 83f758496a6c..6bcda6aa3e38 100644
--- a/target-unicore32/cpu.h
+++ b/target-unicore32/cpu.h
@@ -179,6 +179,20 @@ static inline void cpu_get_tb_cpu_state(CPUUniCore32State *env, target_ulong *pc
     }
 }
 
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+                                                target_ulong *cs_base,
+                                                uint32_t *flags)
+{
+    *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+                                                   target_ulong cs_base,
+                                                   uint32_t flags)
+{
+    return cs_base == -1;
+}
+
 int uc32_cpu_handle_mmu_fault(CPUState *cpu, vaddr address, int rw,
                               int mmu_idx);
 void uc32_translate_init(void);
diff --git a/target-xtensa/cpu.h b/target-xtensa/cpu.h
index ce9fb5b0033a..bd0a3b1deddf 100644
--- a/target-xtensa/cpu.h
+++ b/target-xtensa/cpu.h
@@ -582,6 +582,20 @@ static inline void cpu_get_tb_cpu_state(CPUXtensaState *env, target_ulong *pc,
     }
 }
 
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+                                                target_ulong *cs_base,
+                                                uint32_t *flags)
+{
+    *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+                                                   target_ulong cs_base,
+                                                   uint32_t flags)
+{
+    return cs_base == -1;
+}
+
 #include "exec/cpu-all.h"
 
 #endif
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [Qemu-devel] [PATCH v3 07/11] tcg: Prepare TB invalidation for lockless TB lookup
  2016-07-12 20:13 [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path Sergey Fedorov
                   ` (5 preceding siblings ...)
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 06/11] tcg: Introduce tb_mark_invalid() and tb_is_invalid() Sergey Fedorov
@ 2016-07-12 20:13 ` Sergey Fedorov
  2016-07-14 12:59   ` Alex Bennée
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 08/11] tcg: set up tb->page_addr before insertion Sergey Fedorov
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-12 20:13 UTC (permalink / raw)
  To: qemu-devel, mttcg, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth
  Cc: patches, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Alex Bennée, Sergey Fedorov,
	Peter Crosthwaite

From: Sergey Fedorov <serge.fdrv@gmail.com>

When invalidating a translation block, set an invalid CPU state into the
TranslationBlock structure first. All subsequent changes are ordered
after it with smp_wmb(). This pairs with the implied smp_rmb() of
qht_lookup() in tb_find_physical().

As soon as the TB is marked with an invalid CPU state, there is no need
to remove it from the CPU's 'tb_jmp_cache'. However, it will be necessary
to recheck whether the target TB is still valid after acquiring 'tb_lock'
but before calling tb_add_jump(), since TB lookup is to be performed out
of 'tb_lock' in the future. Note that we don't have to check 'last_tb':
it is safe to patch an already invalidated TB because it will not be
executed anyway.
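
The ordering contract above can be sketched in plain C11. This is an
illustrative model, not QEMU's actual code: TB_INVALID_FLAG stands in for
the per-target markers (e.g. HF_INVALID_MASK on x86, or cs_base == -1),
and the smp_*() macros are mapped onto C11 fences rather than QEMU's
include/qemu/atomic.h:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumed sentinel; mirrors the role of HF_INVALID_MASK / cs_base == -1. */
#define TB_INVALID_FLAG 0x2u

#define smp_wmb() atomic_thread_fence(memory_order_release)
#define smp_rmb() atomic_thread_fence(memory_order_acquire)

typedef struct {
    uint32_t flags;     /* CPU-state flags this TB was compiled for */
} TranslationBlock;

/* Invalidation side: flag the TB first; the hash and jump-list
 * unlinking is ordered after the write barrier. */
static void tb_mark_invalid(TranslationBlock *tb)
{
    tb->flags = TB_INVALID_FLAG;
    smp_wmb();
    /* ... remove from qht, page lists and jump lists ... */
}

/* Lookup side: after the smp_rmb() implied by qht_lookup(), re-check
 * validity under tb_lock before chaining with tb_add_jump(). */
static int tb_is_invalid(const TranslationBlock *tb)
{
    return tb->flags == TB_INVALID_FLAG;
}
```

A reader that observes TB_INVALID_FLAG simply skips tb_add_jump();
patching an already invalidated last_tb is harmless because it never runs.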

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
 cpu-exec.c      |  2 +-
 translate-all.c | 12 +++---------
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index c973e3b85922..07dc50c56e8d 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -352,7 +352,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
         /* Check if translation buffer has been flushed */
         if (cpu->tb_flushed) {
             cpu->tb_flushed = false;
-        } else {
+        } else if (!tb_is_invalid(tb)) {
             tb_add_jump(last_tb, tb_exit, tb);
         }
     }
diff --git a/translate-all.c b/translate-all.c
index 788fed1e0765..ee8308209350 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -986,11 +986,13 @@ static inline void tb_jmp_unlink(TranslationBlock *tb)
 /* invalidate one TB */
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 {
-    CPUState *cpu;
     PageDesc *p;
     uint32_t h;
     tb_page_addr_t phys_pc;
 
+    tb_mark_invalid(tb);
+    smp_wmb();
+
     /* remove the TB from the hash list */
     phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
     h = tb_hash_func(phys_pc, tb->pc, tb->flags);
@@ -1008,14 +1010,6 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
         invalidate_page_bitmap(p);
     }
 
-    /* remove the TB from the hash list */
-    h = tb_jmp_cache_hash_func(tb->pc);
-    CPU_FOREACH(cpu) {
-        if (atomic_read(&cpu->tb_jmp_cache[h]) == tb) {
-            atomic_set(&cpu->tb_jmp_cache[h], NULL);
-        }
-    }
-
     /* suppress this TB from the two jump lists */
     tb_remove_from_jmp_list(tb, 0);
     tb_remove_from_jmp_list(tb, 1);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [Qemu-devel] [PATCH v3 08/11] tcg: set up tb->page_addr before insertion
  2016-07-12 20:13 [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path Sergey Fedorov
                   ` (6 preceding siblings ...)
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 07/11] tcg: Prepare TB invalidation for lockless TB lookup Sergey Fedorov
@ 2016-07-12 20:13 ` Sergey Fedorov
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 09/11] tcg: cpu-exec: remove tb_lock from the hot-path Sergey Fedorov
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-12 20:13 UTC (permalink / raw)
  To: qemu-devel, mttcg, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth
  Cc: patches, Alex Bennée, mark.burton, pbonzini, jan.kiszka,
	peter.maydell, claudio.fontana, Sergey Fedorov,
	Peter Crosthwaite

From: Alex Bennée <alex.bennee@linaro.org>

This ensures that if we find the TB on the slow path, tb->page_addr is
correctly set before being tested.
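
This is the classic initialize-then-publish pattern: every field a
concurrent reader may test must be written before the TB becomes
reachable through the hash table. A minimal single-slot model
(illustrative names, not QEMU's qht API):

```c
#include <stdatomic.h>

typedef struct {
    long page_addr[2];
} TB;

/* One-slot "hash table": publication is a release store, so a reader
 * that observes the pointer also observes page_addr[] (just as
 * qht_insert() must now happen only after tb->page_addr[] is set in
 * tb_link_page()). */
static _Atomic(TB *) slot;

static void tb_publish(TB *tb, long p0, long p1)
{
    tb->page_addr[0] = p0;      /* initialize first... */
    tb->page_addr[1] = p1;
    atomic_store_explicit(&slot, tb, memory_order_release); /* ...then insert */
}

static TB *tb_lookup_published(void)
{
    return atomic_load_explicit(&slot, memory_order_acquire);
}
```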

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
 translate-all.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index ee8308209350..b7369fe5bd04 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1119,10 +1119,6 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
 {
     uint32_t h;
 
-    /* add in the hash table */
-    h = tb_hash_func(phys_pc, tb->pc, tb->flags);
-    qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);
-
     /* add in the page list */
     tb_alloc_page(tb, 0, phys_pc & TARGET_PAGE_MASK);
     if (phys_page2 != -1) {
@@ -1131,6 +1127,10 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
         tb->page_addr[1] = -1;
     }
 
+    /* add in the hash table */
+    h = tb_hash_func(phys_pc, tb->pc, tb->flags);
+    qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);
+
 #ifdef DEBUG_TB_CHECK
     tb_page_check();
 #endif
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [Qemu-devel] [PATCH v3 09/11] tcg: cpu-exec: remove tb_lock from the hot-path
  2016-07-12 20:13 [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path Sergey Fedorov
                   ` (7 preceding siblings ...)
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 08/11] tcg: set up tb->page_addr before insertion Sergey Fedorov
@ 2016-07-12 20:13 ` Sergey Fedorov
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 10/11] tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump() Sergey Fedorov
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-12 20:13 UTC (permalink / raw)
  To: qemu-devel, mttcg, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth
  Cc: patches, Alex Bennée, mark.burton, pbonzini, jan.kiszka,
	peter.maydell, claudio.fontana, Sergey Fedorov,
	Peter Crosthwaite

From: Alex Bennée <alex.bennee@linaro.org>

Lock contention in the hot path of moving between existing patched
TranslationBlocks is the main drag on multithreaded performance. This
patch pushes the tb_lock() usage down to the two places that really need
it:

  - code generation (tb_gen_code)
  - jump patching (tb_add_jump)

The rest of the code doesn't really need to hold a lock, as it either
uses per-CPU structures, is updated atomically, or is designed for
concurrent read situations (qht_lookup).

To keep things simple I removed the #ifdef CONFIG_USER_ONLY stuff as the
locks become NOPs anyway until the MTTCG work is completed.
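
The resulting control flow in tb_find_slow() is double-checked locking:
a lock-free lookup first, then a re-check under the lock before
translating. A stand-alone sketch of that shape (hypothetical names; the
spinlock merely stands in for tb_lock/mmap_lock):

```c
#include <stdatomic.h>
#include <stdlib.h>

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;
static _Atomic(void *) cached;   /* stands in for qht / tb_jmp_cache */

static void fake_tb_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire)) {
        /* spin */
    }
}

static void fake_tb_unlock(void)
{
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

static void *generate(void)      /* stands in for tb_gen_code() */
{
    return malloc(1);
}

static void *find_or_generate(void)
{
    /* Fast path: no lock taken at all. */
    void *tb = atomic_load_explicit(&cached, memory_order_acquire);
    if (!tb) {
        fake_tb_lock();
        /* Re-check: another thread may have translated this block
         * while we were waiting for the lock. */
        tb = atomic_load_explicit(&cached, memory_order_relaxed);
        if (!tb) {
            tb = generate();
            atomic_store_explicit(&cached, tb, memory_order_release);
        }
        fake_tb_unlock();
    }
    return tb;
}
```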

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>

---
v2 (hot path)
  - Add r-b tags
v1 (hot path, split from base-patches series)
  - revert name tweaking
  - drop test jmp_list_next outside lock
  - mention lock NOPs in comments
v3 (base-patches)
  - fix merge conflicts with Sergey's patch
---
 cpu-exec.c | 48 +++++++++++++++++++++---------------------------
 1 file changed, 21 insertions(+), 27 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 07dc50c56e8d..4eabd534aba0 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -286,35 +286,29 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
     TranslationBlock *tb;
 
     tb = tb_find_physical(cpu, pc, cs_base, flags);
-    if (tb) {
-        goto found;
-    }
+    if (!tb) {
 
-#ifdef CONFIG_USER_ONLY
-    /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
-     * taken outside tb_lock.  Since we're momentarily dropping
-     * tb_lock, there's a chance that our desired tb has been
-     * translated.
-     */
-    tb_unlock();
-    mmap_lock();
-    tb_lock();
-    tb = tb_find_physical(cpu, pc, cs_base, flags);
-    if (tb) {
-        mmap_unlock();
-        goto found;
-    }
-#endif
+        /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
+         * taken outside tb_lock. As system emulation is currently
+         * single threaded the locks are NOPs.
+         */
+        mmap_lock();
+        tb_lock();
 
-    /* if no translated code available, then translate it now */
-    tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
+        /* There's a chance that our desired tb has been translated while
+         * taking the locks so we check again inside the lock.
+         */
+        tb = tb_find_physical(cpu, pc, cs_base, flags);
+        if (!tb) {
+            /* if no translated code available, then translate it now */
+            tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
+        }
 
-#ifdef CONFIG_USER_ONLY
-    mmap_unlock();
-#endif
+        tb_unlock();
+        mmap_unlock();
+    }
 
-found:
-    /* we add the TB in the virtual pc hash table */
+    /* We add the TB in the virtual pc hash table for the fast lookup */
     atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
     return tb;
 }
@@ -332,7 +326,6 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
        always be the same before a given translated block
        is executed. */
     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
-    tb_lock();
     tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
     if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
                  tb->flags != flags)) {
@@ -349,14 +342,15 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
 #endif
     /* See if we can patch the calling TB. */
     if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
+        tb_lock();
         /* Check if translation buffer has been flushed */
         if (cpu->tb_flushed) {
             cpu->tb_flushed = false;
         } else if (!tb_is_invalid(tb)) {
             tb_add_jump(last_tb, tb_exit, tb);
         }
+        tb_unlock();
     }
-    tb_unlock();
     return tb;
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [Qemu-devel] [PATCH v3 10/11] tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump()
  2016-07-12 20:13 [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path Sergey Fedorov
                   ` (8 preceding siblings ...)
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 09/11] tcg: cpu-exec: remove tb_lock from the hot-path Sergey Fedorov
@ 2016-07-12 20:13 ` Sergey Fedorov
  2016-07-14 13:01   ` Alex Bennée
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 11/11] tcg: Merge tb_find_slow() and tb_find_fast() Sergey Fedorov
  2016-07-13  7:39 ` [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path Paolo Bonzini
  11 siblings, 1 reply; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-12 20:13 UTC (permalink / raw)
  To: qemu-devel, mttcg, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth
  Cc: patches, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Alex Bennée, Sergey Fedorov,
	Peter Crosthwaite

From: Sergey Fedorov <serge.fdrv@gmail.com>

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
 cpu-exec.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 4eabd534aba0..22c672fe03fd 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -281,7 +281,8 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
 static TranslationBlock *tb_find_slow(CPUState *cpu,
                                       target_ulong pc,
                                       target_ulong cs_base,
-                                      uint32_t flags)
+                                      uint32_t flags,
+                                      bool *have_tb_lock)
 {
     TranslationBlock *tb;
 
@@ -294,6 +295,7 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
          */
         mmap_lock();
         tb_lock();
+        *have_tb_lock = true;
 
         /* There's a chance that our desired tb has been translated while
          * taking the locks so we check again inside the lock.
@@ -304,7 +306,6 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
             tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
         }
 
-        tb_unlock();
         mmap_unlock();
     }
 
@@ -321,6 +322,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
     TranslationBlock *tb;
     target_ulong cs_base, pc;
     uint32_t flags;
+    bool have_tb_lock = false;
 
     /* we record a subset of the CPU state. It will
        always be the same before a given translated block
@@ -329,7 +331,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
     tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
     if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
                  tb->flags != flags)) {
-        tb = tb_find_slow(cpu, pc, cs_base, flags);
+        tb = tb_find_slow(cpu, pc, cs_base, flags, &have_tb_lock);
     }
 #ifndef CONFIG_USER_ONLY
     /* We don't take care of direct jumps when address mapping changes in
@@ -342,13 +344,18 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
 #endif
     /* See if we can patch the calling TB. */
     if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
-        tb_lock();
+        if (!have_tb_lock) {
+            tb_lock();
+            have_tb_lock = true;
+        }
         /* Check if translation buffer has been flushed */
         if (cpu->tb_flushed) {
             cpu->tb_flushed = false;
         } else if (!tb_is_invalid(tb)) {
             tb_add_jump(last_tb, tb_exit, tb);
         }
+    }
+    if (have_tb_lock) {
         tb_unlock();
     }
     return tb;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [Qemu-devel] [PATCH v3 11/11] tcg: Merge tb_find_slow() and tb_find_fast()
  2016-07-12 20:13 [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path Sergey Fedorov
                   ` (9 preceding siblings ...)
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 10/11] tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump() Sergey Fedorov
@ 2016-07-12 20:13 ` Sergey Fedorov
  2016-07-14 13:02   ` Alex Bennée
  2016-07-13  7:39 ` [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path Paolo Bonzini
  11 siblings, 1 reply; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-12 20:13 UTC (permalink / raw)
  To: qemu-devel, mttcg, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth
  Cc: patches, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Alex Bennée, Sergey Fedorov,
	Peter Crosthwaite

From: Sergey Fedorov <serge.fdrv@gmail.com>

These functions are not too big and can be merged. This makes the
locking scheme clearer and easier to follow.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
 cpu-exec.c | 72 ++++++++++++++++++++++++++------------------------------------
 1 file changed, 30 insertions(+), 42 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 22c672fe03fd..6b01e8ceb0e8 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -278,45 +278,9 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
     return qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h);
 }
 
-static TranslationBlock *tb_find_slow(CPUState *cpu,
-                                      target_ulong pc,
-                                      target_ulong cs_base,
-                                      uint32_t flags,
-                                      bool *have_tb_lock)
-{
-    TranslationBlock *tb;
-
-    tb = tb_find_physical(cpu, pc, cs_base, flags);
-    if (!tb) {
-
-        /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
-         * taken outside tb_lock. As system emulation is currently
-         * single threaded the locks are NOPs.
-         */
-        mmap_lock();
-        tb_lock();
-        *have_tb_lock = true;
-
-        /* There's a chance that our desired tb has been translated while
-         * taking the locks so we check again inside the lock.
-         */
-        tb = tb_find_physical(cpu, pc, cs_base, flags);
-        if (!tb) {
-            /* if no translated code available, then translate it now */
-            tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
-        }
-
-        mmap_unlock();
-    }
-
-    /* We add the TB in the virtual pc hash table for the fast lookup */
-    atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
-    return tb;
-}
-
-static inline TranslationBlock *tb_find_fast(CPUState *cpu,
-                                             TranslationBlock *last_tb,
-                                             int tb_exit)
+static inline TranslationBlock *tb_find(CPUState *cpu,
+                                        TranslationBlock *last_tb,
+                                        int tb_exit)
 {
     CPUArchState *env = (CPUArchState *)cpu->env_ptr;
     TranslationBlock *tb;
@@ -331,7 +295,31 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
     tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
     if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
                  tb->flags != flags)) {
-        tb = tb_find_slow(cpu, pc, cs_base, flags, &have_tb_lock);
+        tb = tb_find_physical(cpu, pc, cs_base, flags);
+        if (!tb) {
+
+            /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
+             * taken outside tb_lock. As system emulation is currently
+             * single threaded the locks are NOPs.
+             */
+            mmap_lock();
+            tb_lock();
+            have_tb_lock = true;
+
+            /* There's a chance that our desired tb has been translated while
+             * taking the locks so we check again inside the lock.
+             */
+            tb = tb_find_physical(cpu, pc, cs_base, flags);
+            if (!tb) {
+                /* if no translated code available, then translate it now */
+                tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
+            }
+
+            mmap_unlock();
+        }
+
+        /* We add the TB in the virtual pc hash table for the fast lookup */
+        atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
     }
 #ifndef CONFIG_USER_ONLY
     /* We don't take care of direct jumps when address mapping changes in
@@ -436,7 +424,7 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
     } else if (replay_has_exception()
                && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
         /* try to cause an exception pending in the log */
-        cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, NULL, 0), true);
+        cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
         *ret = -1;
         return true;
 #endif
@@ -620,7 +608,7 @@ int cpu_exec(CPUState *cpu)
             atomic_mb_set(&cpu->tb_flushed, false); /* reset before first TB lookup */
             for(;;) {
                 cpu_handle_interrupt(cpu, &last_tb);
-                tb = tb_find_fast(cpu, last_tb, tb_exit);
+                tb = tb_find(cpu, last_tb, tb_exit);
                 cpu_loop_exec_tb(cpu, tb, &last_tb, &tb_exit, &sc);
                 /* Try to align the host and virtual clocks
                    if the guest is in advance */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/11] util/qht: Document memory ordering assumptions
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 01/11] util/qht: Document memory ordering assumptions Sergey Fedorov
@ 2016-07-12 23:19   ` Emilio G. Cota
  2016-07-13  7:36     ` Paolo Bonzini
  2016-07-13 11:13   ` Paolo Bonzini
  1 sibling, 1 reply; 49+ messages in thread
From: Emilio G. Cota @ 2016-07-12 23:19 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, serge.fdrv, bobby.prani,
	rth, patches, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Alex Bennée

On Tue, Jul 12, 2016 at 23:13:36 +0300, Sergey Fedorov wrote:
> From: Sergey Fedorov <serge.fdrv@gmail.com>
> It is naturally expected that some memory ordering should be provided
> around qht_insert(), qht_remove(), and qht_lookup(). Document these
> assumptions in the header file and put some comments in the source to
> denote how those memory ordering requirements are fulfilled.

I wouldn't put those comments in the source--seqlock callers should
know what they're doing, and what barriers seqlocks imply.

I'm OK with stating what the implied ordering is in the header
file, though.

Thanks,

		Emilio


* Re: [Qemu-devel] [PATCH v3 01/11] util/qht: Document memory ordering assumptions
  2016-07-12 23:19   ` Emilio G. Cota
@ 2016-07-13  7:36     ` Paolo Bonzini
  2016-07-13 17:50       ` Sergey Fedorov
  0 siblings, 1 reply; 49+ messages in thread
From: Paolo Bonzini @ 2016-07-13  7:36 UTC (permalink / raw)
  To: Emilio G. Cota, Sergey Fedorov
  Cc: mttcg, peter.maydell, claudio.fontana, patches, jan.kiszka,
	mark.burton, a.rigo, qemu-devel, serge.fdrv, bobby.prani, rth,
	Alex Bennée, fred.konrad



On 13/07/2016 01:19, Emilio G. Cota wrote:
> I wouldn't put those comments in the source--seqlock callers should
> know what they're doing, and what barriers seqlocks imply.

In general I'd agree with you, however in this case the "begin" calls
are what implements QHT's guarantee *for the caller*, so I think it's
worth having the comments.  In other words, if for any reason you do
anything before the read_begin and write_begin you still have to provide
barrier semantics.  It's not an explanation, it's a protection against
future mistakes.

There's no need for such comment at read_retry and write_end callsites,
though.

Also, it's spelled "guarantee". :)
Paolo


> I'm OK with stating what the implied ordering is in the header
> file, though.


* Re: [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path
  2016-07-12 20:13 [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path Sergey Fedorov
                   ` (10 preceding siblings ...)
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 11/11] tcg: Merge tb_find_slow() and tb_find_fast() Sergey Fedorov
@ 2016-07-13  7:39 ` Paolo Bonzini
  2016-07-13 17:00   ` Sergey Fedorov
                     ` (2 more replies)
  11 siblings, 3 replies; 49+ messages in thread
From: Paolo Bonzini @ 2016-07-13  7:39 UTC (permalink / raw)
  To: Sergey Fedorov, qemu-devel, mttcg, fred.konrad, a.rigo,
	serge.fdrv, cota, bobby.prani, rth
  Cc: peter.maydell, patches, claudio.fontana, mark.burton, jan.kiszka,
	Alex Bennée



On 12/07/2016 22:13, Sergey Fedorov wrote:
> From: Sergey Fedorov <serge.fdrv@gmail.com>
> 
> Hi,
> 
> This is my respin of Alex's v2 series [1].
> 
> The first 8 patches are preparation for the patch 9, the subject matter
> of this series, which enables lockless translation block lookup. The
> main change here is that Paolo's suggestion is implemented: TBs are
> marked with invalid CPU state early during invalidation. This allows to
> make lockless lookup safe from races on 'tb_jmp_cache' and direct block
> chaining.

Thanks for looking at the suggestion again and especially for perfecting it!

> The patch 10 is a simple solution to avoid unnecessary bouncing on
> 'tb_lock' between tb_gen_code() and tb_add_jump(). A local variable is
> used to keep track of whether 'tb_lock' has already been taken.
> 
> The last patch is my attempt to restructure tb_find_{fast,slow}() into a
> single function tb_find(). I think it will be easier to follow the
> locking scheme this way. However, I am afraid this last patch can be
> controversial, so it can be simply dropped.

Actually I agree entirely with it.

If anything, for historical reasons one might rename tb_find_physical to
tb_find_slow and leave the tb_find_fast name, but I think the patch is
good as is.

Have you measured performance with the series?  In any case, it's nice
to see MTTCG finally taking shape!

Paolo


* Re: [Qemu-devel] [PATCH v3 01/11] util/qht: Document memory ordering assumptions
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 01/11] util/qht: Document memory ordering assumptions Sergey Fedorov
  2016-07-12 23:19   ` Emilio G. Cota
@ 2016-07-13 11:13   ` Paolo Bonzini
  2016-07-13 18:03     ` Sergey Fedorov
  2016-07-15 12:37     ` Sergey Fedorov
  1 sibling, 2 replies; 49+ messages in thread
From: Paolo Bonzini @ 2016-07-13 11:13 UTC (permalink / raw)
  To: Sergey Fedorov, qemu-devel, mttcg, fred.konrad, a.rigo,
	serge.fdrv, cota, bobby.prani, rth
  Cc: peter.maydell, patches, claudio.fontana, mark.burton, jan.kiszka,
	Alex Bennée



On 12/07/2016 22:13, Sergey Fedorov wrote:
> diff --git a/include/qemu/qht.h b/include/qemu/qht.h
> index 70bfc68b8d67..5f633e5d8100 100644
> --- a/include/qemu/qht.h
> +++ b/include/qemu/qht.h
> @@ -69,6 +69,9 @@ void qht_destroy(struct qht *ht);
>   * Attempting to insert a NULL @p is a bug.
>   * Inserting the same pointer @p with different @hash values is a bug.
>   *
> + * In case of successful operation, smp_wmb() is implied before the pointer is
> + * inserted into the hash table.
> + *
>   * Returns true on sucess.
>   * Returns false if the @p-@hash pair already exists in the hash table.
>   */
> @@ -86,6 +89,9 @@ bool qht_insert(struct qht *ht, void *p, uint32_t hash);
>   * The user-provided @func compares pointers in QHT against @userp.
>   * If the function returns true, a match has been found.
>   *
> + * smp_rmb() is implied before and after the pointer is looked up and retrieved
> + * from the hash table.

Do we really need to guarantee smp_rmb(), or is smp_read_barrier_depends()
aka atomic_rcu_read() enough?

Likewise, perhaps only an implicit smp_wmb() before the insert is
"interesting" to qht_insert__locked callers.

Something like:

diff --git a/include/qemu/qht.h b/include/qemu/qht.h
index 70bfc68..f4f1d55 100644
--- a/include/qemu/qht.h
+++ b/include/qemu/qht.h
@@ -69,6 +69,9 @@ void qht_destroy(struct qht *ht);
  * Attempting to insert a NULL @p is a bug.
  * Inserting the same pointer @p with different @hash values is a bug.
  *
+ * In case of successful operation, smp_wmb() is implied before the pointer is
+ * inserted into the hash table.
+ *
  * Returns true on sucess.
  * Returns false if the @p-@hash pair already exists in the hash table.
  */
@@ -83,6 +86,8 @@ bool qht_insert(struct qht *ht, void *p, uint32_t hash);
  *
  * Needs to be called under an RCU read-critical section.
  *
+ * smp_read_barrier_depends() is implied before the call to @func.
+ *
  * The user-provided @func compares pointers in QHT against @userp.
  * If the function returns true, a match has been found.
  *
@@ -105,6 +110,10 @@ void *qht_lookup(struct qht *ht, qht_lookup_func_t func, const void *userp,
  * This guarantees that concurrent lookups will always compare against valid
  * data.
  *
+ * In case of successful operation, a smp_wmb() barrier is implied before and
+ * after the pointer is removed from the hash table.  In other words,
+ * a successful qht_remove acts as a bidirectional write barrier.
+ *
  * Returns true on success.
  * Returns false if the @p-@hash pair was not found.
  */
diff --git a/util/qht.c b/util/qht.c
index 40d6e21..d38948e 100644
--- a/util/qht.c
+++ b/util/qht.c
@@ -445,7 +445,11 @@ void *qht_do_lookup(struct qht_bucket *head, qht_lookup_func_t func,
     do {
         for (i = 0; i < QHT_BUCKET_ENTRIES; i++) {
             if (b->hashes[i] == hash) {
-                void *p = atomic_read(&b->pointers[i]);
+                /* The pointer is dereferenced before seqlock_read_retry,
+                 * so (unlike qht_insert__locked) we need to use
+                 * atomic_rcu_read here.
+                 */
+                void *p = atomic_rcu_read(&b->pointers[i]);
 
                 if (likely(p) && likely(func(p, userp))) {
                     return p;
@@ -535,6 +539,7 @@ static bool qht_insert__locked(struct qht *ht, struct qht_map *map,
         atomic_rcu_set(&prev->next, b);
     }
     b->hashes[i] = hash;
+    /* smp_wmb() implicit in seqlock_write_begin.  */
     atomic_set(&b->pointers[i], p);
     seqlock_write_end(&head->sequence);
     return true;
@@ -659,6 +664,9 @@ bool qht_remove__locked(struct qht_map *map, struct qht_bucket *head,
             }
             if (q == p) {
                 qht_debug_assert(b->hashes[i] == hash);
+                /* seqlock_write_begin and seqlock_write_end provide write
+                 * memory barrier semantics to callers of qht_remove.
+                 */
                 seqlock_write_begin(&head->sequence);
                 qht_bucket_remove_entry(b, i);
                 seqlock_write_end(&head->sequence);


* Re: [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path
  2016-07-13  7:39 ` [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path Paolo Bonzini
@ 2016-07-13 17:00   ` Sergey Fedorov
  2016-07-14  9:55     ` Alex Bennée
  2016-07-13 18:06   ` Sergey Fedorov
  2016-07-14 12:02   ` Alex Bennée
  2 siblings, 1 reply; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-13 17:00 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée
  Cc: Sergey Fedorov, qemu-devel, mttcg, fred.konrad, a.rigo, cota,
	bobby.prani, rth, peter.maydell, patches, claudio.fontana,
	mark.burton, jan.kiszka

On 13/07/16 10:39, Paolo Bonzini wrote:
> On 12/07/2016 22:13, Sergey Fedorov wrote:
>> From: Sergey Fedorov <serge.fdrv@gmail.com>
>>
>> Hi,
>>
>> This is my respin of Alex's v2 series [1].
>>
>> The first 8 patches are preparation for the patch 9, the subject matter
>> of this series, which enables lockless translation block lookup. The
>> main change here is that Paolo's suggestion is implemented: TBs are
>> marked with invalid CPU state early during invalidation. This allows to
>> make lockless lookup safe from races on 'tb_jmp_cache' and direct block
>> chaining.
> Thanks for looking at the suggestion again and especially for perfecting it!
>
>> The patch 10 is a simple solution to avoid unnecessary bouncing on
>> 'tb_lock' between tb_gen_code() and tb_add_jump(). A local variable is
>> used to keep track of whether 'tb_lock' has already been taken.
>>
>> The last patch is my attempt to restructure tb_find_{fast,slow}() into a
>> single function tb_find(). I think it will be easier to follow the
>> locking scheme this way. However, I am afraid this last patch can be
>> controversial, so it can be simply dropped.
> Actually I agree entirely with it.
>
> If anything, for historical reasons one might rename tb_find_physical to
> tb_find_slow and leave the tb_find_fast name, but I think the patch is
> good as is.

Nice to hear that :)

>
> Have you measured performance with the series?  In any case, it's nice
> to see MTTCG finally taking shape!

No, I didn't measure the performance. Maybe Alex can help me with this?

Thanks,
Sergey


* Re: [Qemu-devel] [PATCH v3 01/11] util/qht: Document memory ordering assumptions
  2016-07-13  7:36     ` Paolo Bonzini
@ 2016-07-13 17:50       ` Sergey Fedorov
  2016-07-14 13:56         ` Paolo Bonzini
  0 siblings, 1 reply; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-13 17:50 UTC (permalink / raw)
  To: Paolo Bonzini, Emilio G. Cota, Sergey Fedorov
  Cc: mttcg, peter.maydell, claudio.fontana, patches, jan.kiszka,
	mark.burton, a.rigo, qemu-devel, bobby.prani, rth,
	Alex Bennée, fred.konrad

On 13/07/16 10:36, Paolo Bonzini wrote:
>
> On 13/07/2016 01:19, Emilio G. Cota wrote:
>> I wouldn't put those comments in the source--seqlock callers should
>> know what they're doing, and what barriers seqlocks imply.
> In general I'd agree with you, however in this case the "begin" calls
> are what implements QHT's guarantee *for the caller*, so I think it's
> worth having the comments.  In other words, if for any reason you do
> anything before the read_begin and write_begin you still have to provide
> barrier semantics.  It's not an explanation, it's a protection against
> future mistakes.

Exactly :)

> There's no need for such comment at read_retry and write_end callsites,
> though.

Why?

> Also, it's spelled "guarantee". :)

Hmm, I can't see where the spelling isn't correct.

Thanks,
Sergey


* Re: [Qemu-devel] [PATCH v3 01/11] util/qht: Document memory ordering assumptions
  2016-07-13 11:13   ` Paolo Bonzini
@ 2016-07-13 18:03     ` Sergey Fedorov
  2016-07-14  8:05       ` Paolo Bonzini
  2016-07-15 12:37     ` Sergey Fedorov
  1 sibling, 1 reply; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-13 18:03 UTC (permalink / raw)
  To: Paolo Bonzini, Sergey Fedorov, qemu-devel, mttcg, fred.konrad,
	a.rigo, cota, bobby.prani, rth
  Cc: peter.maydell, patches, claudio.fontana, mark.burton, jan.kiszka,
	Alex Bennée

On 13/07/16 14:13, Paolo Bonzini wrote:
>
> On 12/07/2016 22:13, Sergey Fedorov wrote:
>> diff --git a/include/qemu/qht.h b/include/qemu/qht.h
>> index 70bfc68b8d67..5f633e5d8100 100644
>> --- a/include/qemu/qht.h
>> +++ b/include/qemu/qht.h
>> @@ -69,6 +69,9 @@ void qht_destroy(struct qht *ht);
>>   * Attempting to insert a NULL @p is a bug.
>>   * Inserting the same pointer @p with different @hash values is a bug.
>>   *
>> + * In case of successful operation, smp_wmb() is implied before the pointer is
>> + * inserted into the hash table.
>> + *
>>   * Returns true on sucess.
>>   * Returns false if the @p-@hash pair already exists in the hash table.
>>   */
>> @@ -86,6 +89,9 @@ bool qht_insert(struct qht *ht, void *p, uint32_t hash);
>>   * The user-provided @func compares pointers in QHT against @userp.
>>   * If the function returns true, a match has been found.
>>   *
>> + * smp_rmb() is implied before and after the pointer is looked up and retrieved
>> + * from the hash table.
> Do we really need to guarantee smp_rmb(), or is smp_read_barrier_depends()
> aka atomic_rcu_read() enough?

The idea was something like: qht_lookup() can be "paired" with either
qht_insert() or qht_remove(). The intention was to guarantee that the
independent tb_jmp_cache lookup is performed before qht_lookup() in
tb_find_physical().

>
> Likewise, perhaps only an implicit smp_wmb() before the insert is
> "interesting" to qht_insert__locked callers .

I see the idea behind this patch is worthwhile, so I'll spend more time
refining it, and I must look at your patch carefully ;)

Thanks,
Sergey

>
> Something like:
>
> diff --git a/include/qemu/qht.h b/include/qemu/qht.h
> index 70bfc68..f4f1d55 100644
> --- a/include/qemu/qht.h
> +++ b/include/qemu/qht.h
> @@ -69,6 +69,9 @@ void qht_destroy(struct qht *ht);
>   * Attempting to insert a NULL @p is a bug.
>   * Inserting the same pointer @p with different @hash values is a bug.
>   *
> + * In case of successful operation, smp_wmb() is implied before the pointer is
> + * inserted into the hash table.
> + *
>   * Returns true on sucess.
>   * Returns false if the @p-@hash pair already exists in the hash table.
>   */
> @@ -83,6 +86,8 @@ bool qht_insert(struct qht *ht, void *p, uint32_t hash);
>   *
>   * Needs to be called under an RCU read-critical section.
>   *
> + * smp_read_barrier_depends() is implied before the call to @func.
> + *
>   * The user-provided @func compares pointers in QHT against @userp.
>   * If the function returns true, a match has been found.
>   *
> @@ -105,6 +110,10 @@ void *qht_lookup(struct qht *ht, qht_lookup_func_t func, const void *userp,
>   * This guarantees that concurrent lookups will always compare against valid
>   * data.
>   *
> + * In case of successful operation, a smp_wmb() barrier is implied before and
> + * after the pointer is removed from the hash table.  In other words,
> + * a successful qht_remove acts as a bidirectional write barrier.
> + *
>   * Returns true on success.
>   * Returns false if the @p-@hash pair was not found.
>   */
> diff --git a/util/qht.c b/util/qht.c
> index 40d6e21..d38948e 100644
> --- a/util/qht.c
> +++ b/util/qht.c
> @@ -445,7 +445,11 @@ void *qht_do_lookup(struct qht_bucket *head, qht_lookup_func_t func,
>      do {
>          for (i = 0; i < QHT_BUCKET_ENTRIES; i++) {
>              if (b->hashes[i] == hash) {
> -                void *p = atomic_read(&b->pointers[i]);
> +                /* The pointer is dereferenced before seqlock_read_retry,
> +                 * so (unlike qht_insert__locked) we need to use
> +                 * atomic_rcu_read here.
> +                 */
> +                void *p = atomic_rcu_read(&b->pointers[i]);
>  
>                  if (likely(p) && likely(func(p, userp))) {
>                      return p;
> @@ -535,6 +539,7 @@ static bool qht_insert__locked(struct qht *ht, struct qht_map *map,
>          atomic_rcu_set(&prev->next, b);
>      }
>      b->hashes[i] = hash;
> +    /* smp_wmb() implicit in seqlock_write_begin.  */
>      atomic_set(&b->pointers[i], p);
>      seqlock_write_end(&head->sequence);
>      return true;
> @@ -659,6 +664,9 @@ bool qht_remove__locked(struct qht_map *map, struct qht_bucket *head,
>              }
>              if (q == p) {
>                  qht_debug_assert(b->hashes[i] == hash);
> +                /* seqlock_write_begin and seqlock_write_end provide write
> +                 * memory barrier semantics to callers of qht_remove.
> +                 */
>                  seqlock_write_begin(&head->sequence);
>                  qht_bucket_remove_entry(b, i);
>                  seqlock_write_end(&head->sequence);


* Re: [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path
  2016-07-13  7:39 ` [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path Paolo Bonzini
  2016-07-13 17:00   ` Sergey Fedorov
@ 2016-07-13 18:06   ` Sergey Fedorov
  2016-07-14 12:02   ` Alex Bennée
  2 siblings, 0 replies; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-13 18:06 UTC (permalink / raw)
  To: Paolo Bonzini, Sergey Fedorov, qemu-devel, mttcg, fred.konrad,
	a.rigo, cota, bobby.prani, rth
  Cc: peter.maydell, patches, claudio.fontana, mark.burton, jan.kiszka,
	Alex Bennée

On 13/07/16 10:39, Paolo Bonzini wrote:
> If anything, for historical reasons one might rename tb_find_physical to
> tb_find_slow and leave the tb_find_fast name, but I think the patch is
> good as is.

I think tb_find_htable() or tb_find_global_htable() could also be good
options if we're going to rename tb_find_physical().

Kind regards,
Sergey


* Re: [Qemu-devel] [PATCH v3 01/11] util/qht: Document memory ordering assumptions
  2016-07-13 18:03     ` Sergey Fedorov
@ 2016-07-14  8:05       ` Paolo Bonzini
  0 siblings, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2016-07-14  8:05 UTC (permalink / raw)
  To: Sergey Fedorov, Sergey Fedorov, qemu-devel, mttcg, fred.konrad,
	a.rigo, cota, bobby.prani, rth
  Cc: peter.maydell, patches, claudio.fontana, mark.burton, jan.kiszka,
	Alex Bennée



On 13/07/2016 20:03, Sergey Fedorov wrote:
> On 13/07/16 14:13, Paolo Bonzini wrote:
>>
>> On 12/07/2016 22:13, Sergey Fedorov wrote:
>>> diff --git a/include/qemu/qht.h b/include/qemu/qht.h
>>> index 70bfc68b8d67..5f633e5d8100 100644
>>> --- a/include/qemu/qht.h
>>> +++ b/include/qemu/qht.h
>>> @@ -69,6 +69,9 @@ void qht_destroy(struct qht *ht);
>>>   * Attempting to insert a NULL @p is a bug.
>>>   * Inserting the same pointer @p with different @hash values is a bug.
>>>   *
>>> + * In case of successful operation, smp_wmb() is implied before the pointer is
>>> + * inserted into the hash table.
>>> + *
>>>   * Returns true on sucess.
>>>   * Returns false if the @p-@hash pair already exists in the hash table.
>>>   */
>>> @@ -86,6 +89,9 @@ bool qht_insert(struct qht *ht, void *p, uint32_t hash);
>>>   * The user-provided @func compares pointers in QHT against @userp.
>>>   * If the function returns true, a match has been found.
>>>   *
>>> + * smp_rmb() is implied before and after the pointer is looked up and retrieved
>>> + * from the hash table.
>> Do we really need to guarantee smp_rmb(), or is smp_read_barrier_depends()
>> aka atomic_rcu_read() enough?
> 
> The idea was something like: qht_lookup() can be "paired" with either
> qht_insert() or qht_remove(). The intention was to guarantee that the
> independent tb_jmp_cache lookup is performed before qht_lookup() in
> tb_find_physical().

Of course; that would not be impacted if qht_lookup() only made a weaker
promise.

Anyway, this patch (or mine or a mix of them) can go in after hard freeze.

Paolo

>>
>> Likewise, perhaps only an implicit smp_wmb() before the insert is
>> "interesting" to qht_insert__locked callers .
> 
> I see the idea behind this patch is worthwhile, so I'll spend more time
> refining it, and I must look at your patch carefully ;)
> 
> Thanks,
> Sergey
> 
>>
>> Something like:
>>
>> diff --git a/include/qemu/qht.h b/include/qemu/qht.h
>> index 70bfc68..f4f1d55 100644
>> --- a/include/qemu/qht.h
>> +++ b/include/qemu/qht.h
>> @@ -69,6 +69,9 @@ void qht_destroy(struct qht *ht);
>>   * Attempting to insert a NULL @p is a bug.
>>   * Inserting the same pointer @p with different @hash values is a bug.
>>   *
>> + * In case of successful operation, smp_wmb() is implied before the pointer is
>> + * inserted into the hash table.
>> + *
>>   * Returns true on sucess.
>>   * Returns false if the @p-@hash pair already exists in the hash table.
>>   */
>> @@ -83,6 +86,8 @@ bool qht_insert(struct qht *ht, void *p, uint32_t hash);
>>   *
>>   * Needs to be called under an RCU read-critical section.
>>   *
>> + * smp_read_barrier_depends() is implied before the call to @func.
>> + *
>>   * The user-provided @func compares pointers in QHT against @userp.
>>   * If the function returns true, a match has been found.
>>   *
>> @@ -105,6 +110,10 @@ void *qht_lookup(struct qht *ht, qht_lookup_func_t func, const void *userp,
>>   * This guarantees that concurrent lookups will always compare against valid
>>   * data.
>>   *
>> + * In case of successful operation, a smp_wmb() barrier is implied before and
>> + * after the pointer is removed from the hash table.  In other words,
>> + * a successful qht_remove acts as a bidirectional write barrier.
>> + *
>>   * Returns true on success.
>>   * Returns false if the @p-@hash pair was not found.
>>   */
>> diff --git a/util/qht.c b/util/qht.c
>> index 40d6e21..d38948e 100644
>> --- a/util/qht.c
>> +++ b/util/qht.c
>> @@ -445,7 +445,11 @@ void *qht_do_lookup(struct qht_bucket *head, qht_lookup_func_t func,
>>      do {
>>          for (i = 0; i < QHT_BUCKET_ENTRIES; i++) {
>>              if (b->hashes[i] == hash) {
>> -                void *p = atomic_read(&b->pointers[i]);
>> +                /* The pointer is dereferenced before seqlock_read_retry,
>> +                 * so (unlike qht_insert__locked) we need to use
>> +                 * atomic_rcu_read here.
>> +                 */
>> +                void *p = atomic_rcu_read(&b->pointers[i]);
>>  
>>                  if (likely(p) && likely(func(p, userp))) {
>>                      return p;
>> @@ -535,6 +539,7 @@ static bool qht_insert__locked(struct qht *ht, struct qht_map *map,
>>          atomic_rcu_set(&prev->next, b);
>>      }
>>      b->hashes[i] = hash;
>> +    /* smp_wmb() implicit in seqlock_write_begin.  */
>>      atomic_set(&b->pointers[i], p);
>>      seqlock_write_end(&head->sequence);
>>      return true;
>> @@ -659,6 +664,9 @@ bool qht_remove__locked(struct qht_map *map, struct qht_bucket *head,
>>              }
>>              if (q == p) {
>>                  qht_debug_assert(b->hashes[i] == hash);
>> +                /* seqlock_write_begin and seqlock_write_end provide write
>> +                 * memory barrier semantics to callers of qht_remove.
>> +                 */
>>                  seqlock_write_begin(&head->sequence);
>>                  qht_bucket_remove_entry(b, i);
>>                  seqlock_write_end(&head->sequence);
> 


* Re: [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path
  2016-07-13 17:00   ` Sergey Fedorov
@ 2016-07-14  9:55     ` Alex Bennée
  2016-07-14 11:13       ` Sergey Fedorov
  0 siblings, 1 reply; 49+ messages in thread
From: Alex Bennée @ 2016-07-14  9:55 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: Paolo Bonzini, Sergey Fedorov, qemu-devel, mttcg, fred.konrad,
	a.rigo, cota, bobby.prani, rth, peter.maydell, patches,
	claudio.fontana, mark.burton, jan.kiszka


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 13/07/16 10:39, Paolo Bonzini wrote:
>> On 12/07/2016 22:13, Sergey Fedorov wrote:
>>> From: Sergey Fedorov <serge.fdrv@gmail.com>
>>>
>>> Hi,
>>>
>>> This is my respin of Alex's v2 series [1].
>>>
>>> The first 8 patches are preparation for the patch 9, the subject matter
>>> of this series, which enables lockless translation block lookup. The
>>> main change here is that Paolo's suggestion is implemented: TBs are
>>> marked with invalid CPU state early during invalidation. This allows to
>>> make lockless lookup safe from races on 'tb_jmp_cache' and direct block
>>> chaining.
>> Thanks for looking at the suggestion again and especially for perfecting it!
>>
>>> The patch 10 is a simple solution to avoid unnecessary bouncing on
>>> 'tb_lock' between tb_gen_code() and tb_add_jump(). A local variable is
>>> used to keep track of whether 'tb_lock' has already been taken.
>>>
>>> The last patch is my attempt to restructure tb_find_{fast,slow}() into a
>>> single function tb_find(). I think it will be easier to follow the
>>> locking scheme this way. However, I am afraid this last patch can be
>>> controversial, so it can be simply dropped.
>> Actually I agree entirely with it.
>>
>> If anything, for historical reasons one might rename tb_find_physical to
>> tb_find_slow and leave the tb_find_fast name, but I think the patch is
>> good as is.
>
> Nice to hear that :)
>
>>
>> Have you measured performance with the series?  In any case, it's nice
>> to see MTTCG finally taking shape!
>
> No, I didn't measure the performance. Maybe Alex can help me with
> this?

I can run some tests against this series if you want. It's fairly easy
to see an improvement when watching htop though as the system time usage
drops a lot.

>
> Thanks,
> Sergey


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v3 06/11] tcg: Introduce tb_mark_invalid() and tb_is_invalid()
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 06/11] tcg: Introduce tb_mark_invalid() and tb_is_invalid() Sergey Fedorov
@ 2016-07-14 10:25   ` Alex Bennée
  2016-07-14 11:10     ` Sergey Fedorov
  2016-07-14 12:53   ` Alex Bennée
  1 sibling, 1 reply; 49+ messages in thread
From: Alex Bennée @ 2016-07-14 10:25 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth, patches, mark.burton, pbonzini, jan.kiszka,
	peter.maydell, claudio.fontana, Peter Crosthwaite,
	Edgar E. Iglesias, Eduardo Habkost, Michael Walle,
	Aurelien Jarno, Leon Alrae, Anthony Green, Jia Liu, David Gibson,
	Alexander Graf, Mark Cave-Ayland, Artyom Tarasenko,
	Bastian Koppelmann, Guan Xuetao, Max Filippov, qemu-arm,
	qemu-ppc


Sergey Fedorov <sergey.fedorov@linaro.org> writes:

> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> These functions will be used to make translation block invalidation safe
> with concurrent lockless lookup in the global hash table.
>
> Most targets don't use 'cs_base'; so marking TB as invalid is as simple
> as assigning -1 to 'cs_base'. SPARC target stores the next program
> counter into 'cs_base', and -1 is a fine invalid value since PC must be
> a multiple of 4 in SPARC. The only odd target is i386, for which a
> special flag is introduced in place of removed 'HF_SOFTMMU_MASK'.
>
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>

This has merge conflicts with the current state of master. Is there
any way to have a common implementation that is specialised only when
needed?

> ---
>  include/exec/exec-all.h  | 10 ++++++++++
>  target-alpha/cpu.h       | 14 ++++++++++++++
>  target-arm/cpu.h         | 14 ++++++++++++++
>  target-cris/cpu.h        | 14 ++++++++++++++
>  target-i386/cpu.h        | 17 +++++++++++++++++
>  target-lm32/cpu.h        | 14 ++++++++++++++
>  target-m68k/cpu.h        | 14 ++++++++++++++
>  target-microblaze/cpu.h  | 14 ++++++++++++++
>  target-mips/cpu.h        | 14 ++++++++++++++
>  target-moxie/cpu.h       | 14 ++++++++++++++
>  target-openrisc/cpu.h    | 14 ++++++++++++++
>  target-ppc/cpu.h         | 14 ++++++++++++++
>  target-s390x/cpu.h       | 14 ++++++++++++++
>  target-sh4/cpu.h         | 14 ++++++++++++++
>  target-sparc/cpu.h       | 14 ++++++++++++++
>  target-sparc/translate.c |  1 +
>  target-tilegx/cpu.h      | 14 ++++++++++++++
>  target-tricore/cpu.h     | 14 ++++++++++++++
>  target-unicore32/cpu.h   | 14 ++++++++++++++
>  target-xtensa/cpu.h      | 14 ++++++++++++++
>  20 files changed, 266 insertions(+)
>
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index db79ab65cebe..61cc3a1fb8f7 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -256,6 +256,16 @@ void tb_free(TranslationBlock *tb);
>  void tb_flush(CPUState *cpu);
>  void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
>
> +static inline void tb_mark_invalid(TranslationBlock *tb)
> +{
> +    cpu_get_invalid_tb_cpu_state(&tb->pc, &tb->cs_base, &tb->flags);
> +}
> +
> +static inline bool tb_is_invalid(TranslationBlock *tb)
> +{
> +    return cpu_tb_cpu_state_is_invalidated(tb->pc, tb->cs_base, tb->flags);
> +}
> +
>  #if defined(USE_DIRECT_JUMP)
>
>  #if defined(CONFIG_TCG_INTERPRETER)
> diff --git a/target-alpha/cpu.h b/target-alpha/cpu.h
> index 791da3b4ad67..1ce4526d7a72 100644
> --- a/target-alpha/cpu.h
> +++ b/target-alpha/cpu.h
> @@ -524,4 +524,18 @@ static inline void cpu_get_tb_cpu_state(CPUAlphaState *env, target_ulong *pc,
>      *pflags = flags;
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  #endif /* !defined (__CPU_ALPHA_H__) */
> diff --git a/target-arm/cpu.h b/target-arm/cpu.h
> index e2fac46909c6..2855bdae7800 100644
> --- a/target-arm/cpu.h
> +++ b/target-arm/cpu.h
> @@ -2371,6 +2371,20 @@ static inline void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
>      *cs_base = 0;
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  enum {
>      QEMU_PSCI_CONDUIT_DISABLED = 0,
>      QEMU_PSCI_CONDUIT_SMC = 1,
> diff --git a/target-cris/cpu.h b/target-cris/cpu.h
> index e6046d20ca10..fae3219304df 100644
> --- a/target-cris/cpu.h
> +++ b/target-cris/cpu.h
> @@ -295,6 +295,20 @@ static inline void cpu_get_tb_cpu_state(CPUCRISState *env, target_ulong *pc,
>  				     | X_FLAG | PFIX_FLAG));
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  #define cpu_list cris_cpu_list
>  void cris_cpu_list(FILE *f, fprintf_function cpu_fprintf);
>
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index a7c752afdad8..c9125b725ca4 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -129,6 +129,8 @@
>     positions to ease oring with eflags. */
>  /* current cpl */
>  #define HF_CPL_SHIFT         0
> +/* used to mark invalidated translation blocks */
> +#define HF_INVALID_SHIFT     2
>  /* true if hardware interrupts must be disabled for next instruction */
>  #define HF_INHIBIT_IRQ_SHIFT 3
>  /* 16 or 32 segments */
> @@ -158,6 +160,7 @@
>  #define HF_MPX_IU_SHIFT     26 /* BND registers in-use */
>
>  #define HF_CPL_MASK          (3 << HF_CPL_SHIFT)
> +#define HF_INVALID_MASK      (1 << HF_INVALID_SHIFT)
>  #define HF_INHIBIT_IRQ_MASK  (1 << HF_INHIBIT_IRQ_SHIFT)
>  #define HF_CS32_MASK         (1 << HF_CS32_SHIFT)
>  #define HF_SS32_MASK         (1 << HF_SS32_SHIFT)
> @@ -1489,6 +1492,20 @@ static inline void cpu_get_tb_cpu_state(CPUX86State *env, target_ulong *pc,
>          (env->eflags & (IOPL_MASK | TF_MASK | RF_MASK | VM_MASK | AC_MASK));
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *flags = HF_INVALID_MASK;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return flags == HF_INVALID_MASK;
> +}
> +
>  void do_cpu_init(X86CPU *cpu);
>  void do_cpu_sipi(X86CPU *cpu);
>
> diff --git a/target-lm32/cpu.h b/target-lm32/cpu.h
> index 4efe98d828f8..5f78c3c619e5 100644
> --- a/target-lm32/cpu.h
> +++ b/target-lm32/cpu.h
> @@ -271,4 +271,18 @@ static inline void cpu_get_tb_cpu_state(CPULM32State *env, target_ulong *pc,
>      *flags = 0;
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  #endif
> diff --git a/target-m68k/cpu.h b/target-m68k/cpu.h
> index 908776999797..c44f4339e40a 100644
> --- a/target-m68k/cpu.h
> +++ b/target-m68k/cpu.h
> @@ -269,4 +269,18 @@ static inline void cpu_get_tb_cpu_state(CPUM68KState *env, target_ulong *pc,
>              | ((env->macsr >> 4) & 0xf);        /* Bits 0-3 */
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  #endif
> diff --git a/target-microblaze/cpu.h b/target-microblaze/cpu.h
> index 16815dfc6abc..22362efce586 100644
> --- a/target-microblaze/cpu.h
> +++ b/target-microblaze/cpu.h
> @@ -371,6 +371,20 @@ static inline void cpu_get_tb_cpu_state(CPUMBState *env, target_ulong *pc,
>                   (env->sregs[SR_MSR] & (MSR_UM | MSR_VM | MSR_EE));
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  #if !defined(CONFIG_USER_ONLY)
>  void mb_cpu_unassigned_access(CPUState *cpu, hwaddr addr,
>                                bool is_write, bool is_exec, int is_asi,
> diff --git a/target-mips/cpu.h b/target-mips/cpu.h
> index 9c4fc816bf9b..a887154274b3 100644
> --- a/target-mips/cpu.h
> +++ b/target-mips/cpu.h
> @@ -901,6 +901,20 @@ static inline void cpu_get_tb_cpu_state(CPUMIPSState *env, target_ulong *pc,
>                              MIPS_HFLAG_HWRENA_ULR);
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  static inline int mips_vpe_active(CPUMIPSState *env)
>  {
>      int active = 1;
> diff --git a/target-moxie/cpu.h b/target-moxie/cpu.h
> index 63d5cafc5528..f0f95a2701e1 100644
> --- a/target-moxie/cpu.h
> +++ b/target-moxie/cpu.h
> @@ -136,6 +136,20 @@ static inline void cpu_get_tb_cpu_state(CPUMoxieState *env, target_ulong *pc,
>      *flags = 0;
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  int moxie_cpu_handle_mmu_fault(CPUState *cpu, vaddr address,
>                                 int rw, int mmu_idx);
>
> diff --git a/target-openrisc/cpu.h b/target-openrisc/cpu.h
> index 9451a7cca691..a5650cbed4b7 100644
> --- a/target-openrisc/cpu.h
> +++ b/target-openrisc/cpu.h
> @@ -398,6 +398,20 @@ static inline void cpu_get_tb_cpu_state(CPUOpenRISCState *env,
>      *flags = (env->flags & D_FLAG);
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  static inline int cpu_mmu_index(CPUOpenRISCState *env, bool ifetch)
>  {
>      if (!(env->sr & SR_IME)) {
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index 2666a3f80d00..092c7954d19a 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -2294,6 +2294,20 @@ static inline void cpu_get_tb_cpu_state(CPUPPCState *env, target_ulong *pc,
>      *flags = env->hflags;
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  #if !defined(CONFIG_USER_ONLY)
>  static inline int booke206_tlbm_id(CPUPPCState *env, ppcmas_tlb_t *tlbm)
>  {
> diff --git a/target-s390x/cpu.h b/target-s390x/cpu.h
> index 8bcb0f75f399..574cdb51c967 100644
> --- a/target-s390x/cpu.h
> +++ b/target-s390x/cpu.h
> @@ -393,6 +393,20 @@ static inline void cpu_get_tb_cpu_state(CPUS390XState* env, target_ulong *pc,
>               ((env->psw.mask & PSW_MASK_32) ? FLAG_MASK_32 : 0);
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  /* While the PoO talks about ILC (a number between 1-3) what is actually
>     stored in LowCore is shifted left one bit (an even between 2-6).  As
>     this is the actual length of the insn and therefore more useful, that
> diff --git a/target-sh4/cpu.h b/target-sh4/cpu.h
> index 3f9dae2d1f0d..bcddbf31c853 100644
> --- a/target-sh4/cpu.h
> +++ b/target-sh4/cpu.h
> @@ -387,4 +387,18 @@ static inline void cpu_get_tb_cpu_state(CPUSH4State *env, target_ulong *pc,
>              | (env->movcal_backup ? TB_FLAG_PENDING_MOVCA : 0); /* Bit 4 */
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  #endif				/* _CPU_SH4_H */
> diff --git a/target-sparc/cpu.h b/target-sparc/cpu.h
> index 604de84624ca..09df29b7b439 100644
> --- a/target-sparc/cpu.h
> +++ b/target-sparc/cpu.h
> @@ -749,6 +749,20 @@ static inline void cpu_get_tb_cpu_state(CPUSPARCState *env, target_ulong *pc,
>  #endif
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1; /* npc must be a multiple of 4 */
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  static inline bool tb_fpu_enabled(int tb_flags)
>  {
>  #if defined(CONFIG_USER_ONLY)
> diff --git a/target-sparc/translate.c b/target-sparc/translate.c
> index 0f4faf70624f..cc160bc67be0 100644
> --- a/target-sparc/translate.c
> +++ b/target-sparc/translate.c
> @@ -38,6 +38,7 @@
>  #define DYNAMIC_PC  1 /* dynamic pc value */
>  #define JUMP_PC     2 /* dynamic pc value which takes only two values
>                           according to jump_pc[T2] */
> +/* NOTE: -1 is reserved for cpu_get_invalid_tb_cpu_state() */
>
>  /* global register indexes */
>  static TCGv_env cpu_env;
> diff --git a/target-tilegx/cpu.h b/target-tilegx/cpu.h
> index d74032925b12..79f84a422de1 100644
> --- a/target-tilegx/cpu.h
> +++ b/target-tilegx/cpu.h
> @@ -174,4 +174,18 @@ static inline void cpu_get_tb_cpu_state(CPUTLGState *env, target_ulong *pc,
>      *flags = 0;
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  #endif
> diff --git a/target-tricore/cpu.h b/target-tricore/cpu.h
> index a298d63eea24..8f3ca20241b7 100644
> --- a/target-tricore/cpu.h
> +++ b/target-tricore/cpu.h
> @@ -410,6 +410,20 @@ static inline void cpu_get_tb_cpu_state(CPUTriCoreState *env, target_ulong *pc,
>      *flags = 0;
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  TriCoreCPU *cpu_tricore_init(const char *cpu_model);
>
>  #define cpu_init(cpu_model) CPU(cpu_tricore_init(cpu_model))
> diff --git a/target-unicore32/cpu.h b/target-unicore32/cpu.h
> index 83f758496a6c..6bcda6aa3e38 100644
> --- a/target-unicore32/cpu.h
> +++ b/target-unicore32/cpu.h
> @@ -179,6 +179,20 @@ static inline void cpu_get_tb_cpu_state(CPUUniCore32State *env, target_ulong *pc
>      }
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  int uc32_cpu_handle_mmu_fault(CPUState *cpu, vaddr address, int rw,
>                                int mmu_idx);
>  void uc32_translate_init(void);
> diff --git a/target-xtensa/cpu.h b/target-xtensa/cpu.h
> index ce9fb5b0033a..bd0a3b1deddf 100644
> --- a/target-xtensa/cpu.h
> +++ b/target-xtensa/cpu.h
> @@ -582,6 +582,20 @@ static inline void cpu_get_tb_cpu_state(CPUXtensaState *env, target_ulong *pc,
>      }
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  #include "exec/cpu-all.h"
>
>  #endif


--
Alex Bennée

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Qemu-devel] [PATCH v3 06/11] tcg: Introduce tb_mark_invalid() and tb_is_invalid()
  2016-07-14 10:25   ` Alex Bennée
@ 2016-07-14 11:10     ` Sergey Fedorov
  2016-07-14 11:48       ` Paolo Bonzini
  0 siblings, 1 reply; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-14 11:10 UTC (permalink / raw)
  To: Alex Bennée, Sergey Fedorov
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, cota, bobby.prani, rth,
	patches, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Peter Crosthwaite, Edgar E. Iglesias,
	Eduardo Habkost, Michael Walle, Aurelien Jarno, Leon Alrae,
	Anthony Green, Jia Liu, David Gibson, Alexander Graf,
	Mark Cave-Ayland, Artyom Tarasenko, Bastian Koppelmann,
	Guan Xuetao, Max Filippov, qemu-arm, qemu-ppc

On 14/07/16 13:25, Alex Bennée wrote:
> Sergey Fedorov <sergey.fedorov@linaro.org> writes:
>
>> > From: Sergey Fedorov <serge.fdrv@gmail.com>
>> >
>> > These functions will be used to make translation block invalidation safe
>> > with concurrent lockless lookup in the global hash table.
>> >
>> > Most targets don't use 'cs_base', so marking a TB as invalid is as simple
>> > as assigning -1 to 'cs_base'. The SPARC target stores the next program
>> > counter into 'cs_base', and -1 is a fine invalid value since the PC must
>> > be a multiple of 4 on SPARC. The only odd target is i386, for which a
>> > special flag is introduced in place of the removed 'HF_SOFTMMU_MASK'.
>> >
>> > Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
>> > Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
>> > Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
> This has merge conflicts with the current state of master. Is there
> anyway to have a common implementation that is specialised only when
> needed?
>

The point was to put the assumptions on invalid CPU TB state as close to
the cpu_get_tb_cpu_state() definitions as possible, so that if anyone makes
changes they can notice those assumptions and correct them if necessary.

Kind regards
Sergey

* Re: [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path
  2016-07-14  9:55     ` Alex Bennée
@ 2016-07-14 11:13       ` Sergey Fedorov
  0 siblings, 0 replies; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-14 11:13 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Paolo Bonzini, Sergey Fedorov, qemu-devel, mttcg, fred.konrad,
	a.rigo, cota, bobby.prani, rth, peter.maydell, patches,
	claudio.fontana, mark.burton, jan.kiszka

On 14/07/16 12:55, Alex Bennée wrote:
> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>
>> On 13/07/16 10:39, Paolo Bonzini wrote:
>>> Have you measured performance with the series?  In any case, it's nice
>>> to see MTTCG finally taking shape!
>> No, I didn't measure the performance. Maybe Alex can help me with
>> this?
> I can run some tests against this series if you want. It's fairly easy
> to see an improvement when watching htop though as the system time usage
> drops a lot.


You seemed to have some tests ready. If you share them I could run them
as well.

Thanks,
Sergey

* Re: [Qemu-devel] [PATCH v3 06/11] tcg: Introduce tb_mark_invalid() and tb_is_invalid()
  2016-07-14 11:10     ` Sergey Fedorov
@ 2016-07-14 11:48       ` Paolo Bonzini
  2016-07-14 12:04         ` Alex Bennée
  0 siblings, 1 reply; 49+ messages in thread
From: Paolo Bonzini @ 2016-07-14 11:48 UTC (permalink / raw)
  To: Sergey Fedorov, Alex Bennée, Sergey Fedorov
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, cota, bobby.prani, rth,
	patches, mark.burton, jan.kiszka, peter.maydell, claudio.fontana,
	Peter Crosthwaite, Edgar E. Iglesias, Eduardo Habkost,
	Michael Walle, Aurelien Jarno, Leon Alrae, Anthony Green,
	Jia Liu, David Gibson, Alexander Graf, Mark Cave-Ayland,
	Artyom Tarasenko, Bastian Koppelmann, Guan Xuetao, Max Filippov,
	qemu-arm, qemu-ppc



On 14/07/2016 13:10, Sergey Fedorov wrote:
> > This has merge conflicts with the current state of master. Is there
> > anyway to have a common implementation that is specialised only when
> > needed?
>
> > The point was to put the assumptions on invalid CPU TB state as close to
> > the cpu_get_tb_cpu_state() definitions as possible, so that if anyone makes
> > changes they can notice those assumptions and correct them if necessary.

It causes some repetition indeed, but I think it's a good idea.

restore_state_to_opc is another case where most implementations have the
same simple "env->pc = data[0]" implementation.

Paolo

* Re: [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path
  2016-07-13  7:39 ` [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path Paolo Bonzini
  2016-07-13 17:00   ` Sergey Fedorov
  2016-07-13 18:06   ` Sergey Fedorov
@ 2016-07-14 12:02   ` Alex Bennée
  2016-07-14 12:10     ` Paolo Bonzini
  2 siblings, 1 reply; 49+ messages in thread
From: Alex Bennée @ 2016-07-14 12:02 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sergey Fedorov, qemu-devel, mttcg, fred.konrad, a.rigo,
	serge.fdrv, cota, bobby.prani, rth, peter.maydell, patches,
	claudio.fontana, mark.burton, jan.kiszka


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 12/07/2016 22:13, Sergey Fedorov wrote:
>> From: Sergey Fedorov <serge.fdrv@gmail.com>
<snip>
>
> Have you measured performance with the series?  In any case, it's nice
> to see MTTCG finally taking shape!

Here are some numbers on the multi-threaded pigz test:

Before:

retry.py called with ['./arm-linux-user/qemu-arm', './pigz.armhf', '-c', '-9', 'linux-4.6.3.tar']
Source code is @ pull-tcg-20160708-104-g9ec3025 or heads/review/hot-patch-v3
run 1: ret=0 (PASS), time=32.285497 (1/1)
run 2: ret=0 (PASS), time=32.035293 (2/2)
run 3: ret=0 (PASS), time=31.784781 (3/3)
run 4: ret=0 (PASS), time=32.035136 (4/4)
run 5: ret=0 (PASS), time=32.285612 (5/5)
Results summary:
0: 5 times (100.00%), avg time 32.085 (0.04 varience/0.21 deviation)

After:

Ran command 5 times, 5 passes
retry.py called with ['./arm-linux-user/qemu-arm', './pigz.armhf', '-c', '-9', 'linux-4.6.3.tar']
Source code is @ pull-tcg-20160708-115-gf317fa8 or heads/review/hot-patch-v3
run 1: ret=0 (PASS), time=29.281950 (1/1)
run 2: ret=0 (PASS), time=29.285588 (2/2)
run 3: ret=0 (PASS), time=29.282065 (3/3)
run 4: ret=0 (PASS), time=29.282397 (4/4)
run 5: ret=0 (PASS), time=29.282045 (5/5)
Results summary:
0: 5 times (100.00%), avg time 29.283 (0.00 varience/0.00 deviation)
Ran command 5 times, 5 passes

That gives a roughly 10% improvement for heavily threaded code.

Looking at the perf data, the hottest part of the code now is
cpu_get_tb_cpu_state, which is required to get the initial hash to
search for the next TB.

--
Alex Bennée

* Re: [Qemu-devel] [PATCH v3 06/11] tcg: Introduce tb_mark_invalid() and tb_is_invalid()
  2016-07-14 11:48       ` Paolo Bonzini
@ 2016-07-14 12:04         ` Alex Bennée
  0 siblings, 0 replies; 49+ messages in thread
From: Alex Bennée @ 2016-07-14 12:04 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sergey Fedorov, Sergey Fedorov, qemu-devel, mttcg, fred.konrad,
	a.rigo, cota, bobby.prani, rth, patches, mark.burton, jan.kiszka,
	peter.maydell, claudio.fontana, Peter Crosthwaite,
	Edgar E. Iglesias, Eduardo Habkost, Michael Walle,
	Aurelien Jarno, Leon Alrae, Anthony Green, Jia Liu, David Gibson,
	Alexander Graf, Mark Cave-Ayland, Artyom Tarasenko,
	Bastian Koppelmann, Guan Xuetao, Max Filippov, qemu-arm, qemu-pp


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 14/07/2016 13:10, Sergey Fedorov wrote:
>> > This has merge conflicts with the current state of master. Is there
>> > anyway to have a common implementation that is specialised only when
>> > needed?
>>
>> The point was to put the assumptions on invalid CPU TB state as close to
>> cpu_get_tb_cpu_state() definitions as possible. So that if anyone make
>> changes they can notice those assumptions and correct them if necessary.
>
> It causes some repetition indeed, but I think it's a good idea.
>
> restore_state_to_opc is another case where most implementations have the
> same simple "env->pc = data[0]" implementation.

Yeah, now I've seen cpu_get_tb_cpu_state jump up the hot-path I tend to
agree ;-)

--
Alex Bennée

* Re: [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path
  2016-07-14 12:02   ` Alex Bennée
@ 2016-07-14 12:10     ` Paolo Bonzini
  2016-07-14 13:13       ` Alex Bennée
  0 siblings, 1 reply; 49+ messages in thread
From: Paolo Bonzini @ 2016-07-14 12:10 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Sergey Fedorov, qemu-devel, mttcg, fred.konrad, a.rigo,
	serge.fdrv, cota, bobby.prani, rth, peter.maydell, patches,
	claudio.fontana, mark.burton, jan.kiszka



On 14/07/2016 14:02, Alex Bennée wrote:
> Looking at the perf data it looks like the hotest part of the code now
> is cpu_get_tb_cpu_state which is required to get the initial hash to
> search for the next tb.

It should be possible to keep the flags up-to-date in a separate field
of CPUARMState, similar to what x86 does.

Paolo

* Re: [Qemu-devel] [PATCH v3 03/11] tcg: Prepare safe tb_jmp_cache lookup out of tb_lock
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 03/11] tcg: Prepare safe tb_jmp_cache lookup out of tb_lock Sergey Fedorov
@ 2016-07-14 12:14   ` Alex Bennée
  0 siblings, 0 replies; 49+ messages in thread
From: Alex Bennée @ 2016-07-14 12:14 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth, patches, mark.burton, pbonzini, jan.kiszka,
	peter.maydell, claudio.fontana, Peter Crosthwaite


Sergey Fedorov <sergey.fedorov@linaro.org> writes:

> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> Ensure atomicity of CPU's 'tb_jmp_cache' access for future translation
> block lookup out of 'tb_lock'.
>
> Note that this patch does *not* make CPU's TLB invalidation safe if it
> is done from some other thread while the CPU is in its execution loop.
>
> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
> [AJB: fixed missing atomic set, tweak title]
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> [Sergey Fedorov: removed explicit memory barriers;
> memset() on 'tb_jmp_cache' replaced with a loop on atomic_set();
> tweaked commit title and message]
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

Not sure it counts as it's already had my s-o-b when I last had the tree
but there you go ;-)

>
> ---
> Changes in v3:
>  - explicit memory barriers removed
>  - memset() on 'tb_jmp_cache' replaced with a loop on atomic_set()
> Changes in v2:
>  - fix spelling s/con't/can't/
>  - add atomic_read while clearing tb_jmp_cache
>  - add r-b tags
> Changes in v1 (AJB):
>  - tweak title
>  - fixed missing set of tb_jmp_cache
> ---
>  cpu-exec.c      |  4 ++--
>  translate-all.c | 10 +++++++---
>  2 files changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 757f63b08b6b..d6178eab71d4 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -315,7 +315,7 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
>
>  found:
>      /* we add the TB in the virtual pc hash table */
> -    cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
> +    atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
>      return tb;
>  }
>
> @@ -333,7 +333,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>         is executed. */
>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>      tb_lock();
> -    tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
> +    tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
>      if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>                   tb->flags != flags)) {
>          tb = tb_find_slow(cpu, pc, cs_base, flags);
> diff --git a/translate-all.c b/translate-all.c
> index 0d47c1c0cf82..fdf520a86d68 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -848,7 +848,11 @@ void tb_flush(CPUState *cpu)
>      tcg_ctx.tb_ctx.nb_tbs = 0;
>
>      CPU_FOREACH(cpu) {
> -        memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
> +        int i;
> +
> +        for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
> +            atomic_set(&cpu->tb_jmp_cache[i], NULL);
> +        }
>          cpu->tb_flushed = true;
>      }
>
> @@ -1007,8 +1011,8 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>      /* remove the TB from the hash list */
>      h = tb_jmp_cache_hash_func(tb->pc);
>      CPU_FOREACH(cpu) {
> -        if (cpu->tb_jmp_cache[h] == tb) {
> -            cpu->tb_jmp_cache[h] = NULL;
> +        if (atomic_read(&cpu->tb_jmp_cache[h]) == tb) {
> +            atomic_set(&cpu->tb_jmp_cache[h], NULL);
>          }
>      }


--
Alex Bennée

* Re: [Qemu-devel] [PATCH v3 05/11] target-i386: Remove redundant HF_SOFTMMU_MASK
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 05/11] target-i386: Remove redundant HF_SOFTMMU_MASK Sergey Fedorov
@ 2016-07-14 12:19   ` Alex Bennée
  0 siblings, 0 replies; 49+ messages in thread
From: Alex Bennée @ 2016-07-14 12:19 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth, patches, mark.burton, pbonzini, jan.kiszka,
	peter.maydell, claudio.fontana, Eduardo Habkost


Sergey Fedorov <sergey.fedorov@linaro.org> writes:

> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> 'HF_SOFTMMU_MASK' is only set when 'CONFIG_SOFTMMU' is defined, so
> there's no need for this flag: test 'CONFIG_SOFTMMU' instead.
>
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  target-i386/cpu.c       |  3 ---
>  target-i386/cpu.h       |  3 ---
>  target-i386/translate.c | 12 ++++--------
>  3 files changed, 4 insertions(+), 14 deletions(-)
>
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index fc209ee1cb8a..6e49e4ca8282 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -2725,9 +2725,6 @@ static void x86_cpu_reset(CPUState *s)
>
>      /* init to reset state */
>
> -#ifdef CONFIG_SOFTMMU
> -    env->hflags |= HF_SOFTMMU_MASK;
> -#endif
>      env->hflags2 |= HF2_GIF_MASK;
>
>      cpu_x86_update_cr0(env, 0x60000010);
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index 5c7a2791f34f..a7c752afdad8 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -129,8 +129,6 @@
>     positions to ease oring with eflags. */
>  /* current cpl */
>  #define HF_CPL_SHIFT         0
> -/* true if soft mmu is being used */
> -#define HF_SOFTMMU_SHIFT     2
>  /* true if hardware interrupts must be disabled for next instruction */
>  #define HF_INHIBIT_IRQ_SHIFT 3
>  /* 16 or 32 segments */
> @@ -160,7 +158,6 @@
>  #define HF_MPX_IU_SHIFT     26 /* BND registers in-use */
>
>  #define HF_CPL_MASK          (3 << HF_CPL_SHIFT)
> -#define HF_SOFTMMU_MASK      (1 << HF_SOFTMMU_SHIFT)
>  #define HF_INHIBIT_IRQ_MASK  (1 << HF_INHIBIT_IRQ_SHIFT)
>  #define HF_CS32_MASK         (1 << HF_CS32_SHIFT)
>  #define HF_SS32_MASK         (1 << HF_SS32_SHIFT)
> diff --git a/target-i386/translate.c b/target-i386/translate.c
> index 7dea18bd6345..e81fce7bc2b5 100644
> --- a/target-i386/translate.c
> +++ b/target-i386/translate.c
> @@ -8224,9 +8224,9 @@ void gen_intermediate_code(CPUX86State *env, TranslationBlock *tb)
>      dc->popl_esp_hack = 0;
>      /* select memory access functions */
>      dc->mem_index = 0;
> -    if (flags & HF_SOFTMMU_MASK) {
> -	dc->mem_index = cpu_mmu_index(env, false);
> -    }
> +#ifdef CONFIG_SOFTMMU
> +    dc->mem_index = cpu_mmu_index(env, false);
> +#endif
>      dc->cpuid_features = env->features[FEAT_1_EDX];
>      dc->cpuid_ext_features = env->features[FEAT_1_ECX];
>      dc->cpuid_ext2_features = env->features[FEAT_8000_0001_EDX];
> @@ -8239,11 +8239,7 @@ void gen_intermediate_code(CPUX86State *env, TranslationBlock *tb)
>  #endif
>      dc->flags = flags;
>      dc->jmp_opt = !(dc->tf || cs->singlestep_enabled ||
> -                    (flags & HF_INHIBIT_IRQ_MASK)
> -#ifndef CONFIG_SOFTMMU
> -                    || (flags & HF_SOFTMMU_MASK)
> -#endif
> -                    );
> +                    (flags & HF_INHIBIT_IRQ_MASK));
>      /* Do not optimize repz jumps at all in icount mode, because
>         rep movsS instructions are executed with different paths
>         in !repz_opt and repz_opt modes. The first one was used


--
Alex Bennée

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Qemu-devel] [PATCH v3 04/11] tcg: Prepare safe access to tb_flushed out of tb_lock
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 04/11] tcg: Prepare safe access to tb_flushed " Sergey Fedorov
@ 2016-07-14 12:45   ` Alex Bennée
  2016-07-14 12:55     ` Sergey Fedorov
  0 siblings, 1 reply; 49+ messages in thread
From: Alex Bennée @ 2016-07-14 12:45 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth, patches, mark.burton, pbonzini, jan.kiszka,
	peter.maydell, claudio.fontana, Peter Crosthwaite


Sergey Fedorov <sergey.fedorov@linaro.org> writes:

> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> Ensure atomicity of CPU's 'tb_flushed' access for future translation
> block lookup out of 'tb_lock'.
>
> This field can only be touched from another thread by tb_flush() in user
> mode emulation. So the only accesses that need to be atomic are:
>  * a single write in tb_flush();
>  * reads/writes out of 'tb_lock'.

It might be worth mentioning the barrier here.

>
> In future, before enabling MTTCG in system mode, tb_flush() must be safe
> and this field becomes unnecessary.
>
> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
> ---
>  cpu-exec.c      | 16 +++++++---------
>  translate-all.c |  4 ++--
>  2 files changed, 9 insertions(+), 11 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index d6178eab71d4..c973e3b85922 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -338,13 +338,6 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>                   tb->flags != flags)) {
>          tb = tb_find_slow(cpu, pc, cs_base, flags);
>      }
> -    if (cpu->tb_flushed) {
> -        /* Ensure that no TB jump will be modified as the
> -         * translation buffer has been flushed.
> -         */
> -        *last_tb = NULL;
> -        cpu->tb_flushed = false;
> -    }
>  #ifndef CONFIG_USER_ONLY
>      /* We don't take care of direct jumps when address mapping changes in
>       * system emulation. So it's not safe to make a direct jump to a TB
> @@ -356,7 +349,12 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>  #endif
>      /* See if we can patch the calling TB. */
>      if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
> -        tb_add_jump(last_tb, tb_exit, tb);
> +        /* Check if translation buffer has been flushed */
> +        if (cpu->tb_flushed) {
> +            cpu->tb_flushed = false;
> +        } else {
> +            tb_add_jump(last_tb, tb_exit, tb);
> +        }
>      }
>      tb_unlock();
>      return tb;
> @@ -618,7 +616,7 @@ int cpu_exec(CPUState *cpu)
>              }
>
>              last_tb = NULL; /* forget the last executed TB after exception */
> -            cpu->tb_flushed = false; /* reset before first TB lookup */
> +            atomic_mb_set(&cpu->tb_flushed, false); /* reset before first TB lookup */
>              for(;;) {
>                  cpu_handle_interrupt(cpu, &last_tb);
>                  tb = tb_find_fast(cpu, last_tb, tb_exit);
> diff --git a/translate-all.c b/translate-all.c
> index fdf520a86d68..788fed1e0765 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -845,7 +845,6 @@ void tb_flush(CPUState *cpu)
>          > tcg_ctx.code_gen_buffer_size) {
>          cpu_abort(cpu, "Internal error: code buffer overflow\n");
>      }
> -    tcg_ctx.tb_ctx.nb_tbs = 0;
>
>      CPU_FOREACH(cpu) {
>          int i;
> @@ -853,9 +852,10 @@ void tb_flush(CPUState *cpu)
>          for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
>              atomic_set(&cpu->tb_jmp_cache[i], NULL);
>          }
> -        cpu->tb_flushed = true;
> +        atomic_mb_set(&cpu->tb_flushed, true);
>      }
>
> +    tcg_ctx.tb_ctx.nb_tbs = 0;
>      qht_reset_size(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE);

I can see the sense of moving the setting of nb_tbs but is it strictly
required as part of this patch?

>      page_flush_tb();

Otherwise:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v3 06/11] tcg: Introduce tb_mark_invalid() and tb_is_invalid()
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 06/11] tcg: Introduce tb_mark_invalid() and tb_is_invalid() Sergey Fedorov
  2016-07-14 10:25   ` Alex Bennée
@ 2016-07-14 12:53   ` Alex Bennée
  2016-07-14 13:00     ` Sergey Fedorov
  1 sibling, 1 reply; 49+ messages in thread
From: Alex Bennée @ 2016-07-14 12:53 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth, patches, mark.burton, pbonzini, jan.kiszka,
	peter.maydell, claudio.fontana, Peter Crosthwaite,
	Edgar E. Iglesias, Eduardo Habkost, Michael Walle,
	Aurelien Jarno, Leon Alrae, Anthony Green, Jia Liu, David Gibson,
	Alexander Graf, Mark Cave-Ayland, Artyom Tarasenko,
	Bastian Koppelmann, Guan Xuetao, Max Filippov, qemu-arm,
	qemu-ppc


Sergey Fedorov <sergey.fedorov@linaro.org> writes:

> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> These functions will be used to make translation block invalidation safe
> with concurrent lockless lookup in the global hash table.
>
> Most targets don't use 'cs_base', so marking a TB as invalid is as simple
> as assigning -1 to 'cs_base'. The SPARC target stores the next program
> counter into 'cs_base', and -1 is a fine invalid value since the PC must
> be a multiple of 4 on SPARC. The only odd target is i386, for which a
> special flag is introduced in place of removed 'HF_SOFTMMU_MASK'.
>
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
> ---
>  include/exec/exec-all.h  | 10 ++++++++++
>  target-alpha/cpu.h       | 14 ++++++++++++++
>  target-arm/cpu.h         | 14 ++++++++++++++
>  target-cris/cpu.h        | 14 ++++++++++++++
>  target-i386/cpu.h        | 17 +++++++++++++++++
>  target-lm32/cpu.h        | 14 ++++++++++++++
>  target-m68k/cpu.h        | 14 ++++++++++++++
>  target-microblaze/cpu.h  | 14 ++++++++++++++
>  target-mips/cpu.h        | 14 ++++++++++++++
>  target-moxie/cpu.h       | 14 ++++++++++++++
>  target-openrisc/cpu.h    | 14 ++++++++++++++
>  target-ppc/cpu.h         | 14 ++++++++++++++
>  target-s390x/cpu.h       | 14 ++++++++++++++
>  target-sh4/cpu.h         | 14 ++++++++++++++
>  target-sparc/cpu.h       | 14 ++++++++++++++
>  target-sparc/translate.c |  1 +
>  target-tilegx/cpu.h      | 14 ++++++++++++++
>  target-tricore/cpu.h     | 14 ++++++++++++++
>  target-unicore32/cpu.h   | 14 ++++++++++++++
>  target-xtensa/cpu.h      | 14 ++++++++++++++
>  20 files changed, 266 insertions(+)
>
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index db79ab65cebe..61cc3a1fb8f7 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -256,6 +256,16 @@ void tb_free(TranslationBlock *tb);
>  void tb_flush(CPUState *cpu);
>  void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
>
> +static inline void tb_mark_invalid(TranslationBlock *tb)
> +{
> +    cpu_get_invalid_tb_cpu_state(&tb->pc, &tb->cs_base, &tb->flags);

Hmmm are we getting something here? Maybe cpu_tb_invalidate_cpu_state?

> +}
> +
> +static inline bool tb_is_invalid(TranslationBlock *tb)
> +{
> +    return cpu_tb_cpu_state_is_invalidated(tb->pc, tb->cs_base, tb->flags);
> +}

Also why are we passing three pointers to parts of TranslationBlock? Why
not just pass tb directly and be done with it?

I'm sure the compiler does something sensible, but it seems overly verbose
to me.
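A minimal sketch of the alternative being suggested, with the helpers taking the TB itself rather than three pointers into it (the reduced TranslationBlock below is an assumption for illustration; the real structure has many more fields):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t target_ulong;

/* Reduced stand-in for QEMU's TranslationBlock: only the three
 * fields that the validity check cares about. */
typedef struct TranslationBlock {
    target_ulong pc;
    target_ulong cs_base;
    uint32_t flags;
} TranslationBlock;

/* For most targets, cs_base == -1 is enough to mark invalidity,
 * so the helpers could take the TB directly instead of pointers
 * to its individual fields. */
static inline void tb_mark_invalid(TranslationBlock *tb)
{
    tb->cs_base = (target_ulong)-1;
}

static inline bool tb_is_invalid(const TranslationBlock *tb)
{
    return tb->cs_base == (target_ulong)-1;
}
```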

> +
>  #if defined(USE_DIRECT_JUMP)
>
>  #if defined(CONFIG_TCG_INTERPRETER)
> diff --git a/target-alpha/cpu.h b/target-alpha/cpu.h
> index 791da3b4ad67..1ce4526d7a72 100644
> --- a/target-alpha/cpu.h
> +++ b/target-alpha/cpu.h
> @@ -524,4 +524,18 @@ static inline void cpu_get_tb_cpu_state(CPUAlphaState *env, target_ulong *pc,
>      *pflags = flags;
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  #endif /* !defined (__CPU_ALPHA_H__) */
> diff --git a/target-arm/cpu.h b/target-arm/cpu.h
> index e2fac46909c6..2855bdae7800 100644
> --- a/target-arm/cpu.h
> +++ b/target-arm/cpu.h
> @@ -2371,6 +2371,20 @@ static inline void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
>      *cs_base = 0;
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  enum {
>      QEMU_PSCI_CONDUIT_DISABLED = 0,
>      QEMU_PSCI_CONDUIT_SMC = 1,
> diff --git a/target-cris/cpu.h b/target-cris/cpu.h
> index e6046d20ca10..fae3219304df 100644
> --- a/target-cris/cpu.h
> +++ b/target-cris/cpu.h
> @@ -295,6 +295,20 @@ static inline void cpu_get_tb_cpu_state(CPUCRISState *env, target_ulong *pc,
>  				     | X_FLAG | PFIX_FLAG));
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  #define cpu_list cris_cpu_list
>  void cris_cpu_list(FILE *f, fprintf_function cpu_fprintf);
>
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index a7c752afdad8..c9125b725ca4 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -129,6 +129,8 @@
>     positions to ease oring with eflags. */
>  /* current cpl */
>  #define HF_CPL_SHIFT         0
> +/* used to mark invalidated translation blocks */
> +#define HF_INVALID_SHIFT     2
>  /* true if hardware interrupts must be disabled for next instruction */
>  #define HF_INHIBIT_IRQ_SHIFT 3
>  /* 16 or 32 segments */
> @@ -158,6 +160,7 @@
>  #define HF_MPX_IU_SHIFT     26 /* BND registers in-use */
>
>  #define HF_CPL_MASK          (3 << HF_CPL_SHIFT)
> +#define HF_INVALID_MASK      (1 << HF_INVALID_SHIFT)
>  #define HF_INHIBIT_IRQ_MASK  (1 << HF_INHIBIT_IRQ_SHIFT)
>  #define HF_CS32_MASK         (1 << HF_CS32_SHIFT)
>  #define HF_SS32_MASK         (1 << HF_SS32_SHIFT)
> @@ -1489,6 +1492,20 @@ static inline void cpu_get_tb_cpu_state(CPUX86State *env, target_ulong *pc,
>          (env->eflags & (IOPL_MASK | TF_MASK | RF_MASK | VM_MASK | AC_MASK));
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *flags = HF_INVALID_MASK;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return flags == HF_INVALID_MASK;
> +}
> +
>  void do_cpu_init(X86CPU *cpu);
>  void do_cpu_sipi(X86CPU *cpu);
>
> diff --git a/target-lm32/cpu.h b/target-lm32/cpu.h
> index 4efe98d828f8..5f78c3c619e5 100644
> --- a/target-lm32/cpu.h
> +++ b/target-lm32/cpu.h
> @@ -271,4 +271,18 @@ static inline void cpu_get_tb_cpu_state(CPULM32State *env, target_ulong *pc,
>      *flags = 0;
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  #endif
> diff --git a/target-m68k/cpu.h b/target-m68k/cpu.h
> index 908776999797..c44f4339e40a 100644
> --- a/target-m68k/cpu.h
> +++ b/target-m68k/cpu.h
> @@ -269,4 +269,18 @@ static inline void cpu_get_tb_cpu_state(CPUM68KState *env, target_ulong *pc,
>              | ((env->macsr >> 4) & 0xf);        /* Bits 0-3 */
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  #endif
> diff --git a/target-microblaze/cpu.h b/target-microblaze/cpu.h
> index 16815dfc6abc..22362efce586 100644
> --- a/target-microblaze/cpu.h
> +++ b/target-microblaze/cpu.h
> @@ -371,6 +371,20 @@ static inline void cpu_get_tb_cpu_state(CPUMBState *env, target_ulong *pc,
>                   (env->sregs[SR_MSR] & (MSR_UM | MSR_VM | MSR_EE));
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  #if !defined(CONFIG_USER_ONLY)
>  void mb_cpu_unassigned_access(CPUState *cpu, hwaddr addr,
>                                bool is_write, bool is_exec, int is_asi,
> diff --git a/target-mips/cpu.h b/target-mips/cpu.h
> index 9c4fc816bf9b..a887154274b3 100644
> --- a/target-mips/cpu.h
> +++ b/target-mips/cpu.h
> @@ -901,6 +901,20 @@ static inline void cpu_get_tb_cpu_state(CPUMIPSState *env, target_ulong *pc,
>                              MIPS_HFLAG_HWRENA_ULR);
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  static inline int mips_vpe_active(CPUMIPSState *env)
>  {
>      int active = 1;
> diff --git a/target-moxie/cpu.h b/target-moxie/cpu.h
> index 63d5cafc5528..f0f95a2701e1 100644
> --- a/target-moxie/cpu.h
> +++ b/target-moxie/cpu.h
> @@ -136,6 +136,20 @@ static inline void cpu_get_tb_cpu_state(CPUMoxieState *env, target_ulong *pc,
>      *flags = 0;
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  int moxie_cpu_handle_mmu_fault(CPUState *cpu, vaddr address,
>                                 int rw, int mmu_idx);
>
> diff --git a/target-openrisc/cpu.h b/target-openrisc/cpu.h
> index 9451a7cca691..a5650cbed4b7 100644
> --- a/target-openrisc/cpu.h
> +++ b/target-openrisc/cpu.h
> @@ -398,6 +398,20 @@ static inline void cpu_get_tb_cpu_state(CPUOpenRISCState *env,
>      *flags = (env->flags & D_FLAG);
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  static inline int cpu_mmu_index(CPUOpenRISCState *env, bool ifetch)
>  {
>      if (!(env->sr & SR_IME)) {
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index 2666a3f80d00..092c7954d19a 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -2294,6 +2294,20 @@ static inline void cpu_get_tb_cpu_state(CPUPPCState *env, target_ulong *pc,
>      *flags = env->hflags;
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  #if !defined(CONFIG_USER_ONLY)
>  static inline int booke206_tlbm_id(CPUPPCState *env, ppcmas_tlb_t *tlbm)
>  {
> diff --git a/target-s390x/cpu.h b/target-s390x/cpu.h
> index 8bcb0f75f399..574cdb51c967 100644
> --- a/target-s390x/cpu.h
> +++ b/target-s390x/cpu.h
> @@ -393,6 +393,20 @@ static inline void cpu_get_tb_cpu_state(CPUS390XState* env, target_ulong *pc,
>               ((env->psw.mask & PSW_MASK_32) ? FLAG_MASK_32 : 0);
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  /* While the PoO talks about ILC (a number between 1-3) what is actually
>     stored in LowCore is shifted left one bit (an even between 2-6).  As
>     this is the actual length of the insn and therefore more useful, that
> diff --git a/target-sh4/cpu.h b/target-sh4/cpu.h
> index 3f9dae2d1f0d..bcddbf31c853 100644
> --- a/target-sh4/cpu.h
> +++ b/target-sh4/cpu.h
> @@ -387,4 +387,18 @@ static inline void cpu_get_tb_cpu_state(CPUSH4State *env, target_ulong *pc,
>              | (env->movcal_backup ? TB_FLAG_PENDING_MOVCA : 0); /* Bit 4 */
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  #endif				/* _CPU_SH4_H */
> diff --git a/target-sparc/cpu.h b/target-sparc/cpu.h
> index 604de84624ca..09df29b7b439 100644
> --- a/target-sparc/cpu.h
> +++ b/target-sparc/cpu.h
> @@ -749,6 +749,20 @@ static inline void cpu_get_tb_cpu_state(CPUSPARCState *env, target_ulong *pc,
>  #endif
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1; /* npc must be a multiple of 4 */
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  static inline bool tb_fpu_enabled(int tb_flags)
>  {
>  #if defined(CONFIG_USER_ONLY)
> diff --git a/target-sparc/translate.c b/target-sparc/translate.c
> index 0f4faf70624f..cc160bc67be0 100644
> --- a/target-sparc/translate.c
> +++ b/target-sparc/translate.c
> @@ -38,6 +38,7 @@
>  #define DYNAMIC_PC  1 /* dynamic pc value */
>  #define JUMP_PC     2 /* dynamic pc value which takes only two values
>                           according to jump_pc[T2] */
> +/* NOTE: -1 is reserved for cpu_get_invalid_tb_cpu_state() */
>
>  /* global register indexes */
>  static TCGv_env cpu_env;
> diff --git a/target-tilegx/cpu.h b/target-tilegx/cpu.h
> index d74032925b12..79f84a422de1 100644
> --- a/target-tilegx/cpu.h
> +++ b/target-tilegx/cpu.h
> @@ -174,4 +174,18 @@ static inline void cpu_get_tb_cpu_state(CPUTLGState *env, target_ulong *pc,
>      *flags = 0;
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  #endif
> diff --git a/target-tricore/cpu.h b/target-tricore/cpu.h
> index a298d63eea24..8f3ca20241b7 100644
> --- a/target-tricore/cpu.h
> +++ b/target-tricore/cpu.h
> @@ -410,6 +410,20 @@ static inline void cpu_get_tb_cpu_state(CPUTriCoreState *env, target_ulong *pc,
>      *flags = 0;
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  TriCoreCPU *cpu_tricore_init(const char *cpu_model);
>
>  #define cpu_init(cpu_model) CPU(cpu_tricore_init(cpu_model))
> diff --git a/target-unicore32/cpu.h b/target-unicore32/cpu.h
> index 83f758496a6c..6bcda6aa3e38 100644
> --- a/target-unicore32/cpu.h
> +++ b/target-unicore32/cpu.h
> @@ -179,6 +179,20 @@ static inline void cpu_get_tb_cpu_state(CPUUniCore32State *env, target_ulong *pc
>      }
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  int uc32_cpu_handle_mmu_fault(CPUState *cpu, vaddr address, int rw,
>                                int mmu_idx);
>  void uc32_translate_init(void);
> diff --git a/target-xtensa/cpu.h b/target-xtensa/cpu.h
> index ce9fb5b0033a..bd0a3b1deddf 100644
> --- a/target-xtensa/cpu.h
> +++ b/target-xtensa/cpu.h
> @@ -582,6 +582,20 @@ static inline void cpu_get_tb_cpu_state(CPUXtensaState *env, target_ulong *pc,
>      }
>  }
>
> +static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
> +                                                target_ulong *cs_base,
> +                                                uint32_t *flags)
> +{
> +    *cs_base = -1;
> +}
> +
> +static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
> +                                                   target_ulong cs_base,
> +                                                   uint32_t flags)
> +{
> +    return cs_base == -1;
> +}
> +
>  #include "exec/cpu-all.h"
>
>  #endif

Otherwise:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v3 04/11] tcg: Prepare safe access to tb_flushed out of tb_lock
  2016-07-14 12:45   ` Alex Bennée
@ 2016-07-14 12:55     ` Sergey Fedorov
  2016-07-14 13:12       ` Alex Bennée
  0 siblings, 1 reply; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-14 12:55 UTC (permalink / raw)
  To: Alex Bennée, Sergey Fedorov
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, cota, bobby.prani, rth,
	patches, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Peter Crosthwaite

On 14/07/16 15:45, Alex Bennée wrote:
> Sergey Fedorov <sergey.fedorov@linaro.org> writes:
>
>> From: Sergey Fedorov <serge.fdrv@gmail.com>
>>
>> Ensure atomicity of CPU's 'tb_flushed' access for future translation
>> block lookup out of 'tb_lock'.
>>
>> This field can only be touched from another thread by tb_flush() in user
>> mode emulation. So the only accesses that need to be atomic are:
>>  * a single write in tb_flush();
>>  * reads/writes out of 'tb_lock'.
> It might be worth mentioning the barrier here.

Do you mean atomic_set() vs. atomic_mb_set()?
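For reference, the distinction maps roughly onto C11 atomics like this (a sketch of the semantics only; QEMU's actual helpers are implemented per host and differ in detail):

```c
#include <stdatomic.h>
#include <stdbool.h>

static _Atomic bool tb_flushed;

/* ~ atomic_set(): an atomic store with no ordering guarantees
 * against surrounding memory accesses. */
static void set_relaxed(bool v)
{
    atomic_store_explicit(&tb_flushed, v, memory_order_relaxed);
}

/* ~ atomic_mb_set(): the store plus a full barrier, so earlier
 * writes are visible before the flag update and later accesses
 * cannot be reordered above it. */
static void set_with_barrier(bool v)
{
    atomic_store_explicit(&tb_flushed, v, memory_order_seq_cst);
}

static bool get(void)
{
    return atomic_load_explicit(&tb_flushed, memory_order_acquire);
}
```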

>
>> In future, before enabling MTTCG in system mode, tb_flush() must be safe
>> and this field becomes unnecessary.
>>
>> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
>> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
>> ---
>>  cpu-exec.c      | 16 +++++++---------
>>  translate-all.c |  4 ++--
>>  2 files changed, 9 insertions(+), 11 deletions(-)
>>
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index d6178eab71d4..c973e3b85922 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -338,13 +338,6 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>                   tb->flags != flags)) {
>>          tb = tb_find_slow(cpu, pc, cs_base, flags);
>>      }
>> -    if (cpu->tb_flushed) {
>> -        /* Ensure that no TB jump will be modified as the
>> -         * translation buffer has been flushed.
>> -         */
>> -        *last_tb = NULL;
>> -        cpu->tb_flushed = false;
>> -    }
>>  #ifndef CONFIG_USER_ONLY
>>      /* We don't take care of direct jumps when address mapping changes in
>>       * system emulation. So it's not safe to make a direct jump to a TB
>> @@ -356,7 +349,12 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>  #endif
>>      /* See if we can patch the calling TB. */
>>      if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
>> -        tb_add_jump(last_tb, tb_exit, tb);
>> +        /* Check if translation buffer has been flushed */
>> +        if (cpu->tb_flushed) {
>> +            cpu->tb_flushed = false;
>> +        } else {
>> +            tb_add_jump(last_tb, tb_exit, tb);
>> +        }
>>      }
>>      tb_unlock();
>>      return tb;
>> @@ -618,7 +616,7 @@ int cpu_exec(CPUState *cpu)
>>              }
>>
>>              last_tb = NULL; /* forget the last executed TB after exception */
>> -            cpu->tb_flushed = false; /* reset before first TB lookup */
>> +            atomic_mb_set(&cpu->tb_flushed, false); /* reset before first TB lookup */
>>              for(;;) {
>>                  cpu_handle_interrupt(cpu, &last_tb);
>>                  tb = tb_find_fast(cpu, last_tb, tb_exit);
>> diff --git a/translate-all.c b/translate-all.c
>> index fdf520a86d68..788fed1e0765 100644
>> --- a/translate-all.c
>> +++ b/translate-all.c
>> @@ -845,7 +845,6 @@ void tb_flush(CPUState *cpu)
>>          > tcg_ctx.code_gen_buffer_size) {
>>          cpu_abort(cpu, "Internal error: code buffer overflow\n");
>>      }
>> -    tcg_ctx.tb_ctx.nb_tbs = 0;
>>
>>      CPU_FOREACH(cpu) {
>>          int i;
>> @@ -853,9 +852,10 @@ void tb_flush(CPUState *cpu)
>>          for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
>>              atomic_set(&cpu->tb_jmp_cache[i], NULL);
>>          }
>> -        cpu->tb_flushed = true;
>> +        atomic_mb_set(&cpu->tb_flushed, true);
>>      }
>>
>> +    tcg_ctx.tb_ctx.nb_tbs = 0;
>>      qht_reset_size(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
> I can see the sense of moving the setting of nb_tbs but is it strictly
> required as part of this patch?

Yes, otherwise tb_alloc() may start allocating TBs from the beginning of
the translation buffer before 'tb_flushed' is updated.
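The ordering argument can be modelled in a few lines (illustrative only; the names follow the patch, but the bodies are heavily simplified and single-threaded):

```c
#include <stdbool.h>

/* Simplified model of the store ordering in tb_flush(): if nb_tbs
 * were reset to 0 before the per-CPU flags were set, a concurrent
 * tb_alloc() could start handing out recycled TBs to a CPU that
 * has not yet observed the flush. */
static int nb_tbs = 42;          /* allocator high-water mark */
static bool cpu_tb_flushed;      /* per-CPU flush flag (one CPU here) */

static void model_tb_flush(void)
{
    /* 1. Mark every CPU first (stands in for atomic_mb_set()). */
    cpu_tb_flushed = true;
    /* 2. Only then make the translation buffer reusable. */
    nb_tbs = 0;
}
```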

Kind regards,
Sergey

>
>>      page_flush_tb();
> Otherwise:
>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>
> --
> Alex Bennée


* Re: [Qemu-devel] [PATCH v3 07/11] tcg: Prepare TB invalidation for lockless TB lookup
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 07/11] tcg: Prepare TB invalidation for lockless TB lookup Sergey Fedorov
@ 2016-07-14 12:59   ` Alex Bennée
  2016-07-14 13:11     ` Sergey Fedorov
  0 siblings, 1 reply; 49+ messages in thread
From: Alex Bennée @ 2016-07-14 12:59 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth, patches, mark.burton, pbonzini, jan.kiszka,
	peter.maydell, claudio.fontana, Peter Crosthwaite


Sergey Fedorov <sergey.fedorov@linaro.org> writes:

> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> When invalidating a translation block, set an invalid CPU state into the
> TranslationBlock structure first. All subsequent changes are ordered
> after it with smp_wmb(). This pairs with implied smp_rmb() of
> qht_lookup() in tb_find_physical().
>
> As soon as the TB is marked with an invalid CPU state, there is no need
> to remove it from CPU's 'tb_jmp_cache'. However it will be necessary to
> recheck whether the target TB is still valid after acquiring 'tb_lock'
> but before calling tb_add_jump() since TB lookup is to be performed out
> of 'tb_lock' in the future. Note that we don't have to check 'last_tb',
> since it is safe to patch an already invalidated TB: it will not be
> executed anyway.

This looks fine. So I guess it's possible for the block to have been
looked up just before it gets invalidated and get one final run but not
get patched?

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

>
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
> ---
>  cpu-exec.c      |  2 +-
>  translate-all.c | 12 +++---------
>  2 files changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index c973e3b85922..07dc50c56e8d 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -352,7 +352,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>          /* Check if translation buffer has been flushed */
>          if (cpu->tb_flushed) {
>              cpu->tb_flushed = false;
> -        } else {
> +        } else if (!tb_is_invalid(tb)) {
>              tb_add_jump(last_tb, tb_exit, tb);
>          }
>      }
> diff --git a/translate-all.c b/translate-all.c
> index 788fed1e0765..ee8308209350 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -986,11 +986,13 @@ static inline void tb_jmp_unlink(TranslationBlock *tb)
>  /* invalidate one TB */
>  void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>  {
> -    CPUState *cpu;
>      PageDesc *p;
>      uint32_t h;
>      tb_page_addr_t phys_pc;
>
> +    tb_mark_invalid(tb);
> +    smp_wmb();
> +
>      /* remove the TB from the hash list */
>      phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
>      h = tb_hash_func(phys_pc, tb->pc, tb->flags);
> @@ -1008,14 +1010,6 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>          invalidate_page_bitmap(p);
>      }
>
> -    /* remove the TB from the hash list */
> -    h = tb_jmp_cache_hash_func(tb->pc);
> -    CPU_FOREACH(cpu) {
> -        if (atomic_read(&cpu->tb_jmp_cache[h]) == tb) {
> -            atomic_set(&cpu->tb_jmp_cache[h], NULL);
> -        }
> -    }
> -
>      /* suppress this TB from the two jump lists */
>      tb_remove_from_jmp_list(tb, 0);
>      tb_remove_from_jmp_list(tb, 1);


--
Alex Bennée

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Qemu-devel] [PATCH v3 06/11] tcg: Introduce tb_mark_invalid() and tb_is_invalid()
  2016-07-14 12:53   ` Alex Bennée
@ 2016-07-14 13:00     ` Sergey Fedorov
  2016-07-14 13:12       ` Paolo Bonzini
  2016-07-14 13:15       ` Alex Bennée
  0 siblings, 2 replies; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-14 13:00 UTC (permalink / raw)
  To: Alex Bennée, Sergey Fedorov
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, cota, bobby.prani, rth,
	patches, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Peter Crosthwaite, Edgar E. Iglesias,
	Eduardo Habkost, Michael Walle, Aurelien Jarno, Leon Alrae,
	Anthony Green, Jia Liu, David Gibson, Alexander Graf,
	Mark Cave-Ayland, Artyom Tarasenko, Bastian Koppelmann,
	Guan Xuetao, Max Filippov, qemu-arm, qemu-ppc

On 14/07/16 15:53, Alex Bennée wrote:
> Sergey Fedorov <sergey.fedorov@linaro.org> writes:
>
>> From: Sergey Fedorov <serge.fdrv@gmail.com>
>>
>> These functions will be used to make translation block invalidation safe
>> with concurrent lockless lookup in the global hash table.
>>
>> Most targets don't use 'cs_base', so marking a TB as invalid is as simple
>> as assigning -1 to 'cs_base'. The SPARC target stores the next program
>> counter into 'cs_base', and -1 is a fine invalid value since the PC must
>> be a multiple of 4 on SPARC. The only odd target is i386, for which a
>> special flag is introduced in place of the removed 'HF_SOFTMMU_MASK'.
>>
>> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
>> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
>> ---
>>  include/exec/exec-all.h  | 10 ++++++++++
>>  target-alpha/cpu.h       | 14 ++++++++++++++
>>  target-arm/cpu.h         | 14 ++++++++++++++
>>  target-cris/cpu.h        | 14 ++++++++++++++
>>  target-i386/cpu.h        | 17 +++++++++++++++++
>>  target-lm32/cpu.h        | 14 ++++++++++++++
>>  target-m68k/cpu.h        | 14 ++++++++++++++
>>  target-microblaze/cpu.h  | 14 ++++++++++++++
>>  target-mips/cpu.h        | 14 ++++++++++++++
>>  target-moxie/cpu.h       | 14 ++++++++++++++
>>  target-openrisc/cpu.h    | 14 ++++++++++++++
>>  target-ppc/cpu.h         | 14 ++++++++++++++
>>  target-s390x/cpu.h       | 14 ++++++++++++++
>>  target-sh4/cpu.h         | 14 ++++++++++++++
>>  target-sparc/cpu.h       | 14 ++++++++++++++
>>  target-sparc/translate.c |  1 +
>>  target-tilegx/cpu.h      | 14 ++++++++++++++
>>  target-tricore/cpu.h     | 14 ++++++++++++++
>>  target-unicore32/cpu.h   | 14 ++++++++++++++
>>  target-xtensa/cpu.h      | 14 ++++++++++++++
>>  20 files changed, 266 insertions(+)
>>
>> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
>> index db79ab65cebe..61cc3a1fb8f7 100644
>> --- a/include/exec/exec-all.h
>> +++ b/include/exec/exec-all.h
>> @@ -256,6 +256,16 @@ void tb_free(TranslationBlock *tb);
>>  void tb_flush(CPUState *cpu);
>>  void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
>>
>> +static inline void tb_mark_invalid(TranslationBlock *tb)
>> +{
>> +    cpu_get_invalid_tb_cpu_state(&tb->pc, &tb->cs_base, &tb->flags);
> Hmmm are we getting something here? Maybe cpu_tb_invalidate_cpu_state?

Just to be similar to cpu_get_tb_cpu_state().

>
>> +}
>> +
>> +static inline bool tb_is_invalid(TranslationBlock *tb)
>> +{
>> +    return cpu_tb_cpu_state_is_invalidated(tb->pc, tb->cs_base, tb->flags);
>> +}
> Also why are we passing three pointers to parts of TranslationBlock? Why
> not just pass tb directly and be done with it?

I'm not sure we want to include exec/exec-all.h in target-*/cpu.h

>
> I'm sure the compiler does something sensible, but it seems overly
> verbose to me.
>
>> +
>>  #if defined(USE_DIRECT_JUMP)
>>
>>  #if defined(CONFIG_TCG_INTERPRETER)
(snip)
> Otherwise:
>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Qemu-devel] [PATCH v3 10/11] tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump()
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 10/11] tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump() Sergey Fedorov
@ 2016-07-14 13:01   ` Alex Bennée
  2016-07-14 13:13     ` Sergey Fedorov
  0 siblings, 1 reply; 49+ messages in thread
From: Alex Bennée @ 2016-07-14 13:01 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth, patches, mark.burton, pbonzini, jan.kiszka,
	peter.maydell, claudio.fontana, Peter Crosthwaite


Sergey Fedorov <sergey.fedorov@linaro.org> writes:

> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

Much better than my cack-handed attempt to clean this up ;-)

TBH I'd be up for merging this with patch 11 but I'm happy to defer to
the maintainers on this one.

> ---
>  cpu-exec.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 4eabd534aba0..22c672fe03fd 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -281,7 +281,8 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
>  static TranslationBlock *tb_find_slow(CPUState *cpu,
>                                        target_ulong pc,
>                                        target_ulong cs_base,
> -                                      uint32_t flags)
> +                                      uint32_t flags,
> +                                      bool *have_tb_lock)
>  {
>      TranslationBlock *tb;
>
> @@ -294,6 +295,7 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
>           */
>          mmap_lock();
>          tb_lock();
> +        *have_tb_lock = true;
>
>          /* There's a chance that our desired tb has been translated while
>           * taking the locks so we check again inside the lock.
> @@ -304,7 +306,6 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
>              tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
>          }
>
> -        tb_unlock();
>          mmap_unlock();
>      }
>
> @@ -321,6 +322,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>      TranslationBlock *tb;
>      target_ulong cs_base, pc;
>      uint32_t flags;
> +    bool have_tb_lock = false;
>
>      /* we record a subset of the CPU state. It will
>         always be the same before a given translated block
> @@ -329,7 +331,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>      tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
>      if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>                   tb->flags != flags)) {
> -        tb = tb_find_slow(cpu, pc, cs_base, flags);
> +        tb = tb_find_slow(cpu, pc, cs_base, flags, &have_tb_lock);
>      }
>  #ifndef CONFIG_USER_ONLY
>      /* We don't take care of direct jumps when address mapping changes in
> @@ -342,13 +344,18 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>  #endif
>      /* See if we can patch the calling TB. */
>      if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
> -        tb_lock();
> +        if (!have_tb_lock) {
> +            tb_lock();
> +            have_tb_lock = true;
> +        }
>          /* Check if translation buffer has been flushed */
>          if (cpu->tb_flushed) {
>              cpu->tb_flushed = false;
>          } else if (!tb_is_invalid(tb)) {
>              tb_add_jump(last_tb, tb_exit, tb);
>          }
> +    }
> +    if (have_tb_lock) {
>          tb_unlock();
>      }
>      return tb;


--
Alex Bennée

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Qemu-devel] [PATCH v3 11/11] tcg: Merge tb_find_slow() and tb_find_fast()
  2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 11/11] tcg: Merge tb_find_slow() and tb_find_fast() Sergey Fedorov
@ 2016-07-14 13:02   ` Alex Bennée
  0 siblings, 0 replies; 49+ messages in thread
From: Alex Bennée @ 2016-07-14 13:02 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth, patches, mark.burton, pbonzini, jan.kiszka,
	peter.maydell, claudio.fontana, Peter Crosthwaite


Sergey Fedorov <sergey.fedorov@linaro.org> writes:

> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> These functions are not too big and can be merged together. This makes
> the locking scheme clearer and easier to follow.
>
> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


c.f. comments on 10/11.

> ---
>  cpu-exec.c | 72 ++++++++++++++++++++++++++------------------------------------
>  1 file changed, 30 insertions(+), 42 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 22c672fe03fd..6b01e8ceb0e8 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -278,45 +278,9 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
>      return qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h);
>  }
>
> -static TranslationBlock *tb_find_slow(CPUState *cpu,
> -                                      target_ulong pc,
> -                                      target_ulong cs_base,
> -                                      uint32_t flags,
> -                                      bool *have_tb_lock)
> -{
> -    TranslationBlock *tb;
> -
> -    tb = tb_find_physical(cpu, pc, cs_base, flags);
> -    if (!tb) {
> -
> -        /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
> -         * taken outside tb_lock. As system emulation is currently
> -         * single threaded the locks are NOPs.
> -         */
> -        mmap_lock();
> -        tb_lock();
> -        *have_tb_lock = true;
> -
> -        /* There's a chance that our desired tb has been translated while
> -         * taking the locks so we check again inside the lock.
> -         */
> -        tb = tb_find_physical(cpu, pc, cs_base, flags);
> -        if (!tb) {
> -            /* if no translated code available, then translate it now */
> -            tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
> -        }
> -
> -        mmap_unlock();
> -    }
> -
> -    /* We add the TB in the virtual pc hash table for the fast lookup */
> -    atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
> -    return tb;
> -}
> -
> -static inline TranslationBlock *tb_find_fast(CPUState *cpu,
> -                                             TranslationBlock *last_tb,
> -                                             int tb_exit)
> +static inline TranslationBlock *tb_find(CPUState *cpu,
> +                                        TranslationBlock *last_tb,
> +                                        int tb_exit)
>  {
>      CPUArchState *env = (CPUArchState *)cpu->env_ptr;
>      TranslationBlock *tb;
> @@ -331,7 +295,31 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>      tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
>      if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>                   tb->flags != flags)) {
> -        tb = tb_find_slow(cpu, pc, cs_base, flags, &have_tb_lock);
> +        tb = tb_find_physical(cpu, pc, cs_base, flags);
> +        if (!tb) {
> +
> +            /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
> +             * taken outside tb_lock. As system emulation is currently
> +             * single threaded the locks are NOPs.
> +             */
> +            mmap_lock();
> +            tb_lock();
> +            have_tb_lock = true;
> +
> +            /* There's a chance that our desired tb has been translated while
> +             * taking the locks so we check again inside the lock.
> +             */
> +            tb = tb_find_physical(cpu, pc, cs_base, flags);
> +            if (!tb) {
> +                /* if no translated code available, then translate it now */
> +                tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
> +            }
> +
> +            mmap_unlock();
> +        }
> +
> +        /* We add the TB in the virtual pc hash table for the fast lookup */
> +        atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
>      }
>  #ifndef CONFIG_USER_ONLY
>      /* We don't take care of direct jumps when address mapping changes in
> @@ -436,7 +424,7 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>      } else if (replay_has_exception()
>                 && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
>          /* try to cause an exception pending in the log */
> -        cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, NULL, 0), true);
> +        cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
>          *ret = -1;
>          return true;
>  #endif
> @@ -620,7 +608,7 @@ int cpu_exec(CPUState *cpu)
>              atomic_mb_set(&cpu->tb_flushed, false); /* reset before first TB lookup */
>              for(;;) {
>                  cpu_handle_interrupt(cpu, &last_tb);
> -                tb = tb_find_fast(cpu, last_tb, tb_exit);
> +                tb = tb_find(cpu, last_tb, tb_exit);
>                  cpu_loop_exec_tb(cpu, tb, &last_tb, &tb_exit, &sc);
>                  /* Try to align the host and virtual clocks
>                     if the guest is in advance */


--
Alex Bennée

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Qemu-devel] [PATCH v3 07/11] tcg: Prepare TB invalidation for lockless TB lookup
  2016-07-14 12:59   ` Alex Bennée
@ 2016-07-14 13:11     ` Sergey Fedorov
  0 siblings, 0 replies; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-14 13:11 UTC (permalink / raw)
  To: Alex Bennée, Sergey Fedorov
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, cota, bobby.prani, rth,
	patches, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Peter Crosthwaite

On 14/07/16 15:59, Alex Bennée wrote:
> Sergey Fedorov <sergey.fedorov@linaro.org> writes:
>
>> From: Sergey Fedorov <serge.fdrv@gmail.com>
>>
>> When invalidating a translation block, set an invalid CPU state into the
>> TranslationBlock structure first. All subsequent changes are ordered
>> after it with smp_wmb(). This pairs with implied smp_rmb() of
>> qht_lookup() in tb_find_physical().
>>
>> As soon as the TB is marked with an invalid CPU state, there is no need
>> to remove it from the CPU's 'tb_jmp_cache'. However, it will be necessary
>> to recheck whether the target TB is still valid after acquiring 'tb_lock'
>> but before calling tb_add_jump(), since TB lookup is to be performed out
>> of 'tb_lock' in future. Note that we don't have to check 'last_tb': it is
>> safe to patch an already invalidated TB because it will not be executed
>> anyway.
> This looks fine. So I guess it's possible for the block to have been
> looked up just before it gets invalidated and get one final run but not
> get patched?

Yes, exactly.

Thanks,
Sergey

>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>
>> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
>> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
>> ---
>>  cpu-exec.c      |  2 +-
>>  translate-all.c | 12 +++---------
>>  2 files changed, 4 insertions(+), 10 deletions(-)
>>
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index c973e3b85922..07dc50c56e8d 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -352,7 +352,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>          /* Check if translation buffer has been flushed */
>>          if (cpu->tb_flushed) {
>>              cpu->tb_flushed = false;
>> -        } else {
>> +        } else if (!tb_is_invalid(tb)) {
>>              tb_add_jump(last_tb, tb_exit, tb);
>>          }
>>      }
>> diff --git a/translate-all.c b/translate-all.c
>> index 788fed1e0765..ee8308209350 100644
>> --- a/translate-all.c
>> +++ b/translate-all.c
>> @@ -986,11 +986,13 @@ static inline void tb_jmp_unlink(TranslationBlock *tb)
>>  /* invalidate one TB */
>>  void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>>  {
>> -    CPUState *cpu;
>>      PageDesc *p;
>>      uint32_t h;
>>      tb_page_addr_t phys_pc;
>>
>> +    tb_mark_invalid(tb);
>> +    smp_wmb();
>> +
>>      /* remove the TB from the hash list */
>>      phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
>>      h = tb_hash_func(phys_pc, tb->pc, tb->flags);
>> @@ -1008,14 +1010,6 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>>          invalidate_page_bitmap(p);
>>      }
>>
>> -    /* remove the TB from the hash list */
>> -    h = tb_jmp_cache_hash_func(tb->pc);
>> -    CPU_FOREACH(cpu) {
>> -        if (atomic_read(&cpu->tb_jmp_cache[h]) == tb) {
>> -            atomic_set(&cpu->tb_jmp_cache[h], NULL);
>> -        }
>> -    }
>> -
>>      /* suppress this TB from the two jump lists */
>>      tb_remove_from_jmp_list(tb, 0);
>>      tb_remove_from_jmp_list(tb, 1);
>
> --
> Alex Bennée

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Qemu-devel] [PATCH v3 04/11] tcg: Prepare safe access to tb_flushed out of tb_lock
  2016-07-14 12:55     ` Sergey Fedorov
@ 2016-07-14 13:12       ` Alex Bennée
  0 siblings, 0 replies; 49+ messages in thread
From: Alex Bennée @ 2016-07-14 13:12 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: Sergey Fedorov, qemu-devel, mttcg, fred.konrad, a.rigo, cota,
	bobby.prani, rth, patches, mark.burton, pbonzini, jan.kiszka,
	peter.maydell, claudio.fontana, Peter Crosthwaite


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 14/07/16 15:45, Alex Bennée wrote:
>> Sergey Fedorov <sergey.fedorov@linaro.org> writes:
>>
>>> From: Sergey Fedorov <serge.fdrv@gmail.com>
>>>
>>> Ensure atomicity of CPU's 'tb_flushed' access for future translation
>>> block lookup out of 'tb_lock'.
>>>
>>> This field can only be touched from another thread by tb_flush() in user
>>> mode emulation. So the only accesses that need to be atomic are:
>>>  * a single write in tb_flush();
>>>  * reads/writes out of 'tb_lock'.
>> It might worth mentioning the barrier here.
>
> Do you mean atomic_set() vs. atomic_mb_set()?

Yes.

>
>>
>>> In future, before enabling MTTCG in system mode, tb_flush() must be safe
>>> and this field becomes unnecessary.
>>>
>>> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
>>> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
>>> ---
>>>  cpu-exec.c      | 16 +++++++---------
>>>  translate-all.c |  4 ++--
>>>  2 files changed, 9 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/cpu-exec.c b/cpu-exec.c
>>> index d6178eab71d4..c973e3b85922 100644
>>> --- a/cpu-exec.c
>>> +++ b/cpu-exec.c
>>> @@ -338,13 +338,6 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>>                   tb->flags != flags)) {
>>>          tb = tb_find_slow(cpu, pc, cs_base, flags);
>>>      }
>>> -    if (cpu->tb_flushed) {
>>> -        /* Ensure that no TB jump will be modified as the
>>> -         * translation buffer has been flushed.
>>> -         */
>>> -        *last_tb = NULL;
>>> -        cpu->tb_flushed = false;
>>> -    }
>>>  #ifndef CONFIG_USER_ONLY
>>>      /* We don't take care of direct jumps when address mapping changes in
>>>       * system emulation. So it's not safe to make a direct jump to a TB
>>> @@ -356,7 +349,12 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>>  #endif
>>>      /* See if we can patch the calling TB. */
>>>      if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
>>> -        tb_add_jump(last_tb, tb_exit, tb);
>>> +        /* Check if translation buffer has been flushed */
>>> +        if (cpu->tb_flushed) {
>>> +            cpu->tb_flushed = false;
>>> +        } else {
>>> +            tb_add_jump(last_tb, tb_exit, tb);
>>> +        }
>>>      }
>>>      tb_unlock();
>>>      return tb;
>>> @@ -618,7 +616,7 @@ int cpu_exec(CPUState *cpu)
>>>              }
>>>
>>>              last_tb = NULL; /* forget the last executed TB after exception */
>>> -            cpu->tb_flushed = false; /* reset before first TB lookup */
>>> +            atomic_mb_set(&cpu->tb_flushed, false); /* reset before first TB lookup */
>>>              for(;;) {
>>>                  cpu_handle_interrupt(cpu, &last_tb);
>>>                  tb = tb_find_fast(cpu, last_tb, tb_exit);
>>> diff --git a/translate-all.c b/translate-all.c
>>> index fdf520a86d68..788fed1e0765 100644
>>> --- a/translate-all.c
>>> +++ b/translate-all.c
>>> @@ -845,7 +845,6 @@ void tb_flush(CPUState *cpu)
>>>          > tcg_ctx.code_gen_buffer_size) {
>>>          cpu_abort(cpu, "Internal error: code buffer overflow\n");
>>>      }
>>> -    tcg_ctx.tb_ctx.nb_tbs = 0;
>>>
>>>      CPU_FOREACH(cpu) {
>>>          int i;
>>> @@ -853,9 +852,10 @@ void tb_flush(CPUState *cpu)
>>>          for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
>>>              atomic_set(&cpu->tb_jmp_cache[i], NULL);
>>>          }
>>> -        cpu->tb_flushed = true;
>>> +        atomic_mb_set(&cpu->tb_flushed, true);
>>>      }
>>>
>>> +    tcg_ctx.tb_ctx.nb_tbs = 0;
>>>      qht_reset_size(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
>> I can see the sense of moving the setting of nb_tbs but is it strictly
>> required as part of this patch?
>
> Yes, otherwise tb_alloc() may start allocating TBs from the beginning of
> the translation buffer before 'tb_flushed' is updated.

Ahh yes I see. Thanks

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


>
> Kind regards,
> Sergey
>
>>
>>>      page_flush_tb();
>> Otherwise:
>>
>> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>>
>> --
>> Alex Bennée


--
Alex Bennée

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Qemu-devel] [PATCH v3 06/11] tcg: Introduce tb_mark_invalid() and tb_is_invalid()
  2016-07-14 13:00     ` Sergey Fedorov
@ 2016-07-14 13:12       ` Paolo Bonzini
  2016-07-14 13:15       ` Alex Bennée
  1 sibling, 0 replies; 49+ messages in thread
From: Paolo Bonzini @ 2016-07-14 13:12 UTC (permalink / raw)
  To: Sergey Fedorov, Alex Bennée, Sergey Fedorov
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, cota, bobby.prani, rth,
	patches, mark.burton, jan.kiszka, peter.maydell, claudio.fontana,
	Peter Crosthwaite, Edgar E. Iglesias, Eduardo Habkost,
	Michael Walle, Aurelien Jarno, Leon Alrae, Anthony Green,
	Jia Liu, David Gibson, Alexander Graf, Mark Cave-Ayland,
	Artyom Tarasenko, Bastian Koppelmann, Guan Xuetao, Max Filippov,
	qemu-arm, qemu-ppc



On 14/07/2016 15:00, Sergey Fedorov wrote:
> > > +}
> > > +
> > > +static inline bool tb_is_invalid(TranslationBlock *tb)
> > > +{
> > > +    return cpu_tb_cpu_state_is_invalidated(tb->pc, tb->cs_base, tb->flags);
> > > +}
> > Also why are we passing three pointers to parts of TranslationBlock? Why
> > not just pass tb directly and be done with it?
> 
> I'm not sure we want to include exec/exec-all.h in target-*/cpu.h

We don't, exec/exec-all.h is TCG-specific while cpu.h isn't.
Implementing tb_mark_invalid/tb_is_invalid in target-* without the
indirection would be possible, but it would require splitting that out
into a new header such as target-*/exec.h.

Paolo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Qemu-devel] [PATCH v3 10/11] tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump()
  2016-07-14 13:01   ` Alex Bennée
@ 2016-07-14 13:13     ` Sergey Fedorov
  0 siblings, 0 replies; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-14 13:13 UTC (permalink / raw)
  To: Alex Bennée, Sergey Fedorov
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, cota, bobby.prani, rth,
	patches, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Peter Crosthwaite

On 14/07/16 16:01, Alex Bennée wrote:
> Sergey Fedorov <sergey.fedorov@linaro.org> writes:
>
>> From: Sergey Fedorov <serge.fdrv@gmail.com>
>>
>> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
>> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>
> Much better than my cack-hander attempt to clean this up ;-)
>
> TBH I'd be up for merging this with patch 11 but I'm happy to defer to
> the maintainers on this one.

I just split them because I was not sure if both are acceptable.

Thanks,
Sergey

>
>> ---
>>  cpu-exec.c | 15 +++++++++++----
>>  1 file changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index 4eabd534aba0..22c672fe03fd 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -281,7 +281,8 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
>>  static TranslationBlock *tb_find_slow(CPUState *cpu,
>>                                        target_ulong pc,
>>                                        target_ulong cs_base,
>> -                                      uint32_t flags)
>> +                                      uint32_t flags,
>> +                                      bool *have_tb_lock)
>>  {
>>      TranslationBlock *tb;
>>
>> @@ -294,6 +295,7 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
>>           */
>>          mmap_lock();
>>          tb_lock();
>> +        *have_tb_lock = true;
>>
>>          /* There's a chance that our desired tb has been translated while
>>           * taking the locks so we check again inside the lock.
>> @@ -304,7 +306,6 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
>>              tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
>>          }
>>
>> -        tb_unlock();
>>          mmap_unlock();
>>      }
>>
>> @@ -321,6 +322,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>      TranslationBlock *tb;
>>      target_ulong cs_base, pc;
>>      uint32_t flags;
>> +    bool have_tb_lock = false;
>>
>>      /* we record a subset of the CPU state. It will
>>         always be the same before a given translated block
>> @@ -329,7 +331,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>      tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
>>      if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>>                   tb->flags != flags)) {
>> -        tb = tb_find_slow(cpu, pc, cs_base, flags);
>> +        tb = tb_find_slow(cpu, pc, cs_base, flags, &have_tb_lock);
>>      }
>>  #ifndef CONFIG_USER_ONLY
>>      /* We don't take care of direct jumps when address mapping changes in
>> @@ -342,13 +344,18 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>  #endif
>>      /* See if we can patch the calling TB. */
>>      if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
>> -        tb_lock();
>> +        if (!have_tb_lock) {
>> +            tb_lock();
>> +            have_tb_lock = true;
>> +        }
>>          /* Check if translation buffer has been flushed */
>>          if (cpu->tb_flushed) {
>>              cpu->tb_flushed = false;
>>          } else if (!tb_is_invalid(tb)) {
>>              tb_add_jump(last_tb, tb_exit, tb);
>>          }
>> +    }
>> +    if (have_tb_lock) {
>>          tb_unlock();
>>      }
>>      return tb;
>
> --
> Alex Bennée

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path
  2016-07-14 12:10     ` Paolo Bonzini
@ 2016-07-14 13:13       ` Alex Bennée
  0 siblings, 0 replies; 49+ messages in thread
From: Alex Bennée @ 2016-07-14 13:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sergey Fedorov, qemu-devel, mttcg, fred.konrad, a.rigo,
	serge.fdrv, cota, bobby.prani, rth, peter.maydell, patches,
	claudio.fontana, mark.burton, jan.kiszka


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 14/07/2016 14:02, Alex Bennée wrote:
>> Looking at the perf data it looks like the hotest part of the code now
>> is cpu_get_tb_cpu_state which is required to get the initial hash to
>> search for the next tb.
>
> It should be possible to keep the flags up-to-date in a separate field
> of CPUARMState, similar to what x86 does.

Possibly; there is a tension between lazy and always-up-to-date versions
of the flags. But this is work for another day, I think ;-)

>
> Paolo


--
Alex Bennée

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Qemu-devel] [PATCH v3 06/11] tcg: Introduce tb_mark_invalid() and tb_is_invalid()
  2016-07-14 13:00     ` Sergey Fedorov
  2016-07-14 13:12       ` Paolo Bonzini
@ 2016-07-14 13:15       ` Alex Bennée
  1 sibling, 0 replies; 49+ messages in thread
From: Alex Bennée @ 2016-07-14 13:15 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: Sergey Fedorov, qemu-devel, mttcg, fred.konrad, a.rigo, cota,
	bobby.prani, rth, patches, mark.burton, pbonzini, jan.kiszka,
	peter.maydell, claudio.fontana, Peter Crosthwaite,
	Edgar E. Iglesias, Eduardo Habkost, Michael Walle,
	Aurelien Jarno, Leon Alrae, Anthony Green, Jia Liu, David Gibson,
	Alexander Graf, Mark Cave-Ayland, Artyom Tarasenko,
	Bastian Koppelmann, Guan Xuetao, Max Filippov, qemu-arm, qemu-ppc


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 14/07/16 15:53, Alex Bennée wrote:
>> Sergey Fedorov <sergey.fedorov@linaro.org> writes:
>>
>>> From: Sergey Fedorov <serge.fdrv@gmail.com>
>>>
>>> These functions will be used to make translation block invalidation safe
>>> with concurrent lockless lookup in the global hash table.
>>>
>>> Most targets don't use 'cs_base', so marking a TB as invalid is as simple
>>> as assigning -1 to 'cs_base'. The SPARC target stores the next program
>>> counter into 'cs_base', and -1 is a fine invalid value since the PC must
>>> be a multiple of 4 on SPARC. The only odd target is i386, for which a
>>> special flag is introduced in place of the removed 'HF_SOFTMMU_MASK'.
>>>
>>> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
>>> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
>>> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
>>> ---
>>>  include/exec/exec-all.h  | 10 ++++++++++
>>>  target-alpha/cpu.h       | 14 ++++++++++++++
>>>  target-arm/cpu.h         | 14 ++++++++++++++
>>>  target-cris/cpu.h        | 14 ++++++++++++++
>>>  target-i386/cpu.h        | 17 +++++++++++++++++
>>>  target-lm32/cpu.h        | 14 ++++++++++++++
>>>  target-m68k/cpu.h        | 14 ++++++++++++++
>>>  target-microblaze/cpu.h  | 14 ++++++++++++++
>>>  target-mips/cpu.h        | 14 ++++++++++++++
>>>  target-moxie/cpu.h       | 14 ++++++++++++++
>>>  target-openrisc/cpu.h    | 14 ++++++++++++++
>>>  target-ppc/cpu.h         | 14 ++++++++++++++
>>>  target-s390x/cpu.h       | 14 ++++++++++++++
>>>  target-sh4/cpu.h         | 14 ++++++++++++++
>>>  target-sparc/cpu.h       | 14 ++++++++++++++
>>>  target-sparc/translate.c |  1 +
>>>  target-tilegx/cpu.h      | 14 ++++++++++++++
>>>  target-tricore/cpu.h     | 14 ++++++++++++++
>>>  target-unicore32/cpu.h   | 14 ++++++++++++++
>>>  target-xtensa/cpu.h      | 14 ++++++++++++++
>>>  20 files changed, 266 insertions(+)
>>>
>>> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
>>> index db79ab65cebe..61cc3a1fb8f7 100644
>>> --- a/include/exec/exec-all.h
>>> +++ b/include/exec/exec-all.h
>>> @@ -256,6 +256,16 @@ void tb_free(TranslationBlock *tb);
>>>  void tb_flush(CPUState *cpu);
>>>  void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
>>>
>>> +static inline void tb_mark_invalid(TranslationBlock *tb)
>>> +{
>>> +    cpu_get_invalid_tb_cpu_state(&tb->pc, &tb->cs_base, &tb->flags);
>> Hmmm are we getting something here? Maybe cpu_tb_invalidate_cpu_state?
>
> Just to be similar to cpu_get_tb_cpu_state().

I guess if you squint just right. I'll drop the objection.

>
>>
>>> +}
>>> +
>>> +static inline bool tb_is_invalid(TranslationBlock *tb)
>>> +{
>>> +    return cpu_tb_cpu_state_is_invalidated(tb->pc, tb->cs_base, tb->flags);
>>> +}
>> Also why are we passing three pointers to parts of TranslationBlock? Why
>> not just pass tb directly and be done with it?
>
> I'm not sure we want to include exec/exec-all.h in target-*/cpu.h

Fair enough.

>
>>
>> I'm sure the compiler does something sensible but seems overly verbose
>> to me.
>>
>>> +
>>>  #if defined(USE_DIRECT_JUMP)
>>>
>>>  #if defined(CONFIG_TCG_INTERPRETER)
> (snip)
>> Otherwise:
>>
>> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>
> Kind regards,
> Sergey


--
Alex Bennée

^ permalink raw reply	[flat|nested] 49+ messages in thread
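The helper pair discussed in the message above is easiest to see as a
small self-contained sketch. The function names come from the patch;
the bodies and the target_ulong stand-in are assumptions based on the
commit message (most targets do not use 'cs_base', so storing -1 there
serves as the invalid marker):

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the per-target helper pair added to most target cpu.h
 * headers.  The bodies here are illustrative, not the actual patch. */
typedef uint64_t target_ulong;    /* stand-in for the per-target typedef */

static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
                                                target_ulong *cs_base,
                                                uint32_t *flags)
{
    (void)pc;
    (void)flags;
    /* -1 is never a valid cs_base for targets that don't use it */
    *cs_base = (target_ulong)-1;
}

static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
                                                   target_ulong cs_base,
                                                   uint32_t flags)
{
    (void)pc;
    (void)flags;
    return cs_base == (target_ulong)-1;
}
```

tb_mark_invalid() and tb_is_invalid() in exec-all.h then just forward
the TB's pc/cs_base/flags to these helpers, which is why the three
values are passed separately rather than including exec-all.h from
target-*/cpu.h.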

* Re: [Qemu-devel] [PATCH v3 01/11] util/qht: Document memory ordering assumptions
  2016-07-13 17:50       ` Sergey Fedorov
@ 2016-07-14 13:56         ` Paolo Bonzini
  2016-07-14 14:08           ` Sergey Fedorov
  0 siblings, 1 reply; 49+ messages in thread
From: Paolo Bonzini @ 2016-07-14 13:56 UTC (permalink / raw)
  To: Sergey Fedorov, Emilio G. Cota, Sergey Fedorov
  Cc: mttcg, peter.maydell, patches, mark.burton, claudio.fontana,
	a.rigo, qemu-devel, jan.kiszka, bobby.prani, fred.konrad,
	Alex Bennée, rth



On 13/07/2016 19:50, Sergey Fedorov wrote:
> On 13/07/16 10:36, Paolo Bonzini wrote:
>>
>> On 13/07/2016 01:19, Emilio G. Cota wrote:
>>> I wouldn't put those comments in the source--seqlock callers should
>>> know what they're doing, and what barriers seqlocks imply.
>> In general I'd agree with you; however, in this case the "begin" calls
>> are what implement QHT's guarantee *for the caller*, so I think it's
>> worth having the comments.  In other words, if for any reason you do
>> anything before the read_begin and write_begin you still have to provide
>> barrier semantics.  It's not an explanation, it's a protection against
>> future mistakes.
> 
> Exactly :)
> 
>> There's no need for such comment at read_retry and write_end callsites,
>> though.
> 
> Why?
> 
>> Also, it's spelled "guarantee". :)
> 
> Hmm, I can't see where the spelling isn't correct.

There are a few "gaurantee"s in the patch.

If you decide to go with my own patch
(http://article.gmane.org/gmane.comp.emulators.qemu/426431) for v4,
please add a

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/11] util/qht: Document memory ordering assumptions
  2016-07-14 13:56         ` Paolo Bonzini
@ 2016-07-14 14:08           ` Sergey Fedorov
  0 siblings, 0 replies; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-14 14:08 UTC (permalink / raw)
  To: Paolo Bonzini, Emilio G. Cota, Sergey Fedorov
  Cc: mttcg, peter.maydell, patches, mark.burton, claudio.fontana,
	a.rigo, qemu-devel, jan.kiszka, bobby.prani, fred.konrad,
	Alex Bennée, rth

On 14/07/16 16:56, Paolo Bonzini wrote:
>
> On 13/07/2016 19:50, Sergey Fedorov wrote:
>> On 13/07/16 10:36, Paolo Bonzini wrote:
>>> On 13/07/2016 01:19, Emilio G. Cota wrote:
>>>> I wouldn't put those comments in the source--seqlock callers should
>>>> know what they're doing, and what barriers seqlocks imply.
>>> In general I'd agree with you; however, in this case the "begin" calls
>>> are what implement QHT's guarantee *for the caller*, so I think it's
>>> worth having the comments.  In other words, if for any reason you do
>>> anything before the read_begin and write_begin you still have to provide
>>> barrier semantics.  It's not an explanation, it's a protection against
>>> future mistakes.
>> Exactly :)
>>
>>> There's no need for such comment at read_retry and write_end callsites,
>>> though.
>> Why?
>>
>>> Also, it's spelled "guarantee". :)
>> Hmm, I can't see where the spelling isn't correct.
> There are a few "gaurantee"s in the patch.

Ah, I see this now :)

>
> If you decide to go with my own patch
> (http://article.gmane.org/gmane.comp.emulators.qemu/426431) for v4,
> please add a
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Thanks,
Sergey

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/11] util/qht: Document memory ordering assumptions
  2016-07-13 11:13   ` Paolo Bonzini
  2016-07-13 18:03     ` Sergey Fedorov
@ 2016-07-15 12:37     ` Sergey Fedorov
  2016-07-15 12:51       ` Paolo Bonzini
  1 sibling, 1 reply; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-15 12:37 UTC (permalink / raw)
  To: Paolo Bonzini, Sergey Fedorov, qemu-devel, mttcg, fred.konrad,
	a.rigo, cota, bobby.prani, rth
  Cc: peter.maydell, patches, claudio.fontana, mark.burton, jan.kiszka,
	Alex Bennée

On 13/07/16 14:13, Paolo Bonzini wrote:
> diff --git a/include/qemu/qht.h b/include/qemu/qht.h
> index 70bfc68..f4f1d55 100644
> --- a/include/qemu/qht.h
> +++ b/include/qemu/qht.h
> @@ -69,6 +69,9 @@ void qht_destroy(struct qht *ht);
>   * Attempting to insert a NULL @p is a bug.
>   * Inserting the same pointer @p with different @hash values is a bug.
>   *
> + * In case of successful operation, smp_wmb() is implied before the pointer is
> + * inserted into the hash table.
> + *
>   * Returns true on success.
>   * Returns false if the @p-@hash pair already exists in the hash table.
>   */
> @@ -83,6 +86,8 @@ bool qht_insert(struct qht *ht, void *p, uint32_t hash);
>   *
>   * Needs to be called under an RCU read-critical section.
>   *
> + * smp_read_barrier_depends() is implied before the call to @func.
> + *
>   * The user-provided @func compares pointers in QHT against @userp.
>   * If the function returns true, a match has been found.
>   *
> @@ -105,6 +110,10 @@ void *qht_lookup(struct qht *ht, qht_lookup_func_t func, const void *userp,
>   * This guarantees that concurrent lookups will always compare against valid
>   * data.
>   *
> + * In case of successful operation, a smp_wmb() barrier is implied before and
> + * after the pointer is removed from the hash table.  In other words,
> + * a successful qht_remove acts as a bidirectional write barrier.
> + *

I understand why an implied wmb can be expected after the entry is
removed: so that the caller can trash the contents of the removed
object. However, that would require a double-check on the lookup side
to make sure the entry hadn't been removed after the first lookup
(something like the seqlock read side). But I have no idea why we would
want to promise a wmb before the removal. Could you please share your
reasoning on this?

I am tempted to drop any promises on qht_remove(), because it is not
clear to me what the natural expectations on memory ordering would be.

As for qht_insert() and qht_lookup(), I agree, and those guarantees are
enough for this series.

Thanks,
Sergey

>   * Returns true on success.
>   * Returns false if the @p-@hash pair was not found.
>   */
> diff --git a/util/qht.c b/util/qht.c
> index 40d6e21..d38948e 100644
> --- a/util/qht.c
> +++ b/util/qht.c
> @@ -445,7 +445,11 @@ void *qht_do_lookup(struct qht_bucket *head, qht_lookup_func_t func,
>      do {
>          for (i = 0; i < QHT_BUCKET_ENTRIES; i++) {
>              if (b->hashes[i] == hash) {
> -                void *p = atomic_read(&b->pointers[i]);
> +                /* The pointer is dereferenced before seqlock_read_retry,
> +                 * so (unlike qht_insert__locked) we need to use
> +                 * atomic_rcu_read here.
> +                 */
> +                void *p = atomic_rcu_read(&b->pointers[i]);
>  
>                  if (likely(p) && likely(func(p, userp))) {
>                      return p;
> @@ -535,6 +539,7 @@ static bool qht_insert__locked(struct qht *ht, struct qht_map *map,
>          atomic_rcu_set(&prev->next, b);
>      }
>      b->hashes[i] = hash;
> +    /* smp_wmb() implicit in seqlock_write_begin.  */
>      atomic_set(&b->pointers[i], p);
>      seqlock_write_end(&head->sequence);
>      return true;
> @@ -659,6 +664,9 @@ bool qht_remove__locked(struct qht_map *map, struct qht_bucket *head,
>              }
>              if (q == p) {
>                  qht_debug_assert(b->hashes[i] == hash);
> +                /* seqlock_write_begin and seqlock_write_end provide write
> +                 * memory barrier semantics to callers of qht_remove.
> +                 */
>                  seqlock_write_begin(&head->sequence);
>                  qht_bucket_remove_entry(b, i);
>                  seqlock_write_end(&head->sequence);

^ permalink raw reply	[flat|nested] 49+ messages in thread
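The barriers being documented above all fall out of the seqlock
bracketing the bucket update. A generic C11 sketch of the seqlock
protocol makes the implied smp_wmb()/smp_rmb() visible (this is an
illustration of the pattern, not QEMU's actual seqlock.h):

```c
#include <stdatomic.h>

/* Generic seqlock sketch: the writer bumps the sequence to odd before
 * touching data and back to even afterwards; readers retry if the
 * sequence was odd at begin time or changed by retry time. */
typedef struct { atomic_uint sequence; } SeqLock;

static void seqlock_write_begin(SeqLock *sl)
{
    atomic_fetch_add_explicit(&sl->sequence, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);   /* acts as smp_wmb() */
}

static void seqlock_write_end(SeqLock *sl)
{
    atomic_thread_fence(memory_order_release);   /* acts as smp_wmb() */
    atomic_fetch_add_explicit(&sl->sequence, 1, memory_order_relaxed);
}

static unsigned seqlock_read_begin(SeqLock *sl)
{
    /* Acquire pairs with the writer's fences; mask off the "busy" bit
     * so a concurrent write always forces a retry. */
    return atomic_load_explicit(&sl->sequence, memory_order_acquire) & ~1u;
}

static int seqlock_read_retry(SeqLock *sl, unsigned start)
{
    atomic_thread_fence(memory_order_acquire);   /* acts as smp_rmb() */
    return atomic_load_explicit(&sl->sequence, memory_order_relaxed) != start;
}
```

In qht, seqlock_write_begin()/seqlock_write_end() bracket every bucket
update, which is why qht_insert() and qht_remove() can document an
implied smp_wmb() without issuing any barrier of their own.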

* Re: [Qemu-devel] [PATCH v3 01/11] util/qht: Document memory ordering assumptions
  2016-07-15 12:37     ` Sergey Fedorov
@ 2016-07-15 12:51       ` Paolo Bonzini
  2016-07-15 13:18         ` Sergey Fedorov
  0 siblings, 1 reply; 49+ messages in thread
From: Paolo Bonzini @ 2016-07-15 12:51 UTC (permalink / raw)
  To: Sergey Fedorov, Sergey Fedorov, qemu-devel, mttcg, fred.konrad,
	a.rigo, cota, bobby.prani, rth
  Cc: peter.maydell, patches, claudio.fontana, mark.burton, jan.kiszka,
	Alex Bennée



On 15/07/2016 14:37, Sergey Fedorov wrote:
> On 13/07/16 14:13, Paolo Bonzini wrote:
>> diff --git a/include/qemu/qht.h b/include/qemu/qht.h
>> index 70bfc68..f4f1d55 100644
>> --- a/include/qemu/qht.h
>> +++ b/include/qemu/qht.h
>> @@ -69,6 +69,9 @@ void qht_destroy(struct qht *ht);
>>   * Attempting to insert a NULL @p is a bug.
>>   * Inserting the same pointer @p with different @hash values is a bug.
>>   *
>> + * In case of successful operation, smp_wmb() is implied before the pointer is
>> + * inserted into the hash table.
>> + *
>>   * Returns true on success.
>>   * Returns false if the @p-@hash pair already exists in the hash table.
>>   */
>> @@ -83,6 +86,8 @@ bool qht_insert(struct qht *ht, void *p, uint32_t hash);
>>   *
>>   * Needs to be called under an RCU read-critical section.
>>   *
>> + * smp_read_barrier_depends() is implied before the call to @func.
>> + *
>>   * The user-provided @func compares pointers in QHT against @userp.
>>   * If the function returns true, a match has been found.
>>   *
>> @@ -105,6 +110,10 @@ void *qht_lookup(struct qht *ht, qht_lookup_func_t func, const void *userp,
>>   * This guarantees that concurrent lookups will always compare against valid
>>   * data.
>>   *
>> + * In case of successful operation, a smp_wmb() barrier is implied before and
>> + * after the pointer is removed from the hash table.  In other words,
>> + * a successful qht_remove acts as a bidirectional write barrier.
>> + *
> 
> I understand why an implied wmb can be expected after the entry is
> removed: so that the caller can trash the contents of the removed
> object. However, that would require a double-check on the lookup side
> to make sure the entry hadn't been removed after the first lookup
> (something like the seqlock read side). But I have no idea why we would
> want to promise a wmb before the removal. Could you please share your
> reasoning on this?

The reasoning was to make qht_remove() "look ordered" with respect to
writes, so that nothing after the remove would bleed into it, nor would
anything before.

In the case of this series, it would let you remove the smp_wmb() after
tb_mark_invalid().  However, it's also okay to only handle qht_insert()
and qht_lookup(), and keep the memory barrier after the TB is marked
invalid (though it is unnecessary).

Paolo

> As for qht_insert() and qht_lookup(), I agree, and those guarantees are
> enough for this series.
> 
> Thanks,
> Sergey
> 
>>   * Returns true on success.
>>   * Returns false if the @p-@hash pair was not found.
>>   */
>> diff --git a/util/qht.c b/util/qht.c
>> index 40d6e21..d38948e 100644
>> --- a/util/qht.c
>> +++ b/util/qht.c
>> @@ -445,7 +445,11 @@ void *qht_do_lookup(struct qht_bucket *head, qht_lookup_func_t func,
>>      do {
>>          for (i = 0; i < QHT_BUCKET_ENTRIES; i++) {
>>              if (b->hashes[i] == hash) {
>> -                void *p = atomic_read(&b->pointers[i]);
>> +                /* The pointer is dereferenced before seqlock_read_retry,
>> +                 * so (unlike qht_insert__locked) we need to use
>> +                 * atomic_rcu_read here.
>> +                 */
>> +                void *p = atomic_rcu_read(&b->pointers[i]);
>>  
>>                  if (likely(p) && likely(func(p, userp))) {
>>                      return p;
>> @@ -535,6 +539,7 @@ static bool qht_insert__locked(struct qht *ht, struct qht_map *map,
>>          atomic_rcu_set(&prev->next, b);
>>      }
>>      b->hashes[i] = hash;
>> +    /* smp_wmb() implicit in seqlock_write_begin.  */
>>      atomic_set(&b->pointers[i], p);
>>      seqlock_write_end(&head->sequence);
>>      return true;
>> @@ -659,6 +664,9 @@ bool qht_remove__locked(struct qht_map *map, struct qht_bucket *head,
>>              }
>>              if (q == p) {
>>                  qht_debug_assert(b->hashes[i] == hash);
>> +                /* seqlock_write_begin and seqlock_write_end provide write
>> +                 * memory barrier semantics to callers of qht_remove.
>> +                 */
>>                  seqlock_write_begin(&head->sequence);
>>                  qht_bucket_remove_entry(b, i);
>>                  seqlock_write_end(&head->sequence);
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/11] util/qht: Document memory ordering assumptions
  2016-07-15 12:51       ` Paolo Bonzini
@ 2016-07-15 13:18         ` Sergey Fedorov
  0 siblings, 0 replies; 49+ messages in thread
From: Sergey Fedorov @ 2016-07-15 13:18 UTC (permalink / raw)
  To: Paolo Bonzini, Sergey Fedorov, qemu-devel, mttcg, fred.konrad,
	a.rigo, cota, bobby.prani, rth
  Cc: peter.maydell, patches, claudio.fontana, mark.burton, jan.kiszka,
	Alex Bennée

On 15/07/16 15:51, Paolo Bonzini wrote:
>
> On 15/07/2016 14:37, Sergey Fedorov wrote:
>> I understand why an implied wmb can be expected after the entry is
>> removed: so that the caller can trash the contents of the removed
>> object. However, that would require a double-check on the lookup side
>> to make sure the entry hadn't been removed after the first lookup
>> (something like the seqlock read side). But I have no idea why we would
>> want to promise a wmb before the removal. Could you please share your
>> reasoning on this?
> The reasoning was to make qht_remove() "look ordered" with respect to
> writes, so that nothing after the remove would bleed into it, nor would
> anything before.
>
> In the case of this series, it would let you remove the smp_wmb() after
> tb_mark_invalid().  However, it's also okay to only handle qht_insert()
> and qht_lookup(), and keep the memory barrier after the TB is marked
> invalid (though it is unnecessary).
>

I'm pretty sure the smp_wmb() after tb_mark_invalid() is unnecessary
anyway. We don't rely on it at all, because we are going to recheck
tb_is_invalid() under tb_lock before tb_add_jump(). However, we do have
to invalidate the CPU state atomically, since we are going to check it
outside of tb_lock.

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 49+ messages in thread
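The discipline this final exchange settles on — mark the TB's CPU state
invalid atomically, look the TB up without the lock, then recheck
validity under tb_lock before chaining — can be sketched in miniature.
All names and structure below are illustrative assumptions, not QEMU's
actual cpu-exec.c code:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal model of the lookup/recheck discipline: the invalidator
 * flips the TB's state atomically; the execution loop may find the TB
 * without any lock, but only patches a direct jump after rechecking
 * validity under tb_lock. */
typedef struct TranslationBlock {
    atomic_bool invalid;          /* stand-in for the invalidated CPU state */
} TranslationBlock;

static atomic_flag tb_lock = ATOMIC_FLAG_INIT;   /* stand-in for QemuMutex */

static void tb_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&tb_lock,
                                             memory_order_acquire)) {
    }
}

static void tb_lock_release(void)
{
    atomic_flag_clear_explicit(&tb_lock, memory_order_release);
}

static void tb_mark_invalid(TranslationBlock *tb)
{
    atomic_store_explicit(&tb->invalid, true, memory_order_release);
}

static bool tb_is_invalid(TranslationBlock *tb)
{
    return atomic_load_explicit(&tb->invalid, memory_order_acquire);
}

/* After a lockless hash-table hit: chain last_tb -> tb only if tb is
 * still valid once tb_lock is held. */
static bool maybe_add_jump(TranslationBlock *last_tb, TranslationBlock *tb)
{
    bool ok;

    (void)last_tb;
    tb_lock_acquire();
    ok = !tb_is_invalid(tb);
    if (ok) {
        /* tb_add_jump(last_tb, tb) would patch the direct jump here */
    }
    tb_lock_release();
    return ok;
}
```

The recheck under the lock is what makes the smp_wmb() after
tb_mark_invalid() redundant, as the message above argues: the lock
acquisition already orders the invalidation against the chaining.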

end of thread, other threads:[~2016-07-15 13:18 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-12 20:13 [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path Sergey Fedorov
2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 01/11] util/qht: Document memory ordering assumptions Sergey Fedorov
2016-07-12 23:19   ` Emilio G. Cota
2016-07-13  7:36     ` Paolo Bonzini
2016-07-13 17:50       ` Sergey Fedorov
2016-07-14 13:56         ` Paolo Bonzini
2016-07-14 14:08           ` Sergey Fedorov
2016-07-13 11:13   ` Paolo Bonzini
2016-07-13 18:03     ` Sergey Fedorov
2016-07-14  8:05       ` Paolo Bonzini
2016-07-15 12:37     ` Sergey Fedorov
2016-07-15 12:51       ` Paolo Bonzini
2016-07-15 13:18         ` Sergey Fedorov
2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 02/11] cpu-exec: Pass last_tb by value to tb_find_fast() Sergey Fedorov
2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 03/11] tcg: Prepare safe tb_jmp_cache lookup out of tb_lock Sergey Fedorov
2016-07-14 12:14   ` Alex Bennée
2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 04/11] tcg: Prepare safe access to tb_flushed " Sergey Fedorov
2016-07-14 12:45   ` Alex Bennée
2016-07-14 12:55     ` Sergey Fedorov
2016-07-14 13:12       ` Alex Bennée
2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 05/11] target-i386: Remove redundant HF_SOFTMMU_MASK Sergey Fedorov
2016-07-14 12:19   ` Alex Bennée
2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 06/11] tcg: Introduce tb_mark_invalid() and tb_is_invalid() Sergey Fedorov
2016-07-14 10:25   ` Alex Bennée
2016-07-14 11:10     ` Sergey Fedorov
2016-07-14 11:48       ` Paolo Bonzini
2016-07-14 12:04         ` Alex Bennée
2016-07-14 12:53   ` Alex Bennée
2016-07-14 13:00     ` Sergey Fedorov
2016-07-14 13:12       ` Paolo Bonzini
2016-07-14 13:15       ` Alex Bennée
2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 07/11] tcg: Prepare TB invalidation for lockless TB lookup Sergey Fedorov
2016-07-14 12:59   ` Alex Bennée
2016-07-14 13:11     ` Sergey Fedorov
2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 08/11] tcg: set up tb->page_addr before insertion Sergey Fedorov
2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 09/11] tcg: cpu-exec: remove tb_lock from the hot-path Sergey Fedorov
2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 10/11] tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump() Sergey Fedorov
2016-07-14 13:01   ` Alex Bennée
2016-07-14 13:13     ` Sergey Fedorov
2016-07-12 20:13 ` [Qemu-devel] [PATCH v3 11/11] tcg: Merge tb_find_slow() and tb_find_fast() Sergey Fedorov
2016-07-14 13:02   ` Alex Bennée
2016-07-13  7:39 ` [Qemu-devel] [PATCH v3 00/11] Reduce lock contention on TCG hot-path Paolo Bonzini
2016-07-13 17:00   ` Sergey Fedorov
2016-07-14  9:55     ` Alex Bennée
2016-07-14 11:13       ` Sergey Fedorov
2016-07-13 18:06   ` Sergey Fedorov
2016-07-14 12:02   ` Alex Bennée
2016-07-14 12:10     ` Paolo Bonzini
2016-07-14 13:13       ` Alex Bennée
