* [Qemu-devel] [PATCH v5 00/10] Reduce lock contention on TCG hot-path
@ 2016-07-19  8:32 Paolo Bonzini
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 01/10] util/qht: Document memory ordering assumptions Paolo Bonzini
                   ` (9 more replies)
  0 siblings, 10 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19  8:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: serge.fdrv, sergey.fedorov, alex.bennee

This replaces the complex tb_mark_invalid mechanism with a simple
flag, as suggested by Sergey.

Alex Bennée (2):
  tcg: set up tb->page_addr before insertion
  tcg: cpu-exec: remove tb_lock from the hot-path

Paolo Bonzini (2):
  util/qht: Document memory ordering assumptions
  tcg: Prepare TB invalidation for lockless TB lookup

Sergey Fedorov (6):
  tcg: Pass last_tb by value to tb_find_fast()
  tcg: Prepare safe tb_jmp_cache lookup out of tb_lock
  tcg: Prepare safe access to tb_flushed out of tb_lock
  tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump()
  tcg: Merge tb_find_slow() and tb_find_fast()
  tcg: rename tb_find_physical()

 cpu-exec.c              | 115 +++++++++++++++++++++---------------------------
 include/exec/exec-all.h |   2 +
 include/qemu/qht.h      |   5 +++
 translate-all.c         |  25 +++++++----
 util/qht.c              |   7 ++-
 5 files changed, 80 insertions(+), 74 deletions(-)

-- 
2.7.4

* [Qemu-devel] [PATCH 01/10] util/qht: Document memory ordering assumptions
  2016-07-19  8:32 [Qemu-devel] [PATCH v5 00/10] Reduce lock contention on TCG hot-path Paolo Bonzini
@ 2016-07-19  8:32 ` Paolo Bonzini
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 02/10] tcg: Pass last_tb by value to tb_find_fast() Paolo Bonzini
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19  8:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: serge.fdrv, sergey.fedorov, alex.bennee

It is naturally expected that some memory ordering should be provided
around qht_insert() and qht_lookup(). Document these assumptions in the
header file and add comments in the source to explain how those memory
ordering requirements are fulfilled.
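
As a sketch of the pattern these barriers enable (the names are taken
from elsewhere in this series, not from this patch):

    /* Writer: initialize the object completely, then publish it.
     * The smp_wmb() implied by qht_insert() orders the initializing
     * stores before the store that makes the pointer visible.
     */
    tb->pc = pc;
    tb->cs_base = cs_base;
    tb->flags = flags;
    qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);

    /* Reader: the smp_read_barrier_depends() implied by qht_lookup()
     * ensures that fields read through the returned pointer are at
     * least as new as the pointer itself.
     */
    tb = qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h);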

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Sergey Fedorov: commit title and message provided;
comment on qht_remove() elided]
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Message-Id: <20160715175852.30749-2-sergey.fedorov@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/qemu/qht.h | 5 +++++
 util/qht.c         | 7 ++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/qemu/qht.h b/include/qemu/qht.h
index 70bfc68..311139b 100644
--- a/include/qemu/qht.h
+++ b/include/qemu/qht.h
@@ -69,6 +69,9 @@ void qht_destroy(struct qht *ht);
  * Attempting to insert a NULL @p is a bug.
  * Inserting the same pointer @p with different @hash values is a bug.
  *
+ * In case of successful operation, smp_wmb() is implied before the pointer is
+ * inserted into the hash table.
+ *
 * Returns true on success.
  * Returns false if the @p-@hash pair already exists in the hash table.
  */
@@ -83,6 +86,8 @@ bool qht_insert(struct qht *ht, void *p, uint32_t hash);
  *
  * Needs to be called under an RCU read-critical section.
  *
+ * smp_read_barrier_depends() is implied before the call to @func.
+ *
  * The user-provided @func compares pointers in QHT against @userp.
  * If the function returns true, a match has been found.
  *
diff --git a/util/qht.c b/util/qht.c
index 40d6e21..28ce289 100644
--- a/util/qht.c
+++ b/util/qht.c
@@ -445,7 +445,11 @@ void *qht_do_lookup(struct qht_bucket *head, qht_lookup_func_t func,
     do {
         for (i = 0; i < QHT_BUCKET_ENTRIES; i++) {
             if (b->hashes[i] == hash) {
-                void *p = atomic_read(&b->pointers[i]);
+                /* The pointer is dereferenced before seqlock_read_retry,
+                 * so (unlike qht_insert__locked) we need to use
+                 * atomic_rcu_read here.
+                 */
+                void *p = atomic_rcu_read(&b->pointers[i]);
 
                 if (likely(p) && likely(func(p, userp))) {
                     return p;
@@ -535,6 +539,7 @@ static bool qht_insert__locked(struct qht *ht, struct qht_map *map,
         atomic_rcu_set(&prev->next, b);
     }
     b->hashes[i] = hash;
+    /* smp_wmb() implicit in seqlock_write_begin.  */
     atomic_set(&b->pointers[i], p);
     seqlock_write_end(&head->sequence);
     return true;
-- 
2.7.4

* [Qemu-devel] [PATCH 02/10] tcg: Pass last_tb by value to tb_find_fast()
  2016-07-19  8:32 [Qemu-devel] [PATCH v5 00/10] Reduce lock contention on TCG hot-path Paolo Bonzini
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 01/10] util/qht: Document memory ordering assumptions Paolo Bonzini
@ 2016-07-19  8:32 ` Paolo Bonzini
  2016-09-08 12:44   ` Alex Bennée
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 03/10] tcg: Prepare safe tb_jmp_cache lookup out of tb_lock Paolo Bonzini
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19  8:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: serge.fdrv, sergey.fedorov, alex.bennee

From: Sergey Fedorov <serge.fdrv@gmail.com>

This is a small clean-up. tb_find_fast() is the final consumer of this
variable, so there is no need to pass it by reference. 'last_tb' is
always updated by the subsequent cpu_loop_exec_tb() in cpu_exec().

This change also simplifies calling cpu_exec_nocache() in
cpu_handle_exception().

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <20160715175852.30749-3-sergey.fedorov@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 cpu-exec.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 5d9710a..cf511f1 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -320,7 +320,7 @@ found:
 }
 
 static inline TranslationBlock *tb_find_fast(CPUState *cpu,
-                                             TranslationBlock **last_tb,
+                                             TranslationBlock *last_tb,
                                              int tb_exit)
 {
     CPUArchState *env = (CPUArchState *)cpu->env_ptr;
@@ -342,7 +342,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
         /* Ensure that no TB jump will be modified as the
          * translation buffer has been flushed.
          */
-        *last_tb = NULL;
+        last_tb = NULL;
         cpu->tb_flushed = false;
     }
 #ifndef CONFIG_USER_ONLY
@@ -351,12 +351,12 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
      * spanning two pages because the mapping for the second page can change.
      */
     if (tb->page_addr[1] != -1) {
-        *last_tb = NULL;
+        last_tb = NULL;
     }
 #endif
     /* See if we can patch the calling TB. */
-    if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
-        tb_add_jump(*last_tb, tb_exit, tb);
+    if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
+        tb_add_jump(last_tb, tb_exit, tb);
     }
     tb_unlock();
     return tb;
@@ -437,8 +437,7 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
     } else if (replay_has_exception()
                && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
         /* try to cause an exception pending in the log */
-        TranslationBlock *last_tb = NULL; /* Avoid chaining TBs */
-        cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, &last_tb, 0), true);
+        cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, NULL, 0), true);
         *ret = -1;
         return true;
 #endif
@@ -621,7 +620,7 @@ int cpu_exec(CPUState *cpu)
             cpu->tb_flushed = false; /* reset before first TB lookup */
             for(;;) {
                 cpu_handle_interrupt(cpu, &last_tb);
-                tb = tb_find_fast(cpu, &last_tb, tb_exit);
+                tb = tb_find_fast(cpu, last_tb, tb_exit);
                 cpu_loop_exec_tb(cpu, tb, &last_tb, &tb_exit, &sc);
                 /* Try to align the host and virtual clocks
                    if the guest is in advance */
-- 
2.7.4

* [Qemu-devel] [PATCH 03/10] tcg: Prepare safe tb_jmp_cache lookup out of tb_lock
  2016-07-19  8:32 [Qemu-devel] [PATCH v5 00/10] Reduce lock contention on TCG hot-path Paolo Bonzini
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 01/10] util/qht: Document memory ordering assumptions Paolo Bonzini
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 02/10] tcg: Pass last_tb by value to tb_find_fast() Paolo Bonzini
@ 2016-07-19  8:32 ` Paolo Bonzini
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 04/10] tcg: Prepare safe access to tb_flushed " Paolo Bonzini
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19  8:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: serge.fdrv, sergey.fedorov, alex.bennee

From: Sergey Fedorov <serge.fdrv@gmail.com>

Ensure atomic access to the CPU's 'tb_jmp_cache' so that future
translation block lookups can be done out of 'tb_lock'.

Note that this patch does *not* make the CPU's TLB invalidation safe if
it is done from some other thread while the CPU is in its execution
loop.
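
Condensed, the writer/reader pairing this enables looks like (both
sides appear in the diff below):

    /* Writer: a single atomic word-sized store, so a concurrent
     * reader sees either the old or the new pointer, never a torn
     * value.
     */
    atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);

    /* Reader: the matching atomic load; the tb->pc, tb->cs_base and
     * tb->flags comparisons then validate the cached entry.
     */
    tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);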

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20160715175852.30749-4-sergey.fedorov@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 cpu-exec.c      |  4 ++--
 translate-all.c | 10 +++++++---
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index cf511f1..32b58ed 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -315,7 +315,7 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
 
 found:
     /* we add the TB in the virtual pc hash table */
-    cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
+    atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
     return tb;
 }
 
@@ -333,7 +333,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
        is executed. */
     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
     tb_lock();
-    tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
+    tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
     if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
                  tb->flags != flags)) {
         tb = tb_find_slow(cpu, pc, cs_base, flags);
diff --git a/translate-all.c b/translate-all.c
index 0d47c1c..fdf520a 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -848,7 +848,11 @@ void tb_flush(CPUState *cpu)
     tcg_ctx.tb_ctx.nb_tbs = 0;
 
     CPU_FOREACH(cpu) {
-        memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
+        int i;
+
+        for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
+            atomic_set(&cpu->tb_jmp_cache[i], NULL);
+        }
         cpu->tb_flushed = true;
     }
 
@@ -1007,8 +1011,8 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
     /* remove the TB from the hash list */
     h = tb_jmp_cache_hash_func(tb->pc);
     CPU_FOREACH(cpu) {
-        if (cpu->tb_jmp_cache[h] == tb) {
-            cpu->tb_jmp_cache[h] = NULL;
+        if (atomic_read(&cpu->tb_jmp_cache[h]) == tb) {
+            atomic_set(&cpu->tb_jmp_cache[h], NULL);
         }
     }
 
-- 
2.7.4

* [Qemu-devel] [PATCH 04/10] tcg: Prepare safe access to tb_flushed out of tb_lock
  2016-07-19  8:32 [Qemu-devel] [PATCH v5 00/10] Reduce lock contention on TCG hot-path Paolo Bonzini
                   ` (2 preceding siblings ...)
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 03/10] tcg: Prepare safe tb_jmp_cache lookup out of tb_lock Paolo Bonzini
@ 2016-07-19  8:32 ` Paolo Bonzini
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup Paolo Bonzini
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19  8:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: serge.fdrv, sergey.fedorov, alex.bennee

From: Sergey Fedorov <serge.fdrv@gmail.com>

Ensure atomicity and ordering of the CPU's 'tb_flushed' access so that
future translation block lookups can be done out of 'tb_lock'.

This field can only be touched from another thread by tb_flush(), and
only in user mode emulation. So the only accesses that need to be
sequentially atomic are:
 * the single write in tb_flush();
 * the reads/writes done out of 'tb_lock'.

In the future, before MTTCG can be enabled in system mode, tb_flush()
must be made safe, at which point this field becomes unnecessary.
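
Condensed, the resulting pairing (both sides are in the diff below):

    /* tb_flush(): clear the jump caches first, then set the flag
     * with a full barrier, so a CPU that observes tb_flushed == true
     * also observes the cleared caches.
     */
    for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
        atomic_set(&cpu->tb_jmp_cache[i], NULL);
    }
    atomic_mb_set(&cpu->tb_flushed, true);

    /* cpu_exec(): reset the flag with a full barrier before the
     * first TB lookup of the loop.
     */
    atomic_mb_set(&cpu->tb_flushed, false);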

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20160715175852.30749-5-sergey.fedorov@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 cpu-exec.c      | 16 +++++++---------
 translate-all.c |  4 ++--
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 32b58ed..877ff8e 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -338,13 +338,6 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
                  tb->flags != flags)) {
         tb = tb_find_slow(cpu, pc, cs_base, flags);
     }
-    if (cpu->tb_flushed) {
-        /* Ensure that no TB jump will be modified as the
-         * translation buffer has been flushed.
-         */
-        last_tb = NULL;
-        cpu->tb_flushed = false;
-    }
 #ifndef CONFIG_USER_ONLY
     /* We don't take care of direct jumps when address mapping changes in
      * system emulation. So it's not safe to make a direct jump to a TB
@@ -356,7 +349,12 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
 #endif
     /* See if we can patch the calling TB. */
     if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
-        tb_add_jump(last_tb, tb_exit, tb);
+        /* Check if translation buffer has been flushed */
+        if (cpu->tb_flushed) {
+            cpu->tb_flushed = false;
+        } else {
+            tb_add_jump(last_tb, tb_exit, tb);
+        }
     }
     tb_unlock();
     return tb;
@@ -617,7 +615,7 @@ int cpu_exec(CPUState *cpu)
                 break;
             }
 
-            cpu->tb_flushed = false; /* reset before first TB lookup */
+            atomic_mb_set(&cpu->tb_flushed, false); /* reset before first TB lookup */
             for(;;) {
                 cpu_handle_interrupt(cpu, &last_tb);
                 tb = tb_find_fast(cpu, last_tb, tb_exit);
diff --git a/translate-all.c b/translate-all.c
index fdf520a..788fed1 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -845,7 +845,6 @@ void tb_flush(CPUState *cpu)
         > tcg_ctx.code_gen_buffer_size) {
         cpu_abort(cpu, "Internal error: code buffer overflow\n");
     }
-    tcg_ctx.tb_ctx.nb_tbs = 0;
 
     CPU_FOREACH(cpu) {
         int i;
@@ -853,9 +852,10 @@ void tb_flush(CPUState *cpu)
         for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
             atomic_set(&cpu->tb_jmp_cache[i], NULL);
         }
-        cpu->tb_flushed = true;
+        atomic_mb_set(&cpu->tb_flushed, true);
     }
 
+    tcg_ctx.tb_ctx.nb_tbs = 0;
     qht_reset_size(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
     page_flush_tb();
 
-- 
2.7.4

* [Qemu-devel] [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
  2016-07-19  8:32 [Qemu-devel] [PATCH v5 00/10] Reduce lock contention on TCG hot-path Paolo Bonzini
                   ` (3 preceding siblings ...)
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 04/10] tcg: Prepare safe access to tb_flushed " Paolo Bonzini
@ 2016-07-19  8:32 ` Paolo Bonzini
  2016-07-19 19:56   ` Sergey Fedorov
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 06/10] tcg: set up tb->page_addr before insertion Paolo Bonzini
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19  8:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: serge.fdrv, sergey.fedorov, alex.bennee

When invalidating a translation block, set an invalid flag into the
TranslationBlock structure first.  It is also necessary to check whether
the target TB is still valid after acquiring 'tb_lock' but before calling
tb_add_jump() since TB lookup is to be performed out of 'tb_lock' in
future. Note that we don't have to check 'last_tb'; an already invalidated
TB will not be executed anyway and it is thus safe to patch it.
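
The resulting protocol, condensed (both halves are in the diff below):

    /* Invalidating side (tb_phys_invalidate, holding tb_lock):
     * mark the TB before unlinking it from the hash table.
     */
    atomic_set(&tb->invalid, true);
    /* ... then remove the TB from tcg_ctx.tb_ctx.htable ... */

    /* Patching side (tb_find_fast, after acquiring tb_lock):
     * only chain to a TB that is still valid.
     */
    if (!tb->invalid) {
        tb_add_jump(last_tb, tb_exit, tb);
    }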

Suggested-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 cpu-exec.c              | 5 +++--
 include/exec/exec-all.h | 2 ++
 translate-all.c         | 3 +++
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 877ff8e..cdaab1d 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -241,7 +241,8 @@ static bool tb_cmp(const void *p, const void *d)
     if (tb->pc == desc->pc &&
         tb->page_addr[0] == desc->phys_page1 &&
         tb->cs_base == desc->cs_base &&
-        tb->flags == desc->flags) {
+        tb->flags == desc->flags &&
+        !atomic_read(&tb->invalid)) {
         /* check next page if needed */
         if (tb->page_addr[1] == -1) {
             return true;
@@ -352,7 +353,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
         /* Check if translation buffer has been flushed */
         if (cpu->tb_flushed) {
             cpu->tb_flushed = false;
-        } else {
+        } else if (!tb->invalid) {
             tb_add_jump(last_tb, tb_exit, tb);
         }
     }
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index acda7b6..bc0bcc5 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -213,6 +213,8 @@ struct TranslationBlock {
 #define CF_USE_ICOUNT  0x20000
 #define CF_IGNORE_ICOUNT 0x40000 /* Do not generate icount code */
 
+    uint16_t invalid;
+
     void *tc_ptr;    /* pointer to the translated code */
     uint8_t *tc_search;  /* pointer to search data */
     /* original tb when cflags has CF_NOCACHE */
diff --git a/translate-all.c b/translate-all.c
index 788fed1..eaa1232 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -773,6 +773,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
     tb = &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs++];
     tb->pc = pc;
     tb->cflags = 0;
+    tb->invalid = false;
     return tb;
 }
 
@@ -991,6 +992,8 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
     uint32_t h;
     tb_page_addr_t phys_pc;
 
+    atomic_set(&tb->invalid, true);
+
     /* remove the TB from the hash list */
     phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
     h = tb_hash_func(phys_pc, tb->pc, tb->flags);
-- 
2.7.4

* [Qemu-devel] [PATCH 06/10] tcg: set up tb->page_addr before insertion
  2016-07-19  8:32 [Qemu-devel] [PATCH v5 00/10] Reduce lock contention on TCG hot-path Paolo Bonzini
                   ` (4 preceding siblings ...)
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup Paolo Bonzini
@ 2016-07-19  8:32 ` Paolo Bonzini
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 07/10] tcg: cpu-exec: remove tb_lock from the hot-path Paolo Bonzini
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19  8:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: serge.fdrv, sergey.fedorov, alex.bennee

From: Alex Bennée <alex.bennee@linaro.org>

This ensures that if we find the TB on the slow path, tb->page_addr is
correctly set before being tested.
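
In other words, the qht insertion must come last so that a concurrent
lookup cannot return a TB whose page information is still unset; the
diff below boils down to:

    /* Fill in the page information first ... */
    tb_alloc_page(tb, 0, phys_pc & TARGET_PAGE_MASK);
    /* ... and only then make the TB reachable through the hash
     * table; tb_cmp() tests tb->page_addr[] during lookup.
     */
    h = tb_hash_func(phys_pc, tb->pc, tb->flags);
    qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);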

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <20160715175852.30749-9-sergey.fedorov@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 translate-all.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index eaa1232..1ce05ff 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1128,10 +1128,6 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
 {
     uint32_t h;
 
-    /* add in the hash table */
-    h = tb_hash_func(phys_pc, tb->pc, tb->flags);
-    qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);
-
     /* add in the page list */
     tb_alloc_page(tb, 0, phys_pc & TARGET_PAGE_MASK);
     if (phys_page2 != -1) {
@@ -1140,6 +1136,10 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
         tb->page_addr[1] = -1;
     }
 
+    /* add in the hash table */
+    h = tb_hash_func(phys_pc, tb->pc, tb->flags);
+    qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);
+
 #ifdef DEBUG_TB_CHECK
     tb_page_check();
 #endif
-- 
2.7.4

* [Qemu-devel] [PATCH 07/10] tcg: cpu-exec: remove tb_lock from the hot-path
  2016-07-19  8:32 [Qemu-devel] [PATCH v5 00/10] Reduce lock contention on TCG hot-path Paolo Bonzini
                   ` (5 preceding siblings ...)
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 06/10] tcg: set up tb->page_addr before insertion Paolo Bonzini
@ 2016-07-19  8:32 ` Paolo Bonzini
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 08/10] tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump() Paolo Bonzini
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19  8:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: serge.fdrv, sergey.fedorov, alex.bennee

From: Alex Bennée <alex.bennee@linaro.org>

Lock contention in the hot path of moving between existing patched
TranslationBlocks is the main drag on multithreaded performance. This
patch pushes the tb_lock() usage down to the two places that really need
it:

  - code generation (tb_gen_code)
  - jump patching (tb_add_jump)

The rest of the code doesn't really need to hold a lock, as it either
uses per-CPU structures, is updated atomically, or is designed for
concurrent read situations (qht_lookup).

To keep things simple I removed the #ifdef CONFIG_USER_ONLY stuff as the
locks become NOPs anyway until the MTTCG work is completed.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>

Message-Id: <20160715175852.30749-10-sergey.fedorov@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 cpu-exec.c | 48 +++++++++++++++++++++---------------------------
 1 file changed, 21 insertions(+), 27 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index cdaab1d..7ca2b71 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -287,35 +287,29 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
     TranslationBlock *tb;
 
     tb = tb_find_physical(cpu, pc, cs_base, flags);
-    if (tb) {
-        goto found;
-    }
+    if (!tb) {
 
-#ifdef CONFIG_USER_ONLY
-    /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
-     * taken outside tb_lock.  Since we're momentarily dropping
-     * tb_lock, there's a chance that our desired tb has been
-     * translated.
-     */
-    tb_unlock();
-    mmap_lock();
-    tb_lock();
-    tb = tb_find_physical(cpu, pc, cs_base, flags);
-    if (tb) {
-        mmap_unlock();
-        goto found;
-    }
-#endif
+        /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
+         * taken outside tb_lock. As system emulation is currently
+         * single threaded the locks are NOPs.
+         */
+        mmap_lock();
+        tb_lock();
 
-    /* if no translated code available, then translate it now */
-    tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
+        /* There's a chance that our desired tb has been translated while
+         * taking the locks so we check again inside the lock.
+         */
+        tb = tb_find_physical(cpu, pc, cs_base, flags);
+        if (!tb) {
+            /* if no translated code available, then translate it now */
+            tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
+        }
 
-#ifdef CONFIG_USER_ONLY
-    mmap_unlock();
-#endif
+        tb_unlock();
+        mmap_unlock();
+    }
 
-found:
-    /* we add the TB in the virtual pc hash table */
+    /* We add the TB in the virtual pc hash table for the fast lookup */
     atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
     return tb;
 }
@@ -333,7 +327,6 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
        always be the same before a given translated block
        is executed. */
     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
-    tb_lock();
     tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
     if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
                  tb->flags != flags)) {
@@ -350,14 +343,15 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
 #endif
     /* See if we can patch the calling TB. */
     if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
+        tb_lock();
         /* Check if translation buffer has been flushed */
         if (cpu->tb_flushed) {
             cpu->tb_flushed = false;
         } else if (!tb->invalid) {
             tb_add_jump(last_tb, tb_exit, tb);
         }
+        tb_unlock();
     }
-    tb_unlock();
     return tb;
 }
 
-- 
2.7.4

* [Qemu-devel] [PATCH 08/10] tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump()
  2016-07-19  8:32 [Qemu-devel] [PATCH v5 00/10] Reduce lock contention on TCG hot-path Paolo Bonzini
                   ` (6 preceding siblings ...)
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 07/10] tcg: cpu-exec: remove tb_lock from the hot-path Paolo Bonzini
@ 2016-07-19  8:32 ` Paolo Bonzini
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 09/10] tcg: Merge tb_find_slow() and tb_find_fast() Paolo Bonzini
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 10/10] tcg: rename tb_find_physical() Paolo Bonzini
  9 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19  8:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: serge.fdrv, sergey.fedorov, alex.bennee

From: Sergey Fedorov <serge.fdrv@gmail.com>

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20160715175852.30749-11-sergey.fedorov@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 cpu-exec.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 7ca2b71..bd9fa5a 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -282,7 +282,8 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
 static TranslationBlock *tb_find_slow(CPUState *cpu,
                                       target_ulong pc,
                                       target_ulong cs_base,
-                                      uint32_t flags)
+                                      uint32_t flags,
+                                      bool *have_tb_lock)
 {
     TranslationBlock *tb;
 
@@ -295,6 +296,7 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
          */
         mmap_lock();
         tb_lock();
+        *have_tb_lock = true;
 
         /* There's a chance that our desired tb has been translated while
          * taking the locks so we check again inside the lock.
@@ -305,7 +307,6 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
             tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
         }
 
-        tb_unlock();
         mmap_unlock();
     }
 
@@ -322,6 +323,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
     TranslationBlock *tb;
     target_ulong cs_base, pc;
     uint32_t flags;
+    bool have_tb_lock = false;
 
     /* we record a subset of the CPU state. It will
        always be the same before a given translated block
@@ -330,7 +332,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
     tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
     if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
                  tb->flags != flags)) {
-        tb = tb_find_slow(cpu, pc, cs_base, flags);
+        tb = tb_find_slow(cpu, pc, cs_base, flags, &have_tb_lock);
     }
 #ifndef CONFIG_USER_ONLY
     /* We don't take care of direct jumps when address mapping changes in
@@ -343,13 +345,18 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
 #endif
     /* See if we can patch the calling TB. */
     if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
-        tb_lock();
+        if (!have_tb_lock) {
+            tb_lock();
+            have_tb_lock = true;
+        }
         /* Check if translation buffer has been flushed */
         if (cpu->tb_flushed) {
             cpu->tb_flushed = false;
         } else if (!tb->invalid) {
             tb_add_jump(last_tb, tb_exit, tb);
         }
+    }
+    if (have_tb_lock) {
         tb_unlock();
     }
     return tb;
-- 
2.7.4

* [Qemu-devel] [PATCH 09/10] tcg: Merge tb_find_slow() and tb_find_fast()
  2016-07-19  8:32 [Qemu-devel] [PATCH v5 00/10] Reduce lock contention on TCG hot-path Paolo Bonzini
                   ` (7 preceding siblings ...)
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 08/10] tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump() Paolo Bonzini
@ 2016-07-19  8:32 ` Paolo Bonzini
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 10/10] tcg: rename tb_find_physical() Paolo Bonzini
  9 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19  8:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: serge.fdrv, sergey.fedorov, alex.bennee

From: Sergey Fedorov <serge.fdrv@gmail.com>

These functions are not too big and can be merged together. This makes
the locking scheme clearer and easier to follow.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20160715175852.30749-12-sergey.fedorov@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 cpu-exec.c | 72 ++++++++++++++++++++++++++------------------------------------
 1 file changed, 30 insertions(+), 42 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index bd9fa5a..f7f60b1 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -279,45 +279,9 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
     return qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h);
 }
 
-static TranslationBlock *tb_find_slow(CPUState *cpu,
-                                      target_ulong pc,
-                                      target_ulong cs_base,
-                                      uint32_t flags,
-                                      bool *have_tb_lock)
-{
-    TranslationBlock *tb;
-
-    tb = tb_find_physical(cpu, pc, cs_base, flags);
-    if (!tb) {
-
-        /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
-         * taken outside tb_lock. As system emulation is currently
-         * single threaded the locks are NOPs.
-         */
-        mmap_lock();
-        tb_lock();
-        *have_tb_lock = true;
-
-        /* There's a chance that our desired tb has been translated while
-         * taking the locks so we check again inside the lock.
-         */
-        tb = tb_find_physical(cpu, pc, cs_base, flags);
-        if (!tb) {
-            /* if no translated code available, then translate it now */
-            tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
-        }
-
-        mmap_unlock();
-    }
-
-    /* We add the TB in the virtual pc hash table for the fast lookup */
-    atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
-    return tb;
-}
-
-static inline TranslationBlock *tb_find_fast(CPUState *cpu,
-                                             TranslationBlock *last_tb,
-                                             int tb_exit)
+static inline TranslationBlock *tb_find(CPUState *cpu,
+                                        TranslationBlock *last_tb,
+                                        int tb_exit)
 {
     CPUArchState *env = (CPUArchState *)cpu->env_ptr;
     TranslationBlock *tb;
@@ -332,7 +296,31 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
     tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
     if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
                  tb->flags != flags)) {
-        tb = tb_find_slow(cpu, pc, cs_base, flags, &have_tb_lock);
+        tb = tb_find_physical(cpu, pc, cs_base, flags);
+        if (!tb) {
+
+            /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
+             * taken outside tb_lock. As system emulation is currently
+             * single threaded the locks are NOPs.
+             */
+            mmap_lock();
+            tb_lock();
+            have_tb_lock = true;
+
+            /* There's a chance that our desired tb has been translated while
+             * taking the locks so we check again inside the lock.
+             */
+            tb = tb_find_physical(cpu, pc, cs_base, flags);
+            if (!tb) {
+                /* if no translated code available, then translate it now */
+                tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
+            }
+
+            mmap_unlock();
+        }
+
+        /* We add the TB in the virtual pc hash table for the fast lookup */
+        atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
     }
 #ifndef CONFIG_USER_ONLY
     /* We don't take care of direct jumps when address mapping changes in
@@ -437,7 +425,7 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
     } else if (replay_has_exception()
                && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
         /* try to cause an exception pending in the log */
-        cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, NULL, 0), true);
+        cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
         *ret = -1;
         return true;
 #endif
@@ -620,7 +608,7 @@ int cpu_exec(CPUState *cpu)
             atomic_mb_set(&cpu->tb_flushed, false); /* reset before first TB lookup */
             for(;;) {
                 cpu_handle_interrupt(cpu, &last_tb);
-                tb = tb_find_fast(cpu, last_tb, tb_exit);
+                tb = tb_find(cpu, last_tb, tb_exit);
                 cpu_loop_exec_tb(cpu, tb, &last_tb, &tb_exit, &sc);
                 /* Try to align the host and virtual clocks
                    if the guest is in advance */
-- 
2.7.4

* [Qemu-devel] [PATCH 10/10] tcg: rename tb_find_physical()
  2016-07-19  8:32 [Qemu-devel] [PATCH v5 00/10] Reduce lock contention on TCG hot-path Paolo Bonzini
                   ` (8 preceding siblings ...)
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 09/10] tcg: Merge tb_find_slow() and tb_find_fast() Paolo Bonzini
@ 2016-07-19  8:32 ` Paolo Bonzini
  2016-09-08 12:39   ` Alex Bennée
  9 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19  8:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: serge.fdrv, sergey.fedorov, alex.bennee

From: Sergey Fedorov <serge.fdrv@gmail.com>

In fact, this function does not exactly perform a lookup by physical
address, as described in the comment on get_page_addr_code(). Thus
it may be a bit confusing to have "physical" in its name. So rename it
to tb_htable_lookup() to better reflect its actual functionality.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <20160715175852.30749-13-sergey.fedorov@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 cpu-exec.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index f7f60b1..b240b9f 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -260,7 +260,7 @@ static bool tb_cmp(const void *p, const void *d)
     return false;
 }
 
-static TranslationBlock *tb_find_physical(CPUState *cpu,
+static TranslationBlock *tb_htable_lookup(CPUState *cpu,
                                           target_ulong pc,
                                           target_ulong cs_base,
                                           uint32_t flags)
@@ -296,7 +296,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
     tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
     if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
                  tb->flags != flags)) {
-        tb = tb_find_physical(cpu, pc, cs_base, flags);
+        tb = tb_htable_lookup(cpu, pc, cs_base, flags);
         if (!tb) {
 
             /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
@@ -310,7 +310,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
             /* There's a chance that our desired tb has been translated while
              * taking the locks so we check again inside the lock.
              */
-            tb = tb_find_physical(cpu, pc, cs_base, flags);
+            tb = tb_htable_lookup(cpu, pc, cs_base, flags);
             if (!tb) {
                 /* if no translated code available, then translate it now */
                 tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
-- 
2.7.4

* Re: [Qemu-devel] [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup Paolo Bonzini
@ 2016-07-19 19:56   ` Sergey Fedorov
  2016-07-19 22:27     ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Sergey Fedorov @ 2016-07-19 19:56 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: sergey.fedorov, alex.bennee

On 19/07/16 11:32, Paolo Bonzini wrote:
>

It looks much better now :)

> When invalidating a translation block, set an invalid flag into the
> TranslationBlock structure first.  It is also necessary to check whether
> the target TB is still valid after acquiring 'tb_lock' but before calling
> tb_add_jump() since TB lookup is to be performed out of 'tb_lock' in
> future. Note that we don't have to check 'last_tb'; an already invalidated
> TB will not be executed anyway and it is thus safe to patch it.
>
> Suggested-by: Sergey Fedorov <serge.fdrv@gmail.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  cpu-exec.c              | 5 +++--
>  include/exec/exec-all.h | 2 ++
>  translate-all.c         | 3 +++
>  3 files changed, 8 insertions(+), 2 deletions(-)
(snip)
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index acda7b6..bc0bcc5 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -213,6 +213,8 @@ struct TranslationBlock {
>  #define CF_USE_ICOUNT  0x20000
>  #define CF_IGNORE_ICOUNT 0x40000 /* Do not generate icount code */
>  
> +    uint16_t invalid;

Why not "int"?

> +
>      void *tc_ptr;    /* pointer to the translated code */
>      uint8_t *tc_search;  /* pointer to search data */
>      /* original tb when cflags has CF_NOCACHE */
>

Thanks,
Sergey

* Re: [Qemu-devel] [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
  2016-07-19 19:56   ` Sergey Fedorov
@ 2016-07-19 22:27     ` Paolo Bonzini
  2016-07-21  8:36       ` Sergey Fedorov
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19 22:27 UTC (permalink / raw)
  To: Sergey Fedorov; +Cc: qemu-devel, sergey fedorov, alex bennee



----- Original Message -----
> From: "Sergey Fedorov" <serge.fdrv@gmail.com>
> To: "Paolo Bonzini" <pbonzini@redhat.com>, qemu-devel@nongnu.org
> Cc: "sergey fedorov" <sergey.fedorov@linaro.org>, "alex bennee" <alex.bennee@linaro.org>
> Sent: Tuesday, July 19, 2016 9:56:49 PM
> Subject: Re: [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
> 
> On 19/07/16 11:32, Paolo Bonzini wrote:
> >
> 
> It looks much better now :)
> 
> > When invalidating a translation block, set an invalid flag into the
> > TranslationBlock structure first.  It is also necessary to check whether
> > the target TB is still valid after acquiring 'tb_lock' but before calling
> > tb_add_jump() since TB lookup is to be performed out of 'tb_lock' in
> > future. Note that we don't have to check 'last_tb'; an already invalidated
> > TB will not be executed anyway and it is thus safe to patch it.
> >
> > Suggested-by: Sergey Fedorov <serge.fdrv@gmail.com>
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >  cpu-exec.c              | 5 +++--
> >  include/exec/exec-all.h | 2 ++
> >  translate-all.c         | 3 +++
> >  3 files changed, 8 insertions(+), 2 deletions(-)
> (snip)
> > diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> > index acda7b6..bc0bcc5 100644
> > --- a/include/exec/exec-all.h
> > +++ b/include/exec/exec-all.h
> > @@ -213,6 +213,8 @@ struct TranslationBlock {
> >  #define CF_USE_ICOUNT  0x20000
> >  #define CF_IGNORE_ICOUNT 0x40000 /* Do not generate icount code */
> >  
> > +    uint16_t invalid;
> 
> Why not "int"?

There's a hole there, we may want to move something else so I
used a smaller data type.  Even uint8_t would do.

Paolo
> 
> > +
> >      void *tc_ptr;    /* pointer to the translated code */
> >      uint8_t *tc_search;  /* pointer to search data */
> >      /* original tb when cflags has CF_NOCACHE */
> >
> 
> Thanks,
> Sergey
> 

* Re: [Qemu-devel] [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
  2016-07-19 22:27     ` Paolo Bonzini
@ 2016-07-21  8:36       ` Sergey Fedorov
  2016-07-21  9:04         ` Paolo Bonzini
  2016-07-21 11:25         ` Paolo Bonzini
  0 siblings, 2 replies; 20+ messages in thread
From: Sergey Fedorov @ 2016-07-21  8:36 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, sergey fedorov, alex bennee

On 20/07/16 01:27, Paolo Bonzini wrote:
>
> ----- Original Message -----
>> From: "Sergey Fedorov" <serge.fdrv@gmail.com>
>> To: "Paolo Bonzini" <pbonzini@redhat.com>, qemu-devel@nongnu.org
>> Cc: "sergey fedorov" <sergey.fedorov@linaro.org>, "alex bennee" <alex.bennee@linaro.org>
>> Sent: Tuesday, July 19, 2016 9:56:49 PM
>> Subject: Re: [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
>>
>> On 19/07/16 11:32, Paolo Bonzini wrote:
>> It looks much better now :)
>>
>>> When invalidating a translation block, set an invalid flag into the
>>> TranslationBlock structure first.  It is also necessary to check whether
>>> the target TB is still valid after acquiring 'tb_lock' but before calling
>>> tb_add_jump() since TB lookup is to be performed out of 'tb_lock' in
>>> future. Note that we don't have to check 'last_tb'; an already invalidated
>>> TB will not be executed anyway and it is thus safe to patch it.
>>>
>>> Suggested-by: Sergey Fedorov <serge.fdrv@gmail.com>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>>  cpu-exec.c              | 5 +++--
>>>  include/exec/exec-all.h | 2 ++
>>>  translate-all.c         | 3 +++
>>>  3 files changed, 8 insertions(+), 2 deletions(-)
>> (snip)
>>> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
>>> index acda7b6..bc0bcc5 100644
>>> --- a/include/exec/exec-all.h
>>> +++ b/include/exec/exec-all.h
>>> @@ -213,6 +213,8 @@ struct TranslationBlock {
>>>  #define CF_USE_ICOUNT  0x20000
>>>  #define CF_IGNORE_ICOUNT 0x40000 /* Do not generate icount code */
>>>  
>>> +    uint16_t invalid;
>> Why not "int"?
> There's a hole there, we may want to move something else so I
> used a smaller data type.  Even uint8_t would do.

But could simple "bool" work as well here?

>
> Paolo
>>> +
>>>      void *tc_ptr;    /* pointer to the translated code */
>>>      uint8_t *tc_search;  /* pointer to search data */

Are you sure that the hole is over there, not here?

Kind regards,
Sergey

>>>      /* original tb when cflags has CF_NOCACHE */
>>>
>> Thanks,
>> Sergey
>>

* Re: [Qemu-devel] [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
  2016-07-21  8:36       ` Sergey Fedorov
@ 2016-07-21  9:04         ` Paolo Bonzini
  2016-07-21 11:25         ` Paolo Bonzini
  1 sibling, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-21  9:04 UTC (permalink / raw)
  To: Sergey Fedorov; +Cc: alex bennee, qemu-devel, sergey fedorov



On 21/07/2016 10:36, Sergey Fedorov wrote:
> On 20/07/16 01:27, Paolo Bonzini wrote:
>>
>> ----- Original Message -----
>>> From: "Sergey Fedorov" <serge.fdrv@gmail.com>
>>> To: "Paolo Bonzini" <pbonzini@redhat.com>, qemu-devel@nongnu.org
>>> Cc: "sergey fedorov" <sergey.fedorov@linaro.org>, "alex bennee" <alex.bennee@linaro.org>
>>> Sent: Tuesday, July 19, 2016 9:56:49 PM
>>> Subject: Re: [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
>>>
>>> On 19/07/16 11:32, Paolo Bonzini wrote:
>>> It looks much better now :)
>>>
>>>> When invalidating a translation block, set an invalid flag into the
>>>> TranslationBlock structure first.  It is also necessary to check whether
>>>> the target TB is still valid after acquiring 'tb_lock' but before calling
>>>> tb_add_jump() since TB lookup is to be performed out of 'tb_lock' in
>>>> future. Note that we don't have to check 'last_tb'; an already invalidated
>>>> TB will not be executed anyway and it is thus safe to patch it.
>>>>
>>>> Suggested-by: Sergey Fedorov <serge.fdrv@gmail.com>
>>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>> ---
>>>>  cpu-exec.c              | 5 +++--
>>>>  include/exec/exec-all.h | 2 ++
>>>>  translate-all.c         | 3 +++
>>>>  3 files changed, 8 insertions(+), 2 deletions(-)
>>> (snip)
>>>> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
>>>> index acda7b6..bc0bcc5 100644
>>>> --- a/include/exec/exec-all.h
>>>> +++ b/include/exec/exec-all.h
>>>> @@ -213,6 +213,8 @@ struct TranslationBlock {
>>>>  #define CF_USE_ICOUNT  0x20000
>>>>  #define CF_IGNORE_ICOUNT 0x40000 /* Do not generate icount code */
>>>>  
>>>> +    uint16_t invalid;
>>> Why not "int"?
>> There's a hole there, we may want to move something else so I
>> used a smaller data type.  Even uint8_t would do.
> 
> But could simple "bool" work as well here?

sizeof(bool) is sometimes 1, sometimes 4.  Since in the future we might
want to pack TranslationBlock for better locality, I thought it was
better to stick with uint*_t.

Paolo

> 
>>
>> Paolo
>>>> +
>>>>      void *tc_ptr;    /* pointer to the translated code */
>>>>      uint8_t *tc_search;  /* pointer to search data */
> 
> Are you sure that the hole is over there, not here?
> 
> Kind regards,
> Sergey
> 
>>>>      /* original tb when cflags has CF_NOCACHE */
>>>>
>>> Thanks,
>>> Sergey
>>>
> 
> 
> 

* Re: [Qemu-devel] [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
  2016-07-21  8:36       ` Sergey Fedorov
  2016-07-21  9:04         ` Paolo Bonzini
@ 2016-07-21 11:25         ` Paolo Bonzini
  2016-07-21 17:42           ` Sergey Fedorov
  1 sibling, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-21 11:25 UTC (permalink / raw)
  To: Sergey Fedorov; +Cc: qemu-devel, sergey fedorov, alex bennee



----- Original Message -----
> From: "Sergey Fedorov" <serge.fdrv@gmail.com>
> To: "Paolo Bonzini" <pbonzini@redhat.com>
> Cc: qemu-devel@nongnu.org, "sergey fedorov" <sergey.fedorov@linaro.org>, "alex bennee" <alex.bennee@linaro.org>
> Sent: Thursday, July 21, 2016 10:36:35 AM
> Subject: Re: [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
> 
> On 20/07/16 01:27, Paolo Bonzini wrote:
> >
> > ----- Original Message -----
> >> From: "Sergey Fedorov" <serge.fdrv@gmail.com>
> >> To: "Paolo Bonzini" <pbonzini@redhat.com>, qemu-devel@nongnu.org
> >> Cc: "sergey fedorov" <sergey.fedorov@linaro.org>, "alex bennee"
> >> <alex.bennee@linaro.org>
> >> Sent: Tuesday, July 19, 2016 9:56:49 PM
> >> Subject: Re: [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB
> >> lookup
> >>
> >> On 19/07/16 11:32, Paolo Bonzini wrote:
> >> It looks much better now :)
> >>
> >>> When invalidating a translation block, set an invalid flag into the
> >>> TranslationBlock structure first.  It is also necessary to check whether
> >>> the target TB is still valid after acquiring 'tb_lock' but before calling
> >>> tb_add_jump() since TB lookup is to be performed out of 'tb_lock' in
> >>> future. Note that we don't have to check 'last_tb'; an already
> >>> invalidated
> >>> TB will not be executed anyway and it is thus safe to patch it.
> >>>
> >>> Suggested-by: Sergey Fedorov <serge.fdrv@gmail.com>
> >>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >>> ---
> >>>  cpu-exec.c              | 5 +++--
> >>>  include/exec/exec-all.h | 2 ++
> >>>  translate-all.c         | 3 +++
> >>>  3 files changed, 8 insertions(+), 2 deletions(-)
> >> (snip)
> >>> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> >>> index acda7b6..bc0bcc5 100644
> >>> --- a/include/exec/exec-all.h
> >>> +++ b/include/exec/exec-all.h
> >>> @@ -213,6 +213,8 @@ struct TranslationBlock {
> >>>  #define CF_USE_ICOUNT  0x20000
> >>>  #define CF_IGNORE_ICOUNT 0x40000 /* Do not generate icount code */
> >>>  
> >>> +    uint16_t invalid;
> >> Why not "int"?
> > There's a hole there, we may want to move something else so I
> > used a smaller data type.  Even uint8_t would do.
> 
> But could simple "bool" work as well here?
> 
> >
> > Paolo
> >>> +
> >>>      void *tc_ptr;    /* pointer to the translated code */
> >>>      uint8_t *tc_search;  /* pointer to search data */
> 
> Are you sure that the hole is over there, not here?

Yes, all pointers have the same size.  For 32-bit hosts, my
patch introduces a 2-byte hole.  For 64-bit hosts, it reduces
a 4-byte hole to 2-byte.

Before:

    target_ulong pc;      /* 0 */ 
    target_ulong cs_base; /* 4 */ 
    uint32_t flags;       /* 8  */
    uint16_t size;        /* 12 */
    uint16_t icount;      /* 14 */
    uint32_t cflags;      /* 16 */
    /* 4 byte padding     ** 20 on 64-bit systems */
    void *tc_ptr;         /* 24 on 64-bit systems, 20 on 32-bit */

After:

    target_ulong pc;      /* 0 */ 
    target_ulong cs_base; /* 4 */ 
    uint32_t flags;       /* 8  */
    uint16_t size;        /* 12 */
    uint16_t icount;      /* 14 */
    uint32_t cflags;      /* 16 */
    uint16_t invalid;     /* 20 */
    /* 2 byte padding     ** 22 */
    void *tc_ptr;         /* 24 */

BTW, another reason to use uint16_t is that I suspect tb->icount can
be made redundant with cflags, so we might move tb->invalid up if
tb->icount is removed.
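
(If we ever wanted to pin those offsets down, something along these
lines could sit next to the struct; this is hypothetical, not in the
tree:)

    /* Hypothetical compile-time layout checks, assuming a 64-bit
     * host and a 4-byte target_ulong.
     */
    QEMU_BUILD_BUG_ON(offsetof(TranslationBlock, cflags) != 16);
    QEMU_BUILD_BUG_ON(offsetof(TranslationBlock, invalid) != 20);
    QEMU_BUILD_BUG_ON(offsetof(TranslationBlock, tc_ptr) != 24);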

Thanks,

Paolo

* Re: [Qemu-devel] [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
  2016-07-21 11:25         ` Paolo Bonzini
@ 2016-07-21 17:42           ` Sergey Fedorov
  0 siblings, 0 replies; 20+ messages in thread
From: Sergey Fedorov @ 2016-07-21 17:42 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, sergey fedorov, alex bennee

On 21/07/16 14:25, Paolo Bonzini wrote:
>
> ----- Original Message -----
>> From: "Sergey Fedorov" <serge.fdrv@gmail.com>
>> To: "Paolo Bonzini" <pbonzini@redhat.com>
>> Cc: qemu-devel@nongnu.org, "sergey fedorov" <sergey.fedorov@linaro.org>, "alex bennee" <alex.bennee@linaro.org>
>> Sent: Thursday, July 21, 2016 10:36:35 AM
>> Subject: Re: [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
>>
>> On 20/07/16 01:27, Paolo Bonzini wrote:
>>> ----- Original Message -----
>>>> From: "Sergey Fedorov" <serge.fdrv@gmail.com>
>>>> To: "Paolo Bonzini" <pbonzini@redhat.com>, qemu-devel@nongnu.org
>>>> Cc: "sergey fedorov" <sergey.fedorov@linaro.org>, "alex bennee"
>>>> <alex.bennee@linaro.org>
>>>> Sent: Tuesday, July 19, 2016 9:56:49 PM
>>>> Subject: Re: [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB
>>>> lookup
>>>>
>>>> On 19/07/16 11:32, Paolo Bonzini wrote:
>>>> It looks much better now :)
>>>>
>>>>> When invalidating a translation block, set an invalid flag into the
>>>>> TranslationBlock structure first.  It is also necessary to check whether
>>>>> the target TB is still valid after acquiring 'tb_lock' but before calling
>>>>> tb_add_jump() since TB lookup is to be performed out of 'tb_lock' in
>>>>> future. Note that we don't have to check 'last_tb'; an already
>>>>> invalidated
>>>>> TB will not be executed anyway and it is thus safe to patch it.
>>>>>
>>>>> Suggested-by: Sergey Fedorov <serge.fdrv@gmail.com>
>>>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>>> ---
>>>>>  cpu-exec.c              | 5 +++--
>>>>>  include/exec/exec-all.h | 2 ++
>>>>>  translate-all.c         | 3 +++
>>>>>  3 files changed, 8 insertions(+), 2 deletions(-)
>>>> (snip)
>>>>> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
>>>>> index acda7b6..bc0bcc5 100644
>>>>> --- a/include/exec/exec-all.h
>>>>> +++ b/include/exec/exec-all.h
>>>>> @@ -213,6 +213,8 @@ struct TranslationBlock {
>>>>>  #define CF_USE_ICOUNT  0x20000
>>>>>  #define CF_IGNORE_ICOUNT 0x40000 /* Do not generate icount code */
>>>>>  
>>>>> +    uint16_t invalid;
>>>> Why not "int"?
>>> There's a hole there, we may want to move something else so I
>>> used a smaller data type.  Even uint8_t would do.
>> But could simple "bool" work as well here?
>>
>>> Paolo
>>>>> +
>>>>>      void *tc_ptr;    /* pointer to the translated code */
>>>>>      uint8_t *tc_search;  /* pointer to search data */
>> Are you sure that the hole is over there, not here?
> Yes, all pointers have the same size.  For 32-bit hosts, my
> patch introduces a 2-byte hole.  For 64-bit hosts, it reduces
> a 4-byte hole to 2-byte.
>
> Before:
>
>     target_ulong pc;      /* 0 */ 
>     target_ulong cs_base; /* 4 */ 
>     uint32_t flags;       /* 8  */
>     uint16_t size;        /* 12 */
>     uint16_t icount;      /* 14 */
>     uint32_t cflags;      /* 16 */
>     /* 4 byte padding     ** 20 on 64-bit systems */
>     void *tc_ptr;         /* 24 on 64-bit systems, 20 on 32-bit */
>
> After:
>
>     target_ulong pc;      /* 0 */ 
>     target_ulong cs_base; /* 4 */ 
>     uint32_t flags;       /* 8  */
>     uint16_t size;        /* 12 */
>     uint16_t icount;      /* 14 */
>     uint32_t cflags;      /* 16 */
>     uint16_t invalid;     /* 20 */
>     /* 2 byte padding     ** 22 */
>     void *tc_ptr;         /* 24 */
>
> BTW, another reason to use uint16_t is that I suspect tb->icount can
> be made redundant with cflags, so we might move tb->invalid up if
> tb->icount is removed.

Fair enough. I think it might make sense to use uint8_t as a hint
toward possible future compaction of the TranslationBlock structure.

Thanks,
Sergey

* Re: [Qemu-devel] [PATCH 10/10] tcg: rename tb_find_physical()
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 10/10] tcg: rename tb_find_physical() Paolo Bonzini
@ 2016-09-08 12:39   ` Alex Bennée
  0 siblings, 0 replies; 20+ messages in thread
From: Alex Bennée @ 2016-09-08 12:39 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, serge.fdrv, sergey.fedorov


Paolo Bonzini <pbonzini@redhat.com> writes:

> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> In fact, this function does not exactly perform a lookup by physical
> address, as described in the comment on get_page_addr_code(). Thus
> it may be a bit confusing to have "physical" in its name. So rename it
> to tb_htable_lookup() to better reflect its actual functionality.
>
> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
> Message-Id: <20160715175852.30749-13-sergey.fedorov@linaro.org>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  cpu-exec.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index f7f60b1..b240b9f 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -260,7 +260,7 @@ static bool tb_cmp(const void *p, const void *d)
>      return false;
>  }
>
> -static TranslationBlock *tb_find_physical(CPUState *cpu,
> +static TranslationBlock *tb_htable_lookup(CPUState *cpu,
>                                            target_ulong pc,
>                                            target_ulong cs_base,
>                                            uint32_t flags)
> @@ -296,7 +296,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
>      tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
>      if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>                   tb->flags != flags)) {
> -        tb = tb_find_physical(cpu, pc, cs_base, flags);
> +        tb = tb_htable_lookup(cpu, pc, cs_base, flags);
>          if (!tb) {
>
>              /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
> @@ -310,7 +310,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
>              /* There's a chance that our desired tb has been translated while
>               * taking the locks so we check again inside the lock.
>               */
> -            tb = tb_find_physical(cpu, pc, cs_base, flags);
> +            tb = tb_htable_lookup(cpu, pc, cs_base, flags);
>              if (!tb) {
>                  /* if no translated code available, then translate it now */
>                  tb = tb_gen_code(cpu, pc, cs_base, flags, 0);


--
Alex Bennée
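
[Editor's note: for readers skimming the thread, a simplified sketch of
the lookup path this rename lands in once the series is applied. This
is not the literal QEMU code, but the helper names (atomic_read,
tb_jmp_cache_hash_func, tb_htable_lookup, tb_gen_code) follow the
diffs above.]

/* Sketch: lock-free per-CPU jump-cache probe first, then the global
 * QHT hash table, then (on a miss) translation under the locks. */
static TranslationBlock *find_tb_sketch(CPUState *cpu, target_ulong pc,
                                        target_ulong cs_base, uint32_t flags)
{
    TranslationBlock *tb;
    unsigned int hash = tb_jmp_cache_hash_func(pc);

    /* Fast path: racy read of the direct-mapped per-CPU cache. */
    tb = atomic_read(&cpu->tb_jmp_cache[hash]);
    if (tb && tb->pc == pc && tb->cs_base == cs_base && tb->flags == flags) {
        return tb;
    }
    /* Slow path: the global hash table can be probed without tb_lock. */
    tb = tb_htable_lookup(cpu, pc, cs_base, flags);
    if (!tb) {
        mmap_lock();
        tb_lock();
        /* Re-check: another vCPU may have translated it meanwhile. */
        tb = tb_htable_lookup(cpu, pc, cs_base, flags);
        if (!tb) {
            tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
        }
        tb_unlock();
        mmap_unlock();
    }
    atomic_set(&cpu->tb_jmp_cache[hash], tb);  /* refill the fast path */
    return tb;
}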

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 02/10] tcg: Pass last_tb by value to tb_find_fast()
  2016-07-19  8:32 ` [Qemu-devel] [PATCH 02/10] tcg: Pass last_tb by value to tb_find_fast() Paolo Bonzini
@ 2016-09-08 12:44   ` Alex Bennée
  2016-09-08 13:28     ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Bennée @ 2016-09-08 12:44 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, serge.fdrv, sergey.fedorov


Paolo Bonzini <pbonzini@redhat.com> writes:

> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> This is a small clean-up. tb_find_fast() is the final consumer of this
> variable, so there is no need to pass it by reference. 'last_tb' is
> always updated by the subsequent cpu_loop_exec_tb() in cpu_exec().
>
<snip>
> @@ -621,7 +620,7 @@ int cpu_exec(CPUState *cpu)
>              cpu->tb_flushed = false; /* reset before first TB lookup */
>              for(;;) {
>                  cpu_handle_interrupt(cpu, &last_tb);
> -                tb = tb_find_fast(cpu, &last_tb, tb_exit);
> +                tb = tb_find_fast(cpu, last_tb, tb_exit);

Maybe a comment here for those who missed the subtlety in the commit
message?

                   /* cpu_loop_exec_tb updates last_tb to the TB just executed */

>                  cpu_loop_exec_tb(cpu, tb, &last_tb, &tb_exit, &sc);

You could even make it explicit and change cpu_loop_exec_tb to return
last_tb instead of passing by reference. Then it would be even clearer
when reading the code.

>                  /* Try to align the host and virtual clocks
>                     if the guest is in advance */



--
Alex Bennée

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 02/10] tcg: Pass last_tb by value to tb_find_fast()
  2016-09-08 12:44   ` Alex Bennée
@ 2016-09-08 13:28     ` Paolo Bonzini
  0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-09-08 13:28 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, serge.fdrv, sergey.fedorov



On 08/09/2016 14:44, Alex Bennée wrote:
>> >              cpu->tb_flushed = false; /* reset before first TB lookup */
>> >              for(;;) {
>> >                  cpu_handle_interrupt(cpu, &last_tb);
>> > -                tb = tb_find_fast(cpu, &last_tb, tb_exit);
>> > +                tb = tb_find_fast(cpu, last_tb, tb_exit);
> Maybe a comment here for those who missed the subtlety in the commit
> message?
> 
>                    /* cpu_loop_exec_tb updates last_tb to the TB just executed */
> 
>> >                  cpu_loop_exec_tb(cpu, tb, &last_tb, &tb_exit, &sc);
> You could even make it explicit and change cpu_loop_exec_tb to return
> last_tb instead of passing by reference. Then it would be even clearer
> when reading the code.
> 

I gave it a quick shot and it's not that simple... One simpler possibility
is to take this patch one step further and merge "tb" and "last_tb", but
I've not tested it yet:

diff --git a/cpu-exec.c b/cpu-exec.c
index cf511f1..80e6ff5 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -515,11 +515,11 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
     }
 }
 
-static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
-                                    TranslationBlock **last_tb, int *tb_exit,
-                                    SyncClocks *sc)
+static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock **last_tb,
+                                    int *tb_exit, SyncClocks *sc)
 {
     uintptr_t ret;
+    TranslationBlock *tb = *last_tb;
 
     if (unlikely(cpu->exit_request)) {
         return;
@@ -609,7 +609,7 @@ int cpu_exec(CPUState *cpu)
     for(;;) {
         /* prepare setjmp context for exception handling */
         if (sigsetjmp(cpu->jmp_env, 0) == 0) {
-            TranslationBlock *tb, *last_tb = NULL;
+            TranslationBlock *tb = NULL;
             int tb_exit = 0;
 
             /* if an exception is pending, we execute it here */
@@ -619,9 +619,9 @@ int cpu_exec(CPUState *cpu)
 
             cpu->tb_flushed = false; /* reset before first TB lookup */
             for(;;) {
-                cpu_handle_interrupt(cpu, &last_tb);
-                tb = tb_find_fast(cpu, last_tb, tb_exit);
-                cpu_loop_exec_tb(cpu, tb, &last_tb, &tb_exit, &sc);
+                cpu_handle_interrupt(cpu, &tb);
+                tb = tb_find_fast(cpu, tb, tb_exit);
+                cpu_loop_exec_tb(cpu, &tb, &tb_exit, &sc);
                 /* Try to align the host and virtual clocks
                    if the guest is in advance */
                 align_clocks(&sc, cpu);

It seems better to me to do it as a follow-up step.

Paolo
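
[Editor's note: for comparison, the return-by-value shape Alex suggested
would make the inner loop read roughly as below. This is purely a
hypothetical sketch of the API shape, not part of the series, and as
Paolo notes above it turned out not to be that simple in practice.]

    for (;;) {
        cpu_handle_interrupt(cpu, &last_tb);
        tb = tb_find_fast(cpu, last_tb, tb_exit);
        /* hypothetical: cpu_loop_exec_tb returns the TB it just ran
         * instead of updating last_tb through a pointer */
        last_tb = cpu_loop_exec_tb(cpu, tb, &tb_exit, &sc);
        /* Try to align the host and virtual clocks
           if the guest is in advance */
        align_clocks(&sc, cpu);
    }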

^ permalink raw reply related	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-09-08 13:28 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-19  8:32 [Qemu-devel] [PATCH v5 00/10] Reduce lock contention on TCG hot-path Paolo Bonzini
2016-07-19  8:32 ` [Qemu-devel] [PATCH 01/10] util/qht: Document memory ordering assumptions Paolo Bonzini
2016-07-19  8:32 ` [Qemu-devel] [PATCH 02/10] tcg: Pass last_tb by value to tb_find_fast() Paolo Bonzini
2016-09-08 12:44   ` Alex Bennée
2016-09-08 13:28     ` Paolo Bonzini
2016-07-19  8:32 ` [Qemu-devel] [PATCH 03/10] tcg: Prepare safe tb_jmp_cache lookup out of tb_lock Paolo Bonzini
2016-07-19  8:32 ` [Qemu-devel] [PATCH 04/10] tcg: Prepare safe access to tb_flushed " Paolo Bonzini
2016-07-19  8:32 ` [Qemu-devel] [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup Paolo Bonzini
2016-07-19 19:56   ` Sergey Fedorov
2016-07-19 22:27     ` Paolo Bonzini
2016-07-21  8:36       ` Sergey Fedorov
2016-07-21  9:04         ` Paolo Bonzini
2016-07-21 11:25         ` Paolo Bonzini
2016-07-21 17:42           ` Sergey Fedorov
2016-07-19  8:32 ` [Qemu-devel] [PATCH 06/10] tcg: set up tb->page_addr before insertion Paolo Bonzini
2016-07-19  8:32 ` [Qemu-devel] [PATCH 07/10] tcg: cpu-exec: remove tb_lock from the hot-path Paolo Bonzini
2016-07-19  8:32 ` [Qemu-devel] [PATCH 08/10] tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump() Paolo Bonzini
2016-07-19  8:32 ` [Qemu-devel] [PATCH 09/10] tcg: Merge tb_find_slow() and tb_find_fast() Paolo Bonzini
2016-07-19  8:32 ` [Qemu-devel] [PATCH 10/10] tcg: rename tb_find_physical() Paolo Bonzini
2016-09-08 12:39   ` Alex Bennée
