All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 0/4] tcg: Misc clean-up patches
@ 2016-04-14 20:45 Sergey Fedorov
  2016-04-14 20:45 ` [Qemu-devel] [PATCH v3 1/4] tcg: code_bitmap is not used by user-mode emulation Sergey Fedorov
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Sergey Fedorov @ 2016-04-14 20:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Sergey Fedorov, Paolo Bonzini,
	Peter Crosthwaite, Richard Henderson

From: Sergey Fedorov <serge.fdrv@gmail.com>

This patch series consists of various general TCG clean-up patches extracted
from Paolo's MTTCG tree [1] and Alex's MTTCG base enablement tree [2]. I also
add here a patch from myself to rework tb_invalidated_flag based on the Paolo's
"tcg: move tb_invalidated_flag to CPUState" patch from the original verions of
this series. The idea is to review and merge these patches separately from the
MTTCG series to cut the latter and make it easier to review.

The series' tree can be found in a public git repository [3].

[1] https://github.com/bonzini/qemu/tree/mttcg
[2] https://github.com/stsquad/qemu/tree/mttcg/base-patches-v2
[3] https://github.com/sergefdrv/qemu/tree/tcg-cleanup-v3

Summary of changes:
 v3:
  * Add a patch to rework tb_invalidated_flag from myself [PATCH v3 4/4]
 v2:
  * Complete code_bitmap elimination [PATCH v2 1/3]
  * Take Alex's version of tb_find_physical() reorganization [PATCH v2 2/3]
  * Drop [PATCH 3/5] completely
  * Drop [PATCH 4/5] and [PATCH 5/5] for separate series
  * Take Alex's rebase of Paolo's icount code eliding [PATCH v2 3/3]

Alex Bennée (1):
  tcg: reorganize tb_find_physical loop

Paolo Bonzini (2):
  tcg: code_bitmap is not used by user-mode emulation
  cpu-exec: elide more icount code if CONFIG_USER_ONLY

Sergey Fedorov (1):
  tcg: rework tb_invalidated_flag

 cpu-exec.c              | 73 +++++++++++++++++++++++++++++--------------------
 include/exec/exec-all.h |  2 --
 include/qom/cpu.h       |  2 ++
 translate-all.c         | 16 ++++++-----
 4 files changed, 54 insertions(+), 39 deletions(-)

-- 
2.8.1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v3 1/4] tcg: code_bitmap is not used by user-mode emulation
  2016-04-14 20:45 [Qemu-devel] [PATCH v3 0/4] tcg: Misc clean-up patches Sergey Fedorov
@ 2016-04-14 20:45 ` Sergey Fedorov
  2016-04-14 20:45 ` [Qemu-devel] [PATCH v3 2/4] tcg: reorganize tb_find_physical loop Sergey Fedorov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Sergey Fedorov @ 2016-04-14 20:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Sergey Fedorov, Paolo Bonzini,
	Peter Crosthwaite, Richard Henderson, Sergey Fedorov

From: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Sergey Fedorov: eliminate the field entirely in user-mode]
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson  <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
---

Changes in v2:
 * The field is eliminated entirely in user-mode

 translate-all.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 8329ea60eeda..0d5d9449dc6b 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -75,8 +75,9 @@ typedef struct PageDesc {
     /* in order to optimize self modifying code, we count the number
        of lookups we do to a given page to use a bitmap */
     unsigned int code_write_count;
+#ifdef CONFIG_SOFTMMU
     unsigned long *code_bitmap;
-#if defined(CONFIG_USER_ONLY)
+#else
     unsigned long flags;
 #endif
 } PageDesc;
@@ -784,8 +785,10 @@ void tb_free(TranslationBlock *tb)
 
 static inline void invalidate_page_bitmap(PageDesc *p)
 {
+#ifdef CONFIG_SOFTMMU
     g_free(p->code_bitmap);
     p->code_bitmap = NULL;
+#endif
     p->code_write_count = 0;
 }
 
@@ -1019,6 +1022,7 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
     tcg_ctx.tb_ctx.tb_phys_invalidate_count++;
 }
 
+#ifdef CONFIG_SOFTMMU
 static void build_page_bitmap(PageDesc *p)
 {
     int n, tb_start, tb_end;
@@ -1047,6 +1051,7 @@ static void build_page_bitmap(PageDesc *p)
         tb = tb->page_next[n];
     }
 }
+#endif
 
 /* Called with mmap_lock held for user mode emulation.  */
 TranslationBlock *tb_gen_code(CPUState *cpu,
@@ -1296,6 +1301,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
 #endif
 }
 
+#ifdef CONFIG_SOFTMMU
 /* len must be <= 8 and start must be a multiple of len */
 void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
 {
@@ -1333,8 +1339,7 @@ void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
         tb_invalidate_phys_page_range(start, start + len, 1);
     }
 }
-
-#if !defined(CONFIG_SOFTMMU)
+#else
 /* Called with mmap_lock held.  */
 static void tb_invalidate_phys_page(tb_page_addr_t addr,
                                     uintptr_t pc, void *puc,
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v3 2/4] tcg: reorganize tb_find_physical loop
  2016-04-14 20:45 [Qemu-devel] [PATCH v3 0/4] tcg: Misc clean-up patches Sergey Fedorov
  2016-04-14 20:45 ` [Qemu-devel] [PATCH v3 1/4] tcg: code_bitmap is not used by user-mode emulation Sergey Fedorov
@ 2016-04-14 20:45 ` Sergey Fedorov
  2016-04-14 20:45 ` [Qemu-devel] [PATCH v3 3/4] cpu-exec: elide more icount code if CONFIG_USER_ONLY Sergey Fedorov
  2016-04-14 20:45 ` [Qemu-devel] [PATCH v3 4/4] tcg: rework tb_invalidated_flag Sergey Fedorov
  3 siblings, 0 replies; 16+ messages in thread
From: Sergey Fedorov @ 2016-04-14 20:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Sergey Fedorov, Paolo Bonzini,
	Peter Crosthwaite, Richard Henderson, Sergey Fedorov

From: Alex Bennée <alex.bennee@linaro.org>

Put some comments and improve code structure. This should help reading
the code.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
[Sergey Fedorov: provide commit message; bring back resetting of
tb_invalidated_flag]
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson  <rth@twiddle.net>
---
 cpu-exec.c | 44 ++++++++++++++++++++++++--------------------
 1 file changed, 24 insertions(+), 20 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index bbfcbfb54385..4cba4efc92b2 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -223,10 +223,9 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
                                           uint64_t flags)
 {
     CPUArchState *env = (CPUArchState *)cpu->env_ptr;
-    TranslationBlock *tb, **ptb1;
+    TranslationBlock *tb, **tb_hash_head, **ptb1;
     unsigned int h;
     tb_page_addr_t phys_pc, phys_page1;
-    target_ulong virt_page2;
 
     tcg_ctx.tb_ctx.tb_invalidated_flag = 0;
 
@@ -234,37 +233,42 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
     phys_pc = get_page_addr_code(env, pc);
     phys_page1 = phys_pc & TARGET_PAGE_MASK;
     h = tb_phys_hash_func(phys_pc);
-    ptb1 = &tcg_ctx.tb_ctx.tb_phys_hash[h];
-    for(;;) {
-        tb = *ptb1;
-        if (!tb) {
-            return NULL;
-        }
+
+    /* Start at head of the hash entry */
+    ptb1 = tb_hash_head = &tcg_ctx.tb_ctx.tb_phys_hash[h];
+    tb = *ptb1;
+
+    while (tb) {
         if (tb->pc == pc &&
             tb->page_addr[0] == phys_page1 &&
             tb->cs_base == cs_base &&
             tb->flags == flags) {
-            /* check next page if needed */
-            if (tb->page_addr[1] != -1) {
-                tb_page_addr_t phys_page2;
 
-                virt_page2 = (pc & TARGET_PAGE_MASK) +
-                    TARGET_PAGE_SIZE;
-                phys_page2 = get_page_addr_code(env, virt_page2);
+            if (tb->page_addr[1] == -1) {
+                /* done, we have a match */
+                break;
+            } else {
+                /* check next page if needed */
+                target_ulong virt_page2 = (pc & TARGET_PAGE_MASK) +
+                                          TARGET_PAGE_SIZE;
+                tb_page_addr_t phys_page2 = get_page_addr_code(env, virt_page2);
+
                 if (tb->page_addr[1] == phys_page2) {
                     break;
                 }
-            } else {
-                break;
             }
         }
+
         ptb1 = &tb->phys_hash_next;
+        tb = *ptb1;
     }
 
-    /* Move the TB to the head of the list */
-    *ptb1 = tb->phys_hash_next;
-    tb->phys_hash_next = tcg_ctx.tb_ctx.tb_phys_hash[h];
-    tcg_ctx.tb_ctx.tb_phys_hash[h] = tb;
+    if (tb) {
+        /* Move the TB to the head of the list */
+        *ptb1 = tb->phys_hash_next;
+        tb->phys_hash_next = *tb_hash_head;
+        *tb_hash_head = tb;
+    }
     return tb;
 }
 
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v3 3/4] cpu-exec: elide more icount code if CONFIG_USER_ONLY
  2016-04-14 20:45 [Qemu-devel] [PATCH v3 0/4] tcg: Misc clean-up patches Sergey Fedorov
  2016-04-14 20:45 ` [Qemu-devel] [PATCH v3 1/4] tcg: code_bitmap is not used by user-mode emulation Sergey Fedorov
  2016-04-14 20:45 ` [Qemu-devel] [PATCH v3 2/4] tcg: reorganize tb_find_physical loop Sergey Fedorov
@ 2016-04-14 20:45 ` Sergey Fedorov
  2016-04-14 20:45 ` [Qemu-devel] [PATCH v3 4/4] tcg: rework tb_invalidated_flag Sergey Fedorov
  3 siblings, 0 replies; 16+ messages in thread
From: Sergey Fedorov @ 2016-04-14 20:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Sergey Fedorov, Paolo Bonzini,
	Peter Crosthwaite, Richard Henderson, Sergey Fedorov

From: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Alex Bennée: #ifndef replay code to match elided functions]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
 cpu-exec.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/cpu-exec.c b/cpu-exec.c
index 4cba4efc92b2..36942340d7e3 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -192,6 +192,7 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
     return next_tb;
 }
 
+#ifndef CONFIG_USER_ONLY
 /* Execute the code without caching the generated code. An interpreter
    could be used if available. */
 static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
@@ -216,6 +217,7 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
     tb_phys_invalidate(tb, -1);
     tb_free(tb);
 }
+#endif
 
 static TranslationBlock *tb_find_physical(CPUState *cpu,
                                           target_ulong pc,
@@ -430,12 +432,14 @@ int cpu_exec(CPUState *cpu)
                     }
 #endif
                 }
+#ifndef CONFIG_USER_ONLY
             } else if (replay_has_exception()
                        && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
                 /* try to cause an exception pending in the log */
                 cpu_exec_nocache(cpu, 1, tb_find_fast(cpu), true);
                 ret = -1;
                 break;
+#endif
             }
 
             next_tb = 0; /* force lookup of first TB */
@@ -545,6 +549,9 @@ int cpu_exec(CPUState *cpu)
                     case TB_EXIT_ICOUNT_EXPIRED:
                     {
                         /* Instruction counter expired.  */
+#ifdef CONFIG_USER_ONLY
+                        abort();
+#else
                         int insns_left = cpu->icount_decr.u32;
                         if (cpu->icount_extra && insns_left >= 0) {
                             /* Refill decrementer and continue execution.  */
@@ -564,6 +571,7 @@ int cpu_exec(CPUState *cpu)
                             cpu_loop_exit(cpu);
                         }
                         break;
+#endif
                     }
                     default:
                         break;
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v3 4/4] tcg: rework tb_invalidated_flag
  2016-04-14 20:45 [Qemu-devel] [PATCH v3 0/4] tcg: Misc clean-up patches Sergey Fedorov
                   ` (2 preceding siblings ...)
  2016-04-14 20:45 ` [Qemu-devel] [PATCH v3 3/4] cpu-exec: elide more icount code if CONFIG_USER_ONLY Sergey Fedorov
@ 2016-04-14 20:45 ` Sergey Fedorov
  2016-04-18 14:09   ` Alex Bennée
  3 siblings, 1 reply; 16+ messages in thread
From: Sergey Fedorov @ 2016-04-14 20:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Sergey Fedorov, Paolo Bonzini,
	Peter Crosthwaite, Richard Henderson, Sergey Fedorov,
	Andreas Färber

From: Sergey Fedorov <serge.fdrv@gmail.com>

'tb_invalidated_flag' was meant to catch two events:
 * some TB has been invalidated by tb_phys_invalidate();
 * the whole translation buffer has been flushed by tb_flush().

Then it was checked:
 * in cpu_exec() to ensure that the last executed TB can be safely
   linked to directly call the next one;
 * in cpu_exec_nocache() to decide if the original TB should be provided
   for further possible invalidation along with the temporarily
   generated TB.

It is always safe to patch an invalidated TB since it is not going to be
used anyway. It is also safe to call tb_phys_invalidate() for an already
invalidated TB. Thus, setting this flag in tb_phys_invalidate() is
simply unnecessary. Moreover, it can prevent from pretty proper linking
of TBs, if any arbitrary TB has been invalidated. So just don't touch it
in tb_phys_invalidate().

If this flag is only used to catch whether tb_flush() has been called
then rename it to 'tb_flushed'. Declare it as 'bool' and stick to using
only 'true' and 'false' to set its value. Also, instead of setting it in
tb_gen_code(), just after tb_flush() has been called, do it right inside
of tb_flush().

In cpu_exec(), this flag is used to track if tb_flush() has been called
and have made 'next_tb' (a reference to the last executed TB) invalid
for linking it to directly call the next TB. tb_flush() can be called
during the CPU execution loop from tb_gen_code(), during TB execution or
by another thread while 'tb_lock' is released. Catch for translation
buffer flush reliably by resetting this flag once before first TB lookup
and each time we find it set before trying to add a direct jump. Don't
touch in in tb_find_physical().

Each vCPU has its own execution loop in multithreaded mode and thus
should have its own copy of the flag to be able to reset it with its own
'next_tb' and don't affect any other vCPU execution thread. So make this
flag per-vCPU and move it to CPUState.

In cpu_exec_nocache(), we only need to check if tb_flush() has been
called from tb_gen_code() called by cpu_exec_nocache() itself. To do
this reliably, preserve the old value of the flag, reset it before
calling tb_gen_code(), check afterwards, and combine the saved value
back to the flag.

This patch is based on the patch "tcg: move tb_invalidated_flag to
CPUState" from Paolo Bonzini <pbonzini@redhat.com>.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
 cpu-exec.c              | 21 +++++++++++----------
 include/exec/exec-all.h |  2 --
 include/qom/cpu.h       |  2 ++
 translate-all.c         |  5 +----
 4 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 36942340d7e3..966e016b7d75 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -199,16 +199,20 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
                              TranslationBlock *orig_tb, bool ignore_icount)
 {
     TranslationBlock *tb;
+    bool old_tb_flushed;
 
     /* Should never happen.
        We only end up here when an existing TB is too long.  */
     if (max_cycles > CF_COUNT_MASK)
         max_cycles = CF_COUNT_MASK;
 
+    old_tb_flushed = cpu->tb_flushed;
+    cpu->tb_flushed = false;
     tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
                      max_cycles | CF_NOCACHE
                          | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
-    tb->orig_tb = tcg_ctx.tb_ctx.tb_invalidated_flag ? NULL : orig_tb;
+    tb->orig_tb = cpu->tb_flushed ? NULL : orig_tb;
+    cpu->tb_flushed |= old_tb_flushed;
     cpu->current_tb = tb;
     /* execute the generated code */
     trace_exec_tb_nocache(tb, tb->pc);
@@ -229,8 +233,6 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
     unsigned int h;
     tb_page_addr_t phys_pc, phys_page1;
 
-    tcg_ctx.tb_ctx.tb_invalidated_flag = 0;
-
     /* find translated block using physical mappings */
     phys_pc = get_page_addr_code(env, pc);
     phys_page1 = phys_pc & TARGET_PAGE_MASK;
@@ -443,6 +445,7 @@ int cpu_exec(CPUState *cpu)
             }
 
             next_tb = 0; /* force lookup of first TB */
+            cpu->tb_flushed = false;
             for(;;) {
                 interrupt_request = cpu->interrupt_request;
                 if (unlikely(interrupt_request)) {
@@ -507,14 +510,12 @@ int cpu_exec(CPUState *cpu)
                 }
                 tb_lock();
                 tb = tb_find_fast(cpu);
-                /* Note: we do it here to avoid a gcc bug on Mac OS X when
-                   doing it in tb_find_slow */
-                if (tcg_ctx.tb_ctx.tb_invalidated_flag) {
-                    /* as some TB could have been invalidated because
-                       of memory exceptions while generating the code, we
-                       must recompute the hash index here */
+                if (cpu->tb_flushed) {
+                    /* Ensure that no TB jump will be modified as the
+                     * translation buffer has been flushed.
+                     */
                     next_tb = 0;
-                    tcg_ctx.tb_ctx.tb_invalidated_flag = 0;
+                    cpu->tb_flushed = false;
                 }
                 /* see if we can patch the calling TB. When the TB
                    spans two pages, we cannot safely do a direct
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 736209505a68..0ba845e12b12 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -288,8 +288,6 @@ struct TBContext {
     /* statistics */
     int tb_flush_count;
     int tb_phys_invalidate_count;
-
-    int tb_invalidated_flag;
 };
 
 void tb_free(TranslationBlock *tb);
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index b7a10f791acc..c1ae24d1fcbb 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -238,6 +238,7 @@ struct kvm_run;
  * @crash_occurred: Indicates the OS reported a crash (panic) for this CPU
  * @tcg_exit_req: Set to force TCG to stop executing linked TBs for this
  *           CPU and return to its top level loop.
+ * @tb_flushed: Indicates the translation buffer has been flushed.
  * @singlestep_enabled: Flags for single-stepping.
  * @icount_extra: Instructions until next timer event.
  * @icount_decr: Number of cycles left, with interrupt flag in high bit.
@@ -289,6 +290,7 @@ struct CPUState {
     bool stopped;
     bool crash_occurred;
     bool exit_request;
+    bool tb_flushed;
     uint32_t interrupt_request;
     int singlestep_enabled;
     int64_t icount_extra;
diff --git a/translate-all.c b/translate-all.c
index 0d5d9449dc6b..acce9396581e 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -844,6 +844,7 @@ void tb_flush(CPUState *cpu)
 
     CPU_FOREACH(cpu) {
         memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
+        cpu->tb_flushed = true;
     }
 
     memset(tcg_ctx.tb_ctx.tb_phys_hash, 0, sizeof(tcg_ctx.tb_ctx.tb_phys_hash));
@@ -990,8 +991,6 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
         invalidate_page_bitmap(p);
     }
 
-    tcg_ctx.tb_ctx.tb_invalidated_flag = 1;
-
     /* remove the TB from the hash list */
     h = tb_jmp_cache_hash_func(tb->pc);
     CPU_FOREACH(cpu) {
@@ -1081,8 +1080,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         /* cannot fail at this point */
         tb = tb_alloc(pc);
         assert(tb != NULL);
-        /* Don't forget to invalidate previous TB info.  */
-        tcg_ctx.tb_ctx.tb_invalidated_flag = 1;
     }
 
     gen_code_buf = tcg_ctx.code_gen_ptr;
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/4] tcg: rework tb_invalidated_flag
  2016-04-14 20:45 ` [Qemu-devel] [PATCH v3 4/4] tcg: rework tb_invalidated_flag Sergey Fedorov
@ 2016-04-18 14:09   ` Alex Bennée
  2016-04-18 15:05     ` Sergey Fedorov
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Bennée @ 2016-04-18 14:09 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: qemu-devel, Sergey Fedorov, Paolo Bonzini, Peter Crosthwaite,
	Richard Henderson, Andreas Färber


Sergey Fedorov <sergey.fedorov@linaro.org> writes:

> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> 'tb_invalidated_flag' was meant to catch two events:
>  * some TB has been invalidated by tb_phys_invalidate();
>  * the whole translation buffer has been flushed by tb_flush().
>
> Then it was checked:
>  * in cpu_exec() to ensure that the last executed TB can be safely
>    linked to directly call the next one;
>  * in cpu_exec_nocache() to decide if the original TB should be provided
>    for further possible invalidation along with the temporarily
>    generated TB.
>
> It is always safe to patch an invalidated TB since it is not going to be
> used anyway.

Wouldn't that have implications for code searching through the linked
list of jump patched TBs?

> It is also safe to call tb_phys_invalidate() for an already
> invalidated TB. Thus, setting this flag in tb_phys_invalidate() is
> simply unnecessary. Moreover, it can prevent from pretty proper linking
> of TBs, if any arbitrary TB has been invalidated. So just don't touch it
> in tb_phys_invalidate().
>
> If this flag is only used to catch whether tb_flush() has been called
> then rename it to 'tb_flushed'. Declare it as 'bool' and stick to using
> only 'true' and 'false' to set its value. Also, instead of setting it in
> tb_gen_code(), just after tb_flush() has been called, do it right inside
> of tb_flush().
>
> In cpu_exec(), this flag is used to track if tb_flush() has been called
> and have made 'next_tb' (a reference to the last executed TB) invalid
> for linking it to directly call the next TB. tb_flush() can be called
> during the CPU execution loop from tb_gen_code(), during TB execution or
> by another thread while 'tb_lock' is released. Catch for translation
> buffer flush reliably by resetting this flag once before first TB lookup
> and each time we find it set before trying to add a direct jump. Don't
> touch in in tb_find_physical().
>
> Each vCPU has its own execution loop in multithreaded mode and thus
> should have its own copy of the flag to be able to reset it with its own
> 'next_tb' and don't affect any other vCPU execution thread. So make this
> flag per-vCPU and move it to CPUState.
>
> In cpu_exec_nocache(), we only need to check if tb_flush() has been
> called from tb_gen_code() called by cpu_exec_nocache() itself. To do
> this reliably, preserve the old value of the flag, reset it before
> calling tb_gen_code(), check afterwards, and combine the saved value
> back to the flag.
>
> This patch is based on the patch "tcg: move tb_invalidated_flag to
> CPUState" from Paolo Bonzini <pbonzini@redhat.com>.
>
> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
> ---
>  cpu-exec.c              | 21 +++++++++++----------
>  include/exec/exec-all.h |  2 --
>  include/qom/cpu.h       |  2 ++
>  translate-all.c         |  5 +----
>  4 files changed, 14 insertions(+), 16 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 36942340d7e3..966e016b7d75 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -199,16 +199,20 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
>                               TranslationBlock *orig_tb, bool ignore_icount)
>  {
>      TranslationBlock *tb;
> +    bool old_tb_flushed;
>
>      /* Should never happen.
>         We only end up here when an existing TB is too long.  */
>      if (max_cycles > CF_COUNT_MASK)
>          max_cycles = CF_COUNT_MASK;
>
> +    old_tb_flushed = cpu->tb_flushed;
> +    cpu->tb_flushed = false;
>      tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
>                       max_cycles | CF_NOCACHE
>                           | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
> -    tb->orig_tb = tcg_ctx.tb_ctx.tb_invalidated_flag ? NULL : orig_tb;
> +    tb->orig_tb = cpu->tb_flushed ? NULL : orig_tb;
> +    cpu->tb_flushed |= old_tb_flushed;
>      cpu->current_tb = tb;
>      /* execute the generated code */
>      trace_exec_tb_nocache(tb, tb->pc);
> @@ -229,8 +233,6 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
>      unsigned int h;
>      tb_page_addr_t phys_pc, phys_page1;
>
> -    tcg_ctx.tb_ctx.tb_invalidated_flag = 0;
> -
>      /* find translated block using physical mappings */
>      phys_pc = get_page_addr_code(env, pc);
>      phys_page1 = phys_pc & TARGET_PAGE_MASK;
> @@ -443,6 +445,7 @@ int cpu_exec(CPUState *cpu)
>              }
>
>              next_tb = 0; /* force lookup of first TB */
> +            cpu->tb_flushed = false;
>              for(;;) {
>                  interrupt_request = cpu->interrupt_request;
>                  if (unlikely(interrupt_request)) {
> @@ -507,14 +510,12 @@ int cpu_exec(CPUState *cpu)
>                  }
>                  tb_lock();
>                  tb = tb_find_fast(cpu);
> -                /* Note: we do it here to avoid a gcc bug on Mac OS X when
> -                   doing it in tb_find_slow */

Is this still true? Would it make more sense to push the patching down
to the gen_code?

I got slightly confused as to what next_tb ends up meaning at what point
in the run loop.

> -                if (tcg_ctx.tb_ctx.tb_invalidated_flag) {
> -                    /* as some TB could have been invalidated because
> -                       of memory exceptions while generating the code, we
> -                       must recompute the hash index here */
> +                if (cpu->tb_flushed) {
> +                    /* Ensure that no TB jump will be modified as the
> +                     * translation buffer has been flushed.
> +                     */
>                      next_tb = 0;
> -                    tcg_ctx.tb_ctx.tb_invalidated_flag = 0;
> +                    cpu->tb_flushed = false;
>                  }
>                  /* see if we can patch the calling TB. When the TB
>                     spans two pages, we cannot safely do a direct
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 736209505a68..0ba845e12b12 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -288,8 +288,6 @@ struct TBContext {
>      /* statistics */
>      int tb_flush_count;
>      int tb_phys_invalidate_count;
> -
> -    int tb_invalidated_flag;
>  };
>
>  void tb_free(TranslationBlock *tb);
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index b7a10f791acc..c1ae24d1fcbb 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -238,6 +238,7 @@ struct kvm_run;
>   * @crash_occurred: Indicates the OS reported a crash (panic) for this CPU
>   * @tcg_exit_req: Set to force TCG to stop executing linked TBs for this
>   *           CPU and return to its top level loop.
> + * @tb_flushed: Indicates the translation buffer has been flushed.
>   * @singlestep_enabled: Flags for single-stepping.
>   * @icount_extra: Instructions until next timer event.
>   * @icount_decr: Number of cycles left, with interrupt flag in high bit.
> @@ -289,6 +290,7 @@ struct CPUState {
>      bool stopped;
>      bool crash_occurred;
>      bool exit_request;
> +    bool tb_flushed;
>      uint32_t interrupt_request;
>      int singlestep_enabled;
>      int64_t icount_extra;
> diff --git a/translate-all.c b/translate-all.c
> index 0d5d9449dc6b..acce9396581e 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -844,6 +844,7 @@ void tb_flush(CPUState *cpu)
>
>      CPU_FOREACH(cpu) {
>          memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
> +        cpu->tb_flushed = true;
>      }
>
>      memset(tcg_ctx.tb_ctx.tb_phys_hash, 0, sizeof(tcg_ctx.tb_ctx.tb_phys_hash));
> @@ -990,8 +991,6 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>          invalidate_page_bitmap(p);
>      }
>
> -    tcg_ctx.tb_ctx.tb_invalidated_flag = 1;
> -
>      /* remove the TB from the hash list */
>      h = tb_jmp_cache_hash_func(tb->pc);
>      CPU_FOREACH(cpu) {
> @@ -1081,8 +1080,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>          /* cannot fail at this point */
>          tb = tb_alloc(pc);
>          assert(tb != NULL);
> -        /* Don't forget to invalidate previous TB info.  */
> -        tcg_ctx.tb_ctx.tb_invalidated_flag = 1;
>      }
>
>      gen_code_buf = tcg_ctx.code_gen_ptr;


--
Alex Bennée

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/4] tcg: rework tb_invalidated_flag
  2016-04-18 14:09   ` Alex Bennée
@ 2016-04-18 15:05     ` Sergey Fedorov
  2016-04-18 15:34       ` Peter Maydell
  2016-04-18 17:17       ` Alex Bennée
  0 siblings, 2 replies; 16+ messages in thread
From: Sergey Fedorov @ 2016-04-18 15:05 UTC (permalink / raw)
  To: Alex Bennée, Sergey Fedorov
  Cc: qemu-devel, Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Andreas Färber

[-- Attachment #1: Type: text/plain, Size: 2643 bytes --]

On 18/04/16 17:09, Alex Bennée wrote:
> Sergey Fedorov <sergey.fedorov@linaro.org> writes:
>> From: Sergey Fedorov <serge.fdrv@gmail.com>
>>
>> 'tb_invalidated_flag' was meant to catch two events:
>>  * some TB has been invalidated by tb_phys_invalidate();
>>  * the whole translation buffer has been flushed by tb_flush().
>>
>> Then it was checked:
>>  * in cpu_exec() to ensure that the last executed TB can be safely
>>    linked to directly call the next one;
>>  * in cpu_exec_nocache() to decide if the original TB should be provided
>>    for further possible invalidation along with the temporarily
>>    generated TB.
>>
>> It is always safe to patch an invalidated TB since it is not going to be
>> used anyway.
> Wouldn't that have implications for code searching through the linked
> list of jump patched TBs?

The only implication I can see is that the jump in that already
invalidated TB could just get reset back later on in
tb_phys_invalidate(). We could keep track of invalidated TB's and skip
patching those but it's also some overhead in the main CPU execution
loop wich I'm not sure is worth to be introduced.

(snip)
>> diff --git a/cpu-exec.c b/cpu-exec.c
(snip)
>> @@ -507,14 +510,12 @@ int cpu_exec(CPUState *cpu)
>>                  }
>>                  tb_lock();
>>                  tb = tb_find_fast(cpu);
>> -                /* Note: we do it here to avoid a gcc bug on Mac OS X when
>> -                   doing it in tb_find_slow */
> Is this still true? Would it make more sense to push the patching down
> to the gen_code?

This comment comes up to the commit:

    commit 1538800276aa7228d74f9d00bf275f54dc9e9b43              
    Author: bellard <bellard@c046a42c-6fe2-441c-8c8c-71466251a162>
    Date:   Mon Dec 19 01:42:32 2005 +0000                       

        workaround for gcc bug on PowerPC


It was added more than ten years ago. Anyway, now this code is here not
because of the bug: we need to reset 'next_tb' which is a local variable
in cpu_exec(). Personally, I don't think it would be neater to hide it
into gen_code(). Do you have some thoughts on how we could benefit from
doing so? BTW, I had a feeling that it may be useful to reorganize
cpu_exec() a bit, although I don't have a solid idea of how to do this
so far.

>
> I got slightly confused as to what next_tb ends up meaning at what point
> in the run loop.

Yes, it seems to be a misleading name for this variable. As it was
discussed on IRC, I'd like to break it into two variables, say 'last_tb'
and 'tb_exit_idx', as soon as cpu_tb_exec() returns. Probably this
series could also include such a patch.

Kind regards,
Sergey

[-- Attachment #2: Type: text/html, Size: 3914 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/4] tcg: rework tb_invalidated_flag
  2016-04-18 15:05     ` Sergey Fedorov
@ 2016-04-18 15:34       ` Peter Maydell
  2016-04-18 17:17       ` Alex Bennée
  1 sibling, 0 replies; 16+ messages in thread
From: Peter Maydell @ 2016-04-18 15:34 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: Alex Bennée, Sergey Fedorov, Paolo Bonzini,
	Richard Henderson, QEMU Developers, Andreas Färber,
	Peter Crosthwaite

On 18 April 2016 at 16:05, Sergey Fedorov <serge.fdrv@gmail.com> wrote:
> @@ -507,14 +510,12 @@ int cpu_exec(CPUState *cpu)
>                  }
>                  tb_lock();
>                  tb = tb_find_fast(cpu);
> -                /* Note: we do it here to avoid a gcc bug on Mac OS X when
> -                   doing it in tb_find_slow */
>
> Is this still true? Would it make more sense to push the patching down
> to the gen_code?
>
>
> This comment comes up to the commit:
>
> commit 1538800276aa7228d74f9d00bf275f54dc9e9b43
> Author: bellard <bellard@c046a42c-6fe2-441c-8c8c-71466251a162>
> Date:   Mon Dec 19 01:42:32 2005 +0000
>
>     workaround for gcc bug on PowerPC

We no longer support building on PPC OSX hosts, and haven't
for some time -- the earliest OSX we support is 10.5, which is
x86-only. So I think we can probably safely dump the gcc bug
workaround, as long as we're clear in commit messages that we're
doing so.

Looking at that commit, it seems to be related to changing the
value of T0 inside the tb_find_slow() function; at that point
in QEMU's history, T0 was a variable which was tied to a specific
host CPU register via gcc's 'asm("registername")' syntax; it's
highly likely that the gcc bug was specific to handling of that
kind of register variable anyway.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/4] tcg: rework tb_invalidated_flag
  2016-04-18 15:05     ` Sergey Fedorov
  2016-04-18 15:34       ` Peter Maydell
@ 2016-04-18 17:17       ` Alex Bennée
  2016-04-18 17:51         ` Sergey Fedorov
  1 sibling, 1 reply; 16+ messages in thread
From: Alex Bennée @ 2016-04-18 17:17 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: Sergey Fedorov, qemu-devel, Paolo Bonzini, Peter Crosthwaite,
	Richard Henderson, Andreas Färber


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 18/04/16 17:09, Alex Bennée wrote:
>> Sergey Fedorov <sergey.fedorov@linaro.org> writes:
>>> From: Sergey Fedorov <serge.fdrv@gmail.com>
>>>
>>> 'tb_invalidated_flag' was meant to catch two events:
>>>  * some TB has been invalidated by tb_phys_invalidate();
>>>  * the whole translation buffer has been flushed by tb_flush().
>>>
>>> Then it was checked:
>>>  * in cpu_exec() to ensure that the last executed TB can be safely
>>>    linked to directly call the next one;
>>>  * in cpu_exec_nocache() to decide if the original TB should be provided
>>>    for further possible invalidation along with the temporarily
>>>    generated TB.
>>>
>>> It is always safe to patch an invalidated TB since it is not going to be
>>> used anyway.
>> Wouldn't that have implications for code searching through the linked
>> list of jump patched TBs?
>
> The only implication I can see is that the jump in that already
> invalidated TB could just get reset back later on in
> tb_phys_invalidate(). We could keep track of invalidated TB's and skip
> patching those but it's also some overhead in the main CPU execution
> loop wich I'm not sure is worth to be introduced.
>
> (snip)
>>> diff --git a/cpu-exec.c b/cpu-exec.c
> (snip)
>>> @@ -507,14 +510,12 @@ int cpu_exec(CPUState *cpu)
>>>                  }
>>>                  tb_lock();
>>>                  tb = tb_find_fast(cpu);
>>> -                /* Note: we do it here to avoid a gcc bug on Mac OS X when
>>> -                   doing it in tb_find_slow */
>> Is this still true? Would it make more sense to push the patching down
>> to the gen_code?
>
> This comment comes up to the commit:
>
>     commit 1538800276aa7228d74f9d00bf275f54dc9e9b43
>     Author: bellard <bellard@c046a42c-6fe2-441c-8c8c-71466251a162>
>     Date:   Mon Dec 19 01:42:32 2005 +0000
>
>         workaround for gcc bug on PowerPC
>
>
> It was added more than ten years ago. Anyway, now this code is here not
> because of the bug: we need to reset 'next_tb' which is a local variable
> in cpu_exec(). Personally, I don't think it would be neater to hide it
> into gen_code(). Do you have some thoughts on how we could benefit from
> doing so? BTW, I had a feeling that it may be useful to reorganize
> cpu_exec() a bit, although I don't have a solid idea of how to do this
> so far.

I'm mainly eyeing the tb_lock/unlock which would be nice to push further
down the call chain if we can, especially if the need to lock
tb_find_fast can be removed later on.

>>
>> I got slightly confused as to what next_tb ends up meaning at what point
>> in the run loop.
>
> Yes, it seems to be a misleading name for this variable. As it was
> discussed on IRC, I'd like to break it into two variables, say 'last_tb'
> and 'tb_exit_idx', as soon as cpu_tb_exec() returns. Probably this
> series could also include such a patch.

Yes this would be a worthwhile separate patch.

>
> Kind regards,
> Sergey


--
Alex Bennée

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/4] tcg: rework tb_invalidated_flag
  2016-04-18 17:17       ` Alex Bennée
@ 2016-04-18 17:51         ` Sergey Fedorov
  2016-04-21 14:35           ` Sergey Fedorov
  0 siblings, 1 reply; 16+ messages in thread
From: Sergey Fedorov @ 2016-04-18 17:51 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Sergey Fedorov, qemu-devel, Paolo Bonzini, Peter Crosthwaite,
	Richard Henderson, Andreas Färber

On 18/04/16 20:17, Alex Bennée wrote:
> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>
>> On 18/04/16 17:09, Alex Bennée wrote:
>>> Sergey Fedorov <sergey.fedorov@linaro.org> writes:
>>>> diff --git a/cpu-exec.c b/cpu-exec.c
>> (snip)
>>>> @@ -507,14 +510,12 @@ int cpu_exec(CPUState *cpu)
>>>>                  }
>>>>                  tb_lock();
>>>>                  tb = tb_find_fast(cpu);
>>>> -                /* Note: we do it here to avoid a gcc bug on Mac OS X when
>>>> -                   doing it in tb_find_slow */
>>> Is this still true? Would it make more sense to push the patching down
>>> to the gen_code?
>> This comment comes up to the commit:
>>
>>     commit 1538800276aa7228d74f9d00bf275f54dc9e9b43
>>     Author: bellard <bellard@c046a42c-6fe2-441c-8c8c-71466251a162>
>>     Date:   Mon Dec 19 01:42:32 2005 +0000
>>
>>         workaround for gcc bug on PowerPC
>>
>>
>> It was added more than ten years ago. Anyway, now this code is here not
>> because of the bug: we need to reset 'next_tb' which is a local variable
>> in cpu_exec(). Personally, I don't think it would be neater to hide it
>> into gen_code(). Do you have some thoughts on how we could benefit from
>> doing so? BTW, I had a feeling that it may be useful to reorganize
>> cpu_exec() a bit, although I don't have a solid idea of how to do this
>> so far.
> I'm mainly eyeing the tb_lock/unlock which would be nice to push further
> down the call chain if we can, especially if the need to lock
> tb_find_fast can be removed later on.

Yes, it would be nice to possibly have all tb_lock/unlock() calls (or at
least their pairs) in the same block. There is a lot to be thought over :)

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/4] tcg: rework tb_invalidated_flag
  2016-04-18 17:51         ` Sergey Fedorov
@ 2016-04-21 14:35           ` Sergey Fedorov
  2016-04-21 15:55             ` Alex Bennée
  0 siblings, 1 reply; 16+ messages in thread
From: Sergey Fedorov @ 2016-04-21 14:35 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Sergey Fedorov, qemu-devel, Paolo Bonzini, Peter Crosthwaite,
	Richard Henderson, Andreas Färber

On 18/04/16 20:51, Sergey Fedorov wrote:
> On 18/04/16 20:17, Alex Bennée wrote:
>> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>>> On 18/04/16 17:09, Alex Bennée wrote:
>>>> Sergey Fedorov <sergey.fedorov@linaro.org> writes:
>>>>> diff --git a/cpu-exec.c b/cpu-exec.c
>>> (snip)
>>>>> @@ -507,14 +510,12 @@ int cpu_exec(CPUState *cpu)
>>>>>                  }
>>>>>                  tb_lock(); 
>>>>>                  tb = tb_find_fast(cpu);
>>>>> -                /* Note: we do it here to avoid a gcc bug on Mac OS X when
>>>>> -                   doing it in tb_find_slow */
>>>> Is this still true? Would it make more sense to push the patching down
>>>> to the gen_code?
>>> This comment comes up to the commit:
>>>
>>>     commit 1538800276aa7228d74f9d00bf275f54dc9e9b43
>>>     Author: bellard <bellard@c046a42c-6fe2-441c-8c8c-71466251a162>
>>>     Date:   Mon Dec 19 01:42:32 2005 +0000
>>>
>>>         workaround for gcc bug on PowerPC
>>>
>>>
>>> It was added more than ten years ago. Anyway, now this code is here not
>>> because of the bug: we need to reset 'next_tb' which is a local variable
>>> in cpu_exec(). Personally, I don't think it would be neater to hide it
>>> into gen_code(). Do you have some thoughts on how we could benefit from
>>> doing so? BTW, I had a feeling that it may be useful to reorganize
>>> cpu_exec() a bit, although I don't have a solid idea of how to do this
>>> so far.
>> I'm mainly eyeing the tb_lock/unlock which would be nice to push further
>> down the call chain if we can, especially if the need to lock
>> tb_find_fast can be removed later on.
> Yes, it would be nice to possibly have all tb_lock/unlock() calls (or at
> least their pairs) in the same block. There is a lot to be thought over :)

It's not so simple because tb_find_fast() is also called in replay mode
to find a TB for cpu_exec_nocache()... I'm not sure it's worth touching
it now. Although it may be possible to improve the code structure of
cpu_exec() in some other way. (It's really scary, indeed.) Actually,
I've been keeping that in mind for some time. Do you think if MTTCG
would benefit from some cpu_exec() refactoring to make it more clear and
easy to understand?

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/4] tcg: rework tb_invalidated_flag
  2016-04-21 14:35           ` Sergey Fedorov
@ 2016-04-21 15:55             ` Alex Bennée
  2016-04-21 16:16               ` Sergey Fedorov
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Bennée @ 2016-04-21 15:55 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: Sergey Fedorov, qemu-devel, Paolo Bonzini, Peter Crosthwaite,
	Richard Henderson, Andreas Färber


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 18/04/16 20:51, Sergey Fedorov wrote:
>> On 18/04/16 20:17, Alex Bennée wrote:
>>> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>>>> On 18/04/16 17:09, Alex Bennée wrote:
>>>>> Sergey Fedorov <sergey.fedorov@linaro.org> writes:
>>>>>> diff --git a/cpu-exec.c b/cpu-exec.c
>>>> (snip)
>>>>>> @@ -507,14 +510,12 @@ int cpu_exec(CPUState *cpu)
>>>>>>                  }
>>>>>>                  tb_lock();
>>>>>>                  tb = tb_find_fast(cpu);
>>>>>> -                /* Note: we do it here to avoid a gcc bug on Mac OS X when
>>>>>> -                   doing it in tb_find_slow */
>>>>> Is this still true? Would it make more sense to push the patching down
>>>>> to the gen_code?
>>>> This comment comes up to the commit:
>>>>
>>>>     commit 1538800276aa7228d74f9d00bf275f54dc9e9b43
>>>>     Author: bellard <bellard@c046a42c-6fe2-441c-8c8c-71466251a162>
>>>>     Date:   Mon Dec 19 01:42:32 2005 +0000
>>>>
>>>>         workaround for gcc bug on PowerPC
>>>>
>>>>
>>>> It was added more than ten years ago. Anyway, now this code is here not
>>>> because of the bug: we need to reset 'next_tb' which is a local variable
>>>> in cpu_exec(). Personally, I don't think it would be neater to hide it
>>>> into gen_code(). Do you have some thoughts on how we could benefit from
>>>> doing so? BTW, I had a feeling that it may be useful to reorganize
>>>> cpu_exec() a bit, although I don't have a solid idea of how to do this
>>>> so far.
>>> I'm mainly eyeing the tb_lock/unlock which would be nice to push further
>>> down the call chain if we can, especially if the need to lock
>>> tb_find_fast can be removed later on.
>> Yes, it would be nice to possibly have all tb_lock/unlock() calls (or at
>> least their pairs) in the same block. There is a lot to be thought over :)
>
> It's not so simple because tb_find_fast() is also called in replay mode
> to find a TB for cpu_exec_nocache()... I'm not sure it's worth touching
> it now.

If the locking is pushed into tb_find_fast or further down is this an
issue?

> Although it may be possible to improve the code structure of
> cpu_exec() in some other way. (It's really scary, indeed.) Actually,
> I've been keeping that in mind for some time. Do you think if MTTCG
> would benefit from some cpu_exec() refactoring to make it more clear and
> easy to understand?

The cpu-exec stuff certainly suffers from a slight "there be dragons"
quality thanks to the longjmp stuff and quite detailed handling of each
IRQ case in the main loop. It doesn't help there is still some special
x86 #ifdef sauce in there.

It would be nice if we could get to the main out-loop being a page where
the flow is easy to follow.

We handle exceptions
We handle IRQs (be clear about the difference, I think exceptions are internal?)
We loop from TB to TB generating as we go
 - when we exit the loop
  - a longjmp
  - an exit_request
  - an expiration of icount

Having said that apart from tb_lock() considerations there isn't much
messing about MTTCG does around this code. All the important stuff has
been done dealing with the exit conditions from cpu_exec.

I think this puts re-factoring on the nice-to-have list rather than
essential for MTTCG.

>
> Kind regards,
> Sergey


--
Alex Bennée

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/4] tcg: rework tb_invalidated_flag
  2016-04-21 15:55             ` Alex Bennée
@ 2016-04-21 16:16               ` Sergey Fedorov
  2016-04-21 17:18                 ` Sergey Fedorov
  0 siblings, 1 reply; 16+ messages in thread
From: Sergey Fedorov @ 2016-04-21 16:16 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Sergey Fedorov, qemu-devel, Paolo Bonzini, Peter Crosthwaite,
	Richard Henderson, Andreas Färber

On 21/04/16 18:55, Alex Bennée wrote:
> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>
>> On 18/04/16 20:51, Sergey Fedorov wrote:
>>> On 18/04/16 20:17, Alex Bennée wrote:
>>>> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>>>>> On 18/04/16 17:09, Alex Bennée wrote:
>>>>>> Sergey Fedorov <sergey.fedorov@linaro.org> writes:
>>>>>>> diff --git a/cpu-exec.c b/cpu-exec.c
>>>>> (snip)
>>>>>>> @@ -507,14 +510,12 @@ int cpu_exec(CPUState *cpu)
>>>>>>>                  }
>>>>>>>                  tb_lock();
>>>>>>>                  tb = tb_find_fast(cpu);
>>>>>>> -                /* Note: we do it here to avoid a gcc bug on Mac OS X when
>>>>>>> -                   doing it in tb_find_slow */
>>>>>> Is this still true? Would it make more sense to push the patching down
>>>>>> to the gen_code?
>>>>> This comment comes up to the commit:
>>>>>
>>>>>     commit 1538800276aa7228d74f9d00bf275f54dc9e9b43
>>>>>     Author: bellard <bellard@c046a42c-6fe2-441c-8c8c-71466251a162>
>>>>>     Date:   Mon Dec 19 01:42:32 2005 +0000
>>>>>
>>>>>         workaround for gcc bug on PowerPC
>>>>>
>>>>>
>>>>> It was added more than ten years ago. Anyway, now this code is here not
>>>>> because of the bug: we need to reset 'next_tb' which is a local variable
>>>>> in cpu_exec(). Personally, I don't think it would be neater to hide it
>>>>> into gen_code(). Do you have some thoughts on how we could benefit from
>>>>> doing so? BTW, I had a feeling that it may be useful to reorganize
>>>>> cpu_exec() a bit, although I don't have a solid idea of how to do this
>>>>> so far.
>>>> I'm mainly eyeing the tb_lock/unlock which would be nice to push further
>>>> down the call chain if we can, especially if the need to lock
>>>> tb_find_fast can be removed later on.
>>> Yes, it would be nice to possibly have all tb_lock/unlock() calls (or at
>>> least their pairs) in the same block. There is a lot to be thought over :)
>> It's not so simple because tb_find_fast() is also called in replay mode
>> to find a TB for cpu_exec_nocache()... I'm not sure it's worth touching
>> it now.
> If the locking is pushed into tb_find_fast or further down is this an
> issue?

We would have to pass 'next_tb' (or 'last_tb' and 'tb_exit' after
cleaning it up) if we move TB chaining code to tb_find_fast(). But
tb_find_fast() is also called in replay mode to find a TB for
cpu_exec_nocache() where we don't bother with TB chaining... Do you
think it would be fine to make those changes?

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/4] tcg: rework tb_invalidated_flag
  2016-04-21 16:16               ` Sergey Fedorov
@ 2016-04-21 17:18                 ` Sergey Fedorov
  2016-04-21 21:54                   ` Alex Bennée
  0 siblings, 1 reply; 16+ messages in thread
From: Sergey Fedorov @ 2016-04-21 17:18 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Sergey Fedorov, qemu-devel, Paolo Bonzini, Peter Crosthwaite,
	Richard Henderson, Andreas Färber

On 21/04/16 19:16, Sergey Fedorov wrote:
> On 21/04/16 18:55, Alex Bennée wrote:
>> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>>
>>> On 18/04/16 20:51, Sergey Fedorov wrote:
>>>> On 18/04/16 20:17, Alex Bennée wrote:
>>>>> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>>>>>> On 18/04/16 17:09, Alex Bennée wrote:
>>>>>>> Sergey Fedorov <sergey.fedorov@linaro.org> writes:
>>>>>>>> diff --git a/cpu-exec.c b/cpu-exec.c
>>>>>> (snip)
>>>>>>>> @@ -507,14 +510,12 @@ int cpu_exec(CPUState *cpu)
>>>>>>>>                  }
>>>>>>>>                  tb_lock();
>>>>>>>>                  tb = tb_find_fast(cpu);
>>>>>>>> -                /* Note: we do it here to avoid a gcc bug on Mac OS X when
>>>>>>>> -                   doing it in tb_find_slow */
>>>>>>> Is this still true? Would it make more sense to push the patching down
>>>>>>> to the gen_code?
>>>>>> This comment comes up to the commit:
>>>>>>
>>>>>>     commit 1538800276aa7228d74f9d00bf275f54dc9e9b43
>>>>>>     Author: bellard <bellard@c046a42c-6fe2-441c-8c8c-71466251a162>
>>>>>>     Date:   Mon Dec 19 01:42:32 2005 +0000
>>>>>>
>>>>>>         workaround for gcc bug on PowerPC
>>>>>>
>>>>>>
>>>>>> It was added more than ten years ago. Anyway, now this code is here not
>>>>>> because of the bug: we need to reset 'next_tb' which is a local variable
>>>>>> in cpu_exec(). Personally, I don't think it would be neater to hide it
>>>>>> into gen_code(). Do you have some thoughts on how we could benefit from
>>>>>> doing so? BTW, I had a feeling that it may be useful to reorganize
>>>>>> cpu_exec() a bit, although I don't have a solid idea of how to do this
>>>>>> so far.
>>>>> I'm mainly eyeing the tb_lock/unlock which would be nice to push further
>>>>> down the call chain if we can, especially if the need to lock
>>>>> tb_find_fast can be removed later on.
>>>> Yes, it would be nice to possibly have all tb_lock/unlock() calls (or at
>>>> least their pairs) in the same block. There is a lot to be thought over :)
>>> It's not so simple because tb_find_fast() is also called in replay mode
>>> to find a TB for cpu_exec_nocache()... I'm not sure it's worth touching
>>> it now.
>> If the locking is pushed into tb_find_fast or further down is this an
>> issue?
> We would have to pass 'next_tb' (or 'last_tb' and 'tb_exit' after
> cleaning it up) if we move TB chaining code to tb_find_fast(). But
> tb_find_fast() is also called in replay mode to find a TB for
> cpu_exec_nocache() where we don't bother with TB chaining... Do you
> think it would be fine to make those changes?

Are you thinking about something like this:

diff --git a/cpu-exec.c b/cpu-exec.c
index 1d12e8bc2739..07e9ede49193 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -320,7 +320,9 @@ found:
     return tb;
 }
 
-static inline TranslationBlock *tb_find_fast(CPUState *cpu)
+static inline TranslationBlock *tb_find_fast(CPUState *cpu,
+                                             TranslationBlock **last_tb,
+                                             int tb_exit)
 {
     CPUArchState *env = (CPUArchState *)cpu->env_ptr;
     TranslationBlock *tb;
@@ -331,11 +333,27 @@ static inline TranslationBlock
*tb_find_fast(CPUState *cpu)
        always be the same before a given translated block
        is executed. */
     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
+    tb_lock();
     tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
     if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
                  tb->flags != flags)) {
         tb = tb_find_slow(cpu, pc, cs_base, flags);
     }
+    if (cpu->tb_flushed) {
+        /* Ensure that no TB jump will be modified as the
+         * translation buffer has been flushed.
+         */
+        *last_tb = NULL;
+        cpu->tb_flushed = false;
+    }
+    /* see if we can patch the calling TB. When the TB
+       spans two pages, we cannot safely do a direct
+       jump. */
+    if (*last_tb != NULL && tb->page_addr[1] == -1
+            && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
+        tb_add_jump(*last_tb, tb_exit, tb);
+    }
+    tb_unlock();
     return tb;
 }
 
@@ -441,7 +459,8 @@ int cpu_exec(CPUState *cpu)
             } else if (replay_has_exception()
                        && cpu->icount_decr.u16.low + cpu->icount_extra
== 0) {
                 /* try to cause an exception pending in the log */
-                cpu_exec_nocache(cpu, 1, tb_find_fast(cpu), true);
+                last_tb = NULL; /* Avoid chaining TBs */
+                cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, &last_tb,
0), true);
                 ret = -1;
                 break;
 #endif
@@ -511,23 +530,7 @@ int cpu_exec(CPUState *cpu)
                     cpu->exception_index = EXCP_INTERRUPT;
                     cpu_loop_exit(cpu);
                 }
-                tb_lock();
-                tb = tb_find_fast(cpu);
-                if (cpu->tb_flushed) {
-                    /* Ensure that no TB jump will be modified as the
-                     * translation buffer has been flushed.
-                     */
-                    last_tb = NULL;
-                    cpu->tb_flushed = false;
-                }
-                /* see if we can patch the calling TB. When the TB
-                   spans two pages, we cannot safely do a direct
-                   jump. */
-                if (last_tb != NULL && tb->page_addr[1] == -1
-                    && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
-                    tb_add_jump(last_tb, tb_exit, tb);
-                }
-                tb_unlock();
+                tb = tb_find_fast(cpu, &last_tb, tb_exit);
                 if (likely(!cpu->exit_request)) {
                     uintptr_t ret;
                     trace_exec_tb(tb, tb->pc);

... right?

Kind regards,
Sergey

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/4] tcg: rework tb_invalidated_flag
  2016-04-21 17:18                 ` Sergey Fedorov
@ 2016-04-21 21:54                   ` Alex Bennée
  2016-04-22  9:49                     ` Sergey Fedorov
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Bennée @ 2016-04-21 21:54 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: Sergey Fedorov, qemu-devel, Paolo Bonzini, Peter Crosthwaite,
	Richard Henderson, Andreas Färber


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 21/04/16 19:16, Sergey Fedorov wrote:
>> On 21/04/16 18:55, Alex Bennée wrote:
>>> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>>>
>>>> On 18/04/16 20:51, Sergey Fedorov wrote:
>>>>> On 18/04/16 20:17, Alex Bennée wrote:
>>>>>> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>>>>>>> On 18/04/16 17:09, Alex Bennée wrote:
>>>>>>>> Sergey Fedorov <sergey.fedorov@linaro.org> writes:
>>>>>>>>> diff --git a/cpu-exec.c b/cpu-exec.c
>>>>>>> (snip)
>>>>>>>>> @@ -507,14 +510,12 @@ int cpu_exec(CPUState *cpu)
>>>>>>>>>                  }
>>>>>>>>>                  tb_lock();
>>>>>>>>>                  tb = tb_find_fast(cpu);
>>>>>>>>> -                /* Note: we do it here to avoid a gcc bug on Mac OS X when
>>>>>>>>> -                   doing it in tb_find_slow */
>>>>>>>> Is this still true? Would it make more sense to push the patching down
>>>>>>>> to the gen_code?
>>>>>>> This comment comes up to the commit:
>>>>>>>
>>>>>>>     commit 1538800276aa7228d74f9d00bf275f54dc9e9b43
>>>>>>>     Author: bellard <bellard@c046a42c-6fe2-441c-8c8c-71466251a162>
>>>>>>>     Date:   Mon Dec 19 01:42:32 2005 +0000
>>>>>>>
>>>>>>>         workaround for gcc bug on PowerPC
>>>>>>>
>>>>>>>
>>>>>>> It was added more than ten years ago. Anyway, now this code is here not
>>>>>>> because of the bug: we need to reset 'next_tb' which is a local variable
>>>>>>> in cpu_exec(). Personally, I don't think it would be neater to hide it
>>>>>>> into gen_code(). Do you have some thoughts on how we could benefit from
>>>>>>> doing so? BTW, I had a feeling that it may be useful to reorganize
>>>>>>> cpu_exec() a bit, although I don't have a solid idea of how to do this
>>>>>>> so far.
>>>>>> I'm mainly eyeing the tb_lock/unlock which would be nice to push further
>>>>>> down the call chain if we can, especially if the need to lock
>>>>>> tb_find_fast can be removed later on.
>>>>> Yes, it would be nice to possibly have all tb_lock/unlock() calls (or at
>>>>> least their pairs) in the same block. There is a lot to be thought over :)
>>>> It's not so simple because tb_find_fast() is also called in replay mode
>>>> to find a TB for cpu_exec_nocache()... I'm not sure it's worth touching
>>>> it now.
>>> If the locking is pushed into tb_find_fast or further down is this an
>>> issue?
>> We would have to pass 'next_tb' (or 'last_tb' and 'tb_exit' after
>> cleaning it up) if we move TB chaining code to tb_find_fast(). But
>> tb_find_fast() is also called in replay mode to find a TB for
>> cpu_exec_nocache() where we don't bother with TB chaining... Do you
>> think it would be fine to make those changes?
>
> Are you thinking about something like this:
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 1d12e8bc2739..07e9ede49193 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -320,7 +320,9 @@ found:
>      return tb;
>  }
>
> -static inline TranslationBlock *tb_find_fast(CPUState *cpu)
> +static inline TranslationBlock *tb_find_fast(CPUState *cpu,
> +                                             TranslationBlock **last_tb,
> +                                             int tb_exit)
>  {
>      CPUArchState *env = (CPUArchState *)cpu->env_ptr;
>      TranslationBlock *tb;
> @@ -331,11 +333,27 @@ static inline TranslationBlock
> *tb_find_fast(CPUState *cpu)
>         always be the same before a given translated block
>         is executed. */
>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> +    tb_lock();
>      tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
>      if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>                   tb->flags != flags)) {
>          tb = tb_find_slow(cpu, pc, cs_base, flags);
>      }
> +    if (cpu->tb_flushed) {
> +        /* Ensure that no TB jump will be modified as the
> +         * translation buffer has been flushed.
> +         */
> +        *last_tb = NULL;
> +        cpu->tb_flushed = false;
> +    }
> +    /* see if we can patch the calling TB. When the TB
> +       spans two pages, we cannot safely do a direct
> +       jump. */
> +    if (*last_tb != NULL && tb->page_addr[1] == -1
> +            && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
> +        tb_add_jump(*last_tb, tb_exit, tb);
> +    }
> +    tb_unlock();
>      return tb;
>  }
>
> @@ -441,7 +459,8 @@ int cpu_exec(CPUState *cpu)
>              } else if (replay_has_exception()
>                         && cpu->icount_decr.u16.low + cpu->icount_extra
> == 0) {
>                  /* try to cause an exception pending in the log */
> -                cpu_exec_nocache(cpu, 1, tb_find_fast(cpu), true);
> +                last_tb = NULL; /* Avoid chaining TBs */
> +                cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, &last_tb,
> 0), true);
>                  ret = -1;
>                  break;
>  #endif
> @@ -511,23 +530,7 @@ int cpu_exec(CPUState *cpu)
>                      cpu->exception_index = EXCP_INTERRUPT;
>                      cpu_loop_exit(cpu);
>                  }
> -                tb_lock();
> -                tb = tb_find_fast(cpu);
> -                if (cpu->tb_flushed) {
> -                    /* Ensure that no TB jump will be modified as the
> -                     * translation buffer has been flushed.
> -                     */
> -                    last_tb = NULL;
> -                    cpu->tb_flushed = false;
> -                }
> -                /* see if we can patch the calling TB. When the TB
> -                   spans two pages, we cannot safely do a direct
> -                   jump. */
> -                if (last_tb != NULL && tb->page_addr[1] == -1
> -                    && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
> -                    tb_add_jump(last_tb, tb_exit, tb);
> -                }
> -                tb_unlock();
> +                tb = tb_find_fast(cpu, &last_tb, tb_exit);
>                  if (likely(!cpu->exit_request)) {
>                      uintptr_t ret;
>                      trace_exec_tb(tb, tb->pc);
>
> ... right?

Yeah that sort of thing.

>
> Kind regards,
> Sergey


--
Alex Bennée

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/4] tcg: rework tb_invalidated_flag
  2016-04-21 21:54                   ` Alex Bennée
@ 2016-04-22  9:49                     ` Sergey Fedorov
  0 siblings, 0 replies; 16+ messages in thread
From: Sergey Fedorov @ 2016-04-22  9:49 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Sergey Fedorov, qemu-devel, Paolo Bonzini, Peter Crosthwaite,
	Richard Henderson, Andreas Färber

On 22/04/16 00:54, Alex Bennée wrote:
> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>
>> On 21/04/16 19:16, Sergey Fedorov wrote:
>>> On 21/04/16 18:55, Alex Bennée wrote:
>>>> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>>>>
>>>>> On 18/04/16 20:51, Sergey Fedorov wrote:
>>>>>> On 18/04/16 20:17, Alex Bennée wrote:
>>>>>>> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>>>>>>>> On 18/04/16 17:09, Alex Bennée wrote:
>>>>>>>>> Sergey Fedorov <sergey.fedorov@linaro.org> writes:
>>>>>>>>>> diff --git a/cpu-exec.c b/cpu-exec.c
>>>>>>>> (snip)
>>>>>>>>>> @@ -507,14 +510,12 @@ int cpu_exec(CPUState *cpu)
>>>>>>>>>>                  }
>>>>>>>>>>                  tb_lock();
>>>>>>>>>>                  tb = tb_find_fast(cpu);
>>>>>>>>>> -                /* Note: we do it here to avoid a gcc bug on Mac OS X when
>>>>>>>>>> -                   doing it in tb_find_slow */
>>>>>>>>> Is this still true? Would it make more sense to push the patching down
>>>>>>>>> to the gen_code?
>>>>>>>> This comment comes up to the commit:
>>>>>>>>
>>>>>>>>     commit 1538800276aa7228d74f9d00bf275f54dc9e9b43
>>>>>>>>     Author: bellard <bellard@c046a42c-6fe2-441c-8c8c-71466251a162>
>>>>>>>>     Date:   Mon Dec 19 01:42:32 2005 +0000
>>>>>>>>
>>>>>>>>         workaround for gcc bug on PowerPC
>>>>>>>>
>>>>>>>>
>>>>>>>> It was added more than ten years ago. Anyway, now this code is here not
>>>>>>>> because of the bug: we need to reset 'next_tb' which is a local variable
>>>>>>>> in cpu_exec(). Personally, I don't think it would be neater to hide it
>>>>>>>> into gen_code(). Do you have some thoughts on how we could benefit from
>>>>>>>> doing so? BTW, I had a feeling that it may be useful to reorganize
>>>>>>>> cpu_exec() a bit, although I don't have a solid idea of how to do this
>>>>>>>> so far.
>>>>>>> I'm mainly eyeing the tb_lock/unlock which would be nice to push further
>>>>>>> down the call chain if we can, especially if the need to lock
>>>>>>> tb_find_fast can be removed later on.
>>>>>> Yes, it would be nice to possibly have all tb_lock/unlock() calls (or at
>>>>>> least their pairs) in the same block. There is a lot to be thought over :)
>>>>> It's not so simple because tb_find_fast() is also called in replay mode
>>>>> to find a TB for cpu_exec_nocache()... I'm not sure it's worth touching
>>>>> it now.
>>>> If the locking is pushed into tb_find_fast or further down is this an
>>>> issue?
>>> We would have to pass 'next_tb' (or 'last_tb' and 'tb_exit' after
>>> cleaning it up) if we move TB chaining code to tb_find_fast(). But
>>> tb_find_fast() is also called in replay mode to find a TB for
>>> cpu_exec_nocache() where we don't bother with TB chaining... Do you
>>> think it would be fine to make those changes?
>> Are you thinking about something like this:
>>
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index 1d12e8bc2739..07e9ede49193 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -320,7 +320,9 @@ found:
>>      return tb;
>>  }
>>
>> -static inline TranslationBlock *tb_find_fast(CPUState *cpu)
>> +static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>> +                                             TranslationBlock **last_tb,
>> +                                             int tb_exit)
>>  {
>>      CPUArchState *env = (CPUArchState *)cpu->env_ptr;
>>      TranslationBlock *tb;
>> @@ -331,11 +333,27 @@ static inline TranslationBlock
>> *tb_find_fast(CPUState *cpu)
>>         always be the same before a given translated block
>>         is executed. */
>>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>> +    tb_lock();
>>      tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
>>      if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>>                   tb->flags != flags)) {
>>          tb = tb_find_slow(cpu, pc, cs_base, flags);
>>      }
>> +    if (cpu->tb_flushed) {
>> +        /* Ensure that no TB jump will be modified as the
>> +         * translation buffer has been flushed.
>> +         */
>> +        *last_tb = NULL;
>> +        cpu->tb_flushed = false;
>> +    }
>> +    /* see if we can patch the calling TB. When the TB
>> +       spans two pages, we cannot safely do a direct
>> +       jump. */
>> +    if (*last_tb != NULL && tb->page_addr[1] == -1
>> +            && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
>> +        tb_add_jump(*last_tb, tb_exit, tb);
>> +    }
>> +    tb_unlock();
>>      return tb;
>>  }
>>
>> @@ -441,7 +459,8 @@ int cpu_exec(CPUState *cpu)
>>              } else if (replay_has_exception()
>>                         && cpu->icount_decr.u16.low + cpu->icount_extra
>> == 0) {
>>                  /* try to cause an exception pending in the log */
>> -                cpu_exec_nocache(cpu, 1, tb_find_fast(cpu), true);
>> +                last_tb = NULL; /* Avoid chaining TBs */
>> +                cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, &last_tb,
>> 0), true);
>>                  ret = -1;
>>                  break;
>>  #endif
>> @@ -511,23 +530,7 @@ int cpu_exec(CPUState *cpu)
>>                      cpu->exception_index = EXCP_INTERRUPT;
>>                      cpu_loop_exit(cpu);
>>                  }
>> -                tb_lock();
>> -                tb = tb_find_fast(cpu);
>> -                if (cpu->tb_flushed) {
>> -                    /* Ensure that no TB jump will be modified as the
>> -                     * translation buffer has been flushed.
>> -                     */
>> -                    last_tb = NULL;
>> -                    cpu->tb_flushed = false;
>> -                }
>> -                /* see if we can patch the calling TB. When the TB
>> -                   spans two pages, we cannot safely do a direct
>> -                   jump. */
>> -                if (last_tb != NULL && tb->page_addr[1] == -1
>> -                    && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
>> -                    tb_add_jump(last_tb, tb_exit, tb);
>> -                }
>> -                tb_unlock();
>> +                tb = tb_find_fast(cpu, &last_tb, tb_exit);
>>                  if (likely(!cpu->exit_request)) {
>>                      uintptr_t ret;
>>                      trace_exec_tb(tb, tb->pc);
>>
>> ... right?
> Yeah that sort of thing.

Okay, I'll include this in the next respin.

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-04-22  9:50 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-14 20:45 [Qemu-devel] [PATCH v3 0/4] tcg: Misc clean-up patches Sergey Fedorov
2016-04-14 20:45 ` [Qemu-devel] [PATCH v3 1/4] tcg: code_bitmap is not used by user-mode emulation Sergey Fedorov
2016-04-14 20:45 ` [Qemu-devel] [PATCH v3 2/4] tcg: reorganize tb_find_physical loop Sergey Fedorov
2016-04-14 20:45 ` [Qemu-devel] [PATCH v3 3/4] cpu-exec: elide more icount code if CONFIG_USER_ONLY Sergey Fedorov
2016-04-14 20:45 ` [Qemu-devel] [PATCH v3 4/4] tcg: rework tb_invalidated_flag Sergey Fedorov
2016-04-18 14:09   ` Alex Bennée
2016-04-18 15:05     ` Sergey Fedorov
2016-04-18 15:34       ` Peter Maydell
2016-04-18 17:17       ` Alex Bennée
2016-04-18 17:51         ` Sergey Fedorov
2016-04-21 14:35           ` Sergey Fedorov
2016-04-21 15:55             ` Alex Bennée
2016-04-21 16:16               ` Sergey Fedorov
2016-04-21 17:18                 ` Sergey Fedorov
2016-04-21 21:54                   ` Alex Bennée
2016-04-22  9:49                     ` Sergey Fedorov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.