* [Qemu-devel] [PATCH 01/10] util/qht: Document memory ordering assumptions
2016-07-19 8:32 [Qemu-devel] [PATCH v5 00/10] Reduce lock contention on TCG hot-path Paolo Bonzini
@ 2016-07-19 8:32 ` Paolo Bonzini
2016-07-19 8:32 ` [Qemu-devel] [PATCH 02/10] tcg: Pass last_tb by value to tb_find_fast() Paolo Bonzini
` (8 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19 8:32 UTC (permalink / raw)
To: qemu-devel; +Cc: serge.fdrv, sergey.fedorov, alex.bennee
It is naturally expected that some memory ordering should be provided
around qht_insert() and qht_lookup(). Document these assumptions in the
header file and add comments in the source to explain how those
memory ordering requirements are fulfilled.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Sergey Fedorov: commit title and message provided;
comment on qht_remove() elided]
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Message-Id: <20160715175852.30749-2-sergey.fedorov@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
include/qemu/qht.h | 5 +++++
util/qht.c | 7 ++++++-
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/include/qemu/qht.h b/include/qemu/qht.h
index 70bfc68..311139b 100644
--- a/include/qemu/qht.h
+++ b/include/qemu/qht.h
@@ -69,6 +69,9 @@ void qht_destroy(struct qht *ht);
* Attempting to insert a NULL @p is a bug.
* Inserting the same pointer @p with different @hash values is a bug.
*
+ * In case of successful operation, smp_wmb() is implied before the pointer is
+ * inserted into the hash table.
+ *
* Returns true on success.
* Returns false if the @p-@hash pair already exists in the hash table.
*/
@@ -83,6 +86,8 @@ bool qht_insert(struct qht *ht, void *p, uint32_t hash);
*
* Needs to be called under an RCU read-critical section.
*
+ * smp_read_barrier_depends() is implied before the call to @func.
+ *
* The user-provided @func compares pointers in QHT against @userp.
* If the function returns true, a match has been found.
*
diff --git a/util/qht.c b/util/qht.c
index 40d6e21..28ce289 100644
--- a/util/qht.c
+++ b/util/qht.c
@@ -445,7 +445,11 @@ void *qht_do_lookup(struct qht_bucket *head, qht_lookup_func_t func,
do {
for (i = 0; i < QHT_BUCKET_ENTRIES; i++) {
if (b->hashes[i] == hash) {
- void *p = atomic_read(&b->pointers[i]);
+ /* The pointer is dereferenced before seqlock_read_retry,
+ * so (unlike qht_insert__locked) we need to use
+ * atomic_rcu_read here.
+ */
+ void *p = atomic_rcu_read(&b->pointers[i]);
if (likely(p) && likely(func(p, userp))) {
return p;
@@ -535,6 +539,7 @@ static bool qht_insert__locked(struct qht *ht, struct qht_map *map,
atomic_rcu_set(&prev->next, b);
}
b->hashes[i] = hash;
+ /* smp_wmb() implicit in seqlock_write_begin. */
atomic_set(&b->pointers[i], p);
seqlock_write_end(&head->sequence);
return true;
--
2.7.4
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [Qemu-devel] [PATCH 02/10] tcg: Pass last_tb by value to tb_find_fast()
2016-07-19 8:32 [Qemu-devel] [PATCH v5 00/10] Reduce lock contention on TCG hot-path Paolo Bonzini
2016-07-19 8:32 ` [Qemu-devel] [PATCH 01/10] util/qht: Document memory ordering assumptions Paolo Bonzini
@ 2016-07-19 8:32 ` Paolo Bonzini
2016-09-08 12:44 ` Alex Bennée
2016-07-19 8:32 ` [Qemu-devel] [PATCH 03/10] tcg: Prepare safe tb_jmp_cache lookup out of tb_lock Paolo Bonzini
` (7 subsequent siblings)
9 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19 8:32 UTC (permalink / raw)
To: qemu-devel; +Cc: serge.fdrv, sergey.fedorov, alex.bennee
From: Sergey Fedorov <serge.fdrv@gmail.com>
This is a small clean-up. tb_find_fast() is the final consumer of this
variable, so there is no need to pass it by reference. 'last_tb' is always
updated by the subsequent cpu_loop_exec_tb() in cpu_exec().
This change also simplifies calling cpu_exec_nocache() in
cpu_handle_exception().
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <20160715175852.30749-3-sergey.fedorov@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
cpu-exec.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/cpu-exec.c b/cpu-exec.c
index 5d9710a..cf511f1 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -320,7 +320,7 @@ found:
}
static inline TranslationBlock *tb_find_fast(CPUState *cpu,
- TranslationBlock **last_tb,
+ TranslationBlock *last_tb,
int tb_exit)
{
CPUArchState *env = (CPUArchState *)cpu->env_ptr;
@@ -342,7 +342,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
/* Ensure that no TB jump will be modified as the
* translation buffer has been flushed.
*/
- *last_tb = NULL;
+ last_tb = NULL;
cpu->tb_flushed = false;
}
#ifndef CONFIG_USER_ONLY
@@ -351,12 +351,12 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
* spanning two pages because the mapping for the second page can change.
*/
if (tb->page_addr[1] != -1) {
- *last_tb = NULL;
+ last_tb = NULL;
}
#endif
/* See if we can patch the calling TB. */
- if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
- tb_add_jump(*last_tb, tb_exit, tb);
+ if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
+ tb_add_jump(last_tb, tb_exit, tb);
}
tb_unlock();
return tb;
@@ -437,8 +437,7 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
} else if (replay_has_exception()
&& cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
/* try to cause an exception pending in the log */
- TranslationBlock *last_tb = NULL; /* Avoid chaining TBs */
- cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, &last_tb, 0), true);
+ cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, NULL, 0), true);
*ret = -1;
return true;
#endif
@@ -621,7 +620,7 @@ int cpu_exec(CPUState *cpu)
cpu->tb_flushed = false; /* reset before first TB lookup */
for(;;) {
cpu_handle_interrupt(cpu, &last_tb);
- tb = tb_find_fast(cpu, &last_tb, tb_exit);
+ tb = tb_find_fast(cpu, last_tb, tb_exit);
cpu_loop_exec_tb(cpu, tb, &last_tb, &tb_exit, &sc);
/* Try to align the host and virtual clocks
if the guest is in advance */
--
2.7.4
* Re: [Qemu-devel] [PATCH 02/10] tcg: Pass last_tb by value to tb_find_fast()
2016-07-19 8:32 ` [Qemu-devel] [PATCH 02/10] tcg: Pass last_tb by value to tb_find_fast() Paolo Bonzini
@ 2016-09-08 12:44 ` Alex Bennée
2016-09-08 13:28 ` Paolo Bonzini
0 siblings, 1 reply; 20+ messages in thread
From: Alex Bennée @ 2016-09-08 12:44 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, serge.fdrv, sergey.fedorov
Paolo Bonzini <pbonzini@redhat.com> writes:
> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> This is a small clean up. tb_find_fast() is a final consumer of this
> variable so no need to pass it by reference. 'last_tb' is always updated
> by subsequent cpu_loop_exec_tb() in cpu_exec().
>
<snip>
> @@ -621,7 +620,7 @@ int cpu_exec(CPUState *cpu)
> cpu->tb_flushed = false; /* reset before first TB lookup */
> for(;;) {
> cpu_handle_interrupt(cpu, &last_tb);
> - tb = tb_find_fast(cpu, &last_tb, tb_exit);
> + tb = tb_find_fast(cpu, last_tb, tb_exit);
Maybe a comment here for those that missed the subtlety in the commit
message?
/* cpu_loop_exec_tb updates last_tb to a new value */
> cpu_loop_exec_tb(cpu, tb, &last_tb, &tb_exit, &sc);
You could even make it explicit and change cpu_loop_exec_tb to return
last_tb instead of passing by reference. Then it would be even clearer
when reading the code.
> /* Try to align the host and virtual clocks
> if the guest is in advance */
--
Alex Bennée
* Re: [Qemu-devel] [PATCH 02/10] tcg: Pass last_tb by value to tb_find_fast()
2016-09-08 12:44 ` Alex Bennée
@ 2016-09-08 13:28 ` Paolo Bonzini
0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-09-08 13:28 UTC (permalink / raw)
To: Alex Bennée; +Cc: qemu-devel, serge.fdrv, sergey.fedorov
On 08/09/2016 14:44, Alex Bennée wrote:
>> > cpu->tb_flushed = false; /* reset before first TB lookup */
>> > for(;;) {
>> > cpu_handle_interrupt(cpu, &last_tb);
>> > - tb = tb_find_fast(cpu, &last_tb, tb_exit);
>> > + tb = tb_find_fast(cpu, last_tb, tb_exit);
> Maybe a comment here for those that missed the subtlety in the commit
> message?
>
> /* cpu_loop_exec_tb updates last_tb to a new value */
>
>> > cpu_loop_exec_tb(cpu, tb, &last_tb, &tb_exit, &sc);
> You could even make it explicit and change cpu_loop_exec_tb to return
> last_tb instead of passing by reference. Then it would be even clearer
> when reading the code.
>
I gave it a quick shot and it's not that simple... One simpler possibility
is to take this patch one step further and merge "tb" and "last_tb", but
I've not tested it yet:
diff --git a/cpu-exec.c b/cpu-exec.c
index cf511f1..80e6ff5 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -515,11 +515,11 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
}
}
-static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
- TranslationBlock **last_tb, int *tb_exit,
- SyncClocks *sc)
+static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock **last_tb,
+ int *tb_exit, SyncClocks *sc)
{
uintptr_t ret;
+ TranslationBlock *tb = *last_tb;
if (unlikely(cpu->exit_request)) {
return;
@@ -609,7 +609,7 @@ int cpu_exec(CPUState *cpu)
for(;;) {
/* prepare setjmp context for exception handling */
if (sigsetjmp(cpu->jmp_env, 0) == 0) {
- TranslationBlock *tb, *last_tb = NULL;
+ TranslationBlock *tb = NULL;
int tb_exit = 0;
/* if an exception is pending, we execute it here */
@@ -619,9 +619,9 @@ int cpu_exec(CPUState *cpu)
cpu->tb_flushed = false; /* reset before first TB lookup */
for(;;) {
- cpu_handle_interrupt(cpu, &last_tb);
- tb = tb_find_fast(cpu, last_tb, tb_exit);
- cpu_loop_exec_tb(cpu, tb, &last_tb, &tb_exit, &sc);
+ cpu_handle_interrupt(cpu, &tb);
+ tb = tb_find_fast(cpu, tb, tb_exit);
+ cpu_loop_exec_tb(cpu, &tb, &tb_exit, &sc);
/* Try to align the host and virtual clocks
if the guest is in advance */
align_clocks(&sc, cpu);
It seems better to me to do it as a follow-up step.
Paolo
* [Qemu-devel] [PATCH 03/10] tcg: Prepare safe tb_jmp_cache lookup out of tb_lock
2016-07-19 8:32 [Qemu-devel] [PATCH v5 00/10] Reduce lock contention on TCG hot-path Paolo Bonzini
2016-07-19 8:32 ` [Qemu-devel] [PATCH 01/10] util/qht: Document memory ordering assumptions Paolo Bonzini
2016-07-19 8:32 ` [Qemu-devel] [PATCH 02/10] tcg: Pass last_tb by value to tb_find_fast() Paolo Bonzini
@ 2016-07-19 8:32 ` Paolo Bonzini
2016-07-19 8:32 ` [Qemu-devel] [PATCH 04/10] tcg: Prepare safe access to tb_flushed " Paolo Bonzini
` (6 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19 8:32 UTC (permalink / raw)
To: qemu-devel; +Cc: serge.fdrv, sergey.fedorov, alex.bennee
From: Sergey Fedorov <serge.fdrv@gmail.com>
Ensure atomicity of CPU's 'tb_jmp_cache' access for future translation
block lookup out of 'tb_lock'.
Note that this patch does *not* make CPU's TLB invalidation safe if it
is done from some other thread while the CPU is in its execution loop.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20160715175852.30749-4-sergey.fedorov@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
cpu-exec.c | 4 ++--
translate-all.c | 10 +++++++---
2 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/cpu-exec.c b/cpu-exec.c
index cf511f1..32b58ed 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -315,7 +315,7 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
found:
/* we add the TB in the virtual pc hash table */
- cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
+ atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
return tb;
}
@@ -333,7 +333,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
is executed. */
cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
tb_lock();
- tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
+ tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
tb->flags != flags)) {
tb = tb_find_slow(cpu, pc, cs_base, flags);
diff --git a/translate-all.c b/translate-all.c
index 0d47c1c..fdf520a 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -848,7 +848,11 @@ void tb_flush(CPUState *cpu)
tcg_ctx.tb_ctx.nb_tbs = 0;
CPU_FOREACH(cpu) {
- memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
+ int i;
+
+ for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
+ atomic_set(&cpu->tb_jmp_cache[i], NULL);
+ }
cpu->tb_flushed = true;
}
@@ -1007,8 +1011,8 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
/* remove the TB from the hash list */
h = tb_jmp_cache_hash_func(tb->pc);
CPU_FOREACH(cpu) {
- if (cpu->tb_jmp_cache[h] == tb) {
- cpu->tb_jmp_cache[h] = NULL;
+ if (atomic_read(&cpu->tb_jmp_cache[h]) == tb) {
+ atomic_set(&cpu->tb_jmp_cache[h], NULL);
}
}
--
2.7.4
* [Qemu-devel] [PATCH 04/10] tcg: Prepare safe access to tb_flushed out of tb_lock
2016-07-19 8:32 [Qemu-devel] [PATCH v5 00/10] Reduce lock contention on TCG hot-path Paolo Bonzini
` (2 preceding siblings ...)
2016-07-19 8:32 ` [Qemu-devel] [PATCH 03/10] tcg: Prepare safe tb_jmp_cache lookup out of tb_lock Paolo Bonzini
@ 2016-07-19 8:32 ` Paolo Bonzini
2016-07-19 8:32 ` [Qemu-devel] [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup Paolo Bonzini
` (5 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19 8:32 UTC (permalink / raw)
To: qemu-devel; +Cc: serge.fdrv, sergey.fedorov, alex.bennee
From: Sergey Fedorov <serge.fdrv@gmail.com>
Ensure atomicity and ordering of CPU's 'tb_flushed' access for future
translation block lookup out of 'tb_lock'.
This field can only be touched from another thread by tb_flush() in user
mode emulation. So the only accesses that need to be sequentially atomic are:
* a single write in tb_flush();
* reads/writes out of 'tb_lock'.
In the future, before MTTCG is enabled in system mode, tb_flush() must be
made thread-safe; at that point this field becomes unnecessary.
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20160715175852.30749-5-sergey.fedorov@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
cpu-exec.c | 16 +++++++---------
translate-all.c | 4 ++--
2 files changed, 9 insertions(+), 11 deletions(-)
diff --git a/cpu-exec.c b/cpu-exec.c
index 32b58ed..877ff8e 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -338,13 +338,6 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
tb->flags != flags)) {
tb = tb_find_slow(cpu, pc, cs_base, flags);
}
- if (cpu->tb_flushed) {
- /* Ensure that no TB jump will be modified as the
- * translation buffer has been flushed.
- */
- last_tb = NULL;
- cpu->tb_flushed = false;
- }
#ifndef CONFIG_USER_ONLY
/* We don't take care of direct jumps when address mapping changes in
* system emulation. So it's not safe to make a direct jump to a TB
@@ -356,7 +349,12 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
#endif
/* See if we can patch the calling TB. */
if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
- tb_add_jump(last_tb, tb_exit, tb);
+ /* Check if translation buffer has been flushed */
+ if (cpu->tb_flushed) {
+ cpu->tb_flushed = false;
+ } else {
+ tb_add_jump(last_tb, tb_exit, tb);
+ }
}
tb_unlock();
return tb;
@@ -617,7 +615,7 @@ int cpu_exec(CPUState *cpu)
break;
}
- cpu->tb_flushed = false; /* reset before first TB lookup */
+ atomic_mb_set(&cpu->tb_flushed, false); /* reset before first TB lookup */
for(;;) {
cpu_handle_interrupt(cpu, &last_tb);
tb = tb_find_fast(cpu, last_tb, tb_exit);
diff --git a/translate-all.c b/translate-all.c
index fdf520a..788fed1 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -845,7 +845,6 @@ void tb_flush(CPUState *cpu)
> tcg_ctx.code_gen_buffer_size) {
cpu_abort(cpu, "Internal error: code buffer overflow\n");
}
- tcg_ctx.tb_ctx.nb_tbs = 0;
CPU_FOREACH(cpu) {
int i;
@@ -853,9 +852,10 @@ void tb_flush(CPUState *cpu)
for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
atomic_set(&cpu->tb_jmp_cache[i], NULL);
}
- cpu->tb_flushed = true;
+ atomic_mb_set(&cpu->tb_flushed, true);
}
+ tcg_ctx.tb_ctx.nb_tbs = 0;
qht_reset_size(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
page_flush_tb();
--
2.7.4
* [Qemu-devel] [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
2016-07-19 8:32 [Qemu-devel] [PATCH v5 00/10] Reduce lock contention on TCG hot-path Paolo Bonzini
` (3 preceding siblings ...)
2016-07-19 8:32 ` [Qemu-devel] [PATCH 04/10] tcg: Prepare safe access to tb_flushed " Paolo Bonzini
@ 2016-07-19 8:32 ` Paolo Bonzini
2016-07-19 19:56 ` Sergey Fedorov
2016-07-19 8:32 ` [Qemu-devel] [PATCH 06/10] tcg: set up tb->page_addr before insertion Paolo Bonzini
` (4 subsequent siblings)
9 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19 8:32 UTC (permalink / raw)
To: qemu-devel; +Cc: serge.fdrv, sergey.fedorov, alex.bennee
When invalidating a translation block, set an invalid flag in the
TranslationBlock structure first. It is also necessary to check whether
the target TB is still valid after acquiring 'tb_lock' but before calling
tb_add_jump(), since TB lookup is to be performed out of 'tb_lock' in
the future. Note that we don't have to check 'last_tb'; an already
invalidated TB will not be executed anyway and it is thus safe to patch it.
Suggested-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
cpu-exec.c | 5 +++--
include/exec/exec-all.h | 2 ++
translate-all.c | 3 +++
3 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/cpu-exec.c b/cpu-exec.c
index 877ff8e..cdaab1d 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -241,7 +241,8 @@ static bool tb_cmp(const void *p, const void *d)
if (tb->pc == desc->pc &&
tb->page_addr[0] == desc->phys_page1 &&
tb->cs_base == desc->cs_base &&
- tb->flags == desc->flags) {
+ tb->flags == desc->flags &&
+ !atomic_read(&tb->invalid)) {
/* check next page if needed */
if (tb->page_addr[1] == -1) {
return true;
@@ -352,7 +353,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
/* Check if translation buffer has been flushed */
if (cpu->tb_flushed) {
cpu->tb_flushed = false;
- } else {
+ } else if (!tb->invalid) {
tb_add_jump(last_tb, tb_exit, tb);
}
}
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index acda7b6..bc0bcc5 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -213,6 +213,8 @@ struct TranslationBlock {
#define CF_USE_ICOUNT 0x20000
#define CF_IGNORE_ICOUNT 0x40000 /* Do not generate icount code */
+ uint16_t invalid;
+
void *tc_ptr; /* pointer to the translated code */
uint8_t *tc_search; /* pointer to search data */
/* original tb when cflags has CF_NOCACHE */
diff --git a/translate-all.c b/translate-all.c
index 788fed1..eaa1232 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -773,6 +773,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
tb = &tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs++];
tb->pc = pc;
tb->cflags = 0;
+ tb->invalid = false;
return tb;
}
@@ -991,6 +992,8 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
uint32_t h;
tb_page_addr_t phys_pc;
+ atomic_set(&tb->invalid, true);
+
/* remove the TB from the hash list */
phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
h = tb_hash_func(phys_pc, tb->pc, tb->flags);
--
2.7.4
* Re: [Qemu-devel] [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
2016-07-19 8:32 ` [Qemu-devel] [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup Paolo Bonzini
@ 2016-07-19 19:56 ` Sergey Fedorov
2016-07-19 22:27 ` Paolo Bonzini
0 siblings, 1 reply; 20+ messages in thread
From: Sergey Fedorov @ 2016-07-19 19:56 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel; +Cc: sergey.fedorov, alex.bennee
On 19/07/16 11:32, Paolo Bonzini wrote:
>
It looks much better now :)
> When invalidating a translation block, set an invalid flag into the
> TranslationBlock structure first. It is also necessary to check whether
> the target TB is still valid after acquiring 'tb_lock' but before calling
> tb_add_jump() since TB lookup is to be performed out of 'tb_lock' in
> future. Note that we don't have to check 'last_tb'; an already invalidated
> TB will not be executed anyway and it is thus safe to patch it.
>
> Suggested-by: Sergey Fedorov <serge.fdrv@gmail.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> cpu-exec.c | 5 +++--
> include/exec/exec-all.h | 2 ++
> translate-all.c | 3 +++
> 3 files changed, 8 insertions(+), 2 deletions(-)
(snip)
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index acda7b6..bc0bcc5 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -213,6 +213,8 @@ struct TranslationBlock {
> #define CF_USE_ICOUNT 0x20000
> #define CF_IGNORE_ICOUNT 0x40000 /* Do not generate icount code */
>
> + uint16_t invalid;
Why not "int"?
> +
> void *tc_ptr; /* pointer to the translated code */
> uint8_t *tc_search; /* pointer to search data */
> /* original tb when cflags has CF_NOCACHE */
>
Thanks,
Sergey
* Re: [Qemu-devel] [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
2016-07-19 19:56 ` Sergey Fedorov
@ 2016-07-19 22:27 ` Paolo Bonzini
2016-07-21 8:36 ` Sergey Fedorov
0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19 22:27 UTC (permalink / raw)
To: Sergey Fedorov; +Cc: qemu-devel, sergey fedorov, alex bennee
----- Original Message -----
> From: "Sergey Fedorov" <serge.fdrv@gmail.com>
> To: "Paolo Bonzini" <pbonzini@redhat.com>, qemu-devel@nongnu.org
> Cc: "sergey fedorov" <sergey.fedorov@linaro.org>, "alex bennee" <alex.bennee@linaro.org>
> Sent: Tuesday, July 19, 2016 9:56:49 PM
> Subject: Re: [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
>
> On 19/07/16 11:32, Paolo Bonzini wrote:
> >
>
> It looks much better now :)
>
> > When invalidating a translation block, set an invalid flag into the
> > TranslationBlock structure first. It is also necessary to check whether
> > the target TB is still valid after acquiring 'tb_lock' but before calling
> > tb_add_jump() since TB lookup is to be performed out of 'tb_lock' in
> > future. Note that we don't have to check 'last_tb'; an already invalidated
> > TB will not be executed anyway and it is thus safe to patch it.
> >
> > Suggested-by: Sergey Fedorov <serge.fdrv@gmail.com>
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> > cpu-exec.c | 5 +++--
> > include/exec/exec-all.h | 2 ++
> > translate-all.c | 3 +++
> > 3 files changed, 8 insertions(+), 2 deletions(-)
> (snip)
> > diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> > index acda7b6..bc0bcc5 100644
> > --- a/include/exec/exec-all.h
> > +++ b/include/exec/exec-all.h
> > @@ -213,6 +213,8 @@ struct TranslationBlock {
> > #define CF_USE_ICOUNT 0x20000
> > #define CF_IGNORE_ICOUNT 0x40000 /* Do not generate icount code */
> >
> > + uint16_t invalid;
>
> Why not "int"?
There's a hole there and we may want to move something else into it,
so I used a smaller data type. Even uint8_t would do.
Paolo
>
> > +
> > void *tc_ptr; /* pointer to the translated code */
> > uint8_t *tc_search; /* pointer to search data */
> > /* original tb when cflags has CF_NOCACHE */
> >
>
> Thanks,
> Sergey
>
* Re: [Qemu-devel] [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
2016-07-19 22:27 ` Paolo Bonzini
@ 2016-07-21 8:36 ` Sergey Fedorov
2016-07-21 9:04 ` Paolo Bonzini
2016-07-21 11:25 ` Paolo Bonzini
0 siblings, 2 replies; 20+ messages in thread
From: Sergey Fedorov @ 2016-07-21 8:36 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, sergey fedorov, alex bennee
On 20/07/16 01:27, Paolo Bonzini wrote:
>
> ----- Original Message -----
>> From: "Sergey Fedorov" <serge.fdrv@gmail.com>
>> To: "Paolo Bonzini" <pbonzini@redhat.com>, qemu-devel@nongnu.org
>> Cc: "sergey fedorov" <sergey.fedorov@linaro.org>, "alex bennee" <alex.bennee@linaro.org>
>> Sent: Tuesday, July 19, 2016 9:56:49 PM
>> Subject: Re: [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
>>
>> On 19/07/16 11:32, Paolo Bonzini wrote:
>> It looks much better now :)
>>
>>> When invalidating a translation block, set an invalid flag into the
>>> TranslationBlock structure first. It is also necessary to check whether
>>> the target TB is still valid after acquiring 'tb_lock' but before calling
>>> tb_add_jump() since TB lookup is to be performed out of 'tb_lock' in
>>> future. Note that we don't have to check 'last_tb'; an already invalidated
>>> TB will not be executed anyway and it is thus safe to patch it.
>>>
>>> Suggested-by: Sergey Fedorov <serge.fdrv@gmail.com>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>> cpu-exec.c | 5 +++--
>>> include/exec/exec-all.h | 2 ++
>>> translate-all.c | 3 +++
>>> 3 files changed, 8 insertions(+), 2 deletions(-)
>> (snip)
>>> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
>>> index acda7b6..bc0bcc5 100644
>>> --- a/include/exec/exec-all.h
>>> +++ b/include/exec/exec-all.h
>>> @@ -213,6 +213,8 @@ struct TranslationBlock {
>>> #define CF_USE_ICOUNT 0x20000
>>> #define CF_IGNORE_ICOUNT 0x40000 /* Do not generate icount code */
>>>
>>> + uint16_t invalid;
>> Why not "int"?
> There's a hole there, we may want to move something else so I
> used a smaller data type. Even uint8_t would do.
But could a simple "bool" work as well here?
>
> Paolo
>>> +
>>> void *tc_ptr; /* pointer to the translated code */
>>> uint8_t *tc_search; /* pointer to search data */
Are you sure that the hole is over there, not here?
Kind regards,
Sergey
>>> /* original tb when cflags has CF_NOCACHE */
>>>
>> Thanks,
>> Sergey
>>
* Re: [Qemu-devel] [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
2016-07-21 8:36 ` Sergey Fedorov
@ 2016-07-21 9:04 ` Paolo Bonzini
2016-07-21 11:25 ` Paolo Bonzini
1 sibling, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-21 9:04 UTC (permalink / raw)
To: Sergey Fedorov; +Cc: alex bennee, qemu-devel, sergey fedorov
On 21/07/2016 10:36, Sergey Fedorov wrote:
> On 20/07/16 01:27, Paolo Bonzini wrote:
>>
>> ----- Original Message -----
>>> From: "Sergey Fedorov" <serge.fdrv@gmail.com>
>>> To: "Paolo Bonzini" <pbonzini@redhat.com>, qemu-devel@nongnu.org
>>> Cc: "sergey fedorov" <sergey.fedorov@linaro.org>, "alex bennee" <alex.bennee@linaro.org>
>>> Sent: Tuesday, July 19, 2016 9:56:49 PM
>>> Subject: Re: [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
>>>
>>> On 19/07/16 11:32, Paolo Bonzini wrote:
>>> It looks much better now :)
>>>
>>>> When invalidating a translation block, set an invalid flag into the
>>>> TranslationBlock structure first. It is also necessary to check whether
>>>> the target TB is still valid after acquiring 'tb_lock' but before calling
>>>> tb_add_jump() since TB lookup is to be performed out of 'tb_lock' in
>>>> future. Note that we don't have to check 'last_tb'; an already invalidated
>>>> TB will not be executed anyway and it is thus safe to patch it.
>>>>
>>>> Suggested-by: Sergey Fedorov <serge.fdrv@gmail.com>
>>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>> ---
>>>> cpu-exec.c | 5 +++--
>>>> include/exec/exec-all.h | 2 ++
>>>> translate-all.c | 3 +++
>>>> 3 files changed, 8 insertions(+), 2 deletions(-)
>>> (snip)
>>>> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
>>>> index acda7b6..bc0bcc5 100644
>>>> --- a/include/exec/exec-all.h
>>>> +++ b/include/exec/exec-all.h
>>>> @@ -213,6 +213,8 @@ struct TranslationBlock {
>>>> #define CF_USE_ICOUNT 0x20000
>>>> #define CF_IGNORE_ICOUNT 0x40000 /* Do not generate icount code */
>>>>
>>>> + uint16_t invalid;
>>> Why not "int"?
>> There's a hole there, we may want to move something else so I
>> used a smaller data type. Even uint8_t would do.
>
> But could a simple "bool" work as well here?
sizeof(bool) is sometimes 1, sometimes 4. Since in the future we might
want to pack TranslationBlock for better locality, I thought it was
better to stick with uint*_t.
Paolo
>
>>
>> Paolo
>>>> +
>>>> void *tc_ptr; /* pointer to the translated code */
>>>> uint8_t *tc_search; /* pointer to search data */
>
> Are you sure that the hole is over there, not here?
>
> Kind regards,
> Sergey
>
>>>> /* original tb when cflags has CF_NOCACHE */
>>>>
>>> Thanks,
>>> Sergey
>>>
>
>
>
* Re: [Qemu-devel] [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
2016-07-21 8:36 ` Sergey Fedorov
2016-07-21 9:04 ` Paolo Bonzini
@ 2016-07-21 11:25 ` Paolo Bonzini
2016-07-21 17:42 ` Sergey Fedorov
1 sibling, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-21 11:25 UTC (permalink / raw)
To: Sergey Fedorov; +Cc: qemu-devel, sergey fedorov, alex bennee
----- Original Message -----
> From: "Sergey Fedorov" <serge.fdrv@gmail.com>
> To: "Paolo Bonzini" <pbonzini@redhat.com>
> Cc: qemu-devel@nongnu.org, "sergey fedorov" <sergey.fedorov@linaro.org>, "alex bennee" <alex.bennee@linaro.org>
> Sent: Thursday, July 21, 2016 10:36:35 AM
> Subject: Re: [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
>
> On 20/07/16 01:27, Paolo Bonzini wrote:
> >
> > ----- Original Message -----
> >> From: "Sergey Fedorov" <serge.fdrv@gmail.com>
> >> To: "Paolo Bonzini" <pbonzini@redhat.com>, qemu-devel@nongnu.org
> >> Cc: "sergey fedorov" <sergey.fedorov@linaro.org>, "alex bennee"
> >> <alex.bennee@linaro.org>
> >> Sent: Tuesday, July 19, 2016 9:56:49 PM
> >> Subject: Re: [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB
> >> lookup
> >>
> >> On 19/07/16 11:32, Paolo Bonzini wrote:
> >> It looks much better now :)
> >>
> >>> When invalidating a translation block, set an invalid flag into the
> >>> TranslationBlock structure first. It is also necessary to check whether
> >>> the target TB is still valid after acquiring 'tb_lock' but before calling
> >>> tb_add_jump() since TB lookup is to be performed out of 'tb_lock' in
> >>> future. Note that we don't have to check 'last_tb'; an already
> >>> invalidated
> >>> TB will not be executed anyway and it is thus safe to patch it.
> >>>
> >>> Suggested-by: Sergey Fedorov <serge.fdrv@gmail.com>
> >>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >>> ---
> >>> cpu-exec.c | 5 +++--
> >>> include/exec/exec-all.h | 2 ++
> >>> translate-all.c | 3 +++
> >>> 3 files changed, 8 insertions(+), 2 deletions(-)
> >> (snip)
> >>> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> >>> index acda7b6..bc0bcc5 100644
> >>> --- a/include/exec/exec-all.h
> >>> +++ b/include/exec/exec-all.h
> >>> @@ -213,6 +213,8 @@ struct TranslationBlock {
> >>> #define CF_USE_ICOUNT 0x20000
> >>> #define CF_IGNORE_ICOUNT 0x40000 /* Do not generate icount code */
> >>>
> >>> + uint16_t invalid;
> >> Why not "int"?
> > There's a hole there, we may want to move something else so I
> > used a smaller data type. Even uint8_t would do.
>
> But could a simple "bool" work as well here?
>
> >
> > Paolo
> >>> +
> >>> void *tc_ptr; /* pointer to the translated code */
> >>> uint8_t *tc_search; /* pointer to search data */
>
> Are you sure that the hole is over there, not here?
Yes, all pointers have the same size. For 32-bit hosts, my
patch introduces a 2-byte hole. For 64-bit hosts, it reduces
a 4-byte hole to 2 bytes.
Before:
target_ulong pc; /* 0 */
target_ulong cs_base; /* 4 */
uint32_t flags; /* 8 */
uint16_t size; /* 12 */
uint16_t icount; /* 14 */
uint32_t cflags; /* 16 */
/* 4 byte padding ** 20 on 64-bit systems */
void *tc_ptr; /* 24 on 64-bit systems, 20 on 32-bit */
After:
target_ulong pc; /* 0 */
target_ulong cs_base; /* 4 */
uint32_t flags; /* 8 */
uint16_t size; /* 12 */
uint16_t icount; /* 14 */
uint32_t cflags; /* 16 */
uint16_t invalid; /* 20 */
/* 2 byte padding ** 22 */
void *tc_ptr; /* 24 */
BTW, another reason to use uint16_t is that I suspect tb->icount can
be made redundant with cflags, so we might move tb->invalid up if
tb->icount is removed.
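The layout argument above can be checked mechanically. Below is a hedged sketch: a simplified stand-in struct (`tb_layout` is an illustrative name, not the real TranslationBlock; `target_ulong` is assumed to be 32 bits) whose field order mirrors the "After" listing, with offsets verified via offsetof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for TranslationBlock, assuming a 32-bit
 * target_ulong; field order follows the "After" layout quoted above. */
struct tb_layout {
    uint32_t pc;        /* offset 0  */
    uint32_t cs_base;   /* offset 4  */
    uint32_t flags;     /* offset 8  */
    uint16_t size;      /* offset 12 */
    uint16_t icount;    /* offset 14 */
    uint32_t cflags;    /* offset 16 */
    uint16_t invalid;   /* offset 20 -- fills half of the old 4-byte hole */
    /* 2 bytes of padding at offset 22 */
    void *tc_ptr;       /* offset 24 (pointer aligned to 4 or 8 bytes) */
};

/* invalid lands in the existing hole instead of growing the struct */
_Static_assert(offsetof(struct tb_layout, invalid) == 20,
               "invalid should occupy the padding hole");
```

On a 64-bit host the pointer is 8-byte aligned, so `tc_ptr` stays at offset 24 and only 2 bytes of padding remain, matching the reasoning above.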
Thanks,
Paolo
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
2016-07-21 11:25 ` Paolo Bonzini
@ 2016-07-21 17:42 ` Sergey Fedorov
0 siblings, 0 replies; 20+ messages in thread
From: Sergey Fedorov @ 2016-07-21 17:42 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, sergey fedorov, alex bennee
On 21/07/16 14:25, Paolo Bonzini wrote:
>
> ----- Original Message -----
>> From: "Sergey Fedorov" <serge.fdrv@gmail.com>
>> To: "Paolo Bonzini" <pbonzini@redhat.com>
>> Cc: qemu-devel@nongnu.org, "sergey fedorov" <sergey.fedorov@linaro.org>, "alex bennee" <alex.bennee@linaro.org>
>> Sent: Thursday, July 21, 2016 10:36:35 AM
>> Subject: Re: [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup
>>
>> On 20/07/16 01:27, Paolo Bonzini wrote:
>>> ----- Original Message -----
>>>> From: "Sergey Fedorov" <serge.fdrv@gmail.com>
>>>> To: "Paolo Bonzini" <pbonzini@redhat.com>, qemu-devel@nongnu.org
>>>> Cc: "sergey fedorov" <sergey.fedorov@linaro.org>, "alex bennee"
>>>> <alex.bennee@linaro.org>
>>>> Sent: Tuesday, July 19, 2016 9:56:49 PM
>>>> Subject: Re: [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB
>>>> lookup
>>>>
>>>> On 19/07/16 11:32, Paolo Bonzini wrote:
>>>> It looks much better now :)
>>>>
>>>>> When invalidating a translation block, set an invalid flag into the
>>>>> TranslationBlock structure first. It is also necessary to check whether
>>>>> the target TB is still valid after acquiring 'tb_lock' but before calling
>>>>> tb_add_jump() since TB lookup is to be performed out of 'tb_lock' in
>>>>> future. Note that we don't have to check 'last_tb'; an already
>>>>> invalidated
>>>>> TB will not be executed anyway and it is thus safe to patch it.
>>>>>
>>>>> Suggested-by: Sergey Fedorov <serge.fdrv@gmail.com>
>>>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>>> ---
>>>>> cpu-exec.c | 5 +++--
>>>>> include/exec/exec-all.h | 2 ++
>>>>> translate-all.c | 3 +++
>>>>> 3 files changed, 8 insertions(+), 2 deletions(-)
>>>> (snip)
>>>>> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
>>>>> index acda7b6..bc0bcc5 100644
>>>>> --- a/include/exec/exec-all.h
>>>>> +++ b/include/exec/exec-all.h
>>>>> @@ -213,6 +213,8 @@ struct TranslationBlock {
>>>>> #define CF_USE_ICOUNT 0x20000
>>>>> #define CF_IGNORE_ICOUNT 0x40000 /* Do not generate icount code */
>>>>>
>>>>> + uint16_t invalid;
>>>> Why not "int"?
>>> There's a hole there, we may want to move something else so I
>>> used a smaller data type. Even uint8_t would do.
>> But could simple "bool" work as well here?
>>
>>> Paolo
>>>>> +
>>>>> void *tc_ptr; /* pointer to the translated code */
>>>>> uint8_t *tc_search; /* pointer to search data */
>> Are you sure that the hole is over there, not here?
> Yes, all pointers have the same size. For 32-bit hosts, my
> patch introduces a 2-byte hole. For 64-bit hosts, it reduces
> a 4-byte hole to 2-byte.
>
> Before:
>
> target_ulong pc; /* 0 */
> target_ulong cs_base; /* 4 */
> uint32_t flags; /* 8 */
> uint16_t size; /* 12 */
> uint16_t icount; /* 14 */
> uint32_t cflags; /* 16 */
> /* 4 byte padding ** 20 on 64-bit systems */
> void *tc_ptr; /* 24 on 64-bit systems, 20 on 32-bit */
>
> After:
>
> target_ulong pc; /* 0 */
> target_ulong cs_base; /* 4 */
> uint32_t flags; /* 8 */
> uint16_t size; /* 12 */
> uint16_t icount; /* 14 */
> uint32_t cflags; /* 16 */
> uint16_t invalid; /* 20 */
> /* 2 byte padding ** 22 */
> void *tc_ptr; /* 24 */
>
> BTW, another reason to use uint16_t is that I suspect tb->icount can
> be made redundant with cflags, so we might move tb->invalid up if
> tb->icount is removed.
Fair enough. I think it might make sense to use uint8_t as a hint
for possible future compaction of the TranslationBlock structure.
Thanks,
Sergey
^ permalink raw reply [flat|nested] 20+ messages in thread
* [Qemu-devel] [PATCH 06/10] tcg: set up tb->page_addr before insertion
2016-07-19 8:32 [Qemu-devel] [PATCH v5 00/10] Reduce lock contention on TCG hot-path Paolo Bonzini
` (4 preceding siblings ...)
2016-07-19 8:32 ` [Qemu-devel] [PATCH 05/10] tcg: Prepare TB invalidation for lockless TB lookup Paolo Bonzini
@ 2016-07-19 8:32 ` Paolo Bonzini
2016-07-19 8:32 ` [Qemu-devel] [PATCH 07/10] tcg: cpu-exec: remove tb_lock from the hot-path Paolo Bonzini
` (3 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19 8:32 UTC (permalink / raw)
To: qemu-devel; +Cc: serge.fdrv, sergey.fedorov, alex.bennee
From: Alex Bennée <alex.bennee@linaro.org>
This ensures that, if we find the TB on the slow path, tb->page_addr
is correctly set before being tested.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <20160715175852.30749-9-sergey.fedorov@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
translate-all.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/translate-all.c b/translate-all.c
index eaa1232..1ce05ff 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1128,10 +1128,6 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
{
uint32_t h;
- /* add in the hash table */
- h = tb_hash_func(phys_pc, tb->pc, tb->flags);
- qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);
-
/* add in the page list */
tb_alloc_page(tb, 0, phys_pc & TARGET_PAGE_MASK);
if (phys_page2 != -1) {
@@ -1140,6 +1136,10 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
tb->page_addr[1] = -1;
}
+ /* add in the hash table */
+ h = tb_hash_func(phys_pc, tb->pc, tb->flags);
+ qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);
+
#ifdef DEBUG_TB_CHECK
tb_page_check();
#endif
--
2.7.4
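The patch above relies on qht_insert() implying an smp_wmb(), so that a concurrent lookup which finds the TB also sees its page_addr fields fully initialized. A minimal sketch of that publish/lookup ordering, modeled with C11 release/acquire atomics (all names here are illustrative, not the QEMU API):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hedged sketch: every field of the object is written with plain stores
 * first, then the pointer is published with a release store (standing in
 * for the smp_wmb() implied by qht_insert()). A reader that loads the
 * pointer with acquire is guaranteed to see the initialized fields. */
struct fake_tb {
    uintptr_t page_addr[2];
};

static _Atomic(struct fake_tb *) published_slot;

static void publish_tb(struct fake_tb *tb, uintptr_t p0, uintptr_t p1)
{
    tb->page_addr[0] = p0;   /* set up completely... */
    tb->page_addr[1] = p1;
    /* ...then publish; release orders the stores above before this one */
    atomic_store_explicit(&published_slot, tb, memory_order_release);
}

static struct fake_tb *lookup_tb(void)
{
    /* acquire pairs with the release store in publish_tb() */
    return atomic_load_explicit(&published_slot, memory_order_acquire);
}
```

Inserting into the hash table before setting page_addr would invert this order, which is exactly what the patch fixes.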
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [Qemu-devel] [PATCH 07/10] tcg: cpu-exec: remove tb_lock from the hot-path
2016-07-19 8:32 [Qemu-devel] [PATCH v5 00/10] Reduce lock contention on TCG hot-path Paolo Bonzini
` (5 preceding siblings ...)
2016-07-19 8:32 ` [Qemu-devel] [PATCH 06/10] tcg: set up tb->page_addr before insertion Paolo Bonzini
@ 2016-07-19 8:32 ` Paolo Bonzini
2016-07-19 8:32 ` [Qemu-devel] [PATCH 08/10] tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump() Paolo Bonzini
` (2 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19 8:32 UTC (permalink / raw)
To: qemu-devel; +Cc: serge.fdrv, sergey.fedorov, alex.bennee
From: Alex Bennée <alex.bennee@linaro.org>
Lock contention in the hot path of moving between existing patched
TranslationBlocks is the main drag in multithreaded performance. This
patch pushes the tb_lock() usage down to the two places that really need
it:
- code generation (tb_gen_code)
- jump patching (tb_add_jump)
The rest of the code doesn't really need to hold a lock as it is either
using per-CPU structures, atomically updated or designed to be used in
concurrent read situations (qht_lookup).
To keep things simple I removed the #ifdef CONFIG_USER_ONLY stuff as the
locks become NOPs anyway until the MTTCG work is completed.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <20160715175852.30749-10-sergey.fedorov@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
cpu-exec.c | 48 +++++++++++++++++++++---------------------------
1 file changed, 21 insertions(+), 27 deletions(-)
diff --git a/cpu-exec.c b/cpu-exec.c
index cdaab1d..7ca2b71 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -287,35 +287,29 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
TranslationBlock *tb;
tb = tb_find_physical(cpu, pc, cs_base, flags);
- if (tb) {
- goto found;
- }
+ if (!tb) {
-#ifdef CONFIG_USER_ONLY
- /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
- * taken outside tb_lock. Since we're momentarily dropping
- * tb_lock, there's a chance that our desired tb has been
- * translated.
- */
- tb_unlock();
- mmap_lock();
- tb_lock();
- tb = tb_find_physical(cpu, pc, cs_base, flags);
- if (tb) {
- mmap_unlock();
- goto found;
- }
-#endif
+ /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
+ * taken outside tb_lock. As system emulation is currently
+ * single threaded the locks are NOPs.
+ */
+ mmap_lock();
+ tb_lock();
- /* if no translated code available, then translate it now */
- tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
+ /* There's a chance that our desired tb has been translated while
+ * taking the locks so we check again inside the lock.
+ */
+ tb = tb_find_physical(cpu, pc, cs_base, flags);
+ if (!tb) {
+ /* if no translated code available, then translate it now */
+ tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
+ }
-#ifdef CONFIG_USER_ONLY
- mmap_unlock();
-#endif
+ tb_unlock();
+ mmap_unlock();
+ }
-found:
- /* we add the TB in the virtual pc hash table */
+ /* We add the TB in the virtual pc hash table for the fast lookup */
atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
return tb;
}
@@ -333,7 +327,6 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
always be the same before a given translated block
is executed. */
cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
- tb_lock();
tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
tb->flags != flags)) {
@@ -350,14 +343,15 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
#endif
/* See if we can patch the calling TB. */
if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
+ tb_lock();
/* Check if translation buffer has been flushed */
if (cpu->tb_flushed) {
cpu->tb_flushed = false;
} else if (!tb->invalid) {
tb_add_jump(last_tb, tb_exit, tb);
}
+ tb_unlock();
}
- tb_unlock();
return tb;
}
--
2.7.4
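The restructured tb_find_slow() above is an instance of the classic check/lock/re-check pattern: the first lookup runs without the lock, and only on a miss is the lock taken and the lookup repeated before generating. A generic sketch under a pthread mutex (illustrative names, not the QEMU functions):

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static void *cached;

static void *generate(void)
{
    return malloc(1);   /* stands in for tb_gen_code() */
}

static void *find_or_generate(void)
{
    void *obj = cached;                /* fast path: no lock taken */
    if (!obj) {
        pthread_mutex_lock(&cache_lock);
        obj = cached;                  /* re-check: another thread may
                                        * have generated it while we
                                        * were taking the lock */
        if (!obj) {
            obj = generate();          /* slow path, under the lock */
            cached = obj;
        }
        pthread_mutex_unlock(&cache_lock);
    }
    return obj;
}
```

The re-check inside the critical section is what keeps the lockless fast path safe: without it, two threads missing simultaneously would both generate and one result would be lost.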
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [Qemu-devel] [PATCH 08/10] tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump()
2016-07-19 8:32 [Qemu-devel] [PATCH v5 00/10] Reduce lock contention on TCG hot-path Paolo Bonzini
` (6 preceding siblings ...)
2016-07-19 8:32 ` [Qemu-devel] [PATCH 07/10] tcg: cpu-exec: remove tb_lock from the hot-path Paolo Bonzini
@ 2016-07-19 8:32 ` Paolo Bonzini
2016-07-19 8:32 ` [Qemu-devel] [PATCH 09/10] tcg: Merge tb_find_slow() and tb_find_fast() Paolo Bonzini
2016-07-19 8:32 ` [Qemu-devel] [PATCH 10/10] tcg: rename tb_find_physical() Paolo Bonzini
9 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19 8:32 UTC (permalink / raw)
To: qemu-devel; +Cc: serge.fdrv, sergey.fedorov, alex.bennee
From: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20160715175852.30749-11-sergey.fedorov@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
cpu-exec.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/cpu-exec.c b/cpu-exec.c
index 7ca2b71..bd9fa5a 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -282,7 +282,8 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
static TranslationBlock *tb_find_slow(CPUState *cpu,
target_ulong pc,
target_ulong cs_base,
- uint32_t flags)
+ uint32_t flags,
+ bool *have_tb_lock)
{
TranslationBlock *tb;
@@ -295,6 +296,7 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
*/
mmap_lock();
tb_lock();
+ *have_tb_lock = true;
/* There's a chance that our desired tb has been translated while
* taking the locks so we check again inside the lock.
@@ -305,7 +307,6 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
}
- tb_unlock();
mmap_unlock();
}
@@ -322,6 +323,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
TranslationBlock *tb;
target_ulong cs_base, pc;
uint32_t flags;
+ bool have_tb_lock = false;
/* we record a subset of the CPU state. It will
always be the same before a given translated block
@@ -330,7 +332,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
tb->flags != flags)) {
- tb = tb_find_slow(cpu, pc, cs_base, flags);
+ tb = tb_find_slow(cpu, pc, cs_base, flags, &have_tb_lock);
}
#ifndef CONFIG_USER_ONLY
/* We don't take care of direct jumps when address mapping changes in
@@ -343,13 +345,18 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
#endif
/* See if we can patch the calling TB. */
if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
- tb_lock();
+ if (!have_tb_lock) {
+ tb_lock();
+ have_tb_lock = true;
+ }
/* Check if translation buffer has been flushed */
if (cpu->tb_flushed) {
cpu->tb_flushed = false;
} else if (!tb->invalid) {
tb_add_jump(last_tb, tb_exit, tb);
}
+ }
+ if (have_tb_lock) {
tb_unlock();
}
return tb;
--
2.7.4
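The have_tb_lock flag threaded through the patch above avoids an unlock/relock cycle between tb_gen_code() and tb_add_jump(): the slow path may return with the lock still held and report that fact to the caller. A minimal sketch of that lock-handoff pattern (illustrative names):

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int generated;

/* Slow path: takes the lock and, instead of releasing it, reports
 * ownership to the caller so the caller can reuse it. */
static int slow_path(bool *have_lock)
{
    pthread_mutex_lock(&lock);
    *have_lock = true;          /* hand the lock back to the caller */
    return ++generated;
}

static int caller(void)
{
    bool have_lock = false;
    int v = slow_path(&have_lock);
    /* ... jump patching would go here, reusing the held lock ... */
    if (!have_lock) {
        pthread_mutex_lock(&lock);  /* fast path never locked */
    }
    pthread_mutex_unlock(&lock);    /* single release for both paths */
    return v;
}
```

Both paths converge on a single unlock, so the lock is never bounced when the slow path already acquired it.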
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [Qemu-devel] [PATCH 09/10] tcg: Merge tb_find_slow() and tb_find_fast()
2016-07-19 8:32 [Qemu-devel] [PATCH v5 00/10] Reduce lock contention on TCG hot-path Paolo Bonzini
` (7 preceding siblings ...)
2016-07-19 8:32 ` [Qemu-devel] [PATCH 08/10] tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump() Paolo Bonzini
@ 2016-07-19 8:32 ` Paolo Bonzini
2016-07-19 8:32 ` [Qemu-devel] [PATCH 10/10] tcg: rename tb_find_physical() Paolo Bonzini
9 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19 8:32 UTC (permalink / raw)
To: qemu-devel; +Cc: serge.fdrv, sergey.fedorov, alex.bennee
From: Sergey Fedorov <serge.fdrv@gmail.com>
These functions are not too big and can be merged together. This makes
the locking scheme clearer and easier to follow.
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20160715175852.30749-12-sergey.fedorov@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
cpu-exec.c | 72 ++++++++++++++++++++++++++------------------------------------
1 file changed, 30 insertions(+), 42 deletions(-)
diff --git a/cpu-exec.c b/cpu-exec.c
index bd9fa5a..f7f60b1 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -279,45 +279,9 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
return qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h);
}
-static TranslationBlock *tb_find_slow(CPUState *cpu,
- target_ulong pc,
- target_ulong cs_base,
- uint32_t flags,
- bool *have_tb_lock)
-{
- TranslationBlock *tb;
-
- tb = tb_find_physical(cpu, pc, cs_base, flags);
- if (!tb) {
-
- /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
- * taken outside tb_lock. As system emulation is currently
- * single threaded the locks are NOPs.
- */
- mmap_lock();
- tb_lock();
- *have_tb_lock = true;
-
- /* There's a chance that our desired tb has been translated while
- * taking the locks so we check again inside the lock.
- */
- tb = tb_find_physical(cpu, pc, cs_base, flags);
- if (!tb) {
- /* if no translated code available, then translate it now */
- tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
- }
-
- mmap_unlock();
- }
-
- /* We add the TB in the virtual pc hash table for the fast lookup */
- atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
- return tb;
-}
-
-static inline TranslationBlock *tb_find_fast(CPUState *cpu,
- TranslationBlock *last_tb,
- int tb_exit)
+static inline TranslationBlock *tb_find(CPUState *cpu,
+ TranslationBlock *last_tb,
+ int tb_exit)
{
CPUArchState *env = (CPUArchState *)cpu->env_ptr;
TranslationBlock *tb;
@@ -332,7 +296,31 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
tb->flags != flags)) {
- tb = tb_find_slow(cpu, pc, cs_base, flags, &have_tb_lock);
+ tb = tb_find_physical(cpu, pc, cs_base, flags);
+ if (!tb) {
+
+ /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
+ * taken outside tb_lock. As system emulation is currently
+ * single threaded the locks are NOPs.
+ */
+ mmap_lock();
+ tb_lock();
+ have_tb_lock = true;
+
+ /* There's a chance that our desired tb has been translated while
+ * taking the locks so we check again inside the lock.
+ */
+ tb = tb_find_physical(cpu, pc, cs_base, flags);
+ if (!tb) {
+ /* if no translated code available, then translate it now */
+ tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
+ }
+
+ mmap_unlock();
+ }
+
+ /* We add the TB in the virtual pc hash table for the fast lookup */
+ atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
}
#ifndef CONFIG_USER_ONLY
/* We don't take care of direct jumps when address mapping changes in
@@ -437,7 +425,7 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
} else if (replay_has_exception()
&& cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
/* try to cause an exception pending in the log */
- cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, NULL, 0), true);
+ cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
*ret = -1;
return true;
#endif
@@ -620,7 +608,7 @@ int cpu_exec(CPUState *cpu)
atomic_mb_set(&cpu->tb_flushed, false); /* reset before first TB lookup */
for(;;) {
cpu_handle_interrupt(cpu, &last_tb);
- tb = tb_find_fast(cpu, last_tb, tb_exit);
+ tb = tb_find(cpu, last_tb, tb_exit);
cpu_loop_exec_tb(cpu, tb, &last_tb, &tb_exit, &sc);
/* Try to align the host and virtual clocks
if the guest is in advance */
--
2.7.4
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [Qemu-devel] [PATCH 10/10] tcg: rename tb_find_physical()
2016-07-19 8:32 [Qemu-devel] [PATCH v5 00/10] Reduce lock contention on TCG hot-path Paolo Bonzini
` (8 preceding siblings ...)
2016-07-19 8:32 ` [Qemu-devel] [PATCH 09/10] tcg: Merge tb_find_slow() and tb_find_fast() Paolo Bonzini
@ 2016-07-19 8:32 ` Paolo Bonzini
2016-09-08 12:39 ` Alex Bennée
9 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2016-07-19 8:32 UTC (permalink / raw)
To: qemu-devel; +Cc: serge.fdrv, sergey.fedorov, alex.bennee
From: Sergey Fedorov <serge.fdrv@gmail.com>
In fact, this function does not exactly perform a lookup by physical
address as described in the comment on get_page_addr_code(). Thus
it may be a bit confusing to have "physical" in its name. So rename it
to tb_htable_lookup() to better reflect its actual functionality.
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <20160715175852.30749-13-sergey.fedorov@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
cpu-exec.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/cpu-exec.c b/cpu-exec.c
index f7f60b1..b240b9f 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -260,7 +260,7 @@ static bool tb_cmp(const void *p, const void *d)
return false;
}
-static TranslationBlock *tb_find_physical(CPUState *cpu,
+static TranslationBlock *tb_htable_lookup(CPUState *cpu,
target_ulong pc,
target_ulong cs_base,
uint32_t flags)
@@ -296,7 +296,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
tb->flags != flags)) {
- tb = tb_find_physical(cpu, pc, cs_base, flags);
+ tb = tb_htable_lookup(cpu, pc, cs_base, flags);
if (!tb) {
/* mmap_lock is needed by tb_gen_code, and mmap_lock must be
@@ -310,7 +310,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
/* There's a chance that our desired tb has been translated while
* taking the locks so we check again inside the lock.
*/
- tb = tb_find_physical(cpu, pc, cs_base, flags);
+ tb = tb_htable_lookup(cpu, pc, cs_base, flags);
if (!tb) {
/* if no translated code available, then translate it now */
tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
--
2.7.4
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [PATCH 10/10] tcg: rename tb_find_physical()
2016-07-19 8:32 ` [Qemu-devel] [PATCH 10/10] tcg: rename tb_find_physical() Paolo Bonzini
@ 2016-09-08 12:39 ` Alex Bennée
0 siblings, 0 replies; 20+ messages in thread
From: Alex Bennée @ 2016-09-08 12:39 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, serge.fdrv, sergey.fedorov
Paolo Bonzini <pbonzini@redhat.com> writes:
> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> In fact, this function does not exactly perform a lookup by physical
> address as described in the comment on get_page_addr_code(). Thus
> it may be a bit confusing to have "physical" in its name. So rename it
> to tb_htable_lookup() to better reflect its actual functionality.
>
> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
> Message-Id: <20160715175852.30749-13-sergey.fedorov@linaro.org>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> cpu-exec.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index f7f60b1..b240b9f 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -260,7 +260,7 @@ static bool tb_cmp(const void *p, const void *d)
> return false;
> }
>
> -static TranslationBlock *tb_find_physical(CPUState *cpu,
> +static TranslationBlock *tb_htable_lookup(CPUState *cpu,
> target_ulong pc,
> target_ulong cs_base,
> uint32_t flags)
> @@ -296,7 +296,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
> tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
> if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
> tb->flags != flags)) {
> - tb = tb_find_physical(cpu, pc, cs_base, flags);
> + tb = tb_htable_lookup(cpu, pc, cs_base, flags);
> if (!tb) {
>
> /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
> @@ -310,7 +310,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
> /* There's a chance that our desired tb has been translated while
> * taking the locks so we check again inside the lock.
> */
> - tb = tb_find_physical(cpu, pc, cs_base, flags);
> + tb = tb_htable_lookup(cpu, pc, cs_base, flags);
> if (!tb) {
> /* if no translated code available, then translate it now */
> tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
--
Alex Bennée
^ permalink raw reply [flat|nested] 20+ messages in thread