* [Qemu-devel] [PATCH 0/2] Reduce lock contention on TCG hot-path
@ 2016-07-01 16:16 Alex Bennée
  2016-07-01 16:16 ` [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock' Alex Bennée
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Alex Bennée @ 2016-07-01 16:16 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth
  Cc: mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Alex Bennée

These patches have been on the list before in my base enabling patches
series [1]. However, while looking at some user-space workloads I realised
there is no particular reason to hold them back until the MTTCG work
is complete. I fixed one missing atomic_set in Sergey's patch and
addressed his review comments from the last posting.

For a simple parallel user-mode test it gives a ~5% speed boost:

Before:

retry.py called with ['./arm-linux-user/qemu-arm', './pigz.armhf', '-c', '-9', 'source.tar']
Source code is @ pull-target-arm-20160627-153-g1b756f1 or heads/master
run 1: ret=0 (PASS), time=4.755824 (1/1)
run 2: ret=0 (PASS), time=4.756076 (2/2)
run 3: ret=0 (PASS), time=4.755916 (3/3)
run 4: ret=0 (PASS), time=4.755853 (4/4)
run 5: ret=0 (PASS), time=4.755929 (5/5)
Results summary:
0: 5 times (100.00%), avg time 4.755920 (0.000000 deviation)

After:

retry.py called with ['./arm-linux-user/qemu-arm', './pigz.armhf', '-c', '-9', 'source.tar']
Source code is @ pull-target-arm-20160627-155-g579ffd4 or heads/tcg/hot-path-cleanups
run 1: ret=0 (PASS), time=4.505735 (1/1)
run 2: ret=0 (PASS), time=4.505683 (2/2)
run 3: ret=0 (PASS), time=4.505666 (3/3)
run 4: ret=0 (PASS), time=4.505578 (4/4)
run 5: ret=0 (PASS), time=4.505544 (5/5)
Results summary:
0: 5 times (100.00%), avg time 4.505641 (0.000000 deviation)

For system-mode the change is in the noise, despite the fact that by
dropping the CONFIG_USER_ONLY-specific code we will run
tb_find_physical twice on the first lookup for any given block.

Before:

retry.py called with ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm', '-machine', 'type=virt', '-display', 'none', '-smp', '1', '-m', '4096', '-cpu', 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio', '-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device', 'virtio-net-device,netdev=unet', '-drive', 'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none', '-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0 systemd.unit=benchmark.service root=/dev/vda1', '-kernel', '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img']
Source code is @ pull-target-arm-20160627-153-g1b756f1 or heads/master
run 1: ret=0 (PASS), time=10.262175 (1/1)
run 2: ret=0 (PASS), time=10.262821 (2/2)
run 3: ret=0 (PASS), time=9.762559 (3/3)
run 4: ret=0 (PASS), time=9.762108 (4/4)
run 5: ret=0 (PASS), time=10.262576 (5/5)
Results summary:
0: 5 times (100.00%), avg time 10.062448 (0.060046 deviation)
Ran command 5 times, 5 passes

After:

retry.py called with ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm', '-machine', 'type=virt', '-display', 'none', '-smp', '1', '-m', '4096', '-cpu', 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio', '-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device', 'virtio-net-device,netdev=unet', '-drive', 'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none', '-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0 systemd.unit=benchmark.service root=/dev/vda1', '-kernel', '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img']
Source code is @ pull-target-arm-20160627-155-g579ffd4 or heads/tcg/hot-path-cleanups
run 1: ret=0 (PASS), time=9.761559 (1/1)
run 2: ret=0 (PASS), time=9.511616 (2/2)
run 3: ret=0 (PASS), time=9.761713 (3/3)
run 4: ret=0 (PASS), time=10.262504 (4/4)
run 5: ret=0 (PASS), time=9.762059 (5/5)
Results summary:
0: 5 times (100.00%), avg time 9.811890 (0.060150 deviation)
Ran command 5 times, 5 passes

[1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg375023.html

Alex Bennée (1):
  cpu-exec: remove tb_lock from the hot-path

Sergey Fedorov (1):
  tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock'

 cpu-exec.c      | 58 ++++++++++++++++++++++++++++-----------------------------
 translate-all.c |  7 ++++++-
 2 files changed, 35 insertions(+), 30 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock'
  2016-07-01 16:16 [Qemu-devel] [PATCH 0/2] Reduce lock contention on TCG hot-path Alex Bennée
@ 2016-07-01 16:16 ` Alex Bennée
  2016-07-01 23:14   ` Richard Henderson
  2016-07-02  0:17   ` Emilio G. Cota
  2016-07-01 16:16 ` [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path Alex Bennée
  2016-07-02  0:52 ` [Qemu-devel] [PATCH 0/2] Reduce lock contention on TCG hot-path Emilio G. Cota
  2 siblings, 2 replies; 24+ messages in thread
From: Alex Bennée @ 2016-07-01 16:16 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth
  Cc: mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Sergey Fedorov, Alex Bennée,
	Peter Crosthwaite

From: Sergey Fedorov <serge.fdrv@gmail.com>

First, ensure atomicity of CPU's 'tb_jmp_cache' access by:
 * using atomic_read() to look up a TB when not holding 'tb_lock';
 * using atomic_set() to remove a TB from each CPU's local cache on
   TB invalidation.

Second, add some memory barriers to ensure we don't put the TB being
invalidated back to CPU's 'tb_jmp_cache'. If we fail to look up a TB in
CPU's local cache because it is being invalidated by some other thread
then it must not be found in the shared TB hash table. Otherwise we'd
put it back to CPU's local cache.
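
The ordering argument above can be sketched as a toy, single-file model
(this is not QEMU code: the real smp_rmb()/smp_wmb() and
atomic_read()/atomic_set() helpers are approximated here with C11 fences
and relaxed atomics, and the hash table and jump cache are reduced to a
single slot each):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy model: one shared hash-table slot and one per-CPU jump-cache
 * slot, each holding a TB pointer (or NULL). */
static _Atomic(int *) shared_hash_slot;
static _Atomic(int *) tb_jmp_cache_slot;

/* Invalidation side (cf. tb_phys_invalidate): first unpublish the TB
 * from the shared table, then issue the write barrier, then clear the
 * per-CPU cache. */
static void invalidate_tb(void)
{
    atomic_store_explicit(&shared_hash_slot, NULL, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);      /* smp_wmb() */
    atomic_store_explicit(&tb_jmp_cache_slot, NULL, memory_order_relaxed);
}

/* Lookup side (cf. tb_find_slow): on a cache miss, issue the read
 * barrier before consulting the shared table, so a TB that a concurrent
 * invalidation already cleared from the cache cannot be re-found in the
 * shared table and put back. */
static int *lookup_tb(void)
{
    int *tb = atomic_load_explicit(&tb_jmp_cache_slot, memory_order_relaxed);
    if (!tb) {
        atomic_thread_fence(memory_order_acquire);  /* smp_rmb() */
        tb = atomic_load_explicit(&shared_hash_slot, memory_order_relaxed);
        if (tb) {
            atomic_store_explicit(&tb_jmp_cache_slot, tb,
                                  memory_order_relaxed);
        }
    }
    return tb;
}
```

The barriers pair: the writer's release fence orders "remove from shared
table" before "clear local cache", and the reader's acquire fence orders
"miss in local cache" before "consult shared table".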

Note that this patch does *not* make CPU's TLB invalidation safe if it
is done from some other thread while the CPU is in its execution loop.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
[AJB: fixed missing atomic set, tweak title]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
AJB:
  - tweak title
  - fixed missing set of tb_jmp_cache
---
 cpu-exec.c      | 9 +++++++--
 translate-all.c | 7 ++++++-
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index b840e1d..10ce1cb 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -285,6 +285,11 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
 {
     TranslationBlock *tb;
 
+    /* Ensure that we won't find a TB in the shared hash table
+     * if it is being invalidated by some other thread.
+     * Otherwise we'd put it back to CPU's local cache.
+     * Pairs with smp_wmb() in tb_phys_invalidate(). */
+    smp_rmb();
     tb = tb_find_physical(cpu, pc, cs_base, flags);
     if (tb) {
         goto found;
@@ -315,7 +320,7 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
 
 found:
     /* we add the TB in the virtual pc hash table */
-    cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
+    atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
     return tb;
 }
 
@@ -333,7 +338,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
        is executed. */
     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
     tb_lock();
-    tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
+    tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
     if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
                  tb->flags != flags)) {
         tb = tb_find_slow(cpu, pc, cs_base, flags);
diff --git a/translate-all.c b/translate-all.c
index eaa95e4..1fcfe79 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1004,11 +1004,16 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
         invalidate_page_bitmap(p);
     }
 
+    /* Ensure that we won't find the TB in the shared hash table
+     * if we con't see it in CPU's local cache.
+     * Pairs with smp_rmb() in tb_find_slow(). */
+    smp_wmb();
+
     /* remove the TB from the hash list */
     h = tb_jmp_cache_hash_func(tb->pc);
     CPU_FOREACH(cpu) {
         if (cpu->tb_jmp_cache[h] == tb) {
-            cpu->tb_jmp_cache[h] = NULL;
+            atomic_set(&cpu->tb_jmp_cache[h], NULL);
         }
     }
 
-- 
2.7.4


* [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path
  2016-07-01 16:16 [Qemu-devel] [PATCH 0/2] Reduce lock contention on TCG hot-path Alex Bennée
  2016-07-01 16:16 ` [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock' Alex Bennée
@ 2016-07-01 16:16 ` Alex Bennée
  2016-07-01 23:19   ` Richard Henderson
  2016-07-02  0:39   ` Emilio G. Cota
  2016-07-02  0:52 ` [Qemu-devel] [PATCH 0/2] Reduce lock contention on TCG hot-path Emilio G. Cota
  2 siblings, 2 replies; 24+ messages in thread
From: Alex Bennée @ 2016-07-01 16:16 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, rth
  Cc: mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Alex Bennée, Peter Crosthwaite

Lock contention in the hot path of moving between existing patched
TranslationBlocks is the main drag in multithreaded performance. This
patch pushes the tb_lock() usage down to the two places that really need
it:

  - code generation (tb_gen_code)
  - jump patching (tb_add_jump)

The rest of the code doesn't really need to hold a lock as it is either
using per-CPU structures, atomically updated or designed to be used in
concurrent read situations (qht_lookup).

To keep things simple I removed the #ifdef CONFIG_USER_ONLY stuff as the
locks become NOPs anyway until the MTTCG work is completed.
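
The shape this gives tb_find_slow() is classic double-checked locking:
try the lock-free path first, then re-check under the lock before
generating code. A minimal sketch under stated assumptions (the
tb_*/qht machinery is reduced to stand-in functions, and a single
pthread mutex stands in for mmap_lock + tb_lock):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t tb_lock_m = PTHREAD_MUTEX_INITIALIZER;
static int *shared_tb;         /* stands in for the shared TB hash table */
static int generate_count;     /* how many times we "translated"         */

/* Stand-in for tb_find_physical(): lock-free lookup of an existing TB. */
static int *find_physical(void)
{
    return __atomic_load_n(&shared_tb, __ATOMIC_ACQUIRE);
}

/* Stand-in for tb_gen_code(): only ever called with the lock held. */
static int *gen_code(void)
{
    static int tb;
    generate_count++;
    __atomic_store_n(&shared_tb, &tb, __ATOMIC_RELEASE);
    return &tb;
}

/* The tb_find_slow() control flow after this patch: lock-free fast
 * path, then re-check under the lock (another thread may have
 * translated the block while we waited), generate only if still
 * missing. */
static int *find_slow(void)
{
    int *tb = find_physical();
    if (!tb) {
        pthread_mutex_lock(&tb_lock_m);   /* mmap_lock(); tb_lock();   */
        tb = find_physical();             /* may have raced in already */
        if (!tb) {
            tb = gen_code();
        }
        pthread_mutex_unlock(&tb_lock_m); /* tb_unlock(); mmap_unlock() */
    }
    return tb;
}
```

The re-check under the lock is what makes it safe to run the lookup
twice on a first miss, which is exactly the cost noted in the cover
letter.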

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v3
  - fix merge conflicts with Sergey's patch
v4
  - revert name tweaking
  - drop test jmp_list_next outside lock
  - mention lock NOPs in comments
---
 cpu-exec.c | 49 ++++++++++++++++++++++---------------------------
 1 file changed, 22 insertions(+), 27 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 10ce1cb..b731774 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -291,35 +291,29 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
      * Pairs with smp_wmb() in tb_phys_invalidate(). */
     smp_rmb();
     tb = tb_find_physical(cpu, pc, cs_base, flags);
-    if (tb) {
-        goto found;
-    }
+    if (!tb) {
 
-#ifdef CONFIG_USER_ONLY
-    /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
-     * taken outside tb_lock.  Since we're momentarily dropping
-     * tb_lock, there's a chance that our desired tb has been
-     * translated.
-     */
-    tb_unlock();
-    mmap_lock();
-    tb_lock();
-    tb = tb_find_physical(cpu, pc, cs_base, flags);
-    if (tb) {
-        mmap_unlock();
-        goto found;
-    }
-#endif
+        /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
+         * taken outside tb_lock. As system emulation is currently
+         * single threaded the locks are NOPs.
+         */
+        mmap_lock();
+        tb_lock();
 
-    /* if no translated code available, then translate it now */
-    tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
+        /* There's a chance that our desired tb has been translated while
+         * taking the locks so we check again inside the lock.
+         */
+        tb = tb_find_physical(cpu, pc, cs_base, flags);
+        if (!tb) {
+            /* if no translated code available, then translate it now */
+            tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
+        }
 
-#ifdef CONFIG_USER_ONLY
-    mmap_unlock();
-#endif
+        tb_unlock();
+        mmap_unlock();
+    }
 
-found:
-    /* we add the TB in the virtual pc hash table */
+    /* We add the TB in the virtual pc hash table for the fast lookup */
     atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
     return tb;
 }
@@ -337,7 +331,6 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
        always be the same before a given translated block
        is executed. */
     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
-    tb_lock();
     tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
     if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
                  tb->flags != flags)) {
@@ -359,11 +352,13 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
         *last_tb = NULL;
     }
 #endif
+
     /* See if we can patch the calling TB. */
     if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
+        tb_lock();
         tb_add_jump(*last_tb, tb_exit, tb);
+        tb_unlock();
     }
-    tb_unlock();
     return tb;
 }
 
-- 
2.7.4


* Re: [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock'
  2016-07-01 16:16 ` [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock' Alex Bennée
@ 2016-07-01 23:14   ` Richard Henderson
  2016-07-02  0:17   ` Emilio G. Cota
  1 sibling, 0 replies; 24+ messages in thread
From: Richard Henderson @ 2016-07-01 23:14 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo,
	serge.fdrv, cota, bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Sergey Fedorov, Peter Crosthwaite

On 07/01/2016 09:16 AM, Alex Bennée wrote:
> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> First, ensure atomicity of CPU's 'tb_jmp_cache' access by:
>  * using atomic_read() to look up a TB when not holding 'tb_lock';
>  * using atomic_write() to remove a TB from each CPU's local cache on
>    TB invalidation.
>
> Second, add some memory barriers to ensure we don't put the TB being
> invalidated back to CPU's 'tb_jmp_cache'. If we fail to look up a TB in
> CPU's local cache because it is being invalidated by some other thread
> then it must not be found in the shared TB hash table. Otherwise we'd
> put it back to CPU's local cache.
>
> Note that this patch does *not* make CPU's TLB invalidation safe if it
> is done from some other thread while the CPU is in its execution loop.
>
> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
> [AJB: fixed missing atomic set, tweak title]
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
> ---
> AJB:
>   - tweak title
>   - fixed missing set of tb_jmp_cache
> ---
>  cpu-exec.c      | 9 +++++++--
>  translate-all.c | 7 ++++++-
>  2 files changed, 13 insertions(+), 3 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~


* Re: [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path
  2016-07-01 16:16 ` [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path Alex Bennée
@ 2016-07-01 23:19   ` Richard Henderson
  2016-07-02  0:39   ` Emilio G. Cota
  1 sibling, 0 replies; 24+ messages in thread
From: Richard Henderson @ 2016-07-01 23:19 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo,
	serge.fdrv, cota, bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Peter Crosthwaite

On 07/01/2016 09:16 AM, Alex Bennée wrote:
> Lock contention in the hot path of moving between existing patched
> TranslationBlocks is the main drag in multithreaded performance. This
> patch pushes the tb_lock() usage down to the two places that really need
> it:
>
>   - code generation (tb_gen_code)
>   - jump patching (tb_add_jump)
>
> The rest of the code doesn't really need to hold a lock as it is either
> using per-CPU structures, atomically updated or designed to be used in
> concurrent read situations (qht_lookup).
>
> To keep things simple I removed the #ifdef CONFIG_USER_ONLY stuff as the
> locks become NOPs anyway until the MTTCG work is completed.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
> ---
> v3
>   - fix merge conflicts with Sergey's patch
> v4
>   - revert name tweaking
>   - drop test jmp_list_next outside lock
>   - mention lock NOPs in comments
> ---
>  cpu-exec.c | 49 ++++++++++++++++++++++---------------------------
>  1 file changed, 22 insertions(+), 27 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~


* Re: [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock'
  2016-07-01 16:16 ` [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock' Alex Bennée
  2016-07-01 23:14   ` Richard Henderson
@ 2016-07-02  0:17   ` Emilio G. Cota
  2016-07-02  0:32     ` Richard Henderson
  2016-07-02  7:09     ` Alex Bennée
  1 sibling, 2 replies; 24+ messages in thread
From: Emilio G. Cota @ 2016-07-02  0:17 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, bobby.prani,
	rth, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Sergey Fedorov, Peter Crosthwaite

On Fri, Jul 01, 2016 at 17:16:09 +0100, Alex Bennée wrote:
> From: Sergey Fedorov <serge.fdrv@gmail.com>
(snip)
> @@ -333,7 +338,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>         is executed. */
>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>      tb_lock();
> -    tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
> +    tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
>      if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>                   tb->flags != flags)) {
>          tb = tb_find_slow(cpu, pc, cs_base, flags);
> diff --git a/translate-all.c b/translate-all.c
> index eaa95e4..1fcfe79 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -1004,11 +1004,16 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>          invalidate_page_bitmap(p);
>      }
>  
> +    /* Ensure that we won't find the TB in the shared hash table
> +     * if we con't see it in CPU's local cache.

s/con't/can't/

> +     * Pairs with smp_rmb() in tb_find_slow(). */
> +    smp_wmb();

This fence is already embedded in qht_remove, since it internally
calls seqlock_write_end() on a successful removal, so we could get
away with a comment instead of emitting a redundant fence.
However, if qht ever changed its implementation this would have
to be taken into account. So I'd be OK with emitting the
fence here too.
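
For context on why that fence is (nearly) implied: a seqlock's write
side issues a write barrier on each side of the protected update, the
second one just before the final counter bump. A simplified sketch
following the general shape of QEMU's seqlock (names and details are
approximate, not the actual include/qemu/seqlock.h, and C11 fences stand
in for smp_wmb()):

```c
#include <assert.h>
#include <stdatomic.h>

/* Minimal seqlock write side: the sequence counter is odd while a
 * write is in progress and even when the data is stable. */
typedef struct {
    atomic_uint sequence;
    int value;                  /* the protected data */
} ToySeqLock;

static void seqlock_write_begin(ToySeqLock *sl)
{
    unsigned seq = atomic_load_explicit(&sl->sequence, memory_order_relaxed);
    atomic_store_explicit(&sl->sequence, seq + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);   /* like smp_wmb() */
}

static void seqlock_write_end(ToySeqLock *sl)
{
    unsigned seq = atomic_load_explicit(&sl->sequence, memory_order_relaxed);
    /* barrier before the counter goes even again: this is the store
     * fence a successful qht_remove() ends up issuing */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&sl->sequence, seq + 1, memory_order_relaxed);
}
```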

> +
>      /* remove the TB from the hash list */
>      h = tb_jmp_cache_hash_func(tb->pc);
>      CPU_FOREACH(cpu) {
>          if (cpu->tb_jmp_cache[h] == tb) {

Missing atomic_read here: if (atomic_read(cpu->tb_jmp_cache[...])) {

> -            cpu->tb_jmp_cache[h] = NULL;
> +            atomic_set(&cpu->tb_jmp_cache[h], NULL);

Other than that,

  Reviewed-by: Emilio G. Cota <cota@braap.org>


* Re: [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock'
  2016-07-02  0:17   ` Emilio G. Cota
@ 2016-07-02  0:32     ` Richard Henderson
  2016-07-04 22:33       ` Emilio G. Cota
  2016-07-02  7:09     ` Alex Bennée
  1 sibling, 1 reply; 24+ messages in thread
From: Richard Henderson @ 2016-07-02  0:32 UTC (permalink / raw)
  To: Emilio G. Cota, Alex Bennée
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, bobby.prani,
	mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Sergey Fedorov, Peter Crosthwaite

On 07/01/2016 05:17 PM, Emilio G. Cota wrote:
> On Fri, Jul 01, 2016 at 17:16:09 +0100, Alex Bennée wrote:
>> From: Sergey Fedorov <serge.fdrv@gmail.com>
> (snip)
>> @@ -333,7 +338,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>         is executed. */
>>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>>      tb_lock();
>> -    tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
>> +    tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
>>      if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>>                   tb->flags != flags)) {
>>          tb = tb_find_slow(cpu, pc, cs_base, flags);
>> diff --git a/translate-all.c b/translate-all.c
>> index eaa95e4..1fcfe79 100644
>> --- a/translate-all.c
>> +++ b/translate-all.c
>> @@ -1004,11 +1004,16 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>>          invalidate_page_bitmap(p);
>>      }
>>
>> +    /* Ensure that we won't find the TB in the shared hash table
>> +     * if we con't see it in CPU's local cache.
>
> s/con't/can't/
>
>> +     * Pairs with smp_rmb() in tb_find_slow(). */
>> +    smp_wmb();
>
> This fence is already embedded in qht_remove, since it internally
> calls seqlock_write_end() on a successful removal...

No.  There's stuff that happens after qht_remove and before this barrier: 
tb_page_remove and invalidate_page_bitmap.


r~


* Re: [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path
  2016-07-01 16:16 ` [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path Alex Bennée
  2016-07-01 23:19   ` Richard Henderson
@ 2016-07-02  0:39   ` Emilio G. Cota
  2016-07-04 11:45     ` Alex Bennée
  1 sibling, 1 reply; 24+ messages in thread
From: Emilio G. Cota @ 2016-07-02  0:39 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, bobby.prani,
	rth, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Peter Crosthwaite

On Fri, Jul 01, 2016 at 17:16:10 +0100, Alex Bennée wrote:
> Lock contention in the hot path of moving between existing patched
> TranslationBlocks is the main drag in multithreaded performance. This
> patch pushes the tb_lock() usage down to the two places that really need
> it:
> 
>   - code generation (tb_gen_code)
>   - jump patching (tb_add_jump)
> 
> The rest of the code doesn't really need to hold a lock as it is either
> using per-CPU structures, atomically updated or designed to be used in
> concurrent read situations (qht_lookup).
> 
> To keep things simple I removed the #ifdef CONFIG_USER_ONLY stuff as the
> locks become NOPs anyway until the MTTCG work is completed.

From a scalability point of view it would be better to have a single
critical section.

From a correctness point of view, we're reading tb->page_addr[1]
without holding a lock. This field is set after qht_insert(tb),
so we might read a yet-uninitialized value.

I propose to just extend the critical section, like we used to
do with tcg_lock_reset.

		Emilio


* Re: [Qemu-devel] [PATCH 0/2] Reduce lock contention on TCG hot-path
  2016-07-01 16:16 [Qemu-devel] [PATCH 0/2] Reduce lock contention on TCG hot-path Alex Bennée
  2016-07-01 16:16 ` [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock' Alex Bennée
  2016-07-01 16:16 ` [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path Alex Bennée
@ 2016-07-02  0:52 ` Emilio G. Cota
  2016-07-02  7:08   ` Alex Bennée
  2 siblings, 1 reply; 24+ messages in thread
From: Emilio G. Cota @ 2016-07-02  0:52 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, bobby.prani,
	rth, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana

On Fri, Jul 01, 2016 at 17:16:08 +0100, Alex Bennée wrote:
(snip)
> run 1: ret=0 (PASS), time=4.755824 (1/1)
> run 2: ret=0 (PASS), time=4.756076 (2/2)
> run 3: ret=0 (PASS), time=4.755916 (3/3)
> run 4: ret=0 (PASS), time=4.755853 (4/4)
> run 5: ret=0 (PASS), time=4.755929 (5/5)
> Results summary:
> 0: 5 times (100.00%), avg time 4.755920 (0.000000 deviation)

(snip)
> run 1: ret=0 (PASS), time=9.761559 (1/1)
> run 2: ret=0 (PASS), time=9.511616 (2/2)
> run 3: ret=0 (PASS), time=9.761713 (3/3)
> run 4: ret=0 (PASS), time=10.262504 (4/4)
> run 5: ret=0 (PASS), time=9.762059 (5/5)
> Results summary:
> 0: 5 times (100.00%), avg time 9.811890 (0.060150 deviation)

This is a needless diversion, but I was explaining this stuff today
to a student so couldn't help but notice.

The computed deviations seem overly small. For instance, the corrected sample
standard deviation ( https://en.wikipedia.org/wiki/Standard_deviation )
(which is usually referred to as "standard deviation", or "error")
for the last test should be 0.2742 instead of 0.06.

How are they being computed? I tried to find the source of your script
(in the kvm-unit-tests repo) but couldn't find it.

Thanks,

		Emilio


* Re: [Qemu-devel] [PATCH 0/2] Reduce lock contention on TCG hot-path
  2016-07-02  0:52 ` [Qemu-devel] [PATCH 0/2] Reduce lock contention on TCG hot-path Emilio G. Cota
@ 2016-07-02  7:08   ` Alex Bennée
  2016-07-02 16:03     ` Paolo Bonzini
  0 siblings, 1 reply; 24+ messages in thread
From: Alex Bennée @ 2016-07-02  7:08 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, bobby.prani,
	rth, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana


Emilio G. Cota <cota@braap.org> writes:

> On Fri, Jul 01, 2016 at 17:16:08 +0100, Alex Bennée wrote:
> (snip)
>> run 1: ret=0 (PASS), time=4.755824 (1/1)
>> run 2: ret=0 (PASS), time=4.756076 (2/2)
>> run 3: ret=0 (PASS), time=4.755916 (3/3)
>> run 4: ret=0 (PASS), time=4.755853 (4/4)
>> run 5: ret=0 (PASS), time=4.755929 (5/5)
>> Results summary:
>> 0: 5 times (100.00%), avg time 4.755920 (0.000000 deviation)
>
> (snip)
>> run 1: ret=0 (PASS), time=9.761559 (1/1)
>> run 2: ret=0 (PASS), time=9.511616 (2/2)
>> run 3: ret=0 (PASS), time=9.761713 (3/3)
>> run 4: ret=0 (PASS), time=10.262504 (4/4)
>> run 5: ret=0 (PASS), time=9.762059 (5/5)
>> Results summary:
>> 0: 5 times (100.00%), avg time 9.811890 (0.060150 deviation)
>
> This is a needless diversion, but I was explaining this stuff today
> to a student so couldn't help but notice.
>
> The computed deviations seem overly small. For instance, the corrected sample
> standard deviation ( https://en.wikipedia.org/wiki/Standard_deviation )
> (which is usually referred to as "standard deviation", or "error")
> for the last test should be 0.2742 instead of 0.06.

Hmm, I was doing it from memory, but it should be the mean of the
squared deviations:

        # calculate deviation
        deviation = 0
        for r in res:
            deviation += (r.time - avg_time)**2

        deviation = deviation / count

>
> How are they being computed? I tried to find the source of your script
> (in the kvm-unit-tests repo) but couldn't find it.

It's a retry script that got a little out of hand:

    https://github.com/stsquad/retry

>
> Thanks,
>
> 		Emilio


--
Alex Bennée


* Re: [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock'
  2016-07-02  0:17   ` Emilio G. Cota
  2016-07-02  0:32     ` Richard Henderson
@ 2016-07-02  7:09     ` Alex Bennée
  2016-07-04 22:31       ` Emilio G. Cota
  1 sibling, 1 reply; 24+ messages in thread
From: Alex Bennée @ 2016-07-02  7:09 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, bobby.prani,
	rth, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Sergey Fedorov, Peter Crosthwaite


Emilio G. Cota <cota@braap.org> writes:

> On Fri, Jul 01, 2016 at 17:16:09 +0100, Alex Bennée wrote:
>> From: Sergey Fedorov <serge.fdrv@gmail.com>
> (snip)
>> @@ -333,7 +338,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>         is executed. */
>>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>>      tb_lock();
>> -    tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
>> +    tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
>>      if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>>                   tb->flags != flags)) {
>>          tb = tb_find_slow(cpu, pc, cs_base, flags);
>> diff --git a/translate-all.c b/translate-all.c
>> index eaa95e4..1fcfe79 100644
>> --- a/translate-all.c
>> +++ b/translate-all.c
>> @@ -1004,11 +1004,16 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>>          invalidate_page_bitmap(p);
>>      }
>>
>> +    /* Ensure that we won't find the TB in the shared hash table
>> +     * if we con't see it in CPU's local cache.
>
> s/con't/can't/
>
>> +     * Pairs with smp_rmb() in tb_find_slow(). */
>> +    smp_wmb();
>
> This fence is already embedded in qht_remove, since it internally
> calls seqlock_write_end() on a successful removal, so we could get
> away with a comment instead of emitting a redundant fence.
> However, if qht ever changed its implementation this would have
> to be taken into account. So I'd be OK with emitting the
> fence here too.
>
>> +
>>      /* remove the TB from the hash list */
>>      h = tb_jmp_cache_hash_func(tb->pc);
>>      CPU_FOREACH(cpu) {
>>          if (cpu->tb_jmp_cache[h] == tb) {
>
> Missing atomic_read here: if (atomic_read(cpu->tb_jmp_cache[...])) {

Oops, good catch.

>
>> -            cpu->tb_jmp_cache[h] = NULL;
>> +            atomic_set(&cpu->tb_jmp_cache[h], NULL);
>
> Other than that,
>
>   Reviewed-by: Emilio G. Cota <cota@braap.org>


--
Alex Bennée


* Re: [Qemu-devel] [PATCH 0/2] Reduce lock contention on TCG hot-path
  2016-07-02  7:08   ` Alex Bennée
@ 2016-07-02 16:03     ` Paolo Bonzini
  0 siblings, 0 replies; 24+ messages in thread
From: Paolo Bonzini @ 2016-07-02 16:03 UTC (permalink / raw)
  To: Alex Bennée, Emilio G. Cota
  Cc: mttcg, peter.maydell, claudio.fontana, jan.kiszka, mark.burton,
	a.rigo, qemu-devel, serge.fdrv, bobby.prani, rth, fred.konrad



On 02/07/2016 09:08, Alex Bennée wrote:
> Hmm I was doing from memory but it should be the mean of the sum of the
> squares of the deviation:
> 
>         # calculate deviation
>         deviation = 0
>         for r in res:
>             deviation += (r.time - avg_time)**2
> 
>         deviation = deviation / count

This is the population variance.  "Standard deviation" and "variance"
usually refer to the sample standard deviation and sample variance,
where you divide by count-1.

For the population stdev and sample stdev you have to take the square root.

LibreOffice agrees with Emilio, giving a sample stdev of 0.2742 and a
population stdev of 0.2453

Paolo


* Re: [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path
  2016-07-02  0:39   ` Emilio G. Cota
@ 2016-07-04 11:45     ` Alex Bennée
  2016-07-04 22:30       ` Emilio G. Cota
  0 siblings, 1 reply; 24+ messages in thread
From: Alex Bennée @ 2016-07-04 11:45 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, bobby.prani,
	rth, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Peter Crosthwaite


Emilio G. Cota <cota@braap.org> writes:

> On Fri, Jul 01, 2016 at 17:16:10 +0100, Alex Bennée wrote:
>> Lock contention in the hot path of moving between existing patched
>> TranslationBlocks is the main drag in multithreaded performance. This
>> patch pushes the tb_lock() usage down to the two places that really need
>> it:
>>
>>   - code generation (tb_gen_code)
>>   - jump patching (tb_add_jump)
>>
>> The rest of the code doesn't really need to hold a lock as it is either
>> using per-CPU structures, atomically updated or designed to be used in
>> concurrent read situations (qht_lookup).
>>
>> To keep things simple I removed the #ifdef CONFIG_USER_ONLY stuff as the
>> locks become NOPs anyway until the MTTCG work is completed.
>
> From a scalability point of view it would be better to have a single
> critical section.

You mean merge the critical region for patching and code-generation?

>
> From a correctness point of view, we're reading tb->page_addr[1]
> without holding a lock. This field is set after qht_insert(tb),
> so we might read a yet-uninitialized value.
>
> I propose to just extend the critical section, like we used to
> do with tcg_lock_reset.
>
> 		Emilio


--
Alex Bennée


* Re: [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path
  2016-07-04 11:45     ` Alex Bennée
@ 2016-07-04 22:30       ` Emilio G. Cota
  2016-07-05 11:14         ` Alex Bennée
  0 siblings, 1 reply; 24+ messages in thread
From: Emilio G. Cota @ 2016-07-04 22:30 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, bobby.prani,
	rth, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Peter Crosthwaite

On Mon, Jul 04, 2016 at 12:45:52 +0100, Alex Bennée wrote:
> 
> Emilio G. Cota <cota@braap.org> writes:
> 
> > On Fri, Jul 01, 2016 at 17:16:10 +0100, Alex Bennée wrote:
> >> Lock contention in the hot path of moving between existing patched
> >> TranslationBlocks is the main drag in multithreaded performance. This
> >> patch pushes the tb_lock() usage down to the two places that really need
> >> it:
> >>
> >>   - code generation (tb_gen_code)
> >>   - jump patching (tb_add_jump)
> >>
> >> The rest of the code doesn't really need to hold a lock as it is either
> >> using per-CPU structures, atomically updated or designed to be used in
> >> concurrent read situations (qht_lookup).
> >>
> >> To keep things simple I removed the #ifdef CONFIG_USER_ONLY stuff as the
> >> locks become NOPs anyway until the MTTCG work is completed.
> >
> > From a scalability point of view it would be better to have a single
> > critical section.
> 
> You mean merge the critical region for patching and code-generation?

Yes, I'd keep the lock held and drop it (if it was held) after the patching
is done, like IIRC we used to do:
(snip)
> > I propose to just extend the critical section, like we used to
> > do with tcg_lock_reset.

		E.


* Re: [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock'
  2016-07-02  7:09     ` Alex Bennée
@ 2016-07-04 22:31       ` Emilio G. Cota
  2016-07-05 12:49         ` Paolo Bonzini
  0 siblings, 1 reply; 24+ messages in thread
From: Emilio G. Cota @ 2016-07-04 22:31 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, bobby.prani,
	rth, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Sergey Fedorov, Peter Crosthwaite

On Sat, Jul 02, 2016 at 08:09:35 +0100, Alex Bennée wrote:
> 
> Emilio G. Cota <cota@braap.org> writes:
> 
> > On Fri, Jul 01, 2016 at 17:16:09 +0100, Alex Bennée wrote:
> >> From: Sergey Fedorov <serge.fdrv@gmail.com>
> > (snip)
> >> @@ -333,7 +338,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
> >>         is executed. */
> >>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> >>      tb_lock();
> >> -    tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
> >> +    tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
> >>      if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
> >>                   tb->flags != flags)) {
> >>          tb = tb_find_slow(cpu, pc, cs_base, flags);
> >> diff --git a/translate-all.c b/translate-all.c
> >> index eaa95e4..1fcfe79 100644
> >> --- a/translate-all.c
> >> +++ b/translate-all.c
> >> @@ -1004,11 +1004,16 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
> >>          invalidate_page_bitmap(p);
> >>      }
> >>
> >> +    /* Ensure that we won't find the TB in the shared hash table
> >> +     * if we con't see it in CPU's local cache.
> >
> > s/con't/can't/
> >
> >> +     * Pairs with smp_rmb() in tb_find_slow(). */
> >> +    smp_wmb();
> >
> > This fence is already embedded in qht_remove, since it internally
> > calls seqlock_write_end() on a successful removal, so we could get
> > away with a comment instead of emitting a redundant fence.
> > However, if qht ever changed its implementation this would have
> > to be taken into account. So I'd be OK with emitting the
> > fence here too.
> >
> >> +
> >>      /* remove the TB from the hash list */
> >>      h = tb_jmp_cache_hash_func(tb->pc);
> >>      CPU_FOREACH(cpu) {
> >>          if (cpu->tb_jmp_cache[h] == tb) {
> >
> > Missing atomic_read here: if (atomic_read(cpu->tb_jmp_cache[...])) {
> 
> Oops, good catch.

My mistake. An atomic_read here isn't needed: as the commit message
points out, we only need atomic_read when tb_lock isn't held. In this
case tb_lock is held, so we only use atomic accesses for writing
to the array.

		E.


* Re: [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock'
  2016-07-02  0:32     ` Richard Henderson
@ 2016-07-04 22:33       ` Emilio G. Cota
  0 siblings, 0 replies; 24+ messages in thread
From: Emilio G. Cota @ 2016-07-04 22:33 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo,
	serge.fdrv, bobby.prani, mark.burton, pbonzini, jan.kiszka,
	peter.maydell, claudio.fontana, Sergey Fedorov,
	Peter Crosthwaite

On Fri, Jul 01, 2016 at 17:32:01 -0700, Richard Henderson wrote:
> On 07/01/2016 05:17 PM, Emilio G. Cota wrote:
> >On Fri, Jul 01, 2016 at 17:16:09 +0100, Alex Bennée wrote:
> >>From: Sergey Fedorov <serge.fdrv@gmail.com>
> >(snip)
> >>@@ -333,7 +338,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
> >>        is executed. */
> >>     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> >>     tb_lock();
> >>-    tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
> >>+    tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
> >>     if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
> >>                  tb->flags != flags)) {
> >>         tb = tb_find_slow(cpu, pc, cs_base, flags);
> >>diff --git a/translate-all.c b/translate-all.c
> >>index eaa95e4..1fcfe79 100644
> >>--- a/translate-all.c
> >>+++ b/translate-all.c
> >>@@ -1004,11 +1004,16 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
> >>         invalidate_page_bitmap(p);
> >>     }
> >>
> >>+    /* Ensure that we won't find the TB in the shared hash table
> >>+     * if we con't see it in CPU's local cache.
> >
> >s/con't/can't/
> >
> >>+     * Pairs with smp_rmb() in tb_find_slow(). */
> >>+    smp_wmb();
> >
> >This fence is already embedded in qht_remove, since it internally
> >calls seqlock_write_end() on a successful removal...
> 
> No.  There's stuff that happens after qht_remove and before this barrier:
> tb_page_remove and invalidate_page_bitmap.

I can't see how that "tb page" stuff you refer to is relevant here.

AFAICT the barrier pairing in this patch only applies to tb_jmp_cache
and qht. If as you say it does not, then all comments and the commit message
are wrong.

What am I missing?

		Emilio
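For reference, the smp_wmb()/smp_rmb() pairing being debated has the
classic message-passing shape. A minimal C11 sketch (illustrative only —
these are standard C11 fences and invented names, not QEMU's macros or
code):

```c
#include <stdatomic.h>

/* 'publish' stands in for the invalidation side: it updates the data,
 * then issues a release fence (the smp_wmb() analogue) before setting
 * the flag the other side polls.  'read_payload' stands in for the
 * lookup side: once it observes the flag, an acquire fence (the
 * smp_rmb() analogue) guarantees it also observes the data written
 * before the release fence. */
static atomic_int ready;
static int payload;

static void publish(void)
{
    payload = 42;                               /* plain data store */
    atomic_thread_fence(memory_order_release);  /* ~ smp_wmb()      */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

static int read_payload(void)
{
    while (!atomic_load_explicit(&ready, memory_order_relaxed)) {
        ;                                       /* spin until published */
    }
    atomic_thread_fence(memory_order_acquire);  /* ~ smp_rmb()      */
    return payload;
}
```

Whether qht_remove's internal seqlock_write_end() already provides the
release side is exactly the question at issue in this subthread.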


* Re: [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path
  2016-07-04 22:30       ` Emilio G. Cota
@ 2016-07-05 11:14         ` Alex Bennée
  2016-07-05 12:47           ` Paolo Bonzini
  0 siblings, 1 reply; 24+ messages in thread
From: Alex Bennée @ 2016-07-05 11:14 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, bobby.prani,
	rth, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Peter Crosthwaite


Emilio G. Cota <cota@braap.org> writes:

> On Mon, Jul 04, 2016 at 12:45:52 +0100, Alex Bennée wrote:
>>
>> Emilio G. Cota <cota@braap.org> writes:
>>
>> > On Fri, Jul 01, 2016 at 17:16:10 +0100, Alex Bennée wrote:
>> >> Lock contention in the hot path of moving between existing patched
>> >> TranslationBlocks is the main drag in multithreaded performance. This
>> >> patch pushes the tb_lock() usage down to the two places that really need
>> >> it:
>> >>
>> >>   - code generation (tb_gen_code)
>> >>   - jump patching (tb_add_jump)
>> >>
>> >> The rest of the code doesn't really need to hold a lock as it is either
>> >> using per-CPU structures, atomically updated or designed to be used in
>> >> concurrent read situations (qht_lookup).
>> >>
>> >> To keep things simple I removed the #ifdef CONFIG_USER_ONLY stuff as the
>> >> locks become NOPs anyway until the MTTCG work is completed.
>> >
>> > From a scalability point of view it would be better to have a single
>> > critical section.
>>
>> You mean merge the critical region for patching and code-generation?
>
> Yes, I'd keep the lock held and drop it (if it was held) after the patching
> is done, like IIRC we used to do:
> (snip)
>> > I propose to just extend the critical section, like we used to
>> > do with tcg_lock_reset.

Hmm, so I came up with this:

/*
 * Patch the last TB with a jump to the current TB.
 *
 * Modification of the TB has to be protected with tb_lock which can
 * either be already held or taken here.
 */
static inline void maybe_patch_last_tb(CPUState *cpu,
                                       TranslationBlock *tb,
                                       TranslationBlock **last_tb,
                                       int tb_exit,
                                       bool locked)
{
    if (cpu->tb_flushed) {
        /* Ensure that no TB jump will be modified as the
         * translation buffer has been flushed.
         */
        *last_tb = NULL;
        cpu->tb_flushed = false;
    }
#ifndef CONFIG_USER_ONLY
    /* We don't take care of direct jumps when address mapping changes in
     * system emulation. So it's not safe to make a direct jump to a TB
     * spanning two pages because the mapping for the second page can change.
     */
    if (tb->page_addr[1] != -1) {
        *last_tb = NULL;
    }
#endif
    /* See if we can patch the calling TB. */
    if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
        if (!locked) {
            tb_lock();
        }
        tb_add_jump(*last_tb, tb_exit, tb);
        if (!locked) {
            tb_unlock();
        }
    }
}

/*
 * tb_find - find next TB, possibly generating it
 *
 * There is a multi-level lookup for finding the next TB which avoids
 * locks unless generation is required.
 *
 * 1. Lookup via the per-vcpu tb_jmp_cache
 * 2. Lookup via tb_find_physical (using QHT)
 *
 * If both of those fail then we need to grab the mmap_lock and
 * tb_lock and do code generation.
 *
 * As the jump patching of code also needs to be protected by locks we
 * have multiple paths into maybe_patch_last_tb taking advantage of
 * the fact we may already have locks held for code generation.
 */
static TranslationBlock *tb_find(CPUState *cpu,
                                 TranslationBlock **last_tb,
                                 int tb_exit)
{
    CPUArchState *env = (CPUArchState *)cpu->env_ptr;
    TranslationBlock *tb;
    target_ulong cs_base, pc;
    unsigned int h;
    uint32_t flags;

    /* we record a subset of the CPU state. It will
       always be the same before a given translated block
       is executed. */
    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
    h = tb_jmp_cache_hash_func(pc);
    tb = atomic_read(&cpu->tb_jmp_cache[h]);

    if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
                 tb->flags != flags)) {

        /* Ensure that we won't find a TB in the shared hash table
         * if it is being invalidated by some other thread.
         * Otherwise we'd put it back to CPU's local cache.
         * Pairs with smp_wmb() in tb_phys_invalidate(). */
        smp_rmb();
        tb = tb_find_physical(cpu, pc, cs_base, flags);

        if (!tb) {
            /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
             * taken outside tb_lock. As system emulation is currently
             * single threaded the locks are NOPs.
             */
            mmap_lock();
            tb_lock();

            /* There's a chance that our desired tb has been translated while
             * taking the locks so we check again inside the lock.
             */
            tb = tb_find_physical(cpu, pc, cs_base, flags);
            if (!tb) {
                /* if no translated code available, then translate it now */
                tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
            }
            maybe_patch_last_tb(cpu, tb, last_tb, tb_exit, true);

            tb_unlock();
            mmap_unlock();
        } else {
            maybe_patch_last_tb(cpu, tb, last_tb, tb_exit, false);
        }

        /* We update the TB in the virtual pc hash table for the fast lookup */
        atomic_set(&cpu->tb_jmp_cache[h], tb);
    } else {
        maybe_patch_last_tb(cpu, tb, last_tb, tb_exit, false);
    }

    return tb;
}

But it doesn't seem to make much difference to the microbenchmark and I
think it makes the code a lot trickier to follow.

Is it worth it?

--
Alex Bennée


* Re: [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path
  2016-07-05 11:14         ` Alex Bennée
@ 2016-07-05 12:47           ` Paolo Bonzini
  2016-07-05 13:11             ` Alex Bennée
  0 siblings, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2016-07-05 12:47 UTC (permalink / raw)
  To: Alex Bennée, Emilio G. Cota
  Cc: mttcg, peter.maydell, claudio.fontana, Peter Crosthwaite,
	jan.kiszka, mark.burton, a.rigo, qemu-devel, serge.fdrv,
	bobby.prani, rth, fred.konrad



On 05/07/2016 13:14, Alex Bennée wrote:
> /*
>  * Patch the last TB with a jump to the current TB.
>  *
>  * Modification of the TB has to be protected with tb_lock which can
>  * either be already held or taken here.
>  */
> static inline void maybe_patch_last_tb(CPUState *cpu,
>                                        TranslationBlock *tb,
>                                        TranslationBlock **last_tb,
>                                        int tb_exit,
>                                        bool locked)
> {
>     if (cpu->tb_flushed) {
>         /* Ensure that no TB jump will be modified as the
>          * translation buffer has been flushed.
>          */
>         *last_tb = NULL;
>         cpu->tb_flushed = false;
>     }
> #ifndef CONFIG_USER_ONLY
>     /* We don't take care of direct jumps when address mapping changes in
>      * system emulation. So it's not safe to make a direct jump to a TB
>      * spanning two pages because the mapping for the second page can change.
>      */
>     if (tb->page_addr[1] != -1) {
>         *last_tb = NULL;
>     }
> #endif
>     /* See if we can patch the calling TB. */
>     if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
>         if (!locked) {
>             tb_lock();
>         }
>         tb_add_jump(*last_tb, tb_exit, tb);
>         if (!locked) {
>             tb_unlock();
>         }
>     }
> }

Why not add tb_lock_recursive() and tb_lock_reset()?

Paolo


* Re: [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock'
  2016-07-04 22:31       ` Emilio G. Cota
@ 2016-07-05 12:49         ` Paolo Bonzini
  0 siblings, 0 replies; 24+ messages in thread
From: Paolo Bonzini @ 2016-07-05 12:49 UTC (permalink / raw)
  To: Emilio G. Cota, Alex Bennée
  Cc: mttcg, peter.maydell, claudio.fontana, Sergey Fedorov,
	Peter Crosthwaite, jan.kiszka, mark.burton, a.rigo, qemu-devel,
	serge.fdrv, bobby.prani, rth, fred.konrad



On 05/07/2016 00:31, Emilio G. Cota wrote:
> My mistake. An atomic_read here isn't needed: as the commit message
> points out, we only need atomic_read when tb_lock isn't held. In this
> case tb_lock is held, so we only use atomic accesses for writing
> to the array.

It's harmless though.  In C11 and C++11 it would even be required, so I
think it's better to add it even though our compilers don't yet enforce it.
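The C11 point can be made concrete with a small sketch (hypothetical
names and sizes; this is not QEMU's tb_jmp_cache declaration — QEMU's
atomic_read()/atomic_set() correspond roughly to relaxed C11 atomics):

```c
#include <stdatomic.h>

#define TB_JMP_CACHE_SZ 4096    /* illustrative size, not QEMU's */

struct tb;                      /* opaque for this sketch */

/* In C11 terms, entries read outside the lock must be atomic loads on
 * an _Atomic pointer; a plain (non-atomic) read racing with the atomic
 * store would be a data race and therefore undefined behaviour, which
 * is why adding atomic_read even where tb_lock happens to be held is
 * the conservative choice. */
static _Atomic(struct tb *) tb_jmp_cache[TB_JMP_CACHE_SZ];

static struct tb *cache_get(unsigned h)
{
    return atomic_load_explicit(&tb_jmp_cache[h], memory_order_relaxed);
}

static void cache_set(unsigned h, struct tb *tb)
{
    atomic_store_explicit(&tb_jmp_cache[h], tb, memory_order_relaxed);
}
```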

Paolo


* Re: [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path
  2016-07-05 12:47           ` Paolo Bonzini
@ 2016-07-05 13:11             ` Alex Bennée
  2016-07-05 13:42               ` Paolo Bonzini
  0 siblings, 1 reply; 24+ messages in thread
From: Alex Bennée @ 2016-07-05 13:11 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Emilio G. Cota, mttcg, peter.maydell, claudio.fontana,
	Peter Crosthwaite, jan.kiszka, mark.burton, a.rigo, qemu-devel,
	serge.fdrv, bobby.prani, rth, fred.konrad


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 05/07/2016 13:14, Alex Bennée wrote:
>> /*
>>  * Patch the last TB with a jump to the current TB.
>>  *
>>  * Modification of the TB has to be protected with tb_lock which can
>>  * either be already held or taken here.
>>  */
>> static inline void maybe_patch_last_tb(CPUState *cpu,
>>                                        TranslationBlock *tb,
>>                                        TranslationBlock **last_tb,
>>                                        int tb_exit,
>>                                        bool locked)
>> {
>>     if (cpu->tb_flushed) {
>>         /* Ensure that no TB jump will be modified as the
>>          * translation buffer has been flushed.
>>          */
>>         *last_tb = NULL;
>>         cpu->tb_flushed = false;
>>     }
>> #ifndef CONFIG_USER_ONLY
>>     /* We don't take care of direct jumps when address mapping changes in
>>      * system emulation. So it's not safe to make a direct jump to a TB
>>      * spanning two pages because the mapping for the second page can change.
>>      */
>>     if (tb->page_addr[1] != -1) {
>>         *last_tb = NULL;
>>     }
>> #endif
>>     /* See if we can patch the calling TB. */
>>     if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
>>         if (!locked) {
>>             tb_lock();
>>         }
>>         tb_add_jump(*last_tb, tb_exit, tb);
>>         if (!locked) {
>>             tb_unlock();
>>         }
>>     }
>> }
>
> Why not add tb_lock_recursive() and tb_lock_reset()?

I thought we didn't like having recursive locking? I agree it would make
things a little neater though.

--
Alex Bennée


* Re: [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path
  2016-07-05 13:11             ` Alex Bennée
@ 2016-07-05 13:42               ` Paolo Bonzini
  2016-07-05 15:34                 ` Sergey Fedorov
  0 siblings, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2016-07-05 13:42 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, Peter Crosthwaite, jan.kiszka,
	claudio.fontana, a.rigo, qemu-devel, Emilio G. Cota, mark.burton,
	serge.fdrv, bobby.prani, fred.konrad, rth



On 05/07/2016 15:11, Alex Bennée wrote:
> 
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
>> On 05/07/2016 13:14, Alex Bennée wrote:
>>> /*
>>>  * Patch the last TB with a jump to the current TB.
>>>  *
>>>  * Modification of the TB has to be protected with tb_lock which can
>>>  * either be already held or taken here.
>>>  */
>>> static inline void maybe_patch_last_tb(CPUState *cpu,
>>>                                        TranslationBlock *tb,
>>>                                        TranslationBlock **last_tb,
>>>                                        int tb_exit,
>>>                                        bool locked)
>>> {
>>>     if (cpu->tb_flushed) {
>>>         /* Ensure that no TB jump will be modified as the
>>>          * translation buffer has been flushed.
>>>          */
>>>         *last_tb = NULL;
>>>         cpu->tb_flushed = false;
>>>     }
>>> #ifndef CONFIG_USER_ONLY
>>>     /* We don't take care of direct jumps when address mapping changes in
>>>      * system emulation. So it's not safe to make a direct jump to a TB
>>>      * spanning two pages because the mapping for the second page can change.
>>>      */
>>>     if (tb->page_addr[1] != -1) {
>>>         *last_tb = NULL;
>>>     }
>>> #endif
>>>     /* See if we can patch the calling TB. */
>>>     if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
>>>         if (!locked) {
>>>             tb_lock();
>>>         }
>>>         tb_add_jump(*last_tb, tb_exit, tb);
>>>         if (!locked) {
>>>             tb_unlock();
>>>         }
>>>     }
>>> }
>>
>> Why not add tb_lock_recursive() and tb_lock_reset()?
> 
> I thought we didn't like having recursive locking? I agree it would make
> things a little neater though.

I didn't like having recursive mutexes (because recursive mutexes
encourage you to be sloppy).  Explicitly tagging some tb_lock()s as
recursive is fine, though.  The explicit tag tells you to be careful.
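One possible shape of the suggested helpers (a sketch under assumed
names, not QEMU's actual implementation): the "_recursive" tag marks
call sites that may already hold the lock, and tb_lock_reset() makes
unlocking safe on any exit path.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t tb_lock_mutex = PTHREAD_MUTEX_INITIALIZER;
static __thread bool have_tb_lock;   /* per-thread "do I hold it" flag */

/* Take tb_lock only if this thread does not already hold it. */
static void tb_lock_recursive(void)
{
    if (!have_tb_lock) {
        pthread_mutex_lock(&tb_lock_mutex);
        have_tb_lock = true;
    }
}

/* Drop tb_lock if held; a no-op otherwise, so exit paths can call it
 * unconditionally. */
static void tb_lock_reset(void)
{
    if (have_tb_lock) {
        pthread_mutex_unlock(&tb_lock_mutex);
        have_tb_lock = false;
    }
}
```

This keeps the mutex itself non-recursive; only the explicitly tagged
call sites tolerate re-entry, which is the "be careful" marker Paolo
describes.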


Paolo


* Re: [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path
  2016-07-05 13:42               ` Paolo Bonzini
@ 2016-07-05 15:34                 ` Sergey Fedorov
  2016-07-05 16:00                   ` Alex Bennée
  0 siblings, 1 reply; 24+ messages in thread
From: Sergey Fedorov @ 2016-07-05 15:34 UTC (permalink / raw)
  To: Paolo Bonzini, Alex Bennée
  Cc: mttcg, peter.maydell, Peter Crosthwaite, jan.kiszka,
	claudio.fontana, a.rigo, qemu-devel, Emilio G. Cota, mark.burton,
	bobby.prani, fred.konrad, rth

On 05/07/16 16:42, Paolo Bonzini wrote:
>
> On 05/07/2016 15:11, Alex Bennée wrote:
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>>
>>> On 05/07/2016 13:14, Alex Bennée wrote:
>>>> /*
>>>>  * Patch the last TB with a jump to the current TB.
>>>>  *
>>>>  * Modification of the TB has to be protected with tb_lock which can
>>>>  * either be already held or taken here.
>>>>  */
>>>> static inline void maybe_patch_last_tb(CPUState *cpu,
>>>>                                        TranslationBlock *tb,
>>>>                                        TranslationBlock **last_tb,
>>>>                                        int tb_exit,
>>>>                                        bool locked)
>>>> {
>>>>     if (cpu->tb_flushed) {
>>>>         /* Ensure that no TB jump will be modified as the
>>>>          * translation buffer has been flushed.
>>>>          */
>>>>         *last_tb = NULL;
>>>>         cpu->tb_flushed = false;
>>>>     }
>>>> #ifndef CONFIG_USER_ONLY
>>>>     /* We don't take care of direct jumps when address mapping changes in
>>>>      * system emulation. So it's not safe to make a direct jump to a TB
>>>>      * spanning two pages because the mapping for the second page can change.
>>>>      */
>>>>     if (tb->page_addr[1] != -1) {
>>>>         *last_tb = NULL;
>>>>     }
>>>> #endif
>>>>     /* See if we can patch the calling TB. */
>>>>     if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
>>>>         if (!locked) {
>>>>             tb_lock();
>>>>         }
>>>>         tb_add_jump(*last_tb, tb_exit, tb);
>>>>         if (!locked) {
>>>>             tb_unlock();
>>>>         }
>>>>     }
>>>> }
>>> Why not add tb_lock_recursive() and tb_lock_reset()?
>> I thought we didn't like having recursive locking? I agree it would make
>> things a little neater though.
> I didn't like having recursive mutexes (because recursive mutexes
> encourage you to be sloppy).  Explicitly tagging some tb_lock()s as
> recursive is fine, though.  The explicit tag tells you to be careful.

We could also introduce mmap_lock_reset() similar to tb_lock_reset().
Then we can remove tb_unlock() and mmap_unlock() from tb_find_slow(),
call tb_lock() and mmap_lock() right before tb_add_jump(), and then call
tb_lock_reset() and mmap_lock_reset() at the end of tb_find_fast(). This
would render tb_lock() pretty useless though, since it would be always
held under mmap_lock().

Kind regards,
Sergey


* Re: [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path
  2016-07-05 15:34                 ` Sergey Fedorov
@ 2016-07-05 16:00                   ` Alex Bennée
  2016-07-05 16:04                     ` Sergey Fedorov
  0 siblings, 1 reply; 24+ messages in thread
From: Alex Bennée @ 2016-07-05 16:00 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: Paolo Bonzini, mttcg, peter.maydell, Peter Crosthwaite,
	jan.kiszka, claudio.fontana, a.rigo, qemu-devel, Emilio G. Cota,
	mark.burton, bobby.prani, fred.konrad, rth


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 05/07/16 16:42, Paolo Bonzini wrote:
>>
>> On 05/07/2016 15:11, Alex Bennée wrote:
>>> Paolo Bonzini <pbonzini@redhat.com> writes:
>>>
>>>> On 05/07/2016 13:14, Alex Bennée wrote:
>>>>> /*
>>>>>  * Patch the last TB with a jump to the current TB.
>>>>>  *
>>>>>  * Modification of the TB has to be protected with tb_lock which can
>>>>>  * either be already held or taken here.
>>>>>  */
>>>>> static inline void maybe_patch_last_tb(CPUState *cpu,
>>>>>                                        TranslationBlock *tb,
>>>>>                                        TranslationBlock **last_tb,
>>>>>                                        int tb_exit,
>>>>>                                        bool locked)
>>>>> {
>>>>>     if (cpu->tb_flushed) {
>>>>>         /* Ensure that no TB jump will be modified as the
>>>>>          * translation buffer has been flushed.
>>>>>          */
>>>>>         *last_tb = NULL;
>>>>>         cpu->tb_flushed = false;
>>>>>     }
>>>>> #ifndef CONFIG_USER_ONLY
>>>>>     /* We don't take care of direct jumps when address mapping changes in
>>>>>      * system emulation. So it's not safe to make a direct jump to a TB
>>>>>      * spanning two pages because the mapping for the second page can change.
>>>>>      */
>>>>>     if (tb->page_addr[1] != -1) {
>>>>>         *last_tb = NULL;
>>>>>     }
>>>>> #endif
>>>>>     /* See if we can patch the calling TB. */
>>>>>     if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
>>>>>         if (!locked) {
>>>>>             tb_lock();
>>>>>         }
>>>>>         tb_add_jump(*last_tb, tb_exit, tb);
>>>>>         if (!locked) {
>>>>>             tb_unlock();
>>>>>         }
>>>>>     }
>>>>> }
>>>> Why not add tb_lock_recursive() and tb_lock_reset()?
>>> I thought we didn't like having recursive locking? I agree it would make
>>> things a little neater though.
>> I didn't like having recursive mutexes (because recursive mutexes
>> encourage you to be sloppy).  Explicitly tagging some tb_lock()s as
>> recursive is fine, though.  The explicit tag tells you to be careful.
>
> We could also introduce mmap_lock_reset() similar to tb_lock_reset().
> Then we can remove tb_unlock() and mmap_unlock() from tb_find_slow(),
> call tb_lock() and mmap_lock() right before tb_add_jump(), and then call
> tb_lock_reset() and mmap_lock_reset() at the end of tb_find_fast(). This
> would render tb_lock() pretty useless though, since it would be always
> held under mmap_lock().

I'm about to post v2. I've put all this aggressive extension of the
critical path as additional patches as I'm not sure it is really worth
it.

The big win is doing the tb_jmp_cache and first tb_find_physical lookups
out of the lock. Trying to avoid bouncing the tb_lock() when patching
doesn't seem to make any difference in my micro benchmarks.

>
> Kind regards,
> Sergey


--
Alex Bennée


* Re: [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path
  2016-07-05 16:00                   ` Alex Bennée
@ 2016-07-05 16:04                     ` Sergey Fedorov
  0 siblings, 0 replies; 24+ messages in thread
From: Sergey Fedorov @ 2016-07-05 16:04 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Paolo Bonzini, mttcg, peter.maydell, Peter Crosthwaite,
	jan.kiszka, claudio.fontana, a.rigo, qemu-devel, Emilio G. Cota,
	mark.burton, bobby.prani, fred.konrad, rth

On 05/07/16 19:00, Alex Bennée wrote:
> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>
>> On 05/07/16 16:42, Paolo Bonzini wrote:
>>> On 05/07/2016 15:11, Alex Bennée wrote:
>>>> Paolo Bonzini <pbonzini@redhat.com> writes:
>>>>
>>>>> On 05/07/2016 13:14, Alex Bennée wrote:
>>>>>> /*
>>>>>>  * Patch the last TB with a jump to the current TB.
>>>>>>  *
>>>>>>  * Modification of the TB has to be protected with tb_lock which can
>>>>>>  * either be already held or taken here.
>>>>>>  */
>>>>>> static inline void maybe_patch_last_tb(CPUState *cpu,
>>>>>>                                        TranslationBlock *tb,
>>>>>>                                        TranslationBlock **last_tb,
>>>>>>                                        int tb_exit,
>>>>>>                                        bool locked)
>>>>>> {
>>>>>>     if (cpu->tb_flushed) {
>>>>>>         /* Ensure that no TB jump will be modified as the
>>>>>>          * translation buffer has been flushed.
>>>>>>          */
>>>>>>         *last_tb = NULL;
>>>>>>         cpu->tb_flushed = false;
>>>>>>     }
>>>>>> #ifndef CONFIG_USER_ONLY
>>>>>>     /* We don't take care of direct jumps when address mapping changes in
>>>>>>      * system emulation. So it's not safe to make a direct jump to a TB
>>>>>>      * spanning two pages because the mapping for the second page can change.
>>>>>>      */
>>>>>>     if (tb->page_addr[1] != -1) {
>>>>>>         *last_tb = NULL;
>>>>>>     }
>>>>>> #endif
>>>>>>     /* See if we can patch the calling TB. */
>>>>>>     if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
>>>>>>         if (!locked) {
>>>>>>             tb_lock();
>>>>>>         }
>>>>>>         tb_add_jump(*last_tb, tb_exit, tb);
>>>>>>         if (!locked) {
>>>>>>             tb_unlock();
>>>>>>         }
>>>>>>     }
>>>>>> }
>>>>> Why not add tb_lock_recursive() and tb_lock_reset()?
>>>> I thought we didn't like having recursive locking? I agree it would make
>>>> things a little neater though.
>>> I didn't like having recursive mutexes (because recursive mutexes
>>> encourage you to be sloppy).  Explicitly tagging some tb_lock()s as
>>> recursive is fine, though.  The explicit tag tells you to be careful.
>> We could also introduce mmap_lock_reset() similar to tb_lock_reset().
>> Then we can remove tb_unlock() and mmap_unlock() from tb_find_slow(),
>> call tb_lock() and mmap_lock() right before tb_add_jump(), and then call
>> tb_lock_reset() and mmap_lock_reset() at the end of tb_find_fast(). This
>> would render tb_lock() pretty useless though, since it would be always
>> held under mmap_lock().
> I'm about to post v2. I've put all this aggressive extension of the
> critical path as additional patches as I'm not sure it is really worth
> it.
>
> The big win is doing the tb_jmp_cache and first tb_find_physical lookups
> out of the lock. Trying to avoid bouncing the tb_lock() when patching
> doesn't seem to make any difference in my micro benchmarks.

I still have a feeling that we almost don't need both tb_lock() and
mmap_lock(). Extending tb_lock() across tb_gen_code() and tb_add_jump()
would be required should we decide to merge the locks. :)

Thanks,
Sergey


end of thread, other threads:[~2016-07-05 16:04 UTC | newest]

Thread overview: 24+ messages
2016-07-01 16:16 [Qemu-devel] [PATCH 0/2] Reduce lock contention on TCG hot-path Alex Bennée
2016-07-01 16:16 ` [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock' Alex Bennée
2016-07-01 23:14   ` Richard Henderson
2016-07-02  0:17   ` Emilio G. Cota
2016-07-02  0:32     ` Richard Henderson
2016-07-04 22:33       ` Emilio G. Cota
2016-07-02  7:09     ` Alex Bennée
2016-07-04 22:31       ` Emilio G. Cota
2016-07-05 12:49         ` Paolo Bonzini
2016-07-01 16:16 ` [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path Alex Bennée
2016-07-01 23:19   ` Richard Henderson
2016-07-02  0:39   ` Emilio G. Cota
2016-07-04 11:45     ` Alex Bennée
2016-07-04 22:30       ` Emilio G. Cota
2016-07-05 11:14         ` Alex Bennée
2016-07-05 12:47           ` Paolo Bonzini
2016-07-05 13:11             ` Alex Bennée
2016-07-05 13:42               ` Paolo Bonzini
2016-07-05 15:34                 ` Sergey Fedorov
2016-07-05 16:00                   ` Alex Bennée
2016-07-05 16:04                     ` Sergey Fedorov
2016-07-02  0:52 ` [Qemu-devel] [PATCH 0/2] Reduce lock contention on TCG hot-path Emilio G. Cota
2016-07-02  7:08   ` Alex Bennée
2016-07-02 16:03     ` Paolo Bonzini
