All of lore.kernel.org
* [Qemu-devel] [PATCH 0/4] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches
@ 2016-09-14 21:23 Lluís Vilanova
  2016-09-14 21:23 ` [Qemu-devel] [PATCH 1/4] exec: [tcg] Refactor flush of per-CPU virtual TB cache Lluís Vilanova
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Lluís Vilanova @ 2016-09-14 21:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: Eric Blake, Eduardo Habkost, Stefan Hajnoczi

Avoids generating TCG code that calls execution-time tracing events on vCPUs
that are not dynamically tracing those events.

Currently, events with the 'tcg' property always generate TCG code to trace the
event at guest code execution time, and their dynamic tracing state is only
checked when that code runs.

This series adds a performance optimization where TCG code for events with the
'tcg' and 'vcpu' properties is not generated if the event is dynamically
disabled. This optimization raises two issues:

* An event can be dynamically disabled/enabled after the corresponding TCG code
  has been generated (in which case a new TB with the corresponding code must
  be used).

* Each vCPU can have a different dynamic state for the same event (e.g., when
  tracing the memory accesses of only one process pinned to a vCPU).

To handle both issues, this series replicates the shared physical TB cache,
creating a separate physical TB cache for every combination of event states
(those with the 'vcpu' and 'tcg' properties). Then, all vCPUs tracing the same
events will use the same physical TB cache.
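As a rough illustration of how vCPUs map onto the replicated caches, the following standalone C sketch (not QEMU code) keeps each vCPU's event states in a plain `unsigned long` bitmask and uses that bitmask directly as the index into an array of 2^E caches. `TBCache`, `VCPU`, `NUM_VCPU_EVENTS` and the helper functions are hypothetical stand-ins; in QEMU each physical TB cache is a `struct qht`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical E: number of events with the 'vcpu' and 'tcg' properties. */
#define NUM_VCPU_EVENTS 3
/* One physical TB cache per combination of event states: 2^E caches. */
#define NUM_TB_CACHES (1ul << NUM_VCPU_EVENTS)

/* Stand-in for a physical TB cache (a 'struct qht' in QEMU). */
typedef struct {
    unsigned long id;
} TBCache;

/* Stand-in for CPUState: bit i set means event i is traced on this vCPU. */
typedef struct {
    unsigned long trace_dstate;
} VCPU;

static TBCache tb_caches[NUM_TB_CACHES];

static void tb_caches_init(void)
{
    for (unsigned long i = 0; i < NUM_TB_CACHES; i++) {
        tb_caches[i].id = i;
    }
}

static void vcpu_set_event(VCPU *cpu, unsigned event, bool enable)
{
    assert(event < NUM_VCPU_EVENTS);
    if (enable) {
        cpu->trace_dstate |= 1ul << event;
    } else {
        cpu->trace_dstate &= ~(1ul << event);
    }
}

/* The event-state bitmask is used directly as the cache index. */
static TBCache *vcpu_tb_cache(const VCPU *cpu)
{
    assert(cpu->trace_dstate < NUM_TB_CACHES);
    return &tb_caches[cpu->trace_dstate];
}
```

Two vCPUs with identical bitmasks resolve to the same cache object, so TBs translated on behalf of one are reused by the other; toggling an event only changes which array entry a vCPU looks up.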

Sharing physical TBs makes this very space-efficient (only the physical TB
caches, simple arrays of pointers, are replicated). Sharing physical TB caches
also maximizes TB reuse across vCPUs whenever possible, and makes dynamic event
state changes cheap (a vCPU simply switches to a different TB array).

The physical TB cache array is indexed with the vCPU's trace event state
bitmask. This is simpler and more efficient than emitting TCG code that checks
at run time whether an event needs tracing: with that approach, the tracing
call would still have to be either moved to a cold path (making tracing slower)
or left inlined (making non-tracing execution slower).

It is also more efficient than eliding TCG code only when *zero* vCPUs are
tracing an event, since with that approach enabling the event on a single vCPU
would also slow down all other vCPUs that are not tracing it.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
---

Lluís Vilanova (4):
      exec: [tcg] Refactor flush of per-CPU virtual TB cache
      exec: [tcg] Use multiple physical TB caches
      exec: [tcg] Switch physical TB cache based on vCPU tracing state
      trace: [tcg] Do not generate TCG code to trace dynamically-disabled events


 cpu-exec.c                               |   11 ++++
 cputlb.c                                 |    2 -
 include/exec/exec-all.h                  |   12 ++++
 include/exec/tb-context.h                |    2 -
 include/qom/cpu.h                        |    4 +
 qom/cpu.c                                |    1 
 scripts/tracetool/backend/dtrace.py      |    2 -
 scripts/tracetool/backend/ftrace.py      |   20 ++++---
 scripts/tracetool/backend/log.py         |   16 +++---
 scripts/tracetool/backend/simple.py      |    2 -
 scripts/tracetool/backend/syslog.py      |    6 +-
 scripts/tracetool/backend/ust.py         |    2 -
 scripts/tracetool/format/h.py            |   23 ++++++--
 scripts/tracetool/format/tcg_h.py        |   20 ++++++-
 scripts/tracetool/format/tcg_helper_c.py |    3 +
 trace/control-target.c                   |    2 +
 trace/control.h                          |    3 +
 translate-all.c                          |   83 ++++++++++++++++++++++++++----
 translate-all.h                          |   43 ++++++++++++++++
 translate-all.inc.h                      |   13 +++++
 20 files changed, 221 insertions(+), 49 deletions(-)
 create mode 100644 translate-all.inc.h


To: qemu-devel@nongnu.org
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Eric Blake <eblake@redhat.com>


* [Qemu-devel] [PATCH 1/4] exec: [tcg] Refactor flush of per-CPU virtual TB cache
  2016-09-14 21:23 [Qemu-devel] [PATCH 0/4] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches Lluís Vilanova
@ 2016-09-14 21:23 ` Lluís Vilanova
  2016-09-14 21:23 ` [Qemu-devel] [PATCH 2/4] exec: [tcg] Use multiple physical TB caches Lluís Vilanova
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Lluís Vilanova @ 2016-09-14 21:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eric Blake, Eduardo Habkost, Stefan Hajnoczi, Paolo Bonzini,
	Peter Crosthwaite, Richard Henderson

The new tb_flush_jmp_cache_all() helper is reused in later patches.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
---
 cputlb.c                |    2 +-
 include/exec/exec-all.h |    6 ++++++
 translate-all.c         |    9 +++++++--
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index d068ee5..686a09c 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -81,7 +81,7 @@ void tlb_flush(CPUState *cpu, int flush_global)
 
     memset(env->tlb_table, -1, sizeof(env->tlb_table));
     memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
-    memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
+    tb_flush_jmp_cache_all(cpu);
 
     env->vtlb_index = 0;
     env->tlb_flush_addr = -1;
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index d008296..e2124dc 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -265,6 +265,12 @@ struct TranslationBlock {
 };
 
 void tb_free(TranslationBlock *tb);
+/**
+ * tb_flush_jmp_cache_all:
+ *
+ * Flush the virtual translation block cache.
+ */
+void tb_flush_jmp_cache_all(CPUState *cpu);
 void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
 
diff --git a/translate-all.c b/translate-all.c
index 0dd6466..ebd9fa0 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -851,8 +851,7 @@ void tb_flush(CPUState *cpu)
     tcg_ctx.tb_ctx.nb_tbs = 0;
 
     CPU_FOREACH(cpu) {
-        memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
-        cpu->tb_flushed = true;
+        tb_flush_jmp_cache_all(cpu);
     }
 
     qht_reset_size(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
@@ -1579,6 +1578,12 @@ void tb_check_watchpoint(CPUState *cpu)
     }
 }
 
+void tb_flush_jmp_cache_all(CPUState *cpu)
+{
+    memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
+    cpu->tb_flushed = true;
+}
+
 #ifndef CONFIG_USER_ONLY
 /* in deterministic execution mode, instructions doing device I/Os
    must be at the end of the TB */


* [Qemu-devel] [PATCH 2/4] exec: [tcg] Use multiple physical TB caches
  2016-09-14 21:23 [Qemu-devel] [PATCH 0/4] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches Lluís Vilanova
  2016-09-14 21:23 ` [Qemu-devel] [PATCH 1/4] exec: [tcg] Refactor flush of per-CPU virtual TB cache Lluís Vilanova
@ 2016-09-14 21:23 ` Lluís Vilanova
  2016-09-14 21:23 ` [Qemu-devel] [PATCH 3/4] exec: [tcg] Switch physical TB cache based on vCPU tracing state Lluís Vilanova
  2016-09-14 21:23 ` [Qemu-devel] [PATCH 4/4] trace: [tcg] Do not generate TCG code to trace dynamically-disabled events Lluís Vilanova
  3 siblings, 0 replies; 8+ messages in thread
From: Lluís Vilanova @ 2016-09-14 21:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eric Blake, Eduardo Habkost, Stefan Hajnoczi, Paolo Bonzini,
	Peter Crosthwaite, Richard Henderson

The physical TB cache is split into 2^E caches, where E is the number of
events with the "vcpu" property and without the "disable" property.
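The sizing rule can be sketched in isolation as below, assuming (as the compile-time check added to translate-all.c does) that E + 1 fits in the bits of an `unsigned long`, so the whole event bitmap lives in its first word and can be read as an integer index. `TRACE_VCPU_EVENT_COUNT` is given a made-up value of 4 here, and `tb_cache_index()` is a hypothetical helper loosely mirroring `tb_caches_get()`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical value of E for this example. */
#define TRACE_VCPU_EVENT_COUNT 4

/* Same bound as the patch: E + 1 must fit in one unsigned long. */
_Static_assert(TRACE_VCPU_EVENT_COUNT + 1 <= sizeof(unsigned long) * CHAR_BIT,
               "too many per-vCPU events to use the bitmap as an index");

/* A one-word bitmap, like DECLARE_BITMAP(name, E) when E < BITS_PER_LONG. */
typedef unsigned long EventBitmap[1];

/* 2^E physical TB caches, one per combination of event states. */
static size_t tb_caches_count(void)
{
    return (size_t)1 << TRACE_VCPU_EVENT_COUNT;
}

/* Hypothetical helper: read the bitmap's only word as the cache index. */
static size_t tb_cache_index(const unsigned long *bitmap)
{
    size_t idx = bitmap[0];
    assert(idx < tb_caches_count());
    return idx;
}
```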

The virtual TB cache on each vCPU uses a (potentially) different
physical TB cache.

This is later exploited to support different tracing event states on a
per-vCPU basis.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
---
 cpu-exec.c                |    5 ++++
 include/exec/exec-all.h   |    6 +++++
 include/exec/tb-context.h |    2 +-
 include/qom/cpu.h         |    4 +++-
 qom/cpu.c                 |    1 +
 translate-all.c           |   51 +++++++++++++++++++++++++++++++++++++--------
 translate-all.h           |   17 +++++++++++++++
 translate-all.inc.h       |   13 +++++++++++
 8 files changed, 87 insertions(+), 12 deletions(-)
 create mode 100644 translate-all.inc.h

diff --git a/cpu-exec.c b/cpu-exec.c
index 5d9710a..7b2d8c6 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -33,6 +33,7 @@
 #include "hw/i386/apic.h"
 #endif
 #include "sysemu/replay.h"
+#include "translate-all.h"
 
 /* -icount align implementation. */
 
@@ -267,6 +268,7 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
     tb_page_addr_t phys_pc;
     struct tb_desc desc;
     uint32_t h;
+    struct qht *qht;
 
     desc.env = (CPUArchState *)cpu->env_ptr;
     desc.cs_base = cs_base;
@@ -275,7 +277,8 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
     phys_pc = get_page_addr_code(desc.env, pc);
     desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
     h = tb_hash_func(phys_pc, pc, flags);
-    return qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h);
+    qht = tb_caches_get(&tcg_ctx.tb_ctx, cpu->tb_cache_idx);
+    return qht_lookup(qht, tb_cmp, &desc, h);
 }
 
 static TranslationBlock *tb_find_slow(CPUState *cpu,
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index e2124dc..4ae04f6 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -211,6 +211,10 @@ static inline void tlb_flush_by_mmuidx(CPUState *cpu, ...)
 #define USE_DIRECT_JUMP
 #endif
 
+/**
+ * TranslationBlock:
+ * @tb_cache_idx: Index of physical TB cache where this TB has been allocated.
+ */
 struct TranslationBlock {
     target_ulong pc;   /* simulated PC corresponding to this block (EIP + CS base) */
     target_ulong cs_base; /* CS base for this block */
@@ -262,6 +266,8 @@ struct TranslationBlock {
      */
     uintptr_t jmp_list_next[2];
     uintptr_t jmp_list_first;
+
+    DECLARE_BITMAP(tb_cache_idx, TRACE_VCPU_EVENT_COUNT);
 };
 
 void tb_free(TranslationBlock *tb);
diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index dce95d9..7728904 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -32,7 +32,7 @@ typedef struct TBContext TBContext;
 struct TBContext {
 
     TranslationBlock *tbs;
-    struct qht htable;
+    struct qht *htables;
     int nb_tbs;
     /* any access to the tbs or the page table must use this lock */
     QemuMutex tb_lock;
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index ce0c406..d870810 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -282,6 +282,7 @@ struct qemu_work_item {
  * @kvm_fd: vCPU file descriptor for KVM.
  * @work_mutex: Lock to prevent multiple access to queued_work_*.
  * @queued_work_first: First asynchronous work pending.
+ * @tb_cache_idx: Index of current TB cache.
  * @trace_dstate: Dynamic tracing state of events for this vCPU (bitmask).
  *
  * State of one CPU core or thread.
@@ -350,7 +351,8 @@ struct CPUState {
     struct KVMState *kvm_state;
     struct kvm_run *kvm_run;
 
-    /* Used for events with 'vcpu' and *without* the 'disabled' properties */
+    /* Used for events with 'vcpu' and *without* the 'disable' properties */
+    DECLARE_BITMAP(tb_cache_idx, TRACE_VCPU_EVENT_COUNT);
     DECLARE_BITMAP(trace_dstate, TRACE_VCPU_EVENT_COUNT);
 
     /* TODO Move common fields from CPUArchState here. */
diff --git a/qom/cpu.c b/qom/cpu.c
index 2553247..2225103 100644
--- a/qom/cpu.c
+++ b/qom/cpu.c
@@ -345,6 +345,7 @@ static void cpu_common_initfn(Object *obj)
     qemu_mutex_init(&cpu->work_mutex);
     QTAILQ_INIT(&cpu->breakpoints);
     QTAILQ_INIT(&cpu->watchpoints);
+    bitmap_zero(cpu->tb_cache_idx, TRACE_VCPU_EVENT_COUNT);
     bitmap_zero(cpu->trace_dstate, TRACE_VCPU_EVENT_COUNT);
 }
 
diff --git a/translate-all.c b/translate-all.c
index ebd9fa0..c864eee 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -733,11 +733,22 @@ static inline void code_gen_alloc(size_t tb_size)
     qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
 }
 
+/*
+ * Ensure bitmaps can be used as indexes.
+ */
+QEMU_BUILD_BUG_ON(
+    (TRACE_VCPU_EVENT_COUNT + 1) > BITS_PER_LONG);
+
 static void tb_htable_init(void)
 {
+    int cache;
     unsigned int mode = QHT_MODE_AUTO_RESIZE;
 
-    qht_init(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE, mode);
+    tcg_ctx.tb_ctx.htables = g_malloc(
+        sizeof(tcg_ctx.tb_ctx.htables[0]) * tb_caches_count());
+    for (cache = 0; cache < tb_caches_count(); cache++) {
+        qht_init(&tcg_ctx.tb_ctx.htables[cache], CODE_GEN_HTABLE_SIZE, mode);
+    }
 }
 
 /* Must be called before using the QEMU cpus. 'tb_size' is the size
@@ -834,6 +845,8 @@ static void page_flush_tb(void)
 /* XXX: tb_flush is currently not thread safe */
 void tb_flush(CPUState *cpu)
 {
+    int i;
+
     if (!tcg_enabled()) {
         return;
     }
@@ -854,7 +867,9 @@ void tb_flush(CPUState *cpu)
         tb_flush_jmp_cache_all(cpu);
     }
 
-    qht_reset_size(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
+    for (i = 0; i < tb_caches_count(); i++) {
+        qht_reset_size(&tcg_ctx.tb_ctx.htables[i], CODE_GEN_HTABLE_SIZE);
+    }
     page_flush_tb();
 
     tcg_ctx.code_gen_ptr = tcg_ctx.code_gen_buffer;
@@ -879,8 +894,12 @@ do_tb_invalidate_check(struct qht *ht, void *p, uint32_t hash, void *userp)
 
 static void tb_invalidate_check(target_ulong address)
 {
+    int i;
+
     address &= TARGET_PAGE_MASK;
-    qht_iter(&tcg_ctx.tb_ctx.htable, do_tb_invalidate_check, &address);
+    for (i = 0; i < tb_caches_count(); i++) {
+        qht_iter(&tcg_ctx.tb_ctx.htables[i], do_tb_invalidate_check, &address);
+    }
 }
 
 static void
@@ -900,7 +919,10 @@ do_tb_page_check(struct qht *ht, void *p, uint32_t hash, void *userp)
 /* verify that all the pages have correct rights for code */
 static void tb_page_check(void)
 {
-    qht_iter(&tcg_ctx.tb_ctx.htable, do_tb_page_check, NULL);
+    int i;
+    for (i = 0; i < tb_caches_count(); i++) {
+        qht_iter(&tcg_ctx.tb_ctx.htables[i], do_tb_page_check, NULL);
+    }
 }
 
 #endif
@@ -987,12 +1009,14 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
     CPUState *cpu;
     PageDesc *p;
     uint32_t h;
+    struct qht *qht;
     tb_page_addr_t phys_pc;
 
     /* remove the TB from the hash list */
     phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
     h = tb_hash_func(phys_pc, tb->pc, tb->flags);
-    qht_remove(&tcg_ctx.tb_ctx.htable, tb, h);
+    qht = tb_caches_get(&tcg_ctx.tb_ctx, tb->tb_cache_idx);
+    qht_remove(qht, tb, h);
 
     /* remove the TB from the page list */
     if (tb->page_addr[0] != page_addr) {
@@ -1122,10 +1146,12 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
                          tb_page_addr_t phys_page2)
 {
     uint32_t h;
+    struct qht *qht;
 
     /* add in the hash table */
     h = tb_hash_func(phys_pc, tb->pc, tb->flags);
-    qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);
+    qht = tb_caches_get(&tcg_ctx.tb_ctx, tb->tb_cache_idx);
+    qht_insert(qht, tb, h);
 
     /* add in the page list */
     tb_alloc_page(tb, 0, phys_pc & TARGET_PAGE_MASK);
@@ -1175,6 +1201,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tb->cs_base = cs_base;
     tb->flags = flags;
     tb->cflags = cflags;
+    bitmap_copy(tb->tb_cache_idx, ENV_GET_CPU(env)->tb_cache_idx,
+                TRACE_VCPU_EVENT_COUNT);
 
 #ifdef CONFIG_PROFILER
     tcg_ctx.tb_count1++; /* includes aborted translations because of
@@ -1636,6 +1664,8 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
     pc = tb->pc;
     cs_base = tb->cs_base;
     flags = tb->flags;
+    /* XXX: It is OK to invalidate only this TB, as this is the one triggering
+     * the memory access */
     tb_phys_invalidate(tb, -1);
     if (tb->cflags & CF_NOCACHE) {
         if (tb->orig_tb) {
@@ -1715,6 +1745,7 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
     int direct_jmp_count, direct_jmp2_count, cross_page;
     TranslationBlock *tb;
     struct qht_stats hst;
+    int cache;
 
     target_code_size = 0;
     max_target_code_size = 0;
@@ -1766,9 +1797,11 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
                 tcg_ctx.tb_ctx.nb_tbs ? (direct_jmp2_count * 100) /
                         tcg_ctx.tb_ctx.nb_tbs : 0);
 
-    qht_statistics_init(&tcg_ctx.tb_ctx.htable, &hst);
-    print_qht_statistics(f, cpu_fprintf, hst);
-    qht_statistics_destroy(&hst);
+    for (cache = 0; cache < tb_caches_count(); cache++) {
+        qht_statistics_init(&tcg_ctx.tb_ctx.htables[cache], &hst);
+        print_qht_statistics(f, cpu_fprintf, hst);
+        qht_statistics_destroy(&hst);
+    }
 
     cpu_fprintf(f, "\nStatistics:\n");
     cpu_fprintf(f, "TB flush count      %d\n", tcg_ctx.tb_ctx.tb_flush_count);
diff --git a/translate-all.h b/translate-all.h
index ba8e4d6..d39bf32 100644
--- a/translate-all.h
+++ b/translate-all.h
@@ -20,7 +20,21 @@
 #define TRANSLATE_ALL_H
 
 #include "exec/exec-all.h"
+#include "qemu/typedefs.h"
 
+/**
+ * tb_caches_count:
+ *
+ * Number of TB caches.
+ */
+static size_t tb_caches_count(void);
+
+/**
+ * tb_caches_get:
+ *
+ * Get the TB cache for the given bitmap index.
+ */
+static struct qht *tb_caches_get(TBContext *tb_ctx, unsigned long *bitmap);
 
 /* translate-all.c */
 void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len);
@@ -33,4 +47,7 @@ void tb_check_watchpoint(CPUState *cpu);
 int page_unprotect(target_ulong address, uintptr_t pc);
 #endif
 
+
+#include "translate-all.inc.h"
+
 #endif /* TRANSLATE_ALL_H */
diff --git a/translate-all.inc.h b/translate-all.inc.h
new file mode 100644
index 0000000..c60a48e
--- /dev/null
+++ b/translate-all.inc.h
@@ -0,0 +1,13 @@
+/* Inline implementations for translate-all.h */
+
+static inline size_t tb_caches_count(void)
+{
+    return 1ULL << TRACE_VCPU_EVENT_COUNT;
+}
+
+static inline struct qht *tb_caches_get(TBContext *tb_ctx,
+                                        unsigned long *bitmap)
+{
+    unsigned long idx = *bitmap;
+    return &tb_ctx->htables[idx];
+}


* [Qemu-devel] [PATCH 3/4] exec: [tcg] Switch physical TB cache based on vCPU tracing state
  2016-09-14 21:23 [Qemu-devel] [PATCH 0/4] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches Lluís Vilanova
  2016-09-14 21:23 ` [Qemu-devel] [PATCH 1/4] exec: [tcg] Refactor flush of per-CPU virtual TB cache Lluís Vilanova
  2016-09-14 21:23 ` [Qemu-devel] [PATCH 2/4] exec: [tcg] Use multiple physical TB caches Lluís Vilanova
@ 2016-09-14 21:23 ` Lluís Vilanova
  2016-09-15 12:57   ` Lluís Vilanova
  2016-09-14 21:23 ` [Qemu-devel] [PATCH 4/4] trace: [tcg] Do not generate TCG code to trace dynamically-disabled events Lluís Vilanova
  3 siblings, 1 reply; 8+ messages in thread
From: Lluís Vilanova @ 2016-09-14 21:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eric Blake, Eduardo Habkost, Stefan Hajnoczi, Paolo Bonzini,
	Peter Crosthwaite, Richard Henderson

Uses the per-vCPU event state in CPUState->trace_dstate (a bitmap) as an
index to a physical TB cache that will contain code specific to the set
of dynamically enabled events.

Two vCPUs tracing different events will execute code from different
physical TB caches. Two vCPUs tracing the same events will execute code
from the same physical TB cache.

This is used by the next patch to optimize TCG code related to event
tracing.
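The lazy request/apply protocol can be modelled in isolation as below. This is a simplified sketch, not QEMU code: `VCPU`, `set_event_state()`, `between_tbs()` and the field names are hypothetical, bitmaps are a single `unsigned long`, and `exit_req` stands in for `cpu->tcg_exit_req`. Note that the apply step copies *from* the requested state (`trace_dstate`) into the active cache index.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define JMP_CACHE_SIZE 8

typedef struct {
    unsigned long trace_dstate;  /* requested event state (tracing control) */
    unsigned long tb_cache_idx;  /* state the vCPU is currently executing with */
    bool exit_req;               /* stand-in for cpu->tcg_exit_req */
    const void *jmp_cache[JMP_CACHE_SIZE]; /* per-vCPU virtual TB cache */
    const void *last_tb;         /* TB that the next TB may be chained to */
} VCPU;

/* Request: update the wanted state and kick the vCPU out of its TB. */
static void set_event_state(VCPU *cpu, unsigned ev, bool on)
{
    if (on) {
        cpu->trace_dstate |= 1ul << ev;
    } else {
        cpu->trace_dstate &= ~(1ul << ev);
    }
    cpu->exit_req = true;
}

static bool switch_requested(const VCPU *cpu)
{
    return cpu->trace_dstate != cpu->tb_cache_idx;
}

/* Apply: adopt the requested state and drop stale virtual-cache entries. */
static void switch_apply(VCPU *cpu)
{
    cpu->tb_cache_idx = cpu->trace_dstate;
    memset(cpu->jmp_cache, 0, sizeof(cpu->jmp_cache));
}

/* Stand-in for the check in cpu_handle_interrupt(), run between TBs. */
static void between_tbs(VCPU *cpu)
{
    cpu->exit_req = false;
    if (switch_requested(cpu)) {
        switch_apply(cpu);
        cpu->last_tb = NULL; /* do not chain TBs across physical TB caches */
    }
}
```

Because the switch only happens between TBs, a vCPU finishes its current TB with the old state, which is what the TODO and the note added to trace/control.h describe.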

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
---
 cpu-exec.c             |    6 ++++++
 trace/control-target.c |    2 ++
 trace/control.h        |    3 +++
 translate-all.c        |   23 +++++++++++++++++++++++
 translate-all.h        |   26 ++++++++++++++++++++++++++
 5 files changed, 60 insertions(+)

diff --git a/cpu-exec.c b/cpu-exec.c
index 7b2d8c6..14fc44c 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -29,6 +29,7 @@
 #include "qemu/rcu.h"
 #include "exec/tb-hash.h"
 #include "exec/log.h"
+#include "translate-all.h"
 #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
 #include "hw/i386/apic.h"
 #endif
@@ -512,6 +513,11 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
             *last_tb = NULL;
         }
     }
+    if (unlikely(cpu_tb_cache_set_requested(cpu))) {
+        cpu_tb_cache_set_apply(cpu);
+        /* avoid chaining TBs across physical TB caches */
+        *last_tb = NULL;
+    }
     if (unlikely(cpu->exit_request || replay_has_interrupt())) {
         cpu->exit_request = 0;
         cpu->exception_index = EXCP_INTERRUPT;
diff --git a/trace/control-target.c b/trace/control-target.c
index 72081e2..2d854a7 100644
--- a/trace/control-target.c
+++ b/trace/control-target.c
@@ -79,5 +79,7 @@ void trace_event_set_vcpu_state_dynamic(CPUState *vcpu,
             clear_bit(vcpu_id, vcpu->trace_dstate);
             trace_events_dstate[id]--;
         }
+        /* TODO: do not wait until the current TB finishes */
+        cpu_tb_cache_set_request(vcpu);
     }
 }
diff --git a/trace/control.h b/trace/control.h
index 27a16fc..ca88682 100644
--- a/trace/control.h
+++ b/trace/control.h
@@ -210,6 +210,9 @@ void trace_event_set_state_dynamic(TraceEvent *ev, bool state);
  * Set the dynamic tracing state of an event for the given vCPU.
  *
  * Pre-condition: trace_event_get_vcpu_state_static(ev) == true
+ *
+ * Note: Changes for execution-time events with the 'tcg' property will not be
+ *       propagated until the next TB is executed (iff executing in TCG mode).
  */
 void trace_event_set_vcpu_state_dynamic(CPUState *vcpu,
                                         TraceEvent *ev, bool state);
diff --git a/translate-all.c b/translate-all.c
index c864eee..c306cf4 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1166,6 +1166,29 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
 #endif
 }
 
+void cpu_tb_cache_set_request(CPUState *cpu)
+{
+    /*
+     * Request is taken from cpu->trace_dstate and lazily applied into
+     * cpu->tb_cache_idx at cpu_tb_cache_set_apply().
+     */
+    /* NOTE: Checked by all TBs in gen_tb_start(). */
+    cpu->tcg_exit_req = true;
+}
+
+bool cpu_tb_cache_set_requested(CPUState *cpu)
+{
+    return !bitmap_equal(cpu->trace_dstate, cpu->tb_cache_idx,
+                         TRACE_VCPU_EVENT_COUNT);
+}
+
+void cpu_tb_cache_set_apply(CPUState *cpu)
+{
+    bitmap_copy(cpu->tb_cache_idx, cpu->trace_dstate,
+                TRACE_VCPU_EVENT_COUNT);
+    tb_flush_jmp_cache_all(cpu);
+}
+
 /* Called with mmap_lock held for user mode emulation.  */
 TranslationBlock *tb_gen_code(CPUState *cpu,
                               target_ulong pc, target_ulong cs_base,
diff --git a/translate-all.h b/translate-all.h
index d39bf32..fcc7fb0 100644
--- a/translate-all.h
+++ b/translate-all.h
@@ -36,6 +36,32 @@ static size_t tb_caches_count(void);
  */
 static struct qht *tb_caches_get(TBContext *tb_ctx, unsigned long *bitmap);
 
+/**
+ * cpu_tb_cache_set_request:
+ *
+ * Request a physical TB cache switch on this @cpu.
+ */
+void cpu_tb_cache_set_request(CPUState *cpu);
+
+/**
+ * cpu_tb_cache_set_requested:
+ *
+ * Returns: %true if @cpu requested a physical TB cache switch, %false
+ *          otherwise.
+ */
+bool cpu_tb_cache_set_requested(CPUState *cpu);
+
+/**
+ * cpu_tb_cache_set_apply:
+ *
+ * Apply a physical TB cache switch.
+ *
+ * Precondition: @cpu is not currently executing any TB.
+ *
+ * Note: Invalidates the jump cache of the given vCPU.
+ */
+void cpu_tb_cache_set_apply(CPUState *cpu);
+
 /* translate-all.c */
 void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len);
 void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,


* [Qemu-devel] [PATCH 4/4] trace: [tcg] Do not generate TCG code to trace dynamically-disabled events
  2016-09-14 21:23 [Qemu-devel] [PATCH 0/4] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches Lluís Vilanova
                   ` (2 preceding siblings ...)
  2016-09-14 21:23 ` [Qemu-devel] [PATCH 3/4] exec: [tcg] Switch physical TB cache based on vCPU tracing state Lluís Vilanova
@ 2016-09-14 21:23 ` Lluís Vilanova
  2016-09-15 12:55   ` Daniel P. Berrange
  3 siblings, 1 reply; 8+ messages in thread
From: Lluís Vilanova @ 2016-09-14 21:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: Eric Blake, Eduardo Habkost, Stefan Hajnoczi

If an event is dynamically disabled, the TCG code that calls its
execution-time tracer is not generated.

This removes the overhead of execution-time tracers for dynamically disabled
events. As a bonus, it also avoids re-checking the event state when the
execution-time tracer is called from TCG-generated code (if the event were
disabled on that vCPU, TCG would not have generated the call in the first
place).
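To make the split concrete, here is a hand-written C approximation of what the generated code amounts to for one hypothetical per-vCPU event, `guest_mem`. It is not tracetool output: the names, the single-word bitmap, and the counters standing in for the tracing backend and for `gen_helper_*()` emission are all assumptions. `nocheck_trace_guest_mem()` plays the role of the generated `__nocheck__` tracer (renamed here to avoid a reserved identifier).

```c
#include <assert.h>
#include <stdbool.h>

#define EV_GUEST_MEM 0 /* hypothetical event bit */

typedef struct {
    unsigned long trace_dstate; /* per-vCPU dynamic event state */
} VCPU;

static int trace_records;        /* records written by the (fake) backend */
static int helper_calls_emitted; /* helper calls placed into translated code */

static bool vcpu_event_enabled(const VCPU *cpu, unsigned ev)
{
    return (cpu->trace_dstate >> ev) & 1ul;
}

/* Backend call with no state check (cf. the generated __nocheck__ tracer). */
static inline void nocheck_trace_guest_mem(VCPU *cpu, unsigned long addr)
{
    (void)cpu;
    (void)addr;
    trace_records++;
}

/* Checked wrapper for ordinary callers: tests the per-vCPU state first. */
static inline void trace_guest_mem(VCPU *cpu, unsigned long addr)
{
    if (vcpu_event_enabled(cpu, EV_GUEST_MEM)) {
        nocheck_trace_guest_mem(cpu, addr);
    }
}

/* Execution-time helper: no check, translation already performed it. */
static void helper_trace_guest_mem_exec(VCPU *cpu, unsigned long addr)
{
    nocheck_trace_guest_mem(cpu, addr);
}

/* Translation time: only emit the helper call if the vCPU traces the event. */
static void gen_trace_guest_mem(const VCPU *cpu)
{
    if (vcpu_event_enabled(cpu, EV_GUEST_MEM)) {
        helper_calls_emitted++; /* stands in for gen_helper_...() */
    }
}
```

A vCPU with the event disabled gets translated code with no helper call at all, while a vCPU with it enabled calls the helper, which records unconditionally.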

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
---
 scripts/tracetool/backend/dtrace.py      |    2 +-
 scripts/tracetool/backend/ftrace.py      |   20 ++++++++++----------
 scripts/tracetool/backend/log.py         |   16 ++++++++--------
 scripts/tracetool/backend/simple.py      |    2 +-
 scripts/tracetool/backend/syslog.py      |    6 +++---
 scripts/tracetool/backend/ust.py         |    2 +-
 scripts/tracetool/format/h.py            |   23 +++++++++++++++++------
 scripts/tracetool/format/tcg_h.py        |   20 +++++++++++++++++---
 scripts/tracetool/format/tcg_helper_c.py |    3 ++-
 9 files changed, 60 insertions(+), 34 deletions(-)

diff --git a/scripts/tracetool/backend/dtrace.py b/scripts/tracetool/backend/dtrace.py
index ab9ecfa..20242f2 100644
--- a/scripts/tracetool/backend/dtrace.py
+++ b/scripts/tracetool/backend/dtrace.py
@@ -41,6 +41,6 @@ def generate_h_begin(events):
 
 
 def generate_h(event):
-    out('        QEMU_%(uppername)s(%(argnames)s);',
+    out('    QEMU_%(uppername)s(%(argnames)s);',
         uppername=event.name.upper(),
         argnames=", ".join(event.args.names()))
diff --git a/scripts/tracetool/backend/ftrace.py b/scripts/tracetool/backend/ftrace.py
index 80dcf30..d798c71 100644
--- a/scripts/tracetool/backend/ftrace.py
+++ b/scripts/tracetool/backend/ftrace.py
@@ -30,17 +30,17 @@ def generate_h(event):
     if len(event.args) > 0:
         argnames = ", " + argnames
 
-    out('        {',
-        '            char ftrace_buf[MAX_TRACE_STRLEN];',
-        '            int unused __attribute__ ((unused));',
-        '            int trlen;',
-        '            if (trace_event_get_state(%(event_id)s)) {',
-        '                trlen = snprintf(ftrace_buf, MAX_TRACE_STRLEN,',
-        '                                 "%(name)s " %(fmt)s "\\n" %(argnames)s);',
-        '                trlen = MIN(trlen, MAX_TRACE_STRLEN - 1);',
-        '                unused = write(trace_marker_fd, ftrace_buf, trlen);',
-        '            }',
+    out('    {',
+        '        char ftrace_buf[MAX_TRACE_STRLEN];',
+        '        int unused __attribute__ ((unused));',
+        '        int trlen;',
+        '        if (trace_event_get_state(%(event_id)s)) {',
+        '            trlen = snprintf(ftrace_buf, MAX_TRACE_STRLEN,',
+        '                             "%(name)s " %(fmt)s "\\n" %(argnames)s);',
+        '            trlen = MIN(trlen, MAX_TRACE_STRLEN - 1);',
+        '            unused = write(trace_marker_fd, ftrace_buf, trlen);',
         '        }',
+        '    }',
         name=event.name,
         args=event.args,
         event_id="TRACE_" + event.name.upper(),
diff --git a/scripts/tracetool/backend/log.py b/scripts/tracetool/backend/log.py
index b3ff064..6818147 100644
--- a/scripts/tracetool/backend/log.py
+++ b/scripts/tracetool/backend/log.py
@@ -36,14 +36,14 @@ def generate_h(event):
     else:
         cond = "trace_event_get_state(%s)" % ("TRACE_" + event.name.upper())
 
-    out('        if (%(cond)s) {',
-        '            struct timeval _now;',
-        '            gettimeofday(&_now, NULL);',
-        '            qemu_log_mask(LOG_TRACE, "%%d@%%zd.%%06zd:%(name)s " %(fmt)s "\\n",',
-        '                          getpid(),',
-        '                          (size_t)_now.tv_sec, (size_t)_now.tv_usec',
-        '                          %(argnames)s);',
-        '        }',
+    out('    if (%(cond)s) {',
+        '        struct timeval _now;',
+        '        gettimeofday(&_now, NULL);',
+        '        qemu_log_mask(LOG_TRACE, "%%d@%%zd.%%06zd:%(name)s " %(fmt)s "\\n",',
+        '                      getpid(),',
+        '                      (size_t)_now.tv_sec, (size_t)_now.tv_usec',
+        '                      %(argnames)s);',
+        '    }',
         cond=cond,
         name=event.name,
         fmt=event.fmt.rstrip("\n"),
diff --git a/scripts/tracetool/backend/simple.py b/scripts/tracetool/backend/simple.py
index 1bccada..4acf23f 100644
--- a/scripts/tracetool/backend/simple.py
+++ b/scripts/tracetool/backend/simple.py
@@ -36,7 +36,7 @@ def generate_h_begin(events):
 
 
 def generate_h(event):
-    out('        _simple_%(api)s(%(args)s);',
+    out('    _simple_%(api)s(%(args)s);',
         api=event.api(),
         args=", ".join(event.args.names()))
 
diff --git a/scripts/tracetool/backend/syslog.py b/scripts/tracetool/backend/syslog.py
index 89019bc..b355121 100644
--- a/scripts/tracetool/backend/syslog.py
+++ b/scripts/tracetool/backend/syslog.py
@@ -36,9 +36,9 @@ def generate_h(event):
     else:
         cond = "trace_event_get_state(%s)" % ("TRACE_" + event.name.upper())
 
-    out('        if (%(cond)s) {',
-        '            syslog(LOG_INFO, "%(name)s " %(fmt)s %(argnames)s);',
-        '        }',
+    out('    if (%(cond)s) {',
+        '        syslog(LOG_INFO, "%(name)s " %(fmt)s %(argnames)s);',
+        '    }',
         cond=cond,
         name=event.name,
         fmt=event.fmt.rstrip("\n"),
diff --git a/scripts/tracetool/backend/ust.py b/scripts/tracetool/backend/ust.py
index ed4c227..88d13e2 100644
--- a/scripts/tracetool/backend/ust.py
+++ b/scripts/tracetool/backend/ust.py
@@ -30,6 +30,6 @@ def generate_h(event):
     if len(event.args) > 0:
         argnames = ", " + argnames
 
-    out('        tracepoint(qemu, %(name)s%(tp_args)s);',
+    out('    tracepoint(qemu, %(name)s%(tp_args)s);',
         name=event.name,
         tp_args=argnames)
diff --git a/scripts/tracetool/format/h.py b/scripts/tracetool/format/h.py
index 3763e9a..99fcbc0 100644
--- a/scripts/tracetool/format/h.py
+++ b/scripts/tracetool/format/h.py
@@ -29,6 +29,19 @@ def generate(events, backend):
     backend.generate_begin(events)
 
     for e in events:
+        # tracer without checks
+        out('',
+            'static inline void __nocheck__%(api)s(%(args)s)',
+            '{',
+            api=e.api(),
+            args=e.args)
+
+        if "disable" not in e.properties:
+            backend.generate(e)
+
+        out('}')
+
+        # tracer wrapper with checks (per-vCPU tracing)
         if "vcpu" in e.properties:
             trace_cpu = next(iter(e.args))[1]
             cond = "trace_event_get_vcpu_state(%(cpu)s,"\
@@ -44,16 +57,14 @@ def generate(events, backend):
             'static inline void %(api)s(%(args)s)',
             '{',
             '    if (%(cond)s) {',
+            '        __nocheck__%(api)s(%(names)s);',
+            '    }',
+            '}',
             api=e.api(),
             args=e.args,
+            names=", ".join(e.args.names()),
             cond=cond)
 
-        if "disable" not in e.properties:
-            backend.generate(e)
-
-        out('    }',
-            '}')
-
     backend.generate_end(events)
 
     out('#endif /* TRACE__GENERATED_TRACERS_H */')
diff --git a/scripts/tracetool/format/tcg_h.py b/scripts/tracetool/format/tcg_h.py
index e2331f2..fb2503a 100644
--- a/scripts/tracetool/format/tcg_h.py
+++ b/scripts/tracetool/format/tcg_h.py
@@ -41,7 +41,7 @@ def generate(events, backend):
 
     for e in events:
         # just keep one of them
-        if "tcg-trans" not in e.properties:
+        if "tcg-exec" not in e.properties:
             continue
 
         out('static inline void %(name_tcg)s(%(args)s)',
@@ -53,12 +53,26 @@ def generate(events, backend):
             args_trans = e.original.event_trans.args
             args_exec = tracetool.vcpu.transform_args(
                 "tcg_helper_c", e.original.event_exec, "wrapper")
+            if "vcpu" in e.properties:
+                trace_cpu = e.args.names()[0]
+                cond = "trace_event_get_vcpu_state(%(cpu)s,"\
+                       " TRACE_%(id)s,"\
+                       " TRACE_VCPU_%(id)s)"\
+                       % dict(
+                           cpu=trace_cpu,
+                           id=e.original.event_exec.name.upper())
+            else:
+                cond = "true"
+
             out('    %(name_trans)s(%(argnames_trans)s);',
-                '    gen_helper_%(name_exec)s(%(argnames_exec)s);',
+                '    if (%(cond)s) {',
+                '        gen_helper_%(name_exec)s(%(argnames_exec)s);',
+                '    }',
                 name_trans=e.original.event_trans.api(e.QEMU_TRACE),
                 name_exec=e.original.event_exec.api(e.QEMU_TRACE),
                 argnames_trans=", ".join(args_trans.names()),
-                argnames_exec=", ".join(args_exec.names()))
+                argnames_exec=", ".join(args_exec.names()),
+                cond=cond)
 
         out('}')
 
diff --git a/scripts/tracetool/format/tcg_helper_c.py b/scripts/tracetool/format/tcg_helper_c.py
index e3485b7..f9adb3c 100644
--- a/scripts/tracetool/format/tcg_helper_c.py
+++ b/scripts/tracetool/format/tcg_helper_c.py
@@ -66,7 +66,8 @@ def generate(events, backend):
 
         out('void %(name_tcg)s(%(args_api)s)',
             '{',
-            '    %(name)s(%(args_call)s);',
+            # NOTE: the check was already performed at TCG-generation time
+            '    __nocheck__%(name)s(%(args_call)s);',
             '}',
             name_tcg="helper_%s_proxy" % e.api(),
             name=e.api(),

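The tcg_h.py and tcg_helper_c.py hunks above move the event-state check from execution time to translation time. The following Python sketch models that flow under simplifying assumptions. `VCPU`, `translate()`, `run()` and the event name `guest_mem` are hypothetical stand-ins. A translated block is represented as a list of Python callables rather than TCG ops.

```python
# Minimal model (not QEMU code) of the translation-time check that patch 4/4
# adds to format/tcg_h.py, plus the unchecked helper proxy generated by
# format/tcg_helper_c.py. Event names and the VCPU class are hypothetical.


class VCPU:
    def __init__(self, enabled_events=()):
        # Stands in for the per-vCPU event state bitmap (trace_dstate).
        self.enabled = set(enabled_events)


def nocheck_exec_tracer(event, log):
    # Like __nocheck__trace_*: records the event without checking its state.
    log.append("exec " + event)


def translate(cpu, event, log):
    """Return the ops 'emitted' for one traced guest instruction.

    The state check happens here, once, while translating. If the event is
    disabled for this vCPU, no helper call is emitted at all.
    """
    ops = []
    if event in cpu.enabled:
        ops.append(lambda: nocheck_exec_tracer(event, log))
    return ops


def run(ops, times):
    # Executing a translated block just replays its ops; no state checks.
    for _ in range(times):
        for op in ops:
            op()


log = []
traced, untraced = VCPU({"guest_mem"}), VCPU()
ops_traced = translate(traced, "guest_mem", log)
ops_untraced = translate(untraced, "guest_mem", log)
run(ops_traced, 3)
run(ops_untraced, 3)
print(log)  # three "exec guest_mem" entries, all from the traced vCPU
```

`ops_traced` would keep calling the tracer even if the event were later disabled on that vCPU. This is why patches 2/4 and 3/4 switch the vCPU to a different physical TB cache when its event state changes.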

* Re: [Qemu-devel] [PATCH 4/4] trace: [tcg] Do not generate TCG code to trace dynamically-disabled events
  2016-09-14 21:23 ` [Qemu-devel] [PATCH 4/4] trace: [tcg] Do not generate TCG code to trace dynamically-disabled events Lluís Vilanova
@ 2016-09-15 12:55   ` Daniel P. Berrange
  2016-09-15 14:24     ` Lluís Vilanova
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel P. Berrange @ 2016-09-15 12:55 UTC
  To: Lluís Vilanova; +Cc: qemu-devel, Eduardo Habkost, Stefan Hajnoczi

On Wed, Sep 14, 2016 at 11:23:38PM +0200, Lluís Vilanova wrote:
> If an event is dynamically disabled, the TCG code that calls the
> execution-time tracer is not generated.
> 
> Removes the overhead of execution-time tracers for dynamically disabled
> events. As a bonus, it also avoids re-checking the event state when the
> execution-time tracer is called from TCG-generated code (if the event were
> disabled, TCG would not have generated the call at all).
> 
> Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
> ---
>  scripts/tracetool/backend/dtrace.py      |    2 +-
>  scripts/tracetool/backend/ftrace.py      |   20 ++++++++++----------
>  scripts/tracetool/backend/log.py         |   16 ++++++++--------
>  scripts/tracetool/backend/simple.py      |    2 +-
>  scripts/tracetool/backend/syslog.py      |    6 +++---
>  scripts/tracetool/backend/ust.py         |    2 +-
>  scripts/tracetool/format/h.py            |   23 +++++++++++++++++------
>  scripts/tracetool/format/tcg_h.py        |   20 +++++++++++++++++---
>  scripts/tracetool/format/tcg_helper_c.py |    3 ++-
>  9 files changed, 60 insertions(+), 34 deletions(-)
> 
> diff --git a/scripts/tracetool/backend/dtrace.py b/scripts/tracetool/backend/dtrace.py
> index ab9ecfa..20242f2 100644
> --- a/scripts/tracetool/backend/dtrace.py
> +++ b/scripts/tracetool/backend/dtrace.py
> @@ -41,6 +41,6 @@ def generate_h_begin(events):
>  
>  
>  def generate_h(event):
> -    out('        QEMU_%(uppername)s(%(argnames)s);',
> +    out('    QEMU_%(uppername)s(%(argnames)s);',
>          uppername=event.name.upper(),
>          argnames=", ".join(event.args.names()))
> diff --git a/scripts/tracetool/backend/ftrace.py b/scripts/tracetool/backend/ftrace.py
> index 80dcf30..d798c71 100644
> --- a/scripts/tracetool/backend/ftrace.py
> +++ b/scripts/tracetool/backend/ftrace.py
> @@ -30,17 +30,17 @@ def generate_h(event):
>      if len(event.args) > 0:
>          argnames = ", " + argnames
>  
> -    out('        {',
> -        '            char ftrace_buf[MAX_TRACE_STRLEN];',
> -        '            int unused __attribute__ ((unused));',
> -        '            int trlen;',
> -        '            if (trace_event_get_state(%(event_id)s)) {',
> -        '                trlen = snprintf(ftrace_buf, MAX_TRACE_STRLEN,',
> -        '                                 "%(name)s " %(fmt)s "\\n" %(argnames)s);',
> -        '                trlen = MIN(trlen, MAX_TRACE_STRLEN - 1);',
> -        '                unused = write(trace_marker_fd, ftrace_buf, trlen);',
> -        '            }',
> +    out('    {',
> +        '        char ftrace_buf[MAX_TRACE_STRLEN];',
> +        '        int unused __attribute__ ((unused));',
> +        '        int trlen;',
> +        '        if (trace_event_get_state(%(event_id)s)) {',
> +        '            trlen = snprintf(ftrace_buf, MAX_TRACE_STRLEN,',
> +        '                             "%(name)s " %(fmt)s "\\n" %(argnames)s);',
> +        '            trlen = MIN(trlen, MAX_TRACE_STRLEN - 1);',
> +        '            unused = write(trace_marker_fd, ftrace_buf, trlen);',
>          '        }',
> +        '    }',
>          name=event.name,
>          args=event.args,
>          event_id="TRACE_" + event.name.upper(),
> diff --git a/scripts/tracetool/backend/log.py b/scripts/tracetool/backend/log.py
> index b3ff064..6818147 100644
> --- a/scripts/tracetool/backend/log.py
> +++ b/scripts/tracetool/backend/log.py
> @@ -36,14 +36,14 @@ def generate_h(event):
>      else:
>          cond = "trace_event_get_state(%s)" % ("TRACE_" + event.name.upper())
>  
> -    out('        if (%(cond)s) {',
> -        '            struct timeval _now;',
> -        '            gettimeofday(&_now, NULL);',
> -        '            qemu_log_mask(LOG_TRACE, "%%d@%%zd.%%06zd:%(name)s " %(fmt)s "\\n",',
> -        '                          getpid(),',
> -        '                          (size_t)_now.tv_sec, (size_t)_now.tv_usec',
> -        '                          %(argnames)s);',
> -        '        }',
> +    out('    if (%(cond)s) {',
> +        '        struct timeval _now;',
> +        '        gettimeofday(&_now, NULL);',
> +        '        qemu_log_mask(LOG_TRACE, "%%d@%%zd.%%06zd:%(name)s " %(fmt)s "\\n",',
> +        '                      getpid(),',
> +        '                      (size_t)_now.tv_sec, (size_t)_now.tv_usec',
> +        '                      %(argnames)s);',
> +        '    }',
>          cond=cond,
>          name=event.name,
>          fmt=event.fmt.rstrip("\n"),
> diff --git a/scripts/tracetool/backend/simple.py b/scripts/tracetool/backend/simple.py
> index 1bccada..4acf23f 100644
> --- a/scripts/tracetool/backend/simple.py
> +++ b/scripts/tracetool/backend/simple.py
> @@ -36,7 +36,7 @@ def generate_h_begin(events):
>  
>  
>  def generate_h(event):
> -    out('        _simple_%(api)s(%(args)s);',
> +    out('    _simple_%(api)s(%(args)s);',
>          api=event.api(),
>          args=", ".join(event.args.names()))
>  
> diff --git a/scripts/tracetool/backend/syslog.py b/scripts/tracetool/backend/syslog.py
> index 89019bc..b355121 100644
> --- a/scripts/tracetool/backend/syslog.py
> +++ b/scripts/tracetool/backend/syslog.py
> @@ -36,9 +36,9 @@ def generate_h(event):
>      else:
>          cond = "trace_event_get_state(%s)" % ("TRACE_" + event.name.upper())
>  
> -    out('        if (%(cond)s) {',
> -        '            syslog(LOG_INFO, "%(name)s " %(fmt)s %(argnames)s);',
> -        '        }',
> +    out('    if (%(cond)s) {',
> +        '        syslog(LOG_INFO, "%(name)s " %(fmt)s %(argnames)s);',
> +        '    }',
>          cond=cond,
>          name=event.name,
>          fmt=event.fmt.rstrip("\n"),
> diff --git a/scripts/tracetool/backend/ust.py b/scripts/tracetool/backend/ust.py
> index ed4c227..88d13e2 100644
> --- a/scripts/tracetool/backend/ust.py
> +++ b/scripts/tracetool/backend/ust.py
> @@ -30,6 +30,6 @@ def generate_h(event):
>      if len(event.args) > 0:
>          argnames = ", " + argnames
>  
> -    out('        tracepoint(qemu, %(name)s%(tp_args)s);',
> +    out('    tracepoint(qemu, %(name)s%(tp_args)s);',
>          name=event.name,
>          tp_args=argnames)


All the stylistic whitespace changes should be done as a separate
patch from the functional changes.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

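The reindentation in the backends follows from the h.py hunk. Backend code is now emitted directly into the body of a new `__nocheck__` function instead of inside the wrapper's `if` block. This hypothetical Python sketch reproduces only the shape of the generated C. `emit_tracers()`, the event `trace_my_event`, its arguments, condition and backend line are all made up, and the sketch does not use tracetool's real classes.

```python
# Hypothetical sketch of the C text that the patched format/h.py emits for a
# single event. It mirrors the layout of the generated code only; it is not
# tracetool's API. The event, arguments, condition and backend line are made up.


def emit_tracers(api, args, names, cond, backend_lines, disabled=False):
    """Return the generated C source for one event as a list of lines."""
    # 1) Unchecked tracer: backend code sits directly in the function body,
    #    so backends now indent it by 4 spaces instead of 8.
    lines = ["", "static inline void __nocheck__%s(%s)" % (api, args), "{"]
    if not disabled:
        lines.extend(backend_lines)
    lines.append("}")
    # 2) Checked wrapper: tests the (per-vCPU) state, then calls the tracer.
    lines += [
        "",
        "static inline void %s(%s)" % (api, args),
        "{",
        "    if (%s) {" % cond,
        "        __nocheck__%s(%s);" % (api, ", ".join(names)),
        "    }",
        "}",
    ]
    return lines


src = emit_tracers(
    api="trace_my_event",
    args="CPUState *cpu, uint64_t addr",
    names=["cpu", "addr"],
    cond="trace_event_get_vcpu_state(cpu, TRACE_MY_EVENT, TRACE_VCPU_MY_EVENT)",
    backend_lines=["    _simple_trace_my_event(cpu, addr);"],
)
print("\n".join(src))
```

The TCG helper proxy can then call the `__nocheck__` variant directly, because the check has already been made at translation time.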

* Re: [Qemu-devel] [PATCH 3/4] exec: [tcg] Switch physical TB cache based on vCPU tracing state
  2016-09-14 21:23 ` [Qemu-devel] [PATCH 3/4] exec: [tcg] Switch physical TB cache based on vCPU tracing state Lluís Vilanova
@ 2016-09-15 12:57   ` Lluís Vilanova
  0 siblings, 0 replies; 8+ messages in thread
From: Lluís Vilanova @ 2016-09-15 12:57 UTC
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Crosthwaite, Stefan Hajnoczi,
	Paolo Bonzini, Richard Henderson

Lluís Vilanova writes:

> Uses the per-vCPU event state in CPUState->trace_dstate (a bitmap) as an
> index to a physical TB cache that will contain code specific to the set
> of dynamically enabled events.

> Two vCPUs tracing different events will execute code from different
> physical TB caches. Two vCPUs tracing the same events will execute code
> from the same physical TB cache.

> This is used in the next patch to optimize TCG code related to event
> tracing.

> Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
> ---
>  cpu-exec.c             |    6 ++++++
>  trace/control-target.c |    2 ++
>  trace/control.h        |    3 +++
>  translate-all.c        |   23 +++++++++++++++++++++++
>  translate-all.h        |   26 ++++++++++++++++++++++++++
>  5 files changed, 60 insertions(+)

[...]
> diff --git a/translate-all.c b/translate-all.c
> index c864eee..c306cf4 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -1166,6 +1166,29 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
>  #endif
>  }
 
> +void cpu_tb_cache_set_request(CPUState *cpu)
> +{
> +    /*
> +     * Request is taken from cpu->trace_dstate and lazily applied into
> +     * cpu->tb_cache_idx at cpu_tb_cache_set_apply().
> +     */
> +    /* NOTE: Checked by all TBs in gen_tb_start(). */
> +    cpu->tcg_exit_req = true;
> +}
> +
> +bool cpu_tb_cache_set_requested(CPUState *cpu)
> +{
> +    return !bitmap_equal(cpu->trace_dstate, cpu->tb_cache_idx,
> +                         TRACE_VCPU_EVENT_COUNT);
> +}
> +
> +void cpu_tb_cache_set_apply(CPUState *cpu)
> +{
> +    bitmap_copy(cpu->tb_cache_idx, cpu->tb_cache_idx,
> +                TRACE_VCPU_EVENT_COUNT);

I forgot to update the patch before sending. As written, bitmap_copy()
copies cpu->tb_cache_idx onto itself, so the requested state is never
applied. It should be:

    bitmap_copy(cpu->tb_cache_idx, cpu->trace_dstate,
                TRACE_VCPU_EVENT_COUNT);


I'll wait for other reviews before sending v2 with this fixed.




> +    tb_flush_jmp_cache_all(cpu);
> +}
> +
>  /* Called with mmap_lock held for user mode emulation.  */
>  TranslationBlock *tb_gen_code(CPUState *cpu,
>                                target_ulong pc, target_ulong cs_base,
[...]


Cheers,
  Lluis

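The cache-switching scheme in patch 3/4, including the corrected `cpu_tb_cache_set_apply()`, can be modelled in a few lines of Python. This is a sketch under stated assumptions, not QEMU code. The bitmap is a Python int, and each physical TB cache is a dict keyed by that int, whereas QEMU uses arrays of TB pointers. `current_cache()` is a hypothetical stand-in for the elided cpu-exec.c check.

```python
# Hypothetical Python model of patch 3/4 (not QEMU code): a vCPU's per-event
# trace state bitmap (trace_dstate) selects which physical TB cache it uses.
# State changes are requested eagerly and applied lazily by the exec loop.


class PhysTBCaches:
    """One physical TB cache per distinct event-state bitmap."""

    def __init__(self):
        self._caches = {}

    def get(self, idx):
        # vCPUs with equal bitmaps get the very same cache object.
        return self._caches.setdefault(idx, {})


class CPU:
    def __init__(self):
        self.trace_dstate = 0      # bit i set => per-vCPU event i enabled
        self.tb_cache_idx = 0      # bitmap of the cache currently in use
        self.tcg_exit_req = False
        self.tb_jmp_cache = {}     # stand-in for the per-vCPU virtual TB cache

    def set_event(self, bit, enabled):
        if enabled:
            self.trace_dstate |= 1 << bit
        else:
            self.trace_dstate &= ~(1 << bit)
        self.tb_cache_set_request()

    def tb_cache_set_request(self):
        # As in cpu_tb_cache_set_request(): only ask the vCPU to leave its TB.
        self.tcg_exit_req = True

    def tb_cache_set_requested(self):
        return self.trace_dstate != self.tb_cache_idx

    def tb_cache_set_apply(self):
        # Corrected v1 bug: copy trace_dstate into tb_cache_idx. Copying
        # tb_cache_idx onto itself would leave the request pending forever.
        self.tb_cache_idx = self.trace_dstate
        self.tb_jmp_cache.clear()  # like tb_flush_jmp_cache_all()


def current_cache(cpu, caches):
    # Hypothetical stand-in for the exec-loop check (cpu-exec.c is elided).
    if cpu.tb_cache_set_requested():
        cpu.tb_cache_set_apply()
    return caches.get(cpu.tb_cache_idx)


caches = PhysTBCaches()
a, b = CPU(), CPU()
a.set_event(0, True)
b.set_event(0, True)
same = current_cache(a, caches) is current_cache(b, caches)          # True
b.set_event(0, False)
diverged = current_cache(a, caches) is not current_cache(b, caches)  # True
```

Two vCPUs with the same enabled events share one cache object. Once their states diverge, each is switched to a different cache on its next pass through `current_cache()`.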

* Re: [Qemu-devel] [PATCH 4/4] trace: [tcg] Do not generate TCG code to trace dynamically-disabled events
  2016-09-15 12:55   ` Daniel P. Berrange
@ 2016-09-15 14:24     ` Lluís Vilanova
  0 siblings, 0 replies; 8+ messages in thread
From: Lluís Vilanova @ 2016-09-15 14:24 UTC
  To: Daniel P. Berrange; +Cc: qemu-devel, Eduardo Habkost, Stefan Hajnoczi

Daniel P Berrange writes:

> On Wed, Sep 14, 2016 at 11:23:38PM +0200, Lluís Vilanova wrote:
>> If an event is dynamically disabled, the TCG code that calls the
>> execution-time tracer is not generated.
>> 
>> Removes the overhead of execution-time tracers for dynamically disabled
>> events. As a bonus, it also avoids re-checking the event state when the
>> execution-time tracer is called from TCG-generated code (if the event were
>> disabled, TCG would not have generated the call at all).
>> 
>> Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
>> ---
>> [...]


> All the stylistic whitespace changes should be done as a separate
> patch from the functional changes.

Ok!


Thread overview: 8+ messages
2016-09-14 21:23 [Qemu-devel] [PATCH 0/4] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches Lluís Vilanova
2016-09-14 21:23 ` [Qemu-devel] [PATCH 1/4] exec: [tcg] Refactor flush of per-CPU virtual TB cache Lluís Vilanova
2016-09-14 21:23 ` [Qemu-devel] [PATCH 2/4] exec: [tcg] Use multiple physical TB caches Lluís Vilanova
2016-09-14 21:23 ` [Qemu-devel] [PATCH 3/4] exec: [tcg] Switch physical TB cache based on vCPU tracing state Lluís Vilanova
2016-09-15 12:57   ` Lluís Vilanova
2016-09-14 21:23 ` [Qemu-devel] [PATCH 4/4] trace: [tcg] Do not generate TCG code to trace dynamically-disabled events Lluís Vilanova
2016-09-15 12:55   ` Daniel P. Berrange
2016-09-15 14:24     ` Lluís Vilanova
