All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v9 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches
@ 2017-06-27  9:52 Lluís Vilanova
  2017-06-27  9:56 ` [Qemu-devel] [PATCH v9 1/7] exec: [tcg] Refactor flush of per-CPU virtual TB cache Lluís Vilanova
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Lluís Vilanova @ 2017-06-27  9:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G. Cota, Eric Blake, Eduardo Habkost, Stefan Hajnoczi

Optimizes tracing of events with the 'tcg' and 'vcpu' properties (e.g., memory
accesses), making it feasible to statically enable them by default on all QEMU
builds.

Some quick'n'dirty numbers with 400.perlbench (SPECcpu2006) on the train input
(medium size - suns.pl) and the guest_mem_before event:

* vanilla, statically disabled
real    0m2,259s
user    0m2,252s
sys     0m0,004s

* vanilla, statically enabled (overhead: 2.18x)
real    0m4,921s
user    0m4,912s
sys     0m0,008s

* multi-tb, statically disabled (overhead: 0.99x) [within noise range]
real    0m2,228s
user    0m2,216s
sys     0m0,008s

* multi-tb, statically enabled (overhead: 0.99x) [within noise range]
real    0m2,229s
user    0m2,224s
sys     0m0,004s


Right now, events with the 'tcg' property always generate TCG code to trace that
event at guest code execution time, where the event's dynamic state is checked.

This series adds a performance optimization where TCG code for events with the
'tcg' and 'vcpu' properties is not generated if the event is dynamically
disabled. This optimization raises two issues:

* An event can be dynamically disabled/enabled after the corresponding TCG code
  has been generated (i.e., a new TB with the corresponding code should be
  used).

* Each vCPU can have a different dynamic state for the same event (i.e., tracing
  the memory accesses of only one process pinned to a vCPU).

To handle both issues, this series integrates the dynamic tracing event state
into the TB hashing function, so that vCPUs tracing different events will use
separate TBs. Note that only events with the 'vcpu' property are used for
hashing (as stored in the bitmap of CPUState->trace_dstate).

This makes dynamic event state changes on vCPUs very efficient, since they can
use TBs produced by other vCPUs while on the same event state combination (or
produced by the same vCPU, earlier).

Discarded alternatives:

* Emitting TCG code to check if an event needs tracing, where we should still
  move the tracing call code to either a cold path (making tracing performance
  worse), or leave it inlined (making non-tracing performance worse).

* Eliding TCG code only when *zero* vCPUs are tracing an event, since enabling
  it on a single vCPU will impact the performance of all other vCPUs that are
  not tracing that event.

Signed-off-by: Lluís Vilanova <address@hidden>
---

Changes in v9
=============

* Rebase on 931892e8a6.
* Undo renaming of tb->trace_vcpu_dstate to the shorter tb->trace_ds.
* Add measurements to commit enabling all guest events.


Changes in v8
=============

[Emilio G. Cota]

* Ported to current dev tree.

* Allocate cpu->trace_dstate statically. This
  * allows us to drop the event_count inline patch.
  * simplifies and improves the performance of accessing cpu->trace_dstate:
    we just need to dereference, instead of going through bitmap_copy and
    an intermediate unsigned long.

* If we try to register more CPU events than the max we support (there's a
  constant for it), drop the event and tell the user with error_report. But
  really this is a bug, since we control what CPU events are traceable. Should
  we abort() as well?

* Added rth's R-b tag

* Addressed my own comments:
  * rename tb->trace_vcpu_dstate to the shorter tb->trace_ds
  * use uint32_t for tb->trace_ds instead of a typedef
  * add BUILD_BUG_ON check to make sure tb->trace_ds is big enough
  * fix xxhash

* Do not add trace_dstate to tb_htable_lookup, since we can grab it from
  cpu->trace_dstate.

This patchset applies cleanly on top of rth's tcg-next (a01792e1e).


Changes in v7
=============

* Fix delayed dstate changes (now uses async_run_on_cpu() as suggested by Paolo
  Bonzini).

* Note to Richard: patch 4 has been adapted to the new patch 3 async callback,
  but is essentially the same.


Changes in v6
=============

* Check hashing size error with QEMU_BUILD_BUG_ON [Richard Henderson].


Changes in v5
=============

* Move define into "qemu-common.h" to allow compilation of tests.


Changes in v4
=============

* Incorporate trace_dstate into the TB hashing function instead of using
  multiple physical TB caches [suggested by Richard Henderson].


Changes in v3
=============

* Rebase on 0737f32daf.
* Do not use reserved symbol prefixes ("__") [Stefan Hajnoczi].
* Refactor trace_get_vcpu_event_count() to be inlinable.
* Optimize cpu_tb_cache_set_requested() (hottest path).


Changes in v2
=============

* Fix bitmap copy in cpu_tb_cache_set_apply().
* Split generated code re-alignment into a separate patch [Daniel P. Berrange].

Lluís Vilanova (7):
      exec: [tcg] Refactor flush of per-CPU virtual TB cache
      trace: Allocate cpu->trace_dstate in place
      trace: [tcg] Delay changes to dynamic state when translating
      exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state
      trace: [tcg] Do not generate TCG code to trace dinamically-disabled events
      trace: [tcg,trivial] Re-align generated code
      trace: [trivial] Statically enable all guest events


 accel/tcg/cpu-exec.c                     |    8 ++++++--
 accel/tcg/cputlb.c                       |    2 +-
 accel/tcg/translate-all.c                |   26 +++++++++++++++++++-------
 include/exec/exec-all.h                  |   12 ++++++++++++
 include/exec/tb-hash-xx.h                |    7 +++++--
 include/exec/tb-hash.h                   |    5 +++--
 include/qom/cpu.h                        |   12 ++++++------
 qom/cpu.c                                |    8 --------
 scripts/tracetool/__init__.py            |    3 ++-
 scripts/tracetool/backend/dtrace.py      |    4 ++--
 scripts/tracetool/backend/ftrace.py      |   20 ++++++++++----------
 scripts/tracetool/backend/log.py         |   19 ++++++++++---------
 scripts/tracetool/backend/simple.py      |    4 ++--
 scripts/tracetool/backend/syslog.py      |    6 +++---
 scripts/tracetool/backend/ust.py         |    4 ++--
 scripts/tracetool/format/h.py            |   26 +++++++++++++++++++-------
 scripts/tracetool/format/tcg_h.py        |   21 +++++++++++++++++----
 scripts/tracetool/format/tcg_helper_c.py |    5 +++--
 tcg/tcg-runtime.c                        |    3 ++-
 tests/qht-bench.c                        |    2 +-
 trace-events                             |    6 +++---
 trace/control-target.c                   |   21 ++++++++++++++++++---
 trace/control.c                          |    9 ++++++++-
 trace/control.h                          |    3 +++
 24 files changed, 157 insertions(+), 79 deletions(-)


To: qemu-devel@nongnu.org
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Emilio G. Cota <cota@braap.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v9 1/7] exec: [tcg] Refactor flush of per-CPU virtual TB cache
  2017-06-27  9:52 [Qemu-devel] [PATCH v9 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches Lluís Vilanova
@ 2017-06-27  9:56 ` Lluís Vilanova
  2017-06-27 10:00 ` [Qemu-devel] [PATCH v9 2/7] trace: Allocate cpu->trace_dstate in place Lluís Vilanova
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Lluís Vilanova @ 2017-06-27  9:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Emilio G. Cota, Eric Blake, Eduardo Habkost, Stefan Hajnoczi,
	Paolo Bonzini, Peter Crosthwaite, Richard Henderson

The function is reused in later patches.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 accel/tcg/cputlb.c        |    2 +-
 accel/tcg/translate-all.c |   15 ++++++++++-----
 include/exec/exec-all.h   |    6 ++++++
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 743776ae19..6a2b762325 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -118,7 +118,7 @@ static void tlb_flush_nocheck(CPUState *cpu)
 
     memset(env->tlb_table, -1, sizeof(env->tlb_table));
     memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
-    memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
+    tb_flush_jmp_cache_all(cpu);
 
     env->vtlb_index = 0;
     env->tlb_flush_addr = -1;
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index f6ad46b613..03820e3aeb 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -928,11 +928,7 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
     }
 
     CPU_FOREACH(cpu) {
-        int i;
-
-        for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
-            atomic_set(&cpu->tb_jmp_cache[i], NULL);
-        }
+        tb_flush_jmp_cache_all(cpu);
     }
 
     tcg_ctx.tb_ctx.nb_tbs = 0;
@@ -949,6 +945,15 @@ done:
     tb_unlock();
 }
 
+void tb_flush_jmp_cache_all(CPUState *cpu)
+{
+    int i;
+
+    for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
+        atomic_set(&cpu->tb_jmp_cache[i], NULL);
+    }
+}
+
 void tb_flush(CPUState *cpu)
 {
     if (tcg_enabled()) {
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 724ec73dce..b0281b000f 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -366,6 +366,12 @@ struct TranslationBlock {
 };
 
 void tb_free(TranslationBlock *tb);
+/**
+ * tb_flush_jmp_cache_all:
+ *
+ * Flush the virtual translation block cache.
+ */
+void tb_flush_jmp_cache_all(CPUState *env);
 void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
 TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v9 2/7] trace: Allocate cpu->trace_dstate in place
  2017-06-27  9:52 [Qemu-devel] [PATCH v9 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches Lluís Vilanova
  2017-06-27  9:56 ` [Qemu-devel] [PATCH v9 1/7] exec: [tcg] Refactor flush of per-CPU virtual TB cache Lluís Vilanova
@ 2017-06-27 10:00 ` Lluís Vilanova
  2017-06-27 10:04 ` [Qemu-devel] [PATCH v9 3/7] trace: [tcg] Delay changes to dynamic state when translating Lluís Vilanova
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Lluís Vilanova @ 2017-06-27 10:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G. Cota, Eric Blake, Eduardo Habkost, Stefan Hajnoczi

There's little point in dynamically allocating the bitmap if we
know at compile-time the max number of events we want to support.
Thus, make room in the struct for the bitmap, which will make things
easier later: this paves the way for upcoming changes, in which
we'll use a u32 to fully capture cpu->trace_dstate.

This change also increases performance by saving a dereference and
improving locality--note that this is important since upcoming work
makes reading this bitmap fairly common.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Lluís Vilanova <vilanova@ac.upc.edu>
---
 include/qom/cpu.h |    9 +++------
 qom/cpu.c         |    8 --------
 trace/control.c   |    9 ++++++++-
 3 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 89ddb686fb..bc6e20f056 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -259,6 +259,7 @@ typedef void (*run_on_cpu_func)(CPUState *cpu, run_on_cpu_data data);
 struct qemu_work_item;
 
 #define CPU_UNSET_NUMA_NODE_ID -1
+#define CPU_TRACE_DSTATE_MAX_EVENTS 32
 
 /**
  * CPUState:
@@ -373,12 +374,8 @@ struct CPUState {
     struct KVMState *kvm_state;
     struct kvm_run *kvm_run;
 
-    /*
-     * Used for events with 'vcpu' and *without* the 'disabled' properties.
-     * Dynamically allocated based on bitmap requried to hold up to
-     * trace_get_vcpu_event_count() entries.
-     */
-    unsigned long *trace_dstate;
+    /* Used for events with 'vcpu' and *without* the 'disabled' properties */
+    DECLARE_BITMAP(trace_dstate, CPU_TRACE_DSTATE_MAX_EVENTS);
 
     /* TODO Move common fields from CPUArchState here. */
     int cpu_index; /* used by alpha TCG */
diff --git a/qom/cpu.c b/qom/cpu.c
index 50698767dd..69fbb9cc95 100644
--- a/qom/cpu.c
+++ b/qom/cpu.c
@@ -382,7 +382,6 @@ static void cpu_common_unrealizefn(DeviceState *dev, Error **errp)
 
 static void cpu_common_initfn(Object *obj)
 {
-    uint32_t count;
     CPUState *cpu = CPU(obj);
     CPUClass *cc = CPU_GET_CLASS(obj);
 
@@ -397,18 +396,11 @@ static void cpu_common_initfn(Object *obj)
     QTAILQ_INIT(&cpu->breakpoints);
     QTAILQ_INIT(&cpu->watchpoints);
 
-    count = trace_get_vcpu_event_count();
-    if (count) {
-        cpu->trace_dstate = bitmap_new(count);
-    }
-
     cpu_exec_initfn(cpu);
 }
 
 static void cpu_common_finalize(Object *obj)
 {
-    CPUState *cpu = CPU(obj);
-    g_free(cpu->trace_dstate);
 }
 
 static int64_t cpu_common_get_arch_id(CPUState *cpu)
diff --git a/trace/control.c b/trace/control.c
index 9b157b0ca7..83740aa7ee 100644
--- a/trace/control.c
+++ b/trace/control.c
@@ -65,8 +65,15 @@ void trace_event_register_group(TraceEvent **events)
     size_t i;
     for (i = 0; events[i] != NULL; i++) {
         events[i]->id = next_id++;
-        if (events[i]->vcpu_id != TRACE_VCPU_EVENT_NONE) {
+        if (events[i]->vcpu_id == TRACE_VCPU_EVENT_NONE) {
+            continue;
+        }
+
+        if (likely(next_vcpu_id < CPU_TRACE_DSTATE_MAX_EVENTS)) {
             events[i]->vcpu_id = next_vcpu_id++;
+        } else {
+            error_report("WARNING: too many vcpu trace events; dropping '%s'",
+                         events[i]->name);
         }
     }
     event_groups = g_renew(TraceEventGroup, event_groups, nevent_groups + 1);

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v9 3/7] trace: [tcg] Delay changes to dynamic state when translating
  2017-06-27  9:52 [Qemu-devel] [PATCH v9 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches Lluís Vilanova
  2017-06-27  9:56 ` [Qemu-devel] [PATCH v9 1/7] exec: [tcg] Refactor flush of per-CPU virtual TB cache Lluís Vilanova
  2017-06-27 10:00 ` [Qemu-devel] [PATCH v9 2/7] trace: Allocate cpu->trace_dstate in place Lluís Vilanova
@ 2017-06-27 10:04 ` Lluís Vilanova
  2017-06-27 10:08 ` [Qemu-devel] [PATCH v9 4/7] exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state Lluís Vilanova
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Lluís Vilanova @ 2017-06-27 10:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G. Cota, Eric Blake, Eduardo Habkost, Stefan Hajnoczi

This keeps consistency across all decisions taken during translation
when the dynamic state of a vCPU is changed in the middle of translating
some guest code.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 include/qom/cpu.h      |    3 +++
 trace/control-target.c |   20 +++++++++++++++++---
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index bc6e20f056..29f4a32572 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -303,6 +303,8 @@ struct qemu_work_item;
  * @kvm_fd: vCPU file descriptor for KVM.
  * @work_mutex: Lock to prevent multiple access to queued_work_*.
  * @queued_work_first: First asynchronous work pending.
+ * @trace_dstate_delayed: Delayed changes to trace_dstate (includes all changes
+ *                        to @trace_dstate).
  * @trace_dstate: Dynamic tracing state of events for this vCPU (bitmask).
  *
  * State of one CPU core or thread.
@@ -375,6 +377,7 @@ struct CPUState {
     struct kvm_run *kvm_run;
 
     /* Used for events with 'vcpu' and *without* the 'disabled' properties */
+    DECLARE_BITMAP(trace_dstate_delayed, CPU_TRACE_DSTATE_MAX_EVENTS);
     DECLARE_BITMAP(trace_dstate, CPU_TRACE_DSTATE_MAX_EVENTS);
 
     /* TODO Move common fields from CPUArchState here. */
diff --git a/trace/control-target.c b/trace/control-target.c
index 6266e6380d..9d44566c95 100644
--- a/trace/control-target.c
+++ b/trace/control-target.c
@@ -1,7 +1,7 @@
 /*
  * Interface for configuring and controlling the state of tracing events.
  *
- * Copyright (C) 2014-2016 Lluís Vilanova <vilanova@ac.upc.edu>
+ * Copyright (C) 2014-2017 Lluís Vilanova <vilanova@ac.upc.edu>
  *
  * This work is licensed under the terms of the GNU GPL, version 2 or later.
  * See the COPYING file in the top-level directory.
@@ -57,6 +57,13 @@ void trace_event_set_state_dynamic(TraceEvent *ev, bool state)
     }
 }
 
+static void trace_event_synchronize_vcpu_state_dynamic(
+    CPUState *vcpu, run_on_cpu_data ignored)
+{
+    bitmap_copy(vcpu->trace_dstate, vcpu->trace_dstate_delayed,
+                trace_get_vcpu_event_count());
+}
+
 void trace_event_set_vcpu_state_dynamic(CPUState *vcpu,
                                         TraceEvent *ev, bool state)
 {
@@ -69,13 +76,20 @@ void trace_event_set_vcpu_state_dynamic(CPUState *vcpu,
     if (state_pre != state) {
         if (state) {
             trace_events_enabled_count++;
-            set_bit(vcpu_id, vcpu->trace_dstate);
+            set_bit(vcpu_id, vcpu->trace_dstate_delayed);
             (*ev->dstate)++;
         } else {
             trace_events_enabled_count--;
-            clear_bit(vcpu_id, vcpu->trace_dstate);
+            clear_bit(vcpu_id, vcpu->trace_dstate_delayed);
             (*ev->dstate)--;
         }
+        /*
+         * Delay changes until next TB; we want all TBs to be built from a
+         * single set of dstate values to ensure consistency of generated
+         * tracing code.
+         */
+        async_run_on_cpu(vcpu, trace_event_synchronize_vcpu_state_dynamic,
+                         RUN_ON_CPU_NULL);
     }
 }
 

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v9 4/7] exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state
  2017-06-27  9:52 [Qemu-devel] [PATCH v9 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches Lluís Vilanova
                   ` (2 preceding siblings ...)
  2017-06-27 10:04 ` [Qemu-devel] [PATCH v9 3/7] trace: [tcg] Delay changes to dynamic state when translating Lluís Vilanova
@ 2017-06-27 10:08 ` Lluís Vilanova
  2017-06-27 10:12 ` [Qemu-devel] [PATCH v9 5/7] trace: [tcg] Do not generate TCG code to trace dinamically-disabled events Lluís Vilanova
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Lluís Vilanova @ 2017-06-27 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: Emilio G. Cota, Eric Blake, Eduardo Habkost, Stefan Hajnoczi,
	Paolo Bonzini, Peter Crosthwaite, Richard Henderson

Every vCPU now uses a separate set of TBs for each set of dynamic
tracing event state values. Each set of TBs can be used by any number of
vCPUs to maximize TB reuse when vCPUs have the same tracing state.

This feature is later used by tracetool to optimize tracing of guest
code events.

The maximum number of TB sets is defined as 2^E, where E is the number
of events that have the 'vcpu' property (their state is stored in
CPUState->trace_dstate).

For this to work, a change on the dynamic tracing state of a vCPU will
force it to flush its virtual TB cache (which is only indexed by
address), and fall back to the physical TB cache (which now contains the
vCPU's dynamic tracing state as part of the hashing function).

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 accel/tcg/cpu-exec.c      |    8 ++++++--
 accel/tcg/translate-all.c |   11 +++++++++--
 include/exec/exec-all.h   |    6 ++++++
 include/exec/tb-hash-xx.h |    7 +++++--
 include/exec/tb-hash.h    |    5 +++--
 tcg/tcg-runtime.c         |    3 ++-
 tests/qht-bench.c         |    2 +-
 trace/control-target.c    |    1 +
 trace/control.h           |    3 +++
 9 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 3581618bc0..d84b01d1b8 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -280,6 +280,7 @@ struct tb_desc {
     CPUArchState *env;
     tb_page_addr_t phys_page1;
     uint32_t flags;
+    uint32_t trace_vcpu_dstate;
 };
 
 static bool tb_cmp(const void *p, const void *d)
@@ -291,6 +292,7 @@ static bool tb_cmp(const void *p, const void *d)
         tb->page_addr[0] == desc->phys_page1 &&
         tb->cs_base == desc->cs_base &&
         tb->flags == desc->flags &&
+        tb->trace_vcpu_dstate == desc->trace_vcpu_dstate &&
         !atomic_read(&tb->invalid)) {
         /* check next page if needed */
         if (tb->page_addr[1] == -1) {
@@ -319,10 +321,11 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
     desc.env = (CPUArchState *)cpu->env_ptr;
     desc.cs_base = cs_base;
     desc.flags = flags;
+    desc.trace_vcpu_dstate = *cpu->trace_dstate;
     desc.pc = pc;
     phys_pc = get_page_addr_code(desc.env, pc);
     desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
-    h = tb_hash_func(phys_pc, pc, flags);
+    h = tb_hash_func(phys_pc, pc, flags, *cpu->trace_dstate);
     return qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h);
 }
 
@@ -342,7 +345,8 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
     tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
     if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
-                 tb->flags != flags)) {
+                 tb->flags != flags ||
+                 tb->trace_vcpu_dstate != *cpu->trace_dstate)) {
         tb = tb_htable_lookup(cpu, pc, cs_base, flags);
         if (!tb) {
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 03820e3aeb..71badaaa6b 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -54,6 +54,7 @@
 #include "exec/tb-hash.h"
 #include "translate-all.h"
 #include "qemu/bitmap.h"
+#include "qemu/error-report.h"
 #include "qemu/timer.h"
 #include "qemu/main-loop.h"
 #include "exec/log.h"
@@ -112,6 +113,11 @@ typedef struct PageDesc {
 #define V_L2_BITS 10
 #define V_L2_SIZE (1 << V_L2_BITS)
 
+/* Make sure all possible CPU event bits fit in tb->trace_ds */
+QEMU_BUILD_BUG_ON(CPU_TRACE_DSTATE_MAX_EVENTS >
+                  sizeof(((TranslationBlock *)0)->trace_vcpu_dstate)
+                  * BITS_PER_BYTE);
+
 uintptr_t qemu_host_page_size;
 intptr_t qemu_host_page_mask;
 
@@ -1102,7 +1108,7 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 
     /* remove the TB from the hash list */
     phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
-    h = tb_hash_func(phys_pc, tb->pc, tb->flags);
+    h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->trace_vcpu_dstate);
     qht_remove(&tcg_ctx.tb_ctx.htable, tb, h);
 
     /* remove the TB from the page list */
@@ -1247,7 +1253,7 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
     }
 
     /* add in the hash table */
-    h = tb_hash_func(phys_pc, tb->pc, tb->flags);
+    h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->trace_vcpu_dstate);
     qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);
 
 #ifdef DEBUG_TB_CHECK
@@ -1293,6 +1299,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tb->cs_base = cs_base;
     tb->flags = flags;
     tb->cflags = cflags;
+    tb->trace_vcpu_dstate = *cpu->trace_dstate;
     tb->invalid = false;
 
 #ifdef CONFIG_PROFILER
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index b0281b000f..d918410944 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -310,6 +310,10 @@ static inline void tlb_flush_by_mmuidx_all_cpus_synced(CPUState *cpu,
 #define USE_DIRECT_JUMP
 #endif
 
+/**
+ * TranslationBlock:
+ * @trace_vcpu_dstate: Per-vCPU dynamic tracing state used to generate this TB.
+ */
 struct TranslationBlock {
     target_ulong pc;   /* simulated PC corresponding to this block (EIP + CS base) */
     target_ulong cs_base; /* CS base for this block */
@@ -324,6 +328,8 @@ struct TranslationBlock {
 #define CF_USE_ICOUNT  0x20000
 #define CF_IGNORE_ICOUNT 0x40000 /* Do not generate icount code */
 
+    uint32_t trace_vcpu_dstate;
+
     uint16_t invalid;
 
     void *tc_ptr;    /* pointer to the translated code */
diff --git a/include/exec/tb-hash-xx.h b/include/exec/tb-hash-xx.h
index 2c40b5c466..6cd3022c07 100644
--- a/include/exec/tb-hash-xx.h
+++ b/include/exec/tb-hash-xx.h
@@ -49,7 +49,7 @@
  * contiguous in memory.
  */
 static inline
-uint32_t tb_hash_func5(uint64_t a0, uint64_t b0, uint32_t e)
+uint32_t tb_hash_func6(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f)
 {
     uint32_t v1 = TB_HASH_XX_SEED + PRIME32_1 + PRIME32_2;
     uint32_t v2 = TB_HASH_XX_SEED + PRIME32_2;
@@ -78,11 +78,14 @@ uint32_t tb_hash_func5(uint64_t a0, uint64_t b0, uint32_t e)
     v4 *= PRIME32_1;
 
     h32 = rol32(v1, 1) + rol32(v2, 7) + rol32(v3, 12) + rol32(v4, 18);
-    h32 += 20;
+    h32 += 24;
 
     h32 += e * PRIME32_3;
     h32  = rol32(h32, 17) * PRIME32_4;
 
+    h32 += f * PRIME32_3;
+    h32  = rol32(h32, 17) * PRIME32_4;
+
     h32 ^= h32 >> 15;
     h32 *= PRIME32_2;
     h32 ^= h32 >> 13;
diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
index b1fe2d0161..17b5ee0edf 100644
--- a/include/exec/tb-hash.h
+++ b/include/exec/tb-hash.h
@@ -58,9 +58,10 @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
 #endif /* CONFIG_SOFTMMU */
 
 static inline
-uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags)
+uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags,
+                      uint32_t trace_vcpu_dstate)
 {
-    return tb_hash_func5(phys_pc, pc, flags);
+    return tb_hash_func6(phys_pc, pc, flags, trace_vcpu_dstate);
 }
 
 #endif
diff --git a/tcg/tcg-runtime.c b/tcg/tcg-runtime.c
index ec3a34e461..3e23649dd7 100644
--- a/tcg/tcg-runtime.c
+++ b/tcg/tcg-runtime.c
@@ -158,7 +158,8 @@ void *HELPER(lookup_tb_ptr)(CPUArchState *env, target_ulong addr)
     if (unlikely(!(tb
                    && tb->pc == addr
                    && tb->cs_base == cs_base
-                   && tb->flags == flags))) {
+                   && tb->flags == flags
+                   && tb->trace_vcpu_dstate == *cpu->trace_dstate))) {
         tb = tb_htable_lookup(cpu, addr, cs_base, flags);
         if (!tb) {
             return tcg_ctx.code_gen_epilogue;
diff --git a/tests/qht-bench.c b/tests/qht-bench.c
index 2afa09d859..11c1cec766 100644
--- a/tests/qht-bench.c
+++ b/tests/qht-bench.c
@@ -103,7 +103,7 @@ static bool is_equal(const void *obj, const void *userp)
 
 static inline uint32_t h(unsigned long v)
 {
-    return tb_hash_func5(v, 0, 0);
+    return tb_hash_func6(v, 0, 0, 0);
 }
 
 /*
diff --git a/trace/control-target.c b/trace/control-target.c
index 9d44566c95..329127873e 100644
--- a/trace/control-target.c
+++ b/trace/control-target.c
@@ -62,6 +62,7 @@ static void trace_event_synchronize_vcpu_state_dynamic(
 {
     bitmap_copy(vcpu->trace_dstate, vcpu->trace_dstate_delayed,
                 trace_get_vcpu_event_count());
+    tb_flush_jmp_cache_all(vcpu);
 }
 
 void trace_event_set_vcpu_state_dynamic(CPUState *vcpu,
diff --git a/trace/control.h b/trace/control.h
index 4ea53e2986..b931824d60 100644
--- a/trace/control.h
+++ b/trace/control.h
@@ -165,6 +165,9 @@ void trace_event_set_state_dynamic(TraceEvent *ev, bool state);
  * Set the dynamic tracing state of an event for the given vCPU.
  *
  * Pre-condition: trace_event_get_vcpu_state_static(ev) == true
+ *
+ * Note: Changes for execution-time events with the 'tcg' property will not be
+ *       propagated until the next TB is executed (iff executing in TCG mode).
  */
 void trace_event_set_vcpu_state_dynamic(CPUState *vcpu,
                                         TraceEvent *ev, bool state);

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v9 5/7] trace: [tcg] Do not generate TCG code to trace dinamically-disabled events
  2017-06-27  9:52 [Qemu-devel] [PATCH v9 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches Lluís Vilanova
                   ` (3 preceding siblings ...)
  2017-06-27 10:08 ` [Qemu-devel] [PATCH v9 4/7] exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state Lluís Vilanova
@ 2017-06-27 10:12 ` Lluís Vilanova
  2017-06-27 10:16 ` [Qemu-devel] [PATCH v9 6/7] trace: [tcg, trivial] Re-align generated code Lluís Vilanova
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Lluís Vilanova @ 2017-06-27 10:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G. Cota, Eric Blake, Eduardo Habkost, Stefan Hajnoczi

If an event is dynamically disabled, the TCG code that calls the
execution-time tracer is not generated.

Removes the overheads of execution-time tracers for dynamically disabled
events. As a bonus, also avoids checking the event state when the
execution-time tracer is called from TCG-generated code (since otherwise
TCG would simply not call it).

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
---
 scripts/tracetool/__init__.py            |    3 ++-
 scripts/tracetool/format/h.py            |   26 +++++++++++++++++++-------
 scripts/tracetool/format/tcg_h.py        |   21 +++++++++++++++++----
 scripts/tracetool/format/tcg_helper_c.py |    5 +++--
 4 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/scripts/tracetool/__init__.py b/scripts/tracetool/__init__.py
index 1ffbc1dc40..d4c204a472 100644
--- a/scripts/tracetool/__init__.py
+++ b/scripts/tracetool/__init__.py
@@ -6,7 +6,7 @@ Machinery for generating tracing-related intermediate files.
 """
 
 __author__     = "Lluís Vilanova <vilanova@ac.upc.edu>"
-__copyright__  = "Copyright 2012-2016, Lluís Vilanova <vilanova@ac.upc.edu>"
+__copyright__  = "Copyright 2012-2017, Lluís Vilanova <vilanova@ac.upc.edu>"
 __license__    = "GPL version 2 or (at your option) any later version"
 
 __maintainer__ = "Stefan Hajnoczi"
@@ -268,6 +268,7 @@ class Event(object):
         return self._FMT.findall(self.fmt)
 
     QEMU_TRACE               = "trace_%(name)s"
+    QEMU_TRACE_NOCHECK       = "_nocheck__" + QEMU_TRACE
     QEMU_TRACE_TCG           = QEMU_TRACE + "_tcg"
     QEMU_DSTATE              = "_TRACE_%(NAME)s_DSTATE"
     QEMU_EVENT               = "_TRACE_%(NAME)s_EVENT"
diff --git a/scripts/tracetool/format/h.py b/scripts/tracetool/format/h.py
index 3682f4e6a8..aecf249d66 100644
--- a/scripts/tracetool/format/h.py
+++ b/scripts/tracetool/format/h.py
@@ -6,7 +6,7 @@ trace/generated-tracers.h
 """
 
 __author__     = "Lluís Vilanova <vilanova@ac.upc.edu>"
-__copyright__  = "Copyright 2012-2016, Lluís Vilanova <vilanova@ac.upc.edu>"
+__copyright__  = "Copyright 2012-2017, Lluís Vilanova <vilanova@ac.upc.edu>"
 __license__    = "GPL version 2 or (at your option) any later version"
 
 __maintainer__ = "Stefan Hajnoczi"
@@ -49,6 +49,19 @@ def generate(events, backend, group):
     backend.generate_begin(events, group)
 
     for e in events:
+        # tracer without checks
+        out('',
+            'static inline void %(api)s(%(args)s)',
+            '{',
+            api=e.api(e.QEMU_TRACE_NOCHECK),
+            args=e.args)
+
+        if "disable" not in e.properties:
+            backend.generate(e, group)
+
+        out('}')
+
+        # tracer wrapper with checks (per-vCPU tracing)
         if "vcpu" in e.properties:
             trace_cpu = next(iter(e.args))[1]
             cond = "trace_event_get_vcpu_state(%(cpu)s,"\
@@ -63,16 +76,15 @@ def generate(events, backend, group):
             'static inline void %(api)s(%(args)s)',
             '{',
             '    if (%(cond)s) {',
+            '        %(api_nocheck)s(%(names)s);',
+            '    }',
+            '}',
             api=e.api(),
+            api_nocheck=e.api(e.QEMU_TRACE_NOCHECK),
             args=e.args,
+            names=", ".join(e.args.names()),
             cond=cond)
 
-        if "disable" not in e.properties:
-            backend.generate(e, group)
-
-        out('    }',
-            '}')
-
     backend.generate_end(events, group)
 
     out('#endif /* TRACE_%s_GENERATED_TRACERS_H */' % group.upper())
diff --git a/scripts/tracetool/format/tcg_h.py b/scripts/tracetool/format/tcg_h.py
index db55f52eb5..1651cc3f71 100644
--- a/scripts/tracetool/format/tcg_h.py
+++ b/scripts/tracetool/format/tcg_h.py
@@ -6,7 +6,7 @@ Generate .h file for TCG code generation.
 """
 
 __author__     = "Lluís Vilanova <vilanova@ac.upc.edu>"
-__copyright__  = "Copyright 2012-2016, Lluís Vilanova <vilanova@ac.upc.edu>"
+__copyright__  = "Copyright 2012-2017, Lluís Vilanova <vilanova@ac.upc.edu>"
 __license__    = "GPL version 2 or (at your option) any later version"
 
 __maintainer__ = "Stefan Hajnoczi"
@@ -46,7 +46,7 @@ def generate(events, backend, group):
 
     for e in events:
         # just keep one of them
-        if "tcg-trans" not in e.properties:
+        if "tcg-exec" not in e.properties:
             continue
 
         out('static inline void %(name_tcg)s(%(args)s)',
@@ -58,12 +58,25 @@ def generate(events, backend, group):
             args_trans = e.original.event_trans.args
             args_exec = tracetool.vcpu.transform_args(
                 "tcg_helper_c", e.original.event_exec, "wrapper")
+            if "vcpu" in e.properties:
+                trace_cpu = e.args.names()[0]
+                cond = "trace_event_get_vcpu_state(%(cpu)s,"\
+                       " TRACE_%(id)s)"\
+                       % dict(
+                           cpu=trace_cpu,
+                           id=e.original.event_exec.name.upper())
+            else:
+                cond = "true"
+
             out('    %(name_trans)s(%(argnames_trans)s);',
-                '    gen_helper_%(name_exec)s(%(argnames_exec)s);',
+                '    if (%(cond)s) {',
+                '        gen_helper_%(name_exec)s(%(argnames_exec)s);',
+                '    }',
                 name_trans=e.original.event_trans.api(e.QEMU_TRACE),
                 name_exec=e.original.event_exec.api(e.QEMU_TRACE),
                 argnames_trans=", ".join(args_trans.names()),
-                argnames_exec=", ".join(args_exec.names()))
+                argnames_exec=", ".join(args_exec.names()),
+                cond=cond)
 
         out('}')
 
diff --git a/scripts/tracetool/format/tcg_helper_c.py b/scripts/tracetool/format/tcg_helper_c.py
index ec7acbe347..bbbd6ad0f4 100644
--- a/scripts/tracetool/format/tcg_helper_c.py
+++ b/scripts/tracetool/format/tcg_helper_c.py
@@ -6,7 +6,7 @@ Generate trace/generated-helpers.c.
 """
 
 __author__     = "Lluís Vilanova <vilanova@ac.upc.edu>"
-__copyright__  = "Copyright 2012-2016, Lluís Vilanova <vilanova@ac.upc.edu>"
+__copyright__  = "Copyright 2012-2017, Lluís Vilanova <vilanova@ac.upc.edu>"
 __license__    = "GPL version 2 or (at your option) any later version"
 
 __maintainer__ = "Stefan Hajnoczi"
@@ -71,10 +71,11 @@ def generate(events, backend, group):
 
         out('void %(name_tcg)s(%(args_api)s)',
             '{',
+            # NOTE: the check was already performed at TCG-generation time
             '    %(name)s(%(args_call)s);',
             '}',
             name_tcg="helper_%s_proxy" % e.api(),
-            name=e.api(),
+            name=e.api(e.QEMU_TRACE_NOCHECK),
             args_api=e_args_api,
             args_call=", ".join(e_args_call.casted()),
             )

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v9 6/7] trace: [tcg, trivial] Re-align generated code
  2017-06-27  9:52 [Qemu-devel] [PATCH v9 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches Lluís Vilanova
                   ` (4 preceding siblings ...)
  2017-06-27 10:12 ` [Qemu-devel] [PATCH v9 5/7] trace: [tcg] Do not generate TCG code to trace dinamically-disabled events Lluís Vilanova
@ 2017-06-27 10:16 ` Lluís Vilanova
  2017-06-27 10:20 ` [Qemu-devel] [PATCH v9 7/7] trace: [trivial] Statically enable all guest events Lluís Vilanova
  2017-06-27 17:58 ` [Qemu-devel] [PATCH v9 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches Emilio G. Cota
  7 siblings, 0 replies; 12+ messages in thread
From: Lluís Vilanova @ 2017-06-27 10:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Emilio G. Cota, Eric Blake, Eduardo Habkost, Stefan Hajnoczi,
	Michael Tokarev, Laurent Vivier, open list:Trivial patches

Last patch removed a nesting level in generated code. Re-align all code
generated by backends to be 4-column aligned.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
---
 scripts/tracetool/backend/dtrace.py |    4 ++--
 scripts/tracetool/backend/ftrace.py |   20 ++++++++++----------
 scripts/tracetool/backend/log.py    |   19 ++++++++++---------
 scripts/tracetool/backend/simple.py |    4 ++--
 scripts/tracetool/backend/syslog.py |    6 +++---
 scripts/tracetool/backend/ust.py    |    4 ++--
 6 files changed, 29 insertions(+), 28 deletions(-)

diff --git a/scripts/tracetool/backend/dtrace.py b/scripts/tracetool/backend/dtrace.py
index c469cbd1a3..c6812b70a2 100644
--- a/scripts/tracetool/backend/dtrace.py
+++ b/scripts/tracetool/backend/dtrace.py
@@ -6,7 +6,7 @@ DTrace/SystemTAP backend.
 """
 
 __author__     = "Lluís Vilanova <vilanova@ac.upc.edu>"
-__copyright__  = "Copyright 2012-2016, Lluís Vilanova <vilanova@ac.upc.edu>"
+__copyright__  = "Copyright 2012-2017, Lluís Vilanova <vilanova@ac.upc.edu>"
 __license__    = "GPL version 2 or (at your option) any later version"
 
 __maintainer__ = "Stefan Hajnoczi"
@@ -46,6 +46,6 @@ def generate_h_begin(events, group):
 
 
 def generate_h(event, group):
-    out('        QEMU_%(uppername)s(%(argnames)s);',
+    out('    QEMU_%(uppername)s(%(argnames)s);',
         uppername=event.name.upper(),
         argnames=", ".join(event.args.names()))
diff --git a/scripts/tracetool/backend/ftrace.py b/scripts/tracetool/backend/ftrace.py
index db9fe7ad57..dd0eda4441 100644
--- a/scripts/tracetool/backend/ftrace.py
+++ b/scripts/tracetool/backend/ftrace.py
@@ -29,17 +29,17 @@ def generate_h(event, group):
     if len(event.args) > 0:
         argnames = ", " + argnames
 
-    out('        {',
-        '            char ftrace_buf[MAX_TRACE_STRLEN];',
-        '            int unused __attribute__ ((unused));',
-        '            int trlen;',
-        '            if (trace_event_get_state(%(event_id)s)) {',
-        '                trlen = snprintf(ftrace_buf, MAX_TRACE_STRLEN,',
-        '                                 "%(name)s " %(fmt)s "\\n" %(argnames)s);',
-        '                trlen = MIN(trlen, MAX_TRACE_STRLEN - 1);',
-        '                unused = write(trace_marker_fd, ftrace_buf, trlen);',
-        '            }',
+    out('    {',
+        '        char ftrace_buf[MAX_TRACE_STRLEN];',
+        '        int unused __attribute__ ((unused));',
+        '        int trlen;',
+        '        if (trace_event_get_state(%(event_id)s)) {',
+        '            trlen = snprintf(ftrace_buf, MAX_TRACE_STRLEN,',
+        '                             "%(name)s " %(fmt)s "\\n" %(argnames)s);',
+        '            trlen = MIN(trlen, MAX_TRACE_STRLEN - 1);',
+        '            unused = write(trace_marker_fd, ftrace_buf, trlen);',
         '        }',
+        '    }',
         name=event.name,
         args=event.args,
         event_id="TRACE_" + event.name.upper(),
diff --git a/scripts/tracetool/backend/log.py b/scripts/tracetool/backend/log.py
index 4f4a4d38b1..54f0a69886 100644
--- a/scripts/tracetool/backend/log.py
+++ b/scripts/tracetool/backend/log.py
@@ -6,7 +6,7 @@ Stderr built-in backend.
 """
 
 __author__     = "Lluís Vilanova <vilanova@ac.upc.edu>"
-__copyright__  = "Copyright 2012-2016, Lluís Vilanova <vilanova@ac.upc.edu>"
+__copyright__  = "Copyright 2012-2017, Lluís Vilanova <vilanova@ac.upc.edu>"
 __license__    = "GPL version 2 or (at your option) any later version"
 
 __maintainer__ = "Stefan Hajnoczi"
@@ -35,14 +35,15 @@ def generate_h(event, group):
     else:
         cond = "trace_event_get_state(%s)" % ("TRACE_" + event.name.upper())
 
-    out('        if (%(cond)s) {',
-        '            struct timeval _now;',
-        '            gettimeofday(&_now, NULL);',
-        '            qemu_log_mask(LOG_TRACE, "%%d@%%zd.%%06zd:%(name)s " %(fmt)s "\\n",',
-        '                          getpid(),',
-        '                          (size_t)_now.tv_sec, (size_t)_now.tv_usec',
-        '                          %(argnames)s);',
-        '        }',
+    out('    if (%(cond)s) {',
+        '        struct timeval _now;',
+        '        gettimeofday(&_now, NULL);',
+        '        qemu_log_mask(LOG_TRACE,',
+        '                      "%%d@%%zd.%%06zd:%(name)s " %(fmt)s "\\n",',
+        '                      getpid(),',
+        '                      (size_t)_now.tv_sec, (size_t)_now.tv_usec',
+        '                      %(argnames)s);',
+        '    }',
         cond=cond,
         name=event.name,
         fmt=event.fmt.rstrip("\n"),
diff --git a/scripts/tracetool/backend/simple.py b/scripts/tracetool/backend/simple.py
index 4acc06e81c..f983670ee1 100644
--- a/scripts/tracetool/backend/simple.py
+++ b/scripts/tracetool/backend/simple.py
@@ -6,7 +6,7 @@ Simple built-in backend.
 """
 
 __author__     = "Lluís Vilanova <vilanova@ac.upc.edu>"
-__copyright__  = "Copyright 2012-2014, Lluís Vilanova <vilanova@ac.upc.edu>"
+__copyright__  = "Copyright 2012-2017, Lluís Vilanova <vilanova@ac.upc.edu>"
 __license__    = "GPL version 2 or (at your option) any later version"
 
 __maintainer__ = "Stefan Hajnoczi"
@@ -37,7 +37,7 @@ def generate_h_begin(events, group):
 
 
 def generate_h(event, group):
-    out('        _simple_%(api)s(%(args)s);',
+    out('    _simple_%(api)s(%(args)s);',
         api=event.api(),
         args=", ".join(event.args.names()))
 
diff --git a/scripts/tracetool/backend/syslog.py b/scripts/tracetool/backend/syslog.py
index b8ff2790c4..1ce627f0fc 100644
--- a/scripts/tracetool/backend/syslog.py
+++ b/scripts/tracetool/backend/syslog.py
@@ -35,9 +35,9 @@ def generate_h(event, group):
     else:
         cond = "trace_event_get_state(%s)" % ("TRACE_" + event.name.upper())
 
-    out('        if (%(cond)s) {',
-        '            syslog(LOG_INFO, "%(name)s " %(fmt)s %(argnames)s);',
-        '        }',
+    out('    if (%(cond)s) {',
+        '        syslog(LOG_INFO, "%(name)s " %(fmt)s %(argnames)s);',
+        '    }',
         cond=cond,
         name=event.name,
         fmt=event.fmt.rstrip("\n"),
diff --git a/scripts/tracetool/backend/ust.py b/scripts/tracetool/backend/ust.py
index 52ce892478..2adaf548d5 100644
--- a/scripts/tracetool/backend/ust.py
+++ b/scripts/tracetool/backend/ust.py
@@ -6,7 +6,7 @@ LTTng User Space Tracing backend.
 """
 
 __author__     = "Lluís Vilanova <vilanova@ac.upc.edu>"
-__copyright__  = "Copyright 2012-2016, Lluís Vilanova <vilanova@ac.upc.edu>"
+__copyright__  = "Copyright 2012-2017, Lluís Vilanova <vilanova@ac.upc.edu>"
 __license__    = "GPL version 2 or (at your option) any later version"
 
 __maintainer__ = "Stefan Hajnoczi"
@@ -35,6 +35,6 @@ def generate_h(event, group):
     if len(event.args) > 0:
         argnames = ", " + argnames
 
-    out('        tracepoint(qemu, %(name)s%(tp_args)s);',
+    out('    tracepoint(qemu, %(name)s%(tp_args)s);',
         name=event.name,
         tp_args=argnames)

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH v9 7/7] trace: [trivial] Statically enable all guest events
  2017-06-27  9:52 [Qemu-devel] [PATCH v9 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches Lluís Vilanova
                   ` (5 preceding siblings ...)
  2017-06-27 10:16 ` [Qemu-devel] [PATCH v9 6/7] trace: [tcg, trivial] Re-align generated code Lluís Vilanova
@ 2017-06-27 10:20 ` Lluís Vilanova
  2017-06-27 17:58 ` [Qemu-devel] [PATCH v9 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches Emilio G. Cota
  7 siblings, 0 replies; 12+ messages in thread
From: Lluís Vilanova @ 2017-06-27 10:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Emilio G. Cota, Eric Blake, Eduardo Habkost, Stefan Hajnoczi,
	Michael Tokarev, Laurent Vivier, open list:Trivial patches

The optimizations of this series makes it feasible to have them
available on all builds.

Some quick'n'dirty numbers with 400.perlbench (SPECcpu2006) on the train input
(medium size - suns.pl) and the guest_mem_before event:

* vanilla, statically disabled
real    0m2,259s
user    0m2,252s
sys     0m0,004s

* vanilla, statically enabled (overhead: 2.18x)
real    0m4,921s
user    0m4,912s
sys     0m0,008s

* multi-tb, statically disabled (overhead: 0.99x) [within noise range]
real    0m2,228s
user    0m2,216s
sys     0m0,008s

* multi-tb, statically enabled (overhead: 0.99x) [within noise range]
real    0m2,229s
user    0m2,224s
sys     0m0,004s


Now enabling all events when booting an ARM system that immediately shuts down (https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg04085.html):

* vanilla, statically disabled
real	0m32,153s
user	0m31,276s
sys	0m0,108s

* vanilla, statically enabled (overhead: 1.35x)
real	0m43,507s
user	0m42,680s
sys	0m0,168s

* multi-tb, statically disabled (overhead: 1.03x)
real	0m32,993s
user	0m32,516s
sys	0m0,104s

* multi-tb, statically enabled (overhead: 1.00x) [within noise range]
real	0m32,110s
user	0m31,176s
sys	0m0,156s

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
---
 trace-events |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/trace-events b/trace-events
index bae63fdb1d..f9dbd7f509 100644
--- a/trace-events
+++ b/trace-events
@@ -106,7 +106,7 @@ vcpu guest_cpu_reset(void)
 #
 # Mode: user, softmmu
 # Targets: TCG(all)
-disable vcpu tcg guest_mem_before(TCGv vaddr, uint8_t info) "info=%d", "vaddr=0x%016"PRIx64" info=%d"
+vcpu tcg guest_mem_before(TCGv vaddr, uint8_t info) "info=%d", "vaddr=0x%016"PRIx64" info=%d"
 
 # @num: System call number.
 # @arg*: System call argument value.
@@ -115,7 +115,7 @@ disable vcpu tcg guest_mem_before(TCGv vaddr, uint8_t info) "info=%d", "vaddr=0x
 #
 # Mode: user
 # Targets: TCG(all)
-disable vcpu guest_user_syscall(uint64_t num, uint64_t arg1, uint64_t arg2, uint64_t arg3, uint64_t arg4, uint64_t arg5, uint64_t arg6, uint64_t arg7, uint64_t arg8) "num=0x%016"PRIx64" arg1=0x%016"PRIx64" arg2=0x%016"PRIx64" arg3=0x%016"PRIx64" arg4=0x%016"PRIx64" arg5=0x%016"PRIx64" arg6=0x%016"PRIx64" arg7=0x%016"PRIx64" arg8=0x%016"PRIx64
+vcpu guest_user_syscall(uint64_t num, uint64_t arg1, uint64_t arg2, uint64_t arg3, uint64_t arg4, uint64_t arg5, uint64_t arg6, uint64_t arg7, uint64_t arg8) "num=0x%016"PRIx64" arg1=0x%016"PRIx64" arg2=0x%016"PRIx64" arg3=0x%016"PRIx64" arg4=0x%016"PRIx64" arg5=0x%016"PRIx64" arg6=0x%016"PRIx64" arg7=0x%016"PRIx64" arg8=0x%016"PRIx64
 
 # @num: System call number.
 # @ret: System call result value.
@@ -124,4 +124,4 @@ disable vcpu guest_user_syscall(uint64_t num, uint64_t arg1, uint64_t arg2, uint
 #
 # Mode: user
 # Targets: TCG(all)
-disable vcpu guest_user_syscall_ret(uint64_t num, uint64_t ret) "num=0x%016"PRIx64" ret=0x%016"PRIx64
+vcpu guest_user_syscall_ret(uint64_t num, uint64_t ret) "num=0x%016"PRIx64" ret=0x%016"PRIx64

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v9 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches
  2017-06-27  9:52 [Qemu-devel] [PATCH v9 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches Lluís Vilanova
                   ` (6 preceding siblings ...)
  2017-06-27 10:20 ` [Qemu-devel] [PATCH v9 7/7] trace: [trivial] Statically enable all guest events Lluís Vilanova
@ 2017-06-27 17:58 ` Emilio G. Cota
  2017-06-28 11:21   ` Lluís Vilanova
  7 siblings, 1 reply; 12+ messages in thread
From: Emilio G. Cota @ 2017-06-27 17:58 UTC (permalink / raw)
  To: Lluís Vilanova
  Cc: qemu-devel, Eric Blake, Eduardo Habkost, Stefan Hajnoczi

On Tue, Jun 27, 2017 at 12:52:00 +0300, Lluís Vilanova wrote:
> Changes in v9
> =============
> 
> * Rebase on 931892e8a6.
> * Undo renaming of tb->trace_vcpu_dstate to the shorter tb->trace_ds.
> * Add measurements to commit enabling all guest events.

I wanted to save you some time and sent a v9 yesterday with these
same changes -- although I see some changes in my v8 didn't make it
to your v9. For this iteration I only added more perf numbers to the
last patch, see here:
  https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg05764.html

		E.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v9 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches
  2017-06-27 17:58 ` [Qemu-devel] [PATCH v9 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches Emilio G. Cota
@ 2017-06-28 11:21   ` Lluís Vilanova
  2017-06-28 23:07     ` Emilio G. Cota
  0 siblings, 1 reply; 12+ messages in thread
From: Lluís Vilanova @ 2017-06-28 11:21 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel, Stefan Hajnoczi, Eduardo Habkost

Emilio G Cota writes:

> On Tue, Jun 27, 2017 at 12:52:00 +0300, Lluís Vilanova wrote:
>> Changes in v9
>> =============
>> 
>> * Rebase on 931892e8a6.
>> * Undo renaming of tb->trace_vcpu_dstate to the shorter tb->trace_ds.
>> * Add measurements to commit enabling all guest events.

> I wanted to save you some time and sent a v9 yesterday with these
> same changes -- although I see some changes in my v8 didn't make it
> to your v9. For this iteration I only added more perf numbers to the
> last patch, see here:
>   https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg05764.html

> 		E.

Woops, sorry about the duplicate effort.

I just dropped the rename of variable trace_ds, since I prefer names whose
purpose can be more easily understood. Other than that, I sent a new series to
try to get a final review on the comments.


Thanks a lot,
  Lluis

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v9 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches
  2017-06-28 11:21   ` Lluís Vilanova
@ 2017-06-28 23:07     ` Emilio G. Cota
  2017-06-29  6:35       ` Lluís Vilanova
  0 siblings, 1 reply; 12+ messages in thread
From: Emilio G. Cota @ 2017-06-28 23:07 UTC (permalink / raw)
  To: qemu-devel, Stefan Hajnoczi, Eduardo Habkost

On Wed, Jun 28, 2017 at 14:21:29 +0300, Lluís Vilanova wrote:
> Emilio G Cota writes:
> > I wanted to save you some time and sent a v9 yesterday with these
> > same changes -- although I see some changes in my v8 didn't make it
> > to your v9. For this iteration I only added more perf numbers to the
> > last patch, see here:
> >   https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg05764.html
> 
> Woops, sorry about the duplicate effort.
> 
> I just dropped the rename of variable trace_ds, since I prefer names whose
> purpose can be more easily understood.

- I think trace_vcpu_dstate is way too long (IMO a short name + a comment
  in the struct is better).

- You missed a change I made to patch 3:
> [cota: use CPU_TRACE_DSTATE_MAX_EVENTS instead of trace_get_vcpu_event_count()]
  Was that intentional?

- Would be nice to include the perf numbers I got, along with yours.

		E.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH v9 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches
  2017-06-28 23:07     ` Emilio G. Cota
@ 2017-06-29  6:35       ` Lluís Vilanova
  0 siblings, 0 replies; 12+ messages in thread
From: Lluís Vilanova @ 2017-06-29  6:35 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel, Stefan Hajnoczi, Eduardo Habkost

Emilio G Cota writes:

> On Wed, Jun 28, 2017 at 14:21:29 +0300, Lluís Vilanova wrote:
>> Emilio G Cota writes:
>> > I wanted to save you some time and sent a v9 yesterday with these
>> > same changes -- although I see some changes in my v8 didn't make it
>> > to your v9. For this iteration I only added more perf numbers to the
>> > last patch, see here:
>> >   https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg05764.html
>> 
>> Woops, sorry about the duplicate effort.
>> 
>> I just dropped the rename of variable trace_ds, since I prefer names whose
>> purpose can be more easily understood.

> - I think trace_vcpu_dstate is way too long (IMO a short name + a comment
>   in the struct is better).

> - You missed a change I made to patch 3:
>> [cota: use CPU_TRACE_DSTATE_MAX_EVENTS instead of trace_get_vcpu_event_count()]
>   Was that intentional?

No it wasn't. I'll look into it for the next series.


> - Would be nice to include the perf numbers I got, along with yours.

I'll add them to the cover and commit message.


Thanks,
  Lluis

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-06-29  6:35 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-27  9:52 [Qemu-devel] [PATCH v9 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches Lluís Vilanova
2017-06-27  9:56 ` [Qemu-devel] [PATCH v9 1/7] exec: [tcg] Refactor flush of per-CPU virtual TB cache Lluís Vilanova
2017-06-27 10:00 ` [Qemu-devel] [PATCH v9 2/7] trace: Allocate cpu->trace_dstate in place Lluís Vilanova
2017-06-27 10:04 ` [Qemu-devel] [PATCH v9 3/7] trace: [tcg] Delay changes to dynamic state when translating Lluís Vilanova
2017-06-27 10:08 ` [Qemu-devel] [PATCH v9 4/7] exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state Lluís Vilanova
2017-06-27 10:12 ` [Qemu-devel] [PATCH v9 5/7] trace: [tcg] Do not generate TCG code to trace dinamically-disabled events Lluís Vilanova
2017-06-27 10:16 ` [Qemu-devel] [PATCH v9 6/7] trace: [tcg, trivial] Re-align generated code Lluís Vilanova
2017-06-27 10:20 ` [Qemu-devel] [PATCH v9 7/7] trace: [trivial] Statically enable all guest events Lluís Vilanova
2017-06-27 17:58 ` [Qemu-devel] [PATCH v9 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches Emilio G. Cota
2017-06-28 11:21   ` Lluís Vilanova
2017-06-28 23:07     ` Emilio G. Cota
2017-06-29  6:35       ` Lluís Vilanova

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.