* [Qemu-devel] [PATCH v5 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches
@ 2016-12-28 14:07 Lluís Vilanova
From: Lluís Vilanova @ 2016-12-28 14:07 UTC
  To: qemu-devel; +Cc: Eric Blake, Eduardo Habkost, Stefan Hajnoczi

Optimizes tracing of events with the 'tcg' and 'vcpu' properties (e.g., memory
accesses), making it feasible to statically enable them by default on all QEMU
builds.

Some quick'n'dirty numbers with 400.perlbench (SPECcpu2006) on the train input
(medium size - suns.pl) and the guest_mem_before event:

* vanilla, statically disabled
real	0m2,259s
user	0m2,252s
sys	0m0,004s

* vanilla, statically enabled (overhead: 2.18x)
real	0m4,921s
user	0m4,912s
sys	0m0,008s

* multi-tb, statically disabled (overhead: 0.99x) [within noise range]
real	0m2,228s
user	0m2,216s
sys	0m0,008s

* multi-tb, statically enabled (overhead: 0.99x) [within noise range]
real	0m2,229s
user	0m2,224s
sys	0m0,004s


Right now, events with the 'tcg' property always generate TCG code to trace that
event at guest code execution time, where the event's dynamic state is checked.

This series adds a performance optimization where TCG code for events with the
'tcg' and 'vcpu' properties is not generated if the event is dynamically
disabled. This optimization raises two issues:

* An event can be dynamically disabled/enabled after the corresponding TCG code
  has been generated (i.e., a new TB with the corresponding code should be
  used).

* Each vCPU can have a different dynamic state for the same event (e.g., when
  tracing the memory accesses of only one process pinned to a vCPU).
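
As a rough illustration of the optimization itself (not of the code in this
series), the sketch below models a translator that emits a trace operation in
front of a guest load. All names (toy_op, toy_gen_load, elide_disabled) are
hypothetical and do not exist in QEMU; the dynamic state is assumed to be a
plain bitmap with one bit per event.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical micro-ops; real TCG ops look nothing like this. */
enum toy_op { OP_TRACE_MEM, OP_LOAD };

/*
 * Emit the ops for one guest load into 'out' and return how many were
 * written (at most 2).
 *
 * elide_disabled == 0: always emit the trace op; the event's dynamic
 *                      state is then checked when the code executes.
 * elide_disabled != 0: skip the trace op when the event's bit is clear
 *                      in trace_dstate at translation time.
 */
static size_t toy_gen_load(enum toy_op *out, uint32_t trace_dstate,
                           unsigned event_bit, int elide_disabled)
{
    size_t n = 0;
    int enabled = (trace_dstate >> event_bit) & 1u;

    if (!elide_disabled || enabled) {
        out[n++] = OP_TRACE_MEM;
    }
    out[n++] = OP_LOAD;
    return n;
}
```

With elision turned on, the generated code depends on the dynamic state seen
at translation time, which is exactly what gives rise to the two issues above.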

To handle both issues, this series integrates the dynamic tracing event state
into the TB hashing function, so that vCPUs tracing different events will use
separate TBs. Note that only events with the 'vcpu' property are used for
hashing (as stored in the bitmap of CPUState->trace_dstate).
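
A minimal sketch of the idea, assuming a simplified lookup key: the struct
layout, the function names and the mixing function (a splitmix64-style
finalizer) are illustrative only and are not the code in
include/exec/tb-hash-xx.h.

```c
#include <stdint.h>

/* Hypothetical TB lookup key; the field names are illustrative. */
typedef struct {
    uint64_t phys_pc;
    uint64_t pc;
    uint32_t flags;
    uint32_t trace_dstate;  /* bitmap of dynamically enabled 'vcpu' events */
} toy_tb_key;

/* Generic 64-bit mixer (splitmix64 finalizer), not QEMU's hash. */
static uint64_t toy_mix64(uint64_t x)
{
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* The tracing state is folded into the hash next to the usual fields. */
static uint32_t toy_tb_hash(const toy_tb_key *k)
{
    uint64_t h = toy_mix64(k->phys_pc);

    h = toy_mix64(h ^ k->pc);
    h = toy_mix64(h ^ (((uint64_t)k->flags << 32) | k->trace_dstate));
    return (uint32_t)(h ^ (h >> 32));
}

/* It must also take part in the comparison, or collisions would alias TBs. */
static int toy_tb_key_equal(const toy_tb_key *a, const toy_tb_key *b)
{
    return a->phys_pc == b->phys_pc && a->pc == b->pc &&
           a->flags == b->flags && a->trace_dstate == b->trace_dstate;
}
```

Two keys that differ only in trace_dstate never compare equal, so a vCPU with
an event enabled cannot pick up a TB that was translated without the tracing
code for that event.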

This makes dynamic event state changes on vCPUs very efficient, since a vCPU
can reuse any TB already produced for the same combination of event states,
whether by another vCPU or by itself earlier.
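
The reuse can be pictured with a toy shared cache keyed by (pc, trace_dstate).
This is only a sketch under simplifying assumptions: a fixed-size
open-addressing table with no locking and no eviction, and hypothetical names
(toy_tb, toy_tb_cache, toy_tb_find_or_gen) that are not taken from QEMU.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_SLOTS 64

typedef struct {
    int valid;
    uint64_t pc;
    uint32_t trace_dstate;  /* event state the TB was translated for */
} toy_tb;

typedef struct {
    toy_tb slots[TOY_CACHE_SLOTS];
    unsigned translations;  /* number of times a new TB was generated */
} toy_tb_cache;

/*
 * Return the TB for (pc, dstate), "translating" a new one only when no
 * TB exists yet for that exact combination. Returns NULL if the toy
 * table is full.
 */
static toy_tb *toy_tb_find_or_gen(toy_tb_cache *c, uint64_t pc,
                                  uint32_t dstate)
{
    size_t start = (size_t)((pc ^ dstate) % TOY_CACHE_SLOTS);

    for (size_t i = 0; i < TOY_CACHE_SLOTS; i++) {
        toy_tb *tb = &c->slots[(start + i) % TOY_CACHE_SLOTS];

        if (!tb->valid) {
            tb->valid = 1;
            tb->pc = pc;
            tb->trace_dstate = dstate;
            c->translations++;
            return tb;
        }
        if (tb->pc == pc && tb->trace_dstate == dstate) {
            return tb;
        }
    }
    return NULL;
}
```

If one vCPU runs pc 0x1000 with no events enabled and a second vCPU runs it
with event bit 0 set, two TBs are generated. A vCPU that later switches to the
second state finds the existing TB and triggers no new translation.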

Discarded alternatives:

* Emitting TCG code to check if an event needs tracing, where we would still
  have to either move the tracing call code to a cold path (making tracing
  performance worse) or leave it inlined (making non-tracing performance
  worse).

* Eliding TCG code only when *zero* vCPUs are tracing an event, since enabling
  it on a single vCPU will impact the performance of all other vCPUs that are
  not tracing that event.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
---

Changes in v5
=============

* Move define into "qemu-common.h" to allow compilation of tests.


Changes in v4
=============

* Incorporate trace_dstate into the TB hashing function instead of using
  multiple physical TB caches [suggested by Richard Henderson].


Changes in v3
=============

* Rebase on 0737f32daf.
* Do not use reserved symbol prefixes ("__") [Stefan Hajnoczi].
* Refactor trace_get_vcpu_event_count() to be inlinable.
* Optimize cpu_tb_cache_set_requested() (hottest path).


Changes in v2
=============

* Fix bitmap copy in cpu_tb_cache_set_apply().
* Split generated code re-alignment into a separate patch [Daniel P. Berrange].


Lluís Vilanova (7):
      exec: [tcg] Refactor flush of per-CPU virtual TB cache
      trace: Make trace_get_vcpu_event_count() inlinable
      trace: [tcg] Delay changes to dynamic state when translating
      exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state
      trace: [tcg] Do not generate TCG code to trace dinamically-disabled events
      trace: [tcg,trivial] Re-align generated code
      trace: [trivial] Statically enable all guest events


 cpu-exec.c                               |   52 +++++++++++++++++++++++++++---
 cputlb.c                                 |    2 +
 include/exec/exec-all.h                  |   11 ++++++
 include/exec/tb-hash-xx.h                |   11 ++++++
 include/exec/tb-hash.h                   |    5 ++-
 include/qemu-common.h                    |    3 ++
 include/qom/cpu.h                        |    7 ++++
 qom/cpu.c                                |    4 ++
 scripts/tracetool/__init__.py            |    1 +
 scripts/tracetool/backend/dtrace.py      |    2 +
 scripts/tracetool/backend/ftrace.py      |   20 ++++++------
 scripts/tracetool/backend/log.py         |   17 +++++-----
 scripts/tracetool/backend/simple.py      |    2 +
 scripts/tracetool/backend/syslog.py      |    6 ++-
 scripts/tracetool/backend/ust.py         |    2 +
 scripts/tracetool/format/h.py            |   24 ++++++++++----
 scripts/tracetool/format/tcg_h.py        |   19 +++++++++--
 scripts/tracetool/format/tcg_helper_c.py |    3 +-
 tests/qht-bench.c                        |    2 +
 trace-events                             |    6 ++-
 trace/control-internal.h                 |    5 +++
 trace/control-target.c                   |   14 +++++++-
 trace/control.c                          |    9 +----
 trace/control.h                          |    5 ++-
 translate-all.c                          |   30 +++++++++++++----
 25 files changed, 198 insertions(+), 64 deletions(-)


To: qemu-devel@nongnu.org
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Eric Blake <eblake@redhat.com>

