All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Hajnoczi <stefanha@redhat.com>
To: "Lluís Vilanova" <vilanova@ac.upc.edu>
Cc: qemu-devel@nongnu.org, Eric Blake <eblake@redhat.com>,
	Eduardo Habkost <ehabkost@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v6 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches
Date: Mon, 9 Jan 2017 17:04:34 +0000	[thread overview]
Message-ID: <20170109170434.GM30228@stefanha-x1.localdomain> (raw)
In-Reply-To: <148295045448.19871.9819696634619157347.stgit@fimbulvetr.bsc.es>

[-- Attachment #1: Type: text/plain, Size: 5841 bytes --]

On Wed, Dec 28, 2016 at 07:40:54PM +0100, Lluís Vilanova wrote:
> Optimizes tracing of events with the 'tcg' and 'vcpu' properties (e.g., memory
> accesses), making it feasible to statically enable them by default on all QEMU
> builds.
> 
> Some quick'n'dirty numbers with 400.perlbench (SPECcpu2006) on the train input
> (medium size - suns.pl) and the guest_mem_before event:
> 
> * vanilla, statically disabled
> real	0m2,259s
> user	0m2,252s
> sys	0m0,004s
> 
> * vanilla, statically enabled (overhead: 2.18x)
> real	0m4,921s
> user	0m4,912s
> sys	0m0,008s
> 
> * multi-tb, statically disabled (overhead: 0.99x) [within noise range]
> real	0m2,228s
> user	0m2,216s
> sys	0m0,008s
> 
> * multi-tb, statically enabled (overhead: 0.99x) [within noise range]
> real	0m2,229s
> user	0m2,224s
> sys	0m0,004s
> 
> 
> Right now, events with the 'tcg' property always generate TCG code to trace that
> event at guest code execution time, where the event's dynamic state is checked.
> 
> This series adds a performance optimization where TCG code for events with the
> 'tcg' and 'vcpu' properties is not generated if the event is dynamically
> disabled. This optimization raises two issues:
> 
> * An event can be dynamically disabled/enabled after the corresponding TCG code
>   has been generated (i.e., a new TB with the corresponding code should be
>   used).
> 
> * Each vCPU can have a different dynamic state for the same event (i.e., tracing
>   the memory accesses of only one process pinned to a vCPU).
> 
> To handle both issues, this series integrates the dynamic tracing event state
> into the TB hashing function, so that vCPUs tracing different events will use
> separate TBs. Note that only events with the 'vcpu' property are used for
> hashing (as stored in the bitmap of CPUState->trace_dstate).
> 
> This makes dynamic event state changes on vCPUs very efficient, since they can
> use TBs produced by other vCPUs while on the same event state combination (or
> produced by the same vCPU, earlier).
> 
> Discarded alternatives:
> 
> * Emitting TCG code to check if an event needs tracing, where we should still
>   move the tracing call code to either a cold path (making tracing performance
>   worse), or leave it inlined (making non-tracing performance worse).
> 
> * Eliding TCG code only when *zero* vCPUs are tracing an event, since enabling
>   it on a single vCPU will impact the performance of all other vCPUs that are
>   not tracing that event.
> 
> Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
> ---
> 
> Changes in v6
> =============
> 
> * Check hashing size error with QEMU_BUILD_BUG_ON [Richard Henderson].
> 
> 
> Changes in v5
> =============
> 
> * Move define into "qemu-common.h" to allow compilation of tests.
> 
> 
> Changes in v4
> =============
> 
> * Incorporate trace_dstate into the TB hashing function instead of using
>   multiple physical TB caches [suggested by Richard Henderson].
> 
> 
> Changes in v3
> =============
> 
> * Rebase on 0737f32daf.
> * Do not use reserved symbol prefixes ("__") [Stefan Hajnoczi].
> * Refactor trace_get_vcpu_event_count() to be inlinable.
> * Optimize cpu_tb_cache_set_requested() (hottest path).
> 
> 
> Changes in v2
> =============
> 
> * Fix bitmap copy in cpu_tb_cache_set_apply().
> * Split generated code re-alignment into a separate patch [Daniel P. Berrange].
> 
> 
> Lluís Vilanova (7):
>       exec: [tcg] Refactor flush of per-CPU virtual TB cache
>       trace: Make trace_get_vcpu_event_count() inlinable
>       trace: [tcg] Delay changes to dynamic state when translating
>       exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state
>       trace: [tcg] Do not generate TCG code to trace dinamically-disabled events
>       trace: [tcg,trivial] Re-align generated code
>       trace: [trivial] Statically enable all guest events
> 
> 
>  cpu-exec.c                               |   52 +++++++++++++++++++++++++++---
>  cputlb.c                                 |    2 +
>  include/exec/exec-all.h                  |   11 ++++++
>  include/exec/tb-hash-xx.h                |    8 ++++-
>  include/exec/tb-hash.h                   |    5 ++-
>  include/qemu-common.h                    |    3 ++
>  include/qom/cpu.h                        |    7 ++++
>  qom/cpu.c                                |    4 ++
>  scripts/tracetool/__init__.py            |    1 +
>  scripts/tracetool/backend/dtrace.py      |    2 +
>  scripts/tracetool/backend/ftrace.py      |   20 ++++++------
>  scripts/tracetool/backend/log.py         |   17 +++++-----
>  scripts/tracetool/backend/simple.py      |    2 +
>  scripts/tracetool/backend/syslog.py      |    6 ++-
>  scripts/tracetool/backend/ust.py         |    2 +
>  scripts/tracetool/format/h.py            |   24 ++++++++++----
>  scripts/tracetool/format/tcg_h.py        |   19 +++++++++--
>  scripts/tracetool/format/tcg_helper_c.py |    3 +-
>  tests/qht-bench.c                        |    2 +
>  trace-events                             |    6 ++-
>  trace/control-internal.h                 |    5 +++
>  trace/control-target.c                   |   14 +++++++-
>  trace/control.c                          |    9 +----
>  trace/control.h                          |    5 ++-
>  translate-all.c                          |   30 +++++++++++++----
>  25 files changed, 195 insertions(+), 64 deletions(-)
> 
> 
> To: qemu-devel@nongnu.org
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Eric Blake <eblake@redhat.com>

The tracing aspects seem fine.  I have left a comment regarding
thread-safety.

I'll merge it once Richard Henderson has had time to review it from a
TCG perspective.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

      parent reply	other threads:[~2017-01-10 14:43 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-28 18:40 [Qemu-devel] [PATCH v6 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches Lluís Vilanova
2016-12-28 18:41 ` [Qemu-devel] [PATCH v6 1/7] exec: [tcg] Refactor flush of per-CPU virtual TB cache Lluís Vilanova
2017-01-10 20:07   ` Richard Henderson
2016-12-28 18:41 ` [Qemu-devel] [PATCH v6 2/7] trace: Make trace_get_vcpu_event_count() inlinable Lluís Vilanova
2017-01-10 20:08   ` Richard Henderson
2017-01-12 18:14     ` Lluís Vilanova
2016-12-28 18:41 ` [Qemu-devel] [PATCH v6 3/7] trace: [tcg] Delay changes to dynamic state when translating Lluís Vilanova
2017-01-09 17:01   ` Stefan Hajnoczi
2017-01-10 16:31     ` Paolo Bonzini
2017-01-11 16:16       ` Stefan Hajnoczi
2017-01-12 19:37         ` Lluís Vilanova
2017-01-12 21:25           ` Paolo Bonzini
2017-01-13 20:08             ` Lluís Vilanova
2016-12-28 18:41 ` [Qemu-devel] [PATCH v6 4/7] exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state Lluís Vilanova
2017-01-10 20:10   ` Richard Henderson
2016-12-28 18:41 ` [Qemu-devel] [PATCH v6 5/7] trace: [tcg] Do not generate TCG code to trace dinamically-disabled events Lluís Vilanova
2016-12-28 18:41 ` [Qemu-devel] [PATCH v6 6/7] trace: [tcg, trivial] Re-align generated code Lluís Vilanova
2017-01-12 11:19   ` Michael Tokarev
2017-01-12 18:46     ` Lluís Vilanova
2016-12-28 18:41 ` [Qemu-devel] [PATCH v6 7/7] trace: [trivial] Statically enable all guest events Lluís Vilanova
2017-01-09 17:04 ` Stefan Hajnoczi [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170109170434.GM30228@stefanha-x1.localdomain \
    --to=stefanha@redhat.com \
    --cc=eblake@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=vilanova@ac.upc.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.