linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Gleb Natapov <gleb@redhat.com>
To: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	linux-kernel@vger.kernel.org, David Sharp <dhsharp@google.com>,
	yrl.pp-manager.tt@hitachi.com,
	Steven Rostedt <rostedt@goodmis.org>,
	Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH V3 0/1] kvm: Output TSC offset
Date: Sun, 23 Jun 2013 10:58:48 +0300	[thread overview]
Message-ID: <20130623075848.GO5832@redhat.com> (raw)
In-Reply-To: <20130612074341.25553.56877.stgit@yunodevel>

On Wed, Jun 12, 2013 at 04:43:41PM +0900, Yoshihiro YUNOMAE wrote:
> Hi All,
> 
Applied, thanks.

> I'd like to propose a patch which adds a tracepoint at write_tsc_offset for
> tracing guests TSC offset changes. It is required for sorting the trace data of
> a guest and the host in chronological order.
> 
> In a virtualization environment, it is difficult to analyze performance
> problems, such as a delay of I/O request on a guest. This is because multiple
> guests operate on the host. One of approaches for solving such kind of problems
> is to sort trace data of guests and the host in chronological order.
> 
> Raw TSC can be chosen as a timestamp of ftrace. I think TSC is useful for
> merging trace data in chronological order by two reasons. One of the reasons is 
> that guests can directly read raw TSC from the CPU using rdtsc operation. This
> means that raw TSC value is not software clock like sched_clock, so we don't
> need to consider about how the timestamp is calculated. The other is that TSC
> of recent x86 CPUs is constantly incremented. This means that we don't need to
> worry about pace of the timestamp. Therefore, choosing TSC as a timestamp for
> tracing is reasonable to integrate trace data of guests and a host.
> 
> Here, we need to consider about just one matter for using TSC on guests. TSC
> value on a guest is always the host TSC plus the guest's "TSC offset". In other
> words, to merge trace data using TSC as timestamp in chronological order, we
> need to consider TSC offset of the guest.
> 
> However, only the host kernel can read the TSC offset from VMCS and TSC offset
> is not output in anywhere now. In other words, tools in userland cannot get
> the TSC offset value, so we cannot merge trace data of guest and the host in
> chronological order. Therefore, I think the TSC offset should be exported for
> userland tools.
> 
> In this patch, TSC offset is exported by the tracepoint kvm_write_tsc_offset
> on the host. TSC offset events will be very rare event because guests may
> seldom execute write_tsc. So if we enable this event for a normal buffer of
> ftrace, the events will be overwritten by other events. For a recent linux
> kernel, a multiple buffer function is available. So, I recommend to enable
> this event for a sub buffer of ftrace.
> 
> <Example>
> We assume that wakeup-latency for a command is big on a guest. Normally
> we will use ftrace's wakeup-latency tracer or event tracer on the guest, but we
> may not be able to solve this problem. This is because guests often exit to
> the host for several reasons. In the next, we will use TSC as ftrace's timestamp
> and record the trace data on the guest and the host. Then, we get following
> data:
> 
>  /* guest data */
>             comm-3826  [000] d...49836825726903: sched_wakeup: [detail]
>             comm-3826  [000] d...49836832225344: sched_switch: [detail]
>  /* host data */
>         qemu-kvm-2687  [003] d...50550079203669: kvm_exit: [detail]
>         qemu-kvm-2687  [003] d...50550079206816: kvm_entry: [detail]
>         qemu-kvm-2687  [003] d...50550079240656: kvm_exit: [detail]
>         qemu-kvm-2687  [003] d...50550079243467: kvm_entry: [detail]
>         qemu-kvm-2687  [003] d...50550079256103: kvm_exit: [detail]
>         qemu-kvm-2687  [003] d...50550079268391: kvm_entry: [detail]
>         qemu-kvm-2687  [003] d...50550079280829: kvm_exit: [detail]
>         qemu-kvm-2687  [003] d...50550079286028: kvm_entry: [detail]
> 
> Since TSC offset is not considered, these data cannot be merged. If this trace
> data is shown like as follows, we will be able to understand the reason:
> 
>         qemu-kvm-2687  [003] d...50550079203669: kvm_exit: [detail]
>         qemu-kvm-2687  [003] d...50550079206816: kvm_entry: [detail]
>             comm-3826  [000] d.h.49836825726903: sched_wakeup: [detail] <=
>         qemu-kvm-2687  [003] d...50550079240656: kvm_exit: [detail]
>         qemu-kvm-2687  [003] d...50550079243467: kvm_entry: [detail]
>         qemu-kvm-2687  [003] d...50550079256103: kvm_exit: [detail]
>         qemu-kvm-2687  [003] d...50550079268391: kvm_entry: [detail]
>             comm-3826  [000] d...49836832225344: sched_switch: [detail] <=
>         qemu-kvm-2687  [003] d...50550079280829: kvm_exit: [detail]
>         qemu-kvm-2687  [003] d...50550079286028: kvm_entry: [detail]
> 
> In this case, we can understand wakeup-latency was big due to exit to host
> twice. Getting this data sorted in chronological order is our goal.
> 
> To merge the data like previous pattern, we apply this patch. Then, we can
> get TSC offset of the guest as follows:
> 
> # cat /sys/kernel/debug/tracing/instances/tsc_offset/trace
> .. d...4300151845072: kvm_write_tsc_offset: vcpu=0 prev=0 next=18446739773557710924
>                                                         ^      ^^^^^^^^^^^^^^^^^^^^
> 
> We use this TSC offset value to a merge script and obtain the following data:
> 
> $ ./trace-merge.pl -g guest_data -h host_data -t 18446739773557710924
> 
> h        qemu-kvm-2687  [003] d...50550079203669: kvm_exit: [detail]
> h        qemu-kvm-2687  [003] d...50550079206816: kvm_entry: [detail]
> g            comm-3826  [000] d.h.50550079226331: sched_wakeup: [detail] <=
> h        qemu-kvm-2687  [003] d...50550079240656: kvm_exit: [detail]
> h        qemu-kvm-2687  [003] d...50550079243467: kvm_entry: [detail]
> h        qemu-kvm-2687  [003] d...50550079256103: kvm_exit: [detail]
> h        qemu-kvm-2687  [003] d...50550079268391: kvm_entry: [detail]
> g            comm-3826  [000] d...50550079279266: sched_switch: [detail] <=
> h        qemu-kvm-2687  [003] d...50550079280829: kvm_exit: [detail]
> h        qemu-kvm-2687  [003] d...50550079286028: kvm_entry: [detail]
> |
> \----guest/host
> 
> In this summary, I suggest the patch which TSC offset for each guest can be
> output on the host.
> 
> As one example, I will send a merge tool. This tool assumes the guest doesn't
> execute write_tsc.
> 
> Changes in V2:
> - Use tracepoint for outputting TSC offset value instead of printk() [1/1]
> 
> Changes in V3:
> - Add tracepoint in svm.c [1/1]
> - Record vcpu_id [1/1]
> - Change "previous=" to "prev=" in output result [1/1]
> 
> Thanks!
> 
> ---
> 
> Yoshihiro YUNOMAE (1):
>       kvm: Add a tracepoint write_tsc_offset
> 
> 
>  arch/x86/kvm/svm.c   |   10 +++++++++-
>  arch/x86/kvm/trace.h |   21 +++++++++++++++++++++
>  arch/x86/kvm/vmx.c   |    7 ++++++-
>  arch/x86/kvm/x86.c   |    1 +
>  4 files changed, 37 insertions(+), 2 deletions(-)
> 
> -- 
> Yoshihiro YUNOMAE
> Software Platform Research Dept. Linux Technology Center
> Hitachi, Ltd., Yokohama Research Laboratory
> E-mail: yoshihiro.yunomae.ez@hitachi.com

--
			Gleb.

  parent reply	other threads:[~2013-06-23  7:59 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-12  7:43 [PATCH V3 0/1] kvm: Output TSC offset Yoshihiro YUNOMAE
2013-06-12  7:43 ` [PATCH V3 1/1] kvm: Add a tracepoint write_tsc_offset Yoshihiro YUNOMAE
2013-06-12 22:44   ` Marcelo Tosatti
2013-06-24 15:02   ` Paolo Bonzini
2013-06-12  7:46 ` [EXAMPLE] tools: a tool for merging trace data of a guest and a host Yoshihiro YUNOMAE
2013-06-23  7:58 ` Gleb Natapov [this message]
2013-06-25 10:16   ` [PATCH] [BUGFIX] Fix build error caused by an undefinition of the kvm_write_tsc_offset tracepoint for x86_32 Yoshihiro YUNOMAE
2013-06-25 10:18     ` Yoshihiro YUNOMAE
2013-06-25 10:43       ` Gleb Natapov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130623075848.GO5832@redhat.com \
    --to=gleb@redhat.com \
    --cc=dhsharp@google.com \
    --cc=hidehiro.kawai.ez@hitachi.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=masami.hiramatsu.pt@hitachi.com \
    --cc=mingo@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=yoshihiro.yunomae.ez@hitachi.com \
    --cc=yrl.pp-manager.tt@hitachi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).