From: Gleb Natapov <gleb@redhat.com>
To: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
linux-kernel@vger.kernel.org, David Sharp <dhsharp@google.com>,
yrl.pp-manager.tt@hitachi.com,
Steven Rostedt <rostedt@goodmis.org>,
Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH V3 0/1] kvm: Output TSC offset
Date: Sun, 23 Jun 2013 10:58:48 +0300 [thread overview]
Message-ID: <20130623075848.GO5832@redhat.com> (raw)
In-Reply-To: <20130612074341.25553.56877.stgit@yunodevel>
On Wed, Jun 12, 2013 at 04:43:41PM +0900, Yoshihiro YUNOMAE wrote:
> Hi All,
>
Applied, thanks.
> I'd like to propose a patch which adds a tracepoint at write_tsc_offset for
> tracing guests TSC offset changes. It is required for sorting the trace data of
> a guest and the host in chronological order.
>
> In a virtualization environment, it is difficult to analyze performance
> problems, such as a delay of I/O request on a guest. This is because multiple
> guests operate on the host. One of approaches for solving such kind of problems
> is to sort trace data of guests and the host in chronological order.
>
> Raw TSC can be chosen as a timestamp of ftrace. I think TSC is useful for
> merging trace data in chronological order by two reasons. One of the reasons is
> that guests can directly read raw TSC from the CPU using rdtsc operation. This
> means that raw TSC value is not software clock like sched_clock, so we don't
> need to consider about how the timestamp is calculated. The other is that TSC
> of recent x86 CPUs is constantly incremented. This means that we don't need to
> worry about pace of the timestamp. Therefore, choosing TSC as a timestamp for
> tracing is reasonable to integrate trace data of guests and a host.
>
> Here, we need to consider about just one matter for using TSC on guests. TSC
> value on a guest is always the host TSC plus the guest's "TSC offset". In other
> words, to merge trace data using TSC as timestamp in chronological order, we
> need to consider TSC offset of the guest.
>
> However, only the host kernel can read the TSC offset from VMCS and TSC offset
> is not output in anywhere now. In other words, tools in userland cannot get
> the TSC offset value, so we cannot merge trace data of guest and the host in
> chronological order. Therefore, I think the TSC offset should be exported for
> userland tools.
>
> In this patch, TSC offset is exported by the tracepoint kvm_write_tsc_offset
> on the host. TSC offset events will be very rare event because guests may
> seldom execute write_tsc. So if we enable this event for a normal buffer of
> ftrace, the events will be overwritten by other events. For a recent linux
> kernel, a multiple buffer function is available. So, I recommend to enable
> this event for a sub buffer of ftrace.
>
> <Example>
> We assume that wakeup-latency for a command is big on a guest. Normally
> we will use ftrace's wakeup-latency tracer or event tracer on the guest, but we
> may not be able to solve this problem. This is because guests often exit to
> the host for several reasons. In the next, we will use TSC as ftrace's timestamp
> and record the trace data on the guest and the host. Then, we get following
> data:
>
> /* guest data */
> comm-3826 [000] d...49836825726903: sched_wakeup: [detail]
> comm-3826 [000] d...49836832225344: sched_switch: [detail]
> /* host data */
> qemu-kvm-2687 [003] d...50550079203669: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079206816: kvm_entry: [detail]
> qemu-kvm-2687 [003] d...50550079240656: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079243467: kvm_entry: [detail]
> qemu-kvm-2687 [003] d...50550079256103: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079268391: kvm_entry: [detail]
> qemu-kvm-2687 [003] d...50550079280829: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079286028: kvm_entry: [detail]
>
> Since TSC offset is not considered, these data cannot be merged. If this trace
> data is shown like as follows, we will be able to understand the reason:
>
> qemu-kvm-2687 [003] d...50550079203669: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079206816: kvm_entry: [detail]
> comm-3826 [000] d.h.49836825726903: sched_wakeup: [detail] <=
> qemu-kvm-2687 [003] d...50550079240656: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079243467: kvm_entry: [detail]
> qemu-kvm-2687 [003] d...50550079256103: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079268391: kvm_entry: [detail]
> comm-3826 [000] d...49836832225344: sched_switch: [detail] <=
> qemu-kvm-2687 [003] d...50550079280829: kvm_exit: [detail]
> qemu-kvm-2687 [003] d...50550079286028: kvm_entry: [detail]
>
> In this case, we can understand wakeup-latency was big due to exit to host
> twice. Getting this data sorted in chronological order is our goal.
>
> To merge the data like previous pattern, we apply this patch. Then, we can
> get TSC offset of the guest as follows:
>
> # cat /sys/kernel/debug/tracing/instances/tsc_offset/trace
> .. d...4300151845072: kvm_write_tsc_offset: vcpu=0 prev=0 next=18446739773557710924
> ^ ^^^^^^^^^^^^^^^^^^^^
>
> We use this TSC offset value to a merge script and obtain the following data:
>
> $ ./trace-merge.pl -g guest_data -h host_data -t 18446739773557710924
>
> h qemu-kvm-2687 [003] d...50550079203669: kvm_exit: [detail]
> h qemu-kvm-2687 [003] d...50550079206816: kvm_entry: [detail]
> g comm-3826 [000] d.h.50550079226331: sched_wakeup: [detail] <=
> h qemu-kvm-2687 [003] d...50550079240656: kvm_exit: [detail]
> h qemu-kvm-2687 [003] d...50550079243467: kvm_entry: [detail]
> h qemu-kvm-2687 [003] d...50550079256103: kvm_exit: [detail]
> h qemu-kvm-2687 [003] d...50550079268391: kvm_entry: [detail]
> g comm-3826 [000] d...50550079279266: sched_switch: [detail] <=
> h qemu-kvm-2687 [003] d...50550079280829: kvm_exit: [detail]
> h qemu-kvm-2687 [003] d...50550079286028: kvm_entry: [detail]
> |
> \----guest/host
>
> In this summary, I suggest the patch which TSC offset for each guest can be
> output on the host.
>
> As one example, I will send a merge tool. This tool assumes the guest doesn't
> execute write_tsc.
>
> Changes in V2:
> - Use tracepoint for outputting TSC offset value instead of printk() [1/1]
>
> Changes in V3:
> - Add tracepoint in svm.c [1/1]
> - Record vcpu_id [1/1]
> - Change "previous=" to "prev=" in output result [1/1]
>
> Thanks!
>
> ---
>
> Yoshihiro YUNOMAE (1):
> kvm: Add a tracepoint write_tsc_offset
>
>
> arch/x86/kvm/svm.c | 10 +++++++++-
> arch/x86/kvm/trace.h | 21 +++++++++++++++++++++
> arch/x86/kvm/vmx.c | 7 ++++++-
> arch/x86/kvm/x86.c | 1 +
> 4 files changed, 37 insertions(+), 2 deletions(-)
>
> --
> Yoshihiro YUNOMAE
> Software Platform Research Dept. Linux Technology Center
> Hitachi, Ltd., Yokohama Research Laboratory
> E-mail: yoshihiro.yunomae.ez@hitachi.com
--
Gleb.
next prev parent reply other threads:[~2013-06-23 7:59 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-12 7:43 [PATCH V3 0/1] kvm: Output TSC offset Yoshihiro YUNOMAE
2013-06-12 7:43 ` [PATCH V3 1/1] kvm: Add a tracepoint write_tsc_offset Yoshihiro YUNOMAE
2013-06-12 22:44 ` Marcelo Tosatti
2013-06-24 15:02 ` Paolo Bonzini
2013-06-12 7:46 ` [EXAMPLE] tools: a tool for merging trace data of a guest and a host Yoshihiro YUNOMAE
2013-06-23 7:58 ` Gleb Natapov [this message]
2013-06-25 10:16 ` [PATCH] [BUGFIX] Fix build error caused by an undefinition of the kvm_write_tsc_offset tracepoint for x86_32 Yoshihiro YUNOMAE
2013-06-25 10:18 ` Yoshihiro YUNOMAE
2013-06-25 10:43 ` Gleb Natapov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130623075848.GO5832@redhat.com \
--to=gleb@redhat.com \
--cc=dhsharp@google.com \
--cc=hidehiro.kawai.ez@hitachi.com \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=masami.hiramatsu.pt@hitachi.com \
--cc=mingo@redhat.com \
--cc=mtosatti@redhat.com \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=yoshihiro.yunomae.ez@hitachi.com \
--cc=yrl.pp-manager.tt@hitachi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).