From: Marcelo Tosatti <mtosatti@redhat.com>
To: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Cc: Steven Rostedt <rostedt@goodmis.org>,
David Sharp <dhsharp@google.com>,
"H. Peter Anvin" <hpa@zytor.com>,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Joerg Roedel <joerg.roedel@amd.com>,
Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>,
Ingo Molnar <mingo@redhat.com>, Avi Kivity <avi@redhat.com>,
yrl.pp-manager.tt@hitachi.com,
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: Re: Re: Re: Re: [RFC PATCH 0/2] kvm/vmx: Output TSC offset
Date: Mon, 26 Nov 2012 21:16:53 -0200
Message-ID: <20121126231653.GA20391@amt.cnet>
In-Reply-To: <50B34CE6.9070207@hitachi.com>
On Mon, Nov 26, 2012 at 08:05:10PM +0900, Yoshihiro YUNOMAE wrote:
> >>>500h. event tsc_write tsc_offset=-3000
> >>>
> >>>Then a guest trace containing events with a TSC timestamp.
> >>>Which tsc_offset to use?
> >>>
> >>>(that is the problem, which unless i am mistaken can only be solved
> >>>easily if the guest can convert RDTSC -> TSC of host).
> >>
> >>There are the following three cases in which the TSC offset changes:
> >> 1. Reset TSC at guest boot time
> >> 2. Adjust TSC offset due to some host's problems
> >> 3. Write TSC on guests
> >>The scenario which you mentioned is case 3, so we'll discuss this case.
> >>Here, we assume that the guest is allocated a single CPU for the sake
> >>of simplicity.
> >>
> >>If a guest executes write_tsc, the TSC value jumps forward or backward.
> >>For the forward case, trace data are as follows:
> >>
> >><      host      >              <     guest     >
> >>cycles  events                  cycles   events
> >> 3000   tsc_offset=-2950
> >> 3001   kvm_enter
> >>                                    53   eventX
> >>                                  ....
> >>                                   100   (write_tsc=+900)
> >> 3060   kvm_exit
> >> 3075   tsc_offset=-2050
> >> 3080   kvm_enter
> >>                                  1050   event1
> >>                                  1055   event2
> >>                                   ...
> >>
> >>
> >>This case is simple. The guest TSC of the first kvm_enter is calculated
> >>as follows:
> >>
> >> (host TSC of kvm_enter) + (current tsc_offset) = 3001 - 2950 = 51
> >>
> >>Similarly, the guest TSC of the second kvm_enter is 130. So, the guest
> >>events between 51 and 130, that is, 53 eventX, are inserted between the
> >>first pair of kvm_enter and kvm_exit. To insert the guest events
> >>between 51 and 130, we convert the guest TSC to the host TSC using the
> >>TSC offset -2950.
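
The arithmetic in the forward case can be checked with a small sketch.
This is only an illustration in Python; the helper names to_guest_tsc
and to_host_tsc are made up for the example and are not kernel or
trace-cmd functions. It assumes a single vCPU and a constant offset
of -2950 for the whole first kvm_enter/kvm_exit window.

```python
def to_guest_tsc(host_tsc: int, tsc_offset: int) -> int:
    """Guest-visible TSC: host TSC plus the offset written for the vCPU."""
    return host_tsc + tsc_offset


def to_host_tsc(guest_tsc: int, tsc_offset: int) -> int:
    """Inverse conversion: subtract the offset from the guest TSC."""
    return guest_tsc - tsc_offset


OFFSET = -2950  # tsc_offset event at host cycle 3000 in the table

first_enter = to_guest_tsc(3001, OFFSET)   # first kvm_enter seen from the guest
second_enter = to_guest_tsc(3080, OFFSET)  # second kvm_enter under the old offset
event_x_host = to_host_tsc(53, OFFSET)     # guest eventX placed on the host axis

print(first_enter, second_enter, event_x_host)  # 51 130 3003
```

With these numbers eventX lands at host cycle 3003, between kvm_enter
(3001) and kvm_exit (3060), as the quoted text describes.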
> >>
> >>For the backward case, trace data are as follows:
> >>
> >><      host      >              <     guest     >
> >>cycles  events                  cycles   events
> >> 3000   tsc_offset=-2950
> >> 3001   kvm_enter
> >>                                    53   eventX
> >>                                  ....
> >>                                   100   (write_tsc=-50)
> >> 3060   kvm_exit
> >> 3075   tsc_offset=-3000
> >> 3080   kvm_enter
> >>                                    90   event1
> >>                                    95   event2
> >>                                   ...
> >
> >  3400                              100   (write_tsc=-50)
> >
> >                                     90   event3
> >                                     95   event4
> >
> >>As you say, in this case, the previous method is invalid. When we
> >>calculate the guest TSC value for the tsc_offset=-3000 event, the value
> >>is 75 on the guest. This makes it look like an event prior to the
> >>write_tsc=-50 event. So, we need to consider more.
> >>
> >>In this case, it is important that we can tell where the guest
> >>executes write_tsc or the host rewrites the TSC offset. write_tsc on
> >>the guest equals wrmsr 0x00000010, so this instruction induces vm_exit.
> >>This implies that the guest does not run while the host changes the TSC
> >>offset on the cpu. In other words, the guest cannot use the new TSC
> >>before the host writes the new TSC offset. So, if timestamps on the
> >>guest are not monotonically increasing, we can tell that the guest
> >>executed write_tsc. Moreover, from the region where the timestamp
> >>decreases, we can tell when the host rewrote the TSC offset in the
> >>guest trace data. Therefore, we can sort trace data in chronological
> >>order.
> >
> >This requires an entire trace of events. That is, to be able
> >to reconstruct timeline you require the entire trace from the moment
> >guest starts. So that you can correlate wrmsr-to-tsc on the guest with
> >vmexit-due-to-tsc-write on the host.
> >
> >Which means that running out of space for trace buffer equals losing
> >ability to order events.
> >
> >Is that desirable? It seems cumbersome to me.
>
> As you say, tracing events can overwrite important events like
> kvm_exit/entry or write_tsc_offset. So, this feature needs Steven's
> multiple buffers. Normal events, which hit often, are recorded in
> buffer A, and important events, which hit rarely, are recorded in
> buffer B. In our case, the important event is write_tsc_offset.
>
> >Also the need to correlate each write_tsc event in the guest trace
> >with a corresponding tsc_offset write in the host trace means that it
> >is _necessary_ for the guest and host to enable tracing simultaneously.
> >Correct?
> >
> >Also, there are WRMSR executions in the guest for which there is
> >no event in the trace buffer. From SeaBIOS, during boot.
> >In that case, there is no explicit event in the guest trace which you
> >can correlate with tsc_offset changes in the host side.
>
> I understand what you want to say, but we don't correlate the
> write_tsc event with the write_tsc_offset event directly. This is
> because the write_tsc tracepoint (also for the WRMSR instruction) does
> not exist in the current kernel. So, in the previous mail
> (https://lkml.org/lkml/2012/11/22/53), I suggested a method that does
> not need the write_tsc tracepoint.
>
> In that method, we enable ftrace before the guest boots, and we need to
> keep all write_tsc_offset events in the buffer. If we forget to enable
> ftrace or don't use multiple buffers, we cannot use this feature.

Yoshihiro,

Better to have a single method to convert guest TSC to host TSC.

Ok, if you keep both TSC offset write events and guest TSC writes (*)
in separate buffers which are persistent, then you can convert
guest-tsc-events to host-tsc.

Can you please write a succinct but complete description of the method
so it can be verified?

(*) note guest TSC writes have no events because Linux does not write
to TSC offset, but a "system booted" event can be used to correlate
with the TSC write by BIOS.

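As a starting point for such a description, here is a minimal sketch of
one possible conversion. It is not the method from the earlier mail, only
an illustration of the idea. Assumptions: a single vCPU; offset_writes
holds (host_tsc, tsc_offset) pairs from the persistent buffer of TSC
offset writes, in host order; guest_events holds (guest_tsc, name) pairs
in the order they were recorded. The function name, the tuple layout and
the "98 eventY" sample (standing in for the "...." rows of the table) are
hypothetical.

```python
from typing import List, Tuple


def merge_guest_events(
    offset_writes: List[Tuple[int, int]],
    guest_events: List[Tuple[int, str]],
) -> List[Tuple[int, str]]:
    """Return guest events as (host_tsc, name), switching offsets as needed."""
    if not offset_writes:
        raise ValueError("need at least one TSC offset write event")
    idx = 0            # index of the offset currently assumed to be in effect
    prev_guest = None  # previous guest timestamp, to spot backward jumps
    merged = []
    for guest_tsc, name in guest_events:
        # Backward jump: the guest timestamp decreased, so the guest must
        # have written its TSC and the next recorded offset applies.
        if (prev_guest is not None and guest_tsc < prev_guest
                and idx + 1 < len(offset_writes)):
            idx += 1
        # Forward jump: under the current offset the event would land at or
        # after the next offset write on the host, so that offset applies.
        while (idx + 1 < len(offset_writes)
               and guest_tsc - offset_writes[idx][1] >= offset_writes[idx + 1][0]):
            idx += 1
        merged.append((guest_tsc - offset_writes[idx][1], name))
        prev_guest = guest_tsc
    return merged


# Forward case from the quoted table (write_tsc=+900).
forward = merge_guest_events(
    [(3000, -2950), (3075, -2050)],
    [(53, "eventX"), (1050, "event1"), (1055, "event2")],
)
# Backward case (write_tsc=-50); eventY is a made-up event before the write.
backward = merge_guest_events(
    [(3000, -2950), (3075, -3000)],
    [(53, "eventX"), (98, "eventY"), (90, "event1"), (95, "event2")],
)
print(forward)   # [(3003, 'eventX'), (3100, 'event1'), (3105, 'event2')]
print(backward)  # [(3003, 'eventX'), (3048, 'eventY'), (3090, 'event1'), (3095, 'event2')]
```

Note the limitation: without eventY the backward case is ambiguous, since
90 under the old offset maps to host cycle 3040, which still falls inside
the first kvm_enter/kvm_exit window. A backward write is only visible here
when recorded guest timestamps actually decrease.
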
Thanks
> So, I think as Peter says, the host should also export TSC offset
> information to /proc/pid/kvm/*.
>
> >If the guest had access to the host TSC value, these complications
> >would disappear.
>
> As a debugging mode, the TSC offset zero mode will be useful, I think.
> (not for the real operation mode)
>
> Thanks,
> --
> Yoshihiro YUNOMAE
> Software Platform Research Dept. Linux Technology Center
> Hitachi, Ltd., Yokohama Research Laboratory
> E-mail: yoshihiro.yunomae.ez@hitachi.com