From: Joao Martins <joao.m.martins@oracle.com>
To: Owen Hofmann <osh@google.com>, Andy Lutomirski <luto@amacapital.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	xen-devel <xen-devel@lists.xen.org>,
	KVM General <kvm@vger.kernel.org>,
	Peter Hornyack <peterhornyack@google.com>
Subject: Re: What time is it kvm-clock?
Date: Thu, 25 Feb 2016 12:22:02 +0000
Message-ID: <56CEF1EA.6010809__18894.8961244479$1456403008$gmane$org@oracle.com>
In-Reply-To: <CANqFzA6TgdsoUyjKiwNa++et_wWnQgFRhEUZ6NE3_CShh0wc6g@mail.gmail.com>

On 02/24/2016 07:55 PM, Owen Hofmann wrote:
>>>> not-really-well-defined hybrid?
>>>>
>>>> --Andy
>>>
>>> 1. What is not well defined? I fail to spot anything
>>> specific in Owen's e-mail.
>>
>> If I start a guest and query kvm-clock, I get a nanosecond count.
>> AFAIK it is, in fact, ill-defined or at least ill-documented what that
>> nanosecond count means.
> 
> To try to put the thoughts into specific questions:
> - What is the value returned by KVM_GET_CLOCK? How should it be used?
> - What is the value returned by a guest read of the kvm-clock
> structure? (This is also Andy's question)
> To me there are two possibilities for how to answer the second question:
> 1) kvm-clock is better than the host TSC: it propagates updates to
> frequency from the host (== CLOCK_MONOTONIC)
> 2) kvm-clock is a paravirtual source of truth on the guest TSC:
> whether it is stable and its approximate frequency. If the guest needs
> to synchronize to an external source of time, it runs NTP. (==
> CLOCK_MONOTONIC_RAW)
> 
> To me, (1) sounds hard, (2) sounds easy, and it's not clear how much
> additional value (1) provides. The recent patches Paolo sent move
> kvm-clock in the direction of (1), and it sounds like Andy and I might
> have slightly different opinions as well. But mostly I would like some
> clarity as to which is the stated goal for kvm-clock, and to have the
> implementation pick only one of those options.
> 
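For reference, the two host clocks named in options (1) and (2) can be sampled side by side from userspace. This is a minimal sketch, assuming a Linux host and Python 3.7 or later; the helper name sample_clocks is made up for illustration and is not part of any KVM interface.

```python
import time


def sample_clocks():
    """Return (CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW) readings in nanoseconds."""
    # CLOCK_MONOTONIC is subject to NTP frequency adjustment (slewing).
    mono = time.clock_gettime_ns(time.CLOCK_MONOTONIC)
    # CLOCK_MONOTONIC_RAW follows the raw hardware counter, without NTP adjustment.
    raw = time.clock_gettime_ns(time.CLOCK_MONOTONIC_RAW)
    return mono, raw


if __name__ == "__main__":
    m0, r0 = sample_clocks()
    time.sleep(1.0)
    m1, r1 = sample_clocks()
    # On a host whose clock is being slewed, the two elapsed values differ slightly.
    print("CLOCK_MONOTONIC elapsed ns:    ", m1 - m0)
    print("CLOCK_MONOTONIC_RAW elapsed ns:", r1 - r0)
```

Over a one-second window the difference is usually tiny; it only becomes visible when accumulated over long uptimes, which is the case the thread is concerned with.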
>>>>> Since we cannot change the past, having kvmclock synchronize with the
>>>>> host TSC frequency is the only choice we can make.
> 
> I'm not sure I understand what previous decision locks kvm-clock into
> the current path. Can you clarify?
> 
> On Wed, Feb 24, 2016 at 11:38 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Wed, Feb 24, 2016 at 9:38 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>>> On Wed, Feb 24, 2016 at 08:44:40AM -0800, Andy Lutomirski wrote:
>>>> On Wed, Feb 24, 2016 at 6:14 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>>
>>>>>
>>>>> On 24/02/2016 03:31, Owen Hofmann wrote:
>>>>>> Specifically, what underlying source of time should be exposed through
>>>>>> kvm-clock and other paravirtual ABIs like the HyperV reference tsc
>>>>>> page?  Recently a couple of threads on kvm-list, along with attempts
>>>>>> to produce reliable behavior from kvm-clock on our systems have
>>>>>> highlighted a tension between the current implementation of kvm-clock
>>>>>> and potentially diverging goals for paravirt time. Here are a few:
>>>>>>
>>>>>> 1) kvmclock doesn't work, help?: http://www.spinics.net/lists/kvm/msg125039.html
>>>>>> 2) kvmclock: improve accuracy: http://www.spinics.net/lists/kvm/msg127215.html
>>>>>> 3) KVM-clock: http://www.spinics.net/lists/kvm/msg127774.html
>>>>>>
>>>>>> This question is mostly in regards to kvm-clock in masterclock mode
>>>>>> (with PVCLOCK_TSC_STABLE set). In this mode, is kvm-clock intended to
>>>>>> expose a source of time that is more 'true' than the underlying TSC?
>>>>>> For example, by passing through NTP correction from the host. For the
>>>>>> current implementation, the answer seems to be... why not both? Once
>>>>>> programmed, kvm-clock or the HyperV TSC page will advance with the TSC
>>>>>> multiplied by the frequency specified by kvm. On the other hand,
>>>>>> KVM_GET_CLOCK, KVM_SET_CLOCK, and the Windows reference counter MSR
>>>>>> are measured against corrected time from the host. A guest reading its
>>>>>> pvclock gets a very different result from a host KVM_GET_CLOCK if the
>>>>>> guest has run long enough for the TSC to diverge from NTP time.
>>>>>
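The "TSC multiplied by the frequency specified by kvm" step can be sketched as follows. The field names follow struct pvclock_vcpu_time_info from the kernel's pvclock ABI; the class and function names are illustrative only, and the sketch leaves out the version-field retry loop that a real guest needs to get a consistent snapshot.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1  # emulate 64-bit wraparound


@dataclass
class PvclockInfo:
    tsc_timestamp: int      # guest TSC value at the last host update
    system_time: int        # nanoseconds at the last host update
    tsc_to_system_mul: int  # 32.32 fixed-point multiplier
    tsc_shift: int          # signed pre-shift applied to the TSC delta


def pvclock_read_ns(tsc, info):
    """Extrapolate nanoseconds from the last update using the current TSC."""
    delta = (tsc - info.tsc_timestamp) & MASK64
    if info.tsc_shift >= 0:
        delta = (delta << info.tsc_shift) & MASK64
    else:
        delta >>= -info.tsc_shift
    scaled = (delta * info.tsc_to_system_mul) >> 32
    return (info.system_time + scaled) & MASK64


if __name__ == "__main__":
    # Assumed example values: a 2 GHz TSC, so the multiplier is 0.5 in 32.32 format.
    info = PvclockInfo(tsc_timestamp=1000, system_time=5_000_000,
                       tsc_to_system_mul=1 << 31, tsc_shift=0)
    print(pvclock_read_ns(1000 + 2_000_000_000, info))
```

Between host updates nothing in this computation sees host NTP corrections, so the result advances purely at the rate encoded in the multiplier and shift.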
>>>>> Right, in fact that's why QEMU is not really using KVM_GET_CLOCK
>>>>> anymore.  In retrospect, the "fix" in QEMU was probably a bad idea.  It
>>>>> would have been better to fix KVM_GET_CLOCK.
>>>>>
>>>>>> To me, kvm-clock and the HyperV TSC page are extremely effective as
>>>>>> simply a more enlightened path to the host TSC. Maintaining a
>>>>>> high-performance path to the TSC in the face of updates is tricky -
>>>>>> see the extended comment in pvclock_update_vm_gtod_copy, or the
>>>>>> discussion on the patchset in (2). Is the cost of auditing that the
>>>>>> path from host gettimeofday update -> kvm -> guest pvclock -> guest
>>>>>> gettimeofday both tracks host time correctly and does not produce any
>>>>>> backwards warps worth the added value, if it exists? As an
>>>>>> alternative, implementing KVM_GET_CLOCK or the reference time MSR as a
>>>>>> function of the last update to kvm-clock or the reference TSC page,
>>>>>> respectively, sounds very straightforward.
>>>>>
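The size of the divergence between a guest pvclock read and an NTP-corrected host clock can be estimated with plain arithmetic. This is a toy model, not KVM code: the 2 GHz TSC, the 10 ppm NTP correction, and all function names are assumptions chosen for illustration.

```python
NS_PER_SEC = 1_000_000_000


def guest_pvclock_ns(system_time_ns, elapsed_tsc, tsc_hz):
    """Guest view: last update plus the TSC delta scaled at the advertised frequency."""
    return system_time_ns + elapsed_tsc * NS_PER_SEC // tsc_hz


def host_corrected_ns(system_time_ns, elapsed_tsc, tsc_hz, ntp_ppm):
    """Host view: the same interval with an NTP frequency correction in ppm."""
    raw_ns = elapsed_tsc * NS_PER_SEC // tsc_hz
    return system_time_ns + raw_ns + raw_ns * ntp_ppm // 1_000_000


def divergence_ns(elapsed_seconds, tsc_hz=2_000_000_000, ntp_ppm=10):
    """Host-minus-guest difference after running without a kvm-clock update."""
    elapsed_tsc = elapsed_seconds * tsc_hz
    guest = guest_pvclock_ns(0, elapsed_tsc, tsc_hz)
    host = host_corrected_ns(0, elapsed_tsc, tsc_hz, ntp_ppm)
    return host - guest


if __name__ == "__main__":
    # One day of uptime under the assumed 10 ppm correction.
    print(divergence_ns(86_400), "ns")
```

If KVM_GET_CLOCK were computed the way guest_pvclock_ns is here, that is, from the last kvm-clock update, the host and guest values would agree by construction in this model.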
>>>>> Yes, we could do that too.
>>>>>
>>>>> I think that vgettsc and do_monotonic_boot also would have to use the
>>>>> TSC frequency instead of the NTP-adjusted host clock.
>>>>>
>>>>>> (Outside of masterclock mode, the requirement that the client
>>>>>> synchronizes across cpus for monotonicity smooths over a lot of
>>>>>> complexity - periodically updating kvm-clock to the current time is
>>>>>> simple and works.)
>>>>>>
>>>>>> Regardless of my opinion, I think that a clear statement of the design
>>>>>> goals for kvm-clock (and kvm's implementation of the reference TSC
>>>>>> page) would be valuable.
>>>>>
>>>>> Since we cannot change the past, having kvmclock synchronize with the
>>>>> host TSC frequency is the only choice we can make.
>>>>>
>>>>
>>>> Could we introduce a new kvm-clock or perhaps opt-in mode that:
>>>>
>>>> a) uses hypervisor-supplied IO pages and,
>>>>
>>>> b) synchronizes to host CLOCK_MONOTONIC instead of some bizarre
>>>> non-suspend-resume-safe
>>>
>>> Please be accurate. It is suspend safe.
>>>
>>
>> I'm being accurate enough, I think.  Master clock mode is not suspend
>> safe.  When I suspend and resume my laptop, the master clock code
>> determines that it messed up and disables itself.  Unloading and
>> reloading the kvm modules turns it back on until the next suspend.
>>
>> I *think* that the underlying issue is that kvm-clock's master clock
>> tracks something ill-defined instead of exposing a well-defined host
>> clock.  If the master clock accurately exposed CLOCK_MONOTONIC_RAW or
>> CLOCK_MONOTONIC (I much prefer the latter), then it would be fine
>> across suspend/resume.
>>
>> I think that part of the reason that it doesn't accurately export a
>> host clock is that the worst-case performance of atomic updates to the
>> pvclock data structures is abysmal due to having the data structures
>> living in guest memory.  To be able to access and update all relevant
>> structures during host clock refreshes, the host would need to pin
>> all pvclock pages for all running guests.  This could be partially
>> mitigated by only updating pvclock data for running vcpus and for vcpu
>> 0 for all running guests synchronously and deferring the rest (8k
>> pinned per host cpu, max), but it would still be a mess.
>>
>> If someone redefined the interface so that the *host* could allocate
>> it, then the pages could be shared across all guests and this would be
>> vastly simpler and faster.
>>
>> Also, kvm-clock should really coordinate with the core timekeeping
>> code to handle this sort of time base export rather than hooking into
>> the host vdso support code.
>>
>>>> not-really-well-defined hybrid?
>>>>
>>>> --Andy
>>>
>>> 1. What is not well defined? I fail to spot anything
>>> specific in Owen's e-mail.
>>
>> If I start a guest and query kvm-clock, I get a nanosecond count.
>> AFAIK it is, in fact, ill-defined or at least ill-documented what that
>> nanosecond count means.
>>
>> [cc: Joao.  Xen may want to take this stuff into consideration.]
[CC-ing xen-devel folks too]

Joao
>> --Andy
