From: Andy Lutomirski <luto@amacapital.net>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Maxim Levitsky <mlevitsk@redhat.com>,
	kvm list <kvm@vger.kernel.org>, "H. Peter Anvin" <hpa@zytor.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Jim Mattson <jmattson@google.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	"open list:KERNEL SELFTEST FRAMEWORK" 
	<linux-kselftest@vger.kernel.org>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>,
	open list <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)"
	<x86@kernel.org>, Joerg Roedel <joro@8bytes.org>,
	Borislav Petkov <bp@alien8.de>, Shuah Khan <shuah@kernel.org>,
	Andrew Jones <drjones@redhat.com>,
	Oliver Upton <oupton@google.com>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>
Subject: Re: [PATCH v2 1/3] KVM: x86: implement KVM_{GET|SET}_TSC_STATE
Date: Tue, 8 Dec 2020 12:32:48 -0800
Message-ID: <301491B7-DEB6-41ED-B8FD-657B864696CF@amacapital.net>
In-Reply-To: <87h7ow2j91.fsf@nanos.tec.linutronix.de>


> On Dec 8, 2020, at 11:25 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Tue, Dec 08 2020 at 09:43, Andy Lutomirski wrote:
>> On Tue, Dec 8, 2020 at 6:23 AM Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> It looks like it tries to accomplish the right goal, but in a rather
>> roundabout way.  The host knows how to convert from TSC to
>> CLOCK_REALTIME, and ptp_kvm.c exposes this data to the guest.  But,
>> rather than just making the guest use the same CLOCK_REALTIME data as
>> the host, ptp_kvm.c seems to expose information to usermode that a
>> user daemon could use to attempt (with some degree of error?) to make
>> the guest kernel track CLOCK_REALTIME.  This seems inefficient and
>> dubiously accurate.
>> 
>> My feature request is for this to be fully automatic and completely
>> coherent.  I would like for a host user program and a guest user
>> program to be able to share memory, run concurrently, and use the
>> shared memory to exchange CLOCK_REALTIME values without ever observing
>> the clock going backwards.  This ought to be doable.  Ideally the
>> result should even be usable for Spanner-style synchronization
>> assuming the host clock is good enough.  Also, this whole thing should
>> work without needing to periodically wake the guest to remain
>> synchronized.  If the guest sleeps for two minutes (full nohz-idle, no
>> guest activity at all), the host makes a small REALTIME frequency
>> adjustment, and then the guest runs user code that reads
>> CLOCK_REALTIME, the guest clock should still be fully synchronized
>> with the host.  I don't think that ptp_kvm.c-style synchronization can
>> do this.
> 
> One issue here is that guests might want to run their own NTP/PTP. One
> reason to do that is that some people prefer the leap second smearing
> NTP servers. 

I would hope that using this piece would be optional for the guest. Guests should be able to use just the CLOCK_MONOTONIC_RAW part, or the fancier stuff, at their option.

(Hmm, it would, in principle, be possible for a guest to use the host’s TAI but still smear leap seconds. Even without virt, smearing could be a per-timens option.)

> 
>> tglx etc, I think that doing this really really nicely might involve
>> promoting something like the current vDSO data structures to ABI -- a
>> straightforward-ish implementation would be for the KVM host to export
>> its vvar clock data to the guest and for the guest to use it, possibly
>> with an offset applied.  The offset could work a lot like timens works
>> today.
> 
> Works nicely if the guest TSC is not scaled. But that means that on
> migration the raw TSC usage in the guest is borked because the new host
> might have a different TSC frequency.
> 
> If you use TSC scaling then the conversion needs to take TSC scaling
> into account which needs some thought. And the guest would need to read
> the host conversion from 'vdso data' and the scaling from the next page
> (per guest) and then still has to support timens. Doable but adds extra
> overhead on every time read operation.

Is the issue that scaling would result in a different guest vs host frequency?  Perhaps we could limit each physical machine to exactly two modes: unscaled (use TSC ticks, convert in software) and scaled to nanoseconds (CLOCK_MONOTONIC_RAW is RDTSC + possible offset).  Then the host could produce its data structures in exactly those two formats and export them as appropriate. 
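
As a rough illustration of the two-mode idea, here is a minimal sketch in C. The struct layout and every name in it are hypothetical; it is loosely modeled on the mult/shift conversion used by vDSO-style clock data and is not an existing KVM or vDSO ABI. It takes the TSC value as an argument instead of executing RDTSC, and it leaves out the sequence counter a real reader would need to get a consistent snapshot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the two per-machine modes proposed above. */
enum tsc_mode {
    TSC_MODE_UNSCALED,   /* guest TSC counts raw ticks, convert in software */
    TSC_MODE_SCALED_NS,  /* guest TSC is scaled so one tick is one nanosecond */
};

/* Hypothetical host-exported clock data for CLOCK_MONOTONIC_RAW. */
struct host_clock_page {
    enum tsc_mode mode;
    uint64_t cycle_last; /* TSC value at the last host update */
    uint64_t base_ns;    /* clock value in ns at cycle_last */
    uint32_t mult;       /* ticks -> ns multiplier (unscaled mode only) */
    uint32_t shift;      /* ticks -> ns shift (unscaled mode only) */
};

/*
 * Convert a TSC reading to nanoseconds.
 * Note: delta * mult can overflow 64 bits if the page is not refreshed
 * often enough; a real implementation has to bound delta.
 */
static uint64_t clock_read_ns(const struct host_clock_page *p, uint64_t tsc)
{
    uint64_t delta = tsc - p->cycle_last;

    if (p->mode == TSC_MODE_SCALED_NS)
        return p->base_ns + delta;          /* RDTSC + offset */

    assert(p->shift < 64);
    return p->base_ns + ((delta * p->mult) >> p->shift);
}
```

In the scaled mode the read path degenerates to an add, which is the point of limiting the host to exactly these two formats.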

> 
> If you want to avoid that you are back to the point where you need to
> chase all guest data when the host NTP/PTP adjusts the host side.
> Chasing and updating all this stuff in the tick was the reason why I was
> fighting the idea of clock realtime in namespaces.

I think that, if we can arrange for a small, bounded number of pages generated by the host, then this problem isn’t so bad.

Hmm, leap second smearing is just a different linear mapping. I’m not sure how leap second smearing should interact with timens, but it seems to me that the host should be able to produce four data pages (scaled vs unscaled and smeared vs unsmeared) and one per-guest/timens offset page (where offset applies to MONOTONIC and MONOTONIC_RAW only) and cover all bases.  Or do people actually want to offset their TAI and/or REALTIME, and what would that even mean if the offset crosses a leap second?
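
To make the page arrangement concrete, here is a minimal sketch in C under stated assumptions: all type and function names are hypothetical, each clock is described by one linear mapping, and the sequence counter and overflow handling are omitted. The smeared pages differ from the unsmeared ones only in the coefficients the host writes for the wall clock; the offset page touches MONOTONIC and MONOTONIC_RAW only.

```c
#include <assert.h>
#include <stdint.h>

enum clk { CLK_REALTIME, CLK_TAI, CLK_MONOTONIC, CLK_MONOTONIC_RAW, CLK_COUNT };

/* One linear mapping: ns = base_ns + (((tsc - cycle_last) * mult) >> shift) */
struct clock_map {
    uint64_t cycle_last;
    uint64_t base_ns;
    uint32_t mult;
    uint32_t shift;
};

/* Hypothetical host-generated page: one mapping per clock. */
struct host_page {
    struct clock_map clk[CLK_COUNT];
};

/* The four host pages, indexed as page[scaled][smeared]. */
struct host_pages {
    struct host_page page[2][2];
};

/* Hypothetical per-guest/timens offset page. */
struct guest_offsets {
    int64_t monotonic_ns;
    int64_t monotonic_raw_ns;
};

struct guest_cfg {
    int scaled;   /* nonzero: guest TSC is scaled to nanoseconds */
    int smeared;  /* nonzero: guest wants leap-smeared wall-clock time */
    struct guest_offsets off;
};

static uint64_t guest_clock_ns(const struct host_pages *h,
                               const struct guest_cfg *g,
                               enum clk id, uint64_t tsc)
{
    const struct clock_map *m;
    uint64_t ns;

    assert((unsigned)id < CLK_COUNT);
    m = &h->page[!!g->scaled][!!g->smeared].clk[id];
    assert(m->shift < 64);

    ns = m->base_ns + (((tsc - m->cycle_last) * m->mult) >> m->shift);

    /* Offsets apply to the monotonic clocks only, never to TAI/REALTIME. */
    if (id == CLK_MONOTONIC)
        ns += (uint64_t)g->off.monotonic_ns;
    else if (id == CLK_MONOTONIC_RAW)
        ns += (uint64_t)g->off.monotonic_raw_ns;

    return ns;
}
```

With this shape the host updates a fixed set of four pages when NTP/PTP adjusts its clock, independent of the number of guests, and the per-guest page only ever changes when the guest's offsets are set.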

(I haven’t thought about the interaction of any of this with ART.)

