linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@kernel.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Maxim Levitsky <mlevitsk@redhat.com>,
	kvm list <kvm@vger.kernel.org>, "H. Peter Anvin" <hpa@zytor.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Jim Mattson <jmattson@google.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	"open list:KERNEL SELFTEST FRAMEWORK" 
	<linux-kselftest@vger.kernel.org>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>,
	open list <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)"
	<x86@kernel.org>, Joerg Roedel <joro@8bytes.org>,
	Borislav Petkov <bp@alien8.de>, Shuah Khan <shuah@kernel.org>,
	Andrew Jones <drjones@redhat.com>,
	Oliver Upton <oupton@google.com>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>
Subject: Re: [PATCH v2 1/3] KVM: x86: implement KVM_{GET|SET}_TSC_STATE
Date: Thu, 10 Dec 2020 15:42:30 -0800	[thread overview]
Message-ID: <CALCETrVpW3m45opNRzF-xzuZ6xS90HYo_a74JzACrdj8zutE5w@mail.gmail.com> (raw)
In-Reply-To: <87lfe71e1z.fsf@nanos.tec.linutronix.de>

> On Dec 9, 2020, at 2:14 AM, Thomas Gleixner <tglx@linutronix.de> wrote:


>
> But what's more problematic is the basic requirement that time all over
> the place has to be consistent.
>
> On machines which use DISTORTED_REALTIME everything _IS_ consistent
> within the distorted universe they created. It's still inconsistent
> vs. the outside, but that's unsolvable and none of our problems.
>
> TLDR: Do not even think about opening pandoras box.

This could be a per-machine/per-vm setting that nonetheless uses the
same underlying implementation. There are, sadly, lots of people using
smeared time, and there will probably be VM hosts that simultaneously
have both styles of guest. Supporting full PV time for both would be
nice.  Obviously this gets a bit screwy if they are using a paravirt
fs, but it’s also a problem with NFS, etc. So maybe the nasty corners
could be narrow enough to just say “don’t do that”.

>
>> If we want to get fancy, we can make a change that I've contemplated
>> for a while -- we could make t_end explicit and have two copies of all
>> these data structures.  The reader would use one copy if t < t_change
>> and a different copy if t >= t_change.
>
> See below.
>
>> This would allow NTP-like code in usermode to schedule a frequency
>> shift to start at a specific time.
>
> That's an orthogonal problem and can be done without changing the
> reader side.

Really?  Right now, the switch happens whenever the kernel takes the
seqlock, which it doesn’t have exact control over. But I meant
something a little different:

>
>> With some care, it would also allow the timekeeping code to update the
>> data structures without causing clock_gettime() to block while the
>> timekeeping code is running on a different CPU.
>
> It still has to block, i.e. retry, because the data set becomes invalid
> when t_end is reached. So the whole thing would be:
>
>       do {
>               seq = read_seqcount_latch();
>                data = select_data(seq);
>                delta = read_clocksource() - data->base;
>                if (delta >= data->max_delta)
>                    continue;
>                ....
>      } while (read_seqcount_latch_retry());
>
> TBH, I like the idea for exactly one reason: It increases robustness.

I like the max_delta for robustness, too.

What do you have in mind for select_data()?  Is the idea that
select_data() returns something like &data[seq & 1]?

But I actually meant something a little bit different: you would use
delta >= data->max_delta as an indication that you need to look at the
other copy.  Perhaps the lower three bits of the seqcount would work
like:

00: both copies are valid, but start with the first copy.
10: only the first copy is valid.
01: both copies are valid, but start with the second copy.
11: only the second copy is valid

You'd read it like this (with all the bugs that I surely have fixed);

do {
  seq = read_seqcount();
  data = &data_array[seq & 1];
  clock = read_clocksource();
  delta = clock - data->base;
  if (delta->data->max_delta) {
    if (seq & 2)
      continue;
    data = &data_array[(seq + 1) & 1];  // <-- the other copy
    delta = clock - data->base;
    if (delta >= data->max_delta)
      continue;
  }
  ...;
} while (seq == read_seqcount());

This has two main benefits.  First, it allows the timekeeping code to
run concurrently with readers, which is nice for tail latency --
readers would only ever need to spin if the timekeeper falls behind,
intentionally disables both copies, or somehow manages to run one
iteration for each reader attempt and livelocks the reader.  The
latter is very unlikely.)  Second, it allows the timekeeping code to
literally schedule an update to occur at a precise clocksource tick,
which seems to be like it could make the timekeeping code simpler and
more robust.

(If the timekeeper wants to simultaneously disable both copies, it
sets one copy's max_delta to zero and uses seq to disable the other
copy.)

--Andy






>
> For X86 we already have the comparison for dealing with TSC < base
> which would be covered by
>
>                if (delta >= data->max_delta)
>                    continue;
>
> automatically. Non X86 gains this extra conditional, but I think it's
> worth to pay that price.
>
> It won't solve the VM migration problem on it's own though. You still
> have to be careful about the inner workings of everything related to
> timekeeping itself.
>
>> One other thing that might be worth noting: there's another thread
>> about "vmgenid".  It's plausible that it's worth considering stopping
>> the guest or perhaps interrupting all vCPUs to allow it to take some
>> careful actions on migration for reasons that have nothing to do with
>> timekeeping.
>
> How surprising. Who could have thought about that?
>
> OMG, virtualization seems to have gone off into a virtual reality long
> ago.
>
> Thanks,
>
>        tglx
>

  reply	other threads:[~2020-12-10 23:44 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-03 17:11 [PATCH v2 0/3] RFC: Precise TSC migration Maxim Levitsky
2020-12-03 17:11 ` [PATCH v2 1/3] KVM: x86: implement KVM_{GET|SET}_TSC_STATE Maxim Levitsky
2020-12-06 16:19   ` Thomas Gleixner
2020-12-07 12:16     ` Maxim Levitsky
2020-12-07 13:16       ` Vitaly Kuznetsov
2020-12-07 17:41         ` Thomas Gleixner
2020-12-08  9:48           ` Peter Zijlstra
2020-12-10 11:42           ` Paolo Bonzini
2020-12-10 12:14             ` Peter Zijlstra
2020-12-10 12:22               ` Paolo Bonzini
2020-12-10 13:01                 ` Peter Zijlstra
2020-12-10 20:20                   ` Thomas Gleixner
2020-12-07 16:38       ` Thomas Gleixner
2020-12-07 16:53         ` Andy Lutomirski
2020-12-07 17:00           ` Maxim Levitsky
2020-12-07 18:04             ` Andy Lutomirski
2020-12-07 23:11               ` Marcelo Tosatti
2020-12-08 17:43                 ` Andy Lutomirski
2020-12-08 19:24                   ` Thomas Gleixner
2020-12-08 20:32                     ` Andy Lutomirski
2020-12-09  0:19                       ` Thomas Gleixner
2020-12-09  4:08                         ` Andy Lutomirski
2020-12-09 10:14                           ` Thomas Gleixner
2020-12-10 23:42                             ` Andy Lutomirski [this message]
2020-12-08 11:24               ` Maxim Levitsky
2020-12-08  9:35         ` Peter Zijlstra
2020-12-07 23:34     ` Marcelo Tosatti
2020-12-07 17:29   ` Oliver Upton
2020-12-08 11:13     ` Maxim Levitsky
2020-12-08 15:57       ` Oliver Upton
2020-12-08 15:58         ` Oliver Upton
2020-12-08 17:10           ` Maxim Levitsky
2020-12-08 16:40       ` Thomas Gleixner
2020-12-08 17:08         ` Maxim Levitsky
2020-12-10 11:48           ` Paolo Bonzini
2020-12-10 14:25             ` Maxim Levitsky
2020-12-07 23:29   ` Marcelo Tosatti
2020-12-08 14:50     ` Maxim Levitsky
2020-12-08 16:02       ` Thomas Gleixner
2020-12-08 16:25         ` Maxim Levitsky
2020-12-08 17:33           ` Andy Lutomirski
2020-12-08 21:25             ` Thomas Gleixner
2020-12-08 18:12           ` Marcelo Tosatti
2020-12-08 21:35             ` Thomas Gleixner
2020-12-08 21:20           ` Thomas Gleixner
2020-12-10 11:48             ` Paolo Bonzini
2020-12-10 14:52               ` Maxim Levitsky
2020-12-10 15:16                 ` Andy Lutomirski
2020-12-10 17:59                   ` Oliver Upton
2020-12-10 18:05                     ` Paolo Bonzini
2020-12-10 18:13                       ` Oliver Upton
2020-12-10 21:25                   ` Thomas Gleixner
2020-12-10 22:01                     ` Andy Lutomirski
2020-12-10 22:28                       ` Thomas Gleixner
2020-12-10 23:19                         ` Andy Lutomirski
2020-12-11  0:03                           ` Thomas Gleixner
2020-12-08 18:11         ` Marcelo Tosatti
2020-12-08 21:33           ` Thomas Gleixner
2020-12-09 16:34             ` Marcelo Tosatti
2020-12-09 20:58               ` Thomas Gleixner
2020-12-10 15:26                 ` Marcelo Tosatti
2020-12-10 21:48                   ` Thomas Gleixner
2020-12-11  0:27                     ` Marcelo Tosatti
2020-12-11 13:30                       ` Thomas Gleixner
2020-12-11 14:18                         ` Marcelo Tosatti
2020-12-11 21:04                           ` Thomas Gleixner
2020-12-11 21:59                             ` Paolo Bonzini
2020-12-12 13:03                               ` Thomas Gleixner
2020-12-15 10:59                               ` Marcelo Tosatti
2020-12-15 16:55                                 ` Andy Lutomirski
2020-12-15 22:34                                 ` Thomas Gleixner
2020-12-11 13:37                       ` Paolo Bonzini
2020-12-08 17:35       ` Marcelo Tosatti
2020-12-03 17:11 ` [PATCH v2 2/3] KVM: x86: introduce KVM_X86_QUIRK_TSC_HOST_ACCESS Maxim Levitsky
2020-12-03 17:11 ` [PATCH v2 3/3] kvm/selftests: update tsc_msrs_test to cover KVM_X86_QUIRK_TSC_HOST_ACCESS Maxim Levitsky
2020-12-07 23:16 ` [PATCH v2 0/3] RFC: Precise TSC migration Marcelo Tosatti
2020-12-10 11:48 ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALCETrVpW3m45opNRzF-xzuZ6xS90HYo_a74JzACrdj8zutE5w@mail.gmail.com \
    --to=luto@kernel.org \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=drjones@redhat.com \
    --cc=hpa@zytor.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=mlevitsk@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=oupton@google.com \
    --cc=pbonzini@redhat.com \
    --cc=sean.j.christopherson@intel.com \
    --cc=shuah@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).