linux-hyperv.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Michael Kelley <mikelley@microsoft.com>,
	Tianyu Lan <lantianyu1986@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Tianyu Lan <Tianyu.Lan@microsoft.com>,
	"linux-arch\@vger.kernel.org" <linux-arch@vger.kernel.org>,
	"linux-hyperv\@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
	"linux-kernel\@vger kernel org" <linux-kernel@vger.kernel.org>,
	Andy Lutomirski <luto@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	the arch/x86 maintainers <x86@kernel.org>,
	KY Srinivasan <kys@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	Stephen Hemminger <sthemmin@microsoft.com>,
	Sasha Levin <sashal@kernel.org>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Arnd Bergmann <arnd@arndb.de>,
	"ashal\@kernel.org" <ashal@kernel.org>
Subject: RE: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
Date: Wed, 21 Aug 2019 09:15:23 +0200	[thread overview]
Message-ID: <87o90jq99w.fsf@vitty.brq.redhat.com> (raw)
In-Reply-To: <DM5PR21MB013730EB79A17AF02C170BD7D7AB0@DM5PR21MB0137.namprd21.prod.outlook.com>

Michael Kelley <mikelley@microsoft.com> writes:

> From: Vitaly Kuznetsov <vkuznets@redhat.com> Sent: Tuesday, August 13, 2019 1:34 AM
>> 
>> Michael Kelley <mikelley@microsoft.com> writes:
>> 
>> > From: Tianyu Lan <lantianyu1986@gmail.com> Sent: Tuesday, July 30, 2019 6:41 AM
>> >>
>> >> On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>> >> >
>> >> > Peter Zijlstra <peterz@infradead.org> writes:
>> >> >
>> >> > > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
>> >> > >> lantianyu1986@gmail.com writes:
>> >> > >>
>> >> > >> > From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>> >> > >> >
>> >> > >> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
>> >> > >> > on x86.  But native_sched_clock() directly uses the raw TSC value, which
>> >> > >> > can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
>> >> > >> > to set the sched clock function appropriately.  On x86, this sets
>> >> > >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
>> >> > >> > scaled and adjusted to be continuous.
>> >> > >>
>> >> > >> Hypervisor can, in theory, disable TSC page and then we're forced to use
>> >> > >> MSR-based clocksource but using it as sched_clock() can be very slow,
>> >> > >> I'm afraid.
>> >> > >>
>> >> > >> On the other hand, what we have now is probably worse: TSC can,
>> >> > >> actually, jump backwards (e.g. on migration) and we're breaking the
>> >> > >> requirements for sched_clock().
>> >> > >
>> >> > > That (obviously) also breaks the requirements for using TSC as
>> >> > > clocksource.
>> >> > >
>> >> > > IOW, it breaks the entire purpose of having TSC in the first place.
>> >> >
>> >> > Currently, we mark raw TSC as unstable when running on Hyper-V (see
>> >> > 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used
>> >> > instead. The problem is that 'TSC page' can be disabled by the
>> >> > hypervisor and in that case the only remaining clocksource is MSR-based
>> >> > (slow).
>> >> >
>> >>
>> >> Yes, that will be slow if Hyper-V doesn't expose hv tsc page and
>> >> kernel uses MSR based
>> >> clocksource. Each MSR read will trigger one VM-EXIT. This also happens on other
>> >> hypervisors (e,g, KVM doesn't expose KVM clock). Hypervisor should
>> >> take this into
>> >> account and determine which clocksource should be exposed or not.
>> >>
>> >
>> > We've confirmed with the Hyper-V team that the TSC page is always available
>> > on Hyper-V 2016 and later, and on Hyper-V 2012 R2 when the physical
>> > hardware presents an InvariantTSC.
>> 
>> Currently we check that TSC page is valid on every read and it seems
>> this is redundant, right? It is either available on boot or not. I can
>> only imagine migrating a VM to a non-InvariantTSC host when Hyper-V will
>> likely disable the page (and we can get reenlightenment notification
>> then).
>
> I think Hyper-V can have brief intervals when the TSC page is not valid, so
> the code checks for the "sequence" value being zero.   Otherwise, yes, it
> should always be there or not be there.  Is there some other validity
> check on every read that you are thinking of?
>

No, it's this one. In case these 'invalidity periods' are real there's
nothing to improve in the current code.

>> 
>> >  But the Linux Kconfig's are set up so
>> > the TSC page is not used for 32-bit guests -- all clock reads are synthetic MSR
>> > reads.  For 32-bit, this set of changes will add more overhead because the
>> > sched clock reads will now be MSR reads.
>> >
>> > I would be inclined to fix the problem, even with the perf hit on 32-bit Linux.
>> > I don’t have any data on 32-bit Linux being used in a Hyper-V guest, but it's not
>> > supported in Azure so usage is pretty small.  The alternative would be to continue
>> > to use the raw TSC value on 32-bit, even with the risk of a discontinuity in case of
>> > live migration or similar scenarios.
>> 
>> The issue needs fixing, I agree, however using MSR based clocksource as
>> sched clock may give us too big of a performance hit (not sure who cares
>> about 32 bit guest performance nowadays but still). What stops us from
>> enabling TSC page for 32 bit guests if it is available?
>
> I talked to KY Srinivasan for any history about TSC page on 32-bit.  He said
> there was no technical reason not to implement it, but our focus was always
> 64-bit Linux, so the 32-bit was much less important.  Also, on 32-bit Linux,
> the required 64x64 multiply and shift is more complex and takes more
> more cycles (compare 32-bit implementation of mul_u64_u64_shr vs.
> the 64-bit implementation), so the win over a MSR read is less.  I
> don't know of any actual measurements being made to compare vs.
> MSR read.

VMExit is 1000 CPU cycles or so, I would guess that TSC page
calculations are better. Let me try to build 32bit kernel and do some
quick measurements.

-- 
Vitaly

  reply	other threads:[~2019-08-21  7:15 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-29  7:52 [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function lantianyu1986
2019-07-29  7:52 ` [PATCH 1/2] clocksource/Hyper-v: Allocate Hyper-V tsc page statically lantianyu1986
2019-08-12 18:39   ` Michael Kelley
2019-07-29  7:52 ` [PATCH 2/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function lantianyu1986
2019-08-12 18:41   ` Michael Kelley
2019-07-29 10:59 ` [PATCH 0/2] " Vitaly Kuznetsov
2019-07-29 11:09   ` Peter Zijlstra
2019-07-29 12:13     ` Vitaly Kuznetsov
2019-07-30 13:41       ` Tianyu Lan
2019-08-12 19:22         ` Michael Kelley
2019-08-13  8:33           ` Vitaly Kuznetsov
2019-08-20 14:32             ` Michael Kelley
2019-08-21  7:15               ` Vitaly Kuznetsov [this message]
2019-08-21  8:54                 ` Vitaly Kuznetsov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87o90jq99w.fsf@vitty.brq.redhat.com \
    --to=vkuznets@redhat.com \
    --cc=Tianyu.Lan@microsoft.com \
    --cc=arnd@arndb.de \
    --cc=ashal@kernel.org \
    --cc=bp@alien8.de \
    --cc=daniel.lezcano@linaro.org \
    --cc=haiyangz@microsoft.com \
    --cc=hpa@zytor.com \
    --cc=kys@microsoft.com \
    --cc=lantianyu1986@gmail.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mikelley@microsoft.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=sashal@kernel.org \
    --cc=sthemmin@microsoft.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).