From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Michael Kelley <mikelley@microsoft.com>,
	Tianyu Lan <lantianyu1986@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Tianyu Lan <Tianyu.Lan@microsoft.com>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
	"linux-kernel@vger kernel org" <linux-kernel@vger.kernel.org>,
	Andy Lutomirski <luto@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	the arch/x86 maintainers <x86@kernel.org>,
	KY Srinivasan <kys@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	Stephen Hemminger <sthemmin@microsoft.com>,
	Sasha Levin <sashal@kernel.org>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Arnd Bergmann <arnd@arndb.de>,
	"ashal@kernel.org" <ashal@kernel.org>
Subject: RE: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
Date: Tue, 13 Aug 2019 10:33:37 +0200
Message-ID: <87sgq5a2hq.fsf@vitty.brq.redhat.com>
In-Reply-To: <DM5PR21MB0137E03AAD8C2EA61EC81ED7D7D30@DM5PR21MB0137.namprd21.prod.outlook.com>

Michael Kelley <mikelley@microsoft.com> writes:

> From: Tianyu Lan <lantianyu1986@gmail.com> Sent: Tuesday, July 30, 2019 6:41 AM
>> 
>> On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>> >
>> > Peter Zijlstra <peterz@infradead.org> writes:
>> >
>> > > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
>> > >> lantianyu1986@gmail.com writes:
>> > >>
>> > >> > From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>> > >> >
>> > >> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
>> > >> > on x86.  But native_sched_clock() directly uses the raw TSC value, which
>> > >> > can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
>> > >> > to set the sched clock function appropriately.  On x86, this sets
>> > >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
>> > >> > scaled and adjusted to be continuous.
>> > >>
>> > >> Hypervisor can, in theory, disable TSC page and then we're forced to use
>> > >> MSR-based clocksource but using it as sched_clock() can be very slow,
>> > >> I'm afraid.
>> > >>
>> > >> On the other hand, what we have now is probably worse: TSC can,
>> > >> actually, jump backwards (e.g. on migration) and we're breaking the
>> > >> requirements for sched_clock().
>> > >
>> > > That (obviously) also breaks the requirements for using TSC as
>> > > clocksource.
>> > >
>> > > IOW, it breaks the entire purpose of having TSC in the first place.
>> >
>> > Currently, we mark raw TSC as unstable when running on Hyper-V (see
>> > 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used
>> > instead. The problem is that 'TSC page' can be disabled by the
>> > hypervisor and in that case the only remaining clocksource is MSR-based
>> > (slow).
>> >
>> 
>> Yes, that will be slow if Hyper-V doesn't expose the hv tsc page and the
>> kernel uses the MSR-based clocksource. Each MSR read will trigger one
>> VM-EXIT. This also happens on other hypervisors (e.g. when KVM doesn't
>> expose KVM clock). The hypervisor should take this into account and
>> determine which clocksource should be exposed or not.
>> 
>
> We've confirmed with the Hyper-V team that the TSC page is always available
> on Hyper-V 2016 and later, and on Hyper-V 2012 R2 when the physical
> hardware presents an InvariantTSC.

Currently we check that the TSC page is valid on every read, and it seems
this is redundant, right? It is either available at boot or not. The only
exception I can imagine is migrating a VM to a non-InvariantTSC host, where
Hyper-V will likely disable the page (and we can get a reenlightenment
notification then).
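
For reference, the per-read validity check discussed above can be sketched
as follows. This is a simplified user-space model, not the kernel code: the
struct layout, the names ref_tsc_page and ref_time_from_page, and the
convention that a sequence value of 0 means "page not valid" are assumptions
made for illustration. The raw TSC value is passed in as a parameter so the
arithmetic (TSC * scale + offset, with scale as a 64.64 fixed-point
multiplier) can be checked; real code would read the TSC inside the loop and
would need read barriers between the loads. It also relies on the
GCC/Clang unsigned __int128 extension, which is only available on 64-bit
targets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the hypervisor-provided page. */
struct ref_tsc_page {
	volatile uint32_t sequence;	/* assumed: 0 means "page not valid" */
	volatile uint64_t scale;	/* 64.64 fixed-point multiplier */
	volatile int64_t offset;	/* added after scaling */
};

/*
 * Compute reference time = ((raw_tsc * scale) >> 64) + offset.
 * Returns false if the page is marked invalid, in which case the caller
 * would have to fall back to the (slow) MSR-based read.
 */
static inline bool ref_time_from_page(const struct ref_tsc_page *pg,
				      uint64_t raw_tsc, uint64_t *out)
{
	uint32_t seq;
	uint64_t scale;
	int64_t offset;

	assert(pg != NULL && out != NULL);

	do {
		seq = pg->sequence;
		if (seq == 0)
			return false;	/* this is the per-read validity check */
		scale = pg->scale;
		offset = pg->offset;
		/* retry if the hypervisor updated the page while we read it */
	} while (pg->sequence != seq);

	*out = (uint64_t)(((unsigned __int128)raw_tsc * scale) >> 64)
	       + (uint64_t)offset;
	return true;
}
```

If the page could only ever become invalid via a reenlightenment
notification, the "seq == 0" branch is the part that would be redundant on
the hot path; the retry loop would still be needed to get a consistent
scale/offset pair.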

>  But the Linux Kconfig's are set up so
> the TSC page is not used for 32-bit guests -- all clock reads are synthetic MSR
> reads.  For 32-bit, this set of changes will add more overhead because the
> sched clock reads will now be MSR reads.
>
> I would be inclined to fix the problem, even with the perf hit on 32-bit Linux.
> I don’t have any data on 32-bit Linux being used in a Hyper-V guest, but it's not
> supported in Azure so usage is pretty small.  The alternative would be to continue
> to use the raw TSC value on 32-bit, even with the risk of a discontinuity in case of
> live migration or similar scenarios.

The issue needs fixing, I agree. However, using the MSR-based clocksource as
sched clock may give us too big of a performance hit (not sure who cares
about 32-bit guest performance nowadays, but still). What stops us from
enabling the TSC page for 32-bit guests if it is available?
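
As a side note on the arithmetic only: the scaling step of a TSC page read
needs the upper 64 bits of a 64x64-bit product, and that can be computed from
32-bit pieces without any 128-bit type. The sketch below illustrates this
under that assumption; the function name mul_u64_u64_hi is made up for the
example, and nothing here is a statement about why the current Kconfig setup
excludes 32-bit guests.

```c
#include <stdint.h>

/*
 * Return the high 64 bits of the 128-bit product a * b, using only
 * 32x32->64 multiplications (schoolbook method on 32-bit halves).
 */
uint64_t mul_u64_u64_hi(uint64_t a, uint64_t b)
{
	uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
	uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

	uint64_t lo_lo = a_lo * b_lo;
	uint64_t hi_lo = a_hi * b_lo;
	uint64_t lo_hi = a_lo * b_hi;
	uint64_t hi_hi = a_hi * b_hi;

	/*
	 * Sum everything that lands in bits 32..63 of the product; the
	 * carry out of this sum belongs to the high word. Three terms
	 * below 2^32 each cannot overflow 64 bits.
	 */
	uint64_t cross = (lo_lo >> 32) + (uint32_t)hi_lo + (uint32_t)lo_hi;

	return hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (cross >> 32);
}
```

With such a helper the reference time would be
mul_u64_u64_hi(tsc, scale) + offset, so the computation itself does not
depend on a 64-bit build.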

-- 
Vitaly
