Linux-HyperV Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
@ 2019-07-29  7:52 lantianyu1986
  2019-07-29  7:52 ` [PATCH 1/2] clocksource/Hyper-v: Allocate Hyper-V tsc page statically lantianyu1986
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: lantianyu1986 @ 2019-07-29  7:52 UTC (permalink / raw)
  To: luto, tglx, mingo, bp, hpa, x86, kys, haiyangz, sthemmin, sashal,
	daniel.lezcano, arnd, michael.h.kelley, ashal
  Cc: Tianyu Lan, linux-arch, linux-hyperv, linux-kernel

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
on x86.  But native_sched_clock() directly uses the raw TSC value, which
can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
to set the sched clock function appropriately.  On x86, this sets
pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
scaled and adjusted to be continuous.

Also move the Hyper-V reference TSC initialization much earlier in the boot
process so no discontinuity is observed when pv_ops.time.sched_clock
calculates its offset.  This earlier initialization requires that the Hyper-V TSC
page be allocated statically instead of with vmalloc(), so fixup the references
to the TSC page and the method of getting its physical address.

Tianyu Lan (2):
  clocksource/Hyper-v: Allocate Hyper-V tsc page statically
  clocksource/Hyper-V: Add Hyper-V specific sched clock function

 arch/x86/entry/vdso/vma.c          |  2 +-
 arch/x86/hyperv/hv_init.c          |  2 --
 arch/x86/kernel/cpu/mshyperv.c     |  8 ++++++++
 drivers/clocksource/hyperv_timer.c | 34 ++++++++++++++++------------------
 include/asm-generic/mshyperv.h     |  1 +
 5 files changed, 26 insertions(+), 21 deletions(-)

-- 
2.14.5


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] clocksource/Hyper-v: Allocate Hyper-V tsc page statically
  2019-07-29  7:52 [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function lantianyu1986
@ 2019-07-29  7:52 ` lantianyu1986
  2019-08-12 18:39   ` Michael Kelley
  2019-07-29  7:52 ` [PATCH 2/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function lantianyu1986
  2019-07-29 10:59 ` [PATCH 0/2] " Vitaly Kuznetsov
  2 siblings, 1 reply; 11+ messages in thread
From: lantianyu1986 @ 2019-07-29  7:52 UTC (permalink / raw)
  To: luto, tglx, mingo, bp, hpa, x86, kys, haiyangz, sthemmin, sashal,
	daniel.lezcano, arnd, michael.h.kelley, ashal
  Cc: Tianyu Lan, linux-kernel, linux-hyperv, linux-arch

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

This is to prepare to add Hyper-V sched clock callback and move
Hyper-V reference TSC initialization much earlier in the boot
process when timestamp is 0. So no discontinuity is observed
when pv_ops.time.sched_clock to calculate its offset. This earlier
initialization requires that the Hyper-V TSC page be allocated
statically instead of with vmalloc(), so fixup the references
to the TSC page and the method of getting its physical address.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 arch/x86/entry/vdso/vma.c          |  2 +-
 drivers/clocksource/hyperv_timer.c | 12 ++++--------
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 349a61d8bf34..f5937742b290 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -122,7 +122,7 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
 
 		if (tsc_pg && vclock_was_used(VCLOCK_HVCLOCK))
 			return vmf_insert_pfn(vma, vmf->address,
-					vmalloc_to_pfn(tsc_pg));
+					virt_to_phys(tsc_pg) >> PAGE_SHIFT);
 	}
 
 	return VM_FAULT_SIGBUS;
diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index ba2c79e6a0ee..86764ec9a854 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -214,17 +214,17 @@ EXPORT_SYMBOL_GPL(hyperv_cs);
 
 #ifdef CONFIG_HYPERV_TSCPAGE
 
-static struct ms_hyperv_tsc_page *tsc_pg;
+static struct ms_hyperv_tsc_page tsc_pg __aligned(PAGE_SIZE);
 
 struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 {
-	return tsc_pg;
+	return &tsc_pg;
 }
 EXPORT_SYMBOL_GPL(hv_get_tsc_page);
 
 static u64 notrace read_hv_sched_clock_tsc(void)
 {
-	u64 current_tick = hv_read_tsc_page(tsc_pg);
+	u64 current_tick = hv_read_tsc_page(&tsc_pg);
 
 	if (current_tick == U64_MAX)
 		hv_get_time_ref_count(current_tick);
@@ -280,12 +280,8 @@ static bool __init hv_init_tsc_clocksource(void)
 	if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
 		return false;
 
-	tsc_pg = vmalloc(PAGE_SIZE);
-	if (!tsc_pg)
-		return false;
-
 	hyperv_cs = &hyperv_cs_tsc;
-	phys_addr = page_to_phys(vmalloc_to_page(tsc_pg));
+	phys_addr = virt_to_phys(&tsc_pg) & PAGE_MASK;
 
 	/*
 	 * The Hyper-V TLFS specifies to preserve the value of reserved
-- 
2.14.5


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 2/2] clocksource/Hyper-V:  Add Hyper-V specific sched clock function
  2019-07-29  7:52 [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function lantianyu1986
  2019-07-29  7:52 ` [PATCH 1/2] clocksource/Hyper-v: Allocate Hyper-V tsc page statically lantianyu1986
@ 2019-07-29  7:52 ` lantianyu1986
  2019-08-12 18:41   ` Michael Kelley
  2019-07-29 10:59 ` [PATCH 0/2] " Vitaly Kuznetsov
  2 siblings, 1 reply; 11+ messages in thread
From: lantianyu1986 @ 2019-07-29  7:52 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, sashal, tglx, mingo, bp, hpa, x86,
	daniel.lezcano, arnd, michael.h.kelley, ashal
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, linux-arch

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
on x86.  But native_sched_clock() directly uses the raw TSC value, which
can be discontinuous in a Hyper-V VM. Add the generic hv_setup_sched_clock()
to set the sched clock function appropriately. On x86, this sets pv_ops.time.
sched_clock to read the Hyper-V reference TSC value that is scaled and adjusted
to be continuous.

Also move the Hyper-V reference TSC initialization much earlier in the boot
process so no discontinuity is observed when pv_ops.time.sched_clock
calculates its offset.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 arch/x86/hyperv/hv_init.c          |  2 --
 arch/x86/kernel/cpu/mshyperv.c     |  8 ++++++++
 drivers/clocksource/hyperv_timer.c | 22 ++++++++++++----------
 include/asm-generic/mshyperv.h     |  1 +
 4 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 0d258688c8cf..866dfb3dca48 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -301,8 +301,6 @@ void __init hyperv_init(void)
 
 	x86_init.pci.arch_init = hv_pci_init;
 
-	/* Register Hyper-V specific clocksource */
-	hv_init_clocksource();
 	return;
 
 remove_cpuhp_state:
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 062f77279ce3..53afd33990eb 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -29,6 +29,7 @@
 #include <asm/timer.h>
 #include <asm/reboot.h>
 #include <asm/nmi.h>
+#include <clocksource/hyperv_timer.h>
 
 struct ms_hyperv_info ms_hyperv;
 EXPORT_SYMBOL_GPL(ms_hyperv);
@@ -338,9 +339,16 @@ static void __init ms_hyperv_init_platform(void)
 		x2apic_phys = 1;
 # endif
 
+	/* Register Hyper-V specific clocksource */
+	hv_init_clocksource();
 #endif
 }
 
+void hv_setup_sched_clock(void *sched_clock)
+{
+	pv_ops.time.sched_clock = sched_clock;
+}
+
 const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
 	.name			= "Microsoft Hyper-V",
 	.detect			= ms_hyperv_platform,
diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index 86764ec9a854..eafca89b44d7 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -215,6 +215,7 @@ EXPORT_SYMBOL_GPL(hyperv_cs);
 #ifdef CONFIG_HYPERV_TSCPAGE
 
 static struct ms_hyperv_tsc_page tsc_pg __aligned(PAGE_SIZE);
+static u64 hv_sched_clock_offset __ro_after_init;
 
 struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 {
@@ -222,7 +223,7 @@ struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 }
 EXPORT_SYMBOL_GPL(hv_get_tsc_page);
 
-static u64 notrace read_hv_sched_clock_tsc(void)
+static u64 notrace read_hv_clock_tsc(struct clocksource *arg)
 {
 	u64 current_tick = hv_read_tsc_page(&tsc_pg);
 
@@ -232,9 +233,9 @@ static u64 notrace read_hv_sched_clock_tsc(void)
 	return current_tick;
 }
 
-static u64 read_hv_clock_tsc(struct clocksource *arg)
+static u64 read_hv_sched_clock_tsc(void)
 {
-	return read_hv_sched_clock_tsc();
+	return read_hv_clock_tsc(NULL) - hv_sched_clock_offset;
 }
 
 static struct clocksource hyperv_cs_tsc = {
@@ -246,7 +247,7 @@ static struct clocksource hyperv_cs_tsc = {
 };
 #endif
 
-static u64 notrace read_hv_sched_clock_msr(void)
+static u64 notrace read_hv_clock_msr(struct clocksource *arg)
 {
 	u64 current_tick;
 	/*
@@ -258,9 +259,9 @@ static u64 notrace read_hv_sched_clock_msr(void)
 	return current_tick;
 }
 
-static u64 read_hv_clock_msr(struct clocksource *arg)
+static u64 read_hv_sched_clock_msr(void)
 {
-	return read_hv_sched_clock_msr();
+	return read_hv_clock_msr(NULL) - hv_sched_clock_offset;
 }
 
 static struct clocksource hyperv_cs_msr = {
@@ -298,8 +299,9 @@ static bool __init hv_init_tsc_clocksource(void)
 	hv_set_clocksource_vdso(hyperv_cs_tsc);
 	clocksource_register_hz(&hyperv_cs_tsc, NSEC_PER_SEC/100);
 
-	/* sched_clock_register is needed on ARM64 but is a no-op on x86 */
-	sched_clock_register(read_hv_sched_clock_tsc, 64, HV_CLOCK_HZ);
+	hv_sched_clock_offset = hyperv_cs->read(hyperv_cs);
+	hv_setup_sched_clock(read_hv_sched_clock_tsc);
+
 	return true;
 }
 #else
@@ -329,7 +331,7 @@ void __init hv_init_clocksource(void)
 	hyperv_cs = &hyperv_cs_msr;
 	clocksource_register_hz(&hyperv_cs_msr, NSEC_PER_SEC/100);
 
-	/* sched_clock_register is needed on ARM64 but is a no-op on x86 */
-	sched_clock_register(read_hv_sched_clock_msr, 64, HV_CLOCK_HZ);
+	hv_sched_clock_offset = hyperv_cs->read(hyperv_cs);
+	hv_setup_sched_clock(read_hv_sched_clock_msr);
 }
 EXPORT_SYMBOL_GPL(hv_init_clocksource);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 0becb7d9704d..18d8e2d8210f 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -167,6 +167,7 @@ void hyperv_report_panic(struct pt_regs *regs, long err);
 void hyperv_report_panic_msg(phys_addr_t pa, size_t size);
 bool hv_is_hyperv_initialized(void);
 void hyperv_cleanup(void);
+void hv_setup_sched_clock(void *sched_clock);
 #else /* CONFIG_HYPERV */
 static inline bool hv_is_hyperv_initialized(void) { return false; }
 static inline void hyperv_cleanup(void) {}
-- 
2.14.5


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
  2019-07-29  7:52 [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function lantianyu1986
  2019-07-29  7:52 ` [PATCH 1/2] clocksource/Hyper-v: Allocate Hyper-V tsc page statically lantianyu1986
  2019-07-29  7:52 ` [PATCH 2/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function lantianyu1986
@ 2019-07-29 10:59 ` " Vitaly Kuznetsov
  2019-07-29 11:09   ` Peter Zijlstra
  2 siblings, 1 reply; 11+ messages in thread
From: Vitaly Kuznetsov @ 2019-07-29 10:59 UTC (permalink / raw)
  To: lantianyu1986
  Cc: Tianyu Lan, linux-arch, linux-hyperv, linux-kernel, luto, tglx,
	mingo, bp, hpa, x86, kys, haiyangz, sthemmin, sashal,
	daniel.lezcano, arnd, michael.h.kelley, ashal

lantianyu1986@gmail.com writes:

> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>
> Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
> on x86.  But native_sched_clock() directly uses the raw TSC value, which
> can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
> to set the sched clock function appropriately.  On x86, this sets
> pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
> scaled and adjusted to be continuous.

Hypervisor can, in theory, disable TSC page and then we're forced to use
MSR-based clocksource but using it as sched_clock() can be very slow,
I'm afraid.

On the other hand, what we have now is probably worse: TSC can,
actually, jump backwards (e.g. on migration) and we're breaking the
requirements for sched_clock().

-- 
Vitaly

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
  2019-07-29 10:59 ` [PATCH 0/2] " Vitaly Kuznetsov
@ 2019-07-29 11:09   ` Peter Zijlstra
  2019-07-29 12:13     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2019-07-29 11:09 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: lantianyu1986, Tianyu Lan, linux-arch, linux-hyperv,
	linux-kernel, luto, tglx, mingo, bp, hpa, x86, kys, haiyangz,
	sthemmin, sashal, daniel.lezcano, arnd, michael.h.kelley, ashal

On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
> lantianyu1986@gmail.com writes:
> 
> > From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> >
> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
> > on x86.  But native_sched_clock() directly uses the raw TSC value, which
> > can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
> > to set the sched clock function appropriately.  On x86, this sets
> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
> > scaled and adjusted to be continuous.
> 
> Hypervisor can, in theory, disable TSC page and then we're forced to use
> MSR-based clocksource but using it as sched_clock() can be very slow,
> I'm afraid.
> 
> On the other hand, what we have now is probably worse: TSC can,
> actually, jump backwards (e.g. on migration) and we're breaking the
> requirements for sched_clock().

That (obviously) also breaks the requirements for using TSC as
clocksource.

IOW, it breaks the entire purpose of having TSC in the first place.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
  2019-07-29 11:09   ` Peter Zijlstra
@ 2019-07-29 12:13     ` Vitaly Kuznetsov
  2019-07-30 13:41       ` Tianyu Lan
  0 siblings, 1 reply; 11+ messages in thread
From: Vitaly Kuznetsov @ 2019-07-29 12:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: lantianyu1986, Tianyu Lan, linux-arch, linux-hyperv,
	linux-kernel, luto, tglx, mingo, bp, hpa, x86, kys, haiyangz,
	sthemmin, sashal, daniel.lezcano, arnd, michael.h.kelley, ashal

Peter Zijlstra <peterz@infradead.org> writes:

> On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
>> lantianyu1986@gmail.com writes:
>> 
>> > From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>> >
>> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
>> > on x86.  But native_sched_clock() directly uses the raw TSC value, which
>> > can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
>> > to set the sched clock function appropriately.  On x86, this sets
>> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
>> > scaled and adjusted to be continuous.
>> 
>> Hypervisor can, in theory, disable TSC page and then we're forced to use
>> MSR-based clocksource but using it as sched_clock() can be very slow,
>> I'm afraid.
>> 
>> On the other hand, what we have now is probably worse: TSC can,
>> actually, jump backwards (e.g. on migration) and we're breaking the
>> requirements for sched_clock().
>
> That (obviously) also breaks the requirements for using TSC as
> clocksource.
>
> IOW, it breaks the entire purpose of having TSC in the first place.

Currently, we mark raw TSC as unstable when running on Hyper-V (see
88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used
instead. The problem is that 'TSC page' can be disabled by the
hypervisor and in that case the only remaining clocksource is MSR-based
(slow).

-- 
Vitaly

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
  2019-07-29 12:13     ` Vitaly Kuznetsov
@ 2019-07-30 13:41       ` Tianyu Lan
  2019-08-12 19:22         ` Michael Kelley
  0 siblings, 1 reply; 11+ messages in thread
From: Tianyu Lan @ 2019-07-30 13:41 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Peter Zijlstra, Tianyu Lan, linux-arch, linux-hyperv,
	linux-kernel@vger kernel org, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Daniel Lezcano, Arnd Bergmann,
	michael.h.kelley, ashal

Hi Vitaly & Peter:
    Thanks for your review.

On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
> Peter Zijlstra <peterz@infradead.org> writes:
>
> > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
> >> lantianyu1986@gmail.com writes:
> >>
> >> > From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> >> >
> >> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
> >> > on x86.  But native_sched_clock() directly uses the raw TSC value, which
> >> > can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
> >> > to set the sched clock function appropriately.  On x86, this sets
> >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
> >> > scaled and adjusted to be continuous.
> >>
> >> Hypervisor can, in theory, disable TSC page and then we're forced to use
> >> MSR-based clocksource but using it as sched_clock() can be very slow,
> >> I'm afraid.
> >>
> >> On the other hand, what we have now is probably worse: TSC can,
> >> actually, jump backwards (e.g. on migration) and we're breaking the
> >> requirements for sched_clock().
> >
> > That (obviously) also breaks the requirements for using TSC as
> > clocksource.
> >
> > IOW, it breaks the entire purpose of having TSC in the first place.
>
> Currently, we mark raw TSC as unstable when running on Hyper-V (see
> 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used
> instead. The problem is that 'TSC page' can be disabled by the
> hypervisor and in that case the only remaining clocksource is MSR-based
> (slow).
>

Yes, that will be slow if Hyper-V doesn't expose hv tsc page and
kernel uses MSR based
clocksource. Each MSR read will trigger one VM-EXIT. This also happens on other
hypervisors (e,g, KVM doesn't expose KVM clock). Hypervisor should
take this into
account and determine which clocksource should be exposed or not.

-- 
Best regards
Tianyu Lan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH 1/2] clocksource/Hyper-v: Allocate Hyper-V tsc page statically
  2019-07-29  7:52 ` [PATCH 1/2] clocksource/Hyper-v: Allocate Hyper-V tsc page statically lantianyu1986
@ 2019-08-12 18:39   ` Michael Kelley
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Kelley @ 2019-08-12 18:39 UTC (permalink / raw)
  To: lantianyu1986, luto, tglx, mingo, bp, hpa, x86, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, sashal, daniel.lezcano, arnd,
	ashal
  Cc: Tianyu Lan, linux-kernel, linux-hyperv, linux-arch

From: lantianyu1986@gmail.com <lantianyu1986@gmail.com> Sent: Monday, July 29, 2019 12:53 AM
> 
> This is to prepare to add Hyper-V sched clock callback and move
> Hyper-V reference TSC initialization much earlier in the boot
> process when timestamp is 0. So no discontinuity is observed
> when pv_ops.time.sched_clock to calculate its offset. This earlier
> initialization requires that the Hyper-V TSC page be allocated
> statically instead of with vmalloc(), so fixup the references
> to the TSC page and the method of getting its physical address.

I'd suggest tweaking the commit message wording a bit:

Prepare to add Hyper-V sched clock callback and move Hyper-V
Reference TSC initialization much earlier in the boot process.  Earlier
initialization is needed so that it happens while the timestamp value
is still 0 and no discontinuity in the timestamp will occur when 
pv_ops.time.sched_clock calculates its offset.  The earlier
initialization requires that the Hyper-V TSC page be allocated
statically instead of with vmalloc(), so fixup the references
to the TSC page and the method of getting its physical address.

> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  arch/x86/entry/vdso/vma.c          |  2 +-
>  drivers/clocksource/hyperv_timer.c | 12 ++++--------
>  2 files changed, 5 insertions(+), 9 deletions(-)
> 
> @@ -280,12 +280,8 @@ static bool __init hv_init_tsc_clocksource(void)
>  	if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
>  		return false;
> 
> -	tsc_pg = vmalloc(PAGE_SIZE);
> -	if (!tsc_pg)
> -		return false;
> -
>  	hyperv_cs = &hyperv_cs_tsc;
> -	phys_addr = page_to_phys(vmalloc_to_page(tsc_pg));
> +	phys_addr = virt_to_phys(&tsc_pg) & PAGE_MASK;

The and'ing with PAGE_MASK isn't needed.  You've set up tsc_pg
to ensure it is page aligned, so there's no need to mask out any
low order bits.  That's why the previous code didn't do the masking
either.

> 
>  	/*
>  	 * The Hyper-V TLFS specifies to preserve the value of reserved
> --
> 2.14.5


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH 2/2] clocksource/Hyper-V:  Add Hyper-V specific sched clock function
  2019-07-29  7:52 ` [PATCH 2/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function lantianyu1986
@ 2019-08-12 18:41   ` Michael Kelley
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Kelley @ 2019-08-12 18:41 UTC (permalink / raw)
  To: lantianyu1986, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal, tglx, mingo, bp, hpa, x86, daniel.lezcano, arnd, ashal
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, linux-arch

From: Tianyu Lan <Tianyu.Lan@microsoft.com>  Sent: Monday, July 29, 2019 12:53 AM
> 
> Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
> on x86.  But native_sched_clock() directly uses the raw TSC value, which
> can be discontinuous in a Hyper-V VM. Add the generic hv_setup_sched_clock()
> to set the sched clock function appropriately. On x86, this sets pv_ops.time.
> sched_clock to read the Hyper-V reference TSC value that is scaled and adjusted
> to be continuous.
> 
> Also move the Hyper-V reference TSC initialization much earlier in the boot
> process so no discontinuity is observed when pv_ops.time.sched_clock
> calculates its offset.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  arch/x86/hyperv/hv_init.c          |  2 --
>  arch/x86/kernel/cpu/mshyperv.c     |  8 ++++++++
>  drivers/clocksource/hyperv_timer.c | 22 ++++++++++++----------
>  include/asm-generic/mshyperv.h     |  1 +
>  4 files changed, 21 insertions(+), 12 deletions(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
  2019-07-30 13:41       ` Tianyu Lan
@ 2019-08-12 19:22         ` Michael Kelley
  2019-08-13  8:33           ` Vitaly Kuznetsov
  0 siblings, 1 reply; 11+ messages in thread
From: Michael Kelley @ 2019-08-12 19:22 UTC (permalink / raw)
  To: Tianyu Lan, vkuznets
  Cc: Peter Zijlstra, Tianyu Lan, linux-arch, linux-hyperv,
	linux-kernel@vger kernel org, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Daniel Lezcano, Arnd Bergmann,
	ashal

From: Tianyu Lan <lantianyu1986@gmail.com> Sent: Tuesday, July 30, 2019 6:41 AM
> 
> On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> >
> > Peter Zijlstra <peterz@infradead.org> writes:
> >
> > > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
> > >> lantianyu1986@gmail.com writes:
> > >>
> > >> > From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> > >> >
> > >> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
> > >> > on x86.  But native_sched_clock() directly uses the raw TSC value, which
> > >> > can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
> > >> > to set the sched clock function appropriately.  On x86, this sets
> > >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
> > >> > scaled and adjusted to be continuous.
> > >>
> > >> Hypervisor can, in theory, disable TSC page and then we're forced to use
> > >> MSR-based clocksource but using it as sched_clock() can be very slow,
> > >> I'm afraid.
> > >>
> > >> On the other hand, what we have now is probably worse: TSC can,
> > >> actually, jump backwards (e.g. on migration) and we're breaking the
> > >> requirements for sched_clock().
> > >
> > > That (obviously) also breaks the requirements for using TSC as
> > > clocksource.
> > >
> > > IOW, it breaks the entire purpose of having TSC in the first place.
> >
> > Currently, we mark raw TSC as unstable when running on Hyper-V (see
> > 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used
> > instead. The problem is that 'TSC page' can be disabled by the
> > hypervisor and in that case the only remaining clocksource is MSR-based
> > (slow).
> >
> 
> Yes, that will be slow if Hyper-V doesn't expose hv tsc page and
> kernel uses MSR based
> clocksource. Each MSR read will trigger one VM-EXIT. This also happens on other
> hypervisors (e,g, KVM doesn't expose KVM clock). Hypervisor should
> take this into
> account and determine which clocksource should be exposed or not.
> 

We've confirmed with the Hyper-V team that the TSC page is always available
on Hyper-V 2016 and later, and on Hyper-V 2012 R2 when the physical
hardware presents an InvariantTSC.  But the Linux Kconfig's are set up so
the TSC page is not used for 32-bit guests -- all clock reads are synthetic MSR
reads.  For 32-bit, this set of changes will add more overhead because the
sched clock reads will now be MSR reads.

I would be inclined to fix the problem, even with the perf hit on 32-bit Linux.
I don’t have any data on 32-bit Linux being used in a Hyper-V guest, but it's not
supported in Azure so usage is pretty small.  The alternative would be to continue
to use the raw TSC value on 32-bit, even with the risk of a discontinuity in case of
live migration or similar scenarios.

Michael

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
  2019-08-12 19:22         ` Michael Kelley
@ 2019-08-13  8:33           ` Vitaly Kuznetsov
  0 siblings, 0 replies; 11+ messages in thread
From: Vitaly Kuznetsov @ 2019-08-13  8:33 UTC (permalink / raw)
  To: Michael Kelley, Tianyu Lan
  Cc: Peter Zijlstra, Tianyu Lan, linux-arch\, linux-hyperv\,
	linux-kernel\@vger kernel org, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Daniel Lezcano, Arnd Bergmann,
	ashal\

Michael Kelley <mikelley@microsoft.com> writes:

> From: Tianyu Lan <lantianyu1986@gmail.com> Sent: Tuesday, July 30, 2019 6:41 AM
>> 
>> On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>> >
>> > Peter Zijlstra <peterz@infradead.org> writes:
>> >
>> > > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
>> > >> lantianyu1986@gmail.com writes:
>> > >>
>> > >> > From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>> > >> >
>> > >> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
>> > >> > on x86.  But native_sched_clock() directly uses the raw TSC value, which
>> > >> > can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
>> > >> > to set the sched clock function appropriately.  On x86, this sets
>> > >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
>> > >> > scaled and adjusted to be continuous.
>> > >>
>> > >> Hypervisor can, in theory, disable TSC page and then we're forced to use
>> > >> MSR-based clocksource but using it as sched_clock() can be very slow,
>> > >> I'm afraid.
>> > >>
>> > >> On the other hand, what we have now is probably worse: TSC can,
>> > >> actually, jump backwards (e.g. on migration) and we're breaking the
>> > >> requirements for sched_clock().
>> > >
>> > > That (obviously) also breaks the requirements for using TSC as
>> > > clocksource.
>> > >
>> > > IOW, it breaks the entire purpose of having TSC in the first place.
>> >
>> > Currently, we mark raw TSC as unstable when running on Hyper-V (see
>> > 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used
>> > instead. The problem is that 'TSC page' can be disabled by the
>> > hypervisor and in that case the only remaining clocksource is MSR-based
>> > (slow).
>> >
>> 
>> Yes, that will be slow if Hyper-V doesn't expose hv tsc page and
>> kernel uses MSR based
>> clocksource. Each MSR read will trigger one VM-EXIT. This also happens on other
>> hypervisors (e,g, KVM doesn't expose KVM clock). Hypervisor should
>> take this into
>> account and determine which clocksource should be exposed or not.
>> 
>
> We've confirmed with the Hyper-V team that the TSC page is always available
> on Hyper-V 2016 and later, and on Hyper-V 2012 R2 when the physical
> hardware presents an InvariantTSC.

Currently we check that TSC page is valid on every read and it seems
this is redundant, right? It is either available on boot or not. I can
only imagine migrating a VM to a non-InvariantTSC host when Hyper-V will
likely disable the page (and we can get reenlightenment notification
then).

>  But the Linux Kconfig's are set up so
> the TSC page is not used for 32-bit guests -- all clock reads are synthetic MSR
> reads.  For 32-bit, this set of changes will add more overhead because the
> sched clock reads will now be MSR reads.
>
> I would be inclined to fix the problem, even with the perf hit on 32-bit Linux.
> I don’t have any data on 32-bit Linux being used in a Hyper-V guest, but it's not
> supported in Azure so usage is pretty small.  The alternative would be to continue
> to use the raw TSC value on 32-bit, even with the risk of a discontinuity in case of
> live migration or similar scenarios.

The issue needs fixing, I agree, however using MSR based clocksource as
sched clock may give us too big of a performance hit (not sure who cares
about 32 bit guest performance nowadays but still). What stops us from
enabling TSC page for 32 bit guests if it is available?

-- 
Vitaly

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, back to index

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-29  7:52 [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function lantianyu1986
2019-07-29  7:52 ` [PATCH 1/2] clocksource/Hyper-v: Allocate Hyper-V tsc page statically lantianyu1986
2019-08-12 18:39   ` Michael Kelley
2019-07-29  7:52 ` [PATCH 2/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function lantianyu1986
2019-08-12 18:41   ` Michael Kelley
2019-07-29 10:59 ` [PATCH 0/2] " Vitaly Kuznetsov
2019-07-29 11:09   ` Peter Zijlstra
2019-07-29 12:13     ` Vitaly Kuznetsov
2019-07-30 13:41       ` Tianyu Lan
2019-08-12 19:22         ` Michael Kelley
2019-08-13  8:33           ` Vitaly Kuznetsov

Linux-HyperV Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-hyperv/0 linux-hyperv/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-hyperv linux-hyperv/ https://lore.kernel.org/linux-hyperv \
		linux-hyperv@vger.kernel.org linux-hyperv@archiver.kernel.org
	public-inbox-index linux-hyperv


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-hyperv


AGPL code for this site: git clone https://public-inbox.org/ public-inbox