linux-hyperv.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
@ 2019-07-29  7:52 lantianyu1986
  2019-07-29  7:52 ` [PATCH 1/2] clocksource/Hyper-v: Allocate Hyper-V tsc page statically lantianyu1986
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: lantianyu1986 @ 2019-07-29  7:52 UTC (permalink / raw)
  To: luto, tglx, mingo, bp, hpa, x86, kys, haiyangz, sthemmin, sashal,
	daniel.lezcano, arnd, michael.h.kelley, ashal
  Cc: Tianyu Lan, linux-arch, linux-hyperv, linux-kernel

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
on x86.  But native_sched_clock() directly uses the raw TSC value, which
can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
to set the sched clock function appropriately.  On x86, this sets
pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
scaled and adjusted to be continuous.

Also move the Hyper-V reference TSC initialization much earlier in the boot
process so no discontinuity is observed when pv_ops.time.sched_clock
calculates its offset.  This earlier initialization requires that the Hyper-V TSC
page be allocated statically instead of with vmalloc(), so fixup the references
to the TSC page and the method of getting its physical address.

Tianyu Lan (2):
  clocksource/Hyper-v: Allocate Hyper-V tsc page statically
  clocksource/Hyper-V: Add Hyper-V specific sched clock function

 arch/x86/entry/vdso/vma.c          |  2 +-
 arch/x86/hyperv/hv_init.c          |  2 --
 arch/x86/kernel/cpu/mshyperv.c     |  8 ++++++++
 drivers/clocksource/hyperv_timer.c | 34 ++++++++++++++++------------------
 include/asm-generic/mshyperv.h     |  1 +
 5 files changed, 26 insertions(+), 21 deletions(-)

-- 
2.14.5


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] clocksource/Hyper-v: Allocate Hyper-V tsc page statically
  2019-07-29  7:52 [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function lantianyu1986
@ 2019-07-29  7:52 ` lantianyu1986
  2019-08-12 18:39   ` Michael Kelley
  2019-07-29  7:52 ` [PATCH 2/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function lantianyu1986
  2019-07-29 10:59 ` [PATCH 0/2] " Vitaly Kuznetsov
  2 siblings, 1 reply; 14+ messages in thread
From: lantianyu1986 @ 2019-07-29  7:52 UTC (permalink / raw)
  To: luto, tglx, mingo, bp, hpa, x86, kys, haiyangz, sthemmin, sashal,
	daniel.lezcano, arnd, michael.h.kelley, ashal
  Cc: Tianyu Lan, linux-kernel, linux-hyperv, linux-arch

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

This is to prepare to add Hyper-V sched clock callback and move
Hyper-V reference TSC initialization much earlier in the boot
process when timestamp is 0. So no discontinuity is observed
when pv_ops.time.sched_clock to calculate its offset. This earlier
initialization requires that the Hyper-V TSC page be allocated
statically instead of with vmalloc(), so fixup the references
to the TSC page and the method of getting its physical address.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 arch/x86/entry/vdso/vma.c          |  2 +-
 drivers/clocksource/hyperv_timer.c | 12 ++++--------
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 349a61d8bf34..f5937742b290 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -122,7 +122,7 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
 
 		if (tsc_pg && vclock_was_used(VCLOCK_HVCLOCK))
 			return vmf_insert_pfn(vma, vmf->address,
-					vmalloc_to_pfn(tsc_pg));
+					virt_to_phys(tsc_pg) >> PAGE_SHIFT);
 	}
 
 	return VM_FAULT_SIGBUS;
diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index ba2c79e6a0ee..86764ec9a854 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -214,17 +214,17 @@ EXPORT_SYMBOL_GPL(hyperv_cs);
 
 #ifdef CONFIG_HYPERV_TSCPAGE
 
-static struct ms_hyperv_tsc_page *tsc_pg;
+static struct ms_hyperv_tsc_page tsc_pg __aligned(PAGE_SIZE);
 
 struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 {
-	return tsc_pg;
+	return &tsc_pg;
 }
 EXPORT_SYMBOL_GPL(hv_get_tsc_page);
 
 static u64 notrace read_hv_sched_clock_tsc(void)
 {
-	u64 current_tick = hv_read_tsc_page(tsc_pg);
+	u64 current_tick = hv_read_tsc_page(&tsc_pg);
 
 	if (current_tick == U64_MAX)
 		hv_get_time_ref_count(current_tick);
@@ -280,12 +280,8 @@ static bool __init hv_init_tsc_clocksource(void)
 	if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
 		return false;
 
-	tsc_pg = vmalloc(PAGE_SIZE);
-	if (!tsc_pg)
-		return false;
-
 	hyperv_cs = &hyperv_cs_tsc;
-	phys_addr = page_to_phys(vmalloc_to_page(tsc_pg));
+	phys_addr = virt_to_phys(&tsc_pg) & PAGE_MASK;
 
 	/*
 	 * The Hyper-V TLFS specifies to preserve the value of reserved
-- 
2.14.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] clocksource/Hyper-V:  Add Hyper-V specific sched clock function
  2019-07-29  7:52 [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function lantianyu1986
  2019-07-29  7:52 ` [PATCH 1/2] clocksource/Hyper-v: Allocate Hyper-V tsc page statically lantianyu1986
@ 2019-07-29  7:52 ` lantianyu1986
  2019-08-12 18:41   ` Michael Kelley
  2019-07-29 10:59 ` [PATCH 0/2] " Vitaly Kuznetsov
  2 siblings, 1 reply; 14+ messages in thread
From: lantianyu1986 @ 2019-07-29  7:52 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, sashal, tglx, mingo, bp, hpa, x86,
	daniel.lezcano, arnd, michael.h.kelley, ashal
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, linux-arch

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
on x86.  But native_sched_clock() directly uses the raw TSC value, which
can be discontinuous in a Hyper-V VM. Add the generic hv_setup_sched_clock()
to set the sched clock function appropriately. On x86, this sets pv_ops.time.
sched_clock to read the Hyper-V reference TSC value that is scaled and adjusted
to be continuous.

Also move the Hyper-V reference TSC initialization much earlier in the boot
process so no discontinuity is observed when pv_ops.time.sched_clock
calculates its offset.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 arch/x86/hyperv/hv_init.c          |  2 --
 arch/x86/kernel/cpu/mshyperv.c     |  8 ++++++++
 drivers/clocksource/hyperv_timer.c | 22 ++++++++++++----------
 include/asm-generic/mshyperv.h     |  1 +
 4 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 0d258688c8cf..866dfb3dca48 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -301,8 +301,6 @@ void __init hyperv_init(void)
 
 	x86_init.pci.arch_init = hv_pci_init;
 
-	/* Register Hyper-V specific clocksource */
-	hv_init_clocksource();
 	return;
 
 remove_cpuhp_state:
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 062f77279ce3..53afd33990eb 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -29,6 +29,7 @@
 #include <asm/timer.h>
 #include <asm/reboot.h>
 #include <asm/nmi.h>
+#include <clocksource/hyperv_timer.h>
 
 struct ms_hyperv_info ms_hyperv;
 EXPORT_SYMBOL_GPL(ms_hyperv);
@@ -338,9 +339,16 @@ static void __init ms_hyperv_init_platform(void)
 		x2apic_phys = 1;
 # endif
 
+	/* Register Hyper-V specific clocksource */
+	hv_init_clocksource();
 #endif
 }
 
+void hv_setup_sched_clock(void *sched_clock)
+{
+	pv_ops.time.sched_clock = sched_clock;
+}
+
 const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
 	.name			= "Microsoft Hyper-V",
 	.detect			= ms_hyperv_platform,
diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index 86764ec9a854..eafca89b44d7 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -215,6 +215,7 @@ EXPORT_SYMBOL_GPL(hyperv_cs);
 #ifdef CONFIG_HYPERV_TSCPAGE
 
 static struct ms_hyperv_tsc_page tsc_pg __aligned(PAGE_SIZE);
+static u64 hv_sched_clock_offset __ro_after_init;
 
 struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 {
@@ -222,7 +223,7 @@ struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 }
 EXPORT_SYMBOL_GPL(hv_get_tsc_page);
 
-static u64 notrace read_hv_sched_clock_tsc(void)
+static u64 notrace read_hv_clock_tsc(struct clocksource *arg)
 {
 	u64 current_tick = hv_read_tsc_page(&tsc_pg);
 
@@ -232,9 +233,9 @@ static u64 notrace read_hv_sched_clock_tsc(void)
 	return current_tick;
 }
 
-static u64 read_hv_clock_tsc(struct clocksource *arg)
+static u64 read_hv_sched_clock_tsc(void)
 {
-	return read_hv_sched_clock_tsc();
+	return read_hv_clock_tsc(NULL) - hv_sched_clock_offset;
 }
 
 static struct clocksource hyperv_cs_tsc = {
@@ -246,7 +247,7 @@ static struct clocksource hyperv_cs_tsc = {
 };
 #endif
 
-static u64 notrace read_hv_sched_clock_msr(void)
+static u64 notrace read_hv_clock_msr(struct clocksource *arg)
 {
 	u64 current_tick;
 	/*
@@ -258,9 +259,9 @@ static u64 notrace read_hv_sched_clock_msr(void)
 	return current_tick;
 }
 
-static u64 read_hv_clock_msr(struct clocksource *arg)
+static u64 read_hv_sched_clock_msr(void)
 {
-	return read_hv_sched_clock_msr();
+	return read_hv_clock_msr(NULL) - hv_sched_clock_offset;
 }
 
 static struct clocksource hyperv_cs_msr = {
@@ -298,8 +299,9 @@ static bool __init hv_init_tsc_clocksource(void)
 	hv_set_clocksource_vdso(hyperv_cs_tsc);
 	clocksource_register_hz(&hyperv_cs_tsc, NSEC_PER_SEC/100);
 
-	/* sched_clock_register is needed on ARM64 but is a no-op on x86 */
-	sched_clock_register(read_hv_sched_clock_tsc, 64, HV_CLOCK_HZ);
+	hv_sched_clock_offset = hyperv_cs->read(hyperv_cs);
+	hv_setup_sched_clock(read_hv_sched_clock_tsc);
+
 	return true;
 }
 #else
@@ -329,7 +331,7 @@ void __init hv_init_clocksource(void)
 	hyperv_cs = &hyperv_cs_msr;
 	clocksource_register_hz(&hyperv_cs_msr, NSEC_PER_SEC/100);
 
-	/* sched_clock_register is needed on ARM64 but is a no-op on x86 */
-	sched_clock_register(read_hv_sched_clock_msr, 64, HV_CLOCK_HZ);
+	hv_sched_clock_offset = hyperv_cs->read(hyperv_cs);
+	hv_setup_sched_clock(read_hv_sched_clock_msr);
 }
 EXPORT_SYMBOL_GPL(hv_init_clocksource);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 0becb7d9704d..18d8e2d8210f 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -167,6 +167,7 @@ void hyperv_report_panic(struct pt_regs *regs, long err);
 void hyperv_report_panic_msg(phys_addr_t pa, size_t size);
 bool hv_is_hyperv_initialized(void);
 void hyperv_cleanup(void);
+void hv_setup_sched_clock(void *sched_clock);
 #else /* CONFIG_HYPERV */
 static inline bool hv_is_hyperv_initialized(void) { return false; }
 static inline void hyperv_cleanup(void) {}
-- 
2.14.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
  2019-07-29  7:52 [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function lantianyu1986
  2019-07-29  7:52 ` [PATCH 1/2] clocksource/Hyper-v: Allocate Hyper-V tsc page statically lantianyu1986
  2019-07-29  7:52 ` [PATCH 2/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function lantianyu1986
@ 2019-07-29 10:59 ` Vitaly Kuznetsov
  2019-07-29 11:09   ` Peter Zijlstra
  2 siblings, 1 reply; 14+ messages in thread
From: Vitaly Kuznetsov @ 2019-07-29 10:59 UTC (permalink / raw)
  To: lantianyu1986
  Cc: Tianyu Lan, linux-arch, linux-hyperv, linux-kernel, luto, tglx,
	mingo, bp, hpa, x86, kys, haiyangz, sthemmin, sashal,
	daniel.lezcano, arnd, michael.h.kelley, ashal

lantianyu1986@gmail.com writes:

> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>
> Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
> on x86.  But native_sched_clock() directly uses the raw TSC value, which
> can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
> to set the sched clock function appropriately.  On x86, this sets
> pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
> scaled and adjusted to be continuous.

Hypervisor can, in theory, disable TSC page and then we're forced to use
MSR-based clocksource but using it as sched_clock() can be very slow,
I'm afraid.

On the other hand, what we have now is probably worse: TSC can,
actually, jump backwards (e.g. on migration) and we're breaking the
requirements for sched_clock().

-- 
Vitaly

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
  2019-07-29 10:59 ` [PATCH 0/2] " Vitaly Kuznetsov
@ 2019-07-29 11:09   ` Peter Zijlstra
  2019-07-29 12:13     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2019-07-29 11:09 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: lantianyu1986, Tianyu Lan, linux-arch, linux-hyperv,
	linux-kernel, luto, tglx, mingo, bp, hpa, x86, kys, haiyangz,
	sthemmin, sashal, daniel.lezcano, arnd, michael.h.kelley, ashal

On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
> lantianyu1986@gmail.com writes:
> 
> > From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> >
> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
> > on x86.  But native_sched_clock() directly uses the raw TSC value, which
> > can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
> > to set the sched clock function appropriately.  On x86, this sets
> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
> > scaled and adjusted to be continuous.
> 
> Hypervisor can, in theory, disable TSC page and then we're forced to use
> MSR-based clocksource but using it as sched_clock() can be very slow,
> I'm afraid.
> 
> On the other hand, what we have now is probably worse: TSC can,
> actually, jump backwards (e.g. on migration) and we're breaking the
> requirements for sched_clock().

That (obviously) also breaks the requirements for using TSC as
clocksource.

IOW, it breaks the entire purpose of having TSC in the first place.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
  2019-07-29 11:09   ` Peter Zijlstra
@ 2019-07-29 12:13     ` Vitaly Kuznetsov
  2019-07-30 13:41       ` Tianyu Lan
  0 siblings, 1 reply; 14+ messages in thread
From: Vitaly Kuznetsov @ 2019-07-29 12:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: lantianyu1986, Tianyu Lan, linux-arch, linux-hyperv,
	linux-kernel, luto, tglx, mingo, bp, hpa, x86, kys, haiyangz,
	sthemmin, sashal, daniel.lezcano, arnd, michael.h.kelley, ashal

Peter Zijlstra <peterz@infradead.org> writes:

> On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
>> lantianyu1986@gmail.com writes:
>> 
>> > From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>> >
>> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
>> > on x86.  But native_sched_clock() directly uses the raw TSC value, which
>> > can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
>> > to set the sched clock function appropriately.  On x86, this sets
>> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
>> > scaled and adjusted to be continuous.
>> 
>> Hypervisor can, in theory, disable TSC page and then we're forced to use
>> MSR-based clocksource but using it as sched_clock() can be very slow,
>> I'm afraid.
>> 
>> On the other hand, what we have now is probably worse: TSC can,
>> actually, jump backwards (e.g. on migration) and we're breaking the
>> requirements for sched_clock().
>
> That (obviously) also breaks the requirements for using TSC as
> clocksource.
>
> IOW, it breaks the entire purpose of having TSC in the first place.

Currently, we mark raw TSC as unstable when running on Hyper-V (see
88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used
instead. The problem is that 'TSC page' can be disabled by the
hypervisor and in that case the only remaining clocksource is MSR-based
(slow).

-- 
Vitaly

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
  2019-07-29 12:13     ` Vitaly Kuznetsov
@ 2019-07-30 13:41       ` Tianyu Lan
  2019-08-12 19:22         ` Michael Kelley
  0 siblings, 1 reply; 14+ messages in thread
From: Tianyu Lan @ 2019-07-30 13:41 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Peter Zijlstra, Tianyu Lan, linux-arch, linux-hyperv,
	linux-kernel@vger kernel org, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Daniel Lezcano, Arnd Bergmann,
	michael.h.kelley, ashal

Hi Vitaly & Peter:
    Thanks for your review.

On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
> Peter Zijlstra <peterz@infradead.org> writes:
>
> > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
> >> lantianyu1986@gmail.com writes:
> >>
> >> > From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> >> >
> >> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
> >> > on x86.  But native_sched_clock() directly uses the raw TSC value, which
> >> > can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
> >> > to set the sched clock function appropriately.  On x86, this sets
> >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
> >> > scaled and adjusted to be continuous.
> >>
> >> Hypervisor can, in theory, disable TSC page and then we're forced to use
> >> MSR-based clocksource but using it as sched_clock() can be very slow,
> >> I'm afraid.
> >>
> >> On the other hand, what we have now is probably worse: TSC can,
> >> actually, jump backwards (e.g. on migration) and we're breaking the
> >> requirements for sched_clock().
> >
> > That (obviously) also breaks the requirements for using TSC as
> > clocksource.
> >
> > IOW, it breaks the entire purpose of having TSC in the first place.
>
> Currently, we mark raw TSC as unstable when running on Hyper-V (see
> 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used
> instead. The problem is that 'TSC page' can be disabled by the
> hypervisor and in that case the only remaining clocksource is MSR-based
> (slow).
>

Yes, that will be slow if Hyper-V doesn't expose hv tsc page and
kernel uses MSR based
clocksource. Each MSR read will trigger one VM-EXIT. This also happens on other
hypervisors (e,g, KVM doesn't expose KVM clock). Hypervisor should
take this into
account and determine which clocksource should be exposed or not.

-- 
Best regards
Tianyu Lan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 1/2] clocksource/Hyper-v: Allocate Hyper-V tsc page statically
  2019-07-29  7:52 ` [PATCH 1/2] clocksource/Hyper-v: Allocate Hyper-V tsc page statically lantianyu1986
@ 2019-08-12 18:39   ` Michael Kelley
  0 siblings, 0 replies; 14+ messages in thread
From: Michael Kelley @ 2019-08-12 18:39 UTC (permalink / raw)
  To: lantianyu1986, luto, tglx, mingo, bp, hpa, x86, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, sashal, daniel.lezcano, arnd,
	ashal
  Cc: Tianyu Lan, linux-kernel, linux-hyperv, linux-arch

From: lantianyu1986@gmail.com <lantianyu1986@gmail.com> Sent: Monday, July 29, 2019 12:53 AM
> 
> This is to prepare to add Hyper-V sched clock callback and move
> Hyper-V reference TSC initialization much earlier in the boot
> process when timestamp is 0. So no discontinuity is observed
> when pv_ops.time.sched_clock to calculate its offset. This earlier
> initialization requires that the Hyper-V TSC page be allocated
> statically instead of with vmalloc(), so fixup the references
> to the TSC page and the method of getting its physical address.

I'd suggest tweaking the commit message wording a bit:

Prepare to add Hyper-V sched clock callback and move Hyper-V
Reference TSC initialization much earlier in the boot process.  Earlier
initialization is needed so that it happens while the timestamp value
is still 0 and no discontinuity in the timestamp will occur when 
pv_ops.time.sched_clock calculates its offset.  The earlier
initialization requires that the Hyper-V TSC page be allocated
statically instead of with vmalloc(), so fixup the references
to the TSC page and the method of getting its physical address.

> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  arch/x86/entry/vdso/vma.c          |  2 +-
>  drivers/clocksource/hyperv_timer.c | 12 ++++--------
>  2 files changed, 5 insertions(+), 9 deletions(-)
> 
> @@ -280,12 +280,8 @@ static bool __init hv_init_tsc_clocksource(void)
>  	if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
>  		return false;
> 
> -	tsc_pg = vmalloc(PAGE_SIZE);
> -	if (!tsc_pg)
> -		return false;
> -
>  	hyperv_cs = &hyperv_cs_tsc;
> -	phys_addr = page_to_phys(vmalloc_to_page(tsc_pg));
> +	phys_addr = virt_to_phys(&tsc_pg) & PAGE_MASK;

The and'ing with PAGE_MASK isn't needed.  You've set up tsc_pg
to ensure it is page aligned, so there's no need to mask out any
low order bits.  That's why the previous code didn't do the masking
either.

> 
>  	/*
>  	 * The Hyper-V TLFS specifies to preserve the value of reserved
> --
> 2.14.5


^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 2/2] clocksource/Hyper-V:  Add Hyper-V specific sched clock function
  2019-07-29  7:52 ` [PATCH 2/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function lantianyu1986
@ 2019-08-12 18:41   ` Michael Kelley
  0 siblings, 0 replies; 14+ messages in thread
From: Michael Kelley @ 2019-08-12 18:41 UTC (permalink / raw)
  To: lantianyu1986, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal, tglx, mingo, bp, hpa, x86, daniel.lezcano, arnd, ashal
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, linux-arch

From: Tianyu Lan <Tianyu.Lan@microsoft.com>  Sent: Monday, July 29, 2019 12:53 AM
> 
> Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
> on x86.  But native_sched_clock() directly uses the raw TSC value, which
> can be discontinuous in a Hyper-V VM. Add the generic hv_setup_sched_clock()
> to set the sched clock function appropriately. On x86, this sets pv_ops.time.
> sched_clock to read the Hyper-V reference TSC value that is scaled and adjusted
> to be continuous.
> 
> Also move the Hyper-V reference TSC initialization much earlier in the boot
> process so no discontinuity is observed when pv_ops.time.sched_clock
> calculates its offset.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  arch/x86/hyperv/hv_init.c          |  2 --
>  arch/x86/kernel/cpu/mshyperv.c     |  8 ++++++++
>  drivers/clocksource/hyperv_timer.c | 22 ++++++++++++----------
>  include/asm-generic/mshyperv.h     |  1 +
>  4 files changed, 21 insertions(+), 12 deletions(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
  2019-07-30 13:41       ` Tianyu Lan
@ 2019-08-12 19:22         ` Michael Kelley
  2019-08-13  8:33           ` Vitaly Kuznetsov
  0 siblings, 1 reply; 14+ messages in thread
From: Michael Kelley @ 2019-08-12 19:22 UTC (permalink / raw)
  To: Tianyu Lan, vkuznets
  Cc: Peter Zijlstra, Tianyu Lan, linux-arch, linux-hyperv,
	linux-kernel@vger kernel org, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Daniel Lezcano, Arnd Bergmann,
	ashal

From: Tianyu Lan <lantianyu1986@gmail.com> Sent: Tuesday, July 30, 2019 6:41 AM
> 
> On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> >
> > Peter Zijlstra <peterz@infradead.org> writes:
> >
> > > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
> > >> lantianyu1986@gmail.com writes:
> > >>
> > >> > From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> > >> >
> > >> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
> > >> > on x86.  But native_sched_clock() directly uses the raw TSC value, which
> > >> > can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
> > >> > to set the sched clock function appropriately.  On x86, this sets
> > >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
> > >> > scaled and adjusted to be continuous.
> > >>
> > >> Hypervisor can, in theory, disable TSC page and then we're forced to use
> > >> MSR-based clocksource but using it as sched_clock() can be very slow,
> > >> I'm afraid.
> > >>
> > >> On the other hand, what we have now is probably worse: TSC can,
> > >> actually, jump backwards (e.g. on migration) and we're breaking the
> > >> requirements for sched_clock().
> > >
> > > That (obviously) also breaks the requirements for using TSC as
> > > clocksource.
> > >
> > > IOW, it breaks the entire purpose of having TSC in the first place.
> >
> > Currently, we mark raw TSC as unstable when running on Hyper-V (see
> > 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used
> > instead. The problem is that 'TSC page' can be disabled by the
> > hypervisor and in that case the only remaining clocksource is MSR-based
> > (slow).
> >
> 
> Yes, that will be slow if Hyper-V doesn't expose hv tsc page and
> kernel uses MSR based
> clocksource. Each MSR read will trigger one VM-EXIT. This also happens on other
> hypervisors (e,g, KVM doesn't expose KVM clock). Hypervisor should
> take this into
> account and determine which clocksource should be exposed or not.
> 

We've confirmed with the Hyper-V team that the TSC page is always available
on Hyper-V 2016 and later, and on Hyper-V 2012 R2 when the physical
hardware presents an InvariantTSC.  But the Linux Kconfig's are set up so
the TSC page is not used for 32-bit guests -- all clock reads are synthetic MSR
reads.  For 32-bit, this set of changes will add more overhead because the
sched clock reads will now be MSR reads.

I would be inclined to fix the problem, even with the perf hit on 32-bit Linux.
I don’t have any data on 32-bit Linux being used in a Hyper-V guest, but it's not
supported in Azure so usage is pretty small.  The alternative would be to continue
to use the raw TSC value on 32-bit, even with the risk of a discontinuity in case of
live migration or similar scenarios.

Michael

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
  2019-08-12 19:22         ` Michael Kelley
@ 2019-08-13  8:33           ` Vitaly Kuznetsov
  2019-08-20 14:32             ` Michael Kelley
  0 siblings, 1 reply; 14+ messages in thread
From: Vitaly Kuznetsov @ 2019-08-13  8:33 UTC (permalink / raw)
  To: Michael Kelley, Tianyu Lan
  Cc: Peter Zijlstra, Tianyu Lan, linux-arch, linux-hyperv,
	linux-kernel@vger kernel org, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Daniel Lezcano, Arnd Bergmann,
	ashal

Michael Kelley <mikelley@microsoft.com> writes:

> From: Tianyu Lan <lantianyu1986@gmail.com> Sent: Tuesday, July 30, 2019 6:41 AM
>> 
>> On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>> >
>> > Peter Zijlstra <peterz@infradead.org> writes:
>> >
>> > > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
>> > >> lantianyu1986@gmail.com writes:
>> > >>
>> > >> > From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>> > >> >
>> > >> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
>> > >> > on x86.  But native_sched_clock() directly uses the raw TSC value, which
>> > >> > can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
>> > >> > to set the sched clock function appropriately.  On x86, this sets
>> > >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
>> > >> > scaled and adjusted to be continuous.
>> > >>
>> > >> Hypervisor can, in theory, disable TSC page and then we're forced to use
>> > >> MSR-based clocksource but using it as sched_clock() can be very slow,
>> > >> I'm afraid.
>> > >>
>> > >> On the other hand, what we have now is probably worse: TSC can,
>> > >> actually, jump backwards (e.g. on migration) and we're breaking the
>> > >> requirements for sched_clock().
>> > >
>> > > That (obviously) also breaks the requirements for using TSC as
>> > > clocksource.
>> > >
>> > > IOW, it breaks the entire purpose of having TSC in the first place.
>> >
>> > Currently, we mark raw TSC as unstable when running on Hyper-V (see
>> > 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used
>> > instead. The problem is that 'TSC page' can be disabled by the
>> > hypervisor and in that case the only remaining clocksource is MSR-based
>> > (slow).
>> >
>> 
>> Yes, that will be slow if Hyper-V doesn't expose hv tsc page and
>> kernel uses MSR based
>> clocksource. Each MSR read will trigger one VM-EXIT. This also happens on other
>> hypervisors (e,g, KVM doesn't expose KVM clock). Hypervisor should
>> take this into
>> account and determine which clocksource should be exposed or not.
>> 
>
> We've confirmed with the Hyper-V team that the TSC page is always available
> on Hyper-V 2016 and later, and on Hyper-V 2012 R2 when the physical
> hardware presents an InvariantTSC.

Currently we check that TSC page is valid on every read and it seems
this is redundant, right? It is either available on boot or not. I can
only imagine migrating a VM to a non-InvariantTSC host when Hyper-V will
likely disable the page (and we can get reenlightenment notification
then).

>  But the Linux Kconfig's are set up so
> the TSC page is not used for 32-bit guests -- all clock reads are synthetic MSR
> reads.  For 32-bit, this set of changes will add more overhead because the
> sched clock reads will now be MSR reads.
>
> I would be inclined to fix the problem, even with the perf hit on 32-bit Linux.
> I don’t have any data on 32-bit Linux being used in a Hyper-V guest, but it's not
> supported in Azure so usage is pretty small.  The alternative would be to continue
> to use the raw TSC value on 32-bit, even with the risk of a discontinuity in case of
> live migration or similar scenarios.

The issue needs fixing, I agree, however using MSR based clocksource as
sched clock may give us too big of a performance hit (not sure who cares
about 32 bit guest performance nowadays but still). What stops us from
enabling TSC page for 32 bit guests if it is available?

-- 
Vitaly

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
  2019-08-13  8:33           ` Vitaly Kuznetsov
@ 2019-08-20 14:32             ` Michael Kelley
  2019-08-21  7:15               ` Vitaly Kuznetsov
  0 siblings, 1 reply; 14+ messages in thread
From: Michael Kelley @ 2019-08-20 14:32 UTC (permalink / raw)
  To: vkuznets, Tianyu Lan
  Cc: Peter Zijlstra, Tianyu Lan, linux-arch, linux-hyperv,
	linux-kernel@vger kernel org, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Daniel Lezcano, Arnd Bergmann,
	ashal

From: Vitaly Kuznetsov <vkuznets@redhat.com> Sent: Tuesday, August 13, 2019 1:34 AM
> 
> Michael Kelley <mikelley@microsoft.com> writes:
> 
> > From: Tianyu Lan <lantianyu1986@gmail.com> Sent: Tuesday, July 30, 2019 6:41 AM
> >>
> >> On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> >> >
> >> > Peter Zijlstra <peterz@infradead.org> writes:
> >> >
> >> > > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
> >> > >> lantianyu1986@gmail.com writes:
> >> > >>
> >> > >> > From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> >> > >> >
> >> > >> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
> >> > >> > on x86.  But native_sched_clock() directly uses the raw TSC value, which
> >> > >> > can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
> >> > >> > to set the sched clock function appropriately.  On x86, this sets
> >> > >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
> >> > >> > scaled and adjusted to be continuous.
> >> > >>
> >> > >> Hypervisor can, in theory, disable TSC page and then we're forced to use
> >> > >> MSR-based clocksource but using it as sched_clock() can be very slow,
> >> > >> I'm afraid.
> >> > >>
> >> > >> On the other hand, what we have now is probably worse: TSC can,
> >> > >> actually, jump backwards (e.g. on migration) and we're breaking the
> >> > >> requirements for sched_clock().
> >> > >
> >> > > That (obviously) also breaks the requirements for using TSC as
> >> > > clocksource.
> >> > >
> >> > > IOW, it breaks the entire purpose of having TSC in the first place.
> >> >
> >> > Currently, we mark raw TSC as unstable when running on Hyper-V (see
> >> > 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used
> >> > instead. The problem is that 'TSC page' can be disabled by the
> >> > hypervisor and in that case the only remaining clocksource is MSR-based
> >> > (slow).
> >> >
> >>
> >> Yes, that will be slow if Hyper-V doesn't expose hv tsc page and
> >> kernel uses MSR based
> >> clocksource. Each MSR read will trigger one VM-EXIT. This also happens on other
> >> hypervisors (e,g, KVM doesn't expose KVM clock). Hypervisor should
> >> take this into
> >> account and determine which clocksource should be exposed or not.
> >>
> >
> > We've confirmed with the Hyper-V team that the TSC page is always available
> > on Hyper-V 2016 and later, and on Hyper-V 2012 R2 when the physical
> > hardware presents an InvariantTSC.
> 
> Currently we check that TSC page is valid on every read and it seems
> this is redundant, right? It is either available on boot or not. I can
> only imagine migrating a VM to a non-InvariantTSC host when Hyper-V will
> likely disable the page (and we can get reenlightenment notification
> then).

I think Hyper-V can have brief intervals when the TSC page is not valid, so
the code checks for the "sequence" value being zero.   Otherwise, yes, it
should always be there or not be there.  Is there some other validity
check on every read that you are thinking of?

> 
> >  But the Linux Kconfig's are set up so
> > the TSC page is not used for 32-bit guests -- all clock reads are synthetic MSR
> > reads.  For 32-bit, this set of changes will add more overhead because the
> > sched clock reads will now be MSR reads.
> >
> > I would be inclined to fix the problem, even with the perf hit on 32-bit Linux.
> > I don’t have any data on 32-bit Linux being used in a Hyper-V guest, but it's not
> > supported in Azure so usage is pretty small.  The alternative would be to continue
> > to use the raw TSC value on 32-bit, even with the risk of a discontinuity in case of
> > live migration or similar scenarios.
> 
> The issue needs fixing, I agree, however using MSR based clocksource as
> sched clock may give us too big of a performance hit (not sure who cares
> about 32 bit guest performance nowadays but still). What stops us from
> enabling TSC page for 32 bit guests if it is available?

I talked to KY Srinivasan for any history about TSC page on 32-bit.  He said
there was no technical reason not to implement it, but our focus was always
64-bit Linux, so the 32-bit was much less important.  Also, on 32-bit Linux,
the required 64x64 multiply and shift is more complex and takes more
more cycles (compare 32-bit implementation of mul_u64_u64_shr vs.
the 64-bit implementation), so the win over a MSR read is less.  I
don't know of any actual measurements being made to compare vs.
MSR read.

Michael

> 
> --
> Vitaly

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
  2019-08-20 14:32             ` Michael Kelley
@ 2019-08-21  7:15               ` Vitaly Kuznetsov
  2019-08-21  8:54                 ` Vitaly Kuznetsov
  0 siblings, 1 reply; 14+ messages in thread
From: Vitaly Kuznetsov @ 2019-08-21  7:15 UTC (permalink / raw)
  To: Michael Kelley, Tianyu Lan
  Cc: Peter Zijlstra, Tianyu Lan, linux-arch, linux-hyperv,
	linux-kernel@vger kernel org, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Daniel Lezcano, Arnd Bergmann,
	ashal

Michael Kelley <mikelley@microsoft.com> writes:

> From: Vitaly Kuznetsov <vkuznets@redhat.com> Sent: Tuesday, August 13, 2019 1:34 AM
>> 
>> Michael Kelley <mikelley@microsoft.com> writes:
>> 
>> > From: Tianyu Lan <lantianyu1986@gmail.com> Sent: Tuesday, July 30, 2019 6:41 AM
>> >>
>> >> On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>> >> >
>> >> > Peter Zijlstra <peterz@infradead.org> writes:
>> >> >
>> >> > > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
>> >> > >> lantianyu1986@gmail.com writes:
>> >> > >>
>> >> > >> > From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>> >> > >> >
>> >> > >> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
>> >> > >> > on x86.  But native_sched_clock() directly uses the raw TSC value, which
>> >> > >> > can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
>> >> > >> > to set the sched clock function appropriately.  On x86, this sets
>> >> > >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
>> >> > >> > scaled and adjusted to be continuous.
>> >> > >>
>> >> > >> Hypervisor can, in theory, disable TSC page and then we're forced to use
>> >> > >> MSR-based clocksource but using it as sched_clock() can be very slow,
>> >> > >> I'm afraid.
>> >> > >>
>> >> > >> On the other hand, what we have now is probably worse: TSC can,
>> >> > >> actually, jump backwards (e.g. on migration) and we're breaking the
>> >> > >> requirements for sched_clock().
>> >> > >
>> >> > > That (obviously) also breaks the requirements for using TSC as
>> >> > > clocksource.
>> >> > >
>> >> > > IOW, it breaks the entire purpose of having TSC in the first place.
>> >> >
>> >> > Currently, we mark raw TSC as unstable when running on Hyper-V (see
>> >> > 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used
>> >> > instead. The problem is that 'TSC page' can be disabled by the
>> >> > hypervisor and in that case the only remaining clocksource is MSR-based
>> >> > (slow).
>> >> >
>> >>
>> >> Yes, that will be slow if Hyper-V doesn't expose hv tsc page and
>> >> kernel uses MSR based
>> >> clocksource. Each MSR read will trigger one VM-EXIT. This also happens on other
>> >> hypervisors (e,g, KVM doesn't expose KVM clock). Hypervisor should
>> >> take this into
>> >> account and determine which clocksource should be exposed or not.
>> >>
>> >
>> > We've confirmed with the Hyper-V team that the TSC page is always available
>> > on Hyper-V 2016 and later, and on Hyper-V 2012 R2 when the physical
>> > hardware presents an InvariantTSC.
>> 
>> Currently we check that TSC page is valid on every read and it seems
>> this is redundant, right? It is either available on boot or not. I can
>> only imagine migrating a VM to a non-InvariantTSC host when Hyper-V will
>> likely disable the page (and we can get reenlightenment notification
>> then).
>
> I think Hyper-V can have brief intervals when the TSC page is not valid, so
> the code checks for the "sequence" value being zero.   Otherwise, yes, it
> should always be there or not be there.  Is there some other validity
> check on every read that you are thinking of?
>

No, it's this one. In case these 'invalidity periods' are real there's
nothing to improve in the current code.

>> 
>> >  But the Linux Kconfig's are set up so
>> > the TSC page is not used for 32-bit guests -- all clock reads are synthetic MSR
>> > reads.  For 32-bit, this set of changes will add more overhead because the
>> > sched clock reads will now be MSR reads.
>> >
>> > I would be inclined to fix the problem, even with the perf hit on 32-bit Linux.
>> > I don’t have any data on 32-bit Linux being used in a Hyper-V guest, but it's not
>> > supported in Azure so usage is pretty small.  The alternative would be to continue
>> > to use the raw TSC value on 32-bit, even with the risk of a discontinuity in case of
>> > live migration or similar scenarios.
>> 
>> The issue needs fixing, I agree, however using MSR based clocksource as
>> sched clock may give us too big of a performance hit (not sure who cares
>> about 32 bit guest performance nowadays but still). What stops us from
>> enabling TSC page for 32 bit guests if it is available?
>
> I talked to KY Srinivasan for any history about TSC page on 32-bit.  He said
> there was no technical reason not to implement it, but our focus was always
> 64-bit Linux, so the 32-bit was much less important.  Also, on 32-bit Linux,
> the required 64x64 multiply and shift is more complex and takes more
> more cycles (compare 32-bit implementation of mul_u64_u64_shr vs.
> the 64-bit implementation), so the win over a MSR read is less.  I
> don't know of any actual measurements being made to compare vs.
> MSR read.

VMExit is 1000 CPU cycles or so, I would guess that TSC page
calculations are better. Let me try to build 32bit kernel and do some
quick measurements.

-- 
Vitaly

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
  2019-08-21  7:15               ` Vitaly Kuznetsov
@ 2019-08-21  8:54                 ` Vitaly Kuznetsov
  0 siblings, 0 replies; 14+ messages in thread
From: Vitaly Kuznetsov @ 2019-08-21  8:54 UTC (permalink / raw)
  To: Michael Kelley, Tianyu Lan
  Cc: Peter Zijlstra, Tianyu Lan, linux-arch, linux-hyperv,
	linux-kernel@vger kernel org, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Daniel Lezcano, Arnd Bergmann

Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> Michael Kelley <mikelley@microsoft.com> writes:
>
>> I talked to KY Srinivasan for any history about TSC page on 32-bit.  He said
>> there was no technical reason not to implement it, but our focus was always
>> 64-bit Linux, so the 32-bit was much less important.  Also, on 32-bit Linux,
>> the required 64x64 multiply and shift is more complex and takes more
>> more cycles (compare 32-bit implementation of mul_u64_u64_shr vs.
>> the 64-bit implementation), so the win over a MSR read is less.  I
>> don't know of any actual measurements being made to compare vs.
>> MSR read.
>
> VMExit is 1000 CPU cycles or so, I would guess that TSC page
> calculations are better. Let me try to build 32bit kernel and do some
> quick measurements.

So I tried and the difference is HUGE.

For in-kernel clocksource reads (like sched_clock()), the testing code
was:

        before = rdtsc_ordered();
        for (i = 0; i < 1000; i++)
             (void)read_hv_sched_clock_msr();
        after = rdtsc_ordered();
        printk("MSR based clocksource: %d cycles\n", ((u32)(after - before))/1000);

        before = rdtsc_ordered();
        for (i = 0; i < 1000; i++)
            (void)read_hv_sched_clock_tsc();
        after = rdtsc_ordered();
        printk("TSC page clocksource: %d cycles\n", ((u32)(after - before))/1000);

The result (WS2016) is:
[    1.101910] MSR based clocksource: 3361 cycles
[    1.105224] TSC page clocksource: 49 cycles

For userspace reads the absolute difference is even bigger as TSC page
gives us functional vDSO:

Testing code:
	before = rdtsc();
	for (i = 0; i < COUNT; i++)
		clock_gettime(CLOCK_REALTIME, &tp);
	after = rdtsc();
	printf("%d\n", (after - before)/COUNT);

Result:

TSC page:
# ./gettime_cycles 
131

MSR:
# ./gettime_cycles 
5664

With all that I see no reason for us to not enable TSC page on 32bit,
even if the number of users is negligible, this will allow us to get rid
of ugly #ifdef CONFIG_HYPERV_TSCPAGE in the code.

I'll send a patch for discussion.

-- 
Vitaly

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-08-21  8:54 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-29  7:52 [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function lantianyu1986
2019-07-29  7:52 ` [PATCH 1/2] clocksource/Hyper-v: Allocate Hyper-V tsc page statically lantianyu1986
2019-08-12 18:39   ` Michael Kelley
2019-07-29  7:52 ` [PATCH 2/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function lantianyu1986
2019-08-12 18:41   ` Michael Kelley
2019-07-29 10:59 ` [PATCH 0/2] " Vitaly Kuznetsov
2019-07-29 11:09   ` Peter Zijlstra
2019-07-29 12:13     ` Vitaly Kuznetsov
2019-07-30 13:41       ` Tianyu Lan
2019-08-12 19:22         ` Michael Kelley
2019-08-13  8:33           ` Vitaly Kuznetsov
2019-08-20 14:32             ` Michael Kelley
2019-08-21  7:15               ` Vitaly Kuznetsov
2019-08-21  8:54                 ` Vitaly Kuznetsov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).