linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] x86/vdso: Add Hyper-V TSC page clocksource support
@ 2017-02-09 14:10 Vitaly Kuznetsov
  2017-02-09 14:10 ` [PATCH 1/2] hyperv: implement hv_get_tsc_page() Vitaly Kuznetsov
  2017-02-09 14:10 ` [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method Vitaly Kuznetsov
  0 siblings, 2 replies; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-02-09 14:10 UTC (permalink / raw)
  To: x86, Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, K. Y. Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Dexuan Cui, linux-kernel,
	devel, virtualization

Hi,

Hyper-V TSC page clocksource is suitable for vDSO, however, the protocol
defined by the hypervisor is different from VCLOCK_PVCLOCK. Implemented the
required support. Simple sysbench test shows the following results:

Before:
# time sysbench --test=memory --max-requests=500000 run
...
real    1m22.618s
user    0m50.193s
sys     0m32.268s

After:
# time sysbench --test=memory --max-requests=500000 run
...
real	0m47.241s
user	0m47.117s
sys	0m0.008s

So it seems it is worth it. As nobody seems to be strongly offended by my RFC
I'm sending first non-RFC version. Patch 1 is made on top of K. Y.'s code
refactoring which moved tsc page clocksource to arch/x86/hyperv/hv_init.c,
this is currently present in Greg's char-misc-next tree.

Changes since RFC:
- Use mul_u64_u64_shr() instead of an open coded implementation
  [Andy Lutomirski, Thomas Gleixner]
- Don't use the same pvclock_page for both VCLOCK_PVCLOCK and
  VCLOCK_HVCLOCK, create another one [Andy Lutomirski]
- Fix issues reported by kbuild test robot.
- Rename HYPERV_CLOCK -> HYPERV_TSCPAGE to avoid the ambiguity.

I'm also going to try to optimize mul_u64_u64_shr() for 32bit but this can
be split from this series I guess.

Vitaly Kuznetsov (2):
  hyperv: implement hv_get_tsc_page()
  x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

 arch/x86/entry/vdso/vclock_gettime.c  | 36 +++++++++++++++++++++++++++++++++++
 arch/x86/entry/vdso/vdso-layout.lds.S |  3 ++-
 arch/x86/entry/vdso/vdso2c.c          |  3 +++
 arch/x86/entry/vdso/vma.c             |  7 +++++++
 arch/x86/hyperv/hv_init.c             | 12 ++++++++++--
 arch/x86/include/asm/clocksource.h    |  3 ++-
 arch/x86/include/asm/mshyperv.h       |  8 ++++++++
 arch/x86/include/asm/vdso.h           |  1 +
 drivers/hv/Kconfig                    |  3 +++
 9 files changed, 72 insertions(+), 4 deletions(-)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/2] hyperv: implement hv_get_tsc_page()
  2017-02-09 14:10 [PATCH 0/2] x86/vdso: Add Hyper-V TSC page clocksource support Vitaly Kuznetsov
@ 2017-02-09 14:10 ` Vitaly Kuznetsov
  2017-02-09 18:24   ` Stephen Hemminger
  2017-02-09 14:10 ` [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method Vitaly Kuznetsov
  1 sibling, 1 reply; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-02-09 14:10 UTC (permalink / raw)
  To: x86, Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, K. Y. Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Dexuan Cui, linux-kernel,
	devel, virtualization

To use Hyper-V TSC page clocksource from vDSO we need to make tsc_pg
available. Implement hv_get_tsc_page() and add CONFIG_HYPERV_TSCPAGE to
make #ifdef-s simple.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/hyperv/hv_init.c       | 9 +++++++--
 arch/x86/include/asm/mshyperv.h | 8 ++++++++
 drivers/hv/Kconfig              | 3 +++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index b371d0e..0ce8485 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -27,10 +27,15 @@
 #include <linux/clockchips.h>
 
 
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_HYPERV_TSCPAGE
 
 static struct ms_hyperv_tsc_page *tsc_pg;
 
+struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
+{
+	return tsc_pg;
+}
+
 static u64 read_hv_clock_tsc(struct clocksource *arg)
 {
 	u64 current_tick;
@@ -136,7 +141,7 @@ void hyperv_init(void)
 	/*
 	 * Register Hyper-V specific clocksource.
 	 */
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_HYPERV_TSCPAGE
 	if (ms_hyperv.features & HV_X64_MSR_REFERENCE_TSC_AVAILABLE) {
 		union hv_x64_msr_hypercall_contents tsc_msr;
 
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index f8dc370..14dd92c 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -173,4 +173,12 @@ void hyperv_report_panic(struct pt_regs *regs);
 bool hv_is_hypercall_page_setup(void);
 void hyperv_cleanup(void);
 #endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
+#else
+static inline struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
+{
+	return NULL;
+}
+#endif
 #endif
diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index 0403b51..c29cd53 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -7,6 +7,9 @@ config HYPERV
 	  Select this option to run Linux as a Hyper-V client operating
 	  system.
 
+config HYPERV_TSCPAGE
+       def_bool HYPERV && X86_64
+
 config HYPERV_UTILS
 	tristate "Microsoft Hyper-V Utilities driver"
 	depends on HYPERV && CONNECTOR && NLS
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
  2017-02-09 14:10 [PATCH 0/2] x86/vdso: Add Hyper-V TSC page clocksource support Vitaly Kuznetsov
  2017-02-09 14:10 ` [PATCH 1/2] hyperv: implement hv_get_tsc_page() Vitaly Kuznetsov
@ 2017-02-09 14:10 ` Vitaly Kuznetsov
  2017-02-09 17:08   ` Thomas Gleixner
  1 sibling, 1 reply; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-02-09 14:10 UTC (permalink / raw)
  To: x86, Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, K. Y. Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Dexuan Cui, linux-kernel,
	devel, virtualization

Hyper-V TSC page clocksource is suitable for vDSO, however, the protocol
defined by the hypervisor is different from VCLOCK_PVCLOCK. Implement the
required support by adding hvclock_page VVAR.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/entry/vdso/vclock_gettime.c  | 36 +++++++++++++++++++++++++++++++++++
 arch/x86/entry/vdso/vdso-layout.lds.S |  3 ++-
 arch/x86/entry/vdso/vdso2c.c          |  3 +++
 arch/x86/entry/vdso/vma.c             |  7 +++++++
 arch/x86/hyperv/hv_init.c             |  3 +++
 arch/x86/include/asm/clocksource.h    |  3 ++-
 arch/x86/include/asm/vdso.h           |  1 +
 7 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index 9d4d6e1..4af10b4 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -17,6 +17,7 @@
 #include <asm/unistd.h>
 #include <asm/msr.h>
 #include <asm/pvclock.h>
+#include <asm/mshyperv.h>
 #include <linux/math64.h>
 #include <linux/time.h>
 #include <linux/kernel.h>
@@ -32,6 +33,11 @@ extern u8 pvclock_page
 	__attribute__((visibility("hidden")));
 #endif
 
+#ifdef CONFIG_HYPERV_TSCPAGE
+extern u8 hvclock_page
+	__attribute__((visibility("hidden")));
+#endif
+
 #ifndef BUILD_VDSO32
 
 notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
@@ -141,6 +147,32 @@ static notrace u64 vread_pvclock(int *mode)
 	return last;
 }
 #endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+static notrace u64 vread_hvclock(int *mode)
+{
+	const struct ms_hyperv_tsc_page *tsc_pg =
+		(const struct ms_hyperv_tsc_page *)&hvclock_page;
+	u64 sequence, scale, offset, current_tick, cur_tsc;
+
+	while (1) {
+		sequence = READ_ONCE(tsc_pg->tsc_sequence);
+		if (!sequence)
+			break;
+
+		scale = READ_ONCE(tsc_pg->tsc_scale);
+		offset = READ_ONCE(tsc_pg->tsc_offset);
+		rdtscll(cur_tsc);
+
+		current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
+
+		if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
+			return current_tick;
+	}
+
+	*mode = VCLOCK_NONE;
+	return 0;
+}
+#endif
 
 notrace static u64 vread_tsc(void)
 {
@@ -173,6 +205,10 @@ notrace static inline u64 vgetsns(int *mode)
 	else if (gtod->vclock_mode == VCLOCK_PVCLOCK)
 		cycles = vread_pvclock(mode);
 #endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+	else if (gtod->vclock_mode == VCLOCK_HVCLOCK)
+		cycles = vread_hvclock(mode);
+#endif
 	else
 		return 0;
 	v = (cycles - gtod->cycle_last) & gtod->mask;
diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S
index a708aa9..8ebb4b6 100644
--- a/arch/x86/entry/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
@@ -25,7 +25,7 @@ SECTIONS
 	 * segment.
 	 */
 
-	vvar_start = . - 2 * PAGE_SIZE;
+	vvar_start = . - 3 * PAGE_SIZE;
 	vvar_page = vvar_start;
 
 	/* Place all vvars at the offsets in asm/vvar.h. */
@@ -36,6 +36,7 @@ SECTIONS
 #undef EMIT_VVAR
 
 	pvclock_page = vvar_start + PAGE_SIZE;
+	hvclock_page = vvar_start + 2 * PAGE_SIZE;
 
 	. = SIZEOF_HEADERS;
 
diff --git a/arch/x86/entry/vdso/vdso2c.c b/arch/x86/entry/vdso/vdso2c.c
index 491020b..0780a44 100644
--- a/arch/x86/entry/vdso/vdso2c.c
+++ b/arch/x86/entry/vdso/vdso2c.c
@@ -74,6 +74,7 @@ enum {
 	sym_vvar_page,
 	sym_hpet_page,
 	sym_pvclock_page,
+	sym_hvclock_page,
 	sym_VDSO_FAKE_SECTION_TABLE_START,
 	sym_VDSO_FAKE_SECTION_TABLE_END,
 };
@@ -82,6 +83,7 @@ const int special_pages[] = {
 	sym_vvar_page,
 	sym_hpet_page,
 	sym_pvclock_page,
+	sym_hvclock_page,
 };
 
 struct vdso_sym {
@@ -94,6 +96,7 @@ struct vdso_sym required_syms[] = {
 	[sym_vvar_page] = {"vvar_page", true},
 	[sym_hpet_page] = {"hpet_page", true},
 	[sym_pvclock_page] = {"pvclock_page", true},
+	[sym_hvclock_page] = {"hvclock_page", true},
 	[sym_VDSO_FAKE_SECTION_TABLE_START] = {
 		"VDSO_FAKE_SECTION_TABLE_START", false
 	},
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 10820f6..b256a3b 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -21,6 +21,7 @@
 #include <asm/page.h>
 #include <asm/desc.h>
 #include <asm/cpufeature.h>
+#include <asm/mshyperv.h>
 
 #if defined(CONFIG_X86_64)
 unsigned int __read_mostly vdso64_enabled = 1;
@@ -120,6 +121,12 @@ static int vvar_fault(const struct vm_special_mapping *sm,
 				vmf->address,
 				__pa(pvti) >> PAGE_SHIFT);
 		}
+	} else if (sym_offset == image->sym_hvclock_page) {
+		struct ms_hyperv_tsc_page *tsc_pg = hv_get_tsc_page();
+
+		if (tsc_pg && vclock_was_used(VCLOCK_HVCLOCK))
+			ret = vm_insert_pfn(vma, vmf->address,
+					    vmalloc_to_pfn(tsc_pg));
 	}
 
 	if (ret == 0 || ret == -EBUSY)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 0ce8485..17519e0 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -157,6 +157,9 @@ void hyperv_init(void)
 		tsc_msr.guest_physical_address = vmalloc_to_pfn(tsc_pg);
 
 		wrmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr.as_uint64);
+
+		hyperv_cs_tsc.archdata.vclock_mode = VCLOCK_HVCLOCK;
+
 		clocksource_register_hz(&hyperv_cs_tsc, NSEC_PER_SEC/100);
 		return;
 	}
diff --git a/arch/x86/include/asm/clocksource.h b/arch/x86/include/asm/clocksource.h
index eae33c7..47bea8c 100644
--- a/arch/x86/include/asm/clocksource.h
+++ b/arch/x86/include/asm/clocksource.h
@@ -6,7 +6,8 @@
 #define VCLOCK_NONE	0	/* No vDSO clock available.		*/
 #define VCLOCK_TSC	1	/* vDSO should use vread_tsc.		*/
 #define VCLOCK_PVCLOCK	2	/* vDSO should use vread_pvclock.	*/
-#define VCLOCK_MAX	2
+#define VCLOCK_HVCLOCK	3	/* vDSO should use vread_hvclock.	*/
+#define VCLOCK_MAX	3
 
 struct arch_clocksource_data {
 	int vclock_mode;
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 2444189..bccdf49 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -20,6 +20,7 @@ struct vdso_image {
 	long sym_vvar_page;
 	long sym_hpet_page;
 	long sym_pvclock_page;
+	long sym_hvclock_page;
 	long sym_VDSO32_NOTE_MASK;
 	long sym___kernel_sigreturn;
 	long sym___kernel_rt_sigreturn;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
  2017-02-09 14:10 ` [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method Vitaly Kuznetsov
@ 2017-02-09 17:08   ` Thomas Gleixner
  2017-02-09 18:27     ` Stephen Hemminger
                       ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Thomas Gleixner @ 2017-02-09 17:08 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: x86, Andy Lutomirski, Ingo Molnar, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Dexuan Cui,
	linux-kernel, devel, virtualization

On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
> +#ifdef CONFIG_HYPERV_TSCPAGE
> +static notrace u64 vread_hvclock(int *mode)
> +{
> +	const struct ms_hyperv_tsc_page *tsc_pg =
> +		(const struct ms_hyperv_tsc_page *)&hvclock_page;
> +	u64 sequence, scale, offset, current_tick, cur_tsc;
> +
> +	while (1) {
> +		sequence = READ_ONCE(tsc_pg->tsc_sequence);
> +		if (!sequence)
> +			break;
> +
> +		scale = READ_ONCE(tsc_pg->tsc_scale);
> +		offset = READ_ONCE(tsc_pg->tsc_offset);
> +		rdtscll(cur_tsc);
> +
> +		current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
> +
> +		if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
> +			return current_tick;

That sequence stuff lacks still a sensible explanation. It's fundamentally
different from the sequence counting we do in the kernel, so documentation
for it is really required.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 1/2] hyperv: implement hv_get_tsc_page()
  2017-02-09 14:10 ` [PATCH 1/2] hyperv: implement hv_get_tsc_page() Vitaly Kuznetsov
@ 2017-02-09 18:24   ` Stephen Hemminger
  2017-02-09 20:14     ` Thomas Gleixner
  0 siblings, 1 reply; 22+ messages in thread
From: Stephen Hemminger @ 2017-02-09 18:24 UTC (permalink / raw)
  To: Vitaly Kuznetsov, x86, Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, KY Srinivasan,
	Haiyang Zhang, Dexuan Cui, linux-kernel, devel, virtualization

The actual code looks fine, but the style police will not like you.
{ should be at start of line on functions.
And #else should be at start of line,

But maybe this was just more of exchange mangling the mail.

-----Original Message-----
From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com] 
Sent: Thursday, February 9, 2017 6:11 AM
To: x86@kernel.org; Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>; Ingo Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; KY Srinivasan <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>; Dexuan Cui <decui@microsoft.com>; linux-kernel@vger.kernel.org; devel@linuxdriverproject.org; virtualization@lists.linux-foundation.org
Subject: [PATCH 1/2] hyperv: implement hv_get_tsc_page()

To use Hyper-V TSC page clocksource from vDSO we need to make tsc_pg available. Implement hv_get_tsc_page() and add CONFIG_HYPERV_TSCPAGE to make #ifdef-s simple.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/hyperv/hv_init.c       | 9 +++++++--
 arch/x86/include/asm/mshyperv.h | 8 ++++++++
 drivers/hv/Kconfig              | 3 +++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c index b371d0e..0ce8485 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -27,10 +27,15 @@
 #include <linux/clockchips.h>
 
 
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_HYPERV_TSCPAGE
 
 static struct ms_hyperv_tsc_page *tsc_pg;
 
+struct ms_hyperv_tsc_page *hv_get_tsc_page(void) {
+	return tsc_pg;
+}
+
 static u64 read_hv_clock_tsc(struct clocksource *arg)  {
 	u64 current_tick;
@@ -136,7 +141,7 @@ void hyperv_init(void)
 	/*
 	 * Register Hyper-V specific clocksource.
 	 */
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_HYPERV_TSCPAGE
 	if (ms_hyperv.features & HV_X64_MSR_REFERENCE_TSC_AVAILABLE) {
 		union hv_x64_msr_hypercall_contents tsc_msr;
 
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h index f8dc370..14dd92c 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -173,4 +173,12 @@ void hyperv_report_panic(struct pt_regs *regs);  bool hv_is_hypercall_page_setup(void);  void hyperv_cleanup(void);  #endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+struct ms_hyperv_tsc_page *hv_get_tsc_page(void); #else static inline 
+struct ms_hyperv_tsc_page *hv_get_tsc_page(void) {
+	return NULL;
+}
+#endif
 #endif
diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig index 0403b51..c29cd53 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -7,6 +7,9 @@ config HYPERV
 	  Select this option to run Linux as a Hyper-V client operating
 	  system.
 
+config HYPERV_TSCPAGE
+       def_bool HYPERV && X86_64
+
 config HYPERV_UTILS
 	tristate "Microsoft Hyper-V Utilities driver"
 	depends on HYPERV && CONNECTOR && NLS
--
2.9.3

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
  2017-02-09 17:08   ` Thomas Gleixner
@ 2017-02-09 18:27     ` Stephen Hemminger
  2017-02-10 12:25       ` Vitaly Kuznetsov
  2017-02-09 20:45     ` KY Srinivasan
  2017-02-10 11:06     ` Vitaly Kuznetsov
  2 siblings, 1 reply; 22+ messages in thread
From: Stephen Hemminger @ 2017-02-09 18:27 UTC (permalink / raw)
  To: Thomas Gleixner, Vitaly Kuznetsov
  Cc: x86, Andy Lutomirski, Ingo Molnar, H. Peter Anvin, KY Srinivasan,
	Haiyang Zhang, Dexuan Cui, linux-kernel, devel, virtualization

Why not use existing seqlock's?

-----Original Message-----
From: Thomas Gleixner [mailto:tglx@linutronix.de] 
Sent: Thursday, February 9, 2017 9:08 AM
To: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: x86@kernel.org; Andy Lutomirski <luto@amacapital.net>; Ingo Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; KY Srinivasan <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>; Dexuan Cui <decui@microsoft.com>; linux-kernel@vger.kernel.org; devel@linuxdriverproject.org; virtualization@lists.linux-foundation.org
Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
> +#ifdef CONFIG_HYPERV_TSCPAGE
> +static notrace u64 vread_hvclock(int *mode) {
> +	const struct ms_hyperv_tsc_page *tsc_pg =
> +		(const struct ms_hyperv_tsc_page *)&hvclock_page;
> +	u64 sequence, scale, offset, current_tick, cur_tsc;
> +
> +	while (1) {
> +		sequence = READ_ONCE(tsc_pg->tsc_sequence);
> +		if (!sequence)
> +			break;
> +
> +		scale = READ_ONCE(tsc_pg->tsc_scale);
> +		offset = READ_ONCE(tsc_pg->tsc_offset);
> +		rdtscll(cur_tsc);
> +
> +		current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
> +
> +		if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
> +			return current_tick;

That sequence stuff lacks still a sensible explanation. It's fundamentally different from the sequence counting we do in the kernel, so documentation for it is really required.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 1/2] hyperv: implement hv_get_tsc_page()
  2017-02-09 18:24   ` Stephen Hemminger
@ 2017-02-09 20:14     ` Thomas Gleixner
  2017-02-09 23:17       ` Stephen Hemminger
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Gleixner @ 2017-02-09 20:14 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Vitaly Kuznetsov, x86, Andy Lutomirski, Ingo Molnar,
	H. Peter Anvin, KY Srinivasan, Haiyang Zhang, Dexuan Cui,
	linux-kernel, devel, virtualization

On Thu, 9 Feb 2017, Stephen Hemminger wrote:

> The actual code looks fine, but the style police will not like you.
> { should be at start of line on functions.
> And #else should be at start of line,
> 
> But maybe this was just more of exchange mangling the mail.

Looks like.

> +struct ms_hyperv_tsc_page *hv_get_tsc_page(void) {
> +	return tsc_pg;
> +}
> +

That's how it reads in a proper mail client connected to a proper mail
server:

> +struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
> +{
> +       return tsc_pg;
> +}

:)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
  2017-02-09 17:08   ` Thomas Gleixner
  2017-02-09 18:27     ` Stephen Hemminger
@ 2017-02-09 20:45     ` KY Srinivasan
  2017-02-09 22:55       ` Andy Lutomirski
  2017-02-10 11:06     ` Vitaly Kuznetsov
  2 siblings, 1 reply; 22+ messages in thread
From: KY Srinivasan @ 2017-02-09 20:45 UTC (permalink / raw)
  To: Thomas Gleixner, Vitaly Kuznetsov
  Cc: x86, Andy Lutomirski, Ingo Molnar, H. Peter Anvin, Haiyang Zhang,
	Stephen Hemminger, Dexuan Cui, linux-kernel, devel,
	virtualization



> -----Original Message-----
> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> Sent: Thursday, February 9, 2017 9:08 AM
> To: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: x86@kernel.org; Andy Lutomirski <luto@amacapital.net>; Ingo Molnar
> <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; KY Srinivasan
> <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; Stephen
> Hemminger <sthemmin@microsoft.com>; Dexuan Cui
> <decui@microsoft.com>; linux-kernel@vger.kernel.org;
> devel@linuxdriverproject.org; virtualization@lists.linux-foundation.org
> Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read
> method
> 
> On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
> > +#ifdef CONFIG_HYPERV_TSCPAGE
> > +static notrace u64 vread_hvclock(int *mode)
> > +{
> > +	const struct ms_hyperv_tsc_page *tsc_pg =
> > +		(const struct ms_hyperv_tsc_page *)&hvclock_page;
> > +	u64 sequence, scale, offset, current_tick, cur_tsc;
> > +
> > +	while (1) {
> > +		sequence = READ_ONCE(tsc_pg->tsc_sequence);
> > +		if (!sequence)
> > +			break;
> > +
> > +		scale = READ_ONCE(tsc_pg->tsc_scale);
> > +		offset = READ_ONCE(tsc_pg->tsc_offset);
> > +		rdtscll(cur_tsc);
> > +
> > +		current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
> > +
> > +		if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
> > +			return current_tick;
> 
> That sequence stuff lacks still a sensible explanation. It's fundamentally
> different from the sequence counting we do in the kernel, so documentation
> for it is really required.

The host is updating multiple fields in this shared TSC page and the sequence number is
used to ensure that the guest sees a consistent set values published. If I remember
correctly, Xen has a similar mechanism.

K. Y
> 
> Thanks,
> 
> 	tglx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
  2017-02-09 20:45     ` KY Srinivasan
@ 2017-02-09 22:55       ` Andy Lutomirski
  2017-02-09 23:15         ` Stephen Hemminger
  2017-02-10 12:15         ` Vitaly Kuznetsov
  0 siblings, 2 replies; 22+ messages in thread
From: Andy Lutomirski @ 2017-02-09 22:55 UTC (permalink / raw)
  To: KY Srinivasan
  Cc: Thomas Gleixner, Vitaly Kuznetsov, x86, Ingo Molnar,
	H. Peter Anvin, Haiyang Zhang, Stephen Hemminger, Dexuan Cui,
	linux-kernel, devel, virtualization

On Thu, Feb 9, 2017 at 12:45 PM, KY Srinivasan <kys@microsoft.com> wrote:
>
>
>> -----Original Message-----
>> From: Thomas Gleixner [mailto:tglx@linutronix.de]
>> Sent: Thursday, February 9, 2017 9:08 AM
>> To: Vitaly Kuznetsov <vkuznets@redhat.com>
>> Cc: x86@kernel.org; Andy Lutomirski <luto@amacapital.net>; Ingo Molnar
>> <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; KY Srinivasan
>> <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; Stephen
>> Hemminger <sthemmin@microsoft.com>; Dexuan Cui
>> <decui@microsoft.com>; linux-kernel@vger.kernel.org;
>> devel@linuxdriverproject.org; virtualization@lists.linux-foundation.org
>> Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read
>> method
>>
>> On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
>> > +#ifdef CONFIG_HYPERV_TSCPAGE
>> > +static notrace u64 vread_hvclock(int *mode)
>> > +{
>> > +   const struct ms_hyperv_tsc_page *tsc_pg =
>> > +           (const struct ms_hyperv_tsc_page *)&hvclock_page;
>> > +   u64 sequence, scale, offset, current_tick, cur_tsc;
>> > +
>> > +   while (1) {
>> > +           sequence = READ_ONCE(tsc_pg->tsc_sequence);
>> > +           if (!sequence)
>> > +                   break;
>> > +
>> > +           scale = READ_ONCE(tsc_pg->tsc_scale);
>> > +           offset = READ_ONCE(tsc_pg->tsc_offset);
>> > +           rdtscll(cur_tsc);
>> > +
>> > +           current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
>> > +
>> > +           if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
>> > +                   return current_tick;
>>
>> That sequence stuff lacks still a sensible explanation. It's fundamentally
>> different from the sequence counting we do in the kernel, so documentation
>> for it is really required.
>
> The host is updating multiple fields in this shared TSC page and the sequence number is
> used to ensure that the guest sees a consistent set values published. If I remember
> correctly, Xen has a similar mechanism.

So what's the actual protocol?  When the hypervisor updates the page,
does it freeze all guest cpus?  If not, how does it maintain
atomicity?

--Andy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
  2017-02-09 22:55       ` Andy Lutomirski
@ 2017-02-09 23:15         ` Stephen Hemminger
  2017-02-10 12:15         ` Vitaly Kuznetsov
  1 sibling, 0 replies; 22+ messages in thread
From: Stephen Hemminger @ 2017-02-09 23:15 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: KY Srinivasan, Stephen Hemminger, Haiyang Zhang, x86,
	linux-kernel, virtualization, Ingo Molnar, H. Peter Anvin, devel,
	Thomas Gleixner

On Thu, 9 Feb 2017 14:55:50 -0800
Andy Lutomirski <luto@amacapital.net> wrote:

> On Thu, Feb 9, 2017 at 12:45 PM, KY Srinivasan <kys@microsoft.com> wrote:
> >
> >  
> >> -----Original Message-----
> >> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> >> Sent: Thursday, February 9, 2017 9:08 AM
> >> To: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> Cc: x86@kernel.org; Andy Lutomirski <luto@amacapital.net>; Ingo Molnar
> >> <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; KY Srinivasan
> >> <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; Stephen
> >> Hemminger <sthemmin@microsoft.com>; Dexuan Cui
> >> <decui@microsoft.com>; linux-kernel@vger.kernel.org;
> >> devel@linuxdriverproject.org; virtualization@lists.linux-foundation.org
> >> Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read
> >> method
> >>
> >> On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:  
> >> > +#ifdef CONFIG_HYPERV_TSCPAGE
> >> > +static notrace u64 vread_hvclock(int *mode)
> >> > +{
> >> > +   const struct ms_hyperv_tsc_page *tsc_pg =
> >> > +           (const struct ms_hyperv_tsc_page *)&hvclock_page;
> >> > +   u64 sequence, scale, offset, current_tick, cur_tsc;
> >> > +
> >> > +   while (1) {
> >> > +           sequence = READ_ONCE(tsc_pg->tsc_sequence);
> >> > +           if (!sequence)
> >> > +                   break;
> >> > +
> >> > +           scale = READ_ONCE(tsc_pg->tsc_scale);
> >> > +           offset = READ_ONCE(tsc_pg->tsc_offset);
> >> > +           rdtscll(cur_tsc);
> >> > +
> >> > +           current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
> >> > +
> >> > +           if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
> >> > +                   return current_tick;  
> >>
> >> That sequence stuff lacks still a sensible explanation. It's fundamentally
> >> different from the sequence counting we do in the kernel, so documentation
> >> for it is really required.  
> >
> > The host is updating multiple fields in this shared TSC page and the sequence number is
> > used to ensure that the guest sees a consistent set values published. If I remember
> > correctly, Xen has a similar mechanism.  
> 
> So what's the actual protocol?  When the hypervisor updates the page,
> does it freeze all guest cpus?  If not, how does it maintain
> atomicity?

The protocol looks a lot like Linux seqlock, but it has an extra protection
which is missing here.

The host needs to update sequence number twice in order to guarantee ordering.
Otherwise it is possible that Host and guest can race.

					Host
						Write offset
						Write scale
						Set tsc_sequence = N
          Guest
		read sequence = N
		Read scale
						Write scale
						Write offset
		
		Read Offset
		Check sequence == N
						Set tsc_sequence = N +1

Look like the current host side protocol is wrong.

The solution that Andi Kleen invented, and I used in seqlock was for the writer to update
sequence at start and end of transaction. If sequence number is odd, then the reader knows
it is looking at stale data.
					Host
						Write offset
						Write scale
						Set tsc_sequence = N (end of transaction)
          Guest
		read sequence = N
		Spin until sequence is even (N is even)
		Read scale
						Set tsc_sequence += 1
						Write scale
						Write offset
		
		Read Offset
		Check sequence == N? (fails is N + 1)
						Set tsc_sequence += 1 (end of transaction)
		read sequence = N+2
		Spin until sequence is even (ie N +2)
		Read scale	
		Read Offset
		Check sequence == N +2? (yes ok).

Also it is faster to just read scale and offset with this loop and save
the reading of TSC and doing multiply until after scale/offset has been acquired.

	

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] hyperv: implement hv_get_tsc_page()
  2017-02-09 20:14     ` Thomas Gleixner
@ 2017-02-09 23:17       ` Stephen Hemminger
  0 siblings, 0 replies; 22+ messages in thread
From: Stephen Hemminger @ 2017-02-09 23:17 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Stephen Hemminger, Haiyang Zhang, x86, linux-kernel,
	Andy Lutomirski, Ingo Molnar, H. Peter Anvin, devel,
	virtualization

On Thu, 9 Feb 2017 21:14:25 +0100 (CET)
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Thu, 9 Feb 2017, Stephen Hemminger wrote:
> 
> > The actual code looks fine, but the style police will not like you.
> > { should be at start of line on functions.
> > And #else should be at start of line,
> > 
> > But maybe this was just more of exchange mangling the mail.  
> 
> Looks like.
> 
> > +struct ms_hyperv_tsc_page *hv_get_tsc_page(void) {
> > +	return tsc_pg;
> > +}
> > +  
> 
> That's how it reads in a proper mail client connected to a proper mail
> server:
> 
> > +struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
> > +{
> > +       return tsc_pg;
> > +}  
> 
> :)


Yup. it looks like the mail server is trying to be "helpful" by eliminating extra white space.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
  2017-02-09 17:08   ` Thomas Gleixner
  2017-02-09 18:27     ` Stephen Hemminger
  2017-02-09 20:45     ` KY Srinivasan
@ 2017-02-10 11:06     ` Vitaly Kuznetsov
  2017-02-10 11:15       ` Thomas Gleixner
  2 siblings, 1 reply; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-02-10 11:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: x86, Andy Lutomirski, Ingo Molnar, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Dexuan Cui,
	linux-kernel, devel, virtualization

Thomas Gleixner <tglx@linutronix.de> writes:

> On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
>> +#ifdef CONFIG_HYPERV_TSCPAGE
>> +static notrace u64 vread_hvclock(int *mode)
>> +{
>> +	const struct ms_hyperv_tsc_page *tsc_pg =
>> +		(const struct ms_hyperv_tsc_page *)&hvclock_page;
>> +	u64 sequence, scale, offset, current_tick, cur_tsc;
>> +
>> +	while (1) {
>> +		sequence = READ_ONCE(tsc_pg->tsc_sequence);
>> +		if (!sequence)
>> +			break;
>> +
>> +		scale = READ_ONCE(tsc_pg->tsc_scale);
>> +		offset = READ_ONCE(tsc_pg->tsc_offset);
>> +		rdtscll(cur_tsc);
>> +
>> +		current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
>> +
>> +		if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
>> +			return current_tick;
>
> That sequence stuff lacks still a sensible explanation. It's fundamentally
> different from the sequence counting we do in the kernel, so documentation
> for it is really required.

Sure, do you think the following would do?

diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index 4af10b4..886b600 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -154,6 +154,22 @@ static notrace u64 vread_hvclock(int *mode)
                (const struct ms_hyperv_tsc_page *)&hvclock_page;
        u64 sequence, scale, offset, current_tick, cur_tsc;
 
+       /*
+        * The protocol for reading Hyper-V TSC page is specified in Hypervisor
+        * Top-Level Functional Specification ver. 3.0 and above. To get the
+        * reference time we must do the following:
+        * - READ ReferenceTscSequence
+        *   A special '0' value indicates the time source is unreliable and we
+        *   need to use something else. The currently published specification
+        *   versions (up to 4.0b) contain a mistake and wrongly claim '-1'
+        *   instead of '0' as the special value, see commit c35b82ef0294.
+        * - ReferenceTime =
+        *        ((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset
+        * - READ ReferenceTscSequence again. In case its value has changed
+        *   since our first reading we need to discard ReferenceTime and repeat
+        *   the whole sequence as the hypervisor was updating the page in
+        *   between.
+        */
        while (1) {
                sequence = READ_ONCE(tsc_pg->tsc_sequence);
                if (!sequence)

-- 
  Vitaly

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
  2017-02-10 11:06     ` Vitaly Kuznetsov
@ 2017-02-10 11:15       ` Thomas Gleixner
  0 siblings, 0 replies; 22+ messages in thread
From: Thomas Gleixner @ 2017-02-10 11:15 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: x86, Andy Lutomirski, Ingo Molnar, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Dexuan Cui,
	linux-kernel, devel, virtualization

On Fri, 10 Feb 2017, Vitaly Kuznetsov wrote:
> Thomas Gleixner <tglx@linutronix.de> writes:
> 
> > On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
> >> +#ifdef CONFIG_HYPERV_TSCPAGE
> >> +static notrace u64 vread_hvclock(int *mode)
> >> +{
> >> +	const struct ms_hyperv_tsc_page *tsc_pg =
> >> +		(const struct ms_hyperv_tsc_page *)&hvclock_page;
> >> +	u64 sequence, scale, offset, current_tick, cur_tsc;
> >> +
> >> +	while (1) {
> >> +		sequence = READ_ONCE(tsc_pg->tsc_sequence);
> >> +		if (!sequence)
> >> +			break;
> >> +
> >> +		scale = READ_ONCE(tsc_pg->tsc_scale);
> >> +		offset = READ_ONCE(tsc_pg->tsc_offset);
> >> +		rdtscll(cur_tsc);
> >> +
> >> +		current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
> >> +
> >> +		if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
> >> +			return current_tick;
> >
> > That sequence stuff lacks still a sensible explanation. It's fundamentally
> > different from the sequence counting we do in the kernel, so documentation
> > for it is really required.
> 
> Sure, do you think the following would do?

Yes

> diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
> index 4af10b4..886b600 100644
> --- a/arch/x86/entry/vdso/vclock_gettime.c
> +++ b/arch/x86/entry/vdso/vclock_gettime.c
> @@ -154,6 +154,22 @@ static notrace u64 vread_hvclock(int *mode)
>                 (const struct ms_hyperv_tsc_page *)&hvclock_page;
>         u64 sequence, scale, offset, current_tick, cur_tsc;
>  
> +       /*
> +        * The protocol for reading Hyper-V TSC page is specified in Hypervisor
> +        * Top-Level Functional Specification ver. 3.0 and above. To get the
> +        * reference time we must do the following:
> +        * - READ ReferenceTscSequence
> +        *   A special '0' value indicates the time source is unreliable and we
> +        *   need to use something else. The currently published specification
> +        *   versions (up to 4.0b) contain a mistake and wrongly claim '-1'
> +        *   instead of '0' as the special value, see commit c35b82ef0294.
> +        * - ReferenceTime =
> +        *        ((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset
> +        * - READ ReferenceTscSequence again. In case its value has changed
> +        *   since our first reading we need to discard ReferenceTime and repeat
> +        *   the whole sequence as the hypervisor was updating the page in
> +        *   between.
> +        */
>         while (1) {
>                 sequence = READ_ONCE(tsc_pg->tsc_sequence);
>                 if (!sequence)
> 
> -- 
>   Vitaly
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
  2017-02-09 22:55       ` Andy Lutomirski
  2017-02-09 23:15         ` Stephen Hemminger
@ 2017-02-10 12:15         ` Vitaly Kuznetsov
  1 sibling, 0 replies; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-02-10 12:15 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: KY Srinivasan, Thomas Gleixner, x86, Ingo Molnar, H. Peter Anvin,
	Haiyang Zhang, Stephen Hemminger, Dexuan Cui, linux-kernel,
	devel, virtualization

Andy Lutomirski <luto@amacapital.net> writes:

> On Thu, Feb 9, 2017 at 12:45 PM, KY Srinivasan <kys@microsoft.com> wrote:
>>
>>
>>> -----Original Message-----
>>> From: Thomas Gleixner [mailto:tglx@linutronix.de]
>>> Sent: Thursday, February 9, 2017 9:08 AM
>>> To: Vitaly Kuznetsov <vkuznets@redhat.com>
>>> Cc: x86@kernel.org; Andy Lutomirski <luto@amacapital.net>; Ingo Molnar
>>> <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; KY Srinivasan
>>> <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; Stephen
>>> Hemminger <sthemmin@microsoft.com>; Dexuan Cui
>>> <decui@microsoft.com>; linux-kernel@vger.kernel.org;
>>> devel@linuxdriverproject.org; virtualization@lists.linux-foundation.org
>>> Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read
>>> method
>>>
>>> On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
>>> > +#ifdef CONFIG_HYPERV_TSCPAGE
>>> > +static notrace u64 vread_hvclock(int *mode)
>>> > +{
>>> > +   const struct ms_hyperv_tsc_page *tsc_pg =
>>> > +           (const struct ms_hyperv_tsc_page *)&hvclock_page;
>>> > +   u64 sequence, scale, offset, current_tick, cur_tsc;
>>> > +
>>> > +   while (1) {
>>> > +           sequence = READ_ONCE(tsc_pg->tsc_sequence);
>>> > +           if (!sequence)
>>> > +                   break;
>>> > +
>>> > +           scale = READ_ONCE(tsc_pg->tsc_scale);
>>> > +           offset = READ_ONCE(tsc_pg->tsc_offset);
>>> > +           rdtscll(cur_tsc);
>>> > +
>>> > +           current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
>>> > +
>>> > +           if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
>>> > +                   return current_tick;
>>>
>>> That sequence stuff lacks still a sensible explanation. It's fundamentally
>>> different from the sequence counting we do in the kernel, so documentation
>>> for it is really required.
>>
>> The host is updating multiple fields in this shared TSC page and the sequence number is
>> used to ensure that the guest sees a consistent set values published. If I remember
>> correctly, Xen has a similar mechanism.
>
> So what's the actual protocol?  When the hypervisor updates the page,
> does it freeze all guest cpus?  If not, how does it maintain
> atomicity?

I don't really know how it is implemented server-side but I *think* that
freezing all CPUs is only required when we want to update *both*
ReferenceTscScale and ReferenceTscOffset at the same time (as Hyper-V is
64-bit only so it can always atomically update 64-bit values)...

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
  2017-02-09 18:27     ` Stephen Hemminger
@ 2017-02-10 12:25       ` Vitaly Kuznetsov
  2017-02-10 12:28         ` Thomas Gleixner
  0 siblings, 1 reply; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-02-10 12:25 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Thomas Gleixner, x86, Andy Lutomirski, Ingo Molnar,
	H. Peter Anvin, KY Srinivasan, Haiyang Zhang, Dexuan Cui,
	linux-kernel, devel, virtualization

Stephen Hemminger <sthemmin@microsoft.com> writes:

> Why not use existing seqlock's?
>

To be honest I don't quite understand how we could use it -- the
sequence locking here is done against the page updated by the
hypersior, we're not creating new structures (so I don't understand how
we could use struct seqcount which we don't have) but I may be
misunderstanding something.

BTW, I just occured to me that I should've probably put the TSC reading
code to mshyperv.h and use it from both vDSO and read_hv_clock_tsc() --
what do you thing?

[snip]

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
  2017-02-10 12:25       ` Vitaly Kuznetsov
@ 2017-02-10 12:28         ` Thomas Gleixner
  2017-02-10 16:31           ` Stephen Hemminger
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Gleixner @ 2017-02-10 12:28 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Stephen Hemminger, x86, Andy Lutomirski, Ingo Molnar,
	H. Peter Anvin, KY Srinivasan, Haiyang Zhang, Dexuan Cui,
	linux-kernel, devel, virtualization

On Fri, 10 Feb 2017, Vitaly Kuznetsov wrote:

> Stephen Hemminger <sthemmin@microsoft.com> writes:
> 
> > Why not use existing seqlock's?
> >
> 
> To be honest I don't quite understand how we could use it -- the
> sequence locking here is done against the page updated by the
> hypersior, we're not creating new structures (so I don't understand how
> we could use struct seqcount which we don't have) but I may be
> misunderstanding something.

You can't use seqlock, but you might be able to use seqcount. Though I
doubt it given the 0 check ....

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
  2017-02-10 12:28         ` Thomas Gleixner
@ 2017-02-10 16:31           ` Stephen Hemminger
  2017-02-10 18:01             ` Thomas Gleixner
  0 siblings, 1 reply; 22+ messages in thread
From: Stephen Hemminger @ 2017-02-10 16:31 UTC (permalink / raw)
  To: Thomas Gleixner, Vitaly Kuznetsov
  Cc: x86, Andy Lutomirski, Ingo Molnar, H. Peter Anvin, KY Srinivasan,
	Haiyang Zhang, Dexuan Cui, linux-kernel, devel, virtualization

Since sequence count algorithm is done by hypervisor, better to not reuse seqcount.
Still concerned that the code is racy.

-----Original Message-----
From: Thomas Gleixner [mailto:tglx@linutronix.de] 
Sent: Friday, February 10, 2017 4:28 AM
To: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>; x86@kernel.org; Andy Lutomirski <luto@amacapital.net>; Ingo Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; KY Srinivasan <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; Dexuan Cui <decui@microsoft.com>; linux-kernel@vger.kernel.org; devel@linuxdriverproject.org; virtualization@lists.linux-foundation.org
Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

On Fri, 10 Feb 2017, Vitaly Kuznetsov wrote:

> Stephen Hemminger <sthemmin@microsoft.com> writes:
> 
> > Why not use existing seqlock's?
> >
> 
> To be honest I don't quite understand how we could use it -- the 
> sequence locking here is done against the page updated by the 
> hypersior, we're not creating new structures (so I don't understand 
> how we could use struct seqcount which we don't have) but I may be 
> misunderstanding something.

You can't use seqlock, but you might be able to use seqcount. Though I doubt it given the 0 check ....

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
  2017-02-10 16:31           ` Stephen Hemminger
@ 2017-02-10 18:01             ` Thomas Gleixner
  2017-02-13  7:49               ` Dexuan Cui
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Gleixner @ 2017-02-10 18:01 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Vitaly Kuznetsov, x86, Andy Lutomirski, Ingo Molnar,
	H. Peter Anvin, KY Srinivasan, Haiyang Zhang, Dexuan Cui,
	linux-kernel, devel, virtualization

On Fri, 10 Feb 2017, Stephen Hemminger wrote:

> Since sequence count algorithm is done by hypervisor, better to not reuse seqcount.
> Still concerned that the code is racy.

That's a different question and can only be answered by the hypervisor
folks. Dunno, whether they have barrier requirements. The seqcount stuff
relies on:

do {
	seq = READ_ONCE(s->sequence);

	smp_rmb();

	sample_data();
  
	smp_rmb();
} while (s->sequence != seq);

which pairs with the writer side:

       s->sequence++;
       
       smp_wmb();

       update_data();

       smp_wmb();

       s->sequence++;

That's important if the stuff happens cross CPU. If the update happens on
the same CPU then this is a different story and as there are VMexits
involved they might provide the required ordering already. But I can't tell
as I have no idea how that host side thing is done.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
  2017-02-10 18:01             ` Thomas Gleixner
@ 2017-02-13  7:49               ` Dexuan Cui
  2017-02-13  9:27                 ` Thomas Gleixner
  2017-02-13 19:06                 ` Andy Lutomirski
  0 siblings, 2 replies; 22+ messages in thread
From: Dexuan Cui @ 2017-02-13  7:49 UTC (permalink / raw)
  To: Thomas Gleixner, Stephen Hemminger
  Cc: Vitaly Kuznetsov, x86, Andy Lutomirski, Ingo Molnar,
	H. Peter Anvin, KY Srinivasan, Haiyang Zhang, linux-kernel,
	devel, virtualization

> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> Sent: Saturday, February 11, 2017 02:02
>  ...
> That's important if the stuff happens cross CPU. If the update happens on
> the same CPU then this is a different story and as there are VMexits
> involved they might provide the required ordering already. But I can't tell
> as I have no idea how that host side thing is done.
> 
> 	tglx

IMO Hyper-V TSC page clocksource here seems pretty similar to KVM's pvclock,
So I would guess "the structure is only updated just before reentering the guest
after some VM event" (https://rwmj.wordpress.com/2010/10/15/kvm-pvclock/),
that is, the update should happen on the same CPU, I guess.

Thanks,
-- Dexuan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
  2017-02-13  7:49               ` Dexuan Cui
@ 2017-02-13  9:27                 ` Thomas Gleixner
  2017-02-13 19:06                 ` Andy Lutomirski
  1 sibling, 0 replies; 22+ messages in thread
From: Thomas Gleixner @ 2017-02-13  9:27 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: Stephen Hemminger, Vitaly Kuznetsov, x86, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin, KY Srinivasan, Haiyang Zhang,
	linux-kernel, devel, virtualization

On Mon, 13 Feb 2017, Dexuan Cui wrote:
> > From: Thomas Gleixner [mailto:tglx@linutronix.de]
> > Sent: Saturday, February 11, 2017 02:02
> >  ...
> > That's important if the stuff happens cross CPU. If the update happens on
> > the same CPU then this is a different story and as there are VMexits
> > involved they might provide the required ordering already. But I can't tell
> > as I have no idea how that host side thing is done.
> > 
> 
> IMO Hyper-V TSC page clocksource here seems pretty similar to KVM's pvclock,
> So I would guess "the structure is only updated just before reentering the guest
> after some VM event" (https://rwmj.wordpress.com/2010/10/15/kvm-pvclock/),
> that is, the update should happen on the same CPU, I guess.

Guessing does not help much. There are a gazillion ways to implement such a
thing. I'm sure that there exists proper documentation of the mechanism
inside your company.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
  2017-02-13  7:49               ` Dexuan Cui
  2017-02-13  9:27                 ` Thomas Gleixner
@ 2017-02-13 19:06                 ` Andy Lutomirski
  2017-02-13 19:28                   ` Thomas Gleixner
  1 sibling, 1 reply; 22+ messages in thread
From: Andy Lutomirski @ 2017-02-13 19:06 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: Thomas Gleixner, Stephen Hemminger, Vitaly Kuznetsov, x86,
	Ingo Molnar, H. Peter Anvin, KY Srinivasan, Haiyang Zhang,
	linux-kernel, devel, virtualization

On Sun, Feb 12, 2017 at 11:49 PM, Dexuan Cui <decui@microsoft.com> wrote:
>> From: Thomas Gleixner [mailto:tglx@linutronix.de]
>> Sent: Saturday, February 11, 2017 02:02
>>  ...
>> That's important if the stuff happens cross CPU. If the update happens on
>> the same CPU then this is a different story and as there are VMexits
>> involved they might provide the required ordering already. But I can't tell
>> as I have no idea how that host side thing is done.
>>
>>       tglx
>
> IMO Hyper-V TSC page clocksource here seems pretty similar to KVM's pvclock,
> So I would guess "the structure is only updated just before reentering the guest
> after some VM event" (https://rwmj.wordpress.com/2010/10/15/kvm-pvclock/),
> that is, the update should happen on the same CPU, I guess.

If the patch is correct, there is one of these shared by all vCPUs, so
this is not a sufficient explanation.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
  2017-02-13 19:06                 ` Andy Lutomirski
@ 2017-02-13 19:28                   ` Thomas Gleixner
  0 siblings, 0 replies; 22+ messages in thread
From: Thomas Gleixner @ 2017-02-13 19:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dexuan Cui, Stephen Hemminger, Vitaly Kuznetsov, x86,
	Ingo Molnar, H. Peter Anvin, KY Srinivasan, Haiyang Zhang,
	linux-kernel, devel, virtualization

On Mon, 13 Feb 2017, Andy Lutomirski wrote:

> On Sun, Feb 12, 2017 at 11:49 PM, Dexuan Cui <decui@microsoft.com> wrote:
> >> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> >> Sent: Saturday, February 11, 2017 02:02
> >>  ...
> >> That's important if the stuff happens cross CPU. If the update happens on
> >> the same CPU then this is a different story and as there are VMexits
> >> involved they might provide the required ordering already. But I can't tell
> >> as I have no idea how that host side thing is done.
> >>
> >>       tglx
> >
> > IMO Hyper-V TSC page clocksource here seems pretty similar to KVM's pvclock,
> > So I would guess "the structure is only updated just before reentering the guest
> > after some VM event" (https://rwmj.wordpress.com/2010/10/15/kvm-pvclock/),
> > that is, the update should happen on the same CPU, I guess.
> 
> If the patch is correct, there is one of these shared by all vCPUs, so
> this is not a sufficient explanation.

Right, because that's ony ONE TSC page for the whole guest and then the seq
stuff really lacks barriers.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-02-13 19:28 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-09 14:10 [PATCH 0/2] x86/vdso: Add Hyper-V TSC page clocksource support Vitaly Kuznetsov
2017-02-09 14:10 ` [PATCH 1/2] hyperv: implement hv_get_tsc_page() Vitaly Kuznetsov
2017-02-09 18:24   ` Stephen Hemminger
2017-02-09 20:14     ` Thomas Gleixner
2017-02-09 23:17       ` Stephen Hemminger
2017-02-09 14:10 ` [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method Vitaly Kuznetsov
2017-02-09 17:08   ` Thomas Gleixner
2017-02-09 18:27     ` Stephen Hemminger
2017-02-10 12:25       ` Vitaly Kuznetsov
2017-02-10 12:28         ` Thomas Gleixner
2017-02-10 16:31           ` Stephen Hemminger
2017-02-10 18:01             ` Thomas Gleixner
2017-02-13  7:49               ` Dexuan Cui
2017-02-13  9:27                 ` Thomas Gleixner
2017-02-13 19:06                 ` Andy Lutomirski
2017-02-13 19:28                   ` Thomas Gleixner
2017-02-09 20:45     ` KY Srinivasan
2017-02-09 22:55       ` Andy Lutomirski
2017-02-09 23:15         ` Stephen Hemminger
2017-02-10 12:15         ` Vitaly Kuznetsov
2017-02-10 11:06     ` Vitaly Kuznetsov
2017-02-10 11:15       ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).