* [PATCH 0/2] x86/vdso: Add Hyper-V TSC page clocksource support
@ 2017-02-09 14:10 Vitaly Kuznetsov
2017-02-09 14:10 ` [PATCH 1/2] hyperv: implement hv_get_tsc_page() Vitaly Kuznetsov
2017-02-09 14:10 ` [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method Vitaly Kuznetsov
0 siblings, 2 replies; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-02-09 14:10 UTC (permalink / raw)
To: x86, Andy Lutomirski
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, K. Y. Srinivasan,
Haiyang Zhang, Stephen Hemminger, Dexuan Cui, linux-kernel,
devel, virtualization
Hi,
Hyper-V TSC page clocksource is suitable for vDSO, however, the protocol
defined by the hypervisor is different from VCLOCK_PVCLOCK. Implemented the
required support. Simple sysbench test shows the following results:
Before:
# time sysbench --test=memory --max-requests=500000 run
...
real 1m22.618s
user 0m50.193s
sys 0m32.268s
After:
# time sysbench --test=memory --max-requests=500000 run
...
real 0m47.241s
user 0m47.117s
sys 0m0.008s
So it seems it is worth it. As nobody seems to be strongly offended by my RFC
I'm sending first non-RFC version. Patch 1 is made on top of K. Y.'s code
refactoring which moved tsc page clocksource to arch/x86/hyperv/hv_init.c,
this is currently present in Greg's char-misc-next tree.
Changes since RFC:
- Use mul_u64_u64_shr() instead of an open coded implementation
[Andy Lutomirski, Thomas Gleixner]
- Don't use the same pvclock_page for both VCLOCK_PVCLOCK and
VCLOCK_HVCLOCK, create another one [Andy Lutomirski]
- Fix issues reported by kbuild test robot.
- Rename HYPERV_CLOCK -> HYPERV_TSCPAGE to avoid the ambiguity.
I'm also going to try to optimize mul_u64_u64_shr() for 32bit but this can
be split from this series I guess.
Vitaly Kuznetsov (2):
hyperv: implement hv_get_tsc_page()
x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
arch/x86/entry/vdso/vclock_gettime.c | 36 +++++++++++++++++++++++++++++++++++
arch/x86/entry/vdso/vdso-layout.lds.S | 3 ++-
arch/x86/entry/vdso/vdso2c.c | 3 +++
arch/x86/entry/vdso/vma.c | 7 +++++++
arch/x86/hyperv/hv_init.c | 12 ++++++++++--
arch/x86/include/asm/clocksource.h | 3 ++-
arch/x86/include/asm/mshyperv.h | 8 ++++++++
arch/x86/include/asm/vdso.h | 1 +
drivers/hv/Kconfig | 3 +++
9 files changed, 72 insertions(+), 4 deletions(-)
--
2.9.3
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/2] hyperv: implement hv_get_tsc_page()
2017-02-09 14:10 [PATCH 0/2] x86/vdso: Add Hyper-V TSC page clocksource support Vitaly Kuznetsov
@ 2017-02-09 14:10 ` Vitaly Kuznetsov
2017-02-09 18:24 ` Stephen Hemminger
2017-02-09 14:10 ` [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method Vitaly Kuznetsov
1 sibling, 1 reply; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-02-09 14:10 UTC (permalink / raw)
To: x86, Andy Lutomirski
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, K. Y. Srinivasan,
Haiyang Zhang, Stephen Hemminger, Dexuan Cui, linux-kernel,
devel, virtualization
To use Hyper-V TSC page clocksource from vDSO we need to make tsc_pg
available. Implement hv_get_tsc_page() and add CONFIG_HYPERV_TSCPAGE to
make #ifdef-s simple.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
arch/x86/hyperv/hv_init.c | 9 +++++++--
arch/x86/include/asm/mshyperv.h | 8 ++++++++
drivers/hv/Kconfig | 3 +++
3 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index b371d0e..0ce8485 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -27,10 +27,15 @@
#include <linux/clockchips.h>
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_HYPERV_TSCPAGE
static struct ms_hyperv_tsc_page *tsc_pg;
+struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
+{
+ return tsc_pg;
+}
+
static u64 read_hv_clock_tsc(struct clocksource *arg)
{
u64 current_tick;
@@ -136,7 +141,7 @@ void hyperv_init(void)
/*
* Register Hyper-V specific clocksource.
*/
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_HYPERV_TSCPAGE
if (ms_hyperv.features & HV_X64_MSR_REFERENCE_TSC_AVAILABLE) {
union hv_x64_msr_hypercall_contents tsc_msr;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index f8dc370..14dd92c 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -173,4 +173,12 @@ void hyperv_report_panic(struct pt_regs *regs);
bool hv_is_hypercall_page_setup(void);
void hyperv_cleanup(void);
#endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
+#else
+static inline struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
+{
+ return NULL;
+}
+#endif
#endif
diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index 0403b51..c29cd53 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -7,6 +7,9 @@ config HYPERV
Select this option to run Linux as a Hyper-V client operating
system.
+config HYPERV_TSCPAGE
+ def_bool HYPERV && X86_64
+
config HYPERV_UTILS
tristate "Microsoft Hyper-V Utilities driver"
depends on HYPERV && CONNECTOR && NLS
--
2.9.3
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
2017-02-09 14:10 [PATCH 0/2] x86/vdso: Add Hyper-V TSC page clocksource support Vitaly Kuznetsov
2017-02-09 14:10 ` [PATCH 1/2] hyperv: implement hv_get_tsc_page() Vitaly Kuznetsov
@ 2017-02-09 14:10 ` Vitaly Kuznetsov
2017-02-09 17:08 ` Thomas Gleixner
1 sibling, 1 reply; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-02-09 14:10 UTC (permalink / raw)
To: x86, Andy Lutomirski
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, K. Y. Srinivasan,
Haiyang Zhang, Stephen Hemminger, Dexuan Cui, linux-kernel,
devel, virtualization
Hyper-V TSC page clocksource is suitable for vDSO, however, the protocol
defined by the hypervisor is different from VCLOCK_PVCLOCK. Implement the
required support by adding hvclock_page VVAR.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
arch/x86/entry/vdso/vclock_gettime.c | 36 +++++++++++++++++++++++++++++++++++
arch/x86/entry/vdso/vdso-layout.lds.S | 3 ++-
arch/x86/entry/vdso/vdso2c.c | 3 +++
arch/x86/entry/vdso/vma.c | 7 +++++++
arch/x86/hyperv/hv_init.c | 3 +++
arch/x86/include/asm/clocksource.h | 3 ++-
arch/x86/include/asm/vdso.h | 1 +
7 files changed, 54 insertions(+), 2 deletions(-)
diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index 9d4d6e1..4af10b4 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -17,6 +17,7 @@
#include <asm/unistd.h>
#include <asm/msr.h>
#include <asm/pvclock.h>
+#include <asm/mshyperv.h>
#include <linux/math64.h>
#include <linux/time.h>
#include <linux/kernel.h>
@@ -32,6 +33,11 @@ extern u8 pvclock_page
__attribute__((visibility("hidden")));
#endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+extern u8 hvclock_page
+ __attribute__((visibility("hidden")));
+#endif
+
#ifndef BUILD_VDSO32
notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
@@ -141,6 +147,32 @@ static notrace u64 vread_pvclock(int *mode)
return last;
}
#endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+static notrace u64 vread_hvclock(int *mode)
+{
+ const struct ms_hyperv_tsc_page *tsc_pg =
+ (const struct ms_hyperv_tsc_page *)&hvclock_page;
+ u64 sequence, scale, offset, current_tick, cur_tsc;
+
+ while (1) {
+ sequence = READ_ONCE(tsc_pg->tsc_sequence);
+ if (!sequence)
+ break;
+
+ scale = READ_ONCE(tsc_pg->tsc_scale);
+ offset = READ_ONCE(tsc_pg->tsc_offset);
+ rdtscll(cur_tsc);
+
+ current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
+
+ if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
+ return current_tick;
+ }
+
+ *mode = VCLOCK_NONE;
+ return 0;
+}
+#endif
notrace static u64 vread_tsc(void)
{
@@ -173,6 +205,10 @@ notrace static inline u64 vgetsns(int *mode)
else if (gtod->vclock_mode == VCLOCK_PVCLOCK)
cycles = vread_pvclock(mode);
#endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+ else if (gtod->vclock_mode == VCLOCK_HVCLOCK)
+ cycles = vread_hvclock(mode);
+#endif
else
return 0;
v = (cycles - gtod->cycle_last) & gtod->mask;
diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S
index a708aa9..8ebb4b6 100644
--- a/arch/x86/entry/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
@@ -25,7 +25,7 @@ SECTIONS
* segment.
*/
- vvar_start = . - 2 * PAGE_SIZE;
+ vvar_start = . - 3 * PAGE_SIZE;
vvar_page = vvar_start;
/* Place all vvars at the offsets in asm/vvar.h. */
@@ -36,6 +36,7 @@ SECTIONS
#undef EMIT_VVAR
pvclock_page = vvar_start + PAGE_SIZE;
+ hvclock_page = vvar_start + 2 * PAGE_SIZE;
. = SIZEOF_HEADERS;
diff --git a/arch/x86/entry/vdso/vdso2c.c b/arch/x86/entry/vdso/vdso2c.c
index 491020b..0780a44 100644
--- a/arch/x86/entry/vdso/vdso2c.c
+++ b/arch/x86/entry/vdso/vdso2c.c
@@ -74,6 +74,7 @@ enum {
sym_vvar_page,
sym_hpet_page,
sym_pvclock_page,
+ sym_hvclock_page,
sym_VDSO_FAKE_SECTION_TABLE_START,
sym_VDSO_FAKE_SECTION_TABLE_END,
};
@@ -82,6 +83,7 @@ const int special_pages[] = {
sym_vvar_page,
sym_hpet_page,
sym_pvclock_page,
+ sym_hvclock_page,
};
struct vdso_sym {
@@ -94,6 +96,7 @@ struct vdso_sym required_syms[] = {
[sym_vvar_page] = {"vvar_page", true},
[sym_hpet_page] = {"hpet_page", true},
[sym_pvclock_page] = {"pvclock_page", true},
+ [sym_hvclock_page] = {"hvclock_page", true},
[sym_VDSO_FAKE_SECTION_TABLE_START] = {
"VDSO_FAKE_SECTION_TABLE_START", false
},
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 10820f6..b256a3b 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -21,6 +21,7 @@
#include <asm/page.h>
#include <asm/desc.h>
#include <asm/cpufeature.h>
+#include <asm/mshyperv.h>
#if defined(CONFIG_X86_64)
unsigned int __read_mostly vdso64_enabled = 1;
@@ -120,6 +121,12 @@ static int vvar_fault(const struct vm_special_mapping *sm,
vmf->address,
__pa(pvti) >> PAGE_SHIFT);
}
+ } else if (sym_offset == image->sym_hvclock_page) {
+ struct ms_hyperv_tsc_page *tsc_pg = hv_get_tsc_page();
+
+ if (tsc_pg && vclock_was_used(VCLOCK_HVCLOCK))
+ ret = vm_insert_pfn(vma, vmf->address,
+ vmalloc_to_pfn(tsc_pg));
}
if (ret == 0 || ret == -EBUSY)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 0ce8485..17519e0 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -157,6 +157,9 @@ void hyperv_init(void)
tsc_msr.guest_physical_address = vmalloc_to_pfn(tsc_pg);
wrmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr.as_uint64);
+
+ hyperv_cs_tsc.archdata.vclock_mode = VCLOCK_HVCLOCK;
+
clocksource_register_hz(&hyperv_cs_tsc, NSEC_PER_SEC/100);
return;
}
diff --git a/arch/x86/include/asm/clocksource.h b/arch/x86/include/asm/clocksource.h
index eae33c7..47bea8c 100644
--- a/arch/x86/include/asm/clocksource.h
+++ b/arch/x86/include/asm/clocksource.h
@@ -6,7 +6,8 @@
#define VCLOCK_NONE 0 /* No vDSO clock available. */
#define VCLOCK_TSC 1 /* vDSO should use vread_tsc. */
#define VCLOCK_PVCLOCK 2 /* vDSO should use vread_pvclock. */
-#define VCLOCK_MAX 2
+#define VCLOCK_HVCLOCK 3 /* vDSO should use vread_hvclock. */
+#define VCLOCK_MAX 3
struct arch_clocksource_data {
int vclock_mode;
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 2444189..bccdf49 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -20,6 +20,7 @@ struct vdso_image {
long sym_vvar_page;
long sym_hpet_page;
long sym_pvclock_page;
+ long sym_hvclock_page;
long sym_VDSO32_NOTE_MASK;
long sym___kernel_sigreturn;
long sym___kernel_rt_sigreturn;
--
2.9.3
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
2017-02-09 14:10 ` [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method Vitaly Kuznetsov
@ 2017-02-09 17:08 ` Thomas Gleixner
2017-02-09 18:27 ` Stephen Hemminger
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Thomas Gleixner @ 2017-02-09 17:08 UTC (permalink / raw)
To: Vitaly Kuznetsov
Cc: x86, Andy Lutomirski, Ingo Molnar, H. Peter Anvin,
K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Dexuan Cui,
linux-kernel, devel, virtualization
On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
> +#ifdef CONFIG_HYPERV_TSCPAGE
> +static notrace u64 vread_hvclock(int *mode)
> +{
> + const struct ms_hyperv_tsc_page *tsc_pg =
> + (const struct ms_hyperv_tsc_page *)&hvclock_page;
> + u64 sequence, scale, offset, current_tick, cur_tsc;
> +
> + while (1) {
> + sequence = READ_ONCE(tsc_pg->tsc_sequence);
> + if (!sequence)
> + break;
> +
> + scale = READ_ONCE(tsc_pg->tsc_scale);
> + offset = READ_ONCE(tsc_pg->tsc_offset);
> + rdtscll(cur_tsc);
> +
> + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
> +
> + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
> + return current_tick;
That sequence stuff lacks still a sensible explanation. It's fundamentally
different from the sequence counting we do in the kernel, so documentation
for it is really required.
Thanks,
tglx
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: [PATCH 1/2] hyperv: implement hv_get_tsc_page()
2017-02-09 14:10 ` [PATCH 1/2] hyperv: implement hv_get_tsc_page() Vitaly Kuznetsov
@ 2017-02-09 18:24 ` Stephen Hemminger
2017-02-09 20:14 ` Thomas Gleixner
0 siblings, 1 reply; 22+ messages in thread
From: Stephen Hemminger @ 2017-02-09 18:24 UTC (permalink / raw)
To: Vitaly Kuznetsov, x86, Andy Lutomirski
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, KY Srinivasan,
Haiyang Zhang, Dexuan Cui, linux-kernel, devel, virtualization
The actual code looks fine, but the style police will not like you.
{ should be at start of line on functions.
And #else should be at start of line,
But maybe this was just more of exchange mangling the mail.
-----Original Message-----
From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
Sent: Thursday, February 9, 2017 6:11 AM
To: x86@kernel.org; Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>; Ingo Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; KY Srinivasan <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>; Dexuan Cui <decui@microsoft.com>; linux-kernel@vger.kernel.org; devel@linuxdriverproject.org; virtualization@lists.linux-foundation.org
Subject: [PATCH 1/2] hyperv: implement hv_get_tsc_page()
To use Hyper-V TSC page clocksource from vDSO we need to make tsc_pg available. Implement hv_get_tsc_page() and add CONFIG_HYPERV_TSCPAGE to make #ifdef-s simple.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
arch/x86/hyperv/hv_init.c | 9 +++++++--
arch/x86/include/asm/mshyperv.h | 8 ++++++++
drivers/hv/Kconfig | 3 +++
3 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c index b371d0e..0ce8485 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -27,10 +27,15 @@
#include <linux/clockchips.h>
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_HYPERV_TSCPAGE
static struct ms_hyperv_tsc_page *tsc_pg;
+struct ms_hyperv_tsc_page *hv_get_tsc_page(void) {
+ return tsc_pg;
+}
+
static u64 read_hv_clock_tsc(struct clocksource *arg) {
u64 current_tick;
@@ -136,7 +141,7 @@ void hyperv_init(void)
/*
* Register Hyper-V specific clocksource.
*/
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_HYPERV_TSCPAGE
if (ms_hyperv.features & HV_X64_MSR_REFERENCE_TSC_AVAILABLE) {
union hv_x64_msr_hypercall_contents tsc_msr;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h index f8dc370..14dd92c 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -173,4 +173,12 @@ void hyperv_report_panic(struct pt_regs *regs); bool hv_is_hypercall_page_setup(void); void hyperv_cleanup(void); #endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+struct ms_hyperv_tsc_page *hv_get_tsc_page(void); #else static inline
+struct ms_hyperv_tsc_page *hv_get_tsc_page(void) {
+ return NULL;
+}
+#endif
#endif
diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig index 0403b51..c29cd53 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -7,6 +7,9 @@ config HYPERV
Select this option to run Linux as a Hyper-V client operating
system.
+config HYPERV_TSCPAGE
+ def_bool HYPERV && X86_64
+
config HYPERV_UTILS
tristate "Microsoft Hyper-V Utilities driver"
depends on HYPERV && CONNECTOR && NLS
--
2.9.3
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
2017-02-09 17:08 ` Thomas Gleixner
@ 2017-02-09 18:27 ` Stephen Hemminger
2017-02-10 12:25 ` Vitaly Kuznetsov
2017-02-09 20:45 ` KY Srinivasan
2017-02-10 11:06 ` Vitaly Kuznetsov
2 siblings, 1 reply; 22+ messages in thread
From: Stephen Hemminger @ 2017-02-09 18:27 UTC (permalink / raw)
To: Thomas Gleixner, Vitaly Kuznetsov
Cc: x86, Andy Lutomirski, Ingo Molnar, H. Peter Anvin, KY Srinivasan,
Haiyang Zhang, Dexuan Cui, linux-kernel, devel, virtualization
Why not use existing seqlock's?
-----Original Message-----
From: Thomas Gleixner [mailto:tglx@linutronix.de]
Sent: Thursday, February 9, 2017 9:08 AM
To: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: x86@kernel.org; Andy Lutomirski <luto@amacapital.net>; Ingo Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; KY Srinivasan <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>; Dexuan Cui <decui@microsoft.com>; linux-kernel@vger.kernel.org; devel@linuxdriverproject.org; virtualization@lists.linux-foundation.org
Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
> +#ifdef CONFIG_HYPERV_TSCPAGE
> +static notrace u64 vread_hvclock(int *mode) {
> + const struct ms_hyperv_tsc_page *tsc_pg =
> + (const struct ms_hyperv_tsc_page *)&hvclock_page;
> + u64 sequence, scale, offset, current_tick, cur_tsc;
> +
> + while (1) {
> + sequence = READ_ONCE(tsc_pg->tsc_sequence);
> + if (!sequence)
> + break;
> +
> + scale = READ_ONCE(tsc_pg->tsc_scale);
> + offset = READ_ONCE(tsc_pg->tsc_offset);
> + rdtscll(cur_tsc);
> +
> + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
> +
> + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
> + return current_tick;
That sequence stuff lacks still a sensible explanation. It's fundamentally different from the sequence counting we do in the kernel, so documentation for it is really required.
Thanks,
tglx
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: [PATCH 1/2] hyperv: implement hv_get_tsc_page()
2017-02-09 18:24 ` Stephen Hemminger
@ 2017-02-09 20:14 ` Thomas Gleixner
2017-02-09 23:17 ` Stephen Hemminger
0 siblings, 1 reply; 22+ messages in thread
From: Thomas Gleixner @ 2017-02-09 20:14 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Vitaly Kuznetsov, x86, Andy Lutomirski, Ingo Molnar,
H. Peter Anvin, KY Srinivasan, Haiyang Zhang, Dexuan Cui,
linux-kernel, devel, virtualization
On Thu, 9 Feb 2017, Stephen Hemminger wrote:
> The actual code looks fine, but the style police will not like you.
> { should be at start of line on functions.
> And #else should be at start of line,
>
> But maybe this was just more of exchange mangling the mail.
Looks like.
> +struct ms_hyperv_tsc_page *hv_get_tsc_page(void) {
> + return tsc_pg;
> +}
> +
That's how it reads in a proper mail client connected to a proper mail
server:
> +struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
> +{
> + return tsc_pg;
> +}
:)
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
2017-02-09 17:08 ` Thomas Gleixner
2017-02-09 18:27 ` Stephen Hemminger
@ 2017-02-09 20:45 ` KY Srinivasan
2017-02-09 22:55 ` Andy Lutomirski
2017-02-10 11:06 ` Vitaly Kuznetsov
2 siblings, 1 reply; 22+ messages in thread
From: KY Srinivasan @ 2017-02-09 20:45 UTC (permalink / raw)
To: Thomas Gleixner, Vitaly Kuznetsov
Cc: x86, Andy Lutomirski, Ingo Molnar, H. Peter Anvin, Haiyang Zhang,
Stephen Hemminger, Dexuan Cui, linux-kernel, devel,
virtualization
> -----Original Message-----
> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> Sent: Thursday, February 9, 2017 9:08 AM
> To: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: x86@kernel.org; Andy Lutomirski <luto@amacapital.net>; Ingo Molnar
> <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; KY Srinivasan
> <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; Stephen
> Hemminger <sthemmin@microsoft.com>; Dexuan Cui
> <decui@microsoft.com>; linux-kernel@vger.kernel.org;
> devel@linuxdriverproject.org; virtualization@lists.linux-foundation.org
> Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read
> method
>
> On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
> > +#ifdef CONFIG_HYPERV_TSCPAGE
> > +static notrace u64 vread_hvclock(int *mode)
> > +{
> > + const struct ms_hyperv_tsc_page *tsc_pg =
> > + (const struct ms_hyperv_tsc_page *)&hvclock_page;
> > + u64 sequence, scale, offset, current_tick, cur_tsc;
> > +
> > + while (1) {
> > + sequence = READ_ONCE(tsc_pg->tsc_sequence);
> > + if (!sequence)
> > + break;
> > +
> > + scale = READ_ONCE(tsc_pg->tsc_scale);
> > + offset = READ_ONCE(tsc_pg->tsc_offset);
> > + rdtscll(cur_tsc);
> > +
> > + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
> > +
> > + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
> > + return current_tick;
>
> That sequence stuff lacks still a sensible explanation. It's fundamentally
> different from the sequence counting we do in the kernel, so documentation
> for it is really required.
The host is updating multiple fields in this shared TSC page and the sequence number is
used to ensure that the guest sees a consistent set values published. If I remember
correctly, Xen has a similar mechanism.
K. Y
>
> Thanks,
>
> tglx
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
2017-02-09 20:45 ` KY Srinivasan
@ 2017-02-09 22:55 ` Andy Lutomirski
2017-02-09 23:15 ` Stephen Hemminger
2017-02-10 12:15 ` Vitaly Kuznetsov
0 siblings, 2 replies; 22+ messages in thread
From: Andy Lutomirski @ 2017-02-09 22:55 UTC (permalink / raw)
To: KY Srinivasan
Cc: Thomas Gleixner, Vitaly Kuznetsov, x86, Ingo Molnar,
H. Peter Anvin, Haiyang Zhang, Stephen Hemminger, Dexuan Cui,
linux-kernel, devel, virtualization
On Thu, Feb 9, 2017 at 12:45 PM, KY Srinivasan <kys@microsoft.com> wrote:
>
>
>> -----Original Message-----
>> From: Thomas Gleixner [mailto:tglx@linutronix.de]
>> Sent: Thursday, February 9, 2017 9:08 AM
>> To: Vitaly Kuznetsov <vkuznets@redhat.com>
>> Cc: x86@kernel.org; Andy Lutomirski <luto@amacapital.net>; Ingo Molnar
>> <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; KY Srinivasan
>> <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; Stephen
>> Hemminger <sthemmin@microsoft.com>; Dexuan Cui
>> <decui@microsoft.com>; linux-kernel@vger.kernel.org;
>> devel@linuxdriverproject.org; virtualization@lists.linux-foundation.org
>> Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read
>> method
>>
>> On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
>> > +#ifdef CONFIG_HYPERV_TSCPAGE
>> > +static notrace u64 vread_hvclock(int *mode)
>> > +{
>> > + const struct ms_hyperv_tsc_page *tsc_pg =
>> > + (const struct ms_hyperv_tsc_page *)&hvclock_page;
>> > + u64 sequence, scale, offset, current_tick, cur_tsc;
>> > +
>> > + while (1) {
>> > + sequence = READ_ONCE(tsc_pg->tsc_sequence);
>> > + if (!sequence)
>> > + break;
>> > +
>> > + scale = READ_ONCE(tsc_pg->tsc_scale);
>> > + offset = READ_ONCE(tsc_pg->tsc_offset);
>> > + rdtscll(cur_tsc);
>> > +
>> > + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
>> > +
>> > + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
>> > + return current_tick;
>>
>> That sequence stuff lacks still a sensible explanation. It's fundamentally
>> different from the sequence counting we do in the kernel, so documentation
>> for it is really required.
>
> The host is updating multiple fields in this shared TSC page and the sequence number is
> used to ensure that the guest sees a consistent set values published. If I remember
> correctly, Xen has a similar mechanism.
So what's the actual protocol? When the hypervisor updates the page,
does it freeze all guest cpus? If not, how does it maintain
atomicity?
--Andy
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
2017-02-09 22:55 ` Andy Lutomirski
@ 2017-02-09 23:15 ` Stephen Hemminger
2017-02-10 12:15 ` Vitaly Kuznetsov
1 sibling, 0 replies; 22+ messages in thread
From: Stephen Hemminger @ 2017-02-09 23:15 UTC (permalink / raw)
To: Andy Lutomirski
Cc: KY Srinivasan, Stephen Hemminger, Haiyang Zhang, x86,
linux-kernel, virtualization, Ingo Molnar, H. Peter Anvin, devel,
Thomas Gleixner
On Thu, 9 Feb 2017 14:55:50 -0800
Andy Lutomirski <luto@amacapital.net> wrote:
> On Thu, Feb 9, 2017 at 12:45 PM, KY Srinivasan <kys@microsoft.com> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> >> Sent: Thursday, February 9, 2017 9:08 AM
> >> To: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> Cc: x86@kernel.org; Andy Lutomirski <luto@amacapital.net>; Ingo Molnar
> >> <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; KY Srinivasan
> >> <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; Stephen
> >> Hemminger <sthemmin@microsoft.com>; Dexuan Cui
> >> <decui@microsoft.com>; linux-kernel@vger.kernel.org;
> >> devel@linuxdriverproject.org; virtualization@lists.linux-foundation.org
> >> Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read
> >> method
> >>
> >> On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
> >> > +#ifdef CONFIG_HYPERV_TSCPAGE
> >> > +static notrace u64 vread_hvclock(int *mode)
> >> > +{
> >> > + const struct ms_hyperv_tsc_page *tsc_pg =
> >> > + (const struct ms_hyperv_tsc_page *)&hvclock_page;
> >> > + u64 sequence, scale, offset, current_tick, cur_tsc;
> >> > +
> >> > + while (1) {
> >> > + sequence = READ_ONCE(tsc_pg->tsc_sequence);
> >> > + if (!sequence)
> >> > + break;
> >> > +
> >> > + scale = READ_ONCE(tsc_pg->tsc_scale);
> >> > + offset = READ_ONCE(tsc_pg->tsc_offset);
> >> > + rdtscll(cur_tsc);
> >> > +
> >> > + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
> >> > +
> >> > + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
> >> > + return current_tick;
> >>
> >> That sequence stuff lacks still a sensible explanation. It's fundamentally
> >> different from the sequence counting we do in the kernel, so documentation
> >> for it is really required.
> >
> > The host is updating multiple fields in this shared TSC page and the sequence number is
> > used to ensure that the guest sees a consistent set values published. If I remember
> > correctly, Xen has a similar mechanism.
>
> So what's the actual protocol? When the hypervisor updates the page,
> does it freeze all guest cpus? If not, how does it maintain
> atomicity?
The protocol looks a lot like Linux seqlock, but it has an extra protection
which is missing here.
The host needs to update sequence number twice in order to guarantee ordering.
Otherwise it is possible that Host and guest can race.
Host
Write offset
Write scale
Set tsc_sequence = N
Guest
read sequence = N
Read scale
Write scale
Write offset
Read Offset
Check sequence == N
Set tsc_sequence = N +1
Look like the current host side protocol is wrong.
The solution that Andi Kleen invented, and I used in seqlock was for the writer to update
sequence at start and end of transaction. If sequence number is odd, then the reader knows
it is looking at stale data.
Host
Write offset
Write scale
Set tsc_sequence = N (end of transaction)
Guest
read sequence = N
Spin until sequence is even (N is even)
Read scale
Set tsc_sequence += 1
Write scale
Write offset
Read Offset
Check sequence == N? (fails is N + 1)
Set tsc_sequence += 1 (end of transaction)
read sequence = N+2
Spin until sequence is even (ie N +2)
Read scale
Read Offset
Check sequence == N +2? (yes ok).
Also it is faster to just read scale and offset with this loop and save
the reading of TSC and doing multiply until after scale/offset has been acquired.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] hyperv: implement hv_get_tsc_page()
2017-02-09 20:14 ` Thomas Gleixner
@ 2017-02-09 23:17 ` Stephen Hemminger
0 siblings, 0 replies; 22+ messages in thread
From: Stephen Hemminger @ 2017-02-09 23:17 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Stephen Hemminger, Haiyang Zhang, x86, linux-kernel,
Andy Lutomirski, Ingo Molnar, H. Peter Anvin, devel,
virtualization
On Thu, 9 Feb 2017 21:14:25 +0100 (CET)
Thomas Gleixner <tglx@linutronix.de> wrote:
> On Thu, 9 Feb 2017, Stephen Hemminger wrote:
>
> > The actual code looks fine, but the style police will not like you.
> > { should be at start of line on functions.
> > And #else should be at start of line,
> >
> > But maybe this was just more of exchange mangling the mail.
>
> Looks like.
>
> > +struct ms_hyperv_tsc_page *hv_get_tsc_page(void) {
> > + return tsc_pg;
> > +}
> > +
>
> That's how it reads in a proper mail client connected to a proper mail
> server:
>
> > +struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
> > +{
> > + return tsc_pg;
> > +}
>
> :)
Yup. it looks like the mail server is trying to be "helpful" by eliminating extra white space.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
2017-02-09 17:08 ` Thomas Gleixner
2017-02-09 18:27 ` Stephen Hemminger
2017-02-09 20:45 ` KY Srinivasan
@ 2017-02-10 11:06 ` Vitaly Kuznetsov
2017-02-10 11:15 ` Thomas Gleixner
2 siblings, 1 reply; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-02-10 11:06 UTC (permalink / raw)
To: Thomas Gleixner
Cc: x86, Andy Lutomirski, Ingo Molnar, H. Peter Anvin,
K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Dexuan Cui,
linux-kernel, devel, virtualization
Thomas Gleixner <tglx@linutronix.de> writes:
> On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
>> +#ifdef CONFIG_HYPERV_TSCPAGE
>> +static notrace u64 vread_hvclock(int *mode)
>> +{
>> + const struct ms_hyperv_tsc_page *tsc_pg =
>> + (const struct ms_hyperv_tsc_page *)&hvclock_page;
>> + u64 sequence, scale, offset, current_tick, cur_tsc;
>> +
>> + while (1) {
>> + sequence = READ_ONCE(tsc_pg->tsc_sequence);
>> + if (!sequence)
>> + break;
>> +
>> + scale = READ_ONCE(tsc_pg->tsc_scale);
>> + offset = READ_ONCE(tsc_pg->tsc_offset);
>> + rdtscll(cur_tsc);
>> +
>> + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
>> +
>> + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
>> + return current_tick;
>
> That sequence stuff lacks still a sensible explanation. It's fundamentally
> different from the sequence counting we do in the kernel, so documentation
> for it is really required.
Sure, do you think the following would do?
diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index 4af10b4..886b600 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -154,6 +154,22 @@ static notrace u64 vread_hvclock(int *mode)
(const struct ms_hyperv_tsc_page *)&hvclock_page;
u64 sequence, scale, offset, current_tick, cur_tsc;
+ /*
+ * The protocol for reading Hyper-V TSC page is specified in Hypervisor
+ * Top-Level Functional Specification ver. 3.0 and above. To get the
+ * reference time we must do the following:
+ * - READ ReferenceTscSequence
+ * A special '0' value indicates the time source is unreliable and we
+ * need to use something else. The currently published specification
+ * versions (up to 4.0b) contain a mistake and wrongly claim '-1'
+ * instead of '0' as the special value, see commit c35b82ef0294.
+ * - ReferenceTime =
+ * ((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset
+ * - READ ReferenceTscSequence again. In case its value has changed
+ * since our first reading we need to discard ReferenceTime and repeat
+ * the whole sequence as the hypervisor was updating the page in
+ * between.
+ */
while (1) {
sequence = READ_ONCE(tsc_pg->tsc_sequence);
if (!sequence)
--
Vitaly
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
2017-02-10 11:06 ` Vitaly Kuznetsov
@ 2017-02-10 11:15 ` Thomas Gleixner
0 siblings, 0 replies; 22+ messages in thread
From: Thomas Gleixner @ 2017-02-10 11:15 UTC (permalink / raw)
To: Vitaly Kuznetsov
Cc: x86, Andy Lutomirski, Ingo Molnar, H. Peter Anvin,
K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Dexuan Cui,
linux-kernel, devel, virtualization
On Fri, 10 Feb 2017, Vitaly Kuznetsov wrote:
> Thomas Gleixner <tglx@linutronix.de> writes:
>
> > On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
> >> +#ifdef CONFIG_HYPERV_TSCPAGE
> >> +static notrace u64 vread_hvclock(int *mode)
> >> +{
> >> + const struct ms_hyperv_tsc_page *tsc_pg =
> >> + (const struct ms_hyperv_tsc_page *)&hvclock_page;
> >> + u64 sequence, scale, offset, current_tick, cur_tsc;
> >> +
> >> + while (1) {
> >> + sequence = READ_ONCE(tsc_pg->tsc_sequence);
> >> + if (!sequence)
> >> + break;
> >> +
> >> + scale = READ_ONCE(tsc_pg->tsc_scale);
> >> + offset = READ_ONCE(tsc_pg->tsc_offset);
> >> + rdtscll(cur_tsc);
> >> +
> >> + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
> >> +
> >> + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
> >> + return current_tick;
> >
> > That sequence stuff lacks still a sensible explanation. It's fundamentally
> > different from the sequence counting we do in the kernel, so documentation
> > for it is really required.
>
> Sure, do you think the following would do?
Yes
> diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
> index 4af10b4..886b600 100644
> --- a/arch/x86/entry/vdso/vclock_gettime.c
> +++ b/arch/x86/entry/vdso/vclock_gettime.c
> @@ -154,6 +154,22 @@ static notrace u64 vread_hvclock(int *mode)
> (const struct ms_hyperv_tsc_page *)&hvclock_page;
> u64 sequence, scale, offset, current_tick, cur_tsc;
>
> + /*
> + * The protocol for reading Hyper-V TSC page is specified in Hypervisor
> + * Top-Level Functional Specification ver. 3.0 and above. To get the
> + * reference time we must do the following:
> + * - READ ReferenceTscSequence
> + * A special '0' value indicates the time source is unreliable and we
> + * need to use something else. The currently published specification
> + * versions (up to 4.0b) contain a mistake and wrongly claim '-1'
> + * instead of '0' as the special value, see commit c35b82ef0294.
> + * - ReferenceTime =
> + * ((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset
> + * - READ ReferenceTscSequence again. In case its value has changed
> + * since our first reading we need to discard ReferenceTime and repeat
> + * the whole sequence as the hypervisor was updating the page in
> + * between.
> + */
> while (1) {
> sequence = READ_ONCE(tsc_pg->tsc_sequence);
> if (!sequence)
>
> --
> Vitaly
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
2017-02-09 22:55 ` Andy Lutomirski
2017-02-09 23:15 ` Stephen Hemminger
@ 2017-02-10 12:15 ` Vitaly Kuznetsov
1 sibling, 0 replies; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-02-10 12:15 UTC (permalink / raw)
To: Andy Lutomirski
Cc: KY Srinivasan, Thomas Gleixner, x86, Ingo Molnar, H. Peter Anvin,
Haiyang Zhang, Stephen Hemminger, Dexuan Cui, linux-kernel,
devel, virtualization
Andy Lutomirski <luto@amacapital.net> writes:
> On Thu, Feb 9, 2017 at 12:45 PM, KY Srinivasan <kys@microsoft.com> wrote:
>>
>>
>>> -----Original Message-----
>>> From: Thomas Gleixner [mailto:tglx@linutronix.de]
>>> Sent: Thursday, February 9, 2017 9:08 AM
>>> To: Vitaly Kuznetsov <vkuznets@redhat.com>
>>> Cc: x86@kernel.org; Andy Lutomirski <luto@amacapital.net>; Ingo Molnar
>>> <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; KY Srinivasan
>>> <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; Stephen
>>> Hemminger <sthemmin@microsoft.com>; Dexuan Cui
>>> <decui@microsoft.com>; linux-kernel@vger.kernel.org;
>>> devel@linuxdriverproject.org; virtualization@lists.linux-foundation.org
>>> Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read
>>> method
>>>
>>> On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
>>> > +#ifdef CONFIG_HYPERV_TSCPAGE
>>> > +static notrace u64 vread_hvclock(int *mode)
>>> > +{
>>> > + const struct ms_hyperv_tsc_page *tsc_pg =
>>> > + (const struct ms_hyperv_tsc_page *)&hvclock_page;
>>> > + u64 sequence, scale, offset, current_tick, cur_tsc;
>>> > +
>>> > + while (1) {
>>> > + sequence = READ_ONCE(tsc_pg->tsc_sequence);
>>> > + if (!sequence)
>>> > + break;
>>> > +
>>> > + scale = READ_ONCE(tsc_pg->tsc_scale);
>>> > + offset = READ_ONCE(tsc_pg->tsc_offset);
>>> > + rdtscll(cur_tsc);
>>> > +
>>> > + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
>>> > +
>>> > + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
>>> > + return current_tick;
>>>
>>> That sequence stuff lacks still a sensible explanation. It's fundamentally
>>> different from the sequence counting we do in the kernel, so documentation
>>> for it is really required.
>>
>> The host is updating multiple fields in this shared TSC page and the sequence number is
>> used to ensure that the guest sees a consistent set values published. If I remember
>> correctly, Xen has a similar mechanism.
>
> So what's the actual protocol? When the hypervisor updates the page,
> does it freeze all guest cpus? If not, how does it maintain
> atomicity?
I don't really know how it is implemented server-side but I *think* that
freezing all CPUs is only required when we want to update *both*
ReferenceTscScale and ReferenceTscOffset at the same time (as Hyper-V is
64-bit only so it can always atomically update 64-bit values)...
--
Vitaly
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
2017-02-09 18:27 ` Stephen Hemminger
@ 2017-02-10 12:25 ` Vitaly Kuznetsov
2017-02-10 12:28 ` Thomas Gleixner
0 siblings, 1 reply; 22+ messages in thread
From: Vitaly Kuznetsov @ 2017-02-10 12:25 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Thomas Gleixner, x86, Andy Lutomirski, Ingo Molnar,
H. Peter Anvin, KY Srinivasan, Haiyang Zhang, Dexuan Cui,
linux-kernel, devel, virtualization
Stephen Hemminger <sthemmin@microsoft.com> writes:
> Why not use existing seqlock's?
>
To be honest I don't quite understand how we could use it -- the
sequence locking here is done against the page updated by the
hypersior, we're not creating new structures (so I don't understand how
we could use struct seqcount which we don't have) but I may be
misunderstanding something.
BTW, I just occured to me that I should've probably put the TSC reading
code to mshyperv.h and use it from both vDSO and read_hv_clock_tsc() --
what do you thing?
[snip]
--
Vitaly
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
2017-02-10 12:25 ` Vitaly Kuznetsov
@ 2017-02-10 12:28 ` Thomas Gleixner
2017-02-10 16:31 ` Stephen Hemminger
0 siblings, 1 reply; 22+ messages in thread
From: Thomas Gleixner @ 2017-02-10 12:28 UTC (permalink / raw)
To: Vitaly Kuznetsov
Cc: Stephen Hemminger, x86, Andy Lutomirski, Ingo Molnar,
H. Peter Anvin, KY Srinivasan, Haiyang Zhang, Dexuan Cui,
linux-kernel, devel, virtualization
On Fri, 10 Feb 2017, Vitaly Kuznetsov wrote:
> Stephen Hemminger <sthemmin@microsoft.com> writes:
>
> > Why not use existing seqlock's?
> >
>
> To be honest I don't quite understand how we could use it -- the
> sequence locking here is done against the page updated by the
> hypersior, we're not creating new structures (so I don't understand how
> we could use struct seqcount which we don't have) but I may be
> misunderstanding something.
You can't use seqlock, but you might be able to use seqcount. Though I
doubt it given the 0 check ....
Thanks,
tglx
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
2017-02-10 12:28 ` Thomas Gleixner
@ 2017-02-10 16:31 ` Stephen Hemminger
2017-02-10 18:01 ` Thomas Gleixner
0 siblings, 1 reply; 22+ messages in thread
From: Stephen Hemminger @ 2017-02-10 16:31 UTC (permalink / raw)
To: Thomas Gleixner, Vitaly Kuznetsov
Cc: x86, Andy Lutomirski, Ingo Molnar, H. Peter Anvin, KY Srinivasan,
Haiyang Zhang, Dexuan Cui, linux-kernel, devel, virtualization
Since sequence count algorithm is done by hypervisor, better to not reuse seqcount.
Still concerned that the code is racy.
-----Original Message-----
From: Thomas Gleixner [mailto:tglx@linutronix.de]
Sent: Friday, February 10, 2017 4:28 AM
To: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>; x86@kernel.org; Andy Lutomirski <luto@amacapital.net>; Ingo Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; KY Srinivasan <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; Dexuan Cui <decui@microsoft.com>; linux-kernel@vger.kernel.org; devel@linuxdriverproject.org; virtualization@lists.linux-foundation.org
Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
On Fri, 10 Feb 2017, Vitaly Kuznetsov wrote:
> Stephen Hemminger <sthemmin@microsoft.com> writes:
>
> > Why not use existing seqlock's?
> >
>
> To be honest I don't quite understand how we could use it -- the
> sequence locking here is done against the page updated by the
> hypersior, we're not creating new structures (so I don't understand
> how we could use struct seqcount which we don't have) but I may be
> misunderstanding something.
You can't use seqlock, but you might be able to use seqcount. Though I doubt it given the 0 check ....
Thanks,
tglx
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
2017-02-10 16:31 ` Stephen Hemminger
@ 2017-02-10 18:01 ` Thomas Gleixner
2017-02-13 7:49 ` Dexuan Cui
0 siblings, 1 reply; 22+ messages in thread
From: Thomas Gleixner @ 2017-02-10 18:01 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Vitaly Kuznetsov, x86, Andy Lutomirski, Ingo Molnar,
H. Peter Anvin, KY Srinivasan, Haiyang Zhang, Dexuan Cui,
linux-kernel, devel, virtualization
On Fri, 10 Feb 2017, Stephen Hemminger wrote:
> Since sequence count algorithm is done by hypervisor, better to not reuse seqcount.
> Still concerned that the code is racy.
That's a different question and can only be answered by the hypervisor
folks. Dunno, whether they have barrier requirements. The seqcount stuff
relies on:
do {
seq = READ_ONCE(s->sequence);
smp_rmb();
sample_data();
smp_rmb();
} while (s->sequence != seq);
which pairs with the writer side:
s->sequence++;
smp_wmb();
update_data();
smp_wmb();
s->sequence++;
That's important if the stuff happens cross CPU. If the update happens on
the same CPU then this is a different story and as there are VMexits
involved they might provide the required ordering already. But I can't tell
as I have no idea how that host side thing is done.
Thanks,
tglx
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
2017-02-10 18:01 ` Thomas Gleixner
@ 2017-02-13 7:49 ` Dexuan Cui
2017-02-13 9:27 ` Thomas Gleixner
2017-02-13 19:06 ` Andy Lutomirski
0 siblings, 2 replies; 22+ messages in thread
From: Dexuan Cui @ 2017-02-13 7:49 UTC (permalink / raw)
To: Thomas Gleixner, Stephen Hemminger
Cc: Vitaly Kuznetsov, x86, Andy Lutomirski, Ingo Molnar,
H. Peter Anvin, KY Srinivasan, Haiyang Zhang, linux-kernel,
devel, virtualization
> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> Sent: Saturday, February 11, 2017 02:02
> ...
> That's important if the stuff happens cross CPU. If the update happens on
> the same CPU then this is a different story and as there are VMexits
> involved they might provide the required ordering already. But I can't tell
> as I have no idea how that host side thing is done.
>
> tglx
IMO Hyper-V TSC page clocksource here seems pretty similar to KVM's pvclock,
So I would guess "the structure is only updated just before reentering the guest
after some VM event" (https://rwmj.wordpress.com/2010/10/15/kvm-pvclock/),
that is, the update should happen on the same CPU, I guess.
Thanks,
-- Dexuan
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
2017-02-13 7:49 ` Dexuan Cui
@ 2017-02-13 9:27 ` Thomas Gleixner
2017-02-13 19:06 ` Andy Lutomirski
1 sibling, 0 replies; 22+ messages in thread
From: Thomas Gleixner @ 2017-02-13 9:27 UTC (permalink / raw)
To: Dexuan Cui
Cc: Stephen Hemminger, Vitaly Kuznetsov, x86, Andy Lutomirski,
Ingo Molnar, H. Peter Anvin, KY Srinivasan, Haiyang Zhang,
linux-kernel, devel, virtualization
On Mon, 13 Feb 2017, Dexuan Cui wrote:
> > From: Thomas Gleixner [mailto:tglx@linutronix.de]
> > Sent: Saturday, February 11, 2017 02:02
> > ...
> > That's important if the stuff happens cross CPU. If the update happens on
> > the same CPU then this is a different story and as there are VMexits
> > involved they might provide the required ordering already. But I can't tell
> > as I have no idea how that host side thing is done.
> >
>
> IMO Hyper-V TSC page clocksource here seems pretty similar to KVM's pvclock,
> So I would guess "the structure is only updated just before reentering the guest
> after some VM event" (https://rwmj.wordpress.com/2010/10/15/kvm-pvclock/),
> that is, the update should happen on the same CPU, I guess.
Guessing does not help much. There are a gazillion ways to implement such a
thing. I'm sure that there exists proper documentation of the mechanism
inside your company.
Thanks,
tglx
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
2017-02-13 7:49 ` Dexuan Cui
2017-02-13 9:27 ` Thomas Gleixner
@ 2017-02-13 19:06 ` Andy Lutomirski
2017-02-13 19:28 ` Thomas Gleixner
1 sibling, 1 reply; 22+ messages in thread
From: Andy Lutomirski @ 2017-02-13 19:06 UTC (permalink / raw)
To: Dexuan Cui
Cc: Thomas Gleixner, Stephen Hemminger, Vitaly Kuznetsov, x86,
Ingo Molnar, H. Peter Anvin, KY Srinivasan, Haiyang Zhang,
linux-kernel, devel, virtualization
On Sun, Feb 12, 2017 at 11:49 PM, Dexuan Cui <decui@microsoft.com> wrote:
>> From: Thomas Gleixner [mailto:tglx@linutronix.de]
>> Sent: Saturday, February 11, 2017 02:02
>> ...
>> That's important if the stuff happens cross CPU. If the update happens on
>> the same CPU then this is a different story and as there are VMexits
>> involved they might provide the required ordering already. But I can't tell
>> as I have no idea how that host side thing is done.
>>
>> tglx
>
> IMO Hyper-V TSC page clocksource here seems pretty similar to KVM's pvclock,
> So I would guess "the structure is only updated just before reentering the guest
> after some VM event" (https://rwmj.wordpress.com/2010/10/15/kvm-pvclock/),
> that is, the update should happen on the same CPU, I guess.
If the patch is correct, there is one of these shared by all vCPUs, so
this is not a sufficient explanation.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
2017-02-13 19:06 ` Andy Lutomirski
@ 2017-02-13 19:28 ` Thomas Gleixner
0 siblings, 0 replies; 22+ messages in thread
From: Thomas Gleixner @ 2017-02-13 19:28 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Dexuan Cui, Stephen Hemminger, Vitaly Kuznetsov, x86,
Ingo Molnar, H. Peter Anvin, KY Srinivasan, Haiyang Zhang,
linux-kernel, devel, virtualization
On Mon, 13 Feb 2017, Andy Lutomirski wrote:
> On Sun, Feb 12, 2017 at 11:49 PM, Dexuan Cui <decui@microsoft.com> wrote:
> >> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> >> Sent: Saturday, February 11, 2017 02:02
> >> ...
> >> That's important if the stuff happens cross CPU. If the update happens on
> >> the same CPU then this is a different story and as there are VMexits
> >> involved they might provide the required ordering already. But I can't tell
> >> as I have no idea how that host side thing is done.
> >>
> >> tglx
> >
> > IMO Hyper-V TSC page clocksource here seems pretty similar to KVM's pvclock,
> > So I would guess "the structure is only updated just before reentering the guest
> > after some VM event" (https://rwmj.wordpress.com/2010/10/15/kvm-pvclock/),
> > that is, the update should happen on the same CPU, I guess.
>
> If the patch is correct, there is one of these shared by all vCPUs, so
> this is not a sufficient explanation.
Right, because that's ony ONE TSC page for the whole guest and then the seq
stuff really lacks barriers.
Thanks,
tglx
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2017-02-13 19:28 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-09 14:10 [PATCH 0/2] x86/vdso: Add Hyper-V TSC page clocksource support Vitaly Kuznetsov
2017-02-09 14:10 ` [PATCH 1/2] hyperv: implement hv_get_tsc_page() Vitaly Kuznetsov
2017-02-09 18:24 ` Stephen Hemminger
2017-02-09 20:14 ` Thomas Gleixner
2017-02-09 23:17 ` Stephen Hemminger
2017-02-09 14:10 ` [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method Vitaly Kuznetsov
2017-02-09 17:08 ` Thomas Gleixner
2017-02-09 18:27 ` Stephen Hemminger
2017-02-10 12:25 ` Vitaly Kuznetsov
2017-02-10 12:28 ` Thomas Gleixner
2017-02-10 16:31 ` Stephen Hemminger
2017-02-10 18:01 ` Thomas Gleixner
2017-02-13 7:49 ` Dexuan Cui
2017-02-13 9:27 ` Thomas Gleixner
2017-02-13 19:06 ` Andy Lutomirski
2017-02-13 19:28 ` Thomas Gleixner
2017-02-09 20:45 ` KY Srinivasan
2017-02-09 22:55 ` Andy Lutomirski
2017-02-09 23:15 ` Stephen Hemminger
2017-02-10 12:15 ` Vitaly Kuznetsov
2017-02-10 11:06 ` Vitaly Kuznetsov
2017-02-10 11:15 ` Thomas Gleixner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).