* [PATCH v3 0/3] x86/vdso: Add Hyper-V TSC page clocksource support
@ 2017-03-03 13:21 ` Vitaly Kuznetsov
  0 siblings, 0 replies; 15+ messages in thread
From: Vitaly Kuznetsov @ 2017-03-03 13:21 UTC
  To: x86, Andy Lutomirski, Thomas Gleixner
  Cc: Ingo Molnar, H. Peter Anvin, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Dexuan Cui, linux-kernel, devel,
	virtualization

Hi,

The merge window is about to close, so I hope it's OK to make another try here.

Changes since v2:
- Add explicit READ_ONCE() to not rely on 'volatile' [Andy Lutomirski]
- rdtsc() -> rdtsc_ordered() [Andy Lutomirski]
- virt_rmb() -> smp_rmb() [Thomas Gleixner, Andy Lutomirski]

Thomas, Andy, it seems the only blocker for the series was the ambiguity in
the TSC page read algorithm. I contacted Microsoft (through K. Y.) and asked
what we should do when we see 'seq=0'. The answer is:

"I have confirmed that the only invalid value is 0 (notwithstanding what the
spec says). I have asked the Spec to be updated and the current code we have
is correct - it treats 0 as the only invalid value. The invalid value indicates
that the TSC page cannot be used as a time source and a different source is to
be used and this state is going to persist and so looping is not an option."

I agree it would be wiser to have two invalid values - one for 'try again' and
another for permanent failure in case it is needed. But this is not the case.

So I keep my algorithm untouched.

I can see two more reasons for us to keep it:
1) It is what we already have in the kernel.
2) It is what Windows does (see the disassembly in commit c35b82ef0294).

Original description:

The Hyper-V TSC page clocksource is suitable for vDSO; however, the protocol
defined by the hypervisor is different from VCLOCK_PVCLOCK. Implement the
required support. A simple sysbench test shows the following results:

Before:
# time sysbench --test=memory --max-requests=500000 run
...
real    1m22.618s
user    0m50.193s
sys     0m32.268s

After:
# time sysbench --test=memory --max-requests=500000 run
...
real	0m47.241s
user	0m47.117s
sys	0m0.008s

Vitaly Kuznetsov (3):
  x86/hyperv: implement hv_get_tsc_page()
  x86/hyperv: move TSC reading method to asm/mshyperv.h
  x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

 arch/x86/entry/vdso/vclock_gettime.c  | 24 +++++++++++++++
 arch/x86/entry/vdso/vdso-layout.lds.S |  3 +-
 arch/x86/entry/vdso/vdso2c.c          |  3 ++
 arch/x86/entry/vdso/vma.c             |  7 +++++
 arch/x86/hyperv/hv_init.c             | 48 +++++++++--------------------
 arch/x86/include/asm/clocksource.h    |  3 +-
 arch/x86/include/asm/mshyperv.h       | 58 +++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/vdso.h           |  1 +
 drivers/hv/Kconfig                    |  3 ++
 9 files changed, 114 insertions(+), 36 deletions(-)

-- 
2.9.3
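
The read protocol discussed in the cover letter (sequence 0 is a permanent
"do not use this page", a changed sequence means "retry") can be sketched in
userspace C. This is only an illustration under stated assumptions, not the
kernel code: struct sketch_tsc_page and sketch_read_tsc_page() are
hypothetical names, the TSC value is passed in as a parameter instead of
being read with RDTSC, C11 atomics stand in for READ_ONCE()/smp_rmb(), and
unsigned __int128 is a GCC/Clang extension on 64-bit targets.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared page; not the kernel's struct. */
struct sketch_tsc_page {
	_Atomic uint32_t sequence;	/* 0: page unusable, use another source */
	_Atomic uint64_t scale;
	_Atomic uint64_t offset;
};

/*
 * Returns UINT64_MAX when the sequence is 0, otherwise
 * ((tsc * scale) >> 64) + offset computed from a consistent snapshot.
 * 'tsc' is a parameter so the sketch does not depend on RDTSC.
 */
static uint64_t sketch_read_tsc_page(struct sketch_tsc_page *pg, uint64_t tsc)
{
	for (;;) {
		uint32_t seq = atomic_load_explicit(&pg->sequence,
						    memory_order_acquire);
		uint64_t scale, offset, ticks;

		if (seq == 0)
			return UINT64_MAX;	/* permanent state: do not loop */

		scale = atomic_load_explicit(&pg->scale, memory_order_relaxed);
		offset = atomic_load_explicit(&pg->offset, memory_order_relaxed);
		ticks = (uint64_t)(((unsigned __int128)tsc * scale) >> 64) + offset;

		/* Order the data loads before the second sequence load. */
		atomic_thread_fence(memory_order_acquire);

		if (atomic_load_explicit(&pg->sequence,
					 memory_order_relaxed) == seq)
			return ticks;
		/* Sequence changed: the page was updated mid-read, retry. */
	}
}
```

A caller would treat UINT64_MAX as "switch to a different time source",
which matches the quoted answer that looping on 0 is not an option.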

* [PATCH v3 1/3] x86/hyperv: implement hv_get_tsc_page()
  2017-03-03 13:21 ` Vitaly Kuznetsov
  (?)
  (?)
@ 2017-03-03 13:21 ` Vitaly Kuznetsov
  2017-03-11 13:51   ` [tip:x86/vdso] x86/hyperv: Implement hv_get_tsc_page() tip-bot for Vitaly Kuznetsov
  -1 siblings, 1 reply; 15+ messages in thread
From: Vitaly Kuznetsov @ 2017-03-03 13:21 UTC
  To: x86, Andy Lutomirski, Thomas Gleixner
  Cc: Ingo Molnar, H. Peter Anvin, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Dexuan Cui, linux-kernel, devel,
	virtualization

To use Hyper-V TSC page clocksource from vDSO we need to make tsc_pg
available. Implement hv_get_tsc_page() and add CONFIG_HYPERV_TSCPAGE to
make #ifdef-s simple.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/hyperv/hv_init.c       | 9 +++++++--
 arch/x86/include/asm/mshyperv.h | 8 ++++++++
 drivers/hv/Kconfig              | 3 +++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index db64baf0..6b64cae 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -27,10 +27,15 @@
 #include <linux/clockchips.h>
 
 
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_HYPERV_TSCPAGE
 
 static struct ms_hyperv_tsc_page *tsc_pg;
 
+struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
+{
+	return tsc_pg;
+}
+
 static u64 read_hv_clock_tsc(struct clocksource *arg)
 {
 	u64 current_tick;
@@ -139,7 +144,7 @@ void hyperv_init(void)
 	/*
 	 * Register Hyper-V specific clocksource.
 	 */
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_HYPERV_TSCPAGE
 	if (ms_hyperv.features & HV_X64_MSR_REFERENCE_TSC_AVAILABLE) {
 		union hv_x64_msr_hypercall_contents tsc_msr;
 
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 7c9c895..d324dce 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -176,4 +176,12 @@ void hyperv_report_panic(struct pt_regs *regs);
 bool hv_is_hypercall_page_setup(void);
 void hyperv_cleanup(void);
 #endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
+#else
+static inline struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
+{
+	return NULL;
+}
+#endif
 #endif
diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index 0403b51..c29cd53 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -7,6 +7,9 @@ config HYPERV
 	  Select this option to run Linux as a Hyper-V client operating
 	  system.
 
+config HYPERV_TSCPAGE
+       def_bool HYPERV && X86_64
+
 config HYPERV_UTILS
 	tristate "Microsoft Hyper-V Utilities driver"
 	depends on HYPERV && CONNECTOR && NLS
-- 
2.9.3
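
The accessor-plus-stub pattern used by this patch can be shown in isolation.
The sketch below is a standalone illustration, not kernel code: the macro
SKETCH_TSCPAGE and every sketch_* name are hypothetical stand-ins for
CONFIG_HYPERV_TSCPAGE, hv_get_tsc_page() and tsc_pg.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct ms_hyperv_tsc_page. */
struct sketch_tsc_page {
	unsigned int sequence;
};

#ifdef SKETCH_TSCPAGE
static struct sketch_tsc_page sketch_page;

/* Real accessor: hands out the page when the feature is built in. */
static inline struct sketch_tsc_page *sketch_get_tsc_page(void)
{
	return &sketch_page;
}
#else
/* Stub: same signature, so callers compile unchanged and see NULL. */
static inline struct sketch_tsc_page *sketch_get_tsc_page(void)
{
	return NULL;
}
#endif

/* A caller needs a NULL check but no #ifdef of its own. */
static int sketch_page_is_available(void)
{
	return sketch_get_tsc_page() != NULL;
}
```

Built without -DSKETCH_TSCPAGE, sketch_page_is_available() returns 0; this
is the property that lets a caller elsewhere test the pointer instead of
the config symbol.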

* [PATCH v3 2/3] x86/hyperv: move TSC reading method to asm/mshyperv.h
  2017-03-03 13:21 ` Vitaly Kuznetsov
                   ` (3 preceding siblings ...)
  (?)
@ 2017-03-03 13:21 ` Vitaly Kuznetsov
  2017-03-03 19:31   ` Stephen Hemminger
                     ` (2 more replies)
  -1 siblings, 3 replies; 15+ messages in thread
From: Vitaly Kuznetsov @ 2017-03-03 13:21 UTC
  To: x86, Andy Lutomirski, Thomas Gleixner
  Cc: Ingo Molnar, H. Peter Anvin, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Dexuan Cui, linux-kernel, devel,
	virtualization

As preparation for making the Hyper-V TSC page suitable for vDSO, move
the TSC page reading logic to asm/mshyperv.h. While at it, do the
following:
- Document the reading algorithm.
- Simplify the code a bit.
- Add explicit READ_ONCE() to not rely on 'volatile'.
- Add explicit barriers to prevent re-ordering (we need to read sequence
  strictly before and after).
- Use mul_u64_u64_shr() instead of assembly; gcc generates a single 'mul'
  instruction on x86_64 anyway.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/hyperv/hv_init.c       | 36 ++++-------------------------
 arch/x86/include/asm/mshyperv.h | 50 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+), 32 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 6b64cae..63dd00e 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -38,39 +38,11 @@ struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 
 static u64 read_hv_clock_tsc(struct clocksource *arg)
 {
-	u64 current_tick;
+	u64 current_tick = hv_read_tsc_page(tsc_pg);
+
+	if (current_tick == U64_MAX)
+		rdmsrl(HV_X64_MSR_TIME_REF_COUNT, current_tick);
 
-	if (tsc_pg->tsc_sequence != 0) {
-		/*
-		 * Use the tsc page to compute the value.
-		 */
-
-		while (1) {
-			u64 tmp;
-			u32 sequence = tsc_pg->tsc_sequence;
-			u64 cur_tsc;
-			u64 scale = tsc_pg->tsc_scale;
-			s64 offset = tsc_pg->tsc_offset;
-
-			rdtscll(cur_tsc);
-			/* current_tick = ((cur_tsc *scale) >> 64) + offset */
-			asm("mulq %3"
-				: "=d" (current_tick), "=a" (tmp)
-				: "a" (cur_tsc), "r" (scale));
-
-			current_tick += offset;
-			if (tsc_pg->tsc_sequence == sequence)
-				return current_tick;
-
-			if (tsc_pg->tsc_sequence != 0)
-				continue;
-			/*
-			 * Fallback using MSR method.
-			 */
-			break;
-		}
-	}
-	rdmsrl(HV_X64_MSR_TIME_REF_COUNT, current_tick);
 	return current_tick;
 }
 
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index d324dce..4ff25436 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -178,6 +178,56 @@ void hyperv_cleanup(void);
 #endif
 #ifdef CONFIG_HYPERV_TSCPAGE
 struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
+static inline u64 hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
+{
+	u64 scale, offset, current_tick, cur_tsc;
+	u32 sequence;
+
+	/*
+	 * The protocol for reading Hyper-V TSC page is specified in Hypervisor
+	 * Top-Level Functional Specification ver. 3.0 and above. To get the
+	 * reference time we must do the following:
+	 * - READ ReferenceTscSequence
+	 *   A special '0' value indicates the time source is unreliable and we
+	 *   need to use something else. The currently published specification
+	 *   versions (up to 4.0b) contain a mistake and wrongly claim '-1'
+	 *   instead of '0' as the special value, see commit c35b82ef0294.
+	 * - ReferenceTime =
+	 *        ((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset
+	 * - READ ReferenceTscSequence again. In case its value has changed
+	 *   since our first reading we need to discard ReferenceTime and repeat
+	 *   the whole sequence as the hypervisor was updating the page in
+	 *   between.
+	 */
+	while (1) {
+		sequence = READ_ONCE(tsc_pg->tsc_sequence);
+		if (!sequence)
+			break;
+		/*
+		 * Make sure we read sequence before we read other values from
+		 * TSC page.
+		 */
+		smp_rmb();
+
+		scale = READ_ONCE(tsc_pg->tsc_scale);
+		offset = READ_ONCE(tsc_pg->tsc_offset);
+		cur_tsc = rdtsc_ordered();
+
+		current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
+
+		/*
+		 * Make sure we read sequence after we read all other values
+		 * from TSC page.
+		 */
+		smp_rmb();
+
+		if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
+			return current_tick;
+	}
+
+	return U64_MAX;
+}
+
 #else
 static inline struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 {
-- 
2.9.3
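
The scaling step in this patch, ((RDTSC() * ReferenceTscScale) >> 64), keeps
only the high 64 bits of a 128-bit product. As a rough illustration of what
that computes, here is a portable C sketch that builds the same result from
32-bit halves. The name sketch_mul_hi64() is hypothetical and this is not
how the kernel helper is implemented; it only shows the arithmetic.

```c
#include <stdint.h>

/*
 * High 64 bits of the 128-bit product a * b, i.e. (a * b) >> 64,
 * computed with 64-bit arithmetic only.
 */
static uint64_t sketch_mul_hi64(uint64_t a, uint64_t b)
{
	uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
	uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

	uint64_t lo_lo = a_lo * b_lo;
	uint64_t hi_lo = a_hi * b_lo;
	uint64_t lo_hi = a_lo * b_hi;
	uint64_t hi_hi = a_hi * b_hi;

	/* Sum of the middle column; at most 3 * (2^32 - 1), so no overflow. */
	uint64_t mid = (lo_lo >> 32) + (uint32_t)hi_lo + (uint32_t)lo_hi;

	return hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (mid >> 32);
}
```

With a scale of 2^63 the result is the input halved, so
sketch_mul_hi64(1000, 1ULL << 63) is 500. On x86_64 the same value comes out
of a single widening multiply, which is why the patch can drop the
hand-written 'mulq'.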

* [PATCH v3 3/3] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
  2017-03-03 13:21 ` Vitaly Kuznetsov
                   ` (5 preceding siblings ...)
  (?)
@ 2017-03-03 13:21 ` Vitaly Kuznetsov
  2017-03-11 13:52   ` [tip:x86/vdso] " tip-bot for Vitaly Kuznetsov
  -1 siblings, 1 reply; 15+ messages in thread
From: Vitaly Kuznetsov @ 2017-03-03 13:21 UTC
  To: x86, Andy Lutomirski, Thomas Gleixner
  Cc: Ingo Molnar, H. Peter Anvin, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Dexuan Cui, linux-kernel, devel,
	virtualization

The Hyper-V TSC page clocksource is suitable for vDSO; however, the protocol
defined by the hypervisor is different from VCLOCK_PVCLOCK. Implement the
required support by adding the hvclock_page VVAR.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/entry/vdso/vclock_gettime.c  | 24 ++++++++++++++++++++++++
 arch/x86/entry/vdso/vdso-layout.lds.S |  3 ++-
 arch/x86/entry/vdso/vdso2c.c          |  3 +++
 arch/x86/entry/vdso/vma.c             |  7 +++++++
 arch/x86/hyperv/hv_init.c             |  3 +++
 arch/x86/include/asm/clocksource.h    |  3 ++-
 arch/x86/include/asm/vdso.h           |  1 +
 7 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index 9d4d6e1..fa8dbfc 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -17,6 +17,7 @@
 #include <asm/unistd.h>
 #include <asm/msr.h>
 #include <asm/pvclock.h>
+#include <asm/mshyperv.h>
 #include <linux/math64.h>
 #include <linux/time.h>
 #include <linux/kernel.h>
@@ -32,6 +33,11 @@ extern u8 pvclock_page
 	__attribute__((visibility("hidden")));
 #endif
 
+#ifdef CONFIG_HYPERV_TSCPAGE
+extern u8 hvclock_page
+	__attribute__((visibility("hidden")));
+#endif
+
 #ifndef BUILD_VDSO32
 
 notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
@@ -141,6 +147,20 @@ static notrace u64 vread_pvclock(int *mode)
 	return last;
 }
 #endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+static notrace u64 vread_hvclock(int *mode)
+{
+	const struct ms_hyperv_tsc_page *tsc_pg =
+		(const struct ms_hyperv_tsc_page *)&hvclock_page;
+	u64 current_tick = hv_read_tsc_page(tsc_pg);
+
+	if (current_tick != U64_MAX)
+		return current_tick;
+
+	*mode = VCLOCK_NONE;
+	return 0;
+}
+#endif
 
 notrace static u64 vread_tsc(void)
 {
@@ -173,6 +193,10 @@ notrace static inline u64 vgetsns(int *mode)
 	else if (gtod->vclock_mode == VCLOCK_PVCLOCK)
 		cycles = vread_pvclock(mode);
 #endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+	else if (gtod->vclock_mode == VCLOCK_HVCLOCK)
+		cycles = vread_hvclock(mode);
+#endif
 	else
 		return 0;
 	v = (cycles - gtod->cycle_last) & gtod->mask;
diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S
index a708aa9..8ebb4b6 100644
--- a/arch/x86/entry/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
@@ -25,7 +25,7 @@ SECTIONS
 	 * segment.
 	 */
 
-	vvar_start = . - 2 * PAGE_SIZE;
+	vvar_start = . - 3 * PAGE_SIZE;
 	vvar_page = vvar_start;
 
 	/* Place all vvars at the offsets in asm/vvar.h. */
@@ -36,6 +36,7 @@ SECTIONS
 #undef EMIT_VVAR
 
 	pvclock_page = vvar_start + PAGE_SIZE;
+	hvclock_page = vvar_start + 2 * PAGE_SIZE;
 
 	. = SIZEOF_HEADERS;
 
diff --git a/arch/x86/entry/vdso/vdso2c.c b/arch/x86/entry/vdso/vdso2c.c
index 491020b..0780a44 100644
--- a/arch/x86/entry/vdso/vdso2c.c
+++ b/arch/x86/entry/vdso/vdso2c.c
@@ -74,6 +74,7 @@ enum {
 	sym_vvar_page,
 	sym_hpet_page,
 	sym_pvclock_page,
+	sym_hvclock_page,
 	sym_VDSO_FAKE_SECTION_TABLE_START,
 	sym_VDSO_FAKE_SECTION_TABLE_END,
 };
@@ -82,6 +83,7 @@ const int special_pages[] = {
 	sym_vvar_page,
 	sym_hpet_page,
 	sym_pvclock_page,
+	sym_hvclock_page,
 };
 
 struct vdso_sym {
@@ -94,6 +96,7 @@ struct vdso_sym required_syms[] = {
 	[sym_vvar_page] = {"vvar_page", true},
 	[sym_hpet_page] = {"hpet_page", true},
 	[sym_pvclock_page] = {"pvclock_page", true},
+	[sym_hvclock_page] = {"hvclock_page", true},
 	[sym_VDSO_FAKE_SECTION_TABLE_START] = {
 		"VDSO_FAKE_SECTION_TABLE_START", false
 	},
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 572cee3..71f5d3a 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -21,6 +21,7 @@
 #include <asm/page.h>
 #include <asm/desc.h>
 #include <asm/cpufeature.h>
+#include <asm/mshyperv.h>
 
 #if defined(CONFIG_X86_64)
 unsigned int __read_mostly vdso64_enabled = 1;
@@ -120,6 +121,12 @@ static int vvar_fault(const struct vm_special_mapping *sm,
 				vmf->address,
 				__pa(pvti) >> PAGE_SHIFT);
 		}
+	} else if (sym_offset == image->sym_hvclock_page) {
+		struct ms_hyperv_tsc_page *tsc_pg = hv_get_tsc_page();
+
+		if (tsc_pg && vclock_was_used(VCLOCK_HVCLOCK))
+			ret = vm_insert_pfn(vma, vmf->address,
+					    vmalloc_to_pfn(tsc_pg));
 	}
 
 	if (ret == 0 || ret == -EBUSY)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 63dd00e..d08ac5e 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -132,6 +132,9 @@ void hyperv_init(void)
 		tsc_msr.guest_physical_address = vmalloc_to_pfn(tsc_pg);
 
 		wrmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr.as_uint64);
+
+		hyperv_cs_tsc.archdata.vclock_mode = VCLOCK_HVCLOCK;
+
 		clocksource_register_hz(&hyperv_cs_tsc, NSEC_PER_SEC/100);
 		return;
 	}
diff --git a/arch/x86/include/asm/clocksource.h b/arch/x86/include/asm/clocksource.h
index eae33c7..47bea8c 100644
--- a/arch/x86/include/asm/clocksource.h
+++ b/arch/x86/include/asm/clocksource.h
@@ -6,7 +6,8 @@
 #define VCLOCK_NONE	0	/* No vDSO clock available.		*/
 #define VCLOCK_TSC	1	/* vDSO should use vread_tsc.		*/
 #define VCLOCK_PVCLOCK	2	/* vDSO should use vread_pvclock.	*/
-#define VCLOCK_MAX	2
+#define VCLOCK_HVCLOCK	3	/* vDSO should use vread_hvclock.	*/
+#define VCLOCK_MAX	3
 
 struct arch_clocksource_data {
 	int vclock_mode;
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 2444189..bccdf49 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -20,6 +20,7 @@ struct vdso_image {
 	long sym_vvar_page;
 	long sym_hpet_page;
 	long sym_pvclock_page;
+	long sym_hvclock_page;
 	long sym_VDSO32_NOTE_MASK;
 	long sym___kernel_sigreturn;
 	long sym___kernel_rt_sigreturn;
-- 
2.9.3
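
The fallback behaviour added by this patch (a failed TSC page read sets the
mode to VCLOCK_NONE so that the caller leaves the fast path) can be sketched
outside the vDSO. Everything named sk_* below is hypothetical; the mode
numbers are copied from the asm/clocksource.h hunk above, and the page read
is replaced by a plain parameter so the sketch can run anywhere.

```c
#include <stdint.h>

/* Mode numbers as in the patch's asm/clocksource.h hunk. */
enum {
	SK_VCLOCK_NONE = 0,
	SK_VCLOCK_HVCLOCK = 3,
};

/*
 * Stand-in for vread_hvclock(). 'page_value' plays the role of the
 * hv_read_tsc_page() result, where UINT64_MAX means "page unusable".
 */
static uint64_t sk_vread_hvclock(uint64_t page_value, int *mode)
{
	if (page_value != UINT64_MAX)
		return page_value;

	*mode = SK_VCLOCK_NONE;
	return 0;
}

/*
 * Returns 1 when the fast path produced a value in *cycles, 0 when the
 * caller has to fall back to the real system call.
 */
static int sk_read_cycles(int vclock_mode, uint64_t page_value,
			  uint64_t *cycles)
{
	int mode = vclock_mode;

	if (vclock_mode != SK_VCLOCK_HVCLOCK)
		return 0;	/* other modes are outside this sketch */

	*cycles = sk_vread_hvclock(page_value, &mode);
	return mode != SK_VCLOCK_NONE;
}
```

The point of the design is that a sequence of 0 on the TSC page never yields
a bogus timestamp in userspace: the reader reports failure through the mode
and the slower path takes over.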

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 3/3] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
  2017-03-03 13:21 ` Vitaly Kuznetsov
                   ` (4 preceding siblings ...)
  (?)
@ 2017-03-03 13:21 ` Vitaly Kuznetsov
  -1 siblings, 0 replies; 15+ messages in thread
From: Vitaly Kuznetsov @ 2017-03-03 13:21 UTC (permalink / raw)
  To: x86, Andy Lutomirski, Thomas Gleixner
  Cc: Stephen Hemminger, Haiyang Zhang, linux-kernel, virtualization,
	Ingo Molnar, H. Peter Anvin, devel

Hyper-V TSC page clocksource is suitable for vDSO, however, the protocol
defined by the hypervisor is different from VCLOCK_PVCLOCK. Implement the
required support by adding hvclock_page VVAR.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/entry/vdso/vclock_gettime.c  | 24 ++++++++++++++++++++++++
 arch/x86/entry/vdso/vdso-layout.lds.S |  3 ++-
 arch/x86/entry/vdso/vdso2c.c          |  3 +++
 arch/x86/entry/vdso/vma.c             |  7 +++++++
 arch/x86/hyperv/hv_init.c             |  3 +++
 arch/x86/include/asm/clocksource.h    |  3 ++-
 arch/x86/include/asm/vdso.h           |  1 +
 7 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index 9d4d6e1..fa8dbfc 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -17,6 +17,7 @@
 #include <asm/unistd.h>
 #include <asm/msr.h>
 #include <asm/pvclock.h>
+#include <asm/mshyperv.h>
 #include <linux/math64.h>
 #include <linux/time.h>
 #include <linux/kernel.h>
@@ -32,6 +33,11 @@ extern u8 pvclock_page
 	__attribute__((visibility("hidden")));
 #endif
 
+#ifdef CONFIG_HYPERV_TSCPAGE
+extern u8 hvclock_page
+	__attribute__((visibility("hidden")));
+#endif
+
 #ifndef BUILD_VDSO32
 
 notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
@@ -141,6 +147,20 @@ static notrace u64 vread_pvclock(int *mode)
 	return last;
 }
 #endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+static notrace u64 vread_hvclock(int *mode)
+{
+	const struct ms_hyperv_tsc_page *tsc_pg =
+		(const struct ms_hyperv_tsc_page *)&hvclock_page;
+	u64 current_tick = hv_read_tsc_page(tsc_pg);
+
+	if (current_tick != U64_MAX)
+		return current_tick;
+
+	*mode = VCLOCK_NONE;
+	return 0;
+}
+#endif
 
 notrace static u64 vread_tsc(void)
 {
@@ -173,6 +193,10 @@ notrace static inline u64 vgetsns(int *mode)
 	else if (gtod->vclock_mode == VCLOCK_PVCLOCK)
 		cycles = vread_pvclock(mode);
 #endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+	else if (gtod->vclock_mode == VCLOCK_HVCLOCK)
+		cycles = vread_hvclock(mode);
+#endif
 	else
 		return 0;
 	v = (cycles - gtod->cycle_last) & gtod->mask;
diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S
index a708aa9..8ebb4b6 100644
--- a/arch/x86/entry/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
@@ -25,7 +25,7 @@ SECTIONS
 	 * segment.
 	 */
 
-	vvar_start = . - 2 * PAGE_SIZE;
+	vvar_start = . - 3 * PAGE_SIZE;
 	vvar_page = vvar_start;
 
 	/* Place all vvars at the offsets in asm/vvar.h. */
@@ -36,6 +36,7 @@ SECTIONS
 #undef EMIT_VVAR
 
 	pvclock_page = vvar_start + PAGE_SIZE;
+	hvclock_page = vvar_start + 2 * PAGE_SIZE;
 
 	. = SIZEOF_HEADERS;
 
diff --git a/arch/x86/entry/vdso/vdso2c.c b/arch/x86/entry/vdso/vdso2c.c
index 491020b..0780a44 100644
--- a/arch/x86/entry/vdso/vdso2c.c
+++ b/arch/x86/entry/vdso/vdso2c.c
@@ -74,6 +74,7 @@ enum {
 	sym_vvar_page,
 	sym_hpet_page,
 	sym_pvclock_page,
+	sym_hvclock_page,
 	sym_VDSO_FAKE_SECTION_TABLE_START,
 	sym_VDSO_FAKE_SECTION_TABLE_END,
 };
@@ -82,6 +83,7 @@ const int special_pages[] = {
 	sym_vvar_page,
 	sym_hpet_page,
 	sym_pvclock_page,
+	sym_hvclock_page,
 };
 
 struct vdso_sym {
@@ -94,6 +96,7 @@ struct vdso_sym required_syms[] = {
 	[sym_vvar_page] = {"vvar_page", true},
 	[sym_hpet_page] = {"hpet_page", true},
 	[sym_pvclock_page] = {"pvclock_page", true},
+	[sym_hvclock_page] = {"hvclock_page", true},
 	[sym_VDSO_FAKE_SECTION_TABLE_START] = {
 		"VDSO_FAKE_SECTION_TABLE_START", false
 	},
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 572cee3..71f5d3a 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -21,6 +21,7 @@
 #include <asm/page.h>
 #include <asm/desc.h>
 #include <asm/cpufeature.h>
+#include <asm/mshyperv.h>
 
 #if defined(CONFIG_X86_64)
 unsigned int __read_mostly vdso64_enabled = 1;
@@ -120,6 +121,12 @@ static int vvar_fault(const struct vm_special_mapping *sm,
 				vmf->address,
 				__pa(pvti) >> PAGE_SHIFT);
 		}
+	} else if (sym_offset == image->sym_hvclock_page) {
+		struct ms_hyperv_tsc_page *tsc_pg = hv_get_tsc_page();
+
+		if (tsc_pg && vclock_was_used(VCLOCK_HVCLOCK))
+			ret = vm_insert_pfn(vma, vmf->address,
+					    vmalloc_to_pfn(tsc_pg));
 	}
 
 	if (ret == 0 || ret == -EBUSY)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 63dd00e..d08ac5e 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -132,6 +132,9 @@ void hyperv_init(void)
 		tsc_msr.guest_physical_address = vmalloc_to_pfn(tsc_pg);
 
 		wrmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr.as_uint64);
+
+		hyperv_cs_tsc.archdata.vclock_mode = VCLOCK_HVCLOCK;
+
 		clocksource_register_hz(&hyperv_cs_tsc, NSEC_PER_SEC/100);
 		return;
 	}
diff --git a/arch/x86/include/asm/clocksource.h b/arch/x86/include/asm/clocksource.h
index eae33c7..47bea8c 100644
--- a/arch/x86/include/asm/clocksource.h
+++ b/arch/x86/include/asm/clocksource.h
@@ -6,7 +6,8 @@
 #define VCLOCK_NONE	0	/* No vDSO clock available.		*/
 #define VCLOCK_TSC	1	/* vDSO should use vread_tsc.		*/
 #define VCLOCK_PVCLOCK	2	/* vDSO should use vread_pvclock.	*/
-#define VCLOCK_MAX	2
+#define VCLOCK_HVCLOCK	3	/* vDSO should use vread_hvclock.	*/
+#define VCLOCK_MAX	3
 
 struct arch_clocksource_data {
 	int vclock_mode;
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 2444189..bccdf49 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -20,6 +20,7 @@ struct vdso_image {
 	long sym_vvar_page;
 	long sym_hpet_page;
 	long sym_pvclock_page;
+	long sym_hvclock_page;
 	long sym_VDSO32_NOTE_MASK;
 	long sym___kernel_sigreturn;
 	long sym___kernel_rt_sigreturn;
-- 
2.9.3

* Re: [PATCH v3 2/3] x86/hyperv: move TSC reading method to asm/mshyperv.h
  2017-03-03 13:21 ` Vitaly Kuznetsov
@ 2017-03-03 19:31   ` Stephen Hemminger
  2017-03-04  9:07       ` Thomas Gleixner
  2017-03-03 19:31   ` Stephen Hemminger
  2017-03-11 13:52   ` [tip:x86/vdso] x86/hyperv: Move " tip-bot for Vitaly Kuznetsov
  2 siblings, 1 reply; 15+ messages in thread
From: Stephen Hemminger @ 2017-03-03 19:31 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: x86, Andy Lutomirski, Thomas Gleixner, Stephen Hemminger,
	Haiyang Zhang, linux-kernel, virtualization, Ingo Molnar,
	H. Peter Anvin, devel


Minor coding comments

> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index d324dce..4ff25436 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -178,6 +178,56 @@ void hyperv_cleanup(void);
>  #endif
>  #ifdef CONFIG_HYPERV_TSCPAGE
>  struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
> +static inline u64 hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
> +{
> +	u64 scale, offset, current_tick, cur_tsc;
> +	u32 sequence;
> +
> +	/*
> +	 * The protocol for reading Hyper-V TSC page is specified in Hypervisor
> +	 * Top-Level Functional Specification ver. 3.0 and above. To get the
> +	 * reference time we must do the following:
> +	 * - READ ReferenceTscSequence
> +	 *   A special '0' value indicates the time source is unreliable and we
> +	 *   need to use something else. The currently published specification
> +	 *   versions (up to 4.0b) contain a mistake and wrongly claim '-1'
> +	 *   instead of '0' as the special value, see commit c35b82ef0294.
> +	 * - ReferenceTime =
> +	 *        ((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset
> +	 * - READ ReferenceTscSequence again. In case its value has changed
> +	 *   since our first reading we need to discard ReferenceTime and repeat
> +	 *   the whole sequence as the hypervisor was updating the page in
> +	 *   between.
> +	 */
> +	while (1) {
> +		sequence = READ_ONCE(tsc_pg->tsc_sequence);
> +		if (!sequence)
> +			break;

It would be clearer to just return U64_MAX here (and not fall out of the
loop) since this is the only case that exits it this way. Also, since this
failure only occurs if the host clock is not available, the check should
probably be marked unlikely().

> +		/*
> +		 * Make sure we read sequence before we read other values from
> +		 * TSC page.
> +		 */
> +		smp_rmb();
> +
> +		scale = READ_ONCE(tsc_pg->tsc_scale);
> +		offset = READ_ONCE(tsc_pg->tsc_offset);
> +		cur_tsc = rdtsc_ordered();

Since you already have smp_ barriers and rdtsc_ordered is a barrier,
the compiler barriers (READ_ONCE()) shouldn't be necessary.

> +
> +		current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
> +
> +		/*
> +		 * Make sure we read sequence after we read all other values
> +		 * from TSC page.
> +		 */
> +		smp_rmb();
> +
> +		if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
> +			return current_tick;
> +	}

Why not make a do { } while loop out of this?

	do {
...
	} while (unlikely(READ_ONCE(tsc_pg->tsc_sequence) != sequence));
	return current_tick;

Also, there is no need to calculate the tick value until we have good data.
As in:

static inline u32 hv_clock_sequence(const struct ms_hyperv_tsc_page *tsc_pg)
{
	u32 sequence =
	return sequence;
}

static inline u64 hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
{
	u64 scale, offset, cur_tsc;
	u32 start;

	/*
	 * The protocol for reading Hyper-V TSC page is specified in Hypervisor
	 * Top-Level Functional Specification ver. 3.0 and above. To get the
	 * reference time we must do the following:
	 * - READ ReferenceTscSequence
	 *   A special '0' value indicates the time source is unreliable and we
	 *   need to use something else. The currently published specification
	 *   versions (up to 4.0b) contain a mistake and wrongly claim '-1'
	 *   instead of '0' as the special value, see commit c35b82ef0294.
	 * - ReferenceTime =
	 *        ((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset
	 * - READ ReferenceTscSequence again. In case its value has changed
	 *   since our first reading we need to discard ReferenceTime and repeat
	 *   the whole sequence as the hypervisor was updating the page in
	 *   between.
	 */
	do {
		start = READ_ONCE(tsc_pg->tsc_sequence);
		smp_rmb();

		if (unlikely(!start))
			return U64_MAX;

		scale = tsc_pg->tsc_scale;
		offset = tsc_pg->tsc_offset;

		/*
		 * Make sure we read sequence after we read all other values
		 * from TSC page.
		 */
		smp_rmb();
	} while (unlikely(READ_ONCE(tsc_pg->tsc_sequence) != start));

	cur_tsc = rdtsc_ordered();
	return mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
}

* Re: [PATCH v3 2/3] x86/hyperv: move TSC reading method to asm/mshyperv.h
  2017-03-03 19:31   ` Stephen Hemminger
@ 2017-03-04  9:07       ` Thomas Gleixner
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2017-03-04  9:07 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Vitaly Kuznetsov, x86, Andy Lutomirski, Stephen Hemminger,
	Haiyang Zhang, linux-kernel, virtualization, Ingo Molnar,
	H. Peter Anvin, devel

On Fri, 3 Mar 2017, Stephen Hemminger wrote:
> static inline u64 hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
> {
> 	u64 scale, offset, cur_tsc;
> 	u32 start;
> 
> 	/*
> 	 * The protocol for reading Hyper-V TSC page is specified in Hypervisor
> 	 * Top-Level Functional Specification ver. 3.0 and above. To get the
> 	 * reference time we must do the following:
> 	 * - READ ReferenceTscSequence
> 	 *   A special '0' value indicates the time source is unreliable and we
> 	 *   need to use something else. The currently published specification
> 	 *   versions (up to 4.0b) contain a mistake and wrongly claim '-1'
> 	 *   instead of '0' as the special value, see commit c35b82ef0294.
> 	 * - ReferenceTime =
> 	 *        ((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset
> 	 * - READ ReferenceTscSequence again. In case its value has changed
> 	 *   since our first reading we need to discard ReferenceTime and repeat
> 	 *   the whole sequence as the hypervisor was updating the page in
> 	 *   between.
> 	 */
> 	do {
> 		start = READ_ONCE(tsc_pg->tsc_sequence);
> 		smp_rmb();
> 
> 		if (unlikely(!start))
> 			return U64_MAX;
> 
> 		scale = tsc_pg->tsc_scale;
> 		offset = tsc_pg->tsc_offset;
> 
> 		/*
> 		 * Make sure we read sequence after we read all other values
> 		 * from TSC page.
> 		 */
> 		smp_rmb();
> 	} while (unlikely(READ_ONCE(tsc_pg->tsc_sequence) != start));
> 
> 	cur_tsc = rdtsc_ordered();

That's wrong. You need to read the TSC value together with the scale and
offset. That needs to be "atomic": all three reads must happen inside the
same sequence-checked window. You can only do the mult/shift outside.

> 	return mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
> }

Thanks,

	tglx
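
The point above can be sketched in userspace C: the TSC sample is taken
inside the same sequence-checked window as scale and offset, and only the
multiply/shift is left for after the loop. This is a minimal
single-threaded illustration, not the kernel code. The names
struct demo_ref_page, demo_fake_tsc() and demo_read_ref_time() are
hypothetical, the read barriers are only marked by comments because nothing
updates the page concurrently here, and unsigned __int128 (a GCC/Clang
extension) stands in for mul_u64_u64_shr().

```c
#include <stdint.h>

/* Hypothetical stand-in for the page the hypervisor would provide. */
struct demo_ref_page {
	volatile uint32_t sequence;	/* 0 means "page not usable" */
	volatile uint64_t scale;
	volatile uint64_t offset;
};

/* Hypothetical, deterministic TSC source for the sketch. */
static uint64_t demo_fake_tsc_value = 1000;

static uint64_t demo_fake_tsc(void)
{
	return demo_fake_tsc_value;
}

/* Returns UINT64_MAX when the page cannot be used as a time source. */
static uint64_t demo_read_ref_time(const struct demo_ref_page *pg)
{
	uint64_t scale, offset, tsc;
	uint32_t seq;

	do {
		seq = pg->sequence;
		if (!seq)
			return UINT64_MAX;
		/* Real code needs a read barrier here. */
		scale = pg->scale;
		offset = pg->offset;
		tsc = demo_fake_tsc();	/* sampled inside the window */
		/* Real code needs a second read barrier here. */
	} while (pg->sequence != seq);

	/* Only the arithmetic happens outside the loop. */
	return (uint64_t)(((unsigned __int128)tsc * scale) >> 64) + offset;
}
```

If the TSC were read after the loop instead, an update of scale and offset
between the final sequence check and the TSC read would go unnoticed, and
the result would combine a new TSC value with stale parameters.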

* [tip:x86/vdso] x86/hyperv: Implement hv_get_tsc_page()
  2017-03-03 13:21 ` Vitaly Kuznetsov
@ 2017-03-11 13:51   ` tip-bot for Vitaly Kuznetsov
  0 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Vitaly Kuznetsov @ 2017-03-11 13:51 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, decui, linux-kernel, vkuznets, haiyangz, hpa, luto,
	sthemmin, tglx, kys

Commit-ID:  bd2a9adaadb8defcaf6c284bca7ff41634105f51
Gitweb:     http://git.kernel.org/tip/bd2a9adaadb8defcaf6c284bca7ff41634105f51
Author:     Vitaly Kuznetsov <vkuznets@redhat.com>
AuthorDate: Fri, 3 Mar 2017 14:21:40 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 11 Mar 2017 14:47:28 +0100

x86/hyperv: Implement hv_get_tsc_page()

To use Hyper-V TSC page clocksource from vDSO we need to make tsc_pg
available. Implement hv_get_tsc_page() and add CONFIG_HYPERV_TSCPAGE to
make #ifdef-s simple.
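
The "#ifdef-s simple" idea can be sketched as follows: the accessor is
declared when the feature is built in and replaced by a static inline stub
returning NULL otherwise, so callers only need a NULL check. This is a
standalone illustration under assumed names; DEMO_TSCPAGE,
struct demo_tsc_page, demo_get_tsc_page() and demo_can_map_page() are
hypothetical and merely mirror the shape of the patch.

```c
#include <stddef.h>

struct demo_tsc_page;	/* opaque in this sketch */

/* DEMO_TSCPAGE is a hypothetical stand-in for CONFIG_HYPERV_TSCPAGE. */
#ifdef DEMO_TSCPAGE
struct demo_tsc_page *demo_get_tsc_page(void);	/* defined elsewhere */
#else
static inline struct demo_tsc_page *demo_get_tsc_page(void)
{
	return NULL;	/* feature compiled out: callers see "no page" */
}
#endif

/* The caller has no #ifdef of its own; a NULL check is enough. */
static int demo_can_map_page(void)
{
	return demo_get_tsc_page() != NULL;
}
```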

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: devel@linuxdriverproject.org
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20170303132142.25595-2-vkuznets@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/hyperv/hv_init.c       | 9 +++++++--
 arch/x86/include/asm/mshyperv.h | 8 ++++++++
 drivers/hv/Kconfig              | 3 +++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 8bef70e..bb1ea58 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -27,10 +27,15 @@
 #include <linux/clockchips.h>
 
 
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_HYPERV_TSCPAGE
 
 static struct ms_hyperv_tsc_page *tsc_pg;
 
+struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
+{
+	return tsc_pg;
+}
+
 static u64 read_hv_clock_tsc(struct clocksource *arg)
 {
 	u64 current_tick;
@@ -139,7 +144,7 @@ void hyperv_init(void)
 	/*
 	 * Register Hyper-V specific clocksource.
 	 */
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_HYPERV_TSCPAGE
 	if (ms_hyperv.features & HV_X64_MSR_REFERENCE_TSC_AVAILABLE) {
 		union hv_x64_msr_hypercall_contents tsc_msr;
 
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 7c9c895..d324dce 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -176,4 +176,12 @@ void hyperv_report_panic(struct pt_regs *regs);
 bool hv_is_hypercall_page_setup(void);
 void hyperv_cleanup(void);
 #endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
+#else
+static inline struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
+{
+	return NULL;
+}
+#endif
 #endif
diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index 0403b51..c29cd53 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -7,6 +7,9 @@ config HYPERV
 	  Select this option to run Linux as a Hyper-V client operating
 	  system.
 
+config HYPERV_TSCPAGE
+       def_bool HYPERV && X86_64
+
 config HYPERV_UTILS
 	tristate "Microsoft Hyper-V Utilities driver"
 	depends on HYPERV && CONNECTOR && NLS

* [tip:x86/vdso] x86/hyperv: Move TSC reading method to asm/mshyperv.h
  2017-03-03 13:21 ` Vitaly Kuznetsov
  2017-03-03 19:31   ` Stephen Hemminger
  2017-03-03 19:31   ` Stephen Hemminger
@ 2017-03-11 13:52   ` tip-bot for Vitaly Kuznetsov
  2 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Vitaly Kuznetsov @ 2017-03-11 13:52 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: vkuznets, sthemmin, luto, kys, haiyangz, mingo, hpa, decui, tglx,
	linux-kernel

Commit-ID:  0733379b512ce36ba0b10942f9597b74f579f063
Gitweb:     http://git.kernel.org/tip/0733379b512ce36ba0b10942f9597b74f579f063
Author:     Vitaly Kuznetsov <vkuznets@redhat.com>
AuthorDate: Fri, 3 Mar 2017 14:21:41 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 11 Mar 2017 14:47:28 +0100

x86/hyperv: Move TSC reading method to asm/mshyperv.h

In preparation for making the Hyper-V TSC page suitable for vDSO, move
the TSC page reading logic to asm/mshyperv.h. While at it, do the
following:

- Document the reading algorithm.
- Simplify the code a bit.
- Add explicit READ_ONCE() to not rely on 'volatile'.
- Add explicit barriers to prevent re-ordering (the sequence must be read
  strictly before and after the other values).
- Use mul_u64_u64_shr() instead of assembly, gcc generates a single 'mul'
  instruction on x86_64 anyway.
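
For readers outside the kernel tree, the arithmetic in the last item can be
sketched in userspace C. This assumes a compiler with the unsigned __int128
extension (GCC or Clang on a 64-bit target); demo_mul_shr64() and
demo_ref_time() are hypothetical names, not kernel APIs.

```c
#include <stdint.h>

/* High 64 bits of the 128-bit product a * b, i.e. (a * b) >> 64. */
static uint64_t demo_mul_shr64(uint64_t a, uint64_t b)
{
	return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* ReferenceTime = ((tsc * scale) >> 64) + offset */
static uint64_t demo_ref_time(uint64_t tsc, uint64_t scale, uint64_t offset)
{
	return demo_mul_shr64(tsc, scale) + offset;
}
```

With scale = 2^63 the multiply/shift halves the TSC value, so a TSC of 1000
and an offset of 7 give a reference time of 507.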

[ tglx: Simplified the loop ]

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: devel@linuxdriverproject.org
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20170303132142.25595-3-vkuznets@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/hyperv/hv_init.c       | 36 ++++----------------------------
 arch/x86/include/asm/mshyperv.h | 46 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+), 32 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index bb1ea58..7f51523 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -38,39 +38,11 @@ struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 
 static u64 read_hv_clock_tsc(struct clocksource *arg)
 {
-	u64 current_tick;
+	u64 current_tick = hv_read_tsc_page(tsc_pg);
+
+	if (current_tick == U64_MAX)
+		rdmsrl(HV_X64_MSR_TIME_REF_COUNT, current_tick);
 
-	if (tsc_pg->tsc_sequence != 0) {
-		/*
-		 * Use the tsc page to compute the value.
-		 */
-
-		while (1) {
-			u64 tmp;
-			u32 sequence = tsc_pg->tsc_sequence;
-			u64 cur_tsc;
-			u64 scale = tsc_pg->tsc_scale;
-			s64 offset = tsc_pg->tsc_offset;
-
-			rdtscll(cur_tsc);
-			/* current_tick = ((cur_tsc *scale) >> 64) + offset */
-			asm("mulq %3"
-				: "=d" (current_tick), "=a" (tmp)
-				: "a" (cur_tsc), "r" (scale));
-
-			current_tick += offset;
-			if (tsc_pg->tsc_sequence == sequence)
-				return current_tick;
-
-			if (tsc_pg->tsc_sequence != 0)
-				continue;
-			/*
-			 * Fallback using MSR method.
-			 */
-			break;
-		}
-	}
-	rdmsrl(HV_X64_MSR_TIME_REF_COUNT, current_tick);
 	return current_tick;
 }
 
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index d324dce..fba1007 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -178,6 +178,52 @@ void hyperv_cleanup(void);
 #endif
 #ifdef CONFIG_HYPERV_TSCPAGE
 struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
+static inline u64 hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
+{
+	u64 scale, offset, cur_tsc;
+	u32 sequence;
+
+	/*
+	 * The protocol for reading Hyper-V TSC page is specified in Hypervisor
+	 * Top-Level Functional Specification ver. 3.0 and above. To get the
+	 * reference time we must do the following:
+	 * - READ ReferenceTscSequence
+	 *   A special '0' value indicates the time source is unreliable and we
+	 *   need to use something else. The currently published specification
+	 *   versions (up to 4.0b) contain a mistake and wrongly claim '-1'
+	 *   instead of '0' as the special value, see commit c35b82ef0294.
+	 * - ReferenceTime =
+	 *        ((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset
+	 * - READ ReferenceTscSequence again. In case its value has changed
+	 *   since our first reading we need to discard ReferenceTime and repeat
+	 *   the whole sequence as the hypervisor was updating the page in
+	 *   between.
+	 */
+	do {
+		sequence = READ_ONCE(tsc_pg->tsc_sequence);
+		if (!sequence)
+			return U64_MAX;
+		/*
+		 * Make sure we read sequence before we read other values from
+		 * TSC page.
+		 */
+		smp_rmb();
+
+		scale = READ_ONCE(tsc_pg->tsc_scale);
+		offset = READ_ONCE(tsc_pg->tsc_offset);
+		cur_tsc = rdtsc_ordered();
+
+		/*
+		 * Make sure we read sequence after we read all other values
+		 * from TSC page.
+		 */
+		smp_rmb();
+
+	} while (READ_ONCE(tsc_pg->tsc_sequence) != sequence);
+
+	return mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
+}
+
 #else
 static inline struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 {

* [tip:x86/vdso] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
  2017-03-03 13:21 ` Vitaly Kuznetsov
@ 2017-03-11 13:52   ` tip-bot for Vitaly Kuznetsov
  0 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Vitaly Kuznetsov @ 2017-03-11 13:52 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: vkuznets, decui, luto, sthemmin, hpa, kys, mingo, linux-kernel,
	haiyangz, tglx

Commit-ID:  90b20432aeb850ef84086a72893cd9411479d896
Gitweb:     http://git.kernel.org/tip/90b20432aeb850ef84086a72893cd9411479d896
Author:     Vitaly Kuznetsov <vkuznets@redhat.com>
AuthorDate: Fri, 3 Mar 2017 14:21:42 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 11 Mar 2017 14:47:28 +0100

x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

Hyper-V TSC page clocksource is suitable for vDSO, however, the protocol
defined by the hypervisor is different from VCLOCK_PVCLOCK. Implement the
required support by adding hvclock_page VVAR.
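
The fallback behaviour of the new read method can be sketched in isolation:
when the page reader reports its "unusable" sentinel, the reader downgrades
the mode so the caller can take a slower path. This is a simplified
userspace model with hypothetical names (demo_read_page(), demo_vread(),
demo_gettime() and the DEMO_VCLOCK_* constants); in particular, treating a
mode of "none" as "fall back to the system call" is an assumption about the
surrounding vDSO code, which is not shown in this patch.

```c
#include <stdint.h>

enum { DEMO_VCLOCK_NONE = 0, DEMO_VCLOCK_HVCLOCK = 3 };

/* Hypothetical page reader: UINT64_MAX signals "page unusable". */
static uint64_t demo_page_value = UINT64_MAX;

static uint64_t demo_read_page(void)
{
	return demo_page_value;
}

/* Mirrors the shape of vread_hvclock(): downgrade the mode on failure. */
static uint64_t demo_vread(int *mode)
{
	uint64_t tick = demo_read_page();

	if (tick != UINT64_MAX)
		return tick;

	*mode = DEMO_VCLOCK_NONE;
	return 0;
}

/* Returns 1 if the fast path produced a value, 0 if a fallback is needed. */
static int demo_gettime(uint64_t *out)
{
	int mode = DEMO_VCLOCK_HVCLOCK;

	*out = demo_vread(&mode);
	return mode != DEMO_VCLOCK_NONE;
}
```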

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: devel@linuxdriverproject.org
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20170303132142.25595-4-vkuznets@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/entry/vdso/vclock_gettime.c  | 24 ++++++++++++++++++++++++
 arch/x86/entry/vdso/vdso-layout.lds.S |  3 ++-
 arch/x86/entry/vdso/vdso2c.c          |  3 +++
 arch/x86/entry/vdso/vma.c             |  7 +++++++
 arch/x86/hyperv/hv_init.c             |  3 +++
 arch/x86/include/asm/clocksource.h    |  3 ++-
 arch/x86/include/asm/vdso.h           |  1 +
 7 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index 9d4d6e1..fa8dbfc 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -17,6 +17,7 @@
 #include <asm/unistd.h>
 #include <asm/msr.h>
 #include <asm/pvclock.h>
+#include <asm/mshyperv.h>
 #include <linux/math64.h>
 #include <linux/time.h>
 #include <linux/kernel.h>
@@ -32,6 +33,11 @@ extern u8 pvclock_page
 	__attribute__((visibility("hidden")));
 #endif
 
+#ifdef CONFIG_HYPERV_TSCPAGE
+extern u8 hvclock_page
+	__attribute__((visibility("hidden")));
+#endif
+
 #ifndef BUILD_VDSO32
 
 notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
@@ -141,6 +147,20 @@ static notrace u64 vread_pvclock(int *mode)
 	return last;
 }
 #endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+static notrace u64 vread_hvclock(int *mode)
+{
+	const struct ms_hyperv_tsc_page *tsc_pg =
+		(const struct ms_hyperv_tsc_page *)&hvclock_page;
+	u64 current_tick = hv_read_tsc_page(tsc_pg);
+
+	if (current_tick != U64_MAX)
+		return current_tick;
+
+	*mode = VCLOCK_NONE;
+	return 0;
+}
+#endif
 
 notrace static u64 vread_tsc(void)
 {
@@ -173,6 +193,10 @@ notrace static inline u64 vgetsns(int *mode)
 	else if (gtod->vclock_mode == VCLOCK_PVCLOCK)
 		cycles = vread_pvclock(mode);
 #endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+	else if (gtod->vclock_mode == VCLOCK_HVCLOCK)
+		cycles = vread_hvclock(mode);
+#endif
 	else
 		return 0;
 	v = (cycles - gtod->cycle_last) & gtod->mask;
diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S
index a708aa9..8ebb4b6 100644
--- a/arch/x86/entry/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
@@ -25,7 +25,7 @@ SECTIONS
 	 * segment.
 	 */
 
-	vvar_start = . - 2 * PAGE_SIZE;
+	vvar_start = . - 3 * PAGE_SIZE;
 	vvar_page = vvar_start;
 
 	/* Place all vvars at the offsets in asm/vvar.h. */
@@ -36,6 +36,7 @@ SECTIONS
 #undef EMIT_VVAR
 
 	pvclock_page = vvar_start + PAGE_SIZE;
+	hvclock_page = vvar_start + 2 * PAGE_SIZE;
 
 	. = SIZEOF_HEADERS;
 
diff --git a/arch/x86/entry/vdso/vdso2c.c b/arch/x86/entry/vdso/vdso2c.c
index 491020b..0780a44 100644
--- a/arch/x86/entry/vdso/vdso2c.c
+++ b/arch/x86/entry/vdso/vdso2c.c
@@ -74,6 +74,7 @@ enum {
 	sym_vvar_page,
 	sym_hpet_page,
 	sym_pvclock_page,
+	sym_hvclock_page,
 	sym_VDSO_FAKE_SECTION_TABLE_START,
 	sym_VDSO_FAKE_SECTION_TABLE_END,
 };
@@ -82,6 +83,7 @@ const int special_pages[] = {
 	sym_vvar_page,
 	sym_hpet_page,
 	sym_pvclock_page,
+	sym_hvclock_page,
 };
 
 struct vdso_sym {
@@ -94,6 +96,7 @@ struct vdso_sym required_syms[] = {
 	[sym_vvar_page] = {"vvar_page", true},
 	[sym_hpet_page] = {"hpet_page", true},
 	[sym_pvclock_page] = {"pvclock_page", true},
+	[sym_hvclock_page] = {"hvclock_page", true},
 	[sym_VDSO_FAKE_SECTION_TABLE_START] = {
 		"VDSO_FAKE_SECTION_TABLE_START", false
 	},
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 226ca70..faf80fd 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -22,6 +22,7 @@
 #include <asm/page.h>
 #include <asm/desc.h>
 #include <asm/cpufeature.h>
+#include <asm/mshyperv.h>
 
 #if defined(CONFIG_X86_64)
 unsigned int __read_mostly vdso64_enabled = 1;
@@ -121,6 +122,12 @@ static int vvar_fault(const struct vm_special_mapping *sm,
 				vmf->address,
 				__pa(pvti) >> PAGE_SHIFT);
 		}
+	} else if (sym_offset == image->sym_hvclock_page) {
+		struct ms_hyperv_tsc_page *tsc_pg = hv_get_tsc_page();
+
+		if (tsc_pg && vclock_was_used(VCLOCK_HVCLOCK))
+			ret = vm_insert_pfn(vma, vmf->address,
+					    vmalloc_to_pfn(tsc_pg));
 	}
 
 	if (ret == 0 || ret == -EBUSY)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 7f51523..2b01421 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -132,6 +132,9 @@ void hyperv_init(void)
 		tsc_msr.guest_physical_address = vmalloc_to_pfn(tsc_pg);
 
 		wrmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr.as_uint64);
+
+		hyperv_cs_tsc.archdata.vclock_mode = VCLOCK_HVCLOCK;
+
 		clocksource_register_hz(&hyperv_cs_tsc, NSEC_PER_SEC/100);
 		return;
 	}
diff --git a/arch/x86/include/asm/clocksource.h b/arch/x86/include/asm/clocksource.h
index eae33c7..47bea8c 100644
--- a/arch/x86/include/asm/clocksource.h
+++ b/arch/x86/include/asm/clocksource.h
@@ -6,7 +6,8 @@
 #define VCLOCK_NONE	0	/* No vDSO clock available.		*/
 #define VCLOCK_TSC	1	/* vDSO should use vread_tsc.		*/
 #define VCLOCK_PVCLOCK	2	/* vDSO should use vread_pvclock.	*/
-#define VCLOCK_MAX	2
+#define VCLOCK_HVCLOCK	3	/* vDSO should use vread_hvclock.	*/
+#define VCLOCK_MAX	3
 
 struct arch_clocksource_data {
 	int vclock_mode;
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 2444189..bccdf49 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -20,6 +20,7 @@ struct vdso_image {
 	long sym_vvar_page;
 	long sym_hpet_page;
 	long sym_pvclock_page;
+	long sym_hvclock_page;
 	long sym_VDSO32_NOTE_MASK;
 	long sym___kernel_sigreturn;
 	long sym___kernel_rt_sigreturn;

Thread overview: 15+ messages
2017-03-03 13:21 [PATCH v3 0/3] x86/vdso: Add Hyper-V TSC page clocksource support Vitaly Kuznetsov
2017-03-03 13:21 ` Vitaly Kuznetsov
2017-03-03 13:21 ` [PATCH v3 1/3] x86/hyperv: implement hv_get_tsc_page() Vitaly Kuznetsov
2017-03-03 13:21 ` Vitaly Kuznetsov
2017-03-11 13:51   ` [tip:x86/vdso] x86/hyperv: Implement hv_get_tsc_page() tip-bot for Vitaly Kuznetsov
2017-03-03 13:21 ` [PATCH v3 2/3] x86/hyperv: move TSC reading method to asm/mshyperv.h Vitaly Kuznetsov
2017-03-03 13:21 ` Vitaly Kuznetsov
2017-03-03 19:31   ` Stephen Hemminger
2017-03-04  9:07     ` Thomas Gleixner
2017-03-04  9:07       ` Thomas Gleixner
2017-03-03 19:31   ` Stephen Hemminger
2017-03-11 13:52   ` [tip:x86/vdso] x86/hyperv: Move " tip-bot for Vitaly Kuznetsov
2017-03-03 13:21 ` [PATCH v3 3/3] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method Vitaly Kuznetsov
2017-03-03 13:21 ` Vitaly Kuznetsov
2017-03-11 13:52   ` [tip:x86/vdso] " tip-bot for Vitaly Kuznetsov
