Linux-HyperV Archive on lore.kernel.org
 help / color / Atom feed
* [RFC 0/6] vDSO support for Hyper-V guest on ARM64
@ 2019-12-16  0:19 Boqun Feng
  2019-12-16  0:19 ` [RFC 1/6] arm64: hyperv: Allow hv_get_raw_timer() definition to be overridden Boqun Feng
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Boqun Feng @ 2019-12-16  0:19 UTC (permalink / raw)
  To: linux-hyperv, linux-arm-kernel, linux-kernel
  Cc: Michael Kelley, Vincenzo Frascino, Catalin Marinas, Will Deacon,
	Thomas Gleixner, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, xen-devel, Stefano Stabellini,
	Boqun Feng

Hi,

This is the RFC patchset for vDSO support in ARM64 Hyper-V guest. To
test it, Michael's ARM64 support patchset:

	https://lore.kernel.org/linux-arm-kernel/1570129355-16005-1-git-send-email-mikelley@microsoft.com/

is needed.

Similar as x86, Hyper-V on ARM64 use a TSC page for guests to read
the virtualized hardware timer, this TSC page is read-only for the
guests, so could be used for vDSO data page. And the vDSO (userspace)
code could use the same code for timer reading as kernel, since
they read the same TSC page.

This patchset therefore extends ARM64's __vsdo_init() to allow multiple
data pages and introduces the vclock_mode concept similar to x86 to
allow different platforms (bare-metal, Hyper-V, etc.) to switch to
different __arch_get_hw_counter() implementations. The rest of this
patchset does the necessary setup for Hyper-V guests: mapping tsc page,
enabling userspace to read cntvct, etc. to enable vDSO.

This patchset consists of 6 patches:

patch #1 allows hv_get_raw_timer() definition to be overridden for
userspace and kernel to share the same hv_read_tsc_page() definition.

patch #2 extends ARM64 to support multiple vDSO data pages.

patch #3 introduces vclock_mode similiar to x86 to allow different
__arch_get_hw_counter() implementations for different clocksources.

patch #4 maps Hyper-V TSC page into vDSO data page.

patch #5 allows userspace to read cntvct, so that userspace can
efficiently read the clocksource.

patch #6 enables the vDSO for ARM64 Hyper-V guest.

The whole patchset is based on v5.5-rc1 plus Michael's ARM64 support
patchset, and I've done a few tests with:

	https://github.com/nlynch-mentor/vdsotest

Comments and suggestions are welcome!

Regards,
Boqun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC 1/6] arm64: hyperv: Allow hv_get_raw_timer() definition to be overridden
  2019-12-16  0:19 [RFC 0/6] vDSO support for Hyper-V guest on ARM64 Boqun Feng
@ 2019-12-16  0:19 ` Boqun Feng
  2019-12-16  0:19 ` [RFC 2/6] arm64: vdso: Add support for multiple vDSO data pages Boqun Feng
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Boqun Feng @ 2019-12-16  0:19 UTC (permalink / raw)
  To: linux-hyperv, linux-arm-kernel, linux-kernel
  Cc: Michael Kelley, Vincenzo Frascino, Catalin Marinas, Will Deacon,
	Thomas Gleixner, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, xen-devel, Stefano Stabellini,
	Boqun Feng

In order to support vDSO, hv_read_tsc_page() should be able to be called
from userspace if tsc page mapped. As a result, hv_get_raw_timer(),
called by hv_read_tsc_page() requires to be called by both kernel and
vDSO. Currently, it's defined as arch_timer_read_counter(), which is a
function pointer initialized (using a kernel address) by the arch timer
driver, therefore not usable in vDSO.

Fix this by allowing a previous definition to override the default one,
so that in vDSO code, we can define it as a function callable in
userspace.

Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com>
---
 arch/arm64/include/asm/mshyperv.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/mshyperv.h b/arch/arm64/include/asm/mshyperv.h
index a8468a611912..9cc4aeddf2d0 100644
--- a/arch/arm64/include/asm/mshyperv.h
+++ b/arch/arm64/include/asm/mshyperv.h
@@ -97,8 +97,15 @@ extern void hv_get_vpreg_128(u32 reg, struct hv_get_vp_register_output *result);
 #define hv_disable_stimer0_percpu_irq(irq)	disable_percpu_irq(irq)
 #endif
 
-/* ARM64 specific code to read the hardware clock */
+/*
+ * ARM64 specific code to read the hardware clock.
+ *
+ * This could be used in both kernel space and userspace (vDSO), so make it
+ * possible for a previous definition to override the default one.
+ */
+#ifndef hv_get_raw_timer
 #define hv_get_raw_timer() arch_timer_read_counter()
+#endif
 
 #include <asm-generic/mshyperv.h>
 
-- 
2.24.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC 2/6] arm64: vdso: Add support for multiple vDSO data pages
  2019-12-16  0:19 [RFC 0/6] vDSO support for Hyper-V guest on ARM64 Boqun Feng
  2019-12-16  0:19 ` [RFC 1/6] arm64: hyperv: Allow hv_get_raw_timer() definition to be overridden Boqun Feng
@ 2019-12-16  0:19 ` Boqun Feng
  2019-12-16  0:19 ` [RFC 3/6] arm/arm64: clocksource: Introduce vclock_mode Boqun Feng
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Boqun Feng @ 2019-12-16  0:19 UTC (permalink / raw)
  To: linux-hyperv, linux-arm-kernel, linux-kernel
  Cc: Michael Kelley, Vincenzo Frascino, Catalin Marinas, Will Deacon,
	Thomas Gleixner, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, xen-devel, Stefano Stabellini,
	Boqun Feng, Matteo Croce

Split __vdso_abi::vdso_pages into nr_vdso_{data,code}_pages, so that
__setup_additional_pages() could work with multiple vDSO data pages with
the setup from __vdso_init().

Multiple vDSO data pages are required when running in a virtualized
environment, where the cycles read from cntvct at userspace need to
be adjusted with some data from a page maintained by the hypervisor. For
example, the TSC page in Hyper-V.

This is a prerequisite for vDSO support in ARM64 on Hyper-V.

Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com>
---
 arch/arm64/kernel/vdso.c | 43 ++++++++++++++++++++++++----------------
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index 354b11e27c07..b9b5ec7a3084 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -50,7 +50,8 @@ struct __vdso_abi {
 	const char *name;
 	const char *vdso_code_start;
 	const char *vdso_code_end;
-	unsigned long vdso_pages;
+	unsigned long nr_vdso_data_pages;
+	unsigned long nr_vdso_code_pages;
 	/* Data Mapping */
 	struct vm_special_mapping *dm;
 	/* Code Mapping */
@@ -101,6 +102,8 @@ static int __vdso_init(enum arch_vdso_type arch_index)
 {
 	int i;
 	struct page **vdso_pagelist;
+	struct page **vdso_code_pagelist;
+	unsigned long nr_vdso_pages;
 	unsigned long pfn;
 
 	if (memcmp(vdso_lookup[arch_index].vdso_code_start, "\177ELF", 4)) {
@@ -108,14 +111,18 @@ static int __vdso_init(enum arch_vdso_type arch_index)
 		return -EINVAL;
 	}
 
-	vdso_lookup[arch_index].vdso_pages = (
+	vdso_lookup[arch_index].nr_vdso_data_pages = 1;
+
+	vdso_lookup[arch_index].nr_vdso_code_pages = (
 			vdso_lookup[arch_index].vdso_code_end -
 			vdso_lookup[arch_index].vdso_code_start) >>
 			PAGE_SHIFT;
 
-	/* Allocate the vDSO pagelist, plus a page for the data. */
-	vdso_pagelist = kcalloc(vdso_lookup[arch_index].vdso_pages + 1,
-				sizeof(struct page *),
+	nr_vdso_pages = vdso_lookup[arch_index].nr_vdso_data_pages +
+			vdso_lookup[arch_index].nr_vdso_code_pages;
+
+	/* Allocate the vDSO pagelist. */
+	vdso_pagelist = kcalloc(nr_vdso_pages, sizeof(struct page *),
 				GFP_KERNEL);
 	if (vdso_pagelist == NULL)
 		return -ENOMEM;
@@ -123,15 +130,17 @@ static int __vdso_init(enum arch_vdso_type arch_index)
 	/* Grab the vDSO data page. */
 	vdso_pagelist[0] = phys_to_page(__pa_symbol(vdso_data));
 
-
 	/* Grab the vDSO code pages. */
 	pfn = sym_to_pfn(vdso_lookup[arch_index].vdso_code_start);
 
-	for (i = 0; i < vdso_lookup[arch_index].vdso_pages; i++)
-		vdso_pagelist[i + 1] = pfn_to_page(pfn + i);
+	vdso_code_pagelist = vdso_pagelist +
+			     vdso_lookup[arch_index].nr_vdso_data_pages;
+
+	for (i = 0; i < vdso_lookup[arch_index].nr_vdso_code_pages; i++)
+		vdso_code_pagelist[i] = pfn_to_page(pfn + i);
 
-	vdso_lookup[arch_index].dm->pages = &vdso_pagelist[0];
-	vdso_lookup[arch_index].cm->pages = &vdso_pagelist[1];
+	vdso_lookup[arch_index].dm->pages = vdso_pagelist;
+	vdso_lookup[arch_index].cm->pages = vdso_code_pagelist;
 
 	return 0;
 }
@@ -141,26 +150,26 @@ static int __setup_additional_pages(enum arch_vdso_type arch_index,
 				    struct linux_binprm *bprm,
 				    int uses_interp)
 {
-	unsigned long vdso_base, vdso_text_len, vdso_mapping_len;
+	unsigned long vdso_base, vdso_text_len, vdso_data_len;
 	void *ret;
 
-	vdso_text_len = vdso_lookup[arch_index].vdso_pages << PAGE_SHIFT;
-	/* Be sure to map the data page */
-	vdso_mapping_len = vdso_text_len + PAGE_SIZE;
+	vdso_data_len = vdso_lookup[arch_index].nr_vdso_data_pages << PAGE_SHIFT;
+	vdso_text_len = vdso_lookup[arch_index].nr_vdso_code_pages << PAGE_SHIFT;
 
-	vdso_base = get_unmapped_area(NULL, 0, vdso_mapping_len, 0, 0);
+	vdso_base = get_unmapped_area(NULL, 0,
+				      vdso_data_len + vdso_text_len, 0, 0);
 	if (IS_ERR_VALUE(vdso_base)) {
 		ret = ERR_PTR(vdso_base);
 		goto up_fail;
 	}
 
-	ret = _install_special_mapping(mm, vdso_base, PAGE_SIZE,
+	ret = _install_special_mapping(mm, vdso_base, vdso_data_len,
 				       VM_READ|VM_MAYREAD,
 				       vdso_lookup[arch_index].dm);
 	if (IS_ERR(ret))
 		goto up_fail;
 
-	vdso_base += PAGE_SIZE;
+	vdso_base += vdso_data_len;
 	mm->context.vdso = (void *)vdso_base;
 	ret = _install_special_mapping(mm, vdso_base, vdso_text_len,
 				       VM_READ|VM_EXEC|
-- 
2.24.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC 3/6] arm/arm64: clocksource: Introduce vclock_mode
  2019-12-16  0:19 [RFC 0/6] vDSO support for Hyper-V guest on ARM64 Boqun Feng
  2019-12-16  0:19 ` [RFC 1/6] arm64: hyperv: Allow hv_get_raw_timer() definition to be overridden Boqun Feng
  2019-12-16  0:19 ` [RFC 2/6] arm64: vdso: Add support for multiple vDSO data pages Boqun Feng
@ 2019-12-16  0:19 ` Boqun Feng
  2019-12-16  0:19 ` [RFC 4/6] arm64: vdso: hyperv: Map tsc page into vDSO if enabled Boqun Feng
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Boqun Feng @ 2019-12-16  0:19 UTC (permalink / raw)
  To: linux-hyperv, linux-arm-kernel, linux-kernel
  Cc: Michael Kelley, Vincenzo Frascino, Catalin Marinas, Will Deacon,
	Thomas Gleixner, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, xen-devel, Stefano Stabellini,
	Boqun Feng, Russell King, Mark Rutland, Marc Zyngier,
	Daniel Lezcano, Allison Randal, Enrico Weigelt,
	Geert Uytterhoeven, Huacai Chen

Similar to x86, use a vclock_mode in arch_clocksource_data to differ
clocksoures use different read function in vDSO.

No functional changes, only preparation for support vDSO in ARM64 on
Hyper-V.

Note: the changes for arm are only because arm and arm64 share the same
code in the arch timer driver and require arch_clocksource_data having
the same field.

Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com>
---
 arch/arm/include/asm/clocksource.h                | 6 +++++-
 arch/arm/kernel/vdso.c                            | 1 -
 arch/arm64/include/asm/clocksource.h              | 6 +++++-
 arch/arm64/include/asm/mshyperv.h                 | 2 +-
 arch/arm64/include/asm/vdso/compat_gettimeofday.h | 5 +++--
 arch/arm64/include/asm/vdso/gettimeofday.h        | 5 +++--
 arch/arm64/include/asm/vdso/vsyscall.h            | 4 +---
 drivers/clocksource/arm_arch_timer.c              | 8 ++++----
 8 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/arch/arm/include/asm/clocksource.h b/arch/arm/include/asm/clocksource.h
index 0b350a7e26f3..017c5ab6e587 100644
--- a/arch/arm/include/asm/clocksource.h
+++ b/arch/arm/include/asm/clocksource.h
@@ -1,8 +1,12 @@
 #ifndef _ASM_CLOCKSOURCE_H
 #define _ASM_CLOCKSOURCE_H
 
+#define VCLOCK_NONE	0	/* No vDSO clock available.		*/
+#define VCLOCK_CNTVCT	1	/* vDSO should use cntvcnt		*/
+#define VCLOCK_MAX	1
+
 struct arch_clocksource_data {
-	bool vdso_direct;	/* Usable for direct VDSO access? */
+	int vclock_mode;
 };
 
 #endif
diff --git a/arch/arm/kernel/vdso.c b/arch/arm/kernel/vdso.c
index c89ac1b9d28b..09e46ec420fe 100644
--- a/arch/arm/kernel/vdso.c
+++ b/arch/arm/kernel/vdso.c
@@ -263,4 +263,3 @@ void arm_install_vdso(struct mm_struct *mm, unsigned long addr)
 	if (!IS_ERR(vma))
 		mm->context.vdso = addr;
 }
-
diff --git a/arch/arm64/include/asm/clocksource.h b/arch/arm64/include/asm/clocksource.h
index 0ece64a26c8c..fbe80057468c 100644
--- a/arch/arm64/include/asm/clocksource.h
+++ b/arch/arm64/include/asm/clocksource.h
@@ -2,8 +2,12 @@
 #ifndef _ASM_CLOCKSOURCE_H
 #define _ASM_CLOCKSOURCE_H
 
+#define VCLOCK_NONE	0	/* No vDSO clock available.		*/
+#define VCLOCK_CNTVCT	1	/* vDSO should use cntvcnt		*/
+#define VCLOCK_MAX	1
+
 struct arch_clocksource_data {
-	bool vdso_direct;	/* Usable for direct VDSO access? */
+	int vclock_mode;
 };
 
 #endif
diff --git a/arch/arm64/include/asm/mshyperv.h b/arch/arm64/include/asm/mshyperv.h
index 9cc4aeddf2d0..0afb00e3501d 100644
--- a/arch/arm64/include/asm/mshyperv.h
+++ b/arch/arm64/include/asm/mshyperv.h
@@ -90,7 +90,7 @@ extern void hv_get_vpreg_128(u32 reg, struct hv_get_vp_register_output *result);
 #define hv_set_reference_tsc(val) \
 		hv_set_vpreg(HV_REGISTER_REFERENCE_TSC, val)
 #define hv_set_clocksource_vdso(val) \
-		((val).archdata.vdso_direct = false)
+		((val).archdata.vclock_mode = VCLOCK_NONE)
 
 #if IS_ENABLED(CONFIG_HYPERV)
 #define hv_enable_stimer0_percpu_irq(irq)	enable_percpu_irq(irq, 0)
diff --git a/arch/arm64/include/asm/vdso/compat_gettimeofday.h b/arch/arm64/include/asm/vdso/compat_gettimeofday.h
index c50ee1b7d5cd..630d04c3c92e 100644
--- a/arch/arm64/include/asm/vdso/compat_gettimeofday.h
+++ b/arch/arm64/include/asm/vdso/compat_gettimeofday.h
@@ -8,6 +8,7 @@
 #ifndef __ASSEMBLY__
 
 #include <asm/unistd.h>
+#include <asm/clocksource.h>
 #include <uapi/linux/time.h>
 
 #include <asm/vdso/compat_barrier.h>
@@ -117,10 +118,10 @@ static __always_inline u64 __arch_get_hw_counter(s32 clock_mode)
 	u64 res;
 
 	/*
-	 * clock_mode == 0 implies that vDSO are enabled otherwise
+	 * clock_mode == VCLOCK_NONE implies that vDSO are disabled so
 	 * fallback on syscall.
 	 */
-	if (clock_mode)
+	if (clock_mode == VCLOCK_NONE)
 		return __VDSO_USE_SYSCALL;
 
 	/*
diff --git a/arch/arm64/include/asm/vdso/gettimeofday.h b/arch/arm64/include/asm/vdso/gettimeofday.h
index b08f476b72b4..e6e3fe0488c7 100644
--- a/arch/arm64/include/asm/vdso/gettimeofday.h
+++ b/arch/arm64/include/asm/vdso/gettimeofday.h
@@ -8,6 +8,7 @@
 #ifndef __ASSEMBLY__
 
 #include <asm/unistd.h>
+#include <asm/clocksource.h>
 #include <uapi/linux/time.h>
 
 #define __VDSO_USE_SYSCALL		ULLONG_MAX
@@ -71,10 +72,10 @@ static __always_inline u64 __arch_get_hw_counter(s32 clock_mode)
 	u64 res;
 
 	/*
-	 * clock_mode == 0 implies that vDSO are enabled otherwise
+	 * clock_mode == VCLOCK_NONE implies that vDSO are disabled so
 	 * fallback on syscall.
 	 */
-	if (clock_mode)
+	if (clock_mode == VCLOCK_NONE)
 		return __VDSO_USE_SYSCALL;
 
 	/*
diff --git a/arch/arm64/include/asm/vdso/vsyscall.h b/arch/arm64/include/asm/vdso/vsyscall.h
index 0c20a7c1bee5..07f78b0da498 100644
--- a/arch/arm64/include/asm/vdso/vsyscall.h
+++ b/arch/arm64/include/asm/vdso/vsyscall.h
@@ -24,9 +24,7 @@ struct vdso_data *__arm64_get_k_vdso_data(void)
 static __always_inline
 int __arm64_get_clock_mode(struct timekeeper *tk)
 {
-	u32 use_syscall = !tk->tkr_mono.clock->archdata.vdso_direct;
-
-	return use_syscall;
+	return tk->tkr_mono.clock->archdata.vclock_mode;
 }
 #define __arch_get_clock_mode __arm64_get_clock_mode
 
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 9a5464c625b4..9b8d4d00b53b 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -69,7 +69,7 @@ static enum arch_timer_ppi_nr arch_timer_uses_ppi = ARCH_TIMER_VIRT_PPI;
 static bool arch_timer_c3stop;
 static bool arch_timer_mem_use_virtual;
 static bool arch_counter_suspend_stop;
-static bool vdso_default = true;
+static int vdso_default = VCLOCK_CNTVCT;
 
 static cpumask_t evtstrm_available = CPU_MASK_NONE;
 static bool evtstrm_enable = IS_ENABLED(CONFIG_ARM_ARCH_TIMER_EVTSTREAM);
@@ -560,8 +560,8 @@ void arch_timer_enable_workaround(const struct arch_timer_erratum_workaround *wa
 	 * change both the default value and the vdso itself.
 	 */
 	if (wa->read_cntvct_el0) {
-		clocksource_counter.archdata.vdso_direct = false;
-		vdso_default = false;
+		clocksource_counter.archdata.vclock_mode = VCLOCK_NONE;
+		vdso_default = VCLOCK_NONE;
 	}
 }
 
@@ -979,7 +979,7 @@ static void __init arch_counter_register(unsigned type)
 		}
 
 		arch_timer_read_counter = rd;
-		clocksource_counter.archdata.vdso_direct = vdso_default;
+		clocksource_counter.archdata.vclock_mode = vdso_default;
 	} else {
 		arch_timer_read_counter = arch_counter_get_cntvct_mem;
 	}
-- 
2.24.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC 4/6] arm64: vdso: hyperv: Map tsc page into vDSO if enabled
  2019-12-16  0:19 [RFC 0/6] vDSO support for Hyper-V guest on ARM64 Boqun Feng
                   ` (2 preceding siblings ...)
  2019-12-16  0:19 ` [RFC 3/6] arm/arm64: clocksource: Introduce vclock_mode Boqun Feng
@ 2019-12-16  0:19 ` Boqun Feng
  2019-12-16  0:19 ` [RFC 5/6] arm64: hyperv: Enable userspace to read cntvct Boqun Feng
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Boqun Feng @ 2019-12-16  0:19 UTC (permalink / raw)
  To: linux-hyperv, linux-arm-kernel, linux-kernel
  Cc: Michael Kelley, Vincenzo Frascino, Catalin Marinas, Will Deacon,
	Thomas Gleixner, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, xen-devel, Stefano Stabellini,
	Boqun Feng, Matteo Croce, Allison Randal, Greg Kroah-Hartman,
	Alexios Zavras

On Hyper-V, a tsc page has the data for adjusting cntvct numbers to
clocksource cycles, and that's how Hyper-V guest kernel reads the
clocksource. In order to allow userspace to read the same clocksource
directly, the tsc page has to been mapped into userspace via vDSO.

Use the framework for vDSO set-up in __vdso_init() to do this.

Note: if HYPERV_TIMER=y but the kernel is using other clocksource or
doesn't have the hyperv timer clocksource, tsc page will still be mapped
into userspace.

Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com>
---
 arch/arm64/kernel/vdso.c          | 12 ++++++++++++
 arch/arm64/kernel/vdso/vdso.lds.S | 12 +++++++++++-
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index b9b5ec7a3084..18a634987bdc 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -9,6 +9,7 @@
 
 #include <linux/cache.h>
 #include <linux/clocksource.h>
+#include <clocksource/hyperv_timer.h>
 #include <linux/elf.h>
 #include <linux/err.h>
 #include <linux/errno.h>
@@ -105,14 +106,22 @@ static int __vdso_init(enum arch_vdso_type arch_index)
 	struct page **vdso_code_pagelist;
 	unsigned long nr_vdso_pages;
 	unsigned long pfn;
+	struct ms_hyperv_tsc_page *tsc_page;
+	int tsc_page_idx;
 
 	if (memcmp(vdso_lookup[arch_index].vdso_code_start, "\177ELF", 4)) {
 		pr_err("vDSO is not a valid ELF object!\n");
 		return -EINVAL;
 	}
 
+	/* One vDSO data page */
 	vdso_lookup[arch_index].nr_vdso_data_pages = 1;
 
+	/* Grab the Hyper-V tsc page, if enabled, add one more page */
+	tsc_page = hv_get_tsc_page();
+	if (tsc_page)
+		tsc_page_idx = vdso_lookup[arch_index].nr_vdso_data_pages++;
+
 	vdso_lookup[arch_index].nr_vdso_code_pages = (
 			vdso_lookup[arch_index].vdso_code_end -
 			vdso_lookup[arch_index].vdso_code_start) >>
@@ -130,6 +139,9 @@ static int __vdso_init(enum arch_vdso_type arch_index)
 	/* Grab the vDSO data page. */
 	vdso_pagelist[0] = phys_to_page(__pa_symbol(vdso_data));
 
+	if (tsc_page)
+		vdso_pagelist[tsc_page_idx] = phys_to_page(__pa(tsc_page));
+
 	/* Grab the vDSO code pages. */
 	pfn = sym_to_pfn(vdso_lookup[arch_index].vdso_code_start);
 
diff --git a/arch/arm64/kernel/vdso/vdso.lds.S b/arch/arm64/kernel/vdso/vdso.lds.S
index 7ad2d3a0cd48..e40a1f5a6d30 100644
--- a/arch/arm64/kernel/vdso/vdso.lds.S
+++ b/arch/arm64/kernel/vdso/vdso.lds.S
@@ -17,7 +17,17 @@ OUTPUT_ARCH(aarch64)
 
 SECTIONS
 {
-	PROVIDE(_vdso_data = . - PAGE_SIZE);
+	/*
+	 * vdso data pages:
+	 *   vdso data (1 page)
+	 *   hv tsc page (1 page if enabled)
+	 */
+	PROVIDE(_vdso_data = _hvclock_page - PAGE_SIZE);
+#ifdef CONFIG_HYPERV_TIMER
+	PROVIDE(_hvclock_page = . - PAGE_SIZE);
+#else
+	PROVIDE(_hvclock_page = .);
+#endif
 	. = VDSO_LBASE + SIZEOF_HEADERS;
 
 	.hash		: { *(.hash) }			:text
-- 
2.24.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC 5/6] arm64: hyperv: Enable userspace to read cntvct
  2019-12-16  0:19 [RFC 0/6] vDSO support for Hyper-V guest on ARM64 Boqun Feng
                   ` (3 preceding siblings ...)
  2019-12-16  0:19 ` [RFC 4/6] arm64: vdso: hyperv: Map tsc page into vDSO if enabled Boqun Feng
@ 2019-12-16  0:19 ` Boqun Feng
  2019-12-16  0:19 ` [RFC 6/6] arm64: hyperv: Enable vDSO Boqun Feng
  2020-01-23 10:48 ` [RFC 0/6] vDSO support for Hyper-V guest on ARM64 Vincenzo Frascino
  6 siblings, 0 replies; 14+ messages in thread
From: Boqun Feng @ 2019-12-16  0:19 UTC (permalink / raw)
  To: linux-hyperv, linux-arm-kernel, linux-kernel
  Cc: Michael Kelley, Vincenzo Frascino, Catalin Marinas, Will Deacon,
	Thomas Gleixner, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, xen-devel, Stefano Stabellini,
	Boqun Feng

Since reading hyperv-timer clocksource requires reading cntvct,
userspace should be allowed to read it, otherwise reading cntvct will
result in traps, which makes vsyscall's cost similar compared to
syscall's.

So enable it on every cpu when a Hyper-V guest booting up.

Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com>
---
 arch/arm64/hyperv/hv_init.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm64/hyperv/hv_init.c b/arch/arm64/hyperv/hv_init.c
index 86e4621d5885..1ea97ecfb143 100644
--- a/arch/arm64/hyperv/hv_init.c
+++ b/arch/arm64/hyperv/hv_init.c
@@ -27,6 +27,7 @@
 #include <linux/sched_clock.h>
 #include <asm-generic/bug.h>
 #include <asm/hyperv-tlfs.h>
+#include <asm/arch_timer.h>
 #include <asm/mshyperv.h>
 #include <asm/sysreg.h>
 #include <clocksource/hyperv_timer.h>
@@ -45,6 +46,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
 static int hv_cpu_init(unsigned int cpu)
 {
 	u64 msr_vp_index;
+	u32 cntkctl;
 
 	hv_get_vp_index(msr_vp_index);
 
@@ -53,6 +55,11 @@ static int hv_cpu_init(unsigned int cpu)
 	if (msr_vp_index > hv_max_vp_index)
 		hv_max_vp_index = msr_vp_index;
 
+	/* Enable EL0 to access cntvct */
+	cntkctl = arch_timer_get_cntkctl();
+	cntkctl |= ARCH_TIMER_USR_VCT_ACCESS_EN;
+	arch_timer_set_cntkctl(cntkctl);
+
 	return 0;
 }
 
-- 
2.24.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC 6/6] arm64: hyperv: Enable vDSO
  2019-12-16  0:19 [RFC 0/6] vDSO support for Hyper-V guest on ARM64 Boqun Feng
                   ` (4 preceding siblings ...)
  2019-12-16  0:19 ` [RFC 5/6] arm64: hyperv: Enable userspace to read cntvct Boqun Feng
@ 2019-12-16  0:19 ` Boqun Feng
  2019-12-17 14:10   ` Vitaly Kuznetsov
  2020-01-23 10:48 ` [RFC 0/6] vDSO support for Hyper-V guest on ARM64 Vincenzo Frascino
  6 siblings, 1 reply; 14+ messages in thread
From: Boqun Feng @ 2019-12-16  0:19 UTC (permalink / raw)
  To: linux-hyperv, linux-arm-kernel, linux-kernel
  Cc: Michael Kelley, Vincenzo Frascino, Catalin Marinas, Will Deacon,
	Thomas Gleixner, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, xen-devel, Stefano Stabellini,
	Boqun Feng

Similar to x86, add a new vclock_mode VCLOCK_HVCLOCK, and reuse the
hv_read_tsc_page() for userspace to read tsc page clocksource.

Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com>
---
 arch/arm64/include/asm/clocksource.h       |  3 ++-
 arch/arm64/include/asm/mshyperv.h          |  2 +-
 arch/arm64/include/asm/vdso/gettimeofday.h | 19 +++++++++++++++++++
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/clocksource.h b/arch/arm64/include/asm/clocksource.h
index fbe80057468c..c6acd45fe748 100644
--- a/arch/arm64/include/asm/clocksource.h
+++ b/arch/arm64/include/asm/clocksource.h
@@ -4,7 +4,8 @@
 
 #define VCLOCK_NONE	0	/* No vDSO clock available.		*/
 #define VCLOCK_CNTVCT	1	/* vDSO should use cntvcnt		*/
-#define VCLOCK_MAX	1
+#define VCLOCK_HVCLOCK	2	/* vDSO should use vread_hvclock()	*/
+#define VCLOCK_MAX	2
 
 struct arch_clocksource_data {
 	int vclock_mode;
diff --git a/arch/arm64/include/asm/mshyperv.h b/arch/arm64/include/asm/mshyperv.h
index 0afb00e3501d..7c85dd816dca 100644
--- a/arch/arm64/include/asm/mshyperv.h
+++ b/arch/arm64/include/asm/mshyperv.h
@@ -90,7 +90,7 @@ extern void hv_get_vpreg_128(u32 reg, struct hv_get_vp_register_output *result);
 #define hv_set_reference_tsc(val) \
 		hv_set_vpreg(HV_REGISTER_REFERENCE_TSC, val)
 #define hv_set_clocksource_vdso(val) \
-		((val).archdata.vclock_mode = VCLOCK_NONE)
+		((val).archdata.vclock_mode = VCLOCK_HVCLOCK)
 
 #if IS_ENABLED(CONFIG_HYPERV)
 #define hv_enable_stimer0_percpu_irq(irq)	enable_percpu_irq(irq, 0)
diff --git a/arch/arm64/include/asm/vdso/gettimeofday.h b/arch/arm64/include/asm/vdso/gettimeofday.h
index e6e3fe0488c7..7e689b903f4d 100644
--- a/arch/arm64/include/asm/vdso/gettimeofday.h
+++ b/arch/arm64/include/asm/vdso/gettimeofday.h
@@ -67,6 +67,20 @@ int clock_getres_fallback(clockid_t _clkid, struct __kernel_timespec *_ts)
 	return ret;
 }
 
+#ifdef CONFIG_HYPERV_TIMER
+/* This will override the default hv_get_raw_timer() */
+#define hv_get_raw_timer() __arch_counter_get_cntvct()
+#include <clocksource/hyperv_timer.h>
+
+extern struct ms_hyperv_tsc_page
+_hvclock_page __attribute__((visibility("hidden")));
+
+static u64 vread_hvclock(void)
+{
+	return hv_read_tsc_page(&_hvclock_page);
+}
+#endif
+
 static __always_inline u64 __arch_get_hw_counter(s32 clock_mode)
 {
 	u64 res;
@@ -78,6 +92,11 @@ static __always_inline u64 __arch_get_hw_counter(s32 clock_mode)
 	if (clock_mode == VCLOCK_NONE)
 		return __VDSO_USE_SYSCALL;
 
+#ifdef CONFIG_HYPERV_TIMER
+	if (likely(clock_mode == VCLOCK_HVCLOCK))
+		return vread_hvclock();
+#endif
+
 	/*
 	 * This isb() is required to prevent that the counter value
 	 * is speculated.
-- 
2.24.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 6/6] arm64: hyperv: Enable vDSO
  2019-12-16  0:19 ` [RFC 6/6] arm64: hyperv: Enable vDSO Boqun Feng
@ 2019-12-17 14:10   ` Vitaly Kuznetsov
  2019-12-18  5:47     ` Boqun Feng
  0 siblings, 1 reply; 14+ messages in thread
From: Vitaly Kuznetsov @ 2019-12-17 14:10 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Michael Kelley, Vincenzo Frascino, Catalin Marinas, Will Deacon,
	Thomas Gleixner, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, xen-devel, Stefano Stabellini,
	Boqun Feng, linux-hyperv, linux-arm-kernel, linux-kernel

Boqun Feng <boqun.feng@gmail.com> writes:

> Similar to x86, add a new vclock_mode VCLOCK_HVCLOCK, and reuse the
> hv_read_tsc_page() for userspace to read tsc page clocksource.
>
> Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com>
> ---
>  arch/arm64/include/asm/clocksource.h       |  3 ++-
>  arch/arm64/include/asm/mshyperv.h          |  2 +-
>  arch/arm64/include/asm/vdso/gettimeofday.h | 19 +++++++++++++++++++
>  3 files changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/clocksource.h b/arch/arm64/include/asm/clocksource.h
> index fbe80057468c..c6acd45fe748 100644
> --- a/arch/arm64/include/asm/clocksource.h
> +++ b/arch/arm64/include/asm/clocksource.h
> @@ -4,7 +4,8 @@
>  
>  #define VCLOCK_NONE	0	/* No vDSO clock available.		*/
>  #define VCLOCK_CNTVCT	1	/* vDSO should use cntvcnt		*/
> -#define VCLOCK_MAX	1
> +#define VCLOCK_HVCLOCK	2	/* vDSO should use vread_hvclock()	*/
> +#define VCLOCK_MAX	2
>  
>  struct arch_clocksource_data {
>  	int vclock_mode;
> diff --git a/arch/arm64/include/asm/mshyperv.h b/arch/arm64/include/asm/mshyperv.h
> index 0afb00e3501d..7c85dd816dca 100644
> --- a/arch/arm64/include/asm/mshyperv.h
> +++ b/arch/arm64/include/asm/mshyperv.h
> @@ -90,7 +90,7 @@ extern void hv_get_vpreg_128(u32 reg, struct hv_get_vp_register_output *result);
>  #define hv_set_reference_tsc(val) \
>  		hv_set_vpreg(HV_REGISTER_REFERENCE_TSC, val)
>  #define hv_set_clocksource_vdso(val) \
> -		((val).archdata.vclock_mode = VCLOCK_NONE)
> +		((val).archdata.vclock_mode = VCLOCK_HVCLOCK)
>  
>  #if IS_ENABLED(CONFIG_HYPERV)
>  #define hv_enable_stimer0_percpu_irq(irq)	enable_percpu_irq(irq, 0)
> diff --git a/arch/arm64/include/asm/vdso/gettimeofday.h b/arch/arm64/include/asm/vdso/gettimeofday.h
> index e6e3fe0488c7..7e689b903f4d 100644
> --- a/arch/arm64/include/asm/vdso/gettimeofday.h
> +++ b/arch/arm64/include/asm/vdso/gettimeofday.h
> @@ -67,6 +67,20 @@ int clock_getres_fallback(clockid_t _clkid, struct __kernel_timespec *_ts)
>  	return ret;
>  }
>  
> +#ifdef CONFIG_HYPERV_TIMER
> +/* This will override the default hv_get_raw_timer() */
> +#define hv_get_raw_timer() __arch_counter_get_cntvct()
> +#include <clocksource/hyperv_timer.h>
> +
> +extern struct ms_hyperv_tsc_page
> +_hvclock_page __attribute__((visibility("hidden")));
> +
> +static u64 vread_hvclock(void)
> +{
> +	return hv_read_tsc_page(&_hvclock_page);
> +}
> +#endif

The function is almost the same on x86 (&_hvclock_page ->
&hvclock_page), would it maybe make sense to move this to arch neutral
clocksource/hyperv_timer.h?

> +
>  static __always_inline u64 __arch_get_hw_counter(s32 clock_mode)
>  {
>  	u64 res;
> @@ -78,6 +92,11 @@ static __always_inline u64 __arch_get_hw_counter(s32 clock_mode)
>  	if (clock_mode == VCLOCK_NONE)
>  		return __VDSO_USE_SYSCALL;
>  
> +#ifdef CONFIG_HYPERV_TIMER
> +	if (likely(clock_mode == VCLOCK_HVCLOCK))
> +		return vread_hvclock();

I'm not sure likely() is justified here: it'll make ALL builds which
enable CONFIG_HYPERV_TIMER (e.g. distro kernels) to prefer
VCLOCK_HVCLOCK, even if the kernel is not running on Hyper-V.

> +#endif
> +
>  	/*
>  	 * This isb() is required to prevent that the counter value
>  	 * is speculated.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 6/6] arm64: hyperv: Enable vDSO
  2019-12-17 14:10   ` Vitaly Kuznetsov
@ 2019-12-18  5:47     ` Boqun Feng
  0 siblings, 0 replies; 14+ messages in thread
From: Boqun Feng @ 2019-12-18  5:47 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Michael Kelley, Vincenzo Frascino, Catalin Marinas, Will Deacon,
	Thomas Gleixner, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, xen-devel, Stefano Stabellini,
	linux-hyperv, linux-arm-kernel, linux-kernel

On Tue, Dec 17, 2019 at 03:10:16PM +0100, Vitaly Kuznetsov wrote:
> Boqun Feng <boqun.feng@gmail.com> writes:
> 
> > Similar to x86, add a new vclock_mode VCLOCK_HVCLOCK, and reuse the
> > hv_read_tsc_page() for userspace to read tsc page clocksource.
> >
> > Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com>
> > ---
> >  arch/arm64/include/asm/clocksource.h       |  3 ++-
> >  arch/arm64/include/asm/mshyperv.h          |  2 +-
> >  arch/arm64/include/asm/vdso/gettimeofday.h | 19 +++++++++++++++++++
> >  3 files changed, 22 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/clocksource.h b/arch/arm64/include/asm/clocksource.h
> > index fbe80057468c..c6acd45fe748 100644
> > --- a/arch/arm64/include/asm/clocksource.h
> > +++ b/arch/arm64/include/asm/clocksource.h
> > @@ -4,7 +4,8 @@
> >  
> >  #define VCLOCK_NONE	0	/* No vDSO clock available.		*/
> >  #define VCLOCK_CNTVCT	1	/* vDSO should use cntvcnt		*/
> > -#define VCLOCK_MAX	1
> > +#define VCLOCK_HVCLOCK	2	/* vDSO should use vread_hvclock()	*/
> > +#define VCLOCK_MAX	2
> >  
> >  struct arch_clocksource_data {
> >  	int vclock_mode;
> > diff --git a/arch/arm64/include/asm/mshyperv.h b/arch/arm64/include/asm/mshyperv.h
> > index 0afb00e3501d..7c85dd816dca 100644
> > --- a/arch/arm64/include/asm/mshyperv.h
> > +++ b/arch/arm64/include/asm/mshyperv.h
> > @@ -90,7 +90,7 @@ extern void hv_get_vpreg_128(u32 reg, struct hv_get_vp_register_output *result);
> >  #define hv_set_reference_tsc(val) \
> >  		hv_set_vpreg(HV_REGISTER_REFERENCE_TSC, val)
> >  #define hv_set_clocksource_vdso(val) \
> > -		((val).archdata.vclock_mode = VCLOCK_NONE)
> > +		((val).archdata.vclock_mode = VCLOCK_HVCLOCK)
> >  
> >  #if IS_ENABLED(CONFIG_HYPERV)
> >  #define hv_enable_stimer0_percpu_irq(irq)	enable_percpu_irq(irq, 0)
> > diff --git a/arch/arm64/include/asm/vdso/gettimeofday.h b/arch/arm64/include/asm/vdso/gettimeofday.h
> > index e6e3fe0488c7..7e689b903f4d 100644
> > --- a/arch/arm64/include/asm/vdso/gettimeofday.h
> > +++ b/arch/arm64/include/asm/vdso/gettimeofday.h
> > @@ -67,6 +67,20 @@ int clock_getres_fallback(clockid_t _clkid, struct __kernel_timespec *_ts)
> >  	return ret;
> >  }
> >  
> > +#ifdef CONFIG_HYPERV_TIMER
> > +/* This will override the default hv_get_raw_timer() */
> > +#define hv_get_raw_timer() __arch_counter_get_cntvct()
> > +#include <clocksource/hyperv_timer.h>
> > +
> > +extern struct ms_hyperv_tsc_page
> > +_hvclock_page __attribute__((visibility("hidden")));
> > +
> > +static u64 vread_hvclock(void)
> > +{
> > +	return hv_read_tsc_page(&_hvclock_page);
> > +}
> > +#endif
> 
> The function is almost the same on x86 (&_hvclock_page ->
> &hvclock_page), would it maybe make sense to move this to arch neutral
> clocksource/hyperv_timer.h?
> 

I'm not sure whether the underscore matters in the vDSO data symbol, so
I follow the architectural name convention. If the leading underscore
doesn't have special purpose I'm happy to move this to arch neutral
header file.

> > +
> >  static __always_inline u64 __arch_get_hw_counter(s32 clock_mode)
> >  {
> >  	u64 res;
> > @@ -78,6 +92,11 @@ static __always_inline u64 __arch_get_hw_counter(s32 clock_mode)
> >  	if (clock_mode == VCLOCK_NONE)
> >  		return __VDSO_USE_SYSCALL;
> >  
> > +#ifdef CONFIG_HYPERV_TIMER
> > +	if (likely(clock_mode == VCLOCK_HVCLOCK))
> > +		return vread_hvclock();
> 
> I'm not sure likely() is justified here: it'll make ALL builds which
> enable CONFIG_HYPERV_TIMER (e.g. distro kernels) to prefer
> VCLOCK_HVCLOCK, even if the kernel is not running on Hyper-V.
> 

Make sense. Thanks for pointing this out! I will change it in the next
version.

Regards,
Boqun

> > +#endif
> > +
> >  	/*
> >  	 * This isb() is required to prevent that the counter value
> >  	 * is speculated.
> 
> -- 
> Vitaly
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 0/6] vDSO support for Hyper-V guest on ARM64
  2019-12-16  0:19 [RFC 0/6] vDSO support for Hyper-V guest on ARM64 Boqun Feng
                   ` (5 preceding siblings ...)
  2019-12-16  0:19 ` [RFC 6/6] arm64: hyperv: Enable vDSO Boqun Feng
@ 2020-01-23 10:48 ` Vincenzo Frascino
  2020-01-24  6:32   ` Boqun Feng
  6 siblings, 1 reply; 14+ messages in thread
From: Vincenzo Frascino @ 2020-01-23 10:48 UTC (permalink / raw)
  To: Boqun Feng, linux-hyperv, linux-arm-kernel, linux-kernel
  Cc: Sasha Levin, Stephen Hemminger, Catalin Marinas, Haiyang Zhang,
	Michael Kelley, Stefano Stabellini, xen-devel, Thomas Gleixner,
	K. Y. Srinivasan, Will Deacon

[-- Attachment #1: Type: text/plain, Size: 2527 bytes --]

Hi Boqun Feng,

sorry for the late reply.

On 16/12/2019 00:19, Boqun Feng wrote:
> Hi,
> 
> This is the RFC patchset for vDSO support in ARM64 Hyper-V guest. To
> test it, Michael's ARM64 support patchset:
> 
> 	https://lore.kernel.org/linux-arm-kernel/1570129355-16005-1-git-send-email-mikelley@microsoft.com/
> 
> is needed.
> 
> Similar as x86, Hyper-V on ARM64 use a TSC page for guests to read
> the virtualized hardware timer, this TSC page is read-only for the
> guests, so could be used for vDSO data page. And the vDSO (userspace)
> code could use the same code for timer reading as kernel, since
> they read the same TSC page.
> 

I had a look to your patches and overall, I could not understand why we can't
use the arch_timer to do the same things you are doing with the one you
introduced in this series. What confuses me is that KVM works just fine with the
arch_timer which was designed with virtualization in mind. Why do we need
another one? Could you please explain?

> This patchset therefore extends ARM64's __vsdo_init() to allow multiple
> data pages and introduces the vclock_mode concept similar to x86 to
> allow different platforms (bare-metal, Hyper-V, etc.) to switch to
> different __arch_get_hw_counter() implementations. The rest of this
> patchset does the necessary setup for Hyper-V guests: mapping tsc page,
> enabling userspace to read cntvct, etc. to enable vDSO.
> 
> This patchset consists of 6 patches:
> 
> patch #1 allows hv_get_raw_timer() definition to be overridden for
> userspace and kernel to share the same hv_read_tsc_page() definition.
> 
> patch #2 extends ARM64 to support multiple vDSO data pages.
> 
> patch #3 introduces vclock_mode similiar to x86 to allow different
> __arch_get_hw_counter() implementations for different clocksources.
> 
> patch #4 maps Hyper-V TSC page into vDSO data page.
> 
> patch #5 allows userspace to read cntvct, so that userspace can
> efficiently read the clocksource.
> 
> patch #6 enables the vDSO for ARM64 Hyper-V guest.
> 
> The whole patchset is based on v5.5-rc1 plus Michael's ARM64 support
> patchset, and I've done a few tests with:
> 
> 	https://github.com/nlynch-mentor/vdsotest
> 
> Comments and suggestions are welcome!
> 
> Regards,
> Boqun
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

-- 
Regards,
Vincenzo

[-- Attachment #2: pEpkey.asc --]
[-- Type: application/pgp-keys, Size: 14291 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 0/6] vDSO support for Hyper-V guest on ARM64
  2020-01-23 10:48 ` [RFC 0/6] vDSO support for Hyper-V guest on ARM64 Vincenzo Frascino
@ 2020-01-24  6:32   ` Boqun Feng
  2020-01-24 10:24     ` Vincenzo Frascino
  0 siblings, 1 reply; 14+ messages in thread
From: Boqun Feng @ 2020-01-24  6:32 UTC (permalink / raw)
  To: Vincenzo Frascino
  Cc: linux-hyperv, linux-arm-kernel, linux-kernel, Sasha Levin,
	Stephen Hemminger, Catalin Marinas, Haiyang Zhang,
	Michael Kelley, Stefano Stabellini, xen-devel, Thomas Gleixner,
	K. Y. Srinivasan, Will Deacon

Hi Vincenzo,

On Thu, Jan 23, 2020 at 10:48:07AM +0000, Vincenzo Frascino wrote:
> Hi Boqun Feng,
> 
> sorry for the late reply.
> 

That's OK, thanks for your review ;-)

> On 16/12/2019 00:19, Boqun Feng wrote:
> > Hi,
> > 
> > This is the RFC patchset for vDSO support in ARM64 Hyper-V guest. To
> > test it, Michael's ARM64 support patchset:
> > 
> > 	https://lore.kernel.org/linux-arm-kernel/1570129355-16005-1-git-send-email-mikelley@microsoft.com/
> > 
> > is needed.
> > 
> > Similar as x86, Hyper-V on ARM64 use a TSC page for guests to read
> > the virtualized hardware timer, this TSC page is read-only for the
> > guests, so could be used for vDSO data page. And the vDSO (userspace)
> > code could use the same code for timer reading as kernel, since
> > they read the same TSC page.
> > 
> 
> I had a look to your patches and overall, I could not understand why we can't
> use the arch_timer to do the same things you are doing with the one you
> introduced in this series. What confuses me is that KVM works just fine with the
> arch_timer which was designed with virtualization in mind. Why do we need
> another one? Could you please explain?
> 

Please note that the guest VM on Hyper-V for ARM64 doesn't use
arch_timer as the clocksource. See:

	https://lore.kernel.org/linux-arm-kernel/1570129355-16005-7-git-send-email-mikelley@microsoft.com/

,  ACPI_SIG_GTDT is used for setting up Hyper-V synthetic clocksource
and other initialization work.

So just to be clear, your suggestion is

1) Hyper-V guest on ARM64 should use arch_timer as clocksource and vDSO
will just work.

or

2) Even though arch_timer is not used as the clocksource, we can still
use it for vDSO.

?

Regards,
Boqun

> > This patchset therefore extends ARM64's __vsdo_init() to allow multiple
> > data pages and introduces the vclock_mode concept similar to x86 to
> > allow different platforms (bare-metal, Hyper-V, etc.) to switch to
> > different __arch_get_hw_counter() implementations. The rest of this
> > patchset does the necessary setup for Hyper-V guests: mapping tsc page,
> > enabling userspace to read cntvct, etc. to enable vDSO.
> > 
> > This patchset consists of 6 patches:
> > 
> > patch #1 allows hv_get_raw_timer() definition to be overridden for
> > userspace and kernel to share the same hv_read_tsc_page() definition.
> > 
> > patch #2 extends ARM64 to support multiple vDSO data pages.
> > 
> > patch #3 introduces vclock_mode similiar to x86 to allow different
> > __arch_get_hw_counter() implementations for different clocksources.
> > 
> > patch #4 maps Hyper-V TSC page into vDSO data page.
> > 
> > patch #5 allows userspace to read cntvct, so that userspace can
> > efficiently read the clocksource.
> > 
> > patch #6 enables the vDSO for ARM64 Hyper-V guest.
> > 
> > The whole patchset is based on v5.5-rc1 plus Michael's ARM64 support
> > patchset, and I've done a few tests with:
> > 
> > 	https://github.com/nlynch-mentor/vdsotest
> > 
> > Comments and suggestions are welcome!
> > 
> > Regards,
> > Boqun
> > 
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> > 
> 
> -- 
> Regards,
> Vincenzo



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 0/6] vDSO support for Hyper-V guest on ARM64
  2020-01-24  6:32   ` Boqun Feng
@ 2020-01-24 10:24     ` Vincenzo Frascino
  2020-01-28  5:58       ` Boqun Feng
  0 siblings, 1 reply; 14+ messages in thread
From: Vincenzo Frascino @ 2020-01-24 10:24 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-hyperv, linux-arm-kernel, linux-kernel, Sasha Levin,
	Stephen Hemminger, Catalin Marinas, Haiyang Zhang,
	Michael Kelley, Stefano Stabellini, xen-devel, Thomas Gleixner,
	K. Y. Srinivasan, Will Deacon

[-- Attachment #1: Type: text/plain, Size: 1287 bytes --]

Hi Boqun Feng,

On 24/01/2020 06:32, Boqun Feng wrote:
> Hi Vincenzo,
> 

[...]

>>
>> I had a look to your patches and overall, I could not understand why we can't
>> use the arch_timer to do the same things you are doing with the one you
>> introduced in this series. What confuses me is that KVM works just fine with the
>> arch_timer which was designed with virtualization in mind. Why do we need
>> another one? Could you please explain?
>>
> 
> Please note that the guest VM on Hyper-V for ARM64 doesn't use
> arch_timer as the clocksource. See:
> 
> 	https://lore.kernel.org/linux-arm-kernel/1570129355-16005-7-git-send-email-mikelley@microsoft.com/
> 
> ,  ACPI_SIG_GTDT is used for setting up Hyper-V synthetic clocksource
> and other initialization work.
>

I had a look a look at it and my question stands, why do we need another timer
on arm64?

> So just to be clear, your suggestion is
> 
> 1) Hyper-V guest on ARM64 should use arch_timer as clocksource and vDSO
> will just work.
> 
> or
> 
> 2) Even though arch_timer is not used as the clocksource, we can still
> use it for vDSO.
> 
> ?
> 

Option #1 would be the preferred solution, unless there is a good reason against.

> Regards,
> Boqun
> 

-- 
Regards,
Vincenzo

[-- Attachment #2: pEpkey.asc --]
[-- Type: application/pgp-keys, Size: 14291 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 0/6] vDSO support for Hyper-V guest on ARM64
  2020-01-24 10:24     ` Vincenzo Frascino
@ 2020-01-28  5:58       ` Boqun Feng
  2020-01-28 11:48         ` Marc Zyngier
  0 siblings, 1 reply; 14+ messages in thread
From: Boqun Feng @ 2020-01-28  5:58 UTC (permalink / raw)
  To: Vincenzo Frascino
  Cc: linux-hyperv, linux-arm-kernel, linux-kernel, Sasha Levin,
	Stephen Hemminger, Catalin Marinas, Haiyang Zhang,
	Michael Kelley, Stefano Stabellini, xen-devel, Thomas Gleixner,
	K. Y. Srinivasan, Will Deacon

On Fri, Jan 24, 2020 at 10:24:44AM +0000, Vincenzo Frascino wrote:
> Hi Boqun Feng,
> 
> On 24/01/2020 06:32, Boqun Feng wrote:
> > Hi Vincenzo,
> > 
> 
> [...]
> 
> >>
> >> I had a look to your patches and overall, I could not understand why we can't
> >> use the arch_timer to do the same things you are doing with the one you
> >> introduced in this series. What confuses me is that KVM works just fine with the
> >> arch_timer which was designed with virtualization in mind. Why do we need
> >> another one? Could you please explain?
> >>
> > 
> > Please note that the guest VM on Hyper-V for ARM64 doesn't use
> > arch_timer as the clocksource. See:
> > 
> > 	https://lore.kernel.org/linux-arm-kernel/1570129355-16005-7-git-send-email-mikelley@microsoft.com/
> > 
> > ,  ACPI_SIG_GTDT is used for setting up Hyper-V synthetic clocksource
> > and other initialization work.
> >
> 
> I had a look a look at it and my question stands, why do we need another timer
> on arm64?
> 

Sorry for the late response. It's weekend and Chinese New Year, so I got
to spend some time making (and mostly eating) dumplings ;-)

After discussion with Michael, here is some explanation why we need
another timer:

The synthetic clocks that Hyper-V presents in a guest VM were originally
created for the x86 architecture. They provide a level of abstraction
that solves problems like continuity across live migrations where the
hardware clock (i.e., TSC in the case x86) frequency may be different
across the migration. When Hyper-V was brought to ARM64, this
abstraction was maintained to provide consistency across the x86 and
ARM64 architectures, and for both Windows and Linux guest VMs.   The
core Linux code for the Hyper-V clocks (in
drivers/clocksource/hyperv_timer.c) is architecture neutral and works on
both x86 and ARM64. As you can see, this part is done in Michael's
patchset.

Arguably, Hyper-V for ARM64 should have optimized for consistency with
the ARM64 community rather with the existing x86 implementation and
existing guest code in Windows. But at this point, it is what it is,
and the Hyper-V clocks do solve problems like migration that aren’t
addressed in ARM64 until v8.4 of the architecture with the addition of
the counter hardware scaling feature. Hyper-V doesn’t currently map the
ARM arch timer interrupts into guest VMs, so we need to use the existing
Hyper-V clocks and the common code that already exists.


Does the above answer your question?

Regards,
Boqun

> > So just to be clear, your suggestion is
> > 
> > 1) Hyper-V guest on ARM64 should use arch_timer as clocksource and vDSO
> > will just work.
> > 
> > or
> > 
> > 2) Even though arch_timer is not used as the clocksource, we can still
> > use it for vDSO.
> > 
> > ?
> > 
> 
> Option #1 would be the preferred solution, unless there is a good reason against.
> 
> > Regards,
> > Boqun
> > 
> 
> -- 
> Regards,
> Vincenzo



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 0/6] vDSO support for Hyper-V guest on ARM64
  2020-01-28  5:58       ` Boqun Feng
@ 2020-01-28 11:48         ` Marc Zyngier
  0 siblings, 0 replies; 14+ messages in thread
From: Marc Zyngier @ 2020-01-28 11:48 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Vincenzo Frascino, Sasha Levin, linux-hyperv, Stefano Stabellini,
	Stephen Hemminger, Catalin Marinas, Haiyang Zhang, linux-kernel,
	Michael Kelley, xen-devel, Thomas Gleixner, K. Y. Srinivasan,
	Will Deacon, linux-arm-kernel

On 2020-01-28 05:58, Boqun Feng wrote:
> On Fri, Jan 24, 2020 at 10:24:44AM +0000, Vincenzo Frascino wrote:
>> Hi Boqun Feng,
>> 
>> On 24/01/2020 06:32, Boqun Feng wrote:
>> > Hi Vincenzo,
>> >
>> 
>> [...]
>> 
>> >>
>> >> I had a look to your patches and overall, I could not understand why we can't
>> >> use the arch_timer to do the same things you are doing with the one you
>> >> introduced in this series. What confuses me is that KVM works just fine with the
>> >> arch_timer which was designed with virtualization in mind. Why do we need
>> >> another one? Could you please explain?
>> >>
>> >
>> > Please note that the guest VM on Hyper-V for ARM64 doesn't use
>> > arch_timer as the clocksource. See:
>> >
>> > 	https://lore.kernel.org/linux-arm-kernel/1570129355-16005-7-git-send-email-mikelley@microsoft.com/
>> >
>> > ,  ACPI_SIG_GTDT is used for setting up Hyper-V synthetic clocksource
>> > and other initialization work.
>> >
>> 
>> I had a look a look at it and my question stands, why do we need 
>> another timer
>> on arm64?
>> 
> 
> Sorry for the late response. It's weekend and Chinese New Year, so I 
> got
> to spend some time making (and mostly eating) dumplings ;-)

And you haven't been sharing! ;-)

> After discussion with Michael, here is some explanation why we need
> another timer:
> 
> The synthetic clocks that Hyper-V presents in a guest VM were 
> originally
> created for the x86 architecture. They provide a level of abstraction
> that solves problems like continuity across live migrations where the
> hardware clock (i.e., TSC in the case x86) frequency may be different
> across the migration. When Hyper-V was brought to ARM64, this
> abstraction was maintained to provide consistency across the x86 and
> ARM64 architectures, and for both Windows and Linux guest VMs.   The
> core Linux code for the Hyper-V clocks (in
> drivers/clocksource/hyperv_timer.c) is architecture neutral and works 
> on
> both x86 and ARM64. As you can see, this part is done in Michael's
> patchset.
> 
> Arguably, Hyper-V for ARM64 should have optimized for consistency with
> the ARM64 community rather with the existing x86 implementation and
> existing guest code in Windows. But at this point, it is what it is,
> and the Hyper-V clocks do solve problems like migration that aren’t
> addressed in ARM64 until v8.4 of the architecture with the addition of
> the counter hardware scaling feature. Hyper-V doesn’t currently map the
> ARM arch timer interrupts into guest VMs, so we need to use the 
> existing
> Hyper-V clocks and the common code that already exists.

The migration thing is a bit of a red herring. Do you really anticipate
VM migration across systems that have their timers running at different
frequencies *today*? And even if you did, there are ways to deal with it
with the arch timers (patches to that effect were posted on the list, 
and
there was even a bit of an ARM spec for it).

I find it odd to try and make arm64 "just another x86", while the 
architecture
gives you most of what you need already. I guess I'm tainted.

Thanks,

         M.
-- 
Who you jivin' with that Cosmik Debris?

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, back to index

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-16  0:19 [RFC 0/6] vDSO support for Hyper-V guest on ARM64 Boqun Feng
2019-12-16  0:19 ` [RFC 1/6] arm64: hyperv: Allow hv_get_raw_timer() definition to be overridden Boqun Feng
2019-12-16  0:19 ` [RFC 2/6] arm64: vdso: Add support for multiple vDSO data pages Boqun Feng
2019-12-16  0:19 ` [RFC 3/6] arm/arm64: clocksource: Introduce vclock_mode Boqun Feng
2019-12-16  0:19 ` [RFC 4/6] arm64: vdso: hyperv: Map tsc page into vDSO if enabled Boqun Feng
2019-12-16  0:19 ` [RFC 5/6] arm64: hyperv: Enable userspace to read cntvct Boqun Feng
2019-12-16  0:19 ` [RFC 6/6] arm64: hyperv: Enable vDSO Boqun Feng
2019-12-17 14:10   ` Vitaly Kuznetsov
2019-12-18  5:47     ` Boqun Feng
2020-01-23 10:48 ` [RFC 0/6] vDSO support for Hyper-V guest on ARM64 Vincenzo Frascino
2020-01-24  6:32   ` Boqun Feng
2020-01-24 10:24     ` Vincenzo Frascino
2020-01-28  5:58       ` Boqun Feng
2020-01-28 11:48         ` Marc Zyngier

Linux-HyperV Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-hyperv/0 linux-hyperv/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-hyperv linux-hyperv/ https://lore.kernel.org/linux-hyperv \
		linux-hyperv@vger.kernel.org
	public-inbox-index linux-hyperv

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-hyperv


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git