All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] x86/xen: pvclock vdso support
@ 2017-09-22 16:25 Joao Martins
  2017-09-22 16:25 ` [PATCH v2 1/3] x86/pvclock: add setter for pvclock_pvti_cpu0_va Joao Martins
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Joao Martins @ 2017-09-22 16:25 UTC (permalink / raw)
  To: linux-kernel, xen-devel
  Cc: Joao Martins, Boris Ostrovsky, Juergen Gross, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, Andy Lutomirski

Hey,

Sorry for the huge delay in following up this series.

This take 2 for vdso for Xen. PVCLOCK_TSC_STABLE_BIT can be set starting Xen
 4.8 which is required for vdso time related calls. In order to have it on, you
need to have the hypervisor clocksource be TSC e.g. with the following boot
params "clocksource=tsc tsc=stable:socket".

Series is structured as following:

Patch 1 streamlines pvti page get/set in pvclock for both of its users
Patch 2 registers the pvti page on Xen and sets it in pvclock accordingly
Patch 3 adds a file to KVM/Xen maintainers for tracking pvclock ABI changes.

Changelog since v1 is included in individual patches.

Any comments/suggestions are welcome.

Thanks,
Joao


Joao Martins (3):
  x86/pvclock: add setter for pvclock_pvti_cpu0_va
  x86/xen/time: setup vcpu 0 time info page
  MAINTAINERS: xen, kvm: track pvclock-abi.h changes

 MAINTAINERS                    |   2 +
 arch/x86/include/asm/pvclock.h |  19 ++++----
 arch/x86/kernel/kvmclock.c     |   7 +--
 arch/x86/kernel/pvclock.c      |  14 ++++++
 arch/x86/xen/suspend.c         |   4 ++
 arch/x86/xen/time.c            | 100 +++++++++++++++++++++++++++++++++++++++++
 arch/x86/xen/xen-ops.h         |   2 +
 include/xen/interface/vcpu.h   |  28 ++++++++++++
 8 files changed, 161 insertions(+), 15 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 1/3] x86/pvclock: add setter for pvclock_pvti_cpu0_va
  2017-09-22 16:25 [PATCH v2 0/3] x86/xen: pvclock vdso support Joao Martins
  2017-09-22 16:25 ` [PATCH v2 1/3] x86/pvclock: add setter for pvclock_pvti_cpu0_va Joao Martins
@ 2017-09-22 16:25 ` Joao Martins
  2017-09-22 16:25 ` [PATCH v2 2/3] x86/xen/time: setup vcpu 0 time info page Joao Martins
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Joao Martins @ 2017-09-22 16:25 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Joao Martins, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Paolo Bonzini, Radim Krcmar, Andy Lutomirski, xen-devel

Right now there is only a pvclock_pvti_cpu0_va() which is defined
on kvmclock since:

commit dac16fba6fc5
("x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap")

The only user of this interface so far is kvm. This commit adds a
setter function for the pvti page and moves pvclock_pvti_cpu0_va
to pvclock, which is a more generic place to have it; and would
allow other PV clocksources to use it, such as Xen.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
---
Changes since v1:
 * Rebased: the only conflict was that I had move the export
 pvclock_pvti_cpu0_va() symbol as it is used by kvm PTP driver.
 * Do not initialize pvti_cpu0_va to NULL (checkpatch error)

 ( Comments from Andy Lutomirski )
 * Removed asm/pvclock.h 'pvclock_set_pvti_cpu0_va' definition
 for non !PARAVIRT_CLOCK to better track screwed Kconfig stuff.
 * Add his Acked-by (provided the previous adjustment was made)

Changes since RFC:
 (Comments from Andy Lutomirski)
 * Add __init to pvclock_set_pvti_cpu0_va
 * Add WARN_ON(vclock_was_used(VCLOCK_PVCLOCK)) to
 pvclock_set_pvti_cpu0_va
---
 arch/x86/include/asm/pvclock.h | 19 ++++++++++---------
 arch/x86/kernel/kvmclock.c     |  7 +------
 arch/x86/kernel/pvclock.c      | 14 ++++++++++++++
 3 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 448cfe1b48cf..6f228f90cdd7 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -4,15 +4,6 @@
 #include <linux/clocksource.h>
 #include <asm/pvclock-abi.h>
 
-#ifdef CONFIG_KVM_GUEST
-extern struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void);
-#else
-static inline struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
-{
-	return NULL;
-}
-#endif
-
 /* some helper functions for xen and kvm pv clock sources */
 u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
 u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src);
@@ -101,4 +92,14 @@ struct pvclock_vsyscall_time_info {
 
 #define PVTI_SIZE sizeof(struct pvclock_vsyscall_time_info)
 
+#ifdef CONFIG_PARAVIRT_CLOCK
+void pvclock_set_pvti_cpu0_va(struct pvclock_vsyscall_time_info *pvti);
+struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void);
+#else
+static inline struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
+{
+	return NULL;
+}
+#endif
+
 #endif /* _ASM_X86_PVCLOCK_H */
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index d88967659098..538738047ff5 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -47,12 +47,6 @@ early_param("no-kvmclock", parse_no_kvmclock);
 static struct pvclock_vsyscall_time_info *hv_clock;
 static struct pvclock_wall_clock wall_clock;
 
-struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
-{
-	return hv_clock;
-}
-EXPORT_SYMBOL_GPL(pvclock_pvti_cpu0_va);
-
 /*
  * The wallclock is the time of day when we booted. Since then, some time may
  * have elapsed since the hypervisor wrote the data. So we try to account for
@@ -334,6 +328,7 @@ int __init kvm_setup_vsyscall_timeinfo(void)
 		return 1;
 	}
 
+	pvclock_set_pvti_cpu0_va(hv_clock);
 	put_cpu();
 
 	kvm_clock.archdata.vclock_mode = VCLOCK_PVCLOCK;
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 5c3f6d6a5078..cb7d6d9c9c2d 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -25,8 +25,10 @@
 
 #include <asm/fixmap.h>
 #include <asm/pvclock.h>
+#include <asm/vgtod.h>
 
 static u8 valid_flags __read_mostly = 0;
+static struct pvclock_vsyscall_time_info *pvti_cpu0_va __read_mostly;
 
 void pvclock_set_flags(u8 flags)
 {
@@ -144,3 +146,15 @@ void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,
 
 	set_normalized_timespec(ts, now.tv_sec, now.tv_nsec);
 }
+
+void pvclock_set_pvti_cpu0_va(struct pvclock_vsyscall_time_info *pvti)
+{
+	WARN_ON(vclock_was_used(VCLOCK_PVCLOCK));
+	pvti_cpu0_va = pvti;
+}
+
+struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
+{
+	return pvti_cpu0_va;
+}
+EXPORT_SYMBOL_GPL(pvclock_pvti_cpu0_va);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 1/3] x86/pvclock: add setter for pvclock_pvti_cpu0_va
  2017-09-22 16:25 [PATCH v2 0/3] x86/xen: pvclock vdso support Joao Martins
@ 2017-09-22 16:25 ` Joao Martins
  2017-09-22 16:25 ` Joao Martins
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Joao Martins @ 2017-09-22 16:25 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: xen-devel, Radim Krcmar, x86, Andy Lutomirski, Ingo Molnar,
	H. Peter Anvin, Paolo Bonzini, Joao Martins, Thomas Gleixner

Right now there is only a pvclock_pvti_cpu0_va() which is defined
on kvmclock since:

commit dac16fba6fc5
("x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap")

The only user of this interface so far is kvm. This commit adds a
setter function for the pvti page and moves pvclock_pvti_cpu0_va
to pvclock, which is a more generic place to have it; and would
allow other PV clocksources to use it, such as Xen.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
---
Changes since v1:
 * Rebased: the only conflict was that I had move the export
 pvclock_pvti_cpu0_va() symbol as it is used by kvm PTP driver.
 * Do not initialize pvti_cpu0_va to NULL (checkpatch error)

 ( Comments from Andy Lutomirski )
 * Removed asm/pvclock.h 'pvclock_set_pvti_cpu0_va' definition
 for non !PARAVIRT_CLOCK to better track screwed Kconfig stuff.
 * Add his Acked-by (provided the previous adjustment was made)

Changes since RFC:
 (Comments from Andy Lutomirski)
 * Add __init to pvclock_set_pvti_cpu0_va
 * Add WARN_ON(vclock_was_used(VCLOCK_PVCLOCK)) to
 pvclock_set_pvti_cpu0_va
---
 arch/x86/include/asm/pvclock.h | 19 ++++++++++---------
 arch/x86/kernel/kvmclock.c     |  7 +------
 arch/x86/kernel/pvclock.c      | 14 ++++++++++++++
 3 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 448cfe1b48cf..6f228f90cdd7 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -4,15 +4,6 @@
 #include <linux/clocksource.h>
 #include <asm/pvclock-abi.h>
 
-#ifdef CONFIG_KVM_GUEST
-extern struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void);
-#else
-static inline struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
-{
-	return NULL;
-}
-#endif
-
 /* some helper functions for xen and kvm pv clock sources */
 u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
 u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src);
@@ -101,4 +92,14 @@ struct pvclock_vsyscall_time_info {
 
 #define PVTI_SIZE sizeof(struct pvclock_vsyscall_time_info)
 
+#ifdef CONFIG_PARAVIRT_CLOCK
+void pvclock_set_pvti_cpu0_va(struct pvclock_vsyscall_time_info *pvti);
+struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void);
+#else
+static inline struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
+{
+	return NULL;
+}
+#endif
+
 #endif /* _ASM_X86_PVCLOCK_H */
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index d88967659098..538738047ff5 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -47,12 +47,6 @@ early_param("no-kvmclock", parse_no_kvmclock);
 static struct pvclock_vsyscall_time_info *hv_clock;
 static struct pvclock_wall_clock wall_clock;
 
-struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
-{
-	return hv_clock;
-}
-EXPORT_SYMBOL_GPL(pvclock_pvti_cpu0_va);
-
 /*
  * The wallclock is the time of day when we booted. Since then, some time may
  * have elapsed since the hypervisor wrote the data. So we try to account for
@@ -334,6 +328,7 @@ int __init kvm_setup_vsyscall_timeinfo(void)
 		return 1;
 	}
 
+	pvclock_set_pvti_cpu0_va(hv_clock);
 	put_cpu();
 
 	kvm_clock.archdata.vclock_mode = VCLOCK_PVCLOCK;
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 5c3f6d6a5078..cb7d6d9c9c2d 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -25,8 +25,10 @@
 
 #include <asm/fixmap.h>
 #include <asm/pvclock.h>
+#include <asm/vgtod.h>
 
 static u8 valid_flags __read_mostly = 0;
+static struct pvclock_vsyscall_time_info *pvti_cpu0_va __read_mostly;
 
 void pvclock_set_flags(u8 flags)
 {
@@ -144,3 +146,15 @@ void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,
 
 	set_normalized_timespec(ts, now.tv_sec, now.tv_nsec);
 }
+
+void pvclock_set_pvti_cpu0_va(struct pvclock_vsyscall_time_info *pvti)
+{
+	WARN_ON(vclock_was_used(VCLOCK_PVCLOCK));
+	pvti_cpu0_va = pvti;
+}
+
+struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
+{
+	return pvti_cpu0_va;
+}
+EXPORT_SYMBOL_GPL(pvclock_pvti_cpu0_va);
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 2/3] x86/xen/time: setup vcpu 0 time info page
  2017-09-22 16:25 [PATCH v2 0/3] x86/xen: pvclock vdso support Joao Martins
  2017-09-22 16:25 ` [PATCH v2 1/3] x86/pvclock: add setter for pvclock_pvti_cpu0_va Joao Martins
  2017-09-22 16:25 ` Joao Martins
@ 2017-09-22 16:25 ` Joao Martins
  2017-09-26  9:32   ` Juergen Gross
  2017-09-26  9:32   ` Juergen Gross
  2017-09-22 16:25 ` Joao Martins
  2017-09-22 16:25   ` Joao Martins
  4 siblings, 2 replies; 13+ messages in thread
From: Joao Martins @ 2017-09-22 16:25 UTC (permalink / raw)
  To: linux-kernel, xen-devel
  Cc: Joao Martins, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Boris Ostrovsky, Juergen Gross, Andy Lutomirski

In order to support pvclock vdso on xen we need to setup the time
info page for vcpu 0 and register the page with Xen using the
VCPUOP_register_vcpu_time_memory_area hypercall. This hypercall
will also forcefully update the pvti which will set some of the
necessary flags for vdso. Afterwards we check if it supports the
PVCLOCK_TSC_STABLE_BIT flag which is mandatory for having
vdso/vsyscall support. And if so, it will set the cpu 0 pvti that
will be later on used when mapping the vdso image.

The xen headers are also updated to include the new hypercall for
registering the secondary vcpu_time_info struct.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Changes since v1:
 * Check flags ahead to see if the  primary clock can use
 PVCLOCK_TSC_STABLE_BIT even if secondary registration fails.

 (Comments from Boris)
 * Remove addr, addr variables;
 * Change first pr_debug to pr_warn;
 * Change last pr_debug to pr_notice;
 * Add routine to solely register secondary time info.
 * Move xen_clock to outside xen_setup_vsyscall_time_info to allow
 restore path to simply re-register secondary time info. Let us
 handle the restore path more gracefully without re-allocating a
 page.
 * Removed cpu argument from xen_setup_vsyscall_time_info()
 * Adjustment failed registration error messages/loglevel to be the same
 * Also teardown secondary time info on suspend

Changes since RFC:
 (Comments from Boris and David)
 * Remove Kconfig option
 * Use get_zeroed_page/free/page
 * Remove the hypercall availability check
 * Unregister pvti with arg.addr.v = NULL if stable bit isn't supported.
 (New)
 * Set secondary copy on restore such that it works on migration.
 * Drop global xen_clock variable and stash it locally on
 xen_setup_vsyscall_time_info.
 * WARN_ON(ret) if we fail to unregister the pvti.
---
 arch/x86/xen/suspend.c       |   4 ++
 arch/x86/xen/time.c          | 100 +++++++++++++++++++++++++++++++++++++++++++
 arch/x86/xen/xen-ops.h       |   2 +
 include/xen/interface/vcpu.h |  28 ++++++++++++
 4 files changed, 134 insertions(+)

diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index d6b1680693a9..800ed36ecfba 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -16,6 +16,8 @@
 
 void xen_arch_pre_suspend(void)
 {
+	xen_save_time_memory_area();
+
 	if (xen_pv_domain())
 		xen_pv_pre_suspend();
 }
@@ -26,6 +28,8 @@ void xen_arch_post_suspend(int cancelled)
 		xen_pv_post_suspend(cancelled);
 	else
 		xen_hvm_post_suspend(cancelled);
+
+	xen_restore_time_memory_area();
 }
 
 static void xen_vcpu_notify_restore(void *data)
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 1ecb05db3632..2924b97691c6 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -370,6 +370,105 @@ static const struct pv_time_ops xen_time_ops __initconst = {
 	.steal_clock = xen_steal_clock,
 };
 
+static struct pvclock_vsyscall_time_info *xen_clock __read_mostly;
+
+void xen_save_time_memory_area(void)
+{
+	struct vcpu_register_time_memory_area t;
+	int ret;
+
+	if (!xen_clock)
+		return;
+
+	t.addr.v = NULL;
+
+	ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
+	if (ret != 0)
+		pr_notice("Cannot save secondary vcpu_time_info (err %d)",
+			  ret);
+	else
+		clear_page(xen_clock);
+}
+
+void xen_restore_time_memory_area(void)
+{
+	struct vcpu_register_time_memory_area t;
+	int ret;
+
+	if (!xen_clock)
+		return;
+
+	t.addr.v = &xen_clock->pvti;
+
+	ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
+
+	/*
+	 * We don't disable VCLOCK_PVCLOCK entirely if it fails to register the
+	 * secondary time info with Xen or if we migrated to a host without the
+	 * necessary flags. On both of these cases what happens is either
+	 * process seeing a zeroed out pvti or seeing no PVCLOCK_TSC_STABLE_BIT
+	 * bit set. Userspace checks the latter and if 0, it discards the data
+	 * in pvti and fallbacks to a system call for a reliable timestamp.
+	 */
+	if (ret != 0)
+		pr_notice("Cannot restore secondary vcpu_time_info (err %d)",
+			  ret);
+}
+
+static void xen_setup_vsyscall_time_info(void)
+{
+	struct vcpu_register_time_memory_area t;
+	struct pvclock_vsyscall_time_info *ti;
+	struct pvclock_vcpu_time_info *pvti;
+	int ret;
+
+	pvti = &__this_cpu_read(xen_vcpu)->time;
+
+	/*
+	 * We check ahead on the primary time info if this
+	 * bit is supported hence speeding up Xen clocksource.
+	 */
+	if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))
+		return;
+
+	pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
+
+	ti = (struct pvclock_vsyscall_time_info *) get_zeroed_page(GFP_KERNEL);
+	if (!ti)
+		return;
+
+	t.addr.v = &ti->pvti;
+
+	ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
+	if (ret) {
+		pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret);
+		free_page((unsigned long) ti);
+		return;
+	}
+
+	/*
+	 * If the check above succedded this one should too since it's the
+	 * same data on both primary and secondary time infos just different
+	 * memory regions. But we still check it in case hypervisor is buggy.
+	 */
+	pvti = &ti->pvti;
+	if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)) {
+		t.addr.v = NULL;
+		ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area,
+					 0, &t);
+		if (!ret)
+			free_page((unsigned long) ti);
+
+		pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret);
+		return;
+	}
+
+	xen_clock = ti;
+	pvclock_set_pvti_cpu0_va(xen_clock);
+
+	xen_clocksource.archdata.vclock_mode = VCLOCK_PVCLOCK;
+}
+
 static void __init xen_time_init(void)
 {
 	int cpu = smp_processor_id();
@@ -396,6 +495,7 @@ static void __init xen_time_init(void)
 	setup_force_cpu_cap(X86_FEATURE_TSC);
 
 	xen_setup_runstate_info(cpu);
+	xen_setup_vsyscall_time_info();
 	xen_setup_timer(cpu);
 	xen_setup_cpu_clockevents();
 
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index c8a6d224f7ed..f96dbedb33d4 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -69,6 +69,8 @@ void xen_setup_runstate_info(int cpu);
 void xen_teardown_timer(int cpu);
 u64 xen_clocksource_read(void);
 void xen_setup_cpu_clockevents(void);
+void xen_save_time_memory_area(void);
+void xen_restore_time_memory_area(void);
 void __init xen_init_time_ops(void);
 void __init xen_hvm_init_time_ops(void);
 
diff --git a/include/xen/interface/vcpu.h b/include/xen/interface/vcpu.h
index 98188c87f5c1..8da788c5bd4f 100644
--- a/include/xen/interface/vcpu.h
+++ b/include/xen/interface/vcpu.h
@@ -178,4 +178,32 @@ DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_vcpu_info);
 
 /* Send an NMI to the specified VCPU. @extra_arg == NULL. */
 #define VCPUOP_send_nmi             11
+
+/*
+ * Register a memory location to get a secondary copy of the vcpu time
+ * parameters.  The master copy still exists as part of the vcpu shared
+ * memory area, and this secondary copy is updated whenever the master copy
+ * is updated (and using the same versioning scheme for synchronisation).
+ *
+ * The intent is that this copy may be mapped (RO) into userspace so
+ * that usermode can compute system time using the time info and the
+ * tsc.  Usermode will see an array of vcpu_time_info structures, one
+ * for each vcpu, and choose the right one by an existing mechanism
+ * which allows it to get the current vcpu number (such as via a
+ * segment limit).  It can then apply the normal algorithm to compute
+ * system time from the tsc.
+ *
+ * @extra_arg == pointer to vcpu_register_time_info_memory_area structure.
+ */
+#define VCPUOP_register_vcpu_time_memory_area   13
+DEFINE_GUEST_HANDLE_STRUCT(vcpu_time_info_t);
+struct vcpu_register_time_memory_area {
+	union {
+		GUEST_HANDLE(vcpu_time_info_t) h;
+		struct pvclock_vcpu_time_info *v;
+		uint64_t p;
+	} addr;
+};
+DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_time_memory_area_t);
+
 #endif /* __XEN_PUBLIC_VCPU_H__ */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 2/3] x86/xen/time: setup vcpu 0 time info page
  2017-09-22 16:25 [PATCH v2 0/3] x86/xen: pvclock vdso support Joao Martins
                   ` (2 preceding siblings ...)
  2017-09-22 16:25 ` [PATCH v2 2/3] x86/xen/time: setup vcpu 0 time info page Joao Martins
@ 2017-09-22 16:25 ` Joao Martins
  2017-09-22 16:25   ` Joao Martins
  4 siblings, 0 replies; 13+ messages in thread
From: Joao Martins @ 2017-09-22 16:25 UTC (permalink / raw)
  To: linux-kernel, xen-devel
  Cc: Juergen Gross, x86, Andy Lutomirski, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Boris Ostrovsky, Joao Martins

In order to support pvclock vdso on xen we need to setup the time
info page for vcpu 0 and register the page with Xen using the
VCPUOP_register_vcpu_time_memory_area hypercall. This hypercall
will also forcefully update the pvti which will set some of the
necessary flags for vdso. Afterwards we check if it supports the
PVCLOCK_TSC_STABLE_BIT flag which is mandatory for having
vdso/vsyscall support. And if so, it will set the cpu 0 pvti that
will be later on used when mapping the vdso image.

The xen headers are also updated to include the new hypercall for
registering the secondary vcpu_time_info struct.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Changes since v1:
 * Check flags ahead to see if the  primary clock can use
 PVCLOCK_TSC_STABLE_BIT even if secondary registration fails.

 (Comments from Boris)
 * Remove addr, addr variables;
 * Change first pr_debug to pr_warn;
 * Change last pr_debug to pr_notice;
 * Add routine to solely register secondary time info.
 * Move xen_clock to outside xen_setup_vsyscall_time_info to allow
 restore path to simply re-register secondary time info. Let us
 handle the restore path more gracefully without re-allocating a
 page.
 * Removed cpu argument from xen_setup_vsyscall_time_info()
 * Adjustment failed registration error messages/loglevel to be the same
 * Also teardown secondary time info on suspend

Changes since RFC:
 (Comments from Boris and David)
 * Remove Kconfig option
 * Use get_zeroed_page/free/page
 * Remove the hypercall availability check
 * Unregister pvti with arg.addr.v = NULL if stable bit isn't supported.
 (New)
 * Set secondary copy on restore such that it works on migration.
 * Drop global xen_clock variable and stash it locally on
 xen_setup_vsyscall_time_info.
 * WARN_ON(ret) if we fail to unregister the pvti.
---
 arch/x86/xen/suspend.c       |   4 ++
 arch/x86/xen/time.c          | 100 +++++++++++++++++++++++++++++++++++++++++++
 arch/x86/xen/xen-ops.h       |   2 +
 include/xen/interface/vcpu.h |  28 ++++++++++++
 4 files changed, 134 insertions(+)

diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index d6b1680693a9..800ed36ecfba 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -16,6 +16,8 @@
 
 void xen_arch_pre_suspend(void)
 {
+	xen_save_time_memory_area();
+
 	if (xen_pv_domain())
 		xen_pv_pre_suspend();
 }
@@ -26,6 +28,8 @@ void xen_arch_post_suspend(int cancelled)
 		xen_pv_post_suspend(cancelled);
 	else
 		xen_hvm_post_suspend(cancelled);
+
+	xen_restore_time_memory_area();
 }
 
 static void xen_vcpu_notify_restore(void *data)
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 1ecb05db3632..2924b97691c6 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -370,6 +370,105 @@ static const struct pv_time_ops xen_time_ops __initconst = {
 	.steal_clock = xen_steal_clock,
 };
 
+static struct pvclock_vsyscall_time_info *xen_clock __read_mostly;
+
+void xen_save_time_memory_area(void)
+{
+	struct vcpu_register_time_memory_area t;
+	int ret;
+
+	if (!xen_clock)
+		return;
+
+	t.addr.v = NULL;
+
+	ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
+	if (ret != 0)
+		pr_notice("Cannot save secondary vcpu_time_info (err %d)",
+			  ret);
+	else
+		clear_page(xen_clock);
+}
+
+void xen_restore_time_memory_area(void)
+{
+	struct vcpu_register_time_memory_area t;
+	int ret;
+
+	if (!xen_clock)
+		return;
+
+	t.addr.v = &xen_clock->pvti;
+
+	ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
+
+	/*
+	 * We don't disable VCLOCK_PVCLOCK entirely if it fails to register the
+	 * secondary time info with Xen or if we migrated to a host without the
+	 * necessary flags. On both of these cases what happens is either
+	 * process seeing a zeroed out pvti or seeing no PVCLOCK_TSC_STABLE_BIT
+	 * bit set. Userspace checks the latter and if 0, it discards the data
+	 * in pvti and fallbacks to a system call for a reliable timestamp.
+	 */
+	if (ret != 0)
+		pr_notice("Cannot restore secondary vcpu_time_info (err %d)",
+			  ret);
+}
+
+static void xen_setup_vsyscall_time_info(void)
+{
+	struct vcpu_register_time_memory_area t;
+	struct pvclock_vsyscall_time_info *ti;
+	struct pvclock_vcpu_time_info *pvti;
+	int ret;
+
+	pvti = &__this_cpu_read(xen_vcpu)->time;
+
+	/*
+	 * We check ahead on the primary time info if this
+	 * bit is supported hence speeding up Xen clocksource.
+	 */
+	if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))
+		return;
+
+	pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
+
+	ti = (struct pvclock_vsyscall_time_info *) get_zeroed_page(GFP_KERNEL);
+	if (!ti)
+		return;
+
+	t.addr.v = &ti->pvti;
+
+	ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
+	if (ret) {
+		pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret);
+		free_page((unsigned long) ti);
+		return;
+	}
+
+	/*
+	 * If the check above succedded this one should too since it's the
+	 * same data on both primary and secondary time infos just different
+	 * memory regions. But we still check it in case hypervisor is buggy.
+	 */
+	pvti = &ti->pvti;
+	if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)) {
+		t.addr.v = NULL;
+		ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area,
+					 0, &t);
+		if (!ret)
+			free_page((unsigned long) ti);
+
+		pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret);
+		return;
+	}
+
+	xen_clock = ti;
+	pvclock_set_pvti_cpu0_va(xen_clock);
+
+	xen_clocksource.archdata.vclock_mode = VCLOCK_PVCLOCK;
+}
+
 static void __init xen_time_init(void)
 {
 	int cpu = smp_processor_id();
@@ -396,6 +495,7 @@ static void __init xen_time_init(void)
 	setup_force_cpu_cap(X86_FEATURE_TSC);
 
 	xen_setup_runstate_info(cpu);
+	xen_setup_vsyscall_time_info();
 	xen_setup_timer(cpu);
 	xen_setup_cpu_clockevents();
 
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index c8a6d224f7ed..f96dbedb33d4 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -69,6 +69,8 @@ void xen_setup_runstate_info(int cpu);
 void xen_teardown_timer(int cpu);
 u64 xen_clocksource_read(void);
 void xen_setup_cpu_clockevents(void);
+void xen_save_time_memory_area(void);
+void xen_restore_time_memory_area(void);
 void __init xen_init_time_ops(void);
 void __init xen_hvm_init_time_ops(void);
 
diff --git a/include/xen/interface/vcpu.h b/include/xen/interface/vcpu.h
index 98188c87f5c1..8da788c5bd4f 100644
--- a/include/xen/interface/vcpu.h
+++ b/include/xen/interface/vcpu.h
@@ -178,4 +178,32 @@ DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_vcpu_info);
 
 /* Send an NMI to the specified VCPU. @extra_arg == NULL. */
 #define VCPUOP_send_nmi             11
+
+/*
+ * Register a memory location to get a secondary copy of the vcpu time
+ * parameters.  The master copy still exists as part of the vcpu shared
+ * memory area, and this secondary copy is updated whenever the master copy
+ * is updated (and using the same versioning scheme for synchronisation).
+ *
+ * The intent is that this copy may be mapped (RO) into userspace so
+ * that usermode can compute system time using the time info and the
+ * tsc.  Usermode will see an array of vcpu_time_info structures, one
+ * for each vcpu, and choose the right one by an existing mechanism
+ * which allows it to get the current vcpu number (such as via a
+ * segment limit).  It can then apply the normal algorithm to compute
+ * system time from the tsc.
+ *
+ * @extra_arg == pointer to vcpu_register_time_info_memory_area structure.
+ */
+#define VCPUOP_register_vcpu_time_memory_area   13
+DEFINE_GUEST_HANDLE_STRUCT(vcpu_time_info_t);
+struct vcpu_register_time_memory_area {
+	union {
+		GUEST_HANDLE(vcpu_time_info_t) h;
+		struct pvclock_vcpu_time_info *v;
+		uint64_t p;
+	} addr;
+};
+DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_time_memory_area_t);
+
 #endif /* __XEN_PUBLIC_VCPU_H__ */
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 3/3] MAINTAINERS: xen, kvm: track pvclock-abi.h changes
  2017-09-22 16:25 [PATCH v2 0/3] x86/xen: pvclock vdso support Joao Martins
@ 2017-09-22 16:25   ` Joao Martins
  2017-09-22 16:25 ` Joao Martins
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Joao Martins @ 2017-09-22 16:25 UTC (permalink / raw)
  To: linux-kernel, xen-devel, kvm
  Cc: Joao Martins, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krcmar, Andy Lutomirski

This file defines an ABI shared between guest and hypervisor(s)
(KVM, Xen) and as such there should be an correspondent entry in
MAINTAINERS file. Notice that there's already a text notice at the
top of the header file, hence this commit simply enforces it more
explicitly and have both peers noticed when such changes happen.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Juergen Gross <jgross@suse.com>
---
Out of the two options (and provided I was given a choice) I choose the
originally posted because this is so far the only ABI shared between Xen/KVM.
Whenever we have more things shared it would probably deserve moving into its
own section in MAINTAINERS file. If my thinking is wrong, I can switch to the
alternative i.e. a "PARAVIRT ABIS" section.

Changes since v1:
 * Add Juergen Gross Acked-by.
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2281af4b41b6..5a6c26c298b1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7592,6 +7592,7 @@ S:	Supported
 F:	arch/x86/kvm/
 F:	arch/x86/include/uapi/asm/kvm*
 F:	arch/x86/include/asm/kvm*
+F:	arch/x86/include/asm/pvclock-abi.h
 F:	arch/x86/kernel/kvm.c
 F:	arch/x86/kernel/kvmclock.c
 
@@ -14708,6 +14709,7 @@ F:	arch/x86/xen/
 F:	drivers/*/xen-*front.c
 F:	drivers/xen/
 F:	arch/x86/include/asm/xen/
+F:	arch/x86/include/asm/pvclock-abi.h
 F:	include/xen/
 F:	include/uapi/xen/
 F:	Documentation/ABI/stable/sysfs-hypervisor-xen
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 3/3] MAINTAINERS: xen, kvm: track pvclock-abi.h changes
@ 2017-09-22 16:25   ` Joao Martins
  0 siblings, 0 replies; 13+ messages in thread
From: Joao Martins @ 2017-09-22 16:25 UTC (permalink / raw)
  To: linux-kernel, xen-devel, kvm
  Cc: Juergen Gross, Radim Krcmar, Andy Lutomirski, Paolo Bonzini,
	Joao Martins, Boris Ostrovsky

This file defines an ABI shared between guest and hypervisor(s)
(KVM, Xen) and as such there should be an correspondent entry in
MAINTAINERS file. Notice that there's already a text notice at the
top of the header file, hence this commit simply enforces it more
explicitly and have both peers noticed when such changes happen.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Juergen Gross <jgross@suse.com>
---
Out of the two options (and provided I was given a choice) I choose the
originally posted because this is so far the only ABI shared between Xen/KVM.
Whenever we have more things shared it would probably deserve moving into its
own section in MAINTAINERS file. If my thinking is wrong, I can switch to the
alternative i.e. a "PARAVIRT ABIS" section.

Changes since v1:
 * Add Juergen Gross Acked-by.
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2281af4b41b6..5a6c26c298b1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7592,6 +7592,7 @@ S:	Supported
 F:	arch/x86/kvm/
 F:	arch/x86/include/uapi/asm/kvm*
 F:	arch/x86/include/asm/kvm*
+F:	arch/x86/include/asm/pvclock-abi.h
 F:	arch/x86/kernel/kvm.c
 F:	arch/x86/kernel/kvmclock.c
 
@@ -14708,6 +14709,7 @@ F:	arch/x86/xen/
 F:	drivers/*/xen-*front.c
 F:	drivers/xen/
 F:	arch/x86/include/asm/xen/
+F:	arch/x86/include/asm/pvclock-abi.h
 F:	include/xen/
 F:	include/uapi/xen/
 F:	Documentation/ABI/stable/sysfs-hypervisor-xen
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/3] x86/xen/time: setup vcpu 0 time info page
  2017-09-22 16:25 ` [PATCH v2 2/3] x86/xen/time: setup vcpu 0 time info page Joao Martins
  2017-09-26  9:32   ` Juergen Gross
@ 2017-09-26  9:32   ` Juergen Gross
  2017-09-26  9:57       ` Joao Martins
  1 sibling, 1 reply; 13+ messages in thread
From: Juergen Gross @ 2017-09-26  9:32 UTC (permalink / raw)
  To: Joao Martins, linux-kernel, xen-devel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Boris Ostrovsky, Andy Lutomirski

On 22/09/17 18:25, Joao Martins wrote:
> In order to support pvclock vdso on xen we need to setup the time
> info page for vcpu 0 and register the page with Xen using the
> VCPUOP_register_vcpu_time_memory_area hypercall. This hypercall
> will also forcefully update the pvti which will set some of the
> necessary flags for vdso. Afterwards we check if it supports the
> PVCLOCK_TSC_STABLE_BIT flag which is mandatory for having
> vdso/vsyscall support. And if so, it will set the cpu 0 pvti that
> will be later on used when mapping the vdso image.
> 
> The xen headers are also updated to include the new hypercall for
> registering the secondary vcpu_time_info struct.
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> Changes since v1:
>  * Check flags ahead to see if the  primary clock can use
>  PVCLOCK_TSC_STABLE_BIT even if secondary registration fails.
> 
>  (Comments from Boris)
>  * Remove addr, addr variables;
>  * Change first pr_debug to pr_warn;
>  * Change last pr_debug to pr_notice;
>  * Add routine to solely register secondary time info.
>  * Move xen_clock to outside xen_setup_vsyscall_time_info to allow
>  restore path to simply re-register secondary time info. Let us
>  handle the restore path more gracefully without re-allocating a
>  page.
>  * Removed cpu argument from xen_setup_vsyscall_time_info()
>  * Adjustment failed registration error messages/loglevel to be the same
>  * Also teardown secondary time info on suspend
> 
> Changes since RFC:
>  (Comments from Boris and David)
>  * Remove Kconfig option
>  * Use get_zeroed_page/free/page
>  * Remove the hypercall availability check
>  * Unregister pvti with arg.addr.v = NULL if stable bit isn't supported.
>  (New)
>  * Set secondary copy on restore such that it works on migration.
>  * Drop global xen_clock variable and stash it locally on
>  xen_setup_vsyscall_time_info.
>  * WARN_ON(ret) if we fail to unregister the pvti.
> ---
>  arch/x86/xen/suspend.c       |   4 ++
>  arch/x86/xen/time.c          | 100 +++++++++++++++++++++++++++++++++++++++++++
>  arch/x86/xen/xen-ops.h       |   2 +
>  include/xen/interface/vcpu.h |  28 ++++++++++++
>  4 files changed, 134 insertions(+)
> 
> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
> index d6b1680693a9..800ed36ecfba 100644
> --- a/arch/x86/xen/suspend.c
> +++ b/arch/x86/xen/suspend.c
> @@ -16,6 +16,8 @@
>  
>  void xen_arch_pre_suspend(void)
>  {
> +	xen_save_time_memory_area();
> +
>  	if (xen_pv_domain())
>  		xen_pv_pre_suspend();
>  }
> @@ -26,6 +28,8 @@ void xen_arch_post_suspend(int cancelled)
>  		xen_pv_post_suspend(cancelled);
>  	else
>  		xen_hvm_post_suspend(cancelled);
> +
> +	xen_restore_time_memory_area();
>  }
>  
>  static void xen_vcpu_notify_restore(void *data)
> diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
> index 1ecb05db3632..2924b97691c6 100644
> --- a/arch/x86/xen/time.c
> +++ b/arch/x86/xen/time.c
> @@ -370,6 +370,105 @@ static const struct pv_time_ops xen_time_ops __initconst = {
>  	.steal_clock = xen_steal_clock,
>  };
>  
> +static struct pvclock_vsyscall_time_info *xen_clock __read_mostly;
> +
> +void xen_save_time_memory_area(void)
> +{
> +	struct vcpu_register_time_memory_area t;
> +	int ret;
> +
> +	if (!xen_clock)
> +		return;
> +
> +	t.addr.v = NULL;
> +
> +	ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
> +	if (ret != 0)
> +		pr_notice("Cannot save secondary vcpu_time_info (err %d)",
> +			  ret);
> +	else
> +		clear_page(xen_clock);
> +}
> +
> +void xen_restore_time_memory_area(void)
> +{
> +	struct vcpu_register_time_memory_area t;
> +	int ret;
> +
> +	if (!xen_clock)
> +		return;
> +
> +	t.addr.v = &xen_clock->pvti;
> +
> +	ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
> +
> +	/*
> +	 * We don't disable VCLOCK_PVCLOCK entirely if it fails to register the
> +	 * secondary time info with Xen or if we migrated to a host without the
> +	 * necessary flags. On both of these cases what happens is either
> +	 * process seeing a zeroed out pvti or seeing no PVCLOCK_TSC_STABLE_BIT
> +	 * bit set. Userspace checks the latter and if 0, it discards the data
> +	 * in pvti and fallbacks to a system call for a reliable timestamp.
> +	 */
> +	if (ret != 0)
> +		pr_notice("Cannot restore secondary vcpu_time_info (err %d)",
> +			  ret);
> +}
> +
> +static void xen_setup_vsyscall_time_info(void)
> +{
> +	struct vcpu_register_time_memory_area t;
> +	struct pvclock_vsyscall_time_info *ti;
> +	struct pvclock_vcpu_time_info *pvti;
> +	int ret;
> +
> +	pvti = &__this_cpu_read(xen_vcpu)->time;
> +
> +	/*
> +	 * We check ahead on the primary time info if this
> +	 * bit is supported hence speeding up Xen clocksource.
> +	 */
> +	if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))
> +		return;
> +
> +	pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
> +
> +	ti = (struct pvclock_vsyscall_time_info *) get_zeroed_page(GFP_KERNEL);

Coding style: omit the blank after the cast.

> +	if (!ti)
> +		return;
> +
> +	t.addr.v = &ti->pvti;
> +
> +	ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
> +	if (ret) {
> +		pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret);
> +		free_page((unsigned long) ti);

Coding style again, once more below.

> +		return;
> +	}
> +
> +	/*
> +	 * If the check above succedded this one should too since it's the
> +	 * same data on both primary and secondary time infos just different
> +	 * memory regions. But we still check it in case hypervisor is buggy.
> +	 */
> +	pvti = &ti->pvti;
> +	if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)) {
> +		t.addr.v = NULL;
> +		ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area,
> +					 0, &t);
> +		if (!ret)
> +			free_page((unsigned long) ti);
> +
> +		pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret);

Mind making the message more descriptive? E.g. instead of reporting
"(err 0)" just telling "(tsc unstable)"?

> +		return;
> +	}
> +
> +	xen_clock = ti;
> +	pvclock_set_pvti_cpu0_va(xen_clock);
> +
> +	xen_clocksource.archdata.vclock_mode = VCLOCK_PVCLOCK;
> +}
> +
>  static void __init xen_time_init(void)
>  {
>  	int cpu = smp_processor_id();
> @@ -396,6 +495,7 @@ static void __init xen_time_init(void)
>  	setup_force_cpu_cap(X86_FEATURE_TSC);
>  
>  	xen_setup_runstate_info(cpu);
> +	xen_setup_vsyscall_time_info();
>  	xen_setup_timer(cpu);
>  	xen_setup_cpu_clockevents();
>  
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index c8a6d224f7ed..f96dbedb33d4 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -69,6 +69,8 @@ void xen_setup_runstate_info(int cpu);
>  void xen_teardown_timer(int cpu);
>  u64 xen_clocksource_read(void);
>  void xen_setup_cpu_clockevents(void);
> +void xen_save_time_memory_area(void);
> +void xen_restore_time_memory_area(void);
>  void __init xen_init_time_ops(void);
>  void __init xen_hvm_init_time_ops(void);
>  
> diff --git a/include/xen/interface/vcpu.h b/include/xen/interface/vcpu.h
> index 98188c87f5c1..8da788c5bd4f 100644
> --- a/include/xen/interface/vcpu.h
> +++ b/include/xen/interface/vcpu.h
> @@ -178,4 +178,32 @@ DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_vcpu_info);
>  
>  /* Send an NMI to the specified VCPU. @extra_arg == NULL. */
>  #define VCPUOP_send_nmi             11
> +
> +/*
> + * Register a memory location to get a secondary copy of the vcpu time
> + * parameters.  The master copy still exists as part of the vcpu shared
> + * memory area, and this secondary copy is updated whenever the master copy
> + * is updated (and using the same versioning scheme for synchronisation).
> + *
> + * The intent is that this copy may be mapped (RO) into userspace so
> + * that usermode can compute system time using the time info and the
> + * tsc.  Usermode will see an array of vcpu_time_info structures, one
> + * for each vcpu, and choose the right one by an existing mechanism
> + * which allows it to get the current vcpu number (such as via a
> + * segment limit).  It can then apply the normal algorithm to compute
> + * system time from the tsc.
> + *
> + * @extra_arg == pointer to vcpu_register_time_info_memory_area structure.
> + */
> +#define VCPUOP_register_vcpu_time_memory_area   13
> +DEFINE_GUEST_HANDLE_STRUCT(vcpu_time_info_t);
> +struct vcpu_register_time_memory_area {
> +	union {
> +		GUEST_HANDLE(vcpu_time_info_t) h;
> +		struct pvclock_vcpu_time_info *v;
> +		uint64_t p;
> +	} addr;
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_time_memory_area_t);
> +

Instead of adding only the operation you need, maybe you could sync just
the complete header from Xen?


Juergen

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/3] x86/xen/time: setup vcpu 0 time info page
  2017-09-22 16:25 ` [PATCH v2 2/3] x86/xen/time: setup vcpu 0 time info page Joao Martins
@ 2017-09-26  9:32   ` Juergen Gross
  2017-09-26  9:32   ` Juergen Gross
  1 sibling, 0 replies; 13+ messages in thread
From: Juergen Gross @ 2017-09-26  9:32 UTC (permalink / raw)
  To: Joao Martins, linux-kernel, xen-devel
  Cc: x86, Andy Lutomirski, Ingo Molnar, H. Peter Anvin,
	Boris Ostrovsky, Thomas Gleixner

On 22/09/17 18:25, Joao Martins wrote:
> In order to support pvclock vdso on xen we need to setup the time
> info page for vcpu 0 and register the page with Xen using the
> VCPUOP_register_vcpu_time_memory_area hypercall. This hypercall
> will also forcefully update the pvti which will set some of the
> necessary flags for vdso. Afterwards we check if it supports the
> PVCLOCK_TSC_STABLE_BIT flag which is mandatory for having
> vdso/vsyscall support. And if so, it will set the cpu 0 pvti that
> will be later on used when mapping the vdso image.
> 
> The xen headers are also updated to include the new hypercall for
> registering the secondary vcpu_time_info struct.
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> Changes since v1:
>  * Check flags ahead to see if the  primary clock can use
>  PVCLOCK_TSC_STABLE_BIT even if secondary registration fails.
> 
>  (Comments from Boris)
>  * Remove addr, addr variables;
>  * Change first pr_debug to pr_warn;
>  * Change last pr_debug to pr_notice;
>  * Add routine to solely register secondary time info.
>  * Move xen_clock to outside xen_setup_vsyscall_time_info to allow
>  restore path to simply re-register secondary time info. Let us
>  handle the restore path more gracefully without re-allocating a
>  page.
>  * Removed cpu argument from xen_setup_vsyscall_time_info()
>  * Adjustment failed registration error messages/loglevel to be the same
>  * Also teardown secondary time info on suspend
> 
> Changes since RFC:
>  (Comments from Boris and David)
>  * Remove Kconfig option
>  * Use get_zeroed_page/free/page
>  * Remove the hypercall availability check
>  * Unregister pvti with arg.addr.v = NULL if stable bit isn't supported.
>  (New)
>  * Set secondary copy on restore such that it works on migration.
>  * Drop global xen_clock variable and stash it locally on
>  xen_setup_vsyscall_time_info.
>  * WARN_ON(ret) if we fail to unregister the pvti.
> ---
>  arch/x86/xen/suspend.c       |   4 ++
>  arch/x86/xen/time.c          | 100 +++++++++++++++++++++++++++++++++++++++++++
>  arch/x86/xen/xen-ops.h       |   2 +
>  include/xen/interface/vcpu.h |  28 ++++++++++++
>  4 files changed, 134 insertions(+)
> 
> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
> index d6b1680693a9..800ed36ecfba 100644
> --- a/arch/x86/xen/suspend.c
> +++ b/arch/x86/xen/suspend.c
> @@ -16,6 +16,8 @@
>  
>  void xen_arch_pre_suspend(void)
>  {
> +	xen_save_time_memory_area();
> +
>  	if (xen_pv_domain())
>  		xen_pv_pre_suspend();
>  }
> @@ -26,6 +28,8 @@ void xen_arch_post_suspend(int cancelled)
>  		xen_pv_post_suspend(cancelled);
>  	else
>  		xen_hvm_post_suspend(cancelled);
> +
> +	xen_restore_time_memory_area();
>  }
>  
>  static void xen_vcpu_notify_restore(void *data)
> diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
> index 1ecb05db3632..2924b97691c6 100644
> --- a/arch/x86/xen/time.c
> +++ b/arch/x86/xen/time.c
> @@ -370,6 +370,105 @@ static const struct pv_time_ops xen_time_ops __initconst = {
>  	.steal_clock = xen_steal_clock,
>  };
>  
> +static struct pvclock_vsyscall_time_info *xen_clock __read_mostly;
> +
> +void xen_save_time_memory_area(void)
> +{
> +	struct vcpu_register_time_memory_area t;
> +	int ret;
> +
> +	if (!xen_clock)
> +		return;
> +
> +	t.addr.v = NULL;
> +
> +	ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
> +	if (ret != 0)
> +		pr_notice("Cannot save secondary vcpu_time_info (err %d)",
> +			  ret);
> +	else
> +		clear_page(xen_clock);
> +}
> +
> +void xen_restore_time_memory_area(void)
> +{
> +	struct vcpu_register_time_memory_area t;
> +	int ret;
> +
> +	if (!xen_clock)
> +		return;
> +
> +	t.addr.v = &xen_clock->pvti;
> +
> +	ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
> +
> +	/*
> +	 * We don't disable VCLOCK_PVCLOCK entirely if it fails to register the
> +	 * secondary time info with Xen or if we migrated to a host without the
> +	 * necessary flags. On both of these cases what happens is either
> +	 * process seeing a zeroed out pvti or seeing no PVCLOCK_TSC_STABLE_BIT
> +	 * bit set. Userspace checks the latter and if 0, it discards the data
> +	 * in pvti and fallbacks to a system call for a reliable timestamp.
> +	 */
> +	if (ret != 0)
> +		pr_notice("Cannot restore secondary vcpu_time_info (err %d)",
> +			  ret);
> +}
> +
> +static void xen_setup_vsyscall_time_info(void)
> +{
> +	struct vcpu_register_time_memory_area t;
> +	struct pvclock_vsyscall_time_info *ti;
> +	struct pvclock_vcpu_time_info *pvti;
> +	int ret;
> +
> +	pvti = &__this_cpu_read(xen_vcpu)->time;
> +
> +	/*
> +	 * We check ahead on the primary time info if this
> +	 * bit is supported hence speeding up Xen clocksource.
> +	 */
> +	if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))
> +		return;
> +
> +	pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
> +
> +	ti = (struct pvclock_vsyscall_time_info *) get_zeroed_page(GFP_KERNEL);

Coding style: omit the blank after the cast.

> +	if (!ti)
> +		return;
> +
> +	t.addr.v = &ti->pvti;
> +
> +	ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
> +	if (ret) {
> +		pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret);
> +		free_page((unsigned long) ti);

Coding style again, once more below.

> +		return;
> +	}
> +
> +	/*
> +	 * If the check above succedded this one should too since it's the
> +	 * same data on both primary and secondary time infos just different
> +	 * memory regions. But we still check it in case hypervisor is buggy.
> +	 */
> +	pvti = &ti->pvti;
> +	if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)) {
> +		t.addr.v = NULL;
> +		ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area,
> +					 0, &t);
> +		if (!ret)
> +			free_page((unsigned long) ti);
> +
> +		pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret);

Mind making the message more descriptive? E.g. instead of reporting
"(err 0)" just telling "(tsc unstable)"?

> +		return;
> +	}
> +
> +	xen_clock = ti;
> +	pvclock_set_pvti_cpu0_va(xen_clock);
> +
> +	xen_clocksource.archdata.vclock_mode = VCLOCK_PVCLOCK;
> +}
> +
>  static void __init xen_time_init(void)
>  {
>  	int cpu = smp_processor_id();
> @@ -396,6 +495,7 @@ static void __init xen_time_init(void)
>  	setup_force_cpu_cap(X86_FEATURE_TSC);
>  
>  	xen_setup_runstate_info(cpu);
> +	xen_setup_vsyscall_time_info();
>  	xen_setup_timer(cpu);
>  	xen_setup_cpu_clockevents();
>  
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index c8a6d224f7ed..f96dbedb33d4 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -69,6 +69,8 @@ void xen_setup_runstate_info(int cpu);
>  void xen_teardown_timer(int cpu);
>  u64 xen_clocksource_read(void);
>  void xen_setup_cpu_clockevents(void);
> +void xen_save_time_memory_area(void);
> +void xen_restore_time_memory_area(void);
>  void __init xen_init_time_ops(void);
>  void __init xen_hvm_init_time_ops(void);
>  
> diff --git a/include/xen/interface/vcpu.h b/include/xen/interface/vcpu.h
> index 98188c87f5c1..8da788c5bd4f 100644
> --- a/include/xen/interface/vcpu.h
> +++ b/include/xen/interface/vcpu.h
> @@ -178,4 +178,32 @@ DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_vcpu_info);
>  
>  /* Send an NMI to the specified VCPU. @extra_arg == NULL. */
>  #define VCPUOP_send_nmi             11
> +
> +/*
> + * Register a memory location to get a secondary copy of the vcpu time
> + * parameters.  The master copy still exists as part of the vcpu shared
> + * memory area, and this secondary copy is updated whenever the master copy
> + * is updated (and using the same versioning scheme for synchronisation).
> + *
> + * The intent is that this copy may be mapped (RO) into userspace so
> + * that usermode can compute system time using the time info and the
> + * tsc.  Usermode will see an array of vcpu_time_info structures, one
> + * for each vcpu, and choose the right one by an existing mechanism
> + * which allows it to get the current vcpu number (such as via a
> + * segment limit).  It can then apply the normal algorithm to compute
> + * system time from the tsc.
> + *
> + * @extra_arg == pointer to vcpu_register_time_info_memory_area structure.
> + */
> +#define VCPUOP_register_vcpu_time_memory_area   13
> +DEFINE_GUEST_HANDLE_STRUCT(vcpu_time_info_t);
> +struct vcpu_register_time_memory_area {
> +	union {
> +		GUEST_HANDLE(vcpu_time_info_t) h;
> +		struct pvclock_vcpu_time_info *v;
> +		uint64_t p;
> +	} addr;
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_time_memory_area_t);
> +

Instead of adding only the operation you need, maybe you could sync just
the complete header from Xen?


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/3] x86/xen/time: setup vcpu 0 time info page
  2017-09-26  9:32   ` Juergen Gross
@ 2017-09-26  9:57       ` Joao Martins
  0 siblings, 0 replies; 13+ messages in thread
From: Joao Martins @ 2017-09-26  9:57 UTC (permalink / raw)
  To: Juergen Gross
  Cc: linux-kernel, xen-devel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Boris Ostrovsky, Andy Lutomirski

On 09/26/2017 10:32 AM, Juergen Gross wrote:
> On 22/09/17 18:25, Joao Martins wrote:
[snip]
>> +static void xen_setup_vsyscall_time_info(void)
>> +{
>> +	struct vcpu_register_time_memory_area t;
>> +	struct pvclock_vsyscall_time_info *ti;
>> +	struct pvclock_vcpu_time_info *pvti;
>> +	int ret;
>> +
>> +	pvti = &__this_cpu_read(xen_vcpu)->time;
>> +
>> +	/*
>> +	 * We check ahead on the primary time info if this
>> +	 * bit is supported hence speeding up Xen clocksource.
>> +	 */
>> +	if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))
>> +		return;
>> +
>> +	pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
>> +
>> +	ti = (struct pvclock_vsyscall_time_info *) get_zeroed_page(GFP_KERNEL);
> 
> Coding style: omit the blank after the cast.
> 
OK.

>> +	if (!ti)
>> +		return;
>> +
>> +	t.addr.v = &ti->pvti;
>> +
>> +	ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
>> +	if (ret) {
>> +		pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret);
>> +		free_page((unsigned long) ti);
> 
> Coding style again, once more below.
> 
OK.

>> +		return;
>> +	}
>> +
>> +	/*
>> +	 * If the check above succedded this one should too since it's the
>> +	 * same data on both primary and secondary time infos just different
>> +	 * memory regions. But we still check it in case hypervisor is buggy.
>> +	 */
>> +	pvti = &ti->pvti;
>> +	if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)) {
>> +		t.addr.v = NULL;
>> +		ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area,
>> +					 0, &t);
>> +		if (!ret)
>> +			free_page((unsigned long) ti);
>> +
>> +		pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret);
> 
> Mind making the message more descriptive? E.g. instead of reporting
> "(err 0)" just telling "(tsc unstable)"?
> 
Got it.

>> +		return;
>> +	}
>> +
>> +	xen_clock = ti;
>> +	pvclock_set_pvti_cpu0_va(xen_clock);
>> +
>> +	xen_clocksource.archdata.vclock_mode = VCLOCK_PVCLOCK;
>> +}
>> +
>>  static void __init xen_time_init(void)
>>  {
>>  	int cpu = smp_processor_id();
>> @@ -396,6 +495,7 @@ static void __init xen_time_init(void)
>>  	setup_force_cpu_cap(X86_FEATURE_TSC);
>>  
>>  	xen_setup_runstate_info(cpu);
>> +	xen_setup_vsyscall_time_info();
>>  	xen_setup_timer(cpu);
>>  	xen_setup_cpu_clockevents();
>>  
>> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
>> index c8a6d224f7ed..f96dbedb33d4 100644
>> --- a/arch/x86/xen/xen-ops.h
>> +++ b/arch/x86/xen/xen-ops.h
>> @@ -69,6 +69,8 @@ void xen_setup_runstate_info(int cpu);
>>  void xen_teardown_timer(int cpu);
>>  u64 xen_clocksource_read(void);
>>  void xen_setup_cpu_clockevents(void);
>> +void xen_save_time_memory_area(void);
>> +void xen_restore_time_memory_area(void);
>>  void __init xen_init_time_ops(void);
>>  void __init xen_hvm_init_time_ops(void);
>>  
>> diff --git a/include/xen/interface/vcpu.h b/include/xen/interface/vcpu.h
>> index 98188c87f5c1..8da788c5bd4f 100644
>> --- a/include/xen/interface/vcpu.h
>> +++ b/include/xen/interface/vcpu.h
>> @@ -178,4 +178,32 @@ DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_vcpu_info);
>>  
>>  /* Send an NMI to the specified VCPU. @extra_arg == NULL. */
>>  #define VCPUOP_send_nmi             11
>> +
>> +/*
>> + * Register a memory location to get a secondary copy of the vcpu time
>> + * parameters.  The master copy still exists as part of the vcpu shared
>> + * memory area, and this secondary copy is updated whenever the master copy
>> + * is updated (and using the same versioning scheme for synchronisation).
>> + *
>> + * The intent is that this copy may be mapped (RO) into userspace so
>> + * that usermode can compute system time using the time info and the
>> + * tsc.  Usermode will see an array of vcpu_time_info structures, one
>> + * for each vcpu, and choose the right one by an existing mechanism
>> + * which allows it to get the current vcpu number (such as via a
>> + * segment limit).  It can then apply the normal algorithm to compute
>> + * system time from the tsc.
>> + *
>> + * @extra_arg == pointer to vcpu_register_time_info_memory_area structure.
>> + */
>> +#define VCPUOP_register_vcpu_time_memory_area   13
>> +DEFINE_GUEST_HANDLE_STRUCT(vcpu_time_info_t);
>> +struct vcpu_register_time_memory_area {
>> +	union {
>> +		GUEST_HANDLE(vcpu_time_info_t) h;
>> +		struct pvclock_vcpu_time_info *v;
>> +		uint64_t p;
>> +	} addr;
>> +};
>> +DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_time_memory_area_t);
>> +
> 
> Instead of adding only the operation you need, maybe you could sync just
> the complete header from Xen?

Yeap - I will update it. I suppose this means only adding VCPUOP_get_physid -
the rest seems to be up-to-date AFAICT.

Joao

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/3] x86/xen/time: setup vcpu 0 time info page
@ 2017-09-26  9:57       ` Joao Martins
  0 siblings, 0 replies; 13+ messages in thread
From: Joao Martins @ 2017-09-26  9:57 UTC (permalink / raw)
  To: Juergen Gross
  Cc: x86, linux-kernel, Andy Lutomirski, Ingo Molnar, H. Peter Anvin,
	xen-devel, Thomas Gleixner, Boris Ostrovsky

On 09/26/2017 10:32 AM, Juergen Gross wrote:
> On 22/09/17 18:25, Joao Martins wrote:
[snip]
>> +static void xen_setup_vsyscall_time_info(void)
>> +{
>> +	struct vcpu_register_time_memory_area t;
>> +	struct pvclock_vsyscall_time_info *ti;
>> +	struct pvclock_vcpu_time_info *pvti;
>> +	int ret;
>> +
>> +	pvti = &__this_cpu_read(xen_vcpu)->time;
>> +
>> +	/*
>> +	 * We check ahead on the primary time info if this
>> +	 * bit is supported hence speeding up Xen clocksource.
>> +	 */
>> +	if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))
>> +		return;
>> +
>> +	pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
>> +
>> +	ti = (struct pvclock_vsyscall_time_info *) get_zeroed_page(GFP_KERNEL);
> 
> Coding style: omit the blank after the cast.
> 
OK.

>> +	if (!ti)
>> +		return;
>> +
>> +	t.addr.v = &ti->pvti;
>> +
>> +	ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
>> +	if (ret) {
>> +		pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret);
>> +		free_page((unsigned long) ti);
> 
> Coding style again, once more below.
> 
OK.

>> +		return;
>> +	}
>> +
>> +	/*
>> +	 * If the check above succedded this one should too since it's the
>> +	 * same data on both primary and secondary time infos just different
>> +	 * memory regions. But we still check it in case hypervisor is buggy.
>> +	 */
>> +	pvti = &ti->pvti;
>> +	if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)) {
>> +		t.addr.v = NULL;
>> +		ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area,
>> +					 0, &t);
>> +		if (!ret)
>> +			free_page((unsigned long) ti);
>> +
>> +		pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret);
> 
> Mind making the message more descriptive? E.g. instead of reporting
> "(err 0)" just telling "(tsc unstable)"?
> 
Got it.

>> +		return;
>> +	}
>> +
>> +	xen_clock = ti;
>> +	pvclock_set_pvti_cpu0_va(xen_clock);
>> +
>> +	xen_clocksource.archdata.vclock_mode = VCLOCK_PVCLOCK;
>> +}
>> +
>>  static void __init xen_time_init(void)
>>  {
>>  	int cpu = smp_processor_id();
>> @@ -396,6 +495,7 @@ static void __init xen_time_init(void)
>>  	setup_force_cpu_cap(X86_FEATURE_TSC);
>>  
>>  	xen_setup_runstate_info(cpu);
>> +	xen_setup_vsyscall_time_info();
>>  	xen_setup_timer(cpu);
>>  	xen_setup_cpu_clockevents();
>>  
>> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
>> index c8a6d224f7ed..f96dbedb33d4 100644
>> --- a/arch/x86/xen/xen-ops.h
>> +++ b/arch/x86/xen/xen-ops.h
>> @@ -69,6 +69,8 @@ void xen_setup_runstate_info(int cpu);
>>  void xen_teardown_timer(int cpu);
>>  u64 xen_clocksource_read(void);
>>  void xen_setup_cpu_clockevents(void);
>> +void xen_save_time_memory_area(void);
>> +void xen_restore_time_memory_area(void);
>>  void __init xen_init_time_ops(void);
>>  void __init xen_hvm_init_time_ops(void);
>>  
>> diff --git a/include/xen/interface/vcpu.h b/include/xen/interface/vcpu.h
>> index 98188c87f5c1..8da788c5bd4f 100644
>> --- a/include/xen/interface/vcpu.h
>> +++ b/include/xen/interface/vcpu.h
>> @@ -178,4 +178,32 @@ DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_vcpu_info);
>>  
>>  /* Send an NMI to the specified VCPU. @extra_arg == NULL. */
>>  #define VCPUOP_send_nmi             11
>> +
>> +/*
>> + * Register a memory location to get a secondary copy of the vcpu time
>> + * parameters.  The master copy still exists as part of the vcpu shared
>> + * memory area, and this secondary copy is updated whenever the master copy
>> + * is updated (and using the same versioning scheme for synchronisation).
>> + *
>> + * The intent is that this copy may be mapped (RO) into userspace so
>> + * that usermode can compute system time using the time info and the
>> + * tsc.  Usermode will see an array of vcpu_time_info structures, one
>> + * for each vcpu, and choose the right one by an existing mechanism
>> + * which allows it to get the current vcpu number (such as via a
>> + * segment limit).  It can then apply the normal algorithm to compute
>> + * system time from the tsc.
>> + *
>> + * @extra_arg == pointer to vcpu_register_time_info_memory_area structure.
>> + */
>> +#define VCPUOP_register_vcpu_time_memory_area   13
>> +DEFINE_GUEST_HANDLE_STRUCT(vcpu_time_info_t);
>> +struct vcpu_register_time_memory_area {
>> +	union {
>> +		GUEST_HANDLE(vcpu_time_info_t) h;
>> +		struct pvclock_vcpu_time_info *v;
>> +		uint64_t p;
>> +	} addr;
>> +};
>> +DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_time_memory_area_t);
>> +
> 
> Instead of adding only the operation you need, maybe you could sync just
> the complete header from Xen?

Yeap - I will update it. I suppose this means only adding VCPUOP_get_physid -
the rest seems to be up-to-date AFAICT.

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/3] x86/xen/time: setup vcpu 0 time info page
  2017-09-26  9:57       ` Joao Martins
  (?)
  (?)
@ 2017-09-26 10:05       ` Juergen Gross
  -1 siblings, 0 replies; 13+ messages in thread
From: Juergen Gross @ 2017-09-26 10:05 UTC (permalink / raw)
  To: Joao Martins
  Cc: linux-kernel, xen-devel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Boris Ostrovsky, Andy Lutomirski

On 26/09/17 11:57, Joao Martins wrote:
> On 09/26/2017 10:32 AM, Juergen Gross wrote:
>> On 22/09/17 18:25, Joao Martins wrote:
> [snip]
>>> +static void xen_setup_vsyscall_time_info(void)
>>> +{
>>> +	struct vcpu_register_time_memory_area t;
>>> +	struct pvclock_vsyscall_time_info *ti;
>>> +	struct pvclock_vcpu_time_info *pvti;
>>> +	int ret;
>>> +
>>> +	pvti = &__this_cpu_read(xen_vcpu)->time;
>>> +
>>> +	/*
>>> +	 * We check ahead on the primary time info if this
>>> +	 * bit is supported hence speeding up Xen clocksource.
>>> +	 */
>>> +	if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))
>>> +		return;
>>> +
>>> +	pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
>>> +
>>> +	ti = (struct pvclock_vsyscall_time_info *) get_zeroed_page(GFP_KERNEL);
>>
>> Coding style: omit the blank after the cast.
>>
> OK.
> 
>>> +	if (!ti)
>>> +		return;
>>> +
>>> +	t.addr.v = &ti->pvti;
>>> +
>>> +	ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
>>> +	if (ret) {
>>> +		pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret);
>>> +		free_page((unsigned long) ti);
>>
>> Coding style again, once more below.
>>
> OK.
> 
>>> +		return;
>>> +	}
>>> +
>>> +	/*
>>> +	 * If the check above succedded this one should too since it's the
>>> +	 * same data on both primary and secondary time infos just different
>>> +	 * memory regions. But we still check it in case hypervisor is buggy.
>>> +	 */
>>> +	pvti = &ti->pvti;
>>> +	if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)) {
>>> +		t.addr.v = NULL;
>>> +		ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area,
>>> +					 0, &t);
>>> +		if (!ret)
>>> +			free_page((unsigned long) ti);
>>> +
>>> +		pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret);
>>
>> Mind making the message more descriptive? E.g. instead of reporting
>> "(err 0)" just telling "(tsc unstable)"?
>>
> Got it.
> 
>>> +		return;
>>> +	}
>>> +
>>> +	xen_clock = ti;
>>> +	pvclock_set_pvti_cpu0_va(xen_clock);
>>> +
>>> +	xen_clocksource.archdata.vclock_mode = VCLOCK_PVCLOCK;
>>> +}
>>> +
>>>  static void __init xen_time_init(void)
>>>  {
>>>  	int cpu = smp_processor_id();
>>> @@ -396,6 +495,7 @@ static void __init xen_time_init(void)
>>>  	setup_force_cpu_cap(X86_FEATURE_TSC);
>>>  
>>>  	xen_setup_runstate_info(cpu);
>>> +	xen_setup_vsyscall_time_info();
>>>  	xen_setup_timer(cpu);
>>>  	xen_setup_cpu_clockevents();
>>>  
>>> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
>>> index c8a6d224f7ed..f96dbedb33d4 100644
>>> --- a/arch/x86/xen/xen-ops.h
>>> +++ b/arch/x86/xen/xen-ops.h
>>> @@ -69,6 +69,8 @@ void xen_setup_runstate_info(int cpu);
>>>  void xen_teardown_timer(int cpu);
>>>  u64 xen_clocksource_read(void);
>>>  void xen_setup_cpu_clockevents(void);
>>> +void xen_save_time_memory_area(void);
>>> +void xen_restore_time_memory_area(void);
>>>  void __init xen_init_time_ops(void);
>>>  void __init xen_hvm_init_time_ops(void);
>>>  
>>> diff --git a/include/xen/interface/vcpu.h b/include/xen/interface/vcpu.h
>>> index 98188c87f5c1..8da788c5bd4f 100644
>>> --- a/include/xen/interface/vcpu.h
>>> +++ b/include/xen/interface/vcpu.h
>>> @@ -178,4 +178,32 @@ DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_vcpu_info);
>>>  
>>>  /* Send an NMI to the specified VCPU. @extra_arg == NULL. */
>>>  #define VCPUOP_send_nmi             11
>>> +
>>> +/*
>>> + * Register a memory location to get a secondary copy of the vcpu time
>>> + * parameters.  The master copy still exists as part of the vcpu shared
>>> + * memory area, and this secondary copy is updated whenever the master copy
>>> + * is updated (and using the same versioning scheme for synchronisation).
>>> + *
>>> + * The intent is that this copy may be mapped (RO) into userspace so
>>> + * that usermode can compute system time using the time info and the
>>> + * tsc.  Usermode will see an array of vcpu_time_info structures, one
>>> + * for each vcpu, and choose the right one by an existing mechanism
>>> + * which allows it to get the current vcpu number (such as via a
>>> + * segment limit).  It can then apply the normal algorithm to compute
>>> + * system time from the tsc.
>>> + *
>>> + * @extra_arg == pointer to vcpu_register_time_info_memory_area structure.
>>> + */
>>> +#define VCPUOP_register_vcpu_time_memory_area   13
>>> +DEFINE_GUEST_HANDLE_STRUCT(vcpu_time_info_t);
>>> +struct vcpu_register_time_memory_area {
>>> +	union {
>>> +		GUEST_HANDLE(vcpu_time_info_t) h;
>>> +		struct pvclock_vcpu_time_info *v;
>>> +		uint64_t p;
>>> +	} addr;
>>> +};
>>> +DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_time_memory_area_t);
>>> +
>>
>> Instead of adding only the operation you need, maybe you could sync just
>> the complete header from Xen?
> 
> Yeap - I will update it. I suppose this means only adding VCPUOP_get_physid -
> the rest seems to be up-to-date AFAICT.

Correct.


Juergen

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/3] x86/xen/time: setup vcpu 0 time info page
  2017-09-26  9:57       ` Joao Martins
  (?)
@ 2017-09-26 10:05       ` Juergen Gross
  -1 siblings, 0 replies; 13+ messages in thread
From: Juergen Gross @ 2017-09-26 10:05 UTC (permalink / raw)
  To: Joao Martins
  Cc: x86, linux-kernel, Andy Lutomirski, Ingo Molnar, H. Peter Anvin,
	xen-devel, Thomas Gleixner, Boris Ostrovsky

On 26/09/17 11:57, Joao Martins wrote:
> On 09/26/2017 10:32 AM, Juergen Gross wrote:
>> On 22/09/17 18:25, Joao Martins wrote:
> [snip]
>>> +static void xen_setup_vsyscall_time_info(void)
>>> +{
>>> +	struct vcpu_register_time_memory_area t;
>>> +	struct pvclock_vsyscall_time_info *ti;
>>> +	struct pvclock_vcpu_time_info *pvti;
>>> +	int ret;
>>> +
>>> +	pvti = &__this_cpu_read(xen_vcpu)->time;
>>> +
>>> +	/*
>>> +	 * We check ahead on the primary time info if this
>>> +	 * bit is supported hence speeding up Xen clocksource.
>>> +	 */
>>> +	if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))
>>> +		return;
>>> +
>>> +	pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
>>> +
>>> +	ti = (struct pvclock_vsyscall_time_info *) get_zeroed_page(GFP_KERNEL);
>>
>> Coding style: omit the blank after the cast.
>>
> OK.
> 
>>> +	if (!ti)
>>> +		return;
>>> +
>>> +	t.addr.v = &ti->pvti;
>>> +
>>> +	ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t);
>>> +	if (ret) {
>>> +		pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret);
>>> +		free_page((unsigned long) ti);
>>
>> Coding style again, once more below.
>>
> OK.
> 
>>> +		return;
>>> +	}
>>> +
>>> +	/*
>>> +	 * If the check above succedded this one should too since it's the
>>> +	 * same data on both primary and secondary time infos just different
>>> +	 * memory regions. But we still check it in case hypervisor is buggy.
>>> +	 */
>>> +	pvti = &ti->pvti;
>>> +	if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)) {
>>> +		t.addr.v = NULL;
>>> +		ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area,
>>> +					 0, &t);
>>> +		if (!ret)
>>> +			free_page((unsigned long) ti);
>>> +
>>> +		pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret);
>>
>> Mind making the message more descriptive? E.g. instead of reporting
>> "(err 0)" just telling "(tsc unstable)"?
>>
> Got it.
> 
>>> +		return;
>>> +	}
>>> +
>>> +	xen_clock = ti;
>>> +	pvclock_set_pvti_cpu0_va(xen_clock);
>>> +
>>> +	xen_clocksource.archdata.vclock_mode = VCLOCK_PVCLOCK;
>>> +}
>>> +
>>>  static void __init xen_time_init(void)
>>>  {
>>>  	int cpu = smp_processor_id();
>>> @@ -396,6 +495,7 @@ static void __init xen_time_init(void)
>>>  	setup_force_cpu_cap(X86_FEATURE_TSC);
>>>  
>>>  	xen_setup_runstate_info(cpu);
>>> +	xen_setup_vsyscall_time_info();
>>>  	xen_setup_timer(cpu);
>>>  	xen_setup_cpu_clockevents();
>>>  
>>> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
>>> index c8a6d224f7ed..f96dbedb33d4 100644
>>> --- a/arch/x86/xen/xen-ops.h
>>> +++ b/arch/x86/xen/xen-ops.h
>>> @@ -69,6 +69,8 @@ void xen_setup_runstate_info(int cpu);
>>>  void xen_teardown_timer(int cpu);
>>>  u64 xen_clocksource_read(void);
>>>  void xen_setup_cpu_clockevents(void);
>>> +void xen_save_time_memory_area(void);
>>> +void xen_restore_time_memory_area(void);
>>>  void __init xen_init_time_ops(void);
>>>  void __init xen_hvm_init_time_ops(void);
>>>  
>>> diff --git a/include/xen/interface/vcpu.h b/include/xen/interface/vcpu.h
>>> index 98188c87f5c1..8da788c5bd4f 100644
>>> --- a/include/xen/interface/vcpu.h
>>> +++ b/include/xen/interface/vcpu.h
>>> @@ -178,4 +178,32 @@ DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_vcpu_info);
>>>  
>>>  /* Send an NMI to the specified VCPU. @extra_arg == NULL. */
>>>  #define VCPUOP_send_nmi             11
>>> +
>>> +/*
>>> + * Register a memory location to get a secondary copy of the vcpu time
>>> + * parameters.  The master copy still exists as part of the vcpu shared
>>> + * memory area, and this secondary copy is updated whenever the master copy
>>> + * is updated (and using the same versioning scheme for synchronisation).
>>> + *
>>> + * The intent is that this copy may be mapped (RO) into userspace so
>>> + * that usermode can compute system time using the time info and the
>>> + * tsc.  Usermode will see an array of vcpu_time_info structures, one
>>> + * for each vcpu, and choose the right one by an existing mechanism
>>> + * which allows it to get the current vcpu number (such as via a
>>> + * segment limit).  It can then apply the normal algorithm to compute
>>> + * system time from the tsc.
>>> + *
>>> + * @extra_arg == pointer to vcpu_register_time_info_memory_area structure.
>>> + */
>>> +#define VCPUOP_register_vcpu_time_memory_area   13
>>> +DEFINE_GUEST_HANDLE_STRUCT(vcpu_time_info_t);
>>> +struct vcpu_register_time_memory_area {
>>> +	union {
>>> +		GUEST_HANDLE(vcpu_time_info_t) h;
>>> +		struct pvclock_vcpu_time_info *v;
>>> +		uint64_t p;
>>> +	} addr;
>>> +};
>>> +DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_time_memory_area_t);
>>> +
>>
>> Instead of adding only the operation you need, maybe you could sync just
>> the complete header from Xen?
> 
> Yeap - I will update it. I suppose this means only adding VCPUOP_get_physid -
> the rest seems to be up-to-date AFAICT.

Correct.


Juergen


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-09-26 10:05 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-22 16:25 [PATCH v2 0/3] x86/xen: pvclock vdso support Joao Martins
2017-09-22 16:25 ` [PATCH v2 1/3] x86/pvclock: add setter for pvclock_pvti_cpu0_va Joao Martins
2017-09-22 16:25 ` Joao Martins
2017-09-22 16:25 ` [PATCH v2 2/3] x86/xen/time: setup vcpu 0 time info page Joao Martins
2017-09-26  9:32   ` Juergen Gross
2017-09-26  9:32   ` Juergen Gross
2017-09-26  9:57     ` Joao Martins
2017-09-26  9:57       ` Joao Martins
2017-09-26 10:05       ` Juergen Gross
2017-09-26 10:05       ` Juergen Gross
2017-09-22 16:25 ` Joao Martins
2017-09-22 16:25 ` [PATCH v2 3/3] MAINTAINERS: xen, kvm: track pvclock-abi.h changes Joao Martins
2017-09-22 16:25   ` Joao Martins

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.