* [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups
@ 2018-07-06 16:13 Thomas Gleixner
  2018-07-06 16:13 ` [patch 1/7] x86/kvmclock: Remove memblock dependency Thomas Gleixner
                   ` (9 more replies)
  0 siblings, 10 replies; 27+ messages in thread
From: Thomas Gleixner @ 2018-07-06 16:13 UTC (permalink / raw)
  To: LKML
  Cc: Paolo Bonzini, Radim Krcmar, Peter Zijlstra, Juergen Gross,
	Pavel Tatashin, steven.sistare, daniel.m.jordan, x86, kvm

To allow early utilization of kvmclock it is required to remove the
memblock dependency. memblock is currently used to allocate the per
cpu data for kvmclock.

The first patch replaces the memblock with a static array sized 64 bytes *
NR_CPUS and was posted by Pavel. That patch allocates everything statically
which is a waste when kvmclock is not used.

The rest of the series cleans up the code and converts it to per cpu
variables but does not put the kvmclock data into the per cpu area as that
has an issue vs. mapping the boot cpu data into the VDSO (leaks arbitrary
data, unless page sized).

The per cpu data consists of pointers to the actual data. For the boot cpu
a page sized array is statically allocated which can be mapped into the
VDSO. That array is used for initializing the first 64 CPU pointers. If
there are more CPUs the pvclock data is allocated during CPU bringup.

So this still will have some overhead when kvmclock is not in use, but
bringing it down to zero would be a massive trainwreck and even more
indirections.
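
For reference, a condensed sketch of the resulting scheme (the full code is
in patch 7/7; this only illustrates the indirection and leaves out the check
for already initialized CPUs):

	/* Page sized boot array, mappable into the VDSO */
	static struct pvclock_vsyscall_time_info
		hv_clock_boot[HVC_BOOT_ARRAY_SIZE] __aligned(PAGE_SIZE);

	/* Per CPU pointer to the pvclock data of that CPU */
	static DEFINE_PER_CPU(struct pvclock_vsyscall_time_info *, hv_clock_per_cpu);

	/* CPU hotplug prepare stage: use the boot array or allocate */
	static int kvmclock_setup_percpu(unsigned int cpu)
	{
		struct pvclock_vsyscall_time_info *p;

		if (cpu < HVC_BOOT_ARRAY_SIZE)
			p = &hv_clock_boot[cpu];
		else
			p = kzalloc(sizeof(*p), GFP_KERNEL);

		per_cpu(hv_clock_per_cpu, cpu) = p;
		return p ? 0 : -ENOMEM;
	}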

Thanks,

	tglx

8<--------------
 a/arch/x86/include/asm/kvm_guest.h |    7 
 arch/x86/include/asm/kvm_para.h    |    1 
 arch/x86/kernel/kvm.c              |   14 -
 arch/x86/kernel/kvmclock.c         |  262 ++++++++++++++-----------------------
 arch/x86/kernel/setup.c            |    4 
 5 files changed, 105 insertions(+), 183 deletions(-)






* [patch 1/7] x86/kvmclock: Remove memblock dependency
  2018-07-06 16:13 [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups Thomas Gleixner
@ 2018-07-06 16:13 ` Thomas Gleixner
  2018-07-06 16:13 ` [patch 2/7] x86/kvmclock: Remove page size requirement from wall_clock Thomas Gleixner
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 27+ messages in thread
From: Thomas Gleixner @ 2018-07-06 16:13 UTC (permalink / raw)
  To: LKML
  Cc: Paolo Bonzini, Radim Krcmar, Peter Zijlstra, Juergen Gross,
	Pavel Tatashin, steven.sistare, daniel.m.jordan, x86, kvm

From: Pavel Tatashin <pasha.tatashin@oracle.com>

KVM clock is initialized later compared to other hypervisor clocks because
it has a dependency on the memblock allocator.

Bring it in line with other hypervisors by using memory from the BSS
instead of allocating it.

The benefits:

  - Remove ifdef from common code
  - Earlier availability of the clock
  - Remove dependency on memblock, and reduce code

The downside:

  - Static allocation of the per cpu data structures sized NR_CPUS * 64 bytes;
    see the rough estimate below. Will be addressed in follow up patches.
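
A rough estimate of that static footprint (illustrative configurations, the
actual number depends on CONFIG_NR_CPUS):

	/*
	 * sizeof(struct pvclock_vsyscall_time_info) == 64 bytes, so:
	 *
	 *   CONFIG_NR_CPUS =   64  ->  64 *   64 bytes =   4 KiB of BSS
	 *   CONFIG_NR_CPUS = 8192  ->  64 * 8192 bytes = 512 KiB of BSS (MAXSMP)
	 */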

[ tglx: Split out from larger series ]

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org

---
 arch/x86/kernel/kvm.c      |    1 
 arch/x86/kernel/kvmclock.c |   66 +++++++--------------------------------------
 arch/x86/kernel/setup.c    |    4 --
 3 files changed, 12 insertions(+), 59 deletions(-)

--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -628,6 +628,7 @@ const __initconst struct hypervisor_x86
 	.name			= "KVM",
 	.detect			= kvm_detect,
 	.type			= X86_HYPER_KVM,
+	.init.init_platform	= kvmclock_init,
 	.init.guest_late_init	= kvm_guest_init,
 	.init.x2apic_available	= kvm_para_available,
 };
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -23,9 +23,9 @@
 #include <asm/apic.h>
 #include <linux/percpu.h>
 #include <linux/hardirq.h>
-#include <linux/memblock.h>
 #include <linux/sched.h>
 #include <linux/sched/clock.h>
+#include <linux/mm.h>
 
 #include <asm/mem_encrypt.h>
 #include <asm/x86_init.h>
@@ -44,6 +44,13 @@ static int parse_no_kvmclock(char *arg)
 }
 early_param("no-kvmclock", parse_no_kvmclock);
 
+/* Aligned to page sizes to match whats mapped via vsyscalls to userspace */
+#define HV_CLOCK_SIZE	(sizeof(struct pvclock_vsyscall_time_info) * NR_CPUS)
+#define WALL_CLOCK_SIZE	(sizeof(struct pvclock_wall_clock))
+
+static u8 hv_clock_mem[PAGE_ALIGN(HV_CLOCK_SIZE)] __aligned(PAGE_SIZE);
+static u8 wall_clock_mem[PAGE_ALIGN(WALL_CLOCK_SIZE)] __aligned(PAGE_SIZE);
+
 /* The hypervisor will put information about time periodically here */
 static struct pvclock_vsyscall_time_info *hv_clock;
 static struct pvclock_wall_clock *wall_clock;
@@ -244,43 +251,12 @@ static void kvm_shutdown(void)
 	native_machine_shutdown();
 }
 
-static phys_addr_t __init kvm_memblock_alloc(phys_addr_t size,
-					     phys_addr_t align)
-{
-	phys_addr_t mem;
-
-	mem = memblock_alloc(size, align);
-	if (!mem)
-		return 0;
-
-	if (sev_active()) {
-		if (early_set_memory_decrypted((unsigned long)__va(mem), size))
-			goto e_free;
-	}
-
-	return mem;
-e_free:
-	memblock_free(mem, size);
-	return 0;
-}
-
-static void __init kvm_memblock_free(phys_addr_t addr, phys_addr_t size)
-{
-	if (sev_active())
-		early_set_memory_encrypted((unsigned long)__va(addr), size);
-
-	memblock_free(addr, size);
-}
-
 void __init kvmclock_init(void)
 {
 	struct pvclock_vcpu_time_info *vcpu_time;
-	unsigned long mem, mem_wall_clock;
-	int size, cpu, wall_clock_size;
+	int cpu;
 	u8 flags;
 
-	size = PAGE_ALIGN(sizeof(struct pvclock_vsyscall_time_info)*NR_CPUS);
-
 	if (!kvm_para_available())
 		return;
 
@@ -290,28 +266,11 @@ void __init kvmclock_init(void)
 	} else if (!(kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)))
 		return;
 
-	wall_clock_size = PAGE_ALIGN(sizeof(struct pvclock_wall_clock));
-	mem_wall_clock = kvm_memblock_alloc(wall_clock_size, PAGE_SIZE);
-	if (!mem_wall_clock)
-		return;
-
-	wall_clock = __va(mem_wall_clock);
-	memset(wall_clock, 0, wall_clock_size);
-
-	mem = kvm_memblock_alloc(size, PAGE_SIZE);
-	if (!mem) {
-		kvm_memblock_free(mem_wall_clock, wall_clock_size);
-		wall_clock = NULL;
-		return;
-	}
-
-	hv_clock = __va(mem);
-	memset(hv_clock, 0, size);
+	wall_clock = (struct pvclock_wall_clock *)wall_clock_mem;
+	hv_clock = (struct pvclock_vsyscall_time_info *)hv_clock_mem;
 
 	if (kvm_register_clock("primary cpu clock")) {
 		hv_clock = NULL;
-		kvm_memblock_free(mem, size);
-		kvm_memblock_free(mem_wall_clock, wall_clock_size);
 		wall_clock = NULL;
 		return;
 	}
@@ -354,13 +313,10 @@ int __init kvm_setup_vsyscall_timeinfo(v
 	int cpu;
 	u8 flags;
 	struct pvclock_vcpu_time_info *vcpu_time;
-	unsigned int size;
 
 	if (!hv_clock)
 		return 0;
 
-	size = PAGE_ALIGN(sizeof(struct pvclock_vsyscall_time_info)*NR_CPUS);
-
 	cpu = get_cpu();
 
 	vcpu_time = &hv_clock[cpu].pvti;
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1197,10 +1197,6 @@ void __init setup_arch(char **cmdline_p)
 
 	memblock_find_dma_reserve();
 
-#ifdef CONFIG_KVM_GUEST
-	kvmclock_init();
-#endif
-
 	tsc_early_delay_calibrate();
 	if (!early_xdbc_setup_hardware())
 		early_xdbc_register_console();




* [patch 2/7] x86/kvmclock: Remove page size requirement from wall_clock
  2018-07-06 16:13 [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups Thomas Gleixner
  2018-07-06 16:13 ` [patch 1/7] x86/kvmclock: Remove memblock dependency Thomas Gleixner
@ 2018-07-06 16:13 ` Thomas Gleixner
  2018-07-12  2:15   ` Pavel Tatashin
  2018-07-06 16:13 ` [patch 3/7] x86/kvmclock: Decrapify kvm_register_clock() Thomas Gleixner
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Thomas Gleixner @ 2018-07-06 16:13 UTC (permalink / raw)
  To: LKML
  Cc: Paolo Bonzini, Radim Krcmar, Peter Zijlstra, Juergen Gross,
	Pavel Tatashin, steven.sistare, daniel.m.jordan, x86, kvm

There is no requirement for wall_clock data to be page aligned or page
sized.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
---
 arch/x86/kernel/kvmclock.c |   12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -46,14 +46,12 @@ early_param("no-kvmclock", parse_no_kvmc
 
 /* Aligned to page sizes to match whats mapped via vsyscalls to userspace */
 #define HV_CLOCK_SIZE	(sizeof(struct pvclock_vsyscall_time_info) * NR_CPUS)
-#define WALL_CLOCK_SIZE	(sizeof(struct pvclock_wall_clock))
 
 static u8 hv_clock_mem[PAGE_ALIGN(HV_CLOCK_SIZE)] __aligned(PAGE_SIZE);
-static u8 wall_clock_mem[PAGE_ALIGN(WALL_CLOCK_SIZE)] __aligned(PAGE_SIZE);
 
 /* The hypervisor will put information about time periodically here */
 static struct pvclock_vsyscall_time_info *hv_clock;
-static struct pvclock_wall_clock *wall_clock;
+static struct pvclock_wall_clock wall_clock;
 
 /*
  * The wallclock is the time of day when we booted. Since then, some time may
@@ -66,15 +64,15 @@ static void kvm_get_wallclock(struct tim
 	int low, high;
 	int cpu;
 
-	low = (int)slow_virt_to_phys(wall_clock);
-	high = ((u64)slow_virt_to_phys(wall_clock) >> 32);
+	low = (int)slow_virt_to_phys(&wall_clock);
+	high = ((u64)slow_virt_to_phys(&wall_clock) >> 32);
 
 	native_write_msr(msr_kvm_wall_clock, low, high);
 
 	cpu = get_cpu();
 
 	vcpu_time = &hv_clock[cpu].pvti;
-	pvclock_read_wallclock(wall_clock, vcpu_time, now);
+	pvclock_read_wallclock(&wall_clock, vcpu_time, now);
 
 	put_cpu();
 }
@@ -266,12 +264,10 @@ void __init kvmclock_init(void)
 	} else if (!(kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)))
 		return;
 
-	wall_clock = (struct pvclock_wall_clock *)wall_clock_mem;
 	hv_clock = (struct pvclock_vsyscall_time_info *)hv_clock_mem;
 
 	if (kvm_register_clock("primary cpu clock")) {
 		hv_clock = NULL;
-		wall_clock = NULL;
 		return;
 	}
 




* [patch 3/7] x86/kvmclock: Decrapify kvm_register_clock()
  2018-07-06 16:13 [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups Thomas Gleixner
  2018-07-06 16:13 ` [patch 1/7] x86/kvmclock: Remove memblock dependency Thomas Gleixner
  2018-07-06 16:13 ` [patch 2/7] x86/kvmclock: Remove page size requirement from wall_clock Thomas Gleixner
@ 2018-07-06 16:13 ` Thomas Gleixner
  2018-07-06 17:38   ` Paolo Bonzini
  2018-07-12  2:24   ` Pavel Tatashin
  2018-07-06 16:13 ` [patch 4/7] x86/kvmclock: Cleanup the code Thomas Gleixner
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 27+ messages in thread
From: Thomas Gleixner @ 2018-07-06 16:13 UTC (permalink / raw)
  To: LKML
  Cc: Paolo Bonzini, Radim Krcmar, Peter Zijlstra, Juergen Gross,
	Pavel Tatashin, steven.sistare, daniel.m.jordan, x86, kvm

The return value is pointless because the wrmsr cannot fail if
KVM_FEATURE_CLOCKSOURCE or KVM_FEATURE_CLOCKSOURCE2 are set.

kvm_register_clock() is only called locally so wants to be static.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
---
 arch/x86/include/asm/kvm_para.h |    1 -
 arch/x86/kernel/kvmclock.c      |   35 +++++++++++------------------------
 2 files changed, 11 insertions(+), 25 deletions(-)

--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -7,7 +7,6 @@
 #include <uapi/asm/kvm_para.h>
 
 extern void kvmclock_init(void);
-extern int kvm_register_clock(char *txt);
 
 #ifdef CONFIG_KVM_GUEST
 bool kvm_check_and_clear_guest_paused(void);
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -186,23 +186,18 @@ struct clocksource kvm_clock = {
 };
 EXPORT_SYMBOL_GPL(kvm_clock);
 
-int kvm_register_clock(char *txt)
+static void kvm_register_clock(char *txt)
 {
-	int cpu = smp_processor_id();
-	int low, high, ret;
 	struct pvclock_vcpu_time_info *src;
+	int cpu = smp_processor_id();
+	u64 pa;
 
 	if (!hv_clock)
-		return 0;
-
-	src = &hv_clock[cpu].pvti;
-	low = (int)slow_virt_to_phys(src) | 1;
-	high = ((u64)slow_virt_to_phys(src) >> 32);
-	ret = native_write_msr_safe(msr_kvm_system_time, low, high);
-	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
-	       cpu, high, low, txt);
+		return;
 
-	return ret;
+	pa = slow_virt_to_phys(src) | 0x01ULL;
+	wrmsrl(msr_kvm_system_time, pa);
+	pr_info("kvm-clock: cpu %d, msr %llx, %s\n", cpu, pa, txt);
 }
 
 static void kvm_save_sched_clock_state(void)
@@ -217,11 +212,7 @@ static void kvm_restore_sched_clock_stat
 #ifdef CONFIG_X86_LOCAL_APIC
 static void kvm_setup_secondary_clock(void)
 {
-	/*
-	 * Now that the first cpu already had this clocksource initialized,
-	 * we shouldn't fail.
-	 */
-	WARN_ON(kvm_register_clock("secondary cpu clock"));
+	kvm_register_clock("secondary cpu clock");
 }
 #endif
 
@@ -264,16 +255,12 @@ void __init kvmclock_init(void)
 	} else if (!(kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)))
 		return;
 
-	hv_clock = (struct pvclock_vsyscall_time_info *)hv_clock_mem;
-
-	if (kvm_register_clock("primary cpu clock")) {
-		hv_clock = NULL;
-		return;
-	}
-
 	printk(KERN_INFO "kvm-clock: Using msrs %x and %x",
 		msr_kvm_system_time, msr_kvm_wall_clock);
 
+	hv_clock = (struct pvclock_vsyscall_time_info *)hv_clock_mem;
+	kvm_register_clock("primary cpu clock");
+
 	if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
 		pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
 




* [patch 4/7] x86/kvmclock: Cleanup the code
  2018-07-06 16:13 [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups Thomas Gleixner
                   ` (2 preceding siblings ...)
  2018-07-06 16:13 ` [patch 3/7] x86/kvmclock: Decrapify kvm_register_clock() Thomas Gleixner
@ 2018-07-06 16:13 ` Thomas Gleixner
  2018-07-06 17:39   ` Paolo Bonzini
  2018-07-09  9:05   ` Peter Zijlstra
  2018-07-06 16:13 ` [patch 5/7] x86/kvmclock: Mark variables __initdata and __ro_after_init Thomas Gleixner
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 27+ messages in thread
From: Thomas Gleixner @ 2018-07-06 16:13 UTC (permalink / raw)
  To: LKML
  Cc: Paolo Bonzini, Radim Krcmar, Peter Zijlstra, Juergen Gross,
	Pavel Tatashin, steven.sistare, daniel.m.jordan, x86, kvm

- Cleanup the mrs write for wall clock. The type casts to (int) are sloppy
  because the wrmsr parameters are u32 and, aside from that, wrmsrl() already
  provides the high/low split for free; see the sketch after this list.

- Remove the pointless get_cpu()/put_cpu() dance from various
  functions. Either they are called during early init where the CPU is
  guaranteed to be 0, or they are already called from non-preemptible
  context where smp_processor_id() can be used safely.

- Simplify the convoluted check for kvmclock in the init function.

- Mark the parameter parsing function __init. No point in keeping it
  around.

- Convert to pr_info()
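
A minimal illustration of the wrmsrl() point from the first item above; this
is a rough equivalence, not the kernel's literal macro expansion:

	u64 pa = slow_virt_to_phys(&wall_clock);

	/* Old style: open coded 32bit split */
	native_write_msr(msr_kvm_wall_clock, (u32)pa, (u32)(pa >> 32));

	/* wrmsrl() takes the full 64bit value and does the split internally */
	wrmsrl(msr_kvm_wall_clock, pa);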

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
---
 arch/x86/kernel/kvmclock.c |   75 +++++++++++++--------------------------------
 1 file changed, 23 insertions(+), 52 deletions(-)

--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -37,7 +37,7 @@ static int msr_kvm_system_time = MSR_KVM
 static int msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK;
 static u64 kvm_sched_clock_offset;
 
-static int parse_no_kvmclock(char *arg)
+static int __init parse_no_kvmclock(char *arg)
 {
 	kvmclock = 0;
 	return 0;
@@ -61,13 +61,9 @@ static struct pvclock_wall_clock wall_cl
 static void kvm_get_wallclock(struct timespec64 *now)
 {
 	struct pvclock_vcpu_time_info *vcpu_time;
-	int low, high;
 	int cpu;
 
-	low = (int)slow_virt_to_phys(&wall_clock);
-	high = ((u64)slow_virt_to_phys(&wall_clock) >> 32);
-
-	native_write_msr(msr_kvm_wall_clock, low, high);
+	wrmsrl(msr_kvm_wall_clock, slow_virt_to_phys(&wall_clock));
 
 	cpu = get_cpu();
 
@@ -117,11 +113,11 @@ static inline void kvm_sched_clock_init(
 	kvm_sched_clock_offset = kvm_clock_read();
 	pv_time_ops.sched_clock = kvm_sched_clock_read;
 
-	printk(KERN_INFO "kvm-clock: using sched offset of %llu cycles\n",
-			kvm_sched_clock_offset);
+	pr_info("kvm-clock: using sched offset of %llu cycles\n",
+		kvm_sched_clock_offset);
 
 	BUILD_BUG_ON(sizeof(kvm_sched_clock_offset) >
-	         sizeof(((struct pvclock_vcpu_time_info *)NULL)->system_time));
+		sizeof(((struct pvclock_vcpu_time_info *)NULL)->system_time));
 }
 
 /*
@@ -135,15 +131,7 @@ static inline void kvm_sched_clock_init(
  */
 static unsigned long kvm_get_tsc_khz(void)
 {
-	struct pvclock_vcpu_time_info *src;
-	int cpu;
-	unsigned long tsc_khz;
-
-	cpu = get_cpu();
-	src = &hv_clock[cpu].pvti;
-	tsc_khz = pvclock_tsc_khz(src);
-	put_cpu();
-	return tsc_khz;
+	return pvclock_tsc_khz(&hv_clock[0].pvti);
 }
 
 static void kvm_get_preset_lpj(void)
@@ -160,29 +148,27 @@ static void kvm_get_preset_lpj(void)
 
 bool kvm_check_and_clear_guest_paused(void)
 {
-	bool ret = false;
 	struct pvclock_vcpu_time_info *src;
-	int cpu = smp_processor_id();
+	bool ret = false;
 
 	if (!hv_clock)
 		return ret;
 
-	src = &hv_clock[cpu].pvti;
+	src = &hv_clock[smp_processor_id()].pvti;
 	if ((src->flags & PVCLOCK_GUEST_STOPPED) != 0) {
 		src->flags &= ~PVCLOCK_GUEST_STOPPED;
 		pvclock_touch_watchdogs();
 		ret = true;
 	}
-
 	return ret;
 }
 
 struct clocksource kvm_clock = {
-	.name = "kvm-clock",
-	.read = kvm_clock_get_cycles,
-	.rating = 400,
-	.mask = CLOCKSOURCE_MASK(64),
-	.flags = CLOCK_SOURCE_IS_CONTINUOUS,
+	.name	= "kvm-clock",
+	.read	= kvm_clock_get_cycles,
+	.rating	= 400,
+	.mask	= CLOCKSOURCE_MASK(64),
+	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
 };
 EXPORT_SYMBOL_GPL(kvm_clock);
 
@@ -195,6 +181,7 @@ static void kvm_register_clock(char *txt
 	if (!hv_clock)
 		return;
 
+	src = &hv_clock[cpu].pvti;
 	pa = slow_virt_to_phys(src) | 0x01ULL;
 	wrmsrl(msr_kvm_system_time, pa);
 	pr_info("kvm-clock: cpu %d, msr %llx, %s\n", cpu, pa, txt);
@@ -242,20 +229,19 @@ static void kvm_shutdown(void)
 
 void __init kvmclock_init(void)
 {
-	struct pvclock_vcpu_time_info *vcpu_time;
-	int cpu;
 	u8 flags;
 
-	if (!kvm_para_available())
+	if (!kvm_para_available() || !kvmclock)
 		return;
 
-	if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) {
+	if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) {
 		msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW;
 		msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW;
-	} else if (!(kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)))
+	} else if (!kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) {
 		return;
+	}
 
-	printk(KERN_INFO "kvm-clock: Using msrs %x and %x",
+	pr_info("kvm-clock: Using msrs %x and %x",
 		msr_kvm_system_time, msr_kvm_wall_clock);
 
 	hv_clock = (struct pvclock_vsyscall_time_info *)hv_clock_mem;
@@ -264,20 +250,15 @@ void __init kvmclock_init(void)
 	if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
 		pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
 
-	cpu = get_cpu();
-	vcpu_time = &hv_clock[cpu].pvti;
-	flags = pvclock_read_flags(vcpu_time);
-
+	flags = pvclock_read_flags(&hv_clock[0].pvti);
 	kvm_sched_clock_init(flags & PVCLOCK_TSC_STABLE_BIT);
-	put_cpu();
 
 	x86_platform.calibrate_tsc = kvm_get_tsc_khz;
 	x86_platform.calibrate_cpu = kvm_get_tsc_khz;
 	x86_platform.get_wallclock = kvm_get_wallclock;
 	x86_platform.set_wallclock = kvm_set_wallclock;
 #ifdef CONFIG_X86_LOCAL_APIC
-	x86_cpuinit.early_percpu_clock_init =
-		kvm_setup_secondary_clock;
+	x86_cpuinit.early_percpu_clock_init = kvm_setup_secondary_clock;
 #endif
 	x86_platform.save_sched_clock_state = kvm_save_sched_clock_state;
 	x86_platform.restore_sched_clock_state = kvm_restore_sched_clock_state;
@@ -293,26 +274,16 @@ void __init kvmclock_init(void)
 int __init kvm_setup_vsyscall_timeinfo(void)
 {
 #ifdef CONFIG_X86_64
-	int cpu;
 	u8 flags;
-	struct pvclock_vcpu_time_info *vcpu_time;
 
 	if (!hv_clock)
 		return 0;
 
-	cpu = get_cpu();
-
-	vcpu_time = &hv_clock[cpu].pvti;
-	flags = pvclock_read_flags(vcpu_time);
-
-	if (!(flags & PVCLOCK_TSC_STABLE_BIT)) {
-		put_cpu();
+	flags = pvclock_read_flags(&hv_clock[0].pvti);
+	if (!(flags & PVCLOCK_TSC_STABLE_BIT))
 		return 1;
-	}
 
 	pvclock_set_pvti_cpu0_va(hv_clock);
-	put_cpu();
-
 	kvm_clock.archdata.vclock_mode = VCLOCK_PVCLOCK;
 #endif
 	return 0;




* [patch 5/7] x86/kvmclock: Mark variables __initdata and __ro_after_init
  2018-07-06 16:13 [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups Thomas Gleixner
                   ` (3 preceding siblings ...)
  2018-07-06 16:13 ` [patch 4/7] x86/kvmclock: Cleanup the code Thomas Gleixner
@ 2018-07-06 16:13 ` Thomas Gleixner
  2018-07-12  2:31   ` Pavel Tatashin
  2018-07-06 16:13 ` [patch 6/7] x86/kvmclock: Move kvmclock vsyscall param and init to kvmclock Thomas Gleixner
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Thomas Gleixner @ 2018-07-06 16:13 UTC (permalink / raw)
  To: LKML
  Cc: Paolo Bonzini, Radim Krcmar, Peter Zijlstra, Juergen Gross,
	Pavel Tatashin, steven.sistare, daniel.m.jordan, x86, kvm

The kvmclock parameter is init data and the other variables are not
modified after init.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
---
 arch/x86/kernel/kvmclock.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -32,10 +32,10 @@
 #include <asm/reboot.h>
 #include <asm/kvmclock.h>
 
-static int kvmclock __ro_after_init = 1;
-static int msr_kvm_system_time = MSR_KVM_SYSTEM_TIME;
-static int msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK;
-static u64 kvm_sched_clock_offset;
+static int __initdata kvmclock = 1;
+static int msr_kvm_system_time __ro_after_init = MSR_KVM_SYSTEM_TIME;
+static int msr_kvm_wall_clock __ro_after_init = MSR_KVM_WALL_CLOCK;
+static u64 kvm_sched_clock_offset __ro_after_init;
 
 static int __init parse_no_kvmclock(char *arg)
 {
@@ -50,7 +50,7 @@ early_param("no-kvmclock", parse_no_kvmc
 static u8 hv_clock_mem[PAGE_ALIGN(HV_CLOCK_SIZE)] __aligned(PAGE_SIZE);
 
 /* The hypervisor will put information about time periodically here */
-static struct pvclock_vsyscall_time_info *hv_clock;
+static struct pvclock_vsyscall_time_info *hv_clock __ro_after_init;
 static struct pvclock_wall_clock wall_clock;
 
 /*




* [patch 6/7] x86/kvmclock: Move kvmclock vsyscall param and init to kvmclock
  2018-07-06 16:13 [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups Thomas Gleixner
                   ` (4 preceding siblings ...)
  2018-07-06 16:13 ` [patch 5/7] x86/kvmclock: Mark variables __initdata and __ro_after_init Thomas Gleixner
@ 2018-07-06 16:13 ` Thomas Gleixner
  2018-07-06 17:43   ` Paolo Bonzini
  2018-07-12  2:52   ` Pavel Tatashin
  2018-07-06 16:13 ` [patch 7/7] x86/kvmclock: Switch kvmclock data to a PER_CPU variable Thomas Gleixner
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 27+ messages in thread
From: Thomas Gleixner @ 2018-07-06 16:13 UTC (permalink / raw)
  To: LKML
  Cc: Paolo Bonzini, Radim Krcmar, Peter Zijlstra, Juergen Gross,
	Pavel Tatashin, steven.sistare, daniel.m.jordan, x86, kvm

There is no point in having this in the KVM code itself and calling it from
there. This can be called from an initcall, and the parameter is cleared
when the hypervisor is not KVM.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
---
 arch/x86/include/asm/kvm_guest.h |    7 -----
 arch/x86/kernel/kvm.c            |   13 ----------
 arch/x86/kernel/kvmclock.c       |   49 ++++++++++++++++++++++++---------------
 3 files changed, 31 insertions(+), 38 deletions(-)

--- a/arch/x86/include/asm/kvm_guest.h
+++ /dev/null
@@ -1,7 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_X86_KVM_GUEST_H
-#define _ASM_X86_KVM_GUEST_H
-
-int kvm_setup_vsyscall_timeinfo(void);
-
-#endif /* _ASM_X86_KVM_GUEST_H */
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -45,7 +45,6 @@
 #include <asm/apic.h>
 #include <asm/apicdef.h>
 #include <asm/hypervisor.h>
-#include <asm/kvm_guest.h>
 
 static int kvmapf = 1;
 
@@ -66,15 +65,6 @@ static int __init parse_no_stealacc(char
 
 early_param("no-steal-acc", parse_no_stealacc);
 
-static int kvmclock_vsyscall = 1;
-static int __init parse_no_kvmclock_vsyscall(char *arg)
-{
-        kvmclock_vsyscall = 0;
-        return 0;
-}
-
-early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);
-
 static DEFINE_PER_CPU_DECRYPTED(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
 static DEFINE_PER_CPU_DECRYPTED(struct kvm_steal_time, steal_time) __aligned(64);
 static int has_steal_clock = 0;
@@ -560,9 +550,6 @@ static void __init kvm_guest_init(void)
 	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
 		apic_set_eoi_write(kvm_guest_apic_eoi_write);
 
-	if (kvmclock_vsyscall)
-		kvm_setup_vsyscall_timeinfo();
-
 #ifdef CONFIG_SMP
 	smp_ops.smp_prepare_cpus = kvm_smp_prepare_cpus;
 	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -27,12 +27,14 @@
 #include <linux/sched/clock.h>
 #include <linux/mm.h>
 
+#include <asm/hypervisor.h>
 #include <asm/mem_encrypt.h>
 #include <asm/x86_init.h>
 #include <asm/reboot.h>
 #include <asm/kvmclock.h>
 
 static int __initdata kvmclock = 1;
+static int __initdata kvmclock_vsyscall = 1;
 static int msr_kvm_system_time __ro_after_init = MSR_KVM_SYSTEM_TIME;
 static int msr_kvm_wall_clock __ro_after_init = MSR_KVM_WALL_CLOCK;
 static u64 kvm_sched_clock_offset __ro_after_init;
@@ -44,6 +46,13 @@ static int __init parse_no_kvmclock(char
 }
 early_param("no-kvmclock", parse_no_kvmclock);
 
+static int __init parse_no_kvmclock_vsyscall(char *arg)
+{
+	kvmclock_vsyscall = 0;
+	return 0;
+}
+early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);
+
 /* Aligned to page sizes to match whats mapped via vsyscalls to userspace */
 #define HV_CLOCK_SIZE	(sizeof(struct pvclock_vsyscall_time_info) * NR_CPUS)
 
@@ -227,6 +236,25 @@ static void kvm_shutdown(void)
 	native_machine_shutdown();
 }
 
+static int __init kvm_setup_vsyscall_timeinfo(void)
+{
+#ifdef CONFIG_X86_64
+	u8 flags;
+
+	if (!hv_clock || !kvmclock_vsyscall)
+		return 0;
+
+	flags = pvclock_read_flags(&hv_clock[0].pvti);
+	if (!(flags & PVCLOCK_TSC_STABLE_BIT))
+		return 1;
+
+	pvclock_set_pvti_cpu0_va(hv_clock);
+	kvm_clock.archdata.vclock_mode = VCLOCK_PVCLOCK;
+#endif
+	return 0;
+}
+early_initcall(kvm_setup_vsyscall_timeinfo);
+
 void __init kvmclock_init(void)
 {
 	u8 flags;
@@ -241,6 +269,9 @@ void __init kvmclock_init(void)
 		return;
 	}
 
+	if (!hypervisor_is_type(X86_HYPER_KVM))
+		kvmclock_vsyscall = 0;
+
 	pr_info("kvm-clock: Using msrs %x and %x",
 		msr_kvm_system_time, msr_kvm_wall_clock);
 
@@ -270,21 +301,3 @@ void __init kvmclock_init(void)
 	clocksource_register_hz(&kvm_clock, NSEC_PER_SEC);
 	pv_info.name = "KVM";
 }
-
-int __init kvm_setup_vsyscall_timeinfo(void)
-{
-#ifdef CONFIG_X86_64
-	u8 flags;
-
-	if (!hv_clock)
-		return 0;
-
-	flags = pvclock_read_flags(&hv_clock[0].pvti);
-	if (!(flags & PVCLOCK_TSC_STABLE_BIT))
-		return 1;
-
-	pvclock_set_pvti_cpu0_va(hv_clock);
-	kvm_clock.archdata.vclock_mode = VCLOCK_PVCLOCK;
-#endif
-	return 0;
-}




* [patch 7/7] x86/kvmclock: Switch kvmclock data to a PER_CPU variable
  2018-07-06 16:13 [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups Thomas Gleixner
                   ` (5 preceding siblings ...)
  2018-07-06 16:13 ` [patch 6/7] x86/kvmclock: Move kvmclock vsyscall param and init to kvmclock Thomas Gleixner
@ 2018-07-06 16:13 ` Thomas Gleixner
  2018-07-12  3:12   ` Pavel Tatashin
  2018-07-06 17:47 ` [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups Paolo Bonzini
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Thomas Gleixner @ 2018-07-06 16:13 UTC (permalink / raw)
  To: LKML
  Cc: Paolo Bonzini, Radim Krcmar, Peter Zijlstra, Juergen Gross,
	Pavel Tatashin, steven.sistare, daniel.m.jordan, x86, kvm

From: Thomas Gleixner <tglx@linutronix.de>

The previous removal of the memblock dependency from kvmclock introduced a
static data array sized 64 bytes * CONFIG_NR_CPUS. That's wasteful on large
systems when kvmclock is not used.

Replace it with:

 - A static page sized array of pvclock data. It's page sized because the
   pvclock data of the boot CPU is mapped into the VDSO; otherwise arbitrary
   other data would be exposed through the VDSO.

 - A PER_CPU variable of pvclock data pointers. This is used to access the
   pvclock data storage on each CPU.

The setup is done in two stages:

 - Early boot stores the pointer to the static page for the boot CPU in
   the per cpu data.

 - In the preparatory stage of CPU hotplug assign either an element of
   the static array (when the CPU number is in that range) or allocate
   memory and initialize the per cpu pointer.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
---
 arch/x86/kernel/kvmclock.c |   97 +++++++++++++++++++++++++++------------------
 1 file changed, 60 insertions(+), 37 deletions(-)

--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -23,6 +23,7 @@
 #include <asm/apic.h>
 #include <linux/percpu.h>
 #include <linux/hardirq.h>
+#include <linux/cpuhotplug.h>
 #include <linux/sched.h>
 #include <linux/sched/clock.h>
 #include <linux/mm.h>
@@ -55,12 +56,22 @@ early_param("no-kvmclock-vsyscall", pars
 
 /* Aligned to page sizes to match whats mapped via vsyscalls to userspace */
 #define HV_CLOCK_SIZE	(sizeof(struct pvclock_vsyscall_time_info) * NR_CPUS)
+#define HVC_BOOT_ARRAY_SIZE \
+	(PAGE_SIZE / sizeof(struct pvclock_vsyscall_time_info))
 
-static u8 hv_clock_mem[PAGE_ALIGN(HV_CLOCK_SIZE)] __aligned(PAGE_SIZE);
-
-/* The hypervisor will put information about time periodically here */
-static struct pvclock_vsyscall_time_info *hv_clock __ro_after_init;
+static struct pvclock_vsyscall_time_info hv_clock_boot[HVC_BOOT_ARRAY_SIZE] __aligned(PAGE_SIZE);
 static struct pvclock_wall_clock wall_clock;
+static DEFINE_PER_CPU(struct pvclock_vsyscall_time_info *, hv_clock_per_cpu);
+
+static inline struct pvclock_vcpu_time_info *this_cpu_pvti(void)
+{
+	return &this_cpu_read(hv_clock_per_cpu)->pvti;
+}
+
+static inline struct pvclock_vsyscall_time_info *this_cpu_hvclock(void)
+{
+	return this_cpu_read(hv_clock_per_cpu);
+}
 
 /*
  * The wallclock is the time of day when we booted. Since then, some time may
@@ -69,17 +80,10 @@ static struct pvclock_wall_clock wall_cl
  */
 static void kvm_get_wallclock(struct timespec64 *now)
 {
-	struct pvclock_vcpu_time_info *vcpu_time;
-	int cpu;
-
 	wrmsrl(msr_kvm_wall_clock, slow_virt_to_phys(&wall_clock));
-
-	cpu = get_cpu();
-
-	vcpu_time = &hv_clock[cpu].pvti;
-	pvclock_read_wallclock(&wall_clock, vcpu_time, now);
-
-	put_cpu();
+	preempt_disable();
+	pvclock_read_wallclock(&wall_clock, this_cpu_pvti(), now);
+	preempt_enable();
 }
 
 static int kvm_set_wallclock(const struct timespec64 *now)
@@ -89,14 +93,10 @@ static int kvm_set_wallclock(const struc
 
 static u64 kvm_clock_read(void)
 {
-	struct pvclock_vcpu_time_info *src;
 	u64 ret;
-	int cpu;
 
 	preempt_disable_notrace();
-	cpu = smp_processor_id();
-	src = &hv_clock[cpu].pvti;
-	ret = pvclock_clocksource_read(src);
+	ret = pvclock_clocksource_read(this_cpu_pvti());
 	preempt_enable_notrace();
 	return ret;
 }
@@ -140,7 +140,7 @@ static inline void kvm_sched_clock_init(
  */
 static unsigned long kvm_get_tsc_khz(void)
 {
-	return pvclock_tsc_khz(&hv_clock[0].pvti);
+	return pvclock_tsc_khz(this_cpu_pvti());
 }
 
 static void kvm_get_preset_lpj(void)
@@ -157,15 +157,14 @@ static void kvm_get_preset_lpj(void)
 
 bool kvm_check_and_clear_guest_paused(void)
 {
-	struct pvclock_vcpu_time_info *src;
+	struct pvclock_vsyscall_time_info *src = this_cpu_hvclock();
 	bool ret = false;
 
-	if (!hv_clock)
+	if (!src)
 		return ret;
 
-	src = &hv_clock[smp_processor_id()].pvti;
-	if ((src->flags & PVCLOCK_GUEST_STOPPED) != 0) {
-		src->flags &= ~PVCLOCK_GUEST_STOPPED;
+	if ((src->pvti.flags & PVCLOCK_GUEST_STOPPED) != 0) {
+		src->pvti.flags &= ~PVCLOCK_GUEST_STOPPED;
 		pvclock_touch_watchdogs();
 		ret = true;
 	}
@@ -183,17 +182,15 @@ EXPORT_SYMBOL_GPL(kvm_clock);
 
 static void kvm_register_clock(char *txt)
 {
-	struct pvclock_vcpu_time_info *src;
-	int cpu = smp_processor_id();
+	struct pvclock_vsyscall_time_info *src = this_cpu_hvclock();
 	u64 pa;
 
-	if (!hv_clock)
+	if (!src)
 		return;
 
-	src = &hv_clock[cpu].pvti;
-	pa = slow_virt_to_phys(src) | 0x01ULL;
+	pa = slow_virt_to_phys(&src->pvti) | 0x01ULL;
 	wrmsrl(msr_kvm_system_time, pa);
-	pr_info("kvm-clock: cpu %d, msr %llx, %s\n", cpu, pa, txt);
+	pr_info("kvm-clock: cpu %d, msr %llx, %s\n", smp_processor_id(), pa, txt);
 }
 
 static void kvm_save_sched_clock_state(void)
@@ -241,20 +238,42 @@ static int __init kvm_setup_vsyscall_tim
 #ifdef CONFIG_X86_64
 	u8 flags;
 
-	if (!hv_clock || !kvmclock_vsyscall)
+	if (!per_cpu(hv_clock_per_cpu, 0) || !kvmclock_vsyscall)
 		return 0;
 
-	flags = pvclock_read_flags(&hv_clock[0].pvti);
+	flags = pvclock_read_flags(&hv_clock_boot[0].pvti);
 	if (!(flags & PVCLOCK_TSC_STABLE_BIT))
-		return 1;
+		return 0;
 
-	pvclock_set_pvti_cpu0_va(hv_clock);
+	pvclock_set_pvti_cpu0_va(hv_clock_boot);
 	kvm_clock.archdata.vclock_mode = VCLOCK_PVCLOCK;
 #endif
 	return 0;
 }
 early_initcall(kvm_setup_vsyscall_timeinfo);
 
+static int kvmclock_setup_percpu(unsigned int cpu)
+{
+	struct pvclock_vsyscall_time_info *p = per_cpu(hv_clock_per_cpu, cpu);
+
+	/*
+	 * The per cpu area setup replicates CPU0 data to all cpu
+	 * pointers. So carefully check. CPU0 has been set up in init
+	 * already.
+	 */
+	if (!cpu || (p && p != per_cpu(hv_clock_per_cpu, 0)))
+		return 0;
+
+	/* Use the static page for the first CPUs, allocate otherwise */
+	if (cpu < HVC_BOOT_ARRAY_SIZE)
+		p = &hv_clock_boot[cpu];
+	else
+		p = kzalloc(sizeof(*p), GFP_KERNEL);
+
+	per_cpu(hv_clock_per_cpu, cpu) = p;
+	return p ? 0 : -ENOMEM;
+}
+
 void __init kvmclock_init(void)
 {
 	u8 flags;
@@ -269,19 +288,23 @@ void __init kvmclock_init(void)
 		return;
 	}
 
+	if (cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "kvmclock:setup_percpu",
+			      kvmclock_setup_percpu, NULL) < 0)
+		return;
+
 	if (!hypervisor_is_type(X86_HYPER_KVM))
 		kvmclock_vsyscall = 0;
 
 	pr_info("kvm-clock: Using msrs %x and %x",
 		msr_kvm_system_time, msr_kvm_wall_clock);
 
-	hv_clock = (struct pvclock_vsyscall_time_info *)hv_clock_mem;
+	this_cpu_write(hv_clock_per_cpu, &hv_clock_boot[0]);
 	kvm_register_clock("primary cpu clock");
 
 	if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
 		pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
 
-	flags = pvclock_read_flags(&hv_clock[0].pvti);
+	flags = pvclock_read_flags(&hv_clock_boot[0].pvti);
 	kvm_sched_clock_init(flags & PVCLOCK_TSC_STABLE_BIT);
 
 	x86_platform.calibrate_tsc = kvm_get_tsc_khz;




* Re: [patch 3/7] x86/kvmclock: Decrapify kvm_register_clock()
  2018-07-06 16:13 ` [patch 3/7] x86/kvmclock: Decrapify kvm_register_clock() Thomas Gleixner
@ 2018-07-06 17:38   ` Paolo Bonzini
  2018-07-06 17:39     ` Thomas Gleixner
  2018-07-12  2:24   ` Pavel Tatashin
  1 sibling, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2018-07-06 17:38 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Radim Krcmar, Peter Zijlstra, Juergen Gross, Pavel Tatashin,
	steven.sistare, daniel.m.jordan, x86, kvm

Decrapification went a bit too far...

On 06/07/2018 18:13, Thomas Gleixner wrote:
> -	int cpu = smp_processor_id();
> -	int low, high, ret;
>  	struct pvclock_vcpu_time_info *src;
> +	int cpu = smp_processor_id();
> +	u64 pa;
>  
>  	if (!hv_clock)
> -		return 0;
> -
> -	src = &hv_clock[cpu].pvti;

... the line above should not have been a "-".  That, or initialize it
in the declaration.  Whoever applies can fix it, no need to repost.
I'll test with the above fix.

Paolo

> -	low = (int)slow_virt_to_phys(src) | 1;
> -	high = ((u64)slow_virt_to_phys(src) >> 32);
> -	ret = native_write_msr_safe(msr_kvm_system_time, low, high);
> -	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> -	       cpu, high, low, txt);
> +		return;
>  
> -	return ret;
> +	pa = slow_virt_to_phys(src) | 0x01ULL;
> +	wrmsrl(msr_kvm_system_time, pa);
> +	pr_info("kvm-clock: cpu %d, msr %llx, %s\n", cpu, pa, txt);
>  }



* Re: [patch 3/7] x86/kvmclock: Decrapify kvm_register_clock()
  2018-07-06 17:38   ` Paolo Bonzini
@ 2018-07-06 17:39     ` Thomas Gleixner
  0 siblings, 0 replies; 27+ messages in thread
From: Thomas Gleixner @ 2018-07-06 17:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, Radim Krcmar, Peter Zijlstra, Juergen Gross,
	Pavel Tatashin, steven.sistare, daniel.m.jordan, x86, kvm

On Fri, 6 Jul 2018, Paolo Bonzini wrote:

> Decrapification went a bit too far...
> 
> On 06/07/2018 18:13, Thomas Gleixner wrote:
> > -	int cpu = smp_processor_id();
> > -	int low, high, ret;
> >  	struct pvclock_vcpu_time_info *src;
> > +	int cpu = smp_processor_id();
> > +	u64 pa;
> >  
> >  	if (!hv_clock)
> > -		return 0;
> > -
> > -	src = &hv_clock[cpu].pvti;
> 
> ... the line above should not have been a "-".  That, or initialize it
> in the declaration.  Whoever applies can fix it, no need to repost.
> I'll test with the above fix.

Indeed.


* Re: [patch 4/7] x86/kvmclock: Cleanup the code
  2018-07-06 16:13 ` [patch 4/7] x86/kvmclock: Cleanup the code Thomas Gleixner
@ 2018-07-06 17:39   ` Paolo Bonzini
  2018-07-09  9:05   ` Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2018-07-06 17:39 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Radim Krcmar, Peter Zijlstra, Juergen Gross, Pavel Tatashin,
	steven.sistare, daniel.m.jordan, x86, kvm

> - Cleanup the mrs write for wall clock.

s/mrs/MSR/

On 06/07/2018 18:13, Thomas Gleixner wrote:
> @@ -195,6 +181,7 @@ static void kvm_register_clock(char *txt
>  	if (!hv_clock)
>  		return;
>  
> +	src = &hv_clock[cpu].pvti;
>  	pa = slow_virt_to_phys(src) | 0x01ULL;
>  	wrmsrl(msr_kvm_system_time, pa);
>  	pr_info("kvm-clock: cpu %d, msr %llx, %s\n", cpu, pa, txt);

Ah, here it is.

Paolo


* Re: [patch 6/7] x86/kvmclock: Move kvmclock vsyscall param and init to kvmclock
  2018-07-06 16:13 ` [patch 6/7] x86/kvmclock: Move kvmclock vsyscall param and init to kvmclock Thomas Gleixner
@ 2018-07-06 17:43   ` Paolo Bonzini
  2018-07-06 19:23     ` Thomas Gleixner
  2018-07-12  2:52   ` Pavel Tatashin
  1 sibling, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2018-07-06 17:43 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Radim Krcmar, Peter Zijlstra, Juergen Gross, Pavel Tatashin,
	steven.sistare, daniel.m.jordan, x86, kvm

On 06/07/2018 18:13, Thomas Gleixner wrote:
> @@ -241,6 +269,9 @@ void __init kvmclock_init(void)
>  		return;
>  	}
>  
> +	if (!hypervisor_is_type(X86_HYPER_KVM))
> +		kvmclock_vsyscall = 0;
> +

No need for this; by the time you get here, the condition will always be
true.  And if you don't have kvmclock, hv_clock will be NULL and
initialization will be skipped anyway in kvm_setup_vsyscall_timeinfo.

Thanks,

Paolo


* Re: [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups
  2018-07-06 16:13 [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups Thomas Gleixner
                   ` (6 preceding siblings ...)
  2018-07-06 16:13 ` [patch 7/7] x86/kvmclock: Switch kvmclock data to a PER_CPU variable Thomas Gleixner
@ 2018-07-06 17:47 ` Paolo Bonzini
  2018-07-06 23:51   ` Brijesh Singh
  2018-07-09  9:22 ` [patch 8/7] x86/kvmclock: Avoid TSC recalibration Peter Zijlstra
  2018-07-12  2:12 ` [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups Pavel Tatashin
  9 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2018-07-06 17:47 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Radim Krcmar, Peter Zijlstra, Juergen Gross, Pavel Tatashin,
	steven.sistare, daniel.m.jordan, x86, kvm

On 06/07/2018 18:13, Thomas Gleixner wrote:
> To allow early utilization of kvmclock it is required to remove the
> memblock dependency. memblock is currently used to allocate the per
> cpu data for kvmclock.
> 
> The first patch replaces the memblock with a static array sized 64 bytes *
> NR_CPUS and was posted by Pavel. That patch allocates everything statically
> which is a waste when kvmclock is not used.
> 
> The rest of the series cleans up the code and converts it to per cpu
> variables but does not put the kvmclock data into the per cpu area as that
> has an issue vs. mapping the boot cpu data into the VDSO (leaks arbitrary
> data, unless page sized).
> 
> The per cpu data consists of pointers to the actual data. For the boot cpu
> a page sized array is statically allocated which can be mapped into the
> VDSO. That array is used for initializing the first 64 CPU pointers. If
> there are more CPUs the pvclock data is allocated during CPU bringup.
> 
> So this still will have some overhead when kvmclock is not in use, but
> bringing it down to zero would be a massive trainwreck and even more
> indirections.
> 
> Thanks,
> 
> 	tglx
> 
> 8<--------------
>  a/arch/x86/include/asm/kvm_guest.h |    7 
>  arch/x86/include/asm/kvm_para.h    |    1 
>  arch/x86/kernel/kvm.c              |   14 -
>  arch/x86/kernel/kvmclock.c         |  262 ++++++++++++++-----------------------
>  arch/x86/kernel/setup.c            |    4 
>  5 files changed, 105 insertions(+), 183 deletions(-)
> 
> 
> 
> 

Thanks, this is really nice.  With the small changes from my review,

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo


* Re: [patch 6/7] x86/kvmclock: Move kvmclock vsyscall param and init to kvmclock
  2018-07-06 17:43   ` Paolo Bonzini
@ 2018-07-06 19:23     ` Thomas Gleixner
  0 siblings, 0 replies; 27+ messages in thread
From: Thomas Gleixner @ 2018-07-06 19:23 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, Radim Krcmar, Peter Zijlstra, Juergen Gross,
	Pavel Tatashin, steven.sistare, daniel.m.jordan, x86, kvm

On Fri, 6 Jul 2018, Paolo Bonzini wrote:
> On 06/07/2018 18:13, Thomas Gleixner wrote:
> > @@ -241,6 +269,9 @@ void __init kvmclock_init(void)
> >  		return;
> >  	}
> >  
> > +	if (!hypervisor_is_type(X86_HYPER_KVM))
> > +		kvmclock_vsyscall = 0;
> > +
> 
> No need for this; by the time you get here, the condition will always be
> true.  And if you don't have kvmclock, hv_clock will be NULL and
> initialization will be skipped anyway in kvm_setup_vsyscall_timeinfo.

Right, stupid me.


* Re: [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups
  2018-07-06 17:47 ` [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups Paolo Bonzini
@ 2018-07-06 23:51   ` Brijesh Singh
  0 siblings, 0 replies; 27+ messages in thread
From: Brijesh Singh @ 2018-07-06 23:51 UTC (permalink / raw)
  To: Paolo Bonzini, Thomas Gleixner, LKML, Borislav Petkov
  Cc: brijesh.singh, Radim Krcmar, Peter Zijlstra, Juergen Gross,
	Pavel Tatashin, steven.sistare, daniel.m.jordan, x86, kvm,
	Lendacky, Thomas


Adding Tom and Boris


On 7/6/18 12:47 PM, Paolo Bonzini wrote:
> On 06/07/2018 18:13, Thomas Gleixner wrote:
>> To allow early utilization of kvmclock it is required to remove the
>> memblock dependency. memblock is currently used to allocate the per
>> cpu data for kvmclock.
>>
>> The first patch replaces the memblock with a static array sized 64 bytes *
>> NR_CPUS and was posted by Pavel. That patch allocates everything statically
>> which is a waste when kvmclock is not used.
>>
>> The rest of the series cleans up the code and converts it to per cpu
>> variables but does not put the kvmclock data into the per cpu area as that
>> has an issue vs. mapping the boot cpu data into the VDSO (leaks arbitrary
>> data, unless page sized).
>>
>> The per cpu data consists of pointers to the actual data. For the boot cpu
>> a page sized array is statically allocated which can be mapped into the
>> VDSO. That array is used for initializing the first 64 CPU pointers. If
>> there are more CPUs the pvclock data is allocated during CPU bringup.
>>
>> So this still will have some overhead when kvmclock is not in use, but
>> bringing it down to zero would be a massive trainwreck and even more
>> indirections.
>>
>> Thanks,
>>
>> 	tglx
>>
>> 8<--------------
>>  a/arch/x86/include/asm/kvm_guest.h |    7 
>>  arch/x86/include/asm/kvm_para.h    |    1 
>>  arch/x86/kernel/kvm.c              |   14 -
>>  arch/x86/kernel/kvmclock.c         |  262 ++++++++++++++-----------------------
>>  arch/x86/kernel/setup.c            |    4 
>>  5 files changed, 105 insertions(+), 183 deletions(-)
>>
>>
>>
>>
> Thanks, this is really nice.  With the small changes from my review,
>
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Hi Paolo and Thomas,


This series breaks SEV guest support. The physical addresses of both
wall_clock and hv_clock are shared with the hypervisor for updates. In the
case of SEV the addresses must be mapped as decrypted (i.e. C=0) so that
both guest and HV can access the data correctly. The following patch should
map the pages as decrypted.


diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 890e9e5..640c796 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -251,6 +251,20 @@ static void kvm_shutdown(void)
        native_machine_shutdown();
 }
 
+static void sev_map_clocks_decrypted(void)
+{
+       if (!sev_active())
+               return;
+
+       /*
+        * wall_clock and hv_clock addresses are shared with hypervisor.
+        * When SEV is enabled, any addresses shared with hypervisor must be
+        * mapped decrypted.
+        */
+       early_set_memory_decrypted((unsigned long) wall_clock, WALL_CLOCK_SIZE);
+       early_set_memory_decrypted((unsigned long) hv_clock, HV_CLOCK_SIZE);
+}
+
 void __init kvmclock_init(void)
 {
        struct pvclock_vcpu_time_info *vcpu_time;
@@ -269,6 +283,8 @@ void __init kvmclock_init(void)
        wall_clock = (struct pvclock_wall_clock *)wall_clock_mem;
        hv_clock = (struct pvclock_vsyscall_time_info *)hv_clock_mem;
 
+       sev_map_clocks_decrypted();
+
        if (kvm_register_clock("primary cpu clock")) {
                hv_clock = NULL;
                wall_clock = NULL;


But this patch triggers the kernel crash below.
early_set_memory_decrypted() uses kernel_physical_mapping_init() to
split the large pages and clear the C-bit. It seems this function still
has a dependency on memblock.

[    0.000000] Hypervisor detected: KVM
[    0.000000] Kernel panic - not syncing: alloc_low_pages: ran out of memory
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.18.0-rc3-sev #19
[    0.000000] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[    0.000000] Call Trace:
[    0.000000]  ? dump_stack+0x5c/0x80
[    0.000000]  ? panic+0xe7/0x247
[    0.000000]  ? alloc_low_pages+0x130/0x130
[    0.000000]  ? kernel_physical_mapping_init+0xe0/0x204
[    0.000000]  ? early_set_memory_enc_dec+0x10f/0x160
[    0.000000]  ? 0xffffffffb1000000
[    0.000000]  ? kvmclock_init+0x83/0x20a
[    0.000000]  ? setup_arch+0x42c/0xce6
[    0.000000]  ? start_kernel+0x67/0x531
[    0.000000]  ? load_ucode_bsp+0x76/0x12e
[    0.000000]  ? secondary_startup_64+0xa5/0xb0
[    0.000000] ---[ end Kernel panic - not syncing: alloc_low_pages: ran out of memory ]---

- Brijesh



* Re: [patch 4/7] x86/kvmclock: Cleanup the code
  2018-07-06 16:13 ` [patch 4/7] x86/kvmclock: Cleanup the code Thomas Gleixner
  2018-07-06 17:39   ` Paolo Bonzini
@ 2018-07-09  9:05   ` Peter Zijlstra
  2018-07-09 10:03     ` Thomas Gleixner
  2018-07-09 11:32     ` Paolo Bonzini
  1 sibling, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2018-07-09  9:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Paolo Bonzini, Radim Krcmar, Juergen Gross, Pavel Tatashin,
	steven.sistare, daniel.m.jordan, x86, kvm

On Fri, Jul 06, 2018 at 06:13:11PM +0200, Thomas Gleixner wrote:
> -	native_write_msr(msr_kvm_wall_clock, low, high);
> +	wrmsrl(msr_kvm_wall_clock, slow_virt_to_phys(&wall_clock));

Does it matter that you went from an explicit native WRMSR instruction
to a potentially paravirt MSR thing?


* [patch 8/7] x86/kvmclock: Avoid TSC recalibration
  2018-07-06 16:13 [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups Thomas Gleixner
                   ` (7 preceding siblings ...)
  2018-07-06 17:47 ` [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups Paolo Bonzini
@ 2018-07-09  9:22 ` Peter Zijlstra
  2018-07-12  2:12 ` [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups Pavel Tatashin
  9 siblings, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2018-07-09  9:22 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Paolo Bonzini, Radim Krcmar, Juergen Gross, Pavel Tatashin,
	steven.sistare, daniel.m.jordan, x86, kvm


If the host gives us a TSC rate, assume it is good and don't try and
recalibrate things against virtual timer hardware.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -140,7 +140,16 @@ static inline void kvm_sched_clock_init(
  */
 static unsigned long kvm_get_tsc_khz(void)
 {
-	return pvclock_tsc_khz(this_cpu_pvti());
+	unsigned long tsc_khz = pvclock_tsc_khz(this_cpu_pvti());
+
+	/*
+	 * TSC frequency is reported by the host; calibration against (virtual)
+	 * HPET/PM-timer in a guest is dodgy and pointless since the host already
+	 * did it for us where required.
+	 */
+	setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
+
+	return tsc_khz;
 }
 
 static void kvm_get_preset_lpj(void)


* Re: [patch 4/7] x86/kvmclock: Cleanup the code
  2018-07-09  9:05   ` Peter Zijlstra
@ 2018-07-09 10:03     ` Thomas Gleixner
  2018-07-09 11:32     ` Paolo Bonzini
  1 sibling, 0 replies; 27+ messages in thread
From: Thomas Gleixner @ 2018-07-09 10:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Paolo Bonzini, Radim Krcmar, Juergen Gross, Pavel Tatashin,
	steven.sistare, daniel.m.jordan, x86, kvm

On Mon, 9 Jul 2018, Peter Zijlstra wrote:
> On Fri, Jul 06, 2018 at 06:13:11PM +0200, Thomas Gleixner wrote:
> > -	native_write_msr(msr_kvm_wall_clock, low, high);
> > +	wrmsrl(msr_kvm_wall_clock, slow_virt_to_phys(&wall_clock));
> 
> Does it matter that you went from an explicit native WRMSR instruction
> to a potentially paravirt MSR thing?

Ah crap, that wants to be native_wrmsrl()



* Re: [patch 4/7] x86/kvmclock: Cleanup the code
  2018-07-09  9:05   ` Peter Zijlstra
  2018-07-09 10:03     ` Thomas Gleixner
@ 2018-07-09 11:32     ` Paolo Bonzini
  1 sibling, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2018-07-09 11:32 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner
  Cc: LKML, Radim Krcmar, Juergen Gross, Pavel Tatashin,
	steven.sistare, daniel.m.jordan, x86, kvm

On 09/07/2018 11:05, Peter Zijlstra wrote:
> On Fri, Jul 06, 2018 at 06:13:11PM +0200, Thomas Gleixner wrote:
>> -	native_write_msr(msr_kvm_wall_clock, low, high);
>> +	wrmsrl(msr_kvm_wall_clock, slow_virt_to_phys(&wall_clock));
> Does it matter that you went from an explicit native WRMSR instruction
> to a potentially paravirt MSR thing?

No, KVM doesn't paravirtualize wrmsr.

Paolo


* Re: [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups
  2018-07-06 16:13 [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups Thomas Gleixner
                   ` (8 preceding siblings ...)
  2018-07-09  9:22 ` [patch 8/7] x86/kvmclock: Avoid TSC recalibration Peter Zijlstra
@ 2018-07-12  2:12 ` Pavel Tatashin
  2018-07-13 22:51   ` Thomas Gleixner
  9 siblings, 1 reply; 27+ messages in thread
From: Pavel Tatashin @ 2018-07-12  2:12 UTC (permalink / raw)
  To: tglx
  Cc: LKML, pbonzini, rkrcmar, peterz, jgross, Steven Sistare,
	Daniel Jordan, x86, kvm

> So this still will have some overhead when kvmclock is not in use, but
> bringing it down to zero would be a massive trainwreck and even more
> indirections.

Hi Thomas,

In my opinion, having the kvmclock page in __initdata for the boot CPU,
setting it up in init_hypervisor_platform(), and later switching to
memblock allocated memory in x86_init.hyper.guest_late_init() for all CPUs
would not be too bad, and might even use fewer lines of code. In
addition, it won't have any overhead when KVM is not used.
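
A rough, purely hypothetical sketch of that alternative (the function and
variable names are made up for illustration and are not part of the posted
series):

	/* Boot CPU only, thrown away after init once the switch happened */
	static struct pvclock_vsyscall_time_info boot_hv_clock __initdata;

	void __init kvmclock_init(void)
	{
		/* Called from init_hypervisor_platform() */
		hv_clock = &boot_hv_clock;
		kvm_register_clock("primary cpu clock");
	}

	static void __init kvmclock_switch_storage(void)
	{
		/* Hooked into x86_init.hyper.guest_late_init() */
		phys_addr_t mem = memblock_alloc(HV_CLOCK_SIZE, PAGE_SIZE);

		if (!mem)
			return;

		hv_clock = __va(mem);
		memset(hv_clock, 0, HV_CLOCK_SIZE);
		/* Re-register the boot CPU against the new storage */
		kvm_register_clock("primary cpu clock");
	}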

Pavel


* Re: [patch 2/7] x86/kvmclock: Remove page size requirement from wall_clock
  2018-07-06 16:13 ` [patch 2/7] x86/kvmclock: Remove page size requirement from wall_clock Thomas Gleixner
@ 2018-07-12  2:15   ` Pavel Tatashin
  0 siblings, 0 replies; 27+ messages in thread
From: Pavel Tatashin @ 2018-07-12  2:15 UTC (permalink / raw)
  To: tglx
  Cc: LKML, pbonzini, rkrcmar, peterz, jgross, Steven Sistare,
	Daniel Jordan, x86, kvm

On Fri, Jul 6, 2018 at 12:26 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> There is no requirement for wall_clock data to be page aligned or page
> sized.
>

Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>


* Re: [patch 3/7] x86/kvmclock: Decrapify kvm_register_clock()
  2018-07-06 16:13 ` [patch 3/7] x86/kvmclock: Decrapify kvm_register_clock() Thomas Gleixner
  2018-07-06 17:38   ` Paolo Bonzini
@ 2018-07-12  2:24   ` Pavel Tatashin
  1 sibling, 0 replies; 27+ messages in thread
From: Pavel Tatashin @ 2018-07-12  2:24 UTC (permalink / raw)
  To: tglx
  Cc: LKML, pbonzini, rkrcmar, peterz, jgross, Steven Sistare,
	Daniel Jordan, x86, kvm

In addition to what Paolo noticed, maybe change the printk to pr_info()
while editing around it:

> -
>         printk(KERN_INFO "kvm-clock: Using msrs %x and %x",
>                 msr_kvm_system_time, msr_kvm_wall_clock);
>
> +       hv_clock = (struct pvclock_vsyscall_time_info *)hv_clock_mem;
> +       kvm_register_clock("primary cpu clock");
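
Something like this (just a sketch of the suggested conversion):

	pr_info("kvm-clock: Using msrs %x and %x",
		msr_kvm_system_time, msr_kvm_wall_clock);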

Otherwise,
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 5/7] x86/kvmclock: Mark variables __initdata and __ro_after_init
  2018-07-06 16:13 ` [patch 5/7] x86/kvmclock: Mark variables __initdata and __ro_after_init Thomas Gleixner
@ 2018-07-12  2:31   ` Pavel Tatashin
  0 siblings, 0 replies; 27+ messages in thread
From: Pavel Tatashin @ 2018-07-12  2:31 UTC (permalink / raw)
  To: tglx
  Cc: LKML, pbonzini, rkrcmar, peterz, jgross, Steven Sistare,
	Daniel Jordan, x86, kvm

On Fri, Jul 6, 2018 at 12:25 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> The kvmclock parameter is init data and the other variables are not
> modified after init.

LGTM:
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 6/7] x86/kvmclock: Move kvmclock vsyscall param and init to kvmclock
  2018-07-06 16:13 ` [patch 6/7] x86/kvmclock: Move kvmclock vsyscall param and init to kvmclock Thomas Gleixner
  2018-07-06 17:43   ` Paolo Bonzini
@ 2018-07-12  2:52   ` Pavel Tatashin
  1 sibling, 0 replies; 27+ messages in thread
From: Pavel Tatashin @ 2018-07-12  2:52 UTC (permalink / raw)
  To: tglx
  Cc: LKML, pbonzini, rkrcmar, peterz, jgross, Steven Sistare,
	Daniel Jordan, x86, kvm

+#include <asm/hypervisor.h>

It can be removed because, as Paolo noticed, the X86_HYPER_KVM check is not needed.

> +static int __init kvm_setup_vsyscall_timeinfo(void)
> +{
> +#ifdef CONFIG_X86_64
> +       u8 flags;
> +
> +       if (!hv_clock || !kvmclock_vsyscall)
> +               return 0;
> +
> +       flags = pvclock_read_flags(&hv_clock[0].pvti);
> +       if (!(flags & PVCLOCK_TSC_STABLE_BIT))
> +               return 1;
> +
> +       pvclock_set_pvti_cpu0_va(hv_clock);
> +       kvm_clock.archdata.vclock_mode = VCLOCK_PVCLOCK;
> +#endif
> +       return 0;
> +}
> +early_initcall(kvm_setup_vsyscall_timeinfo);
> +
....
> -
> -int __init kvm_setup_vsyscall_timeinfo(void)
> -{
> -#ifdef CONFIG_X86_64
> -       u8 flags;
> -
> -       if (!hv_clock)
> -               return 0;
> -
> -       flags = pvclock_read_flags(&hv_clock[0].pvti);
> -       if (!(flags & PVCLOCK_TSC_STABLE_BIT))
> -               return 1;
> -
> -       pvclock_set_pvti_cpu0_va(hv_clock);
> -       kvm_clock.archdata.vclock_mode = VCLOCK_PVCLOCK;
> -#endif
> -       return 0;
> -}

I am not sure what the point of moving this function is. The patch
would be much smaller without it.

Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 7/7] x86/kvmclock: Switch kvmclock data to a PER_CPU variable
  2018-07-06 16:13 ` [patch 7/7] x86/kvmclock: Switch kvmclock data to a PER_CPU variable Thomas Gleixner
@ 2018-07-12  3:12   ` Pavel Tatashin
  0 siblings, 0 replies; 27+ messages in thread
From: Pavel Tatashin @ 2018-07-12  3:12 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, pbonzini, rkrcmar, peterz, jgross, steven.sistare,
	daniel.m.jordan, x86, kvm

>  /* Aligned to page sizes to match whats mapped via vsyscalls to userspace */
>  #define HV_CLOCK_SIZE  (sizeof(struct pvclock_vsyscall_time_info) * NR_CPUS)

HV_CLOCK_SIZE is not used anywhere anymore and can be deleted.

> +#define HVC_BOOT_ARRAY_SIZE \
> +       (PAGE_SIZE / sizeof(struct pvclock_vsyscall_time_info))

This works because sizeof(struct pvclock_vsyscall_time_info) == 64
bytes, but a static assert verifying IS_ALIGNED(PAGE_SIZE,
sizeof(struct pvclock_vsyscall_time_info)) would guarantee that no
random data is ever exposed to the vdso.
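
One way to encode that check (a sketch only, placed in an init
function such as kvmclock_init() and using BUILD_BUG_ON() from
<linux/build_bug.h>):

	/*
	 * Equivalent to the IS_ALIGNED() check above: fail the build
	 * if pvclock entries do not tile a page exactly, so the page
	 * mapped into the VDSO cannot expose stray data past the last
	 * entry.
	 */
	BUILD_BUG_ON(PAGE_SIZE % sizeof(struct pvclock_vsyscall_time_info));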

Otherwise looks good to me:
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups
  2018-07-12  2:12 ` [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups Pavel Tatashin
@ 2018-07-13 22:51   ` Thomas Gleixner
  2018-07-14  0:20     ` Pavel Tatashin
  0 siblings, 1 reply; 27+ messages in thread
From: Thomas Gleixner @ 2018-07-13 22:51 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: LKML, pbonzini, rkrcmar, peterz, jgross, Steven Sistare,
	Daniel Jordan, x86, kvm

On Wed, 11 Jul 2018, Pavel Tatashin wrote:

> > So this still will have some overhead when kvmclock is not in use, but
> > bringing it down to zero would be a massive trainwreck and even more
> > indirections.
> 
> Hi Thomas,
> 
> In my opinion, keeping the kvmclock page in __initdata for the boot
> cpu and setting it up in init_hypervisor_platform(), then switching to
> memblock-allocated memory in x86_init.hyper.guest_late_init() for all
> CPUs, would not be too bad and might even use fewer lines of code. In
> addition, it would not have any overhead when kvm is not used.

Why memblock? This can be switched when the allocator is up and
running. And you can use the per cpu allocator for that.
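
I.e. something along these lines during CPU bringup (a rough sketch
only; the names and the allocator choice are purely illustrative):

	static DEFINE_PER_CPU(struct pvclock_vsyscall_time_info *,
			      hv_clock_per_cpu);

	static int kvmclock_setup_percpu(unsigned int cpu)
	{
		struct pvclock_vsyscall_time_info *p;

		/* Boot CPUs keep the static, VDSO-mappable page. */
		if (cpu < HVC_BOOT_ARRAY_SIZE)
			p = &hv_clock_boot[cpu];
		else
			p = kzalloc(sizeof(*p), GFP_KERNEL);

		per_cpu(hv_clock_per_cpu, cpu) = p;
		return p ? 0 : -ENOMEM;
	}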

I'm not having cycles at the moment to look at that, so feel free to pick
the series up and enhance it.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups
  2018-07-13 22:51   ` Thomas Gleixner
@ 2018-07-14  0:20     ` Pavel Tatashin
  0 siblings, 0 replies; 27+ messages in thread
From: Pavel Tatashin @ 2018-07-14  0:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, pbonzini, rkrcmar, peterz, jgross, Steven Sistare,
	Daniel Jordan, x86, kvm



On 07/13/2018 06:51 PM, Thomas Gleixner wrote:
> On Wed, 11 Jul 2018, Pavel Tatashin wrote:
> 
>>> So this still will have some overhead when kvmclock is not in use, but
>>> bringing it down to zero would be a massive trainwreck and even more
>>> indirections.
>>
>> Hi Thomas,
>>
>> In my opinion, keeping the kvmclock page in __initdata for the boot
>> cpu and setting it up in init_hypervisor_platform(), then switching to
>> memblock-allocated memory in x86_init.hyper.guest_late_init() for all
>> CPUs, would not be too bad and might even use fewer lines of code. In
>> addition, it would not have any overhead when kvm is not used.
> 
> Why memblock? This can be switched when the allocator is up and
> running. And you can use the per cpu allocator for that.
> 
> I'm not having cycles at the moment to look at that, so feel free to pick
> the series up and enhance it.

OK, I will add your series to mine and fix all the comments that were
raised by the reviewers.

Pavel

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2018-07-14  0:21 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-06 16:13 [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups Thomas Gleixner
2018-07-06 16:13 ` [patch 1/7] x86/kvmclock: Remove memblock dependency Thomas Gleixner
2018-07-06 16:13 ` [patch 2/7] x86/kvmclock: Remove page size requirement from wall_clock Thomas Gleixner
2018-07-12  2:15   ` Pavel Tatashin
2018-07-06 16:13 ` [patch 3/7] x86/kvmclock: Decrapify kvm_register_clock() Thomas Gleixner
2018-07-06 17:38   ` Paolo Bonzini
2018-07-06 17:39     ` Thomas Gleixner
2018-07-12  2:24   ` Pavel Tatashin
2018-07-06 16:13 ` [patch 4/7] x86/kvmclock: Cleanup the code Thomas Gleixner
2018-07-06 17:39   ` Paolo Bonzini
2018-07-09  9:05   ` Peter Zijlstra
2018-07-09 10:03     ` Thomas Gleixner
2018-07-09 11:32     ` Paolo Bonzini
2018-07-06 16:13 ` [patch 5/7] x86/kvmclock: Mark variables __initdata and __ro_after_init Thomas Gleixner
2018-07-12  2:31   ` Pavel Tatashin
2018-07-06 16:13 ` [patch 6/7] x86/kvmclock: Move kvmclock vsyscall param and init to kvmclock Thomas Gleixner
2018-07-06 17:43   ` Paolo Bonzini
2018-07-06 19:23     ` Thomas Gleixner
2018-07-12  2:52   ` Pavel Tatashin
2018-07-06 16:13 ` [patch 7/7] x86/kvmclock: Switch kvmclock data to a PER_CPU variable Thomas Gleixner
2018-07-12  3:12   ` Pavel Tatashin
2018-07-06 17:47 ` [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups Paolo Bonzini
2018-07-06 23:51   ` Brijesh Singh
2018-07-09  9:22 ` [patch 8/7] x86/kvmclock: Avoid TSC recalibration Peter Zijlstra
2018-07-12  2:12 ` [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups Pavel Tatashin
2018-07-13 22:51   ` Thomas Gleixner
2018-07-14  0:20     ` Pavel Tatashin
