kvm.vger.kernel.org archive mirror
* [PATCH v3 0/6] make L2's kvm-clock stable, get rid of pvclock_gtod
@ 2017-07-29 12:35 Denis Plotnikov
  2017-07-29 12:35 ` [PATCH v3 1/6] pvclock: add parameter to store cycles stamp to pvclock reading function Denis Plotnikov
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Denis Plotnikov @ 2017-07-29 12:35 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm; +Cc: rkagan, den, svt-core

V3:
  Changing the timekeeper interface for clocksource reading looks like
  overkill for the goal of getting a cycles stamp for KVM.
  Instead, extend the timekeeping interface and add functions which provide
  the necessary data: read the clocksource along with its cycles stamp,
  and check whether the clocksource is stable.

  Use those functions and improve the existing timekeeper functionality to
  replace the pvclock_gtod_copy scheme in the masterclock data calculation.

V2:
  The main goal is to make L2's kvm-clock stable when it is running over an
  L1 with a stable kvm-clock.

  The patch series is for the x86 architecture only. If the series is
  approved I'll do the changes for the other architectures, but I don't
  have the ability to compile and check every single one (help needed).

  The patch series does the following (a short sketch of the resulting
  KVM-side usage follows the list):

        * change the timekeeper interface to get the cycles stamp value
          from the timekeeper
        * get rid of the pvclock copy in KVM by using the changed timekeeper
          interface: get time and cycles directly from the timekeeper
        * make KVM recognize a stable kvm-clock as a stable clocksource
          and use the KVM masterclock in this case, which means making
          L2 stable when running over a stable L1 kvm-clock
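
  As a rough illustration of the resulting KVM-side usage (a sketch based
  on the changes in patch 2 of this series, not a literal excerpt):

        /* sketch: KVM asks the timekeeper for time, cycles and stability */
        static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *cycle_now)
        {
                struct system_time_snapshot snap;

                ktime_get_snapshot(&snap);

                if (snap.cs_stable) {
                        *kernel_ns = ktime_to_ns(snap.boot); /* boot-based ns */
                        *cycle_now = snap.cycles;            /* cycles stamp  */
                }

                /* the masterclock is only usable with a stable clocksource */
                return snap.cs_stable;
        }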

Denis Plotnikov (6):
  pvclock: add parameter to store cycles stamp to pvclock reading
    function
  KVM: x86: switch to masterclock update using timekeeper functionality
  timekeeper: add clocksource change notifier
  KVM: x86: remove not used pvclock_gtod_copy
  pvclock: add clocksource change notification on changing of tsc stable
    bit
  kvmclock: add the clocksource stability querying function

 arch/x86/include/asm/kvm_host.h |   2 +-
 arch/x86/include/asm/pvclock.h  |   2 +-
 arch/x86/kernel/kvmclock.c      |  26 ++++-
 arch/x86/kernel/pvclock.c       |  31 ++++-
 arch/x86/kernel/tsc.c           |   6 +
 arch/x86/kvm/trace.h            |  31 ++---
 arch/x86/kvm/x86.c              | 242 ++++++++--------------------------------
 arch/x86/xen/time.c             |   2 +-
 include/linux/clocksource.h     |   3 +
 include/linux/cs_notifier.h     |  17 +++
 include/linux/timekeeping.h     |   2 +
 kernel/time/timekeeping.c       |  72 +++++++++++-
 12 files changed, 209 insertions(+), 227 deletions(-)
 create mode 100644 include/linux/cs_notifier.h

-- 
2.7.4

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v3 1/6] pvclock: add parameter to store cycles stamp to pvclock reading function
  2017-07-29 12:35 [PATCH v3 0/6] make L2's kvm-clock stable, get rid of pvclock_gtod Denis Plotnikov
@ 2017-07-29 12:35 ` Denis Plotnikov
  2017-07-31 14:08   ` Paolo Bonzini
  2017-07-29 12:35 ` [PATCH v3 2/6] KVM: x86: switch to masterclock update using timekeeper functionality Denis Plotnikov
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Denis Plotnikov @ 2017-07-29 12:35 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm; +Cc: rkagan, den, svt-core

The parameter is a pointer used to store the cycles stamp value that was
used for the time calculation when reading pvclock.

This is part of the work aiming to move to a simpler scheme of calculating
the masterclock-related values in KVM.
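
For example, a caller that needs both the time value and the TSC stamp it
was computed from could do the following (a sketch only; "src" is assumed
to point to a pvclock_vcpu_time_info, as in kvmclock.c):

	/* sketch: read pvclock time together with the TSC stamp it used */
	u64 tsc_stamp;
	u64 ns = pvclock_clocksource_read(src, &tsc_stamp);
	/*
	 * ns        - nanoseconds according to the pvclock page
	 * tsc_stamp - the rdtsc_ordered() value the calculation was based on
	 *
	 * Callers not interested in the stamp simply pass NULL, as in the
	 * kvmclock and xen hunks below.
	 */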

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 arch/x86/include/asm/pvclock.h |  2 +-
 arch/x86/kernel/kvmclock.c     |  2 +-
 arch/x86/kernel/pvclock.c      | 12 +++++++++---
 arch/x86/xen/time.c            |  2 +-
 4 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 448cfe1..147b411 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -14,7 +14,7 @@ static inline struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
 #endif
 
 /* some helper functions for xen and kvm pv clock sources */
-u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
+u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src, u64 *cycles);
 u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src);
 void pvclock_set_flags(u8 flags);
 unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src);
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index d889676..f5cfc5d 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -91,7 +91,7 @@ static u64 kvm_clock_read(void)
 	preempt_disable_notrace();
 	cpu = smp_processor_id();
 	src = &hv_clock[cpu].pvti;
-	ret = pvclock_clocksource_read(src);
+	ret = pvclock_clocksource_read(src, NULL);
 	preempt_enable_notrace();
 	return ret;
 }
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 5c3f6d6..a2c554a 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -73,19 +73,24 @@ u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src)
 	return flags & valid_flags;
 }
 
-u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
+u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src, u64 *cycles)
 {
 	unsigned version;
 	u64 ret;
 	u64 last;
 	u8 flags;
+	u64 tsc;
 
 	do {
 		version = pvclock_read_begin(src);
-		ret = __pvclock_read_cycles(src, rdtsc_ordered());
+		tsc = rdtsc_ordered();
+		ret = __pvclock_read_cycles(src, tsc);
 		flags = src->flags;
 	} while (pvclock_read_retry(src, version));
 
+	if (cycles)
+		*cycles = tsc;
+
 	if (unlikely((flags & PVCLOCK_GUEST_STOPPED) != 0)) {
 		src->flags &= ~PVCLOCK_GUEST_STOPPED;
 		pvclock_touch_watchdogs();
@@ -136,7 +141,8 @@ void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,
 		rmb();		/* fetch time before checking version */
 	} while ((wall_clock->version & 1) || (version != wall_clock->version));
 
-	delta = pvclock_clocksource_read(vcpu_time);	/* time since system boot */
+	/* time since system boot */
+	delta = pvclock_clocksource_read(vcpu_time, NULL);
 	delta += now.tv_sec * (u64)NSEC_PER_SEC + now.tv_nsec;
 
 	now.tv_nsec = do_div(delta, NSEC_PER_SEC);
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 1ecb05d..145ac9a 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -46,7 +46,7 @@ u64 xen_clocksource_read(void)
 
 	preempt_disable_notrace();
 	src = &__this_cpu_read(xen_vcpu)->time;
-	ret = pvclock_clocksource_read(src);
+	ret = pvclock_clocksource_read(src, NULL);
 	preempt_enable_notrace();
 	return ret;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 2/6] KVM: x86: switch to masterclock update using timekeeper functionality
  2017-07-29 12:35 [PATCH v3 0/6] make L2's kvm-clock stable, get rid of pvclock_gtod Denis Plotnikov
  2017-07-29 12:35 ` [PATCH v3 1/6] pvclock: add parameter to store cycles stamp to pvclock reading function Denis Plotnikov
@ 2017-07-29 12:35 ` Denis Plotnikov
  2017-07-31 14:20   ` Paolo Bonzini
  2017-07-29 12:35 ` [PATCH v3 3/6] timekeeper: add clocksource change notifier Denis Plotnikov
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Denis Plotnikov @ 2017-07-29 12:35 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm; +Cc: rkagan, den, svt-core

It is reasonable to switch KVM to a simpler, more efficient and
conceptually correct scheme of dealing with the data needed for the
KVM masterclock values calculation.

With the current scheme, KVM needs to keep an up-to-date copy of some
timekeeper data to provide a guest using kvmclock with the necessary
information.

This is not:
    - simple
        KVM has to carry a lot of code to maintain that copy; instead,
        KVM could use a timekeeper function to get all the data it needs
    - efficient
        the copy of the data used for the time calculation is updated
        every time it changes, although the updated data is only needed
        at certain moments (e.g. masterclock updates); instead, KVM can
        request this data directly from the timekeeper at the moments
        when it is really needed
    - conceptually correct
        doing the work (calculating the time data) that another part of
        the system (the timekeeper) has been designed for and is able to
        do is not the proper way; instead, delegate the work to the
        proper part

This patch switches KVM to using the improved timekeeper function for
the KVM masterclock time data.

Removing the leftovers of the old scheme is a matter for the next patches.
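
To summarize the interface this patch relies on (a sketch of the intended
contract, not a literal excerpt from the diff below; update_masterclock_data()
is only a placeholder):

	/* sketch: what a caller gets from the extended snapshot */
	struct system_time_snapshot snap;

	ktime_get_snapshot(&snap);
	/*
	 * snap.cycles    - cycles stamp reported by the clocksource itself
	 *                  (via the new read_with_cycles hook when present)
	 * snap.boot      - boot-based time, i.e. mono base + offs_boot
	 * snap.cs_stable - result of the new is_stable hook, false if absent
	 */
	if (snap.cs_stable)
		update_masterclock_data(snap.boot, snap.cycles);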

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 arch/x86/kernel/kvmclock.c  | 14 ++++++++++++--
 arch/x86/kernel/tsc.c       |  6 ++++++
 arch/x86/kvm/x86.c          | 26 ++++++++++++++++++--------
 include/linux/clocksource.h |  3 +++
 include/linux/timekeeping.h |  2 ++
 kernel/time/timekeeping.c   | 21 +++++++++++++++++++--
 6 files changed, 60 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index f5cfc5d..52156d9 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -82,7 +82,7 @@ static int kvm_set_wallclock(const struct timespec *now)
 	return -1;
 }
 
-static u64 kvm_clock_read(void)
+static inline u64 __kvm_clock_read(u64 *cycles)
 {
 	struct pvclock_vcpu_time_info *src;
 	u64 ret;
@@ -91,10 +91,14 @@ static u64 kvm_clock_read(void)
 	preempt_disable_notrace();
 	cpu = smp_processor_id();
 	src = &hv_clock[cpu].pvti;
-	ret = pvclock_clocksource_read(src, NULL);
+	ret = pvclock_clocksource_read(src, cycles);
 	preempt_enable_notrace();
 	return ret;
 }
+static u64 kvm_clock_read(void)
+{
+	return __kvm_clock_read(NULL);
+}
 
 static u64 kvm_clock_get_cycles(struct clocksource *cs)
 {
@@ -177,9 +181,15 @@ bool kvm_check_and_clear_guest_paused(void)
 	return ret;
 }
 
+static void kvm_clock_read_with_cycles(u64 *cycles, u64 *cycles_stamp)
+{
+	*cycles = __kvm_clock_read(cycles_stamp);
+}
+
 struct clocksource kvm_clock = {
 	.name = "kvm-clock",
 	.read = kvm_clock_get_cycles,
+	.read_with_cycles = kvm_clock_read_with_cycles,
 	.rating = 400,
 	.mask = CLOCKSOURCE_MASK(64),
 	.flags = CLOCK_SOURCE_IS_CONTINUOUS,
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 796d96b..5d655af 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1015,6 +1015,11 @@ static u64 read_tsc(struct clocksource *cs)
 	return (u64)rdtsc_ordered();
 }
 
+static bool is_tsc_stable(void)
+{
+	return !tsc_unstable;
+}
+
 static void tsc_cs_mark_unstable(struct clocksource *cs)
 {
 	if (tsc_unstable)
@@ -1043,6 +1048,7 @@ static struct clocksource clocksource_tsc = {
 	.name                   = "tsc",
 	.rating                 = 300,
 	.read                   = read_tsc,
+	.is_stable		= is_tsc_stable,
 	.mask                   = CLOCKSOURCE_MASK(64),
 	.flags                  = CLOCK_SOURCE_IS_CONTINUOUS |
 				  CLOCK_SOURCE_MUST_VERIFY,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6c97c82..496e731 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1643,22 +1643,32 @@ static int do_realtime(struct timespec *ts, u64 *cycle_now)
 /* returns true if host is using tsc clocksource */
 static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *cycle_now)
 {
-	/* checked again under seqlock below */
-	if (pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC)
-		return false;
+	struct system_time_snapshot systime_snapshot;
+
+	ktime_get_snapshot(&systime_snapshot);
+
+	if (systime_snapshot.cs_stable) {
+		*kernel_ns = ktime_to_ns(systime_snapshot.boot);
+		*cycle_now = systime_snapshot.cycles;
+	}
 
-	return do_monotonic_boot(kernel_ns, cycle_now) == VCLOCK_TSC;
+	return systime_snapshot.cs_stable;
 }
 
 /* returns true if host is using tsc clocksource */
 static bool kvm_get_walltime_and_clockread(struct timespec *ts,
 					   u64 *cycle_now)
 {
-	/* checked again under seqlock below */
-	if (pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC)
-		return false;
+	struct system_time_snapshot systime_snapshot;
+
+	ktime_get_snapshot(&systime_snapshot);
+
+	if (systime_snapshot.cs_stable) {
+		*ts = ktime_to_timespec(systime_snapshot.real);
+		*cycle_now = systime_snapshot.cycles;
+	}
 
-	return do_realtime(ts, cycle_now) == VCLOCK_TSC;
+	return systime_snapshot.cs_stable;
 }
 #endif
 
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index a78cb18..f849b91 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -49,6 +49,7 @@ struct module;
  *				The ideal clocksource. A must-use where
  *				available.
  * @read:		returns a cycle value, passes clocksource as argument
+ * @read_with_cycles:	optional, returns the time value and the cycles stamp used
  * @enable:		optional function to enable the clocksource
  * @disable:		optional function to disable the clocksource
  * @mask:		bitmask for two's complement
@@ -78,6 +79,8 @@ struct module;
  */
 struct clocksource {
 	u64 (*read)(struct clocksource *cs);
+	void (*read_with_cycles)(u64 *cycles, u64 *cycles_stamp);
+	bool (*is_stable)(void);
 	u64 mask;
 	u32 mult;
 	u32 shift;
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index ddc229f..21917fa 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -290,8 +290,10 @@ struct system_time_snapshot {
 	u64		cycles;
 	ktime_t		real;
 	ktime_t		raw;
+	ktime_t		boot;
 	unsigned int	clock_was_set_seq;
 	u8		cs_was_changed_seq;
+	bool		cs_stable;
 };
 
 /*
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index cedafa0..a2bfc12 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -953,27 +953,44 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
 	unsigned long seq;
 	ktime_t base_raw;
 	ktime_t base_real;
+	ktime_t base_boot;
 	u64 nsec_raw;
 	u64 nsec_real;
 	u64 now;
+	struct clocksource *clock;
 
 	WARN_ON_ONCE(timekeeping_suspended);
 
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
-		now = tk_clock_read(&tk->tkr_mono);
+		clock = tk->tkr_mono.clock;
+
+		if (clock->is_stable)
+			systime_snapshot->cs_stable = clock->is_stable();
+		else
+			systime_snapshot->cs_stable = false;
+
+		if (clock->read_with_cycles) {
+			clock->read_with_cycles(
+				&now, &systime_snapshot->cycles);
+		} else {
+			now = tk_clock_read(&tk->tkr_mono);
+			systime_snapshot->cycles = now;
+		}
 		systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
 		systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
 		base_real = ktime_add(tk->tkr_mono.base,
 				      tk_core.timekeeper.offs_real);
 		base_raw = tk->tkr_raw.base;
+		base_boot = ktime_add(tk->tkr_mono.base,
+				      tk_core.timekeeper.offs_boot);
 		nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
 		nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
-	systime_snapshot->cycles = now;
 	systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
 	systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
+	systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
 }
 EXPORT_SYMBOL_GPL(ktime_get_snapshot);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 3/6] timekeeper: add clocksource change notifier
  2017-07-29 12:35 [PATCH v3 0/6] make L2's kvm-clock stable, get rid of pvclock_gtod Denis Plotnikov
  2017-07-29 12:35 ` [PATCH v3 1/6] pvclock: add parameter to store cycles stamp to pvclock reading function Denis Plotnikov
  2017-07-29 12:35 ` [PATCH v3 2/6] KVM: x86: switch to masterclock update using timekeeper functionality Denis Plotnikov
@ 2017-07-29 12:35 ` Denis Plotnikov
  2017-07-29 12:35 ` [PATCH v3 4/6] KVM: x86: remove not used pvclock_gtod_copy Denis Plotnikov
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Denis Plotnikov @ 2017-07-29 12:35 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm; +Cc: rkagan, den, svt-core

This notifier will fire when the clocksource is changed or when any of
its critical properties change, e.g. the clocksource stability.

It will be used to update the KVM masterclock with the help of the
timekeeper, since that is exactly the moment when KVM has to be notified
about critical changes in the underlying timekeeper.

This is part of the work aiming to switch KVM to using the timekeeper
functionality instead of pvclock_gtod_copy.
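
A consumer registers for these notifications in the usual notifier-chain
way, roughly as follows (a sketch; my_cs_change/my_cs_nb are placeholder
names, KVM does the real registration in the next patch):

	/* sketch: subscribing to clocksource-change notifications */
	static int my_cs_change(struct notifier_block *nb,
				unsigned long unused, void *data)
	{
		/* react to a clocksource change or a stability flip */
		return 0;
	}

	static struct notifier_block my_cs_nb = {
		.notifier_call = my_cs_change,
	};

	/* e.g. from module init/exit: */
	clocksource_changes_register_notifier(&my_cs_nb);
	/* ... */
	clocksource_changes_unregister_notifier(&my_cs_nb);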

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 include/linux/cs_notifier.h | 17 +++++++++++++++
 kernel/time/timekeeping.c   | 51 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)
 create mode 100644 include/linux/cs_notifier.h

diff --git a/include/linux/cs_notifier.h b/include/linux/cs_notifier.h
new file mode 100644
index 0000000..2b1b4e6
--- /dev/null
+++ b/include/linux/cs_notifier.h
@@ -0,0 +1,17 @@
+#ifndef _CS_CHANGES_H
+#define _CS_CHANGES_H
+
+#include <linux/notifier.h>
+
+/*
+ * The clocksource changes notifier is called when the system
+ * clocksource is changed, or when some properties of the current
+ * system clocksource change in a way that can affect other parts
+ * of the system, for example KVM guests
+ */
+
+extern void clocksource_changes_notify(void);
+extern int clocksource_changes_register_notifier(struct notifier_block *nb);
+extern int clocksource_changes_unregister_notifier(struct notifier_block *nb);
+
+#endif /* _CS_CHANGES_H */
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index a2bfc12..ef608f0 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -596,6 +596,55 @@ int pvclock_gtod_unregister_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(pvclock_gtod_unregister_notifier);
 
+/* notification chain for changes in the clocksource */
+static RAW_NOTIFIER_HEAD(clocksource_changes_chain);
+
+/**
+ * clocksource_changes_notify - notify all the listeners about changes
+ * in the clocksource: a clocksource change, or a change of its sensitive
+ * parameters, e.g. the stability flag for kvmclock
+ */
+void clocksource_changes_notify(void)
+{
+	raw_notifier_call_chain(&clocksource_changes_chain, 0L, NULL);
+}
+EXPORT_SYMBOL_GPL(clocksource_changes_notify);
+
+/**
+ * clocksource_changes_register_notifier - register
+ * a clocksource changes listener
+ */
+int clocksource_changes_register_notifier(struct notifier_block *nb)
+{
+	unsigned long flags;
+	int ret;
+
+	raw_spin_lock_irqsave(&timekeeper_lock, flags);
+	ret = raw_notifier_chain_register(&clocksource_changes_chain, nb);
+	clocksource_changes_notify();
+	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(clocksource_changes_register_notifier);
+
+/**
+ * clocksource_changes_unregister_notifier - unregister
+ * a clocksource changes listener
+ */
+int clocksource_changes_unregister_notifier(struct notifier_block *nb)
+{
+	unsigned long flags;
+	int ret;
+
+	raw_spin_lock_irqsave(&timekeeper_lock, flags);
+	ret = raw_notifier_chain_unregister(&clocksource_changes_chain, nb);
+	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(clocksource_changes_unregister_notifier);
+
 /*
  * tk_update_leap_state - helper to update the next_leap_ktime
  */
@@ -1367,6 +1416,7 @@ static int change_clocksource(void *data)
 		}
 	}
 	timekeeping_update(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET);
+	clocksource_changes_notify();
 
 	write_seqcount_end(&tk_core.seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
@@ -1544,6 +1594,7 @@ void __init timekeeping_init(void)
 	tk_set_wall_to_mono(tk, tmp);
 
 	timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
+	clocksource_changes_notify();
 
 	write_seqcount_end(&tk_core.seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 4/6] KVM: x86: remove not used pvclock_gtod_copy
  2017-07-29 12:35 [PATCH v3 0/6] make L2's kvm-clock stable, get rid of pvclock_gtod Denis Plotnikov
                   ` (2 preceding siblings ...)
  2017-07-29 12:35 ` [PATCH v3 3/6] timekeeper: add clocksource change notifier Denis Plotnikov
@ 2017-07-29 12:35 ` Denis Plotnikov
  2017-07-29 12:35 ` [PATCH v3 5/6] pvclock: add clocksource change notification on changing of tsc stable bit Denis Plotnikov
  2017-07-29 12:35 ` [PATCH v3 6/6] kvmclock: add the clocksource stability querying function Denis Plotnikov
  5 siblings, 0 replies; 18+ messages in thread
From: Denis Plotnikov @ 2017-07-29 12:35 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm; +Cc: rkagan, den, svt-core

Since KVM has been switched by the previous patches to getting the
masterclock-related data directly from the timekeeper, we can now
remove all the parts related to the old scheme of getting the
masterclock data.

This patch removes those parts.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 arch/x86/include/asm/kvm_host.h |   2 +-
 arch/x86/kvm/trace.h            |  31 ++----
 arch/x86/kvm/x86.c              | 216 ++++++----------------------------------
 3 files changed, 42 insertions(+), 207 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 87ac4fb..91465db 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -791,7 +791,7 @@ struct kvm_arch {
 	u64 cur_tsc_generation;
 	int nr_vcpus_matched_tsc;
 
-	spinlock_t pvclock_gtod_sync_lock;
+	spinlock_t masterclock_lock;
 	bool use_master_clock;
 	u64 master_kernel_ns;
 	u64 master_cycle_now;
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 0a6cc67..923ab31 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -807,45 +807,39 @@ TRACE_EVENT(kvm_write_tsc_offset,
 
 #ifdef CONFIG_X86_64
 
-#define host_clocks					\
-	{VCLOCK_NONE, "none"},				\
-	{VCLOCK_TSC,  "tsc"}				\
-
 TRACE_EVENT(kvm_update_master_clock,
-	TP_PROTO(bool use_master_clock, unsigned int host_clock, bool offset_matched),
-	TP_ARGS(use_master_clock, host_clock, offset_matched),
+	TP_PROTO(bool use_master_clock, bool host_clock_stable,
+		bool offset_matched),
+	TP_ARGS(use_master_clock, host_clock_stable, offset_matched),
 
 	TP_STRUCT__entry(
 		__field(		bool,	use_master_clock	)
-		__field(	unsigned int,	host_clock		)
+		__field(		bool,	host_clock_stable	)
 		__field(		bool,	offset_matched		)
 	),
 
 	TP_fast_assign(
 		__entry->use_master_clock	= use_master_clock;
-		__entry->host_clock		= host_clock;
+		__entry->host_clock_stable	= host_clock_stable;
 		__entry->offset_matched		= offset_matched;
 	),
 
-	TP_printk("masterclock %d hostclock %s offsetmatched %u",
+	TP_printk("masterclock %d hostclock stable %u offsetmatched %u",
 		  __entry->use_master_clock,
-		  __print_symbolic(__entry->host_clock, host_clocks),
+		  __entry->host_clock_stable,
 		  __entry->offset_matched)
 );
 
 TRACE_EVENT(kvm_track_tsc,
 	TP_PROTO(unsigned int vcpu_id, unsigned int nr_matched,
-		 unsigned int online_vcpus, bool use_master_clock,
-		 unsigned int host_clock),
-	TP_ARGS(vcpu_id, nr_matched, online_vcpus, use_master_clock,
-		host_clock),
+		 unsigned int online_vcpus, bool use_master_clock),
+	TP_ARGS(vcpu_id, nr_matched, online_vcpus, use_master_clock),
 
 	TP_STRUCT__entry(
 		__field(	unsigned int,	vcpu_id			)
 		__field(	unsigned int,	nr_vcpus_matched_tsc	)
 		__field(	unsigned int,	online_vcpus		)
 		__field(	bool,		use_master_clock	)
-		__field(	unsigned int,	host_clock		)
 	),
 
 	TP_fast_assign(
@@ -853,14 +847,11 @@ TRACE_EVENT(kvm_track_tsc,
 		__entry->nr_vcpus_matched_tsc	= nr_matched;
 		__entry->online_vcpus		= online_vcpus;
 		__entry->use_master_clock	= use_master_clock;
-		__entry->host_clock		= host_clock;
 	),
 
-	TP_printk("vcpu_id %u masterclock %u offsetmatched %u nr_online %u"
-		  " hostclock %s",
+	TP_printk("vcpu_id %u masterclock %u offsetmatched %u nr_online %u",
 		  __entry->vcpu_id, __entry->use_master_clock,
-		  __entry->nr_vcpus_matched_tsc, __entry->online_vcpus,
-		  __print_symbolic(__entry->host_clock, host_clocks))
+		  __entry->nr_vcpus_matched_tsc, __entry->online_vcpus)
 );
 
 #endif /* CONFIG_X86_64 */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 496e731..7d232ad 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -50,7 +50,7 @@
 #include <linux/hash.h>
 #include <linux/pci.h>
 #include <linux/timekeeper_internal.h>
-#include <linux/pvclock_gtod.h>
+#include <linux/cs_notifier.h>
 #include <linux/kvm_irqfd.h>
 #include <linux/irqbypass.h>
 #include <linux/sched/stat.h>
@@ -1134,50 +1134,6 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
 	return kvm_set_msr(vcpu, &msr);
 }
 
-#ifdef CONFIG_X86_64
-struct pvclock_gtod_data {
-	seqcount_t	seq;
-
-	struct { /* extract of a clocksource struct */
-		int vclock_mode;
-		u64	cycle_last;
-		u64	mask;
-		u32	mult;
-		u32	shift;
-	} clock;
-
-	u64		boot_ns;
-	u64		nsec_base;
-	u64		wall_time_sec;
-};
-
-static struct pvclock_gtod_data pvclock_gtod_data;
-
-static void update_pvclock_gtod(struct timekeeper *tk)
-{
-	struct pvclock_gtod_data *vdata = &pvclock_gtod_data;
-	u64 boot_ns;
-
-	boot_ns = ktime_to_ns(ktime_add(tk->tkr_mono.base, tk->offs_boot));
-
-	write_seqcount_begin(&vdata->seq);
-
-	/* copy pvclock gtod data */
-	vdata->clock.vclock_mode	= tk->tkr_mono.clock->archdata.vclock_mode;
-	vdata->clock.cycle_last		= tk->tkr_mono.cycle_last;
-	vdata->clock.mask		= tk->tkr_mono.mask;
-	vdata->clock.mult		= tk->tkr_mono.mult;
-	vdata->clock.shift		= tk->tkr_mono.shift;
-
-	vdata->boot_ns			= boot_ns;
-	vdata->nsec_base		= tk->tkr_mono.xtime_nsec;
-
-	vdata->wall_time_sec            = tk->xtime_sec;
-
-	write_seqcount_end(&vdata->seq);
-}
-#endif
-
 void kvm_set_pending_timer(struct kvm_vcpu *vcpu)
 {
 	/*
@@ -1269,10 +1225,6 @@ static void kvm_get_time_scale(uint64_t scaled_hz, uint64_t base_hz,
 		 __func__, base_hz, scaled_hz, shift, *pmultiplier);
 }
 
-#ifdef CONFIG_X86_64
-static atomic_t kvm_guest_has_master_clock = ATOMIC_INIT(0);
-#endif
-
 static DEFINE_PER_CPU(unsigned long, cpu_tsc_khz);
 static unsigned long max_tsc_khz;
 
@@ -1366,7 +1318,6 @@ static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu)
 #ifdef CONFIG_X86_64
 	bool vcpus_matched;
 	struct kvm_arch *ka = &vcpu->kvm->arch;
-	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
 
 	vcpus_matched = (ka->nr_vcpus_matched_tsc + 1 ==
 			 atomic_read(&vcpu->kvm->online_vcpus));
@@ -1379,13 +1330,12 @@ static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu)
 	 * and the vcpus need to have matched TSCs.  When that happens,
 	 * perform request to enable masterclock.
 	 */
-	if (ka->use_master_clock ||
-	    (gtod->clock.vclock_mode == VCLOCK_TSC && vcpus_matched))
+	if (ka->use_master_clock || vcpus_matched)
 		kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
 
 	trace_kvm_track_tsc(vcpu->vcpu_id, ka->nr_vcpus_matched_tsc,
-			    atomic_read(&vcpu->kvm->online_vcpus),
-		            ka->use_master_clock, gtod->clock.vclock_mode);
+				atomic_read(&vcpu->kvm->online_vcpus),
+				ka->use_master_clock);
 #endif
 }
 
@@ -1538,7 +1488,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	kvm_vcpu_write_tsc_offset(vcpu, offset);
 	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
 
-	spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
+	spin_lock(&kvm->arch.masterclock_lock);
 	if (!matched) {
 		kvm->arch.nr_vcpus_matched_tsc = 0;
 	} else if (!already_matched) {
@@ -1546,9 +1496,8 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	}
 
 	kvm_track_tsc_matching(vcpu);
-	spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);
+	spin_unlock(&kvm->arch.masterclock_lock);
 }
-
 EXPORT_SYMBOL_GPL(kvm_write_tsc);
 
 static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
@@ -1567,79 +1516,6 @@ static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
 
 #ifdef CONFIG_X86_64
 
-static u64 read_tsc(void)
-{
-	u64 ret = (u64)rdtsc_ordered();
-	u64 last = pvclock_gtod_data.clock.cycle_last;
-
-	if (likely(ret >= last))
-		return ret;
-
-	/*
-	 * GCC likes to generate cmov here, but this branch is extremely
-	 * predictable (it's just a function of time and the likely is
-	 * very likely) and there's a data dependence, so force GCC
-	 * to generate a branch instead.  I don't barrier() because
-	 * we don't actually need a barrier, and if this function
-	 * ever gets inlined it will generate worse code.
-	 */
-	asm volatile ("");
-	return last;
-}
-
-static inline u64 vgettsc(u64 *cycle_now)
-{
-	long v;
-	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
-
-	*cycle_now = read_tsc();
-
-	v = (*cycle_now - gtod->clock.cycle_last) & gtod->clock.mask;
-	return v * gtod->clock.mult;
-}
-
-static int do_monotonic_boot(s64 *t, u64 *cycle_now)
-{
-	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
-	unsigned long seq;
-	int mode;
-	u64 ns;
-
-	do {
-		seq = read_seqcount_begin(&gtod->seq);
-		mode = gtod->clock.vclock_mode;
-		ns = gtod->nsec_base;
-		ns += vgettsc(cycle_now);
-		ns >>= gtod->clock.shift;
-		ns += gtod->boot_ns;
-	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
-	*t = ns;
-
-	return mode;
-}
-
-static int do_realtime(struct timespec *ts, u64 *cycle_now)
-{
-	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
-	unsigned long seq;
-	int mode;
-	u64 ns;
-
-	do {
-		seq = read_seqcount_begin(&gtod->seq);
-		mode = gtod->clock.vclock_mode;
-		ts->tv_sec = gtod->wall_time_sec;
-		ns = gtod->nsec_base;
-		ns += vgettsc(cycle_now);
-		ns >>= gtod->clock.shift;
-	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
-
-	ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
-	ts->tv_nsec = ns;
-
-	return mode;
-}
-
 /* returns true if host is using tsc clocksource */
 static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *cycle_now)
 {
@@ -1713,34 +1589,28 @@ static bool kvm_get_walltime_and_clockread(struct timespec *ts,
  *
  */
 
-static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
+static void update_masterclock(struct kvm *kvm)
 {
 #ifdef CONFIG_X86_64
 	struct kvm_arch *ka = &kvm->arch;
-	int vclock_mode;
-	bool host_tsc_clocksource, vcpus_matched;
+	bool host_clocksource_stable, vcpus_matched;
 
 	vcpus_matched = (ka->nr_vcpus_matched_tsc + 1 ==
 			atomic_read(&kvm->online_vcpus));
 
 	/*
-	 * If the host uses TSC clock, then passthrough TSC as stable
-	 * to the guest.
+	 * kvm_get_time_and_clockread returns true if clocksource is stable
 	 */
-	host_tsc_clocksource = kvm_get_time_and_clockread(
+	host_clocksource_stable = kvm_get_time_and_clockread(
 					&ka->master_kernel_ns,
 					&ka->master_cycle_now);
 
-	ka->use_master_clock = host_tsc_clocksource && vcpus_matched
+	ka->use_master_clock = host_clocksource_stable && vcpus_matched
 				&& !ka->backwards_tsc_observed
 				&& !ka->boot_vcpu_runs_old_kvmclock;
 
-	if (ka->use_master_clock)
-		atomic_set(&kvm_guest_has_master_clock, 1);
-
-	vclock_mode = pvclock_gtod_data.clock.vclock_mode;
-	trace_kvm_update_master_clock(ka->use_master_clock, vclock_mode,
-					vcpus_matched);
+	trace_kvm_update_master_clock(ka->use_master_clock,
+				host_clocksource_stable, vcpus_matched);
 #endif
 }
 
@@ -1756,10 +1626,10 @@ static void kvm_gen_update_masterclock(struct kvm *kvm)
 	struct kvm_vcpu *vcpu;
 	struct kvm_arch *ka = &kvm->arch;
 
-	spin_lock(&ka->pvclock_gtod_sync_lock);
+	spin_lock(&ka->masterclock_lock);
 	kvm_make_mclock_inprogress_request(kvm);
 	/* no guest entries from this point */
-	pvclock_update_vm_gtod_copy(kvm);
+	update_masterclock(kvm);
 
 	kvm_for_each_vcpu(i, vcpu, kvm)
 		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
@@ -1768,7 +1638,7 @@ static void kvm_gen_update_masterclock(struct kvm *kvm)
 	kvm_for_each_vcpu(i, vcpu, kvm)
 		kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu);
 
-	spin_unlock(&ka->pvclock_gtod_sync_lock);
+	spin_unlock(&ka->masterclock_lock);
 #endif
 }
 
@@ -1778,15 +1648,15 @@ u64 get_kvmclock_ns(struct kvm *kvm)
 	struct pvclock_vcpu_time_info hv_clock;
 	u64 ret;
 
-	spin_lock(&ka->pvclock_gtod_sync_lock);
+	spin_lock(&ka->masterclock_lock);
 	if (!ka->use_master_clock) {
-		spin_unlock(&ka->pvclock_gtod_sync_lock);
+		spin_unlock(&ka->masterclock_lock);
 		return ktime_get_boot_ns() + ka->kvmclock_offset;
 	}
 
 	hv_clock.tsc_timestamp = ka->master_cycle_now;
 	hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
-	spin_unlock(&ka->pvclock_gtod_sync_lock);
+	spin_unlock(&ka->masterclock_lock);
 
 	/* both __this_cpu_read() and rdtsc() should be on the same cpu */
 	get_cpu();
@@ -1872,13 +1742,13 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 	 * If the host uses TSC clock, then passthrough TSC as stable
 	 * to the guest.
 	 */
-	spin_lock(&ka->pvclock_gtod_sync_lock);
+	spin_lock(&ka->masterclock_lock);
 	use_master_clock = ka->use_master_clock;
 	if (use_master_clock) {
 		host_tsc = ka->master_cycle_now;
 		kernel_ns = ka->master_kernel_ns;
 	}
-	spin_unlock(&ka->pvclock_gtod_sync_lock);
+	spin_unlock(&ka->masterclock_lock);
 
 	/* Keep irq disabled to prevent changes to the clock */
 	local_irq_save(flags);
@@ -4208,11 +4078,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
 			goto out;
 
 		r = 0;
-		/*
-		 * TODO: userspace has to take care of races with VCPU_RUN, so
-		 * kvm_gen_update_masterclock() can be cut down to locked
-		 * pvclock_update_vm_gtod_copy().
-		 */
+
 		kvm_gen_update_masterclock(kvm);
 		now_ns = get_kvmclock_ns(kvm);
 		kvm->arch.kvmclock_offset += user_ns.clock - now_ns;
@@ -6041,7 +5907,8 @@ static void kvm_set_mmio_spte_mask(void)
 }
 
 #ifdef CONFIG_X86_64
-static void pvclock_gtod_update_fn(struct work_struct *work)
+static int process_clocksource_change(struct notifier_block *nb,
+					unsigned long unused0, void *unused1)
 {
 	struct kvm *kvm;
 
@@ -6052,35 +5919,12 @@ static void pvclock_gtod_update_fn(struct work_struct *work)
 	list_for_each_entry(kvm, &vm_list, vm_list)
 		kvm_for_each_vcpu(i, vcpu, kvm)
 			kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
-	atomic_set(&kvm_guest_has_master_clock, 0);
 	spin_unlock(&kvm_lock);
-}
-
-static DECLARE_WORK(pvclock_gtod_work, pvclock_gtod_update_fn);
-
-/*
- * Notification about pvclock gtod data update.
- */
-static int pvclock_gtod_notify(struct notifier_block *nb, unsigned long unused,
-			       void *priv)
-{
-	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
-	struct timekeeper *tk = priv;
-
-	update_pvclock_gtod(tk);
-
-	/* disable master clock if host does not trust, or does not
-	 * use, TSC clocksource
-	 */
-	if (gtod->clock.vclock_mode != VCLOCK_TSC &&
-	    atomic_read(&kvm_guest_has_master_clock) != 0)
-		queue_work(system_long_wq, &pvclock_gtod_work);
-
 	return 0;
 }
 
-static struct notifier_block pvclock_gtod_notifier = {
-	.notifier_call = pvclock_gtod_notify,
+static struct notifier_block clocksource_change_notifier = {
+	.notifier_call = process_clocksource_change,
 };
 #endif
 
@@ -6133,7 +5977,7 @@ int kvm_arch_init(void *opaque)
 
 	kvm_lapic_init();
 #ifdef CONFIG_X86_64
-	pvclock_gtod_register_notifier(&pvclock_gtod_notifier);
+	clocksource_changes_register_notifier(&clocksource_change_notifier);
 #endif
 
 	return 0;
@@ -6154,7 +5998,7 @@ void kvm_arch_exit(void)
 					    CPUFREQ_TRANSITION_NOTIFIER);
 	cpuhp_remove_state_nocalls(CPUHP_AP_X86_KVM_CLK_ONLINE);
 #ifdef CONFIG_X86_64
-	pvclock_gtod_unregister_notifier(&pvclock_gtod_notifier);
+	clocksource_changes_unregister_notifier(&clocksource_change_notifier);
 #endif
 	kvm_x86_ops = NULL;
 	kvm_mmu_module_exit();
@@ -8056,10 +7900,10 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	raw_spin_lock_init(&kvm->arch.tsc_write_lock);
 	mutex_init(&kvm->arch.apic_map_lock);
 	mutex_init(&kvm->arch.hyperv.hv_lock);
-	spin_lock_init(&kvm->arch.pvclock_gtod_sync_lock);
+	spin_lock_init(&kvm->arch.masterclock_lock);
 
 	kvm->arch.kvmclock_offset = -ktime_get_boot_ns();
-	pvclock_update_vm_gtod_copy(kvm);
+	update_masterclock(kvm);
 
 	INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn);
 	INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 5/6] pvclock: add clocksource change notification on changing of tsc stable bit
  2017-07-29 12:35 [PATCH v3 0/6] make L2's kvm-clock stable, get rid of pvclock_gtod Denis Plotnikov
                   ` (3 preceding siblings ...)
  2017-07-29 12:35 ` [PATCH v3 4/6] KVM: x86: remove not used pvclock_gtod_copy Denis Plotnikov
@ 2017-07-29 12:35 ` Denis Plotnikov
  2017-07-31 14:21   ` Paolo Bonzini
  2017-07-29 12:35 ` [PATCH v3 6/6] kvmclock: add the clocksource stability querying function Denis Plotnikov
  5 siblings, 1 reply; 18+ messages in thread
From: Denis Plotnikov @ 2017-07-29 12:35 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm; +Cc: rkagan, den, svt-core

Currently, L2 guests can't have a stable kvmclock while running over an L1
with a stable kvmclock. This is simply due to missing functionality.

The patch is part of the work making kvmclock stable in L2 when it runs
over a stable kvmclock.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 arch/x86/kernel/pvclock.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index a2c554a..5b0bb2e 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -22,6 +22,7 @@
 #include <linux/gfp.h>
 #include <linux/bootmem.h>
 #include <linux/nmi.h>
+#include <linux/cs_notifier.h>
 
 #include <asm/fixmap.h>
 #include <asm/pvclock.h>
@@ -73,6 +74,8 @@ u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src)
 	return flags & valid_flags;
 }
 
+static atomic64_t clocksource_stable = ATOMIC64_INIT(0);
+
 u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src, u64 *cycles)
 {
 	unsigned version;
@@ -96,10 +99,20 @@ u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src, u64 *cycles)
 		pvclock_touch_watchdogs();
 	}
 
-	if ((valid_flags & PVCLOCK_TSC_STABLE_BIT) &&
-		(flags & PVCLOCK_TSC_STABLE_BIT))
-		return ret;
+	if (likely(valid_flags & PVCLOCK_TSC_STABLE_BIT)) {
+		bool stable_now = !!(flags & PVCLOCK_TSC_STABLE_BIT);
+		bool stable_last = (bool) atomic64_read(&clocksource_stable);
+
+		if (unlikely(stable_now != stable_last)) {
+			/* send notification once */
+			if (stable_last == atomic64_cmpxchg(
+				&clocksource_stable, stable_last, stable_now))
+				clocksource_changes_notify();
+		}
 
+		if (stable_now)
+			return ret;
+	}
 	/*
 	 * Assumption here is that last_value, a global accumulator, always goes
 	 * forward. If we are less than that, we should not be much smaller.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 6/6] kvmclock: add the clocksource stability querying function
  2017-07-29 12:35 [PATCH v3 0/6] make L2's kvm-clock stable, get rid of pvclock_gtod Denis Plotnikov
                   ` (4 preceding siblings ...)
  2017-07-29 12:35 ` [PATCH v3 5/6] pvclock: add clocksource change notification on changing of tsc stable bit Denis Plotnikov
@ 2017-07-29 12:35 ` Denis Plotnikov
  5 siblings, 0 replies; 18+ messages in thread
From: Denis Plotnikov @ 2017-07-29 12:35 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm; +Cc: rkagan, den, svt-core

The patch is the final part of the work making kvmclock stable in L2
when it runs over a stable clocksource.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 arch/x86/kernel/kvmclock.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 52156d9..6a4d995 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -186,10 +186,22 @@ static void kvm_clock_read_with_cycles(u64 *cycles, u64 *cycles_stamp)
 	*cycles = __kvm_clock_read(cycles_stamp);
 }
 
+static bool kvm_clock_stable(void)
+{
+	int cpu = get_cpu();
+	struct pvclock_vcpu_time_info *vcpu_time = &hv_clock[cpu].pvti;
+	u8 flags = pvclock_read_flags(vcpu_time);
+
+	put_cpu();
+
+	return flags & PVCLOCK_TSC_STABLE_BIT;
+}
+
 struct clocksource kvm_clock = {
 	.name = "kvm-clock",
 	.read = kvm_clock_get_cycles,
 	.read_with_cycles = kvm_clock_read_with_cycles,
+	.is_stable = kvm_clock_stable,
 	.rating = 400,
 	.mask = CLOCKSOURCE_MASK(64),
 	.flags = CLOCK_SOURCE_IS_CONTINUOUS,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/6] pvclock: add parameter to store cycles stamp to pvclock reading function
  2017-07-29 12:35 ` [PATCH v3 1/6] pvclock: add parameter to store cycles stamp to pvclock reading function Denis Plotnikov
@ 2017-07-31 14:08   ` Paolo Bonzini
  0 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2017-07-31 14:08 UTC (permalink / raw)
  To: Denis Plotnikov, rkrcmar, kvm; +Cc: rkagan, den, svt-core

On 29/07/2017 14:35, Denis Plotnikov wrote:
> The parameter is a pointer used to store the cycles stamp value that was
> used for the time calculation when reading pvclock.
> 
> This is part of the work aiming to move to a simpler scheme of calculating
> the masterclock-related values in KVM.
> 
> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
> ---
>  arch/x86/include/asm/pvclock.h |  2 +-
>  arch/x86/kernel/kvmclock.c     |  2 +-
>  arch/x86/kernel/pvclock.c      | 12 +++++++++---
>  arch/x86/xen/time.c            |  2 +-
>  4 files changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
> index 448cfe1..147b411 100644
> --- a/arch/x86/include/asm/pvclock.h
> +++ b/arch/x86/include/asm/pvclock.h
> @@ -14,7 +14,7 @@ static inline struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
>  #endif
>  
>  /* some helper functions for xen and kvm pv clock sources */
> -u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
> +u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src, u64 *cycles);
>  u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src);
>  void pvclock_set_flags(u8 flags);
>  unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src);
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index d889676..f5cfc5d 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -91,7 +91,7 @@ static u64 kvm_clock_read(void)
>  	preempt_disable_notrace();
>  	cpu = smp_processor_id();
>  	src = &hv_clock[cpu].pvti;
> -	ret = pvclock_clocksource_read(src);
> +	ret = pvclock_clocksource_read(src, NULL);
>  	preempt_enable_notrace();
>  	return ret;
>  }
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 5c3f6d6..a2c554a 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -73,19 +73,24 @@ u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src)
>  	return flags & valid_flags;
>  }
>  
> -u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
> +u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src, u64 *cycles)
>  {
>  	unsigned version;
>  	u64 ret;
>  	u64 last;
>  	u8 flags;
> +	u64 tsc;
>  
>  	do {
>  		version = pvclock_read_begin(src);
> -		ret = __pvclock_read_cycles(src, rdtsc_ordered());
> +		tsc = rdtsc_ordered();
> +		ret = __pvclock_read_cycles(src, tsc);
>  		flags = src->flags;
>  	} while (pvclock_read_retry(src, version));
>  
> +	if (cycles)
> +		*cycles = tsc;
> +
>  	if (unlikely((flags & PVCLOCK_GUEST_STOPPED) != 0)) {
>  		src->flags &= ~PVCLOCK_GUEST_STOPPED;
>  		pvclock_touch_watchdogs();
> @@ -136,7 +141,8 @@ void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,
>  		rmb();		/* fetch time before checking version */
>  	} while ((wall_clock->version & 1) || (version != wall_clock->version));
>  
> -	delta = pvclock_clocksource_read(vcpu_time);	/* time since system boot */
> +	/* time since system boot */
> +	delta = pvclock_clocksource_read(vcpu_time, NULL);
>  	delta += now.tv_sec * (u64)NSEC_PER_SEC + now.tv_nsec;
>  
>  	now.tv_nsec = do_div(delta, NSEC_PER_SEC);
> diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
> index 1ecb05d..145ac9a 100644
> --- a/arch/x86/xen/time.c
> +++ b/arch/x86/xen/time.c
> @@ -46,7 +46,7 @@ u64 xen_clocksource_read(void)
>  
>  	preempt_disable_notrace();
>  	src = &__this_cpu_read(xen_vcpu)->time;
> -	ret = pvclock_clocksource_read(src);
> +	ret = pvclock_clocksource_read(src, NULL);
>  	preempt_enable_notrace();
>  	return ret;
>  }
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/6] KVM: x86: switch to masterclock update using timekeeper functionality
  2017-07-29 12:35 ` [PATCH v3 2/6] KVM: x86: switch to masterclock update using timekeeper functionality Denis Plotnikov
@ 2017-07-31 14:20   ` Paolo Bonzini
  2017-08-01  9:30     ` Denis Plotnikov
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2017-07-31 14:20 UTC (permalink / raw)
  To: Denis Plotnikov, rkrcmar, kvm; +Cc: rkagan, den, svt-core

On 29/07/2017 14:35, Denis Plotnikov wrote:
>  arch/x86/kernel/kvmclock.c  | 14 ++++++++++++--
>  arch/x86/kernel/tsc.c       |  6 ++++++
>  arch/x86/kvm/x86.c          | 26 ++++++++++++++++++--------
>  include/linux/clocksource.h |  3 +++
>  include/linux/timekeeping.h |  2 ++
>  kernel/time/timekeeping.c   | 21 +++++++++++++++++++--
>  6 files changed, 60 insertions(+), 12 deletions(-)

This is pretty clean, thanks.  Only it should be split into several patches:

- introducing read_with_cycles and using it in ktime_get_snapshot.  
Looking at the code, the meaning of "cycles" is overloaded, so perhaps 
rename it to read_clock_and_systime or something similar?

- introducing boot time in ktime_get_snapshot

- implementing kvm_clock_read_with_cycles (can be merged with patch 6)

- adding the cs_stable field to struct system_time_snapshot (see below,
maybe this can be merged with read_with_cycles)

- using ktime_get_snapshot in KVM (can be merged with patch 4?)

so that the timekeeping maintainer can comment on each new feature you 
add to their code.

cs_stable is the part that I'm still a bit wary of; here are the doubts
I have:

- if you want stability, you can use the CLOCK_SOURCE_UNSTABLE flag; a 
new callback shouldn't be needed (it's certainly not needed for TSC).

- the meaning of "stable" for kvmclock is not exactly the same as 
clocksource_mark_unstable.

Maybe what we want is some kind of "bool cycles_valid", and then 
read_clock_and_systime can return it:


		if (clock->read_clock_and_systime) {
			systime_snapshot->cycles_valid = clock->read_clock_and_systime(
				&now, &systime_snapshot->cycles);
		} else {
			now = tk_clock_read(&tk->tkr_mono);
			systime_snapshot->cycles_valid = true;
			systime_snapshot->cycles = now;
		}

?  (This is honestly just a suggestion, which may be wrong depending
on the answer to the two questions above).
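
In terms of the clocksource hook, that suggestion would amount to something
like the following prototype (just a sketch of what it implies, not an
agreed interface):

		/* returns true when *cycles maps 1:1 to the returned time */
		bool (*read_clock_and_systime)(u64 *ns, u64 *cycles);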

Paolo

>  		systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
>  		systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
>  		base_real = ktime_add(tk->tkr_mono.base,
>  				      tk_core.timekeeper.offs_real);
>  		base_raw = tk->tkr_raw.base;
> +		base_boot = ktime_add(tk->tkr_mono.base,
> +				      tk_core.timekeeper.offs_boot);
>  		nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
>  		nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
>  	} while (read_seqcount_retry(&tk_core.seq, seq));
>  
> -	systime_snapshot->cycles = now;
>  	systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
>  	systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
> +	systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
>  }
>  EXPORT_SYMBOL_GPL(ktime_get_snapshot);
>  
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 5/6] pvclock: add clocksource change notification on changing of tsc stable bit
  2017-07-29 12:35 ` [PATCH v3 5/6] pvclock: add clocksource change notification on changing of tsc stable bit Denis Plotnikov
@ 2017-07-31 14:21   ` Paolo Bonzini
  0 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2017-07-31 14:21 UTC (permalink / raw)
  To: Denis Plotnikov, rkrcmar, kvm; +Cc: rkagan, den, svt-core

On 29/07/2017 14:35, Denis Plotnikov wrote:
> Currently, L2 guests can't have a stable kvmclock while running over an L1
> with a stable kvmclock. This is simply due to missing functionality.
> 
> The patch is part of the work making kvmclock stable in L2 when it runs
> over a stable kvmclock.
> 
> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
> ---
>  arch/x86/kernel/pvclock.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index a2c554a..5b0bb2e 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -22,6 +22,7 @@
>  #include <linux/gfp.h>
>  #include <linux/bootmem.h>
>  #include <linux/nmi.h>
> +#include <linux/cs_notifier.h>
>  
>  #include <asm/fixmap.h>
>  #include <asm/pvclock.h>
> @@ -73,6 +74,8 @@ u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src)
>  	return flags & valid_flags;
>  }
>  
> +static atomic64_t clocksource_stable = ATOMIC64_INIT(0);

No need for atomic64.  Otherwise looks good.
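
For illustration, the notify-once logic with a plain atomic_t could look
roughly like this (a sketch only, not a tested replacement):

	/* a plain atomic_t is enough to track the last observed state */
	static atomic_t clocksource_stable = ATOMIC_INIT(0);

	/* inside pvclock_clocksource_read(): */
	int stable_now  = !!(flags & PVCLOCK_TSC_STABLE_BIT);
	int stable_last = atomic_read(&clocksource_stable);

	if (unlikely(stable_now != stable_last) &&
	    atomic_cmpxchg(&clocksource_stable,
			   stable_last, stable_now) == stable_last)
		clocksource_changes_notify();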

Paolo

>  u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src, u64 *cycles)
>  {
>  	unsigned version;
> @@ -96,10 +99,20 @@ u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src, u64 *cycles)
>  		pvclock_touch_watchdogs();
>  	}
>  
> -	if ((valid_flags & PVCLOCK_TSC_STABLE_BIT) &&
> -		(flags & PVCLOCK_TSC_STABLE_BIT))
> -		return ret;
> +	if (likely(valid_flags & PVCLOCK_TSC_STABLE_BIT)) {
> +		bool stable_now = !!(flags & PVCLOCK_TSC_STABLE_BIT);
> +		bool stable_last = (bool) atomic64_read(&clocksource_stable);
> +
> +		if (unlikely(stable_now != stable_last)) {
> +			/* send notification once */
> +			if (stable_last == atomic64_cmpxchg(
> +				&clocksource_stable, stable_last, stable_now))
> +				clocksource_changes_notify();
> +		}
>  
> +		if (stable_now)
> +			return ret;
> +	}
>  	/*
>  	 * Assumption here is that last_value, a global accumulator, always goes
>  	 * forward. If we are less than that, we should not be much smaller.
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/6] KVM: x86: switch to masterclock update using timekeeper functionality
  2017-07-31 14:20   ` Paolo Bonzini
@ 2017-08-01  9:30     ` Denis Plotnikov
  2017-08-01 10:03       ` Paolo Bonzini
  0 siblings, 1 reply; 18+ messages in thread
From: Denis Plotnikov @ 2017-08-01  9:30 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, kvm; +Cc: rkagan, den, svt-core



On 31.07.2017 17:20, Paolo Bonzini wrote:
> On 29/07/2017 14:35, Denis Plotnikov wrote:
>>   arch/x86/kernel/kvmclock.c  | 14 ++++++++++++--
>>   arch/x86/kernel/tsc.c       |  6 ++++++
>>   arch/x86/kvm/x86.c          | 26 ++++++++++++++++++--------
>>   include/linux/clocksource.h |  3 +++
>>   include/linux/timekeeping.h |  2 ++
>>   kernel/time/timekeeping.c   | 21 +++++++++++++++++++--
>>   6 files changed, 60 insertions(+), 12 deletions(-)
> 
> This is pretty clean, thanks.  Only it should be split in several patches:
> 
> - introducing read_with_cycles and using it in ktime_get_snapshot.
> Looking at the code, the meaning of "cycles" is overloaded, so perhaps
> rename it to read_clock_and_systime or something similar?
>
agree
> - introducing boot time in ktime_get_snapshot
agree
> 
> - implementing kvm_clock_read_with_cycles (can be merged with patch 6)
Having a stable clocksource for kvmclock means making kvmclock stable.
The patch enables this functionality; that's why I'd prefer to keep patch
6 separate.
> 
> - adding the cs_stable field to struct system_time_snapshot (see below,
> maybe this can be merged with read_with_cycles)
agree
> 
> - using ktime_get_snapshot in KVM (can be merged with patch 4?)
agree, but want to keep 4 separate. Just to make the changes done 
logically consecutive in git tree.
> 
> so that the timekeeping maintainer can comment on each new feature you
> add to their code.
> 
> cs_stable is the part that I'm still a bit wary of; here are the doubts
> I have:
> 
> - if you want stability, you can use the CLOCK_SOURCE_UNSTABLE flag; a
> new callback shouldn't be needed (it's certainly not needed for TSC).
> 
> - the meaning of "stable" for kvmclock is not exactly the same as
> clocksource_mark_unstable.
> 
> Maybe what we want is some kind of "bool cycles_valid", and then
> read_clock_and_systime can return it:
> 
> 
> 		if (clock->read_clock_and_systime) {
> 			systime_snapshot->cycles_valid = clock->read_clock_and_systime(
> 				&now, &systime_snapshot->cycles);
> 		} else {
> 			now = tk_clock_read(&tk->tkr_mono);
> 			systime_snapshot->cycles_valid = true;
> 			systime_snapshot->cycles = now;
> 		}
> 
> ?  (This is honestly just a suggestion, which may be wrong depedning
> on the answer to the two questions above).
cs_stable means "there are no unexpected time jumps". As you mentioned,
it's irrelevant for tsc but makes sense for kvmclock (and possibly some
other clocksources). I think it's reasonable to ask a clocksource 
directly whether it's stable or not. That's why I made the callback.
I don't mind merging this "check stability" functionality with 
read_clock_and_systime. Actually, I thought about having it there but 
eventually split it to make the code more explicit
(detailed and understandable).
I'd like to use your approach but keep the variable name the same.

Thanks for reviewing!

Denis
> 
> Paolo
> 
>>   		systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
>>   		systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
>>   		base_real = ktime_add(tk->tkr_mono.base,
>>   				      tk_core.timekeeper.offs_real);
>>   		base_raw = tk->tkr_raw.base;
>> +		base_boot = ktime_add(tk->tkr_mono.base,
>> +				      tk_core.timekeeper.offs_boot);
>>   		nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
>>   		nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
>>   	} while (read_seqcount_retry(&tk_core.seq, seq));
>>   
>> -	systime_snapshot->cycles = now;
>>   	systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
>>   	systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
>> +	systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
>>   }
>>   EXPORT_SYMBOL_GPL(ktime_get_snapshot);
>>   
>>
> 

-- 
Best,
Denis

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/6] KVM: x86: switch to masterclock update using timekeeper functionality
  2017-08-01  9:30     ` Denis Plotnikov
@ 2017-08-01 10:03       ` Paolo Bonzini
  2017-08-01 10:16         ` Paolo Bonzini
  2017-08-01 12:11         ` Denis Plotnikov
  0 siblings, 2 replies; 18+ messages in thread
From: Paolo Bonzini @ 2017-08-01 10:03 UTC (permalink / raw)
  To: Denis Plotnikov, rkrcmar, kvm; +Cc: rkagan, den, svt-core

On 01/08/2017 11:30, Denis Plotnikov wrote:
>> - implementing kvm_clock_read_with_cycles (can be merged with patch 6)
>
> Having a stable clocksource for kvmclock means making kvmclock stable.
> The patch enables this functionality; that's why I'd prefer to keep patch
> 6 separate.

Ok, though it depends on how you deal with cs_stable.

>> - using ktime_get_snapshot in KVM (can be merged with patch 4?)
>
> agree, but want to keep 4 separate. Just to make the changes done
> logically consecutive in git tree.

Fine by me.

>> Maybe what we want is some kind of "bool cycles_valid", and then
>> read_clock_and_systime can return it:
>>
>>
>>         if (clock->read_clock_and_systime) {
>>             systime_snapshot->cycles_valid =
>>              clock->read_clock_and_systime(
>>                 &now, &systime_snapshot->cycles);
>>         } else {
>>             now = tk_clock_read(&tk->tkr_mono);
>>             systime_snapshot->cycles_valid = true;
>>             systime_snapshot->cycles = now;
>>         }
>>
>> ?  (This is honestly just a suggestion, which may be wrong depedning
>> on the answer to the two questions above).
>
> cs_stable means "there are no unexpected time jumps".

But even for kvmclock there are no unexpected time jumps, the global
accumulator in pvclock_clocksource_read ensures that.  And the concept
is different from CLOCK_SOURCE_UNSTABLE which will basically blacklist
the clocksource for hrtimers.

It seems a different concept to me, somewhat specific to
ktime_get_snapshot.  More precisely, the question is "is there a 1:1
mapping from cycles to nanoseconds?"---but if there is no such mapping
cycles is useless, hence my proposal of "cycles_valid".

Thanks,

Paolo

> I don't mind merging this "check stability" functionality with
> read_clock_and_systime. Actually, I thought about having it there but
> eventually split it to make the code more explicit
> (detailed and understandable).
> I'd like to use your approach but keep the variable name the same.
> 
> Thanks for reviewing!
> 
> Denis
>>
>> Paolo
>>
>>>           systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
>>>           systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
>>>           base_real = ktime_add(tk->tkr_mono.base,
>>>                         tk_core.timekeeper.offs_real);
>>>           base_raw = tk->tkr_raw.base;
>>> +        base_boot = ktime_add(tk->tkr_mono.base,
>>> +                      tk_core.timekeeper.offs_boot);
>>>           nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
>>>           nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
>>>       } while (read_seqcount_retry(&tk_core.seq, seq));
>>>
>>> -    systime_snapshot->cycles = now;
>>>       systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
>>>       systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
>>> +    systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
>>>   }
>>>   EXPORT_SYMBOL_GPL(ktime_get_snapshot);
>>>  
>>
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/6] KVM: x86: switch to masterclock update using timekeeper functionality
  2017-08-01 10:03       ` Paolo Bonzini
@ 2017-08-01 10:16         ` Paolo Bonzini
  2017-08-01 12:11         ` Denis Plotnikov
  1 sibling, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2017-08-01 10:16 UTC (permalink / raw)
  To: Denis Plotnikov, rkrcmar, kvm; +Cc: rkagan, den, svt-core

A couple more observations now that we've agreed on the approach.

On 01/08/2017 12:03, Paolo Bonzini wrote:
> 
>>> Maybe what we want is some kind of "bool cycles_valid", and then
>>> read_clock_and_systime can return it:
>>>
>>>
>>>         if (clock->read_clock_and_systime) {
>>>             systime_snapshot->cycles_valid =
>>>              clock->read_clock_and_systime(
>>>                 &now, &systime_snapshot->cycles);

Please pass the clocksource to the new function.  However this:

> +		clock = tk->tkr_mono.clock;

should then use READ_ONCE() for the same reason mentioned in
tk_clock_read's comments.  Then,

>>>         } else {
>>>             now = tk_clock_read(&tk->tkr_mono);

this can call clock->read(clock) directly, without going through
tk_clock_read.
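
Putting the two together, something along these lines (just a sketch, the
exact signature is up to you):

	struct clocksource *clock = READ_ONCE(tk->tkr_mono.clock);

	if (clock->read_clock_and_systime) {
		systime_snapshot->cycles_valid =
			clock->read_clock_and_systime(clock, &now,
						      &systime_snapshot->cycles);
	} else {
		now = clock->read(clock);
		systime_snapshot->cycles_valid = true;
		systime_snapshot->cycles = now;
	}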

Thanks,

Paolo

>>>             systime_snapshot->cycles_valid = true;
>>>             systime_snapshot->cycles = now;
>>>         }

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/6] KVM: x86: switch to masterclock update using timekeeper functionality
  2017-08-01 10:03       ` Paolo Bonzini
  2017-08-01 10:16         ` Paolo Bonzini
@ 2017-08-01 12:11         ` Denis Plotnikov
  2017-08-01 12:28           ` Denis Plotnikov
  2017-08-01 12:41           ` Paolo Bonzini
  1 sibling, 2 replies; 18+ messages in thread
From: Denis Plotnikov @ 2017-08-01 12:11 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, kvm; +Cc: rkagan, den, svt-core



On 01.08.2017 13:03, Paolo Bonzini wrote:
> On 01/08/2017 11:30, Denis Plotnikov wrote:
>>> - implementing kvm_clock_read_with_cycles (can be merged with patch 6)
>>
>> Having a stable clocksource for kvmclock means making kvmclock stable.
>> The patch enables this functionality; that's why I'd prefer to keep patch
>> 6 separate.
> 
> Ok, though it depends on how you deal with cs_stable.
> 
>>> - using ktime_get_snapshot in KVM (can be merged with patch 4?)
>>
>> agree, but I want to keep 4 separate, just to keep the changes
>> logically consecutive in the git tree.
> 
> Fine by me.
> 
>>> Maybe what we want is some kind of "bool cycles_valid", and then
>>> read_clock_and_systime can return it:
>>>
>>>
>>>          if (clock->read_clock_and_systime) {
>>>              systime_snapshot->cycles_valid =
>>>               clock->read_clock_and_systime(
>>>                  &now, &systime_snapshot->cycles);
>>>          } else {
>>>              now = tk_clock_read(&tk->tkr_mono);
>>>              systime_snapshot->cycles_valid = true;
>>>              systime_snapshot->cycles = now;
>>>          }
>>>
>>> ?  (This is honestly just a suggestion, which may be wrong depending
>>> on the answer to the two questions above).
>>
>> cs_stable means "there is no unexpected time jumps".
> 
> But even for kvmclock there are no unexpected time jumps; the global
> accumulator in pvclock_clocksource_read ensures that.  And the concept
> is different from CLOCK_SOURCE_UNSTABLE, which will basically blacklist
> the clocksource for hrtimers.
> 
> It seems a different concept to me, somewhat specific to
> ktime_get_snapshot.  More precisely, the question is "is there a 1:1
> mapping from cycles to nanoseconds?"---but if there is no such mapping
> cycles is useless, hence my proposal of "cycles_valid".
> 
> Thanks,
> 
> Paolo
In fact, this "cycles_valid" is going to be used for deciding whether to 
use KVM masterclock or not. And if it's not we still want to know 
cycles_stamp value to use it in KVM.
So the cycles is valid, but clocksource is not reliable (why? let decide 
to a clocksource, by default assume they are all not stable), thus we 
must calculate time values for a guest each time its needed.
So, my proposal is to name the variable sightly differently: cs_reliable
and go like:
	if (clock->read_clock_with_stamp) {
		systime_snapshot->cs_reliable =
			clock->read_clock_with_stamp(
				&now, &systime_snapshot->cycles);
	} else {
		now = tk_clock_read(&tk->tkr_mono);
		systime_snapshot->cs_reliable = false;
		systime_snapshot->cycles = now;
	}
What do you think?

Thanks!

Denis
> 
>> I don't mind merging this "check stability" functionality with
>> read_clock_and_systime. Actually, I thought about having it there but
>> eventually split it to make the code more explicit
>> (detailed and understandable).
>> I'd like to use your approach but keep the variable name the same.
>>
>> Thanks for reviewing!
>>
>> Denis
>>>
>>> Paolo
>>>
>>>>            systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
>>>>            systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
>>>>            base_real = ktime_add(tk->tkr_mono.base,
>>>>                          tk_core.timekeeper.offs_real);
>>>>            base_raw = tk->tkr_raw.base;
>>>> +        base_boot = ktime_add(tk->tkr_mono.base,
>>>> +                      tk_core.timekeeper.offs_boot);
>>>>            nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
>>>>            nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
>>>>        } while (read_seqcount_retry(&tk_core.seq, seq));
>>>>
>>>> -    systime_snapshot->cycles = now;
>>>>        systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
>>>>        systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
>>>> +    systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
>>>>    }
>>>>    EXPORT_SYMBOL_GPL(ktime_get_snapshot);
>>>>   
>>>
>>
> 

-- 
Best,
Denis

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/6] KVM: x86: switch to masterclock update using timekeeper functionality
  2017-08-01 12:11         ` Denis Plotnikov
@ 2017-08-01 12:28           ` Denis Plotnikov
  2017-08-01 12:41           ` Paolo Bonzini
  1 sibling, 0 replies; 18+ messages in thread
From: Denis Plotnikov @ 2017-08-01 12:28 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, kvm; +Cc: rkagan, den, svt-core



On 01.08.2017 15:11, Denis Plotnikov wrote:
> 
> 
> On 01.08.2017 13:03, Paolo Bonzini wrote:
>> On 01/08/2017 11:30, Denis Plotnikov wrote:
>>>> - implementing kvm_clock_read_with_cycles (can be merged with patch 6)
>>>
>>> Having a stable clocksource for kvmclock means making kvmclock stable.
>>> The patch enables this functionality; that's why I'd prefer to keep patch
>>> 6 separate.
>>
>> Ok, though it depends on how you deal with cs_stable.
>>
>>>> - using ktime_get_snapshot in KVM (can be merged with patch 4?)
>>>
>>> agree, but I want to keep 4 separate, just to keep the changes
>>> logically consecutive in the git tree.
>>
>> Fine by me.
>>
>>>> Maybe what we want is some kind of "bool cycles_valid", and then
>>>> read_clock_and_systime can return it:
>>>>
>>>>
>>>>          if (clock->read_clock_and_systime) {
>>>>              systime_snapshot->cycles_valid =
>>>>               clock->read_clock_and_systime(
>>>>                  &now, &systime_snapshot->cycles);
>>>>          } else {
>>>>              now = tk_clock_read(&tk->tkr_mono);
>>>>              systime_snapshot->cycles_valid = true;
>>>>              systime_snapshot->cycles = now;
>>>>          }
>>>>
>>>> ?  (This is honestly just a suggestion, which may be wrong depending
>>>> on the answer to the two questions above).
>>>
>>> cs_stable means "there is no unexpected time jumps".
>>
>> But even for kvmclock there are no unexpected time jumps; the global
>> accumulator in pvclock_clocksource_read ensures that.  And the concept
>> is different from CLOCK_SOURCE_UNSTABLE, which will basically blacklist
>> the clocksource for hrtimers.
>>
>> It seems a different concept to me, somewhat specific to
>> ktime_get_snapshot.  More precisely, the question is "is there a 1:1
>> mapping from cycles to nanoseconds?"---but if there is no such mapping
>> cycles is useless, hence my proposal of "cycles_valid".
>>
>> Thanks,
>>
>> Paolo
> In fact, this "cycles_valid" is going to be used for deciding whether to 
> use KVM masterclock or not. And if it's not we still want to know 
> cycles_stamp value to use it in KVM.
> So the cycles is valid, but clocksource is not reliable (why? let decide 
> to a clocksource, by default assume they are all not stable), thus we 
> must calculate time values for a guest each time its needed.
> So, my proposal is to name the variable sightly differently: cs_reliable
> and go like:
>      if (clock->read_clock_with_stamp) {
>          systime_snapshot->cs_reliable =
>              clock->read_clock_with_stamp(
>                  &now, &systime_snapshot->cycles);
>      } else {
>          now = tk_clock_read(&tk->tkr_mono);
>          systime_snapshot->cs_reliable = false;
>          systime_snapshot->cycles = now;
>      }
> What do you think?
> 
> Thanks!
> 
> Denis

How about this:
Let's name the variable cycles_reliable: if it's true, then what
systime_snapshot->cycles holds is truly cycles; if it's false, then it's
something else and some special care is needed.

Thanks!

Denis
>>
>>> I don't mind merging this "check stability" functionality with
>>> read_clock_and_systime. Actually, I thought about having it there but
>>> eventually split it to make the code more explicit
>>> (detailed and understandable).
>>> I'd like to use your approach but keep the variable name the same.
>>>
>>> Thanks for reviewing!
>>>
>>> Denis
>>>>
>>>> Paolo
>>>>
>>>>>            systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
>>>>>            systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
>>>>>            base_real = ktime_add(tk->tkr_mono.base,
>>>>>                          tk_core.timekeeper.offs_real);
>>>>>            base_raw = tk->tkr_raw.base;
>>>>> +        base_boot = ktime_add(tk->tkr_mono.base,
>>>>> +                      tk_core.timekeeper.offs_boot);
>>>>>            nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
>>>>>            nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
>>>>>        } while (read_seqcount_retry(&tk_core.seq, seq));
>>>>>
>>>>> -    systime_snapshot->cycles = now;
>>>>>        systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
>>>>>        systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
>>>>> +    systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
>>>>>    }
>>>>>    EXPORT_SYMBOL_GPL(ktime_get_snapshot);
>>>>
>>>
>>
> 

-- 
Best,
Denis

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/6] KVM: x86: switch to masterclock update using timekeeper functionality
  2017-08-01 12:11         ` Denis Plotnikov
  2017-08-01 12:28           ` Denis Plotnikov
@ 2017-08-01 12:41           ` Paolo Bonzini
  2017-08-01 12:46             ` Denis Plotnikov
  1 sibling, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2017-08-01 12:41 UTC (permalink / raw)
  To: Denis Plotnikov, rkrcmar, kvm; +Cc: rkagan, den, svt-core

On 01/08/2017 14:11, Denis Plotnikov wrote:
> In fact, this "cycles_valid" is going to be used for deciding whether to
> use KVM masterclock or not. And if it's not we still want to know
> cycles_stamp value to use it in KVM.

Why?  Neither pvclock_update_vm_gtod_copy nor kvm_pv_clock_pairing do
anything with the two variables that are passed by reference, if the
read returns false.  Hence my suggestion of calling it cycles_valid.
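
The consumer in x86.c can then just be something like this (a sketch,
assuming the kvm_get_time_and_clockread shape stays as it is and the
snapshot grows the boot and cycles_valid fields discussed here):

	static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *cycle_now)
	{
		struct system_time_snapshot snap;

		ktime_get_snapshot(&snap);

		/* no masterclock unless the snapshot cycles are meaningful */
		if (!snap.cycles_valid)
			return false;

		*cycle_now = snap.cycles;
		*kernel_ns = ktime_to_ns(snap.boot);
		return true;
	}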

> So the cycles are valid, but the clocksource is not reliable (why? let the
> clocksource decide; by default assume they are all not stable), thus we
> must calculate the time values for a guest each time they are needed.
> So, my proposal is to name the variable slightly differently: cs_reliable
> and go like:
>     if (clock->read_clock_with_stamp) {
>         systime_snapshot->cs_reliable =
>             clock->read_clock_with_stamp(
>                 &now, &systime_snapshot->cycles);
>     } else {
>         now = tk_clock_read(&tk->tkr_mono);
>         systime_snapshot->cs_reliable = false;
>         systime_snapshot->cycles = now;
>     }
> What do you think?

I'm afraid you still have to define the meaning of "reliable".  (Though
I agree that the right default is false, that was a thinko on my side.
This also means that you need to define read_clock_with_stamp for the
TSC clocksource too).

Paolo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/6] KVM: x86: switch to masterclock update using timekeeper functionality
  2017-08-01 12:41           ` Paolo Bonzini
@ 2017-08-01 12:46             ` Denis Plotnikov
  2017-08-01 17:47               ` Radim Krčmář
  0 siblings, 1 reply; 18+ messages in thread
From: Denis Plotnikov @ 2017-08-01 12:46 UTC (permalink / raw)
  To: Paolo Bonzini, rkrcmar, kvm; +Cc: rkagan, den, svt-core



On 01.08.2017 15:41, Paolo Bonzini wrote:
> On 01/08/2017 14:11, Denis Plotnikov wrote:
>> In fact, this "cycles_valid" is going to be used for deciding whether to
>> use KVM masterclock or not. And if it's not we still want to know
>> cycles_stamp value to use it in KVM.
> 
> Why?  Neither pvclock_update_vm_gtod_copy nor kvm_pv_clock_pairing do
> anything with the two variables that are passed by reference, if the
> read returns false.  Hence my suggestion of calling it cycles_valid.
giving up
> 
>> So the cycles are valid, but the clocksource is not reliable (why? let the
>> clocksource decide; by default assume they are all not stable), thus we
>> must calculate the time values for a guest each time they are needed.
>> So, my proposal is to name the variable slightly differently: cs_reliable
>> and go like:
>>      if (clock->read_clock_with_stamp) {
>>          systime_snapshot->cs_reliable =
>>              clock->read_clock_with_stamp(
>>                  &now, &systime_snapshot->cycles);
>>      } else {
>>          now = tk_clock_read(&tk->tkr_mono);
>>          systime_snapshot->cs_reliable = false;
>>          systime_snapshot->cycles = now;
>>      }
>> What do you think?
> 
> I'm afraid you still have to define the meaning of "reliable".  (Though
> I agree that the right default is false, that was a thinko on my side.
> This also means that you need to define read_clock_with_stamp for the
> TSC clocksource too).
> 
> Paolo
> 
agree:
   name: cycles_valid
   default: false
   read_clock_with_stamp: defined for the TSC too, returning true
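
For the TSC that boils down to something like this (again just a sketch,
the exact hook signature is still to be settled):

	static bool read_tsc_with_stamp(struct clocksource *cs, u64 *now,
					u64 *cycles)
	{
		u64 tsc = (u64)rdtsc_ordered();

		/* for the TSC the timekeeping value and the cycles stamp coincide */
		*now = tsc;
		*cycles = tsc;
		return true;	/* cycles_valid */
	}

	/* and in clocksource_tsc:  .read_clock_with_stamp = read_tsc_with_stamp, */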

Thanks!

Patches are in progress.
They will be sent soon, nice and shiny!

Denis

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/6] KVM: x86: switch to masterclock update using timekeeper functionality
  2017-08-01 12:46             ` Denis Plotnikov
@ 2017-08-01 17:47               ` Radim Krčmář
  0 siblings, 0 replies; 18+ messages in thread
From: Radim Krčmář @ 2017-08-01 17:47 UTC (permalink / raw)
  To: Denis Plotnikov; +Cc: Paolo Bonzini, kvm, rkagan, den, svt-core

2017-08-01 15:46+0300, Denis Plotnikov:
> Patches are in progress.
> They will be sent soon, nice and shiny!

Please Cc timekeeper and x86 people too, thanks!

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-08-01 17:47 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-29 12:35 [PATCH v3 0/6] make L2's kvm-clock stable, get rid of pvclock_gtod Denis Plotnikov
2017-07-29 12:35 ` [PATCH v3 1/6] pvclock: add parameter to store cycles stamp to pvclock reading function Denis Plotnikov
2017-07-31 14:08   ` Paolo Bonzini
2017-07-29 12:35 ` [PATCH v3 2/6] KVM: x86: switch to masterclock update using timekeeper functionality Denis Plotnikov
2017-07-31 14:20   ` Paolo Bonzini
2017-08-01  9:30     ` Denis Plotnikov
2017-08-01 10:03       ` Paolo Bonzini
2017-08-01 10:16         ` Paolo Bonzini
2017-08-01 12:11         ` Denis Plotnikov
2017-08-01 12:28           ` Denis Plotnikov
2017-08-01 12:41           ` Paolo Bonzini
2017-08-01 12:46             ` Denis Plotnikov
2017-08-01 17:47               ` Radim Krčmář
2017-07-29 12:35 ` [PATCH v3 3/6] timekeeper: add clocksource change notifier Denis Plotnikov
2017-07-29 12:35 ` [PATCH v3 4/6] KVM: x86: remove not used pvclock_gtod_copy Denis Plotnikov
2017-07-29 12:35 ` [PATCH v3 5/6] pvclock: add clocksource change notification on changing of tsc stable bit Denis Plotnikov
2017-07-31 14:21   ` Paolo Bonzini
2017-07-29 12:35 ` [PATCH v3 6/6] kvmclock: add the clocksource stability querying function Denis Plotnikov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).