* [PATCH v5 0/6] KVM: x86: get rid of pvclock_gtod_copy
@ 2017-08-30 15:23 Denis Plotnikov
  2017-08-30 15:23 ` [PATCH v5 1/6] timekeeper: introduce extended clocksource reading callback Denis Plotnikov
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Denis Plotnikov @ 2017-08-30 15:23 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm, john.stultz, tglx
  Cc: mingo, hpa, linux-kernel, x86, rkagan, den

V5:
  It was decided to split the series
  "make L2's kvm-clock stable, get rid of pvclock_gtod_copy in KVM"
  into two parts to make the changes finer grained:
    1. get rid of pvclock_gtod_copy in KVM 
    2. make L2's kvm-clock stable
  This V5 series implements the 1st part. It gets rid of
  pvclock_gtod_copy by switching to the extended timekeeper
  functionality. The new scheme doesn't add any new notifiers and keeps
  using the existing one (see the description of patch 6), which
  pvclock_gtod_copy had been using to keep track of timekeeper data changes.

  The 2nd part is being discussed and is going to be sent later on.

V4:
  * removed "is stable" function with vague definition of stability
    there is the only function which does time with cycle stamp getting
  * some variables renamed
  * some patches split into smaller once
  * atomic64_t usage is replaced with atomic_t
 
V3:
  Changing the timekeeper interface for clocksource reading looks like
  overkill for the goal of getting a cycle stamp for KVM.
  Instead, extend the timekeeping interface and add functions which provide
  the necessary data: read the clocksource with a cycle stamp, and check
  whether the clocksource is stable.

  Use those functions and improve existing timekeeper functionality to
  replace pvclock_gtod_copy scheme in masterclock data calculation.

V2:
  The main goal is to make the L2 kvm-clock stable when it is running over
  an L1 with a stable kvm-clock.

  The patch series is for the x86 architecture only. If the series is
  approved I'll make the changes for the other architectures, but I don't
  have the ability to compile and test every single one (help needed).

  The patch series does the following:

        * change the timekeeper interface to get a cycle stamp value from
          the timekeeper
        * get rid of the pvclock copy in KVM by using the changed timekeeper
          interface: get time and cycles right from the timekeeper
          (see the sketch below)
        * make KVM recognize a stable kvm-clock as a stable clocksource
          and use the KVM masterclock in this case, which means making
          L2 stable when running over a stable L1 kvm-clock
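
  Roughly, after the series KVM obtains the masterclock inputs like this
  (a simplified excerpt of the kvm_get_time_and_clockread() change in
  patch 5; the boot and cycles_valid snapshot fields come from patches 1-4):

	struct system_time_snapshot snap;

	ktime_get_snapshot(&snap);
	if (snap.cycles_valid) {
		/* monotonic time since boot, added in patch 2 */
		*kernel_ns = ktime_to_ns(snap.boot);
		/* cycle stamp provided by the clocksource (patches 1, 3, 4) */
		*cycle_now = snap.cycles;
	}
	return snap.cycles_valid;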

Denis Plotnikov (6):
  timekeeper: introduce extended clocksource reading callback
  timekeeper: introduce boot field in system_time_snapshot
  timekeeper: use the extended reading function on snapshot acquiring
  tsc: implement the extended tsc reading function
  KVM: x86: switch to masterclock update using timekeeper functionality
  KVM: x86: remove not used pvclock_gtod_copy

 arch/x86/include/asm/kvm_host.h |   2 +-
 arch/x86/kernel/tsc.c           |  10 ++
 arch/x86/kvm/trace.h            |  31 ++---
 arch/x86/kvm/x86.c              | 245 +++++++++-------------------------------
 include/linux/clocksource.h     |  11 +-
 include/linux/timekeeping.h     |   5 +
 kernel/time/timekeeping.c       |  17 ++-
 7 files changed, 104 insertions(+), 217 deletions(-)

-- 
2.7.4


* [PATCH v5 1/6] timekeeper: introduce extended clocksource reading callback
  2017-08-30 15:23 [PATCH v5 0/6] KVM: x86: get rid of pvclock_gtod_copy Denis Plotnikov
@ 2017-08-30 15:23 ` Denis Plotnikov
  2017-09-25 22:00   ` Thomas Gleixner
  2017-08-30 15:23 ` [PATCH v5 2/6] timekeeper: introduce boot field in system_time_snapshot Denis Plotnikov
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Denis Plotnikov @ 2017-08-30 15:23 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm, john.stultz, tglx
  Cc: mingo, hpa, linux-kernel, x86, rkagan, den

The callback provides extended information about the clocksource
value that has just been read.

It's going to be used in cases when detailed system information is
needed for further calculation of time related values, e.g. in the
KVM masterclock settings calculation.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 include/linux/clocksource.h | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index a78cb18..786a522 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -48,7 +48,14 @@ struct module;
  *			400-499: Perfect
  *				The ideal clocksource. A must-use where
  *				available.
- * @read:		returns a cycle value, passes clocksource as argument
+ * @read:		returns a cycle value (might be not quite cycles:
+ *			see pvclock) passes clocksource as argument
+ * @read_with_stamp:	saves cycles value (got from timekeeper) and cycles
+ *			stamp (got from hardware counter value and used by
+ *			timekeeper to calculate the cycles value) to
+ *			corresponding input pointers return true if cycles
+ *			stamp holds real cycles and false if it holds some
+ *			time derivative value
  * @enable:		optional function to enable the clocksource
  * @disable:		optional function to disable the clocksource
  * @mask:		bitmask for two's complement
@@ -78,6 +85,8 @@ struct module;
  */
 struct clocksource {
 	u64 (*read)(struct clocksource *cs);
+	bool (*read_with_stamp)(struct clocksource *cs,
+				u64 *cycles, u64 *cycles_stamp);
 	u64 mask;
 	u32 mult;
 	u32 shift;
-- 
2.7.4


* [PATCH v5 2/6] timekeeper: introduce boot field in system_time_snapshot
  2017-08-30 15:23 [PATCH v5 0/6] KVM: x86: get rid of pvclock_gtod_copy Denis Plotnikov
  2017-08-30 15:23 ` [PATCH v5 1/6] timekeeper: introduce extended clocksource reading callback Denis Plotnikov
@ 2017-08-30 15:23 ` Denis Plotnikov
  2017-08-30 15:23 ` [PATCH v5 3/6] timekeeper: use the extended reading function on snapshot acquiring Denis Plotnikov
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Denis Plotnikov @ 2017-08-30 15:23 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm, john.stultz, tglx
  Cc: mingo, hpa, linux-kernel, x86, rkagan, den

The field keeps the monotonic time since boot.

The boot value will be used by KVM for the masterclock.

This is part of the work aiming to move to a simpler scheme of
masterclock related value calculation in KVM.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 include/linux/timekeeping.h | 2 ++
 kernel/time/timekeeping.c   | 4 ++++
 2 files changed, 6 insertions(+)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index ddc229f..5008e3e 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -283,6 +283,7 @@ extern void ktime_get_raw_and_real_ts64(struct timespec64 *ts_raw,
  * @cycles:	Clocksource counter value to produce the system times
  * @real:	Realtime system time
  * @raw:	Monotonic raw system time
+ * @boot:	Monotonic time since boot
  * @clock_was_set_seq:	The sequence number of clock was set events
  * @cs_was_changed_seq:	The sequence number of clocksource change events
  */
@@ -290,6 +291,7 @@ struct system_time_snapshot {
 	u64		cycles;
 	ktime_t		real;
 	ktime_t		raw;
+	ktime_t		boot;
 	unsigned int	clock_was_set_seq;
 	u8		cs_was_changed_seq;
 };
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index cedafa0..66f7701 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -953,6 +953,7 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
 	unsigned long seq;
 	ktime_t base_raw;
 	ktime_t base_real;
+	ktime_t base_boot;
 	u64 nsec_raw;
 	u64 nsec_real;
 	u64 now;
@@ -967,6 +968,8 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
 		base_real = ktime_add(tk->tkr_mono.base,
 				      tk_core.timekeeper.offs_real);
 		base_raw = tk->tkr_raw.base;
+		base_boot = ktime_add(tk->tkr_mono.base,
+					tk_core.timekeeper.offs_boot);
 		nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
 		nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
 	} while (read_seqcount_retry(&tk_core.seq, seq));
@@ -974,6 +977,7 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
 	systime_snapshot->cycles = now;
 	systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
 	systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
+	systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
 }
 EXPORT_SYMBOL_GPL(ktime_get_snapshot);
 
-- 
2.7.4


* [PATCH v5 3/6] timekeeper: use the extended reading function on snapshot acquiring
  2017-08-30 15:23 [PATCH v5 0/6] KVM: x86: get rid of pvclock_gtod_copy Denis Plotnikov
  2017-08-30 15:23 ` [PATCH v5 1/6] timekeeper: introduce extended clocksource reading callback Denis Plotnikov
  2017-08-30 15:23 ` [PATCH v5 2/6] timekeeper: introduce boot field in system_time_snapshot Denis Plotnikov
@ 2017-08-30 15:23 ` Denis Plotnikov
  2017-08-30 15:23 ` [PATCH v5 4/6] tsc: implement the extended tsc reading function Denis Plotnikov
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Denis Plotnikov @ 2017-08-30 15:23 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm, john.stultz, tglx
  Cc: mingo, hpa, linux-kernel, x86, rkagan, den

Make use of the extended reading function when getting a time snapshot:
get the raw cycle value if the extended function is defined.
The raw cycle value is then used for the KVM masterclock.

This is part of the work aiming to move to a simpler scheme of
masterclock related value calculation in KVM.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 include/linux/timekeeping.h |  3 +++
 kernel/time/timekeeping.c   | 13 +++++++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index 5008e3e..528d088 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -286,6 +286,8 @@ extern void ktime_get_raw_and_real_ts64(struct timespec64 *ts_raw,
  * @boot:	Monotonic time since boot
  * @clock_was_set_seq:	The sequence number of clock was set events
  * @cs_was_changed_seq:	The sequence number of clocksource change events
+ * @cycles_valid:	The flag is true when @cycles contain actual
+ *			number of cycles instead some cycle derivative
  */
 struct system_time_snapshot {
 	u64		cycles;
@@ -294,6 +296,7 @@ struct system_time_snapshot {
 	ktime_t		boot;
 	unsigned int	clock_was_set_seq;
 	u8		cs_was_changed_seq;
+	bool		cycles_valid;
 };
 
 /*
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 66f7701..d1aa575 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -957,12 +957,22 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
 	u64 nsec_raw;
 	u64 nsec_real;
 	u64 now;
+	struct clocksource *clock;
 
 	WARN_ON_ONCE(timekeeping_suspended);
 
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
-		now = tk_clock_read(&tk->tkr_mono);
+		clock = READ_ONCE(tk->tkr_mono.clock);
+		if (clock->read_with_stamp)
+			systime_snapshot->cycles_valid =
+				clock->read_with_stamp(
+					clock, &now, &systime_snapshot->cycles);
+		else {
+			systime_snapshot->cycles_valid = false;
+			now = clock->read(clock);
+			systime_snapshot->cycles = now;
+		}
 		systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
 		systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
 		base_real = ktime_add(tk->tkr_mono.base,
@@ -974,7 +984,6 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
 		nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
-	systime_snapshot->cycles = now;
 	systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
 	systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
 	systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
-- 
2.7.4


* [PATCH v5 4/6] tsc: implement the extended tsc reading function
  2017-08-30 15:23 [PATCH v5 0/6] KVM: x86: get rid of pvclock_gtod_copy Denis Plotnikov
                   ` (2 preceding siblings ...)
  2017-08-30 15:23 ` [PATCH v5 3/6] timekeeper: use the extended reading function on snapshot acquiring Denis Plotnikov
@ 2017-08-30 15:23 ` Denis Plotnikov
  2017-08-30 15:23 ` [PATCH v5 5/6] KVM: x86: switch to masterclock update using timekeeper functionality Denis Plotnikov
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Denis Plotnikov @ 2017-08-30 15:23 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm, john.stultz, tglx
  Cc: mingo, hpa, linux-kernel, x86, rkagan, den

By doing so, the tsc clocksource is added to the set of clocksources
providing valid cycle values to KVM, which means KVM can use its
masterclock with it.

This is part of the work aiming to move to a simpler scheme of
masterclock related value calculation in KVM.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 arch/x86/kernel/tsc.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 796d96b..8786454 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1036,6 +1036,15 @@ static void tsc_cs_tick_stable(struct clocksource *cs)
 		sched_clock_tick_stable();
 }
 
+static bool tsc_read_with_stamp(struct clocksource *cs,
+				u64 *cycles, u64 *cycles_stamp)
+{
+	u64 tsc = read_tsc(cs);
+	*cycles = tsc;
+	*cycles_stamp = tsc;
+	return true;
+}
+
 /*
  * .mask MUST be CLOCKSOURCE_MASK(64). See comment above read_tsc()
  */
@@ -1043,6 +1052,7 @@ static struct clocksource clocksource_tsc = {
 	.name                   = "tsc",
 	.rating                 = 300,
 	.read                   = read_tsc,
+	.read_with_stamp	= tsc_read_with_stamp,
 	.mask                   = CLOCKSOURCE_MASK(64),
 	.flags                  = CLOCK_SOURCE_IS_CONTINUOUS |
 				  CLOCK_SOURCE_MUST_VERIFY,
-- 
2.7.4


* [PATCH v5 5/6] KVM: x86: switch to masterclock update using timekeeper functionality
  2017-08-30 15:23 [PATCH v5 0/6] KVM: x86: get rid of pvclock_gtod_copy Denis Plotnikov
                   ` (3 preceding siblings ...)
  2017-08-30 15:23 ` [PATCH v5 4/6] tsc: implement the extended tsc reading function Denis Plotnikov
@ 2017-08-30 15:23 ` Denis Plotnikov
  2017-08-30 15:23 ` [PATCH v5 6/6] KVM: x86: remove not used pvclock_gtod_copy Denis Plotnikov
  2017-09-11  9:24 ` [PATCH v5 0/6] KVM: x86: get rid of pvclock_gtod_copy Denis Plotnikov
  6 siblings, 0 replies; 17+ messages in thread
From: Denis Plotnikov @ 2017-08-30 15:23 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm, john.stultz, tglx
  Cc: mingo, hpa, linux-kernel, x86, rkagan, den

It is reasonable to switch KVM to a simpler, more efficient and
conceptually cleaner scheme of dealing with the data needed for the
kvm masterclock value calculation.

With the current scheme KVM needs to keep an up-to-date copy of some
timekeeper data to provide a guest using kvmclock with the necessary
information.

This is not:
    - simple
        KVM has to carry a lot of code to maintain that copy; instead KVM
        could use a timekeeper function to get all the data it needs
    - efficient
        the copy of the data is updated every time the timekeeper data
        changes, although this is not necessary since the updated time data
        is needed only at certain moments (e.g. masterclock updates);
        instead KVM can request this data directly from the timekeeper at
        the moments when it is really needed
    - conceptually correct
        doing the work (calculating the time data) which another part of
        the system (the timekeeper) has been designed for and is able to do
        is not the proper way; instead, delegate the work to the proper part

This patch switches KVM to using the improved timekeeper function for
the kvm masterclock time data.

Removing the leftovers of the old scheme is the subject of the next patch.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 arch/x86/kvm/x86.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 05a5e57..0e86729 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1643,22 +1643,32 @@ static int do_realtime(struct timespec *ts, u64 *cycle_now)
 /* returns true if host is using tsc clocksource */
 static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *cycle_now)
 {
-	/* checked again under seqlock below */
-	if (pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC)
-		return false;
+	struct system_time_snapshot systime_snapshot;
+
+	ktime_get_snapshot(&systime_snapshot);
+
+	if (systime_snapshot.cycles_valid) {
+		*kernel_ns = ktime_to_ns(systime_snapshot.boot);
+		*cycle_now = systime_snapshot.cycles;
+	}
 
-	return do_monotonic_boot(kernel_ns, cycle_now) == VCLOCK_TSC;
+	return systime_snapshot.cycles_valid;
 }
 
 /* returns true if host is using tsc clocksource */
 static bool kvm_get_walltime_and_clockread(struct timespec *ts,
 					   u64 *cycle_now)
 {
-	/* checked again under seqlock below */
-	if (pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC)
-		return false;
+	struct system_time_snapshot systime_snapshot;
+
+	ktime_get_snapshot(&systime_snapshot);
+
+	if (systime_snapshot.cycles_valid) {
+		*ts = ktime_to_timespec(systime_snapshot.real);
+		*cycle_now = systime_snapshot.cycles;
+	}
 
-	return do_realtime(ts, cycle_now) == VCLOCK_TSC;
+	return systime_snapshot.cycles_valid;
 }
 #endif
 
-- 
2.7.4


* [PATCH v5 6/6] KVM: x86: remove not used pvclock_gtod_copy
  2017-08-30 15:23 [PATCH v5 0/6] KVM: x86: get rid of pvclock_gtod_copy Denis Plotnikov
                   ` (4 preceding siblings ...)
  2017-08-30 15:23 ` [PATCH v5 5/6] KVM: x86: switch to masterclock update using timekeeper functionality Denis Plotnikov
@ 2017-08-30 15:23 ` Denis Plotnikov
  2017-09-11  9:24 ` [PATCH v5 0/6] KVM: x86: get rid of pvclock_gtod_copy Denis Plotnikov
  6 siblings, 0 replies; 17+ messages in thread
From: Denis Plotnikov @ 2017-08-30 15:23 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm, john.stultz, tglx
  Cc: mingo, hpa, linux-kernel, x86, rkagan, den

Since the previous patches switched KVM to getting the masterclock
related data directly from the timekeeper, we are now able to remove
all the parts related to the old scheme of obtaining that data.

This patch removes those parts.

Using the pvclock_gtod notifier might be redundant because the current
scheme doesn't need to update a shadow timekeeper data structure
(pvclock_gtod_copy) every time the timekeeper changes. The notifier has
been left in place intentionally; replacing it with something else is a
matter for future work.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 arch/x86/include/asm/kvm_host.h |   2 +-
 arch/x86/kvm/trace.h            |  31 ++----
 arch/x86/kvm/x86.c              | 219 +++++++---------------------------------
 3 files changed, 46 insertions(+), 206 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f4d120a..a1dc06c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -792,7 +792,7 @@ struct kvm_arch {
 	u64 cur_tsc_generation;
 	int nr_vcpus_matched_tsc;
 
-	spinlock_t pvclock_gtod_sync_lock;
+	spinlock_t masterclock_lock;
 	bool use_master_clock;
 	u64 master_kernel_ns;
 	u64 master_cycle_now;
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 0a6cc67..923ab31 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -807,45 +807,39 @@ TRACE_EVENT(kvm_write_tsc_offset,
 
 #ifdef CONFIG_X86_64
 
-#define host_clocks					\
-	{VCLOCK_NONE, "none"},				\
-	{VCLOCK_TSC,  "tsc"}				\
-
 TRACE_EVENT(kvm_update_master_clock,
-	TP_PROTO(bool use_master_clock, unsigned int host_clock, bool offset_matched),
-	TP_ARGS(use_master_clock, host_clock, offset_matched),
+	TP_PROTO(bool use_master_clock, bool host_clock_stable,
+		bool offset_matched),
+	TP_ARGS(use_master_clock, host_clock_stable, offset_matched),
 
 	TP_STRUCT__entry(
 		__field(		bool,	use_master_clock	)
-		__field(	unsigned int,	host_clock		)
+		__field(		bool,	host_clock_stable	)
 		__field(		bool,	offset_matched		)
 	),
 
 	TP_fast_assign(
 		__entry->use_master_clock	= use_master_clock;
-		__entry->host_clock		= host_clock;
+		__entry->host_clock_stable	= host_clock_stable;
 		__entry->offset_matched		= offset_matched;
 	),
 
-	TP_printk("masterclock %d hostclock %s offsetmatched %u",
+	TP_printk("masterclock %d hostclock stable %u offsetmatched %u",
 		  __entry->use_master_clock,
-		  __print_symbolic(__entry->host_clock, host_clocks),
+		  __entry->host_clock_stable,
 		  __entry->offset_matched)
 );
 
 TRACE_EVENT(kvm_track_tsc,
 	TP_PROTO(unsigned int vcpu_id, unsigned int nr_matched,
-		 unsigned int online_vcpus, bool use_master_clock,
-		 unsigned int host_clock),
-	TP_ARGS(vcpu_id, nr_matched, online_vcpus, use_master_clock,
-		host_clock),
+		 unsigned int online_vcpus, bool use_master_clock),
+	TP_ARGS(vcpu_id, nr_matched, online_vcpus, use_master_clock),
 
 	TP_STRUCT__entry(
 		__field(	unsigned int,	vcpu_id			)
 		__field(	unsigned int,	nr_vcpus_matched_tsc	)
 		__field(	unsigned int,	online_vcpus		)
 		__field(	bool,		use_master_clock	)
-		__field(	unsigned int,	host_clock		)
 	),
 
 	TP_fast_assign(
@@ -853,14 +847,11 @@ TRACE_EVENT(kvm_track_tsc,
 		__entry->nr_vcpus_matched_tsc	= nr_matched;
 		__entry->online_vcpus		= online_vcpus;
 		__entry->use_master_clock	= use_master_clock;
-		__entry->host_clock		= host_clock;
 	),
 
-	TP_printk("vcpu_id %u masterclock %u offsetmatched %u nr_online %u"
-		  " hostclock %s",
+	TP_printk("vcpu_id %u masterclock %u offsetmatched %u nr_online %u",
 		  __entry->vcpu_id, __entry->use_master_clock,
-		  __entry->nr_vcpus_matched_tsc, __entry->online_vcpus,
-		  __print_symbolic(__entry->host_clock, host_clocks))
+		  __entry->nr_vcpus_matched_tsc, __entry->online_vcpus)
 );
 
 #endif /* CONFIG_X86_64 */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0e86729..fc180b5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1134,50 +1134,6 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
 	return kvm_set_msr(vcpu, &msr);
 }
 
-#ifdef CONFIG_X86_64
-struct pvclock_gtod_data {
-	seqcount_t	seq;
-
-	struct { /* extract of a clocksource struct */
-		int vclock_mode;
-		u64	cycle_last;
-		u64	mask;
-		u32	mult;
-		u32	shift;
-	} clock;
-
-	u64		boot_ns;
-	u64		nsec_base;
-	u64		wall_time_sec;
-};
-
-static struct pvclock_gtod_data pvclock_gtod_data;
-
-static void update_pvclock_gtod(struct timekeeper *tk)
-{
-	struct pvclock_gtod_data *vdata = &pvclock_gtod_data;
-	u64 boot_ns;
-
-	boot_ns = ktime_to_ns(ktime_add(tk->tkr_mono.base, tk->offs_boot));
-
-	write_seqcount_begin(&vdata->seq);
-
-	/* copy pvclock gtod data */
-	vdata->clock.vclock_mode	= tk->tkr_mono.clock->archdata.vclock_mode;
-	vdata->clock.cycle_last		= tk->tkr_mono.cycle_last;
-	vdata->clock.mask		= tk->tkr_mono.mask;
-	vdata->clock.mult		= tk->tkr_mono.mult;
-	vdata->clock.shift		= tk->tkr_mono.shift;
-
-	vdata->boot_ns			= boot_ns;
-	vdata->nsec_base		= tk->tkr_mono.xtime_nsec;
-
-	vdata->wall_time_sec            = tk->xtime_sec;
-
-	write_seqcount_end(&vdata->seq);
-}
-#endif
-
 void kvm_set_pending_timer(struct kvm_vcpu *vcpu)
 {
 	/*
@@ -1269,10 +1225,6 @@ static void kvm_get_time_scale(uint64_t scaled_hz, uint64_t base_hz,
 		 __func__, base_hz, scaled_hz, shift, *pmultiplier);
 }
 
-#ifdef CONFIG_X86_64
-static atomic_t kvm_guest_has_master_clock = ATOMIC_INIT(0);
-#endif
-
 static DEFINE_PER_CPU(unsigned long, cpu_tsc_khz);
 static unsigned long max_tsc_khz;
 
@@ -1366,7 +1318,6 @@ static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu)
 #ifdef CONFIG_X86_64
 	bool vcpus_matched;
 	struct kvm_arch *ka = &vcpu->kvm->arch;
-	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
 
 	vcpus_matched = (ka->nr_vcpus_matched_tsc + 1 ==
 			 atomic_read(&vcpu->kvm->online_vcpus));
@@ -1379,13 +1330,12 @@ static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu)
 	 * and the vcpus need to have matched TSCs.  When that happens,
 	 * perform request to enable masterclock.
 	 */
-	if (ka->use_master_clock ||
-	    (gtod->clock.vclock_mode == VCLOCK_TSC && vcpus_matched))
+	if (ka->use_master_clock || vcpus_matched)
 		kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
 
 	trace_kvm_track_tsc(vcpu->vcpu_id, ka->nr_vcpus_matched_tsc,
-			    atomic_read(&vcpu->kvm->online_vcpus),
-		            ka->use_master_clock, gtod->clock.vclock_mode);
+				atomic_read(&vcpu->kvm->online_vcpus),
+				ka->use_master_clock);
 #endif
 }
 
@@ -1538,7 +1488,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	kvm_vcpu_write_tsc_offset(vcpu, offset);
 	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
 
-	spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
+	spin_lock(&kvm->arch.masterclock_lock);
 	if (!matched) {
 		kvm->arch.nr_vcpus_matched_tsc = 0;
 	} else if (!already_matched) {
@@ -1546,9 +1496,8 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	}
 
 	kvm_track_tsc_matching(vcpu);
-	spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);
+	spin_unlock(&kvm->arch.masterclock_lock);
 }
-
 EXPORT_SYMBOL_GPL(kvm_write_tsc);
 
 static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
@@ -1567,79 +1516,6 @@ static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
 
 #ifdef CONFIG_X86_64
 
-static u64 read_tsc(void)
-{
-	u64 ret = (u64)rdtsc_ordered();
-	u64 last = pvclock_gtod_data.clock.cycle_last;
-
-	if (likely(ret >= last))
-		return ret;
-
-	/*
-	 * GCC likes to generate cmov here, but this branch is extremely
-	 * predictable (it's just a function of time and the likely is
-	 * very likely) and there's a data dependence, so force GCC
-	 * to generate a branch instead.  I don't barrier() because
-	 * we don't actually need a barrier, and if this function
-	 * ever gets inlined it will generate worse code.
-	 */
-	asm volatile ("");
-	return last;
-}
-
-static inline u64 vgettsc(u64 *cycle_now)
-{
-	long v;
-	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
-
-	*cycle_now = read_tsc();
-
-	v = (*cycle_now - gtod->clock.cycle_last) & gtod->clock.mask;
-	return v * gtod->clock.mult;
-}
-
-static int do_monotonic_boot(s64 *t, u64 *cycle_now)
-{
-	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
-	unsigned long seq;
-	int mode;
-	u64 ns;
-
-	do {
-		seq = read_seqcount_begin(&gtod->seq);
-		mode = gtod->clock.vclock_mode;
-		ns = gtod->nsec_base;
-		ns += vgettsc(cycle_now);
-		ns >>= gtod->clock.shift;
-		ns += gtod->boot_ns;
-	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
-	*t = ns;
-
-	return mode;
-}
-
-static int do_realtime(struct timespec *ts, u64 *cycle_now)
-{
-	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
-	unsigned long seq;
-	int mode;
-	u64 ns;
-
-	do {
-		seq = read_seqcount_begin(&gtod->seq);
-		mode = gtod->clock.vclock_mode;
-		ts->tv_sec = gtod->wall_time_sec;
-		ns = gtod->nsec_base;
-		ns += vgettsc(cycle_now);
-		ns >>= gtod->clock.shift;
-	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
-
-	ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
-	ts->tv_nsec = ns;
-
-	return mode;
-}
-
 /* returns true if host is using tsc clocksource */
 static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *cycle_now)
 {
@@ -1713,34 +1589,34 @@ static bool kvm_get_walltime_and_clockread(struct timespec *ts,
  *
  */
 
-static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
+/* returns true if the value of ka->use_master_clock is changed */
+static bool update_masterclock(struct kvm *kvm)
 {
 #ifdef CONFIG_X86_64
 	struct kvm_arch *ka = &kvm->arch;
-	int vclock_mode;
-	bool host_tsc_clocksource, vcpus_matched;
+	bool host_clocksource_stable, vcpus_matched, use_master_clock, r;
 
 	vcpus_matched = (ka->nr_vcpus_matched_tsc + 1 ==
 			atomic_read(&kvm->online_vcpus));
 
 	/*
-	 * If the host uses TSC clock, then passthrough TSC as stable
-	 * to the guest.
+	 * kvm_get_time_and_clockread returns true if clocksource is stable
 	 */
-	host_tsc_clocksource = kvm_get_time_and_clockread(
+	host_clocksource_stable = kvm_get_time_and_clockread(
 					&ka->master_kernel_ns,
 					&ka->master_cycle_now);
 
-	ka->use_master_clock = host_tsc_clocksource && vcpus_matched
+	use_master_clock = host_clocksource_stable && vcpus_matched
 				&& !ka->backwards_tsc_observed
 				&& !ka->boot_vcpu_runs_old_kvmclock;
 
-	if (ka->use_master_clock)
-		atomic_set(&kvm_guest_has_master_clock, 1);
+	r = ka->use_master_clock ^ use_master_clock;
+	ka->use_master_clock = use_master_clock;
+
+	trace_kvm_update_master_clock(ka->use_master_clock,
+				host_clocksource_stable, vcpus_matched);
 
-	vclock_mode = pvclock_gtod_data.clock.vclock_mode;
-	trace_kvm_update_master_clock(ka->use_master_clock, vclock_mode,
-					vcpus_matched);
+	return r;
 #endif
 }
 
@@ -1756,19 +1632,18 @@ static void kvm_gen_update_masterclock(struct kvm *kvm)
 	struct kvm_vcpu *vcpu;
 	struct kvm_arch *ka = &kvm->arch;
 
-	spin_lock(&ka->pvclock_gtod_sync_lock);
+	spin_lock(&ka->masterclock_lock);
 	kvm_make_mclock_inprogress_request(kvm);
 	/* no guest entries from this point */
-	pvclock_update_vm_gtod_copy(kvm);
-
-	kvm_for_each_vcpu(i, vcpu, kvm)
-		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
+	if (update_masterclock(kvm) || !ka->use_master_clock)
+		kvm_for_each_vcpu(i, vcpu, kvm)
+			kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 
 	/* guest entries allowed */
 	kvm_for_each_vcpu(i, vcpu, kvm)
 		kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu);
 
-	spin_unlock(&ka->pvclock_gtod_sync_lock);
+	spin_unlock(&ka->masterclock_lock);
 #endif
 }
 
@@ -1778,15 +1653,15 @@ u64 get_kvmclock_ns(struct kvm *kvm)
 	struct pvclock_vcpu_time_info hv_clock;
 	u64 ret;
 
-	spin_lock(&ka->pvclock_gtod_sync_lock);
+	spin_lock(&ka->masterclock_lock);
 	if (!ka->use_master_clock) {
-		spin_unlock(&ka->pvclock_gtod_sync_lock);
+		spin_unlock(&ka->masterclock_lock);
 		return ktime_get_boot_ns() + ka->kvmclock_offset;
 	}
 
 	hv_clock.tsc_timestamp = ka->master_cycle_now;
 	hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
-	spin_unlock(&ka->pvclock_gtod_sync_lock);
+	spin_unlock(&ka->masterclock_lock);
 
 	/* both __this_cpu_read() and rdtsc() should be on the same cpu */
 	get_cpu();
@@ -1872,13 +1747,13 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 	 * If the host uses TSC clock, then passthrough TSC as stable
 	 * to the guest.
 	 */
-	spin_lock(&ka->pvclock_gtod_sync_lock);
+	spin_lock(&ka->masterclock_lock);
 	use_master_clock = ka->use_master_clock;
 	if (use_master_clock) {
 		host_tsc = ka->master_cycle_now;
 		kernel_ns = ka->master_kernel_ns;
 	}
-	spin_unlock(&ka->pvclock_gtod_sync_lock);
+	spin_unlock(&ka->masterclock_lock);
 
 	/* Keep irq disabled to prevent changes to the clock */
 	local_irq_save(flags);
@@ -4220,11 +4095,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
 			goto out;
 
 		r = 0;
-		/*
-		 * TODO: userspace has to take care of races with VCPU_RUN, so
-		 * kvm_gen_update_masterclock() can be cut down to locked
-		 * pvclock_update_vm_gtod_copy().
-		 */
+
 		kvm_gen_update_masterclock(kvm);
 		now_ns = get_kvmclock_ns(kvm);
 		kvm->arch.kvmclock_offset += user_ns.clock - now_ns;
@@ -6053,7 +5924,8 @@ static void kvm_set_mmio_spte_mask(void)
 }
 
 #ifdef CONFIG_X86_64
-static void pvclock_gtod_update_fn(struct work_struct *work)
+static int request_masterclock_update(struct notifier_block *nb,
+					unsigned long unused0, void *unused1)
 {
 	struct kvm *kvm;
 
@@ -6064,35 +5936,12 @@ static void pvclock_gtod_update_fn(struct work_struct *work)
 	list_for_each_entry(kvm, &vm_list, vm_list)
 		kvm_for_each_vcpu(i, vcpu, kvm)
 			kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
-	atomic_set(&kvm_guest_has_master_clock, 0);
 	spin_unlock(&kvm_lock);
-}
-
-static DECLARE_WORK(pvclock_gtod_work, pvclock_gtod_update_fn);
-
-/*
- * Notification about pvclock gtod data update.
- */
-static int pvclock_gtod_notify(struct notifier_block *nb, unsigned long unused,
-			       void *priv)
-{
-	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
-	struct timekeeper *tk = priv;
-
-	update_pvclock_gtod(tk);
-
-	/* disable master clock if host does not trust, or does not
-	 * use, TSC clocksource
-	 */
-	if (gtod->clock.vclock_mode != VCLOCK_TSC &&
-	    atomic_read(&kvm_guest_has_master_clock) != 0)
-		queue_work(system_long_wq, &pvclock_gtod_work);
-
 	return 0;
 }
 
 static struct notifier_block pvclock_gtod_notifier = {
-	.notifier_call = pvclock_gtod_notify,
+	.notifier_call = request_masterclock_update,
 };
 #endif
 
@@ -6145,7 +5994,7 @@ int kvm_arch_init(void *opaque)
 
 	kvm_lapic_init();
 #ifdef CONFIG_X86_64
-	pvclock_gtod_register_notifier(&pvclock_gtod_notifier);
+	pvclock_gtod_unregister_notifier(&pvclock_gtod_notifier);
 #endif
 
 	return 0;
@@ -8071,10 +7920,10 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	raw_spin_lock_init(&kvm->arch.tsc_write_lock);
 	mutex_init(&kvm->arch.apic_map_lock);
 	mutex_init(&kvm->arch.hyperv.hv_lock);
-	spin_lock_init(&kvm->arch.pvclock_gtod_sync_lock);
+	spin_lock_init(&kvm->arch.masterclock_lock);
 
 	kvm->arch.kvmclock_offset = -ktime_get_boot_ns();
-	pvclock_update_vm_gtod_copy(kvm);
+	update_masterclock(kvm);
 
 	INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn);
 	INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn);
-- 
2.7.4


* Re: [PATCH v5 0/6] KVM: x86: get rid of pvclock_gtod_copy
  2017-08-30 15:23 [PATCH v5 0/6] KVM: x86: get rid of pvclock_gtod_copy Denis Plotnikov
                   ` (5 preceding siblings ...)
  2017-08-30 15:23 ` [PATCH v5 6/6] KVM: x86: remove not used pvclock_gtod_copy Denis Plotnikov
@ 2017-09-11  9:24 ` Denis Plotnikov
  2017-09-18  7:35   ` Thomas Gleixner
  6 siblings, 1 reply; 17+ messages in thread
From: Denis Plotnikov @ 2017-09-11  9:24 UTC (permalink / raw)
  To: pbonzini, rkrcmar, kvm, john.stultz, tglx
  Cc: mingo, hpa, linux-kernel, x86, rkagan, den

ping!

-- 
Best,
Denis


* Re: [PATCH v5 0/6] KVM: x86: get rid of pvclock_gtod_copy
  2017-09-11  9:24 ` [PATCH v5 0/6] KVM: x86: get rid of pvclock_gtod_copy Denis Plotnikov
@ 2017-09-18  7:35   ` Thomas Gleixner
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2017-09-18  7:35 UTC (permalink / raw)
  To: Denis Plotnikov
  Cc: pbonzini, rkrcmar, kvm, john.stultz, mingo, hpa, linux-kernel,
	x86, rkagan, den

On Mon, 11 Sep 2017, Denis Plotnikov wrote:

> ping!

I'll have a look next week as I'm not much around this week.

Thanks,

	tglx


* Re: [PATCH v5 1/6] timekeeper: introduce extended clocksource reading callback
  2017-08-30 15:23 ` [PATCH v5 1/6] timekeeper: introduce extended clocksource reading callback Denis Plotnikov
@ 2017-09-25 22:00   ` Thomas Gleixner
  2017-09-26 16:51     ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2017-09-25 22:00 UTC (permalink / raw)
  To: Denis Plotnikov
  Cc: pbonzini, rkrcmar, kvm, john.stultz, mingo, hpa, linux-kernel,
	x86, rkagan, den

On Wed, 30 Aug 2017, Denis Plotnikov wrote:

> The callback provides extended information about the clocksource
> value that has just been read.
> 
> It's going to be used in cases when detailed system information is
> needed for further calculation of time related values, e.g. in the
> KVM masterclock settings calculation.
> 
> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
> ---
>  include/linux/clocksource.h | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> index a78cb18..786a522 100644
> --- a/include/linux/clocksource.h
> +++ b/include/linux/clocksource.h
> @@ -48,7 +48,14 @@ struct module;
>   *			400-499: Perfect
>   *				The ideal clocksource. A must-use where
>   *				available.
> - * @read:		returns a cycle value, passes clocksource as argument
> + * @read:		returns a cycle value (might be not quite cycles:
> + *			see pvclock) passes clocksource as argument
> + * @read_with_stamp:	saves cycles value (got from timekeeper) and cycles
> + *			stamp (got from hardware counter value and used by
> + *			timekeeper to calculate the cycles value) to
> + *			corresponding input pointers return true if cycles
> + *			stamp holds real cycles and false if it holds some
> + *			time derivative value

Neither the changelog nor this comment makes any sense. Not to mention
the actual TSC side implementation, which does the same as read_tsc() - it
actually uses read_tsc() - but stores the same value twice and
unconditionally returns true.

I have no idea why you need this extra voodoo function if you can achieve
the same thing with a simple property bit in clocksource->flags.

Neither do I understand the rest of the magic you introduce in the snapshot
code:

> +               if (clock->read_with_stamp)
> +                       systime_snapshot->cycles_valid =
> +                               clock->read_with_stamp(
> +                                       clock, &now, &systime_snapshot->cycles);
> +               else {
> +                       systime_snapshot->cycles_valid = false;
> +                       now = clock->read(clock);
> +                       systime_snapshot->cycles = now;
> +               }

The only difference between the two code paths is the effect on
systime_snapshot->cycles_valid. The explanation of that bit does not make
much sense either.

+ * @cycles_valid:      The flag is true when @cycles contain actual
+ *                     number of cycles instead some cycle derivative

Why the heck would cycles be something different than what clock->read()
returns?

What you really want to convey is the information whether

     now = tk_clock_read(&tk->tkr_mono);

is a value which you can use for your pvclock correlation or not.

Unless I'm missing something important all of this can be achieved with a
minimal and actually understandable patch. See below.

Thanks,

	tglx
	
8<------------------
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1045,7 +1045,8 @@ static struct clocksource clocksource_ts
 	.read                   = read_tsc,
 	.mask                   = CLOCKSOURCE_MASK(64),
 	.flags                  = CLOCK_SOURCE_IS_CONTINUOUS |
-				  CLOCK_SOURCE_MUST_VERIFY,
+				  CLOCK_SOURCE_MUST_VERIFY |
+				  CLOCK_SOURCE_VALID_FOR_PVCLOCK_UPDATE,
 	.archdata               = { .vclock_mode = VCLOCK_TSC },
 	.resume			= tsc_resume,
 	.mark_unstable		= tsc_cs_mark_unstable,
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -118,7 +118,9 @@ struct clocksource {
 #define CLOCK_SOURCE_VALID_FOR_HRES		0x20
 #define CLOCK_SOURCE_UNSTABLE			0x40
 #define CLOCK_SOURCE_SUSPEND_NONSTOP		0x80
-#define CLOCK_SOURCE_RESELECT			0x100
+#define CLOCK_SOURCE_VALID_FOR_PVCLOCK_UPDATE	0x100
+
+#define CLOCK_SOURCE_RESELECT			0x200
 
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) GENMASK_ULL((bits) - 1, 0)
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -285,6 +285,8 @@ extern void ktime_get_raw_and_real_ts64(
  * @raw:	Monotonic raw system time
  * @clock_was_set_seq:	The sequence number of clock was set events
  * @cs_was_changed_seq:	The sequence number of clocksource change events
+ * @valid_for_pvclock_update: @cycles is from a clocksource which
+ *			       can be used for pvclock updates
  */
 struct system_time_snapshot {
 	u64		cycles;
@@ -292,6 +294,7 @@ struct system_time_snapshot {
 	ktime_t		raw;
 	unsigned int	clock_was_set_seq;
 	u8		cs_was_changed_seq;
+	bool		valid_for_pvclock_update;
 };
 
 /*
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -948,7 +948,7 @@ time64_t __ktime_get_real_seconds(void)
 void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
 {
 	struct timekeeper *tk = &tk_core.timekeeper;
-	unsigned long seq;
+	unsigned long seq, clk_flags;
 	ktime_t base_raw;
 	ktime_t base_real;
 	u64 nsec_raw;
@@ -960,6 +960,7 @@ void ktime_get_snapshot(struct system_ti
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
 		now = tk_clock_read(&tk->tkr_mono);
+		clk_flags = tk->tkr_mono.clock->flags;
 		systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
 		systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
 		base_real = ktime_add(tk->tkr_mono.base,
@@ -970,6 +971,8 @@ void ktime_get_snapshot(struct system_ti
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
 	systime_snapshot->cycles = now;
+	systime_snapshot->valid_for_pvclock_update =
+		clk_flags & CLOCK_SOURCE_VALID_FOR_PVCLOCK_UPDATE;
 	systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
 	systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
 }


* Re: [PATCH v5 1/6] timekeeper: introduce extended clocksource reading callback
  2017-09-25 22:00   ` Thomas Gleixner
@ 2017-09-26 16:51     ` Paolo Bonzini
  2017-09-27  8:52       ` Thomas Gleixner
  2017-09-27  9:18       ` Thomas Gleixner
  0 siblings, 2 replies; 17+ messages in thread
From: Paolo Bonzini @ 2017-09-26 16:51 UTC (permalink / raw)
  To: Thomas Gleixner, Denis Plotnikov
  Cc: rkrcmar, kvm, john.stultz, mingo, hpa, linux-kernel, x86, rkagan, den

On 26/09/2017 00:00, Thomas Gleixner wrote:
>> + * @read_with_stamp:	saves cycles value (got from timekeeper) and cycles
>> + *			stamp (got from hardware counter value and used by
>> + *			timekeeper to calculate the cycles value) to
>> + *			corresponding input pointers return true if cycles
>> + *			stamp holds real cycles and false if it holds some
>> + *			time derivative value
>
> Neither the changelog nor this comment makes any sense. Not to mention
> the actual TSC side implementation, which does the same as read_tsc() - it
> actually uses read_tsc() - but stores the same value twice and
> unconditionally returns true.
> 
> Unless I'm missing something important all of this can be achieved with a
> minimal and actually understandable patch. See below.

If that is acceptable, that's certainly okay for us too as a start, in
order to clean up the KVM code.

*However*, I must once more ask for help understanding
ktime_get_snapshot, otherwise there is no way that I can start using it
in kvmclock.

In particular I'd like to understand the contract of ktime_get_snapshot,
namely the meaning of the "cycles" field in struct system_time_snapshot.

Does it have to be system clock cycles, like TSC, or can it be just the
value of the clock source?  And if the latter, are the users broken,
because they can receive any random clocksource output in ->cycles?  It
doesn't help that, for all of the users of ->cycles (which are all
reached from get_device_system_crosststamp), the code that actually uses
the member is never called in the current git master.

I asked these questions to John in the previous review, but got no
answer.  My understanding is that get_device_system_crosststamp is
broken for !tsc clocksource.  Because struct system_counterval_t must
have a TSC value, history_begin needs to contain a cross-timestamp of
the TSC (in ->cycles) and the clock (in ->real and ->raw).  And if this
cross-timestamp is not available, get_device_system_crosststamp needs to
know, so that it doesn't use history_begin at all.

Now, such cross-timestamp functionality is exactly what
"read_with_stamp" provides.  Patch 3 also introduces "->cycles_valid" in
struct system_time_snapshot, to report whether the cross-timestamp is
available.  In the case of the TSC clocksource the two values are of
course the same, and always available (so the function always returns
true).

However, if get_device_system_crosststamp ran with kvmclock or Hyper-V
clocksource, the two values stored by read_with_stamp would be
different, basically a (TSC, nanoseconds) pair.  ktime_get_snapshot
would then return the TSC in ->cycles, and use the nanoseconds value to
compute ->real and ->raw.  Because the cross timestamp is available,
->cycles_valid would be true.
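
For concreteness, a kvmclock implementation of it might look something
like this sketch (pvclock_read_cycles_and_tsc() is a made-up helper name
for reading the pvclock value together with the TSC it was derived from):

static bool kvm_clock_read_with_stamp(struct clocksource *cs,
				      u64 *cycles, u64 *cycles_stamp)
{
	u64 tsc;

	/* the pvclock reading, i.e. the "time derivative" value */
	*cycles = pvclock_read_cycles_and_tsc(&tsc);
	/* the TSC value the pvclock reading was derived from */
	*cycles_stamp = tsc;
	/* the cross-timestamp is available */
	return true;
}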

So, if history_begin _is_ broken, after fixing it (with
"read_with_stamp" or otherwise) ktime_get_snapshot would provide KVM
with everything it needs.  If not, I'd be completely off base, and I'd
have to apologize to Denis for screwing up.

Paolo


* Re: [PATCH v5 1/6] timekeeper: introduce extended clocksource reading callback
  2017-09-26 16:51     ` Paolo Bonzini
@ 2017-09-27  8:52       ` Thomas Gleixner
  2017-09-27 10:43         ` Paolo Bonzini
  2017-09-27  9:18       ` Thomas Gleixner
  1 sibling, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2017-09-27  8:52 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Denis Plotnikov, rkrcmar, kvm, john.stultz, mingo, hpa,
	linux-kernel, x86, rkagan, den

On Tue, 26 Sep 2017, Paolo Bonzini wrote:
> On 26/09/2017 00:00, Thomas Gleixner wrote:
> >> + * @read_with_stamp:	saves cycles value (got from timekeeper) and cycles
> >> + *			stamp (got from hardware counter value and used by
> >> + *			timekeeper to calculate the cycles value) to
> >> + *			corresponding input pointers return true if cycles
> >> + *			stamp holds real cycles and false if it holds some
> >> + *			time derivative value
> >
> > Neither the changelog nor this comment makes any sense. Not to mention
> > the actual TSC side implementation, which does the same as read_tsc() - it
> > actually uses read_tsc() - but stores the same value twice and
> > unconditionally returns true.
> > 
> > Unless I'm missing something important all of this can be achieved with a
> > minimal and actually understandable patch. See below.
> 
> If that is acceptable, that's certainly okay for us too as a start, in
> order to clean up the KVM code.
> 
> *However*, I must once more ask for help understanding
> ktime_get_snapshot, otherwise there is no way that I can start using it
> in kvmclock.
> 
> In particular I'd like to understand the contract of ktime_get_snapshot,
> namely the meaning of the "cycles" field in struct system_time_snapshot.
> 
> Does it have to be system clock cycles, like TSC, or can it be just the
> value of the clock source?  And if the latter, are the users broken,
> because they can receive any random clocksource output in ->cycles?  It
> doesn't help that, for all of the users of ->cycles (which are all
> reached from get_device_system_crosststamp), the code that actually uses
> the member is never called in the current git master.

stamp->cycles is always the raw counter value retrieved via
clock->read(). clock is the current clock source.

> I asked these questions to John in the previous review, but got no
> answer.  My understanding is that get_device_system_crosststamp is
> broken for !tsc clocksource.
>
> Because struct system_counterval_t must have a TSC value, history_begin
> needs to contain a cross-timestamp of the TSC (in ->cycles) and the clock
> (in ->real and ->raw).  And if this cross-timestamp is not available,
> get_device_system_crosststamp needs to know, so that it doesn't use
> history_begin at all.

Forget about TSC. TSC is one particular use case.

What's required here are two clocks which are strictly coupled, i.e. have a
constant frequency ratio and a constant offset. And that is enforced:

                /*
                 * Verify that the clocksource associated with the captured
                 * system counter value is the same as the currently installed
                 * timekeeper clocksource
                 */
                if (tk->tkr_mono.clock != system_counterval.cs)
                        return -ENODEV;

And that's not broken, that merely makes sure that the function CAN provide
valid data and is not trying to correlate stuff which is not coupled.

> Now, such cross-timestamp functionality is exactly what
> "read_with_stamp" provides.

No, it's a hack.

> So, if history_begin _is_ broken, after fixing it (with
> "read_with_stamp" or otherwise) ktime_get_snapshot would provide KVM
> with everything it needs.  If not, I'd be completely off base, and I'd
> have to apologize to Denis for screwing up.

I told you folks before that you are trying to abuse infrastructure which
was designed for a different purpose. Now you're claiming it's broken. It's
not broken, it works perfectly fine for the intended use cases.

It correlates hardware captured timestamps with system time. One particular
use case is high precision PTP.

The network card can capture both PTP time and ART time atomically. ART and
TSC have a constant ratio and constant offset. The cross time stamp
machinery relates those values, which might be outside of the current
timekeeping window, back to CLOCK_MONOTONIC and CLOCK_REALTIME.

But there is no requirement that the current clocksource is TSC. The
requirement is:

  The hardware reference clock, the one which can be captured atomically
  with the device clock (PTP, audio, whatever), is the coupled clock of the
  current timekeeping clocksource. Both have a fixed ratio and offset.

That's completely independent of TSC. TSC/ART are a particular hardware
implementation which can use that infrastructure because they fulfil the
requirement.

So please stop these uninformed claims about brokenness and TSC
requirements. Instead please sit down and figure out whether your
particular use case of kvmclock/hyperv clock actually fits into that
functionality, i.e. whether you have

    1) 'device time'
    2) 'system reference time'
    3) 'system time'

where

    #1 and #2 can be captured atomically

    #2 and #3 are coupled clocks with a fixed ratio and offset

If those requirements are fulfilled then you can simply use it as is and it
will give you CLOCK_MONOTONIC and CLOCK_REALTIME for the captured device
time.

If your use case is different and does not fit into that picture then
please write it up clearly what you are trying to achieve and we can
discuss how to make it work w/o adding duct tape hackery.

Thanks,

	tglx


* Re: [PATCH v5 1/6] timekeeper: introduce extended clocksource reading callback
  2017-09-26 16:51     ` Paolo Bonzini
  2017-09-27  8:52       ` Thomas Gleixner
@ 2017-09-27  9:18       ` Thomas Gleixner
  1 sibling, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2017-09-27  9:18 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Denis Plotnikov, rkrcmar, kvm, john.stultz, mingo, hpa,
	linux-kernel, x86, rkagan, den

On Tue, 26 Sep 2017, Paolo Bonzini wrote:
> However, if get_device_system_crosststamp ran with kvmclock or Hyper-V
> clocksource, the two values stored by read_with_stamp would be
> different, basically a (TSC, nanoseconds) pair.

And that's exactly the problem. The cross time stamp infrastructure keeps
the system clocksource (e.g. TSC) and the reference clock (e.g. ART)
separate. There is enforcement that the two are coupled which I pointed out
in the other mail, but conceptually they are different things and we are
not going to change that and add voodoo callbacks which return different
timestamps from different sources into the clocksource.

So if you want to map kvmclock to the existing infrastructure then:

   device clock:  	       maps to the hypervisor clock
   system reference clock:     maps to REF_TSC
   system timekeeper clock:    maps to TSC

Even if REF_TSC and TSC are physically the same, conceptually they are
different.

To make use of the existing infrastructure you have to provide a means to
capture the HV clock and the reference clock atomically. That's probably
something the hypervisor provides as a value pair HVCLOCK_NSEC, REF_TSC_VAL.

And then you can correlate system time to HVCLOCK_NSEC via REF_TSC_VAL
because that has a fixed ratio / offset to the system TSC.
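
Something along these lines, as a sketch only - hv_read_clock_pair() is a
placeholder for whatever mechanism hands that pair back atomically:

static int hvclock_get_time_fn(ktime_t *device_time,
			       struct system_counterval_t *system_counter,
			       void *ctx)
{
	u64 hv_ns, ref_tsc;

	if (hv_read_clock_pair(&hv_ns, &ref_tsc))
		return -EOPNOTSUPP;

	/* device time: the hypervisor clock value */
	*device_time = ns_to_ktime(hv_ns);
	/* system reference time: the TSC value the pair was taken at */
	system_counter->cycles = ref_tsc;
	/* must be the current timekeeping clocksource (tk->tkr_mono.clock),
	 * i.e. the system TSC, otherwise the correlation is rejected */
	system_counter->cs = &clocksource_tsc;

	return 0;
}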

Thanks,

	tglx


* Re: [PATCH v5 1/6] timekeeper: introduce extended clocksource reading callback
  2017-09-27  8:52       ` Thomas Gleixner
@ 2017-09-27 10:43         ` Paolo Bonzini
  2017-09-27 11:53           ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2017-09-27 10:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Denis Plotnikov, rkrcmar, kvm, john.stultz, mingo, hpa,
	linux-kernel, x86, rkagan, den

On 27/09/2017 10:52, Thomas Gleixner wrote:
> But there is no requirement that the current clocksource is TSC. The
> requirement is:
> 
>   The hardware reference clock, the one which can be captured atomically
>   with the device clock (PTP, audio, whatever), is the coupled clock of the
>   current timekeeping clocksource. Both have a fixed ratio and offset.
> 
> That's completely independent of TSC. TSC/ART are a particular hardware
> implementation which can use that infrastructure because they fulfil the
> requirement.

Ok, this helps.

> So please stop these uninformed claims about brokenness and TSC
> requirements.

It was a question, not an uninformed claim.  You answered the question now.

> Instead please sit down and figure out whether your
> particular use case of kvmclock/hyperv clock actually fit into that
> functionality, i.e. whether you have
> 
>     1) 'device time'
>     2) 'system reference time'
>     3) 'system time'
> 
> where
> 
>     #1 and #2 can be captured atomically
> 
>     #2 and #3 are coupled clocks with a fixed ratio and offset
> 
> If those requirements are fulfilled then you can simply use it as is and it
> will give you CLOCK_MONOTONIC and CLOCK_REALTIME for the captured device
> time.
> 
> If your use case is different and does not fit into that picture then
> please write it up clearly what you are trying to achieve and we can
> discuss how to make it work w/o adding duct tape hackery.

Yes, I understand better now why you consider read_with_stamp a hack.
And it is---but I was confused and couldn't think of anything better.

The definitions do fit KVM, and indeed there is ptp-kvm that does
something very similar to what you describe in the other mail.  We have:

	#1 is host time
	#2 is host TSC
	#3 is guest TSC

We know that #2 and #3 have a fixed ratio and offset.  The point is
whether #1 and #2 can be captured atomically.

For PTP-KVM, the host tells the guest; if capturing the two is
impossible, it fails the hypercall and ioctl(PTP_SYS_OFFSET_PRECISE)
fails too.

Right now, the hypercall fails if the host clocksource is not the TSC
clocksource, which is safe.

These patches are about ascertaining whether #1 and #2 can be captured
atomically in a more generic way.  In the read_with_stamp case:

- if it returns true, it gives an atomic reading of #1 and #2

- if it returns false, it gives a reading of #1 only.


I think the hook should be specific to x86.  For example it could be an
array of function pointers, indexed by vclock_mode, with the same
semantics as read_with_stamp.
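
Roughly something like this (nothing below exists; the callback names and
the exact signature are my guesses, only the VCLOCK_* constants are the
existing x86 ones), with the read_with_stamp semantics described above:

        /* hypothetical per-vclock-mode hooks */
        typedef bool (*read_with_stamp_fn)(u64 *ns, u64 *cycles);

        static const read_with_stamp_fn read_with_stamp_fns[] = {
                [VCLOCK_TSC]     = tsc_read_with_stamp,
                [VCLOCK_PVCLOCK] = pvclock_read_with_stamp,
                [VCLOCK_HVCLOCK] = hvclock_read_with_stamp,
        };

        /*
         * true:  *ns and *cycles were captured atomically (#1 and #2)
         * false: only *ns is meaningful (#1 only)
         */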

Paolo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 1/6] timekeeper: introduce extended clocksource reading callback
  2017-09-27 10:43         ` Paolo Bonzini
@ 2017-09-27 11:53           ` Thomas Gleixner
  2017-09-27 12:14             ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2017-09-27 11:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Denis Plotnikov, rkrcmar, kvm, john.stultz, mingo, hpa,
	linux-kernel, x86, rkagan, den

On Wed, 27 Sep 2017, Paolo Bonzini wrote:
> The definitions do fit KVM, and indeed there is ptp-kvm that does
> something very similar to what you describe in the other mail.  We have:
> 
> 	#1 is host time
> 	#2 is host TSC
> 	#3 is guest TSC
> 
> We know that #2 and #3 have a fixed ratio and offset.  The point is
> whether #1 and #2 can be captured atomically.
> 
> For PTP-KVM, the host tells the guest; if capturing the two is
> impossible, it fails the hypercall and ioctl(PTP_SYS_OFFSET_PRECISE)
> fails too.
> 
> Right now, the hypercall fails if the host clocksource is not the TSC
> clocksource, which is safe.
> 
> These patches are about ascertaining whether #1 and #2 can be captured
> atomically in a more generic way.  In the read_with_stamp case:
> 
> - if it returns true, it gives an atomic reading of #1 and #2
> 
> - if it returns false, it gives a reading of #1 only.
> 
> 
> I think the hook should be specific to x86.  For example it could be an
> array of function pointers, indexed by vclock_mode, with the same
> semantics as read_with_stamp.

I don't think you need that.

The get_time_fn() which is handed in to get_device_system_crosststamp() can
convey that information:

                /*
                 * Try to synchronously capture device time and a system
                 * counter value calling back into the device driver
                 */
                ret = get_time_fn(&xtstamp->device, &system_counterval, ctx);
                if (ret)
                        return ret;

So in your case get_time_fn() would be kvmclock or hyperv clock specific
and the actual hypercall implementation can return a failure code if the
requirements are not met:

   1) host clock source is TSC
   2) capturing of host time and TSC is atomic

The host side of the hypercall can use ktime_get_snapshot() with the boot
time extension and the extra bits I posted yesterday, and fail if the
snapshot is not valid for that purpose, i.e. the host clocksource is not
valid for guest correlation.
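
Something along these lines; purely a sketch, the snapshot 'boot' field and
the validity check are the proposed extensions and do not exist today, and
the pairing structure, its fields and the helper names are made up:

        /* host side of the hypercall, sketch only */
        static int fill_clock_pairing(struct clock_pairing *p)
        {
                struct system_time_snapshot snap;

                ktime_get_snapshot(&snap);

                /* proposed: bail out if the snapshot cycles are not usable
                 * for guest correlation (host clocksource != TSC) */
                if (!snapshot_valid_for_guest(&snap))
                        return -EOPNOTSUPP;

                p->host_tsc    = snap.cycles;                /* #2 */
                p->boot_ns     = ktime_to_ns(snap.boot);     /* #1, proposed */
                p->realtime_ns = ktime_to_ns(snap.real);
                return 0;
        }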

The hypercall side on the guest can then do:

	if (hypercall(hypercalldata))
		return -ENODEV;
	system_counterval->cs = tsc_clocksource;
	system_counterval->data = hypercalldata;
	return 0;

Then the following check in get_device_system_crosststamp() will do the
last sanity check:

		/*
		 * Verify that the clocksource associated with the captured
		 * system counter value is the same as the currently installed
		 * timekeeper clocksource
		 */
		if (tk->tkr_mono.clock != system_counterval.cs)
			return -ENODEV;

which catches the case where the guest side clock source is !TSC.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 1/6] timekeeper: introduce extended clocksource reading callback
  2017-09-27 11:53           ` Thomas Gleixner
@ 2017-09-27 12:14             ` Paolo Bonzini
  2017-09-27 13:45               ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2017-09-27 12:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Denis Plotnikov, rkrcmar, kvm, john.stultz, mingo, hpa,
	linux-kernel, x86, rkagan, den

On 27/09/2017 13:53, Thomas Gleixner wrote:
>> I think the hook should be specific to x86.  For example it could be an
>> array of function pointers, indexed by vclock_mode, with the same
>> semantics as read_with_stamp.
> I don't think you need that.
> 
> The get_time_fn() which is handed in to get_device_system_crosststamp() can
> convey that information:
> 
>                 /*
>                  * Try to synchronously capture device time and a system
>                  * counter value calling back into the device driver
>                  */
>                 ret = get_time_fn(&xtstamp->device, &system_counterval, ctx);
>                 if (ret)
>                         return ret;
> 
> So in your case get_time_fn() would be kvmclock or hyperv clock specific
> and the actual hypercall implementation can return a failure code if the
> requirements are not met:
> 
>    1) host clock source is TSC
>    2) capturing of host time and TSC is atomic

So you are suggesting reusing the cross-timestamp hypercall to implement
nested pvclock.  There are advantages and disadvantages to that.

With read_with_stamp-like callbacks:

+ running on old KVM or on Hyper-V is supported
- pvclock_gtod_copy does not go away

With hypercall-based callbacks on the contrary:

+ KVM can use ktime_get_snapshot for the bare metal case
- only very new KVM is supported

Paolo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 1/6] timekeeper: introduce extended clocksource reading callback
  2017-09-27 12:14             ` Paolo Bonzini
@ 2017-09-27 13:45               ` Thomas Gleixner
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2017-09-27 13:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Denis Plotnikov, rkrcmar, kvm, john.stultz, mingo, hpa,
	linux-kernel, x86, rkagan, den

On Wed, 27 Sep 2017, Paolo Bonzini wrote:
> On 27/09/2017 13:53, Thomas Gleixner wrote:
> >> I think the hook should be specific to x86.  For example it could be an
> >> array of function pointers, indexed by vclock_mode, with the same
> >> semantics as read_with_stamp.
> > I don't think you need that.
> > 
> > The get_time_fn() which is handed in to get_device_system_crosststamp() can
> > convey that information:
> > 
> >                 /*
> >                  * Try to synchronously capture device time and a system
> >                  * counter value calling back into the device driver
> >                  */
> >                 ret = get_time_fn(&xtstamp->device, &system_counterval, ctx);
> >                 if (ret)
> >                         return ret;
> > 
> > So in your case get_time_fn() would be kvmclock or hyperv clock specific
> > and the actual hypercall implementation can return a failure code if the
> > requirements are not met:
> > 
> >    1) host clock source is TSC
> >    2) capturing of host time and TSC is atomic
> 
> So you are suggesting reusing the cross-timestamp hypercall to implement
> nested pvclock.  There are advantages and disadvantages to that.
> 
> With read_with_stamp-like callbacks:
> 
> + running on old KVM or on Hyper-V is supported
> - pvclock_gtod_copy does not go away
> 
> With hypercall-based callbacks on the contrary:
> 
> + KVM can use ktime_get_snapshot for the bare metal case
> - only very new KVM is supported

I don't think it's an either/or decision.

  get_device_system_crosststamp(get_time_fn, ......)

So you can have specific get_time_fn() implementations for your situation:

   old_kvm_fn()
	retrieve data from pvclock_gtod copy

   new_kvm_fn()
	use hypercall

   hyperv_fn()
	do what must be done

All implementations need a way to tell you:

    1) Host time
    2) Host TSC timestamp which corresponds to #1
    3) Validity

       For old_kvm_fn() pvclock_gtod_data.clock.vclock_mode == VCLOCK_TSC
       For new_kvm_fn() hypercall result
       For hyperv_fn() whatever it takes
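
Plugged into the core that boils down to a trivial selection; everything
here except the three functions above and get_device_system_crosststamp()
is made up:

        /* sketch: pick the fitting get_time_fn() and hand it to the core */
        static int get_host_crosststamp(struct system_device_crosststamp *xt)
        {
                int (*fn)(ktime_t *, struct system_counterval_t *, void *);

                if (running_on_hyperv())                /* illustrative */
                        fn = hyperv_fn;
                else if (clock_pairing_available())     /* illustrative */
                        fn = new_kvm_fn;
                else
                        fn = old_kvm_fn;        /* needs VCLOCK_TSC, see #3 */

                return get_device_system_crosststamp(fn, NULL, NULL, xt);
        }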

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-09-27 13:45 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-30 15:23 [PATCH v5 0/6] KVM: x86: get rid of pvclock_gtod_copy Denis Plotnikov
2017-08-30 15:23 ` [PATCH v5 1/6] timekeeper: introduce extended clocksource reading callback Denis Plotnikov
2017-09-25 22:00   ` Thomas Gleixner
2017-09-26 16:51     ` Paolo Bonzini
2017-09-27  8:52       ` Thomas Gleixner
2017-09-27 10:43         ` Paolo Bonzini
2017-09-27 11:53           ` Thomas Gleixner
2017-09-27 12:14             ` Paolo Bonzini
2017-09-27 13:45               ` Thomas Gleixner
2017-09-27  9:18       ` Thomas Gleixner
2017-08-30 15:23 ` [PATCH v5 2/6] timekeeper: introduce boot field in system_time_snapshot Denis Plotnikov
2017-08-30 15:23 ` [PATCH v5 3/6] timekeeper: use the extended reading function on snapshot acquiring Denis Plotnikov
2017-08-30 15:23 ` [PATCH v5 4/6] tsc: implement the extended tsc reading function Denis Plotnikov
2017-08-30 15:23 ` [PATCH v5 5/6] KVM: x86: switch to masterclock update using timekeeper functionality Denis Plotnikov
2017-08-30 15:23 ` [PATCH v5 6/6] KVM: x86: remove not used pvclock_gtod_copy Denis Plotnikov
2017-09-11  9:24 ` [PATCH v5 0/6] KVM: x86: get rid of pvclock_gtod_copy Denis Plotnikov
2017-09-18  7:35   ` Thomas Gleixner
