All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] trace-cmd: Align guest TSC calculation with the kernel
@ 2021-09-23  9:45 Tzvetomir Stoyanov (VMware)
  2021-09-23  9:45 ` [PATCH 1/3] trace-cmd: Extend host-guest time sync with fraction bits Tzvetomir Stoyanov (VMware)
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Tzvetomir Stoyanov (VMware) @ 2021-09-23  9:45 UTC (permalink / raw)
  To: rostedt; +Cc: linux-trace-devel

The formula for calculating guest TSC clock, used in the kernel is:
 guest_tsc = tsc-offset + (host_tsc * tsc-scaling-ratio) >> tsc-scaling-ratio-frac-bits
Current trace-cmd implementation does not take into account fraction
bits, when calculating guest clock offset in case of a clock scaling.
Additional logic is added, to use the same formula in trace-cmd. The changes
are backward compatible with trace-cmd 2.9.

The changes are not tested with a kernel that has guest clock scaling enabled.
The goal of the patch set is to extend the time sync option, saved in the
guest trace.dat file, with all needed data used in the kernel for guest TSC
calculations.

Tzvetomir Stoyanov (VMware) (3):
  trace-cmd: Extend host-guest time sync with fraction bits
  trace-cmd: Read and use fraction bits from TRACECMD_OPTION_TIME_SHIFT
  trace-cmd: Dump fraction bits from TRACECMD_OPTION_TIME_SHIFT

 .../include/private/trace-cmd-private.h       |   4 +-
 lib/trace-cmd/include/trace-tsync-local.h     |   7 +-
 lib/trace-cmd/trace-input.c                   |  79 +++++++-------
 lib/trace-cmd/trace-timesync-kvm.c            |  59 +++++++---
 lib/trace-cmd/trace-timesync-ptp.c            |   4 +-
 lib/trace-cmd/trace-timesync.c                |  49 +++++++--
 tracecmd/trace-dump.c                         | 102 +++++++++++++-----
 7 files changed, 205 insertions(+), 99 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] trace-cmd: Extend host-guest time sync with fraction bits
  2021-09-23  9:45 [PATCH 0/3] trace-cmd: Align guest TSC calculation with the kernel Tzvetomir Stoyanov (VMware)
@ 2021-09-23  9:45 ` Tzvetomir Stoyanov (VMware)
  2021-10-13  3:04   ` Steven Rostedt
  2021-09-23  9:45 ` [PATCH 2/3] trace-cmd: Read and use fraction bits from TRACECMD_OPTION_TIME_SHIFT Tzvetomir Stoyanov (VMware)
  2021-09-23  9:45 ` [PATCH 3/3] trace-cmd: Dump " Tzvetomir Stoyanov (VMware)
  2 siblings, 1 reply; 10+ messages in thread
From: Tzvetomir Stoyanov (VMware) @ 2021-09-23  9:45 UTC (permalink / raw)
  To: rostedt; +Cc: linux-trace-devel

The formula, used in the Linux kernel for guest TSC clock is:
 guest_tsc = tsc-offset + (host_tsc * tsc-scaling-ratio) >> tsc-scaling-ratio-frac-bits

The KVM hypervisor exports these parameters in the debugfs, if clock
scaling is supported:
 /sys/kernel/debug/kvm:
	tsc-offset
	tsc-scaling-ratio
	tsc-scaling-ratio-frac-bits

Current trace-cmd implementation does not take into account fraction
bits, when calculating guest clock offset in case of a clock scaling.
The TRACECMD_OPTION_TIME_SHIFT option is extended with per-cpu array
with fraction bits. KVM and PTP algorithms are extended to get fraction:
 - in PTP, there are fixed scaling=1 and fraction=0;
 - KVM gets these from KVM debugfs, if available.
The formula from the kernel is implemented in trace-cmd library, when
calculating timestamps of the guest events.

This change is backward compatible with trace-cmd 2.9, where the
TRACECMD_OPTION_TIME_SHIFT option is introduced.

Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
---
 .../include/private/trace-cmd-private.h       |  4 +-
 lib/trace-cmd/include/trace-tsync-local.h     |  7 ++-
 lib/trace-cmd/trace-timesync-kvm.c            | 59 ++++++++++++++-----
 lib/trace-cmd/trace-timesync-ptp.c            |  4 +-
 lib/trace-cmd/trace-timesync.c                | 49 +++++++++++----
 5 files changed, 93 insertions(+), 30 deletions(-)

diff --git a/lib/trace-cmd/include/private/trace-cmd-private.h b/lib/trace-cmd/include/private/trace-cmd-private.h
index c58b09e9..f40babef 100644
--- a/lib/trace-cmd/include/private/trace-cmd-private.h
+++ b/lib/trace-cmd/include/private/trace-cmd-private.h
@@ -454,8 +454,8 @@ tracecmd_tsync_with_guest(unsigned long long trace_id, int loop_interval,
 			  int guest_cpus, const char *proto_name, const char *clock);
 int tracecmd_tsync_with_guest_stop(struct tracecmd_time_sync *tsync);
 int tracecmd_tsync_get_offsets(struct tracecmd_time_sync *tsync, int cpu,
-				int *count, long long **ts,
-				long long **offsets, long long **scalings);
+			       int *count, long long **ts,
+			       long long **offsets, long long **scalings, long long **frac);
 int tracecmd_tsync_get_session_params(struct tracecmd_time_sync *tsync,
 				      char **selected_proto,
 				      unsigned int *tsync_port);
diff --git a/lib/trace-cmd/include/trace-tsync-local.h b/lib/trace-cmd/include/trace-tsync-local.h
index 7e56d187..b357662c 100644
--- a/lib/trace-cmd/include/trace-tsync-local.h
+++ b/lib/trace-cmd/include/trace-tsync-local.h
@@ -27,14 +27,15 @@ struct tracecmd_time_sync {
 struct clock_sync_offsets {
 	/* Arrays with calculated time offsets at given time */
 	int				sync_size;	/* Allocated size of sync_ts,
-							 * sync_offsets and sync_scalings
+							 * sync_offsets, sync_scalings and sync_frac
 							 */
 	int				sync_count;	/* Number of elements in sync_ts,
-							 * sync_offsets and sync_scalings
+							 * sync_offsets, sync_scalings and sync_frac
 							 */
 	long long			*sync_ts;
 	long long			*sync_offsets;
 	long long			*sync_scalings;
+	long long			*sync_frac;
 };
 
 struct clock_sync_context {
@@ -60,7 +61,7 @@ int tracecmd_tsync_proto_register(const char *proto_name, int accuracy, int role
 				  int (*init)(struct tracecmd_time_sync *),
 				  int (*free)(struct tracecmd_time_sync *),
 				  int (*calc)(struct tracecmd_time_sync *,
-					      long long *, long long *,
+					      long long *, long long *, long long*,
 					      long long *, unsigned int));
 int tracecmd_tsync_proto_unregister(char *proto_name);
 
diff --git a/lib/trace-cmd/trace-timesync-kvm.c b/lib/trace-cmd/trace-timesync-kvm.c
index 58c028ea..acadaca2 100644
--- a/lib/trace-cmd/trace-timesync-kvm.c
+++ b/lib/trace-cmd/trace-timesync-kvm.c
@@ -19,6 +19,7 @@
 #define KVM_DEBUG_FS "/sys/kernel/debug/kvm"
 #define KVM_DEBUG_OFFSET_FILE	"tsc-offset"
 #define KVM_DEBUG_SCALING_FILE	"tsc-scaling-ratio"
+#define KVM_DEBUG_FRACTION_FILE	"tsc-scaling-ratio-frac-bits"
 #define KVM_DEBUG_VCPU_DIR	"vcpu"
 
 /* default KVM scaling values, taken from the Linux kernel */
@@ -37,6 +38,7 @@ struct kvm_clock_sync {
 	int vcpu_count;
 	char **vcpu_offsets;
 	char **vcpu_scalings;
+	char **vcpu_frac;
 	int marker_fd;
 	struct tep_handle *tep;
 	int raw_id;
@@ -47,6 +49,7 @@ struct kvm_clock_offset_msg {
 	s64	ts;
 	s64	offset;
 	s64	scaling;
+	s64	frac;
 };
 
 static int read_ll_from_file(char *file, long long *res)
@@ -72,21 +75,27 @@ static int read_ll_from_file(char *file, long long *res)
 
 static bool kvm_scaling_check_vm_cpu(char *vname, char *cpu)
 {
-	long long scaling;
+	long long scaling, frac;
+	bool has_scaling = false;
+	bool has_frac = false;
 	char *path;
 	int ret;
 
 	if (asprintf(&path, "%s/%s/%s", vname, cpu, KVM_DEBUG_SCALING_FILE) < 0)
-		return true;
+		return false;
 	ret = read_ll_from_file(path, &scaling);
 	free(path);
+	if (!ret)
+		has_scaling = true;
+
+	if (asprintf(&path, "%s/%s/%s", vname, cpu, KVM_DEBUG_FRACTION_FILE) < 0)
+		return false;
+	ret = read_ll_from_file(path, &frac);
+	free(path);
+	if (!ret)
+		has_frac = true;
 
-	/*
-	 * If there is a scaling, different from the default -
-	 * return false, not supported.
-	 */
-	if (!ret &&
-	    scaling != KVM_SCALING_AMD_DEFAULT && scaling != KVM_SCALING_INTEL_DEFAULT)
+	if (has_scaling != has_frac)
 		return false;
 
 	return true;
@@ -175,6 +184,12 @@ static int kvm_open_vcpu_dir(struct kvm_clock_sync *kvm, int cpu, char *dir_str)
 					 dir_str, entry->d_name);
 				kvm->vcpu_scalings[cpu] = strdup(path);
 			}
+			if (!strncmp(entry->d_name, KVM_DEBUG_FRACTION_FILE,
+				     strlen(KVM_DEBUG_FRACTION_FILE))) {
+				snprintf(path, sizeof(path), "%s/%s",
+					 dir_str, entry->d_name);
+				kvm->vcpu_frac[cpu] = strdup(path);
+			}
 		}
 	}
 	if (!kvm->vcpu_offsets[cpu])
@@ -189,6 +204,8 @@ error:
 	kvm->vcpu_offsets[cpu] = NULL;
 	free(kvm->vcpu_scalings[cpu]);
 	kvm->vcpu_scalings[cpu] = NULL;
+	free(kvm->vcpu_frac[cpu]);
+	kvm->vcpu_frac[cpu] = NULL;
 	return -1;
 }
 
@@ -254,7 +271,8 @@ static int kvm_clock_sync_init_host(struct tracecmd_time_sync *tsync,
 	kvm->vcpu_count = tsync->vcpu_count;
 	kvm->vcpu_offsets = calloc(kvm->vcpu_count, sizeof(char *));
 	kvm->vcpu_scalings = calloc(kvm->vcpu_count, sizeof(char *));
-	if (!kvm->vcpu_offsets || !kvm->vcpu_scalings)
+	kvm->vcpu_frac = calloc(kvm->vcpu_count, sizeof(char *));
+	if (!kvm->vcpu_offsets || !kvm->vcpu_scalings || !kvm->vcpu_frac)
 		goto error;
 	if (kvm_open_debug_files(kvm, tsync->guest_pid) < 0)
 		goto error;
@@ -263,6 +281,7 @@ static int kvm_clock_sync_init_host(struct tracecmd_time_sync *tsync,
 error:
 	free(kvm->vcpu_offsets);
 	free(kvm->vcpu_scalings);
+	free(kvm->vcpu_frac);
 	return -1;
 }
 
@@ -353,6 +372,8 @@ static int kvm_clock_sync_free(struct tracecmd_time_sync *tsync)
 			kvm->vcpu_offsets[i] = NULL;
 			free(kvm->vcpu_scalings[i]);
 			kvm->vcpu_scalings[i] = NULL;
+			free(kvm->vcpu_frac[i]);
+			kvm->vcpu_frac[i] = NULL;
 		}
 		if (kvm->tep)
 			tep_free(kvm->tep);
@@ -364,7 +385,7 @@ static int kvm_clock_sync_free(struct tracecmd_time_sync *tsync)
 }
 
 static int kvm_clock_host(struct tracecmd_time_sync *tsync,
-			  long long *offset, long long *scaling,
+			  long long *offset, long long *scaling, long long *frac,
 			  long long *timestamp, unsigned int cpu)
 {
 	char sync_proto[TRACECMD_TSYNC_PNAME_LENGTH];
@@ -374,6 +395,7 @@ static int kvm_clock_host(struct tracecmd_time_sync *tsync,
 	long long kvm_scaling = 1;
 	unsigned int sync_msg;
 	long long kvm_offset;
+	long long kvm_frac;
 	unsigned int size;
 	char *msg;
 	int ret;
@@ -388,12 +410,18 @@ static int kvm_clock_host(struct tracecmd_time_sync *tsync,
 	ret = read_ll_from_file(kvm->vcpu_offsets[cpu], &kvm_offset);
 	if (ret < 0)
 		return -1;
+
+	kvm_scaling = 1;
 	if (kvm->vcpu_scalings && kvm->vcpu_scalings[cpu]) {
 		read_ll_from_file(kvm->vcpu_scalings[cpu], &kvm_scaling);
 		if (kvm_scaling == KVM_SCALING_AMD_DEFAULT ||
 		    kvm_scaling == KVM_SCALING_INTEL_DEFAULT)
 			kvm_scaling = 1;
 	}
+
+	kvm_frac = 0;
+	if (kvm->vcpu_frac && kvm->vcpu_frac[cpu])
+		ret = read_ll_from_file(kvm->vcpu_frac[cpu], &kvm_frac);
 	msg = (char *)&packet;
 	size = sizeof(packet);
 	ret = tracecmd_msg_recv_time_sync(tsync->msg_handle,
@@ -405,6 +433,7 @@ static int kvm_clock_host(struct tracecmd_time_sync *tsync,
 
 	packet.offset = -kvm_offset;
 	packet.scaling = kvm_scaling;
+	packet.frac = kvm_frac;
 	ret = tracecmd_msg_send_time_sync(tsync->msg_handle, KVM_NAME,
 					  KVM_SYNC_PKT_RESPONSE, sizeof(packet),
 					  (char *)&packet);
@@ -413,6 +442,7 @@ static int kvm_clock_host(struct tracecmd_time_sync *tsync,
 
 	*scaling = packet.scaling;
 	*offset = packet.offset;
+	*frac = kvm_frac;
 	*timestamp = packet.ts;
 
 	return 0;
@@ -444,10 +474,10 @@ static int kvm_marker_find(struct tep_event *event, struct tep_record *record,
 	return 0;
 }
 
-
 static int kvm_clock_guest(struct tracecmd_time_sync *tsync,
 			   long long *offset,
 			   long long *scaling,
+			   long long *frac,
 			   long long *timestamp)
 {
 	char sync_proto[TRACECMD_TSYNC_PNAME_LENGTH];
@@ -488,12 +518,13 @@ static int kvm_clock_guest(struct tracecmd_time_sync *tsync,
 
 	*scaling = packet.scaling;
 	*offset = packet.offset;
+	*frac = packet.frac;
 	*timestamp = packet.ts;
 	return 0;
 }
 
 static int kvm_clock_sync_calc(struct tracecmd_time_sync *tsync,
-			       long long *offset, long long *scaling,
+			       long long *offset, long long *scaling, long long *frac,
 			       long long *timestamp, unsigned int cpu)
 {
 	struct clock_sync_context *clock_context;
@@ -505,9 +536,9 @@ static int kvm_clock_sync_calc(struct tracecmd_time_sync *tsync,
 	clock_context = (struct clock_sync_context *)tsync->context;
 
 	if (clock_context->is_guest)
-		ret = kvm_clock_guest(tsync, offset, scaling, timestamp);
+		ret = kvm_clock_guest(tsync, offset, scaling, frac, timestamp);
 	else
-		ret = kvm_clock_host(tsync, offset, scaling, timestamp, cpu);
+		ret = kvm_clock_host(tsync, offset, scaling, frac, timestamp, cpu);
 	return ret;
 }
 
diff --git a/lib/trace-cmd/trace-timesync-ptp.c b/lib/trace-cmd/trace-timesync-ptp.c
index b05f1cd0..70242ee3 100644
--- a/lib/trace-cmd/trace-timesync-ptp.c
+++ b/lib/trace-cmd/trace-timesync-ptp.c
@@ -663,7 +663,7 @@ static int ptp_clock_server(struct tracecmd_time_sync *tsync,
 }
 
 static int ptp_clock_sync_calc(struct tracecmd_time_sync *tsync,
-			       long long *offset, long long *scaling,
+			       long long *offset, long long *scaling, long long *frac,
 			       long long *timestamp, unsigned int cpu)
 {
 	struct clock_sync_context *clock_context;
@@ -689,6 +689,8 @@ static int ptp_clock_sync_calc(struct tracecmd_time_sync *tsync,
 
 	if (scaling)
 		*scaling = 1;
+	if (frac)
+		*frac = 0;
 	if (clock_context->is_server)
 		ret = ptp_clock_server(tsync, offset, timestamp);
 	else
diff --git a/lib/trace-cmd/trace-timesync.c b/lib/trace-cmd/trace-timesync.c
index 19ca19d7..298913a1 100644
--- a/lib/trace-cmd/trace-timesync.c
+++ b/lib/trace-cmd/trace-timesync.c
@@ -36,7 +36,7 @@ struct tsync_proto {
 	int (*clock_sync_init)(struct tracecmd_time_sync *clock_context);
 	int (*clock_sync_free)(struct tracecmd_time_sync *clock_context);
 	int (*clock_sync_calc)(struct tracecmd_time_sync *clock_context,
-			       long long *offset, long long *scaling,
+			       long long *offset, long long *scaling, long long *frac,
 			       long long *timestamp, unsigned int cpu);
 };
 
@@ -76,7 +76,7 @@ int tracecmd_tsync_proto_register(const char *proto_name, int accuracy, int role
 				  int (*init)(struct tracecmd_time_sync *),
 				  int (*free)(struct tracecmd_time_sync *),
 				  int (*calc)(struct tracecmd_time_sync *,
-					      long long *, long long *,
+					      long long *, long long *, long long *,
 					      long long *, unsigned int))
 {
 	struct tsync_proto *proto = NULL;
@@ -137,12 +137,13 @@ bool __hidden tsync_proto_is_supported(const char *proto_name)
  * @ts: Array of size @count containing timestamps of callculated offsets
  * @offsets: array of size @count, containing offsets for each timestamp
  * @scalings: array of size @count, containing scaling ratios for each timestamp
+ * @frac: array of size @count, containing fraction bits for each timestamp
  *
  * Retuns -1 in case of an error, or 0 otherwise
  */
 int tracecmd_tsync_get_offsets(struct tracecmd_time_sync *tsync, int cpu,
 			       int *count, long long **ts,
-			       long long **offsets, long long **scalings)
+			       long long **offsets, long long **scalings, long long **frac)
 {
 	struct clock_sync_context *tsync_context;
 
@@ -159,6 +160,8 @@ int tracecmd_tsync_get_offsets(struct tracecmd_time_sync *tsync, int cpu,
 		*offsets = tsync_context->offsets[cpu].sync_offsets;
 	if (scalings)
 		*scalings = tsync_context->offsets[cpu].sync_scalings;
+	if (frac)
+		*frac = tsync_context->offsets[cpu].sync_frac;
 
 	return 0;
 }
@@ -564,9 +567,11 @@ void tracecmd_tsync_free(struct tracecmd_time_sync *tsync)
 				free(tsync_context->offsets[i].sync_ts);
 				free(tsync_context->offsets[i].sync_offsets);
 				free(tsync_context->offsets[i].sync_scalings);
+				free(tsync_context->offsets[i].sync_frac);
 				tsync_context->offsets[i].sync_ts = NULL;
 				tsync_context->offsets[i].sync_offsets = NULL;
 				tsync_context->offsets[i].sync_scalings = NULL;
+				tsync_context->offsets[i].sync_frac = NULL;
 				tsync_context->offsets[i].sync_count = 0;
 				tsync_context->offsets[i].sync_size = 0;
 			}
@@ -643,10 +648,11 @@ static int tsync_send(struct tracecmd_time_sync *tsync,
 	long long timestamp = 0;
 	long long scaling = 0;
 	long long offset = 0;
+	long long frac = 0;
 	int ret;
 
 	old_set = pin_to_cpu(cpu);
-	ret = proto->clock_sync_calc(tsync, &offset, &scaling, &timestamp, cpu);
+	ret = proto->clock_sync_calc(tsync, &offset, &scaling, &frac, &timestamp, cpu);
 	if (old_set)
 		restore_pin_to_cpu(old_set);
 
@@ -685,10 +691,11 @@ static void tsync_with_host(struct tracecmd_time_sync *tsync)
 }
 
 static int record_sync_sample(struct clock_sync_offsets *offsets, int array_step,
-			      long long offset, long long scaling, long long ts)
+			      long long offset, long long scaling, long long frac, long long ts)
 {
 	long long *sync_scalings = NULL;
 	long long *sync_offsets = NULL;
+	long long *sync_frac = NULL;
 	long long *sync_ts = NULL;
 
 	if (offsets->sync_count >= offsets->sync_size) {
@@ -698,22 +705,27 @@ static int record_sync_sample(struct clock_sync_offsets *offsets, int array_step
 				       (offsets->sync_size + array_step) * sizeof(long long));
 		sync_scalings = realloc(offsets->sync_scalings,
 				       (offsets->sync_size + array_step) * sizeof(long long));
+		sync_frac = realloc(offsets->sync_frac,
+				    (offsets->sync_size + array_step) * sizeof(long long));
 
-		if (!sync_ts || !sync_offsets || !sync_scalings) {
+		if (!sync_ts || !sync_offsets || !sync_scalings || !sync_frac) {
 			free(sync_ts);
 			free(sync_offsets);
 			free(sync_scalings);
+			free(sync_frac);
 			return -1;
 		}
 		offsets->sync_size += array_step;
 		offsets->sync_ts = sync_ts;
 		offsets->sync_offsets = sync_offsets;
 		offsets->sync_scalings = sync_scalings;
+		offsets->sync_frac = sync_frac;
 	}
 
 	offsets->sync_ts[offsets->sync_count] = ts;
 	offsets->sync_offsets[offsets->sync_count] = offset;
 	offsets->sync_scalings[offsets->sync_count] = scaling;
+	offsets->sync_frac[offsets->sync_count] = frac;
 	offsets->sync_count++;
 
 	return 0;
@@ -726,9 +738,10 @@ static int tsync_get_sample(struct tracecmd_time_sync *tsync, unsigned int cpu,
 	long long timestamp = 0;
 	long long scaling = 0;
 	long long offset = 0;
+	long long frac = 0;
 	int ret;
 
-	ret = proto->clock_sync_calc(tsync, &offset, &scaling, &timestamp, cpu);
+	ret = proto->clock_sync_calc(tsync, &offset, &scaling, &frac, &timestamp, cpu);
 	if (ret) {
 		tracecmd_warning("Failed to synchronize timestamps with guest");
 		return -1;
@@ -739,7 +752,7 @@ static int tsync_get_sample(struct tracecmd_time_sync *tsync, unsigned int cpu,
 	if (!clock || cpu >= clock->cpu_count || !clock->offsets)
 		return -1;
 	return record_sync_sample(&clock->offsets[cpu], array_step,
-				  offset, scaling, timestamp);
+				  offset, scaling, frac, timestamp);
 }
 
 #define TIMER_SEC_NANO 1000000000LL
@@ -928,6 +941,7 @@ int tracecmd_write_guest_time_shift(struct tracecmd_output *handle,
 	unsigned int flags;
 	long long *scalings = NULL;
 	long long *offsets = NULL;
+	long long *frac = NULL;
 	long long *ts = NULL;
 	int vcount;
 	int count;
@@ -936,7 +950,7 @@ int tracecmd_write_guest_time_shift(struct tracecmd_output *handle,
 
 	if (!tsync->vcpu_count)
 		return -1;
-	vcount = 3 + (4 * tsync->vcpu_count);
+	vcount = 3 + (5 * tsync->vcpu_count);
 	vector = calloc(vcount, sizeof(struct iovec));
 	if (!vector)
 		return -1;
@@ -955,7 +969,7 @@ int tracecmd_write_guest_time_shift(struct tracecmd_output *handle,
 		if (j >= vcount)
 			break;
 		ret = tracecmd_tsync_get_offsets(tsync, i, &count,
-						 &ts, &offsets, &scalings);
+						 &ts, &offsets, &scalings, NULL);
 		if (ret < 0 || !count || !ts || !offsets || !scalings)
 			break;
 		vector[j].iov_len = 4;
@@ -971,6 +985,21 @@ int tracecmd_write_guest_time_shift(struct tracecmd_output *handle,
 		ret = -1;
 		goto out;
 	}
+	for (i = 0; i < tsync->vcpu_count; i++) {
+		if (j >= vcount)
+			break;
+		ret = tracecmd_tsync_get_offsets(tsync, i, NULL,
+						 NULL, NULL, NULL, &frac);
+		if (ret < 0 || !frac)
+			break;
+		vector[j].iov_len = 8 * count;
+		vector[j++].iov_base = frac;
+	}
+	if (i < tsync->vcpu_count) {
+		ret = -1;
+		goto out;
+	}
+
 	tracecmd_add_option_v(handle, TRACECMD_OPTION_TIME_SHIFT, vector, vcount);
 #ifdef TSYNC_DEBUG
 	if (count > 1)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] trace-cmd: Read and use fraction bits from TRACECMD_OPTION_TIME_SHIFT
  2021-09-23  9:45 [PATCH 0/3] trace-cmd: Align guest TSC calculation with the kernel Tzvetomir Stoyanov (VMware)
  2021-09-23  9:45 ` [PATCH 1/3] trace-cmd: Extend host-guest time sync with fraction bits Tzvetomir Stoyanov (VMware)
@ 2021-09-23  9:45 ` Tzvetomir Stoyanov (VMware)
  2021-10-13  3:14   ` Steven Rostedt
  2021-09-23  9:45 ` [PATCH 3/3] trace-cmd: Dump " Tzvetomir Stoyanov (VMware)
  2 siblings, 1 reply; 10+ messages in thread
From: Tzvetomir Stoyanov (VMware) @ 2021-09-23  9:45 UTC (permalink / raw)
  To: rostedt; +Cc: linux-trace-devel

If there is array with guest time scaling fraction bits in
TRACECMD_OPTION_TIME_SHIFT option, read that information.
The formula for calculating guest timestamps is aligned with the formula
used in the kernel:
 guest_tsc = tsc-offset + (host_tsc * tsc-scaling-ratio) >> tsc-scaling-ratio-frac-bits

Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
---
 lib/trace-cmd/trace-input.c | 79 ++++++++++++++++++-------------------
 1 file changed, 39 insertions(+), 40 deletions(-)

diff --git a/lib/trace-cmd/trace-input.c b/lib/trace-cmd/trace-input.c
index ac57bc4f..996f9707 100644
--- a/lib/trace-cmd/trace-input.c
+++ b/lib/trace-cmd/trace-input.c
@@ -83,6 +83,7 @@ struct ts_offset_sample {
 	long long	time;
 	long long	offset;
 	long long	scaling;
+	long long	fraction;
 };
 
 struct guest_trace_info {
@@ -1235,7 +1236,6 @@ timestamp_correction_calc(unsigned long long ts, unsigned int flags,
 			  struct ts_offset_sample *min,
 			  struct ts_offset_sample *max)
 {
-	long long scaling;
 	long long tscor;
 
 	if (flags & TRACECMD_TSYNC_FLAG_INTERPOLATE) {
@@ -1243,15 +1243,12 @@ timestamp_correction_calc(unsigned long long ts, unsigned int flags,
 		long long offset = ((long long)ts - min->time) *
 				   (max->offset - min->offset);
 
-		scaling = (min->scaling + max->scaling) / 2;
 		tscor = min->offset + (offset + delta / 2) / delta;
-
 	} else {
-		scaling = min->scaling;
 		tscor = min->offset;
 	}
 
-	ts *= scaling;
+	ts = (ts * min->scaling) >> min->fraction;
 	if (tscor < 0)
 		return ts - llabs(tscor);
 
@@ -2337,37 +2334,16 @@ static int tsync_offset_cmp(const void *a, const void *b)
 
 #define safe_read_loop(type)						\
 	do {								\
-		int i;							\
-		for (i = 0; i < ts_offsets->ts_samples_count; i++)	\
-			safe_read(ts_offsets->ts_samples[i].type, 8);	\
+		int ii;							\
+		for (ii = 0; ii < ts_offsets->ts_samples_count; ii++)	\
+			safe_read(ts_offsets->ts_samples[ii].type, 8);	\
 	} while (0)
 
-static int tsync_offset_load(struct tep_handle	*tep,
-			     struct timesync_offsets *ts_offsets, char *buf, int size)
-{
-	int start_size = size;
-	int i, j;
-
-	safe_read_loop(time);
-	safe_read_loop(offset);
-	safe_read_loop(scaling);
-	qsort(ts_offsets->ts_samples, ts_offsets->ts_samples_count,
-	      sizeof(struct ts_offset_sample), tsync_offset_cmp);
-	/* Filter possible samples with equal time */
-	for (i = 0, j = 0; i < ts_offsets->ts_samples_count; i++) {
-		if (i == 0 || ts_offsets->ts_samples[i].time != ts_offsets->ts_samples[i-1].time)
-			ts_offsets->ts_samples[j++] = ts_offsets->ts_samples[i];
-	}
-	ts_offsets->ts_samples_count = j;
-
-	return start_size - size;
-}
-
 static int tsync_cpu_offsets_load(struct tracecmd_input *handle, char *buf, int size)
 {
 	struct tep_handle *tep = handle->pevent;
-	int ret;
-	int i;
+	struct timesync_offsets *ts_offsets;
+	int i, j, k;
 
 	safe_read(handle->host.cpu_count, 4);
 	handle->host.ts_offsets = calloc(handle->host.cpu_count,
@@ -2375,17 +2351,36 @@ static int tsync_cpu_offsets_load(struct tracecmd_input *handle, char *buf, int
 	if (!handle->host.ts_offsets)
 		return -ENOMEM;
 	for (i = 0; i < handle->host.cpu_count; i++) {
-		safe_read(handle->host.ts_offsets[i].ts_samples_count, 4);
-		handle->host.ts_offsets[i].ts_samples = calloc(handle->host.ts_offsets[i].ts_samples_count,
-							       sizeof(struct ts_offset_sample));
-		if (!handle->host.ts_offsets[i].ts_samples)
+		ts_offsets = &handle->host.ts_offsets[i];
+		safe_read(ts_offsets->ts_samples_count, 4);
+		ts_offsets->ts_samples = calloc(ts_offsets->ts_samples_count,
+						sizeof(struct ts_offset_sample));
+		if (!ts_offsets->ts_samples)
 			return -ENOMEM;
-		ret = tsync_offset_load(tep, &handle->host.ts_offsets[i], buf, size);
-		if (ret <= 0)
-			return -EFAULT;
-		size -= ret;
-		buf += ret;
+		safe_read_loop(time);
+		safe_read_loop(offset);
+		safe_read_loop(scaling);
 	}
+
+	if (size > 0) {
+		for (i = 0; i < handle->host.cpu_count; i++) {
+			ts_offsets = &handle->host.ts_offsets[i];
+			safe_read_loop(fraction);
+		}
+	}
+
+	for (i = 0; i < handle->host.cpu_count; i++) {
+		ts_offsets = &handle->host.ts_offsets[i];
+		qsort(ts_offsets->ts_samples, ts_offsets->ts_samples_count,
+		      sizeof(struct ts_offset_sample), tsync_offset_cmp);
+		/* Filter possible samples with equal time */
+		for (k = 0, j = 0; k < ts_offsets->ts_samples_count; k++) {
+			if (k == 0 || ts_offsets->ts_samples[k].time != ts_offsets->ts_samples[k-1].time)
+				ts_offsets->ts_samples[j++] = ts_offsets->ts_samples[k];
+		}
+		ts_offsets->ts_samples_count = j;
+	}
+
 	return 0;
 }
 
@@ -2715,6 +2710,10 @@ static int handle_options(struct tracecmd_input *handle)
 			 *  long long array of size [count] of timestamp offsets.
 			 *  long long array of size [count] of timestamp scaling ratios.*
 			 * ]
+			 * array of size [CPU count]:
+			 * [
+			 *  long long array of size [count] of timestamp scaling fraction bits.*
+			 * ]*
 			 */
 			if (size < 16 || (handle->flags & TRACECMD_FL_RAW_TS))
 				break;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] trace-cmd: Dump fraction bits from TRACECMD_OPTION_TIME_SHIFT
  2021-09-23  9:45 [PATCH 0/3] trace-cmd: Align guest TSC calculation with the kernel Tzvetomir Stoyanov (VMware)
  2021-09-23  9:45 ` [PATCH 1/3] trace-cmd: Extend host-guest time sync with fraction bits Tzvetomir Stoyanov (VMware)
  2021-09-23  9:45 ` [PATCH 2/3] trace-cmd: Read and use fraction bits from TRACECMD_OPTION_TIME_SHIFT Tzvetomir Stoyanov (VMware)
@ 2021-09-23  9:45 ` Tzvetomir Stoyanov (VMware)
  2021-10-13  3:15   ` Steven Rostedt
  2 siblings, 1 reply; 10+ messages in thread
From: Tzvetomir Stoyanov (VMware) @ 2021-09-23  9:45 UTC (permalink / raw)
  To: rostedt; +Cc: linux-trace-devel

Dump the fraction bits from the extended TRACECMD_OPTION_TIME_SHIFT
option.

Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
---
 tracecmd/trace-dump.c | 102 ++++++++++++++++++++++++++++++------------
 1 file changed, 73 insertions(+), 29 deletions(-)

diff --git a/tracecmd/trace-dump.c b/tracecmd/trace-dump.c
index 2334b57e..ed97bd3f 100644
--- a/tracecmd/trace-dump.c
+++ b/tracecmd/trace-dump.c
@@ -376,13 +376,18 @@ static void dump_option_xlong(int fd, int size, char *desc)
 	do_print(OPTIONS, "0x%llX\n", val);
 }
 
+struct time_shift_cpu {
+	unsigned int count;
+	long long *scalings;
+	long long *frac;
+	long long *offsets;
+	unsigned long long *times;
+};
+
 static void dump_option_timeshift(int fd, int size)
 {
-	long long *scalings = NULL;
-	long long *offsets = NULL;
-	unsigned long long *times = NULL;
+	struct time_shift_cpu *cpus_data;
 	long long trace_id;
-	unsigned int count;
 	unsigned int flags;
 	unsigned int cpus;
 	int i, j;
@@ -400,41 +405,80 @@ static void dump_option_timeshift(int fd, int size)
 	}
 	do_print(OPTIONS, "\t\t[Option TimeShift, %d bytes]\n", size);
 	read_file_number(fd, &trace_id, 8);
+	size -= 8;
 	do_print(OPTIONS, "0x%llX [peer's trace id]\n", trace_id);
 	read_file_number(fd, &flags, 4);
+	size -= 4;
 	do_print(OPTIONS, "0x%llX [peer's protocol flags]\n", flags);
 	read_file_number(fd, &cpus, 4);
+	size -= 4;
 	do_print(OPTIONS, "0x%llX [peer's CPU count]\n", cpus);
+	cpus_data = calloc(cpus, sizeof(struct time_shift_cpu));
+	if (!cpus_data)
+		return;
 	for (j = 0; j < cpus; j++) {
-		read_file_number(fd, &count, 4);
-		do_print(OPTIONS, "%lld [samples count for CPU %d]\n", count, j);
-		times = calloc(count, sizeof(long long));
-		offsets = calloc(count, sizeof(long long));
-		scalings = calloc(count, sizeof(long long));
-		if (!times || !offsets || !scalings)
+		if (size < 4)
 			goto out;
-		for (i = 0; i < count; i++)
-			read_file_number(fd, times + i, 8);
-		for (i = 0; i < count; i++)
-			read_file_number(fd, offsets + i, 8);
-		for (i = 0; i < count; i++)
-			read_file_number(fd, scalings + i, 8);
-
-		for (i = 0; i < count; i++)
-			do_print(OPTIONS, "\t%lld %lld %llu [offset * scaling @ time]\n",
-				 offsets[i], scalings[1], times[i]);
-		free(times);
-		free(offsets);
-		free(scalings);
-		times = NULL;
-		offsets = NULL;
-		scalings = NULL;
+		read_file_number(fd, &cpus_data[j].count, 4);
+		size -= 4;
+		do_print(OPTIONS, "%lld [samples count for CPU %d]\n", cpus_data[j].count, j);
+		cpus_data[j].times = calloc(cpus_data[j].count, sizeof(long long));
+		cpus_data[j].offsets = calloc(cpus_data[j].count, sizeof(long long));
+		cpus_data[j].scalings = calloc(cpus_data[j].count, sizeof(long long));
+		cpus_data[j].frac = calloc(cpus_data[j].count, sizeof(long long));
+		if (!cpus_data[j].times || !cpus_data[j].offsets ||
+		    !cpus_data[j].scalings || !cpus_data[j].frac)
+			goto out;
+		for (i = 0; i < cpus_data[j].count; i++) {
+			if (size < 8)
+				goto out;
+			read_file_number(fd, cpus_data[j].times + i, 8);
+			size -= 8;
+		}
+		for (i = 0; i < cpus_data[j].count; i++) {
+			if (size < 8)
+				goto out;
+			read_file_number(fd, cpus_data[j].offsets + i, 8);
+			size -= 8;
+		}
+		for (i = 0; i < cpus_data[j].count; i++) {
+			if (size < 8)
+				goto out;
+			read_file_number(fd, cpus_data[j].scalings + i, 8);
+			size -= 8;
+		}
+	}
 
+	if (size > 0) {
+		for (j = 0; j < cpus; j++) {
+			if (!cpus_data[j].frac)
+				goto out;
+			for (i = 0; i < cpus_data[j].count; i++) {
+				if (size < 8)
+					goto out;
+				read_file_number(fd, cpus_data[j].frac + i, 8);
+				size -= 8;
+			}
+		}
 	}
+
+	for (j = 0; j < cpus; j++) {
+		for (i = 0; i < cpus_data[j].count; i++)
+			do_print(OPTIONS, "\t%lld %lld %llu %llu[offset * scaling >> fraction @ time]\n",
+				 cpus_data[j].offsets[i], cpus_data[j].scalings[i],
+				 cpus_data[j].frac[i], cpus_data[j].times[i]);
+
+	}
+
 out:
-	free(times);
-	free(offsets);
-	free(scalings);
+	if (j < cpus)
+		do_print(OPTIONS, "Broken time shift option\n");
+	for (j = 0; j < cpus; j++) {
+		free(cpus_data[j].times);
+		free(cpus_data[j].offsets);
+		free(cpus_data[j].scalings);
+	}
+	free(cpus_data);
 }
 
 void dump_option_guest(int fd, int size)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] trace-cmd: Extend host-guest time sync with fraction bits
  2021-09-23  9:45 ` [PATCH 1/3] trace-cmd: Extend host-guest time sync with fraction bits Tzvetomir Stoyanov (VMware)
@ 2021-10-13  3:04   ` Steven Rostedt
  2021-10-13  9:48     ` Tzvetomir Stoyanov
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2021-10-13  3:04 UTC (permalink / raw)
  To: Tzvetomir Stoyanov (VMware); +Cc: linux-trace-devel

On Thu, 23 Sep 2021 12:45:24 +0300
"Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com> wrote:
> @@ -175,6 +184,12 @@ static int kvm_open_vcpu_dir(struct kvm_clock_sync *kvm, int cpu, char *dir_str)
>  					 dir_str, entry->d_name);
>  				kvm->vcpu_scalings[cpu] = strdup(path);
>  			}
> +			if (!strncmp(entry->d_name, KVM_DEBUG_FRACTION_FILE,
> +				     strlen(KVM_DEBUG_FRACTION_FILE))) {

I'm curious, why not just use strcmp(), or does "d_name" have more
characters than the fraction file?

> +				snprintf(path, sizeof(path), "%s/%s",
> +					 dir_str, entry->d_name);
> +				kvm->vcpu_frac[cpu] = strdup(path);
> +			}
>  		}
>  	}
>  	if (!kvm->vcpu_offsets[cpu])
> @@ -189,6 +204,8 @@ error:
>  	kvm->vcpu_offsets[cpu] = NULL;
>  	free(kvm->vcpu_scalings[cpu]);
>  	kvm->vcpu_scalings[cpu] = NULL;
> +	free(kvm->vcpu_frac[cpu]);
> +	kvm->vcpu_frac[cpu] = NULL;
>  	return -1;
>  }
>  
> @@ -254,7 +271,8 @@ static int kvm_clock_sync_init_host(struct tracecmd_time_sync *tsync,
>  	kvm->vcpu_count = tsync->vcpu_count;
>  	kvm->vcpu_offsets = calloc(kvm->vcpu_count, sizeof(char *));
>  	kvm->vcpu_scalings = calloc(kvm->vcpu_count, sizeof(char *));
> -	if (!kvm->vcpu_offsets || !kvm->vcpu_scalings)
> +	kvm->vcpu_frac = calloc(kvm->vcpu_count, sizeof(char *));
> +	if (!kvm->vcpu_offsets || !kvm->vcpu_scalings || !kvm->vcpu_frac)
>  		goto error;
>  	if (kvm_open_debug_files(kvm, tsync->guest_pid) < 0)
>  		goto error;
> @@ -263,6 +281,7 @@ static int kvm_clock_sync_init_host(struct tracecmd_time_sync *tsync,
>  error:
>  	free(kvm->vcpu_offsets);
>  	free(kvm->vcpu_scalings);
> +	free(kvm->vcpu_frac);
>  	return -1;
>  }
>  
> @@ -353,6 +372,8 @@ static int kvm_clock_sync_free(struct tracecmd_time_sync *tsync)
>  			kvm->vcpu_offsets[i] = NULL;
>  			free(kvm->vcpu_scalings[i]);
>  			kvm->vcpu_scalings[i] = NULL;
> +			free(kvm->vcpu_frac[i]);
> +			kvm->vcpu_frac[i] = NULL;
>  		}
>  		if (kvm->tep)
>  			tep_free(kvm->tep);
> @@ -364,7 +385,7 @@ static int kvm_clock_sync_free(struct tracecmd_time_sync *tsync)
>  }
>  
>  static int kvm_clock_host(struct tracecmd_time_sync *tsync,
> -			  long long *offset, long long *scaling,
> +			  long long *offset, long long *scaling, long long *frac,
>  			  long long *timestamp, unsigned int cpu)
>  {
>  	char sync_proto[TRACECMD_TSYNC_PNAME_LENGTH];
> @@ -374,6 +395,7 @@ static int kvm_clock_host(struct tracecmd_time_sync *tsync,
>  	long long kvm_scaling = 1;
                              ^^^
Hmm, this is initialized here.

>  	unsigned int sync_msg;
>  	long long kvm_offset;
> +	long long kvm_frac;
>  	unsigned int size;
>  	char *msg;
>  	int ret;
> @@ -388,12 +410,18 @@ static int kvm_clock_host(struct tracecmd_time_sync *tsync,
>  	ret = read_ll_from_file(kvm->vcpu_offsets[cpu], &kvm_offset);
>  	if (ret < 0)
>  		return -1;
> +
> +	kvm_scaling = 1;

Why are you adding the above?

>  	if (kvm->vcpu_scalings && kvm->vcpu_scalings[cpu]) {
>  		read_ll_from_file(kvm->vcpu_scalings[cpu], &kvm_scaling);
>  		if (kvm_scaling == KVM_SCALING_AMD_DEFAULT ||
>  		    kvm_scaling == KVM_SCALING_INTEL_DEFAULT)
>  			kvm_scaling = 1;
>  	}
> +
> +	kvm_frac = 0;

Should probably initialize that at the beginning too.

> +	if (kvm->vcpu_frac && kvm->vcpu_frac[cpu])
> +		ret = read_ll_from_file(kvm->vcpu_frac[cpu], &kvm_frac);
>  	msg = (char *)&packet;
>  	size = sizeof(packet);
>  	ret = tracecmd_msg_recv_time_sync(tsync->msg_handle,
> @@ -405,6 +433,7 @@ static int kvm_clock_host(struct tracecmd_time_sync *tsync,
>  
>  	packet.offset = -kvm_offset;
>  	packet.scaling = kvm_scaling;
> +	packet.frac = kvm_frac;
>  	ret = tracecmd_msg_send_time_sync(tsync->msg_handle, KVM_NAME,
>  					  KVM_SYNC_PKT_RESPONSE, sizeof(packet),
>  					  (char *)&packet);
> @@ -413,6 +442,7 @@ static int kvm_clock_host(struct tracecmd_time_sync *tsync,
>  
>  	*scaling = packet.scaling;
>  	*offset = packet.offset;
> +	*frac = kvm_frac;
>  	*timestamp = packet.ts;
>  
>  	return 0;
> @@ -444,10 +474,10 @@ static int kvm_marker_find(struct tep_event *event, struct tep_record *record,
>  	return 0;
>  }
>  
> -
>  static int kvm_clock_guest(struct tracecmd_time_sync *tsync,
>  			   long long *offset,
>  			   long long *scaling,
> +			   long long *frac,
>  			   long long *timestamp)
>  {
>  	char sync_proto[TRACECMD_TSYNC_PNAME_LENGTH];
> @@ -488,12 +518,13 @@ static int kvm_clock_guest(struct tracecmd_time_sync *tsync,
>  
>  	*scaling = packet.scaling;
>  	*offset = packet.offset;
> +	*frac = packet.frac;
>  	*timestamp = packet.ts;
>  	return 0;
>  }
>  
>  static int kvm_clock_sync_calc(struct tracecmd_time_sync *tsync,
> -			       long long *offset, long long *scaling,
> +			       long long *offset, long long *scaling, long long *frac,
>  			       long long *timestamp, unsigned int cpu)
>  {
>  	struct clock_sync_context *clock_context;
> @@ -505,9 +536,9 @@ static int kvm_clock_sync_calc(struct tracecmd_time_sync *tsync,
>  	clock_context = (struct clock_sync_context *)tsync->context;
>  
>  	if (clock_context->is_guest)
> -		ret = kvm_clock_guest(tsync, offset, scaling, timestamp);
> +		ret = kvm_clock_guest(tsync, offset, scaling, frac, timestamp);
>  	else
> -		ret = kvm_clock_host(tsync, offset, scaling, timestamp, cpu);
> +		ret = kvm_clock_host(tsync, offset, scaling, frac, timestamp, cpu);
>  	return ret;
>  }
>  
> diff --git a/lib/trace-cmd/trace-timesync-ptp.c b/lib/trace-cmd/trace-timesync-ptp.c
> index b05f1cd0..70242ee3 100644
> --- a/lib/trace-cmd/trace-timesync-ptp.c
> +++ b/lib/trace-cmd/trace-timesync-ptp.c
> @@ -663,7 +663,7 @@ static int ptp_clock_server(struct tracecmd_time_sync *tsync,
>  }
>  
>  static int ptp_clock_sync_calc(struct tracecmd_time_sync *tsync,
> -			       long long *offset, long long *scaling,
> +			       long long *offset, long long *scaling, long long *frac,
>  			       long long *timestamp, unsigned int cpu)
>  {
>  	struct clock_sync_context *clock_context;
> @@ -689,6 +689,8 @@ static int ptp_clock_sync_calc(struct tracecmd_time_sync *tsync,
>  
>  	if (scaling)
>  		*scaling = 1;
> +	if (frac)
> +		*frac = 0;
>  	if (clock_context->is_server)
>  		ret = ptp_clock_server(tsync, offset, timestamp);
>  	else
> diff --git a/lib/trace-cmd/trace-timesync.c b/lib/trace-cmd/trace-timesync.c
> index 19ca19d7..298913a1 100644
> --- a/lib/trace-cmd/trace-timesync.c
> +++ b/lib/trace-cmd/trace-timesync.c
> @@ -36,7 +36,7 @@ struct tsync_proto {
>  	int (*clock_sync_init)(struct tracecmd_time_sync *clock_context);
>  	int (*clock_sync_free)(struct tracecmd_time_sync *clock_context);
>  	int (*clock_sync_calc)(struct tracecmd_time_sync *clock_context,
> -			       long long *offset, long long *scaling,
> +			       long long *offset, long long *scaling, long long *frac,
>  			       long long *timestamp, unsigned int cpu);
>  };
>  
> @@ -76,7 +76,7 @@ int tracecmd_tsync_proto_register(const char *proto_name, int accuracy, int role
>  				  int (*init)(struct tracecmd_time_sync *),
>  				  int (*free)(struct tracecmd_time_sync *),
>  				  int (*calc)(struct tracecmd_time_sync *,
> -					      long long *, long long *,
> +					      long long *, long long *, long long *,
>  					      long long *, unsigned int))
>  {
>  	struct tsync_proto *proto = NULL;
> @@ -137,12 +137,13 @@ bool __hidden tsync_proto_is_supported(const char *proto_name)
>   * @ts: Array of size @count containing timestamps of callculated offsets
>   * @offsets: array of size @count, containing offsets for each timestamp
>   * @scalings: array of size @count, containing scaling ratios for each timestamp
> + * @frac: array of size @count, containing fraction bits for each timestamp
>   *
>   * Retuns -1 in case of an error, or 0 otherwise
>   */
>  int tracecmd_tsync_get_offsets(struct tracecmd_time_sync *tsync, int cpu,
>  			       int *count, long long **ts,
> -			       long long **offsets, long long **scalings)
> +			       long long **offsets, long long **scalings, long long **frac)
>  {
>  	struct clock_sync_context *tsync_context;
>  
> @@ -159,6 +160,8 @@ int tracecmd_tsync_get_offsets(struct tracecmd_time_sync *tsync, int cpu,
>  		*offsets = tsync_context->offsets[cpu].sync_offsets;
>  	if (scalings)
>  		*scalings = tsync_context->offsets[cpu].sync_scalings;
> +	if (frac)
> +		*frac = tsync_context->offsets[cpu].sync_frac;
>  
>  	return 0;
>  }
> @@ -564,9 +567,11 @@ void tracecmd_tsync_free(struct tracecmd_time_sync *tsync)
>  				free(tsync_context->offsets[i].sync_ts);
>  				free(tsync_context->offsets[i].sync_offsets);
>  				free(tsync_context->offsets[i].sync_scalings);
> +				free(tsync_context->offsets[i].sync_frac);
>  				tsync_context->offsets[i].sync_ts = NULL;
>  				tsync_context->offsets[i].sync_offsets = NULL;
>  				tsync_context->offsets[i].sync_scalings = NULL;
> +				tsync_context->offsets[i].sync_frac = NULL;
>  				tsync_context->offsets[i].sync_count = 0;
>  				tsync_context->offsets[i].sync_size = 0;
>  			}
> @@ -643,10 +648,11 @@ static int tsync_send(struct tracecmd_time_sync *tsync,
>  	long long timestamp = 0;
>  	long long scaling = 0;
>  	long long offset = 0;
> +	long long frac = 0;
>  	int ret;
>  
>  	old_set = pin_to_cpu(cpu);
> -	ret = proto->clock_sync_calc(tsync, &offset, &scaling, &timestamp, cpu);
> +	ret = proto->clock_sync_calc(tsync, &offset, &scaling, &frac, &timestamp, cpu);
>  	if (old_set)
>  		restore_pin_to_cpu(old_set);
>  
> @@ -685,10 +691,11 @@ static void tsync_with_host(struct tracecmd_time_sync *tsync)
>  }
>  
>  static int record_sync_sample(struct clock_sync_offsets *offsets, int array_step,
> -			      long long offset, long long scaling, long long ts)
> +			      long long offset, long long scaling, long long frac, long long ts)
>  {
>  	long long *sync_scalings = NULL;
>  	long long *sync_offsets = NULL;
> +	long long *sync_frac = NULL;
>  	long long *sync_ts = NULL;
>  
>  	if (offsets->sync_count >= offsets->sync_size) {
> @@ -698,22 +705,27 @@ static int record_sync_sample(struct clock_sync_offsets *offsets, int array_step
>  				       (offsets->sync_size + array_step) * sizeof(long long));
>  		sync_scalings = realloc(offsets->sync_scalings,
>  				       (offsets->sync_size + array_step) * sizeof(long long));
> +		sync_frac = realloc(offsets->sync_frac,
> +				    (offsets->sync_size + array_step) * sizeof(long long));
>  
> -		if (!sync_ts || !sync_offsets || !sync_scalings) {
> +		if (!sync_ts || !sync_offsets || !sync_scalings || !sync_frac) {
>  			free(sync_ts);
>  			free(sync_offsets);
>  			free(sync_scalings);
> +			free(sync_frac);
>  			return -1;
>  		}
>  		offsets->sync_size += array_step;
>  		offsets->sync_ts = sync_ts;
>  		offsets->sync_offsets = sync_offsets;
>  		offsets->sync_scalings = sync_scalings;
> +		offsets->sync_frac = sync_frac;
>  	}
>  
>  	offsets->sync_ts[offsets->sync_count] = ts;
>  	offsets->sync_offsets[offsets->sync_count] = offset;
>  	offsets->sync_scalings[offsets->sync_count] = scaling;
> +	offsets->sync_frac[offsets->sync_count] = frac;
>  	offsets->sync_count++;
>  
>  	return 0;
> @@ -726,9 +738,10 @@ static int tsync_get_sample(struct tracecmd_time_sync *tsync, unsigned int cpu,
>  	long long timestamp = 0;
>  	long long scaling = 0;
>  	long long offset = 0;
> +	long long frac = 0;
>  	int ret;
>  
> -	ret = proto->clock_sync_calc(tsync, &offset, &scaling, &timestamp, cpu);
> +	ret = proto->clock_sync_calc(tsync, &offset, &scaling, &frac, &timestamp, cpu);
>  	if (ret) {
>  		tracecmd_warning("Failed to synchronize timestamps with guest");
>  		return -1;
> @@ -739,7 +752,7 @@ static int tsync_get_sample(struct tracecmd_time_sync *tsync, unsigned int cpu,
>  	if (!clock || cpu >= clock->cpu_count || !clock->offsets)
>  		return -1;
>  	return record_sync_sample(&clock->offsets[cpu], array_step,
> -				  offset, scaling, timestamp);
> +				  offset, scaling, frac, timestamp);
>  }
>  
>  #define TIMER_SEC_NANO 1000000000LL
> @@ -928,6 +941,7 @@ int tracecmd_write_guest_time_shift(struct tracecmd_output *handle,
>  	unsigned int flags;
>  	long long *scalings = NULL;
>  	long long *offsets = NULL;
> +	long long *frac = NULL;
>  	long long *ts = NULL;
>  	int vcount;
>  	int count;
> @@ -936,7 +950,7 @@ int tracecmd_write_guest_time_shift(struct tracecmd_output *handle,
>  
>  	if (!tsync->vcpu_count)
>  		return -1;
> -	vcount = 3 + (4 * tsync->vcpu_count);
> +	vcount = 3 + (5 * tsync->vcpu_count);
>  	vector = calloc(vcount, sizeof(struct iovec));
>  	if (!vector)
>  		return -1;
> @@ -955,7 +969,7 @@ int tracecmd_write_guest_time_shift(struct tracecmd_output *handle,
>  		if (j >= vcount)
>  			break;
>  		ret = tracecmd_tsync_get_offsets(tsync, i, &count,
> -						 &ts, &offsets, &scalings);
> +						 &ts, &offsets, &scalings, NULL);
>  		if (ret < 0 || !count || !ts || !offsets || !scalings)
>  			break;
>  		vector[j].iov_len = 4;
> @@ -971,6 +985,21 @@ int tracecmd_write_guest_time_shift(struct tracecmd_output *handle,
>  		ret = -1;
>  		goto out;
>  	}

Should have a comment here to why this is a separate loop. I'm guessing
that the frac isn't added in the above loop due to backward compatibility?

> +	for (i = 0; i < tsync->vcpu_count; i++) {
> +		if (j >= vcount)
> +			break;
> +		ret = tracecmd_tsync_get_offsets(tsync, i, NULL,
> +						 NULL, NULL, NULL, &frac);
> +		if (ret < 0 || !frac)
> +			break;
> +		vector[j].iov_len = 8 * count;
> +		vector[j++].iov_base = frac;
> +	}
> +	if (i < tsync->vcpu_count) {
> +		ret = -1;
> +		goto out;

Hmm, so if frac doesn't exist we don't use this? Is this used if we
only have a tsc_offset but no scaling?

-- Steve

> +	}
> +
>  	tracecmd_add_option_v(handle, TRACECMD_OPTION_TIME_SHIFT, vector, vcount);
>  #ifdef TSYNC_DEBUG
>  	if (count > 1)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] trace-cmd: Read and use fraction bits from TRACECMD_OPTION_TIME_SHIFT
  2021-09-23  9:45 ` [PATCH 2/3] trace-cmd: Read and use fraction bits from TRACECMD_OPTION_TIME_SHIFT Tzvetomir Stoyanov (VMware)
@ 2021-10-13  3:14   ` Steven Rostedt
  2021-10-13 10:03     ` Tzvetomir Stoyanov
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2021-10-13  3:14 UTC (permalink / raw)
  To: Tzvetomir Stoyanov (VMware); +Cc: linux-trace-devel

On Thu, 23 Sep 2021 12:45:25 +0300
"Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com> wrote:

> @@ -1235,7 +1236,6 @@ timestamp_correction_calc(unsigned long long ts, unsigned int flags,
>  			  struct ts_offset_sample *min,
>  			  struct ts_offset_sample *max)
>  {
> -	long long scaling;
>  	long long tscor;
>  
>  	if (flags & TRACECMD_TSYNC_FLAG_INTERPOLATE) {
> @@ -1243,15 +1243,12 @@ timestamp_correction_calc(unsigned long long ts, unsigned int flags,
>  		long long offset = ((long long)ts - min->time) *
>  				   (max->offset - min->offset);
>  
> -		scaling = (min->scaling + max->scaling) / 2;

So the max->scaling is not going to play a role here?

Does this work for other methods besides KVM?

-- Steve


>  		tscor = min->offset + (offset + delta / 2) / delta;
> -
>  	} else {
> -		scaling = min->scaling;
>  		tscor = min->offset;
>  	}
>  
> -	ts *= scaling;
> +	ts = (ts * min->scaling) >> min->fraction;
>  	if (tscor < 0)
>  		return ts - llabs(tscor);
>  
> @@ -2337,37 +2334,16 @@ static int tsync_offset_cmp(const void *a, const void *b)
>  

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/3] trace-cmd: Dump fraction bits from TRACECMD_OPTION_TIME_SHIFT
  2021-09-23  9:45 ` [PATCH 3/3] trace-cmd: Dump " Tzvetomir Stoyanov (VMware)
@ 2021-10-13  3:15   ` Steven Rostedt
  0 siblings, 0 replies; 10+ messages in thread
From: Steven Rostedt @ 2021-10-13  3:15 UTC (permalink / raw)
  To: Tzvetomir Stoyanov (VMware); +Cc: linux-trace-devel

On Thu, 23 Sep 2021 12:45:26 +0300
"Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com> wrote:

> Dump the fraction bits from the extended TRACECMD_OPTION_TIME_SHIFT
> option.

This just says what the patch does, it does not say why.

-- Steve

> 
> Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
> ---
>  tracecmd/trace-dump.c | 102 ++++++++++++++++++++++++++++++------------
>  1 file changed, 73 insertions(+), 29 deletions(-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] trace-cmd: Extend host-guest time sync with fraction bits
  2021-10-13  3:04   ` Steven Rostedt
@ 2021-10-13  9:48     ` Tzvetomir Stoyanov
  2021-10-13 14:53       ` Steven Rostedt
  0 siblings, 1 reply; 10+ messages in thread
From: Tzvetomir Stoyanov @ 2021-10-13  9:48 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Linux Trace Devel

On Wed, Oct 13, 2021 at 6:04 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Thu, 23 Sep 2021 12:45:24 +0300
> "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com> wrote:
> > @@ -175,6 +184,12 @@ static int kvm_open_vcpu_dir(struct kvm_clock_sync *kvm, int cpu, char *dir_str)
> >                                        dir_str, entry->d_name);
> >                               kvm->vcpu_scalings[cpu] = strdup(path);
> >                       }
> > +                     if (!strncmp(entry->d_name, KVM_DEBUG_FRACTION_FILE,
> > +                                  strlen(KVM_DEBUG_FRACTION_FILE))) {
>
> I'm curious, why not just use strcmp(), or does "d_name" have more
> characters than the fraction file?

Just to be on the safe side. Currently the fraction file is with the
longest name, but that may change in the future.

>
> > +                             snprintf(path, sizeof(path), "%s/%s",
> > +                                      dir_str, entry->d_name);
> > +                             kvm->vcpu_frac[cpu] = strdup(path);
> > +                     }
> >               }
> >       }
> >       if (!kvm->vcpu_offsets[cpu])
> > @@ -189,6 +204,8 @@ error:
> >       kvm->vcpu_offsets[cpu] = NULL;
> >       free(kvm->vcpu_scalings[cpu]);
> >       kvm->vcpu_scalings[cpu] = NULL;
> > +     free(kvm->vcpu_frac[cpu]);
> > +     kvm->vcpu_frac[cpu] = NULL;
> >       return -1;
> >  }
> >
> > @@ -254,7 +271,8 @@ static int kvm_clock_sync_init_host(struct tracecmd_time_sync *tsync,
> >       kvm->vcpu_count = tsync->vcpu_count;
> >       kvm->vcpu_offsets = calloc(kvm->vcpu_count, sizeof(char *));
> >       kvm->vcpu_scalings = calloc(kvm->vcpu_count, sizeof(char *));
> > -     if (!kvm->vcpu_offsets || !kvm->vcpu_scalings)
> > +     kvm->vcpu_frac = calloc(kvm->vcpu_count, sizeof(char *));
> > +     if (!kvm->vcpu_offsets || !kvm->vcpu_scalings || !kvm->vcpu_frac)
> >               goto error;
> >       if (kvm_open_debug_files(kvm, tsync->guest_pid) < 0)
> >               goto error;
> > @@ -263,6 +281,7 @@ static int kvm_clock_sync_init_host(struct tracecmd_time_sync *tsync,
> >  error:
> >       free(kvm->vcpu_offsets);
> >       free(kvm->vcpu_scalings);
> > +     free(kvm->vcpu_frac);
> >       return -1;
> >  }
> >
> > @@ -353,6 +372,8 @@ static int kvm_clock_sync_free(struct tracecmd_time_sync *tsync)
> >                       kvm->vcpu_offsets[i] = NULL;
> >                       free(kvm->vcpu_scalings[i]);
> >                       kvm->vcpu_scalings[i] = NULL;
> > +                     free(kvm->vcpu_frac[i]);
> > +                     kvm->vcpu_frac[i] = NULL;
> >               }
> >               if (kvm->tep)
> >                       tep_free(kvm->tep);
> > @@ -364,7 +385,7 @@ static int kvm_clock_sync_free(struct tracecmd_time_sync *tsync)
> >  }
> >
> >  static int kvm_clock_host(struct tracecmd_time_sync *tsync,
> > -                       long long *offset, long long *scaling,
> > +                       long long *offset, long long *scaling, long long *frac,
> >                         long long *timestamp, unsigned int cpu)
> >  {
> >       char sync_proto[TRACECMD_TSYNC_PNAME_LENGTH];
> > @@ -374,6 +395,7 @@ static int kvm_clock_host(struct tracecmd_time_sync *tsync,
> >       long long kvm_scaling = 1;
>                               ^^^
> Hmm, this is initialized here.
>
> >       unsigned int sync_msg;
> >       long long kvm_offset;
> > +     long long kvm_frac;
> >       unsigned int size;
> >       char *msg;
> >       int ret;
> > @@ -388,12 +410,18 @@ static int kvm_clock_host(struct tracecmd_time_sync *tsync,
> >       ret = read_ll_from_file(kvm->vcpu_offsets[cpu], &kvm_offset);
> >       if (ret < 0)
> >               return -1;
> > +
> > +     kvm_scaling = 1;
>
> Why are you adding the above?

you are right, this one is useless.

>
> >       if (kvm->vcpu_scalings && kvm->vcpu_scalings[cpu]) {
> >               read_ll_from_file(kvm->vcpu_scalings[cpu], &kvm_scaling);
> >               if (kvm_scaling == KVM_SCALING_AMD_DEFAULT ||
> >                   kvm_scaling == KVM_SCALING_INTEL_DEFAULT)
> >                       kvm_scaling = 1;
> >       }
> > +
> > +     kvm_frac = 0;
>
> Should probably initialize that at the beginning too.
>
> > +     if (kvm->vcpu_frac && kvm->vcpu_frac[cpu])
> > +             ret = read_ll_from_file(kvm->vcpu_frac[cpu], &kvm_frac);
> >       msg = (char *)&packet;
> >       size = sizeof(packet);
> >       ret = tracecmd_msg_recv_time_sync(tsync->msg_handle,
> > @@ -405,6 +433,7 @@ static int kvm_clock_host(struct tracecmd_time_sync *tsync,
> >
> >       packet.offset = -kvm_offset;
> >       packet.scaling = kvm_scaling;
> > +     packet.frac = kvm_frac;
> >       ret = tracecmd_msg_send_time_sync(tsync->msg_handle, KVM_NAME,
> >                                         KVM_SYNC_PKT_RESPONSE, sizeof(packet),
> >                                         (char *)&packet);
> > @@ -413,6 +442,7 @@ static int kvm_clock_host(struct tracecmd_time_sync *tsync,
> >
> >       *scaling = packet.scaling;
> >       *offset = packet.offset;
> > +     *frac = kvm_frac;
> >       *timestamp = packet.ts;
> >
> >       return 0;
> > @@ -444,10 +474,10 @@ static int kvm_marker_find(struct tep_event *event, struct tep_record *record,
> >       return 0;
> >  }
> >
> > -
> >  static int kvm_clock_guest(struct tracecmd_time_sync *tsync,
> >                          long long *offset,
> >                          long long *scaling,
> > +                        long long *frac,
> >                          long long *timestamp)
> >  {
> >       char sync_proto[TRACECMD_TSYNC_PNAME_LENGTH];
> > @@ -488,12 +518,13 @@ static int kvm_clock_guest(struct tracecmd_time_sync *tsync,
> >
> >       *scaling = packet.scaling;
> >       *offset = packet.offset;
> > +     *frac = packet.frac;
> >       *timestamp = packet.ts;
> >       return 0;
> >  }
> >
> >  static int kvm_clock_sync_calc(struct tracecmd_time_sync *tsync,
> > -                            long long *offset, long long *scaling,
> > +                            long long *offset, long long *scaling, long long *frac,
> >                              long long *timestamp, unsigned int cpu)
> >  {
> >       struct clock_sync_context *clock_context;
> > @@ -505,9 +536,9 @@ static int kvm_clock_sync_calc(struct tracecmd_time_sync *tsync,
> >       clock_context = (struct clock_sync_context *)tsync->context;
> >
> >       if (clock_context->is_guest)
> > -             ret = kvm_clock_guest(tsync, offset, scaling, timestamp);
> > +             ret = kvm_clock_guest(tsync, offset, scaling, frac, timestamp);
> >       else
> > -             ret = kvm_clock_host(tsync, offset, scaling, timestamp, cpu);
> > +             ret = kvm_clock_host(tsync, offset, scaling, frac, timestamp, cpu);
> >       return ret;
> >  }
> >
> > diff --git a/lib/trace-cmd/trace-timesync-ptp.c b/lib/trace-cmd/trace-timesync-ptp.c
> > index b05f1cd0..70242ee3 100644
> > --- a/lib/trace-cmd/trace-timesync-ptp.c
> > +++ b/lib/trace-cmd/trace-timesync-ptp.c
> > @@ -663,7 +663,7 @@ static int ptp_clock_server(struct tracecmd_time_sync *tsync,
> >  }
> >
> >  static int ptp_clock_sync_calc(struct tracecmd_time_sync *tsync,
> > -                            long long *offset, long long *scaling,
> > +                            long long *offset, long long *scaling, long long *frac,
> >                              long long *timestamp, unsigned int cpu)
> >  {
> >       struct clock_sync_context *clock_context;
> > @@ -689,6 +689,8 @@ static int ptp_clock_sync_calc(struct tracecmd_time_sync *tsync,
> >
> >       if (scaling)
> >               *scaling = 1;
> > +     if (frac)
> > +             *frac = 0;
> >       if (clock_context->is_server)
> >               ret = ptp_clock_server(tsync, offset, timestamp);
> >       else
> > diff --git a/lib/trace-cmd/trace-timesync.c b/lib/trace-cmd/trace-timesync.c
> > index 19ca19d7..298913a1 100644
> > --- a/lib/trace-cmd/trace-timesync.c
> > +++ b/lib/trace-cmd/trace-timesync.c
> > @@ -36,7 +36,7 @@ struct tsync_proto {
> >       int (*clock_sync_init)(struct tracecmd_time_sync *clock_context);
> >       int (*clock_sync_free)(struct tracecmd_time_sync *clock_context);
> >       int (*clock_sync_calc)(struct tracecmd_time_sync *clock_context,
> > -                            long long *offset, long long *scaling,
> > +                            long long *offset, long long *scaling, long long *frac,
> >                              long long *timestamp, unsigned int cpu);
> >  };
> >
> > @@ -76,7 +76,7 @@ int tracecmd_tsync_proto_register(const char *proto_name, int accuracy, int role
> >                                 int (*init)(struct tracecmd_time_sync *),
> >                                 int (*free)(struct tracecmd_time_sync *),
> >                                 int (*calc)(struct tracecmd_time_sync *,
> > -                                           long long *, long long *,
> > +                                           long long *, long long *, long long *,
> >                                             long long *, unsigned int))
> >  {
> >       struct tsync_proto *proto = NULL;
> > @@ -137,12 +137,13 @@ bool __hidden tsync_proto_is_supported(const char *proto_name)
> >   * @ts: Array of size @count containing timestamps of callculated offsets
> >   * @offsets: array of size @count, containing offsets for each timestamp
> >   * @scalings: array of size @count, containing scaling ratios for each timestamp
> > + * @frac: array of size @count, containing fraction bits for each timestamp
> >   *
> >   * Retuns -1 in case of an error, or 0 otherwise
> >   */
> >  int tracecmd_tsync_get_offsets(struct tracecmd_time_sync *tsync, int cpu,
> >                              int *count, long long **ts,
> > -                            long long **offsets, long long **scalings)
> > +                            long long **offsets, long long **scalings, long long **frac)
> >  {
> >       struct clock_sync_context *tsync_context;
> >
> > @@ -159,6 +160,8 @@ int tracecmd_tsync_get_offsets(struct tracecmd_time_sync *tsync, int cpu,
> >               *offsets = tsync_context->offsets[cpu].sync_offsets;
> >       if (scalings)
> >               *scalings = tsync_context->offsets[cpu].sync_scalings;
> > +     if (frac)
> > +             *frac = tsync_context->offsets[cpu].sync_frac;
> >
> >       return 0;
> >  }
> > @@ -564,9 +567,11 @@ void tracecmd_tsync_free(struct tracecmd_time_sync *tsync)
> >                               free(tsync_context->offsets[i].sync_ts);
> >                               free(tsync_context->offsets[i].sync_offsets);
> >                               free(tsync_context->offsets[i].sync_scalings);
> > +                             free(tsync_context->offsets[i].sync_frac);
> >                               tsync_context->offsets[i].sync_ts = NULL;
> >                               tsync_context->offsets[i].sync_offsets = NULL;
> >                               tsync_context->offsets[i].sync_scalings = NULL;
> > +                             tsync_context->offsets[i].sync_frac = NULL;
> >                               tsync_context->offsets[i].sync_count = 0;
> >                               tsync_context->offsets[i].sync_size = 0;
> >                       }
> > @@ -643,10 +648,11 @@ static int tsync_send(struct tracecmd_time_sync *tsync,
> >       long long timestamp = 0;
> >       long long scaling = 0;
> >       long long offset = 0;
> > +     long long frac = 0;
> >       int ret;
> >
> >       old_set = pin_to_cpu(cpu);
> > -     ret = proto->clock_sync_calc(tsync, &offset, &scaling, &timestamp, cpu);
> > +     ret = proto->clock_sync_calc(tsync, &offset, &scaling, &frac, &timestamp, cpu);
> >       if (old_set)
> >               restore_pin_to_cpu(old_set);
> >
> > @@ -685,10 +691,11 @@ static void tsync_with_host(struct tracecmd_time_sync *tsync)
> >  }
> >
> >  static int record_sync_sample(struct clock_sync_offsets *offsets, int array_step,
> > -                           long long offset, long long scaling, long long ts)
> > +                           long long offset, long long scaling, long long frac, long long ts)
> >  {
> >       long long *sync_scalings = NULL;
> >       long long *sync_offsets = NULL;
> > +     long long *sync_frac = NULL;
> >       long long *sync_ts = NULL;
> >
> >       if (offsets->sync_count >= offsets->sync_size) {
> > @@ -698,22 +705,27 @@ static int record_sync_sample(struct clock_sync_offsets *offsets, int array_step
> >                                      (offsets->sync_size + array_step) * sizeof(long long));
> >               sync_scalings = realloc(offsets->sync_scalings,
> >                                      (offsets->sync_size + array_step) * sizeof(long long));
> > +             sync_frac = realloc(offsets->sync_frac,
> > +                                 (offsets->sync_size + array_step) * sizeof(long long));
> >
> > -             if (!sync_ts || !sync_offsets || !sync_scalings) {
> > +             if (!sync_ts || !sync_offsets || !sync_scalings || !sync_frac) {
> >                       free(sync_ts);
> >                       free(sync_offsets);
> >                       free(sync_scalings);
> > +                     free(sync_frac);
> >                       return -1;
> >               }
> >               offsets->sync_size += array_step;
> >               offsets->sync_ts = sync_ts;
> >               offsets->sync_offsets = sync_offsets;
> >               offsets->sync_scalings = sync_scalings;
> > +             offsets->sync_frac = sync_frac;
> >       }
> >
> >       offsets->sync_ts[offsets->sync_count] = ts;
> >       offsets->sync_offsets[offsets->sync_count] = offset;
> >       offsets->sync_scalings[offsets->sync_count] = scaling;
> > +     offsets->sync_frac[offsets->sync_count] = frac;
> >       offsets->sync_count++;
> >
> >       return 0;
> > @@ -726,9 +738,10 @@ static int tsync_get_sample(struct tracecmd_time_sync *tsync, unsigned int cpu,
> >       long long timestamp = 0;
> >       long long scaling = 0;
> >       long long offset = 0;
> > +     long long frac = 0;
> >       int ret;
> >
> > -     ret = proto->clock_sync_calc(tsync, &offset, &scaling, &timestamp, cpu);
> > +     ret = proto->clock_sync_calc(tsync, &offset, &scaling, &frac, &timestamp, cpu);
> >       if (ret) {
> >               tracecmd_warning("Failed to synchronize timestamps with guest");
> >               return -1;
> > @@ -739,7 +752,7 @@ static int tsync_get_sample(struct tracecmd_time_sync *tsync, unsigned int cpu,
> >       if (!clock || cpu >= clock->cpu_count || !clock->offsets)
> >               return -1;
> >       return record_sync_sample(&clock->offsets[cpu], array_step,
> > -                               offset, scaling, timestamp);
> > +                               offset, scaling, frac, timestamp);
> >  }
> >
> >  #define TIMER_SEC_NANO 1000000000LL
> > @@ -928,6 +941,7 @@ int tracecmd_write_guest_time_shift(struct tracecmd_output *handle,
> >       unsigned int flags;
> >       long long *scalings = NULL;
> >       long long *offsets = NULL;
> > +     long long *frac = NULL;
> >       long long *ts = NULL;
> >       int vcount;
> >       int count;
> > @@ -936,7 +950,7 @@ int tracecmd_write_guest_time_shift(struct tracecmd_output *handle,
> >
> >       if (!tsync->vcpu_count)
> >               return -1;
> > -     vcount = 3 + (4 * tsync->vcpu_count);
> > +     vcount = 3 + (5 * tsync->vcpu_count);
> >       vector = calloc(vcount, sizeof(struct iovec));
> >       if (!vector)
> >               return -1;
> > @@ -955,7 +969,7 @@ int tracecmd_write_guest_time_shift(struct tracecmd_output *handle,
> >               if (j >= vcount)
> >                       break;
> >               ret = tracecmd_tsync_get_offsets(tsync, i, &count,
> > -                                              &ts, &offsets, &scalings);
> > +                                              &ts, &offsets, &scalings, NULL);
> >               if (ret < 0 || !count || !ts || !offsets || !scalings)
> >                       break;
> >               vector[j].iov_len = 4;
> > @@ -971,6 +985,21 @@ int tracecmd_write_guest_time_shift(struct tracecmd_output *handle,
> >               ret = -1;
> >               goto out;
> >       }
>
> Should have a comment here to why this is a separate loop. I'm guessing
> that the frac isn't added in the above loop due to backward compatibility?

Yes, because of backward compatibility. Going to describe it in the
next version.

>
> > +     for (i = 0; i < tsync->vcpu_count; i++) {
> > +             if (j >= vcount)
> > +                     break;
> > +             ret = tracecmd_tsync_get_offsets(tsync, i, NULL,
> > +                                              NULL, NULL, NULL, &frac);
> > +             if (ret < 0 || !frac)
> > +                     break;
> > +             vector[j].iov_len = 8 * count;
> > +             vector[j++].iov_base = frac;
> > +     }
> > +     if (i < tsync->vcpu_count) {
> > +             ret = -1;
> > +             goto out;
>
> Hmm, so if frac doesn't exist we don't use this? Is this used if we
> only have a tsc_offset but no scaling?
>

You are right, actually frac 0 could be a valid value - the check
inside the loop "!frac" must be removed. If there is only tsc_offset,
the logic should work.

> -- Steve
>
> > +     }
> > +
> >       tracecmd_add_option_v(handle, TRACECMD_OPTION_TIME_SHIFT, vector, vcount);
> >  #ifdef TSYNC_DEBUG
> >       if (count > 1)
>


-- 
Tzvetomir (Ceco) Stoyanov
VMware Open Source Technology Center

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] trace-cmd: Read and use fraction bits from TRACECMD_OPTION_TIME_SHIFT
  2021-10-13  3:14   ` Steven Rostedt
@ 2021-10-13 10:03     ` Tzvetomir Stoyanov
  0 siblings, 0 replies; 10+ messages in thread
From: Tzvetomir Stoyanov @ 2021-10-13 10:03 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Linux Trace Devel

On Wed, Oct 13, 2021 at 6:14 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Thu, 23 Sep 2021 12:45:25 +0300
> "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com> wrote:
>
> > @@ -1235,7 +1236,6 @@ timestamp_correction_calc(unsigned long long ts, unsigned int flags,
> >                         struct ts_offset_sample *min,
> >                         struct ts_offset_sample *max)
> >  {
> > -     long long scaling;
> >       long long tscor;
> >
> >       if (flags & TRACECMD_TSYNC_FLAG_INTERPOLATE) {
> > @@ -1243,15 +1243,12 @@ timestamp_correction_calc(unsigned long long ts, unsigned int flags,
> >               long long offset = ((long long)ts - min->time) *
> >                                  (max->offset - min->offset);
> >
> > -             scaling = (min->scaling + max->scaling) / 2;
>
> So the max->scaling is not going to play a role here?

I assume that if there is a clock scaling - it should not change
runtime, fraction bits and scaling should be the same from the VM
start. So there is no sense to make interpolation, just use the first
value.

>
> Does this work for other methods besides KVM?
>

We have only PTP and KVM till now, only KVM supports scaling. PTP just
sets scaling to 1 and fraction to 0. Unfortunately, I cannot test even
KVM scaling - I failed to enable it on my system.

> -- Steve
>
>
> >               tscor = min->offset + (offset + delta / 2) / delta;
> > -
> >       } else {
> > -             scaling = min->scaling;
> >               tscor = min->offset;
> >       }
> >
> > -     ts *= scaling;
> > +     ts = (ts * min->scaling) >> min->fraction;
> >       if (tscor < 0)
> >               return ts - llabs(tscor);
> >
> > @@ -2337,37 +2334,16 @@ static int tsync_offset_cmp(const void *a, const void *b)
> >



-- 
Tzvetomir (Ceco) Stoyanov
VMware Open Source Technology Center

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] trace-cmd: Extend host-guest time sync with fraction bits
  2021-10-13  9:48     ` Tzvetomir Stoyanov
@ 2021-10-13 14:53       ` Steven Rostedt
  0 siblings, 0 replies; 10+ messages in thread
From: Steven Rostedt @ 2021-10-13 14:53 UTC (permalink / raw)
  To: Tzvetomir Stoyanov; +Cc: Linux Trace Devel

On Wed, 13 Oct 2021 12:48:55 +0300
Tzvetomir Stoyanov <tz.stoyanov@gmail.com> wrote:

> On Wed, Oct 13, 2021 at 6:04 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Thu, 23 Sep 2021 12:45:24 +0300
> > "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com> wrote:  
> > > @@ -175,6 +184,12 @@ static int kvm_open_vcpu_dir(struct kvm_clock_sync *kvm, int cpu, char *dir_str)
> > >                                        dir_str, entry->d_name);
> > >                               kvm->vcpu_scalings[cpu] = strdup(path);
> > >                       }
> > > +                     if (!strncmp(entry->d_name, KVM_DEBUG_FRACTION_FILE,
> > > +                                  strlen(KVM_DEBUG_FRACTION_FILE))) {  
> >
> > I'm curious, why not just use strcmp(), or does "d_name" have more
> > characters than the fraction file?  
> 
> Just to be on the safe side. Currently the fraction file is with the
> longest name, but that may change in the future.\\


I'm confused, what do you mean on the safe side? What are you protecting
against? In other words, what can possibly go wrong? And I notice you do
this for all of them. I don't get it.

The only thing that this is protecting against, is if the file name becomes
"tsc-scaling-ratio-frac-bits-some-garbage-here" then it it will still
match. But other than that, you are not protecting against anything.

-- Steve


> 
> >  
> > > +                             snprintf(path, sizeof(path), "%s/%s",
> > > +                                      dir_str, entry->d_name);
> > > +                             kvm->vcpu_frac[cpu] = strdup(path);
> > > +                     }
> > >               }
> > >       }

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-10-13 14:53 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-23  9:45 [PATCH 0/3] trace-cmd: Align guest TSC calculation with the kernel Tzvetomir Stoyanov (VMware)
2021-09-23  9:45 ` [PATCH 1/3] trace-cmd: Extend host-guest time sync with fraction bits Tzvetomir Stoyanov (VMware)
2021-10-13  3:04   ` Steven Rostedt
2021-10-13  9:48     ` Tzvetomir Stoyanov
2021-10-13 14:53       ` Steven Rostedt
2021-09-23  9:45 ` [PATCH 2/3] trace-cmd: Read and use fraction bits from TRACECMD_OPTION_TIME_SHIFT Tzvetomir Stoyanov (VMware)
2021-10-13  3:14   ` Steven Rostedt
2021-10-13 10:03     ` Tzvetomir Stoyanov
2021-09-23  9:45 ` [PATCH 3/3] trace-cmd: Dump " Tzvetomir Stoyanov (VMware)
2021-10-13  3:15   ` Steven Rostedt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.