Linux-Trace-Devel Archive on lore.kernel.org
* [PATCH v11 0/9] trace-cmd: Timestamps sync between host and guest machines, relying on vsock events.
@ 2019-04-25 11:05 Tzvetomir Stoyanov
  2019-04-25 11:05 ` [PATCH v11 1/9] trace-cmd: Implemented new lib API: tracecmd_local_events_system() Tzvetomir Stoyanov
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: Tzvetomir Stoyanov @ 2019-04-25 11:05 UTC (permalink / raw)
  To: rostedt; +Cc: linux-trace-devel

[
 v11 changes:
  - Rebased on top of Slavomir's v10 "Add VM kernel tracing over vsockets and FIFOs"
  - Addressed Slavomir's comments from version 10 of the patch series.

 v10 changes:
  - Fixed broken compilation: the call to timestamp_correction_calc() in timestamp_correct()
    was garbled.
  - Replaced deprecated tep_data_event_from_type() API with tep_find_event().
  - Fixed a warning on assignment const to non const.

 v9 changes:
  - Fixed implementation of binary search algorithm in timestamp_correct()

 v8 changes:
  - Added rmdir() call in tracecmd_remove_instance(), to completely remove the instance.
    However, there is an issue with deleting the instances using rmdir(), which is being investigated.
  - A few changes in read_qemu_guests_pids(), timestamp_correct(), tsync_offset_load(),
    tracecmd_clock_context_new() and find_raw_events(), suggested by Slavomir.

 v7 changes:
  - Added warning messages in case time synchronization cannot be negotiated or fails.
  - A few optimizations and checks in read_qemu_guests_pids(), tsync_offset_load(),
    and find_raw_events(), suggested by Slavomir Kaslev.
  - Reworked timestamp_correct() to not use static variables.
  - Check TRACECMD_OPTION_TIME_SHIFT before reading time sync samples from the trace.dat file

 v6 changes:
  - Refactored tracecmd_msg_snd_time_sync() and tracecmd_msg_rcv_time_sync() functions:
    moved all time sync calculation logic into separate functions in the trace-timesync.c file.
  - Defined TSYNC_PROBE, TSYNC_REQ and TSYNC_RESP messages, in order to make the time sync
    protocol comprehensible.
  - Addressed Steven Rostedt's comments.
  - Addressed Slavomir Kaslev's comments.

 v5 changes:
  - Rebased to Slavomir's v8 "Add VM kernel tracing over vsockets and FIFOs"
    patch series.
  - Implemented an algorithm for time drift correction.
  - Addressed Slavomir's comments.
  - Refactored the code: moved all time sync specific implementation in trace-timesync.c
  - Isolated all hardcoded event specific stuff in a structure, so it could be easily
    moved to external plugins.
  - Added a check for VSOCK support: do not perform vsock dependent time synchronisation
    in case there is no VSOCK support.

 v4 changes:
  - Removed the implementation of PTP-like algorithm. The current
    logic relies on matching time stamps of kvm_exit/virtio_transport_recv_pkt
    events on host to virtio_transport_alloc_pkt/vp_notify events on guest.
  - Rebased to Slavomir's v7 "Add VM kernel tracing over vsockets and FIFOs"
    patch series.
  - Decreased the number of time sync probes from 5000 to 300.
  - Addressed Steven Rostedt's comments.
  - Code cleanup.

 v3 changes:
  - Removed any magic constants, used in the PTP-like algorithm,
    as Slavomir Kaslev suggested.
  - Implemented new algorithm, based on mapping kvm_exit events
    in host context to vsock_send events in guest context,
    suggested by Steven Rostedt.

 v2 changes:
  - Addressed Steven Rostedt's comments.
  - Modified PTP-like timestamps sync algorithm to gain more accuracy, with the
    help of Yordan Karadzhov and Slavomir Kaslev.
]

POC implementation of an algorithm for synchronizing timestamps between guest and host machines.
The algorithm relies on matching the timestamps of guest and host vsock events.

The patch series depends on Slavomir's changes, introduced by the v10 patch series
"Add VM kernel tracing over vsockets and FIFOs".
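
The cover letter does not spell out how a matched pair of events becomes a
corrected timestamp. As a rough sketch only, and not the code from
trace-timesync.c: assuming each sync sample records a guest timestamp together
with the host-minus-guest offset measured at that moment (struct ts_sample and
ts_correct() are hypothetical names), a guest timestamp could be corrected with
a binary search for the two surrounding samples, followed by linear
interpolation of the offset to account for drift:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sync sample: at guest time ts, host time was ts + offset. */
struct ts_sample {
	long long ts;
	long long offset;
};

/*
 * Translate a guest timestamp into host time.
 * The samples array must be sorted by ts, in ascending order.
 * Timestamps outside the sampled range use the nearest sample's offset.
 */
static long long ts_correct(const struct ts_sample *samples, size_t count,
			    long long ts)
{
	long long span, delta;
	size_t lo = 0, hi, mid;

	assert(samples && count > 0);
	hi = count - 1;

	if (ts <= samples[lo].ts)
		return ts + samples[lo].offset;
	if (ts >= samples[hi].ts)
		return ts + samples[hi].offset;

	/* Invariant: samples[lo].ts <= ts < samples[hi].ts */
	while (hi - lo > 1) {
		mid = lo + (hi - lo) / 2;
		if (samples[mid].ts <= ts)
			lo = mid;
		else
			hi = mid;
	}

	/* Interpolate the offset between the two neighbouring samples. */
	span = samples[hi].ts - samples[lo].ts;
	delta = samples[hi].offset - samples[lo].offset;
	return ts + samples[lo].offset + delta * (ts - samples[lo].ts) / span;
}
```

The interpolation step is an assumption about how drift might be handled; the
multiplication can overflow for very large timestamp spans, which a real
implementation would have to guard against.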

Tzvetomir Stoyanov (9):
  trace-cmd: Implemented new lib API: tracecmd_local_events_system()
  trace-cmd: Added support for negative time offsets in trace.dat file
  trace-cmd: Fix tracecmd_read_page_record() to read more than one event
  trace-cmd: Added implementation of htonll() and ntohll()
  trace-cmd: Refactored few functions in trace-record.c
  trace-cmd: Find and store pids of tasks, which run virtual CPUs of
    given VM
  trace-cmd: Implemented new API tracecmd_add_option_v()
  trace-cmd: Implemented new option in trace.dat file:
    TRACECMD_OPTION_TIME_SHIFT
  trace-cmd [POC]: Implemented timestamps synch algorithm, using vsock
    events.

 include/trace-cmd/trace-cmd.h    |  31 +-
 include/traceevent/event-parse.h |   1 +
 lib/trace-cmd/trace-input.c      | 145 +++++-
 lib/trace-cmd/trace-util.c       |  98 ++--
 tracecmd/Makefile                |   1 +
 tracecmd/include/trace-local.h   |  43 +-
 tracecmd/include/trace-msg.h     |  10 +
 tracecmd/trace-agent.c           |  13 +-
 tracecmd/trace-msg.c             | 209 +++++++-
 tracecmd/trace-output.c          | 117 ++++-
 tracecmd/trace-read.c            |   4 +-
 tracecmd/trace-record.c          | 229 +++++++--
 tracecmd/trace-timesync.c        | 808 +++++++++++++++++++++++++++++++
 13 files changed, 1575 insertions(+), 134 deletions(-)
 create mode 100644 tracecmd/trace-timesync.c

-- 
2.20.1



* [PATCH v11 1/9] trace-cmd: Implemented new lib API: tracecmd_local_events_system()
  2019-04-25 11:05 [PATCH v11 0/9] trace-cmd: Timestamps sync between host and guest machines, relying on vsock events Tzvetomir Stoyanov
@ 2019-04-25 11:05 ` Tzvetomir Stoyanov
  2019-04-25 11:05 ` [PATCH v11 2/9] trace-cmd: Added support for negative time offsets in trace.dat file Tzvetomir Stoyanov
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Tzvetomir Stoyanov @ 2019-04-25 11:05 UTC (permalink / raw)
  To: rostedt; +Cc: linux-trace-devel

The new tracecmd lib API tracecmd_local_events_system() creates
a tep handle and initializes it with the events of the
specified subsystems.
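
The filter is a NULL-terminated array of subsystem names. The following
self-contained sketch mirrors the contains() helper from this patch and adds a
hypothetical wrapper, want_system(), to show one way the filter could be
interpreted. Treating a NULL array as "no filter, load every subsystem" is an
assumption, suggested by tracecmd_local_events() passing NULL; the subsystem
names in the comment are examples only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if name is found in the NULL-terminated array names. */
static bool contains(const char *name, const char * const *names)
{
	assert(name);
	if (!names)
		return false;
	for (; *names; names++)
		if (strcmp(name, *names) == 0)
			return true;
	return false;
}

/*
 * Hypothetical wrapper: decide whether the events of a subsystem
 * should be loaded. A NULL filter is assumed to mean "load all".
 */
static bool want_system(const char *system, const char * const *sys_names)
{
	return !sys_names || contains(system, sys_names);
}

/*
 * A caller of the new API would build the filter like this:
 *
 *	const char * const systems[] = { "kvm", "vsock", NULL };
 *	tep = tracecmd_local_events_system(tracing_dir, systems);
 */
```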

Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
---
 include/trace-cmd/trace-cmd.h |  2 +
 lib/trace-cmd/trace-util.c    | 98 ++++++++++++++++++++++++-----------
 2 files changed, 69 insertions(+), 31 deletions(-)

diff --git a/include/trace-cmd/trace-cmd.h b/include/trace-cmd/trace-cmd.h
index 094d8d2..c8a145b 100644
--- a/include/trace-cmd/trace-cmd.h
+++ b/include/trace-cmd/trace-cmd.h
@@ -32,6 +32,8 @@ void tracecmd_unload_plugins(struct tep_plugin_list *list, struct tep_handle *pe
 char **tracecmd_event_systems(const char *tracing_dir);
 char **tracecmd_system_events(const char *tracing_dir, const char *system);
 struct tep_handle *tracecmd_local_events(const char *tracing_dir);
+struct tep_handle *tracecmd_local_events_system(const char *tracing_dir,
+						const char * const *sys_names);
 int tracecmd_fill_local_events(const char *tracing_dir,
 			       struct tep_handle *pevent, int *parsing_failures);
 char **tracecmd_local_plugins(const char *tracing_dir);
diff --git a/lib/trace-cmd/trace-util.c b/lib/trace-cmd/trace-util.c
index 190cf74..163b862 100644
--- a/lib/trace-cmd/trace-util.c
+++ b/lib/trace-cmd/trace-util.c
@@ -1120,39 +1120,20 @@ static int read_header(struct tep_handle *pevent, const char *events_dir)
 	return ret;
 }
 
-/**
- * tracecmd_local_events - create a pevent from the events on system
- * @tracing_dir: The directory that contains the events.
- *
- * Returns a pevent structure that contains the pevents local to
- * the system.
- */
-struct tep_handle *tracecmd_local_events(const char *tracing_dir)
+static bool contains(const char *name, const char * const *names)
 {
-	struct tep_handle *pevent = NULL;
-
-	pevent = tep_alloc();
-	if (!pevent)
-		return NULL;
-
-	if (tracecmd_fill_local_events(tracing_dir, pevent, NULL)) {
-		tep_free(pevent);
-		pevent = NULL;
-	}
-
-	return pevent;
+	if (!names)
+		return false;
+	for (; *names; names++)
+		if (strcmp(name, *names) == 0)
+			return true;
+	return false;
 }
 
-/**
- * tracecmd_fill_local_events - Fill a pevent with the events on system
- * @tracing_dir: The directory that contains the events.
- * @pevent: Allocated pevent which will be filled
- * @parsing_failures: return number of failures while parsing the event files
- *
- * Returns whether the operation succeeded
- */
-int tracecmd_fill_local_events(const char *tracing_dir,
-			       struct tep_handle *pevent, int *parsing_failures)
+static int tracecmd_fill_local_events_system(const char *tracing_dir,
+					     struct tep_handle *pevent,
+					     const char * const *sys_names,
+					     int *parsing_failures)
 {
 	struct dirent *dent;
 	char *events_dir;
@@ -1194,7 +1175,8 @@ int tracecmd_fill_local_events(const char *tracing_dir,
 		if (strcmp(name, ".") == 0 ||
 		    strcmp(name, "..") == 0)
 			continue;
-
+		if (sys_names && !contains(name, sys_names))
+			continue;
 		sys = append_file(events_dir, name);
 		ret = stat(sys, &st);
 		if (ret < 0 || !S_ISDIR(st.st_mode)) {
@@ -1220,6 +1202,60 @@ int tracecmd_fill_local_events(const char *tracing_dir,
 	return ret;
 }
 
+/**
+ * tracecmd_local_events_system - create a tep from the events of the specified subsystem.
+ *
+ * @tracing_dir: The directory that contains the events.
+ * @sys_names: Array of system names, to load the events from.
+ * The last element from the array must be NULL
+ *
+ * Returns a tep structure that contains the tep local to
+ * the system.
+ */
+struct tep_handle *tracecmd_local_events_system(const char *tracing_dir,
+						const char * const *sys_names)
+{
+	struct tep_handle *tep = NULL;
+
+	tep = tep_alloc();
+	if (!tep)
+		return NULL;
+
+	if (tracecmd_fill_local_events_system(tracing_dir, tep, sys_names, NULL)) {
+		tep_free(tep);
+		tep = NULL;
+	}
+
+	return tep;
+}
+
+/**
+ * tracecmd_local_events - create a pevent from the events on system
+ * @tracing_dir: The directory that contains the events.
+ *
+ * Returns a pevent structure that contains the pevents local to
+ * the system.
+ */
+struct tep_handle *tracecmd_local_events(const char *tracing_dir)
+{
+	return tracecmd_local_events_system(tracing_dir, NULL);
+}
+
+/**
+ * tracecmd_fill_local_events - Fill a pevent with the events on system
+ * @tracing_dir: The directory that contains the events.
+ * @pevent: Allocated pevent which will be filled
+ * @parsing_failures: return number of failures while parsing the event files
+ *
+ * Returns whether the operation succeeded
+ */
+int tracecmd_fill_local_events(const char *tracing_dir,
+			       struct tep_handle *pevent, int *parsing_failures)
+{
+	return tracecmd_fill_local_events_system(tracing_dir, pevent,
+						 NULL, parsing_failures);
+}
+
 /**
  * tracecmd_local_plugins - returns an array of available tracer plugins
  * @tracing_dir: The directory that contains the tracing directory
-- 
2.20.1



* [PATCH v11 2/9] trace-cmd: Added support for negative time offsets in trace.dat file
  2019-04-25 11:05 [PATCH v11 0/9] trace-cmd: Timestamps sync between host and guest machines, relying on vsock events Tzvetomir Stoyanov
  2019-04-25 11:05 ` [PATCH v11 1/9] trace-cmd: Implemented new lib API: tracecmd_local_events_system() Tzvetomir Stoyanov
@ 2019-04-25 11:05 ` Tzvetomir Stoyanov
  2019-04-25 11:05 ` [PATCH v11 3/9] trace-cmd: Fix tracecmd_read_page_record() to read more than one event Tzvetomir Stoyanov
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Tzvetomir Stoyanov @ 2019-04-25 11:05 UTC (permalink / raw)
  To: rostedt; +Cc: linux-trace-devel

When synchronizing timestamps between different machines, there are cases
when the time offset is negative. This patch changes the way the time offset is
written to and read from the trace.dat file: as a signed decimal, instead of hex.
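
The hunks below only show the type changes, so as an illustration of the
signed decimal round trip (offset_to_str() and offset_from_str() are
hypothetical helpers, not functions from trace-cmd), the conversion could look
like this:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Format a possibly negative offset as signed decimal text. */
static int offset_to_str(long long offset, char *buf, size_t size)
{
	int n;

	assert(buf && size > 0);
	n = snprintf(buf, size, "%lld", offset);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Parse signed decimal text; rejects hex and trailing garbage. */
static int offset_from_str(const char *str, long long *offset)
{
	long long val;
	char *end;

	assert(str && offset);
	errno = 0;
	val = strtoll(str, &end, 10);
	if (errno || end == str || *end != '\0')
		return -1;
	*offset = val;
	return 0;
}
```

With an unsigned hex representation, a negative offset would be read back as a
very large positive value, which is what the switch to long long avoids.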

Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
---
 include/trace-cmd/trace-cmd.h | 2 +-
 lib/trace-cmd/trace-input.c   | 6 +++---
 tracecmd/trace-read.c         | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/trace-cmd/trace-cmd.h b/include/trace-cmd/trace-cmd.h
index c8a145b..bf32c4d 100644
--- a/include/trace-cmd/trace-cmd.h
+++ b/include/trace-cmd/trace-cmd.h
@@ -128,7 +128,7 @@ int tracecmd_is_buffer_instance(struct tracecmd_input *handle);
 void tracecmd_create_top_instance(char *name);
 void tracecmd_remove_instances(void);
 
-void tracecmd_set_ts_offset(struct tracecmd_input *handle, unsigned long long offset);
+void tracecmd_set_ts_offset(struct tracecmd_input *handle, long long offset);
 void tracecmd_set_ts2secs(struct tracecmd_input *handle, unsigned long long hz);
 
 void tracecmd_print_events(struct tracecmd_input *handle, const char *regex);
diff --git a/lib/trace-cmd/trace-input.c b/lib/trace-cmd/trace-input.c
index ba20ef1..6624932 100644
--- a/lib/trace-cmd/trace-input.c
+++ b/lib/trace-cmd/trace-input.c
@@ -91,7 +91,7 @@ struct tracecmd_input {
 	bool			read_page;
 	bool			use_pipe;
 	struct cpu_data 	*cpu_data;
-	unsigned long long	ts_offset;
+	long long		ts_offset;
 	double			ts2secs;
 	char *			cpustats;
 	char *			uname;
@@ -2098,7 +2098,7 @@ static int init_cpu(struct tracecmd_input *handle, int cpu)
 }
 
 void tracecmd_set_ts_offset(struct tracecmd_input *handle,
-			    unsigned long long offset)
+			    long long offset)
 {
 	handle->ts_offset = offset;
 }
@@ -2115,7 +2115,7 @@ void tracecmd_set_ts2secs(struct tracecmd_input *handle,
 
 static int handle_options(struct tracecmd_input *handle)
 {
-	unsigned long long offset;
+	long long offset;
 	unsigned short option;
 	unsigned int size;
 	char *cpustats = NULL;
diff --git a/tracecmd/trace-read.c b/tracecmd/trace-read.c
index dbfb3a5..e0b4ea1 100644
--- a/tracecmd/trace-read.c
+++ b/tracecmd/trace-read.c
@@ -58,7 +58,7 @@ static struct list_head handle_list;
 struct input_files {
 	struct list_head	list;
 	const char		*file;
-	unsigned long long	tsoffset;
+	long long		tsoffset;
 	unsigned long long	ts2secs;
 };
 static struct list_head input_files;
@@ -1418,7 +1418,7 @@ void trace_report (int argc, char **argv)
 	struct input_files *inputs;
 	struct handle_list *handles;
 	enum output_type otype;
-	unsigned long long tsoffset = 0;
+	long long tsoffset = 0;
 	unsigned long long ts2secs = 0;
 	unsigned long long ts2sc;
 	int show_stat = 0;
-- 
2.20.1



* [PATCH v11 3/9] trace-cmd: Fix tracecmd_read_page_record() to read more than one event
  2019-04-25 11:05 [PATCH v11 0/9] trace-cmd: Timestamps sync between host and guest machines, relying on vsock events Tzvetomir Stoyanov
  2019-04-25 11:05 ` [PATCH v11 1/9] trace-cmd: Implemented new lib API: tracecmd_local_events_system() Tzvetomir Stoyanov
  2019-04-25 11:05 ` [PATCH v11 2/9] trace-cmd: Added support for negative time offsets in trace.dat file Tzvetomir Stoyanov
@ 2019-04-25 11:05 ` Tzvetomir Stoyanov
  2019-04-25 11:05 ` [PATCH v11 4/9] trace-cmd: Added implementation of htonll() and ntohll() Tzvetomir Stoyanov
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Tzvetomir Stoyanov @ 2019-04-25 11:05 UTC (permalink / raw)
  To: rostedt; +Cc: linux-trace-devel

kbuffer_next_event() returns the next event on the sub buffer.
If last_record is passed in to tracecmd_read_page_record(), the function
initializes the sub buffer and then calls kbuffer_next_event()
(instead of kbuffer_read_event()), so the first event on the sub buffer
is skipped and the second one is returned. As a result, last_record is
never matched if it happens to be the first event on the sub buffer.
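
The difference between the two calls can be shown with a small stand-in
cursor. This is only a model of the behaviour described above: struct cursor,
cur_read(), cur_next() and event_after() are hypothetical and do not use the
real struct kbuffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a sub buffer cursor. */
struct cursor {
	const int *events;
	size_t count;
	size_t pos;
};

/* Like kbuffer_read_event(): return the current event, do not advance. */
static const int *cur_read(struct cursor *c)
{
	assert(c);
	return c->pos < c->count ? &c->events[c->pos] : NULL;
}

/* Like kbuffer_next_event(): advance, then return the new current event. */
static const int *cur_next(struct cursor *c)
{
	assert(c);
	if (c->pos < c->count)
		c->pos++;
	return cur_read(c);
}

/*
 * Return the event that follows last, or NULL if last is not found
 * or is the final event. Starting with cur_read() is what allows
 * last to be matched when it is the first event.
 */
static const int *event_after(struct cursor *c, const int *last)
{
	const int *ptr = cur_read(c);

	while (ptr && ptr < last)
		ptr = cur_next(c);
	if (ptr != last)
		return NULL;
	return cur_next(c);
}
```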

Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
---
 lib/trace-cmd/trace-input.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/lib/trace-cmd/trace-input.c b/lib/trace-cmd/trace-input.c
index 6624932..ea11d62 100644
--- a/lib/trace-cmd/trace-input.c
+++ b/lib/trace-cmd/trace-input.c
@@ -1695,18 +1695,22 @@ tracecmd_read_page_record(struct tep_handle *pevent, void *page, int size,
 			goto out_free;
 		}
 
-		do {
+		ptr = kbuffer_read_event(kbuf, &ts);
+		while (ptr < last_record->data) {
 			ptr = kbuffer_next_event(kbuf, NULL);
 			if (!ptr)
 				break;
-		} while (ptr < last_record->data);
+			if (ptr == last_record->data)
+				break;
+		}
 		if (ptr != last_record->data) {
 			warning("tracecmd_read_page_record: could not find last_record");
 			goto out_free;
 		}
-	}
+		ptr = kbuffer_next_event(kbuf, &ts);
+	} else
+		ptr = kbuffer_read_event(kbuf, &ts);
 
-	ptr = kbuffer_read_event(kbuf, &ts);
 	if (!ptr)
 		goto out_free;
 
-- 
2.20.1



* [PATCH v11 4/9] trace-cmd: Added implementation of htonll() and ntohll()
  2019-04-25 11:05 [PATCH v11 0/9] trace-cmd: Timestamps sync between host and guest machines, relying on vsock events Tzvetomir Stoyanov
                   ` (2 preceding siblings ...)
  2019-04-25 11:05 ` [PATCH v11 3/9] trace-cmd: Fix tracecmd_read_page_record() to read more than one event Tzvetomir Stoyanov
@ 2019-04-25 11:05 ` Tzvetomir Stoyanov
  2019-04-25 11:05 ` [PATCH v11 5/9] trace-cmd: Refactored few functions in trace-record.c Tzvetomir Stoyanov
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Tzvetomir Stoyanov @ 2019-04-25 11:05 UTC (permalink / raw)
  To: rostedt; +Cc: linux-trace-devel

Added implementation of htonll() and ntohll() as
macros, if they are not already defined.
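
The macros below depend on __BYTE_ORDER and __bswap_64 being provided by the
C library headers. For comparison only, a header-independent variant can be
written with shifts, so that it behaves the same on little and big endian
hosts. The names host_to_net64() and net_to_host64() are hypothetical and
chosen to avoid clashing with any existing htonll()/ntohll() definitions:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert a 64-bit value from host to network (big endian) byte order. */
static uint64_t host_to_net64(uint64_t host)
{
	unsigned char bytes[8];
	uint64_t net;
	int i;

	assert(sizeof(net) == sizeof(bytes));
	/* Most significant byte goes first in memory. */
	for (i = 0; i < 8; i++)
		bytes[i] = (unsigned char)(host >> (56 - 8 * i));
	memcpy(&net, bytes, sizeof(net));
	return net;
}

/* Convert a 64-bit value from network (big endian) to host byte order. */
static uint64_t net_to_host64(uint64_t net)
{
	unsigned char bytes[8];
	uint64_t host = 0;
	int i;

	assert(sizeof(net) == sizeof(bytes));
	memcpy(bytes, &net, sizeof(bytes));
	for (i = 0; i < 8; i++)
		host = (host << 8) | bytes[i];
	return host;
}
```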

Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
---
 tracecmd/include/trace-msg.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tracecmd/include/trace-msg.h b/tracecmd/include/trace-msg.h
index b7fe10b..445f799 100644
--- a/tracecmd/include/trace-msg.h
+++ b/tracecmd/include/trace-msg.h
@@ -15,4 +15,14 @@ extern unsigned int page_size;
 void plog(const char *fmt, ...);
 void pdie(const char *fmt, ...);
 
+#ifndef htonll
+# if __BYTE_ORDER == __LITTLE_ENDIAN
+#define htonll(x) __bswap_64(x)
+#define ntohll(x) __bswap_64(x)
+#else
+#define htonll(x) (x)
+#define ntohll(x) (x)
+#endif
+#endif
+
 #endif /* _TRACE_MSG_H_ */
-- 
2.20.1



* [PATCH v11 5/9] trace-cmd: Refactored few functions in trace-record.c
  2019-04-25 11:05 [PATCH v11 0/9] trace-cmd: Timestamps sync between host and guest machines, relying on vsock events Tzvetomir Stoyanov
                   ` (3 preceding siblings ...)
  2019-04-25 11:05 ` [PATCH v11 4/9] trace-cmd: Added implementation of htonll() and ntohll() Tzvetomir Stoyanov
@ 2019-04-25 11:05 ` Tzvetomir Stoyanov
  2019-04-25 11:05 ` [PATCH v11 6/9] trace-cmd: Find and store pids of tasks, which run virtual CPUs of given VM Tzvetomir Stoyanov
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Tzvetomir Stoyanov @ 2019-04-25 11:05 UTC (permalink / raw)
  To: rostedt; +Cc: linux-trace-devel

In order to reuse the code inside the trace-cmd application, a few
functions from trace-record.c are refactored:
  - make_instances() and tracecmd_remove_instances() are split.
New ones are created: tracecmd_make_instance() and tracecmd_remove_instance(),
which are visible outside trace-record.c.
  - The following functions are made non-static: tracecmd_init_instance(),
get_instance_dir(), write_instance_file(), write_tracing_on(),
tracecmd_set_clock().
  - A new function is implemented: tracecmd_local_cpu_count(), an internal
API to get local_cpu_count.

Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
---
 tracecmd/include/trace-local.h |  9 ++++
 tracecmd/trace-record.c        | 88 +++++++++++++++++++---------------
 2 files changed, 59 insertions(+), 38 deletions(-)

diff --git a/tracecmd/include/trace-local.h b/tracecmd/include/trace-local.h
index 71b9249..554f1a3 100644
--- a/tracecmd/include/trace-local.h
+++ b/tracecmd/include/trace-local.h
@@ -235,6 +235,15 @@ void update_first_instance(struct buffer_instance *instance, int topt);
 void show_instance_file(struct buffer_instance *instance, const char *name);
 
 int count_cpus(void);
+void write_tracing_on(struct buffer_instance *instance, int on);
+char *get_instance_dir(struct buffer_instance *instance);
+int write_instance_file(struct buffer_instance *instance,
+			const char *file, const char *str, const char *type);
+void tracecmd_init_instance(struct buffer_instance *instance);
+void tracecmd_make_instance(struct buffer_instance *instance);
+int tracecmd_local_cpu_count(void);
+void tracecmd_set_clock(struct buffer_instance *instance);
+void tracecmd_remove_instance(struct buffer_instance *instance);
 
 /* No longer in event-utils.h */
 void __noreturn die(const char *fmt, ...); /* Can be overriden */
diff --git a/tracecmd/trace-record.c b/tracecmd/trace-record.c
index 29f5784..4a82773 100644
--- a/tracecmd/trace-record.c
+++ b/tracecmd/trace-record.c
@@ -185,7 +185,7 @@ static inline int no_top_instance(void)
 	return first_instance != &top_instance;
 }
 
-static void init_instance(struct buffer_instance *instance)
+void tracecmd_init_instance(struct buffer_instance *instance)
 {
 	instance->event_next = &instance->events;
 }
@@ -309,7 +309,7 @@ static void reset_save_file_cond(const char *file, int prio,
  */
 void add_instance(struct buffer_instance *instance, int cpu_count)
 {
-	init_instance(instance);
+	tracecmd_init_instance(instance);
 	instance->next = buffer_instances;
 	if (first_instance == buffer_instances)
 		first_instance = instance;
@@ -496,7 +496,7 @@ static void add_event(struct buffer_instance *instance, struct event_list *event
 static void reset_event_list(struct buffer_instance *instance)
 {
 	instance->events = NULL;
-	init_instance(instance);
+	tracecmd_init_instance(instance);
 }
 
 static char *get_temp_file(struct buffer_instance *instance, int cpu)
@@ -792,8 +792,7 @@ get_instance_file(struct buffer_instance *instance, const char *file)
 	return path;
 }
 
-static char *
-get_instance_dir(struct buffer_instance *instance)
+char *get_instance_dir(struct buffer_instance *instance)
 {
 	char *buf;
 	char *path;
@@ -837,9 +836,8 @@ static int write_file(const char *file, const char *str, const char *type)
 	return ret;
 }
 
-static int
-write_instance_file(struct buffer_instance *instance,
-		    const char *file, const char *str, const char *type)
+int write_instance_file(struct buffer_instance *instance,
+			const char *file, const char *str, const char *type)
 {
 	char *path;
 	int ret;
@@ -1981,7 +1979,7 @@ static int open_tracing_on(struct buffer_instance *instance)
 	return fd;
 }
 
-static void write_tracing_on(struct buffer_instance *instance, int on)
+void write_tracing_on(struct buffer_instance *instance, int on)
 {
 	int ret;
 	int fd;
@@ -2305,7 +2303,7 @@ void tracecmd_enable_events(void)
 	enable_events(first_instance);
 }
 
-static void set_clock(struct buffer_instance *instance)
+void tracecmd_set_clock(struct buffer_instance *instance)
 {
 	char *path;
 	char *content;
@@ -4442,49 +4440,58 @@ static void clear_func_filters(void)
 	}
 }
 
-static void make_instances(void)
+void tracecmd_make_instance(struct buffer_instance *instance)
 {
-	struct buffer_instance *instance;
 	struct stat st;
 	char *path;
 	int ret;
 
+	path = get_instance_dir(instance);
+	ret = stat(path, &st);
+	if (ret < 0) {
+		ret = mkdir(path, 0777);
+		if (ret < 0)
+			die("mkdir %s", path);
+	} else
+		/* Don't delete instances that already exist */
+		instance->flags |= BUFFER_FL_KEEP;
+	tracecmd_put_tracing_file(path);
+
+}
+
+static void make_instances(void)
+{
+	struct buffer_instance *instance;
+
 	for_each_instance(instance) {
 		if (is_guest(instance))
 			continue;
+		tracecmd_make_instance(instance);
+	}
+}
 
-		path = get_instance_dir(instance);
-		ret = stat(path, &st);
-		if (ret < 0) {
-			ret = mkdir(path, 0777);
-			if (ret < 0)
-				die("mkdir %s", path);
-		} else
-			/* Don't delete instances that already exist */
-			instance->flags |= BUFFER_FL_KEEP;
-		tracecmd_put_tracing_file(path);
+void tracecmd_remove_instance(struct buffer_instance *instance)
+{
+	char *path;
+
+	if (instance->tracing_on_fd > 0) {
+		close(instance->tracing_on_fd);
+		instance->tracing_on_fd = 0;
 	}
+	path = get_instance_dir(instance);
+	rmdir(path);
+	tracecmd_put_tracing_file(path);
 }
 
 void tracecmd_remove_instances(void)
 {
 	struct buffer_instance *instance;
-	char *path;
-	int ret;
 
 	for_each_instance(instance) {
 		/* Only delete what we created */
 		if (is_guest(instance) || (instance->flags & BUFFER_FL_KEEP))
 			continue;
-		if (instance->tracing_on_fd > 0) {
-			close(instance->tracing_on_fd);
-			instance->tracing_on_fd = 0;
-		}
-		path = get_instance_dir(instance);
-		ret = rmdir(path);
-		if (ret < 0)
-			die("rmdir %s", path);
-		tracecmd_put_tracing_file(path);
+		tracecmd_remove_instance(instance);
 	}
 }
 
@@ -4979,7 +4986,7 @@ void trace_stop(int argc, char **argv)
 	int topt = 0;
 	struct buffer_instance *instance = &top_instance;
 
-	init_instance(instance);
+	tracecmd_init_instance(instance);
 
 	for (;;) {
 		int c;
@@ -5020,7 +5027,7 @@ void trace_restart(int argc, char **argv)
 	int topt = 0;
 	struct buffer_instance *instance = &top_instance;
 
-	init_instance(instance);
+	tracecmd_init_instance(instance);
 
 	for (;;) {
 		int c;
@@ -5062,7 +5069,7 @@ void trace_reset(int argc, char **argv)
 	int topt = 0;
 	struct buffer_instance *instance = &top_instance;
 
-	init_instance(instance);
+	tracecmd_init_instance(instance);
 
 	/* if last arg is -a, then -b and -d apply to all instances */
 	int last_specified_all = 0;
@@ -5146,11 +5153,16 @@ static void init_common_record_context(struct common_record_context *ctx,
 	memset(ctx, 0, sizeof(*ctx));
 	ctx->instance = &top_instance;
 	ctx->curr_cmd = curr_cmd;
-	init_instance(ctx->instance);
+	tracecmd_init_instance(ctx->instance);
 	local_cpu_count = count_cpus();
 	ctx->instance->cpu_count = local_cpu_count;
 }
 
+int tracecmd_local_cpu_count(void)
+{
+	return local_cpu_count;
+}
+
 #define IS_EXTRACT(ctx) ((ctx)->curr_cmd == CMD_extract)
 #define IS_START(ctx) ((ctx)->curr_cmd == CMD_start)
 #define IS_STREAM(ctx) ((ctx)->curr_cmd == CMD_stream)
@@ -5728,7 +5740,7 @@ static void record_trace(int argc, char **argv,
 	tracecmd_disable_all_tracing(1);
 
 	for_all_instances(instance)
-		set_clock(instance);
+		tracecmd_set_clock(instance);
 
 	/* Record records the date first */
 	if (ctx->date &&
-- 
2.20.1



* [PATCH v11 6/9] trace-cmd: Find and store pids of tasks, which run virtual CPUs of given VM
  2019-04-25 11:05 [PATCH v11 0/9] trace-cmd: Timestamps sync between host and guest machines, relying on vsock events Tzvetomir Stoyanov
                   ` (4 preceding siblings ...)
  2019-04-25 11:05 ` [PATCH v11 5/9] trace-cmd: Refactored few functions in trace-record.c Tzvetomir Stoyanov
@ 2019-04-25 11:05 ` Tzvetomir Stoyanov
  2019-04-25 11:05 ` [PATCH v11 7/9] trace-cmd: Implemented new API tracecmd_add_option_v() Tzvetomir Stoyanov
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Tzvetomir Stoyanov @ 2019-04-25 11:05 UTC (permalink / raw)
  To: rostedt; +Cc: linux-trace-devel

In order to match host and guest events, a mapping between each guest VCPU
and the host task running that VCPU is needed. Extended the existing
struct guest to hold such a mapping and added logic in the read_qemu_guests()
function to initialize it. Implemented a new internal API,
get_guest_vcpu_pid(), to retrieve the VCPU-task mapping for a given VM.
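
The mapping is built from the comm name of each QEMU thread, which this patch
expects to start with "CPU " followed by the VCPU number. A stricter variant of
that parsing step, using strtol() instead of atoi() so that a name without
digits is rejected rather than read as VCPU 0, might look like the sketch
below. parse_vcpu_comm() is a hypothetical helper, and the "CPU 3/KVM" thread
name used to exercise it is an assumption about QEMU's naming.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define VCPUS_MAX 256

/*
 * Extract the VCPU number from a thread's comm string.
 * Returns the VCPU index, or -1 if comm does not name a VCPU thread
 * or the index is outside [0, VCPUS_MAX).
 */
static int parse_vcpu_comm(const char *comm)
{
	char *end;
	long vcpu;

	assert(comm);
	if (strncmp(comm, "CPU ", 4) != 0)
		return -1;

	errno = 0;
	vcpu = strtol(comm + 4, &end, 10);
	if (errno || end == comm + 4)
		return -1;	/* no digits after the prefix */
	if (vcpu < 0 || vcpu >= VCPUS_MAX)
		return -1;
	return (int)vcpu;
}
```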

Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
---
 tracecmd/include/trace-local.h |  1 +
 tracecmd/trace-record.c        | 57 ++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/tracecmd/include/trace-local.h b/tracecmd/include/trace-local.h
index 554f1a3..7f4d676 100644
--- a/tracecmd/include/trace-local.h
+++ b/tracecmd/include/trace-local.h
@@ -245,6 +245,7 @@ int tracecmd_local_cpu_count(void);
 void tracecmd_set_clock(struct buffer_instance *instance);
 void tracecmd_remove_instance(struct buffer_instance *instance);
 
+int get_guest_vcpu_pid(unsigned int guest_cid, unsigned int guest_vcpu);
 /* No longer in event-utils.h */
 void __noreturn die(const char *fmt, ...); /* Can be overriden */
 void *malloc_or_die(unsigned int size); /* Can be overridden */
diff --git a/tracecmd/trace-record.c b/tracecmd/trace-record.c
index 4a82773..59a4170 100644
--- a/tracecmd/trace-record.c
+++ b/tracecmd/trace-record.c
@@ -2789,10 +2789,12 @@ static bool is_digits(const char *s)
 	return true;
 }
 
+#define VCPUS_MAX 256
 struct guest {
 	char *name;
 	int cid;
 	int pid;
+	int cpu_pid[VCPUS_MAX];
 };
 
 static struct guest *guests;
@@ -2810,6 +2812,46 @@ static char *get_qemu_guest_name(char *arg)
 	return arg;
 }
 
+static void read_qemu_guests_pids(char *guest_task, struct guest *guest)
+{
+	struct dirent *entry;
+	char path[PATH_MAX];
+	char *buf = NULL;
+	size_t n = 0;
+	int vcpu;
+	DIR *dir;
+	FILE *f;
+
+	snprintf(path, sizeof(path), "/proc/%s/task", guest_task);
+	dir = opendir(path);
+	if (!dir)
+		return;
+
+	while ((entry = readdir(dir))) {
+		if (!(entry->d_type == DT_DIR && is_digits(entry->d_name)))
+			continue;
+
+		snprintf(path, sizeof(path), "/proc/%s/task/%s/comm",
+			 guest_task, entry->d_name);
+		f = fopen(path, "r");
+		if (!f)
+			continue;
+		if (getline(&buf, &n, f) < 0)
+			goto next;
+		if (strncmp(buf, "CPU ", 4) != 0)
+			goto next;
+
+		vcpu = atoi(buf+4);
+		if (!(vcpu >= 0 && vcpu < VCPUS_MAX))
+			goto next;
+		guest->cpu_pid[vcpu] = atoi(entry->d_name);
+
+next:
+		fclose(f);
+	}
+	free(buf);
+}
+
 static void read_qemu_guests(void)
 {
 	static bool initialized;
@@ -2871,6 +2913,8 @@ static void read_qemu_guests(void)
 		if (!is_qemu)
 			goto next;
 
+		read_qemu_guests_pids(entry->d_name, &guest);
+
 		guests = realloc(guests, (guests_len + 1) * sizeof(*guests));
 		if (!guests)
 			die("Can not allocate guest buffer");
@@ -2986,6 +3030,19 @@ static char *parse_guest_name(char *guest, int *cid, int *port)
 	return guest;
 }
 
+int get_guest_vcpu_pid(unsigned int guest_cid, unsigned int guest_vcpu)
+{
+	int i;
+
+	if (!guests || guest_vcpu >= VCPUS_MAX)
+		return -1;
+
+	for (i = 0; i < guests_len; i++)
+		if (guest_cid == guests[i].cid)
+			return guests[i].cpu_pid[guest_vcpu];
+	return -1;
+}
+
 static void set_prio(int prio)
 {
 	struct sched_param sp;
-- 
2.20.1



* [PATCH v11 7/9] trace-cmd: Implemented new API tracecmd_add_option_v()
  2019-04-25 11:05 [PATCH v11 0/9] trace-cmd: Timestamps sync between host and guest machines, relying on vsock events Tzvetomir Stoyanov
                   ` (5 preceding siblings ...)
  2019-04-25 11:05 ` [PATCH v11 6/9] trace-cmd: Find and store pids of tasks, which run virtual CPUs of given VM Tzvetomir Stoyanov
@ 2019-04-25 11:05 ` Tzvetomir Stoyanov
  2019-04-25 11:05 ` [PATCH v11 8/9] trace-cmd: Implemented new option in trace.dat file: TRACECMD_OPTION_TIME_SHIFT Tzvetomir Stoyanov
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Tzvetomir Stoyanov @ 2019-04-25 11:05 UTC (permalink / raw)
  To: rostedt; +Cc: linux-trace-devel

This patch implements a new tracecmd API, tracecmd_add_option_v().
It adds a new option to trace.dat, similar to tracecmd_add_option(),
but the option's data is passed as a list of buffers. The standard
struct iovec is used as the input parameter, containing the option's
data buffers.
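
To illustrate what passing the data as a list of buffers means, the sketch
below gathers an array of struct iovec into one contiguous buffer, which is
roughly what a single-buffer option would need. flatten_iovec() is a
hypothetical helper written for this note; how tracecmd_add_option_v() itself
combines the buffers is not assumed here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Copy all buffers described by vector into one newly allocated buffer.
 * On success returns the buffer (to be released with free()) and stores
 * the total length in *size. Returns NULL if the allocation fails.
 */
static void *flatten_iovec(const struct iovec *vector, int count, size_t *size)
{
	size_t total = 0;
	char *buf, *p;
	int i;

	assert(vector && count > 0 && size);

	for (i = 0; i < count; i++)
		total += vector[i].iov_len;

	/* Allocate at least one byte so an empty option is not an error. */
	buf = malloc(total ? total : 1);
	if (!buf)
		return NULL;

	for (p = buf, i = 0; i < count; i++) {
		if (!vector[i].iov_len)
			continue;
		memcpy(p, vector[i].iov_base, vector[i].iov_len);
		p += vector[i].iov_len;
	}

	*size = total;
	return buf;
}
```

A caller would fill one struct iovec per field, for example a sample count
followed by arrays of timestamps and offsets, without first packing them into
a temporary structure.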

Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
---
 include/trace-cmd/trace-cmd.h    |   5 ++
 include/traceevent/event-parse.h |   1 +
 tracecmd/trace-output.c          | 117 ++++++++++++++++++++++++++-----
 3 files changed, 106 insertions(+), 17 deletions(-)

diff --git a/include/trace-cmd/trace-cmd.h b/include/trace-cmd/trace-cmd.h
index bf32c4d..252d342 100644
--- a/include/trace-cmd/trace-cmd.h
+++ b/include/trace-cmd/trace-cmd.h
@@ -247,11 +247,16 @@ struct tracecmd_output *tracecmd_create_init_file_override(const char *output_fi
 struct tracecmd_option *tracecmd_add_option(struct tracecmd_output *handle,
 					    unsigned short id, int size,
 					    const void *data);
+struct tracecmd_option *
+tracecmd_add_option_v(struct tracecmd_output *handle,
+		    unsigned short id, const struct iovec *vector, int count);
+
 struct tracecmd_option *tracecmd_add_buffer_option(struct tracecmd_output *handle,
 						   const char *name, int cpus);
 
 int tracecmd_write_cpus(struct tracecmd_output *handle, int cpus);
 int tracecmd_write_options(struct tracecmd_output *handle);
+int tracecmd_append_options(struct tracecmd_output *handle);
 int tracecmd_update_option(struct tracecmd_output *handle,
 			   struct tracecmd_option *option, int size,
 			   const void *data);
diff --git a/include/traceevent/event-parse.h b/include/traceevent/event-parse.h
index 5e0fd19..62057b3 100644
--- a/include/traceevent/event-parse.h
+++ b/include/traceevent/event-parse.h
@@ -11,6 +11,7 @@
 #include <stdio.h>
 #include <regex.h>
 #include <string.h>
+#include <sys/uio.h>
 
 #include "trace-seq.h"
 
diff --git a/tracecmd/trace-output.c b/tracecmd/trace-output.c
index 33d6ce3..f7a2791 100644
--- a/tracecmd/trace-output.c
+++ b/tracecmd/trace-output.c
@@ -883,21 +883,23 @@ static struct tracecmd_output *create_file(const char *output_file,
 }
 
 /**
- * tracecmd_add_option - add options to the file
+ * tracecmd_add_option_v - add options to the file
  * @handle: the output file handle name
  * @id: the id of the option
- * @size: the size of the option data
- * @data: the data to write to the file.
+ * @vector: array of struct iovec entries, describing the data to write in the file
+ * @count: number of items in the vector array
  *
  * Returns handle to update option if needed.
  *  Just the content can be updated, with smaller or equal to
  *  content than the specified size.
  */
 struct tracecmd_option *
-tracecmd_add_option(struct tracecmd_output *handle,
-		    unsigned short id, int size, const void *data)
+tracecmd_add_option_v(struct tracecmd_output *handle,
+		    unsigned short id, const struct iovec *vector, int count)
 {
 	struct tracecmd_option *option;
+	char *data = NULL;
+	int i, size = 0;
 
 	/*
 	 * We can only add options before they were written.
@@ -906,32 +908,63 @@ tracecmd_add_option(struct tracecmd_output *handle,
 	if (handle->options_written)
 		return NULL;
 
-	handle->nr_options++;
+	for (i = 0; i < count; i++)
+		size += vector[i].iov_len;
+
+	/* Some IDs (like TRACECMD_OPTION_TRACECLOCK) pass vector with 0 / NULL data */
+	if (size) {
+		data = malloc(size);
+		if (!data) {
+			warning("Insufficient memory");
+			return NULL;
+		}
+	}
 
 	option = malloc(sizeof(*option));
 	if (!option) {
 		warning("Could not allocate space for option");
+		free(data);
 		return NULL;
 	}
 
-	option->id = id;
-	option->size = size;
-	option->data = malloc(size);
-	if (!option->data) {
-		warning("Insufficient memory");
-		free(option);
-		return NULL;
+	handle->nr_options++;
+	option->data = data;
+	for (i = 0; i < count; i++) {
+		if (vector[i].iov_base && vector[i].iov_len) {
+			memcpy(data, vector[i].iov_base, vector[i].iov_len);
+			data += vector[i].iov_len;
+		}
 	}
-
-	/* Some IDs (like TRACECMD_OPTION_TRACECLOCK) pass 0 / NULL data */
-	if (size)
-		memcpy(option->data, data, size);
+	option->size = size;
+	option->id = id;
 
 	list_add_tail(&option->list, &handle->options);
 
 	return option;
 }
 
+/**
+ * tracecmd_add_option - add options to the file
+ * @handle: the output file handle name
+ * @id: the id of the option
+ * @size: the size of the option data
+ * @data: the data to write to the file.
+ *
+ * Returns handle to update option if needed.
+ *  Just the content can be updated, with smaller or equal to
+ *  content than the specified size.
+ */
+struct tracecmd_option *
+tracecmd_add_option(struct tracecmd_output *handle,
+		    unsigned short id, int size, const void *data)
+{
+	struct iovec vect;
+
+	vect.iov_base = (void *) data;
+	vect.iov_len = size;
+	return tracecmd_add_option_v(handle, id, &vect, 1);
+}
+
 int tracecmd_write_cpus(struct tracecmd_output *handle, int cpus)
 {
 	cpus = convert_endian_4(handle, cpus);
@@ -979,6 +1012,56 @@ int tracecmd_write_options(struct tracecmd_output *handle)
 	return 0;
 }
 
+int tracecmd_append_options(struct tracecmd_output *handle)
+{
+	struct tracecmd_option *options;
+	unsigned short option;
+	unsigned short endian2;
+	unsigned int endian4;
+	off_t offset;
+	int r;
+
+	/* If already written, ignore */
+	if (handle->options_written)
+		return 0;
+
+	if (lseek64(handle->fd, 0, SEEK_END) == (off_t)-1)
+		return -1;
+	offset = lseek64(handle->fd, -2, SEEK_CUR);
+	if (offset == (off_t)-1)
+		return -1;
+
+	r = pread(handle->fd, &option, 2, offset);
+	if (r != 2 || option != TRACECMD_OPTION_DONE)
+		return -1;
+
+	list_for_each_entry(options, &handle->options, list) {
+		endian2 = convert_endian_2(handle, options->id);
+		if (do_write_check(handle, &endian2, 2))
+			return -1;
+
+		endian4 = convert_endian_4(handle, options->size);
+		if (do_write_check(handle, &endian4, 4))
+			return -1;
+
+		/* Save the data location in case it needs to be updated */
+		options->offset = lseek64(handle->fd, 0, SEEK_CUR);
+
+		if (do_write_check(handle, options->data,
+				   options->size))
+			return -1;
+	}
+
+	option = TRACECMD_OPTION_DONE;
+
+	if (do_write_check(handle, &option, 2))
+		return -1;
+
+	handle->options_written = 1;
+
+	return 0;
+}
+
 int tracecmd_update_option(struct tracecmd_output *handle,
 			   struct tracecmd_option *option, int size,
 			   const void *data)
-- 
2.20.1



* [PATCH v11 8/9] trace-cmd: Implemented new option in trace.dat file: TRACECMD_OPTION_TIME_SHIFT
  2019-04-25 11:05 [PATCH v11 0/9] trace-cmd: Timetamps sync between host and guest machines, relying on vsock events Tzvetomir Stoyanov
                   ` (6 preceding siblings ...)
  2019-04-25 11:05 ` [PATCH v11 7/9] trace-cmd: Implemented new API tracecmd_add_option_v() Tzvetomir Stoyanov
@ 2019-04-25 11:05 ` Tzvetomir Stoyanov
  2019-04-25 11:05 ` [PATCH v11 9/9] trace-cmd [POC]: Implemented timestamps synch algorithm, using vsock events Tzvetomir Stoyanov
  2019-08-15 18:40 ` [PATCH v11 0/9] trace-cmd: Timetamps sync between host and guest machines, relying on " Steven Rostedt
  9 siblings, 0 replies; 11+ messages in thread
From: Tzvetomir Stoyanov @ 2019-04-25 11:05 UTC (permalink / raw)
  To: rostedt; +Cc: linux-trace-devel

The TRACECMD_OPTION_TIME_SHIFT option is used when synchronizing trace time stamps
between two trace.dat files. It holds multiple (time, offset) pairs of long long values;
each pair records the time stamp _offset_ measured at the given local _time_. The content
of the option buffer is:
 first 4 bytes - integer, _count_ of time stamp offsets
 long long array of size _count_, local times at which the offsets were measured
 long long array of size _count_, offsets of the time stamps
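
The buffer layout above can be sketched as follows. All names here
(struct ts_sample, time_shift_size(), time_shift_pack(), time_shift_unpack(),
time_shift_apply()) are hypothetical and exist only for this example. The
sketch assumes an 8-byte long long and keeps values in host byte order, whereas
the patch reads them through tep_read_number(). The last helper mirrors the
linear interpolation between two adjacent samples done by
timestamp_correction_calc() in the diff below.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one (time, offset) pair. */
struct ts_sample {
	long long time;
	long long offset;
};

/* Bytes needed for an option buffer holding `count` samples. */
static size_t time_shift_size(int32_t count)
{
	return 4 + 2 * (size_t)count * sizeof(long long);
}

/* Write: 4-byte count, then all times, then all offsets. */
static void time_shift_pack(const struct ts_sample *s, int32_t count, char *buf)
{
	char *times = buf + 4;
	char *offsets = times + (size_t)count * sizeof(long long);
	int32_t i;

	memcpy(buf, &count, 4);
	for (i = 0; i < count; i++) {
		memcpy(times + i * sizeof(long long), &s[i].time,
		       sizeof(long long));
		memcpy(offsets + i * sizeof(long long), &s[i].offset,
		       sizeof(long long));
	}
}

/* Read the layout back; returns the sample count or -1 on a bad size. */
static int32_t time_shift_unpack(const char *buf, size_t size,
				 struct ts_sample **out)
{
	const char *times, *offsets;
	struct ts_sample *s;
	int32_t count, i;

	assert(buf != NULL && out != NULL);

	if (size < 4)
		return -1;
	memcpy(&count, buf, 4);
	if (count <= 0 || size != time_shift_size(count))
		return -1;

	s = malloc((size_t)count * sizeof(*s));
	if (!s)
		return -1;

	times = buf + 4;
	offsets = times + (size_t)count * sizeof(long long);
	for (i = 0; i < count; i++) {
		memcpy(&s[i].time, times + i * sizeof(long long),
		       sizeof(long long));
		memcpy(&s[i].offset, offsets + i * sizeof(long long),
		       sizeof(long long));
	}
	*out = s;
	return count;
}

/* Correct `ts` with an offset interpolated linearly between two samples. */
static long long time_shift_apply(long long ts, const struct ts_sample *lo,
				  const struct ts_sample *hi)
{
	assert(hi->time != lo->time);
	return ts + lo->offset + ((ts - lo->time) * (hi->offset - lo->offset)) /
				 (hi->time - lo->time);
}
```

With the samples (100, 10) and (200, 30), a time stamp of 150 falls halfway
between them, so the interpolated offset is 20 and the corrected value is 170.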

Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
---
 include/trace-cmd/trace-cmd.h |   1 +
 lib/trace-cmd/trace-input.c   | 127 +++++++++++++++++++++++++++++++++-
 2 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/include/trace-cmd/trace-cmd.h b/include/trace-cmd/trace-cmd.h
index 252d342..df97eb9 100644
--- a/include/trace-cmd/trace-cmd.h
+++ b/include/trace-cmd/trace-cmd.h
@@ -83,6 +83,7 @@ enum {
 	TRACECMD_OPTION_HOOK,
 	TRACECMD_OPTION_OFFSET,
 	TRACECMD_OPTION_CPUCOUNT,
+	TRACECMD_OPTION_TIME_SHIFT,
 };
 
 enum {
diff --git a/lib/trace-cmd/trace-input.c b/lib/trace-cmd/trace-input.c
index ea11d62..bbb9bde 100644
--- a/lib/trace-cmd/trace-input.c
+++ b/lib/trace-cmd/trace-input.c
@@ -75,6 +75,11 @@ struct input_buffer_instance {
 	size_t			offset;
 };
 
+struct ts_offset_sample {
+	long long	time;
+	long long	offset;
+};
+
 struct tracecmd_input {
 	struct tep_handle	*pevent;
 	struct tep_plugin_list	*plugin_list;
@@ -92,6 +97,8 @@ struct tracecmd_input {
 	bool			use_pipe;
 	struct cpu_data 	*cpu_data;
 	long long		ts_offset;
+	int			ts_samples_count;
+	struct ts_offset_sample	*ts_samples;
 	double			ts2secs;
 	char *			cpustats;
 	char *			uname;
@@ -1044,6 +1051,66 @@ static void free_next(struct tracecmd_input *handle, int cpu)
 	free_record(record);
 }
 
+static inline unsigned long long
+timestamp_correction_calc(unsigned long long ts, struct ts_offset_sample *min,
+			  struct ts_offset_sample *max)
+{
+	long long tscor = min->offset +
+			(((((long long)ts) - min->time)*
+			(max->offset-min->offset))/(max->time-min->time));
+
+	if (tscor < 0)
+		return ts - llabs(tscor);
+
+	return ts + tscor;
+
+}
+
+static unsigned long long timestamp_correct(unsigned long long ts,
+					    struct tracecmd_input *handle)
+{
+	int min, mid, max;
+
+	if (handle->ts_offset)
+		return ts + handle->ts_offset;
+	if (!handle->ts_samples_count || !handle->ts_samples)
+		return ts;
+
+	/* We have one sample, nothing to calc here */
+	if (handle->ts_samples_count == 1)
+		return ts + handle->ts_samples[0].offset;
+
+	/* We have two samples, nothing to search here */
+	if (handle->ts_samples_count == 2)
+		return timestamp_correction_calc(ts, &handle->ts_samples[0],
+						 &handle->ts_samples[1]);
+
+	/* We have more than two samples */
+	if (ts <= handle->ts_samples[0].time)
+		return timestamp_correction_calc(ts,
+						  &handle->ts_samples[0],
+						  &handle->ts_samples[1]);
+	else if (ts >= handle->ts_samples[handle->ts_samples_count-1].time)
+		return timestamp_correction_calc(ts,
+						 &handle->ts_samples[handle->ts_samples_count-2],
+						 &handle->ts_samples[handle->ts_samples_count-1]);
+	min = 0;
+	max = handle->ts_samples_count-1;
+	mid = (min + max)/2;
+	while (min <= max) {
+		if (ts < handle->ts_samples[mid].time)
+			max = mid - 1;
+		else if (ts > handle->ts_samples[mid].time)
+			min = mid + 1;
+		else
+			break;
+		mid = (min + max)/2;
+	}
+
+	return timestamp_correction_calc(ts, &handle->ts_samples[mid],
+					 &handle->ts_samples[mid+1]);
+}
+
 /*
  * Page is mapped, now read in the page header info.
  */
@@ -1065,7 +1132,7 @@ static int update_page_info(struct tracecmd_input *handle, int cpu)
 		    kbuffer_subbuffer_size(kbuf));
 		return -1;
 	}
-	handle->cpu_data[cpu].timestamp = kbuffer_timestamp(kbuf) + handle->ts_offset;
+	handle->cpu_data[cpu].timestamp = timestamp_correct(kbuffer_timestamp(kbuf), handle);
 
 	if (handle->ts2secs)
 		handle->cpu_data[cpu].timestamp *= handle->ts2secs;
@@ -1792,7 +1859,7 @@ read_again:
 		goto read_again;
 	}
 
-	handle->cpu_data[cpu].timestamp = ts + handle->ts_offset;
+	handle->cpu_data[cpu].timestamp = timestamp_correct(ts, handle);
 
 	if (handle->ts2secs) {
 		handle->cpu_data[cpu].timestamp *= handle->ts2secs;
@@ -2117,6 +2184,42 @@ void tracecmd_set_ts2secs(struct tracecmd_input *handle,
 	handle->use_trace_clock = false;
 }
 
+static int tsync_offset_cmp(const void *a, const void *b)
+{
+	struct ts_offset_sample *ts_a = (struct ts_offset_sample *)a;
+	struct ts_offset_sample *ts_b = (struct ts_offset_sample *)b;
+
+	if (ts_a->time > ts_b->time)
+		return 1;
+	if (ts_a->time < ts_b->time)
+		return -1;
+	return 0;
+}
+
+static void tsync_offset_load(struct tracecmd_input *handle, char *buf)
+{
+	int i, j;
+	long long *buf8 = (long long *)buf;
+
+	for (i = 0; i < handle->ts_samples_count; i++) {
+		handle->ts_samples[i].time = tep_read_number(handle->pevent,
+							  buf8+i, 8);
+		handle->ts_samples[i].offset = tep_read_number(handle->pevent,
+						buf8+handle->ts_samples_count+i, 8);
+	}
+	qsort(handle->ts_samples,
+	      handle->ts_samples_count, sizeof(struct ts_offset_sample),
+	      tsync_offset_cmp);
+	/* Filter possible samples with equal time */
+	for (i = 0, j = 0; i < handle->ts_samples_count; i++) {
+		if (i == 0 ||
+		    handle->ts_samples[i].time != handle->ts_samples[i-1].time) {
+			handle->ts_samples[j++] = handle->ts_samples[i];
+		}
+	}
+	handle->ts_samples_count = j;
+}
+
 static int handle_options(struct tracecmd_input *handle)
 {
 	long long offset;
@@ -2127,6 +2230,7 @@ static int handle_options(struct tracecmd_input *handle)
 	struct input_buffer_instance *buffer;
 	struct hook_list *hook;
 	char *buf;
+	int sampes_size;
 	int cpus;
 
 	for (;;) {
@@ -2171,6 +2275,25 @@ static int handle_options(struct tracecmd_input *handle)
 			offset = strtoll(buf, NULL, 0);
 			handle->ts_offset += offset;
 			break;
+		case TRACECMD_OPTION_TIME_SHIFT:
+			/*
+			 * int (4 bytes) count of timestamp offsets.
+			 * long long array of size [count] of times,
+			 *	when the offsets were calculated.
+			 * long long array of size [count] of timestamp offsets.
+			 */
+			if (handle->flags & TRACECMD_FL_IGNORE_DATE)
+				break;
+			handle->ts_samples_count = tep_read_number(handle->pevent,
+								   buf, 4);
+			sampes_size = (8*handle->ts_samples_count);
+			if (size != (4+(2*sampes_size)))
+				break;
+			handle->ts_samples = malloc(2*sampes_size);
+			if (!handle->ts_samples)
+				return -ENOMEM;
+			tsync_offset_load(handle, buf+4);
+			break;
 		case TRACECMD_OPTION_CPUSTAT:
 			buf[size-1] = '\n';
 			cpustats = realloc(cpustats, cpustats_size + size + 1);
-- 
2.20.1



* [PATCH v11 9/9] trace-cmd [POC]: Implemented timestamps synch algorithm, using vsock events.
  2019-04-25 11:05 [PATCH v11 0/9] trace-cmd: Timetamps sync between host and guest machines, relying on vsock events Tzvetomir Stoyanov
                   ` (7 preceding siblings ...)
  2019-04-25 11:05 ` [PATCH v11 8/9] trace-cmd: Implemented new option in trace.dat file: TRACECMD_OPTION_TIME_SHIFT Tzvetomir Stoyanov
@ 2019-04-25 11:05 ` Tzvetomir Stoyanov
  2019-08-15 18:40 ` [PATCH v11 0/9] trace-cmd: Timetamps sync between host and guest machines, relying on " Steven Rostedt
  9 siblings, 0 replies; 11+ messages in thread
From: Tzvetomir Stoyanov @ 2019-04-25 11:05 UTC (permalink / raw)
  To: rostedt; +Cc: linux-trace-devel

This is a POC patch implementing an algorithm for syncing timestamps between
host and guest machines. It uses vsock trace events to capture the host and guest time.

Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
---
 include/trace-cmd/trace-cmd.h  |  21 +-
 tracecmd/Makefile              |   1 +
 tracecmd/include/trace-local.h |  33 +-
 tracecmd/trace-agent.c         |  13 +-
 tracecmd/trace-msg.c           | 209 ++++++++-
 tracecmd/trace-record.c        |  84 +++-
 tracecmd/trace-timesync.c      | 808 +++++++++++++++++++++++++++++++++
 7 files changed, 1133 insertions(+), 36 deletions(-)
 create mode 100644 tracecmd/trace-timesync.c

diff --git a/include/trace-cmd/trace-cmd.h b/include/trace-cmd/trace-cmd.h
index df97eb9..b7e42a8 100644
--- a/include/trace-cmd/trace-cmd.h
+++ b/include/trace-cmd/trace-cmd.h
@@ -343,16 +343,29 @@ bool tracecmd_msg_done(struct tracecmd_msg_handle *msg_handle);
 void tracecmd_msg_set_done(struct tracecmd_msg_handle *msg_handle);
 
 int tracecmd_msg_send_trace_req(struct tracecmd_msg_handle *msg_handle,
-				int argc, char **argv, bool use_fifos);
+				int argc, char **argv, bool use_fifos,
+				bool do_tsync);
 int tracecmd_msg_recv_trace_req(struct tracecmd_msg_handle *msg_handle,
-				int *argc, char ***argv, bool *use_fifos);
+				int *argc, char ***argv, bool *use_fifos,
+				bool *do_tsync);
 
 int tracecmd_msg_send_trace_resp(struct tracecmd_msg_handle *msg_handle,
 				 int nr_cpus, int page_size,
-				 unsigned int *ports, bool use_fifos);
+				 unsigned int *ports, bool use_fifos,
+				 bool do_tsync);
 int tracecmd_msg_recv_trace_resp(struct tracecmd_msg_handle *msg_handle,
 				 int *nr_cpus, int *page_size,
-				 unsigned int **ports, bool *use_fifos);
+				 unsigned int **ports, bool *use_fifos,
+				 bool *do_tsync);
+
+struct tracecmd_clock_sync;
+
+int tracecmd_msg_rcv_time_sync(struct tracecmd_msg_handle *msg_handle,
+			       struct tracecmd_clock_sync *clock_sync,
+			       long long *offset, long long *timestamp);
+int tracecmd_msg_snd_time_sync(struct tracecmd_msg_handle *msg_handle,
+			       struct tracecmd_clock_sync *clock_sync,
+			       long long *offset, long long *timestamp);
 
 /* --- Plugin handling --- */
 extern struct tep_plugin_option trace_ftrace_options[];
diff --git a/tracecmd/Makefile b/tracecmd/Makefile
index d3e3080..8a73bf7 100644
--- a/tracecmd/Makefile
+++ b/tracecmd/Makefile
@@ -32,6 +32,7 @@ TRACE_CMD_OBJS += trace-list.o
 TRACE_CMD_OBJS += trace-output.o
 TRACE_CMD_OBJS += trace-usage.o
 TRACE_CMD_OBJS += trace-msg.o
+TRACE_CMD_OBJS += trace-timesync.o
 
 ifeq ($(VSOCK_DEFINED), 1)
 TRACE_CMD_OBJS += trace-agent.o
diff --git a/tracecmd/include/trace-local.h b/tracecmd/include/trace-local.h
index 7f4d676..fa0ec9c 100644
--- a/tracecmd/include/trace-local.h
+++ b/tracecmd/include/trace-local.h
@@ -26,6 +26,7 @@ extern int quiet;
 typedef unsigned long long u64;
 
 struct buffer_instance;
+struct tracecmd_clock_sync;
 
 /* for local shared information with trace-cmd executable */
 
@@ -101,7 +102,7 @@ void trace_usage(int argc, char **argv);
 
 int trace_record_agent(struct tracecmd_msg_handle *msg_handle,
 		       int cpus, int *fds,
-		       int argc, char **argv, bool use_fifos);
+		       int argc, char **argv, bool use_fifos, bool do_tsync);
 
 struct hook_list;
 
@@ -214,6 +215,12 @@ struct buffer_instance {
 	unsigned int		port;
 	int			*fds;
 	bool			use_fifos;
+	bool			do_tsync;
+
+	struct tracecmd_clock_sync *clock_sync;
+	int			time_sync_count;
+	long long		*time_sync_ts;
+	long long		*time_sync_offsets;
 };
 
 extern struct buffer_instance top_instance;
@@ -235,6 +242,30 @@ void update_first_instance(struct buffer_instance *instance, int topt);
 void show_instance_file(struct buffer_instance *instance, const char *name);
 
 int count_cpus(void);
+
+struct tracecmd_time_sync_event {
+	int			id;
+	int			cpu;
+	int			pid;
+	unsigned long long	ts;
+};
+
+int tracecmd_clock_get_peer(struct tracecmd_clock_sync *clock_context,
+			    unsigned int *remote_cid, unsigned int *remote_port);
+bool tracecmd_time_sync_check(void);
+void tracecmd_clock_context_free(struct buffer_instance *instance);
+int tracecmd_clock_find_event(struct tracecmd_clock_sync *clock, int cpu,
+			      struct tracecmd_time_sync_event *event);
+void tracecmd_clock_synch_enable(struct tracecmd_clock_sync *clock_context);
+void tracecmd_clock_synch_disable(struct tracecmd_clock_sync *clock_context);
+void tracecmd_clock_synch_calc_reset(struct tracecmd_clock_sync *clock_context);
+void tracecmd_clock_synch_calc_probe(struct tracecmd_clock_sync *clock_context,
+				     long long ts_local, long long ts_remote);
+int tracecmd_clock_synch_calc(struct tracecmd_clock_sync *clock_context,
+			       long long *offset_ret, long long *time_ret);
+void sync_time_with_host_v3(struct buffer_instance *instance);
+void sync_time_with_guest_v3(struct buffer_instance *instance);
+
 void write_tracing_on(struct buffer_instance *instance, int on);
 char *get_instance_dir(struct buffer_instance *instance);
 int write_instance_file(struct buffer_instance *instance,
diff --git a/tracecmd/trace-agent.c b/tracecmd/trace-agent.c
index 71cb973..33e8ea8 100644
--- a/tracecmd/trace-agent.c
+++ b/tracecmd/trace-agent.c
@@ -132,6 +132,7 @@ static void agent_handle(int sd, int nr_cpus, int page_size)
 	char **argv = NULL;
 	int argc = 0;
 	bool use_fifos;
+	bool do_tsync;
 	int *fds;
 	int ret;
 
@@ -144,7 +145,8 @@ static void agent_handle(int sd, int nr_cpus, int page_size)
 	if (!msg_handle)
 		die("Failed to allocate message handle");
 
-	ret = tracecmd_msg_recv_trace_req(msg_handle, &argc, &argv, &use_fifos);
+	ret = tracecmd_msg_recv_trace_req(msg_handle, &argc, &argv,
+					  &use_fifos, &do_tsync);
 	if (ret < 0)
 		die("Failed to receive trace request");
 
@@ -153,13 +155,18 @@ static void agent_handle(int sd, int nr_cpus, int page_size)
 
 	if (!use_fifos)
 		make_vsocks(nr_cpus, fds, ports);
+	if (do_tsync) {
+		do_tsync = tracecmd_time_sync_check();
+		if (!do_tsync)
+			warning("Failed to negotiate timestamps synchronization with the host");
+	}
 
 	ret = tracecmd_msg_send_trace_resp(msg_handle, nr_cpus, page_size,
-					   ports, use_fifos);
+					   ports, use_fifos, do_tsync);
 	if (ret < 0)
 		die("Failed to send trace response");
 
-	trace_record_agent(msg_handle, nr_cpus, fds, argc, argv, use_fifos);
+	trace_record_agent(msg_handle, nr_cpus, fds, argc, argv, use_fifos, do_tsync);
 
 	free(argv[0]);
 	free(argv);
diff --git a/tracecmd/trace-msg.c b/tracecmd/trace-msg.c
index ff32e77..df29e47 100644
--- a/tracecmd/trace-msg.c
+++ b/tracecmd/trace-msg.c
@@ -26,8 +26,12 @@
 #include "trace-local.h"
 #include "trace-msg.h"
 
+typedef __u16 u16;
+typedef __s16 s16;
 typedef __u32 u32;
 typedef __be32 be32;
+typedef __u64 u64;
+typedef __s64 s64;
 
 static inline void dprint(const char *fmt, ...)
 {
@@ -50,6 +54,9 @@ static inline void dprint(const char *fmt, ...)
 
 unsigned int page_size;
 
+/* Try a few times to get an accurate time sync */
+#define TSYNC_TRIES 300
+
 struct tracecmd_msg_tinit {
 	be32 cpus;
 	be32 page_size;
@@ -71,6 +78,20 @@ struct tracecmd_msg_trace_resp {
 	be32 page_size;
 } __attribute__((packed));
 
+struct tracecmd_msg_tsync_stop {
+	long long offset;
+	long long timestamp;
+} __attribute__((packed));
+
+struct tracecmd_msg_tsync_req {
+	u16 cpu;
+} __attribute__((packed));
+
+struct tracecmd_msg_tsync_resp {
+	u64 time;
+} __attribute__((packed));
+
+
 struct tracecmd_msg_header {
 	be32	size;
 	be32	cmd;
@@ -78,14 +99,19 @@ struct tracecmd_msg_header {
 } __attribute__((packed));
 
 #define MSG_MAP								\
-	C(CLOSE,	0,	0),					\
-	C(TINIT,	1,	sizeof(struct tracecmd_msg_tinit)),	\
-	C(RINIT,	2,	sizeof(struct tracecmd_msg_rinit)),	\
-	C(SEND_DATA,	3,	0),					\
-	C(FIN_DATA,	4,	0),					\
-	C(NOT_SUPP,	5,	0),					\
-	C(TRACE_REQ,	6,	sizeof(struct tracecmd_msg_trace_req)),	\
-	C(TRACE_RESP,	7,	sizeof(struct tracecmd_msg_trace_resp)),
+	C(CLOSE,	  0,	0),					\
+	C(TINIT,	  1,	sizeof(struct tracecmd_msg_tinit)),	\
+	C(RINIT,	  2,	sizeof(struct tracecmd_msg_rinit)),	\
+	C(SEND_DATA,	  3,	0),					\
+	C(FIN_DATA,	  4,	0),					\
+	C(NOT_SUPP,	  5,	0),					\
+	C(TRACE_REQ,	  6,	sizeof(struct tracecmd_msg_trace_req)),	\
+	C(TRACE_RESP,	  7,	sizeof(struct tracecmd_msg_trace_resp)),\
+	C(TSYNC_START,	  8,	0),					\
+	C(TSYNC_STOP,	  9,	sizeof(struct tracecmd_msg_tsync_stop)),\
+	C(TSYNC_PROBE,	  10,	0),					\
+	C(TSYNC_REQ,	  11,	sizeof(struct tracecmd_msg_tsync_req)),	\
+	C(TSYNC_RESP,	  12,	sizeof(struct tracecmd_msg_tsync_resp)),
 
 #undef C
 #define C(a,b,c)	MSG_##a = b
@@ -115,10 +141,13 @@ static const char *cmd_to_name(int cmd)
 struct tracecmd_msg {
 	struct tracecmd_msg_header		hdr;
 	union {
-		struct tracecmd_msg_tinit	tinit;
-		struct tracecmd_msg_rinit	rinit;
-		struct tracecmd_msg_trace_req	trace_req;
-		struct tracecmd_msg_trace_resp	trace_resp;
+		struct tracecmd_msg_tinit		tinit;
+		struct tracecmd_msg_rinit		rinit;
+		struct tracecmd_msg_trace_req		trace_req;
+		struct tracecmd_msg_trace_resp		trace_resp;
+		struct tracecmd_msg_tsync_stop		ts_stop;
+		struct tracecmd_msg_tsync_req		ts_req;
+		struct tracecmd_msg_tsync_resp		ts_resp;
 	};
 	char					*buf;
 } __attribute__((packed));
@@ -157,6 +186,7 @@ static int msg_write(int fd, struct tracecmd_msg *msg)
 
 enum msg_trace_flags {
 	MSG_TRACE_USE_FIFOS = 1 << 0,
+	MSG_TRACE_DO_TSYNC =  1 << 1,
 };
 
 static int make_tinit(struct tracecmd_msg_handle *msg_handle,
@@ -792,7 +822,8 @@ error:
 	return ret;
 }
 
-static int make_trace_req(struct tracecmd_msg *msg, int argc, char **argv, bool use_fifos)
+static int make_trace_req(struct tracecmd_msg *msg, int argc, char **argv,
+			  bool use_fifos, bool do_tsync)
 {
 	size_t args_size = 0;
 	char *p;
@@ -802,7 +833,12 @@ static int make_trace_req(struct tracecmd_msg *msg, int argc, char **argv, bool
 		args_size += strlen(argv[i]) + 1;
 
 	msg->hdr.size = htonl(ntohl(msg->hdr.size) + args_size);
-	msg->trace_req.flags = use_fifos ? htonl(MSG_TRACE_USE_FIFOS) : htonl(0);
+	msg->trace_req.flags = 0;
+	if (use_fifos)
+		msg->trace_req.flags |= MSG_TRACE_USE_FIFOS;
+	if (do_tsync)
+		msg->trace_req.flags |= MSG_TRACE_DO_TSYNC;
+	msg->trace_req.flags = htonl(msg->trace_req.flags);
 	msg->trace_req.argc = htonl(argc);
 	msg->buf = calloc(args_size, 1);
 	if (!msg->buf)
@@ -816,13 +852,14 @@ static int make_trace_req(struct tracecmd_msg *msg, int argc, char **argv, bool
 }
 
 int tracecmd_msg_send_trace_req(struct tracecmd_msg_handle *msg_handle,
-				int argc, char **argv, bool use_fifos)
+				int argc, char **argv, bool use_fifos,
+				bool do_tsync)
 {
 	struct tracecmd_msg msg;
 	int ret;
 
 	tracecmd_msg_init(MSG_TRACE_REQ, &msg);
-	ret = make_trace_req(&msg, argc, argv, use_fifos);
+	ret = make_trace_req(&msg, argc, argv, use_fifos, do_tsync);
 	if (ret < 0)
 		return ret;
 
@@ -835,7 +872,8 @@ int tracecmd_msg_send_trace_req(struct tracecmd_msg_handle *msg_handle,
   *     free(argv);
   */
 int tracecmd_msg_recv_trace_req(struct tracecmd_msg_handle *msg_handle,
-				int *argc, char ***argv, bool *use_fifos)
+				int *argc, char ***argv, bool *use_fifos,
+				bool *do_tsync)
 {
 	struct tracecmd_msg msg;
 	char *p, *buf_end, **args;
@@ -882,6 +920,7 @@ int tracecmd_msg_recv_trace_req(struct tracecmd_msg_handle *msg_handle,
 	*argc = nr_args;
 	*argv = args;
 	*use_fifos = ntohl(msg.trace_req.flags) & MSG_TRACE_USE_FIFOS;
+	*do_tsync = ntohl(msg.trace_req.flags) & MSG_TRACE_DO_TSYNC;
 
 	/*
 	 * On success we're passing msg.buf to the caller through argv[0] so we
@@ -901,8 +940,125 @@ out:
 	return ret;
 }
 
+int tracecmd_msg_rcv_time_sync(struct tracecmd_msg_handle *msg_handle,
+			       struct tracecmd_clock_sync *clock_context,
+			       long long *offset, long long *timestamp)
+{
+	struct tracecmd_time_sync_event event;
+	unsigned int remote_cid = 0;
+	struct tracecmd_msg msg;
+	int cpu_pid, ret;
+
+	if (clock_context == NULL || msg_handle == NULL)
+		return 0;
+
+	if (offset)
+		*offset = 0;
+
+	ret = tracecmd_msg_recv(msg_handle->fd, &msg);
+	if (ret < 0 || ntohl(msg.hdr.cmd) == MSG_TSYNC_STOP)
+		return 0;
+	if (ntohl(msg.hdr.cmd) != MSG_TSYNC_START) {
+		handle_unexpected_msg(msg_handle, &msg);
+		return 0;
+	}
+
+	tracecmd_clock_get_peer(clock_context, &remote_cid, NULL);
+	tracecmd_msg_init(MSG_TSYNC_START, &msg);
+	tracecmd_msg_send(msg_handle->fd, &msg);
+	tracecmd_clock_synch_enable(clock_context);
+
+	do {
+		memset(&event, 0, sizeof(event));
+		ret = tracecmd_msg_recv(msg_handle->fd, &msg);
+		if (ret < 0 || ntohl(msg.hdr.cmd) == MSG_TSYNC_STOP)
+			break;
+		if (ntohl(msg.hdr.cmd) != MSG_TSYNC_PROBE) {
+			handle_unexpected_msg(msg_handle, &msg);
+			break;
+		}
+		ret = tracecmd_msg_recv(msg_handle->fd, &msg);
+		if (ret < 0 || ntohl(msg.hdr.cmd) == MSG_TSYNC_STOP)
+			break;
+		if (ntohl(msg.hdr.cmd) != MSG_TSYNC_REQ) {
+			handle_unexpected_msg(msg_handle, &msg);
+			break;
+		}
+		/* Get kvm event related to the corresponding VCPU context */
+		cpu_pid = get_guest_vcpu_pid(remote_cid, ntohs(msg.ts_req.cpu));
+		tracecmd_clock_find_event(clock_context, cpu_pid, &event);
+		tracecmd_msg_init(MSG_TSYNC_RESP, &msg);
+		msg.ts_resp.time = htonll(event.ts);
+		tracecmd_msg_send(msg_handle->fd, &msg);
+	} while (true);
+
+	tracecmd_clock_synch_disable(clock_context);
+
+	if (ret >= 0 && ntohl(msg.hdr.cmd) == MSG_TSYNC_STOP) {
+		if (offset)
+			*offset = ntohll(msg.ts_stop.offset);
+		if (timestamp)
+			*timestamp = ntohll(msg.ts_stop.timestamp);
+	}
+
+	msg_free(&msg);
+	return 0;
+}
+
+int tracecmd_msg_snd_time_sync(struct tracecmd_msg_handle *msg_handle,
+			       struct tracecmd_clock_sync *clock_context,
+			       long long *offset, long long *timestamp)
+{
+	struct tracecmd_time_sync_event event;
+	int sync_loop = TSYNC_TRIES;
+	struct tracecmd_msg msg;
+	int ret;
+
+	if (clock_context == NULL || msg_handle == NULL)
+		return 0;
+
+	tracecmd_msg_init(MSG_TSYNC_START, &msg);
+	tracecmd_msg_send(msg_handle->fd, &msg);
+	ret = tracecmd_msg_recv(msg_handle->fd, &msg);
+	if (ret < 0 || ntohl(msg.hdr.cmd) != MSG_TSYNC_START)
+		return 0;
+	tracecmd_clock_synch_calc_reset(clock_context);
+	tracecmd_clock_synch_enable(clock_context);
+
+	do {
+		tracecmd_msg_init(MSG_TSYNC_PROBE, &msg);
+		tracecmd_msg_send(msg_handle->fd, &msg);
+		/* Get the ts and CPU of the sent event */
+		ret = tracecmd_clock_find_event(clock_context, -1, &event);
+		tracecmd_msg_init(MSG_TSYNC_REQ, &msg);
+		msg.ts_req.cpu = htons(event.cpu);
+		tracecmd_msg_send(msg_handle->fd, &msg);
+		memset(&msg, 0, sizeof(msg));
+		ret = tracecmd_msg_recv(msg_handle->fd, &msg);
+		if (ret < 0)
+			break;
+		if (ntohl(msg.hdr.cmd) != MSG_TSYNC_RESP) {
+			handle_unexpected_msg(msg_handle, &msg);
+			break;
+		}
+		tracecmd_clock_synch_calc_probe(clock_context,
+						event.ts,
+						ntohll(msg.ts_resp.time));
+	} while (--sync_loop);
+
+	tracecmd_clock_synch_disable(clock_context);
+	tracecmd_clock_synch_calc(clock_context, offset, timestamp);
+	tracecmd_msg_init(MSG_TSYNC_STOP, &msg);
+	msg.ts_stop.offset = htonll(*offset);
+	msg.ts_stop.timestamp = htonll(*timestamp);
+	tracecmd_msg_send(msg_handle->fd, &msg);
+
+	msg_free(&msg);
+	return 0;
+}
+
 static int make_trace_resp(struct tracecmd_msg *msg, int page_size, int nr_cpus,
-			   unsigned int *ports, bool use_fifos)
+			   unsigned int *ports, bool use_fifos, bool do_tsync)
 {
 	int data_size;
 
@@ -913,7 +1069,13 @@ static int make_trace_resp(struct tracecmd_msg *msg, int page_size, int nr_cpus,
 	write_uints(msg->buf, data_size, ports, nr_cpus);
 
 	msg->hdr.size = htonl(ntohl(msg->hdr.size) + data_size);
-	msg->trace_resp.flags = use_fifos ? htonl(MSG_TRACE_USE_FIFOS) : htonl(0);
+	msg->trace_resp.flags = 0;
+	if (use_fifos)
+		msg->trace_resp.flags |= MSG_TRACE_USE_FIFOS;
+	if (do_tsync)
+		msg->trace_resp.flags |= MSG_TRACE_DO_TSYNC;
+	msg->trace_resp.flags = htonl(msg->trace_resp.flags);
+
 	msg->trace_resp.cpus = htonl(nr_cpus);
 	msg->trace_resp.page_size = htonl(page_size);
 
@@ -922,13 +1084,14 @@ static int make_trace_resp(struct tracecmd_msg *msg, int page_size, int nr_cpus,
 
 int tracecmd_msg_send_trace_resp(struct tracecmd_msg_handle *msg_handle,
 				 int nr_cpus, int page_size,
-				 unsigned int *ports, bool use_fifos)
+				 unsigned int *ports, bool use_fifos,
+				 bool do_tsync)
 {
 	struct tracecmd_msg msg;
 	int ret;
 
 	tracecmd_msg_init(MSG_TRACE_RESP, &msg);
-	ret = make_trace_resp(&msg, page_size, nr_cpus, ports, use_fifos);
+	ret = make_trace_resp(&msg, page_size, nr_cpus, ports, use_fifos, do_tsync);
 	if (ret < 0)
 		return ret;
 
@@ -937,7 +1100,8 @@ int tracecmd_msg_send_trace_resp(struct tracecmd_msg_handle *msg_handle,
 
 int tracecmd_msg_recv_trace_resp(struct tracecmd_msg_handle *msg_handle,
 				 int *nr_cpus, int *page_size,
-				 unsigned int **ports, bool *use_fifos)
+				 unsigned int **ports, bool *use_fifos,
+				 bool *do_tsync)
 {
 	struct tracecmd_msg msg;
 	char *p, *buf_end;
@@ -960,6 +1124,7 @@ int tracecmd_msg_recv_trace_resp(struct tracecmd_msg_handle *msg_handle,
 	}
 
 	*use_fifos = ntohl(msg.trace_resp.flags) & MSG_TRACE_USE_FIFOS;
+	*do_tsync = ntohl(msg.trace_resp.flags) & MSG_TRACE_DO_TSYNC;
 	*nr_cpus = ntohl(msg.trace_resp.cpus);
 	*page_size = ntohl(msg.trace_resp.page_size);
 	*ports = calloc(*nr_cpus, sizeof(**ports));
diff --git a/tracecmd/trace-record.c b/tracecmd/trace-record.c
index 59a4170..359c094 100644
--- a/tracecmd/trace-record.c
+++ b/tracecmd/trace-record.c
@@ -211,6 +211,8 @@ struct common_record_context {
 	char *date2ts;
 	char *max_graph_depth;
 	int data_flags;
+	int time_shift_count;
+	struct tracecmd_option *time_shift_option;
 
 	int record_all;
 	int total_disable;
@@ -650,11 +652,20 @@ static void tell_guests_to_stop(void)
 	for_all_instances(instance) {
 		if (is_guest(instance)) {
 			tracecmd_msg_send_close_msg(instance->msg_handle);
-			tracecmd_msg_handle_close(instance->msg_handle);
 		}
 	}
 }
 
+static void close_guests_handle(void)
+{
+	struct buffer_instance *instance;
+
+	for_all_instances(instance) {
+		if (is_guest(instance))
+			tracecmd_msg_handle_close(instance->msg_handle);
+	}
+}
+
 static void stop_threads(enum trace_type type)
 {
 	struct timeval tv = { 0, 0 };
@@ -3473,6 +3484,7 @@ static void connect_to_agent(struct buffer_instance *instance)
 	unsigned int *ports;
 	int i, *fds = NULL;
 	bool use_fifos = false;
+	bool do_tsync, do_tsync_reply;
 
 	if (!no_fifos) {
 		nr_fifos = open_guest_fifos(instance->name, &fds);
@@ -3484,20 +3496,24 @@ static void connect_to_agent(struct buffer_instance *instance)
 		die("Failed to connect to vsocket @%u:%u",
 		    instance->cid, instance->port);
 
+	do_tsync = tracecmd_time_sync_check();
+
 	msg_handle = tracecmd_msg_handle_alloc(sd, 0);
 	if (!msg_handle)
 		die("Failed to allocate message handle");
 
 	ret = tracecmd_msg_send_trace_req(msg_handle, instance->argc,
-					  instance->argv, use_fifos);
+					  instance->argv, use_fifos, do_tsync);
 	if (ret < 0)
 		die("Failed to send trace request");
 
 	ret = tracecmd_msg_recv_trace_resp(msg_handle, &nr_cpus, &page_size,
-					   &ports, &use_fifos);
+					   &ports, &use_fifos, &do_tsync_reply);
 	if (ret < 0)
 		die("Failed to receive trace response");
-
+	if (do_tsync != do_tsync_reply)
+		warning("Failed to negotiate timestamps synchronization with the guest %s",
+			instance->name);
 	if (use_fifos) {
 		if (nr_cpus != nr_fifos) {
 			warning("number of FIFOs (%d) for guest %s differs "
@@ -3515,10 +3531,13 @@ static void connect_to_agent(struct buffer_instance *instance)
 	}
 
 	instance->use_fifos = use_fifos;
+	instance->do_tsync = do_tsync_reply;
 	instance->cpu_count = nr_cpus;
 
 	/* the msg_handle now points to the guest fd */
 	instance->msg_handle = msg_handle;
+
+	sync_time_with_guest_v3(instance);
 }
 
 static void setup_guest(struct buffer_instance *instance)
@@ -3543,10 +3562,13 @@ static void setup_guest(struct buffer_instance *instance)
 	close(fd);
 }
 
-static void setup_agent(struct buffer_instance *instance, struct common_record_context *ctx)
+static void setup_agent(struct buffer_instance *instance,
+			struct common_record_context *ctx)
 {
 	struct tracecmd_output *network_handle;
 
+	sync_time_with_host_v3(instance);
+
 	network_handle = tracecmd_create_init_fd_msg(instance->msg_handle,
 						     listed_events);
 	add_options(network_handle, ctx);
@@ -5748,6 +5770,41 @@ static bool has_local_instances(void)
 	return false;
 }
 
+static void write_guest_time_shift(struct buffer_instance *instance)
+{
+	struct tracecmd_output *handle;
+	struct iovec vector[3];
+	char *file;
+	int fd;
+
+	if (!instance->time_sync_count)
+		return;
+
+	file = get_guest_file(output_file, instance->name);
+	fd = open(file, O_RDWR);
+	if (fd < 0)
+		die("error opening %s", file);
+	put_temp_file(file);
+	handle = tracecmd_get_output_handle_fd(fd);
+	vector[0].iov_len = 4;
+	vector[0].iov_base = &instance->time_sync_count;
+	vector[1].iov_len = 8 * instance->time_sync_count;
+	vector[1].iov_base = instance->time_sync_ts;
+	vector[2].iov_len = 8 * instance->time_sync_count;
+	vector[2].iov_base = instance->time_sync_offsets;
+	tracecmd_add_option_v(handle, TRACECMD_OPTION_TIME_SHIFT, vector, 3);
+	tracecmd_append_options(handle);
+	tracecmd_output_close(handle);
+#ifdef TSYNC_DEBUG
+	if (instance->time_sync_count > 1)
+		printf("\n\rDetected %lld ns ts offset drift in %lld ns for guest %s\n\r",
+			instance->time_sync_offsets[instance->time_sync_count-1] -
+			instance->time_sync_offsets[0],
+			instance->time_sync_ts[instance->time_sync_count-1]-
+			instance->time_sync_ts[0], instance->name);
+#endif
+}
+
 /*
  * This function contains common code for the following commands:
  * record, start, stream, profile.
@@ -5867,6 +5924,20 @@ static void record_trace(int argc, char **argv,
 	if (!latency)
 		wait_threads();
 
+	if (ctx->instance && is_agent(ctx->instance)) {
+		sync_time_with_host_v3(ctx->instance);
+		tracecmd_clock_context_free(ctx->instance);
+	} else {
+		for_all_instances(instance) {
+			if (is_guest(instance)) {
+				sync_time_with_guest_v3(instance);
+				write_guest_time_shift(instance);
+				tracecmd_clock_context_free(instance);
+			}
+		}
+	}
+	close_guests_handle();
+
 	if (IS_RECORD(ctx)) {
 		record_data(ctx);
 		delete_thread_data();
@@ -6005,7 +6076,7 @@ void trace_record(int argc, char **argv)
 int trace_record_agent(struct tracecmd_msg_handle *msg_handle,
 		       int cpus, int *fds,
 		       int argc, char **argv,
-		       bool use_fifos)
+		       bool use_fifos, bool do_tsync)
 {
 	struct common_record_context ctx;
 	char **argv_plus;
@@ -6032,6 +6103,7 @@ int trace_record_agent(struct tracecmd_msg_handle *msg_handle,
 
 	ctx.instance->fds = fds;
 	ctx.instance->use_fifos = use_fifos;
+	ctx.instance->do_tsync = do_tsync;
 	ctx.instance->flags |= BUFFER_FL_AGENT;
 	ctx.instance->msg_handle = msg_handle;
 	msg_handle->version = V3_PROTOCOL;
diff --git a/tracecmd/trace-timesync.c b/tracecmd/trace-timesync.c
new file mode 100644
index 0000000..a3f0c10
--- /dev/null
+++ b/tracecmd/trace-timesync.c
@@ -0,0 +1,808 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019, VMware, Tzvetomir Stoyanov <tstoyanov@vmware.com>
+ *
+ */
+
+#include <fcntl.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <arpa/inet.h>
+#include <linux/vm_sockets.h>
+#include "trace-local.h"
+
+static int clock_sync_x86_host_init(struct tracecmd_clock_sync *clock_context);
+static int clock_sync_x86_host_free(struct tracecmd_clock_sync *clock_context);
+static int clock_sync_x86_host_find_events(struct tracecmd_clock_sync *clock,
+				    int cpu, struct tracecmd_time_sync_event *event);
+static int clock_sync_x86_guest_init(struct tracecmd_clock_sync *clock_context);
+static int clock_sync_x86_guest_free(struct tracecmd_clock_sync *clock_context);
+static int clock_sync_x86_guest_find_events(struct tracecmd_clock_sync *clock,
+					    int pid,
+					    struct tracecmd_time_sync_event *event);
+
+struct tracecmd_event_descr {
+	char			*system;
+	char			*name;
+};
+
+struct tracecmd_ftrace_param {
+	char	*file;
+	char	*set;
+	char	*reset;
+};
+
+enum clock_sync_context {
+	CLOCK_KVM_X86_VSOCK_HOST,
+	CLOCK_KVM_X86_VSOCK_GUEST,
+	CLOCK_CONTEXT_MAX,
+};
+
+struct tracecmd_clock_sync {
+	enum clock_sync_context		clock_context_id;
+	struct tracecmd_ftrace_param	*ftrace_params;
+	struct tracecmd_time_sync_event	*events;
+	int				events_count;
+	struct tep_handle		*tep;
+	struct buffer_instance		*vinst;
+
+	int				probes_count;
+	int				bad_probes;
+	int				probes_size;
+	long long			*times;
+	long long			*offsets;
+	long long			offset_av;
+	long long			offset_min;
+	long long			offset_max;
+	int				debug_fd;
+
+	unsigned int			local_cid;
+	unsigned int			local_port;
+	unsigned int			remote_cid;
+	unsigned int			remote_port;
+};
+
+struct {
+	const char *systems[3];
+	struct tracecmd_ftrace_param ftrace_params[5];
+	struct tracecmd_event_descr events[3];
+	int (*clock_sync_init)(struct tracecmd_clock_sync *clock_context);
+	int (*clock_sync_free)(struct tracecmd_clock_sync *clock_context);
+	int (*clock_sync_find_events)(struct tracecmd_clock_sync *clock_context,
+				      int pid,
+				      struct tracecmd_time_sync_event *event);
+	int (*clock_sync_load)(struct tracecmd_clock_sync *clock_context);
+	int (*clock_sync_unload)(struct tracecmd_clock_sync *clock_context);
+} static clock_sync[CLOCK_CONTEXT_MAX] = {
+	{ /* CLOCK_KVM_X86_VSOCK_HOST */
+	  .systems = {"vsock", "ftrace", NULL},
+	  .ftrace_params = {
+	  {"set_ftrace_filter", "vmx_read_l1_tsc_offset\nsvm_read_l1_tsc_offset", "\0"},
+	  {"current_tracer", "function", "nop"},
+	  {"events/vsock/virtio_transport_recv_pkt/enable", "1", "0"},
+	  {"events/vsock/virtio_transport_recv_pkt/filter", NULL, "\0"},
+	  {NULL, NULL, NULL} },
+	  .events = {
+	  {.system = "ftrace", .name = "function"},
+	  {.system = "vsock", .name = "virtio_transport_recv_pkt"},
+	  {.system = NULL, .name = NULL} },
+	 clock_sync_x86_host_init,
+	 clock_sync_x86_host_free,
+	 clock_sync_x86_host_find_events,
+	},
+
+	{ /* CLOCK_KVM_X86_VSOCK_GUEST */
+	 .systems = { "vsock", "ftrace", NULL},
+	 .ftrace_params = {
+	  {"set_ftrace_filter", "vp_notify", "\0"},
+	  {"current_tracer", "function", "nop"},
+	  {"events/vsock/virtio_transport_alloc_pkt/enable", "1", "0"},
+	  {"events/vsock/virtio_transport_alloc_pkt/filter", NULL, "\0"},
+	  {NULL, NULL, NULL},
+	  },
+	  .events = {
+	  {.system = "vsock", .name = "virtio_transport_alloc_pkt"},
+	  {.system = "ftrace", .name = "function"},
+	  {.system = NULL, .name = NULL}
+	 },
+	  clock_sync_x86_guest_init,
+	  clock_sync_x86_guest_free,
+	  clock_sync_x86_guest_find_events,
+	}
+};
+
+static int clock_sync_x86_host_init(struct tracecmd_clock_sync *clock_context)
+{
+	char vsock_filter[255];
+
+	snprintf(vsock_filter, 255,
+		"src_cid==%u && src_port==%u && dst_cid==%u && dst_port==%u && len!=0",
+		clock_context->remote_cid, clock_context->remote_port,
+		clock_context->local_cid, clock_context->local_port);
+
+	clock_context->ftrace_params[3].set = strdup(vsock_filter);
+	return 1;
+}
+
+static int clock_sync_x86_host_free(struct tracecmd_clock_sync *clock_context)
+{
+	free(clock_context->ftrace_params[3].set);
+	clock_context->ftrace_params[3].set = NULL;
+	return 1;
+}
+
+static int clock_sync_x86_guest_init(struct tracecmd_clock_sync *clock_context)
+{
+	char vsock_filter[255];
+
+	snprintf(vsock_filter, 255,
+		"src_cid==%u && src_port==%u && dst_cid==%u && dst_port==%u && len!=0",
+		clock_context->local_cid, clock_context->local_port,
+		clock_context->remote_cid, clock_context->remote_port);
+
+	clock_context->ftrace_params[3].set = strdup(vsock_filter);
+	return 1;
+}
+
+static int clock_sync_x86_guest_free(struct tracecmd_clock_sync *clock_context)
+{
+	free(clock_context->ftrace_params[3].set);
+	clock_context->ftrace_params[3].set = NULL;
+	return 1;
+}
+
+static int
+get_events_in_page(struct tep_handle *tep, void *page,
+		    int size, int cpu, struct tracecmd_time_sync_event **events,
+		    int *events_count, int *events_size)
+{
+	struct tracecmd_time_sync_event *events_array = NULL;
+	struct tep_record *last_record = NULL;
+	struct tep_event *event = NULL;
+	struct tep_record *record;
+	int id, cnt = 0;
+
+	if (size <= 0)
+		return 0;
+
+	if (*events == NULL) {
+		*events = malloc(10*sizeof(struct tracecmd_time_sync_event));
+		*events_size = 10;
+		*events_count = 0;
+	}
+
+	while (true) {
+		event = NULL;
+		record = tracecmd_read_page_record(tep, page, size,
+						   last_record);
+		if (!record)
+			break;
+		free_record(last_record);
+		id = tep_data_type(tep, record);
+		event = tep_find_event(tep, id);
+		if (event) {
+			if (*events_count >= *events_size) {
+				events_array = realloc(*events,
+					((3*(*events_size))/2)*
+					sizeof(struct tracecmd_time_sync_event));
+				if (events_array) {
+					*events = events_array;
+					(*events_size) = (((*events_size)*3)/2);
+				}
+			}
+
+			if (*events_count < *events_size) {
+				(*events)[*events_count].ts = record->ts;
+				(*events)[*events_count].cpu = cpu;
+				(*events)[*events_count].id = id;
+				(*events)[*events_count].pid = tep_data_pid(tep, record);
+				(*events_count)++;
+				cnt++;
+			}
+		}
+		last_record = record;
+	}
+	free_record(last_record);
+
+	return cnt;
+}
+
+static int sync_events_cmp(const void *a, const void *b)
+{
+	const struct tracecmd_time_sync_event *ea = (const struct tracecmd_time_sync_event *)a;
+	const struct tracecmd_time_sync_event *eb = (const struct tracecmd_time_sync_event *)b;
+
+	if (ea->ts > eb->ts)
+		return 1;
+	if (ea->ts < eb->ts)
+		return -1;
+	return 0;
+}
+
+static int find_sync_events(struct tep_handle *pevent,
+			    struct tracecmd_time_sync_event *recorded,
+			    int rsize, struct tracecmd_time_sync_event *events)
+{
+	int i = 0, j = 0;
+
+	while (i < rsize) {
+		if (!events[j].ts && events[j].id == recorded[i].id &&
+		    (events[j].pid < 0 || events[j].pid == recorded[i].pid)) {
+			events[j].cpu = recorded[i].cpu;
+			events[j].ts = recorded[i].ts;
+			j++;
+		} else if (j > 0 && events[j-1].id == recorded[i].id &&
+			  (events[j-1].pid < 0 || events[j-1].pid == recorded[i].pid)) {
+			events[j-1].cpu = recorded[i].cpu;
+			events[j-1].ts = recorded[i].ts;
+		}
+		i++;
+	}
+	return j;
+}
+
+//#define TSYNC_RBUFFER_DEBUG
+static int find_raw_events(struct tep_handle *tep,
+		    struct buffer_instance *instance,
+		    struct tracecmd_time_sync_event *events)
+{
+	struct tracecmd_time_sync_event *events_array = NULL;
+	int events_count = 0;
+	int events_size = 0;
+	unsigned int p_size;
+	char file[PATH_MAX];
+	int ts = 0;
+	void *page;
+	char *path;
+	int fd;
+	int i;
+	int r;
+
+	p_size = getpagesize();
+#ifdef TSYNC_RBUFFER_DEBUG
+	file = get_instance_file(instance, "trace");
+	if (!file)
+		return ts;
+	{
+		char *buf = NULL;
+		FILE *fp;
+		size_t n;
+		int r;
+
+		printf("Events:\n\r");
+		fp = fopen(file, "r");
+		while ((r = getline(&buf, &n, fp)) >= 0) {
+
+			if (buf[0] != '#')
+				printf("%s", buf);
+			free(buf);
+			buf = NULL;
+		}
+		fclose(fp);
+	}
+	tracecmd_put_tracing_file(file);
+#endif /* TSYNC_RBUFFER_DEBUG */
+	path = get_instance_file(instance, "per_cpu");
+	if (!path)
+		return ts;
+
+	page = malloc(p_size);
+	if (!page)
+		die("Failed to allocate time_stamp info");
+	for (i = 0; i < instance->cpu_count; i++) {
+		sprintf(file, "%s/cpu%d/trace_pipe_raw", path, i);
+		fd = open(file, O_RDONLY | O_NONBLOCK);
+		if (fd < 0)
+			continue;
+		do {
+			r = read(fd, page, p_size);
+			if (r > 0) {
+				get_events_in_page(tep, page, r, i,
+						   &events_array, &events_count,
+						   &events_size);
+			}
+		} while (r > 0);
+		close(fd);
+	}
+	qsort(events_array, events_count, sizeof(*events_array), sync_events_cmp);
+	r = find_sync_events(tep, events_array, events_count, events);
+#ifdef TSYNC_RBUFFER_DEBUG
+	len = 0;
+	while (events[len].id) {
+		printf("Found %d @ cpu %d: %lld pid %d\n\r",
+			events[len].id,  events[len].cpu,
+			events[len].ts, events[len].pid);
+		len++;
+	}
+#endif
+
+	free(events_array);
+	free(page);
+
+	tracecmd_put_tracing_file(path);
+	return r;
+}
+
+static int clock_sync_x86_host_find_events(struct tracecmd_clock_sync *clock,
+					   int pid,
+					   struct tracecmd_time_sync_event *event)
+{
+	int ret;
+
+	clock->events[0].pid = pid;
+	ret = find_raw_events(clock->tep, clock->vinst, clock->events);
+	event->ts = clock->events[0].ts;
+	event->cpu = clock->events[0].cpu;
+	return ret;
+
+}
+
+static int clock_sync_x86_guest_find_events(struct tracecmd_clock_sync *clock,
+					    int pid,
+					    struct tracecmd_time_sync_event *event)
+{
+	int ret;
+
+	ret = find_raw_events(clock->tep, clock->vinst, clock->events);
+	if (ret != clock->events_count)
+		return 0;
+	event->ts = clock->events[1].ts;
+	event->cpu = clock->events[0].cpu;
+	return 1;
+
+}
+
+static void tracecmd_clock_sync_reset(struct tracecmd_clock_sync *clock_context)
+{
+	int i = 0;
+
+	while (clock_context->events[i].id) {
+		clock_context->events[i].cpu = 0;
+		clock_context->events[i].ts = 0;
+		clock_context->events[i].pid = -1;
+		i++;
+	}
+}
+
+int tracecmd_clock_find_event(struct tracecmd_clock_sync *clock, int pid,
+			      struct tracecmd_time_sync_event *event)
+{
+	int ret = 0;
+	int id;
+
+	if (clock == NULL ||
+	    clock->clock_context_id >= CLOCK_CONTEXT_MAX)
+		return 0;
+
+	id = clock->clock_context_id;
+	if (clock_sync[id].clock_sync_find_events)
+		ret = clock_sync[id].clock_sync_find_events(clock, pid, event);
+
+	tracecmd_clock_sync_reset(clock);
+	return ret;
+}
+
+static void clock_context_copy(struct tracecmd_clock_sync *clock_context,
+			       struct tracecmd_ftrace_param *params,
+			       struct tracecmd_event_descr *events)
+{
+	int i;
+
+	i = 0;
+	while (params[i].file)
+		i++;
+	clock_context->ftrace_params = calloc(i+1, sizeof(struct tracecmd_ftrace_param));
+	i = 0;
+	while (params[i].file) {
+		clock_context->ftrace_params[i].file = strdup(params[i].file);
+		if (params[i].set)
+			clock_context->ftrace_params[i].set = strdup(params[i].set);
+		if (params[i].reset)
+			clock_context->ftrace_params[i].reset = strdup(params[i].reset);
+		i++;
+	}
+
+	i = 0;
+	while (events[i].name)
+		i++;
+	clock_context->events = calloc(i+1, sizeof(struct tracecmd_time_sync_event));
+	clock_context->events_count = i;
+}
+
+void trace_instance_reset(struct buffer_instance *vinst)
+{
+	write_instance_file(vinst, "trace", "\0", NULL);
+}
+
+static struct buffer_instance *
+clock_synch_create_instance(const char *clock, unsigned int cid)
+{
+	struct buffer_instance *vinst;
+	char inst_name[256];
+
+	snprintf(inst_name, 256, "clock_synch-%d", cid);
+
+	vinst = create_instance(strdup(inst_name));
+	tracecmd_init_instance(vinst);
+	vinst->cpu_count = tracecmd_local_cpu_count();
+	tracecmd_make_instance(vinst);
+	trace_instance_reset(vinst);
+	if (clock)
+		vinst->clock = strdup(clock);
+	tracecmd_set_clock(vinst);
+	return vinst;
+}
+
+static struct tep_handle *clock_synch_get_tep(struct buffer_instance *instance,
+					      const char * const *systems)
+{
+	struct tep_handle *tep = NULL;
+	char *path;
+
+	path = get_instance_dir(instance);
+	tep = tracecmd_local_events_system(path, systems);
+	tracecmd_put_tracing_file(path);
+
+	tep_set_file_bigendian(tep, tracecmd_host_bigendian());
+	tep_set_local_bigendian(tep, tracecmd_host_bigendian());
+
+	return tep;
+}
+
+static int get_vsocket_params(int fd, unsigned int *lcid, unsigned int *lport,
+			       unsigned int *rcid, unsigned int *rport)
+{
+	struct sockaddr_vm addr;
+	socklen_t addr_len = sizeof(addr);
+
+	memset(&addr, 0, sizeof(addr));
+	if (getsockname(fd, (struct sockaddr *)&addr, &addr_len))
+		return -1;
+	if (addr.svm_family != AF_VSOCK)
+		return -1;
+	*lport = addr.svm_port;
+	*lcid = addr.svm_cid;
+
+	memset(&addr, 0, sizeof(addr));
+	addr_len = sizeof(addr);
+	if (getpeername(fd, (struct sockaddr *)&addr, &addr_len))
+		return -1;
+	if (addr.svm_family != AF_VSOCK)
+		return -1;
+	*rport = addr.svm_port;
+	*rcid = addr.svm_cid;
+
+	return 0;
+}
+
+struct tracecmd_clock_sync*
+tracecmd_clock_context_new(struct tracecmd_msg_handle *msg_handle,
+			    const char *clock_str,
+			    enum clock_sync_context id)
+{
+	struct tracecmd_clock_sync *clock_context;
+	struct tep_event *event;
+	unsigned int i = 0;
+
+	switch (id) {
+#ifdef VSOCK
+	case CLOCK_KVM_X86_VSOCK_HOST:
+	case CLOCK_KVM_X86_VSOCK_GUEST:
+		break;
+#endif
+	default: /* not supported clock sync context */
+		return NULL;
+	}
+
+	if (id >= CLOCK_CONTEXT_MAX || NULL == msg_handle)
+		return NULL;
+	clock_context = calloc(1, sizeof(struct tracecmd_clock_sync));
+	if (!clock_context)
+		return NULL;
+	if (get_vsocket_params(msg_handle->fd,
+			       &clock_context->local_cid,
+			       &clock_context->local_port,
+			       &clock_context->remote_cid,
+			       &clock_context->remote_port)) {
+		free (clock_context);
+		return NULL;
+	}
+
+	clock_context->clock_context_id = id;
+	clock_context_copy(clock_context,
+			   clock_sync[id].ftrace_params, clock_sync[id].events);
+
+	if (clock_sync[id].clock_sync_init)
+		clock_sync[id].clock_sync_init(clock_context);
+
+	clock_context->vinst = clock_synch_create_instance(clock_str, clock_context->remote_cid);
+	clock_context->tep = clock_synch_get_tep(clock_context->vinst,
+						 clock_sync[id].systems);
+	i = 0;
+	while (clock_sync[id].events[i].name) {
+		event = tep_find_event_by_name(clock_context->tep,
+					       clock_sync[id].events[i].system,
+					       clock_sync[id].events[i].name);
+		if (!event)
+			break;
+		clock_context->events[i].id = event->id;
+		i++;
+	}
+#ifdef TSYNC_DEBUG
+	clock_context->debug_fd = -1;
+#endif
+
+	return clock_context;
+
+}
+
+void tracecmd_clock_context_free(struct buffer_instance *instance)
+{
+	int i;
+
+	if (instance->clock_sync == NULL ||
+	    instance->clock_sync->clock_context_id >= CLOCK_CONTEXT_MAX)
+		return;
+	if (clock_sync[instance->clock_sync->clock_context_id].clock_sync_free)
+		clock_sync[instance->clock_sync->clock_context_id].clock_sync_free(instance->clock_sync);
+
+	i = 0;
+	while (instance->clock_sync->ftrace_params[i].file) {
+		free(instance->clock_sync->ftrace_params[i].file);
+		free(instance->clock_sync->ftrace_params[i].set);
+		free(instance->clock_sync->ftrace_params[i].reset);
+		i++;
+	}
+	free(instance->clock_sync->ftrace_params);
+	free(instance->clock_sync->events);
+	tracecmd_remove_instance(instance->clock_sync->vinst);
+	/* todo: clean up the instance */
+	tep_free(instance->clock_sync->tep);
+
+	free(instance->clock_sync->offsets);
+	free(instance->clock_sync->times);
+#ifdef TSYNC_DEBUG
+	if (instance->clock_sync->debug_fd >= 0) {
+		close(instance->clock_sync->debug_fd);
+		instance->clock_sync->debug_fd = -1;
+	}
+#endif
+	free(instance->clock_sync);
+	instance->clock_sync = NULL;
+}
+
+bool tracecmd_time_sync_check(void)
+{
+#ifdef VSOCK
+	return true;
+#endif
+	return false;
+}
+
+void sync_time_with_host_v3(struct buffer_instance *instance)
+{
+	long long timestamp = 0;
+	long long offset = 0;
+
+	if (!instance->do_tsync)
+		return;
+
+	if (instance->clock_sync == NULL)
+		instance->clock_sync = tracecmd_clock_context_new(instance->msg_handle,
+					instance->clock, CLOCK_KVM_X86_VSOCK_GUEST);
+
+	tracecmd_msg_snd_time_sync(instance->msg_handle, instance->clock_sync,
+				   &offset, &timestamp);
+	if (!offset && !timestamp)
+		warning("Failed to synchronize timestamps with the host");
+}
+
+void sync_time_with_guest_v3(struct buffer_instance *instance)
+{
+	long long timestamp = 0;
+	long long offset = 0;
+	long long *sync_array_ts;
+	long long *sync_array_offs;
+
+	if (!instance->do_tsync)
+		return;
+
+	if (instance->clock_sync == NULL)
+		instance->clock_sync = tracecmd_clock_context_new(instance->msg_handle,
+						top_instance.clock, CLOCK_KVM_X86_VSOCK_HOST);
+
+	tracecmd_msg_rcv_time_sync(instance->msg_handle,
+				   instance->clock_sync, &offset, &timestamp);
+
+	if (!offset && !timestamp) {
+		warning("Failed to synchronize timestamps with guest %s",
+			instance->name);
+		return;
+	}
+
+	sync_array_ts = realloc(instance->time_sync_ts,
+			    (instance->time_sync_count+1)*sizeof(long long));
+	sync_array_offs = realloc(instance->time_sync_offsets,
+			    (instance->time_sync_count+1)*sizeof(long long));
+
+	if (sync_array_ts && sync_array_offs) {
+		sync_array_ts[instance->time_sync_count] = timestamp;
+		sync_array_offs[instance->time_sync_count] = offset;
+		instance->time_sync_count++;
+		instance->time_sync_ts = sync_array_ts;
+		instance->time_sync_offsets = sync_array_offs;
+
+	} else {
+		free(sync_array_ts);
+		free(sync_array_offs);
+	}
+
+}
+
+static void set_clock_synch_events(struct buffer_instance *instance,
+				   struct tracecmd_ftrace_param *params,
+				   bool enable)
+{
+	int i = 0;
+
+	if (!enable)
+		write_tracing_on(instance, 0);
+
+	while (params[i].file) {
+		if (enable && params[i].set) {
+			write_instance_file(instance, params[i].file,
+					    params[i].set, NULL);
+		}
+		if (!enable && params[i].reset)
+			write_instance_file(instance, params[i].file,
+					    params[i].reset, NULL);
+		i++;
+	}
+
+	if (enable)
+		write_tracing_on(instance, 1);
+}
+
+int tracecmd_clock_get_peer(struct tracecmd_clock_sync *clock_context,
+			    unsigned int *remote_cid, unsigned int *remote_port)
+{
+	if (!clock_context)
+		return 0;
+	if (remote_cid)
+		*remote_cid = clock_context->remote_cid;
+	if (remote_port)
+		*remote_port = clock_context->remote_port;
+	return 1;
+}
+
+void tracecmd_clock_synch_enable(struct tracecmd_clock_sync *clock_context)
+{
+	set_clock_synch_events(clock_context->vinst,
+			       clock_context->ftrace_params, true);
+}
+
+void tracecmd_clock_synch_disable(struct tracecmd_clock_sync *clock_context)
+{
+	set_clock_synch_events(clock_context->vinst,
+			       clock_context->ftrace_params, false);
+}
+
+int tracecmd_clock_synch_calc(struct tracecmd_clock_sync *clock_context,
+			       long long *offset_ret, long long *time_ret)
+{
+	int i, j = 0;
+	long long av, tresch, offset = 0, time = 0;
+
+	if (!clock_context || !clock_context->probes_count)
+		return 0;
+	av = clock_context->offset_av / clock_context->probes_count;
+	tresch = (long long)((clock_context->offset_max - clock_context->offset_min)/10);
+
+	for (i = 0; i < clock_context->probes_count; i++) {
+		/* filter the offsets with deviation up to 10% */
+		if (llabs(clock_context->offsets[i] - av) < tresch) {
+			offset += clock_context->offsets[i];
+			j++;
+		}
+	}
+	if (j)
+		offset /= (long long)j;
+
+	tresch = 0;
+	for (i = 0; i < clock_context->probes_count; i++) {
+		if ((!tresch || tresch > llabs(offset-clock_context->offsets[i]))) {
+			tresch = llabs(offset-clock_context->offsets[i]);
+			time = clock_context->times[i];
+		}
+	}
+	if (offset_ret)
+		*offset_ret = offset;
+	if (time_ret)
+		*time_ret = time;
+#ifdef TSYNC_DEBUG
+	printf("\n calculated offset: %lld, %d/%d probes\n\r",
+		*offset_ret, clock_context->probes_count,
+		clock_context->probes_count + clock_context->bad_probes);
+#endif
+	return 1;
+}
+
+void tracecmd_clock_synch_calc_reset(struct tracecmd_clock_sync *clock_context)
+{
+	if (!clock_context)
+		return;
+
+	clock_context->probes_count = 0;
+	clock_context->bad_probes = 0;
+	clock_context->offset_av = 0;
+	clock_context->offset_min = 0;
+	clock_context->offset_max = 0;
+#ifdef TSYNC_DEBUG
+	if (clock_context->debug_fd >= 0) {
+		close(clock_context->debug_fd);
+		clock_context->debug_fd = -1;
+	}
+#endif
+
+}
+
+void tracecmd_clock_synch_calc_probe(struct tracecmd_clock_sync *clock_context,
+				     long long ts_local, long long ts_remote)
+{
+	int count;
+#ifdef TSYNC_DEBUG
+	char buff[256];
+#endif
+
+	if (!clock_context)
+		return;
+	if (!ts_local || !ts_remote) {
+		clock_context->bad_probes++;
+		return;
+	}
+
+	if (!clock_context->offsets && !clock_context->times) {
+		clock_context->offsets = calloc(10, sizeof(long long));
+		clock_context->times = calloc(10, sizeof(long long));
+		clock_context->probes_size = 10;
+	}
+
+	if (clock_context->probes_size == clock_context->probes_count) {
+		clock_context->probes_size = (3*clock_context->probes_size)/2;
+		clock_context->offsets = realloc(clock_context->offsets,
+						 clock_context->probes_size *
+						 sizeof(long long));
+		clock_context->times = realloc(clock_context->times,
+					       clock_context->probes_size*
+					       sizeof(long long));
+	}
+
+	if (!clock_context->offsets || !clock_context->times) {
+		clock_context->probes_size = 0;
+		tracecmd_clock_synch_calc_reset(clock_context);
+		return;
+	}
+#ifdef TSYNC_DEBUG
+	if (clock_context->debug_fd < 0) {
+		sprintf(buff, "s-cid%d.txt", clock_context->remote_cid);
+		clock_context->debug_fd = open(buff, O_CREAT|O_WRONLY|O_TRUNC, 0644);
+	}
+#endif
+	count = clock_context->probes_count;
+	clock_context->probes_count++;
+	clock_context->offsets[count] = ts_remote - ts_local;
+	clock_context->times[count] = ts_local;
+	clock_context->offset_av += clock_context->offsets[count];
+
+	if (!clock_context->offset_min ||
+	    clock_context->offset_min > llabs(clock_context->offsets[count]))
+		clock_context->offset_min = llabs(clock_context->offsets[count]);
+	if (!clock_context->offset_max ||
+	    clock_context->offset_max < llabs(clock_context->offsets[count]))
+		clock_context->offset_max = llabs(clock_context->offsets[count]);
+#ifdef TSYNC_DEBUG
+	sprintf(buff, "%lld %lld\n", ts_local, ts_remote);
+	write(clock_context->debug_fd, buff, strlen(buff));
+#endif
+
+}
-- 
2.20.1



* Re: [PATCH v11 0/9]  trace-cmd: Timetamps sync between host and guest machines, relying on vsock events.
  2019-04-25 11:05 [PATCH v11 0/9] trace-cmd: Timetamps sync between host and guest machines, relying on vsock events Tzvetomir Stoyanov
                   ` (8 preceding siblings ...)
  2019-04-25 11:05 ` [PATCH v11 9/9] trace-cmd [POC]: Implemented timestamps synch algorithm, using vsock events Tzvetomir Stoyanov
@ 2019-08-15 18:40 ` " Steven Rostedt
  9 siblings, 0 replies; 11+ messages in thread
From: Steven Rostedt @ 2019-08-15 18:40 UTC (permalink / raw)
  To: Tzvetomir Stoyanov; +Cc: linux-trace-devel

On Thu, 25 Apr 2019 14:05:22 +0300
Tzvetomir Stoyanov <tstoyanov@vmware.com> wrote:

> [
>  v11 changes:
>   - Rebased on top of Slavomir's v10 "Add VM kernel tracing over vsockets and FIFOs"
>   - Addressed Slavomir's comments from version 10 of the patch series.
> 
>  v10 changes:
>   - Fixed broken compilation, call to timestamp_correction_calc() in timestamp_correct
>     was smashed.
>   - Replaced deprecated tep_data_event_from_type() API with tep_find_event().
>   - Fixed a warning on assignment const to non const.
> 
>  v9 changes:
>   - Fixed implementation of binary search algorithm in timestamp_correct()
> 
>  v8 changes:
>   - Added rmdir() call in tracecmd_remove_instance(), to completely remove the instance. 
>   However, there is an issue with deleting the instances using rmdir(), which is investigated.
>   - Few changes in read_qemu_guests_pids(), timestamp_correct(), tsync_offset_load() 
>  tracecmd_clock_context_new() and find_raw_events() suggested by Slavomir. 
> 
>  v7 changes:
>   - Added warning messages in case time synchronization cannot be negotiated or fails.
>   - Few optimizations and checks in read_qemu_guests_pids(), tsync_offset_load(),
>     and find_raw_events(), suggested by Slavomir Kaslev.
>   - Reworked timestamp_correct() to not use static variables.
>   - Check TRACECMD_OPTION_TIME_SHIFT before reading time sync samples from the trace.dat file
> 
>  v6 changes:
>   - Refactored tracecmd_msg_snd_time_sync() and tracecmd_msg_rcv_time_sync() functions:
>     removed any time sync calculations logic as separate functions in trace-timesync.c file
>   - Defined TSYNC_PROBE, TSYNC_REQ and TSYNC_RESP messages, in order to make the time sync
>     protocol comprehensible.
>   - Addressed Steven Rostedt comments.
>   - Addressed Slavomir Kaslev comments.
> 
>  v5 changes:
>   - Rebased to Slavomir's v8 "Add VM kernel tracing over vsockets and FIFOs"
>     patch series.
>   - Implemented an algorithm for time drift correction.
>   - Addressed Slavomir's comments.
>   - Refactored the code: moved all time sync specific implementation in trace-timesync.c
>   - Isolated all hardcoded event specific stuff in a structure, so it could be easily
>     moved to external plugins.
>   - Added a check for VSOCK support: do not perform vsock dependent time synchronisation
>     in case there is no VSOCK support.
> 
>  v4 changes:
>   - Removed the implementation of PTP-like algorithm. The current
>     logic relies on matching time stamps of kvm_exit/virtio_transport_recv_pkt
>     events on host to virtio_transport_alloc_pkt/vp_notify events on guest.
>   - Rebased to Slavomir's v7 "Add VM kernel tracing over vsockets and FIFOs"
>     patch series.
>   - Decreased the time synch probes from 5000 to 300.
>   - Addressed Steven Rostedt comments.
>   - Code cleanup.
> 
>  v3 changes:
>  - Removed any magic constants, used in the PTP-like algorithm,
>    as Slavomir Kaslev suggested.
>  - Implemented new algorithm, based on mapping kvm_exit events
>    in host context to vsock_send events in guest context,
>    suggested by Steven Rostedt.
> 
>  v2 changes:
>   - Addressed Steven Rostedt comments.
>   - Modified PTP-like timestamps sync algorithm to gain more accuracy, with the
>     help of Yordan Karadzhov and Slavomir Kaslev.
> ]
> 
> POC implementation of algorithm for timestamps sync between guest and host machines.
> The algorithm relies on matching time stamps of guest and host vsock events.
> 
> The patch series depends on Slavomir's changes, introduced by the v10 patch series
> "Add VM kernel tracing over vsockets and FIFOs"

Hi Ceco,

I did a bit of playing to get this working too, and even tried to find
events that map perfectly, but found the connection not that reliable.
Partly that is because there could be a drift in the clocks themselves, so
an exact measurement may not be as accurate. And since vsock has a lot of
overhead between sending and receiving, it creates a much bigger
variance and makes it difficult to find the best events to compare.
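
To illustrate why drift matters: a single constant offset is only correct at
the moment it was measured. A minimal sketch, assuming two offset samples
taken at different times and a linear drift between them, is shown below.
The names struct offset_sample and correct_ts() are hypothetical, and this is
not the drift correction implemented in the series.

```c
#include <stdint.h>

/* Hypothetical sync sample: offset measured at a given guest time. */
struct offset_sample {
	uint64_t ts;		/* guest timestamp of the measurement, in ns */
	int64_t offset;		/* host - guest at that moment, in ns */
};

/*
 * Map a guest timestamp to host time by linearly interpolating the offset
 * between samples a and b (expects a->ts <= b->ts). Timestamps outside
 * the interval are extrapolated along the same line. The fractional part
 * of the interpolated offset is truncated.
 */
static uint64_t correct_ts(const struct offset_sample *a,
			   const struct offset_sample *b, uint64_t ts)
{
	double slope;
	int64_t dt;
	int64_t offset;

	/* Degenerate case: both samples taken at the same time. */
	if (b->ts == a->ts)
		return ts + (uint64_t)a->offset;

	slope = (double)(b->offset - a->offset) / (double)(b->ts - a->ts);
	dt = (int64_t)(ts - a->ts);	/* negative when ts precedes a->ts */
	offset = a->offset + (int64_t)(slope * (double)dt);

	return ts + (uint64_t)offset;
}
```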

I'm thinking that we should go back to your PTP-like patches. Could you
rebase them tomorrow and resend them?

Don't worry about making them part of the plugin work. This is still
just a prototype/demo for me ;-)

Thanks!

-- Steve


> 
> Tzvetomir Stoyanov (9):
>   trace-cmd: Implemented new lib API: tracecmd_local_events_system()
>   trace-cmd: Added support for negative time offsets in trace.dat file
>   trace-cmd: Fix tracecmd_read_page_record() to read more than one event
>   trace-cmd: Added implementation of htonll() and ntohll()
>   trace-cmd: Refactored few functions in trace-record.c
>   trace-cmd: Find and store pids of tasks, which run virtual CPUs of
>     given VM
>   trace-cmd: Implemented new API tracecmd_add_option_v()
>   trace-cmd: Implemented new option in trace.dat file:
>     TRACECMD_OPTION_TIME_SHIFT
>   trace-cmd [POC]: Implemented timestamps synch algorithm, using vsock
>     events.
> 
>  include/trace-cmd/trace-cmd.h    |  31 +-
>  include/traceevent/event-parse.h |   1 +
>  lib/trace-cmd/trace-input.c      | 145 +++++-
>  lib/trace-cmd/trace-util.c       |  98 ++--
>  tracecmd/Makefile                |   1 +
>  tracecmd/include/trace-local.h   |  43 +-
>  tracecmd/include/trace-msg.h     |  10 +
>  tracecmd/trace-agent.c           |  13 +-
>  tracecmd/trace-msg.c             | 209 +++++++-
>  tracecmd/trace-output.c          | 117 ++++-
>  tracecmd/trace-read.c            |   4 +-
>  tracecmd/trace-record.c          | 229 +++++++--
>  tracecmd/trace-timesync.c        | 808 +++++++++++++++++++++++++++++++
>  13 files changed, 1575 insertions(+), 134 deletions(-)
>  create mode 100644 tracecmd/trace-timesync.c
> 
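
For readers unfamiliar with the htonll()/ntohll() helpers named in patch 4/9,
a 64-bit byte-order conversion can be sketched on top of htonl() as below.
This is an assumption about what such helpers typically look like, not the
code from the patch; the sketch_ prefix is used here to avoid clashing with
platforms that already provide htonll().

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Convert a 64-bit value from host to network (big-endian) byte order. */
static uint64_t sketch_htonll(uint64_t v)
{
	uint32_t hi = (uint32_t)(v >> 32);
	uint32_t lo = (uint32_t)(v & 0xffffffffULL);

	/* On a big-endian host the value is already in network order. */
	if (htonl(1) == 1)
		return v;

	/* Little-endian host: swap each half, then swap the halves. */
	return ((uint64_t)htonl(lo) << 32) | htonl(hi);
}

/* The conversion is its own inverse, so ntohll is the same operation. */
static uint64_t sketch_ntohll(uint64_t v)
{
	return sketch_htonll(v);
}
```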


Thread overview: 11+ messages
2019-04-25 11:05 [PATCH v11 0/9] trace-cmd: Timetamps sync between host and guest machines, relying on vsock events Tzvetomir Stoyanov
2019-04-25 11:05 ` [PATCH v11 1/9] trace-cmd: Implemented new lib API: tracecmd_local_events_system() Tzvetomir Stoyanov
2019-04-25 11:05 ` [PATCH v11 2/9] trace-cmd: Added support for negative time offsets in trace.dat file Tzvetomir Stoyanov
2019-04-25 11:05 ` [PATCH v11 3/9] trace-cmd: Fix tracecmd_read_page_record() to read more than one event Tzvetomir Stoyanov
2019-04-25 11:05 ` [PATCH v11 4/9] trace-cmd: Added implementation of htonll() and ntohll() Tzvetomir Stoyanov
2019-04-25 11:05 ` [PATCH v11 5/9] trace-cmd: Refactored few functions in trace-record.c Tzvetomir Stoyanov
2019-04-25 11:05 ` [PATCH v11 6/9] trace-cmd: Find and store pids of tasks, which run virtual CPUs of given VM Tzvetomir Stoyanov
2019-04-25 11:05 ` [PATCH v11 7/9] trace-cmd: Implemented new API tracecmd_add_option_v() Tzvetomir Stoyanov
2019-04-25 11:05 ` [PATCH v11 8/9] trace-cmd: Implemented new option in trace.dat file: TRACECMD_OPTION_TIME_SHIFT Tzvetomir Stoyanov
2019-04-25 11:05 ` [PATCH v11 9/9] trace-cmd [POC]: Implemented timestamps synch algorithm, using vsock events Tzvetomir Stoyanov
2019-08-15 18:40 ` [PATCH v11 0/9] trace-cmd: Timetamps sync between host and guest machines, relying on " Steven Rostedt
