linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host
@ 2022-07-11  9:31 Adrian Hunter
  2022-07-11  9:31 ` [PATCH 01/35] perf tools: Fix dso_id inode generation comparison Adrian Hunter
                   ` (35 more replies)
  0 siblings, 36 replies; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:31 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Hi

Here are patches to support decoding an Intel PT trace that contains data
from virtual machine userspace.

This is done by adding functionality to perf inject to be able to inject
sideband events needed for decoding, into the perf.data file recorded on
the host.  That is, inject events from a perf.data file recorded in a
virtual machine into a perf.data file recorded on the host at the same
time.

For more details, see the example in the documentation added in the last
patch.

Note that there was already support for kernel-only tracing of virtual machines:

 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/perf-intel-pt.txt?h=v5.19-rc1#n1221
 
or the special case of tracing KVM self tests:

 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/perf-intel-pt.txt?h=v5.19-rc1#n1403

For general information about Intel PT also see the wiki page:

 https://perf.wiki.kernel.org/index.php/Perf_tools_support_for_Intel%C2%AE_Processor_Trace

The patches fall into 5 groups:
 1. the first patch is a fix
 2. the next 22 patches are preparation
 3. the main patch is "perf inject: Add support for injecting guest
 sideband events"
 4. 3 more preparation patches
 5. Intel PT decoding changes

The patches are mostly small except for "perf inject: Add support for
injecting guest sideband events".  However, the code there adds new
functionality without affecting existing functionality, and is consequently
fairly self-contained.


Adrian Hunter (35):
      perf tools: Fix dso_id inode generation comparison
      perf tools: Export dsos__for_each_with_build_id()
      perf ordered_events: Add ordered_events__last_flush_time()
      perf tools: Export perf_event__process_finished_round()
      perf tools: Factor out evsel__id_hdr_size()
      perf tools: Add perf_event__synthesize_id_sample()
      perf script: Add --dump-unsorted-raw-trace option
      perf buildid-cache: Add guestmount'd files to the build ID cache
      perf buildid-cache: Do not require purge files to also be in the file system
      perf tools: Add machine_pid and vcpu to id_index
      perf session: Create guest machines from id_index
      perf tools: Add guest_cpu to hypervisor threads
      perf tools: Add machine_pid and vcpu to perf_sample
      perf tools: Use sample->machine_pid to find guest machine
      perf script: Add machine_pid and vcpu
      perf dlfilter: Add machine_pid and vcpu
      perf auxtrace: Add machine_pid and vcpu to auxtrace_error
      perf script python: Add machine_pid and vcpu
      perf script python: intel-pt-events: Add machine_pid and vcpu
      perf tools: Remove also guest kcore_dir with host kcore_dir
      perf tools: Make has_kcore_dir() work also for guest kcore_dir
      perf tools: Automatically use guest kcore_dir if present
      perf tools: Add reallocarray_as_needed()
      perf inject: Add support for injecting guest sideband events
      perf machine: Use realloc_array_as_needed() in machine__set_current_tid()
      perf tools: Handle injected guest kernel mmap event
      perf tools: Add perf_event__is_guest()
      perf intel-pt: Remove guest_machine_pid
      perf intel-pt: Add some more logging to intel_pt_walk_next_insn()
      perf intel-pt: Track guest context switches
      perf intel-pt: pt disable sync switch
      perf intel-pt: Determine guest thread from guest sideband
      perf intel-pt: Add machine_pid and vcpu to auxtrace_error
      perf intel-pt: Use guest pid/tid etc in guest samples
      perf intel-pt: Add documentation for tracing guest machine user space

 tools/lib/perf/include/internal/evsel.h            |    4 +
 tools/lib/perf/include/perf/event.h                |    7 +
 tools/perf/Documentation/perf-dlfilter.txt         |   22 +
 tools/perf/Documentation/perf-inject.txt           |   17 +
 tools/perf/Documentation/perf-intel-pt.txt         |  181 +++-
 tools/perf/Documentation/perf-script.txt           |   10 +-
 tools/perf/builtin-inject.c                        | 1043 +++++++++++++++++++-
 tools/perf/builtin-script.c                        |   19 +
 tools/perf/include/perf/perf_dlfilter.h            |    8 +
 tools/perf/scripts/python/intel-pt-events.py       |   32 +-
 tools/perf/util/auxtrace.c                         |   30 +-
 tools/perf/util/auxtrace.h                         |    4 +
 tools/perf/util/build-id.c                         |   80 +-
 tools/perf/util/build-id.h                         |   16 +-
 tools/perf/util/data.c                             |   43 +-
 tools/perf/util/data.h                             |    1 +
 tools/perf/util/dlfilter.c                         |    2 +
 tools/perf/util/dso.h                              |    6 +
 tools/perf/util/dsos.c                             |   10 +-
 tools/perf/util/event.h                            |   23 +
 tools/perf/util/evlist.c                           |   42 +-
 tools/perf/util/evsel.c                            |   27 +
 tools/perf/util/evsel.h                            |    2 +
 tools/perf/util/intel-pt.c                         |  183 +++-
 tools/perf/util/machine.c                          |   41 +-
 tools/perf/util/machine.h                          |    2 +
 tools/perf/util/ordered-events.h                   |    6 +
 .../util/scripting-engines/trace-event-python.c    |   15 +-
 tools/perf/util/session.c                          |  111 ++-
 tools/perf/util/session.h                          |    4 +
 tools/perf/util/symbol.c                           |    6 +-
 tools/perf/util/synthetic-events.c                 |   98 +-
 tools/perf/util/synthetic-events.h                 |    2 +
 tools/perf/util/thread.c                           |    1 +
 tools/perf/util/thread.h                           |    1 +
 tools/perf/util/util.c                             |   70 +-
 tools/perf/util/util.h                             |   15 +
 37 files changed, 2029 insertions(+), 155 deletions(-)


Regards
Adrian


* [PATCH 01/35] perf tools: Fix dso_id inode generation comparison
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
@ 2022-07-11  9:31 ` Adrian Hunter
  2022-07-18 14:57   ` Arnaldo Carvalho de Melo
  2022-07-11  9:31 ` [PATCH 02/35] perf tools: Export dsos__for_each_with_build_id() Adrian Hunter
                   ` (34 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:31 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Synthesized MMAP events have zero ino_generation, so only compare
ino_generation values when both are non-zero.
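
As a minimal sketch of the rule above (not the perf code itself):
struct demo_dso_id and demo_dso_id_cmp() are hypothetical stand-ins, reduced
to the inode and generation fields. A zero generation on either side is
treated as unknown, so it never makes two IDs differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for perf's struct dso_id, reduced to two fields. */
struct demo_dso_id {
	uint64_t ino;
	uint64_t ino_generation;
};

/*
 * Compare two IDs using the same sign convention as the patch. The
 * generation is only compared when both sides have a non-zero value.
 */
int demo_dso_id_cmp(const struct demo_dso_id *a, const struct demo_dso_id *b)
{
	assert(a && b);

	if (a->ino > b->ino) return -1;
	if (a->ino < b->ino) return 1;

	if (a->ino_generation && b->ino_generation) {
		if (a->ino_generation > b->ino_generation) return -1;
		if (a->ino_generation < b->ino_generation) return 1;
	}

	return 0;
}
```

With this rule, a synthesized ID with inode 42 and generation 0 compares
equal to a recorded ID with inode 42 and generation 7, whereas generations 7
and 9 still compare as different.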

Fixes: 0e3149f86b99 ("perf dso: Move dso_id from 'struct map' to 'struct dso'")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/dsos.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/dsos.c b/tools/perf/util/dsos.c
index b97366f77bbf..839a1f384733 100644
--- a/tools/perf/util/dsos.c
+++ b/tools/perf/util/dsos.c
@@ -23,8 +23,14 @@ static int __dso_id__cmp(struct dso_id *a, struct dso_id *b)
 	if (a->ino > b->ino) return -1;
 	if (a->ino < b->ino) return 1;
 
-	if (a->ino_generation > b->ino_generation) return -1;
-	if (a->ino_generation < b->ino_generation) return 1;
+	/*
+	 * Synthesized MMAP events have zero ino_generation, so do not compare
+	 * zero values.
+	 */
+	if (a->ino_generation && b->ino_generation) {
+		if (a->ino_generation > b->ino_generation) return -1;
+		if (a->ino_generation < b->ino_generation) return 1;
+	}
 
 	return 0;
 }
-- 
2.25.1



* [PATCH 02/35] perf tools: Export dsos__for_each_with_build_id()
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
  2022-07-11  9:31 ` [PATCH 01/35] perf tools: Fix dso_id inode generation comparison Adrian Hunter
@ 2022-07-11  9:31 ` Adrian Hunter
  2022-07-19 16:55   ` Ian Rogers
  2022-07-11  9:31 ` [PATCH 03/35] perf ordered_events: Add ordered_events__last_flush_time() Adrian Hunter
                   ` (33 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:31 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Export dsos__for_each_with_build_id() so it can be used elsewhere.
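
For readers unfamiliar with the macro's shape, here is a small sketch of the
same "if (...) continue; else" filtering idiom. It iterates over a plain
array instead of a kernel-style list, and struct demo_dso,
demo_for_each_with_build_id() and demo_count_with_build_id() are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, array-based stand-in for perf's list of struct dso. */
struct demo_dso {
	const char *name;
	bool has_build_id;
};

/*
 * Same shape as dsos__for_each_with_build_id(): the trailing "else" binds
 * the caller's loop body, so entries without a build ID are skipped.
 */
#define demo_for_each_with_build_id(pos, arr, n)		\
	for (pos = (arr); pos < (arr) + (n); pos++)		\
		if (!pos->has_build_id)				\
			continue;				\
		else

/* Count the entries that the filtered loop actually visits. */
size_t demo_count_with_build_id(struct demo_dso *dsos, size_t n)
{
	struct demo_dso *pos;
	size_t count = 0;

	assert(dsos);

	demo_for_each_with_build_id(pos, dsos, n)
		count++;

	return count;
}
```

Because the macro ends in "else", the statement that follows it becomes the
loop body, which is why it can be used like an ordinary for loop.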

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/build-id.c | 6 ------
 tools/perf/util/dso.h      | 6 ++++++
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index 328668f38c69..4c9093b64d1f 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -300,12 +300,6 @@ char *dso__build_id_filename(const struct dso *dso, char *bf, size_t size,
 	return __dso__build_id_filename(dso, bf, size, is_debug, is_kallsyms);
 }
 
-#define dsos__for_each_with_build_id(pos, head)	\
-	list_for_each_entry(pos, head, node)	\
-		if (!pos->has_build_id)		\
-			continue;		\
-		else
-
 static int write_buildid(const char *name, size_t name_len, struct build_id *bid,
 			 pid_t pid, u16 misc, struct feat_fd *fd)
 {
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 97047a11282b..66981c7a9a18 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -227,6 +227,12 @@ struct dso {
 #define dso__for_each_symbol(dso, pos, n)	\
 	symbols__for_each_entry(&(dso)->symbols, pos, n)
 
+#define dsos__for_each_with_build_id(pos, head)	\
+	list_for_each_entry(pos, head, node)	\
+		if (!pos->has_build_id)		\
+			continue;		\
+		else
+
 static inline void dso__set_loaded(struct dso *dso)
 {
 	dso->loaded = true;
-- 
2.25.1



* [PATCH 03/35] perf ordered_events: Add ordered_events__last_flush_time()
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
  2022-07-11  9:31 ` [PATCH 01/35] perf tools: Fix dso_id inode generation comparison Adrian Hunter
  2022-07-11  9:31 ` [PATCH 02/35] perf tools: Export dsos__for_each_with_build_id() Adrian Hunter
@ 2022-07-11  9:31 ` Adrian Hunter
  2022-07-19 16:56   ` Ian Rogers
  2022-07-11  9:31 ` [PATCH 04/35] perf tools: Export perf_event__process_finished_round() Adrian Hunter
                   ` (32 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:31 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Allow callers to get the ordered_events last flush timestamp.

This is needed in perf inject to obey finished-round ordering when
injecting additional events (e.g. from a guest perf.data file) with
timestamps. Any additional events that have timestamps before the last
flush time must be injected before the corresponding FINISHED_ROUND event.
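
The ordering constraint can be sketched as follows. This is only an
illustration under stated assumptions: the pending guest events are assumed
to be sorted by ascending timestamp, demo_events_due() is a hypothetical
helper, and "before" is taken to mean strictly less than the last flush
time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return how many of the pending events (sorted by ascending timestamp)
 * must be written out before the next FINISHED_ROUND event, given the
 * last flush time of the host's ordered events.
 */
size_t demo_events_due(const uint64_t *timestamps, size_t nr,
		       uint64_t last_flush)
{
	size_t i = 0;

	assert(timestamps || nr == 0);

	/* Strictly "before the last flush time", following the wording above. */
	while (i < nr && timestamps[i] < last_flush)
		i++;

	return i;
}
```

A caller would pass the value returned by ordered_events__last_flush_time()
as last_flush, write out that many pending events, and only then emit the
FINISHED_ROUND event.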

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/ordered-events.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/perf/util/ordered-events.h b/tools/perf/util/ordered-events.h
index 0b05c3c0aeaa..8febbd7c98ca 100644
--- a/tools/perf/util/ordered-events.h
+++ b/tools/perf/util/ordered-events.h
@@ -75,4 +75,10 @@ void ordered_events__set_copy_on_queue(struct ordered_events *oe, bool copy)
 {
 	oe->copy_on_queue = copy;
 }
+
+static inline u64 ordered_events__last_flush_time(struct ordered_events *oe)
+{
+	return oe->last_flush;
+}
+
 #endif /* __ORDERED_EVENTS_H */
-- 
2.25.1



* [PATCH 04/35] perf tools: Export perf_event__process_finished_round()
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (2 preceding siblings ...)
  2022-07-11  9:31 ` [PATCH 03/35] perf ordered_events: Add ordered_events__last_flush_time() Adrian Hunter
@ 2022-07-11  9:31 ` Adrian Hunter
  2022-07-19 17:04   ` Ian Rogers
  2022-07-11  9:31 ` [PATCH 05/35] perf tools: Factor out evsel__id_hdr_size() Adrian Hunter
                   ` (31 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:31 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Export perf_event__process_finished_round() so it can be used elsewhere.

This is needed in perf inject to obey finished-round ordering.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/session.c | 12 ++++--------
 tools/perf/util/session.h |  4 ++++
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 37f833c3c81b..4c9513bc6d89 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -374,10 +374,6 @@ static int process_finished_round_stub(struct perf_tool *tool __maybe_unused,
 	return 0;
 }
 
-static int process_finished_round(struct perf_tool *tool,
-				  union perf_event *event,
-				  struct ordered_events *oe);
-
 static int skipn(int fd, off_t n)
 {
 	char buf[4096];
@@ -534,7 +530,7 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 		tool->build_id = process_event_op2_stub;
 	if (tool->finished_round == NULL) {
 		if (tool->ordered_events)
-			tool->finished_round = process_finished_round;
+			tool->finished_round = perf_event__process_finished_round;
 		else
 			tool->finished_round = process_finished_round_stub;
 	}
@@ -1069,9 +1065,9 @@ static perf_event__swap_op perf_event__swap_ops[] = {
  *      Flush every events below timestamp 7
  *      etc...
  */
-static int process_finished_round(struct perf_tool *tool __maybe_unused,
-				  union perf_event *event __maybe_unused,
-				  struct ordered_events *oe)
+int perf_event__process_finished_round(struct perf_tool *tool __maybe_unused,
+				       union perf_event *event __maybe_unused,
+				       struct ordered_events *oe)
 {
 	if (dump_trace)
 		fprintf(stdout, "\n");
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 34500a3da735..be5871ea558f 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -155,4 +155,8 @@ int perf_session__deliver_synth_event(struct perf_session *session,
 int perf_event__process_id_index(struct perf_session *session,
 				 union perf_event *event);
 
+int perf_event__process_finished_round(struct perf_tool *tool,
+				       union perf_event *event,
+				       struct ordered_events *oe);
+
 #endif /* __PERF_SESSION_H */
-- 
2.25.1



* [PATCH 05/35] perf tools: Factor out evsel__id_hdr_size()
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (3 preceding siblings ...)
  2022-07-11  9:31 ` [PATCH 04/35] perf tools: Export perf_event__process_finished_round() Adrian Hunter
@ 2022-07-11  9:31 ` Adrian Hunter
  2022-07-19 17:09   ` Ian Rogers
  2022-07-11  9:31 ` [PATCH 06/35] perf tools: Add perf_event__synthesize_id_sample() Adrian Hunter
                   ` (30 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:31 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Factor out evsel__id_hdr_size() so it can be reused.

This is needed by perf inject. When injecting events from a guest perf.data
file, the guest sample ID numbers may conflict with those already in the
host perf.data file. To re-write an ID sample, the old one needs to be
removed first, which means determining how big it is with
evsel__id_hdr_size() and then subtracting that from the event size.
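
The size arithmetic can be sketched as below. The DEMO_SAMPLE_* constants
mirror the PERF_SAMPLE_* bit values from the perf_event UAPI header, and
demo_id_hdr_size() and demo_size_without_id_sample() are hypothetical names
for illustration, not perf functions.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values mirror PERF_SAMPLE_* in the perf_event UAPI header. */
enum {
	DEMO_SAMPLE_TID        = 1U << 1,
	DEMO_SAMPLE_TIME       = 1U << 2,
	DEMO_SAMPLE_ID         = 1U << 6,
	DEMO_SAMPLE_CPU        = 1U << 7,
	DEMO_SAMPLE_STREAM_ID  = 1U << 9,
	DEMO_SAMPLE_IDENTIFIER = 1U << 16,
};

/* Each selected field occupies one u64 in the trailing ID sample. */
uint16_t demo_id_hdr_size(uint64_t sample_type)
{
	uint16_t size = 0;

	if (sample_type & DEMO_SAMPLE_TID)
		size += sizeof(uint64_t);	/* pid and tid share one u64 */
	if (sample_type & DEMO_SAMPLE_TIME)
		size += sizeof(uint64_t);
	if (sample_type & DEMO_SAMPLE_ID)
		size += sizeof(uint64_t);
	if (sample_type & DEMO_SAMPLE_STREAM_ID)
		size += sizeof(uint64_t);
	if (sample_type & DEMO_SAMPLE_CPU)
		size += sizeof(uint64_t);	/* cpu plus a reserved u32 */
	if (sample_type & DEMO_SAMPLE_IDENTIFIER)
		size += sizeof(uint64_t);

	return size;
}

/* Size of the event once the old ID sample is cut off the end. */
uint16_t demo_size_without_id_sample(uint16_t event_size, uint64_t sample_type)
{
	uint16_t id_size = demo_id_hdr_size(sample_type);

	assert(event_size >= id_size);
	return (uint16_t)(event_size - id_size);
}
```

For example, with only TID and CPU selected the ID sample is 16 bytes, so a
64-byte event shrinks to 48 bytes before a new ID sample is appended.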

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/evlist.c | 28 +---------------------------
 tools/perf/util/evsel.c  | 26 ++++++++++++++++++++++++++
 tools/perf/util/evsel.h  |  2 ++
 3 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 48af7d379d82..03fbe151b0c4 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1244,34 +1244,8 @@ bool evlist__valid_read_format(struct evlist *evlist)
 u16 evlist__id_hdr_size(struct evlist *evlist)
 {
 	struct evsel *first = evlist__first(evlist);
-	struct perf_sample *data;
-	u64 sample_type;
-	u16 size = 0;
 
-	if (!first->core.attr.sample_id_all)
-		goto out;
-
-	sample_type = first->core.attr.sample_type;
-
-	if (sample_type & PERF_SAMPLE_TID)
-		size += sizeof(data->tid) * 2;
-
-       if (sample_type & PERF_SAMPLE_TIME)
-		size += sizeof(data->time);
-
-	if (sample_type & PERF_SAMPLE_ID)
-		size += sizeof(data->id);
-
-	if (sample_type & PERF_SAMPLE_STREAM_ID)
-		size += sizeof(data->stream_id);
-
-	if (sample_type & PERF_SAMPLE_CPU)
-		size += sizeof(data->cpu) * 2;
-
-	if (sample_type & PERF_SAMPLE_IDENTIFIER)
-		size += sizeof(data->id);
-out:
-	return size;
+	return first->core.attr.sample_id_all ? evsel__id_hdr_size(first) : 0;
 }
 
 bool evlist__valid_sample_id_all(struct evlist *evlist)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index a67cc3f2fa74..9a30ccb7b104 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2724,6 +2724,32 @@ int evsel__parse_sample_timestamp(struct evsel *evsel, union perf_event *event,
 	return 0;
 }
 
+u16 evsel__id_hdr_size(struct evsel *evsel)
+{
+	u64 sample_type = evsel->core.attr.sample_type;
+	u16 size = 0;
+
+	if (sample_type & PERF_SAMPLE_TID)
+		size += sizeof(u64);
+
+	if (sample_type & PERF_SAMPLE_TIME)
+		size += sizeof(u64);
+
+	if (sample_type & PERF_SAMPLE_ID)
+		size += sizeof(u64);
+
+	if (sample_type & PERF_SAMPLE_STREAM_ID)
+		size += sizeof(u64);
+
+	if (sample_type & PERF_SAMPLE_CPU)
+		size += sizeof(u64);
+
+	if (sample_type & PERF_SAMPLE_IDENTIFIER)
+		size += sizeof(u64);
+
+	return size;
+}
+
 struct tep_format_field *evsel__field(struct evsel *evsel, const char *name)
 {
 	return tep_find_field(evsel->tp_format, name);
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 92bed8e2f7d8..699448f2bc2b 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -381,6 +381,8 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 int evsel__parse_sample_timestamp(struct evsel *evsel, union perf_event *event,
 				  u64 *timestamp);
 
+u16 evsel__id_hdr_size(struct evsel *evsel);
+
 static inline struct evsel *evsel__next(struct evsel *evsel)
 {
 	return list_entry(evsel->core.node.next, struct evsel, core.node);
-- 
2.25.1



* [PATCH 06/35] perf tools: Add perf_event__synthesize_id_sample()
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (4 preceding siblings ...)
  2022-07-11  9:31 ` [PATCH 05/35] perf tools: Factor out evsel__id_hdr_size() Adrian Hunter
@ 2022-07-11  9:31 ` Adrian Hunter
  2022-07-19 17:10   ` Ian Rogers
  2022-07-11  9:31 ` [PATCH 07/35] perf script: Add --dump-unsorted-raw-trace option Adrian Hunter
                   ` (29 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:31 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Add perf_event__synthesize_id_sample() to enable the synthesis of
ID samples.

This is needed by perf inject. When injecting events from a guest perf.data
file, the guest sample ID numbers may conflict with those already in the
host perf.data file. In that case, perf_event__synthesize_id_sample() can be
used to re-write the ID sample.
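
A reduced sketch of re-writing an ID sample with a new ID follows. It
handles only three of the six fields, and struct sketch_sample, the
SKETCH_SAMPLE_* constants and sketch_synthesize_id_sample() are hypothetical
names; the constants mirror the corresponding PERF_SAMPLE_* bit values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced set of the fields an ID sample can carry. */
struct sketch_sample {
	uint32_t pid, tid;
	uint64_t time;
	uint64_t id;
};

/* Bit values mirror PERF_SAMPLE_TID, PERF_SAMPLE_TIME and PERF_SAMPLE_ID. */
enum {
	SKETCH_SAMPLE_TID  = 1U << 1,
	SKETCH_SAMPLE_TIME = 1U << 2,
	SKETCH_SAMPLE_ID   = 1U << 6,
};

/* Write the selected fields to array, one u64 each; return bytes written. */
size_t sketch_synthesize_id_sample(uint64_t *array, uint64_t type,
				   const struct sketch_sample *sample)
{
	uint64_t *start = array;
	union { uint64_t val64; uint32_t val32[2]; } u;

	assert(array && sample);

	if (type & SKETCH_SAMPLE_TID) {
		/* pid and tid are packed into a single u64 */
		u.val32[0] = sample->pid;
		u.val32[1] = sample->tid;
		*array++ = u.val64;
	}
	if (type & SKETCH_SAMPLE_TIME)
		*array++ = sample->time;
	if (type & SKETCH_SAMPLE_ID)
		*array++ = sample->id;

	return (size_t)(array - start) * sizeof(*array);
}
```

To resolve a conflict, a caller would parse the old ID sample, replace the
id field with a value that is unused on the host, and call the function
again to write the new ID sample at the end of the event.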

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/synthetic-events.c | 47 ++++++++++++++++++++++++++++++
 tools/perf/util/synthetic-events.h |  1 +
 2 files changed, 48 insertions(+)

diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index fe5db4bf0042..ed9623702f34 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1712,6 +1712,53 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
 	return 0;
 }
 
+int perf_event__synthesize_id_sample(__u64 *array, u64 type, const struct perf_sample *sample)
+{
+	__u64 *start = array;
+
+	/*
+	 * used for cross-endian analysis. See git commit 65014ab3
+	 * for why this goofiness is needed.
+	 */
+	union u64_swap u;
+
+	if (type & PERF_SAMPLE_TID) {
+		u.val32[0] = sample->pid;
+		u.val32[1] = sample->tid;
+		*array = u.val64;
+		array++;
+	}
+
+	if (type & PERF_SAMPLE_TIME) {
+		*array = sample->time;
+		array++;
+	}
+
+	if (type & PERF_SAMPLE_ID) {
+		*array = sample->id;
+		array++;
+	}
+
+	if (type & PERF_SAMPLE_STREAM_ID) {
+		*array = sample->stream_id;
+		array++;
+	}
+
+	if (type & PERF_SAMPLE_CPU) {
+		u.val32[0] = sample->cpu;
+		u.val32[1] = 0;
+		*array = u.val64;
+		array++;
+	}
+
+	if (type & PERF_SAMPLE_IDENTIFIER) {
+		*array = sample->id;
+		array++;
+	}
+
+	return (void *)array - (void *)start;
+}
+
 int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_t process,
 				    struct evlist *evlist, struct machine *machine)
 {
diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
index 78a0450db164..b136ec3ec95d 100644
--- a/tools/perf/util/synthetic-events.h
+++ b/tools/perf/util/synthetic-events.h
@@ -55,6 +55,7 @@ int perf_event__synthesize_extra_attr(struct perf_tool *tool, struct evlist *evs
 int perf_event__synthesize_extra_kmaps(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_features(struct perf_tool *tool, struct perf_session *session, struct evlist *evlist, perf_event__handler_t process);
 int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_t process, struct evlist *evlist, struct machine *machine);
+int perf_event__synthesize_id_sample(__u64 *array, u64 type, const struct perf_sample *sample);
 int perf_event__synthesize_kernel_mmap(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_mmap_events(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine, bool mmap_data);
 int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
-- 
2.25.1



* [PATCH 07/35] perf script: Add --dump-unsorted-raw-trace option
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (5 preceding siblings ...)
  2022-07-11  9:31 ` [PATCH 06/35] perf tools: Add perf_event__synthesize_id_sample() Adrian Hunter
@ 2022-07-11  9:31 ` Adrian Hunter
  2022-07-19 17:11   ` Ian Rogers
  2022-07-11  9:31 ` [PATCH 08/35] perf buildid-cache: Add guestmount'd files to the build ID cache Adrian Hunter
                   ` (28 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:31 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

When reviewing the results of perf inject, it is useful to be able to see
the events in the order they appear in the file.

So add a --dump-unsorted-raw-trace option that dumps the events unsorted,
i.e. in file order.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-script.txt | 3 +++
 tools/perf/builtin-script.c              | 8 ++++++++
 2 files changed, 11 insertions(+)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 1a557ff8f210..e250ff5566cf 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -79,6 +79,9 @@ OPTIONS
 --dump-raw-trace=::
         Display verbose dump of the trace data.
 
+--dump-unsorted-raw-trace=::
+        Same as --dump-raw-trace but not sorted in time order.
+
 -L::
 --Latency=::
         Show latency attributes (irqs/preemption disabled, etc).
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 7cf21ab16f4f..4b00a50faf00 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -3746,6 +3746,7 @@ int cmd_script(int argc, const char **argv)
 	bool header = false;
 	bool header_only = false;
 	bool script_started = false;
+	bool unsorted_dump = false;
 	char *rec_script_path = NULL;
 	char *rep_script_path = NULL;
 	struct perf_session *session;
@@ -3794,6 +3795,8 @@ int cmd_script(int argc, const char **argv)
 	const struct option options[] = {
 	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
 		    "dump raw trace in ASCII"),
+	OPT_BOOLEAN(0, "dump-unsorted-raw-trace", &unsorted_dump,
+		    "dump unsorted raw trace in ASCII"),
 	OPT_INCR('v', "verbose", &verbose,
 		 "be more verbose (show symbol address, etc)"),
 	OPT_BOOLEAN('L', "Latency", &latency_format,
@@ -3956,6 +3959,11 @@ int cmd_script(int argc, const char **argv)
 	data.path  = input_name;
 	data.force = symbol_conf.force;
 
+	if (unsorted_dump) {
+		dump_trace = true;
+		script.tool.ordered_events = false;
+	}
+
 	if (symbol__validate_sym_arguments())
 		return -1;
 
-- 
2.25.1



* [PATCH 08/35] perf buildid-cache: Add guestmount'd files to the build ID cache
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (6 preceding siblings ...)
  2022-07-11  9:31 ` [PATCH 07/35] perf script: Add --dump-unsorted-raw-trace option Adrian Hunter
@ 2022-07-11  9:31 ` Adrian Hunter
  2022-07-19 17:41   ` Ian Rogers
  2022-07-11  9:31 ` [PATCH 09/35] perf buildid-cache: Do not require purge files to also be in the file system Adrian Hunter
                   ` (27 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:31 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

When the guestmount option is used, a guest machine's file system mount
point is recorded in machine->root_dir.

perf already iterates guest machines when adding files to the build ID
cache, but does not take machine->root_dir into account.

Use machine->root_dir to find files for guest build IDs, and add them to
the build ID cache using the "proper" name, i.e. relative to the guest root
directory, not the host root directory.
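
The distinction between the two names can be sketched as follows.
demo_guest_host_path() is a hypothetical helper, not part of perf: it builds
the host-side path under the guestmount point, which is where the file is
read from, while the guest-relative name stays the "proper" name used for
the cache entry. Unlike the patch, this sketch strips leading slashes from
the guest name to avoid a doubled separator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the host-side path of a guest file under the guestmount point.
 * Returns 0 on success, -1 if the result would not fit in buf.
 */
int demo_guest_host_path(char *buf, size_t size,
			 const char *root_dir, const char *guest_name)
{
	int n;

	assert(buf && root_dir && guest_name);

	/* Skip leading slashes so the join does not produce "//". */
	guest_name += strspn(guest_name, "/");

	n = snprintf(buf, size, "%s/%s", root_dir, guest_name);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Given a root directory of /mnt/guest and a guest file /usr/bin/ls, the file
is read from /mnt/guest/usr/bin/ls on the host, but it is cached under the
guest name /usr/bin/ls.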

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/build-id.c | 67 +++++++++++++++++++++++++++++---------
 tools/perf/util/build-id.h | 16 ++++++---
 2 files changed, 63 insertions(+), 20 deletions(-)

diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index 4c9093b64d1f..7c9f441936ee 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -625,9 +625,12 @@ static int build_id_cache__add_sdt_cache(const char *sbuild_id,
 #endif
 
 static char *build_id_cache__find_debug(const char *sbuild_id,
-					struct nsinfo *nsi)
+					struct nsinfo *nsi,
+					const char *root_dir)
 {
+	const char *dirname = "/usr/lib/debug/.build-id/";
 	char *realname = NULL;
+	char dirbuf[PATH_MAX];
 	char *debugfile;
 	struct nscookie nsc;
 	size_t len = 0;
@@ -636,8 +639,12 @@ static char *build_id_cache__find_debug(const char *sbuild_id,
 	if (!debugfile)
 		goto out;
 
-	len = __symbol__join_symfs(debugfile, PATH_MAX,
-				   "/usr/lib/debug/.build-id/");
+	if (root_dir) {
+		path__join(dirbuf, PATH_MAX, root_dir, dirname);
+		dirname = dirbuf;
+	}
+
+	len = __symbol__join_symfs(debugfile, PATH_MAX, dirname);
 	snprintf(debugfile + len, PATH_MAX - len, "%.2s/%s.debug", sbuild_id,
 		 sbuild_id + 2);
 
@@ -668,14 +675,18 @@ static char *build_id_cache__find_debug(const char *sbuild_id,
 
 int
 build_id_cache__add(const char *sbuild_id, const char *name, const char *realname,
-		    struct nsinfo *nsi, bool is_kallsyms, bool is_vdso)
+		    struct nsinfo *nsi, bool is_kallsyms, bool is_vdso,
+		    const char *proper_name, const char *root_dir)
 {
 	const size_t size = PATH_MAX;
 	char *filename = NULL, *dir_name = NULL, *linkname = zalloc(size), *tmp;
 	char *debugfile = NULL;
 	int err = -1;
 
-	dir_name = build_id_cache__cachedir(sbuild_id, name, nsi, is_kallsyms,
+	if (!proper_name)
+		proper_name = name;
+
+	dir_name = build_id_cache__cachedir(sbuild_id, proper_name, nsi, is_kallsyms,
 					    is_vdso);
 	if (!dir_name)
 		goto out_free;
@@ -715,7 +726,7 @@ build_id_cache__add(const char *sbuild_id, const char *name, const char *realnam
 	 */
 	if (!is_kallsyms && !is_vdso &&
 	    strncmp(".ko", name + strlen(name) - 3, 3)) {
-		debugfile = build_id_cache__find_debug(sbuild_id, nsi);
+		debugfile = build_id_cache__find_debug(sbuild_id, nsi, root_dir);
 		if (debugfile) {
 			zfree(&filename);
 			if (asprintf(&filename, "%s/%s", dir_name,
@@ -781,8 +792,9 @@ build_id_cache__add(const char *sbuild_id, const char *name, const char *realnam
 	return err;
 }
 
-int build_id_cache__add_s(const char *sbuild_id, const char *name,
-			  struct nsinfo *nsi, bool is_kallsyms, bool is_vdso)
+int __build_id_cache__add_s(const char *sbuild_id, const char *name,
+			    struct nsinfo *nsi, bool is_kallsyms, bool is_vdso,
+			    const char *proper_name, const char *root_dir)
 {
 	char *realname = NULL;
 	int err = -1;
@@ -796,8 +808,8 @@ int build_id_cache__add_s(const char *sbuild_id, const char *name,
 			goto out_free;
 	}
 
-	err = build_id_cache__add(sbuild_id, name, realname, nsi, is_kallsyms, is_vdso);
-
+	err = build_id_cache__add(sbuild_id, name, realname, nsi,
+				  is_kallsyms, is_vdso, proper_name, root_dir);
 out_free:
 	if (!is_kallsyms)
 		free(realname);
@@ -806,14 +818,16 @@ int build_id_cache__add_s(const char *sbuild_id, const char *name,
 
 static int build_id_cache__add_b(const struct build_id *bid,
 				 const char *name, struct nsinfo *nsi,
-				 bool is_kallsyms, bool is_vdso)
+				 bool is_kallsyms, bool is_vdso,
+				 const char *proper_name,
+				 const char *root_dir)
 {
 	char sbuild_id[SBUILD_ID_SIZE];
 
 	build_id__sprintf(bid, sbuild_id);
 
-	return build_id_cache__add_s(sbuild_id, name, nsi, is_kallsyms,
-				     is_vdso);
+	return __build_id_cache__add_s(sbuild_id, name, nsi, is_kallsyms,
+				       is_vdso, proper_name, root_dir);
 }
 
 bool build_id_cache__cached(const char *sbuild_id)
@@ -896,6 +910,10 @@ static int dso__cache_build_id(struct dso *dso, struct machine *machine,
 	bool is_kallsyms = dso__is_kallsyms(dso);
 	bool is_vdso = dso__is_vdso(dso);
 	const char *name = dso->long_name;
+	const char *proper_name = NULL;
+	const char *root_dir = NULL;
+	char *allocated_name = NULL;
+	int ret = 0;
 
 	if (!dso->has_build_id)
 		return 0;
@@ -905,11 +923,28 @@ static int dso__cache_build_id(struct dso *dso, struct machine *machine,
 		name = machine->mmap_name;
 	}
 
+	if (!machine__is_host(machine)) {
+		if (*machine->root_dir) {
+			root_dir = machine->root_dir;
+			ret = asprintf(&allocated_name, "%s/%s", root_dir, name);
+			if (ret < 0)
+				return ret;
+			proper_name = name;
+			name = allocated_name;
+		} else if (is_kallsyms) {
+			/* Cannot get guest kallsyms */
+			return 0;
+		}
+	}
+
 	if (!is_kallsyms && dso__build_id_mismatch(dso, name))
-		return 0;
+		goto out_free;
 
-	return build_id_cache__add_b(&dso->bid, name, dso->nsinfo,
-				     is_kallsyms, is_vdso);
+	ret = build_id_cache__add_b(&dso->bid, name, dso->nsinfo,
+				    is_kallsyms, is_vdso, proper_name, root_dir);
+out_free:
+	free(allocated_name);
+	return ret;
 }
 
 static int
diff --git a/tools/perf/util/build-id.h b/tools/perf/util/build-id.h
index c19617151670..4e3a1169379b 100644
--- a/tools/perf/util/build-id.h
+++ b/tools/perf/util/build-id.h
@@ -66,10 +66,18 @@ int build_id_cache__list_build_ids(const char *pathname, struct nsinfo *nsi,
 				   struct strlist **result);
 bool build_id_cache__cached(const char *sbuild_id);
 int build_id_cache__add(const char *sbuild_id, const char *name, const char *realname,
-			struct nsinfo *nsi, bool is_kallsyms, bool is_vdso);
-int build_id_cache__add_s(const char *sbuild_id,
-			  const char *name, struct nsinfo *nsi,
-			  bool is_kallsyms, bool is_vdso);
+			struct nsinfo *nsi, bool is_kallsyms, bool is_vdso,
+			const char *proper_name, const char *root_dir);
+int __build_id_cache__add_s(const char *sbuild_id,
+			    const char *name, struct nsinfo *nsi,
+			    bool is_kallsyms, bool is_vdso,
+			    const char *proper_name, const char *root_dir);
+static inline int build_id_cache__add_s(const char *sbuild_id,
+					const char *name, struct nsinfo *nsi,
+					bool is_kallsyms, bool is_vdso)
+{
+	return __build_id_cache__add_s(sbuild_id, name, nsi, is_kallsyms, is_vdso, NULL, NULL);
+}
 int build_id_cache__remove_s(const char *sbuild_id);
 
 extern char buildid_dir[];
-- 
2.25.1



* [PATCH 09/35] perf buildid-cache: Do not require purge files to also be in the file system
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (7 preceding siblings ...)
  2022-07-11  9:31 ` [PATCH 08/35] perf buildid-cache: Add guestmount'd files to the build ID cache Adrian Hunter
@ 2022-07-11  9:31 ` Adrian Hunter
  2022-07-19 17:44   ` Ian Rogers
  2022-07-11  9:31 ` [PATCH 10/35] perf tools: Add machine_pid and vcpu to id_index Adrian Hunter
                   ` (26 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:31 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

nsinfo__realpath() returns NULL if the file is not in the file system, but
the file can still be removed from the build ID cache in that case, so
continue and attempt the purge with the name provided.
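
For illustration only, the fallback can be sketched outside of perf as
below. This is a minimal sketch, not the perf implementation: the
resolve_fn callback stands in for nsinfo__realpath(), and
cache_entry_path() and never_resolves() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical resolver type: returns a malloc'd path, or NULL on failure */
typedef char *(*resolve_fn)(const char *name);

/* Stub resolver that behaves as if the file has been deleted */
static char *never_resolves(const char *name)
{
	(void)name;
	return NULL;
}

/*
 * Build "<cache_dir><path>/<sbuild_id>". If the resolver cannot find the
 * file, fall back to the name the caller provided instead of giving up.
 */
static char *cache_entry_path(const char *cache_dir, const char *name,
			      const char *sbuild_id, resolve_fn resolve)
{
	char *resolved, *filename;
	const char *path;
	size_t len;

	assert(cache_dir && name && sbuild_id && resolve);

	resolved = resolve(name);
	path = resolved ? resolved : name;	/* the fallback */

	len = strlen(cache_dir) + strlen(path) + 1 + strlen(sbuild_id) + 1;
	filename = malloc(len);
	if (filename)
		snprintf(filename, len, "%s%s/%s", cache_dir, path, sbuild_id);

	free(resolved);
	return filename;
}
```

With never_resolves() as the resolver, a name such as "/gone/lib.so"
still yields a usable cache path, which is what the purge case needs.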

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/build-id.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index 7c9f441936ee..9e176146eb10 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -561,14 +561,11 @@ char *build_id_cache__cachedir(const char *sbuild_id, const char *name,
 	char *realname = (char *)name, *filename;
 	bool slash = is_kallsyms || is_vdso;
 
-	if (!slash) {
+	if (!slash)
 		realname = nsinfo__realpath(name, nsi);
-		if (!realname)
-			return NULL;
-	}
 
 	if (asprintf(&filename, "%s%s%s%s%s", buildid_dir, slash ? "/" : "",
-		     is_vdso ? DSO__NAME_VDSO : realname,
+		     is_vdso ? DSO__NAME_VDSO : (realname ? realname : name),
 		     sbuild_id ? "/" : "", sbuild_id ?: "") < 0)
 		filename = NULL;
 
-- 
2.25.1



* [PATCH 10/35] perf tools: Add machine_pid and vcpu to id_index
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (8 preceding siblings ...)
  2022-07-11  9:31 ` [PATCH 09/35] perf buildid-cache: Do not require purge files to also be in the file system Adrian Hunter
@ 2022-07-11  9:31 ` Adrian Hunter
  2022-07-19 17:48   ` Ian Rogers
  2022-07-11  9:31 ` [PATCH 11/35] perf session: Create guest machines from id_index Adrian Hunter
                   ` (25 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:31 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

When injecting events from a guest perf.data file, the events will have
separate sample ID numbers. These ID numbers can then be used to determine
which machine an event belongs to. To facilitate that, add machine_pid and
vcpu to id_index records. For backward compatibility, these are added at
the end of the record, and the length of the record is used to determine
if they are present or not.

Note, this is needed because the events from a guest perf.data file contain
the pid/tid of the process running at that time inside the VM, not the
pid/tid of the (QEMU) hypervisor thread. So a way is needed to relate
guest events back to the guest machine and VCPU, and using sample ID
numbers for that is relatively simple and convenient.
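
As a rough illustration of the length-based detection described above,
the following standalone sketch locates the optional trailing entries in
a record payload. The names entry_v1, entry_v2 and locate_v2() are
hypothetical stand-ins for struct id_index_entry, struct id_index_entry_2
and the logic in perf_event__process_id_index(); it is not the perf code
itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the original and the appended entry formats */
struct entry_v1 { uint64_t id, idx, cpu, tid; };
struct entry_v2 { uint64_t machine_pid, vcpu; };

/*
 * payload points just past the record header and holds nr v1 entries,
 * optionally followed by nr v2 entries. On success return 0 and set *e2
 * to the first v2 entry, or to NULL if the record is in the old format.
 * Return -1 if the payload is too small for nr v1 entries.
 */
static int locate_v2(const void *payload, size_t payload_sz, size_t nr,
		     const struct entry_v2 **e2)
{
	const size_t e1_sz = sizeof(struct entry_v1);
	const size_t etot_sz = e1_sz + sizeof(struct entry_v2);

	assert(payload && e2);
	*e2 = NULL;

	if (nr > payload_sz / e1_sz)
		return -1;

	/* Dividing avoids overflow in nr * etot_sz for untrusted nr */
	if (payload_sz / etot_sz >= nr)
		*e2 = (const struct entry_v2 *)((const char *)payload + nr * e1_sz);

	return 0;
}
```

A reader that predates the new members sees only the first nr entries
and never touches the trailing array, which is why appending works
without a format version bump.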

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/lib/perf/include/internal/evsel.h |  4 ++
 tools/lib/perf/include/perf/event.h     |  5 +++
 tools/perf/util/session.c               | 40 ++++++++++++++++---
 tools/perf/util/synthetic-events.c      | 51 +++++++++++++++++++------
 tools/perf/util/synthetic-events.h      |  1 +
 5 files changed, 84 insertions(+), 17 deletions(-)

diff --git a/tools/lib/perf/include/internal/evsel.h b/tools/lib/perf/include/internal/evsel.h
index 2a912a1f1989..a99a75d9e78f 100644
--- a/tools/lib/perf/include/internal/evsel.h
+++ b/tools/lib/perf/include/internal/evsel.h
@@ -30,6 +30,10 @@ struct perf_sample_id {
 	struct perf_cpu		 cpu;
 	pid_t			 tid;
 
+	/* Guest machine pid and VCPU, valid only if machine_pid is non-zero */
+	pid_t			 machine_pid;
+	struct perf_cpu		 vcpu;
+
 	/* Holds total ID period value for PERF_SAMPLE_READ processing. */
 	u64			 period;
 };
diff --git a/tools/lib/perf/include/perf/event.h b/tools/lib/perf/include/perf/event.h
index 9f7ca070da87..c2dbd3e88885 100644
--- a/tools/lib/perf/include/perf/event.h
+++ b/tools/lib/perf/include/perf/event.h
@@ -237,6 +237,11 @@ struct id_index_entry {
 	__u64			 tid;
 };
 
+struct id_index_entry_2 {
+	__u64			 machine_pid;
+	__u64			 vcpu;
+};
+
 struct perf_record_id_index {
 	struct perf_event_header header;
 	__u64			 nr;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 4c9513bc6d89..5141fe164e97 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -2756,18 +2756,35 @@ int perf_event__process_id_index(struct perf_session *session,
 {
 	struct evlist *evlist = session->evlist;
 	struct perf_record_id_index *ie = &event->id_index;
+	size_t sz = ie->header.size - sizeof(*ie);
 	size_t i, nr, max_nr;
+	size_t e1_sz = sizeof(struct id_index_entry);
+	size_t e2_sz = sizeof(struct id_index_entry_2);
+	size_t etot_sz = e1_sz + e2_sz;
+	struct id_index_entry_2 *e2;
 
-	max_nr = (ie->header.size - sizeof(struct perf_record_id_index)) /
-		 sizeof(struct id_index_entry);
+	max_nr = sz / e1_sz;
 	nr = ie->nr;
-	if (nr > max_nr)
+	if (nr > max_nr) {
+		printf("Too big: nr %zu max_nr %zu\n", nr, max_nr);
 		return -EINVAL;
+	}
+
+	if (sz >= nr * etot_sz) {
+		max_nr = sz / etot_sz;
+		if (nr > max_nr) {
+			printf("Too big2: nr %zu max_nr %zu\n", nr, max_nr);
+			return -EINVAL;
+		}
+		e2 = (void *)ie + sizeof(*ie) + nr * e1_sz;
+	} else {
+		e2 = NULL;
+	}
 
 	if (dump_trace)
 		fprintf(stdout, " nr: %zu\n", nr);
 
-	for (i = 0; i < nr; i++) {
+	for (i = 0; i < nr; i++, (e2 ? e2++ : 0)) {
 		struct id_index_entry *e = &ie->entries[i];
 		struct perf_sample_id *sid;
 
@@ -2775,15 +2792,28 @@ int perf_event__process_id_index(struct perf_session *session,
 			fprintf(stdout,	" ... id: %"PRI_lu64, e->id);
 			fprintf(stdout,	"  idx: %"PRI_lu64, e->idx);
 			fprintf(stdout,	"  cpu: %"PRI_ld64, e->cpu);
-			fprintf(stdout,	"  tid: %"PRI_ld64"\n", e->tid);
+			fprintf(stdout, "  tid: %"PRI_ld64, e->tid);
+			if (e2) {
+				fprintf(stdout, "  machine_pid: %"PRI_ld64, e2->machine_pid);
+				fprintf(stdout, "  vcpu: %"PRI_lu64"\n", e2->vcpu);
+			} else {
+				fprintf(stdout, "\n");
+			}
 		}
 
 		sid = evlist__id2sid(evlist, e->id);
 		if (!sid)
 			return -ENOENT;
+
 		sid->idx = e->idx;
 		sid->cpu.cpu = e->cpu;
 		sid->tid = e->tid;
+
+		if (!e2)
+			continue;
+
+		sid->machine_pid = e2->machine_pid;
+		sid->vcpu.cpu = e2->vcpu;
 	}
 	return 0;
 }
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index ed9623702f34..2ae59c03ae77 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1759,19 +1759,26 @@ int perf_event__synthesize_id_sample(__u64 *array, u64 type, const struct perf_s
 	return (void *)array - (void *)start;
 }
 
-int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_t process,
-				    struct evlist *evlist, struct machine *machine)
+int __perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_t process,
+				      struct evlist *evlist, struct machine *machine, size_t from)
 {
 	union perf_event *ev;
 	struct evsel *evsel;
-	size_t nr = 0, i = 0, sz, max_nr, n;
+	size_t nr = 0, i = 0, sz, max_nr, n, pos;
+	size_t e1_sz = sizeof(struct id_index_entry);
+	size_t e2_sz = sizeof(struct id_index_entry_2);
+	size_t etot_sz = e1_sz + e2_sz;
+	bool e2_needed = false;
 	int err;
 
-	max_nr = (UINT16_MAX - sizeof(struct perf_record_id_index)) /
-		 sizeof(struct id_index_entry);
+	max_nr = (UINT16_MAX - sizeof(struct perf_record_id_index)) / etot_sz;
 
-	evlist__for_each_entry(evlist, evsel)
+	pos = 0;
+	evlist__for_each_entry(evlist, evsel) {
+		if (pos++ < from)
+			continue;
 		nr += evsel->core.ids;
+	}
 
 	if (!nr)
 		return 0;
@@ -1779,31 +1786,38 @@ int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_
 	pr_debug2("Synthesizing id index\n");
 
 	n = nr > max_nr ? max_nr : nr;
-	sz = sizeof(struct perf_record_id_index) + n * sizeof(struct id_index_entry);
+	sz = sizeof(struct perf_record_id_index) + n * etot_sz;
 	ev = zalloc(sz);
 	if (!ev)
 		return -ENOMEM;
 
+	sz = sizeof(struct perf_record_id_index) + n * e1_sz;
+
 	ev->id_index.header.type = PERF_RECORD_ID_INDEX;
-	ev->id_index.header.size = sz;
 	ev->id_index.nr = n;
 
+	pos = 0;
 	evlist__for_each_entry(evlist, evsel) {
 		u32 j;
 
-		for (j = 0; j < evsel->core.ids; j++) {
+		if (pos++ < from)
+			continue;
+		for (j = 0; j < evsel->core.ids; j++, i++) {
 			struct id_index_entry *e;
+			struct id_index_entry_2 *e2;
 			struct perf_sample_id *sid;
 
 			if (i >= n) {
+				ev->id_index.header.size = sz + (e2_needed ? n * e2_sz : 0);
 				err = process(tool, ev, NULL, machine);
 				if (err)
 					goto out_err;
 				nr -= n;
 				i = 0;
+				e2_needed = false;
 			}
 
-			e = &ev->id_index.entries[i++];
+			e = &ev->id_index.entries[i];
 
 			e->id = evsel->core.id[j];
 
@@ -1816,11 +1830,18 @@ int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_
 			e->idx = sid->idx;
 			e->cpu = sid->cpu.cpu;
 			e->tid = sid->tid;
+
+			if (sid->machine_pid)
+				e2_needed = true;
+
+			e2 = (void *)ev + sz;
+			e2[i].machine_pid = sid->machine_pid;
+			e2[i].vcpu        = sid->vcpu.cpu;
 		}
 	}
 
-	sz = sizeof(struct perf_record_id_index) + nr * sizeof(struct id_index_entry);
-	ev->id_index.header.size = sz;
+	sz = sizeof(struct perf_record_id_index) + nr * e1_sz;
+	ev->id_index.header.size = sz + (e2_needed ? nr * e2_sz : 0);
 	ev->id_index.nr = nr;
 
 	err = process(tool, ev, NULL, machine);
@@ -1830,6 +1851,12 @@ int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_
 	return err;
 }
 
+int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_t process,
+				    struct evlist *evlist, struct machine *machine)
+{
+	return __perf_event__synthesize_id_index(tool, process, evlist, machine, 0);
+}
+
 int __machine__synthesize_threads(struct machine *machine, struct perf_tool *tool,
 				  struct target *target, struct perf_thread_map *threads,
 				  perf_event__handler_t process, bool needs_mmap,
diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
index b136ec3ec95d..81cb3d6af0b9 100644
--- a/tools/perf/util/synthetic-events.h
+++ b/tools/perf/util/synthetic-events.h
@@ -55,6 +55,7 @@ int perf_event__synthesize_extra_attr(struct perf_tool *tool, struct evlist *evs
 int perf_event__synthesize_extra_kmaps(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_features(struct perf_tool *tool, struct perf_session *session, struct evlist *evlist, perf_event__handler_t process);
 int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_t process, struct evlist *evlist, struct machine *machine);
+int __perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_t process, struct evlist *evlist, struct machine *machine, size_t from);
 int perf_event__synthesize_id_sample(__u64 *array, u64 type, const struct perf_sample *sample);
 int perf_event__synthesize_kernel_mmap(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_mmap_events(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine, bool mmap_data);
-- 
2.25.1



* [PATCH 11/35] perf session: Create guest machines from id_index
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (9 preceding siblings ...)
  2022-07-11  9:31 ` [PATCH 10/35] perf tools: Add machine_pid and vcpu to id_index Adrian Hunter
@ 2022-07-11  9:31 ` Adrian Hunter
  2022-07-19 17:51   ` Ian Rogers
  2022-07-11  9:31 ` [PATCH 12/35] perf tools: Add guest_cpu to hypervisor threads Adrian Hunter
                   ` (24 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:31 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Now that id_index has machine_pid, use it to create guest machines.
Create the guest machines with an idle thread because guest events
for "swapper" will be possible.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/session.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 5141fe164e97..1af981d5ad3c 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -2751,6 +2751,24 @@ void perf_session__fprintf_info(struct perf_session *session, FILE *fp,
 	fprintf(fp, "# ========\n#\n");
 }
 
+static int perf_session__register_guest(struct perf_session *session, pid_t machine_pid)
+{
+	struct machine *machine = machines__findnew(&session->machines, machine_pid);
+	struct thread *thread;
+
+	if (!machine)
+		return -ENOMEM;
+
+	machine->single_address_space = session->machines.host.single_address_space;
+
+	thread = machine__idle_thread(machine);
+	if (!thread)
+		return -ENOMEM;
+	thread__put(thread);
+
+	return 0;
+}
+
 int perf_event__process_id_index(struct perf_session *session,
 				 union perf_event *event)
 {
@@ -2762,6 +2780,7 @@ int perf_event__process_id_index(struct perf_session *session,
 	size_t e2_sz = sizeof(struct id_index_entry_2);
 	size_t etot_sz = e1_sz + e2_sz;
 	struct id_index_entry_2 *e2;
+	pid_t last_pid = 0;
 
 	max_nr = sz / e1_sz;
 	nr = ie->nr;
@@ -2787,6 +2806,7 @@ int perf_event__process_id_index(struct perf_session *session,
 	for (i = 0; i < nr; i++, (e2 ? e2++ : 0)) {
 		struct id_index_entry *e = &ie->entries[i];
 		struct perf_sample_id *sid;
+		int ret;
 
 		if (dump_trace) {
 			fprintf(stdout,	" ... id: %"PRI_lu64, e->id);
@@ -2814,6 +2834,17 @@ int perf_event__process_id_index(struct perf_session *session,
 
 		sid->machine_pid = e2->machine_pid;
 		sid->vcpu.cpu = e2->vcpu;
+
+		if (!sid->machine_pid)
+			continue;
+
+		if (sid->machine_pid != last_pid) {
+			ret = perf_session__register_guest(session, sid->machine_pid);
+			if (ret)
+				return ret;
+			last_pid = sid->machine_pid;
+			perf_guest = true;
+		}
 	}
 	return 0;
 }
-- 
2.25.1



* [PATCH 12/35] perf tools: Add guest_cpu to hypervisor threads
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (10 preceding siblings ...)
  2022-07-11  9:31 ` [PATCH 11/35] perf session: Create guest machines from id_index Adrian Hunter
@ 2022-07-11  9:31 ` Adrian Hunter
  2022-07-20  0:23   ` Ian Rogers
  2022-07-11  9:31 ` [PATCH 13/35] perf tools: Add machine_pid and vcpu to perf_sample Adrian Hunter
                   ` (23 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:31 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

It is possible to know which guest machine was running at a point in time
based on the PID of the currently running host thread. That is, perf
identifies guest machines by the PID of the hypervisor.

To also be able to determine the guest CPU, record it on the hypervisor
(QEMU) thread that runs that VCPU.

This is done when processing the id_index which provides the necessary
information.
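
A simplified picture of the idea, assuming a flat array of threads
instead of perf's thread lookup: host_thread, host_thread_init() and
set_guest_cpu() are hypothetical names used only for this sketch. Unlike
machine__findnew_thread(), the sketch does not create missing threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct thread */
struct host_thread {
	int pid;
	int tid;
	int guest_cpu;	/* -1 unless this is a hypervisor VCPU thread */
};

static void host_thread_init(struct host_thread *t, int pid, int tid)
{
	assert(t);
	t->pid = pid;
	t->tid = tid;
	t->guest_cpu = -1;
}

/*
 * Record the VCPU number on the host thread (pid, tid).
 * Return 0 on success, or -1 if no such thread is known.
 */
static int set_guest_cpu(struct host_thread *threads, size_t n,
			 int pid, int tid, int vcpu)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (threads[i].pid == pid && threads[i].tid == tid) {
			threads[i].guest_cpu = vcpu;
			return 0;
		}
	}
	return -1;
}
```

Given the host thread that was running at some point in time, pid then
identifies the guest machine and guest_cpu identifies the VCPU.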

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/session.c | 18 ++++++++++++++++++
 tools/perf/util/thread.c  |  1 +
 tools/perf/util/thread.h  |  1 +
 3 files changed, 20 insertions(+)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 1af981d5ad3c..91a091c35945 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -2769,6 +2769,20 @@ static int perf_session__register_guest(struct perf_session *session, pid_t mach
 	return 0;
 }
 
+static int perf_session__set_guest_cpu(struct perf_session *session, pid_t pid,
+				       pid_t tid, int guest_cpu)
+{
+	struct machine *machine = &session->machines.host;
+	struct thread *thread = machine__findnew_thread(machine, pid, tid);
+
+	if (!thread)
+		return -ENOMEM;
+	thread->guest_cpu = guest_cpu;
+	thread__put(thread);
+
+	return 0;
+}
+
 int perf_event__process_id_index(struct perf_session *session,
 				 union perf_event *event)
 {
@@ -2845,6 +2859,10 @@ int perf_event__process_id_index(struct perf_session *session,
 			last_pid = sid->machine_pid;
 			perf_guest = true;
 		}
+
+		ret = perf_session__set_guest_cpu(session, sid->machine_pid, e->tid, e2->vcpu);
+		if (ret)
+			return ret;
 	}
 	return 0;
 }
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 665e5c0618ed..e3e5427e1c3c 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -47,6 +47,7 @@ struct thread *thread__new(pid_t pid, pid_t tid)
 		thread->tid = tid;
 		thread->ppid = -1;
 		thread->cpu = -1;
+		thread->guest_cpu = -1;
 		thread->lbr_stitch_enable = false;
 		INIT_LIST_HEAD(&thread->namespaces_list);
 		INIT_LIST_HEAD(&thread->comm_list);
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index b066fb30d203..241f300d7d6e 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -39,6 +39,7 @@ struct thread {
 	pid_t			tid;
 	pid_t			ppid;
 	int			cpu;
+	int			guest_cpu; /* For QEMU thread */
 	refcount_t		refcnt;
 	bool			comm_set;
 	int			comm_len;
-- 
2.25.1



* [PATCH 13/35] perf tools: Add machine_pid and vcpu to perf_sample
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (11 preceding siblings ...)
  2022-07-11  9:31 ` [PATCH 12/35] perf tools: Add guest_cpu to hypervisor threads Adrian Hunter
@ 2022-07-11  9:31 ` Adrian Hunter
  2022-07-20  0:36   ` Ian Rogers
  2022-07-11  9:31 ` [PATCH 14/35] perf tools: Use sample->machine_pid to find guest machine Adrian Hunter
                   ` (22 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:31 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

When parsing a sample with a sample ID, copy machine_pid and vcpu from
perf_sample_id to perf_sample.

Note, machine_pid will be zero when unused, so only a non-zero value
represents a guest machine. vcpu should be ignored if machine_pid is zero.

Note also, machine_pid is used with events that have come from injecting a
guest perf.data file. However, guest events recorded on the host (i.e. using
perf kvm) are identified by the (QEMU) hypervisor process pid instead; refer
to machines__find_for_cpumode().
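
The convention can be sketched as follows, assuming a simple linear
table in place of evlist__id2sid(). The types sid_entry and sample_info
and the helper functions are hypothetical and exist only for this
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for perf_sample_id and perf_sample */
struct sid_entry   { uint64_t id; int32_t machine_pid; int32_t vcpu; };
struct sample_info { uint64_t id; uint32_t machine_pid; uint32_t vcpu; };

/* Defaults: machine_pid 0 means "not a guest", vcpu is then meaningless */
static void sample_info_init(struct sample_info *s, uint64_t id)
{
	assert(s);
	s->id = id;
	s->machine_pid = 0;
	s->vcpu = (uint32_t)-1;
}

/* Copy guest identification from the matching sample ID entry, if any */
static void sample_copy_guest_info(struct sample_info *s,
				   const struct sid_entry *tbl, size_t n)
{
	size_t i;

	assert(s && (tbl || !n));
	for (i = 0; i < n; i++) {
		if (tbl[i].id == s->id) {
			s->machine_pid = (uint32_t)tbl[i].machine_pid;
			s->vcpu = (uint32_t)tbl[i].vcpu;
			return;
		}
	}
}

/* Only a non-zero machine_pid marks a sample as coming from a guest */
static bool sample_is_from_guest(const struct sample_info *s)
{
	return s->machine_pid != 0;
}
```

Callers are expected to test machine_pid first and to look at vcpu only
when that test succeeds.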

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/event.h  |  2 ++
 tools/perf/util/evlist.c | 14 +++++++++++++-
 tools/perf/util/evsel.c  |  1 +
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index cdd72e05fd28..a660f304f83c 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -148,6 +148,8 @@ struct perf_sample {
 	u64 code_page_size;
 	u64 cgroup;
 	u32 flags;
+	u32 machine_pid;
+	u32 vcpu;
 	u16 insn_len;
 	u8  cpumode;
 	u16 misc;
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 03fbe151b0c4..64f5a8074c0c 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1507,10 +1507,22 @@ int evlist__start_workload(struct evlist *evlist)
 int evlist__parse_sample(struct evlist *evlist, union perf_event *event, struct perf_sample *sample)
 {
 	struct evsel *evsel = evlist__event2evsel(evlist, event);
+	int ret;
 
 	if (!evsel)
 		return -EFAULT;
-	return evsel__parse_sample(evsel, event, sample);
+	ret = evsel__parse_sample(evsel, event, sample);
+	if (ret)
+		return ret;
+	if (perf_guest && sample->id) {
+		struct perf_sample_id *sid = evlist__id2sid(evlist, sample->id);
+
+		if (sid) {
+			sample->machine_pid = sid->machine_pid;
+			sample->vcpu = sid->vcpu.cpu;
+		}
+	}
+	return 0;
 }
 
 int evlist__parse_sample_timestamp(struct evlist *evlist, union perf_event *event, u64 *timestamp)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 9a30ccb7b104..14396ea5a968 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2365,6 +2365,7 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 	data->misc    = event->header.misc;
 	data->id = -1ULL;
 	data->data_src = PERF_MEM_DATA_SRC_NONE;
+	data->vcpu = -1;
 
 	if (event->header.type != PERF_RECORD_SAMPLE) {
 		if (!evsel->core.attr.sample_id_all)
-- 
2.25.1



* [PATCH 14/35] perf tools: Use sample->machine_pid to find guest machine
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (12 preceding siblings ...)
  2022-07-11  9:31 ` [PATCH 13/35] perf tools: Add machine_pid and vcpu to perf_sample Adrian Hunter
@ 2022-07-11  9:31 ` Adrian Hunter
  2022-07-20  0:37   ` Ian Rogers
  2022-07-11  9:31 ` [PATCH 15/35] perf script: Add machine_pid and vcpu Adrian Hunter
                   ` (21 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:31 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

If machine_pid is set, use it to find the guest machine.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/session.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 91a091c35945..f3e9fa557bc9 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1418,7 +1418,9 @@ static struct machine *machines__find_for_cpumode(struct machines *machines,
 	     (sample->cpumode == PERF_RECORD_MISC_GUEST_USER))) {
 		u32 pid;
 
-		if (event->header.type == PERF_RECORD_MMAP
+		if (sample->machine_pid)
+			pid = sample->machine_pid;
+		else if (event->header.type == PERF_RECORD_MMAP
 		    || event->header.type == PERF_RECORD_MMAP2)
 			pid = event->mmap.pid;
 		else
-- 
2.25.1



* [PATCH 15/35] perf script: Add machine_pid and vcpu
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (13 preceding siblings ...)
  2022-07-11  9:31 ` [PATCH 14/35] perf tools: Use sample->machine_pid to find guest machine Adrian Hunter
@ 2022-07-11  9:31 ` Adrian Hunter
  2022-07-20  0:39   ` Ian Rogers
  2022-07-11  9:31 ` [PATCH 16/35] perf dlfilter: " Adrian Hunter
                   ` (20 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:31 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Add fields machine_pid and vcpu. These are displayed only if machine_pid is
non-zero.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-script.txt |  7 ++++++-
 tools/perf/builtin-script.c              | 11 +++++++++++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index e250ff5566cf..c09cc44e50ee 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -133,7 +133,8 @@ OPTIONS
         comm, tid, pid, time, cpu, event, trace, ip, sym, dso, addr, symoff,
         srcline, period, iregs, uregs, brstack, brstacksym, flags, bpf-output,
         brstackinsn, brstackinsnlen, brstackoff, callindent, insn, insnlen, synth,
-        phys_addr, metric, misc, srccode, ipc, data_page_size, code_page_size, ins_lat.
+        phys_addr, metric, misc, srccode, ipc, data_page_size, code_page_size, ins_lat,
+        machine_pid, vcpu.
         Field list can be prepended with the type, trace, sw or hw,
         to indicate to which event type the field list applies.
         e.g., -F sw:comm,tid,time,ip,sym  and -F trace:time,cpu,trace
@@ -226,6 +227,10 @@ OPTIONS
 	The ipc (instructions per cycle) field is synthesized and may have a value when
 	Instruction Trace decoding.
 
+	The machine_pid and vcpu fields are derived from data resulting from using
+	perf inject to inject a perf.data file recorded inside a virtual machine into
+	a perf.data file recorded on the host at the same time.
+
 	Finally, a user may not set fields to none for all event types.
 	i.e., -F "" is not allowed.
 
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 4b00a50faf00..ac19fee62d8e 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -125,6 +125,8 @@ enum perf_output_field {
 	PERF_OUTPUT_CODE_PAGE_SIZE  = 1ULL << 34,
 	PERF_OUTPUT_INS_LAT         = 1ULL << 35,
 	PERF_OUTPUT_BRSTACKINSNLEN  = 1ULL << 36,
+	PERF_OUTPUT_MACHINE_PID     = 1ULL << 37,
+	PERF_OUTPUT_VCPU            = 1ULL << 38,
 };
 
 struct perf_script {
@@ -193,6 +195,8 @@ struct output_option {
 	{.str = "code_page_size", .field = PERF_OUTPUT_CODE_PAGE_SIZE},
 	{.str = "ins_lat", .field = PERF_OUTPUT_INS_LAT},
 	{.str = "brstackinsnlen", .field = PERF_OUTPUT_BRSTACKINSNLEN},
+	{.str = "machine_pid", .field = PERF_OUTPUT_MACHINE_PID},
+	{.str = "vcpu", .field = PERF_OUTPUT_VCPU},
 };
 
 enum {
@@ -746,6 +750,13 @@ static int perf_sample__fprintf_start(struct perf_script *script,
 	int printed = 0;
 	char tstr[128];
 
+	if (PRINT_FIELD(MACHINE_PID) && sample->machine_pid)
+		printed += fprintf(fp, "VM:%5d ", sample->machine_pid);
+
+	/* Print VCPU only for guest events i.e. with machine_pid */
+	if (PRINT_FIELD(VCPU) && sample->machine_pid)
+		printed += fprintf(fp, "VCPU:%03d ", sample->vcpu);
+
 	if (PRINT_FIELD(COMM)) {
 		const char *comm = thread ? thread__comm_str(thread) : ":-1";
 
-- 
2.25.1



* [PATCH 16/35] perf dlfilter: Add machine_pid and vcpu
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (14 preceding siblings ...)
  2022-07-11  9:31 ` [PATCH 15/35] perf script: Add machine_pid and vcpu Adrian Hunter
@ 2022-07-11  9:31 ` Adrian Hunter
  2022-07-20  0:42   ` Ian Rogers
  2022-07-11  9:32 ` [PATCH 17/35] perf auxtrace: Add machine_pid and vcpu to auxtrace_error Adrian Hunter
                   ` (19 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:31 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Add machine_pid and vcpu to struct perf_dlfilter_sample. The 'size' member
can be used to determine whether the values are present. In any case,
machine_pid is zero if unused, and vcpu should be ignored if machine_pid is zero.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-dlfilter.txt | 22 ++++++++++++++++++++++
 tools/perf/include/perf/perf_dlfilter.h    |  8 ++++++++
 tools/perf/util/dlfilter.c                 |  2 ++
 3 files changed, 32 insertions(+)

diff --git a/tools/perf/Documentation/perf-dlfilter.txt b/tools/perf/Documentation/perf-dlfilter.txt
index 594f5a5a0c9e..fb22e3b31dc5 100644
--- a/tools/perf/Documentation/perf-dlfilter.txt
+++ b/tools/perf/Documentation/perf-dlfilter.txt
@@ -107,9 +107,31 @@ struct perf_dlfilter_sample {
 	__u64 raw_callchain_nr;	/* Number of raw_callchain entries */
 	const __u64 *raw_callchain; /* Refer <linux/perf_event.h> */
 	const char *event;
+	__s32 machine_pid;
+	__s32 vcpu;
 };
 ----
 
+Note: 'machine_pid' and 'vcpu' are not original members, but were added together later.
+'size' can be used to determine their presence at run time.
+PERF_DLFILTER_HAS_MACHINE_PID will be defined if they are present at compile time.
+For example:
+[source,c]
+----
+#include <perf/perf_dlfilter.h>
+#include <stddef.h>
+#include <stdbool.h>
+
+static inline bool have_machine_pid(const struct perf_dlfilter_sample *sample)
+{
+#ifdef PERF_DLFILTER_HAS_MACHINE_PID
+	return sample->size >= offsetof(struct perf_dlfilter_sample, vcpu) + sizeof(sample->vcpu);
+#else
+	return false;
+#endif
+}
+----
+
 The perf_dlfilter_fns structure
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/tools/perf/include/perf/perf_dlfilter.h b/tools/perf/include/perf/perf_dlfilter.h
index 3eef03d661b4..a26e2f129f83 100644
--- a/tools/perf/include/perf/perf_dlfilter.h
+++ b/tools/perf/include/perf/perf_dlfilter.h
@@ -9,6 +9,12 @@
 #include <linux/perf_event.h>
 #include <linux/types.h>
 
+/*
+ * The following macro can be used to determine if this header defines
+ * perf_dlfilter_sample machine_pid and vcpu.
+ */
+#define PERF_DLFILTER_HAS_MACHINE_PID
+
 /* Definitions for perf_dlfilter_sample flags */
 enum {
 	PERF_DLFILTER_FLAG_BRANCH	= 1ULL << 0,
@@ -62,6 +68,8 @@ struct perf_dlfilter_sample {
 	__u64 raw_callchain_nr;	/* Number of raw_callchain entries */
 	const __u64 *raw_callchain; /* Refer <linux/perf_event.h> */
 	const char *event;
+	__s32 machine_pid;
+	__s32 vcpu;
 };
 
 /*
diff --git a/tools/perf/util/dlfilter.c b/tools/perf/util/dlfilter.c
index db964d5a52af..54e4d4495e00 100644
--- a/tools/perf/util/dlfilter.c
+++ b/tools/perf/util/dlfilter.c
@@ -495,6 +495,8 @@ int dlfilter__do_filter_event(struct dlfilter *d,
 	ASSIGN(misc);
 	ASSIGN(raw_size);
 	ASSIGN(raw_data);
+	ASSIGN(machine_pid);
+	ASSIGN(vcpu);
 
 	if (sample->branch_stack) {
 		d_sample.brstack_nr = sample->branch_stack->nr;
-- 
2.25.1



* [PATCH 17/35] perf auxtrace: Add machine_pid and vcpu to auxtrace_error
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (15 preceding siblings ...)
  2022-07-11  9:31 ` [PATCH 16/35] perf dlfilter: " Adrian Hunter
@ 2022-07-11  9:32 ` Adrian Hunter
  2022-07-20  0:43   ` Ian Rogers
  2022-07-11  9:32 ` [PATCH 18/35] perf script python: Add machine_pid and vcpu Adrian Hunter
                   ` (18 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:32 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Add machine_pid and vcpu to struct perf_record_auxtrace_error. The existing
fmt member is used to identify the new format.

The new members make it possible to easily differentiate errors from guest
machines.
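
As an illustration only (Python, not part of this patch), the fmt check
that guards the new members can be sketched as below. It mirrors the test
in perf_event__fprintf_auxtrace_error(). The dict standing in for
struct perf_record_auxtrace_error and the helper name guest_prefix are
hypothetical.

```python
def guest_prefix(err):
    """Return the guest part of an error line, or "" for a host error.

    err is a hypothetical dict standing in for
    struct perf_record_auxtrace_error.
    """
    # machine_pid and vcpu are only meaningful when fmt >= 2, and a zero
    # machine_pid means the error did not come from a guest machine.
    if err.get("fmt", 0) >= 2 and err.get("machine_pid"):
        return " machine_pid %d vcpu %d" % (err["machine_pid"], err["vcpu"])
    return ""
```

A record with the old format (fmt below 2) never has the new members read,
whatever bytes happen to follow the message.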

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/lib/perf/include/perf/event.h           |  2 ++
 tools/perf/util/auxtrace.c                    | 30 +++++++++++++++----
 tools/perf/util/auxtrace.h                    |  4 +++
 .../scripting-engines/trace-event-python.c    |  4 ++-
 tools/perf/util/session.c                     |  4 +++
 5 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/tools/lib/perf/include/perf/event.h b/tools/lib/perf/include/perf/event.h
index c2dbd3e88885..556bb06798f2 100644
--- a/tools/lib/perf/include/perf/event.h
+++ b/tools/lib/perf/include/perf/event.h
@@ -279,6 +279,8 @@ struct perf_record_auxtrace_error {
 	__u64			 ip;
 	__u64			 time;
 	char			 msg[MAX_AUXTRACE_ERROR_MSG];
+	__u32			 machine_pid;
+	__u32			 vcpu;
 };
 
 struct perf_record_aux {
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 511dd3caa1bc..6edab8a16de6 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1189,9 +1189,10 @@ void auxtrace_buffer__free(struct auxtrace_buffer *buffer)
 	free(buffer);
 }
 
-void auxtrace_synth_error(struct perf_record_auxtrace_error *auxtrace_error, int type,
-			  int code, int cpu, pid_t pid, pid_t tid, u64 ip,
-			  const char *msg, u64 timestamp)
+void auxtrace_synth_guest_error(struct perf_record_auxtrace_error *auxtrace_error, int type,
+				int code, int cpu, pid_t pid, pid_t tid, u64 ip,
+				const char *msg, u64 timestamp,
+				pid_t machine_pid, int vcpu)
 {
 	size_t size;
 
@@ -1207,12 +1208,26 @@ void auxtrace_synth_error(struct perf_record_auxtrace_error *auxtrace_error, int
 	auxtrace_error->ip = ip;
 	auxtrace_error->time = timestamp;
 	strlcpy(auxtrace_error->msg, msg, MAX_AUXTRACE_ERROR_MSG);
-
-	size = (void *)auxtrace_error->msg - (void *)auxtrace_error +
-	       strlen(auxtrace_error->msg) + 1;
+	if (machine_pid) {
+		auxtrace_error->fmt = 2;
+		auxtrace_error->machine_pid = machine_pid;
+		auxtrace_error->vcpu = vcpu;
+		size = sizeof(*auxtrace_error);
+	} else {
+		size = (void *)auxtrace_error->msg - (void *)auxtrace_error +
+		       strlen(auxtrace_error->msg) + 1;
+	}
 	auxtrace_error->header.size = PERF_ALIGN(size, sizeof(u64));
 }
 
+void auxtrace_synth_error(struct perf_record_auxtrace_error *auxtrace_error, int type,
+			  int code, int cpu, pid_t pid, pid_t tid, u64 ip,
+			  const char *msg, u64 timestamp)
+{
+	auxtrace_synth_guest_error(auxtrace_error, type, code, cpu, pid, tid,
+				   ip, msg, timestamp, 0, -1);
+}
+
 int perf_event__synthesize_auxtrace_info(struct auxtrace_record *itr,
 					 struct perf_tool *tool,
 					 struct perf_session *session,
@@ -1662,6 +1677,9 @@ size_t perf_event__fprintf_auxtrace_error(union perf_event *event, FILE *fp)
 	if (!e->fmt)
 		msg = (const char *)&e->time;
 
+	if (e->fmt >= 2 && e->machine_pid)
+		ret += fprintf(fp, " machine_pid %d vcpu %d", e->machine_pid, e->vcpu);
+
 	ret += fprintf(fp, " cpu %d pid %d tid %d ip %#"PRI_lx64" code %u: %s\n",
 		       e->cpu, e->pid, e->tid, e->ip, e->code, msg);
 	return ret;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index cd0d25c2751c..6a4fbfd34c6b 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -595,6 +595,10 @@ int auxtrace_index__process(int fd, u64 size, struct perf_session *session,
 			    bool needs_swap);
 void auxtrace_index__free(struct list_head *head);
 
+void auxtrace_synth_guest_error(struct perf_record_auxtrace_error *auxtrace_error, int type,
+				int code, int cpu, pid_t pid, pid_t tid, u64 ip,
+				const char *msg, u64 timestamp,
+				pid_t machine_pid, int vcpu);
 void auxtrace_synth_error(struct perf_record_auxtrace_error *auxtrace_error, int type,
 			  int code, int cpu, pid_t pid, pid_t tid, u64 ip,
 			  const char *msg, u64 timestamp);
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index adba01b7d9dd..3367c5479199 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -1559,7 +1559,7 @@ static void python_process_auxtrace_error(struct perf_session *session __maybe_u
 		msg = (const char *)&e->time;
 	}
 
-	t = tuple_new(9);
+	t = tuple_new(11);
 
 	tuple_set_u32(t, 0, e->type);
 	tuple_set_u32(t, 1, e->code);
@@ -1570,6 +1570,8 @@ static void python_process_auxtrace_error(struct perf_session *session __maybe_u
 	tuple_set_u64(t, 6, tm);
 	tuple_set_string(t, 7, msg);
 	tuple_set_u32(t, 8, cpumode);
+	tuple_set_s32(t, 9, e->machine_pid);
+	tuple_set_s32(t, 10, e->vcpu);
 
 	call_object(handler, t, handler_name);
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index f3e9fa557bc9..7ea0b91013ea 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -895,6 +895,10 @@ static void perf_event__auxtrace_error_swap(union perf_event *event,
 	event->auxtrace_error.ip   = bswap_64(event->auxtrace_error.ip);
 	if (event->auxtrace_error.fmt)
 		event->auxtrace_error.time = bswap_64(event->auxtrace_error.time);
+	if (event->auxtrace_error.fmt >= 2) {
+		event->auxtrace_error.machine_pid = bswap_32(event->auxtrace_error.machine_pid);
+		event->auxtrace_error.vcpu = bswap_32(event->auxtrace_error.vcpu);
+	}
 }
 
 static void perf_event__thread_map_swap(union perf_event *event,
-- 
2.25.1



* [PATCH 18/35] perf script python: Add machine_pid and vcpu
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (16 preceding siblings ...)
  2022-07-11  9:32 ` [PATCH 17/35] perf auxtrace: Add machine_pid and vcpu to auxtrace_error Adrian Hunter
@ 2022-07-11  9:32 ` Adrian Hunter
  2022-07-20  0:43   ` Ian Rogers
  2022-07-11  9:32 ` [PATCH 19/35] perf script python: intel-pt-events: " Adrian Hunter
                   ` (17 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:32 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Add machine_pid and vcpu to python sample events and context switch events.
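
A minimal sketch of how a script might consume the new values, assuming
only what the diff below shows: "machine_pid" and "vcpu" appear in the
sample dict only for guest samples, and the context switch handler receives
two extra trailing arguments. The helper names sample_origin and
switch_origin are hypothetical.

```python
def sample_origin(sample):
    """Describe where a sample came from, given the perf sample dict."""
    # The keys are only present when sample->machine_pid is non-zero.
    if "machine_pid" in sample:
        return "guest machine_pid=%d vcpu=%d" % (sample["machine_pid"],
                                                 sample["vcpu"])
    return "host"


def switch_origin(machine_pid, *extra):
    """Return (machine_pid, vcpu) for a context switch event.

    extra holds whatever arguments follow out_preempt. An older perf
    passes none, so fall back to the original machine_pid and no vcpu.
    """
    if len(extra) >= 2 and extra[0]:
        return (extra[0], extra[1])
    return (machine_pid, None)
```

A handler declared as context_switch(ts, cpu, pid, tid, np_pid, np_tid,
machine_pid, out, out_preempt, *x) can call switch_origin(machine_pid, *x)
and keep working with both old and new perf.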

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 .../perf/util/scripting-engines/trace-event-python.c  | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 3367c5479199..5bbc1b16f368 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -861,6 +861,13 @@ static PyObject *get_perf_sample_dict(struct perf_sample *sample,
 	brstacksym = python_process_brstacksym(sample, al->thread);
 	pydict_set_item_string_decref(dict, "brstacksym", brstacksym);
 
+	if (sample->machine_pid) {
+		pydict_set_item_string_decref(dict_sample, "machine_pid",
+				_PyLong_FromLong(sample->machine_pid));
+		pydict_set_item_string_decref(dict_sample, "vcpu",
+				_PyLong_FromLong(sample->vcpu));
+	}
+
 	pydict_set_item_string_decref(dict_sample, "cpumode",
 			_PyLong_FromLong((unsigned long)sample->cpumode));
 
@@ -1509,7 +1516,7 @@ static void python_do_process_switch(union perf_event *event,
 		np_tid = event->context_switch.next_prev_tid;
 	}
 
-	t = tuple_new(9);
+	t = tuple_new(11);
 	if (!t)
 		return;
 
@@ -1522,6 +1529,8 @@ static void python_do_process_switch(union perf_event *event,
 	tuple_set_s32(t, 6, machine->pid);
 	tuple_set_bool(t, 7, out);
 	tuple_set_bool(t, 8, out_preempt);
+	tuple_set_s32(t, 9, sample->machine_pid);
+	tuple_set_s32(t, 10, sample->vcpu);
 
 	call_object(handler, t, handler_name);
 
-- 
2.25.1



* [PATCH 19/35] perf script python: intel-pt-events: Add machine_pid and vcpu
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (17 preceding siblings ...)
  2022-07-11  9:32 ` [PATCH 18/35] perf script python: Add machine_pid and vcpu Adrian Hunter
@ 2022-07-11  9:32 ` Adrian Hunter
  2022-07-20  0:44   ` Ian Rogers
  2022-07-11  9:32 ` [PATCH 20/35] perf tools: Remove also guest kcore_dir with host kcore_dir Adrian Hunter
                   ` (16 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:32 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Add machine_pid and vcpu to the intel-pt-events.py script.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/scripts/python/intel-pt-events.py | 32 +++++++++++++++++---
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/tools/perf/scripts/python/intel-pt-events.py b/tools/perf/scripts/python/intel-pt-events.py
index 9b7746b89381..6be7fd8fd615 100644
--- a/tools/perf/scripts/python/intel-pt-events.py
+++ b/tools/perf/scripts/python/intel-pt-events.py
@@ -197,7 +197,12 @@ def common_start_str(comm, sample):
 	cpu = sample["cpu"]
 	pid = sample["pid"]
 	tid = sample["tid"]
-	return "%16s %5u/%-5u [%03u] %9u.%09u  " % (comm, pid, tid, cpu, ts / 1000000000, ts %1000000000)
+	if "machine_pid" in sample:
+		machine_pid = sample["machine_pid"]
+		vcpu = sample["vcpu"]
+		return "VM:%5d VCPU:%03d %16s %5u/%-5u [%03u] %9u.%09u  " % (machine_pid, vcpu, comm, pid, tid, cpu, ts / 1000000000, ts %1000000000)
+	else:
+		return "%16s %5u/%-5u [%03u] %9u.%09u  " % (comm, pid, tid, cpu, ts / 1000000000, ts %1000000000)
 
 def print_common_start(comm, sample, name):
 	flags_disp = get_optional_null(sample, "flags_disp")
@@ -379,9 +384,19 @@ def process_event(param_dict):
 		sys.exit(1)
 
 def auxtrace_error(typ, code, cpu, pid, tid, ip, ts, msg, cpumode, *x):
+	if len(x) >= 2 and x[0]:
+		machine_pid = x[0]
+		vcpu = x[1]
+	else:
+		machine_pid = 0
+		vcpu = -1
 	try:
-		print("%16s %5u/%-5u [%03u] %9u.%09u  error type %u code %u: %s ip 0x%16x" %
-			("Trace error", pid, tid, cpu, ts / 1000000000, ts %1000000000, typ, code, msg, ip))
+		if machine_pid:
+			print("VM:%5d VCPU:%03d %16s %5u/%-5u [%03u] %9u.%09u  error type %u code %u: %s ip 0x%16x" %
+				(machine_pid, vcpu, "Trace error", pid, tid, cpu, ts / 1000000000, ts %1000000000, typ, code, msg, ip))
+		else:
+			print("%16s %5u/%-5u [%03u] %9u.%09u  error type %u code %u: %s ip 0x%16x" %
+				("Trace error", pid, tid, cpu, ts / 1000000000, ts %1000000000, typ, code, msg, ip))
 	except broken_pipe_exception:
 		# Stop python printing broken pipe errors and traceback
 		sys.stdout = open(os.devnull, 'w')
@@ -396,14 +411,21 @@ def context_switch(ts, cpu, pid, tid, np_pid, np_tid, machine_pid, out, out_pree
 		preempt_str = "preempt"
 	else:
 		preempt_str = ""
+	if len(x) >= 2 and x[0]:
+		machine_pid = x[0]
+		vcpu = x[1]
+	else:
+		vcpu = None
 	if machine_pid == -1:
 		machine_str = ""
-	else:
+	elif vcpu is None:
 		machine_str = "machine PID %d" % machine_pid
+	else:
+		machine_str = "machine PID %d VCPU %d" % (machine_pid, vcpu)
 	switch_str = "%16s %5d/%-5d [%03u] %9u.%09u %5d/%-5d %s %s" % \
 		(out_str, pid, tid, cpu, ts / 1000000000, ts %1000000000, np_pid, np_tid, machine_str, preempt_str)
 	if glb_args.all_switch_events:
-		print(switch_str);
+		print(switch_str)
 	else:
 		global glb_switch_str
 		glb_switch_str[cpu] = switch_str
-- 
2.25.1



* [PATCH 20/35] perf tools: Remove also guest kcore_dir with host kcore_dir
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (18 preceding siblings ...)
  2022-07-11  9:32 ` [PATCH 19/35] perf script python: intel-pt-events: " Adrian Hunter
@ 2022-07-11  9:32 ` Adrian Hunter
  2022-07-20  0:45   ` Ian Rogers
  2022-07-11  9:32 ` [PATCH 21/35] perf tools: Make has_kcore_dir() work also for guest kcore_dir Adrian Hunter
                   ` (15 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:32 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Copies of /proc/kallsyms, /proc/modules and an extract of /proc/kcore can
be stored in the perf.data output directory under the subdirectory named
kcore_dir. Guest machines have their files under subdirectories named
kcore_dir__ followed by the machine pid. Remove these too when removing
kcore_dir.
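
To illustrate the naming scheme (Python, not part of this patch), the two
patterns used by kcore_dir_filter() below can be tried with fnmatch. This
assumes fnmatch-style globbing behaves like perf's own pattern matcher for
these simple patterns. The helper name is_kcore_dir is hypothetical.

```python
from fnmatch import fnmatchcase

# Same patterns as kcore_dir_filter() in this patch
KCORE_DIR_PATTERNS = ("kcore_dir", "kcore_dir__[1-9]*")


def is_kcore_dir(name):
    """Return True if a directory entry name is a host or guest kcore_dir."""
    return any(fnmatchcase(name, pat) for pat in KCORE_DIR_PATTERNS)
```

The [1-9] means a guest directory name must start its suffix with a
non-zero digit, so a machine pid of 0 is never matched.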

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/util.c | 37 +++++++++++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index eeb83c80f458..9b02edf9311d 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -200,7 +200,7 @@ static int rm_rf_depth_pat(const char *path, int depth, const char **pat)
 	return rmdir(path);
 }
 
-static int rm_rf_kcore_dir(const char *path)
+static int rm_rf_a_kcore_dir(const char *path, const char *name)
 {
 	char kcore_dir_path[PATH_MAX];
 	const char *pat[] = {
@@ -210,11 +210,44 @@ static int rm_rf_kcore_dir(const char *path)
 		NULL,
 	};
 
-	snprintf(kcore_dir_path, sizeof(kcore_dir_path), "%s/kcore_dir", path);
+	snprintf(kcore_dir_path, sizeof(kcore_dir_path), "%s/%s", path, name);
 
 	return rm_rf_depth_pat(kcore_dir_path, 0, pat);
 }
 
+static bool kcore_dir_filter(const char *name __maybe_unused, struct dirent *d)
+{
+	const char *pat[] = {
+		"kcore_dir",
+		"kcore_dir__[1-9]*",
+		NULL,
+	};
+
+	return match_pat(d->d_name, pat);
+}
+
+static int rm_rf_kcore_dir(const char *path)
+{
+	struct strlist *kcore_dirs;
+	struct str_node *nd;
+	int ret = 0;
+
+	kcore_dirs = lsdir(path, kcore_dir_filter);
+
+	if (!kcore_dirs)
+		return 0;
+
+	strlist__for_each_entry(nd, kcore_dirs) {
+		ret = rm_rf_a_kcore_dir(path, nd->s);
+		if (ret)
+			break;
+	}
+
+	strlist__delete(kcore_dirs);
+
+	return ret;
+}
+
 int rm_rf_perf_data(const char *path)
 {
 	const char *pat[] = {
-- 
2.25.1



* [PATCH 21/35] perf tools: Make has_kcore_dir() work also for guest kcore_dir
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (19 preceding siblings ...)
  2022-07-11  9:32 ` [PATCH 20/35] perf tools: Remove also guest kcore_dir with host kcore_dir Adrian Hunter
@ 2022-07-11  9:32 ` Adrian Hunter
  2022-07-20  0:49   ` Ian Rogers
  2022-07-11  9:32 ` [PATCH 22/35] perf tools: Automatically use guest kcore_dir if present Adrian Hunter
                   ` (14 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:32 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Copies of /proc/kallsyms, /proc/modules and an extract of /proc/kcore can
be stored in the perf.data output directory under the subdirectory named
kcore_dir. Guest machines have their files under subdirectories named
kcore_dir__ followed by the machine pid. Make has_kcore_dir() also return
true if there is a guest machine kcore_dir.
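
The intended behaviour can be sketched in Python (illustration only, not
part of this patch): scan the directory and report whether any entry name
begins with "kcore_dir". The function below is a hypothetical stand-in, not
perf code.

```python
import os


def has_kcore_dir(path):
    """Return True if path contains a host or guest kcore_dir entry."""
    try:
        names = os.listdir(path)
    except OSError:
        # A missing or unreadable directory has no kcore_dir
        return False
    # A prefix match covers both "kcore_dir" and "kcore_dir__<pid>"
    return any(name.startswith("kcore_dir") for name in names)
```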

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/data.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
index caabeac24c69..9782ccbe595d 100644
--- a/tools/perf/util/data.c
+++ b/tools/perf/util/data.c
@@ -3,6 +3,7 @@
 #include <linux/kernel.h>
 #include <linux/string.h>
 #include <linux/zalloc.h>
+#include <linux/err.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <errno.h>
@@ -481,16 +482,21 @@ int perf_data__make_kcore_dir(struct perf_data *data, char *buf, size_t buf_sz)
 
 bool has_kcore_dir(const char *path)
 {
-	char *kcore_dir;
-	int ret;
-
-	if (asprintf(&kcore_dir, "%s/kcore_dir", path) < 0)
-		return false;
-
-	ret = access(kcore_dir, F_OK);
+	struct dirent *d = ERR_PTR(-EINVAL);
+	const char *name = "kcore_dir";
+	DIR *dir = opendir(path);
+	size_t n = strlen(name);
+	bool result = false;
+
+	if (dir) {
+		while (d && !result) {
+			d = readdir(dir);
+			result = d ? !strncmp(d->d_name, name, n) : false;
+		}
+		closedir(dir);
+	}
 
-	free(kcore_dir);
-	return !ret;
+	return result;
 }
 
 char *perf_data__kallsyms_name(struct perf_data *data)
-- 
2.25.1



* [PATCH 22/35] perf tools: Automatically use guest kcore_dir if present
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (20 preceding siblings ...)
  2022-07-11  9:32 ` [PATCH 21/35] perf tools: Make has_kcore_dir() work also for guest kcore_dir Adrian Hunter
@ 2022-07-11  9:32 ` Adrian Hunter
  2022-07-20  0:51   ` Ian Rogers
  2022-07-11  9:32 ` [PATCH 23/35] perf tools: Add reallocarray_as_needed() Adrian Hunter
                   ` (13 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:32 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

When registering a guest machine using machine_pid from the id index,
check perf.data for a matching kcore_dir subdirectory and set the
kallsyms file name accordingly. If set, use it to find the machine's
kernel symbols and object code (from kcore).
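
The lookup can be sketched in Python (illustration only, not part of this
patch), following perf_data__guest_kallsyms_name() below: build the path
from the perf.data directory and the machine pid, and use it only if the
file exists. The function name and arguments here are hypothetical.

```python
import os


def guest_kallsyms_name(data_path, machine_pid):
    """Return the guest kallsyms path under data_path, or None if absent."""
    # Only a perf.data directory can contain kcore_dir subdirectories
    if not os.path.isdir(data_path):
        return None
    name = os.path.join(data_path, "kcore_dir__%d" % machine_pid, "kallsyms")
    return name if os.path.exists(name) else None
```

Returning None leaves the existing guest kernel symbol lookup unchanged,
which matches the fallback order in dso__load_guest_kernel_sym().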

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/data.c    | 19 +++++++++++++++++++
 tools/perf/util/data.h    |  1 +
 tools/perf/util/machine.h |  1 +
 tools/perf/util/session.c |  2 ++
 tools/perf/util/symbol.c  |  6 ++++--
 5 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
index 9782ccbe595d..a7f68c309545 100644
--- a/tools/perf/util/data.c
+++ b/tools/perf/util/data.c
@@ -518,6 +518,25 @@ char *perf_data__kallsyms_name(struct perf_data *data)
 	return kallsyms_name;
 }
 
+char *perf_data__guest_kallsyms_name(struct perf_data *data, pid_t machine_pid)
+{
+	char *kallsyms_name;
+	struct stat st;
+
+	if (!data->is_dir)
+		return NULL;
+
+	if (asprintf(&kallsyms_name, "%s/kcore_dir__%d/kallsyms", data->path, machine_pid) < 0)
+		return NULL;
+
+	if (stat(kallsyms_name, &st)) {
+		free(kallsyms_name);
+		return NULL;
+	}
+
+	return kallsyms_name;
+}
+
 bool is_perf_data(const char *path)
 {
 	bool ret = false;
diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h
index 7de53d6e2d7f..173132d502f5 100644
--- a/tools/perf/util/data.h
+++ b/tools/perf/util/data.h
@@ -101,5 +101,6 @@ unsigned long perf_data__size(struct perf_data *data);
 int perf_data__make_kcore_dir(struct perf_data *data, char *buf, size_t buf_sz);
 bool has_kcore_dir(const char *path);
 char *perf_data__kallsyms_name(struct perf_data *data);
+char *perf_data__guest_kallsyms_name(struct perf_data *data, pid_t machine_pid);
 bool is_perf_data(const char *path);
 #endif /* __PERF_DATA_H */
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 5d7daf7cb7bc..d40b23c71420 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -48,6 +48,7 @@ struct machine {
 	bool		  single_address_space;
 	char		  *root_dir;
 	char		  *mmap_name;
+	char		  *kallsyms_filename;
 	struct threads    threads[THREADS__TABLE_SIZE];
 	struct vdso_info  *vdso_info;
 	struct perf_env   *env;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 7ea0b91013ea..98e16659a149 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -2772,6 +2772,8 @@ static int perf_session__register_guest(struct perf_session *session, pid_t mach
 		return -ENOMEM;
 	thread__put(thread);
 
+	machine->kallsyms_filename = perf_data__guest_kallsyms_name(session->data, machine_pid);
+
 	return 0;
 }
 
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index f72baf636724..a4b22caa7c24 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -2300,11 +2300,13 @@ static int dso__load_kernel_sym(struct dso *dso, struct map *map)
 static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map)
 {
 	int err;
-	const char *kallsyms_filename = NULL;
+	const char *kallsyms_filename;
 	struct machine *machine = map__kmaps(map)->machine;
 	char path[PATH_MAX];
 
-	if (machine__is_default_guest(machine)) {
+	if (machine->kallsyms_filename) {
+		kallsyms_filename = machine->kallsyms_filename;
+	} else if (machine__is_default_guest(machine)) {
 		/*
 		 * if the user specified a vmlinux filename, use it and only
 		 * it, reporting errors to the user if it cannot be used.
-- 
2.25.1



* [PATCH 23/35] perf tools: Add reallocarray_as_needed()
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (21 preceding siblings ...)
  2022-07-11  9:32 ` [PATCH 22/35] perf tools: Automatically use guest kcore_dir if present Adrian Hunter
@ 2022-07-11  9:32 ` Adrian Hunter
  2022-07-20  0:55   ` Ian Rogers
  2022-07-11  9:32 ` [PATCH 24/35] perf inject: Add support for injecting guest sideband events Adrian Hunter
                   ` (12 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:32 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Add helper realloc_array_as_needed() to reallocate an array to a larger
size and initialize the extra entries to an arbitrary value.
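
The growth policy can be sketched in Python (illustration only, not part
of this patch). It follows do_realloc_array_as_needed() below: pick an
initial element count, double it until index x fits, and fill the new
entries with the initial value. The list-based helper grow_as_needed and
its msz parameter (element size in bytes) are hypothetical.

```python
def roundup(x, y):
    """Round x up to the next multiple of y."""
    return ((x + y - 1) // y) * y


def grow_as_needed(arr, x, msz, init_val=0):
    """Return arr grown so that index x is valid, new entries set to init_val."""
    new_sz = len(arr)
    if new_sz == 0:
        # Same starting element count as the C helper
        new_sz = 1 if msz >= 64 else roundup(64, msz)
    while x >= new_sz:
        new_sz *= 2
    if new_sz == len(arr):
        return arr
    # Existing entries are kept, only the tail is initialized
    return arr + [init_val] * (new_sz - len(arr))
```

Doubling keeps the number of reallocations logarithmic in the largest
index seen, which suits arrays indexed by CPU or VCPU number.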

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/util.c | 33 +++++++++++++++++++++++++++++++++
 tools/perf/util/util.h | 15 +++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 9b02edf9311d..391c1e928bd7 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -18,6 +18,7 @@
 #include <linux/kernel.h>
 #include <linux/log2.h>
 #include <linux/time64.h>
+#include <linux/overflow.h>
 #include <unistd.h>
 #include "cap.h"
 #include "strlist.h"
@@ -500,3 +501,35 @@ char *filename_with_chroot(int pid, const char *filename)
 
 	return new_name;
 }
+
+/*
+ * Reallocate an array *arr of size *arr_sz so that it is big enough to contain
+ * x elements of size msz, initializing new entries to *init_val or zero if
+ * init_val is NULL
+ */
+int do_realloc_array_as_needed(void **arr, size_t *arr_sz, size_t x, size_t msz, const void *init_val)
+{
+	size_t new_sz = *arr_sz;
+	void *new_arr;
+	size_t i;
+
+	if (!new_sz)
+		new_sz = msz >= 64 ? 1 : roundup(64, msz); /* Start with at least 64 bytes */
+	while (x >= new_sz) {
+		if (check_mul_overflow(new_sz, (size_t)2, &new_sz))
+			return -ENOMEM;
+	}
+	if (new_sz == *arr_sz)
+		return 0;
+	new_arr = calloc(new_sz, msz);
+	if (!new_arr)
+		return -ENOMEM;
+	memcpy(new_arr, *arr, *arr_sz * msz);
+	if (init_val) {
+		for (i = *arr_sz; i < new_sz; i++)
+			memcpy(new_arr + (i * msz), init_val, msz);
+	}
+	*arr = new_arr;
+	*arr_sz = new_sz;
+	return 0;
+}
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 0f78f1e7782d..c1f2d423a9ec 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -79,4 +79,19 @@ struct perf_debuginfod {
 void perf_debuginfod_setup(struct perf_debuginfod *di);
 
 char *filename_with_chroot(int pid, const char *filename);
+
+int do_realloc_array_as_needed(void **arr, size_t *arr_sz, size_t x,
+			       size_t msz, const void *init_val);
+
+#define realloc_array_as_needed(a, n, x, v) ({			\
+	typeof(x) __x = (x);					\
+	__x >= (n) ?						\
+		do_realloc_array_as_needed((void **)&(a),	\
+					   &(n),		\
+					   __x,			\
+					   sizeof(*(a)),	\
+					   (const void *)(v)) :	\
+		0;						\
+	})
+
 #endif /* GIT_COMPAT_UTIL_H */
-- 
2.25.1



* [PATCH 24/35] perf inject: Add support for injecting guest sideband events
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (22 preceding siblings ...)
  2022-07-11  9:32 ` [PATCH 23/35] perf tools: Add reallocarray_as_needed() Adrian Hunter
@ 2022-07-11  9:32 ` Adrian Hunter
  2022-07-20  1:06   ` Ian Rogers
  2022-07-11  9:32 ` [PATCH 25/35] perf machine: Use realloc_array_as_needed() in machine__set_current_tid() Adrian Hunter
                   ` (11 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:32 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Inject events from a perf.data file recorded in a virtual machine into
a perf.data file recorded on the host at the same time.

Only side band events (e.g. mmap, comm, fork, exit etc) and build IDs are
injected.  Additionally, the guest kcore_dir is copied as kcore_dir__
followed by the machine PID.

This is non-trivial because:
 o It is not possible to process 2 sessions simultaneously so instead
 events are first written to a temporary file.
 o To avoid conflict, guest sample IDs are replaced with new unused sample
 IDs.
 o The CPU of each guest event is changed to the host CPU because that is
 more useful for reporting and analysis.
 o Sample ID is mapped to machine PID which is recorded with VCPU in the
 id index. This is important to allow guest events to be related to the
 guest machine and VCPU.
 o Timestamps must be converted.
 o Events are inserted to obey finished-round ordering.
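
The sample ID replacement in the list above can be sketched in Python
(illustration only, not part of this patch). It assumes new IDs are taken
from just above the highest ID already in use on the host, which is what
the highest_id and host_id members below suggest. The class GuestIdMap is a
hypothetical stand-in for the guest_id hash table.

```python
class GuestIdMap:
    """Map guest sample IDs to unused host sample IDs, remembering the VCPU."""

    def __init__(self, highest_host_id):
        # Highest sample ID already used in the host perf.data file
        self.highest_id = highest_host_id
        self.by_guest_id = {}

    def add(self, guest_id, vcpu):
        """Allocate a host ID for guest_id, or return the one already mapped."""
        if guest_id in self.by_guest_id:
            return self.by_guest_id[guest_id][0]
        self.highest_id += 1
        self.by_guest_id[guest_id] = (self.highest_id, vcpu)
        return self.highest_id

    def lookup(self, guest_id):
        """Return (host_id, vcpu) for guest_id, or None if unknown."""
        return self.by_guest_id.get(guest_id)
```

Because every new ID is above any host ID, a guest ID can never collide
with a host ID, and the mapping to VCPU travels with it.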

The anticipated use-case is:
 - start recording sideband events in a guest machine
 - start recording an AUX area trace on the host which can also trace the
 guest (e.g. Intel PT)
 - run test case on the guest
 - stop recording on the host
 - stop recording on the guest
 - copy the guest perf.data file to the host
 - inject the guest perf.data file sideband events into the host perf.data
 file using perf inject
 - the resulting perf.data file can now be used
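
For the inject step, the option value has the form
<path>,<pid>[,<time offset>[,<time scale>]] as documented below. A Python
sketch of splitting such a value (illustration only, not the parser in this
patch) follows. The defaults of 0 and None for the optional fields, and the
acceptance of hex or decimal integers, are assumptions of this sketch.

```python
def parse_guest_data(value):
    """Split a --guest-data value into (path, pid, time_offset, time_scale)."""
    parts = value.split(",")
    if not 2 <= len(parts) <= 4 or not parts[0]:
        raise ValueError("expected <path>,<pid>[,<time offset>[,<time scale>]]")
    path = parts[0]
    # Base 0 lets int() accept both decimal and 0x-prefixed values
    pid = int(parts[1], 0)
    time_offset = int(parts[2], 0) if len(parts) > 2 else 0
    time_scale = float(parts[3]) if len(parts) > 3 else None
    return (path, pid, time_offset, time_scale)
```

This sketch does not handle a path that itself contains a comma.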

Subsequent patches provide Intel PT support for this.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-inject.txt |   17 +
 tools/perf/builtin-inject.c              | 1043 +++++++++++++++++++++-
 2 files changed, 1059 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-inject.txt b/tools/perf/Documentation/perf-inject.txt
index 0570a1ccd344..646aa31586ed 100644
--- a/tools/perf/Documentation/perf-inject.txt
+++ b/tools/perf/Documentation/perf-inject.txt
@@ -85,6 +85,23 @@ include::itrace.txt[]
 	without updating it. Currently this option is supported only by
 	Intel PT, refer linkperf:perf-intel-pt[1]
 
+--guest-data=<path>,<pid>[,<time offset>[,<time scale>]]::
+	Insert events from a perf.data file recorded in a virtual machine at
+	the same time as the input perf.data file was recorded on the host.
+	The Process ID (PID) of the QEMU hypervisor process must be provided,
+	and the time offset and time scale (multiplier) will likely be needed
+	to convert guest time stamps into host time stamps. For example, for
+	x86 the TSC Offset and Multiplier could be provided for a virtual machine
+	using Linux command line option no-kvmclock.
+	Currently only mmap, mmap2, comm, task, context_switch, ksymbol,
+	and text_poke events are inserted, as well as build ID information.
+	The QEMU option -name debug-threads=on is needed so that thread names
+	can be used to determine which thread is running which VCPU. Note
+	libvirt seems to use this by default.
+	When using perf record in the guest, option --sample-identifier
+	should be used, and also --buildid-all and --switch-events may be
+	useful.
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-archive[1],
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index c800911f68e7..fd4547bb75f7 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -26,6 +26,7 @@
 #include "util/thread.h"
 #include "util/namespaces.h"
 #include "util/util.h"
+#include "util/tsc.h"
 
 #include <internal/lib.h>
 
@@ -35,8 +36,70 @@
 
 #include <linux/list.h>
 #include <linux/string.h>
+#include <linux/zalloc.h>
+#include <linux/hash.h>
 #include <errno.h>
 #include <signal.h>
+#include <inttypes.h>
+
+struct guest_event {
+	struct perf_sample		sample;
+	union perf_event		*event;
+	char				event_buf[PERF_SAMPLE_MAX_SIZE];
+};
+
+struct guest_id {
+	/* hlist_node must be first, see free_hlist() */
+	struct hlist_node		node;
+	u64				id;
+	u64				host_id;
+	u32				vcpu;
+};
+
+struct guest_tid {
+	/* hlist_node must be first, see free_hlist() */
+	struct hlist_node		node;
+	/* Thread ID of QEMU thread */
+	u32				tid;
+	u32				vcpu;
+};
+
+struct guest_vcpu {
+	/* Current host CPU */
+	u32				cpu;
+	/* Thread ID of QEMU thread */
+	u32				tid;
+};
+
+struct guest_session {
+	char				*perf_data_file;
+	u32				machine_pid;
+	u64				time_offset;
+	double				time_scale;
+	struct perf_tool		tool;
+	struct perf_data		data;
+	struct perf_session		*session;
+	char				*tmp_file_name;
+	int				tmp_fd;
+	struct perf_tsc_conversion	host_tc;
+	struct perf_tsc_conversion	guest_tc;
+	bool				copy_kcore_dir;
+	bool				have_tc;
+	bool				fetched;
+	bool				ready;
+	u16				dflt_id_hdr_size;
+	u64				dflt_id;
+	u64				highest_id;
+	/* Array of guest_vcpu */
+	struct guest_vcpu		*vcpu;
+	size_t				vcpu_cnt;
+	/* Hash table for guest_id */
+	struct hlist_head		heads[PERF_EVLIST__HLIST_SIZE];
+	/* Hash table for guest_tid */
+	struct hlist_head		tids[PERF_EVLIST__HLIST_SIZE];
+	/* Place to stash next guest event */
+	struct guest_event		ev;
+};
 
 struct perf_inject {
 	struct perf_tool	tool;
@@ -59,6 +122,7 @@ struct perf_inject {
 	struct itrace_synth_opts itrace_synth_opts;
 	char			event_copy[PERF_SAMPLE_MAX_SIZE];
 	struct perf_file_section secs[HEADER_FEAT_BITS];
+	struct guest_session	guest_session;
 };
 
 struct event_entry {
@@ -698,6 +762,841 @@ static int perf_inject__sched_stat(struct perf_tool *tool,
 	return perf_event__repipe(tool, event_sw, &sample_sw, machine);
 }
 
+static struct guest_vcpu *guest_session__vcpu(struct guest_session *gs, u32 vcpu)
+{
+	if (realloc_array_as_needed(gs->vcpu, gs->vcpu_cnt, vcpu, NULL))
+		return NULL;
+	return &gs->vcpu[vcpu];
+}
+
+static int guest_session__output_bytes(struct guest_session *gs, void *buf, size_t sz)
+{
+	ssize_t ret = writen(gs->tmp_fd, buf, sz);
+
+	return ret < 0 ? ret : 0;
+}
+
+static int guest_session__repipe(struct perf_tool *tool,
+				 union perf_event *event,
+				 struct perf_sample *sample __maybe_unused,
+				 struct machine *machine __maybe_unused)
+{
+	struct guest_session *gs = container_of(tool, struct guest_session, tool);
+
+	return guest_session__output_bytes(gs, event, event->header.size);
+}
+
+static int guest_session__map_tid(struct guest_session *gs, u32 tid, u32 vcpu)
+{
+	struct guest_tid *guest_tid = zalloc(sizeof(*guest_tid));
+	int hash;
+
+	if (!guest_tid)
+		return -ENOMEM;
+
+	guest_tid->tid = tid;
+	guest_tid->vcpu = vcpu;
+	hash = hash_32(guest_tid->tid, PERF_EVLIST__HLIST_BITS);
+	hlist_add_head(&guest_tid->node, &gs->tids[hash]);
+
+	return 0;
+}
+
+static int host_peek_vm_comms_cb(struct perf_session *session __maybe_unused,
+				 union perf_event *event,
+				 u64 offset __maybe_unused, void *data)
+{
+	struct guest_session *gs = data;
+	unsigned int vcpu;
+	struct guest_vcpu *guest_vcpu;
+	int ret;
+
+	if (event->header.type != PERF_RECORD_COMM ||
+	    event->comm.pid != gs->machine_pid)
+		return 0;
+
+	/*
+	 * The QEMU option -name debug-threads=on causes thread names to be
+	 * formatted as below, although that is not an ABI. libvirt also seems
+	 * to use it by default. We rely on it to map each thread to its VCPU.
+	 */
+	ret = sscanf(event->comm.comm, "CPU %u/KVM", &vcpu);
+	if (ret <= 0)
+		return 0; /* Not a VCPU thread name (or empty comm), so skip it */
+	pr_debug("Found VCPU: tid %u comm %s vcpu %u\n",
+		 event->comm.tid, event->comm.comm, vcpu);
+	if (vcpu > INT_MAX) {
+		pr_err("Invalid VCPU %u\n", vcpu);
+		return -EINVAL;
+	}
+	guest_vcpu = guest_session__vcpu(gs, vcpu);
+	if (!guest_vcpu)
+		return -ENOMEM;
+	if (guest_vcpu->tid && guest_vcpu->tid != event->comm.tid) {
+		pr_err("Fatal error: Two threads found with the same VCPU\n");
+		return -EINVAL;
+	}
+	guest_vcpu->tid = event->comm.tid;
+
+	return guest_session__map_tid(gs, event->comm.tid, vcpu);
+}
+
+static int host_peek_vm_comms(struct perf_session *session, struct guest_session *gs)
+{
+	return perf_session__peek_events(session, session->header.data_offset,
+					 session->header.data_size,
+					 host_peek_vm_comms_cb, gs);
+}
+
+static bool evlist__is_id_used(struct evlist *evlist, u64 id)
+{
+	return evlist__id2sid(evlist, id);
+}
+
+static u64 guest_session__allocate_new_id(struct guest_session *gs, struct evlist *host_evlist)
+{
+	do {
+		gs->highest_id += 1;
+	} while (!gs->highest_id || evlist__is_id_used(host_evlist, gs->highest_id));
+
+	return gs->highest_id;
+}
+
+static int guest_session__map_id(struct guest_session *gs, u64 id, u64 host_id, u32 vcpu)
+{
+	struct guest_id *guest_id = zalloc(sizeof(*guest_id));
+	int hash;
+
+	if (!guest_id)
+		return -ENOMEM;
+
+	guest_id->id = id;
+	guest_id->host_id = host_id;
+	guest_id->vcpu = vcpu;
+	hash = hash_64(guest_id->id, PERF_EVLIST__HLIST_BITS);
+	hlist_add_head(&guest_id->node, &gs->heads[hash]);
+
+	return 0;
+}
+
+static u64 evlist__find_highest_id(struct evlist *evlist)
+{
+	struct evsel *evsel;
+	u64 highest_id = 1;
+
+	evlist__for_each_entry(evlist, evsel) {
+		u32 j;
+
+		for (j = 0; j < evsel->core.ids; j++) {
+			u64 id = evsel->core.id[j];
+
+			if (id > highest_id)
+				highest_id = id;
+		}
+	}
+
+	return highest_id;
+}
+
+static int guest_session__map_ids(struct guest_session *gs, struct evlist *host_evlist)
+{
+	struct evlist *evlist = gs->session->evlist;
+	struct evsel *evsel;
+	int ret;
+
+	evlist__for_each_entry(evlist, evsel) {
+		u32 j;
+
+		for (j = 0; j < evsel->core.ids; j++) {
+			struct perf_sample_id *sid;
+			u64 host_id;
+			u64 id;
+
+			id = evsel->core.id[j];
+			sid = evlist__id2sid(evlist, id);
+			if (!sid || sid->cpu.cpu == -1)
+				continue;
+			host_id = guest_session__allocate_new_id(gs, host_evlist);
+			ret = guest_session__map_id(gs, id, host_id, sid->cpu.cpu);
+			if (ret)
+				return ret;
+		}
+	}
+
+	return 0;
+}
+
+static struct guest_id *guest_session__lookup_id(struct guest_session *gs, u64 id)
+{
+	struct hlist_head *head;
+	struct guest_id *guest_id;
+	int hash;
+
+	hash = hash_64(id, PERF_EVLIST__HLIST_BITS);
+	head = &gs->heads[hash];
+
+	hlist_for_each_entry(guest_id, head, node)
+		if (guest_id->id == id)
+			return guest_id;
+
+	return NULL;
+}
+
+static int process_attr(struct perf_tool *tool, union perf_event *event,
+			struct perf_sample *sample __maybe_unused,
+			struct machine *machine __maybe_unused)
+{
+	struct perf_inject *inject = container_of(tool, struct perf_inject, tool);
+
+	return perf_event__process_attr(tool, event, &inject->session->evlist);
+}
+
+static int guest_session__add_attr(struct guest_session *gs, struct evsel *evsel)
+{
+	struct perf_inject *inject = container_of(gs, struct perf_inject, guest_session);
+	struct perf_event_attr attr = evsel->core.attr;
+	u64 *id_array;
+	u32 *vcpu_array;
+	int ret = -ENOMEM;
+	u32 i;
+
+	id_array = calloc(evsel->core.ids, sizeof(*id_array));
+	if (!id_array)
+		return -ENOMEM;
+
+	vcpu_array = calloc(evsel->core.ids, sizeof(*vcpu_array));
+	if (!vcpu_array)
+		goto out;
+
+	for (i = 0; i < evsel->core.ids; i++) {
+		u64 id = evsel->core.id[i];
+		struct guest_id *guest_id = guest_session__lookup_id(gs, id);
+
+		if (!guest_id) {
+			pr_err("Failed to find guest id %"PRIu64"\n", id);
+			ret = -EINVAL;
+			goto out;
+		}
+		id_array[i] = guest_id->host_id;
+		vcpu_array[i] = guest_id->vcpu;
+	}
+
+	attr.sample_type |= PERF_SAMPLE_IDENTIFIER;
+	attr.exclude_host = 1;
+	attr.exclude_guest = 0;
+
+	ret = perf_event__synthesize_attr(&inject->tool, &attr, evsel->core.ids,
+					  id_array, process_attr);
+	if (ret)
+		pr_err("Failed to add guest attr.\n");
+
+	for (i = 0; i < evsel->core.ids; i++) {
+		struct perf_sample_id *sid;
+		u32 vcpu = vcpu_array[i];
+
+		sid = evlist__id2sid(inject->session->evlist, id_array[i]);
+		/* Guest event is per-thread from the host point of view */
+		sid->cpu.cpu = -1;
+		sid->tid = gs->vcpu[vcpu].tid;
+		sid->machine_pid = gs->machine_pid;
+		sid->vcpu.cpu = vcpu;
+	}
+out:
+	free(vcpu_array);
+	free(id_array);
+	return ret;
+}
+
+static int guest_session__add_attrs(struct guest_session *gs)
+{
+	struct evlist *evlist = gs->session->evlist;
+	struct evsel *evsel;
+	int ret;
+
+	evlist__for_each_entry(evlist, evsel) {
+		ret = guest_session__add_attr(gs, evsel);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int synthesize_id_index(struct perf_inject *inject, size_t new_cnt)
+{
+	struct perf_session *session = inject->session;
+	struct evlist *evlist = session->evlist;
+	struct machine *machine = &session->machines.host;
+	size_t from = evlist->core.nr_entries - new_cnt;
+
+	return __perf_event__synthesize_id_index(&inject->tool, perf_event__repipe,
+						 evlist, machine, from);
+}
+
+static struct guest_tid *guest_session__lookup_tid(struct guest_session *gs, u32 tid)
+{
+	struct hlist_head *head;
+	struct guest_tid *guest_tid;
+	int hash;
+
+	hash = hash_32(tid, PERF_EVLIST__HLIST_BITS);
+	head = &gs->tids[hash];
+
+	hlist_for_each_entry(guest_tid, head, node)
+		if (guest_tid->tid == tid)
+			return guest_tid;
+
+	return NULL;
+}
+
+static bool dso__is_in_kernel_space(struct dso *dso)
+{
+	if (dso__is_vdso(dso))
+		return false;
+
+	return dso__is_kcore(dso) ||
+	       dso->kernel ||
+	       is_kernel_module(dso->long_name, PERF_RECORD_MISC_CPUMODE_UNKNOWN);
+}
+
+static u64 evlist__first_id(struct evlist *evlist)
+{
+	struct evsel *evsel;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel->core.ids)
+			return evsel->core.id[0];
+	}
+	return 0;
+}
+
+static int process_build_id(struct perf_tool *tool,
+			    union perf_event *event,
+			    struct perf_sample *sample __maybe_unused,
+			    struct machine *machine __maybe_unused)
+{
+	struct perf_inject *inject = container_of(tool, struct perf_inject, tool);
+
+	return perf_event__process_build_id(inject->session, event);
+}
+
+static int synthesize_build_id(struct perf_inject *inject, struct dso *dso, pid_t machine_pid)
+{
+	struct machine *machine = perf_session__findnew_machine(inject->session, machine_pid);
+	u8 cpumode = dso__is_in_kernel_space(dso) ?
+			PERF_RECORD_MISC_GUEST_KERNEL :
+			PERF_RECORD_MISC_GUEST_USER;
+
+	if (!machine)
+		return -ENOMEM;
+
+	dso->hit = 1;
+
+	return perf_event__synthesize_build_id(&inject->tool, dso, cpumode,
+					       process_build_id, machine);
+}
+
+static int guest_session__add_build_ids(struct guest_session *gs)
+{
+	struct perf_inject *inject = container_of(gs, struct perf_inject, guest_session);
+	struct machine *machine = &gs->session->machines.host;
+	struct dso *dso;
+	int ret;
+
+	/* Build IDs will be put in the Build ID feature section */
+	perf_header__set_feat(&inject->session->header, HEADER_BUILD_ID);
+
+	dsos__for_each_with_build_id(dso, &machine->dsos.head) {
+		ret = synthesize_build_id(inject, dso, gs->machine_pid);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int guest_session__ksymbol_event(struct perf_tool *tool,
+					union perf_event *event,
+					struct perf_sample *sample __maybe_unused,
+					struct machine *machine __maybe_unused)
+{
+	struct guest_session *gs = container_of(tool, struct guest_session, tool);
+
+	/* Only support out-of-line i.e. no BPF support */
+	if (event->ksymbol.ksym_type != PERF_RECORD_KSYMBOL_TYPE_OOL)
+		return 0;
+
+	return guest_session__output_bytes(gs, event, event->header.size);
+}
+
+static int guest_session__start(struct guest_session *gs, const char *name, bool force)
+{
+	char tmp_file_name[] = "/tmp/perf-inject-guest_session-XXXXXX";
+	struct perf_session *session;
+	int ret;
+
+	/* Only these events will be injected */
+	gs->tool.mmap		= guest_session__repipe;
+	gs->tool.mmap2		= guest_session__repipe;
+	gs->tool.comm		= guest_session__repipe;
+	gs->tool.fork		= guest_session__repipe;
+	gs->tool.exit		= guest_session__repipe;
+	gs->tool.lost		= guest_session__repipe;
+	gs->tool.context_switch	= guest_session__repipe;
+	gs->tool.ksymbol	= guest_session__ksymbol_event;
+	gs->tool.text_poke	= guest_session__repipe;
+	/*
+	 * Processing a build ID creates a struct dso with that build ID. Later,
+	 * all guest dsos are iterated and the build IDs processed into the host
+	 * session where they will be output to the Build ID feature section
+	 * when the perf.data file header is written.
+	 */
+	gs->tool.build_id	= perf_event__process_build_id;
+	/* Process the id index to know what VCPU an ID belongs to */
+	gs->tool.id_index	= perf_event__process_id_index;
+
+	gs->tool.ordered_events	= true;
+	gs->tool.ordering_requires_timestamps = true;
+
+	gs->data.path	= name;
+	gs->data.force	= force;
+	gs->data.mode	= PERF_DATA_MODE_READ;
+
+	session = perf_session__new(&gs->data, &gs->tool);
+	if (IS_ERR(session))
+		return PTR_ERR(session);
+	gs->session = session;
+
+	/*
+	 * Initial events have zero'd ID samples. Get default ID sample size
+	 * used for removing them.
+	 */
+	gs->dflt_id_hdr_size = session->machines.host.id_hdr_size;
+	/* And default ID for adding back a host-compatible ID sample */
+	gs->dflt_id = evlist__first_id(session->evlist);
+	if (!gs->dflt_id) {
+		pr_err("Guest data has no sample IDs\n");
+		return -EINVAL;
+	}
+
+	/* Temporary file for guest events */
+	gs->tmp_file_name = strdup(tmp_file_name);
+	if (!gs->tmp_file_name)
+		return -ENOMEM;
+	gs->tmp_fd = mkstemp(gs->tmp_file_name);
+	if (gs->tmp_fd < 0)
+		return -errno;
+
+	if (zstd_init(&gs->session->zstd_data, 0) < 0)
+		pr_warning("Guest session decompression initialization failed.\n");
+
+	/*
+	 * perf does not support processing 2 sessions simultaneously, so output
+	 * guest events to a temporary file.
+	 */
+	ret = perf_session__process_events(gs->session);
+	if (ret)
+		return ret;
+
+	if (lseek(gs->tmp_fd, 0, SEEK_SET))
+		return -errno;
+
+	return 0;
+}
+
+/* Free hlist nodes assuming hlist_node is the first member of hlist entries */
+static void free_hlist(struct hlist_head *heads, size_t hlist_sz)
+{
+	struct hlist_node *pos, *n;
+	size_t i;
+
+	for (i = 0; i < hlist_sz; ++i) {
+		hlist_for_each_safe(pos, n, &heads[i]) {
+			hlist_del(pos);
+			free(pos);
+		}
+	}
+}
+
+static void guest_session__exit(struct guest_session *gs)
+{
+	if (gs->session) {
+		perf_session__delete(gs->session);
+		free_hlist(gs->heads, PERF_EVLIST__HLIST_SIZE);
+		free_hlist(gs->tids, PERF_EVLIST__HLIST_SIZE);
+	}
+	if (gs->tmp_file_name) {
+		if (gs->tmp_fd >= 0)
+			close(gs->tmp_fd);
+		unlink(gs->tmp_file_name);
+		free(gs->tmp_file_name);
+	}
+	free(gs->vcpu);
+	free(gs->perf_data_file);
+}
+
+static void get_tsc_conv(struct perf_tsc_conversion *tc, struct perf_record_time_conv *time_conv)
+{
+	tc->time_shift		= time_conv->time_shift;
+	tc->time_mult		= time_conv->time_mult;
+	tc->time_zero		= time_conv->time_zero;
+	tc->time_cycles		= time_conv->time_cycles;
+	tc->time_mask		= time_conv->time_mask;
+	tc->cap_user_time_zero	= time_conv->cap_user_time_zero;
+	tc->cap_user_time_short	= time_conv->cap_user_time_short;
+}
+
+static void guest_session__get_tc(struct guest_session *gs)
+{
+	struct perf_inject *inject = container_of(gs, struct perf_inject, guest_session);
+
+	get_tsc_conv(&gs->host_tc, &inject->session->time_conv);
+	get_tsc_conv(&gs->guest_tc, &gs->session->time_conv);
+}
+
+static void guest_session__convert_time(struct guest_session *gs, u64 guest_time, u64 *host_time)
+{
+	u64 tsc;
+
+	if (!guest_time) {
+		*host_time = 0;
+		return;
+	}
+
+	if (gs->guest_tc.cap_user_time_zero)
+		tsc = perf_time_to_tsc(guest_time, &gs->guest_tc);
+	else
+		tsc = guest_time;
+
+	/*
+	 * This is the correct order of operations for x86 if the TSC Offset and
+	 * Multiplier values are used.
+	 */
+	tsc -= gs->time_offset;
+	tsc /= gs->time_scale;
+
+	if (gs->host_tc.cap_user_time_zero)
+		*host_time = tsc_to_perf_time(tsc, &gs->host_tc);
+	else
+		*host_time = tsc;
+}
+
+static int guest_session__fetch(struct guest_session *gs)
+{
+	void *buf = gs->ev.event_buf;
+	struct perf_event_header *hdr = buf;
+	size_t hdr_sz = sizeof(*hdr);
+	ssize_t ret;
+
+	ret = readn(gs->tmp_fd, buf, hdr_sz);
+	if (ret < 0)
+		return ret;
+
+	if (!ret) {
+		/* Zero size means EOF */
+		hdr->size = 0;
+		return 0;
+	}
+
+	buf += hdr_sz;
+
+	ret = readn(gs->tmp_fd, buf, hdr->size - hdr_sz);
+	if (ret < 0)
+		return ret;
+
+	gs->ev.event = (union perf_event *)gs->ev.event_buf;
+	gs->ev.sample.time = 0;
+
+	if (hdr->type >= PERF_RECORD_USER_TYPE_START) {
+		pr_err("Unexpected type fetching guest event\n");
+		return 0;
+	}
+
+	ret = evlist__parse_sample(gs->session->evlist, gs->ev.event, &gs->ev.sample);
+	if (ret) {
+		pr_err("Parse failed fetching guest event\n");
+		return ret;
+	}
+
+	if (!gs->have_tc) {
+		guest_session__get_tc(gs);
+		gs->have_tc = true;
+	}
+
+	guest_session__convert_time(gs, gs->ev.sample.time, &gs->ev.sample.time);
+
+	return 0;
+}
+
+static int evlist__append_id_sample(struct evlist *evlist, union perf_event *ev,
+				    const struct perf_sample *sample)
+{
+	struct evsel *evsel;
+	void *array;
+	int ret;
+
+	evsel = evlist__id2evsel(evlist, sample->id);
+	array = ev;
+
+	if (!evsel) {
+		pr_err("No evsel for id %"PRIu64"\n", sample->id);
+		return -EINVAL;
+	}
+
+	array += ev->header.size;
+	ret = perf_event__synthesize_id_sample(array, evsel->core.attr.sample_type, sample);
+	if (ret < 0)
+		return ret;
+
+	if (ret & 7) {
+		pr_err("Bad id sample size %d\n", ret);
+		return -EINVAL;
+	}
+
+	ev->header.size += ret;
+
+	return 0;
+}
+
+static int guest_session__inject_events(struct guest_session *gs, u64 timestamp)
+{
+	struct perf_inject *inject = container_of(gs, struct perf_inject, guest_session);
+	int ret;
+
+	if (!gs->ready)
+		return 0;
+
+	while (1) {
+		struct perf_sample *sample;
+		struct guest_id *guest_id;
+		union perf_event *ev;
+		u16 id_hdr_size;
+		u8 cpumode;
+		u64 id;
+
+		if (!gs->fetched) {
+			ret = guest_session__fetch(gs);
+			if (ret)
+				return ret;
+			gs->fetched = true;
+		}
+
+		ev = gs->ev.event;
+		sample = &gs->ev.sample;
+
+		if (!ev->header.size)
+			return 0; /* EOF */
+
+		if (sample->time > timestamp)
+			return 0;
+
+		/* Change cpumode to guest */
+		cpumode = ev->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+		if (cpumode & PERF_RECORD_MISC_USER)
+			cpumode = PERF_RECORD_MISC_GUEST_USER;
+		else
+			cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
+		ev->header.misc &= ~PERF_RECORD_MISC_CPUMODE_MASK;
+		ev->header.misc |= cpumode;
+
+		id = sample->id;
+		if (!id) {
+			id = gs->dflt_id;
+			id_hdr_size = gs->dflt_id_hdr_size;
+		} else {
+			struct evsel *evsel = evlist__id2evsel(gs->session->evlist, id);
+
+			id_hdr_size = evsel__id_hdr_size(evsel);
+		}
+
+		if (id_hdr_size & 7) {
+			pr_err("Bad id_hdr_size %u\n", id_hdr_size);
+			return -EINVAL;
+		}
+
+		if (ev->header.size & 7) {
+			pr_err("Bad event size %u\n", ev->header.size);
+			return -EINVAL;
+		}
+
+		/* Remove guest id sample */
+		ev->header.size -= id_hdr_size;
+
+		if (ev->header.size & 7) {
+			pr_err("Bad raw event size %u\n", ev->header.size);
+			return -EINVAL;
+		}
+
+		guest_id = guest_session__lookup_id(gs, id);
+		if (!guest_id) {
+			pr_err("Guest event with unknown id %llu\n",
+			       (unsigned long long)id);
+			return -EINVAL;
+		}
+
+		/* Change to host ID to avoid conflicting ID values */
+		sample->id = guest_id->host_id;
+		sample->stream_id = guest_id->host_id;
+
+		if (sample->cpu != (u32)-1) {
+			if (sample->cpu >= gs->vcpu_cnt) {
+				pr_err("Guest event with unknown VCPU %u\n",
+				       sample->cpu);
+				return -EINVAL;
+			}
+			/* Change to host CPU instead of guest VCPU */
+			sample->cpu = gs->vcpu[sample->cpu].cpu;
+		}
+
+		/* New id sample with new ID and CPU */
+		ret = evlist__append_id_sample(inject->session->evlist, ev, sample);
+		if (ret)
+			return ret;
+
+		if (ev->header.size & 7) {
+			pr_err("Bad new event size %u\n", ev->header.size);
+			return -EINVAL;
+		}
+
+		gs->fetched = false;
+
+		ret = output_bytes(inject, ev, ev->header.size);
+		if (ret)
+			return ret;
+	}
+}
+
+static int guest_session__flush_events(struct guest_session *gs)
+{
+	return guest_session__inject_events(gs, -1);
+}
+
+static int host__repipe(struct perf_tool *tool,
+			union perf_event *event,
+			struct perf_sample *sample,
+			struct machine *machine)
+{
+	struct perf_inject *inject = container_of(tool, struct perf_inject, tool);
+	int ret;
+
+	ret = guest_session__inject_events(&inject->guest_session, sample->time);
+	if (ret)
+		return ret;
+
+	return perf_event__repipe(tool, event, sample, machine);
+}
+
+static int host__finished_init(struct perf_session *session, union perf_event *event)
+{
+	struct perf_inject *inject = container_of(session->tool, struct perf_inject, tool);
+	struct guest_session *gs = &inject->guest_session;
+	int ret;
+
+	/*
+	 * Peek through host COMM events to find QEMU threads and the VCPU they
+	 * are running.
+	 */
+	ret = host_peek_vm_comms(session, gs);
+	if (ret)
+		return ret;
+
+	if (!gs->vcpu_cnt) {
+		pr_err("No VCPU threads found for pid %u\n", gs->machine_pid);
+		return -EINVAL;
+	}
+
+	/*
+	 * Allocate new (unused) host sample IDs and map them to the guest IDs.
+	 */
+	gs->highest_id = evlist__find_highest_id(session->evlist);
+	ret = guest_session__map_ids(gs, session->evlist);
+	if (ret)
+		return ret;
+
+	ret = guest_session__add_attrs(gs);
+	if (ret)
+		return ret;
+
+	ret = synthesize_id_index(inject, gs->session->evlist->core.nr_entries);
+	if (ret) {
+		pr_err("Failed to synthesize id_index\n");
+		return ret;
+	}
+
+	ret = guest_session__add_build_ids(gs);
+	if (ret) {
+		pr_err("Failed to add guest build IDs\n");
+		return ret;
+	}
+
+	gs->ready = true;
+
+	ret = guest_session__inject_events(gs, 0);
+	if (ret)
+		return ret;
+
+	return perf_event__repipe_op2_synth(session, event);
+}
+
+/*
+ * Obey finished-round ordering. The FINISHED_ROUND event is processed first,
+ * which flushes host events to file up until the last flush time. Then guest
+ * events are injected up to the same time. Finally the FINISHED_ROUND event
+ * itself is written out.
+ */
+static int host__finished_round(struct perf_tool *tool,
+				union perf_event *event,
+				struct ordered_events *oe)
+{
+	struct perf_inject *inject = container_of(tool, struct perf_inject, tool);
+	int ret = perf_event__process_finished_round(tool, event, oe);
+	u64 timestamp = ordered_events__last_flush_time(oe);
+
+	if (ret)
+		return ret;
+
+	ret = guest_session__inject_events(&inject->guest_session, timestamp);
+	if (ret)
+		return ret;
+
+	return perf_event__repipe_oe_synth(tool, event, oe);
+}
+
+static int host__context_switch(struct perf_tool *tool,
+				union perf_event *event,
+				struct perf_sample *sample,
+				struct machine *machine)
+{
+	struct perf_inject *inject = container_of(tool, struct perf_inject, tool);
+	bool out = event->header.misc & PERF_RECORD_MISC_SWITCH_OUT;
+	struct guest_session *gs = &inject->guest_session;
+	u32 pid = event->context_switch.next_prev_pid;
+	u32 tid = event->context_switch.next_prev_tid;
+	struct guest_tid *guest_tid;
+	u32 vcpu;
+
+	if (out || pid != gs->machine_pid)
+		goto out;
+
+	guest_tid = guest_session__lookup_tid(gs, tid);
+	if (!guest_tid)
+		goto out;
+
+	if (sample->cpu == (u32)-1) {
+		pr_err("Switch event does not have CPU\n");
+		return -EINVAL;
+	}
+
+	vcpu = guest_tid->vcpu;
+	if (vcpu >= gs->vcpu_cnt)
+		return -EINVAL;
+
+	/* Guest is switching in, record which CPU the VCPU is now running on */
+	gs->vcpu[vcpu].cpu = sample->cpu;
+out:
+	return host__repipe(tool, event, sample, machine);
+}
+
 static void sig_handler(int sig __maybe_unused)
 {
 	session_done = 1;
@@ -767,6 +1666,61 @@ static int parse_vm_time_correlation(const struct option *opt, const char *str,
 	return inject->itrace_synth_opts.vm_tm_corr_args ? 0 : -ENOMEM;
 }
 
+static int parse_guest_data(const struct option *opt, const char *str, int unset)
+{
+	struct perf_inject *inject = opt->value;
+	struct guest_session *gs = &inject->guest_session;
+	char *tok;
+	char *s;
+
+	if (unset)
+		return 0;
+
+	if (!str)
+		goto bad_args;
+
+	s = strdup(str);
+	if (!s)
+		return -ENOMEM;
+
+	gs->perf_data_file = strsep(&s, ",");
+	if (!gs->perf_data_file)
+		goto bad_args;
+
+	gs->copy_kcore_dir = has_kcore_dir(gs->perf_data_file);
+	if (gs->copy_kcore_dir)
+		inject->output.is_dir = true;
+
+	tok = strsep(&s, ",");
+	if (!tok)
+		goto bad_args;
+	gs->machine_pid = strtoul(tok, NULL, 0);
+	if (!gs->machine_pid)
+		goto bad_args;
+
+	gs->time_scale = 1;
+
+	tok = strsep(&s, ",");
+	if (!tok)
+		goto out;
+	gs->time_offset = strtoull(tok, NULL, 0);
+
+	tok = strsep(&s, ",");
+	if (!tok)
+		goto out;
+	gs->time_scale = strtod(tok, NULL);
+	if (!gs->time_scale)
+		goto bad_args;
+out:
+	return 0;
+
+bad_args:
+	pr_err("--guest-data option requires guest perf.data file name, "
+	       "guest machine PID, and optionally guest timestamp offset, "
+	       "and guest timestamp scale factor, separated by commas.\n");
+	return -1;
+}
+
 static int save_section_info_cb(struct perf_file_section *section,
 				struct perf_header *ph __maybe_unused,
 				int feat, int fd __maybe_unused, void *data)
@@ -896,6 +1850,22 @@ static int copy_kcore_dir(struct perf_inject *inject)
 	return ret;
 }
 
+static int guest_session__copy_kcore_dir(struct guest_session *gs)
+{
+	struct perf_inject *inject = container_of(gs, struct perf_inject, guest_session);
+	char *cmd;
+	int ret;
+
+	ret = asprintf(&cmd, "cp -r -n %s/kcore_dir %s/kcore_dir__%u >/dev/null 2>&1",
+		       gs->perf_data_file, inject->output.path, gs->machine_pid);
+	if (ret < 0)
+		return ret;
+	pr_debug("%s\n", cmd);
+	ret = system(cmd);
+	free(cmd);
+	return ret;
+}
+
 static int output_fd(struct perf_inject *inject)
 {
 	return inject->in_place_update ? -1 : perf_data__fd(&inject->output);
@@ -904,6 +1874,7 @@ static int output_fd(struct perf_inject *inject)
 static int __cmd_inject(struct perf_inject *inject)
 {
 	int ret = -EINVAL;
+	struct guest_session *gs = &inject->guest_session;
 	struct perf_session *session = inject->session;
 	int fd = output_fd(inject);
 	u64 output_data_offset;
@@ -968,6 +1939,47 @@ static int __cmd_inject(struct perf_inject *inject)
 		output_data_offset = roundup(8192 + session->header.data_offset, 4096);
 		if (inject->strip)
 			strip_init(inject);
+	} else if (gs->perf_data_file) {
+		char *name = gs->perf_data_file;
+
+		/*
+		 * Not strictly necessary, but keep these events in order wrt
+		 * guest events.
+		 */
+		inject->tool.mmap		= host__repipe;
+		inject->tool.mmap2		= host__repipe;
+		inject->tool.comm		= host__repipe;
+		inject->tool.fork		= host__repipe;
+		inject->tool.exit		= host__repipe;
+		inject->tool.lost		= host__repipe;
+		inject->tool.context_switch	= host__repipe;
+		inject->tool.ksymbol		= host__repipe;
+		inject->tool.text_poke		= host__repipe;
+		/*
+		 * Once the host session has initialized, set up sample ID
+		 * mapping and feed in guest attrs, build IDs and initial
+		 * events.
+		 */
+		inject->tool.finished_init	= host__finished_init;
+		/* Obey finished round ordering */
+		inject->tool.finished_round	= host__finished_round;
+		/* Keep track of which CPU a VCPU is running on */
+		inject->tool.context_switch	= host__context_switch;
+		/*
+		 * Must order events to be able to obey finished round
+		 * ordering.
+		 */
+		inject->tool.ordered_events	= true;
+		inject->tool.ordering_requires_timestamps = true;
+		/* Set up a separate session to process guest perf.data file */
+		ret = guest_session__start(gs, name, session->data->force);
+		if (ret) {
+			pr_err("Failed to process %s, error %d\n", name, ret);
+			return ret;
+		}
+		/* Allow space in the header for guest attributes */
+		output_data_offset += gs->session->header.data_offset;
+		output_data_offset = roundup(output_data_offset, 4096);
 	}
 
 	if (!inject->itrace_synth_opts.set)
@@ -980,6 +1992,18 @@ static int __cmd_inject(struct perf_inject *inject)
 	if (ret)
 		return ret;
 
+	if (gs->session) {
+		/*
+		 * Remaining guest events have later timestamps. Flush them
+		 * out to file.
+		 */
+		ret = guest_session__flush_events(gs);
+		if (ret) {
+			pr_err("Failed to flush guest events\n");
+			return ret;
+		}
+	}
+
 	if (!inject->is_pipe && !inject->in_place_update) {
 		struct inject_fc inj_fc = {
 			.fc.copy = feat_copy_cb,
@@ -1014,8 +2038,17 @@ static int __cmd_inject(struct perf_inject *inject)
 
 		if (inject->copy_kcore_dir) {
 			ret = copy_kcore_dir(inject);
-			if (ret)
+			if (ret) {
+				pr_err("Failed to copy kcore\n");
 				return ret;
+			}
+		}
+		if (gs->copy_kcore_dir) {
+			ret = guest_session__copy_kcore_dir(gs);
+			if (ret) {
+				pr_err("Failed to copy guest kcore\n");
+				return ret;
+			}
 		}
 	}
 
@@ -1113,6 +2146,12 @@ int cmd_inject(int argc, const char **argv)
 		OPT_CALLBACK_OPTARG(0, "vm-time-correlation", &inject, NULL, "opts",
 				    "correlate time between VM guests and the host",
 				    parse_vm_time_correlation),
+		OPT_CALLBACK_OPTARG(0, "guest-data", &inject, NULL, "opts",
+				    "inject events from a guest perf.data file",
+				    parse_guest_data),
+		OPT_STRING(0, "guestmount", &symbol_conf.guestmount, "directory",
+			   "guest mount directory under which every guest os"
+			   " instance has a subdir"),
 		OPT_END()
 	};
 	const char * const inject_usage[] = {
@@ -1243,6 +2282,8 @@ int cmd_inject(int argc, const char **argv)
 
 	ret = __cmd_inject(&inject);
 
+	guest_session__exit(&inject.guest_session);
+
 out_delete:
 	zstd_fini(&(inject.session->zstd_data));
 	perf_session__delete(inject.session);
-- 
2.25.1
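
As an illustration of the offset and scale step in
guest_session__convert_time() above, here is a minimal standalone sketch.
The helper name guest_tsc_to_host_tsc() is hypothetical and not part of the
patch. It assumes both timestamps are already raw TSC values, so the
perf_time_to_tsc() / tsc_to_perf_time() steps are left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): undo the guest TSC offset and
 * scale factor, in the same order as guest_session__convert_time().
 */
static uint64_t guest_tsc_to_host_tsc(uint64_t guest_tsc, uint64_t offset,
				      double scale)
{
	/* The option parser in the patch rejects a zero scale factor */
	assert(scale != 0.0);

	/* A zero timestamp means "no timestamp" and is passed through */
	if (!guest_tsc)
		return 0;

	/* Unsigned subtraction wraps, so a two's-complement offset works */
	guest_tsc -= offset;

	/* Divide via double, then truncate back to an integer TSC value */
	return (uint64_t)((double)guest_tsc / scale);
}
```

With an offset of 100 and a scale of 2.0, a guest TSC of 1100 maps to a host
TSC of 500. Since the division goes through double, very large values
(above 2^53) may lose low-order bits.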

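The thread-name matching in host_peek_vm_comms_cb() can be tried out in
isolation. The sketch below is only an approximation under stated
assumptions: parse_vcpu_comm() is a hypothetical name, and it treats every
non-matching comm as "not a VCPU thread" rather than as an error.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/*
 * Hypothetical helper (not in the patch).
 * Returns 1 and stores the VCPU number if comm looks like "CPU <n>/KVM",
 * 0 if comm does not match, and -1 if the number is out of range.
 */
static int parse_vcpu_comm(const char *comm, unsigned int *vcpu)
{
	unsigned int n;

	assert(comm && vcpu);

	/* sscanf() returns the number of conversions, or EOF on empty input */
	if (sscanf(comm, "CPU %u/KVM", &n) != 1)
		return 0;

	/* Same range check as the patch */
	if (n > INT_MAX)
		return -1;

	*vcpu = n;
	return 1;
}
```

Note that sscanf() counts only conversions, so the trailing "/KVM" literal
does not influence the return value.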

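The ID allocation loop in guest_session__allocate_new_id() skips zero and
any ID already used by the host. A minimal sketch of that loop follows. The
array-based id_is_used() is an assumption standing in for
evlist__is_id_used(), and both function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for evlist__is_id_used(): linear search */
static bool id_is_used(const uint64_t *used, size_t n, uint64_t id)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (used[i] == id)
			return true;
	return false;
}

/* Advance *highest to the next ID that is non-zero and not in used[] */
static uint64_t allocate_new_id(uint64_t *highest, const uint64_t *used,
				size_t n)
{
	assert(highest);

	do {
		*highest += 1;	/* wraps to 0 after UINT64_MAX */
	} while (!*highest || id_is_used(used, n, *highest));

	return *highest;
}
```

Starting from the highest host ID means the loop normally terminates on the
first iteration; the checks only matter after wrap-around.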

* [PATCH 25/35] perf machine: Use realloc_array_as_needed() in machine__set_current_tid()
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (23 preceding siblings ...)
  2022-07-11  9:32 ` [PATCH 24/35] perf inject: Add support for injecting guest sideband events Adrian Hunter
@ 2022-07-11  9:32 ` Adrian Hunter
  2022-07-11  9:32 ` [PATCH 26/35] perf tools: Handle injected guest kernel mmap event Adrian Hunter
                   ` (10 subsequent siblings)
  35 siblings, 0 replies; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:32 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Prepare machine__set_current_tid() for use with guest machines that do
not currently have a machine->env->nr_cpus_avail value by making use of
realloc_array_as_needed().

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/machine.c | 26 +++++++-------------------
 tools/perf/util/machine.h |  1 +
 2 files changed, 8 insertions(+), 19 deletions(-)
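
For readers unfamiliar with the grow-on-demand idea, here is a rough sketch
of what the change relies on. grow_tid_array() is a hypothetical function
written for this note; it is not the realloc_array_as_needed() macro, whose
growth policy and exact semantics may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper: make sure (*arr)[idx] exists, filling any newly
 * added elements with init_val. Returns 0 on success, -1 on failure.
 */
static int grow_tid_array(pid_t **arr, size_t *sz, size_t idx, pid_t init_val)
{
	size_t new_sz, i;
	pid_t *new_arr;

	assert(arr && sz);

	/* Already big enough, nothing to do */
	if (idx < *sz)
		return 0;

	new_sz = idx + 1;
	new_arr = realloc(*arr, new_sz * sizeof(**arr));
	if (!new_arr)
		return -1;	/* the old array is left untouched */

	/* Only the new tail needs initializing */
	for (i = *sz; i < new_sz; i++)
		new_arr[i] = init_val;

	*arr = new_arr;
	*sz = new_sz;
	return 0;
}
```

With this shape, the array size no longer depends on a CPU count being known
up front, which is the property the guest machine case needs.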

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 009061852808..27d1a38f44c3 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -3174,9 +3174,7 @@ int machines__for_each_thread(struct machines *machines,
 
 pid_t machine__get_current_tid(struct machine *machine, int cpu)
 {
-	int nr_cpus = min(machine->env->nr_cpus_avail, MAX_NR_CPUS);
-
-	if (cpu < 0 || cpu >= nr_cpus || !machine->current_tid)
+	if (cpu < 0 || (size_t)cpu >= machine->current_tid_sz)
 		return -1;
 
 	return machine->current_tid[cpu];
@@ -3186,26 +3184,16 @@ int machine__set_current_tid(struct machine *machine, int cpu, pid_t pid,
 			     pid_t tid)
 {
 	struct thread *thread;
-	int nr_cpus = min(machine->env->nr_cpus_avail, MAX_NR_CPUS);
+	const pid_t init_val = -1;
 
 	if (cpu < 0)
 		return -EINVAL;
 
-	if (!machine->current_tid) {
-		int i;
-
-		machine->current_tid = calloc(nr_cpus, sizeof(pid_t));
-		if (!machine->current_tid)
-			return -ENOMEM;
-		for (i = 0; i < nr_cpus; i++)
-			machine->current_tid[i] = -1;
-	}
-
-	if (cpu >= nr_cpus) {
-		pr_err("Requested CPU %d too large. ", cpu);
-		pr_err("Consider raising MAX_NR_CPUS\n");
-		return -EINVAL;
-	}
+	if (realloc_array_as_needed(machine->current_tid,
+				    machine->current_tid_sz,
+				    (unsigned int)cpu,
+				    &init_val))
+		return -ENOMEM;
 
 	machine->current_tid[cpu] = tid;
 
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index d40b23c71420..609e24e329d1 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -57,6 +57,7 @@ struct machine {
 	struct map	  *vmlinux_map;
 	u64		  kernel_start;
 	pid_t		  *current_tid;
+	size_t		  current_tid_sz;
 	union { /* Tool specific area */
 		void	  *priv;
 		u64	  db_id;
-- 
2.25.1



* [PATCH 26/35] perf tools: Handle injected guest kernel mmap event
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (24 preceding siblings ...)
  2022-07-11  9:32 ` [PATCH 25/35] perf machine: Use realloc_array_as_needed() in machine__set_current_tid() Adrian Hunter
@ 2022-07-11  9:32 ` Adrian Hunter
  2022-07-20  1:09   ` Ian Rogers
  2022-07-11  9:32 ` [PATCH 27/35] perf tools: Add perf_event__is_guest() Adrian Hunter
                   ` (9 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:32 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

If a kernel mmap event was recorded inside a guest and injected into a host
perf.data file, then it will match a host mmap_name, not a guest mmap_name;
see machine__set_mmap_name(). So try matching a host mmap_name in that
case.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
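The two-step name match described above can be sketched in isolation as follows. This is a simplified illustration, not the perf code: is_kernel_mmap_name() and prefix_matches() are hypothetical names, and strncmp() is used instead of memcmp() so that short names are handled safely in a standalone program. The guest mmap_name used in the usage note is only an example value.

```c
#include <stdbool.h>
#include <string.h>

/* Compare all but the last character of mmap_name, mirroring the diff. */
static bool prefix_matches(const char *event_name, const char *mmap_name)
{
	return strncmp(event_name, mmap_name, strlen(mmap_name) - 1) == 0;
}

/*
 * First try the machine's own mmap_name. If that fails and the machine is
 * a guest, fall back to the host name, because an event recorded inside
 * the guest carries the name the guest kernel used for itself.
 */
static bool is_kernel_mmap_name(const char *event_name,
				const char *machine_mmap_name,
				bool machine_is_host)
{
	if (prefix_matches(event_name, machine_mmap_name))
		return true;
	if (!machine_is_host)
		return prefix_matches(event_name, "[kernel.kallsyms]");
	return false;
}
```

For example, with a guest machine whose mmap_name is "[guest.kernel.kallsyms.13376]", an injected event named "[kernel.kallsyms]_text" fails the first comparison and matches on the fallback.
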
 tools/perf/util/machine.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 27d1a38f44c3..8f657225fb02 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1742,6 +1742,7 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
 	struct map *map;
 	enum dso_space_type dso_space;
 	bool is_kernel_mmap;
+	const char *mmap_name = machine->mmap_name;
 
 	/* If we have maps from kcore then we do not need or want any others */
 	if (machine__uses_kcore(machine))
@@ -1752,8 +1753,16 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
 	else
 		dso_space = DSO_SPACE__KERNEL_GUEST;
 
-	is_kernel_mmap = memcmp(xm->name, machine->mmap_name,
-				strlen(machine->mmap_name) - 1) == 0;
+	is_kernel_mmap = memcmp(xm->name, mmap_name, strlen(mmap_name) - 1) == 0;
+	if (!is_kernel_mmap && !machine__is_host(machine)) {
+		/*
+		 * If the event was recorded inside the guest and injected into
+		 * the host perf.data file, then it will match a host mmap_name,
+		 * so try that - see machine__set_mmap_name().
+		 */
+		mmap_name = "[kernel.kallsyms]";
+		is_kernel_mmap = memcmp(xm->name, mmap_name, strlen(mmap_name) - 1) == 0;
+	}
 	if (xm->name[0] == '/' ||
 	    (!is_kernel_mmap && xm->name[0] == '[')) {
 		map = machine__addnew_module_map(machine, xm->start,
@@ -1767,7 +1776,7 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
 			dso__set_build_id(map->dso, bid);
 
 	} else if (is_kernel_mmap) {
-		const char *symbol_name = (xm->name + strlen(machine->mmap_name));
+		const char *symbol_name = xm->name + strlen(mmap_name);
 		/*
 		 * Should be there already, from the build-id table in
 		 * the header.
-- 
2.25.1



* [PATCH 27/35] perf tools: Add perf_event__is_guest()
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (25 preceding siblings ...)
  2022-07-11  9:32 ` [PATCH 26/35] perf tools: Handle injected guest kernel mmap event Adrian Hunter
@ 2022-07-11  9:32 ` Adrian Hunter
  2022-07-20  1:11   ` Ian Rogers
  2022-07-11  9:32 ` [PATCH 28/35] perf intel-pt: Remove guest_machine_pid Adrian Hunter
                   ` (8 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:32 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Add a helper function to determine if an event is a guest event.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
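To show what the helpers below test, here is a standalone sketch of the cpumode check. The constants are local copies made for illustration (the real definitions come from the perf_event uapi header, and the values here reflect my understanding of it), and misc_is_guest() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies for illustration; perf uses the PERF_RECORD_MISC_* macros. */
#define SKETCH_MISC_CPUMODE_MASK	(7 << 0)
#define SKETCH_MISC_KERNEL		(1 << 0)
#define SKETCH_MISC_USER		(2 << 0)
#define SKETCH_MISC_GUEST_KERNEL	(4 << 0)
#define SKETCH_MISC_GUEST_USER		(5 << 0)
#define SKETCH_MISC_SWITCH_OUT		(1 << 13)

/*
 * The cpumode is a small enumerated field in the low bits of misc, not a
 * set of independent flags, so it is masked out and compared for equality.
 */
static bool misc_is_guest(uint16_t misc)
{
	uint8_t cpumode = misc & SKETCH_MISC_CPUMODE_MASK;

	return cpumode == SKETCH_MISC_GUEST_KERNEL ||
	       cpumode == SKETCH_MISC_GUEST_USER;
}
```

Other bits in misc, such as the switch-out flag, do not affect the result because they lie outside the mask.
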
 tools/perf/util/event.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index a660f304f83c..a7b0931d5137 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -484,4 +484,25 @@ void arch_perf_synthesize_sample_weight(const struct perf_sample *data, __u64 *a
 const char *arch_perf_header_entry(const char *se_header);
 int arch_support_sort_key(const char *sort_key);
 
+static inline bool perf_event_header__cpumode_is_guest(u8 cpumode)
+{
+	return cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
+	       cpumode == PERF_RECORD_MISC_GUEST_USER;
+}
+
+static inline bool perf_event_header__misc_is_guest(u16 misc)
+{
+	return perf_event_header__cpumode_is_guest(misc & PERF_RECORD_MISC_CPUMODE_MASK);
+}
+
+static inline bool perf_event_header__is_guest(const struct perf_event_header *header)
+{
+	return perf_event_header__misc_is_guest(header->misc);
+}
+
+static inline bool perf_event__is_guest(const union perf_event *event)
+{
+	return perf_event_header__is_guest(&event->header);
+}
+
 #endif /* __PERF_RECORD_H */
-- 
2.25.1



* [PATCH 28/35] perf intel-pt: Remove guest_machine_pid
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (26 preceding siblings ...)
  2022-07-11  9:32 ` [PATCH 27/35] perf tools: Add perf_event__is_guest() Adrian Hunter
@ 2022-07-11  9:32 ` Adrian Hunter
  2022-07-20  1:12   ` Ian Rogers
  2022-07-11  9:32 ` [PATCH 29/35] perf intel-pt: Add some more logging to intel_pt_walk_next_insn() Adrian Hunter
                   ` (7 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:32 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Remove guest_machine_pid because it is not needed.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/intel-pt.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 62b2f375a94d..014f9f73cc49 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -194,7 +194,6 @@ struct intel_pt_queue {
 	struct machine *guest_machine;
 	struct thread *guest_thread;
 	struct thread *unknown_guest_thread;
-	pid_t guest_machine_pid;
 	bool exclude_kernel;
 	bool have_sample;
 	u64 time;
@@ -685,7 +684,7 @@ static int intel_pt_get_guest(struct intel_pt_queue *ptq)
 	struct machine *machine;
 	pid_t pid = ptq->pid <= 0 ? DEFAULT_GUEST_KERNEL_ID : ptq->pid;
 
-	if (ptq->guest_machine && pid == ptq->guest_machine_pid)
+	if (ptq->guest_machine && pid == ptq->guest_machine->pid)
 		return 0;
 
 	ptq->guest_machine = NULL;
@@ -705,7 +704,6 @@ static int intel_pt_get_guest(struct intel_pt_queue *ptq)
 		return -1;
 
 	ptq->guest_machine = machine;
-	ptq->guest_machine_pid = pid;
 
 	return 0;
 }
-- 
2.25.1



* [PATCH 29/35] perf intel-pt: Add some more logging to intel_pt_walk_next_insn()
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (27 preceding siblings ...)
  2022-07-11  9:32 ` [PATCH 28/35] perf intel-pt: Remove guest_machine_pid Adrian Hunter
@ 2022-07-11  9:32 ` Adrian Hunter
  2022-07-20  1:13   ` Ian Rogers
  2022-07-11  9:32 ` [PATCH 30/35] perf intel-pt: Track guest context switches Adrian Hunter
                   ` (6 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:32 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

To aid debugging, add some more logging to intel_pt_walk_next_insn().

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/intel-pt.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 014f9f73cc49..a8798b5bb311 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -758,27 +758,38 @@ static int intel_pt_walk_next_insn(struct intel_pt_insn *intel_pt_insn,
 
 	if (nr) {
 		if ((!symbol_conf.guest_code && cpumode != PERF_RECORD_MISC_GUEST_KERNEL) ||
-		    intel_pt_get_guest(ptq))
+		    intel_pt_get_guest(ptq)) {
+			intel_pt_log("ERROR: no guest machine\n");
 			return -EINVAL;
+		}
 		machine = ptq->guest_machine;
 		thread = ptq->guest_thread;
 		if (!thread) {
-			if (cpumode != PERF_RECORD_MISC_GUEST_KERNEL)
+			if (cpumode != PERF_RECORD_MISC_GUEST_KERNEL) {
+				intel_pt_log("ERROR: no guest thread\n");
 				return -EINVAL;
+			}
 			thread = ptq->unknown_guest_thread;
 		}
 	} else {
 		thread = ptq->thread;
 		if (!thread) {
-			if (cpumode != PERF_RECORD_MISC_KERNEL)
+			if (cpumode != PERF_RECORD_MISC_KERNEL) {
+				intel_pt_log("ERROR: no thread\n");
 				return -EINVAL;
+			}
 			thread = ptq->pt->unknown_thread;
 		}
 	}
 
 	while (1) {
-		if (!thread__find_map(thread, cpumode, *ip, &al) || !al.map->dso)
+		if (!thread__find_map(thread, cpumode, *ip, &al) || !al.map->dso) {
+			if (al.map)
+				intel_pt_log("ERROR: thread has no dso for %#" PRIx64 "\n", *ip);
+			else
+				intel_pt_log("ERROR: thread has no map for %#" PRIx64 "\n", *ip);
 			return -EINVAL;
+		}
 
 		if (al.map->dso->data.status == DSO_DATA_STATUS_ERROR &&
 		    dso__data_status_seen(al.map->dso,
@@ -819,8 +830,12 @@ static int intel_pt_walk_next_insn(struct intel_pt_insn *intel_pt_insn,
 			len = dso__data_read_offset(al.map->dso, machine,
 						    offset, buf,
 						    INTEL_PT_INSN_BUF_SZ);
-			if (len <= 0)
+			if (len <= 0) {
+				intel_pt_log("ERROR: failed to read at %" PRIu64 " ", offset);
+				if (intel_pt_enable_logging)
+					dso__fprintf(al.map->dso, intel_pt_log_fp());
 				return -EINVAL;
+			}
 
 			if (intel_pt_get_insn(buf, len, x86_64, intel_pt_insn))
 				return -EINVAL;
-- 
2.25.1



* [PATCH 30/35] perf intel-pt: Track guest context switches
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (28 preceding siblings ...)
  2022-07-11  9:32 ` [PATCH 29/35] perf intel-pt: Add some more logging to intel_pt_walk_next_insn() Adrian Hunter
@ 2022-07-11  9:32 ` Adrian Hunter
  2022-07-20  1:13   ` Ian Rogers
  2022-07-11  9:32 ` [PATCH 31/35] perf intel-pt: Disable sync switch with guest sideband Adrian Hunter
                   ` (5 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:32 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Use guest context switch events to keep track of which guest thread is
running on a particular guest machine and VCPU.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
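The bookkeeping described above amounts to a per-machine table indexed by VCPU. A minimal standalone sketch follows, assuming a fixed number of VCPUs and a flat array of guests; the sketch_* names are hypothetical and stand in for perf's machines and machine structures.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define SKETCH_NR_VCPUS 8

struct sketch_guest {
	pid_t machine_pid;			/* Hypervisor process PID */
	pid_t current_tid[SKETCH_NR_VCPUS];	/* Guest tid per VCPU, -1 if unknown */
};

static struct sketch_guest *sketch_find_guest(struct sketch_guest *guests,
					      size_t nr, pid_t machine_pid)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (guests[i].machine_pid == machine_pid)
			return &guests[i];
	}
	return NULL;
}

/* Only switch-in events update the table; switch-out events are ignored. */
static int sketch_guest_context_switch(struct sketch_guest *guests, size_t nr,
					pid_t machine_pid, int vcpu, pid_t tid,
					bool out)
{
	struct sketch_guest *guest;

	if (out)
		return 0;

	guest = sketch_find_guest(guests, nr, machine_pid);
	if (!guest || vcpu < 0 || vcpu >= SKETCH_NR_VCPUS)
		return -EINVAL;

	guest->current_tid[vcpu] = tid;
	return 0;
}
```

As in the patch, a switch-out event returns before the machine lookup, so it succeeds even when the guest machine is unknown.
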
 tools/perf/util/intel-pt.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index a8798b5bb311..98b097fec476 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -78,6 +78,7 @@ struct intel_pt {
 	bool use_thread_stack;
 	bool callstack;
 	bool cap_event_trace;
+	bool have_guest_sideband;
 	unsigned int br_stack_sz;
 	unsigned int br_stack_sz_plus;
 	int have_sched_switch;
@@ -3079,6 +3080,25 @@ static int intel_pt_context_switch_in(struct intel_pt *pt,
 	return machine__set_current_tid(pt->machine, cpu, pid, tid);
 }
 
+static int intel_pt_guest_context_switch(struct intel_pt *pt,
+					 union perf_event *event,
+					 struct perf_sample *sample)
+{
+	bool out = event->header.misc & PERF_RECORD_MISC_SWITCH_OUT;
+	struct machines *machines = &pt->session->machines;
+	struct machine *machine = machines__find(machines, sample->machine_pid);
+
+	pt->have_guest_sideband = true;
+
+	if (out)
+		return 0;
+
+	if (!machine)
+		return -EINVAL;
+
+	return machine__set_current_tid(machine, sample->vcpu, sample->pid, sample->tid);
+}
+
 static int intel_pt_context_switch(struct intel_pt *pt, union perf_event *event,
 				   struct perf_sample *sample)
 {
@@ -3086,6 +3106,9 @@ static int intel_pt_context_switch(struct intel_pt *pt, union perf_event *event,
 	pid_t pid, tid;
 	int cpu, ret;
 
+	if (perf_event__is_guest(event))
+		return intel_pt_guest_context_switch(pt, event, sample);
+
 	cpu = sample->cpu;
 
 	if (pt->have_sched_switch == 3) {
-- 
2.25.1



* [PATCH 31/35] perf intel-pt: Disable sync switch with guest sideband
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (29 preceding siblings ...)
  2022-07-11  9:32 ` [PATCH 30/35] perf intel-pt: Track guest context switches Adrian Hunter
@ 2022-07-11  9:32 ` Adrian Hunter
  2022-07-20  1:14   ` Ian Rogers
  2022-07-11  9:32 ` [PATCH 32/35] perf intel-pt: Determine guest thread from " Adrian Hunter
                   ` (4 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:32 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

The sync_switch facility attempts to better synchronize context switches
with the Intel PT trace, however it is not designed for guest machine
context switches, so disable it when guest sideband is detected.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
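The interaction between the two flags can be sketched as a small state machine. This is a reduced illustration with hypothetical sketch_* names; it leaves out the per-queue state and the call to intel_pt_next_tid() that the patch performs when disabling.

```c
#include <stdbool.h>

struct sketch_pt {
	bool sync_switch;
	bool sync_switch_not_supported;
};

/* Enabling is refused once guest sideband has been seen. */
static void sketch_enable_sync_switch(struct sketch_pt *pt)
{
	if (pt->sync_switch_not_supported)
		return;
	pt->sync_switch = true;
}

/* The first guest context switch latches "not supported" and disables. */
static void sketch_on_guest_sideband(struct sketch_pt *pt)
{
	pt->sync_switch_not_supported = true;
	pt->sync_switch = false;
}
```

Because sync_switch_not_supported is never cleared, a later attempt to enable sync_switch has no effect.
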
 tools/perf/util/intel-pt.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 98b097fec476..dc2af64f9e31 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -74,6 +74,7 @@ struct intel_pt {
 	bool data_queued;
 	bool est_tsc;
 	bool sync_switch;
+	bool sync_switch_not_supported;
 	bool mispred_all;
 	bool use_thread_stack;
 	bool callstack;
@@ -2638,6 +2639,9 @@ static void intel_pt_enable_sync_switch(struct intel_pt *pt)
 {
 	unsigned int i;
 
+	if (pt->sync_switch_not_supported)
+		return;
+
 	pt->sync_switch = true;
 
 	for (i = 0; i < pt->queues.nr_queues; i++) {
@@ -2649,6 +2653,23 @@ static void intel_pt_enable_sync_switch(struct intel_pt *pt)
 	}
 }
 
+static void intel_pt_disable_sync_switch(struct intel_pt *pt)
+{
+	unsigned int i;
+
+	pt->sync_switch = false;
+
+	for (i = 0; i < pt->queues.nr_queues; i++) {
+		struct auxtrace_queue *queue = &pt->queues.queue_array[i];
+		struct intel_pt_queue *ptq = queue->priv;
+
+		if (ptq) {
+			ptq->sync_switch = false;
+			intel_pt_next_tid(pt, ptq);
+		}
+	}
+}
+
 /*
  * To filter against time ranges, it is only necessary to look at the next start
  * or end time.
@@ -3090,6 +3111,14 @@ static int intel_pt_guest_context_switch(struct intel_pt *pt,
 
 	pt->have_guest_sideband = true;
 
+	/*
+	 * sync_switch cannot handle guest machines at present, so just disable
+	 * it.
+	 */
+	pt->sync_switch_not_supported = true;
+	if (pt->sync_switch)
+		intel_pt_disable_sync_switch(pt);
+
 	if (out)
 		return 0;
 
-- 
2.25.1



* [PATCH 32/35] perf intel-pt: Determine guest thread from guest sideband
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (30 preceding siblings ...)
  2022-07-11  9:32 ` [PATCH 31/35] perf intel-pt: Disable sync switch with guest sideband Adrian Hunter
@ 2022-07-11  9:32 ` Adrian Hunter
  2022-07-20  1:15   ` Ian Rogers
  2022-07-11  9:32 ` [PATCH 33/35] perf intel-pt: Add machine_pid and vcpu to auxtrace_error Adrian Hunter
                   ` (3 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:32 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Prior to decoding, determine what guest thread, if any, is running.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
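The lookup chain in this patch goes from the host pid to the guest machine, from the host thread to a VCPU number, and from (machine, VCPU) to the current guest tid. A simplified standalone sketch of that chain follows. The sketch_* names are hypothetical, and two simplifications are assumptions of the sketch rather than properties of the patch: the result is always reset first, and an unknown tid (-1) is treated as a failed lookup.

```c
#include <stddef.h>
#include <sys/types.h>

#define SKETCH_NR_VCPUS 4

struct sketch_guest {
	pid_t machine_pid;
	pid_t current_tid[SKETCH_NR_VCPUS];	/* -1 if unknown */
};

struct sketch_guest_ctx {
	pid_t machine_pid;
	pid_t tid;
	int vcpu;
};

static void sketch_resolve_guest(const struct sketch_guest *guests, size_t nr,
				 pid_t host_pid, int vcpu,
				 struct sketch_guest_ctx *ctx)
{
	size_t i;

	/* Same "nothing known" values the patch falls back to on failure */
	ctx->machine_pid = 0;
	ctx->tid = -1;
	ctx->vcpu = -1;

	if (host_pid <= 0 || vcpu < 0 || vcpu >= SKETCH_NR_VCPUS)
		return;

	for (i = 0; i < nr; i++) {
		if (guests[i].machine_pid != host_pid)
			continue;
		if (guests[i].current_tid[vcpu] < 0)
			return;	/* No guest thread recorded for this VCPU */
		ctx->machine_pid = host_pid;
		ctx->tid = guests[i].current_tid[vcpu];
		ctx->vcpu = vcpu;
		return;
	}
}
```

The VCPU number is passed in directly here; in the patch it comes from the guest_cpu field of the host thread.
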
 tools/perf/util/intel-pt.c | 69 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 67 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index dc2af64f9e31..a08c2f059d5a 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -196,6 +196,10 @@ struct intel_pt_queue {
 	struct machine *guest_machine;
 	struct thread *guest_thread;
 	struct thread *unknown_guest_thread;
+	pid_t guest_machine_pid;
+	pid_t guest_pid;
+	pid_t guest_tid;
+	int vcpu;
 	bool exclude_kernel;
 	bool have_sample;
 	u64 time;
@@ -759,8 +763,13 @@ static int intel_pt_walk_next_insn(struct intel_pt_insn *intel_pt_insn,
 	cpumode = intel_pt_nr_cpumode(ptq, *ip, nr);
 
 	if (nr) {
-		if ((!symbol_conf.guest_code && cpumode != PERF_RECORD_MISC_GUEST_KERNEL) ||
-		    intel_pt_get_guest(ptq)) {
+		if (ptq->pt->have_guest_sideband) {
+			if (!ptq->guest_machine || ptq->guest_machine_pid != ptq->pid) {
+				intel_pt_log("ERROR: guest sideband but no guest machine\n");
+				return -EINVAL;
+			}
+		} else if ((!symbol_conf.guest_code && cpumode != PERF_RECORD_MISC_GUEST_KERNEL) ||
+			   intel_pt_get_guest(ptq)) {
 			intel_pt_log("ERROR: no guest machine\n");
 			return -EINVAL;
 		}
@@ -1385,6 +1394,55 @@ static void intel_pt_first_timestamp(struct intel_pt *pt, u64 timestamp)
 	}
 }
 
+static int intel_pt_get_guest_from_sideband(struct intel_pt_queue *ptq)
+{
+	struct machines *machines = &ptq->pt->session->machines;
+	struct machine *machine;
+	pid_t machine_pid = ptq->pid;
+	pid_t tid;
+	int vcpu;
+
+	if (machine_pid <= 0)
+		return 0; /* Not a guest machine */
+
+	machine = machines__find(machines, machine_pid);
+	if (!machine)
+		return 0; /* Not a guest machine */
+
+	if (ptq->guest_machine != machine) {
+		ptq->guest_machine = NULL;
+		thread__zput(ptq->guest_thread);
+		thread__zput(ptq->unknown_guest_thread);
+
+		ptq->unknown_guest_thread = machine__find_thread(machine, 0, 0);
+		if (!ptq->unknown_guest_thread)
+			return -1;
+		ptq->guest_machine = machine;
+	}
+
+	vcpu = ptq->thread ? ptq->thread->guest_cpu : -1;
+	if (vcpu < 0)
+		return -1;
+
+	tid = machine__get_current_tid(machine, vcpu);
+
+	if (ptq->guest_thread && ptq->guest_thread->tid != tid)
+		thread__zput(ptq->guest_thread);
+
+	if (!ptq->guest_thread) {
+		ptq->guest_thread = machine__find_thread(machine, -1, tid);
+		if (!ptq->guest_thread)
+			return -1;
+	}
+
+	ptq->guest_machine_pid = machine_pid;
+	ptq->guest_pid = ptq->guest_thread->pid_;
+	ptq->guest_tid = tid;
+	ptq->vcpu = vcpu;
+
+	return 0;
+}
+
 static void intel_pt_set_pid_tid_cpu(struct intel_pt *pt,
 				     struct auxtrace_queue *queue)
 {
@@ -1405,6 +1463,13 @@ static void intel_pt_set_pid_tid_cpu(struct intel_pt *pt,
 		if (queue->cpu == -1)
 			ptq->cpu = ptq->thread->cpu;
 	}
+
+	if (pt->have_guest_sideband && intel_pt_get_guest_from_sideband(ptq)) {
+		ptq->guest_machine_pid = 0;
+		ptq->guest_pid = -1;
+		ptq->guest_tid = -1;
+		ptq->vcpu = -1;
+	}
 }
 
 static void intel_pt_sample_flags(struct intel_pt_queue *ptq)
-- 
2.25.1



* [PATCH 33/35] perf intel-pt: Add machine_pid and vcpu to auxtrace_error
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (31 preceding siblings ...)
  2022-07-11  9:32 ` [PATCH 32/35] perf intel-pt: Determine guest thread from " Adrian Hunter
@ 2022-07-11  9:32 ` Adrian Hunter
  2022-07-20  5:27   ` Ian Rogers
  2022-07-11  9:32 ` [PATCH 34/35] perf intel-pt: Use guest pid/tid etc in guest samples Adrian Hunter
                   ` (2 subsequent siblings)
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:32 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

When decoding with guest sideband information, for VMX non-root (NR)
i.e. guest errors, replace the host (hypervisor) pid/tid with guest values,
and also provide the new machine_pid and vcpu values.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/intel-pt.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index a08c2f059d5a..143a096b567b 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -2404,7 +2404,8 @@ static int intel_pt_synth_iflag_chg_sample(struct intel_pt_queue *ptq)
 }
 
 static int intel_pt_synth_error(struct intel_pt *pt, int code, int cpu,
-				pid_t pid, pid_t tid, u64 ip, u64 timestamp)
+				pid_t pid, pid_t tid, u64 ip, u64 timestamp,
+				pid_t machine_pid, int vcpu)
 {
 	union perf_event event;
 	char msg[MAX_AUXTRACE_ERROR_MSG];
@@ -2421,8 +2422,9 @@ static int intel_pt_synth_error(struct intel_pt *pt, int code, int cpu,
 
 	intel_pt__strerror(code, msg, MAX_AUXTRACE_ERROR_MSG);
 
-	auxtrace_synth_error(&event.auxtrace_error, PERF_AUXTRACE_ERROR_ITRACE,
-			     code, cpu, pid, tid, ip, msg, timestamp);
+	auxtrace_synth_guest_error(&event.auxtrace_error, PERF_AUXTRACE_ERROR_ITRACE,
+				   code, cpu, pid, tid, ip, msg, timestamp,
+				   machine_pid, vcpu);
 
 	err = perf_session__deliver_synth_event(pt->session, &event, NULL);
 	if (err)
@@ -2437,11 +2439,22 @@ static int intel_ptq_synth_error(struct intel_pt_queue *ptq,
 {
 	struct intel_pt *pt = ptq->pt;
 	u64 tm = ptq->timestamp;
+	pid_t machine_pid = 0;
+	pid_t pid = ptq->pid;
+	pid_t tid = ptq->tid;
+	int vcpu = -1;
 
 	tm = pt->timeless_decoding ? 0 : tsc_to_perf_time(tm, &pt->tc);
 
-	return intel_pt_synth_error(pt, state->err, ptq->cpu, ptq->pid,
-				    ptq->tid, state->from_ip, tm);
+	if (pt->have_guest_sideband && state->from_nr) {
+		machine_pid = ptq->guest_machine_pid;
+		vcpu = ptq->vcpu;
+		pid = ptq->guest_pid;
+		tid = ptq->guest_tid;
+	}
+
+	return intel_pt_synth_error(pt, state->err, ptq->cpu, pid, tid,
+				    state->from_ip, tm, machine_pid, vcpu);
 }
 
 static int intel_pt_next_tid(struct intel_pt *pt, struct intel_pt_queue *ptq)
@@ -3028,7 +3041,8 @@ static int intel_pt_process_timeless_sample(struct intel_pt *pt,
 static int intel_pt_lost(struct intel_pt *pt, struct perf_sample *sample)
 {
 	return intel_pt_synth_error(pt, INTEL_PT_ERR_LOST, sample->cpu,
-				    sample->pid, sample->tid, 0, sample->time);
+				    sample->pid, sample->tid, 0, sample->time,
+				    sample->machine_pid, sample->vcpu);
 }
 
 static struct intel_pt_queue *intel_pt_cpu_to_ptq(struct intel_pt *pt, int cpu)
-- 
2.25.1



* [PATCH 34/35] perf intel-pt: Use guest pid/tid etc in guest samples
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (32 preceding siblings ...)
  2022-07-11  9:32 ` [PATCH 33/35] perf intel-pt: Add machine_pid and vcpu to auxtrace_error Adrian Hunter
@ 2022-07-11  9:32 ` Adrian Hunter
  2022-07-20  5:28   ` Ian Rogers
  2022-07-11  9:32 ` [PATCH 35/35] perf intel-pt: Add documentation for tracing guest machine user space Adrian Hunter
  2022-07-18 15:28 ` [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Arnaldo Carvalho de Melo
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:32 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

When decoding with guest sideband information, for VMX non-root (NR)
i.e. guest events, replace the host (hypervisor) pid/tid with guest values,
and also provide the new machine_pid and vcpu values.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
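The selection rule in this patch can be sketched on its own: a sample gets guest identifiers when either end of the branch has a non-zero address that is in VMX non-root mode. The sketch_* names are hypothetical, and passing the host values in as a complete struct (with machine_pid 0 and vcpu -1) is an assumption made to keep the example self-contained.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

struct sketch_state {
	uint64_t from_ip;
	uint64_t to_ip;
	bool from_nr;	/* Branch source is in VMX non-root mode */
	bool to_nr;	/* Branch target is in VMX non-root mode */
};

struct sketch_ids {
	pid_t pid;
	pid_t tid;
	pid_t machine_pid;
	int vcpu;
};

static struct sketch_ids sketch_pick_ids(const struct sketch_state *st,
					 bool have_guest_sideband,
					 struct sketch_ids host,
					 struct sketch_ids guest)
{
	bool guest_branch = (st->from_ip && st->from_nr) ||
			    (st->to_ip && st->to_nr);

	return (have_guest_sideband && guest_branch) ? guest : host;
}
```

An end of the branch whose address is 0 never selects guest identifiers by itself, so of the two halves of a split VM-entry branch, the one leaving the host stays attributed to the host and the one arriving in the guest is attributed to the guest.
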
 tools/perf/util/intel-pt.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 143a096b567b..d5e9fc8106dd 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -1657,6 +1657,17 @@ static void intel_pt_prep_a_sample(struct intel_pt_queue *ptq,
 
 	sample->pid = ptq->pid;
 	sample->tid = ptq->tid;
+
+	if (ptq->pt->have_guest_sideband) {
+		if ((ptq->state->from_ip && ptq->state->from_nr) ||
+		    (ptq->state->to_ip && ptq->state->to_nr)) {
+			sample->pid = ptq->guest_pid;
+			sample->tid = ptq->guest_tid;
+			sample->machine_pid = ptq->guest_machine_pid;
+			sample->vcpu = ptq->vcpu;
+		}
+	}
+
 	sample->cpu = ptq->cpu;
 	sample->insn_len = ptq->insn_len;
 	memcpy(sample->insn, ptq->insn, INTEL_PT_INSN_BUF_SZ);
-- 
2.25.1



* [PATCH 35/35] perf intel-pt: Add documentation for tracing guest machine user space
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (33 preceding siblings ...)
  2022-07-11  9:32 ` [PATCH 34/35] perf intel-pt: Use guest pid/tid etc in guest samples Adrian Hunter
@ 2022-07-11  9:32 ` Adrian Hunter
  2022-07-20  5:29   ` Ian Rogers
  2022-07-18 15:28 ` [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Arnaldo Carvalho de Melo
  35 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-11  9:32 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Now that it is possible to decode a host Intel PT trace including guest
machine user space, add documentation for the steps needed to do it.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
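The documentation added below identifies a TSC Offset and passes it to perf inject. As background, here is a small sketch of the arithmetic behind such an offset, assuming the usual VMX behaviour with TSC offsetting and no TSC scaling: the guest reads the host TSC plus the offset, modulo 2^64. The function names are hypothetical and this is not a description of how perf inject applies the value internally.

```c
#include <stdint.h>

/* Unsigned 64-bit arithmetic wraps, which is what makes a "negative"
 * offset such as 0xfffffa6ae070cb20 work. */
static uint64_t sketch_host_to_guest_tsc(uint64_t host_tsc, uint64_t tsc_offset)
{
	return host_tsc + tsc_offset;
}

static uint64_t sketch_guest_to_host_tsc(uint64_t guest_tsc, uint64_t tsc_offset)
{
	return guest_tsc - tsc_offset;
}
```

With the offset 0xfffffa6ae070cb20 from the example below, a host TSC of 0x0000060000000000 corresponds to a guest TSC of 0x0000006ae070cb20, and the reverse conversion recovers the host value.
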
 tools/perf/Documentation/perf-intel-pt.txt | 181 ++++++++++++++++++++-
 1 file changed, 177 insertions(+), 4 deletions(-)

diff --git a/tools/perf/Documentation/perf-intel-pt.txt b/tools/perf/Documentation/perf-intel-pt.txt
index 238ab9d3cb93..3dc3f0ccbd51 100644
--- a/tools/perf/Documentation/perf-intel-pt.txt
+++ b/tools/perf/Documentation/perf-intel-pt.txt
@@ -267,7 +267,7 @@ Note that, as with all events, the event is suffixed with event modifiers:
 	H	host
 	p	precise ip
 
-'h', 'G' and 'H' are for virtualization which is not supported by Intel PT.
+'h', 'G' and 'H' are for virtualization which are not used by Intel PT.
 'p' is also not relevant to Intel PT.  So only options 'u' and 'k' are
 meaningful for Intel PT.
 
@@ -1218,10 +1218,10 @@ XED
 include::build-xed.txt[]
 
 
-Tracing Virtual Machines
-------------------------
+Tracing Virtual Machines (kernel only)
+--------------------------------------
 
-Currently, only kernel tracing is supported and only with either "timeless" decoding
+Currently, kernel tracing is supported with either "timeless" decoding
 (i.e. no TSC timestamps) or VM Time Correlation. VM Time Correlation is an extra step
 using 'perf inject' and requires unchanging VMX TSC Offset and no VMX TSC Scaling.
 
@@ -1400,6 +1400,179 @@ There were none.
           :17006 17006 [001] 11500.262869216:  ffffffff8220116e error_entry+0xe ([guest.kernel.kallsyms])               pushq  %rax
 
 
+Tracing Virtual Machines (including user space)
+-----------------------------------------------
+
+It is possible to use perf record to record sideband events within a virtual machine, so that an Intel PT trace on the host can be decoded.
+Sideband events from the guest perf.data file can be injected into the host perf.data file using perf inject.
+
+Here is an example of the steps needed:
+
+On the guest machine:
+
+Check that the no-kvmclock kernel command line option was used to boot:
+
+Note, this is essential to enable time correlation between host and guest machines.
+
+ $ cat /proc/cmdline
+ BOOT_IMAGE=/boot/vmlinuz-5.10.0-16-amd64 root=UUID=cb49c910-e573-47e0-bce7-79e293df8e1d ro no-kvmclock
+
+There is no BPF support at present so, if possible, disable JIT compiling:
+
+ $ echo 0 | sudo tee /proc/sys/net/core/bpf_jit_enable
+ 0
+
+Start perf record to collect sideband events:
+
+ $ sudo perf record -o guest-sideband-testing-guest-perf.data --sample-identifier --buildid-all --switch-events --kcore -a -e dummy
+
+On the host machine:
+
+Start perf record to collect Intel PT trace:
+
+Note, the host trace will get very big, very fast, so the steps from starting to stopping the host trace should be completed in as short a time as possible.
+
+ $ sudo perf record -o guest-sideband-testing-host-perf.data -m,64M --kcore -a -e intel_pt/cyc/
+
+On the guest machine:
+
+Run a small test case, just 'uname' in this example:
+
+ $ uname
+ Linux
+
+On the host machine:
+
+Stop the Intel PT trace:
+
+ ^C
+ [ perf record: Woken up 1 times to write data ]
+ [ perf record: Captured and wrote 76.122 MB guest-sideband-testing-host-perf.data ]
+
+On the guest machine:
+
+Stop the perf record that is collecting sideband events:
+
+ ^C
+ [ perf record: Woken up 1 times to write data ]
+ [ perf record: Captured and wrote 1.247 MB guest-sideband-testing-guest-perf.data ]
+
+And then copy guest-sideband-testing-guest-perf.data to the host (not shown here).
+
+On the host machine:
+
+The following steps assume that both perf.data recordings are on the host and that their ownership has been changed to the user.
+
+Identify the TSC Offset:
+
+ $ perf inject -i guest-sideband-testing-host-perf.data --vm-time-correlation=dry-run
+ VMCS: 0x103fc6  TSC Offset 0xfffffa6ae070cb20
+ VMCS: 0x103ff2  TSC Offset 0xfffffa6ae070cb20
+ VMCS: 0x10fdaa  TSC Offset 0xfffffa6ae070cb20
+ VMCS: 0x24d57c  TSC Offset 0xfffffa6ae070cb20
+
+Correct Intel PT TSC timestamps for the guest machine:
+
+ $ perf inject -i guest-sideband-testing-host-perf.data --vm-time-correlation=0xfffffa6ae070cb20 --force
+
+Identify the guest machine PID:
+
+ $ perf script -i guest-sideband-testing-host-perf.data --no-itrace --show-task-events | grep KVM
+       CPU 0/KVM     0 [000]     0.000000: PERF_RECORD_COMM: CPU 0/KVM:13376/13381
+       CPU 1/KVM     0 [000]     0.000000: PERF_RECORD_COMM: CPU 1/KVM:13376/13382
+       CPU 2/KVM     0 [000]     0.000000: PERF_RECORD_COMM: CPU 2/KVM:13376/13383
+       CPU 3/KVM     0 [000]     0.000000: PERF_RECORD_COMM: CPU 3/KVM:13376/13384
+
+Note, the QEMU option -name debug-threads=on is needed so that thread names
+can be used to determine which thread is running which VCPU as above. libvirt seems to use this by default.
+
+Create a guestmount, assuming the guest machine is 'vm_to_test':
+
+ $ mkdir -p ~/guestmount/13376
+ $ sshfs -o direct_io vm_to_test:/ ~/guestmount/13376
+
+Inject the guest perf.data file into the host perf.data file:
+
+Note, due to the guestmount option, guest object files and debug files will be copied into the build ID cache from the guest machine, with the notable exception of VDSO.
+If needed, VDSO can be copied manually in a fashion similar to that used by the perf-archive script.
+
+ $ perf inject -i guest-sideband-testing-host-perf.data -o inj --guestmount ~/guestmount --guest-data=guest-sideband-testing-guest-perf.data,13376,0xfffffa6ae070cb20
+
+Show an excerpt from the result.  In this case the CPU and time range have been chosen to show interaction between guest and host when 'uname' is starting to run on the guest machine:
+
+Notes:
+
+	- the CPU displayed, [002] in this case, is always the host CPU
+	- events happening in the virtual machine start with VM:13376 VCPU:003, which shows the hypervisor PID 13376 and the VCPU number
+	- only calls and errors are displayed i.e. --itrace=ce
+	- branches entering and exiting the virtual machine are split, and show as 2 branches to/from "0 [unknown] ([unknown])"
+
+ $ perf script -i inj --itrace=ce -F+machine_pid,+vcpu,+addr,+pid,+tid,-period --ns --time 7919.408803365,7919.408804631 -C 2
+       CPU 3/KVM 13376/13384 [002]  7919.408803365:      branches:  ffffffffc0f8ebe0 vmx_vcpu_enter_exit+0xc0 ([kernel.kallsyms]) => ffffffffc0f8edc0 __vmx_vcpu_run+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803365:      branches:  ffffffffc0f8edd5 __vmx_vcpu_run+0x15 ([kernel.kallsyms]) => ffffffffc0f8eca0 vmx_update_host_rsp+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803365:      branches:  ffffffffc0f8ee1b __vmx_vcpu_run+0x5b ([kernel.kallsyms]) => ffffffffc0f8ed60 vmx_vmenter+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803461:      branches:  ffffffffc0f8ed62 vmx_vmenter+0x2 ([kernel.kallsyms]) =>                0 [unknown] ([unknown])
+ VM:13376 VCPU:003            uname  3404/3404  [002]  7919.408803461:      branches:                 0 [unknown] ([unknown]) =>     7f851c9b5a5c init_cacheinfo+0x3ac (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
+ VM:13376 VCPU:003            uname  3404/3404  [002]  7919.408803567:      branches:      7f851c9b5a5a init_cacheinfo+0x3aa (/usr/lib/x86_64-linux-gnu/libc-2.31.so) =>                0 [unknown] ([unknown])
+       CPU 3/KVM 13376/13384 [002]  7919.408803567:      branches:                 0 [unknown] ([unknown]) => ffffffffc0f8ed80 vmx_vmexit+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803596:      branches:  ffffffffc0f6619a vmx_vcpu_run+0x26a ([kernel.kallsyms]) => ffffffffb2255c60 x86_virt_spec_ctrl+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803801:      branches:  ffffffffc0f66445 vmx_vcpu_run+0x515 ([kernel.kallsyms]) => ffffffffb2290b30 native_write_msr+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803850:      branches:  ffffffffc0f661f8 vmx_vcpu_run+0x2c8 ([kernel.kallsyms]) => ffffffffc1092300 kvm_load_host_xsave_state+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803850:      branches:  ffffffffc1092327 kvm_load_host_xsave_state+0x27 ([kernel.kallsyms]) => ffffffffc1092220 kvm_load_host_xsave_state.part.0+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803862:      branches:  ffffffffc0f662cf vmx_vcpu_run+0x39f ([kernel.kallsyms]) => ffffffffc0f63f90 vmx_recover_nmi_blocking+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803862:      branches:  ffffffffc0f662e9 vmx_vcpu_run+0x3b9 ([kernel.kallsyms]) => ffffffffc0f619a0 __vmx_complete_interrupts+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803872:      branches:  ffffffffc109cfb2 vcpu_enter_guest+0x752 ([kernel.kallsyms]) => ffffffffc0f5f570 vmx_handle_exit_irqoff+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803881:      branches:  ffffffffc109d028 vcpu_enter_guest+0x7c8 ([kernel.kallsyms]) => ffffffffb234f900 __srcu_read_lock+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803897:      branches:  ffffffffc109d06f vcpu_enter_guest+0x80f ([kernel.kallsyms]) => ffffffffc0f72e30 vmx_handle_exit+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803897:      branches:  ffffffffc0f72e3d vmx_handle_exit+0xd ([kernel.kallsyms]) => ffffffffc0f727c0 __vmx_handle_exit+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803897:      branches:  ffffffffc0f72b15 __vmx_handle_exit+0x355 ([kernel.kallsyms]) => ffffffffc0f60ae0 vmx_flush_pml_buffer+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803903:      branches:  ffffffffc0f72994 __vmx_handle_exit+0x1d4 ([kernel.kallsyms]) => ffffffffc10b7090 kvm_emulate_cpuid+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803903:      branches:  ffffffffc10b70f1 kvm_emulate_cpuid+0x61 ([kernel.kallsyms]) => ffffffffc10b6e10 kvm_cpuid+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803941:      branches:  ffffffffc10b7125 kvm_emulate_cpuid+0x95 ([kernel.kallsyms]) => ffffffffc1093110 kvm_skip_emulated_instruction+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803941:      branches:  ffffffffc109311f kvm_skip_emulated_instruction+0xf ([kernel.kallsyms]) => ffffffffc0f5e180 vmx_get_rflags+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803951:      branches:  ffffffffc109312a kvm_skip_emulated_instruction+0x1a ([kernel.kallsyms]) => ffffffffc0f5fd30 vmx_skip_emulated_instruction+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803951:      branches:  ffffffffc0f5fd79 vmx_skip_emulated_instruction+0x49 ([kernel.kallsyms]) => ffffffffc0f5fb50 skip_emulated_instruction+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803956:      branches:  ffffffffc0f5fc68 skip_emulated_instruction+0x118 ([kernel.kallsyms]) => ffffffffc0f6a940 vmx_cache_reg+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803964:      branches:  ffffffffc0f5fc11 skip_emulated_instruction+0xc1 ([kernel.kallsyms]) => ffffffffc0f5f9e0 vmx_set_interrupt_shadow+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803980:      branches:  ffffffffc109f8b1 vcpu_run+0x71 ([kernel.kallsyms]) => ffffffffc10ad2f0 kvm_cpu_has_pending_timer+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803980:      branches:  ffffffffc10ad2fb kvm_cpu_has_pending_timer+0xb ([kernel.kallsyms]) => ffffffffc10b0490 apic_has_pending_timer+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803991:      branches:  ffffffffc109f899 vcpu_run+0x59 ([kernel.kallsyms]) => ffffffffc109c860 vcpu_enter_guest+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803993:      branches:  ffffffffc109cd4c vcpu_enter_guest+0x4ec ([kernel.kallsyms]) => ffffffffc0f69140 vmx_prepare_switch_to_guest+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803996:      branches:  ffffffffc109cd7d vcpu_enter_guest+0x51d ([kernel.kallsyms]) => ffffffffb234f930 __srcu_read_unlock+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803996:      branches:  ffffffffc109cd9c vcpu_enter_guest+0x53c ([kernel.kallsyms]) => ffffffffc0f609b0 vmx_sync_pir_to_irr+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408803996:      branches:  ffffffffc0f60a6d vmx_sync_pir_to_irr+0xbd ([kernel.kallsyms]) => ffffffffc10adc20 kvm_lapic_find_highest_irr+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408804010:      branches:  ffffffffc0f60abd vmx_sync_pir_to_irr+0x10d ([kernel.kallsyms]) => ffffffffc0f60820 vmx_set_rvi+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408804019:      branches:  ffffffffc109ceca vcpu_enter_guest+0x66a ([kernel.kallsyms]) => ffffffffb2249840 fpregs_assert_state_consistent+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408804021:      branches:  ffffffffc109cf10 vcpu_enter_guest+0x6b0 ([kernel.kallsyms]) => ffffffffc0f65f30 vmx_vcpu_run+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408804024:      branches:  ffffffffc0f6603b vmx_vcpu_run+0x10b ([kernel.kallsyms]) => ffffffffb229bed0 __get_current_cr3_fast+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408804024:      branches:  ffffffffc0f66055 vmx_vcpu_run+0x125 ([kernel.kallsyms]) => ffffffffb2253050 cr4_read_shadow+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408804030:      branches:  ffffffffc0f6608d vmx_vcpu_run+0x15d ([kernel.kallsyms]) => ffffffffc10921e0 kvm_load_guest_xsave_state+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408804030:      branches:  ffffffffc1092207 kvm_load_guest_xsave_state+0x27 ([kernel.kallsyms]) => ffffffffc1092110 kvm_load_guest_xsave_state.part.0+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408804032:      branches:  ffffffffc0f660c6 vmx_vcpu_run+0x196 ([kernel.kallsyms]) => ffffffffb22061a0 perf_guest_get_msrs+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408804032:      branches:  ffffffffb22061a9 perf_guest_get_msrs+0x9 ([kernel.kallsyms]) => ffffffffb220cda0 intel_guest_get_msrs+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408804039:      branches:  ffffffffc0f66109 vmx_vcpu_run+0x1d9 ([kernel.kallsyms]) => ffffffffc0f652c0 clear_atomic_switch_msr+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408804040:      branches:  ffffffffc0f66119 vmx_vcpu_run+0x1e9 ([kernel.kallsyms]) => ffffffffc0f73f60 intel_pmu_lbr_is_enabled+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408804042:      branches:  ffffffffc0f73f81 intel_pmu_lbr_is_enabled+0x21 ([kernel.kallsyms]) => ffffffffc10b68e0 kvm_find_cpuid_entry+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408804045:      branches:  ffffffffc0f66454 vmx_vcpu_run+0x524 ([kernel.kallsyms]) => ffffffffc0f61ff0 vmx_update_hv_timer+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408804057:      branches:  ffffffffc0f66142 vmx_vcpu_run+0x212 ([kernel.kallsyms]) => ffffffffc10af100 kvm_wait_lapic_expire+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408804057:      branches:  ffffffffc0f66156 vmx_vcpu_run+0x226 ([kernel.kallsyms]) => ffffffffb2255c60 x86_virt_spec_ctrl+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408804057:      branches:  ffffffffc0f66161 vmx_vcpu_run+0x231 ([kernel.kallsyms]) => ffffffffc0f8eb20 vmx_vcpu_enter_exit+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408804057:      branches:  ffffffffc0f8eb44 vmx_vcpu_enter_exit+0x24 ([kernel.kallsyms]) => ffffffffb2353e10 rcu_note_context_switch+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408804057:      branches:  ffffffffb2353e1c rcu_note_context_switch+0xc ([kernel.kallsyms]) => ffffffffb2353db0 rcu_qs+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408804066:      branches:  ffffffffc0f8ebe0 vmx_vcpu_enter_exit+0xc0 ([kernel.kallsyms]) => ffffffffc0f8edc0 __vmx_vcpu_run+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408804066:      branches:  ffffffffc0f8edd5 __vmx_vcpu_run+0x15 ([kernel.kallsyms]) => ffffffffc0f8eca0 vmx_update_host_rsp+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408804066:      branches:  ffffffffc0f8ee1b __vmx_vcpu_run+0x5b ([kernel.kallsyms]) => ffffffffc0f8ed60 vmx_vmenter+0x0 ([kernel.kallsyms])
+       CPU 3/KVM 13376/13384 [002]  7919.408804162:      branches:  ffffffffc0f8ed62 vmx_vmenter+0x2 ([kernel.kallsyms]) =>                0 [unknown] ([unknown])
+ VM:13376 VCPU:003            uname  3404/3404  [002]  7919.408804162:      branches:                 0 [unknown] ([unknown]) =>     7f851c9b5a5c init_cacheinfo+0x3ac (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
+ VM:13376 VCPU:003            uname  3404/3404  [002]  7919.408804273:      branches:      7f851cb7c0e4 _dl_init+0x74 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) =>     7f851cb7bf50 call_init.part.0+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so)
+ VM:13376 VCPU:003            uname  3404/3404  [002]  7919.408804526:      branches:      55e0c00136f0 _start+0x0 (/usr/bin/uname) => ffffffff83200ac0 asm_exc_page_fault+0x0 ([kernel.kallsyms])
+ VM:13376 VCPU:003            uname  3404/3404  [002]  7919.408804526:      branches:  ffffffff83200ac3 asm_exc_page_fault+0x3 ([kernel.kallsyms]) => ffffffff83201290 error_entry+0x0 ([kernel.kallsyms])
+ VM:13376 VCPU:003            uname  3404/3404  [002]  7919.408804534:      branches:  ffffffff832012fa error_entry+0x6a ([kernel.kallsyms]) => ffffffff830b59a0 sync_regs+0x0 ([kernel.kallsyms])
+ VM:13376 VCPU:003            uname  3404/3404  [002]  7919.408804631:      branches:  ffffffff83200ad9 asm_exc_page_fault+0x19 ([kernel.kallsyms]) => ffffffff830b8210 exc_page_fault+0x0 ([kernel.kallsyms])
+ VM:13376 VCPU:003            uname  3404/3404  [002]  7919.408804631:      branches:  ffffffff830b82a4 exc_page_fault+0x94 ([kernel.kallsyms]) => ffffffff830b80e0 __kvm_handle_async_pf+0x0 ([kernel.kallsyms])
+ VM:13376 VCPU:003            uname  3404/3404  [002]  7919.408804631:      branches:  ffffffff830b80ed __kvm_handle_async_pf+0xd ([kernel.kallsyms]) => ffffffff830b80c0 kvm_read_and_reset_apf_flags+0x0 ([kernel.kallsyms])
+
+
 Tracing Virtual Machines - Guest Code
 -------------------------------------
 
-- 
2.25.1



* Re: [PATCH 01/35] perf tools: Fix dso_id inode generation comparison
  2022-07-11  9:31 ` [PATCH 01/35] perf tools: Fix dso_id inode generation comparison Adrian Hunter
@ 2022-07-18 14:57   ` Arnaldo Carvalho de Melo
  2022-07-19 10:18     ` Adrian Hunter
  0 siblings, 1 reply; 83+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-07-18 14:57 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Em Mon, Jul 11, 2022 at 12:31:44PM +0300, Adrian Hunter escreveu:
> Synthesized MMAP events have zero ino_generation, so do not compare zero
> values.
> 
> Fixes: 0e3149f86b99 ("perf dso: Move dso_id from 'struct map' to 'struct dso'")
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/util/dsos.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/util/dsos.c b/tools/perf/util/dsos.c
> index b97366f77bbf..839a1f384733 100644
> --- a/tools/perf/util/dsos.c
> +++ b/tools/perf/util/dsos.c
> @@ -23,8 +23,14 @@ static int __dso_id__cmp(struct dso_id *a, struct dso_id *b)
>  	if (a->ino > b->ino) return -1;
>  	if (a->ino < b->ino) return 1;
>  
> -	if (a->ino_generation > b->ino_generation) return -1;
> -	if (a->ino_generation < b->ino_generation) return 1;
> +	/*
> +	 * Synthesized MMAP events have zero ino_generation, so do not compare
> +	 * zero values.
> +	 */
> +	if (a->ino_generation && b->ino_generation) {
> +		if (a->ino_generation > b->ino_generation) return -1;
> +		if (a->ino_generation < b->ino_generation) return 1;
> +	}

But comparing didn't do any harm, right? When it's !0 we may now have three
comparisons instead of 2 :-\

The comment has some value tho, so I'm merging this :-)

- Arnaldo


* Re: [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host
  2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
                   ` (34 preceding siblings ...)
  2022-07-11  9:32 ` [PATCH 35/35] perf intel-pt: Add documentation for tracing guest machine user space Adrian Hunter
@ 2022-07-18 15:28 ` Arnaldo Carvalho de Melo
  35 siblings, 0 replies; 83+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-07-18 15:28 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

Em Mon, Jul 11, 2022 at 12:31:43PM +0300, Adrian Hunter escreveu:
> Hi
> 
> Here are patches to support decoding an Intel PT trace that contains data
> from virtual machine userspace.
> 
> This is done by adding functionality to perf inject to be able to inject
> sideband events needed for decoding, into the perf.data file recorded on
> the host.  That is, inject events from a perf.data file recorded in a
> virtual machine into a perf.data file recorded on the host at the same
> time.
> 
> For more details, see the example in the documentation added in the last
> patch.
> 
> Note there was already support for tracing virtual machines kernel-only:
> 
>  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/perf-intel-pt.txt?h=v5.19-rc1#n1221
>  
> or the special case of tracing KVM self tests:
> 
>  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/perf-intel-pt.txt?h=v5.19-rc1#n1403
> 
> For general information about Intel PT also see the wiki page:
> 
>  https://perf.wiki.kernel.org/index.php/Perf_tools_support_for_Intel%C2%AE_Processor_Trace
> 
> The patches fall into 5 groups:
>  1. the first patch is a fix
>  2. the next 22 patches are preparation
>  3. the main patch is "perf inject: Add support for injecting guest
>  sideband events"
>  4. 3 more preparation patches
>  5. Intel PT decoding changes
> 
> The patches are mostly small except for "perf inject: Add support for
> injecting guest sideband events".  However the code there adds new
> functionality, does not affect existing functionality and is consequently
> pretty self-contained.

Applied locally, going thru tests.

- Arnaldo
 
> 
> Adrian Hunter (35):
>       perf tools: Fix dso_id inode generation comparison
>       perf tools: Export dsos__for_each_with_build_id()
>       perf ordered_events: Add ordered_events__last_flush_time()
>       perf tools: Export perf_event__process_finished_round()
>       perf tools: Factor out evsel__id_hdr_size()
>       perf tools: Add perf_event__synthesize_id_sample()
>       perf script: Add --dump-unsorted-raw-trace option
>       perf buildid-cache: Add guestmount'd files to the build ID cache
>       perf buildid-cache: Do not require purge files to also be in the file system
>       perf tools: Add machine_pid and vcpu to id_index
>       perf session: Create guest machines from id_index
>       perf tools: Add guest_cpu to hypervisor threads
>       perf tools: Add machine_pid and vcpu to perf_sample
>       perf tools: Use sample->machine_pid to find guest machine
>       perf script: Add machine_pid and vcpu
>       perf dlfilter: Add machine_pid and vcpu
>       perf auxtrace: Add machine_pid and vcpu to auxtrace_error
>       perf script python: Add machine_pid and vcpu
>       perf script python: intel-pt-events: Add machine_pid and vcpu
>       perf tools: Remove also guest kcore_dir with host kcore_dir
>       perf tools: Make has_kcore_dir() work also for guest kcore_dir
>       perf tools: Automatically use guest kcore_dir if present
>       perf tools: Add reallocarray_as_needed()
>       perf inject: Add support for injecting guest sideband events
>       perf machine: Use realloc_array_as_needed() in machine__set_current_tid()
>       perf tools: Handle injected guest kernel mmap event
>       perf tools: Add perf_event__is_guest()
>       perf intel-pt: Remove guest_machine_pid
>       perf intel-pt: Add some more logging to intel_pt_walk_next_insn()
>       perf intel-pt: Track guest context switches
>       perf intel-pt: pt disable sync switch
>       perf intel-pt: Determine guest thread from guest sideband
>       perf intel-pt: Add machine_pid and vcpu to auxtrace_error
>       perf intel-pt: Use guest pid/tid etc in guest samples
>       perf intel-pt: Add documentation for tracing guest machine user space
> 
>  tools/lib/perf/include/internal/evsel.h            |    4 +
>  tools/lib/perf/include/perf/event.h                |    7 +
>  tools/perf/Documentation/perf-dlfilter.txt         |   22 +
>  tools/perf/Documentation/perf-inject.txt           |   17 +
>  tools/perf/Documentation/perf-intel-pt.txt         |  181 +++-
>  tools/perf/Documentation/perf-script.txt           |   10 +-
>  tools/perf/builtin-inject.c                        | 1043 +++++++++++++++++++-
>  tools/perf/builtin-script.c                        |   19 +
>  tools/perf/include/perf/perf_dlfilter.h            |    8 +
>  tools/perf/scripts/python/intel-pt-events.py       |   32 +-
>  tools/perf/util/auxtrace.c                         |   30 +-
>  tools/perf/util/auxtrace.h                         |    4 +
>  tools/perf/util/build-id.c                         |   80 +-
>  tools/perf/util/build-id.h                         |   16 +-
>  tools/perf/util/data.c                             |   43 +-
>  tools/perf/util/data.h                             |    1 +
>  tools/perf/util/dlfilter.c                         |    2 +
>  tools/perf/util/dso.h                              |    6 +
>  tools/perf/util/dsos.c                             |   10 +-
>  tools/perf/util/event.h                            |   23 +
>  tools/perf/util/evlist.c                           |   42 +-
>  tools/perf/util/evsel.c                            |   27 +
>  tools/perf/util/evsel.h                            |    2 +
>  tools/perf/util/intel-pt.c                         |  183 +++-
>  tools/perf/util/machine.c                          |   41 +-
>  tools/perf/util/machine.h                          |    2 +
>  tools/perf/util/ordered-events.h                   |    6 +
>  .../util/scripting-engines/trace-event-python.c    |   15 +-
>  tools/perf/util/session.c                          |  111 ++-
>  tools/perf/util/session.h                          |    4 +
>  tools/perf/util/symbol.c                           |    6 +-
>  tools/perf/util/synthetic-events.c                 |   98 +-
>  tools/perf/util/synthetic-events.h                 |    2 +
>  tools/perf/util/thread.c                           |    1 +
>  tools/perf/util/thread.h                           |    1 +
>  tools/perf/util/util.c                             |   70 +-
>  tools/perf/util/util.h                             |   15 +
>  37 files changed, 2029 insertions(+), 155 deletions(-)
> 
> 
> Regards
> Adrian

-- 

- Arnaldo


* Re: [PATCH 01/35] perf tools: Fix dso_id inode generation comparison
  2022-07-18 14:57   ` Arnaldo Carvalho de Melo
@ 2022-07-19 10:18     ` Adrian Hunter
  2022-07-19 15:13       ` Ian Rogers
  0 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-07-19 10:18 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen, linux-kernel, kvm

On 18/07/22 17:57, Arnaldo Carvalho de Melo wrote:
> Em Mon, Jul 11, 2022 at 12:31:44PM +0300, Adrian Hunter escreveu:
>> Synthesized MMAP events have zero ino_generation, so do not compare zero
>> values.
>>
>> Fixes: 0e3149f86b99 ("perf dso: Move dso_id from 'struct map' to 'struct dso'")
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>  tools/perf/util/dsos.c | 10 ++++++++--
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/util/dsos.c b/tools/perf/util/dsos.c
>> index b97366f77bbf..839a1f384733 100644
>> --- a/tools/perf/util/dsos.c
>> +++ b/tools/perf/util/dsos.c
>> @@ -23,8 +23,14 @@ static int __dso_id__cmp(struct dso_id *a, struct dso_id *b)
>>  	if (a->ino > b->ino) return -1;
>>  	if (a->ino < b->ino) return 1;
>>  
>> -	if (a->ino_generation > b->ino_generation) return -1;
>> -	if (a->ino_generation < b->ino_generation) return 1;
>> +	/*
>> +	 * Synthesized MMAP events have zero ino_generation, so do not compare
>> +	 * zero values.
>> +	 */
>> +	if (a->ino_generation && b->ino_generation) {
>> +		if (a->ino_generation > b->ino_generation) return -1;
>> +		if (a->ino_generation < b->ino_generation) return 1;
>> +	}
> 
> But comparing didn't do any harm, right? When it's !0 we may now have three
> comparisons instead of 2 :-\
> 
> The comment has some value tho, so I'm merging this :-)

Thanks. I found it harmful because the mismatch resulted in a new
dso that did not have a build ID whereas the original dso did have
a build ID.  The build ID was essential because the object was not
found otherwise.
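
To illustrate the failure mode, here is a minimal, self-contained sketch.
The struct and the sketch_* names are hypothetical stand-ins, not perf's
actual definitions; only the ino_generation handling mirrors the patch.
With the old comparison, a synthesized MMAP event (ino_generation == 0)
never matches the dso created from a recorded event with the same device
and inode, so a second dso without a build ID is created.

```c
#include <stdint.h>

/* Simplified stand-in for perf's struct dso_id (illustration only). */
struct sketch_dso_id {
	uint32_t maj;
	uint32_t min;
	uint64_t ino;
	uint64_t ino_generation;
};

/* Fields compared the same way before and after the patch. */
int sketch_cmp_common(const struct sketch_dso_id *a,
		      const struct sketch_dso_id *b)
{
	if (a->maj > b->maj) return -1;
	if (a->maj < b->maj) return 1;

	if (a->min > b->min) return -1;
	if (a->min < b->min) return 1;

	if (a->ino > b->ino) return -1;
	if (a->ino < b->ino) return 1;

	return 0;
}

/* Old behaviour: ino_generation always takes part in the comparison. */
int sketch_cmp_old(const struct sketch_dso_id *a,
		   const struct sketch_dso_id *b)
{
	int ret = sketch_cmp_common(a, b);

	if (ret)
		return ret;

	if (a->ino_generation > b->ino_generation) return -1;
	if (a->ino_generation < b->ino_generation) return 1;

	return 0;
}

/* New behaviour: a zero ino_generation on either side is not compared. */
int sketch_cmp_new(const struct sketch_dso_id *a,
		   const struct sketch_dso_id *b)
{
	int ret = sketch_cmp_common(a, b);

	if (ret)
		return ret;

	if (a->ino_generation && b->ino_generation) {
		if (a->ino_generation > b->ino_generation) return -1;
		if (a->ino_generation < b->ino_generation) return 1;
	}

	return 0;
}
```

For two ids that differ only in that one has ino_generation 0,
sketch_cmp_old() reports a mismatch while sketch_cmp_new() reports a
match, which is what lets the lookup find the original dso and its
build ID.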


* Re: [PATCH 01/35] perf tools: Fix dso_id inode generation comparison
  2022-07-19 10:18     ` Adrian Hunter
@ 2022-07-19 15:13       ` Ian Rogers
  2022-07-19 19:16         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 83+ messages in thread
From: Ian Rogers @ 2022-07-19 15:13 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Tue, Jul 19, 2022 at 3:18 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> On 18/07/22 17:57, Arnaldo Carvalho de Melo wrote:
> > Em Mon, Jul 11, 2022 at 12:31:44PM +0300, Adrian Hunter escreveu:
> >> Synthesized MMAP events have zero ino_generation, so do not compare zero
> >> values.
> >>
> >> Fixes: 0e3149f86b99 ("perf dso: Move dso_id from 'struct map' to 'struct dso'")
> >> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> >> ---
> >>  tools/perf/util/dsos.c | 10 ++++++++--
> >>  1 file changed, 8 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/tools/perf/util/dsos.c b/tools/perf/util/dsos.c
> >> index b97366f77bbf..839a1f384733 100644
> >> --- a/tools/perf/util/dsos.c
> >> +++ b/tools/perf/util/dsos.c
> >> @@ -23,8 +23,14 @@ static int __dso_id__cmp(struct dso_id *a, struct dso_id *b)
> >>      if (a->ino > b->ino) return -1;
> >>      if (a->ino < b->ino) return 1;
> >>
> >> -    if (a->ino_generation > b->ino_generation) return -1;
> >> -    if (a->ino_generation < b->ino_generation) return 1;
> >> +    /*
> >> +     * Synthesized MMAP events have zero ino_generation, so do not compare
> >> +     * zero values.
> >> +     */
> >> +    if (a->ino_generation && b->ino_generation) {
> >> +            if (a->ino_generation > b->ino_generation) return -1;
> >> +            if (a->ino_generation < b->ino_generation) return 1;
> >> +    }
> >
> > But comparing didn't do any harm, right? When it's !0 we may now have three
> > comparisons instead of 2 :-\
> >
> > The comment has some value tho, so I'm merging this :-)
>
> Thanks. I found it harmful because the mismatch resulted in a new
> dso that did not have a build ID whereas the original dso did have
> a build ID.  The build ID was essential because the object was not
> found otherwise.

That's good to know. Could we also add that to the comment? Perhaps:

Synthesized MMAP events have zero ino_generation, avoid comparing them
with MMAP events with actual ino_generation.

Thanks,
Ian


* Re: [PATCH 02/35] perf tools: Export dsos__for_each_with_build_id()
  2022-07-11  9:31 ` [PATCH 02/35] perf tools: Export dsos__for_each_with_build_id() Adrian Hunter
@ 2022-07-19 16:55   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-19 16:55 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:32 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Export dsos__for_each_with_build_id() so it can be used elsewhere.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
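
For readers unfamiliar with the idiom, the macro being moved is a
filtering iterator: the trailing "else" turns the caller's loop body into
the else-branch, so entries without a build ID are skipped. A minimal
sketch of the same shape follows. It uses a hypothetical singly linked
list (struct sketch_dso and the sketch_* names are made up for
illustration) rather than list_for_each_entry(), to stay self-contained.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal DSO node; perf's struct dso is far larger. */
struct sketch_dso {
	const char *name;
	bool has_build_id;
	struct sketch_dso *next;
};

/* Plain iteration over a NULL-terminated singly linked list. */
#define sketch_for_each(pos, head)		\
	for (pos = (head); pos; pos = pos->next)

/*
 * Same shape as dsos__for_each_with_build_id(): the statement that
 * follows the macro becomes the else-branch, so it only runs for
 * entries that have a build ID.
 */
#define sketch_for_each_with_build_id(pos, head)	\
	sketch_for_each(pos, head)			\
		if (!pos->has_build_id)			\
			continue;			\
		else

/* Count the entries that have a build ID. */
int sketch_count_with_build_id(struct sketch_dso *head)
{
	struct sketch_dso *pos;
	int n = 0;

	sketch_for_each_with_build_id(pos, head)
		n++;

	return n;
}
```

Because the macro ends in a dangling "else", it has to be followed by
exactly one statement or a braced block, the same as a plain for loop.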

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/util/build-id.c | 6 ------
>  tools/perf/util/dso.h      | 6 ++++++
>  2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> index 328668f38c69..4c9093b64d1f 100644
> --- a/tools/perf/util/build-id.c
> +++ b/tools/perf/util/build-id.c
> @@ -300,12 +300,6 @@ char *dso__build_id_filename(const struct dso *dso, char *bf, size_t size,
>         return __dso__build_id_filename(dso, bf, size, is_debug, is_kallsyms);
>  }
>
> -#define dsos__for_each_with_build_id(pos, head)        \
> -       list_for_each_entry(pos, head, node)    \
> -               if (!pos->has_build_id)         \
> -                       continue;               \
> -               else
> -
>  static int write_buildid(const char *name, size_t name_len, struct build_id *bid,
>                          pid_t pid, u16 misc, struct feat_fd *fd)
>  {
> diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
> index 97047a11282b..66981c7a9a18 100644
> --- a/tools/perf/util/dso.h
> +++ b/tools/perf/util/dso.h
> @@ -227,6 +227,12 @@ struct dso {
>  #define dso__for_each_symbol(dso, pos, n)      \
>         symbols__for_each_entry(&(dso)->symbols, pos, n)
>
> +#define dsos__for_each_with_build_id(pos, head)        \
> +       list_for_each_entry(pos, head, node)    \
> +               if (!pos->has_build_id)         \
> +                       continue;               \
> +               else
> +
>  static inline void dso__set_loaded(struct dso *dso)
>  {
>         dso->loaded = true;
> --
> 2.25.1
>


* Re: [PATCH 03/35] perf ordered_events: Add ordered_events__last_flush_time()
  2022-07-11  9:31 ` [PATCH 03/35] perf ordered_events: Add ordered_events__last_flush_time() Adrian Hunter
@ 2022-07-19 16:56   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-19 16:56 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:32 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Allow callers to get the ordered_events last flush timestamp.
>
> This is needed in perf inject to obey finished-round ordering when
> injecting additional events (e.g. from a guest perf.data file) with
> timestamps. Any additional events that have timestamps before the last
> flush time must be injected before the corresponding FINISHED_ROUND event.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
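
The ordering rule in the commit message can be sketched as below. This is
only an illustration under stated assumptions: the guest events are
modelled as an ascending array of timestamps, and
sketch_inject_before_round() is a hypothetical helper, not code from
perf inject. The last_flush argument stands for the value returned by
ordered_events__last_flush_time().

```c
#include <stddef.h>
#include <stdint.h>

/*
 * guest_ts[] holds guest event timestamps in ascending order and *pos is
 * the index of the next event that has not been injected yet.
 *
 * Returns how many events must be written out before the pending
 * FINISHED_ROUND event, i.e. those with a timestamp before last_flush,
 * and advances *pos past them.
 */
size_t sketch_inject_before_round(const uint64_t *guest_ts, size_t nr,
				  size_t *pos, uint64_t last_flush)
{
	size_t start = *pos;

	while (*pos < nr && guest_ts[*pos] < last_flush)
		(*pos)++;

	return *pos - start;
}
```

Calling this once per FINISHED_ROUND keeps the injected events ordered
with respect to the rounds: anything older than the last flush time goes
out first, and later events wait for a subsequent round.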

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/util/ordered-events.h | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/tools/perf/util/ordered-events.h b/tools/perf/util/ordered-events.h
> index 0b05c3c0aeaa..8febbd7c98ca 100644
> --- a/tools/perf/util/ordered-events.h
> +++ b/tools/perf/util/ordered-events.h
> @@ -75,4 +75,10 @@ void ordered_events__set_copy_on_queue(struct ordered_events *oe, bool copy)
>  {
>         oe->copy_on_queue = copy;
>  }
> +
> +static inline u64 ordered_events__last_flush_time(struct ordered_events *oe)
> +{
> +       return oe->last_flush;
> +}
> +
>  #endif /* __ORDERED_EVENTS_H */
> --
> 2.25.1
>


* Re: [PATCH 04/35] perf tools: Export perf_event__process_finished_round()
  2022-07-11  9:31 ` [PATCH 04/35] perf tools: Export perf_event__process_finished_round() Adrian Hunter
@ 2022-07-19 17:04   ` Ian Rogers
  2022-08-09 11:37     ` Adrian Hunter
  0 siblings, 1 reply; 83+ messages in thread
From: Ian Rogers @ 2022-07-19 17:04 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:32 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Export perf_event__process_finished_round() so it can be used elsewhere.
>
> This is needed in perf inject to obey finished-round ordering.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/util/session.c | 12 ++++--------
>  tools/perf/util/session.h |  4 ++++
>  2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 37f833c3c81b..4c9513bc6d89 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -374,10 +374,6 @@ static int process_finished_round_stub(struct perf_tool *tool __maybe_unused,
>         return 0;
>  }
>
> -static int process_finished_round(struct perf_tool *tool,
> -                                 union perf_event *event,
> -                                 struct ordered_events *oe);
> -
>  static int skipn(int fd, off_t n)
>  {
>         char buf[4096];
> @@ -534,7 +530,7 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
>                 tool->build_id = process_event_op2_stub;
>         if (tool->finished_round == NULL) {
>                 if (tool->ordered_events)
> -                       tool->finished_round = process_finished_round;
> +                       tool->finished_round = perf_event__process_finished_round;
>                 else
>                         tool->finished_round = process_finished_round_stub;
>         }
> @@ -1069,9 +1065,9 @@ static perf_event__swap_op perf_event__swap_ops[] = {
>   *      Flush every events below timestamp 7
>   *      etc...
>   */
> -static int process_finished_round(struct perf_tool *tool __maybe_unused,
> -                                 union perf_event *event __maybe_unused,
> -                                 struct ordered_events *oe)
> +int perf_event__process_finished_round(struct perf_tool *tool __maybe_unused,
> +                                      union perf_event *event __maybe_unused,
> +                                      struct ordered_events *oe)
>  {
>         if (dump_trace)
>                 fprintf(stdout, "\n");
> diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
> index 34500a3da735..be5871ea558f 100644
> --- a/tools/perf/util/session.h
> +++ b/tools/perf/util/session.h
> @@ -155,4 +155,8 @@ int perf_session__deliver_synth_event(struct perf_session *session,
>  int perf_event__process_id_index(struct perf_session *session,
>                                  union perf_event *event);
>
> +int perf_event__process_finished_round(struct perf_tool *tool,
> +                                      union perf_event *event,
> +                                      struct ordered_events *oe);
> +

Sorry to be naive, but why is this perf_event__ and not perf_session__?
Well, I guess it is at least passed an event, even though it doesn't use
it. Comments would be nice, but this change is just shifting things
around. Anyway...

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

>  #endif /* __PERF_SESSION_H */
> --
> 2.25.1
>


* Re: [PATCH 05/35] perf tools: Factor out evsel__id_hdr_size()
  2022-07-11  9:31 ` [PATCH 05/35] perf tools: Factor out evsel__id_hdr_size() Adrian Hunter
@ 2022-07-19 17:09   ` Ian Rogers
  2022-08-09 11:49     ` Adrian Hunter
  0 siblings, 1 reply; 83+ messages in thread
From: Ian Rogers @ 2022-07-19 17:09 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:32 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Factor out evsel__id_hdr_size() so it can be reused.
>
> This is needed by perf inject. When injecting events from a guest perf.data
> file, there is a possibility that the sample ID numbers conflict. To
> re-write an ID sample, the old one needs to be removed first, which means
> determining how big it is with evsel__id_hdr_size() and then subtracting
> that from the event size.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/util/evlist.c | 28 +---------------------------
>  tools/perf/util/evsel.c  | 26 ++++++++++++++++++++++++++
>  tools/perf/util/evsel.h  |  2 ++
>  3 files changed, 29 insertions(+), 27 deletions(-)
>
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 48af7d379d82..03fbe151b0c4 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -1244,34 +1244,8 @@ bool evlist__valid_read_format(struct evlist *evlist)
>  u16 evlist__id_hdr_size(struct evlist *evlist)
>  {
>         struct evsel *first = evlist__first(evlist);
> -       struct perf_sample *data;
> -       u64 sample_type;
> -       u16 size = 0;
>
> -       if (!first->core.attr.sample_id_all)
> -               goto out;
> -
> -       sample_type = first->core.attr.sample_type;
> -
> -       if (sample_type & PERF_SAMPLE_TID)
> -               size += sizeof(data->tid) * 2;
> -
> -       if (sample_type & PERF_SAMPLE_TIME)
> -               size += sizeof(data->time);
> -
> -       if (sample_type & PERF_SAMPLE_ID)
> -               size += sizeof(data->id);
> -
> -       if (sample_type & PERF_SAMPLE_STREAM_ID)
> -               size += sizeof(data->stream_id);
> -
> -       if (sample_type & PERF_SAMPLE_CPU)
> -               size += sizeof(data->cpu) * 2;
> -
> -       if (sample_type & PERF_SAMPLE_IDENTIFIER)
> -               size += sizeof(data->id);
> -out:
> -       return size;
> +       return first->core.attr.sample_id_all ? evsel__id_hdr_size(first) : 0;
>  }
>
>  bool evlist__valid_sample_id_all(struct evlist *evlist)
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index a67cc3f2fa74..9a30ccb7b104 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -2724,6 +2724,32 @@ int evsel__parse_sample_timestamp(struct evsel *evsel, union perf_event *event,
>         return 0;
>  }
>
> +u16 evsel__id_hdr_size(struct evsel *evsel)
> +{
> +       u64 sample_type = evsel->core.attr.sample_type;

As this just uses core, would it be more appropriate to put it in libperf?

> +       u16 size = 0;

Perhaps size_t or int? u16 seems odd.

> +
> +       if (sample_type & PERF_SAMPLE_TID)
> +               size += sizeof(u64);
> +
> +       if (sample_type & PERF_SAMPLE_TIME)
> +               size += sizeof(u64);
> +
> +       if (sample_type & PERF_SAMPLE_ID)
> +               size += sizeof(u64);
> +
> +       if (sample_type & PERF_SAMPLE_STREAM_ID)
> +               size += sizeof(u64);
> +
> +       if (sample_type & PERF_SAMPLE_CPU)
> +               size += sizeof(u64);
> +
> +       if (sample_type & PERF_SAMPLE_IDENTIFIER)
> +               size += sizeof(u64);
> +
> +       return size;
> +}
> +
>  struct tep_format_field *evsel__field(struct evsel *evsel, const char *name)
>  {
>         return tep_find_field(evsel->tp_format, name);
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index 92bed8e2f7d8..699448f2bc2b 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -381,6 +381,8 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
>  int evsel__parse_sample_timestamp(struct evsel *evsel, union perf_event *event,
>                                   u64 *timestamp);
>
> +u16 evsel__id_hdr_size(struct evsel *evsel);
> +

A comment would be nice. I know this is just moving code about, but
this is a new function.

Thanks,
Ian

>  static inline struct evsel *evsel__next(struct evsel *evsel)
>  {
>         return list_entry(evsel->core.node.next, struct evsel, core.node);
> --
> 2.25.1
>
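
To illustrate the size calculation described in the commit message, here
is a minimal sketch (not the perf implementation). Each selected
sample_type bit contributes one 64-bit word to the ID sample that follows
a non-sample event, so removing the old ID sample means subtracting that
total from the event size. The SKETCH_* constants mirror the
PERF_SAMPLE_* bit values from the perf_event UAPI header, and
sketch_id_hdr_size() and sketch_size_without_id_sample() are hypothetical
names used only for this example. The sketch returns size_t, as suggested
in the review, rather than u16.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies of the PERF_SAMPLE_* bits that make up an ID sample. */
enum {
	SKETCH_SAMPLE_TID        = 1u << 1,
	SKETCH_SAMPLE_TIME       = 1u << 2,
	SKETCH_SAMPLE_ID         = 1u << 6,
	SKETCH_SAMPLE_CPU        = 1u << 7,
	SKETCH_SAMPLE_STREAM_ID  = 1u << 9,
	SKETCH_SAMPLE_IDENTIFIER = 1u << 16,
};

/* Size in bytes of the ID sample implied by sample_type. */
static size_t sketch_id_hdr_size(uint64_t sample_type)
{
	size_t size = 0;

	if (sample_type & SKETCH_SAMPLE_TID)        /* pid + tid, 32 bits each */
		size += sizeof(uint64_t);
	if (sample_type & SKETCH_SAMPLE_TIME)
		size += sizeof(uint64_t);
	if (sample_type & SKETCH_SAMPLE_ID)
		size += sizeof(uint64_t);
	if (sample_type & SKETCH_SAMPLE_STREAM_ID)
		size += sizeof(uint64_t);
	if (sample_type & SKETCH_SAMPLE_CPU)        /* cpu + reserved, 32 bits each */
		size += sizeof(uint64_t);
	if (sample_type & SKETCH_SAMPLE_IDENTIFIER)
		size += sizeof(uint64_t);

	return size;
}

/* Event size once the trailing ID sample has been stripped (0 if too small). */
static size_t sketch_size_without_id_sample(size_t event_size, uint64_t sample_type)
{
	size_t id_sz = sketch_id_hdr_size(sample_type);

	return event_size >= id_sz ? event_size - id_sz : 0;
}
```

For example, with only TID and TIME selected the ID sample occupies two
64-bit words, so a 64-byte event shrinks to 48 bytes once it is removed.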


* Re: [PATCH 06/35] perf tools: Add perf_event__synthesize_id_sample()
  2022-07-11  9:31 ` [PATCH 06/35] perf tools: Add perf_event__synthesize_id_sample() Adrian Hunter
@ 2022-07-19 17:10   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-19 17:10 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:32 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Add perf_event__synthesize_id_sample() to enable the synthesis of
> ID samples.
>
> This is needed by perf inject. When injecting events from a guest perf.data
> file, there is a possibility that the sample ID numbers conflict. In that
> case, perf_event__synthesize_id_sample() can be used to re-write the ID
> sample.

This is great documentation; it would be nice to capture it with the
function declaration.

Thanks,
Ian

> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/util/synthetic-events.c | 47 ++++++++++++++++++++++++++++++
>  tools/perf/util/synthetic-events.h |  1 +
>  2 files changed, 48 insertions(+)
>
> diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> index fe5db4bf0042..ed9623702f34 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -1712,6 +1712,53 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
>         return 0;
>  }
>
> +int perf_event__synthesize_id_sample(__u64 *array, u64 type, const struct perf_sample *sample)
> +{
> +       __u64 *start = array;
> +
> +       /*
> +        * used for cross-endian analysis. See git commit 65014ab3
> +        * for why this goofiness is needed.
> +        */
> +       union u64_swap u;
> +
> +       if (type & PERF_SAMPLE_TID) {
> +               u.val32[0] = sample->pid;
> +               u.val32[1] = sample->tid;
> +               *array = u.val64;
> +               array++;
> +       }
> +
> +       if (type & PERF_SAMPLE_TIME) {
> +               *array = sample->time;
> +               array++;
> +       }
> +
> +       if (type & PERF_SAMPLE_ID) {
> +               *array = sample->id;
> +               array++;
> +       }
> +
> +       if (type & PERF_SAMPLE_STREAM_ID) {
> +               *array = sample->stream_id;
> +               array++;
> +       }
> +
> +       if (type & PERF_SAMPLE_CPU) {
> +               u.val32[0] = sample->cpu;
> +               u.val32[1] = 0;
> +               *array = u.val64;
> +               array++;
> +       }
> +
> +       if (type & PERF_SAMPLE_IDENTIFIER) {
> +               *array = sample->id;
> +               array++;
> +       }
> +
> +       return (void *)array - (void *)start;
> +}
> +
>  int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_t process,
>                                     struct evlist *evlist, struct machine *machine)
>  {
> diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
> index 78a0450db164..b136ec3ec95d 100644
> --- a/tools/perf/util/synthetic-events.h
> +++ b/tools/perf/util/synthetic-events.h
> @@ -55,6 +55,7 @@ int perf_event__synthesize_extra_attr(struct perf_tool *tool, struct evlist *evs
>  int perf_event__synthesize_extra_kmaps(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
>  int perf_event__synthesize_features(struct perf_tool *tool, struct perf_session *session, struct evlist *evlist, perf_event__handler_t process);
>  int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_t process, struct evlist *evlist, struct machine *machine);
> +int perf_event__synthesize_id_sample(__u64 *array, u64 type, const struct perf_sample *sample);
>  int perf_event__synthesize_kernel_mmap(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
>  int perf_event__synthesize_mmap_events(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine, bool mmap_data);
>  int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
> --
> 2.25.1
>
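
As a rough standalone sketch of what writing an ID sample involves (this
is not the perf code): the selected fields are emitted as consecutive
64-bit words in a fixed order, with pid/tid and cpu/reserved each packed
as two 32-bit halves. struct sketch_sample, sketch_pack32() and
sketch_synthesize_id_sample() are hypothetical names for this example,
and the SKETCH_* constants mirror the PERF_SAMPLE_* bit values from the
perf_event UAPI header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
	SKETCH_SAMPLE_TID        = 1u << 1,
	SKETCH_SAMPLE_TIME       = 1u << 2,
	SKETCH_SAMPLE_ID         = 1u << 6,
	SKETCH_SAMPLE_CPU        = 1u << 7,
	SKETCH_SAMPLE_STREAM_ID  = 1u << 9,
	SKETCH_SAMPLE_IDENTIFIER = 1u << 16,
};

/* Only the fields an ID sample can carry. */
struct sketch_sample {
	uint32_t pid;
	uint32_t tid;
	uint64_t time;
	uint64_t id;
	uint64_t stream_id;
	uint32_t cpu;
};

/* Store two 32-bit values in one 64-bit word, in memory order. */
static uint64_t sketch_pack32(uint32_t first, uint32_t second)
{
	uint32_t halves[2] = { first, second };
	uint64_t word;

	memcpy(&word, halves, sizeof(word));
	return word;
}

/* Write the ID sample into array; returns the number of bytes written. */
static size_t sketch_synthesize_id_sample(uint64_t *array, uint64_t type,
					  const struct sketch_sample *s)
{
	uint64_t *start = array;

	if (type & SKETCH_SAMPLE_TID)
		*array++ = sketch_pack32(s->pid, s->tid);
	if (type & SKETCH_SAMPLE_TIME)
		*array++ = s->time;
	if (type & SKETCH_SAMPLE_ID)
		*array++ = s->id;
	if (type & SKETCH_SAMPLE_STREAM_ID)
		*array++ = s->stream_id;
	if (type & SKETCH_SAMPLE_CPU)
		*array++ = sketch_pack32(s->cpu, 0);
	if (type & SKETCH_SAMPLE_IDENTIFIER)
		*array++ = s->id;

	return (size_t)((char *)array - (char *)start);
}
```

The caller is assumed to supply a buffer of at least six 64-bit words,
which is the largest ID sample these bits can produce. The sketch
computes the byte count through char pointers, which avoids relying on
GNU void-pointer arithmetic.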


* Re: [PATCH 07/35] perf script: Add --dump-unsorted-raw-trace option
  2022-07-11  9:31 ` [PATCH 07/35] perf script: Add --dump-unsorted-raw-trace option Adrian Hunter
@ 2022-07-19 17:11   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-19 17:11 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> When reviewing the results of perf inject, it is useful to be able to see
> the events in the order they appear in the file.
>
> So add --dump-unsorted-raw-trace option to do an unsorted dump.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/Documentation/perf-script.txt | 3 +++
>  tools/perf/builtin-script.c              | 8 ++++++++
>  2 files changed, 11 insertions(+)
>
> diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
> index 1a557ff8f210..e250ff5566cf 100644
> --- a/tools/perf/Documentation/perf-script.txt
> +++ b/tools/perf/Documentation/perf-script.txt
> @@ -79,6 +79,9 @@ OPTIONS
>  --dump-raw-trace=::
>          Display verbose dump of the trace data.
>
> +--dump-unsorted-raw-trace=::
> +        Same as --dump-raw-trace but not sorted in time order.
> +
>  -L::
>  --Latency=::
>          Show latency attributes (irqs/preemption disabled, etc).
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index 7cf21ab16f4f..4b00a50faf00 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -3746,6 +3746,7 @@ int cmd_script(int argc, const char **argv)
>         bool header = false;
>         bool header_only = false;
>         bool script_started = false;
> +       bool unsorted_dump = false;
>         char *rec_script_path = NULL;
>         char *rep_script_path = NULL;
>         struct perf_session *session;
> @@ -3794,6 +3795,8 @@ int cmd_script(int argc, const char **argv)
>         const struct option options[] = {
>         OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
>                     "dump raw trace in ASCII"),
> +       OPT_BOOLEAN(0, "dump-unsorted-raw-trace", &unsorted_dump,
> +                   "dump unsorted raw trace in ASCII"),
>         OPT_INCR('v', "verbose", &verbose,
>                  "be more verbose (show symbol address, etc)"),
>         OPT_BOOLEAN('L', "Latency", &latency_format,
> @@ -3956,6 +3959,11 @@ int cmd_script(int argc, const char **argv)
>         data.path  = input_name;
>         data.force = symbol_conf.force;
>
> +       if (unsorted_dump) {
> +               dump_trace = true;
> +               script.tool.ordered_events = false;
> +       }
> +
>         if (symbol__validate_sym_arguments())
>                 return -1;
>
> --
> 2.25.1
>


* Re: [PATCH 08/35] perf buildid-cache: Add guestmount'd files to the build ID cache
  2022-07-11  9:31 ` [PATCH 08/35] perf buildid-cache: Add guestmount'd files to the build ID cache Adrian Hunter
@ 2022-07-19 17:41   ` Ian Rogers
  2022-08-09 12:21     ` Adrian Hunter
  0 siblings, 1 reply; 83+ messages in thread
From: Ian Rogers @ 2022-07-19 17:41 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> When the guestmount option is used, a guest machine's file system mount
> point is recorded in machine->root_dir.
>
> perf already iterates guest machines when adding files to the build ID
> cache, but does not take machine->root_dir into account.
>
> Use machine->root_dir to find files for guest build IDs, and add them to
> the build ID cache using the "proper" name i.e. relative to the guest root
> directory not the host root directory.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Is it plausible to add a test for this? Our tests create workloads, but
there is no existing way to do this with a hypervisor. Perhaps the test
could run in a hypervisor? Or maybe there's a route that doesn't involve
hypervisors.

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/util/build-id.c | 67 +++++++++++++++++++++++++++++---------
>  tools/perf/util/build-id.h | 16 ++++++---
>  2 files changed, 63 insertions(+), 20 deletions(-)
>
> diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> index 4c9093b64d1f..7c9f441936ee 100644
> --- a/tools/perf/util/build-id.c
> +++ b/tools/perf/util/build-id.c
> @@ -625,9 +625,12 @@ static int build_id_cache__add_sdt_cache(const char *sbuild_id,
>  #endif
>
>  static char *build_id_cache__find_debug(const char *sbuild_id,
> -                                       struct nsinfo *nsi)
> +                                       struct nsinfo *nsi,
> +                                       const char *root_dir)
>  {
> +       const char *dirname = "/usr/lib/debug/.build-id/";
>         char *realname = NULL;
> +       char dirbuf[PATH_MAX];
>         char *debugfile;
>         struct nscookie nsc;
>         size_t len = 0;
> @@ -636,8 +639,12 @@ static char *build_id_cache__find_debug(const char *sbuild_id,
>         if (!debugfile)
>                 goto out;
>
> -       len = __symbol__join_symfs(debugfile, PATH_MAX,
> -                                  "/usr/lib/debug/.build-id/");
> +       if (root_dir) {
> +               path__join(dirbuf, PATH_MAX, root_dir, dirname);
> +               dirname = dirbuf;
> +       }
> +
> +       len = __symbol__join_symfs(debugfile, PATH_MAX, dirname);
>         snprintf(debugfile + len, PATH_MAX - len, "%.2s/%s.debug", sbuild_id,
>                  sbuild_id + 2);
>
> @@ -668,14 +675,18 @@ static char *build_id_cache__find_debug(const char *sbuild_id,
>
>  int
>  build_id_cache__add(const char *sbuild_id, const char *name, const char *realname,
> -                   struct nsinfo *nsi, bool is_kallsyms, bool is_vdso)
> +                   struct nsinfo *nsi, bool is_kallsyms, bool is_vdso,
> +                   const char *proper_name, const char *root_dir)
>  {
>         const size_t size = PATH_MAX;
>         char *filename = NULL, *dir_name = NULL, *linkname = zalloc(size), *tmp;
>         char *debugfile = NULL;
>         int err = -1;
>
> -       dir_name = build_id_cache__cachedir(sbuild_id, name, nsi, is_kallsyms,
> +       if (!proper_name)
> +               proper_name = name;
> +
> +       dir_name = build_id_cache__cachedir(sbuild_id, proper_name, nsi, is_kallsyms,
>                                             is_vdso);
>         if (!dir_name)
>                 goto out_free;
> @@ -715,7 +726,7 @@ build_id_cache__add(const char *sbuild_id, const char *name, const char *realnam
>          */
>         if (!is_kallsyms && !is_vdso &&
>             strncmp(".ko", name + strlen(name) - 3, 3)) {
> -               debugfile = build_id_cache__find_debug(sbuild_id, nsi);
> +               debugfile = build_id_cache__find_debug(sbuild_id, nsi, root_dir);
>                 if (debugfile) {
>                         zfree(&filename);
>                         if (asprintf(&filename, "%s/%s", dir_name,
> @@ -781,8 +792,9 @@ build_id_cache__add(const char *sbuild_id, const char *name, const char *realnam
>         return err;
>  }
>
> -int build_id_cache__add_s(const char *sbuild_id, const char *name,
> -                         struct nsinfo *nsi, bool is_kallsyms, bool is_vdso)
> +int __build_id_cache__add_s(const char *sbuild_id, const char *name,
> +                           struct nsinfo *nsi, bool is_kallsyms, bool is_vdso,
> +                           const char *proper_name, const char *root_dir)
>  {
>         char *realname = NULL;
>         int err = -1;
> @@ -796,8 +808,8 @@ int build_id_cache__add_s(const char *sbuild_id, const char *name,
>                         goto out_free;
>         }
>
> -       err = build_id_cache__add(sbuild_id, name, realname, nsi, is_kallsyms, is_vdso);
> -
> +       err = build_id_cache__add(sbuild_id, name, realname, nsi,
> +                                 is_kallsyms, is_vdso, proper_name, root_dir);
>  out_free:
>         if (!is_kallsyms)
>                 free(realname);
> @@ -806,14 +818,16 @@ int build_id_cache__add_s(const char *sbuild_id, const char *name,
>
>  static int build_id_cache__add_b(const struct build_id *bid,
>                                  const char *name, struct nsinfo *nsi,
> -                                bool is_kallsyms, bool is_vdso)
> +                                bool is_kallsyms, bool is_vdso,
> +                                const char *proper_name,
> +                                const char *root_dir)
>  {
>         char sbuild_id[SBUILD_ID_SIZE];
>
>         build_id__sprintf(bid, sbuild_id);
>
> -       return build_id_cache__add_s(sbuild_id, name, nsi, is_kallsyms,
> -                                    is_vdso);
> +       return __build_id_cache__add_s(sbuild_id, name, nsi, is_kallsyms,
> +                                      is_vdso, proper_name, root_dir);
>  }
>
>  bool build_id_cache__cached(const char *sbuild_id)
> @@ -896,6 +910,10 @@ static int dso__cache_build_id(struct dso *dso, struct machine *machine,
>         bool is_kallsyms = dso__is_kallsyms(dso);
>         bool is_vdso = dso__is_vdso(dso);
>         const char *name = dso->long_name;
> +       const char *proper_name = NULL;
> +       const char *root_dir = NULL;
> +       char *allocated_name = NULL;
> +       int ret = 0;
>
>         if (!dso->has_build_id)
>                 return 0;
> @@ -905,11 +923,28 @@ static int dso__cache_build_id(struct dso *dso, struct machine *machine,
>                 name = machine->mmap_name;
>         }
>
> +       if (!machine__is_host(machine)) {
> +               if (*machine->root_dir) {
> +                       root_dir = machine->root_dir;
> +                       ret = asprintf(&allocated_name, "%s/%s", root_dir, name);
> +                       if (ret < 0)
> +                               return ret;
> +                       proper_name = name;
> +                       name = allocated_name;
> +               } else if (is_kallsyms) {
> +                       /* Cannot get guest kallsyms */
> +                       return 0;
> +               }
> +       }
> +
>         if (!is_kallsyms && dso__build_id_mismatch(dso, name))
> -               return 0;
> +               goto out_free;
>
> -       return build_id_cache__add_b(&dso->bid, name, dso->nsinfo,
> -                                    is_kallsyms, is_vdso);
> +       ret = build_id_cache__add_b(&dso->bid, name, dso->nsinfo,
> +                                   is_kallsyms, is_vdso, proper_name, root_dir);
> +out_free:
> +       free(allocated_name);
> +       return ret;
>  }
>
>  static int
> diff --git a/tools/perf/util/build-id.h b/tools/perf/util/build-id.h
> index c19617151670..4e3a1169379b 100644
> --- a/tools/perf/util/build-id.h
> +++ b/tools/perf/util/build-id.h
> @@ -66,10 +66,18 @@ int build_id_cache__list_build_ids(const char *pathname, struct nsinfo *nsi,
>                                    struct strlist **result);
>  bool build_id_cache__cached(const char *sbuild_id);
>  int build_id_cache__add(const char *sbuild_id, const char *name, const char *realname,
> -                       struct nsinfo *nsi, bool is_kallsyms, bool is_vdso);
> -int build_id_cache__add_s(const char *sbuild_id,
> -                         const char *name, struct nsinfo *nsi,
> -                         bool is_kallsyms, bool is_vdso);
> +                       struct nsinfo *nsi, bool is_kallsyms, bool is_vdso,
> +                       const char *proper_name, const char *root_dir);
> +int __build_id_cache__add_s(const char *sbuild_id,
> +                           const char *name, struct nsinfo *nsi,
> +                           bool is_kallsyms, bool is_vdso,
> +                           const char *proper_name, const char *root_dir);
> +static inline int build_id_cache__add_s(const char *sbuild_id,
> +                                       const char *name, struct nsinfo *nsi,
> +                                       bool is_kallsyms, bool is_vdso)
> +{
> +       return __build_id_cache__add_s(sbuild_id, name, nsi, is_kallsyms, is_vdso, NULL, NULL);
> +}
>  int build_id_cache__remove_s(const char *sbuild_id);
>
>  extern char buildid_dir[];
> --
> 2.25.1
>
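
The path handling described in the commit message can be sketched as
follows. This is an illustration only, assuming that the file is read
through the guestmount root while the cache entry keeps the guest-relative
("proper") name; sketch_guest_host_path() is a hypothetical helper, not a
perf function.

```c
#include <stdio.h>

/*
 * Build the host-visible path of a guest file. With no root_dir the name
 * is used unchanged. Returns 0 on success, -1 on bad arguments or
 * truncation.
 */
static int sketch_guest_host_path(char *buf, size_t size,
				  const char *root_dir, const char *name)
{
	int len;

	if (!buf || !size || !name)
		return -1;

	if (!root_dir || !*root_dir)
		len = snprintf(buf, size, "%s", name);
	else
		len = snprintf(buf, size, "%s/%s", root_dir, name);

	return (len < 0 || (size_t)len >= size) ? -1 : 0;
}
```

With a root of "/mnt/guest" and a guest name of "/usr/bin/ls", the file is
opened as "/mnt/guest//usr/bin/ls" (the doubled slash is harmless on
Linux), while the name stored in the cache stays "/usr/bin/ls".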


* Re: [PATCH 09/35] perf buildid-cache: Do not require purge files to also be in the file system
  2022-07-11  9:31 ` [PATCH 09/35] perf buildid-cache: Do not require purge files to also be in the file system Adrian Hunter
@ 2022-07-19 17:44   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-19 17:44 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> realname() returns NULL if the file is not in the file system, but we can
> still remove it from the build ID cache in that case, so continue and
> attempt the purge with the name provided.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/util/build-id.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> index 7c9f441936ee..9e176146eb10 100644
> --- a/tools/perf/util/build-id.c
> +++ b/tools/perf/util/build-id.c
> @@ -561,14 +561,11 @@ char *build_id_cache__cachedir(const char *sbuild_id, const char *name,
>         char *realname = (char *)name, *filename;
>         bool slash = is_kallsyms || is_vdso;
>
> -       if (!slash) {
> +       if (!slash)
>                 realname = nsinfo__realpath(name, nsi);
> -               if (!realname)
> -                       return NULL;
> -       }
>
>         if (asprintf(&filename, "%s%s%s%s%s", buildid_dir, slash ? "/" : "",
> -                    is_vdso ? DSO__NAME_VDSO : realname,
> +                    is_vdso ? DSO__NAME_VDSO : (realname ? realname : name),

nit: is_vdso ? DSO__NAME_VDSO : (realname ?: name),

Thanks,
Ian

>                      sbuild_id ? "/" : "", sbuild_id ?: "") < 0)
>                 filename = NULL;
>
> --
> 2.25.1
>
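
For readers unfamiliar with the shorthand in the nit above: "a ?: b" is
the GNU C conditional with an omitted middle operand. It yields a when a
is non-zero and b otherwise, evaluating a only once. The quoted asprintf()
call already uses it for sbuild_id. A small sketch of the equivalence,
with hypothetical function names, assuming GCC or Clang:

```c
#include <stddef.h>

/* Portable form, as written in the patch. */
static const char *sketch_pick_portable(const char *realname, const char *name)
{
	return realname ? realname : name;
}

/* GNU C form, as suggested in the review (a compiler extension). */
static const char *sketch_pick_gnu(const char *realname, const char *name)
{
	return realname ?: name;
}
```

Both functions return realname when it is non-NULL and fall back to name
otherwise.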


* Re: [PATCH 10/35] perf tools: Add machine_pid and vcpu to id_index
  2022-07-11  9:31 ` [PATCH 10/35] perf tools: Add machine_pid and vcpu to id_index Adrian Hunter
@ 2022-07-19 17:48   ` Ian Rogers
  2022-08-09 12:19     ` Adrian Hunter
  0 siblings, 1 reply; 83+ messages in thread
From: Ian Rogers @ 2022-07-19 17:48 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> When injecting events from a guest perf.data file, the events will have
> separate sample ID numbers. These ID numbers can then be used to determine
> which machine an event belongs to. To facilitate that, add machine_pid and
> vcpu to id_index records. For backward compatibility, these are added at
> the end of the record, and the length of the record is used to determine
> if they are present or not.
>
> Note, this is needed because the events from a guest perf.data file contain
> the pid/tid of the process running at that time inside the VM not the
> pid/tid of the (QEMU) hypervisor thread. So a way is needed to relate
> guest events back to the guest machine and VCPU, and using sample ID
> numbers for that is relatively simple and convenient.
>
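
The backward-compatible layout described above can be sketched as a
length check: the record payload holds nr original entries, optionally
followed by nr extension entries, and the payload size alone tells a
reader which form it has. This is a minimal illustration under stated
assumptions, not the perf code: struct sketch_entry and
struct sketch_entry_2 are assumed to mirror id_index_entry (four 64-bit
fields) and id_index_entry_2, and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_entry {		/* assumed to mirror struct id_index_entry */
	uint64_t id;
	uint64_t idx;
	uint64_t cpu;
	uint64_t tid;
};

struct sketch_entry_2 {		/* assumed to mirror struct id_index_entry_2 */
	uint64_t machine_pid;
	uint64_t vcpu;
};

/*
 * payload_sz is the record size minus the fixed header and nr field.
 * Returns 1 if the extension entries are present, 0 if only the original
 * entries fit, and -1 if the payload is too small for nr entries.
 */
static int sketch_id_index_has_e2(size_t payload_sz, size_t nr)
{
	size_t e1_sz = sizeof(struct sketch_entry);
	size_t etot_sz = e1_sz + sizeof(struct sketch_entry_2);

	if (nr > payload_sz / e1_sz)
		return -1;

	/* Same condition as payload_sz >= nr * etot_sz, without overflow. */
	return nr <= payload_sz / etot_sz ? 1 : 0;
}

/* Byte offset of the first extension entry within the payload. */
static size_t sketch_e2_offset(size_t nr)
{
	return nr * sizeof(struct sketch_entry);
}
```

An old-format record with three entries has a 96-byte payload and is read
without the extension; a new-format one has 144 bytes, with the extension
entries starting at offset 96.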
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/lib/perf/include/internal/evsel.h |  4 ++
>  tools/lib/perf/include/perf/event.h     |  5 +++
>  tools/perf/util/session.c               | 40 ++++++++++++++++---
>  tools/perf/util/synthetic-events.c      | 51 +++++++++++++++++++------
>  tools/perf/util/synthetic-events.h      |  1 +
>  5 files changed, 84 insertions(+), 17 deletions(-)
>
> diff --git a/tools/lib/perf/include/internal/evsel.h b/tools/lib/perf/include/internal/evsel.h
> index 2a912a1f1989..a99a75d9e78f 100644
> --- a/tools/lib/perf/include/internal/evsel.h
> +++ b/tools/lib/perf/include/internal/evsel.h
> @@ -30,6 +30,10 @@ struct perf_sample_id {
>         struct perf_cpu          cpu;
>         pid_t                    tid;
>
> +       /* Guest machine pid and VCPU, valid only if machine_pid is non-zero */
> +       pid_t                    machine_pid;
> +       struct perf_cpu          vcpu;
> +
>         /* Holds total ID period value for PERF_SAMPLE_READ processing. */
>         u64                      period;
>  };
> diff --git a/tools/lib/perf/include/perf/event.h b/tools/lib/perf/include/perf/event.h
> index 9f7ca070da87..c2dbd3e88885 100644
> --- a/tools/lib/perf/include/perf/event.h
> +++ b/tools/lib/perf/include/perf/event.h
> @@ -237,6 +237,11 @@ struct id_index_entry {
>         __u64                    tid;
>  };
>
> +struct id_index_entry_2 {
> +       __u64                    machine_pid;
> +       __u64                    vcpu;
> +};
> +
>  struct perf_record_id_index {
>         struct perf_event_header header;
>         __u64                    nr;
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 4c9513bc6d89..5141fe164e97 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -2756,18 +2756,35 @@ int perf_event__process_id_index(struct perf_session *session,
>  {
>         struct evlist *evlist = session->evlist;
>         struct perf_record_id_index *ie = &event->id_index;
> +       size_t sz = ie->header.size - sizeof(*ie);
>         size_t i, nr, max_nr;
> +       size_t e1_sz = sizeof(struct id_index_entry);
> +       size_t e2_sz = sizeof(struct id_index_entry_2);
> +       size_t etot_sz = e1_sz + e2_sz;
> +       struct id_index_entry_2 *e2;
>
> -       max_nr = (ie->header.size - sizeof(struct perf_record_id_index)) /
> -                sizeof(struct id_index_entry);
> +       max_nr = sz / e1_sz;
>         nr = ie->nr;
> -       if (nr > max_nr)
> +       if (nr > max_nr) {
> +               printf("Too big: nr %zu max_nr %zu\n", nr, max_nr);
>                 return -EINVAL;
> +       }
> +
> +       if (sz >= nr * etot_sz) {
> +               max_nr = sz / etot_sz;
> +               if (nr > max_nr) {
> +                       printf("Too big2: nr %zu max_nr %zu\n", nr, max_nr);
> +                       return -EINVAL;
> +               }
> +               e2 = (void *)ie + sizeof(*ie) + nr * e1_sz;
> +       } else {
> +               e2 = NULL;
> +       }
>
>         if (dump_trace)
>                 fprintf(stdout, " nr: %zu\n", nr);
>
> -       for (i = 0; i < nr; i++) {
> +       for (i = 0; i < nr; i++, (e2 ? e2++ : 0)) {
>                 struct id_index_entry *e = &ie->entries[i];
>                 struct perf_sample_id *sid;
>
> @@ -2775,15 +2792,28 @@ int perf_event__process_id_index(struct perf_session *session,
>                         fprintf(stdout, " ... id: %"PRI_lu64, e->id);
>                         fprintf(stdout, "  idx: %"PRI_lu64, e->idx);
>                         fprintf(stdout, "  cpu: %"PRI_ld64, e->cpu);
> -                       fprintf(stdout, "  tid: %"PRI_ld64"\n", e->tid);
> +                       fprintf(stdout, "  tid: %"PRI_ld64, e->tid);
> +                       if (e2) {
> +                               fprintf(stdout, "  machine_pid: %"PRI_ld64, e2->machine_pid);
> +                               fprintf(stdout, "  vcpu: %"PRI_lu64"\n", e2->vcpu);
> +                       } else {
> +                               fprintf(stdout, "\n");
> +                       }
>                 }
>
>                 sid = evlist__id2sid(evlist, e->id);
>                 if (!sid)
>                         return -ENOENT;
> +
>                 sid->idx = e->idx;
>                 sid->cpu.cpu = e->cpu;
>                 sid->tid = e->tid;
> +
> +               if (!e2)
> +                       continue;
> +
> +               sid->machine_pid = e2->machine_pid;
> +               sid->vcpu.cpu = e2->vcpu;
>         }
>         return 0;
>  }
> diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> index ed9623702f34..2ae59c03ae77 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -1759,19 +1759,26 @@ int perf_event__synthesize_id_sample(__u64 *array, u64 type, const struct perf_s
>         return (void *)array - (void *)start;
>  }
>
> -int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_t process,
> -                                   struct evlist *evlist, struct machine *machine)
> +int __perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_t process,
> +                                     struct evlist *evlist, struct machine *machine, size_t from)
>  {
>         union perf_event *ev;
>         struct evsel *evsel;
> -       size_t nr = 0, i = 0, sz, max_nr, n;
> +       size_t nr = 0, i = 0, sz, max_nr, n, pos;
> +       size_t e1_sz = sizeof(struct id_index_entry);
> +       size_t e2_sz = sizeof(struct id_index_entry_2);
> +       size_t etot_sz = e1_sz + e2_sz;
> +       bool e2_needed = false;
>         int err;
>
> -       max_nr = (UINT16_MAX - sizeof(struct perf_record_id_index)) /
> -                sizeof(struct id_index_entry);
> +       max_nr = (UINT16_MAX - sizeof(struct perf_record_id_index)) / etot_sz;
>
> -       evlist__for_each_entry(evlist, evsel)
> +       pos = 0;
> +       evlist__for_each_entry(evlist, evsel) {
> +               if (pos++ < from)
> +                       continue;
>                 nr += evsel->core.ids;
> +       }
>
>         if (!nr)
>                 return 0;
> @@ -1779,31 +1786,38 @@ int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_
>         pr_debug2("Synthesizing id index\n");
>
>         n = nr > max_nr ? max_nr : nr;
> -       sz = sizeof(struct perf_record_id_index) + n * sizeof(struct id_index_entry);
> +       sz = sizeof(struct perf_record_id_index) + n * etot_sz;
>         ev = zalloc(sz);
>         if (!ev)
>                 return -ENOMEM;
>
> +       sz = sizeof(struct perf_record_id_index) + n * e1_sz;
> +
>         ev->id_index.header.type = PERF_RECORD_ID_INDEX;
> -       ev->id_index.header.size = sz;
>         ev->id_index.nr = n;
>
> +       pos = 0;
>         evlist__for_each_entry(evlist, evsel) {
>                 u32 j;
>
> -               for (j = 0; j < evsel->core.ids; j++) {
> +               if (pos++ < from)
> +                       continue;
> +               for (j = 0; j < evsel->core.ids; j++, i++) {
>                         struct id_index_entry *e;
> +                       struct id_index_entry_2 *e2;
>                         struct perf_sample_id *sid;
>
>                         if (i >= n) {
> +                               ev->id_index.header.size = sz + (e2_needed ? n * e2_sz : 0);
>                                 err = process(tool, ev, NULL, machine);
>                                 if (err)
>                                         goto out_err;
>                                 nr -= n;
>                                 i = 0;
> +                               e2_needed = false;
>                         }
>
> -                       e = &ev->id_index.entries[i++];
> +                       e = &ev->id_index.entries[i];
>
>                         e->id = evsel->core.id[j];
>
> @@ -1816,11 +1830,18 @@ int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_
>                         e->idx = sid->idx;
>                         e->cpu = sid->cpu.cpu;
>                         e->tid = sid->tid;
> +
> +                       if (sid->machine_pid)
> +                               e2_needed = true;
> +
> +                       e2 = (void *)ev + sz;
> +                       e2[i].machine_pid = sid->machine_pid;
> +                       e2[i].vcpu        = sid->vcpu.cpu;
>                 }
>         }
>
> -       sz = sizeof(struct perf_record_id_index) + nr * sizeof(struct id_index_entry);
> -       ev->id_index.header.size = sz;
> +       sz = sizeof(struct perf_record_id_index) + nr * e1_sz;
> +       ev->id_index.header.size = sz + (e2_needed ? nr * e2_sz : 0);
>         ev->id_index.nr = nr;
>
>         err = process(tool, ev, NULL, machine);
> @@ -1830,6 +1851,12 @@ int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_
>         return err;
>  }
>
> +int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_t process,
> +                                   struct evlist *evlist, struct machine *machine)
> +{
> +       return __perf_event__synthesize_id_index(tool, process, evlist, machine, 0);
> +}
> +
>  int __machine__synthesize_threads(struct machine *machine, struct perf_tool *tool,
>                                   struct target *target, struct perf_thread_map *threads,
>                                   perf_event__handler_t process, bool needs_mmap,
> diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
> index b136ec3ec95d..81cb3d6af0b9 100644
> --- a/tools/perf/util/synthetic-events.h
> +++ b/tools/perf/util/synthetic-events.h
> @@ -55,6 +55,7 @@ int perf_event__synthesize_extra_attr(struct perf_tool *tool, struct evlist *evs
>  int perf_event__synthesize_extra_kmaps(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
>  int perf_event__synthesize_features(struct perf_tool *tool, struct perf_session *session, struct evlist *evlist, perf_event__handler_t process);
>  int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_t process, struct evlist *evlist, struct machine *machine);
> +int __perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_t process, struct evlist *evlist, struct machine *machine, size_t from);

Given there is only 1 use in the file defining the function, should
this just be static with no header file declaration?
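
For illustration only, the pattern being asked about is a file-local
helper that takes the extra 'from' argument, plus a public wrapper that
keeps the old signature. This is a minimal sketch with hypothetical names
(demo_evsel, demo_count_ids_from, demo_count_ids), not the perf code, and
it only models the "skip the first 'from' entries" counting loop:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an evsel: only the number of sample IDs it owns. */
struct demo_evsel {
	size_t ids;
};

/* File-local helper: count IDs, skipping the first 'from' entries. */
static size_t demo_count_ids_from(const struct demo_evsel *evsels,
				  size_t nr_evsels, size_t from)
{
	size_t pos, nr = 0;

	assert(evsels || !nr_evsels);
	for (pos = from; pos < nr_evsels; pos++)
		nr += evsels[pos].ids;
	return nr;
}

/* Public entry point: unchanged signature, always starts at entry 0. */
size_t demo_count_ids(const struct demo_evsel *evsels, size_t nr_evsels)
{
	return demo_count_ids_from(evsels, nr_evsels, 0);
}
```

With 'static', only the wrapper would need a declaration in the header.
That holds only as long as no other file needs the 'from' variant.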

Thanks,
Ian

>  int perf_event__synthesize_id_sample(__u64 *array, u64 type, const struct perf_sample *sample);
>  int perf_event__synthesize_kernel_mmap(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
>  int perf_event__synthesize_mmap_events(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine, bool mmap_data);
> --
> 2.25.1
>

* Re: [PATCH 11/35] perf session: Create guest machines from id_index
  2022-07-11  9:31 ` [PATCH 11/35] perf session: Create guest machines from id_index Adrian Hunter
@ 2022-07-19 17:51   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-19 17:51 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Now that id_index has machine_pid, use it to create guest machines.
> Create the guest machines with an idle thread because guest events
> for "swapper" will be possible.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Nothing obviously off to my unqualified eyes :-)

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/util/session.c | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
>
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 5141fe164e97..1af981d5ad3c 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -2751,6 +2751,24 @@ void perf_session__fprintf_info(struct perf_session *session, FILE *fp,
>         fprintf(fp, "# ========\n#\n");
>  }
>
> +static int perf_session__register_guest(struct perf_session *session, pid_t machine_pid)
> +{
> +       struct machine *machine = machines__findnew(&session->machines, machine_pid);
> +       struct thread *thread;
> +
> +       if (!machine)
> +               return -ENOMEM;
> +
> +       machine->single_address_space = session->machines.host.single_address_space;
> +
> +       thread = machine__idle_thread(machine);
> +       if (!thread)
> +               return -ENOMEM;
> +       thread__put(thread);
> +
> +       return 0;
> +}
> +
>  int perf_event__process_id_index(struct perf_session *session,
>                                  union perf_event *event)
>  {
> @@ -2762,6 +2780,7 @@ int perf_event__process_id_index(struct perf_session *session,
>         size_t e2_sz = sizeof(struct id_index_entry_2);
>         size_t etot_sz = e1_sz + e2_sz;
>         struct id_index_entry_2 *e2;
> +       pid_t last_pid = 0;
>
>         max_nr = sz / e1_sz;
>         nr = ie->nr;
> @@ -2787,6 +2806,7 @@ int perf_event__process_id_index(struct perf_session *session,
>         for (i = 0; i < nr; i++, (e2 ? e2++ : 0)) {
>                 struct id_index_entry *e = &ie->entries[i];
>                 struct perf_sample_id *sid;
> +               int ret;
>
>                 if (dump_trace) {
>                         fprintf(stdout, " ... id: %"PRI_lu64, e->id);
> @@ -2814,6 +2834,17 @@ int perf_event__process_id_index(struct perf_session *session,
>
>                 sid->machine_pid = e2->machine_pid;
>                 sid->vcpu.cpu = e2->vcpu;
> +
> +               if (!sid->machine_pid)
> +                       continue;
> +
> +               if (sid->machine_pid != last_pid) {
> +                       ret = perf_session__register_guest(session, sid->machine_pid);
> +                       if (ret)
> +                               return ret;
> +                       last_pid = sid->machine_pid;
> +                       perf_guest = true;
> +               }
>         }
>         return 0;
>  }
> --
> 2.25.1
>

* Re: [PATCH 01/35] perf tools: Fix dso_id inode generation comparison
  2022-07-19 15:13       ` Ian Rogers
@ 2022-07-19 19:16         ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 83+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-07-19 19:16 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Adrian Hunter, Jiri Olsa, Namhyung Kim, Andi Kleen, linux-kernel, kvm

Em Tue, Jul 19, 2022 at 08:13:18AM -0700, Ian Rogers escreveu:
> On Tue, Jul 19, 2022 at 3:18 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
> >
> > On 18/07/22 17:57, Arnaldo Carvalho de Melo wrote:
> > > Em Mon, Jul 11, 2022 at 12:31:44PM +0300, Adrian Hunter escreveu:
> > >> Synthesized MMAP events have zero ino_generation, so do not compare zero
> > >> values.
> > >>
> > >> Fixes: 0e3149f86b99 ("perf dso: Move dso_id from 'struct map' to 'struct dso'")
> > >> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> > >> ---
> > >>  tools/perf/util/dsos.c | 10 ++++++++--
> > >>  1 file changed, 8 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/tools/perf/util/dsos.c b/tools/perf/util/dsos.c
> > >> index b97366f77bbf..839a1f384733 100644
> > >> --- a/tools/perf/util/dsos.c
> > >> +++ b/tools/perf/util/dsos.c
> > >> @@ -23,8 +23,14 @@ static int __dso_id__cmp(struct dso_id *a, struct dso_id *b)
> > >>      if (a->ino > b->ino) return -1;
> > >>      if (a->ino < b->ino) return 1;
> > >>
> > >> -    if (a->ino_generation > b->ino_generation) return -1;
> > >> -    if (a->ino_generation < b->ino_generation) return 1;
> > >> +    /*
> > >> +     * Synthesized MMAP events have zero ino_generation, so do not compare
> > >> +     * zero values.
> > >> +     */
> > >> +    if (a->ino_generation && b->ino_generation) {
> > >> +            if (a->ino_generation > b->ino_generation) return -1;
> > >> +            if (a->ino_generation < b->ino_generation) return 1;
> > >> +    }
> > >
> > > > But comparing didn't harm, right? When it's !0 now we may have three
> > > > comparisons instead of 2 :-\
> > >
> > > The comment has some value tho, so I'm merging this :-)
> >
> > Thanks. I found it harmful because the mismatch resulted in a new
> > dso that did not have a build ID whereas the original dso did have
> > a build ID.  The build ID was essential because the object was not
> > found otherwise.
> 
> That's good to know, could we add that also to the comment? Perhaps:
> 
> Synthesized MMAP events have zero ino_generation, avoid comparing them
> with MMAP events with actual ino_generation.

I see now, thanks, adding this comment.

- Arnaldo
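
To make the behaviour discussed above concrete, here is a minimal
standalone sketch of the comparison. It assumes a cut-down struct with only
the two fields under discussion (the name demo_dso_id is hypothetical, and
the real struct dso_id has more members), and it keeps the ordering
convention of the quoted hunk:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for perf's struct dso_id: only the fields discussed. */
struct demo_dso_id {
	uint64_t ino;
	uint64_t ino_generation;
};

/* Same convention as the quoted hunk: -1 when a > b, 1 when a < b. */
static int demo_dso_id_cmp(const struct demo_dso_id *a,
			   const struct demo_dso_id *b)
{
	assert(a && b);

	if (a->ino > b->ino) return -1;
	if (a->ino < b->ino) return 1;

	/*
	 * Synthesized MMAP events have zero ino_generation, so avoid comparing
	 * them with MMAP events that carry an actual ino_generation.
	 */
	if (a->ino_generation && b->ino_generation) {
		if (a->ino_generation > b->ino_generation) return -1;
		if (a->ino_generation < b->ino_generation) return 1;
	}

	return 0;
}
```

A synthesized entry (ino_generation == 0) then compares equal to a real
entry with the same inode, so the existing dso, and its build ID, is
reused instead of a new dso being created.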

* Re: [PATCH 12/35] perf tools: Add guest_cpu to hypervisor threads
  2022-07-11  9:31 ` [PATCH 12/35] perf tools: Add guest_cpu to hypervisor threads Adrian Hunter
@ 2022-07-20  0:23   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  0:23 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> It is possible to know which guest machine was running at a point in time
> based on the PID of the currently running host thread. That is, perf
> identifies guest machines by the PID of the hypervisor.
>
> To determine the guest CPU, put it on the hypervisor (QEMU) thread for
> that VCPU.
>
> This is done when processing the id_index which provides the necessary
> information.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/util/session.c | 18 ++++++++++++++++++
>  tools/perf/util/thread.c  |  1 +
>  tools/perf/util/thread.h  |  1 +
>  3 files changed, 20 insertions(+)
>
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 1af981d5ad3c..91a091c35945 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -2769,6 +2769,20 @@ static int perf_session__register_guest(struct perf_session *session, pid_t mach
>         return 0;
>  }
>
> +static int perf_session__set_guest_cpu(struct perf_session *session, pid_t pid,
> +                                      pid_t tid, int guest_cpu)
> +{
> +       struct machine *machine = &session->machines.host;
> +       struct thread *thread = machine__findnew_thread(machine, pid, tid);
> +
> +       if (!thread)
> +               return -ENOMEM;
> +       thread->guest_cpu = guest_cpu;
> +       thread__put(thread);
> +
> +       return 0;
> +}
> +
>  int perf_event__process_id_index(struct perf_session *session,
>                                  union perf_event *event)
>  {
> @@ -2845,6 +2859,10 @@ int perf_event__process_id_index(struct perf_session *session,
>                         last_pid = sid->machine_pid;
>                         perf_guest = true;
>                 }
> +
> +               ret = perf_session__set_guest_cpu(session, sid->machine_pid, e->tid, e2->vcpu);
> +               if (ret)
> +                       return ret;
>         }
>         return 0;
>  }
> diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
> index 665e5c0618ed..e3e5427e1c3c 100644
> --- a/tools/perf/util/thread.c
> +++ b/tools/perf/util/thread.c
> @@ -47,6 +47,7 @@ struct thread *thread__new(pid_t pid, pid_t tid)
>                 thread->tid = tid;
>                 thread->ppid = -1;
>                 thread->cpu = -1;
> +               thread->guest_cpu = -1;
>                 thread->lbr_stitch_enable = false;
>                 INIT_LIST_HEAD(&thread->namespaces_list);
>                 INIT_LIST_HEAD(&thread->comm_list);
> diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
> index b066fb30d203..241f300d7d6e 100644
> --- a/tools/perf/util/thread.h
> +++ b/tools/perf/util/thread.h
> @@ -39,6 +39,7 @@ struct thread {
>         pid_t                   tid;
>         pid_t                   ppid;
>         int                     cpu;
> +       int                     guest_cpu; /* For QEMU thread */

Could we tweak the comments here to be something like:

int cpu;  /* The CPU the thread is currently running on or the CPU of
the hypervisor thread. */
int guest_cpu; /* The CPU within a guest (QEMU) that's running. */

Does -1 convey meaning beyond uninitialized, like with the 'any' CPU
perf_event_open argument?

Thanks,
Ian


>         refcount_t              refcnt;
>         bool                    comm_set;
>         int                     comm_len;
> --
> 2.25.1
>

* Re: [PATCH 13/35] perf tools: Add machine_pid and vcpu to perf_sample
  2022-07-11  9:31 ` [PATCH 13/35] perf tools: Add machine_pid and vcpu to perf_sample Adrian Hunter
@ 2022-07-20  0:36   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  0:36 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> When parsing a sample with a sample ID, copy machine_pid and vcpu from
> perf_sample_id to perf_sample.
>
> Note, machine_pid will be zero when unused, so only a non-zero value
> represents a guest machine. vcpu should be ignored if machine_pid is zero.
>
> Note also, machine_pid is used with events that have come from injecting a
> guest perf.data file, however guest events recorded on the host (i.e. using
> perf kvm) have the (QEMU) hypervisor process pid to identify them - refer to
> machines__find_for_cpumode().
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian
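
As a reading aid for the rule in the commit message (machine_pid == 0
means "not a guest", and vcpu must then be ignored), a minimal sketch
follows. The structs and helpers are hypothetical, cut-down stand-ins
(demo_sample_id, demo_sample, demo_sample_*), not the perf definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-ins for perf_sample_id and perf_sample. */
struct demo_sample_id {
	uint32_t machine_pid;	/* 0 means "not a guest" */
	int      vcpu;
};

struct demo_sample {
	uint32_t machine_pid;
	uint32_t vcpu;
};

/* Defaults before parsing: no guest, vcpu set to the "unset" pattern (-1). */
static void demo_sample_init(struct demo_sample *s)
{
	assert(s);
	s->machine_pid = 0;
	s->vcpu = (uint32_t)-1;
}

/* Copy guest identity from the sample ID, if one was found for the sample. */
static void demo_sample_set_guest(struct demo_sample *s,
				  const struct demo_sample_id *sid)
{
	assert(s);
	if (sid) {
		s->machine_pid = sid->machine_pid;
		s->vcpu = (uint32_t)sid->vcpu;
	}
}

/* Returns 1 and stores the vCPU only for guest samples; otherwise returns 0. */
static int demo_sample_guest_vcpu(const struct demo_sample *s, uint32_t *vcpu)
{
	assert(s && vcpu);
	if (!s->machine_pid)
		return 0;	/* host sample: vcpu must be ignored */
	*vcpu = s->vcpu;
	return 1;
}
```

Consumers that want the vCPU would go through a check like
demo_sample_guest_vcpu() rather than reading vcpu directly.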

> ---
>  tools/perf/util/event.h  |  2 ++
>  tools/perf/util/evlist.c | 14 +++++++++++++-
>  tools/perf/util/evsel.c  |  1 +
>  3 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> index cdd72e05fd28..a660f304f83c 100644
> --- a/tools/perf/util/event.h
> +++ b/tools/perf/util/event.h
> @@ -148,6 +148,8 @@ struct perf_sample {
>         u64 code_page_size;
>         u64 cgroup;
>         u32 flags;
> +       u32 machine_pid;
> +       u32 vcpu;
>         u16 insn_len;
>         u8  cpumode;
>         u16 misc;
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 03fbe151b0c4..64f5a8074c0c 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -1507,10 +1507,22 @@ int evlist__start_workload(struct evlist *evlist)
>  int evlist__parse_sample(struct evlist *evlist, union perf_event *event, struct perf_sample *sample)
>  {
>         struct evsel *evsel = evlist__event2evsel(evlist, event);
> +       int ret;
>
>         if (!evsel)
>                 return -EFAULT;
> -       return evsel__parse_sample(evsel, event, sample);
> +       ret = evsel__parse_sample(evsel, event, sample);
> +       if (ret)
> +               return ret;
> +       if (perf_guest && sample->id) {
> +               struct perf_sample_id *sid = evlist__id2sid(evlist, sample->id);
> +
> +               if (sid) {
> +                       sample->machine_pid = sid->machine_pid;
> +                       sample->vcpu = sid->vcpu.cpu;
> +               }
> +       }
> +       return 0;
>  }
>
>  int evlist__parse_sample_timestamp(struct evlist *evlist, union perf_event *event, u64 *timestamp)
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 9a30ccb7b104..14396ea5a968 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -2365,6 +2365,7 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
>         data->misc    = event->header.misc;
>         data->id = -1ULL;
>         data->data_src = PERF_MEM_DATA_SRC_NONE;
> +       data->vcpu = -1;
>
>         if (event->header.type != PERF_RECORD_SAMPLE) {
>                 if (!evsel->core.attr.sample_id_all)
> --
> 2.25.1
>

* Re: [PATCH 14/35] perf tools: Use sample->machine_pid to find guest machine
  2022-07-11  9:31 ` [PATCH 14/35] perf tools: Use sample->machine_pid to find guest machine Adrian Hunter
@ 2022-07-20  0:37   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  0:37 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> If machine_pid is set, use it to find the guest machine.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/util/session.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 91a091c35945..f3e9fa557bc9 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -1418,7 +1418,9 @@ static struct machine *machines__find_for_cpumode(struct machines *machines,
>              (sample->cpumode == PERF_RECORD_MISC_GUEST_USER))) {
>                 u32 pid;
>
> -               if (event->header.type == PERF_RECORD_MMAP
> +               if (sample->machine_pid)
> +                       pid = sample->machine_pid;
> +               else if (event->header.type == PERF_RECORD_MMAP
>                     || event->header.type == PERF_RECORD_MMAP2)
>                         pid = event->mmap.pid;
>                 else
> --
> 2.25.1
>

* Re: [PATCH 15/35] perf script: Add machine_pid and vcpu
  2022-07-11  9:31 ` [PATCH 15/35] perf script: Add machine_pid and vcpu Adrian Hunter
@ 2022-07-20  0:39   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  0:39 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Add fields machine_pid and vcpu. These are displayed only if machine_pid is
> non-zero.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/Documentation/perf-script.txt |  7 ++++++-
>  tools/perf/builtin-script.c              | 11 +++++++++++
>  2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
> index e250ff5566cf..c09cc44e50ee 100644
> --- a/tools/perf/Documentation/perf-script.txt
> +++ b/tools/perf/Documentation/perf-script.txt
> @@ -133,7 +133,8 @@ OPTIONS
>          comm, tid, pid, time, cpu, event, trace, ip, sym, dso, addr, symoff,
>          srcline, period, iregs, uregs, brstack, brstacksym, flags, bpf-output,
>          brstackinsn, brstackinsnlen, brstackoff, callindent, insn, insnlen, synth,
> -        phys_addr, metric, misc, srccode, ipc, data_page_size, code_page_size, ins_lat.
> +        phys_addr, metric, misc, srccode, ipc, data_page_size, code_page_size, ins_lat,
> +        machine_pid, vcpu.
>          Field list can be prepended with the type, trace, sw or hw,
>          to indicate to which event type the field list applies.
>          e.g., -F sw:comm,tid,time,ip,sym  and -F trace:time,cpu,trace
> @@ -226,6 +227,10 @@ OPTIONS
>         The ipc (instructions per cycle) field is synthesized and may have a value when
>         Instruction Trace decoding.
>
> +       The machine_pid and vcpu fields are derived from data resulting from using
> +       perf insert to insert a perf.data file recorded inside a virtual machine into

Presumably 'perf inject' ?

Thanks,
Ian
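
For reference, the output prefix added in the builtin-script.c hunk of
this patch can be reproduced in isolation. This is a minimal sketch with a
hypothetical helper name (demo_format_guest_prefix). It assumes both the
machine_pid and vcpu fields were selected, and it reuses the two format
strings from the patch ("VM:%5d " and "VCPU:%03d "):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the guest prefix into buf. Nothing is written for host samples,
 * i.e. when machine_pid is zero. Returns the snprintf() result, or 0.
 */
static int demo_format_guest_prefix(char *buf, size_t len,
				    int machine_pid, int vcpu)
{
	assert(buf && len);

	buf[0] = '\0';
	if (!machine_pid)
		return 0;

	return snprintf(buf, len, "VM:%5d VCPU:%03d ", machine_pid, vcpu);
}
```

So a sample from machine_pid 1234 on vCPU 2 would start with
"VM: 1234 VCPU:002 ", and host samples get no prefix at all.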

> +       a perf.data file recorded on the host at the same time.
> +
>         Finally, a user may not set fields to none for all event types.
>         i.e., -F "" is not allowed.
>
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index 4b00a50faf00..ac19fee62d8e 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -125,6 +125,8 @@ enum perf_output_field {
>         PERF_OUTPUT_CODE_PAGE_SIZE  = 1ULL << 34,
>         PERF_OUTPUT_INS_LAT         = 1ULL << 35,
>         PERF_OUTPUT_BRSTACKINSNLEN  = 1ULL << 36,
> +       PERF_OUTPUT_MACHINE_PID     = 1ULL << 37,
> +       PERF_OUTPUT_VCPU            = 1ULL << 38,
>  };
>
>  struct perf_script {
> @@ -193,6 +195,8 @@ struct output_option {
>         {.str = "code_page_size", .field = PERF_OUTPUT_CODE_PAGE_SIZE},
>         {.str = "ins_lat", .field = PERF_OUTPUT_INS_LAT},
>         {.str = "brstackinsnlen", .field = PERF_OUTPUT_BRSTACKINSNLEN},
> +       {.str = "machine_pid", .field = PERF_OUTPUT_MACHINE_PID},
> +       {.str = "vcpu", .field = PERF_OUTPUT_VCPU},
>  };
>
>  enum {
> @@ -746,6 +750,13 @@ static int perf_sample__fprintf_start(struct perf_script *script,
>         int printed = 0;
>         char tstr[128];
>
> +       if (PRINT_FIELD(MACHINE_PID) && sample->machine_pid)
> +               printed += fprintf(fp, "VM:%5d ", sample->machine_pid);
> +
> +       /* Print VCPU only for guest events i.e. with machine_pid */
> +       if (PRINT_FIELD(VCPU) && sample->machine_pid)
> +               printed += fprintf(fp, "VCPU:%03d ", sample->vcpu);
> +
>         if (PRINT_FIELD(COMM)) {
>                 const char *comm = thread ? thread__comm_str(thread) : ":-1";
>
> --
> 2.25.1
>

* Re: [PATCH 16/35] perf dlfilter: Add machine_pid and vcpu
  2022-07-11  9:31 ` [PATCH 16/35] perf dlfilter: " Adrian Hunter
@ 2022-07-20  0:42   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  0:42 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Add machine_pid and vcpu to struct perf_dlfilter_sample. The 'size' can be
> used to determine if the values are present, however machine_pid is zero if
> unused in any case. vcpu should be ignored if machine_pid is zero.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/Documentation/perf-dlfilter.txt | 22 ++++++++++++++++++++++
>  tools/perf/include/perf/perf_dlfilter.h    |  8 ++++++++
>  tools/perf/util/dlfilter.c                 |  2 ++
>  3 files changed, 32 insertions(+)
>
> diff --git a/tools/perf/Documentation/perf-dlfilter.txt b/tools/perf/Documentation/perf-dlfilter.txt
> index 594f5a5a0c9e..fb22e3b31dc5 100644
> --- a/tools/perf/Documentation/perf-dlfilter.txt
> +++ b/tools/perf/Documentation/perf-dlfilter.txt
> @@ -107,9 +107,31 @@ struct perf_dlfilter_sample {
>         __u64 raw_callchain_nr; /* Number of raw_callchain entries */
>         const __u64 *raw_callchain; /* Refer <linux/perf_event.h> */
>         const char *event;
> +       __s32 machine_pid;
> +       __s32 vcpu;
>  };
>  ----
>
> +Note: 'machine_pid' and 'vcpu' are not original members, but were added together later.
> +'size' can be used to determine their presence at run time.
> +PERF_DLFILTER_HAS_MACHINE_PID will be defined if they are present at compile time.
> +For example:
> +[source,c]
> +----
> +#include <perf/perf_dlfilter.h>
> +#include <stddef.h>
> +#include <stdbool.h>
> +
> +static inline bool have_machine_pid(const struct perf_dlfilter_sample *sample)
> +{
> +#ifdef PERF_DLFILTER_HAS_MACHINE_PID
> +       return sample->size >= offsetof(struct perf_dlfilter_sample, vcpu) + sizeof(sample->vcpu);
> +#else
> +       return false;
> +#endif
> +}
> +----
> +
>  The perf_dlfilter_fns structure
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> diff --git a/tools/perf/include/perf/perf_dlfilter.h b/tools/perf/include/perf/perf_dlfilter.h
> index 3eef03d661b4..a26e2f129f83 100644
> --- a/tools/perf/include/perf/perf_dlfilter.h
> +++ b/tools/perf/include/perf/perf_dlfilter.h
> @@ -9,6 +9,12 @@
>  #include <linux/perf_event.h>
>  #include <linux/types.h>
>
> +/*
> + * The following macro can be used to determine if this header defines
> + * perf_dlfilter_sample machine_pid and vcpu.
> + */
> +#define PERF_DLFILTER_HAS_MACHINE_PID
> +
>  /* Definitions for perf_dlfilter_sample flags */
>  enum {
>         PERF_DLFILTER_FLAG_BRANCH       = 1ULL << 0,
> @@ -62,6 +68,8 @@ struct perf_dlfilter_sample {
>         __u64 raw_callchain_nr; /* Number of raw_callchain entries */
>         const __u64 *raw_callchain; /* Refer <linux/perf_event.h> */
>         const char *event;
> +       __s32 machine_pid;
> +       __s32 vcpu;
>  };
>
>  /*
> diff --git a/tools/perf/util/dlfilter.c b/tools/perf/util/dlfilter.c
> index db964d5a52af..54e4d4495e00 100644
> --- a/tools/perf/util/dlfilter.c
> +++ b/tools/perf/util/dlfilter.c
> @@ -495,6 +495,8 @@ int dlfilter__do_filter_event(struct dlfilter *d,
>         ASSIGN(misc);
>         ASSIGN(raw_size);
>         ASSIGN(raw_data);
> +       ASSIGN(machine_pid);
> +       ASSIGN(vcpu);
>
>         if (sample->branch_stack) {
>                 d_sample.brstack_nr = sample->branch_stack->nr;
> --
> 2.25.1
>

* Re: [PATCH 17/35] perf auxtrace: Add machine_pid and vcpu to auxtrace_error
  2022-07-11  9:32 ` [PATCH 17/35] perf auxtrace: Add machine_pid and vcpu to auxtrace_error Adrian Hunter
@ 2022-07-20  0:43   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  0:43 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Add machine_pid and vcpu to struct perf_record_auxtrace_error. The existing
> fmt member is used to identify the new format.
>
> The new members make it possible to easily differentiate errors from guest
> machines.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian
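
A minimal sketch of how the record size follows from the format, based on
the auxtrace.c hunk below. The struct is a hypothetical, cut-down layout
(demo_auxtrace_error), and DEMO_MAX_MSG is an assumed stand-in for
MAX_AUXTRACE_ERROR_MSG. Treating fmt 1 as the default is also an
assumption, since the hunk only shows fmt being set to 2:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_MSG 64	/* assumed stand-in for MAX_AUXTRACE_ERROR_MSG */

/* Hypothetical, cut-down layout: new members sit after the message. */
struct demo_auxtrace_error {
	uint32_t fmt;
	uint32_t reserved;
	uint64_t time;
	char     msg[DEMO_MAX_MSG];
	uint32_t machine_pid;
	uint32_t vcpu;
};

/* Fill in the record and return its size, rounded up to a multiple of 8. */
static size_t demo_synth_error(struct demo_auxtrace_error *e, const char *msg,
			       uint32_t machine_pid, uint32_t vcpu)
{
	size_t size;

	assert(e && msg);
	memset(e, 0, sizeof(*e));
	e->fmt = 1;	/* assumed default format */
	snprintf(e->msg, sizeof(e->msg), "%s", msg);

	if (machine_pid) {
		/* Guest error: the trailing members must be included. */
		e->fmt = 2;
		e->machine_pid = machine_pid;
		e->vcpu = vcpu;
		size = sizeof(*e);
	} else {
		/* Host error: record ends right after the message string. */
		size = offsetof(struct demo_auxtrace_error, msg) +
		       strlen(e->msg) + 1;
	}
	return (size + 7) & ~(size_t)7;
}
```

A reader can then rely on fmt >= 2 to know that machine_pid and vcpu are
present, which is what the fprintf and byte-swap hunks check.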

> ---
>  tools/lib/perf/include/perf/event.h           |  2 ++
>  tools/perf/util/auxtrace.c                    | 30 +++++++++++++++----
>  tools/perf/util/auxtrace.h                    |  4 +++
>  .../scripting-engines/trace-event-python.c    |  4 ++-
>  tools/perf/util/session.c                     |  4 +++
>  5 files changed, 37 insertions(+), 7 deletions(-)
>
> diff --git a/tools/lib/perf/include/perf/event.h b/tools/lib/perf/include/perf/event.h
> index c2dbd3e88885..556bb06798f2 100644
> --- a/tools/lib/perf/include/perf/event.h
> +++ b/tools/lib/perf/include/perf/event.h
> @@ -279,6 +279,8 @@ struct perf_record_auxtrace_error {
>         __u64                    ip;
>         __u64                    time;
>         char                     msg[MAX_AUXTRACE_ERROR_MSG];
> +       __u32                    machine_pid;
> +       __u32                    vcpu;
>  };
>
>  struct perf_record_aux {
> diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
> index 511dd3caa1bc..6edab8a16de6 100644
> --- a/tools/perf/util/auxtrace.c
> +++ b/tools/perf/util/auxtrace.c
> @@ -1189,9 +1189,10 @@ void auxtrace_buffer__free(struct auxtrace_buffer *buffer)
>         free(buffer);
>  }
>
> -void auxtrace_synth_error(struct perf_record_auxtrace_error *auxtrace_error, int type,
> -                         int code, int cpu, pid_t pid, pid_t tid, u64 ip,
> -                         const char *msg, u64 timestamp)
> +void auxtrace_synth_guest_error(struct perf_record_auxtrace_error *auxtrace_error, int type,
> +                               int code, int cpu, pid_t pid, pid_t tid, u64 ip,
> +                               const char *msg, u64 timestamp,
> +                               pid_t machine_pid, int vcpu)
>  {
>         size_t size;
>
> @@ -1207,12 +1208,26 @@ void auxtrace_synth_error(struct perf_record_auxtrace_error *auxtrace_error, int
>         auxtrace_error->ip = ip;
>         auxtrace_error->time = timestamp;
>         strlcpy(auxtrace_error->msg, msg, MAX_AUXTRACE_ERROR_MSG);
> -
> -       size = (void *)auxtrace_error->msg - (void *)auxtrace_error +
> -              strlen(auxtrace_error->msg) + 1;
> +       if (machine_pid) {
> +               auxtrace_error->fmt = 2;
> +               auxtrace_error->machine_pid = machine_pid;
> +               auxtrace_error->vcpu = vcpu;
> +               size = sizeof(*auxtrace_error);
> +       } else {
> +               size = (void *)auxtrace_error->msg - (void *)auxtrace_error +
> +                      strlen(auxtrace_error->msg) + 1;
> +       }
>         auxtrace_error->header.size = PERF_ALIGN(size, sizeof(u64));
>  }
>
> +void auxtrace_synth_error(struct perf_record_auxtrace_error *auxtrace_error, int type,
> +                         int code, int cpu, pid_t pid, pid_t tid, u64 ip,
> +                         const char *msg, u64 timestamp)
> +{
> +       auxtrace_synth_guest_error(auxtrace_error, type, code, cpu, pid, tid,
> +                                  ip, msg, timestamp, 0, -1);
> +}
> +
>  int perf_event__synthesize_auxtrace_info(struct auxtrace_record *itr,
>                                          struct perf_tool *tool,
>                                          struct perf_session *session,
> @@ -1662,6 +1677,9 @@ size_t perf_event__fprintf_auxtrace_error(union perf_event *event, FILE *fp)
>         if (!e->fmt)
>                 msg = (const char *)&e->time;
>
> +       if (e->fmt >= 2 && e->machine_pid)
> +               ret += fprintf(fp, " machine_pid %d vcpu %d", e->machine_pid, e->vcpu);
> +
>         ret += fprintf(fp, " cpu %d pid %d tid %d ip %#"PRI_lx64" code %u: %s\n",
>                        e->cpu, e->pid, e->tid, e->ip, e->code, msg);
>         return ret;
> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
> index cd0d25c2751c..6a4fbfd34c6b 100644
> --- a/tools/perf/util/auxtrace.h
> +++ b/tools/perf/util/auxtrace.h
> @@ -595,6 +595,10 @@ int auxtrace_index__process(int fd, u64 size, struct perf_session *session,
>                             bool needs_swap);
>  void auxtrace_index__free(struct list_head *head);
>
> +void auxtrace_synth_guest_error(struct perf_record_auxtrace_error *auxtrace_error, int type,
> +                               int code, int cpu, pid_t pid, pid_t tid, u64 ip,
> +                               const char *msg, u64 timestamp,
> +                               pid_t machine_pid, int vcpu);
>  void auxtrace_synth_error(struct perf_record_auxtrace_error *auxtrace_error, int type,
>                           int code, int cpu, pid_t pid, pid_t tid, u64 ip,
>                           const char *msg, u64 timestamp);
> diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
> index adba01b7d9dd..3367c5479199 100644
> --- a/tools/perf/util/scripting-engines/trace-event-python.c
> +++ b/tools/perf/util/scripting-engines/trace-event-python.c
> @@ -1559,7 +1559,7 @@ static void python_process_auxtrace_error(struct perf_session *session __maybe_u
>                 msg = (const char *)&e->time;
>         }
>
> -       t = tuple_new(9);
> +       t = tuple_new(11);
>
>         tuple_set_u32(t, 0, e->type);
>         tuple_set_u32(t, 1, e->code);
> @@ -1570,6 +1570,8 @@ static void python_process_auxtrace_error(struct perf_session *session __maybe_u
>         tuple_set_u64(t, 6, tm);
>         tuple_set_string(t, 7, msg);
>         tuple_set_u32(t, 8, cpumode);
> +       tuple_set_s32(t, 9, e->machine_pid);
> +       tuple_set_s32(t, 10, e->vcpu);
>
>         call_object(handler, t, handler_name);
>
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index f3e9fa557bc9..7ea0b91013ea 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -895,6 +895,10 @@ static void perf_event__auxtrace_error_swap(union perf_event *event,
>         event->auxtrace_error.ip   = bswap_64(event->auxtrace_error.ip);
>         if (event->auxtrace_error.fmt)
>                 event->auxtrace_error.time = bswap_64(event->auxtrace_error.time);
> +       if (event->auxtrace_error.fmt >= 2) {
> +               event->auxtrace_error.machine_pid = bswap_32(event->auxtrace_error.machine_pid);
> +               event->auxtrace_error.vcpu = bswap_32(event->auxtrace_error.vcpu);
> +       }
>  }
>
>  static void perf_event__thread_map_swap(union perf_event *event,
> --
> 2.25.1
>
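
For script authors, here is a minimal sketch (Python, illustration only, not part of the patch) of a handler body that accepts both the old 9-element and the new 11-element auxtrace error tuple. The function name describe_auxtrace_error is hypothetical, and the argument order is assumed to follow the tuple built in python_process_auxtrace_error() above.

```python
def describe_auxtrace_error(typ, code, cpu, pid, tid, ip, ts, msg, cpumode, *x):
    """Return a one-line description of an auxtrace error.

    x holds any trailing arguments: (machine_pid, vcpu) when perf passes
    the 11-element tuple, nothing when it passes the 9-element one.
    """
    if len(x) >= 2 and x[0]:
        # A non-zero machine_pid marks an error attributed to a guest.
        where = "machine_pid %d vcpu %d " % (x[0], x[1])
    else:
        where = ""
    return "%scpu %d pid %d tid %d ip %#x code %u: %s" % (
        where, cpu, pid, tid, ip, code, msg)
```

Taking the extra fields through *x should keep such a script usable with a perf binary that predates this change.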


* Re: [PATCH 18/35] perf script python: Add machine_pid and vcpu
  2022-07-11  9:32 ` [PATCH 18/35] perf script python: Add machine_pid and vcpu Adrian Hunter
@ 2022-07-20  0:43   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  0:43 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Add machine_pid and vcpu to python sample events and context switch events.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  .../perf/util/scripting-engines/trace-event-python.c  | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
> index 3367c5479199..5bbc1b16f368 100644
> --- a/tools/perf/util/scripting-engines/trace-event-python.c
> +++ b/tools/perf/util/scripting-engines/trace-event-python.c
> @@ -861,6 +861,13 @@ static PyObject *get_perf_sample_dict(struct perf_sample *sample,
>         brstacksym = python_process_brstacksym(sample, al->thread);
>         pydict_set_item_string_decref(dict, "brstacksym", brstacksym);
>
> +       if (sample->machine_pid) {
> +               pydict_set_item_string_decref(dict_sample, "machine_pid",
> +                               _PyLong_FromLong(sample->machine_pid));
> +               pydict_set_item_string_decref(dict_sample, "vcpu",
> +                               _PyLong_FromLong(sample->vcpu));
> +       }
> +
>         pydict_set_item_string_decref(dict_sample, "cpumode",
>                         _PyLong_FromLong((unsigned long)sample->cpumode));
>
> @@ -1509,7 +1516,7 @@ static void python_do_process_switch(union perf_event *event,
>                 np_tid = event->context_switch.next_prev_tid;
>         }
>
> -       t = tuple_new(9);
> +       t = tuple_new(11);
>         if (!t)
>                 return;
>
> @@ -1522,6 +1529,8 @@ static void python_do_process_switch(union perf_event *event,
>         tuple_set_s32(t, 6, machine->pid);
>         tuple_set_bool(t, 7, out);
>         tuple_set_bool(t, 8, out_preempt);
> +       tuple_set_s32(t, 9, sample->machine_pid);
> +       tuple_set_s32(t, 10, sample->vcpu);
>
>         call_object(handler, t, handler_name);
>
> --
> 2.25.1
>
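
As a hedged illustration of how a script might consume the new keys: the patch only adds "machine_pid" and "vcpu" to the sample dictionary when sample->machine_pid is non-zero, so their absence can be read as "host". The helper name guest_tag is hypothetical, and its argument is assumed to be the "sample" dictionary that process_event() receives.

```python
def guest_tag(sample):
    """Describe where a sample came from, based on the optional guest keys."""
    machine_pid = sample.get("machine_pid")
    if not machine_pid:
        # Key missing (or zero): the sample is not attributed to a guest.
        return "host"
    # Both keys are set together in the patch; -1 is only a defensive default.
    return "guest machine_pid=%d vcpu=%d" % (machine_pid, sample.get("vcpu", -1))
```

For example, guest_tag({"pid": 1}) gives "host", while a dictionary carrying machine_pid 1234 and vcpu 2 gives "guest machine_pid=1234 vcpu=2".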


* Re: [PATCH 19/35] perf script python: intel-pt-events: Add machine_pid and vcpu
  2022-07-11  9:32 ` [PATCH 19/35] perf script python: intel-pt-events: " Adrian Hunter
@ 2022-07-20  0:44   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  0:44 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Add machine_pid and vcpu to the intel-pt-events.py script.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/scripts/python/intel-pt-events.py | 32 +++++++++++++++++---
>  1 file changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/tools/perf/scripts/python/intel-pt-events.py b/tools/perf/scripts/python/intel-pt-events.py
> index 9b7746b89381..6be7fd8fd615 100644
> --- a/tools/perf/scripts/python/intel-pt-events.py
> +++ b/tools/perf/scripts/python/intel-pt-events.py
> @@ -197,7 +197,12 @@ def common_start_str(comm, sample):
>         cpu = sample["cpu"]
>         pid = sample["pid"]
>         tid = sample["tid"]
> -       return "%16s %5u/%-5u [%03u] %9u.%09u  " % (comm, pid, tid, cpu, ts / 1000000000, ts %1000000000)
> +       if "machine_pid" in sample:
> +               machine_pid = sample["machine_pid"]
> +               vcpu = sample["vcpu"]
> +               return "VM:%5d VCPU:%03d %16s %5u/%-5u [%03u] %9u.%09u  " % (machine_pid, vcpu, comm, pid, tid, cpu, ts / 1000000000, ts %1000000000)
> +       else:
> +               return "%16s %5u/%-5u [%03u] %9u.%09u  " % (comm, pid, tid, cpu, ts / 1000000000, ts %1000000000)
>
>  def print_common_start(comm, sample, name):
>         flags_disp = get_optional_null(sample, "flags_disp")
> @@ -379,9 +384,19 @@ def process_event(param_dict):
>                 sys.exit(1)
>
>  def auxtrace_error(typ, code, cpu, pid, tid, ip, ts, msg, cpumode, *x):
> +       if len(x) >= 2 and x[0]:
> +               machine_pid = x[0]
> +               vcpu = x[1]
> +       else:
> +               machine_pid = 0
> +               vcpu = -1
>         try:
> -               print("%16s %5u/%-5u [%03u] %9u.%09u  error type %u code %u: %s ip 0x%16x" %
> -                       ("Trace error", pid, tid, cpu, ts / 1000000000, ts %1000000000, typ, code, msg, ip))
> +               if machine_pid:
> +                       print("VM:%5d VCPU:%03d %16s %5u/%-5u [%03u] %9u.%09u  error type %u code %u: %s ip 0x%16x" %
> +                               (machine_pid, vcpu, "Trace error", pid, tid, cpu, ts / 1000000000, ts %1000000000, typ, code, msg, ip))
> +               else:
> +                       print("%16s %5u/%-5u [%03u] %9u.%09u  error type %u code %u: %s ip 0x%16x" %
> +                               ("Trace error", pid, tid, cpu, ts / 1000000000, ts %1000000000, typ, code, msg, ip))
>         except broken_pipe_exception:
>                 # Stop python printing broken pipe errors and traceback
>                 sys.stdout = open(os.devnull, 'w')
> @@ -396,14 +411,21 @@ def context_switch(ts, cpu, pid, tid, np_pid, np_tid, machine_pid, out, out_pree
>                 preempt_str = "preempt"
>         else:
>                 preempt_str = ""
> +       if len(x) >= 2 and x[0]:
> +               machine_pid = x[0]
> +               vcpu = x[1]
> +       else:
> +               vcpu = None
>         if machine_pid == -1:
>                 machine_str = ""
> -       else:
> +       elif vcpu is None:
>                 machine_str = "machine PID %d" % machine_pid
> +       else:
> +               machine_str = "machine PID %d VCPU %d" % (machine_pid, vcpu)
>         switch_str = "%16s %5d/%-5d [%03u] %9u.%09u %5d/%-5d %s %s" % \
>                 (out_str, pid, tid, cpu, ts / 1000000000, ts %1000000000, np_pid, np_tid, machine_str, preempt_str)
>         if glb_args.all_switch_events:
> -               print(switch_str);
> +               print(switch_str)
>         else:
>                 global glb_switch_str
>                 glb_switch_str[cpu] = switch_str
> --
> 2.25.1
>
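
The prefix logic above could also be written without repeating the common format string. This is only a sketch, not a proposed change: start_str is a hypothetical name, the "time" key is assumed to hold the timestamp in nanoseconds, and divmod() is swapped in for the separate "/" and "%" so that the seconds stay an integer under Python 3.

```python
NSEC_PER_SEC = 1000000000

def start_str(comm, sample):
    """Build the line prefix, adding VM/VCPU fields for guest samples."""
    secs, nsecs = divmod(sample["time"], NSEC_PER_SEC)
    base = "%16s %5u/%-5u [%03u] %9u.%09u  " % (
        comm, sample["pid"], sample["tid"], sample["cpu"], secs, nsecs)
    if "machine_pid" in sample:
        # Same layout as the patch: guest fields go in front of the usual prefix.
        return "VM:%5d VCPU:%03d %s" % (sample["machine_pid"], sample["vcpu"], base)
    return base
```

With machine_pid 1234 and vcpu 2 the prefix starts with "VM: 1234 VCPU:002 ", and host samples keep the existing layout unchanged.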


* Re: [PATCH 20/35] perf tools: Remove also guest kcore_dir with host kcore_dir
  2022-07-11  9:32 ` [PATCH 20/35] perf tools: Remove also guest kcore_dir with host kcore_dir Adrian Hunter
@ 2022-07-20  0:45   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  0:45 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Copies of /proc/kallsyms, /proc/modules and an extract of /proc/kcore can
> be stored in the perf.data output directory under the subdirectory named
> kcore_dir. Guest machines have their files under subdirectories whose
> names begin with kcore_dir__ followed by the machine PID. Remove these as
> well when removing kcore_dir.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/util/util.c | 37 +++++++++++++++++++++++++++++++++++--
>  1 file changed, 35 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
> index eeb83c80f458..9b02edf9311d 100644
> --- a/tools/perf/util/util.c
> +++ b/tools/perf/util/util.c
> @@ -200,7 +200,7 @@ static int rm_rf_depth_pat(const char *path, int depth, const char **pat)
>         return rmdir(path);
>  }
>
> -static int rm_rf_kcore_dir(const char *path)
> +static int rm_rf_a_kcore_dir(const char *path, const char *name)
>  {
>         char kcore_dir_path[PATH_MAX];
>         const char *pat[] = {
> @@ -210,11 +210,44 @@ static int rm_rf_kcore_dir(const char *path)
>                 NULL,
>         };
>
> -       snprintf(kcore_dir_path, sizeof(kcore_dir_path), "%s/kcore_dir", path);
> +       snprintf(kcore_dir_path, sizeof(kcore_dir_path), "%s/%s", path, name);
>
>         return rm_rf_depth_pat(kcore_dir_path, 0, pat);
>  }
>
> +static bool kcore_dir_filter(const char *name __maybe_unused, struct dirent *d)
> +{
> +       const char *pat[] = {
> +               "kcore_dir",
> +               "kcore_dir__[1-9]*",
> +               NULL,
> +       };
> +
> +       return match_pat(d->d_name, pat);
> +}
> +
> +static int rm_rf_kcore_dir(const char *path)
> +{
> +       struct strlist *kcore_dirs;
> +       struct str_node *nd;
> +       int ret = 0;
> +
> +       kcore_dirs = lsdir(path, kcore_dir_filter);
> +
> +       if (!kcore_dirs)
> +               return 0;
> +
> +       strlist__for_each_entry(nd, kcore_dirs) {
> +               ret = rm_rf_a_kcore_dir(path, nd->s);
> +               if (ret)
> +                       break;
> +       }
> +
> +       strlist__delete(kcore_dirs);
> +
> +       return ret;
> +}
> +
>  int rm_rf_perf_data(const char *path)
>  {
>         const char *pat[] = {
> --
> 2.25.1
>
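
To make the two patterns in kcore_dir_filter() concrete, here is a rough model using Python's fnmatch (illustration only). perf matches with its own glob helper, whose exact semantics may differ from fnmatch, and is_kcore_dir_name is a hypothetical name.

```python
import fnmatch

# The two patterns from kcore_dir_filter() in the patch.
KCORE_DIR_PATTERNS = ("kcore_dir", "kcore_dir__[1-9]*")

def is_kcore_dir_name(name):
    """True if name looks like the host kcore_dir or a guest kcore_dir__<pid>."""
    return any(fnmatch.fnmatchcase(name, pat) for pat in KCORE_DIR_PATTERNS)
```

Under this model "kcore_dir" and "kcore_dir__1234" match, while "kcore_dir__0" and "kcore_dir__" do not. Note that the second pattern only constrains the first character after the double underscore, so a name like "kcore_dir__1abc" would also match.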


* Re: [PATCH 21/35] perf tools: Make has_kcore_dir() work also for guest kcore_dir
  2022-07-11  9:32 ` [PATCH 21/35] perf tools: Make has_kcore_dir() work also for guest kcore_dir Adrian Hunter
@ 2022-07-20  0:49   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  0:49 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Copies of /proc/kallsyms, /proc/modules and an extract of /proc/kcore can
> be stored in the perf.data output directory under the subdirectory named
> kcore_dir. Guest machines have their files under subdirectories whose
> names begin with kcore_dir__ followed by the machine PID. Make
> has_kcore_dir() also return true if there is a guest machine kcore_dir.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/util/data.c | 24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
> index caabeac24c69..9782ccbe595d 100644
> --- a/tools/perf/util/data.c
> +++ b/tools/perf/util/data.c
> @@ -3,6 +3,7 @@
>  #include <linux/kernel.h>
>  #include <linux/string.h>
>  #include <linux/zalloc.h>
> +#include <linux/err.h>
>  #include <sys/types.h>
>  #include <sys/stat.h>
>  #include <errno.h>
> @@ -481,16 +482,21 @@ int perf_data__make_kcore_dir(struct perf_data *data, char *buf, size_t buf_sz)
>
>  bool has_kcore_dir(const char *path)
>  {
> -       char *kcore_dir;
> -       int ret;
> -
> -       if (asprintf(&kcore_dir, "%s/kcore_dir", path) < 0)
> -               return false;
> -
> -       ret = access(kcore_dir, F_OK);
> +       struct dirent *d = ERR_PTR(-EINVAL);
> +       const char *name = "kcore_dir";
> +       DIR *dir = opendir(path);
> +       size_t n = strlen(name);
> +       bool result = false;
> +
> +       if (dir) {
> +               while (d && !result) {
> +                       d = readdir(dir);
> +                       result = d ? !strncmp(d->d_name, name, n) : false;
> +               }
> +               closedir(dir);
> +       }
>
> -       free(kcore_dir);
> -       return !ret;
> +       return result;
>  }
>
>  char *perf_data__kallsyms_name(struct perf_data *data)
> --
> 2.25.1
>
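
The intended behaviour, as I read the commit message, is "true if any directory entry's name begins with kcore_dir". A minimal Python model of that check follows (illustration only; the function mirrors the C name but is otherwise hypothetical). In the C version the equivalent test relies on strncmp() returning zero for a match.

```python
import os

def has_kcore_dir(path):
    """True if path is a readable directory containing an entry whose
    name starts with "kcore_dir" (host kcore_dir or guest kcore_dir__<pid>)."""
    try:
        with os.scandir(path) as entries:
            return any(e.name.startswith("kcore_dir") for e in entries)
    except OSError:
        # Like a failed opendir(): treat as "no kcore_dir".
        return False
```

An empty directory, or one that cannot be opened, yields False; a directory holding only kcore_dir__1234 yields True.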


* Re: [PATCH 22/35] perf tools: Automatically use guest kcore_dir if present
  2022-07-11  9:32 ` [PATCH 22/35] perf tools: Automatically use guest kcore_dir if present Adrian Hunter
@ 2022-07-20  0:51   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  0:51 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> When registering a guest machine using machine_pid from the id index,
> check perf.data for a matching kcore_dir subdirectory and set the
> kallsyms file name accordingly. If set, use it to find the machine's
> kernel symbols and object code (from kcore).
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/util/data.c    | 19 +++++++++++++++++++
>  tools/perf/util/data.h    |  1 +
>  tools/perf/util/machine.h |  1 +
>  tools/perf/util/session.c |  2 ++
>  tools/perf/util/symbol.c  |  6 ++++--
>  5 files changed, 27 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
> index 9782ccbe595d..a7f68c309545 100644
> --- a/tools/perf/util/data.c
> +++ b/tools/perf/util/data.c
> @@ -518,6 +518,25 @@ char *perf_data__kallsyms_name(struct perf_data *data)
>         return kallsyms_name;
>  }
>
> +char *perf_data__guest_kallsyms_name(struct perf_data *data, pid_t machine_pid)
> +{
> +       char *kallsyms_name;
> +       struct stat st;
> +
> +       if (!data->is_dir)
> +               return NULL;
> +
> +       if (asprintf(&kallsyms_name, "%s/kcore_dir__%d/kallsyms", data->path, machine_pid) < 0)

Is there a missing free for this in perf_data__close()?

Thanks,
Ian

> +               return NULL;
> +
> +       if (stat(kallsyms_name, &st)) {
> +               free(kallsyms_name);
> +               return NULL;
> +       }
> +
> +       return kallsyms_name;
> +}
> +
>  bool is_perf_data(const char *path)
>  {
>         bool ret = false;
> diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h
> index 7de53d6e2d7f..173132d502f5 100644
> --- a/tools/perf/util/data.h
> +++ b/tools/perf/util/data.h
> @@ -101,5 +101,6 @@ unsigned long perf_data__size(struct perf_data *data);
>  int perf_data__make_kcore_dir(struct perf_data *data, char *buf, size_t buf_sz);
>  bool has_kcore_dir(const char *path);
>  char *perf_data__kallsyms_name(struct perf_data *data);
> +char *perf_data__guest_kallsyms_name(struct perf_data *data, pid_t machine_pid);
>  bool is_perf_data(const char *path);
>  #endif /* __PERF_DATA_H */
> diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> index 5d7daf7cb7bc..d40b23c71420 100644
> --- a/tools/perf/util/machine.h
> +++ b/tools/perf/util/machine.h
> @@ -48,6 +48,7 @@ struct machine {
>         bool              single_address_space;
>         char              *root_dir;
>         char              *mmap_name;
> +       char              *kallsyms_filename;
>         struct threads    threads[THREADS__TABLE_SIZE];
>         struct vdso_info  *vdso_info;
>         struct perf_env   *env;
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 7ea0b91013ea..98e16659a149 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -2772,6 +2772,8 @@ static int perf_session__register_guest(struct perf_session *session, pid_t mach
>                 return -ENOMEM;
>         thread__put(thread);
>
> +       machine->kallsyms_filename = perf_data__guest_kallsyms_name(session->data, machine_pid);
> +
>         return 0;
>  }
>
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index f72baf636724..a4b22caa7c24 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -2300,11 +2300,13 @@ static int dso__load_kernel_sym(struct dso *dso, struct map *map)
>  static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map)
>  {
>         int err;
> -       const char *kallsyms_filename = NULL;
> +       const char *kallsyms_filename;
>         struct machine *machine = map__kmaps(map)->machine;
>         char path[PATH_MAX];
>
> -       if (machine__is_default_guest(machine)) {
> +       if (machine->kallsyms_filename) {
> +               kallsyms_filename = machine->kallsyms_filename;
> +       } else if (machine__is_default_guest(machine)) {
>                 /*
>                  * if the user specified a vmlinux filename, use it and only
>                  * it, reporting errors to the user if it cannot be used.
> --
> 2.25.1
>
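
For readers following the lookup, a small Python model of perf_data__guest_kallsyms_name() (illustration only): the name is built as <path>/kcore_dir__<machine_pid>/kallsyms and is only returned if that file exists. The os.path.isdir() test stands in for data->is_dir, which is an assumption about how that flag is set.

```python
import os

def guest_kallsyms_name(data_path, machine_pid):
    """Return the guest kallsyms path inside a perf.data directory, or None."""
    if not os.path.isdir(data_path):
        # Single-file perf.data: there is no kcore_dir__<pid> to look in.
        return None
    name = os.path.join(data_path, "kcore_dir__%d" % machine_pid, "kallsyms")
    return name if os.path.exists(name) else None
```

A None result corresponds to the NULL return in the patch, in which case dso__load_guest_kernel_sym() falls through to its existing guest symbol lookup.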


* Re: [PATCH 23/35] perf tools: Add reallocarray_as_needed()
  2022-07-11  9:32 ` [PATCH 23/35] perf tools: Add reallocarray_as_needed() Adrian Hunter
@ 2022-07-20  0:55   ` Ian Rogers
  2022-08-09 16:48     ` Adrian Hunter
  0 siblings, 1 reply; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  0:55 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Add helper realloc_array_as_needed() to reallocate an array to a larger
> size and initialize the extra entries to a caller-supplied value, or to
> zero if none is given.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/util/util.c | 33 +++++++++++++++++++++++++++++++++
>  tools/perf/util/util.h | 15 +++++++++++++++
>  2 files changed, 48 insertions(+)
>
> diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
> index 9b02edf9311d..391c1e928bd7 100644
> --- a/tools/perf/util/util.c
> +++ b/tools/perf/util/util.c
> @@ -18,6 +18,7 @@
>  #include <linux/kernel.h>
>  #include <linux/log2.h>
>  #include <linux/time64.h>
> +#include <linux/overflow.h>
>  #include <unistd.h>
>  #include "cap.h"
>  #include "strlist.h"
> @@ -500,3 +501,35 @@ char *filename_with_chroot(int pid, const char *filename)
>
>         return new_name;
>  }
> +
> +/*
> + * Reallocate an array *arr of size *arr_sz so that it is big enough to contain
> + * x elements of size msz, initializing new entries to *init_val or zero if
> + * init_val is NULL
> + */
> +int do_realloc_array_as_needed(void **arr, size_t *arr_sz, size_t x, size_t msz, const void *init_val)

This feels a little like a 1-dimensional xyarray. Could we make a
similar abstraction to avoid passing all these values around?

Thanks,
Ian

> +{
> +       size_t new_sz = *arr_sz;
> +       void *new_arr;
> +       size_t i;
> +
> +       if (!new_sz)
> +               new_sz = msz >= 64 ? 1 : roundup(64, msz); /* Start with at least 64 bytes */
> +       while (x >= new_sz) {
> +               if (check_mul_overflow(new_sz, (size_t)2, &new_sz))
> +                       return -ENOMEM;
> +       }
> +       if (new_sz == *arr_sz)
> +               return 0;
> +       new_arr = calloc(new_sz, msz);
> +       if (!new_arr)
> +               return -ENOMEM;
> +       memcpy(new_arr, *arr, *arr_sz * msz);
> +       if (init_val) {
> +               for (i = *arr_sz; i < new_sz; i++)
> +                       memcpy(new_arr + (i * msz), init_val, msz);
> +       }
> +       *arr = new_arr;
> +       *arr_sz = new_sz;
> +       return 0;
> +}
> diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
> index 0f78f1e7782d..c1f2d423a9ec 100644
> --- a/tools/perf/util/util.h
> +++ b/tools/perf/util/util.h
> @@ -79,4 +79,19 @@ struct perf_debuginfod {
>  void perf_debuginfod_setup(struct perf_debuginfod *di);
>
>  char *filename_with_chroot(int pid, const char *filename);
> +
> +int do_realloc_array_as_needed(void **arr, size_t *arr_sz, size_t x,
> +                              size_t msz, const void *init_val);
> +
> +#define realloc_array_as_needed(a, n, x, v) ({                 \
> +       typeof(x) __x = (x);                                    \
> +       __x >= (n) ?                                            \
> +               do_realloc_array_as_needed((void **)&(a),       \
> +                                          &(n),                \
> +                                          __x,                 \
> +                                          sizeof(*(a)),        \
> +                                          (const void *)(v)) : \
> +               0;                                              \
> +       })
> +
>  #endif /* GIT_COMPAT_UTIL_H */
> --
> 2.25.1
>
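
To make the growth policy easier to follow, here is a Python model of just the size calculation in do_realloc_array_as_needed() (illustration only; grown_size is a hypothetical name). Python integers do not overflow, so the check_mul_overflow() step is omitted here.

```python
def grown_size(arr_sz, x, msz):
    """Return the element count the array would have after the call.

    arr_sz: current element count, x: index that must become valid,
    msz: size of one element in bytes.
    """
    new_sz = arr_sz
    if not new_sz:
        # roundup(64, msz): 64 rounded up to a multiple of msz.
        new_sz = 1 if msz >= 64 else ((64 + msz - 1) // msz) * msz
    while x >= new_sz:
        new_sz *= 2  # keep doubling until index x fits
    return new_sz
```

For example, with 8-byte elements an empty array grows to 64 elements for index 0 and to 128 elements for index 64, while an array that already has 64 elements is left alone for index 10. The result is an element count, and the loop condition guarantees that index x itself is usable afterwards.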


* Re: [PATCH 24/35] perf inject: Add support for injecting guest sideband events
  2022-07-11  9:32 ` [PATCH 24/35] perf inject: Add support for injecting guest sideband events Adrian Hunter
@ 2022-07-20  1:06   ` Ian Rogers
  2022-08-11 17:19     ` Adrian Hunter
  0 siblings, 1 reply; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  1:06 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Inject events from a perf.data file recorded in a virtual machine into
> a perf.data file recorded on the host at the same time.
>
> Only sideband events (e.g. mmap, comm, fork, exit etc) and build IDs are
> injected.  Additionally, the guest kcore_dir is copied as kcore_dir__
> followed by the machine PID.
>
> This is non-trivial because:
>  o It is not possible to process 2 sessions simultaneously so instead
>  events are first written to a temporary file.
>  o To avoid conflict, guest sample IDs are replaced with new unused sample
>  IDs.
>  o The CPU of each guest event is changed to the host CPU because it is
>  more useful for reporting and analysis.
>  o Sample ID is mapped to machine PID which is recorded with VCPU in the
>  id index. This is important to allow guest events to be related to the
>  guest machine and VCPU.
>  o Timestamps must be converted.
>  o Events are inserted to obey finished-round ordering.
>
> The anticipated use-case is:
>  - start recording sideband events in a guest machine
>  - start recording an AUX area trace on the host which can also trace the
>  guest (e.g. Intel PT)
>  - run test case on the guest
>  - stop recording on the host
>  - stop recording on the guest
>  - copy the guest perf.data file to the host
>  - inject the guest perf.data file sideband events into the host perf.data
>  file using perf inject
>  - the resulting perf.data file can now be used
>
> Subsequent patches provide Intel PT support for this.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/Documentation/perf-inject.txt |   17 +
>  tools/perf/builtin-inject.c              | 1043 +++++++++++++++++++++-
>  2 files changed, 1059 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/Documentation/perf-inject.txt b/tools/perf/Documentation/perf-inject.txt
> index 0570a1ccd344..646aa31586ed 100644
> --- a/tools/perf/Documentation/perf-inject.txt
> +++ b/tools/perf/Documentation/perf-inject.txt
> @@ -85,6 +85,23 @@ include::itrace.txt[]
>         without updating it. Currently this option is supported only by
>         Intel PT, refer linkperf:perf-intel-pt[1]
>
> +--guest-data=<path>,<pid>[,<time offset>[,<time scale>]]::
> +       Insert events from a perf.data file recorded in a virtual machine at
> +       the same time as the input perf.data file was recorded on the host.
> +       The Process ID (PID) of the QEMU hypervisor process must be provided,
> +       and the time offset and time scale (multiplier) will likely be needed
> +       to convert guest time stamps into host time stamps. For example, for
> +       x86 the TSC Offset and Multiplier could be provided for a virtual machine
> +       using Linux command line option no-kvmclock.
> +       Currently only mmap, mmap2, comm, task, context_switch, ksymbol,
> +       and text_poke events are inserted, as well as build ID information.
> +       The QEMU option -name debug-threads=on is needed so that thread names
> +       can be used to determine which thread is running which VCPU. Note
> +       libvirt seems to use this by default.
> +       When using perf record in the guest, option --sample-identifier
> +       should be used, and also --buildid-all and --switch-events may be
> +       useful.
> +

Would other KVM-based hypervisors, such as gVisor, work if they
implemented an equivalent of -name debug-threads=on?
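
For context, what the thread names would need to look like can be sketched as follows (Python, illustration only). It assumes names of the form "CPU <n>/KVM", which is what the sscanf() format "CPU %u/KVM" further down in this patch looks for; vcpu_from_comm is a hypothetical name, and the regular expression is stricter than sscanf(), which stops checking once the number has been read.

```python
import re

# Thread name format assumed from the "CPU %u/KVM" sscanf() in the patch.
_VCPU_COMM = re.compile(r"CPU (\d+)/KVM")

def vcpu_from_comm(comm):
    """Return the VCPU number encoded in a thread name, or None."""
    m = _VCPU_COMM.match(comm)
    return int(m.group(1)) if m else None
```

So "CPU 0/KVM" maps to VCPU 0 and "CPU 12/KVM" to VCPU 12, while any other thread name is ignored.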

>  SEE ALSO
>  --------
>  linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-archive[1],
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index c800911f68e7..fd4547bb75f7 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -26,6 +26,7 @@
>  #include "util/thread.h"
>  #include "util/namespaces.h"
>  #include "util/util.h"
> +#include "util/tsc.h"
>
>  #include <internal/lib.h>
>
> @@ -35,8 +36,70 @@
>
>  #include <linux/list.h>
>  #include <linux/string.h>
> +#include <linux/zalloc.h>
> +#include <linux/hash.h>
>  #include <errno.h>
>  #include <signal.h>
> +#include <inttypes.h>
> +
> +struct guest_event {
> +       struct perf_sample              sample;
> +       union perf_event                *event;
> +       char                            event_buf[PERF_SAMPLE_MAX_SIZE];
> +};
> +
> +struct guest_id {
> +       /* hlist_node must be first, see free_hlist() */
> +       struct hlist_node               node;
> +       u64                             id;
> +       u64                             host_id;
> +       u32                             vcpu;
> +};
> +
> +struct guest_tid {
> +       /* hlist_node must be first, see free_hlist() */
> +       struct hlist_node               node;
> +       /* Thread ID of QEMU thread */
> +       u32                             tid;
> +       u32                             vcpu;
> +};
> +
> +struct guest_vcpu {
> +       /* Current host CPU */
> +       u32                             cpu;
> +       /* Thread ID of QEMU thread */
> +       u32                             tid;
> +};
> +
> +struct guest_session {
> +       char                            *perf_data_file;
> +       u32                             machine_pid;
> +       u64                             time_offset;
> +       double                          time_scale;
> +       struct perf_tool                tool;
> +       struct perf_data                data;
> +       struct perf_session             *session;
> +       char                            *tmp_file_name;
> +       int                             tmp_fd;
> +       struct perf_tsc_conversion      host_tc;
> +       struct perf_tsc_conversion      guest_tc;
> +       bool                            copy_kcore_dir;
> +       bool                            have_tc;
> +       bool                            fetched;
> +       bool                            ready;
> +       u16                             dflt_id_hdr_size;
> +       u64                             dflt_id;
> +       u64                             highest_id;
> +       /* Array of guest_vcpu */
> +       struct guest_vcpu               *vcpu;
> +       size_t                          vcpu_cnt;
> +       /* Hash table for guest_id */
> +       struct hlist_head               heads[PERF_EVLIST__HLIST_SIZE];
> +       /* Hash table for guest_tid */
> +       struct hlist_head               tids[PERF_EVLIST__HLIST_SIZE];
> +       /* Place to stash next guest event */
> +       struct guest_event              ev;
> +};
>
>  struct perf_inject {
>         struct perf_tool        tool;
> @@ -59,6 +122,7 @@ struct perf_inject {
>         struct itrace_synth_opts itrace_synth_opts;
>         char                    event_copy[PERF_SAMPLE_MAX_SIZE];
>         struct perf_file_section secs[HEADER_FEAT_BITS];
> +       struct guest_session    guest_session;
>  };
>
>  struct event_entry {
> @@ -698,6 +762,841 @@ static int perf_inject__sched_stat(struct perf_tool *tool,
>         return perf_event__repipe(tool, event_sw, &sample_sw, machine);
>  }
>
> +static struct guest_vcpu *guest_session__vcpu(struct guest_session *gs, u32 vcpu)
> +{
> +       if (realloc_array_as_needed(gs->vcpu, gs->vcpu_cnt, vcpu, NULL))
> +               return NULL;
> +       return &gs->vcpu[vcpu];
> +}
> +
> +static int guest_session__output_bytes(struct guest_session *gs, void *buf, size_t sz)
> +{
> +       ssize_t ret = writen(gs->tmp_fd, buf, sz);
> +
> +       return ret < 0 ? ret : 0;
> +}
> +
> +static int guest_session__repipe(struct perf_tool *tool,
> +                                union perf_event *event,
> +                                struct perf_sample *sample __maybe_unused,
> +                                struct machine *machine __maybe_unused)
> +{
> +       struct guest_session *gs = container_of(tool, struct guest_session, tool);
> +
> +       return guest_session__output_bytes(gs, event, event->header.size);
> +}
> +
> +static int guest_session__map_tid(struct guest_session *gs, u32 tid, u32 vcpu)
> +{
> +       struct guest_tid *guest_tid = zalloc(sizeof(*guest_tid));
> +       int hash;
> +
> +       if (!guest_tid)
> +               return -ENOMEM;
> +
> +       guest_tid->tid = tid;
> +       guest_tid->vcpu = vcpu;
> +       hash = hash_32(guest_tid->tid, PERF_EVLIST__HLIST_BITS);
> +       hlist_add_head(&guest_tid->node, &gs->tids[hash]);
> +
> +       return 0;
> +}
> +
> +static int host_peek_vm_comms_cb(struct perf_session *session __maybe_unused,
> +                                union perf_event *event,
> +                                u64 offset __maybe_unused, void *data)
> +{
> +       struct guest_session *gs = data;
> +       unsigned int vcpu;
> +       struct guest_vcpu *guest_vcpu;
> +       int ret;
> +
> +       if (event->header.type != PERF_RECORD_COMM ||
> +           event->comm.pid != gs->machine_pid)
> +               return 0;
> +
> +       /*
> +        * QEMU option -name debug-threads=on, causes thread names formatted as
> +        * below, although it is not an ABI. Also libvirt seems to use this by
> +        * default. Here we rely on it to tell us which thread is which VCPU.
> +        */
> +       ret = sscanf(event->comm.comm, "CPU %u/KVM", &vcpu);
> +       if (ret <= 0)
> +               return ret;
> +       pr_debug("Found VCPU: tid %u comm %s vcpu %u\n",
> +                event->comm.tid, event->comm.comm, vcpu);
> +       if (vcpu > INT_MAX) {
> +               pr_err("Invalid VCPU %u\n", vcpu);
> +               return -EINVAL;
> +       }
> +       guest_vcpu = guest_session__vcpu(gs, vcpu);
> +       if (!guest_vcpu)
> +               return -ENOMEM;
> +       if (guest_vcpu->tid && guest_vcpu->tid != event->comm.tid) {
> +               pr_err("Fatal error: Two threads found with the same VCPU\n");
> +               return -EINVAL;
> +       }
> +       guest_vcpu->tid = event->comm.tid;
> +
> +       return guest_session__map_tid(gs, event->comm.tid, vcpu);
> +}
> +
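For anyone following the VCPU detection above, here is a minimal standalone sketch of the thread-name match. The helper name comm_to_vcpu() is hypothetical and not part of the patch; it only mirrors the sscanf() format string and the INT_MAX range check used in host_peek_vm_comms_cb().

```c
#include <limits.h>
#include <stdio.h>

/*
 * Hypothetical helper (not in the patch): return the VCPU number encoded
 * in a QEMU debug-threads style thread name, or -1 if the name does not
 * match or the number is out of range.
 */
int comm_to_vcpu(const char *comm)
{
	unsigned int vcpu;

	/*
	 * sscanf() returns the number of successful conversions, or EOF if
	 * the input ends before the first conversion.
	 */
	if (sscanf(comm, "CPU %u/KVM", &vcpu) != 1)
		return -1;
	if (vcpu > INT_MAX)
		return -1;
	return (int)vcpu;
}
```

Note that the return value of sscanf() only counts conversions, so the trailing "/KVM" literal is not actually enforced: a name such as "CPU 7/TCG" would also yield 7 in this sketch. Whether that matters here depends on the PID filter applied before the match.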
> +static int host_peek_vm_comms(struct perf_session *session, struct guest_session *gs)
> +{
> +       return perf_session__peek_events(session, session->header.data_offset,
> +                                        session->header.data_size,
> +                                        host_peek_vm_comms_cb, gs);
> +}
> +
> +static bool evlist__is_id_used(struct evlist *evlist, u64 id)
> +{
> +       return evlist__id2sid(evlist, id);
> +}
> +
> +static u64 guest_session__allocate_new_id(struct guest_session *gs, struct evlist *host_evlist)
> +{
> +       do {
> +               gs->highest_id += 1;
> +       } while (!gs->highest_id || evlist__is_id_used(host_evlist, gs->highest_id));
> +
> +       return gs->highest_id;
> +}
> +
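The ID allocation loop above skips zero and any ID the host already uses. A rough sketch of the same idea, with a flat array standing in for the host evlist lookup (struct id_set and id_set_contains() are assumptions made for illustration, not perf APIs):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the host evlist: a flat array of IDs in use */
struct id_set {
	const uint64_t *ids;
	size_t cnt;
};

bool id_set_contains(const struct id_set *set, uint64_t id)
{
	for (size_t i = 0; i < set->cnt; i++)
		if (set->ids[i] == id)
			return true;
	return false;
}

/* Advance *highest to the next ID that is non-zero and not in 'used' */
uint64_t allocate_new_id(uint64_t *highest, const struct id_set *used)
{
	do {
		*highest += 1;
	} while (!*highest || id_set_contains(used, *highest));

	return *highest;
}
```

Starting from the highest host ID means the loop normally finishes on the first iteration. The zero check only matters if the counter wraps, since an ID of zero is treated as "no ID" elsewhere in the patch.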
> +static int guest_session__map_id(struct guest_session *gs, u64 id, u64 host_id, u32 vcpu)
> +{
> +       struct guest_id *guest_id = zalloc(sizeof(*guest_id));
> +       int hash;
> +
> +       if (!guest_id)
> +               return -ENOMEM;
> +
> +       guest_id->id = id;
> +       guest_id->host_id = host_id;
> +       guest_id->vcpu = vcpu;
> +       hash = hash_64(guest_id->id, PERF_EVLIST__HLIST_BITS);
> +       hlist_add_head(&guest_id->node, &gs->heads[hash]);
> +
> +       return 0;
> +}
> +
> +static u64 evlist__find_highest_id(struct evlist *evlist)
> +{
> +       struct evsel *evsel;
> +       u64 highest_id = 1;
> +
> +       evlist__for_each_entry(evlist, evsel) {
> +               u32 j;
> +
> +               for (j = 0; j < evsel->core.ids; j++) {
> +                       u64 id = evsel->core.id[j];
> +
> +                       if (id > highest_id)
> +                               highest_id = id;
> +               }
> +       }
> +
> +       return highest_id;
> +}
> +
> +static int guest_session__map_ids(struct guest_session *gs, struct evlist *host_evlist)
> +{
> +       struct evlist *evlist = gs->session->evlist;
> +       struct evsel *evsel;
> +       int ret;
> +
> +       evlist__for_each_entry(evlist, evsel) {
> +               u32 j;
> +
> +               for (j = 0; j < evsel->core.ids; j++) {
> +                       struct perf_sample_id *sid;
> +                       u64 host_id;
> +                       u64 id;
> +
> +                       id = evsel->core.id[j];
> +                       sid = evlist__id2sid(evlist, id);
> +                       if (!sid || sid->cpu.cpu == -1)
> +                               continue;
> +                       host_id = guest_session__allocate_new_id(gs, host_evlist);
> +                       ret = guest_session__map_id(gs, id, host_id, sid->cpu.cpu);
> +                       if (ret)
> +                               return ret;
> +               }
> +       }
> +
> +       return 0;
> +}
> +
> +static struct guest_id *guest_session__lookup_id(struct guest_session *gs, u64 id)
> +{
> +       struct hlist_head *head;
> +       struct guest_id *guest_id;
> +       int hash;
> +
> +       hash = hash_64(id, PERF_EVLIST__HLIST_BITS);
> +       head = &gs->heads[hash];
> +
> +       hlist_for_each_entry(guest_id, head, node)
> +               if (guest_id->id == id)
> +                       return guest_id;
> +
> +       return NULL;
> +}
> +
> +static int process_attr(struct perf_tool *tool, union perf_event *event,
> +                       struct perf_sample *sample __maybe_unused,
> +                       struct machine *machine __maybe_unused)
> +{
> +       struct perf_inject *inject = container_of(tool, struct perf_inject, tool);
> +
> +       return perf_event__process_attr(tool, event, &inject->session->evlist);
> +}
> +
> +static int guest_session__add_attr(struct guest_session *gs, struct evsel *evsel)
> +{
> +       struct perf_inject *inject = container_of(gs, struct perf_inject, guest_session);
> +       struct perf_event_attr attr = evsel->core.attr;
> +       u64 *id_array;
> +       u32 *vcpu_array;
> +       int ret = -ENOMEM;
> +       u32 i;
> +
> +       id_array = calloc(evsel->core.ids, sizeof(*id_array));
> +       if (!id_array)
> +               return -ENOMEM;
> +
> +       vcpu_array = calloc(evsel->core.ids, sizeof(*vcpu_array));
> +       if (!vcpu_array)
> +               goto out;
> +
> +       for (i = 0; i < evsel->core.ids; i++) {
> +               u64 id = evsel->core.id[i];
> +               struct guest_id *guest_id = guest_session__lookup_id(gs, id);
> +
> +               if (!guest_id) {
> +                       pr_err("Failed to find guest id %"PRIu64"\n", id);
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +               id_array[i] = guest_id->host_id;
> +               vcpu_array[i] = guest_id->vcpu;
> +       }
> +
> +       attr.sample_type |= PERF_SAMPLE_IDENTIFIER;
> +       attr.exclude_host = 1;
> +       attr.exclude_guest = 0;
> +
> +       ret = perf_event__synthesize_attr(&inject->tool, &attr, evsel->core.ids,
> +                                         id_array, process_attr);
> +       if (ret)
> +               pr_err("Failed to add guest attr.\n");
> +
> +       for (i = 0; i < evsel->core.ids; i++) {
> +               struct perf_sample_id *sid;
> +               u32 vcpu = vcpu_array[i];
> +
> +               sid = evlist__id2sid(inject->session->evlist, id_array[i]);
> +               /* Guest event is per-thread from the host point of view */
> +               sid->cpu.cpu = -1;
> +               sid->tid = gs->vcpu[vcpu].tid;
> +               sid->machine_pid = gs->machine_pid;
> +               sid->vcpu.cpu = vcpu;
> +       }
> +out:
> +       free(vcpu_array);
> +       free(id_array);
> +       return ret;
> +}
> +
> +static int guest_session__add_attrs(struct guest_session *gs)
> +{
> +       struct evlist *evlist = gs->session->evlist;
> +       struct evsel *evsel;
> +       int ret;
> +
> +       evlist__for_each_entry(evlist, evsel) {
> +               ret = guest_session__add_attr(gs, evsel);
> +               if (ret)
> +                       return ret;
> +       }
> +
> +       return 0;
> +}
> +
> +static int synthesize_id_index(struct perf_inject *inject, size_t new_cnt)
> +{
> +       struct perf_session *session = inject->session;
> +       struct evlist *evlist = session->evlist;
> +       struct machine *machine = &session->machines.host;
> +       size_t from = evlist->core.nr_entries - new_cnt;
> +
> +       return __perf_event__synthesize_id_index(&inject->tool, perf_event__repipe,
> +                                                evlist, machine, from);
> +}
> +
> +static struct guest_tid *guest_session__lookup_tid(struct guest_session *gs, u32 tid)
> +{
> +       struct hlist_head *head;
> +       struct guest_tid *guest_tid;
> +       int hash;
> +
> +       hash = hash_32(tid, PERF_EVLIST__HLIST_BITS);
> +       head = &gs->tids[hash];
> +
> +       hlist_for_each_entry(guest_tid, head, node)
> +               if (guest_tid->tid == tid)
> +                       return guest_tid;
> +
> +       return NULL;
> +}
> +
> +static bool dso__is_in_kernel_space(struct dso *dso)
> +{
> +       if (dso__is_vdso(dso))
> +               return false;
> +
> +       return dso__is_kcore(dso) ||
> +              dso->kernel ||
> +              is_kernel_module(dso->long_name, PERF_RECORD_MISC_CPUMODE_UNKNOWN);
> +}
> +
> +static u64 evlist__first_id(struct evlist *evlist)
> +{
> +       struct evsel *evsel;
> +
> +       evlist__for_each_entry(evlist, evsel) {
> +               if (evsel->core.ids)
> +                       return evsel->core.id[0];
> +       }
> +       return 0;
> +}
> +
> +static int process_build_id(struct perf_tool *tool,
> +                           union perf_event *event,
> +                           struct perf_sample *sample __maybe_unused,
> +                           struct machine *machine __maybe_unused)
> +{
> +       struct perf_inject *inject = container_of(tool, struct perf_inject, tool);
> +
> +       return perf_event__process_build_id(inject->session, event);
> +}
> +
> +static int synthesize_build_id(struct perf_inject *inject, struct dso *dso, pid_t machine_pid)
> +{
> +       struct machine *machine = perf_session__findnew_machine(inject->session, machine_pid);
> +       u8 cpumode = dso__is_in_kernel_space(dso) ?
> +                       PERF_RECORD_MISC_GUEST_KERNEL :
> +                       PERF_RECORD_MISC_GUEST_USER;
> +
> +       if (!machine)
> +               return -ENOMEM;
> +
> +       dso->hit = 1;
> +
> +       return perf_event__synthesize_build_id(&inject->tool, dso, cpumode,
> +                                              process_build_id, machine);
> +}
> +
> +static int guest_session__add_build_ids(struct guest_session *gs)
> +{
> +       struct perf_inject *inject = container_of(gs, struct perf_inject, guest_session);
> +       struct machine *machine = &gs->session->machines.host;
> +       struct dso *dso;
> +       int ret;
> +
> +       /* Build IDs will be put in the Build ID feature section */
> +       perf_header__set_feat(&inject->session->header, HEADER_BUILD_ID);
> +
> +       dsos__for_each_with_build_id(dso, &machine->dsos.head) {
> +               ret = synthesize_build_id(inject, dso, gs->machine_pid);
> +               if (ret)
> +                       return ret;
> +       }
> +
> +       return 0;
> +}
> +
> +static int guest_session__ksymbol_event(struct perf_tool *tool,
> +                                       union perf_event *event,
> +                                       struct perf_sample *sample __maybe_unused,
> +                                       struct machine *machine __maybe_unused)
> +{
> +       struct guest_session *gs = container_of(tool, struct guest_session, tool);
> +
> +       /* Only support out-of-line i.e. no BPF support */
> +       if (event->ksymbol.ksym_type != PERF_RECORD_KSYMBOL_TYPE_OOL)
> +               return 0;
> +
> +       return guest_session__output_bytes(gs, event, event->header.size);
> +}
> +
> +static int guest_session__start(struct guest_session *gs, const char *name, bool force)
> +{
> +       char tmp_file_name[] = "/tmp/perf-inject-guest_session-XXXXXX";
> +       struct perf_session *session;
> +       int ret;
> +
> +       /* Only these events will be injected */
> +       gs->tool.mmap           = guest_session__repipe;
> +       gs->tool.mmap2          = guest_session__repipe;
> +       gs->tool.comm           = guest_session__repipe;
> +       gs->tool.fork           = guest_session__repipe;
> +       gs->tool.exit           = guest_session__repipe;
> +       gs->tool.lost           = guest_session__repipe;
> +       gs->tool.context_switch = guest_session__repipe;
> +       gs->tool.ksymbol        = guest_session__ksymbol_event;
> +       gs->tool.text_poke      = guest_session__repipe;
> +       /*
> +        * Processing a build ID creates a struct dso with that build ID. Later,
> +        * all guest dsos are iterated and the build IDs processed into the host
> +        * session where they will be output to the Build ID feature section
> +        * when the perf.data file header is written.
> +        */
> +       gs->tool.build_id       = perf_event__process_build_id;
> +       /* Process the id index to know what VCPU an ID belongs to */
> +       gs->tool.id_index       = perf_event__process_id_index;
> +
> +       gs->tool.ordered_events = true;
> +       gs->tool.ordering_requires_timestamps = true;
> +
> +       gs->data.path   = name;
> +       gs->data.force  = force;
> +       gs->data.mode   = PERF_DATA_MODE_READ;
> +
> +       session = perf_session__new(&gs->data, &gs->tool);
> +       if (IS_ERR(session))
> +               return PTR_ERR(session);
> +       gs->session = session;
> +
> +       /*
> +        * Initial events have zero'd ID samples. Get default ID sample size
> +        * used for removing them.
> +        */
> +       gs->dflt_id_hdr_size = session->machines.host.id_hdr_size;
> +       /* And default ID for adding back a host-compatible ID sample */
> +       gs->dflt_id = evlist__first_id(session->evlist);
> +       if (!gs->dflt_id) {
> +               pr_err("Guest data has no sample IDs\n");
> +               return -EINVAL;
> +       }
> +
> +       /* Temporary file for guest events */
> +       gs->tmp_file_name = strdup(tmp_file_name);
> +       if (!gs->tmp_file_name)
> +               return -ENOMEM;
> +       gs->tmp_fd = mkstemp(gs->tmp_file_name);
> +       if (gs->tmp_fd < 0)
> +               return -errno;
> +
> +       if (zstd_init(&gs->session->zstd_data, 0) < 0)
> +               pr_warning("Guest session decompression initialization failed.\n");
> +
> +       /*
> +        * perf does not support processing 2 sessions simultaneously, so output
> +        * guest events to a temporary file.
> +        */
> +       ret = perf_session__process_events(gs->session);
> +       if (ret)
> +               return ret;
> +
> +       if (lseek(gs->tmp_fd, 0, SEEK_SET))
> +               return -errno;
> +
> +       return 0;
> +}
> +
> +/* Free hlist nodes assuming hlist_node is the first member of hlist entries */
> +static void free_hlist(struct hlist_head *heads, size_t hlist_sz)
> +{
> +       struct hlist_node *pos, *n;
> +       size_t i;
> +
> +       for (i = 0; i < hlist_sz; ++i) {
> +               hlist_for_each_safe(pos, n, &heads[i]) {
> +                       hlist_del(pos);
> +                       free(pos);
> +               }
> +       }
> +}
> +
> +static void guest_session__exit(struct guest_session *gs)
> +{
> +       if (gs->session) {
> +               perf_session__delete(gs->session);
> +               free_hlist(gs->heads, PERF_EVLIST__HLIST_SIZE);
> +               free_hlist(gs->tids, PERF_EVLIST__HLIST_SIZE);
> +       }
> +       if (gs->tmp_file_name) {
> +               if (gs->tmp_fd >= 0)
> +                       close(gs->tmp_fd);
> +               unlink(gs->tmp_file_name);
> +               free(gs->tmp_file_name);
> +       }
> +       free(gs->vcpu);
> +       free(gs->perf_data_file);
> +}
> +
> +static void get_tsc_conv(struct perf_tsc_conversion *tc, struct perf_record_time_conv *time_conv)
> +{
> +       tc->time_shift          = time_conv->time_shift;
> +       tc->time_mult           = time_conv->time_mult;
> +       tc->time_zero           = time_conv->time_zero;
> +       tc->time_cycles         = time_conv->time_cycles;
> +       tc->time_mask           = time_conv->time_mask;
> +       tc->cap_user_time_zero  = time_conv->cap_user_time_zero;
> +       tc->cap_user_time_short = time_conv->cap_user_time_short;
> +}
> +
> +static void guest_session__get_tc(struct guest_session *gs)
> +{
> +       struct perf_inject *inject = container_of(gs, struct perf_inject, guest_session);
> +
> +       get_tsc_conv(&gs->host_tc, &inject->session->time_conv);
> +       get_tsc_conv(&gs->guest_tc, &gs->session->time_conv);
> +}
> +
> +static void guest_session__convert_time(struct guest_session *gs, u64 guest_time, u64 *host_time)
> +{
> +       u64 tsc;
> +
> +       if (!guest_time) {
> +               *host_time = 0;
> +               return;
> +       }
> +
> +       if (gs->guest_tc.cap_user_time_zero)
> +               tsc = perf_time_to_tsc(guest_time, &gs->guest_tc);
> +       else
> +               tsc = guest_time;
> +
> +       /*
> +        * This is the correct order of operations for x86 if the TSC Offset and
> +        * Multiplier values are used.
> +        */
> +       tsc -= gs->time_offset;
> +       tsc /= gs->time_scale;
> +
> +       if (gs->host_tc.cap_user_time_zero)
> +               *host_time = tsc_to_perf_time(tsc, &gs->host_tc);
> +       else
> +               *host_time = tsc;
> +}
> +
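To make the order of operations in guest_session__convert_time() concrete: with x86 TSC offsetting and scaling, the guest sees roughly guest_tsc = host_tsc * scale + offset, so going back to host time means subtracting the offset first and then dividing by the scale. A small sketch of just that middle step, assuming time_scale is a double (as the strtod() call in the option parser suggests); the struct and function names are made up for the example:

```c
#include <stdint.h>

/* Hypothetical container for the two --guest-data timing parameters */
struct guest_time_params {
	uint64_t offset;	/* guest TSC offset */
	double scale;		/* guest TSC scale factor */
};

/* Undo guest_tsc = host_tsc * scale + offset: subtract, then divide */
uint64_t guest_tsc_to_host_tsc(uint64_t guest_tsc,
			       const struct guest_time_params *p)
{
	uint64_t tsc = guest_tsc;

	/* Unsigned arithmetic, so a "negative" offset wraps as intended */
	tsc -= p->offset;
	/* Converted to double for the division, then truncated back */
	tsc /= p->scale;

	return tsc;
}
```

One thing to keep in mind with this form is that the division goes through a double, so TSC values above 2^53 may lose low-order bits when the scale is not 1.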
> +static int guest_session__fetch(struct guest_session *gs)
> +{
> +       void *buf = gs->ev.event_buf;
> +       struct perf_event_header *hdr = buf;
> +       size_t hdr_sz = sizeof(*hdr);
> +       ssize_t ret;
> +
> +       ret = readn(gs->tmp_fd, buf, hdr_sz);
> +       if (ret < 0)
> +               return ret;
> +
> +       if (!ret) {
> +               /* Zero size means EOF */
> +               hdr->size = 0;
> +               return 0;
> +       }
> +
> +       buf += hdr_sz;
> +
> +       ret = readn(gs->tmp_fd, buf, hdr->size - hdr_sz);
> +       if (ret < 0)
> +               return ret;
> +
> +       gs->ev.event = (union perf_event *)gs->ev.event_buf;
> +       gs->ev.sample.time = 0;
> +
> +       if (hdr->type >= PERF_RECORD_USER_TYPE_START) {
> +               pr_err("Unexpected type fetching guest event\n");
> +               return 0;
> +       }
> +
> +       ret = evlist__parse_sample(gs->session->evlist, gs->ev.event, &gs->ev.sample);
> +       if (ret) {
> +               pr_err("Parse failed fetching guest event\n");
> +               return ret;
> +       }
> +
> +       if (!gs->have_tc) {
> +               guest_session__get_tc(gs);
> +               gs->have_tc = true;
> +       }
> +
> +       guest_session__convert_time(gs, gs->ev.sample.time, &gs->ev.sample.time);
> +
> +       return 0;
> +}
> +
> +static int evlist__append_id_sample(struct evlist *evlist, union perf_event *ev,
> +                                   const struct perf_sample *sample)
> +{
> +       struct evsel *evsel;
> +       void *array;
> +       int ret;
> +
> +       evsel = evlist__id2evsel(evlist, sample->id);
> +       array = ev;
> +
> +       if (!evsel) {
> +               pr_err("No evsel for id %"PRIu64"\n", sample->id);
> +               return -EINVAL;
> +       }
> +
> +       array += ev->header.size;
> +       ret = perf_event__synthesize_id_sample(array, evsel->core.attr.sample_type, sample);
> +       if (ret < 0)
> +               return ret;
> +
> +       if (ret & 7) {
> +               pr_err("Bad id sample size %d\n", ret);
> +               return -EINVAL;
> +       }
> +
> +       ev->header.size += ret;
> +
> +       return 0;
> +}
> +
> +static int guest_session__inject_events(struct guest_session *gs, u64 timestamp)
> +{
> +       struct perf_inject *inject = container_of(gs, struct perf_inject, guest_session);
> +       int ret;
> +
> +       if (!gs->ready)
> +               return 0;
> +
> +       while (1) {
> +               struct perf_sample *sample;
> +               struct guest_id *guest_id;
> +               union perf_event *ev;
> +               u16 id_hdr_size;
> +               u8 cpumode;
> +               u64 id;
> +
> +               if (!gs->fetched) {
> +                       ret = guest_session__fetch(gs);
> +                       if (ret)
> +                               return ret;
> +                       gs->fetched = true;
> +               }
> +
> +               ev = gs->ev.event;
> +               sample = &gs->ev.sample;
> +
> +               if (!ev->header.size)
> +                       return 0; /* EOF */
> +
> +               if (sample->time > timestamp)
> +                       return 0;
> +
> +               /* Change cpumode to guest */
> +               cpumode = ev->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
> +               if (cpumode & PERF_RECORD_MISC_USER)
> +                       cpumode = PERF_RECORD_MISC_GUEST_USER;
> +               else
> +                       cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
> +               ev->header.misc &= ~PERF_RECORD_MISC_CPUMODE_MASK;
> +               ev->header.misc |= cpumode;
> +
> +               id = sample->id;
> +               if (!id) {
> +                       id = gs->dflt_id;
> +                       id_hdr_size = gs->dflt_id_hdr_size;
> +               } else {
> +                       struct evsel *evsel = evlist__id2evsel(gs->session->evlist, id);
> +
> +                       id_hdr_size = evsel__id_hdr_size(evsel);
> +               }
> +
> +               if (id_hdr_size & 7) {
> +                       pr_err("Bad id_hdr_size %u\n", id_hdr_size);
> +                       return -EINVAL;
> +               }
> +
> +               if (ev->header.size & 7) {
> +                       pr_err("Bad event size %u\n", ev->header.size);
> +                       return -EINVAL;
> +               }
> +
> +               /* Remove guest id sample */
> +               ev->header.size -= id_hdr_size;
> +
> +               if (ev->header.size & 7) {
> +                       pr_err("Bad raw event size %u\n", ev->header.size);
> +                       return -EINVAL;
> +               }
> +
> +               guest_id = guest_session__lookup_id(gs, id);
> +               if (!guest_id) {
> +                       pr_err("Guest event with unknown id %llu\n",
> +                              (unsigned long long)id);
> +                       return -EINVAL;
> +               }
> +
> +               /* Change to host ID to avoid conflicting ID values */
> +               sample->id = guest_id->host_id;
> +               sample->stream_id = guest_id->host_id;
> +
> +               if (sample->cpu != (u32)-1) {
> +                       if (sample->cpu >= gs->vcpu_cnt) {
> +                               pr_err("Guest event with unknown VCPU %u\n",
> +                                      sample->cpu);
> +                               return -EINVAL;
> +                       }
> +                       /* Change to host CPU instead of guest VCPU */
> +                       sample->cpu = gs->vcpu[sample->cpu].cpu;
> +               }
> +
> +               /* New id sample with new ID and CPU */
> +               ret = evlist__append_id_sample(inject->session->evlist, ev, sample);
> +               if (ret)
> +                       return ret;
> +
> +               if (ev->header.size & 7) {
> +                       pr_err("Bad new event size %u\n", ev->header.size);
> +                       return -EINVAL;
> +               }
> +
> +               gs->fetched = false;
> +
> +               ret = output_bytes(inject, ev, ev->header.size);
> +               if (ret)
> +                       return ret;
> +       }
> +}
> +
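The interleaving done by guest_session__inject_events() boils down to draining a time-ordered guest stream up to a given host timestamp. A stripped-down sketch of that control flow, using a plain array of already-converted timestamps in place of the temporary file (struct guest_stream and inject_events_up_to() are illustrative names only):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest event stream: timestamps sorted in ascending order */
struct guest_stream {
	const uint64_t *ts;
	size_t cnt;
	size_t pos;	/* next event not yet injected */
};

/*
 * Consume every pending guest event whose time is <= timestamp and
 * return how many were consumed. Later events stay queued.
 */
size_t inject_events_up_to(struct guest_stream *gs, uint64_t timestamp)
{
	size_t n = 0;

	while (gs->pos < gs->cnt && gs->ts[gs->pos] <= timestamp) {
		gs->pos++;	/* the real code rewrites and outputs the event here */
		n++;
	}

	return n;
}
```

Under this model, calling with a timestamp of 0 emits only the initial events that carry no timestamp, and calling with the maximum u64 value drains everything, which is how the finished-init and flush paths in the patch use it.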
> +static int guest_session__flush_events(struct guest_session *gs)
> +{
> +       return guest_session__inject_events(gs, -1);
> +}
> +
> +static int host__repipe(struct perf_tool *tool,
> +                       union perf_event *event,
> +                       struct perf_sample *sample,
> +                       struct machine *machine)
> +{
> +       struct perf_inject *inject = container_of(tool, struct perf_inject, tool);
> +       int ret;
> +
> +       ret = guest_session__inject_events(&inject->guest_session, sample->time);
> +       if (ret)
> +               return ret;
> +
> +       return perf_event__repipe(tool, event, sample, machine);
> +}
> +
> +static int host__finished_init(struct perf_session *session, union perf_event *event)
> +{
> +       struct perf_inject *inject = container_of(session->tool, struct perf_inject, tool);
> +       struct guest_session *gs = &inject->guest_session;
> +       int ret;
> +
> +       /*
> +        * Peek through host COMM events to find QEMU threads and the VCPU they
> +        * are running.
> +        */
> +       ret = host_peek_vm_comms(session, gs);
> +       if (ret)
> +               return ret;
> +
> +       if (!gs->vcpu_cnt) {
> +               pr_err("No VCPU threads found for pid %u\n", gs->machine_pid);
> +               return -EINVAL;
> +       }
> +
> +       /*
> +        * Allocate new (unused) host sample IDs and map them to the guest IDs.
> +        */
> +       gs->highest_id = evlist__find_highest_id(session->evlist);
> +       ret = guest_session__map_ids(gs, session->evlist);
> +       if (ret)
> +               return ret;
> +
> +       ret = guest_session__add_attrs(gs);
> +       if (ret)
> +               return ret;
> +
> +       ret = synthesize_id_index(inject, gs->session->evlist->core.nr_entries);
> +       if (ret) {
> +               pr_err("Failed to synthesize id_index\n");
> +               return ret;
> +       }
> +
> +       ret = guest_session__add_build_ids(gs);
> +       if (ret) {
> +               pr_err("Failed to add guest build IDs\n");
> +               return ret;
> +       }
> +
> +       gs->ready = true;
> +
> +       ret = guest_session__inject_events(gs, 0);
> +       if (ret)
> +               return ret;
> +
> +       return perf_event__repipe_op2_synth(session, event);
> +}
> +
> +/*
> + * Obey finished-round ordering. The FINISHED_ROUND event is first processed
> + * which flushes host events to file up until the last flush time. Then inject
> + * guest events up to the same time. Finally write out the FINISHED_ROUND event
> + * itself.
> + */
> +static int host__finished_round(struct perf_tool *tool,
> +                               union perf_event *event,
> +                               struct ordered_events *oe)
> +{
> +       struct perf_inject *inject = container_of(tool, struct perf_inject, tool);
> +       int ret = perf_event__process_finished_round(tool, event, oe);
> +       u64 timestamp = ordered_events__last_flush_time(oe);
> +
> +       if (ret)
> +               return ret;
> +
> +       ret = guest_session__inject_events(&inject->guest_session, timestamp);
> +       if (ret)
> +               return ret;
> +
> +       return perf_event__repipe_oe_synth(tool, event, oe);
> +}
> +
> +static int host__context_switch(struct perf_tool *tool,
> +                               union perf_event *event,
> +                               struct perf_sample *sample,
> +                               struct machine *machine)
> +{
> +       struct perf_inject *inject = container_of(tool, struct perf_inject, tool);
> +       bool out = event->header.misc & PERF_RECORD_MISC_SWITCH_OUT;
> +       struct guest_session *gs = &inject->guest_session;
> +       u32 pid = event->context_switch.next_prev_pid;
> +       u32 tid = event->context_switch.next_prev_tid;
> +       struct guest_tid *guest_tid;
> +       u32 vcpu;
> +
> +       if (out || pid != gs->machine_pid)
> +               goto out;
> +
> +       guest_tid = guest_session__lookup_tid(gs, tid);
> +       if (!guest_tid)
> +               goto out;
> +
> +       if (sample->cpu == (u32)-1) {
> +               pr_err("Switch event does not have CPU\n");
> +               return -EINVAL;
> +       }
> +
> +       vcpu = guest_tid->vcpu;
> +       if (vcpu >= gs->vcpu_cnt)
> +               return -EINVAL;
> +
> +       /* Guest is switching in, record which CPU the VCPU is now running on */
> +       gs->vcpu[vcpu].cpu = sample->cpu;
> +out:
> +       return host__repipe(tool, event, sample, machine);
> +}
> +
>  static void sig_handler(int sig __maybe_unused)
>  {
>         session_done = 1;
> @@ -767,6 +1666,61 @@ static int parse_vm_time_correlation(const struct option *opt, const char *str,
>         return inject->itrace_synth_opts.vm_tm_corr_args ? 0 : -ENOMEM;
>  }
>
> +static int parse_guest_data(const struct option *opt, const char *str, int unset)
> +{
> +       struct perf_inject *inject = opt->value;
> +       struct guest_session *gs = &inject->guest_session;
> +       char *tok;
> +       char *s;
> +
> +       if (unset)
> +               return 0;
> +
> +       if (!str)
> +               goto bad_args;
> +
> +       s = strdup(str);
> +       if (!s)
> +               return -ENOMEM;
> +
> +       gs->perf_data_file = strsep(&s, ",");
> +       if (!gs->perf_data_file)
> +               goto bad_args;
> +
> +       gs->copy_kcore_dir = has_kcore_dir(gs->perf_data_file);
> +       if (gs->copy_kcore_dir)
> +               inject->output.is_dir = true;
> +
> +       tok = strsep(&s, ",");
> +       if (!tok)
> +               goto bad_args;
> +       gs->machine_pid = strtoul(tok, NULL, 0);
> +       if (!inject->guest_session.machine_pid)
> +               goto bad_args;
> +
> +       gs->time_scale = 1;
> +
> +       tok = strsep(&s, ",");
> +       if (!tok)
> +               goto out;
> +       gs->time_offset = strtoull(tok, NULL, 0);
> +
> +       tok = strsep(&s, ",");
> +       if (!tok)
> +               goto out;
> +       gs->time_scale = strtod(tok, NULL);
> +       if (!gs->time_scale)
> +               goto bad_args;
> +out:
> +       return 0;
> +
> +bad_args:
> +       pr_err("--guest-data option requires guest perf.data file name, "
> +              "guest machine PID, and optionally guest timestamp offset, "
> +              "and guest timestamp scale factor, separated by commas.\n");
> +       return -1;
> +}
> +
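For reference, the --guest-data argument is a comma-separated list: file name, machine PID, then an optional time offset and an optional time scale. Below is a self-contained sketch of that parsing with strsep(). The struct and function names are hypothetical, and unlike the quoted code this version frees the duplicated string on the error path, which is a choice of the sketch rather than something the patch does.

```c
#define _DEFAULT_SOURCE /* for strsep() and strdup() on glibc */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result struct; the patch stores these in guest_session */
struct guest_data_args {
	char *file;		/* start of the strdup()ed buffer, caller frees */
	unsigned long pid;
	uint64_t time_offset;
	double time_scale;
};

/* Return 0 on success, -1 if a mandatory field is missing or invalid */
int parse_guest_data_args(const char *str, struct guest_data_args *args)
{
	char *s, *tok;

	memset(args, 0, sizeof(*args));
	args->time_scale = 1;	/* default when no scale is given */

	if (!str)
		return -1;
	s = strdup(str);
	if (!s)
		return -1;

	/* The first token starts at the duplicated buffer, so it owns it */
	args->file = strsep(&s, ",");

	tok = strsep(&s, ",");
	if (!tok)
		goto bad;
	args->pid = strtoul(tok, NULL, 0);
	if (!args->pid)
		goto bad;

	tok = strsep(&s, ",");
	if (!tok)
		return 0;	/* offset and scale are optional */
	args->time_offset = strtoull(tok, NULL, 0);

	tok = strsep(&s, ",");
	if (!tok)
		return 0;
	args->time_scale = strtod(tok, NULL);
	if (!args->time_scale)
		goto bad;

	return 0;
bad:
	free(args->file);
	args->file = NULL;
	return -1;
}
```

Because strsep() writes NUL bytes into the buffer and returns pointers into it, the file name pointer doubles as the handle for the whole allocation. That is why a single free() of the file name is enough in guest_session__exit().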
>  static int save_section_info_cb(struct perf_file_section *section,
>                                 struct perf_header *ph __maybe_unused,
>                                 int feat, int fd __maybe_unused, void *data)
> @@ -896,6 +1850,22 @@ static int copy_kcore_dir(struct perf_inject *inject)
>         return ret;
>  }
>
> +static int guest_session__copy_kcore_dir(struct guest_session *gs)
> +{
> +       struct perf_inject *inject = container_of(gs, struct perf_inject, guest_session);
> +       char *cmd;
> +       int ret;
> +
> +       ret = asprintf(&cmd, "cp -r -n %s/kcore_dir %s/kcore_dir__%u >/dev/null 2>&1",
> +                      gs->perf_data_file, inject->output.path, gs->machine_pid);
> +       if (ret < 0)
> +               return ret;
> +       pr_debug("%s\n", cmd);
> +       ret = system(cmd);
> +       free(cmd);
> +       return ret;
> +}
> +
>  static int output_fd(struct perf_inject *inject)
>  {
>         return inject->in_place_update ? -1 : perf_data__fd(&inject->output);
> @@ -904,6 +1874,7 @@ static int output_fd(struct perf_inject *inject)
>  static int __cmd_inject(struct perf_inject *inject)
>  {
>         int ret = -EINVAL;
> +       struct guest_session *gs = &inject->guest_session;
>         struct perf_session *session = inject->session;
>         int fd = output_fd(inject);
>         u64 output_data_offset;
> @@ -968,6 +1939,47 @@ static int __cmd_inject(struct perf_inject *inject)
>                 output_data_offset = roundup(8192 + session->header.data_offset, 4096);
>                 if (inject->strip)
>                         strip_init(inject);
> +       } else if (gs->perf_data_file) {
> +               char *name = gs->perf_data_file;
> +
> +               /*
> +                * Not strictly necessary, but keep these events in order wrt
> +                * guest events.
> +                */
> +               inject->tool.mmap               = host__repipe;
> +               inject->tool.mmap2              = host__repipe;
> +               inject->tool.comm               = host__repipe;
> +               inject->tool.fork               = host__repipe;
> +               inject->tool.exit               = host__repipe;
> +               inject->tool.lost               = host__repipe;
> +               inject->tool.context_switch     = host__repipe;
> +               inject->tool.ksymbol            = host__repipe;
> +               inject->tool.text_poke          = host__repipe;
> +               /*
> +                * Once the host session has initialized, set up sample ID
> +                * mapping and feed in guest attrs, build IDs and initial
> +                * events.
> +                */
> +               inject->tool.finished_init      = host__finished_init;
> +               /* Obey finished round ordering */
> +               inject->tool.finished_round     = host__finished_round;
> +               /* Keep track of which CPU a VCPU is running on */
> +               inject->tool.context_switch     = host__context_switch;
> +               /*
> +                * Must order events to be able to obey finished round
> +                * ordering.
> +                */
> +               inject->tool.ordered_events     = true;
> +               inject->tool.ordering_requires_timestamps = true;
> +               /* Set up a separate session to process guest perf.data file */
> +               ret = guest_session__start(gs, name, session->data->force);
> +               if (ret) {
> +                       pr_err("Failed to process %s, error %d\n", name, ret);
> +                       return ret;
> +               }
> +               /* Allow space in the header for guest attributes */
> +               output_data_offset += gs->session->header.data_offset;
> +               output_data_offset = roundup(output_data_offset, 4096);
>         }
>
>         if (!inject->itrace_synth_opts.set)
> @@ -980,6 +1992,18 @@ static int __cmd_inject(struct perf_inject *inject)
>         if (ret)
>                 return ret;
>
> +       if (gs->session) {
> +               /*
> +                * Remaining guest events have later timestamps. Flush them
> +                * out to file.
> +                */
> +               ret = guest_session__flush_events(gs);
> +               if (ret) {
> +                       pr_err("Failed to flush guest events\n");
> +                       return ret;
> +               }
> +       }
> +
>         if (!inject->is_pipe && !inject->in_place_update) {
>                 struct inject_fc inj_fc = {
>                         .fc.copy = feat_copy_cb,
> @@ -1014,8 +2038,17 @@ static int __cmd_inject(struct perf_inject *inject)
>
>                 if (inject->copy_kcore_dir) {
>                         ret = copy_kcore_dir(inject);
> -                       if (ret)
> +                       if (ret) {
> +                               pr_err("Failed to copy kcore\n");
>                                 return ret;
> +                       }
> +               }
> +               if (gs->copy_kcore_dir) {
> +                       ret = guest_session__copy_kcore_dir(gs);
> +                       if (ret) {
> +                               pr_err("Failed to copy guest kcore\n");
> +                               return ret;
> +                       }
>                 }
>         }
>
> @@ -1113,6 +2146,12 @@ int cmd_inject(int argc, const char **argv)
>                 OPT_CALLBACK_OPTARG(0, "vm-time-correlation", &inject, NULL, "opts",
>                                     "correlate time between VM guests and the host",
>                                     parse_vm_time_correlation),
> +               OPT_CALLBACK_OPTARG(0, "guest-data", &inject, NULL, "opts",
> +                                   "inject events from a guest perf.data file",
> +                                   parse_guest_data),
> +               OPT_STRING(0, "guestmount", &symbol_conf.guestmount, "directory",
> +                          "guest mount directory under which every guest os"
> +                          " instance has a subdir"),

Should guestmount also be in the man page? Also should it have a
hyphen like guest-data?

Thanks,
Ian

>                 OPT_END()
>         };
>         const char * const inject_usage[] = {
> @@ -1243,6 +2282,8 @@ int cmd_inject(int argc, const char **argv)
>
>         ret = __cmd_inject(&inject);
>
> +       guest_session__exit(&inject.guest_session);
> +
>  out_delete:
>         zstd_fini(&(inject.session->zstd_data));
>         perf_session__delete(inject.session);
> --
> 2.25.1
>


* Re: [PATCH 26/35] perf tools: Handle injected guest kernel mmap event
  2022-07-11  9:32 ` [PATCH 26/35] perf tools: Handle injected guest kernel mmap event Adrian Hunter
@ 2022-07-20  1:09   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  1:09 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> If a kernel mmap event was recorded inside a guest and injected into a host
> perf.data file, then it will match a host mmap_name not a guest mmap_name,
> see machine__set_mmap_name(). So try matching a host mmap_name in that
> case.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>


Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/util/machine.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 27d1a38f44c3..8f657225fb02 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -1742,6 +1742,7 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
>         struct map *map;
>         enum dso_space_type dso_space;
>         bool is_kernel_mmap;
> +       const char *mmap_name = machine->mmap_name;
>
>         /* If we have maps from kcore then we do not need or want any others */
>         if (machine__uses_kcore(machine))
> @@ -1752,8 +1753,16 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
>         else
>                 dso_space = DSO_SPACE__KERNEL_GUEST;
>
> -       is_kernel_mmap = memcmp(xm->name, machine->mmap_name,
> -                               strlen(machine->mmap_name) - 1) == 0;
> +       is_kernel_mmap = memcmp(xm->name, mmap_name, strlen(mmap_name) - 1) == 0;
> +       if (!is_kernel_mmap && !machine__is_host(machine)) {
> +               /*
> +                * If the event was recorded inside the guest and injected into
> +                * the host perf.data file, then it will match a host mmap_name,
> +                * so try that - see machine__set_mmap_name().
> +                */
> +               mmap_name = "[kernel.kallsyms]";
> +               is_kernel_mmap = memcmp(xm->name, mmap_name, strlen(mmap_name) - 1) == 0;
> +       }
>         if (xm->name[0] == '/' ||
>             (!is_kernel_mmap && xm->name[0] == '[')) {
>                 map = machine__addnew_module_map(machine, xm->start,
> @@ -1767,7 +1776,7 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
>                         dso__set_build_id(map->dso, bid);
>
>         } else if (is_kernel_mmap) {
> -               const char *symbol_name = (xm->name + strlen(machine->mmap_name));
> +               const char *symbol_name = xm->name + strlen(mmap_name);
>                 /*
>                  * Should be there already, from the build-id table in
>                  * the header.
> --
> 2.25.1
>
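
For illustration, the fallback match described in the commit message can be
sketched standalone. This is a minimal sketch and not code from the patch:
name_matches() and is_kernel_mmap() are hypothetical helpers, and the
"_text" suffix used in the example names is an assumption about what
follows the map name in a kernel mmap event.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Host kernel map name, as used in the patch. */
#define HOST_MMAP_NAME "[kernel.kallsyms]"

/*
 * Compare event_name against mmap_name while ignoring mmap_name's last
 * character, mirroring the "strlen(mmap_name) - 1" length in the patch.
 * strncmp() is used instead of memcmp() so a short event_name is safe here.
 */
static bool name_matches(const char *event_name, const char *mmap_name)
{
	size_t len = strlen(mmap_name);

	assert(len > 0);
	return strncmp(event_name, mmap_name, len - 1) == 0;
}

/*
 * Hypothetical helper: try the machine's own mmap_name first and, for a
 * guest machine only, fall back to the host name.
 */
static bool is_kernel_mmap(const char *event_name, const char *mmap_name,
			   bool is_host)
{
	if (name_matches(event_name, mmap_name))
		return true;
	return !is_host && name_matches(event_name, HOST_MMAP_NAME);
}
```

With this sketch, an event named "[kernel.kallsyms]_text" is accepted for a
guest machine whose mmap_name is "[guest.kernel.kallsyms]", which is the
injected-event case the patch handles.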


* Re: [PATCH 27/35] perf tools: Add perf_event__is_guest()
  2022-07-11  9:32 ` [PATCH 27/35] perf tools: Add perf_event__is_guest() Adrian Hunter
@ 2022-07-20  1:11   ` Ian Rogers
  2022-07-20 14:06     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  1:11 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Add a helper function to determine if an event is a guest event.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/util/event.h | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
>
> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> index a660f304f83c..a7b0931d5137 100644
> --- a/tools/perf/util/event.h
> +++ b/tools/perf/util/event.h

Would this be better under tools/lib/perf?

Thanks,
Ian

> @@ -484,4 +484,25 @@ void arch_perf_synthesize_sample_weight(const struct perf_sample *data, __u64 *a
>  const char *arch_perf_header_entry(const char *se_header);
>  int arch_support_sort_key(const char *sort_key);
>
> +static inline bool perf_event_header__cpumode_is_guest(u8 cpumode)
> +{
> +       return cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
> +              cpumode == PERF_RECORD_MISC_GUEST_USER;
> +}
> +
> +static inline bool perf_event_header__misc_is_guest(u16 misc)
> +{
> +       return perf_event_header__cpumode_is_guest(misc & PERF_RECORD_MISC_CPUMODE_MASK);
> +}
> +
> +static inline bool perf_event_header__is_guest(const struct perf_event_header *header)
> +{
> +       return perf_event_header__misc_is_guest(header->misc);
> +}
> +
> +static inline bool perf_event__is_guest(const union perf_event *event)
> +{
> +       return perf_event_header__is_guest(&event->header);
> +}
> +
>  #endif /* __PERF_RECORD_H */
> --
> 2.25.1
>
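
As a standalone illustration of the check these helpers perform, here is a
minimal sketch. It is not the patch code: the MISC_* names are local
stand-ins whose values are assumed to mirror the PERF_RECORD_MISC_*
definitions in include/uapi/linux/perf_event.h, repeated so that the
example compiles on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins; values assumed to mirror the uapi PERF_RECORD_MISC_* ones. */
#define MISC_CPUMODE_MASK	(7 << 0)
#define MISC_KERNEL		(1 << 0)
#define MISC_USER		(2 << 0)
#define MISC_GUEST_KERNEL	(4 << 0)
#define MISC_GUEST_USER		(5 << 0)
#define MISC_SWITCH_OUT		(1 << 13)

/*
 * The cpumode lives in the low bits of the header's misc field. Mask off
 * the other flag bits before comparing, so that a flag such as
 * MISC_SWITCH_OUT does not affect the result.
 */
static bool misc_is_guest(uint16_t misc)
{
	uint16_t cpumode = misc & MISC_CPUMODE_MASK;

	return cpumode == MISC_GUEST_KERNEL || cpumode == MISC_GUEST_USER;
}
```

The masking step is the reason the patch layers a misc helper on top of the
cpumode helper: callers holding a raw misc value cannot compare it directly
against a cpumode constant.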


* Re: [PATCH 28/35] perf intel-pt: Remove guest_machine_pid
  2022-07-11  9:32 ` [PATCH 28/35] perf intel-pt: Remove guest_machine_pid Adrian Hunter
@ 2022-07-20  1:12   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  1:12 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Remove guest_machine_pid because it is not needed.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/util/intel-pt.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
> index 62b2f375a94d..014f9f73cc49 100644
> --- a/tools/perf/util/intel-pt.c
> +++ b/tools/perf/util/intel-pt.c
> @@ -194,7 +194,6 @@ struct intel_pt_queue {
>         struct machine *guest_machine;
>         struct thread *guest_thread;
>         struct thread *unknown_guest_thread;
> -       pid_t guest_machine_pid;
>         bool exclude_kernel;
>         bool have_sample;
>         u64 time;
> @@ -685,7 +684,7 @@ static int intel_pt_get_guest(struct intel_pt_queue *ptq)
>         struct machine *machine;
>         pid_t pid = ptq->pid <= 0 ? DEFAULT_GUEST_KERNEL_ID : ptq->pid;
>
> -       if (ptq->guest_machine && pid == ptq->guest_machine_pid)
> +       if (ptq->guest_machine && pid == ptq->guest_machine->pid)
>                 return 0;
>
>         ptq->guest_machine = NULL;
> @@ -705,7 +704,6 @@ static int intel_pt_get_guest(struct intel_pt_queue *ptq)
>                 return -1;
>
>         ptq->guest_machine = machine;
> -       ptq->guest_machine_pid = pid;
>
>         return 0;
>  }
> --
> 2.25.1
>


* Re: [PATCH 29/35] perf intel-pt: Add some more logging to intel_pt_walk_next_insn()
  2022-07-11  9:32 ` [PATCH 29/35] perf intel-pt: Add some more logging to intel_pt_walk_next_insn() Adrian Hunter
@ 2022-07-20  1:13   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  1:13 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> To aid debugging, add some more logging to intel_pt_walk_next_insn().
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/util/intel-pt.c | 25 ++++++++++++++++++++-----
>  1 file changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
> index 014f9f73cc49..a8798b5bb311 100644
> --- a/tools/perf/util/intel-pt.c
> +++ b/tools/perf/util/intel-pt.c
> @@ -758,27 +758,38 @@ static int intel_pt_walk_next_insn(struct intel_pt_insn *intel_pt_insn,
>
>         if (nr) {
>                 if ((!symbol_conf.guest_code && cpumode != PERF_RECORD_MISC_GUEST_KERNEL) ||
> -                   intel_pt_get_guest(ptq))
> +                   intel_pt_get_guest(ptq)) {
> +                       intel_pt_log("ERROR: no guest machine\n");
>                         return -EINVAL;
> +               }
>                 machine = ptq->guest_machine;
>                 thread = ptq->guest_thread;
>                 if (!thread) {
> -                       if (cpumode != PERF_RECORD_MISC_GUEST_KERNEL)
> +                       if (cpumode != PERF_RECORD_MISC_GUEST_KERNEL) {
> +                               intel_pt_log("ERROR: no guest thread\n");
>                                 return -EINVAL;
> +                       }
>                         thread = ptq->unknown_guest_thread;
>                 }
>         } else {
>                 thread = ptq->thread;
>                 if (!thread) {
> -                       if (cpumode != PERF_RECORD_MISC_KERNEL)
> +                       if (cpumode != PERF_RECORD_MISC_KERNEL) {
> +                               intel_pt_log("ERROR: no thread\n");
>                                 return -EINVAL;
> +                       }
>                         thread = ptq->pt->unknown_thread;
>                 }
>         }
>
>         while (1) {
> -               if (!thread__find_map(thread, cpumode, *ip, &al) || !al.map->dso)
> +               if (!thread__find_map(thread, cpumode, *ip, &al) || !al.map->dso) {
> +                       if (al.map)
> +                               intel_pt_log("ERROR: thread has no dso for %#" PRIx64 "\n", *ip);
> +                       else
> +                               intel_pt_log("ERROR: thread has no map for %#" PRIx64 "\n", *ip);
>                         return -EINVAL;
> +               }
>
>                 if (al.map->dso->data.status == DSO_DATA_STATUS_ERROR &&
>                     dso__data_status_seen(al.map->dso,
> @@ -819,8 +830,12 @@ static int intel_pt_walk_next_insn(struct intel_pt_insn *intel_pt_insn,
>                         len = dso__data_read_offset(al.map->dso, machine,
>                                                     offset, buf,
>                                                     INTEL_PT_INSN_BUF_SZ);
> -                       if (len <= 0)
> +                       if (len <= 0) {
> +                               intel_pt_log("ERROR: failed to read at %" PRIu64 " ", offset);
> +                               if (intel_pt_enable_logging)
> +                                       dso__fprintf(al.map->dso, intel_pt_log_fp());
>                                 return -EINVAL;
> +                       }
>
>                         if (intel_pt_get_insn(buf, len, x86_64, intel_pt_insn))
>                                 return -EINVAL;
> --
> 2.25.1
>


* Re: [PATCH 30/35] perf intel-pt: Track guest context switches
  2022-07-11  9:32 ` [PATCH 30/35] perf intel-pt: Track guest context switches Adrian Hunter
@ 2022-07-20  1:13   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  1:13 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Use guest context switch events to keep track of which guest thread is
> running on a particular guest machine and VCPU.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/util/intel-pt.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
>
> diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
> index a8798b5bb311..98b097fec476 100644
> --- a/tools/perf/util/intel-pt.c
> +++ b/tools/perf/util/intel-pt.c
> @@ -78,6 +78,7 @@ struct intel_pt {
>         bool use_thread_stack;
>         bool callstack;
>         bool cap_event_trace;
> +       bool have_guest_sideband;
>         unsigned int br_stack_sz;
>         unsigned int br_stack_sz_plus;
>         int have_sched_switch;
> @@ -3079,6 +3080,25 @@ static int intel_pt_context_switch_in(struct intel_pt *pt,
>         return machine__set_current_tid(pt->machine, cpu, pid, tid);
>  }
>
> +static int intel_pt_guest_context_switch(struct intel_pt *pt,
> +                                        union perf_event *event,
> +                                        struct perf_sample *sample)
> +{
> +       bool out = event->header.misc & PERF_RECORD_MISC_SWITCH_OUT;
> +       struct machines *machines = &pt->session->machines;
> +       struct machine *machine = machines__find(machines, sample->machine_pid);
> +
> +       pt->have_guest_sideband = true;
> +
> +       if (out)
> +               return 0;
> +
> +       if (!machine)
> +               return -EINVAL;
> +
> +       return machine__set_current_tid(machine, sample->vcpu, sample->pid, sample->tid);
> +}
> +
>  static int intel_pt_context_switch(struct intel_pt *pt, union perf_event *event,
>                                    struct perf_sample *sample)
>  {
> @@ -3086,6 +3106,9 @@ static int intel_pt_context_switch(struct intel_pt *pt, union perf_event *event,
>         pid_t pid, tid;
>         int cpu, ret;
>
> +       if (perf_event__is_guest(event))
> +               return intel_pt_guest_context_switch(pt, event, sample);
> +
>         cpu = sample->cpu;
>
>         if (pt->have_sched_switch == 3) {
> --
> 2.25.1
>
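
The bookkeeping described in the commit message can be sketched without any
perf internals. This is a minimal sketch under stated assumptions: struct
guest_machine, MAX_VCPUS and the three functions are hypothetical and stand
in for the per-machine current-tid tracking that the patch reaches through
machine__set_current_tid().

```c
#include <stdbool.h>
#include <sys/types.h>

#define MAX_VCPUS 8	/* arbitrary bound for the sketch */

/* Hypothetical stand-in for a guest machine's per-VCPU current thread. */
struct guest_machine {
	pid_t current_tid[MAX_VCPUS];
};

static void guest_machine_init(struct guest_machine *m)
{
	for (int i = 0; i < MAX_VCPUS; i++)
		m->current_tid[i] = -1;	/* -1: no thread known yet */
}

/*
 * Record which guest thread runs on a VCPU. As in the patch, switch-out
 * events are ignored and only switch-in events update the state.
 * Returns 0 on success, -1 if the VCPU number is out of range.
 */
static int guest_context_switch(struct guest_machine *m, int vcpu, pid_t tid,
				bool switch_out)
{
	if (switch_out)
		return 0;
	if (vcpu < 0 || vcpu >= MAX_VCPUS)
		return -1;
	m->current_tid[vcpu] = tid;
	return 0;
}

static pid_t guest_current_tid(const struct guest_machine *m, int vcpu)
{
	if (vcpu < 0 || vcpu >= MAX_VCPUS)
		return -1;
	return m->current_tid[vcpu];
}
```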


* Re: [PATCH 31/35] perf intel-pt: Disable sync switch with guest sideband
  2022-07-11  9:32 ` [PATCH 31/35] perf intel-pt: Disable sync switch with guest sideband Adrian Hunter
@ 2022-07-20  1:14   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  1:14 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> The sync_switch facility attempts to better synchronize context switches
> with the Intel PT trace, however it is not designed for guest machine
> context switches, so disable it when guest sideband is detected.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/util/intel-pt.c | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
>
> diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
> index 98b097fec476..dc2af64f9e31 100644
> --- a/tools/perf/util/intel-pt.c
> +++ b/tools/perf/util/intel-pt.c
> @@ -74,6 +74,7 @@ struct intel_pt {
>         bool data_queued;
>         bool est_tsc;
>         bool sync_switch;
> +       bool sync_switch_not_supported;
>         bool mispred_all;
>         bool use_thread_stack;
>         bool callstack;
> @@ -2638,6 +2639,9 @@ static void intel_pt_enable_sync_switch(struct intel_pt *pt)
>  {
>         unsigned int i;
>
> +       if (pt->sync_switch_not_supported)
> +               return;
> +
>         pt->sync_switch = true;
>
>         for (i = 0; i < pt->queues.nr_queues; i++) {
> @@ -2649,6 +2653,23 @@ static void intel_pt_enable_sync_switch(struct intel_pt *pt)
>         }
>  }
>
> +static void intel_pt_disable_sync_switch(struct intel_pt *pt)
> +{
> +       unsigned int i;
> +
> +       pt->sync_switch = false;
> +
> +       for (i = 0; i < pt->queues.nr_queues; i++) {
> +               struct auxtrace_queue *queue = &pt->queues.queue_array[i];
> +               struct intel_pt_queue *ptq = queue->priv;
> +
> +               if (ptq) {
> +                       ptq->sync_switch = false;
> +                       intel_pt_next_tid(pt, ptq);
> +               }
> +       }
> +}
> +
>  /*
>   * To filter against time ranges, it is only necessary to look at the next start
>   * or end time.
> @@ -3090,6 +3111,14 @@ static int intel_pt_guest_context_switch(struct intel_pt *pt,
>
>         pt->have_guest_sideband = true;
>
> +       /*
> +        * sync_switch cannot handle guest machines at present, so just disable
> +        * it.
> +        */
> +       pt->sync_switch_not_supported = true;
> +       if (pt->sync_switch)
> +               intel_pt_disable_sync_switch(pt);
> +
>         if (out)
>                 return 0;
>
> --
> 2.25.1
>
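
The effect of the new sync_switch_not_supported flag is that of a one-way
latch: once guest sideband has been seen, later attempts to enable
sync_switch do nothing. A minimal sketch of that behaviour follows, with
struct decoder and both functions as hypothetical names; the per-queue
updates done by the real code are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two flags in struct intel_pt. */
struct decoder {
	bool sync_switch;
	bool sync_switch_not_supported;
};

/* Enabling is refused once the latch has been set. */
static void enable_sync_switch(struct decoder *d)
{
	if (d->sync_switch_not_supported)
		return;
	d->sync_switch = true;
}

/* Seeing a guest context switch sets the latch and disables sync_switch. */
static void on_guest_context_switch(struct decoder *d)
{
	d->sync_switch_not_supported = true;
	d->sync_switch = false;
}
```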


* Re: [PATCH 32/35] perf intel-pt: Determine guest thread from guest sideband
  2022-07-11  9:32 ` [PATCH 32/35] perf intel-pt: Determine guest thread from " Adrian Hunter
@ 2022-07-20  1:15   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  1:15 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:34 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Prior to decoding, determine what guest thread, if any, is running.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/util/intel-pt.c | 69 ++++++++++++++++++++++++++++++++++++--
>  1 file changed, 67 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
> index dc2af64f9e31..a08c2f059d5a 100644
> --- a/tools/perf/util/intel-pt.c
> +++ b/tools/perf/util/intel-pt.c
> @@ -196,6 +196,10 @@ struct intel_pt_queue {
>         struct machine *guest_machine;
>         struct thread *guest_thread;
>         struct thread *unknown_guest_thread;
> +       pid_t guest_machine_pid;
> +       pid_t guest_pid;
> +       pid_t guest_tid;
> +       int vcpu;
>         bool exclude_kernel;
>         bool have_sample;
>         u64 time;
> @@ -759,8 +763,13 @@ static int intel_pt_walk_next_insn(struct intel_pt_insn *intel_pt_insn,
>         cpumode = intel_pt_nr_cpumode(ptq, *ip, nr);
>
>         if (nr) {
> -               if ((!symbol_conf.guest_code && cpumode != PERF_RECORD_MISC_GUEST_KERNEL) ||
> -                   intel_pt_get_guest(ptq)) {
> +               if (ptq->pt->have_guest_sideband) {
> +                       if (!ptq->guest_machine || ptq->guest_machine_pid != ptq->pid) {
> +                               intel_pt_log("ERROR: guest sideband but no guest machine\n");
> +                               return -EINVAL;
> +                       }
> +               } else if ((!symbol_conf.guest_code && cpumode != PERF_RECORD_MISC_GUEST_KERNEL) ||
> +                          intel_pt_get_guest(ptq)) {
>                         intel_pt_log("ERROR: no guest machine\n");
>                         return -EINVAL;
>                 }
> @@ -1385,6 +1394,55 @@ static void intel_pt_first_timestamp(struct intel_pt *pt, u64 timestamp)
>         }
>  }
>
> +static int intel_pt_get_guest_from_sideband(struct intel_pt_queue *ptq)
> +{
> +       struct machines *machines = &ptq->pt->session->machines;
> +       struct machine *machine;
> +       pid_t machine_pid = ptq->pid;
> +       pid_t tid;
> +       int vcpu;
> +
> +       if (machine_pid <= 0)
> +               return 0; /* Not a guest machine */
> +
> +       machine = machines__find(machines, machine_pid);
> +       if (!machine)
> +               return 0; /* Not a guest machine */
> +
> +       if (ptq->guest_machine != machine) {
> +               ptq->guest_machine = NULL;
> +               thread__zput(ptq->guest_thread);
> +               thread__zput(ptq->unknown_guest_thread);
> +
> +               ptq->unknown_guest_thread = machine__find_thread(machine, 0, 0);
> +               if (!ptq->unknown_guest_thread)
> +                       return -1;
> +               ptq->guest_machine = machine;
> +       }
> +
> +       vcpu = ptq->thread ? ptq->thread->guest_cpu : -1;
> +       if (vcpu < 0)
> +               return -1;
> +
> +       tid = machine__get_current_tid(machine, vcpu);
> +
> +       if (ptq->guest_thread && ptq->guest_thread->tid != tid)
> +               thread__zput(ptq->guest_thread);
> +
> +       if (!ptq->guest_thread) {
> +               ptq->guest_thread = machine__find_thread(machine, -1, tid);
> +               if (!ptq->guest_thread)
> +                       return -1;
> +       }
> +
> +       ptq->guest_machine_pid = machine_pid;
> +       ptq->guest_pid = ptq->guest_thread->pid_;
> +       ptq->guest_tid = tid;
> +       ptq->vcpu = vcpu;
> +
> +       return 0;
> +}
> +
>  static void intel_pt_set_pid_tid_cpu(struct intel_pt *pt,
>                                      struct auxtrace_queue *queue)
>  {
> @@ -1405,6 +1463,13 @@ static void intel_pt_set_pid_tid_cpu(struct intel_pt *pt,
>                 if (queue->cpu == -1)
>                         ptq->cpu = ptq->thread->cpu;
>         }
> +
> +       if (pt->have_guest_sideband && intel_pt_get_guest_from_sideband(ptq)) {
> +               ptq->guest_machine_pid = 0;
> +               ptq->guest_pid = -1;
> +               ptq->guest_tid = -1;
> +               ptq->vcpu = -1;
> +       }
>  }
>
>  static void intel_pt_sample_flags(struct intel_pt_queue *ptq)
> --
> 2.25.1
>


* Re: [PATCH 33/35] perf intel-pt: Add machine_pid and vcpu to auxtrace_error
  2022-07-11  9:32 ` [PATCH 33/35] perf intel-pt: Add machine_pid and vcpu to auxtrace_error Adrian Hunter
@ 2022-07-20  5:27   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  5:27 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:34 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> When decoding with guest sideband information, for VMX non-root (NR)
> errors, i.e. guest errors, replace the host (hypervisor) pid/tid with guest
> values, and also provide the new machine_pid and vcpu values.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/util/intel-pt.c | 26 ++++++++++++++++++++------
>  1 file changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
> index a08c2f059d5a..143a096b567b 100644
> --- a/tools/perf/util/intel-pt.c
> +++ b/tools/perf/util/intel-pt.c
> @@ -2404,7 +2404,8 @@ static int intel_pt_synth_iflag_chg_sample(struct intel_pt_queue *ptq)
>  }
>
>  static int intel_pt_synth_error(struct intel_pt *pt, int code, int cpu,
> -                               pid_t pid, pid_t tid, u64 ip, u64 timestamp)
> +                               pid_t pid, pid_t tid, u64 ip, u64 timestamp,
> +                               pid_t machine_pid, int vcpu)
>  {
>         union perf_event event;
>         char msg[MAX_AUXTRACE_ERROR_MSG];
> @@ -2421,8 +2422,9 @@ static int intel_pt_synth_error(struct intel_pt *pt, int code, int cpu,
>
>         intel_pt__strerror(code, msg, MAX_AUXTRACE_ERROR_MSG);
>
> -       auxtrace_synth_error(&event.auxtrace_error, PERF_AUXTRACE_ERROR_ITRACE,
> -                            code, cpu, pid, tid, ip, msg, timestamp);
> +       auxtrace_synth_guest_error(&event.auxtrace_error, PERF_AUXTRACE_ERROR_ITRACE,
> +                                  code, cpu, pid, tid, ip, msg, timestamp,
> +                                  machine_pid, vcpu);
>
>         err = perf_session__deliver_synth_event(pt->session, &event, NULL);
>         if (err)
> @@ -2437,11 +2439,22 @@ static int intel_ptq_synth_error(struct intel_pt_queue *ptq,
>  {
>         struct intel_pt *pt = ptq->pt;
>         u64 tm = ptq->timestamp;
> +       pid_t machine_pid = 0;
> +       pid_t pid = ptq->pid;
> +       pid_t tid = ptq->tid;
> +       int vcpu = -1;
>
>         tm = pt->timeless_decoding ? 0 : tsc_to_perf_time(tm, &pt->tc);
>
> -       return intel_pt_synth_error(pt, state->err, ptq->cpu, ptq->pid,
> -                                   ptq->tid, state->from_ip, tm);
> +       if (pt->have_guest_sideband && state->from_nr) {
> +               machine_pid = ptq->guest_machine_pid;
> +               vcpu = ptq->vcpu;
> +               pid = ptq->guest_pid;
> +               tid = ptq->guest_tid;
> +       }
> +
> +       return intel_pt_synth_error(pt, state->err, ptq->cpu, pid, tid,
> +                                   state->from_ip, tm, machine_pid, vcpu);
>  }
>
>  static int intel_pt_next_tid(struct intel_pt *pt, struct intel_pt_queue *ptq)
> @@ -3028,7 +3041,8 @@ static int intel_pt_process_timeless_sample(struct intel_pt *pt,
>  static int intel_pt_lost(struct intel_pt *pt, struct perf_sample *sample)
>  {
>         return intel_pt_synth_error(pt, INTEL_PT_ERR_LOST, sample->cpu,
> -                                   sample->pid, sample->tid, 0, sample->time);
> +                                   sample->pid, sample->tid, 0, sample->time,
> +                                   sample->machine_pid, sample->vcpu);
>  }
>
>  static struct intel_pt_queue *intel_pt_cpu_to_ptq(struct intel_pt *pt, int cpu)
> --
> 2.25.1
>


* Re: [PATCH 34/35] perf intel-pt: Use guest pid/tid etc in guest samples
  2022-07-11  9:32 ` [PATCH 34/35] perf intel-pt: Use guest pid/tid etc in guest samples Adrian Hunter
@ 2022-07-20  5:28   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  5:28 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:34 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> When decoding with guest sideband information, for VMX non-root (NR)
> events, i.e. guest events, replace the host (hypervisor) pid/tid with guest
> values, and also provide the new machine_pid and vcpu values.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/util/intel-pt.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
> index 143a096b567b..d5e9fc8106dd 100644
> --- a/tools/perf/util/intel-pt.c
> +++ b/tools/perf/util/intel-pt.c
> @@ -1657,6 +1657,17 @@ static void intel_pt_prep_a_sample(struct intel_pt_queue *ptq,
>
>         sample->pid = ptq->pid;
>         sample->tid = ptq->tid;
> +
> +       if (ptq->pt->have_guest_sideband) {
> +               if ((ptq->state->from_ip && ptq->state->from_nr) ||
> +                   (ptq->state->to_ip && ptq->state->to_nr)) {
> +                       sample->pid = ptq->guest_pid;
> +                       sample->tid = ptq->guest_tid;
> +                       sample->machine_pid = ptq->guest_machine_pid;
> +                       sample->vcpu = ptq->vcpu;
> +               }
> +       }
> +
>         sample->cpu = ptq->cpu;
>         sample->insn_len = ptq->insn_len;
>         memcpy(sample->insn, ptq->insn, INTEL_PT_INSN_BUF_SZ);
> --
> 2.25.1
>


* Re: [PATCH 35/35] perf intel-pt: Add documentation for tracing guest machine user space
  2022-07-11  9:32 ` [PATCH 35/35] perf intel-pt: Add documentation for tracing guest machine user space Adrian Hunter
@ 2022-07-20  5:29   ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-20  5:29 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Mon, Jul 11, 2022 at 2:34 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Now it is possible to decode a host Intel PT trace including guest machine
> user space, add documentation for the steps needed to do it.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/Documentation/perf-intel-pt.txt | 181 ++++++++++++++++++++-
>  1 file changed, 177 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-intel-pt.txt b/tools/perf/Documentation/perf-intel-pt.txt
> index 238ab9d3cb93..3dc3f0ccbd51 100644
> --- a/tools/perf/Documentation/perf-intel-pt.txt
> +++ b/tools/perf/Documentation/perf-intel-pt.txt
> @@ -267,7 +267,7 @@ Note that, as with all events, the event is suffixed with event modifiers:
>         H       host
>         p       precise ip
>
> -'h', 'G' and 'H' are for virtualization which is not supported by Intel PT.
> +'h', 'G' and 'H' are for virtualization which are not used by Intel PT.
>  'p' is also not relevant to Intel PT.  So only options 'u' and 'k' are
>  meaningful for Intel PT.
>
> @@ -1218,10 +1218,10 @@ XED
>  include::build-xed.txt[]
>
>
> -Tracing Virtual Machines
> -------------------------
> +Tracing Virtual Machines (kernel only)
> +--------------------------------------
>
> -Currently, only kernel tracing is supported and only with either "timeless" decoding
> +Currently, kernel tracing is supported with either "timeless" decoding
>  (i.e. no TSC timestamps) or VM Time Correlation. VM Time Correlation is an extra step
>  using 'perf inject' and requires unchanging VMX TSC Offset and no VMX TSC Scaling.
>
> @@ -1400,6 +1400,179 @@ There were none.
>            :17006 17006 [001] 11500.262869216:  ffffffff8220116e error_entry+0xe ([guest.kernel.kallsyms])               pushq  %rax
>
>
> +Tracing Virtual Machines (including user space)
> +-----------------------------------------------
> +
> +It is possible to use perf record to record sideband events within a virtual machine, so that an Intel PT trace on the host can be decoded.
> +Sideband events from the guest perf.data file can be injected into the host perf.data file using perf inject.
> +
> +Here is an example of the steps needed:
> +
> +On the guest machine:
> +
> +Check that no-kvmclock kernel command line option was used to boot:
> +
> +Note, this is essential to enable time correlation between host and guest machines.
> +
> + $ cat /proc/cmdline
> + BOOT_IMAGE=/boot/vmlinuz-5.10.0-16-amd64 root=UUID=cb49c910-e573-47e0-bce7-79e293df8e1d ro no-kvmclock
> +
> +There is no BPF support at present so, if possible, disable JIT compiling:
> +
> + $ echo 0 | sudo tee /proc/sys/net/core/bpf_jit_enable
> + 0
> +
> +Start perf record to collect sideband events:
> +
> + $ sudo perf record -o guest-sideband-testing-guest-perf.data --sample-identifier --buildid-all --switch-events --kcore -a -e dummy
> +
> +On the host machine:
> +
> +Start perf record to collect Intel PT trace:
> +
> +Note, the host trace will get very big, very fast, so the steps from starting to stopping the host trace need to be done in the shortest time possible.
> +
> + $ sudo perf record -o guest-sideband-testing-host-perf.data -m,64M --kcore -a -e intel_pt/cyc/
> +
> +On the guest machine:
> +
> +Run a small test case, just 'uname' in this example:
> +
> + $ uname
> + Linux
> +
> +On the host machine:
> +
> +Stop the Intel PT trace:
> +
> + ^C
> + [ perf record: Woken up 1 times to write data ]
> + [ perf record: Captured and wrote 76.122 MB guest-sideband-testing-host-perf.data ]
> +
> +On the guest machine:
> +
> +Stop recording sideband events:
> +
> + ^C
> + [ perf record: Woken up 1 times to write data ]
> + [ perf record: Captured and wrote 1.247 MB guest-sideband-testing-guest-perf.data ]
> +
> +And then copy guest-sideband-testing-guest-perf.data to the host (not shown here).
> +
> +On the host machine:
> +
> +The following steps assume both perf.data recordings are on the host, with their ownership changed to the user.
> +
> +Identify the TSC Offset:
> +
> + $ perf inject -i guest-sideband-testing-host-perf.data --vm-time-correlation=dry-run
> + VMCS: 0x103fc6  TSC Offset 0xfffffa6ae070cb20
> + VMCS: 0x103ff2  TSC Offset 0xfffffa6ae070cb20
> + VMCS: 0x10fdaa  TSC Offset 0xfffffa6ae070cb20
> + VMCS: 0x24d57c  TSC Offset 0xfffffa6ae070cb20
> +
> +Correct Intel PT TSC timestamps for the guest machine:
> +
> + $ perf inject -i guest-sideband-testing-host-perf.data --vm-time-correlation=0xfffffa6ae070cb20 --force
> +
> +Identify the guest machine PID:
> +
> + $ perf script -i guest-sideband-testing-host-perf.data --no-itrace --show-task-events | grep KVM
> +       CPU 0/KVM     0 [000]     0.000000: PERF_RECORD_COMM: CPU 0/KVM:13376/13381
> +       CPU 1/KVM     0 [000]     0.000000: PERF_RECORD_COMM: CPU 1/KVM:13376/13382
> +       CPU 2/KVM     0 [000]     0.000000: PERF_RECORD_COMM: CPU 2/KVM:13376/13383
> +       CPU 3/KVM     0 [000]     0.000000: PERF_RECORD_COMM: CPU 3/KVM:13376/13384
> +
> +Note, the QEMU option -name debug-threads=on is needed so that thread names
> +can be used to determine which thread is running which VCPU as above. libvirt seems to use this by default.
> +
> +Create a guestmount, assuming the guest machine is 'vm_to_test':
> +
> + $ mkdir -p ~/guestmount/13376
> + $ sshfs -o direct_io vm_to_test:/ ~/guestmount/13376
> +
> +Inject the guest perf.data file into the host perf.data file:
> +
> +Note, due to the guestmount option, guest object files and debug files will be copied into the build ID cache from the guest machine, with the notable exception of VDSO.
> +If needed, VDSO can be copied manually in a fashion similar to that used by the perf-archive script.
> +
> + $ perf inject -i guest-sideband-testing-host-perf.data -o inj --guestmount ~/guestmount --guest-data=guest-sideband-testing-guest-perf.data,13376,0xfffffa6ae070cb20
> +
> +Show an excerpt from the result.  In this case the CPU and time range have been chosen to show interaction between guest and host when 'uname' is starting to run on the guest machine:
> +
> +Notes:
> +
> +       - the CPU displayed, [002] in this case, is always the host CPU
> +       - events happening in the virtual machine start with VM:13376 VCPU:003, which shows the hypervisor PID 13376 and the VCPU number
> +       - only calls and errors are displayed i.e. --itrace=ce
> +       - branches entering and exiting the virtual machine are split, and show as 2 branches to/from "0 [unknown] ([unknown])"
> +
> + $ perf script -i inj --itrace=ce -F+machine_pid,+vcpu,+addr,+pid,+tid,-period --ns --time 7919.408803365,7919.408804631 -C 2
> +       CPU 3/KVM 13376/13384 [002]  7919.408803365:      branches:  ffffffffc0f8ebe0 vmx_vcpu_enter_exit+0xc0 ([kernel.kallsyms]) => ffffffffc0f8edc0 __vmx_vcpu_run+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803365:      branches:  ffffffffc0f8edd5 __vmx_vcpu_run+0x15 ([kernel.kallsyms]) => ffffffffc0f8eca0 vmx_update_host_rsp+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803365:      branches:  ffffffffc0f8ee1b __vmx_vcpu_run+0x5b ([kernel.kallsyms]) => ffffffffc0f8ed60 vmx_vmenter+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803461:      branches:  ffffffffc0f8ed62 vmx_vmenter+0x2 ([kernel.kallsyms]) =>                0 [unknown] ([unknown])
> + VM:13376 VCPU:003            uname  3404/3404  [002]  7919.408803461:      branches:                 0 [unknown] ([unknown]) =>     7f851c9b5a5c init_cacheinfo+0x3ac (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
> + VM:13376 VCPU:003            uname  3404/3404  [002]  7919.408803567:      branches:      7f851c9b5a5a init_cacheinfo+0x3aa (/usr/lib/x86_64-linux-gnu/libc-2.31.so) =>                0 [unknown] ([unknown])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803567:      branches:                 0 [unknown] ([unknown]) => ffffffffc0f8ed80 vmx_vmexit+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803596:      branches:  ffffffffc0f6619a vmx_vcpu_run+0x26a ([kernel.kallsyms]) => ffffffffb2255c60 x86_virt_spec_ctrl+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803801:      branches:  ffffffffc0f66445 vmx_vcpu_run+0x515 ([kernel.kallsyms]) => ffffffffb2290b30 native_write_msr+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803850:      branches:  ffffffffc0f661f8 vmx_vcpu_run+0x2c8 ([kernel.kallsyms]) => ffffffffc1092300 kvm_load_host_xsave_state+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803850:      branches:  ffffffffc1092327 kvm_load_host_xsave_state+0x27 ([kernel.kallsyms]) => ffffffffc1092220 kvm_load_host_xsave_state.part.0+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803862:      branches:  ffffffffc0f662cf vmx_vcpu_run+0x39f ([kernel.kallsyms]) => ffffffffc0f63f90 vmx_recover_nmi_blocking+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803862:      branches:  ffffffffc0f662e9 vmx_vcpu_run+0x3b9 ([kernel.kallsyms]) => ffffffffc0f619a0 __vmx_complete_interrupts+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803872:      branches:  ffffffffc109cfb2 vcpu_enter_guest+0x752 ([kernel.kallsyms]) => ffffffffc0f5f570 vmx_handle_exit_irqoff+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803881:      branches:  ffffffffc109d028 vcpu_enter_guest+0x7c8 ([kernel.kallsyms]) => ffffffffb234f900 __srcu_read_lock+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803897:      branches:  ffffffffc109d06f vcpu_enter_guest+0x80f ([kernel.kallsyms]) => ffffffffc0f72e30 vmx_handle_exit+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803897:      branches:  ffffffffc0f72e3d vmx_handle_exit+0xd ([kernel.kallsyms]) => ffffffffc0f727c0 __vmx_handle_exit+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803897:      branches:  ffffffffc0f72b15 __vmx_handle_exit+0x355 ([kernel.kallsyms]) => ffffffffc0f60ae0 vmx_flush_pml_buffer+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803903:      branches:  ffffffffc0f72994 __vmx_handle_exit+0x1d4 ([kernel.kallsyms]) => ffffffffc10b7090 kvm_emulate_cpuid+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803903:      branches:  ffffffffc10b70f1 kvm_emulate_cpuid+0x61 ([kernel.kallsyms]) => ffffffffc10b6e10 kvm_cpuid+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803941:      branches:  ffffffffc10b7125 kvm_emulate_cpuid+0x95 ([kernel.kallsyms]) => ffffffffc1093110 kvm_skip_emulated_instruction+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803941:      branches:  ffffffffc109311f kvm_skip_emulated_instruction+0xf ([kernel.kallsyms]) => ffffffffc0f5e180 vmx_get_rflags+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803951:      branches:  ffffffffc109312a kvm_skip_emulated_instruction+0x1a ([kernel.kallsyms]) => ffffffffc0f5fd30 vmx_skip_emulated_instruction+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803951:      branches:  ffffffffc0f5fd79 vmx_skip_emulated_instruction+0x49 ([kernel.kallsyms]) => ffffffffc0f5fb50 skip_emulated_instruction+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803956:      branches:  ffffffffc0f5fc68 skip_emulated_instruction+0x118 ([kernel.kallsyms]) => ffffffffc0f6a940 vmx_cache_reg+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803964:      branches:  ffffffffc0f5fc11 skip_emulated_instruction+0xc1 ([kernel.kallsyms]) => ffffffffc0f5f9e0 vmx_set_interrupt_shadow+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803980:      branches:  ffffffffc109f8b1 vcpu_run+0x71 ([kernel.kallsyms]) => ffffffffc10ad2f0 kvm_cpu_has_pending_timer+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803980:      branches:  ffffffffc10ad2fb kvm_cpu_has_pending_timer+0xb ([kernel.kallsyms]) => ffffffffc10b0490 apic_has_pending_timer+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803991:      branches:  ffffffffc109f899 vcpu_run+0x59 ([kernel.kallsyms]) => ffffffffc109c860 vcpu_enter_guest+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803993:      branches:  ffffffffc109cd4c vcpu_enter_guest+0x4ec ([kernel.kallsyms]) => ffffffffc0f69140 vmx_prepare_switch_to_guest+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803996:      branches:  ffffffffc109cd7d vcpu_enter_guest+0x51d ([kernel.kallsyms]) => ffffffffb234f930 __srcu_read_unlock+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803996:      branches:  ffffffffc109cd9c vcpu_enter_guest+0x53c ([kernel.kallsyms]) => ffffffffc0f609b0 vmx_sync_pir_to_irr+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408803996:      branches:  ffffffffc0f60a6d vmx_sync_pir_to_irr+0xbd ([kernel.kallsyms]) => ffffffffc10adc20 kvm_lapic_find_highest_irr+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408804010:      branches:  ffffffffc0f60abd vmx_sync_pir_to_irr+0x10d ([kernel.kallsyms]) => ffffffffc0f60820 vmx_set_rvi+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408804019:      branches:  ffffffffc109ceca vcpu_enter_guest+0x66a ([kernel.kallsyms]) => ffffffffb2249840 fpregs_assert_state_consistent+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408804021:      branches:  ffffffffc109cf10 vcpu_enter_guest+0x6b0 ([kernel.kallsyms]) => ffffffffc0f65f30 vmx_vcpu_run+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408804024:      branches:  ffffffffc0f6603b vmx_vcpu_run+0x10b ([kernel.kallsyms]) => ffffffffb229bed0 __get_current_cr3_fast+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408804024:      branches:  ffffffffc0f66055 vmx_vcpu_run+0x125 ([kernel.kallsyms]) => ffffffffb2253050 cr4_read_shadow+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408804030:      branches:  ffffffffc0f6608d vmx_vcpu_run+0x15d ([kernel.kallsyms]) => ffffffffc10921e0 kvm_load_guest_xsave_state+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408804030:      branches:  ffffffffc1092207 kvm_load_guest_xsave_state+0x27 ([kernel.kallsyms]) => ffffffffc1092110 kvm_load_guest_xsave_state.part.0+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408804032:      branches:  ffffffffc0f660c6 vmx_vcpu_run+0x196 ([kernel.kallsyms]) => ffffffffb22061a0 perf_guest_get_msrs+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408804032:      branches:  ffffffffb22061a9 perf_guest_get_msrs+0x9 ([kernel.kallsyms]) => ffffffffb220cda0 intel_guest_get_msrs+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408804039:      branches:  ffffffffc0f66109 vmx_vcpu_run+0x1d9 ([kernel.kallsyms]) => ffffffffc0f652c0 clear_atomic_switch_msr+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408804040:      branches:  ffffffffc0f66119 vmx_vcpu_run+0x1e9 ([kernel.kallsyms]) => ffffffffc0f73f60 intel_pmu_lbr_is_enabled+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408804042:      branches:  ffffffffc0f73f81 intel_pmu_lbr_is_enabled+0x21 ([kernel.kallsyms]) => ffffffffc10b68e0 kvm_find_cpuid_entry+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408804045:      branches:  ffffffffc0f66454 vmx_vcpu_run+0x524 ([kernel.kallsyms]) => ffffffffc0f61ff0 vmx_update_hv_timer+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408804057:      branches:  ffffffffc0f66142 vmx_vcpu_run+0x212 ([kernel.kallsyms]) => ffffffffc10af100 kvm_wait_lapic_expire+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408804057:      branches:  ffffffffc0f66156 vmx_vcpu_run+0x226 ([kernel.kallsyms]) => ffffffffb2255c60 x86_virt_spec_ctrl+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408804057:      branches:  ffffffffc0f66161 vmx_vcpu_run+0x231 ([kernel.kallsyms]) => ffffffffc0f8eb20 vmx_vcpu_enter_exit+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408804057:      branches:  ffffffffc0f8eb44 vmx_vcpu_enter_exit+0x24 ([kernel.kallsyms]) => ffffffffb2353e10 rcu_note_context_switch+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408804057:      branches:  ffffffffb2353e1c rcu_note_context_switch+0xc ([kernel.kallsyms]) => ffffffffb2353db0 rcu_qs+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408804066:      branches:  ffffffffc0f8ebe0 vmx_vcpu_enter_exit+0xc0 ([kernel.kallsyms]) => ffffffffc0f8edc0 __vmx_vcpu_run+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408804066:      branches:  ffffffffc0f8edd5 __vmx_vcpu_run+0x15 ([kernel.kallsyms]) => ffffffffc0f8eca0 vmx_update_host_rsp+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408804066:      branches:  ffffffffc0f8ee1b __vmx_vcpu_run+0x5b ([kernel.kallsyms]) => ffffffffc0f8ed60 vmx_vmenter+0x0 ([kernel.kallsyms])
> +       CPU 3/KVM 13376/13384 [002]  7919.408804162:      branches:  ffffffffc0f8ed62 vmx_vmenter+0x2 ([kernel.kallsyms]) =>                0 [unknown] ([unknown])
> + VM:13376 VCPU:003            uname  3404/3404  [002]  7919.408804162:      branches:                 0 [unknown] ([unknown]) =>     7f851c9b5a5c init_cacheinfo+0x3ac (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
> + VM:13376 VCPU:003            uname  3404/3404  [002]  7919.408804273:      branches:      7f851cb7c0e4 _dl_init+0x74 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) =>     7f851cb7bf50 call_init.part.0+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so)
> + VM:13376 VCPU:003            uname  3404/3404  [002]  7919.408804526:      branches:      55e0c00136f0 _start+0x0 (/usr/bin/uname) => ffffffff83200ac0 asm_exc_page_fault+0x0 ([kernel.kallsyms])
> + VM:13376 VCPU:003            uname  3404/3404  [002]  7919.408804526:      branches:  ffffffff83200ac3 asm_exc_page_fault+0x3 ([kernel.kallsyms]) => ffffffff83201290 error_entry+0x0 ([kernel.kallsyms])
> + VM:13376 VCPU:003            uname  3404/3404  [002]  7919.408804534:      branches:  ffffffff832012fa error_entry+0x6a ([kernel.kallsyms]) => ffffffff830b59a0 sync_regs+0x0 ([kernel.kallsyms])
> + VM:13376 VCPU:003            uname  3404/3404  [002]  7919.408804631:      branches:  ffffffff83200ad9 asm_exc_page_fault+0x19 ([kernel.kallsyms]) => ffffffff830b8210 exc_page_fault+0x0 ([kernel.kallsyms])
> + VM:13376 VCPU:003            uname  3404/3404  [002]  7919.408804631:      branches:  ffffffff830b82a4 exc_page_fault+0x94 ([kernel.kallsyms]) => ffffffff830b80e0 __kvm_handle_async_pf+0x0 ([kernel.kallsyms])
> + VM:13376 VCPU:003            uname  3404/3404  [002]  7919.408804631:      branches:  ffffffff830b80ed __kvm_handle_async_pf+0xd ([kernel.kallsyms]) => ffffffff830b80c0 kvm_read_and_reset_apf_flags+0x0 ([kernel.kallsyms])
> +
> +
>  Tracing Virtual Machines - Guest Code
>  -------------------------------------
>
> --
> 2.25.1
>

* Re: [PATCH 27/35] perf tools: Add perf_event__is_guest()
  2022-07-20  1:11   ` Ian Rogers
@ 2022-07-20 14:06     ` Arnaldo Carvalho de Melo
  2022-07-20 14:56       ` Ian Rogers
  0 siblings, 1 reply; 83+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-07-20 14:06 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Adrian Hunter, Jiri Olsa, Namhyung Kim, Andi Kleen, linux-kernel, kvm

Em Tue, Jul 19, 2022 at 06:11:47PM -0700, Ian Rogers escreveu:
> On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
> >
> > Add a helper function to determine if an event is a guest event.
> >
> > Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> > ---
> >  tools/perf/util/event.h | 21 +++++++++++++++++++++
> >  1 file changed, 21 insertions(+)
> >
> > diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> > index a660f304f83c..a7b0931d5137 100644
> > --- a/tools/perf/util/event.h
> > +++ b/tools/perf/util/event.h
> 
> Would this be better under tools/lib/perf ?

In general I think we should move things to libperf when a user requests
it, i.e. it'll be needed in a tool that uses libperf.

- Arnaldo
 
> Thanks,
> Ian
> 
> > @@ -484,4 +484,25 @@ void arch_perf_synthesize_sample_weight(const struct perf_sample *data, __u64 *a
> >  const char *arch_perf_header_entry(const char *se_header);
> >  int arch_support_sort_key(const char *sort_key);
> >
> > +static inline bool perf_event_header__cpumode_is_guest(u8 cpumode)
> > +{
> > +       return cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
> > +              cpumode == PERF_RECORD_MISC_GUEST_USER;
> > +}
> > +
> > +static inline bool perf_event_header__misc_is_guest(u16 misc)
> > +{
> > +       return perf_event_header__cpumode_is_guest(misc & PERF_RECORD_MISC_CPUMODE_MASK);
> > +}
> > +
> > +static inline bool perf_event_header__is_guest(const struct perf_event_header *header)
> > +{
> > +       return perf_event_header__misc_is_guest(header->misc);
> > +}
> > +
> > +static inline bool perf_event__is_guest(const union perf_event *event)
> > +{
> > +       return perf_event_header__is_guest(&event->header);
> > +}
> > +
> >  #endif /* __PERF_RECORD_H */
> > --
> > 2.25.1
> >

-- 

- Arnaldo

* Re: [PATCH 27/35] perf tools: Add perf_event__is_guest()
  2022-07-20 14:06     ` Arnaldo Carvalho de Melo
@ 2022-07-20 14:56       ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-07-20 14:56 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Adrian Hunter, Jiri Olsa, Namhyung Kim, Andi Kleen, linux-kernel, kvm

On Wed, Jul 20, 2022 at 7:06 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Tue, Jul 19, 2022 at 06:11:47PM -0700, Ian Rogers escreveu:
> > On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
> > >
> > > Add a helper function to determine if an event is a guest event.
> > >
> > > Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> > > ---
> > >  tools/perf/util/event.h | 21 +++++++++++++++++++++
> > >  1 file changed, 21 insertions(+)
> > >
> > > diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> > > index a660f304f83c..a7b0931d5137 100644
> > > --- a/tools/perf/util/event.h
> > > +++ b/tools/perf/util/event.h
> >
> > Would this be better under tools/lib/perf ?
>
> In general I think we should move things to libperf when a user requests
> it, i.e. it'll be needed in a tool that uses libperf.

The perf_event_header is defined in libperf. If we're worried about
exposing the API, we could keep it in the internal include files. To
explain my thinking, if something like cpumap or perf_event_header
live in libperf, then it makes sense to me that the structs, accessors
and the like also live there. Having the code standing in both perf
and libperf is a transitory state we should be working to remove.

I don't see this as a big deal, so don't mind the code not being in libperf :-)
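
For reference, a minimal standalone sketch of the check the quoted helpers
perform. The macro values are local copies assumed to match
include/uapi/linux/perf_event.h, and struct hdr_sketch and both function
names are hypothetical stand-ins rather than perf or libperf API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies, assumed to match include/uapi/linux/perf_event.h */
#define MISC_CPUMODE_MASK	(7 << 0)
#define MISC_GUEST_KERNEL	(4 << 0)
#define MISC_GUEST_USER		(5 << 0)

/* Hypothetical stand-in with the same layout as struct perf_event_header */
struct hdr_sketch {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/* True if the cpumode value denotes guest kernel or guest user */
static inline bool cpumode_is_guest(uint8_t cpumode)
{
	return cpumode == MISC_GUEST_KERNEL || cpumode == MISC_GUEST_USER;
}

/* Mask off the non-cpumode misc bits before comparing */
static inline bool hdr_is_guest(const struct hdr_sketch *h)
{
	return cpumode_is_guest(h->misc & MISC_CPUMODE_MASK);
}
```

Only the low 3 bits of misc carry the cpumode, so other misc flags set on
the same event do not affect the result.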

Thanks,
Ian

> - Arnaldo
>
> > Thanks,
> > Ian
> >
> > > @@ -484,4 +484,25 @@ void arch_perf_synthesize_sample_weight(const struct perf_sample *data, __u64 *a
> > >  const char *arch_perf_header_entry(const char *se_header);
> > >  int arch_support_sort_key(const char *sort_key);
> > >
> > > +static inline bool perf_event_header__cpumode_is_guest(u8 cpumode)
> > > +{
> > > +       return cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
> > > +              cpumode == PERF_RECORD_MISC_GUEST_USER;
> > > +}
> > > +
> > > +static inline bool perf_event_header__misc_is_guest(u16 misc)
> > > +{
> > > +       return perf_event_header__cpumode_is_guest(misc & PERF_RECORD_MISC_CPUMODE_MASK);
> > > +}
> > > +
> > > +static inline bool perf_event_header__is_guest(const struct perf_event_header *header)
> > > +{
> > > +       return perf_event_header__misc_is_guest(header->misc);
> > > +}
> > > +
> > > +static inline bool perf_event__is_guest(const union perf_event *event)
> > > +{
> > > +       return perf_event_header__is_guest(&event->header);
> > > +}
> > > +
> > >  #endif /* __PERF_RECORD_H */
> > > --
> > > 2.25.1
> > >
>
> --
>
> - Arnaldo

* Re: [PATCH 04/35] perf tools: Export perf_event__process_finished_round()
  2022-07-19 17:04   ` Ian Rogers
@ 2022-08-09 11:37     ` Adrian Hunter
  0 siblings, 0 replies; 83+ messages in thread
From: Adrian Hunter @ 2022-08-09 11:37 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On 19/07/22 20:04, Ian Rogers wrote:
> On Mon, Jul 11, 2022 at 2:32 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>>
>> Export perf_event__process_finished_round() so it can be used elsewhere.
>>
>> This is needed in perf inject to obey finished-round ordering.
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>  tools/perf/util/session.c | 12 ++++--------
>>  tools/perf/util/session.h |  4 ++++
>>  2 files changed, 8 insertions(+), 8 deletions(-)
>>
>> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
>> index 37f833c3c81b..4c9513bc6d89 100644
>> --- a/tools/perf/util/session.c
>> +++ b/tools/perf/util/session.c
>> @@ -374,10 +374,6 @@ static int process_finished_round_stub(struct perf_tool *tool __maybe_unused,
>>         return 0;
>>  }
>>
>> -static int process_finished_round(struct perf_tool *tool,
>> -                                 union perf_event *event,
>> -                                 struct ordered_events *oe);
>> -
>>  static int skipn(int fd, off_t n)
>>  {
>>         char buf[4096];
>> @@ -534,7 +530,7 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
>>                 tool->build_id = process_event_op2_stub;
>>         if (tool->finished_round == NULL) {
>>                 if (tool->ordered_events)
>> -                       tool->finished_round = process_finished_round;
>> +                       tool->finished_round = perf_event__process_finished_round;
>>                 else
>>                         tool->finished_round = process_finished_round_stub;
>>         }
>> @@ -1069,9 +1065,9 @@ static perf_event__swap_op perf_event__swap_ops[] = {
>>   *      Flush every events below timestamp 7
>>   *      etc...
>>   */
>> -static int process_finished_round(struct perf_tool *tool __maybe_unused,
>> -                                 union perf_event *event __maybe_unused,
>> -                                 struct ordered_events *oe)
>> +int perf_event__process_finished_round(struct perf_tool *tool __maybe_unused,
>> +                                      union perf_event *event __maybe_unused,
>> +                                      struct ordered_events *oe)
>>  {
>>         if (dump_trace)
>>                 fprintf(stdout, "\n");
>> diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
>> index 34500a3da735..be5871ea558f 100644
>> --- a/tools/perf/util/session.h
>> +++ b/tools/perf/util/session.h
>> @@ -155,4 +155,8 @@ int perf_session__deliver_synth_event(struct perf_session *session,
>>  int perf_event__process_id_index(struct perf_session *session,
>>                                  union perf_event *event);
>>
>> +int perf_event__process_finished_round(struct perf_tool *tool,
>> +                                      union perf_event *event,
>> +                                      struct ordered_events *oe);
>> +
> 
> Sorry to be naive, why is this  perf_event__ and not perf_session__ ..

No idea, but it is fairly consistent for tool callback functions.

> well I guess it is at least passed an event even though it doesn't use
> it. Would be nice if there were comments, but this change is just
> shifting things around. Anyway..
> 
> Acked-by: Ian Rogers <irogers@google.com>
> 
> Thanks,
> Ian
> 
>>  #endif /* __PERF_SESSION_H */
>> --
>> 2.25.1
>>


* Re: [PATCH 05/35] perf tools: Factor out evsel__id_hdr_size()
  2022-07-19 17:09   ` Ian Rogers
@ 2022-08-09 11:49     ` Adrian Hunter
  2022-08-09 17:07       ` Ian Rogers
  0 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2022-08-09 11:49 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On 19/07/22 20:09, Ian Rogers wrote:
> On Mon, Jul 11, 2022 at 2:32 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>>
>> Factor out evsel__id_hdr_size() so it can be reused.
>>
>> This is needed by perf inject. When injecting events from a guest perf.data
>> file, there is a possibility that the sample ID numbers conflict. To
>> re-write an ID sample, the old one needs to be removed first, which means
>> determining how big it is with evsel__id_hdr_size() and then subtracting
>> that from the event size.
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>  tools/perf/util/evlist.c | 28 +---------------------------
>>  tools/perf/util/evsel.c  | 26 ++++++++++++++++++++++++++
>>  tools/perf/util/evsel.h  |  2 ++
>>  3 files changed, 29 insertions(+), 27 deletions(-)
>>
>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
>> index 48af7d379d82..03fbe151b0c4 100644
>> --- a/tools/perf/util/evlist.c
>> +++ b/tools/perf/util/evlist.c
>> @@ -1244,34 +1244,8 @@ bool evlist__valid_read_format(struct evlist *evlist)
>>  u16 evlist__id_hdr_size(struct evlist *evlist)
>>  {
>>         struct evsel *first = evlist__first(evlist);
>> -       struct perf_sample *data;
>> -       u64 sample_type;
>> -       u16 size = 0;
>>
>> -       if (!first->core.attr.sample_id_all)
>> -               goto out;
>> -
>> -       sample_type = first->core.attr.sample_type;
>> -
>> -       if (sample_type & PERF_SAMPLE_TID)
>> -               size += sizeof(data->tid) * 2;
>> -
>> -       if (sample_type & PERF_SAMPLE_TIME)
>> -               size += sizeof(data->time);
>> -
>> -       if (sample_type & PERF_SAMPLE_ID)
>> -               size += sizeof(data->id);
>> -
>> -       if (sample_type & PERF_SAMPLE_STREAM_ID)
>> -               size += sizeof(data->stream_id);
>> -
>> -       if (sample_type & PERF_SAMPLE_CPU)
>> -               size += sizeof(data->cpu) * 2;
>> -
>> -       if (sample_type & PERF_SAMPLE_IDENTIFIER)
>> -               size += sizeof(data->id);
>> -out:
>> -       return size;
>> +       return first->core.attr.sample_id_all ? evsel__id_hdr_size(first) : 0;
>>  }
>>
>>  bool evlist__valid_sample_id_all(struct evlist *evlist)
>> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
>> index a67cc3f2fa74..9a30ccb7b104 100644
>> --- a/tools/perf/util/evsel.c
>> +++ b/tools/perf/util/evsel.c
>> @@ -2724,6 +2724,32 @@ int evsel__parse_sample_timestamp(struct evsel *evsel, union perf_event *event,
>>         return 0;
>>  }
>>
>> +u16 evsel__id_hdr_size(struct evsel *evsel)
>> +{
>> +       u64 sample_type = evsel->core.attr.sample_type;
> 
> As this just uses core, would it be more appropriate to put it in libperf?

AFAIK we move to libperf only as needed.

> 
>> +       u16 size = 0;
> 
> Perhaps size_t or int? u16 seems odd.

Event header size member is 16-bit
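
To illustrate, a minimal standalone sketch of the size calculation and of
the subtraction described in the commit message. The S_* bit values are
local copies assumed to match the PERF_SAMPLE_* bits in
include/uapi/linux/perf_event.h. id_hdr_size() and size_without_id() are
hypothetical names, and the too-small check is an assumption added for the
sketch, not something taken from the patch.

```c
#include <stdint.h>

/* Local copies, assumed to match PERF_SAMPLE_* in the UAPI header */
#define S_TID		(1U << 1)
#define S_TIME		(1U << 2)
#define S_ID		(1U << 6)
#define S_CPU		(1U << 7)
#define S_STREAM_ID	(1U << 9)
#define S_IDENTIFIER	(1U << 16)

/* Each selected field occupies one u64, so the result is at most 48 */
static uint16_t id_hdr_size(uint64_t sample_type)
{
	uint16_t size = 0;

	if (sample_type & S_TID)
		size += sizeof(uint64_t);
	if (sample_type & S_TIME)
		size += sizeof(uint64_t);
	if (sample_type & S_ID)
		size += sizeof(uint64_t);
	if (sample_type & S_STREAM_ID)
		size += sizeof(uint64_t);
	if (sample_type & S_CPU)
		size += sizeof(uint64_t);
	if (sample_type & S_IDENTIFIER)
		size += sizeof(uint64_t);

	return size;
}

/* Event size once the trailing ID sample is removed; 0 if it cannot fit */
static uint16_t size_without_id(uint16_t event_size, uint64_t sample_type)
{
	uint16_t id_size = id_hdr_size(sample_type);

	return event_size >= id_size ? (uint16_t)(event_size - id_size) : 0;
}
```

Both values are then directly comparable with the 16-bit size member of
the event header, with no narrowing needed at the call site.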

> 
>> +
>> +       if (sample_type & PERF_SAMPLE_TID)
>> +               size += sizeof(u64);
>> +
>> +       if (sample_type & PERF_SAMPLE_TIME)
>> +               size += sizeof(u64);
>> +
>> +       if (sample_type & PERF_SAMPLE_ID)
>> +               size += sizeof(u64);
>> +
>> +       if (sample_type & PERF_SAMPLE_STREAM_ID)
>> +               size += sizeof(u64);
>> +
>> +       if (sample_type & PERF_SAMPLE_CPU)
>> +               size += sizeof(u64);
>> +
>> +       if (sample_type & PERF_SAMPLE_IDENTIFIER)
>> +               size += sizeof(u64);
>> +
>> +       return size;
>> +}
>> +
>>  struct tep_format_field *evsel__field(struct evsel *evsel, const char *name)
>>  {
>>         return tep_find_field(evsel->tp_format, name);
>> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
>> index 92bed8e2f7d8..699448f2bc2b 100644
>> --- a/tools/perf/util/evsel.h
>> +++ b/tools/perf/util/evsel.h
>> @@ -381,6 +381,8 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
>>  int evsel__parse_sample_timestamp(struct evsel *evsel, union perf_event *event,
>>                                   u64 *timestamp);
>>
>> +u16 evsel__id_hdr_size(struct evsel *evsel);
>> +
> 
> A comment would be nice, I know this is just moving code about but
> this is a new function.
> 
> Thanks,
> Ian
> 
>>  static inline struct evsel *evsel__next(struct evsel *evsel)
>>  {
>>         return list_entry(evsel->core.node.next, struct evsel, core.node);
>> --
>> 2.25.1
>>



* Re: [PATCH 10/35] perf tools: Add machine_pid and vcpu to id_index
  2022-07-19 17:48   ` Ian Rogers
@ 2022-08-09 12:19     ` Adrian Hunter
  0 siblings, 0 replies; 83+ messages in thread
From: Adrian Hunter @ 2022-08-09 12:19 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On 19/07/22 20:48, Ian Rogers wrote:
> On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>>
>> When injecting events from a guest perf.data file, the events will have
>> separate sample ID numbers. These ID numbers can then be used to determine
>> which machine an event belongs to. To facilitate that, add machine_pid and
>> vcpu to id_index records. For backward compatibility, these are added at
>> the end of the record, and the length of the record is used to determine
>> if they are present or not.
>>
>> Note, this is needed because the events from a guest perf.data file contain
>> the pid/tid of the process running at that time inside the VM not the
>> pid/tid of the (QEMU) hypervisor thread. So a way is needed to relate
>> guest events back to the guest machine and VCPU, and using sample ID
>> numbers for that is relatively simple and convenient.
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>  tools/lib/perf/include/internal/evsel.h |  4 ++
>>  tools/lib/perf/include/perf/event.h     |  5 +++
>>  tools/perf/util/session.c               | 40 ++++++++++++++++---
>>  tools/perf/util/synthetic-events.c      | 51 +++++++++++++++++++------
>>  tools/perf/util/synthetic-events.h      |  1 +
>>  5 files changed, 84 insertions(+), 17 deletions(-)
>>
>> diff --git a/tools/lib/perf/include/internal/evsel.h b/tools/lib/perf/include/internal/evsel.h
>> index 2a912a1f1989..a99a75d9e78f 100644
>> --- a/tools/lib/perf/include/internal/evsel.h
>> +++ b/tools/lib/perf/include/internal/evsel.h
>> @@ -30,6 +30,10 @@ struct perf_sample_id {
>>         struct perf_cpu          cpu;
>>         pid_t                    tid;
>>
>> +       /* Guest machine pid and VCPU, valid only if machine_pid is non-zero */
>> +       pid_t                    machine_pid;
>> +       struct perf_cpu          vcpu;
>> +
>>         /* Holds total ID period value for PERF_SAMPLE_READ processing. */
>>         u64                      period;
>>  };
>> diff --git a/tools/lib/perf/include/perf/event.h b/tools/lib/perf/include/perf/event.h
>> index 9f7ca070da87..c2dbd3e88885 100644
>> --- a/tools/lib/perf/include/perf/event.h
>> +++ b/tools/lib/perf/include/perf/event.h
>> @@ -237,6 +237,11 @@ struct id_index_entry {
>>         __u64                    tid;
>>  };
>>
>> +struct id_index_entry_2 {
>> +       __u64                    machine_pid;
>> +       __u64                    vcpu;
>> +};
>> +
>>  struct perf_record_id_index {
>>         struct perf_event_header header;
>>         __u64                    nr;
>> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
>> index 4c9513bc6d89..5141fe164e97 100644
>> --- a/tools/perf/util/session.c
>> +++ b/tools/perf/util/session.c
>> @@ -2756,18 +2756,35 @@ int perf_event__process_id_index(struct perf_session *session,
>>  {
>>         struct evlist *evlist = session->evlist;
>>         struct perf_record_id_index *ie = &event->id_index;
>> +       size_t sz = ie->header.size - sizeof(*ie);
>>         size_t i, nr, max_nr;
>> +       size_t e1_sz = sizeof(struct id_index_entry);
>> +       size_t e2_sz = sizeof(struct id_index_entry_2);
>> +       size_t etot_sz = e1_sz + e2_sz;
>> +       struct id_index_entry_2 *e2;
>>
>> -       max_nr = (ie->header.size - sizeof(struct perf_record_id_index)) /
>> -                sizeof(struct id_index_entry);
>> +       max_nr = sz / e1_sz;
>>         nr = ie->nr;
>> -       if (nr > max_nr)
>> +       if (nr > max_nr) {
>> +               printf("Too big: nr %zu max_nr %zu\n", nr, max_nr);
>>                 return -EINVAL;
>> +       }
>> +
>> +       if (sz >= nr * etot_sz) {
>> +               max_nr = sz / etot_sz;
>> +               if (nr > max_nr) {
>> +                       printf("Too big2: nr %zu max_nr %zu\n", nr, max_nr);
>> +                       return -EINVAL;
>> +               }
>> +               e2 = (void *)ie + sizeof(*ie) + nr * e1_sz;
>> +       } else {
>> +               e2 = NULL;
>> +       }
>>
>>         if (dump_trace)
>>                 fprintf(stdout, " nr: %zu\n", nr);
>>
>> -       for (i = 0; i < nr; i++) {
>> +       for (i = 0; i < nr; i++, (e2 ? e2++ : 0)) {
>>                 struct id_index_entry *e = &ie->entries[i];
>>                 struct perf_sample_id *sid;
>>
>> @@ -2775,15 +2792,28 @@ int perf_event__process_id_index(struct perf_session *session,
>>                         fprintf(stdout, " ... id: %"PRI_lu64, e->id);
>>                         fprintf(stdout, "  idx: %"PRI_lu64, e->idx);
>>                         fprintf(stdout, "  cpu: %"PRI_ld64, e->cpu);
>> -                       fprintf(stdout, "  tid: %"PRI_ld64"\n", e->tid);
>> +                       fprintf(stdout, "  tid: %"PRI_ld64, e->tid);
>> +                       if (e2) {
>> +                               fprintf(stdout, "  machine_pid: %"PRI_ld64, e2->machine_pid);
>> +                               fprintf(stdout, "  vcpu: %"PRI_lu64"\n", e2->vcpu);
>> +                       } else {
>> +                               fprintf(stdout, "\n");
>> +                       }
>>                 }
>>
>>                 sid = evlist__id2sid(evlist, e->id);
>>                 if (!sid)
>>                         return -ENOENT;
>> +
>>                 sid->idx = e->idx;
>>                 sid->cpu.cpu = e->cpu;
>>                 sid->tid = e->tid;
>> +
>> +               if (!e2)
>> +                       continue;
>> +
>> +               sid->machine_pid = e2->machine_pid;
>> +               sid->vcpu.cpu = e2->vcpu;
>>         }
>>         return 0;
>>  }
>> diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
>> index ed9623702f34..2ae59c03ae77 100644
>> --- a/tools/perf/util/synthetic-events.c
>> +++ b/tools/perf/util/synthetic-events.c
>> @@ -1759,19 +1759,26 @@ int perf_event__synthesize_id_sample(__u64 *array, u64 type, const struct perf_s
>>         return (void *)array - (void *)start;
>>  }
>>
>> -int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_t process,
>> -                                   struct evlist *evlist, struct machine *machine)
>> +int __perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_t process,
>> +                                     struct evlist *evlist, struct machine *machine, size_t from)
>>  {
>>         union perf_event *ev;
>>         struct evsel *evsel;
>> -       size_t nr = 0, i = 0, sz, max_nr, n;
>> +       size_t nr = 0, i = 0, sz, max_nr, n, pos;
>> +       size_t e1_sz = sizeof(struct id_index_entry);
>> +       size_t e2_sz = sizeof(struct id_index_entry_2);
>> +       size_t etot_sz = e1_sz + e2_sz;
>> +       bool e2_needed = false;
>>         int err;
>>
>> -       max_nr = (UINT16_MAX - sizeof(struct perf_record_id_index)) /
>> -                sizeof(struct id_index_entry);
>> +       max_nr = (UINT16_MAX - sizeof(struct perf_record_id_index)) / etot_sz;
>>
>> -       evlist__for_each_entry(evlist, evsel)
>> +       pos = 0;
>> +       evlist__for_each_entry(evlist, evsel) {
>> +               if (pos++ < from)
>> +                       continue;
>>                 nr += evsel->core.ids;
>> +       }
>>
>>         if (!nr)
>>                 return 0;
>> @@ -1779,31 +1786,38 @@ int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_
>>         pr_debug2("Synthesizing id index\n");
>>
>>         n = nr > max_nr ? max_nr : nr;
>> -       sz = sizeof(struct perf_record_id_index) + n * sizeof(struct id_index_entry);
>> +       sz = sizeof(struct perf_record_id_index) + n * etot_sz;
>>         ev = zalloc(sz);
>>         if (!ev)
>>                 return -ENOMEM;
>>
>> +       sz = sizeof(struct perf_record_id_index) + n * e1_sz;
>> +
>>         ev->id_index.header.type = PERF_RECORD_ID_INDEX;
>> -       ev->id_index.header.size = sz;
>>         ev->id_index.nr = n;
>>
>> +       pos = 0;
>>         evlist__for_each_entry(evlist, evsel) {
>>                 u32 j;
>>
>> -               for (j = 0; j < evsel->core.ids; j++) {
>> +               if (pos++ < from)
>> +                       continue;
>> +               for (j = 0; j < evsel->core.ids; j++, i++) {
>>                         struct id_index_entry *e;
>> +                       struct id_index_entry_2 *e2;
>>                         struct perf_sample_id *sid;
>>
>>                         if (i >= n) {
>> +                               ev->id_index.header.size = sz + (e2_needed ? n * e2_sz : 0);
>>                                 err = process(tool, ev, NULL, machine);
>>                                 if (err)
>>                                         goto out_err;
>>                                 nr -= n;
>>                                 i = 0;
>> +                               e2_needed = false;
>>                         }
>>
>> -                       e = &ev->id_index.entries[i++];
>> +                       e = &ev->id_index.entries[i];
>>
>>                         e->id = evsel->core.id[j];
>>
>> @@ -1816,11 +1830,18 @@ int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_
>>                         e->idx = sid->idx;
>>                         e->cpu = sid->cpu.cpu;
>>                         e->tid = sid->tid;
>> +
>> +                       if (sid->machine_pid)
>> +                               e2_needed = true;
>> +
>> +                       e2 = (void *)ev + sz;
>> +                       e2[i].machine_pid = sid->machine_pid;
>> +                       e2[i].vcpu        = sid->vcpu.cpu;
>>                 }
>>         }
>>
>> -       sz = sizeof(struct perf_record_id_index) + nr * sizeof(struct id_index_entry);
>> -       ev->id_index.header.size = sz;
>> +       sz = sizeof(struct perf_record_id_index) + nr * e1_sz;
>> +       ev->id_index.header.size = sz + (e2_needed ? nr * e2_sz : 0);
>>         ev->id_index.nr = nr;
>>
>>         err = process(tool, ev, NULL, machine);
>> @@ -1830,6 +1851,12 @@ int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_
>>         return err;
>>  }
>>
>> +int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_t process,
>> +                                   struct evlist *evlist, struct machine *machine)
>> +{
>> +       return __perf_event__synthesize_id_index(tool, process, evlist, machine, 0);
>> +}
>> +
>>  int __machine__synthesize_threads(struct machine *machine, struct perf_tool *tool,
>>                                   struct target *target, struct perf_thread_map *threads,
>>                                   perf_event__handler_t process, bool needs_mmap,
>> diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
>> index b136ec3ec95d..81cb3d6af0b9 100644
>> --- a/tools/perf/util/synthetic-events.h
>> +++ b/tools/perf/util/synthetic-events.h
>> @@ -55,6 +55,7 @@ int perf_event__synthesize_extra_attr(struct perf_tool *tool, struct evlist *evs
>>  int perf_event__synthesize_extra_kmaps(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
>>  int perf_event__synthesize_features(struct perf_tool *tool, struct perf_session *session, struct evlist *evlist, perf_event__handler_t process);
>>  int perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_t process, struct evlist *evlist, struct machine *machine);
>> +int __perf_event__synthesize_id_index(struct perf_tool *tool, perf_event__handler_t process, struct evlist *evlist, struct machine *machine, size_t from);
> 
> Given there is only 1 use in the file defining the function, should
> this just be static with no header file declaration?

It is also used by perf inject.

> 
> Thanks,
> Ian
> 
>>  int perf_event__synthesize_id_sample(__u64 *array, u64 type, const struct perf_sample *sample);
>>  int perf_event__synthesize_kernel_mmap(struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
>>  int perf_event__synthesize_mmap_events(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine, bool mmap_data);
>> --
>> 2.25.1
>>
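
For readers following the record layout discussed in this patch, here is a
minimal sketch of how a reader can tell from the record length alone whether
the trailing machine_pid/vcpu array is present. It is not the patch's code:
the struct and function names (entry1, entry2, id_index_has_entry2) are
hypothetical, and the four-u64 layout of the first entry is an assumption
based on the fields printed in the dump code above.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for id_index_entry / id_index_entry_2 (assumed layout) */
struct entry1 { uint64_t id, idx, cpu, tid; };
struct entry2 { uint64_t machine_pid, vcpu; };

/*
 * payload_sz is the record size minus the fixed part of the record.
 * Returns 1 if the trailing entry2 array is present, 0 if it is absent
 * (old-format record), and -1 if nr cannot fit in the payload at all.
 */
static int id_index_has_entry2(size_t payload_sz, size_t nr)
{
	size_t e1_sz = sizeof(struct entry1);
	size_t etot_sz = e1_sz + sizeof(struct entry2);

	/* Every record must hold at least nr first-format entries */
	if (nr > payload_sz / e1_sz)
		return -1;

	/* New-format records are long enough for both arrays */
	return payload_sz >= nr * etot_sz ? 1 : 0;
}
```

With 4 entries, an old-format payload is 4 * 32 = 128 bytes, which is less
than 4 * 48 = 192, so the extra array is treated as absent.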



* Re: [PATCH 08/35] perf buildid-cache: Add guestmount'd files to the build ID cache
  2022-07-19 17:41   ` Ian Rogers
@ 2022-08-09 12:21     ` Adrian Hunter
  0 siblings, 0 replies; 83+ messages in thread
From: Adrian Hunter @ 2022-08-09 12:21 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On 19/07/22 20:41, Ian Rogers wrote:
> On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>>
>> When the guestmount option is used, a guest machine's file system mount
>> point is recorded in machine->root_dir.
>>
>> perf already iterates guest machines when adding files to the build ID
>> cache, but does not take machine->root_dir into account.
>>
>> Use machine->root_dir to find files for guest build IDs, and add them to
>> the build ID cache using the "proper" name i.e. relative to the guest root
>> directory not the host root directory.
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> 
> Is it plausible to add a test for this? Our tests create workload but
> there's no existing hypervisor way to do this. Perhaps the test can
> run in a hypervisor? Or maybe there's a route that doesn't involve
> hypervisors.

Too complicated, I think.

> 
> Acked-by: Ian Rogers <irogers@google.com>
> 
> Thanks,
> Ian
> 
>> ---
>>  tools/perf/util/build-id.c | 67 +++++++++++++++++++++++++++++---------
>>  tools/perf/util/build-id.h | 16 ++++++---
>>  2 files changed, 63 insertions(+), 20 deletions(-)
>>
>> diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
>> index 4c9093b64d1f..7c9f441936ee 100644
>> --- a/tools/perf/util/build-id.c
>> +++ b/tools/perf/util/build-id.c
>> @@ -625,9 +625,12 @@ static int build_id_cache__add_sdt_cache(const char *sbuild_id,
>>  #endif
>>
>>  static char *build_id_cache__find_debug(const char *sbuild_id,
>> -                                       struct nsinfo *nsi)
>> +                                       struct nsinfo *nsi,
>> +                                       const char *root_dir)
>>  {
>> +       const char *dirname = "/usr/lib/debug/.build-id/";
>>         char *realname = NULL;
>> +       char dirbuf[PATH_MAX];
>>         char *debugfile;
>>         struct nscookie nsc;
>>         size_t len = 0;
>> @@ -636,8 +639,12 @@ static char *build_id_cache__find_debug(const char *sbuild_id,
>>         if (!debugfile)
>>                 goto out;
>>
>> -       len = __symbol__join_symfs(debugfile, PATH_MAX,
>> -                                  "/usr/lib/debug/.build-id/");
>> +       if (root_dir) {
>> +               path__join(dirbuf, PATH_MAX, root_dir, dirname);
>> +               dirname = dirbuf;
>> +       }
>> +
>> +       len = __symbol__join_symfs(debugfile, PATH_MAX, dirname);
>>         snprintf(debugfile + len, PATH_MAX - len, "%.2s/%s.debug", sbuild_id,
>>                  sbuild_id + 2);
>>
>> @@ -668,14 +675,18 @@ static char *build_id_cache__find_debug(const char *sbuild_id,
>>
>>  int
>>  build_id_cache__add(const char *sbuild_id, const char *name, const char *realname,
>> -                   struct nsinfo *nsi, bool is_kallsyms, bool is_vdso)
>> +                   struct nsinfo *nsi, bool is_kallsyms, bool is_vdso,
>> +                   const char *proper_name, const char *root_dir)
>>  {
>>         const size_t size = PATH_MAX;
>>         char *filename = NULL, *dir_name = NULL, *linkname = zalloc(size), *tmp;
>>         char *debugfile = NULL;
>>         int err = -1;
>>
>> -       dir_name = build_id_cache__cachedir(sbuild_id, name, nsi, is_kallsyms,
>> +       if (!proper_name)
>> +               proper_name = name;
>> +
>> +       dir_name = build_id_cache__cachedir(sbuild_id, proper_name, nsi, is_kallsyms,
>>                                             is_vdso);
>>         if (!dir_name)
>>                 goto out_free;
>> @@ -715,7 +726,7 @@ build_id_cache__add(const char *sbuild_id, const char *name, const char *realnam
>>          */
>>         if (!is_kallsyms && !is_vdso &&
>>             strncmp(".ko", name + strlen(name) - 3, 3)) {
>> -               debugfile = build_id_cache__find_debug(sbuild_id, nsi);
>> +               debugfile = build_id_cache__find_debug(sbuild_id, nsi, root_dir);
>>                 if (debugfile) {
>>                         zfree(&filename);
>>                         if (asprintf(&filename, "%s/%s", dir_name,
>> @@ -781,8 +792,9 @@ build_id_cache__add(const char *sbuild_id, const char *name, const char *realnam
>>         return err;
>>  }
>>
>> -int build_id_cache__add_s(const char *sbuild_id, const char *name,
>> -                         struct nsinfo *nsi, bool is_kallsyms, bool is_vdso)
>> +int __build_id_cache__add_s(const char *sbuild_id, const char *name,
>> +                           struct nsinfo *nsi, bool is_kallsyms, bool is_vdso,
>> +                           const char *proper_name, const char *root_dir)
>>  {
>>         char *realname = NULL;
>>         int err = -1;
>> @@ -796,8 +808,8 @@ int build_id_cache__add_s(const char *sbuild_id, const char *name,
>>                         goto out_free;
>>         }
>>
>> -       err = build_id_cache__add(sbuild_id, name, realname, nsi, is_kallsyms, is_vdso);
>> -
>> +       err = build_id_cache__add(sbuild_id, name, realname, nsi,
>> +                                 is_kallsyms, is_vdso, proper_name, root_dir);
>>  out_free:
>>         if (!is_kallsyms)
>>                 free(realname);
>> @@ -806,14 +818,16 @@ int build_id_cache__add_s(const char *sbuild_id, const char *name,
>>
>>  static int build_id_cache__add_b(const struct build_id *bid,
>>                                  const char *name, struct nsinfo *nsi,
>> -                                bool is_kallsyms, bool is_vdso)
>> +                                bool is_kallsyms, bool is_vdso,
>> +                                const char *proper_name,
>> +                                const char *root_dir)
>>  {
>>         char sbuild_id[SBUILD_ID_SIZE];
>>
>>         build_id__sprintf(bid, sbuild_id);
>>
>> -       return build_id_cache__add_s(sbuild_id, name, nsi, is_kallsyms,
>> -                                    is_vdso);
>> +       return __build_id_cache__add_s(sbuild_id, name, nsi, is_kallsyms,
>> +                                      is_vdso, proper_name, root_dir);
>>  }
>>
>>  bool build_id_cache__cached(const char *sbuild_id)
>> @@ -896,6 +910,10 @@ static int dso__cache_build_id(struct dso *dso, struct machine *machine,
>>         bool is_kallsyms = dso__is_kallsyms(dso);
>>         bool is_vdso = dso__is_vdso(dso);
>>         const char *name = dso->long_name;
>> +       const char *proper_name = NULL;
>> +       const char *root_dir = NULL;
>> +       char *allocated_name = NULL;
>> +       int ret = 0;
>>
>>         if (!dso->has_build_id)
>>                 return 0;
>> @@ -905,11 +923,28 @@ static int dso__cache_build_id(struct dso *dso, struct machine *machine,
>>                 name = machine->mmap_name;
>>         }
>>
>> +       if (!machine__is_host(machine)) {
>> +               if (*machine->root_dir) {
>> +                       root_dir = machine->root_dir;
>> +                       ret = asprintf(&allocated_name, "%s/%s", root_dir, name);
>> +                       if (ret < 0)
>> +                               return ret;
>> +                       proper_name = name;
>> +                       name = allocated_name;
>> +               } else if (is_kallsyms) {
>> +                       /* Cannot get guest kallsyms */
>> +                       return 0;
>> +               }
>> +       }
>> +
>>         if (!is_kallsyms && dso__build_id_mismatch(dso, name))
>> -               return 0;
>> +               goto out_free;
>>
>> -       return build_id_cache__add_b(&dso->bid, name, dso->nsinfo,
>> -                                    is_kallsyms, is_vdso);
>> +       ret = build_id_cache__add_b(&dso->bid, name, dso->nsinfo,
>> +                                   is_kallsyms, is_vdso, proper_name, root_dir);
>> +out_free:
>> +       free(allocated_name);
>> +       return ret;
>>  }
>>
>>  static int
>> diff --git a/tools/perf/util/build-id.h b/tools/perf/util/build-id.h
>> index c19617151670..4e3a1169379b 100644
>> --- a/tools/perf/util/build-id.h
>> +++ b/tools/perf/util/build-id.h
>> @@ -66,10 +66,18 @@ int build_id_cache__list_build_ids(const char *pathname, struct nsinfo *nsi,
>>                                    struct strlist **result);
>>  bool build_id_cache__cached(const char *sbuild_id);
>>  int build_id_cache__add(const char *sbuild_id, const char *name, const char *realname,
>> -                       struct nsinfo *nsi, bool is_kallsyms, bool is_vdso);
>> -int build_id_cache__add_s(const char *sbuild_id,
>> -                         const char *name, struct nsinfo *nsi,
>> -                         bool is_kallsyms, bool is_vdso);
>> +                       struct nsinfo *nsi, bool is_kallsyms, bool is_vdso,
>> +                       const char *proper_name, const char *root_dir);
>> +int __build_id_cache__add_s(const char *sbuild_id,
>> +                           const char *name, struct nsinfo *nsi,
>> +                           bool is_kallsyms, bool is_vdso,
>> +                           const char *proper_name, const char *root_dir);
>> +static inline int build_id_cache__add_s(const char *sbuild_id,
>> +                                       const char *name, struct nsinfo *nsi,
>> +                                       bool is_kallsyms, bool is_vdso)
>> +{
>> +       return __build_id_cache__add_s(sbuild_id, name, nsi, is_kallsyms, is_vdso, NULL, NULL);
>> +}
>>  int build_id_cache__remove_s(const char *sbuild_id);
>>
>>  extern char buildid_dir[];
>> --
>> 2.25.1
>>
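
As an illustration of the naming scheme described in the commit message, the
sketch below builds the host-side path used to open a guest file, while the
unprefixed name remains the "proper" name stored in the cache. The helper
name guest_host_path is hypothetical and this is not the patch's code; it
only mirrors the "%s/%s" join done in dso__cache_build_id().

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the path under which a guest file is visible on the host.
 * root_dir is the guestmount point (may be NULL or empty for the host),
 * name is the path as seen inside the guest.
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int guest_host_path(char *buf, size_t sz, const char *root_dir,
			   const char *name)
{
	int n;

	if (!root_dir || !*root_dir)
		n = snprintf(buf, sz, "%s", name);
	else
		n = snprintf(buf, sz, "%s/%s", root_dir, name);

	return (n < 0 || (size_t)n >= sz) ? -1 : 0;
}
```

For a root_dir of "/mnt/guest" and a guest name of "/usr/bin/ls", this gives
"/mnt/guest//usr/bin/ls" (the doubled slash is harmless on POSIX systems),
and "/usr/bin/ls" stays as the proper name.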



* Re: [PATCH 23/35] perf tools: Add reallocarray_as_needed()
  2022-07-20  0:55   ` Ian Rogers
@ 2022-08-09 16:48     ` Adrian Hunter
  0 siblings, 0 replies; 83+ messages in thread
From: Adrian Hunter @ 2022-08-09 16:48 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On 20/07/22 03:55, Ian Rogers wrote:
> On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>>
>> Add helper reallocarray_as_needed() to reallocate an array to a larger
>> size and initialize the extra entries to an arbitrary value.
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>  tools/perf/util/util.c | 33 +++++++++++++++++++++++++++++++++
>>  tools/perf/util/util.h | 15 +++++++++++++++
>>  2 files changed, 48 insertions(+)
>>
>> diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
>> index 9b02edf9311d..391c1e928bd7 100644
>> --- a/tools/perf/util/util.c
>> +++ b/tools/perf/util/util.c
>> @@ -18,6 +18,7 @@
>>  #include <linux/kernel.h>
>>  #include <linux/log2.h>
>>  #include <linux/time64.h>
>> +#include <linux/overflow.h>
>>  #include <unistd.h>
>>  #include "cap.h"
>>  #include "strlist.h"
>> @@ -500,3 +501,35 @@ char *filename_with_chroot(int pid, const char *filename)
>>
>>         return new_name;
>>  }
>> +
>> +/*
>> + * Reallocate an array *arr of size *arr_sz so that it is big enough to contain
>> + * x elements of size msz, initializing new entries to *init_val or zero if
>> + * init_val is NULL
>> + */
>> +int do_realloc_array_as_needed(void **arr, size_t *arr_sz, size_t x, size_t msz, const void *init_val)
> 
> This feels a little like a 1-dimensional xyarray, could we make a
> similar abstraction to avoid passing all these values around?

xyarray does not realloc, which is the only thing that is needed in
this case. C isn't C++, so adding an abstraction would be clunky.

> 
> Thanks,
> Ian
> 
>> +{
>> +       size_t new_sz = *arr_sz;
>> +       void *new_arr;
>> +       size_t i;
>> +
>> +       if (!new_sz)
>> +               new_sz = msz >= 64 ? 1 : roundup(64, msz); /* Start with at least 64 bytes */
>> +       while (x >= new_sz) {
>> +               if (check_mul_overflow(new_sz, (size_t)2, &new_sz))
>> +                       return -ENOMEM;
>> +       }
>> +       if (new_sz == *arr_sz)
>> +               return 0;
>> +       new_arr = calloc(new_sz, msz);
>> +       if (!new_arr)
>> +               return -ENOMEM;
>> +       memcpy(new_arr, *arr, *arr_sz * msz);
>> +       if (init_val) {
>> +               for (i = *arr_sz; i < new_sz; i++)
>> +                       memcpy(new_arr + (i * msz), init_val, msz);
>> +       }
>> +       *arr = new_arr;
>> +       *arr_sz = new_sz;
>> +       return 0;
>> +}
>> diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
>> index 0f78f1e7782d..c1f2d423a9ec 100644
>> --- a/tools/perf/util/util.h
>> +++ b/tools/perf/util/util.h
>> @@ -79,4 +79,19 @@ struct perf_debuginfod {
>>  void perf_debuginfod_setup(struct perf_debuginfod *di);
>>
>>  char *filename_with_chroot(int pid, const char *filename);
>> +
>> +int do_realloc_array_as_needed(void **arr, size_t *arr_sz, size_t x,
>> +                              size_t msz, const void *init_val);
>> +
>> +#define realloc_array_as_needed(a, n, x, v) ({                 \
>> +       typeof(x) __x = (x);                                    \
>> +       __x >= (n) ?                                            \
>> +               do_realloc_array_as_needed((void **)&(a),       \
>> +                                          &(n),                \
>> +                                          __x,                 \
>> +                                          sizeof(*(a)),        \
>> +                                          (const void *)(v)) : \
>> +               0;                                              \
>> +       })
>> +
>>  #endif /* GIT_COMPAT_UTIL_H */
>> --
>> 2.25.1
>>
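
To make the grow-and-initialize idea in this patch concrete, here is a
stand-alone sketch of the same technique. It is not the patch's code:
grow_array is a hypothetical name, the starting capacity is simply one
element, and, unlike the code quoted above, it frees the old array after
copying. It assumes the caller passes a NULL array with a size of zero the
first time.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Grow *arr (holding *arr_sz elements of msz bytes each) by doubling until
 * index x is valid. New elements are copied from init_val, or left zeroed
 * if init_val is NULL. Returns 0 on success, -1 on failure.
 */
static int grow_array(void **arr, size_t *arr_sz, size_t x, size_t msz,
		      const void *init_val)
{
	size_t new_sz = *arr_sz ? *arr_sz : 1;
	char *new_arr;
	size_t i;

	if (!msz)
		return -1;

	while (x >= new_sz) {
		if (new_sz > SIZE_MAX / 2)
			return -1;	/* doubling would overflow */
		new_sz *= 2;
	}
	if (new_sz == *arr_sz)
		return 0;		/* already big enough */

	new_arr = calloc(new_sz, msz);	/* calloc rejects new_sz * msz overflow */
	if (!new_arr)
		return -1;

	if (*arr_sz)
		memcpy(new_arr, *arr, *arr_sz * msz);
	if (init_val) {
		for (i = *arr_sz; i < new_sz; i++)
			memcpy(new_arr + i * msz, init_val, msz);
	}

	free(*arr);
	*arr = new_arr;
	*arr_sz = new_sz;
	return 0;
}
```

A caller would typically invoke this just before storing to index x, so the
array only ever grows on demand.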



* Re: [PATCH 05/35] perf tools: Factor out evsel__id_hdr_size()
  2022-08-09 11:49     ` Adrian Hunter
@ 2022-08-09 17:07       ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2022-08-09 17:07 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On Tue, Aug 9, 2022 at 4:50 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> On 19/07/22 20:09, Ian Rogers wrote:
> > On Mon, Jul 11, 2022 at 2:32 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
> >>
> >> Factor out evsel__id_hdr_size() so it can be reused.
> >>
> >> This is needed by perf inject. When injecting events from a guest perf.data
> >> file, there is a possibility that the sample ID numbers conflict. To
> >> re-write an ID sample, the old one needs to be removed first, which means
> >> determining how big it is with evsel__id_hdr_size() and then subtracting
> >> that from the event size.
> >>
> >> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> >> ---
> >>  tools/perf/util/evlist.c | 28 +---------------------------
> >>  tools/perf/util/evsel.c  | 26 ++++++++++++++++++++++++++
> >>  tools/perf/util/evsel.h  |  2 ++
> >>  3 files changed, 29 insertions(+), 27 deletions(-)
> >>
> >> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> >> index 48af7d379d82..03fbe151b0c4 100644
> >> --- a/tools/perf/util/evlist.c
> >> +++ b/tools/perf/util/evlist.c
> >> @@ -1244,34 +1244,8 @@ bool evlist__valid_read_format(struct evlist *evlist)
> >>  u16 evlist__id_hdr_size(struct evlist *evlist)
> >>  {
> >>         struct evsel *first = evlist__first(evlist);
> >> -       struct perf_sample *data;
> >> -       u64 sample_type;
> >> -       u16 size = 0;
> >>
> >> -       if (!first->core.attr.sample_id_all)
> >> -               goto out;
> >> -
> >> -       sample_type = first->core.attr.sample_type;
> >> -
> >> -       if (sample_type & PERF_SAMPLE_TID)
> >> -               size += sizeof(data->tid) * 2;
> >> -
> >> -       if (sample_type & PERF_SAMPLE_TIME)
> >> -               size += sizeof(data->time);
> >> -
> >> -       if (sample_type & PERF_SAMPLE_ID)
> >> -               size += sizeof(data->id);
> >> -
> >> -       if (sample_type & PERF_SAMPLE_STREAM_ID)
> >> -               size += sizeof(data->stream_id);
> >> -
> >> -       if (sample_type & PERF_SAMPLE_CPU)
> >> -               size += sizeof(data->cpu) * 2;
> >> -
> >> -       if (sample_type & PERF_SAMPLE_IDENTIFIER)
> >> -               size += sizeof(data->id);
> >> -out:
> >> -       return size;
> >> +       return first->core.attr.sample_id_all ? evsel__id_hdr_size(first) : 0;
> >>  }
> >>
> >>  bool evlist__valid_sample_id_all(struct evlist *evlist)
> >> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> >> index a67cc3f2fa74..9a30ccb7b104 100644
> >> --- a/tools/perf/util/evsel.c
> >> +++ b/tools/perf/util/evsel.c
> >> @@ -2724,6 +2724,32 @@ int evsel__parse_sample_timestamp(struct evsel *evsel, union perf_event *event,
> >>         return 0;
> >>  }
> >>
> >> +u16 evsel__id_hdr_size(struct evsel *evsel)
> >> +{
> >> +       u64 sample_type = evsel->core.attr.sample_type;
> >
> > As this just uses core, would it be more appropriate to put it in libperf?
>
> AFAIK we move to libperf only as needed.

I don't think there is an expectation yet that libperf is stable - I
hope not as I need to nuke the CPU map empty function. So, the cost of
putting something there rather than perf is minimal, and perf can be
just a consumer of libperf as any other tool - which builds confidence
the API in libperf is complete. Jiri has posted patches in the past
migrating parse-events, there's no "need" for that but the point is to
improve the library API. I think this is the same case and minimal
cost given only core is being used. Given we're actively migrating
util APIs to libperf I think it is better to introduce simple APIs
like this in libperf rather than creating something that someone will
later have to migrate.

> >
> >> +       u16 size = 0;
> >
> > Perhaps size_t or int? u16 seems odd.
>
> Event header size member is 16-bit

sizeof yields a size_t, so the code as-is has an implicit truncation to
u16 - again, I'll stand by it looking odd.

Thanks,
Ian

> >
> >> +
> >> +       if (sample_type & PERF_SAMPLE_TID)
> >> +               size += sizeof(u64);
> >> +
> >> +       if (sample_type & PERF_SAMPLE_TIME)
> >> +               size += sizeof(u64);
> >> +
> >> +       if (sample_type & PERF_SAMPLE_ID)
> >> +               size += sizeof(u64);
> >> +
> >> +       if (sample_type & PERF_SAMPLE_STREAM_ID)
> >> +               size += sizeof(u64);
> >> +
> >> +       if (sample_type & PERF_SAMPLE_CPU)
> >> +               size += sizeof(u64);
> >> +
> >> +       if (sample_type & PERF_SAMPLE_IDENTIFIER)
> >> +               size += sizeof(u64);
> >> +
> >> +       return size;
> >> +}
> >> +
> >>  struct tep_format_field *evsel__field(struct evsel *evsel, const char *name)
> >>  {
> >>         return tep_find_field(evsel->tp_format, name);
> >> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> >> index 92bed8e2f7d8..699448f2bc2b 100644
> >> --- a/tools/perf/util/evsel.h
> >> +++ b/tools/perf/util/evsel.h
> >> @@ -381,6 +381,8 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
> >>  int evsel__parse_sample_timestamp(struct evsel *evsel, union perf_event *event,
> >>                                   u64 *timestamp);
> >>
> >> +u16 evsel__id_hdr_size(struct evsel *evsel);
> >> +
> >
> > A comment would be nice, I know this is just moving code about but
> > this is a new function.
> >
> > Thanks,
> > Ian
> >
> >>  static inline struct evsel *evsel__next(struct evsel *evsel)
> >>  {
> >>         return list_entry(evsel->core.node.next, struct evsel, core.node);
> >> --
> >> 2.25.1
> >>
>

* Re: [PATCH 24/35] perf inject: Add support for injecting guest sideband events
  2022-07-20  1:06   ` Ian Rogers
@ 2022-08-11 17:19     ` Adrian Hunter
  0 siblings, 0 replies; 83+ messages in thread
From: Adrian Hunter @ 2022-08-11 17:19 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	linux-kernel, kvm

On 20/07/22 04:06, Ian Rogers wrote:
> On Mon, Jul 11, 2022 at 2:33 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>>
>> Inject events from a perf.data file recorded in a virtual machine into
>> a perf.data file recorded on the host at the same time.
>>
>> Only sideband events (e.g. mmap, comm, fork, exit) and build IDs are
>> injected.  Additionally, the guest kcore_dir is copied as kcore_dir__
>> with the machine PID appended.
>>
>> This is non-trivial because:
>>  o It is not possible to process 2 sessions simultaneously so instead
>>  events are first written to a temporary file.
>>  o To avoid conflict, guest sample IDs are replaced with new unused sample
>>  IDs.
>>  o Guest event's CPU is changed to be the host CPU because it is more
>>  useful for reporting and analysis.
>>  o Sample ID is mapped to machine PID which is recorded with VCPU in the
>>  id index. This is important to allow guest events to be related to the
>>  guest machine and VCPU.
>>  o Timestamps must be converted.
>>  o Events are inserted to obey finished-round ordering.
>>
>> The anticipated use-case is:
>>  - start recording sideband events in a guest machine
>>  - start recording an AUX area trace on the host which can also trace the
>>  guest (e.g. Intel PT)
>>  - run test case on the guest
>>  - stop recording on the host
>>  - stop recording on the guest
>>  - copy the guest perf.data file to the host
>>  - inject the guest perf.data file sideband events into the host perf.data
>>  file using perf inject
>>  - the resulting perf.data file can now be used
>>
>> Subsequent patches provide Intel PT support for this.
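
For reference, the "timestamps must be converted" step can be sketched
standalone as below. This is only a sketch under stated assumptions: it
takes raw TSC values on both sides (so it skips the perf-time <-> TSC
conversions that apply when cap_user_time_zero is set), and the function
name is hypothetical. The order of operations (subtract the offset, then
divide by the scale) follows the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a guest TSC value to a host TSC value using the
 * <time offset> and <time scale> values given to --guest-data.
 */
static uint64_t guest_tsc_to_host_tsc(uint64_t guest_tsc, uint64_t time_offset,
				      double time_scale)
{
	uint64_t tsc;

	assert(time_scale != 0.0);

	/* Unsigned subtraction wraps, so a negative offset stored as u64 still works. */
	tsc = guest_tsc - time_offset;

	/* The division is done in double and truncated back to an integer. */
	return (uint64_t)((double)tsc / time_scale);
}
```

Note that going through double means TSC values above 2^53 lose their low
bits in the conversion.
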
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>  tools/perf/Documentation/perf-inject.txt |   17 +
>>  tools/perf/builtin-inject.c              | 1043 +++++++++++++++++++++-
>>  2 files changed, 1059 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/Documentation/perf-inject.txt b/tools/perf/Documentation/perf-inject.txt
>> index 0570a1ccd344..646aa31586ed 100644
>> --- a/tools/perf/Documentation/perf-inject.txt
>> +++ b/tools/perf/Documentation/perf-inject.txt
>> @@ -85,6 +85,23 @@ include::itrace.txt[]
>>         without updating it. Currently this option is supported only by
>>         Intel PT, refer linkperf:perf-intel-pt[1]
>>
>> +--guest-data=<path>,<pid>[,<time offset>[,<time scale>]]::
>> +       Insert events from a perf.data file recorded in a virtual machine at
>> +       the same time as the input perf.data file was recorded on the host.
>> +       The Process ID (PID) of the QEMU hypervisor process must be provided,
>> +       and the time offset and time scale (multiplier) will likely be needed
>> +       to convert guest time stamps into host time stamps. For example, for
>> +       x86 the TSC Offset and Multiplier could be provided for a virtual machine
>> +       using Linux command line option no-kvmclock.
>> +       Currently only mmap, mmap2, comm, task, context_switch, ksymbol,
>> +       and text_poke events are inserted, as well as build ID information.
>> +       The QEMU option -name debug-threads=on is needed so that thread names
>> +       can be used to determine which thread is running which VCPU. Note
>> +       libvirt seems to use this by default.
>> +       When using perf record in the guest, option --sample-identifier
>> +       should be used, and also --buildid-all and --switch-events may be
>> +       useful.
>> +
> 
> Would other hypervisors based on kvm like gVisor work if they
> implemented name-debug-threads?

AFAICT gVisor is not a machine-level hypervisor, so the issue does not arise.
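
For reference, a standalone sketch of the thread-name matching that the
patch relies on. The patch itself uses a plain sscanf() with the format
"CPU %u/KVM"; the helper name and the stricter whole-name check using %n
are assumptions added here for illustration only. The matching looks at
nothing but the thread name, which is not an ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: extract the VCPU number from a thread name of the
 * form "CPU <n>/KVM". Returns true only if the whole name matches.
 */
static bool parse_vcpu_comm(const char *comm, unsigned int *vcpu)
{
	int end = 0;

	assert(comm != NULL && vcpu != NULL);

	/* %n records how much input was consumed; it stays 0 if "/KVM" fails to match. */
	if (sscanf(comm, "CPU %u/KVM%n", vcpu, &end) != 1)
		return false;

	return end > 0 && comm[end] == '\0';
}
```

A name such as "CPU 3/KVM" is accepted, while "qemu-system-x86" or an
empty name is rejected before any VCPU number is recorded.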

> 
>>  SEE ALSO
>>  --------
>>  linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-archive[1],
>> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
>> index c800911f68e7..fd4547bb75f7 100644
>> --- a/tools/perf/builtin-inject.c
>> +++ b/tools/perf/builtin-inject.c
>> @@ -26,6 +26,7 @@
>>  #include "util/thread.h"
>>  #include "util/namespaces.h"
>>  #include "util/util.h"
>> +#include "util/tsc.h"
>>
>>  #include <internal/lib.h>
>>
>> @@ -35,8 +36,70 @@
>>
>>  #include <linux/list.h>
>>  #include <linux/string.h>
>> +#include <linux/zalloc.h>
>> +#include <linux/hash.h>
>>  #include <errno.h>
>>  #include <signal.h>
>> +#include <inttypes.h>
>> +
>> +struct guest_event {
>> +       struct perf_sample              sample;
>> +       union perf_event                *event;
>> +       char                            event_buf[PERF_SAMPLE_MAX_SIZE];
>> +};
>> +
>> +struct guest_id {
>> +       /* hlist_node must be first, see free_hlist() */
>> +       struct hlist_node               node;
>> +       u64                             id;
>> +       u64                             host_id;
>> +       u32                             vcpu;
>> +};
>> +
>> +struct guest_tid {
>> +       /* hlist_node must be first, see free_hlist() */
>> +       struct hlist_node               node;
>> +       /* Thread ID of QEMU thread */
>> +       u32                             tid;
>> +       u32                             vcpu;
>> +};
>> +
>> +struct guest_vcpu {
>> +       /* Current host CPU */
>> +       u32                             cpu;
>> +       /* Thread ID of QEMU thread */
>> +       u32                             tid;
>> +};
>> +
>> +struct guest_session {
>> +       char                            *perf_data_file;
>> +       u32                             machine_pid;
>> +       u64                             time_offset;
>> +       double                          time_scale;
>> +       struct perf_tool                tool;
>> +       struct perf_data                data;
>> +       struct perf_session             *session;
>> +       char                            *tmp_file_name;
>> +       int                             tmp_fd;
>> +       struct perf_tsc_conversion      host_tc;
>> +       struct perf_tsc_conversion      guest_tc;
>> +       bool                            copy_kcore_dir;
>> +       bool                            have_tc;
>> +       bool                            fetched;
>> +       bool                            ready;
>> +       u16                             dflt_id_hdr_size;
>> +       u64                             dflt_id;
>> +       u64                             highest_id;
>> +       /* Array of guest_vcpu */
>> +       struct guest_vcpu               *vcpu;
>> +       size_t                          vcpu_cnt;
>> +       /* Hash table for guest_id */
>> +       struct hlist_head               heads[PERF_EVLIST__HLIST_SIZE];
>> +       /* Hash table for guest_tid */
>> +       struct hlist_head               tids[PERF_EVLIST__HLIST_SIZE];
>> +       /* Place to stash next guest event */
>> +       struct guest_event              ev;
>> +};
>>
>>  struct perf_inject {
>>         struct perf_tool        tool;
>> @@ -59,6 +122,7 @@ struct perf_inject {
>>         struct itrace_synth_opts itrace_synth_opts;
>>         char                    event_copy[PERF_SAMPLE_MAX_SIZE];
>>         struct perf_file_section secs[HEADER_FEAT_BITS];
>> +       struct guest_session    guest_session;
>>  };
>>
>>  struct event_entry {
>> @@ -698,6 +762,841 @@ static int perf_inject__sched_stat(struct perf_tool *tool,
>>         return perf_event__repipe(tool, event_sw, &sample_sw, machine);
>>  }
>>
>> +static struct guest_vcpu *guest_session__vcpu(struct guest_session *gs, u32 vcpu)
>> +{
>> +       if (realloc_array_as_needed(gs->vcpu, gs->vcpu_cnt, vcpu, NULL))
>> +               return NULL;
>> +       return &gs->vcpu[vcpu];
>> +}
>> +
>> +static int guest_session__output_bytes(struct guest_session *gs, void *buf, size_t sz)
>> +{
>> +       ssize_t ret = writen(gs->tmp_fd, buf, sz);
>> +
>> +       return ret < 0 ? ret : 0;
>> +}
>> +
>> +static int guest_session__repipe(struct perf_tool *tool,
>> +                                union perf_event *event,
>> +                                struct perf_sample *sample __maybe_unused,
>> +                                struct machine *machine __maybe_unused)
>> +{
>> +       struct guest_session *gs = container_of(tool, struct guest_session, tool);
>> +
>> +       return guest_session__output_bytes(gs, event, event->header.size);
>> +}
>> +
>> +static int guest_session__map_tid(struct guest_session *gs, u32 tid, u32 vcpu)
>> +{
>> +       struct guest_tid *guest_tid = zalloc(sizeof(*guest_tid));
>> +       int hash;
>> +
>> +       if (!guest_tid)
>> +               return -ENOMEM;
>> +
>> +       guest_tid->tid = tid;
>> +       guest_tid->vcpu = vcpu;
>> +       hash = hash_32(guest_tid->tid, PERF_EVLIST__HLIST_BITS);
>> +       hlist_add_head(&guest_tid->node, &gs->tids[hash]);
>> +
>> +       return 0;
>> +}
>> +
>> +static int host_peek_vm_comms_cb(struct perf_session *session __maybe_unused,
>> +                                union perf_event *event,
>> +                                u64 offset __maybe_unused, void *data)
>> +{
>> +       struct guest_session *gs = data;
>> +       unsigned int vcpu;
>> +       struct guest_vcpu *guest_vcpu;
>> +       int ret;
>> +
>> +       if (event->header.type != PERF_RECORD_COMM ||
>> +           event->comm.pid != gs->machine_pid)
>> +               return 0;
>> +
>> +       /*
>> +        * QEMU option -name debug-threads=on, causes thread names formatted as
>> +        * below, although it is not an ABI. Also libvirt seems to use this by
>> +        * default. Here we rely on it to tell us which thread is which VCPU.
>> +        */
>> +       ret = sscanf(event->comm.comm, "CPU %u/KVM", &vcpu);
>> +       if (ret <= 0)
>> +               return ret;
>> +       pr_debug("Found VCPU: tid %u comm %s vcpu %u\n",
>> +                event->comm.tid, event->comm.comm, vcpu);
>> +       if (vcpu > INT_MAX) {
>> +               pr_err("Invalid VCPU %u\n", vcpu);
>> +               return -EINVAL;
>> +       }
>> +       guest_vcpu = guest_session__vcpu(gs, vcpu);
>> +       if (!guest_vcpu)
>> +               return -ENOMEM;
>> +       if (guest_vcpu->tid && guest_vcpu->tid != event->comm.tid) {
>> +               pr_err("Fatal error: Two threads found with the same VCPU\n");
>> +               return -EINVAL;
>> +       }
>> +       guest_vcpu->tid = event->comm.tid;
>> +
>> +       return guest_session__map_tid(gs, event->comm.tid, vcpu);
>> +}
>> +
>> +static int host_peek_vm_comms(struct perf_session *session, struct guest_session *gs)
>> +{
>> +       return perf_session__peek_events(session, session->header.data_offset,
>> +                                        session->header.data_size,
>> +                                        host_peek_vm_comms_cb, gs);
>> +}
>> +
>> +static bool evlist__is_id_used(struct evlist *evlist, u64 id)
>> +{
>> +       return evlist__id2sid(evlist, id);
>> +}
>> +
>> +static u64 guest_session__allocate_new_id(struct guest_session *gs, struct evlist *host_evlist)
>> +{
>> +       do {
>> +               gs->highest_id += 1;
>> +       } while (!gs->highest_id || evlist__is_id_used(host_evlist, gs->highest_id));
>> +
>> +       return gs->highest_id;
>> +}
>> +
>> +static int guest_session__map_id(struct guest_session *gs, u64 id, u64 host_id, u32 vcpu)
>> +{
>> +       struct guest_id *guest_id = zalloc(sizeof(*guest_id));
>> +       int hash;
>> +
>> +       if (!guest_id)
>> +               return -ENOMEM;
>> +
>> +       guest_id->id = id;
>> +       guest_id->host_id = host_id;
>> +       guest_id->vcpu = vcpu;
>> +       hash = hash_64(guest_id->id, PERF_EVLIST__HLIST_BITS);
>> +       hlist_add_head(&guest_id->node, &gs->heads[hash]);
>> +
>> +       return 0;
>> +}
>> +
>> +static u64 evlist__find_highest_id(struct evlist *evlist)
>> +{
>> +       struct evsel *evsel;
>> +       u64 highest_id = 1;
>> +
>> +       evlist__for_each_entry(evlist, evsel) {
>> +               u32 j;
>> +
>> +               for (j = 0; j < evsel->core.ids; j++) {
>> +                       u64 id = evsel->core.id[j];
>> +
>> +                       if (id > highest_id)
>> +                               highest_id = id;
>> +               }
>> +       }
>> +
>> +       return highest_id;
>> +}
>> +
>> +static int guest_session__map_ids(struct guest_session *gs, struct evlist *host_evlist)
>> +{
>> +       struct evlist *evlist = gs->session->evlist;
>> +       struct evsel *evsel;
>> +       int ret;
>> +
>> +       evlist__for_each_entry(evlist, evsel) {
>> +               u32 j;
>> +
>> +               for (j = 0; j < evsel->core.ids; j++) {
>> +                       struct perf_sample_id *sid;
>> +                       u64 host_id;
>> +                       u64 id;
>> +
>> +                       id = evsel->core.id[j];
>> +                       sid = evlist__id2sid(evlist, id);
>> +                       if (!sid || sid->cpu.cpu == -1)
>> +                               continue;
>> +                       host_id = guest_session__allocate_new_id(gs, host_evlist);
>> +                       ret = guest_session__map_id(gs, id, host_id, sid->cpu.cpu);
>> +                       if (ret)
>> +                               return ret;
>> +               }
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +static struct guest_id *guest_session__lookup_id(struct guest_session *gs, u64 id)
>> +{
>> +       struct hlist_head *head;
>> +       struct guest_id *guest_id;
>> +       int hash;
>> +
>> +       hash = hash_64(id, PERF_EVLIST__HLIST_BITS);
>> +       head = &gs->heads[hash];
>> +
>> +       hlist_for_each_entry(guest_id, head, node)
>> +               if (guest_id->id == id)
>> +                       return guest_id;
>> +
>> +       return NULL;
>> +}
>> +
>> +static int process_attr(struct perf_tool *tool, union perf_event *event,
>> +                       struct perf_sample *sample __maybe_unused,
>> +                       struct machine *machine __maybe_unused)
>> +{
>> +       struct perf_inject *inject = container_of(tool, struct perf_inject, tool);
>> +
>> +       return perf_event__process_attr(tool, event, &inject->session->evlist);
>> +}
>> +
>> +static int guest_session__add_attr(struct guest_session *gs, struct evsel *evsel)
>> +{
>> +       struct perf_inject *inject = container_of(gs, struct perf_inject, guest_session);
>> +       struct perf_event_attr attr = evsel->core.attr;
>> +       u64 *id_array;
>> +       u32 *vcpu_array;
>> +       int ret = -ENOMEM;
>> +       u32 i;
>> +
>> +       id_array = calloc(evsel->core.ids, sizeof(*id_array));
>> +       if (!id_array)
>> +               return -ENOMEM;
>> +
>> +       vcpu_array = calloc(evsel->core.ids, sizeof(*vcpu_array));
>> +       if (!vcpu_array)
>> +               goto out;
>> +
>> +       for (i = 0; i < evsel->core.ids; i++) {
>> +               u64 id = evsel->core.id[i];
>> +               struct guest_id *guest_id = guest_session__lookup_id(gs, id);
>> +
>> +               if (!guest_id) {
>> +                       pr_err("Failed to find guest id %"PRIu64"\n", id);
>> +                       ret = -EINVAL;
>> +                       goto out;
>> +               }
>> +               id_array[i] = guest_id->host_id;
>> +               vcpu_array[i] = guest_id->vcpu;
>> +       }
>> +
>> +       attr.sample_type |= PERF_SAMPLE_IDENTIFIER;
>> +       attr.exclude_host = 1;
>> +       attr.exclude_guest = 0;
>> +
>> +       ret = perf_event__synthesize_attr(&inject->tool, &attr, evsel->core.ids,
>> +                                         id_array, process_attr);
>> +       if (ret)
>> +               pr_err("Failed to add guest attr.\n");
>> +
>> +       for (i = 0; i < evsel->core.ids; i++) {
>> +               struct perf_sample_id *sid;
>> +               u32 vcpu = vcpu_array[i];
>> +
>> +               sid = evlist__id2sid(inject->session->evlist, id_array[i]);
>> +               /* Guest event is per-thread from the host point of view */
>> +               sid->cpu.cpu = -1;
>> +               sid->tid = gs->vcpu[vcpu].tid;
>> +               sid->machine_pid = gs->machine_pid;
>> +               sid->vcpu.cpu = vcpu;
>> +       }
>> +out:
>> +       free(vcpu_array);
>> +       free(id_array);
>> +       return ret;
>> +}
>> +
>> +static int guest_session__add_attrs(struct guest_session *gs)
>> +{
>> +       struct evlist *evlist = gs->session->evlist;
>> +       struct evsel *evsel;
>> +       int ret;
>> +
>> +       evlist__for_each_entry(evlist, evsel) {
>> +               ret = guest_session__add_attr(gs, evsel);
>> +               if (ret)
>> +                       return ret;
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +static int synthesize_id_index(struct perf_inject *inject, size_t new_cnt)
>> +{
>> +       struct perf_session *session = inject->session;
>> +       struct evlist *evlist = session->evlist;
>> +       struct machine *machine = &session->machines.host;
>> +       size_t from = evlist->core.nr_entries - new_cnt;
>> +
>> +       return __perf_event__synthesize_id_index(&inject->tool, perf_event__repipe,
>> +                                                evlist, machine, from);
>> +}
>> +
>> +static struct guest_tid *guest_session__lookup_tid(struct guest_session *gs, u32 tid)
>> +{
>> +       struct hlist_head *head;
>> +       struct guest_tid *guest_tid;
>> +       int hash;
>> +
>> +       hash = hash_32(tid, PERF_EVLIST__HLIST_BITS);
>> +       head = &gs->tids[hash];
>> +
>> +       hlist_for_each_entry(guest_tid, head, node)
>> +               if (guest_tid->tid == tid)
>> +                       return guest_tid;
>> +
>> +       return NULL;
>> +}
>> +
>> +static bool dso__is_in_kernel_space(struct dso *dso)
>> +{
>> +       if (dso__is_vdso(dso))
>> +               return false;
>> +
>> +       return dso__is_kcore(dso) ||
>> +              dso->kernel ||
>> +              is_kernel_module(dso->long_name, PERF_RECORD_MISC_CPUMODE_UNKNOWN);
>> +}
>> +
>> +static u64 evlist__first_id(struct evlist *evlist)
>> +{
>> +       struct evsel *evsel;
>> +
>> +       evlist__for_each_entry(evlist, evsel) {
>> +               if (evsel->core.ids)
>> +                       return evsel->core.id[0];
>> +       }
>> +       return 0;
>> +}
>> +
>> +static int process_build_id(struct perf_tool *tool,
>> +                           union perf_event *event,
>> +                           struct perf_sample *sample __maybe_unused,
>> +                           struct machine *machine __maybe_unused)
>> +{
>> +       struct perf_inject *inject = container_of(tool, struct perf_inject, tool);
>> +
>> +       return perf_event__process_build_id(inject->session, event);
>> +}
>> +
>> +static int synthesize_build_id(struct perf_inject *inject, struct dso *dso, pid_t machine_pid)
>> +{
>> +       struct machine *machine = perf_session__findnew_machine(inject->session, machine_pid);
>> +       u8 cpumode = dso__is_in_kernel_space(dso) ?
>> +                       PERF_RECORD_MISC_GUEST_KERNEL :
>> +                       PERF_RECORD_MISC_GUEST_USER;
>> +
>> +       if (!machine)
>> +               return -ENOMEM;
>> +
>> +       dso->hit = 1;
>> +
>> +       return perf_event__synthesize_build_id(&inject->tool, dso, cpumode,
>> +                                              process_build_id, machine);
>> +}
>> +
>> +static int guest_session__add_build_ids(struct guest_session *gs)
>> +{
>> +       struct perf_inject *inject = container_of(gs, struct perf_inject, guest_session);
>> +       struct machine *machine = &gs->session->machines.host;
>> +       struct dso *dso;
>> +       int ret;
>> +
>> +       /* Build IDs will be put in the Build ID feature section */
>> +       perf_header__set_feat(&inject->session->header, HEADER_BUILD_ID);
>> +
>> +       dsos__for_each_with_build_id(dso, &machine->dsos.head) {
>> +               ret = synthesize_build_id(inject, dso, gs->machine_pid);
>> +               if (ret)
>> +                       return ret;
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +static int guest_session__ksymbol_event(struct perf_tool *tool,
>> +                                       union perf_event *event,
>> +                                       struct perf_sample *sample __maybe_unused,
>> +                                       struct machine *machine __maybe_unused)
>> +{
>> +       struct guest_session *gs = container_of(tool, struct guest_session, tool);
>> +
>> +       /* Only support out-of-line i.e. no BPF support */
>> +       if (event->ksymbol.ksym_type != PERF_RECORD_KSYMBOL_TYPE_OOL)
>> +               return 0;
>> +
>> +       return guest_session__output_bytes(gs, event, event->header.size);
>> +}
>> +
>> +static int guest_session__start(struct guest_session *gs, const char *name, bool force)
>> +{
>> +       char tmp_file_name[] = "/tmp/perf-inject-guest_session-XXXXXX";
>> +       struct perf_session *session;
>> +       int ret;
>> +
>> +       /* Only these events will be injected */
>> +       gs->tool.mmap           = guest_session__repipe;
>> +       gs->tool.mmap2          = guest_session__repipe;
>> +       gs->tool.comm           = guest_session__repipe;
>> +       gs->tool.fork           = guest_session__repipe;
>> +       gs->tool.exit           = guest_session__repipe;
>> +       gs->tool.lost           = guest_session__repipe;
>> +       gs->tool.context_switch = guest_session__repipe;
>> +       gs->tool.ksymbol        = guest_session__ksymbol_event;
>> +       gs->tool.text_poke      = guest_session__repipe;
>> +       /*
>> +        * Processing a build ID creates a struct dso with that build ID. Later,
>> +        * all guest dsos are iterated and the build IDs processed into the host
>> +        * session where they will be output to the Build ID feature section
>> +        * when the perf.data file header is written.
>> +        */
>> +       gs->tool.build_id       = perf_event__process_build_id;
>> +       /* Process the id index to know what VCPU an ID belongs to */
>> +       gs->tool.id_index       = perf_event__process_id_index;
>> +
>> +       gs->tool.ordered_events = true;
>> +       gs->tool.ordering_requires_timestamps = true;
>> +
>> +       gs->data.path   = name;
>> +       gs->data.force  = force;
>> +       gs->data.mode   = PERF_DATA_MODE_READ;
>> +
>> +       session = perf_session__new(&gs->data, &gs->tool);
>> +       if (IS_ERR(session))
>> +               return PTR_ERR(session);
>> +       gs->session = session;
>> +
>> +       /*
>> +        * Initial events have zero'd ID samples. Get default ID sample size
>> +        * used for removing them.
>> +        */
>> +       gs->dflt_id_hdr_size = session->machines.host.id_hdr_size;
>> +       /* And default ID for adding back a host-compatible ID sample */
>> +       gs->dflt_id = evlist__first_id(session->evlist);
>> +       if (!gs->dflt_id) {
>> +               pr_err("Guest data has no sample IDs");
>> +               return -EINVAL;
>> +       }
>> +
>> +       /* Temporary file for guest events */
>> +       gs->tmp_file_name = strdup(tmp_file_name);
>> +       if (!gs->tmp_file_name)
>> +               return -ENOMEM;
>> +       gs->tmp_fd = mkstemp(gs->tmp_file_name);
>> +       if (gs->tmp_fd < 0)
>> +               return -errno;
>> +
>> +       if (zstd_init(&gs->session->zstd_data, 0) < 0)
>> +               pr_warning("Guest session decompression initialization failed.\n");
>> +
>> +       /*
>> +        * perf does not support processing 2 sessions simultaneously, so output
>> +        * guest events to a temporary file.
>> +        */
>> +       ret = perf_session__process_events(gs->session);
>> +       if (ret)
>> +               return ret;
>> +
>> +       if (lseek(gs->tmp_fd, 0, SEEK_SET))
>> +               return -errno;
>> +
>> +       return 0;
>> +}
>> +
>> +/* Free hlist nodes assuming hlist_node is the first member of hlist entries */
>> +static void free_hlist(struct hlist_head *heads, size_t hlist_sz)
>> +{
>> +       struct hlist_node *pos, *n;
>> +       size_t i;
>> +
>> +       for (i = 0; i < hlist_sz; ++i) {
>> +               hlist_for_each_safe(pos, n, &heads[i]) {
>> +                       hlist_del(pos);
>> +                       free(pos);
>> +               }
>> +       }
>> +}
>> +
>> +static void guest_session__exit(struct guest_session *gs)
>> +{
>> +       if (gs->session) {
>> +               perf_session__delete(gs->session);
>> +               free_hlist(gs->heads, PERF_EVLIST__HLIST_SIZE);
>> +               free_hlist(gs->tids, PERF_EVLIST__HLIST_SIZE);
>> +       }
>> +       if (gs->tmp_file_name) {
>> +               if (gs->tmp_fd >= 0)
>> +                       close(gs->tmp_fd);
>> +               unlink(gs->tmp_file_name);
>> +               free(gs->tmp_file_name);
>> +       }
>> +       free(gs->vcpu);
>> +       free(gs->perf_data_file);
>> +}
>> +
>> +static void get_tsc_conv(struct perf_tsc_conversion *tc, struct perf_record_time_conv *time_conv)
>> +{
>> +       tc->time_shift          = time_conv->time_shift;
>> +       tc->time_mult           = time_conv->time_mult;
>> +       tc->time_zero           = time_conv->time_zero;
>> +       tc->time_cycles         = time_conv->time_cycles;
>> +       tc->time_mask           = time_conv->time_mask;
>> +       tc->cap_user_time_zero  = time_conv->cap_user_time_zero;
>> +       tc->cap_user_time_short = time_conv->cap_user_time_short;
>> +}
>> +
>> +static void guest_session__get_tc(struct guest_session *gs)
>> +{
>> +       struct perf_inject *inject = container_of(gs, struct perf_inject, guest_session);
>> +
>> +       get_tsc_conv(&gs->host_tc, &inject->session->time_conv);
>> +       get_tsc_conv(&gs->guest_tc, &gs->session->time_conv);
>> +}
>> +
>> +static void guest_session__convert_time(struct guest_session *gs, u64 guest_time, u64 *host_time)
>> +{
>> +       u64 tsc;
>> +
>> +       if (!guest_time) {
>> +               *host_time = 0;
>> +               return;
>> +       }
>> +
>> +       if (gs->guest_tc.cap_user_time_zero)
>> +               tsc = perf_time_to_tsc(guest_time, &gs->guest_tc);
>> +       else
>> +               tsc = guest_time;
>> +
>> +       /*
>> +        * This is the correct order of operations for x86 if the TSC Offset and
>> +        * Multiplier values are used.
>> +        */
>> +       tsc -= gs->time_offset;
>> +       tsc /= gs->time_scale;
>> +
>> +       if (gs->host_tc.cap_user_time_zero)
>> +               *host_time = tsc_to_perf_time(tsc, &gs->host_tc);
>> +       else
>> +               *host_time = tsc;
>> +}
>> +
>> +static int guest_session__fetch(struct guest_session *gs)
>> +{
>> +       void *buf = gs->ev.event_buf;
>> +       struct perf_event_header *hdr = buf;
>> +       size_t hdr_sz = sizeof(*hdr);
>> +       ssize_t ret;
>> +
>> +       ret = readn(gs->tmp_fd, buf, hdr_sz);
>> +       if (ret < 0)
>> +               return ret;
>> +
>> +       if (!ret) {
>> +               /* Zero size means EOF */
>> +               hdr->size = 0;
>> +               return 0;
>> +       }
>> +
>> +       buf += hdr_sz;
>> +
>> +       ret = readn(gs->tmp_fd, buf, hdr->size - hdr_sz);
>> +       if (ret < 0)
>> +               return ret;
>> +
>> +       gs->ev.event = (union perf_event *)gs->ev.event_buf;
>> +       gs->ev.sample.time = 0;
>> +
>> +       if (hdr->type >= PERF_RECORD_USER_TYPE_START) {
>> +               pr_err("Unexpected type fetching guest event");
>> +               return 0;
>> +       }
>> +
>> +       ret = evlist__parse_sample(gs->session->evlist, gs->ev.event, &gs->ev.sample);
>> +       if (ret) {
>> +               pr_err("Parse failed fetching guest event");
>> +               return ret;
>> +       }
>> +
>> +       if (!gs->have_tc) {
>> +               guest_session__get_tc(gs);
>> +               gs->have_tc = true;
>> +       }
>> +
>> +       guest_session__convert_time(gs, gs->ev.sample.time, &gs->ev.sample.time);
>> +
>> +       return 0;
>> +}
>> +
>> +static int evlist__append_id_sample(struct evlist *evlist, union perf_event *ev,
>> +                                   const struct perf_sample *sample)
>> +{
>> +       struct evsel *evsel;
>> +       void *array;
>> +       int ret;
>> +
>> +       evsel = evlist__id2evsel(evlist, sample->id);
>> +       array = ev;
>> +
>> +       if (!evsel) {
>> +               pr_err("No evsel for id %"PRIu64"\n", sample->id);
>> +               return -EINVAL;
>> +       }
>> +
>> +       array += ev->header.size;
>> +       ret = perf_event__synthesize_id_sample(array, evsel->core.attr.sample_type, sample);
>> +       if (ret < 0)
>> +               return ret;
>> +
>> +       if (ret & 7) {
>> +               pr_err("Bad id sample size %d\n", ret);
>> +               return -EINVAL;
>> +       }
>> +
>> +       ev->header.size += ret;
>> +
>> +       return 0;
>> +}
>> +
>> +static int guest_session__inject_events(struct guest_session *gs, u64 timestamp)
>> +{
>> +       struct perf_inject *inject = container_of(gs, struct perf_inject, guest_session);
>> +       int ret;
>> +
>> +       if (!gs->ready)
>> +               return 0;
>> +
>> +       while (1) {
>> +               struct perf_sample *sample;
>> +               struct guest_id *guest_id;
>> +               union perf_event *ev;
>> +               u16 id_hdr_size;
>> +               u8 cpumode;
>> +               u64 id;
>> +
>> +               if (!gs->fetched) {
>> +                       ret = guest_session__fetch(gs);
>> +                       if (ret)
>> +                               return ret;
>> +                       gs->fetched = true;
>> +               }
>> +
>> +               ev = gs->ev.event;
>> +               sample = &gs->ev.sample;
>> +
>> +               if (!ev->header.size)
>> +                       return 0; /* EOF */
>> +
>> +               if (sample->time > timestamp)
>> +                       return 0;
>> +
>> +               /* Change cpumode to guest */
>> +               cpumode = ev->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
>> +               if (cpumode & PERF_RECORD_MISC_USER)
>> +                       cpumode = PERF_RECORD_MISC_GUEST_USER;
>> +               else
>> +                       cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
>> +               ev->header.misc &= ~PERF_RECORD_MISC_CPUMODE_MASK;
>> +               ev->header.misc |= cpumode;
>> +
>> +               id = sample->id;
>> +               if (!id) {
>> +                       id = gs->dflt_id;
>> +                       id_hdr_size = gs->dflt_id_hdr_size;
>> +               } else {
>> +                       struct evsel *evsel = evlist__id2evsel(gs->session->evlist, id);
>> +
>> +                       id_hdr_size = evsel__id_hdr_size(evsel);
>> +               }
>> +
>> +               if (id_hdr_size & 7) {
>> +                       pr_err("Bad id_hdr_size %u\n", id_hdr_size);
>> +                       return -EINVAL;
>> +               }
>> +
>> +               if (ev->header.size & 7) {
>> +                       pr_err("Bad event size %u\n", ev->header.size);
>> +                       return -EINVAL;
>> +               }
>> +
>> +               /* Remove guest id sample */
>> +               ev->header.size -= id_hdr_size;
>> +
>> +               if (ev->header.size & 7) {
>> +                       pr_err("Bad raw event size %u\n", ev->header.size);
>> +                       return -EINVAL;
>> +               }
>> +
>> +               guest_id = guest_session__lookup_id(gs, id);
>> +               if (!guest_id) {
>> +                       pr_err("Guest event with unknown id %llu\n",
>> +                              (unsigned long long)id);
>> +                       return -EINVAL;
>> +               }
>> +
>> +               /* Change to host ID to avoid conflicting ID values */
>> +               sample->id = guest_id->host_id;
>> +               sample->stream_id = guest_id->host_id;
>> +
>> +               if (sample->cpu != (u32)-1) {
>> +                       if (sample->cpu >= gs->vcpu_cnt) {
>> +                               pr_err("Guest event with unknown VCPU %u\n",
>> +                                      sample->cpu);
>> +                               return -EINVAL;
>> +                       }
>> +                       /* Change to host CPU instead of guest VCPU */
>> +                       sample->cpu = gs->vcpu[sample->cpu].cpu;
>> +               }
>> +
>> +               /* New id sample with new ID and CPU */
>> +               ret = evlist__append_id_sample(inject->session->evlist, ev, sample);
>> +               if (ret)
>> +                       return ret;
>> +
>> +               if (ev->header.size & 7) {
>> +                       pr_err("Bad new event size %u\n", ev->header.size);
>> +                       return -EINVAL;
>> +               }
>> +
>> +               gs->fetched = false;
>> +
>> +               ret = output_bytes(inject, ev, ev->header.size);
>> +               if (ret)
>> +                       return ret;
>> +       }
>> +}
>> +
>> +static int guest_session__flush_events(struct guest_session *gs)
>> +{
>> +       return guest_session__inject_events(gs, -1);
>> +}
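
For anyone reading along, the ordering rule that guest_session__inject_events()
and guest_session__flush_events() depend on can be sketched in isolation. This
is only an illustration under simplifying assumptions: struct ev,
struct guest_queue, inject_upto() and flush_all() are hypothetical names that
do not exist in perf, and an event is reduced to its timestamp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a guest event: only the timestamp matters here. */
struct ev {
	uint64_t time;
};

/* Hypothetical guest queue, assumed to be already sorted by time. */
struct guest_queue {
	const struct ev *evs;
	size_t cnt;
	size_t pos;	/* next event not yet written out */
};

/*
 * Copy guest events with time <= timestamp to out[], stopping at the first
 * later event so that it is retried on the next call. Returns the number of
 * events written.
 */
static size_t inject_upto(struct guest_queue *q, uint64_t timestamp,
			  struct ev *out, size_t out_cnt)
{
	size_t n = 0;

	assert(q && out);

	while (q->pos < q->cnt && n < out_cnt) {
		if (q->evs[q->pos].time > timestamp)
			break;
		out[n++] = q->evs[q->pos++];
	}
	return n;
}

/* Flushing is the same walk with the largest possible timestamp. */
static size_t flush_all(struct guest_queue *q, struct ev *out, size_t out_cnt)
{
	return inject_upto(q, UINT64_MAX, out, out_cnt);
}
```

Calling inject_upto() with each host event's timestamp, and flush_all() once
the host data is exhausted, mirrors how the patch passes sample->time and then
-1 as the limit.
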
>> +
>> +static int host__repipe(struct perf_tool *tool,
>> +                       union perf_event *event,
>> +                       struct perf_sample *sample,
>> +                       struct machine *machine)
>> +{
>> +       struct perf_inject *inject = container_of(tool, struct perf_inject, tool);
>> +       int ret;
>> +
>> +       ret = guest_session__inject_events(&inject->guest_session, sample->time);
>> +       if (ret)
>> +               return ret;
>> +
>> +       return perf_event__repipe(tool, event, sample, machine);
>> +}
>> +
>> +static int host__finished_init(struct perf_session *session, union perf_event *event)
>> +{
>> +       struct perf_inject *inject = container_of(session->tool, struct perf_inject, tool);
>> +       struct guest_session *gs = &inject->guest_session;
>> +       int ret;
>> +
>> +       /*
>> +        * Peek through host COMM events to find QEMU threads and the VCPU they
>> +        * are running.
>> +        */
>> +       ret = host_peek_vm_comms(session, gs);
>> +       if (ret)
>> +               return ret;
>> +
>> +       if (!gs->vcpu_cnt) {
>> +               pr_err("No VCPU threads found for pid %u\n", gs->machine_pid);
>> +               return -EINVAL;
>> +       }
>> +
>> +       /*
>> +        * Allocate new (unused) host sample IDs and map them to the guest IDs.
>> +        */
>> +       gs->highest_id = evlist__find_highest_id(session->evlist);
>> +       ret = guest_session__map_ids(gs, session->evlist);
>> +       if (ret)
>> +               return ret;
>> +
>> +       ret = guest_session__add_attrs(gs);
>> +       if (ret)
>> +               return ret;
>> +
>> +       ret = synthesize_id_index(inject, gs->session->evlist->core.nr_entries);
>> +       if (ret) {
>> +               pr_err("Failed to synthesize id_index\n");
>> +               return ret;
>> +       }
>> +
>> +       ret = guest_session__add_build_ids(gs);
>> +       if (ret) {
>> +               pr_err("Failed to add guest build IDs\n");
>> +               return ret;
>> +       }
>> +
>> +       gs->ready = true;
>> +
>> +       ret = guest_session__inject_events(gs, 0);
>> +       if (ret)
>> +               return ret;
>> +
>> +       return perf_event__repipe_op2_synth(session, event);
>> +}
>> +
>> +/*
>> + * Obey finished-round ordering. The FINISHED_ROUND event is first processed
>> + * which flushes host events to file up until the last flush time. Then inject
>> + * guest events up to the same time. Finally write out the FINISHED_ROUND event
>> + * itself.
>> + */
>> +static int host__finished_round(struct perf_tool *tool,
>> +                               union perf_event *event,
>> +                               struct ordered_events *oe)
>> +{
>> +       struct perf_inject *inject = container_of(tool, struct perf_inject, tool);
>> +       int ret = perf_event__process_finished_round(tool, event, oe);
>> +       u64 timestamp = ordered_events__last_flush_time(oe);
>> +
>> +       if (ret)
>> +               return ret;
>> +
>> +       ret = guest_session__inject_events(&inject->guest_session, timestamp);
>> +       if (ret)
>> +               return ret;
>> +
>> +       return perf_event__repipe_oe_synth(tool, event, oe);
>> +}
>> +
>> +static int host__context_switch(struct perf_tool *tool,
>> +                               union perf_event *event,
>> +                               struct perf_sample *sample,
>> +                               struct machine *machine)
>> +{
>> +       struct perf_inject *inject = container_of(tool, struct perf_inject, tool);
>> +       bool out = event->header.misc & PERF_RECORD_MISC_SWITCH_OUT;
>> +       struct guest_session *gs = &inject->guest_session;
>> +       u32 pid = event->context_switch.next_prev_pid;
>> +       u32 tid = event->context_switch.next_prev_tid;
>> +       struct guest_tid *guest_tid;
>> +       u32 vcpu;
>> +
>> +       if (out || pid != gs->machine_pid)
>> +               goto out;
>> +
>> +       guest_tid = guest_session__lookup_tid(gs, tid);
>> +       if (!guest_tid)
>> +               goto out;
>> +
>> +       if (sample->cpu == (u32)-1) {
>> +               pr_err("Switch event does not have CPU\n");
>> +               return -EINVAL;
>> +       }
>> +
>> +       vcpu = guest_tid->vcpu;
>> +       if (vcpu >= gs->vcpu_cnt)
>> +               return -EINVAL;
>> +
>> +       /* Guest is switching in, record which CPU the VCPU is now running on */
>> +       gs->vcpu[vcpu].cpu = sample->cpu;
>> +out:
>> +       return host__repipe(tool, event, sample, machine);
>> +}
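
As a rough sketch of the bookkeeping done here and in
guest_session__inject_events(): a switch-in of a VCPU thread records the host
CPU, and later guest samples have their VCPU number replaced by that host CPU.
The names below (struct vcpu_map, vcpu_switch_in(), vcpu_to_host_cpu(), NO_CPU)
are made up for the example and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same "no CPU" sentinel value the patch tests for: (u32)-1. */
#define NO_CPU ((uint32_t)-1)

/* Hypothetical table: index is the guest VCPU, value is the host CPU. */
struct vcpu_map {
	uint32_t *cpu;
	uint32_t cnt;
};

/* On switch-in of a VCPU thread, remember which host CPU it now runs on. */
static int vcpu_switch_in(struct vcpu_map *m, uint32_t vcpu, uint32_t host_cpu)
{
	assert(m && m->cpu);

	if (host_cpu == NO_CPU || vcpu >= m->cnt)
		return -1;
	m->cpu[vcpu] = host_cpu;
	return 0;
}

/*
 * Rewrite a guest sample's CPU field in place: VCPU number in, host CPU out.
 * A sample without a CPU is left alone; an out-of-range VCPU is an error.
 */
static int vcpu_to_host_cpu(const struct vcpu_map *m, uint32_t *cpu)
{
	assert(m && m->cpu && cpu);

	if (*cpu == NO_CPU)
		return 0;
	if (*cpu >= m->cnt)
		return -1;
	*cpu = m->cpu[*cpu];
	return 0;
}
```
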
>> +
>>  static void sig_handler(int sig __maybe_unused)
>>  {
>>         session_done = 1;
>> @@ -767,6 +1666,61 @@ static int parse_vm_time_correlation(const struct option *opt, const char *str,
>>         return inject->itrace_synth_opts.vm_tm_corr_args ? 0 : -ENOMEM;
>>  }
>>
>> +static int parse_guest_data(const struct option *opt, const char *str, int unset)
>> +{
>> +       struct perf_inject *inject = opt->value;
>> +       struct guest_session *gs = &inject->guest_session;
>> +       char *tok;
>> +       char *s;
>> +
>> +       if (unset)
>> +               return 0;
>> +
>> +       if (!str)
>> +               goto bad_args;
>> +
>> +       s = strdup(str);
>> +       if (!s)
>> +               return -ENOMEM;
>> +
>> +       gs->perf_data_file = strsep(&s, ",");
>> +       if (!gs->perf_data_file)
>> +               goto bad_args;
>> +
>> +       gs->copy_kcore_dir = has_kcore_dir(gs->perf_data_file);
>> +       if (gs->copy_kcore_dir)
>> +               inject->output.is_dir = true;
>> +
>> +       tok = strsep(&s, ",");
>> +       if (!tok)
>> +               goto bad_args;
>> +       gs->machine_pid = strtoul(tok, NULL, 0);
>> +       if (!gs->machine_pid)
>> +               goto bad_args;
>> +
>> +       gs->time_scale = 1;
>> +
>> +       tok = strsep(&s, ",");
>> +       if (!tok)
>> +               goto out;
>> +       gs->time_offset = strtoull(tok, NULL, 0);
>> +
>> +       tok = strsep(&s, ",");
>> +       if (!tok)
>> +               goto out;
>> +       gs->time_scale = strtod(tok, NULL);
>> +       if (!gs->time_scale)
>> +               goto bad_args;
>> +out:
>> +       return 0;
>> +
>> +bad_args:
>> +       pr_err("--guest-data option requires guest perf.data file name, "
>> +              "guest machine PID, and optionally guest timestamp offset, "
>> +              "and guest timestamp scale factor, separated by commas.\n");
>> +       return -1;
>> +}
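
To make the argument format concrete (file name, machine PID, then optional
timestamp offset and scale factor, separated by commas), here is a standalone
sketch of the same parsing. It is not the code from the patch: struct
guest_args, next_tok() and parse_guest_args() are hypothetical, it uses plain
ISO C instead of strsep(), and unlike the patch it frees the copied string on
the error path.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result structure; the patch stores these in guest_session. */
struct guest_args {
	char *file;		/* start of the private copy; caller free()s it */
	unsigned long pid;
	uint64_t time_offset;
	double time_scale;
};

/* Return the next comma-separated token, or NULL when none are left. */
static char *next_tok(char **s)
{
	char *tok = *s, *comma;

	if (!tok)
		return NULL;
	comma = strchr(tok, ',');
	if (comma) {
		*comma = '\0';
		*s = comma + 1;
	} else {
		*s = NULL;
	}
	return tok;
}

/* Parse "file,pid[,offset[,scale]]". Returns 0 on success, -1 on error. */
static int parse_guest_args(const char *str, struct guest_args *ga)
{
	size_t len;
	char *s, *tok;

	assert(ga);

	if (!str)
		return -1;
	len = strlen(str) + 1;
	s = malloc(len);
	if (!s)
		return -1;
	memcpy(s, str, len);

	ga->file = next_tok(&s);	/* first token starts the buffer */
	ga->time_offset = 0;
	ga->time_scale = 1;		/* default when no scale is given */

	tok = next_tok(&s);
	if (!tok)
		goto bad_args;
	ga->pid = strtoul(tok, NULL, 0);
	if (!ga->pid)
		goto bad_args;

	tok = next_tok(&s);
	if (!tok)
		return 0;
	ga->time_offset = strtoull(tok, NULL, 0);

	tok = next_tok(&s);
	if (!tok)
		return 0;
	ga->time_scale = strtod(tok, NULL);
	if (!ga->time_scale)
		goto bad_args;
	return 0;

bad_args:
	free(ga->file);
	ga->file = NULL;
	return -1;
}
```

With this sketch, an argument such as "guest.data,1234,0x10,2.5" yields PID
1234, offset 16 and scale 2.5, while "guest.data" on its own is rejected
because the machine PID is mandatory.
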
>> +
>>  static int save_section_info_cb(struct perf_file_section *section,
>>                                 struct perf_header *ph __maybe_unused,
>>                                 int feat, int fd __maybe_unused, void *data)
>> @@ -896,6 +1850,22 @@ static int copy_kcore_dir(struct perf_inject *inject)
>>         return ret;
>>  }
>>
>> +static int guest_session__copy_kcore_dir(struct guest_session *gs)
>> +{
>> +       struct perf_inject *inject = container_of(gs, struct perf_inject, guest_session);
>> +       char *cmd;
>> +       int ret;
>> +
>> +       ret = asprintf(&cmd, "cp -r -n %s/kcore_dir %s/kcore_dir__%u >/dev/null 2>&1",
>> +                      gs->perf_data_file, inject->output.path, gs->machine_pid);
>> +       if (ret < 0)
>> +               return ret;
>> +       pr_debug("%s\n", cmd);
>> +       ret = system(cmd);
>> +       free(cmd);
>> +       return ret;
>> +}
>> +
>>  static int output_fd(struct perf_inject *inject)
>>  {
>>         return inject->in_place_update ? -1 : perf_data__fd(&inject->output);
>> @@ -904,6 +1874,7 @@ static int output_fd(struct perf_inject *inject)
>>  static int __cmd_inject(struct perf_inject *inject)
>>  {
>>         int ret = -EINVAL;
>> +       struct guest_session *gs = &inject->guest_session;
>>         struct perf_session *session = inject->session;
>>         int fd = output_fd(inject);
>>         u64 output_data_offset;
>> @@ -968,6 +1939,47 @@ static int __cmd_inject(struct perf_inject *inject)
>>                 output_data_offset = roundup(8192 + session->header.data_offset, 4096);
>>                 if (inject->strip)
>>                         strip_init(inject);
>> +       } else if (gs->perf_data_file) {
>> +               char *name = gs->perf_data_file;
>> +
>> +               /*
>> +                * Not strictly necessary, but keep these events in order wrt
>> +                * guest events.
>> +                */
>> +               inject->tool.mmap               = host__repipe;
>> +               inject->tool.mmap2              = host__repipe;
>> +               inject->tool.comm               = host__repipe;
>> +               inject->tool.fork               = host__repipe;
>> +               inject->tool.exit               = host__repipe;
>> +               inject->tool.lost               = host__repipe;
>> +               inject->tool.context_switch     = host__repipe;
>> +               inject->tool.ksymbol            = host__repipe;
>> +               inject->tool.text_poke          = host__repipe;
>> +               /*
>> +                * Once the host session has initialized, set up sample ID
>> +                * mapping and feed in guest attrs, build IDs and initial
>> +                * events.
>> +                */
>> +               inject->tool.finished_init      = host__finished_init;
>> +               /* Obey finished round ordering */
>> +               inject->tool.finished_round     = host__finished_round;
>> +               /* Keep track of which CPU a VCPU is running on */
>> +               inject->tool.context_switch     = host__context_switch;
>> +               /*
>> +                * Must order events to be able to obey finished round
>> +                * ordering.
>> +                */
>> +               inject->tool.ordered_events     = true;
>> +               inject->tool.ordering_requires_timestamps = true;
>> +               /* Set up a separate session to process guest perf.data file */
>> +               ret = guest_session__start(gs, name, session->data->force);
>> +               if (ret) {
>> +                       pr_err("Failed to process %s, error %d\n", name, ret);
>> +                       return ret;
>> +               }
>> +               /* Allow space in the header for guest attributes */
>> +               output_data_offset += gs->session->header.data_offset;
>> +               output_data_offset = roundup(output_data_offset, 4096);
>>         }
>>
>>         if (!inject->itrace_synth_opts.set)
>> @@ -980,6 +1992,18 @@ static int __cmd_inject(struct perf_inject *inject)
>>         if (ret)
>>                 return ret;
>>
>> +       if (gs->session) {
>> +               /*
>> +                * Remaining guest events have later timestamps. Flush them
>> +                * out to file.
>> +                */
>> +               ret = guest_session__flush_events(gs);
>> +               if (ret) {
>> +                       pr_err("Failed to flush guest events\n");
>> +                       return ret;
>> +               }
>> +       }
>> +
>>         if (!inject->is_pipe && !inject->in_place_update) {
>>                 struct inject_fc inj_fc = {
>>                         .fc.copy = feat_copy_cb,
>> @@ -1014,8 +2038,17 @@ static int __cmd_inject(struct perf_inject *inject)
>>
>>                 if (inject->copy_kcore_dir) {
>>                         ret = copy_kcore_dir(inject);
>> -                       if (ret)
>> +                       if (ret) {
>> +                               pr_err("Failed to copy kcore\n");
>>                                 return ret;
>> +                       }
>> +               }
>> +               if (gs->copy_kcore_dir) {
>> +                       ret = guest_session__copy_kcore_dir(gs);
>> +                       if (ret) {
>> +                               pr_err("Failed to copy guest kcore\n");
>> +                               return ret;
>> +                       }
>>                 }
>>         }
>>
>> @@ -1113,6 +2146,12 @@ int cmd_inject(int argc, const char **argv)
>>                 OPT_CALLBACK_OPTARG(0, "vm-time-correlation", &inject, NULL, "opts",
>>                                     "correlate time between VM guests and the host",
>>                                     parse_vm_time_correlation),
>> +               OPT_CALLBACK_OPTARG(0, "guest-data", &inject, NULL, "opts",
>> +                                   "inject events from a guest perf.data file",
>> +                                   parse_guest_data),
>> +               OPT_STRING(0, "guestmount", &symbol_conf.guestmount, "directory",
>> +                          "guest mount directory under which every guest os"
>> +                          " instance has a subdir"),
> 
> Should guestmount also be in the man page? Also should it have a
> hyphen like guest-data?

Sent a patch to update the man page.

It is "guestmount" in other tools so we should stick with that.

> 
> Thanks,
> Ian
> 
>>                 OPT_END()
>>         };
>>         const char * const inject_usage[] = {
>> @@ -1243,6 +2282,8 @@ int cmd_inject(int argc, const char **argv)
>>
>>         ret = __cmd_inject(&inject);
>>
>> +       guest_session__exit(&inject.guest_session);
>> +
>>  out_delete:
>>         zstd_fini(&(inject.session->zstd_data));
>>         perf_session__delete(inject.session);
>> --
>> 2.25.1
>>



end of thread, other threads:[~2022-08-11 17:22 UTC | newest]

Thread overview: 83+ messages
2022-07-11  9:31 [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Adrian Hunter
2022-07-11  9:31 ` [PATCH 01/35] perf tools: Fix dso_id inode generation comparison Adrian Hunter
2022-07-18 14:57   ` Arnaldo Carvalho de Melo
2022-07-19 10:18     ` Adrian Hunter
2022-07-19 15:13       ` Ian Rogers
2022-07-19 19:16         ` Arnaldo Carvalho de Melo
2022-07-11  9:31 ` [PATCH 02/35] perf tools: Export dsos__for_each_with_build_id() Adrian Hunter
2022-07-19 16:55   ` Ian Rogers
2022-07-11  9:31 ` [PATCH 03/35] perf ordered_events: Add ordered_events__last_flush_time() Adrian Hunter
2022-07-19 16:56   ` Ian Rogers
2022-07-11  9:31 ` [PATCH 04/35] perf tools: Export perf_event__process_finished_round() Adrian Hunter
2022-07-19 17:04   ` Ian Rogers
2022-08-09 11:37     ` Adrian Hunter
2022-07-11  9:31 ` [PATCH 05/35] perf tools: Factor out evsel__id_hdr_size() Adrian Hunter
2022-07-19 17:09   ` Ian Rogers
2022-08-09 11:49     ` Adrian Hunter
2022-08-09 17:07       ` Ian Rogers
2022-07-11  9:31 ` [PATCH 06/35] perf tools: Add perf_event__synthesize_id_sample() Adrian Hunter
2022-07-19 17:10   ` Ian Rogers
2022-07-11  9:31 ` [PATCH 07/35] perf script: Add --dump-unsorted-raw-trace option Adrian Hunter
2022-07-19 17:11   ` Ian Rogers
2022-07-11  9:31 ` [PATCH 08/35] perf buildid-cache: Add guestmount'd files to the build ID cache Adrian Hunter
2022-07-19 17:41   ` Ian Rogers
2022-08-09 12:21     ` Adrian Hunter
2022-07-11  9:31 ` [PATCH 09/35] perf buildid-cache: Do not require purge files to also be in the file system Adrian Hunter
2022-07-19 17:44   ` Ian Rogers
2022-07-11  9:31 ` [PATCH 10/35] perf tools: Add machine_pid and vcpu to id_index Adrian Hunter
2022-07-19 17:48   ` Ian Rogers
2022-08-09 12:19     ` Adrian Hunter
2022-07-11  9:31 ` [PATCH 11/35] perf session: Create guest machines from id_index Adrian Hunter
2022-07-19 17:51   ` Ian Rogers
2022-07-11  9:31 ` [PATCH 12/35] perf tools: Add guest_cpu to hypervisor threads Adrian Hunter
2022-07-20  0:23   ` Ian Rogers
2022-07-11  9:31 ` [PATCH 13/35] perf tools: Add machine_pid and vcpu to perf_sample Adrian Hunter
2022-07-20  0:36   ` Ian Rogers
2022-07-11  9:31 ` [PATCH 14/35] perf tools: Use sample->machine_pid to find guest machine Adrian Hunter
2022-07-20  0:37   ` Ian Rogers
2022-07-11  9:31 ` [PATCH 15/35] perf script: Add machine_pid and vcpu Adrian Hunter
2022-07-20  0:39   ` Ian Rogers
2022-07-11  9:31 ` [PATCH 16/35] perf dlfilter: " Adrian Hunter
2022-07-20  0:42   ` Ian Rogers
2022-07-11  9:32 ` [PATCH 17/35] perf auxtrace: Add machine_pid and vcpu to auxtrace_error Adrian Hunter
2022-07-20  0:43   ` Ian Rogers
2022-07-11  9:32 ` [PATCH 18/35] perf script python: Add machine_pid and vcpu Adrian Hunter
2022-07-20  0:43   ` Ian Rogers
2022-07-11  9:32 ` [PATCH 19/35] perf script python: intel-pt-events: " Adrian Hunter
2022-07-20  0:44   ` Ian Rogers
2022-07-11  9:32 ` [PATCH 20/35] perf tools: Remove also guest kcore_dir with host kcore_dir Adrian Hunter
2022-07-20  0:45   ` Ian Rogers
2022-07-11  9:32 ` [PATCH 21/35] perf tools: Make has_kcore_dir() work also for guest kcore_dir Adrian Hunter
2022-07-20  0:49   ` Ian Rogers
2022-07-11  9:32 ` [PATCH 22/35] perf tools: Automatically use guest kcore_dir if present Adrian Hunter
2022-07-20  0:51   ` Ian Rogers
2022-07-11  9:32 ` [PATCH 23/35] perf tools: Add reallocarray_as_needed() Adrian Hunter
2022-07-20  0:55   ` Ian Rogers
2022-08-09 16:48     ` Adrian Hunter
2022-07-11  9:32 ` [PATCH 24/35] perf inject: Add support for injecting guest sideband events Adrian Hunter
2022-07-20  1:06   ` Ian Rogers
2022-08-11 17:19     ` Adrian Hunter
2022-07-11  9:32 ` [PATCH 25/35] perf machine: Use realloc_array_as_needed() in machine__set_current_tid() Adrian Hunter
2022-07-11  9:32 ` [PATCH 26/35] perf tools: Handle injected guest kernel mmap event Adrian Hunter
2022-07-20  1:09   ` Ian Rogers
2022-07-11  9:32 ` [PATCH 27/35] perf tools: Add perf_event__is_guest() Adrian Hunter
2022-07-20  1:11   ` Ian Rogers
2022-07-20 14:06     ` Arnaldo Carvalho de Melo
2022-07-20 14:56       ` Ian Rogers
2022-07-11  9:32 ` [PATCH 28/35] perf intel-pt: Remove guest_machine_pid Adrian Hunter
2022-07-20  1:12   ` Ian Rogers
2022-07-11  9:32 ` [PATCH 29/35] perf intel-pt: Add some more logging to intel_pt_walk_next_insn() Adrian Hunter
2022-07-20  1:13   ` Ian Rogers
2022-07-11  9:32 ` [PATCH 30/35] perf intel-pt: Track guest context switches Adrian Hunter
2022-07-20  1:13   ` Ian Rogers
2022-07-11  9:32 ` [PATCH 31/35] perf intel-pt: Disable sync switch with guest sideband Adrian Hunter
2022-07-20  1:14   ` Ian Rogers
2022-07-11  9:32 ` [PATCH 32/35] perf intel-pt: Determine guest thread from " Adrian Hunter
2022-07-20  1:15   ` Ian Rogers
2022-07-11  9:32 ` [PATCH 33/35] perf intel-pt: Add machine_pid and vcpu to auxtrace_error Adrian Hunter
2022-07-20  5:27   ` Ian Rogers
2022-07-11  9:32 ` [PATCH 34/35] perf intel-pt: Use guest pid/tid etc in guest samples Adrian Hunter
2022-07-20  5:28   ` Ian Rogers
2022-07-11  9:32 ` [PATCH 35/35] perf intel-pt: Add documentation for tracing guest machine user space Adrian Hunter
2022-07-20  5:29   ` Ian Rogers
2022-07-18 15:28 ` [PATCH 00/35] perf intel-pt: Add support for tracing virtual machine user space on the host Arnaldo Carvalho de Melo
