* [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu
@ 2022-05-24  7:54 Adrian Hunter
  2022-05-24  7:54 ` [PATCH V4 01/15] perf intel-pt: Add a test for system-wide side band Adrian Hunter
                   ` (15 more replies)
  0 siblings, 16 replies; 18+ messages in thread
From: Adrian Hunter @ 2022-05-24  7:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ian Rogers, Alexey Bayduraev, Namhyung Kim, Leo Yan,
	linux-kernel

Hi

Here are V4 patches to support capturing Intel PT sideband events such as
mmap, task, context switch, text poke etc, on every CPU even when tracing
only selected CPUs (user_requested_cpus).  That is, when using the perf
record -C or --cpu option.

This is needed for:
1. text poke: a text poke on any CPU affects all CPUs
2. tracing user space: a user space process can migrate between CPUs so
mmap events that happen on a different CPU can be needed to decode the
trace from a CPU in user_requested_cpus.

For example:

	Trace on CPU 1:

	perf record --kcore -C 1 -e intel_pt// &

	Start a task on CPU 0:

	taskset 0x1 testprog &

	Migrate it to CPU 1:

	taskset -p 0x2 <testprog pid>

	Stop tracing:

	kill %1

	Prior to these changes there will be errors decoding testprog
	in userspace because the comm and mmap events for testprog will not
	have been captured.
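
The scenario above can be sketched as a toy C model.  This is not perf
code: the struct, enum and function names are hypothetical, and a CPU set
is reduced to a plain bitmask.  It only illustrates that filtering
sideband by the traced CPU mask loses the MMAP events that testprog
generated while it was still on CPU 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified sideband record: not a perf data structure. */
enum sb_type { SB_COMM, SB_MMAP };

struct sb_event {
	enum sb_type type;
	int pid;
	int cpu;	/* CPU on which the event was generated */
};

/* True if the bit for @cpu is set in @cpu_mask (CPUs 0..31 only). */
static bool cpu_in_mask(unsigned int cpu_mask, int cpu)
{
	assert(cpu >= 0 && cpu < 32);
	return cpu_mask & (1u << cpu);
}

/*
 * Count the MMAP events for @pid that would be seen by a recorder that
 * captures sideband only on the CPUs in @sideband_mask.
 */
static size_t captured_mmaps(const struct sb_event *ev, size_t nr,
			     unsigned int sideband_mask, int pid)
{
	size_t i, cnt = 0;

	for (i = 0; i < nr; i++) {
		if (ev[i].type == SB_MMAP && ev[i].pid == pid &&
		    cpu_in_mask(sideband_mask, ev[i].cpu))
			cnt++;
	}
	return cnt;
}
```

With sideband limited to CPU 1 (mask 0x2), events generated on CPU 0 are
not counted; with sideband on both CPUs (mask 0x3) they are.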

There is quite a bit of preparation:

The first patch is a small Intel PT test for system-wide side band.  The
test fails before the patches are applied, and passes afterwards.

      perf intel-pt: Add a test for system-wide side band [new in V1]

The next 5 patches (now already applied) stop auxtrace mixing up mmap idx
between evlist and evsel.  That is going to matter when
evlist->all_cpus != evlist->user_requested_cpus != evsel->cpus:

      libperf evsel: Factor out perf_evsel__ioctl() [now applied]
      libperf evsel: Add perf_evsel__enable_thread()
      perf evlist: Use libperf functions in evlist__enable_event_idx()
      perf auxtrace: Move evlist__enable_event_idx() to auxtrace.c
      perf auxtrace: Do not mix up mmap idx

The next 6 patches (first 4 now already applied) stop attempts to auxtrace
mmap when it is not an auxtrace event e.g. when mmapping the CPUs on which
only sideband is captured:

      libperf evlist: Remove ->idx() per_cpu parameter
      libperf evlist: Move ->idx() into mmap_per_evsel()
      libperf evlist: Add evsel as a parameter to ->idx()
      perf auxtrace: Record whether an auxtrace mmap is needed
      perf auxtrace: Add mmap_needed to auxtrace_mmap_params
      perf auxtrace: Remove auxtrace_mmap_params__set_idx() per_cpu parameter

The next 5 patches switch to setting up dummy event maps before adding the
evsel so that the evsel is subject to map propagation, primarily to cause
addition of the evsel's CPUs to all_cpus.

      perf evlist: Factor out evlist__dummy_event()
      perf evlist: Add evlist__add_dummy_on_all_cpus()
      perf record: Use evlist__add_dummy_on_all_cpus() in record__config_text_poke()
      perf intel-pt: Use evlist__add_dummy_on_all_cpus() for switch tracking
      perf intel-pt: Track sideband system-wide when needed

The remaining patches make more significant changes.

First, change from using user_requested_cpus to using all_cpus where necessary:

      perf tools: Allow all_cpus to be a superset of user_requested_cpus

Secondly, mmap all per-thread and all per-cpu events:

      libperf evlist: Allow mixing per-thread and per-cpu mmaps
      libperf evlist: Check nr_mmaps is correct [new in V1]

Stop using system_wide flag for uncore because it will not work anymore:

      perf stat: Add requires_cpu flag for uncore
      libperf evsel: Add comments for booleans [new in V1]

Finally, change map propagation so that system-wide events retain their CPUs
and (dummy) threads:

      perf tools: Allow system-wide events to keep their own CPUs
      perf tools: Allow system-wide events to keep their own threads


Changes in V4:

      Added Acked-by: Namhyung Kim <namhyung@kernel.org>
      Added a couple Acked-by: Ian Rogers <irogers@google.com>

      perf intel-pt: Add a test for system-wide side band
	Put in commit message that test succeeds only after other
	patches applied

      libperf evsel: Add perf_evsel__enable_thread()
      perf evlist: Use libperf functions in evlist__enable_event_idx()
      perf auxtrace: Move evlist__enable_event_idx() to auxtrace.c
      perf auxtrace: Do not mix up mmap idx
      libperf evlist: Remove ->idx() per_cpu parameter
      libperf evlist: Move ->idx() into mmap_per_evsel()
      libperf evlist: Add evsel as a parameter to ->idx()
      perf auxtrace: Record whether an auxtrace mmap is needed
	Omitted because already applied

      libperf evsel: Add comments for booleans
	Amended comment about own_cpus


Changes in V3:

      perf auxtrace: Add mmap_needed to auxtrace_mmap_params
	Amended mmap_needed comment

      perf evlist: Add evlist__add_dummy_on_all_cpus()
	Amended comment about all CPUs.


Changes in V2:

      Added some Acked-by: Ian Rogers <irogers@google.com>

      libperf evsel: Add perf_evsel__enable_thread()
	Use perf_cpu_map__for_each_cpu()

      perf auxtrace: Add mmap_needed to auxtrace_mmap_params
	Add documentation comment for mmap_needed

      perf auxtrace: Remove auxtrace_mmap_params__set_idx() per_cpu parameter
	Fix missing auxtrace_mmap_params__set_idx change

      libperf evlist: Check nr_mmaps is correct
	Remove unused code

      libperf evsel: Add comments for booleans
	Amend comments

      perf evlist: Add evlist__add_dummy_on_all_cpus()
	Rename evlist__add_system_wide -> evlist__add_on_all_cpus
	Changed patch subject accordingly

      perf record: Use evlist__add_dummy_on_all_cpus() in record__config_text_poke()
	Rename evlist__add_system_wide -> evlist__add_on_all_cpus
	Changed patch subject accordingly

      perf intel-pt: Use evlist__add_dummy_on_all_cpus() for switch tracking
	Rename evlist__add_system_wide -> evlist__add_on_all_cpus
	Changed patch subject accordingly


Changes in V1:

      perf intel-pt: Add a test for system-wide side band
	New patch

      libperf evsel: Factor out perf_evsel__ioctl()
	Dropped because it has been applied.

      libperf evsel: Add perf_evsel__enable_thread()
	Rename variable i -> idx

      perf auxtrace: Do not mix up mmap idx
	Rename variable cpu to cpu_map_idx

      perf tools: Allow all_cpus to be a superset of user_requested_cpus
	Add Acked-by: Ian Rogers <irogers@google.com>

      libperf evlist: Allow mixing per-thread and per-cpu mmaps
	Fix perf_evlist__nr_mmaps() calculation

      libperf evlist: Check nr_mmaps is correct
	New patch

      libperf evsel: Add comments for booleans
	New patch

      perf tools: Allow system-wide events to keep their own CPUs
      perf tools: Allow system-wide events to keep their own threads


Adrian Hunter (15):
      perf intel-pt: Add a test for system-wide side band
      perf auxtrace: Add mmap_needed to auxtrace_mmap_params
      perf auxtrace: Remove auxtrace_mmap_params__set_idx() per_cpu parameter
      perf evlist: Factor out evlist__dummy_event()
      perf evlist: Add evlist__add_dummy_on_all_cpus()
      perf record: Use evlist__add_dummy_on_all_cpus() in record__config_text_poke()
      perf intel-pt: Use evlist__add_dummy_on_all_cpus() for switch tracking
      perf intel-pt: Track sideband system-wide when needed
      perf tools: Allow all_cpus to be a superset of user_requested_cpus
      libperf evlist: Allow mixing per-thread and per-cpu mmaps
      libperf evlist: Check nr_mmaps is correct
      perf stat: Add requires_cpu flag for uncore
      libperf evsel: Add comments for booleans
      perf tools: Allow system-wide events to keep their own CPUs
      perf tools: Allow system-wide events to keep their own threads

 tools/lib/perf/evlist.c                 | 71 ++++++++++++++-------------------
 tools/lib/perf/include/internal/evsel.h | 11 +++++
 tools/perf/arch/x86/util/intel-pt.c     | 31 ++++++--------
 tools/perf/builtin-record.c             | 39 +++++++-----------
 tools/perf/builtin-stat.c               |  5 +--
 tools/perf/tests/shell/test_intel_pt.sh | 71 +++++++++++++++++++++++++++++++++
 tools/perf/util/auxtrace.c              | 15 +++++--
 tools/perf/util/auxtrace.h              | 13 ++++--
 tools/perf/util/evlist.c                | 61 +++++++++++++++++++++++++---
 tools/perf/util/evlist.h                |  5 +++
 tools/perf/util/evsel.c                 |  1 +
 tools/perf/util/mmap.c                  |  4 +-
 tools/perf/util/parse-events.c          |  2 +-
 13 files changed, 226 insertions(+), 103 deletions(-)
 create mode 100755 tools/perf/tests/shell/test_intel_pt.sh


Regards
Adrian


* [PATCH V4 01/15] perf intel-pt: Add a test for system-wide side band
  2022-05-24  7:54 [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu Adrian Hunter
@ 2022-05-24  7:54 ` Adrian Hunter
  2022-05-24  7:54 ` [PATCH V4 02/15] perf auxtrace: Add mmap_needed to auxtrace_mmap_params Adrian Hunter
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2022-05-24  7:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ian Rogers, Alexey Bayduraev, Namhyung Kim, Leo Yan,
	linux-kernel

Add a test for system-wide side band even when tracing selected CPUs.

The test fails before the patches up to "perf tools: Allow system-wide
events to keep their own CPUs" are applied, and passes afterwards.

Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/tests/shell/test_intel_pt.sh | 71 +++++++++++++++++++++++++
 1 file changed, 71 insertions(+)
 create mode 100755 tools/perf/tests/shell/test_intel_pt.sh

diff --git a/tools/perf/tests/shell/test_intel_pt.sh b/tools/perf/tests/shell/test_intel_pt.sh
new file mode 100755
index 000000000000..a3298643884d
--- /dev/null
+++ b/tools/perf/tests/shell/test_intel_pt.sh
@@ -0,0 +1,71 @@
+#!/bin/sh
+# Miscellaneous Intel PT testing
+# SPDX-License-Identifier: GPL-2.0
+
+set -e
+
+# Skip if no Intel PT
+perf list | grep -q 'intel_pt//' || exit 2
+
+skip_cnt=0
+ok_cnt=0
+err_cnt=0
+
+tmpfile=`mktemp`
+perfdatafile=`mktemp`
+
+can_cpu_wide()
+{
+	perf record -o ${tmpfile} -B -N --no-bpf-event -e dummy:u -C $1 true >/dev/null 2>&1 || return 2
+	return 0
+}
+
+test_system_wide_side_band()
+{
+	# Need CPU 0 and CPU 1
+	can_cpu_wide 0 || return $?
+	can_cpu_wide 1 || return $?
+
+	# Record on CPU 0 a task running on CPU 1
+	perf record -B -N --no-bpf-event -o ${perfdatafile} -e intel_pt//u -C 0 -- taskset --cpu-list 1 uname
+
+	# Should get MMAP events from CPU 1 because they can be needed to decode
+	mmap_cnt=`perf script -i ${perfdatafile} --no-itrace --show-mmap-events -C 1 2>/dev/null | grep MMAP | wc -l`
+
+	if [ ${mmap_cnt} -gt 0 ] ; then
+		return 0
+	fi
+
+	echo "Failed to record MMAP events on CPU 1 when tracing CPU 0"
+	return 1
+}
+
+count_result()
+{
+	if [ $1 -eq 2 ] ; then
+		skip_cnt=`expr ${skip_cnt} \+ 1`
+		return
+	fi
+	if [ $1 -eq 0 ] ; then
+		ok_cnt=`expr ${ok_cnt} \+ 1`
+		return
+	fi
+	err_cnt=`expr ${err_cnt} \+ 1`
+}
+
+ret=0
+test_system_wide_side_band || ret=$?
+count_result ${ret}
+
+rm -f ${tmpfile}
+rm -f ${perfdatafile}
+
+if [ ${err_cnt} -gt 0 ] ; then
+	exit 1
+fi
+
+if [ ${ok_cnt} -gt 0 ] ; then
+	exit 0
+fi
+
+exit 2
-- 
2.25.1



* [PATCH V4 02/15] perf auxtrace: Add mmap_needed to auxtrace_mmap_params
  2022-05-24  7:54 [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu Adrian Hunter
  2022-05-24  7:54 ` [PATCH V4 01/15] perf intel-pt: Add a test for system-wide side band Adrian Hunter
@ 2022-05-24  7:54 ` Adrian Hunter
  2022-05-24  7:54 ` [PATCH V4 03/15] perf auxtrace: Remove auxtrace_mmap_params__set_idx() per_cpu parameter Adrian Hunter
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2022-05-24  7:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ian Rogers, Alexey Bayduraev, Namhyung Kim, Leo Yan,
	linux-kernel

Add mmap_needed to auxtrace_mmap_params.

Currently an auxtrace mmap is always attempted even if the event is not an
auxtrace event. That works because, when AUX area tracing, there is always
an auxtrace event first for every mmap. Prepare for that not being the
case, which it won't be when sideband tracking events are allowed on
all CPUs even when auxtrace is limited to selected CPUs.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/auxtrace.c | 10 ++++++++--
 tools/perf/util/auxtrace.h | 11 +++++++++--
 tools/perf/util/evlist.c   |  5 +++--
 tools/perf/util/mmap.c     |  1 +
 4 files changed, 21 insertions(+), 6 deletions(-)
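
For readers following along, the effect of mmap_needed can be sketched
outside of perf roughly as below.  The *_model structs and both function
names are hypothetical stand-ins that keep only the fields involved; the
two checks are meant to mirror the ones this patch adds, nothing more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the perf structures; only the fields used here. */
struct evsel_model {
	bool needs_auxtrace_mmap;
};

struct mmap_params_model {
	size_t len;		/* AUX buffer length, 0 if no AUX area tracing */
	int idx;
	bool mmap_needed;
};

/* Mirrors the early return added to auxtrace_mmap_params__set_idx(). */
static void params_set_idx(struct mmap_params_model *mp,
			   const struct evsel_model *evsel, int idx)
{
	assert(mp && evsel);

	mp->mmap_needed = evsel->needs_auxtrace_mmap;
	if (!mp->mmap_needed)
		return;		/* idx is left untouched for sideband-only events */

	mp->idx = idx;
}

/* Mirrors the check in auxtrace_mmap__mmap(): is an AUX mmap attempted? */
static bool would_mmap_aux(const struct mmap_params_model *mp)
{
	return mp->len && mp->mmap_needed;
}
```

In this model an evsel with needs_auxtrace_mmap unset never triggers an
AUX mmap, even when an AUX buffer length is configured.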

diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index b11549ae39df..b446cfa66469 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -125,7 +125,7 @@ int auxtrace_mmap__mmap(struct auxtrace_mmap *mm,
 	mm->tid = mp->tid;
 	mm->cpu = mp->cpu.cpu;
 
-	if (!mp->len) {
+	if (!mp->len || !mp->mmap_needed) {
 		mm->base = NULL;
 		return 0;
 	}
@@ -168,9 +168,15 @@ void auxtrace_mmap_params__init(struct auxtrace_mmap_params *mp,
 }
 
 void auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp,
-				   struct evlist *evlist, int idx,
+				   struct evlist *evlist,
+				   struct evsel *evsel, int idx,
 				   bool per_cpu)
 {
+	mp->mmap_needed = evsel->needs_auxtrace_mmap;
+
+	if (!mp->mmap_needed)
+		return;
+
 	mp->idx = idx;
 
 	if (per_cpu) {
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index dc38b6f57232..695591b73ae1 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -344,6 +344,10 @@ struct auxtrace_mmap {
  * @idx: index of this mmap
  * @tid: tid for a per-thread mmap (also set if there is only 1 tid on a per-cpu
  *       mmap) otherwise %0
+ * @mmap_needed: set to %false for non-auxtrace events. This is needed because
+ *               auxtrace mmapping is done in the same code path as non-auxtrace
+ *               mmapping but not every evsel that needs non-auxtrace mmapping
+ *               also needs auxtrace mmapping.
  * @cpu: cpu number for a per-cpu mmap otherwise %-1
  */
 struct auxtrace_mmap_params {
@@ -353,6 +357,7 @@ struct auxtrace_mmap_params {
 	int		prot;
 	int		idx;
 	pid_t		tid;
+	bool		mmap_needed;
 	struct perf_cpu	cpu;
 };
 
@@ -490,7 +495,8 @@ void auxtrace_mmap_params__init(struct auxtrace_mmap_params *mp,
 				unsigned int auxtrace_pages,
 				bool auxtrace_overwrite);
 void auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp,
-				   struct evlist *evlist, int idx,
+				   struct evlist *evlist,
+				   struct evsel *evsel, int idx,
 				   bool per_cpu);
 
 typedef int (*process_auxtrace_t)(struct perf_tool *tool,
@@ -863,7 +869,8 @@ void auxtrace_mmap_params__init(struct auxtrace_mmap_params *mp,
 				unsigned int auxtrace_pages,
 				bool auxtrace_overwrite);
 void auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp,
-				   struct evlist *evlist, int idx,
+				   struct evlist *evlist,
+				   struct evsel *evsel, int idx,
 				   bool per_cpu);
 
 #define ITRACE_HELP ""
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 7f9f588e88c6..9e0fabfb096d 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -747,15 +747,16 @@ static struct mmap *evlist__alloc_mmap(struct evlist *evlist,
 
 static void
 perf_evlist__mmap_cb_idx(struct perf_evlist *_evlist,
-			 struct perf_evsel *_evsel __maybe_unused,
+			 struct perf_evsel *_evsel,
 			 struct perf_mmap_param *_mp,
 			 int idx)
 {
 	struct evlist *evlist = container_of(_evlist, struct evlist, core);
 	struct mmap_params *mp = container_of(_mp, struct mmap_params, core);
 	bool per_cpu = !perf_cpu_map__empty(_evlist->user_requested_cpus);
+	struct evsel *evsel = container_of(_evsel, struct evsel, core);
 
-	auxtrace_mmap_params__set_idx(&mp->auxtrace_mp, evlist, idx, per_cpu);
+	auxtrace_mmap_params__set_idx(&mp->auxtrace_mp, evlist, evsel, idx, per_cpu);
 }
 
 static struct perf_mmap*
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 50502b4a7ca4..de59c4da852b 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -62,6 +62,7 @@ void __weak auxtrace_mmap_params__init(struct auxtrace_mmap_params *mp __maybe_u
 
 void __weak auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp __maybe_unused,
 					  struct evlist *evlist __maybe_unused,
+					  struct evsel *evsel __maybe_unused,
 					  int idx __maybe_unused,
 					  bool per_cpu __maybe_unused)
 {
-- 
2.25.1



* [PATCH V4 03/15] perf auxtrace: Remove auxtrace_mmap_params__set_idx() per_cpu parameter
  2022-05-24  7:54 [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu Adrian Hunter
  2022-05-24  7:54 ` [PATCH V4 01/15] perf intel-pt: Add a test for system-wide side band Adrian Hunter
  2022-05-24  7:54 ` [PATCH V4 02/15] perf auxtrace: Add mmap_needed to auxtrace_mmap_params Adrian Hunter
@ 2022-05-24  7:54 ` Adrian Hunter
  2022-05-24  7:54 ` [PATCH V4 04/15] perf evlist: Factor out evlist__dummy_event() Adrian Hunter
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2022-05-24  7:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ian Rogers, Alexey Bayduraev, Namhyung Kim, Leo Yan,
	linux-kernel

Remove auxtrace_mmap_params__set_idx() per_cpu parameter because it isn't
needed.

Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/auxtrace.c | 5 +++--
 tools/perf/util/auxtrace.h | 6 ++----
 tools/perf/util/evlist.c   | 3 +--
 tools/perf/util/mmap.c     | 3 +--
 4 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index b446cfa66469..ac4e4660932d 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -169,9 +169,10 @@ void auxtrace_mmap_params__init(struct auxtrace_mmap_params *mp,
 
 void auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp,
 				   struct evlist *evlist,
-				   struct evsel *evsel, int idx,
-				   bool per_cpu)
+				   struct evsel *evsel, int idx)
 {
+	bool per_cpu = !perf_cpu_map__empty(evlist->core.user_requested_cpus);
+
 	mp->mmap_needed = evsel->needs_auxtrace_mmap;
 
 	if (!mp->mmap_needed)
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 695591b73ae1..cd0d25c2751c 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -496,8 +496,7 @@ void auxtrace_mmap_params__init(struct auxtrace_mmap_params *mp,
 				bool auxtrace_overwrite);
 void auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp,
 				   struct evlist *evlist,
-				   struct evsel *evsel, int idx,
-				   bool per_cpu);
+				   struct evsel *evsel, int idx);
 
 typedef int (*process_auxtrace_t)(struct perf_tool *tool,
 				  struct mmap *map,
@@ -870,8 +869,7 @@ void auxtrace_mmap_params__init(struct auxtrace_mmap_params *mp,
 				bool auxtrace_overwrite);
 void auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp,
 				   struct evlist *evlist,
-				   struct evsel *evsel, int idx,
-				   bool per_cpu);
+				   struct evsel *evsel, int idx);
 
 #define ITRACE_HELP ""
 
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 9e0fabfb096d..157867bc337a 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -753,10 +753,9 @@ perf_evlist__mmap_cb_idx(struct perf_evlist *_evlist,
 {
 	struct evlist *evlist = container_of(_evlist, struct evlist, core);
 	struct mmap_params *mp = container_of(_mp, struct mmap_params, core);
-	bool per_cpu = !perf_cpu_map__empty(_evlist->user_requested_cpus);
 	struct evsel *evsel = container_of(_evsel, struct evsel, core);
 
-	auxtrace_mmap_params__set_idx(&mp->auxtrace_mp, evlist, evsel, idx, per_cpu);
+	auxtrace_mmap_params__set_idx(&mp->auxtrace_mp, evlist, evsel, idx);
 }
 
 static struct perf_mmap*
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index de59c4da852b..a4dff881be39 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -63,8 +63,7 @@ void __weak auxtrace_mmap_params__init(struct auxtrace_mmap_params *mp __maybe_u
 void __weak auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp __maybe_unused,
 					  struct evlist *evlist __maybe_unused,
 					  struct evsel *evsel __maybe_unused,
-					  int idx __maybe_unused,
-					  bool per_cpu __maybe_unused)
+					  int idx __maybe_unused)
 {
 }
 
-- 
2.25.1



* [PATCH V4 04/15] perf evlist: Factor out evlist__dummy_event()
  2022-05-24  7:54 [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu Adrian Hunter
                   ` (2 preceding siblings ...)
  2022-05-24  7:54 ` [PATCH V4 03/15] perf auxtrace: Remove auxtrace_mmap_params__set_idx() per_cpu parameter Adrian Hunter
@ 2022-05-24  7:54 ` Adrian Hunter
  2022-05-24  7:54 ` [PATCH V4 05/15] perf evlist: Add evlist__add_dummy_on_all_cpus() Adrian Hunter
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2022-05-24  7:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ian Rogers, Alexey Bayduraev, Namhyung Kim, Leo Yan,
	linux-kernel

Factor out evlist__dummy_event() so it can be reused.

Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/evlist.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 157867bc337a..efad0e691045 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -242,14 +242,20 @@ int __evlist__add_default(struct evlist *evlist, bool precise)
 	return 0;
 }
 
-int evlist__add_dummy(struct evlist *evlist)
+static struct evsel *evlist__dummy_event(struct evlist *evlist)
 {
 	struct perf_event_attr attr = {
 		.type	= PERF_TYPE_SOFTWARE,
 		.config = PERF_COUNT_SW_DUMMY,
 		.size	= sizeof(attr), /* to capture ABI version */
 	};
-	struct evsel *evsel = evsel__new_idx(&attr, evlist->core.nr_entries);
+
+	return evsel__new_idx(&attr, evlist->core.nr_entries);
+}
+
+int evlist__add_dummy(struct evlist *evlist)
+{
+	struct evsel *evsel = evlist__dummy_event(evlist);
 
 	if (evsel == NULL)
 		return -ENOMEM;
-- 
2.25.1



* [PATCH V4 05/15] perf evlist: Add evlist__add_dummy_on_all_cpus()
  2022-05-24  7:54 [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu Adrian Hunter
                   ` (3 preceding siblings ...)
  2022-05-24  7:54 ` [PATCH V4 04/15] perf evlist: Factor out evlist__dummy_event() Adrian Hunter
@ 2022-05-24  7:54 ` Adrian Hunter
  2022-05-24  7:54 ` [PATCH V4 06/15] perf record: Use evlist__add_dummy_on_all_cpus() in record__config_text_poke() Adrian Hunter
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2022-05-24  7:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ian Rogers, Alexey Bayduraev, Namhyung Kim, Leo Yan,
	linux-kernel

Add evlist__add_dummy_on_all_cpus() to enable creating a system-wide dummy
event that sets up the system-wide maps before map propagation.

For convenience, add evlist__add_aux_dummy() so that the logic can be used
whether or not the event needs to be system-wide.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/evlist.c | 45 ++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/evlist.h |  5 +++++
 2 files changed, 50 insertions(+)
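
As a rough illustration of why the maps are set up before the evsel is
added: map propagation merges each evsel's CPUs into all_cpus, so an
evsel on all online CPUs makes all_cpus a superset of the user requested
CPUs.  The sketch below models a CPU map as a bitmask; cpu_mask_t and
both helpers are hypothetical and are not libperf API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a CPU map is a bitmask, bit n set means CPU n. */
typedef unsigned int cpu_mask_t;

/* Merge every evsel's CPU mask, modelling how all_cpus is accumulated. */
static cpu_mask_t merge_all_cpus(const cpu_mask_t *evsel_cpus, size_t nr)
{
	cpu_mask_t all = 0;
	size_t i;

	assert(evsel_cpus || !nr);
	for (i = 0; i < nr; i++)
		all |= evsel_cpus[i];
	return all;
}

/* Non-zero if every CPU in @sub is also in @super. */
static int is_subset(cpu_mask_t sub, cpu_mask_t super)
{
	return (sub & ~super) == 0;
}
```

For example, an intel_pt evsel on CPU 1 (0x2) plus a dummy evsel on four
online CPUs (0xf) merge to 0xf, which contains the requested 0x2 but is
not contained by it.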

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index efad0e691045..48af7d379d82 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -264,6 +264,51 @@ int evlist__add_dummy(struct evlist *evlist)
 	return 0;
 }
 
+static void evlist__add_on_all_cpus(struct evlist *evlist, struct evsel *evsel)
+{
+	evsel->core.system_wide = true;
+
+	/*
+	 * All CPUs.
+	 *
+	 * Note perf_event_open() does not accept CPUs that are not online, so
+	 * in fact this CPU list will include only all online CPUs.
+	 */
+	perf_cpu_map__put(evsel->core.own_cpus);
+	evsel->core.own_cpus = perf_cpu_map__new(NULL);
+	perf_cpu_map__put(evsel->core.cpus);
+	evsel->core.cpus = perf_cpu_map__get(evsel->core.own_cpus);
+
+	/* No threads */
+	perf_thread_map__put(evsel->core.threads);
+	evsel->core.threads = perf_thread_map__new_dummy();
+
+	evlist__add(evlist, evsel);
+}
+
+struct evsel *evlist__add_aux_dummy(struct evlist *evlist, bool system_wide)
+{
+	struct evsel *evsel = evlist__dummy_event(evlist);
+
+	if (!evsel)
+		return NULL;
+
+	evsel->core.attr.exclude_kernel = 1;
+	evsel->core.attr.exclude_guest = 1;
+	evsel->core.attr.exclude_hv = 1;
+	evsel->core.attr.freq = 0;
+	evsel->core.attr.sample_period = 1;
+	evsel->no_aux_samples = true;
+	evsel->name = strdup("dummy:u");
+
+	if (system_wide)
+		evlist__add_on_all_cpus(evlist, evsel);
+	else
+		evlist__add(evlist, evsel);
+
+	return evsel;
+}
+
 static int evlist__add_attrs(struct evlist *evlist, struct perf_event_attr *attrs, size_t nr_attrs)
 {
 	struct evsel *evsel, *n;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 4062f5aebfc1..1bde9ccf4e7d 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -114,6 +114,11 @@ int arch_evlist__add_default_attrs(struct evlist *evlist);
 struct evsel *arch_evlist__leader(struct list_head *list);
 
 int evlist__add_dummy(struct evlist *evlist);
+struct evsel *evlist__add_aux_dummy(struct evlist *evlist, bool system_wide);
+static inline struct evsel *evlist__add_dummy_on_all_cpus(struct evlist *evlist)
+{
+	return evlist__add_aux_dummy(evlist, true);
+}
 
 int evlist__add_sb_event(struct evlist *evlist, struct perf_event_attr *attr,
 			 evsel__sb_cb_t cb, void *data);
-- 
2.25.1



* [PATCH V4 06/15] perf record: Use evlist__add_dummy_on_all_cpus() in record__config_text_poke()
  2022-05-24  7:54 [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu Adrian Hunter
                   ` (4 preceding siblings ...)
  2022-05-24  7:54 ` [PATCH V4 05/15] perf evlist: Add evlist__add_dummy_on_all_cpus() Adrian Hunter
@ 2022-05-24  7:54 ` Adrian Hunter
  2022-05-24  7:54 ` [PATCH V4 07/15] perf intel-pt: Use evlist__add_dummy_on_all_cpus() for switch tracking Adrian Hunter
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2022-05-24  7:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ian Rogers, Alexey Bayduraev, Namhyung Kim, Leo Yan,
	linux-kernel

Use evlist__add_dummy_on_all_cpus() in record__config_text_poke() in
preparation for allowing system-wide events on all CPUs while the user
requested events are on only user requested CPUs.

Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-record.c | 21 +++------------------
 1 file changed, 3 insertions(+), 18 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index a5cf6a99d67f..c8a79f3a8dff 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -869,7 +869,6 @@ static int record__auxtrace_init(struct record *rec __maybe_unused)
 static int record__config_text_poke(struct evlist *evlist)
 {
 	struct evsel *evsel;
-	int err;
 
 	/* Nothing to do if text poke is already configured */
 	evlist__for_each_entry(evlist, evsel) {
@@ -877,27 +876,13 @@ static int record__config_text_poke(struct evlist *evlist)
 			return 0;
 	}
 
-	err = parse_events(evlist, "dummy:u", NULL);
-	if (err)
-		return err;
-
-	evsel = evlist__last(evlist);
+	evsel = evlist__add_dummy_on_all_cpus(evlist);
+	if (!evsel)
+		return -ENOMEM;
 
-	evsel->core.attr.freq = 0;
-	evsel->core.attr.sample_period = 1;
 	evsel->core.attr.text_poke = 1;
 	evsel->core.attr.ksymbol = 1;
-
-	evsel->core.system_wide = true;
-	evsel->no_aux_samples = true;
 	evsel->immediate = true;
-
-	/* Text poke must be collected on all CPUs */
-	perf_cpu_map__put(evsel->core.own_cpus);
-	evsel->core.own_cpus = perf_cpu_map__new(NULL);
-	perf_cpu_map__put(evsel->core.cpus);
-	evsel->core.cpus = perf_cpu_map__get(evsel->core.own_cpus);
-
 	evsel__set_sample_bit(evsel, TIME);
 
 	return 0;
-- 
2.25.1



* [PATCH V4 07/15] perf intel-pt: Use evlist__add_dummy_on_all_cpus() for switch tracking
  2022-05-24  7:54 [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu Adrian Hunter
                   ` (5 preceding siblings ...)
  2022-05-24  7:54 ` [PATCH V4 06/15] perf record: Use evlist__add_dummy_on_all_cpus() in record__config_text_poke() Adrian Hunter
@ 2022-05-24  7:54 ` Adrian Hunter
  2022-05-24  7:54 ` [PATCH V4 08/15] perf intel-pt: Track sideband system-wide when needed Adrian Hunter
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2022-05-24  7:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ian Rogers, Alexey Bayduraev, Namhyung Kim, Leo Yan,
	linux-kernel

Use evlist__add_dummy_on_all_cpus() for switch tracking in preparation for
allowing system-wide events on all CPUs while the user requested events are
on only user requested CPUs.

Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/arch/x86/util/intel-pt.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
index 2eaac4638aab..0ee93894a0da 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -811,18 +811,11 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
 			if (!cpu_wide && perf_can_record_cpu_wide()) {
 				struct evsel *switch_evsel;
 
-				err = parse_events(evlist, "dummy:u", NULL);
-				if (err)
-					return err;
+				switch_evsel = evlist__add_dummy_on_all_cpus(evlist);
+				if (!switch_evsel)
+					return -ENOMEM;
 
-				switch_evsel = evlist__last(evlist);
-
-				switch_evsel->core.attr.freq = 0;
-				switch_evsel->core.attr.sample_period = 1;
 				switch_evsel->core.attr.context_switch = 1;
-
-				switch_evsel->core.system_wide = true;
-				switch_evsel->no_aux_samples = true;
 				switch_evsel->immediate = true;
 
 				evsel__set_sample_bit(switch_evsel, TID);
-- 
2.25.1



* [PATCH V4 08/15] perf intel-pt: Track sideband system-wide when needed
  2022-05-24  7:54 [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu Adrian Hunter
                   ` (6 preceding siblings ...)
  2022-05-24  7:54 ` [PATCH V4 07/15] perf intel-pt: Use evlist__add_dummy_on_all_cpus() for switch tracking Adrian Hunter
@ 2022-05-24  7:54 ` Adrian Hunter
  2022-05-24  7:54 ` [PATCH V4 09/15] perf tools: Allow all_cpus to be a superset of user_requested_cpus Adrian Hunter
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2022-05-24  7:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ian Rogers, Alexey Bayduraev, Namhyung Kim, Leo Yan,
	linux-kernel

User space tasks can migrate between CPUs, so when tracing selected CPUs,
sideband for all CPUs is still needed. This is in preparation for allowing
system-wide events on all CPUs while the user requested events are on only
user requested CPUs.
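
As a rough illustration, the condition this patch introduces can be modelled
as a pure function of two inputs. This is a standalone sketch, not perf code:
struct tracking_inputs and need_system_wide_tracking() are hypothetical names,
and only the two inputs themselves (has_user_cpus and exclude_user) are taken
from the diff below.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for evlist->core.has_user_cpus and
 * intel_pt_evsel->core.attr.exclude_user.
 */
struct tracking_inputs {
	bool has_user_cpus;	/* the user restricted CPUs with -C / --cpu */
	bool exclude_user;	/* the Intel PT event does not trace user space */
};

/*
 * Sideband is needed from every CPU only when tracing is limited to
 * selected CPUs and user space is being traced, because only then can a
 * traced task have migrated in from a CPU that is not being traced.
 */
static bool need_system_wide_tracking(const struct tracking_inputs *in)
{
	return in->has_user_cpus && !in->exclude_user;
}
```

With no -C option, or with a kernel-only trace, the function returns false and
the tracking event presumably stays on the user requested CPUs as before.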

Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/arch/x86/util/intel-pt.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
index 0ee93894a0da..06c2cdfd8f2f 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -864,20 +864,22 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
 
 	/* Add dummy event to keep tracking */
 	if (opts->full_auxtrace) {
+		bool need_system_wide_tracking;
 		struct evsel *tracking_evsel;
 
-		err = parse_events(evlist, "dummy:u", NULL);
-		if (err)
-			return err;
+		/*
+		 * User space tasks can migrate between CPUs, so when tracing
+		 * selected CPUs, sideband for all CPUs is still needed.
+		 */
+		need_system_wide_tracking = evlist->core.has_user_cpus &&
+					    !intel_pt_evsel->core.attr.exclude_user;
 
-		tracking_evsel = evlist__last(evlist);
+		tracking_evsel = evlist__add_aux_dummy(evlist, need_system_wide_tracking);
+		if (!tracking_evsel)
+			return -ENOMEM;
 
 		evlist__set_tracking_event(evlist, tracking_evsel);
 
-		tracking_evsel->core.attr.freq = 0;
-		tracking_evsel->core.attr.sample_period = 1;
-
-		tracking_evsel->no_aux_samples = true;
 		if (need_immediate)
 			tracking_evsel->immediate = true;
 
-- 
2.25.1



* [PATCH V4 09/15] perf tools: Allow all_cpus to be a superset of user_requested_cpus
  2022-05-24  7:54 [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu Adrian Hunter
                   ` (7 preceding siblings ...)
  2022-05-24  7:54 ` [PATCH V4 08/15] perf intel-pt: Track sideband system-wide when needed Adrian Hunter
@ 2022-05-24  7:54 ` Adrian Hunter
  2022-05-24  7:54 ` [PATCH V4 10/15] libperf evlist: Allow mixing per-thread and per-cpu mmaps Adrian Hunter
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2022-05-24  7:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ian Rogers, Alexey Bayduraev, Namhyung Kim, Leo Yan,
	linux-kernel

To support collection of system-wide events with user requested CPUs,
all_cpus must be a superset of user_requested_cpus.

To allow all_cpus to be a superset of user_requested_cpus, all_cpus must be
used instead of user_requested_cpus wherever the code deals with the CPUs of
all events rather than only the CPUs of the requested events.
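
To make the superset relationship concrete, here is a minimal standalone
sketch that models CPU maps as sorted arrays of CPU numbers. It does not use
libperf: cpu_list_merge() and cpu_list_is_subset() are hypothetical helpers
that only imitate the idea of merging each evsel's CPUs into all_cpus.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Merge two sorted, duplicate-free CPU lists into out and return the
 * number of entries written. out must have room for na + nb entries.
 */
static size_t cpu_list_merge(const int *a, size_t na,
			     const int *b, size_t nb, int *out)
{
	size_t i = 0, j = 0, n = 0;

	while (i < na && j < nb) {
		if (a[i] < b[j]) {
			out[n++] = a[i++];
		} else if (b[j] < a[i]) {
			out[n++] = b[j++];
		} else {
			/* Same CPU in both lists: keep one copy */
			out[n++] = a[i++];
			j++;
		}
	}
	while (i < na)
		out[n++] = a[i++];
	while (j < nb)
		out[n++] = b[j++];

	return n;
}

/* Return true if every CPU in sub also appears in super (both sorted). */
static bool cpu_list_is_subset(const int *sub, size_t nsub,
			       const int *super, size_t nsuper)
{
	size_t i = 0, j = 0;

	while (i < nsub && j < nsuper) {
		if (sub[i] == super[j]) {
			i++;
			j++;
		} else if (super[j] < sub[i]) {
			j++;
		} else {
			return false;
		}
	}

	return i == nsub;
}
```

For example, merging a user requested list of { 1 } with a system-wide
sideband event on { 0, 1, 2, 3 } gives an all_cpus of four entries that
contains the user requested list but is not contained by it. That is why code
sizing per-CPU resources has to iterate the merged list.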

Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/lib/perf/evlist.c     | 12 ++++++------
 tools/perf/builtin-record.c | 18 ++++++++++++------
 tools/perf/util/auxtrace.c  |  2 +-
 3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index ed66f2e38464..ec0e4b5da874 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -298,7 +298,7 @@ int perf_evlist__id_add_fd(struct perf_evlist *evlist,
 
 int perf_evlist__alloc_pollfd(struct perf_evlist *evlist)
 {
-	int nr_cpus = perf_cpu_map__nr(evlist->user_requested_cpus);
+	int nr_cpus = perf_cpu_map__nr(evlist->all_cpus);
 	int nr_threads = perf_thread_map__nr(evlist->threads);
 	int nfds = 0;
 	struct perf_evsel *evsel;
@@ -430,7 +430,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
 	       int idx, struct perf_mmap_param *mp, int cpu_idx,
 	       int thread, int *_output, int *_output_overwrite)
 {
-	struct perf_cpu evlist_cpu = perf_cpu_map__cpu(evlist->user_requested_cpus, cpu_idx);
+	struct perf_cpu evlist_cpu = perf_cpu_map__cpu(evlist->all_cpus, cpu_idx);
 	struct perf_evsel *evsel;
 	int revent;
 
@@ -540,7 +540,7 @@ mmap_per_cpu(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
 	     struct perf_mmap_param *mp)
 {
 	int nr_threads = perf_thread_map__nr(evlist->threads);
-	int nr_cpus    = perf_cpu_map__nr(evlist->user_requested_cpus);
+	int nr_cpus    = perf_cpu_map__nr(evlist->all_cpus);
 	int cpu, thread;
 
 	for (cpu = 0; cpu < nr_cpus; cpu++) {
@@ -565,8 +565,8 @@ static int perf_evlist__nr_mmaps(struct perf_evlist *evlist)
 {
 	int nr_mmaps;
 
-	nr_mmaps = perf_cpu_map__nr(evlist->user_requested_cpus);
-	if (perf_cpu_map__empty(evlist->user_requested_cpus))
+	nr_mmaps = perf_cpu_map__nr(evlist->all_cpus);
+	if (perf_cpu_map__empty(evlist->all_cpus))
 		nr_mmaps = perf_thread_map__nr(evlist->threads);
 
 	return nr_mmaps;
@@ -577,7 +577,7 @@ int perf_evlist__mmap_ops(struct perf_evlist *evlist,
 			  struct perf_mmap_param *mp)
 {
 	struct perf_evsel *evsel;
-	const struct perf_cpu_map *cpus = evlist->user_requested_cpus;
+	const struct perf_cpu_map *cpus = evlist->all_cpus;
 
 	if (!ops || !ops->get || !ops->mmap)
 		return -EINVAL;
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index c8a79f3a8dff..cf9a7ce429df 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -967,14 +967,20 @@ static void record__thread_data_close_pipes(struct record_thread *thread_data)
 	}
 }
 
+static bool evlist__per_thread(struct evlist *evlist)
+{
+	return cpu_map__is_dummy(evlist->core.user_requested_cpus);
+}
+
 static int record__thread_data_init_maps(struct record_thread *thread_data, struct evlist *evlist)
 {
 	int m, tm, nr_mmaps = evlist->core.nr_mmaps;
 	struct mmap *mmap = evlist->mmap;
 	struct mmap *overwrite_mmap = evlist->overwrite_mmap;
-	struct perf_cpu_map *cpus = evlist->core.user_requested_cpus;
+	struct perf_cpu_map *cpus = evlist->core.all_cpus;
+	bool per_thread = evlist__per_thread(evlist);
 
-	if (cpu_map__is_dummy(cpus))
+	if (per_thread)
 		thread_data->nr_mmaps = nr_mmaps;
 	else
 		thread_data->nr_mmaps = bitmap_weight(thread_data->mask->maps.bits,
@@ -995,7 +1001,7 @@ static int record__thread_data_init_maps(struct record_thread *thread_data, stru
 		 thread_data->nr_mmaps, thread_data->maps, thread_data->overwrite_maps);
 
 	for (m = 0, tm = 0; m < nr_mmaps && tm < thread_data->nr_mmaps; m++) {
-		if (cpu_map__is_dummy(cpus) ||
+		if (per_thread ||
 		    test_bit(perf_cpu_map__cpu(cpus, m).cpu, thread_data->mask->maps.bits)) {
 			if (thread_data->maps) {
 				thread_data->maps[tm] = &mmap[m];
@@ -1870,7 +1876,7 @@ static int record__synthesize(struct record *rec, bool tail)
 		return err;
 	}
 
-	err = perf_event__synthesize_cpu_map(&rec->tool, rec->evlist->core.user_requested_cpus,
+	err = perf_event__synthesize_cpu_map(&rec->tool, rec->evlist->core.all_cpus,
 					     process_synthesized_event, NULL);
 	if (err < 0) {
 		pr_err("Couldn't synthesize cpu map.\n");
@@ -3668,12 +3674,12 @@ static int record__init_thread_default_masks(struct record *rec, struct perf_cpu
 static int record__init_thread_masks(struct record *rec)
 {
 	int ret = 0;
-	struct perf_cpu_map *cpus = rec->evlist->core.user_requested_cpus;
+	struct perf_cpu_map *cpus = rec->evlist->core.all_cpus;
 
 	if (!record__threads_enabled(rec))
 		return record__init_thread_default_masks(rec, cpus);
 
-	if (cpu_map__is_dummy(cpus)) {
+	if (evlist__per_thread(rec->evlist)) {
 		pr_err("--per-thread option is mutually exclusive to parallel streaming mode.\n");
 		return -EINVAL;
 	}
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index ac4e4660932d..511dd3caa1bc 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -181,7 +181,7 @@ void auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp,
 	mp->idx = idx;
 
 	if (per_cpu) {
-		mp->cpu = perf_cpu_map__cpu(evlist->core.user_requested_cpus, idx);
+		mp->cpu = perf_cpu_map__cpu(evlist->core.all_cpus, idx);
 		if (evlist->core.threads)
 			mp->tid = perf_thread_map__pid(evlist->core.threads, 0);
 		else
-- 
2.25.1



* [PATCH V4 10/15] libperf evlist: Allow mixing per-thread and per-cpu mmaps
  2022-05-24  7:54 [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu Adrian Hunter
                   ` (8 preceding siblings ...)
  2022-05-24  7:54 ` [PATCH V4 09/15] perf tools: Allow all_cpus to be a superset of user_requested_cpus Adrian Hunter
@ 2022-05-24  7:54 ` Adrian Hunter
  2022-05-24  7:54 ` [PATCH V4 11/15] libperf evlist: Check nr_mmaps is correct Adrian Hunter
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2022-05-24  7:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ian Rogers, Alexey Bayduraev, Namhyung Kim, Leo Yan,
	linux-kernel

mmap_per_evsel() will skip events that do not match the CPU, so all CPUs
can be iterated in any case.
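
The resulting mmap count can be sketched as plain arithmetic. The function
below is a standalone illustration, not the libperf implementation: it models
all_cpus as a sorted array in which -1 is the per-thread 'any CPU' entry, and
it assumes (as the comments in the diff suggest) that the 'empty' check is
true exactly when that -1 entry is present. nr_mmaps_sketch() is a
hypothetical name.

```c
#include <assert.h>

/*
 * cpus: sorted CPU numbers; -1 stands for the per-thread 'any CPU' entry
 * and therefore sorts first when present.
 */
static int nr_mmaps_sketch(const int *cpus, int nr_cpus, int nr_threads)
{
	int nr_mmaps;

	/* Sketch-only precondition: the map has at least one entry */
	assert(nr_cpus >= 1);

	/* One for each CPU map entry */
	nr_mmaps = nr_cpus;
	if (cpus[0] == -1) {
		/* Plus one for each thread */
		nr_mmaps += nr_threads;
		/* Minus the per-thread CPU (-1) entry itself */
		nr_mmaps -= 1;
	}

	return nr_mmaps;
}
```

Under those assumptions, a purely per-thread session with 3 threads and a map
of { -1 } needs 3 mmaps, while a mixed session with 2 threads and a map of
{ -1, 0, 1, 2, 3 } needs 4 per-cpu mmaps plus 2 per-thread mmaps.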

Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/lib/perf/evlist.c | 36 +++++++-----------------------------
 1 file changed, 7 insertions(+), 29 deletions(-)

diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index ec0e4b5da874..eae1f6179dad 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -512,29 +512,6 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
 	return 0;
 }
 
-static int
-mmap_per_thread(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
-		struct perf_mmap_param *mp)
-{
-	int thread;
-	int nr_threads = perf_thread_map__nr(evlist->threads);
-
-	for (thread = 0; thread < nr_threads; thread++) {
-		int output = -1;
-		int output_overwrite = -1;
-
-		if (mmap_per_evsel(evlist, ops, thread, mp, 0, thread,
-				   &output, &output_overwrite))
-			goto out_unmap;
-	}
-
-	return 0;
-
-out_unmap:
-	perf_evlist__munmap(evlist);
-	return -1;
-}
-
 static int
 mmap_per_cpu(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
 	     struct perf_mmap_param *mp)
@@ -565,9 +542,14 @@ static int perf_evlist__nr_mmaps(struct perf_evlist *evlist)
 {
 	int nr_mmaps;
 
+	/* One for each CPU */
 	nr_mmaps = perf_cpu_map__nr(evlist->all_cpus);
-	if (perf_cpu_map__empty(evlist->all_cpus))
-		nr_mmaps = perf_thread_map__nr(evlist->threads);
+	if (perf_cpu_map__empty(evlist->all_cpus)) {
+		/* Plus one for each thread */
+		nr_mmaps += perf_thread_map__nr(evlist->threads);
+		/* Minus the per-thread CPU (-1) */
+		nr_mmaps -= 1;
+	}
 
 	return nr_mmaps;
 }
@@ -577,7 +559,6 @@ int perf_evlist__mmap_ops(struct perf_evlist *evlist,
 			  struct perf_mmap_param *mp)
 {
 	struct perf_evsel *evsel;
-	const struct perf_cpu_map *cpus = evlist->all_cpus;
 
 	if (!ops || !ops->get || !ops->mmap)
 		return -EINVAL;
@@ -596,9 +577,6 @@ int perf_evlist__mmap_ops(struct perf_evlist *evlist,
 	if (evlist->pollfd.entries == NULL && perf_evlist__alloc_pollfd(evlist) < 0)
 		return -ENOMEM;
 
-	if (perf_cpu_map__empty(cpus))
-		return mmap_per_thread(evlist, ops, mp);
-
 	return mmap_per_cpu(evlist, ops, mp);
 }
 
-- 
2.25.1



* [PATCH V4 11/15] libperf evlist: Check nr_mmaps is correct
  2022-05-24  7:54 [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu Adrian Hunter
                   ` (9 preceding siblings ...)
  2022-05-24  7:54 ` [PATCH V4 10/15] libperf evlist: Allow mixing per-thread and per-cpu mmaps Adrian Hunter
@ 2022-05-24  7:54 ` Adrian Hunter
  2022-05-24  7:54 ` [PATCH V4 12/15] perf stat: Add requires_cpu flag for uncore Adrian Hunter
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2022-05-24  7:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ian Rogers, Alexey Bayduraev, Namhyung Kim, Leo Yan,
	linux-kernel

Print an error message if the predetermined number of mmaps is
incorrect.

Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/lib/perf/evlist.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index eae1f6179dad..f51fdb899d19 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -23,6 +23,7 @@
 #include <perf/cpumap.h>
 #include <perf/threadmap.h>
 #include <api/fd/array.h>
+#include "internal.h"
 
 void perf_evlist__init(struct perf_evlist *evlist)
 {
@@ -428,7 +429,7 @@ static void perf_evlist__set_mmap_first(struct perf_evlist *evlist, struct perf_
 static int
 mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
 	       int idx, struct perf_mmap_param *mp, int cpu_idx,
-	       int thread, int *_output, int *_output_overwrite)
+	       int thread, int *_output, int *_output_overwrite, int *nr_mmaps)
 {
 	struct perf_cpu evlist_cpu = perf_cpu_map__cpu(evlist->all_cpus, cpu_idx);
 	struct perf_evsel *evsel;
@@ -484,6 +485,8 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
 			if (ops->mmap(map, mp, *output, evlist_cpu) < 0)
 				return -1;
 
+			*nr_mmaps += 1;
+
 			if (!idx)
 				perf_evlist__set_mmap_first(evlist, map, overwrite);
 		} else {
@@ -518,6 +521,7 @@ mmap_per_cpu(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
 {
 	int nr_threads = perf_thread_map__nr(evlist->threads);
 	int nr_cpus    = perf_cpu_map__nr(evlist->all_cpus);
+	int nr_mmaps = 0;
 	int cpu, thread;
 
 	for (cpu = 0; cpu < nr_cpus; cpu++) {
@@ -526,11 +530,14 @@ mmap_per_cpu(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
 
 		for (thread = 0; thread < nr_threads; thread++) {
 			if (mmap_per_evsel(evlist, ops, cpu, mp, cpu,
-					   thread, &output, &output_overwrite))
+					   thread, &output, &output_overwrite, &nr_mmaps))
 				goto out_unmap;
 		}
 	}
 
+	if (nr_mmaps != evlist->nr_mmaps)
+		pr_err("Miscounted nr_mmaps %d vs %d\n", nr_mmaps, evlist->nr_mmaps);
+
 	return 0;
 
 out_unmap:
-- 
2.25.1



* [PATCH V4 12/15] perf stat: Add requires_cpu flag for uncore
  2022-05-24  7:54 [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu Adrian Hunter
                   ` (10 preceding siblings ...)
  2022-05-24  7:54 ` [PATCH V4 11/15] libperf evlist: Check nr_mmaps is correct Adrian Hunter
@ 2022-05-24  7:54 ` Adrian Hunter
  2022-05-24  7:54 ` [PATCH V4 13/15] libperf evsel: Add comments for booleans Adrian Hunter
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2022-05-24  7:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ian Rogers, Alexey Bayduraev, Namhyung Kim, Leo Yan,
	linux-kernel

Uncore events require a CPU, i.e. the CPU cannot be -1.

The evsel system_wide flag is intended for events that should be on every
CPU, which does not make sense for uncore events because uncore events do
not map one-to-one with CPUs.

These two requirements are not exactly the same, so introduce a new flag
'requires_cpu' for the uncore case.

Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/lib/perf/evlist.c                 | 4 +++-
 tools/lib/perf/include/internal/evsel.h | 1 +
 tools/perf/builtin-stat.c               | 5 +----
 tools/perf/util/evsel.c                 | 1 +
 tools/perf/util/parse-events.c          | 2 +-
 5 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index f51fdb899d19..1c801f8da44f 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -43,7 +43,9 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
 	if (!evsel->own_cpus || evlist->has_user_cpus) {
 		perf_cpu_map__put(evsel->cpus);
 		evsel->cpus = perf_cpu_map__get(evlist->user_requested_cpus);
-	} else if (!evsel->system_wide && perf_cpu_map__empty(evlist->user_requested_cpus)) {
+	} else if (!evsel->system_wide &&
+		   !evsel->requires_cpu &&
+		   perf_cpu_map__empty(evlist->user_requested_cpus)) {
 		perf_cpu_map__put(evsel->cpus);
 		evsel->cpus = perf_cpu_map__get(evlist->user_requested_cpus);
 	} else if (evsel->cpus != evsel->own_cpus) {
diff --git a/tools/lib/perf/include/internal/evsel.h b/tools/lib/perf/include/internal/evsel.h
index cfc9ebd7968e..77fbb8b97e5c 100644
--- a/tools/lib/perf/include/internal/evsel.h
+++ b/tools/lib/perf/include/internal/evsel.h
@@ -50,6 +50,7 @@ struct perf_evsel {
 	/* parse modifier helper */
 	int			 nr_members;
 	bool			 system_wide;
+	bool			 requires_cpu;
 	int			 idx;
 };
 
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 7e6cc8bdf061..4ce87a8eb7d7 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -382,9 +382,6 @@ static int read_counter_cpu(struct evsel *counter, struct timespec *rs, int cpu_
 	if (!counter->supported)
 		return -ENOENT;
 
-	if (counter->core.system_wide)
-		nthreads = 1;
-
 	for (thread = 0; thread < nthreads; thread++) {
 		struct perf_counts_values *count;
 
@@ -2261,7 +2258,7 @@ static void setup_system_wide(int forks)
 		struct evsel *counter;
 
 		evlist__for_each_entry(evsel_list, counter) {
-			if (!counter->core.system_wide &&
+			if (!counter->core.requires_cpu &&
 			    strcmp(counter->name, "duration_time")) {
 				return;
 			}
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index ef169ad15236..050b1c69a738 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -409,6 +409,7 @@ struct evsel *evsel__clone(struct evsel *orig)
 	evsel->core.threads = perf_thread_map__get(orig->core.threads);
 	evsel->core.nr_members = orig->core.nr_members;
 	evsel->core.system_wide = orig->core.system_wide;
+	evsel->core.requires_cpu = orig->core.requires_cpu;
 
 	if (orig->name) {
 		evsel->name = strdup(orig->name);
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 30a9d915853d..7ed235740431 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -365,7 +365,7 @@ __add_event(struct list_head *list, int *idx,
 	(*idx)++;
 	evsel->core.cpus = cpus;
 	evsel->core.own_cpus = perf_cpu_map__get(cpus);
-	evsel->core.system_wide = pmu ? pmu->is_uncore : false;
+	evsel->core.requires_cpu = pmu ? pmu->is_uncore : false;
 	evsel->auto_merge_stats = auto_merge_stats;
 
 	if (name)
-- 
2.25.1



* [PATCH V4 13/15] libperf evsel: Add comments for booleans
  2022-05-24  7:54 [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu Adrian Hunter
                   ` (11 preceding siblings ...)
  2022-05-24  7:54 ` [PATCH V4 12/15] perf stat: Add requires_cpu flag for uncore Adrian Hunter
@ 2022-05-24  7:54 ` Adrian Hunter
  2022-05-24  7:54 ` [PATCH V4 14/15] perf tools: Allow system-wide events to keep their own CPUs Adrian Hunter
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2022-05-24  7:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ian Rogers, Alexey Bayduraev, Namhyung Kim, Leo Yan,
	linux-kernel

Add comments for the 'system_wide' and 'requires_cpu' booleans.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/lib/perf/include/internal/evsel.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tools/lib/perf/include/internal/evsel.h b/tools/lib/perf/include/internal/evsel.h
index 77fbb8b97e5c..2a912a1f1989 100644
--- a/tools/lib/perf/include/internal/evsel.h
+++ b/tools/lib/perf/include/internal/evsel.h
@@ -49,7 +49,17 @@ struct perf_evsel {
 
 	/* parse modifier helper */
 	int			 nr_members;
+	/*
+	 * system_wide is for events that need to be on every CPU, irrespective
+	 * of user requested CPUs or threads. Map propagation will set cpus to
+	 * this event's own_cpus, whereby they will contribute to evlist
+	 * all_cpus.
+	 */
 	bool			 system_wide;
+	/*
+	 * Some events, for example uncore events, require a CPU.
+	 * i.e. it cannot be the 'any CPU' value of -1.
+	 */
 	bool			 requires_cpu;
 	int			 idx;
 };
-- 
2.25.1



* [PATCH V4 14/15] perf tools: Allow system-wide events to keep their own CPUs
  2022-05-24  7:54 [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu Adrian Hunter
                   ` (12 preceding siblings ...)
  2022-05-24  7:54 ` [PATCH V4 13/15] libperf evsel: Add comments for booleans Adrian Hunter
@ 2022-05-24  7:54 ` Adrian Hunter
  2022-05-24  7:54 ` [PATCH V4 15/15] perf tools: Allow system-wide events to keep their own threads Adrian Hunter
  2022-05-25  5:01 ` [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu Ian Rogers
  15 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2022-05-24  7:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ian Rogers, Alexey Bayduraev, Namhyung Kim, Leo Yan,
	linux-kernel

Currently, user_requested_cpus supplants system-wide CPUs when the evlist
has_user_cpus. Change that so that system-wide events retain their own
CPUs and they are added to all_cpus.
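
The new propagation rule can be read as a small decision function. The sketch
below is standalone and only mirrors the boolean condition from the diff: the
enum, the struct and choose_cpus() are hypothetical names, and the real code
operates on reference-counted CPU maps rather than returning a tag.

```c
#include <stdbool.h>

enum cpu_source {
	USE_USER_REQUESTED_CPUS,	/* evsel->cpus = evlist->user_requested_cpus */
	USE_OWN_CPUS,			/* evsel->cpus = evsel->own_cpus */
};

/* Hypothetical flattening of the evsel and evlist state the condition reads */
struct propagate_inputs {
	bool has_own_cpus;	/* evsel->own_cpus != NULL */
	bool system_wide;	/* evsel->system_wide */
	bool requires_cpu;	/* evsel->requires_cpu */
	bool has_user_cpus;	/* evlist->has_user_cpus */
	bool user_cpus_empty;	/* perf_cpu_map__empty(user_requested_cpus) */
};

static enum cpu_source choose_cpus(const struct propagate_inputs *in)
{
	if (!in->has_own_cpus ||
	    (!in->system_wide && in->has_user_cpus) ||
	    (!in->system_wide && !in->requires_cpu && in->user_cpus_empty))
		return USE_USER_REQUESTED_CPUS;

	return USE_OWN_CPUS;
}
```

The case that changes with this patch is a system-wide event that has its own
CPUs while the user passed -C: it now keeps its own CPUs instead of being
narrowed to the user requested ones.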

Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/lib/perf/evlist.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index 1c801f8da44f..9a6801b53274 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -40,12 +40,11 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
 	 * We already have cpus for evsel (via PMU sysfs) so
 	 * keep it, if there's no target cpu list defined.
 	 */
-	if (!evsel->own_cpus || evlist->has_user_cpus) {
-		perf_cpu_map__put(evsel->cpus);
-		evsel->cpus = perf_cpu_map__get(evlist->user_requested_cpus);
-	} else if (!evsel->system_wide &&
-		   !evsel->requires_cpu &&
-		   perf_cpu_map__empty(evlist->user_requested_cpus)) {
+	if (!evsel->own_cpus ||
+	    (!evsel->system_wide && evlist->has_user_cpus) ||
+	    (!evsel->system_wide &&
+	     !evsel->requires_cpu &&
+	     perf_cpu_map__empty(evlist->user_requested_cpus))) {
 		perf_cpu_map__put(evsel->cpus);
 		evsel->cpus = perf_cpu_map__get(evlist->user_requested_cpus);
 	} else if (evsel->cpus != evsel->own_cpus) {
-- 
2.25.1



* [PATCH V4 15/15] perf tools: Allow system-wide events to keep their own threads
  2022-05-24  7:54 [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu Adrian Hunter
                   ` (13 preceding siblings ...)
  2022-05-24  7:54 ` [PATCH V4 14/15] perf tools: Allow system-wide events to keep their own CPUs Adrian Hunter
@ 2022-05-24  7:54 ` Adrian Hunter
  2022-05-25  5:01 ` [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu Ian Rogers
  15 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2022-05-24  7:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ian Rogers, Alexey Bayduraev, Namhyung Kim, Leo Yan,
	linux-kernel

System-wide events do not have threads, so do not propagate threads to
them.

Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/lib/perf/evlist.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index 9a6801b53274..e6c98a6e3908 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -52,8 +52,11 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
 		evsel->cpus = perf_cpu_map__get(evsel->own_cpus);
 	}
 
-	perf_thread_map__put(evsel->threads);
-	evsel->threads = perf_thread_map__get(evlist->threads);
+	if (!evsel->system_wide) {
+		perf_thread_map__put(evsel->threads);
+		evsel->threads = perf_thread_map__get(evlist->threads);
+	}
+
 	evlist->all_cpus = perf_cpu_map__merge(evlist->all_cpus, evsel->cpus);
 }
 
-- 
2.25.1



* Re: [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu
  2022-05-24  7:54 [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu Adrian Hunter
                   ` (14 preceding siblings ...)
  2022-05-24  7:54 ` [PATCH V4 15/15] perf tools: Allow system-wide events to keep their own threads Adrian Hunter
@ 2022-05-25  5:01 ` Ian Rogers
  2022-05-25 11:00   ` Arnaldo Carvalho de Melo
  15 siblings, 1 reply; 18+ messages in thread
From: Ian Rogers @ 2022-05-25  5:01 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Alexey Bayduraev,
	Namhyung Kim, Leo Yan, linux-kernel

On Tue, May 24, 2022 at 12:55 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Hi
>
> Here are V4 patches to support capturing Intel PT sideband events such as
> mmap, task, context switch, text poke etc, on every CPU even when tracing
> selected user_requested_cpus.  That is, when using the perf record -C or
>  --cpu option.
>
> This is needed for:
> 1. text poke: a text poke on any CPU affects all CPUs
> 2. tracing user space: a user space process can migrate between CPUs so
> mmap events that happen on a different CPU can be needed to decode a
> user_requested_cpus CPU.
>
> For example:
>
>         Trace on CPU 1:
>
>         perf record --kcore -C 1 -e intel_pt// &
>
>         Start a task on CPU 0:
>
>         taskset 0x1 testprog &
>
>         Migrate it to CPU 1:
>
>         taskset -p 0x2 <testprog pid>
>
>         Stop tracing:
>
>         kill %1
>
>         Prior to these changes there will be errors decoding testprog
>         in userspace because the comm and mmap events for testprog will not
>         have been captured.
>
> There is quite a bit of preparation:
>
> The first patch is a small Intel PT test for system-wide side band.  The
> test fails before the patches are applied, passes afterwards.
>
>       perf intel-pt: Add a test for system-wide side band [new in V1]
>
> The next 5 patches (now already applied) stop auxtrace mixing up mmap idx
> between evlist and evsel.  That is going to matter when
> evlist->all_cpus != evlist->user_requested_cpus != evsel->cpus:
>
>       libperf evsel: Factor out perf_evsel__ioctl() [now applied]
>       libperf evsel: Add perf_evsel__enable_thread()
>       perf evlist: Use libperf functions in evlist__enable_event_idx()
>       perf auxtrace: Move evlist__enable_event_idx() to auxtrace.c
>       perf auxtrace: Do not mix up mmap idx
>
> The next 6 patches (first 4 now already applied) stop attempts to auxtrace
> mmap when it is not an auxtrace event e.g. when mmapping the CPUs on which
> only sideband is captured:
>
>       libperf evlist: Remove ->idx() per_cpu parameter
>       libperf evlist: Move ->idx() into mmap_per_evsel()
>       libperf evlist: Add evsel as a parameter to ->idx()
>       perf auxtrace: Record whether an auxtrace mmap is needed
>       perf auxtrace: Add mmap_needed to auxtrace_mmap_params
>       perf auxtrace: Remove auxtrace_mmap_params__set_idx() per_cpu parameter
>
> The next 5 patches switch to setting up dummy event maps before adding the
> evsel so that the evsel is subject to map propagation, primarily to cause
> addition of the evsel's CPUs to all_cpus.
>
>       perf evlist: Factor out evlist__dummy_event()
>       perf evlist: Add evlist__add_system_wide_dummy()
>       perf record: Use evlist__add_system_wide_dummy() in record__config_text_poke()
>       perf intel-pt: Use evlist__add_system_wide_dummy() for switch tracking
>       perf intel-pt: Track sideband system-wide when needed
>
> The remaining patches make more significant changes.
>
> First change from using user_requested_cpus to using all_cpus where necessary:
>
>       perf tools: Allow all_cpus to be a superset of user_requested_cpus
>
> Secondly, mmap all per-thread and all per-cpu events:
>
>       libperf evlist: Allow mixing per-thread and per-cpu mmaps
>       libperf evlist: Check nr_mmaps is correct [new in V1]
>
> Stop using system_wide flag for uncore because it will not work anymore:
>
>       perf stat: Add requires_cpu flag for uncore
>       libperf evsel: Add comments for booleans [new in V1]
>
> Finally change map propagation so that system-wide events retain their cpus and
> (dummy) threads:
>
>       perf tools: Allow system-wide events to keep their own CPUs
>       perf tools: Allow system-wide events to keep their own threads
>
>
> Changes in V4:
>
>       Added Acked-by: Namhyung Kim <namhyung@kernel.org>
>       Added a couple Acked-by: Ian Rogers <irogers@google.com>

Would love to see this merged, Arnaldo. I can do an:

Acked-by: Ian Rogers <irogers@google.com>

in case it helps you with b4 a little :-)

Thanks,
Ian

>       perf intel-pt: Add a test for system-wide side band
>         Put in commit message that test succeeds only after other
>         patches applied
>
>       libperf evsel: Add perf_evsel__enable_thread()
>       perf evlist: Use libperf functions in evlist__enable_event_idx()
>       perf auxtrace: Move evlist__enable_event_idx() to auxtrace.c
>       perf auxtrace: Do not mix up mmap idx
>       libperf evlist: Remove ->idx() per_cpu parameter
>       libperf evlist: Move ->idx() into mmap_per_evsel()
>       libperf evlist: Add evsel as a parameter to ->idx()
>       perf auxtrace: Record whether an auxtrace mmap is needed
>         Omitted because already applied
>
>       libperf evsel: Add comments for booleans
>         Amended comment about own_cpus
>
>
> Changes in V3:
>
>       perf auxtrace: Add mmap_needed to auxtrace_mmap_params
>         Amended mmap_needed comment
>
>       perf evlist: Add evlist__add_dummy_on_all_cpus()
>         Amended comment about all CPUs.
>
>
> Changes in V2:
>
>       Added some Acked-by: Ian Rogers <irogers@google.com>
>
>       libperf evsel: Add perf_evsel__enable_thread()
>         Use perf_cpu_map__for_each_cpu()
>
>       perf auxtrace: Add mmap_needed to auxtrace_mmap_params
>         Add documentation comment for mmap_needed
>
>       perf auxtrace: Remove auxtrace_mmap_params__set_idx() per_cpu parameter
>         Fix missing auxtrace_mmap_params__set_idx change
>
>       libperf evlist: Check nr_mmaps is correct
>         Remove unused code
>
>       libperf evsel: Add comments for booleans
>         Amend comments
>
>       perf evlist: Add evlist__add_dummy_on_all_cpus()
>         Rename evlist__add_system_wide -> evlist__add_on_all_cpus
>         Changed patch subject accordingly
>
>       perf record: Use evlist__add_dummy_on_all_cpus() in record__config_text_poke()
>         Rename evlist__add_system_wide -> evlist__add_on_all_cpus
>         Changed patch subject accordingly
>
>       perf intel-pt: Use evlist__add_dummy_on_all_cpus() for switch tracking
>         Rename evlist__add_system_wide -> evlist__add_on_all_cpus
>         Changed patch subject accordingly
>
>
> Changes in V1:
>
>       perf intel-pt: Add a test for system-wide side band
>         New patch
>
>       libperf evsel: Factor out perf_evsel__ioctl()
>         Dropped because it has been applied.
>
>       libperf evsel: Add perf_evsel__enable_thread()
>         Rename variable i -> idx
>
>       perf auxtrace: Do not mix up mmap idx
>         Rename variable cpu to cpu_map_idx
>
>       perf tools: Allow all_cpus to be a superset of user_requested_cpus
>         Add Acked-by: Ian Rogers <irogers@google.com>
>
>       libperf evlist: Allow mixing per-thread and per-cpu mmaps
>         Fix perf_evlist__nr_mmaps() calculation
>
>       libperf evlist: Check nr_mmaps is correct
>         New patch
>
>       libperf evsel: Add comments for booleans
>         New patch
>
>       perf tools: Allow system-wide events to keep their own CPUs
>       perf tools: Allow system-wide events to keep their own threads
>
>
> Adrian Hunter (15):
>       perf intel-pt: Add a test for system-wide side band
>       perf auxtrace: Add mmap_needed to auxtrace_mmap_params
>       perf auxtrace: Remove auxtrace_mmap_params__set_idx() per_cpu parameter
>       perf evlist: Factor out evlist__dummy_event()
>       perf evlist: Add evlist__add_dummy_on_all_cpus()
>       perf record: Use evlist__add_dummy_on_all_cpus() in record__config_text_poke()
>       perf intel-pt: Use evlist__add_dummy_on_all_cpus() for switch tracking
>       perf intel-pt: Track sideband system-wide when needed
>       perf tools: Allow all_cpus to be a superset of user_requested_cpus
>       libperf evlist: Allow mixing per-thread and per-cpu mmaps
>       libperf evlist: Check nr_mmaps is correct
>       perf stat: Add requires_cpu flag for uncore
>       libperf evsel: Add comments for booleans
>       perf tools: Allow system-wide events to keep their own CPUs
>       perf tools: Allow system-wide events to keep their own threads
>
>  tools/lib/perf/evlist.c                 | 71 ++++++++++++++-------------------
>  tools/lib/perf/include/internal/evsel.h | 11 +++++
>  tools/perf/arch/x86/util/intel-pt.c     | 31 ++++++--------
>  tools/perf/builtin-record.c             | 39 +++++++-----------
>  tools/perf/builtin-stat.c               |  5 +--
>  tools/perf/tests/shell/test_intel_pt.sh | 71 +++++++++++++++++++++++++++++++++
>  tools/perf/util/auxtrace.c              | 15 +++++--
>  tools/perf/util/auxtrace.h              | 13 ++++--
>  tools/perf/util/evlist.c                | 61 +++++++++++++++++++++++++---
>  tools/perf/util/evlist.h                |  5 +++
>  tools/perf/util/evsel.c                 |  1 +
>  tools/perf/util/mmap.c                  |  4 +-
>  tools/perf/util/parse-events.c          |  2 +-
>  13 files changed, 226 insertions(+), 103 deletions(-)
>  create mode 100755 tools/perf/tests/shell/test_intel_pt.sh
>
>
> Regards
> Adrian

* Re: [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu
  2022-05-25  5:01 ` [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu Ian Rogers
@ 2022-05-25 11:00   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-05-25 11:00 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Adrian Hunter, Jiri Olsa, Alexey Bayduraev, Namhyung Kim,
	Leo Yan, linux-kernel

On Tue, May 24, 2022 at 10:01:01PM -0700, Ian Rogers wrote:
> On Tue, May 24, 2022 at 12:55 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
> >
> > Hi
> >
> > Here are V4 patches to support capturing Intel PT sideband events such as
> > mmap, task, context switch, text poke etc, on every CPU even when tracing
> > selected user_requested_cpus.  That is, when using the perf record -C or
> >  --cpu option.
> >
> > This is needed for:
> > 1. text poke: a text poke on any CPU affects all CPUs
> > 2. tracing user space: a user space process can migrate between CPUs so
> > mmap events that happen on a different CPU can be needed to decode a
> > user_requested_cpus CPU.
> >
> > For example:
> >
> >         Trace on CPU 1:
> >
> >         perf record --kcore -C 1 -e intel_pt// &
> >
> >         Start a task on CPU 0:
> >
> >         taskset 0x1 testprog &
> >
> >         Migrate it to CPU 1:
> >
> >         taskset -p 0x2 <testprog pid>
> >
> >         Stop tracing:
> >
> >         kill %1
> >
> >         Prior to these changes there will be errors decoding testprog
> >         in userspace because the comm and mmap events for testprog will not
> >         have been captured.
> >
> > There is quite a bit of preparation:
> >
> > The first patch is a small Intel PT test for system-wide side band.  The
> > > test fails before the patches are applied and passes afterwards.
> >
> >       perf intel-pt: Add a test for system-wide side band [new in V1]
> >
> > The next 5 patches (now already applied) stop auxtrace mixing up mmap idx
> > between evlist and evsel.  That is going to matter when
> > evlist->all_cpus != evlist->user_requested_cpus != evsel->cpus:
> >
> >       libperf evsel: Factor out perf_evsel__ioctl() [now applied]
> >       libperf evsel: Add perf_evsel__enable_thread()
> >       perf evlist: Use libperf functions in evlist__enable_event_idx()
> >       perf auxtrace: Move evlist__enable_event_idx() to auxtrace.c
> >       perf auxtrace: Do not mix up mmap idx
> >
> > The next 6 patches (first 4 now already applied) stop attempts to auxtrace
> > mmap when it is not an auxtrace event e.g. when mmapping the CPUs on which
> > only sideband is captured:
> >
> >       libperf evlist: Remove ->idx() per_cpu parameter
> >       libperf evlist: Move ->idx() into mmap_per_evsel()
> >       libperf evlist: Add evsel as a parameter to ->idx()
> >       perf auxtrace: Record whether an auxtrace mmap is needed
> > >       perf auxtrace: Add mmap_needed to auxtrace_mmap_params
> >       perf auxtrace: Remove auxtrace_mmap_params__set_idx() per_cpu parameter
> >
> > The next 5 patches switch to setting up dummy event maps before adding the
> > evsel so that the evsel is subject to map propagation, primarily to cause
> > addition of the evsel's CPUs to all_cpus.
> >
> >       perf evlist: Factor out evlist__dummy_event()
> > >       perf evlist: Add evlist__add_dummy_on_all_cpus()
> > >       perf record: Use evlist__add_dummy_on_all_cpus() in record__config_text_poke()
> > >       perf intel-pt: Use evlist__add_dummy_on_all_cpus() for switch tracking
> >       perf intel-pt: Track sideband system-wide when needed
> >
> > The remaining patches make more significant changes.
> >
> > > First, change from using user_requested_cpus to using all_cpus where necessary:
> >
> >       perf tools: Allow all_cpus to be a superset of user_requested_cpus
> >
> > Secondly, mmap all per-thread and all per-cpu events:
> >
> >       libperf evlist: Allow mixing per-thread and per-cpu mmaps
> >       libperf evlist: Check nr_mmaps is correct [new in V1]
> >
> > Stop using system_wide flag for uncore because it will not work anymore:
> >
> >       perf stat: Add requires_cpu flag for uncore
> >       libperf evsel: Add comments for booleans [new in V1]
> >
> > Finally change map propagation so that system-wide events retain their cpus and
> > (dummy) threads:
> >
> >       perf tools: Allow system-wide events to keep their own CPUs
> >       perf tools: Allow system-wide events to keep their own threads
> >
> >
> > Changes in V4:
> >
> >       Added Acked-by: Namhyung Kim <namhyung@kernel.org>
> >       Added a couple Acked-by: Ian Rogers <irogers@google.com>
> 
> Would love to see this merged Arnaldo, I can do an:
> 
> Acked-by: Ian Rogers <irogers@google.com>
> 
> in case it helps you with b4 a little :-)

I'll add your Acked-by manually now to the patches missing it, as I had merged this yesterday.

- Arnaldo
 
> Thanks,
> Ian
> 
> >       perf intel-pt: Add a test for system-wide side band
> >         Put in commit message that test succeeds only after other
> >         patches applied
> >
> >       libperf evsel: Add perf_evsel__enable_thread()
> >       perf evlist: Use libperf functions in evlist__enable_event_idx()
> >       perf auxtrace: Move evlist__enable_event_idx() to auxtrace.c
> >       perf auxtrace: Do not mix up mmap idx
> >       libperf evlist: Remove ->idx() per_cpu parameter
> >       libperf evlist: Move ->idx() into mmap_per_evsel()
> >       libperf evlist: Add evsel as a parameter to ->idx()
> >       perf auxtrace: Record whether an auxtrace mmap is needed
> >         Omitted because already applied
> >
> >       libperf evsel: Add comments for booleans
> >         Amended comment about own_cpus
> >
> >
> > Changes in V3:
> >
> >       perf auxtrace: Add mmap_needed to auxtrace_mmap_params
> >         Amended mmap_needed comment
> >
> >       perf evlist: Add evlist__add_dummy_on_all_cpus()
> >         Amended comment about all CPUs.
> >
> >
> > Changes in V2:
> >
> >       Added some Acked-by: Ian Rogers <irogers@google.com>
> >
> >       libperf evsel: Add perf_evsel__enable_thread()
> >         Use perf_cpu_map__for_each_cpu()
> >
> >       perf auxtrace: Add mmap_needed to auxtrace_mmap_params
> >         Add documentation comment for mmap_needed
> >
> >       perf auxtrace: Remove auxtrace_mmap_params__set_idx() per_cpu parameter
> >         Fix missing auxtrace_mmap_params__set_idx change
> >
> >       libperf evlist: Check nr_mmaps is correct
> >         Remove unused code
> >
> >       libperf evsel: Add comments for booleans
> >         Amend comments
> >
> >       perf evlist: Add evlist__add_dummy_on_all_cpus()
> >         Rename evlist__add_system_wide -> evlist__add_on_all_cpus
> >         Changed patch subject accordingly
> >
> >       perf record: Use evlist__add_dummy_on_all_cpus() in record__config_text_poke()
> >         Rename evlist__add_system_wide -> evlist__add_on_all_cpus
> >         Changed patch subject accordingly
> >
> >       perf intel-pt: Use evlist__add_dummy_on_all_cpus() for switch tracking
> >         Rename evlist__add_system_wide -> evlist__add_on_all_cpus
> >         Changed patch subject accordingly
> >
> >
> > Changes in V1:
> >
> >       perf intel-pt: Add a test for system-wide side band
> >         New patch
> >
> >       libperf evsel: Factor out perf_evsel__ioctl()
> >         Dropped because it has been applied.
> >
> >       libperf evsel: Add perf_evsel__enable_thread()
> >         Rename variable i -> idx
> >
> >       perf auxtrace: Do not mix up mmap idx
> >         Rename variable cpu to cpu_map_idx
> >
> >       perf tools: Allow all_cpus to be a superset of user_requested_cpus
> >         Add Acked-by: Ian Rogers <irogers@google.com>
> >
> >       libperf evlist: Allow mixing per-thread and per-cpu mmaps
> >         Fix perf_evlist__nr_mmaps() calculation
> >
> >       libperf evlist: Check nr_mmaps is correct
> >         New patch
> >
> >       libperf evsel: Add comments for booleans
> >         New patch
> >
> >       perf tools: Allow system-wide events to keep their own CPUs
> >       perf tools: Allow system-wide events to keep their own threads
> >
> >
> > Adrian Hunter (15):
> >       perf intel-pt: Add a test for system-wide side band
> >       perf auxtrace: Add mmap_needed to auxtrace_mmap_params
> >       perf auxtrace: Remove auxtrace_mmap_params__set_idx() per_cpu parameter
> >       perf evlist: Factor out evlist__dummy_event()
> >       perf evlist: Add evlist__add_dummy_on_all_cpus()
> >       perf record: Use evlist__add_dummy_on_all_cpus() in record__config_text_poke()
> >       perf intel-pt: Use evlist__add_dummy_on_all_cpus() for switch tracking
> >       perf intel-pt: Track sideband system-wide when needed
> >       perf tools: Allow all_cpus to be a superset of user_requested_cpus
> >       libperf evlist: Allow mixing per-thread and per-cpu mmaps
> >       libperf evlist: Check nr_mmaps is correct
> >       perf stat: Add requires_cpu flag for uncore
> >       libperf evsel: Add comments for booleans
> >       perf tools: Allow system-wide events to keep their own CPUs
> >       perf tools: Allow system-wide events to keep their own threads
> >
> >  tools/lib/perf/evlist.c                 | 71 ++++++++++++++-------------------
> >  tools/lib/perf/include/internal/evsel.h | 11 +++++
> >  tools/perf/arch/x86/util/intel-pt.c     | 31 ++++++--------
> >  tools/perf/builtin-record.c             | 39 +++++++-----------
> >  tools/perf/builtin-stat.c               |  5 +--
> >  tools/perf/tests/shell/test_intel_pt.sh | 71 +++++++++++++++++++++++++++++++++
> >  tools/perf/util/auxtrace.c              | 15 +++++--
> >  tools/perf/util/auxtrace.h              | 13 ++++--
> >  tools/perf/util/evlist.c                | 61 +++++++++++++++++++++++++---
> >  tools/perf/util/evlist.h                |  5 +++
> >  tools/perf/util/evsel.c                 |  1 +
> >  tools/perf/util/mmap.c                  |  4 +-
> >  tools/perf/util/parse-events.c          |  2 +-
> >  13 files changed, 226 insertions(+), 103 deletions(-)
> >  create mode 100755 tools/perf/tests/shell/test_intel_pt.sh
> >
> >
> > Regards
> > Adrian

-- 

- Arnaldo

Thread overview: 18+ messages
2022-05-24  7:54 [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu Adrian Hunter
2022-05-24  7:54 ` [PATCH V4 01/15] perf intel-pt: Add a test for system-wide side band Adrian Hunter
2022-05-24  7:54 ` [PATCH V4 02/15] perf auxtrace: Add mmap_needed to auxtrace_mmap_params Adrian Hunter
2022-05-24  7:54 ` [PATCH V4 03/15] perf auxtrace: Remove auxtrace_mmap_params__set_idx() per_cpu parameter Adrian Hunter
2022-05-24  7:54 ` [PATCH V4 04/15] perf evlist: Factor out evlist__dummy_event() Adrian Hunter
2022-05-24  7:54 ` [PATCH V4 05/15] perf evlist: Add evlist__add_dummy_on_all_cpus() Adrian Hunter
2022-05-24  7:54 ` [PATCH V4 06/15] perf record: Use evlist__add_dummy_on_all_cpus() in record__config_text_poke() Adrian Hunter
2022-05-24  7:54 ` [PATCH V4 07/15] perf intel-pt: Use evlist__add_dummy_on_all_cpus() for switch tracking Adrian Hunter
2022-05-24  7:54 ` [PATCH V4 08/15] perf intel-pt: Track sideband system-wide when needed Adrian Hunter
2022-05-24  7:54 ` [PATCH V4 09/15] perf tools: Allow all_cpus to be a superset of user_requested_cpus Adrian Hunter
2022-05-24  7:54 ` [PATCH V4 10/15] libperf evlist: Allow mixing per-thread and per-cpu mmaps Adrian Hunter
2022-05-24  7:54 ` [PATCH V4 11/15] libperf evlist: Check nr_mmaps is correct Adrian Hunter
2022-05-24  7:54 ` [PATCH V4 12/15] perf stat: Add requires_cpu flag for uncore Adrian Hunter
2022-05-24  7:54 ` [PATCH V4 13/15] libperf evsel: Add comments for booleans Adrian Hunter
2022-05-24  7:54 ` [PATCH V4 14/15] perf tools: Allow system-wide events to keep their own CPUs Adrian Hunter
2022-05-24  7:54 ` [PATCH V4 15/15] perf tools: Allow system-wide events to keep their own threads Adrian Hunter
2022-05-25  5:01 ` [PATCH V4 00/15] perf intel-pt: Better support for perf record --cpu Ian Rogers
2022-05-25 11:00   ` Arnaldo Carvalho de Melo
