All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1)
@ 2022-06-24 23:13 Namhyung Kim
  2022-06-24 23:13 ` [PATCH 1/6] perf offcpu: Fix a build failure on old kernels Namhyung Kim
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Namhyung Kim @ 2022-06-24 23:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, linux-perf-users,
	Song Liu, Hao Luo, Milian Wolff, bpf, Blake Jones

Hello,

The first patch fixes a build error on old kernels which has
task_struct->state field that is renamed to __state.  Actually I made
a mistake when I wrote the code and assumed new kernel version.

The second patch is to prevent invalid sample synthesize by
disallowing unsupported sample types.

The rest of the series implements inheritance of offcpu events for the
child processes.  Unlike perf events, BPF cannot know which task it
should track except for ones set in a BPF map at the beginning.  Add
another BPF program to the fork path and add the process id to the
map if the parent is tracked.

With this change, it can get the correct off-cpu events for child
processes.  I've tested it with perf bench sched messaging which
creates a lot of processes.

  $ sudo perf record -e dummy --off-cpu -- perf bench sched messaging
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 10 groups == 400 processes run

       Total time: 0.196 [sec]
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.178 MB perf.data (851 samples) ]


  $ sudo perf report --stat | grep -A1 offcpu
  offcpu-time stats:
            SAMPLE events:        851

The benchmark passes messages by read/write and it creates off-cpu
events.  With 400 processes, we can see more than 800 events.

The child process tracking is also enabled when -p option is given.
But -t option does NOT as it only cares about the specific threads.
It may be different what perf_event does now, but I think it makes
more sense.

You can get it from 'perf/offcpu-child-v1' branch in my tree

  https://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung


Namhyung Kim (6):
  perf offcpu: Fix a build failure on old kernels
  perf offcpu: Accept allowed sample types only
  perf offcpu: Check process id for the given workload
  perf offcpu: Parse process id separately
  perf offcpu: Track child processes
  perf offcpu: Update offcpu test for child process

 tools/perf/tests/shell/record_offcpu.sh | 57 ++++++++++++++++++++---
 tools/perf/util/bpf_off_cpu.c           | 60 +++++++++++++++++++++++--
 tools/perf/util/bpf_skel/off_cpu.bpf.c  | 58 +++++++++++++++++++++---
 tools/perf/util/evsel.c                 |  9 ++++
 tools/perf/util/off_cpu.h               |  9 ++++
 5 files changed, 176 insertions(+), 17 deletions(-)


base-commit: 9886142c7a2226439c1e3f7d9b69f9c7094c3ef6
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/6] perf offcpu: Fix a build failure on old kernels
  2022-06-24 23:13 [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1) Namhyung Kim
@ 2022-06-24 23:13 ` Namhyung Kim
  2022-06-28 14:44   ` Arnaldo Carvalho de Melo
  2022-06-24 23:13 ` [PATCH 2/6] perf offcpu: Accept allowed sample types only Namhyung Kim
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Namhyung Kim @ 2022-06-24 23:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, linux-perf-users,
	Song Liu, Hao Luo, Milian Wolff, bpf, Blake Jones

Old kernels have task_struct which contains "state" field and newer
kernels have "__state".  While the get_task_state() in the BPF code
handles that in some way, it assumed the current kernel has the new
definition and it caused a build error on old kernels.

We should not assume anything and access them carefully.  Do not use
the task struct directly and access them using new and old definitions
in a row.

Reported-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/bpf_skel/off_cpu.bpf.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
index 792ae2847080..cc6d7fd55118 100644
--- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
+++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
@@ -71,6 +71,11 @@ struct {
 	__uint(max_entries, 1);
 } cgroup_filter SEC(".maps");
 
+/* new kernel task_struct definition */
+struct task_struct___new {
+	long __state;
+} __attribute__((preserve_access_index));
+
 /* old kernel task_struct definition */
 struct task_struct___old {
 	long state;
@@ -93,14 +98,17 @@ const volatile bool uses_cgroup_v1 = false;
  */
 static inline int get_task_state(struct task_struct *t)
 {
-	if (bpf_core_field_exists(t->__state))
-		return BPF_CORE_READ(t, __state);
+	/* recast pointer to capture new type for compiler */
+	struct task_struct___new *t_new = (void *)t;
 
-	/* recast pointer to capture task_struct___old type for compiler */
-	struct task_struct___old *t_old = (void *)t;
+	if (bpf_core_field_exists(t_new->__state)) {
+		return BPF_CORE_READ(t_new, __state);
+	} else {
+		/* recast pointer to capture old type for compiler */
+		struct task_struct___old *t_old = (void *)t;
 
-	/* now use old "state" name of the field */
-	return BPF_CORE_READ(t_old, state);
+		return BPF_CORE_READ(t_old, state);
+	}
 }
 
 static inline __u64 get_cgroup_id(struct task_struct *t)
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/6] perf offcpu: Accept allowed sample types only
  2022-06-24 23:13 [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1) Namhyung Kim
  2022-06-24 23:13 ` [PATCH 1/6] perf offcpu: Fix a build failure on old kernels Namhyung Kim
@ 2022-06-24 23:13 ` Namhyung Kim
  2022-06-28 14:46   ` Arnaldo Carvalho de Melo
  2022-06-24 23:13 ` [PATCH 3/6] perf offcpu: Check process id for the given workload Namhyung Kim
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Namhyung Kim @ 2022-06-24 23:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, linux-perf-users,
	Song Liu, Hao Luo, Milian Wolff, bpf, Blake Jones

As offcpu-time event is synthesized at the end, it could not get the
all the sample info.  Define OFFCPU_SAMPLE_TYPES for allowed ones and
mask out others in evsel__config() to prevent parse errors.

Because perf sample parsing assumes a specific ordering with the
sample types, setting unsupported one would make it fail to read
data like perf record -d/--data.

Fixes: edc41a1099c2 ("perf record: Enable off-cpu analysis with BPF")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/bpf_off_cpu.c | 7 ++++++-
 tools/perf/util/evsel.c       | 9 +++++++++
 tools/perf/util/off_cpu.h     | 9 +++++++++
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
index b73e84a02264..f289b7713598 100644
--- a/tools/perf/util/bpf_off_cpu.c
+++ b/tools/perf/util/bpf_off_cpu.c
@@ -265,6 +265,12 @@ int off_cpu_write(struct perf_session *session)
 
 	sample_type = evsel->core.attr.sample_type;
 
+	if (sample_type & ~OFFCPU_SAMPLE_TYPES) {
+		pr_err("not supported sample type: %llx\n",
+		       (unsigned long long)sample_type);
+		return -1;
+	}
+
 	if (sample_type & (PERF_SAMPLE_ID | PERF_SAMPLE_IDENTIFIER)) {
 		if (evsel->core.id)
 			sid = evsel->core.id[0];
@@ -319,7 +325,6 @@ int off_cpu_write(struct perf_session *session)
 		}
 		if (sample_type & PERF_SAMPLE_CGROUP)
 			data.array[n++] = key.cgroup_id;
-		/* TODO: handle more sample types */
 
 		size = n * sizeof(u64);
 		data.hdr.size = size;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index ce499c5da8d7..094b0a9c0bc0 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -48,6 +48,7 @@
 #include "util.h"
 #include "hashmap.h"
 #include "pmu-hybrid.h"
+#include "off_cpu.h"
 #include "../perf-sys.h"
 #include "util/parse-branch-options.h"
 #include <internal/xyarray.h>
@@ -1102,6 +1103,11 @@ static void evsel__set_default_freq_period(struct record_opts *opts,
 	}
 }
 
+static bool evsel__is_offcpu_event(struct evsel *evsel)
+{
+	return evsel__is_bpf_output(evsel) && !strcmp(evsel->name, OFFCPU_EVENT);
+}
+
 /*
  * The enable_on_exec/disabled value strategy:
  *
@@ -1366,6 +1372,9 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
 	 */
 	if (evsel__is_dummy_event(evsel))
 		evsel__reset_sample_bit(evsel, BRANCH_STACK);
+
+	if (evsel__is_offcpu_event(evsel))
+		evsel->core.attr.sample_type &= OFFCPU_SAMPLE_TYPES;
 }
 
 int evsel__set_filter(struct evsel *evsel, const char *filter)
diff --git a/tools/perf/util/off_cpu.h b/tools/perf/util/off_cpu.h
index 548008f74d42..2dd67c60f211 100644
--- a/tools/perf/util/off_cpu.h
+++ b/tools/perf/util/off_cpu.h
@@ -1,6 +1,8 @@
 #ifndef PERF_UTIL_OFF_CPU_H
 #define PERF_UTIL_OFF_CPU_H
 
+#include <linux/perf_event.h>
+
 struct evlist;
 struct target;
 struct perf_session;
@@ -8,6 +10,13 @@ struct record_opts;
 
 #define OFFCPU_EVENT  "offcpu-time"
 
+#define OFFCPU_SAMPLE_TYPES  (PERF_SAMPLE_IDENTIFIER | PERF_SAMPLE_IP | \
+			      PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \
+			      PERF_SAMPLE_ID | PERF_SAMPLE_CPU | \
+			      PERF_SAMPLE_PERIOD | PERF_SAMPLE_CALLCHAIN | \
+			      PERF_SAMPLE_CGROUP)
+
+
 #ifdef HAVE_BPF_SKEL
 int off_cpu_prepare(struct evlist *evlist, struct target *target,
 		    struct record_opts *opts);
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/6] perf offcpu: Check process id for the given workload
  2022-06-24 23:13 [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1) Namhyung Kim
  2022-06-24 23:13 ` [PATCH 1/6] perf offcpu: Fix a build failure on old kernels Namhyung Kim
  2022-06-24 23:13 ` [PATCH 2/6] perf offcpu: Accept allowed sample types only Namhyung Kim
@ 2022-06-24 23:13 ` Namhyung Kim
  2022-06-24 23:13 ` [PATCH 4/6] perf offcpu: Parse process id separately Namhyung Kim
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Namhyung Kim @ 2022-06-24 23:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, linux-perf-users,
	Song Liu, Hao Luo, Milian Wolff, bpf, Blake Jones

Current task filter checks task->pid which is different for each
thread.  But we want to profile all the threads in the process.  So
let's compare process id (or thread-group id: tgid) instead.

Before:
  $ sudo perf record --off-cpu -- perf bench sched messaging -t
  $ sudo perf report --stat | grep -A1 offcpu
  offcpu-time stats:
            SAMPLE events:        2

After:
  $ sudo perf record --off-cpu -- perf bench sched messaging -t
  $ sudo perf report --stat | grep -A1 offcpu
  offcpu-time stats:
            SAMPLE events:      850

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/bpf_off_cpu.c          | 1 +
 tools/perf/util/bpf_skel/off_cpu.bpf.c | 8 +++++++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
index f289b7713598..7dbcb025da87 100644
--- a/tools/perf/util/bpf_off_cpu.c
+++ b/tools/perf/util/bpf_off_cpu.c
@@ -78,6 +78,7 @@ static void off_cpu_start(void *arg)
 		u8 val = 1;
 
 		skel->bss->has_task = 1;
+		skel->bss->uses_tgid = 1;
 		fd = bpf_map__fd(skel->maps.task_filter);
 		pid = perf_thread_map__pid(evlist->core.threads, 0);
 		bpf_map_update_elem(fd, &pid, &val, BPF_ANY);
diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
index cc6d7fd55118..143a8b7acf87 100644
--- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
+++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
@@ -85,6 +85,7 @@ int enabled = 0;
 int has_cpu = 0;
 int has_task = 0;
 int has_cgroup = 0;
+int uses_tgid = 0;
 
 const volatile bool has_prev_state = false;
 const volatile bool needs_cgroup = false;
@@ -144,7 +145,12 @@ static inline int can_record(struct task_struct *t, int state)
 
 	if (has_task) {
 		__u8 *ok;
-		__u32 pid = t->pid;
+		__u32 pid;
+
+		if (uses_tgid)
+			pid = t->tgid;
+		else
+			pid = t->pid;
 
 		ok = bpf_map_lookup_elem(&task_filter, &pid);
 		if (!ok)
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/6] perf offcpu: Parse process id separately
  2022-06-24 23:13 [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1) Namhyung Kim
                   ` (2 preceding siblings ...)
  2022-06-24 23:13 ` [PATCH 3/6] perf offcpu: Check process id for the given workload Namhyung Kim
@ 2022-06-24 23:13 ` Namhyung Kim
  2022-06-24 23:13 ` [PATCH 5/6] perf offcpu: Track child processes Namhyung Kim
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Namhyung Kim @ 2022-06-24 23:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, linux-perf-users,
	Song Liu, Hao Luo, Milian Wolff, bpf, Blake Jones

The current target code uses thread id for tracking tasks because
perf_events need to be opened for each task.  But we can use tgid in
BPF maps and check it easily.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/bpf_off_cpu.c | 45 +++++++++++++++++++++++++++++++++--
 1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
index 7dbcb025da87..f7ee0c7a53f0 100644
--- a/tools/perf/util/bpf_off_cpu.c
+++ b/tools/perf/util/bpf_off_cpu.c
@@ -11,6 +11,7 @@
 #include "util/cpumap.h"
 #include "util/thread_map.h"
 #include "util/cgroup.h"
+#include "util/strlist.h"
 #include <bpf/bpf.h>
 
 #include "bpf_skel/off_cpu.skel.h"
@@ -125,6 +126,8 @@ int off_cpu_prepare(struct evlist *evlist, struct target *target,
 {
 	int err, fd, i;
 	int ncpus = 1, ntasks = 1, ncgrps = 1;
+	struct strlist *pid_slist = NULL;
+	struct str_node *pos;
 
 	if (off_cpu_config(evlist) < 0) {
 		pr_err("Failed to config off-cpu BPF event\n");
@@ -143,7 +146,26 @@ int off_cpu_prepare(struct evlist *evlist, struct target *target,
 		bpf_map__set_max_entries(skel->maps.cpu_filter, ncpus);
 	}
 
-	if (target__has_task(target)) {
+	if (target->pid) {
+		pid_slist = strlist__new(target->pid, NULL);
+		if (!pid_slist) {
+			pr_err("Failed to create a strlist for pid\n");
+			return -1;
+		}
+
+		ntasks = 0;
+		strlist__for_each_entry(pos, pid_slist) {
+			char *end_ptr;
+			int pid = strtol(pos->s, &end_ptr, 10);
+
+			if (pid == INT_MIN || pid == INT_MAX ||
+			    (*end_ptr != '\0' && *end_ptr != ','))
+				continue;
+
+			ntasks++;
+		}
+		bpf_map__set_max_entries(skel->maps.task_filter, ntasks);
+	} else if (target__has_task(target)) {
 		ntasks = perf_thread_map__nr(evlist->core.threads);
 		bpf_map__set_max_entries(skel->maps.task_filter, ntasks);
 	}
@@ -185,7 +207,26 @@ int off_cpu_prepare(struct evlist *evlist, struct target *target,
 		}
 	}
 
-	if (target__has_task(target)) {
+	if (target->pid) {
+		u8 val = 1;
+
+		skel->bss->has_task = 1;
+		skel->bss->uses_tgid = 1;
+		fd = bpf_map__fd(skel->maps.task_filter);
+
+		strlist__for_each_entry(pos, pid_slist) {
+			char *end_ptr;
+			u32 tgid;
+			int pid = strtol(pos->s, &end_ptr, 10);
+
+			if (pid == INT_MIN || pid == INT_MAX ||
+			    (*end_ptr != '\0' && *end_ptr != ','))
+				continue;
+
+			tgid = pid;
+			bpf_map_update_elem(fd, &tgid, &val, BPF_ANY);
+		}
+	} else if (target__has_task(target)) {
 		u32 pid;
 		u8 val = 1;
 
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 5/6] perf offcpu: Track child processes
  2022-06-24 23:13 [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1) Namhyung Kim
                   ` (3 preceding siblings ...)
  2022-06-24 23:13 ` [PATCH 4/6] perf offcpu: Parse process id separately Namhyung Kim
@ 2022-06-24 23:13 ` Namhyung Kim
  2022-06-24 23:13 ` [PATCH 6/6] perf offcpu: Update offcpu test for child process Namhyung Kim
  2022-06-28 14:40 ` [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1) Arnaldo Carvalho de Melo
  6 siblings, 0 replies; 12+ messages in thread
From: Namhyung Kim @ 2022-06-24 23:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, linux-perf-users,
	Song Liu, Hao Luo, Milian Wolff, bpf, Blake Jones

When -p option used or a workload is given, it needs to handle child
processes.  The perf_event can inherit those task events
automatically.  We can add a new BPF program in task_newtask
tracepoint to track child processes.

Before:
  $ sudo perf record --off-cpu -- perf bench sched messaging
  $ sudo perf report --stat | grep -A1 offcpu
  offcpu-time stats:
            SAMPLE events:        1

After:
  $ sudo perf record -a --off-cpu -- perf bench sched messaging
  $ sudo perf report --stat | grep -A1 offcpu
  offcpu-time stats:
            SAMPLE events:      856

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/bpf_off_cpu.c          |  7 ++++++
 tools/perf/util/bpf_skel/off_cpu.bpf.c | 30 ++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
index f7ee0c7a53f0..c257813e674e 100644
--- a/tools/perf/util/bpf_off_cpu.c
+++ b/tools/perf/util/bpf_off_cpu.c
@@ -17,6 +17,7 @@
 #include "bpf_skel/off_cpu.skel.h"
 
 #define MAX_STACKS  32
+#define MAX_PROC  4096
 /* we don't need actual timestamp, just want to put the samples at last */
 #define OFF_CPU_TIMESTAMP  (~0ull << 32)
 
@@ -164,10 +165,16 @@ int off_cpu_prepare(struct evlist *evlist, struct target *target,
 
 			ntasks++;
 		}
+
+		if (ntasks < MAX_PROC)
+			ntasks = MAX_PROC;
+
 		bpf_map__set_max_entries(skel->maps.task_filter, ntasks);
 	} else if (target__has_task(target)) {
 		ntasks = perf_thread_map__nr(evlist->core.threads);
 		bpf_map__set_max_entries(skel->maps.task_filter, ntasks);
+	} else if (target__none(target)) {
+		bpf_map__set_max_entries(skel->maps.task_filter, MAX_PROC);
 	}
 
 	if (evlist__first(evlist)->cgrp) {
diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
index 143a8b7acf87..c4ba2bcf179f 100644
--- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
+++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
@@ -12,6 +12,9 @@
 #define TASK_INTERRUPTIBLE	0x0001
 #define TASK_UNINTERRUPTIBLE	0x0002
 
+/* create a new thread */
+#define CLONE_THREAD  0x10000
+
 #define MAX_STACKS   32
 #define MAX_ENTRIES  102400
 
@@ -220,6 +223,33 @@ static int off_cpu_stat(u64 *ctx, struct task_struct *prev,
 	return 0;
 }
 
+SEC("tp_btf/task_newtask")
+int on_newtask(u64 *ctx)
+{
+	struct task_struct *task;
+	u64 clone_flags;
+	u32 pid;
+	u8 val = 1;
+
+	if (!uses_tgid)
+		return 0;
+
+	task = (struct task_struct *)bpf_get_current_task();
+
+	pid = BPF_CORE_READ(task, tgid);
+	if (!bpf_map_lookup_elem(&task_filter, &pid))
+		return 0;
+
+	task = (struct task_struct *)ctx[0];
+	clone_flags = ctx[1];
+
+	pid = task->tgid;
+	if (!(clone_flags & CLONE_THREAD))
+		bpf_map_update_elem(&task_filter, &pid, &val, BPF_NOEXIST);
+
+	return 0;
+}
+
 SEC("tp_btf/sched_switch")
 int on_switch(u64 *ctx)
 {
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 6/6] perf offcpu: Update offcpu test for child process
  2022-06-24 23:13 [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1) Namhyung Kim
                   ` (4 preceding siblings ...)
  2022-06-24 23:13 ` [PATCH 5/6] perf offcpu: Track child processes Namhyung Kim
@ 2022-06-24 23:13 ` Namhyung Kim
  2022-06-28 14:40 ` [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1) Arnaldo Carvalho de Melo
  6 siblings, 0 replies; 12+ messages in thread
From: Namhyung Kim @ 2022-06-24 23:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, linux-perf-users,
	Song Liu, Hao Luo, Milian Wolff, bpf, Blake Jones

Record off-cpu data with perf bench sched messaging workload and count
the number of offcpu-time events.  Also update the test script not to
run next tests if failed already and revise the error messages.

  $ sudo ./perf test offcpu -v
   88: perf record offcpu profiling tests                              :
  --- start ---
  test child forked, pid 344780
  Checking off-cpu privilege
  Basic off-cpu test
  Basic off-cpu test [Success]
  Child task off-cpu test
  Child task off-cpu test [Success]
  test child finished with 0
  ---- end ----
  perf record offcpu profiling tests: Ok

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/tests/shell/record_offcpu.sh | 57 ++++++++++++++++++++++---
 1 file changed, 50 insertions(+), 7 deletions(-)

diff --git a/tools/perf/tests/shell/record_offcpu.sh b/tools/perf/tests/shell/record_offcpu.sh
index 96e0739f7478..611b2d7c779b 100755
--- a/tools/perf/tests/shell/record_offcpu.sh
+++ b/tools/perf/tests/shell/record_offcpu.sh
@@ -19,20 +19,26 @@ trap_cleanup() {
 }
 trap trap_cleanup exit term int
 
-test_offcpu() {
-  echo "Basic off-cpu test"
+test_offcpu_priv() {
+  echo "Checking off-cpu privilege"
+
   if [ `id -u` != 0 ]
   then
-    echo "Basic off-cpu test [Skipped permission]"
+    echo "off-cpu test [Skipped permission]"
     err=2
     return
   fi
-  if perf record --off-cpu -o ${perfdata} --quiet true 2>&1 | grep BUILD_BPF_SKEL
+  if perf record --off-cpu -o /dev/null --quiet true 2>&1 | grep BUILD_BPF_SKEL
   then
-    echo "Basic off-cpu test [Skipped missing BPF support]"
+    echo "off-cpu test [Skipped missing BPF support]"
     err=2
     return
   fi
+}
+
+test_offcpu_basic() {
+  echo "Basic off-cpu test"
+
   if ! perf record --off-cpu -e dummy -o ${perfdata} sleep 1 2> /dev/null
   then
     echo "Basic off-cpu test [Failed record]"
@@ -41,7 +47,7 @@ test_offcpu() {
   fi
   if ! perf evlist -i ${perfdata} | grep -q "offcpu-time"
   then
-    echo "Basic off-cpu test [Failed record]"
+    echo "Basic off-cpu test [Failed no event]"
     err=1
     return
   fi
@@ -54,7 +60,44 @@ test_offcpu() {
   echo "Basic off-cpu test [Success]"
 }
 
-test_offcpu
+test_offcpu_child() {
+  echo "Child task off-cpu test"
+
+  # perf bench sched messaging creates 400 processes
+  if ! perf record --off-cpu -e dummy -o ${perfdata} -- \
+    perf bench sched messaging -g 10 > /dev/null 2&>1
+  then
+    echo "Child task off-cpu test [Failed record]"
+    err=1
+    return
+  fi
+  if ! perf evlist -i ${perfdata} | grep -q "offcpu-time"
+  then
+    echo "Child task off-cpu test [Failed no event]"
+    err=1
+    return
+  fi
+  # each process waits for read and write, so it should be more than 800 events
+  if ! perf report -i ${perfdata} -s comm -q -n -t ';' | \
+    awk -F ";" '{ if (NF > 3 && int($3) < 800) exit 1; }'
+  then
+    echo "Child task off-cpu test [Failed invalid output]"
+    err=1
+    return
+  fi
+  echo "Child task off-cpu test [Success]"
+}
+
+
+test_offcpu_priv
+
+if [ $err = 0 ]; then
+  test_offcpu_basic
+fi
+
+if [ $err = 0 ]; then
+  test_offcpu_child
+fi
 
 cleanup
 exit $err
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1)
  2022-06-24 23:13 [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1) Namhyung Kim
                   ` (5 preceding siblings ...)
  2022-06-24 23:13 ` [PATCH 6/6] perf offcpu: Update offcpu test for child process Namhyung Kim
@ 2022-06-28 14:40 ` Arnaldo Carvalho de Melo
  2022-06-28 16:51   ` Namhyung Kim
  6 siblings, 1 reply; 12+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-06-28 14:40 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers,
	linux-perf-users, Song Liu, Hao Luo, Milian Wolff, bpf,
	Blake Jones

Em Fri, Jun 24, 2022 at 04:13:07PM -0700, Namhyung Kim escreveu:
> Hello,
> 
> The first patch fixes a build error on old kernels which has
> task_struct->state field that is renamed to __state.  Actually I made
> a mistake when I wrote the code and assumed new kernel version.
> 
> The second patch is to prevent invalid sample synthesize by
> disallowing unsupported sample types.

So I'll pick the first two for perf/urgent and then when that is merged
into perf/core pick the rest, ok?

- Arnaldo
 
> The rest of the series implements inheritance of offcpu events for the
> child processes.  Unlike perf events, BPF cannot know which task it
> should track except for ones set in a BPF map at the beginning.  Add
> another BPF program to the fork path and add the process id to the
> map if the parent is tracked.
> 
> With this change, it can get the correct off-cpu events for child
> processes.  I've tested it with perf bench sched messaging which
> creates a lot of processes.
> 
>   $ sudo perf record -e dummy --off-cpu -- perf bench sched messaging
>   # Running 'sched/messaging' benchmark:
>   # 20 sender and receiver processes per group
>   # 10 groups == 400 processes run
> 
>        Total time: 0.196 [sec]
>   [ perf record: Woken up 1 times to write data ]
>   [ perf record: Captured and wrote 0.178 MB perf.data (851 samples) ]
> 
> 
>   $ sudo perf report --stat | grep -A1 offcpu
>   offcpu-time stats:
>             SAMPLE events:        851
> 
> The benchmark passes messages by read/write and it creates off-cpu
> events.  With 400 processes, we can see more than 800 events.
> 
> The child process tracking is also enabled when -p option is given.
> But -t option does NOT as it only cares about the specific threads.
> It may be different what perf_event does now, but I think it makes
> more sense.
> 
> You can get it from 'perf/offcpu-child-v1' branch in my tree
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
> 
> Thanks,
> Namhyung
> 
> 
> Namhyung Kim (6):
>   perf offcpu: Fix a build failure on old kernels
>   perf offcpu: Accept allowed sample types only
>   perf offcpu: Check process id for the given workload
>   perf offcpu: Parse process id separately
>   perf offcpu: Track child processes
>   perf offcpu: Update offcpu test for child process
> 
>  tools/perf/tests/shell/record_offcpu.sh | 57 ++++++++++++++++++++---
>  tools/perf/util/bpf_off_cpu.c           | 60 +++++++++++++++++++++++--
>  tools/perf/util/bpf_skel/off_cpu.bpf.c  | 58 +++++++++++++++++++++---
>  tools/perf/util/evsel.c                 |  9 ++++
>  tools/perf/util/off_cpu.h               |  9 ++++
>  5 files changed, 176 insertions(+), 17 deletions(-)
> 
> 
> base-commit: 9886142c7a2226439c1e3f7d9b69f9c7094c3ef6
> -- 
> 2.37.0.rc0.161.g10f37bed90-goog

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/6] perf offcpu: Fix a build failure on old kernels
  2022-06-24 23:13 ` [PATCH 1/6] perf offcpu: Fix a build failure on old kernels Namhyung Kim
@ 2022-06-28 14:44   ` Arnaldo Carvalho de Melo
  2022-06-28 16:52     ` Namhyung Kim
  0 siblings, 1 reply; 12+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-06-28 14:44 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers,
	linux-perf-users, Song Liu, Hao Luo, Milian Wolff, bpf,
	Blake Jones

Em Fri, Jun 24, 2022 at 04:13:08PM -0700, Namhyung Kim escreveu:
> Old kernels have task_struct which contains "state" field and newer
> kernels have "__state".  While the get_task_state() in the BPF code
> handles that in some way, it assumed the current kernel has the new
> definition and it caused a build error on old kernels.
> 
> We should not assume anything and access them carefully.  Do not use
> the task struct directly and access them using new and old definitions
> in a row.

I added a:

Fixes: edc41a1099c2d08c ("perf record: Enable off-cpu analysis with BPF")

Ok?

- Arnaldo
 
> Reported-by: Ian Rogers <irogers@google.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/util/bpf_skel/off_cpu.bpf.c | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
> index 792ae2847080..cc6d7fd55118 100644
> --- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
> +++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
> @@ -71,6 +71,11 @@ struct {
>  	__uint(max_entries, 1);
>  } cgroup_filter SEC(".maps");
>  
> +/* new kernel task_struct definition */
> +struct task_struct___new {
> +	long __state;
> +} __attribute__((preserve_access_index));
> +
>  /* old kernel task_struct definition */
>  struct task_struct___old {
>  	long state;
> @@ -93,14 +98,17 @@ const volatile bool uses_cgroup_v1 = false;
>   */
>  static inline int get_task_state(struct task_struct *t)
>  {
> -	if (bpf_core_field_exists(t->__state))
> -		return BPF_CORE_READ(t, __state);
> +	/* recast pointer to capture new type for compiler */
> +	struct task_struct___new *t_new = (void *)t;
>  
> -	/* recast pointer to capture task_struct___old type for compiler */
> -	struct task_struct___old *t_old = (void *)t;
> +	if (bpf_core_field_exists(t_new->__state)) {
> +		return BPF_CORE_READ(t_new, __state);
> +	} else {
> +		/* recast pointer to capture old type for compiler */
> +		struct task_struct___old *t_old = (void *)t;
>  
> -	/* now use old "state" name of the field */
> -	return BPF_CORE_READ(t_old, state);
> +		return BPF_CORE_READ(t_old, state);
> +	}
>  }
>  
>  static inline __u64 get_cgroup_id(struct task_struct *t)
> -- 
> 2.37.0.rc0.161.g10f37bed90-goog

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/6] perf offcpu: Accept allowed sample types only
  2022-06-24 23:13 ` [PATCH 2/6] perf offcpu: Accept allowed sample types only Namhyung Kim
@ 2022-06-28 14:46   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 12+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-06-28 14:46 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers,
	linux-perf-users, Song Liu, Hao Luo, Milian Wolff, bpf,
	Blake Jones

Em Fri, Jun 24, 2022 at 04:13:09PM -0700, Namhyung Kim escreveu:
> As offcpu-time event is synthesized at the end, it could not get the
> all the sample info.  Define OFFCPU_SAMPLE_TYPES for allowed ones and
> mask out others in evsel__config() to prevent parse errors.
> 
> Because perf sample parsing assumes a specific ordering with the
> sample types, setting unsupported one would make it fail to read
> data like perf record -d/--data.

> Fixes: edc41a1099c2 ("perf record: Enable off-cpu analysis with BPF")

Thanks, applied.

- Arnaldo

> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/util/bpf_off_cpu.c | 7 ++++++-
>  tools/perf/util/evsel.c       | 9 +++++++++
>  tools/perf/util/off_cpu.h     | 9 +++++++++
>  3 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
> index b73e84a02264..f289b7713598 100644
> --- a/tools/perf/util/bpf_off_cpu.c
> +++ b/tools/perf/util/bpf_off_cpu.c
> @@ -265,6 +265,12 @@ int off_cpu_write(struct perf_session *session)
>  
>  	sample_type = evsel->core.attr.sample_type;
>  
> +	if (sample_type & ~OFFCPU_SAMPLE_TYPES) {
> +		pr_err("not supported sample type: %llx\n",
> +		       (unsigned long long)sample_type);
> +		return -1;
> +	}
> +
>  	if (sample_type & (PERF_SAMPLE_ID | PERF_SAMPLE_IDENTIFIER)) {
>  		if (evsel->core.id)
>  			sid = evsel->core.id[0];
> @@ -319,7 +325,6 @@ int off_cpu_write(struct perf_session *session)
>  		}
>  		if (sample_type & PERF_SAMPLE_CGROUP)
>  			data.array[n++] = key.cgroup_id;
> -		/* TODO: handle more sample types */
>  
>  		size = n * sizeof(u64);
>  		data.hdr.size = size;
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index ce499c5da8d7..094b0a9c0bc0 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -48,6 +48,7 @@
>  #include "util.h"
>  #include "hashmap.h"
>  #include "pmu-hybrid.h"
> +#include "off_cpu.h"
>  #include "../perf-sys.h"
>  #include "util/parse-branch-options.h"
>  #include <internal/xyarray.h>
> @@ -1102,6 +1103,11 @@ static void evsel__set_default_freq_period(struct record_opts *opts,
>  	}
>  }
>  
> +static bool evsel__is_offcpu_event(struct evsel *evsel)
> +{
> +	return evsel__is_bpf_output(evsel) && !strcmp(evsel->name, OFFCPU_EVENT);
> +}
> +
>  /*
>   * The enable_on_exec/disabled value strategy:
>   *
> @@ -1366,6 +1372,9 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
>  	 */
>  	if (evsel__is_dummy_event(evsel))
>  		evsel__reset_sample_bit(evsel, BRANCH_STACK);
> +
> +	if (evsel__is_offcpu_event(evsel))
> +		evsel->core.attr.sample_type &= OFFCPU_SAMPLE_TYPES;
>  }
>  
>  int evsel__set_filter(struct evsel *evsel, const char *filter)
> diff --git a/tools/perf/util/off_cpu.h b/tools/perf/util/off_cpu.h
> index 548008f74d42..2dd67c60f211 100644
> --- a/tools/perf/util/off_cpu.h
> +++ b/tools/perf/util/off_cpu.h
> @@ -1,6 +1,8 @@
>  #ifndef PERF_UTIL_OFF_CPU_H
>  #define PERF_UTIL_OFF_CPU_H
>  
> +#include <linux/perf_event.h>
> +
>  struct evlist;
>  struct target;
>  struct perf_session;
> @@ -8,6 +10,13 @@ struct record_opts;
>  
>  #define OFFCPU_EVENT  "offcpu-time"
>  
> +#define OFFCPU_SAMPLE_TYPES  (PERF_SAMPLE_IDENTIFIER | PERF_SAMPLE_IP | \
> +			      PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \
> +			      PERF_SAMPLE_ID | PERF_SAMPLE_CPU | \
> +			      PERF_SAMPLE_PERIOD | PERF_SAMPLE_CALLCHAIN | \
> +			      PERF_SAMPLE_CGROUP)
> +
> +
>  #ifdef HAVE_BPF_SKEL
>  int off_cpu_prepare(struct evlist *evlist, struct target *target,
>  		    struct record_opts *opts);
> -- 
> 2.37.0.rc0.161.g10f37bed90-goog

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1)
  2022-06-28 14:40 ` [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1) Arnaldo Carvalho de Melo
@ 2022-06-28 16:51   ` Namhyung Kim
  0 siblings, 0 replies; 12+ messages in thread
From: Namhyung Kim @ 2022-06-28 16:51 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers,
	linux-perf-users, Song Liu, Hao Luo, Milian Wolff, bpf,
	Blake Jones

Hi Arnaldo,

On Tue, Jun 28, 2022 at 7:40 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Fri, Jun 24, 2022 at 04:13:07PM -0700, Namhyung Kim escreveu:
> > Hello,
> >
> > The first patch fixes a build error on old kernels which has
> > task_struct->state field that is renamed to __state.  Actually I made
> > a mistake when I wrote the code and assumed new kernel version.
> >
> > The second patch is to prevent invalid sample synthesize by
> > disallowing unsupported sample types.
>
> So I'll pick the first two for perf/urgent and then when that is merged
> into perf/core pick the rest, ok?

Sounds good!

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/6] perf offcpu: Fix a build failure on old kernels
  2022-06-28 14:44   ` Arnaldo Carvalho de Melo
@ 2022-06-28 16:52     ` Namhyung Kim
  0 siblings, 0 replies; 12+ messages in thread
From: Namhyung Kim @ 2022-06-28 16:52 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers,
	linux-perf-users, Song Liu, Hao Luo, Milian Wolff, bpf,
	Blake Jones

On Tue, Jun 28, 2022 at 7:44 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Fri, Jun 24, 2022 at 04:13:08PM -0700, Namhyung Kim escreveu:
> > Old kernels have task_struct which contains "state" field and newer
> > kernels have "__state".  While the get_task_state() in the BPF code
> > handles that in some way, it assumed the current kernel has the new
> > definition and it caused a build error on old kernels.
> >
> > We should not assume anything and access them carefully.  Do not use
> > the task struct directly and access them using new and old definitions
> > in a row.
>
> I added a:
>
> Fixes: edc41a1099c2d08c ("perf record: Enable off-cpu analysis with BPF")
>
> Ok?

Sure, thanks for doing this!
Namhyung

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-06-28 16:55 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-24 23:13 [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1) Namhyung Kim
2022-06-24 23:13 ` [PATCH 1/6] perf offcpu: Fix a build failure on old kernels Namhyung Kim
2022-06-28 14:44   ` Arnaldo Carvalho de Melo
2022-06-28 16:52     ` Namhyung Kim
2022-06-24 23:13 ` [PATCH 2/6] perf offcpu: Accept allowed sample types only Namhyung Kim
2022-06-28 14:46   ` Arnaldo Carvalho de Melo
2022-06-24 23:13 ` [PATCH 3/6] perf offcpu: Check process id for the given workload Namhyung Kim
2022-06-24 23:13 ` [PATCH 4/6] perf offcpu: Parse process id separately Namhyung Kim
2022-06-24 23:13 ` [PATCH 5/6] perf offcpu: Track child processes Namhyung Kim
2022-06-24 23:13 ` [PATCH 6/6] perf offcpu: Update offcpu test for child process Namhyung Kim
2022-06-28 14:40 ` [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1) Arnaldo Carvalho de Melo
2022-06-28 16:51   ` Namhyung Kim

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.